Protecting Your Data from Geopolitical Threats: A Practical DLP Guide

Here’s how you can use Microsoft Purview’s Data Loss Prevention (DLP) policies to safeguard your information from unauthorised access today.


Important:

As a best practice, always conduct a business impact assessment first. Activities 1 and 2 below can disrupt legitimate business operations. Ask yourself:

  • Do we have suppliers, partners, or customers in these regions?
  • Are there ongoing projects that require data exchange with these regions?
  • Could this affect our global workforce or remote employees?

1. Block Risky IP Addresses

Start by implementing IP-based restrictions alongside your DLP policies. Block known IP addresses from high-risk countries to prevent data exfiltration attempts. This creates your first line of defence against unauthorised access from these regions.

You can do this through Defender for Cloud Apps: https://learn.microsoft.com/en-us/defender-cloud-apps/ip-tags
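Defender for Cloud Apps does the matching once the ranges are tagged, but the underlying check is plain CIDR membership, which is useful for sanity-checking a range list or triaging IPs from a log export. A minimal Python sketch, assuming placeholder ranges:

```python
import ipaddress

# Hypothetical ranges for illustration only: these are RFC 5737 documentation
# ranges, NOT real country allocations. Substitute the ranges you actually tag.
RISKY_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_risky(ip: str) -> bool:
    """Return True if the address falls inside any of the tagged ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RISKY_RANGES)


if __name__ == "__main__":
    # Example: triage source IPs pulled from a sign-in or activity log export.
    for ip in ["203.0.113.42", "192.0.2.10"]:
        print(ip, "risky" if is_risky(ip) else "ok")
```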

2. Restrict File Sharing to Risky Platforms

Many data breaches happen through seemingly innocent file sharing, so block access to popular file-sharing services hosted in these regions.

Here are a few popular mail and file-sharing sites for the two countries mentioned in the Microsoft Security Program post.

Russian platforms:

• Yandex.Disk (https://360.yandex.com/disk/)

• Mail.ru Cloud (https://mail.ru/)

Chinese platforms:

• Baidu Pan (https://pan.baidu.com/)

• Tencent Weiyun (https://www.weiyun.com/)

Configure your DLP policies to detect and block uploads to these services automatically.

You can also create a policy that blocks uploads to a group of domains, so that end users will NOT be able to upload sensitive data from their devices. This can be configured in Purview Endpoint DLP.
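When you build the domain list, decide whether subdomains should be covered too. This Python sketch (illustrative only, not the Endpoint DLP matching engine) shows suffix matching on the domains listed above: it catches cloud.mail.ru but not a lookalike such as notmail.ru, and listing only pan.baidu.com leaves www.baidu.com untouched.

```python
from urllib.parse import urlsplit

# Domains from the list above. This only illustrates how a domain group with
# subdomain matching behaves; it is not how Purview evaluates the group.
BLOCKED_DOMAINS = {"360.yandex.com", "mail.ru", "pan.baidu.com", "weiyun.com"}


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lower-cased
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


if __name__ == "__main__":
    for url in ["https://cloud.mail.ru/home", "https://notmail.ru/", "https://www.baidu.com/"]:
        print(url, "blocked" if is_blocked(url) else "allowed")
```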

3. Monitor Email Communications

Email remains a primary vector for data theft. Block or monitor communications with popular email services from these regions, including Yandex.Mail, Mail.ru, QQ Mail, and 163.com. Your DLP policies can flag or prevent sensitive data from being sent to these domains.

4. Track Your Data’s Journey

Use Purview Information Protection’s Track and Trace feature to maintain visibility over your sensitive documents. This powerful tool shows you:

• Who’s accessing your protected files

• Where they’re being opened

• When access attempts occur

It’s like having a GPS tracker for your most valuable data.

5. Regular Health Checks with SharePoint Advanced Management

Don’t set and forget. Use SharePoint Advanced Management to regularly review:

• Which files are being shared externally

• Who has access to sensitive documents

• Unusual sharing patterns that might indicate compromise.

Think of it as your monthly data health check-up.

Read up on how SharePoint Advanced Management works here: https://learn.microsoft.com/en-us/sharepoint/advanced-management


Additional tips:

Tip 1: Start with monitoring and alerting rather than outright blocking. This lets you understand your data flows before implementing restrictions. You can always tighten controls once you’ve mapped legitimate business needs.

Tip 2: Consider creating exceptions for specific, verified business partners rather than blanket country blocks. This gives you granular control whilst maintaining necessary business relationships.
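Both tips can be expressed as a single rule with an audit/enforce switch and a narrow, address-level exception list. A minimal Python sketch of that decision logic, assuming hypothetical partner addresses; in practice you configure this in the DLP policy itself rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundPolicy:
    mode: str  # "audit" while you map data flows, "enforce" once tightened
    blocked_domains: set[str] = field(default_factory=set)
    # Verified partner addresses that are exempt (hypothetical examples).
    partner_exceptions: set[str] = field(default_factory=set)


def evaluate(policy: OutboundPolicy, recipient: str, has_sensitive: bool) -> str:
    """Return "allow", "alert" or "block" for one outbound message."""
    address = recipient.strip().lower()
    domain = address.rsplit("@", 1)[-1]
    if not has_sensitive or domain not in policy.blocked_domains:
        return "allow"
    if address in policy.partner_exceptions:
        return "allow"  # granular exception instead of a blanket country block
    return "alert" if policy.mode == "audit" else "block"


if __name__ == "__main__":
    policy = OutboundPolicy("audit", {"mail.ru", "qq.com", "163.com"}, {"orders@mail.ru"})
    print(evaluate(policy, "someone@qq.com", has_sensitive=True))
```

Switching `mode` to `"enforce"` is the “tighten controls” step from Tip 1; the exception list stays as narrow as Tip 2 suggests.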

Remember, technology is only as strong as the people using it. Train your team to recognise suspicious requests and understand why these protections matter.

Recommended solutions when Encryption breaks your workflows

If you’ve read my previous blog post on what breaks when you turn on encryption with sensitivity labels (read it here: When Encryption breaks reality), you already know the problems.

Now let’s look at how to remediate them with practical solutions that work in the real world.

1. Establish clear data storage policies

The recommended solution: Codify in your Information Security Standards that confidential or sensitive data must NOT be stored in third-party systems.

Why this works: By keeping sensitive data within Microsoft 365’s ecosystem, you maintain full control over encryption, access permissions, and audit trails. Microsoft 365 provides native integration between all its services—from SharePoint and OneDrive to Teams and Outlook—ensuring encrypted documents work seamlessly across your organisation’s approved platforms.

Implementation tip: Create a simple classification guide that shows users exactly which data types belong where. Make it clear that “Confidential” and above stays in Microsoft 365, while “General” business information can live elsewhere.
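“Confidential and above” is an ordering rule, so it helps to give labels an explicit rank. A Python sketch, assuming a hypothetical four-label taxonomy and location names, that you could use to check a storage inventory against the guide:

```python
# Hypothetical label taxonomy, ordered from least to most sensitive.
LABEL_RANK = {"Public": 0, "General": 1, "Confidential": 2, "Highly Confidential": 3}
M365_LOCATIONS = {"SharePoint", "OneDrive", "Teams", "Exchange"}
THRESHOLD = LABEL_RANK["Confidential"]  # "Confidential and above stays in Microsoft 365"


def allowed_location(label: str, location: str) -> bool:
    """Check a storage location against the classification guide."""
    if LABEL_RANK[label] >= THRESHOLD:
        return location in M365_LOCATIONS
    return True  # lower labels can live on other approved platforms


if __name__ == "__main__":
    print(allowed_location("Confidential", "Dropbox"))
```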

2. Educate users on platform selection

The recommended solution: Train your end-users on what platforms to use, when to use them, and how to properly share confidential data.

Why this works: Most encryption-related issues stem from users not understanding the boundaries of their tools. When people know that encrypted documents won’t work in Dropbox, they’ll choose SharePoint instead.

Implementation tip: Create simple decision trees: “Need to share confidential data externally? Use secure email with expiry dates. Need to collaborate on sensitive documents? Use SharePoint with guest access controls.”
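Once agreed, the decision tree is small enough to encode directly, which is handy for a Teams bot or an intranet helper. An illustrative Python sketch; the internal-only branch is an assumption, since the tree above only covers external sharing and collaboration.

```python
def recommend_channel(confidential: bool, external: bool, collaborative: bool) -> str:
    """Encode the sharing decision tree as a function (illustrative only)."""
    if not confidential:
        return "any approved platform"
    if collaborative:
        # Guest access controls only come into play when outsiders are involved.
        return "SharePoint with guest access controls" if external else "SharePoint"
    if external:
        return "secure email with an expiry date"
    # Assumption: internal, one-off sharing of confidential files stays in Microsoft 365.
    return "OneDrive or SharePoint link to internal users"


if __name__ == "__main__":
    print(recommend_channel(confidential=True, external=True, collaborative=False))
```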

3. Configure service accounts for automation

The recommended solution: For AI and RPA systems (especially in-house built ones), add the named user accounts that run these systems to your encryption policies as approved users.

Why this works: Many automation systems use dedicated service accounts to access files. By explicitly granting these accounts decryption rights, your automated workflows continue functioning while maintaining security controls.

Implementation tip: Create a dedicated security group for automation accounts and include this group in your sensitivity label encryption settings. This makes it easier to manage permissions as you add more automated systems.

4. Implement data minimisation for BI tools

The recommended solution: Third-party BI tools should not access confidential data directly. Instead, use data minimisation, anonymisation, and masking techniques.

What this means: Data minimisation involves only sharing the minimum data necessary for analysis. Data anonymisation removes personally identifiable information, while data masking replaces sensitive values with realistic but fake data that maintains statistical properties.

Why it’s important: This approach protects sensitive information while still enabling business intelligence. Your sales dashboard can show trends and patterns without exposing individual customer details or confidential pricing information.

Implementation tip: Create sanitised data exports specifically for BI tools, removing or masking sensitive fields before the data leaves your secure environment.
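A minimal Python sketch of such an export step, assuming hypothetical field names. Note two simplifications: the keyed hash is pseudonymisation (anyone holding the key can re-link records), not full anonymisation, and the masking is simple partial masking rather than the realistic fake-data substitution described above.

```python
import hashlib
import hmac

# Hypothetical field names for a sales extract feeding a BI dashboard.
DROP = {"customer_email", "phone"}   # minimisation: the dashboard doesn't need them
PSEUDONYMISE = {"customer_id"}       # keyed hash keeps joins and counts possible
MASK = {"card_number"}               # partial mask: keep only the last four characters

# Placeholder only: load a real key from your secrets store, never hard-code it.
SECRET_KEY = b"replace-me"


def _mask(value: str) -> str:
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def sanitise(record: dict) -> dict:
    """Return a copy of the record with sensitive fields dropped, hashed or masked."""
    out = {}
    for name, value in record.items():
        if name in DROP:
            continue
        if name in PSEUDONYMISE:
            digest = hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256)
            out[name] = digest.hexdigest()[:16]
        elif name in MASK:
            out[name] = _mask(str(value))
        else:
            out[name] = value
    return out


if __name__ == "__main__":
    print(sanitise({"customer_id": 42, "customer_email": "a@example.com",
                    "card_number": "4111111111111111", "amount": 99.5}))
```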

5. Standardise PDF readers organisation-wide

The recommended solution: Ensure all devices run the same supported version of your PDF reader, using Intune, Group Policy, or IT deployment checklists.

Why this works: Consistency eliminates the “it works on my machine” problem. When everyone uses Adobe Reader/Acrobat version 22 or later, encrypted PDFs open reliably across your organisation.

Implementation tip: Include PDF reader version checks in your device compliance policies. Set up automatic updates where possible, and create a simple verification script for IT teams to run during device setup.
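A verification script mostly comes down to comparing version strings numerically rather than as text (as text, “9.0” sorts after “22.0”). A minimal Python sketch, assuming you already have an inventory of installed Acrobat/Reader versions (for example from an Intune or asset report) and using 22.003.20258, the build Microsoft cites for labelling support:

```python
# Minimum build cited by Microsoft for sensitivity label support in Acrobat/Reader.
MIN_VERSION = "22.003.20258"


def parse_version(version: str) -> tuple[int, ...]:
    """'22.003.20258' -> (22, 3, 20258); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Hypothetical inventory export: device name -> Acrobat/Reader version string.
    inventory = {"LAPTOP-01": "23.001.20093", "LAPTOP-02": "21.001.20145"}
    for device, version in inventory.items():
        print(device, version, "OK" if meets_minimum(version) else "UPGRADE")
```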

6. Map your external ecosystem

The recommended solution: Understand what software your vendors, suppliers, and customers are using before sharing encrypted documents.

Why this works: Knowing that your vendor, partner, supplier or law firm uses LibreOffice, or that your client prefers Google Docs, helps you choose the right sharing method upfront, avoiding embarrassing “I can’t open your file” conversations.

Best practice examples:

  • Maintain a simple spreadsheet of key partners and their preferred platforms
  • Ask about software compatibility during vendor onboarding
  • Include system requirements in your standard contract templates
  • Create partner-specific sharing guidelines for your teams

7. Identify critical platform dependencies

The recommended solution: In connection to item 6, note which critical partners use non-Microsoft platforms that could be impacted by encrypted data sharing, then ensure users know the right channels for sensitive data exchange.

Why this works: This builds on your data storage policies (solution 1) and user education (solution 2) by creating specific guidance for high-stakes relationships.

Implementation tip: For critical partners who can’t handle encrypted files, establish secure alternatives like password-protected SharePoint links with expiry dates, or use secure email gateways that work across platforms.
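If the partner spreadsheet from solution 6 has a platform column and a criticality flag, finding the relationships that need alternative channels is a simple filter. A Python sketch with hypothetical partner names and column headings; the incompatible platforms are the ones listed in the encryption post in this series.

```python
import csv
import io

# Platforms described in this series as unable to open Purview-encrypted files.
INCOMPATIBLE = {"LibreOffice", "OpenOffice", "Google Workspace", "Apple iWork"}

# Hypothetical extract of the partner spreadsheet.
SAMPLE = """partner,platform,critical
Acme Legal,LibreOffice,yes
Contoso Supply,Microsoft 365,yes
Fabrikam Media,Google Workspace,no
"""


def at_risk(rows):
    """Critical partners whose platform can't open encrypted files."""
    return [
        r["partner"]
        for r in rows
        if r["critical"].strip().lower() == "yes" and r["platform"].strip() in INCOMPATIBLE
    ]


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(at_risk(rows))
```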


Two additional best practices you shouldn’t miss

8. Create encryption exception processes

The recommended solution: Establish a formal process for temporarily removing encryption when legitimate business needs arise.

Why you need this: Sometimes encrypted documents genuinely need to be shared with systems that can’t handle them. Rather than having users work around security controls, create an approved exception process with proper approvals, time limits, and audit trails.
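The process itself lives in your governance tooling, but it helps to agree on what an exception record must contain. A minimal Python sketch of such a record, assuming a hypothetical in-memory audit log and a default seven-day expiry; it does not remove any label itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# In-memory stand-in for a real audit trail (hypothetical).
AUDIT_LOG: list = []


@dataclass(frozen=True)
class EncryptionException:
    document: str
    requested_by: str
    approved_by: str
    reason: str
    expires: datetime

    def is_active(self, now: Optional[datetime] = None) -> bool:
        """An exception is only valid until its expiry time."""
        return (now or datetime.now(timezone.utc)) < self.expires


def grant(document: str, requested_by: str, approved_by: str, reason: str,
          days: int = 7, now: Optional[datetime] = None) -> EncryptionException:
    """Create a time-limited exception and record it in the audit log."""
    if approved_by == requested_by:
        raise ValueError("the approver must be someone other than the requester")
    now = now or datetime.now(timezone.utc)
    exc = EncryptionException(document, requested_by, approved_by, reason,
                              now + timedelta(days=days))
    AUDIT_LOG.append((now.isoformat(), "granted", f"{document} until {exc.expires.isoformat()}"))
    return exc
```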

9. Implement regular compatibility testing

The recommended solution: Schedule quarterly tests of your encryption policies against your actual business workflows.

Why this matters: Software updates, new vendor relationships, and changing business processes can break previously working encryption setups. Regular testing catches these issues before they impact critical business operations.

Implementation tip: Create a simple test matrix covering your most common document types, sharing scenarios, and external platforms. Run through this checklist each quarter and after major system updates.
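Generating the matrix rather than typing it keeps it complete as you add dimensions. A minimal Python sketch; the document types, scenarios and platforms are placeholders for your own:

```python
import csv
import sys
from itertools import product

# Hypothetical dimensions: replace with your own document types, scenarios and platforms.
DOC_TYPES = ["docx", "xlsx", "pptx", "pdf"]
SCENARIOS = ["internal share", "external email", "guest link"]
PLATFORMS = ["Microsoft 365 apps", "Adobe Acrobat", "browser viewer"]


def build_matrix():
    """One row per combination; 'result' is filled in by the tester each quarter."""
    return [
        {"doc_type": d, "scenario": s, "platform": p, "result": ""}
        for d, s, p in product(DOC_TYPES, SCENARIOS, PLATFORMS)
    ]


if __name__ == "__main__":
    writer = csv.DictWriter(sys.stdout, fieldnames=["doc_type", "scenario", "platform", "result"])
    writer.writeheader()
    writer.writerows(build_matrix())
```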


Remember: The goal isn’t perfect security—it’s effective security that people can actually use.

When Encryption Meets Reality: What Actually Breaks When You Deploy Sensitivity Labels

Microsoft Purview’s sensitivity labels are brilliant for protecting your organisation’s data—until they’re not. While the encryption capabilities of labels like “Highly Confidential” and “Internal Only” provide robust security, they can also create unexpected roadblocks that’ll have your users reaching for the IT helpdesk.

I’ve previously written several blog posts on the subject that you can read.

Adding to what I’ve already mentioned above, let’s explore the seven other common issues when encryption through sensitivity labels meets the real world.

1. Third-Party Cloud Storage Platforms

What breaks: Dropbox, Adobe Creative Cloud, DocuSign, and similar platforms

Why it happens: Purview treats these as external environments and blocks access to encrypted content. Your beautifully protected document becomes a digital paperweight the moment someone tries to edit it outside the Microsoft 365 ecosystem.

2. AI and RPA Systems

What breaks: Third-party artificial intelligence tools and robotic process automation systems

Why it happens: These systems need to read and process your data, but encryption renders the content unreadable to external AI engines.

The impact: Your automated processing stops working, chatbots can’t access knowledge base documents, and data extraction workflows grind to a halt.

3. Business Intelligence Dashboards

What breaks: Third-party analytics platforms that pull data from encrypted Excel files.

Why it happens: BI tools can’t decrypt and read the underlying data in your spreadsheets, leaving your dashboards empty or displaying errors.

The impact: Executive reports fail to update, sales dashboards show no data, and business intelligence grinds to a halt.

4. Legacy Adobe PDF Readers

What breaks: Adobe versions older than Adobe Reader/Acrobat 22

Why it happens: Older Adobe versions lack the necessary components to handle Purview’s encryption standards.

The impact: Users with older software installations can’t open encrypted PDFs, creating accessibility issues across different departments or external partners.

According to Microsoft, the version that supports labelling is 22.003.20258.

Adobe’s official documentation is more up to date and lists version 23.003.20201.1ec7624.

In my personal experience, I’ve seen devices running Acrobat 21, 19 and 15 that could not even open encrypted PDF files.

5. Online PDF Viewers

What breaks: Browser-based PDF viewers (with the exception of Microsoft Edge) and third-party PDF reader apps.

Why it happens: These lightweight PDF viewers (e.g. Nitro PDF and PDFgear) don’t have the decryption capabilities required for Purview-protected documents.

The impact: Document previews fail and web-based workflows break. Users of third-party reader apps either cannot open the files or get an error message when they open an encrypted PDF.

6. Open Source Office Suites

What breaks: LibreOffice, OpenOffice, and similar free alternatives

Why it happens: These applications lack the proprietary decryption libraries needed to handle Microsoft’s encryption.

The impact: Vendors, remote branch offices or sub-member firms that run their own IT systems on these free office suites suddenly can’t access company documents, creating a two-tier system of document access.

I’ve checked the LibreOffice documentation and could not find any mention of support for RMS.

7. Non-Microsoft Productivity Platforms

What breaks: Google Workspace (Docs, Sheets, Slides) and Apple iWork (Pages, Numbers, Keynote)

Why it happens: Competing platforms don’t support Microsoft’s encryption standards—hardly surprising, but often overlooked during planning and deployment.

You can read more about that here:

Does sensitivity label applied docs can be opened in google docs if I add my google account while applying the label?

Google Workspace also has its own competing data classification scheme (Enable or disable a classification label), which is why I think it’s unlikely that Google will make this work cross-platform.

The impact: Cross-platform collaboration becomes impossible, and BYOD policies clash with security requirements.


If you encounter any of these, read part 2 for my recommended actions and workarounds.

When Purview SIT names give me headaches

There are now 326 built-in Purview sensitive information types (SITs) as of 07-June-2025. The new additions are great, but Microsoft needs to do a better job of managing how SITs are documented and communicated to its users. Here’s my quick rant on the matter:

Rant 1: The update list that is stuck in February 2024

    The official Microsoft page (Sensitive information type entity definitions) hasn’t been updated since February 7, 2024. Seriously? 1 year and five months and counting.

    Why it’s frustrating: We’re stuck creating custom SITs because the docs are MIA. (Although I did win a project creating custom SITs – but still!)

    Rant 2: Inconsistencies in SIT naming

    If you compare the SIT names in the official documentation against the SIT names shown in the Purview portal, they are different.

    You can see in the image below that the Purview portal (left) has items such as Hungarian Social Security number versus Hungary Social security number.

    Some of the names have an added n (Australian instead of Australia), and some have the word Identification spelled out instead of ID. There’s a long list of naming inconsistencies.

    Admittedly, the names in their XML data are correct. So I ask the product team: why are the names in the documentation titles different?

    Rant 3: No announcements when new built-in SITs are out

    There are no announcements to the Purview community when new built-in SITs come out. I have to check manually every now and then by exporting a list of SITs and comparing it with previous data. Good thing I keep track of what comes out (read my latest post on the matter: https://www.linkedin.com/pulse/12-new-sits-available-microsoft-purview-victor-wingsing-uzzze/)


    Advice for Purview admins

    Use the immutable IDs when deploying SITs. Run the following command in Security & Compliance PowerShell (after connecting with Connect-IPPSSession) to export all the SIT names and IDs to a CSV file:

    Get-DlpSensitiveInformationType | Select-Object Name, Identity | Export-Csv -Path "SIT_Names_and_IDs.csv" -NoTypeInformation
    
    Change the path to the location where you want the file to be exported.

    Using the immutable Identity (ID) is better when deploying your Purview policies: names can easily change, but these IDs (as the name implies) are immutable. This makes your deployment scripts future-proof.
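Keying on Identity also makes the manual “what’s new?” check from Rant 3 easier to automate. A minimal Python sketch, assuming you keep the CSV from each export run (the file names are up to you): it reports SITs that were added, removed, or renamed while keeping the same ID.

```python
import csv
import sys


def load_export(path: str) -> dict[str, str]:
    """Map Identity -> Name from a CSV produced by the export command above."""
    # utf-8-sig tolerates a byte-order mark if the export wrote one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return {row["Identity"]: row["Name"] for row in csv.DictReader(f)}


def diff_exports(old: dict[str, str], new: dict[str, str]):
    """Compare two exports by immutable ID rather than by display name."""
    added = sorted(new[i] for i in new.keys() - old.keys())
    removed = sorted(old[i] for i in old.keys() - new.keys())
    renamed = sorted((old[i], new[i]) for i in old.keys() & new.keys() if old[i] != new[i])
    return added, removed, renamed


if __name__ == "__main__":
    # Usage: python sit_diff.py previous.csv current.csv
    if len(sys.argv) == 3:
        added, removed, renamed = diff_exports(load_export(sys.argv[1]), load_export(sys.argv[2]))
        print("New SITs:", added)
        print("Removed SITs:", removed)
        print("Renamed (same ID):", renamed)
```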

    When to use Purview Information Barriers and Purview DLP

    At first glance, it’s easy to think that if you have Data Loss Prevention (DLP) capabilities where you have policies monitoring internal data flows, then Information Barriers might be an unnecessary extra. After all, DLP diligently scans every email, document, and chat for sensitive content. This is certainly the sentiment that I often get when talking to Cyber Security teams.

    This made me realise 2 things:

    • Microsoft needs to do a better job of marketing and promoting Purview Information Barriers, and
    • Information Barriers serve a purpose that DLP cannot fulfil.

    What are Purview Information Barriers?

    Microsoft Purview Information Barriers are designed to restrict communication and collaboration between defined groups within an organisation. Their primary function is to ensure that teams with conflicting interests (think of trading and research groups in financial services) cannot interact with each other. By enforcing internal boundaries, these policies help maintain confidentiality and avoid accidental data leakage between sensitive departments (for example, to help prevent insider trading).

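    With Purview Information Barriers, you can create policies that automatically prevent internal teams from communicating with each other in Microsoft Teams. These include blocking the following actions:

    • Searching for a user
    • Adding a member to a team
    • Starting a chat session or a group chat
    • Inviting someone to a meeting
    • Sharing a screen
    • Placing a call
    • Sharing a file with another user

    In SharePoint and OneDrive, Information Barriers can prevent the following unauthorised collaboration:

    • Adding members to a site
    • Accessing a site or its content
    • Sharing a site or content with another user
    • Searching a site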

    Capabilities Information Barriers share with Microsoft Purview DLP

    You’ve probably noticed that some of the activities above, such as “Sharing a file with another user” and “Sharing content with another user”, can already be controlled with Microsoft Purview DLP. In essence, that is correct: an admin can set up a DLP policy to BLOCK sharing these files with another user.

    Where DLP falls short and Information Barriers shine

    While Purview DLP is effective at blocking explicit sending or sharing actions, it misses scenarios where access is already granted, which is where Purview Information Barriers come to the rescue. DLP policies activate when a user actively sends data, but if sensitive information is already available through granted permissions, the DLP policy remains dormant. For example, if User A (Finance) adds User B (Sales) as a member of the Finance team in Teams or the Finance SharePoint site, User B gains immediate access to all its files without any explicit sharing event, leaving DLP unable to intervene.

    Alternatively, User A could simply send a meeting invite and start a Teams call with screen sharing, which bypasses the DLP trigger entirely.

    As another example, consider a situation where User A uploads a confidential document to a shared folder that automatically grants access to a broader group. Here, Information Barriers would prevent unauthorised viewing by restricting access at the source, whereas DLP would not block the document from being placed in that shared location.
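To make the distinction concrete, here is a deliberately simplified Python model contrasting the two enforcement points: DLP reacts only to explicit send/share events, while a barrier check applies to any interaction between blocked segments. The users, segments and event names are hypothetical, and this is not how Purview evaluates policies.

```python
# Toy model, NOT how Purview is implemented: it only contrasts where each control acts.
SEGMENT = {"userA": "Finance", "userB": "Sales", "userC": "Finance"}
BLOCKED_SEGMENT_PAIRS = {frozenset({"Finance", "Sales"})}

# DLP reacts to explicit send/share events that carry content.
DLP_EVENTS = {"share_file", "send_email"}


def barrier_allows(actor: str, target: str) -> bool:
    pair = frozenset({SEGMENT[actor], SEGMENT[target]})
    return pair not in BLOCKED_SEGMENT_PAIRS


def outcome(event: str, actor: str, target: str, barriers_on: bool = True) -> str:
    if barriers_on and not barrier_allows(actor, target):
        return "blocked by Information Barriers"
    if event in DLP_EVENTS:
        return "inspected by DLP"
    return "allowed, no DLP trigger"


if __name__ == "__main__":
    for event in ["add_site_member", "start_call", "share_file"]:
        print(event,
              "| DLP only:", outcome(event, "userA", "userB", barriers_on=False),
              "| with IB:", outcome(event, "userA", "userB"))
```

With barriers off, adding User B to the Finance site or starting a call never reaches DLP; with barriers on, every interaction between the two segments is stopped before content moves.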

    Strategy for using BOTH Information Barriers and DLP

    You should view Purview Information Barriers as a key part of your data governance and protection strategy. Relying solely on DLP leaves gaps that Information Barriers can fill by preventing risky internal interactions before they even happen. Here are a few actionable items you can do today:

    • Start by reviewing your organisation’s internal communication flows to identify potential conflicts of interest and assign segmented rules that restrict who can communicate with whom.
    • Work with your Corporate Communications, Human Resources and Legal teams to identify when and where to apply restrictions between groups of users.
    • Ensure these barriers align with your overall compliance and governance framework, and conduct regular testing to confirm their effectiveness. Then codify them in your data governance policies.
    • Finally, train your teams on why these measures are necessary and how to adhere to them.

    Adopting a dual strategy with both DLP and Information Barriers will give you a much stronger data protection stance, reducing the chance of inadvertent data leaks from within.


    Deep dive into PDF labelling and data protection

    Let’s cut to the chase – PDFs are everywhere in your organisation, and they’re housing your sensitive data. I’m talking about those finalised e-signed contracts, bank statements, and countless other critical documents. While we’re all busy protecting our Office files with fancy security measures, PDFs often slip through the cracks. But here’s the thing – they need the same level of classification and protection as your typical .docx or .xlsx files.

    Here are the different ways you can label PDF files, plus a simple-to-follow deployment strategy for enabling data classification on your PDFs.

    Labeling PDFs: Three Approaches

    1. Label data natively in Microsoft Office then save it as PDF
    2. Label data using Adobe Acrobat
    3. Label data using the Microsoft Purview Information Protection client

    Read all the way to the end to see what happens if you open an encrypted PDF file in Word.

    Approach 1: Label natively using Microsoft Office then save it as a PDF

    This approach is something you can do now. This method involves applying a sensitivity label directly to an Office document in an application like Microsoft Word, and then saving it as a PDF. Although the label transfers to the PDF, note that if your label incorporates encryption, you must disable the PDF/A option when saving. The resulting PDF will display protection via Purview Information Protection, and its custom properties will indicate the applied label.

    I created a new Word document.
    I saved it as a PDF. Document security shows no security because the label I used is a plain label without any encryption.
    The custom properties show the label that I used.

    TAKE NOTE that if your label has ENCRYPTION turned on, then you need to unselect the PDF/A option as you save it.

    The security tab displays that it’s protected by Purview Information Protection.
    The custom properties show the Privileged / Protected / Encrypted label that was used.

    Approach 2: Label data using Adobe Acrobat PDF Reader

    Here’s where it gets interesting (and a bit challenging). Most of us view these PDFs through web browsers or PDF readers, with Adobe being the undisputed king of the PDF world. In fact, Adobe’s so dominant that in most organisations I’ve worked with, it’s practically become the default way to handle PDFs – much like how we all say “Google it” instead of “search for it”.

    Unlike your Microsoft Office suite (Word, Excel, PowerPoint, Outlook), Adobe Acrobat doesn’t play nicely with sensitivity labels out of the box. The “solution”? Mucking about in the Windows registry. Yes, you read that right – registry editing. Adobe’s own support documentation lists the exact steps. Source (Adobe MPIP Support: https://helpx.adobe.com/enterprise/kb/mpip-support-acrobat.html)

    Sure, tweaking the registry is not difficult to do. But imagine rolling this out across thousands of machines in your enterprise. Any experienced IT admin who’s attempted large-scale registry changes will tell you that it’s not fun.

    There is a way to do this via Intune to simplify things. You can read about it on Simon Skotheimsvik’s blog: https://skotheimsvik.no/how-to-use-intune-to-enable-sensitivity-labels-on-pdf-files

    Image from: Adobe

    This option is great if you need to add the same headers, footers or watermarks that you use in your Word, Excel and PowerPoint files to your PDFs.

    Approach 3: Label data using the Microsoft Purview Information Protection client

    This client must be installed on your Windows devices before it will work. You can get it here: https://www.microsoft.com/en-gb/download/details.aspx?id=53018

    Once installed, you have a tool that can label PDF files and do much more, with some limitations that you’ll see below. You can launch the client by right-clicking a file and selecting Apply sensitivity label with Microsoft Purview.

    One big benefit of using this client is that you can select multiple files, or even an entire folder, and label them all in one go. You can use this to MANUALLY label all the files sitting on a PC or even on a shared network drive.

    The limitations

    You can’t label a PDF while it is open, because there is no labelling interface inside Adobe Acrobat, and this tool cannot apply headers, footers or watermarks. This is by design, as the client applies labels from outside the Office apps. Read about it here: https://learn.microsoft.com/en-us/purview/sensitivity-labels-office-apps#when-office-apps-apply-content-marking-and-encryption

    Opening Encrypted PDF in Word?

    A client once asked me: What happens when a user tries to open an encrypted PDF in Word?

    Most of us by now know that you can open and edit a PDF in Word (if you don’t know how, please check this: https://support.microsoft.com/en-us/office/opening-pdfs-in-word-1d1d2acc-afa0-46ef-891d-b76bcd83d9c8).

    The short answer is that your data is still protected. Here’s what happens when I tried to open an encrypted PDF file in Word.

    Here’s the original PDF file that I have encrypted.

    When I used Word to open the PDF, a pop-up prompt asked me to select how I wanted the file to be opened.

    From the Preview window, I can already see that the data is encrypted by Microsoft IRM Services. This gives me confidence that the data is protected.

    Then, upon opening the file, all I could see was scrambled (encrypted) data. The text and images in the original file were no longer readable.

    Deployment strategy

    Now that you know how labels work for PDFs, let’s talk about deployment.

    Begin with Approach 1 because it leverages familiar tools like Microsoft Word and allows you to secure sensitive PDFs right from the document creation stage. This straightforward step minimises the learning curve and reduces the likelihood of errors, enabling your team to adopt essential security measures immediately.

    Once the basics are in place, invest in user education to ensure proper application and management of sensitivity labels. Training reinforces security compliance and builds a strong foundation, empowering your staff to understand and uphold data protection practices across the organisation.

    After establishing confidence in Approach 1, transition to the Microsoft Purview Information Protection client (Approach 3) to enable scalable, mass labelling across devices and shared drives. This phased progression not only improves operational efficiency and consistency but also sets the stage for introducing more advanced options like registry adjustments (Approach 2) when additional formatting or watermark requirements arise.


    At the centre of it all

    Among all Microsoft Purview security solutions, there’s one that you absolutely must get right. If you don’t, your entire data security strategy could fall apart, no matter what other security tools you’re using.

    This key solution brings together three basic but crucial tasks: finding your sensitive data, labelling it correctly, and keeping it safe. This solution is Microsoft Purview Information Protection (MIP), and it’s at the heart of how you protect your company’s data.

    Why is MIP so critical?

    Think of Microsoft Purview’s data classification service as the system that tells all the other security tools what to do. Here’s how it works with different Purview tools:

    Purview Data Loss Prevention (DLP):

    • Works like a security guard that reads the labels
    • If it sees a file marked ‘Secret’, it knows exactly what protection rules to follow
    • For example: “This is confidential data, so don’t let it be shared outside the company”

    Endpoint DLP (Devices) and Microsoft Defender for Cloud Apps:

    • These tools check the labels whether you’re working on your laptop or in cloud apps like Workday, Salesforce, etc.
    • They constantly ask “What’s this file’s label?” before allowing any action
    • Then they make sure the right safety measures are in place

    Microsoft Purview Insider Risk Management:

    • This one’s particularly clever about using the labels
    • It watches for unusual behaviour with sensitive data
    • For example: If someone suddenly downloads 100 files marked ‘Highly Confidential’, it raises an alert
    • It can then start extra monitoring or take other protective steps
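Insider Risk Management uses its own indicators and thresholds, which you configure in the portal. To illustrate the idea behind the “100 Highly Confidential downloads” example, here is a toy sliding-window rule in Python; the threshold, window and label name are assumptions, and this is not the IRM detection logic.

```python
from collections import deque
from datetime import datetime, timedelta


class LabelledDownloadMonitor:
    """Toy threshold rule: alert when a user downloads N files with a given
    label inside a sliding time window. Assumes events arrive in time order."""

    def __init__(self, threshold=100, window=timedelta(hours=1), label="Highly Confidential"):
        self.threshold = threshold
        self.window = window
        self.label = label
        self._events: dict[str, deque] = {}

    def record(self, user: str, label: str, when: datetime) -> bool:
        """Record one download; return True while the user is at/over the threshold."""
        if label != self.label:
            return False
        q = self._events.setdefault(user, deque())
        q.append(when)
        # Drop downloads that have fallen out of the window.
        while when - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```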

    Microsoft Purview Data Governance (Data Map)

    • This service uses MIP to help you map and catalog your structured data.
    • It gives you the ability to apply consistent classification across your data estate. You can have a standardised label across your organisation.
    • For example: “A ‘Confidential’ label means the same thing everywhere, making it easier to manage and protect”

    Third-party services using MIP

    Even third-party services leverage MIP data classification.

    Trellix integrates its DLP network appliance with MIP: https://docs.trellix.com/bundle/data-loss-prevention-11.11.x-product-guide/page/UUID-5d61c924-38ac-3cb9-fb84-17596363740f.html

    CrowdStrike leverages Microsoft Purview Information Protection labels (page 5 of 7): https://www.crowdstrike.com/wp-content/uploads/2023/12/A-Modern-Approach-to-Confidently-Stopping-Unauthorized-Data-Exfiltration_WhitePaper.pdf

    Zscaler and Egnyte can import MIP labels as part of their DLP: https://help.zscaler.com/downloads/zscaler-technology-partners/data/zscaler-and-egnyte-deployment-guide/Zscaler-Egnyte-Deployment-Guide-FINAL.pdf

    Microsoft Purview Information Protection is the foundation that your entire data security and governance strategy builds upon. Without a properly planned and implemented MIP deployment, even the most sophisticated Purview solutions won’t deliver their full value. Think of it as building a house – you need to get the foundation right first.

    As your organisation grows and your data landscape becomes more complex, your MIP strategy needs to evolve too. Regular reviews of your classification labels, updating sensitivity rules, and fine-tuning your protection policies aren’t just good practice – they’re essential for keeping your data secure and compliant.

    Making the case for Optical Character Recognition (OCR) in your Data Protection strategy

    I recently applied for a U.S. visa, and as part of the process, I had to submit my passport, bank records, and a lot of personally identifiable information to the embassy in the form of PDF and JPEG files. This meant that much of my sensitive data is now stored as images. This made me wonder: How are organisations safeguarding data that is image-based rather than text-based?

    Traditional Data Loss Prevention (DLP) strategies, while effective at monitoring text-based data, often fall short when it comes to image-based content. This shortcoming can lead to significant vulnerabilities, as sensitive information is frequently embedded within images (see my example above). Optical Character Recognition (OCR) addresses this gap by enabling organisations to extract and analyse text from images. For Cyber Security teams aiming to improve their data security posture, integrating OCR into their DLP strategy is not just beneficial, it is a must!

    What are the industry use cases for OCR in DLP?

    • Financial services: Sensitive information such as account numbers, credit card details, and personally identifiable information (PII) is often embedded in scanned documents, receipts, and screenshots
    • Healthcare: Medical records and scans, prescriptions, and doctor’s notes (assuming your doctor writes legibly)
    • Retail and e-commerce: Scanned receipts and invoices, plus most product returns and refunds, which start on paper and then get scanned and stored
    • Manufacturing: Contracts, blueprints, R&D documents, and even internal presentations (most of which get converted to an image or a PDF)
    • Government and public sector: Scanned copies of passports, driver’s licences, and other PII, plus incident reports (which again start on paper and end up as images)

    These are just a few examples of where OCR in DLP can help ensure that data does not leak out.

    OCR in Microsoft Purview

    Microsoft Purview has an OCR capability that lets you scan images for sensitive information so that you can identify and protect it. Remember, though, that this is an OPTIONAL feature that must be enabled at the tenant level, and it comes with a cost (more on this later).

    To turn on OCR in Microsoft Purview, do the following:

    1. Go to Settings and select Optical character recognition (OCR).
    2. Choose where you want OCR to scan.

    The full technical instructions can be found here: https://learn.microsoft.com/en-us/purview/ocr-learn-about?tabs=purview#workflow-at-a-glance

    The Cost of OCR

    This capability is powerful because it uses Azure AI to perform OCR. At the time of writing, it costs $1.00 USD per 1,000 scanned items. The key phrase in the pricing is ‘per scanned item’, because Microsoft counts each page of a PDF, and each individual image in a set of images, as one scan. So a PDF that contains 10 pages counts as 10 scans. https://learn.microsoft.com/en-us/purview/ocr-learn-about?tabs=purview#estimate-your-ocr-scanning-charges
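    To see how the per-page rule adds up, here is a minimal Python sketch of the arithmetic. It assumes the $1.00 per 1,000 scans price quoted above and simple pro-rata billing with no rounding, free allowance, or other charges; these are assumptions, so check Microsoft's estimate page for current figures. The file counts are made-up inputs.

```python
# Price quoted in this post at the time of writing (assumption: check current pricing).
PRICE_PER_1000_SCANS_USD = 1.00


def scan_count(pages_per_file: list[int]) -> int:
    """Each PDF page, or each individual image, counts as one scan."""
    return sum(pages_per_file)


def estimate_cost_usd(pages_per_file: list[int]) -> float:
    """Estimated OCR charge in USD, assuming simple pro-rata billing."""
    return scan_count(pages_per_file) / 1000 * PRICE_PER_1000_SCANS_USD


# Made-up example: 2,000 single images plus 500 scanned PDFs of 10 pages each.
files = [1] * 2000 + [10] * 500
print(scan_count(files))                    # → 7000
print(f"${estimate_cost_usd(files):.2f}")   # → $7.00
```

    Because a multi-page PDF is billed once per page, a few hundred long scanned documents can outweigh thousands of single images, which is why it pays to be deliberate about where OCR runs.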

    Data Strategy for Using OCR for the First Time

    To limit your costs and be more deliberate about where you run OCR scans, here’s a helpful strategy to get started.

    Data Search Using Content Search in Purview: Use Microsoft Purview’s Content Search feature to filter by file type, such as JPEG and PNG, and identify images that may contain sensitive information. This targeted approach makes sure the files you scan for embedded text are the image files, where that text actually lives.

    Focus on Known Locations: Identify departments or teams that handle sensitive data, such as Finance, Sales, and Marketing, and focus OCR searches on their respective SharePoint sites. This strategy maximizes the efficiency of OCR by concentrating on areas where sensitive information is most likely to reside.

    File Name Analysis: Run keyword searches on file names for terms that indicate sensitive content, such as “passport” or “ccn” (credit card number). This proactive approach helps identify and flag files that may contain sensitive information.
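    The file-type and file-name approaches above can be combined into a single keyword query. Below is a minimal Python sketch that builds a KQL string to paste into Content Search. `build_ocr_triage_query` is a hypothetical helper, not a Purview API. `FileExtension` and `FileName` are searchable properties in Microsoft's keyword query documentation for Content Search, but how file names are tokenised affects what matches, so test the query in your tenant. The "known locations" step is handled by choosing the Finance, Sales, and Marketing SharePoint sites as the search locations rather than in the query string.

```python
def build_ocr_triage_query(extensions: list[str], name_keywords: list[str]) -> str:
    """Hypothetical helper: combine file-type and file-name filters into one KQL query.

    The result is a plain string to paste into a Content Search keyword query.
    """
    # Match any of the image extensions, e.g. FileExtension:png
    ext_clause = " OR ".join(f"FileExtension:{ext}" for ext in extensions)
    if not name_keywords:
        # No name filter: return every file of these types.
        return f"({ext_clause})"
    # Narrow to files whose names match a sensitive-sounding keyword.
    name_clause = " OR ".join(f"FileName:{kw}" for kw in name_keywords)
    return f"({ext_clause}) AND ({name_clause})"


query = build_ocr_triage_query(["jpg", "jpeg", "png"], ["passport", "ccn"])
print(query)
# → (FileExtension:jpg OR FileExtension:jpeg OR FileExtension:png) AND (FileName:passport OR FileName:ccn)
```

    Requiring both conditions keeps the first OCR run small; passing an empty keyword list widens it to every image of those types.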

    AI Implementation Failures: What We Learned from 2024

    My news feed is filled with “Year in Review” posts about 2024, and what stood out to me is that 2024 was a bit of a mess for AI implementations.

    We saw everything from chatbots giving illegal advice to fake content flooding our news and social media feeds (I’m pretty sure I’m not the only one who’s seen the Pope wearing a cool puffy jacket).

    So how did we get here?

    The rush to implement AI solutions was largely driven by market pressure and FOMO (Fear of Missing Out). Companies, desperate to stay competitive, rushed to deploy AI solutions without proper governance frameworks or security controls. Boardrooms worldwide echoed with demands for an “AI strategy,” often without understanding what that actually meant for their business.

    This perfect storm was further fueled by the accessibility of AI tools and platforms. What used to require deep technical expertise became available through simple APIs and low-code interfaces. While this democratisation of AI is generally positive, it led to a “wild west” scenario where implementations often outpaced proper security and compliance considerations.

    The result? Poor deployments, terrible user experiences, half-baked AI solutions, security vulnerabilities, and trust issues.


    Before You Start: The Boring (But Essential) Bits

    Look, I get it – you want to jump straight into the exciting world of AI. But here’s the thing: you need to sort out your data house first. Think of it like baby-proofing your home. Your CISO and security team need to know exactly what data you’ve got, where it lives, and who’s allowed to play with it.

    Get your Microsoft Purview DLP policies sorted, tag your sensitive stuff using Purview Information Protection, and make sure you’ve got the right security controls in place. Trust me, this boring bit will save you from some proper headaches later.


    The Fix: Four Simple Actionable Steps

    1. Sort Out Your Governance
      • Get an AI committee going
      • Write clear policies on AI usage, data protection, etc.
      • Set proper standards
      • Actually check if things work (please audit!)
    2. Lock Down Security
    3. Quality Control
      • Keep humans in the loop
      • Test, test, test
      • Watch those outputs (again please run audit checks)
      • Clean data = better results
    4. Smart Implementation
      • Start small, scale later (even on a controlled Copilot for Microsoft 365, pilot it first with a handful of trusted people)
      • Train your people properly (end-user education is a must)
      • Listen to user feedback
      • Don’t rush it

    2024 showed us that rushing in without proper planning is a recipe for disaster. Take your time, do it right, and maybe we won’t see your company in next year’s “AI Fails” list.

    Other Sources:

    Excluding a specific user (or group) from Sensitivity labels

    I’m excited to share a practical guide I’ve created that walks you through the process of excluding specific users or groups from Microsoft Purview Sensitivity Labels. This guide comes from a real-world scenario where an organization was piloting a new approach to simplify its labeling structure. They wanted to test how reducing the number of labels applied to users would affect workflows and information protection. To support this, I’ve put together detailed instructions on how to effectively manage exclusions in Purview, along with a back-out process to ensure a smooth rollback if needed.

    This PDF guide is packed with step-by-step instructions, screenshots, and expert tips to help you navigate the nuances of label exclusions. Whether you’re in the middle of a label simplification pilot or simply looking to better control label application, this guide will help streamline your process. Get ready to dive in and experience a more flexible, user-centered approach to managing Sensitivity Labels in Microsoft Purview!