When inheriting (a label) is an issue

I encountered a MIP labelling use case that I have not encountered before. The use case question is:

The answer: a whole lot of complication as you can see below.

Starting with the basics of access control: Microsoft Purview Information Protection lets IT admins choose between two permission models, IT-defined or user-defined. The user-defined model lets end users define the encryption for their own documents. This is configured in the label itself, under controls.

This gives users the ability to mix and match how they want their data to be accessed. Users can select who gets read, view, or edit rights, and a single permission set can cover several different individuals.

In the label publishing policy, you can then configure whether emails should inherit the label of an attachment when the attachment’s label is higher. This ensures that the higher label (with its higher security) takes precedence.
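
To make the intended inheritance rule concrete, here is a minimal Python sketch. The label names and priority numbers are hypothetical, and the function only models the rule as described (the higher label wins); it is not how Outlook or Purview implement it.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical label taxonomy: a higher number means a more sensitive label.
LABEL_PRIORITY = {
    "Public": 0,
    "General": 1,
    "Confidential": 2,
    "Highly Confidential": 3,
}


def inherited_email_label(
    email_label: Optional[str],
    attachment_labels: list[Optional[str]],
) -> Optional[str]:
    """Return the label the email should carry under the inheritance rule.

    The email keeps its own label unless an attachment carries a higher one.
    None means "no label".
    """
    candidates = [
        label for label in [email_label, *attachment_labels] if label is not None
    ]
    if not candidates:
        return None
    # max() returns the first maximal item, so on a tie the email label wins.
    return max(candidates, key=lambda label: LABEL_PRIORITY[label])


print(inherited_email_label(None, ["Highly Confidential"]))
```

With no email label and a Highly Confidential attachment, this returns Highly Confidential, which is the outcome the policy setting is meant to produce.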

I ran a test with the following parameters.

  • Open Outlook > The email label is set to NO LABEL
  • Attached the Word document I created earlier with the label called Highly Confidential (this is the same file with the permission set from the 2nd screenshot above)
  • Sent it to 2 other accounts that were not in the permission list above. This is to simulate how the recipients would see the message
    • Sent to 1 internal account (Barry Allen)
    • Sent to 1 external account that is NOT in the permission list

Outcome:

  1. Outlook was NOT able to inherit the higher label.

On a positive note, this means that the encryption/permission still works on the document. The screenshot above is from an external email account of mine that was not in the permission list of the attached file. So IT security can at least have the peace of mind that, as long as the data is properly labelled, data leakage is kept to a minimum.

In another test, I set the label’s encryption model to IT admin-defined (all permissions are pre-defined). This time, the Outlook email was able to properly inherit the label.

The Value of Testing and Advice for IT Admins

Testing is essential when setting up MIP labels and encryption. Real-world testing helps uncover issues or behaviours that might not be obvious from the documentation. By testing it yourself, you can be confident that the setup works as expected in your environment and meets your organisation’s needs.

Advice for IT Admins:
If you plan to use user-defined encryption, make sure your users are properly trained. This model can be confusing, and users might think they’ve set permissions correctly when they haven’t. To avoid mistakes, provide clear instructions and training. Testing these scenarios yourself will also help you spot potential problems and give better support to your users.

Reference: https://learn.microsoft.com/en-us/purview/create-sensitivity-labels#publish-sensitivity-labels-by-creating-a-label-policy

New Built-in Role in Entra: AI Admin

Microsoft has recognised the need for a specialised admin role to manage AI and Microsoft Copilot across the organisation. This AI Admin role has been rolling out to all Microsoft 365 and Microsoft Entra clients since November 2024.

An account with the AI Admin role can do the following tasks:

  • Manage all aspects of Microsoft 365 Copilot
  • Manage AI-related enterprise services, extensibility, and copilot agents from the Integrated apps page in the Microsoft 365 admin center
  • Approve and publish line-of-business copilot agents
  • Allow users to install an app or install an app for users in the organization if the app does not require permission
  • Read and configure Azure and Microsoft 365 service health dashboards
  • View usage reports, adoption insights, and organizational insight
  • Create and manage support tickets in Azure and the Microsoft 365 admin center

Reference: https://learn.microsoft.com/en-us/entra/identity/role-based-access-control/permissions-reference#ai-administrator

In the center of it all.

Among all Microsoft Purview security solutions, there’s one that you absolutely must get right. If you don’t, your entire data security strategy could fall apart, no matter what other security tools you’re using.

This key solution brings together three basic but crucial tasks: finding your sensitive data, labelling it correctly, and keeping it safe. This solution is Microsoft Purview Information Protection (MIP), and it’s at the heart of how you protect your company’s data.

Why is MIP so critical?

Think of Microsoft Purview’s Data Classification service as the system that helps all other security tools know what to do. Here’s how it works with different Purview tools:

Purview Data Loss Prevention (DLP):

  • Works like a security guard that reads the labels
  • If it sees a file marked ‘Secret’, it knows exactly what protection rules to follow
  • For example: “This is confidential data, so don’t let it be shared outside the company”

Endpoint DLP (Devices) and Microsoft Defender for Cloud Apps:

  • These tools check the labels whether you’re working on your laptop or in cloud apps like Workday, Salesforce, etc.
  • They constantly ask “What’s this file’s label?” before allowing any action
  • Then they make sure the right safety measures are in place

Microsoft Purview Insider Risk Management:

  • This one’s particularly clever about using the labels
  • It watches for unusual behaviour with sensitive data
  • For example: If someone suddenly downloads 100 files marked ‘Highly Confidential’, it raises an alert
  • It can then start extra monitoring or take other protective steps
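
The Insider Risk Management example above boils down to counting label-aware events against a threshold. Here is a rough Python sketch of that idea. The event fields (user, action, label) and the threshold of 100 are assumptions for illustration, not Purview’s actual data model or detection logic.

```python
from collections import Counter


def users_over_threshold(events, label="Highly Confidential", threshold=100):
    """Return users who downloaded at least `threshold` files with `label`.

    `events` is an iterable of dicts with hypothetical keys:
    "user", "action" and "label".
    """
    counts = Counter(
        event["user"]
        for event in events
        if event["action"] == "download" and event["label"] == label
    )
    return sorted(user for user, count in counts.items() if count >= threshold)


# Made-up activity log: alice downloads 100 sensitive files, bob only 5.
sample_events = (
    [{"user": "alice", "action": "download", "label": "Highly Confidential"}] * 100
    + [{"user": "bob", "action": "download", "label": "Highly Confidential"}] * 5
    + [{"user": "bob", "action": "download", "label": "General"}] * 200
)

print(users_over_threshold(sample_events))
```

Only alice is flagged here, because bob’s large download count is for files with a lower label. That is the whole point of making the detection label-aware.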

Microsoft Purview Data Governance (Data Map)

  • This service uses MIP to help you map and catalog your structured data.
  • It gives you the ability to apply consistent classification across your data estate. You can have a standardised label across your organisation.
  • For example: “A ‘Confidential’ label means the same thing everywhere, making it easier to manage and protect”

Third party services using MIP

Even third-party services leverage the MIP data classification service.

Trellix integrates its DLP network appliance with MIP: https://docs.trellix.com/bundle/data-loss-prevention-11.11.x-product-guide/page/UUID-5d61c924-38ac-3cb9-fb84-17596363740f.html

CrowdStrike leverages Microsoft Purview Information Protection labels (page 5 of 7): https://www.crowdstrike.com/wp-content/uploads/2023/12/A-Modern-Approach-to-Confidently-Stopping-Unauthorized-Data-Exfiltration_WhitePaper.pdf

Zscaler and Egnyte can import MIP labels as part of their DLP: https://help.zscaler.com/downloads/zscaler-technology-partners/data/zscaler-and-egnyte-deployment-guide/Zscaler-Egnyte-Deployment-Guide-FINAL.pdf

Microsoft Purview Information Protection is the foundation that your entire data security and governance strategy builds upon. Without a properly planned and implemented MIP deployment, even the most sophisticated Purview solutions won’t deliver their full value. Think of it as building a house – you need to get the foundation right first.

As your organisation grows and your data landscape becomes more complex, your MIP strategy needs to evolve too. Regular reviews of your classification labels, updating sensitivity rules, and fine-tuning your protection policies aren’t just good practice – they’re essential for keeping your data secure and compliant.

Making the case for Optical Character Recognition (OCR) in your Data Protection strategy

I recently applied for a U.S. visa, and as part of the process, I had to submit my passport, bank records, and a lot of personally identifiable information to the embassy in the form of PDF and JPEG files. This meant that much of my sensitive data is now stored as images. This made me wonder: How are organisations safeguarding data that is image-based rather than text-based?

Traditional Data Loss Prevention (DLP) strategies, while effective in monitoring text-based data, often fall short when it comes to image-based content. This shortcoming can lead to significant vulnerabilities, as sensitive information is frequently embedded within images (see my example above). Optical Character Recognition (OCR) addresses this gap by enabling organisations to extract and analyze text from images. For Cyber Security teams aiming to enhance their data security posture, integrating OCR into their DLP strategy is not just beneficial, it is a must!

What are the industry use cases for OCR in DLP?

  • Financial services: Sensitive information such as account numbers, credit card details, and personally identifiable information (PII) is often embedded in scanned documents, receipts, and screenshots
  • Healthcare industry: Data often takes the form of medical records and scans, prescriptions, and doctor’s notes (assuming that your doctor can write legibly)
  • Retail and Ecommerce: Scanned receipts and invoices, plus most product returns and refunds, which start on paper and then get scanned and stored
  • Manufacturing: Contracts, blueprints, R&D documents and even internal presentations (most of which get converted to either an image or a PDF)
  • Government and Public Sector: Scanned copies of passports, driver’s licenses and PII data, and incident reports (which again start on paper and end up as images)

These are just examples of where OCR in DLP can come in to ensure that data is not leaked out.

OCR in Microsoft Purview

Microsoft Purview has an OCR capability that allows you to identify and protect data in images. It scans images for sensitive information, but do remember that this is an OPTIONAL feature and must be enabled at the tenant level. There’s also a bit of a cost to it (more on this later).

To turn on OCR in Microsoft Purview, you’d need to do the following.

  1. Go to Settings > Select Optical character recognition (OCR).
  2. Choose where you want OCR to scan.

The full technical instructions can be found here: https://learn.microsoft.com/en-us/purview/ocr-learn-about?tabs=purview#workflow-at-a-glance

The Cost of OCR

This capability is powerful because it uses Azure AI for OCR. As of today, the cost is $1.00 USD per 1,000 scanned items. The key phrase in the pricing is ‘per scanned item’: Microsoft counts each page in a PDF, or each individual image in a set of images, as 1 scan. So a PDF that contains 10 pages counts as 10 scans. https://learn.microsoft.com/en-us/purview/ocr-learn-about?tabs=purview#estimate-your-ocr-scanning-charges
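
If you want a quick back-of-the-envelope number before switching OCR on, the pricing rule above is easy to sketch in Python. The rate is the one quoted in this post and may change, and the function name and inputs are my own for illustration, so treat this as a rough estimator rather than a billing calculator.

```python
# Rate quoted in this post; check Microsoft's pricing page for the current value.
PRICE_PER_1000_SCANS_USD = 1.00


def estimate_ocr_cost(pdf_page_counts, image_count, price_per_1000=PRICE_PER_1000_SCANS_USD):
    """Estimate OCR scans and cost in USD.

    Every PDF page counts as one scan, and every standalone image counts as one scan.
    `pdf_page_counts` is a list with the page count of each PDF.
    """
    scans = sum(pdf_page_counts) + image_count
    cost = scans / 1000 * price_per_1000
    return scans, cost


# Example: two PDFs (10 and 25 pages) plus 500 standalone images.
scans, cost = estimate_ocr_cost([10, 25], image_count=500)
print(f"{scans} scans, about ${cost:.2f}")
```

The useful habit here is to count pages, not files. A small number of long PDFs can cost more than thousands of single images.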

Data Strategy in using OCR for the first time.

To limit your cost and be more deliberate in running OCR scans, here’s a helpful strategy that you can use to get started.

Data Search Using Content Search in Purview: Utilize Microsoft Purview’s Content Search feature to filter by file type, such as JPEG and PNG, to identify potential images containing sensitive information. This targeted approach ensures that all image files are scanned for embedded text.

Focus on Known Locations: Identify departments or teams that handle sensitive data, such as Finance, Sales, and Marketing, and focus OCR searches on their respective SharePoint sites. This strategy maximizes the efficiency of OCR by concentrating on areas where sensitive information is most likely to reside.

File Name Analysis: Implement keyword searches for terms that indicate sensitive content, such as “passport” or “ccn” (credit card number), in file names. This proactive approach helps in identifying and flagging files that may contain sensitive information.
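
To illustrate the file type and file name filters, here is a small Python sketch that works on a plain list of file names (for example, an exported search result). It is not Content Search query syntax. The keyword list and extensions come from the examples above, and everything else is an assumption for illustration.

```python
import re
from pathlib import PurePath

# Image types and keywords taken from the examples in this post.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
KEYWORDS = ("passport", "ccn")


def flag_candidates(file_names, keywords=KEYWORDS):
    """Return image files whose name contains one of the keywords."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    flagged = []
    for name in file_names:
        path = PurePath(name)
        # Only look at image files, then check the name without its extension.
        if path.suffix.lower() in IMAGE_EXTENSIONS and pattern.search(path.stem):
            flagged.append(name)
    return flagged


sample = ["Passport_scan.JPG", "holiday.png", "ccn-list.docx", "my_ccn.png"]
print(flag_candidates(sample))
```

A short list like this is a sensible first batch for OCR, since it keeps the number of scanned items (and the bill) small while you validate the results.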

AI Implementation Failures: What We Learned from 2024

My news feed is filled with “A Year in Review” pieces on what happened in 2024, and the thing that stood out to me was that 2024 was a bit of a mess for AI implementations.

We saw everything from chat-bots giving illegal advice to fake content flooding our news and social media feeds (I’m pretty sure that I’m not the only one who’s seen the Pope wear a cool puffy jacket).

So how did we get here:

The rush to implement AI solutions was largely driven by market pressure and FOMO (Fear of Missing Out). Companies, desperate to stay competitive, rushed to deploy AI solutions without proper governance frameworks or security controls. Board rooms worldwide echoed with demands for “AI strategy,” often without understanding what that actually meant for their business.

This perfect storm was further fueled by the accessibility of AI tools and platforms. What used to require deep technical expertise became available through simple APIs and low-code interfaces. While this democratisation of AI is generally positive, it led to a “wild west” scenario where implementations often outpaced proper security and compliance considerations.

The result? Poor deployments, terrible user experiences, and many half-baked AI solutions, along with security vulnerabilities and trust issues.


Before You Start: The Boring (But Essential) Bits

Look, I get it – you want to jump straight into the exciting world of AI. But here’s the thing: you need to sort out your data house first. Think of it like baby-proofing your home. Your CISO and security team need to know exactly what data you’ve got, where it lives, and who’s allowed to play with it.

Get your Microsoft Purview DLP policies sorted, tag your sensitive stuff using Purview Information Protection, and make sure you’ve got the right security controls in place. Trust me, this boring bit will save you from some proper headaches later.


The Fix: Four Simple Actionable Steps

  1. Sort Out Your Governance
    • Get an AI committee going
    • Write clear policies on AI usage, Data Protection, etc
    • Set proper standards
    • Actually check if things work (please audit!)
  2. Lock Down Security
  3. Quality Control
    • Keep humans in the loop
    • Test, test, test
    • Watch those outputs (again please run audit checks)
    • Clean data = better results
  4. Smart Implementation
    • Start small, scale later (even on a controlled Copilot for Microsoft 365, pilot it first with a handful of trusted people)
    • Train your people properly (end-user education is a must)
    • Listen to user feedback
    • Don’t rush it

2024 showed us that rushing in without proper planning is a recipe for disaster. Take your time, do it right, and maybe we won’t see your company in next year’s “AI Fails” list.


Excluding a specific user (or group) from Sensitivity labels

I’m excited to share a practical guide I’ve created that walks you through the process of excluding specific users or groups from Microsoft Purview Sensitivity Labels. This guide comes from a real-world scenario where an organization is piloting a new approach to simplify its labeling structure. They wanted to test how reducing the number of labels applied to users would affect workflows and information protection. To support this, I’ve put together detailed instructions on how to effectively manage exclusions in Purview, along with a back-out process to ensure a smooth rollback if needed.

This PDF guide is packed with step-by-step instructions, screenshots, and expert tips to help you navigate the nuances of label exclusions. Whether you’re in the middle of a label simplification pilot or simply looking to better control label application, this guide will help streamline your process. Get ready to dive in and experience a more flexible, user-centered approach to managing Sensitivity Labels in Microsoft Purview!

From Novice to Ninja: A New CISO’s Guide to DLP

Congratulations, CISO! 🎉 Great job in landing your new role, where protecting sensitive data isn’t just a job—it’s a daily tightrope walk over a pit of cyber threats, compliance demands, and evolving technology.

Now that you’re at the steering wheel, your inbox is probably overflowing with security concerns, regulatory requirements, and a few “fun” audit emails. Don’t worry, you’re in good company. This guide is here to give you actionable steps to set up your Data Loss Prevention (DLP) strategy, ensuring you don’t just survive in this role—you thrive.

So, what does being a CISO mean? Well, you’re now the go-to person when sensitive data sneaks out, malicious insiders get a bit too curious, or someone clicks that suspicious link promising free money from an unknown relative in Timbuktu. No pressure, right? But here’s the deal: inaction is risk. Delaying or overlooking the core elements of a solid DLP strategy could lead to breaches that cost more than your next cybersecurity budget.

To make your journey smoother, I’ve prepared a handy worksheet that you can use right now to take action on your Data Loss Prevention strategy. These aren’t just checkboxes—these are critical steps to lock down your organization’s data and avoid waking up to a breach nightmare.

You can download the worksheet below.

Here’s what you can expect to see inside:

1. Classifying Data and Why It’s Important

Why it matters: Not all data is created equal. By classifying your data, you can prioritize resources and security measures where they’re needed most. Would you protect the company picnic plan with the same force as your customers’ financial information? (Spoiler: probably not!)

Example:

  • High-risk data: Customer credit card details, proprietary code, or confidential HR files—things you’d never want to see in the wrong hands.
  • Medium-risk data: Internal meeting notes, marketing strategies—sensitive, but not catastrophic if leaked.
  • Low-risk data: Public reports, customer FAQs—this is the stuff you’d share at a conference.

Take Action Today: Review your organization’s data and start tagging it by risk level. Ask yourself, “What would happen if this data got out?” and use that to guide your classification efforts

2. Why and How to Identify Sensitive Data

Why it matters: You can’t protect what you don’t know exists. Sensitive data is often hidden across different platforms—sometimes even in the most unexpected places (like a random email attachment or NTFS file shares). Identifying it is the first step to ensuring it stays secure.

Example:

  • Sensitive Data: Personally Identifiable Information (PII) like social security numbers or health records, intellectual property (IP), and anything that’s subject to regulations like GDPR or HIPAA.
  • Surprise Discovery: Finding a list of client emails attached to a forgotten project buried in a shared folder.

Take Action Today: Use a discovery tool or audit your data manually. Start with cloud storage, email servers, and shared folders. Look for data that could lead to a privacy violation or financial loss if exposed.

3. Developing a Data Handling Policy

Why it matters: A solid data handling policy is the foundation of your DLP strategy. Without clear rules in place, sensitive information can slip through the cracks, exposing your organization to unnecessary risk. Your data handling policy ensures everyone—from top execs to interns—understands the dos and don’ts of handling sensitive information.

Example:

  • Clear Guidelines: For high-risk data like financial information, the policy might mandate encryption during transfer and restricted access to authorized personnel only.
  • Real-Life Scenario: Imagine your marketing team accidentally sharing a file with customer details over an unsecured network. A proper data handling policy would prevent this by enforcing secure file transfer practices.

Take Action Today: Draft a policy that covers how different types of data (high, medium, low risk) should be handled. It should specify everything from encryption requirements to access control and data retention periods. Involve key stakeholders (Legal, IT, HR) to ensure all bases are covered.

Now that you know the key steps to securing your organization’s data, it’s time to plan it out, partner with your internal stakeholders, and take action. DLP isn’t a one-person job—it’s a team effort that involves collaboration across IT, Legal, HR, and beyond. The risks of inaction are far too high, so don’t wait until something goes wrong. Proactively implementing these best practices today will not only protect your data but also strengthen your leadership as a new CISO.

Take a Load Off and SIT (an oversimplified explanation of using SIT)

In my Purview Ninja Training (you can take the training too, click here), one of the Purview capabilities that I struggled to understand at first was using Sensitive Information Types (SITs) for automatic classification. Not because it’s difficult to understand, but because there are so many different options to choose from that can be applied to similar use cases.

So to save time in understanding it, here is an over-simplified matrix of when to use the different automatic classification options using Microsoft Purview Information Protection.

When to use each capability.

  • Built-in SIT: Ready-to-use, predefined data types like social security numbers, credit card numbers, and other common sensitive data formats. Ideal for general compliance and basic data protection needs.
  • Custom SIT: Customizable to meet unique organizational requirements. Suitable for both structured and unstructured data.
  • EDM (Exact Data Match SITs): Best for exact matches of structured data with consistent formats, such as financial records and personal IDs.
  • Document Fingerprinting: Detects and protects standardized documents with repeatable structures, like legal forms and templates.
  • Named Entities SIT: Used for detecting contextually sensitive or important data, like names or organizations, particularly within unstructured formats.
  • Trainable Classifiers: Useful for complex or ever-changing data types, especially in unstructured data, where static rules or patterns are inadequate.
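
To give a feel for what a pattern-based SIT does, here is a simplified Python sketch of credit card detection: a regular expression finds 16-digit candidates, and a Luhn checksum filters out random digit runs. This is my own illustration and assumes 16-digit numbers only. It is not Purview’s implementation, and the real built-in SITs are more involved than this.

```python
from __future__ import annotations

import re

# 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers in `text` that pass the Luhn check."""
    found = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


# 4111 1111 1111 1111 is a well-known test number; the second one fails the checksum.
sample = "Card 4111 1111 1111 1111 on file, ref 1234 5678 9012 3456"
print(find_card_numbers(sample))
```

The pattern alone would flag both numbers. The checksum is what cuts down the false positives, which is the same reason a custom SIT usually benefits from more than a bare regular expression.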

Dude, Where’s my DATA

Data is the new currency in today’s digital age. Just as you wouldn’t leave your house title lying around for anyone to take, understanding where your data resides is crucial for its protection. Knowing the exact location of your data allows you to implement proper security measures, ensuring it’s not vulnerable to unauthorized access or breaches.

Understanding your data’s location also plays a vital role in regulatory compliance. For instance, CIS controls (https://www.cisecurity.org/controls) Control 13: Data Protection and Control 14: Controlled Access Based on the Need to Know, emphasize the need to secure data and limit access strictly to those who need it. By mapping out where your data lives, you can better align your practices with these controls, reducing risks and meeting compliance requirements.

In this blog, I will guide you through the various methods to discover where your data resides, the specific tools to use for different types of data, and when and how to effectively utilize each tool.


The 2 Methods of Data Discovery

    Manual methods involve physically documenting all the locations where your data is stored. This approach requires you to actively track and record each data repository, whether it’s on-premises, in the cloud, or across various applications and devices. While this method can be thorough and provide a deep understanding of your data landscape, it is also time-consuming and prone to human error. Think of it as manually creating an inventory of every item in your home – it’s detailed but can be exhausting and easy to miss something.

    Automatic methods leverage technology to scan, map, and classify your data across different environments. These methods use specialized tools to automatically discover data locations, classify sensitive information, and provide insights into data usage and movement.


    Type of Data in an Organization

    Organizations typically handle two primary types of business data: documents and organizational business data.

    Documents include files like reports, presentations, spreadsheets, and PDFs, which often contain sensitive information and require careful management and protection.

    On the other hand, Organizational business data encompasses the data generated from business operations, workflows, and applications, such as transaction records, customer information, and operational metrics. Think of applications such as Dynamics 365, Workday data, SAP data, etc. This type of data is what is used for day-to-day operations.

    Now that we know about the 2 different types of data in an organisation, let’s have a look at the Microsoft solutions available to DISCOVER DATA (most of which are already included in your Microsoft Business Premium, E3, or E5 licenses).

    Quick Note:

    Some solutions that are not on this list have some form of search/discovery capability (e.g. Purview Data Lifecycle Management, Audit Log Search). I’ve omitted them because their primary purpose is data governance, and their data discovery capability relies on the other items listed below.


    Document Discovery Tools

    Microsoft Purview Information Protection (for documents stored in Email, SharePoint, OneDrive and Teams): It helps classify and label data based on its sensitivity. Start by defining your data classification schema, apply labels to your documents using built-in or custom labels, and configure policies to automatically classify and protect sensitive information as it is created or modified.

    Microsoft Purview Information Protection scanner (for on-prem data): This is designed to scan and classify on-premises data. To use it, deploy the scanner to your on-premises environment, configure scanning jobs to target specific data repositories, and review the scan results to understand where sensitive information resides and how it is being used.

    Microsoft Compliance Center (Content Search Tool): The Content search tool in the Microsoft Compliance Center allows you to search for and manage content across your organization.

    Microsoft 365 eDiscovery: This helps you manage and analyze large volumes of data for legal and compliance purposes. To use it, access the eDiscovery portal, create a case, add data sources, and run searches and analytics to gather relevant information for your legal or compliance needs.

    Defender for Cloud Apps: This is a comprehensive solution for monitoring and controlling data movement across cloud applications. The tool also offers data classification and protection through integration with Microsoft Purview Information Protection, ensuring consistent data security across your organization.

    Priva (using Privacy Assessments): This is specifically for personal data discovery. It automates the discovery, documentation, and evaluation of personal data use across your entire data estate. Using this regulatory-independent solution, you can automate privacy assessments and build a complete compliance record for the responsible use of personal data.

    Organizational Business Data Tools

    Purview Data Map: Helps you create a unified map of your data estate by automatically scanning and classifying your data sources. To use it, configure scanning rules and connect your data sources to Purview. The Data Map will continuously update, providing an up-to-date view of your data landscape, including classification and sensitivity labels, which helps in managing data compliance and governance.

    Purview Data Catalog: Provides a searchable catalog of data assets, making it easy to discover and understand data across your organization. To use it, start by connecting your data sources to Purview, which will automatically scan and index your data. Users can then search for data assets, view metadata, and understand data lineage, facilitating better data governance and management.

    Information Rights Management vs. Encryption via Sensitivity Labels: Why You Can’t Use Both on One Document

    An interesting use case came from a client who was looking at enabling encryption using Sensitivity labels and doing away with their existing Information Rights Management (IRM) setup for protecting files in SharePoint.

    One of the security analysts asked: why not use BOTH at the same time? If both of them offer security protection, surely having DOUBLE protection would be better, right? Well…

    Before we dive deeper into the reason, let’s first understand what Information Rights Management and encryption through Sensitivity labels are.


    What is IRM? Information Rights Management (IRM) is a tool that helps protect and control who can access, edit, print, or forward your documents and emails. Think of it as a digital lock that only lets certain people in and tells them what they can and can’t do with the information.

    How to Use IRM in Microsoft 365:

    1. Go to the document or email you want to protect.
    2. Click on the “File” tab.
    3. Select “Info” and then “Protect Document.”
    4. Choose “Restrict Access” and set the permissions for who can access and what they can do.

    What are Sensitivity Labels? Sensitivity Labels are part of Microsoft Purview Information Protection. They allow organizations to classify and protect documents and emails based on their sensitivity. These labels can apply encryption and content markings such as watermarks, as well as define access policies. A key point: the organisation decides which labels apply encryption.

    In simple terms, the encryption is applied once the appropriate Label is selected.


    Here’s why you can’t use both of them at the same time.

    The primary reason you cannot use both IRM and Sensitivity Labels encryption simultaneously on a document is due to overlapping functionalities and potential conflicts between the two systems:

    • Redundant Encryption: Both systems apply encryption, which can lead to conflicts or redundancy in the encryption process. Encrypting a document twice can complicate access management and decryption processes.
    • Policy Conflicts: IRM and Sensitivity Labels both define access and usage policies. Applying both might result in conflicting policies, making it difficult to enforce a clear and consistent set of rules.

    Which Encryption wins if these 2 were used at the same time?

    Based on my testing, Information Rights Management (IRM) wins. When a document is protected by both IRM and a Sensitivity Label, the document retains the IRM encryption and loses the Sensitivity Label.

    This outcome makes sense because IRM encryption is embedded directly into the document. On the other hand, Sensitivity Label encryption is more flexible and can be easily changed by applying or reapplying different labels. Therefore, the more rigid and integrated IRM encryption overrides the more adaptable Sensitivity Label encryption.