Take a Load Off and SIT (an oversimplified explanation of using SIT)

In my Purview Ninja Training (you can take the training too), one of the Purview capabilities I struggled to understand at first was using Sensitive Information Types (SITs) for automatic classification. That is not because it's difficult to understand. It's because there are so many options to choose from, and several of them can apply to similar use cases.

To save you time, here is an oversimplified matrix of when to use each automatic classification option in Microsoft Purview Information Protection.

When to use each capability.

  • Built-in SIT: Ready-to-use, predefined data types like social security numbers, credit card numbers, and other common sensitive data formats. Ideal for general compliance and basic data protection needs.
  • Custom SIT: Customizable to meet unique organizational requirements. Suitable for both structured and unstructured data.
  • EDM (Exact Data Match SITs): Best for exact matches of structured data with consistent formats, such as financial records and personal IDs.
  • Document Fingerprinting: Detects and protects standardized documents with repeatable structures, like legal forms and templates.
  • Named Entities SIT: Used for detecting sensitive or important data that is identified by context, like names or organizations, particularly within unstructured formats.
  • Trainable Classifiers: Useful for complex or ever-changing data types, especially in unstructured data, where static rules or patterns are inadequate.
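To make the pattern-based options more concrete, here is a minimal Python sketch of how a SIT works in principle. It uses a regular expression as the primary element, a checksum (Luhn) as validation, and keywords found within a character window as supporting evidence that raises confidence. This is not Purview's implementation or its built-in credit card definition. The regex, keyword list, 300-character window, and the "high"/"medium" confidence names are illustrative assumptions.

```python
import re

# Illustrative pattern only: 16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")
# Assumed supporting keywords; a real SIT would use its own keyword list.
KEYWORDS = ("credit card", "card number", "visa", "mastercard")
# Assumed proximity window (characters on each side of a match).
PROXIMITY = 300


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Starting from the rightmost digit, double every second digit;
    # if doubling gives a two-digit number, subtract 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> list[tuple[str, str]]:
    """Return (matched_text, confidence) for each regex hit that passes the checksum."""
    results = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        if not luhn_valid(m.group()):
            continue  # pattern matched but validation failed: not reported
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        has_keyword = any(k in window for k in KEYWORDS)
        # Keyword evidence nearby raises the confidence of the match.
        results.append((m.group(), "high" if has_keyword else "medium"))
    return results


print(classify("Card number: 4111 1111 1111 1111."))
```

Built-in and custom SITs are composed from this kind of pattern-plus-evidence logic. EDM instead compares content against hashed values taken from your own records, which is why it suits exact matches on structured data.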

Dude, Where’s my DATA

Data is the new currency in today’s digital age. Just as you wouldn’t leave your house title lying around for anyone to take, understanding where your data resides is crucial for its protection. Knowing the exact location of your data allows you to implement proper security measures, ensuring it’s not vulnerable to unauthorized access or breaches.

Understanding your data’s location also plays a vital role in regulatory compliance. For instance, the CIS Controls (https://www.cisecurity.org/controls), in version 7, include Control 13: Data Protection and Control 14: Controlled Access Based on the Need to Know, both of which emphasize securing data and limiting access strictly to those who need it. By mapping out where your data lives, you can better align your practices with these controls, reducing risk and meeting compliance requirements.

In this blog, I will guide you through the various methods to discover where your data resides, the specific tools to use for different types of data, and when and how to effectively utilize each tool.


The 2 Methods of Discovering Data

    Manual methods involve physically documenting all the locations where your data is stored. This approach requires you to actively track and record each data repository, whether it’s on-premises, in the cloud, or across various applications and devices. While this method can be thorough and provide a deep understanding of your data landscape, it is also time-consuming and prone to human error. Think of it as manually creating an inventory of every item in your home: it’s detailed, but exhausting, and it’s easy to miss something.

    Automatic methods leverage technology to scan, map, and classify your data across different environments. These methods use specialized tools to automatically discover data locations, classify sensitive information, and provide insights into data usage and movement.
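As a rough illustration of what "automatic" means here, the Python sketch below walks a folder tree, scans plain-text files for a pattern, and records how many hits each file contains. In effect, it builds the inventory that the manual method produces by hand. The `EMP-123456` employee ID format, the set of file types read, and the `build_inventory` function are hypothetical. Real discovery tools add file-format parsing, richer classifiers, and central reporting on top of this basic idea.

```python
import re
from pathlib import Path

# Hypothetical sensitive pattern: an internal employee ID such as "EMP-123456".
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")
# Assumption: only plain-text formats are read in this sketch.
TEXT_SUFFIXES = {".txt", ".csv", ".md"}


def build_inventory(root: str) -> dict[str, int]:
    """Walk `root` recursively and return {file path: match count} for files with hits."""
    inventory = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue  # skip folders and unsupported file types
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = len(EMPLOYEE_ID.findall(text))
        if hits:
            inventory[str(path)] = hits  # record where sensitive data lives
    return inventory
```

Running `build_inventory("/path/to/share")` returns a dictionary of file paths and hit counts, which is the raw material for the data map described in the rest of this post.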


    Types of Data in an Organization

    Organizations typically handle two primary types of business data: documents and organizational business data.

    Documents include files like reports, presentations, spreadsheets, and PDFs, which often contain sensitive information and require careful management and protection.

    On the other hand, organizational business data encompasses the data generated from business operations, workflows, and applications, such as transaction records, customer information, and operational metrics. Think of the data held in applications such as Dynamics 365, Workday, and SAP. This is the data used for day-to-day operations.

    Now that we know about the two types of data in an organization, let’s look at the Microsoft solutions available to DISCOVER DATA (most of which are already included in your Microsoft 365 Business Premium, E3, or E5 licenses).

    Quick Note:

    Some solutions not on this list have some form of search/discovery capability (e.g., Purview Data Lifecycle Management, Audit log search). I’ve omitted them because their primary purpose is data governance, and their data discovery capability relies on the tools listed below.


    Document Discovery Tools

    Microsoft Purview Information Protection (for documents stored in Exchange email, SharePoint, OneDrive, and Teams): Helps you classify and label data based on its sensitivity. Start by defining your data classification schema, apply sensitivity labels (built-in or custom) to your documents, and configure policies to automatically classify and protect sensitive information as it is created or modified.

    Microsoft Purview Information Protection scanner (for on-premises data): Designed to scan and classify on-premises data. To use it, deploy the scanner in your on-premises environment, configure scan jobs that target specific data repositories, and review the scan results to understand where sensitive information resides and how it is being used.

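    Microsoft Purview compliance portal (Content search): Content search lets you search for and manage content such as email, documents, and Teams chats across Exchange mailboxes, SharePoint sites, and OneDrive accounts in your organization. To use it, create a search, choose the locations to search, and enter a keyword query.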

    Microsoft 365 eDiscovery: This helps you manage and analyze large volumes of data for legal and compliance purposes. To use it, access the eDiscovery portal, create a case, add data sources, and run searches and analytics to gather relevant information for your legal or compliance needs.

    Microsoft Defender for Cloud Apps: A comprehensive solution for monitoring and controlling data movement across cloud applications. It also offers data classification and protection through integration with Microsoft Purview Information Protection, ensuring consistent data security across your organization.

    Microsoft Priva (using Privacy Assessments): This is specifically for personal data discovery. It automates the discovery, documentation, and evaluation of personal data use across your entire data estate. Using this regulation-independent solution, you can automate privacy assessments and build a complete compliance record for the responsible use of personal data.

    Organizational Business Data Tools

    Purview Data Map: Helps you create a unified map of your data estate by automatically scanning and classifying your data sources. To use it, register your data sources in Purview and configure scans and scan rules for them. As scans run, the Data Map stays up to date, providing a current view of your data landscape, including classifications and sensitivity labels, which helps in managing data compliance and governance.

    Purview Data Catalog: Provides a searchable catalog of data assets, making it easy to discover and understand data across your organization. To use it, start by connecting your data sources to Purview, which will automatically scan and index your data. Users can then search for data assets, view metadata, and understand data lineage, facilitating better data governance and management.

    Information Rights Management vs. Encryption via Sensitivity Labels: Why You Can’t Use Both on One Document

    An interesting use case came from a client who was looking at enabling encryption through sensitivity labels and doing away with the existing Information Rights Management (IRM) that protected their files in SharePoint.

    One of the security analysts asked: why not use BOTH at the same time? If both of them offer protection, surely DOUBLE protection would be better, right? Well…

    Before we dive into the reason, let’s first understand what Information Rights Management and encryption through sensitivity labels are.


    What is IRM? Information Rights Management (IRM) is a tool that helps protect and control who can access, edit, print, or forward your documents and emails. Think of it as a digital lock that only lets certain people in and tells them what they can and can’t do with the information.

    How to Use IRM in Microsoft 365:

    1. Open the document you want to protect in the Office desktop app.
    2. Click on the “File” tab.
    3. Select “Info” and then “Protect Document” (“Protect Workbook” in Excel, “Protect Presentation” in PowerPoint).
    4. Choose “Restrict Access” and set the permissions for who can access the document and what they can do with it.

    What are Sensitivity Labels? Sensitivity labels are part of Microsoft Purview Information Protection. They allow organizations to classify and protect documents and emails based on their sensitivity. A label can apply encryption and content marking (headers, footers, and watermarks), as well as define access policies. Importantly, the organization decides which labels apply encryption.

    In simple terms, encryption is applied as soon as a label that is configured for encryption is applied to the document.


    Here’s why you can’t use both of them at the same time.

    The primary reason you cannot use both IRM and Sensitivity Labels encryption simultaneously on a document is due to overlapping functionalities and potential conflicts between the two systems:

    • Redundant Encryption: Both systems apply encryption, which can lead to conflicts or redundancy in the encryption process. Encrypting a document twice can complicate access management and decryption processes.
    • Policy Conflicts: IRM and Sensitivity Labels both define access and usage policies. Applying both might result in conflicting policies, making it difficult to enforce a clear and consistent set of rules.

    Which encryption wins if both are used at the same time?

    Based on my testing, Information Rights Management (IRM) wins. When a document is protected by both IRM and a sensitivity label, the document retains the IRM encryption and loses the sensitivity label.

    This outcome makes sense because IRM encryption is embedded directly into the document. On the other hand, Sensitivity Label encryption is more flexible and can be easily changed by applying or reapplying different labels. Therefore, the more rigid and integrated IRM encryption overrides the more adaptable Sensitivity Label encryption.