
How to Define Custom Sensitive Information Types for Use in DLP Policies


New Interface and New Capabilities Make It Easier to Manage Sensitive Information Types

Sensitive information types are used by Microsoft 365 components like DLP policies and auto-label (retention) policies to locate data in messages and documents. Earlier in January, Microsoft released a set of new sensitive information types to make it easier to detect country-specific sensitive data like identity cards and driving licenses. Message center notification MC233488 (updated January 23) brings news of further changes rolled out in late January and early February.

Confidence Levels for Matching

First, the confidence levels that describe match accuracy are changing from percentages (65%, 75%, 85%) to low, medium, and high. Existing policies pick up the new descriptors automatically. This is Microsoft 365 roadmap item 68915.

The change is intended to help people understand the accuracy of matches when using sensitive information types. DLP detects matches in messages and documents by looking for patterns and other forms of evidence. The primary element (like a regular expression) defined in the pattern for a sensitive information type defines how basic matching is performed for that type. Secondary elements (like keyword lists) add evidence to increase the likelihood that a match exists. The more evidence DLP gathers, the higher the match accuracy, and the confidence level rises from low to medium to high. Policy rules use the confidence level to decide what action is necessary when matches are found. For instance, a rule might block documents and email when high confidence exists that sensitive data is present, while just warning (with a policy tip) for lower levels of confidence.
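To make the relationship between evidence and confidence concrete, here is a minimal Python sketch. It is not how Microsoft's engine works internally: the pattern, the keyword list, and the thresholds are all invented for illustration, and the sketch does no checksum validation of the number. The mapping from confidence to action simply mirrors the example rule described above.

```python
import re
from typing import Optional

# Hypothetical primary element: four groups of four digits.
# A real credit card detector would also validate the number.
PRIMARY = re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b")

# Hypothetical supporting keywords (secondary elements).
SUPPORTING = ("visa", "mastercard", "expires")


def confidence(text: str) -> Optional[str]:
    """Return 'low', 'medium', or 'high', or None if the primary element is absent."""
    if PRIMARY.search(text) is None:
        return None
    lowered = text.lower()
    evidence = sum(1 for word in SUPPORTING if word in lowered)
    if evidence >= 2:
        return "high"
    if evidence == 1:
        return "medium"
    return "low"


def rule_action(level: Optional[str]) -> str:
    """Block on high confidence, show a policy tip for lower levels."""
    if level is None:
        return "allow"
    return "block" if level == "high" else "policy tip"


print(rule_action(confidence("Visa 4111 1111 1111 1111 expires 01/25")))  # block
print(rule_action(confidence("Ref 4111-1111-1111-1111")))                 # policy tip
```

The point of the sketch is that the primary element alone yields only a low-confidence match, and each piece of supporting evidence moves the match up a level.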

Copying Sensitive Information Types

The second change is that you can copy sensitive information types, including the set of sensitive information types distributed by Microsoft. For instance, let’s assume that you don’t think that the standard definition for a credit card number works as well as it could. You can go to the Data Classification section of the Compliance Center, select the credit card number sensitive information type, and copy it to create a new type (Figure 1). The copy becomes a custom sensitive information type under your control to tweak as you see fit.

Figure 1: Copying the sensitive information type for a credit card number

Improved User Interface

The third change is the most important. Figure 1 is an example of a new user interface to manage sensitive information types (Microsoft 365 roadmap item 68916). The new interface is crisper and exposes more information about how information types work. For instance, in Figure 1, we can see that the primary element for credit card number detection is a function to detect a valid credit card number. Further evidence (supporting elements) comes from the presence of keywords like a credit card name (for example, Visa or MasterCard) and an expiration date near a detected number.

Few are likely to have the desire to tweak the standard sensitive information types. However, being able to examine how Microsoft puts these objects together is instructive and helpful when the time comes to create custom sensitive information types to meet business requirements.

Detecting Azure AD Passwords

For instance, MVP James Cussen points out that Azure AD passwords are not included in the list of sensitive information types. While some people need to send passwords around in email and Teams messages, it’s not the most secure way of transmitting credentials. In this post, he uses the old interface to define a sensitive information type to detect passwords. To test the new interface, I used his definition as the basis for a new custom sensitive information type.

The primary element to match passwords is a regular expression:

Many suggested expressions to detect passwords can be found on the internet. Most are rejected when input for use with a sensitive information type because they fail Microsoft’s rules to detect illegal or inefficient expressions. Not being a Regex expert, I tried several. All worked when tested outside the Compliance Center, and all failed there except the one created by James.

A keyword list is a useful secondary element to add evidence that a password is found. The list contains a comma-separated set of common words that you might expect to find close to a password. For instance:

“Here’s your new password: 515AbcSven!”

“Use this pwd to get into the site ExpertsAreUs33@”

In multilingual tenants, the ideal situation is to include relevant words in different languages in the keyword list. For instance, if the tenant has Dutch and Swedish users, you could include wachtwoord (Dutch) and lösenord (Swedish). To accommodate the reality that people don’t always spell words correctly, consider adding some misspelt variations of keywords. In this instance, we could add keywords like passwrod or pword.

James’s definition allows keywords to be in a window of 300 characters anchored on the detected password (see this Microsoft article to understand how the window works). I think this window is too large and might result in many false positives. The keyword is likely to be close to the password, so I reduced the window to 80 characters.
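The following Python sketch shows one way to think about the proximity window. It rests on several assumptions: the regular expression is a hypothetical stand-in (not James's expression), the window is measured as a fixed number of characters on either side of the candidate string, and keywords are matched as simple case-insensitive substrings. Microsoft's actual window calculation and keyword matching may differ. The keyword list uses the examples mentioned above.

```python
import re

# Hypothetical stand-in for the primary element: a run of 8 or more
# non-space characters containing a lowercase letter, an uppercase
# letter, and a digit.
CANDIDATE = re.compile(r"(?=\S*[a-z])(?=\S*[A-Z])(?=\S*\d)\S{8,}")

# Keywords from the article, including other languages and misspellings.
KEYWORDS = ("password", "pwd", "wachtwoord", "lösenord", "passwrod", "pword")


def find_passwords(text: str, proximity: int = 80) -> list:
    """Return candidate strings that have a keyword within `proximity` characters."""
    hits = []
    for match in CANDIDATE.finditer(text):
        start = max(0, match.start() - proximity)
        end = min(len(text), match.end() + proximity)
        # Text on both sides of the candidate, excluding the candidate itself.
        window = (text[start:match.start()] + " " + text[match.end():end]).lower()
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(match.group())
    return hits


print(find_passwords("Here’s your new password: 515AbcSven!"))
print(find_passwords("Use this pwd to get into the site ExpertsAreUs33@"))
```

Both sample sentences from the article produce a hit because a keyword sits well inside the 80-character window. A string that looks like a password but has no keyword nearby is ignored.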

Figure 2 shows the result after inputting the regular expression, keyword list, confidence level (medium), and character proximity. It’s a less complex definition than those used for Microsoft’s sensitive information types. The big question is whether it works.

Figure 2: Definition for the Azure Active Directory password custom sensitive information type


The Test option allows you to upload a file containing sample text to run against the definition to see if it works. As you can see in Figure 3, the test was successful.

Figure 3: Testing a custom sensitive information type

Using the Custom Sensitive Information Type in a Teams DLP policy

Testing gives some confidence that the custom sensitive information type will work properly when deployed in a DLP policy. After quickly creating a DLP policy for Teams, we can confirm its effectiveness (Figure 4) in chats and channel conversations.

Figure 4: Passwords blocked in a Teams chat

I deliberately chose Teams as the DLP target because organizations probably don’t want their users swapping passwords in chats and channel conversations. Before rushing to extend the DLP policy to cover email, consider the fact that it’s common to send passwords in email. For instance, I changed the policy to cover email and Teams and discovered that the policy blocks any invitation to Zoom meetings because these invitations include the word “pwd” as in:

Although it might be an attractive idea to block Zoom to force people to use Teams online meetings instead, it’s not a realistic option. The simple solution is not to use this DLP policy for email.

False Positives and Policy Overrides

The downside of matching text in messages against keywords defined in a policy is that some false positives can happen. For instance, I have a Flow to import tweets about Office 365 into a team channel. As Figure 5 shows, some tweets are picked up as potential password violations because a keyword appears near a string which could be a valid password.

Figure 5: Tweets posted in Teams are blocked because they match the password definition

Adjusting the definition for the sensitive information type to reduce the character proximity count (from 80 to 60) reduced the number of false positives. Testing and observation will tell how effective changes like this are when exposed to real-life data.
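One way to choose a proximity value is to measure how far keywords sit from candidate strings in sample data. The Python sketch below does this under stated assumptions: the candidate pattern and keyword pattern are hypothetical stand-ins, the samples are invented, and the gap is counted as the number of characters between a keyword and a candidate. It is a rough aid for experimentation, not a reproduction of how DLP measures proximity.

```python
import re

# Hypothetical stand-in patterns, for illustration only.
CANDIDATE = re.compile(r"(?=\S*[a-z])(?=\S*[A-Z])(?=\S*\d)\S{8,}")
KEYWORD = re.compile(r"password|pwd", re.IGNORECASE)


def nearest_keyword_gap(text):
    """Smallest character gap between any keyword and any candidate, or None."""
    gaps = []
    for cand in CANDIDATE.finditer(text):
        for kw in KEYWORD.finditer(text):
            if kw.end() <= cand.start():      # keyword before candidate
                gaps.append(cand.start() - kw.end())
            elif kw.start() >= cand.end():    # keyword after candidate
                gaps.append(kw.start() - cand.end())
    return min(gaps) if gaps else None


def flagged(samples, proximity):
    """Samples whose nearest keyword lies within the proximity value."""
    result = []
    for sample in samples:
        gap = nearest_keyword_gap(sample)
        if gap is not None and gap <= proximity:
            result.append(sample)
    return result


# Invented samples: a genuine password message and a tweet-like false positive
# where the keyword is 71 characters away from a password-like string.
genuine = "Your pwd is Secret123!"
tweet = "pwd " + "word " * 14 + "Office365News"

print(len(flagged([genuine, tweet], 80)))  # 2
print(len(flagged([genuine, tweet], 60)))  # 1
```

In this invented data, dropping the value from 80 to 60 removes the false positive while the genuine message is still caught. Real data will not separate so neatly, which is why observation after each change matters.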

Apart from adjusting character proximity, two other potential solutions exist. First, amend the DLP policy to allow users to override the block and send the message with a justification reported to the administrator. If the message is important, users will override the policy. The administrator is notified when overrides occur and can tweak the policy (if possible) to avoid further occurrences.

The second option is to exclude accounts (individually or the members of a distribution list) which have a business need to send passwords from the DLP policy. DLP will then ignore messages sent by these individuals.
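The way these two options interact with a match can be sketched as a small decision function. This Python example is a simplified model rather than the DLP engine: the class names, field names, and result strings are hypothetical, and I assume exclusions are checked before anything else.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Message:
    sender: str
    matched: bool                        # did the sensitive information type match?
    justification: Optional[str] = None  # supplied when the user overrides


@dataclass
class Policy:
    excluded_senders: Set[str] = field(default_factory=set)
    allow_override: bool = True
    override_log: List[str] = field(default_factory=list)

    def evaluate(self, msg: Message) -> str:
        # Excluded accounts are ignored by the policy.
        if msg.sender in self.excluded_senders:
            return "allow"
        if not msg.matched:
            return "allow"
        # A justified override lets the message through and is recorded
        # so the administrator can review it later.
        if self.allow_override and msg.justification:
            self.override_log.append(f"{msg.sender}: {msg.justification}")
            return "allow-with-override"
        return "block"


policy = Policy(excluded_senders={"helpdesk@example.com"})
print(policy.evaluate(Message("helpdesk@example.com", matched=True)))        # allow
print(policy.evaluate(Message("user@example.com", matched=True)))            # block
print(policy.evaluate(Message("user@example.com", True, "Customer setup")))  # allow-with-override
```

The override log stands in for the reports the administrator receives, which are the raw material for deciding whether the policy needs further tuning.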

Creating Custom Sensitive Information Types a Nice to Have

Given the broad range of standard types created by Microsoft, the need to define custom sensitive information types isn’t likely to be a priority for most tenants. However, for those who need this feature for business reasons, the recent changes are both welcome and useful.

Source Practical365

Chioma Ugochukwu
