Transforms: decomposeDiacriticalMarks

In our current solution, we have built a PowerShell script to clear out diacritical marks, and we are now moving to SailPoint IdentityNow. We noticed that the decomposeDiacriticalMarks transform appears to perform the same function.

I'm wondering whether we can get the substitution list this transform uses, so that if its substitutions differ from ours, we can prepare our users.

Welcome to the developer community Todd.

We don’t have a complete list of substitution mappings, but I can tell you which library and methods the transform uses.

The decomposeDiacriticalMarks transform uses Java's Normalizer class (java.text.Normalizer) to decompose the diacritical marks. Specifically, it uses Normalization Form KD (NFKD, compatibility decomposition), as described in Sections 3.6, 3.10, and 3.11 of The Unicode Standard and also summarized under Annex 4: Decomposition.
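Here is a minimal sketch of the decomposition step, assuming the transform calls java.text.Normalizer with NFKD as described (the class and method names are illustrative, not SailPoint's). The "K" (compatibility) part matters for your comparison. Besides splitting an accented letter into a base letter plus a combining mark, NFKD also rewrites compatibility characters such as ligatures and superscript digits. NFD leaves those alone.

```java
import java.text.Normalizer;

// Illustrative demo comparing NFD with NFKD (the form the transform is described as using).
class NfkdDemo {
    // Formats each code point as U+XXXX so separated combining marks are visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    static String nfd(String s)  { return Normalizer.normalize(s, Normalizer.Form.NFD); }
    static String nfkd(String s) { return Normalizer.normalize(s, Normalizer.Form.NFKD); }

    public static void main(String[] args) {
        // e-acute (U+00E9) splits into e (U+0065) + COMBINING ACUTE ACCENT (U+0301)
        System.out.println(codePoints(nfkd("\u00E9")));               // U+0065 U+0301
        // The fi ligature (U+FB01) and superscript two (U+00B2) change only under NFKD
        System.out.println(nfd("\uFB01\u00B2").equals("\uFB01\u00B2")); // true
        System.out.println(nfkd("\uFB01\u00B2"));                       // fi2
    }
}
```

If your PowerShell script normalizes with .NET's FormD (NFD) rather than FormKD, compatibility characters like these are where the two are most likely to disagree.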

Once the text is decomposed, the transform uses a regex replace to remove every code point in the Unicode Combining Diacritical Marks block (U+0300 to U+036F), which Java matches with \p{InCombiningDiacriticalMarks} (e.g., replaceAll("[\\p{InCombiningDiacriticalMarks}]", "")).
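The sketch below puts the two steps together (the class name is illustrative) and assumes the transform behaves as described. It also shows something worth checking against your PowerShell output. The regex only removes combining marks, so letters that Unicode does not define as a base letter plus a mark, such as ø, ł, æ, ß and Đ, pass through unchanged rather than being transliterated.

```java
import java.text.Normalizer;

// Illustrative sketch: NFKD, then strip the Combining Diacritical Marks block (U+0300 to U+036F).
class StripMarksDemo {
    static String strip(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    public static void main(String[] args) {
        // e-diaeresis, N-tilde, u-acute and n-tilde decompose, so their marks are removed
        System.out.println(strip("Zo\u00EB \u00D1\u00FA\u00F1ez")); // Zoe Nunez
        // o-stroke, l-stroke, ae, sharp s and D-stroke have no decomposition: unchanged
        System.out.println(strip("\u00F8 \u0142 \u00E6 \u00DF \u0110"));
    }
}
```

If your script maps characters such as ø to "o" or ß to "ss", expect values containing those letters to differ after the move.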

That’s probably more technical than you wanted, but it will hopefully give you some idea of what’s going on under the hood.

If you want to run some tests yourself, you can use this Java code to compare the transform's output with your PowerShell script's output.

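```java
import java.text.Normalizer;

public class DecomposeDiacriticalMarks {
    public static String decompose(String input) {
        // Decompose characters into base characters plus separate combining marks (NFKD)
        input = Normalizer.normalize(input, Normalizer.Form.NFKD);

        // Remove the marks (the Combining Diacritical Marks block, U+0300 to U+036F)
        return input.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }
}
```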
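Since no complete substitution list is published, one way to get an approximate one is to generate it yourself. The helper below is hypothetical (not part of IdentityNow). It runs every character in a code point range through the same two steps and keeps only the characters that change. Some outputs are longer than one character, for example the ligature Ĳ becomes "IJ".

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a substitution table by applying NFKD + mark removal per character.
class SubstitutionTable {
    static String transform(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    // Maps each character in [from, to] to its output, keeping only characters that change.
    static Map<String, String> changes(int from, int to) {
        Map<String, String> table = new LinkedHashMap<>();
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isDefined(cp)) continue; // skip unassigned code points
            String in = new String(Character.toChars(cp));
            String out = transform(in);
            if (!out.equals(in)) table.put(in, out);
        }
        return table;
    }

    public static void main(String[] args) {
        // Latin-1 Supplement letters through Latin Extended-B (U+00C0 to U+024F)
        changes(0x00C0, 0x024F).forEach((in, out) ->
            System.out.printf("U+%04X %s -> %s%n", in.codePointAt(0), in, out));
    }
}
```

The table comes from your local JDK's Unicode data, so it is worth spot-checking a few entries against the actual transform in your tenant before sharing it with users.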