Falsehoods Developers Believe About Names

There is a blog post that I’ve shared in this forum at least twice, and share with every team I work with in the Identity space. I’m sharing the meat of the post here, in case the original ceases to exist on the internet at some point in the future.

It’s a good primer on the assumptions people make about what constitutes a name, or more generally, what they think they know about their data.

I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them. (Most people call me Patrick McKenzie, but I’ll acknowledge as correct any of six different “full” names, and many systems I deal with will accept precisely none of them.) Similarly, I’ve worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them. I have never seen a computer system which handles names properly and doubt one exists, anywhere.

So, as a public service, I’m going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make less of them next time you write a system which touches names.

  1. People have exactly one canonical full name.
  2. People have exactly one full name which they go by.
  3. People have, at this point in time, exactly one canonical full name.
  4. People have, at this point in time, one full name which they go by.
  5. People have exactly N names, for any value of N.
  6. People’s names fit within a certain defined amount of space.
  7. People’s names do not change.
  8. People’s names change, but only at a certain enumerated set of events.
  9. People’s names are written in ASCII.
  10. People’s names are written in any single character set.
  11. People’s names are all mapped in Unicode code points.
  12. People’s names are case sensitive.
  13. People’s names are case insensitive.
  14. People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
  15. People’s names do not contain numbers.
  16. People’s names are not written in ALL CAPS.
  17. People’s names are not written in all lower case letters.
  18. People’s names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
  19. People’s first names and last names are, by necessity, different.
  20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
  21. People’s names are globally unique.
  22. People’s names are almost globally unique.
  23. Alright alright but surely people’s names are diverse enough such that no million people share the same name.
  24. My system will never have to deal with names from China.
  25. Or Japan.
  26. Or Korea.
  27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
  28. That Klingon Empire thing was a joke, right?
  29. Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
  30. There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
  31. I can safely assume that this dictionary of bad words contains no people’s names in it.
  32. People’s names are assigned at birth.
  33. OK, maybe not at birth, but at least pretty close to birth.
  34. Alright, alright, within a year or so of birth.
  35. Five years?
  36. You’re kidding me, right?
  37. Two different systems containing data about the same person will use the same name for that person.
  38. Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
  39. People whose names break my system are weird outliers. They should have had solid, acceptable names, like 田中太郎.
  40. People have names.

This list is by no means exhaustive.
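
To make a couple of these concrete (items 12, 13 and 38 in particular), here is a minimal Python sketch showing that two strings that render identically can still differ bit for bit. The helper name `same_name_for_matching` and the sample name are my own, not from the blog post, and the comparison key is only an assumption about how one might match names. It is not a claim that names are case insensitive, and the original string should still be stored untouched.

```python
import unicodedata


def matching_key(name: str) -> str:
    """Build a comparison key; never store this in place of the original."""
    # Normalize, case-fold, then normalize again, because case folding
    # can occasionally produce a string that is no longer normalized.
    folded = unicodedata.normalize("NFC", name).casefold()
    return unicodedata.normalize("NFC", folded).strip()


def same_name_for_matching(a: str, b: str) -> bool:
    """Hypothetical helper: loose equality for matching purposes only."""
    return matching_key(a) == matching_key(b)


# The same visible name typed by two different operators (illustrative data).
composed = "Zo\u00eb"      # precomposed e-with-diaeresis (U+00EB)
decomposed = "Zoe\u0308"   # plain "e" followed by combining diaeresis (U+0308)

print(composed == decomposed)                        # False
print(same_name_for_matching(composed, decomposed))  # True
```

Even this is lossy on purpose: it will treat some names as equal that their owners consider different, so it suits a "possible match" hint better than an identity decision.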

Feel free to share examples you’ve seen (celebrities or public figures, preferably) that violate the above assumptions!

Hi @sup3rmark

TL;DR: While I think it’s a fun post, and Identity professionals should be aware of it, I think it’s more an HR issue than an Identity management issue.

Taking the first “assumption”:

People have exactly one canonical full name.

Canonical names are authoritative only in the domain (realm/scope) for which they are defined. In the Identity Management domain, one can state that the Full Name from HR is the Canonical Full Name, because we define the scope of what that means. Yes, it can change, may not be unique and people may have different ideas of what a full name is; and that is why it should be avoided as an identifier for linking or account name purposes, but that doesn’t mean it isn’t “canonical”. It could also be used for correlation purposes, but only as a secondary criterion.

To take a different attribute as an example: within the Identity Management domain we can state that

joseph.bloggs@mycompany.com

is the Canonical email address, because it is the ‘primary’ email address for the Identity’s mailbox. Yes, there may be aliases, such as

joe.bloggs@mycompanytradingname.com

and, yes, it can change, so it should be used for linking or account name purposes only where nothing else is available. It is OK for correlation, though, as it should be unique within a particular domain.
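
As a rough illustration of that ordering, here is a minimal Python sketch of a correlation routine. It assumes a stable `employee_id` exists to link on, which is my assumption and not something any particular product provides under that name. The record types, the `correlate` function and the John Smith sample data are all hypothetical. The email is tried next, and the full name is used only as a last, secondary criterion and only when it is unambiguous.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HrRecord:
    employee_id: str   # assumed stable identifier from the authoritative source
    email: str         # canonical (primary) email within the organisation
    full_name: str     # canonical full name as declared from HR


@dataclass(frozen=True)
class Account:
    employee_id: Optional[str]
    email: Optional[str]
    display_name: str


def correlate(account: Account, records: List[HrRecord]) -> Tuple[Optional[HrRecord], str]:
    """Return the matched HR record (if any) and the criterion that matched."""
    # 1. Prefer the stable identifier: it does not change when a name does.
    if account.employee_id:
        for record in records:
            if record.employee_id == account.employee_id:
                return record, "employee_id"
    # 2. Email should be unique within the domain, so accept a single hit.
    if account.email:
        wanted = account.email.casefold()
        hits = [r for r in records if r.email.casefold() == wanted]
        if len(hits) == 1:
            return hits[0], "email"
    # 3. Name is a secondary criterion only: accept it when unambiguous,
    #    and label the result so that someone reviews it.
    hits = [r for r in records if r.full_name == account.display_name]
    if len(hits) == 1:
        return hits[0], "full_name_needs_review"
    return None, "unmatched"


records = [
    HrRecord("E1", "joseph.bloggs@mycompany.com", "Joseph Bloggs"),
    HrRecord("E2", "john.smith@mycompany.com", "John Smith"),
    HrRecord("E3", "john.smith2@mycompany.com", "John Smith"),
]

by_email = correlate(Account(None, "Joseph.Bloggs@MyCompany.com", "Joe Bloggs"), records)
by_name = correlate(Account(None, None, "John Smith"), records)
print(by_email[1], by_name[1])
```

The second lookup comes back unmatched because two people share the name, which is exactly why the name cannot be the link.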

@j_place It’s definitely something that is relevant to us in our space. “Full Name” can mean different things, which is what the blog post was getting at:

  • legalFirstName legalLastName
  • legalFirstName legalMiddleName legalLastName
  • preferredFirstName preferredLastName
  • preferredFirstName preferredMiddleName preferredLastName

When a system owner talks about “full name,” we need to ensure we’re getting clarification around which value they’re actually asking for.
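
One way to force that clarification is to make the variant an explicit parameter instead of a single “full name” field. The sketch below is only an illustration: the `PersonNames` type, the `FullNameVariant` enum and the sample middle name are hypothetical, and joining parts with a space in first-to-last order is itself one of the assumptions the blog post warns about.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FullNameVariant(Enum):
    LEGAL = "legal"
    LEGAL_WITH_MIDDLE = "legal_with_middle"
    PREFERRED = "preferred"
    PREFERRED_WITH_MIDDLE = "preferred_with_middle"


@dataclass(frozen=True)
class PersonNames:
    # Every part is optional: not everyone has every part.
    legal_first: Optional[str] = None
    legal_middle: Optional[str] = None
    legal_last: Optional[str] = None
    preferred_first: Optional[str] = None
    preferred_middle: Optional[str] = None
    preferred_last: Optional[str] = None


def full_name(names: PersonNames, variant: FullNameVariant) -> str:
    """Build the requested variant, skipping parts that are missing."""
    if variant is FullNameVariant.LEGAL:
        parts = [names.legal_first, names.legal_last]
    elif variant is FullNameVariant.LEGAL_WITH_MIDDLE:
        parts = [names.legal_first, names.legal_middle, names.legal_last]
    elif variant is FullNameVariant.PREFERRED:
        parts = [names.preferred_first, names.preferred_last]
    else:
        parts = [names.preferred_first, names.preferred_middle, names.preferred_last]
    return " ".join(p for p in parts if p)


# Illustrative data only; the middle name is made up for the example.
joe = PersonNames(
    legal_first="Joseph", legal_middle="Quentin", legal_last="Bloggs",
    preferred_first="Joe", preferred_last="Bloggs",
)
print(full_name(joe, FullNameVariant.LEGAL_WITH_MIDDLE))
print(full_name(joe, FullNameVariant.PREFERRED))
```

A connected system then has to say which variant it wants, rather than receiving whichever one happened to be mapped to “full name”.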

Hi Mark

Yes, I agree, but aren’t all attributes in the same category? My point is that the Identity Administrator will choose one of the options (based on the Authoritative Source) and then declare it as canonical. The same goes for display names, job titles, etc. Connected systems can ask for aliases, i.e. the value in different formats, but the central identity should be the canonical version within the organisation - the single source of truth.

Some HR systems contain surname prefixes, titles, suffixes, etc. It is for the Identity Administrator to say “Van Winkle, Rip (Sir) KBE” (or some combination) is the normalised/canonicalised version of Display Name within the organisation, and similarly that “Rip Van Winkle” is his Full Name.
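
As a sketch of what that declaration might look like in practice, the Python below derives both values from the HR components. The `HrName` fields and the two function names are hypothetical, and the formats are just the ones from the example above, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HrName:
    # Hypothetical component fields as they might arrive from an HR system.
    given_name: str
    surname: str
    surname_prefix: str = ""
    title: str = ""
    suffix: str = ""


def _join(*parts: str) -> str:
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)


def org_display_name(n: HrName) -> str:
    """Organisation's declared Display Name: 'Surname, Given (Title) Suffix'."""
    family = _join(n.surname_prefix, n.surname)
    title = f"({n.title})" if n.title else ""
    return _join(f"{family}, {n.given_name}", title, n.suffix)


def org_full_name(n: HrName) -> str:
    """Organisation's declared Full Name: 'Given Prefix Surname'."""
    return _join(n.given_name, n.surname_prefix, n.surname)


rip = HrName(given_name="Rip", surname="Winkle", surname_prefix="Van",
             title="Sir", suffix="KBE")
print(org_display_name(rip))  # Van Winkle, Rip (Sir) KBE
print(org_full_name(rip))     # Rip Van Winkle
```

Keeping the components and deriving the formatted values means a connected system that wants another format can get an alias without changing the canonical one.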