CASE STUDY: An assessment of pseudonym truncation in an NHS database

Egerton Consulting was approached to assess the implications of truncating the 40-digit “pseudonym” generated for each NHS patient to just 20 digits. Each pseudonym was generated from the patient’s NHS number so that it uniquely identified the patient while preserving anonymity when stored with the patient’s electronic records.

An IT change had introduced a new encryption basis that produced 40-digit pseudonyms, where the previous basis had produced 20-digit pseudonyms. Ancillary systems had been designed around the previous basis and could handle 20 digits only, so it was proposed to truncate the 40-digit pseudonyms to 20 digits in order to maintain compatibility. However, truncation would introduce the risk of non-unique pseudonyms, and this risk needed to be understood: specifically, the probability that different NHS patients would be assigned identical 20-digit pseudonyms.

This probability could not be calculated by conventional methods because the problem combines extremely small individual probabilities with an extremely large number of possibilities. The usual Poisson assumption and exponential approximations could not be applied in this case, so a numerical method based on a logarithmic expansion was devised to solve the problem.
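As a rough illustration of this kind of calculation, the Python sketch below estimates the probability that at least two patients share a truncated pseudonym. It is not the method devised for the study, whose details are not given here. It assumes that truncated pseudonyms are uniformly distributed over all 10^d decimal values, and the function name, the three-term cutoff and the population figure of 60 million are all hypothetical choices made for illustration. The sketch expands the logarithm of the probability that all pseudonyms are distinct, log ∏(1 − k/N), as a power series in 1/N, so that no tiny probability is ever subtracted from 1 directly.

```python
import math

def collision_probability(n: int, digits: int) -> float:
    """Approximate P(at least two of n patients share a pseudonym).

    Assumes (hypothetically) that pseudonyms are uniform over N = 10**digits
    values and that n is much smaller than N.
    """
    N = 10 ** digits
    # Closed-form power sums S_j = sum of k**j for k = 0 .. n-1 (exact integers).
    s1 = n * (n - 1) // 2
    s2 = (n - 1) * n * (2 * n - 1) // 6
    s3 = s1 * s1
    # log P(all distinct) = sum log(1 - k/N) = -(S1/N + S2/(2 N^2) + S3/(3 N^3) + ...)
    # The series is truncated after three terms (an illustrative choice).
    log_p_unique = -(s1 / N + s2 / (2 * N**2) + s3 / (3 * N**3))
    # expm1 keeps precision when log_p_unique is very close to zero.
    return -math.expm1(log_p_unique)

if __name__ == "__main__":
    population = 60_000_000  # placeholder figure, not taken from the study
    for digits in (10, 15, 20, 40):
        p = collision_probability(population, digits)
        print(f"{digits:2d} digits: P(collision) ~ {p:.3e}")
```

Running the loop for several pseudonym lengths gives a feel for how quickly the risk falls as digits are retained, under the uniformity assumption stated above.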

The risks of non-uniqueness occurring for various degrees of truncation and population sizes were explored, and recommendations were also made for the use of check digits for quality assurance purposes during the transcription of pseudonyms by NHS staff.
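The check-digit scheme recommended in the study is not described here. Purely as an illustration of how a check digit can catch transcription errors, the sketch below uses the widely known Luhn algorithm; the choice of Luhn and the function names are assumptions, not the study's recommendation.

```python
def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the
    # rightmost payload digit; doubled values above 9 have 9 subtracted.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def is_valid(code: str) -> bool:
    """Check a digit string whose last character is its Luhn check digit."""
    return code.isdigit() and len(code) >= 2 and luhn_check_digit(code[:-1]) == code[-1]
```

A scheme of this kind detects any single mistyped digit. Whether the check digit would occupy one of the 20 available positions or be carried separately is a design decision that is not covered here.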