How to untangle phone numbers

Have you ever noticed how everyone writes phone numbers differently? Some people use spaces, some use dashes, and some use parentheses. Different people group a different number of digits together.

This becomes a real problem when you're trying to store phone numbers in a database and need to retrieve records by phone number.

What is a normalized phone number?

In IT and computer science, to normalize means to make something consistent or standard. A normalized phone number is a phone number that is consistently formatted across all records.

Is there an international standard for phone numbers? The International Telecommunication Union (ITU) has published the E.123 standard, which defines notations for phone numbers, email addresses, and web addresses.

An international phone number written in the E.123 format looks like this:

+12 345 678 901

The + sign indicates the international dialing prefix. The country code is 12, and the phone number is 345 678 901.

But when asked to enter their phone number in a form field, most people write it according to their local convention or personal preference. These vary widely from country to country and even from person to person.

Phone number formats are a mess

For example, in the US, the same phone number would be written as:

(345) 678-901

In the UK, a local phone number would be written as:

01234 56 7890

And in India, it would be written as:

0123-456-7890

And some people might use the international access code 00 instead of the + sign.

0012 345 678 901

Why is this a problem?

When you're storing phone numbers in a database, you want to be able to search for them regardless of how they were entered.

Imagine you have the following phone numbers in your database:

  • 123-456-7890
  • (234) 567-8901
  • 0345 / 678 9012
  • +456 789 0123
  • 567-890-1234

If you get a call from +12 345 678 9012, how will you find it in your database to look up the customer's details?
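To make the problem concrete, here is a small Python sketch that tries to find the caller with an exact comparison and then with a naive digits-only comparison. The helper name digits_only is made up for this illustration.

```python
import re

# The phone numbers from the list above, exactly as they were entered.
stored = [
    "123-456-7890",
    "(234) 567-8901",
    "0345 / 678 9012",
    "+456 789 0123",
    "567-890-1234",
]
incoming = "+12 345 678 9012"


def digits_only(number: str) -> str:
    """Remove everything except the ASCII digits 0-9."""
    return re.sub(r"[^0-9]", "", number)


# Attempt 1: compare the raw strings.
exact_matches = [n for n in stored if n == incoming]

# Attempt 2: compare only the digits.
digit_matches = [n for n in stored if digits_only(n) == digits_only(incoming)]

print(exact_matches)
print(digit_matches)
```

Both attempts come back empty. The most plausible match, 0345 / 678 9012, was stored with a leading 0 instead of the country code, so even the digit strings differ.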

How to normalize a phone number

The least ambiguous format for storing a phone number is the international E.123 notation without any spaces: +123456789012. This compact form is commonly referred to as the E.164 format.

But how do you get there?

Via the user interface

The very best way is to try to get your users to enter their phone numbers in this format in the first place.

This could be done by providing a dropdown list of countries and then automatically formatting the phone number based on the country code.

For example, the International Telephone Input library offers a ready-made UI for this.

Source: https://github.com/jackocnr/intl-tel-input

Handling existing data

If you can't get your users to enter their phone numbers in the correct format because you're dealing with legacy data or you're importing data from another source, your journey will be harder - but not impossible.

First of all, consider using a library. Google has open-sourced a library called libphonenumber that can help you in many cases.

If you're using Python, take a look at the Python port of libphonenumber called phonenumbers.

For JavaScript, there is google-libphonenumber.

But sometimes you'll need to write your own normalization function or handle special cases the library can't handle because it doesn't have enough context.

I'll focus on the vast majority of cases where your main enemies are inconsistencies in the way phone numbers are written. For the weirdest local edge cases, take a look at the phone number philosophy section at the end of this article.

Here are the steps you'll need to take:

Strip some non-numeric characters

Be careful about stripping all non-numeric characters as your first step. Some of them can help you determine the area code.

The way someone writes a phone number can give you hints about the country and area codes. For example, in the US, the area code is often enclosed in parentheses.

  • (234) 567-8901

Or sometimes the area code is separated by a space or a slash.

  • 0345 / 678 9012
  • 01234 56 7890

Take these structures into account and save your first guess about the country or area code for later.
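The hints described above could be collected with a few regular expressions. This is a minimal sketch, not a complete parser: the function name guess_area_code and the three rules are assumptions chosen to match the examples in this section.

```python
import re
from typing import Optional


def guess_area_code(raw: str) -> Optional[str]:
    """Return a first guess at the area code based on how the number is grouped."""
    # Area code in parentheses, e.g. "(234) 567-8901"
    match = re.search(r"\(\s*([0-9]+)\s*\)", raw)
    if match:
        return match.group(1)

    # Leading group separated by a slash, e.g. "0345 / 678 9012"
    match = re.match(r"\s*([0-9]+)\s*/", raw)
    if match:
        return match.group(1)

    # Leading group that starts with a single 0 and is followed by a space,
    # e.g. "01234 56 7890". A leading "00" is skipped on purpose because it
    # is more likely an international access code than an area code.
    match = re.match(r"\s*(0[1-9][0-9]*) ", raw)
    if match:
        return match.group(1)

    # No recognizable structure, so no guess.
    return None


for example in ["(234) 567-8901", "0345 / 678 9012", "01234 56 7890", "567-890-1234"]:
    print(example, "->", guess_area_code(example))
```

Returning None when nothing matches keeps the guess optional, so later steps can fall back to other information.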

Determine the country code

If you're only dealing with phone numbers from one country, you can assume that all phone numbers either already have the country code in the format +1, +12 or +123, or that they are local numbers.

People might also have entered the country code without the + sign, like 1 or 12.

If a local number starts with a single 0, strip the 0 and prepend the country code. If it starts with 00, replace the 00 with a + sign, because the country code already follows it.

  • 01234 56 7890 becomes +441234 56 7890
  • 0044 1234 56 7890 becomes +441234 56 7890
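For the single-country case, the prefix rules above might look like this in Python. It is a sketch under assumptions: the function name apply_country_code is made up, and a number with neither a + nor a leading 0 is treated as a local number. That last case is ambiguous in practice, because the user may have typed the country code without the + sign.

```python
import re


def apply_country_code(raw: str, default_country_code: str) -> str:
    """Make the country code explicit, leaving the remaining formatting alone."""
    # Drop anything before the first digit or plus sign, such as "(" or spaces.
    number = re.sub(r"^[^0-9+]+", "", raw)

    if number.startswith("+"):
        # Already in international notation.
        return number
    if number.startswith("00"):
        # International access code: the country code follows it.
        return "+" + number[2:]
    if number.startswith("0"):
        # Local number with a leading 0: strip it and prepend the country code.
        return "+" + default_country_code + number[1:]
    # Assumption: anything else is a local number without a leading 0.
    return "+" + default_country_code + number


print(apply_country_code("01234 56 7890", "44"))
print(apply_country_code("0044 1234 56 7890", "44"))
```

The second result still has a space after the country code. That does not matter, because the remaining separators are removed in the final step.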

If you're dealing with phone numbers from multiple countries, you'll need to know which country the user was assuming when they entered the phone number.

If you are lucky and only need to handle a few countries, area codes can give you a hint about the country: Some area codes are used in one country but not in another.
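Here is a rough sketch of that idea. The country names and area codes are placeholders, not real numbering data, and the function name guess_country is made up. The important part is that the function only answers when exactly one country matches.

```python
from typing import Dict, Optional, Set

# Placeholder data for illustration only. Replace it with the real area
# codes of the countries you actually need to support.
AREA_CODES: Dict[str, Set[str]] = {
    "Country A": {"0345", "0567"},
    "Country B": {"0345", "0789"},
}


def guess_country(area_code: str, area_codes_by_country: Dict[str, Set[str]]) -> Optional[str]:
    """Return the only country that uses this area code, or None if unclear."""
    candidates = [
        country
        for country, codes in area_codes_by_country.items()
        if area_code in codes
    ]
    # Zero candidates or several candidates both mean "we cannot tell".
    return candidates[0] if len(candidates) == 1 else None


print(guess_country("0567", AREA_CODES))  # used in one country only
print(guess_country("0345", AREA_CODES))  # used in both countries
```

An area code that appears in more than one country gives no answer, so you need another source of context for it, for example the customer's postal address.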

Only keep the digits and + sign

As the final step, strip all non-numeric characters except the + sign at the beginning.

Your final normalized phone number should look like this:

+123456789012

And now you're ready to store it as a string.
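The final step could be sketched like this. The function name to_storage_format is an assumption, and the length check (at most 15 digits, no leading 0 in the country code) follows the ITU-T E.164 numbering rules rather than anything specific to this article. The pattern uses [0-9] instead of \d because \d in Python also matches digits from other scripts.

```python
import re

# A plus sign followed by 2 to 15 ASCII digits, the first of which is not 0.
INTERNATIONAL_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def to_storage_format(number: str) -> str:
    """Strip all separators and check that the result looks international."""
    number = number.strip()
    has_plus = number.startswith("+")
    digits = re.sub(r"[^0-9]", "", number)
    normalized = ("+" if has_plus else "") + digits

    if not INTERNATIONAL_PATTERN.fullmatch(normalized):
        raise ValueError(f"not a normalized international number: {number!r}")
    return normalized


print(to_storage_format("+12 345 678 9012"))
print(to_storage_format("+441234 56 7890"))
```

Raising an error for numbers without a country code is a design choice in this sketch. It forces the country code step to happen before anything is stored.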

When you have to search for a phone number, normalize the search term in the same way and search.

To be safe, store the original phone number as well. You might need it later for verification or to display it to the user.
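Storing both versions and searching by the normalized one could look like the following sketch. It uses an in-memory SQLite database, and the table, column, and function names are all made up for this example. The normalize helper is a compressed version of the steps above for a single default country.

```python
import re
import sqlite3


def normalize(raw: str, default_country_code: str) -> str:
    """Compact sketch of the normalization steps for one default country."""
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"[^0-9]", "", raw)
    if has_plus:
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    if digits.startswith("0"):
        return "+" + default_country_code + digits[1:]
    return "+" + default_country_code + digits


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, phone_original TEXT, phone_normalized TEXT)"
)
conn.execute("CREATE INDEX idx_customers_phone ON customers (phone_normalized)")


def add_customer(name: str, phone: str, default_country_code: str) -> None:
    # Keep the number as entered and store the normalized form next to it.
    conn.execute(
        "INSERT INTO customers VALUES (?, ?, ?)",
        (name, phone, normalize(phone, default_country_code)),
    )


def find_by_phone(phone: str, default_country_code: str):
    # Normalize the search term the same way before comparing.
    return conn.execute(
        "SELECT name, phone_original FROM customers WHERE phone_normalized = ?",
        (normalize(phone, default_country_code),),
    ).fetchone()


add_customer("Example Customer", "0345 / 678 9012", "12")
match = find_by_phone("+12 345 678 9012", "12")
print(match)
```

The lookup finds the record even though the stored and incoming spellings differ, and the original spelling is still available for display.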

Phone number philosophy

I have to admit something: I made it sound more straightforward than it is. And while the rest of this article should cover most cases you'll ever encounter, reality is way messier.

For a deeper dive into the topic, I recommend reading the article Falsehoods Programmers Believe About Phone Numbers which is part of the libphonenumber repository.

Here are the most important aspects that could bite you if you're not careful:

  • Phone numbers are not always unique: People can have multiple phone numbers, and one phone number can be shared by many people.
  • Phone numbers can change: People can change their phone numbers and phone numbers can be reassigned.
  • Not all numbers are dialable or textable
  • Non-ASCII characters: For example, in Egypt, the Arabic script is often used to write phone numbers.
  • Numbering plans change: Countries can change their numbering plans and there may be a transition period during which both formats are valid.
  • Phone numbers are not numbers: They are not numbers in the mathematical sense. 7 is not the same phone number as 007.
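Two of these points are easy to illustrate in Python. The sketch below converts decimal digits from other scripts to ASCII before any further processing, and shows what happens to leading zeros when a phone number is treated as an integer. The function name to_ascii_digits is an assumption for this example.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Replace decimal digits from any script with their ASCII equivalents."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


# The Arabic-Indic digits for 0, 1, 2 followed by a space and 3, 4, 5,
# written as escapes so the example does not depend on text direction.
arabic_indic = "\u0660\u0661\u0662 \u0663\u0664\u0665"
converted = to_ascii_digits(arabic_indic)
print(converted)

# Converting to an integer silently drops leading zeros,
# so two different phone numbers collapse into one value.
print(int("007") == int("7"))
print("007" == "7")
```

Running the conversion before any digit stripping matters, because a filter that only keeps the characters 0 to 9 would otherwise discard such numbers entirely.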

Final thoughts

Storing phone numbers only as they were entered almost guarantees that you'll have trouble finding them later. Normalizing phone numbers early is a good idea if you want to be able to search for them later on.

Here is a reminder of the key points in this article:

  • Try to get the normalized number from the user in the first place.
  • Show the normalized number to the user and ask them to confirm.
  • Some non-numeric characters can give you hints about the country/area code.
  • Be aware of local conventions when trying to guess the country code.
  • Do not simply strip all non-numeric characters and store as a number.
  • Store phone numbers as strings in the normalized format and keep the original format as well.
