Have you ever noticed how everyone writes phone numbers differently? Some people use spaces, some use dashes, and some use parentheses. Different people group a different number of digits together.
This becomes a real problem when you're trying to store phone numbers in a database and need to retrieve records by phone number.
In IT and computer science, to normalize means to make something consistent or standard. A normalized phone number is a phone number that is consistently formatted across all records.
Is there an international standard for phone numbers?
The E.164 standard defines the international public telecommunication numbering plan. And if we want to store phone numbers in a standardized way, we should use the E.164 format.
There is another standard that focuses on human-readable formats. The International Telecommunication Union (ITU) has published the E.123 standard which defines standard notations for phone numbers, email addresses, and web addresses.
An international phone number written in the E.123 format looks like this:
+44 345 678 901
The same phone number in the E.164 format looks like this:
+44345678901
The + sign indicates the international dialing prefix. The country code is 44, and the phone number is 345678901.
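To make the difference between the two notations concrete, here is a minimal sketch in Python. The function names `is_e164` and `e123_to_e164` are made up for this illustration. The pattern only checks the general shape of an E.164 number (a plus sign followed by at most 15 digits, not starting with 0); it does not check whether the number actually exists.

```python
import re

# Shape of an E.164 number: "+", then up to 15 digits, first digit not 0.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string has the general shape of an E.164 number."""
    return E164_PATTERN.fullmatch(number) is not None


def e123_to_e164(number: str) -> str:
    """Turn an international E.123 notation into E.164 by dropping the spaces."""
    return number.replace(" ", "")


print(is_e164("+44 345 678 901"))       # False: contains spaces
print(e123_to_e164("+44 345 678 901"))  # +44345678901
print(is_e164("+44345678901"))          # True
```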
But when asked to enter their phone number in a form field, most people would write it in their local convention or even personal preference. And these can vary widely from country to country and even from person to person.
For example, in the US, people would write their phone number as:
(341) 678-9012
In the UK, a local phone number would be written as:
023 4567 8901 or 01865 567890
And in India, it would be written as:
0123-456-7890
And some people might use the international access code 00 instead of the + sign:
00358 345 678 901
When you're storing phone numbers in a database, you want to be able to search for them regardless of how they were entered.
Imagine you have the following phone numbers in your database:
123-456-7890
(234) 567-8901
0345 / 678 9012
+496 789 0123
567-890-1234
If you get a call from +49 345 678 9012, how will you find it in your database to look up the customer's details?
The least ambiguous format to store a phone number is E.164, which is the international E.123 notation without any spaces: +331234567890.
But how do you get there?
The very best way is to try to get your users to enter their phone numbers in this format in the first place.
This could be done by providing a dropdown list of countries and then automatically formatting the phone number based on the country code.
For example, the International Telephone Input library offers this kind of UI.
Source: https://github.com/jackocnr/intl-tel-input
If you can't get your users to enter their phone numbers in the correct format because you're dealing with legacy data or you're importing data from another source, your journey will be harder - but not impossible.
First of all, consider using a library. Google has open-sourced a library called libphonenumber that can help you in many cases.
If you're using Python, take a look at phonenumbers, the Python port of libphonenumber.
For JavaScript, there is google-libphonenumber.
But sometimes you'll need to write your own normalization function or handle special cases the library can't handle because it doesn't have enough context.
I'll focus on the vast majority of cases where your main enemies are inconsistencies in the way phone numbers are written. For the weirdest local edge cases, take a look at the phone number philosophy section at the end of this article.
Here are the steps that should get you most of the way there in most cases:
Be careful stripping non-numeric characters as your first step. Some can help you determine the area code.
The way someone writes a phone number can give you hints about the country and area codes. For example, in the US, the area code is often enclosed in parentheses.
(234) 567-8901
Or sometimes the area code is separated by a space or a slash.
0345 / 678 9012
01234 56 7890
Take these structures into account and save your first guess about the country or area code for later.
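As a rough illustration of how such structural hints could be captured before any characters are stripped, here is a small Python sketch. The function name `guess_area_code` and the three patterns are assumptions made for this example. They only cover the notations shown above, and the result is a guess, not a verified area code.

```python
import re
from typing import Optional


def guess_area_code(raw: str) -> Optional[str]:
    """Guess the area code from the way a number was written, or return None."""
    text = raw.strip()

    # "(234) 567-8901": digits enclosed in leading parentheses
    match = re.match(r"\((\d+)\)", text)
    if match:
        return match.group(1)

    # "0345 / 678 9012": leading digits followed by a slash
    match = re.match(r"(\d+)\s*/", text)
    if match:
        return match.group(1)

    # "01234 56 7890": leading group that starts with 0 and ends at a space
    match = re.match(r"(0\d+)\s", text)
    if match:
        return match.group(1)

    return None


print(guess_area_code("(234) 567-8901"))   # 234
print(guess_area_code("0345 / 678 9012"))  # 0345
print(guess_area_code("01234 56 7890"))    # 01234
print(guess_area_code("5678901"))          # None
```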
If you're only dealing with phone numbers from one country, you can assume that all phone numbers either already have the country code in the format +1, +33 or +351, or that they are local numbers.
People might also have entered the country code without the + sign, like 1 or 39.
If a local number starts with a single 0, in almost all countries you can (and must) strip the 0 and prepend the country code. However, in Italy the leading 0 has to stay in the number even when calling from abroad with a country code.
If the user entered a number starting with 00, strip the 00 but assume the country code is already there. Watch out for a special case in some countries.
For example, in Australia or Hong Kong, 00xx is used to select the carrier to use for an outgoing international call.
But in most cases:
0345 678 901 becomes +44 345 678 901
0044 1234 56 7890 becomes +44 1234 56 7890
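The prefix rules for the single-country case could be sketched in Python roughly like this. The function name `apply_country_code` and the `keep_trunk_zero` parameter are made up for this illustration. The sketch assumes the input starts with a digit or a + sign, and it does not try to detect a country code that was typed without the + sign, because that needs knowledge about valid number lengths.

```python
def apply_country_code(raw: str, country_code: str, keep_trunk_zero: bool = False) -> str:
    """Rewrite the prefix of a phone number so that it starts with +<country code>.

    The grouping of the remaining digits is left untouched.
    """
    number = raw.strip()

    # Already in international notation: nothing to do.
    if number.startswith("+"):
        return number

    # International access code 00: the country code is assumed to follow.
    if number.startswith("00"):
        return "+" + number[2:].lstrip()

    # Single leading 0: strip it, unless the country keeps it (e.g. Italy).
    if number.startswith("0") and not keep_trunk_zero:
        number = number[1:]

    return f"+{country_code} {number}"


print(apply_country_code("0345 678 901", "44"))       # +44 345 678 901
print(apply_country_code("0044 1234 56 7890", "44"))  # +44 1234 56 7890
print(apply_country_code("06 1234 5678", "39", keep_trunk_zero=True))  # +39 06 1234 5678
```

Carrier selection prefixes such as the 00xx codes mentioned above would be misread by the 00 branch, so they need separate handling.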
If you're dealing with phone numbers from multiple countries, you'll need to know which country the user was assuming when they entered the phone number.
If you are lucky and only need to handle a few countries, area codes can give you a hint about the country: Some area codes are used in one country but not in another.
As the final step, strip all non-numeric characters except the + sign at the beginning.
Your final normalized phone number should look like this:
+331234567890
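This last step is small enough to show in a few lines of Python. The function name `strip_formatting` is made up for this sketch, and it assumes the country code has already been taken care of in the earlier steps.

```python
import re


def strip_formatting(number: str) -> str:
    """Remove everything except the digits and a + sign at the very beginning."""
    number = number.strip()
    plus = "+" if number.startswith("+") else ""
    return plus + re.sub(r"[^0-9]", "", number)


print(strip_formatting("+33 1 23 45 67 890"))  # +331234567890
print(strip_formatting("+1 (234) 567-8901"))   # +12345678901
```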
And now you're ready to store it as a string.
When you have to search for a phone number, normalize the search term in the same way and search.
To be safe, store the original phone number as well. You might need it later for verification or to display it to the user.
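Here is a minimal sketch of how storing and searching could fit together, using Python's built-in sqlite3 module. The table layout, the customer names, the `normalize` and `find_by_phone` helpers, and the default country code 49 are all assumptions made for this example. The `normalize` function is a compact version of the steps described above, not a complete implementation.

```python
import re
import sqlite3

DEFAULT_COUNTRY_CODE = "49"  # assumption: local numbers in this data set are German


def normalize(raw: str, country_code: str = DEFAULT_COUNTRY_CODE) -> str:
    """Compact normalization: handle +, 00 and a single leading 0, then keep digits only."""
    number = raw.strip()
    digits = re.sub(r"[^0-9]", "", number)
    if number.startswith("+"):
        pass  # country code already present
    elif digits.startswith("00"):
        digits = digits[2:]  # international access code, country code follows
    else:
        if digits.startswith("0"):
            digits = digits[1:]  # strip the single leading 0 of a local number
        digits = country_code + digits
    return "+" + digits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, phone_raw TEXT, phone_e164 TEXT)")
conn.execute("CREATE INDEX idx_customers_phone ON customers (phone_e164)")

# Store the original input next to the normalized value.
for name, raw in [("Alice", "0345 / 678 9012"), ("Bob", "+496 789 0123")]:
    conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (name, raw, normalize(raw)))


def find_by_phone(connection: sqlite3.Connection, search_term: str) -> list:
    """Normalize the search term the same way as the stored numbers, then look it up."""
    cursor = connection.execute(
        "SELECT name, phone_raw FROM customers WHERE phone_e164 = ?",
        (normalize(search_term),),
    )
    return cursor.fetchall()


print(find_by_phone(conn, "+49 345 678 9012"))  # [('Alice', '0345 / 678 9012')]
```

Because both the stored value and the search term go through the same function, the caller from the earlier example is found even though the number was entered with a slash and a leading 0.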
I have to admit something: I made it sound more straightforward than it is. And while the rest of this article should cover most cases you'll ever encounter, reality is way messier.
For a deeper dive into the topic, I recommend reading the article Falsehoods Programmers Believe About Phone Numbers, which is part of the libphonenumber repository.
Here are the most important aspects that could bite you if you're not careful:
7 is not the same phone number as 007, and there are countries where *5770 is a dialable phone number (e.g. Israel Railways).
Storing phone numbers only as they were entered almost guarantees that you'll have trouble finding them later. Normalizing phone numbers early is a good idea if you want to be able to search for them later on.
Here is a reminder of the key points in this article: