We have recently spent a lot of time looking at data quality issues in publisher data. Email addresses are a key piece of contact info, and essential for online marketing campaigns, so it’s particularly important that these are present and correct. But what exactly is a “valid” email address?
Interestingly, the answer is not as straightforward as it might sound. As a starting point, there are the official specs (e.g. RFC 5322, which superseded RFC 2822, and RFC 6531 for internationalized addresses), which define what syntax and characters are allowed. But would you be surprised to see that all of the following are formally valid according to those?
postbox@com (no dot)
"very.unusual.@.unusual.com"@example.com (two @ signs)
!#$%&'*+-/=?^_`{}|~@example.com (no alphanumeric chars in first part)
" "@example.com (contains a space)
üñîçøðé@example.com (Unicode characters in first part)
Examples courtesy of Wikipedia
In the real world, these addresses would most likely be rejected when signing up to online systems, which commonly have their own ideas about what is “valid”.
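To illustrate the gap between the specs and common practice, here is a minimal sketch of the kind of simplified check a sign-up form might apply. The pattern and the name `looks_conventional` are assumptions made up for this example, not any particular system's rule and not a standard. It accepts everyday addresses but rejects all five of the formally valid examples above.

```python
import re

# Simplified "conventional" pattern (an assumption, not a standard):
# letters, digits and . _ % + - before the @, then a dotted domain
# whose last part is at least two letters.
CONVENTIONAL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def looks_conventional(address: str) -> bool:
    """Return True if the whole address matches the simplified pattern."""
    return CONVENTIONAL.fullmatch(address) is not None


print(looks_conventional("john.smith@company.com"))
print(looks_conventional("postbox@com"))
```

A rule like this is stricter than the specs in some ways and looser in others, which is one reason two systems can disagree about the same address.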
On the flip side, many “normally” formatted email addresses may in fact look suspicious on closer inspection:
a@example.com (single letter in first part)
bbb@example.com (repeated letters in first part)
test@example.com (test address)
dummy123@example.com (dummy address)
In these cases, it’s likely that users have entered a made-up address to fast-track the registration process. Alternatively, the addresses may have been submitted automatically by a “bot” creating fake accounts, which may need some further investigation.
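The patterns above can be sketched as a few simple heuristics. This is an illustration only: the function name `suspicion_reasons` and the short `PLACEHOLDER_WORDS` list are hypothetical, and real rules would need tuning against actual data to avoid flagging genuine addresses.

```python
import re

# Hypothetical placeholder words, taken from the examples above.
PLACEHOLDER_WORDS = {"test", "dummy"}


def suspicion_reasons(address: str) -> list:
    """Return reasons why the part before the last @ looks made up."""
    local = address.rpartition("@")[0].lower()
    reasons = []
    if len(local) == 1:
        reasons.append("single character")
    elif len(set(local)) == 1:
        # Every character is the same, e.g. "bbb".
        reasons.append("repeated character")
    # Ignore trailing digits so that "dummy123" is treated like "dummy".
    if re.sub(r"\d+$", "", local) in PLACEHOLDER_WORDS:
        reasons.append("placeholder word")
    return reasons


for example in ("a@example.com", "bbb@example.com", "dummy123@example.com"):
    print(example, suspicion_reasons(example))
```

An empty list means none of these particular checks fired, not that the address is genuine.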
As an extra complication, even an email address that looks fine on the surface may not be deliverable. For example, john.smith@company.com may not be reachable because that company no longer exists. In this case, it’s possible to check whether the domain exists and accepts mail.
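As a rough sketch of the first half of that check, the code below tests whether the domain part of an address resolves in DNS at all. This is a weaker signal than confirming that the domain accepts mail, which would mean querying its MX records. Python's standard library has no MX lookup, so that step is left out here. The function name `domain_resolves` and its `resolver` parameter are assumptions for this example. The parameter exists only so the lookup can be swapped out, for instance in tests without network access.

```python
import socket


def domain_resolves(address: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the domain after the last @ resolves to an address.

    This only shows that the domain exists in DNS, not that it accepts mail.
    """
    _, at_sign, domain = address.rpartition("@")
    if not at_sign or not domain:
        return False
    try:
        resolver(domain, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: not encodable as a hostname.
        return False
    return True
```

Calling `domain_resolves("john.smith@company.com")` performs a live DNS lookup, so the result depends on the network and on the current state of that domain.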
But what if the individual has simply changed jobs, meaning that this account no longer exists at the company? In that case, the “validity” of the address can only be tested by actually sending an email to it, which may then be returned to the sender and flagged as a “bounce”.
However, in other cases the message could appear to have been delivered OK but not received by the intended recipient. Here, looking for a history of non-opens and non-clicks can help, but isn’t foolproof: with certain user settings, messages can be opened without being tracked.
These are all issues to be aware of when using the phrase “valid” email. Since this may be interpreted differently depending on who – or which system – you are talking to, it’s important to be clear exactly what you mean.
Here at DataSalon we’ve been working on identifying “valid” email addresses for many years and have developed sophisticated rules to identify incorrect and suspect values. This is important when using email as part of a personal identifier in MasterVision, or when cleansing customer data using MasterVision DQ. You can contact us to find out more.