2009.03.11 @16:27 

The best regexp possible for email validation even in javascript

For a while now I have been checking this page from time to time for a good regexp to validate emails: http://fightingforalostcause.net/misc/2006/compare-email-regex.php

I thought the idea was neat: collect some of the most used regexps on the web and compare them with a good unit test built on an interesting set of valid and invalid emails.

I totally agree with Ian that "It's my philosophy that it's better to accept a few invalid addresses than reject any valid ones, so I'm shooting for 0 false-positives and as few false-negatives as possible."

Sadly, his winning regexp does not seem to work in JavaScript, because it relies on advanced regexp features that JavaScript does not support.

And since the second one on his list seemed simpler and easier to enhance while not far from the finishing line...

Here is my attempt to sanitize the world of email addresses :

(based on Warren Gaebel's regexp)

Here are the results :

Should be Valid:

l3tt3rsAndNumb3rs@domain.com : Valid
has-dash@domain.com : Valid
hasApostrophe.o'leary@domain.org : Valid
uncommonTLD@domain.museum : Valid
uncommonTLD@domain.travel : Valid
uncommonTLD@domain.mobi : Valid
countryCodeTLD@domain.uk : Valid
lettersInDomain@911.com : Valid
underscore_inLocal@domain.net : Valid
IPInsteadOfDomain@127.0.0.1 : Valid
IPAndPort@127.0.0.1:25 : Valid
subdomain@sub.domain.com : Valid
local@dash-inDomain.com : Valid
dot.inLocal@foo.com : Valid
a@singleLetterLocal.org : Valid
singleLetterDomain@x.org : Valid
&*=?^+{}'~@validCharsInLocal.net : Valid

 

Should NOT be Valid:

missingDomain@.com : Not Valid
@missingLocal.org : Not Valid
missingatSign.net : Not Valid
missingDot@com : Not Valid
two@@signs.com : Not Valid
colonButNoPort@127.0.0.1: : Not Valid
  : Not Valid
someone-else@127.0.0.1.26 : Not Valid
.localStartsWithDot@domain.com : Not Valid
localEndsWithDot.@domain.com : Not Valid
two..consecutiveDots@domain.com : Not Valid
domainStartsWithDash@-domain.com : Not Valid
domainEndsWithDash@domain-.com : Valid
TLDDoesntExist@domain.moc : Not Valid
numbersInTLD@domain.c0m : Not Valid
missingTLD@domain. : Not Valid
! "#$%(),/;<>[]`|@invalidCharsInLocal.org : Not Valid
invalidCharsInDomain@! "#$%(),/;<>_[]`|.org : Not Valid
local@SecondLevelDomainNamesAreInvalidIfTheyAreLongerThan64Charactersss.org : Valid

 

Javascript Unit Test code

This way you can use the same regexp in several languages, including JavaScript.
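As a rough idea of what such a unit test can look like, here is a minimal sketch. The function name `runEmailTests` and the pattern `placeholderRegex` are assumptions made for illustration; the placeholder is a deliberately simplistic pattern, not the regexp from this post, and only a few addresses from the lists above are used.

```javascript
// Hypothetical placeholder pattern, NOT the regexp from this post:
// "something@something.something" with no spaces and a single "@".
const placeholderRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Runs a regexp against two lists and collects every address
// whose result differs from what was expected.
function runEmailTests(regex, shouldBeValid, shouldBeInvalid) {
  const failures = [];
  for (const email of shouldBeValid) {
    if (!regex.test(email)) failures.push({ email, expected: true });
  }
  for (const email of shouldBeInvalid) {
    if (regex.test(email)) failures.push({ email, expected: false });
  }
  return failures;
}

const failures = runEmailTests(
  placeholderRegex,
  ['has-dash@domain.com', 'subdomain@sub.domain.com', 'a@singleLetterLocal.org'],
  ['@missingLocal.org', 'missingatSign.net', 'two@@signs.com', 'missingDot@com']
);
console.log(failures.length === 0 ? 'all passed' : failures);
```

Swapping `placeholderRegex` for any other candidate regexp lets you compare several patterns against the same two lists.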

Note that it can still be improved, but the "domain-" case is painful to fix in a simple way. It is easy to forbid a (sub)domain from ending with a "-" if you split it into three parts, but then you would also reject domain names with only one character. Solving this without advanced tricks (lookahead/lookbehind) seems trickier.
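To make the trade-off concrete, here is a small sketch on a single domain label (the part between two dots). Both patterns and their names are assumptions for illustration, not pieces of the regexp above. The first one is the "three parts" approach; the second wraps the middle and last parts in an optional non-capturing group, which might be one way to keep single-character labels without lookahead.

```javascript
// Three parts: first char, middle chars, last char.
// Rejects a trailing "-", but also needs at least two characters.
const threePartLabel = /^[a-z0-9][a-z0-9-]*[a-z0-9]$/i;

// Same idea, but the middle and last parts are optional as a group,
// so a one-character label is still accepted.
const optionalTailLabel = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i;

for (const label of ['domain', 'dash-inDomain', 'x', 'domain-', '-domain']) {
  console.log(label, threePartLabel.test(label), optionalTailLabel.test(label));
}
```

With these two patterns, only "x" is treated differently: the three-part version rejects it, while the optional-group version accepts it. Both reject "domain-" and "-domain".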

Another thing to keep in mind is that it uses a list of predefined TLDs. When new TLDs are created, the regexp needs to be updated, otherwise it will flag the new TLDs as invalid. On the other hand, it's neat to be able to catch typos in domain names such as "@xx.infos" where it should have been "@xx.info".

So don't forget to add to the list whenever a new TLD comes along (longer than two characters).
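One way to make that update less painful could be to keep the TLDs in an array and build the TLD part of the pattern from it. This is only a sketch: `knownTlds` is a short, incomplete list chosen for illustration, and `buildTldRegex` is a hypothetical helper, not part of the regexp above.

```javascript
// Illustrative, incomplete list (an assumption, not the list used in this post).
const knownTlds = ['com', 'org', 'net', 'info', 'museum', 'travel', 'mobi'];

// Accepts any two-letter TLD (country codes) or one of the listed longer ones.
function buildTldRegex(tlds) {
  return new RegExp('\\.(?:[a-z]{2}|' + tlds.join('|') + ')$', 'i');
}

const tldRegex = buildTldRegex(knownTlds);
console.log(tldRegex.test('xx.info'));   // listed TLD
console.log(tldRegex.test('xx.infos'));  // typo, not in the list

// Adding a new TLD is then a one-line change to the list.
const updatedTldRegex = buildTldRegex(knownTlds.concat('jobs'));
console.log(updatedTldRegex.test('domain.jobs'));
```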

If you don't want to update your old code, then change:

to :

This works, but it obviously adds a false negative in the unit test, since "moc" does not really exist on the net.
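A minimal sketch of that trade-off, using two hypothetical TLD patterns (assumptions for illustration, not the exact regexp from this post): a strict one with a fixed list and a relaxed one that accepts any run of two or more letters.

```javascript
// Strict: two-letter TLDs plus a short illustrative list.
const strictTld = /\.(?:[a-z]{2}|com|org|net|info)$/i;

// Relaxed: any TLD made of two or more letters, no list to maintain.
const relaxedTld = /\.[a-z]{2,}$/i;

for (const domain of ['domain.com', 'domain.photography', 'domain.moc', 'domain.c0m']) {
  console.log(domain, strictTld.test(domain), relaxedTld.test(domain));
}
```

The relaxed pattern accepts a TLD that is missing from the strict list, such as "photography", but it also lets "moc" through. Digits in the TLD are still rejected by both.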

3 Comments
  1. Hank said:

    Tuesday, September 11, 2012 at 21:18

    missingDot@com is actually valid, according to the standard. Top level domains can theoretically receive email. (See the wikipedia article for an easy to read reference: http://en.wikipedia.org/wiki/Email_address#Domain_part ). Other than this and the detail that the domain name part can be abbreviated, this is a really good regex. (See section 6.2.2 of the official specification: http://www.ietf.org/rfc/rfc0822.txt?number=822 ). Thanks!

  2. Archange said:

    Thursday, September 13, 2012 at 15:40

    You're right indeed. Although one can argue that when you make your own website, it's most probably impossible to have this form of email address for one of your users (unless on intranet). And the satisfaction to avoid the mistake for those non-top level users can be neat. Actually the form address@I.P.A.DDress is not really useful either in this case even though that can work + these years we'll have to make it IPv6 enable which will complexify the regexp ;)

  3. rmaksim said:

    Wednesday, October 24, 2012 at 18:32

    rmaksim@333.444.555.666 Valid!!! :(
