Tag Archives: parsers

Validating e-mail addresses

tl;dr: Most likely, you want to validate using the regular expression from the WhatWG (please think about the trade-off you want between practicality and precision); but if you read the caveats below and still want to validate to RFC 5322, then you want libemailvalidation.

Validating e-mail addresses is hard, and not something which you normally want to do in great detail: while it’s possible to spend a lot of time checking the syntax of an e-mail address, the real measure of whether it’s valid is whether the mail server on that domain accepts it. There is ultimately no way around checking that.

Given that a lot of mail providers implement their own restrictions on the local-part (the bit before the ‘@’) of an e-mail address, an address like !!@gmail.com (which is syntactically valid) probably won’t actually be accepted. So what’s the value in doing syntax checks on e-mail addresses? The value is in catching trivial user mistakes, like pasting the wrong data into an e-mail address field, or making a trivial typo in one.

So, for most use cases, there’s no need to bother with fancy validation: just check that the e-mail address matches the regular expression from the WhatWG. That should catch simple mistakes, accept all valid e-mail addresses, and reject some invalid addresses.

Why have I been doing further? Walbottle needs it — I think where one RFC references another is one of the few times it’s necessary to fully implement e-mail validation. In this case, Walbottle needs to be able to validate e-mail addresses provided in JSON files, for its email defined format.

So, I’ve just finished writing a small copylib to validate e-mail addresses according to all the RFCs I could get my hands on; mostly RFC 5322, but there is a sprinking of 5234, 5321, 3629 and 6532 in there too. It’s called libemailvalidation (because naming is hard; typing is easier). Since it’s only about 1000 lines of code, there seems to be little point in building a shared library for it and distributing that; so add it as a git submodule to your code, and use validate.c and validate.h directly. It provides a single function:

size_t error_position;

is_valid = emv_validate_email_address (address_to_check,
                                       length_of_address_to_check,
                                       EMV_VALIDATE_FLAGS_NONE,
                                       &error_position);

if (!is_valid)
  fprintf (stderr, "Invalid e-mail address; error at byte %zu\n",
           error_position);

I’ve had fun testing this lot using test cases generated from the ABNF rules taken directly from the RFCs, thanks to abnfgen. If you find any problems, please get in touch!

Fun fact for the day: due to the obs-qp rule, a valid e-mail address can contain a nul byte. So unless you ignore deprecated syntax for e-mail addresses (not an option for programs which need to be interoperable), e-mail addresses cannot be passed around as nul-terminated strings.