I was checking my email after lunch and saw a message sitting in my Inbox with the subject: "REGEX".  Okay, I thought, what now?  Here's the email:

You know how to build regular expressions?

/^(2[0-3])|[01][0-9]:[0-5][0-9]$/

This is supposed to test for a valid time.

Can you tell me what the (2[0-3]) at the beginning does?

Also, how would I add on A/PM?

Let's discuss the email.  Do I know how to build regular expressions?  Sure.

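Can you tell me what the (2[0-3]) at the beginning does?  It matches a 2 followed by a single digit in the range 0 to 3 and captures the match in a group.  Specifically, it's looking for 20, 21, 22 or 23.  One catch: as written, the | splits the whole expression in two, so the ^ applies only to (2[0-3]) and the :[0-5][0-9]$ applies only to [01][0-9].  For the pattern to mean what was intended, both hour alternatives need to sit inside the group: ^(2[0-3]|[01][0-9]):[0-5][0-9]$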
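To see what the | actually does in the emailed pattern, here is a small JavaScript sketch.  The `grouped` variant is an assumption about the intent, not something from the email: it puts both hour alternatives inside one group so that the anchors and the minutes part apply to either one.

```javascript
// The pattern from the email, unchanged.
const emailed = /^(2[0-3])|[01][0-9]:[0-5][0-9]$/;

// Hypothetical fix: both hour alternatives share one group, so "^",
// ":MM" and "$" apply no matter which alternative matched.
const grouped = /^(2[0-3]|[01][0-9]):[0-5][0-9]$/;

console.log(emailed.test("23:99"));   // true: the left branch only needs "23" at the start
console.log(emailed.test("xx09:30")); // true: the right branch is anchored only at the end
console.log(grouped.test("23:99"));   // false
console.log(grouped.test("xx09:30")); // false
console.log(grouped.test("23:59"));   // true
console.log(grouped.test("09:30"));   // true
```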

Also, how would I add on A/PM?  Well, before the $ you'd want to add something like:

(\s*[AP]M)?

That matches any amount of whitespace after the last digit, including none, followed by AM or PM.  The trailing ? makes the whole group optional: it can occur 0 or 1 times.

Obviously AM/PM doesn't make sense in the context of 24-hour time, which is what the regex above is meant to validate.
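Here is a rough JavaScript sketch of the optional suffix in use.  Both patterns are assumptions for illustration: `withSuffix` places the suffix before the $ of a 24-hour pattern whose hour alternatives share one group, and `twelveHour` is a hypothetical 12-hour variant (hours 1 to 12, leading zero optional) where AM/PM makes more sense.

```javascript
// Hypothetical: the optional AM/PM group placed before "$" in a 24-hour pattern.
const withSuffix = /^(2[0-3]|[01][0-9]):[0-5][0-9](\s*[AP]M)?$/;

// Hypothetical 12-hour variant: hours 1 to 12, leading zero optional.
const twelveHour = /^(0?[1-9]|1[0-2]):[0-5][0-9](\s*[AP]M)?$/;

console.log(withSuffix.test("09:30"));    // true: the suffix is optional
console.log(withSuffix.test("09:30 AM")); // true
console.log(withSuffix.test("09:30PM"));  // true: \s* also allows no space
console.log(withSuffix.test("23:15 PM")); // true, even though it isn't a sensible time
console.log(twelveHour.test("23:15 PM")); // false
console.log(twelveHour.test("12:00 AM")); // true
console.log(twelveHour.test("9:05 pm"));  // false: case-sensitive without the i flag
```

If lowercase am/pm should be accepted too, the i flag (or [AaPp][Mm]) is one way to get there.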

Now's the time to insert a quote on regex in an underhanded and humorous way:

Some people, when confronted with a problem, think
“I know, I'll use regular expressions.”   Now they have two problems.

If you're interested in the origin of that quote, as I was, you'll want to read this.  It was Jamie Zawinski if you can't be bothered with the link.

I had another email, about a month earlier, about using a RegularExpressionValidator in an ASP.NET application.  Here's the interesting bit:

ValidationExpression="[0-9][0-9]"

The question was something to the effect of: "This should validate that the field has two numbers, but it's accepting a89 and 99e as input.  Do you have any idea why?"  Sure, that regex is supposed to match 2 digits and it does, wherever they appear in the input.  If you only want to match 2 digits and nothing else, then you need to anchor the pattern with something like:

^\d{2}$
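The difference the anchors make is easy to see with a quick sketch.  This one is JavaScript and only illustrates how an unanchored pattern behaves when searching a string; it is not meant to reproduce how the ASP.NET validator applies its expression.

```javascript
const unanchored = /[0-9][0-9]/; // two digits anywhere in the input
const anchored = /^\d{2}$/;      // exactly two digits and nothing else

// Prints: input, unanchored result, anchored result.
for (const input of ["42", "a89", "99e", "7"]) {
  console.log(input, unanchored.test(input), anchored.test(input));
}
// 42 true true
// a89 true false
// 99e true false
// 7 false false
```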

I understand that, to some, regular expressions are mysterious voodoo, like bit math, but they're one of those things that keep popping up, and you'd be doing yourself a favor if you figured out the basic structure and rules.  Don't bother memorizing all the escape sequences (\s, \S, \d, \w, \b) because they're easy enough to find on a cheatsheet.  It doesn't help, at least when you're trying to decipher one, that there are so many ways to express the same intent.  Just look at the second expression above, [0-9][0-9].  That alone could be rewritten in any of the following forms:

  • \d\d
  • \d{2}
  • [0-9]{2}
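A quick way to convince yourself that these forms agree is to run them all against the same inputs.  The sketch below does that in JavaScript, where \d means exactly [0-9]; the sample inputs are made up, and the anchors are added so each form has to match the whole string.  One caveat worth knowing: in .NET, \d matches any Unicode decimal digit by default, so there the forms are only strictly equivalent for ASCII input or with RegexOptions.ECMAScript.

```javascript
// The original form plus the three rewrites, anchored for whole-string matching.
const forms = [/^[0-9][0-9]$/, /^\d\d$/, /^\d{2}$/, /^[0-9]{2}$/];

// Made-up sample inputs, some valid and some not.
const samples = ["00", "42", "99", "4", "123", "a9", "9a", ""];

for (const s of samples) {
  const results = forms.map((re) => re.test(s));
  const agree = results.every((r) => r === results[0]);
  console.log(JSON.stringify(s), results[0], agree ? "all agree" : "MISMATCH");
}
```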

After you have the basic structure, I think you need to find a decent tool that will let you evaluate regular expressions in your target language (the first regex was for JavaScript, the second was for .NET).  The one I keep going back to for .NET is The Regulator by Roy Osherove.  Here's an Ajax version for JavaScript: http://tools.netshiftmedia.com/regexlibrary/ 

You would be even better served by writing up some tests asserting how the validations should work, with examples of passing and failing inputs.  For one, they're more permanent than the tools.  They also allow you to add assertions based on changing requirements or bugs, and they help document regular expressions in code.  If you think it's hard to read straight code a few weeks or months later, even your own code, it's even harder to read a regular expression.  Even a slightly complicated regex without context could be a nightmare to debug a few months down the road; those tests will give your regex context.
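As a minimal sketch of what such tests could look like, here is a table of passing and failing inputs checked against a pattern in plain JavaScript.  The `time24` pattern, the `runCases` helper and the inputs are all hypothetical; in a real project the same table would more likely live in whatever test framework you already use.

```javascript
// Hypothetical pattern under test: 24-hour HH:MM.
const time24 = /^(2[0-3]|[01][0-9]):[0-5][0-9]$/;

// Each row: [input, should it be accepted?]
const cases = [
  ["00:00", true],
  ["09:30", true],
  ["23:59", true],
  ["24:00", false],  // hour out of range
  ["12:60", false],  // minute out of range
  ["9:30", false],   // hour must be two digits
  ["23:59 ", false], // trailing space
  ["", false],
];

// Returns the inputs whose actual result differs from the expected one.
function runCases(pattern, table) {
  const failures = [];
  for (const [input, expected] of table) {
    if (pattern.test(input) !== expected) failures.push(input);
  }
  return failures;
}

const failures = runCases(time24, cases);
console.log(failures.length === 0
  ? "all cases pass"
  : "failing inputs: " + JSON.stringify(failures));
```

When a bug report or a new requirement comes in, it becomes one more row in the table, and the table doubles as documentation for what the regex is supposed to accept.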

The title refers to the quote; I have no problems with regex.