Match Special Letters with PHP Regular Expressions

By  on  

Regular expressions come with all sorts of peculiarities, one of which I recently ran into when creating a regex within PHP and preg_match.  I was trying to parse strings with the format "Real Name (:username)" when I ran into a problem I would see a lot at Mozilla:  my regular expression wasn't properly catching "special" or "international" letters, like à, é, ü, and the dozens of others.

My regular expression was using A-z in the real name matching piece of the regex, which I assumed would match special letters, but it did not:

preg_match(
  "/([A-Za-z -]+)?\s?\[?\(?:([A-Za-z0-9\-\_]+)\)?\]?/", 
  "Yep Nopé [:ynope]", $matches);

// 0 => '[:ynope]', 1 => 'Yep Nopé', 2 => 'ynope'

To match international letters, I needed to update my regular expression in two ways:

  • Change A-z to \pL within the matching piece
  • Add the u modifier makes the string treated as UTF-8

The updated regex would be:

preg_match(
  "/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u", 
  "Yep Nopé [:ynope]", $matches);

// 0 => 'Yep Nopé [:ynope]', 1 => 'Yep Nopé', 2 => 'ynope'

You can see my simple test bed here. If you're afraid that other characters might seep in, or don't trust \pL, you could list every special letter manually (i.e. [A-zàáâä....])

One of the nice parts of working at a truly global organization like Mozilla is that I'm exposed to many edge cases; in this case, a few special letters!

Recent Features

  • By
    CSS Filters

    CSS filter support recently landed within WebKit nightlies. CSS filters provide a method for modifying the rendering of a basic DOM element, image, or video. CSS filters allow for blurring, warping, and modifying the color intensity of elements. Let's have...

  • By
    Conquering Impostor Syndrome

    Two years ago I documented my struggles with Imposter Syndrome and the response was immense.  I received messages of support and commiseration from new web developers, veteran engineers, and even persons of all experience levels in other professions.  I've even caught myself reading the post...

Incredible Demos

  • By
    Telephone Link Protocol

    We've always been able to create links with protocols other than the usual HTTP, like mailto, skype, irc ,and more;  they're an excellent convenience to visitors.  With mobile phone browsers having become infinitely more usable, we can now extend that convenience to phone numbers: The tel...

  • By
    AJAX Page Loads Using MooTools Fx.Explode

    Note: All credit for Fx.Explode goes to Jan Kassens. One of the awesome pieces of code in MooTools Core Developer Jan Kassens' sandbox is his Fx.Explode functionality. When you click on any of the designated Fx.Explode elements, the elements "explode" off of the...

Discussion

  1. [A-z] doesn’t do what you seem to quite what you think it does. That character range includes the characters in the ASCII table between Z and a: [\]^_. It looks like you should be using [A-Za-z].

Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!