Match Special Letters with PHP Regular Expressions

By  on  

Regular expressions come with all sorts of peculiarities, one of which I recently ran into when creating a regex within PHP and preg_match.  I was trying to parse strings with the format "Real Name (:username)" when I ran into a problem I would see a lot at Mozilla:  my regular expression wasn't properly catching "special" or "international" letters, like à, é, ü, and the dozens of others.

My regular expression was using A-z in the real name matching piece of the regex, which I assumed would match special letters, but it did not:

preg_match(
  "/([A-Za-z -]+)?\s?\[?\(?:([A-Za-z0-9\-\_]+)\)?\]?/", 
  "Yep Nopé [:ynope]", $matches);

// 0 => '[:ynope]', 1 => 'Yep Nopé', 2 => 'ynope'

To match international letters, I needed to update my regular expression in two ways:

  • Change A-z to \pL within the matching piece
  • Add the u modifier makes the string treated as UTF-8

The updated regex would be:

preg_match(
  "/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u", 
  "Yep Nopé [:ynope]", $matches);

// 0 => 'Yep Nopé [:ynope]', 1 => 'Yep Nopé', 2 => 'ynope'

You can see my simple test bed here. If you're afraid that other characters might seep in, or don't trust \pL, you could list every special letter manually (i.e. [A-zàáâä....])

One of the nice parts of working at a truly global organization like Mozilla is that I'm exposed to many edge cases; in this case, a few special letters!

Recent Features

  • By
    Facebook Open Graph META Tags

    It's no secret that Facebook has become a major traffic driver for all types of websites.  Nowadays even large corporations steer consumers toward their Facebook pages instead of the corporate websites directly.  And of course there are Facebook "Like" and "Recommend" widgets on every website.  One...

  • By
    Chris Coyier’s Favorite CodePen Demos

    David asked me if I'd be up for a guest post picking out some of my favorite Pens from CodePen. A daunting task! There are so many! I managed to pick a few though that have blown me away over the past few months. If you...

Incredible Demos

  • By
    MooTools Link Fading

    We all know that we can set a different link color (among other properties) on the hover event, but why not show a little bit more dynamism by making the original color fade to the next? Using MooTools 1.2, you can achieve that effect. The MooTools...

  • By
    Fancy Navigation with MooTools JavaScript

    Navigation menus are traditionally boring, right? Most of the time the navigation menu consists of some imagery with a corresponding mouseover image. Where's the originality? I've created a fancy navigation menu that highlights navigation items and creates a chain effect. The XHTML Just some simple...

Discussion

  1. [A-z] doesn’t do what you seem to quite what you think it does. That character range includes the characters in the ASCII table between Z and a: [\]^_. It looks like you should be using [A-Za-z].

Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!