Match Special Letters with PHP Regular Expressions


Regular expressions come with all sorts of peculiarities, one of which I recently ran into while writing a regex for PHP's preg_match. I was trying to parse strings in the format "Real Name (:username)" when I hit a problem I'd seen a lot at Mozilla: my regular expression wasn't catching "special" or "international" letters like à, é, ü, and dozens of others.

My regular expression used A-Za-z in the real-name part of the pattern, which I assumed would match special letters, but it did not:

preg_match(
  "/([A-Za-z -]+)?\s?\[?\(?:([A-Za-z0-9\-\_]+)\)?\]?/", 
  "Yep Nopé [:ynope]", $matches);

// "é" is not in [A-Za-z -], so the match only starts at the space after it:
// 0 => ' [:ynope]', 1 => ' ', 2 => 'ynope'
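To see why the first pattern misses, it helps to look at byte offsets. Here's a small sketch that reruns the same pattern with PHP's PREG_OFFSET_CAPTURE flag. It assumes the script file is saved as UTF-8, so "é" is stored as two bytes:

```php
<?php
// Rerun the original (non-Unicode) pattern, capturing byte offsets.
// Assumes this file is saved as UTF-8, so "é" is the bytes 0xC3 0xA9.
$subject = 'Yep Nopé [:ynope]';
$pattern = '/([A-Za-z -]+)?\s?\[?\(?:([A-Za-z0-9\-\_]+)\)?\]?/';

preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE);

echo strlen('é'), "\n";   // 2: one letter, two bytes
echo $m[0][1], "\n";      // 9: the match starts at the space after "é"
var_export($m[1][0]);     // ' ': group 1 holds only that space
echo "\n";
```

Because neither byte of "é" falls inside the class, every attempt to start at "Yep" fails, and the real name never makes it into group 1.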

To match international letters, I needed to update my regular expression in two ways:

  • Change A-Za-z to \pL (any Unicode letter) in the character classes
  • Add the u modifier, which makes PCRE treat both the pattern and the subject string as UTF-8
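A quick way to convince yourself of both changes is to test single letters. This sketch compares the ASCII range with \pL, then shows what the u modifier does with a string that isn't valid UTF-8. The $letters list is just an illustrative sample, and the file is assumed to be saved as UTF-8:

```php
<?php
// Illustrative sample letters; assumes this file is saved as UTF-8.
$letters = ['a', 'à', 'é', 'ü', 'ß'];
$results = [];

foreach ($letters as $letter) {
    $results[$letter] = [
        preg_match('/^[A-Za-z]$/', $letter), // ASCII range, byte-oriented
        preg_match('/^\pL$/u', $letter),     // exactly one Unicode letter
    ];
    printf("%s  A-Za-z: %d  \\pL with u: %d\n", $letter, $results[$letter][0], $results[$letter][1]);
}

// With the u modifier, a subject that is not valid UTF-8 makes preg_match
// return false (not 0), and preg_last_error() reports the reason.
$latin1  = "Nop\xE9";                          // "Nopé" encoded as Latin-1
$badUtf8 = preg_match('/\pL+/u', $latin1);
var_dump($badUtf8);                                   // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);  // bool(true)
```

That false return is worth checking for if your input might arrive in another encoding, since a plain `if (!preg_match(...))` treats it the same as "no match".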

The updated regex would be:

preg_match(
  "/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u", 
  "Yep Nopé [:ynope]", $matches);

// 0 => 'Yep Nopé [:ynope]', 1 => 'Yep Nopé ' (note the trailing space), 2 => 'ynope'
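If you want the name and username back as clean values, a small wrapper helps. parseUserLabel below is a hypothetical helper of my own naming, not a PHP built-in; it reuses the updated pattern and trims the trailing space that group 1 picks up, since the space character is part of [\pL -]:

```php
<?php
// Hypothetical helper, not a PHP built-in: returns the real name and the
// username from labels like "Real Name (:username)" or "Real Name [:username]".
function parseUserLabel(string $label): ?array
{
    $pattern = '/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u';

    // 1 = match, 0 = no match, false = error (e.g. $label is not valid UTF-8).
    if (preg_match($pattern, $label, $m) !== 1) {
        return null;
    }

    return [
        // Group 1 is optional and greedy; it can be '' or end with a space.
        'name'     => trim($m[1]),
        'username' => $m[2],
    ];
}

var_export(parseUserLabel('Yep Nopé [:ynope]'));
echo "\n";
var_export(parseUserLabel('Zoë Müller (:zmuller)'));
echo "\n";
var_export(parseUserLabel('no username here'));   // NULL
echo "\n";
```

Returning null for both "no match" and "bad input" keeps the caller simple; if you need to tell them apart, check preg_last_error() instead.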

If you're afraid that other characters might seep in, or don't trust \pL, you could list every special letter manually (e.g. [A-Za-zàáâä...]). Keep the u modifier in that case too, so each accented letter counts as one character rather than two bytes.
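Here's why the u modifier still matters for a hand-written list: without it, PCRE reads a class like [àáâä] as a set of individual bytes. A minimal sketch, assuming a UTF-8 source file (the four letters are just an illustrative subset, not a complete list):

```php
<?php
// Illustrative four-letter class, not a complete list of accented letters.
// Assumes this file is saved as UTF-8, so "à" is the two bytes 0xC3 0xA0.
$word = 'à';

$withoutU = preg_match('/^[àáâä]$/', $word);  // class of single bytes: 0
$withU    = preg_match('/^[àáâä]$/u', $word); // class of characters: 1

// Two lone 0xC3 bytes are not a letter (or valid UTF-8), yet the
// byte-oriented class accepts them; with u, preg_match reports an error.
$stray       = "\xC3\xC3";
$looseMatch  = preg_match('/^[àáâä]+$/', $stray);  // 1
$strictMatch = preg_match('/^[àáâä]+$/u', $stray); // false

var_dump($withoutU, $withU, $looseMatch, $strictMatch);
```

A quantified class like [A-Za-zàáâä]+ can appear to work without u, because it consumes an accented letter one byte at a time, but as the stray-byte case shows, it will also accept byte sequences that aren't letters at all.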

One of the nice parts of working at a truly global organization like Mozilla is that I'm exposed to many edge cases; in this case, a few special letters!


Discussion

  1. [A-z] doesn’t quite do what you think it does. That character range also includes the ASCII characters between Z and a: [\]^_`. It looks like you should be using [A-Za-z].
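The commenter's point is easy to check. This sketch loops over printable ASCII and collects the characters that [A-z] accepts but [A-Za-z] rejects:

```php
<?php
// Collect printable ASCII characters matched by [A-z] but not by [A-Za-z].
$extra = '';
for ($code = 32; $code < 127; $code++) {
    $char = chr($code);
    if (preg_match('/^[A-z]$/', $char) === 1 && preg_match('/^[A-Za-z]$/', $char) === 0) {
        $extra .= $char;
    }
}

echo $extra, "\n"; // [\]^_`
```

Those six characters sit at codes 0x5B through 0x60, between "Z" (0x5A) and "a" (0x61), which is exactly the gap a single A-z range spans.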
