Match Special Letters with PHP Regular Expressions


Regular expressions come with all sorts of peculiarities, one of which I recently ran into while writing a regex for PHP's preg_match. I was trying to parse strings in the format "Real Name (:username)" (or "Real Name [:username]") when I hit a problem I see a lot at Mozilla: my regular expression wasn't catching "special" or "international" letters like à, é, ü and dozens of others.

The real-name part of my regular expression used the character class A-Za-z, which I assumed would match special letters too. It did not:

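preg_match(
  '/([A-Za-z -]+)?\s?\[?\(?:([A-Za-z0-9\-\_]+)\)?\]?/',
  'Yep Nopé [:ynope]', $matches);

// 0 => ' [:ynope]', 1 => ' ', 2 => 'ynope'
// Neither byte of the UTF-8 "é" is in [A-Za-z -], so the first possible
// match starts at the space before "[:ynope]" and the real name is lost.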
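The root cause is that, without the u modifier, PCRE reads the subject one byte at a time, and "é" is two bytes in UTF-8. The minimal sketch below makes that visible by counting the same string with and without u. The variable names are only for illustration, and the string is built with PHP's "\u{...}" escape (PHP 7+) so the result does not depend on how the source file is saved.

```php
<?php
// "\u{e9}" is é; PHP stores it as the two UTF-8 bytes C3 A9.
$subject = "Yep Nop\u{e9}";

// Without /u, PCRE sees bytes: "Yep Nopé" is 9 units long.
$byteCount = preg_match_all('/./', $subject);
// With /u, é is a single character: 8 units.
$charCount = preg_match_all('/./u', $subject);

// Neither byte of é is in the ASCII class, so the whole name cannot match...
$asciiOnly = preg_match('/^[A-Za-z ]+$/', $subject);
// ...while \pL under /u accepts it.
$unicode = preg_match('/^[\pL ]+$/u', $subject);

printf("bytes=%d chars=%d ascii=%d unicode=%d\n",
    $byteCount, $charCount, $asciiOnly, $unicode);
// bytes=9 chars=8 ascii=0 unicode=1
```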

To match international letters, I needed to update my regular expression in two ways:

  • Change A-Za-z to \pL (any Unicode letter) in both character classes
  • Add the u modifier, which makes PCRE treat both the pattern and the subject string as UTF-8
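One side effect of the u modifier worth knowing: once PCRE treats the subject as UTF-8, it rejects malformed UTF-8 instead of guessing, and preg_match() returns false. The sketch below wraps both changes in a small check; isNameLike() is a hypothetical helper name, and the Latin-1 test string is just one example of bad input.

```php
<?php
// Hypothetical helper: is $s made only of letters (any script), spaces
// and hyphens? Returns null when $s is not valid UTF-8.
function isNameLike(string $s): ?bool
{
    $result = preg_match('/^[\pL -]+$/u', $s);
    if ($result === false) {
        // Under /u, malformed UTF-8 makes preg_match() fail outright;
        // preg_last_error() then reports PREG_BAD_UTF8_ERROR.
        return null;
    }
    return $result === 1;
}

var_dump(isNameLike("Yep Nop\u{e9}")); // bool(true)
var_dump(isNameLike("Yep Nop\xE9"));   // NULL (Latin-1 é, not valid UTF-8)
var_dump(isNameLike('R2-D2'));         // bool(false): digits are not \pL
```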

The updated regex would be:

preg_match(
  '/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u',
  'Yep Nopé [:ynope]', $matches);

// 0 => 'Yep Nopé [:ynope]', 1 => 'Yep Nopé ' (trailing space included), 2 => 'ynope'
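Wrapped into a reusable function, the updated pattern might look like the sketch below. parseNameAndUsername() is a hypothetical name, and the trim() call is an addition of mine: because the name class includes a space, group 1 also captures the space just before the bracket or parenthesis.

```php
<?php
// Hypothetical wrapper around the updated pattern. Accepts
// "Real Name (:username)" or "Real Name [:username]".
function parseNameAndUsername(string $input): ?array
{
    $pattern = '/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u';

    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example, invalid UTF-8 under /u).
    if (preg_match($pattern, $input, $m) !== 1) {
        return null;
    }

    return [
        // Group 1 is greedy and includes the space before the bracket.
        'name'     => trim($m[1] ?? ''),
        'username' => $m[2],
    ];
}

var_dump(parseNameAndUsername("Jos\u{e9} M\u{fc}ller (:jmueller)"));
```

Returning null rather than a partially filled array keeps "no username found" distinct from "username found, name empty".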

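You can see my simple test bed here. If you're worried that unwanted characters might seep in (\pL matches letters from every script, including Cyrillic, Greek and CJK), or you simply don't trust \pL, you could list every special letter manually, e.g. [A-Za-zàáâä....]. Keep the u modifier in that case too: without it, each accented letter in the class is split into its separate bytes.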
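To compare the two approaches, here is a rough sketch. The constant names are hypothetical, the explicit class is only an illustrative subset of accented letters (not a complete list), and the file is assumed to be saved as UTF-8 since the class contains literal accented characters. It also shows why A-z is a risky range to start such a list with.

```php
<?php
// Two ways to describe a name. The explicit class lists only an
// illustrative subset of accented letters; extend it for real use.
const EXPLICIT_NAME = '/^[A-Za-zàáâäçèéêëíîïñóôöúûü -]+$/u';
const ANY_LETTER    = '/^[\pL -]+$/u';

$samples = [
    "Jos\u{e9}",                     // José: accepted by both
    "\u{141}ukasz",                  // Łukasz: Ł is missing from the list
    "\u{418}\u{432}\u{430}\u{43d}",  // Иван: Cyrillic, only \pL accepts it
];

foreach ($samples as $name) {
    printf("%s explicit=%d any=%d\n", $name,
        preg_match(EXPLICIT_NAME, $name), preg_match(ANY_LETTER, $name));
}

// Pitfall: A-z spans ASCII 65-122, which also covers [ \ ] ^ _ and `.
var_dump(preg_match('/^[A-z]+$/', '_^`')); // int(1)
```

The explicit list gives tight control over what is accepted, at the cost of rejecting legitimate names whose letters you didn't think to include.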

One of the nice parts of working at a truly global organization like Mozilla is that I'm exposed to many edge cases; in this case, a few special letters!


Discussion

  1. [A-z] doesn't quite do what you think it does. That character range also includes the ASCII characters between Z and a: [, \, ], ^, _ and `. It looks like you should be using [A-Za-z].
