Sorting Strings with Accented Characters

Strings can create a whole host of problems in any programming language. Whether it's a simple string, a string containing emojis or HTML entities, or one with accented characters, if we don't scrub our data or make the right string-handling choices, we can be in a world of hurt.

While looking through Joel Lovera's JSTips repo, I spotted a string case that I hadn't run into yet (...or I probably had, but didn't notice it): sorting strings with accented characters to get the desired outcome. The truth is that, by default, sort() compares strings by their UTF-16 code units rather than alphabetically, so accented characters land somewhere you might not expect:

// Spanish
['único','árbol', 'cosas', 'fútbol'].sort();
// ["cosas", "fútbol", "árbol", "único"] // bad order

// German
['Woche', 'wöchentlich', 'wäre', 'Wann'].sort();
// ["Wann", "Woche", "wäre", "wöchentlich"] // bad order
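Looking at the actual code unit values makes the bad order concrete. A small sketch; the firstCodeUnit helper is purely illustrative and prints a character alongside the UTF-16 code unit that a comparator-less sort() compares:

```javascript
// Illustrative helper (not from the original post): returns a string's first
// character together with its UTF-16 code unit in U+XXXX form.
function firstCodeUnit(word) {
  const hex = word.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
  return `${word[0]} U+${hex}`;
}

const spanish = ['único', 'árbol', 'cosas', 'fútbol'];

// Without a comparator, sort() orders by these code units.
const spanishUnits = [...spanish].sort().map(firstCodeUnit);
console.log(spanishUnits);
// ['c U+0063', 'f U+0066', 'á U+00E1', 'ú U+00FA']

// Uppercase ASCII comes before lowercase ASCII, and 'ä' comes before 'ö',
// which is why "Wann" and "Woche" are grouped ahead of "wäre" and "wöchentlich".
const germanUnits = ['W', 'w', 'ä', 'ö'].map(firstCodeUnit);
console.log(germanUnits);
// ['W U+0057', 'w U+0077', 'ä U+00E4', 'ö U+00F6']
```

Every accented lowercase letter shown here has a higher code unit than any unaccented ASCII letter, so a plain sort() pushes those words to the end.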

Yikes: accented characters don't simply follow their unaccented counterparts. By taking an extra step, namely comparing with localeCompare, we can ensure that our strings are sorted the way we likely wanted in the first place:

['único','árbol', 'cosas', 'fútbol'].sort(function (a, b) {
  return a.localeCompare(b);
});
// ["árbol", "cosas", "fútbol", "único"]

['Woche', 'wöchentlich', 'wäre', 'Wann'].sort(function (a, b) {
  return a.localeCompare(b);
});
// ["Wann", "wäre", "Woche", "wöchentlich"]
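localeCompare also accepts optional locales and options arguments. Passing a BCP 47 language tag pins the collation rules instead of relying on whatever default locale the browser or Node.js runtime happens to use. A minimal sketch, assuming an engine with Intl support; sortForLocale is a hypothetical helper, and 'es' and 'de' are chosen only to match the examples above:

```javascript
// Hypothetical helper: returns a sorted copy of `words` using the collation
// rules for `locale` (a BCP 47 tag such as 'es' or 'de').
function sortForLocale(words, locale) {
  // Copy first, because Array.prototype.sort sorts in place.
  return [...words].sort((a, b) => a.localeCompare(b, locale));
}

const es = sortForLocale(['único', 'árbol', 'cosas', 'fútbol'], 'es');
console.log(es);
// ['árbol', 'cosas', 'fútbol', 'único']

const de = sortForLocale(['Woche', 'wöchentlich', 'wäre', 'Wann'], 'de');
console.log(de);
// ['Wann', 'wäre', 'Woche', 'wöchentlich']
```

The tag can be more specific than a bare language; for instance, the German phonebook collation requested with 'de-u-co-phonebk' may order umlauts differently from plain 'de'.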

// Or even use Intl.Collator!
['único','árbol', 'cosas', 'fútbol'].sort(Intl.Collator().compare);
// ["árbol", "cosas", "fútbol", "único"]

['Woche', 'wöchentlich', 'wäre', 'Wann'].sort(Intl.Collator().compare);
// ["Wann", "wäre", "Woche", "wöchentlich"]
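MDN's documentation for localeCompare recommends creating an Intl.Collator when comparing large numbers of strings, such as when sorting large arrays. A sketch that builds each collator once and reuses its compare function (the compare getter returns a function already bound to its collator, so it can be passed to sort() directly). The variable names and sample labels are illustrative; the numeric: true option makes runs of digits compare by numeric value:

```javascript
// Build each collator once and reuse it across sorts.
const spanishCollator = new Intl.Collator('es');

// numeric: true compares digit sequences by value, so '2' sorts before '10'.
const numericCollator = new Intl.Collator('es', { numeric: true });

const words = ['único', 'árbol', 'cosas', 'fútbol'];
const labels = ['10', '2', '33', '1', 'árbol', 'Argentina', 'uma', 'último'];

const sortedWords = [...words].sort(spanishCollator.compare);
console.log(sortedWords);
// ['árbol', 'cosas', 'fútbol', 'único']

// Digit strings come first in numeric order, then words alphabetically,
// with accented letters sorted next to their unaccented counterparts.
const sortedLabels = [...labels].sort(numericCollator.compare);
console.log(sortedLabels);
// ['1', '2', '10', '33', 'árbol', 'Argentina', 'último', 'uma']
```

Without numeric: true the same collator would place '10' before '2', because it compares the digits character by character.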

Localization is already a big challenge without the added confusion that comes with accented characters.  Keep localeCompare and Intl.Collator in mind every time you want to sort strings!

Discussion

  1. The only problem is that this approach is slower. The solution is to use an Intl.Collator object, which speeds things up:

    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare

    It has other benefits too, such as support for numeric sorts (aka natural or human sorting), so that numbers are also sorted as humans expect, i.e. 10 comes after 2:

    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Collator

  2. Nifty, except that ["Wann", "wäre", "Woche", "wöchentlich"] is not the correct order when sorting in German. So it's not really internationalized; it's just sorting the way an English speaker would prefer. A native German speaker would be very confused if you sorted things this way.

    • Of course, if localeCompare is run in the browser and the user has a German language setting, it would be sorted as expected. My point is that it is dangerous to assume that all languages sort the same way.

    • Sébastien

      localeCompare accepts a locale argument, so you could pass 'de' to sort in German even if the browser is set to English.

  3. Gustavo Costa

    Is it possible to sort a mix of numbers (numerically, first) and accented or unaccented words (alphabetically, second)?

  4. Gustavo Costa

    Test with this:

    "3", "2", "10", "40", "6", "4", "30", "33", "1", "Gustavo", "julho", "Klaus", "keyboard", "último", "árbol", "uma", "água", "Argentina", "Ángelo", "argelino", "unido"
