Python html5lib Skipped Elements

By  on  

I've been working on some interesting python stuff at Mozilla and one task recently called for called for rending a page and then finding elements with a URL attribute value (like img[src] or a[href]) and ensuring they become absolute URLs.  One problem I encountered when using html5lib was that LINK and IMG elements were being skipped when I tokenized the HTML.  After browsing through the html5lib source code, I found a variable called voidElements which included both LINK and IMAGE:

voidElements = frozenset((
    "base",
    "command",
    "event-source",
    "link",
    "meta",
    "hr",
    "br",
    "img",
    "embed",
    "param",
    "area",
    "col",
    "input",
    "source"
))

When I commented out those two elements, they were found upon next run of my routine, meaning their presence in the set were causing me problems.  Here's how I skirted the issue:

new_void_set = set()
for item in html5lib_constants.voidElements:
	new_void_set.add(item)
new_void_set.remove('link')
new_void_set.remove('img')
html5lib_constants.voidElements = frozenset(new_void_set)

Since voidElements is a frozenset, I couldn't simply remove LINK and IMG, so I needed to create a new frozenset without those elements.  Let me know if there's a more python-ish way of creating this frozen set.  In an event, delving into the deep recesses of html5lib paid off and I accomplished the goal!

Recent Features

  • By
    Facebook Open Graph META Tags

    It's no secret that Facebook has become a major traffic driver for all types of websites.  Nowadays even large corporations steer consumers toward their Facebook pages instead of the corporate websites directly.  And of course there are Facebook "Like" and "Recommend" widgets on every website.  One...

  • By
    Create a Sheen Logo Effect with CSS

    I was inspired when I first saw Addy Osmani's original ShineTime blog post.  The hover sheen effect is simple but awesome.  When I started my blog redesign, I really wanted to use a sheen effect with my logo.  Using two HTML elements and...

Incredible Demos

  • By
    New MooTools Plugin:  ElementFilter

    My new MooTools plugin, ElementFilter, provides a great way for you to allow users to search through the text of any mix of elements. Simply provide a text input box and ElementFilter does the rest of the work. The XHTML I've used a list for this example...

  • By
    Create a Dynamic Flickr Image Search with the Dojo Toolkit

    The Dojo Toolkit is a treasure chest of great JavaScript classes.  You can find basic JavaScript functionality classes for AJAX, node manipulation, animations, and the like within Dojo.  You can find elegant, functional UI widgets like DropDown Menus, tabbed interfaces, and form element replacements within...

Discussion

  1. You could use list comprehension to filter the elements instead; something like:

    html5lib_constants.voidElements = frozenset([e for e in html5lib_constants.voidElements if e not in [“link”, “img”]])

    Sounds like it would be useful to add void element overriding as a feature to the library in the future, though.

Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!