Python html5lib Skipped Elements

By David Walsh on August 8, 2013

I've been working on some interesting Python stuff at Mozilla, and one recent task called for rendering a page, finding elements with a URL attribute value (like img[src] or a[href]), and ensuring those values become absolute URLs. One problem I encountered when using html5lib was that LINK and IMG elements were being skipped when I tokenized the HTML. After browsing through the html5lib source code, I found a variable called voidElements which included both LINK and IMG:

voidElements = frozenset((
    "base",
    "command",
    "event-source",
    "link",
    "meta",
    "hr",
    "br",
    "img",
    "embed",
    "param",
    "area",
    "col",
    "input",
    "source"
))

When I commented out those two elements, they were found on the next run of my routine, meaning their presence in the set was causing my problem. Here's how I skirted the issue:

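# Assumed import: html5lib_constants is an alias for html5lib's constants module
from html5lib import constants as html5lib_constants

# voidElements is a frozenset, so copy its items into a mutable set first
new_void_set = set()
for item in html5lib_constants.voidElements:
    new_void_set.add(item)
new_void_set.remove('link')
new_void_set.remove('img')
# Swap the trimmed set back in as a frozenset
html5lib_constants.voidElements = frozenset(new_void_set)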

Since voidElements is a frozenset, I couldn't simply remove LINK and IMG, so I needed to create a new frozenset without those elements. Let me know if there's a more Pythonic way of creating this frozenset. In any event, delving into the deep recesses of html5lib paid off and I accomplished the goal!
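
One arguably more Pythonic option is plain set difference: a frozenset supports the - operator and hands back a new frozenset, so no intermediate mutable set is needed. Here's a minimal sketch; it uses a local copy of the set shown earlier so it runs without html5lib installed, and the names void_elements and trimmed are just illustrative.

```python
# Local copy of the set shown earlier, so this sketch runs without html5lib.
void_elements = frozenset((
    "base", "command", "event-source", "link", "meta", "hr", "br",
    "img", "embed", "param", "area", "col", "input", "source",
))

# Set difference returns a new frozenset; the original is left untouched.
trimmed = void_elements - {"link", "img"}

# With html5lib itself, the equivalent assignment (an assumption that mirrors
# the snippet above, using the same html5lib_constants alias) would be:
# html5lib_constants.voidElements = html5lib_constants.voidElements - {"link", "img"}
```

The result is still a frozenset, so anything that expects the original type should keep working.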

Discussion

Darren Hickling
You could use a list comprehension to filter the elements instead; something like:

html5lib_constants.voidElements = frozenset([e for e in html5lib_constants.voidElements if e not in ["link", "img"]])

Sounds like it would be useful to add void element overriding as a feature to the library in the future, though.
