Python html5lib Skipped Elements
I've been working on some interesting Python stuff at Mozilla, and one recent task called for rendering a page, finding elements with a URL attribute value (like img[src] or a[href]), and ensuring they become absolute URLs. One problem I encountered when using html5lib was that LINK and IMG elements were being skipped when I tokenized the HTML. After browsing through the html5lib source code, I found a variable called voidElements which included both LINK and IMG:
voidElements = frozenset((
    "base",
    "command",
    "event-source",
    "link",
    "meta",
    "hr",
    "br",
    "img",
    "embed",
    "param",
    "area",
    "col",
    "input",
    "source"
))
When I commented out those two elements, they were found on the next run of my routine, meaning their presence in the set was causing my problem. Here's how I skirted the issue:
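# Assumed import; voidElements lives in html5lib's constants module
from html5lib import constants as html5lib_constants

# Copy the frozen set into a mutable set
new_void_set = set()
for item in html5lib_constants.voidElements:
    new_void_set.add(item)
# Drop the two elements that were being skipped
new_void_set.remove('link')
new_void_set.remove('img')
# Rebind the module-level name to a new frozenset
html5lib_constants.voidElements = frozenset(new_void_set)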
Since voidElements is a frozenset, I couldn't simply remove LINK and IMG, so I needed to create a new frozenset without those elements. Let me know if there's a more Pythonic way of creating this frozenset. In any event, delving into the deep recesses of html5lib paid off and I accomplished the goal!





You could use a list comprehension to filter the elements instead; something like:
html5lib_constants.voidElements = frozenset([e for e in html5lib_constants.voidElements if e not in ["link", "img"]])
Sounds like it would be useful to add void element overriding as a feature to the library in the future, though.
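Another option might be plain set difference, since a frozenset supports the - operator and returns a new frozenset. Here's a rough sketch of that idea. To keep it runnable without html5lib installed, it works on a stand-in copy of the set from the post; void_elements, skipped and remaining are just illustrative names, not anything from the library:

```python
# Stand-in copy of the voidElements set quoted in the post
void_elements = frozenset((
    "base", "command", "event-source", "link", "meta", "hr", "br",
    "img", "embed", "param", "area", "col", "input", "source",
))

# Elements we no longer want treated as void
skipped = {"link", "img"}

# Set difference on a frozenset yields a new frozenset;
# the original set is left untouched
remaining = void_elements - skipped

print(sorted(remaining))
```

With the real library, the same expression could presumably be assigned back the way the post does, i.e. html5lib_constants.voidElements = html5lib_constants.voidElements - {"link", "img"}.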