Python html5lib Skipped Elements

By  on  

I've been working on some interesting python stuff at Mozilla and one task recently called for called for rending a page and then finding elements with a URL attribute value (like img[src] or a[href]) and ensuring they become absolute URLs.  One problem I encountered when using html5lib was that LINK and IMG elements were being skipped when I tokenized the HTML.  After browsing through the html5lib source code, I found a variable called voidElements which included both LINK and IMAGE:

voidElements = frozenset((
    "base",
    "command",
    "event-source",
    "link",
    "meta",
    "hr",
    "br",
    "img",
    "embed",
    "param",
    "area",
    "col",
    "input",
    "source"
))

When I commented out those two elements, they were found upon next run of my routine, meaning their presence in the set were causing me problems.  Here's how I skirted the issue:

new_void_set = set()
for item in html5lib_constants.voidElements:
	new_void_set.add(item)
new_void_set.remove('link')
new_void_set.remove('img')
html5lib_constants.voidElements = frozenset(new_void_set)

Since voidElements is a frozenset, I couldn't simply remove LINK and IMG, so I needed to create a new frozenset without those elements.  Let me know if there's a more python-ish way of creating this frozen set.  In an event, delving into the deep recesses of html5lib paid off and I accomplished the goal!

Recent Features

  • By
    Responsive and Infinitely Scalable JS Animations

    Back in late 2012 it was not easy to find open source projects using requestAnimationFrame() - this is the hook that allows Javascript code to synchronize with a web browser's native paint loop. Animations using this method can run at 60 fps and deliver fantastic...

  • By
    Responsive Images: The Ultimate Guide

    Chances are that any Web designers using our Ghostlab browser testing app, which allows seamless testing across all devices simultaneously, will have worked with responsive design in some shape or form. And as today's websites and devices become ever more varied, a plethora of responsive images...

Incredible Demos

  • By
    Check All/None Checkboxes Using MooTools

    There's nothing worse than having to click every checkbox in a list. Why not allow users to click one item and every checkbox becomes checked? Here's how to do just that with MooTools 1.2. The XHTML Note the image with the ucuc ID -- that...

  • By
    iPhone Checkboxes Using MooTools

    One of the sweet user interface enhancements provided by Apple's iPhone is their checkbox-slider functionality. Thomas Reynolds recently released a jQuery plugin that allows you to make your checkboxes look like iPhone sliders. Here's how to implement that functionality using the beloved...

Discussion

  1. You could use list comprehension to filter the elements instead; something like:

    html5lib_constants.voidElements = frozenset([e for e in html5lib_constants.voidElements if e not in [“link”, “img”]])

    Sounds like it would be useful to add void element overriding as a feature to the library in the future, though.

Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!