Using DOMDocument to Modify HTML with PHP
One of the first things you learn when implementing a service worker on a website is that the site requires SSL (an `https` address). Ever since I saw the blinding speed service workers can provide a website, I've been obsessed with readying my site for SSL. Enforcing SSL with `.htaccess` was easy -- the hard part is updating asset links in blog content. You start out feeling as though regular expressions will be the quick cure, but anyone who has experience with regular expressions knows that working with URLs is a nightmare and regex is probably the wrong decision.
The right decision is DOMDocument, a native PHP class that lets you work with HTML in a logical, pleasant fashion. You start by loading the HTML into a DOMDocument instance and then use its predictable methods to make things happen.
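// Formats post content for SSL
function format_post_content($content = '') {
    // loadHTML() rejects an empty string, so return early
    if ($content === '') {
        return $content;
    }
    $document = new DOMDocument();
    // Ensure UTF-8 is respected by using 'mb_convert_encoding'
    // (note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
    $document->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));
    // Point every img src on this domain at the https:// origin
    $tags = $document->getElementsByTagName('img');
    foreach ($tags as $tag) {
        $tag->setAttribute('src',
            str_replace('http://davidwalsh.name',
                'https://davidwalsh.name',
                $tag->getAttribute('src')
            )
        );
    }
    // Serialize the modified document back to an HTML string
    return $document->saveHTML();
}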
In my example above, I find all `img` elements and rewrite any `src` that points at `http://davidwalsh.name` to use `https://`. I will end up doing the same with `iframe` `src`, `a` `href`, and a few other rarely used tags. When my modifications are done, I call `saveHTML` to get the new string.
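
Extending the same idea to other tags could look something like the sketch below. This is only an illustration under a few assumptions: the function name `force_https_on_assets`, the `$host` parameter, and the tag-to-attribute map are hypothetical and not part of my actual code. It also swaps in `mb_encode_numericentity` for the `HTML-ENTITIES` conversion (deprecated as of PHP 8.2), silences libxml warnings about markup it doesn't recognize, and only rewrites URLs that start with the insecure origin instead of replacing the string anywhere in the attribute.

```php
<?php
// Hypothetical helper: the name, the $host parameter, and the tag map are
// illustrative assumptions, not taken from the original function.
function force_https_on_assets(string $content, string $host = 'davidwalsh.name'): string {
    // loadHTML() rejects empty input, so there is nothing to do here.
    if (trim($content) === '') {
        return $content;
    }

    // Tag name => attribute holding the URL (assumed list; extend as needed).
    $targets = [
        'img'    => 'src',
        'iframe' => 'src',
        'a'      => 'href',
    ];

    // Turn non-ASCII characters into numeric entities so loadHTML() reads
    // UTF-8 content correctly (stand-in for the 'HTML-ENTITIES' conversion).
    $encoded = mb_encode_numericentity($content, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $document = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $insecure = 'http://' . $host;
    $secure   = 'https://' . $host;

    foreach ($targets as $tagName => $attribute) {
        foreach ($document->getElementsByTagName($tagName) as $element) {
            $value = $element->getAttribute($attribute);
            // Only rewrite URLs that begin with the insecure origin.
            if (strpos($value, $insecure) === 0) {
                $element->setAttribute($attribute, $secure . substr($value, strlen($insecure)));
            }
        }
    }

    return $document->saveHTML();
}
```

Calling `force_https_on_assets($post_content)` returns the rewritten markup. As with the function above, `saveHTML()` serializes the whole document, so the result includes the doctype and the `html` and `body` wrappers that `loadHTML()` adds around a fragment.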
Don't fall into the trap of trying to use regular expressions with HTML -- you're in for a future of failure. DOMDocument is lightweight and will make your code infinitely more maintainable.
So do you know if there is a performance hit with creating an element using this vs creating a string of html?
The right decision is skipping domain entirely if it isn't hosted on some subdomain (`/path/to/asset`), and skipping protocol if it is (`//example.com/path/to/asset`).
David, rather than str_replace all your (internal) `http://` strings with `https://` you should replace them with `//`; that way your links become protocol-agnostic, a more future-proof solution.
Why don't you use the `search-replace` function in WP-CLI?
Why not remove the protocol completely? `//davidwalsh.name/` would default to whatever protocol is used in the address bar.
I agree that `//` would be better but some RSS feed readers use `http`, others `https`. I'm asserting complete control.