Interview with John Hann, Creator of curl.js
In the world of JavaScript, John Hann is one B.A.M.F. His usual handle is unscriptable, but that's the last thing he should be called. John has created and contributed to many incredible JavaScript tools -- simply check out his GitHub page. This blog uses John's curl.js, an incredibly efficient, flexible JavaScript loader. I wanted to interview John about creating a loader: the pitfalls, the browser differences, and what's in store for the future.
Hello John! For those who don't know you, give us a quick introduction and let us know what you're working on.
Hi, I'm John. John Hann. "@unscriptable" on most of the interwebs. I've been writing Javascript since 1995. Like many, I wasn't enthused, at first. In 2005, I was comfortable enough with it to start appreciating its good parts and started coding in it exclusively.
Ha! There's a good story around that. I'll make it quick. At that time, I was running a boutique software development company. We were 12 employees at the peak in 2001, but had dwindled to 5 by 2005. Internet bubble: you know the story. Anyway, I announced to my staff that Javascript was the way of the future.
Hmmm. Let me back up for a sec. I should mention that I often prognosticated about software development trends and was usually right. For instance, the day I heard about C#, I predicted that it would eclipse all other Microsoft languages and told all my employees that they needed to learn it *now*. They all complied and we were in high demand for a long time.
However, when I predicted that Javascript was the next big thing, they all -- every last one of them -- shrugged and disagreed. I sold off the company and never looked back.
Anyways, by 2008, I had written three decent Javascript frameworks from scratch for various private projects and was irritated that most of the industry was still doing things that I considered archaic. Finally, in 2010, I decided to go open source. That's when cujo.js was conceived.
I started out by structuring cujo.js as an application framework on top of dojo. That seemed like the best way to get started: stand on the shoulders of giants. At the same time, I felt like I wasn't targeting the right community. After all, it was the jQuery-centric folks that needed the most guidance.
By happenstance, I found out one of the colleagues I admired the most was also toying with similar ideas. It was from discussions with Brian Cavalier later in 2010 that we discovered that we didn't want to create yet another framework at all. We wanted to build an "architectural framework" -- a collection of architectural tools that can work together or as individual libraries. More importantly, these tools must work with other popular libraries as well.
cujo.js, as we know it today, came to life in 2011. That's what I do now. I work with Brian and some other part-timers to make cujo.js more awesome every day. Increasingly, that's what I do at my day job at SpringSource, too. They're the best company I've worked with so far.
On weekends, I like to build things with my kids and post pictures of the results on Flickr.
You're a well known supporter of the AMD format. What led to your love of AMD? Why is AMD the best format for writing JavaScript?
I came to love AMD when I realized it was the first Javascript module format that wasn't tied to a specific library or company. Open source FTW!
Seriously. I had grown fond of dojo.require(), but was really wishing my code wasn't entangled with dojo. dojo was -- and still is -- one of the most awesome Javascript frameworks, but it just didn't feel right that my code was inextricably tied to it. AMD was the first module format -- and the only module format at that time -- that didn't entangle my code with a framework.
I'm going to go out on a tangent here, but I think it's important to mention: Javascript is a community of fanboys. There aren't many standards, no universal best practices, no de facto high-level frameworks like Java or C#. We've got no other choice but to rally around our favorite library or framework.
Furthermore, we're not overly educated. Many of us don't have computer science degrees. We don't even have engineering backgrounds. We're just hacks who love what we do. So when something powerful, yet simple comes along and suddenly blows our minds, WE JUST LOVE IT TO DEATH.
AMD did that for me. The idea that I could write modular code that was totally independent of any framework made me an instant fanboy.
So, why is AMD the best format for Javascript? Hmm... I guess it comes down to this: it's the simplest format I've seen that doesn't require a build step. It was designed for browsers. You can get started with AMD by just downloading an AMD loader and writing some code. Hit F5 or Cmd-R and see your first module load.
When you created curl.js, there were other loaders available. What was your inspiration behind creating curl.js?
Heh. Ok, so I'm a little bit competitive. Not overtly, but definitely competitive. When I first tried RequireJS, I thought it was really cool, but why was it so big???? At the time, RequireJS was pretty unstable, too. I'd rather rely on my own buggy code than somebody else's since I understand how to fix my own code.
I thought I could create an AMD loader that was smaller and faster. Turns out I was right.
To be fair, RequireJS has some legacy code in it. That adds some bloat. I was able to start from scratch. The first version of curl.js was about 3.5 KB gzipped when RequireJS was about 6 KB. Of course, RequireJS had way, way more features.
But the tiny size of curl.js motivated me. I obsessed over it. I vowed to never let it grow larger. Today, it's still around 3.5 KB and has a feature set similar to RequireJS's.
Again, to be fair, RequireJS seems very stable now and has an amazing test suite.
It also occurred to me that there needed to be multiple implementations of a standard in order to really be considered a standard. I felt that AMD needed to be larger than just RequireJS in order to be taken seriously.
What challenges did you encounter when starting development on curl.js? What problems did you not anticipate and how did you solve them?
CommonJS. I totally didn't know what the heck I was doing and didn't know *anything* about CommonJS modules -- or packages -- until more recently. Also: configuration. I still can't believe how many bytes of curl.js are used up trying to handle all the ways that users can configure curl.js. Now I know why people write unfriendly, unconfigurable APIs!
Oh. I guess what you're probably inquiring about is what browser-ish roadblocks I encountered? Whew! Lots. Browsers really didn't standardize script loading behavior until this year.
Luckily, browsers pretty much fall into two camps: IE and everything else. Wait, but then there's Opera which is somewhere in between, but not really even.
The trick to script loading is knowing the precise moment when a script has executed. In IE, you can detect the currently executing script by looping through all of the recently created script elements and sniffing for which of them has a readyState of "interactive". Of course, Opera misleads us and says some seemingly random script is "interactive", so we have to account for that.
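The IE technique described above can be sketched roughly as follows. This is an illustration only, not curl.js's actual code: `findInteractiveScript` is a hypothetical helper, and it assumes the loader keeps a list of the script elements it has inserted. It does not handle the Opera quirk mentioned above.

```javascript
// Hypothetical helper: given the script elements a loader has created
// (oldest first), return the one legacy IE reports as currently executing.
function findInteractiveScript(scripts) {
  // Walk backwards so the most recently created elements are checked first.
  for (let i = scripts.length - 1; i >= 0; i--) {
    if (scripts[i].readyState === "interactive") {
      return scripts[i];
    }
  }
  return null; // nothing is executing, or the browser doesn't report it
}
```

A loader would call something like this from inside `define()` to learn which script, and therefore which module id, the call belongs to.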
The standards-compliant browsers work a bit differently. They queue up the executing scripts and fire each script's onload event immediately *after* it executes. This requires an entirely different algorithm than IE, of course.
Error handling is a different matter. IE and Opera still don't fire an onerror event if a script element 404's. Luckily, we can detect if an AMD `define()` hasn't been called and throw a meaningful error anyways.
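One way to picture the "define() hasn't been called" check: remember the arguments of the most recent `define()` call, then claim them when the script's load event fires. The sketch below is a simplified assumption about how such bookkeeping could work; `createDefineTracker`, `define`, and `claim` are hypothetical names, not curl.js internals.

```javascript
// Hypothetical bookkeeping for anonymous define() calls.
function createDefineTracker() {
  let lastDefine = null; // arguments of the most recent define() call

  return {
    // Stand-in for the global define(): just remember what was passed.
    define(deps, factory) {
      lastDefine = { deps: deps, factory: factory };
    },
    // Call this from the script element's load handler.
    claim(moduleId) {
      const captured = lastDefine;
      lastDefine = null;
      if (!captured) {
        // The browser reported success, but no module was defined.
        // A 404 in a browser that never fires onerror ends up here.
        throw new Error("define() was not called for module: " + moduleId);
      }
      return { id: moduleId, deps: captured.deps, factory: captured.factory };
    }
  };
}
```

If a script 404s but the browser still reports a successful load, `claim()` finds nothing captured and can raise a meaningful error that names the module.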
CSS loading is a serious can of worms. curl.js treats CSS just like it does Javascript. You can load it (and wait for it) just like Javascript. The problem is that even browsers like Chrome and Firefox haven't had adequate support for onload and onerror handlers on link elements until very recently. The CSS-handling code is just atrocious. But it works.
What part of creating a JavaScript loader is easier than people may think?
It's not easy. None of it. As soon as you create anything that handles real-world code, that thing becomes complex. Every web developer on this planet wants to do things *their way*. Nobody ever does things quite like the next person.
Hmm.... there must be something in curl that's not overly complex. Thinking... thinking... Nope. Forget it. There's not a single line of code that hasn't cost me hours of testing, nail biting, or frustration. Seriously.
How big of a factor is the modern browser when creating a loader? Which browser was easiest, and which was the most difficult to accommodate?
Modern browsers are much better, of course. Firefox has been the easiest, by far. Chrome and Safari are next. IE and Opera still don't support basic error handling. In fact, they still falsely declare success if a script 404's. Brilliant.
Firefox always seemed to be diligent about script loading, even before Kyle Simpson -- the godfather of script loading -- joined Mozilla. Oh... link loading, too. They were the first to implement functional onload and onerror handlers for script *and* link elements. They were the first to support the async attribute on script elements, too. They also seemed to know that the sequencing of script evaluation and onload events needed to be predictable long before other browsers, if I recall correctly.
curl.js even works in Netscape 7 because of that. Hm... I haven't tested in Netscape 7 lately. YMMV.
Performance is an important part of any software component. What steps have you taken to make curl.js efficient and compact?
As I mentioned earlier, I've obsessed about code size since day one. That said, I think curl.js is in need of a diet. As soon as the next big features are released, I'll give it a look-over to see what I can trim.
Size isn't the only concern. I'm also obsessed with http performance. Maybe not so obsessed as John-David Dalton (he's nuts), but obsessed enough to not accept compromise.
One of the differences between curl.js and other loaders, say RequireJS, is that curl.js resolves its dependencies synchronously. In production, if you've properly concatenated your modules, sync resolution doesn't make a huge difference. However, during development -- when concatenation is burdensome and totally unnecessary -- the average 12ms delay caused by async resolution can make a huge difference. We were once working on a project that had 300+ modules. That's 300 http requests! We were waiting forever -- like over 30 seconds -- for the app to load in IE6. It was actually faster to run a build script to concatenate the modules and then load the single file into IE.
Ahhhh! I just remembered. That was another one of the reasons I wrote curl.js. RequireJS was timing out and giving up. Even when we set the timeout to 60 seconds, it still would puke. I was sure we could write a loader that didn't waste 12ms per module just sitting around. Little did I know that async module resolution was way easier than sync module resolution.
Timeouts are problematic, anyways. It's impossible to set a timeout that works across all browsers and for every connection speed. curl.js doesn't use one. curl.js doesn't need one.
Also: slow IE6 is slow no matter what you throw at it. We cut the un-concatenated load time in half with curl.js, but it was still 6 times slower than Firefox and Chrome.
How difficult was implementing the promise API for curl.js?
Well. Once I implemented promise-like behavior inside curl, it wasn't hard to implement it in the API. To be fair, curl.js doesn't implement the full CommonJS Promises/A standard. It's just promise-like. We've got another library, when.js, that is fully compliant and blazingly fast, too.
With the ability to set aliases, packages, and external module URLs, how difficult is path resolving when creating a loader?
Wow. Loaded question. Where to start. I've been meaning to write more documentation about this. I guess I'll first mention that AMD loader authors have concluded that it's important to think about two different stages in url resolution. First, you have to normalize the module's id. Then, you can resolve a url.
Id resolution requires a few steps. First, you have to reduce leading dots. For instance, if you're requiring a module that's two folders up from the current (parent) module, you've got two levels of double-dots to fold into the parent module's id. At this point, you've hopefully got no more leading dots. If you do have leading dots, then the module id is really a url path, and that's problematic, but I'll just skip over that for now.
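A minimal sketch of the dot-folding step, assuming module ids are plain slash-separated strings. `reduceLeadingDots` is a hypothetical name and this is not curl.js's implementation.

```javascript
// Fold the leading "./" and "../" segments of a child id into the parent's id.
function reduceLeadingDots(childId, parentId) {
  if (childId.charAt(0) !== ".") return childId; // not relative: nothing to do

  // Start from the parent's "folder": its id minus the last segment.
  const base = parentId.split("/").slice(0, -1);
  const parts = childId.split("/");
  let i = 0;
  for (; i < parts.length; i++) {
    if (parts[i] === ".") continue;     // "./" means "same folder"
    if (parts[i] === "..") {
      if (base.length === 0) break;     // ran out of parent segments
      base.pop();                       // "../" means "one folder up"
      continue;
    }
    break;                              // first non-dot segment
  }
  return base.concat(parts.slice(i)).join("/");
}

reduceLeadingDots("../../util/foo", "app/views/list/item"); // "app/util/foo"
reduceLeadingDots("../../x", "app/foo"); // "../x" (dots remain: really a url path)
```

The second call shows the problematic case: the id climbs above the top of the parent's id, so leading dots survive.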
Once you've removed all of the leading dots, you can perform id transforms. curl.js currently has two module id transforms: 1) a plugin id transform and 2) a package "main" module transform. Both of these types of ids have shortcut notation. curl checks to see if the module you're requesting is a shortcut for a plugin or a main module and expands them into their long forms.
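The two shortcut expansions might look something like this. The config shape here (`pluginPath`, and a `packages` map from package name to main module) and the function name `expandShortcuts` are assumptions made for illustration, not curl.js's internal structures.

```javascript
// Expand plugin shortcuts ("css!x") and package shortcuts ("when").
function expandShortcuts(id, config) {
  const bang = id.indexOf("!");
  if (bang >= 0) {
    const plugin = id.slice(0, bang);
    const resource = id.slice(bang); // keeps the "!"
    // A plugin name without a slash is treated as a shortcut.
    return plugin.indexOf("/") < 0
      ? config.pluginPath + "/" + plugin + resource
      : id;
  }
  // A bare package name is a shortcut for the package's main module.
  if (Object.prototype.hasOwnProperty.call(config.packages, id)) {
    return id + "/" + config.packages[id];
  }
  return id;
}

const shortcutConfig = {
  pluginPath: "curl/plugin",   // assumed plugin folder
  packages: { when: "when" }   // package name -> main module (hypothetical)
};
expandShortcuts("css!theme.css", shortcutConfig); // "curl/plugin/css!theme.css"
expandShortcuts("when", shortcutConfig);          // "when/when"
```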
Ok, so once you have a normalized id, you can look up the url path. curl.js uses a very fast, regex-driven algorithm that allows the dev to create increasingly specific url transforms. Basically, curl sorts the url transforms by the number of slashes in each. The more slashes, the higher the priority. curl.js uses this algorithm to search through the paths configuration to determine where you've put the module. Finally, curl appends the path to the base url and uses that to fetch the module.
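To make the "more slashes, higher priority" rule concrete, here is a simplified sketch. It uses plain prefix matching rather than the regex-driven approach described above, and `createPathResolver`, `joinUrl`, and the example paths are all hypothetical.

```javascript
// Join a path onto the base url, leaving absolute paths and full urls alone.
function joinUrl(baseUrl, path) {
  if (/^(\/|[a-z]+:)/i.test(path)) return path;
  return baseUrl.replace(/\/+$/, "") + "/" + path;
}

function createPathResolver(baseUrl, paths) {
  // Most specific mappings (most slashes) are tried first.
  const prefixes = Object.keys(paths).sort(
    (a, b) => b.split("/").length - a.split("/").length
  );
  return function resolveUrl(id) {
    for (const prefix of prefixes) {
      if (id === prefix || id.startsWith(prefix + "/")) {
        return joinUrl(baseUrl, paths[prefix] + id.slice(prefix.length));
      }
    }
    return joinUrl(baseUrl, id); // no mapping: the id is relative to the base url
  };
}

const resolveUrl = createPathResolver("/js/", {
  "app": "src/app",
  "app/widgets": "lib/widgets"
});
resolveUrl("app/widgets/grid"); // "/js/lib/widgets/grid"
resolveUrl("app/main");         // "/js/src/app/main"
```

Because `"app/widgets"` has more slashes than `"app"`, it wins for ids under `app/widgets`, while everything else under `app` falls back to the broader mapping.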
curl.js comes bundled with many plugins, allowing basic XHR requesting, CSS file loading, domReady callback execution, and more. Essentially you can load a complete UI widget, for example, within your module dependency array. How difficult was it to integrate the plugins, and do you have additional plugins you plan to include in the future?
James Burke designed a very simple plugin API consisting of one function. With a little help from Rawld Gill of dojo fame and myself, we finalized a complete, yet still simple, run-time plugin API that consists of only two functions and a property. James and Rawld have extended that API a bit to suit certain requirements. However, I've been able to do everything with the original API.
The major use cases for plugins are loading HTML templates with the text plugin and loading localization files with the i18n plugin. curl.js has two flavors of CSS plugin, too. Other people have created Coffeescript plugins, CommonJS plugins, and plugins for other compile-to-Javascript languages.
Our favorite pattern is -- like you said -- to create an entire UI component in a module. Javascript, CSS, HTML, localization files, etc. all in one folder. Lots of folks offer widgets, but the way you handle the Javascript and the CSS is so disjointed. When you can co-locate the Javascript and CSS together, you've got a truly portable widget. curl.js does that so well.
We've already got a good set of plugins. I think where we'll concentrate going forward is on transpilers. As of curl 0.8, we'll have full support for transpilers using the same old plugin API that regular plugins use. We call this concept "Compile to AMD" and it's pretty powerful. You just find or write a plugin that transpiles your preferred language -- Coffeescript, Haskell, Sybilant, TypeScript, whatever -- and tell curl.js that you want to use it to convert a set of modules to AMD. Other modules in your project don't need to know what language any others were written in. They're all converted to AMD either at run time or at build time, but you probably want to convert them at build time for production code.
This sure feels like the future!
What challenges are presented, from a code and logic standpoint, when accounting for loading both async and sync files within the same module?
Well, curl doesn't load files sync. I should say that *AMD* doesn't load files sync. You can write code that assumes a file will be loaded sync, but the AMD loader will detect that and pre-load the file asynchronously.
Since AMD was written for browsers, the AMD format just lets you write your code as if the dependencies are available synchronously. If you want to write in the CommonJS Modules style, there's a specific way to wrap your modules for this to work. I think James Burke calls it "Simplified CommonJS-wrapped modules". Just google that and you'll find some decent docs on it.
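A common way for a loader to "detect" sync-looking `require("...")` calls is to scan the factory function's source text before running it. The sketch below shows that idea on a module written in the wrapped CommonJS style; `findSyncRequires` and the module ids are hypothetical, and whether curl.js scans in exactly this way is an assumption.

```javascript
// Collect the ids of require("...") calls found in a factory's source text.
// Note: this simple pattern would also match calls inside comments.
function findSyncRequires(factory) {
  const source = factory.toString();
  const pattern = /\brequire\s*\(\s*(["'])([^"']+)\1\s*\)/g;
  const ids = [];
  let match;
  while ((match = pattern.exec(source)) !== null) {
    if (!ids.includes(match[2])) ids.push(match[2]);
  }
  return ids;
}

// A factory in the wrapped CommonJS style. In a real app it would be
// passed to define(); here it is only inspected, never executed.
const exampleFactory = function (require, exports, module) {
  const util = require("./util");
  const when = require("when");
  exports.run = function () { return when(util.load()); };
};

const found = findSyncRequires(exampleFactory); // ["./util", "when"]
```

With the ids in hand, the loader can fetch those modules asynchronously first, so that the `require()` calls inside the factory return immediately when it finally runs.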
curl.js actually has a way to load CommonJS modules without wrapping. It's an "experimental" feature that previews the "Compile to AMD" features coming in 0.8. It's awesome because you get the best of both worlds. I call it "experimental", but it works great today. It's just that the configuration settings will change.
What challenges did adding jQuery support present?
Well, James did all the leg work by getting the jQuery folks to support AMD, but the way they implemented it required a loader that resolves modules asynchronously. curl.js resolves modules sync, as I mentioned earlier. The first jQuery release with AMD support, 1.7, didn't account for sync resolution. Version 1.7.2 did. All subsequent versions work great with curl.
jQuery does something else that requires special note, though. They *name* their module. jQuery's define statement has a hard-coded module id in it. This makes it possible to use non-AMD build tools in conjunction with an AMD loader. I don't think anybody in the real world is actually doing this, but, oh well, we can deal with it.
The only way to handle named modules is to specify a path config for the module. In short, you absolutely have to specify a path mapping for jQuery in your AMD config. This isn't a big deal in my opinion since I think the developer should specify a path mapping to every package or library in their app, anyways. It can just trip up newbs.
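For example, a path mapping for jQuery might look like the following. The `baseUrl` and file location are hypothetical; the key `jquery` has to match the id that jQuery hard-codes in its own `define()` call.

```javascript
// Hypothetical config: adjust baseUrl and the file location to your project.
const jqueryConfig = {
  baseUrl: "/js",
  paths: {
    // jQuery names its own module "jquery", so map exactly that id.
    jquery: "vendor/jquery/jquery-1.7.2.min"
  }
};

// In a page where curl.js is loaded, this would be used roughly as:
// curl(jqueryConfig, ["jquery"], function ($) { /* use $ */ });
```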
Do you have any small but useful code snippets from the curl.js you'd like to share? (i.e. are there any edge feature detection snippets or "hacks" that some people wouldn't know?)
Oh gawd. The css plugin is chock full of hacks and edge cases. I think the best one is the method we're using to avoid the 31-stylesheet limit in IE6-9. This method also provides onerror support since IE's link elements don't normally call onerror when a url 404's. Here's how it works:
First, a "collector" sheet is created. This stylesheet will be used to collect the first 31 stylesheets. We add an onload and an onerror handler to the collector sheet and insert the first requested stylesheet as an @import. The collector sheet will fire either the onload or the onerror handler when the imported sheet loads or fails. For some reason at this point, the onerror handler becomes non-functional, so we have to replace it -- and the onload handler -- before we try to load the next stylesheet.
We keep replacing handlers and inserting @imports until we reach the 31-sheet limit. At 31 sheets, we create a new collector sheet and start counting to 31 all over again.
The problem with this algorithm is that it can only load one sheet at a time. To get around this limitation, we create up to 12 simultaneous collector sheets. The css plugin uses a "round robin" technique so that up to 12 sheets may be loaded simultaneously. Since IE9's HTTP request limit is 12, this works out nicely.
If you're well-versed in CSS semantics, red lights are flashing and sirens are ringing in your head right now. A round robin rotation algorithm like this would surely screw up the CSS cascade. You'd be right -- if you were thinking of the behavior of *normal browsers*. IE is not a normal browser. Unlike all other browsers, the IE team interpreted the cascade differently. They decided that the *temporal* order decides the cascade preference. All other browsers decide cascade preference by the *DOM* order. When you put static link elements in your html page, temporal order and DOM order are the same so you've probably never noticed the difference.
In short, since we ensure that the CSS stylesheets are handled in their proper temporal sequence, it all works out. Legacy IE can load up to a total of 372 stylesheets using this algorithm, and it's pretty darn fast.
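The bookkeeping behind the round robin can be sketched without any DOM code. This is only the counting logic under the limits described above (12 collectors, 31 imports each); `createCollectorPool` and `assign` are hypothetical names, and the real plugin also has to create the sheets and swap handlers.

```javascript
// Decide which collector sheet each requested stylesheet goes into.
function createCollectorPool(maxCollectors = 12, sheetsPerCollector = 31) {
  const collectors = []; // collectors[i] is the list of urls imported by sheet i
  let count = 0;

  return {
    collectors: collectors,
    assign(url) {
      if (count >= maxCollectors * sheetsPerCollector) {
        throw new Error("stylesheet limit reached: " + count);
      }
      const index = count % maxCollectors; // round robin across collectors
      if (!collectors[index]) collectors[index] = [];
      collectors[index].push(url);
      count += 1;
      return index;
    }
  };
}

const pool = createCollectorPool();
pool.assign("a.css"); // 0
pool.assign("b.css"); // 1
```

Twelve collectors times 31 imports each gives the 372-sheet ceiling mentioned above.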
What features do you anticipate adding to curl.js in the near future?
Well, I mentioned the "Compile to AMD" feature. That's going to be hot.
The other major feature is the "Portable AMD Bundle" feature. curl.js's sister project, cram.js, will be able to concatenate modules into larger files. This isn't anything earth shattering if you're already familiar with RequireJS's build tool, r.js. However, there are a few twists. First, CSS may also be bundled into the file. Second, there will be a sensible way to break the files into logical chunks that we call "bundles". Lastly, the files should be loadable by even the dumbest of AMD loaders since they'll be compiled down to the least common denominator.
You could take these bundles and host them on a CDN somewhere, publish them on github, or just use them within your own organization. It won't matter that you used some of curl.js's super cool features to create the bundle, it should work just about anywhere.
Are there any tips you can provide for easier debugging with AMD modules?
Good point. Debugging async *anything* is hard. curl's debug module is useful for logging each module as it gets processed. But it's almost as easy to just watch the console and the network activity. Here are some things to watch out for:
- If a module 404'ed, take a look at the url the browser used. Did you use too many double-dot parent path navigations? Does it look like curl didn't apply a path mapping? Try fetching the module in the console by typing `curl(["<module id>"], console.log.bind(console));` and see what happens.
- If curl just fails silently and you're loading non-AMD javascript using the js plugin, try using the `exports=` feature of the js plugin. That feature provides explicit error feedback in all browsers.
- Create a testing harness and limit the problem scope. Tracking dozens of async things is crazy hard. Keep cutting the problem scope until you've got a handle on what's happening.
Other gotchas:
- Be careful not to try to use a global `require()` by accident. Unlike CommonJS environments, AMD environments don't automatically provide a context-sensitive `require()` (aka a "local require"). A global require can't figure out how to find relative dependencies, which can lead to serious WTF moments. By default, curl.js fails early and loudly if you've referenced the global require by accident since it doesn't declare a global `require()` at all (unless you tell it to). Be sure to always request a local require in your modules and don't declare a global require unless you are certain your project is in the 0.00001% of use cases that actually need a global require.
- Don't let urls creep into your module ids. As soon as you have urls in your module ids, your options for moving files become limited. The situation gets worse when you're concatenating your files into bundles.
There are two ways that urls creep into module ids. I mentioned the first one already. It happens when you try to navigate up too many levels.
define(["../../util/foo"], function (foo) { /* create something epic */ });
In general, using double dots in your application code is a code smell. The only time you should ever use double dots is to reference a related module within the same package. Highly modular third-party packages like dojo, wire.js, poly.js, etc. use double dots a lot. If you find you're using them in your web app, you should consider breaking your app into packages. You don't need to make them legitimate packages with a package.json; you just need to configure the loader to recognize that there's a logical organization of modules.
Actually, I think urls in general are problematic. Module ids are more flexible and are more in line with CommonJS and node.js patterns. I guess the take-away should be that you should use your AMD loader's path mapping and package mapping features. If your module ids look any more sophisticated than "myFoo" or "myPackage/foo" -- in other words, if they have a lot of slashes or double-dots -- you're probably playing with a footgun.
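As a rough illustration of "configure the loader to recognize a logical organization of modules", a package mapping could look like this. The package names and locations are hypothetical; the `name` / `location` / `main` shape follows the common AMD package config.

```javascript
// Hypothetical package config: modules are then referenced as
// "myPackage/foo" or "wire", never by url or by climbing with "../".
const packageConfig = {
  baseUrl: "/js",
  packages: [
    { name: "myPackage", location: "app/myPackage", main: "main" },
    { name: "wire", location: "lib/wire", main: "wire" }
  ]
};

// In a page where curl.js is loaded, this would be used roughly as:
// curl(packageConfig, ["myPackage/foo"], function (foo) { /* ... */ });
```

If the files move later, only the `location` values change; the module ids used throughout the app stay the same.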
Awesome interview. Very educative.
How do you use curl on Sony devices?
Thanks, David and John. Very helpful insight!