O’Reilly Jupyter Conference for Machine Learning is Coming!

By David Walsh on July 13, 2017

Machine learning is an exciting topic in the computer science world these days, which means that O'Reilly has you covered with a first-class conference. Jupyter Conference is coming this August, so I wanted to speak to O'Reilly's Director of Learning Group, Paco Nathan, about what machine learning and Jupyter are, why they're important, and what you can expect at this conference!

Here's a really good recent article, "What Is Jupyter?" by Mike Loukides, https://www.oreilly.com/ideas/what-is-jupyter which explains the essential story about Jupyter. In a nutshell, Jupyter provides a way for people to run code remotely in a particular environment.

One of the more popular ways this gets used is in Jupyter notebooks. Wolfram Research has used notebooks as a UI metaphor since the 1980s for their popular Mathematica product. In many ways, Jupyter notebooks draw inspiration from those.

If you think of how an Excel spreadsheet is organized, let's simplify that and stretch it out: instead we have cells arranged vertically down a web page. Each of the cells may have rich text (HTML), some image or video, source code, results from running the source code, etc. You run each, step by step, and the data flows from one to the next. Meanwhile, the code is being executed elsewhere: locally on your laptop, or in the cloud, or on some supercomputer at a large university. You control everything through the one web page, and only need to use your browser.
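To make the "data flows from one cell to the next" idea concrete, here is a minimal sketch in Python. It is not how Jupyter is implemented: a real kernel runs as a separate process and receives code from the browser, while this toy version runs everything in-process. The `cells` list and the `run_cells` helper are hypothetical names made up for illustration.

```python
# Hypothetical stand-in for a notebook: each "cell" is just a string of source code.
cells = [
    "data = [3, 1, 4, 1, 5, 9]",   # cell 1: load some data
    "total = sum(data)",           # cell 2: uses `data` from cell 1
    "mean = total / len(data)",    # cell 3: uses `total` and `data`
]

def run_cells(cells):
    """Run each cell in order against one shared namespace.

    The shared dictionary plays the role of the kernel's memory:
    names defined by an earlier cell stay visible to later cells.
    """
    namespace = {}
    for source in cells:
        exec(source, namespace)
    return namespace

state = run_cells(cells)
print(state["total"], state["mean"])
```

Running the cells in a different order would fail, because `total` depends on `data`. That ordering is what "step by step" means in a notebook.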

As an example, you may be a grad student working on a scientific computing problem, which you run from your laptop to get started, using a small data set. Eventually you run it on a large supercomputer to work with much larger data sets, still using the same code and the same notebook.

As another example, say you are a data scientist working within a team to surface insights about a line of business. You can use the notebook to encapsulate your notes and observations, the actual code used to pull the data, and the analysis along with the results it produces. Anyone else in your company could re-run the same work, as long as they have a URL for your notebook.

This notion of "repeatable science" has gained a lot of traction among scientists. The two leads on Project Jupyter are both physicists, and recently the discovery of gravitational waves (which may well lead to a Nobel Prize) was published as a Jupyter notebook: https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html

There's a declining trend for books in general, and textbooks in particular. Professors who would've written textbooks in earlier years today tend to use open source tools such as Jupyter to publish learning materials for their courses. These can be readily shared online.

O'Reilly Media, as a publisher, watches this trend carefully. We've leveraged Jupyter notebooks for what is called "computable content". In other words, people can visit a web page and learn to do some complex computation through hands-on coding synchronized with a video of the author explaining the concepts -- based on Jupyter. No software installation is required, just a browser. That's a huge boost for people taking courses, for example in tutorials at our conferences. Try out our "Oriole" tutorials showing computable content:

We see approaches such as Jupyter as the future of publishing and learning materials. Here's a talk that I did recently, describing more about Jupyter and some of our use cases at O'Reilly Media: https://dominodatalab.wistia.com/medias/ydax0dpjug

Machine learning is a different thing. That field traces back to early "cybernetics" and Control Theory research in the 1920s by Norbert Wiener, followed by a project Wiener sponsored at MIT during WWII for the first "artificial neural networks" by McCulloch and Pitts.

Advances throughout the 1970s-1990s set the stage for a point in the late 1990s: given the success of e-commerce, by late 1997 companies such as Amazon and eBay had "Big Data" available, along with the beginnings of what we now call cloud computing. Given the two vital elements together ("big data" and "big compute"), companies such as Amazon were able to productize Machine Learning into consumer services at mass scale, such as "People who bought this book also bought...", with its patent filed in 1998. Another company, Google, was still a research project at Stanford at the time, and famously used machine learning algorithms to help people search the web more effectively.
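As a rough illustration of the "People who bought this book also bought..." idea, here is a minimal sketch that simply counts how often two items appear in the same order. This is a plain co-occurrence count, not Amazon's patented method, and the `orders` data and function names are invented for the example.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets, invented for illustration only.
orders = [
    {"python_book", "stats_book"},
    {"python_book", "stats_book", "ml_book"},
    {"python_book", "ml_book"},
    {"cookbook", "stats_book"},
]

def also_bought(orders):
    """For each item, count how often every other item shares a basket with it."""
    co_counts = {}
    for basket in orders:
        # Each ordered pair (item, other) in a basket counts once for `item`.
        for item, other in permutations(sorted(basket), 2):
            co_counts.setdefault(item, Counter())[other] += 1
    return co_counts

def recommend(co_counts, item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(k)]

co_counts = also_bought(orders)
print(recommend(co_counts, "cookbook"))
```

With only a handful of orders the counts are too small to mean much, which is why the "big data" and "big compute" ingredients mattered: the same idea becomes useful once there are millions of baskets to count over.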

A few years later, Jonathan Goldman applied similar approaches at LinkedIn to create "People you may know", one of the early large-scale machine learning applications in social networks. Facebook, Twitter, Spotify, etc., followed in their path.

Microprocessors for many years doubled in speed every 18 months, due to what's called "Moore's Law" -- although that began to run out by the 2010s. Instead, researchers revisited advanced math and computation on GPUs, which had previously been popular for video games. One area where this was applied was neural networks -- a subset of machine learning -- and specifically stacked layers of neural networks in what's called Deep Learning. By 2012, Google, Facebook, and Microsoft had each funded research teams working on Deep Learning. That was followed by fantastic results circa 2015-2016 in Artificial Intelligence applications, such as speech recognition and translation. We say that three factors ("big data", "big compute", and "big models") allowed AI to become a commercial success.
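To show what "stacked layers" means in practice, here is a minimal sketch of a forward pass through two small layers, written in plain Python. The weights are hand-picked, hypothetical numbers; a real network would learn them from data, and real Deep Learning models use many more layers and run this arithmetic on GPUs through a library.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per unit, then a sigmoid."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs

# Hypothetical, hand-picked weights for illustration only.
layers = [
    # layer 1: 2 inputs -> 3 hidden units
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    # layer 2: 3 hidden units -> 1 output
    ([[0.7, -0.5, 0.2]], [0.05]),
]

def forward(inputs, layers):
    """Feed the output of each layer into the next one: the 'stacking'."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

output = forward([1.0, 2.0], layers)
print(output)
```

Each layer is just multiplications and additions repeated many times, which is the kind of work GPUs are good at.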

Jupyter notebooks are inherently well-suited for teaching how to work with machine learning, and AI in particular. Some of the better examples which O'Reilly has published come from Jake VanderPlas at U Washington: https://www.oreilly.com/people/89c9c-jake-vanderplas

Another excellent example is the aforementioned AI tutorial by Jon Bruner: https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners

I did a video called Just Enough Math -- which shows more details about the history of machine learning, in use cases suited for business executives. That, oddly enough, also uses Jupyter notebooks for the coding exercises :)

In terms of other introductory materials for getting started with machine learning, here are several on our Safari learning platform -- that requires a login, although people can sign up for free to get a trial membership:

Data Visualization	Scott Murray (O'Reilly Media)	https://www.safaribooksonline.com/learning-paths/learning-path-data/9781491987223/	Learning Path
Get Started with Natural Language Processing in Python	Paco Nathan (O'Reilly Media)	https://www.safaribooksonline.com/live-training/courses/get-started-with-natural-language-processing-in-python/0636920066279/	Live Online Training
Get Started with Natural Language Processing Using Python, Spark, and Scala	Jonathan Mugan (DeepGrammar)	https://www.safaribooksonline.com/library/view/learning-path-get/9781491985854/	Learning Path
Hands-On Machine Learning with Scikit-Learn and TensorFlow	Aurélien Géron (indep.)	https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/	Book
Hello, TensorFlow!	Aaron Schumacher (Deep Learning Analytics)	https://www.safaribooksonline.com/oriole/hello-tensorflow-oriole	Oriole
Getting Started with Deep Learning using Keras and Python	Mike Williams (Fast Forward Labs)	https://www.safaribooksonline.com/oriole/getting-started-with-deep-learning-using-keras-and-python	Oriole
Intro to Deep Learning Theory and Practice Featuring Keras	Adam Breindel (indep.)	https://www.safaribooksonline.com/live-training/courses/intro-to-deep-learning-theory-and-practice-featuring-keras/0636920086291/	Live Online Training
Machine Learning for Designers	Patrick Hebron (indep.)	https://www.safaribooksonline.com/library/view/machine-learning-for/9781491982754/	Live Online Training
Machine Learning and Security	David Freeman (LinkedIn), Clarence Chio (Shape Security)	https://www.safaribooksonline.com/library/view/machine-learning-and/9781491979891/	Learning Path
Data Mining, 4th Edition	Christopher Pal (École Poly. de Montréal), et al.	https://www.safaribooksonline.com/library/view/data-mining-4th/9780128043578/	Book
Practical Machine Learning Techniques for Building Intelligent Applications	Ben Lorica (O'Reilly Media)	https://www.safaribooksonline.com/library/view/practical-machine-learning/9781491961872/	Oriole
Deep Learning Models and Computer Vision with TensorFlow	Lucas Adams (Jet.com)	https://www.safaribooksonline.com/learning-paths/learning-path-deep/9781491995167/	Learning Path

You asked whether machine learning will replace developers. We've had a lot of related material at our recent conferences. The answers range from "No" to "Yes" to "AI certainly augments people":

Tim O'Reilly, using AI to create new jobs: https://www.oreilly.com/ideas/using-ai-to-create-new-jobs
Peter Norvig, head of research @ Google -- how software engineering with AI has changed substantially: https://www.safaribooksonline.com/library/view/oreilly-artificial-intelligence/9781491976289/video311928.html
Jason Laska, Clara Lab, about hybrid systems of people + machines: https://conferences.oreilly.com/artificial-intelligence/ai-ny/public/schedule/detail/59016
Adam Marcus, B12, about using AI to replace (some of) what web developers do: https://conferences.oreilly.com/artificial-intelligence/ai-ny/public/schedule/detail/59231

It's also interesting to note that when "DeepDive", a very popular AI project at Stanford University, recently needed to build a custom UI for people to use their work, they chose to use Jupyter notebooks: http://dawn.cs.stanford.edu/2017/05/08/snorkel/

I'll be giving a related talk at JupyterCon on Thu, Aug 24, which describes how my team uses Jupyter notebooks at O'Reilly Media for our AI work, to help people and machines collaborate:

Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale

https://conferences.oreilly.com/jupyter/jup-ny/public/schedule/detail/60058

I'm super thrilled about JupyterCon. It'll be *so great* to get these speakers, thinkers, innovators, all together in one place -- finally!!