Scrape Images with wget

By  on  

The desire to download all images or video on the page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a python script I downloaded.  I then moved on to browser extensions for this task, then started using a PhearJS Node.js JavaScript utility to scrape images.  All of these solutions are nice but I wanted to know how I could accomplish this task from command line.

To scrape images (or any specific file extensions) from command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The script above downloads images across hosts (i.e. from a CDN or other subdomain) to the directory from which the command is run from.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)

Everyone loves cURL, which is another awesome resource, but don't foget about wget, which is arguably easier to use!

Recent Features

  • By
    Tips for Starting with Bitcoin and Cryptocurrencies

    One of the most rewarding experiences of my life, both financially and logically, has been buying and managing cryptocurrencies like Bitcoin, Litecoin, Ethereum. Like learning any other new tech, I made rookie mistakes along the way, but learned some best practices along the way. Check out...

  • By
    Welcome to My New Office

    My first professional web development was at a small print shop where I sat in a windowless cubical all day. I suffered that boxed in environment for almost five years before I was able to find a remote job where I worked from home. The first...

Incredible Demos

  • By
    Editable Content Using MooTools 1.2, PHP, and MySQL

    Everybody and their aerobics instructor wants to be able to edit their own website these days. And why wouldn't they? I mean, they have a $500 budget, no HTML/CSS experience, and extraordinary expectations. Enough ranting though. Having a website that allows for...

  • By
    Control Element Outline Position with outline-offset

    I was recently working on a project which featured tables that were keyboard navigable so obviously using cell outlining via traditional tabIndex=0 and element outlines was a big part of allowing the user navigate quickly and intelligently. Unfortunately I ran into a Firefox 3.6 bug...

Discussion

    Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!