Scrape Images with wget

By  on  

The desire to download all images or video on the page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a python script I downloaded.  I then moved on to browser extensions for this task, then started using a PhearJS Node.js JavaScript utility to scrape images.  All of these solutions are nice but I wanted to know how I could accomplish this task from command line.

To scrape images (or any specific file extensions) from command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The script above downloads images across hosts (i.e. from a CDN or other subdomain) to the directory from which the command is run from.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)

Everyone loves cURL, which is another awesome resource, but don't foget about wget, which is arguably easier to use!

Recent Features

  • By
    How to Create a Twitter Card

    One of my favorite social APIs was the Open Graph API adopted by Facebook.  Adding just a few META tags to each page allowed links to my article to be styled and presented the way I wanted them to, giving me a bit of control...

  • By
    Send Text Messages with PHP

    Kids these days, I tell ya.  All they care about is the technology.  The video games.  The bottled water.  Oh, and the texting, always the texting.  Back in my day, all we had was...OK, I had all of these things too.  But I still don't get...

Incredible Demos

  • By
    RealTime Stock Quotes with MooTools Request.Stocks and YQL

    It goes without saying but MooTools' inheritance pattern allows for creation of small, simple classes that possess immense power.  One example of that power is a class that inherits from Request, Request.JSON, and Request.JSONP:  Request.Stocks.  Created by Enrique Erne, this great MooTools class acts as...

  • By
    Do / Undo Functionality with MooTools

    We all know that do/undo functionality is a God send for word processing apps. I've used those terms so often that I think of JavaScript actions in terms of "do" an "undo." I've put together a proof of concept Do/Undo class with MooTools. The MooTools...

Discussion

    Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!