Scrape Images with wget

By  on  

The desire to download all images or video on the page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a python script I downloaded.  I then moved on to browser extensions for this task, then started using a PhearJS Node.js JavaScript utility to scrape images.  All of these solutions are nice but I wanted to know how I could accomplish this task from command line.

To scrape images (or any specific file extensions) from command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The script above downloads images across hosts (i.e. from a CDN or other subdomain) to the directory from which the command is run from.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)

Everyone loves cURL, which is another awesome resource, but don't foget about wget, which is arguably easier to use!

Recent Features

  • By
    Regular Expressions for the Rest of Us

    Sooner or later you'll run across a regular expression. With their cryptic syntax, confusing documentation and massive learning curve, most developers settle for copying and pasting them from StackOverflow and hoping they work. But what if you could decode regular expressions and harness their power? In...

  • By
    5 Ways that CSS and JavaScript Interact That You May Not Know About

    CSS and JavaScript:  the lines seemingly get blurred by each browser release.  They have always done a very different job but in the end they are both front-end technologies so they need do need to work closely.  We have our .js files and our .css, but...

Incredible Demos

  • By
    CSS @supports

    Feature detection via JavaScript is a client side best practice and for all the right reasons, but unfortunately that same functionality hasn't been available within CSS.  What we end up doing is repeating the same properties multiple times with each browser prefix.  Yuck.  Another thing we...

  • By
    Use Custom Missing Image Graphics Using MooTools

    Missing images on your website can make you or your business look completely amateur. Unfortunately sometimes an image gets deleted or corrupted without your knowledge. You'd agree with me that IE's default "red x" icon looks awful, so why not use your own missing image graphic? The MooTools JavaScript Note that...

Discussion

    Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!