Scrape Images with wget

By  on  

The desire to download all images or video on the page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a python script I downloaded.  I then moved on to browser extensions for this task, then started using a PhearJS Node.js JavaScript utility to scrape images.  All of these solutions are nice but I wanted to know how I could accomplish this task from command line.

To scrape images (or any specific file extensions) from command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The script above downloads images across hosts (i.e. from a CDN or other subdomain) to the directory from which the command is run from.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)

Everyone loves cURL, which is another awesome resource, but don't foget about wget, which is arguably easier to use!

Recent Features

  • By
    Detect DOM Node Insertions with JavaScript and CSS Animations

    I work with an awesome cast of developers at Mozilla, and one of them in Daniel Buchner. Daniel's shared with me an awesome strategy for detecting when nodes have been injected into a parent node without using the deprecated DOM Events API.

  • By
    CSS @supports

    Feature detection via JavaScript is a client side best practice and for all the right reasons, but unfortunately that same functionality hasn't been available within CSS.  What we end up doing is repeating the same properties multiple times with each browser prefix.  Yuck.  Another thing we...

Incredible Demos

  • By
    MooTools PulseFade Plugin

    I was recently driven to create a MooTools plugin that would take an element and fade it to a min from a max for a given number of times. Here's the result of my Moo-foolery. The MooTools JavaScript Options of the class include: min: (defaults to .5) the...

  • By
    WebKit-Specific Style:  -webkit-appearance

    I was recently scoping out the horrid source code of the Google homepage when I noticed the "Google Search" and "I'm Feeling Lucky" buttons had a style definition I hadn't seen before:  -webkit-appearance.  The value assigned to the style was "push-button."  They are buttons so that...

Discussion

    Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!