Scrape Images with wget

By  on  

The desire to download all images or video on the page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a python script I downloaded.  I then moved on to browser extensions for this task, then started using a PhearJS Node.js JavaScript utility to scrape images.  All of these solutions are nice but I wanted to know how I could accomplish this task from command line.

To scrape images (or any specific file extensions) from command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The script above downloads images across hosts (i.e. from a CDN or other subdomain) to the directory from which the command is run from.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)

Everyone loves cURL, which is another awesome resource, but don't foget about wget, which is arguably easier to use!

Recent Features

  • By
    5 Awesome New Mozilla Technologies You’ve Never Heard Of

    My trip to Mozilla Summit 2013 was incredible.  I've spent so much time focusing on my project that I had lost sight of all of the great work Mozillians were putting out.  MozSummit provided the perfect reminder of how brilliant my colleagues are and how much...

  • By
    fetch API

    One of the worst kept secrets about AJAX on the web is that the underlying API for it, XMLHttpRequest, wasn't really made for what we've been using it for.  We've done well to create elegant APIs around XHR but we know we can do better.  Our effort to...

Incredible Demos

  • By
    Duplicate DeSandro’s CSS Effect

    I recently stumbled upon David DeSandro's website when I saw a tweet stating that someone had stolen/hotlinked his website design and code, and he decided to do the only logical thing to retaliate:  use some simple JavaScript goodness to inject unicorns into their page.

  • By
    pointer Media Query

    As more devices emerge and differences in device interaction are implemented, the more important good CSS code will become.  In order to write good CSS, we need some indicator about device capabilities.  We've used CSS media queries thus far, with checks for max-width and pixel ratios.

Discussion

    Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!