Scrape Images with wget

By  on  

The desire to download all images or video on the page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a python script I downloaded.  I then moved on to browser extensions for this task, then started using a PhearJS Node.js JavaScript utility to scrape images.  All of these solutions are nice but I wanted to know how I could accomplish this task from command line.

To scrape images (or any specific file extensions) from command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The script above downloads images across hosts (i.e. from a CDN or other subdomain) to the directory from which the command is run from.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)

Everyone loves cURL, which is another awesome resource, but don't foget about wget, which is arguably easier to use!

Recent Features

  • By
    9 Mind-Blowing WebGL Demos

    As much as developers now loathe Flash, we're still playing a bit of catch up to natively duplicate the animation capabilities that Adobe's old technology provided us.  Of course we have canvas, an awesome technology, one which I highlighted 9 mind-blowing demos.  Another technology available...

  • By
    How to Create a RetroPie on Raspberry Pi – Graphical Guide

    Today we get to play amazing games on our super powered game consoles, PCs, VR headsets, and even mobile devices.  While I enjoy playing new games these days, I do long for the retro gaming systems I had when I was a kid: the original Nintendo...

Incredible Demos

  • By
    jQuery topLink Plugin

    Last week I released a snippet of code for MooTools that allowed you to fade in and out a "to the top" link on any page. Here's how to implement that functionality using jQuery. The XHTML A simple link. The CSS A little CSS for position and style. The jQuery...

  • By
    Using Dotter for Form Submissions

    One of the plugins I'm most proud of is Dotter. Dotter allows you to create the typical "Loading..." text without using animated images. I'm often asked what a sample usage of Dotter would be; form submission create the perfect situation. The following...

Discussion

    Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!