Scrape Images with wget

By  on  

The desire to download all images or video on the page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a python script I downloaded.  I then moved on to browser extensions for this task, then started using a PhearJS Node.js JavaScript utility to scrape images.  All of these solutions are nice but I wanted to know how I could accomplish this task from command line.

To scrape images (or any specific file extensions) from command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The script above downloads images across hosts (i.e. from a CDN or other subdomain) to the directory from which the command is run from.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)

Everyone loves cURL, which is another awesome resource, but don't foget about wget, which is arguably easier to use!

Recent Features

  • By
    CSS 3D Folding Animation

    Google Plus provides loads of inspiration for front-end developers, especially when it comes to the CSS and JavaScript wonders they create. Last year I duplicated their incredible PhotoStack effect with both MooTools and pure CSS; this time I'm going to duplicate...

  • By
    Welcome to My New Office

    My first professional web development was at a small print shop where I sat in a windowless cubical all day. I suffered that boxed in environment for almost five years before I was able to find a remote job where I worked from home. The first...

Incredible Demos

  • By
    LightFace:  Facebook Lightbox for MooTools

    One of the web components I've always loved has been Facebook's modal dialog.  This "lightbox" isn't like others:  no dark overlay, no obnoxious animating to size, and it doesn't try to do "too much."  With Facebook's dialog in mind, I've created LightFace:  a Facebook lightbox...

  • By
    Translate Content with the Google Translate API and JavaScript

    Note:  For this tutorial, I'm using version1 of the Google Translate API.  A newer REST-based version is available. In an ideal world, all websites would have a feature that allowed the user to translate a website into their native language (or even more ideally, translation would be...

Discussion

    Wrap your code in <pre class="{language}"></pre> tags, link to a GitHub gist, JSFiddle fiddle, or CodePen pen to embed!