Scrape Images with wget


The desire to download all of the images or video on a page has been around since the beginning of the internet.  Twenty years ago I would accomplish this task with a Python script I downloaded.  I then moved on to browser extensions, and later started using PhearJS, a Node.js utility, to scrape images.  All of these solutions are nice, but I wanted to know how I could accomplish this task from the command line.

To scrape images (or files with any specific extensions) from the command line, you can use wget:

wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off http://boards.4chan.org/sp/

The command above downloads images across hosts (i.e. from a CDN or other subdomain) into the directory from which the command is run.  You'll see downloaded media as they come down:

Reusing existing connection to s.4cdn.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1505 (1.5K) [image/jpeg]
Saving to: '1490571194319s.jpg'

1490571194319s.jpg 100%[=====================>] 1.47K --.-KB/s in 0s

2017-03-26 18:33:26 (205 MB/s) - '1490571194319s.jpg' saved [1505/1505]

FINISHED --2017-03-26 18:33:26--
Total wall clock time: 2.7s
Downloaded: 66 files, 412K in 0.2s (2.10 MB/s)
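
To make the flags easier to read, here is a minimal sketch that spells the same command out with GNU wget's long-form options and wraps it in a small shell function. The function name `scrape_images`, the optional destination-directory argument (passed via `--directory-prefix`), and the `DRY_RUN` variable are my own additions for illustration and are not part of the original one-liner.

```shell
# Hypothetical helper: scrape_images URL [DEST_DIR]
#
# Long-form equivalents of the short flags used above:
#   --no-directories     (-nd) do not recreate the remote directory tree
#   --span-hosts         (-H)  allow downloads from other hosts, e.g. a CDN
#   --page-requisites    (-p)  fetch the files the page needs to render
#   --accept LIST        (-A)  keep only files with these extensions
#   --execute robots=off (-e)  ignore robots.txt
#   --directory-prefix   (-P)  save files under DEST_DIR (added here as an
#                              assumption; defaults to the current directory)
scrape_images() {
  url="${1:-}"
  dest="${2:-.}"

  if [ -z "$url" ]; then
    echo "usage: scrape_images URL [DEST_DIR]" >&2
    return 1
  fi

  # Build the argument list once so it can be either printed or executed.
  set -- wget \
    --no-directories \
    --span-hosts \
    --page-requisites \
    --accept jpg,jpeg,png,gif \
    --execute robots=off \
    --directory-prefix "$dest" \
    "$url"

  if [ -n "${DRY_RUN:-}" ]; then
    # Dry run: print the command instead of hitting the network.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Calling `scrape_images http://boards.4chan.org/sp/ images` should behave like the original command but save into an `images` directory, and setting `DRY_RUN=1` first prints the assembled command without downloading anything. To grab other file types, change the `--accept` list.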

Everyone loves cURL, which is another awesome resource, but don't forget about wget, which is arguably easier to use!

