Download with wget


Download entire websites easily

GNU Wget is a nice tool for downloading resources from the internet. The basic usage is wget url:

wget http://linuxreviews.org/

The power of wget is that you can download sites recursively, meaning you also get all pages (and images and other data) linked from the front page:

wget -r http://linuxreviews.org/
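Recursion follows links up to five levels deep by default, which can fetch far more than intended. The sketch below caps the depth with wget's -l (--level) option, which is not covered elsewhere in this how-to. The helper name fetch_shallow and its default of two levels are assumptions made for illustration only.

```shell
# fetch_shallow URL [DEPTH]: recursive download limited to DEPTH link levels.
# fetch_shallow is a hypothetical helper name; -l (--level) is a standard wget option.
fetch_shallow() {
    url=$1
    depth=${2:-2}   # assumed default of 2 levels instead of wget's default of 5
    wget -r -l "$depth" "$url"
}
```

For example, fetch_shallow http://linuxreviews.org/ 1 would fetch the front page and only the pages it links to directly.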

But many sites do not want you to download their entire site. To prevent this, they check how browsers identify themselves. Many sites refuse the connection or send a blank page if they detect that you are not using a web browser. You might get a message like:

Sorry, but the download manager you are using to view this site is not supported. We do not support use of such download managers as flashget, go!zilla, or getright

There is a very handy -U option for sites like this. Use

-U My-browser

to tell the site you are using some commonly accepted browser:

wget -r -p -U Mozilla http://www.stupidsite.com/restricedplace.html
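Some sites look at more than the first word of the identification string, so a complete string may work where a bare "Mozilla" does not. This is a minimal sketch using --user-agent, the long form of -U. The helper name fetch_as_browser, the USER_AGENT variable, and the sample Firefox string are assumptions for illustration; substitute whatever string your own browser sends.

```shell
# fetch_as_browser URL: recursive download that sends a full browser-style User-Agent.
# fetch_as_browser and USER_AGENT are hypothetical names; the string is only an example.
USER_AGENT=${USER_AGENT:-"Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"}

fetch_as_browser() {
    # Quote the variable so the spaces in the string stay inside one argument.
    wget -r -p --user-agent="$USER_AGENT" "$1"
}
```

You would call it as fetch_as_browser http://www.stupidsite.com/restricedplace.html.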

A web-site owner will probably get upset if you attempt to download his entire site using a simple

wget http://foo.bar

command. However, the web-site owner is much less likely to notice you if you limit the download transfer rate and pause between fetching files.

To reduce the risk of being manually added to a blacklist, the most important command line options are --limit-rate= and --wait= .

To pause 20 seconds between retrievals you should add

--wait=20

and to limit the download rate use something like

--limit-rate=20K

This option defaults to bytes per second; add K to set KB/s.

Example:

wget --wait=20 --limit-rate=20K -r -p -U Mozilla http://www.stupidsite.com/restricedplace.html
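If you use these options often, they can be wrapped in a small shell function so the polite settings are always applied. This is only a sketch: polite_mirror, WAIT_SECONDS and RATE_LIMIT are hypothetical names, and the defaults simply repeat the values from the example above.

```shell
# polite_mirror URL: recursive download that pauses between requests and caps bandwidth.
# polite_mirror, WAIT_SECONDS and RATE_LIMIT are hypothetical names chosen for this sketch.
WAIT_SECONDS=${WAIT_SECONDS:-20}   # seconds to pause between retrievals
RATE_LIMIT=${RATE_LIMIT:-20K}      # maximum download rate (K = kilobytes per second)

polite_mirror() {
    wget --wait="$WAIT_SECONDS" --limit-rate="$RATE_LIMIT" \
         -r -p -U Mozilla "$1"
}
```

Setting WAIT_SECONDS or RATE_LIMIT before the call changes the pause or the rate without editing the function.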

A very handy option that keeps wget from climbing into the folders above the folder you want to acquire is:

--no-parent

Use this to make sure wget does not fetch more than it needs to if you just want to download the files in a folder.
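As a rough illustration, --no-parent is normally combined with -r. The helper name fetch_folder and the example.com address in the note below are placeholders, not part of the original how-to.

```shell
# fetch_folder URL: download one directory and everything below it,
# without ascending to the parent directory.
# fetch_folder is a hypothetical helper name used only for this sketch.
fetch_folder() {
    wget -r --no-parent "$1"
}
```

A call might look like fetch_folder http://example.com/docs/manual/ (a placeholder address). Keep the trailing slash: over HTTP, wget relies on it to tell a directory from a file, so without it the last path component is treated as a file and its containing folder becomes the limit.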

Read the manual page for wget (man wget) to learn more about GNU Wget. The full official manual is published by the GNU project.

A Gnome front-end for wget is also available.

The original version of this how-to is available at http://linuxreviews.org/quicktips/wget/wget.en.pdf

Copyright (c) 2000-2004 Øyvind Sæther. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.
