Download the content of a website using wget
On Linux, there is a very practical utility for retrieving files from the web: wget.
**French version: Aspirer un site Internet avec wget**
Run from the command line with the right arguments, wget can fetch the content of a given website.
Download a webpage
To download a website and its internal links (staying on the same domain) without fetching parent pages, simply run the following command in a console:
wget -r -k -np http://www.example.com
The command takes the following arguments:

- `-r`: recursive download (wget follows the links it finds in each page)
- `-k`: convert the link paths so the website can be browsed locally
- `-np`: do not download parent pages
Source: Ubuntu-fr
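If you want a more faithful offline copy, GNU wget has a few more standard options that are commonly combined with the ones above. The command below is a sketch, with www.example.com standing in for the site you actually want to copy:

```bash
# Mirror a site for offline browsing:
#   -r        recursive download
#   -k        convert links for local viewing
#   -np       do not ascend to the parent directory
#   -p        also download page requisites (CSS, images, scripts)
#   -E        save pages with an .html extension
#   --wait=1  wait one second between requests to go easy on the server
wget -r -k -np -p -E --wait=1 http://www.example.com
```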
Download from a list of URLs
You can list the URLs you want wget to download in a file:
wget -i fichier.txt
The file must contain one URL per line.
To keep the server's directory tree structure, add the -x argument:
wget -x -i fichier.txt
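As a small sketch, assume fichier.txt lists two placeholder URLs; with -x, wget recreates the host and path hierarchy on disk:

```bash
$ cat fichier.txt
http://www.example.com/index.html
http://www.example.com/docs/guide.html

$ wget -x -i fichier.txt
# The files are saved under a tree mirroring each URL:
#   www.example.com/index.html
#   www.example.com/docs/guide.html
```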
On Windows
On Windows, if you use Git Bash, you can add wget as an extension:
- Go to the webpage: https://gist.github.com/evanwill/0207876c3243bbb6863e65ec5dc3f058#wget
- Follow the instructions to download the ZIP file.
- Copy wget.exe into the mingw64\bin folder of your Git Bash installation directory: for example, C:\Users\<username>\AppData\Local\Programs\Git\mingw64\bin
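Once wget.exe has been copied, open a new Git Bash window and check that it is found; wget --version is a standard way to verify the installation:

```bash
# Confirm Git Bash now finds wget on the PATH:
wget --version

# The Linux commands from this article then work unchanged, e.g.:
wget -r -k -np http://www.example.com
```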
Originally published at https://www.sliceo.com.