
Hacker Public Radio

HPR4293: HTTrack website copier software

N/A • 15 January 2025

This show has been flagged as Clean by the host.

The Wayback Machine by The Internet Archive is a very good resource for web sites no longer existing or older revisions of them.


However, sometimes I have also found it nice and useful to have my own copy of a web site. It means I have control over the copy, it can be accessed offline, and there is no world-wide wait for the page to load.


My most typical use case is web sites that I manage myself: for one reason or another, I want to keep a snapshot of the site. I have also used it for fact-based sites that I want to always have access to, like a reference book. One of my recent use cases was a magazine that had closed down and announced that its web site would also soon be terminated. Although it is available in the Wayback Machine, I wanted to have my own copy for a short period of time.


The software I use for this is HTTrack. It is available for Windows, Android, Linux and other Unix-like systems, and on at least some platforms it comes with a graphical user interface. I have myself only used HTTrack with the terminal interface on Linux. HTTrack is free and open source software.


In its simplest form, you just type "httrack" followed by the URL of the start page of the site to be copied.
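As a minimal sketch (the URL here is a placeholder, not a site from the episode), that looks like:

```shell
# Mirror a site into a subdirectory of the current directory.
# example.com is a placeholder; substitute the site you want to copy.
httrack https://www.example.com/
```

HTTrack then crawls the site, saves the pages and their assets to disk, and rewrites internal links so the copy can be browsed offline.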


In many cases this works well and I get a perfect copy; in other cases it works less well. First of all, of course, I do not copy very big websites, both because of the time it takes and the disc space required. What is stated in the robots.txt file can also affect the result. Another issue can be the folder structure of the site: in its default setup, HTTrack may not find all folders, for example where images are stored. I have also had cases where menus and links do not work normally, and I instead have to right-click to open the link.
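Some of these issues can be tuned with command-line options. As a hedged sketch (the URL and output directory are placeholders; check `man httrack` for your version), a run that ignores robots.txt rules and limits the crawl depth might look like:

```shell
# Sketch of a more customized run; adjust the values to your needs.
# -O   sets the output (mirror) directory
# -r3  limits the mirror to 3 link levels deep
# -s0  tells HTTrack never to follow robots.txt rules
httrack https://www.example.com/ -O ~/mirrors/example -r3 -s0
```

Ignoring robots.txt should of course only be done for sites you have the right to copy, such as your own.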


The HTTrack web site has quite a lot of information in its documentation, and it also has a forum. The terminal interface also provides good help for all the additional options available. For my usage, I have generally found that a simple first attempt at copying a site gives a perfect or good-enough result directly, without needing to research the details.


So, when I want to preserve snapshots of earlier releases of my own sites, or when I want an offline, preserved copy of an important site, I consider HTTrack an easy-to-use and yet powerful tool. I am aware that other similar tools exist, but this is the one I currently use.


HTTrack website copier website: https://www.httrack.com/

Provide feedback on this episode.
