Wget is a content retrieval application that is part of the GNU project. It supports the HTTP, HTTPS, and FTP protocols; however, it does not appear to support HTTP compression. Wget also supports the robots.txt exclusion standard, but many users seem to disable this feature and make Wget crawl sites even when robots.txt includes:
User-agent: Wget
Disallow: /
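
To make the effect of these two directives concrete, the following is a minimal sketch of how a compliant crawler might interpret them. It uses Python's standard urllib.robotparser module rather than Wget's own implementation, and the helper name is_allowed, the host example.org, and the agent strings are assumptions chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above: Wget is barred from the whole site.
ROBOTS_TXT = """\
User-agent: Wget
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if user_agent may fetch url under the given rules.

    Hypothetical helper for illustration; it parses the rules from a
    string, so no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.org is a placeholder host, not one taken from the text.
    print(is_allowed("Wget/1.21", "http://example.org/index.html"))  # False
    print(is_allowed("OtherBot", "http://example.org/index.html"))   # True
```

A crawler that honours the standard would skip every URL on the site when identifying itself as Wget, whereas a crawler with robots.txt handling disabled never performs this check at all.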