website fingerprinting

Next generation web scanner. Identify what websites are running.

Download whatweb-0.4.4.tar.gz
Latest Version 0.4.4, 29th June 2010
License GPLv2
Author urbanadventurer aka Andrew Horton from Security-Assessment.com

Introduction

Identify content management systems (CMS), blogging platforms, stats/analytics packages, JavaScript libraries, servers and more. When you visit a website in your browser, the transaction includes many unseen hints about how the webserver is set up and what software is delivering the webpage. Some of these hints are obvious, e.g. “Powered by XYZ”, and others are more subtle. WhatWeb recognises these hints and reports what it finds.

WhatWeb has over 160 plugins and needs community support to develop more. Plugins can identify systems that have had the obvious signs removed by looking for subtle clues. For example, a WordPress site might remove its meta generator tag, but the WordPress plugin also looks for “wp-content”, which is less easy to disguise. Plugins are flexible and can return any datatype: version numbers, email addresses, account IDs and more.
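As a rough illustration of the idea, the sketch below checks a page for one obvious clue (a meta generator tag) and one subtle clue (“wp-content”). It is plain Ruby and not WhatWeb's actual WordPress plugin; the method name wordpress_clues and the sample HTML are made up for this example.

```ruby
# Illustrative only: this is not the real WhatWeb WordPress plugin.
# Returns the list of clues found in an HTML body.
def wordpress_clues(body)
  clues = []
  # Obvious clue: a meta generator tag naming WordPress (easy for a site owner to remove).
  clues << "meta generator" if body =~ /<meta[^>]+name=["']generator["'][^>]+content=["']WordPress/i
  # Subtle clue: asset paths under wp-content, which are harder to disguise.
  clues << "wp-content" if body.include?("wp-content")
  clues
end

# A page with the generator tag removed still gives itself away through its asset paths.
hardened = '<html><head><link rel="stylesheet" href="/wp-content/themes/x/style.css"></head></html>'
p wordpress_clues(hardened)
```

A real plugin would usually combine several such clues and attach a certainty to each one.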

There are both passive and aggressive plugins. Passive plugins use information on the page, in cookies and in the URL to identify the system. A passive request is as lightweight as a simple GET / HTTP/1.1 request. Aggressive plugins guess URLs and request more files. Plugins are easy to write; you don’t need to know Ruby to make them.

Example Usage

Using WhatWeb on a handful of websites. (This is a screenshot of an older version)

[Screenshot: whatweb-examples]

Help

WhatWeb - Next generation web scanner.
Version 0.4.4 by urbanadventurer aka Andrew Horton from Security-Assessment.com
Homepage: http://www.morningstarsecurity.com/research/whatweb

Usage: whatweb [options] <URLs>

			Enter URLs or filenames. Use /dev/stdin to pipe HTML directly
--input-file=FILE, -i	Identify URLs found in FILE
--aggression, -a	1 passive - on-page
			2 polite - follow on-page links if in the extra-urls list (default)
			3 impolite - try extra-urls when plugin matches (smart, guess a few urls)
			4 aggressive - try extra-urls for every plugin (guess a lot of urls)
--recursion, -r		Follow links recursively. Only follows links under the path (default: off)
--depth, -d		Maximum recursion depth (default: 10)
--max-links, -m		Maximum number of links to follow on one page (default: 250)
--list-plugins, -l	List the plugins
--run-plugins, -p	Run comma delimited list of plugins. Default is to run all
--info-plugins, -I	Display information about a comma delimited list of plugins. Default is all
--example-urls, -e	Add example urls for each plugin to the target list
--colour=[WHEN],
--color=[WHEN]		control whether colour is used. WHEN may be `never', `always', or `auto'
--log-full=FILE		Log verbose output
--log-brief=FILE	Log brief, one-line output
--log-xml=FILE		Log XML format
--user-agent, -U	Identify as user-agent instead of WhatWeb/VERSION.
--max-threads, -t	Number of simultaneous threads identifying websites in parallel. Default is 25.
--no-redirect		Do not follow HTTP 3xx redirects.
--proxy, -t		 Set proxy hostname and port (default: 8080)
--proxy-user, -t	 Set proxy user and password
--help, -h		This help
--verbose, -v		Increase verbosity (recommended), use twice for debugging.
--version		Display version information.

Log Output

There are currently 3 types of log output. They are:
* Brief logging
* Full logging
* XML logging

Brief Logging

Example usage: whatweb --log-brief=b.log digg.com

http://digg.com [200] X-Powered-By[PHP/5.2.9-digg8], Cookies[1337,PHPSESSID,ccc], UncommonHeaders[keep-alive],
Title[Digg - The Latest News Headlines, Videos and Images], HTTPServer[Apache], Mailto,
Header-Hash[2df7eaaa4480f28013aaf48ae9266b84], MD5[24bc43e698e5d1388e836f5eee094fbe],
Footer-Hash[ca2ffbc939969a2246cde196f0fc4841], Div-Span-Structure[828d809947c3c760d41c720c9203993b]
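If you want to post-process a brief log, a line like the one above can be split into a URL, a status code and a set of plugin results. The following is a minimal sketch, assuming the format is exactly as in the digg.com example (the format is inferred from that example, not from a specification); parse_brief is a hypothetical helper, not part of WhatWeb.

```ruby
# Parse one line of brief output into a URL, status code and plugin hash.
# The line format is inferred from the digg.com example; it is an assumption.
def parse_brief(line)
  m = line.match(/\A(\S+) \[(\d+)\] (.*)\z/m)
  return nil unless m
  plugins = {}
  # Each entry is "Name" or "Name[value]"; values may contain commas.
  m[3].scan(/([\w-]+)(?:\[([^\]]*)\])?/) { |name, value| plugins[name] = value }
  { :url => m[1], :status => m[2].to_i, :plugins => plugins }
end

line = "http://digg.com [200] X-Powered-By[PHP/5.2.9-digg8], " \
       "Cookies[1337,PHPSESSID,ccc], HTTPServer[Apache], Mailto"
result = parse_brief(line)
puts result[:status]
puts result[:plugins]["Cookies"]
```

Plugins that report no value, such as Mailto here, end up with a nil value. A value that itself contains a closing square bracket would confuse this simple pattern.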

Full Logging

Example usage: whatweb --log-full=f.log digg.com

    Identifying: http://digg.com
	HTTP-Status: 200
	[["Cookies",
	  [{:probability=>100,
		:name=>"cookie names",
		:string=>["1337", "PHPSESSID", "ccc"]}]],
	 ["Div-Span-Structure",
	  [{:probability=>100,
		:name=>"div structure",
		:string=>"828d809947c3c760d41c720c9203993b"}]],
	 ["Footer-Hash",
	  [{:probability=>100,
		:name=>"hash",
		:string=>"ca2ffbc939969a2246cde196f0fc4841"}]],
	 ["HTTPServer",
	  [{:probability=>100, :name=>"server string", :string=>"Apache"}]],
	 ["Header-Hash",
	  [{:probability=>100,
		:name=>"hash",
		:string=>"2df7eaaa4480f28013aaf48ae9266b84"}]],
	 ["MD5",
	  [{:probability=>100,
		:name=>"page title",
		:string=>"455e6da4264cc6334b78a72c083ced77"}]],
	 ["Mailto",
	  [{:emails=>
		 ["?subject=Digg Story: Jennifer Aniston,wins the battle of the bikini with Model 23&body=I wanted to share this story with you: http://digg.com/d31RvOK?e\r\n --- \r\n\"Jennifer Aniston,wins the battle of the bikini with Model 23\"\r\nActresses peeled off to reveal a two-piece as they filmed romantic comedy Just Go With It in Hawaii.\r\n+156 people dugg this story."]
		:probability=>100,
		:name=>"mailto:"}]],
	 ["Title",
	  [{:probability=>100,
		:name=>"page title",
		:string=>"Digg - The Latest News Headlines, Videos and Images"}]],
	 ["UncommonHeaders",
	  [{:probability=>100, :name=>"headers", :string=>"keep-alive"}]],
	 ["X-Powered-By",
	  [{:probability=>100,
		:name=>"x-powered-by string",
		:string=>"PHP/5.2.9-digg8"}]]]

XML Logging

The XML logging is currently naive and may change. Please contact me if you have suggestions.

Example usage: ./whatweb --log-full=f.log --log-xml=x.log digg.com

	<target>
		<uri>http://digg.com</uri>
		<http-status>200</http-status>
		<plugin>
			<name>Cookies</name>
			<string>1337</string>
			<string>PHPSESSID</string>
			<string>ccc</string>
		</plugin>
		<plugin>
			<name>Div-Span-Structure</name>
			<string>828d809947c3c760d41c720c9203993b</string>
		</plugin>
		<plugin>
			<name>Footer-Hash</name>
			<string>ca2ffbc939969a2246cde196f0fc4841</string>
		</plugin>
		<plugin>
			<name>HTTPServer</name>
			<string>Apache</string>
		</plugin>
		<plugin>
			<name>Header-Hash</name>
			<string>2df7eaaa4480f28013aaf48ae9266b84</string>
		</plugin>
		<plugin>
			<name>MD5</name>
			<string>455e6da4264cc6334b78a72c083ced77</string>
		</plugin>
		<plugin>
			<name>Mailto</name>
		</plugin>
		<plugin>
			<name>Title</name>
			<string>Digg - The Latest News Headlines, Videos and Images</string>
		</plugin>
		<plugin>
			<name>UncommonHeaders</name>
			<string>keep-alive</string>
		</plugin>
		<plugin>
			<name>X-Powered-By</name>
			<string>PHP/5.2.9-digg8</string>
		</plugin>
	</target>
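The XML log can be read with any XML parser. Here is a minimal sketch using REXML, which ships with standard Ruby installs. It assumes the input is a single <target> element shaped like the example above; parse_target is a hypothetical helper name, and since the XML format may change, treat this as a starting point only.

```ruby
require "rexml/document"

# Shortened copy of the XML log entry shown above.
XML_LOG = <<~XML
  <target>
    <uri>http://digg.com</uri>
    <http-status>200</http-status>
    <plugin>
      <name>Cookies</name>
      <string>1337</string>
      <string>PHPSESSID</string>
      <string>ccc</string>
    </plugin>
    <plugin>
      <name>Mailto</name>
    </plugin>
  </target>
XML

# Turn one <target> element into a plain Ruby hash.
def parse_target(xml)
  target = REXML::Document.new(xml).root
  plugins = {}
  target.elements.each("plugin") do |plugin|
    name = plugin.elements["name"].text
    # A plugin may report zero, one or several <string> values.
    plugins[name] = plugin.elements.to_a("string").map(&:text)
  end
  { :uri => target.elements["uri"].text,
    :status => target.elements["http-status"].text.to_i,
    :plugins => plugins }
end

result = parse_target(XML_LOG)
puts result[:uri]
p result[:plugins]
```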

Plugins

Plugins are easy to make.
Matches are made with regular expressions, Google Hack Database queries, and custom Ruby code.
Each match carries a certainty: maybe (25), probably (75) or certain (100).

phpBB,0.2
phpFreeChat,0.1
phpMyAdmin,0.1
phpPgAdmin,0.1
phpSysInfo,0.1
phpinfo,0.1
uPortal,0.1

Aggressive Plugins

There are currently aggressive plugins for Joomla, phpBB, FluxBB, OSCommerce and Tomcat.
With the passive plugin we know that ardentcreative.co.nz is running Joomla version 1.5

Be careful when using aggressive plugins with recursive site crawling. WhatWeb has no understanding of a website as a whole; it currently treats each URL separately. It also has no caching, so if you use aggressive plugins with recursion you will fetch the same files multiple times.
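To see why the request count grows, here is a small back-of-the-envelope sketch. The page list and guessed URLs are made-up values, and it assumes the guessed URLs are the same for every crawled page; it only counts requests and does not touch the network.

```ruby
# Hypothetical values: pages found by the crawler and URLs an aggressive plugin guesses.
crawled_pages = ["/", "/about", "/contact"]
guessed_urls  = ["/README.txt", "/admin/", "/favicon.ico"]

# Without a cache, every crawled page triggers the same guesses again.
requests = crawled_pages.flat_map { |page| [page] + guessed_urls }

puts "requests sent: #{requests.size}"
puts "distinct URLs: #{requests.uniq.size}"
```

In this toy case 12 requests are sent for only 6 distinct URLs, and the gap widens with every extra page that recursion discovers.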

Writing Plugins

View the tutorial on writing WhatWeb plugins at www.morningstarsecurity.com/downloads/How-to-develop-WhatWeb-plugins-1.1.txt.

A typical plugin looks like this:

There are 3 levels to a plugin: simple matches, passive tests and aggressive tests. You don’t need to know Ruby to write plugins with simple matches. Passive and aggressive tests are written in Ruby.
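To give a feel for the first level, the sketch below evaluates a list of simple matches against a page body. It is self-contained Ruby rather than WhatWeb's plugin loader: the :name and :probability keys appear in this document, while the :regexp key, the simple_matches helper and the ExampleCMS patterns are assumptions made for illustration.

```ruby
# Assumed shape of a simple match: a hash with :name, :probability and :regexp.
# :name and :probability appear in this document; :regexp is an assumption here.
MATCHES = [
  { :name => "meta generator tag", :probability => 100,
    :regexp => /<meta name="generator" content="ExampleCMS/i },
  { :name => "powered by text", :probability => 75, :regexp => /Powered by ExampleCMS/i },
  { :name => "asset path", :probability => 25, :regexp => %r{/examplecms-assets/} }
]

# Return the matches whose regexp is found in the page body.
def simple_matches(body, matches)
  matches.select { |match| body =~ match[:regexp] }
end

body = '<p>Powered by ExampleCMS</p><img src="/examplecms-assets/logo.png">'
found = simple_matches(body, MATCHES)
found.each { |match| puts "#{match[:name]} (#{match[:probability]})" }
```

Because a simple match is only data, writing one needs no programming beyond a regular expression.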

If you port a GHDB match, use :ghdb. I usually rewrite GHDB matches as regular expressions, especially if they require inurl:.

Example:

# http://johnny.ihackstuff.com/ghdb?function=detail&id=1840
{:name=>"GHDB: \"Powered by Vsns Lemon\" intitle:\"Vsns Lemon\"",
:probability=>100,
:ghdb=>'"Powered by Vsns Lemon" intitle:"Vsns Lemon"'}

Note that GHDB queries are case insensitive, as Google queries are. Supported operators are intitle:, inurl: and filetype:.
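One way to picture how such a query could be checked against a fetched page is sketched below. This is an assumption about the general approach, not WhatWeb's implementation: ghdb_match? is a hypothetical helper, and it uses plain case-insensitive substring tests, which is cruder than Google's word-based matching.

```ruby
# Evaluate a GHDB-style query against a page, case-insensitively.
# Supports bare or quoted phrases plus intitle:, inurl: and filetype:.
def ghdb_match?(query, body:, title:, url:)
  terms = query.scan(/(?:(intitle|inurl|filetype):)?(?:"([^"]*)"|(\S+))/i)
  terms.all? do |operator, quoted, bare|
    phrase = (quoted || bare).downcase
    case operator && operator.downcase
    when "intitle"  then title.downcase.include?(phrase)
    when "inurl"    then url.downcase.include?(phrase)
    # filetype: compares the extension of the URL with any query string removed.
    when "filetype" then url.downcase.sub(/\?.*\z/, "").end_with?(".#{phrase}")
    else body.downcase.include?(phrase)
    end
  end
end

query = '"Powered by Vsns Lemon" intitle:"Vsns Lemon"'
puts ghdb_match?(query,
                 body:  "<p>powered by VSNS Lemon</p>",
                 title: "Vsns Lemon - Home",
                 url:   "http://example.com/")
```

Every term in the query must match for the page to count as a hit, which mirrors how the terms of a search query combine.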

Each plugin can access @body, @meta, @status and @base_uri variables.

Passive tests add matches to the m array. Each match is a hash containing the name of the match, its probability and more.
Full output returns the entire hash; Brief output returns just the match name, :version and :string.
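The following sketch shows the general shape of a passive test and of brief rendering. It is a stand-alone approximation, not WhatWeb code: ExamplePlugin and brief are hypothetical names, ExampleCMS is a made-up product, and the sketch assumes @meta is a hash of response headers with lower-case keys.

```ruby
# Hypothetical stand-in for a plugin; only the instance variable names come from the text.
class ExamplePlugin
  def initialize(body, meta, status, base_uri)
    @body, @meta, @status, @base_uri = body, meta, status, base_uri
  end

  # A passive test inspects what was already fetched and collects matches in m.
  def passive
    m = []
    # Assumption: @meta holds response headers keyed by lower-case name.
    if @meta["x-powered-by"].to_s =~ /ExampleCMS\/([\d.]+)/
      m << { :name => "x-powered-by header", :probability => 100, :version => $1 }
    end
    m << { :name => "login link", :probability => 75 } if @body.include?("/examplecms/login")
    m
  end
end

# Brief rendering keeps a short value per match; full output would print each whole hash.
def brief(plugin_name, matches)
  values = matches.map { |match| match[:version] || match[:string] }.compact
  values.empty? ? plugin_name : "#{plugin_name}[#{values.join(',')}]"
end

plugin = ExamplePlugin.new('<a href="/examplecms/login">Log in</a>',
                           { "x-powered-by" => "ExampleCMS/1.2" }, 200, "http://example.com/")
puts brief("ExampleCMS", plugin.passive)
```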

To discover the regular expressions to match against, wget about 20-30 examples into the tests/ folder. Be aware that some software can have dramatic variations between versions.
First view the META data and HTML of a few examples.
The find-common-stuff tool can help discover unexpected similarities in the examples.
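One simple way to look for similarities by hand is to keep only the lines that occur in every sample. The sketch below does that for in-memory strings; it is not necessarily how find-common-stuff works, and common_lines and the sample pages are hypothetical. In practice the pages would be read from the files in tests/.

```ruby
# One possible approach (not necessarily how find-common-stuff works):
# keep only the lines that occur in every sample page.
def common_lines(pages)
  per_page = pages.map { |html| html.lines.map(&:strip).reject(&:empty?).uniq }
  # Intersect the line sets; an empty sample list yields an empty result.
  per_page.reduce { |shared, lines| shared & lines } || []
end

samples = [
  "<html>\n<link href=\"/x-assets/a.css\">\n<title>One</title>\n",
  "<html>\n<title>Two</title>\n<link href=\"/x-assets/a.css\">\n",
  "<html>\n  <link href=\"/x-assets/a.css\">\n<title>Three</title>\n"
]
puts common_lines(samples)
```

Lines shared by every sample, such as the stylesheet path here, are candidates for a regular expression, once generic lines like the html tag are ruled out.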

Recursive Spider

The recursion option is used to scan some or all of a website with whatweb. Recursive spidering will follow each link on a webpage if it is within the same website, then repeat the process on the followed pages.

The configurable settings for recursive spidering are:
--recursion, -r Follow links recursively. Only follows links under the path (default: off)
--depth, -d Maximum recursion depth (default: 3)
--max-links, -m Maximum number of links to follow on one page (default: 25)

Limitations of the spidering: it follows links in <a> tags only, the HTML tags designed specifically for links. The spider does not obtain URLs from other sources. Good candidates for future improvement are image tags, e.g. <img src="/images/boats.jpg">, form tags, e.g. <form action="/vote.php">, URL paths in CSS files, etc.
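The behaviour described in this section can be sketched in a few lines. WhatWeb itself uses the Anemone library for spidering, so this is a simplified stand-in and not its code. The SITE hash replaces real HTTP requests so the example runs offline, and the depth and max_links defaults are taken from the list in this section.

```ruby
require "uri"

# In-memory stand-in for a website so the sketch runs without network access.
SITE = {
  "http://example.com/blog/" =>
    '<a href="post1.html">1</a> <a href="/about">about</a> <img src="/images/boats.jpg">',
  "http://example.com/blog/post1.html" =>
    '<a href="/blog/post2.html">2</a> <a href="http://other.example/">elsewhere</a>',
  "http://example.com/blog/post2.html" => '<a href="/blog/">home</a>'
}

# Collect href values from <a> tags only, resolved against the page URL.
def links(page_url, html)
  html.scan(/<a\s[^>]*href=["']([^"']+)["']/i).flatten.map { |href| URI.join(page_url, href).to_s }
end

# Breadth-first crawl limited by depth, links per page, host and starting path.
def spider(start_url, depth: 3, max_links: 25)
  start = URI(start_url)
  seen = [start_url]
  queue = [[start_url, 0]]
  until queue.empty?
    url, level = queue.shift
    next if level >= depth
    links(url, SITE.fetch(url, "")).first(max_links).each do |link|
      uri = URI(link)
      # Stay on the same host and under the starting path.
      next unless uri.host == start.host && uri.path.start_with?(start.path)
      next if seen.include?(link)
      seen << link
      queue << [link, level + 1]
    end
  end
  seen
end

puts spider("http://example.com/blog/")
```

The /about link is skipped because it is outside the starting path, the link to other.example is skipped because it is a different host, and the image URL is never seen because only <a> tags are parsed.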

Related Projects

WhatWeb is unique; however, there are some web projects with the same goal of identifying what a website is running.

WAFP – Web Application Finger Printing
WAFP identifies systems by requesting a large quantity of URLs and comparing MD5 sums of the results against a database. This method is reliable for known systems in the database and it is simple to add new ones. Unlike WhatWeb, this method is intrusive and will create a lot of webserver log entries.
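The checksum approach can be sketched as follows. This is not WAFP's code or database format; the release names, file contents and the identify helper are all hypothetical, and the "fetched" files are supplied as strings instead of being downloaded.

```ruby
require "digest/md5"

# Hypothetical reference files for two made-up releases.
KNOWN_RELEASES = {
  "ExampleCMS 1.0" => { "/readme.txt" => "ExampleCMS 1.0\n", "/js/app.js" => "var v = 1;\n" },
  "ExampleCMS 1.1" => { "/readme.txt" => "ExampleCMS 1.1\n", "/js/app.js" => "var v = 1;\n" }
}

# Database keyed by [path, md5]; a file shared by several releases maps to all of them.
DATABASE = Hash.new { |hash, key| hash[key] = [] }
KNOWN_RELEASES.each do |release, files|
  files.each { |path, content| DATABASE[[path, Digest::MD5.hexdigest(content)]] << release }
end

# Keep the releases that are consistent with every recognised file.
def identify(fetched)
  candidates = fetched.map { |path, body| DATABASE.fetch([path, Digest::MD5.hexdigest(body)], nil) }.compact
  candidates.reduce { |left, right| left & right } || []
end

fetched = { "/readme.txt" => "ExampleCMS 1.1\n", "/js/app.js" => "var v = 1;\n" }
p identify(fetched)
```

Each entry in fetched corresponds to one extra HTTP request, which is why this method leaves many more log entries than a single passive page load.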
http://www.mytty.org/wafp

Wappalyzer
This is the most similar project to WhatWeb.
A Firefox plugin that identifies sites using 1 regexp each. It only looks for obvious identifiers like meta generator tags, and sends all recognized URLs to a DB. Has nice icons.
https://addons.mozilla.org/en-US/firefox/addon/10229

w3af
http://w3af.sourceforge.net
Very slight overlap of features in the grep and discovery scripts section.

HTTPRecon
No feature overlap, fingerprints the HTTP Server
http://w3dt.net/tools/httprecon/

http://www.net-square.com/httprint/httprint_paper.html

http://www.darknet.org.uk/2007/09/httprint-v301-web-server-fingerprinting-tool-download/

Nmap version scan
Nmap shows some info about HTTP servers when using version scan, eg. nmap -sV -p80 treshna.com

THC’s Amap
This tool is an application fingerprint scanner which can identify an HTTP protocol server. It doesn’t identify types of HTTP servers.

What’s that web server running 1.0 (whatweb.exe)
This shares the same name and goal but is shit. It ONLY uses the HTTP Server string. For example 'Apache/2.0.55 (Ubuntu) PHP/5.1.2'.
http://www.spambutcher.com/whatweb.html

www.http-stats.com
Lots of info about HTTP server names

Funny & Unusual

Slashdot.org
X-Fry: You mean Bender is the evil Bender? I’m shocked! Shocked! Well not that shocked.

popurls.com
X-popurls-a: in the future every url will be popular for 1.5 seconds

reddit.com
HTTPServer: '; DROP TABLE servertypes; --

Notes

Version 0.3 Released at Kiwicon III (kiwicon.org), 2009.
Version 0.4, March 14th 2010
Version 0.4.1, April 28th 2010
Version 0.4.2, April 30th 2010
Version 0.4.3, May 24th 2010
Version 0.4.4, June 29th 2010

Credits

Written by urbanadventurer aka Andrew Horton from Security-Assessment.com
Homepage: http://www.morningstarsecurity.com
License: GPLv2

Anemone library (used for spidering) is written by Chris Kite
Homepage: http://anemone.rubyforge.org/
License: MIT

Community Plugins

Thank you to the following people who have contributed a plugin to WhatWeb.

Brendan Coles
Emilio Casbas
Louis Nyffenegger
Patrik Wallström

Thank you to Michal Ambroz for writing the Makefile and Man pages