A Crawler-based Study of Spyware on the Web

Tuesday, January 31. 2006
NDSS'06 in February features an interesting paper that examines the amount of spyware on the World Wide Web. In the paper "A Crawler-based Study of Spyware on the Web", the authors describe their results from crawling the web for malicious content. The basic idea is to crawl the web and then analyze all captured binaries with the help of a VM and Ad-Aware. Moreover, they also examined web sites whose content exploits browser vulnerabilities. The abstract gives more detail on the amount of malware found:

Abstract:
Malicious spyware poses a significant threat to desktop security and integrity. This paper examines that threat from an Internet perspective. Using a crawler, we performed a large-scale, longitudinal study of the Web, sampling both executables and conventional Web pages for malicious objects. Our results show the extent of spyware content. For example, in a May 2005 crawl of 18 million URLs, we found spyware in 13.4% of the 21,200 executables we identified. At the same time, we found scripted “drive-by download” attacks in 5.9% of the Web pages we processed. Our analysis quantifies the density of spyware, the types of threats, and the most dangerous Web zones in which spyware is likely to be encountered. We also show the frequency with which specific spyware programs were found in the content we crawled. Finally, we measured changes in the density of spyware over time; e.g., our October 2005 crawl saw a substantial reduction in the presence of drive-by download attacks, compared with those we detected in May.

Unfortunately, they do not explain why their results drop in October compared to May. It would also be interesting to carry out such an analysis at a larger scale, perhaps in cooperation with a search engine like Google ("A Statistical Review of 1 Billion Web Pages")...
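To get a feeling for the absolute numbers behind the percentages in the abstract, here is a quick back-of-the-envelope calculation. It only uses the May 2005 figures quoted above; the variable names are mine, and the result is approximate because the published percentage is rounded.

```python
# Figures quoted in the abstract for the May 2005 crawl.
urls_crawled = 18_000_000
executables_found = 21_200
spyware_share = 0.134  # 13.4% of the executables contained spyware

# Approximate number of executables flagged as spyware.
infected_executables = round(executables_found * spyware_share)

# Fraction of crawled URLs that pointed to an executable at all.
executable_share = executables_found / urls_crawled

print(f"approx. infected executables: {infected_executables}")
print(f"executables per crawled URL:  {executable_share:.2%}")
```

The same calculation cannot be done for the drive-by download figure, since the abstract does not state how many Web pages were processed.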

Distribution of Filesize

Monday, January 30. 2006
The following picture shows the distribution of filesize in kilobytes for about 14,000 unique malware samples I have collected during the last few months. Uniqueness is defined in this context as "unique md5sum".

[Figure: Distribution of filesize]


As you can see, there are several spikes, mainly around 190 KB, 45 KB, and 10 KB. The picture only shows filesizes between 0 and 250 KB. nepenthes also captured some rather large bots (> 1 MB) - I wonder how long it takes to infect a computer on a modem line with such a large bot...
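For those who want to produce a similar distribution from their own collection, here is a minimal sketch. It assumes all samples sit in one directory; the function names, the 5 KB bin width, and the 250 KB cut-off are my own choices for illustration and not necessarily what was used for the picture. Uniqueness is decided by MD5 checksum, as described above.

```python
import hashlib
from collections import Counter
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def size_histogram(sample_dir: Path, bin_kb: int = 5, max_kb: int = 250) -> Counter:
    """Count unique samples (by MD5) per filesize bin.

    Keys are the lower bound of each bin in kilobytes.
    Samples of max_kb or larger are skipped.
    """
    seen = set()
    bins = Counter()
    for path in sorted(sample_dir.iterdir()):
        if not path.is_file():
            continue
        checksum = md5_of(path)
        if checksum in seen:
            continue  # identical content was already counted
        seen.add(checksum)
        size_kb = path.stat().st_size / 1024
        if size_kb < max_kb:
            bins[int(size_kb // bin_kb) * bin_kb] += 1
    return bins
```

Calling `size_histogram(Path("samples"))` on a (hypothetical) directory named `samples` returns a `Counter` that can be printed or handed to any plotting tool.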

If you are interested in samples, please contact me at thorsten [dot] holz [at] gmail.com

Blog.Worm

Thursday, January 26. 2006

Blog.Worm

Slides From 17th TF-CSIRT/FIRST Meeting

Tuesday, January 24. 2006
You can now download the slides from my talk about the German Honeynet Project at the 17th TF-CSIRT and FIRST joint event.

Effektives Sammeln von Malware mit Honeypots (Effective Malware Collection with Honeypots)

Saturday, January 21. 2006
(Sorry folks, the article is in German...)

For the 13th DFN-CERT Workshop "Sicherheit in vernetzten Systemen" ("Security in Networked Systems"), there is an article that describes collecting malware with the help of mwcollect.

Abstract:
A large share of today's autonomously spreading malware infects further victims via already known vulnerabilities in network services that can be exploited automatically. In addition, more and more bots are appearing that are based on the same source code family but are often packed with different and partly modified packers. It is therefore important to be able to collect such malware automatically in order to effectively create new signatures for virus scanners or to study the behavior of botnets.

Since these are known vulnerabilities, patterns for them can be created reactively, and a daemon can be implemented that simulates vulnerable services towards autonomously spreading malware. It is not necessary to replicate these services completely and correctly; a simplified emulation of the services is sufficient.

Such a daemon is provided by mwcollect, a project developed by the Honeynet Project since March 2005.

The full article is available as effektives-sammeln-von-malware.pdf.
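The pattern idea from the abstract can be illustrated with a small sketch. This is not mwcollect code: the signature names and regular expressions below are hypothetical examples I made up to show the principle, and the network side (listening on a port and receiving the payload) is left out entirely. The point is that the emulated service does not need to understand the protocol; it only has to recognize a known payload and pull out where the malware would be fetched from.

```python
import re
from typing import Optional, Tuple

# Hypothetical signatures, for illustration only. Each entry is a name and
# a regex that is matched against the raw bytes received from an attacker.
SIGNATURES = [
    ("url-download", re.compile(rb"(?:ftp|http)://[\x21-\x7e]+")),
    (
        "tftp-command",
        re.compile(rb"tftp(?:\.exe)? +-i +[\x21-\x7e]+ +get +[\x21-\x7e]+", re.IGNORECASE),
    ),
]


def match_payload(payload: bytes) -> Optional[Tuple[str, str]]:
    """Return (signature name, matched download instruction) or None.

    The first signature that matches wins. Both patterns match printable
    ASCII only, so decoding the match as ASCII cannot fail.
    """
    for name, regex in SIGNATURES:
        found = regex.search(payload)
        if found:
            return name, found.group(0).decode("ascii")
    return None
```

A payload that matches no signature returns `None`; in a real collector that would be the interesting case to log, since it may point to an exploit for which no pattern exists yet.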

Integrating Google Hack and GenIII Honeypots

Thursday, January 19. 2006
Together with Ryan McGeehan from the Google Hack Honeypot (GHH) team, I have written a short summary of how current GenIII honeynets and GHHs could be integrated. Essentially, we are adding advertisement to honeypot technology; that is all it really comes down to. The tricky part is how to advertise in a way that reduces false positives, which we will design once we know what resources we will be using. Thanks to this advertisement, we will be able to attract a new class of attackers and learn about new tools.

Furthermore, this is a way to learn more about targeted attacks. Instead of blind scanning, the attacker works from something like a hitlist that is generated with the help of different search engines. This is a new aspect in the area of "classical" GenIII honeypots, since on their own they have no real way to attract attackers and to learn more about targeted attacks.

The basic ideas are:

  • Redirecting traffic from GHHs to GenIII honeypots

  • Analyzing GHH logfiles with the help of GenIII honeypots

  • Generating GHHs with the help of information collected with GenIII honeypots

  • Cooperation with Google or other search engines to improve data capture capabilities


There is also a more detailed version available.


Sebek 3: Tracking the Attackers

Wednesday, January 18. 2006
SecurityFocus has published a new article by Raul Siles entitled "Sebek 3: Tracking the Attackers". The article deals with the basics of Sebek 3 and gives detailed information about the mechanism behind this tool. In addition, several challenges of Sebek are presented. Most of these challenges have already been covered in previous articles published at SecurityFocus.


Introduction of the article:
It has become increasingly important for security professionals to deploy new detection mechanisms to track and capture an attacker's activities. Third Generation (GenIII) Honeynets provide all the components and tools required to gather this information at the deepest level. Sebek is the primary data capture tool for GenIII Honeynets.

The first of this two-part series will discuss what Sebek is and what makes it so interesting. We'll start by looking at the latest Sebek release, version 3, its new capabilities, the Sebek protocol specification and how it integrates with GenIII Honeynet infrastructures. The second article will briefly address how to install and use Sebek on Linux and Windows. It will then focus on a Sebek patch developed by this article's author that makes possible not only to watch what the attacker types but also the response received.