A Crawler-based Study of Spyware on the Web

Tuesday, January 31. 2006
An interesting paper at NDSS'06 in February examines the amount of spyware on the World Wide Web. In the paper "A Crawler-based Study of Spyware on the Web", the authors describe their results from crawling the web for malicious content. The basic idea is to crawl the web and then analyze all captured binaries with the help of a VM and Ad-Aware. They also examined web sites whose content exploits browser vulnerabilities. The abstract gives some more details on the amount of malware found:

Abstract:
Malicious spyware poses a significant threat to desktop security and integrity. This paper examines that threat from an Internet perspective. Using a crawler, we performed a large-scale, longitudinal study of the Web, sampling both executables and conventional Web pages for malicious objects. Our results show the extent of spyware content. For example, in a May 2005 crawl of 18 million URLs, we found spyware in 13.4% of the 21,200 executables we identified. At the same time, we found scripted “drive-by download” attacks in 5.9% of the Web pages we processed. Our analysis quantifies the density of spyware, the types of threats, and the most dangerous Web zones in which spyware is likely to be encountered. We also show the frequency with which specific spyware programs were found in the content we crawled. Finally, we measured changes in the density of spyware over time; e.g., our October 2005 crawl saw a substantial reduction in the presence of drive-by download attacks, compared with those we detected in May.

Unfortunately, they do not explain why their October results show a drop compared to May. It would also be interesting to carry out such an analysis at a larger scale, perhaps in cooperation with a search engine like Google ("A Statistical Review of 1 Billion Web Pages")...
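
To put the abstract's percentages into absolute terms, here is a quick back-of-the-envelope sketch. It uses only the May 2005 figures quoted in the abstract; the variable names are my own. The abstract does not state how many Web pages were processed for the drive-by download analysis, so the 5.9% figure cannot be converted the same way.

```python
# Figures quoted in the abstract for the May 2005 crawl.
urls_crawled = 18_000_000
executables_found = 21_200
spyware_rate = 0.134  # 13.4% of the identified executables

# Approximate number of executables that contained spyware.
infected_executables = round(executables_found * spyware_rate)

# How rare executables were among the crawled URLs.
executables_per_1000_urls = executables_found / urls_crawled * 1000

print(f"infected executables: about {infected_executables}")
print(f"executables per 1,000 URLs: {executables_per_1000_urls:.2f}")
```

So roughly 2,800 spyware-carrying executables were found, and only about one URL in a thousand pointed to an executable at all.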

Sebek 3: Tracking the Attackers

Wednesday, January 18. 2006
SecurityFocus has published a new article by Raul Siles entitled "Sebek 3: Tracking the Attackers". The article deals with the basics of Sebek 3 and gives detailed information about the mechanism behind this tool. In addition, several challenges of Sebek are presented. Most of these challenges have already been covered in previous articles published at SecurityFocus.


Introduction of the article:
It has become increasingly important for security professionals to deploy new detection mechanisms to track and capture an attacker's activities. Third Generation (GenIII) Honeynets provide all the components and tools required to gather this information at the deepest level. Sebek is the primary data capture tool for GenIII Honeynets.

The first of this two-part series will discuss what Sebek is and what makes it so interesting. We'll start by looking at the latest Sebek release, version 3, its new capabilities, the Sebek protocol specification and how it integrates with GenIII Honeynet infrastructures. The second article will briefly address how to install and use Sebek on Linux and Windows. It will then focus on a Sebek patch developed by this article's author that makes possible not only to watch what the attacker types but also the response received.

Potemkin Honeyfarm System

Wednesday, January 11. 2006
An interesting paper was presented at the 20th ACM Symposium on Operating Systems Principles. The paper, entitled "Scalability, Fidelity and Containment in the Potemkin Virtual Honeyfarm", describes a prototype implementation of a honeyfarm system that is capable of emulating thousands of hosts in parallel. The authors use Xen, a virtual machine monitor based on paravirtualization, as a basic building block. Furthermore, the paper introduces the ideas of flash cloning and delta virtualization to enhance performance. Unfortunately, the system is not available for download...

Abstract:
The rapid evolution of large-scale worms, viruses and botnets have made Internet malware a pressing concern. Such infections are at the root of modern scourges including DDoS extortion, on-line identity theft, SPAM, phishing, and piracy. However, the most widely used tools for gathering intelligence on new malware - network honeypots - have forced investigators to choose between monitoring activity at a large scale or capturing behavior with high fidelity. In this paper, we describe an approach to minimize this tension and improve honeypot scalability by up to six orders of magnitude while still closely emulating the execution behavior of individual Internet hosts. We have built a prototype honeyfarm system, called Potemkin, that exploits virtual machines, aggressive memory sharing, and late binding of resources to achieve this goal. While still an immature implementation, Potemkin has emulated over 64,000 Internet honeypots in live test runs, using only a handful of physical servers.
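
The memory sharing mentioned in the abstract can be illustrated with a toy copy-on-write model. This is only a sketch of the general idea: many clones share one read-only reference image and store just the pages they modify. It is not Potemkin's implementation, which works inside the Xen hypervisor. The class names `ReferenceImage` and `CloneVM` and the page contents are hypothetical.

```python
class ReferenceImage:
    """Read-only memory image of a booted reference VM (hypothetical)."""

    def __init__(self, pages):
        self.pages = tuple(pages)  # immutable, shared by all clones


class CloneVM:
    """A clone that shares the reference image and keeps only its own delta."""

    def __init__(self, image):
        self.image = image  # shared, never modified
        self.delta = {}     # page index -> private copy made on first write

    def read(self, index):
        # Prefer the private copy; fall back to the shared page.
        return self.delta.get(index, self.image.pages[index])

    def write(self, index, data):
        # Copy-on-write: only the written page becomes private.
        self.delta[index] = data

    def private_pages(self):
        return len(self.delta)


image = ReferenceImage([b"kernel", b"libc", b"httpd", b"heap"])

# Creating a clone only stores a reference to the image, so it is cheap.
clones = [CloneVM(image) for _ in range(1000)]

# One clone is "attacked" and modifies a single page.
clones[0].write(3, b"heap-modified-by-attacker")

total_private = sum(clone.private_pages() for clone in clones)
print(f"{len(clones)} clones, {total_private} private page(s) in total")
```

In this model a thousand clones cost one shared image plus a single private page, which gives an intuition for why a handful of physical servers can back a very large number of honeypots.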
