A Crawler-based Study of Spyware on the Web

At NDSS'06 in February, there is an interesting paper that examines the amount of spyware on the World Wide Web. In the paper "A Crawler-based Study of Spyware on the Web", the authors describe their results from crawling the web for malicious content. The basic idea is to crawl the web and then analyze every captured binary with the help of a VM and Ad-Aware. Moreover, they also examined web sites containing malicious content that exploits browser vulnerabilities. The abstract gives some more details on the amount of malware found:

Abstract:
Malicious spyware poses a significant threat to desktop security and integrity. This paper examines that threat from an Internet perspective. Using a crawler, we performed a large-scale, longitudinal study of the Web, sampling both executables and conventional Web pages for malicious objects. Our results show the extent of spyware content. For example, in a May 2005 crawl of 18 million URLs, we found spyware in 13.4% of the 21,200 executables we identified. At the same time, we found scripted “drive-by download” attacks in 5.9% of the Web pages we processed. Our analysis quantifies the density of spyware, the types of threats, and the most dangerous Web zones in which spyware is likely to be encountered. We also show the frequency with which specific spyware programs were found in the content we crawled. Finally, we measured changes in the density of spyware over time; e.g., our October 2005 crawl saw a substantial reduction in the presence of drive-by download attacks, compared with those we detected in May.

Unfortunately, they do not explain why the results drop in October compared to May. It would also be interesting to carry out such an analysis at a larger scale, perhaps in cooperation with a search engine like Google ("A Statistical Review of 1 Billion Web Pages")...
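
To make the crawl-then-scan idea a bit more concrete, here is a minimal Python sketch of the triage step: deciding which crawled URLs look like executables worth handing to a VM and a scanner, and computing the fraction that were flagged. The extension list, the content types, and the function names are my own assumptions for illustration and are not taken from the paper; the actual scanning step is left out entirely.

```python
from urllib.parse import urlparse

# Assumed heuristics for illustration only; the paper's real criteria may differ.
EXECUTABLE_EXTENSIONS = (".exe", ".msi", ".cab")
EXECUTABLE_CONTENT_TYPES = {"application/octet-stream", "application/x-msdownload"}


def looks_like_executable(url, content_type=""):
    """Guess whether a crawled URL points at an executable.

    Checks the URL path extension and, if given, the HTTP Content-Type header.
    """
    path = urlparse(url).path.lower()
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    return path.endswith(EXECUTABLE_EXTENSIONS) or mime in EXECUTABLE_CONTENT_TYPES


def flagged_fraction(verdicts):
    """Fraction of scanned executables the scanner flagged (True = flagged)."""
    verdicts = list(verdicts)
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)
```

In a real study, each URL that passes `looks_like_executable` would be downloaded, installed inside a clean VM, and scanned; `flagged_fraction` then yields a figure comparable to the percentage of infected executables quoted in the abstract.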
