A Crawler-based Study of Spyware on the Web
Tuesday, January 31. 2006
NDSS'06 in February features an interesting paper that examines the amount of spyware on the World Wide Web. In "A Crawler-based Study of Spyware on the Web", the authors describe their results from crawling the web for malicious content. The basic idea is to crawl the web and then analyze every captured binary with the help of a VM and Ad-Aware. They also examined web sites whose content exploits browser vulnerabilities. The abstract gives some more details on the amount of malware found:
Abstract:
Malicious spyware poses a significant threat to desktop security and integrity. This paper examines that threat from an Internet perspective. Using a crawler, we performed a large-scale, longitudinal study of the Web, sampling both executables and conventional Web pages for malicious objects. Our results show the extent of spyware content. For example, in a May 2005 crawl of 18 million URLs, we found spyware in 13.4% of the 21,200 executables we identified. At the same time, we found scripted “drive-by download” attacks in 5.9% of the Web pages we processed. Our analysis quantifies the density of spyware, the types of threats, and the most dangerous Web zones in which spyware is likely to be encountered. We also show the frequency with which specific spyware programs were found in the content we crawled. Finally, we measured changes in the density of spyware over time; e.g., our October 2005 crawl saw a substantial reduction in the presence of drive-by download attacks, compared with those we detected in May.
Unfortunately, the authors do not explain why their October results show a drop compared to May. It would also be interesting to carry out such an analysis at a larger scale, perhaps in cooperation with a search engine like Google ("A Statistical Review of 1 Billion Web Pages")...
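To get a feeling for the absolute numbers behind the percentages in the abstract, here is a small back-of-the-envelope calculation. It only uses the May 2005 figures quoted above; the function and constant names are my own, and since the abstract rounds both the URL count and the percentage, the results are approximations rather than the exact counts from the paper.

```python
# Figures quoted in the abstract for the May 2005 crawl.
URLS_CRAWLED = 18_000_000      # "18 million URLs" (rounded in the abstract)
EXECUTABLES_FOUND = 21_200     # executables identified by the crawler
SPYWARE_RATE_PERCENT = 13.4    # share of executables containing spyware (rounded)


def infected_executables(executables: int, rate_percent: float) -> int:
    """Approximate number of executables in which spyware was found."""
    return round(executables * rate_percent / 100)


def executable_share_percent(executables: int, urls: int) -> float:
    """Share of crawled URLs that were identified as executables, in percent."""
    return 100 * executables / urls


if __name__ == "__main__":
    count = infected_executables(EXECUTABLES_FOUND, SPYWARE_RATE_PERCENT)
    share = executable_share_percent(EXECUTABLES_FOUND, URLS_CRAWLED)
    print(f"approx. infected executables: {count}")
    print(f"executables per crawled URL:  {share:.2f}%")
```

So roughly 2,800 spyware-carrying executables turned up, and executables made up only about 0.12% of the crawled URLs.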



