"Is the Internet for Porn? An Insight Into the Online Adult Industry"

Thursday, May 6. 2010
research
Recently, we studied an aspect of the World Wide Web that has not yet received much attention - the online adult industry. Compared to traditional media, the Internet provides fast, easy, and anonymous access to the desired content. That, in turn, results in a huge number of users accessing pornographic content. To improve the understanding of this part of the Web, we performed a study of the online adult industry. As a result, we provide a detailed overview of the individual actors and roles within the industry, which enables us to better understand the mechanisms by which visitors are redirected between the individual parties and how money flows between them. Furthermore, we examined the security aspects of more than 250,000 adult pages and studied, among other aspects, the prevalence of drive-by download attacks. In addition, we analyzed domain-specific security threats such as disguised traffic redirection techniques, and surveyed the hosting infrastructure of adult sites.

Lastly, we operated two adult web sites of our own. By becoming adult web site operators ourselves, we gained additional insights into the unique security aspects of this domain. This enabled us to obtain a deeper understanding of the related abuse potential. We participated in adult traffic trading, and provide a detailed discussion of this unique aspect of adult web sites, including insights into the economic implications and possible attack vectors that a malicious site operator could leverage. For example, we discovered that a malicious operator could infect more than 20,000 visitors with a minimal investment of about $160. Furthermore, we experimentally show that a malicious site operator could benefit from domain-specific business practices that facilitate click-fraud and mass exploitation. We conclude that many participants of this industry have business models that are based on very questionable practices that could very well be abused for malicious activities and for conducting cyber-crime. In fact, we found evidence that this kind of abuse is already happening in the wild.

All details of our study are available in the paper. The paper will be presented at the Ninth Workshop on the Economics of Information Security (WEIS 2010). WEIS will take place on June 7/8 at Harvard University.

Abstract:
The online adult industry is among the most profitable business branches on the Internet, and its web sites attract large amounts of visitors and traffic. Nevertheless, no study has yet characterized the industry’s economical and security-related structure. As cyber-criminals are motivated by financial incentives, a deeper understanding and identification of the economic actors and interdependencies in the online adult business is important for analyzing security-related aspects of this industry.
In this paper, we provide a survey of the different economic roles that adult web sites assume, and highlight their economic and technical features. We provide insights into security flaws and potential points of interest for cyber-criminals. We achieve this by applying a combination of automatic and manual analysis techniques to investigate the economic structure of the online adult industry and its business cases. Furthermore, we also performed several experiments to gain a better understanding of the flow of visitors to these sites and the related cash flow, and report on the lessons learned while operating adult web sites on our own.

This paper was joint work with Gilbert Wondracek, Christian Platzer, Engin Kirda, and Christopher Kruegel, all members of the International Secure Systems Lab. You can get the paper at http://honeyblog.org/junkyard/paper/adultSites-weis2010.pdf.

Technical Report: "Abusing Social Networks for Automated User Profiling"

Wednesday, March 17. 2010
research
We recently published a technical report on another project related to social networks. The paper is entitled "Abusing Social Networks for Automated User Profiling" and we focus on automatically collecting information about users based on the information available in different networks.

Imagine that you have a profile on Facebook, on LinkedIn, and on MySpace. Perhaps you do not want to directly link these profiles, for example because you want to have a more serious profile on LinkedIn, while having a more relaxed one on MySpace and Facebook. Thus you use different pseudonyms/names on the different profiles and expect that the information cannot be correlated. However, there is a problem with that assumption: during the registration on the different networks, you used the same e-mail address. And a social network typically enables a user to search for e-mail addresses in order to find friends (a convenient feature, after all you want to network with your friends). An attacker can thus go ahead and search on each network for a given e-mail address, scrape the profile related to that address, and then correlate the information found on the different networks. In the end, an attacker can thus enrich a given e-mail address with information collected on different social networks.

An attacker can not only search for one e-mail address at a time, but typically for hundreds or even thousands. And he can not only do this once, but thousands of times per day. For example, we were able to check about 10 million e-mail addresses on Facebook per day. A spammer could use this "feature" to verify e-mail addresses by using Facebook as an oracle to determine whether or not a given e-mail address is valid. Furthermore, the correlation aspect is of course also a privacy problem since an attacker can find "hidden" information and correlate information across different networks.
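
As a rough illustration of the correlation step (not the code used in our study), the following sketch merges whatever each network returns for a given e-mail address. The two networks, their contents, and the `lookup` helper are hypothetical in-memory stand-ins; no real social network API is used or implied.

```python
from collections import defaultdict

# Hypothetical stand-ins for the "find friends by e-mail" search of each
# network. The network names and profile fields are made up for illustration.
NETWORKS = {
    "network_a": {
        "alice@example.com": {"name": "Alice A.", "employer": "ACME"},
    },
    "network_b": {
        "alice@example.com": {"nickname": "ally_cat", "city": "Vienna"},
        "bob@example.com": {"nickname": "bobby"},
    },
}


def lookup(network, email):
    """Return the profile registered under `email`, or None if there is none.

    The mere fact that a profile comes back is the "oracle" answer: it tells
    the caller that the address is in use on that network.
    """
    return NETWORKS[network].get(email)


def correlate(emails):
    """Collect, per e-mail address, the profiles found on every network."""
    merged = defaultdict(dict)
    for email in emails:
        for network in NETWORKS:
            profile = lookup(network, email)
            if profile is not None:
                merged[email][network] = profile
    return dict(merged)


result = correlate(["alice@example.com", "bob@example.com", "carol@example.com"])
```

In this toy data, the address `alice@example.com` links a "serious" profile on one network to a pseudonymous one on the other, while `carol@example.com` yields nothing and would be treated as unverified.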

We have contacted different social networks. Facebook and XING have already addressed the problem - thanks a lot!

Abstract:
Recently, social networks such as Facebook have experienced a huge surge in popularity. The amount of personal information stored in these sites calls for appropriate security precautions to protect this data.
In this paper, we describe how we are able to take advantage of a common weakness, namely the fact that an attacker can query the social network for registered e-mail addresses on a large scale. Starting with a list of about 10.4 million email addresses, we were able to automatically identify more than 1.2 million user profiles associated with these addresses. By crawling these profiles, we collect publicly available personal information about each user, which we use for automated profiling (i.e., to enrich the information available from each user).
Finally, we propose a number of mitigation techniques to protect the user’s privacy. We have contacted the most popular providers, who acknowledged the threat and are currently implementing our countermeasures. Facebook and XING in particular have recently fixed the problem.

The technical report is available at http://www.iseclab.org/papers/socialabuse-TR.pdf and it was joint work with Marco Balduzzi, Christian Platzer, Engin Kirda, Davide Balzarotti, and Christopher Kruegel.

"Inspector Gadget: Automated Extraction of Proprietary Gadgets from Malware Binaries"

Friday, March 12. 2010
When analyzing malware samples, a human analyst is typically interested in understanding or recovering specific algorithms of the given sample. In the case of Conficker, for example, she might be interested in extracting the domain generation algorithm so that she can understand which domains the malware uses now and will use in the future. Or for spam bots, she might be interested in how the malware downloads spam templates, decodes them, and then generates the actual spam messages. Or for bots, she might be interested in understanding how binary updates are downloaded, decoded, and then executed.

In each case, the binary itself encodes the algorithm, but understanding all of it is cumbersome and hard work. Thus it would be useful to have a tool that enables a malware analyst to automatically extract from a given binary sample the relevant algorithm related to a specific task. In a paper that will be presented at the 31st IEEE Symposium on Security & Privacy we introduce Inspector Gadget, a tool that implements exactly this. A gadget encapsulates all code related to a specific task and can be executed in a stand-alone fashion. A gadget player can take a gadget and replay it, for example to determine which domains are currently used by Conficker, or to download and decode an update for a bot binary. Furthermore, we introduce an approach to revert gadgets based on an enhanced brute-force algorithm: this is useful to understand the effects of malware in detail, and we can (in certain cases) also revert obfuscation algorithms in order to understand what data has been exfiltrated by a given sample. The full paper has all the details and describes Inspector Gadget in more depth. And if you are interested in the topic, you should also read the paper by Caballero et al. on BCR (paper title is "Binary Code Extraction and Interface Identification for Security Applications").
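
To give a feeling for why a replayable domain generation routine is valuable, here is a deliberately simple, made-up generator that derives domain names from a date. It is neither Conficker's algorithm nor anything Inspector Gadget produces; the function `toy_domains`, the hashing scheme, and the `.example` suffix are assumptions chosen purely for illustration.

```python
import hashlib
from datetime import date


def toy_domains(day, count=3, tld=".example"):
    """Derive `count` pseudo-random domain names from a calendar date.

    Toy scheme: hash "<ISO date>-<index>" with SHA-256 and map the first
    ten hex digits to the letters a..p.
    """
    domains = []
    for index in range(count):
        seed = f"{day.isoformat()}-{index}".encode("ascii")
        digest = hashlib.sha256(seed).hexdigest()
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:10])
        domains.append(label + tld)
    return domains


# Replaying the routine for different dates yields the domains for those days.
today = toy_domains(date(2010, 3, 12))
tomorrow = toy_domains(date(2010, 3, 13))
```

Once such a routine can be run stand-alone, an analyst can feed it any date and obtain the corresponding list of domains without executing the full sample.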

Abstract:
Unfortunately, malicious software is still an unsolved problem and a major threat on the Internet. An important component in the fight against malicious software is the analysis of malware samples: Only if an analyst understands the behavior of a given sample, she can design appropriate countermeasures. Manual approaches are frequently used to analyze certain key algorithms, such as downloading of encoded updates, or generating new DNS domains for command and control purposes.
In this paper, we present a novel approach to automatically extract, from a given binary executable, the algorithm related to a certain activity of the sample. We isolate and extract these instructions and generate a so-called gadget, i.e., a stand-alone component that encapsulates a specific behavior. We make sure that a gadget can autonomously perform a specific task by including all relevant code and data into the gadget such that it can be executed in a self-contained fashion.
Gadgets are useful entities in analyzing malicious software: In particular, they are valuable for practitioners, as understanding a certain activity that is embedded in a binary sample (e.g., the update function) is still largely a manual and complex task. Our evaluation with several real-world samples demonstrates that our approach is versatile and useful in practice.

The full paper is available at http://www.iseclab.org/papers/ieee_sp10_inspector_gadget.pdf and will be presented in May at the 31st IEEE Symposium on Security & Privacy. The paper was joint work with Clemens Kolbitsch, Christopher Kruegel, and Engin Kirda - all members of the International Secure Systems Lab.

"A Practical Attack to De-Anonymize Social Network Users"

Monday, February 1. 2010
In the last couple of months, we have worked on a technique to de-anonymize users based on the way they interact with social networks. The idea behind our attack is that the group memberships of a user (i.e., the groups of a social network to which a user belongs) are often sufficient to uniquely identify this user. This means that there are only a few users (or in the best case only one) of a social network that are members of exactly the same groups.

The attack scenario is the following: a malicious website wants to de-anonymize a user, i.e., find out the real name and identity of a visitor. The attack is implemented in two phases. In the first phase, we crawl the groups of a social network to determine the members of the different groups. This is our database from which we can generate a group fingerprint per user. In the second phase, we use the well-known technique of history stealing to probe the browser's history for links to groups, thus determining the group fingerprint of the visitor. We can then compare this fingerprint to our database and de-anonymize the visitor. Even when unique identification is not possible, the attack might still significantly reduce the size of the set of candidates that the victim belongs to.
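
The matching step of the two phases above can be sketched as follows. This is a minimal illustration, not our implementation: the crawl result is a hypothetical in-memory dictionary, the history-stealing part is replaced by a ready-made set of visited groups, and we assume that a candidate only has to be a member of every group seen in the history (the history may be incomplete), rather than requiring an exact match.

```python
from collections import defaultdict

# Phase 1 (hypothetical crawl result): group name -> set of member names.
CRAWLED_GROUPS = {
    "group_security": {"alice", "bob", "carol"},
    "group_vienna": {"alice", "dave"},
    "group_python": {"alice", "bob"},
}


def build_fingerprints(groups):
    """Invert the crawl data into user -> frozenset of group names."""
    memberships = defaultdict(set)
    for group, members in groups.items():
        for member in members:
            memberships[member].add(group)
    return {user: frozenset(gs) for user, gs in memberships.items()}


def candidates(fingerprints, visited_groups):
    """Return all users who belong to every group found in the history."""
    visited = frozenset(visited_groups)
    return sorted(user for user, gs in fingerprints.items() if visited <= gs)


FINGERPRINTS = build_fingerprints(CRAWLED_GROUPS)

# Phase 2 stand-in: groups that history stealing reported as visited.
unique = candidates(FINGERPRINTS, {"group_security", "group_vienna"})
reduced = candidates(FINGERPRINTS, {"group_python"})
```

With this toy data, the first probe leaves a single candidate, while the second cannot identify the visitor uniquely but still narrows the set down to two users.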

As a proof-of-concept, we implemented the attack for XING, a well-known "Social Network for Business Professionals". Please note that this attack is not specific to XING or any other social network - it is generally applicable to different kinds of modern web applications that contain unique links for users that can be probed via history stealing. We crawled the ~7000 public groups of XING and found about 1.8 million members that belong to at least one group. These users are vulnerable to our attack, and we have set up a demo website where you can participate in our experiment. Note that this test is only successful if you are a member of XING and a member of at least one group. If you regularly participate in groups, the chances are higher that we can successfully de-anonymize you :-)

The following pictures show the different stages of the proof-of-concept attack:



We have published a technical report that summarizes our preliminary results at http://www.iseclab.org/papers/sonda-TR.pdf. In the next couple of weeks, we will finish the work on the paper and present our results at the 31st IEEE Symposium on Security & Privacy in May. A demo of the attack is available at http://www.iseclab.org/people/gilbert/experiment/.

Data Set For Malware Clustering/Classification

Friday, January 29. 2010
About one month ago I blogged about our research on malware clustering and classification. We have now also released the full data set from our experiments, such that other people can reproduce the results and compare our approach to theirs. You can find all information at http://pi1.informatik.uni-mannheim.de/malheur/, together with a description of the different data.

Quick overview of the data:
Our reference data set is extracted from our large database of malware binaries maintained at CWSandbox. The malware binaries have been collected over a period of three years from a variety of sources. From the overall database, we select binaries which have been assigned to a known class of malware by the majority of six independent anti-virus products. We append the overall anti-virus label to the filename of each report. Although anti-virus labels suffer from inconsistency, we expect the selection using different scanners to be reasonably consistent and accurate. To compensate for the skewed distribution of classes, we discard classes with fewer than 20 samples and restrict the maximum contribution of each class to 300 binaries. The selected malware binaries are then executed and monitored using CWSandbox, resulting in a total of 3,133 behavior reports in MIST format.
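
The selection procedure described above can be sketched roughly like this. It is not the script we used: the input format, the function names, the reading of "majority" as more than half of the scanners, and the use of a seeded random sample to cap large classes are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def majority_label(labels):
    """Return the label given by more than half of the scanners, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None


def select_reference_set(scans, min_size=20, max_size=300, seed=0):
    """Select binaries per malware class.

    scans: dict mapping a binary id to the list of labels from the scanners.
    Returns a dict mapping each kept class label to its selected binary ids.
    """
    classes = defaultdict(list)
    for binary, labels in scans.items():
        label = majority_label(labels)
        if label is not None:
            classes[label].append(binary)

    rng = random.Random(seed)
    selected = {}
    for label, binaries in classes.items():
        if len(binaries) < min_size:
            continue  # discard classes with fewer than min_size samples
        binaries = sorted(binaries)
        if len(binaries) > max_size:
            # Assumed capping strategy: a reproducible random sample.
            binaries = rng.sample(binaries, max_size)
        selected[label] = binaries
    return selected
```

With six scanners, a binary is kept under this reading only if at least four of them agree on its class; binaries without such agreement are dropped before the class sizes are checked.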

The application data set consists of seven chunks of malware binaries obtained from the anti-malware vendor Sunbelt Software. The binaries correspond to malware collected during seven consecutive days in August 2009 and originate from a variety of sources. Sunbelt Software uses these very samples to create and update signatures for their VIPRE anti-malware product as well as for their security data feed ThreatTrack. The complete test data set consists of 33,698 behavior reports in MIST format.

The full technical report is available at http://honeyblog.org/junkyard/paper/malheur-TR-2009.pdf.

Update: I corrected the terminology used in the description above.