"Inspector Gadget: Automated Extraction of Proprietary Gadgets from Malware Binaries"

Friday, March 12. 2010
When analyzing malware samples, a human analyst is typically interested in understanding or recovering a specific algorithm of the given sample. In the case of Conficker, for example, she might be interested in extracting the domain generation algorithm so that she can determine which domains the malware uses now and will use in the future. For spam bots, she might be interested in how the malware downloads spam templates, decodes them, and then generates the actual spam messages. And for bots in general, she might be interested in understanding how binary updates are downloaded, decoded, and then executed.
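To make the idea of a domain generation algorithm concrete, here is a minimal date-seeded sketch. This is purely illustrative and not Conficker's actual algorithm: the seed format, hash function, and TLD list are all invented for this example.

```python
import hashlib
from datetime import date

def generate_domains(day, count=5):
    """Derive pseudo-random domains from the current date.

    Illustrative only: real DGAs differ in seed derivation,
    hash function, label length, and TLD selection.
    """
    tlds = ["com", "net", "org", "info", "biz"]
    domains = []
    for i in range(count):
        seed = f"{day.isoformat()}-{i}".encode()
        digest = hashlib.md5(seed).hexdigest()
        # Map the hex digest to a short alphabetic label
        label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:10])
        domains.append(f"{label}.{tlds[i % len(tlds)]}")
    return domains

# Both the analyst and the malware can compute the same list for a
# given day, so future rendezvous domains can be predicted in advance.
print(generate_domains(date(2010, 3, 12)))
```

The key property is determinism: anyone who recovers the algorithm can pre-register or sinkhole the upcoming domains.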

In each case, the binary itself encodes the algorithm, but recovering it by hand is cumbersome and hard work. It would therefore be useful to have a tool that enables a malware analyst to automatically extract, from a given binary sample, the algorithm related to a specific task. In a paper that will be presented at the 31st IEEE Symposium on Security & Privacy, we introduce Inspector Gadget, a tool that implements exactly this. A gadget encapsulates all code related to a specific task and can be executed in a stand-alone fashion. A gadget player can take a gadget and replay it, for example to determine which domains are currently used by Conficker, or to download and decode an update for a bot binary. Furthermore, we introduce an approach to revert gadgets based on an enhanced brute-force algorithm: this is useful for understanding the effects of malware in detail, and in certain cases we can also revert obfuscation algorithms, i.e., understand what data has been exfiltrated by a given sample. The full paper has all the details and describes Inspector Gadget in more depth. If you are interested in the topic, you should also read the paper by Caballero et al. on BCR (paper title is "Binary Code Extraction and Interface Identification for Security Applications").
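To give a flavor of what "reverting" an obfuscation by brute force means, here is a deliberately simple sketch: a single-byte XOR encoding is inverted by trying all 256 keys and checking for a known plaintext fragment. The approach in the paper is considerably more sophisticated; this example only illustrates the general idea, and all names here are invented.

```python
def xor_decode(data, key):
    """Single-byte XOR; the same operation encodes and decodes."""
    return bytes(b ^ key for b in data)

def brute_force_revert(obfuscated, plaintext_hint):
    """Try every single-byte XOR key; return the key whose decoding
    contains the expected plaintext fragment, or None if none matches."""
    for key in range(256):
        if plaintext_hint in xor_decode(obfuscated, key):
            return key
    return None

# A sample "exfiltrated" buffer, encoded with an unknown key ...
encoded = xor_decode(b"GET /update.bin HTTP/1.1", 0x5A)
# ... which the analyst can recover by guessing a likely substring.
key = brute_force_revert(encoded, b"HTTP")
assert key == 0x5A
print(xor_decode(encoded, key))
```

For toy encodings like this the search space is tiny; the interesting part in practice is deciding automatically when a candidate decoding looks "correct".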

Unfortunately, malicious software is still an unsolved problem and a major threat on the Internet. An important component in the fight against malicious software is the analysis of malware samples: only if an analyst understands the behavior of a given sample can she design appropriate countermeasures. Manual approaches are frequently used to analyze certain key algorithms, such as downloading of encoded updates, or generating new DNS domains for command and control purposes.
In this paper, we present a novel approach to automatically extract, from a given binary executable, the algorithm related to a certain activity of the sample. We isolate and extract these instructions and generate a so-called gadget, i.e., a stand-alone component that encapsulates a specific behavior. We make sure that a gadget can autonomously perform a specific task by including all relevant code and data into the gadget such that it can be executed in a self-contained fashion.
Gadgets are useful entities in analyzing malicious software: In particular, they are valuable for practitioners, as understanding a certain activity that is embedded in a binary sample (e.g., the update function) is still largely a manual and complex task. Our evaluation with several real-world samples demonstrates that our approach is versatile and useful in practice.

The full paper is available at http://www.iseclab.org/papers/ieee_sp10_inspector_gadget.pdf and will be presented in May at the 31st IEEE Symposium on Security & Privacy. The paper was joint work with Clemens Kolbitsch, Christopher Kruegel, and Engin Kirda - all members of the International Secure Systems Lab.

Waledac Infection Check

Tuesday, March 2. 2010
Ben Stock has implemented a web service to check a given IP address for infection with Waledac, similar to the Conficker Eye Chart. The idea is that we are currently tracking Waledac as part of the take-down effort and thus have a pretty good overview of the individual bots within the botnet. We are therefore in a position to determine whether we have seen a given IP address as a bot in the recent past, which indicates that this IP address might be related to a Waledac infection. Of course, effects like NAT or DHCP need to be taken into account: if an IP address is not listed, this does not necessarily mean that you are not infected.
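Conceptually, such a check boils down to a membership lookup against the IPs the tracker has recently observed. Here is a minimal sketch of that idea; the data, names, and seven-day window are all assumptions for illustration, not details of the actual service.

```python
from datetime import datetime, timedelta

# Hypothetical tracker data: bot IP -> time it was last seen in the botnet.
recently_seen = {
    "198.51.100.23": datetime(2010, 3, 1, 14, 0),
    "203.0.113.7": datetime(2010, 2, 27, 9, 30),
}

def infection_hint(ip, now, window=timedelta(days=7)):
    """Return True if this IP was seen as a bot within the window.

    NAT and DHCP mean this is only a hint: a shared or reassigned
    address can produce both false positives and false negatives.
    """
    last = recently_seen.get(ip)
    return last is not None and now - last <= window

now = datetime(2010, 3, 2, 12, 0)
print(infection_hint("198.51.100.23", now))  # True: seen yesterday
print(infection_hint("192.0.2.1", now))      # False: never seen
```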

The check is available at http://mwanalysis.org/waledac/, feedback is welcome!

Waledac Takedown Successful

Thursday, February 25. 2010
A few weeks ago, I blogged about our paper "Walowdac – Analysis of a Peer-to-Peer Botnet". The paper provides an overview of the Waledac botnet and its specific aspects compared to Storm Worm and similar peer-to-peer botnets. It also contains measurement results for the botnet, such as the typical number of online bots and similar statistics.

In the last couple of days, the situation changed a bit: we worked on an active takedown of the botnet together with experts from Microsoft, Shadowserver, the University of Mannheim, University of Bonn, University of Washington, Symantec, and others. The operation is known within Microsoft as "Operation b49" and involved domain takedowns and additional technical countermeasures. Microsoft also did some fantastic work on the legal side; the complaint filed by Microsoft ("Microsoft Corporation v. John Does 1-27, et al.") is available online. As a result, the communication infrastructure of Waledac has been disrupted to a certain extent and the botmaster effectively cannot send commands to the bots. The Waledac Tracker by sudosecure.net also shows a nice decline in the number of bots over the last few days. Note, however, that the infected machines are still up and running, so some clean-up on that side is still necessary...

You can read more about the story in a blog post by Microsoft: "Cracking Down on Botnets". And I will update the blog with new information once we start to analyze the collected data...

"A Practical Attack to De-Anonymize Social Network Users"

Monday, February 1. 2010
In the last couple of months, we have worked on a technique to de-anonymize users based on the way they interact with social networks. The idea behind our attack is the fact that the group memberships of a user (i.e., the groups of a social network to which a user belongs) are often sufficient to uniquely identify this user. This means that typically only a few users of a social network (or, in the best case, exactly one) are members of exactly the same set of groups.

The attack scenario is the following: a malicious website wants to de-anonymize a user, i.e., find out the real name and identity of a visitor. The attack is implemented in two phases. In the first phase, we crawl the groups of a social network to determine the members of the different groups. This is our database, from which we can generate a group fingerprint per user. In the second phase, we use the well-known technique of history stealing to probe the browser's history for links to groups, thus determining the group fingerprint of the visitor. We can then compare this fingerprint to our database and de-anonymize the visitor. Even when unique identification is not possible, the attack might still significantly reduce the size of the set of candidates to which the victim belongs.
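The second phase can be sketched in a few lines: invert the crawled group-membership data into per-user fingerprints, then match the fingerprint observed via history stealing against that database. The data below is entirely made up for illustration; the real attack operates on thousands of groups and millions of users.

```python
from collections import defaultdict

# Hypothetical crawl result from phase one: group -> set of members.
group_members = {
    "g1": {"alice", "bob", "carol"},
    "g2": {"alice", "carol"},
    "g3": {"bob", "dave"},
}

def build_fingerprints(group_members):
    """Invert the crawl data into per-user group fingerprints."""
    fp = defaultdict(set)
    for group, members in group_members.items():
        for user in members:
            fp[user].add(group)
    return {user: frozenset(groups) for user, groups in fp.items()}

def candidates(observed_groups, fingerprints):
    """Return all users whose fingerprint equals the set of groups
    observed via history stealing; a singleton means de-anonymization."""
    observed = frozenset(observed_groups)
    return sorted(u for u, fp in fingerprints.items() if fp == observed)

fps = build_fingerprints(group_members)
print(candidates({"g1", "g3"}, fps))  # ['bob'] -- unique hit
print(candidates({"g1", "g2"}, fps))  # ['alice', 'carol'] -- anonymity set of two
```

The second call shows the degraded case mentioned above: even without unique identification, the candidate set shrinks from all users to just two.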

As a proof-of-concept, we implemented the attack for XING, a well-known "Social Network for Business Professionals". Please note that this attack is not specific to XING or any other social network - it is generally applicable to different kinds of modern web applications that contain unique per-user links that can be probed via history stealing. We crawled the ~7000 public groups of XING and found about 1.8 million members that belong to at least one group. These users are vulnerable to our attack, and we have set up a demo website where you can participate in our experiment. Note that this test is only successful if you are a member of XING and a member of at least one group. If you regularly participate in groups, the chances are higher that we can successfully de-anonymize you :-)

The following pictures show the different stages of the proof-of-concept attack:

We have published a technical report that summarizes our preliminary results at http://www.iseclab.org/papers/sonda-TR.pdf. In the next couple of weeks, we will finish the work on the paper and present our results at the 31st IEEE Symposium on Security & Privacy in May. A demo of the attack is available at http://www.iseclab.org/people/gilbert/experiment/.

Data Set For Malware Clustering/Classification

Friday, January 29. 2010
About one month ago I blogged about our research on malware clustering and classification. We have now also released the full data set from our experiments, such that other people can reproduce the results and compare our approach to theirs. You can find all information at http://pi1.informatik.uni-mannheim.de/malheur/, together with a description of the different data.

Quick overview of the data:
Our reference data set is extracted from our large database of malware binaries maintained at CWSandbox. The malware binaries have been collected over a period of three years from a variety of sources. From the overall database, we select binaries which have been assigned to a known class of malware by the majority of six independent anti-virus products. We append the overall anti-virus label to the filename of each report. Although anti-virus labels suffer from inconsistency, we expect the selection using different scanners to be reasonably consistent and accurate. To compensate for the skewed distribution of classes, we discard classes with fewer than 20 samples and restrict the maximum contribution of each class to 300 binaries. The selected malware binaries are then executed and monitored using CWSandbox, resulting in a total of 3,133 behavior reports in MIST format.
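The class-balancing step described above can be sketched as a simple filter over labeled samples: drop classes below the minimum size and cap each class at the maximum. The function and data names are invented for illustration; the thresholds are the ones from the description.

```python
from collections import Counter

MIN_SAMPLES = 20   # discard classes with fewer samples than this
MAX_SAMPLES = 300  # cap each class's contribution at this many binaries

def select_reference_set(labeled_binaries):
    """Filter a list of (binary_id, av_label) pairs as described:
    drop under-represented classes, cap over-represented ones."""
    counts = Counter(label for _, label in labeled_binaries)
    taken = Counter()
    selected = []
    for binary_id, label in labeled_binaries:
        if counts[label] < MIN_SAMPLES:
            continue  # class too small -> discard entirely
        if taken[label] >= MAX_SAMPLES:
            continue  # class already at its cap
        taken[label] += 1
        selected.append((binary_id, label))
    return selected

# Tiny synthetic example: one class survives, one is discarded.
data = [(f"b{i}", "allaple") for i in range(25)] + [(f"r{i}", "rare") for i in range(5)]
print(len(select_reference_set(data)))  # 25: "rare" is dropped, "allaple" kept
```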

The application data set consists of seven chunks of malware binaries obtained from the anti-malware vendor Sunbelt Software. The binaries correspond to malware collected during seven consecutive days in August 2009 and originate from a variety of sources. Sunbelt Software uses these very samples to create and update signatures for their VIPRE anti-malware product as well as for their security data feed ThreatTrack. The complete test data set consists of 33,698 behavior reports in MIST format.

The full technical report is available at http://honeyblog.org/junkyard/paper/malheur-TR-2009.pdf.

Update: I revised the terms in the description above to use the correct terminology.