DIMVA'08: "Learning and Classification of Malware Behavior"
DIMVA'08 takes place in Paris today and tomorrow. It is the Fifth Conference on Detection of Intrusions and Malware & Vulnerability Assessment, organized by the special interest group SIDAR of the German Informatics Society (GI).
Our paper entitled "Learning and Classification of Malware Behavior" is joint work with Konrad Rieck, Carsten Willems, Patrick Düssel, Pavel Laskov, and Felix Freiling. The paper deals with malware classification, i.e., how to automatically learn malware families from labeled samples. We use the (noisy) labels assigned by an anti-virus product and then apply machine learning algorithms to classify malware based on execution traces generated with the help of CWSandbox. In an experiment with over 3,000 previously undetected malware binaries, our system correctly predicted almost 70% of the labels assigned by an anti-virus scanner four weeks later. Our method also detects unknown behavior, so that malware families not present in the learning corpus are correctly identified as unknown. The analysis of prominent features inferred by our discriminative models has shown interesting similarities between malware families; in particular, we have discovered that the Doomber and Gobot worms derive from the same origin, with Doomber being an extension of Gobot. All of this was found in an automated way.
Abstract:
Malicious software in form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. The diversity and amount of its variants severely undermine the effectiveness of classical signature-based detection. Yet variants of malware families share typical behavioral patterns reflecting its origin and purpose. We aim to exploit these shared patterns for classification of malware and propose a method for learning and discrimination of malware behavior. Our method proceeds in three stages: (a) behavior of collected malware is monitored in a sandbox environment, (b) based on a corpus of malware labeled by an anti-virus scanner a malware behavior classifier is trained using learning techniques and (c) discriminative features of the behavior models are ranked for explanation of classification decisions. Experiments with different heterogeneous test data collected over several months using honeypots demonstrate the effectiveness of our method, especially in detecting novel instances of malware families previously not recognized by commercial anti-virus software.
The full paper is now available.
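
To make the three stages from the abstract more concrete, here is a minimal sketch in Python. It is not the implementation from the paper: the post does not name the learning algorithm, so the sketch swaps in a simple nearest-centroid classifier over bag-of-operations vectors. The operation names, the family names (FamilyA, FamilyB), the training data, and the rejection threshold of 0.5 are all hypothetical, and a real behavior report from a sandbox would be far richer than a list of strings.

```python
import math
from collections import Counter, defaultdict


def featurize(trace):
    """Stage (a), simplified: turn a trace (list of operation names,
    hypothetical stand-in for a sandbox report) into an L2-normalized
    bag-of-operations vector."""
    counts = Counter(trace)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {op: c / norm for op, c in counts.items()} if norm else {}


def train(labeled_traces):
    """Stage (b), simplified: average the vectors of each labeled family
    into one centroid per family (assumption: nearest-centroid model)."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for trace, family in labeled_traces:
        sizes[family] += 1
        for op, weight in featurize(trace).items():
            sums[family][op] += weight
    return {
        family: {op: w / sizes[family] for op, w in vec.items()}
        for family, vec in sums.items()
    }


def score(vec, centroid):
    """Dot product between a trace vector and a family centroid."""
    return sum(w * centroid.get(op, 0.0) for op, w in vec.items())


def classify(model, trace, threshold=0.5):
    """Return the best-scoring family, or "unknown" if no family scores
    at least `threshold` (hypothetical value) for this trace."""
    vec = featurize(trace)
    family, best = max(
        ((fam, score(vec, centroid)) for fam, centroid in model.items()),
        key=lambda pair: pair[1],
    )
    return family if best >= threshold else "unknown"


def top_features(model, family, k=3):
    """Stage (c), simplified: rank operations by their weight in this
    family minus their mean weight in all other families."""
    others = [c for fam, c in model.items() if fam != family]

    def margin(op):
        rest = sum(c.get(op, 0.0) for c in others) / len(others) if others else 0.0
        return model[family][op] - rest

    return sorted(model[family], key=margin, reverse=True)[:k]


# Hypothetical toy corpus: (trace, label assigned by an anti-virus scanner).
TRAINING = [
    (["create_mutex", "copy_to_system", "irc_connect", "irc_connect"], "FamilyA"),
    (["create_mutex", "irc_connect", "set_autostart"], "FamilyA"),
    (["open_smtp", "read_addressbook", "send_mail", "send_mail"], "FamilyB"),
    (["read_addressbook", "send_mail", "set_autostart"], "FamilyB"),
]

model = train(TRAINING)
print(classify(model, ["irc_connect", "create_mutex", "irc_connect"]))
print(classify(model, ["format_disk", "delete_files"]))
print(top_features(model, "FamilyA", k=2))
```

The rejection threshold is what lets the sketch label a trace as unknown instead of forcing it into one of the learned families, and the feature ranking shows which operations separate one family from the others. Both are deliberately crude here and only illustrate the idea.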