Internet worms pose a serious threat to computer security.
Traditional approaches using signatures to detect worms
pose little danger to the zero day attacks. The focus of
malware research is shifting from using signature patterns
to identifying the malicious behavior displayed by the
malwares. This paper presents a novel idea of extracting
variable length instruction sequences that can identify
worms from clean programs using data mining techniques.
The analysis is facilitated by the program control flow
information contained in the instruction sequences. Based
upon general statistics gathered from these instruction
sequences we formulated the problem as a binary classification
problem and built tree based classifiers including
decision tree, bagging and random forest. Our approach
showed 95.6% detection rate on novel worms whose data
was not used in the model building process.