Malware Analysis with 3-Valued Deterministic Finite Tree Automata
Date Issued
2011
Author(s)
Wang, Yi-Hsiang
Abstract
There exist many security threats on the Internet, and the most notorious is malware.
Malware (malicious software) refers to programs that have malicious intent and perform
harmful actions. Typical malware includes viruses, worms, Trojan horses, and
spyware. The first line of defense against malware is a malware detector. Each malware
detector has its own analysis method. The most basic and prevalent methods used in
commercial malware detectors are based on syntactic signature matching. It is widely
recognized that this detection mechanism cannot cope with advanced malware. Advanced
malware uses program obfuscation to alter program structures and can therefore evade
detection easily. However, the semantics of a malware instance is usually preserved
after obfuscation, so it is feasible to develop a malware detector that is based on
program semantics.
In this thesis, we propose a semantics-based approach to malware analysis. Observing
recently proposed methods for malware detection, we notice that string-based signatures
are still used widely. It is natural to extend from strings to trees, which are more general and
can carry more semantic information. Therefore, we use trees as signatures. Our malware detector
requires a set of malware instances and a set of benign programs. The semantics of each
input program is extracted and represented as a system call dependence graph. The
graph is then transformed into a tree. With the set of trees generated from malware and
benign programs, we use grammatical inference to learn a 3-valued deterministic
finite tree automaton (3DFT). A 3DFT has three different final states: accept,
reject, and unknown. Used as the malware detector, this 3DFT outputs one of three
values. If an input program is a malware instance, the detector outputs true. If
an input program is a benign program, the detector outputs false. Otherwise, it outputs
unknown. According to our experiments, our detector exhibits very few false positives.
However, the tradeoff is that many programs are identified as unknown.
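The three-valued verdict described above can be illustrated with a small sketch. This is not the thesis implementation: the class and function names, the state names, the system call labels, and the hand-written transition table are all hypothetical, and the grammatical inference step that would learn the automaton from sample trees is not shown. The sketch also assumes that a tree with no matching transition is classified as unknown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Tree:
    """A labeled ordered tree; labels stand for system calls (hypothetical)."""
    label: str
    children: Tuple["Tree", ...] = ()


class ThreeValuedTreeAutomaton:
    """Bottom-up deterministic tree automaton with accept/reject/unknown verdicts."""

    def __init__(
        self,
        transitions: Dict[Tuple[str, Tuple[str, ...]], str],
        accepting: Iterable[str],
        rejecting: Iterable[str],
    ) -> None:
        # Maps (node label, tuple of child states) to the state of the node.
        self.transitions = transitions
        self.accepting = frozenset(accepting)
        self.rejecting = frozenset(rejecting)

    def run(self, tree: Tree) -> Optional[str]:
        """Return the state reached at the root, or None if the run gets stuck."""
        child_states = []
        for child in tree.children:
            state = self.run(child)
            if state is None:
                return None
            child_states.append(state)
        return self.transitions.get((tree.label, tuple(child_states)))

    def classify(self, tree: Tree) -> str:
        """Assumption: any state that is neither accepting nor rejecting is unknown."""
        state = self.run(tree)
        if state in self.accepting:
            return "accept"   # detector outputs true (malware)
        if state in self.rejecting:
            return "reject"   # detector outputs false (benign)
        return "unknown"


# Hand-written toy automaton; a learned 3DFT would come from sample trees.
detector = ThreeValuedTreeAutomaton(
    transitions={
        ("CreateFile", ()): "q_open",
        ("WriteFile", ("q_open",)): "q_mal",
        ("ReadFile", ("q_open",)): "q_ben",
    },
    accepting={"q_mal"},
    rejecting={"q_ben"},
)

malicious_like = Tree("WriteFile", (Tree("CreateFile"),))
benign_like = Tree("ReadFile", (Tree("CreateFile"),))
unseen = Tree("CloseHandle")

verdicts = [detector.classify(t) for t in (malicious_like, benign_like, unseen)]
print(verdicts)
```

In this toy setup the third tree uses a label the automaton has never seen, so it falls into the unknown class, which mirrors the tradeoff noted in the abstract.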
Subjects
Malware Analysis
Malware Detector
Grammatical Inference
3-Valued Automata
Program Semantics
System Call
Type
thesis
File(s)
Name
ntu-100-R98725035-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):caafc2c5a03e29648e184de13bb01ba8
