A Novel Feature Vector for AI-assisted Windows Malware Detection

Abstract

Dynamic malware analysis, a major approach to malware analysis and detection, involves executing malware in a controlled environment and observing its behavior. Dynamic analysis reports record the Windows API calls a sample makes, and statistical features extracted from these calls have enabled effective malware detection. However, existing works neglect certain critical information about the API calls when constructing feature vectors. In this work, we develop a novel feature vector that takes into account not only each API's name and arguments but also its return values and the number of times it is called in a sample. Because API calls vary widely in their names, number of arguments, and return values, we use hash functions to construct a fixed-size feature vector, which facilitates the design and development of artificial intelligence (AI)-assisted algorithms for malware detection. We experiment with various deep learning and machine learning models and perform extensive hyperparameter tuning to identify the best-performing model for our feature vector. The experimental dataset, recently collected from an anti-virus company, comprises 14,860 samples: 7,398 malicious and 7,462 benign. Extensive experiments show that our solution outperforms many state-of-the-art baseline malware detectors on various performance metrics, including accuracy, false positive rate, and false negative rate, demonstrating the effectiveness of our feature vector and detection models.
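
To make the hashing idea in the abstract concrete, the following is a minimal sketch of how variable-length API call records could be folded into a fixed-size vector. It is an illustration only, not the authors' implementation: the record layout (`api`, `args`, `ret`), the token format, the vector size, the use of SHA-256, and the function names `bucket` and `featurize` are all assumptions made for this example.

```python
import hashlib
from collections import Counter

VECTOR_SIZE = 64  # hypothetical dimension; the abstract does not state the real size


def bucket(token: str, size: int) -> int:
    """Map an arbitrary string token to a stable index in [0, size)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def featurize(calls, size=VECTOR_SIZE):
    """Fold a list of API call records into a fixed-size vector.

    Each record is assumed (hypothetically) to look like
    {"api": str, "args": {name: value}, "ret": value}.
    """
    vec = [0.0] * size
    for call in calls:
        api = call["api"]
        # One token for the API name, one for its return value,
        # and one for each (argument name, argument value) pair.
        tokens = [f"api={api}", f"ret={api}:{call.get('ret')}"]
        for name, value in sorted(call.get("args", {}).items()):
            tokens.append(f"arg={api}:{name}={value}")
        for token in tokens:
            vec[bucket(token, size)] += 1.0
    # Add the per-API call count as a separate hashed feature.
    for api, n in Counter(c["api"] for c in calls).items():
        vec[bucket(f"count={api}", size)] += float(n)
    return vec


# Made-up records for illustration; not taken from the dataset.
sample_calls = [
    {"api": "CreateFileW",
     "args": {"filename": "C:\\temp\\a.txt", "access": "0x80000000"},
     "ret": "0x000001f4"},
    {"api": "CreateFileW",
     "args": {"filename": "C:\\temp\\b.txt", "access": "0x40000000"},
     "ret": "0xffffffff"},
    {"api": "RegOpenKeyExW", "args": {"subkey": "Software\\Run"}, "ret": "0"},
]
vector = featurize(sample_calls)
```

However many calls or arguments a sample contains, `vector` always has `VECTOR_SIZE` entries, so it can be fed directly to a classifier. The trade-off of any such hashing scheme is that distinct tokens may collide in the same bucket; a larger vector reduces collisions at the cost of a wider model input.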

Members

Yau Le Qi

Lam Yik Ting

Prannaya Gupta

Justin Lim

Ishneet Singh