MALPRED
Predictive Modeling for Malware Detection in Windows Systems using Ensemble Learning

Abstract

Malware infections are a pervasive problem for computers running the Windows operating system. In this study, we present a machine-learning-based approach to predicting the likelihood of malware infection on Windows machines. Our methodology involves data pre-processing, feature engineering, and feature selection on the Microsoft Malware Prediction dataset. We then experiment extensively with various machine learning algorithms and identify XGBoost, LightGBM, and CatBoost as the three best performers. With hyperparameter tuning via the Tree-Structured Parzen Estimator and a meta-learner on top of these three algorithms, our best model achieves an AUC score of 73.24% under stratified 5-fold cross-validation, demonstrating the efficacy of our approach. Additionally, we develop a web-based interface that lets users enter their Windows machine specifications and obtain a predicted probability of malware infection.
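To make the evaluation protocol concrete, the following is a minimal, library-free sketch of how an AUC score can be averaged over stratified 5-fold cross-validation. It is not the project's code: the function names (`stratified_folds`, `roc_auc`, `cross_validated_auc`) and the `fit_predict` callback are hypothetical, and the callback stands in for whatever model is being evaluated. In practice, scikit-learn's `StratifiedKFold` and `roc_auc_score` provide the same building blocks.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formula; ties share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def cross_validated_auc(labels, fit_predict, k=5, seed=0):
    """Average the validation AUC over k stratified folds.

    fit_predict(train_idx, val_idx) is a hypothetical callback: it should
    train on train_idx and return one score per index in val_idx.
    """
    folds = stratified_folds(labels, k, seed)
    per_fold = []
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        val_scores = fit_predict(train_idx, val_idx)
        per_fold.append(roc_auc([labels[i] for i in val_idx], val_scores))
    return sum(per_fold) / k, per_fold
```

Stratification matters here because each validation fold must contain both infected and clean machines for the AUC to be defined, and keeping the class ratio constant makes the per-fold scores comparable before they are averaged.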

Members

Yau Le Qi

Mahir Shah

Eugene Ang

Dhanvine Rameshkumar

Justin Chee

Lam Yik Ting

Prannaya Gupta

External Links

View Report