Detection of Windows Portable Executable Malware using NLP Techniques and Proxy-server


  • Maksym Mishchenko
  • Mariia Dorosh



cybersecurity, NLP, word2vec, proxy-server, machine learning, Windows Portable Executable, malware, ssdeep, LAN


This paper investigates the effectiveness of detecting viruses in Windows Portable Executable (PE) files using NLP, machine learning, and a network proxy server. The selected performance metrics are the accuracy and F1 score of virus type classification for a given file and the average time spent analyzing the file. To classify viruses, a static analysis of the Optional Header Directories section of the PE file is conducted. The list of imported libraries is vectorized using a word2vec model and passed to Random Forest Classifier, Support Vector Machine, and Multilayer Perceptron models for classification. The best result, a mean training accuracy of 94% and an F1 score of 0.94, is achieved by the Random Forest Classifier. To determine the effectiveness of malicious file detection, a local area network (LAN) of three computers and a proxy server is configured. Experiments on detecting malicious files through the proxy show a request time of 2.3 seconds for the Support Vector Machine, 2.28 seconds for the Multilayer Perceptron, and 2.6 seconds for the Random Forest Classifier. To reduce this delay, an ssdeep-based cache is introduced, which lowers it to 2.1 seconds for the Random Forest Classifier and 2.15 seconds for the Multilayer Perceptron. The F1 score obtained on the proxy evaluation data confirms, and exceeds, the F1 score obtained on the training dataset. These results support the feasibility of using a proxy server and NLP techniques to detect Windows Portable Executable malware.
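
The caching idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the proxy computes a digest of each file, reuses a stored verdict when the digest matches a known one closely enough, and calls the classifier only on a miss. All names here (`VerdictCache`, `toy_classifier`, the threshold of 90) are hypothetical. So that the example runs with the standard library alone, ssdeep is replaced by an exact SHA-256 digest and a 0/100 match score. A real fuzzy hash would also match near-identical files, and its hashing and comparison functions could be passed in through the `digest` and `score` parameters.

```python
import hashlib
from typing import Callable, List, Optional, Tuple


def sha256_digest(data: bytes) -> str:
    """Stand-in for a fuzzy hash: exact SHA-256 digest of the file bytes."""
    return hashlib.sha256(data).hexdigest()


def exact_match_score(a: str, b: str) -> int:
    """Stand-in similarity on a 0-100 scale: 100 for equal digests, else 0."""
    return 100 if a == b else 0


class VerdictCache:
    """Hypothetical cache of classifier verdicts keyed by file digest."""

    def __init__(
        self,
        classify: Callable[[bytes], str],
        digest: Callable[[bytes], str] = sha256_digest,
        score: Callable[[str, str], int] = exact_match_score,
        threshold: int = 90,  # assumed cut-off, not taken from the paper
    ) -> None:
        self._classify = classify
        self._digest = digest
        self._score = score
        self._threshold = threshold
        self._entries: List[Tuple[str, str]] = []
        self.classifier_calls = 0  # counts cache misses

    def lookup(self, digest: str) -> Optional[str]:
        """Return the verdict of the first sufficiently similar known digest."""
        for known, verdict in self._entries:
            if self._score(known, digest) >= self._threshold:
                return verdict
        return None

    def check(self, data: bytes) -> str:
        """Return a cached verdict if available, otherwise classify and store."""
        file_digest = self._digest(data)
        cached = self.lookup(file_digest)
        if cached is not None:
            return cached
        self.classifier_calls += 1
        verdict = self._classify(data)
        self._entries.append((file_digest, verdict))
        return verdict


def toy_classifier(data: bytes) -> str:
    """Placeholder for a trained model; not a real detection rule."""
    return "malicious" if b"EVIL" in data else "benign"
```

With this structure, a second request for the same file returns the stored verdict without invoking the classifier, which is where the time saving would come from. The linear scan in `lookup` is kept for clarity; a larger cache would need an index suited to the chosen similarity measure.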


