Ransomware Family Attribution With ML: A Comprehensive Evaluation of Datasets Quality, Models Comparison, and a Simulated Deployment
Academic Article in Scopus

abstract

  • Ransomware, a form of cyber extortion that holds data hostage for payment, poses a growing threat to global digital infrastructure, with the potential to disrupt supply chains, financial systems, and critical healthcare services. Although many detection methods have been proposed, critical gaps remain: most studies focus solely on binary classification, overlook dataset quality issues, and rarely evaluate model performance in practical deployment environments. These limitations lead to biased performance comparisons and hinder practical applicability. This study introduces a comprehensive evaluation framework that covers the entire pipeline, from dataset collection to model deployment. We begin by presenting a method for extracting features from ransomware execution logs and turning them into usable datasets. We then assess dataset quality in terms of class balance, feature independence, data timeliness, class separability, and feature variability. Next, we train four traditional Machine Learning (ML) models and two Deep Learning (DL) models, applying consistent preprocessing and hyperparameter tuning to ensure fair comparisons. Finally, we propose a real-time detection architecture in a Windows 10 virtual environment and use it to deploy the models against live ransomware. Based on a quality score $Q \in [0,1]$, our dataset analysis revealed that class separability (Formula presented), feature variability (Formula presented), and data timeliness (Formula presented) are the main factors hindering dataset quality. For detection, Random Forest (RF) achieved the highest offline accuracy (100%), surpassing the Multilayer Perceptron (MLP) at 99.48%, yet the MLP performed better in deployment, reaching 70% accuracy. These findings underscore the importance of comprehensive evaluation across all phases of model development and provide a foundation for building more robust and applicable detection systems. © 2013 IEEE.
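The abstract folds several dataset-quality criteria into a single score in [0, 1] but does not give the formulas. As a purely illustrative sketch (not the authors' method), three of the criteria can be scored on [0, 1] with standard measures: class balance as the normalized Shannon entropy of the labels, feature independence as one minus the mean absolute pairwise Pearson correlation, and feature variability as the share of non-constant features. The function names, the choice of measures, the unweighted mean, and the omission of timeliness and separability are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def class_balance_score(labels):
    """Normalized Shannon entropy of the labels: 1.0 means perfectly balanced."""
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class carries no balance information
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


def _pearson(xs, ys):
    """Population Pearson correlation; None when either column is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return None  # correlation is undefined for constant features
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def feature_independence_score(columns):
    """1 - mean |r| over all pairs of non-constant feature columns."""
    rs = []
    for a, b in combinations(columns, 2):
        r = _pearson(a, b)
        if r is not None:
            rs.append(min(1.0, abs(r)))  # clamp tiny floating-point overshoot
    return 1.0 - mean(rs) if rs else 1.0


def feature_variability_score(columns):
    """Fraction of feature columns whose values are not all identical."""
    if not columns:
        return 0.0
    return sum(1 for col in columns if len(set(col)) > 1) / len(columns)


def quality_score(columns, labels):
    """Hypothetical overall score: unweighted mean of the component scores."""
    parts = {
        "class_balance": class_balance_score(labels),
        "feature_independence": feature_independence_score(columns),
        "feature_variability": feature_variability_score(columns),
    }
    return mean(list(parts.values())), parts


if __name__ == "__main__":
    # Toy data, column-major: three features over four samples, two families.
    cols = [
        [1, 2, 3, 4],  # hypothetical feature, e.g. a count of file writes
        [2, 4, 6, 8],  # perfectly correlated with the first column
        [0, 0, 0, 0],  # constant, so it adds no variability
    ]
    labels = ["familyA", "familyA", "familyB", "familyB"]
    q, parts = quality_score(cols, labels)
    print(round(q, 3), parts)
```

In the toy run, the two balanced families give a class-balance score of 1.0, the perfectly correlated pair drives feature independence to roughly 0, and the constant column lowers variability to 2/3. Averaging keeps the result in [0, 1]; a real framework might weight the criteria differently.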

publication date

  • January 1, 2025