GSMT: An explainable semi-supervised multi-label method based on Gower distance
Academic Article in Scopus
Overview
abstract
The financial, health, and education sectors produce vast amounts of data daily. Labeling entries such as assets, patients, and students is both costly and complex because databases have evolved into multi-label settings. Handling real-world data requires automatic labeling, to avoid slow manual procedures, and explanations, to comply with regulations. In this work, we introduce GSMT, an inductive Explainable Semi-Supervised Multi-Label Random Forest Method based on Gower Distance, which uses labeled and unlabeled data to provide a non-linear solution for mainly tabular multi-label datasets with fully unknown label vectors. GSMT splits the dataset using multi-dimensional manifolds, completes missing label information, and inductively predicts labels for new observations while achieving explainability. We demonstrate state-of-the-art performance on Micro F1 Score, AUPRC, AUROC, and Label Rank Average Precision in a study involving 20 numerical and 5 mostly categorical datasets at five missing-data ratios. By leveraging unsupervised information on top of numerical and categorical data, GSMT outputs pattern rules annotated with performance measures, explanations in the attribute and label spaces, and an inductive model capable of predicting multi-label observations. © 2025 The Authors
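For readers unfamiliar with the distance named in the title, the following is a minimal sketch of the standard Gower distance between two mixed-type records. It is not the GSMT implementation, which the abstract does not describe in detail. The function name `gower_distance`, the `ranges` argument, and the sample patient records are assumptions made for illustration only. Numerical features contribute their absolute difference scaled by the feature's observed range, categorical features contribute 0 on a match and 1 otherwise, and the result is the unweighted mean over features.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
) -> float:
    """Unweighted Gower distance between two records.

    ranges[k] is the observed range (max - min) of numerical feature k,
    or None when feature k is categorical. (Hypothetical interface.)
    """
    if not ranges or not (len(a) == len(b) == len(ranges)):
        raise ValueError("records and ranges must be non-empty and equal length")

    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: simple mismatch indicator.
            total += 0.0 if x == y else 1.0
        elif r == 0:
            # Constant numerical feature: it cannot separate records.
            total += 0.0
        else:
            # Numerical feature: range-normalised absolute difference.
            total += abs(x - y) / r
    return total / len(ranges)


# Made-up records: (age, income, smoker status).
patient_a = (30.0, 50000.0, "yes")
patient_b = (50.0, 50000.0, "no")
feature_ranges = (40.0, 100000.0, None)

d = gower_distance(patient_a, patient_b, feature_ranges)
print(d)  # (20/40 + 0 + 1) / 3 = 0.5
```

Because each per-feature term lies in [0, 1], numerical and categorical columns contribute on a comparable scale, which is why this distance suits the mixed tabular data the abstract targets.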