Process monitoring for quality: A multiple classifier system for highly unbalanced data
Academic Article in Scopus

abstract

  • © 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). In big data-based analyses, because of hyper-dimensional feature spaces, there is no a priori basis for preferring one machine learning algorithm (MLA) over another. Therefore, multiple diverse algorithms should be included in the analysis to develop an adequate model for detecting and recognizing the patterns exhibited by the classes. If multiple classifiers are developed, the next natural step is to determine whether the prediction benchmark set by the top performer can be improved by combining them. In this context, multiple classifier systems (MCSs) are powerful solutions for difficult pattern recognition problems because they usually outperform the best individual classifier, and their diversity tends to improve resilience and robustness to high-dimensional and noisy data. To design an MCS, an appropriate fusion method is required to optimally combine the individual classifiers and determine the final decision. Process monitoring for quality is a Quality 4.0 initiative aimed at defect detection via binary classification. Because most mature organizations have merged traditional quality philosophies, their processes generate only a few defects per million opportunities. Therefore, manufacturing data sets for binary classification of quality tend to be highly or ultra-unbalanced. Detecting these rare quality events is one of the most relevant intellectual challenges posed to the fourth industrial revolution, Industry 4.0 (I 4.0). A new MCS aimed at analyzing these data structures is presented. It is based on eight well-known MLAs, an ad hoc fitness function, and a novel meta-learning algorithm. To predict the final quality class, this algorithm takes the predictions from a set of classifiers as input and determines which classifiers are reliable and which are not.
Finally, to demonstrate the superiority of the proposed meta-learning algorithm over extensively used fusion rules, multiple publicly available data sets are analyzed.
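
To make the idea of a fusion rule concrete, the following is a minimal sketch of majority voting, one of the extensively used rules the abstract alludes to, applied to a toy unbalanced data set. It is not the meta-learning algorithm of the article. The function names (`majority_vote`, `accuracy`, `recall`) and the toy labels are hypothetical and chosen only for illustration. The sketch also shows why accuracy alone can mislead when defects are rare.

```python
def majority_vote(predictions):
    """Fuse binary predictions (one row per classifier) by strict majority."""
    n_classifiers = len(predictions)
    n_samples = len(predictions[0])
    fused = []
    for j in range(n_samples):
        votes = sum(row[j] for row in predictions)
        # Label a sample as defective (1) only if more than half agree.
        fused.append(1 if 2 * votes > n_classifiers else 0)
    return fused


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true defects (label 1) that were detected."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# Toy data (assumed, not from the article): 18 good parts, 2 defects.
y_true = [0] * 18 + [1] * 2
clf_a = [0] * 20                    # never flags a defect
clf_b = [1] + [0] * 17 + [1, 1]     # finds both defects, one false alarm
clf_c = [0] * 18 + [1, 0]           # finds only the first defect

fused = majority_vote([clf_a, clf_b, clf_c])
print("accuracy of clf_a:", accuracy(y_true, clf_a))
print("recall of clf_a:  ", recall(y_true, clf_a))
print("recall of fusion: ", recall(y_true, fused))
```

In this toy case the classifier that never flags a defect still reaches 90% accuracy while detecting nothing, and the majority vote recovers only one of the two defects that the best single classifier finds. This is the kind of limitation that motivates fusion methods that weigh classifier reliability rather than counting votes equally.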

publication date

  • October 1, 2021