abstract
- © 2021 Elsevier Ltd. There are many fields of computing in which having access to large volumes of data allows very precise models to be developed. For example, machine learning employs a range of algorithms that deliver important insights based on analysis of data resources. Similarly, process mining develops algorithms that use event data induced by real-world processes to support the modeling of, and hence the understanding and long-term improvement of, those processes. In process mining, the quality of the learned process models is assessed using conformance checking techniques, which measure how well the models represent and generalize the data. This article presents the entropic relevance measure for conformance checking of stochastic process models, which are models that also provide information about the likelihood of observing each sequence of events. Accurate stochastic conformance measurement allows identification of models that describe the data better, including the captured sequences of process events and their frequencies. Information about the likelihood of the described processes is an essential step toward simulating and forecasting future processes. Entropic relevance represents a blend of the traditional precision and recall quality criteria in conformance checking, in that it penalizes both observed processes that the model does not describe and processes that are permitted by the model yet were not observed. Entropic relevance can be computed in time linear in the size of the input data, and it measures a fundamentally different phenomenon from other existing measures. Our evaluation over industrial datasets confirms the feasibility of using the measure in practice.