Abstract
Mexican Sign Language (MSL) is the primary form of communication for the deaf community in Mexico. MSL has a different grammatical structure than Spanish; furthermore, facial expression plays a determining role in complementing context-based meaning. This makes it difficult for a hearing person without prior knowledge of the language to understand what is being communicated, representing an important communication barrier for deaf people. To address this, we present the first architecture to consider facial features as indicators of grammatical tense in order to develop a real-time interpreter from MSL to written Spanish. Our model uses the open-source MediaPipe library to extract landmarks from the face, body pose, and hands. Three 2D convolutional neural networks individually encode each landmark stream and extract patterns; the networks converge into a multilayer perceptron for classification. Finally, a Hidden Markov Model morphosyntactically predicts the most probable sequence of words based on a preloaded knowledge base. From the experiments carried out, a precision of 94.9% (σ = 0.07) was obtained for the recognition of 75 isolated words, and 94.1% (σ = 0.09) for the interpretation of 20 MSL sentences in a medical context. Since our approach relies only on camera input and achieves adequate generalization even with few samples, it would be feasible to scale our architecture to other sign languages and offer efficient communication to millions of people with hearing disabilities.
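
As a rough illustration of the landmark-extraction step, the sketch below shows how face, body-pose, and hand landmarks could be pulled from a live camera feed with MediaPipe. The abstract names only the MediaPipe library; the choice of the Holistic solution, the confidence thresholds, and the capture loop are assumptions made here for illustration.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture(0)  # live camera input, as in the proposed approach
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures frames in BGR
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # Landmark sets that would feed the three encoder branches:
        #   results.face_landmarks       -> 468 points (facial expression)
        #   results.pose_landmarks       -> 33 points (body position)
        #   results.left_hand_landmarks / results.right_hand_landmarks
        #                                -> 21 points per hand
cap.release()
```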
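Likewise, a minimal Keras sketch of the described topology follows: three 2D CNN encoders, one per landmark stream, converging into a multilayer perceptron that classifies the 75 isolated words. All input shapes, layer sizes, and hyperparameters here are hypothetical; the abstract specifies only the overall three-branch-plus-MLP structure.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def cnn_branch(input_shape, name):
    """One 2D-CNN encoder for a single landmark stream (shapes are hypothetical)."""
    inp = layers.Input(shape=input_shape, name=name)
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

# Hypothetical "frames x landmarks x coordinates" tensors per stream
face_in, face_out = cnn_branch((30, 468, 3), "face")
pose_in, pose_out = cnn_branch((30, 33, 3), "pose")
hand_in, hand_out = cnn_branch((30, 42, 3), "hands")  # both hands stacked

# The three encoders converge into an MLP classifier (75 isolated words)
merged = layers.concatenate([face_out, pose_out, hand_out])
x = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(75, activation="softmax")(x)

model = Model([face_in, pose_in, hand_in], out)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```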