CoLI@FIRE2024: Findings of Word-level Code-Mixed Language Identification in Dravidian Languages
Academic Article in Scopus
Overview
abstract
Code-mixing, a linguistic phenomenon where multiple languages are blended within a single text, has become increasingly prevalent in multilingual societies, particularly in digital communication. The CoLI-Dravidian shared task, organized as part of the Forum for Information Retrieval and Evaluation (FIRE) 2024, aimed to address the challenges this poses by inviting researchers to develop models capable of classifying words in code-mixed texts in which Dravidian languages (Tamil, Kannada, Malayalam, and Tulu) are interwoven with English. The task is difficult because of the complexity of the linguistic structures, mixed-language tokens, and dialectal variations involved, especially in low-resource languages like those in the Dravidian family. The participating teams employed various methodologies, including traditional Machine Learning (ML), Deep Learning (DL), and transformer-based models. This paper presents the main findings of the task, the baselines, and an overview of the submitted methodologies. The top-performing models achieved macro F1 scores ranging from 0.7656 for Tamil to 0.9293 for Kannada, demonstrating that advanced computational techniques can process these complex multilingual texts effectively. © 2024 Copyright held by the owner/author(s).