abstract
- This study explores a multilingual transfer learning strategy for Named Entity Recognition (NER) in mammography radiology reports, with the aim of improving breast cancer diagnosis. Using a dataset from TecSalud that comprises mammograms and Electronic Health Records (EHRs) spanning ten years, it addresses linguistic barriers in medical documentation through advanced Natural Language Processing (NLP) models. Our approach involves meticulously labeling twenty-four distinct entities within the predominantly Spanish dataset, covering a range of diagnostic features and interpretive findings. This highlights both the challenge of linguistic diversity in medical records and the potential of NLP to bridge this gap. The results demonstrate that fine-tuning the last layer offers a balance between simplicity and accuracy, avoiding overfitting and achieving state-of-the-art results. © 2024 IEEE.