Abstract
- Taxonomies play a critical role in structuring knowledge within rapidly evolving domains such as professional skills. Manual taxonomy management is labor-intensive and struggles to keep pace with the rapid emergence of new concepts. To address these issues, we propose a novel semi-supervised approach that leverages Retrieval-Augmented Generation (RAG) for taxonomy expansion and completion, tailored in particular to dynamic skill-based taxonomies. We also contribute a comprehensive dataset derived from automotive-sector job postings, designed specifically to evaluate taxonomy expansion and completion tasks. Our method combines the precision of retrieval-based mechanisms with the flexibility of generative models, enabling accurate and efficient updates to taxonomy structures. Evaluated on this dataset, the method achieved an overall accuracy of 78%. The model performed robustly on horizontal expansion, accurately recognizing variations of existing concepts. However, it showed limitations on vertical expansion, especially in identifying entirely new categories. These findings underline the need for improved data representation strategies and for contextual enrichment to make taxonomies more robust. © 2025 by the authors.