Multi-modal detection transformer with data engineering technique to stratify patients with Ulcerative Colitis Academic Article in Scopus

abstract

  • Data engineering has become a powerful tool for machine learning applications over the last few years. In computer vision for generative AI, the need for large amounts of training data has become a significant bottleneck. Data augmentation is useful but has limitations. Training models with synthetic data is a solution because data creation is flexible and scalable. One of the biggest challenges in creating a synthetic dataset, however, is generating accurate and valuable data whose samples faithfully represent the image or area of interest; validation of the dataset by subject matter experts therefore becomes crucial. Applications that can use this method include image captioning, question answering, generative AI, and multimodal problems in general. Image captioning requires an image and its description. A Region-of-Interest (ROI) in the image can be associated with text that describes it, forming a multimodal relationship between the ROI and its description. This work proposes a methodology for automatically constructing a multimodal ROI-description dataset for Ulcerative Colitis (UC). To meet the requirement of a large dataset for model training, we introduce Stable Diffusion to generate images that represent the patch-level characteristics widely used to assign a sample a Mayo Endoscopic Score (MES). We utilise the clinically accepted phenotypes for informed decision-making: ulcers, bleeding, and erosions. We use this dataset to train a transformer-based detection pipeline (DETR) that finds these characteristics inside the raw UC image, generates an ROI for each, and associates it with a text template that describes the region. Finally, we compare our results against a baseline ROI dataset that medical experts have validated. © 2024 IEEE.
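The last step of the pipeline described in the abstract, associating a detected ROI with a text template, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` class, the `describe_rois` function, the template wording, and the 0.5 score threshold are all hypothetical, and the detections are hard-coded where a trained DETR-style detector would normally supply them.

```python
from dataclasses import dataclass

# Phenotypes named in the abstract; the template wording is an assumption.
PHENOTYPES = ("ulcer", "bleeding", "erosion")
TEMPLATE = "Region at (x={x}, y={y}, w={w}, h={h}) labelled as {label}."


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical structure)."""
    label: str
    score: float
    box: tuple  # (x, y, width, height) in pixels


def describe_rois(detections, min_score=0.5):
    """Return (box, text) pairs for confident detections of known phenotypes."""
    pairs = []
    for det in detections:
        # Skip labels outside the phenotype list and low-confidence detections.
        if det.label not in PHENOTYPES or det.score < min_score:
            continue
        x, y, w, h = det.box
        text = TEMPLATE.format(x=x, y=y, w=w, h=h, label=det.label)
        pairs.append((det.box, text))
    return pairs


# Hard-coded stand-ins for detector output.
example = [
    Detection("ulcer", 0.91, (40, 60, 32, 28)),
    Detection("bleeding", 0.30, (10, 10, 20, 20)),  # dropped: below threshold
]

for box, text in describe_rois(example):
    print(text)
```

Each returned pair links an image region to a sentence, which is the ROI-description relationship the abstract refers to; a real dataset builder would also store the image identifier alongside each pair.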

publication date

  • January 1, 2024