Enhancing Educational Innovation: Evaluating the Accuracy of AI-Generated Assessments for Engineering Course Reports
Chapter in Scopus

abstract

  • The use of Artificial Intelligence (AI) for a wide range of tasks is currently an area of significant interest. This work evaluates the accuracy of AI, specifically ChatGPT-4, in scoring and providing feedback on Civil Engineering laboratory practice reports, and examines how language affects AI performance and results. The study revealed that the AI's scoring is limited when reports include images and tables containing important information in a non-editable format. Additionally, when the technology was evaluated in different languages (English, French, and Spanish), its ability to deliver assertive, humanized feedback and accurate scoring depended significantly on the language: English was the most effective, while French was the least effective. Image-recognition limitations can be mitigated through appropriate prompt design together with continuous verification and adaptation; similarly, the language limitations of Generative Pre-trained Transformer technology can be overcome with proper training and information feeding. Although ChatGPT-4 demonstrated the capacity to generate scores and feedback, this work emphasizes its current limitations, including the need for adequate training and for ongoing verification and direction from educators to ensure accurate AI responses. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

publication date

  • January 1, 2025