Squacc BiLSTM: a framework for dense video captioning using neural knowledge graph and deep learning Academic Article in Scopus

abstract

  • Dense Video Captioning produces a thorough description of a video by accurately identifying frame-level details, including the objects and their movements. However, most existing models fail to learn the visual context information and linguistic clues present in the video, which limits the model's cognitive capacity to produce descriptions. Therefore, in this research, automatic dense video captioning is performed using the Squacc Bidirectional Long Short-Term Memory (Squacc BiLSTM) model, in which a neural knowledge graph (NKG) is generated based on a recurrent neural network. The generated NKG unveils the temporal video features, which support the prediction of the video contents for captioning by the Squacc BiLSTM classifier. Furthermore, the Squacc optimization algorithm fine-tunes the classifier parameters, helping the model capture the past and future contexts of the video for precise captioning. The experimental results demonstrate that the proposed Squacc BiLSTM model is effective for video captioning, achieving BLEU, ROUGE, CIDEr, METEOR, and SPICE scores of 0.439, 0.511, 0.759, 0.264, and 19.994, respectively, outperforming existing techniques. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
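
  The abstract outlines but does not detail the architecture. The sketch below, in PyTorch, illustrates only the bidirectional-LSTM idea the abstract relies on (encoding frame features in both temporal directions so the decoder sees past and future context); the class name BiLSTMCaptioner, all layer sizes, and the mean-pooled fusion are assumptions for illustration, and neither the Squacc optimizer nor the NKG construction from the paper is reproduced.

    import torch
    import torch.nn as nn

    class BiLSTMCaptioner(nn.Module):
        """Minimal BiLSTM caption decoder over pre-extracted frame features.

        Hypothetical sketch: dimensions and the fusion scheme are illustrative
        assumptions, not the paper's Squacc BiLSTM specification.
        """

        def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000, embed_dim=300):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # Bidirectional LSTM reads the frame sequence forward and backward,
            # mirroring the abstract's "past and future contexts" claim.
            self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                                   bidirectional=True)
            self.decoder = nn.LSTM(embed_dim + 2 * hidden_dim, hidden_dim,
                                   batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, frame_feats, captions):
            # frame_feats: (B, T, feat_dim); captions: (B, L) token ids
            enc, _ = self.encoder(frame_feats)        # (B, T, 2*hidden_dim)
            ctx = enc.mean(dim=1, keepdim=True)       # mean-pooled video context
            emb = self.embed(captions)                # (B, L, embed_dim)
            ctx = ctx.expand(-1, emb.size(1), -1)     # repeat context per step
            dec, _ = self.decoder(torch.cat([emb, ctx], dim=-1))
            return self.out(dec)                      # (B, L, vocab_size) logits

    # Usage with random tensors standing in for real video features.
    model = BiLSTMCaptioner()
    feats = torch.randn(2, 16, 2048)          # 2 clips, 16 frames each
    caps = torch.randint(0, 10000, (2, 12))   # 2 caption prefixes, 12 tokens
    logits = model(feats, caps)               # (2, 12, 10000)

  In this toy setup the bidirectional encoder is what lets each decoding step condition on information from frames both before and after the current moment, which is the property the abstract attributes to the BiLSTM component.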

publication date

  • January 1, 2025