Competencies Development in YOLO-CNN and Stereo Camera Vision to Enhance Bin Picking in Simulated Environments Academic Article in Scopus

abstract

  • In the age of swift technological advancements, precise object classification and arrangement, termed 'bin picking', is essential. The challenge lies in accurately identifying and positioning objects that differ in color, size, and orientation within simulated environments. This work presents a comparative analysis of two distinct methodologies: semantic segmentation through YOLO integrated with Convolutional Neural Networks (CNN), and a stereo-camera approach that combines segmentation with pose estimation. Both methodologies are implemented with Python frameworks in a virtual environment. To assess the efficiency of each method, evaluation metrics such as the precision-recall curve, mean average precision (mAP 50-95), and Euclidean distance are employed. The main objective of this paper is to quantitatively measure the gain in specific competencies and skills among postgraduate students resulting from the development of two bin-picking algorithms in the field of computer vision. These competencies and skills, such as advanced programming proficiency, problem-solving awareness, and in-depth knowledge of computer vision, are assessed through pre- and post-development evaluations consisting of self-assessment surveys. Additionally, we aim to highlight the educational impact of acquiring these competencies and skills, demonstrating how they can enhance students' abilities and understanding, ultimately benefiting their academic and professional pursuits. The study focuses on segmentation and pose estimation using synthetic scenarios. Results with YOLO show 100% precision and recall rates due to a well-defined testing set, with negligible translation errors for most shapes. The stereo vision assessment compares estimated centroids with ground-truth values. Pose estimation in the XY plane is prioritized for bin-picking purposes, and the notable distances observed between estimated and ground-truth centroids indicate potential for improvement. Participant competency development reveals significant growth in relevant skills. Overall, the study highlights the efficacy of synthetic data in deep learning, suggests further research on pose estimation algorithms, and provides valuable insights into stereo vision development. In educational contexts, bin picking extends beyond technical knowledge, providing students with a distinctive perspective on how machines perceive and engage with the world. This research, rooted in the fundamental principles of both algorithms, not only enriches knowledge dissemination but also enhances learners' understanding of automation and its practical applications. © 2024 IEEE.
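
The centroid comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `centroid_error`, the (x, y, z) point layout, and the sample coordinates are all assumptions made for illustration. The sketch computes the Euclidean distance between an estimated centroid and its ground-truth value, both in full 3D and restricted to the XY plane that the abstract says is prioritized for bin picking.

```python
import math


def centroid_error(estimated, ground_truth):
    """Return (xyz_error, xy_error) for two 3D centroids.

    Hypothetical helper: both points are assumed to be (x, y, z)
    tuples expressed in the same units and coordinate frame.
    """
    if len(estimated) != 3 or len(ground_truth) != 3:
        raise ValueError("expected 3D points of the form (x, y, z)")
    # Full 3D Euclidean distance between the two centroids.
    xyz_error = math.dist(estimated, ground_truth)
    # Distance restricted to the XY plane (z is ignored).
    xy_error = math.dist(estimated[:2], ground_truth[:2])
    return xyz_error, xy_error


# Illustrative values only; they are not taken from the paper.
estimated = (3.0, 4.0, 12.0)
ground_truth = (0.0, 0.0, 0.0)
xyz_err, xy_err = centroid_error(estimated, ground_truth)
print(f"3D error: {xyz_err:.3f}, XY error: {xy_err:.3f}")
```

Reporting the XY error separately from the 3D error makes it possible to see how much of the total deviation comes from depth, which may matter less when a gripper approaches a bin from above.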

publication date

  • January 1, 2024