Effective Multi-Object of Interest-Based Video Summarization Using Deep Learning: A Smart Technique for User’s Desired Content Optimization
Journal article
Authors/Editors
Strategic research area
Publication details
Authors: Hafiz Burhan Ul Haq; Watcharapan Suwansantisuk; Kosin Chamnongthai
Publisher: Institute of Electrical and Electronics Engineers
Publication year (C.E.): 2025
Journal: IEEE Access (2169-3536)
Volume number: 13
First page: 208563
Last page: 208589
Number of pages: 27
ISSN: 2169-3536
eISSN: 2169-3536
Language: English-Great Britain (EN-GB)
Abstract
Video summarization is an essential technique for effective content management, owing to the exponential increase in video data across several domains. However, analyzing a video to obtain specific content is time-consuming and complicated, and it requires high-speed computing power, along with expert knowledge, to extract content according to user requirements. To address these issues, this study presents a flexible method that summarizes a video according to the user’s requirements for the object of interest. The proposed method enhances user satisfaction and efficiency by keeping only the video frames relevant to the user’s interests. It integrates two major modules designed in this study: Multiple Object of Interest-based Video Summarization (MOoIVS) and Shirt Color Object of Interest-based Video Summarization (SCOoIVS). In addition, redundant frame elimination was performed using the proposed temporal frame pruning technique to remove repetitive frames and shorten the video summary. Moreover, techniques such as temporal scaling and random temporal downsampling were proposed and implemented to produce a summary video whose length is within the duration required by the user. Finally, a thorough experimental study was conducted on three datasets: the widely used SumMe dataset, a custom dataset for multiple-object detection (CD1), and a custom dataset for shirt-color detection (CD2). After redundant frame elimination, the proposed method on the CD1 dataset achieved 94.32% accuracy with a 75.98% Summarization Rate (SMR) using YOLOv3 as the object detector model, and a higher accuracy of 94.70% with 75.86% SMR using YOLOv5. In terms of providing actual content, YOLOv5 delivered 87.19% of the actual content in the summarized video, outperforming YOLOv3, which delivered 86.75%.
On the other hand, when tested on the SumMe dataset, the proposed video summarization method performed as follows when paired with different object detector models: 90.40% accuracy, 76.61% SMR, and 72.14% actual content for YOLOv3; 89.93% accuracy, 76.16% SMR, and 73.54% actual content for YOLOv5; and 79.54% accuracy, 85.13% SMR, and 46.86% actual content for the Single Shot Detector (SSD). The low actual-content percentage for SSD indicates that the proposed video summarization method should be paired with either YOLOv3 or YOLOv5 for superior performance. Furthermore, we evaluated the ability of the proposed method to provide a video summary that meets the user’s requirement of selecting persons wearing a specific shirt color. To achieve this, we retrained YOLOv5 on the shirt-color dataset to recognize different shirt colors. The evaluation results showed that this customized YOLOv5 performed exceedingly well on the CD2 dataset, achieving 97.90% accuracy and 67.86% SMR with an actual content of 96.03%. Finally, compared with state-of-the-art video summarization methods, the proposed method is superior in terms of accuracy, SMR, and object-selection flexibility. The proposed video summarization method is simple and has practical applications in surveillance and digital marketing, which deal with large videos. © 2013 IEEE.
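The abstract describes random temporal downsampling as a way to fit the summary within a user-specified duration. The paper's exact algorithm is not given here; the following is a minimal sketch of the general idea under stated assumptions: `frames` is an ordered list of frame identifiers surviving earlier pruning, and frames are dropped uniformly at random while the survivors keep their temporal order. All names and parameters are hypothetical, not taken from the paper.

```python
import random

def random_temporal_downsample(frames, fps, target_seconds, seed=None):
    """Randomly drop frames so the summary fits the requested duration.

    frames: ordered sequence of frame identifiers (hypothetical input).
    fps: frame rate of the output summary.
    target_seconds: maximum duration requested by the user.
    """
    target_count = int(target_seconds * fps)
    if len(frames) <= target_count:
        return list(frames)  # already within the requested duration
    rng = random.Random(seed)
    # Sample which positions to keep, then sort to preserve temporal order.
    keep = sorted(rng.sample(range(len(frames)), target_count))
    return [frames[i] for i in keep]
```

A deterministic alternative would be to keep every k-th frame (temporal scaling); the random variant avoids a fixed stride pattern in the retained frames.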
Keywords
No relevant data found