Novel American Sign Language fingerspelling recognition in the wild with weakly supervised learning and feature embedding
Conference proceedings article
Authors/Editors
Strategic research theme
Publication details
Authors: Pannattee P., Kumwilaisak W., Hansakunbuntheung C., Thatphithakkul N.
Publisher: Elsevier
Publication year (A.D.): 2021
First page: 291
Last page: 294
Number of pages: 4
ISBN: 9780738111278
ISSN: 0928-4931
eISSN: 1873-0191
Language: English-Great Britain (EN-GB)
Abstract
This paper presents a new method for fingerspelling recognition in highly dynamic video sequences, where sign language videos are labeled only at the video-sequence level. A deep learning network extracts spatial features from video frames with AlexNet and uses them to derive a language model with a Long Short-Term Memory (LSTM) network; its outputs are the predicted fingerspelling gestures at the frame level. Recognition results from testing video sequences recognized with 100 percent accuracy are used to improve the spatial features of the video frames: from these first-pass recognition results we construct a Siamese network, whose deployed backbone is ResNet-50, and employ it to derive an efficient representation of each fingerspelling gesture. The derived features for each video frame are fed to the LSTM network to predict fingerspelling gestures. In our experiments, the proposed method outperforms state-of-the-art fingerspelling recognition algorithms by almost four percent in recognition accuracy. © 2021 IEEE.
Keywords
Feature embedding, Fingerspelling recognition, Siamese network, Weakly supervised learning
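Illustrative sketch

As a concrete picture of the two-pass pipeline described in the abstract, the following PyTorch sketch shows how the pieces could fit together: an AlexNet-plus-LSTM recognizer for the first pass, and a ResNet-50 Siamese branch trained with a contrastive loss on confidently recognized first-pass sequences for the second pass. This is a reconstruction under stated assumptions, not the authors' implementation; the number of classes, embedding dimension, hidden size, and contrastive margin are illustrative choices.

# Minimal sketch (not the authors' code) of the two-stage pipeline described in the
# abstract. All layer sizes, the number of fingerspelling classes, and the
# contrastive margin are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 26   # assumed: one class per fingerspelled letter
EMBED_DIM = 128    # assumed embedding size for the Siamese branch


class FrameLSTMRecognizer(nn.Module):
    """First pass: AlexNet spatial features per frame, LSTM over the sequence."""

    def __init__(self, feat_dim=4096, hidden=256):
        super().__init__()
        alexnet = models.alexnet(weights=None)
        self.backbone = alexnet.features
        self.avgpool = alexnet.avgpool
        # Drop AlexNet's final classification layer to keep 4096-d frame features.
        self.fc = nn.Sequential(*list(alexnet.classifier.children())[:-1])
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, frames):            # frames: (B, T, 3, 224, 224)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)          # fold time into the batch dimension
        x = self.fc(torch.flatten(self.avgpool(self.backbone(x)), 1))
        x = x.view(b, t, -1)              # per-frame spatial features
        out, _ = self.lstm(x)
        return self.head(out)             # frame-level gesture logits: (B, T, C)


class SiameseEmbedder(nn.Module):
    """Second pass: ResNet-50 branch of a Siamese network for gesture embeddings."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)
        resnet.fc = nn.Linear(resnet.fc.in_features, EMBED_DIM)
        self.branch = resnet              # shared weights for both inputs of a pair

    def forward(self, x1, x2):
        return self.branch(x1), self.branch(x2)


def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Standard contrastive loss; pairs would be drawn from frames of sequences
    recognized with 100 percent accuracy in the first pass (assumed pairing)."""
    d = torch.norm(z1 - z2, dim=1)
    pos = same_label * d.pow(2)
    neg = (1 - same_label) * torch.clamp(margin - d, min=0).pow(2)
    return 0.5 * (pos + neg).mean()

In this sketch the Siamese embeddings would replace the AlexNet features as LSTM inputs in a second recognition pass, mirroring the feature-embedding step the abstract describes.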