Sang H, Hai G. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. EJAS [Internet]. 2019 Sep 8 [cited 2026 Feb 4];7(4):17-30. Available from: http://116.203.177.230/index.php/AIVP/article/view/6717