SANG, H.; HAI, G. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. European Journal of Applied Sciences, [S. l.], v. 7, n. 4, p. 17–30, 2019. DOI: 10.14738/aivp.74.6717. Available at: http://116.203.177.230/index.php/AIVP/article/view/6717. Accessed: 4 Feb. 2026.