[1]

Sang, H. and Hai, G. 2019. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. European Journal of Applied Sciences. 7, 4 (Sep. 2019), 17–30. DOI:https://doi.org/10.14738/aivp.74.6717.