Sang, H. and Hai, G. (2019) “A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description”, European Journal of Applied Sciences, 7(4), pp. 17–30. doi: 10.14738/aivp.74.6717.