Sang, H. and Hai, G. (2019) “A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description”, European Journal of Applied Sciences, 7(4), pp. 17–30. doi: 10.14738/aivp.74.6717.