(1) Sang, H.; Hai, G. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. EJAS 2019, 7, 17-30.