(1) Sang, H.; Hai, G. A Framework: Region-Frame-Attention-Compact Bilinear Pooling Layer Based S2VT For Video Description. EJAS 2019, 7, 17-30.