Journal: ACM Transactions on Multimedia Computing, Communications, and Applications

Rich Visual and Language Representation with Complementary Semantics for Video Captioning

Abstract

Translating a video into natural-language sentences that describe its content is both interesting and challenging. In this work, an advanced framework is built to generate coherent, semantically rich sentences for video captioning. First, a long short-term memory (LSTM) network with an improved factored way is developed, inspired by the LSTM with a conventional factored way and by the common practice of feeding multi-modal features into the LSTM at the first time step for visual description. Then, the LSTM network with the proposed improved factored way is combined with the un-factored way, and a voting strategy is used to predict candidate words. In addition, for robust and abstract visual and language representation, residuals, an idea taken from the residual network (ResNet), are employed to enhance the gradient signals, and a deeper LSTM network is constructed. Furthermore, three convolutional neural network features, extracted from GoogLeNet, ResNet101, and ResNet152, are fused to capture more comprehensive and complementary visual information. Experiments are conducted on two benchmark datasets, MSVD and MSR-VTT2016, and the proposed techniques achieve competitive performance compared with other state-of-the-art methods.
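The abstract names two mechanisms without giving their details: fusing features from three CNNs, and voting between the factored and un-factored LSTM outputs to choose a word. The sketch below is only an illustration under stated assumptions, not the authors' method: fusion is assumed to be plain concatenation, voting is assumed to be an average of the models' word probabilities, and the names `fuse_features` and `vote_next_word` are hypothetical.

```python
from typing import Dict, List


def fuse_features(*feature_vectors: List[float]) -> List[float]:
    """Fuse per-network features by concatenation (assumed fusion rule)."""
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def vote_next_word(*distributions: Dict[str, float]) -> str:
    """Choose the next word by averaging model probabilities (assumed voting rule)."""
    # Collect every word that any model assigns a probability to.
    vocab = set().union(*distributions)
    n = len(distributions)
    # A word missing from one model's distribution counts as probability 0.
    averaged = {w: sum(d.get(w, 0.0) for d in distributions) / n for w in vocab}
    # Iterating in sorted order makes ties resolve alphabetically.
    return max(sorted(averaged), key=averaged.get)


# Toy stand-ins for GoogLeNet, ResNet101, and ResNet152 feature vectors.
fused = fuse_features([1.0, 2.0], [3.0], [4.0, 5.0])

# Toy next-word distributions from the factored and un-factored LSTMs.
factored = {"a": 0.5, "man": 0.3, "dog": 0.2}
unfactored = {"a": 0.2, "man": 0.6, "dog": 0.2}
chosen = vote_next_word(factored, unfactored)
```

In this toy case the two models disagree on the top word, and the averaged scores favor "man". A real system would apply the vote at every decoding step over the full vocabulary, and the paper may weight or combine the models differently.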
