Source: Microelectronics Journal

A pipelined area-efficient and high-speed reconfigurable processor for floating-point FFT/IFFT and DCT/IDCT computations

Abstract

For scientific computing and high-resolution imaging applications, this paper presents a pipelined reconfigurable processor that implements variable-length, single-precision floating-point FFT/IFFT and DCT/IDCT computations compatible with the IEEE 754 standard. To minimize the total hardware overhead and power consumption, a reconfigurable radix-4 butterfly (RR4BF) is proposed that uses 75% fewer adders than the conventional parallel radix-4 butterfly. A partially shared Ping-Pong structured register bank (PSPPRB) provides an efficient, dedicated intermediate-data caching mechanism that maximizes the adder resource utilization ratio in the RR4BF and guarantees high throughput for the pipelined design. Moreover, a fused floating-point 4-input adder and a fused floating-point 2-term dot product unit are proposed, which not only improve the signal-to-quantization-noise ratio (SQNR) by about 3 dB, but also save 28% and 19% of the hardware overhead compared with discrete implementations and the previous state-of-the-art design, respectively. Simulation results show that the latency for FFT computations is about 25% of that of the R4SDF design without any throughput loss, and that an SQNR of over 139 dB is achieved. Logic synthesis results in a 65 nm CMOS technology show that the power consumption ranges from 43.5 mW to 372.3 mW for 16- to 1024-point FFTs at 500 MHz, and that the total hardware overhead is equivalent to 543k NAND2 gates. © 2015 Elsevier Ltd. All rights reserved.
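For orientation, the arithmetic of a textbook radix-4 butterfly and one common definition of SQNR can be sketched as follows. This is a minimal software illustration only, not the RR4BF hardware or the fused units from the paper, whose internals the abstract does not describe. The function names (`radix4_butterfly`, `dft4`, `sqnr_db`) are hypothetical, twiddle-factor multiplication is left out, and the SQNR formula is the usual signal-power-over-error-power ratio, assumed here rather than taken from the paper.

```python
import cmath
import math


def radix4_butterfly(a, b, c, d, inverse=False):
    """Textbook 4-point DFT of (a, b, c, d), without twiddle factors.

    Uses 8 complex additions/subtractions (16 real ones) and a single
    multiplication by -j (or +j for the inverse), which is only a swap of
    the real and imaginary parts with one sign change.
    The inverse result is not scaled by 1/4.
    """
    # Stage 1: four complex additions/subtractions.
    s0 = a + c
    s1 = a - c
    s2 = b + d
    s3 = b - d
    # -j * s3, computed without a multiplier.
    rot = complex(s3.imag, -s3.real)
    if inverse:
        rot = -rot  # +j * s3 for the inverse transform
    # Stage 2: four more complex additions/subtractions.
    return (s0 + s2, s1 + rot, s0 - s2, s1 - rot)


def dft4(x):
    """Direct 4-point DFT by definition, used only as a reference."""
    return tuple(
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 4) for n in range(4))
        for k in range(4)
    )


def sqnr_db(reference, measured):
    """SQNR in dB: 10*log10(signal power / error power) (assumed definition)."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    x = (1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j)
    print(radix4_butterfly(*x))
    print(sqnr_db(dft4(x), radix4_butterfly(*x)))
```

Under this definition, a gain of about 3 dB corresponds to roughly halving the quantization-noise power, since 10·log10(2) ≈ 3.01.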
