Journal: IEEE Transactions on Parallel and Distributed Systems

Loop Optimization for Divergence Reduction on GPUs with SIMT Architecture

Abstract

The single-instruction, multiple-thread (SIMT) architecture found in some of the latest graphics processing units (GPUs) builds on conventional single-instruction, multiple-data (SIMD) parallelism while adopting the thread programming model. The architecture suffers from degraded performance caused by inefficient divergence handling, a problem hidden by the programmer's view of independent threads. This work investigates a loop optimization technique that has the potential to increase the efficiency of the core SIMD block while it processes embedded divergences. Concurrent loops are generally not bound to iterate in lock-step, which allows thread flows to be aligned better via iteration scheduling. The efficiency of the concept is analyzed for fixed and flow-adapting scheduling policies. The proposed payoff model captures the implications of loop overhead, allowing one to assess the tradeoffs of applying the technique to a specific loop instance. As several examples demonstrate, speedups in total running time can generally be observed when kernels are compute-bound. The studied iteration scheduling policies do not require changes to the core SIMD concept and design, and thus preserve the benefits of data-level parallelism.
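To make the idea of aligning thread flows concrete, the following toy model may help. It is only an illustrative sketch and not the scheduling policy or payoff model from the paper: the function names `lockstep_passes` and `scheduled_passes`, the two-branch loop body, and the majority-vote rule are all assumptions introduced here. Each thread's loop is represented as a string of branch labels, and the model counts how many serialized passes the SIMD block needs when threads iterate in lock-step versus when a thread may be held back until its next iteration matches the branch the group executes.

```python
def lockstep_passes(flows):
    """Count serialized passes when all threads run iteration i together.

    flows[t][i] is the branch label thread t takes in iteration i.
    In lock-step, every distinct branch in an iteration costs one pass.
    """
    n_iter = len(flows[0])
    return sum(len({flow[i] for flow in flows}) for i in range(n_iter))


def scheduled_passes(flows):
    """Count passes under a hypothetical flow-adapting policy.

    Each pass executes one branch, chosen by majority vote over the next
    pending iteration of every unfinished thread. Only threads whose next
    iteration takes that branch advance; the others wait.
    """
    pos = [0] * len(flows)  # next pending iteration of each thread
    passes = 0
    while any(p < len(f) for p, f in zip(pos, flows)):
        pending = [f[p] for p, f in zip(pos, flows) if p < len(f)]
        # Majority vote; ties resolve to the alphabetically first label.
        branch = max(sorted(set(pending)), key=pending.count)
        for t, f in enumerate(flows):
            if pos[t] < len(f) and f[pos[t]] == branch:
                pos[t] += 1
        passes += 1
    return passes


# Two threads whose branch patterns are offset by one iteration.
flows = ["ABAB", "BABA"]
print(lockstep_passes(flows))   # 8
print(scheduled_passes(flows))  # 5
```

In this example, letting one thread lag by a single iteration aligns the remaining branches, so most passes keep both lanes busy. The model deliberately ignores the per-iteration bookkeeping cost of such scheduling, which is the loop overhead the abstract says the payoff model accounts for, so the pass counts should not be read as predicted speedups.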