A Performance Model for GPUs with Caches

Abstract

To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution across the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques that improves the accuracy of the linear model and is applicable to modern GPUs both with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies, we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching, as implemented in modern GPUs, have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel, as well as the GPU's hardware performance counters, as additional inputs to obtain a more accurate runtime estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, the proposed model estimates the runtime with, on average, less than 7 percent error for the second-generation GTX 280 with no on-chip caches, and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, the Radeon HD 6970, the model estimates the runtime with an error of 8 percent. The proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.
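
The abstract only outlines the sampling-based linear model, so the snippet below gives a minimal sketch of the general idea under stated assumptions: kernel runtime is treated as a linear function of the number of work-groups launched, a few small partial launches are timed at early sampling points, and a least-squares line is extrapolated to the full launch. The sample sizes and timings are synthetic, and the linear form is an illustrative assumption rather than the paper's exact formulation.

```python
# Minimal sketch of a sampling-based linear runtime model (assumed form:
# runtime = a * num_workgroups + b). The sample timings are synthetic
# stand-ins for real measurements of small partial kernel launches.
import numpy as np

def fit_linear_model(sample_sizes, sample_times):
    """Least-squares fit of runtime = a * num_workgroups + b."""
    a, b = np.polyfit(sample_sizes, sample_times, deg=1)
    return a, b

def predict_runtime(a, b, total_workgroups):
    """Extrapolate the fitted line to the full launch size."""
    return a * total_workgroups + b

sample_sizes = np.array([32, 64, 128, 256])    # work-groups per partial launch
sample_times = np.array([0.9, 1.7, 3.2, 6.3])  # measured ms (illustrative)

a, b = fit_linear_model(sample_sizes, sample_times)
print(f"predicted full-kernel runtime: {predict_runtime(a, b, 16384):.1f} ms")
```

In this reading, the slope a captures the per-work-group cost and the intercept b the fixed launch overhead; the paper's analysis of NVIDIA scheduling policies is what determines how early the sampling points can be while keeping the extrapolation accurate.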
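For the ML-based refinement, the abstract names the inputs (compiler-generated kernel statistics plus hardware performance counters) but not the learning algorithm, so the sketch below uses a generic random-forest regressor purely as a stand-in; every feature name and value is hypothetical.

```python
# Hedged sketch of the ML-based model: regress measured kernel runtimes on
# compiler statistics and hardware performance counters. The regressor choice
# and all features are illustrative assumptions, not the paper's method.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# One row per profiled kernel: [instruction_count, memory_ops,
# coalesced_access_ratio, cache_hit_rate, occupancy] -- hypothetical features.
X_train = np.array([
    [1.2e6, 3.0e5, 0.90, 0.75, 0.67],
    [4.8e6, 9.1e5, 0.40, 0.30, 0.50],
    [2.5e6, 1.2e5, 0.95, 0.85, 1.00],
])
y_train = np.array([1.8, 9.4, 2.1])  # measured runtimes in ms (synthetic)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

X_new = np.array([[3.0e6, 5.0e5, 0.60, 0.55, 0.75]])  # unseen kernel's features
print(f"estimated runtime: {model.predict(X_new)[0]:.2f} ms")
```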
