A Performance Model for GPUs with Caches

Abstract

To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution across the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques that improves the accuracy of the linear model and is applicable to modern GPUs both with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies, we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching, as implemented in modern GPUs, have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel, as well as the GPU's hardware performance counters, as additional inputs to obtain a more accurate runtime estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, the proposed model estimates the runtime with, on average, less than 7 percent error for the second-generation GTX 280 with no on-chip caches, and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, the Radeon HD 6970, the model estimates the runtime with an error of 8 percent. The proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.
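
The abstract only outlines the sampling-based linear model, so the snippet below gives a minimal sketch of the general idea under stated assumptions: kernel runtime is treated as a linear function of the number of work-groups launched, a few small partial launches are timed at early sampling points, and a least-squares line is extrapolated to the full launch. The sample sizes and timings are synthetic, and the linear form is an illustrative assumption rather than the paper's exact formulation.

```python
# Minimal sketch of a sampling-based linear runtime model (assumed form:
# runtime = a * num_workgroups + b). The sample timings are synthetic
# stand-ins for real measurements of small partial kernel launches.
import numpy as np

def fit_linear_model(sample_sizes, sample_times):
    """Least-squares fit of runtime = a * num_workgroups + b."""
    a, b = np.polyfit(sample_sizes, sample_times, deg=1)
    return a, b

def predict_runtime(a, b, total_workgroups):
    """Extrapolate the fitted line to the full launch size."""
    return a * total_workgroups + b

sample_sizes = np.array([32, 64, 128, 256])    # work-groups per partial launch
sample_times = np.array([0.9, 1.7, 3.2, 6.3])  # measured ms (illustrative)

a, b = fit_linear_model(sample_sizes, sample_times)
print(f"predicted full-kernel runtime: {predict_runtime(a, b, 16384):.1f} ms")
```

In this reading, the slope a captures the per-work-group cost and the intercept b the fixed launch overhead; the paper's analysis of NVIDIA scheduling policies is what determines how early the sampling points can be while keeping the extrapolation accurate.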
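For the ML-based refinement, the abstract names the inputs (compiler-generated kernel statistics plus hardware performance counters) but not the learning algorithm, so the sketch below uses a generic random-forest regressor purely as a stand-in; every feature name and value is hypothetical.

```python
# Hedged sketch of the ML-based model: regress measured kernel runtimes on
# compiler statistics and hardware performance counters. The regressor choice
# and all features are illustrative assumptions, not the paper's method.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# One row per profiled kernel: [instruction_count, memory_ops,
# coalesced_access_ratio, cache_hit_rate, occupancy] -- hypothetical features.
X_train = np.array([
    [1.2e6, 3.0e5, 0.90, 0.75, 0.67],
    [4.8e6, 9.1e5, 0.40, 0.30, 0.50],
    [2.5e6, 1.2e5, 0.95, 0.85, 1.00],
])
y_train = np.array([1.8, 9.4, 2.1])  # measured runtimes in ms (synthetic)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

X_new = np.array([[3.0e6, 5.0e5, 0.60, 0.55, 0.75]])  # unseen kernel's features
print(f"estimated runtime: {model.predict(X_new)[0]:.2f} ms")
```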
