Journal: IEEE Transactions on Parallel and Distributed Systems

Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers

Abstract

Supercomputers have grown exponentially in size over the last two decades. That growth rate is expected to take us to exascale in the 2018-2022 timeframe. However, bringing about a productive exascale environment requires addressing several key challenges, one of which is fault tolerance. Machines at extreme scale will experience frequent failures, and the system will have to avoid or overcome them. Various techniques have recently been developed to tolerate failures. The impact and scalability of these techniques can be substantially enhanced by a parallel programming model called migratable objects. In this paper, we demonstrate how the migratable-objects model facilitates and improves several fault tolerance approaches. Our experimental results on thousands of cores suggest that fault tolerance schemes based on migratable objects have low performance overhead and high scalability. Additionally, we present a performance model that predicts a significant benefit from using migratable objects to provide fault tolerance at extreme scale.
