Journal: IEEE Transactions on Parallel and Distributed Systems

Fail-Stop Failure Algorithm-Based Fault Tolerance for Cholesky Decomposition

Abstract

Cholesky decomposition is a widely used algorithm for solving linear systems with a symmetric positive definite coefficient matrix. For large matrices, it is often run on high-performance supercomputers with a large number of processors. Assuming a constant failure rate per processor, the probability of a failure occurring during execution increases linearly with the number of processors. Fault-tolerant methods attempt to reduce the expected execution time by allowing recovery from failure. This paper presents an analysis and implementation of a fault-tolerant Cholesky factorization algorithm that does not require checkpointing to recover from fail-stop failures. Instead, the algorithm uses redundant data held in an additional set of processes. It differs from previous work on algorithmic methods in that it addresses fail-stop failures rather than fail-continue cases. The proposed fault tolerance scheme is incorporated into ScaLAPACK and validated on the supercomputer Kraken. Experimental results demonstrate that the overhead of this method, relative to overall runtime, decreases as the matrix size increases, so the method shows promise for reducing the expected runtime of Cholesky factorizations on very large matrices.
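
The idea of recovering from redundant data rather than from a checkpoint can be illustrated with a small sketch. This is not the paper's ScaLAPACK implementation: it assumes a simple sum checksum over the rows of the finished factor, treats each row as the data of one process, and uses hypothetical helper names (`cholesky`, `column_checksum`, `recover_row`). A real scheme would have to keep the redundant data valid while the factorization is in progress, which this sketch does not attempt.

```python
import math

def cholesky(a):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            off = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = off / l[j][j]
    return l

def column_checksum(rows):
    """Element-wise sum of the rows; stands in for data held by an extra process."""
    return [sum(col) for col in zip(*rows)]

def recover_row(rows, checksum, lost):
    """Rebuild rows[lost] from the checksum and the surviving rows."""
    survivors = [r for i, r in enumerate(rows) if i != lost]
    partial = column_checksum(survivors)
    return [c - p for c, p in zip(checksum, partial)]

# Small symmetric positive definite example (illustrative data only).
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky(A)
checksum = column_checksum(L)      # redundant data, kept apart from L

# Simulate a fail-stop failure: the process holding row 1 loses its data.
lost = 1
damaged = [row if i != lost else None for i, row in enumerate(L)]

recovered = recover_row(damaged, checksum, lost)
print(recovered)  # [1.0, 2.0, 0.0]
```

A single sum checksum of this kind can rebuild only one lost row at a time; tolerating several simultaneous failures would need additional, independent redundancy.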
