Reliable Assurance Model for Distributed System Survivability
Abstract: Checkpoint-based cooperative rollback recovery is an effective mechanism for ensuring the survivability of distributed systems. Existing checkpoint-based rollback recovery mechanisms assume that the communication channels between processes are reliable, but this assumption does not always hold in real application scenarios. To address such environments, this paper proposes a cooperative survivability assurance model for distributed computing over unreliable channels. The model retains the advantages of checkpoint-based rollback recovery and, by establishing redundant communication links and migrating processes, preserves the survivability of the distributed system when its communication channels are unreliable.
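The abstract describes the model only at a high level, so the following is a minimal Python sketch, not the paper's protocol, of the two ingredients it names: sending over redundant communication links, and recovering a process from a checkpoint on another node (process migration). Everything here is an assumption for illustration: the class names `Link`, `RedundantChannel` and `Process`, the helper `send_or_migrate`, the host names, and the policy of migrating only when every redundant link has failed. A single local snapshot also stands in for the coordinated global checkpoint that a real cooperative scheme would take across all processes.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class LinkDown(Exception):
    """Raised when a link (or every redundant link) cannot deliver a message."""


@dataclass
class Link:
    # Hypothetical point-to-point link; `up=False` simulates a failed channel.
    name: str
    up: bool = True
    delivered: List[str] = field(default_factory=list)

    def send(self, msg: str) -> None:
        if not self.up:
            raise LinkDown(self.name)
        self.delivered.append(msg)


class RedundantChannel:
    """Try each link in order (primary first) until one delivers the message."""

    def __init__(self, links: List[Link]) -> None:
        self.links = links

    def send(self, msg: str) -> str:
        for link in self.links:
            try:
                link.send(msg)
                return link.name  # report which link carried the message
            except LinkDown:
                continue  # fall back to the next redundant link
        raise LinkDown("all links")


@dataclass
class Process:
    host: str
    state: Dict[str, int] = field(default_factory=dict)
    checkpoint: Optional[Dict[str, int]] = None

    def take_checkpoint(self) -> None:
        # Local snapshot only; a real system would coordinate this across processes.
        self.checkpoint = copy.deepcopy(self.state)

    def migrate_from_checkpoint(self, new_host: str) -> "Process":
        # Restart on another node from the last checkpoint (rollback + migration).
        if self.checkpoint is None:
            raise RuntimeError("no checkpoint to recover from")
        return Process(new_host, copy.deepcopy(self.checkpoint),
                       copy.deepcopy(self.checkpoint))


def send_or_migrate(p: Process, channel: RedundantChannel, msg: str,
                    spare_host: str) -> Process:
    """Hypothetical policy: use redundant links; if all fail, roll back and migrate."""
    try:
        channel.send(msg)
        return p
    except LinkDown:
        return p.migrate_from_checkpoint(spare_host)


if __name__ == "__main__":
    p = Process("node-1", {"counter": 0})
    p.take_checkpoint()
    ok = RedundantChannel([Link("primary", up=False), Link("backup")])
    p = send_or_migrate(p, ok, "m1", spare_host="node-2")    # delivered via "backup"
    p.state["counter"] += 1                                  # progress after the checkpoint
    dead = RedundantChannel([Link("primary", up=False), Link("backup", up=False)])
    p = send_or_migrate(p, dead, "m2", spare_host="node-2")  # no link left
    print(p.host, p.state)
```

Running the file prints `node-2 {'counter': 0}`: the first message survives the primary link failure through the backup link, while the second finds no working link, so the process restarts on the spare node from its last checkpoint and the increment made after that checkpoint is lost. Trying links in a fixed order keeps the sketch simple; a real design would also have to account for messages that were in transit when the checkpoint was taken.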
Affiliation: University of Electronic Science and Technology of China (电子科技大学)
Source: Journal of Computer Applications (《计算机应用》), 2012, No. 10, pp. 2748-2751 (4 pages); indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (60973118); Fundamental Research Funds for the Central Universities (ZYGX2011J072).
Keywords: checkpoint; distributed system; software survivability; rollback recovery; process migration
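Author profiles: GENG Ji (耿技, 1963-), male, born in Hefei, Anhui; professor and Ph.D. candidate; research interests: system software, software assurance, information security. Corresponding author: CHEN Fei (陈非, 1988-), male, born in Zhangshu, Jiangxi; M.S. candidate; research interests: software assurance, information security; e-mail: flikecn@126.com. NIE Peng (聂鹏, 1977-), male, born in Hanzhong, Shaanxi; Ph.D. candidate; research interests: software assurance, software testing, software reliability. CHEN Wei (陈伟, 1978-), male, born in Wenjiang, Sichuan; lecturer and Ph.D. candidate; research interests: wireless network routing, network security. QIN Zhiguang (秦志光, 1965-), male, born in Longchang, Sichuan; professor and Ph.D. supervisor; research interests: open systems, middleware, information security.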