Implementation of a Fault Tolerant Cluster with Error Recovery for Scientific Computation
Date Issued
2007
Author(s)
Tzou, I-ta
Language
en-US
Abstract
Parallel computing has recently become one of the main techniques for enhancing computer performance. High-performance computers are applied in many fields, including commerce, national defense, and science. Numerical simulation is an important method behind much of today's scientific progress. Because a simulation fails if it is interrupted by a fault while running, fault tolerance is an important issue.
There are two main categories of fault-tolerant techniques: (1) automatic and (2) non-automatic. The basic automatic fault-tolerant techniques applied to clusters will be discussed, including coordinated and uncoordinated checkpointing as well as pessimistic and optimistic message logging.
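Coordinated checkpointing can be illustrated with a minimal Python sketch, assuming a blocking protocol: the coordinator pauses every rank, drains all in-flight messages, and then saves every rank's state together, so the saved set never records a message as received without also recording it as sent. The `Process` and `Cluster` classes, their toy state, and the single in-memory message queue are hypothetical stand-ins, not the thesis's implementation, which the abstract does not describe.

```python
import copy
from collections import deque

class Process:
    """Hypothetical stand-in for one cluster rank with a small in-memory state."""
    def __init__(self, rank):
        self.rank = rank
        self.state = {"step": 0, "total": 0}

    def compute(self):
        self.state["step"] += 1

    def receive(self, value):
        self.state["total"] += value

class Cluster:
    """Hypothetical cluster with one FIFO queue of in-flight messages."""
    def __init__(self, n_ranks):
        self.procs = [Process(r) for r in range(n_ranks)]
        self.in_flight = deque()  # entries are (destination rank, value)
        self.checkpoint = None

    def send(self, dest, value):
        self.in_flight.append((dest, value))

    def deliver_one(self):
        dest, value = self.in_flight.popleft()
        self.procs[dest].receive(value)

    def coordinated_checkpoint(self):
        # Phase 1: all ranks are paused (no new sends); drain every in-flight
        # message so none is lost or half-recorded in the checkpoint.
        while self.in_flight:
            self.deliver_one()
        # Phase 2: save every rank's state and commit them as one global checkpoint.
        self.checkpoint = [copy.deepcopy(p.state) for p in self.procs]

    def fail_and_recover(self):
        # A failure loses volatile state; every rank rolls back to the same checkpoint.
        self.in_flight.clear()
        for p, saved in zip(self.procs, self.checkpoint):
            p.state = copy.deepcopy(saved)

cluster = Cluster(3)
for p in cluster.procs:
    p.compute()
cluster.send(1, 5)
cluster.send(2, 7)
cluster.coordinated_checkpoint()

for p in cluster.procs:  # work done after the checkpoint...
    p.compute()
cluster.send(0, 100)
cluster.deliver_one()
cluster.fail_and_recover()  # ...is discarded, and all ranks restart from the checkpoint
print([p.state for p in cluster.procs])
```

Because every rank rolls back to the same committed checkpoint, this scheme needs no message log. Uncoordinated checkpointing avoids the global pause, but it must then rely on message logging or search for a mutually consistent set of per-process checkpoints after a failure.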
An automatic fault-tolerant cluster for a scientific computing environment will be implemented using coordinated checkpointing. A storage backup strategy will also be implemented using a network file server built on a level-5 redundant array of inexpensive disks (RAID 5).
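RAID 5 protects data with XOR parity: each stripe stores one parity block equal to the XOR of its data blocks, so any single lost block equals the XOR of the surviving blocks. The sketch below assumes equal-sized blocks and, for parity placement, a layout in which parity starts on the last disk and moves one disk to the left per stripe (as in the common left-symmetric and left-asymmetric layouts). The function names are hypothetical, and this is not the thesis's storage configuration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_disk(stripe_index, n_disks):
    """Disk holding parity for a stripe; parity rotates right to left (assumed layout)."""
    return n_disks - 1 - (stripe_index % n_disks)

def make_stripe(data_blocks):
    """Return the data blocks followed by their parity block (logical order)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, lost_index):
    """Recover one missing block of a stripe from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])  # 3 data blocks + 1 parity block
print(rebuild(stripe, 1))  # prints b'EFGH'
print([parity_disk(s, 4) for s in range(5)])  # prints [3, 2, 1, 0, 3]
```

Rotating the parity block spreads parity writes across all disks instead of making one dedicated parity disk a bottleneck, which is the main difference between RAID 5 and RAID 4.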
Subjects
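容錯機制 (fault-tolerance mechanism)
叢集系統 (cluster system)
檢查點 (checkpoint)
訊息記錄 (message log)
獨立磁碟備援陣列 (redundant array of independent disks)
網路檔案伺服器 (network file server)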
Fault Tolerant
cluster
checkpoint
message log
redundant array of inexpensive disks
network file server
Type
thesis
File(s)
Name
ntu-96-J93921042-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):85697b2c370857f1c91363a7c57b8020
