End-to-End Diagnosis of Cloud Systems Against Intermittent Faults

Chao Wang1, 3, Zhongchuan Fu2 and Yanyan Huo1

  1. Computer School, Beijing Information Science and Technology University
    North 4th Ring Mid Road 35, 100101 Beijing, China
    wangchao@bistu.edu.cn
  2. Computer Science & Technology Department, Harbin Institute of Technology
    Xidazhi Street 92, 150001 Heilongjiang, China
    fuzhongchuan@hit.edu.cn
  3. Beijing Advanced Innovation Center for Materials Genome Engineering
    North 4th Ring Mid Road 35
    100101 Beijing, China

Abstract

Diagnosing intermittent faults is challenging because they manifest randomly and arise from intricate mechanisms. Conventional diagnosis methods are no longer effective for these faults, especially in hierarchical environments such as cloud computing. This paper proposes a fault diagnosis method that can effectively identify and locate intermittent faults originating from (but not limited to) processors in the cloud computing environment. The method is end-to-end in that it does not rely on manual feature extraction for each application scenario, making it more generalizable than conventional neural network-based methods. It requires no additional fault detection mechanisms and is implemented in software at almost zero hardware cost. The proposed method achieves a higher fault diagnosis accuracy than a BP network, reaching 97.98% with low latency.

Key words

cloud system, intermittent fault, fault diagnosis, end-to-end, LSTM, PNN

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS200620040W

Publication information

Volume 18, Issue 3 (June 2021)
Year of Publication: 2021
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Wang, C., Fu, Z., Huo, Y.: End-to-End Diagnosis of Cloud Systems Against Intermittent Faults. Computer Science and Information Systems, Vol. 18, No. 3, 771–790. (2021), https://doi.org/10.2298/CSIS200620040W