Overview

Commands

  1. lshw
    • lshw -short
  2. dmidecode
  3. sensors
  4. mcelog

Status Monitoring

  1. Error Detection And Correction (EDAC) Devices

    • /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
    • /sys/devices/system/edac/mc/mc*/csrow*/ue_count
  2. Reliability, Availability and Serviceability (RAS)

    • mcelog
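
The EDAC counters listed above are plain-text integers in sysfs, so they can be polled with a short script. The following is a minimal sketch, assuming a Linux host with an EDAC driver loaded and the csrow-style layout shown above; the function names `read_edac_counts` and `nonzero` are illustrative and not part of any existing tool.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {path relative to edac_root: count} for each CE/UE counter found."""
    patterns = [
        os.path.join(edac_root, "mc*", "csrow*", "ch*_ce_count"),
        os.path.join(edac_root, "mc*", "csrow*", "ue_count"),
    ]
    counts = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    value = int(f.read().strip())
            except (OSError, ValueError):
                # Skip counters that are unreadable or not a plain integer.
                continue
            counts[os.path.relpath(path, edac_root)] = value
    return counts


def nonzero(counts):
    """Keep only the counters that have recorded at least one error."""
    return {name: value for name, value in counts.items() if value > 0}


if __name__ == "__main__":
    # Prints nothing if no EDAC driver is loaded or the layout differs.
    for name, value in sorted(read_edac_counts().items()):
        print(f"{name}: {value}")
```

Passing the root directory as a parameter keeps the sketch testable against a copy of the sysfs tree. Filtering with `nonzero` is one simple way to surface only the rows that have logged errors; what threshold should raise an alert is left to the operator.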

Terminology

FRU Field Replaceable Unit
DIMM Dual In-line Memory Module
CE Correctable Error
UE Uncorrectable Error