Check the detected serial number failure

#CHECK THE DETECTED SERIAL NUMBER FAILURE HOW TO#

Hard Drive Predictive Failures on Linux and Exadata
Tanel Poder

This post also applies to non-Exadata systems, as hard drives work the same way in other storage arrays too; only the commands you would use for extracting the disk-level metrics would be different.

Scroll down to smartctl if you want to skip the Oracle stuff and get straight to the Linux disk diagnosis commands.

I just noticed that one of our Exadatas had a disk put into “predictive failure” mode and thought I'd show how to measure why the disk is in that mode (as opposed to just replacing it without really understanding the issue ;-)

    SQL> cellpd
    Show Exadata cell versions from V$CELL_CONFIG.

    DISKTYPE CELLNAME STATUS TOTAL_GB AVG_GB NUM_DISKS PREDFAIL POORPERF WTCACHEPROB PEERFAIL CRITICAL
    FlashDisk 192.168.12.3 not present 183 23 8 3

So, one of the disks in the storage cell with IP 192.168.12.3 has been put into predictive failure mode. To find out which exact disk, I ran one of my scripts for displaying Exadata disk topology (partial output below):

    SQL> Exadata disk topology from V$ASM_DISK and V$CELL_CONFIG.

    CELLNAME     LUN_DEVICENAME PHYSDISK PHYSDISK_STATUS      CELLDISK       CD_DEVICEPART GRIDDISK               ASM_DISK               ASM_DISKGROUP LUNWRITECACHEMODE
    192.168.12.3 /dev/sda       35:0     normal               CD_00_enkcel01 /dev/sda3     DATA_CD_00_enkcel01    DATA_CD_00_ENKCEL01    DATA          "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sda       35:0     normal               CD_00_enkcel01 /dev/sda3     RECO_CD_00_enkcel01    RECO_CD_00_ENKCEL01    RECO          "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdb       35:1     normal               CD_01_enkcel01 /dev/sdb3     DATA_CD_01_enkcel01    DATA_CD_01_ENKCEL01    DATA          "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdb       35:1     normal               CD_01_enkcel01 /dev/sdb3     RECO_CD_01_enkcel01    RECO_CD_01_ENKCEL01    RECO          "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdc       35:2     normal               CD_02_enkcel01 /dev/sdc      DATA_CD_02_enkcel01    DATA_CD_02_ENKCEL01    DATA          "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdc       35:2     normal               CD_02_enkcel01 /dev/sdc      DBFS_DG_CD_02_enkcel01 DBFS_DG_CD_02_ENKCEL01 DBFS_DG       "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdc       35:2     normal               CD_02_enkcel01 /dev/sdc      RECO_CD_02_enkcel01    RECO_CD_02_ENKCEL01    RECO          "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdd       35:3     warning - predictive CD_03_enkcel01 /dev/sdd      DATA_CD_03_enkcel01                                         "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdd       35:3     warning - predictive CD_03_enkcel01 /dev/sdd      DBFS_DG_CD_03_enkcel01                                      "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
                 /dev/sdd       35:3     warning - predictive CD_03_enkcel01 /dev/sdd      RECO_CD_03_enkcel01                                         "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"

OK, looks like /dev/sdd (with address 35:3) is the “failed” one.

When listing the alerts from the storage cell, we indeed see that a failure has been predicted, a warning raised, and the event even handled: the XDMG process gets notified and the ASM disks get dropped from the failed grid disks (as you can see from the exadisktopo output above, where the ASM_DISK and ASM_DISKGROUP columns are empty for /dev/sdd).

    CellCLI> LIST ALERTHISTORY WHERE alertSequenceID = 456 DETAIL
         alertDescription:       "Data hard disk entered predictive failure status"
         alertMessage:           "Data hard disk entered predictive failure status. Status : WARNING - PREDICTIVE FAILURE Manufacturer : HITACHI Model Number : H7220AA30SUN2.0T Size : 2.0TB Serial Number : 1016M7JX2Z Firmware : JKAOA28A Slot Number : 3 Cell Disk : CD_03_enkcel01 Grid Disk : DBFS_DG_CD_03_enkcel01, DATA_CD_03_enkcel01, RECO_CD_03_enkcel01"
         alertAction:            "The data hard disk has entered predictive failure status. A white cell locator LED has been turned on to help locate the affected cell, and an amber service action LED has been lit on the drive to help locate the affected drive. The data from the disk will be automatically rebalanced by Oracle ASM to other disks. Another alert will be sent and a blue OK-to-Remove LED will be lit on the drive when rebalance completes. Please wait until rebalance has completed before replacing the disk. Detailed information on this problem can be found at"

         alertDescription:       "Hard disk can be replaced now"
         alertMessage:           "Hard disk can be replaced now. Status : WARNING - PREDICTIVE FAILURE Manufacturer : HITACHI Model Number : H7220AA30SUN2.0T Size : 2.0TB Serial Number : 1016M7JX2Z Firmware : JKAOA28A Slot Number : 3 Cell Disk : CD_03_enkcel01 Grid Disk : DBFS_DG_CD_03_enkcel01, DATA_CD_03_enkcel01, RECO_CD_03_enkcel01"
         alertAction:            "The data on this disk has been successfully rebalanced by Oracle ASM to other disks."
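Picking the odd disk out of the topology listing is easy by eye for one cell, but with many cells it can be scripted. The Python snippet below is a minimal sketch, not one of the scripts used in this post: it assumes the exadisktopo rows are available as plain text in the column order of the partial output (CELLNAME printed only on the first row, cell disk names starting with CD_, and ASM_DISK/ASM_DISKGROUP left empty once ASM has dropped the disk). The names ROW, find_suspect_disks and SAMPLE are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical parser for exadisktopo-style rows; column layout assumed from the listing:
# [CELLNAME] LUN_DEVICENAME PHYSDISK PHYSDISK_STATUS CELLDISK CD_DEVICEPART GRIDDISK
# [ASM_DISK ASM_DISKGROUP] "LUNWRITECACHEMODE"
ROW = re.compile(
    r'^(?:(?P<cell>\d+(?:\.\d+){3})\s+)?'                   # cell IP only on the first row
    r'(?P<lun>/dev/\S+)\s+(?P<physdisk>\d+:\d+)\s+'
    r'(?P<status>.+?)\s+(?P<celldisk>CD_\S+)\s+'            # status may contain spaces
    r'(?P<part>/dev/\S+)\s+(?P<griddisk>\S+)'
    r'(?:\s+(?P<asm_disk>[^\s"]+)\s+(?P<asm_dg>[^\s"]+))?'  # absent once ASM dropped the disk
    r'\s+"(?P<cache>[^"]*)"\s*$'
)


def find_suspect_disks(lines):
    """Group grid disks by (LUN device, physical disk address) for every non-normal status."""
    suspects = defaultdict(lambda: {"status": None, "griddisks": [], "in_asm": []})
    for line in lines:
        m = ROW.match(line.strip())
        if m is None or m["status"] == "normal":
            continue  # skip headers, blank lines and healthy disks
        entry = suspects[(m["lun"], m["physdisk"])]
        entry["status"] = m["status"]
        entry["griddisks"].append(m["griddisk"])
        entry["in_asm"].append(m["asm_disk"] is not None)
    return dict(suspects)


# A few rows copied from the partial topology output.
SAMPLE = """\
192.168.12.3 /dev/sda 35:0 normal CD_00_enkcel01 /dev/sda3 DATA_CD_00_enkcel01 DATA_CD_00_ENKCEL01 DATA "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
/dev/sdc 35:2 normal CD_02_enkcel01 /dev/sdc RECO_CD_02_enkcel01 RECO_CD_02_ENKCEL01 RECO "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
/dev/sdd 35:3 warning - predictive CD_03_enkcel01 /dev/sdd DATA_CD_03_enkcel01 "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
/dev/sdd 35:3 warning - predictive CD_03_enkcel01 /dev/sdd DBFS_DG_CD_03_enkcel01 "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
/dev/sdd 35:3 warning - predictive CD_03_enkcel01 /dev/sdd RECO_CD_03_enkcel01 "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
"""

if __name__ == "__main__":
    for (lun, physdisk), info in find_suspect_disks(SAMPLE.splitlines()).items():
        print(lun, physdisk, info["status"], info["griddisks"],
              "still in ASM:", any(info["in_asm"]))
```

Run against these sample rows, it reports only /dev/sdd (35:3) with its three grid disks, none of which still maps to an ASM disk. The lazy status match anchored on the CD_ prefix is what lets a multi-word status such as “warning - predictive” through without a fixed-width layout.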

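The alert text packs the disk identity into a single string of “Label : value” pairs. Its Slot Number (3) is what lines up with the 35:3 address in the topology listing (presumably enclosure 35, slot 3), and its Serial Number identifies the physical drive. Below is a minimal Python sketch for splitting such a message into a dictionary; the label list is copied from the alertMessage quoted in this post, while the helper name parse_disk_fields, the MESSAGE constant, and the assumption that no value contains one of the labels followed by a colon belong to this sketch only.

```python
import re

# Field labels as they appear in the alertMessage text (assumed to be a fixed set).
FIELDS = ["Status", "Manufacturer", "Model Number", "Size", "Serial Number",
          "Firmware", "Slot Number", "Cell Disk", "Grid Disk"]
# A label only counts when it is followed by optional spaces and a colon.
LABEL = re.compile(r"\b(" + "|".join(map(re.escape, FIELDS)) + r")\s*:\s*")


def parse_disk_fields(message):
    """Hypothetical helper: split 'Label : value Label : value ...' into a dict."""
    matches = list(LABEL.finditer(message))
    fields = {}
    for i, m in enumerate(matches):
        # Each value runs until the next label (or the end of the message).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(message)
        fields[m.group(1)] = message[m.end():end].strip()
    return fields


MESSAGE = ("Data hard disk entered predictive failure status. "
           "Status : WARNING - PREDICTIVE FAILURE Manufacturer : HITACHI "
           "Model Number : H7220AA30SUN2.0T Size : 2.0TB Serial Number : 1016M7JX2Z "
           "Firmware : JKAOA28A Slot Number : 3 Cell Disk : CD_03_enkcel01 "
           "Grid Disk : DBFS_DG_CD_03_enkcel01, DATA_CD_03_enkcel01, RECO_CD_03_enkcel01")

if __name__ == "__main__":
    disk = parse_disk_fields(MESSAGE)
    print(disk["Serial Number"], disk["Slot Number"], disk["Grid Disk"].split(", "))
```

The match is case-sensitive on purpose, so the lowercase “status” in the leading sentence is not mistaken for the Status label.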