SSD disk failures demand firmware updates

Some HP Enterprise solid state drives (SSDs) face a failure date as early as October 2020 unless administrators apply a crucial firmware patch. A pair of notices from HPE warns owners of the affected drives. One group will begin failing no earlier than October. Another group is already in danger of dying.

Without the patch, drive failure is a certainty sooner or later. Some of these SSDs already rolled past a failure date last fall, if they’ve operated constantly since late 2015. The flaw is being called a data death bug.

For some, HPD7 firmware is a critical fix. HPE says an SSD manufacturer, which it hasn’t identified, told the vendor about failures in certain Serial Attached SCSI (SAS) models inside HPE server and storage products. Some SAS SSDs can use external connections to HPE’s VMS Itanium servers.

The drives can be inside HPE’s ProLiant, Synergy, and Apollo 4200 servers, plus the Synergy Storage Modules, D3000 Storage Enclosure, and StoreEasy 1000 Storage. If the drives have a firmware version prior to HPD7, they will fail at 40,000 hours of operation (i.e., 4 years, 206 days, 16 hours). Another, even larger group of HPE drives will fail at 32,768 hours after power-on, a total of 3 years, 270 days, 8 hours.
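The year, day, and hour figures follow from simple division. The short Python sketch below reproduces them, assuming 365-day years with no leap days (an assumption that matches the numbers quoted above). The function name is made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days ignored, as in the figures above


def split_power_on_hours(hours: int) -> tuple:
    """Break a power-on-hours count into (years, days, hours)."""
    days, rem_hours = divmod(hours, HOURS_PER_DAY)
    years, rem_days = divmod(days, DAYS_PER_YEAR)
    return years, rem_days, rem_hours


for limit in (40_000, 32_768):
    years, days, hours = split_power_on_hours(limit)
    print(f"{limit:,} hours = {years} years, {days} days, {hours} hours")
```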

The numbers mean that the failures might have started as early as September of last year. The first affected drives shipped in late 2015, and HPE estimates the earliest date of failure based on when it first shipped the drives. Another batch of HPE drives shipped in 2017, and they are also at risk. These are the drives looking at an October 2020 failure date without a firmware update.
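An administrator can make the same kind of estimate for a single drive. The sketch below assumes the drive has run continuously since it was first powered on, which is the worst case. The function names and the sample date are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def projected_failure(first_power_on: datetime, limit_hours: int) -> datetime:
    """Earliest failure date, assuming the drive has never been powered off."""
    return first_power_on + timedelta(hours=limit_hours)


def hours_remaining(power_on_hours: int, limit_hours: int) -> int:
    """Hours left before the limit, given the drive's reported power-on hours."""
    return max(limit_hours - power_on_hours, 0)


# Hypothetical drive first powered on 1 December 2015, in the 32,768-hour group.
print(projected_failure(datetime(2015, 12, 1), 32_768))
# Hypothetical drive in the 40,000-hour group reporting 39,000 power-on hours.
print(hours_remaining(39_000, 40_000))
```

A drive that spends time powered off reaches its limit later than this projection, so the power-on hours the drive itself reports are the better guide.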

Beyond HP gear

The SSDs are in more than just HPE servers and storage devices. The drives are Western Digital’s SanDisk units, according to a recent report at the website The Register. Dell has a similar support warning for its enterprise customers.

HP Enterprise says RAID failures will occur if there is no fault tolerance, as with RAID 0. Logical drives will fail even in a fault-tolerant RAID mode “if more SSDs fail than are supported by the fault tolerance of the RAID mode on the logical drive. Example: RAID 5 logical drive with two failed SSDs.”
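The rule HPE describes can be written as a small check. This is a sketch only: the tolerance table holds textbook values for simple arrays (a two-drive mirror for RAID 1), nested levels such as RAID 10 are left out, and the names are invented for illustration rather than taken from any HPE tool.

```python
# Assumption: textbook per-array tolerances; nested RAID levels are omitted.
TOLERATED_FAILURES = {
    "RAID 0": 0,  # striping only, no redundancy
    "RAID 1": 1,  # two-drive mirror
    "RAID 5": 1,  # single parity
    "RAID 6": 2,  # double parity
}


def logical_drive_survives(raid_level: str, failed_ssds: int) -> bool:
    """True if the logical drive can absorb this many failed SSDs."""
    return failed_ssds <= TOLERATED_FAILURES[raid_level]


print(logical_drive_survives("RAID 0", 1))  # False: no fault tolerance
print(logical_drive_survives("RAID 5", 1))  # True: one failure is covered
print(logical_drive_survives("RAID 5", 2))  # False: HPE's own example
```

Drives that were powered on together accumulate hours together, so several drives in one array can reach the limit at about the same time.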

Adding to the complexity of the SSD failures, the firmware that fixes the issue carries two different version numbers. HPD7 repairs the 40,000-hour drives. HPD8 repairs a bigger list of devices. The drives on that larger list have a death date that may arrive very soon this year, and leaving HPD7 firmware inside them will ensure the failures.
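The two-version rule is easy to get wrong, so a small check may help. The sketch below assumes that firmware names take the form "HPD" plus a number and that a higher number is a later release. The group labels and function names are hypothetical. HPE's bulletins, not this code, define which drive models belong to which group.

```python
# Minimum firmware per bulletin group, as described in the article.
# The group labels are invented for this sketch.
REQUIRED_FIRMWARE = {
    "40000-hour": "HPD7",
    "32768-hour": "HPD8",
}


def firmware_number(version: str) -> int:
    """Extract the numeric suffix, assuming the form 'HPD<number>'."""
    if not version.startswith("HPD"):
        raise ValueError(f"unexpected firmware name: {version!r}")
    return int(version[3:])


def needs_update(group: str, installed: str) -> bool:
    """True if the installed firmware is older than the fix for this group."""
    required = REQUIRED_FIRMWARE[group]
    return firmware_number(installed) < firmware_number(required)


print(needs_update("40000-hour", "HPD7"))  # False: HPD7 is the fix here
print(needs_update("32768-hour", "HPD7"))  # True: this group needs HPD8
```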

Full details from HPE’s bulletins for the 40,000-hour and for the 32,768-hour drives are at the HPE website. There are instructions on how to use HPE’s Smart Storage Administrator to discover uptime, plus scripts for VMware, Unix, and Linux. The scripts “perform an SSD drive firmware check for the 32,768 power-on-hours failure issue on certain HPE SAS SSD drives.”

A list of 20 HPE disk units falls under the 32,768-hour deadline. Four other HPE devices are in the separate 40,000-hour support bulletin.

SSD devices first rolled into IT markets with some concerns about failures. While many legacy sites do not use SSDs, some have that type of storage in servers providing emulation and NAS.

Image by pagefact from Pixabay
