Score:2

Recovering data from a Dell MD3220 array after the configuration databases on both controllers became corrupt


I have been trying to restore data from an array created on a Dell MD3220 PowerVault storage unit. I have been on the phone with Dell and another support group for weeks now and keep running into brick walls. I was hoping that someone here might have an idea I could try to recover the data. The storage appliance has 24 drive bays numbered from zero (so drive 24 is called 23, and drive 1 is called 0).

[Images: MD3220 front, MD3220 back]

The unit experienced a power outage, and I guess the cause of the issue was the storage unit going offline before the two servers accessing the data (via SAS cables) did. As a result, the DBs that contain the array config, one on each of the two controllers in the MD3220, became corrupt.

  • We tried to recover the DBs by replacing the current DB with the latest backup found on the controller itself (a common scenario). That seemed to fail.

  • We even went as far as trying to rebuild the database with the files stored on my server (DBM files) that I use to manage the appliance. We had Dell generate a Validator key to use when rebuilding the databases. That seemed to fail as well.

The error I keep seeing and can't get past is: Exception type N3adp6Device24ExtentAllocatedExceptionE message "N3adp6Device24ExtentAllocatedExceptionE"with extent:553 of size:1106 for drive ordinal22. Here is the relevant part of the controller log:

09/29/21-19:24:37 (tRAID): WARN:  UWManager::initializeNvsramIWLog: IWLog invalidated
09/29/21-19:24:37 (tRAID): NOTE:  UWMgr findIWLogs: Found IW log drive. Devnum 0x10001 tray=0 slot=2 ssd=0 qos=3 controller=0
09/29/21-19:24:37 (tRAID): NOTE:  UWMgr findIWLogs: Found IW log drive. Devnum 0x10002 tray=0 slot=3 ssd=0 qos=3 controller=0
09/29/21-19:24:37 (IWTask): NOTE:  UWMgr: IW logging started
09/29/21-19:24:41 (tRAID): ERROR: CrushDrive::allocateExtent - Exception type N3adp6Device24ExtentAllocatedExceptionE message "N3adp6Device24ExtentAllocatedExceptionE"with extent:553 of size:1106 for drive ordinal22
09/29/21-19:24:41 (tRAID): ERROR: CrushStripe DeSerialization - Couldn't allocate extent! CrushDrive 22 Volume 1 CrushPiece 2 Extent 553
09/29/21-19:24:41 (tRAID): ERROR: Exception during stripe allocation in vdm::CrushStripePersistenceManager::initialize(1)
09/29/21-19:24:41 (tRAID): ERROR: vdm::CrushInvalidCfgMgr DB_CORRUPT detected
09/29/21-19:24:41 (tRAID): NOTE:  lockdownPrimaryDBInvalidWorker: OBB already in pcache, not updating.
09/29/21-19:24:41 (tRAID): WARN:  BackupDatabaseManager:lockdownPrimaryDBInvalid Exception IconSendInfeasibleException Error
09/29/21-19:24:41 (tRAID): WARN:  BDBM:  Client detected Primary DB Corruption. Forcing dualControllerLockdown.
09/29/21-19:24:41 (tRAID): WARN:  Ctl Reboot:
                                Reboot CompID: 0x407
                                Reboot reason: 0x11
                                Reboot reason extra: 0x2
09/29/21-19:24:41 (tRAID): WARN:  Rebooting this Controller now

I'm guessing "ordinal 22" refers to drive 23 (of 24 drives)? I'm not sure what it's complaining about, though. Is drive 23 bad? Is there a database on every drive, and is the DB on drive 23 bad? Is there a way to restore that drive's database, for example by copying it from another drive? Is it even talking about drive 23? Any help that anyone can toss my way would help a bunch.
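To make the log excerpt above easier to sift through, here is a minimal Python sketch that pulls the extent-allocation failures out of lines in that format. The regular expressions are inferred only from the excerpt shown here, not from any Dell documentation, and the helper names (`parse_line`, `extent_failures`, `one_based`) are hypothetical. Whether a "drive ordinal" maps directly to a zero-based bay number is an assumption the log does not confirm, so `one_based` only does the arithmetic.

```python
import re

# One log line as it appears in the excerpt:
# "MM/DD/YY-HH:MM:SS (task): LEVEL: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) "
    r"\((?P<task>\w+)\): "
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)

# Extent details as printed by the allocateExtent error
# (note the log prints "ordinal22" with no space).
EXTENT_RE = re.compile(
    r"extent:(?P<extent>\d+) of size:(?P<size>\d+) "
    r"for drive ordinal\s*(?P<ordinal>\d+)"
)


def parse_line(line):
    """Return a dict of fields, or None for continuation/unknown lines."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def extent_failures(lines):
    """Collect extent/size/ordinal from ERROR lines that report them."""
    found = []
    for line in lines:
        record = parse_line(line)
        if record is None or record["level"] != "ERROR":
            continue
        match = EXTENT_RE.search(record["msg"])
        if match:
            found.append({k: int(v) for k, v in match.groupdict().items()})
    return found


def one_based(zero_based_index):
    """ASSUMPTION: ordinals are zero-based like the bay labels."""
    return zero_based_index + 1


sample = [
    "09/29/21-19:24:37 (IWTask): NOTE:  UWMgr: IW logging started",
    '09/29/21-19:24:41 (tRAID): ERROR: CrushDrive::allocateExtent - '
    'Exception type N3adp6Device24ExtentAllocatedExceptionE message '
    '"N3adp6Device24ExtentAllocatedExceptionE"with extent:553 of '
    "size:1106 for drive ordinal22",
    "                                Reboot reason: 0x11",
]

for failure in extent_failures(sample):
    print(failure, "-> possible drive number:", one_based(failure["ordinal"]))
```

Running the same functions over a full controller log would show whether the failure always names the same ordinal and extent, which may help narrow down whether one drive or the stored configuration is at fault.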

Thanks!!

That's a 10-year-old model - I'm surprised it's still in support! When you say databases, do you mean the array layout or your actual application databases? Can you not just wipe the array completely, rebuild it, and restore your data from backup?
djdomi
open a ticket on Dell if you still bought it
@djdomi I'm not really sure how to respond to that comment. In my question, I said I have been working with Dell for weeks now. So yes, I have a ticket that I opened with Dell and ... yes I bought it..?
@Chopper3 It's not under contract, and we had to pay dearly for the "one-time support" option. By DB I mean the array layout. Dell calls it a database on the RAID controllers. There are a few things on it that were not backed up. So yes, I could, but I would lose some data that I would prefer not to lose. Recovering it could save me many weeks of rebuild time, so it is worth trying.
djdomi
In short: pay for Dell support as long as you have this item in use. We had a similar device that broke because the controllers stopped working. The final critical point came when Dell was on site: everything was shut down, and the last of the two controllers failed at that moment... A firmware update revived them.
Zac67
> The unit experienced a power outage and I guess the storage unit going offline before the two servers accessing the data (via SAS cables) did was the cause of the issue.

That is something you *really* need to prevent. The box has two PSUs and at least one of them needs to be hooked up to a UPS (or both to different UPSes). A sudden power outage can corrupt the RAID setup (as in your case) or the data stored on the appliance - even silently, so you'd only notice days/weeks later or perhaps never. I had an MD3220i in heavy use for many years and remember that the original firmware caused a few