
E2600 Disaster Recovery

agapotron

We have a few E2600s that are now out of support. One of them seemed to have a controller failure recently, and somehow it got fully messed up in that the other controller got locked up, and the unit became unusable. Eventually we got the controller unlocked, but now that controller lost all the configuration and reports the unit as if it had never been configured. Wondering if there's a way to recover the configuration from the disks, before I completely give up and wipe away all the data. We could even move all the disks to a similar unit and try recovering there. But all my attempts have failed so far... Thanks for any pointers...

 

Jorge


rachid

Hi Jorge,

 

The configuration information is stored in the DACstore, which lives on each drive, depending on your firmware version.

 

How did you unlock the controller?

How do you see that you lost all the configuration? In SANtricity?

 

"reports the unit as if it had never been configured" --> what message do you get exactly (error, warning, or information)?

 

Do you have an old support bundle?

agapotron

After a long time looking around, I unlocked the controller by going in through the serial console and doing lemClearLockdown. Before doing that, I couldn't really do anything else. 

 

When I look at the unit in SANtricity now, it shows the default configuration screen, asking whether I want to create a disk pool and saying I have 60 disks available. I can also connect using SMcli and run show storageArray healthStatus; which returns the following:

 

show storageArray healthStatus;

The controller clocks in the storage array are out of synchronization with the storage management station.

Controller in Slot A: Unknown
Controller in Slot B: Wed Mar 30 08:53:28 PDT 2016
Storage Management Station: Wed Mar 30 16:09:19 PDT 2016

 

The following failures have been found:
Offline Controller
Storage array: Unnamed
Component reporting problem: Controller in slot A
  Status: Failed
  Location: Controller/Drive tray 0, Controller in slot A
  Replacement part number: 46482-00
  Board ID: 2660
  Submodel ID: 138
  Serial number: SV11813489
Component requiring service: Controller in slot A
  Service action (removal) allowed: Yes
  Service action LED on component: Yes

 

Storage Array in Recovery Mode
Storage array: Unnamed
  Database location: On drives
    Status: Invalid
    Revision number: 1.0
    Identification number: 0000000000000000000000000000000000000000
    Generation number: 0
    Accessibility state: Offline
  Database location: Onboard (Controller B)
    Status: Frozen
    Revision number: Unknown
    Identification number: Unknown
    Generation number: 0
    Accessibility state: Recovery Mode
  Database location: Onboard (Controller A)
    Status: Unknown
    Revision number: Unknown
    Identification number: Unknown
    Generation number: 0
    Accessibility state: Unknown

 

 

I'm worried about the fact that it says that the database on the drives is invalid...
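
A minimal sketch, assuming the health status text has been saved to a string, of pulling out the three "Database location" blocks so their status and accessibility can be compared side by side. The function name `parse_database_locations` and the shortened sample text are illustrative only; nothing here talks to the array.

```python
# Shortened copy of the "Storage Array in Recovery Mode" section shown above.
SAMPLE = """\
Storage Array in Recovery Mode
Storage array: Unnamed
  Database location: On drives
    Status: Invalid
    Revision number: 1.0
    Generation number: 0
    Accessibility state: Offline
  Database location: Onboard (Controller B)
    Status: Frozen
    Revision number: Unknown
    Generation number: 0
    Accessibility state: Recovery Mode
  Database location: Onboard (Controller A)
    Status: Unknown
    Revision number: Unknown
    Generation number: 0
    Accessibility state: Unknown
"""


def parse_database_locations(text):
    """Return {location: {field: value}} for each 'Database location' block."""
    locations = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # headings and blank lines carry no key/value pair
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Database location":
            current = locations.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return locations


dbs = parse_database_locations(SAMPLE)
for name, fields in dbs.items():
    print(f"{name}: status={fields['Status']}, access={fields['Accessibility state']}")
```

Lines that appear before the first "Database location" entry (such as the array name) are skipped on purpose, since they do not belong to any database copy.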

rachid

Okayyyyy

 

When controllers detect corruption in the primary database, they reboot and enter a Lockdown state, which is the Recovery Mode.

 

Just to confirm: what does the seven-segment display show?

 

1- The SANtricity host should hold the backup configuration file. On Windows, it is under C:\Program Files (x86)\StorageManager\client\data\monitor\dbcapture.

 

2- Go to the Enterprise Management Window > right-click > Execute Script and type the following command

(Check the help for this command in the menu Help > Command Reference)

 

load storageArray dbmDatabase file="<configuration file name>"

 

3- Check the syntax and execute the command

 

The controller should exit recovery mode afterwards; if it does not, clear it.

agapotron

Hmmm. What is "the seven segment display"?

 

Our SANtricity is running on Linux, not Windows, and it is managing several hosts. Not sure how we could find the backup configuration file there...

 

rachid

The seven-segment display is a small LED screen on the back of the controller.

 

 

Try in /opt or /bin
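
A minimal sketch of one way to look for the backup directory on a Linux management host, assuming it has the same name (`dbcapture`) as in the Windows path mentioned earlier. The function name `find_dbcapture_dirs` and the root directories passed to it are illustrative guesses, not documented install locations.

```python
import os


def find_dbcapture_dirs(roots, dirname="dbcapture"):
    """Walk each root and return every directory named `dirname`."""
    found = []
    for root in roots:
        # os.walk silently skips roots that do not exist or cannot be read.
        for current, subdirs, _files in os.walk(root):
            if dirname in subdirs:
                found.append(os.path.join(current, dirname))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical search roots; adjust to wherever the software is installed.
    for path in find_dbcapture_dirs(["/opt", "/var/opt"]):
        print(path)
        for name in sorted(os.listdir(path)):
            print("   ", name)
```

Searching by directory name rather than by file extension avoids having to know in advance what the backup files are called.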

Sahana

Hi,

 

Please refer to KB 3014685, "What procedure needs to be followed if a controller enters lockdown?"


rachid

Hello Jorge,

 

Did you find a backup file?

 

If not, I have an alternative, but I would need a recent support bundle. I can try to generate a recovery script

 

or

 

You can clear the recovery mode, but with no guarantee that the controller will like it.

 

 

agapotron

Hello again,

 

Sorry about not replying yesterday; another crisis took precedence. Anyway, here are the answers to your questions:

 

The seven-segment display is showing 00 for the non-faulty controller, and 0E, then Lt, for the faulty controller.

 

I didn't find a backup file, but I'm not sure what I'm looking for. Any particular name for the file I'm looking for? Hmmm. Maybe I did find something. I looked for anything called .cfg, and there's this candidate:

 

   ./opt/SM/FirmwareUpgradeReports/genome-cloudstore/genome-cloudstore_60080E500023B966000000004F8805F3_Configuration.cfg

 

I found a couple of old support bundles, but they are for a different array. We have other arrays with similar configurations, if that would help.

 

 

 

rachid

Welcome back.

 

Haha no worries.

 

1- 0E + Lt means that it is in recovery mode.

 

2- I know that on Windows the backup lives in a zip file

 

3- No, you can't use a support bundle from another array: array-specific values such as the volume WWNs, capacities, or assigned drives would not match.

 

Let's try something else then. Type the following command on both controllers and send me the output, please:

bdbmShowBackupDatabaseInfo

 

 

agapotron

Hi again,

 

I tried running the command "bdbmShowBackupDatabaseInfo" on the controllers. I could only run it on one controller, and the output was:

 

-> bdbmShowBackupDatabaseInfo
==============================
Backup Database Properties
==============================
BDBM BackupDatabaseManager : 0x3c39538
m_IsInitialized : 1
m_IsRecordInitialized : 0
m_IsRecordUpdateOK : 1
m_ClearBackupDuringInit : 0
m_FreezeMessageQueue : x0 count: 0

Validation Mgr (0x03bdf990):
Structure validation:
Process Mutex: 0x03bdf9d8
Current DQ Writer: 0x00000000
Max Orphan Messages: 128
Cycles Task ID Mutex: 0x03bdfa58
Cycles Task ID: 0x0 (N/A)
OBB Mirror validation:
Status: 3 (EarlyExecution)
Running: 0
Cycles: 0
Errors: 0
Not Ready: 0
bdbmPersistentBackupClient : 0x3c39808
m_BackupSOD : 0x0
m_IsRecoveryInProgress : no
m_IsSpaceAvailable : yes
m_BackupStatus : 0x0
m_ForeignState : 0x0
OBB exists on flash? : yes
bdbmDisableBackupClient : false
Last Backup Time : 01/01/1970 00:00:00 (0)
Last Restore Time : 01/01/1970 00:00:00 (0)
Last Delete Time : 01/01/1970 00:00:00 (0)

State Manager (0x3c3cf48) :
State Pointer : 0x3c3d9e0 (Frozen)
Bkup Timer Node : 0x3c3cfc8
Bkup Timer sec remain : 0 (0x0)

NVSRAMManager (0x3c3cfc4), byte = 0x80
Frozen: true
PreferRPA: false
FWActivation: false

State (0x3c3d9e0) :
Identification : 16
Name : Frozen
Supported Queries :
Is OBB Write OK : No
Is OBB Read OK : Yes
Is Synchronized : No
Is Synchronizing : No
Is Periodic Back Up? : No
Status : Frozen
Accessibility : RecoveryMode
Static State Variables:
Is SOD Complete : Yes
Is Alt SOD Complete : Yes
Is Primary Invalid : Yes
Is Primary W/O Drives : No
Is Battery Usable : Yes
Is Primary DB Recreated: No
Is OBB DB Recreated : Yes
Is In Service Mode : No
Is pCache Available : Yes
Is Lockdown Required : No
value = 0 = 0x0
-> Working BDBMRecord:
Revision: 1.0
Primary Recreated Ctlr A: 1
Primary Recreated Ctlr B: 1
Clear OBB Ctlr A: 0
Clear OBB Ctlr B: 0
PrimaryDatabaseId: " " | 0000000000000000000000000000000000000000
PrimaryDatabase created at: Never
BackupDatabaseA ID: " " | 0000000000000000000000000000000000000000
BackupDatabaseA created at: Never
BackupDatabaseB ID: " " | 0000000000000000000000000000000000000000
BackupDatabaseB created at: Never
Generation Number: 0

From Native database:
04/01/16-14:17:20 (bdbmShowRecord): ERROR: getRecordNative: Caught DbmNoFileSystemException: recType: 140
Error reading from native database

From On-board Backup
Error reading from on-board backup database

 

 

The second controller hangs right after it tries to boot. Here's the output from that controller when it is booting:

 

Reset, Power-Up Diagnostics - Loop 1 of 1
3600 Processor DRAM
01 Data lines Passed
02 Address lines Passed
3300 NVSRAM
01 Data lines Passed
4410 Ethernet 82574 1
01 Register read Passed
02 Register address lines Passed
4411 Ethernet 82574 2
01 Register read Passed
02 Register address lines Passed
6D40 Bobcat
02 Flash Test Passed
3700 PLB SRAM
01 Data lines Passed
02 Address lines Passed
3900 Real-Time Clock
01 RT Clock Tick Passed
Diagnostic Manager exited normally.
04/01/16-14:24:56 (tNetCfgInit): NOTE: eth0: LinkUp event
Current date: 04/01/16 time: 14:25:16

Send <BREAK> for Service Interface or baud rate change
04/01/16-14:25:16 (tRAID): NOTE: Set Powerup State
04/01/16-14:25:16 (tRAID): SOD Sequence is Normal, 0 on controller B
04/01/16-14:25:17 (tRAID): NOTE: loading flash file: ECTGenOEM
04/01/16-14:25:17 (tRAID): NOTE: unloading flash file: ECTGenOEM
04/01/16-14:25:18 (tRAID): NOTE: Turning on tray summary fault LED
04/01/16-14:25:18 (tRAID): NOTE: SODRebootLoop- Limit:5 Cnt:2
04/01/16-14:25:18 (tRAID): WARN: Error threshold exceeded: "Client_Corrupt_DB_Detected_On_Alt"
04/01/16-14:25:18 (tRAID): NOTE: Installed Protocols: <MTPs: SAS USB > <ITPs: RDMA IPOIB > <STPs: FCP RmtDMA iSCSI SAS >
04/01/16-14:25:18 (tRAID): NOTE: Required Protocols: <MTPs: SAS > <ITPs: UNK

 

At that point, it just hangs, and it won't accept any commands or <BREAK> or anything.

 

Jorge

 

rachid

This is not a good sign:

 

From Native database:
04/01/16-14:17:20 (bdbmShowRecord): ERROR: getRecordNative: Caught DbmNoFileSystemException: recType: 140
Error reading from native database

From On-board Backup
Error reading from on-board backup database

 

Both places where the config backup is stored are unreadable :S
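
A minimal sketch, assuming the serial console output has been captured to text, of picking out the Yes/No state flags and the two "Error reading from" lines that the diagnosis above rests on. The `summarize` helper and the trimmed sample are illustrative; the field names are copied from the output posted earlier.

```python
import re

# Trimmed excerpt of the bdbmShowBackupDatabaseInfo output posted above.
SAMPLE = """\
Is OBB Write OK : No
Is OBB Read OK : Yes
Is Synchronized : No
Status : Frozen
Accessibility : RecoveryMode
Is Primary Invalid : Yes
Is Primary DB Recreated: No
Is OBB DB Recreated : Yes
From Native database:
04/01/16-14:17:20 (bdbmShowRecord): ERROR: getRecordNative: Caught DbmNoFileSystemException: recType: 140
Error reading from native database
From On-board Backup
Error reading from on-board backup database
"""

# Matches lines such as "Is Primary Invalid : Yes" (spacing around ":" varies).
FLAG = re.compile(r"^(Is .+?)\s*:\s*(Yes|No)$")


def summarize(text):
    """Return (flags, errors): boolean state flags and database read errors."""
    flags, errors = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        match = FLAG.match(line)
        if match:
            flags[match.group(1)] = match.group(2) == "Yes"
        elif line.startswith("Error reading from"):
            errors.append(line)
    return flags, errors


flags, errors = summarize(SAMPLE)
print("Primary database invalid:", flags["Is Primary Invalid"])
print("On-board backup writable:", flags["Is OBB Write OK"])
for line in errors:
    print("ERROR:", line)
```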

rachid

A small correction to my statement two replies back: Lt is the lockdown code for a corrupted database.

agapotron

Is this the configuration file you want? This is from when we recently upgraded the firmware on the unit:

 

// Logical configuration information from Storage Array genome-cloudstore.
// Saved on February 25, 2015
// Firmware package version for Storage Array genome-cloudstore = 07.84.44.00
// NVSRAM package version for Storage Array genome-cloudstore = N26X0-784834-DB2

//on error stop;

// Uncomment the two lines below to delete the existing configuration.
//show "Deleting the existing configuration.";
//clear storageArray configuration;

// Storage Array global logical configuration script commands
show "Setting the Storage Array user label to genome-cloudstore.";
set storageArray userLabel="genome-cloudstore";

show "Setting the Storage Array media scan rate to disabled.";
set storageArray mediaScanRate=disabled;

// Uncomment the three lines below to remove the default volume (if exists). NOTE: Default volume name is always = "Unnamed".
//on error continue;
//show "Deleting the default volume created during the removal of the existing configuration.";
//delete volume["Unnamed"] removeVolumeGroup=true;
//on error stop;

// Copies the hot spare settings
// NOTE: These statements are wrapped in on-error continue and on-error stop statements to
// account for minor differences in capacity from the drive of the Storage Array on which the
// configuration was saved to that of the drives on which the configuration will be copied.
show "Setting the Storage Array cache block size to 32.";
set storageArray cacheBlockSize=32;

show "Setting the Storage Array to begin cache flush at 80% full.";
set storageArray cacheFlushStart=80;

show "Setting the Storage Array to end cache flush at 80% full.";
set storageArray cacheFlushStop=80;

// Creating Host Topology

show "Creating Disk Pool Disk_Pool_1.";
//This command creates disk pool <Disk_Pool_1>.
create diskPool drives=(99,1,1 99,2,1 99,3,1 99,4,1 99,5,1 99,1,2 99,2,2 99,3,2 99,4,2 99,5,2 99,1,3 99,2,3 99,3,3 99,4,3 99,5,3 99,1,4 99,2,4 99,3,4 99,4,4 99,5,4 99,1,5 99,2,5 99,3,5 99,4,5 99,5,5 99,1,6 99,2,6 99,3,6 99,4,6 99,5,6 99,1,7 99,2,7 99,3,7 99,4,7 99,5,7 99,1,8 99,2,8 99,3,8 99,4,8 99,5,8 99,1,9 99,2,9 99,3,9 99,4,9 99,5,9 99,1,10 99,2,10 99,3,10 99,4,10 99,5,10 99,1,11 99,2,11 99,3,11 99,4,11 99,5,11 99,1,12 99,2,12 99,3,12 99,4,12 99,5,12) userLabel="Disk_Pool_1" securityType=capable dataAssurance=none warningThreshold=0 criticalThreshold=0 criticalPriority=highest degradedPriority=high backgroundPriority=low;
show "Setting the reserved drive count to 2.";
set diskPool ["Disk_Pool_1"] reservedDriveCount=2;

show "Creating volume 1 on disk pool Disk_Pool_1.";
//This command creates volume <1> on disk pool <Disk_Pool_1>.
create volume diskPool="Disk_Pool_1" userLabel="1" owner=A capacity=15294206742528 Bytes dataAssurance=none mapping=none;
show "Setting additional attributes for volume 1.";
// Configuration settings that can not be set during Volume creation.
set volume["1"] cacheFlushModifier=10;
set volume["1"] cacheWithoutBatteryEnabled=false;
set volume["1"] mirrorEnabled=true;
set volume["1"] readCacheEnabled=true;
set volume["1"] writeCacheEnabled=false;
set volume["1"] mediaScanEnabled=false;
set volume["1"] redundancyCheckEnabled=false;
set volume["1"] cacheReadPrefetch=true;
set volume["1"] modificationPriority=high;

show "Creating volume 2 on disk pool Disk_Pool_1.";
//This command creates volume <2> on disk pool <Disk_Pool_1>.
create volume diskPool="Disk_Pool_1" userLabel="2" owner=B capacity=15294206742528 Bytes dataAssurance=none mapping=none;
show "Setting additional attributes for volume 2.";
// Configuration settings that can not be set during Volume creation.
set volume["2"] cacheFlushModifier=10;
set volume["2"] cacheWithoutBatteryEnabled=false;
set volume["2"] mirrorEnabled=true;
set volume["2"] readCacheEnabled=true;
set volume["2"] writeCacheEnabled=true;
set volume["2"] mediaScanEnabled=false;
set volume["2"] redundancyCheckEnabled=false;
set volume["2"] cacheReadPrefetch=true;
set volume["2"] modificationPriority=high;

show "Creating volume 3 on disk pool Disk_Pool_1.";
//This command creates volume <3> on disk pool <Disk_Pool_1>.
create volume diskPool="Disk_Pool_1" userLabel="3" owner=A capacity=15294206742528 Bytes dataAssurance=none mapping=none;
show "Setting additional attributes for volume 3.";
// Configuration settings that can not be set during Volume creation.
set volume["3"] cacheFlushModifier=10;
set volume["3"] cacheWithoutBatteryEnabled=false;
set volume["3"] mirrorEnabled=true;
set volume["3"] readCacheEnabled=true;
set volume["3"] writeCacheEnabled=true;
set volume["3"] mediaScanEnabled=false;
set volume["3"] redundancyCheckEnabled=false;
set volume["3"] cacheReadPrefetch=true;
set volume["3"] modificationPriority=high;

show "Creating volume 4 on disk pool Disk_Pool_1.";
//This command creates volume <4> on disk pool <Disk_Pool_1>.
create volume diskPool="Disk_Pool_1" userLabel="4" owner=B capacity=15294206742528 Bytes dataAssurance=none mapping=none;
show "Setting additional attributes for volume 4.";
// Configuration settings that can not be set during Volume creation.
set volume["4"] cacheFlushModifier=10;
set volume["4"] cacheWithoutBatteryEnabled=false;
set volume["4"] mirrorEnabled=true;
set volume["4"] readCacheEnabled=true;
set volume["4"] writeCacheEnabled=true;
set volume["4"] mediaScanEnabled=false;
set volume["4"] redundancyCheckEnabled=false;
set volume["4"] cacheReadPrefetch=true;
set volume["4"] modificationPriority=high;

show "Creating volume 5 on disk pool Disk_Pool_1.";
//This command creates volume <5> on disk pool <Disk_Pool_1>.
create volume diskPool="Disk_Pool_1" userLabel="5" owner=A capacity=15294206742528 Bytes dataAssurance=none mapping=none;
show "Setting additional attributes for volume 5.";
// Configuration settings that can not be set during Volume creation.
set volume["5"] cacheFlushModifier=10;
set volume["5"] cacheWithoutBatteryEnabled=false;
set volume["5"] mirrorEnabled=true;
set volume["5"] readCacheEnabled=true;
set volume["5"] writeCacheEnabled=true;
set volume["5"] mediaScanEnabled=false;
set volume["5"] redundancyCheckEnabled=false;
set volume["5"] cacheReadPrefetch=true;
set volume["5"] modificationPriority=high;

show "Creating volume 6 on disk pool Disk_Pool_1.";
//This command creates volume <6> on disk pool <Disk_Pool_1>.
create volume diskPool="Disk_Pool_1" userLabel="6" owner=B capacity=15294378541056 Bytes dataAssurance=none mapping=none;
show "Setting additional attributes for volume 6.";
// Configuration settings that can not be set during Volume creation.
set volume["6"] cacheFlushModifier=10;
set volume["6"] cacheWithoutBatteryEnabled=false;
set volume["6"] mirrorEnabled=true;
set volume["6"] readCacheEnabled=true;
set volume["6"] writeCacheEnabled=true;
set volume["6"] mediaScanEnabled=false;
set volume["6"] redundancyCheckEnabled=false;
set volume["6"] cacheReadPrefetch=true;
set volume["6"] modificationPriority=high;
// Creating Volume-To-LUN Mappings
show "Creating Volume-to-LUN Mapping for Volume 1 to LUN 0.";
set volume ["1"] logicalUnitNumber=0 hostGroup=defaultGroup;

show "Creating Volume-to-LUN Mapping for Volume 2 to LUN 1.";
set volume ["2"] logicalUnitNumber=1 hostGroup=defaultGroup;

show "Creating Volume-to-LUN Mapping for Volume 3 to LUN 2.";
set volume ["3"] logicalUnitNumber=2 hostGroup=defaultGroup;

show "Creating Volume-to-LUN Mapping for Volume 4 to LUN 3.";
set volume ["4"] logicalUnitNumber=3 hostGroup=defaultGroup;

show "Creating Volume-to-LUN Mapping for Volume 5 to LUN 4.";
set volume ["5"] logicalUnitNumber=4 hostGroup=defaultGroup;

show "Creating Volume-to-LUN Mapping for Volume 6 to LUN 5.";
set volume ["6"] logicalUnitNumber=5 hostGroup=defaultGroup;

 

 

rachid

Ohhh yeah, it looks like it, but instead of a create we should do a recover.

 

What I would do is:

 

- See if from that file I can generate a recovery script

- Wipe the data (normally the data should not be deleted) by setting an option in the VKI_EDIT_OPTIONS

- Once the controllers are up and running clear the options in the VKI_EDIT_OPTIONS

- Execute the recovery script
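
A minimal sketch of the first step, assuming the saved .cfg script is available as text: it extracts each volume's pool, label, owner, capacity, and LUN, which are the values a recovery script would have to reproduce exactly. It only reads the file and does not emit any recovery commands; the regular expressions and the `extract_volumes` name are illustrative and tailored to the lines shown in the posted configuration.

```python
import re

# Two volumes and their mappings, copied from the configuration script above.
SAMPLE_CFG = '''\
create volume diskPool="Disk_Pool_1" userLabel="1" owner=A capacity=15294206742528 Bytes dataAssurance=none mapping=none;
create volume diskPool="Disk_Pool_1" userLabel="6" owner=B capacity=15294378541056 Bytes dataAssurance=none mapping=none;
set volume ["1"] logicalUnitNumber=0 hostGroup=defaultGroup;
set volume ["6"] logicalUnitNumber=5 hostGroup=defaultGroup;
'''

CREATE = re.compile(
    r'create volume diskPool="(?P<pool>[^"]+)" userLabel="(?P<label>[^"]+)" '
    r'owner=(?P<owner>[AB]) capacity=(?P<capacity>\d+) Bytes'
)
MAPPING = re.compile(
    r'set volume \["(?P<label>[^"]+)"\] logicalUnitNumber=(?P<lun>\d+)'
)


def extract_volumes(cfg_text):
    """Return {label: {pool, owner, capacity_bytes, lun}} from a saved script."""
    volumes = {}
    for m in CREATE.finditer(cfg_text):
        volumes[m["label"]] = {
            "pool": m["pool"],
            "owner": m["owner"],
            "capacity_bytes": int(m["capacity"]),
            "lun": None,  # filled in below if a mapping line exists
        }
    for m in MAPPING.finditer(cfg_text):
        if m["label"] in volumes:
            volumes[m["label"]]["lun"] = int(m["lun"])
    return volumes


vols = extract_volumes(SAMPLE_CFG)
for label, info in vols.items():
    print(label, info)
```

Capacities are kept as exact byte counts because volume 6 differs slightly from the others, and rounding would hide that.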

 

Since I forgot my charger at work, I'm out of battery. I will contact you on Monday to continue working on it with you and check whether it's doable.

agapotron

That sounds wonderful! Thanks so much for all the help!!!! 

 

By the way, it seems that, under Linux, there's a treasure trove of data under /var/opt/SM. That's where I found this information, as well as a lot of other files for the other units we have...

rachid

Hello Jorge,

 

Do you have a DBM backup in the following path?

 

/var/opt/SM/monitor/dbcapture/

 

Many thanks

rachid

Hello Jorge,

 

Maybe you have found a support bundle?

Let me know.

 

Thanks

agapotron

There are a lot of files in the /var/opt/SM/monitor/dbcapture directory. I think the appropriate ones are:

 

-rw-rw-r-- 1 root root 744547 Mar 7 09:11 RetrievedRecords_60080e500023b966000000004f8805f3_2016_03_07_09_11_33.dbm.zip
-rw-rw-r-- 1 root root 750627 Mar 7 15:24 RetrievedRecords_60080e500023b966000000004f8805f3_2016_03_07_15_24_15.dbm.zip
-rw-rw-r-- 1 root root 750759 Mar 7 17:29 RetrievedRecords_60080e500023b966000000004f8805f3_2016_03_07_17_29_17.dbm.zip

 

Is this what you're looking for? What do I do with them?

agapotron

So, going by your earlier post, I'm wondering whether what I need to do is pick one of the files in the dbcapture directory (I assume the latest file) and do a:

 

load storageArray dbmDatabase file="RetrievedRecords_60080e500023b966000000004f8805f3_2016_03_07_17_29_17.dbm.zip"

 

(maybe I need to unzip it first)... Is that all, or do I need to do something before that? You mentioned something about generating a recovery script, but not sure how to do that... Thanks!
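
A minimal sketch of choosing the latest capture by the timestamp embedded in the file name rather than by file modification time, which can change if the files are copied. It assumes the `_YYYY_MM_DD_HH_MM_SS.dbm.zip` suffix seen in the listing above; `capture_time` is an illustrative helper name.

```python
import re
from datetime import datetime

# File names from the dbcapture listing above.
NAMES = [
    "RetrievedRecords_60080e500023b966000000004f8805f3_2016_03_07_09_11_33.dbm.zip",
    "RetrievedRecords_60080e500023b966000000004f8805f3_2016_03_07_15_24_15.dbm.zip",
    "RetrievedRecords_60080e500023b966000000004f8805f3_2016_03_07_17_29_17.dbm.zip",
]

# The capture timestamp sits between the array identifier and ".dbm.zip".
STAMP = re.compile(r"_(\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})\.dbm\.zip$")


def capture_time(name):
    """Parse the capture timestamp out of a dbcapture file name."""
    match = STAMP.search(name)
    if match is None:
        raise ValueError(f"no timestamp found in {name!r}")
    return datetime.strptime(match.group(1), "%Y_%m_%d_%H_%M_%S")


latest = max(NAMES, key=capture_time)
print(latest)
```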

rachid

Hello Jorge,

 

Hope you are doing well.

 

If you have the DBM, that's perfect.

I would need the DBM zip file and the two serial numbers of controller A and controller B to generate a validator key so that the system can be recovered (send them by PM).

 

Many thanks
