
Problem recovering from steelstore critical failure

justanotheritguy

 

I was moving large folders (several GBs) from one top-level folder (served by steelstore CIFS) to another using Windows File Explorer when the copy failed halfway through. At that point the entire steelstore appliance stopped responding, including the web console interface (it displayed a basic steelstore page indicating the web server was up but with no console running behind it, so not quite a 404 error).

 

I waited 24 hours and the web console eventually returned of its own accord. However, the alarms triggered were:

Appliance Health: Critical
Storage Optimization Service: Critical
Storage Optimization Service Down: Critical
 
A colleague recommended I stand up a new Steelstore appliance (now based on NetApp) and follow the upgrade procedure in the "NetApp® AltaVault® Cloud Integrated Storage 4.0 Installation and Service Guide for Cloud Appliances" (ECMP12455065), which I did...

 

The first hurdle (after importing the config from the old console and attaching the required volumes to the new instance except /dev/sda1 and /dev/sdk) was when I executed the "megastore guid reset" line in the doco....

 

I received the warning/error: "Deleting megastore.guid in cloud bucket returned 110", about which I can find no information online.

 

I pushed forward and ran "service enable" and, after many hours of waiting at "Starting optimization service...", I got a "Storage Optimization Service: initialization error".

 

The current console shows the same alarm and service states:

 

Alarms Triggered:

Appliance Health: Critical
Storage Optimization Service: Critical
Storage Optimization Service Down: Critical
 
Optimization Service:
Service: running
Status: not ready
Mode: Optimized for backup workloads
 
I am not sure how to proceed in recovering the system to get access to the data in the S3 bucket. I want to decommission this appliance anyway, but first I want to recover the data that it served, as some of it is important. There isn't a lot of doco on recovery from critical failure, so any advice would be appreciated. I'll include some other ssh console output (below) which I've logged along the way in case it helps.
 
steelstore01 (config) # show log
ng error message to mgmtd
Jul 28 23:24:07 steelstore01 rfsd[6600]: [replicator.ERR] (6602) test_cloud failed in restore mode:invalid argument
Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.ERR] (6602) Cloud test failed: invalid argument
Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.ERR] (6602) Cloud test failed
Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.INFO] (6600) tearing down RfsContext
Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.INFO] (6600) Megamount not running
Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.INFO] (6600) Shutting down backend threads
Jul 28 23:24:07 steelstore01 rfsd[6600]: [mgmt/mgmtd.NOTICE] (6600) rfsd sent event to mgmtd: /rbt/rfsd/events/notready
Jul 28 23:24:07 steelstore01 mgmtd[2443]: [mgmtd.INFO]: EVENT: /rbt/rfsd/events/notready
Jul 28 23:24:07 steelstore01 mgmtd[2443]: [mgmtd.INFO]: in rfsd_notup
Jul 28 23:24:07 steelstore01 mgmtd[2443]: [mgmtd.ERR]: Error no message binding from rfsd.
Jul 28 23:24:07 steelstore01 cli[31032]: [cli.INFO]: user admin: Executing command: show log
Jul 28 23:24:07 steelstore01 cli[31032]: [cli.INFO]: user admin: Command show log authorized
lines 24877-24889/24889 (END) steelstore01 (config) #
steelstore01 (config) #
steelstore01 (config) # show log
Jul 29 09:03:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Calling module apply function for 15 modules
Jul 29 09:03:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Finished calling module apply functions for 14 modules
Jul 29 09:03:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Finished database commit
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Starting database commit
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Commit side-effects loop executed 1 times
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Calling node apply functions and sysctl key handling functions for 0 nodes
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Finished calling node apply functions for 0 nodes
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Finished applying sysctl node values for 0 nodes
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Calling module apply function for 15 modules
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Finished calling module apply functions for 14 modules
Jul 29 09:13:15 steelstore01 mgmtd[2443]: [mgmtd.INFO]: Finished database commit
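
Since "show log" returns tens of thousands of lines, most of them routine mgmtd INFO entries, it may help to filter a saved copy of the output down to the error-level lines before reading it. The following is a minimal Python sketch of that idea and not a steelstore or AltaVault tool. It assumes the log has been copied off the appliance as plain text in the syslog-style layout shown above; the names LINE_RE, filter_by_level and SAMPLE are hypothetical and exist only for illustration.

```python
import re

# Pattern for the syslog-style lines shown in the excerpts, for example:
#   Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.ERR] (6602) Cloud test failed
# The layout is inferred from the pasted output, not from vendor documentation.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w/.-]+)\[(?P<pid>\d+)\]:\s+"
    r"\[(?P<module>[\w/.-]+)\.(?P<level>[A-Z]+)\]:?\s*(?P<msg>.*)$"
)


def filter_by_level(text, levels=("ERR",)):
    """Return (timestamp, process, level, message) for lines at the given levels."""
    hits = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            hits.append((match.group("ts"), match.group("proc"),
                         match.group("level"), match.group("msg")))
    return hits


# Three lines taken from the first excerpt, used here as sample input.
SAMPLE = """\
Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.ERR] (6602) Cloud test failed
Jul 28 23:24:07 steelstore01 rfsd[6600]: [rfsd.INFO] (6600) Megamount not running
Jul 28 23:24:07 steelstore01 mgmtd[2443]: [mgmtd.ERR]: Error no message binding from rfsd.
"""

for ts, proc, level, msg in filter_by_level(SAMPLE):
    print(ts, proc, level, msg)
```

Lines that do not match the pattern (pager footers, truncated fragments, CLI prompts) are skipped rather than raising an error, so the full paste can be fed in unedited. Passing a different levels tuple, for example ("ERR", "NOTICE"), widens the filter.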