
E-Series array. 2540-M2 array. Ctrl with Sub-Model Identifier Not Set or Mismatched – L7 Code.

CristianB

Hello everyone,

Basically, we have a 2540-M2 array with dual controllers, and one of them shows the EO L7 code on the seven-segment display.


We have tried to clear the 'held in reset' status by running these commands from the active controller, without success:

 

setControllerToActive_MT 1
resetController_MT 1
altResetRelease
cmgrSetAltToOptimal

 

As you can see below, the alternate controller's state is initially Lockdown:

 

Primary Ctlr 0x2c84e50
State : Optimal
Serial : SV23501785
ActualApp : 07844710
CurrentApp : 07844710
ExpectedApp : 07844710
BootVersion : 07844710
ProductId/Rev : LCSM100_F /0784
ModelName : 2680
BoardId : 2660
HostBoardId1 : 0801
HostBoardId2 :
Vendor : SUN
PartNumber : 45234-06
OEMPartNumber : 7053352
Manufactured : 50396700
Cache/Proc : 1696/352
PhyCacheSize : 1696
FRCMemSize : 209715200
ForeignState : 1
IsInitialized : 1
FailedChannels : 0
IocFaulted : 0
FailedMirChan : 0
SubModel : 191
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces : 2

Alternate Ctlr 0x2c86630
State : Lockdown <---------------------------
Serial : SV30301683
ActualApp : 07844710
CurrentApp : 07844710
ExpectedApp : 07844710
BootVersion : 07844710
ProductId/Rev : LCSM100_F /0784
ModelName : 2680
BoardId : 2660
HostBoardId1 : 0801
HostBoardId2 :
Vendor : SUN
PartNumber : 45234-06
OEMPartNumber : 7053352
Manufactured : 50f34a80
Cache/Proc : 1696/352
PhyCacheSize : 1696
FRCMemSize : 209715200
ForeignState : 0
IsInitialized : 0
FailedChannels : 0
IocFaulted : 0
FailedMirChan : 0
SubModel : 191
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces : 2

 

But when we ran altResetRelease, Ctrl A (alternate) appeared to reach Optimal status:

 

Primary Ctlr 0x2c84e50
State : Optimal
Serial : SV23501785
ActualApp : 07844710
CurrentApp : 07844710
ExpectedApp : 07844710
BootVersion : 07844710
ProductId/Rev : LCSM100_F /0784
ModelName : 2680
BoardId : 2660
HostBoardId1 : 0801
HostBoardId2 :
Vendor : SUN
PartNumber : 45234-06
OEMPartNumber : 7053352
Manufactured : 50396700
Cache/Proc : 1696/352
PhyCacheSize : 1696
FRCMemSize : 209715200
ForeignState : 1
IsInitialized : 1
FailedChannels : 0
IocFaulted : 0
FailedMirChan : 0
SubModel : 191
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces : 2

Alternate Ctlr 0x2c86630
State : Optimal <----------------------
Serial : SV30301683
ActualApp : 07844710
CurrentApp : 07844710
ExpectedApp : 07844710
BootVersion : 07844710
ProductId/Rev : LCSM100_F /0784
ModelName : 2680
BoardId : 2660
HostBoardId1 : 0801
HostBoardId2 :
Vendor : SUN
PartNumber : 45234-06
OEMPartNumber : 7053352
Manufactured : 50f34a80
Cache/Proc : 1696/352
PhyCacheSize : 1696
FRCMemSize : 209715200
ForeignState : 0
IsInitialized : 0
FailedChannels : 0
IocFaulted : 0
FailedMirChan : 0
SubModel : 191
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces : 2

 

But after a few minutes Ctrl A shows Lockdown status again, and the seven-segment display shows EO L7 again.
We have also replaced this controller twice, without success.

As far as I know, the code EO L7 means "Sub-Model Identifier Not Set or Mismatched", but as you can see above in the cmgrShow output,
the Submodel-ID is the same for both controllers (SubModel : 191).
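
For readers who want to compare the two controller blocks without reading them field by field, here is a minimal Python sketch. It assumes the cmgrShow text has been captured to a string in the same "Field : value" layout as pasted above. The helper names (parse_cmgr_show, diff_controllers) are hypothetical and not part of any NetApp tooling, and SAMPLE is a shortened excerpt of the output in this post.

```python
import re

# Shortened excerpt of the cmgrShow output pasted in this thread.
SAMPLE = """\
Primary Ctlr 0x2c84e50
State : Optimal
Serial : SV23501785
OEMPartNumber : 7053352
ForeignState : 1
IsInitialized : 1
SubModel : 191

Alternate Ctlr 0x2c86630
State : Lockdown <---------------------------
Serial : SV30301683
OEMPartNumber : 7053352
ForeignState : 0
IsInitialized : 0
SubModel : 191
"""


def parse_cmgr_show(text):
    """Split cmgrShow-style text into {"Primary": {...}, "Alternate": {...}}."""
    controllers = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^\s*(Primary|Alternate) Ctlr\b", line)
        if header:
            current = controllers.setdefault(header.group(1), {})
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Drop hand-drawn "<---" annotations that follow a value.
        value = re.sub(r"\s*<+-+.*$", "", value).strip()
        current[key.strip()] = value
    return controllers


def diff_controllers(controllers):
    """Return {field: (primary_value, alternate_value)} for fields that differ."""
    primary = controllers.get("Primary", {})
    alternate = controllers.get("Alternate", {})
    fields = sorted(set(primary) | set(alternate))
    return {
        f: (primary.get(f), alternate.get(f))
        for f in fields
        if primary.get(f) != alternate.get(f)
    }


if __name__ == "__main__":
    for field, (p, a) in diff_controllers(parse_cmgr_show(SAMPLE)).items():
        print(f"{field}: primary={p!r} alternate={a!r}")
```

On this excerpt the SubModel and OEMPartNumber fields match, while State, Serial, ForeignState and IsInitialized differ. The sketch only compares what cmgrShow prints; it says nothing about which difference, if any, triggers the lockdown.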

 

Any advice or ideas, please?

 

Thanks in advance,
Regards.


Cristian 


AlexDawson

Hi there,

 

E-Series systems enter lockdown mode primarily to protect user data - therefore the best idea is to raise a support case if the system is under support. Is this applicable in this scenario?

 

CristianB

Hi Alex,

 

Unfortunately, there is no valid service contract with NetApp for this array.

So, with the scenario provided, do you have any idea why the alternate controller shows the EO L7 code on the seven-segment display after we apply the commands to clear the held-in-reset condition, especially when the cmgrShow command shows that the Submodel-ID is the same for both controllers (SubModel : 191)?

OE+L7 normally means the controllers' sub-models do not match, which is the main reason I don't understand this behaviour: the sub-model is the same here.
The seven-segment display only shows one error, but there may be more. In fact, we have replaced the controller twice and rebooted the active controller with the faulty one removed, but with no success.

As far as I know, the 'held in reset' status can be cleared by running one of these commands from the active controller:

setControllerToActive_MT 1
resetController_MT 1
altResetRelease
cmgrSetAltToOptimal

But none of these commands has solved the issue so far. Please, any ideas?

Thank you,

Regards.

Cristian

AlexDawson

What is the history of this system?

 

Have you tried "lemClearLockdown"?

 

Please run shell command "getMfgSubModelId" and post response

CristianB

Hi Alex,

 

Basically, one controller in this array went into lockdown status, and we tried to release it with the shell commands I mentioned previously.

The customer wanted to replace hardware in order to isolate the fault, so the controller was replaced twice and recently the midplane was replaced as well.

Obviously the situation is the same: alternate Ctrl A is currently showing EO LU, so we are thinking of applying this action plan:

-- Log into the serial port of controller A that has the "LU" status.

When logged into the serial port, you will see...

 

Press within 5 seconds: <S> for Service Interface, <BREAK> for baud rate

----> Press ESC to get into the serial interface

 

login: shellUsr

Password:

->

Serial Port shell started.

 

 

-> loadDebug

-> lemClearLockdown

-> cmgrShow

Do you agree? Any other ideas or advice?

 

Thank you,

Regards.


Cristian

AlexDawson

Sounds like a good start - when you get into the shell please run the "getMfgSubModelId" command - you may have received a functionally similar part from another subvendor, which is not modifiable in the field 😞

CristianB

Hi Alex,

 

First of all, thank you for your replies and support. I'll collect this command output tonight, when we apply the action plan mentioned before.

 

So, do you think the SubModel-IDs are still different? Is the getMfgSubModelId command used to get information about the SubModelId?

But I can see the SubModel-ID seems equal:

Primary Ctlr 0x3198700
State          : Optimal
Serial         : SV23501785
ActualApp      : 07844710
CurrentApp     : 07844710
ExpectedApp    : 07844710
BootVersion    : 07844710
ProductId/Rev  : LCSM100_F       /0784
ModelName      : 2680
BoardId        : 2660
HostBoardId1   : 0801
HostBoardId2   :
Vendor         : SUN
PartNumber     : 45234-06
OEMPartNumber  : 7053352
Manufactured   : 50396700
Cache/Proc     : 1696/352
PhyCacheSize   : 1696
FRCMemSize     : 209715200
ForeignState   : 1
IsInitialized  : 1
FailedChannels : 0
IocFaulted     : 0
FailedMirChan  : 0
SubModel       : 191 <<-------------------------------
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces   : 2

Alternate Ctlr 0x3199ee0
State          : Optimal
Serial         : SV30301683
ActualApp      : 07844710
CurrentApp     : 07844710
ExpectedApp    : 07844710
BootVersion    : 07844710
ProductId/Rev  : LCSM100_F       /0784
ModelName      : 2680
BoardId        : 2660
HostBoardId1   : 0801
HostBoardId2   :
Vendor         : SUN
PartNumber     : 45234-06
OEMPartNumber  : 7053352
Manufactured   : 50f34a80
Cache/Proc     : 1696/352
PhyCacheSize   : 1696
FRCMemSize     : 209715200
ForeignState   : 0
IsInitialized  : 0
FailedChannels : 0
IocFaulted     : 0
FailedMirChan  : 0
SubModel       : 191   <<-------------------------------
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces   : 2

Thank you,

Regards.

AlexDawson

Thanks for clarifying - that was my guess, but it seems not to be. Unfortunately I have no further suggestions. Our documentation does not cover any scenario where this is not the cause of this error.

john16

It is better to run these commands in shell mode:

 

getMfgSubModelId

getCurrentSubModelId

 

Value   Controller State   Description
L0      Suspended          Mismatched controller types
L1      Suspended          Missing interconnect canister
L2      Suspended          Persistent memory errors
L3      Suspended          Persistent hardware errors
L4      Suspended          Persistent data protection errors
L5      Suspended          ACS failure
L6      Suspended          Unsupported host card
L7      Suspended          Sub-model identifier not set or mismatched
L8      Suspended          Memory configuration error
L9      Suspended          Link Speed Mismatch
LA      Suspended          Reserved
Lb      Suspended          Host Card configuration error
LC      Suspended          Persistent cache backup configuration error
Ld      Suspended          Mixed cache memory DIMMs
LE      Suspended          Uncertified cache memory DIMM Sizes
LF      Suspended          Lockdown with limited SYMbol support
LH      Suspended          Controller Firmware Mismatch
LL      Suspended          Unable to Access Either Midplane SBB
Ln      Suspended          Canister not valid for enclosure
LP      Suspended          Drive port mapping tables not found
LU      Suspended          SOD reboot limit exceeded
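
As a convenience, the table above can be turned into a small lookup. This is only a sketch in Python: the descriptions are copied from the table in this post, the function name describe_lockdown is hypothetical, and the handling of a leading category code such as "OE" or "EO" is an assumption about how people type the display sequence, not something defined by the firmware.

```python
import re

# Lockdown codes and descriptions, copied from the table in this post.
LOCKDOWN_CODES = {
    "L0": "Mismatched controller types",
    "L1": "Missing interconnect canister",
    "L2": "Persistent memory errors",
    "L3": "Persistent hardware errors",
    "L4": "Persistent data protection errors",
    "L5": "ACS failure",
    "L6": "Unsupported host card",
    "L7": "Sub-model identifier not set or mismatched",
    "L8": "Memory configuration error",
    "L9": "Link Speed Mismatch",
    "LA": "Reserved",
    "Lb": "Host Card configuration error",
    "LC": "Persistent cache backup configuration error",
    "Ld": "Mixed cache memory DIMMs",
    "LE": "Uncertified cache memory DIMM Sizes",
    "LF": "Lockdown with limited SYMbol support",
    "LH": "Controller Firmware Mismatch",
    "LL": "Unable to Access Either Midplane SBB",
    "Ln": "Canister not valid for enclosure",
    "LP": "Drive port mapping tables not found",
    "LU": "SOD reboot limit exceeded",
}

# Case-insensitive index, because the display mixes cases (Lb, Ld, Ln).
_BY_LOWER = {code.lower(): text for code, text in LOCKDOWN_CODES.items()}


def describe_lockdown(display_text):
    """Return the description for input like "L7", "OE+L7" or "EO L7".

    Only the last token is looked up; returns None for unknown codes.
    """
    tokens = [t for t in re.split(r"[\s+]+", display_text.strip()) if t]
    if not tokens:
        return None
    return _BY_LOWER.get(tokens[-1].lower())


if __name__ == "__main__":
    print(describe_lockdown("EO L7"))
    print(describe_lockdown("LU"))
```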

CristianB

Hi All,

 

Sorry for the delay, I was on vacation. First of all, thank you for all the replies.

To sum up, it was a weird situation. We were able to solve the problem
once both controllers had the same SubModelId, as reported correctly by the getCurrentSubModelId command.

We ordered the same part number as a replacement (7053343    2GB FC-AL Controller Module), but it seems
that the OEMPartNumber differed between units supplied under that same part number, as you can see below:

-> getCurrentSubModelId
value = 149 = 0x95    

OEMPartNumber  : 7011128

-> getCurrentSubModelId
value = 191 = 0xbf

OEMPartNumber  : 7053352
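
If the output of getCurrentSubModelId is captured from each controller, the comparison can be scripted. The following Python sketch assumes the shell prints the result in the "value = <decimal> = 0x<hex>" form shown above; the function names are hypothetical and the two sample lines are taken from this post.

```python
import re

# Matches lines such as "value = 149 = 0x95" as printed by the controller shell.
VALUE_RE = re.compile(r"value\s*=\s*(\d+)\s*=\s*0x([0-9a-fA-F]+)")


def parse_shell_value(line):
    """Extract the integer from a "value = N = 0xH" line.

    Raises ValueError if the line does not match or if the decimal and
    hexadecimal parts disagree (for example after a copy/paste error).
    """
    match = VALUE_RE.search(line)
    if match is None:
        raise ValueError(f"not a shell value line: {line!r}")
    decimal = int(match.group(1))
    from_hex = int(match.group(2), 16)
    if decimal != from_hex:
        raise ValueError(f"decimal {decimal} != hex {from_hex:#x}")
    return decimal


def sub_models_match(line_a, line_b):
    """True when both controllers report the same sub-model ID."""
    return parse_shell_value(line_a) == parse_shell_value(line_b)


if __name__ == "__main__":
    ctrl_1 = "value = 149 = 0x95    "
    ctrl_2 = "value = 191 = 0xbf"
    print(parse_shell_value(ctrl_1), parse_shell_value(ctrl_2))
    print("match" if sub_models_match(ctrl_1, ctrl_2) else "MISMATCH")
```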

 

Once this was resolved and the lockdown status was cleared with the lemClearLockdown command, everything was fine:

Primary Ctlr 0x3198700
State          : Optimal
Serial         : SV20922442
ActualApp      : 07844710
CurrentApp     : 07844710
ExpectedApp    : 07844710
BootVersion    : 07844710
ProductId/Rev  : LCSM100_F       /0784
ModelName      : 2680
BoardId        : 2660
HostBoardId1   : 0801
HostBoardId2   :
Vendor         : SUN
PartNumber     : 45233-06
OEMPartNumber  : 7011128
Manufactured   : 4f52b080
Cache/Proc     : 1696/352
PhyCacheSize   : 1696
FRCMemSize     : 209715200
ForeignState   : 1
IsInitialized  : 1
FailedChannels : 0
IocFaulted     : 0
FailedMirChan  : 0
SubModel       : 149
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces   : 2

Alternate Ctlr 0x3199ee0
State          : Optimal
Serial         : SV22808449
ActualApp      : 07844710
CurrentApp     : 07844710
ExpectedApp    : 07844710
BootVersion    : 07844710
ProductId/Rev  : LCSM100_F       /0784
ModelName      : 2680
BoardId        : 2660
HostBoardId1   : 0801
HostBoardId2   :
Vendor         : SUN
PartNumber     : 45233-06
OEMPartNumber  : 7011128
Manufactured   : 4ffb7080
Cache/Proc     : 1696/352
PhyCacheSize   : 1696
FRCMemSize     : 209715200
ForeignState   : 1
IsInitialized  : 1
FailedChannels : 0
IocFaulted     : 0
FailedMirChan  : 0
SubModel       : 149
SubModelSupport: 1
LockdownReason : 255
NumNetIfaces   : 2



I should also mention that CAM reports part numbers 7053352 and 7011128 in the supportdata, but the correct part number is 7053343.

Do you have any idea about this behavior?


Thank you,

Regards.

Cristian

AlexDawson

Hi Cristian,

 

I don't have any information available about that, sorry.

 

Glad to hear you're back up and running!
