SAS re-cabling & disk path

_ioshy_ · ‎2014-10-13

Hi all,

First of all, sorry if this question has already been answered, but I can't find any documentation related to it.

We have a FAS3240 dual-controller system in HA, with 2 shelves. These systems have a SAS card with nothing connected to it. Right now we have 1 stack containing both shelves, connected to the 0a & 0b adapters on both controllers. We want to recable the system, connecting the current stack to the 0a & 1b ports (as described in the Universal SAS Cabling document).

My question is: can we move the SAS cable from port 0b to port 1b without disruption? All disks have dual paths now. After this change, the disk IDs will change from 0b.xx.xx to 1b.xx.xx. Could that affect the aggr configuration?

StorageNA*> storage show disk -p
PRIMARY  PORT SECONDARY PORT SHELF BAY
-------- ---- --------- ---- ---------
0b.10.0  A    0a.10.0   B     10    0
0b.10.1  A    0a.10.1   B     10    1
0b.10.2  A    0a.10.2   B     10    2
0b.10.3  A    0a.10.3   B     10    3
0b.10.4  A    0a.10.4   B     10    4
0b.10.5  A    0a.10.5   B     10    5

(..)
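For anyone who wants to confirm the "all disks have dual paths" claim from a saved copy of this output before touching a cable, here is a minimal Python sketch. It only parses text that looks like the listing above. The function names are made up for illustration, and the single-path row in the sample is hypothetical: it assumes such a row simply omits the secondary columns, which should be checked against real output.

```python
import re
from typing import List, NamedTuple, Optional


class DiskPath(NamedTuple):
    primary: str
    primary_port: str
    secondary: Optional[str]
    secondary_port: Optional[str]
    shelf: int
    bay: int


# Disk names in the listing look like adapter.shelf.bay, e.g. "0b.10.0".
DISK_NAME = re.compile(r"[0-9a-z]+\.\d+\.\d+")


def parse_disk_paths(output: str) -> List[DiskPath]:
    """Parse saved `storage show disk -p` text into one record per disk."""
    disks = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the prompt, header, separator, blank and "(..)" lines.
        if not fields or not DISK_NAME.fullmatch(fields[0]):
            continue
        if len(fields) == 6:  # dual-path row
            disks.append(DiskPath(fields[0], fields[1], fields[2], fields[3],
                                  int(fields[4]), int(fields[5])))
        elif len(fields) == 4:  # ASSUMED single-path row: no secondary columns
            disks.append(DiskPath(fields[0], fields[1], None, None,
                                  int(fields[2]), int(fields[3])))
    return disks


def single_pathed(disks: List[DiskPath]) -> List[str]:
    """Return the names of disks that have no secondary path."""
    return [d.primary for d in disks if d.secondary is None]


# First two rows are from the post; the third row is a hypothetical single-path disk.
SAMPLE = """\
PRIMARY PORT SECONDARY PORT SHELF BAY
-------- ---- --------- ---- ---------
0b.10.0 A 0a.10.0 B 10 0
0b.10.1 A 0a.10.1 B 10 1
0b.10.2 A 10 2
"""

if __name__ == "__main__":
    print(single_pathed(parse_disk_paths(SAMPLE)))
```

An empty result means every listed disk has two paths; any name returned would lose its only path if the wrong cable were pulled.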

Thanks in advance to all,

JSHACHER11 · ‎2014-11-15

"can we connect SAS cable from 0b to 1b port without disruption?" YES (given that all the disks are still connected to 0a)

"After this change, the disk ID will change to 0b.xx.xx to 1b.xx.xx, could it affect to the aggr configuration?" NO (the aggregate is identified by labels written on the disks themselves, so the disk path doesn't matter)

aborzenkov · ‎2014-11-15

Do you have reference to NetApp documentation that says this change is permitted online? Changes that affect disk path access were never officially approved.

JSHACHER11 · ‎2014-11-15

Not sure what you mean by "permitted". I've done that hundreds of times (online). When you remove the cable from 0b, the stack becomes single-path (0a only). When you plug the cable into 1b, the stack is multipathed again (0a/1b).

aborzenkov · ‎2014-11-15

"not sure what you mean by "permitted"."

Supported by NetApp.

"I've done that hundreds of times (online)."

That does not matter. What matters is this: will NetApp accept responsibility for a service interruption or data corruption if it happens?

JSHACHER11 · ‎2014-11-15

"will NetApp accept responsibility for service interruption/data corruption if it happens?"

Why would you have data corruption? I've seen shelves serving data over a single path for years.

saranraj456 · ‎2014-11-16

Single path won't be a problem, but will ONTAP accept changing the adapter from 0b to 1b?

scottgelb · ‎2014-11-17

The node that has the source port moved should be shut down or taken over; then you can perform stack maintenance. I have seen a panic before when someone moved the source port with the controller online. Support did a great job getting things back online as quickly as possible and no data was lost. Below is an excerpt of the case (names of aggrs, etc. redacted) we had open for the customer. The move caused a panic on one node but worked on the other. These were FC shelves back in the day, but I would use the same methodology for SAS shelves.

I also always boot to maintenance mode to make sure both paths are available, then halt and boot to giveback per the advice below. I have never had a problem moving a source port when the node is taken over and the support recommendation is followed.

Summary:

Loop maintenance was being performed on filer xxxxxx, which involved moving the FC 0a disk loop over to the 2c disk loop. After removal of 0a, the disks remained available through multipath on loop 0d. Once the primary disk loop (previously 0a) was inserted into port 2c, the disk identities changed to the 2c prefix instead of the established disk mapping with the 0a prefix. This induced a filer panic with WAFL Inconsistent for aggrxxxxx. At the time of 2c port initialization, NVRAM was flushing to disks on 0d, which did not allow enough time for disk mapping to complete on 2c. To protect data from being written to the wrong disk ID, the filer will panic with WAFL inconsistent. To avoid this problem, the filer should be brought down or taken over by the partner before loop maintenance.

The same cable maintenance was done on the partner which did not panic.
This is due to the timing of writes to disk which luckily did not occur during a disk rescan on the new port.

Impact:
Filerxxxxx is in a down state and is unable to serve data. Filer outage affecting xxxxx users.

Recovery:

WAFL_check performed on aggrxxxxx to ensure data integrity and to clear the inconsistent flag. Escalations will review the WAFL_check output from console before committing WAFL_check changes. Once WAFL_check commit is approved the changes may be committed and aggr0 status should show the aggregate online. After the aggregate is online the filer may be rebooted thereby restoring service. Once back in service issue a new autosupport from both filers and re-enable clustering.

Issue avoidance:
NetApp does not recommend performing disk loop maintenance while online.
The filer may be taken over by the partner and then booted into maintenance mode, or shut down, when performing FC maintenance tasks. Booting into maintenance mode provides visibility of the new port path and disks. If all disks can be accounted for and aggr status displays complete raid groups, the filer may be rebooted for giveback, thereby completing maintenance.
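The "all disks can be accounted for" check can be sketched in a few lines of Python, comparing a disk list saved before the move with one captured in maintenance mode afterwards. This is only an illustration: it assumes SAS-style adapter.shelf.bay names like the ones in the output earlier in the thread (FC disk names are shaped differently), and the function names and sample lists are hypothetical.

```python
from typing import Iterable, List, Tuple


def slot(disk_name: str) -> Tuple[int, int]:
    """Return (shelf, bay) for a name like '0b.10.3', ignoring the adapter."""
    _adapter, shelf, bay = disk_name.split(".")
    return int(shelf), int(bay)


def unaccounted(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """List disks seen before the move whose shelf/bay is absent afterwards.

    The adapter prefix is expected to change (e.g. 0b -> 1b), so disks are
    matched on shelf and bay only.
    """
    seen = {slot(name) for name in after}
    return sorted(name for name in before if slot(name) not in seen)


if __name__ == "__main__":
    # Hypothetical inventories: the disk in shelf 10, bay 2 did not come back.
    before = ["0b.10.0", "0b.10.1", "0b.10.2"]
    after = ["1b.10.0", "0a.10.1"]
    print(unaccounted(before, after))  # ['0b.10.2']
```

An empty list only says that every shelf/bay is visible again on some path. It does not replace checking that aggr status shows complete raid groups.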

JSHACHER11 · ‎2014-11-17

- SAS is different

- there is no "primary disk loop" - both ports are equal. ONTAP will round-robin the I/O between the paths