ONTAP Discussions

How can I physically move filers that are mirrored?

ANTON_MSSM

Hello, I have a question about how to move filers that are mirrored.

I have filer 1 and filer 2, which are scheduled (in 1-3 months) to be moved to a different data center.

The aggregates on the filers are mirrored with SyncMirror (plexes). From an aggr status -r I see that some aggregates are mirrored and some are not:

netapp-a> aggr status -r

Aggregate aggr0 (online, raid4, mirrored) (block checksums)

  Plex /aggr0/plex0 (online, normal, active, pool0)
    RAID group /aggr0/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      parity    5a.16   5a    1   0   FC:A   0  FCAL 10000 272000/557056000  280104/573653840
      data      5a.32   5a    2   0   FC:A   0  FCAL 10000 272000/557056000  280104/573653840

  Plex /aggr0/plex2 (online, normal, active, pool1)
    RAID group /aggr0/plex2/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      parity    6a.32   6a    2   0   FC:A   1  FCAL 10000 272000/557056000  274845/562884296
      data      6a.16   6a    1   0   FC:A   1  FCAL 10000 272000/557056000  280104/573653840

Aggregate aggr2_SATA (online, raid_dp, mirrored) (block checksums)

  Plex /aggr2_SATA/plex0 (online, normal, active, pool0)
    RAID group /aggr2_SATA/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   5b.19   5b    1   3   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      parity    5b.32   5b    2   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.48   5b    3   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.20   5b    1   4   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.33   5b    2   1   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.49   5b    3   1   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.21   5b    1   5   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.34   5b    2   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.50   5b    3   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.22   5b    1   6   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.35   5b    2   3   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.51   5b    3   3   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.23   5b    1   7   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      5b.36   5b    2   4   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

  Plex /aggr2_SATA/plex1 (online, normal, active, pool1)
    RAID group /aggr2_SATA/plex1/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   6b.16   6b    1   0   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      parity    6b.32   6b    2   0   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.48   6b    3   0   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.17   6b    1   1   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.33   6b    2   1   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.49   6b    3   1   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.18   6b    1   2   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.34   6b    2   2   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.50   6b    3   2   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.19   6b    1   3   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.35   6b    2   3   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.51   6b    3   3   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.20   6b    1   4   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304
      data      6b.36   6b    2   4   FC:A   1  ATA   7200 847555/1735794176 847827/1736350304

What would be the process of moving them to the new data center? Would I break the aggregate mirror and first move one filer with its disks, followed by the other after some time? Can I break the SyncMirror, or is that not possible?

What are my options? Thank you.

billshaffer

Anton:

What exactly are you trying to do?  Are you trying to move without downtime?  Are these two systems in a cluster relationship?

The aggr status output you posted shows two mirrored aggregates in ONE system.  Are you proposing breaking the mirror (aggr split), and assigning the split disks to the other head in the new datacenter?

Bill
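
For reference, breaking and later rejoining a mirrored aggregate in 7-Mode looks roughly like this (a minimal sketch using aggr0 from the output above; the name aggr0_split is hypothetical, and reassigning the split half's disks to the other head would be a further step with disk assign):

  netapp-a> aggr split aggr0/plex2 aggr0_split   (plex2 becomes an independent, unmirrored aggregate)
  netapp-a> aggr status -r                       (aggr0 and aggr0_split should both now show as unmirrored)
  netapp-a> aggr offline aggr0_split             (later, to rejoin, the victim aggregate must be offline)
  netapp-a> aggr mirror aggr0 -v aggr0_split     (re-mirrors aggr0 onto aggr0_split's disks; its data is discarded)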

ANTON_MSSM

Bill, the filers are in a cluster relationship. I am trying to minimize downtime as much as I can so that management will agree to this plan. The aggregates are mirrored on both heads; can I break the mirror by splitting to the other head (head B) and move head A to the new data center?

aborzenkov

Of course you can move a head to another datacenter, but the head will probably not be very useful without its disks. If you want to move the head and the disks, say so.

ANTON_MSSM

Sorry, with disks of course.

aborzenkov

Well … the first problem is that NetApp does not officially support hot shelf unplug.

The second problem is that even if you manage to move one head and half of the shelves while the second head stays in takeover mode, you have no way to perform a giveback afterwards. So you will face downtime while you move the second head from one datacenter to the other - exactly the same downtime as moving both heads at once. You do not save any downtime.

Based on the information you have provided so far, I do not see any sense in doing this even if it were possible.

ANTON_MSSM

So I just want to clarify... if I do a takeover on one node, then shut down the second node and move it with its disks to the new data center, I will not be able to perform a giveback once it has been moved?

Then what are my options? Shut down both nodes and bring them back up after the move is complete?

Or break the aggregate mirrors and then recreate them once the second node is moved?

billshaffer

Cluster relationships don't work across datacenters like that, so no, you wouldn't be able to perform a giveback after moving one head - plus, as aborzenkov points out, you cannot "hot-unplug" the shelves, so you would incur downtime to isolate disk shelves.

If you can split the aggr mirrors to discrete shelves, you could move one half over, swap the services, then move the other half over, as I described below (still incurs downtime, plus data consistency issues).

Best bet, if you can get the downtime, is to move everything at once.

Bill

ANTON_MSSM

OK, so both filers need to be down for the duration of the move.

We thought about splitting up the mirror at first, but then found more reasons to keep it, to say the least.

Would this be as simple as stopping services on both and powering them down, then bringing them up together at once, or one at a time?

billshaffer

Correct:

  cf disable (just to keep them from trying to take over)

  halt each head

  power everything off

  LABEL CABLES!!!

  move everything

  recable

  power on disk shelves, wait for initialization

  power on heads (can be done at the same time, or one at a time)

  verify everything

  cf enable
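
At the console that sequence maps to roughly the following (a minimal sketch; the verification after power-on is the important part):

  netapp-a> cf disable        (disables takeover for both heads)
  netapp-a> halt              (repeat on the partner, then power everything off and move it)
  ...recable, power shelves on, boot the heads...
  netapp-a> aggr status -r    (every aggregate online, every plex "normal", no rebuilds running)
  netapp-a> cf status         (both heads see each other, takeover still disabled)
  netapp-a> cf enable         (re-enable failover once everything checks out)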

ANTON_MSSM

The funny thing in all of this is that we have two vendors giving two different opinions on how to do this. One says a cf takeover should be fine, and the other tells us to simply break the plex mirrors.

billshaffer

Vendors can sometimes be scary...

The guy saying just do a cf takeover is up in the night, or doesn't understand what you're trying to do.

Breaking the plexes IS an option, but (as I've said) it still incurs downtime and consistency issues, and you must be able to split the aggrs onto discrete shelves.

Good luck!

Bill

aborzenkov

If you are adventurous, you can let your vendors do it. Make sure that you have a backup, and that your vendors will be responsible for compensating any loss in case of downtime. Maybe they know some secret tricks.

billshaffer

IF you can get one plex of each mirror onto the same set of shelves - so shelves a-m contain all the "1st" plexes and shelves n-z contain all the "2nd" plexes - you could, theoretically:

-split the mirrors

-assign all the 2nd plex drives (now an aggr) to one head, and all the 1st plexes to the other - aggr downtime, but it is the copy, so no real downtime

-do a "quick" shutdown to recable from a clustered system to two non-clustered systems - one head with shelves a-m, the other with shelves n-z

-bring up the system that is not moving - downtime is over

-move the other system (with shelves n-z) to the new data center

-bring up the remote system

-repoint servers and applications

-shut down the local system, move it to the new datacenter

-take another "quick" downtime to recable to a clustered system

-rebuild the mirrors

One of the many problems with this is that your data at the remote site will be inconsistent.

If you have snapmirror licenses (or can get a temp), you could do as above - split the mirrors, isolate disk shelves, recable, move one head and shelves - then snapmirror the data over, repoint systems, move the other head, recable.
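
If you take the SnapMirror route, the per-volume baseline and cutover would look roughly like this (a minimal sketch; the destination head name netapp-b and the volume names are made up, and the SnapMirror licenses plus snapmirror.access setup are assumed):

  netapp-b> vol restrict vol1_mirror                                   (a volume SnapMirror destination must be restricted)
  netapp-b> snapmirror initialize -S netapp-a:vol1 netapp-b:vol1_mirror
  netapp-b> snapmirror status                                          (wait for the baseline to finish and go Idle)
  netapp-b> snapmirror update -S netapp-a:vol1 netapp-b:vol1_mirror    (final incremental just before cutover)
  netapp-b> snapmirror break netapp-b:vol1_mirror                      (destination becomes writable; then repoint clients)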

There is some more detail in there but you get the point.

If you can get a weekend maintenance window I would suggest taking the downtime and moving everything.  Better yet, use the move as a reason to do a hardware refresh...

Hope that helps

Bill

billshaffer

I forgot you mentioned you were active/active. So in addition to juggling the shelves, you'd need to juggle the aggregates to get the primary plex onto the 1st set of shelves... It may not even be doable, depending on your environment.

thomas_glodde

Hi Anton,

If you have a 100% mirrored stretch MetroCluster, you can take over to node 1, then move node 2 along with node 2's pool 0 shelves and node 1's pool 1 shelves to the new datacenter. Hot-unplugging shelves is supported under the assumption that the same type of shelf reappears on the exact same HBA; you just need to offline the plex and offline the adapter (see the shelf guide under hot shelf replacement).

When node 2 is up again you need to resync the aggregates, and then you can give back (be sure the resync is 100% complete before giveback!). Then you can take over to node 2 and move node 1, along with node 1's pool 0 shelves and node 2's pool 1 shelves, to the new location.
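
In command terms, the "offline the plex and offline the adapter" part for the pool 1 shelves in the output above would look roughly like this (a minimal sketch; it assumes adapters 6a and 6b connect only the shelves being moved):

  netapp-a> aggr offline aggr0/plex2         (the pool 1 plex of each mirrored aggregate)
  netapp-a> aggr offline aggr2_SATA/plex1
  netapp-a> storage disable adapter 6a       (quiesce the loops before unplugging the shelves)
  netapp-a> storage disable adapter 6b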

I (a storage consultant at a NetApp partner) have done this procedure on a few occasions. As I said, it must be a fully mirrored MetroCluster, and you need to be 100% sure to label everything properly so the right cables go back on the right adapters. After moving both nodes I suggest doing yet another takeover/giveback to be on the safe side and to confirm everything works fine.

If you deem this procedure too risky, your best bet would be to prepare power, Fibre Channel and network cables as well as rack mount kits at the new location beforehand to save time. Then shut down the complete system, move it over, recable, and boot it up again.

As you have never done either procedure before, I suggest you have an experienced NetApp consultant onsite to support you if anything goes wrong.

Kind regards

Thomas

ANTON_MSSM

Thanks, Thomas.

Our NetApp rep has sent me TR-3548 and said page 50 documents my scenario.

They advised us to do a cf forcetakeover -d.

As per the TR:

13.2 Split-Brain Scenario

The cf forcetakeover -d command previously described allows the surviving site to take over the failed site’s responsibilities without a quorum of disks available at the failed site (normally required).

Once the problem at the failed site is resolved, the administrator must restrict booting of the previously failed node. If access is not restricted, a split-brain scenario might occur. This is the result of the controller at the failed site coming back up and not knowing that there is a takeover situation. It begins servicing data requests while the remote site also continues to serve requests. The result is the possibility of data corruption.

We don't want to shut down both nodes, but if we must, we will.

thomas_glodde

Anton,

I would NOT recommend going for a cf forcetakeover -d, as it is not needed at all. Forcetakeover -d is only used in case of a real disaster, when you more or less expect the partner not to come back anytime soon.

cf takeover on node1 so it takes over node2

power off node 2

offline pool 0 from node 2

power off pool 0 from node 2

offline pool 1 from node 1

power off pool 1 from node 1

label cables

move equipment

power on pool 0 from node 2

resync pool 0 from node 2

power on pool 1 from node 1

resync pool 1 from node 1

power up node 2

cf giveback on node 1 to give back node 2
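
In command terms, the resync and giveback at the end would look roughly like this (a minimal sketch for node 1's own pool 1 plexes, using the aggregate names from the output above; node 2's pool 0 plexes would be handled the same way from the partner context while node 1 is still in takeover):

  netapp-a> aggr online aggr0/plex2          (onlining an offline plex starts its resync automatically)
  netapp-a> aggr online aggr2_SATA/plex1
  netapp-a> aggr status -r                   (repeat until every plex shows "normal" again, i.e. fully resynced)
  netapp-a> cf giveback                      (only after the resync has completed)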

ANTON_MSSM

Thomas,

But I am only moving one node to the new location, not two - you say to power off pool 1 from node 1?

thomas_glodde

Yes, because you move the mirrored disks of node 1 to the new location as well, to keep the cluster properly spread.
