
Linux DB service went down after a takeover operation


Hi,

 

One of my customers uses NetApp storage LUNs with various operating systems (Windows, Solaris, AIX, Linux, etc.).

Recently, NetApp recommended upgrading their ONTAP OS from 8.1.4P2 to 8.1.4P6.

So I decided to upgrade both controllers using a non-disruptive operation.

 

Controller 1 is for SAN and Controller 2 is for NAS service. At the customer's request, we dedicated each controller to a single use.

Here is the problem.

 

When Controller 2 (NAS) took over Controller 1 (SAN), the FCP service went down and came back up within a few seconds during the takeover.

At that time, I/O errors occurred on the DB server, which uses a NetApp LUN, and the DB process went down too.

More specifically, when the server detected an FC disconnection between the server and the storage LUN, the DB wrote a large dump log, which filled the server volume.

 

Long story short: FC disconnection detected -> server dumped a large log to the volume -> volume became full -> DB process went down.

NetApp support gave me a solution: change the LUN timeout value on both servers from the default (30 seconds) to 120 seconds.

 

If the LUN connection timeout was at the default (30 seconds), why was the LUN disconnected? The FCP service was down for just a few seconds.

 

[root@redhat-cn ~]# cat /sys/block/sdX/device/timeout

[root@redhat-cn ~]# echo 120 > /sys/block/sdX/device/timeout   # does this value set a health-check interval between the two devices?
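
For reference, here is a minimal sketch of how the suggested value could be applied to every sd device at once instead of one sdX at a time. The function name set_scsi_timeout and its optional second argument (an alternate sysfs root, only there so the function can be tried against a scratch directory) are my own assumptions, not something NetApp support provided. As far as I understand, this sysfs file holds the SCSI command timeout in seconds, and a value written this way does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: write the same SCSI command timeout to every sd* device.
# Usage: set_scsi_timeout SECONDS [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys; it is an assumption added for dry runs.
set_scsi_timeout() {
    seconds="$1"
    sysfs_root="${2:-/sys}"
    for f in "$sysfs_root"/block/sd*/device/timeout; do
        # Skip the unexpanded glob pattern when no sd* device exists.
        [ -e "$f" ] || continue
        old=$(cat "$f")
        echo "$seconds" > "$f"
        # Report the change so the previous value can be restored if needed.
        echo "$f: $old -> $seconds"
    done
}
```

Running `set_scsi_timeout 120` as root would change every sd device, including local disks that are not NetApp LUNs, so the glob may need narrowing on a real server.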

 

I have no idea what is causing this issue.

 

Please give me some help.

 

Thanks,