ONTAP Hardware

Problem with ALUA & Linux

robinetapp
16,947 Views

Hi to all,

I have a problem with ALUA and a Linux box. We have RHEL 5.9 servers and FAS6240 storage: two FAS systems on different sites, in a MetroCluster configuration. On some Linux machines we have performance problems. We tried a lot of things, but nothing helped. Currently we are investigating multipathing. We discovered that some Linux machines are using the controller on the other site. So, let's say I have a Linux server and one FAS on site 1, and the other FAS on site 2. I created a LUN for the server (in a non-mirrored aggregate) and attached it on site 1, but the majority of traffic for this server goes through the controller on site 2. The initiator group is defined with ALUA. Zoning is OK.

Does anyone have any idea what's wrong?
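To see which controller actually carries the I/O, you can look at which path group `multipath -ll` marks as active. A sketch against a hypothetical sample of 0.4.7-style output (the WWID, SCSI addresses, and device names are invented for illustration): with ALUA, the optimized paths normally get the higher priority, so an `[active]` group with the *lower* prio means I/O is going over the non-optimized (remote-controller) paths.

```shell
#!/bin/sh
# Hypothetical 'multipath -ll' output in the RHEL 5 (multipath-tools 0.4.7)
# format -- all identifiers below are made up for illustration.
sample='360a98000572d4461 dm-2 NETAPP,LUN
[size=100G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua]
\_ round-robin 0 [prio=50][enabled]
 \_ 2:0:0:0 sdb 8:16  [active][ready]
\_ round-robin 0 [prio=10][active]
 \_ 2:0:1:0 sdc 8:32  [active][ready]'

# Isolate the path-group line (contains 'round-robin') that is marked
# [active], then extract its priority.
active_prio=$(printf '%s\n' "$sample" \
    | grep '\[active\]' | grep 'round-robin' \
    | sed 's/.*prio=\([0-9]*\).*/\1/')
echo "active path group prio: $active_prio"
```

In this sample the active group has prio=10, i.e. the lower-priority (non-optimized) paths are serving I/O, which matches the symptom described above.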

My multipath.conf:

# These are the compiled in default settings.  They will be used unless you
# overwrite these values in your config file.

defaults {
#       udev_dir                /dev
        polling_interval        5

        max_fds                 max

        queue_without_daemon    no
        path_checker            tur
        rr_min_io               100

        failback                mmediate
        no_path_retry           5
        user_friendly_names     no
}

devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN.*"

                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                features                "3 queue_if_no_path pg_init_retries 50"
                hardware_handler        "1 alua"
                # prio_callout          "/sbin/mpath_prio_alua -d/tmp %d"
                path_selector           "round-robin 0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_weight               uniform
                rr_min_io               128
                path_checker            tur
                flush_on_last_del       yes

                prio                    "alua"
        }
}


5 REPLIES

jbell

Robi,

I noticed one thing that may or may not be contributing, but at any rate should be fixed:

In your default section-

# These are the compiled in default settings.  They will be used unless you
# overwrite these values in your config file.

defaults {
#       udev_dir                /dev
        polling_interval        5

        max_fds                 max

        queue_without_daemon    no
        path_checker            tur
        rr_min_io               100

        failback                mmediate
        no_path_retry           5
        user_friendly_names     no

You should fix the typo and make mmediate immediate.
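A misspelled value like that is easy to catch with a quick script before multipathd silently falls back to its default. A minimal sketch, checking a sample line that reproduces the typo (in the 0.4.7 tools, valid `failback` values are `immediate`, `manual`, or a number of seconds):

```shell
#!/bin/sh
# Flag a 'failback' value that multipathd would not recognize.
# The sample line below reproduces the typo from the config above.
line='        failback                mmediate'
value=$(printf '%s\n' "$line" | awk '$1 == "failback" {print $2}')
case "$value" in
    immediate|manual|[0-9]*) result="ok" ;;
    *) result="bad value" ;;
esac
echo "failback=$value: $result"
```

Running the same check against a real /etc/multipath.conf (feeding each line through the awk filter) would flag `mmediate` as unrecognized.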

Other than that, I would recommend opening a case with support. They will be able to inspect the configuration more closely and validate all LUN and SAN config. This might also require a closer look by Red Hat if RHEL is sticking to non-optimized paths.

Thanks,

Jonathan

robinetapp

Hi Jonathan,

Yes, it was my mistake. In the meantime I changed the conf (with the right syntax and some small changes) on one production server, and it seems to be working without problems. Tomorrow I'll test a controller failure on a test server (by deactivating the active path on the SAN switches) and … we will see. I hope for the best.

Anyway, I'll publish the result and also the new conf file.

Thanks.

Best regards,

Robi Hrvatin

robinetapp

Hi,


I have some good news, but...


I prepared a new multipath.conf, which was confirmed by NetApp support:

defaults {
        max_fds                 max
        pg_prio_calc            avg
        queue_without_daemon    no
        user_friendly_names     no
        flush_on_last_del       yes
}

# All data under blacklist must be specific to your system.
blacklist {
        devnode "^hd[a-z]"
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^cciss.*"
}

devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN"
#               product                 "LUN.*"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                features                "3 queue_if_no_path pg_init_retries 50"
                hardware_handler        "1 alua"
                prio_callout            "/sbin/mpath_prio_alua /dev/%n"
                path_selector           "round-robin 0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_weight               uniform
                rr_min_io               128
                path_checker            tur
#               flush_on_last_del       yes
        }
}


And it's working. I tested different scenarios, and it works fine. I don't know why, but when I changed multipath.conf and wanted to force the system to use the primary path, the file system became read-only. After the server was restarted, everything worked OK. The OS is Red Hat Enterprise Linux Server release 5.8 (Tikanga) with kernel 2.6.18-308.8.2.el5, 64-bit. The same configuration on Red Hat Enterprise Linux Server release 5.9 (Tikanga) with kernel 2.6.18-348.4.1.el5, 64-bit, did not behave the same way. I noticed that on the latter system, traffic is balanced between the two controllers almost 50:50. Along with this, performance is also worse (by almost 20-25 %).

I also noticed that two parameters are not good for 5.9: pg_prio_calc and prio_callout.
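For what it's worth, the 0.4.9 generation of the multipath tools dropped both of those options: pg_prio_calc no longer exists, and the external prio_callout was replaced by the built-in `prio` keyword (which the first conf in this thread already used). A 0.4.9-style device stanza might look like the sketch below; this is based on the configs quoted in this thread and is not vendor-validated:

```conf
device {
        vendor                  "NETAPP"
        product                 "LUN"
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        features                "3 queue_if_no_path pg_init_retries 50"
        hardware_handler        "1 alua"
        # 'prio' replaces the old prio_callout; pg_prio_calc is gone
        prio                    "alua"
        path_selector           "round-robin 0"
        path_grouping_policy    group_by_prio
        failback                immediate
        rr_weight               uniform
        rr_min_io               128
        path_checker            tur
}
```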

We are investigating this issue.


BR

Robi

robinetapp

Hi,

another difference:

rpm -qa | grep multip

a) on RHEL 5.8:

            device-mapper-multipath-0.4.7-48.el5_8.1

b) on RHEL 5.9:

           device-mapper-multipath-libs-0.4.9-56.0.3.el5

           device-mapper-multipath-0.4.9-56.0.3.el5
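The release string makes the two builds easy to tell apart from a script. A sketch using the exact package strings quoted above; the classification rule is an assumption based on this thread (Red Hat's RHEL 5 package line is 0.4.7-based, while the 0.4.9-56.0.3.el5 build came in from the Oracle repository), not an official mapping:

```shell
#!/bin/sh
# Classify a device-mapper-multipath package by its name-version string,
# e.g. the output of 'rpm -q device-mapper-multipath'.
classify() {
    case "$1" in
        device-mapper-multipath-0.4.7-*el5*)     echo "Red Hat RHEL 5 build" ;;
        device-mapper-multipath-0.4.9-56.0.3.el5) echo "Oracle repo build" ;;
        *)                                        echo "unknown origin" ;;
    esac
}

classify "device-mapper-multipath-0.4.7-48.el5_8.1"
classify "device-mapper-multipath-0.4.9-56.0.3.el5"
```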

BR

Robi

robinetapp

Hi,

I have a solution.

The latest multipath driver for RHEL 5.9 is device-mapper-multipath-0.4.7-54.el5_9.2 (with its dependency rpm kpartx-0.4.7-54.el5_9.2). But we have one Linux server where an Oracle database was installed. During that installation you must update the system (and Oracle, of course) with yum from the Oracle repository, and from there the multipath driver device-mapper-multipath-0.4.9-56.0.3.el5 was installed. I imagine this driver is intended for Oracle Linux (which is Red Hat with small modifications), because I found the rpm package only in conjunction with Oracle.

When I uninstalled this package and kpartx (with the --nodeps option; the other dependent packages are OK) and installed the rpm from the Red Hat repository (device-mapper-multipath-0.4.7-54.el5_9.2), everything worked well.

So: be careful when you use MPIO, Linux, and Oracle on the same system.

BR

Robi
