Problem with ALUA & Linux

Hi to all,

I have a problem with ALUA on a Linux box. We run RHEL 5.9 against FAS6240 storage: two FAS systems at different sites, in a MetroCluster configuration. Some of the Linux servers have performance problems. We tried a lot of things, but nothing helped, so we are now investigating multipathing. We discovered that some Linux machines are using the controller at the other site. For example: the Linux server and one FAS are at site 1, and the other FAS is at site 2. I create a LUN for the server (in a non-mirrored aggregate) and attach it at site 1, but the majority of the traffic for this server goes through the controller at site 2. The initiator group is defined with ALUA enabled, and the zoning is OK.

Does anyone have any idea what's wrong?

My multipath.conf:

# These are the compiled in default settings.  They will be used unless you
# overwrite these values in your config file.

defaults {
#       udev_dir                /dev
        polling_interval        5

        max_fds                 max

        queue_without_daemon    no
        path_checker            tur
        rr_min_io               100

        failback                mmediate
        no_path_retry           5
        user_friendly_names     no
}

devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN.*"

                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                features                "3 queue_if_no_path pg_init_retries 50"
                hardware_handler        "1 alua"
                # prio_callout          "/sbin/mpath_prio_alua -d/tmp %d"
                path_selector           "round-robin 0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_weight               uniform
                rr_min_io               128
                path_checker            tur
                flush_on_last_del       yes

                prio                    "alua"
        }
}
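
For anyone chasing the same symptom, one thing worth checking is whether the host sees different priorities for the two controllers at all. With path_grouping_policy group_by_prio, paths are grouped by priority; if every path reports the same priority, they all land in one group and round-robin spreads I/O over both controllers. The sketch below is a minimal helper, not an official tool: the function name distinct_prios is made up for illustration, and it assumes that multipath -ll prints a prio=<number> field on each path-group line.

```shell
#!/bin/sh
# distinct_prios (hypothetical helper): read `multipath -ll` output on stdin
# and print each distinct path-group priority together with the number of
# path groups that carry it.
# Assumption: every path-group line contains a "prio=<number>" field.
distinct_prios() {
    grep -oE 'prio=[0-9]+' | sort | uniq -c
}

# Typical use on a live host (needs root and a running multipath setup):
#   multipath -ll | distinct_prios
```

If only one priority value shows up for a LUN that is reachable through both controllers, the ALUA prioritizer is probably not being applied, and the prio / prio_callout setting is the first place to look.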

Re: Problem with ALUA & Linux

Robi,

I noticed one thing that may or may not be contributing, but at any rate should be fixed:

In your defaults section:

# These are the compiled in default settings.  They will be used unless you
# overwrite these values in your config file.

defaults {
#       udev_dir                /dev
        polling_interval        5

        max_fds                 max

        queue_without_daemon    no
        path_checker            tur
        rr_min_io               100

        failback                mmediate
        no_path_retry           5
        user_friendly_names     no

You should fix the typo and change mmediate to immediate.
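
A typo like this is easy to miss by eye, so here is a minimal sketch of a check for it. It assumes the accepted failback values are immediate, manual, followover (newer multipath-tools only) or a whole number of seconds, as described in the multipath.conf man page; the function name check_failback is made up for illustration. How multipathd itself treats an unrecognised value may depend on the version, so this only points at lines worth a second look.

```shell
#!/bin/sh
# check_failback FILE (hypothetical helper): print every active failback line
# whose value is not "immediate", "manual", "followover" or a whole number.
# Commented-out lines are skipped because their first field is not "failback".
check_failback() {
    awk '
        $1 == "failback" {
            v = $2
            gsub(/"/, "", v)          # tolerate quoted values
            if (v !~ /^(immediate|manual|followover|[0-9]+)$/)
                printf "%s:%d: suspicious failback value: %s\n", FILENAME, FNR, v
        }
    ' "$1"
}

# Typical use:
#   check_failback /etc/multipath.conf
```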

Other than that, I would recommend opening a case with support. They will be able to inspect the configuration more closely and validate the full LUN and SAN config. This might also need a closer look by Red Hat if RHEL is sticking to non-optimized paths.

Thanks,

Jonathan

Re: Problem with ALUA & Linux

Hi Jonathan,

Yes, it was my mistake. In the meantime I changed the conf (with the right syntax and some small changes) on one production server, and it seems to be working without problems. Tomorrow I'll test a controller failure on a test server (by deactivating the active path on the SAN switches) and we will see. I hope for the best.

Anyway, I'll publish the result and also the new conf file.

Thanks.

Best regards,

Robi Hrvatin

Re: Problem with ALUA & Linux

Hi,


I have some good news, but...


I prepared a new multipath.conf, which was confirmed by NetApp support:

defaults {
        max_fds                 max
        pg_prio_calc            avg
        queue_without_daemon    no
        user_friendly_names     no
        flush_on_last_del       yes
}
#}

# All data under blacklist must be specific to your system.
blacklist {
        devnode "^hd[a-z]"
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^cciss.*"
}
#
devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN"
#                product                 "LUN.*"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                features                "3 queue_if_no_path pg_init_retries 50"
                hardware_handler        "1 alua"
                prio_callout            "/sbin/mpath_prio_alua /dev/%n"
                path_selector           "round-robin 0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_weight               uniform
                rr_min_io               128
                path_checker            tur
#               flush_on_last_del       yes
        }
}


And it's working. I tested different scenarios and it works fine. I don't know why, but when I changed multipath.conf and tried to force the system onto the primary path, the file system became read-only. After the server was restarted, everything worked OK. The OS is Red Hat Enterprise Linux Server release 5.8 (Tikanga) with kernel 2.6.18-308.8.2.el5, 64-bit.

The same configuration on Red Hat Enterprise Linux Server release 5.9 (Tikanga) with kernel 2.6.18-348.4.1.el5, 64-bit, did not behave the same way. On that system I noticed that traffic is balanced almost 50:50 between the two controllers, and performance is also worse (by almost 20-25 %).

I also noticed that two parameters do not work on the 5.9 system: pg_prio_calc and prio_callout.
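
To see quickly whether a given multipath.conf still relies on those two keywords, something like the following can be used. It is only a sketch: the function name legacy_keywords is made up for illustration, and it checks just the two keywords named above. As far as I know, upstream multipath-tools 0.4.9 selects the prioritizer with prio (for example prio "alua") rather than prio_callout, but verify that against the man page of the package actually installed.

```shell
#!/bin/sh
# legacy_keywords FILE (hypothetical helper): list active (uncommented) lines,
# with line numbers, that use prio_callout or pg_prio_calc.
# Lines starting with "#" are not matched, and plain "prio" is not matched.
legacy_keywords() {
    grep -nE '^[[:space:]]*(prio_callout|pg_prio_calc)[[:space:]]' "$1"
}

# Typical use:
#   legacy_keywords /etc/multipath.conf
```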

We are investigating this issue.


BR

Robi

Re: Problem with ALUA & Linux

Hi,

Another difference:

rpm -qa | grep multip

a) on RHEL 5.8:

        device-mapper-multipath-0.4.7-48.el5_8.1

b) on RHEL 5.9:

        device-mapper-multipath-libs-0.4.9-56.0.3.el5
        device-mapper-multipath-0.4.9-56.0.3.el5

BR

Robi

Re: Problem with ALUA & Linux

Hi,

I have a solution.

The latest multipath package for RHEL 5.9 is device-mapper-multipath-0.4.7-54.el5_9.2 (with its dependency kpartx-0.4.7-54.el5_9.2). But we have one Linux server with an Oracle database installed. During that installation you must update the system (and Oracle, of course) with yum from the Oracle repository, and that is where the package device-mapper-multipath-0.4.9-56.0.3.el5 came from. I assume this package is meant for Oracle Linux (which is Red Hat with small modifications), because I found the rpm only in connection with Oracle.

When I uninstalled this package and kpartx (with the --nodeps option; the other dependent packages are fine) and installed the rpm from the Red Hat repository (device-mapper-multipath-0.4.7-54.el5_9.2), everything worked well.
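
To spot the same situation on other servers, a small filter over the installed package list can help. This is a rough sketch based only on the versions mentioned in this thread: it treats 0.4.7 builds of device-mapper-multipath and kpartx as the expected Red Hat ones on RHEL 5 and flags anything else for a manual look. The function name flag_multipath_pkgs is made up for illustration.

```shell
#!/bin/sh
# flag_multipath_pkgs (hypothetical helper): read "name-version-release" lines
# (as printed by `rpm -qa`) on stdin and mark multipath-related packages.
# Assumption from this thread: on RHEL 5 the Red Hat builds are version 0.4.7.
flag_multipath_pkgs() {
    grep -E '^(device-mapper-multipath|kpartx)' | while read -r pkg; do
        case "$pkg" in
            *-0.4.7-*) echo "ok     $pkg" ;;
            *)         echo "CHECK  $pkg" ;;
        esac
    done
}

# Typical use on a live system:
#   rpm -qa | flag_multipath_pkgs
# To see which vendor built the installed packages:
#   rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}  %{VENDOR}\n' device-mapper-multipath kpartx
```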

So be careful when you combine MPIO, Linux and Oracle on the same system.

BR

Robi