
Active IQ Unified Manager Discussions

WFA issue with cache updates

dcornely1

Hello,

I'm running the latest version of WFA v2.1.0.70.32 in my environment and I've come across an issue I thought wouldn't exist because I'm using certified commands.  The issue is that WFA does not appear to be aware of changes it has made before the OnCommand cache updates occur.  Here is the scenario:


Step 1)

I'm creating a new CDOT export policy, export rule, and volume.  This flow works without issue and is called ss_cdot_CreateNFSVolumeWithExport

Step 2)

Before WFA and/or OnCommand has had a chance to learn about the change from the step 1 workflow via scheduled cache updates, I attempt to run a workflow that creates a new rule in the policy created in step 1. This fails, and will continue to fail until WFA's cache is updated from OnCommand and it learns about the new policy.

This flow is called ss_cdot_CreateExportRule

All the commands in both flows are certified so I had thought that would avoid this issue.  I had originally been using a modified No-Op command in the first create flow for the reasons behind this post but even after removing that single non-certified command the problem remains.  The only thing I can think of is that I created these flows in WFA v2.0 and recently upgraded to v2.1.

I'm either missing something or have encountered a bug in WFA regarding export policies and cache awareness of them, although I'm leaning towards an error I made somewhere but haven't found it yet.  I'm attaching both flows in the hopes that they will reveal where I've tripped up.  Hopefully it's something simple, thanks in advance.

-Dave

37 Replies

ag
NetApp

Francois and Yann,

You guys have actually uncovered a limitation in WFA.

I was able to reproduce the issue of the "clone volume workflow" where you remove a volume and clone it again.

I dug deeper and here is what I found:

     1. There are two commands that create reservations here:

          Remove volume (if exists)

          Create Clone

     2. An important point here is that the volume that gets deleted and the clone being created have the same name.

     3. The workflow runs perfectly fine the first time and the reservations are created. At this point, "cache updated" is set to NO for both reservations, which is expected. Now if you go ahead and run data acquisition on DFM and cache acquisition in WFA, the reservation for "create clone" is cleared ("cache updated" is set to YES), but the reservation for "remove volume" is not cleared ("cache updated" is still NO). This is what you guys observed.

     4. Why is that happening?

There are two conflicting reservations happening here. One reservation thinks the "test_clone" volume does not exist (REMOVE VOLUME). The other reservation thinks the "test_clone" volume exists and was newly created (CREATE CLONE).

Now, when the cache acquisition happens in WFA, the congruence test for "create clone" checks whether the newly reported data contains the "test_clone" volume. It does, so the reservation is CLEARED. No problems here.

The congruence test for "remove volume" expects the "test_clone" volume to NOT be present in the cache, because it deleted the volume itself. But, surprise: it IS present, so the test concludes that OCUM has not yet reported that the volume was deleted, and "cache updated" remains NO. It will stay NO until the default expiry period of 4 hours (because OCUM will never report that the volume does not exist), which is why François was able to come back the next day and execute the workflow successfully.

     5. Coming to the weird results of the filter:

When you run a filter with "Use reservation" being checked, the filter takes the WFA cache, merges it with reservation data and provides the result.

Essentially, what is happening here is that the filter takes the "test_clone" volume from the WFA cache and applies any pending reservations (remember, the "remove volume" reservation is not cleared yet). The net effect is that the volume is removed from the result.

If you uncheck "use reservation" in the filter, the "test_clone" volume is taken from the WFA cache but the "remove volume" reservation is NOT applied, so you will see the volume in the result.

     6. How can we fix this?

This can happen to any object (not just volumes) if a workflow creates conflicting reservations.

To decide upon a fix and provide a workaround, I need to understand the use case here.

Why is this workflow being used exactly?

Are there similar workflows being used?

If not, what other workflows are being run?

How frequently will this workflow need to be run?

It will be helpful if I can get answers to those questions.
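The conflicting-reservation behaviour described above can be sketched in a few lines. This is illustrative only; the function and field names are assumptions, not WFA's actual internals:

```python
# Illustrative sketch of WFA's congruence-test conflict (assumed names,
# not real WFA code): two reservations on the same volume name, with
# opposite expectations about what the next OCUM acquisition will report.

def congruence_passed(reservation, acquired_volumes):
    """A reservation is cleared ("cache updated" = YES) once the freshly
    acquired data agrees with the change the reservation recorded."""
    name = reservation["volume"]
    if reservation["action"] == "create":
        return name in acquired_volumes       # expects the volume to exist
    if reservation["action"] == "remove":
        return name not in acquired_volumes   # expects the volume to be gone
    return False

# The workflow removed "test_clone" and then re-created it with the same name.
reservations = [
    {"action": "remove", "volume": "test_clone"},
    {"action": "create", "volume": "test_clone"},
]

# After the next acquisition the volume exists, because it was re-created.
acquired = {"test_clone", "vol_other"}

for r in reservations:
    status = "YES" if congruence_passed(r, acquired) else "NO"
    print(f'{r["action"]} {r["volume"]}: cache updated = {status}')
# The "create" reservation clears, but the "remove" reservation never can:
# the acquisition will always report the volume as present, so it stays NO
# until the reservation expires.
```

The "remove" branch is the one that gets stuck in the scenario above, which matches the 4-hour expiry behaviour Anil describes.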

Thanks,

Anil

mgoddard

Hi Dave,

Good news! I've created a custom Create Export Policy command that includes the reservations missing from the certified command. You could use it to avoid the problem in a more robust manner; it's attached below.

I tested by replacing the certified command in the first workflow (CreateNFSVolWithExport), and the second workflow now finds the policy before a polling cycle.

Hope that's useful!

Kind Regards,

- Michael.

francoisbnc

Michael,

I'm looking for a way to use reservations in my custom commands, for caching purposes.

I saw in the xml file:

INSERT INTO cm_storage.export_policy
SELECT NULL AS id,
       PolicyName AS name,
       vs.id AS vserver_id
FROM cm_storage.vserver vs
JOIN cm_storage.cluster cl
  ON (cl.primary_address = Cluster OR cl.name = Cluster)
 AND vs.cluster_id = cl.id
 AND vs.name = VserverName;
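For illustration, the effect of that reservation script can be reproduced against a simplified in-memory copy of the cache tables. The real cache is MySQL and the full schema is richer; the tables and sample values below are a guessed minimal subset:

```python
# Sketch of what the reservation SQL above effectively does, run against
# an in-memory SQLite stand-in for a few cm_storage tables (simplified,
# assumed schema; WFA's cache really lives in MySQL).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE cluster (id INTEGER PRIMARY KEY, name TEXT, primary_address TEXT);
CREATE TABLE vserver (id INTEGER PRIMARY KEY, name TEXT, cluster_id INTEGER);
CREATE TABLE export_policy (id INTEGER PRIMARY KEY, name TEXT, vserver_id INTEGER);
INSERT INTO cluster VALUES (1, 'clusterA', '10.0.0.1');
INSERT INTO vserver VALUES (7, 'svm1', 1);
""")

# Values WFA would substitute for the command's input parameters.
params = {"PolicyName": "nfs_policy", "Cluster": "clusterA", "VserverName": "svm1"}

# The reservation pre-inserts the export policy into the cache, resolving
# the vserver id from the cluster name/address and vserver name.
db.execute("""
INSERT INTO export_policy
SELECT NULL, :PolicyName, vs.id
FROM vserver vs
JOIN cluster cl
  ON (cl.primary_address = :Cluster OR cl.name = :Cluster)
 AND vs.cluster_id = cl.id
 AND vs.name = :VserverName
""", params)

print(db.execute("SELECT name, vserver_id FROM export_policy").fetchall())
# → [('nfs_policy', 7)]
```

This is why a follow-up workflow can "see" the policy before OCUM has reported it: the reservation pre-populates the row the filter will query.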

Is this the clue?

Regards,

François

adaikkap

Hi Francois,

     Please add your case to RFE 746319 so that it gets prioritized.

BTW can you provide some more details on your custom command ?

Regards

adai

francoisbnc

Hello  adai,

The custom command is similar to Clone Volume, but it clones from the last snapshot with a specific suffix name.

I use this for cloning SQL environments snapshotted with SMSQL, with a unique naming convention (-UN).

Regards,

François

abhit

Reservation is not supported in custom commands.

This feature may be available in a future release.

Michael, you have used a certified command in the workflow, which has a reservation script.


Regards

Abhi

francoisbnc

Hi abhit,

I exported my custom command to a .dar file and changed the xml to integrate the <reservationScript> section from the certified command "Clone Volume".

My problem is fixed.

Do you see anything dangerous about using it this way?

François

yannb

Great tip François.

When I did the same thing, it seemed to work until I ran an acquisition in WFA.

I don't know why yet, but looking at the reservations in the WFA web UI, I saw "Cache Updated" set to YES for an export policy that had not been refreshed in OCUM yet (it was NO before the acquisition). The volume reservation had the correct status of NO (i.e. waiting for OCUM to report it).

I might have done something wrong; I need to do some research.

francoisbnc

Hi Yann,

Where did you retrieve the <reservationScript> xml portion? As certified commands are not exportable, it was necessary to look directly in the MySQL DB.

For that, I installed a separate MySQL server, where I have root access, and restored the full WFA DB. wfa.command_definition was then accessible.

All the information was in:

SELECT reservation_script FROM wfa.command_definition
WHERE command LIKE '%clone%';

François

yannb

Well, actually it is even weirder... Qtree creation is not populated in the qtree table either, but that command is supposed to use reservations... really odd.

ag
NetApp

Yann,

To answer this specific question: reservations are not directly committed to the specific tables. In your case, newly created qtrees are not directly committed to the qtree table. Rather, they are stored in the wfa.reservation table and will be committed once the acquisition from OCUM confirms the object's presence. However, from the WFA UI, if you were to use a filter to find the newly created qtree, the filter will use this reservation data and make it appear as though the qtree were taken from the qtree table itself.
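The merge Anil describes, cache plus uncleared reservations, can be sketched as follows (a minimal illustration with assumed names, not WFA's real filter engine):

```python
# Illustrative sketch: how a filter result differs depending on whether
# pending (uncleared) reservations are replayed on top of the cached data,
# i.e. "Use reservation data" checked vs unchecked.

def filter_volumes(cache, reservations, use_reservations):
    """Return the volume names a filter would see."""
    volumes = set(cache)
    if use_reservations:
        for r in reservations:          # replay each uncleared reservation
            if r["action"] == "remove":
                volumes.discard(r["volume"])
            elif r["action"] == "create":
                volumes.add(r["volume"])
    return sorted(volumes)

cache = ["test_clone", "vol_data"]
# A stale "remove volume" reservation that was never cleared, as in the
# conflicting-reservation scenario earlier in this thread.
pending = [{"action": "remove", "volume": "test_clone"}]

print(filter_volumes(cache, pending, use_reservations=True))   # ['vol_data']
print(filter_volumes(cache, pending, use_reservations=False))  # ['test_clone', 'vol_data']
```

With the reservation replayed, the volume vanishes from the result even though it exists in the cache, which matches the behaviour François and Yann observed.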

francoisbnc

Anil,

I experience the same behavior, even with the certified commands "clone volume" and "remove volume".

My simple workflow deletes the cloned volume first, before cloning again.

In the first round:

I tested for existence with "if volume was not found: disable this command", and the remove volume step was omitted. Good.

Clone created successfully.

In the second round the cache works fine, because "remove volume" is executed and the clone works fine.


As Yann said, if I force an OCUM acquisition, "cache updated" changes to YES. When I tried to relaunch the workflow, the first step was omitted again, so the delete doesn't occur and the clone fails.

What is wrong?

ag
NetApp

Francois,

This looks a bit strange, because I find that there are reservations and congruence tests in both the remove volume and clone volume commands.

With the given description I cannot figure out much.

I will be able to help if you can attach both workflows and a backup of your WFA.

francoisbnc

Hi Anil,

Attached is the simple workflow; note my custom command is inside, but disabled.

This morning, after one night, the situation was back to normal and the workflow ran fine.

DFM view:

before clone:

root@gdc01093# dfm volume list |grep test
1761 gdc01148:/test_clone                 Flexible     64_bit     No

after clone:

root@gdc01093# dfm volume list |grep test
1764 gdc01148:/test_clone                 Flexible     64_bit     No

After the acquisition, WFA pulled the old volume definition (1761), because dfm discovery had not been launched.

Could this be the cause?

Regards,

ag
NetApp

Francois,

It could be a one-off issue. Keep a watch and if it happens again, we will raise an internal bug and address it.

Thanks,

Anil

francoisbnc

This is not a one-off; I can now reproduce it just with a datasource acquisition.

It seems the acquisition cleans the cache for the volume concerned, or changes the reference back to the old cloned volume from the previous round.

Francois

abhit

Are you using your modified commands, where you have written the reservation/congruence scripts?

It is most likely an issue with the congruence script.

The congruence script is running late or not cleaning the reservation.

Congruence cleans the reserved entry when the entry is available in DB.

Hence when you uncheck "use reservation data" you are getting the correct result.

Regards

Abhi

francoisbnc

This is happening with the certified commands "clone volume" and "remove volume" as well.

After an acquisition, the volume disappears from the query when the cache is used.

After waiting for a complete cycle of OCUM discovery and WFA acquisition, the volume comes back.

François

yannb

Just to be clear: if the "ITS Clone volume on Last Snapshot" command does not have a congruenceTest, you will have reservation issues.

Now, you said it is happening with certified commands as well? Can you provide a workflow that demonstrates the issue, using only certified commands?

francoisbnc

Hello Yann,

Indeed, I had forgotten the congruence test in my custom command, but even with only certified commands I experience the same. Is it working for you?

Attached is my example.
