Solved: vFiler command cannot be run.

GTHOMAS13 · ‎2018-07-09

Hi storage gurus,

A couple of weeks ago I issued the command "vfiler dr resync", and since then it appears to be hanging.

When I run any vfiler command I receive the following error: vfiler command cannot be run while 'vfiler dr' command is running; try again later.

I initially thought I would give it some time to complete; however, weeks later it still shows the same message.

My environment:

FAS3220

ONTAP 8.1.4P10 7-Mode

Please assist...

JGPSHNTAP · ‎2018-07-09

I've never seen it take that long. vfiler dr resync resynchronizes the SnapMirror relationships for the vfiler's volumes.

Check your SnapMirror relationships to see if any of them are hung:

snapmirror status

GTHOMAS13 · ‎2018-07-09

Thanks for the quick response.

I can confirm that there are no hanging snapmirrors; every job is transferring as per schedule.

On that note: The newly created volume was manually initialized using the snapmirror commands as the vfiler dr resync would just not work.

I'm now considering a takeover and giveback to resolve this; however, what implications would that have?

JGPSHNTAP · ‎2018-07-09

^^

That's a weird one, I've never seen a process hang like that. If you are 100% sure all vols are replicated, then the only thing you can do is a failover and giveback.

GTHOMAS13 · ‎2018-08-21

I'm still investigating this issue. NetApp are reluctant to assist as the software version is EOS.

While digging, I noticed the source vfiler contained one more volume than the destination.

BUG 543416 lists the symptoms (Misconfiguration of SnapMirror, Busy source/destination, Unavailable source volume) and solution (switch "snapmirror off")

We then created the destination volume and manually initialized the replication for the missing volume.
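
For anyone wanting to check for the same kind of mismatch, here is a minimal Python sketch. It assumes you have already copied the volume names of the source and destination vfilers into two lists by hand. The volume names and the helper `missing_volumes` are made up for illustration and are not taken from this system.

```python
def missing_volumes(source_vols, dest_vols):
    """Return volumes present on the source vfiler but absent on the destination."""
    return sorted(set(source_vols) - set(dest_vols))


# Hypothetical volume names, for illustration only.
source = ["vol_data1", "vol_data2", "vol_new"]
destination = ["vol_data1", "vol_data2"]

for name in missing_volumes(source, destination):
    print("missing on destination:", name)
```

Any name it prints would need a destination volume created and its replication initialized, as described above.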

This did not solve the issue either, so I dug a bit deeper and found this article:

https://kb.netapp.com/app/answers/answer_view/a_id/1071014/loc/en_US

Checking the processes, nothing stands out as unusual.

filer% ps -eaf
PID TT STAT TIME COMMAND
1444 con Is+ 0:00.08 login /dev/cuacons.auth (ontaplogin)
1445 sp. Ss+ 17:59.38 login /dev/cuasp.auth (ontaplogin)    <-- the only PID whose TIME keeps increasing
1446 rlm Is+ 0:00.01 login /dev/console (ontaplogin)
76746 p0 Ss 0:00.01 login [pam] (login)
76747 p0 S 0:00.01 USER=diag LOGNAME=diag HOME=/var/home/diag SHELL=/bin
76752 p0 R+ 0:00.00 USER=diag LOGNAME=diag HOME=/var/home/diag SHELL=/bin
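
A rough Python sketch for ranking the processes in a listing like the one above by accumulated CPU time. It assumes the TIME column is in minutes:seconds form, as it appears here, and the helper names `cpu_seconds` and `busiest` are my own. A single snapshot only shows accumulated time, so to see which PID is actually growing you would compare two snapshots taken some time apart.

```python
def cpu_seconds(time_field):
    """Convert a ps TIME value such as '17:59.38' (assumed minutes:seconds) to seconds."""
    minutes, seconds = time_field.split(":")
    return int(minutes) * 60 + float(seconds)


def busiest(ps_lines):
    """Return (cpu_seconds, pid, command) tuples, highest CPU time first."""
    rows = []
    for line in ps_lines:
        fields = line.split(None, 4)
        # Skip the header and anything that is not a PID row.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, _tt, _stat, time_field, command = fields
        rows.append((cpu_seconds(time_field), int(pid), command))
    return sorted(rows, reverse=True)


sample = [
    "PID TT STAT TIME COMMAND",
    "1444 con Is+ 0:00.08 login /dev/cuacons.auth (ontaplogin)",
    "1445 sp. Ss+ 17:59.38 login /dev/cuasp.auth (ontaplogin)",
    "1446 rlm Is+ 0:00.01 login /dev/console (ontaplogin)",
]

for seconds, pid, command in busiest(sample):
    print(pid, round(seconds, 2), command)
```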

GTHOMAS13 · ‎2018-10-09

++ Update

Performed a failover/giveback of the node.

Issue has been resolved.