We are using the Ansible (na_ontap_commands) Epic refresh script (eeod) and it has been failing intermittently.
Our setup consists of VMs running the Epic environments (production as the source of the refresh and non-production as the destination) as well as physical proxy servers (a destination of the refresh for backup purposes, i.e. not running an environment).
Some of the failures are timing related: we get unknown devices that disappear just after the failure, or we get /dev/sd* devices instead of /dev/mapper/... devices, especially on the physical proxies. Others are not timing related: we get unknown devices that do not go away by waiting, and we work around this by removing the volume group so that the script can be rerun.
We have already made several changes to the script (together with Christian Bauernfeind of NetApp), including added sleep commands and increased logging.
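To illustrate the kind of change we mean for the timing-related cases: instead of a fixed sleep, a step could poll until the devices have settled and give up after a timeout. The following is only a minimal sketch in Python, not what the script does today. The helper names (wait_until, pvs_has_unknown_devices) are hypothetical, and the check assumes that LVM reports a missing device with the word "unknown" in the pvs output, which may differ between LVM versions.

```python
import subprocess
import time


def pvs_has_unknown_devices():
    """Return True if `pvs` output mentions an unknown device.

    Assumption: LVM marks a missing physical volume with the word
    "unknown" in its output; adjust the match for your LVM version.
    """
    result = subprocess.run(
        ["pvs", "--noheadings", "-o", "pv_name"],
        capture_output=True,
        text=True,
        check=False,  # pvs may exit non-zero while devices are missing
    )
    return "unknown" in (result.stdout + result.stderr).lower()


def wait_until(condition, timeout=60.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A step would then call something like wait_until(lambda: not pvs_has_unknown_devices(), timeout=120) and only fail (or fall back to the volume group cleanup) when it returns False. This does not help with the non-timing cases, but it would let us tell the two kinds of failure apart in the logs.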
What are your experiences with this script, and do you have any pointers for improving its reliability?