ONTAP Discussions

NDMP backups on cDOT 8.2.2P1 failing with "Open filer Read Error"

widelinks

We have a situation where the backup team reported that, for volumes with a large used capacity, backups take multiple re-attempts and sometimes fail with the error "open filer read error" and exit code 12.

 

Environment: Clustered Data ONTAP 8.2.2P1 with 4 nodes (2 nodes FAS3250, 2 nodes FAS8040)

NetBackup version: 7.5 (1 master and 2 media servers)

NDMP mode: LAN-based backups

Volumes: all thin-provisioned volumes, exported via CIFS and NFS

Aggregate space: has reached 90% used

 

As part of our initial verification, we observed the following:

 

1) The storage node had idle NDMP sessions that were not cleared automatically.

   After verification we found that many parallel sessions were active and many more were queued; the job schedules were not configured in a balanced way.

   We suggested changes to the backup policies, and since then we are no longer seeing those backup failures.

   Also, regarding the idle sessions, NetApp has a known bug that leaves stale NDMP sessions uncleared.
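For reference, this is roughly how we check for and clear stale sessions from the clustershell (the node name and session ID below are placeholders, not from our environment):

```
cluster::> system services ndmp status -node node1
cluster::> system services ndmp kill 1234 -node node1
cluster::> system services ndmp killall -node node1
```

Note that killall terminates every NDMP session on the node, including active backups, so we only use it in a maintenance window.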

 

2) We still see that some volumes with more than 15 TB of used capacity take multiple re-attempts before the backup finally completes.

    Steps followed: Verified the available free space and identified that some of the volumes are highly utilized and did not have sufficient space for a snapshot.

    We are working with the respective stakeholders on action items.
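For reference, this is roughly how we check volume free space and snapshot reserve from the clustershell (the SVM and volume names are placeholders):

```
cluster::> volume show -vserver svm1 -volume vol_big -fields used,available,percent-used,percent-snapshot-space
cluster::> volume snapshot show -vserver svm1 -volume vol_big
```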

 

  3) Checked with the vendor about aggregate utilization; according to him we can go up to 95%, after which it will lead to performance issues. Until then, NDMP backups will consider the available space and inodes at the volume level.
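Aggregate headroom can be checked the same way (output fields shown are a sketch; aggregate names will differ per cluster):

```
cluster::> storage aggregate show -fields percent-used,availsize
```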

 

   Current status: 1) There are 2 volumes, with 15 TB and 57 TB of used capacity and sufficient free space, on which we are still seeing the re-attempts.

                   2) The NDMP log shows the messages below:

 

a) 00000033.03de2a6c 1693d35a Thu Dec 29 2016 14:50:11 +10:00 [kern_ndmpd:info:16996] A [src/rdb/SLM.cc 2451 (0x8090061c0)]: SLM 1000: current _sitelist is now: SLV <<10,122003>, b2f8553f-3ee6-11e4-b0fb-123478563412>.
00000033.03de2a6d 1693d35a Thu Dec 29 2016 14:50:11 +10:00 [kern_ndmpd:info:16996] [76539]  DEBUG: get on userprofile_all iterator failed with error entry doesn't exist
00000033.03de2a6e 1693d35a Thu Dec 29 2016 14:50:11 +10:00 [kern_ndmpd:info:16996] [76539]  ERROR: Password for user 'root' cannot be fetched locally

 

b) 00000033.03de2b70 1693d712 Thu Dec 29 2016 14:51:52 +10:00 [kern_ndmpd:info:16996] [76218]   INFO: buffer blocked for ndmp_io_process (fd=10) count total 736128180 (37879314)
00000033.03de2e44 1693e4da Thu Dec 29 2016 14:57:44 +10:00 [kern_ndmpd:info:16996] [76218]   INFO: buffer full for ndmp_io_process (fd=10) count total 1014896584 (5)
00000033.03de2e45 1693e5b5 Thu Dec 29 2016 14:58:06 +10:00 [kern_ndmpd:info:16996] [76218]   INFO: buffer full for ndmp_io_process (fd=10) count total 1021764620 (5)
00000033.03de2e46 1693e67d Thu Dec 29 2016 14:58:26 +10:00 [kern_ndmpd:info:16996] [76218]   INFO: buffer full for ndmp_io_process (fd=10) count total 1028395244 (5)
00000033.03de2e48 1693e764 Thu Dec 29 2016 14:58:49 +10:00 [kern_ndmpd:info:16996] [76218]   INFO: buffer full for ndmp_io_process (fd=10) count total 1035235636 (5) (-0:00:01)
00000033.03de2e4a 1693e848 Thu Dec 29 2016 14:59:12 +10:00 [kern_ndmpd:info:16996] [76218]   INFO: buffer full for ndmp_io_process (fd=10) count total 1042425468 (5)
00000033.03de2e4b 1693e930 Thu Dec 29 2016 14:59:35 +10:00 [kern_ndmpd:info:16996] [76218]   INFO: buffer full for ndmp_io_process (fd=10) count total 1049218720 (5)
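The repeated "buffer full" messages suggest the NDMP mover buffer is filling faster than the media server drains it over the LAN. A small sketch that tallies how often the stream stalls per minute (assuming the log excerpt is saved to a file named ndmp.log; the filename is illustrative):

```shell
# Tally "buffer full" events per minute from an NDMP log excerpt.
# Field $7 of each log line is the HH:MM:SS timestamp.
awk '/buffer full for ndmp_io_process/ {
       split($7, t, ":")
       count[t[1] ":" t[2]]++
     }
     END { for (m in count) print m, count[m] }' ndmp.log | sort
```

If the buffer-full events cluster tightly like this, the bottleneck is likely on the receiving side (media server or network), which would fit the multi-terabyte volumes needing re-attempts.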

 

 

We kindly need your support to understand what is causing these errors and the repeated re-attempts.

 
