You are most likely running into a common issue with backups of millions of small files. Even when going to disk, reading and preparing millions of small files takes a toll on throughput. NDMP builds the image, then streams it to the storage device. It is not an NDMP backup issue, as I have seen 75MB/sec and higher throughput on direct-attached tape.
Now, the really horrible way to do this is to back up farther down the tree. For simplicity, say you have 12 qtrees and 4 DSUs. Allow multistreaming, make sure you have 3 or more jobs allowed per policy, start the backup selections list with a NEW_STREAM directive, and add another NEW_STREAM after every 4th qtree. NetBackup will then break the qtrees up into three jobs of 4 qtrees each. This spreads the millions of files across more streams and should hopefully let you get multiple streams at that 40MB mark.
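With many qtrees, typing the selections list by hand gets error-prone, so a small helper can generate it. The following is a minimal sketch, not a NetBackup tool: the function name `build_selections` and the `/vol/vol1/qtreeNN` paths are hypothetical, and it assumes the layout described above (NEW_STREAM as the first line, then one NEW_STREAM before each group of N qtrees). Paste the output into the policy's backup selections and check it against your NetBackup version's documentation.

```python
def build_selections(qtrees, per_stream):
    """Return backup-selection lines with a NEW_STREAM directive placed
    before each group of `per_stream` paths (hypothetical helper)."""
    if per_stream < 1:
        raise ValueError("per_stream must be at least 1")
    lines = []
    for i, path in enumerate(qtrees):
        # Start a new stream at the first path and at every per_stream-th path after it.
        if i % per_stream == 0:
            lines.append("NEW_STREAM")
        lines.append(path)
    return lines


if __name__ == "__main__":
    # Hypothetical volume/qtree names, for illustration only.
    qtrees = [f"/vol/vol1/qtree{n:02d}" for n in range(1, 13)]
    selections = build_selections(qtrees, per_stream=4)
    print("\n".join(selections))
    print(f"streams: {selections.count('NEW_STREAM')}")
```

For the 12-qtree example this prints three NEW_STREAM lines, each followed by 4 qtree paths. If the last group comes out short (say 13 qtrees), it simply becomes a smaller fourth stream.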
Unfortunately, when dealing with millions of files, inode traversal is a factor as well. There are tons of variables, down to file types (different amounts of metadata), inode fragmentation, etc.
You are on NBU 7, which will do multistreaming and now even multiplexing on NDMP (though I don't know how; that isn't part of NDMP until version 5! I assume some internal translation in NBU). All you are interested in while using DSUs is multistreaming.