Tech ONTAP Blogs

Google Cloud NetApp Volumes: Test Active Directory connectivity

okrause
NetApp

SMB volumes, as well as NFS volumes that use extended groups or Kerberos, depend on Microsoft Active Directory to look up and authenticate user and group identities. To use Active Directory, Google Cloud NetApp Volumes needs to join the domain as a member server. Joining Active Directory is a complex task and can fail because of misconfiguration or network problems. NetApp Volumes has therefore introduced a connectivity test that helps you identify and resolve these issues.
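The rule above can be written as a small predicate. This is a hypothetical helper for illustration only. The parameter names are our own and are not NetApp Volumes API fields.

```python
from typing import Iterable


def requires_active_directory(
    protocols: Iterable[str],
    extended_groups: bool = False,
    kerberos: bool = False,
) -> bool:
    """Return True if a volume with these settings depends on Active Directory.

    Hypothetical helper: the parameter names are illustrative, not API fields.
    """
    protos = {p.upper() for p in protocols}
    # SMB always relies on Active Directory for identity lookup and authentication.
    if "SMB" in protos:
        return True
    # NFS relies on it only when extended groups or Kerberos are enabled.
    uses_nfs = any(p.startswith("NFS") for p in protos)
    return uses_nfs and (extended_groups or kerberos)


print(requires_active_directory(["NFSV3"]))                 # False
print(requires_active_directory(["NFSV4"], kerberos=True))  # True
```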

 

Is your Active Directory policy correct?

To access Active Directory, you must create an Active Directory policy in NetApp Volumes. Such a policy contains all the information the service needs to connect to Active Directory, such as domain name, DNS servers, and domain join credentials. When you create a policy, this information is stored in the service, but the service doesn’t join the domain right away. Only when you create the first volume that requires Active Directory integration (for example, an SMB volume) is the service joined to your domain.

 

How does the volume know how to join? A volume is created within a storage pool, and the storage pool in turn is associated with an Active Directory policy. It’s as if the pool “knows” how to connect to Active Directory through the information provided by the associated Active Directory policy, and that knowledge is inherited by the volume.

 

This means that you can successfully create an Active Directory policy using incorrect data without getting an error. Or maybe you entered the correct data, but incorrect routing or a firewall blocks NetApp Volumes from connecting to the domain controllers. Or your Active Directory administrator gave you correct user credentials, but that user doesn’t have domain join permissions.
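The following toy Python model illustrates this "fail late" behavior. Every class, field, and function name in it is hypothetical. It mirrors only the behavior described above for service levels Standard, Premium and Extreme:

* A policy is stored without contacting anything.
* The pool references the policy.
* The domain join runs when the first volume that needs Active Directory is created.

The `join` callback is an assumed stand-in for the real domain join.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ADPolicy:
    # Hypothetical model: stored connection data; creating it contacts nothing.
    domain: str
    dns_servers: List[str]
    join_username: str
    joined: bool = False


@dataclass
class StoragePool:
    name: str
    ad_policy: Optional[ADPolicy] = None  # volumes "inherit" this policy
    volumes: List[str] = field(default_factory=list)


def create_volume(
    pool: StoragePool,
    name: str,
    needs_ad: bool,
    join: Callable[[ADPolicy], None],
) -> None:
    """Create a volume; the domain join runs lazily for the first AD volume.

    `join` is a hypothetical stand-in for the real domain join; it raises on failure.
    """
    if needs_ad:
        policy = pool.ad_policy
        if policy is None:
            raise ValueError(f"pool {pool.name} has no Active Directory policy")
        if not policy.joined:
            join(policy)  # bad policy data surfaces only here ("fail late")
            policy.joined = True
    pool.volumes.append(name)
```

In this model, creating a policy with a wrong DNS address succeeds. The error appears only when `create_volume(..., needs_ad=True, ...)` calls `join` for the first time.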

 

Whether you can actually connect to Active Directory is tested later:

* For service level Flex, this happens when you attach the Active Directory policy to the pool.
* For service levels Standard, Premium and Extreme, it happens when you create the first volume in a pool that requires Active Directory.

This “fail late” potential can be confusing. So we decided to provide an optional workflow that runs the connectivity test right after you create or update an Active Directory policy. This workflow is available for service levels Standard, Premium and Extreme, but not yet for Flex.
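Here is a short, hypothetical Python lookup that summarizes when connectivity is first exercised. The enum and function names are illustrative only. The mapping itself comes from the paragraph above.

```python
from enum import Enum


class ServiceLevel(Enum):
    FLEX = "Flex"
    STANDARD = "Standard"
    PREMIUM = "Premium"
    EXTREME = "Extreme"


def connectivity_checked_at(level: ServiceLevel) -> str:
    """Hypothetical helper: the step at which Active Directory connectivity is first tested."""
    if level is ServiceLevel.FLEX:
        return "attaching the Active Directory policy to the pool"
    return "creating the first volume in the pool that requires Active Directory"


def optional_test_available(level: ServiceLevel) -> bool:
    # The early, optional connectivity test is not yet offered for Flex.
    return level is not ServiceLevel.FLEX
```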

 

Testing an Active Directory connection

Let’s run through the workflow using an example that lets us diagnose and resolve two actual connectivity problems. We’ll use Cloud Console to run the workflow.

 

[Screenshot: Active Directory policies in Cloud Console]

It all starts with creating an Active Directory policy. After we create the policy, a screen displays all our Active Directory policies.
Clicking our policy opens the details screen, which adds the Associated Storage Pools heading:

[Screenshot: policy details screen with the Associated Storage Pools heading]

The screen shows that no storage pool currently uses this policy, so let’s assign it to an existing storage pool. We can do this conveniently from the policy details screen, or the classic way, by editing the storage pool.

 

When we click the +Assign button to the right of Associated Storage Pools, a dialog box opens, allowing us to select a pool. Note that the dialog box shows only pools that have no policy attached and are in the same region as the policy:

[Screenshot: Assign dialog listing the eligible storage pools]
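The Assign dialog's filter rule can be expressed as a short Python sketch. The `Pool` type and its fields are hypothetical stand-ins. Only the two conditions, no attached policy and same region, come from this walkthrough.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    # Hypothetical stand-in for a storage pool's relevant attributes.
    name: str
    region: str
    ad_policy: Optional[str] = None  # name of the attached policy, if any


def assignable_pools(pools: List[Pool], policy_region: str) -> List[Pool]:
    """Pools eligible for assignment: no policy attached yet and same region."""
    return [p for p in pools if p.ad_policy is None and p.region == policy_region]


pools = [
    Pool("pool-a", "us-east4"),
    Pool("pool-b", "us-east4", ad_policy="corp-ad"),  # already has a policy
    Pool("pool-c", "europe-west3"),                   # different region
]
print([p.name for p in assignable_pools(pools, "us-east4")])  # ['pool-a']
```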

After we click Assign, the policy is attached to the pool. We can now run a test by using the Test Active Directory Connection button.

[Screenshot: assigned pool with the Test Active Directory Connection button]

Connectivity tests are always triggered on a pool level. Different pools can connect to different networks (Virtual Private Clouds, or VPCs), and Active Directory might be reachable from one VPC but not from another. The storage pool is the construct that ties volumes, VPCs, and Active Directory policies together.

 

Let’s trigger a test by clicking Test Active Directory Connection:

[Screenshot: starting the Active Directory connectivity test]

The test runs for several minutes. Meanwhile, the Active Directory policy cannot be modified.

 

Troubleshooting problems

The main purpose of a connectivity test is to identify problems with Active Directory connections. Let’s intentionally put some bad data into the policy to showcase the iterative problem-solving process. The first problem we encounter is:

 

[Screenshot: test result reporting that the DNS server cannot be reached]

The test returns an error message that describes the problem. Here, NetApp Volumes fails to connect to the DNS server we specified.

DNS plays a crucial role in Active Directory. Active Directory makes heavy use of DNS-based service discovery to identify the domain controllers that provide services like LDAP or Kerberos. A simple DNS query returns a list of domain controllers. For the service to join the domain, at least one of those domain controllers needs to be reachable.
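DNS-based discovery of domain controllers relies on SRV records. Active Directory domain controllers register records such as `_ldap._tcp.dc._msdcs.<domain>` and `_kerberos._tcp.dc._msdcs.<domain>`.

The Python sketch below builds those query names and orders the returned records into a list of candidates to try. It performs no network I/O. In practice, you would fetch the records with a resolver, for example dnspython's `dns.resolver.resolve(name, "SRV")`.

We don't know how NetApp Volumes implements discovery internally, so treat this as an illustration of the general mechanism only. The ordering is also a deliberate simplification. RFC 2782 chooses among records of equal priority by weighted random selection, and this sketch replaces that with a deterministic sort.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def dc_srv_names(domain: str) -> List[str]:
    """SRV names under which AD domain controllers advertise LDAP and Kerberos."""
    return [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
    ]


def order_candidates(records: List[SrvRecord]) -> List[str]:
    """Order targets to try: lowest priority first, then higher weight first.

    Simplification: RFC 2782 picks among equal-priority records by weighted
    random selection; sorting by weight keeps this example deterministic.
    """
    ranked = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [f"{r.target.rstrip('.')}:{r.port}" for r in ranked]


records = [
    SrvRecord(10, 50, 389, "dc2.example.com."),
    SrvRecord(0, 100, 389, "dc1.example.com."),
    SrvRecord(10, 100, 389, "dc3.example.com."),
]
print(order_candidates(records))
# ['dc1.example.com:389', 'dc3.example.com:389', 'dc2.example.com:389']
```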

 

In our example, we can’t reach the DNS server. There are three issues that could cause this problem:

  1. The specified server isn’t a DNS server.
  2. NetApp Volumes cannot reach the server because of routing issues. A common cause is that VPC peering is nontransitive. NetApp Volumes connects to our network through VPC peering. Suppose all our domain controllers are in another network that is connected to our network through VPC peering. Then the domain controllers are two VPC peering hops away from NetApp Volumes, and traffic won’t get through. We need to move at least one domain controller into the network that the service is peered with.
  3. The DNS server has a firewall that doesn’t allow connections from NetApp Volumes. We need to open its firewall to traffic originating from NetApp Volumes.
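Issue #2 is easy to misjudge, so here is a toy Python model of nontransitive VPC peering. The VPC names are hypothetical, and `netapp-volumes` stands for the service's network. The model covers only peering and ignores every other way that routes can be exchanged.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class PeeringGraph:
    """Toy model of VPC peering, which does not route traffic transitively."""

    def __init__(self) -> None:
        self._peers: DefaultDict[str, Set[str]] = defaultdict(set)

    def peer(self, a: str, b: str) -> None:
        # A peering connection is usable in both directions.
        self._peers[a].add(b)
        self._peers[b].add(a)

    def can_reach(self, src: str, dst: str) -> bool:
        # Only the VPC itself or a directly peered VPC is reachable: at most one hop.
        return src == dst or dst in self._peers[src]


g = PeeringGraph()
g.peer("netapp-volumes", "app-vpc")  # hypothetical: service peered to our network
g.peer("app-vpc", "ad-vpc")          # hypothetical: domain controllers live here
print(g.can_reach("netapp-volumes", "ad-vpc"))  # False: two peering hops away
```

Moving one domain controller into `app-vpc` makes it a single hop from the service, which matches the fix described in issue #2.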

 

For issues #2 and #3, we have to make changes to our infrastructure. After every change, we can retest to see if the change solves the problem. In this example, we simply specified the wrong DNS server IP address (issue #1). After editing the Active Directory policy and fixing the DNS IP, we’re ready for the next test run.
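Between retests, a quick reachability check can help narrow down issues #1 and #3. Run it from a VM in the network that NetApp Volumes is peered with. The Python sketch below attempts TCP connections to ports commonly used by Active Directory.

Keep three caveats in mind:

* The port list is an assumption and only a partial one. It is not the complete set of ports the service needs.
* DNS primarily uses UDP, so a TCP check on port 53 is only a rough proxy.
* A successful check from your VM doesn't prove that NetApp Volumes can connect. The service's traffic typically originates from a different address range, which firewalls may treat differently.

The built-in connectivity test remains the check that counts.

```python
import socket
from typing import Dict, Iterable

# Assumption: a partial list of common Active Directory TCP ports for a quick check.
AD_TCP_PORTS = {53: "DNS", 88: "Kerberos", 389: "LDAP", 445: "SMB"}


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and unreachable networks all land here.
        return False


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

For example, `check_ports("10.0.0.10", AD_TCP_PORTS)` uses a hypothetical DNS server address. A closed port 53 points toward issue #1 or #3. An open port 53 points toward the service-side routing problem of issue #2.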

 

The test fails again, but this time with a different error. So we resolved the first problem and have moved on to the next one. The new error says:

 

[Screenshot: test result reporting invalid domain join credentials]

The error message says that the credentials to join the service to the domain are wrong. Although we still haven’t reached our goal, we made good progress. The service managed to use DNS-based discovery to find domain controllers. It also talked to the NetLogon service on one of the domain controllers, which means that routing and firewalls are good; we simply specified a wrong username or password. In this example, it’s just a typo in the password. After we edit the Active Directory policy and fix the password, the next connectivity test run is successful.

 

[Screenshot: successful connectivity test result]
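The reasoning in this walkthrough follows from the order in which the checks depend on each other: each error implies that all earlier steps succeeded. Here is a minimal Python sketch of that inference. The stage names are our own simplified paraphrase of this walkthrough, not error codes returned by NetApp Volumes.

```python
from typing import List

# Hypothetical, simplified order of dependent steps, paraphrased from this walkthrough.
STAGES = [
    "reach DNS server",
    "discover domain controllers via DNS",
    "contact NetLogon on a domain controller",
    "authenticate with domain join credentials",
]


def verified_before(failed_stage: str) -> List[str]:
    """Stages that must already have succeeded if `failed_stage` reports the error."""
    return STAGES[: STAGES.index(failed_stage)]


print(verified_before("authenticate with domain join credentials"))
# ['reach DNS server', 'discover domain controllers via DNS', 'contact NetLogon on a domain controller']
```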

We’re now ready to deploy an SMB volume in our pool.

 

Best practices

Troubleshooting connectivity to Active Directory is often an iterative process. Active Directory is a complex service with many components spread over multiple servers. Add routing limitations or firewall misconfigurations, and connecting NetApp Volumes to this external dependency, which is outside NetApp Volumes’ control, can be challenging.

 

The new Active Directory connectivity test moves the verification of a working connection closer to the moment you create or update the connection instructions, that is, the Active Directory policy. This allows faster iterations toward a working setup and improves service resilience.

 

Questions and answers

In closing, here’s a short Q&A on best practices.

 

Q: Do I need to run this test for every pool and every Active Directory policy?

A: No, you don’t have to. If a given policy works for one pool in a region, it will work for other pools in that region too, if they connect to the same VPC. If they connect to different VPCs, we strongly recommend running the test. Other regions will use different policies and pools anyway and are subject to their own tests.
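Following this answer, here is a hypothetical Python helper that picks one pool to test for each combination of Active Directory policy, region, and VPC. The `PoolInfo` type and its fields are illustrative assumptions, and in practice you would fill them from your own inventory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PoolInfo:
    # Hypothetical inventory record for a storage pool.
    name: str
    region: str
    vpc: str
    ad_policy: str


def pools_to_test(pools: List[PoolInfo]) -> List[str]:
    """Pick one pool per (policy, region, VPC) combination to run the test on."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for p in pools:
        groups[(p.ad_policy, p.region, p.vpc)].append(p.name)
    # A successful test on one pool covers the other pools in the same group.
    return [names[0] for names in groups.values()]


pools = [
    PoolInfo("a", "us-east4", "vpc-1", "corp-ad"),
    PoolInfo("b", "us-east4", "vpc-1", "corp-ad"),  # same group as "a"
    PoolInfo("c", "us-east4", "vpc-2", "corp-ad"),  # different VPC: test separately
    PoolInfo("d", "europe-west3", "vpc-1", "eu-ad"),
]
print(pools_to_test(pools))  # ['a', 'c', 'd']
```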

 

Q: Should I run the tests regularly?

A: No. If your pool already has SMB volumes that use an existing Active Directory policy, you won’t gain any availability advantages by running the test again. If your pool doesn’t yet have any volumes that require Active Directory, running the test to “be prepared” might increase convenience, but also adds overhead to the system. We recommend running the test before you make use of Active Directory.

 

Q: Is this test available for service level Flex?

A: Currently, this test is available only for service levels Standard, Premium and Extreme. For service level Flex, the actual volume creation tests Active Directory connectivity and fails with helpful error messages if connectivity is broken.

 

If you have additional questions or would like more information, please reach out to me or comment on this blog.