One of the cool things about doing new installations is the chance to pick up on some of the latest gotchas. During a recent greenfield deployment, I came across an oddity when adding the SSO identity source for the AD infrastructure. I went through the normal motions of adding the identity source and everything seemed fine. Then…BAM…I tried to log in with AD credentials and received an error saying the client was not authenticated to the VMware Inventory Service.
That is definitely not good…an empty inventory when trying to get around in vCenter. I went through the typical troubleshooting motions and found that it happened only with AD credential logins. That leads one to believe the issue is directly related to SSO and AD authentication, right? Exactly! So what happens next? You run to the AD guys and say, “AD authentication is failing between SSO and AD”…and they say everything is fine (of course). Digging into the logs, we see the following error in ds.log:
[2013-11-15 11:05:25,593 pool-11-thread-1 ERROR com.vmware.vim.vcauthenticate.servlets.AuthenticationHelper] Invalid user com.vmware.vim.dataservices.ssoauthentication.exception.InvalidUserException: Domain does not exist: NT AUTHORITY
NT AUTHORITY???? That isn’t a domain. Next, we need to dig a little deeper, this time into vmware-sts-idmd.log, where we see a similar error:
2013-11-15 11:35:25,326 WARN [ActiveDirectoryProvider] obtainDcInfo for domain [NT AUTHORITY] failed Failed to get domain controller information for NT AUTHORITY(dwError - 1212 - ERROR_INVALID_DOMAINNAME)
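Both errors name the bogus domain in a predictable spot, so a quick scan of the logs can confirm whether you are hitting the same problem. Below is a minimal Python sketch, assuming you have copied ds.log and vmware-sts-idmd.log somewhere you can run a script (their locations vary by vCenter version and platform, so the paths are passed in as arguments). The two regular expressions are modeled only on the two messages shown above; other builds may word these errors differently.

```python
import re
import sys
from typing import Iterable, Iterator, List, Tuple

# Patterns modeled on the two log messages quoted in this post (an assumption:
# other vCenter builds may phrase these errors differently).
DOMAIN_PATTERNS = [
    re.compile(r"Domain does not exist: (?P<domain>.+?)\s*$"),
    re.compile(r"obtainDcInfo for domain \[(?P<domain>[^\]]+)\] failed"),
]


def find_bad_domains(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, domain) for every SSO domain-lookup failure found."""
    for number, line in enumerate(lines, start=1):
        for pattern in DOMAIN_PATTERNS:
            match = pattern.search(line)
            if match:
                yield number, match.group("domain")
                break  # one hit per line is enough


def main(paths: List[str]) -> None:
    # Scan each log file given on the command line and report the failing domains.
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, domain in find_bad_domains(handle):
                print(f"{path}:{number}: lookup failed for domain '{domain}'")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python scan_sso_logs.py ds.log vmware-sts-idmd.log` (the script name is hypothetical). If every hit reports NT AUTHORITY rather than one of your real domains, you are likely looking at the same issue described here.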
OK, so we now see this NT AUTHORITY domain reference in multiple places. Let’s take a look at the SAML strings being passed. These are in vmware-identity-sts.log: search that file for NT AUTHORITY or for the login credentials you were using, and you will see the following:
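<saml2:AttributeValue xsi:type="xs:string">vsential.net\Domain Admins</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsential.net\ESX Admins</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">NT AUTHORITY\Claims Valid</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsential.net\Denied RODC Password Replication Group</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsphere.local\Administrators</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsphere.local\Everyone</saml2:AttributeValue>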
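To spot a stray entry without eyeballing the whole token, you can pull each group out of the AttributeValue elements and flag any whose domain prefix is not one of your identity sources. This is a minimal Python sketch, assuming each group is logged as its own `<saml2:AttributeValue ...>DOMAIN\Group</saml2:AttributeValue>` element, and that `KNOWN_DOMAINS` (a hypothetical name) lists the identity sources you actually configured; here that is vsential.net and vsphere.local from the example above.

```python
import re
from typing import Iterable, List

# Hypothetical set of identity sources configured in SSO; adjust to match yours.
KNOWN_DOMAINS = {"vsential.net", "vsphere.local"}

# Assumes each group value is wrapped in its own <saml2:AttributeValue> element.
ATTRIBUTE_VALUE = re.compile(
    r"<saml2:AttributeValue[^>]*>(?P<value>[^<]*)</saml2:AttributeValue>"
)


def unexpected_groups(log_text: str, known_domains: Iterable[str]) -> List[str]:
    """Return DOMAIN\\Group values whose domain is not a known identity source."""
    known = {domain.lower() for domain in known_domains}  # case-insensitive match
    flagged = []
    for match in ATTRIBUTE_VALUE.finditer(log_text):
        value = match.group("value").strip()
        if "\\" not in value:
            continue  # not in DOMAIN\Group form, so there is no domain to check
        domain, _, _group = value.partition("\\")
        if domain.lower() not in known:
            flagged.append(value)
    return flagged


if __name__ == "__main__":
    sample = (
        '<saml2:AttributeValue xsi:type="xs:string">vsential.net\\Domain Admins</saml2:AttributeValue>'
        '<saml2:AttributeValue xsi:type="xs:string">NT AUTHORITY\\Claims Valid</saml2:AttributeValue>'
        '<saml2:AttributeValue xsi:type="xs:string">vsphere.local\\Everyone</saml2:AttributeValue>'
    )
    print(unexpected_groups(sample, KNOWN_DOMAINS))  # prints ['NT AUTHORITY\\Claims Valid']
```

Matching on the domain prefix rather than on the literal string NT AUTHORITY means the same check would also catch any other group from a domain that SSO has no identity source for.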
There is the NT AUTHORITY culprit…great! So now that we have found it, how do we fix it? As it turns out, the NT AUTHORITY\Claims Valid entry in the SAML token comes from a Group Policy setting that, from what I have been told, is new to Server 2012 Active Directory. Open the Group Policy Management Editor, edit your Default Domain Policy, and go to Computer Configuration -> Administrative Templates -> System -> KDC -> KDC support for claims, compound authentication and Kerberos armoring. In my case, this policy was enabled, and it is what adds NT AUTHORITY\Claims Valid to the SAML strings. Disable this policy, refresh the GPO on the vCenter management VMs (for example, with gpupdate /force), and VOILA!!!! Everything works as normal!
Now, this may not be a complete fix, but it at least seems to be. More testing will show whether it is fully valid, but it did fix my environment. Before disabling this policy, be sure to touch base with your local AD gurus and get an impact assessment. I hope this helps you out, as this took me a while to find and fix. Good luck!