SSO High-Availability Single Site and vCenter Linked-Mode

I came across an interesting bug today when deploying a linked-mode vCenter with SSO in High-Availability mode.  The installation of SSO, Inventory Service, and vCenter all went as planned…initially.  During the deployment I decided to reboot the secondary vCenter server.  Once the server came back online, the VMware VirtualCenter Service failed to start, and so did the Management Webservices service.  OMG!  Are you serious?

So, being the good engineer, I went and looked at the logs to find out why.  In the vpxd.log file I found the following:

    2013-11-15T14:16:04.657-08:00 [05048 error 'HttpConnectionPool-000001'] [ConnectComplete] Connect failed to ; cnx: (null), error: class Vmacore::Ssl::SSLVerifyException(SSL Exception: Verification parameters:
    --> PeerThumbprint: 12:03:EF:EE:17:10:29:2B:A7:14:20:8E:4E:F6:D3:88:A7:09:5F:19
    --> ExpectedThumbprint:
    --> ExpectedPeerName: dcamgmtvc.diginsite.net
    --> The remote host certificate has these problems:
    -->
    --> * A certificate in the host's chain is based on an untrusted root.
    -->
    --> * self signed certificate in certificate chain)
    2013-11-15T14:16:04.657-08:00 [01956 error '[SSO][SsoFactory_CreateFacade]'] Unable to create SSO facade: SSL Exception: Verification parameters:
    --> PeerThumbprint: 12:03:EF:EE:17:10:29:2B:A7:14:20:8E:4E:F6:D3:88:A7:09:5F:19
    --> ExpectedThumbprint:
    --> ExpectedPeerName: dcamgmtvc.diginsite.net
    --> The remote host certificate has these problems:
    -->
    --> * A certificate in the host's chain is based on an untrusted root.
    -->
    --> * self signed certificate in certificate chain.
    2013-11-15T14:16:04.657-08:00 [01956 error 'vpxdvpxdMain'] [Vpxd::ServerApp::Init] Init failed: Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)
    --> Backtrace:
    --> backtrace[00] rip 000000018018cd7a
    --> backtrace[01] rip 0000000180106c48
    --> backtrace[02] rip 000000018010803e
    --> backtrace[03] rip 00000001800907f8
    --> backtrace[04] rip 00000000006f5bac
    --> backtrace[05] rip 0000000000716722
    --> backtrace[06] rip 000007f6c0cbddfa
    --> backtrace[07] rip 000007f6c0cb795c
    --> backtrace[08] rip 000007f6c0ee80ab
    --> backtrace[09] rip 000007fb6f3cbaa1
    --> backtrace[10] rip 000007fb6f0e1832
    --> backtrace[11] rip 000007fb6fb2d609
    -->
    2013-11-15T14:16:04.658-08:00 [01956 error 'Default'] Failed to intialize VMware VirtualCenter. Shutting down...
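If you have a large vpxd.log to wade through, the interesting fields of that SSL exception can be pulled out with a few lines of Python. This is only a rough sketch: the function name `ssl_verification_params` is my own, and the field names it looks for are simply the ones that appear in the log excerpt above, so other vCenter versions may format things differently.

```python
import re

# Field names are taken from the vpxd.log excerpt above. A value is any run of
# non-whitespace after the colon; the lookahead stops an empty field (such as
# ExpectedThumbprint here) from swallowing the next "-->" continuation marker.
FIELD_RE = re.compile(
    r"\b(PeerThumbprint|ExpectedThumbprint|ExpectedPeerName):[ \t]*(?!-->)(\S*)"
)


def ssl_verification_params(log_text):
    """Return the first value seen for each SSL verification field."""
    params = {}
    for key, value in FIELD_RE.findall(log_text):
        # The same exception is logged more than once; keep the first hit.
        params.setdefault(key, value)
    return params
```

Feeding it the contents of vpxd.log gives you a small dictionary with the peer thumbprint and the expected peer name, which is handy when comparing against the certificate the primary SSO server actually presents.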

As the log shows, this is a certificate failure: vpxd cannot verify the SSO server's certificate because the chain ends in a root this server does not trust.  Before touching any certificates, I verified a couple of the standard things first.  Have a look at your vpxd.cfg and check the SSO-related section of the config file.  Make sure it is pointing to the appropriate primary SSO server.  Make sure that DNS resolution is working for both forward and reverse lookups.  Once you have verified this, do the following:

  1. Go to C:\ProgramData\VMware\SSL
  2. Rename the ca-certificates.crt to ca-certificates.crt.old
  3. Copy the ca-certificates.crt file from the primary SSO server to this folder.  (You will find the file in the same location on your primary SSO server’s filesystem.)
  4. Restart the Inventory Service
  5. Attempt to start the VMware VirtualCenter and VMware VirtualCenter Management Webservices services
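
If you would rather script the DNS check and steps 1 through 3 than do them by hand, here is a minimal Python sketch. Everything in it is my own assumption rather than anything VMware ships: the helper names `check_dns`, `sha256_of`, and `replace_ca_bundle` are made up for illustration, and it assumes you have already copied the primary's ca-certificates.crt somewhere reachable from the secondary. It does not restart any services, so steps 4 and 5 are still done by hand.

```python
import hashlib
import shutil
import socket
from pathlib import Path


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return True if hostname resolves to an IP that resolves back to hostname.

    The resolver functions are parameters only so the check can be exercised
    without a live DNS server.
    """
    ip = forward(hostname)
    reverse_name = reverse(ip)[0]
    return reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()


def sha256_of(path):
    """Hash a file so the two certificate bundles can be compared."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def replace_ca_bundle(local_ssl_dir, primary_bundle):
    """Back up the local ca-certificates.crt and copy in the primary's bundle.

    Returns False (and changes nothing) if the two files are already identical.
    """
    local_ssl_dir = Path(local_ssl_dir)
    current = local_ssl_dir / "ca-certificates.crt"
    backup = local_ssl_dir / "ca-certificates.crt.old"

    if current.exists():
        if sha256_of(current) == sha256_of(primary_bundle):
            return False
        # Step 2: rename the existing file (overwrites an older .old backup).
        current.replace(backup)

    # Step 3: copy the primary SSO server's bundle into place.
    shutil.copy2(primary_bundle, current)
    return True
```

On the secondary you would call something like `replace_ca_bundle(r"C:\ProgramData\VMware\SSL", path_to_primary_copy)`, where `path_to_primary_copy` is wherever you staged the file from the primary. A return value of False means the bundles already match, in which case this particular bug is probably not your problem.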

If everything was done correctly, you should see your services come back online.  As it turns out, this is a known bug with vCenter 5.5 SSO High-Availability deployments.  Basically, the certificate file that gets put in that directory on the secondary is not the same as the one on the primary, which is where it should be coming from.  Hope this helps out!