Clustered SQL Server 2008 SP1: services not starting

By tom on December 15th, 2009

I ran into a problem last week with our design SQL 2008 cluster. After finally getting our instances installed (where we had some problems with our HP CLX component – you can read about this here), we noticed that the SQL Server and SQL Server Agent services failed.

We also noticed the following entries in the SQL Server error logs:

Logon       Error: 17806, Severity: 20, State: 2.
Logon       SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: 171.26.245.106]
Logon       Error: 18452, Severity: 14, State: 1.
Logon       Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. [CLIENT: 171.26.245.106]

The SSPI handshake error sounds like a Kerberos problem, so I checked the SPNs (they were correctly registered) and the Kerberos tickets (with KerbTray). No problems there at first sight.

Starting the SQL Server services from the local services window went fine, with no errors in the event log or the SQL error log. However, when I tried to log on in SQL Server Management Studio with Windows authentication, it gave me the same error I saw in the error log: “Login failed. The login is from an untrusted domain and cannot be used with Windows authentication”. Logging on with a SQL account works fine.
Sounds like an authentication issue…

With the help of Microsoft Support we found that KB957097 was the culprit. This is a security fix to prevent remote code execution. There are 2 workarounds given in the KB. The first simply disables the loopback check (DisableLoopbackCheck = 1). This might not be a good idea, as it makes your system vulnerable again to the remote code execution flaw.
The second one seems to be the good one:
HKLM\SYSTEM\CurrentControlSet\Control\Lsa –> new DWORD DisableLoopbackCheck = 0
HKLM\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0 –> new Multi-String BackConnectionHostNames = CNAME of your server

Apply this on both nodes of your cluster and reboot, and the problem is solved.
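
If you have to do this on several nodes, the two registry changes above can be scripted. Below is a minimal sketch in Python that only builds the matching reg.exe command lines (it does not touch the registry itself), so you can review them before running them in an elevated prompt on each node. The host names in the example are made up; replace them with the virtual name(s) of your own SQL instance.

```python
# Sketch: build the reg.exe commands for the second workaround.
# Assumption: the host names passed in are placeholders, not real servers.

LSA_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Lsa"
MSV1_0_KEY = LSA_KEY + r"\MSV1_0"


def build_reg_commands(host_names):
    """Return the two reg.exe command lines that apply the workaround."""
    names = [n.strip() for n in host_names if n.strip()]
    if not names:
        raise ValueError("at least one host name is required")

    # reg.exe expects REG_MULTI_SZ entries separated by the literal
    # two-character sequence \0 (its default separator).
    multi_sz = r"\0".join(names)

    return [
        # Keep the loopback check enabled (value 0).
        f'reg add "{LSA_KEY}" /v DisableLoopbackCheck /t REG_DWORD /d 0 /f',
        # Exclude only the listed names from the check.
        f'reg add "{MSV1_0_KEY}" /v BackConnectionHostNames '
        f'/t REG_MULTI_SZ /d "{multi_sz}" /f',
    ]


if __name__ == "__main__":
    # Hypothetical virtual server names, for illustration only.
    for cmd in build_reg_commands(["sqlvirt01", "sqlvirt01.example.local"]):
        print(cmd)
```

Printing the commands instead of executing them is deliberate: you still want to check the names by hand, and a reboot is needed afterwards anyway.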

Update:
MS Support gave me a little bit more info on this:

This issue occurs if you install Microsoft Windows XP Service Pack 2 (SP2) or Microsoft Windows Server 2003 Service Pack 1 (SP1). Windows XP SP2 and Windows Server 2003 SP1 include a loopback check security feature that is designed to help prevent reflection attacks on your computer. Therefore, authentication fails if the FQDN or the custom host header that you use does not match the local computer name.

In the case of a clustered SQL instance, the SQL services start under the virtual instance name rather than the node's own computer name, so this violates the loopback check. To avoid this, exclude your host names from the loopback check.
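
To make the reasoning above concrete, here is a deliberately simplified model of the decision in Python. It is my own illustration of the behaviour described by MS Support, not how Windows actually implements the check, and it only covers connections that loop back to the local machine. The function name and the server names are hypothetical.

```python
# Simplified model of the loopback check, for illustration only.
# Assumption: this mirrors the described behaviour, not the real LSA code.

def loopback_check_allows(target_name, computer_names,
                          back_connection_host_names,
                          disable_loopback_check=False):
    """Return True if a loopback connection to target_name would pass."""
    if disable_loopback_check:
        # First workaround: the check is switched off for every name.
        return True
    allowed = {n.lower() for n in computer_names}
    # Second workaround: names listed in BackConnectionHostNames are excluded.
    allowed |= {n.lower() for n in back_connection_host_names}
    return target_name.lower() in allowed


# Hypothetical names: NODE1 is the physical node, SQLVIRT01 the virtual name.
node_names = ["NODE1", "node1.example.local"]

print(loopback_check_allows("NODE1", node_names, []))
print(loopback_check_allows("SQLVIRT01", node_names, []))
print(loopback_check_allows("SQLVIRT01", node_names, ["SQLVIRT01"]))
```

The second call is the clustered case from this post: the virtual name is not the local computer name, so authentication fails until the name is added to BackConnectionHostNames.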

Hope this helps,

Tom