Resolution to “Connection Timeout Expired. [Pre-Login] initialization =18090; handshake=14281” error

This blog post is about a SQL Server connection issue that presents itself:

AG_handshake error

We were building an Availability Group (AG) at the time for an online banking platform.

PROD would have 4 nodes – 2 in Christchurch and 2 in Auckland. Whilst building the prePROD installation (a 3 node cluster (2 in Christchurch and 1 in Auckland) we ran into an interesting issue as described in the title.

During the build phase of setting up the AG you have to add in the replicas – and this brings up the normal connect window in SSMS.

Except for some reason Node 1 could not connect to Node 2.

Yet Node 2 could connect to Node 1.

What the?

Things got down right weird when I decided to try connecting with SQL Authentication and Node 1 COULD connect to Node 2.

But using windows authentication to connect – Node 1 could NOT connect to Node 2.

A brief description of the environment – which for this setup is one of the most secure/restrictive I’ve ever installed SQL Server in.

Each Node has a base IP address – but it also had a secondary IP address that SQL Server would listen on and the environment required non-standard ports for SQL Server to listen on.

Node 1 – 172.34.59.106 and a secondary address of  172.34.59.108

Node 2 – 172.34.59.107 and a secondary address of 172.34.59.109

The clustered IP address in Christchurch was going to be 172.34.59.110.

The non-standard port for connecting was 51234.

Just to add some complication the client had already started testing the application using Node 1 – using 172.34.59.108 which had a DNS entry associated with it that the application would connect to.

We had tried to connect to the instance on Node 2 from Node 1 using a client alias setup for the instance name (using the IP address and port ) and we also had tried using IP Address,port:

172.34.59.109,51234

We could connect to it on Node 2 itself, we could telnet to it from Node 1 but we could not connect to it IF we were using windows authentication.

As mentioned SQL Authentication worked just fine.

What the…..????

After about an hour trying every permutation we stumbled upon the eventual answer which was found by logging onto the DNS server and lo and behold Node 2 did not have a DNS entry associated with its secondary address – 172.34.59.109.

Because of the restrictive nature of this install it was not my team setting up DNS – which we normally do – or our networking department who do the big stuff.

So we added in forward and reverse DNS records and voila – things worked.

This was a very confusing error – as it only occurred if we were using windows authentication – which we needed to do for the AG.

I could not find much on the internet about the error number but after the fact I found a forum post that I have since responded to – the answers there were close but I think my answer is closer – well for my situation anyway…

This error just goes to show how important it is to go through all the variables associated with a problem, investigate everything and also make sure that things are setup how you expect – don’t assume they are.

Yip.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s