
High Availability on VMware not stable

Hi All,

I have an installation on VMware using HA (hot standby). The systems are stable for the most part, but about once per day HA briefly becomes unstable. It corrects itself, but the result is a short outage, which is especially troublesome because tunnels may take a few minutes to come back.

The basic setup is two ESX servers, each running an instance of ASG (the VMware image downloaded from ftp.astaro.com). eth5 (the sync NIC) is a direct hardware link from one ESX system to the other; it doesn't use a vswitch. I have also configured a backup interface, but this does not help, regardless of which interface I choose.

So far, I know that some heartbeats are getting lost, but I don't know why. Sometimes only a few go missing, which causes a failover and then a master/master conflict, which is resolved in favor of the preferred master. The one change that has helped is increasing dead_time$ (cc > ha > times > dead_time$) from 3 to 6. I am hesitant to increase it further.

So after that long message, my main question: does anybody know of a way to enable debugging in order to find the missing heartbeats? I can't see why a heartbeat sent by one system would not reach the other, given the direct hardware link between the two. Historically, I've seen high load cause this type of problem, but that is not the case here.
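
In case it helps frame what I am looking for, here is a minimal sketch of the kind of check I have in mind. It assumes the heartbeat arrival times on the receiving node can be collected somehow (for example from a packet capture on eth5) as timestamps in seconds. The function name, the sample data, and the max_gap threshold are all hypothetical; max_gap is just an illustrative number of seconds and is not meant to be the dead_time$ setting itself.

```python
from typing import List, Tuple


def find_heartbeat_gaps(
    arrivals: List[float], max_gap: float
) -> List[Tuple[float, float, float]]:
    """Return (previous_arrival, next_arrival, gap) for each gap above max_gap.

    arrivals: heartbeat arrival timestamps in seconds (assumed input,
              e.g. extracted from a packet capture on the sync NIC).
    max_gap:  hypothetical threshold in seconds above which a gap is reported.
    """
    ordered = sorted(arrivals)
    gaps = []
    # Compare each arrival with the one that follows it.
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt - prev
        if gap > max_gap:
            gaps.append((prev, nxt, gap))
    return gaps


if __name__ == "__main__":
    # Made-up timestamps for illustration only: one long silence after t=3.0.
    sample = [0.0, 1.0, 2.0, 3.0, 7.5, 8.5, 9.5]
    for prev, nxt, gap in find_heartbeat_gaps(sample, max_gap=3.0):
        print(f"no heartbeat between {prev:.1f}s and {nxt:.1f}s (gap {gap:.1f}s)")
```

Running the same check on captures from both nodes would at least show whether the heartbeats are missing on the sending side or only on the receiving side.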
