Hi Guys,
I have a hot standby HA setup in VMware ESXi. There are 2 ESXi hosts and each host has many guests (we're mainly interested in the two ASGs and a couple of Ubuntu boxes). The HA link is a direct physical connection and HA itself is stable. The topology is as follows (sorry for the crude diagram):
Code:
     ESX1                                                         ESX2
+-------------+                                              +-------------+
|             | eth3 +---Physical connection for HA---+ eth3 |             |
| firewall1 =========+                                +========= firewall2 |
|             | eth0 +-----vSwitch (Server LAN)-------+ eth0 |             |
|             |            |                    |            |             |
|    Ubuntu --+------------+                    +------------|   Ubuntu    |
+-------------+                                              +-------------+
The problem is that when we turn on the firewall2 node (on ESX2) and it becomes Active (as seen with the hs command), all guests on ESX2 can no longer communicate with the ASG pair and therefore can't get to the Internet. All LAN communication is fine, though, even between ESX hosts.
My suspicion is ARP flux. To hammer home that thought, look at section 2.1.4 here. It is very close to the issue that I see. The only difference is that after the first reply to arping, the ASG sends a single reply with eth4's MAC (instead of eth0's). I find this odd.
Things that I've tried:
- disabled virtual MAC addressing via "cc set ha advanced virtual_mac 0"
- set arp_ignore to 1 via sysctl
- set arp_announce to 2 via sysctl
- set arp_filter to 1 via sysctl
- enabled proxy ARP (hey why not)
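For reference, here is a minimal sketch of how the three sysctl settings in the list above could be applied to more than one interface. It is not necessarily the exact command sequence that was run here. The helper name `arp_sysctl_cmds` and the interface names in the usage line are assumptions for illustration. It only prints the commands so they can be reviewed first. The reason for including `all` is that the kernel uses the higher of the `all` and per-interface values for arp_ignore and arp_announce, and enables arp_filter if either one is set.

```shell
#!/bin/sh
# Print (do not run) the sysctl commands that set the three ARP knobs
# for "all" plus every interface name passed as an argument.
# The interface names are supplied by the caller; none are assumed here.
arp_sysctl_cmds() {
    for ifc in all "$@"; do
        echo "sysctl -w net.ipv4.conf.$ifc.arp_ignore=1"
        echo "sysctl -w net.ipv4.conf.$ifc.arp_announce=2"
        echo "sysctl -w net.ipv4.conf.$ifc.arp_filter=1"
    done
}
```

Something like `arp_sysctl_cmds eth0 eth3 eth4` shows the commands, and piping that into `sh` as root would apply them. Values set this way do not survive a reboot.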
Here is what I've still been seeing as symptoms:
- With firewall2 down, I get 3 ARP replies from firewall1; the first reply is correct, each subsequent reply is incorrect
- Ubuntu on ESX2 has full connectivity to all nodes and the Internet
- Once firewall2 is active, connectivity to ESX1 is normal, but ESX2 is down
- There are still 3 replies from firewall1, but the first MAC is now incorrect
- Ubuntu on ESX2 now caches an incorrect MAC address for firewall1
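The cached-MAC symptom can be checked on the Ubuntu guest with something like the sketch below. It reads `ip neigh show` output on stdin and compares the entry for a given IP against the MAC you expect. The function name `check_cached_mac` is made up for this example, and the expected MAC has to come from `ifconfig eth0` on the firewall.

```shell
#!/bin/sh
# Read `ip neigh show` output on stdin and report whether the entry for
# IP $1 carries the expected MAC $2. The comparison ignores case because
# ifconfig prints upper-case MACs and ip prints lower-case ones.
check_cached_mac() {
    want=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    got=$(awk -v ip="$1" '$1 == ip {
              for (i = 2; i < NF; i++)
                  if ($i == "lladdr") { print tolower($(i + 1)); exit }
          }')
    if [ -z "$got" ]; then
        echo "no entry for $1"
        return 2
    elif [ "$got" = "$want" ]; then
        echo "ok $1 is-at $got"
        return 0
    else
        echo "MISMATCH $1 cached $got, expected $want"
        return 1
    fi
}
```

Usage on the guest would be along the lines of `ip neigh show | check_cached_mac 192.168.81.1 00:1A:8C:F0:82:A0`, where the second argument is eth0's HWaddr as shown in the ifconfig output at the bottom of this post.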
See the bottom of this post for helpful debug output.
So the short question is: how do I get firewall1 to ONLY respond to ARP requests with the MAC of the target interface?
SOME COMMAND OUTPUT:
Code:
<M> :/home/login # tcpdump -ni any host 192.168.81.148
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 376 bytes
02:26:12.649734 arp who-has 192.168.81.1 (ff:ff:ff:ff:ff:ff) tell 192.168.81.148
02:26:12.649752 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a0
02:26:12.649762 arp who-has 192.168.81.1 (ff:ff:ff:ff:ff:ff) tell 192.168.81.148
02:26:12.649774 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:12.649782 arp who-has 192.168.81.1 (ff:ff:ff:ff:ff:ff) tell 192.168.81.148
02:26:12.649789 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a3
02:26:13.649953 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:13.649961 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:14.650246 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:14.650296 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:15.650542 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:15.650553 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:16.650865 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:16.650871 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:17.651166 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:17.651178 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:18.651401 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:18.651408 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:19.651691 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:19.651700 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
02:26:20.651950 arp who-has 192.168.81.1 (00:1a:8c:f0:82:a4) tell 192.168.81.148
02:26:20.651960 arp reply 192.168.81.1 is-at 00:1a:8c:f0:82:a4
^C
22 packets captured
22 packets received by filter
0 packets dropped by kernel
<M> :/home/login # sysctl net.ipv4.conf.eth0.arp_filter=1
net.ipv4.conf.eth0.arp_filter = 1
<M> :/home/login # ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:1A:8C:F0:82:A0
inet addr:192.168.81.1 Bcast:192.168.81.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:165001910 errors:0 dropped:0 overruns:0 frame:0
TX packets:219847818 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:195922132 (186.8 Mb) TX bytes:1992419006 (1900.1 Mb)
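To make captures like the one above easier to compare before and after a change, a small helper can count how many ARP replies each MAC sent for each IP. This is only a sketch: the function name `arp_reply_summary` is made up, and it assumes the older tcpdump text format shown above (`arp reply <ip> is-at <mac>`). Newer tcpdump versions print ARP lines differently and would need a different match.

```shell
#!/bin/sh
# Summarise "arp reply <ip> is-at <mac>" lines from tcpdump text output.
# Reads the files given as arguments, or stdin if there are none.
# Prints one line per (ip, mac) pair: <reply count> <ip> <mac>.
arp_reply_summary() {
    awk '$2 == "arp" && $3 == "reply" && $5 == "is-at" { count[$4 " " $6]++ }
         END { for (k in count) print count[k], k }' "$@" | sort -k2,2 -k3,3
}
```

Fed the capture above (saved to a file, or piped from `tcpdump -ni any arp`), it reports one reply each from 00:1a:8c:f0:82:a0 and 00:1a:8c:f0:82:a3 and nine from 00:1a:8c:f0:82:a4. More than one MAC for the same IP is the multiple-reply symptom described in this post.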