Hyper-V Virtual Machines losing network connectivity under Server 2012 R2
I recently dealt with a pretty major issue where a brand new cluster configuration had virtual machines losing network connectivity sporadically. It would happen without any warning or errors. Event Viewer looked clean, and the network adapters were functioning fine.
The problem was that not all virtual machines would lose network connectivity. One would go down, while others on the same external Hyper-V switch would function just fine. I found through troubleshooting that live migrating the VM from one host to another would restore its network connectivity, but sometimes at the cost of other VMs losing theirs. The customer was also complaining that performance when accessing network data was poor even when the network on the VMs was operational.
Since I was ruling out everything in the Hyper-V cluster, I quickly started to think that the network switch the network adapters were plugged into was faulty. It was a new switch, an HP 1810, and I figured I was dealing with an ARP issue on the switch. (The 1810 unfortunately doesn’t give many options for managing it.) We had similar setups that had all run OK for years without this issue, with the only variable being that this cluster was ‘R2’. I called both HP and Microsoft for support, and initially we replaced the switch. Of course, things ran OK for a few days, but then, at the worst time possible, the VMs started losing connectivity again.
After about 6 calls to Microsoft, and escalation to senior network support, I was given a fix: Disable VMQ on the physical network adapters on the host boxes.
To disable VMQ, open Device Manager and right-click each adapter (including any teams that you have configured, as well as each member of the team). Choose Properties, and select the Advanced tab. Finally, scroll down to Virtual Machine Queues and choose the option Disable from the drop-down box.
Note that when you do this, it will disable and then re-enable the adapter on the host, so expect a brief loss of network connectivity when you make this change.
Also be sure to uncheck “Enable Virtual Machine Queue” in the settings of each virtual machine. Both need to be done, or you could wind up with network performance issues.
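If you have many adapters and VMs to go through, the same two changes can be scripted rather than clicked through. The sketch below is not what the Microsoft tech gave me. It is a minimal Python helper that builds the PowerShell commands I would expect to match the GUI steps above: `Disable-NetAdapterVmq` for each host adapter, and `Set-VMNetworkAdapter` with `-VmqWeight 0` for each VM. The adapter and VM names are hypothetical placeholders, and I am assuming the team interface can be handled the same way as a physical adapter, so check the output against your own hosts before running anything.

```python
import subprocess


def build_vmq_commands(adapter_names, vm_names):
    """Build the PowerShell commands that turn VMQ off on the host and the VMs."""
    commands = []
    # Host side: one command per physical adapter, team, and team member.
    for adapter in adapter_names:
        commands.append(f'Disable-NetAdapterVmq -Name "{adapter}"')
    # Guest side: a VMQ weight of 0 disables VMQ on the VM's network adapters.
    for vm in vm_names:
        commands.append(f'Set-VMNetworkAdapter -VMName "{vm}" -VmqWeight 0')
    return commands


def run_commands(commands, dry_run=True):
    """Print the commands, or run them through powershell.exe when dry_run is False."""
    for command in commands:
        if dry_run:
            print(command)
        else:
            # Needs an elevated session on the Hyper-V host itself.
            subprocess.run(
                ["powershell.exe", "-NoProfile", "-Command", command],
                check=True,
            )


if __name__ == "__main__":
    # Hypothetical names; replace with your own adapters and virtual machines.
    adapters = ["Ethernet 1", "Ethernet 2", "Team1"]
    vms = ["FileServer01"]
    run_commands(build_vmq_commands(adapters, vms), dry_run=True)
```

With `dry_run=True` the script only prints the commands so you can review them. The same warning applies as with the Device Manager method: plan for a brief network interruption on the host when you actually run them.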
I asked the Microsoft tech if there was a KB article on this specific issue, something I could send to our mutual customer, but he stated there was nothing specifically for the lost network connectivity. He did point me to the following KB article for poor performance under 2012 (non-R2), something that I have not witnessed yet.
Post a comment below with your thoughts, or if you have run into this specific issue.