JoelC707
Premium Member
join:2002-07-09
Lanett, AL

JoelC707

Premium Member

Strange Network Glitches

I originally posted about this in the MS forum because it's mainly dealing with my home network (even though my home and work network are more or less one and the same) and everything is MS based so I figured that was a good first stop. Sadly I haven't gotten anywhere with it and my issue persists so I figured I'd come here hoping to catch more people that have used Hyper-V (as I suspect the issue is either with Hyper-V or Server 2012 R2).

Here's my network setup:
Dell C2100 running Server 2012 R2 and Hyper-V
HP 1810g-24 main switch
Intel Pro/1000 PT quad port NIC in 4-port LACP trunk to the switch (have also tested without the trunk in place and a single NIC including onboard NIC)

The Hyper-V system hosts all the work VMs as well as a pair of DCs for my home network, my file server and pfSense. As far as I know, nothing is going on with the work VMs, those using them daily would have mentioned something by now (which is another reason I didn't post it here originally).

What happens is occasionally my connection to the file server will "glitch" and drop my connection to it. If I've got a video playing from it it'll abort and tell me the file can't be found anymore. If I have explorer open it'll close and claim the network device isn't available anymore. I can always immediately reconnect to it with no issues.

I use a GPO to mount the drive and honestly I have no other need for a domain at home so if it's really something to do with Server 2012 I can get rid of it and go with another solution if it comes to that.

I have disabled SR-IOV and VMQ on the physical adapters, the logical adapter, the virtual switch on the host and on all VMs. The quad port NIC doesn't even support either of those, but the onboard NICs do. All NICs are Intel, no Broadcom, so the VMQ thing technically shouldn't have been an issue, though as I said, it's all disabled anyway.

I was doing dynamic RAM on the file server (same with most of the other VMs as well) but I had the thought that it was running out of RAM before it could expand, or it was simply running out and was fully expanded. I first increased its maximum dynamic allotment and later switched it to a static allotment of 10GB (max I saw used with a 10GB dynamic was 6.8GB). My VMs are all running 2012 R2 but the work stuff is all still on 2008 so maybe it's something with the new OS?

guppy_fish
Premium Member
join:2003-12-09
Palm Harbor, FL

guppy_fish

Premium Member

What does event viewer show in your logs?

In general, shotgunning parts is not a good debugging technique. If something takes a dump, you will most likely have error messages somewhere ...

It's been too long since I did AD stuff, but it has its own logs and viewers.
JoelC707
Premium Member
join:2002-07-09
Lanett, AL

JoelC707

Premium Member

Sorry, forgot to mention that. The only thing in the host or guest logs is mention of a lost RPC connection to one or more DCs, but those entries do not always coincide with the glitches. I have not checked into AD specific logs. Here's my original thread on the subject: »Odd network glitches.
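One rough way to check whether the RPC entries line up with the glitches is to note the glitch times by hand and compare them against the event timestamps. A minimal sketch follows; the timestamps are made up for illustration and the 30 second window is an arbitrary assumption, not something from the logs in this thread.

```python
from datetime import datetime, timedelta

def correlate(glitches, events, window=timedelta(seconds=30)):
    """For each glitch time, list the logged events within `window` of it."""
    matches = {}
    for g in glitches:
        matches[g] = [e for e in events if abs(e - g) <= window]
    return matches

# Hypothetical timestamps, for illustration only.
glitches = [datetime(2015, 1, 10, 20, 15, 0), datetime(2015, 1, 10, 21, 40, 0)]
rpc_events = [datetime(2015, 1, 10, 20, 15, 12), datetime(2015, 1, 10, 22, 5, 0)]

result = correlate(glitches, rpc_events)
for glitch, nearby in result.items():
    print(glitch, "->", len(nearby), "event(s) nearby")
```

If most glitches have no event nearby, the RPC messages are probably a side effect rather than the cause.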

guppy_fish
Premium Member
join:2003-12-09
Palm Harbor, FL

guppy_fish

Premium Member

Did you disable jumbo frames? That seems like a good candidate for issues, as it is not universally supported by NICs (in the last post of the thread you linked, you said you had it enabled).
JoelC707
Premium Member
join:2002-07-09
Lanett, AL

JoelC707

Premium Member

I thought I had but I just checked my config to be sure (also wanted to check other settings) and it was still on. I just disabled it and also verified that flow control, loop detection and STP are all off (though I don't think it actually supports STP, the 2848 I had it connected to did).
Edit: Also just checked the server, jumbo frames are disabled on the physical adapters and not an option on the logical adapter.

PToN
Premium Member
join:2001-10-04
Houston, TX

PToN to JoelC707

Premium Member

to JoelC707
How about inter-VM communication, does it happen too when it's VM to VM..? Or just VM to other desktop?

I had a similar issue. Pings would work with some loss, and even remote desktop worked for a bit then it just would terminate the session. I looked at switch config, router config, firewall, etc. Ended up being a bad NIC in the server.

You mentioned you have LACP with the 4 NICs. It should technically be fine if one NIC fails, but maybe just try it one NIC at a time?
JoelC707
Premium Member
join:2002-07-09
Lanett, AL

JoelC707

Premium Member

All of the other VMs are mainly servers but a couple of them are Windows 7, so I could dump VLC on one of those and see what happens. Hadn't thought of that yet.

It does happen to many other devices and not just Windows. It happens to my Xbox 360 as a media center extender and it happens to my Android phone via FTP (it can do SMB but FTP seems to be faster and more reliable on the phone).

I have changed the virtual switch to use one of the onboard NICs standalone and the LACP trunk went unused for a couple of weeks. It was still configured during this test though if that matters. Honestly, a glitch on LACP was my first thought, the NIC it was using glitched in some way and it moved to another NIC and couldn't maintain seamless connection. Taking the LACP trunk out of the mix though should have solved that (though if the single NIC is glitching as well then it's still going to be visible). I'll look again at the errors logged on the switch ports but there were very few (single digits) if any last time I looked.

The patch cables are brand new, maybe a year old but I've got boxes full of them ranging from identical to much older (bought about 50 cables from Monoprice). I'll try letting VLC play a video on a VM while I watch one on my desktop and see if either of them glitches, at least that will help identify the scope.
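To narrow the scope without watching two videos, a small probe could be run from both the desktop and a VM at the same time and the failure timestamps compared. This is only a sketch: it assumes the file server answers on TCP port 445 (SMB), and `watch` and `probe` are names invented here. A short TCP open can succeed even while an established SMB session drops, so a clean run does not prove the path is healthy.

```python
import socket
import time
from datetime import datetime

def probe(host, port=445, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(host, port=445, interval=1.0, count=60):
    """Probe `count` times and return timestamps of the failed probes."""
    failures = []
    for _ in range(count):
        if not probe(host, port):
            stamp = datetime.now()
            failures.append(stamp)
            print(f"{stamp:%Y-%m-%d %H:%M:%S} {host}:{port} unreachable")
        time.sleep(interval)
    return failures
```

Calling something like `watch("fileserver", count=3600)` on each machine (the hostname is a placeholder) gives two lists of failure times to compare.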
JoelC707

JoelC707 to PToN

Premium Member

to PToN
said by PToN:

How about inter-VM communication, does it happen too when it's VM to VM..? Or just VM to other desktop?

Looks like inter-VM is OK. VLC just cut off on me on my desktop but on a VM it's still going. This could explain why the work VMs aren't affected. The firewall is a VM and there's an internal only vswitch they are connected to so their traffic never leaves the VM world unless it goes to the Internet. The VM doing the playback is on an external vswitch along with my other VMs. Basically it's hitting the same vswitch either way but in the case of the desktop, it's actually leaving the host.

That seems to direct me to the HP switch. It could be the NICs on the host, but since it happens whether it's on the add-in NIC or the onboard one, and with or without the LACP trunk, that seems to be ruled out. And since it happens on multiple devices attached to the switch, unless they are ALL malfunctioning, the one common element is the switch.

I've checked the logs on the switch and do not see where the port to my desktop (port 1) is going down and it's not sending/receiving any bad packets or collisions (it is receiving some pause frames though). The same goes for the trunk (ports 7, 8, 9, 10). Ports 2, 4, 20 and 22 are going up and down but that's just computers being turned on and off from what I can tell. I'll have to trace out to see what other ports go where to be sure about them all though.
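Since the counters are cumulative, what matters is which ones move around the time of a glitch. A minimal sketch of comparing two snapshots follows; the counter names and numbers are invented for illustration and are not taken from the switch.

```python
def counter_deltas(before, after):
    """Return per-port counters that increased between two snapshots."""
    changed = {}
    for port, counters in after.items():
        old = before.get(port, {})
        delta = {name: value - old.get(name, 0)
                 for name, value in counters.items()
                 if value - old.get(name, 0) > 0}
        if delta:
            changed[port] = delta
    return changed

# Hypothetical snapshots keyed by port number, for illustration only.
before = {1: {"rx_pause": 120, "rx_errors": 0}, 7: {"rx_pause": 0, "rx_errors": 2}}
after = {1: {"rx_pause": 180, "rx_errors": 0}, 7: {"rx_pause": 0, "rx_errors": 2}}

print(counter_deltas(before, after))
```

A port whose pause or error counters jump only across a glitch would be worth a closer look.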
rayik
join:2005-08-04
united state

rayik to JoelC707

Member

to JoelC707
said by JoelC707:

Intel Pro/1000 PT quad port NIC in 4-port LACP trunk to the switch (have also tested without the trunk in place and a single NIC including onboard NIC)

Since VM-to-VM connectivity is not affected, maybe it's the LACP setup.

Server 2012r2 does "NIC teaming" which is switch independent. No need to create LAG groups on the switch. Server 2012r2 manages everything. It's simple to create. On 2012r2 you want to select switch independent teaming. On the HP remove the LAG as you want just individual ports.

Everything is easier with a 2012r2 NIC team. No switch configuration necessary. No need to use any 3rd party software / drivers on the physical NICs. Windows manages the team. You can make the team available to VMs and Hyper-V will take care of which physical NICs need to be used.

(I can't figure out how to upload a picture without it landing at the top of the post, so I'll post team screenshots in the post below.)
rayik

rayik

Member

JoelC707
Premium Member
join:2002-07-09
Lanett, AL

JoelC707 to rayik

Premium Member

to rayik
Thanks, no need for the pic, I know where it is. There is an LACP option in that configuration that is switch dependent but I can certainly try the switch independent option as well. I'm not 100% sure it's the LACP setup though, since I have tried it with just a single NIC connected to the vswitch (the LACP setup was still in place, just not used anywhere).

DarkLogix
Texan and Proud
Premium Member
join:2008-10-23
Baytown, TX

DarkLogix to rayik

Premium Member

to rayik
Doing teaming on the server side only covers a single direction of data.
JoelC707
Premium Member
join:2002-07-09
Lanett, AL

JoelC707

Premium Member

That was my thought as well and why I opted for LACP, especially since the switch is capable of it. IIRC the type is "hyper-v port" but I'll check when I get home to verify.

PToN
Premium Member
join:2001-10-04
Houston, TX

PToN to JoelC707

Premium Member

to JoelC707
Ok, cool. We now know the problem occurs when traversing from VM to the clients.

As I mentioned in my first post, if you have the time (which at this point sure looks like what you have to do), remove one of the NICs on the host from any sort of trunk and see if it works. If not, re-add that NIC, remove another, and so on.

Note that when you remove the NIC, you might want to create a new network and vlan for that NIC to communicate with the streaming VM and the clients.

I am not too familiar with Hyper-V networking, but maybe you can directly assign the physical NIC to the VM and see if that works. Maybe there is some issue (ACL, QoS, etc.) with the vswitch and client traffic.
kc8jwt
join:2005-10-27
Belpre, OH

kc8jwt to JoelC707

Member

to JoelC707
I'd say look at your HP switch. I love the HP stuff, but the 1810 has issues with pushing large amounts of bandwidth. I had 5 of them in various labs at our high school, and when we would image the labs, about halfway through or sooner the switch would freeze. I tried doing firmware updates, but it didn't help any. I ended up replacing them with HP 1910s and it works much better.

Believe it or not, I have an 1810 with basically 3 devices hanging off of it, and from time to time the switch freezes and locks up.
JoelC707
Premium Member
join:2002-07-09
Lanett, AL

JoelC707 to PToN

Premium Member

to PToN
Looks like it's using "dynamic" for the load balancing method instead of hyper-v port or address hash. This was a new addition to 2012R2 and is apparently a blend of hyper-v port for inbound and address hash for outbound. It's apparently the recommended method to use in 2012R2. That said, I wonder if falling back to hyper-v port is better?
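For anyone unfamiliar with the address hash idea, here is a toy sketch. It is NOT Microsoft's actual algorithm; the CRC32 hash, the NIC names and the addresses are all assumptions made up for illustration. It only shows the general principle that a given flow keeps mapping to the same team member.

```python
import zlib

def pick_member(flow, members):
    """Map a flow tuple to one team member with a stable hash (toy example)."""
    key = "|".join(str(part) for part in flow).encode()
    return members[zlib.crc32(key) % len(members)]

# Hypothetical team members and flow (src IP, src port, dst IP, dst port).
members = ["NIC1", "NIC2", "NIC3", "NIC4"]
flow = ("192.168.1.10", 445, "192.168.1.50", 50123)

print(pick_member(flow, members))
```

With this kind of scheme a single stream stays on one NIC, so a team mostly helps when there are many flows at once.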

The bulk of the VMs don't need to be accessible from outside the host (and aren't as they are only on an internal network) so really a 4-port LACP trunk is overkill. I could certainly just dedicate one port to the file server since it needs the most bandwidth, one to everything else (firewall, DCs, etc.) and kill the LACP trunk entirely. This really doesn't explain why it still did it with a single NIC assigned to the vswitch instead of the trunk but I also figure it'll be easier to troubleshoot with less in the way (such as the NIC team).
said by PToN:

Note that when you remove the NIC, you might want to create a new network and vlan for that NIC to communicate with the streaming VM and the clients.

Actually if I do like I'm thinking above, the streaming VM (it's really what I use to do all my downloads with) would be on one port and the file server on another. That would mean it still needs to go out to the physical switch unless multiple vswitches have uplinks between them. That will help test/verify it's something external to the host.
JoelC707

JoelC707 to kc8jwt

Premium Member

to kc8jwt
I've not actually seen that personally but it is a low end switch so anything is possible. I've actually heard it doesn't have enough oomph to handle both jumbo frames and flow control at the same time. I have a second 1810 and a 2848 that I used to use together with more hosts and ran iSCSI over them and live migration traffic. It seemed to handle that load just fine but I had everything split between them so even if the 1810 did freeze up randomly I may not have noticed it unless it actually reset (the uptime counter would of course reset). FWIW, the switch in question now has an uptime of ~158 days so if it is freezing, it's recovering without a reset.