Troubleshooting malformed packets

Let’s say, hypothetically, you’re having an issue on your network: users are having trouble accessing files, browsing the web, everything really. You’re also seeing significant packet loss when you ping things.

Your best bet is to fire up a traffic sniffer. Back in the day, you’d have had to pay thousands for a decent traffic analysis tool. These days, Wireshark is probably as good a tool as you’ll ever need. On Linux you also have tcpdump, which, while not as capable, is sufficient for most jobs and has the advantage of running on the command line.

So, in our purely hypothetical scenario we might run tcpdump for a few seconds and see a bunch of packets like this:

$ tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:45:25.857702 ARP, Unknown (2048) 
	0x0000:  0001 0800 0604 0800 0604 0800 062b 0001  .............+..
	0x0010:  782b 0001 782b 0082 5add cb82 5add cb82  x+..x+..Z...Z...
	0x0020:  5a15 0a8c 7815 0a8c 7815 0a00 0000 0000  Z...x...x.......
	0x0030:  0000 0000 008c 0000 0a8c 0000 0a8c 008d  ................
	0x0040:  0000 828d 0000 828d 0000 0000 0000 0000  ................
	0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
...junk data with some stuff that kind of looks like text...
	0x0120:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0130:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0140:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0150:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0160:  0000 0000 0000 0000 0000 0000 00         .............
16:45:25.859144 14:30:00:30:14:30 (oui Unknown) Unknown SSAP 0x2a > 01:00:02:01:01:30 (oui Unknown) Unknown DSAP 0x14 Information, send seq 9, rcv seq 3, Flags [Response], length 365
16:45:25.859259 00:00:82:8d:00:00 (oui Unknown) > 00:8d:00:00:82:8d (oui Unknown) Null Information, send seq 0, rcv seq 0, Flags [Command], length 365
... some normal ssh data...
16:45:25.862577 0a:8c:00:00:0a:00 (oui Unknown) > 00:00:0a:8c:00:00 (oui Unknown), ethertype Unknown (0x828d), length 379: 
	0x0000:  0000 828d 0000 8200 0000 0000 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 005e  ...............^
...junk data with some stuff that kind of looks like text...
	0x00e0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x00f0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0100:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0110:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0120:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0130:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0140:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0150:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0160:  0000 0000 0000 0000 0000 0000 00         .............
16:45:25.862594 ARP, Unknown (2048) 
	0x0000:  0001 0800 0604 0800 0604 0800 062b 0001  .............+..
	0x0010:  782b 0001 782b 0082 5add cb82 5add cb82  x+..x+..Z...Z...
	0x0020:  5a15 0a8c 7815 0a8c 7815 0a00 0000 0000  Z...x...x.......
	0x0030:  0000 0000 008c 0000 0a8c 0000 0a8c 008d  ................
	0x0040:  0000 828d 0000 828d 0000 0000 0000 0000  ................
	0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
...junk data with some stuff that kind of looks like text...
	0x0120:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0130:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0140:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0150:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0160:  0000 0000 0000 0000 0000 0000 00         .............
16:45:25.862786 06:04:08:01:78:2b (oui Unknown) Unknown SSAP 0x2a > 08:00:06:04:08:00 (oui Unknown) Unknown DSAP 0x78 
^C^C^C^C
Information, send seq 0, rcv seq 0, Flags [Final], length 482
3511 packets captured
51751 packets received by filter
48210 packets dropped by kernel

Oh dear, there are two things wrong here. Firstly, these don’t look like normal packets. Every MAC address starts with an ID (the OUI) which is unique to each vendor, and (oui Unknown) means tcpdump doesn’t recognise that ID as belonging to any vendor. All the MAC addresses are also different! The packets themselves can’t be decoded by tcpdump, and seem to contain junk.
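If you want to put a number on "all the MAC addresses are different", a few lines of Python can pull the addresses out of saved tcpdump output and count them. This is only a rough sketch: the regular expression and the helper names (`mac_addresses`, `oui`) are my own inventions, and it assumes you've saved the tcpdump output as text first.

```python
import re
from collections import Counter

# Matches colon-separated MAC addresses such as 14:30:00:30:14:30.
# tcpdump timestamps (16:45:25.859259) only have three groups, so they don't match.
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)

def mac_addresses(text):
    """Return every MAC address found in tcpdump-style text, lower-cased."""
    return [m.lower() for m in MAC_RE.findall(text)]

def oui(mac):
    """The first three bytes of a MAC address are the vendor ID (the OUI)."""
    return mac[:8]

# Two lines taken from the capture above.
sample = """\
16:45:25.859259 00:00:82:8d:00:00 (oui Unknown) > 00:8d:00:00:82:8d (oui Unknown) Null Information
16:45:25.862577 0a:8c:00:00:0a:00 (oui Unknown) > 00:00:0a:8c:00:00 (oui Unknown), ethertype Unknown (0x828d), length 379:
"""

macs = mac_addresses(sample)
print(len(set(macs)), "distinct MAC addresses")
print(Counter(oui(m) for m in macs))
```

On a healthy network you'd expect the same handful of addresses and vendor IDs to keep turning up. A count that keeps climbing the longer you capture is a strong hint that something is making addresses up.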

Secondly, there are a lot of these packets. They’re flooding the network, but they’re not going to this host, and they’re not broadcast packets… Because they’re going to MAC addresses that don’t exist, the switches don’t know which port to send them to, so they just flood them across the network in the hope of finding the right host. In a normal situation this would be fine: the first packet to a host would be flooded, the host’s reply would tell the switches where it is, and all subsequent packets could go directly to it. In this case, however, the host a) is never found because it doesn’t exist and b) is different almost every time.

So, something on the network is throwing out junk packets and flooding the network with traffic. Not only that, but it’s flooding the network with new MAC addresses. To understand why this is a problem, you need to know a little bit about network switching.

Time was, we all used network “hubs”. Hubs were dumb devices: they took the traffic from each port and forwarded it to all the others. This was fine for small networks, and great for hackers (who could sniff all the traffic on your network easily). However, we got smarter and started making things called switches. Network switches look at each packet coming in on each port, and when they see a new source MAC address they add it to a list of addresses they’ve seen on that port. So inside your switch there’s a list that looks like this:

Port 1 - 58:55:aa:fb:cc:29
Port 2 - 58:55:ad:aa:ac:19
Port 3 - 58:55:ae:fe:e1:21
Port 4 - 58:55:ea:eb:ce:2a
Port 5 - 58:55:aa:ab:ca:4b
Port 6 - 58:55:aa:fb:ac:b9 58:55:4a:f2:a2:16 58:55:62:a2:1d:15 58:55:1c:1e:16:63 58:55:12:a3:31:cd 58:55:1d:c2:16:93

Ports 1 to 5 each have one host attached to them. That’s normal: you’d usually plug one computer into each port, so you’d only see traffic coming from one device. But what’s happening on port 6? That’s the port you’ve connected to another switch. This is an important rule when reading MAC address tables: each port should have one MAC address, unless it’s an uplink port going to another switch.
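That rule is easy to check mechanically. Here's a minimal sketch using the example table above; the dictionary layout and the function name `multi_mac_ports` are just my choices for illustration, not anything a switch will hand you in this form.

```python
# MAC address table from the example above: port -> addresses seen on that port.
mac_table = {
    1: ["58:55:aa:fb:cc:29"],
    2: ["58:55:ad:aa:ac:19"],
    3: ["58:55:ae:fe:e1:21"],
    4: ["58:55:ea:eb:ce:2a"],
    5: ["58:55:aa:ab:ca:4b"],
    6: ["58:55:aa:fb:ac:b9", "58:55:4a:f2:a2:16", "58:55:62:a2:1d:15",
        "58:55:1c:1e:16:63", "58:55:12:a3:31:cd", "58:55:1d:c2:16:93"],
}

def multi_mac_ports(table):
    """Return the ports that have learned more than one MAC address."""
    return sorted(port for port, macs in table.items() if len(set(macs)) > 1)

print(multi_mac_ports(mac_table))  # [6]
```

Any port this reports is either an uplink to another switch or worth a closer look.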

Now, with something on your network producing hundreds of MAC addresses, your MAC address tables are going to get pretty ugly. This is not good. MAC address tables can get full, and once that happens the switch can’t learn any new addresses, so even traffic for legitimate hosts can end up being flooded.
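To make the learning, flooding and table-filling behaviour concrete, here's a toy model of a switch. It's a sketch under some loud assumptions: the `LearningSwitch` class, its tiny `capacity` and the `random_mac` helper are all made up for illustration, and a real switch also ages old entries out, which this ignores.

```python
import random

class LearningSwitch:
    """Toy model of MAC learning; real switches also age entries out."""

    def __init__(self, ports, capacity):
        self.ports = list(ports)
        self.capacity = capacity
        self.table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Learn the source address, then return the ports the frame is sent out of."""
        if src in self.table or len(self.table) < self.capacity:
            self.table[src] = in_port
        out = self.table.get(dst)
        if out is None:
            # Unknown destination: flood to every port except the one it arrived on.
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]

def random_mac(rng):
    return ":".join(f"{rng.randrange(256):02x}" for _ in range(6))

switch = LearningSwitch(ports=range(1, 7), capacity=8)
a, b, c = "58:55:aa:fb:cc:29", "58:55:ad:aa:ac:19", "58:55:ae:fe:e1:21"

first = switch.receive(1, a, b)  # b hasn't been learned yet, so this is flooded
reply = switch.receive(2, b, a)  # a was learned on port 1, so this goes there only
print(first, reply)

# Something on port 3 starts spraying frames with made-up addresses.
rng = random.Random(0)
flooded = 0
for _ in range(100):
    out = switch.receive(3, random_mac(rng), random_mac(rng))
    flooded += len(out) > 1
print(flooded, "of 100 junk frames flooded;", len(switch.table), "table entries")

# The table is now full, so a legitimate new host on port 4 is never learned
# and traffic addressed to it gets flooded as well.
switch.receive(4, c, a)
to_c = switch.receive(1, a, c)
print(c in switch.table, to_c)
```

The junk frames do damage twice over: each one is flooded because its destination doesn't exist, and each new source address eats a table entry that a real host could have used.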

Anyway, let’s recap. We know there’s something producing junk packets with junk MAC addresses and throwing them out on the network. We now need to track this device down and burn it (ideally after hitting it very hard with a hammer). Those screwy MAC address tables come to our rescue!

The exact procedure will be different for every switch, but pick a switch at random and log in. Then view the MAC address table. On Dell 7048s this is available from Switching->Address Tables->Dynamic Address Tables, select all rows. You should see one port which has all those weird broken MAC addresses assigned to it.

You now need to know a bit about your physical network. If that port is actually connected to a single host, then bingo, you’ve found the host generating all the crap traffic. If it’s connected to another switch, you need to repeat the procedure on that switch.
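The hop-by-hop procedure can be sketched in a few lines, assuming you've already got each switch's MAC table and a note of which ports are uplinks. Everything here is hypothetical: the switch names (`sw-core`, `sw-floor2`), the data layout and the `trace` function are invented for the example.

```python
def trace(mac, start, mac_tables, uplinks):
    """Follow a MAC address from switch to switch until it lands on a host port.

    mac_tables: switch name -> {port: set of MACs learned on that port}
    uplinks:    (switch name, port) -> name of the switch on the other end
    """
    switch, visited = start, set()
    while switch not in visited:
        visited.add(switch)
        port = next((p for p, macs in mac_tables[switch].items() if mac in macs), None)
        if port is None:
            return None          # this switch has never seen the address
        if (switch, port) not in uplinks:
            return switch, port  # not an uplink, so the culprit is plugged in here
        switch = uplinks[(switch, port)]
    return None                  # ended up back at a switch we'd already checked

# Hypothetical two-switch site, with one of the junk addresses from the capture.
junk = "0a:8c:00:00:0a:00"
mac_tables = {
    "sw-core": {1: {"58:55:aa:fb:cc:29"}, 6: {junk, "58:55:1c:1e:16:63"}},
    "sw-floor2": {3: {junk}, 4: {"58:55:1c:1e:16:63"}, 24: {"58:55:aa:fb:cc:29"}},
}
uplinks = {("sw-core", 6): "sw-floor2", ("sw-floor2", 24): "sw-core"}

print(trace(junk, "sw-core", mac_tables, uplinks))  # ('sw-floor2', 3)
```

Since the junk addresses change constantly, any one of them that's still in the tables will do: they should all lead back to the same port.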

That’s it, congrats! You’ve found the device, now remove and toast lightly over an open fire.

To automate the process of finding which port a MAC address is connected to, I’ve written a bunch of Perl scripts for Dell switches, which you can find here. They come almost completely without instructions, but will pull MAC address tables from your switches and let you search for a particular MAC address. I’ve scripted this to interrogate all switches at a site and dump their MAC tables to a file. Be warned, however: if your network is behaving strangely it may be difficult to access your switches remotely.