I recently wrote a post about the reasons folks should stop submitting their speed test results to their IT helpdesk, so I thought it might be useful for anyone out there on the IT side of things to know how I troubleshoot bandwidth-related complaints. Keep in mind, there are a ton of tools out there that might help with this kind of troubleshooting, but these two are my favorites, and they usually get the job done.
LibreNMS
LibreNMS is an open source (FREE!!!) SNMP network monitoring solution. It runs on Linux (more FREE!!! software) and pretty much just works out of the box. It’s useful for many more monitoring applications than bandwidth utilization, and can monitor most anything that speaks SNMP. The best thing (to me) about it is how simple it is to install and operate.
For bandwidth troubleshooting it’s a straightforward use case. I head to the management page of the network device for the office or network in question, find the appropriate port, and look at both historical and real-time bandwidth utilization graphs. The historical stuff is great, but I spend more time looking at “real-time” graphs in most situations.
I’m often combating a complaint that the office isn’t getting the bandwidth they think they should. In that case, I run a speed test from the network while looking at a real-time graph. If the graph plateaus around the same rate as their circuit is configured for, we know there are no issues with the circuit provisioning itself. Then I tend to look at the real-time graph on and off throughout the day to see if they’re going through periods of high utilization.
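For anyone curious about what sits behind a graph like that, here is a minimal sketch of the arithmetic, assuming two samples of an interface octet counter (the kind of value SNMP exposes as `ifHCInOctets`) taken a known number of seconds apart. The function names, sample values, and the 100 Mbit/s circuit are made up for illustration; this is not how LibreNMS itself is implemented.

```python
COUNTER64_MAX = 2**64  # size of a 64-bit SNMP counter


def rate_bps(octets_start, octets_end, seconds):
    """Average bits per second between two octet-counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # The modulo handles a single wrap of the 64-bit counter between samples.
    delta_octets = (octets_end - octets_start) % COUNTER64_MAX
    return delta_octets * 8 / seconds


def utilization(rate, provisioned_bps):
    """Fraction of the provisioned circuit rate that is in use."""
    return rate / provisioned_bps


# Hypothetical samples taken 10 seconds apart during a speed test,
# on a circuit assumed to be provisioned at 100 Mbit/s.
rate = rate_bps(1_000_000_000, 1_118_750_000, 10)
print(f"{rate / 1e6:.1f} Mbit/s, {utilization(rate, 100e6):.0%} of circuit")
```

If the computed rate during a speed test sits close to the provisioned rate, that matches the plateau described above: the circuit is delivering what it was configured for.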
Scrutinizer by Plixer
Scrutinizer is a NetFlow collector, which means the network device in question needs to support NetFlow and be configured to export flow data to Scrutinizer (most enterprise-class routers and firewalls support this). While I use the licensed version of their product, they do have a free version with limited history retention, which may be fine for some. The latest version of their software runs on a Linux-based virtual appliance, which is a breeze to install if you’re familiar with deploying VM templates.
Back to the example above. Now that I’ve used LibreNMS to confirm that the circuit is being saturated, I need to find out whether bandwidth misuse is in play or the office has simply outgrown its existing circuit. Scrutinizer is my go-to tool for this.
To see the specific conversations on the network, I run a report in Scrutinizer referred to by the software as “Pair Reports – Conversations Well Known Ports (WKP)”. There are a huge number of other report types available, but this is really the only one I need on a regular basis. After modifying the other parameters (bidirectional, rate, bits vs. percentage, etc.), I can easily see the conversations sorted from highest rate to lowest.
From here you may still have to do a bit of investigation. Let’s say I see that the majority of the bandwidth is between an external IP address/hostname that I don’t recognize and a single internal computer on the network over port 443. I’ll usually throw the unknown IP in the search bar at arin.net to see what organization it belongs to. Oftentimes it’s Microsoft (maybe OneDrive), Dropbox, some sort of video streaming service, etc. Even if I can’t pinpoint exactly what service that IP address belongs to, I know which computer on our network is involved, so I can start there to figure out what’s going on and report back to those who originally raised the concern or performance complaint.
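To make the idea of a conversation report concrete, here is a rough sketch of the grouping involved: flow records are bucketed by host pair and well-known port, both directions are added together, and the result is sorted by rate. The record layout (`src`, `dst`, `port`, `bytes`), the function name, and the sample addresses are all assumptions for illustration; this is not Scrutinizer's data model or API.

```python
from collections import defaultdict


def top_conversations(flows, window_seconds):
    """Rank host-pair conversations by average bits per second.

    Each flow is a dict with hypothetical keys: src, dst, port, bytes.
    """
    totals = defaultdict(int)
    for flow in flows:
        # Sort the endpoints so A->B and B->A count as one conversation.
        pair = tuple(sorted((flow["src"], flow["dst"])))
        totals[(pair, flow["port"])] += flow["bytes"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (pair, port, total_bytes * 8 / window_seconds)
        for (pair, port), total_bytes in ranked
    ]


# Made-up flow records covering a 60-second window.
flows = [
    {"src": "203.0.113.10", "dst": "192.168.1.25", "port": 443, "bytes": 600_000_000},
    {"src": "192.168.1.25", "dst": "203.0.113.10", "port": 443, "bytes": 150_000_000},
    {"src": "192.168.1.40", "dst": "198.51.100.7", "port": 53, "bytes": 75_000},
]

for pair, port, bps in top_conversations(flows, 60):
    print(f"{pair[0]} <-> {pair[1]} port {port}: {bps / 1e6:.2f} Mbit/s")
```

In this made-up data, the port 443 conversation between 192.168.1.25 and the unfamiliar external address lands at the top of the list, which is the kind of pair I would then look up at arin.net.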
Without tools like these, it’s virtually impossible to accurately troubleshoot bandwidth usage issues. Even worse, if you don’t know you have bandwidth misuse happening, it could lead you down the road of increasing bandwidth just to get relief. No company should be forking out extra recurring costs because employees are watching Netflix or the March Madness finals during the work day.
What about SolarWinds, WhatsUp Gold, PRTG, [insert similar product here]???
It’s absolutely true that there are a number of monitoring packages out there that can do both SNMP monitoring and NetFlow collection, along with a million other bells and whistles. If you use one of these and it works for you, that’s awesome. I’ve tried some of them, and I personally found that they do too many things, so much so that I spent far more time configuring the products than using them. On the NetFlow side of the house, they seemed unable to present the data in a way that was really useful to me. The focus was more on top-talker reports than on zeroing in on specific network conversations. After failing to justify the cost of these solutions in my own mind, I eventually settled on LibreNMS and Scrutinizer as my network monitoring workhorses.