As a full layer 7 firewall, Sophos UTM does a lot more than simply allow or drop traffic based on source or destination, as a traditional 'layer 4' firewall does.
Sophos UTM evaluates full HTTP GET requests. It determines whether the user or subnet has access, whether the site is restricted, whether it contains malware and whether its reputation has recently dropped, and it can even perform HTTPS inspection on the traffic. It makes decisions based on multiple factors, all while maintaining VPN connections, filtering spam, securing web applications, and performing many other roles.
As you can imagine, it takes some serious processing power to do all of this at scale.
If you are using a virtual appliance and your CPU comes under pressure, you could simply add more CPUs or reduce the CPU demand from other systems on the host. But if you are using a hardware device, managing resource usage is a little more challenging. For this reason, Sophos UTM CPU performance monitoring becomes a critical component in managing your Sophos UTM investment.
This article provides some tips on how to gain accurate, real-time visibility into Sophos UTM's resource usage, and how to reduce the CPU usage of Sophos UTM's Web Protection feature.
Over time, your configuration can grow more complex as you use the UTM in different ways. This in turn can lead to inefficient Web Protection rules that consume CPU resources unnecessarily. Once your UTM's CPU becomes congested, things start to go downhill: the UTM may behave erratically, and users could experience proxy connection delays and rejection errors. The problem is that simply looking at the UTM management console won't tell you what is happening.
The Resource Usage widget on the Dashboard has indicators for CPU, RAM and Disk. However, the CPU indicator shows an averaged value and only updates as often as the dashboard refresh interval you have set.
The other way to track CPU utilization on the UTM is the hardware log. Again, it is useful for an overall view, but it won't help you identify the sporadic peaks that could be causing issues. As with the Resource Usage widget, the graph is averaged over time, so short bursts get hidden in the noise.
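The following Python sketch shows why averaging hides bursts. The per-second readings are synthetic values invented for illustration, not measurements from a UTM. The five-minute window is an assumed graph resolution, not a documented one.

```python
from statistics import mean


def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


# Hypothetical five minutes of per-second CPU% readings: a steady 30%,
# except for a 20-second burst pinned at 100%.
samples = [30.0] * 300
samples[120:140] = [100.0] * 20

five_minute = window_averages(samples, 300)
print(f"peak per-second reading: {max(samples):.1f}%")
print(f"5-minute average:        {five_minute[0]:.1f}%")
```

The 20 seconds at 100% lift the five-minute average only to about 34.7%. On an averaged graph, that looks like a quiet system.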
In the graph below you can see a spike. For a single averaged data point to reach a maximum of 99.32%, the CPU must have been stuck at (or near) 100% for most of that averaging interval. The graph also shows a large drop in CPU usage immediately after the spike. In this case, the drop corresponds to when the UTM was rebooted after network users experienced and reported errors.
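The reasoning about the 99.32% data point can be made precise. Assume that each point on the graph is the arithmetic mean of n utilization samples x_i between 0 and 100 over its interval; the UTM's actual averaging method is not documented here. Define the headroom h_i = 100 - x_i, which is never negative. Markov's inequality then bounds how much of the interval could have had meaningful headroom:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 99.32,
\qquad h_i = 100 - x_i \ge 0,
\qquad \bar{h} = \frac{1}{n}\sum_{i=1}^{n} h_i = 100 - \bar{x} = 0.68

\frac{\#\{\, i : h_i \ge t \,\}}{n} \le \frac{\bar{h}}{t}
\quad\Longrightarrow\quad
\frac{\#\{\, i : x_i \le 90 \,\}}{n} \le \frac{0.68}{10} = 0.068 \qquad (t = 10)
```

With t = 10, at most 6.8% of the samples could have been at or below 90%. At least 93.2% of the interval therefore ran above 90% CPU.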
To get an accurate, real-time view of what the CPU is doing on the UTM, you are going to have to use the shell. First, enable shell access by following these steps:
The application we are going to use is called top. It is a Linux/Unix tool that shows running processes and system resource usage, much like Windows Task Manager. When you run top for the first time, you are presented with the screen below. All the useful information is there, but it may require a little explaining.
There are a number of CPU counters, but the labels aren't very helpful. For your convenience here's a more detailed explanation:
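top derives these counters from the kernel's cumulative tick counts in /proc/stat. The Python sketch below reproduces that arithmetic. It reads the aggregate cpu line twice, one second apart, and reports each state's share of the elapsed ticks under top's labels:

- us = user
- ni = nice
- sy = system
- id = idle
- wa = I/O wait
- hi = hardware interrupts
- si = software interrupts
- st = steal

The sketch assumes a Linux /proc/stat layout and a Python 3 interpreter, which the UTM appliance itself may not provide. The field order is the standard /proc/stat order, not something taken from Sophos documentation.

```python
import os
import time

# top's labels, in the order the first eight counters appear on the
# aggregate "cpu" line of /proc/stat. Guest time is already included in
# the user counter, so later fields are ignored to avoid double counting.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def read_cpu_ticks(path="/proc/stat"):
    """Return the first eight cumulative tick counters for all CPUs combined."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(v) for v in line.split()[1:9]]
    raise RuntimeError("no aggregate 'cpu' line found in " + path)


def cpu_percentages(before, after):
    """Share of elapsed ticks spent in each state between two snapshots."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas) or 1  # avoid dividing by zero if no ticks elapsed
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


# Only meaningful on a Linux host that exposes /proc/stat.
if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_ticks()
    time.sleep(1)  # one-second sampling interval
    for name, pct in cpu_percentages(first, read_cpu_ticks()).items():
        print(f"{name}: {pct:5.1f}%")
```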
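You might also notice a single process consuming 298% of the CPU. The explanation is that we are looking at a multi-core system running multi-threaded processes. By default, top reports a process's CPU usage relative to a single CPU, so a process whose threads run on several CPUs at once can exceed 100%. A reading of 298% means the process is using roughly three CPUs' worth of processing time.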
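Converting between top's per-process figure and the share of the whole machine is simple arithmetic. This sketch assumes top's default (Irix) mode, where 100% equals one fully busy CPU. The 4-CPU count in the example is hypothetical, so substitute the value for your own UTM.

```python
def cores_busy(process_cpu_pct):
    """top's default per-process %CPU: 100% equals one fully busy CPU."""
    return process_cpu_pct / 100.0


def share_of_total_capacity(process_cpu_pct, logical_cpus):
    """Fraction (0.0 to 1.0) of the whole machine's CPU capacity in use."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return process_cpu_pct / (logical_cpus * 100.0)


# Hypothetical example: the 298% process on an assumed 4-CPU appliance.
print(f"{cores_busy(298):.2f} CPUs busy")
print(f"{share_of_total_capacity(298, 4):.2%} of total capacity")
```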
Fortunately you can improve the display to make the resource usage statistics a little easier to read:
If your UTM is running low on CPU, you will notice spikes in the %us (user space) field. With the higher refresh rate, you may now see short bursts of high usage that could be the root cause of some of your issues.
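Spotting bursts by eye works, but the same idea can be written as a small function. This Python sketch takes a list of per-second %us readings, however you collect them. It reports each run that stays at or above a threshold for several consecutive samples. The 90% threshold and three-sample minimum are arbitrary, hypothetical defaults, not Sophos recommendations.

```python
def find_bursts(samples, threshold=90.0, min_len=3):
    """Return (start_index, length, peak) for every run of consecutive
    samples at or above `threshold` lasting at least `min_len` samples."""
    seq = list(samples)
    bursts = []
    start = None
    # A trailing -inf sentinel closes any run still open at the end.
    for i, value in enumerate(seq + [float("-inf")]):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            length = i - start
            if length >= min_len:
                bursts.append((start, length, max(seq[start:i])))
            start = None
    return bursts


# Hypothetical per-second %us readings.
us = [20, 25, 95, 97, 99, 30, 92, 93, 20, 96, 98, 97, 99]
print(find_bursts(us))
```

The two-second run at indices 6 and 7 is ignored because it is shorter than `min_len`. Raising or lowering that value trades sensitivity for noise.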
The simplest way to confirm how many CPUs are actually available to the UTM is to run lscpu:
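If you want to script this check, the sketch below parses the CPU(s): line from lscpu's default output. It falls back to Python's os.cpu_count() when lscpu is not installed. The function names are hypothetical, and the parser assumes lscpu prints English labels.

```python
import os
import shutil
import subprocess


def parse_lscpu_cpu_count(text):
    """Extract the value of the 'CPU(s):' line from lscpu's default output."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Exact match, so lines such as 'On-line CPU(s) list:' are skipped.
        if sep and key.strip() == "CPU(s)":
            return int(value.strip())
    raise ValueError("no 'CPU(s):' line in lscpu output")


def logical_cpu_count():
    """Prefer lscpu when it is installed; otherwise use os.cpu_count()."""
    if shutil.which("lscpu"):
        out = subprocess.run(["lscpu"], capture_output=True, text=True,
                             check=True).stdout
        return parse_lscpu_cpu_count(out)
    return os.cpu_count()
```

On the UTM shell, calling logical_cpu_count() should agree with the CPU(s): value you see when you run lscpu by hand.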
Generally speaking, the highest load will be generated by the HTTP proxy, which usually correlates with settings in the Web Protection module. Key settings that can have a direct impact on CPU usage include single or dual malware engine scanning, HTTPS inspection, and exclusions.
Now that you have an accurate view of what the CPU is doing, you can make changes to the system and watch the effect.
I used Fastvue Sophos Reporter's Web Protection dashboard to see which Web Protection policies had the most matched hits. Changing the policies that handle the majority of traffic is a great place to start if you want the largest improvement in CPU usage.
In this particular case the issue was simply the ordering of the filter profiles. I noticed there were Web filter profiles higher up the list than the profile being used by the bulk of the connections. This meant that the traffic was evaluated against multiple profiles before the matching profile was used.
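A simple cost model shows why ordering matters. Assume that profiles are checked top-down until one matches and that every check costs roughly the same. The average number of checks per request is then the hit-weighted position of the matching profile. The profile names and hit counts below are hypothetical. In practice you would take the counts from a report such as the Fastvue dashboard mentioned above.

```python
def expected_evaluations(order, hits):
    """Average 1-based position of the matching profile, weighted by hits,
    assuming first-match evaluation and equal cost per check."""
    total = sum(hits[name] for name in order)
    return sum((pos + 1) * hits[name] for pos, name in enumerate(order)) / total


# Hypothetical profile names and matched-hit counts.
hits = {"Finance": 500, "IT": 1_500, "Guest": 3_000, "Staff": 95_000}
original = ["Finance", "IT", "Guest", "Staff"]
by_hits = sorted(original, key=hits.get, reverse=True)

print(f"original order: {expected_evaluations(original, hits):.3f} checks/request")
print(f"sorted by hits: {expected_evaluations(by_hits, hits):.3f} checks/request")
```

Reordering is only safe when the profiles do not overlap. If a request could match more than one profile, moving a profile up changes which policy applies to that request, not just how quickly it is found.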
Simply by reorganizing the profiles I was able to significantly reduce the amount of CPU that was required. The system stabilized and has been far more robust from a user browsing perspective.
When I checked again the following day, the UTM was running nicely and smoothly with lower overall CPU usage. The improvement was large enough to show up even on the averaged CPU graph!
Using the right tools to manage Sophos UTM allows you to extract the most value from it. Sophos UTM's management console is a great interface and Sophos have managed to put a lot of pertinent information in there. However, if you need more real-time or more detailed information, it is great to know you can turn to the shell and Fastvue.
Hopefully this article has provided a few Sophos UTM CPU performance monitoring tips, and shown how the right combination of tools can lead to a faster resolution of any performance issues you might experience on Sophos UTM.