Finally! Useful Web Browsing Activity Reports.
Introducing Fastvue Site Clean
The modern web is a messy place.
You think you’re simply reading the latest news headline in your browser, but behind the scenes, your browser is actually visiting dozens of weird and wonderful sites such as akamaihd.net, doubleclick.net, google-analytics.com, and even facebook.com thanks to that handy ‘Like’ button on the page.
A typical website consists of advertising banners, site analytic scripts and social sharing widgets, and the site’s content itself is served from content delivery networks (CDNs).
Of course, no one ever visits these websites directly, yet they dominate the ‘Top Sites’ in all the reports from your web gateway or firewall.
Simply aggregating the website domains from your web gateway’s log files no longer tells us anything about web browsing activity!
But if we dig a little deeper and look at other characteristics of web browsing data, a more accurate picture of web browsing activity can be determined. And that is what Fastvue Site Clean is all about.
Now available in Fastvue Sophos Reporter (2.0) and Fastvue TMG Reporter (3.0 Beta).
How does it work?
All the messy web resources and domains that come along for the ride when visiting a web page have certain characteristics. Fastvue Site Clean looks for these characteristics by examining other log fields apart from the URL, such as Referrer URLs, Mime Types and Categories.
This technique is further backed up by a comprehensive, regularly updated list of CDNs (aka Domain Substitutes) and Junk URLs, so that Fastvue Site Clean can calculate the Origin Domain of every URL in your web log files.
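To make the idea concrete, here is a minimal Python sketch of how an Origin Domain could be worked out for a single log record. It is not Fastvue's actual implementation: the function names, the sample lists, and the order of the checks are all assumptions made for illustration, and the rule that treats a text/html response as the page itself is a deliberate simplification.

```python
from urllib.parse import urlsplit

# Hypothetical sample lists; real definitions come from the Site Clean service.
DOMAIN_SUBSTITUTES = {"fbcdn.net": "facebook.com"}
JUNK_DOMAINS = {"doubleclick.net", "google-analytics.com"}


def host_of(url):
    """Return the lower-cased host of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def origin_domain(url, referrer=None, mime_type=None):
    """Guess the site a person actually visited for one log record."""
    host = host_of(url)
    # 1. Known junk is dropped from reports entirely.
    if any(matches(host, junk) for junk in JUNK_DOMAINS):
        return None
    # 2. Known CDNs are mapped to the site they serve.
    for cdn, site in DOMAIN_SUBSTITUTES.items():
        if matches(host, cdn):
            return site
    # 3. A non-HTML resource with a referrer is assumed to belong to the
    #    referring page's site (simplification: HTML is treated as a page).
    if referrer and mime_type != "text/html":
        return host_of(referrer) or host
    # 4. Not enough information: fall back to the logged domain, uncleaned.
    return host
```

With these sample lists, an image requested from a CDN with a referrer of http://news.example.com/story is attributed to news.example.com, while a request to www.doubleclick.net is discarded.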
How do I use it?
The results of Fastvue Site Clean appear throughout the Reports, Dashboards and Alerts in any Fastvue application (currently Fastvue TMG Reporter or Fastvue Sophos Reporter).
Fastvue Site Clean Web Service
One of the great features of Fastvue Site Clean is that it incorporates a web service to automatically update your server with the latest definitions of CDNs (Domain Substitutes) and Junk URLs every day.
Fastvue’s web crawler regularly scans thousands of websites to identify CDNs, widget URLs, advertising, common API calls and more, and pushes these discoveries to your Fastvue Reporter server.
Customizing Fastvue Site Clean
Even with Fastvue Site Clean, there are a number of reasons why a CDN, widget, or advertising domain may still appear in your reports (see ‘The Uncleanable’ section below). Fortunately, there is an easy way to customize Fastvue Site Clean with your own individual discoveries.
Although not in the current release, you will also have the option of contributing your Domain Substitutes and Junk URL discoveries back to the Fastvue Site Clean service so that all Fastvue customers can benefit from each other's discoveries.
Editing Domain Substitutes
If you see a website in your reports that needs to be represented differently (for example, converting a CDN such as fbcdn.net into a more recognisable domain such as facebook.com), you can add it to your own list of Domain Substitutes. This is done in Settings | Site Clean | Domain Substitutes.
Note: Wildcards are accepted, such as *.fbcdn.net or fbstatic*.akamaihd.net.
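As a rough illustration of how wildcard Domain Substitutes behave, the sketch below matches host names against patterns using Python's standard fnmatch module. The pattern list and the substitute function are hypothetical, and fnmatch also treats ? and [ ] as special characters, which Fastvue's own matching may not do.

```python
from fnmatch import fnmatchcase

# Hypothetical entries in the style of the note above: (pattern, substitute).
SUBSTITUTES = [
    ("*.fbcdn.net", "facebook.com"),
    ("fbstatic*.akamaihd.net", "facebook.com"),
]


def substitute(host):
    """Return the substitute for the first matching pattern, else the host."""
    host = host.lower()
    for pattern, site in SUBSTITUTES:
        if fnmatchcase(host, pattern):
            return site
    return host
```

Under this sketch, static.xx.fbcdn.net and fbstatic-a.akamaihd.net both become facebook.com, while the bare domain fbcdn.net is left alone because *.fbcdn.net requires at least a dot before the domain.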
Editing Junk URLs
If you see a website that you would prefer to remove from your reports entirely, you can add the domain or URL to the list of Junk URLs. This can be done in Settings | Site Clean | Junk URLs.
Note: An implicit wildcard is added to the end of the URL or domain specified. For example, www.doubleclick.com will match www.doubleclick.com/advertising/. However, wildcards are NOT currently accepted within the Junk URL itself (e.g. *.doubleclick.com).
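The implicit trailing wildcard amounts to a prefix match. The sketch below shows one way that could look in Python. The is_junk function and its handling of the URL scheme and letter case are assumptions for illustration, not Fastvue's implementation.

```python
def strip_scheme(url):
    """Drop a leading 'http://' or 'https://' and lower-case the rest."""
    return url.split("://", 1)[-1].lower()


def is_junk(url, junk_urls):
    """True if the URL starts with any Junk URL entry (implicit trailing wildcard)."""
    target = strip_scheme(url)
    return any(target.startswith(strip_scheme(entry)) for entry in junk_urls)


# The entry from the note matches anything beneath it...
print(is_junk("http://www.doubleclick.com/advertising/", ["www.doubleclick.com"]))
# ...but a wildcard inside the entry is compared literally, so it never matches.
print(is_junk("http://ads.doubleclick.com/banner", ["*.doubleclick.com"]))
```

The first call prints True and the second prints False, which mirrors the behaviour described in the note: entries match by prefix only, and an asterisk inside an entry has no special meaning.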
Things to Know
Fastvue Site Clean’s goal is to make web reporting simpler, and better reflect what is actually happening on the web from a human perspective. However, there is potential for confusion to arise in certain situations.
Categories and Productivity
Web Categories are assigned to individual web resources by your web gateway / UTM. A single site can therefore be categorised multiple ways due to the different types of web resources on the page. For example, the image to the right shows all the web resources from stackoverflow.com along with their category such as Web Ads, Internet Services, Technical Information, Business and Media Sharing.
You may therefore see the same site appear in both the ‘Unproductive browsing by site’ and the ‘Productive browsing by site’ sections of your Overview Reports. Although this may seem confusing at first, you can click the ‘Clean (off)’ option to view the individual site domains that were categorised as unproductive or productive.
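The effect is easy to reproduce with a few made-up records. In the Python sketch below, the resource domains, the productivity ratings given to each category, and the sites_by_productivity function are all hypothetical. Only the category names come from the stackoverflow.com example above.

```python
from collections import defaultdict

# Hypothetical log records: (origin domain, resource domain, category).
RECORDS = [
    ("stackoverflow.com", "stackoverflow.com", "Technical Information"),
    ("stackoverflow.com", "ads.example.net", "Web Ads"),
    ("stackoverflow.com", "cdn.example.net", "Internet Services"),
]

# Hypothetical productivity policy; yours is set in your own configuration.
PRODUCTIVITY = {
    "Technical Information": "productive",
    "Internet Services": "productive",
    "Web Ads": "unproductive",
}


def sites_by_productivity(records, clean=True):
    """Group sites by productivity, by origin domain (clean) or raw domain."""
    result = defaultdict(set)
    for origin, resource, category in records:
        site = origin if clean else resource
        result[PRODUCTIVITY.get(category, "unrated")].add(site)
    return {rating: sorted(sites) for rating, sites in result.items()}
```

With clean=True, stackoverflow.com is listed under both productive and unproductive, because its ad resources roll up to the same Origin Domain. With clean=False, the unproductive entry is revealed as ads.example.net.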
The Uncleanable
Fastvue Site Clean depends on information in log data to determine the Origin Domain of a web resource. If there is not enough information in the log data to make this determination, the normal site domain is displayed, uncleaned.
The most common reason for an uncleaned or junky-looking site ending up in your reports is that the Referrer URL was not logged. This is the case for HTTPS sites, where the referrer header is encrypted and not visible to the gateway. The known list of CDNs (Domain Substitutes) can help pick up the slack here.
Fastvue Site Clean also works best when the full URL is logged. For example, Fastvue Site Clean can detect Facebook ‘Like’ buttons by looking for URLs such as this:
However, when browsing HTTPS sites, only the site’s domain is logged. For example:
Fastvue Site Clean will therefore not detect a Facebook ‘Like’ button in this situation. To improve this, try enabling HTTPS Inspection in your Web Gateway or UTM so that the full URL is logged as though it were a normal resource transmitted over HTTP.
Fortunately, many CDNs and junk can be identified by the domain alone, and can be cleaned using the known list of Domain Substitutes and Junk URLs.