Finally! Useful Web Browsing Activity Reports.

Introducing Fastvue Site Clean

The modern web is a messy place.

You think you’re simply reading the latest news headline in your browser, but behind the scenes, your browser is actually visiting dozens of weird and wonderful sites such as akamaihd.net, doubleclick.net, google-analytics.com, and even facebook.com thanks to that handy ‘Like’ button on the page.

A typical web page is made up of advertising banners, site analytics scripts and social sharing widgets, and even the site’s own content is often served from content delivery networks (CDNs).

Of course, no one ever visits these websites directly, yet they dominate the ‘Top Sites’ in every report from your web gateway or firewall.

Simply aggregating the website domains from your web gateway’s log files no longer tells us anything meaningful about web browsing activity!

But if we dig a little deeper and look at other characteristics of web browsing data, a more accurate picture of web browsing activity can be determined. And that is what Fastvue Site Clean is all about.

Now available in Fastvue Sophos Reporter (2.0) and Fastvue TMG Reporter (3.0 Beta).

Top Sites With and Without Fastvue Site Clean

How does it work?

All the messy web resources and domains that come along for the ride when you visit a web page have certain characteristics. Fastvue Site Clean looks for these characteristics by examining log fields other than the URL, such as Referrer URLs, MIME Types and Categories.

This technique is backed up by a comprehensive, regularly updated list of CDNs (aka Domain Substitutes) and Junk URLs, so that Fastvue Site Clean can calculate the Origin Domain of every URL in your web log files.
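To make the idea concrete, here is a minimal sketch of how an Origin Domain could be derived from a single log entry. It is not Fastvue’s actual algorithm: it assumes that HTML documents are pages the user navigated to, that any other resource belongs to the site named in its Referrer URL, and that a small, hypothetical Domain Substitutes table is the fallback when no referrer was logged. The two-label “registered domain” helper is deliberately naive and gets suffixes such as .co.uk wrong.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical sample of Domain Substitutes (CDN domain -> origin domain).
# The real list is maintained and updated by the Fastvue Site Clean service.
DOMAIN_SUBSTITUTES = {
    "fbcdn.net": "facebook.com",
    "ytimg.com": "youtube.com",
    "googlevideo.com": "youtube.com",
}


def host_of(url: str) -> str:
    """Return the lowercase host name of a logged URL."""
    return (urlsplit(url).hostname or "").lower()


def registered_domain(host: str) -> str:
    """Naively keep the last two labels (wrong for suffixes such as .co.uk)."""
    return ".".join(host.split(".")[-2:])


def origin_domain(url: str, referrer: Optional[str], mime_type: str) -> str:
    """Guess the Origin Domain of one logged request (assumed logic)."""
    domain = registered_domain(host_of(url))
    # 1. Assumption: an HTML document is a page the user actually visited.
    if mime_type.startswith("text/html"):
        return DOMAIN_SUBSTITUTES.get(domain, domain)
    # 2. Other resources belong to the page named in the Referrer URL.
    if referrer:
        return registered_domain(host_of(referrer))
    # 3. No referrer: use the substitutes list, otherwise leave it uncleaned.
    return DOMAIN_SUBSTITUTES.get(domain, domain)
```

For example, a script from static.ak.fbcdn.net loaded with a referrer of www.example.com/news/story is attributed to example.com, while a thumbnail from i.ytimg.com with no referrer falls back to youtube.com via the substitutes table.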

How do I use it?

The results of Fastvue Site Clean appear throughout the Reports, Dashboards and Alerts in any Fastvue application (currently Fastvue TMG Reporter and Fastvue Sophos Reporter).

Dashboards

The Top Websites shown in your Live Dashboard are automatically cleaned. You should notice a lack of advertising, CDNs, and so on.

Top Sites Clean

Overview Reports

Fastvue Site Clean is most noticeable in Overview and User Overview Reports. All sections that show a list of web sites now have options to show Clean (on), Clean (off) and Show Both.

As web gateways block and categorize web traffic at the individual web resource level (think an image or ad on a page, rather than the page itself), we felt it was important to be able to see the domain that was actually blocked or categorized by your web gateway, in addition to the ‘cleaned’ Origin Domain determined by Fastvue Site Clean.

Fastvue Site Clean In Overview Reports

Activity Reports

Viewing a chronological list of when, and for how long, websites were visited is now much easier and much more useful. Green Gantt-chart-style bars embedded in the Activity Reports give an easy visual indication of the start time, end time and total browsing time spent on each Origin Domain for each hour of the day.
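The post does not say how browsing time is measured, so the following is only a sketch under an assumed rule: requests are bucketed by Origin Domain and clock hour, each bar runs from the first to the last request in the bucket, and the gap between consecutive requests counts as browsing time only when it is shorter than a hypothetical IDLE_GAP threshold.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical idle threshold: longer gaps are not counted as browsing time.
IDLE_GAP = timedelta(minutes=5)

Bar = Tuple[datetime, datetime, timedelta]


def hourly_bars(hits: Iterable[Tuple[datetime, str]]) -> Dict[Tuple[str, datetime], Bar]:
    """Map (origin_domain, hour) to (start, end, total browsing time)."""
    buckets: Dict[Tuple[str, datetime], List[datetime]] = defaultdict(list)
    for ts, domain in hits:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(domain, hour)].append(ts)

    bars: Dict[Tuple[str, datetime], Bar] = {}
    for key, stamps in buckets.items():
        stamps.sort()
        total = timedelta()
        # Sum only the short gaps between consecutive requests.
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev <= IDLE_GAP:
                total += cur - prev
        bars[key] = (stamps[0], stamps[-1], total)
    return bars
```

With requests to producthunt.com at 9:00, 9:03 and 9:30, the 9am bar would start at 9:00, end at 9:30 and show 3 minutes of browsing, because the 27-minute gap exceeds the threshold. Gaps that cross an hour boundary are not counted in this simplified version.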

Fastvue Site Clean Activity Report

When running an Activity Report on a user, department or other similar object, the Origin Domain determined by Fastvue Site Clean will group all the individual web resources. When you click on a row in this report, you see all the individual web resources that are now grouped under the single Origin Domain, including CDNs, advertising, site analytics scripts and more.
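The drill-down behaviour described above amounts to grouping log records by their Origin Domain. A minimal sketch, assuming each record is a dict with hypothetical url and origin_domain keys (cdn.example.net is likewise a made-up domain):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_origin(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each Origin Domain to the URLs of the resources grouped under it.

    Records are assumed to carry hypothetical 'url' and 'origin_domain'
    keys; URLs keep the order in which they were logged.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[record["origin_domain"]].append(record["url"])
    return dict(groups)


log = [
    {"url": "https://www.producthunt.com/", "origin_domain": "producthunt.com"},
    {"url": "https://cdn.example.net/app.js", "origin_domain": "producthunt.com"},
    {"url": "https://www.google-analytics.com/analytics.js", "origin_domain": "producthunt.com"},
    {"url": "https://news.example.org/", "origin_domain": "example.org"},
]
rows = group_by_origin(log)
```

Each key becomes one row in the report, and the list under it is what you would see after clicking that row.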

Fastvue Site Clean Activity Report Showing Full Details About ProductHunt.com

Origin Domain Filter

Need to run a report on who accessed a particular site? Unless you know all the CDNs or ‘sister sites’ that the website uses, you may not be reporting on all the data you need to. For example, to really report on youtube.com you would also need to include ytimg.com, googlevideo.com, youtube-nocookie.com, youtubeeducation.com and youtu.be in your report’s filter.

Now with the Origin Domain field, you can simply enter Origin Domain ‘Equal to’ youtube.com and be confident that the report is built from all data that originated from youtube.com, such as the streaming video content from googlevideo.com, the thumbnail images from ytimg.com and so on.
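The difference between the two approaches can be sketched in a few lines. The manual list below is the one from the paragraph above; the records, user names and the domain newcdn.example are hypothetical, standing in for a CDN that the manual list does not know about.

```python
from typing import Iterable, List, Set

# The manual domain list from the paragraph above.
YOUTUBE_DOMAINS = {"youtube.com", "ytimg.com", "googlevideo.com",
                   "youtube-nocookie.com", "youtubeeducation.com", "youtu.be"}


def filter_by_domains(records: Iterable[dict], domains: Set[str]) -> List[dict]:
    """Old approach: match the logged domain against a hand-maintained list."""
    return [r for r in records if r["domain"] in domains]


def filter_by_origin(records: Iterable[dict], origin: str) -> List[dict]:
    """Origin Domain 'Equal to' filter: one condition covers every resource."""
    return [r for r in records if r["origin_domain"] == origin]


log = [
    {"user": "alice", "domain": "youtube.com", "origin_domain": "youtube.com"},
    {"user": "alice", "domain": "googlevideo.com", "origin_domain": "youtube.com"},
    {"user": "bob", "domain": "newcdn.example", "origin_domain": "youtube.com"},
    {"user": "carol", "domain": "example.org", "origin_domain": "example.org"},
]
```

Here the hand-maintained list finds two YouTube records and misses bob entirely, while the Origin Domain filter finds all three.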

Origin Domain Filter

Fastvue Site Clean Web Service

One of the great features of Fastvue Site Clean is that it incorporates a web service to automatically update your server with the latest definitions of CDNs (Domain Substitutes) and Junk URLs every day.

Fastvue’s web crawler regularly scans thousands of websites to identify CDNs, widget URLs, advertising, common API calls and more, and pushes these discoveries to your Fastvue Reporter server.

Fastvue Site Clean Web Service - Domain Substitutes

Customizing Fastvue Site Clean

Even with Fastvue Site Clean, there are a number of reasons why a CDN, widget, or advertising domain may still appear in your reports (See ‘The Uncleanable’ section below). Fortunately, there is an easy way to customize Fastvue Site Clean with your own individual discoveries.

Although it is not in the current release, you will also have the option of contributing your Domain Substitutes and Junk URL discoveries back to the Fastvue Site Clean service, so that all Fastvue customers can benefit from each other’s discoveries.

Editing Domain Substitutes

If you see a website in your reports that needs to be represented differently (for example, converting a CDN such as fbcdn.net into a more recognisable domain such as facebook.com), you can add it to your own list of Domain Substitutes. This is done in Settings | Site Clean | Domain Substitutes.

Note: Wildcards are accepted, such as *.fbcdn.net or fbstatic*.akamaihd.net.
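A minimal sketch of how such wildcard patterns could be matched against a logged host name, using Python’s shell-style fnmatchcase. The exact matching rules Fastvue applies are not documented here; this assumes case-insensitive matching and that * can span dots, so *.fbcdn.net covers nested subdomains but not the bare fbcdn.net.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Example custom substitutes, using the patterns from the note above.
CUSTOM_SUBSTITUTES = [
    ("*.fbcdn.net", "facebook.com"),
    ("fbstatic*.akamaihd.net", "facebook.com"),
]


def substitute(host: str) -> Optional[str]:
    """Return the substituted domain for a host, or None if nothing matches."""
    host = host.lower()  # assumption: host matching is case-insensitive
    for pattern, target in CUSTOM_SUBSTITUTES:
        # fnmatchcase treats '*' as "any characters", including dots.
        if fnmatchcase(host, pattern):
            return target
    return None
```

For example, scontent-a.xx.fbcdn.net and fbstatic-a.akamaihd.net both become facebook.com, while an unrelated Akamai host such as a248.e.akamaihd.net is left alone.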

Fastvue Site Clean - Domain Substitutes for Facebook.com

Editing Junk URLs

If you see a website that you would prefer to remove from your reports entirely, you can add the domain or URL to the list of Junk URLs. This can be done in Settings | Site Clean | Junk URLs.

Note: An implicit wildcard is added to the end of the URL or domain specified. For example, www.doubleclick.com will match www.doubleclick.com/advertising/. However, wildcards are NOT currently accepted within the Junk URL itself (e.g. *.doubleclick.com).
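An implicit trailing wildcard behaves like a plain prefix match. The sketch below assumes the scheme (http:// or https://) is ignored and matching is case-insensitive; the list entries are hypothetical examples. Because entries are compared as literal text, an entry such as *.doubleclick.com would only match URLs that literally start with an asterisk, which is why such wildcards have no useful effect.

```python
from typing import List

# Hypothetical Junk URL entries.
JUNK_URLS: List[str] = ["www.doubleclick.com", "www.facebook.com/plugins/"]


def strip_scheme(url: str) -> str:
    """Drop a leading 'http://' or 'https://' and lowercase the rest."""
    return url.split("://", 1)[-1].lower()


def is_junk(url: str) -> bool:
    """True if the URL starts with any Junk URL entry (implicit trailing '*')."""
    target = strip_scheme(url)
    # Entries are plain prefixes; any '*' inside an entry is compared literally.
    return any(target.startswith(entry.lower()) for entry in JUNK_URLS)
```

So http://www.doubleclick.com/advertising/ is treated as junk, but http://ad.doubleclick.com/x is not, because it does not begin with the www.doubleclick.com prefix.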

Fastvue Site Clean - Junk URLs for Facebook.com

Things to Know

Fastvue Site Clean’s goal is to make web reporting simpler, and better reflect what is actually happening on the web from a human perspective. However, there is potential for confusion to arise in certain situations.

Categories and Productivity

Web Categories are assigned to individual web resources by your web gateway / UTM, so a single site can be categorised in multiple ways depending on the types of web resources on the page. For example, the image to the right shows all the web resources from stackoverflow.com along with their categories, such as Web Ads, Internet Services, Technical Information, Business and Media Sharing.

You may therefore see the same site appear in both the ‘Unproductive browsing by site’ and ‘Productive browsing by site’ sections of your Overview Reports. Although this may seem confusing at first, you can click the ‘Clean (off)’ option to view the actual domain that was categorised as unproductive or productive.
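A short sketch shows why this happens. Categories arrive per resource, but the report lists sites by Origin Domain, so one origin can land in both lists. The categories below are the stackoverflow.com examples from the paragraph above; which categories count as unproductive is a hypothetical policy chosen for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical productivity policy: which categories count as unproductive.
UNPRODUCTIVE = {"Web Ads", "Media Sharing"}


def productivity_by_site(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Sort (origin_domain, category) pairs into productive/unproductive sites."""
    result: Dict[str, Set[str]] = {"productive": set(), "unproductive": set()}
    for origin, category in records:
        key = "unproductive" if category in UNPRODUCTIVE else "productive"
        result[key].add(origin)
    return result


records = [
    ("stackoverflow.com", "Technical Information"),
    ("stackoverflow.com", "Web Ads"),
    ("stackoverflow.com", "Internet Services"),
]
sites = productivity_by_site(records)
```

Here stackoverflow.com ends up in both sets: once because of its Technical Information pages and once because of an ad served on those pages.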

Activity Report showing all the web resources from Stackoverflow.com

The ‘Uncleanable’

Fastvue Site Clean depends on information in log data to determine the Origin Domain of a web resource. If there is not enough information in the log data to make this determination, the normal site domain is displayed – uncleaned.

Blank Referrer

The most common reason for an uncleaned or junky-looking site ending up in your reports is that the Referrer URL was not logged. This is the case for HTTPS sites, because the Referer header travels inside the encrypted connection and is not visible to your web gateway unless HTTPS Inspection is enabled. The known list of CDNs (Domain Substitutes) can help pick up the slack here.

Truncated URLs

Fastvue Site Clean also works best when the full URL is logged. For example, Fastvue Site Clean can detect Facebook ‘Like’ buttons by looking for URLs such as this one:

http://www.facebook.com/plugins/likebox.php

However, when browsing HTTPS sites, only the site’s domain is logged. For example:

https://www.facebook.com

Fastvue Site Clean will therefore not detect a Facebook ‘Like’ button in this situation. To improve this, try enabling HTTPS Inspection in your web gateway or UTM so that the full URL is logged, just as it would be for a normal resource transmitted over HTTP.
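The effect of a truncated URL is easy to see in a small sketch. The check below is a hypothetical rule, not Fastvue’s: it flags a request as a Facebook plugin only when the host is facebook.com (or a subdomain) and the path starts with /plugins/. Given only the domain, as logged for HTTPS without inspection, there is no path to test.

```python
from urllib.parse import urlsplit


def looks_like_facebook_plugin(logged_url: str) -> bool:
    """Hypothetical rule: a facebook.com host plus a /plugins/ path."""
    parts = urlsplit(logged_url)
    host = (parts.hostname or "").lower()
    is_facebook = host == "facebook.com" or host.endswith(".facebook.com")
    return is_facebook and parts.path.startswith("/plugins/")
```

The full URL http://www.facebook.com/plugins/likebox.php is recognised, while the domain-only entry https://www.facebook.com is not, because its path is empty.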

Fortunately, many CDNs and junk domains can be identified by the domain alone, and can be cleaned using the known lists of Domain Substitutes and Junk URLs.

Contact Fastvue Support

Have a question about Fastvue Site Clean?

We're here to help! Let us know if you have any issues or questions!