Do you know what portion of your traffic comes from bots? Do you know which bots are good and which ones are bad? Do you know what the bad bots are up to?
We all know Googlebot and consider it a good bot. It crawls your site and makes it possible for others to find you via Google. Great! What do you know about Mars? No, not the red planet – Mars is one of the top bad bots that’s probably going through your site right now. What about all the other bad bots? In the simplest case, they could be consuming your bandwidth, using up your server resources, and increasing your monthly bill. They could also be stealing your content, modifying your pages to include viruses, or performing other malicious acts.
But while some bad bots are easy to spot, there are a lot more bad bots pretending to be regular human-controlled browsers, making them very hard to detect. This is where Access Watch comes into play. Access Watch, a startup from Berlin, deploys the industry’s most precise robot intelligence that can be easily plugged into any existing data pipeline, such as those handling web server logs.
Web server logs typically contain the following information:
– IP Address of the client
– Request URL
– User agent
– HTTP protocol version
– HTTP method, among other fields
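To make the fields above concrete, here is a minimal Python sketch that pulls them out of a single log line. It assumes the widely used "combined" access log format; the regular expression, the `parse_line` helper, and the sample line are illustrative and would need adjusting for a custom log format.

```python
import re

# Pattern for the "combined" access log format (an assumption; adapt it
# if your web server is configured with a different format).
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None

# Made-up sample line using a documentation-only IP address.
sample = ('203.0.113.7 - - [10/Oct/2017:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"')

record = parse_line(sample)
print(record["ip"], record["method"], record["url"], record["protocol"])
```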
Web server logs are often analyzed purely to get web access statistics – most popular pages, top countries, etc. Sometimes they are enriched with GeoIP information to learn a bit more about visitors. Using a threat intelligence database, we can figure out a lot more about our “visitors”, some of which are not visitors at all, but malicious bots. Some IP addresses are known to spread viruses or are abused to carry out attacks. Many attacks have a typical fingerprint – a combination of URL, header fields, user agent, and an IP address from a blacklisted server. The relevant information changes frequently, so accurate classification requires real-time access to a threat intelligence database. Access Watch REVEAL identifies good and malicious web traffic and provides this information via an HTTP API.
Enriched web server log with request reputation and threat analysis from Access Watch API call.
To get accurate threat intelligence, all we need to do is call the Access Watch API with information from our web server logs, and then store the enriched logs so we can visualize and analyze the malicious traffic. To make this super simple, Logagent users can use the new Access Watch plugin to perform security and traffic analysis and store the results in Sematext Cloud or anywhere else (e.g. their own Elasticsearch cluster) for further analysis, visualization, etc.
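The enrichment step itself is simple to picture. The Python sketch below is not the Access Watch API or the Logagent plugin: `enrich`, `fake_lookup`, and the `threat`, `robot`, and `reputation` field names are hypothetical stand-ins, and the lookup rule is invented purely so the example runs without a network call. In practice the lookup would be a request to the threat intelligence service.

```python
def enrich(record, lookup):
    """Return a copy of a parsed log record with threat data merged in.

    `lookup` is any callable that takes (ip, user_agent) and returns a dict.
    """
    enriched = dict(record)  # leave the original record untouched
    enriched["threat"] = lookup(record["ip"], record["user_agent"])
    return enriched

def fake_lookup(ip, user_agent):
    # Hypothetical stand-in for a real threat intelligence call;
    # the rule and the returned fields are invented for illustration.
    is_bot = "bot" in user_agent.lower()
    return {"robot": is_bot, "reputation": "suspicious" if is_bot else "nice"}

record = {"ip": "203.0.113.7", "url": "/wp-login.php",
          "user_agent": "ExampleBot/1.0"}
enriched = enrich(record, fake_lookup)
print(enriched)
```

Passing the lookup in as a parameter keeps the enrichment logic independent of any particular service, which also makes it easy to test.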
Visualisation of bot traffic with bad reputation
By combining real-time security analysis of logs with alerting and ChatOps integration, you can receive real-time alerts about malicious traffic and take countermeasures such as blocking specific clients. Another obvious application of the data gained via Access Watch is excluding all bot traffic before analyzing website traffic, which yields more accurate statistics.
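As a rough illustration of the second use case, the Python sketch below computes the most popular pages after dropping every record flagged as a robot. The nested `threat` / `robot` fields and the sample records are assumptions made for the example, not the actual schema of the enriched logs.

```python
from collections import Counter

def top_pages(records, n=3):
    """Count page hits, ignoring any record flagged as a robot."""
    human = (r for r in records
             if not r.get("threat", {}).get("robot", False))
    return Counter(r["url"] for r in human).most_common(n)

# Made-up enriched records; field names are hypothetical.
records = [
    {"url": "/pricing", "threat": {"robot": False}},
    {"url": "/pricing", "threat": {"robot": False}},
    {"url": "/", "threat": {"robot": False}},
    {"url": "/wp-login.php", "threat": {"robot": True}},
    {"url": "/wp-login.php", "threat": {"robot": True}},
    {"url": "/wp-login.php", "threat": {"robot": True}},
]

print(top_pages(records))
```

Without the filter, `/wp-login.php` would look like the most popular page even though no human visited it.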
Visualisation of enriched logs in Kibana
Using Access Watch and Sematext provides you with:
- Detection of all robotic behaviour, good and bad, with each robot profiled and assessed for threat
- Clear and precise insights into the makeup of your traffic
- Knowledge of which robot activity comes from search engine crawlers, feed readers, and price or data scrapers, as well as abusive activity from brute-force bots and more