The GDPR's rather broad definition of personal data requires paying special attention to log data, and personal data in web server logs is a popular topic in many GDPR fora. For example, IP addresses or cookies might be considered personal data. Consequently, such data may be stored only with the customer's consent, and only for a limited time. To minimize risk, it is highly recommended to anonymize personal data before you hand logs over to any third party. A good example is anonymizing IP addresses before you send data to Google Analytics.
Note that cloud and SaaS providers can't take full responsibility for the data you send to them for storage or analytics. In the GDPR world, a service provider often has two roles. For your account data (name, address, e-mail, phone number, etc.), the provider typically acts as the "Data Controller". For your content, such as your logs, the provider acts as the "Data Processor", in which case you are the "Data Controller" and you are responsible for the logs you send to the cloud service provider.
So what are the best practices that will help you win the GDPR fight?
1. Centralize log storage
Centralize your log storage. This lets you apply policies in one place and reduces the complexity and risk of maintaining them across many systems. Most log management services support retention policies per data source, so define a reasonable retention time for every log source.
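If you manage retention yourself rather than relying on a log management service, a per-source policy can be enforced by a scheduled cleanup job. The sketch below is illustrative only: the source names, directory layout, and retention windows are assumptions, not a prescription.

```python
import os
import time

# Hypothetical per-source retention policy, in days.
RETENTION_DAYS = {"nginx-access": 30, "app": 14, "audit": 90}

def purge_expired(base_dir, policies, now=None):
    """Delete log files older than each source's retention window.

    Assumes logs are laid out as <base_dir>/<source>/<file>."""
    now = now or time.time()
    removed = []
    for source, days in policies.items():
        source_dir = os.path.join(base_dir, source)
        if not os.path.isdir(source_dir):
            continue
        cutoff = now - days * 86400
        for name in os.listdir(source_dir):
            path = os.path.join(source_dir, name)
            # Use the file's modification time as its age.
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

A job like this would typically run from cron or a systemd timer, with the policy table kept next to the shipper configuration so both live in one place.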
2. Delete local logs from your servers (periodically)
Duplicated data can create problems when enforcing policies, so make sure that logs stored in a central place are removed from local servers as soon as possible. Logrotate is a common tool for deleting logs periodically (weekly by default), while a log shipper streams the logs to the centralized log storage in near real time.
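As a sketch, a logrotate policy along these lines keeps only a few days of local copies once the shipper has forwarded them; the path and counts are illustrative and assume your shipper tails the files in near real time:

```
# /etc/logrotate.d/myapp : illustrative; adjust paths and counts for your setup
/var/log/myapp/*.log {
    daily
    rotate 3          # keep at most 3 rotated copies locally
    compress
    missingok
    notifempty
}
```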
3. Structure your logs
You can structure logs with parser rules in a log shipper configuration. Structured logs make it easier to mask or anonymize sensitive data as we point out in the next step. Wherever possible, applications should log directly in a structured format like JSON. Using a structured log format saves human time needed to create parser rules, as well as CPU cycles for processing.
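For example, with Python's standard logging module a minimal JSON formatter might look like this; the field names are illustrative, so pick whatever schema your log shipper expects:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user logged in")
```

Because every event is already a JSON object, the log shipper only needs to forward it; no parser rules are required, and sensitive fields are easy to address by name.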
4. Anonymize sensitive data fields in logs
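As mentioned earlier, anonymizing IP addresses before data leaves your control is a common example, and with structured logs this becomes a simple field transformation. The sketch below zeroes the host part of an address, similar in spirit to Google Analytics' IP anonymization; the `client_ip` field name is an assumption for illustration:

```python
import ipaddress

def anonymize_ip(ip: str) -> str:
    """Zero the host part of an address: the last octet for IPv4,
    the last 80 bits for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

def anonymize_event(event: dict) -> dict:
    """Anonymize known sensitive fields in a structured log event."""
    out = dict(event)
    if "client_ip" in out:  # field name is illustrative
        out["client_ip"] = anonymize_ip(out["client_ip"])
    return out
```

A transformation like this belongs in the log shipper or application, so that raw addresses never reach third-party storage in the first place.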
5. Encrypt logs in transit
Use only encrypted channels to transmit log data to central storage. Logs are often shipped unencrypted over Syslog/UDP for performance reasons. That is bad practice. Do not do that. Configure your syslog servers for TLS connections. If you use the Elastic Stack, secure Elasticsearch with X-Pack or one of its alternatives.
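For instance, an rsyslog client can forward everything over TLS on port 6514. This sketch assumes the gtls network stream driver (rsyslog-gnutls package) is installed; the hostname and certificate path are placeholders:

```
# rsyslog client: forward all logs over TLS
global(
  DefaultNetstreamDriver="gtls"
  DefaultNetstreamDriverCAFile="/etc/rsyslog.d/ca.pem"
)
*.* action(
  type="omfwd"
  target="logs.example.com"              # placeholder hostname
  port="6514"                            # standard syslog-over-TLS port
  protocol="tcp"
  StreamDriverMode="1"                   # 1 = require TLS
  StreamDriverAuthMode="x509/name"
  StreamDriverPermittedPeers="logs.example.com"
)
```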
Some of the above are general logging best practices one should follow anyway; with the arrival of the GDPR, following them becomes essential to protect your organization from potential legal issues. Furthermore, if you are storing European data on servers outside the EU, you are effectively exporting PII (Personally Identifiable Information). European law does not allow exporting users' personally identifiable information unless companies can demonstrate they will protect European users' privacy and data. Thus, if you are shipping your logs to a log management service, using a service hosted in the EU is another best practice to consider. This is precisely why we launched Sematext Cloud Europe back in 2017. For more on logging, follow @sematext, check out our other logging blog posts, and see our Elasticsearch for Logging class.