How to handle PII (personally identifiable information) in software applications

PII (personally identifiable information) handling in software applications comes down to four principles: minimize collection (don't collect data you don't need), encrypt everywhere (at rest and in transit, no exceptions), control access (least-privilege access to PII, with audit logging), and enable deletion (you must be able to delete a specific user's data on request: not just a soft delete, but actual removal from databases, backups, logs, and analytics). Most engineering teams fail on the fourth point. They can create user data easily but can't fully delete it, because it's scattered across databases, log files, analytics systems, and third-party integrations.

PII Handling for Engineers: A Practical Guide to Not Getting Your Company Sued

A client received a GDPR data deletion request from a European customer. "Delete all my data." Simple request. Nightmare execution. The customer's data was in the production database, three backup snapshots, the application logs (which included their email address in request headers), the analytics platform (Mixpanel), the email marketing tool (Mailchimp), the customer support system (Intercom), and the error tracking service (Sentry, which captured their user ID in error reports).

It took the engineering team two weeks to find and delete all instances. For one customer. They had 12,000 customers.

This is the PII problem that most engineering teams don't think about until it's urgent: collecting personal data is easy. Managing and deleting it is hard. And the regulations (GDPR, CCPA, state privacy laws) require you to do the hard part.

What Counts as PII

PII is any information that can identify a specific individual, either directly or in combination with other data.

Obviously PII: Name, email address, phone number, Social Security number, driver's license number, passport number, credit card number, home address, date of birth, biometric data.

Less obviously PII: IP addresses (can identify individuals, especially combined with timestamps), device identifiers, location data, cookies that track behavior across sessions, employee IDs, student IDs, and (this one catches people) any unique identifier that can be linked back to an individual through another dataset.

The combination problem: Data that isn't PII alone can become PII in combination. "Male, age 34, ZIP code 14850" isn't individually identifying. But in a small ZIP code, that combination might identify one person. Anonymization is harder than it looks.
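One rough way to test for the combination problem is a k-anonymity style count: group records by the fields that look harmless on their own and flag any combination shared by fewer than k people. The sketch below assumes records are plain dictionaries; the field names, sample data, and threshold are illustrative only.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=5):
    """Return field-value combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical sample data: no single field here identifies anyone.
records = [
    {"sex": "M", "age": 34, "zip": "14850"},
    {"sex": "F", "age": 34, "zip": "14850"},
    {"sex": "F", "age": 34, "zip": "14850"},
]

# With k=2, any combination that matches exactly one person is flagged.
unique = risky_combinations(records, ["sex", "age", "zip"], k=2)
print(unique)
```

Anything this returns is a combination that narrows down to a very small group, and should be treated as PII even though none of its parts is.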

The Four Engineering Principles

Minimize collection. Before adding a field to your database schema, ask: do we actually need this? If you're collecting date of birth but only need to verify the user is over 18, store a boolean "is_over_18" instead. If you're collecting a full street address but only need the ZIP code for regional features, collect only the ZIP code.

Data you don't collect can't be breached, can't be mishandled, and doesn't need to be deleted. Minimization is the single most effective PII protection strategy.
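As a minimal sketch of the date-of-birth example: check the age at signup, store only the resulting flag, and let the date of birth go. The function names and record shape here are hypothetical, and the 18-year threshold is just the one from the example above.

```python
from datetime import date


def is_over_18(date_of_birth, today=None):
    """Return True if the person has had their 18th birthday."""
    today = today or date.today()
    years = today.year - date_of_birth.year
    # Subtract a year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years >= 18


def build_user_record(email, date_of_birth, today=None):
    """Build the record to persist; the date of birth itself is not stored."""
    return {"email": email, "is_over_18": is_over_18(date_of_birth, today)}


record = build_user_record(
    "jane@example.com", date(2000, 6, 15), today=date(2024, 1, 1)
)
print(record)
```

The stored record has nothing in it that a date-of-birth breach could expose, and nothing extra to find when a deletion request arrives.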

Encrypt everywhere. Encryption at rest (database encryption, encrypted file storage) and encryption in transit (TLS/HTTPS for all connections) are baseline requirements in virtually every compliance framework. Modern cloud providers often make this the default, but verify it. I still find clients with unencrypted database connections between their application servers and their database because "they're in the same VPC."
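For the in-transit half, here is a small sketch using Python's standard `ssl` module to build a client context that verifies certificates and refuses anything older than TLS 1.2. The helper name is made up. Whether and how your database driver accepts such a context varies, so check the driver's documentation rather than assuming.

```python
import ssl


def strict_tls_context():
    """Build a client-side TLS context for internal connections."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_tls_context()
print(context.verify_mode, context.check_hostname, context.minimum_version)
```

The point is to make encryption something the application insists on, including inside the VPC, instead of something the network is assumed to provide.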

For particularly sensitive PII (Social Security numbers, financial account numbers), use field-level encryption with application-managed keys, not just database-level encryption. This means even a database administrator can't read the sensitive fields without the application's decryption key.

Control access. Not every engineer needs access to production PII. Not every customer support agent needs to see a customer's full profile. Implement role-based access control that limits PII access to the minimum necessary for each role.
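A minimal sketch of what "minimum necessary for each role" can look like in code: a map from role to the fields that role may see, applied before a record leaves the data layer. The role names and fields are invented for illustration; a real system would load them from its own authorization configuration.

```python
# Hypothetical role-to-field map.
ROLE_VISIBLE_FIELDS = {
    "support_agent": {"user_id", "name", "email"},
    "billing": {"user_id", "name", "billing_address"},
    "engineer": {"user_id"},
}


def view_for_role(record, role):
    """Return only the fields of record that the given role may see."""
    # Unknown roles get an empty set, so they see nothing by default.
    allowed = ROLE_VISIBLE_FIELDS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


customer = {
    "user_id": "user-42",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "billing_address": "1 Main St",
    "ssn": "000-00-0000",
}

print(view_for_role(customer, "support_agent"))
print(view_for_role(customer, "engineer"))
```

Note the default: a role that isn't listed sees nothing, and a field that isn't listed for any role (the SSN here) is never returned.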

Log every access to PII-containing data. Not just writes, reads too. When an auditor or regulator asks "who accessed customer X's data in the last 90 days?", you need to be able to answer precisely. This logging also serves as a deterrent: people handle data more carefully when they know their access is recorded.
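To make the auditor's question answerable, each access needs a record of who, whose data, which fields, and when. The sketch below uses an in-memory list as a stand-in for an append-only audit store; the function names and entry shape are assumptions. It records field names only, never values, so the audit trail doesn't become another copy of the PII.

```python
from datetime import datetime, timedelta, timezone

AUDIT_TRAIL = []  # stand-in for an append-only audit store


def record_access(actor, subject_user_id, fields, action, when=None):
    """Append one audit entry: who touched whose data, which fields, and when."""
    entry = {
        "at": when or datetime.now(timezone.utc),
        "actor": actor,
        "subject": subject_user_id,
        "fields": sorted(fields),  # field names only, never the values
        "action": action,          # "read" or "write"
    }
    AUDIT_TRAIL.append(entry)
    return entry


def who_accessed(subject_user_id, days=90, now=None):
    """List the actors who touched a user's data in the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return sorted({
        entry["actor"]
        for entry in AUDIT_TRAIL
        if entry["subject"] == subject_user_id and entry["at"] >= cutoff
    })


record_access("agent-7", "user-42", ["email", "name"], "read")
print(who_accessed("user-42"))
```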

Enable deletion. This is the hardest principle to implement retroactively. You need to be able to find and delete a specific user's PII across every system that stores it. That means your production database (hard deletes that cascade to related tables, not just soft deletes), your database backups (either exclude PII or have a process for purging specific records whenever a backup is restored), your application logs (either don't log PII or run a log-scrubbing pipeline), your analytics platforms (Mixpanel, Amplitude, Google Analytics: can you delete a specific user's events?), your third-party integrations (email tools, support tools, error tracking: do they support deletion APIs?), and any data warehouses or analytics databases that received data exports.
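One way to structure this is a registry with one deletion function per system, plus a runner that calls each one and reports the outcome per system. The sketch below uses in-memory stand-ins for the stores. The system names and deleter functions are hypothetical, and real deleters would call each vendor's actual deletion API.

```python
def delete_user_everywhere(user_id, deleters):
    """Run every registered deleter and report the outcome for each system."""
    results = {}
    for system, delete in deleters.items():
        try:
            delete(user_id)
            results[system] = "deleted"
        except Exception as exc:
            # Keep going: one failing system must not hide the others.
            results[system] = f"failed: {exc}"
    return results


# In-memory stand-ins for real systems (hypothetical).
production_db = {"user-42": {"email": "jane@example.com"}}
analytics_events = [
    {"user_id": "user-42", "event": "login"},
    {"user_id": "user-7", "event": "login"},
]


def delete_from_db(user_id):
    production_db.pop(user_id, None)


def delete_from_analytics(user_id):
    analytics_events[:] = [e for e in analytics_events if e["user_id"] != user_id]


def delete_from_support_tool(user_id):
    raise RuntimeError("deletion API not wired up yet")


results = delete_user_everywhere("user-42", {
    "production_db": delete_from_db,
    "analytics": delete_from_analytics,
    "support_tool": delete_from_support_tool,
})
print(results)
```

The per-system report matters as much as the deletes. A request is only complete when every registered system says "deleted", and a failure entry tells you exactly where to follow up.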

Build the deletion capability early. Retrofitting it onto a system that scattered PII across 15 services is the engineering equivalent of finding every grain of sand you dropped in a swimming pool.

The Log Problem

Application logs are one of the most common PII leaks. Engineers log request data for debugging, and that request data contains email addresses, IP addresses, user agents, and sometimes form data. These logs are often stored in systems (ELK stack, CloudWatch, Datadog) with broad access and long retention periods.

Two approaches: don't log PII (sanitize logs at the application level, replacing PII with tokens or hashes before the log statement executes), or implement log retention policies that automatically delete logs after a defined period (30-90 days is typical for compliance).

The first approach is better. If PII never enters the log stream, you don't need to worry about log retention, log access controls, or log-level deletion requests.
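Here is a sketch of the first approach using Python's standard `logging` module: a filter that replaces email addresses with a keyed token before any handler writes the record. The regex, token format, and `TOKEN_KEY` are illustrative assumptions, and a real pipeline would cover more than email addresses. The token uses HMAC rather than a bare hash because a plain hash of an email address can be reversed by hashing guesses.

```python
import hashlib
import hmac
import logging
import re

# Hypothetical key; in practice, load it from your secret store.
TOKEN_KEY = b"replace-with-a-secret-from-your-key-store"
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize(match):
    """Replace a matched email with a stable, keyed token."""
    email = match.group(0).lower().encode("utf-8")
    digest = hmac.new(TOKEN_KEY, email, hashlib.sha256).hexdigest()
    return f"<email:{digest[:12]}>"


class RedactEmails(logging.Filter):
    """Logging filter that scrubs email addresses from the final message."""

    def filter(self, record):
        # Render the message with its arguments first, then scrub the result.
        record.msg = EMAIL_PATTERN.sub(tokenize, record.getMessage())
        record.args = ()
        return True


handler = logging.StreamHandler()
handler.addFilter(RedactEmails())
logger = logging.getLogger("app")
logger.addHandler(handler)

logger.warning("login failed for %s", "jane@example.com")
```

Attaching the filter to the handler rather than to one logger means every record that handler writes gets scrubbed. The same email always maps to the same token, so you can still trace one user through the logs while debugging.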

Annual PII Audit

Once a year, map where PII lives in your systems. Every database, every log store, every third-party service, every backup system. Compare the current map to your privacy policy. If your privacy policy says "we collect your email to provide our service" but your email address appears in seven systems including an analytics tool and an error tracker, your privacy policy doesn't match your reality.
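The comparison step can be as simple as a set difference between what you found and what the policy declares. In the sketch below, both maps are hypothetical hand-written dictionaries; in practice the inventory comes out of the audit and the declared map out of your privacy policy.

```python
def undeclared_pii(inventory, declared):
    """Return {system: fields} for PII found where the policy doesn't declare it."""
    gaps = {}
    for system, fields in inventory.items():
        extra = set(fields) - set(declared.get(system, ()))
        if extra:
            gaps[system] = sorted(extra)
    return gaps


# What the audit found (hypothetical).
inventory = {
    "production_db": {"email", "name"},
    "analytics": {"email", "ip_address"},
    "error_tracker": {"user_id", "email"},
}

# What the privacy policy says is collected, and where (hypothetical).
declared = {
    "production_db": {"email", "name"},
    "analytics": {"ip_address"},
}

print(undeclared_pii(inventory, declared))
```

Every entry in the result is either a system to clean up or a privacy policy to correct.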

This audit typically takes 1-2 days for a small company and usually reveals at least one surprise: PII in a system nobody realized was collecting it. Better to find it yourself than during a regulatory investigation.