
Data Classification & Handling | Top 10 Dev Training

GSA-04

Data Classification & Handling

Identify the types of data your organization handles, understand the rules governing each classification level, and avoid accidental data exposure.

Audio overview (optional)

AI-generated summary. The Read tab has the full written content, which is the primary source and what the quiz is based on.

Transcript

Not all data carries the same risk. Your company's lunch menu and your customers' Social Security numbers both live somewhere in your systems, but they require very different levels of protection. Data classification is the system your company uses to sort information into categories based on sensitivity, and then apply the appropriate security controls to each category.

This isn't just organizational housekeeping. It's a compliance requirement. SOC 2's Confidentiality Trust Services Criteria require organizations to identify and protect confidential information. Auditors expect to see a documented classification policy, evidence that employees understand it, and proof that data is handled according to its classification.

Most organizations use a four-tier classification system. The exact names vary from company to company, but the logic is consistent.

Restricted is your most sensitive data. Exposure could result in regulatory penalties, legal liability, or direct financial harm to individuals. Customer Social Security numbers, payment card numbers, protected health information, encryption keys, authentication credentials, production database access tokens, and merger documents before public announcement. Restricted data should be encrypted at rest and in transit. Access requires explicit authorization and is logged. Never shared via unencrypted channels.

Confidential is sensitive business information that could cause significant harm if exposed. Customer lists, employee compensation data, internal financial reports, product roadmaps, source code, proprietary algorithms. Encrypted in transit. Shared only with authorized personnel, and when shared externally, only under a nondisclosure agreement.

Internal is information intended for use within the company. Meeting notes, org charts, project plans, training materials. Shared freely within the company using approved tools. Not posted publicly.

Public is information intentionally available to anyone. Blog posts, marketing materials, press releases. No access restrictions, but once something is public, it can't be made private again.

Understanding the classification levels is the framework. Knowing the specific types of sensitive data is what makes the framework actionable.

Personally identifiable information, or PII, is any data that can identify a specific individual. Direct identifiers like name, Social Security number, email, phone number, biometric data. Indirect identifiers like date of birth, ZIP code, job title, or IP address, which can identify someone when combined. The average cost per compromised PII record in 2025 was approximately one hundred sixty dollars.

Protected health information, PHI, is PII linked to health care. Medical records, prescription history, insurance claims. Governed by HIPAA. Health care breaches are the most expensive of any industry, averaging seven point four two million dollars per incident in 2025. HIPAA fines run up to two point one three million dollars per violation category per year.

Financial data splits into two regulatory categories. PIFI, personally identifiable financial information, covers bank accounts, transaction histories, loan information, account balances. Governed by SEC Regulation S-P and the Gramm-Leach-Bliley Act. And PCI, payment card industry data, covers card numbers, cardholder names, expiration dates, CVV codes. Governed by the PCI Data Security Standard. If your company accepts card payments, PCI DSS compliance isn't optional. A breach can cost you the ability to process payments entirely.

Company financial data is a category people tend to under-classify because there's no regulatory acronym forcing the issue. Revenue, burn rate, customer contract values, fundraising details, cap tables, commission structures. For a startup, a leaked burn rate can spook investors. A leaked pricing model hands your go-to-market strategy to a competitor. Treat company financial data as Confidential at a minimum, Restricted during active fundraising or M and A activity.

Intellectual property: source code, architecture documents, algorithms, unreleased designs, pricing models, customer acquisition strategies. IP theft averaged one hundred seventy eight dollars per record in 2025, the highest per-record cost of any data type.

Most data exposure isn't a sophisticated hack. It's a normal person making a normal mistake during a normal workday. Five common scenarios worth knowing.

First, sending to the wrong recipient. Email autocomplete fills in the wrong Sarah. A revenue report goes to a vendor contact instead of a colleague. Slow down on send. Double-check the recipient field when the email contains attachments or sensitive data.

Second, oversharing in collaboration tools. Pasting a customer API key into a Slack channel with forty people to troubleshoot an issue. Sharing a Google Drive folder with "anyone with the link" for convenience, not realizing the folder contains confidential contracts. Treat Slack and Drive with the same care as email.

Third, uploading to the wrong place. Pushing code to a public GitHub repository with hardcoded credentials. Uploading a customer case study with real revenue numbers to a public-facing CMS instead of an internal staging environment. Never hardcode credentials. Use environment variables or a secrets manager.

Fourth, pasting sensitive data into AI tools. The fastest-growing exposure vector in 2026. An employee pastes a customer contract or source code into a public AI assistant for analysis. That data may be stored by the AI provider, used for training, or accessible to provider employees. Never paste Restricted or Confidential data into a public AI tool unless it's been specifically approved by your security team. We'll cover AI tools in depth in module eight.

And fifth, forgetting to revoke access. An employee leaves but their access to shared folders and SaaS tools persists for weeks. A contractor's project ends but their credentials stay active.

When you're unsure how to handle a piece of data, walk through four questions. What classification level is this? Who is authorized to access it? Am I using an approved channel? And what happens after I share it? If any of these gives you pause, that pause is the security control working. When in doubt, default to more protective. It's always easier to open access than to undo an exposure.


Module 4: Data Classification & Handling

General Security Awareness Training
Estimated Time: 15 minutes

Learning Objectives

By the end of this module, you will be able to:

Explain why not all data is treated the same and why classification matters for SOC 2 compliance
Identify the four standard classification levels and give examples of data in each
Distinguish between PII, PHI, financial data, and intellectual property, and describe the obligations that come with each
Apply the correct handling rules for sharing, storing, and disposing of data at each classification level
Recognize the most common ways data gets accidentally exposed and how to prevent them

Why Classification Matters

Not all data carries the same risk. Your company's lunch menu and your customers' Social Security numbers both live somewhere in your systems, but they require very different levels of protection. Data classification is the system your company uses to sort information into categories based on its sensitivity, then apply the appropriate security controls to each category.

This isn't just organizational housekeeping. It's a compliance requirement. SOC 2's Confidentiality Trust Services Criteria require that your organization identify and protect confidential information. Auditors expect to see a documented classification policy, evidence that employees understand it, and proof that data is handled according to its classification. When a breach exposes data that should have been classified as confidential but was stored or shared as though it were internal, the auditor's question is straightforward: "Did your people know the rules, and did they follow them?"

Classification also drives cost-effective security. You can't put maximum protection on everything; the budget and friction would be unsustainable. What you can do is put maximum protection on the data that would cause the most damage if exposed and apply proportionate controls to everything else. That's what a classification system enables.

The Four Classification Levels

Most organizations, including those pursuing SOC 2, use a four-tier classification system. The exact names vary from company to company, but the logic is consistent. Your company's policy may use slightly different terminology, so check your internal documentation for the specific labels that apply to you.

Restricted

This is your most sensitive data. Exposure could result in regulatory penalties, legal liability, loss of customer trust, or direct financial harm to individuals. Access is limited to a small number of specifically authorized people with a documented business need.

Examples: Customer Social Security numbers, payment card numbers, protected health information (PHI), encryption keys, authentication credentials, production database access tokens, and merger/acquisition documents before public announcement.

Handling rules: Encrypted at rest and in transit. Access requires explicit authorization and is logged. Never shared via email, Slack, or other unencrypted channels without approved safeguards. Subject to the strictest retention and disposal policies.

Confidential

Sensitive business information that could cause significant harm to the company or its customers if exposed to unauthorized parties. Broader access than Restricted, but still limited to employees with a legitimate need to know.

Examples: Customer lists and contact information, employee compensation data, internal financial reports, product roadmaps, vendor contracts, security audit results, source code, and proprietary algorithms.

Handling rules: Encrypted in transit. Stored in access-controlled systems. Shared only with authorized personnel and, when shared externally, only under a nondisclosure agreement or equivalent contractual protection. Not to be stored on personal devices without IT approval.

Internal

Information intended for use within the company but not meant for public distribution. Exposure wouldn't cause severe harm, but it's still not something you'd want competitors, the press, or unauthorized individuals to access.

Examples: Internal meeting notes, organizational charts, project plans, internal policies, training materials, and employee directories.

Handling rules: Shared freely within the company using approved tools. Not posted publicly or shared with external parties without review. Reasonable access controls applied, but encryption isn't always required.

Public

Information that is intentionally available to anyone. No harm results from broad distribution.

Examples: Published blog posts, marketing materials, press releases, job postings, and public-facing product documentation.

Handling rules: No access restrictions. Should still be reviewed for accuracy before publication. Once something is public, it can't be made private again.
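
The four levels and their handling rules can also be written down as data, which makes them easier to check in tooling. The following is a minimal Python sketch, not a standard API: the names Classification, HandlingRule, HANDLING, and must_encrypt_in_transit are hypothetical, and the values only paraphrase the handling rules above. A value of False means this module does not state the requirement, not that the control is forbidden. Your company's policy remains the authority.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """The four tiers, ordered so that a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class HandlingRule:
    encrypt_in_transit: bool   # True if the module states this as a rule
    encrypt_at_rest: bool      # True if the module states this as a rule
    access_logged: bool        # True if the module states this as a rule
    external_sharing: str      # condition for sharing outside the company


# Hypothetical encoding of the handling rules described in this module.
HANDLING = {
    Classification.RESTRICTED: HandlingRule(
        encrypt_in_transit=True, encrypt_at_rest=True, access_logged=True,
        external_sharing="only with approved safeguards"),
    Classification.CONFIDENTIAL: HandlingRule(
        encrypt_in_transit=True, encrypt_at_rest=False, access_logged=False,
        external_sharing="only under an NDA or equivalent contract"),
    Classification.INTERNAL: HandlingRule(
        encrypt_in_transit=False, encrypt_at_rest=False, access_logged=False,
        external_sharing="only after review"),
    Classification.PUBLIC: HandlingRule(
        encrypt_in_transit=False, encrypt_at_rest=False, access_logged=False,
        external_sharing="no restriction"),
}


def must_encrypt_in_transit(level: Classification) -> bool:
    """Look up whether the module lists encryption in transit for this level."""
    return HANDLING[level].encrypt_in_transit
```

Using an ordered enum is a design choice for this sketch: it lets code compare levels directly, for example to check that a storage location is rated at least as high as the data placed in it.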

The Types of Sensitive Data You Need to Know

Understanding the classification levels is the framework. Knowing the specific types of sensitive data you encounter in your job is what makes the framework actionable. Here are the categories that matter most for SOC 2 compliance and general security.

Personally Identifiable Information (PII)

PII is any data that can be used to identify a specific individual, either on its own or in combination with other data. This is the broadest and most common category of sensitive data in most SaaS companies.

Direct identifiers (identify someone on their own): Full name, Social Security number, driver's license number, passport number, email address, phone number, biometric data (fingerprints, facial recognition), and financial account numbers.

Indirect identifiers (identify someone when combined): Date of birth, ZIP code, job title, IP address, device IDs, and demographic information. A ZIP code alone usually isn't enough to identify anyone. A ZIP code combined with a date of birth and gender, however, is enough to single out one specific person for most of the U.S. population.

Why it matters: PII exposure triggers regulatory obligations (state breach notification laws, GDPR, CCPA), damages customer trust, and creates legal liability. The average cost per compromised PII record in 2025 was approximately $160, and that adds up fast at scale.
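
To see how indirect identifiers combine, the sketch below counts how many records in a dataset share the same ZIP code, date of birth, and gender. Any record whose combination appears only once can be singled out even though no name is present. This is an illustration under stated assumptions: the function name unique_combinations, the field names, and the sample records are all made up for this example.

```python
from collections import Counter


def unique_combinations(records, keys=("zip_code", "date_of_birth", "gender")):
    """Return the records whose combination of the given fields appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if combos[tuple(r[k] for k in keys)] == 1]


# Made-up sample data, for illustration only.
records = [
    {"zip_code": "30301", "date_of_birth": "1990-04-12", "gender": "F"},
    {"zip_code": "30301", "date_of_birth": "1990-04-12", "gender": "F"},
    {"zip_code": "30301", "date_of_birth": "1985-11-02", "gender": "M"},
]

# Checking all three fields together singles out one record.
singled_out = unique_combinations(records)

# Checking the ZIP code alone singles out nobody in this sample.
by_zip_only = unique_combinations(records, keys=("zip_code",))
```

In this sample, the ZIP code alone singles out nobody, while the three fields together isolate one record. That is why a dataset with the names removed can still contain PII.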

Protected Health Information (PHI)

PHI is PII that is linked to health care. If your company handles any data related to a person's medical history, treatment, diagnosis, insurance, or health care payment, that data is PHI and is governed by HIPAA.

Examples: Medical records, prescription histories, insurance claim data, lab results, therapy notes, and any PII that is associated with health care services (a patient's name and appointment date, for instance).

Why it matters: Health care breaches are the most expensive across all industries, averaging $7.42 million per incident in 2025. HIPAA violations carry fines of up to $2.13 million per violation category per year.

Personally Identifiable Financial Information (PIFI) and Payment Card Data (PCI)

Financial data falls into two overlapping but distinct regulatory categories. Understanding the difference matters because each carries its own compliance obligations.

PIFI (Personally Identifiable Financial Information) is the broader category. Defined under SEC Regulation S-P and rooted in the Gramm-Leach-Bliley Act, PIFI covers any nonpublic data a consumer provides to obtain a financial product or service, or that results from a financial transaction. If your company handles customer billing, invoicing, or financial account information, you likely touch PIFI.

PIFI examples: Bank account and routing numbers, transaction histories, loan or credit information, account balances, Social Security numbers in a financial context, and tax identification numbers.

PCI (Payment Card Industry) data is a narrower, more specific category governed by the PCI Data Security Standard (PCI DSS). It covers the data elements directly tied to credit and debit card transactions.

PCI examples: Primary account numbers (the card number itself), cardholder names, card expiration dates, CVV/CVC codes, and PIN data.

Why both matter: PIFI exposure can result in identity theft, regulatory fines under the GLBA, and loss of consumer trust. PCI exposure can result in all of the above plus direct monetary theft and, critically, loss of the ability to process credit card payments, which for a SaaS company can be existential. If your company accepts card payments, PCI DSS compliance isn't optional.

Company Financial Data

Not all sensitive financial information falls under PIFI or PCI. Your company also generates and handles internal financial data that carries no specific regulatory mandate but could cause serious business harm if exposed. This is the category people tend to treat too casually because there's no acronym or compliance framework forcing the issue.

Examples: Revenue figures, annual recurring revenue (ARR), burn rate and runway projections, customer contract values and pricing terms, fundraising details, cap tables, investor communications, board decks, commission structures, compensation models, vendor pricing, and financial forecasts.

Why it matters: For a startup, a leaked burn rate can spook investors. A leaked pricing model can hand a competitor your entire go-to-market strategy. Board decks shared outside authorized channels can derail a funding round. None of this data triggers a regulatory notification the way a PII breach does, but the business damage can be just as severe. Treat company financial data as Confidential at a minimum, and Restricted when it involves active fundraising, M&A activity, or board-level strategy.

Intellectual Property (IP)

Proprietary information that gives your company a competitive advantage. This is the category people most often forget to classify because it doesn't come with the same regulatory requirements as PII or PHI.

Examples: Source code, product architecture documents, proprietary algorithms, unreleased feature designs, pricing models, customer acquisition strategies, and trade secrets.

Why it matters: IP theft averaged about $178 per record in 2025, the highest per-record cost of any data type. Unlike PII breaches, IP theft may not be detected for months or years, and the competitive damage is often irreversible.
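
The per-record figures quoted in this module can be turned into a rough back-of-the-envelope estimate. The sketch below simply multiplies a record count by the average cost per record. Treating the cost as linear is an assumption made for illustration, and the function name estimated_cost is hypothetical. Real incident costs depend on many factors this ignores.

```python
# Average per-record costs quoted in this module (2025 figures, in US dollars).
COST_PER_RECORD_USD = {"pii": 160, "ip": 178}


def estimated_cost(record_count: int, data_type: str) -> int:
    """Rough linear estimate: records exposed times average cost per record."""
    if record_count < 0:
        raise ValueError("record_count must not be negative")
    return record_count * COST_PER_RECORD_USD[data_type]
```

For example, estimated_cost(50_000, "pii") comes to 8,000,000 dollars, which shows how quickly a mid-sized exposure adds up.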

How Accidental Exposure Actually Happens

Most data exposure isn't the result of a sophisticated hack. It's the result of a normal person making a normal mistake during a normal workday. Here are the most common scenarios.

Sending to the Wrong Recipient

You're emailing a report and autocomplete fills in the wrong "Sarah." The report containing customer revenue data goes to a vendor contact instead of your colleague. This is one of the most common causes of data incidents, and it happens because email autocomplete works against you when multiple contacts share similar names.

Prevention: Slow down on the send. Double-check the recipient field, especially when the email contains attachments or sensitive data. If your email client supports it, enable a brief send delay (even 10 seconds creates a window to catch mistakes).

Oversharing in Collaboration Tools

You paste a customer's API key into a Slack channel to troubleshoot an issue. That channel has 40 people in it, most of whom don't need access to that key. Or you share a Google Drive folder with "anyone with the link" to make it easier for a colleague to access, forgetting that the folder also contains confidential contract terms.

Prevention: Treat collaboration tools with the same care as email. Don't paste credentials, keys, or sensitive data into shared channels. Use direct messages or dedicated secure channels for sensitive troubleshooting. Set the most restrictive sharing permissions first, then open access only as needed.

Uploading to the Wrong Place

A developer pushes code to a public GitHub repository without realizing it contains hardcoded database credentials. A marketing team member uploads a customer case study draft (with the customer's real revenue numbers) to a public-facing content management system instead of the internal staging environment.

Prevention: Never hardcode credentials in source code; use environment variables or a secrets manager. Review what you're uploading and where it's going. Public and internal environments should be clearly separated, and the default should always be the more restrictive option.
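
For developers, the difference between a hardcoded credential and an environment variable looks roughly like this in Python. This is a minimal sketch: the helper name get_required_secret and the variable name DB_PASSWORD are hypothetical, and a secrets manager would replace the environment lookup with a call to its own client.

```python
import os


def get_required_secret(name: str) -> str:
    """Read a secret from the environment and fail loudly if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Avoid: a literal password in source code ends up in version control.
#   db_password = "example-password"
#
# Prefer: the value lives outside the repository and is injected at runtime.
#   db_password = get_required_secret("DB_PASSWORD")
```

Failing at startup when the variable is missing is a deliberate choice here: it is safer than silently falling back to a default value that someone later commits by mistake.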

Pasting Sensitive Data into AI Tools

This is the newest and fastest-growing exposure vector. An employee pastes a customer contract, a snippet of source code, or an internal financial report into a public AI assistant to get a summary or analysis. That data may now be stored by the AI provider, used to train future models, or accessible to the provider's employees. Depending on the data and the tool, this can constitute a breach.

Prevention: Follow your company's AI acceptable use policy (Module 8 covers this in detail). Never paste Restricted or Confidential data into a public AI tool unless that tool has been specifically approved for that classification of data by your security team.
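
A simple automated check can catch some obvious mistakes before text leaves your machine. The sketch below flags text that contains something shaped like a U.S. Social Security number or a payment card number (using the Luhn checksum that card numbers satisfy). The function names and patterns are assumptions for illustration. Pattern matching will miss most sensitive content, such as contracts or source code, so it supplements the policy above rather than replacing it.

```python
import re
from typing import List

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> List[str]:
    """Return a list of warnings for SSN-like or card-like strings in the text."""
    findings = []
    if SSN_PATTERN.search(text):
        findings.append("possible SSN")
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings.append("possible card number")
            break
    return findings
```

An empty result does not mean the text is safe to paste. It only means these two patterns were not found.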

Forgetting to Revoke Access

An employee leaves the company, but their access to shared Google Drive folders, Slack channels, and third-party SaaS tools isn't removed for weeks. A contractor's project ends, but their credentials remain active. This isn't technically an "accident" in the moment, but the cumulative effect is unauthorized access to data that persists long after it should have been revoked.

Prevention: This is primarily an IT/security team responsibility (Module 5 covers access control in depth), but every employee can help by flagging when a colleague departs or a contractor's engagement ends.
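
For teams that manage access, the check described above can be sketched as a short script that lists accounts still active after a person's end date. The Account fields and the function name stale_accounts are hypothetical. A real check would read this data from your identity provider or HR system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Account:
    user: str
    active: bool
    end_date: Optional[date]  # last day of employment or engagement; None if ongoing


def stale_accounts(accounts: List[Account], today: date) -> List[Account]:
    """Return accounts that are still active even though the end date has passed."""
    return [
        a for a in accounts
        if a.active and a.end_date is not None and a.end_date < today
    ]


# Made-up sample data, for illustration only.
accounts = [
    Account("current.employee", active=True, end_date=None),
    Account("former.contractor", active=True, end_date=date(2026, 1, 31)),
    Account("former.employee", active=False, end_date=date(2025, 12, 15)),
]
flagged = stale_accounts(accounts, today=date(2026, 3, 1))
```

In this sample only the former contractor is flagged, because that account is the only one that is both past its end date and still active.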

Data Retention and Disposal

Keeping data longer than necessary increases risk without adding value. Every record you retain is a record that could be exposed in a breach. Data retention policies define how long each category of data should be kept and what happens to it when the retention period expires.

Why it matters for SOC 2: Auditors expect to see evidence of a data retention policy and proof that the organization follows it. Keeping customer PII for five years after the customer cancelled their account isn't just sloppy. It's a compliance gap.

Your responsibilities:

Know the retention rules for the data you handle. If you're not sure how long something should be kept, ask.
Don't create unnecessary copies. Every copy of a sensitive document is another copy that needs to be tracked, secured, and eventually destroyed.
Dispose of data securely. Deleting a file from your desktop doesn't erase it from existence. Your company should have processes for secure deletion and media destruction. For physical documents, use designated shredding bins.
Don't hoard data "just in case." If the retention period has passed and there's no business or legal requirement to keep it, it should be destroyed. Data you don't have can't be breached.
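
The retention rule above comes down to a date comparison plus a check for any legal requirement to keep the data. The sketch below shows that logic. The retention periods, category names, and the function name is_due_for_disposal are hypothetical. The real values come from your company's retention policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days; real values come from company policy.
RETENTION_DAYS = {"customer_pii": 365, "meeting_notes": 730}


def is_due_for_disposal(category: str, retention_start: date, today: date,
                        legal_hold: bool = False) -> bool:
    """True if the retention period has passed and nothing requires keeping the data."""
    if legal_hold:
        return False  # a legal or business requirement overrides the schedule
    expires = retention_start + timedelta(days=RETENTION_DAYS[category])
    return today > expires
```

For example, with the assumed one-year period, customer PII whose retention clock started on January 1, 2020 is due for disposal by 2025, unless a legal hold applies.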

A Simple Decision Framework

When you're unsure how to handle a piece of data, walk through these four questions:

1. What classification level is this? Check your company's data classification policy. If you're not sure, treat it as Confidential until you can confirm.

2. Who is authorized to access it? If the person you're about to share it with doesn't have a legitimate business need, don't share it. When in doubt, ask your manager or security team.

3. Am I using an approved channel? Restricted data should never travel through unapproved tools. Confidential data should be encrypted in transit. If the channel feels informal (a text message, a personal email, an unapproved AI tool), it's probably not the right one.

4. What happens after I share it? Think about the lifecycle. Will the recipient store it securely? Will they know to delete it when it's no longer needed? If you're sharing externally, is there a contractual obligation (NDA, DPA) in place?

If any of these questions gives you pause, that pause is the security control working. Slow down, verify, and ask if you need to.
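
The four questions can be sketched as a small checklist function. This is an illustration of the reasoning, not a tool your company provides: the function name sharing_concerns, its parameters, and the wording of the returned messages are all hypothetical.

```python
from typing import List, Optional


def sharing_concerns(classification: Optional[str],
                     recipient_has_need: bool,
                     channel_approved: bool,
                     recipient_is_external: bool,
                     contract_in_place: bool) -> List[str]:
    """Walk through the four questions and return the reasons to pause, if any."""
    # Question 1: if the level is unknown, treat the data as Confidential.
    level = classification or "Confidential"
    concerns: List[str] = []
    if level == "Public":
        return concerns  # public data has no access restrictions
    # Question 2: who is authorized to access it?
    if not recipient_has_need:
        concerns.append("recipient has no legitimate business need")
    # Question 3: am I using an approved channel?
    if not channel_approved:
        concerns.append("channel is not approved for " + level + " data")
    # Question 4: what happens after I share it?
    if recipient_is_external and not contract_in_place:
        concerns.append("no NDA or DPA covers this external recipient")
    return concerns
```

Any non-empty result is the pause the framework describes: stop, verify, and ask your manager or security team before sharing.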

Key Takeaways

Not all data is created equal. Classification systems (Restricted, Confidential, Internal, Public) ensure that each type of data receives protection proportionate to the harm its exposure would cause.
Know the data types. PII, PHI, financial data, and intellectual property each carry different obligations. PII is the broadest category and the one you're most likely to encounter.
Most data exposure is accidental. Wrong recipients, oversharing in collaboration tools, uploading to the wrong environment, pasting into AI tools, and failing to revoke access are the most common causes. None of them require a hacker.
Data you don't need should be destroyed. Retention policies exist for a reason. Every record you keep past its useful life is a record that could be breached.
When in doubt, treat it as Confidential. Default to the more protective option and verify. It's always easier to open access than to undo an exposure.

Next up: Module 5, Access Control & Least Privilege, where we'll cover why you should only have the access you actually need, how permission creep creates hidden risk, and what happens when offboarding goes wrong.

Module Version: 1.0
Last Updated: March 2026
Framework References: NIST Cybersecurity Framework 2.0 (Identify, Protect), SOC 2 Trust Services Criteria (CC6.1, C1.1, C1.2)
Data Sources: IBM/Ponemon Cost of a Data Breach Report 2025, NIST SP 800-60 (Guide for Mapping Types of Information and Information Systems to Security Categories)