GSA-04
Identify the types of data your organization handles, understand the rules governing each classification level, and avoid accidental data exposure.
General Security Awareness Training
Estimated Time: 15 minutes
By the end of this module, you will be able to:
Not all data carries the same risk. Your company's lunch menu and your customers' Social Security numbers both live somewhere in your systems, but they require very different levels of protection. Data classification is the system your company uses to sort information into categories based on its sensitivity, then apply the appropriate security controls to each category.
This isn't just organizational housekeeping. It's a compliance requirement. The Confidentiality category of SOC 2's Trust Services Criteria requires your organization to identify and protect confidential information. Auditors expect to see a documented classification policy, evidence that employees understand it, and proof that data is handled according to its classification. When a breach exposes data that should have been classified as confidential but was stored or shared as though it were internal, the auditor's question is straightforward: "Did your people know the rules, and did they follow them?"
Classification also drives cost-effective security. You can't put maximum protection on everything; the budget and friction would be unsustainable. What you can do is put maximum protection on the data that would cause the most damage if exposed and apply proportionate controls to everything else. That's what a classification system enables.
Many organizations, including those pursuing SOC 2, use a four-tier classification system, ordered here from most to least sensitive: Restricted, Confidential, Internal, and Public. The exact names vary from company to company, but the logic is consistent. Your company's policy may use slightly different terminology, so check your internal documentation for the specific labels that apply to you.
Restricted: This is your most sensitive data. Exposure could result in regulatory penalties, legal liability, loss of customer trust, or direct financial harm to individuals. Access is limited to a small number of specifically authorized people with a documented business need.
Examples: Customer Social Security numbers, payment card numbers, protected health information (PHI), encryption keys, authentication credentials, production database access tokens, and merger/acquisition documents before public announcement.
Handling rules: Encrypted at rest and in transit. Access requires explicit authorization and is logged. Never shared via email, Slack, or other unencrypted channels without approved safeguards. Subject to the strictest retention and disposal policies.
Confidential: Sensitive business information that could cause significant harm to the company or its customers if exposed to unauthorized parties. Access is broader than for Restricted data, but still limited to employees with a legitimate need to know.
Examples: Customer lists and contact information, employee compensation data, internal financial reports, product roadmaps, vendor contracts, security audit results, source code, and proprietary algorithms.
Handling rules: Encrypted in transit. Stored in access-controlled systems. Shared only with authorized personnel and, when shared externally, only under a nondisclosure agreement or equivalent contractual protection. Not to be stored on personal devices without IT approval.
Internal: Information intended for use within the company but not meant for public distribution. Exposure wouldn't cause severe harm, but it's still not something you'd want competitors, the press, or unauthorized individuals to access.
Examples: Internal meeting notes, organizational charts, project plans, internal policies, training materials, and employee directories.
Handling rules: Shared freely within the company using approved tools. Not posted publicly or shared with external parties without review. Reasonable access controls applied, but encryption isn't always required.
Public: Information that is intentionally available to anyone. No harm results from broad distribution.
Examples: Published blog posts, marketing materials, press releases, job postings, and public-facing product documentation.
Handling rules: No access restrictions. Should still be reviewed for accuracy before publication. Once something is public, it can't be made private again.
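For teams that want to reference these tiers in tooling, the handling rules above can be written down as a small lookup table. This is a minimal sketch, not a policy engine: the names `Tier`, `HandlingRules`, and `rules_for` are hypothetical, and a `False` value only means this module does not state that control as a requirement for the tier. Your company's policy may require more.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Classification tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class HandlingRules:
    encrypt_at_rest: bool
    encrypt_in_transit: bool
    access_logged: bool
    external_sharing: str  # short summary of the external-sharing rule


# Values paraphrase the handling rules listed in this module.
# False means "not stated as required here", not "forbidden".
RULES = {
    Tier.RESTRICTED: HandlingRules(True, True, True, "approved safeguards only"),
    Tier.CONFIDENTIAL: HandlingRules(False, True, False, "NDA or equivalent contract"),
    Tier.INTERNAL: HandlingRules(False, False, False, "review before sharing"),
    Tier.PUBLIC: HandlingRules(False, False, False, "no restriction"),
}


def rules_for(tier: Tier) -> HandlingRules:
    """Look up the handling rules for a classification tier."""
    return RULES[tier]


if __name__ == "__main__":
    for tier in sorted(Tier, reverse=True):
        print(tier.name, rules_for(tier))
```

Using an ordered enum makes comparisons such as `Tier.RESTRICTED > Tier.CONFIDENTIAL` possible, which is convenient when a tool needs to pick the stricter of two labels.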
The classification levels are the framework. Knowing the specific types of sensitive data you encounter in your job is what makes that framework actionable. Here are the categories that matter most for SOC 2 compliance and general security.
Personally identifiable information (PII) is any data that can be used to identify a specific individual, either on its own or in combination with other data. This is the broadest and most common category of sensitive data in most SaaS companies.
Direct identifiers (identify someone on their own): Full name, Social Security number, driver's license number, passport number, email address, phone number, biometric data (fingerprints, facial recognition), and financial account numbers.
Indirect identifiers (identify someone when combined): Date of birth, ZIP code, job title, IP address, device IDs, and demographic information. A ZIP code alone isn't PII, but a ZIP code combined with a date of birth and gender is enough to single out most people in the U.S.
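To see why combinations matter, the sketch below counts how many rows in a tiny, made-up dataset are unique on ZIP code alone versus on ZIP code, date of birth, and gender together. The records are fabricated for illustration and `unique_count` is a hypothetical helper, not part of any standard tool.

```python
from collections import Counter

# Fabricated records for illustration only: (zip_code, date_of_birth, gender)
records = [
    ("30301", "1990-04-12", "F"),
    ("30301", "1985-09-30", "M"),
    ("30301", "1990-04-12", "M"),
    ("30302", "1990-04-12", "F"),
    ("30301", "1985-09-30", "M"),
]


def unique_count(rows, fields):
    """Count rows whose values at the given field positions appear exactly once."""
    keys = [tuple(row[i] for i in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)


zip_only = unique_count(records, [0])
combined = unique_count(records, [0, 1, 2])

print(f"Unique on ZIP code alone: {zip_only} of {len(records)}")          # 1 of 5
print(f"Unique on ZIP + birth date + gender: {combined} of {len(records)}")  # 3 of 5
```

Each field looks harmless by itself, yet adding fields quickly shrinks the group a person can hide in. That is why indirect identifiers deserve the same care as direct ones once they are stored together.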
Why it matters: PII exposure triggers regulatory obligations (state breach notification laws, GDPR, CCPA), damages customer trust, and creates legal liability. The average cost per compromised PII record in 2025 was approximately $160, and that adds up fast at scale.
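As a rough illustration of "adds up fast," the sketch below multiplies a record count by the per-record figure quoted above. Scaling the average linearly is a simplifying assumption for back-of-the-envelope purposes, not a forecast, and `rough_exposure_cost` is a hypothetical helper.

```python
COST_PER_PII_RECORD_USD = 160  # approximate 2025 per-record figure quoted in this module


def rough_exposure_cost(record_count: int,
                        cost_per_record: float = COST_PER_PII_RECORD_USD) -> float:
    """Back-of-the-envelope estimate: record count multiplied by per-record cost."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    return record_count * cost_per_record


if __name__ == "__main__":
    for n in (1_000, 50_000, 250_000):
        print(f"{n:>7,} records -> ${rough_exposure_cost(n):,.0f}")
```

Under that assumption, 50,000 exposed records works out to about $8 million.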
PHI is PII that is linked to health care. If your company handles any data related to a person's medical history, treatment, diagnosis, insurance, or health care payment, that data is PHI and is governed by HIPAA.
Examples: Medical records, prescription histories, insurance claim data, lab results, therapy notes, and any PII that is associated with health care services (a patient's name and appointment date, for instance).
Why it matters: Health care breaches are the most expensive across all industries, averaging $7.42 million per incident in 2025. HIPAA violations carry fines of up to $2.13 million per violation category per year.
Financial data falls into two overlapping but distinct regulatory categories. Understanding the difference matters because each carries its own compliance obligations.
PIFI (Personally Identifiable Financial Information) is the broader category. Defined under SEC Regulation S-P and rooted in the Gramm-Leach-Bliley Act, PIFI covers any nonpublic data a consumer provides to obtain a financial product or service, or that results from a financial transaction. If your company handles customer billing, invoicing, or financial account information, you likely touch PIFI.
PIFI examples: Bank account and routing numbers, transaction histories, loan or credit information, account balances, Social Security numbers in a financial context, and tax identification numbers.
PCI (Payment Card Industry) data is a narrower, more specific category governed by the PCI Data Security Standard (PCI DSS). It covers the data elements directly tied to credit and debit card transactions.
PCI examples: Primary account numbers (the card number itself), cardholder names, card expiration dates, CVV/CVC codes, and PIN data.
Why both matter: PIFI exposure can result in identity theft, regulatory fines under the GLBA, and loss of consumer trust. PCI exposure can result in all of the above plus direct monetary theft and, critically, loss of the ability to process credit card payments, which for a SaaS company can be existential. If your company accepts card payments, PCI DSS compliance isn't optional.
Not all sensitive financial information falls under PIFI or PCI. Your company also generates and handles internal financial data that carries no specific regulatory mandate but could cause serious business harm if exposed. This is the category people tend to treat too casually because there's no acronym or compliance framework forcing the issue.
Examples: Revenue figures, annual recurring revenue (ARR), burn rate and runway projections, customer contract values and pricing terms, fundraising details, cap tables, investor communications, board decks, commission structures, compensation models, vendor pricing, and financial forecasts.
Why it matters: For a startup, a leaked burn rate can spook investors. A leaked pricing model can hand a competitor your entire go-to-market strategy. Board decks shared outside authorized channels can derail a funding round. None of this data triggers a regulatory notification the way a PII breach does, but the business damage can be just as severe. Treat company financial data as Confidential at a minimum, and Restricted when it involves active fundraising, M&A activity, or board-level strategy.
Intellectual property (IP) is proprietary information that gives your company a competitive advantage. This is the category people most often forget to classify because it doesn't come with the same regulatory requirements as PII or PHI.
Examples: Source code, product architecture documents, proprietary algorithms, unreleased feature designs, pricing models, customer acquisition strategies, and trade secrets.
Why it matters: Breaches involving intellectual property cost about $178 per record in 2025, the highest per-record cost of any data type. Unlike PII breaches, IP theft may not be detected for months or years, and the competitive damage is often irreversible.
Most data exposure isn't the result of a sophisticated hack. It's the result of a normal person making a normal mistake during a normal workday. Here are the most common scenarios.
You're emailing a report and autocomplete fills in the wrong "Sarah." The report containing customer revenue data goes to a vendor contact instead of your colleague. This is one of the most common causes of data incidents, and it happens because email autocomplete works against you when multiple contacts share similar names.
Prevention: Slow down on the send. Double-check the recipient field, especially when the email contains attachments or sensitive data. If your email client supports it, enable a brief send delay (even 10 seconds creates a window to catch mistakes).
You paste a customer's API key into a Slack channel to troubleshoot an issue. That channel has 40 people in it, most of whom don't need access to that key. Or you share a Google Drive folder with "anyone with the link" to make it easier for a colleague to access, forgetting that the folder also contains confidential contract terms.
Prevention: Treat collaboration tools with the same care as email. Don't paste credentials, keys, or sensitive data into shared channels. Use direct messages or dedicated secure channels for sensitive troubleshooting. Set the most restrictive sharing permissions first, then open access only as needed.
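One way to back up the "don't paste credentials" habit is a quick check that flags text resembling a secret before it is posted. The sketch below is illustrative only: the three patterns are examples, a production scanner needs a much larger and maintained rule set, and `find_secrets` is a hypothetical name.

```python
import re

# Illustrative patterns only; real scanners maintain far more rules than this.
SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM-encoded private keys begin with a recognizable header line.
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Generic "name = value" assignments where the name suggests a secret.
    "key_value_secret": re.compile(
        r"\b(?:api[_-]?key|secret|password|token)\b\s*[:=]\s*\S{8,}",
        re.IGNORECASE,
    ),
}


def find_secrets(message: str) -> list[str]:
    """Return the names of the patterns that match the message."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(message)]


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    print(find_secrets("can you test with AKIAIOSFODNN7EXAMPLE?"))
    print(find_secrets("lunch at noon?"))
```

A check like this catches only obvious cases. It supplements, and does not replace, the habit of keeping credentials out of shared channels.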
A developer pushes code to a public GitHub repository without realizing it contains hardcoded database credentials. A marketing team member uploads a customer case study draft (with the customer's real revenue numbers) to a public-facing content management system instead of the internal staging environment.
Prevention: Never hardcode credentials in source code; use environment variables or a secrets manager. Review what you're uploading and where it's going. Public and internal environments should be clearly separated, and the default should always be the more restrictive option.
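A minimal sketch of the environment-variable approach is shown below. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`) and the helper functions are hypothetical; the point is that the credential values never appear in the source file, so pushing the code to a repository does not expose them.

```python
import os
from urllib.parse import quote


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not set in the environment."""


def require_env(name: str) -> str:
    """Read a required environment variable, failing loudly if it is absent or empty."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value


def database_url() -> str:
    """Build a connection URL from environment variables (hypothetical names)."""
    user = quote(require_env("DB_USER"), safe="")
    password = quote(require_env("DB_PASSWORD"), safe="")  # escape special characters
    host = require_env("DB_HOST")
    name = require_env("DB_NAME")
    return f"postgresql://{user}:{password}@{host}/{name}"
```

Failing fast when a variable is missing is a deliberate choice here: it is safer than silently falling back to a default password embedded in the code.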
Pasting data into public AI tools is one of the newest and fastest-growing exposure vectors. An employee pastes a customer contract, a snippet of source code, or an internal financial report into a public AI assistant to get a summary or analysis. That data may now be stored by the AI provider, used to train future models, or accessible to the provider's employees. Depending on the data and the tool, this can constitute a breach.
Prevention: Follow your company's AI acceptable use policy (Module 8 covers this in detail). Never paste Restricted or Confidential data into a public AI tool unless that tool has been specifically approved for that classification of data by your security team.
An employee leaves the company, but their access to shared Google Drive folders, Slack channels, and third-party SaaS tools isn't removed for weeks. A contractor's project ends, but their credentials remain active. This isn't technically an "accident" in the moment, but the cumulative effect is unauthorized access to data that persists long after it should have been revoked.
Prevention: Removing access is primarily an IT/security team responsibility (Module 5 covers access control in depth), but every employee can help by flagging when a colleague departs or a contractor's engagement ends.
Keeping data longer than necessary increases risk without adding value. Every record you retain is a record that could be exposed in a breach. Data retention policies define how long each category of data should be kept and what happens to it when the retention period expires.
Why it matters for SOC 2: Auditors expect to see evidence of a data retention policy and proof that the organization follows it. Keeping customer PII for five years after the customer cancelled their account isn't just sloppy. It's a compliance gap.
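A retention check can be sketched as a simple date comparison. The categories and periods below are hypothetical placeholders; real values come from your company's retention policy, and `is_past_retention` is an illustrative helper rather than a reference to any particular tool.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days; real values come from your company's policy.
RETENTION_DAYS = {
    "customer_pii_after_cancellation": 90,
    "support_tickets": 365,
}


def is_past_retention(category: str, reference_date: date, today: date) -> bool:
    """True when the retention period counted from reference_date has fully elapsed."""
    limit = timedelta(days=RETENTION_DAYS[category])
    return today - reference_date > limit


if __name__ == "__main__":
    cancelled_on = date(2025, 1, 1)
    print(is_past_retention("customer_pii_after_cancellation", cancelled_on, date(2025, 6, 1)))
    print(is_past_retention("customer_pii_after_cancellation", cancelled_on, date(2025, 3, 1)))
```

Passing `today` in as a parameter, instead of calling `date.today()` inside the function, keeps the check easy to test and to rerun for an audit as of a specific date.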
Your responsibilities:
When you're unsure how to handle a piece of data, walk through these four questions:
1. What classification level is this? Check your company's data classification policy. If you're not sure, treat it as Confidential until you can confirm.
2. Who is authorized to access it? If the person you're about to share it with doesn't have a legitimate business need, don't share it. When in doubt, ask your manager or security team.
3. Am I using an approved channel? Restricted data should never travel through unapproved tools. Confidential data should be encrypted in transit. If the channel feels informal (a text message, a personal email, an unapproved AI tool), it's probably not the right one.
4. What happens after I share it? Think about the lifecycle. Will the recipient store it securely? Will they know to delete it when it's no longer needed? If you're sharing externally, is there a contractual obligation (NDA, DPA) in place?
If any of these questions gives you pause, that pause is the security control working. Slow down, verify, and ask if you need to.
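The four questions above can be sketched as a short checklist function. This is a simplification under stated assumptions: the `ShareRequest` fields and the `concerns` function are hypothetical, the channel check treats Restricted and Confidential data alike, and an unconfirmed classification defaults to Confidential as question 1 suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShareRequest:
    classification: Optional[str]    # None when the level has not been confirmed
    recipient_has_need: bool         # question 2
    channel_approved: bool           # question 3
    recipient_external: bool         # question 4
    contract_in_place: bool = False  # NDA, DPA, or similar


def concerns(request: ShareRequest) -> list[str]:
    """Return reasons to pause; an empty list means no question raised a flag."""
    issues = []
    # Question 1: if unsure of the level, treat the data as Confidential.
    level = request.classification or "Confidential"
    if not request.recipient_has_need:
        issues.append("recipient has no confirmed business need")
    if level in {"Restricted", "Confidential"} and not request.channel_approved:
        issues.append(f"{level} data on an unapproved channel")
    if request.recipient_external and level != "Public" and not request.contract_in_place:
        issues.append("external recipient without NDA/DPA")
    return issues


if __name__ == "__main__":
    print(concerns(ShareRequest(None, True, False, False)))
    print(concerns(ShareRequest("Public", True, True, True)))
```

A non-empty result is not a verdict. It is the same "pause" described above, and the next step is to verify or ask.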
Next up: Module 5, Access Control & Least Privilege, where we'll cover why you should only have the access you actually need, how permission creep creates hidden risk, and what happens when offboarding goes wrong.
Module Version: 1.0
Last Updated: March 2026
Framework References: NIST Cybersecurity Framework 2.0 (Identify, Protect), SOC 2 Trust Services Criteria (CC6.1, C1.1, C1.2)
Data Sources: IBM/Ponemon Cost of a Data Breach Report 2025, NIST SP 800-60 (Guide for Mapping Types of Information and Information Systems to Security Categories)