Nine Tips for Scalable Data Masking for Growing Companies

While data masking is a simple concept, it’s actually quite complex in practice.

October 28, 2022

Many companies are undergoing data masking projects to keep data safe and compliant while still accessible to the right people. Ben Herzberg, chief scientist at Satori Cyber, shares tips and strategies growing companies can use for scalable data masking.

Companies are required to comply with data privacy regulations and standards like the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI DSS). These rules are important in preventing unauthorized users from accessing sensitive data but can hinder the ability of others to access, analyze, and efficiently use the data.

Wikipedia defines data masking as “the process of modifying sensitive data in such a way that it is of no or little value to unauthorized intruders, while still being usable by software or authorized personnel.” It’s a strategy used to protect highly sensitive data like protected health information (PHI), personally identifiable information (PII), and payment card industry (PCI) data while providing a functional alternative when real data is not needed, for example, in user training or software testing.

While data masking is, in theory, a simple concept, it’s actually quite complex in practice and can be difficult for organizations to implement and scale.


Data Masking 101

Data masking disguises sensitive data like social security numbers, insurance information, medical histories, credit card details, and intellectual property so unauthorized parties cannot recognize it as valuable data they can exploit. The two most common types of data masking are static and dynamic.

Static data masking (SDM) involves creating a duplicated, clean version of a dataset with fully or partially masked data and moving it to a separate location where you can safely share it with a specific audience. However, SDM makes it difficult to maintain a “single source of truth” for your data – having different copies of your original data can result in confusion. 

On the other hand, dynamic data masking (DDM), or masking data in real time, allows data engineering teams to maintain control over sensitive data with a single, clear source of truth. DDM also offers scalability because you don’t need to copy or move data when the number of users of data sets increases. DDM allows you to stream data directly from the original location to different systems (e.g., a development or testing environment) without storing data in a secondary environment, as SDM requires.
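To make the contrast concrete, here is a minimal Python sketch that is not tied to any product. The sample records, the `mask_ssn` helper, and the role name `privileged` are all hypothetical. Static masking builds a second, masked copy up front, while dynamic masking keeps one dataset and decides what to reveal at read time.

```python
from copy import deepcopy

# Hypothetical source-of-truth records (illustrative data only).
RECORDS = [
    {"name": "Ada Example", "ssn": "123-45-6789"},
    {"name": "Bob Sample", "ssn": "987-65-4321"},
]


def mask_ssn(ssn: str) -> str:
    """Keep the last four digits and replace the rest with asterisks."""
    return "***-**-" + ssn[-4:]


def static_mask(records):
    """Static masking: produce a separate, masked copy of the dataset."""
    masked = deepcopy(records)
    for row in masked:
        row["ssn"] = mask_ssn(row["ssn"])
    return masked


def dynamic_read(records, role: str):
    """Dynamic masking: one dataset, masked on the fly for each reader."""
    for row in records:
        if role == "privileged":  # assumed role name
            yield dict(row)
        else:
            yield {**row, "ssn": mask_ssn(row["ssn"])}
```

In the static case the masked copy has to be regenerated whenever the original changes; in the dynamic case every reader goes through `dynamic_read`, so there is nothing to keep in sync.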

In terms of how data can be masked, teams often use encryption, character shuffling, or word substitution. For example, personally identifiable information like names or credit card numbers might be replaced with symbols, substituted with random data, or only viewable with a decryption key. 
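As a rough illustration of these techniques, the Python sketch below shows symbol replacement, character shuffling, and consistent substitution. The function names, the replacement list, and the key are assumptions made for the example. Encryption is left out on purpose because it needs a vetted cryptography library and key management, and simple shuffling on its own should not be treated as strong protection.

```python
import hashlib
import hmac
import random


def redact(value: str, visible: int = 4, symbol: str = "*") -> str:
    """Replace all but the last `visible` characters with a symbol."""
    if visible <= 0:
        return symbol * len(value)
    hidden = max(len(value) - visible, 0)
    return symbol * hidden + value[hidden:]


def shuffle_chars(value: str, seed: int) -> str:
    """Shuffle the characters of a value (seeded so runs are repeatable)."""
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def substitute(value: str, replacements: list[str], key: bytes) -> str:
    """Pick a stand-in from a list; the same input always gets the same one."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return replacements[int.from_bytes(digest[:4], "big") % len(replacements)]
```

Keyed substitution keeps the mapping consistent, so the same real name maps to the same stand-in in every table, which helps masked test data stay joinable.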

Nine Tactics for Scalable Data Masking 

Dynamic data masking is the best option to mitigate risks associated with a data breach, and these nine tips will set you up for success.

    1. Start with proper data discovery and classification: Companies don’t usually store all of their data in one central location – it’s spread out across many places, making it tricky to know exactly where sensitive data actually is. In order to mask data effectively and avoid delays, it first needs to be discovered and accurately classified. 
    2. Apply dynamic data masking rules to new projects: Dynamic data masking alters query results in real time as they are read from the production database, so only authorized users see the original data while non-privileged users see masked data. Applying these rules to every new project from the start will make everything run more smoothly and securely.
    3. Create universal data masking rules across all data stores: Often, data is stored across many platforms (for example, MySQL, Snowflake, Redshift, or Athena), which makes it challenging to mask. Universal masking rules that can be automatically applied regardless of the data platform without coding or reliance on other technologies will simplify data masking projects and ongoing maintenance. 
    4. Tackle data masking rules for semi-structured and unstructured data: Masking structured data is challenging enough; masking semi-structured or unstructured data like images adds another layer of complication. Teams can use tactics like optical character recognition, or anonymization applied to pre-sorted sensitive data, to mask images of passports, driver’s licenses, and checks.
    5. Tie data masking rules to role-based access control (RBAC): Access to masked data must comply with security policies related to roles, locations, and permissions. Data masking minimizes the exposure of sensitive data by masking it for non-privileged users, including database administrators and developers. When masking is tied to RBAC, employees can perform their duties without fear that they will accidentally expose data they aren’t authorized to view.
    6. Build a universal layer for compliance rules: Compliance and security policies are always changing. A data masking strategy that includes a universal layer for compliance will keep data masking projects on track during policy changes or updates without the need for a team to manually disable any previously built logic.
    7. Apply and enforce real-time data masking policies: Dynamic data masking with built-in granular access controls enables organizations to monitor and mask data in real-time based on roles and users. This approach keeps data masked to non-privileged users and concurrently accessible to authorized users. 
    8. Move away from hard-coded data masking rules: Dynamic data masking can be difficult to scale when performed at the data infrastructure level, specifically when using several different data stores. Look for a tool that can apply dynamic masking across all your data stores without writing any code, regardless of specific native capabilities.
    9. Manage conflicting requirements from data, compliance, and security teams: Data masking requirements often come from different teams within the organization, such as the data protection, privacy, product management, data governance, or security teams. Conflicting requirements make it difficult for data masking teams to determine what needs to be changed or ignored. Planning for this ahead of time and having a set process in place will save time in the long run. 
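Several of these tips (classification first, rules defined once, and masking tied to roles) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the regex classifiers, the `POLICY` table, and the role names are hypothetical, and a real deployment would enforce the policy in front of each data store rather than in application code.

```python
import re

# Hypothetical classifiers: regex patterns that tag a value with a data class.
CLASSIFIERS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

# Hypothetical policy, defined once: roles allowed to see each class unmasked.
POLICY = {
    "ssn": {"compliance_officer"},
    "email": {"compliance_officer", "support_agent"},
}


def classify(value):
    """Return the first data class whose pattern matches, or None."""
    if not isinstance(value, str):
        return None
    for label, pattern in CLASSIFIERS.items():
        if pattern.match(value):
            return label
    return None


def apply_policy(row: dict, role: str) -> dict:
    """Mask every classified field the role is not allowed to read."""
    out = {}
    for field, value in row.items():
        label = classify(value)
        if label is not None and role not in POLICY.get(label, set()):
            out[field] = "<masked:" + label + ">"
        else:
            out[field] = value
    return out
```

Because the rules key off the data class rather than a table or column name, the same `POLICY` applies to any row from any store, and a compliance change means editing one table instead of hunting down hard-coded logic.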

When undertaking data masking projects, teams must be confident that masked values cannot be reverse engineered, or that any attempt to do so will be futile. It is also important to test the results of your data masking to confirm that you are adequately protecting sensitive data and complying with government and industry regulations.
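One simple way to test masking output is to scan it for values that still look sensitive. The Python sketch below assumes two hypothetical patterns (US-style social security numbers and 16-digit card numbers); it only catches unmasked values that match those patterns and does not by itself prove that masked values cannot be reversed.

```python
import re

# Hypothetical leak patterns; a real project would cover its own data classes.
LEAK_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b\d{16}\b"),
}


def find_leaks(rows):
    """Return (row_index, field, data_class) for every unmasked match."""
    leaks = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            for label, pattern in LEAK_PATTERNS.items():
                if pattern.search(str(value)):
                    leaks.append((index, field, label))
    return leaks
```

A check like this can run against a sample of masked output on a schedule, so a new column or free-text field that slips past the masking rules is flagged early.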

Companies that work with sensitive data in the cloud need the ability to mask data quickly, reliably, and cost-effectively. While data masking projects seem simple, they can be tricky to execute and full of conditions that aren’t flexible enough to keep up with a dynamic data environment. 

The key is to set up a consistent, repeatable data masking approach that enables secure access to sensitive data while remaining compliant. You can set yourself up for success with the few simple rules we’ve discussed.

Which of these data masking tips are you employing? Share with us on Facebook, Twitter, and LinkedIn.

Image Source: Shutterstock


Ben Herzberg
Ben is an experienced tech leader and book author with a background in endpoint security, analytics, and application and data security. He has held roles including CTO of Cynet and Director of Threat Research at Imperva. Ben is the Chief Scientist for Satori, the DataSecOps platform.