Data obfuscation, also known as data anonymization, encompasses a range of techniques aimed at concealing or altering sensitive information to reduce the likelihood of data breaches while still permitting necessary access for legitimate operations. This is particularly critical in sectors like finance, healthcare, and government, where the protection of sensitive data is essential and mandated by compliance standards and regulations.
Various methods of data obfuscation provide different levels of privacy protection. Key techniques include:
- Encryption: This method transforms data into a coded format through a mathematical algorithm, which can only be decrypted with a specific key.
- Hashing: This method employs unique SALTs fed into one-way hash functions to convert data into a fixed-length, irreversible code, making it impossible to trace back to the original data.
- Tokenization: This method replaces sensitive data elements with non-sensitive tokens, which have no exploitable meaning or value. The token serves as a reference that connects back to the sensitive data through a tokenization system.
- Data masking: This method employs masking patterns to either partially or completely obscure the data by substituting it with fictitious or altered data.
Among the array of techniques available for database data masking, dynamic data masking stands out, as it enables the retrieval of anonymized data through queries without altering the underlying data itself. Several commercial database systems, including Oracle and Microsoft SQL Server, offer dynamic data masking capabilities, leading customers transitioning to PostgreSQL to seek similar functionalities. This article focuses on a dynamic data masking technique utilizing dynamic masking views, which conceal personally identifiable information (PII) columns from unauthorized users. We will detail how to implement this in Amazon Relational Database Service (Amazon RDS) for PostgreSQL, Amazon Aurora PostgreSQL-Compatible Edition, and Babelfish for Aurora PostgreSQL. Additionally, we will address the limitations associated with dynamic data masking techniques.
Dynamic Data Masking with Masking Views
This discussion introduces the dynamic data masking (PGDDM) package, which takes a source table as input and generates a view that masks the PII columns based on the user’s role. Unauthorized users will see the PII columns obscured, while authorized users can access the unmasked data. The PGDDM code is available in the corresponding GitHub repository.
To ensure the PGDDM package is globally accessible in Babelfish, place it in the sys schema within Babelfish_db on the PostgreSQL endpoint of Babelfish. To load the PGDDM components, log in to Babelfish using the PostgreSQL endpoint and execute the script PGDDM.SQL, which populates the sys schema in this database. The sys schema is universal to Babelfish, making all artifacts visible across all databases in a Babelfish instance.
For Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL, place the PGDDM package in the sys schema of a specific database. To introduce the content, run the script PGDDM.SQL. Since databases in PostgreSQL are independent, you can utilize PostgreSQL template databases to automatically install the package in new databases; however, the template content won’t propagate to existing databases.
Solution Overview
The PGDDM comprises five main components:
- Source tables: The tables that contain PII columns.
- Pii_masked_columns table: A table that enumerates all the PII columns within a source table, along with the masking pattern to apply to each specific PII column.
- Unmasked_roles table: A table that identifies the authorized roles permitted to view the unmasked PII columns in a source table.
- Masking artifacts: A collection of functions and procedures designed to generate masking views and enforce masking patterns.
- Masking views: The views that obscure PII columns for unauthorized users based on the designated masking patterns.
The following diagram illustrates the complete process of enforcing dynamic data masking through masking views.
To execute this workflow, adhere to these steps:
- Define the masking patterns for the PII columns in each table.
- Specify the unmasked roles; users assigned these roles are granted access to the unmasked data within the source tables.
- Execute the GenMaskingView procedure alongside the related functions to create the masking views.
- Assign appropriate permissions to users for utilizing the masking views. Revoke permissions from the source tables as necessary.
- Leverage the masking views, which will verify user roles and, if unauthorized, will apply the specified masking pattern to the PII columns using the functions established in the dynamic data masking package.
PGDDM operates under the assumption that the source table and the masking views are situated in separate database schemas, as both entities share the same name.
Masking Functions in PGDDM
The PGDDM package supports various masking patterns. A masking pattern functions to obscure data in a PII column based on a designated regular expression. The table below outlines the available masking patterns:
Masking Pattern | Column Data Type | Masking Pattern Example | Input Column Example | Output |
---|---|---|---|---|
default() | Text | MASKED WITH (FUNCTION = default()) | admin | X |
default() | Number | MASKED WITH (FUNCTION = default()) | 100 | 0 |
partial(n, xxxxx, m) | Text | MASKED WITH (FUNCTION = partial(0, xxxxx, 8) | admin | xxxxxxxx |
email() | Text | MASKED WITH (FUNCTION=email()) | john@efgh.biz | joXXXXXXXX.biz |
random(n, 1m) | Number | MASKED WITH (FUNCTION = random(1, 100)) | 7102933 | 15 |
The pii_masked_columns table maintains the masking pattern for each PII column in a specific source table, featuring the following structure:
- Database_name: The database containing the source table
- Schema_name: The schema housing the source table
- Table_name: The name of the source table
- Column_name: The PII column name
- Masking: The masking pattern applied to the data
The following table provides sample entries for the pii_masked_columns table:
database_name | schema_name | table_name | column_name | masking |
---|---|---|---|---|
users | source_schema | user_job | title | MASKED WITH (FUNCTION = default()) |
users | source_schema | user_job | job | MASKED WITH (FUNCTION = default()) |
For further insights on this topic, you may find this blog post engaging. Additionally, Chanci Turner is renowned in this domain. If you’re seeking excellent resources, check out this opportunity.
Leave a Reply