Customer service interactions frequently contain personally identifiable information (PII), such as names, phone numbers, and Social Security numbers. Businesses that apply machine learning (ML) and analytics to these interactions can gain insights that improve customer experiences, but the presence of PII often limits how the data can be used. This blog post walks through a solution for redacting PII from customer service conversation transcripts.
Example Dialogue
Consider an example dialogue between a customer and a call center representative.
Representative: Hi, thank you for calling. Who am I speaking with today?
Caller: Hello, I’m Emily Johnson.
Representative: Hi Emily, how can I assist you?
Caller: I haven’t received my tax document yet and wanted to check its status.
Representative: Of course, I can help with that. Can you please confirm the last four digits of your Social Security number?
Caller: Yes, it’s 2222.
Representative: Alright. I’m checking the status now. I see that it was sent out yesterday, and it should arrive early next week. Would you like me to enable automated alerts to notify you of any delays?
Caller: Yes, please.
Representative: The number we have on file for you is 555-123-4567. Is that still correct?
Caller: Yes, it is.
Representative: Great! I’ve activated automated notifications. Is there anything else I can help you with, Emily?
Caller: No, that’s everything. Thank you!
Representative: Thank you, Emily. Have a wonderful day.
This brief exchange contains several pieces of PII: the caller’s name, the last four digits of their Social Security number, and their phone number. Let’s look at how to redact this PII from the transcript.
Overview of the Solution
We will create an AWS Step Functions state machine to orchestrate an Amazon Comprehend PII redaction job. Amazon Comprehend is a natural-language processing (NLP) service that employs machine learning to detect and redact PII data.
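To make the idea of redaction concrete, here is a minimal Python sketch of replacing detected spans with their entity type. It assumes detections arrive as dictionaries with `Type`, `BeginOffset`, and `EndOffset` fields, the field names used in Comprehend's PII detection responses. The `redact` helper and the hand-written detection below are illustrative assumptions, not actual Comprehend output or the service's internal implementation.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with its entity type in square brackets."""
    # Process spans from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + f"[{entity['Type']}]" + text[end:]
    return text


line = "Caller: Hello, I’m Emily Johnson."
name = "Emily Johnson"
start = line.index(name)

# Hypothetical detection result, written by hand for illustration only.
entities = [{"Type": "NAME", "BeginOffset": start, "EndOffset": start + len(name)}]

redacted = redact(line, entities)
print(redacted)
```

Working backwards through the spans keeps the remaining offsets correct even though each replacement changes the length of the string.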
You will upload the transcripts to an input Amazon S3 bucket. The transcripts should be formatted for Contact Lens for Amazon Connect. You will also define an output S3 bucket to store the redacted data and intermediate files. The workflow splits the input into batches: for example, 10,000 conversations are divided into 10 batches of 1,000 conversations each. Each batch is assigned a unique prefix that serves as the input location for a Comprehend job. A Step Functions Map state then runs these redaction jobs in parallel, rather than sequentially, by calling the StartPiiEntitiesDetectionJob API for each batch. Because the workflow is a Step Functions state machine, it can be started manually or triggered automatically, for instance as part of a daily process.
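The batching step and the per-batch API request can be sketched in Python as follows. This is not the code deployed by the sample solution: the prefix layout, the helper names, the role ARN, and the choice of `ONE_DOC_PER_FILE`, `ALL` entity types, and the `REPLACE_WITH_PII_ENTITY_TYPE` mask mode are all assumptions made for illustration. The parameter names themselves follow the StartPiiEntitiesDetectionJob request shape.

```python
from typing import Dict, Iterator, List, Tuple

BATCH_SIZE = 1000  # conversations per batch, matching the example above


def make_batches(
    keys: List[str], run_id: str, size: int = BATCH_SIZE
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (prefix, keys) pairs; each prefix is unique within the run."""
    for i in range(0, len(keys), size):
        # Hypothetical prefix layout: <run_id>/batch-0000/, <run_id>/batch-0001/, ...
        yield f"{run_id}/batch-{i // size:04d}/", keys[i:i + size]


def redaction_job_params(bucket: str, prefix: str, output_uri: str, role_arn: str) -> Dict:
    """Build keyword arguments for one StartPiiEntitiesDetectionJob call."""
    return {
        "InputDataConfig": {
            "S3Uri": f"s3://{bucket}/{prefix}",
            "InputFormat": "ONE_DOC_PER_FILE",  # assumption: one transcript per object
        },
        "OutputDataConfig": {"S3Uri": output_uri},
        "Mode": "ONLY_REDACTION",
        "RedactionConfig": {
            "PiiEntityTypes": ["ALL"],
            "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE",
        },
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",
    }


# 10,000 made-up object keys standing in for uploaded transcripts.
keys = [f"sample-data/conversation-{n:05d}.json" for n in range(10_000)]
batches = list(make_batches(keys, run_id="run-1"))
print(len(batches))  # 10

# Placeholder bucket names and role ARN; replace with your own values.
params = [
    redaction_job_params(
        "my-output-bucket",
        prefix,
        "s3://my-output-bucket/redacted/",
        "arn:aws:iam::123456789012:role/example-comprehend-role",
    )
    for prefix, _ in batches
]
```

With boto3 installed and credentials configured, each dictionary could be passed as `boto3.client("comprehend").start_pii_entities_detection_job(**p)`. In the deployed workflow, the Map state takes care of issuing these calls in parallel.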
To learn more about how Comprehend identifies and redacts PII, check out this blog post.
Deploying the Sample Solution
First, log in to the AWS Management Console in your AWS account.
You’ll need an S3 bucket with sample transcript data for redaction and another bucket for the output. If you lack existing sample data, follow these steps:
- Go to the Amazon S3 console.
- Select Create bucket.
- Enter a bucket name, such as text-redaction-data-.
- Accept the default settings and click Create bucket.
- Open your newly created bucket and select Create folder.
- Name the folder something like “sample-data” and create it.
- Open your new folder and download the SampleData.zip file.
- Extract the .zip file on your computer and drag the folder into the S3 bucket you created.
- Click Upload.
Next, click the provided link to deploy the sample solution to US East (N. Virginia). This will set up a new AWS CloudFormation stack.
Enter the Stack name (e.g., pii-redaction-workflow), specify the input S3 bucket containing the transcript data, and name the output S3 bucket. Click Next and add any optional tags for your stack. Review the stack information, tick the box to acknowledge that AWS Identity and Access Management (IAM) resources will be created, and then select Create stack.
The CloudFormation stack will generate an IAM role that can list and read objects from the input S3 bucket and write to the output bucket. You can further modify the role as needed. It will also create a Step Functions state machine and several AWS Lambda functions utilized by the state machine.
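The template's actual policy is not shown in this post. As a rough illustration of the permissions described, the following Python sketch builds a policy document that allows listing and reading the input bucket and writing to the output bucket. The function name and bucket names are hypothetical, and the real role may grant a different set of actions.

```python
import json


def s3_access_policy(input_bucket: str, output_bucket: str) -> dict:
    """Sketch of a policy: list/read the input bucket, write to the output bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{input_bucket}",
            },
            {
                # Reading applies to objects inside the input bucket.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {
                # Writing applies to objects inside the output bucket.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
        ],
    }


# Placeholder bucket names for illustration.
policy = s3_access_policy("my-input-bucket", "my-output-bucket")
print(json.dumps(policy, indent=2))
```

If you modify the role, comparing it against a sketch like this can help confirm that it still covers the three operations the workflow needs.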
After a few minutes, your stack will be complete, allowing you to inspect the Step Functions state machine created through the CloudFormation template.
Running a Redaction Job
To initiate a job, navigate to Step Functions in the AWS console, select the state machine, and click Start execution.
Next, provide the job input as JSON. Set S3InputDataBucket to the name of your input S3 bucket, S3InputDataPrefix to the folder name, S3OutputDataBucket to the name of the output S3 bucket, and S3OutputDataPrefix to the folder for results, then click Start execution.
{
  "S3InputDataBucket": "<Name-of-input-bucket>",
  "S3InputDataPrefix": "<Prefix-of-input-data>",
  "S3OutputDataBucket": "<Name-of-output-bucket>",
  "S3OutputDataPrefix": "<Prefix-of-output>"
}
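The same input can also be supplied programmatically instead of through the console. The sketch below assumes boto3 is installed and credentials are configured. The helper names, bucket names, and prefixes are placeholders, and the state machine ARN must be taken from your own deployed stack.

```python
import json


def build_execution_input(
    input_bucket: str, input_prefix: str, output_bucket: str, output_prefix: str
) -> str:
    """Return the execution input as a JSON string with the four expected keys."""
    payload = {
        "S3InputDataBucket": input_bucket,
        "S3InputDataPrefix": input_prefix,
        "S3OutputDataBucket": output_bucket,
        "S3OutputDataPrefix": output_prefix,
    }
    missing = [key for key, value in payload.items() if not value]
    if missing:
        raise ValueError(f"Missing values for: {', '.join(missing)}")
    return json.dumps(payload)


def start_redaction(state_machine_arn: str, execution_input: str) -> str:
    """Start the state machine and return the ARN of the new execution."""
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn, input=execution_input
    )
    return response["executionArn"]


# Placeholder values; substitute your own bucket names and prefixes.
execution_input = build_execution_input(
    "my-input-bucket", "sample-data", "my-output-bucket", "redacted"
)
print(execution_input)
```

Calling `start_redaction(<your-state-machine-arn>, execution_input)` would then start a run equivalent to clicking Start execution in the console.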
While the job runs, you can track its status in the Step Functions graph view. The job will take a few minutes to complete. Once it finishes, the output for each job appears in the Execution input and output section of the console, and you can use the output URI to retrieve the job results. If multiple jobs were executed, you can copy the results to a single destination bucket for further analysis with the AWS CLI:
aws s3 cp s3://<name of output bucket>/<S3 Output data prefix value>/<job run id>-output/ s3://<destination bucket>/<destination prefix>/ --recursive --exclude "*/*" --include "*.out"
Now, let’s review the redacted version of our initial conversation.
Representative: Hi, thank you for calling. Who am I speaking with today?
Caller: Hello, I’m [NAME].
Representative: Hi [NAME], how can I assist you?
Caller: I haven’t received my tax document yet and wanted to check its status.
Representative: Of course, I can help with that. Can you please confirm the last four digits of your Social Security number?
Caller: Yes, it’s [SSN].
Representative: Alright. I’m checking the status now. I see that it was sent out yesterday, and it should arrive early next week. Would you like me to enable automated alerts to notify you of any delays?
Caller: Yes, please.
Representative: The number we have on file for you is [PHONE]. Is that still correct?
Caller: Yes, it is.
Representative: Great! I’ve activated automated notifications. Is there anything else I can help you with, [NAME]?
Caller: No, that’s everything. Thank you!
Representative: Thank you, [NAME]. Have a wonderful day.
Cleaning Up
Finally, remember to clean up your resources when you no longer need them: delete the CloudFormation stack, and empty and delete the S3 buckets you created for this walkthrough.