An event-based trigger on an Amazon S3 bucket runs an AWS Lambda function that processes uploaded CSV files line by line.
Services Covered
S3
AWS Lambda
Lab description
A company has a legacy system that accepts CSV (comma-separated values) file uploads. The company is preparing to migrate the system to the cloud.
The new cloud-based system will use an event-based trigger on an Amazon S3 bucket to run an AWS Lambda function and perform file processing. When a CSV file is uploaded, each line of the file is examined. The line is copied to a new file placed under a folder named after the first field in the line.
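For example, suppose a hypothetical file named animals.csv is uploaded as input/animals.csv with the following contents:

```
cat,whiskers
dog,rex
cat,felix
```

The function would produce two new files: one under a cat folder containing the two cat lines, and one under a dog folder containing the dog line. The exact output key format is defined by the Lambda function shown in the lab steps below.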
Learning Objectives
- Creating an Amazon S3 bucket
- Deploying a Python AWS Lambda function based on a pre-existing implementation
- Modifying the function code to generate random identifiers that are 12 characters long
- Testing the cloud-based system by uploading a test CSV file to ensure that the AWS Lambda function is triggered
Lab date
27-10-2021
Prerequisites
- AWS account
Lab steps
- Create an Amazon S3 bucket, then create a folder in the bucket named input.
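The console works fine for this step; for reference, a minimal boto3 sketch follows. The bucket name csv-lab-bucket is a placeholder, since S3 bucket names must be globally unique.

```python
import boto3

s3 = boto3.client("s3")
bucket = "csv-lab-bucket"  # placeholder; pick a globally unique name

# In us-east-1 no CreateBucketConfiguration is needed; other regions
# require CreateBucketConfiguration={"LocationConstraint": "<region>"}.
s3.create_bucket(Bucket=bucket)

# S3 has no real folders; an empty object whose key ends in "/" is
# rendered as a folder in the console.
s3.put_object(Bucket=bucket, Key="input/")
```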
- Create an AWS Lambda trigger that satisfies the following: triggered by S3 events, on all object create events, and only for objects created under the input/ prefix with the .csv suffix.
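If you prefer scripting the trigger over configuring it in the console, the same filter can be expressed through the bucket notification configuration. In this sketch the bucket name and function ARN are placeholders, and the function must already grant s3.amazonaws.com permission to invoke it.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="csv-lab-bucket",  # placeholder bucket name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                # Placeholder ARN for the function created in this lab.
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:csv-splitter",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "input/"},
                            {"Name": "suffix", "Value": ".csv"},
                        ]
                    }
                },
            }
        ]
    },
)
```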
- Implement the Lambda function:
```python
import io
import boto3
import string
import random

s3 = boto3.client("s3")

INPUT_PREFIX = "input"
OUTPUT_PREFIX = "output"
ID_LENGTH = 12


def random_id():
    # Random 12-character identifier built from uppercase letters and digits.
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=ID_LENGTH))


def separate_object(bucket, key):
    # Read the uploaded CSV and group its lines by their first field.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    output = {}
    for line in io.StringIO(body):
        fields = line.split(",")
        output.setdefault(fields[0], []).append(line)
    return output


def write_objects(objects, bucket, key):
    # Write each group to output/<first field>/<random id>-<original file name>.
    file_name = key.split("/")[-1]
    for prefix in objects.keys():
        identifier = random_id()
        s3.put_object(
            # Lines keep their trailing newlines from the iteration above,
            # so they are concatenated directly rather than joined with commas.
            Body="".join(objects[prefix]),
            Key=f"{OUTPUT_PREFIX}/{prefix}/{identifier}-{file_name}",
            Bucket=bucket,
        )


def lambda_handler(event, context):
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    if key.startswith(INPUT_PREFIX):
        objects = separate_object(bucket, key)
        write_objects(objects, bucket, key)
    return "OK"
```
OUTPUT_PREFIX is set to output, and ID_LENGTH is set to 12, as instructed.
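Before wiring up a real upload, the handler can be smoke-tested with a trimmed event containing only the fields it actually reads; the same shape works as a JSON test event in the Lambda console. The bucket and key below are placeholders, and for the get_object call inside the handler to succeed, the object must already exist and AWS credentials must be available.

```python
# Trimmed S3 event: only the fields lambda_handler reads.
# Bucket name and key are placeholders for this lab's resources.
event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "csv-lab-bucket"},
                "object": {"key": "input/animals.csv"},
            }
        }
    ]
}

print(lambda_handler(event, None))  # prints "OK" on success
```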
- Upload an example CSV file to the input folder in your S3 bucket.
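The upload can be done from the console, or with a couple of lines of boto3; the local file name and bucket are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Upload a local test file into the input/ folder; the .csv suffix is
# required for the trigger to fire.
s3.upload_file("animals.csv", "csv-lab-bucket", "input/animals.csv")
```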
- Check that the file has been split and the resulting files created under the output folder.
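A quick way to verify from a script, assuming the placeholder bucket name from the earlier sketches:

```python
import boto3

s3 = boto3.client("s3")

# List everything the function wrote under output/.
response = s3.list_objects_v2(Bucket="csv-lab-bucket", Prefix="output/")
for obj in response.get("Contents", []):
    print(obj["Key"])  # e.g. output/cat/<12-char id>-animals.csv
```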