
Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies
Introduction
In this era of cloud, data is always on the move, so anyone dealing with moving data has almost certainly heard of Amazon's Simple Storage Service, popularly known as S3. As the name suggests, it is a simple storage service where we can upload or remove files, better referred to as objects. It is very flexible storage that takes care of scalability, security, performance and availability, which makes it handy for a lot of applications and use cases.
The next thing we use here is AWS Lambda, the new world of serverless computing. With Lambda you can run your workloads easily without bothering about provisioning any resources; Lambda takes care of it all.
Advantages
S3, as we already know, is object-based storage that is highly scalable and efficient. We can use it as a data source or even as a destination for various applications. AWS Lambda, being serverless, lets us run code without thinking about any underlying infrastructure, so you can use Lambda for many of your processing jobs or even for simply communicating with any of your AWS resources.
Use Case
Copying new files to a different location (bucket/path) while preserving the hierarchy. We will use the AWS Python SDK (Boto3) to solve this.
Problem Statement
Say we have an application that writes a file to an S3 bucket path every time an employee updates his/her tasks, at any time of the day during working hours.
For example, the work activity of Ajay Muralidhar for 6th April 2020 at 12:00 PM will be stored in source-bucket-006/AjayMuralidhar/2020-04-06/12/my-task.txt. Refer to the tree below for more clarity. We need to move these task files to a new bucket while preserving the file hierarchy.
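Here is a plain-text sketch of that hierarchy, reconstructed from the example path above:

source-bucket-006
└── AjayMuralidhar
    └── 2020-04-06
        └── 12
            └── my-task.txt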


Solution
To solve this problem, we will use Amazon S3 events. Every file pushed to the source bucket generates an event, which triggers a Lambda function that processes the file and copies it to the destination bucket.
1. Creating a Lambda Function
1.1 Go to the AWS Lambda Console and click on Create Function

1.2 Select an Execution Role for your Function
This is important because it ensures that your Lambda has access to your source and destination buckets. You can either use an existing role that already has access to the S3 buckets, or choose Create an execution role. If you choose the latter, you will need to attach S3 permissions to your role.

1.2.1 Optional – S3 Permission for new execution role

Go to Basic settings in your Lambda function; you will find it when you scroll down the function page. Click Edit. You can edit your Lambda runtime settings here, such as Timeout (maximum of 15 minutes), which is the time for which your Lambda is allowed to run. It is advisable to set this as per your job requirement; any time you get a "Lambda timed out" error, you can increase this value.
Alternatively, you can check the Permissions section for the role.

Click on View the <your-function-name>-role-<xyzabcd> role on the IAM console; this takes you to the IAM console. Click on Attach policies. You can also create an inline policy if you need more control over the access you are providing, for example restricting it to particular buckets (a sample scoped-down policy is sketched at the end of this section). For ease of demonstration, we are using AmazonS3FullAccess here.

Select AmazonS3FullAccess, click on Attach policy

Once the policy is successfully attached to your role, you can go back to your Lambda Function.
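If you prefer a scoped-down inline policy instead of AmazonS3FullAccess, a minimal sketch could look like the following (using the example bucket names from this post; adjust the actions and resources to your own setup):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::source-test-bucket-006/*"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::destination-test-bucket-006/*"
        }
    ]
}

This grants read access on the source bucket's objects and write access on the destination bucket's objects, which is all the copy operation in this post needs.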
2. Setting S3 Event Trigger
2.1 Under the Designer tab, click on Add trigger

2.2 From the Trigger List dropdown, select S3 events
Select your source bucket. There are various event types you can choose from.
Find out more about S3 events here: https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#notification-how-to-event-types-and-destinations
We are using PUT since we want this event to trigger our Lambda whenever new files are uploaded to our source bucket. You can add a Prefix and Suffix filter if you only need particular types of files. Check Enable Trigger.

Python Script
We now write a simple Python script that picks up the incoming file from our source bucket and copies it to another location. The best thing about the S3 trigger is that whenever a new file is uploaded, it invokes our Lambda. We make use of the event object to gather all the required information.
This is what a sample event object looks like; it is passed to your Lambda function.
{
   "Records":[
      {
         "eventVersion":"2.1",
         "eventSource":"aws:s3",
         "awsRegion":"xx-xxxx-x",
         "eventTime":"2020-04-08T19:36:34.075Z",
         "eventName":"ObjectCreated:Put",
         "userIdentity":{
            "principalId":"AWS:POWERUPCLOUD:powerup@powerupcloud.com"
         },
         "requestParameters":{
            "sourceIPAddress":"XXX.XX.XXX.XX"
         },
         "responseElements":{
            "x-amz-request-id":"POWERUPCLOUD",
            "x-amz-id-2":"POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD"
         },
         "s3":{
            "s3SchemaVersion":"1.0",
            "configurationId":"powerup24-powerup-powerup-powerup",
            "bucket":{
               "name":"source-test-bucket-006",
               "ownerIdentity":{
                  "principalId":"POWERUPCLOUD"
               },
               "arn":"arn:aws:s3:::source-test-bucket-006"
            },
            "object":{
               "key":"AjayMuralidhar/2020-04-06/12/my-tasks.txt",
               "size":20,
               "eTag":"1853ea0cebd1e10d791c9b2fcb8cc334",
               "sequencer":"005E8E27C31AEBFA2A"
            }
         }
      }
   ]
}
Your Lambda function makes use of this event dictionary to identify the location where the file is uploaded.
import json
import boto3

# boto3 S3 initialization
s3_client = boto3.client("s3")


def lambda_handler(event, context):
    destination_bucket_name = 'destination-test-bucket-006'

    # event contains all information about the uploaded object
    print("Event :", event)

    # Bucket name where the file was uploaded
    source_bucket_name = event['Records'][0]['s3']['bucket']['name']

    # Filename of the object (with path)
    file_key_name = event['Records'][0]['s3']['object']['key']

    # Copy source object
    copy_source_object = {'Bucket': source_bucket_name, 'Key': file_key_name}

    # S3 copy object operation
    s3_client.copy_object(
        CopySource=copy_source_object,
        Bucket=destination_bucket_name,
        Key=file_key_name
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from S3 events Lambda!')
    }
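One detail worth noting: S3 event notifications URL-encode the object key (for example, spaces arrive as '+'). If your task files may have such characters in their names, decoding the key before the copy avoids "NoSuchKey" errors; a small, optional tweak:

from urllib.parse import unquote_plus

# The key in the S3 event is URL-encoded; decode it before calling the S3 API
file_key_name = unquote_plus(event['Records'][0]['s3']['object']['key'])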
You can test your implementation by uploading a file to any folder in your source bucket, and then checking your destination bucket for the same file.
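If you prefer to push a test file programmatically rather than through the console, a minimal Boto3 sketch is shown below (the local filename and key simply reuse the example values from earlier):

import boto3

s3 = boto3.client("s3")

# Uploading the object raises a PUT event, which should trigger the Lambda
# and result in a copy appearing in the destination bucket
s3.upload_file(
    "my-tasks.txt",                                   # local file (assumed to exist)
    "source-test-bucket-006",
    "AjayMuralidhar/2020-04-06/12/my-tasks.txt"
)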
source-test-bucket-006

destination-test-bucket-006

You can check your Lambda execution logs in CloudWatch: go to Monitoring and click View logs in CloudWatch.

Congrats! We have solved our problem. Before we conclude this blog, we would like to discuss an important feature of Lambda that will help you scale up your jobs. What if your application writes a huge number of files at the same time? Don't worry, Lambda helps you here too: by default, an account has a limit of 1,000 concurrent Lambda executions, and if you need to scale up, you can request an increase as per your business requirements.
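Raising that account-level limit is a Service Quotas request, but if you only want to guarantee (or cap) concurrency for this particular function, Boto3 exposes put_function_concurrency; a hypothetical sketch (the function name is assumed):

import boto3

lambda_client = boto3.client("lambda")

# Reserve 100 concurrent executions for this function so a burst of S3 events
# cannot exhaust the shared account pool (and other workloads cannot starve it)
lambda_client.put_function_concurrency(
    FunctionName="s3-copy-function",      # hypothetical function name
    ReservedConcurrentExecutions=100
)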

Conclusion
This is how easy it was to use S3 with Lambda to move files between buckets.
In Part 2 of this series, we will handle a slightly more complex problem, where we move files into a date-partitioned structure at the destination.
You can find the link to Part 2 here:
Hope this was a helpful overview of the basics of using S3 event triggers with AWS Lambda. Do leave your comments. Happy reading.
References
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
Tags: Amazon S3, AWS Lambda, S3 events, Python, Boto3, S3 Triggers, Lambda Trigger, S3 copy objects