Copying objects using AWS Lambda based on S3 events – Part 1


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Introduction

In this era of cloud, data is always on the move, so anyone dealing with moving data has almost certainly come across Amazon's Simple Storage Service, popularly known as S3. As the name suggests, it is a simple file storage service where we can upload or remove files – better referred to as objects. It is a very flexible storage service that takes care of scalability, security, performance and availability, which makes it handy for a lot of applications and use cases.

The next best thing we use here – AWS Lambda, the new world of serverless computing! Lambda lets you run your workloads easily without having to provision or manage any resources; Lambda takes care of it all.

Advantages

S3, as we already know, is object-based storage that is highly scalable and efficient. We can use it as a data source or even as a destination for various applications. AWS Lambda, being serverless, allows us to run code without thinking about the underlying infrastructure, so you can use Lambda for a lot of your processing jobs or even for simply communicating with other AWS resources.

Use Case

Copying new files to a different location (bucket/path) while preserving the hierarchy. We will use the AWS Python SDK (Boto3) to solve this.

Problem Statement

Say we have an application that writes files to an S3 bucket path every time an employee updates his/her tasks during working hours.

For example, the work activity of Ajay Muralidhar for 6th April 2020 at 12:00 PM will be stored in source-bucket-006/AjayMuralidhar/2020-04-06/12/my-task.txt. Refer to the tree below for more clarity. We need to move these task files to a new bucket while preserving the file hierarchy.
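An illustrative layout of the source bucket, reconstructed from the example path above:

source-bucket-006
└── AjayMuralidhar
    └── 2020-04-06
        └── 12
            └── my-task.txt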

Solution

To solve this problem, we will use Amazon S3 events. Every file pushed to the source bucket generates an event, which triggers a Lambda function that can then process the file and copy it to the destination bucket.

1. Creating a Lambda Function

1.1 Go to the AWS Lambda Console and click on Create Function

1.2 Select an Execution Role for your Function

This is important because it ensures that your Lambda has access to your source and destination buckets. You can either use an existing role that already has access to the S3 buckets, or choose Create an execution role. If you choose the latter, you will need to attach S3 permissions to your role.

1.2.1 Optional – S3 Permission for new execution role

Go to Basic settings in your Lambda function (you will find it when you scroll down the function page) and click Edit. You can edit your Lambda runtime settings here, such as Timeout, which has a maximum of 15 minutes. This is the time for which your Lambda is allowed to run, so it is advisable to set it as per your job requirement. Any time you see a timeout error, you can increase this value.
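If you prefer to adjust the timeout programmatically rather than through the console, a minimal Boto3 sketch might look like this (the function name is a placeholder):

import boto3

lambda_client = boto3.client("lambda")

# Raise the function timeout to 5 minutes (value is in seconds, maximum is 900 = 15 minutes)
lambda_client.update_function_configuration(
    FunctionName="my-s3-copy-function",
    Timeout=300
)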

You can also find the execution role under the Permissions section of the function.

Click on View the <your-function-name>-role-<xyzabcd> role on the IAM console; this takes you to the IAM console. Click on Attach policies. You can also create an inline policy if you need more control over the access you are providing, for example restricting it to particular buckets (a sample restrictive policy is sketched at the end of this step). For ease of demonstration, we are using AmazonS3FullAccess here.

Select AmazonS3FullAccess and click on Attach policy.

Once the policy is successfully attached to your role, you can go back to your Lambda Function.
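If you would rather not grant AmazonS3FullAccess, a minimal inline policy restricted to the two example buckets used in this post might look like the sketch below (the copy operation needs read access on the source bucket and write access on the destination bucket):

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":["s3:GetObject"],
         "Resource":"arn:aws:s3:::source-test-bucket-006/*"
      },
      {
         "Effect":"Allow",
         "Action":["s3:PutObject"],
         "Resource":"arn:aws:s3:::destination-test-bucket-006/*"
      }
   ]
}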

2. Setting S3 Event Trigger

2.1 Under the Designer tab, click on Add trigger

2.2 From the trigger list dropdown, select S3

Select your source bucket. There are various event types you can choose from.

Find out more about S3 event types here: https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#notification-how-to-event-types-and-destinations

We are using PUT since we want this event to trigger our Lambda whenever new files are uploaded to our source bucket. You can add a Prefix and Suffix if you only want to match particular files. Check Enable trigger.
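For reference, the console sets up the equivalent of the following bucket notification configuration (and also grants S3 permission to invoke the function). A Boto3 sketch of the same configuration, with the Lambda ARN as a placeholder and an optional suffix filter, might look like this:

import boto3

s3_client = boto3.client("s3")

# Configure the source bucket to invoke the Lambda on every PUT; the suffix filter is optional
s3_client.put_bucket_notification_configuration(
    Bucket="source-test-bucket-006",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "Id": "copy-on-put",
                "LambdaFunctionArn": "arn:aws:lambda:xx-xxxx-x:123456789012:function:my-s3-copy-function",
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "suffix", "Value": ".txt"}
                        ]
                    }
                }
            }
        ]
    }
)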

Python Script

We now write a simple Python script that picks up the incoming file from our source bucket and copies it to another location. The best thing about the S3 trigger is that whenever a new file is uploaded, it invokes our Lambda with an event object that carries all the required information.

This is what a sample event object looks like; it is passed to your Lambda function.

{
   "Records":[
      {
         "eventVersion":"2.1",
         "eventSource":"aws:s3",
         "awsRegion":"xx-xxxx-x",
         "eventTime":"2020-04-08T19:36:34.075Z",
         "eventName":"ObjectCreated:Put",
         "userIdentity":{
            "principalId":"AWS:POWERUPCLOUD:powerup@powerupcloud.com"
         },
         "requestParameters":{
            "sourceIPAddress":"XXX.XX.XXX.XX"
         },
         "responseElements":{
            "x-amz-request-id":"POWERUPCLOUD",
            "x-amz-id-2":"POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD"
         },
         "s3":{
            "s3SchemaVersion":"1.0",
            "configurationId":"powerup24-powerup-powerup-powerup",
            "bucket":{
               "name":"source-test-bucket-006",
               "ownerIdentity":{
                  "principalId":"POWERUPCLOUD"
               },
               "arn":"arn:aws:s3:::source-test-bucket-006"
            },
            "object":{
               "key":"AjayMuralidhar/2020-04-06/12/my-tasks.txt",
               "size":20,
               "eTag":"1853ea0cebd1e10d791c9b2fcb8cc334",
               "sequencer":"005E8E27C31AEBFA2A"
            }
         }
      }
   ]
}

Your Lambda function uses this event dictionary to identify the bucket and key of the uploaded file.

import json
import urllib.parse
import boto3

# boto3 S3 initialization
s3_client = boto3.client("s3")


def lambda_handler(event, context):
    destination_bucket_name = 'destination-test-bucket-006'

    # event contains all information about the uploaded object
    print("Event :", event)

    # Bucket name where the file was uploaded
    source_bucket_name = event['Records'][0]['s3']['bucket']['name']

    # Filename of the object (with path); object keys arrive URL-encoded in S3 events,
    # so decode them to handle spaces and special characters correctly
    file_key_name = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Copy source object
    copy_source_object = {'Bucket': source_bucket_name, 'Key': file_key_name}

    # S3 copy object operation, preserving the key (path) in the destination bucket
    s3_client.copy_object(CopySource=copy_source_object, Bucket=destination_bucket_name, Key=file_key_name)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from S3 events Lambda!')
    }

You can test your implementation by uploading a file to any folder in your source bucket and then checking your destination bucket for the same file.

source-test-bucket-006

destination-test-bucket-006
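If you want to test from a script instead of the console, a small Boto3 sketch like the one below uploads a sample file and waits for the copy to appear in the destination bucket (the bucket names and key are the examples used in this post):

import boto3

s3 = boto3.client("s3")

key = "AjayMuralidhar/2020-04-06/12/my-tasks.txt"

# Upload a sample task file to the source bucket; this fires the S3 PUT event
s3.put_object(Bucket="source-test-bucket-006", Key=key, Body=b"sample task entry")

# Wait until the Lambda has copied the object, then confirm it exists in the destination
waiter = s3.get_waiter("object_exists")
waiter.wait(Bucket="destination-test-bucket-006", Key=key)
print("Copy verified in destination bucket")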

You can check your Lambda execution logs in CloudWatch: go to Monitoring and click View logs in CloudWatch.

Congrats! We have solved our problem. Before we conclude this blog, we would like to discuss an important feature of Lambda that helps you scale your jobs. What if your application writes a huge number of files at the same time? Don't worry, Lambda handles this too. By default, an AWS account gets a concurrency limit of 1,000 concurrent Lambda executions per region. If you need to scale further, you can request an increase as per your business requirements.
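If you want to guarantee (or cap) the concurrency available to this particular function out of the account pool, you can set reserved concurrency. A minimal Boto3 sketch, with the function name as a placeholder:

import boto3

lambda_client = boto3.client("lambda")

# Reserve 100 concurrent executions for this function out of the account's concurrency pool
lambda_client.put_function_concurrency(
    FunctionName="my-s3-copy-function",
    ReservedConcurrentExecutions=100
)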

Conclusion

This is how easy it was to use S3 with Lambda to move files between buckets.

In Part 2 of this series, we will try to handle a bit more complex problem, where we will try to move files as date partitioned structures at our destination.

You can find the link to Part 2 here:

Hope this was a helpful overview of the basics of using S3 event triggers with AWS Lambda. Do leave your comments. Happy reading.

References

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

Tags: Amazon S3, AWS Lambda, S3 events, Python, Boto3, S3 Triggers, Lambda Trigger, S3 copy objects
