Copying objects using AWS Lambda based on S3 events – Part 2 – date partition


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Introduction

If you are coming here from the first post of this series on S3 events with AWS Lambda, this part deals with the more complex S3 object keys we will be handling.

If you are new here, you may want to visit the first part, which covers the basics and the steps for creating your Lambda function and configuring S3 event triggers.

You can find the link to Part 1 here:

Use Case

This is a similar use case: we copy new files to a different location (bucket/path) while preserving the hierarchy, and additionally partition the files according to their file names, storing them in a date-partitioned structure.

Problem Statement

Our Tech Lead suggested a change in the application logic, so now the same application is writing files to the S3 bucket in a different fashion. The activity file for Ravi Bharti is written to source-bucket-006/RaviRanjanKumarBharti/20200406-1436246999.parquet.

Haha! Say our Manager wants to check the activity files of Ravi Bharti date-wise, hour-wise, minute-wise, and... no, not second-wise, we can skip that!

So we need to store them in our destination bucket as:

  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/20200406-1436246999.parquet — Date wise
  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/20200406-1436246999.parquet — Hour wise
  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/36/20200406-1436246999.parquet — Hour/Min wise

Tree:

source-bucket-006
| - AjayMuralidhar
| - GopinathP
| - IshitaSaha
| - RachanaSharma
| - RaviRanjanKumarBharti
		| - 20200406-1436246999.parquet
| - Sagar Gupta
| - SiddhantPathak

Solution

Our problem is not that complex; a quick play with split and join on strings should solve it. You can choose any programming language for this, but we will continue with Python and the AWS Python SDK, boto3.

Python Script

Everything remains the same; we just need to change our script as per our sub-requirements. We will make use of the event dictionary to get the file name and path of the uploaded object.

source_bucket_name = event['Records'][0]['s3']['bucket']['name']

file_key_name = event['Records'][0]['s3']['object']['key']
  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/20200406-1436246999.parquet

Format: source_file_path/YYYY-MM-DD/file.parquet

To keep it simple, you can hard-code the key while testing:

file_key_name = "RaviRanjanKumarBharti/20200406-1436246999.parquet"

Splitting file_key_name with '/' to extract the Employee (folder name) and the filename:

file_root_dir_struct = file_key_name.split('/')[0]

date_file_path_struct = file_key_name.split('/')[1]

Splitting the filename with '-' to extract the date part:

date_file_path_struct = file_key_name.split('/')[1].split('-')[0]

Since we know the date string always has the same format (YYYYMMDD), we can slice it by position and concatenate:

YYYY = string[:4], MM = string[4:6], DD = string[6:8]

date_partition_path_struct = date_file_path_struct[:4] + "-" + date_file_path_struct[4:6] + "-" + date_file_path_struct[6:8]

Since Python is all about one-liners, we can also solve this using a list comprehension:

n_split = [4, 2, 2]

date_partition_path_struct = "-".join([date_file_path_struct[sum(n_split[:i]):sum(n_split[:i+1])] for i in range(len(n_split))])

We get date_partition_path_struct as '2020-04-06'.

  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/20200406-1436246999.parquet

time_file_path_struct = file_key_name.split('/')[1]

We will further need to split this to separate the file extension. Using the same variable for simplicity

time_file_path_struct = file_key_name.split('/')[1].split('-')[1].split('.')[0]


This gives us time_file_path_struct  as '1436246999'


hour_time_file_path_struct = time_file_path_struct[:2]

  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/36/20200406-1436246999.parquet

Similarly for minute

min_time_file_path_struct = time_file_path_struct[2:4]

Complete Code

import json
import boto3

# boto3 S3 initialization
s3_client = boto3.client("s3")


def lambda_handler(event, context):
  destination_bucket_name = 'destination-test-bucket-006'

  source_bucket_name = event['Records'][0]['s3']['bucket']['name']

  file_key_name = event['Records'][0]['s3']['object']['key']

  # Split file_key_name with '/' to extract Employee & filename
  file_root_dir_struct = file_key_name.split('/')[0]

  file_path_struct = file_key_name.split('/')[1]

  # Split filename with ‘-’ to extract date & time
  date_file_path_struct = file_path_struct.split('-')[0]

  # Date Partition Lazy Solution

  # date_partition_path_struct = date_file_path_struct[:4] + "-" + date_file_path_struct[4:6] + "-" + date_file_path_struct[6:8]

  # Date Partition using List Comprehension

  n_split = [4, 2, 2]

  date_partition_path_struct = "-".join([date_file_path_struct[sum(n_split[:i]):sum(n_split[:i+1])] for i in range(len(n_split))])

  # Split to get time part
  time_file_path_split = file_key_name.split('/')[1]

  # Time Partition
  time_file_path_struct = time_file_path_split.split('-')[1].split('.')[0]

  # Hour Partition
  hour_time_file_path_struct = time_file_path_struct[:2]

  # Minute Partition
  min_time_file_path_struct = time_file_path_struct[2:4]

  # Choose ONE of the following destination paths (the minute-wise path is active here)

  # Destination path || date partition
  # destination_file_path = file_root_dir_struct + "/" \
  #  + date_partition_path_struct + "/" + file_path_struct

  # Destination path || hour partition
  # destination_file_path = file_root_dir_struct + "/" + date_partition_path_struct + "/" + \
  #                         hour_time_file_path_struct + "/" + file_path_struct

  # Destination path || minute partition
  destination_file_path = file_root_dir_struct + "/" + date_partition_path_struct + "/" + \
                          hour_time_file_path_struct + "/" + min_time_file_path_struct + "/" + file_path_struct

  # Copy Source Object
  copy_source_object = {'Bucket': source_bucket_name, 'Key': file_key_name}

  # S3 copy object operation
  s3_client.copy_object(CopySource=copy_source_object, Bucket=destination_bucket_name, Key=destination_file_path)

  return {
      'statusCode': 200,
      'body': json.dumps('Hello from S3 events Lambda!')
  }

You can test your implementation by uploading a file to any folder of your source bucket, and then checking your destination bucket for the same file under the respective Employee's date-partitioned path.
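For example, a quick upload with boto3 should make the object appear under the minute-wise destination path. This is only a minimal sketch; the bucket and key names are taken from the examples above, and the local file name is a placeholder.

import boto3

s3 = boto3.client("s3")

# Upload a sample activity file; the S3 event should trigger the Lambda,
# which copies it to destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/36/...
s3.upload_file(
    Filename="20200406-1436246999.parquet",  # local test file (placeholder)
    Bucket="source-bucket-006",
    Key="RaviRanjanKumarBharti/20200406-1436246999.parquet",
)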

source-test-bucket-006

destination-test-bucket-006

Conclusion

This helped us solve one of the most common use cases in data migration: storing files in a partitioned structure for better organization and readability.

Hope this two-part blog series was useful in understanding how we can use AWS Lambda to process S3 objects based on event triggers.

Do leave your comments. Happy reading.

References

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

https://stackoverflow.com/questions/44648145/split-the-string-into-different-lengths-chunks

Tags: Amazon S3, AWS Lambda, S3 events, Python, Boto3, S3 Triggers, Lambda Trigger, S3 copy objects, date-partitioned, time-partitioned

Copying objects using AWS Lambda based on S3 events – Part 1


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Introduction

In this era of cloud, data is always on the move, so anyone dealing with moving data has almost certainly heard of Amazon's Simple Storage Service, popularly known as S3. As the name suggests, it is a simple file storage service where we can upload or remove files, better referred to as objects. It is very flexible storage that takes care of scalability, security, performance and availability, which makes it very handy for a lot of applications and use cases.

The next best thing we use here is AWS Lambda, the new world of serverless computing. You can run your workloads easily using Lambda without bothering about provisioning any resources; Lambda takes care of it all.

Advantages

S3, as we already know, is object-based storage that is highly scalable and efficient. We can use it as a data source or even as a destination for various applications. AWS Lambda, being serverless, allows us to run code without thinking about the underlying infrastructure, so you can use Lambda for a lot of your processing jobs or even for simply communicating with any of your AWS resources.

Use Case

Copying new files to a different location (bucket/path) while preserving the hierarchy. We will use the AWS Python SDK (boto3) to solve this.

Problem Statement

Say we have an application writing files to an S3 bucket path every time an Employee updates his/her tasks at any time of the day during working hours.

For example, the work activity of Ajay Muralidhar for 6th April 2020 at 12:00 PM will be stored in source-bucket-006/AjayMuralidhar/2020-04-06/12/my-task.txt. Refer to the Tree for more clarity. We need to move these task files to a new bucket while preserving the file hierarchy.

Solution

For solving this problem, we will use Amazon S3 events. Every file pushed to the source bucket generates an event; this event triggers a Lambda function, which can then process the file and move it to the destination bucket.

1. Creating a Lambda Function

1.1 Go to the AWS Lambda Console and click on Create Function

1.2 Select an Execution Role for your Function

This is important because it ensures that your Lambda has access to your source and destination buckets. Either you can use an existing role that already has access to the S3 buckets, or you can choose to Create an execution role. If you choose the latter, you will need to attach S3 permissions to your role.

1.2.1 Optional – S3 Permission for new execution role

Go to Basic settings in your Lambda function (you will find this when you scroll down the function page) and click Edit. You can edit your Lambda runtime settings here, such as Timeout, with a maximum of 15 minutes; this is the time for which your Lambda is allowed to run. It is advisable to set this as per your job requirement: any time you get a Lambda timed out error, you can increase this value.

Or you can also check the Permissions section for the role.

Click on View the <your-function-name>-role-<xyzabcd> role on the IAM console; this takes you to the IAM console. Click on Attach policies. You can also create an inline policy if you need more control over the access you are providing, for example restricting it to particular buckets. For ease of demonstration, we are using AmazonS3FullAccess here.

Select AmazonS3FullAccess, click on Attach policy

Once the policy is successfully attached to your role, you can go back to your Lambda Function.
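If you prefer to attach the policy programmatically instead of through the console, a minimal boto3 sketch might look like the following; the role name below is a placeholder for the auto-generated <your-function-name>-role-<xyzabcd> role.

import boto3

iam = boto3.client("iam")

# Attach the AWS managed S3 policy to the Lambda execution role
# (the role name is a placeholder).
iam.attach_role_policy(
    RoleName="my-function-role-xyzabcd",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)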

2. Setting S3 Event Trigger

2.1 Under Designer tab, Click on Add trigger

2.2 From the Trigger List dropdown, select S3 events

Select your source bucket. There are various event types you can choose from.

Find out more about S3 events here, https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#notification-how-to-event-types-and-destinations

We are using PUT since we want this event to trigger our Lambda whenever a new file is uploaded to our source bucket. You can add a Prefix and Suffix if you only want particular types of files. Check Enable trigger.
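The console sets this notification up for you; purely for reference, a roughly equivalent configuration can be applied with boto3 as sketched below. The Lambda ARN is a placeholder, and the function must already grant S3 permission to invoke it.

import boto3

s3 = boto3.client("s3")

# Roughly what the console configures: invoke the Lambda on s3:ObjectCreated:Put
s3.put_bucket_notification_configuration(
    Bucket="source-test-bucket-006",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111111111111:function:my-copy-function",  # placeholder
                "Events": ["s3:ObjectCreated:Put"],
            }
        ]
    },
)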

Python Script

We now write a simple Python script which picks up the incoming file from our source bucket and copies it to another location. The best thing about setting the Lambda S3 trigger is that whenever a new file is uploaded, it triggers our Lambda. We make use of the event object here to gather all the required information.

This is what a sample event object looks like; it is passed to your Lambda function.

{
   "Records":[
      {
         "eventVersion":"2.1",
         "eventSource":"aws:s3",
         "awsRegion":"xx-xxxx-x",
         "eventTime":"2020-04-08T19:36:34.075Z",
         "eventName":"ObjectCreated:Put",
         "userIdentity":{
            "principalId":"AWS:POWERUPCLOUD:powerup@powerupcloud.com"
         },
         "requestParameters":{
            "sourceIPAddress":"XXX.XX.XXX.XX"
         },
         "responseElements":{
            "x-amz-request-id":"POWERUPCLOUD",
            "x-amz-id-2":"POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD"
         },
         "s3":{
            "s3SchemaVersion":"1.0",
            "configurationId":"powerup24-powerup-powerup-powerup",
            "bucket":{
               "name":"source-test-bucket-006",
               "ownerIdentity":{
                  "principalId":"POWERUPCLOUD"
               },
               "arn":"arn:aws:s3:::source-test-bucket-006"
            },
            "object":{
               "key":"AjayMuralidhar/2020-04-06/12/my-tasks.txt",
               "size":20,
               "eTag":"1853ea0cebd1e10d791c9b2fcb8cc334",
               "sequencer":"005E8E27C31AEBFA2A"
            }
         }
      }
   ]
}

Your Lambda function makes use of this event dictionary to identify the location where the file is uploaded.

import json
import boto3

# boto3 S3 initialization
s3_client = boto3.client("s3")


def lambda_handler(event, context):
   destination_bucket_name = 'destination-test-bucket-006'

   # event contains all information about uploaded object
   print("Event :", event)

   # Bucket Name where file was uploaded
   source_bucket_name = event['Records'][0]['s3']['bucket']['name']

   # Filename of object (with path)
   file_key_name = event['Records'][0]['s3']['object']['key']

   # Copy Source Object
   copy_source_object = {'Bucket': source_bucket_name, 'Key': file_key_name}

   # S3 copy object operation
   s3_client.copy_object(CopySource=copy_source_object, Bucket=destination_bucket_name, Key=file_key_name)

   return {
       'statusCode': 200,
       'body': json.dumps('Hello from S3 events Lambda!')
   }

You can test your implementation by uploading a file to any folder of your source bucket, and then checking your destination bucket for the same file.

source-test-bucket-006

destination-test-bucket-006

You can check your Lambda execution logs in CloudWatch: go to Monitoring and click View logs in CloudWatch.
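If you prefer to pull the logs programmatically, a small boto3 sketch can fetch recent events from the function's log group; the function name below is a placeholder, and Lambda log groups follow the /aws/lambda/<function-name> convention.

import boto3

logs = boto3.client("logs")

# Fetch recent log events for the function (name is a placeholder)
response = logs.filter_log_events(
    logGroupName="/aws/lambda/my-copy-function",
    limit=20,
)
for event in response["events"]:
    print(event["message"], end="")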

Congrats! We have solved our problem. Just before we conclude this blog, we would like to discuss an important feature of Lambda which will help you scale your jobs. What if your application writes a huge number of files at the same time? Don't worry, Lambda will help you with this too. By default, Lambda has a concurrency limit of 1,000 per account. If you need to scale up further, you can request an increase as per your business requirements.
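You can check the concurrency limit currently applied to your account with a quick boto3 call, for example:

import boto3

lambda_client = boto3.client("lambda")

# Inspect the account-level concurrency limit (1,000 by default)
settings = lambda_client.get_account_settings()
print(settings["AccountLimit"]["ConcurrentExecutions"])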

Conclusion

This is how easy it was to use S3 with Lambda to move files between buckets.

In Part 2 of this series, we handle a slightly more complex problem, where we move files into a date-partitioned structure at the destination.

You can find the link to Part 2 here:

Hope this was a helpful overview of the basics of using S3 event triggers with AWS Lambda. Do leave your comments. Happy reading.

References

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

Tags: Amazon S3, AWS Lambda, S3 events, Python, Boto3, S3 Triggers, Lambda Trigger, S3 copy objects

Handling Asynchronous Workflow-Driven pipeline with AWS CodePipeline and AWS Lambda


Written by Praful Tamrakar, Senior Cloud Engineer, Powerupcloud Technologies

Many AWS customers use AWS Lambda widely for performing almost every kind of task; it is an especially handy tool when it comes to customizing the way your pipeline works. Speaking of pipelines, AWS Lambda is a service that can be directly integrated with AWS CodePipeline, and the combination of these two services makes it possible for AWS customers to automate various tasks, including infrastructure provisioning, blue/green deployments, serverless deployments, AMI baking, database provisioning, and dealing with asynchronous behavior.

Problem Statement :

Our customer has a requirement to trigger and monitor the status of a Step Functions state machine, which is a long-running asynchronous process. The customer uses AWS Step Functions to run ETL jobs with the help of AWS Glue jobs and AWS EMR. We proposed to achieve this with Lambda, but Lambda has a timeout limitation of 15 minutes. The real problem is that such an asynchronous process needs to continue and succeed even if it exceeds the fifteen-minute runtime limit of Lambda.

In this blog we present a solution that automates this approach by combining Lambda and AWS CodePipeline using continuation tokens.

Assumptions :

This blog assumes you are familiar with AWS CodePipeline and AWS Lambda and know how to create pipelines, functions, Glue jobs and the IAM policies and roles on which they depend.

Pre-requisites:

  1. Glue jobs have already been configured.
  2. A Step Functions state machine is configured to run the Glue jobs.
  3. A CodeCommit repository exists for the Glue scripts.

Solution :

In this blog post, we discuss how a CodePipeline action can trigger a Step Functions state machine and how the pipeline and the state machine are kept decoupled through a Lambda function.

The source code for the sample pipeline, pipeline actions, and state machine used in this post is available at https://github.com/powerupcloud/lambdacodepipeline.git.

The below diagram highlights the CodePipeline-StepFunctions integration that will be described in this post. The pipeline contains two stages: a Source stage represented by a CodeCommit Git repository and a DEV stage with CodeCommit, CodeBuild and Invoke Lambda actions that represent the workflow-driven action.

The steps involved in the CI/CD pipeline:

  1. Developers commit the AWS Glue job's code to source control (AWS CodeCommit).
  2. The AWS CodePipeline in the Tools Account gets triggered due to step 1.
  3. The CodeBuild step involves multiple things, as mentioned below:
    • Installation of the dependencies and packages needed
    • Copying the Glue and EMR job scripts to the S3 location from where the Glue jobs will pick up the script
  4. CHECK_OLD_SFN: A Lambda function is invoked to ensure that the previous Step Functions execution is not still in a running state before we run the actual Step Function. The process is as follows:
    • This action invokes a Lambda function (1).
    • In (2), the Lambda checks the state machine status, which returns the Step Functions state machine status.
    • In (3), the Lambda gets the execution state of the state machine (RUNNING || COMPLETED || TIMEOUT).
    • In (4), the Lambda function sends a continuation token back to the pipeline.

If the state machine is still in the RUNNING state, then seconds later the pipeline invokes the Lambda function again (4), passing the continuation token it received. The Lambda function checks the execution state of the state machine and communicates the status to the pipeline. The process is repeated until the state machine execution is complete.

Else (5), the Lambda sends a job completion result and completes the pipeline stage.

  5. TRIGGER_SFN_and_CONTINUE: A Lambda function is invoked to start the new Step Functions execution and check the status of that execution. The process is as follows (a simplified sketch of this continuation-token pattern is shown after this list):
    • This action invokes a Lambda function (1), which, in turn, triggers the Step Functions state machine to process the request (2).
    • The Lambda function sends a continuation token back to the pipeline (3) to continue its execution later, and terminates.
    • Seconds later, the pipeline invokes the Lambda function again (4), passing the continuation token it received. The Lambda function checks the execution state of the state machine (5, 6) and communicates the status to the pipeline. The process is repeated until the state machine execution is complete.
    • Then the Lambda function notifies the pipeline that the corresponding pipeline action is complete (7). If the state machine has failed, the Lambda function fails the pipeline action and stops its execution (7). While running, the state machine triggers various Glue jobs to perform ETL operations. The state machine and the pipeline are fully decoupled; their interaction is handled by the Lambda function.
  6. Approval to the higher environment: in this stage, we add a Manual Approval action to the pipeline in CodePipeline, which can be implemented following https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-action-add.html
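The actual check_StepFunction.py and trigger_StepFunction.py scripts live in the sample repository linked above; the snippet below is only a rough, simplified sketch of the continuation-token pattern those functions use (the TRIGGER variant would additionally call stepfunctions.start_execution before returning its token). The UserParameters payload follows the JSON format described in the deployment steps.

import json
import boto3

codepipeline = boto3.client('codepipeline')
stepfunctions = boto3.client('stepfunctions')


def lambda_handler(event, context):
    job = event['CodePipeline.job']
    job_id = job['id']

    # UserParameters is the JSON string configured on the pipeline action,
    # e.g. {"stateMachineARN": "<ARN_OF_STATE_MACHINE>"}
    params = json.loads(
        job['data']['actionConfiguration']['configuration']['UserParameters'])
    state_machine_arn = params['stateMachineARN']

    # Is any execution of this state machine still running?
    running = stepfunctions.list_executions(
        stateMachineArn=state_machine_arn,
        statusFilter='RUNNING')['executions']

    if running:
        # Still running: return a continuation token so CodePipeline
        # re-invokes this function again after a short wait.
        codepipeline.put_job_success_result(
            jobId=job_id,
            continuationToken=json.dumps({'check': 'again'}))
    else:
        # Nothing running: mark the pipeline action as complete.
        codepipeline.put_job_success_result(jobId=job_id)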

Deployment Steps :

Step 1: Create a Pipeline

  1. Sign in to the AWS Management Console and open the CodePipeline console at http://console.aws.amazon.com/codesuite/codepipeline/home.
  2. On the Welcome page, Getting started page, or the Pipelines page, choose Create pipeline.
  3. In Choose pipeline settings, in Pipeline name, enter the pipeline name.
  4. In Service role, do one of the following:
    • Choose New service role to allow CodePipeline to create a new service role in IAM.
    • Choose Existing service role to use a service role already created in IAM. In Role name, choose your service role from the list.
  5. Leave the settings under Advanced settings at their defaults, and then choose Next.

6. In Add source stage, for Source provider, choose CodeCommit.

7. Provide the Repository name and Branch name.

8. In Change detection options, choose AWS CodePipeline.

9. In Add build stage, for Build provider choose AWS CodeBuild, and choose the Region.

10. Select an existing Project name or choose Create project.

11. You can add Environment variables, which you may use in the buildspec.yaml file, and click Next.

NOTE: The build step serves a very specific purpose here: we copy the Glue script from source control (AWS CodeCommit) to the S3 bucket from where the Glue job picks up its script for its next execution.

12. In Add deploy stage, choose Skip deploy stage.

13. Finally, click Create pipeline.

Step 2: Create the CHECK OLD SFN LAMBDA Lambda Function

  1. Create the execution role
  • Sign in to the AWS Management Console and open the IAM console

Choose Policies, and then choose Create Policy. Choose the JSON tab, and then paste the following policy into the field.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "states:*",
                "codepipeline:PutJobFailureResult",
                "codepipeline:PutJobSuccessResult"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "logs:*",
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}
  • Choose Review policy.
  • On the Review policy page, in Name, type a name for the policy (for example, CodePipelineLambdaExecPolicy). In Description, enter Enables Lambda to execute code.
  • Choose Create Policy.
  • On the policy dashboard page, choose Roles, and then choose Create role.
  • On the Create role page, choose AWS service. Choose Lambda, and then choose Next: Permissions.
  • On the Attach permissions policies page, select the checkbox next to CodePipelineLambdaExecPolicy, and then choose Next: Tags. Choose Next: Review.
  • On the Review page, in Role name, enter the name, and then choose Create role.

2. Create the CHECK_OLD_SFN_LAMBDA Lambda function to use with CodePipeline

  • Open the Lambda console and choose the Create function.
  • On the Create function page, choose Author from scratch. In Function name, enter a name for your Lambda function (for example, CHECK_OLD_SFN_LAMBDA).
  • In Runtime, choose Python 2.7.
  • Under Role, select Choose an existing role. In Existing role, choose the role you created earlier, and then choose Create function.
  • The detail page for your created function opens.
  • Copy the check_StepFunction.py code into the Function code box.
  • In Basic settings, for Timeout, replace the default of 3 seconds with 5 minutes.
  • Choose Save.

3. Create the TRIGGER_SFN_and_CONTINUE Lambda function to use with CodePipeline

  • Open the Lambda console and choose the Create function.
  • On the Create function page, choose Author from scratch. In Function name, enter a name for your Lambda function (for example, TRIGGER_SFN_and_CONTINUE).
  • In Runtime, choose Python 2.7.
  • Under Role, select Choose an existing role. In Existing role, choose the role you created earlier, and then choose Create function.
  • The detail page for your created function opens.
  • Copy the trigger_StepFunction.py code into the Function code box.
  • In Basic settings, for Timeout, replace the default of 3 seconds with 5 minutes.
  • Choose Save.

Step 3: Add the CHECK_OLD_SFN_LAMBDA Lambda Function to a Pipeline in the CodePipeline Console

In this step, you add a new stage to your pipeline, and then add a Lambda action that calls your function to that stage.

To add a stage

  • Sign in to the AWS Management Console and open the CodePipeline console at http://console.aws.amazon.com/codesuite/codepipeline/home.
  • On the Welcome page, choose the pipeline you created.
  • On the pipeline view page, choose Edit.
  • On the Edit page, choose + Add stage to add a stage after the Build stage. Enter a name for the stage (for example, CHECK_OLD_SFN_LAMBDA), and choose Add stage.
  • Choose + Add action group. In Edit action, in Action name, enter a name for your Lambda action (for example, CHECK_OLD_SFN_LAMBDA). In Provider, choose AWS Lambda. In Function name, choose or enter the name of your Lambda function (for example, CHECK_OLD_SFN_LAMBDA).
  • In UserParameters, you must provide a JSON string with a parameter: { "stateMachineARN": "<ARN_OF_STATE_MACHINE>" }
  • Choose Save.

Step 4: Add the TRIGGER_SFN_and_CONTINUE  Lambda Function to a Pipeline in the CodePipeline Console

In this step, you add a new stage to your pipeline, and then add a Lambda action that calls your function to that stage.

To add a stage

  • Sign in to the AWS Management Console and open the CodePipeline console at http://console.aws.amazon.com/codesuite/codepipeline/home.
  • On the Welcome page, choose the pipeline you created.
  • On the pipeline view page, choose Edit.
  • On the Edit page, choose + Add stage to add a stage after the previous stage. Enter a name for the stage (for example, TRIGGER_SFN_and_CONTINUE), and choose Add stage.
  • Choose + Add action group. In Edit action, in Action name, enter a name for your Lambda action (for example, TRIGGER_SFN_and_CONTINUE). In Provider, choose AWS Lambda. In Function name, choose or enter the name of your Lambda function (for example, TRIGGER_SFN_and_CONTINUE).
  • In UserParameters, you must provide a JSON string with a parameter: { "stateMachineARN": "<ARN_OF_STATE_MACHINE>" }
  • Choose Save.

Step 5: Test the Pipeline with the Lambda function

  • To test the function, release the most recent change through the pipeline.
  • To use the console to run the most recent version of an artifact through a pipeline
  • On the pipeline details page, choose Release change. This runs the most recent revision available in each source location specified in a source action through the pipeline.
  • When the Lambda action is complete, choose the Details link to view the log stream for the function in Amazon CloudWatch, including the billed duration of the event. If the function failed, the CloudWatch log provides information about the cause.

Example JSON Event

The following example shows a sample JSON event sent to Lambda by CodePipeline. The structure of this event is similar to the response to the GetJobDetails API, but without the actionTypeId and pipelineContext data types. Two action configuration details, FunctionName and UserParameters, are included in both the JSON event and the response to the GetJobDetails API. The values shown are examples or explanations, not real values.

{
    "CodePipeline.job": {
        "id": "11111111-abcd-1111-abcd-111111abcdef",
        "accountId": "111111111111",
        "data": {
            "actionConfiguration": {
                "configuration": {
                    "FunctionName": "MyLambdaFunctionForAWSCodePipeline",
                    "UserParameters": "some-input-such-as-a-URL"
                }
            },
            "inputArtifacts": [
                {
                    "location": {
                        "s3Location": {
                            "bucketName": "s3-bucket-name",
                            "objectKey": "for example CodePipelineDemoApplication.zip"
                        },
                        "type": "S3"
                    },
                    "revision": null,
                    "name": "ArtifactName"
                }
            ],
            "outputArtifacts": [],
            "artifactCredentials": {
                "secretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                "sessionToken": "MIICiTCCAfICCQD6m7oRw0uXOjANBgkqhkiG9w
0BAQUFADCBiDELMAkGA1UEBhMCVVMxCzAJBgNVBAgTAldBMRAwDgYDVQQHEwdTZEDmFJl0ZxBHjJnyp378OD8uTs7fLvjx79LjSTbNYiytVbZPQUQ5Yaxu2jXnimvwdasdadasljdajldlakslkdjakjdkaljdaljdasljdaljdalklakkoi9494k3k3owlkeroieowiruwpirpdk3k23j2jk234hjl2343rrszlaEXAMPLE=",
                "accessKeyId": "AKIAIOSFODNN7EXAMPLE"
            },
            "continuationToken": "A continuation token if continuing job",
            "encryptionKey": { 
              "id": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
              "type": "KMS"
            }
        }
    }
}

Conclusion

In this blog post, we discussed how a Lambda function can be used to fully decouple the pipeline and the state machine and manage their interaction. We also learned how asynchronous processes that need to continue and succeed, even if they exceed the fifteen-minute runtime limit of Lambda, are handled using continuation tokens.

Please Visit our Blogs for more interesting articles.

Securing Spring Boot and React JS with Spring Security using JWT authentication


Written by Kiran M D, Software Engineer, Powerupcloud Technologies

This article helps you set up Spring Security with Basic and JWT authentication in a full-stack application, using React JS as the frontend framework and Spring Boot as the backend REST API.

Let's first understand what a JWT token is used for and how we are going to use it in our application.

JSON Web Token

JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed.

How does it work? 

In authentication, when the user successfully logs in using their credentials, a JSON Web Token will be returned. Since the tokens are credentials, we must prevent security bugs/breaches. In general, you should not keep tokens longer than required.

Whenever the user wants to access a protected route or resource, the user agent should send the JWT, typically in the Authorization header using the Bearer schema. The content of the header should look like the following:

Sample JSON:

{
  "Authorization": "Bearer <token>"
}

Let's see how we can integrate this with Spring Boot and React JS.

1. Creating spring boot application and configuring JWT authentication.

1.1 Creating a sample spring boot application

A basic Spring Boot application can be generated using Spring Initializr with the following dependencies:

1. Spring Web

2. Spring Security

Open the Spring Initializr URL and add the above dependencies.

Spring Boot REST API Project Structure

The following screenshot shows the structure of the Spring Boot project in which we create Basic Authentication.

1.2 Add the below dependencies in pom.xml for the JWT token.

<dependency>
	<groupId>io.jsonwebtoken</groupId>
	<artifactId>jjwt</artifactId>
	<version>0.9.1</version>
</dependency>

<dependency>
	<groupId>org.json</groupId>
	<artifactId>json</artifactId>
	<version>20180813</version>
</dependency>

1.3 Create the following files in config package.

JwtAuthenticationEntryPoint.java

The purpose of this file is to handle exceptions: whenever the JWT token is not validated, it returns an Unauthorized (401) error.

@Component
public class JwtAuthenticationEntryPoint implements AuthenticationEntryPoint, Serializable
{
    @Override
    public void commence(HttpServletRequest request,
            HttpServletResponse response,
            AuthenticationException authException) throws IOException
    {
        response.sendError(HttpServletResponse.SC_UNAUTHORIZED, "Unauthorized");
    }
}

JwtRequestFilter.java

The purpose of this file is to filter requests coming from the client side (React JS). Every request arrives here first, before hitting the REST API; if token validation is successful, the actual API receives the request.

@Component
public class JwtRequestFilter extends OncePerRequestFilter
{
	@Autowired
	private JwtUserDetailsService jwtUserDetailsService;

	@Autowired
	private JwtTokenUtil jwtTokenUtil;

	@Override
	protected void doFilterInternal(HttpServletRequest request,
			HttpServletResponse response, FilterChain chain)
			throws ServletException, IOException
	{
		final String requestTokenHeader = request.getHeader("authorization");

		String username = null;
		String jwtToken = null;

		// JWT Token is in the form "Bearer token".
		// Remove the Bearer word and get only the token.
		if (requestTokenHeader != null && requestTokenHeader.startsWith("Bearer "))
		{
			jwtToken = requestTokenHeader.substring(7);
			try {
				username = jwtTokenUtil.getUsernameFromToken(jwtToken);
			} catch (IllegalArgumentException e) {
				System.out.println("Unable to get JWT Token");
			} catch (ExpiredJwtException e) {
				System.out.println("JWT Token has expired");
			}
		} else {
			logger.warn("JWT Token does not begin with Bearer String");
		}

		// Once we get the token, validate it.
		if (username != null && SecurityContextHolder.getContext().getAuthentication() == null)
		{
			UserDetails userDetails = this.jwtUserDetailsService.loadUserByUsername(username);

			// If the token is valid, configure Spring Security to manually set
			// the authentication.
			if (jwtTokenUtil.validateToken(jwtToken, userDetails))
			{
				UsernamePasswordAuthenticationToken usernamePasswordAuthenticationToken =
						new UsernamePasswordAuthenticationToken(userDetails, null, userDetails.getAuthorities());
				usernamePasswordAuthenticationToken
						.setDetails(new WebAuthenticationDetailsSource().buildDetails(request));

				// After setting the Authentication in the context, we specify
				// that the current user is authenticated, so it passes the
				// Spring Security configurations successfully.
				SecurityContextHolder.getContext().setAuthentication(usernamePasswordAuthenticationToken);
			}
		}
		chain.doFilter(request, response);
	}
}

JwtTokenUtil.java

A util class to create and validate the JWT token.

@Component
public class JwtTokenUtil implements Serializable
{
	
	public static final long JWT_TOKEN_VALIDITY = 1000 * 3600;
	
	@Value("${jwt.secret}")
	private String secret;

	// retrieve username from jwt token
	public String getUsernameFromToken(String token) {
		return getClaimFromToken(token, Claims::getSubject);
	}

	// retrieve expiration date from jwt token
	public Date getExpirationDateFromToken(String token) {
		return getClaimFromToken(token, Claims::getExpiration);
	}

	public <T> T getClaimFromToken(String token, Function<Claims, T> claimsResolver) {
		final Claims claims = getAllClaimsFromToken(token);
		return claimsResolver.apply(claims);
	}

	// for retrieving any information from the token we will need the secret key
	private Claims getAllClaimsFromToken(String token) {
		return Jwts.parser().setSigningKey(secret)
		.parseClaimsJws(token).getBody();
	}

	// check if the token has expired
	private Boolean isTokenExpired(String token) {
		final Date expiration = getExpirationDateFromToken(token);
		return expiration.before(new Date());
	}

	// generate token for user
	public String generateToken(UserDetails userDetails) {
		Map<String, Object> claims = new HashMap<>();
		String username = userDetails.getUsername();
		return doGenerateToken(claims, username);
	}

	// while creating the token -
	// 1. Define claims of the token, like Issuer, Expiration, Subject, and the ID
	// 2. Sign the JWT using the HS512 algorithm and secret key.
	// 3. According to JWS Compact
	// compaction of the JWT to a URL-safe string
	private String doGenerateToken(Map<String, Object> claims, String subject) {
		return Jwts.builder().setClaims(claims).setSubject(subject)
			.setIssuedAt(new Date(System.currentTimeMillis()))
			.setExpiration(new Date(System.currentTimeMillis() + JWT_TOKEN_VALIDITY))
			.signWith(SignatureAlgorithm.HS512, secret).compact();
	}

	// validate token
	public Boolean validateToken(String token, UserDetails userDetails) {
		final String username = getUsernameFromToken(token);
		return (username.equals(userDetails.getUsername()) && !isTokenExpired(token));
	}
}

WebSecurityConfig.java

Spring Security is configured in this file, which extends WebSecurityConfigurerAdapter.

@Configuration
@EnableWebSecurity
@EnableGlobalMethodSecurity(prePostEnabled = true)
public class WebSecurityConfig extends WebSecurityConfigurerAdapter 
{
	@Autowired
	private JwtAuthenticationEntryPoint jwtAuthenticationEntryPoint;

	@Autowired
	private UserDetailsService jwtUserDetailsService;

	@Autowired
	private JwtRequestFilter jwtRequestFilter;

	@Autowired
	public void configureGlobal(AuthenticationManagerBuilder auth) throws Exception {
		// configure AuthenticationManager so that it knows from where to load
		// user for matching credentials
		// Use BCryptPasswordEncoder
		auth.userDetailsService(jwtUserDetailsService).passwordEncoder(passwordEncoder());
	}

	@Bean
	public PasswordEncoder passwordEncoder() {
		return new BCryptPasswordEncoder();
	}

	@Bean
	@Override
	public AuthenticationManager authenticationManagerBean() throws Exception {
		return super.authenticationManagerBean();
	}

	public void addCorsMappings(CorsRegistry registry) {
		registry.addMapping("/**").allowedOrigins("*")
		.allowedMethods("HEAD", "GET", "PUT", "POST",
		"DELETE", "PATCH").allowedHeaders("*");
	}

	@Override
	protected void configure(HttpSecurity httpSecurity) throws Exception {
		// We don't need CSRF for this example
		httpSecurity
		.cors()
		.and()
		.csrf()
		.disable()
		.headers()
		.frameOptions()
		.deny()
		.and()
		// dont authenticate this particular request
		.authorizeRequests().antMatchers("/authenticate").permitAll().
		// all other requests need to be authenticated
		anyRequest().authenticated().and().
		// make sure we use stateless session; session won't be used to
		// store user's state.
		exceptionHandling().authenticationEntryPoint(jwtAuthenticationEntryPoint).and().sessionManagement()
		.sessionCreationPolicy(SessionCreationPolicy.STATELESS);
		// Add a filter to validate the tokens with every request
		httpSecurity.addFilterBefore(jwtRequestFilter, UsernamePasswordAuthenticationFilter.class);
	}
}

1.4 Create the following files in the controller package.

HelloWorldController.java

A simple REST API to test requests; after token authentication is successful, the request reaches here.

@RestController
@CrossOrigin(origins = "*", allowedHeaders = "*")
public class HelloWorldController {

	@RequestMapping("/dashboard")
	public String firstPage() {
		return "success";
	}
}

JwtAuthenticationController.java

This file contains the authentication REST API, which receives the username and password and returns a JWT token on a successful response.

@RestController
@CrossOrigin
public class JwtAuthenticationController {
	
	@Autowired
	private AuthenticationManager authenticationManager;
	
	@Autowired
	private JwtTokenUtil jwtTokenUtil;
	
	@Autowired
	private JwtUserDetailsService userDetailsService;
	
	@RequestMapping(value = "/authenticate", method = RequestMethod.POST)
	public ResponseEntity<?> createAuthenticationToken(@RequestBody JwtRequest authenticationRequest) throws Exception 
	{
		// authenticate(authenticationRequest.getUsername(),
		//		authenticationRequest.getPassword());
		final UserDetails userDetails =
				userDetailsService.loadUserByUsername(authenticationRequest.getUsername());
		// JwtUserDetails userDetails = new JwtUserDetails();
		// userDetails.setUsername(authenticationRequest.getUsername());
			
			
		final String token = jwtTokenUtil.generateToken(userDetails);
		return ResponseEntity.ok(new JwtResponse(token));
	}
	
	private void authenticate(String username, String password) throws Exception {
		try {
			authenticationManager.authenticate(new UsernamePasswordAuthenticationToken(username, password));
		} catch (DisabledException e) {
			throw new Exception("USER_DISABLED", e);
		} catch (BadCredentialsException e) {
			throw new Exception("INVALID_CREDENTIALS", e);
		}
	}
	
}

1.5 Create the following files in the model package.

JwtRequest.java

It's a POJO class that contains the username and password, used to receive the request data in the authentication method.

public class JwtRequest implements Serializable {
	
	private String username;
	private String password;

	// need default constructor for JSON Parsing
	public JwtRequest() {

	}

	public JwtRequest(String username, String password) {
		this.setUsername(username);
		this.setPassword(password);
	}

	public String getUsername() {
		return this.username;
	}

	public void setUsername(String username) {
		this.username = username;
	}

	public String getPassword() {
		return this.password;
	}

	public void setPassword(String password) {
		this.password = password;
	}
}

JwtResponse.java

It's a POJO class that returns the JWT token string; if we need to send any other field in the response, we need to declare it in this file.

public class JwtResponse implements Serializable
{
	
	private final String jwttoken;

	public JwtResponse(String jwttoken) {
		this.jwttoken = jwttoken;
	}

	public String getToken() {
		return this.jwttoken;
	}
}

JwtUserDetails.java

The class that implements the Spring Security UserDetails fields.

@SuppressWarnings("serial")
public class JwtUserDetails implements org.springframework.security.core.userdetails.UserDetails {

	private String username;

	@Override
	public Collection<? extends GrantedAuthority> getAuthorities() {
		return null;
	}

	@Override
	public String getPassword() {
		return null;
	}
                                                                                                                                                                                                                   
	@Override
	public String getUsername() {
		return username;
	}

	@Override
	public boolean isAccountNonExpired() {
		return false;
	}

	@Override
	public boolean isAccountNonLocked() {
		return false;
	}

	@Override
	public boolean isCredentialsNonExpired() {
		return false;
	}

	@Override
	public boolean isEnabled() {
		return false;
	}

	public void setUsername(String username) {
		this.username = username;
	}

}

1.6 Create JwtUserDetailsService in service package.

JwtUserDetailsService.java

To validate username and password and return user details object.

@Service
public class JwtUserDetailsService implements UserDetailsService {
	
	@Override
	public UserDetails loadUserByUsername(String username) throws UsernameNotFoundException 
	{
		if ("admin".equals(username)) 
		{
			return new User("admin", "$2a$10$slYQmyNdGzTn7ZLBXBChFOC9f6kFjAqPhccnP6DxlWXx2lPk1C3G6",
					new ArrayList<>());
		} else {
			throw new UsernameNotFoundException("User not found with username: " + username);
		}
	}
}

Now we are done with the server-side setup; next, we will move to the second step.

2. Creating React JS application and accessing rest API using JWT token.

Run the below command in a command prompt to generate the React application.

Command: npx create-react-app demo-app

After creating the application, use your preferred IDE to import it.

Understanding the React js Project Structure

The following screenshot shows the structure of the React JS project. Inside the src folder we are going to create the login.js, dashboard.js and Interceptors.js files, as below.

2.1.Login.js  

Here we have a hardcoded username and password; after a successful login, we receive the JWT token as a response from the server and save it in local storage.

import React, { Component } from "react";
import axios from "axios";

class login extends Component {
  constructor() {
    super();

    this.state = {
      username: "admin",
      password: "admin"
    };
    this.handleFormSubmit = this.handleFormSubmit.bind(this);
  }

  handleFormSubmit = event => {
    event.preventDefault();

    const endpoint = "http://localhost:8080/authenticate";

    const username = this.state.username;
    const password = this.state.password;

    const user_object = {
      username: username,
      password: password
    };

    axios.post(endpoint, user_object).then(res => {
      localStorage.setItem("authorization", res.data.token);
      return this.handleDashboard();
    });
  };

  handleDashboard() {
    axios.get("http://localhost:8080/dashboard").then(res => {
      if (res.data === "success") {
        this.props.history.push("/dashboard");
      } else {
        alert("Authentication failure");
      }
    });
  }

  render() {
    return (
      <div>
        <div class="wrapper">
          <form class="form-signin" onSubmit={this.handleFormSubmit}>
            <h2 class="form-signin-heading">Please login</h2>
            <div className="form-group">
              <input type="text"
                class="form-control"
                placeholder="User name"
                value="admin"
              />
            </div>
            <div className="form-group">
              <input type="password"
                class="form-control"
                placeholder="password"
                value="admin"
              />
            </div>
            <button class="btn btn-lg btn-primary btn-block" type="submit">
              Login
            </button>
          </form>
        </div>
      </div>
    );
  }
}
export default login;

2.2.Dashboard.js

This is the home page of the application after logging in.

import React, { Component } from "react";

class dashboard extends Component {
  handleLogout() {
    localStorage.clear();
    window.location.href = "/";
  }

  render() {
    return (
      <div>
        <h1>WELCOME TO DASHBOARD</h1>
        
        <a
          href="javascript:void(0);"
          onClick={this.handleLogout}
          className="d-b td-n pY-5 bgcH-grey-100 c-grey-700">
          <i className="ti-power-off mR-10"></i>
          <span style={{ color: "white" }}>Logout</span>
        </a>
      </div>
    );
  }
}
export default dashboard;

2.3.Interceptors.js

This is a global configuration that will intercept each request by adding an authorization header with a JWT token that is stored in local storage.

var axios = require("axios");

export const jwtToken = localStorage.getItem("authorization");

axios.interceptors.request.use(
  function(config) {
    if (jwtToken) {
      config.headers["authorization"] = "Bearer " + jwtToken;
    }
    return config;
  },
  function(err) {
    return Promise.reject(err);
  }
);

2.4.App.js

This file is the entry component for the React application; a new route should be configured whenever a new page component is added, as below.

import React from "react";
import "./App.css";
import { BrowserRouter, Route } from "react-router-dom";
import interceptors from "../src/Interceptors";
import login from "./login";
import dashboard from "./dashboard";

function App() {
  return (
    <div className="App">
      <header className="App-header">
        <BrowserRouter>
          <Route exact path="/" component={login} />
          <Route exact path="/dashboard" component={dashboard} />
        </BrowserRouter>
      </header>
    </div>
  );
}

export default App;

Note: Once all the files are added to the React application, start the Spring Boot application and start the npm development server using the below command.

Command : npm start

Once the application has started, you can access it using the below URL.

URL : http://localhost:3000

 

JWT Authentication URLs

You can send a POST request to

http://domain-name:port/authenticate with the request body containing the credentials.

{
  "username": "admin",
  "password": "admin"
}

The Response contains the JWT token

{
"token": "eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJyYW5nYSIsImV4cCI6MTU0MjQ3MjA3NCwiaWF0IoxNTQxODY3Mjc0fQ.kD6UJQyxjSPMzAhoTJRr-Z5UL-FfgsyxbdseWQvk0fLi7eVXAKhBkWfj06SwH43sY_ZWBEeLuxaE09szTboefw"
}
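For a quick check outside the browser, a small Python sketch using the requests library can exercise both endpoints; the localhost URL and the admin/admin credentials match the demo setup above.

import requests

BASE_URL = "http://localhost:8080"

# 1. Authenticate with the demo credentials and grab the JWT token
resp = requests.post(BASE_URL + "/authenticate",
                     json={"username": "admin", "password": "admin"})
token = resp.json()["token"]

# 2. Call the protected endpoint with the Bearer token
dashboard = requests.get(BASE_URL + "/dashboard",
                         headers={"Authorization": "Bearer " + token})
print(dashboard.status_code, dashboard.text)  # expect: 200 success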

Since this is a demo project, the application will work only with the username admin and password admin.

After a successful login, the application lands on the home page, which looks similar to the below image.

Conclusion

In this article, we added authentication to our React Js app. We secured our REST APIs in the server-side and our private routes at the client-side.

Amazon EBS Multi-Attach now available on Provisioned IOPS io1 volumes


Prepared by Srividhya T (Cloud Engineer) and Jaswinder kour (Cloud Engineer)

Starting today, customers running Linux on Amazon Elastic Compute Cloud (EC2) can take advantage of new support for attaching Provisioned IOPS (io1) Amazon Elastic Block Store (EBS) volumes to multiple EC2 instances. Each EBS volume, when configured with the new Multi-Attach option, can be attached to a maximum of 16 EC2 instances in a single Availability Zone. Additionally, each Nitro-based EC2 instance can support the attachment of multiple Multi-Attach enabled EBS volumes. Multi-Attach capability makes it easier to achieve higher availability for applications that provide write ordering to maintain storage consistency.

Applications can attach Multi-Attach volumes as non-boot data volumes, with full read and write permission. Snapshots can be taken of volumes configured for Multi-Attach, just as with regular volumes, but additionally, the snapshot can be initiated from any instance that the volume is attached to, and Multi-Attach volumes also support encryption. Multi-Attach enabled volumes can be monitored using Amazon CloudWatch metrics, and to monitor performance per instance, you can use the Linux iostat tool.

I mentioned above that your applications do need to provide write ordering to maintain storage consistency, because if multiple instances write data at the same time there is a risk of data being overwritten and becoming inconsistent. One simple possibility for Linux is to use a single-writer, multiple-reader approach, where the volume is mounted read-write on one instance and read-only on all others. Or you can choose to enforce write ordering and consistency within your application code.

It supports the following features:

  • Applications can attach Multi-Attach volumes as non-boot data volumes, with full read and write permission.
  • Snapshots can be taken of volumes configured for Multi-Attach, just as with regular volumes, and the Amazon EBS snapshot lifecycle can also be automated.
  • Multi-Attach volumes support EBS encryption.
  • Amazon CloudWatch metrics can be used to monitor Multi-Attach enabled volumes, and to monitor performance per instance you can use the Linux iostat tool.
  • Multi-Attach EBS volumes support Amazon CloudWatch Events.

Limitations

  • Multi-Attach enabled volumes can be attached to up to 16 Nitro-based instances that are in the same Availability Zone.
  • Multi-Attach is available in the N. Virginia (us-east-1), Oregon (us-west-2), Ireland (eu-west-1), and Asia Pacific (Seoul) Regions.
  • Multi-Attach enabled volumes can’t be created as boot volumes.
  • Multi-Attach enabled volumes can be attached to one block device mapping per instance.
  • You can't enable or disable Multi-Attach after volume creation. You can't change the volume type, size, or Provisioned IOPS of a Multi-Attach enabled volume.
  • Multi-Attach can’t be enabled during instance launch using either the Amazon EC2 console or Run Instances API.
  • Multi-Attach enabled volumes that have an issue at the Amazon EBS infrastructure layer are unavailable to all attached instances. Issues at the Amazon EC2 or networking layer might only impact some attached instances.
  • You can enable Multi-Attach for an Amazon EBS volume during creation only.

Getting Started With Multi Attach EBS Volumes

Configuring and using Multi-Attach volumes is a simple process for new volumes using either the AWS Command Line Interface (CLI) or the AWS Management Console.

Here I am going to create a volume configured for Multi-Attach and attach it to two Linux EC2 instances.

From one instance I will write a simple text file, and from the other instance I will read the contents.

In the AWS Management Console I first navigate to the EC2 homepage, select Volumes from the navigation panel and then click Create Volume.

Choosing Provisioned IOPS SSD (io1) for Volume Type, I enter my desired size and IOPS and then check the Multi-Attach option.

To instead do this using the AWS Command Line Interface (CLI), we simply use the ec2 create-volume command with the --multi-attach-enabled option, as shown below.
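The original post shows this step as a console/CLI screenshot; as an assumed boto3 equivalent, it might look like the sketch below, where the size, IOPS and Availability Zone are placeholder values matching the walkthrough.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an io1 volume with Multi-Attach enabled
# (size, IOPS and AZ are placeholder values).
volume = ec2.create_volume(
    VolumeType="io1",
    Size=100,
    Iops=1000,
    AvailabilityZone="us-east-1a",
    MultiAttachEnabled=True,
)
print(volume["VolumeId"])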

I can verify that Multi-Attach is enabled on my volume from the Description tab when the volume is selected. The volume table also contains a column, Multi-Attach Enabled, that displays a simple 'yes/no' value.

With the volume created and ready for use, I next launch two T3 EC2 instances running Linux. Remember, Multi-Attach needs an AWS Nitro System based instance type and the instances have to be created in the same Availability Zone as my volume. My instances are running Amazon Linux 2, and have been placed into the us-east-1a Availability Zone, matching the placement of my new Multi-Attach enabled volume.

Once the instances are running, it’s time to attach my volume to both of them. I click Volumes from the EC2 dashboard, then select the Multi-Attach volume I created. From the Actions menu, I click Attach Volume. In the screenshot below you can see that I have already attached the volume to one instance, and am attaching to the second.

If I’m using the AWS Command Line Interface (CLI) to attach the volume, I make use of the ec2 attach-volume command, as I would for any other volume type:
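Again, the original shows the CLI command as a screenshot; an assumed boto3 sketch of the same attach operation, with placeholder volume and instance IDs, would be:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Attach the Multi-Attach volume to each instance in turn
# (volume and instance IDs below are placeholders).
for instance_id in ["i-0123456789abcdef0", "i-0fedcba9876543210"]:
    ec2.attach_volume(
        VolumeId="vol-0123456789abcdef0",
        InstanceId=instance_id,
        Device="/dev/sdf",
    )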

For a given volume, the AWS Management Console shows me which instances it is attached to, or those currently being attached, when I select the volume:

With the volume attached to both instances, let’s make use of it with a simple test. Selecting my first instance in the Instances view of the EC2 dashboard, I click Connect and then open a shell session onto the instance using AWS Systems Manager‘s Session Manager. Following the instructions here, I created a file system on the new volume attached as /dev/sdf, mounted it as /data, and using vim I write some text to a file.

sudo mkfs -t xfs /dev/sdf

sudo mkdir /data

sudo mount /dev/sdf /data

cd /data

sudo vim file1.txt

Selecting my second instance in the AWS Management Console, I repeat the connection steps. I don’t need to create a file system this time but I do again mount the /dev/sdf volume as /data (although I could use a different mount point if I chose). On changing directory to /data, I see that the file I wrote from my first instance exists, and contains the text I expect.

Creating and working with Multi-Attach volumes is simple! Just remember, they need to be in the same Availability Zone as the instances they are to be attached to.

Detaching an Amazon EBS Volume from an Instance

We can detach an Amazon EBS volume from an instance explicitly or by terminating the instance. However, if the instance is running, you must first unmount the volume from the instance. If an EBS volume is the root device of an instance, you must stop the instance before you can detach the volume.

We can reattach a volume that we detached (without unmounting it), but it might not get the same mount point. If there were writes to the volume in progress when it was detached, the data on the volume might be out of sync.

To unmount the /dev/sdf device, we use the following command:

umount -d /dev/sdf

In the navigation pane, choose Volumes.

Select a volume and choose Actions, Detach Volume.

In the confirmation dialog box, choose Yes, Detach.
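To detach from the CLI instead, the ec2 detach-volume command should do the job; the IDs below are placeholders:

aws ec2 detach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0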

Using Delete-on-Termination with Multi-Attach Volumes

If you prefer to use the option to delete attached volumes on EC2 instance termination, we recommend a consistent setting across all of the instances that a Multi-Attach volume is attached to: use either all delete or all retain, to allow for predictable termination behavior. If you attach the volume to a set of instances that have differing values for Delete-on-Termination, then deletion of the volume depends on whether the last instance to detach is set to delete or not.
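If you need to align the setting on an instance after launch, a hedged sketch using modify-instance-attribute (placeholder instance ID and device name) looks like this:

# Set DeleteOnTermination=false for the /dev/sdf mapping on one instance
aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --block-device-mappings '[{"DeviceName":"/dev/sdf","Ebs":{"DeleteOnTermination":false}}]'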

Monitoring

You can monitor a Multi-Attach enabled volume using CloudWatch metrics for Amazon EBS volumes.

Pricing and Billing

There are no additional charges for using Amazon EBS Multi-Attach. You are billed the standard charges that apply to Provisioned IOPS SSD (io1) volumes.

Difference between EBS and EFS

Here is a point-by-point comparison of EBS (Elastic Block Store) and EFS (Elastic File System):

  • Definition: EBS is the block storage offered on AWS; an EBS volume is a persistent storage device that can be used as a file system for databases, application hosting and storage, and plug-and-play devices. EFS is an NFS file system service offered by AWS; an EFS file system works well as a managed network file system that can be shared across different Amazon EC2 instances, much like a NAS device.
  • Accessibility: EBS is accessible via a single EC2 instance (now extended to multiple instances with Multi-Attach on provisioned-IOPS volumes). EFS is accessible from multiple Availability Zones in the same region.
  • Performance: With EBS you manually scale the size of the volumes without stopping the instance; General Purpose volumes have a baseline performance of 3 IOPS per GB, and Provisioned IOPS can be used for increased performance. EFS is a highly scalable managed service that supports up to 7,000 file system operations per second.
  • Scalability: EBS requires manual scale-up; EFS scales automatically.
  • Availability: EBS offers 99.99 percent availability. EFS has no publicly available SLA (service level agreement).
  • Access control: EBS uses security groups and user-based authentication (IAM). EFS uses IAM user-based authentication and security groups.
  • Storage and file size limits: EBS has a maximum volume size of 16 TB and no file size limit on disk. EFS has no limit on the size of the file system, with a 52 TB maximum for individual files.
  • Encryption: Both use an AWS KMS-managed Customer Master Key (CMK) and the AES 256-bit encryption standard.
  • Storage type: EBS is block storage; EFS is file storage.
  • Data stored: EBS data stays in the same Availability Zone, with replicas made within the AZ for higher durability. EFS data stays in the region, with replicas made within the region.
  • Data access: An EBS volume can be accessed by a single Amazon EC2 instance. An EFS file system can be accessed concurrently by one to thousands of EC2 instances from multiple AZs.
  • File system: EBS supports various file systems, including ext3 and ext4. EFS is a file storage service for use with AWS EC2 and can also be used as a network file system for on-premises servers via AWS Direct Connect.
  • Durability: EBS is 20 times more reliable than normal hard disks. EFS is highly durable (no public SLA).
  • Availability Zone failure: EBS cannot withstand an AZ failure without point-in-time EBS snapshots. In EFS, every file system object is redundantly stored across multiple Availability Zones, so it can survive one AZ failure.
  • Data throughput and I/O: EBS offers SSD- and HDD-backed storage types; SSD-backed Provisioned IOPS volumes are recommended for dedicated I/O operations as needed. EFS provides a default throughput of 3 GB/s for all connected clients.
  • Pricing: For EBS, there is no additional charge for Multi-Attach; you are billed the standard charges that apply to Provisioned IOPS SSD (io1) volumes. For EFS, you pay only for the resources that you use, with no minimum fee or setup charges.


 Amazon Reference link:
https://aws.amazon.com/blogs/aws/new-multi-attach-for-provisioned-iops-io1-amazon-ebs-volumes/

Kubernetes Security Practices on AWS

By | Blogs, Cloud, Cloud Assessment, Kubernetes | One Comment

Written by Praful Tamrakar, Senior Cloud Engineer, Powerupcloud Technologies

Security at the Cloud and Infra Level

  1. Ensure the worker node AMIs meet the CIS benchmark. For the cluster itself, the CIS Kubernetes Benchmark applies, and validation against it can be automated with tools such as kube-bench.
  2. Verify that the Security Groups and NACLs do not allow all traffic, and that the rules only allow access to the ports and protocols needed for the application and SSH.
  3. Make sure that you have encryption of data at rest. Amazon KMS can be used for this. For example:
  • EBS volumes for control-plane nodes and worker nodes can be encrypted via KMS.
  • Log data, whether in CloudWatch Logs or in S3, can be encrypted using KMS.
  4. If instances are behind an ELB, make sure you have configured HTTPS encryption and decryption (generally known as SSL termination) on the Elastic Load Balancer.
  5. Make sure the worker nodes and RDS are provisioned in private subnets.
  6. It's always best practice to have a separate Kubernetes (EKS) cluster for each environment (Dev/UAT/Prod).
  7. Ensure to use AWS Shield/WAF to prevent DDoS attacks.

Container Level

  1. Ensure to use a minimal base image (e.g., an Alpine image to run the app).
  2. Ensure that the Docker image registry you are using is a trusted, authorized, and private registry, e.g., Amazon ECR.
  3. Make sure you remove all unnecessary files from your Docker image. E.g., in a Tomcat server you need to remove:
  • $CATALINA_HOME/webapps/examples
  • $CATALINA_HOME/webapps/host-manager
  • $CATALINA_HOME/webapps/manager
  • $CATALINA_HOME/conf/Catalina/localhost/manager.xml
  4. Ensure to disable the display of the app-server version or server information. For example, in the Tomcat server the server information is displayed by default; this can be mitigated using the procedure below.

Set server.info to an empty value (server.info="") in the file $CATALINA_HOME/lib/org/apache/catalina/util/ServerInfo.properties.

  5. Ensure not to copy or add any sensitive file/data into the Docker image; it is always recommended to use Secrets instead (Kubernetes supports encrypting Secrets at rest from v1.13 onwards). You may also use another secret management tool of your choice, such as AWS Secrets Manager or HashiCorp Vault.
    • E.g., do not put database endpoints, usernames, or passwords in the Dockerfile. Use K8s Secrets, and expose them as environment variables as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: secret-env-pod
spec:
  containers:
  - name: myapp
    image: myapp
    env:
      - name: DB_USERNAME
        valueFrom:
          secretKeyRef:
            name: dbsecret
            key: username
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: dbsecret
            key: password
      - name: DB_ENDPOINT
        valueFrom:
          secretKeyRef:
            name: dbsecret
            key: endpoint
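For completeness, the dbsecret referenced above could be created with something like the following; the values are placeholders:

kubectl create secret generic dbsecret \
    --from-literal=username=dbuser \
    --from-literal=password='S3cureP@ssw0rd' \
    --from-literal=endpoint=mydb.example.internal:3306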

6. Ensure that Bash is not included in (or is removed from) the container images.

7. Adopt multi-stage builds for smaller, cleaner, and more secure images.

How you can leverage multi-stage builds is explained at:

https://docs.docker.com/develop/develop-images/multistage-build/

8. Verify that container images are scanned for vulnerabilities before they are pushed to the registry. Amazon ECR has a feature to scan a repository on push. Assessment tools such as Clair or Aqua can also be used to scan images, and they can be embedded in the CI/CD pipeline so that the image push is rejected/terminated if any vulnerability is found. A sample implementation is available at https://www.powerupcloud.com/email-va-report-of-docker-images-in-ecr/
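As a quick sketch, scan-on-push can be enabled for an existing ECR repository from the CLI (the repository name is a placeholder):

aws ecr put-image-scanning-configuration \
    --repository-name my-app-repo \
    --image-scanning-configuration scanOnPush=true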

K8s level

  1. Make sure to use or upgrade Kubernetes to the latest stable version.
  2. It's recommended not to use the default namespace. Instead, create a namespace for each application, i.e., separate namespaces for separate sensitive workloads.
  3. Make sure to enable Role-Based Access Control (RBAC) for clients (service accounts/users) so that privileges are restricted.

RBAC Elements:

  • Subjects: The set of users and processes that want to access the Kubernetes API.
  • Resources: The set of Kubernetes API Objects available in the cluster. Examples include Pods, Deployments, Services, Nodes, and PersistentVolumes, among others.
  • Verbs: The set of operations that can be executed on the resources above. Different verbs are available (examples: get, watch, create, delete, etc.), but ultimately all of them are Create, Read, Update or Delete (CRUD) operations.

Let's see how RBAC makes Kubernetes a production-ready platform. With RBAC you can:

  • Have multiple users with different properties, establishing a proper authentication mechanism.
  • Have full control over which operations each user or group of users can execute.
  • Have full control over which operations each process inside a pod can execute.
  • Limit the visibility of certain resources of namespaces.
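To make this concrete, a minimal sketch of a namespaced Role and RoleBinding that grant a service account read-only access to pods might look like the following; the namespace, role, and service account names are illustrative:

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: my-app
subjects:
- kind: ServiceAccount
  name: my-app-sa
  namespace: my-app
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF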

4. Make sure to standardize the naming and labeling conventions for Pods, Deployments, and Services. This eases the operational burden of security management (e.g., pod network policies).

5. Ensure to use Kubernetes network policies, which restrict pod communication, i.e., how groups of pods are allowed to communicate with each other and with other network endpoints; a minimal example follows below. A walkthrough of implementing network policies on Amazon EKS is available at https://blog.powerupcloud.com/restricting-k8s-services-access-on-amazon-eks-part-ix-7d75c97c9f3e
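A minimal sketch of a default-deny ingress policy for one namespace, assuming a CNI that enforces NetworkPolicy (for example Calico on EKS); the namespace name is a placeholder:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app
spec:
  podSelector: {}    # selects all pods in the namespace
  policyTypes:
  - Ingress          # with no ingress rules listed, all inbound traffic is denied
EOF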

6. AWS Single Sign-On (SSO), AWS Managed Microsoft Active Directory Service, and the AWS IAM authenticator can be used to control access to your Amazon EKS cluster running on the AWS cloud.

7. Make sure to use a pod security context.

  • Ensure that root access is disabled; the Docker image should run as a non-root user.
  • Make sure to configure a read-only root file system.
  • Security-Enhanced Linux (SELinux): you can assign SELinuxOptions objects using the seLinuxOptions field. Note that the SELinux module needs to be loaded on the underlying Linux nodes for these policies to take effect.
  • Drop the default Linux capabilities that are not needed, and add non-default Linux capabilities only if they are required.
  • Make sure not to run pods/containers as privileged unless you require access to all devices on the host. Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).

Please find a snippet for the pod security context below:

...
spec:
  securityContext:              # pod-level settings
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    seLinuxOptions:
      level: "s0:c123,c456"
  containers:
  - name: myapp
    image: myapp
    securityContext:            # container-level settings
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - NET_RAW
          - CHOWN
        add: ["NET_ADMIN", "SYS_TIME"]
...

Note: the security context can be set at the pod level as well as the container level, as shown below.

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-2
spec:
  #Pod level
  securityContext:
    runAsUser: 1000
  containers:
  - name: sec-ctx-demo-2
    image: gcr.io/google-samples/node-hello:1.0
   #container level
    securityContext:
      runAsUser: 2000
      allowPrivilegeEscalation: false

8. Make sure to enable the following Kubernetes admission controllers wherever possible (a ResourceQuota sketch follows the list below).

  • AlwaysPullImages – modifies every new Pod to force the image pull policy to Always. This is useful in a multitenant cluster so that users can be assured that their private images can only be used by those who have the credentials to pull them.
  • DenyEscalatingExec – will deny exec and attach commands to pods that run with escalated privileges that allow host access. This includes pods that run as privileged, have access to the host IPC namespace or have access to the host PID namespace.
  • ResourceQuota – will observe the incoming request and ensure that it does not violate any of the constraints enumerated in the ResourceQuota object in a Namespace.
  • LimitRanger- will observe the incoming request and ensure that it does not violate any of the constraints enumerated in the LimitRange object in a Namespace. Eg: CPU and Memory
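As a small illustration of what the ResourceQuota admission controller enforces, a namespace-level quota object might look like this; the limits are arbitrary placeholders:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: my-app
spec:
  hard:
    requests.cpu: "4"        # total CPU requests allowed in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"               # maximum number of pods in the namespace
EOF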

10. Ensure that manifest files (YAML/JSON) are scanned for credentials passed inside objects (Deployments, charts), for example with Palo Alto Prisma or Alcide Kubernetes Advisor.

11. Ensure to use TLS authentication for Tiller when Helm v2 is being used (Helm v3 removes Tiller entirely).

12. It's always recommended not to use the default service account.

  • The default service account has a very wide range of permissions in the cluster and should therefore be disabled.

13. Do not create a service account or a user with full cluster-admin privileges unless necessary; always follow the least-privilege rule.

14. Make sure to disable anonymous access and send Unauthorized responses to unauthenticated requests. Verify the following Kubernetes security settings when configuring kubelet parameters:

  • --anonymous-auth is set to false to disable anonymous access (the kubelet will then send 401 Unauthorized responses to unauthenticated requests).
  • The kubelet has the --client-ca-file flag set, providing a CA bundle to verify client certificates.
  • --authorization-mode is not set to AlwaysAllow; the more secure Webhook mode delegates authorization decisions to the Kubernetes API server.
  • --read-only-port is set to 0 to avoid unauthorized connections to the read-only endpoint (optional).

15. Ensure that access to etcd is restricted to only the API server and the nodes that need it. This can be enforced in the security group attached to the control plane.

K8s API call level

  1. Ensure that all communication from the client (pod/end user) to the Kubernetes API server is TLS encrypted.
    1. Note that you may experience throttling if a very large number of API calls are made.
  2. Ensure that all communication from the Kubernetes API server to etcd, the kube-controller-manager, the kubelet/worker nodes, kube-proxy, and the kube-scheduler is TLS encrypted.
  3. Enable control-plane API call logging and auditing (e.g., EKS control plane logging); a CLI sketch follows this list.
  4. If you are using a managed Kubernetes service such as Amazon EKS, GKE, or Azure Kubernetes Service (AKS), these items are largely taken care of for you.
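For point 3, EKS control plane logging can be enabled from the CLI along these lines; the cluster name is a placeholder:

aws eks update-cluster-config \
    --name my-eks-cluster \
    --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'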

EKS Security Considerations

  • EKS does not support Kubernetes Network Policies or any other way to create firewall rules for Kubernetes deployment workloads, apart from security groups on the worker nodes, since it uses the VPC CNI plugin by default, which does not support network policies. Fortunately, this has a simple fix: the Calico CNI can be deployed in EKS to run alongside the VPC CNI, providing Kubernetes Network Policy support.
  • Ensure to protect EC2 instance role credentials and manage AWS IAM permissions for pods. This can be done with IAM roles for service accounts (a short eksctl sketch follows the link below).
  • By using the IAM roles for service accounts feature, we no longer need to provide extended permissions to the worker node's IAM role so that pods on that node can call AWS APIs. We can scope IAM permissions to a service account, and only pods that use that service account have access to those permissions. This feature also eliminates the need for third-party solutions such as kiam or kube2iam.

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html
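As a hedged sketch of IAM roles for service accounts with eksctl (cluster name, namespace, service account name, and policy ARN are all placeholders):

# One-time: associate an OIDC provider with the cluster
eksctl utils associate-iam-oidc-provider --cluster my-eks-cluster --approve

# Create a service account backed by an IAM role scoped to a single policy
eksctl create iamserviceaccount \
    --cluster my-eks-cluster \
    --namespace my-app \
    --name my-app-sa \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
    --approve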

Security Monitoring of K8s

Sysdig Falco is an open-source container security monitor designed to detect anomalous activity in your containers. Sysdig Falco taps into your host's (or node's, in the case of Kubernetes) system calls to generate an event stream of all system activity. Falco's rules engine then allows you to create rules based on this event stream, allowing you to alert on system events that seem abnormal. Since containers should have a very limited scope in what they run, you can easily create rules to alert on abnormal behavior inside a container.

Ref: https://sysdig.com/opensource/falco/

The Alcide Advisor is a continuous Kubernetes and Istio hygiene-checks tool that provides a single-pane view of all your K8s-related issues, including audits, compliance, topology, networks, policies, and threats. This ensures that you get a better understanding and control of distributed and complex Kubernetes projects with a continuous and dynamic analysis. A partial list of the checks it runs includes:

  • Kubernetes vulnerability scanning
  • Hunting misplaced secrets, or excessive secret access
  • Workload hardening from Pod Security to network policies
  • Istio security configuration and best practices
  • Ingress controllers for security best practices.
  • Kubernetes API server access privileges.
  • Kubernetes operators security best practices.

Ref :https://aws.amazon.com/blogs/apn/driving-continuous-security-and-configuration-checks-for-amazon-eks-with-alcide-advisor/

Automate and Manage AWS KMS from Centralized AWS Account

By | AWS, Blogs, Cloud, Cloud Assessment | No Comments

Written by Priyanka Sharma, DevOps Architect, Powerupcloud Technologies

As discussed in our previous blog, we use the AWS Landing Zone concept for many of our customers, which consists of separate AWS accounts so they can meet the different needs of their organization. One of these accounts is the Security account, where the security-related components reside. KMS keys are one of the key security components and help in the encryption of data.

A Customer Master Key (CMK) is a logical representation of a master key which includes the following details:

  • metadata, such as the key ID, creation date, description
  • key state
  • key material used to encrypt and decrypt data.

There are three types of AWS KMS CMKs:

  • Customer Managed CMK: CMKs that you create, own, and manage. You have full control over these CMKs.
  • AWS Managed CMK: CMKs that are created, managed, and used on your behalf by an AWS service that is integrated with AWS KMS. Some AWS services support only an AWS managed CMK.
  • AWS Owned CMK:  CMKs that an AWS service owns and manages for use in multiple AWS accounts. You do not need to create or manage the AWS owned CMKs.

This blog covers the automation of Customer Managed CMKs, i.e., how we can use CloudFormation templates to create Customer Managed CMKs. It also discusses the strategy that we follow for our enterprise customers for enabling encryption across accounts.

KMS Encryption Strategy

We are covering the KMS strategy that we follow for most of our customers.

In each of the Accounts, create a set of KMS Keys for the encryption of data. For example,

  • UAT/EC2
    • For enabling default EBS encryption for EC2, go to the EC2 dashboard Settings (on the right-hand side), as shown in the screenshot below.

Select “Always encrypt the EBS volumes” and change the default key, pasting the ARN of the UAT/EC2 KMS key. (A CLI sketch for this and the S3 step follows this list.)

  • UAT/S3
    • Copy the ARN of the UAT/S3 KMS key.
    • Go to the bucket Properties and enable Default Encryption with a custom AWS KMS key. Provide the KMS key ARN from the security account.
  • UAT/RDS
    • This key can be used while provisioning the RDS DB instance.
    • Ensure to provide the key ARN if using it cross-account.
  • UAT/OTHERS
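A hedged CLI sketch of the EC2 and S3 steps above; the region, key ARNs, and bucket name are placeholders:

# Turn on EBS encryption by default and point it at the UAT/EC2 CMK
aws ec2 enable-ebs-encryption-by-default
aws ec2 modify-ebs-default-kms-key-id \
    --kms-key-id arn:aws:kms:ap-south-1:111122223333:key/uat-ec2-key-id

# Enable default bucket encryption with the UAT/S3 CMK
aws s3api put-bucket-encryption \
    --bucket my-uat-bucket \
    --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"arn:aws:kms:ap-south-1:111122223333:key/uat-s3-key-id"}}]}'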

Automated KMS Keys Creation

Below Cloudformation template can be used to create a set of KMS Keys as follows:

https://github.com/powerupcloud/automate-kms-keys-creation/blob/master/kms-cf-template.json

Ensure to replace the SECURITY_ACCOUNT_ID variable with the 12-digit AWS security account ID where KMS keys will be created.

The CF Template does the following:

  • Creates the below KMS Keys in the Target Account:
    • PROD/EC2
      • It is used to encrypt the EBS Volumes.
    • PROD/S3
      • Used to encrypt the S3 buckets.
    • PROD/RDS
      • Used to encrypt the RDS data.
    • PROD/OTHERS
      • It can be used to encrypt the AWS resources other than EC2, S3, and RDS. For example, if EFS requires to be created in the production account, PROD/OTHERS KMS key can be used for the encryption of EFS.
  • In our case, we are using the Landing Zone concept, so the "OrganizationAccountAccessRole" IAM role used for switch-role access from the master account is one of the key administrators.
  • Also, since we have enabled single sign-on in our account, the IAM role created by SSO, "AWSReservedSSO_AdministratorAccess_3687e92578266b74", also has key administrator access.

The Key administrators can be changed as required in the Key Policy.

The “ExternalAccountID” in the Cloudformation parameters is used to enable the cross-account access via KMS Key policy.
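For reference, the cross-account portion of the key policy ends up looking roughly like the statement below; this is only a sketch (the template itself is the source of truth) and the account ID is a placeholder:

# Sketch of the key-policy statement that lets the external account use the CMK
cat > cross-account-statement.json <<'EOF'
{
  "Sid": "AllowUseOfTheKeyByTheExternalAccount",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::<ExternalAccountID>:root" },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
EOF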

Hope you found it useful.

Transfer Data from Mysql to BigQuery using Data Fusion Pipeline

By | Blogs, Cloud Assessment, data, Data pipeline | One Comment

Written by Anjali Sharma, Software Engineer at Powerupcloud Technologies

What is Cloud Data Fusion Pipeline? –

Cloud Data Fusion is an enterprise data integration service provided by Google for quickly building and managing pipelines. It is a fully managed, cloud-native platform where a source (MySQL) and a sink (BigQuery) can be connected easily without writing any code.

Since it's a code-free environment, anyone can use it easily without being held back by a lack of technical skills or coding knowledge.

Cloud Data Fusion is built on the open-source project CDAP, and this open core ensures data pipeline portability for users. CDAP’s broad integration with on-premises and public cloud platforms gives Cloud Data Fusion users the ability to break down silos and deliver insights that were previously inaccessible.

How it looks on the Cloud Data Fusion platform

Why Cloud Data Fusion?

Now the question arises: why use Cloud Data Fusion when we have other options to connect a MySQL database to BigQuery for ETL/ELT?

Data Fusion pipelines provide a fully managed, visual, easy-to-use, scalable, and distributed platform that enables you to connect to many different data sources easily.

Data Fusion also gives you the flexibility to keep pipelines as code and to use REST API calls to create and trigger pipelines. Hence, Cloud Data Fusion is a complete package for developing data pipelines easily and efficiently.

How do we create the data pipeline? –

Creating a Data Fusion pipeline is quite easy on Google Cloud Platform; we can get it done by following a few steps.

Step1- Go to the GCP console, find Cloud Data Fusion, and click on ‘Create Instance’.

Step2- Fill in the instance name and region, then click on Create.

Step3- It will take 10-15 minutes to create the instance; then go to View instance and click on the redirect URL.

Step4- Now you are inside the Cloud Data Fusion instance. Click on HUB and choose a pipeline (Import data from MySQL).

Step5- Along with pipelines, HUB gives you several options. Choose Import data from MySQL. Now we’re going to install the driver.

Step6- Install the Google Cloud JDBC driver, which makes the connection that lets the MySQL database communicate with BigQuery. We can find the driver here itself, but make sure it is the latest version.

Step7- Now go to the navigation bar and click on the Control Center.

Step8- Click the green encircled plus symbol and upload the latest version of the JDBC driver.

Step9- Give the driver a name and a suitable class name in a valid format (e.g., com.example.myclass), then click on Finish.

Step10- Now again go to HUB, click on the Import data from MySQL pipeline, and click on Create. Give the pipeline a name and click Finish. Now you are able to customize your pipeline.

Here in the Cloud Data Fusion Studio we can change the source and sink as needed; in our case we need to connect a database (source) to BigQuery (sink).

Step11- Go to the database properties and fill in the plugin name and type. After filling in the details, browse the database and click on Add connection.

Step12- Here you will find the installed MySQL driver. Click on it and enter the connection name, host, port, database name, user ID, and password.

Step13- Test the connection and add it.

Step14- Now you are able to import your query. Deploy the pipeline.

Step15- You have deployed your Data Fusion pipeline successfully.

Conclusion

Cloud Data Fusion takes care of most ETL/ELT work for you. And since it's part of Google Cloud, you can take advantage of built-in security benefits when using Cloud Data Fusion rather than self-managed CDAP servers:

  • Cloud-native security control with Cloud IAM—Identity management and authentication efforts are taken care of by Cloud Identity.
  • Full observability with Stackdriver Logging and Monitoring—Logs include pipeline logs and audit logs
  • Reduced exposure to the public internet with private networking.

Cloud Data Fusion offers both preconfigured transformations from an OSS library as well as the ability to create an internal library of custom connections and transformations that can be validated, shared, and reused across an organization. It lays the foundation for collaborative data engineering and improves productivity. That means less waiting for data engineers and, importantly, less sweating about code quality.

Running Amazon EKS behind Customer HTTP Proxy without NAT

By | Blogs, Cloud, Cloud Assessment | One Comment

Written by Praful Tamrakar and Manoj S Rao, Senior Cloud Engineers, Powerupcloud Technologies

Most enterprise customers use a proxy for indirect network connections to other network services.

One of our customers has the following network configurations in AWS:

  • No AWS NAT gateway or internet gateway for outbound traffic
  • All the traffic to the Internet must go via Proxy to reduce surface attacks and all such traffic will be monitored pro-actively.
  • All URLs outside the VPC must be whitelisted

The DNS and proxy resolution Diagram

Problem Statement:

With the above networking configurations, we had a requirement to have an EKS Cluster in private subnets. We faced multiple challenges (as mentioned below) with the EKS connectivity.

To join the worker nodes to an EKS cluster, we need to execute a bootstrap command through the user-data script on the EC2 server. With our networking configuration, the kubelet was not able to start after executing the bootstrap command, and we were facing connection timed out issues in the two scenarios below:

  1. When the kubelet service was trying to pull the pod-infra-container image via the Docker API.
  2. When the kubelet service was trying to reach the EC2 and ECR API endpoints.

With the below solution we were able to resolve both the issues and run EKS behind the proxy successfully.

In this article, we are elaborating on how we can achieve and automate the configuration of an HTTP proxy for Amazon Elastic Kubernetes Service (Amazon EKS) worker nodes with the help of user data.

Assumptions and requisites:

  1. This solution can be used with either Terraform or AWS CloudFormation, both for the initial setup of EKS worker nodes and for upgrading worker nodes.

The cloud formation script can be found on ->

https://github.com/powerupcloud/kubernetes-spot-webinar/tree/master/provision-eks-worker-nodes

The terraform Script can be found on ->

https://learn.hashicorp.com/terraform/aws/eks-intro

  2. You must edit the user data in both of the above methods with the solution mentioned below.
  3. If the EKS cluster API endpoint is in a private subnet and there is no NAT gateway, please set up VPC endpoints for Amazon EC2 and Amazon ECR (and ensure that the EC2 and ECR endpoint security groups are the same as the worker node security group).

Resolution

1. Let's find out the CIDR block of the cluster:

kubectl get service kubernetes -o jsonpath='{.spec.clusterIP}'; echo

This will return either 10.100.0.1, or 172.20.0.1, which means that your cluster IP CIDR block is either 10.100.0.0/16 or 172.20.0.0/16.

2. Let's create a ConfigMap file named proxy-env-vars-config.yaml.

If the output from the command in step 1 has an IP from the range 172.20.x.x, then structure your ConfigMap file as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy-environment-variables
  namespace: kube-system
data:
  HTTPS_PROXY: http://customer.proxy.host:proxy_port
  HTTP_PROXY: http://customer.proxy.host:proxy_port
  NO_PROXY: 172.20.0.0/16,localhost,127.0.0.1,VPC_CIDR_RANGE,169.254.169.254,.internal,.s3.amazonaws.com,.s3.<aws-region-code>.amazonaws.com

If the output from the command in step 1 has an IP from the range 10.100.x.x, then structure your ConfigMap file as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy-environment-variables
  namespace: kube-system
data:
  HTTPS_PROXY: http://customer.proxy.host:proxy_port
  HTTP_PROXY: http://customer.proxy.host:proxy_port
  NO_PROXY: 10.100.0.0/16,localhost,127.0.0.1,VPC_CIDR_RANGE,169.254.169.254,.internal,.s3.amazonaws.com,.s3.<aws-region-code>.amazonaws.com

3. Now create the ConfigMap:

kubectl apply -f /path/to/yaml/proxy-env-vars-config.yaml

Consider the following:

  • If you use a VPC endpoint, add its public endpoint subdomain to NO_PROXY (for example, an Amazon Simple Storage Service (Amazon S3) endpoint in the region where you run your EKS cluster).
  • You don't need a proxy configuration for the kube-dns pods, because they communicate directly with the Kubernetes service.
  • Verify that the NO_PROXY variable in the proxy-environment-variables ConfigMap (used by the kube-proxy and aws-node pods) includes the Kubernetes cluster IP address space.

4. Now we come to bootstrapping the worker nodes: we configure the Docker daemon and the kubelet by injecting user data into the worker nodes.

We must update or create yum, Docker, and kubelet configuration files before starting the Docker daemon and kubelet.

Below is the user data injected into worker nodes using an AWS CloudFormation template launched from the AWS Management Console (see Launching Amazon EKS Worker Nodes).

#Set the proxy hostname and port
PROXY="http://customer.proxy.host:proxy_port"
VPC_CIDR=VPC_CIDR_RANGE

#Create the docker systemd directory
mkdir -p /etc/systemd/system/docker.service.d

#Configure yum to use the proxy (PROXY already includes the http:// scheme)
cat << EOF >> /etc/yum.conf
proxy=$PROXY
EOF

#Set the proxy for future processes, and use as an include file
cat << EOF >> /etc/environment
http_proxy=$PROXY
https_proxy=$PROXY
HTTP_PROXY=$PROXY
HTTPS_PROXY=$PROXY
no_proxy=$VPC_CIDR,localhost,127.0.0.1,169.254.169.254,.internal,.<aws-region-code>.eks.amazonaws.com
NO_PROXY=$VPC_CIDR,localhost,127.0.0.1,169.254.169.254,.internal,.<aws-region-code>.eks.amazonaws.com
EOF

#Configure docker with the proxy
tee <<EOF /etc/systemd/system/docker.service.d/proxy.conf >/dev/null
[Service]
EnvironmentFile=/etc/environment
EOF


#Configure the kubelet with the proxy
tee <<EOF /etc/systemd/system/kubelet.service.d/proxy.conf >/dev/null
[Service]
EnvironmentFile=/etc/environment
EOF

# The standard EKS worker-node user data follows; the proxy variables written above are exported before the bootstrap script runs
#!/bin/bash
set -o xtrace

#Set the proxy variables before running the bootstrap.sh script
set -a
source /etc/environment

/etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments}
/opt/aws/bin/cfn-signal \
    --exit-code $? \
    --stack  ${AWS::StackName} \
    --resource NodeGroup  \
    --region ${AWS::Region}

5. To update the aws-node and kube-proxy pods, run the following commands:

kubectl patch -n kube-system -p '{ "spec": {"template": { "spec": { "containers": [ { "name": "aws-node", "envFrom": [ { "configMapRef": {"name": "proxy-environment-variables"} } ] } ] } } } }' daemonset aws-node

kubectl patch -n kube-system -p '{ "spec": {"template":{ "spec": { "containers": [ { "name": "kube-proxy", "envFrom": [ { "configMapRef": {"name": "proxy-environment-variables"} } ] } ] } } } }' daemonset kube-proxy

6. If you change the ConfigMap, apply the updates, and then set the ConfigMap in the pods again to initiate an update as follows:

kubectl set env daemonset/kube-proxy --namespace=kube-system --from=configmap/proxy-environment-variables --containers='*'

kubectl set env daemonset/aws-node --namespace=kube-system --from=configmap/proxy-environment-variables --containers='*'

Note: You must reapply any YAML modifications to the Kubernetes objects kube-dns or aws-node when these objects are upgraded. To restore an add-on such as kube-proxy to its default configuration:

With EKSCTL:

eksctl utils update-kube-proxy
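Assuming a reasonably recent eksctl version, analogous commands exist for the other add-ons as well (add --cluster <cluster-name> if your eksctl version requires it):

eksctl utils update-aws-node

eksctl utils update-coredns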

WARNINGS :

If the proxy loses connectivity to the API server, then the proxy becomes a single point of failure and your cluster’s behavior can become unpredictable. For this reason, it’s best practice to run your proxy behind a service discovery namespace or load balancer, and then scale as needed.

And that’s all..!! Hope you found it useful. Keep following our Kubernetes series for more interesting articles.

References:

Migrate for Anthos: Modernized approach for migrating Compute Engine to Kubernetes Engine

By | Blogs, Cloud, Kubernetes | No Comments

Written by Madan Mohan K, Associate Cloud Architect

Anthos: One Management Solution for a Hybrid Cloud and Multi-Cloud World

The growing importance of hybrid cloud and multi-cloud environments is transforming the entire computing industry as well as the way businesses can leverage technology to innovate. Economics and speed are the two greatest issues driving this market change. Using a hybrid cloud/multi-cloud not only allows companies to scale computing resources, but it also eliminates the need to make massive capital expenditures to handle short-term spikes in demand as well as when the business needs to free up local resources for more sensitive data or applications.

Anthos:

Anthos is an application platform that enables an enterprise to modernize its existing applications, build new ones, and run them anywhere in a secure manner, on hybrid or multi-cloud environments. Anthos is built on open-source technologies pioneered by Google—including Kubernetes, Istio, and Knative—and enables consistency between on-premises and cloud environments.

When workloads are upgraded to containers, IT departments can eliminate OS-level maintenance and security patching for VMs and automate policy and security updates at scale. Monitoring across on-premises and cloud environments is done through a single interface in the Google Cloud Console.

Scenario:

Rewriting existing applications for Kubernetes isn't always possible or feasible to do manually. That's where Migrate for Anthos can help, by modernizing the existing applications and getting them to run in Kubernetes.

Migrate for Anthos:

Migrate for Anthos provides an almost real-time solution to take an existing VM and make it available as a Kubernetes hosted pod with all the values associated with executing the applications in a Kubernetes cluster.

Let’s look at an example, migrating a Compute Engine instance to a Kubernetes Engine cluster running Migrate for Anthos to start with the basics.

Prerequisites:

https://cloud.google.com/migrate/anthos/docs/gce-to-gke-prerequisites

Compatible VM operating systems:

https://cloud.google.com/migrate/anthos/docs/compatible-os-versions

Instance Creation:

  • From the Console go to Compute Engine > VM Instances, then click the Create button
  • Name the instance "migrate-vm-anthos" (or whatever you prefer), check the box for "Allow HTTP traffic", and accept all the other defaults. Click Create.
  • Once the VM is created, SSH into it.

Install the Apache web server by running the following commands:

sudo apt-get update
sudo apt-get install apache2 -y
echo "Hello World" > index.html
sudo mv index.html /var/www/html

A sample Hello World page is displayed.

Note: To migrate the VM, first stop it from running

We need a Kubernetes cluster to migrate the virtual machine into. The Migrate for Anthos app can be deployed to an existing cluster as well. Let’s install Migrate for Anthos through the Google Cloud Marketplace.

Deploying Migrate for Anthos

Navigate to Market place and search Migrate for Anthos.

Click the configure button.

For this walkthrough, we can accept the default settings. Click the Create cluster button.

Once the cluster is created, check the box to accept the Terms of Service, then click the Deploy button. The migration for the Anthos environment will now be set up.

Migrating your VM to your new container:

Open cloud shell and run the following

pip3 install --user pyyaml 

This installs a Python prerequisite that will process YAML files.

Execute the following command

python3 /google/migrate/anthos/gce-to-gke/clone_vm_disks.py \
  -p $GOOGLE_CLOUD_PROJECT \
  -z us-central1-a \
  -T us-central1-a \
  -i migrate-vm-anthos \
  -A myworkload \
  -o myYaml.yaml

This command will take a few minutes to complete. The migration file is a YAML file called myYaml.yaml, created by Migrate for Anthos. When deployed to Kubernetes, it will perform the migration.

A successful YAML generation can be seen in the screenshot below.

Next we must initialize the kubectl environment (a short credential-fetching sketch follows) and then perform the migration by running the following command:
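If kubectl is not yet pointed at the cluster created for Migrate for Anthos, something along these lines should work; the cluster name and zone here are assumptions based on this walkthrough's defaults, so substitute your own:

# Fetch credentials so kubectl talks to the Migrate for Anthos cluster (name and zone are placeholders)
gcloud container clusters get-credentials migrate-for-anthos-cluster --zone us-central1-a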

kubectl apply -f myYaml.yaml

The execution result is obtained as shown

In the Console, from the Navigation menu, browse to Kubernetes Engine > Workloads. You will see a workload called myworkload. Wait for its status to change to OK

Validate the Migrated Instance:

In Cloud Shell, log in to the Kubernetes pod that is running the workload that has been migrated:

kubectl exec -it myworkload-0 -- /bin/bash
curl localhost

Well, the application works as expected.

Open the code editor in Cloud Shell, edit the myYaml.yaml file, and add the following at the top.

apiVersion: v1
kind: Service
metadata:
  name: myworkload-svc
  labels:
    app: myworkload
spec:
  type: LoadBalancer
  ports:
  - port: 80
    name: web
  selector:
    app: myworkload
---

Now find the StatefulSet definition and locate the containers entry nested under spec.template.spec. Add the following two lines directly below the container's name: myworkload entry:

ports:
- containerPort: 80

Make sure the indentation places the ports element at the same level as name: myworkload, i.e., as a field of that container. Save the file.

Apply the Kubernetes changes:

kubectl apply -f myYaml.yaml

There you go: the service is exposed, and we can validate it by navigating to the Services & Ingress section.

A browser tab will appear, and you will see the web page which is the simple text “Hello World”. This illustrates that you have successfully migrated the web server that was running in the Compute Engine to be running in a Kubernetes cluster.

Inference:

Anthos unites all of Google Cloud Platform's powerful tools under one roof, and in doing so it delivers unprecedented efficiency, scalability, and cost-effectiveness to IT operations. With Anthos, an organization can enjoy the full benefits of managing its multi- and hybrid-cloud environment with ease, along with the ability to innovate using cloud technologies.