Serverless Data Processing Pipeline with AWS Step Functions and Lambda


Written by Arun Kumar, Associate Cloud Architect at Powerupcloud Technologies

In the traditional ETL world, we generally deploy data pipelines with our own scripts, a paid tool, an open-source data processing tool, or an orchestrator. For non-complex pipelines, these server-based solutions can add unnecessary cost. AWS offers serverless options such as Lambda and Glue, but Lambda has an execution time limit and Glue runs an EMR cluster in the background, which can get expensive. So we decided to explore AWS Step Functions with Lambda: both are serverless, and Step Functions acts as an orchestration service that runs our process on an event basis and terminates resources once processing completes. Let's see how we can build a data pipeline with this.

Architecture Description:

  1. The Teradata server on-premises sends the input.csv file to the S3 bucket (data_processing folder) on a schedule.
  2. A CloudWatch Event Rule triggers the Step Function on a PutObject in the specified S3 bucket and starts processing the input file.
  3. The cleansing scripts are hosted on ECS.
  4. The AWS Step Function calls a Lambda function, which triggers the ECS tasks (a bunch of Python and R scripts).
  5. Once the cleansing is done, the output file is uploaded to the target S3 bucket.
  6. An AWS Lambda function is triggered to fetch the output file from the target bucket and send it to the respective team.

Create a custom CloudWatch Event Rule for S3 put object operation

Choose Event Pattern -> Service Name -> S3 -> Event Type -> Object level operations -> choose put object -> give the bucket name.

  • In Targets, choose the Step Function to be triggered -> give the name of the state machine created.
  • Create a new role or use an existing role, as CloudWatch Events requires permission to send events to your Step Function.
  • Choose one more target to trigger the Lambda function -> choose the function which we created before.
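For reference, the same rule can be created with boto3. The sketch below mirrors the console steps above; the bucket name, state machine ARN, and IAM role ARN are placeholders. Note that object-level S3 events only reach CloudWatch Events when a CloudTrail trail is logging data events for the bucket.

import json
import boto3

events = boto3.client('events')

# Event pattern matching PutObject calls on the input bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject"],
        "requestParameters": {"bucketName": ["my-input-bucket"]}  # placeholder bucket
    }
}

events.put_rule(
    Name='s3-putobject-trigger',
    EventPattern=json.dumps(event_pattern),
    State='ENABLED'
)

# Point the rule at the state machine; the role must allow states:StartExecution.
events.put_targets(
    Rule='s3-putobject-trigger',
    Targets=[{
        'Id': 'step-function-target',
        'Arn': 'arn:aws:states:us-east-1:123456789012:stateMachine:data-pipeline',  # placeholder ARN
        'RoleArn': 'arn:aws:iam::123456789012:role/cwe-invoke-stepfunctions'         # placeholder ARN
    }]
)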

Go to the AWS Management Console and search for Step Functions.

  • Create a state machine
  • On the Define state machine page select Author with code snippets.
  • Give a name. Review the State machine definition and visual workflow.
  • Use the graph in the Visual Workflow pane to check that your Amazon States Language code describes your state machine correctly.
  • Create a new IAM role, or select an existing IAM role if you have previously created one.
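If you prefer to define the state machine programmatically, here is a minimal boto3 sketch of an Amazon States Language definition with a single Task state that invokes the Lambda function; the Lambda ARN, IAM role ARN, and state machine name are placeholders.

import json
import boto3

sfn = boto3.client('stepfunctions')

# Minimal Amazon States Language definition: a single Task state that
# invokes the Lambda function which starts the ECS cleansing task.
definition = {
    "Comment": "Trigger the ECS cleansing task via Lambda",
    "StartAt": "RunCleansingLambda",
    "States": {
        "RunCleansingLambda": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:trigger-ecs-task",  # placeholder ARN
            "End": True
        }
    }
}

response = sfn.create_state_machine(
    name='data-pipeline',                                                    # placeholder name
    definition=json.dumps(definition),
    roleArn='arn:aws:iam::123456789012:role/stepfunctions-execution-role'    # placeholder ARN
)
print(response['stateMachineArn'])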

Create an ECS with Fargate

ECS console -> Create Cluster -> choose cluster template -> Networking only -> Next -> Configure cluster -> give a cluster name -> Create.

In the navigation pane, choose Task Definitions, Create new Task Definition.

On the Select compatibilities page, select the launch type that your task should use and choose Next step. Choose Fargate launch type.

For Task Definition Name, type a name for your task definition.

For Task Role, choose an IAM role that provides permissions for containers in your task to make calls to AWS API operations on your behalf.

To create an IAM role for your tasks

a.   Open the IAM console.

b.   In the navigation pane, choose Roles, Create New Role.

c.   In the Select Role Type section, for the Amazon Elastic Container Service Task Role service role, choose Select.

d.   In the Attach Policy section, select the policy to use for your tasks and then choose Next Step.

e.   For Role Name, enter a name for your role. Choose Create Role to finish.

For Task execution IAM role, either select your existing task execution role or choose Create new role so that the console can create one for you.

For Task size, choose a value for Task memory (GB) and Task CPU (vCPU).

For each container in your task definition, complete the following steps:

a.   Choose Add container.

b.   Fill out each required field and any optional fields to use in your container definitions.

c.   Choose Add to add your container to the task definition.
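If you would rather script the task definition, the console steps above roughly map to a single register_task_definition call. This is a sketch only: the container image, log group, and role ARNs are placeholders, while the family name matches the task definition referenced later by the Lambda function.

import boto3

ecs = boto3.client('ecs')

# Register a Fargate task definition equivalent to the console steps above.
ecs.register_task_definition(
    family='Demo-ubuntu-new',                       # task definition name used by the Lambda below
    requiresCompatibilities=['FARGATE'],
    networkMode='awsvpc',                           # required for Fargate
    cpu='512',                                      # 0.5 vCPU
    memory='1024',                                  # 1 GB
    executionRoleArn='arn:aws:iam::123456789012:role/ecsTaskExecutionRole',  # placeholder ARN
    taskRoleArn='arn:aws:iam::123456789012:role/ecsTaskRole',                # placeholder ARN
    containerDefinitions=[{
        'name': 'cleansing-container',
        'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/cleansing:latest',  # placeholder image
        'essential': True,
        'logConfiguration': {
            'logDriver': 'awslogs',
            'options': {
                'awslogs-group': '/ecs/cleansing',          # placeholder log group
                'awslogs-region': 'us-east-1',
                'awslogs-stream-prefix': 'ecs'
            }
        }
    }]
)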

Create a Lambda function

  • Create lambda function to call the ECS

In the Lambda console -> Create function -> choose Author from scratch -> give a function name -> for Runtime choose Python 3.7 -> create a new role, or if you have an existing role, choose the role with the required permissions [ Amazon ECS Full Access, AWS Lambda Basic Execution Role ].

import time

import boto3

client = boto3.client('ecs')

def lambda_handler(event, context):
    # Run the cleansing task on the Fargate cluster
    response = client.run_task(
        cluster='Demo',
        launchType='FARGATE',
        taskDefinition='Demo-ubuntu-new',
        count=1,
        platformVersion='LATEST',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': ['subnet-f5e959b9', 'subnet-11713279'],
                'assignPublicIp': 'ENABLED',
                'securityGroups': ['sg-0462860d9c60d87d3']
            },
        }
    )
    print("this is the response", response)
    task_arn = response['tasks'][0]['taskArn']
    print(task_arn)
    # Wait for the ECS task to run before stopping it
    time.sleep(31.5)
    stop_response = client.stop_task(
        cluster='Demo',
        task=task_arn
    )
    print(stop_response)
    return str(stop_response)
  • Give the required details such as cluster, launchType, taskDefinition, count, platformVersion, and networkConfiguration.
  • Applications hosted on ECS Fargate will process the data_process.csv file and the output file will be pushed to the output folder of the target S3 bucket.

Create Notification to trigger lambda function(Send Email)

  • To enable the event notifications for an S3 bucket -> open the Amazon S3 console.
  • In the Bucket name list, choose the name of the bucket that you want to enable events for.
  • Choose Properties ->Under Advanced settings, choose Events ->Choose Add notification.
  •  In Name, type a descriptive name for your event configuration.
  • Under Events, select one or more of the type of event occurrences that you want to receive notifications for. When the event occurs a notification is sent to a destination that you choose.
  • Type an object name Prefix and/or a Suffix to filter the event notifications by the prefix and/or suffix.
  •  Select the type of destination to have the event notifications sent to.
  • If you select the Lambda Function destination type, do the following:
  • In Lambda Function, type or choose the name of the Lambda function that you want to receive notifications from Amazon S3 and choose Save.
  • Create a lambda function with Node.js
    • Note: Provide the bucket name, folder, file name, and a verified email address (for SES).
var aws = require('aws-sdk');
var nodemailer = require('nodemailer');
var ses = new aws.SES({region:'us-east-1'});
var s3 = new aws.S3();
 function getS3File(bucket, key) {
	return new Promise(function (resolve, reject) {
    	s3.getObject(
        	{
            	Bucket: bucket,
            	Key: key
        	},
        	function (err, data) {
            	if (err) return reject(err);
            	else return resolve(data);
        	}
    	);
	})
}
 exports.handler = function (event, context, callback) {
     getS3File('window-demo-1', 'output/result.csv')
    	.then(function (fileData) {
        	var mailOptions = {
            	from: 'arun.kumar@powerupcloud.com',
            	subject: 'File uploaded in S3 succeeded!',
            	html: `<p>You got a contact message from: <b>${event.emailAddress}</b></p>`,
            	to: 'arun.kumar@powerupcloud.com',
            	attachments: [
                	{
                        filename: "result.csv",
                        content: fileData.Body
                	}
            	]
        	};
            console.log('Creating SES transporter');
        	// create Nodemailer SES transporter
        	var transporter = nodemailer.createTransport({
            	SES: ses
        	});
        	// send email
            transporter.sendMail(mailOptions, function (err, info) {
            	if (err) {
                    console.log(err);
                    console.log('Error sending email');
                    callback(err);
            	} else {
                    console.log('Email sent successfully');
                    callback();
            	}
        	});
    	})
    	.catch(function (error) {
        	console.log(error);
            console.log('Error getting attachment from S3');
        	callback(error);
    	});
};
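The same event notification can also be configured programmatically. Below is a boto3 sketch: the bucket and prefix follow the example above ('window-demo-1', 'output/'), while the Lambda function ARN is a placeholder, and the function's resource policy must already allow Amazon S3 to invoke it.

import boto3

s3 = boto3.client('s3')

# Wire the target bucket's "output/" prefix to the notification Lambda.
s3.put_bucket_notification_configuration(
    Bucket='window-demo-1',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'Id': 'notify-on-output-file',
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:send-output-email',  # placeholder ARN
            'Events': ['s3:ObjectCreated:Put'],
            'Filter': {
                'Key': {
                    'FilterRules': [
                        {'Name': 'prefix', 'Value': 'output/'},
                        {'Name': 'suffix', 'Value': '.csv'}
                    ]
                }
            }
        }]
    }
)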

Conclusion:

If you are looking for a serverless orchestrator for your batch processing or for a complex data processing pipeline, give AWS Step Functions and Lambda a try; here we used ECS Fargate to cleanse the data. If your data processing script is more complex, you can integrate with Glue, but Step Functions will still act as your orchestrator.

Azure Data Factory – Setting up Self-Hosted IR HA enabled


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Introduction

In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn’t have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision-makers.

Big data requires service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory(ADF) is a managed cloud service that’s built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

This is how Azure introduces you to ADF. You can refer to the Azure documentation on ADF to know more.

Simply said, ADF is an ETL tool that will help you connect to various data sources to load data, perform transformations as per your business logic, and store them into different types of storages. It is a powerful tool and will help solve a variety of use cases.

In this blog, we will create a self hosted integration runtime (IR) with two nodes for high availability.

Use Case

A reputed OTT client was building their entire Content Management System (CMS) application on Azure and had to migrate their old, historical data from AWS, which hosts their current production environment. That's when ADF with self-hosted IRs comes to the rescue: we were required to connect to a different cloud, a different VPC, a private network, or on-premise data sources.

Our use-case here was to read data from a production AWS RDS MySQL Server inside a private VPC from ADF. To make this happen, we set up a two node self-hosted IR with high availability (HA).

Pre-requisites

  •  Windows Server VMs (Min 2 – Node1 & Node2)
  • .NET Framework 4.6.1 or later
  • For working with Parquet, ORC, and Avro formats you will require 
    • Visual C++ 2010 Redistributable Package (x64)
    • Java

Installation Steps

Step 1: Log in to the Azure Portal. Go to https://portal.azure.com

Step 2: Search for Data Factory in the Search bar. Click on + Add to create a new Data Factory.

Step 3: Enter a valid name for your ADF.

Note: The name can contain only letters, numbers, and hyphens. The first and last characters must be a letter or number. Spaces are not allowed.

Select the Subscription & Resource Group you want to create this ADF in. It is usually a good practice to enable Git for your ADF. Apart from being able to  store all your code safely, this also helps you when you have to migrate your ADF to a production subscription. You can get all your pipelines on the go.

Step 4: Click Create

You will need to wait for a few minutes, till your deployment is complete. If you get any error messages here, check your Subscription & Permission level to make sure you have the required permissions to create data factories.

Click on Go to resource

Step 5:

Click on Author & Monitor

Next, click on the Pencil button on the left side panel

Step 6: Click on Connections

Step 7: Under Connections tab, click on Integration runtimes, click on + New to create a new IR

Step 8: On clicking New, you will be taken to the IR set-up wizard.

Select Azure, Self-Hosted and click on Continue

Step 9: Select Self-Hosted  and Continue

Step 10: Enter a valid name for your IR, and click Create

Note: Integration runtime Name can contain only letters, numbers and the dash (-) character. The first and last characters must be a letter or number. Every dash (-) character must be immediately preceded and followed by a letter or a number. Consecutive dashes are not permitted in integration runtime names.

Step 11:

On clicking Create, your IR will be created.

Next you will need to install the IRs in your Windows VMs. At this point you should log in to your VM (Node1) or wherever you want to install your IR.

You are provided with two options for installation:

  • Express Setup – This is the easiest way to install and configure your IRs.  We are following the Express Setup in this setup. Connect to your Windows Server where you want to install.

Login to Azure Portal in your browser (inside your VM) → Data Factory →  select your ADF → Connections → Integration Runtimes →  integrationRuntime1 → Click Express Setup → Click on the link to download setup files.

  • Manual Setup – You can download the integration runtime and add the authentication keys to validate your installation.

Step 12: Express Setup

Click on the downloaded file.

On clicking on the downloaded file, your installation will start automatically.

Step 13:

Once the installation and authentication is successfully completed, go to the Start Menu → Microsoft Integration Runtime → Microsoft Integration Runtime

Step 14: You will need to wait till your node is able to connect to the cloud service. If for any reason you get an error at this step, you can troubleshoot by referring to the self-hosted integration runtime troubleshooting guide.

Step 15: High availability 

One node setup is complete. For high availability, we will need to set up at least 2 nodes. An IR can have a max of 4 nodes.

Note: Before setting up other nodes, you need to enable remote access. Make sure you do this on your very first node, i.e., while you still have a single node; you might face connectivity issues later if you forget this step.

Go to the Settings tab and click on Change under Remote access from intranet.

Step 16:

Select Enable without TLS/SSL certificate (Basic) for dev/test purposes, or use TLS/SSL for a more secure connection.

You can select a different TCP port – else use the default 8060

Step 17:

Click on OK. Your IR will need to be restarted for this change to take effect. Click OK again.

You will notice remote access enabled for your node.

Step 18:

Login to your other VM (Node2). Repeat Steps 11 to 17. At this point you will probably get a Connection Limited message stating your nodes are not able to connect to each other. Guess why? We will need to enable inbound access to port 8060 for both nodes.

Go to Azure Portal → Virtual Machines → Select your VM (Node1) → Networking.

Click on Add inbound port rule

Step 19:

Select Source → IP Addresses → Set Source IP as the IP of your Node2. Node2 will need to connect to Port 8060 of Node 1. Click Add

Node1 IP – 10.0.0.1 & Node2 IP – 10.0.0.2. You can use either private or public IP addresses.

We will need to do a similar exercise for Node2.

Go to the VM page of Node2 and add Inbound rule for Port 8060. Node1 & Node2 need to be able to communicate with each other via port 8060.

Step 20:

If you go to your IR inside your Node1 and Node2, you will see the green tick implying your nodes are successfully connected to each other and also to the cloud. You can wait for some time for this sync to happen. If for some reason, you get an error at this step, you can view integration runtime logs from Windows Event Viewer to further troubleshoot. Restart both of your nodes.

To verify this connection, you can also check in the ADF Console.

Go to your Data Factory → Monitor (Watch symbol on the left panel, below Pencil symbol – Check Step 5) → Integration runtimes

Here you can see the number of registered nodes and their resource utilization. The HIGH AVAILABILITY ENABLED feature is turned ON now.

Step 21: Test Database connectivity from your Node

If you want to test database connectivity from your Node, make sure you have whitelisted the Public IP of your Node at the Database Server inbound security rules.

For example, if your Node1 has the IP address 66.666.66.66 and needs to connect to an AWS RDS MySQL server, go to your RDS security group and add an inbound rule for your MySQL port for this IP.

To test this, log in to your Node1 → Start → Microsoft Integration Runtime → Diagnostics → add your RDS connection details → click on Test.
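Besides the Diagnostics tab, you can also sanity-check connectivity from the node with a short script, assuming Python and the pymysql package are installed on the VM; the endpoint, credentials, and database name below are placeholders.

import pymysql

# Quick connectivity check from the IR node to the AWS RDS MySQL server.
conn = pymysql.connect(
    host='mydb.xxxxxxxx.us-east-1.rds.amazonaws.com',  # placeholder RDS endpoint
    port=3306,
    user='admin',          # placeholder user
    password='********',   # placeholder password
    database='mydb',       # placeholder database
    connect_timeout=10,
)

with conn.cursor() as cursor:
    cursor.execute('SELECT 1')
    print(cursor.fetchone())  # (1,) confirms the node can reach the database

conn.close()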

Conclusion

This brings you to the end of successfully setting up a self-hosted IR with high availability enabled.

Hope this was informative. Do leave your comments below. Thanks for reading.


Detect highly distributed web DDoS on CloudFront, from botnets


Author: Niraj Kumar Gupta, Cloud Consulting at Powerupcloud Technologies.

Contributors: Mudit Jain, Hemant Kumar R and Tiriveedi Srividhya

INTRODUCTION TO SERVICES USED

CloudWatch Metrics

Metrics are abstract data points indicating performance of your systems. By default, several AWS services provide free metrics for resources (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances).

CloudWatch Alarms

AWS CloudWatch Alarm is a powerful service provided by Amazon for monitoring and managing our AWS services. It provides us with data and actionable insights that we can use to monitor our application/websites, understand and respond to critical changes, optimize resource utilization, and get a consolidated view of the entire account. CloudWatch collects monitoring and operational information in the form of logs, metrics, and events. You can configure alarms to initiate an action when a condition is satisfied, like reaching a pre-configured threshold.

CloudWatch Dashboard

Amazon CloudWatch Dashboards is a feature of AWS CloudWatch that offers basic monitoring home pages for your AWS accounts. It provides resource status and performance views via graphs and gauges. Dashboards can monitor resources in multiple AWS regions to present a cohesive account-wide view of your accounts.

CloudWatch Composite Alarms

Composite alarms enhance existing alarm capability, giving customers a way to logically combine multiple alarms. A single infrastructure event may generate multiple alarms, and the volume of alarms can overwhelm operators or mislead the triage and diagnosis process. If this happens, operators can end up dealing with alarm fatigue or waste time reviewing a large number of alarms to identify the root cause. Composite alarms give operators the ability to add logic and group alarms into a single high-level alarm, which is triggered when the underlying conditions are met. This gives operators the ability to make intelligent decisions and reduces the time to detect, diagnose, and resolve performance issues when they happen.

What are Anomaly detection-based alarms?

Amazon CloudWatch Anomaly Detection applies machine-learning algorithms to continuously analyze system and application metrics, determine a normal baseline, and surface anomalies with minimal user intervention. You can use Anomaly Detection to isolate and troubleshoot unexpected changes in your metric behavior.

Why Composite Alarms?

  1. Simple alarms monitor single metrics. Most of the alarms triggered, limited by the design, will be false positives on further triage. This adds maintenance overhead and noise.
  2. Advanced use cases cannot be conceptualized and achieved with simple alarms.

Why Anomaly Detection?

  1. Static alarms trigger based on fixed upper and/or lower limits. There is no direct way to change these limits based on the day of the month, day of the week and/or time of the day. For most businesses these values change massively over different times of the day, especially for user-behavior-driven metrics like incoming or outgoing traffic. This leaves static alarms futile most of the time.
  2. It is cheap, AI-based regression on the metrics.

Solution Overview

  1. Request count > monitored by anomaly detection based Alarm1.
  2. Cache hit > monitored by anomaly detection based Alarm2.
  3. Alarm1 and Alarm2 > monitored by composite Alarm3.
  4. Alarm3 > Send Notification(s) to SNS2, which has lambda endpoint as subscription.
  5. Lambda Function > Sends custom notification with CloudWatch Dashboard link to the distribution lists subscribed in SNS1.

Solution

Prerequisites

  1. Enable additional CloudFront Cache-Hit metrics.

Configuration

This is applicable to all enterprise’s CloudFront CDN distributions.

1. We will configure an Anomaly Detection alarm on request count increasing by 10% (example) of the expected average.

2. We will add an Anomaly Detection alarm on CacheHitRate percentage going lower than a standard deviation of 10% (example) of the expected average.

3. We will create a composite alarm for the above-mentioned alarms using a logical AND operation (a boto3 sketch of steps 1 to 3 follows this list).

4. Create a CloudWatch Dashboard with all required information in one place for quick access.

5. Create a lambda function:

This will be triggered by SNS2 (SNS topic) when the composite alarm state changes to “ALARM”. This Lambda function will execute to send custom notifications (email alerts) to the users via SNS1 (SNS topic).

The target ARN should be SNS1, where the users' email IDs are configured as endpoints.

In the message section, type the custom message that needs to be sent to the user; here we have mentioned the CloudWatch dashboard URL.

6. Create two SNS topics:

  • SNS1 – With EMAIL alerts to users [preferably to email distribution list(s)].
  • SNS2 – A Lambda function subscription with code sending custom notifications via SNS1 with links to CloudWatch dashboard(s). Same lambda can be used to pick different dashboard links based on the specific composite alarm triggered, from a DynamoDB table with mapping between SNS target topic ARN to CloudWatch Dashboard link.

7. Add notification to the composite alarm to send notification on the SNS2.
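For reference, here is a boto3 sketch of steps 1 to 3 above. The distribution ID and SNS topic ARN are placeholders, the CloudWatch client must be in us-east-1 because that is where CloudFront publishes its metrics, and the anomaly band width (2 standard deviations here) should be tuned to the sensitivity you want.

import boto3

cw = boto3.client('cloudwatch', region_name='us-east-1')  # CloudFront metrics live in us-east-1

DIST_ID = 'E1234567890ABC'                                               # placeholder distribution ID
SNS2_ARN = 'arn:aws:sns:us-east-1:123456789012:ddos-composite-alerts'    # placeholder SNS2 ARN

# Alarm1: request count breaching the upper anomaly detection band.
cw.put_metric_alarm(
    AlarmName='cf-requests-anomaly',
    ComparisonOperator='GreaterThanUpperThreshold',
    EvaluationPeriods=3,
    ThresholdMetricId='band',
    Metrics=[
        {'Id': 'm1',
         'MetricStat': {
             'Metric': {'Namespace': 'AWS/CloudFront',
                        'MetricName': 'Requests',
                        'Dimensions': [{'Name': 'DistributionId', 'Value': DIST_ID},
                                       {'Name': 'Region', 'Value': 'Global'}]},
             'Period': 300,
             'Stat': 'Sum'},
         'ReturnData': True},
        {'Id': 'band',
         'Expression': 'ANOMALY_DETECTION_BAND(m1, 2)',
         'ReturnData': True}
    ]
)

# Alarm2: cache hit rate dropping below the lower anomaly detection band.
cw.put_metric_alarm(
    AlarmName='cf-cachehitrate-anomaly',
    ComparisonOperator='LessThanLowerThreshold',
    EvaluationPeriods=3,
    ThresholdMetricId='band',
    Metrics=[
        {'Id': 'm1',
         'MetricStat': {
             'Metric': {'Namespace': 'AWS/CloudFront',
                        'MetricName': 'CacheHitRate',
                        'Dimensions': [{'Name': 'DistributionId', 'Value': DIST_ID},
                                       {'Name': 'Region', 'Value': 'Global'}]},
             'Period': 300,
             'Stat': 'Average'},
         'ReturnData': True},
        {'Id': 'band',
         'Expression': 'ANOMALY_DETECTION_BAND(m1, 2)',
         'ReturnData': True}
    ]
)

# Alarm3: composite alarm that fires only when both alarms are in ALARM state
# and notifies SNS2 (which triggers the custom-notification Lambda).
cw.put_composite_alarm(
    AlarmName='cf-ddos-composite',
    AlarmRule='ALARM("cf-requests-anomaly") AND ALARM("cf-cachehitrate-anomaly")',
    AlarmActions=[SNS2_ARN]
)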

Possible False Positives

  1. There is a new promotional activity, and traffic spikes on the newly developed pages for that promotion.
  2. Some hotfix went wrong at the time of a spike in traffic.

Summary

This is one example of implementing a simple setup of composite alarms and anomaly detection based alarms to achieve advanced security monitoring. We submit that these are very powerful tools that can be used to design a lot of advanced functionality.

Why hybrid is the preferred strategy for all your cloud needs


Written by Kiran Kumar, Business analyst at Powerupcloud Technologies.

While public cloud is a globally accepted and proven solution for CIOs and CTOs looking for a more agile, scalable and versatile IT environment, there are still questions about security, reliability, and the cloud readiness of enterprises, and a full migration to a cloud-native organization requires a lot of time and resources. This is exacerbated especially for start-ups, as it is too much of a risk to work with these uncertainties. This demands a solution that is innocuous and less expensive to drive them out of the comforts of their existing on-prem infrastructure.

In such cases, a hybrid cloud is the best approach, providing you with the best of both worlds while keeping pace with all your performance and compliance needs within the comforts of your datacenter.

So what is a hybrid cloud?

Hybrid cloud delivers a seamless computing experience to organizations by combining the power of the public and private cloud and allowing data and applications to be shared between them. It provides enterprises the ability to easily scale their on-premises infrastructure to the public cloud to handle any fluctuations in workload without giving third-party datacenters access to the entirety of their data. Understanding the benefits, various organizations around the world have streamlined their offerings to effortlessly integrate these solutions into their hybrid infrastructures. However, an enterprise has no direct control over the architecture of a public cloud, so for hybrid cloud deployment, enterprises must architect their private cloud to achieve a consistent hybrid experience with the desired public cloud or clouds.

In a 2019 survey of 2,650 IT decision-makers from around the world, respondents reported steady and substantial hybrid deployment plans over the next five years. In addition, more than 80% of respondents selected hybrid cloud as their ideal IT operating model, and more than half of them cited hybrid cloud as the model that meets all of their needs. More than 60% of them stated that data security is the biggest influencer.

Respondents also felt that having the flexibility to match the right cloud to each application showcases the scale of adaptability that enterprises can work with in a hybrid multi-cloud environment.

Banking is one of the industries that will embrace the full benefits of a hybrid cloud. Because of how the industry operates, it requires a unique mix of services and an infrastructure that is easily accessible and also affordable.

In a recent IBM survey 

  • 50 percent of banking executives say they believe the hybrid cloud can lower their cost of IT ownership 
  • 47 percent of banking executives say they believe hybrid cloud can improve operating margin 
  • 47 percent of banking executives say they believe hybrid cloud can accelerate innovation

Hybrid adoption – best practices and guidelines 

Some of the biggest challenges in cloud adoption include security, talent, and costs. According to the report, hybrid computing has shown that it can eliminate security challenges and manage risk by positioning all the important digital assets and data on-prem. Private clouds are still considered an appropriate solution to host and manage sensitive data and applications, and enterprises still need the means to support their conventional enterprise computing models. A sizeable number of businesses still have substantial on-premise assets comprising archaic technology, sensitive collections of data, and tightly coupled legacy apps that either can't be easily moved or swapped for public cloud.

Here are some of the guidelines for hybrid adoption.

Have a cloud deployment model for applications and data

Deployment models describe which cloud resources and applications should be deployed and where. Hence it is crucial to understand the two-paced system, i.e., the steady and fast-paced systems, to determine the deployment models.

A steady paced system must continue to support the traditional enterprise applications on-prem to keep the business running and maintain the current on-premise services. Additionally, off-premises services, such as private dedicated IaaS, can be used to increase infrastructure flexibility for enterprise services.

And a fast-paced system is required to satisfy the more spontaneous needs like delivering applications and services quickly whether it’s scaling existing services to satisfy spikes in demand or providing new applications quickly to meet an immediate business need. 

The next step is determining where applications and data must reside.

Placement of applications and datasets on private, public or on-prem infrastructure is crucial, since IT architects must assess the right application architecture to achieve maximum benefit. This includes understanding application workload characteristics and determining the right deployment model for multi-tier applications.

Create heterogeneous environments 

To achieve maximum benefit from a hybrid strategy, the enterprise must leverage its existing in-house investments along with cloud services by efficiently integrating them. As new cloud services are deployed, integrating the applications running on them with the various on-premises applications and systems becomes important.

Integration between applications typically includes 

  • Process (or control) integration, where an application invokes another one in order to execute a certain workflow. 
  • Data integration, where applications share common data, or one application’s output becomes another application’s input. 
  • Presentation integration, where multiple applications present their results simultaneously to a user through a dashboard or mashup.

To obtain a seamless integration between heterogeneous environments, the following actions are necessary:

  • A cloud service provider must support open-source technologies for admin and business interfaces.
  • Examine the compatibility of in-house systems to work with cloud services providers and also ensure that on-premises applications are following SOA design principles and can utilize and expose APIs to enable interoperability with private or public cloud services.
  • Leverage the support of third party ID and Access Management functionality to authenticate and authorize access to cloud services. Put in place suitable API Management capabilities to prevent unauthorized access.

Network security requirements 

Network type – the technology used for the physical connection over the WAN depends on aspects like bandwidth, latency, service levels, and costs. Hybrid cloud solutions can rely on P2P links as well as the Internet to connect on-premises data centers and cloud providers. The selection of the connectivity type depends on the analysis of aspects like performance, availability, and type of workloads.

Security – the connectivity domain needs to be evaluated and understood to match the cloud provider's network security standards with the overall network security policies, guidelines and compliance requirements. Encrypting and authenticating traffic on the WAN can be evaluated at the application level. Aspects like the computing resources and applications must be considered, and technologies such as VPNs can be employed to provide secure connections between components running in different environments.

Web apps security and Management services like DNS and DDoS protection which are available on the cloud can free up dedicated resources required by an enterprise to procure, set-up and maintain such services and instead concentrate on business applications. This is especially applicable to the hybrid cloud for workloads that have components deployed into a cloud service and are exposed to the Internet. The system that is deployed on-premises needs to adapt to work with the cloud, to facilitate problem identification activities that may span multiple systems that have different governance boundaries.

Security and privacy challenges & counter-measures

Hybrid cloud computing has to coordinate between applications and services spanning across various environments, which involves the movement of applications and data between the environments. Security protocols need to be applied across the whole system consistently, and additional risks must be addressed with suitable controls to account for the loss of control over any assets and data placed into a cloud provider’s systems. Despite this inherent loss of control, enterprises still need to take responsibility for their use of cloud computing services to maintain situational awareness, weigh alternatives, set priorities, and effect changes in security and privacy that are in the best interest of the organization. 

  • A single and uniform interface must be used to curtail or deal with risks arising from using services from various cloud providers, since it is likely that each will have its own set of security and privacy characteristics.
  • Authentication and Authorization. A hybrid environment could mean that gaining access to the public cloud environment could lead to access to the on-premises cloud environment.
  • Compliance check between cloud providers used and in-home systems.

Counter-measures

  • A single Id&AM system should be used.
  • Networking facilities such as VPN are recommended between the on-premises environment and the cloud.  
  • Encryption needs to be in place for all sensitive data, wherever it is located.
  • Firewalls, DDoS attack handling, etc., need to be coordinated across all environments with external interfaces.

Set-up an appropriate DB&DR plan

As already discussed, a hybrid environment provides organizations the option to work with multi-cloud, thus offering business continuity, which has been one of the most important aspects of business operations. It is not just a simple data backup to the cloud or a disaster recovery plan; it means that when a disaster or failure occurs, data is still accessible with little to no downtime. This is measured in terms of time to restart (RTO: recovery time objective) and maximum data loss allowed (RPO: recovery point objective).

Therefore a business continuity solution has to be planned considering some of the key elements such as resilience, time to restart (RTO: recovery time objective) and maximum data loss allowed (RPO: recovery point objective) which was agreed upon by the cloud provider. 

Here are some of the challenges encountered while making a DR plan 

  • Although the RTO and RPO values give us a general idea of the outcome, they cannot be trusted fully, so the time required to restart operations may take longer.
  • As the systems get back up and operational, there will be a sudden burst of requests for resources, which is more apparent in large-scale disasters.
  • Selecting the right CSP is crucial, as most cloud providers do not provide DR as a managed service; instead, they provide a basic infrastructure to enable our own DRaaS.

Hence enterprises have to be clear and select the DR strategy that best suits their IT infrastructure. This is crucial in providing mobility to the business, making it more easily accessible from anywhere around the world, and in insuring data in the event of a disaster, whether natural or a technical failure, by minimizing downtime and the costs associated with such an event.

How are leading OEMs like AWS, Azure and Google Cloud adapting to this changing landscape  

Google Anthos

In early 2019, Google came up with Anthos, one of the first multi-cloud solutions from a mainstream cloud provider. Anthos is an open application modernization platform that enables you to modernize your existing applications, build new ones, and run them anywhere. It is built on open source, with Kubernetes as its central command and control center, Istio for federated network management across the platform, and Knative, which provides an open API and runtime environment that lets you run your serverless workloads anywhere you choose. Anthos enables consistency between on-premises and cloud environments, helps accelerate application development, and strategically enables your business with transformational technologies.

AWS Outposts

AWS Outposts is a fully managed service that extends the same AWS hardware infrastructure, services, APIs, and tools to build and run your applications on-premises and in the cloud for a truly consistent hybrid experience. AWS compute, storage, database, and other services run locally on Outposts, and you can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools across your on-premises and cloud environments. Your Outposts infrastructure and AWS services are managed, monitored, and updated by AWS just like in the cloud.

Azure Stack

Azure Stack is a hybrid solution provided by Azure, built and distributed by approved hardware vendors (like Dell, Lenovo, HPE, etc.) that brings the Azure cloud into your on-prem data center. It is a fully managed service where the hardware is managed by the certified vendors and the software is managed by Microsoft Azure. Using Azure Stack you can extend Azure technology anywhere, from the datacenter to edge locations and remote offices, enabling you to build, deploy, and run hybrid and edge computing apps consistently across your IT ecosystem, with flexibility for diverse workloads.

How Powerup approaches Hybrid cloud for its customers 

Powerup is one of the few companies in the world to have achieved the status of a launch partner with AWS Outposts, with experience working on over 200+ projects across various verticals and top-tier certified expertise in all 3 major cloud providers in the market. We can bring an agile, secure, and seamless hybrid experience to the table. Outposts is a fully managed service, hence it eliminates the hassle of managing an on-prem data center so that enterprises can concentrate more on optimizing their infrastructure.

Reference Material

Practical Guide to Hybrid Cloud Computing

Multicast in AWS using AWS Transit Gateway


Written by Aparna M, Associate Solutions Architect at Powerupcloud Technologies.

Multicast is a communication protocol used for delivering a single stream of data to multiple receiving computers simultaneously.

Now AWS Transit Gateway multicast makes it easy for customers to build multicast applications in the cloud and distribute data across thousands of connected Virtual Private Cloud networks. Multicast delivers a single stream of data to many users simultaneously. It is a preferred protocol to stream multimedia content and subscription data such as news articles and stock quotes, to a group of subscribers.

Now let’s understand the key concepts of Multicast:

  1. Multicast domain – Multicast domain allows the segmentation of a multicast network into different domains and makes the transit gateway act as multiple multicast routers. This is defined at the subnet level.
  2. Multicast Group – A multicast group is used to identify a set of sources and receivers that will send and receive the same multicast traffic. It is identified by a group IP address.
  3. Multicast source – An elastic network interface associated with a supported EC2 instance that sends multicast traffic.
  4. Multicast group member – An elastic network interface associated with a supported EC2 instance that receives multicast traffic. A multicast group has multiple group members.

Key Considerations for setting up Multicast in AWS:

  • Create a new transit gateway to enable multicast
  • You cannot share multicast-enabled transit gateways with other accounts
  • Internet Group Management Protocol (IGMP) support for managing group membership is not available right now
  • A subnet can only be in one multicast domain.
  • If you use a non-Nitro instance, you must disable the Source/Dest check. 
  • A non-Nitro instance cannot be a multicast sender.

Let’s walkthrough how to set up multicast via AWS Console.

Create a Transit gateway for multicast:

In order to create a transit gateway multicast follow the below steps:

  1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
  2. On the navigation pane, choose Create Transit Gateway.
  3. For Name tag, enter a name to identify the Transit gateway.
  4. Enable Multicast support.
  5. Choose Create Transit Gateway.

Create a Transit Gateway Multicast Domain

  1. On the navigation pane, choose the Transit Gateway Multicast.
  2. Choose Create Transit Gateway Multicast domain.
  3. (Optional) For Name tag, enter a name to identify the domain.
  4. For Transit Gateway ID, select the transit gateway that processes the multicast traffic.
  5. Choose Create Transit Gateway multicast domain.

Associate VPC Attachments and Subnets with a Transit Gateway Multicast Domain

To associate VPC attachments with a transit gateway multicast domain using the console

  1. On the navigation pane, choose Transit Gateway Multicast.
  2. Select the transit gateway multicast domain, and then choose Actions, Create association.

3. For Transit Gateway ID, select the transit gateway attachment.

4. For Choose subnets to associate, select the subnets to include in the domain.

5. Choose Create association.

Register Sources with a Multicast Group

In order to register sources for transit gateway multicast:

  1. On the navigation pane, choose Transit Gateway Multicast.
  2. Select the transit gateway multicast domain, and then choose Actions, Add group sources.
  3. For Group IP address, enter either the IPv4 CIDR block or IPv6 CIDR block to assign to the multicast domain. IP range must be in 224.0.0.0/4.
  4. Under Choose network interfaces, select the multicast sender’s (ec2 servers) network interfaces.
  5. Choose Add sources.

Register Members with a Multicast Group

To register members in the transit gateway multicast:

  1. On the navigation pane, choose Transit Gateway Multicast.
  2. Select the transit gateway multicast domain, and then choose Actions, Add group members.
  3. For Group IP address, enter either the IPv4 CIDR block or IPv6 CIDR block to assign to the multicast domain. Specify the same multicast group IP specified while adding the sources.
  4. Under Choose network interfaces, select the multicast receivers' (EC2 server) network interfaces.
  5. Choose Add members.
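The console steps above (multicast-enabled transit gateway, multicast domain, subnet association, and source/member registration) can also be scripted. Below is a boto3 sketch; the attachment ID, subnet ID, and network interface IDs are placeholders.

import boto3

ec2 = boto3.client('ec2')

# 1. Create a multicast-enabled transit gateway (multicast cannot be enabled later).
tgw = ec2.create_transit_gateway(
    Description='multicast-demo',
    Options={'MulticastSupport': 'enable'}
)
tgw_id = tgw['TransitGateway']['TransitGatewayId']

# 2. Create the multicast domain on that transit gateway.
domain = ec2.create_transit_gateway_multicast_domain(TransitGatewayId=tgw_id)
domain_id = domain['TransitGatewayMulticastDomain']['TransitGatewayMulticastDomainId']

# 3. Associate a VPC attachment's subnets with the domain.
ec2.associate_transit_gateway_multicast_domain(
    TransitGatewayMulticastDomainId=domain_id,
    TransitGatewayAttachmentId='tgw-attach-0123456789abcdef0',   # placeholder attachment ID
    SubnetIds=['subnet-0123456789abcdef0']                       # placeholder subnet ID
)

# 4. Register the sender and receiver ENIs against the group IP (must be in 224.0.0.0/4).
ec2.register_transit_gateway_multicast_group_sources(
    TransitGatewayMulticastDomainId=domain_id,
    GroupIpAddress='224.0.0.50',
    NetworkInterfaceIds=['eni-0aaaaaaaaaaaaaaaa']                # placeholder sender ENI
)
ec2.register_transit_gateway_multicast_group_members(
    TransitGatewayMulticastDomainId=domain_id,
    GroupIpAddress='224.0.0.50',
    NetworkInterfaceIds=['eni-0bbbbbbbbbbbbbbbb', 'eni-0cccccccccccccccc']  # placeholder receiver ENIs
)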

Modify Security groups of the Member servers(receivers):

  1. Allow inbound traffic on Custom UDP port 5001

Once your setup is completed follow the below steps to test the multicast routing.

  1. Log in to all the source and member servers.
  2. Make sure you install the iperf package on all your servers in order to test the functionality.
  3. Run the below command on all the member servers (receivers):

iperf -s -u -B 224.0.0.50 -i 1

– 224.0.0.50 will be your multicast group IP provided during the setup

  4. Run the below command on the source machine (sender):

iperf -c 224.0.0.50 -u -T 32 -t 100 -i 1

Once you start sending data from the source server, it can be seen simultaneously across all the members. Below is the screenshot for your reference.

Conclusion

This blog helps you host multicast applications on AWS by leveraging AWS Transit Gateway. Hope you found it useful.

Keycloak with Java and ReactJS


Written by Soumya. a, Software Developer at Powerupcloud Technologies.

In a common development environment, we create login algorithms and maintain all the details in the project database. This can be a risk in huge application development, along with the maintenance of all the client data and user information, so we use third-party software for maintaining login details to make the application more secure. Keycloak even helps in maintaining multiple applications with different or the same users.

Problem statement:

Maintaining login details in a local database can become a burden, so we use third-party software like Keycloak to make the application more secure.

How it works:

We add Keycloak to the working environment and add the required Keycloak details in the code, then add the application details in Keycloak and run it in the working environment. Detailed steps for local and server environments are in this document.

Keycloak:

Keycloak is an open-source Identity and Access Management solution aimed at modern applications and services. It makes it easy to secure applications and services with little to no code.

Keycloak is used to add security to any application. With the Keycloak details added to the application, it gives various options like a simple login, login with username and password, use of OTP for login, etc.

When we use Keycloak we need not maintain login details in our database; all the details are saved on the Keycloak server and it is secure. Only the required details are stored in our database.

Different features of keycloak:

Users authenticate with Keycloak rather than individual applications. This means that your applications don’t have to deal with login forms, authenticating users, and storing users. Once logged-in to Keycloak, users don’t have to login again to access a different application.

This also applies to logout. Keycloak provides single-sign-out, which means users only have to log out once to be logged out of all applications that use Keycloak.

Kerberos bridge

If your users authenticate to workstations with Kerberos (LDAP or active directory) they can also be automatically authenticated to Keycloak without having to provide their username and password again after they log on to the workstation.

Identity Brokering and Social Login

Enabling login with social networks is easy to add through the admin console. It’s just a matter of selecting the social network you want to add. No code or changes to your application is required.

Keycloak can also authenticate users with existing OpenID Connect or SAML 2.0 Identity Providers. Again, this is just a matter of configuring the Identity Provider through the admin console.

User Federation

Keycloak has built-in support to connect to existing LDAP or Active Directory servers. You can also implement your own provider if you have users in other stores, such as a relational database.

Client Adapters

Keycloak Client Adapters makes it really easy to secure applications and services. We have adapters available for a number of platforms and programming languages, but if there’s not one available for your chosen platform don’t worry. Keycloak is built on standard protocols so you can use any OpenID Connect Resource Library or SAML 2.0 Service Provider library out there.

Gatekeeper

You can also opt to use a proxy to secure your applications which removes the need to modify your application at all.

Admin Console

Through the admin console administrators can centrally manage all aspects of the Keycloak server.

They can enable and disable various features. They can configure identity brokering and user federation.

They can create and manage applications and services, and define fine-grained authorization policies.

They can also manage users, including permissions and sessions.

Account Management Console

Through the account management console users can manage their own accounts. They can update the profile, change passwords, and setup two-factor authentication.

Users can also manage sessions as well as view history for the account.

If you’ve enabled social login or identity brokering users can also link their accounts with additional providers to allow them to authenticate to the same account with different identity providers.

Standard Protocols

Keycloak is based on standard protocols and provides support for OpenID Connect, OAuth 2.0, and SAML.

Authorization Services

If role-based authorization doesn’t cover your needs, Keycloak provides fine-grained authorization services as well. This allows you to manage permissions for all your services from the Keycloak admin console and gives you the power to define exactly the policies you need.

How to use:

In Local:

Prerequisites:

  1. Download keycloak
  2. Run Keycloak on port 8085 (default Keycloak port: 8080): ./standalone.sh -Djboss.socket.binding.port-offset=5
  3. Log in with master login which is registered while creating keycloak

In Server:

Prerequisites:

  1. Download java 
  2. Download keycloak using  wget
  3. Run Keycloak on port 8085 (default Keycloak port: 8080): ./standalone.sh -Djboss.socket.binding.port-offset=5
  4. Add ssl certificate for keycloak
  5. Log in with master login which is registered while creating keycloak

Steps for server :

Installation

Step 1: Log in to the Linux server.

Step 2: Download Keycloak.

cd /opt/
wget https://downloads.jboss.org/keycloak/7.0.0/keycloak-7.0.0.tar.gz
tar -xvzf keycloak-7.0.0.tar.gz
mv keycloak-7.0.0 keycloak

Step 3: Create a user to run keycloak application

adduser techrunnr
chown techrunnr.techrunnr -R /opt/keycloak

Step 4: switch the user to newly created user

sudo su - techrunnr

Step 5: Goto the keycloak home directory.

cd /opt/keycloak

Step 6: Execute the below command to make the application run on the reverse proxy.

./bin/jboss-cli.sh 'embed-server,/subsystem=undertow/server=default-server/http-listener=default:write-attribute(name=proxy-address-forwarding,value=true)'
./bin/jboss-cli.sh 'embed-server,/socket-binding-group=standard-sockets/socket-binding=proxy-https:add(port=443)'
./bin/jboss-cli.sh 'embed-server,/subsystem=undertow/server=default-server/http-listener=default:write-attribute(name=redirect-socket,value=proxy-https)'

Step 7: Create a systemd configuration to start and stop keycloak using systemd.

cat > /etc/systemd/system/keycloak.service <<EOF
[Unit]
Description=Keycloak
After=network.target

[Service]
Type=idle
User=techrunnr
Group=techrunnr
ExecStart=/opt/keycloak/bin/standalone.sh -b 0.0.0.0
TimeoutStartSec=600
TimeoutStopSec=600

[Install]
WantedBy=multi-user.target
EOF

Step 8: Reload the systemd daemon and start Keycloak.

systemctl daemon-reload
systemctl enable keycloak
systemctl start keycloak

Step 9: Create an admin user using below command line.

./bin/add-user-keycloak.sh -u admin -p YOURPASS -r master

Configure Nginx reverse proxy

Step 1: Login to Nginx server and update in nginx.conf file.

upstream keycloak {
# Use IP Hash for session persistence
ip_hash;

# List of Keycloak servers
server 127.0.0.1:8080;
}

server {
listen 80;
server_name keycloak.domain.com;

# Redirect all HTTP to HTTPS
location / { 
return 301 https://$server_name$request_uri;
}
}

server {
listen 443 ssl http2;
server_name keycloak.domain.com;

ssl_certificate /etc/pki/tls/certs/my-cert.cer;
ssl_certificate_key /etc/pki/tls/private/my-key.key;
ssl_session_cache shared:SSL:1m;
ssl_prefer_server_ciphers on;
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_pass http://keycloak;
}
}

Once it’s completed restart the Nginx server to take immediate effect. Now access the given URL to access the keycloak server and use the credentials which you created in Step 9.

Steps for both server and local:

1. Create a new realm:

To create a new realm, complete the following steps:

  1. Go to http://localhost:8085/auth/admin/ and log in to the Keycloak Admin Console using the account you created in Install and Boot.
  2. From the Master drop-down menu, click Add Realm. When you are logged in to the master realm this drop-down menu lists all existing realms.
  3. Type demo in the Name field and click Create.

When the realm is created, the main admin console page opens. Notice the current realm is now set to demo. Switch between managing the master realm and the realm you just created by clicking entries in the Select realm drop-down menu.

2. Create new client:

To define and register the client in the Keycloak admin console, complete the following steps:

  1. In the top-left drop-down menu select and manage the demo realm. Click Clients in the left side menu to open the Clients page.

2. On the right side, click Create.

3. Complete the fields as shown here:

4. Click Save to create the client application entry.

5. Change Access type to confidential 

6. Click the Installation tab in the Keycloak admin console to obtain a configuration template.

7. Select Keycloak OIDC JSON to generate a JSON template. Copy the contents for use in the next section.

3. Role: 

Roles identify a type or category of user. Keycloak often assigns access and permissions to specific roles rather than individual users for fine-grained access control.

Keycloak offers three types of roles:

  • Realm-level roles are in the global namespace shared by all clients.
  • Client roles have basically a namespace dedicated to a client.
  • A composite role is a role that has one or more additional roles associated with it. 

3.1. Create new role 

Roles->Add Role->Role name and Description(admin_role)

3.2. To add manage user permission for the newly created role

enable composite roles->client roles–>realm management->manage users.

3.3. This step is used to add manage user permission for default roles

Roles->Default Roles->Client Roles–>realm management–> add manage users.

4. Adding created role permission to your client

Client->select your client->scope->realmroles->add created role(admin_role).

5. Add permission to a new realm from master realm

5.1. Master realm->client->select your client(demo-realm)–>roles->manage-users(default false make it as true)

5.2. For making it true : Enable composite roles–>client-roles–>select your client(demo-realm)–>add manage users

6. For adding manage user permission for client in master

Master->roles->default roles->client roles–>select ur client(demo-realm)–>add manage users.

7. Once the permissions are given, the first user in the new realm should be created, using which we can create multiple users from code (outside Keycloak).

7.1.Select your realm(demo)–>Users->New user->details(add email id and name)->

7.2.Credentials(password)->

7.3.Role mappings->Client role->realm management->check if manage users is present else add

In React JS, for connecting to Keycloak and adding authentication:

Keycloak.js file

JSON file for adding Keycloak server details (JSON from the Installation tab):

{
 "realm": "realm name",
 "auth-server-url": "keycloak url",
 "ssl-required": "external",
 "resource": "client name",
 "credentials": {
   "secret": "secret key"
 },
 "confidential-port": 0
}

Keycloak functions in app.js

These functions are used to connect to the Java method and authenticate the user:

async Keycloakfunc() {
   const keycloak = Keycloak("/keycloak.json");
   console.log("keycloak");
   return keycloak
     .init({ onLoad: "login-required", promiseType: "native" })
 
     .then(authenticated => {
       if (authenticated) {
         if (sessionStorage.getItem("Loggined") == null) {
           return this.getUserId(keycloak.tokenParsed.preferred_username);
         } else {
           this.setState({ Loggined: true });
         }
       }
     });
 }
 
 async getUserId(user_name) {
   //alert(user_name);
   const endpoint = CURRENT_SERVER + "authenticateLogin";
   const bot_obj = {
     username: user_name
   };
   console.log(bot_obj);
 
   return axios.post(endpoint, bot_obj).then(res => {
     let data = res.data;
     if (data.status == "success") {
       //setting token locally to access through out the application
       sessionStorage.setItem("authorization", data.response.token);
       this.setState({ isAuth: "true" });
       console.log("login success");
       localStorage.setItem("userId", JSON.stringify(data.response));
 
       localStorage.setItem("userName", user_name);
       localStorage.setItem("rolename", data.response.roleid);
 
       this.setState({ Loggined: true });
       sessionStorage.setItem("Loggined", true);
     } else if (data.status == "failed") {
       console.log(data.response.error);
     }
   });
 }

In Java code:

Authenticate login service method:

This method is used to check if the username/email ID is registered in our DB and authenticates the user for the next steps.

@Service
public class LoginServices {

    @Autowired
    private UserMainRepo userMain;

    @Autowired
    private RoleUserPermissionRepo roleUserPermission;

    @Autowired
    private JwtTokenUtil jwtTokenUtil;

    public String keyCloak(String object) {
        System.out.println("LoginServices.keyCloak()");
        String userId = null;
        String roleid = null;
        JSONObject responseJSON = new JSONObject();
        JSONObject resultJSON = new JSONObject();

        JSONObject obj = new JSONObject(object);
        String username = (String) obj.get("username");

        List<UserMain> authenticate = userMain.findByEmailId(username);

        if (authenticate.isEmpty()) {
            responseJSON.put("error", "user not found in DB");
            resultJSON.put("status", "failed");
            resultJSON.put("response", responseJSON);
        } else {
            List<RoleUserPermission> roleUserData = roleUserPermission.findByUserMainId(authenticate.get(0));
            userId = roleUserData.get(0).getId();
            roleid = roleUserData.get(0).getRoleMasterId().getId();

            // Creating JWT token for security
            JwtUserDetails userDetails = new JwtUserDetails();
            userDetails.setUsername(username.trim());
            final String token = jwtTokenUtil.generateToken(userDetails);
            final String secretKey = "botzer";
            String encryptedToken = AES.encrypt(token, secretKey);
            responseJSON.put("token", encryptedToken);
            responseJSON.put("userId", userId);
            responseJSON.put("isLoggedIn", "true");
            responseJSON.put("roleid", roleid);
            resultJSON.put("status", "success");
            resultJSON.put("response", responseJSON);
        }
        return resultJSON.toString();
    }
}

User controller method:

All the details required for creating a user, such as email id and username, are passed from the front end as JSON.

This method creates the user:

@RequestMapping(value = "/createUser", method = RequestMethod.POST, consumes = MediaType.APPLICATION_JSON_VALUE)
public ResponseEntity<String> createUser(@RequestBody UserMain json) throws ParseException {
    System.out.println("UserDetailsController.createUser()");
    String response = userservices.createUser(json);
    return new ResponseEntity<String>(response, HttpStatus.OK);
}

User services method:

This method is used to create an agent:

public String createUser(UserMain user) {
    System.out.println("UserServices.createUser()");

    try {
        if (user.getId() == null) {
            int count = getUserEmailIdCount(user.getEmailId());

            if (count == 0) {
                // New email id: create the user in Keycloak and then in our DB
                String userresult = createAgentOnKeyclock(user);
                return userresult;
            } else {
                // Email id already registered
                return "Failure";
            }
        }
        // An id was supplied, so this is not a create request
        return "Failure";
    } catch (Exception e) {
        return "Failure";
    }
}

This method checks whether the email id already exists by returning the size of the matching list:

public int getUserEmailIdCount(String emailId) {
    System.out.println("UserServices.getUserEmailIdCount()");
    List<UserMain> list = usermainRepo.findAllByEmailId(emailId);
    return list.size();
}

This method is used to create the user id and password in Keycloak.

public String createAgentOnKeyclock(UserMain usermain) {

    try {
        String serverUrl = Credentials.KEYCLOAK_SERVER_URL;
        String realm = Credentials.KEYCLOAK_RELAM;
        String clientId = Credentials.KEYCLOAK_CLIENT_ID;
        String clientSecret = Credentials.KEYCLOAK_SECREAT_KEY;

        // Admin client logged in as the first user of the realm (with the manage-users role)
        Keycloak keycloak = KeycloakBuilder.builder()
                .serverUrl(serverUrl)
                .realm(realm)
                .grantType(OAuth2Constants.PASSWORD)
                .clientId(clientId)
                .clientSecret(clientSecret)
                .username(Credentials.KEYCLOAK_USER_NAME)
                .password(Credentials.KEYCLOAK_USER_PASSWORD)
                .build();

        // Define user
        UserRepresentation user = new UserRepresentation();
        user.setEnabled(true);
        user.setUsername(usermain.getEmailId());
        user.setFirstName(usermain.getFirstName());
        user.setLastName(usermain.getLastName());
        user.setEmail(usermain.getEmailId());

        // Get realm
        RealmResource realmResource = keycloak.realm(realm);
        UsersResource userRessource = realmResource.users();

        Response response = userRessource.create(user);

        System.out.println("response : " + response);

        // Extract the new user's id from the Location header of the response
        String userId = response.getLocation().getPath().replaceAll(".*/([^/]+)$", "$1");

        System.out.println("userId : " + userId);

        // Define password credential
        CredentialRepresentation passwordCred = new CredentialRepresentation();
        passwordCred.setTemporary(true);
        passwordCred.setType(CredentialRepresentation.PASSWORD);
        passwordCred.setValue(Credentials.DEFAULT_PASSWORD_AGENT_CREATION);

        // Set password credential
        userRessource.get(userId).resetPassword(passwordCred);

        // Persist the user in our own DB along with the Keycloak user id
        String userObj = createUser(usermain, userId);
        return userObj;
    } catch (Exception e) {
        e.printStackTrace();
        return "Failure";
    }
}

Credentials class to store the Keycloak details:

Keycloak client details (from the client's installation JSON):

  • public static final String KEYCLOAK_SERVER_URL = "Keycloak server url";
  • public static final String KEYCLOAK_RELAM = "Realm name";
  • public static final String KEYCLOAK_CLIENT_ID = "Client name";
  • public static final String KEYCLOAK_SECREAT_KEY = "Secret key of new client";

Details of the first user in the new realm with the manage-users permission:

  • public static final String KEYCLOAK_USER_NAME = "First user emailId";
  • public static final String KEYCLOAK_USER_PASSWORD = "First user password";

Default password assigned to newly created users in Keycloak:

  • public static final String DEFAULT_PASSWORD_AGENT_CREATION = "Default password";
  • public static final String DB_SCHEMA_NAME = null;

Conclusion:

Using Keycloak makes the application more secure and simplifies managing user login data.

Text to Speech using Amazon Polly with React JS & Python

By | AI, AWS, Blogs, Chatbot | One Comment

Written by Ishita Saha, Software Engineer, Powerupcloud Technologies

In this blog, we will discuss how we can integrate AWS Polly using Python & React JS to a chatbot application.

Use Case

We are developing a Chatbot Framework where we use AWS Polly for an exquisite & lively voice experience for our users.

Problem Statement

We are trying to showcase how we can integrate AWS Polly voice services with our existing chatbot application built on React JS & Python.

What is AWS Polly ?

Amazon Polly is a service that turns text into lifelike speech. Amazon Polly enables existing applications to speak as a first-class feature and creates the opportunity for entirely new categories of speech-enabled products, from mobile apps and cars to devices and appliances. Amazon Polly includes dozens of lifelike voices and support for multiple languages, so you can select the ideal voice and distribute your speech-enabled applications in many geographies. Amazon Polly is easy to use – you just send the text you want converted into speech to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to your application so you can play it directly or store it in a standard audio file format, such as MP3.

AWS Polly is easy to use. We only need an AWS subscription. We can test Polly directly from the AWS Console.

Go to :

https://console.aws.amazon.com/polly/home/SynthesizeSpeech

There is an option to select Voice from Different Languages & Regions.

Why Amazon Polly?

You can use Amazon Polly to power your application with high-quality spoken output. This cost-effective service has very low response times, and is available for virtually any use case, with no restrictions on storing and reusing generated speech.

Implementation

The user provides input to the chatbot. This input goes to our React JS frontend, which interacts internally with a Python application in the backend. The Python application is responsible for interacting with AWS Polly and sending the response back to the React app, which plays the streamed audio output as MP3.

React JS

In this implementation, we are using the Audio() constructor.

The Audio() constructor creates and returns a new HTMLAudioElement which can be either attached to a document for the user to interact with and/or listen to, or can be used offscreen to manage and play audio.

Syntax :

audio = new Audio(url);

Methods :

play – Make the media object play or resume after pausing.
pause – Pause the media object.
load – Reload the media object.
canPlayType – Determine if a media type can be played.
 
Here, we are using only play() and pause() methods in our implementation.

Step 1: Initialize the audio, language, and voice variables in the component state.

this.state = {
  audio: "",
  languageName: "",
  voiceName: ""
};

Step 2 : Clean the input text by replacing slashes with spaces and removing line breaks.

response = response.replace(/\//g, " ");
response = response.replace(/(\r\n|\n|\r)/gm, "");

Step 3 : If an existing reply from the bot is already playing, stop it first.

if (this.state.audio) {
  this.state.audio.pause();
}

Step 4 :

This method interacts with our Python application. It creates a new Audio() object whose URL points to the Python backend, passing the following parameters dynamically to the handleSpeaker() method:

  • languageName
  • voiceName
  • inputText

handleSpeaker = inputText => {
  this.setState({
    audio: ""
  });
  this.setState({
    audio: new Audio(
      POLLY_API +
        "/textToSpeech?LanguageCode=" +
        this.state.languageName +
        "&VoiceId=" +
        this.state.voiceName +
        "&OutputFormat=mp3" +
        "&Text=" +
        inputText
    )
  });
};

Step 5 : On getting the response from our POLLY_API Python App, we will need to play the mp3 file.

this.state.audio.play();

Python

The Python application communicates with AWS Polly using AWS Python SDK – boto3.

Step 1: Configure the AWS credentials (access key, secret key & region) used to access AWS Polly.

import boto3

def connectToPolly(aws_access_key, aws_secret_key, region):
    # Build a boto3 session with explicit credentials and return a Polly client
    polly_client = boto3.Session(
        aws_access_key_id=aws_access_key,
        aws_secret_access_key=aws_secret_key,
        region_name=region).client('polly')

    return polly_client

Here, we are creating a polly client to access AWS Polly Services.

Step 2: We are using synthesize_speech() to get an audio stream file.

Request Syntax :

response = client.synthesize_speech(
    Engine='standard'|'neural',
    LanguageCode='arb'|'cmn-CN'|'cy-GB'|'da-DK'|'de-DE'|'en-AU'|'en-GB'|'en-GB-WLS'|'en-IN'|'en-US'|'es-ES'|'es-MX'|'es-US'|'fr-CA'|'fr-FR'|'is-IS'|'it-IT'|'ja-JP'|'hi-IN'|'ko-KR'|'nb-NO'|'nl-NL'|'pl-PL'|'pt-BR'|'pt-PT'|'ro-RO'|'ru-RU'|'sv-SE'|'tr-TR',
    OutputFormat='json'|'mp3'|'ogg_vorbis'|'pcm',
    TextType='ssml'|'text',
    VoiceId='Aditi'|'Amy'|'Astrid'|'Bianca'|'Brian'|'Camila'|'Carla'|'Carmen'|'Celine'|'Chantal'|'Conchita'|'Cristiano'|'Dora'|'Emma'|'Enrique'|'Ewa'|'Filiz'|'Geraint'|'Giorgio'|'Gwyneth'|'Hans'|'Ines'|'Ivy'|'Jacek'|'Jan'|'Joanna'|'Joey'|'Justin'|'Karl'|'Kendra'|'Kimberly'|'Lea'|'Liv'|'Lotte'|'Lucia'|'Lupe'|'Mads'|'Maja'|'Marlene'|'Mathieu'|'Matthew'|'Maxim'|'Mia'|'Miguel'|'Mizuki'|'Naja'|'Nicole'|'Penelope'|'Raveena'|'Ricardo'|'Ruben'|'Russell'|'Salli'|'Seoyeon'|'Takumi'|'Tatyana'|'Vicki'|'Vitoria'|'Zeina'|'Zhiyu'
)

Response Syntax :

{
    'AudioStream': StreamingBody(),
    'ContentType': 'string',
    'RequestCharacters': 123
}
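Before wiring Polly into Flask, the call can be verified on its own. Below is a minimal sketch, assuming the connectToPolly() helper above and placeholder credentials, voice, and output path; it synthesizes one sentence and writes the returned AudioStream to an MP3 file.

# Minimal verification sketch: synthesize one sentence and save it locally.
# Credentials, voice and output path below are placeholder/illustrative values.
def save_speech_to_file(text, output_path="sample_speech.mp3"):
    polly_client = connectToPolly("xxxxxx", "xxxxxx", "us-east-1")
    response = polly_client.synthesize_speech(
        Text="<speak>" + text + "</speak>",
        TextType="ssml",
        LanguageCode="en-US",
        VoiceId="Joanna",
        OutputFormat="mp3")
    # AudioStream is a StreamingBody; read it fully and write the bytes to disk
    with open(output_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return output_path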

We now define the textToSpeech Flask API, which accepts the parameters sent by React and calls AWS Polly internally. The response is sent back to React as an MP3 stream, and the React application then plays the audio for the user.

from flask import Flask, request, send_file

import credentials

app = Flask(__name__)

# Map of supported output formats to MIME types
AUDIO_FORMATS = {'mp3': 'audio/mpeg'}


@app.route('/textToSpeech', methods=['GET'])
def textToSpeech():
    languageCode = request.args.get('LanguageCode')
    voiceId = request.args.get('VoiceId')
    outputFormat = request.args.get('OutputFormat')
    text = request.args.get('Text')
    # aws_access_key, aws_secret_key and region come from the application's configuration
    polly_client = credentials.connectToPolly(aws_access_key, aws_secret_key, region)
    response = polly_client.synthesize_speech(Text="<speak>" + text + "</speak>",
        LanguageCode=languageCode,
        VoiceId=voiceId,
        OutputFormat=outputFormat,
        TextType='ssml')
    return send_file(response.get("AudioStream"), mimetype=AUDIO_FORMATS['mp3'])
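For a quick end-to-end check without the React front end, the Flask route can also be called directly. The sketch below uses the requests library with a placeholder host and illustrative query values, and writes the streamed MP3 to disk.

import requests

# Hypothetical local URL for the Flask app; adjust host/port to your setup
POLLY_API = "http://localhost:5000"

params = {
    "LanguageCode": "en-US",
    "VoiceId": "Joanna",
    "OutputFormat": "mp3",
    "Text": "Hello from Amazon Polly"
}

# Call the textToSpeech route and save the streamed MP3 response
resp = requests.get(POLLY_API + "/textToSpeech", params=params, stream=True)
resp.raise_for_status()
with open("bot_reply.mp3", "wb") as f:
    for chunk in resp.iter_content(chunk_size=8192):
        f.write(chunk)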

Conclusion

This blog showcases the simple implementation of React JS integration with Python to utilize AWS Polly services. This can be used as a reference for such use cases with chatbots.

AWS WorkSpaces Implementation

By | AWS, Blogs, Powerlearnings | One Comment

Written by Arun Kumar, Associate Cloud Architect at Powerupcloud Technologies

Introduction

Amazon WorkSpaces is a managed, secure Desktop-as-a-Service (DaaS) offering from the AWS cloud. WorkSpaces eliminates the need to provision hardware and software configurations, making it easy for IT admins to provision managed desktops on the cloud. End users can access their virtual desktops from many devices and browsers, including Windows, Linux, iPad, and Android. Managing corporate applications for end users becomes easier using WAM (WorkSpaces Application Manager) or by integrating with existing solutions like SCCM, WSUS, and more.

To manage end users and provide them access to WorkSpaces, the following solutions can be leveraged on AWS.

  • Extending the existing on-premises Active Directory by using AD Connector in AWS.
  • Create & configure an AWS managed Simple AD or Microsoft Active Directory, based on the size of the organization.

WorkSpaces architecture with simple AD approach

In this architecture, WorkSpaces are deployed for both Windows and Linux virtual desktops. Both are associated with the VPC and with the directory service (Simple AD), which stores and manages information about users and WorkSpaces.

The above architecture describes the flow of end users accessing Amazon WorkSpaces using Simple AD, which authenticates users. Users access their WorkSpaces by using a client application from a supported device or web browser, and they log in by using their directory credentials. The login information is sent to an authentication gateway, which forwards the traffic to the directory for the WorkSpace. Once the user is authenticated, the streaming traffic is processed through the streaming gateway, which works over the PCoIP protocol to provide end users the complete desktop experience.

Prerequisites

To use WorkSpaces, the following requirements need to be met.

  • A directory service to authenticate users and provide access to their WorkSpace.
  • A WorkSpaces client application suited to the user's device, along with an Internet connection.

For this demo we have created Simple AD; this can be created from the WorkSpaces console.

Directory

  • Create the Simple AD.
  • Choose the directory size based on your organization's size.
  • Enter the fully qualified domain name and the administrator password; make a note of the admin password for future reference.
  • A minimum of two subnets is needed for the AWS Directory Service, which requires a Multi-AZ deployment.
  • The directory is now created.

WorkSpace

Now let’s create the WorkSpace for employees.

  • Select the directory in which the WorkSpaces should be created for user access.
  • Select the appropriate subnets that we created in the previous section to provision the WorkSpaces in a Multi-AZ deployment.
  • Ensure that self-service permissions are set to "No"; otherwise users will be able to change their WorkSpace configuration on the fly without the WorkSpaces admin's knowledge.
  • Enable Amazon WorkDocs based on the users' requirements.
  • Select the user from the directory list, or create a new user on the fly.
  • Select the bundle of compute, operating system, and storage for each of your users.

You can select the running mode of the WorkSpaces based on your company's needs. This directly impacts the monthly bill: "AlwaysOn" mode has fixed pricing, whereas "AutoStop" mode is an on-demand pricing model. Ensure the right running mode is selected during WorkSpace creation based on the business requirements of the user.

  • Review and launch your WorkSpace.
  • Your WorkSpace is now being provisioned. Once it is available and ready to use, you will receive an email from Amazon with the WorkSpaces login details.
  • Follow the URL in the email to create a password for your user to access the WorkSpace.
  • Download the client based on your device, or use the web login.
  • Install the WorkSpaces client on your local machine.
  • Open the WorkSpaces client and enter the registration code you received in the email.
  • Log in with your username and password.
  • You are now taken to your virtual desktop.
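The console steps above can also be scripted. Below is a minimal boto3 sketch; the directory, user, and bundle identifiers are placeholders that you would replace with your own values.

import boto3

workspaces_client = boto3.client("workspaces")

# Provision a single WorkSpace for one directory user.
# DirectoryId, UserName and BundleId are placeholders; real bundle IDs can be
# looked up with describe_workspace_bundles().
response = workspaces_client.create_workspaces(
    Workspaces=[
        {
            "DirectoryId": "d-xxxxxxxxxx",
            "UserName": "demo.user",
            "BundleId": "wsb-xxxxxxxxx",
            "WorkspaceProperties": {
                "RunningMode": "AUTO_STOP",
                "RunningModeAutoStopTimeoutInMinutes": 60
            },
            "Tags": [{"Key": "Team", "Value": "Demo"}]
        }
    ]
)

# FailedRequests lists WorkSpaces that could not be created, with error codes
print(response["PendingRequests"], response["FailedRequests"])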

Security and compliance of WorkSpaces

  • Encryption in transit by default.
  • KMS can be used to encrypt data at rest.
  • IP-based access restrictions.
  • Multi-factor authentication (RADIUS).
  • PCI DSS Level 1 compliant.
  • HIPAA-eligible with a business associate agreement.
  • Certifications: ISO 9001 and ISO 27001.

Cost

  • No upfront payment.
  • On-demand pricing (AutoStop mode) – when the user is not using the virtual desktop, the WorkSpace is automatically stopped based on the AutoStop hours selected for the user.
  • Fixed pricing (AlwaysOn mode) – the WorkSpace cost is calculated on a fixed monthly basis based on the selected bundle. The running mode can also be switched later from the API, as shown below.
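As a rough illustration of switching modes after creation, the WorkSpaces API can be used as in the sketch below; the WorkSpace ID is a placeholder.

import boto3

workspaces_client = boto3.client("workspaces")

# Switch an existing WorkSpace (placeholder ID) to the on-demand AutoStop mode,
# stopping it automatically after 60 minutes of inactivity
workspaces_client.modify_workspace_properties(
    WorkspaceId="ws-xxxxxxxxx",
    WorkspaceProperties={
        "RunningMode": "AUTO_STOP",
        "RunningModeAutoStopTimeoutInMinutes": 60
    }
)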

Licensing

  • Built-in licensing, which allows us to select the right Windows bundle as per business needs.
  • WorkSpaces additionally supports the BYOL (bring your own license) model for Windows 10.

Monitoring

  • CloudTrail can log and monitor the API calls.
  • CloudWatch monitoring shows the number of users connected to WorkSpaces, session latency, and more (see the sketch below for a related programmatic check).
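CloudWatch covers the metrics themselves; as a related quick check, the connection state of WorkSpaces can also be pulled from the WorkSpaces API. A small sketch, assuming a placeholder WorkSpace ID:

import boto3

workspaces_client = boto3.client("workspaces")

# Check whether users are currently connected to the given WorkSpaces
status = workspaces_client.describe_workspaces_connection_status(
    WorkspaceIds=["ws-xxxxxxxxx"]
)

for ws in status["WorkspacesConnectionStatus"]:
    print(ws["WorkspaceId"],
          ws.get("ConnectionState"),
          ws.get("LastKnownUserConnectionTimestamp"))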

Additional features

  • API support (SDK, AWS CLI).
  • WorkSpaces Application Manager (WAM).
  • Custom images.
  • Audio input.
  • Pre-built applications in AWS Marketplace that can be added to our WorkSpaces.
  • User control at the directory level.
  • Integration with WorkDocs.

Conclusion

By adopting AWS WorkSpaces, end users can securely access the business applications and documents they currently use on their organization's devices or existing VDI solutions, and experience seamless desktop performance on the cloud. Enabling encryption options and restricting client devices keeps access secure and helps prevent data breaches.

Other benefits include reducing the overhead of maintaining existing hardware and purchasing new hardware. Monitoring and managing end-user WorkSpaces also becomes an easy task by integrating with AWS native services.

Copying objects using AWS Lambda based on S3 events – Part 2 – date partition

By | AWS, Blogs, Cloud | No Comments

Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Introduction

If you are here from the first part of this series on S3 events with AWS Lambda, this post handles some more complex S3 object keys.

If you are new here, you may want to visit the first part, which covers the basics and the steps for creating your Lambda function and configuring S3 event triggers.

You can find link to part 1 here :

Use Case

This is a similar use case: copying new files to a different location (bucket/path) while preserving the hierarchy, but this time we also partition the files according to their file names and store them in a date-partitioned structure.

Problem Statement

Our Tech Lead suggested a change in the application logic, so now the same application is writing files to the S3 bucket in a different fashion. The activity file for Ravi Bharti is written to source-bucket-006/RaviRanjanKumarBharti/20200406-1436246999.parquet.

Haha! Say our Manager wants to check activity files of Ravi Bharti date-wise, hour-wise, minute-wise, and.. no not seconds, we can skip that!

So we need to store them in our destination bucket as:

  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/20200406-1436246999.parquet — Date wise
  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/20200406-1436246999.parquet — Hour wise
  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/36/20200406-1436246999.parquet — Hour/Min wise

Tree:

source-bucket-006
| - AjayMuralidhar
| - GopinathP
| - IshitaSaha
| - RachanaSharma
| - RaviRanjanKumarBharti
		| - 20200406-1436246999.parquet
| - Sagar Gupta
| - SiddhantPathak

Solution

Our problem is not that complex; a quick play with split & join of strings should solve it. You can choose any programming language for this, but we are continuing with Python & the AWS Python SDK, boto3.

Python Script

Everything remains the same; we just need to change our script as per our sub-requirements. We will make use of the event dictionary to get the file name & path of the uploaded object.

source_bucket_name = event['Records'][0]['s3']['bucket']['name']

file_key_name = event['Records'][0]['s3']['object']['key']

  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/20200406-1436246999.parquet

Format: source_file_path/YYYY-MM-DD/file.parquet

For simplicity, assume:

file_key_name = "RaviRanjanKumarBharti/20200406-1436246999.parquet"

Splitting file_key_name with '/' to extract the employee (folder name) & file name:

file_root_dir_struct = file_key_name.split('/')[0]

file_path_struct = file_key_name.split('/')[1]

Splitting the file name with '-' to extract the date part:

date_file_path_struct = file_path_struct.split('-')[0]

Since we know the date string always has the same format (YYYYMMDD), we can concatenate slices by position:

# YYYY       -   MM           -   DD
# string[:4] -   string[4:6]  -   string[6:8]

date_partition_path_struct = date_file_path_struct[:4] + "-" + date_file_path_struct[4:6] + "-" + date_file_path_struct[6:8]

Since Python is all about one-liners, we can also solve this using a list comprehension:

n_split = [4, 2, 2]

date_partition_path_struct = "-".join([date_file_path_struct[sum(n_split[:i]):sum(n_split[:i+1])] for i in range(len(n_split))])

We get date_partition_path_struct as '2020-04-06'

  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/20200406-1436246999.parquet

time_file_path_struct = file_key_name.split('/')[1]

We further split this to separate the file extension, reusing the same variable for simplicity.

time_file_path_struct = file_key_name.split('/')[1].split('-')[1].split('.')[0]


This gives us time_file_path_struct  as '1436246999'


hour_time_file_path_struct = time_file_path_struct[:2]

  • destination-test-bucket-006/RaviRanjanKumarBharti/2020-04-06/14/36/20200406-1436246999.parquet

Similarly for minute

min_time_file_path_struct = time_file_path_struct[2:4]

Complete Code

import json
import boto3

# boto3 S3 initialization
s3_client = boto3.client("s3")


def lambda_handler(event, context):
  destination_bucket_name = 'destination-test-bucket-006'

  source_bucket_name = event['Records'][0]['s3']['bucket']['name']

  file_key_name = event['Records'][0]['s3']['object']['key']

  # Split file_key_name with '/' to extract Employee & filename
  file_root_dir_struct = file_key_name.split('/')[0]

  file_path_struct = file_key_name.split('/')[1]

  # Split filename with '-' to extract date & time
  date_file_path_struct = file_path_struct.split('-')[0]

  # Date Partition Lazy Solution

  # date_partition_path_struct = date_file_path_struct[:4] + "-" + date_file_path_struct[4:6] + "-" + date_file_path_struct[6:8]

  # Date Partition using List Comprehension

  n_split = [4, 2, 2]

  date_partition_path_struct = "-".join([date_file_path_struct[sum(n_split[:i]):sum(n_split[:i+1])] for i in range(len(n_split))])

  # Split to get time part
  time_file_path_split = file_key_name.split('/')[1]

  # Time Partition
  time_file_path_struct = time_file_path_split.split('-')[1].split('.')[0]

  # Hour Partition
  hour_time_file_path_struct = time_file_path_struct[:2]

  # Minute Partition
  min_time_file_path_struct = time_file_path_struct[2:4]

  # Concat all required strings to form destination path || date partition
  # destination_file_path = file_root_dir_struct + "/" \
  #  + date_partition_path_struct + "/" + file_path_struct

  # Concat all required strings to form destination path || hour partition
  # destination_file_path = file_root_dir_struct + "/" + date_partition_path_struct + "/" + \
  #                         hour_time_file_path_struct + "/" + file_path_struct

  # Concat all required strings to form destination path || minute partition
  # Only one of the three variants should be active; the minute-level one is used here
  destination_file_path = file_root_dir_struct + "/" + date_partition_path_struct + "/" + \
                          hour_time_file_path_struct + "/" + min_time_file_path_struct + "/" + file_path_struct

  # Copy Source Object
  copy_source_object = {'Bucket': source_bucket_name, 'Key': file_key_name}

  # S3 copy object operation
  s3_client.copy_object(CopySource=copy_source_object, Bucket=destination_bucket_name, Key=destination_file_path)

  return {
      'statusCode': 200,
      'body': json.dumps('Hello from S3 events Lambda!')
  }

You can test your implementation by uploading a file to any folder of your source bucket, and then checking the respective employee's folder in your destination bucket.
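If you prefer triggering the test from code instead of the console, a sketch along these lines should work; the bucket, folder, and file name below are placeholders that follow the naming pattern used in this post.

import boto3

s3_client = boto3.client("s3")

# Upload a test activity file; the PUT event should invoke the Lambda,
# which copies the object into the date/hour/minute partitioned path
s3_client.upload_file(
    Filename="20200406-1436246999.parquet",            # local test file
    Bucket="source-bucket-006",                        # source bucket with the event trigger
    Key="RaviRanjanKumarBharti/20200406-1436246999.parquet"
)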

source-test-bucket-006

destination-test-bucket-006

Conclusion

This helped us solve a very common data-migration use case: storing files in a partitioned structure for better readability.

Hope this two-part blog series was useful for understanding how to use AWS Lambda to process your S3 objects based on event triggers.

Do leave your comments. Happy reading.

References

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

https://stackoverflow.com/questions/44648145/split-the-string-into-different-lengths-chunks

Tags: Amazon S3, AWS Lambda, S3 events, Python, Boto3, S3 Triggers, Lambda Trigger, S3 copy objects, date-partitioned, time-partitioned

Copying objects using AWS Lambda based on S3 events – Part 1

By | AWS, Blogs, Cloud | No Comments

Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Introduction

In this era of cloud, data is always on the move, so anyone dealing with moving data is bound to come across Amazon's Simple Storage Service, popularly known as S3. As the name suggests, it is a simple file storage service where we can upload or remove files, better referred to as objects. It is very flexible storage that takes care of scalability, security, performance, and availability. So this is something that comes in very handy for a lot of applications & use cases.

The next thing we use here is AWS Lambda, the new world of serverless computing. You can run your workloads easily using Lambda without having to provision any resources; Lambda takes care of it all.

Advantages

S3, as we already know, is object-based storage, highly scalable & efficient. We can use it as a data source or even as a destination for various applications. AWS Lambda, being serverless, allows us to run anything without thinking about any underlying infrastructure. So you can use Lambda for a lot of your processing jobs, or even simply for communicating with any of your AWS resources.

Use Case

Copying new files to a different location (bucket/path) while preserving the hierarchy. We will use the AWS Python SDK (boto3) to solve this.

Problem Statement

Say we have an application writing files to an S3 bucket path every time an employee updates his/her tasks at any time of the day during working hours.

For example, the work activity of Ajay Muralidhar for 6th April 2020 at 12:00 PM will be stored in source-bucket-006/AjayMuralidhar/2020-04-06/12/my-task.txt. Refer to the tree for more clarity. We need to move these task files to a new bucket while preserving the file hierarchy.

Solution

For solving this problem, we will use Amazon S3 events. Every file pushed to the source bucket generates an event; this triggers a Lambda function, which can then process the file and move it to the destination bucket.

1. Creating a Lambda Function

1.1 Go to the AWS Lambda Console and click on Create Function

1.2 Select an Execution Role for your Function

This is important because this ensures that your Lambda has access to your source & destination buckets. Either you can use an existing role that already has access to the S3 buckets, or you can choose to create an execution role. If you choose the latter, you will need to attach S3 permissions to your role.

1.2.1 Optional – S3 Permission for new execution role

Go to Basic settings in your Lambda function (you will find this when you scroll down the function page) and click Edit. You can edit your Lambda runtime settings here, like Timeout, with a maximum of 15 minutes. This is the time for which your Lambda can run; it is advisable to set this as per your job requirement. Any time you get a Lambda timed out error, you can increase this value.

Or you can also check the Permissions section for the role.

Click on View the <your-function-name>-role-<xyzabcd> role on the IAM console; this takes you to the IAM console. Click on Attach policies. You can also create an inline policy if you need more control over the access you are providing, for example restricting it to particular buckets. For ease of demonstration, we are using AmazonS3FullAccess here.

Select AmazonS3FullAccess, click on Attach policy

Once the policy is successfully attached to your role, you can go back to your Lambda Function.
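If you prefer doing this step from code rather than the console, a small boto3 sketch along these lines should work; the role name below is a placeholder for the auto-generated execution role.

import boto3

iam_client = boto3.client("iam")

# Attach the AWS managed S3 full-access policy to the Lambda execution role.
# Replace the placeholder role name with your function's actual role name.
iam_client.attach_role_policy(
    RoleName="your-function-name-role-xyzabcd",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess"
)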

2. Setting S3 Event Trigger

2.1 Under Designer tab, Click on Add trigger

2.2 From the Trigger List dropdown, select S3 events

Select your source bucket. There are various event types you can choose from.

Find out more about S3 events here, https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#notification-how-to-event-types-and-destinations

We are using PUT since we want this event to trigger our Lambda when any new files are uploaded to our source bucket. You can add a Prefix & Suffix if you only need particular types of files. Check Enable trigger.
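The same trigger can also be configured programmatically. Below is a rough boto3 sketch, assuming placeholder bucket and function ARN values, and assuming S3 has already been granted permission to invoke the function.

import boto3

s3_client = boto3.client("s3")

# Configure the source bucket to send ObjectCreated:Put events to the Lambda.
# The bucket name and function ARN are placeholders; S3 must also be granted
# lambda:InvokeFunction permission on the function (via add_permission) beforehand.
s3_client.put_bucket_notification_configuration(
    Bucket="source-test-bucket-006",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:xx-xxxx-x:123456789012:function:s3-copy-function",
                "Events": ["s3:ObjectCreated:Put"]
            }
        ]
    }
)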

Python Script

We now write a simple Python script which will pick up the incoming file from our source bucket and copy it to another location. The best thing about setting the Lambda S3 trigger is that, whenever a new file is uploaded, it will trigger our Lambda. We make use of the event object here to gather all the required information.

This is what a sample event object looks like. It is passed to your Lambda function.

{
   "Records":[
      {
         "eventVersion":"2.1",
         "eventSource":"aws:s3",
         "awsRegion":"xx-xxxx-x",
         "eventTime":"2020-04-08T19:36:34.075Z",
         "eventName":"ObjectCreated:Put",
         "userIdentity":{
            "principalId":"AWS:POWERUPCLOUD:powerup@powerupcloud.com"
         },
         "requestParameters":{
            "sourceIPAddress":"XXX.XX.XXX.XX"
         },
         "responseElements":{
            "x-amz-request-id":"POWERUPCLOUD",
            "x-amz-id-2":"POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD/POWERUPCLOUD"
         },
         "s3":{
            "s3SchemaVersion":"1.0",
            "configurationId":"powerup24-powerup-powerup-powerup",
            "bucket":{
               "name":"source-test-bucket-006",
               "ownerIdentity":{
                  "principalId":"POWERUPCLOUD"
               },
               "arn":"arn:aws:s3:::source-test-bucket-006"
            },
            "object":{
               "key":"AjayMuralidhar/2020-04-06/12/my-tasks.txt",
               "size":20,
               "eTag":"1853ea0cebd1e10d791c9b2fcb8cc334",
               "sequencer":"005E8E27C31AEBFA2A"
            }
         }
      }
   ]
}

Your Lambda function makes use of this event dictionary to identify the location where the file is uploaded.

import json
import boto3

# boto3 S3 initialization
s3_client = boto3.client("s3")


def lambda_handler(event, context):
   destination_bucket_name = 'destination-test-bucket-006'

   # event contains all information about uploaded object
   print("Event :", event)

   # Bucket Name where file was uploaded
   source_bucket_name = event['Records'][0]['s3']['bucket']['name']

   # Filename of object (with path)
   file_key_name = event['Records'][0]['s3']['object']['key']

   # Copy Source Object
   copy_source_object = {'Bucket': source_bucket_name, 'Key': file_key_name}

   # S3 copy object operation
   s3_client.copy_object(CopySource=copy_source_object, Bucket=destination_bucket_name, Key=file_key_name)

   return {
       'statusCode': 200,
       'body': json.dumps('Hello from S3 events Lambda!')
   }

You can test your implementation by uploading a file in any folders of your source bucket, and then check your destination bucket for the same file.
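You can also exercise the function directly with a hand-crafted event instead of uploading through the console. Below is a minimal sketch, assuming a placeholder function name and reusing only the fields our handler reads from the sample event shown earlier.

import json
import boto3

lambda_client = boto3.client("lambda")

# A trimmed-down synthetic S3 event carrying only the fields the handler reads
test_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "source-test-bucket-006"},
                "object": {"key": "AjayMuralidhar/2020-04-06/12/my-tasks.txt"}
            }
        }
    ]
}

# Invoke the copy function synchronously; the function name is a placeholder
response = lambda_client.invoke(
    FunctionName="s3-copy-function",
    InvocationType="RequestResponse",
    Payload=json.dumps(test_event)
)
print(response["Payload"].read().decode())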

source-test-bucket-006

destination-test-bucket-006

You can check your Lambda execution logs in CloudWatch. Go to Monitoring and click View Logs in CloudWatch

Congrats! We have solved our problem. Just before we conclude this blog, we would like to discuss an important feature of Lambda that will help you scale your jobs. What if your application writes a huge number of files at the same time? Don't worry, Lambda handles this too. By default, an AWS account has a Lambda concurrency limit of 1000 concurrent executions. If you need to scale up, you can request an increase as per your business requirements, and you can also reserve concurrency for individual functions (see the sketch below).
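For the per-function case, here is a small sketch of reserving concurrency through the API, assuming a placeholder function name and an illustrative limit; raising the account-wide limit itself goes through a service quota increase.

import boto3

lambda_client = boto3.client("lambda")

# Reserve up to 100 concurrent executions for this function so a burst of
# S3 events cannot exhaust the account-level concurrency pool
lambda_client.put_function_concurrency(
    FunctionName="s3-copy-function",
    ReservedConcurrentExecutions=100
)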

Conclusion

This is how easy it was to use S3 with Lambda to move files between buckets.

In Part 2 of this series, we handle a slightly more complex problem, where we move files into date-partitioned structures at the destination.

You can find link to part 2 here :

Hope this was helpful as an overview of the basics of using S3 event triggers with AWS Lambda. Do leave your comments. Happy reading.

References

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

Tags: Amazon S3, AWS Lambda, S3 events, Python, Boto3, S3 Triggers, Lambda Trigger, S3 copy objects