Azure Data Factory – Self Hosted Integration Runtime Sharing


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Contributors: Sagar Gupta, Senior Software Engineer | Amruta Despande, Azure Associate Architect | Suraj R, Azure Cloud Engineer


Continuing our discussion on Azure Data Factory (ADF) from our previous blogs – in the past we have discussed ADF and the configuration steps for a high-availability self-hosted integration runtime (IR). You can read more about that here: Azure Data Factory – Setting up Self-Hosted IR HA enabled

This is a short post on IR sharing across ADFs for better cost optimization and resource utilization. It also covers common shortcomings when creating ADFs using Terraform and/or the SDKs.

Use Case

This is again part of a major data migration assignment from AWS to Azure. We are extensively using ADF to set up ETL pipelines and migrate data effectively – both historical and incremental.

Problem Statement

Since the data migration activity involves different types of databases and complex data operations, we are using multiple ADFs to achieve this. Handling private production data required self-hosted IRs to be configured to connect to the production environment. The general best practice for self-hosted IRs is a high-availability architecture. An IR can have a maximum of 4 nodes, with a minimum of 2 nodes for high availability. So here arises the problem – for multiple ADFs, how many such self-hosted IRs would one use to power this?


This is where IR sharing comes into the picture. ADF has this brilliant feature of IR sharing wherein many ADFs can share the same IR. The advantage of this is price and resource reduction. Suppose you had to run 2 ADFs – one to perform various heavy migrations from AWS RDS MySQL to Azure, and the other one for AWS RDS PostgreSQL. Ordinarily we would have created 2 different IRs, one each able to connect to MySQL and PostgreSQL separately. For a production-level implementation, this would mean 2 x 4 = 8 nodes (Windows VMs). Using IR sharing, we can create one self-hosted IR with 4 nodes and share this IR with both ADFs, cutting the cost of 4 extra nodes. Please note – the IR node sizing depends on your workloads. That’s a separate calculation. This is only a high-level consideration.
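The arithmetic above can be sketched in a few lines of Python. The counts mirror the example (2 ADFs, 4 nodes per IR); this illustrates the sizing logic only, not real Azure pricing.

```python
# Node-count comparison: dedicated IRs per ADF vs. one shared IR.
# The counts mirror the example above (2 ADFs, 4 nodes per IR); this
# illustrates the sizing logic only, not real Azure pricing.

def ir_nodes(num_adfs, nodes_per_ir, shared):
    """Total Windows VM nodes needed to power the self-hosted IRs."""
    return nodes_per_ir if shared else num_adfs * nodes_per_ir

dedicated = ir_nodes(num_adfs=2, nodes_per_ir=4, shared=False)
shared_setup = ir_nodes(num_adfs=2, nodes_per_ir=4, shared=True)
print(f"Dedicated IRs: {dedicated} nodes, shared IR: {shared_setup} nodes")
print(f"Nodes saved by IR sharing: {dedicated - shared_setup}")
```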

Steps to enable IR sharing between ADFs

Step 1: Log in to the Azure Portal.

Step 2: Search for Data Factories in the main search bar.

Step 3: Select your Data Factory. Click on Author & Monitor.

Click on Pencil icon to edit.

Step 4: Click on Connections. Open Management Hub.

Step 5: Click on Integration runtimes to view all your IRs. Select your self-hosted IR for which you want to enable sharing.

Refer to the Azure documentation for detailed information on creating self-hosted IRs.

Step 6: This opens the Edit integration runtime tab on the right side. Go to Sharing and click on + Grant permission to another Data Factory.

Copy the Resource ID from this step. We will use it in Step 9.

This will list all ADFs with which you can share this IR.

Step 7: You can either search for your ADF or manually enter its service identity application ID. Click on Add.

Note: You may sometimes be unable to find the ADF in this dropdown list. Even though your ADF appears on the Data Factories page, it does not show up in this list, which can leave you puzzled. Not to worry – such a case can arise when you create ADFs programmatically through the Azure APIs or through Terraform. Don’t forget to add the optional identity parameter while creating; this assigns a system-generated identity to the resource.

Sample Terraform for ADF

provider "azurerm" {
  version = "~>2.0"
  features {}
}

resource "azurerm_data_factory" "adf-demo" {
  name                = "adf-terraform-demo"
  location            = "East US 2"
  resource_group_name = "DEMO-ADF-RG"
  identity {
    type = "SystemAssigned"
  }
}

To locate the service identity application ID of the ADF, go to the Data Factories page, select the ADF, and click on Properties.
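If you create the factory through the Azure REST API or an SDK instead of Terraform, the equivalent fix is to include an identity block in the request body. A minimal sketch of that payload, using the demo values from the Terraform sample; this only builds the JSON body and does not call the API.

```python
# Sketch of the ARM request body for creating a data factory with a
# system-assigned identity - the programmatic equivalent of the Terraform
# identity block. This only builds the payload; it does not call Azure.

def factory_body(location):
    return {
        "location": location,
        # Without this block the ADF gets no service identity and will
        # not appear in the IR sharing dropdown.
        "identity": {"type": "SystemAssigned"},
    }

body = factory_body("East US 2")
print(body["identity"]["type"])
```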

Step 8: Click on Apply for the changes to take effect.

In case you do not have the required permissions, you might get the following error:

Error occurred when grant permission to xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx. Error: {"error":{"code":"AuthorizationFailed","message":"The client '' with object id 'xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx/resourcegroups/DEMO-ADF-RG/providers/Microsoft.DataFactory/factories/adf-terraform-demo-Postgres-to-MySQL/integrationRuntimes/integrationRuntime3/providers/Microsoft.Authorization/roleAssignments/xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx' or the scope is invalid. If access was recently granted, please refresh your credentials."}}

Step 9:

Now go to the ADF with which the IR has to be shared (the one added in the sharing list – adf-terraform-demo). Go to Connections → Integration runtimes → + New → Azure, Self-Hosted.

Here you will find the Type as Self-Hosted (Linked). Enter the Resource ID from Step 6 and click Create.

After successful creation, you can find the new IR with sub-type Linked.
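Behind the scenes, the linked IR is just a SelfHosted runtime whose definition points at the shared IR’s resource ID. A hedged sketch of that JSON shape (the linkedInfo field names reflect my reading of the ADF resource model, so treat them as assumptions):

```python
# Illustrative shape of a linked self-hosted IR definition. The
# linkedInfo layout is my best understanding of the ADF resource model;
# treat the field names as assumptions. The resource ID placeholder
# stands in for the value copied in Step 6.

def linked_ir_definition(shared_ir_resource_id):
    return {
        "properties": {
            "type": "SelfHosted",
            "typeProperties": {
                "linkedInfo": {
                    "authorizationType": "Rbac",
                    "resourceId": shared_ir_resource_id,
                }
            },
        }
    }

defn = linked_ir_definition(
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.DataFactory/factories/<adf>/integrationruntimes/<ir>"
)
print(defn["properties"]["type"])
```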

The IR sharing setup is complete. Your ADF pipelines can now run seamlessly on the shared IR.


Sharing IRs between ADFs saves greatly on infrastructure costs. Sharing is simple and effective. We will come up with more ADF use cases and share our problem statements, approaches, and solutions.

Hope this was informative. Do leave your comments below for any questions.

Read the series here

Thundering Clouds – Technical overview of AWS vs Azure vs Google Cloud


Compiled by Kiran Kumar, Business Analyst at Powerupcloud Technologies.

The battle of the Big 3 Cloud Service Providers

The cloud ecosystem is in a constant state of evolution; with increasing maturity and adoption, the battle for the mind and wallet intensifies. With Amazon Web Services (AWS), Microsoft Azure, and Google Cloud (GCP) leading in IaaS maturity, the likes of Salesforce, SAP, and Oracle to Workday (which recently reached $1B in quarterly revenue) are both gaining ground and carving out niches in the ‘X’aaS space. The recent COVID crisis has accelerated both adoption and consideration as enterprises transform to cope, differentiate, and sustain an advantage over the competition.

In this article, I will stick to referencing AWS, Azure, and GCP and terming them the BIG 3. A disclaimer: Powerup is a top-tier partner with all three, and the comparisons are purely objective, based on currently publicly available information. It is very likely that by the time you read this article a lot will have already changed. Having said that, the future will belong to those who excel in providing managed solutions around artificial intelligence, analytics, IoT, and edge computing. So let’s dive right in:

Amazon Web Services – The oldest among the three and the most widely known, AWS showcases the biggest spread of availability zones and an extensive roster of services. It has capitalized on its maturity to activate a developer ecosystem globally, which has proven to be a critical enabler of its widespread use.

Microsoft Azure – Azure is the closest that one gets to AWS in terms of products and services. While AWS has fully leveraged its head start, Azure tapped into Microsoft’s huge enterprise customer base, letting those customers take advantage of their existing infrastructure by providing better value through Windows support and interoperability.

Google Cloud Platform – Google Cloud was announced in 2011; for being less than a decade old, it has created a significant footprint. It was initially intended to strengthen Google’s own products, but Google later came up with an enterprise offering. A lot is expected from its deep expertise in AI, ML, deep learning, and data analytics to give it a significant edge over the other providers.

AWS vs. Azure vs. Google Cloud: Overall Pros and Cons

In this analysis, I dive into the broad technical aspects of these 3 cloud providers based on the common parameters listed below.

  • Compute
  • Storage
  • Exclusives  


AWS Compute:

Amazon EC2: EC2, or Elastic Compute Cloud, is Amazon’s compute offering. EC2 supports multiple instance types (bare metal, GPU, Windows, Linux, and more) and can be launched with different security and networking options; you can choose from a wide range of templates based on your use case. EC2 can both resize and autoscale to handle changes in requirements, which eliminates the need for complex governance.

Amazon Elastic Container Service is a highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications, manage and scale a cluster of VMs, or schedule containers on those VMs.

Amazon EKS makes it easy to deploy, manage, and scale containerized applications using Kubernetes on AWS.

It also offers Fargate, a service that automates server and cluster management for containers; Lightsail, a virtual private server option; Batch for batch computing jobs; Elastic Beanstalk for running and scaling web applications; and Lambda for launching serverless applications.

Container services include Amazon Elastic Container Registry, a fully managed Docker container registry that allows you to store, manage, and deploy Docker container images.

Azure Compute:

Azure VM: Azure VMs are a secure and highly scalable compute solution, with various instance types optimized for high-performance computing, AI- and ML-based computing, and container instances. With Azure’s emphasis on hybrid computing, there is support for multiple OS types, Microsoft software, and services. Virtual Machine Scale Sets are used to auto-scale your instances.

Azure container services include Azure Kubernetes Service, a fully managed Kubernetes-based container solution.

Container Registry lets you store and manage container images across all types of Azure deployments.

Service Fabric is a unique fully managed service that lets you develop microservices and orchestrate containers on Windows or Linux.

Other services include Web App for Containers, which lets you run, scale, and deploy containerized web apps; Azure Functions for launching serverless applications; and Azure Red Hat OpenShift, with support for OpenShift.

Google Compute Engine:

Google Compute Engine (GCE) is Google’s compute service. Google is fairly new to cloud compared to the other two CSPs, and this is reflected in its catalog of services. GCE offers the standard array of features: Windows and Linux instances, RESTful APIs, load balancing, data storage and networking, CLI and GUI interfaces, and easy scaling. Backed by Google’s infrastructure, GCE can spin up instances faster than most of its competition in most cases. It runs on carbon-neutral infrastructure and offers the best value for your buck among the competition.

Google Kubernetes Engine (GKE) is based on Kubernetes, which was originally developed in-house; Google has the highest expertise when it comes to Kubernetes and has deeply integrated it into the Google Cloud Platform. The GKE service can be used to automate many of your deployment, maintenance, and management tasks, and it can also be used with hybrid clouds via the Anthos service.


AWS Storage:

Amazon S3 is an object storage service that offers scalability, data availability, security, and performance for most of your storage requirements. Amazon Elastic Block Store provides persistent block storage for your Amazon EC2 instances, and Elastic File System offers scalable file storage.

Other storage services include S3 Glacier, a secure, durable, and extremely low-cost storage service for data archiving and long-term backup; Storage Gateway for hybrid storage; and Snowball, a device used for offline small to medium scale data transfer.


Other database services include Amazon Aurora, a SQL-compatible relational database; RDS (Relational Database Service); DynamoDB, a NoSQL database; Amazon ElastiCache, an in-memory data store; Redshift, a data warehouse; and Amazon Neptune, a graph database.

Azure Storage:

Azure Blobs is a massively scalable object storage solution that includes support for big data analytics through Data Lake Storage Gen2; Azure Files is a managed file storage solution with support for on-prem; Azure Queues is a reliable messaging store; and Azure Tables is a NoSQL storage solution for structured data.

Azure Disks provides block-level storage volumes for Azure VMs, similar to Amazon EBS.


Database services include SQL-based databases like Azure SQL Database, Azure Database for MySQL, and Azure Database for PostgreSQL; for NoSQL and data warehouse workloads, Cosmos DB and Table Storage; SQL Server Stretch Database, a hybrid storage service designed specifically for organizations leveraging Microsoft SQL Server on-prem; and Redis Cache, an in-memory data storage service.

Google Cloud Storage:

GCP’s storage offering includes Google Cloud Storage, a unified, scalable, and highly durable object store; Filestore, network-attached storage (NAS) for Compute Engine and GKE instances; Persistent Disk, block storage for VM instances; and Transfer Appliance for large data transfers.


On the database side, GCP has 3 NoSQL databases: Cloud Bigtable for storing big data, Firestore, a document database for mobile and web application data, and Firebase Realtime Database, a cloud database for storing and syncing data in real time. It also offers BigQuery, an analytics data warehouse; Memorystore for in-memory storage; the SQL-based Cloud SQL; and a relational database called Cloud Spanner that is designed for mission-critical workloads.

Benchmarks Reports

An additional drill-down is to analyze performance figures for the three across network, storage, and CPU; here I quote research data from a study conducted by Cockroach Labs.


GCP has taken significant strides in network performance and latency compared to last year, even outperforming AWS and Azure in network performance.

  • Some of GCP’s best performing machines hover around 40-60 GB/sec
  • AWS machines stick to their claims and offer a consistent 20 to 25 GB/sec and
  • Azure’s machines offered significantly less at 8 GB/sec.  
  • When it comes to latency AWS outshines the competition by offering a consistent performance across all of its machines.
  • GCP does undercut AWS under some cases but still lacks the consistency of AWS.
  • Azure’s weak network performance is reflected in high latency, making it the least performant among the three.

NOTE: GCP attributes this increase in network performance to Skylake, used for the n1 family of machines.


AWS has superior performance in storage; neither GCP nor Azure even comes close to its read-write speeds and latency figures. This is largely due to storage-optimized instances like the i3 series. Azure and GCP do not have storage-optimized instances, and their performance is comparable to the non-storage-optimized instances from Amazon. While Azure offered slightly better read-write speeds of the two, GCP offered better latency.


Comparing CPU performance, Azure machines showcased slightly higher figures thanks to using conventional 16-core CPUs. Azure machines use 16 cores with a single thread per core, while the other clouds use hyperthreading to achieve 16 vCPUs by combining 8 cores with 2 threads each. After comparing each offering across the three platforms, here is the best each cloud platform has to offer.

  • AWS c5d.4xlarge 25000 – 50000 Bogo ops per sec
  • Azure Standard_DS14_v2  just over 75000 Bogo ops per sec
  • GCP c2-standard-16 25000 – 50000 Bogo ops per sec
  • While AWS and GCP figures look similar, AWS overall offers slightly better performance than GCP.
  • Avoiding hyperthreading has inflated Azure’s figures; while Azure might still be superior in performance, the numbers may not accurately represent the difference in performance it offers.

For detailed benchmarking reports, visit Cockroach Labs.

Key Exclusives

Going forward, technologies like Artificial Intelligence, Machine Learning, the Internet of Things (IoT), and serverless computing will play a huge role in shaping the technology industry. Most services and products will try to take advantage of these technologies to deliver solutions more efficiently and with precision. All of the “BIG 3” providers have begun experimenting with offerings in these areas. This can very well be the key differentiator between them.

AWS Key Tools:

Some of the latest additions to the AWS portfolio include AWS Graviton processors built using 64-bit Arm Neoverse cores. EC2-based M6g, C6g, and R6g instances are powered by these new-gen processors. Thanks to the power-efficient Arm architecture, they are said to provide 40% better price performance over comparable x86-based instances.

AWS Outposts: Outposts is Amazon’s emphasis on hybrid architecture; it is a fully managed ITaaS solution that brings AWS products and services anywhere by physically deploying them at your site. It is aimed at offering a consistent hybrid experience with the scalability and flexibility of AWS.

AWS has put a lot of time and effort into developing a relatively broad range of products and services in the AI and ML space. Some of the important ones include the AWS SageMaker service for training and deploying machine learning models, the Lex conversational interface and Polly text-to-speech service that power Alexa, the Greengrass IoT messaging service, and the Lambda serverless computing service.

There are also AI-powered services like DeepLens, which can be trained and used for OCR and image and character recognition, and Gluon, an open-source deep-learning library designed to build and quickly train neural networks without having to know AI programming.

Azure Key Tools:

When it comes to hybrid support, Azure offers a very strong proposition, with services like Azure Stack and Azure Arc minimizing your risks of going wrong. Knowing that a lot of enterprises already use Microsoft’s services, Azure tries to deepen this relationship by offering enhanced security and flexibility through its hybrid services. With Azure Arc, customers can manage resources deployed within and outside of Azure through the same control plane, enabling organizations to extend Azure services to their on-prem data centers.

Azure also offers a comprehensive family of AI services and cognitive APIs that help you build intelligent apps; services like the Bing Web Search API, Text Analytics API, Face API, Computer Vision API, and Custom Vision Service come under it. For IoT, it has several management and analytics services, and it also has a serverless computing service known as Functions.

Google Cloud Key Tools:

AI and machine learning are big areas of focus for GCP. Google is a leader in AI development, thanks to TensorFlow, an open-source software library for building machine learning applications. It is the single most popular library of its kind in the market, and AWS adding support for TensorFlow is an acknowledgment of this.

Google Cloud has strong offerings in APIs for natural language, speech, translation, and more. Additionally, it offers IoT and serverless services, but both are still in beta. Google has also been working extensively on Anthos; as quoted by Sundar Pichai, Anthos follows the “write once, run anywhere” approach by allowing organizations to run Kubernetes workloads on-premises, on AWS, or on Azure, though Azure support is still in beta testing.


Each of the three has its own set of features and comes with its own constraints and advantages. The selection of the appropriate cloud provider should therefore, as with most enterprise software, be based on your organizational goals over the long term.

However, we strongly believe that multi-cloud will be the way forward for organizations. For example, if an organization is an existing user of Microsoft’s services, it is natural for it to prefer Azure. Most small, web-based/digitally native companies looking to scale quickly by leveraging AI/ML and data services would want to take a good look at Google Cloud. And of course, AWS, with its absolute scale of products and services and its maturity, is very hard to ignore in any mix.

Hope this shed some light on the technical considerations. We will follow this up with some of the other key evaluation factors we think you should consider while selecting your cloud provider.

Serverless Data processing Pipeline with AWS Step Functions and Lambda


Written by Arun Kumar, Associate Cloud Architect at Powerupcloud Technologies

In the traditional ETL world, we generally use our own scripts, a paid tool, an open-source data processing tool, or an orchestrator to deploy our data pipelines. If the data processing pipeline is not complex, these server-based solutions can add unnecessary cost. In AWS we have multiple serverless options, such as Lambda and Glue. But Lambda has an execution time limit, and Glue runs an EMR cluster in the background, so it can ultimately get expensive. So we decided to explore AWS Step Functions with Lambda, which are serverless and at the same time act as an orchestration service that executes our process on an event basis and terminates resources after the process completes. Let’s see how we can build a data pipeline with this.

Architecture Description:

  1. The on-prem Teradata server sends the input.csv file to the S3 bucket (data_processing folder) on a schedule.
  2. CloudWatch Event Rule will trigger the step function in the case of PutObject in the specified S3 bucket and start processing the input file.
  3. The cleansing script is placed on ECS.
  4. The AWS Step Function will call a Lambda function, which will trigger ECS tasks (a bunch of Python and R scripts).
  5. Once the cleansing is done the output file will be uploaded to the target S3 bucket.
  6. AWS lambda function will be triggered to get the output file from the target bucket and send it to the respective team.
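The flow above can be expressed as a minimal Amazon States Language definition: a single Task state that invokes the Lambda which launches the ECS cleansing task. The Lambda ARN below is a placeholder.

```python
import json

# Minimal Amazon States Language definition for the pipeline above: one
# Task state invoking the Lambda that launches the ECS cleansing task.
# The Lambda ARN is a placeholder.

state_machine = {
    "Comment": "Run the ECS data-cleansing task via Lambda",
    "StartAt": "RunCleansing",
    "States": {
        "RunCleansing": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:trigger-ecs-task",
            "End": True,
        }
    },
}

print(json.dumps(state_machine, indent=2))
```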

Create a custom CloudWatch Event Rule for S3 put object operation

Choose Event Pattern -> Service Name -> S3 -> Event Type -> Object level operations -> choose put object -> give the bucket name.

  • In targets choose the step function to be triggered -> give the name of the state machine created.
  • Create a new role or use an existing role, as CloudWatch Events requires permission to send events to your step function.
  • Choose one more target to trigger the lambda function -> choose the function which we created before.
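The console clicks above boil down to an event pattern like the following. Object-level S3 operations reach CloudWatch Events through CloudTrail, hence the eventSource and eventName fields; the bucket name is a placeholder.

```python
# Event pattern equivalent to the console steps above. Object-level S3
# operations are delivered via CloudTrail, hence the eventSource and
# eventName fields. The bucket name is a placeholder.

event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject"],
        "requestParameters": {"bucketName": ["my-data-processing-bucket"]},
    },
}

print(event_pattern["detail"]["eventName"])
```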

Go to the AWS Management Console and search for Step Functions

  • Create a state machine
  • On the Define state machine page select Author with code snippets.
  • Give a name. Review the State machine definition and visual workflow.
  • Use the graph in the Visual Workflow pane to check that your Amazon States Language code describes your state machine correctly.
  • Create a new IAM role, or select an existing one if you have previously created a suitable IAM role.

Create an ECS with Fargate

ECS console -> choose create cluster -> choose cluster template -> Networking only -> Next -> Configure -> cluster -> give cluster name -> create

In the navigation pane, choose Task Definitions, Create new Task Definition.

On the Select compatibilities page, select the launch type that your task should use and choose Next step. Choose Fargate launch type.

For Task Definition Name, type a name for your task definition.

For Task Role, choose an IAM role that provides permissions for containers in your task to make calls to AWS API operations on your behalf.

To create an IAM role for your tasks

a.   Open the IAM console.

b.   In the navigation pane, choose Roles, Create New Role.

c.   In the Select Role Type section, for the Amazon Elastic Container Service Task Role service role, choose Select.

d.   In the Attach Policy section, select the policy to use for your tasks and then choose Next Step.

e.   For Role Name, enter a name for your role. Choose Create Role to finish.

For Task execution IAM role, either select your existing task execution role or choose Create new role so that the console can create one for you.

For Task size, choose a value for Task memory (GB) and Task CPU (vCPU).

For each container in your task definition, complete the following steps:

a.   Choose Add container.

b.   Fill out each required field and any optional fields to use in your container definitions.

c.   Choose Add to add your container to the task definition.
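The console walkthrough above maps to a single boto3 register_task_definition call. The family name, container image, and role ARN below are placeholders, and the API call itself is left commented out so this stays a dry sketch.

```python
# Fargate task definition parameters assembled from the console steps
# above. Family name, container image, and role ARN are placeholders.
# The register_task_definition call is commented out so this stays a
# dry sketch.

task_def_params = {
    "family": "data-cleansing-task",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",  # Fargate tasks must use awsvpc networking
    "cpu": "512",             # Task CPU, expressed in CPU units
    "memory": "1024",         # Task memory, expressed in MiB
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "containerDefinitions": [
        {
            "name": "cleansing-container",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/cleanse:latest",
            "essential": True,
        }
    ],
}

# import boto3
# ecs = boto3.client("ecs")
# ecs.register_task_definition(**task_def_params)
```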

Create a Lambda function

  • Create lambda function to call the ECS

In the Lambda console -> Create function -> choose Author from scratch -> give a function name -> for runtime choose Python 3.7 -> create a new role, or if you have an existing role, choose one with the required permissions [Amazon ECS Full Access, AWS Lambda Basic Execution Role]

import boto3

client = boto3.client('ecs')

def lambda_handler(event, context):
    # Run the ECS Fargate task that performs the data cleansing.
    # Cluster and task definition names are placeholders - use your own.
    response = client.run_task(
        cluster='data-processing-cluster',
        launchType='FARGATE',
        taskDefinition='data-cleansing-task',
        count=1,
        platformVersion='LATEST',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': ['subnet-f5e959b9', 'subnet-11713279'],
                'assignPublicIp': 'ENABLED',
                'securityGroups': ['sg-0462860d9c60d87d3']
            }
        }
    )
    print("this is the response", response)
    task_arn = response['tasks'][0]['taskArn']
    print(task_arn)
    return task_arn
  • Give the required details such as cluster, launch type, task definition, count, platform version, and network configuration.
  • Applications hosted in ECS Fargate will process the data_process.csv file, and the output file will be pushed to the output folder of the S3 bucket.

Create Notification to trigger lambda function(Send Email)

  • To enable event notifications for an S3 bucket, open the Amazon S3 console.
  • In the Bucket name list, choose the name of the bucket that you want to enable events for.
  • Choose Properties -> under Advanced settings, choose Events -> choose Add notification.
  • In Name, type a descriptive name for your event configuration.
  • Under Events, select one or more of the types of event occurrences that you want to receive notifications for. When the event occurs, a notification is sent to a destination that you choose.
  • Type an object name Prefix and/or Suffix to filter the event notifications by prefix and/or suffix.
  • Select the type of destination to have the event notifications sent to.
  • If you select the Lambda Function destination type, do the following:
  • In Lambda Function, type or choose the name of the Lambda function that you want to receive notifications from Amazon S3 and choose Save.
  • Create a Lambda function with Node.js.
    • Note: Give the bucket name, folder, file name, and a verified email address.
var aws = require('aws-sdk');
var nodemailer = require('nodemailer');
var ses = new aws.SES({ region: 'us-east-1' });
var s3 = new aws.S3();

// Fetch a file from S3 and return its contents as a Promise
function getS3File(bucket, key) {
    return new Promise(function (resolve, reject) {
        s3.getObject(
            { Bucket: bucket, Key: key },
            function (err, data) {
                if (err) return reject(err);
                else return resolve(data);
            }
        );
    });
}

exports.handler = function (event, context, callback) {
    getS3File('window-demo-1', 'output/result.csv')
        .then(function (fileData) {
            var mailOptions = {
                from: '',   // your verified SES sender address
                to: '',     // recipient address
                subject: 'File uploaded in S3 succeeded!',
                html: `<p>You got a contact message from: <b>${event.emailAddress}</b></p>`,
                attachments: [
                    {
                        filename: 'result.csv',
                        content: fileData.Body
                    }
                ]
            };
            console.log('Creating SES transporter');
            // create Nodemailer SES transporter
            var transporter = nodemailer.createTransport({ SES: ses });
            // send email
            transporter.sendMail(mailOptions, function (err, info) {
                if (err) {
                    console.log('Error sending email');
                    callback(err);
                } else {
                    console.log('Email sent successfully');
                    callback(null, info);
                }
            });
        })
        .catch(function (error) {
            console.log('Error getting attachment from S3');
            callback(error);
        });
};


If you are looking for a serverless orchestrator for your batch processing or a complex data processing pipeline, give AWS Step Functions and Lambda a try; here we used ECS Fargate to cleanse the data. If your data processing script is more complex, you can integrate with Glue, and Step Functions will still act as your orchestrator.

Azure Data Factory – Setting up Self-Hosted IR HA enabled


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies


In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn’t have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision-makers.

Big data requires service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory(ADF) is a managed cloud service that’s built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

This is how Azure introduces you to ADF. You can refer to the Azure documentation on ADF to know more.

Simply said, ADF is an ETL tool that will help you connect to various data sources to load data, perform transformations as per your business logic, and store them into different types of storages. It is a powerful tool and will help solve a variety of use cases.

In this blog, we will create a self hosted integration runtime (IR) with two nodes for high availability.

Use Case

A reputed OTT client building an entire Content Management System (CMS) application on Azure had to migrate their old or historical data from AWS, which hosts their current production environment. That’s when ADF with self-hosted IRs comes to the rescue – we were required to connect to a different cloud, a different VPC, a private network, or on-premise data sources.

Our use-case here was to read data from a production AWS RDS MySQL Server inside a private VPC from ADF. To make this happen, we set up a two node self-hosted IR with high availability (HA).
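For context, the MySQL linked service in ADF routes its connection through the self-hosted IR via a connectVia reference. A sketch of that JSON, assuming an IR named SelfHostedIR (the field names follow my understanding of the ADF linked service model, and the connection string values are placeholders):

```python
# Sketch of an ADF linked service that reaches the AWS RDS MySQL source
# through the self-hosted IR. Field names follow my understanding of the
# ADF linked service JSON model; the IR name and connection string are
# placeholders.

mysql_linked_service = {
    "name": "AwsRdsMySql",
    "properties": {
        "type": "MySql",
        "typeProperties": {
            "connectionString": "Server=<rds-endpoint>;Database=<db>;Uid=<user>;Pwd=<password>"
        },
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(mysql_linked_service["properties"]["connectVia"]["referenceName"])
```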


Prerequisites

  • Windows Server VMs (min 2 – Node1 & Node2)
  • .NET Framework 4.6.1 or later
  • For working with Parquet, ORC, and Avro formats, you will require
    • Visual C++ 2010 Redistributable Package (x64)
    • Java

Installation Steps

Step 1: Log in to the Azure Portal.

Step 2: Search for Data Factory in the Search bar. Click on + Add to create a new Data Factory.

Step 3: Enter a valid name for your ADF.

Note: The name can contain only letters, numbers, and hyphens. The first and last characters must be a letter or number. Spaces are not allowed.

Select the Subscription & Resource Group you want to create this ADF in. It is usually good practice to enable Git for your ADF. Apart from storing all your code safely, this also helps when you have to migrate your ADF to a production subscription – you get all your pipelines on the go.

Step 4: Click Create

You will need to wait a few minutes until your deployment is complete. If you get any error messages here, check your subscription & permission levels to make sure you have the required permissions to create data factories.

Click on Go to resource

Step 5:

Click on Author & Monitor

Next, click on the Pencil button on the left side panel

Step 6: Click on Connections

Step 7: Under Connections tab, click on Integration runtimes, click on + New to create a new IR

Step 8: On clicking New, you will be taken to the IR set-up wizard.

Select Azure, Self-Hosted and click on Continue

Step 9: Select Self-Hosted  and Continue

Step 10: Enter a valid name for your IR, and click Create

Note: Integration runtime Name can contain only letters, numbers and the dash (-) character. The first and last characters must be a letter or number. Every dash (-) character must be immediately preceded and followed by a letter or a number. Consecutive dashes are not permitted in integration runtime names.
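As a quick illustration, the naming rule above can be captured with a regular expression. This is a hypothetical helper for pre-validating names, not part of any Azure SDK:

```python
import re

# Hypothetical validator mirroring the IR naming note above: only letters,
# numbers, and dashes; first and last characters alphanumeric; every dash
# immediately preceded and followed by a letter or number (so no leading,
# trailing, or consecutive dashes).
def is_valid_ir_name(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z0-9](?:-?[A-Za-z0-9])*", name) is not None

print(is_valid_ir_name("prod-ir-01"))  # True
print(is_valid_ir_name("prod--ir"))    # False (consecutive dashes)
print(is_valid_ir_name("-prod-ir"))    # False (leading dash)
```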

Step 11:

On clicking Create, your IR will be created.

Next you will need to install the IR on your Windows VMs. At this point you should log in to your VM (Node1), or wherever you want to install your IR.

You are provided with two options for installation:

  • Express Setup – This is the easiest way to install and configure your IRs, and the one we are following here. Connect to the Windows Server where you want to install it.

Login to Azure Portal in your browser (inside your VM) → Data Factory →  select your ADF → Connections → Integration Runtimes →  integrationRuntime1 → Click Express Setup → Click on the link to download setup files.

  • Manual Setup – You can download the integration runtime and add the authentication keys to validate your installation.

Step 12: Express Setup

Click on the downloaded file. The installation will start automatically.

Step 13:

Once the installation and authentication is successfully completed, go to the Start Menu → Microsoft Integration Runtime → Microsoft Integration Runtime

Step 14: You will need to wait till your node is able to connect to the cloud service. If for any reason you get an error at this step, you can troubleshoot by referring to the self-hosted integration runtime troubleshooting guide.

Step 15: High availability 

One node setup is complete. For high availability, we will need to set up at least 2 nodes. An IR can have a max of 4 nodes.

Note: Before setting up other nodes, you need to enable remote access. Make sure you do this on your very first node, i.e., while the IR still has a single node – you might face connectivity issues later if you skip this step.

Go to the Settings tab and click Change under Remote access from intranet.

Step 16:

Select Enable without TLS/SSL certificate (Basic) for dev/test purpose, or use TLS/SSL for a more secured connection.

You can select a different TCP port – else use the default 8060

Step 17:

Click on OK. Your IR will need to be restarted for this change to take effect. Click OK again.

You will notice remote access enabled for your node.

Step 18:

Log in to your other VM (Node2). Repeat Steps 11 to 17. At this point you will probably get a Connection Limited message stating your nodes are not able to connect to each other. This is because port 8060 is not yet reachable – we will need to enable inbound access to port 8060 on both nodes.

Go to Azure Portal → Virtual Machines → Select your VM (Node1) → Networking.

Click on Add inbound port rule

Step 19:

Select Source → IP Addresses → set the source IP as the IP of your Node2. Node2 will need to connect to port 8060 of Node1. Click Add.

You can use either the private or the public IP addresses of Node1 and Node2.

We will need to do a similar exercise for Node2.

Go to the VM page of Node2 and add Inbound rule for Port 8060. Node1 & Node2 need to be able to communicate with each other via port 8060.
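Before relying on the portal's status, you can sanity-check that port 8060 is reachable between the nodes. A minimal sketch – the host and port values are placeholders for your own node IPs:

```python
import socket

# Returns True if a TCP connection to host:port succeeds within the timeout.
def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder IP): from Node2, check Node1's remote-access port.
# print(can_connect("10.0.0.4", 8060))
```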

Step 20:

If you go to your IR inside your Node1 and Node2, you will see the green tick implying your nodes are successfully connected to each other and also to the cloud. You can wait for some time for this sync to happen. If for some reason, you get an error at this step, you can view integration runtime logs from Windows Event Viewer to further troubleshoot. Restart both of your nodes.

To verify this connection, you can also check in the ADF Console.

Go to your Data Factory → Monitor (Watch symbol on the left panel, below Pencil symbol – Check Step 5) → Integration runtimes

Here you can see the number of registered nodes and their resource utilization. The HIGH AVAILABILITY ENABLED feature is now turned ON.

Step 21: Test Database connectivity from your Node

If you want to test database connectivity from your Node, make sure you have whitelisted the Public IP of your Node at the Database Server inbound security rules.

For example, if your Node1 has a public IP address of 66.66.66.66 and needs to connect to an AWS RDS MySQL server, go to your RDS security group and add an inbound rule for your MySQL port for this IP.

To test this, log in to your Node1 → Start → Microsoft Integration Runtime → Diagnostics → add your RDS connection details → click Test.


This brings you to the end of successfully setting up a self-hosted IR with high availability enabled.

Hope this was informative. Do leave your comments below. Thanks for reading.


Detect highly distributed web DDoS on CloudFront, from botnets

By | Blogs, Cloud, Cloud Assessment | No Comments

Author: Niraj Kumar Gupta, Cloud Consulting at Powerupcloud Technologies.

Contributors: Mudit Jain, Hemant Kumar R and Tiriveedi Srividhya


 CloudWatch Metrics

Metrics are abstract data points indicating performance of your systems. By default, several AWS services provide free metrics for resources (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances).

CloudWatch Alarms

AWS CloudWatch Alarm is a powerful service provided by Amazon for monitoring and managing our AWS services. It provides us with data and actionable insights that we can use to monitor our application/websites, understand and respond to critical changes, optimize resource utilization, and get a consolidated view of the entire account. CloudWatch collects monitoring and operational information in the form of logs, metrics, and events. You can configure alarms to initiate an action when a condition is satisfied, like reaching a pre-configured threshold.

CloudWatch Dashboard

Amazon CloudWatch Dashboards is a feature of AWS CloudWatch that offers basic monitoring home pages for your AWS accounts. It provides resource status and performance views via graphs and gauges. Dashboards can monitor resources in multiple AWS regions to present a cohesive account-wide view of your accounts.

CloudWatch Composite Alarms

Composite alarms enhance existing alarm capability, giving customers a way to logically combine multiple alarms. A single infrastructure event may generate multiple alarms, and the volume of alarms can overwhelm operators or mislead the triage and diagnosis process. When this happens, operators can end up with alarm fatigue or waste time reviewing a large number of alarms to identify the root cause. Composite alarms give operators the ability to add logic and group alarms into a single high-level alarm, which is triggered when the underlying conditions are met. This lets operators make intelligent decisions and reduces the time to detect and diagnose performance issues when they happen.

What are Anomaly detection-based alarms?

Amazon CloudWatch Anomaly Detection applies machine-learning algorithms to continuously analyze system and application metrics, determine a normal baseline, and surface anomalies with minimal user intervention. You can use Anomaly Detection to isolate and troubleshoot unexpected changes in your metric behavior.
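The idea can be pictured with a toy sketch: compute an expected band around the recent average and flag values that fall outside it. This is purely illustrative – CloudWatch's actual anomaly detection uses machine-learning models, not this simple rule:

```python
from statistics import mean, stdev

# Toy band-based anomaly check: expected value +/- k * standard deviation.
def anomaly_band(history, k=2.0):
    m, s = mean(history), stdev(history)
    return (m - k * s, m + k * s)

def is_anomalous(value, history, k=2.0):
    lo, hi = anomaly_band(history, k)
    return not (lo <= value <= hi)

requests_per_min = [100, 98, 103, 97, 102, 99, 101, 100]
print(is_anomalous(100, requests_per_min))  # False (within the band)
print(is_anomalous(250, requests_per_min))  # True (sudden spike)
```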

Why Composite Alarms?

  1. Simple alarms monitor single metrics. By design, many of the alarms triggered turn out to be false positives on further triage. This adds maintenance overhead and noise.
  2. Advanced use cases cannot be conceptualized or achieved with simple alarms.

Why Anomaly Detection?

  1. Static alarms trigger on fixed upper and/or lower limits. There is no direct way to vary these limits based on the day of the month, day of the week, and/or time of day. For most businesses these values change massively across different times of the day, especially for metrics driven by user behavior, like incoming or outgoing traffic. This leaves static alarms ineffective most of the time.
  2. Anomaly detection is an inexpensive, ML-based regression on the metrics.

Solution Overview

  1. Request count – monitored by anomaly-detection-based Alarm1.
  2. Cache-hit rate – monitored by anomaly-detection-based Alarm2.
  3. Alarm1 and Alarm2 – combined into composite Alarm3.
  4. Alarm3 – sends notification(s) to SNS2, which has a Lambda endpoint as a subscription.
  5. Lambda function – sends a custom notification with the CloudWatch dashboard link to the distribution lists subscribed to SNS1.



  1. Enable the additional CloudFront cache metrics (these expose the CacheHitRate metric).


This is applicable to all of the enterprise's CloudFront CDN distributions.

  2. We will configure an Anomaly Detection alarm on the request count increasing by 10% (example) over the expected average.

3. We will add an Anomaly Detection alarm on the CacheHitRate percentage dropping more than a standard deviation of 10% (example) below the expected average.

4. We will create a composite alarm for the above-mentioned alarms using a logical AND operation.

5. Create a CloudWatch Dashboard with all the required information in one place for quick access.

6. Create a Lambda function:

This will be triggered by SNS2 (an SNS topic) when the composite alarm state changes to "ALARM". The Lambda function sends custom notifications (email alerts) to the users via SNS1 (another SNS topic).

The target ARN should be SNS1, where the users' email IDs are configured as endpoints.

In the message section, type the custom message to be sent to the users; here we have included the CloudWatch dashboard URL.

7. Create two SNS topics:

  • SNS1 – Sends email alerts to users [preferably to email distribution list(s)].
  • SNS2 – Has a Lambda function subscription whose code sends custom notifications via SNS1 with links to the CloudWatch dashboard(s). The same Lambda can pick different dashboard links, based on the specific composite alarm triggered, from a DynamoDB table mapping SNS target topic ARNs to CloudWatch dashboard links.

8. Add a notification to the composite alarm to publish to SNS2.
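The custom-notification Lambda can be sketched as below. The topic ARN and dashboard URL are placeholders, and the actual `sns.publish` call is stubbed out so the message-building logic stands alone:

```python
import json

# Placeholder values – substitute your own topic ARN and dashboard URL.
DASHBOARD_URL = "https://console.aws.amazon.com/cloudwatch/home#dashboards:name=ddos-monitoring"
TARGET_SNS1_ARN = "arn:aws:sns:us-east-1:111122223333:SNS1"

def build_notification(event):
    # SNS delivers the CloudWatch alarm state-change document as a JSON string.
    record = event["Records"][0]["Sns"]
    alarm = json.loads(record["Message"])
    subject = f"DDoS suspicion: {alarm['AlarmName']} is {alarm['NewStateValue']}"
    message = (
        f"Composite alarm {alarm['AlarmName']} changed to {alarm['NewStateValue']}.\n"
        f"Review the dashboard: {DASHBOARD_URL}"
    )
    return {"TopicArn": TARGET_SNS1_ARN, "Subject": subject, "Message": message}

def lambda_handler(event, context):
    params = build_notification(event)
    # boto3.client("sns").publish(**params)  # uncomment in the real function
    return params
```

In a real deployment the returned parameters would be passed to boto3's `sns.publish`, and the dashboard link could instead be looked up from the DynamoDB mapping described in step 7.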

Possible False Positives

  1. A new promotional activity drives traffic to newly developed promotion pages, raising the request count while lowering the cache-hit rate.
  2. A hotfix goes wrong at the time of a spike in traffic.


This is one example of a simple setup of composite alarms and anomaly-detection-based alarms to achieve advanced security monitoring. These are very powerful tools that can be used to design many advanced capabilities.

Why hybrid is the preferred strategy for all your cloud needs

By | AWS, Azure, Blogs, GCP, hybrid cloud, Powerlearnings | No Comments

Written by Kiran Kumar, Business analyst at Powerupcloud Technologies.

While public cloud is a globally accepted and proven solution for CIOs and CTOs looking for a more agile, scalable, and versatile IT environment, there are still questions about security, reliability, and the cloud readiness of enterprises, and a full migration to a cloud-native organization requires a lot of time and resources. This is exacerbated for start-ups, as it is too much of a risk to work with these uncertainties. It demands a solution that is low-risk and less expensive, to draw them out of the comfort of their existing on-prem infrastructure.

In such cases, a hybrid cloud is the best approach, providing you with the best of both worlds while keeping pace with all your performance and compliance needs within the comfort of your datacenter.

So what is a hybrid cloud?

Hybrid cloud delivers a seamless computing experience to organizations by combining the power of the public and private cloud and allowing data and applications to be shared between them. It gives enterprises the ability to easily scale their on-premises infrastructure up to the public cloud to handle any fluctuations in workload, without giving third-party datacenters access to the entirety of their data. Understanding the benefits, various organizations around the world have streamlined their offerings to integrate effortlessly into hybrid infrastructures. However, an enterprise has no direct control over the architecture of a public cloud, so for a hybrid cloud deployment, enterprises must architect their private cloud to achieve a consistent hybrid experience with the desired public cloud or clouds.

In a 2019 survey of 2,650 IT decision-makers from around the world, respondents reported steady and substantial hybrid deployment plans over the next five years. A vast majority (more than 80%) of the respondents selected hybrid cloud as their ideal IT operating model, more than half cited hybrid cloud as the model that meets all of their needs, and more than 60% stated that data security is the biggest influencer.

Respondents also felt that having the flexibility to match the right cloud to each application showcases the adaptability that enterprises gain in a hybrid multi-cloud environment.

Banking is one of the industries set to embrace the full benefits of hybrid cloud: because of how the industry operates, it requires a unique mix of services and an infrastructure that is easily accessible and also affordable.

In a recent IBM survey 

  • 50 percent of banking executives say they believe the hybrid cloud can lower their cost of IT ownership 
  • 47 percent of banking executives say they believe hybrid cloud can improve operating margin 
  • 47 percent of banking executives say they believe hybrid cloud can accelerate innovation

Hybrid adoption – best practices and guidelines 

Some of the biggest challenges in cloud adoption include security, talent, and costs. According to the report, hybrid computing has shown that it can eliminate security challenges and manage risk by keeping all the important digital assets and data on-prem. Private clouds are still considered an appropriate solution to host and manage sensitive data and applications, and enterprises still need the means to support their conventional enterprise computing models. A sizeable number of businesses still have substantial on-premises assets comprising archaic technology, sensitive collections of data, and tightly coupled legacy apps that either can't easily be moved or swapped for public cloud.

Here are some of the guidelines for hybrid adoption:

Have a cloud deployment model for applications and data

Deployment models describe which cloud resources and applications should be deployed and where. Hence it is crucial to understand the two-paced system – i.e., a steady-paced and a fast-paced system – to determine the deployment models.

A steady-paced system must continue to support the traditional enterprise applications on-prem to keep the business running and maintain the current on-premises services. Additionally, off-premises services, such as private dedicated IaaS, can be used to increase infrastructure flexibility for enterprise services.

A fast-paced system is required to satisfy more spontaneous needs, like delivering applications and services quickly – whether that is scaling existing services to satisfy spikes in demand or providing new applications quickly to meet an immediate business need.

The next step is determining where applications and data must reside.

The placement of applications and datasets on private, public, or on-prem infrastructure is crucial, since IT architects must assess the right application architecture to achieve maximum benefit. This includes understanding application workload characteristics and determining the right deployment model for multi-tier applications.

Create heterogeneous environments 

To achieve maximum benefit from a hybrid strategy, the enterprise must leverage its existing in-house investments alongside cloud services by integrating them efficiently. As new cloud services are deployed, integrating the applications running on them with the various on-premises applications and systems becomes important.

Integration between applications typically includes 

  • Process (or control) integration, where an application invokes another one in order to execute a certain workflow. 
  • Data integration, where applications share common data, or one application’s output becomes another application’s input. 
  • Presentation integration, where multiple applications present their results simultaneously to a user through a dashboard or mashup.

To obtain a seamless integration between heterogeneous environments, the following actions are necessary:

  • A cloud service provider must support open-source technologies for admin and business interfaces.
  • Examine the compatibility of in-house systems to work with cloud service providers, and ensure that on-premises applications follow SOA design principles and can utilize and expose APIs to enable interoperability with private or public cloud services.
  • Leverage the support of third party ID and Access Management functionality to authenticate and authorize access to cloud services. Put in place suitable API Management capabilities to prevent unauthorized access.

Network security requirements 

Network type – The technology used for the physical connection over the WAN depends on aspects like bandwidth, latency, service levels, and costs. Hybrid cloud solutions can rely on point-to-point links as well as the Internet to connect on-premises data centers and cloud providers. The selection of the connectivity type depends on an analysis of aspects like performance, availability, and the type of workloads.

Security – The connectivity domain needs to be evaluated and understood in order to match the cloud provider's network security standards with the organization's overall network security policies, guidelines, and compliance requirements. Encrypting and authenticating traffic on the WAN can be evaluated at the application level, and technologies such as VPNs can be employed to provide secure connections between components running in different environments.

Web application security and management services like DNS and DDoS protection, which are available on the cloud, can free up the dedicated resources an enterprise would otherwise need to procure, set up, and maintain such services, letting it concentrate on business applications instead. This is especially applicable to hybrid cloud workloads that have components deployed into a cloud service and exposed to the Internet. Systems deployed on-premises need to adapt to work with the cloud, to facilitate problem-identification activities that may span multiple systems with different governance boundaries.

Security and privacy challenges & counter-measures

Hybrid cloud computing has to coordinate between applications and services spanning across various environments, which involves the movement of applications and data between the environments. Security protocols need to be applied across the whole system consistently, and additional risks must be addressed with suitable controls to account for the loss of control over any assets and data placed into a cloud provider’s systems. Despite this inherent loss of control, enterprises still need to take responsibility for their use of cloud computing services to maintain situational awareness, weigh alternatives, set priorities, and effect changes in security and privacy that are in the best interest of the organization. 

  • A single, uniform interface must be used to curtail or deal with risks arising from using services from various cloud providers, since each is likely to have its own set of security and privacy characteristics.
  • Authentication and Authorization. A hybrid environment could mean that gaining access to the public cloud environment could lead to access to the on-premises cloud environment.
  • Compliance check between cloud providers used and in-home systems.


  • A single identity and access management (IAM) system should be used.
  • Networking facilities such as VPN are recommended between the on-premises environment and the cloud.  
  • Encryption needs to be in place for all sensitive data, wherever it is located.
  • Firewalls, DDoS attack handling, etc., need to be coordinated across all environments with external interfaces.

Set up an appropriate backup & DR plan

As already discussed, a hybrid environment gives organizations the option to work with multiple clouds, offering business continuity – one of the most important aspects of business operations. It is not just a simple data backup to the cloud or a disaster recovery plan; it means that when a disaster or failure occurs, data is still accessible with little to no downtime. This is measured in terms of the time to restart (RTO: recovery time objective) and the maximum data loss allowed (RPO: recovery point objective).

A business continuity solution therefore has to be planned considering key elements such as resilience and the RTO and RPO values agreed upon with the cloud provider.

Here are some of the challenges encountered while making a DR plan 

  • Although the RTO and RPO values give a general idea of the outcome, they cannot be trusted fully – the time required to restart operations may be longer.
  • As the systems come back up and become operational, there will be a sudden burst of requests for resources, which is more apparent in large-scale disasters.
  • Selecting the right CSP is crucial as most of the cloud providers do not provide DR as a managed service instead, they provide a basic infrastructure to enable our own DRaaS.

Hence enterprises have to be clear about, and select, the DR strategy that best suits their IT infrastructure. This is crucial in providing mobility to the business – making it accessible from anywhere in the world – and in insuring data against disasters, whether natural or technical failures, by minimizing downtime and the costs associated with such an event.

How are leading OEMs like AWS, Azure and Google Cloud adapting to this changing landscape  

Google Anthos

In early 2019, Google came up with Anthos, one of the first multi-cloud solutions from a mainstream cloud provider. Anthos is an open application modernization platform that enables you to modernize your existing applications, build new ones, and run them anywhere. It is built on open source: Kubernetes acts as its central command and control center, Istio enables federated network management across the platform, and Knative provides an open API and runtime environment that lets you run serverless workloads anywhere you choose. Anthos enables consistency between on-premises and cloud environments, helps accelerate application development, and strategically enables your business with transformational technologies.

AWS Outposts

AWS Outposts is a fully managed service that extends the same AWS hardware infrastructure, services, APIs, and tools to build and run your applications on-premises and in the cloud for a truly consistent hybrid experience. AWS compute, storage, database, and other services run locally on Outposts, and you can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools across your on-premises and cloud environments. Your Outposts infrastructure and AWS services are managed, monitored, and updated by AWS, just like in the cloud.

Azure Stack

Azure Stack is a hybrid solution from Azure, built and distributed by approved hardware vendors (like Dell, Lenovo, HPE, etc.) that brings the Azure cloud into your on-prem data center. It is a fully managed service where the hardware is managed by the certified vendors and the software is managed by Microsoft Azure. Using Azure Stack, you can extend Azure technology anywhere – from the datacenter to edge locations and remote offices – enabling you to build, deploy, and run hybrid and edge computing apps consistently across your IT ecosystem, with flexibility for diverse workloads.

How Powerup approaches Hybrid cloud for its customers 

Powerup is one of the few companies in the world to have achieved launch-partner status with AWS Outposts, with experience across 200+ projects in various verticals and top-tier certified expertise in all 3 major cloud providers in the market. We can bring an agile, secure, and seamless hybrid experience to the table. Outposts is a fully managed service, so it eliminates the hassle of managing an on-prem data center and lets enterprises concentrate on optimizing their infrastructure.

Reference Material

Practical Guide to Hybrid Cloud Computing

Multicast in AWS using AWS Transit Gateway

By | AWS, Blogs, Powerlearnings | No Comments

Written by Aparna M, Associate Solutions Architect at Powerupcloud Technologies.

Multicast is a communication protocol used for delivering a single stream of data to multiple receiving computers simultaneously.

AWS Transit Gateway multicast now makes it easy for customers to build multicast applications in the cloud and distribute data across thousands of connected Virtual Private Cloud networks. Multicast is the preferred protocol for streaming multimedia content and subscription data, such as news articles and stock quotes, to a group of subscribers.

Now let’s understand the key concepts of Multicast:

  1. Multicast domain – Multicast domain allows the segmentation of a multicast network into different domains and makes the transit gateway act as multiple multicast routers. This is defined at the subnet level.
  2. Multicast Group – A multicast group is used to identify a set of sources and receivers that will send and receive the same multicast traffic. It is identified by a group IP address.
  3. Multicast source – An elastic network interface associated with a supported EC2 instance that sends multicast traffic.
  4. Multicast group member – An elastic network interface associated with a supported EC2 instance that receives multicast traffic. A multicast group has multiple group members.

Key Considerations for setting up Multicast in AWS:

  • Create a new transit gateway to enable multicast
  • You cannot share multicast-enabled transit gateways with other accounts
  • Internet Group Management Protocol (IGMP) support for managing group membership is not supported right now
  • A subnet can only be in one multicast domain.
  • If you use a non-Nitro instance, you must disable the Source/Dest check. 
  • A non-Nitro instance cannot be a multicast sender.

Let’s walkthrough how to set up multicast via AWS Console.

Create a Transit gateway for multicast:

In order to create a transit gateway multicast follow the below steps:

  1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
  2. On the navigation pane, choose Create Transit Gateway.
  3. For Name tag, enter a name to identify the Transit gateway.
  4. Enable Multicast support.
  5. Choose Create Transit Gateway.

Create a Transit Gateway Multicast Domain

  1. On the navigation pane, choose the Transit Gateway Multicast.
  2. Choose Create Transit Gateway Multicast domain.
  3. (Optional) For Name tag, enter a name to identify the domain.
  4. For Transit Gateway ID, select the transit gateway that processes the multicast traffic.
  5. Choose Create Transit Gateway multicast domain.

Associate VPC Attachments and Subnets with a Transit Gateway Multicast Domain

To associate VPC attachments with a transit gateway multicast domain using the console

  1. On the navigation pane, choose Transit Gateway Multicast.
  2. Select the transit gateway multicast domain, and then choose Actions → Create association.
  3. For Transit Gateway ID, select the transit gateway attachment.
  4. For Choose subnets to associate, select the subnets to include in the domain.
  5. Choose Create association.

Register Sources with a Multicast Group

In order to register sources for transit gateway multicast:

  1. On the navigation pane, choose Transit Gateway Multicast.
  2. Select the transit gateway multicast domain, and then choose Actions → Add group sources.
  3. For Group IP address, enter the IPv4 or IPv6 multicast group address to assign to the multicast domain. The IPv4 range must be in 224.0.0.0/4.
  4. Under Choose network interfaces, select the multicast senders' (EC2 servers) network interfaces.
  5. Choose Add sources.
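As a quick sanity check, you can verify that a chosen group address really is a multicast address (IPv4 multicast is 224.0.0.0/4, i.e. 224.0.0.0 – 239.255.255.255):

```python
import ipaddress

# True if the address falls in the multicast range (224.0.0.0/4 for IPv4).
def is_multicast_group(ip: str) -> bool:
    return ipaddress.ip_address(ip).is_multicast

print(is_multicast_group("224.0.0.251"))  # True
print(is_multicast_group("10.0.0.1"))     # False
```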

Register Members with a Multicast Group

To register members in the transit gateway multicast:

  1. On the navigation pane, choose Transit Gateway Multicast.
  2. Select the transit gateway multicast domain, and then choose Actions → Add group members.
  3. For Group IP address, enter the same multicast group IP address that you specified while adding the sources.
  4. Under Choose network interfaces, select the multicast receivers' (EC2 servers) network interfaces.
  5. Choose Add members.

Modify Security groups of the Member servers(receivers):

  1. Allow inbound traffic on Custom UDP port 5001

Once your setup is completed follow the below steps to test the multicast routing.

  1. Log in to all the source and member servers.
  2. Make sure the iperf package is installed on all the servers in order to test the functionality.
  3. Run the below command on all the member servers (receivers):

iperf -s -u -B <multicast_group_ip> -i 1

Here <multicast_group_ip> is the multicast group IP provided during the setup.

  4. Run the below command on the source machine (sender):

iperf -c <multicast_group_ip> -u -T 32 -t 100 -i 1

Once you start sending data from the source server, it can be seen simultaneously across all the members. Below is a screenshot for reference.


This blog helps you to host multicast applications on AWS leveraging AWS Transit gateway. Hope you found it useful.

Keycloak with Java and ReactJS

By | Blogs, Cloud | 2 Comments

Written by Soumya A, Software Developer at Powerupcloud Technologies.

In a common development environment, we create login logic and maintain all the details in the project database. This can be a risk in large application development, where all the client data and user information has to be maintained, so we use third-party software for managing login details to make the application more secure. Keycloak also helps in maintaining multiple applications with different or the same users.

Problem statement:

Maintaining login details in a local database can become a burden, so we use third-party software like Keycloak to make the application more secure.

How it works:

We add Keycloak to the working environment, add the required Keycloak details in the code, add the application details in Keycloak, and run it in the working environment. Detailed steps for local and server environments are in this document.


Keycloak is an open-source Identity and Access Management solution aimed at modern applications and services. It makes it easy to secure applications and services with little to no code.

Keycloak is used to add security to any application. With the Keycloak details added to the application, it gives various options like a simple login, login with username and password, use of OTP for login, etc.

When we use Keycloak, we need not maintain login details in our database; all the details are saved on the Keycloak server, and only the required details are stored in our database, which keeps the application secure.

Different features of Keycloak:

Single Sign-On

Users authenticate with Keycloak rather than with individual applications. This means that your applications don't have to deal with login forms, authenticating users, or storing users. Once logged in to Keycloak, users don't have to log in again to access a different application.

This also applied to logout. Keycloak provides single-sign out, which means users only have to logout once to be logged-out of all applications that use Keycloak.

Kerberos bridge

If your users authenticate to workstations with Kerberos (LDAP or active directory) they can also be automatically authenticated to Keycloak without having to provide their username and password again after they log on to the workstation.

Identity Brokering and Social Login

Enabling login with social networks is easy to add through the admin console. It’s just a matter of selecting the social network you want to add. No code or changes to your application are required.

Keycloak can also authenticate users with existing OpenID Connect or SAML 2.0 Identity Providers. Again, this is just a matter of configuring the Identity Provider through the admin console.

User Federation

Keycloak has built-in support to connect to existing LDAP or Active Directory servers. You can also implement your own provider if you have users in other stores, such as a relational database.

Client Adapters

Keycloak Client Adapters make it really easy to secure applications and services. Adapters are available for a number of platforms and programming languages, but if there isn’t one for your chosen platform, don’t worry: Keycloak is built on standard protocols, so you can use any OpenID Connect Relying Party library or SAML 2.0 Service Provider library out there.


You can also opt to use a proxy to secure your applications which removes the need to modify your application at all.

Admin Console

Through the admin console administrators can centrally manage all aspects of the Keycloak server.

They can enable and disable various features. They can configure identity brokering and user federation.

They can create and manage applications and services, and define fine-grained authorization policies.

They can also manage users, including permissions and sessions.

Account Management Console

Through the account management console users can manage their own accounts. They can update the profile, change passwords, and setup two-factor authentication.

Users can also manage sessions as well as view history for the account.

If you’ve enabled social login or identity brokering users can also link their accounts with additional providers to allow them to authenticate to the same account with different identity providers.

Standard Protocols

Keycloak is based on standard protocols and provides support for OpenID Connect, OAuth 2.0, and SAML.
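Because Keycloak sticks to these standard protocols, any HTTP client can obtain a token without a Keycloak-specific SDK. The sketch below is our own illustration (the host, realm, and client names are placeholders, not values from this article); it builds the OpenID Connect token endpoint and the password-grant form for a realm, with the actual HTTP call shown only in comments:

```python
def token_endpoint(base_url, realm):
    # Keycloak (with the /auth context root used by this version) exposes
    # the OIDC token endpoint under each realm.
    return f"{base_url}/auth/realms/{realm}/protocol/openid-connect/token"

def password_grant_payload(client_id, username, password, client_secret=None):
    # Form fields for the OAuth 2.0 Resource Owner Password Credentials grant.
    payload = {
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }
    if client_secret is not None:
        # Confidential clients must also present their secret.
        payload["client_secret"] = client_secret
    return payload

# With e.g. the requests library:
# resp = requests.post(token_endpoint("http://localhost:8085", "demo"),
#                      data=password_grant_payload("demo-client", "user", "pass"))
# access_token = resp.json()["access_token"]
```

The same endpoint shape works for any realm and client you create in the steps below.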

Authorization Services

If role-based authorization doesn’t cover your needs, Keycloak provides fine-grained authorization services as well. This allows you to manage permissions for all your services from the Keycloak admin console and gives you the power to define exactly the policies you need.

How to use:

In Local:


  1. Download Keycloak.
  2. Run Keycloak on port 8085 (the default Keycloak port is 8080): ./bin/standalone.sh -Djboss.socket.binding.port-offset=5
  3. Log in with the master login that was registered while creating Keycloak.

In Server:


  1. Download Java.
  2. Download Keycloak using wget.
  3. Run Keycloak on port 8085 (the default Keycloak port is 8080): ./bin/standalone.sh -Djboss.socket.binding.port-offset=5
  4. Add an SSL certificate for Keycloak.
  5. Log in with the master login that was registered while creating Keycloak.

Steps for server :


Step 1: Log in to the Linux server.

Step 2: Download Keycloak.

cd /opt/
wget https://downloads.jboss.org/keycloak/7.0.0/keycloak-7.0.0.tar.gz
tar -xvzf keycloak-7.0.0.tar.gz
mv keycloak-7.0.0 keycloak

Step 3: Create a user to run keycloak application

adduser techrunnr
chown techrunnr.techrunnr -R /opt/keycloak

Step 4: Switch to the newly created user.

sudo su - techrunnr

Step 5: Go to the Keycloak home directory.

cd /opt/keycloak

Step 6: Execute the below commands to make the application work behind a reverse proxy.

./bin/jboss-cli.sh --commands='embed-server,/subsystem=undertow/server=default-server/http-listener=default:write-attribute(name=proxy-address-forwarding,value=true)'
./bin/jboss-cli.sh --commands='embed-server,/socket-binding-group=standard-sockets/socket-binding=proxy-https:add(port=443)'
./bin/jboss-cli.sh --commands='embed-server,/subsystem=undertow/server=default-server/http-listener=default:write-attribute(name=redirect-socket,value=proxy-https)'

Step 7: Create a systemd unit to start and stop Keycloak using systemd.

cat > /etc/systemd/system/keycloak.service <<EOF
[Unit]
Description=Keycloak

[Service]
User=techrunnr
ExecStart=/opt/keycloak/bin/standalone.sh -b 0.0.0.0

[Install]
WantedBy=multi-user.target
EOF

Step 8: Reload the systemd daemon and start Keycloak.

systemctl daemon-reload
systemctl enable keycloak
systemctl start keycloak

Step 9: Create an admin user using the below command.

./bin/add-user-keycloak.sh -u admin -p YOURPASS -r master

Configure Nginx reverse proxy

Step 1: Log in to the Nginx server and update the nginx.conf file.

upstream keycloak {
    # Use IP Hash for session persistence
    ip_hash;

    # List of Keycloak servers
    server 127.0.0.1:8085;
}

server {
    listen 80;

    # Redirect all HTTP to HTTPS
    location / {
        return 301 https://$server_name$request_uri;
    }
}

server {
    listen 443 ssl http2;

    ssl_certificate /etc/pki/tls/certs/my-cert.cer;
    ssl_certificate_key /etc/pki/tls/private/my-key.key;
    ssl_session_cache shared:SSL:1m;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://keycloak;
    }
}

Once it’s completed, restart the Nginx server for the changes to take effect. Now access the URL to reach the Keycloak server and use the credentials which you created in Step 9.

Steps for both server and local:

1. Create a new realm:

To create a new realm, complete the following steps:

  1. Go to http://localhost:8085/auth/admin/ and log in to the Keycloak Admin Console using the account you created in Install and Boot.
  2. From the Master drop-down menu, click Add Realm. When you are logged in to the master realm this drop-down menu lists all existing realms.
  3. Type demo in the Name field and click Create.

When the realm is created, the main admin console page opens. Notice the current realm is now set to demo. Switch between managing the master realm and the realm you just created by clicking entries in the Select realm drop-down menu.

2. Create new client:

To define and register the client in the Keycloak admin console, complete the following steps:

  1. In the top-left drop-down menu, select and manage the demo realm. Click Clients in the left side menu to open the Clients page.
  2. On the right side, click Create.
  3. Complete the fields as shown here:
  4. Click Save to create the client application entry.
  5. Change the Access Type to confidential.
  6. Click the Installation tab in the Keycloak admin console to obtain a configuration template.
  7. Select Keycloak OIDC JSON to generate a JSON template. Copy the contents for use in the next section.

3. Role: 

Roles identify a type or category of user. Keycloak often assigns access and permissions to specific roles rather than individual users for fine-grained access control.

Keycloak offers three types of roles:

  • Realm-level roles are in the global namespace shared by all clients.
  • Client roles basically have a namespace dedicated to a client.
  • A composite role is a role that has one or more additional roles associated with it. 

3.1. Create a new role

Roles -> Add Role -> Role name and Description (admin_role).

3.2. To add manage-users permission for the newly created role:

Enable composite roles -> client roles -> realm management -> manage users.

3.3. This step adds manage-users permission for the default roles:

Roles -> Default Roles -> Client Roles -> realm management -> add manage users.

4. Adding the created role’s permission to your client

Client -> select your client -> Scope -> realm roles -> add the created role (admin_role).

5. Add permission to the new realm from the master realm

5.1. Master realm -> Clients -> select your client (demo-realm) -> Roles -> manage-users (default false; make it true)

5.2. To make it true: enable composite roles -> client roles -> select your client (demo-realm) -> add manage users

6. To add manage-users permission for the client in master:

Master -> Roles -> Default Roles -> Client Roles -> select your client (demo-realm) -> add manage users.

7. Once the permissions are granted, create the first user in the new realm; using this user we can then create multiple users from code (outside Keycloak).

7.1. Select your realm (demo) -> Users -> New user -> details (add email id and name)


7.3. Role mappings -> Client role -> realm management -> check that manage users is present; if not, add it.
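The user-creation code later in this post uses the Java admin client, but the same operation boils down to a POST against Keycloak's admin REST API. As a rough Python sketch of that idea (our own illustration: the realm, names, and field values are placeholders, and the HTTP call via `requests` is shown only in comments):

```python
def users_endpoint(base_url, realm):
    # Admin REST endpoint for creating users in a realm
    return f"{base_url}/auth/admin/realms/{realm}/users"

def user_representation(username, email, temporary_password):
    # Minimal user representation; Keycloak accepts many more fields.
    return {
        "username": username,
        "email": email,
        "enabled": True,
        "credentials": [{
            "type": "password",
            "value": temporary_password,
            "temporary": True,  # force a password change on first login
        }],
    }

# The call needs a token from a user with the manage-users role set up above:
# requests.post(users_endpoint("http://localhost:8085", "demo"),
#               json=user_representation("jdoe", "jdoe@example.com", "changeme"),
#               headers={"Authorization": "Bearer " + access_token})
```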

In React JS, for connecting Keycloak and adding authentication:

Keycloak.js file

JSON file for adding the Keycloak server details (JSON from the Installation tab):

{
  "realm": "realm name",
  "auth-server-url": "keycloak url",
  "ssl-required": "external",
  "resource": "client name",
  "credentials": {
    "secret": "secret key"
  },
  "confidential-port": 0
}

Keycloak functions in app.js

These functions are used to connect to the Java backend and authenticate the user.

async Keycloakfunc() {
   const keycloak = Keycloak("/keycloak.json");
   return keycloak
     .init({ onLoad: "login-required", promiseType: "native" })
     .then(authenticated => {
       if (authenticated) {
         if (sessionStorage.getItem("Loggined") == null) {
           return this.getUserId(keycloak.tokenParsed.preferred_username);
         } else {
           this.setState({ Loggined: true });
         }
       }
     });
 }

 async getUserId(user_name) {
   const endpoint = CURRENT_SERVER + "authenticateLogin";
   const bot_obj = {
     username: user_name
   };
   // axios is assumed here for the HTTP call to the Python backend
   return axios.post(endpoint, bot_obj).then(res => {
     let data = res.data;
     if (data.status == "success") {
       // setting token locally to access throughout the application
       sessionStorage.setItem("authorization", data.response.token);
       this.setState({ isAuth: "true" });
       console.log("login success");
       localStorage.setItem("userId", JSON.stringify(data.response));
       localStorage.setItem("userName", user_name);
       localStorage.setItem("rolename", data.response.roleid);
       this.setState({ Loggined: true });
       sessionStorage.setItem("Loggined", true);
     } else if (data.status == "failed") {
       this.setState({ Loggined: false });
     }
   });
 }

In Java code:

Authenticate login service method:

This method is used to check whether the username / email id is registered in our DB and to authenticate the next steps.

@Service
public class LoginServices {

@Autowired
private UserMainRepo userMain;

@Autowired
private RoleUserPermissionRepo roleUserPermission;

@Autowired
private JwtTokenUtil jwtTokenUtil;

public String keyCloak(String object) {
String userId = null;
String roleid = null;
JSONObject responseJSON = new JSONObject();
JSONObject resultJSON = new JSONObject();

JSONObject obj = new JSONObject(object);
String username = (String) obj.get("username");

List<UserMain> authenticate = userMain.findByEmailId(username);

if (authenticate.isEmpty()) {
responseJSON.put("error", "user not found in DB");
resultJSON.put("status", "failed");
resultJSON.put("response", responseJSON);
} else {
List<RoleUserPermission> roleUserData = roleUserPermission.findByUserMainId(authenticate.get(0));
userId = roleUserData.get(0).getId();
roleid = roleUserData.get(0).getRoleMasterId().getId();

// Creating JWT token for security
JwtUserDetails userDetails = new JwtUserDetails();
final String token = jwtTokenUtil.generateToken(userDetails);
final String secretKey = "botzer";
String encryptedToken = AES.encrypt(token, secretKey);
responseJSON.put("token", encryptedToken);
responseJSON.put("userId", userId);
responseJSON.put("isLoggedIn", "true");
responseJSON.put("roleid", roleid);
resultJSON.put("status", "success");
resultJSON.put("response", responseJSON);
}
return resultJSON.toString();
}
}

User controller Method :

All the details for creating user such as email id username etc is passed from front end as a JSON

This method to create user 

@RequestMapping(value = "/createUser", method = RequestMethod.POST, consumes = MediaType.APPLICATION_JSON_VALUE)
public ResponseEntity<String> createUser(@RequestBody UserMain json) throws ParseException {
String response = userservices.createUser(json);
return new ResponseEntity<String>(response, HttpStatus.OK);
}

User services method :

This method is used to create an agent:

public String createUser(@RequestBody UserMain user) {
try {
if (user.getId() == null) {
int count = getUserEmailIdCount(user.getEmailId());

if (count == 0) {
String userresult = createAgentOnKeyclock(user);
return userresult;
} else {
// a user with this email id already exists
return "Failure";
}
}
return "Failure";
} catch (Exception e) {
return "Failure";
}
}

This method checks whether the email id already exists, using the size of the returned list:

public int getUserEmailIdCount(String emailId) {
List<UserMain> list = usermainRepo.findAllByEmailId(emailId);
return list.size();
}


This method is used to create the user id and password in Keycloak.

public String createAgentOnKeyclock(UserMain usermain) {

try {
String serverUrl = Credentials.KEYCLOAK_SERVER_URL;
String realm = Credentials.KEYCLOAK_RELAM;
String clientId = Credentials.KEYCLOAK_CLIENT_ID;
String clientSecret = Credentials.KEYCLOAK_SECREAT_KEY;

Keycloak keycloak = KeycloakBuilder.builder() //
.serverUrl(serverUrl) //
.realm(Credentials.KEYCLOAK_RELAM) //
.grantType(OAuth2Constants.PASSWORD) //
.clientId(clientId) //
.clientSecret(clientSecret) //
.username(Credentials.KEYCLOAK_USER_NAME) //
.password(Credentials.KEYCLOAK_USER_PASSWORD) //
.build();

// Define user
UserRepresentation user = new UserRepresentation();
user.setEnabled(true);
user.setUsername(usermain.getEmailId());
user.setEmail(usermain.getEmailId());

// Get realm
RealmResource realmResource = keycloak.realm(realm);
UsersResource userRessource = realmResource.users();

Response response = userRessource.create(user);

System.out.println("response : " + response);

String userId = response.getLocation().getPath().replaceAll(".*/([^/]+)$", "$1");

System.out.println("userId : " + userId);

// Define password credential
CredentialRepresentation passwordCred = new CredentialRepresentation();
passwordCred.setTemporary(false);
passwordCred.setType(CredentialRepresentation.PASSWORD);
passwordCred.setValue(Credentials.DEFAULT_PASSWORD_AGENT_CREATION);

// Set password credential
userRessource.get(userId).resetPassword(passwordCred);

String userObj = createUser(usermain, userId);
return userObj;
} catch (Exception e) {
return "Failure";
}
}

Credential page to store keycloak credentials:

Keycloak client details (details from installation JSON from keycloak)

  • public static final String KEYCLOAK_SERVER_URL = "Keycloak server url";
  • public static final String KEYCLOAK_RELAM = "Realm name";
  • public static final String KEYCLOAK_CLIENT_ID = "Client name";
  • public static final String KEYCLOAK_SECREAT_KEY = "Secret key of new client";

First user in the new realm with manage-users permission:

  • public static final String KEYCLOAK_USER_NAME = "First user emailId";
  • public static final String KEYCLOAK_USER_PASSWORD = "First user password";

The default password stored in Keycloak:

  • public static final String DEFAULT_PASSWORD_AGENT_CREATION = "Default password";
  • public static final String DB_SCHEMA_NAME = null;


Using Keycloak in the application makes it more secure and makes maintaining user login data easy.

Text to Speech using Amazon Polly with React JS & Python

By | AI, AWS, Blogs, Chatbot | One Comment

Written by Ishita Saha, Software Engineer, Powerupcloud Technologies

In this blog, we will discuss how we can integrate AWS Polly using Python & React JS to a chatbot application.

Use Case

We are developing a Chatbot Framework where we use AWS Polly for an exquisite & lively voice experience for our users.

Problem Statement

We are trying to showcase how we can integrate AWS Polly voice services with our existing chatbot application built on React JS & Python.

What is AWS Polly ?

Amazon Polly is a service that turns text into lifelike speech. Amazon Polly enables existing applications to speak as a first-class feature and creates the opportunity for entirely new categories of speech-enabled products, from mobile apps and cars to devices and appliances. Amazon Polly includes dozens of lifelike voices and support for multiple languages, so you can select the ideal voice and distribute your speech-enabled applications in many geographies. Amazon Polly is easy to use – you just send the text you want converted into speech to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to your application so you can play it directly or store it in a standard audio file format, such as MP3.

AWS Polly is easy to use. We only need an AWS subscription. We can test Polly directly from the AWS Console.

Go to :

There is an option to select Voice from Different Languages & Regions.

Why Amazon Polly?

You can use Amazon Polly to power your application with high-quality spoken output. This cost-effective service has very low response times, and is available for virtually any use case, with no restrictions on storing and reusing generated speech.
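Before wiring Polly into a full application, it can be sanity-checked with a few lines of boto3. The following is our own minimal sketch, not code from the chatbot itself: the voice and file names are arbitrary, and the actual boto3 call is shown in comments since it needs live AWS credentials.

```python
def polly_request(text, voice_id="Joanna", output_format="mp3"):
    # Parameters for polly.synthesize_speech(): the text to speak,
    # the Polly voice to use, and the audio output format.
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}

# import boto3
# polly = boto3.client("polly")
# response = polly.synthesize_speech(**polly_request("Hello from Polly"))
# with open("speech.mp3", "wb") as f:
#     f.write(response["AudioStream"].read())
```

The same three parameters are what the React frontend passes to the Python backend in the implementation below.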


The user provides input to the chatbot. This input goes to our React JS frontend, which interacts internally with a Python application in the backend. The Python application is responsible for interacting with AWS Polly and sending the response back to the React app, which plays the audio streaming output as mp3.

React JS

In this implementation, we are using the Audio() constructor.

The Audio() constructor creates and returns a new HTMLAudioElement which can be either attached to a document for the user to interact with and/or listen to, or can be used offscreen to manage and play audio.

Syntax :

audio = new Audio(url);

Methods :

  • play – Make the media object play or resume after pausing.
  • pause – Pause the media object.
  • load – Reload the media object.
  • canPlayType – Determine if a media type can be played.

Here, we are using only the play() and pause() methods in our implementation.

Step 1: We have to initialize the variables in the state.

this.state = {
  audio: "",
  languageName: "",
  voiceName: ""
};

Step 2 : Remove all unwanted space characters from input.

response = response.replace(/\//g, " ");
response = response.replace(/(\r\n|\n|\r)/gm, "");

Step 3 : If any existing reply from the bot is already playing, we can stop it.

if (this.state.audio != undefined) {
  this.state.audio.pause();
}

Step 4 :

This method interacts with our Python application. It sends a request to our Python backend with the following parameters and creates a new Audio() object. We pass the following parameters dynamically to the handleSpeaker() method :

  • languageName
  • voiceName
  • inputText

handleSpeaker = inputText => {
  this.setState({
    audio: ""
  });
  this.setState({
    audio: new Audio(
      POLLY_API +
        "/texttospeech?LanguageCode=" +
        this.state.languageName +
        "&VoiceId=" +
        this.state.voiceName +
        "&Text=" +
        inputText
    )
  });
};

Step 5 : On getting the response from our POLLY_API Python app, we play the mp3 file.

this.state.audio.play();


The Python application communicates with AWS Polly using AWS Python SDK – boto3.

Step 1: Now we will need to configure AWS credentials for accessing AWS Polly, using the access key, secret key & region.

import boto3

def connectToPolly(aws_access_key, aws_secret_key, region):
    polly_client = boto3.Session(
        aws_access_key_id=aws_access_key,
        aws_secret_access_key=aws_secret_key,
        region_name=region).client('polly')
    return polly_client

Here, we are creating a polly client to access AWS Polly Services.

Step 2: We are using synthesize_speech() to get an audio stream file.

Request Syntax :

response = client.synthesize_speech(
    Text='string',
    TextType='ssml'|'text',
    LanguageCode='string',
    VoiceId='string',
    OutputFormat='json'|'mp3'|'ogg_vorbis'|'pcm'
)

Response Syntax :

{
    'AudioStream': StreamingBody(),
    'ContentType': 'string',
    'RequestCharacters': 123
}

We are calling textToSpeech Flask API which accepts parameters sent by React and further proceeds to call AWS Polly internally. The response is sent back to React as a mp3 file. The React application then plays out the audio file for the user.

@app.route('/textToSpeech', methods=['GET'])
def textToSpeech():
    text = request.args.get('Text')
    languageCode = request.args.get('LanguageCode')
    voiceId = request.args.get('VoiceId')
    outputFormat = request.args.get('OutputFormat')
    polly_client = credentials.connectToPolly(aws_access_key, aws_secret_key, region)
    response = polly_client.synthesize_speech(Text="<speak>" + text + "</speak>",
                                              TextType="ssml",
                                              LanguageCode=languageCode,
                                              VoiceId=voiceId,
                                              OutputFormat=outputFormat)
    return send_file(response.get("AudioStream"),
                     mimetype="audio/mpeg")


This blog showcases the simple implementation of React JS integration with Python to utilize AWS Polly services. This can be used as a reference for such use cases with chatbots.

AWS WorkSpaces Implementation

By | AWS, Blogs, Powerlearnings | 3 Comments

Written by Arun Kumar, Associate Cloud Architect at Powerupcloud Technologies


Amazon WorkSpaces is a managed, secure Desktop-as-a-Service (DaaS) offering from the AWS cloud. WorkSpaces eliminates the need to provision hardware and software configurations, making it easy for IT admins to provision managed desktops on the cloud. End users can access their virtual desktop from any device or browser on Windows, Linux, iPad, and Android. Managing corporate applications for end users becomes easier using WAM (WorkSpaces Application Manager) or by integrating with existing solutions like SCCM, WSUS, and more.

To manage end users and provide them access to WorkSpaces, the below solutions can be leveraged with AWS.

  • Extending the existing on-premises Active Directory by using AD Connector in AWS.
  • Create & configure AWS managed Simple AD or Microsoft Active Directory based on size of the organization.

WorkSpaces architecture with simple AD approach

In this architecture, WorkSpaces are deployed for Windows and Linux virtual desktops; both are associated with the VPC and the directory service (Simple AD) to store and manage information about users and WorkSpaces.

The above architecture describes the flow of end users accessing Amazon WorkSpaces using Simple AD, which authenticates users. Users access their WorkSpaces using a client application from a supported device or a web browser, and they log in with their directory credentials. The login information is sent to an authentication gateway, which forwards the traffic to the directory for the WorkSpace. Once the user is authenticated, streaming traffic is processed through the streaming gateway, which works over the PCoIP protocol to give end users the complete desktop experience.


To use a WorkSpace, the following requirements need to be met.

  • A directory service to authenticate users and provide access to their WorkSpace.
  • The WorkSpaces client application for the user’s device, along with an Internet connection.

For this demo we created a Simple AD; this can be done from the WorkSpaces console.


  • Create the Simple AD.
  • Choose the directory size based on your organization’s size.
  • Enter the fully qualified domain name and the Administrator password, and note down the admin password somewhere for reference.
  • We need a minimum of two subnets for the AWS Directory Service, which requires a Multi-AZ deployment.
  • The directory is now created.


Now let’s create the WorkSpace for employees.

  • Select the directory in which you need to create WorkSpaces for user access.
  • Select the appropriate subnets that we created in the previous section to provision the WorkSpaces in a Multi-AZ deployment.
  • Ensure that self-service permissions are always set to “NO”; otherwise users will have the privilege to change WorkSpaces configurations on the fly without the WorkSpaces admin’s knowledge.
  • Enable WorkDocs based on the user’s requirement.
  • You can select the user from the directory list, or create a new user on the fly.
  • Select the bundle of compute, operating system, and storage for each of your users.

You can select the running mode of the WorkSpaces based on your company’s needs. This directly impacts the monthly bill: the “Always-On” mode has fixed pricing, whereas the “AutoStop” mode is an on-demand pricing model. Ensure the right running mode is selected during WorkSpaces creation based on the business requirements of the user.

  • Review and launch your WorkSpace.
  • Now your WorkSpace is up and running. Once it is available and ready to use, you will receive an email from Amazon with the WorkSpaces login details.
  • Select the URL in the email to create a password for your user to access the WorkSpace.
  • Download the client based on your device, or use the web login.
  • Install the WorkSpaces client on your local machine.
  • Open the WorkSpaces client and enter the registration code which you received in the email.
  • It prompts for a username and password.
  • Now you are logged in to your virtual desktop.

Security and compliance of WorkSpace

  • Encryption in transit by default.
  • KMS can be used to encrypt our data at rest.
  • IP-based restrictions.
  • Multi-factor authentication (RADIUS).
  • PCI DSS Level 1 compliant.
  • HIPAA-eligible with business associate agreements.
  • Certifications: ISO 9001 and ISO 27001.


Pricing

  • No upfront payment.
  • On-demand pricing (AutoStop mode) – in this model, when the user is not using the virtual desktop, the WorkSpace is automatically stopped based on the AutoStop hours selected for the user.
  • Fixed pricing (Always-On mode) – in this model, the WorkSpace virtual desktop cost is calculated on a fixed monthly basis based on the selected bundle.


Licensing

  • Built-in license – allows us to select the right Windows bundle as per business needs.
  • WorkSpaces additionally supports the BYOL (bring your own license) model for Windows 10.


Monitoring

  • CloudTrail can monitor the API calls.
  • CloudWatch monitoring can show the number of users connected to WorkSpaces, session latency, and more.

Additional features

  • API support (SDK, AWS CLI).
  • WorkSpaces Application Manager (WAM).
  • Custom images.
  • Audio input.
  • Pre-built applications in the AWS Marketplace which can be added to our WorkSpaces.
  • User control at the directory level.
  • Integration with WorkDocs.
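As a companion to the “API support (SDK, AWS CLI)” bullet above, here is a hypothetical boto3 sketch for provisioning a WorkSpace programmatically; the directory, user, and bundle IDs are placeholders, and the actual AWS call is shown only in comments:

```python
def workspace_request(directory_id, user_name, bundle_id, running_mode="AUTO_STOP"):
    # RunningMode matches the billing models described earlier:
    # ALWAYS_ON is fixed monthly pricing, AUTO_STOP is on-demand.
    return {
        "DirectoryId": directory_id,
        "UserName": user_name,
        "BundleId": bundle_id,
        "WorkspaceProperties": {"RunningMode": running_mode},
    }

# import boto3
# client = boto3.client("workspaces")
# client.create_workspaces(
#     Workspaces=[workspace_request("d-0123456789", "jdoe", "wsb-abcd1234")])
```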


By adopting AWS WorkSpaces, we can enable end users to securely access the business applications and documents they currently use from their organization’s devices or existing VDI solutions, experience seamless desktop performance on the cloud, and access their WorkSpaces in the most secure way, preventing data breaches by enabling encryption options and restricting client devices for users.

There are added benefits, like reducing the overhead of maintaining existing hardware and purchasing new hardware. Monitoring and managing end-user WorkSpaces becomes an easy task by integrating with AWS native services.