Continuing our discussion on Azure Data Factory (ADF) from our previous blogs: in the past we have covered ADF and the configuration steps for a high-availability self-hosted integration runtime (IR). You can read more about that here: Azure Data Factory – Setting up Self-Hosted IR HA enabled
This is a short post on IR sharing across ADFs for better cost optimization and resource utilization. It also covers a common pitfall when creating ADFs using Terraform and/or the SDKs.
Use Case
This is again part of a major data migration assignment from AWS to Azure. We are extensively using ADF to set up ETL pipelines and migrate data effectively – both historical and incremental data.
Problem Statement
Since the data migration activity involves different types of databases and complex data operations, we are using multiple ADFs to achieve this. Handling private production data required self-hosted IRs configured to connect to the production environment. The general best practice for self-hosted IRs is a high-availability architecture: an IR can have a maximum of 4 nodes, with a minimum of 2 nodes needed for high availability. So here arises the problem – for multiple ADFs, how many such self-hosted IRs would one use to power this?
Solution
This is where IR sharing comes into the picture. ADF has a brilliant feature wherein many ADFs can share the same IR, the advantage being reduced price and resource usage. Suppose you had to run 2 ADFs – one performing various heavy migrations from AWS RDS MySQL to Azure, and the other from AWS RDS PostgreSQL. Ideally we would have created 2 different IRs, one each able to connect to MySQL and PostgreSQL separately. For a production-level implementation, this would mean 2 × 4 = 8 nodes (Windows VMs). Using IR sharing, we can create one self-hosted IR with 4 nodes and share it with both ADFs, cutting the cost of 4 extra nodes. Please note – IR node sizing depends on your workloads; that's a separate calculation. This is only a high-level consideration.
Steps to enable IR sharing between ADFs
Step 1: Log in to the Azure Portal.
Step 2: Search for Data Factories in the main search bar.
Step 3: Select your Data Factory. Click on Author & Monitor.
Click on Pencil icon to edit.
Step 4: Click on Connections. Open Management Hub.
Step 5: Click on Integration runtimes to view all your IRs. Select your self-hosted IR for which you want to enable sharing.
Step 6: This opens the Edit integration runtime tab on the right side. Go to Sharing and Click on + Grant permission to another Data Factory.
Copy the Resource ID from this step. We will use it in Step 9.
This will list down all ADFs with which you can share this IR.
Step 7: You can either search your ADF or manually enter service identity application ID. Click on Add
Note: You may sometimes be unable to find the ADF in this dropdown list. Even though your ADF appears on the Data Factories page, it does not show up here, which can leave you puzzled. Not to worry – this typically happens when you create ADFs programmatically using the Azure APIs or through Terraform. Don't forget to add the optional identity block while creating the factory; this assigns a system-generated managed identity to the resource.
Sample Terraform for ADF
provider "azurerm" {
version = "~>2.0"
features {}
}
resource "azurerm_data_factory" "adf-demo" {
name = "adf-terraform-demo"
location = "East US 2"
resource_group_name = "DEMO-ADF-RG"
identity {
type = "SystemAssigned"
}
}
To locate the service identity ID of the ADF, go to the Data Factories page, select the ADF, and click on Properties. You can also read it programmatically, as sketched below.
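If you prefer to look this up with the Azure SDK for Python, the following minimal sketch reads the system-assigned identity of the factory created by the Terraform sample above. It assumes the azure-identity and azure-mgmt-datafactory packages are installed; the subscription ID is a placeholder.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder subscription ID; resource group and factory names match the Terraform sample above.
subscription_id = "<your-subscription-id>"
client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

factory = client.factories.get("DEMO-ADF-RG", "adf-terraform-demo")
# principal_id is the object ID of the system-assigned managed identity.
print(factory.identity.principal_id)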
Step 8: Click on Apply for the changes to take effect.
In case you do not have the required permissions, you might get the following error:
Error occurred when grant permission to xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx. Error: {"error":{"code":"AuthorizationFailed","message":"The client 'xxxxxxxx@powerupcloud.com' with object id 'xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx/resourcegroups/DEMO-ADF-RG/providers/Microsoft.DataFactory/factories/adf-terraform-demo-Postgres-to-MySQL/integrationRuntimes/integrationRuntime3/providers/Microsoft.Authorization/roleAssignments/xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx' or the scope is invalid. If access was recently granted, please refresh your credentials."}}
Step 9:
Now go to the ADF with which the IR has to be shared (the one added to the sharing list – adf-terraform-demo). Go to Connections → Integration runtimes → + New → Azure, Self-Hosted
Here you will find Type as Self-Hosted (Linked). Enter the Resource ID from Step 6 and Create.
After successful creation, you can find the new IR with sub-type Linked
The IR sharing setup is complete. Be seamless with your ADF pipelines now.
Conclusion
Sharing IRs between ADFs can greatly reduce infrastructure costs, and the setup is simple and effective. We will come up with more ADF use cases and share our problem statements, approaches, and solutions.
Hope this was informative. Do leave your comments below for any questions.
Compiled by Kiran Kumar, Business Analyst at Powerupcloud Technologies.
The battle of the Big 3 Cloud Service Providers
The cloud ecosystem is in a constant state of evolution; with increasing maturity and adoption, the battle for the mind and wallet intensifies. With Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) leading in IaaS maturity, players ranging from Salesforce, SAP, and Oracle to Workday (which recently reached $1B in quarterly revenue) are both gaining ground and carving out niches in the ‘X’aaS space. The recent COVID crisis has accelerated both adoption and consideration as enterprises transform to cope, differentiate, and sustain an advantage over the competition.
In this article, I will stick to referencing AWS, Azure, and GCP, terming them the Big 3. A disclaimer: Powerup is a top-tier partner with all three, and the comparisons are purely objective, based on currently publicly available information. It is very likely that by the time you read this article, a lot might have already changed. Having said that, the future will belong to those who excel in providing managed solutions around artificial intelligence, analytics, IoT, and edge computing. So let's dive right in:
Amazon Web Services – The oldest among the three and the most widely known, AWS showcases the biggest spread of availability zones and an extensive roster of services. It has capitalized on its maturity to activate a developer ecosystem globally, which has proven to be a critical enabler of its widespread use.
Microsoft Azure – Azure is the closest that one gets to AWS in terms of products and services. While AWS has fully leveraged its head start, Azure tapped into Microsoft's huge enterprise customer base and let those customers take advantage of their already existing infrastructure by providing better value through Windows support and interoperability.
Google Cloud Platform – Google Cloud was announced in 2011; for a platform less than a decade old, it has created a significant footprint. It was initially intended to strengthen Google's own products, but it later came up with an enterprise offering. A lot is expected from its deep expertise in AI, ML, deep learning, and data analytics to give it a significant edge over the other providers.
AWS vs. Azure vs. Google Cloud: Overall Pros and Cons
In this analysis, I dive into broad technical aspects of these 3 cloud providers based on the common parameters listed below.
Compute
Storage
Exclusives
Compute
AWS Compute:
Amazon EC2: EC2, or Elastic Compute Cloud, is Amazon's compute offering. EC2 supports multiple instance types (bare metal, GPU, Windows, Linux, and more) and can be launched with different security and networking options; you can choose from a wide range of templates based on your use case. EC2 can both resize and autoscale to handle changes in requirements, which eliminates the need for complex governance.
Amazon Elastic Container Service (ECS) is a highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications, manage and scale a cluster of VMs, or schedule containers on those VMs.
Amazon EKS makes it easy to deploy, manage, and scale containerized applications using Kubernetes on AWS.
AWS also has its Fargate service, which automates server and cluster management for containers; Lightsail, a virtual private server offering; AWS Batch for batch computing jobs; Elastic Beanstalk for running and scaling web applications; and Lambda for serverless applications.
Container services also include Amazon Elastic Container Registry (ECR), a fully managed Docker container registry that allows you to store, manage, and deploy Docker container images.
Azure Compute:
Azure VMs: Azure Virtual Machines are a secure and highly scalable compute solution, with various instance types optimized for high-performance computing, AI/ML workloads, and container instances. In line with Azure's emphasis on hybrid computing, they support multiple OS types as well as Microsoft software and services. Virtual Machine Scale Sets are used to auto-scale your instances.
Azure container services include Azure Kubernetes Service (AKS), a fully managed Kubernetes-based container solution.
Azure Container Registry lets you store and manage container images across all types of Azure deployments.
Service Fabric is a fully managed service that lets you develop microservices and orchestrate containers on Windows or Linux.
Other services include Web App for Containers, which lets you run, scale, and deploy containerized web apps; Azure Functions for serverless applications; and Azure Red Hat OpenShift for OpenShift support.
Google Compute Engine:
Google Compute Engine (GCE) is Google's compute service. Google is fairly new to the cloud compared to the other two CSPs, and this is reflected in its catalog of services. GCE offers the standard array of features: Windows and Linux instances, RESTful APIs, load balancing, data storage and networking, CLI and GUI interfaces, and easy scaling. Backed by Google's infrastructure, GCE can spin up instances faster than most of its competition in many cases. It runs on carbon-neutral infrastructure and offers great value for your money.
Google Kubernetes Engine (GKE) is based on Kubernetes, which was originally developed in-house at Google. Google has the deepest expertise when it comes to Kubernetes and has integrated it tightly into the Google Cloud Platform. GKE can be used to automate many of your deployment, maintenance, and management tasks, and it can also be used with hybrid clouds via the Anthos service.
Storage
AWS Storage:
Amazon S3 is an object storage service that offers scalability, data availability, security, and performance for most of your storage requirements. Amazon Elastic Block Store (EBS) provides persistent block storage for your Amazon EC2 instances, and Elastic File System (EFS) offers scalable file storage.
Other storage services include S3 Glacier, a secure, durable, and extremely low-cost storage service for data archiving and long-term backup; Storage Gateway for hybrid storage; and Snowball, a device used for offline small-to-medium-scale data transfer.
Database
Other database services include Amazon Aurora, a MySQL- and PostgreSQL-compatible relational database; RDS (Relational Database Service); DynamoDB, a NoSQL database; Amazon ElastiCache, an in-memory data store; Redshift, a data warehouse; and Amazon Neptune, a graph database.
Azure Storage:
Azure Blobs is a massively scalable object storage solution that includes support for big data analytics through Data Lake Storage Gen2; Azure Files is a managed file storage solution with support for on-prem access; Azure Queues is a reliable messaging store; and Azure Tables is a NoSQL store for structured data.
Azure Disks provides block-level storage volumes for Azure VMs, similar to Amazon EBS.
Database
Database services include SQL-based offerings like Azure SQL Database, Azure Database for MySQL, and Azure Database for PostgreSQL; Cosmos DB and Table Storage for NoSQL; SQL Server Stretch Database, a hybrid storage service designed specifically for organizations leveraging Microsoft SQL Server on-prem; and Azure Cache for Redis, an in-memory data store.
Google Cloud Storage:
GCP's storage services include Google Cloud Storage, a unified, scalable, and highly durable object store; Filestore, network-attached storage (NAS) for Compute Engine and GKE instances; Persistent Disk, block storage for VM instances; and Transfer Appliance for large data transfers.
Database
On the database side, GCP has NoSQL databases such as Cloud Bigtable for big data, Firestore, a document database for mobile and web application data, and Firebase Realtime Database, a cloud database for storing and syncing data in real time; BigQuery for analytics; Memorystore for in-memory storage; and, on the relational side, Cloud SQL and Cloud Spanner, which is designed for mission-critical workloads.
Benchmarks Reports
An additional drill-down would be to analyze performance figures for the three across network, storage, and CPU; here I quote research data from a study conducted by Cockroach Labs.
Network
GCP has taken significant strides in network throughput and latency compared to last year, and it even outperforms AWS and Azure in network performance.
Some of GCP's best-performing machines hover around 40-60 Gb/sec.
AWS machines stick to their claims and offer a consistent 20 to 25 Gb/sec.
Azure's machines offered significantly less, at around 8 Gb/sec.
When it comes to latency, AWS outshines the competition by offering consistent performance across all of its machines.
GCP does undercut AWS in some cases but still lacks the consistency of AWS.
Azure's weak network performance is reflected in high latency, making it the least performant of the three.
NOTE: GCP attributes this increase in network performance to the use of Skylake processors in its n1 family of machines.
Storage
AWS has superior performance in storage; neither GCP nor Azure comes close to its read-write speeds and latency figures. This is largely due to storage-optimized instances like the i3 series. Azure and GCP do not have storage-optimized instances and deliver performance comparable to the non-storage-optimized instances from Amazon. Of the two, Azure offered slightly better read-write speeds, while GCP offered better latency.
CPU
While comparing CPU performance, Azure machines showcased slightly higher figures thanks to their use of conventional 16-core CPUs: Azure machines use 16 physical cores with a single thread per core, while the other clouds use hyperthreading to reach 16 vCPUs by combining 8 cores with 2 threads each. After comparing each offering across the three platforms, here is the best each cloud platform has to offer:
AWS c5d.4xlarge – 25,000 to 50,000 Bogo ops per sec
Azure Standard_DS14_v2 – just over 75,000 Bogo ops per sec
GCP c2-standard-16 – 25,000 to 50,000 Bogo ops per sec
While the AWS and GCP figures look similar, AWS overall performs slightly better than GCP.
Avoiding hyperthreading has inflated Azure's figures; while Azure may still be superior in performance, the numbers may not accurately represent the actual difference in the performance it offers.
Going forward, technologies like artificial intelligence, machine learning, the Internet of Things (IoT), and serverless computing will play a huge role in shaping the technology industry. Most services and products will try to take advantage of these technologies to deliver solutions more efficiently and with precision. All of the "Big 3" providers have begun experimenting with offerings in these areas, and this can very well be the key differentiator between them.
AWS Key Tools:
Some of the latest additions to the AWS portfolio include AWS Graviton processors built using 64-bit Arm Neoverse cores. The EC2 M6g, C6g, and R6g instances are powered by these new-generation processors. Thanks to the power-efficient Arm architecture, they are said to provide 40% better price performance over comparable x86-based instances.
AWS Outposts: Outposts is Amazon's emphasis on hybrid architecture; it is a fully managed ITaaS solution that brings AWS infrastructure and services to virtually any site by physically deploying AWS hardware there. It is aimed at offering a consistent hybrid experience with the scalability and flexibility of AWS.
AWS has put a lot of time and effort into developing a relatively broad range of products and services in the AI and ML space. Some of the important ones include the Amazon SageMaker service for training and deploying machine learning models, the Lex conversational interface and the Polly text-to-speech service (the same technologies behind Alexa), the Greengrass IoT edge service, and the Lambda serverless computing service.
There are also AI-powered offerings like DeepLens, a deep-learning-enabled camera that can be trained and used for OCR and image and character recognition, and Gluon, an open-source deep-learning library designed to build and quickly train neural networks without having to know AI programming.
Azure Key Tools:
When it comes to hybrid support, Azure offers a very strong proposition, with services like Azure Stack and Azure Arc minimizing your risk of going wrong. Knowing that a lot of enterprises already use Microsoft's services, Azure tries to deepen this relationship by offering enhanced security and flexibility through its hybrid services. With Azure Arc, customers can manage resources deployed within and outside of Azure through the same control plane, enabling organizations to extend Azure services to their on-prem data centers.
Azure also offers a comprehensive family of AI services and cognitive APIs that help you build intelligent apps; services like the Bing Web Search API, Text Analytics API, Face API, Computer Vision API, and Custom Vision Service come under this umbrella. For IoT, it has several management and analytics services, and it also has a serverless computing service known as Azure Functions.
Google Cloud Key Tools:
AI and machine learning are big areas of focus for GCP. Google is a leader in AI development, thanks to TensorFlow, an open-source software library for building machine learning applications. It is the single most popular library in the market, and AWS adding support for TensorFlow is an acknowledgment of this.
Google Cloud has strong offerings in APIs for natural language, speech, translation, and more. Additionally, it offers IoT and serverless services, though several of these are still in beta. Google has also been working extensively on Anthos; as quoted by Sundar Pichai, Anthos follows a "write once, run anywhere" approach by allowing organizations to run Kubernetes workloads on-premises, on AWS, or on Azure (Azure support was still in beta at the time of writing).
Verdict
Each of the three has its own set of features and comes with its own constraints and advantages. The selection of the appropriate cloud provider should therefore, as with most enterprise software, be based on your organizational goals over the long term.
However, we strongly believe that multi-cloud will be the way forward for most organizations. For example, if an organization is an existing user of Microsoft's services, it is natural for it to prefer Azure. Most small, web-based, digitally native companies looking to scale quickly by leveraging AI/ML and data services would want to take a good look at Google Cloud. And of course, AWS, with its sheer scale of products and services and its maturity, is very hard to ignore in any mix.
Hope this sheds some light on the technical considerations; we will follow this up with some of the other key evaluation factors that we think you should consider while selecting your cloud provider.
Written by Arun Kumar, Associate Cloud Architect at Powerupcloud Technologies
In the traditional ETL world, we generally use our own scripts, a paid tool, an open-source data processing tool, or an orchestrator to deploy our data pipelines. If the data processing pipeline is not complex, these server-based solutions can add unnecessary cost (considering we are deploying non-complex pipelines). In AWS we have multiple serverless options, such as Lambda and Glue, but Lambda has an execution time limit and Glue runs an EMR cluster in the background, so it can ultimately get expensive. So we decided to explore AWS Step Functions with Lambda, which are serverless and at the same time act as an orchestration service that executes our process on an event basis and terminates the resources once the process completes. Let's see how we can build a data pipeline with this.
Architecture Description:
The Teradata server on-prem will send the input.csv file to the S3 bucket (data_processing folder) on a scheduled basis.
A CloudWatch Event Rule will trigger the Step Function on a PutObject event in the specified S3 bucket and start processing the input file.
The cleansing scripts are placed on ECS.
The AWS Step Function will call a Lambda function, which will trigger ECS tasks (a bunch of Python and R scripts).
Once the cleansing is done, the output file will be uploaded to the target S3 bucket.
An AWS Lambda function will be triggered to get the output file from the target bucket and send it to the respective team.
Create a custom CloudWatch Event Rule for S3 put object operation
Choose Event Pattern -> Service Name -> S3 -> Event Type -> Object level operations -> choose put object -> give the bucket name.
In targets choose the step function to be triggered -> give the name of the state machine created.
Create a new role or use an existing role, as the CloudWatch event requires permission to start executions of your Step Function.
Choose one more target to trigger the Lambda function -> choose the function which we created before. (The same rule can also be created with the AWS SDK, as sketched below.)
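For reference, here is a minimal boto3 sketch of the same rule. The bucket name, rule name, state machine ARN, and role ARN are placeholders; note that object-level S3 events also require a CloudTrail trail logging data events for the bucket.

# Minimal sketch: CloudWatch Events rule that starts the state machine on S3 PutObject.
# Assumes a CloudTrail trail is already capturing S3 data events for this bucket.
import json
import boto3

events = boto3.client("events")

event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject"],
        "requestParameters": {"bucketName": ["my-input-bucket"]},  # placeholder bucket
    },
}

events.put_rule(
    Name="s3-putobject-trigger",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

events.put_targets(
    Rule="s3-putobject-trigger",
    Targets=[{
        "Id": "start-data-pipeline",
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:data-cleansing",  # placeholder
        "RoleArn": "arn:aws:iam::123456789012:role/cwe-start-stepfunction",           # placeholder
    }],
)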
Go to the AWS Management Console and search for Step Functions.
Create a state machine.
On the Define state machine page, select Author with code snippets.
Give it a name, then review the state machine definition and visual workflow.
Use the graph in the Visual Workflow pane to check that your Amazon States Language code describes your state machine correctly.
Create an IAM role, or select an existing one if you have previously created an IAM role for Step Functions. A minimal example definition is sketched below.
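As an illustration, a minimal Amazon States Language definition for this pipeline could look like the sketch below, created through boto3. The Lambda ARN, role ARN, and state machine name are placeholders; your actual definition will reflect your own states.

# Minimal sketch: a one-step state machine that invokes the Lambda which runs the ECS task.
import json
import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "Comment": "Trigger the ECS cleansing task via Lambda",
    "StartAt": "RunCleansingTask",
    "States": {
        "RunCleansingTask": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:trigger-ecs-task",  # placeholder
            "End": True,
        }
    },
}

sfn.create_state_machine(
    name="data-cleansing",                 # placeholder state machine name
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/stepfunctions-execution-role",  # placeholder
)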
Create an ECS with Fargate
Go to the ECS console -> Create Cluster -> choose the cluster template Networking only -> Next -> Configure cluster -> give the cluster a name -> Create.
In the navigation pane, choose Task Definitions, Create new Task Definition.
On the Select compatibilities page, select the launch type that your task should use and choose Next step. Choose Fargate launch type.
For Task Definition Name, type a name for your task definition.
For Task Role, choose an IAM role that provides permissions for containers in your task to make calls to AWS API operations on your behalf.
To create an IAM role for your tasks
a. Open the IAM console.
b. In the navigation pane, choose Roles, Create New Role.
c. In the Select Role Type section, for the Amazon Elastic Container Service Task Role service role, choose Select.
d. In the Attach Policy section, select the policy to use for your tasks and then choose Next Step.
e. For Role Name, enter a name for your role. Choose Create Role to finish.
For Task execution IAM role, either select your existing task execution role or choose Create new role so that the console can create one for you.
For Task size, choose a value for Task memory (GB) and Task CPU (vCPU).
For each container in your task definition, complete the following steps:
a. Choose Add container.
b. Fill out each required field and any optional fields to use in your container definitions.
c. Choose Add to add your container to the task definition. (A boto3 sketch of the equivalent task definition registration follows these steps.)
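The console steps above can also be expressed with boto3; below is a minimal sketch of registering a Fargate task definition. The image URI, role ARNs, and task size are placeholders and should match your own cleansing container.

# Minimal sketch: register a Fargate task definition for the cleansing container.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="data-cleansing",                     # placeholder task definition name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",                                  # 1 vCPU (placeholder size)
    memory="2048",                               # 2 GB (placeholder size)
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    taskRoleArn="arn:aws:iam::123456789012:role/ecsTaskRole",                # placeholder
    containerDefinitions=[{
        "name": "cleansing",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/cleansing:latest",  # placeholder image
        "essential": True,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/data-cleansing",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "cleansing",
            },
        },
    }],
)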
Create a Lambda function
Create a Lambda function to call ECS.
In the Lambda console -> Create function -> choose Author from scratch -> give the function a name -> for Runtime choose Python 3.7 -> create a new role, or if you have an existing role, choose one with the required permissions [Amazon ECS Full Access, AWS Lambda Basic Execution Role]. A sketch of such a function is shown below.
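Below is a minimal sketch of what such a Lambda handler could look like. The cluster name, task definition, subnets, and security groups are placeholders for your own environment.

# Minimal sketch: Lambda handler that launches the Fargate cleansing task.
import boto3

ecs = boto3.client("ecs")

def lambda_handler(event, context):
    response = ecs.run_task(
        cluster="data-cleansing-cluster",           # placeholder cluster name
        launchType="FARGATE",
        taskDefinition="data-cleansing",             # placeholder task definition
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],      # placeholder subnet
                "securityGroups": ["sg-0123456789abcdef0"],   # placeholder security group
                "assignPublicIp": "ENABLED",
            }
        },
    )
    # Return the started task ARN so the Step Function execution history records it.
    return {"taskArn": response["tasks"][0]["taskArn"]}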
In Name, type a descriptive name for your event configuration.
Under Events, select one or more of the type of event occurrences that you want to receive notifications for. When the event occurs a notification is sent to a destination that you choose.
Type an object name Prefix and/or a Suffix to filter the event notifications by the prefix and/or suffix.
Select the type of destination to have the event notifications sent to.
If you select the Lambda Function destination type, do the following:
In Lambda Function, type or choose the name of the Lambda function that you want to receive notifications from Amazon S3, and choose Save.
Create a lambda function with Node.js
Note: Provide the bucket name, folder, file name, and a verified (SES) email address.
var aws = require('aws-sdk');
var nodemailer = require('nodemailer');
var ses = new aws.SES({ region: 'us-east-1' });
var s3 = new aws.S3();

function getS3File(bucket, key) {
  return new Promise(function (resolve, reject) {
    s3.getObject(
      {
        Bucket: bucket,
        Key: key
      },
      function (err, data) {
        if (err) return reject(err);
        else return resolve(data);
      }
    );
  });
}

exports.handler = function (event, context, callback) {
  getS3File('window-demo-1', 'output/result.csv')
    .then(function (fileData) {
      var mailOptions = {
        from: 'arun.kumar@powerupcloud.com',
        subject: 'File uploaded in S3 succeeded!',
        html: `<p>You got a contact message from: <b>${event.emailAddress}</b></p>`,
        to: 'arun.kumar@powerupcloud.com',
        attachments: [
          {
            filename: "result.csv",
            content: fileData.Body
          }
        ]
      };
      console.log('Creating SES transporter');
      // create Nodemailer SES transporter
      var transporter = nodemailer.createTransport({
        SES: ses
      });
      // send email
      transporter.sendMail(mailOptions, function (err, info) {
        if (err) {
          console.log(err);
          console.log('Error sending email');
          callback(err);
        } else {
          console.log('Email sent successfully');
          callback();
        }
      });
    })
    .catch(function (error) {
      console.log(error);
      console.log('Error getting attachment from S3');
      callback(error);
    });
};
Conclusion:
If you are looking for a serverless orchestrator for your batch processing or for a complex data processing pipeline, give AWS Step Functions and Lambda a try; here we used ECS Fargate to cleanse the data. If your data processing script is more complex, you can integrate with Glue, but Step Functions will still act as your orchestrator.
Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies
Introduction
In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn’t have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision-makers.
Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory (ADF) is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
This is how Azure introduces you to ADF. You can refer to the Azure documentation on ADF to know more.
Simply said, ADF is an ETL tool that helps you connect to various data sources to load data, perform transformations as per your business logic, and store the results in different types of storage. It is a powerful tool and helps solve a variety of use cases.
In this blog, we will create a self hosted integration runtime (IR) with two nodes for high availability.
Use Case
A reputed OTT client building an entire Content Management System (CMS) application on Azure had to migrate their old or historical data from AWS, which hosts their current production environment. That's when ADFs with self-hosted IRs come to the rescue – we were required to connect to a different cloud, a different VPC, a private network, or on-premises data sources.
Our use case here was to read data from a production AWS RDS MySQL server inside a private VPC from ADF. To make this happen, we set up a two-node self-hosted IR with high availability (HA).
Pre-requisites
Windows Server VMs (Min 2 – Node1 & Node2)
.NET Framework 4.6.1 or later
For working with Parquet, ORC, and Avro formats you will also require a 64-bit Java runtime (JRE 8 or OpenJDK) on the IR machine.
Step 2: Search for Data Factory in the Search bar. Click on + Add to create a new Data Factory.
Step 3: Enter a valid name for your ADF.
Note: The name can contain only letters, numbers, and hyphens. The first and last characters must be a letter or number. Spaces are not allowed.
Select the Subscription & Resource Group you want to create this ADF in. It is usually a good practice to enable Git for your ADF. Apart from being able to store all your code safely, this also helps you when you have to migrate your ADF to a production subscription. You can get all your pipelines on the go.
Step 4: Click Create
You will need to wait for a few minutes, till your deployment is complete. If you get any error messages here, check your Subscription & Permission level to make sure you have the required permissions to create data factories.
Click on Go to resource
Step 5:
Click on Author & Monitor
Next, click on the Pencil button on the left side panel
Step 6: Click on Connections
Step 7: Under Connections tab, click on Integration runtimes, click on + New to create a new IR
Step 8: On clicking New, you will be taken to the IR set-up wizard.
Select Azure, Self-Hosted and click on Continue
Step 9: Select Self-Hosted and Continue
Step 10: Enter a valid name for your IR, and click Create
Note: Integration runtime Name can contain only letters, numbers and the dash (-) character. The first and last characters must be a letter or number. Every dash (-) character must be immediately preceded and followed by a letter or a number. Consecutive dashes are not permitted in integration runtime names.
Step 11:
On clicking Create, your IR will be created.
Next you will need to install the IR on your Windows VMs. At this point you should log in to your VM (Node1), or wherever you want to install your IR.
You are provided with two options for installation :
Express Setup – This is the easiest way to install and configure your IRs. We are following the Express Setup in this setup. Connect to your Windows Server where you want to install.
Login to Azure Portal in your browser (inside your VM) → Data Factory → select your ADF → Connections → Integration Runtimes → integrationRuntime1 → Click Express Setup → Click on the link to download setup files.
Manual Setup – You can download the integration runtime and add the authentication keys to validate your installation.
Step 12: Express Setup
Click on the downloaded file.
On clicking on the downloaded file, your installation will start automatically.
Step 13:
Once the installation and authentication is successfully completed, go to the Start Menu → Microsoft Integration Runtime → Microsoft Integration Runtime
Step 14: You will need to wait till your node is able to connect to the cloud service. If for any reason you get an error at this step, you can troubleshoot by referring to the self-hosted integration runtime troubleshooting guide.
Step 15: High availability
One node setup is complete. For high availability, we will need to set up at least 2 nodes. An IR can have a max of 4 nodes.
Note: Before setting up other nodes, you need to enable remote access. Make sure you do this on your very first node, i.e., while the IR still has a single node; you might face connectivity issues later if you forget this step.
Go to Settings tab and Click on Change under Remote access from intranet
Step 16:
Select Enable without TLS/SSL certificate (Basic) for dev/test purposes, or use TLS/SSL for a more secure connection.
You can select a different TCP port – else use the default 8060
Step 17:
Click on OK. Your IR will need to be restarted for this change to take effect. Click OK again.
You will notice remote access enabled for your node.
Step 18:
Login to your other VM (Node2). Repeat Steps 11 to 17. At this point you will probably get a Connection Limited message stating your nodes are not able to connect to each other. Guess why? We will need to enable inbound access to port 8060 for both nodes.
Go to Azure Portal → Virtual Machines → Select your VM (Node1) → Networking.
Click on Add inbound port rule
Step 19:
Select Source → IP Addresses → Set Source IP as the IP of your Node2. Node2 will need to connect to Port 8060 of Node 1. Click Add
For example, Node1 IP – 10.0.0.1 and Node2 IP – 10.0.0.2. You can use either private or public IP addresses.
We will need to do a similar exercise for Node2.
Go to the VM page of Node2 and add an inbound rule for port 8060. Node1 and Node2 need to be able to communicate with each other via port 8060. (If you manage your network security groups with the Azure SDK, a sketch of the equivalent rule follows.)
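As an illustration, a rule like the one above could be created with the Azure SDK for Python, as sketched below. The resource group, NSG name, rule name, priority, and IPs are assumptions; adjust them to your environment.

# Minimal sketch: allow Node2 (10.0.0.2) to reach port 8060 on Node1's NSG.
# Assumes the azure-identity and azure-mgmt-network packages are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

network_client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

network_client.security_rules.begin_create_or_update(
    "ADF-IR-RG",          # placeholder resource group
    "node1-nsg",          # placeholder NSG name
    "Allow-IR-8060",      # placeholder rule name
    {
        "protocol": "Tcp",
        "access": "Allow",
        "direction": "Inbound",
        "priority": 310,                       # any unused priority
        "source_address_prefix": "10.0.0.2",   # Node2's IP
        "source_port_range": "*",
        "destination_address_prefix": "*",
        "destination_port_range": "8060",
    },
).result()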
Step 20:
If you go to your IR inside your Node1 and Node2, you will see the green tick implying your nodes are successfully connected to each other and also to the cloud. You can wait for some time for this sync to happen. If for some reason, you get an error at this step, you can view integration runtime logs from Windows Event Viewer to further troubleshoot. Restart both of your nodes.
To verify this connection, you can also check in the ADF Console.
Go to your Data Factory → Monitor (Watch symbol on the left panel, below Pencil symbol – Check Step 5) → Integration runtimes
Here you can see the number of registered nodes and their resource utilization. The HIGH AVAILABILITY ENABLED feature is turned ON now.
Step 21: Test Database connectivity from your Node
If you want to test database connectivity from your Node, make sure you have whitelisted the Public IP of your Node at the Database Server inbound security rules.
For example, if your Node1 has a public IP address of 66.66.66.66 and needs to connect to an AWS RDS MySQL server, go to your RDS security group and add an inbound rule for your MySQL port for this IP.
To test this, log in to your Node1 → Start → Microsoft Integration Runtime → Diagnostics → add your RDS connection details → click on Test.
Conclusion
This brings you to the end of successfully setting up a self-hosted IR with high availability enabled.
Hope this was informative. Do leave your comments below. Thanks for reading.
Author: Niraj Kumar Gupta, Cloud Consulting at Powerupcloud Technologies.
Contributors: Mudit Jain, Hemant Kumar R and Tiriveedi Srividhya
INTRODUCTION TO SERVICES USED
CloudWatch Metrics
Metrics are abstract data points indicating performance of your systems. By default, several AWS services provide free metrics for resources (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances).
CloudWatch Alarms
AWS CloudWatch Alarm is a powerful service provided by Amazon for monitoring and managing our AWS services. It provides us with data and actionable insights that we can use to monitor our application/websites, understand and respond to critical changes, optimize resource utilization, and get a consolidated view of the entire account. CloudWatch collects monitoring and operational information in the form of logs, metrics, and events. You can configure alarms to initiate an action when a condition is satisfied, like reaching a pre-configured threshold.
CloudWatch Dashboard
Amazon CloudWatch Dashboards is a feature of AWS CloudWatch that offers basic monitoring home pages for your AWS accounts. It provides resource status and performance views via graphs and gauges. Dashboards can monitor resources in multiple AWS regions to present a cohesive account-wide view of your accounts.
CloudWatch Composite Alarms
Composite alarms enhance the existing alarm capability by giving customers a way to logically combine multiple alarms. A single infrastructure event may generate multiple alarms, and the volume of alarms can overwhelm operators or mislead the triage and diagnosis process. If this happens, operators can end up dealing with alarm fatigue or waste time reviewing a large number of alarms to identify the root cause. Composite alarms give operators the ability to add logic and group alarms into a single high-level alarm, which is triggered when the underlying conditions are met. This lets operators make intelligent decisions and reduces the time to detect, diagnose, and resolve performance issues when they happen.
What are Anomaly detection-based alarms?
Amazon CloudWatch Anomaly Detection applies machine-learning algorithms to continuously analyze system and application metrics, determine a normal baseline, and surface anomalies with minimal user intervention. You can use Anomaly Detection to isolate and troubleshoot unexpected changes in your metric behavior.
Why Composite Alarms?
Simple Alarms monitor single metrics. Most of the alarms triggered, limited by the design, will be false positives on further triage. This adds to maintenance overhead and noise.
Advanced use cases cannot be conceptualized and achieved with simple alarms.
Why Anomaly Detection?
Static alarms trigger based on fixed upper and/or lower limits. There is no direct way to change these limits based on the day of the month, the day of the week, and/or the time of the day. For most businesses these values change massively over different times of the day, especially for metrics impacted by user behavior, like incoming or outgoing traffic. This leaves static alarms futile for most of the time.
Anomaly detection is essentially inexpensive, ML-based regression on the metrics.
Solution Overview
Request count → monitored by anomaly-detection-based Alarm1.
Cache hit rate → monitored by anomaly-detection-based Alarm2.
Alarm1 and Alarm2 → monitored by composite Alarm3.
Alarm3 → sends notification(s) to SNS2, which has a Lambda endpoint as a subscription.
Lambda function → sends a custom notification with the CloudWatch Dashboard link to the distribution lists subscribed in SNS1.
Solution
Prerequisites
Enable additional CloudFront Cache-Hit metrics.
Configuration
This is applicable to all of the enterprise's CloudFront CDN distributions.
1. We will configure an anomaly detection alarm on the request count increasing by 10% (for example) over the expected average.
2. We will add an anomaly detection alarm on the CacheHitRate percentage going lower than a standard deviation of 10% (for example) from the expected average.
3. We will create a composite alarm for the above-mentioned alarms using a logical AND operation.
4. Create a CloudWatch Dashboard with all required information in one place for quick access.
5. Create a lambda function:
This will be triggered by SNS2 (an SNS topic) when the composite alarm state changes to "ALARM". The Lambda function will then send custom notifications (email alerts) to the users via SNS1 (another SNS topic).
The target ARN should be SNS1, where the users' email IDs are configured as endpoints.
In the message section, type the custom message which needs to be sent to the user; here we have mentioned the CloudWatch dashboard URL.
6. Create two SNS topics:
SNS1 – With EMAIL alerts to users [preferably to email distribution list(s)].
SNS2 – A Lambda function subscription with code sending custom notifications via SNS1 with links to CloudWatch dashboard(s). Same lambda can be used to pick different dashboard links based on the specific composite alarm triggered, from a DynamoDB table with mapping between SNS target topic ARN to CloudWatch Dashboard link.
7. Add a notification action to the composite alarm to send notifications to SNS2. (A boto3 sketch of steps 1-3 and 7 follows.)
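A minimal boto3 sketch of the alarm portion of this setup is shown below. The distribution ID, alarm names, evaluation periods, the anomaly band width, and the SNS topic ARN are assumptions to adjust for your own CloudFront distribution.

# Minimal sketch: two anomaly-detection alarms plus a composite alarm notifying SNS2.
# CloudFront metrics live in us-east-1, so the client is created there.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
DISTRIBUTION_ID = "E1234567890ABC"                                      # placeholder distribution
SNS2_ARN = "arn:aws:sns:us-east-1:123456789012:composite-alarm-topic"   # placeholder topic

def anomaly_alarm(name, metric_name, stat, comparison):
    """Alarm when the metric leaves its ML-derived expected band."""
    cloudwatch.put_metric_alarm(
        AlarmName=name,
        ComparisonOperator=comparison,
        EvaluationPeriods=3,
        ThresholdMetricId="band",
        Metrics=[
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/CloudFront",
                        "MetricName": metric_name,
                        "Dimensions": [
                            {"Name": "DistributionId", "Value": DISTRIBUTION_ID},
                            {"Name": "Region", "Value": "Global"},
                        ],
                    },
                    "Period": 300,
                    "Stat": stat,
                },
                "ReturnData": True,
            },
            {
                "Id": "band",
                "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",  # band width of 2 is an assumption
                "ReturnData": True,
            },
        ],
    )

anomaly_alarm("Alarm1-Requests", "Requests", "Sum", "GreaterThanUpperThreshold")
anomaly_alarm("Alarm2-CacheHitRate", "CacheHitRate", "Average", "LessThanLowerThreshold")

# Composite alarm: fire only when both underlying alarms are in ALARM, then notify SNS2.
cloudwatch.put_composite_alarm(
    AlarmName="Alarm3-Requests-and-CacheHit",
    AlarmRule="ALARM(Alarm1-Requests) AND ALARM(Alarm2-CacheHitRate)",
    AlarmActions=[SNS2_ARN],
)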
Possible False Positives
A new promotional activity drives unexpected traffic to newly developed promotion pages.
A hotfix goes wrong during a spike in traffic.
Summary
This is one example of implementing a simple setup of composite alarms and anomaly-detection-based alarms to achieve advanced security monitoring. We submit that these are very powerful tools that can be used to design a lot of advanced functionality.
Written by Kiran Kumar, Business analyst at Powerupcloud Technologies.
While the public cloud is a globally accepted and proven solution for CIOs and CTOs looking for a more agile, scalable, and versatile IT environment, there are still questions about security, reliability, and the cloud readiness of enterprises, and it takes a lot of time and resources to become a fully cloud-native organization. This is exacerbated especially for start-ups, as it is too much of a risk to work with these uncertainties. This demands a solution that is innocuous and less expensive, to draw them out of the comforts of their existing on-prem infrastructure.
In such cases, a hybrid cloud is the best approach, providing you with the best of both worlds while keeping pace with all your performance and compliance needs within the comforts of your datacenter.
So what is a hybrid cloud?
Hybrid cloud delivers a seamless computing experience to organizations by combining the power of the public and private cloud and allowing data and applications to be shared between them. It gives enterprises the ability to easily scale their on-premises infrastructure to the public cloud to handle any fluctuations in workload without giving third-party datacenters access to the entirety of their data. Understanding the benefits, various organizations around the world have streamlined their offerings to effortlessly integrate these solutions into their hybrid infrastructures. However, an enterprise has no direct control over the architecture of a public cloud, so for hybrid cloud deployment, enterprises must architect their private cloud to achieve a consistent hybrid experience with the desired public cloud or clouds.
In a 2019 survey of 2,650 IT decision-makers from around the world, respondents reported steady and substantial hybrid deployment plans over the next five years. In addition, a vast majority of respondents – more than 80% – selected hybrid cloud as their ideal IT operating model, and more than half of them cited hybrid cloud as the model that meets all of their needs. More than 60% of them stated that data security is the biggest influencer.
Also, respondents felt that having the flexibility to match the right cloud to each application showcases the scale of adaptability that enterprises can work with in a hybrid multi-cloud environment.
Banking is one of the industries poised to embrace the full benefits of a hybrid cloud; because of how the industry operates, it requires a unique mix of services and an infrastructure which is easily accessible and also affordable:
50 percent of banking executives say they believe the hybrid cloud can lower their cost of IT ownership
47 percent of banking executives say they believe hybrid cloud can improve operating margin
47 percent of banking executives say they believe hybrid cloud can accelerate innovation
Hybrid adoption – best practices and guidelines
Some of the biggest challenges in cloud adoption include security, talent, and costs. According to the report, hybrid computing has shown that it can mitigate security challenges and manage risk by keeping the most important digital assets and data on-prem. Private clouds are still considered an appropriate solution to host and manage sensitive data and applications, and enterprises still need the means to support their conventional enterprise computing models. A sizeable number of businesses still have substantial on-premises assets comprising archaic technology, sensitive collections of data, and tightly coupled legacy apps that either can't be easily moved or swapped for the public cloud.
Here are some guidelines for hybrid adoption.
Have a cloud deployment model for applications and data
Deployment models describe what cloud resources and applications should be deployed and where. Hence it is crucial to understand the two-paced system, i.e., the steady and the fast-paced system, to determine the deployment models.
A steady paced system must continue to support the traditional enterprise applications on-prem to keep the business running and maintain the current on-premise services. Additionally, off-premises services, such as private dedicated IaaS, can be used to increase infrastructure flexibility for enterprise services.
And a fast-paced system is required to satisfy the more spontaneous needs like delivering applications and services quickly whether it’s scaling existing services to satisfy spikes in demand or providing new applications quickly to meet an immediate business need.
The next step is determining where applications and data must reside.
Placement of applications and datasets on private, public, or on-prem infrastructure is crucial, since IT architects must assess the right application architecture to achieve maximum benefit. This includes understanding application workload characteristics and determining the right deployment model for multi-tier applications.
Create heterogeneous environments
To achieve maximum benefit from a hybrid strategy, the enterprise must leverage its existing in-house investments together with cloud services by integrating them efficiently; as new cloud services are deployed, integrating the applications running on them with the various on-premises applications and systems becomes important.
Integration between applications typically includes
Process (or control) integration, where an application invokes another one in order to execute a certain workflow.
Data integration, where applications share common data, or one application’s output becomes another application’s input.
Presentation integration, where multiple applications present their results simultaneously to a user through a dashboard or mashup.
To obtain a seamless integration between heterogeneous environments, the following actions are necessary:
A cloud service provider must support open-source technologies for admin and business interfaces.
Examine the compatibility of in-house systems to work with cloud services providers and also ensure that on-premises applications are following SOA design principles and can utilize and expose APIs to enable interoperability with private or public cloud services.
Leverage the support of third party ID and Access Management functionality to authenticate and authorize access to cloud services. Put in place suitable API Management capabilities to prevent unauthorized access.
Network security requirements
Network type – The technology used for the physical connection over the WAN depends on aspects like bandwidth, latency, service levels, and costs. Hybrid cloud solutions can rely on P2P links as well as the Internet to connect on-premises data centers and cloud providers. The selection of the connectivity type depends on an analysis of aspects like performance, availability, and type of workloads.
Security – The connectivity domain needs to be evaluated and understood in order to match the cloud provider's network security standards with the overall network security policies, guidelines, and compliance requirements. Encrypting and authenticating traffic on the WAN can be evaluated at the application level, and for the computing resources and applications involved, technologies such as VPNs can be employed to provide secure connections between components running in different environments.
Web application security and management services like DNS and DDoS protection, which are available on the cloud, can free up the dedicated resources an enterprise would otherwise need to procure, set up, and maintain such services, letting it concentrate on business applications instead. This is especially applicable to hybrid cloud workloads that have components deployed into a cloud service and exposed to the Internet. Systems deployed on-premises need to adapt to work with the cloud, to facilitate problem identification activities that may span multiple systems with different governance boundaries.
Security and privacy challenges & counter-measures
Hybrid cloud computing has to coordinate between applications and services spanning across various environments, which involves the movement of applications and data between the environments. Security protocols need to be applied across the whole system consistently, and additional risks must be addressed with suitable controls to account for the loss of control over any assets and data placed into a cloud provider’s systems. Despite this inherent loss of control, enterprises still need to take responsibility for their use of cloud computing services to maintain situational awareness, weigh alternatives, set priorities, and effect changes in security and privacy that are in the best interest of the organization.
A single and uniform interface must be used to curtail or deal with risks arising from using services from various cloud providers, since it is likely that each will have its own set of security and privacy characteristics.
Authentication and Authorization. A hybrid environment could mean that gaining access to the public cloud environment could lead to access to the on-premises cloud environment.
Compliance check between cloud providers used and in-home systems.
Counter-measures
A single Id&AM system should be used.
Networking facilities such as VPN are recommended between the on-premises environment and the cloud.
Encryption needs to be in place for all sensitive data, wherever it is located.
Firewalls, DDoS attack handling, etc., need to be coordinated across all environments with external interfaces.
Set up an appropriate data backup & disaster recovery (DB&DR) plan
As already discussed, a hybrid environment provides organizations the option to work with multiple clouds, thus offering business continuity, which has been one of the most important aspects of business operations. It is not just a simple data backup to the cloud or a disaster recovery plan; it means that when a disaster or failure occurs, data is still accessible with little to no downtime. This is measured in terms of time to restart (RTO: recovery time objective) and maximum data loss allowed (RPO: recovery point objective).
Therefore, a business continuity solution has to be planned considering key elements such as resilience, RTO, and RPO as agreed upon with the cloud provider.
Here are some of the challenges encountered while making a DR plan
Although the RTO and RPO values give us a general idea of the outcome, they cannot be trusted fully, and the time required to restart operations may be longer than expected.
As systems come back up and become operational, there will be a sudden burst of requests for resources, which is more apparent in large-scale disasters.
Selecting the right CSP is crucial, as most cloud providers do not provide DR as a managed service; instead, they provide the basic infrastructure to enable your own DRaaS.
Hence enterprises have to be clear and select the DR strategy that best suits their IT infrastructure. This is crucial in providing mobility to the business, making it more easily accessible from anywhere around the world, and in insuring data in the event of a disaster – natural or technical – by minimizing downtime and the costs associated with such an event.
How are leading OEMs like AWS, Azure, and Google Cloud adapting to this changing landscape?
In early 2019, Google came up with Anthos, one of the first multi-cloud solutions from a mainstream cloud provider. Anthos is an open application modernization platform that enables you to modernize your existing applications, build new ones, and run them anywhere. It is built on open source, with Kubernetes as its central command and control center, Istio enabling federated network management across the platform, and Knative providing an open API and runtime environment that lets you run your serverless workloads anywhere you choose. Anthos enables consistency between on-premises and cloud environments, helps accelerate application development, and strategically enables your business with transformational technologies.
AWS Outposts is a fully managed service that extends the same AWS hardware infrastructure, services, APIs, and tools to build and run your applications on-premises and in the cloud for a truly consistent hybrid experience. AWS compute, storage, database, and other services run locally on Outposts, and you can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools across your on-premises and cloud environments. Your Outposts infrastructure and AWS services are managed, monitored, and updated by AWS, just like in the cloud.
Azure Stack is a hybrid solution from Azure, built and distributed by approved hardware vendors (like Dell, Lenovo, HPE, etc.), that brings the Azure cloud into your on-prem data center. It is a fully managed service where the hardware is managed by the certified vendors and the software is managed by Microsoft Azure. Using Azure Stack you can extend Azure technology anywhere, from the datacenter to edge locations and remote offices, enabling you to build, deploy, and run hybrid and edge computing apps consistently across your IT ecosystem, with flexibility for diverse workloads.
How Powerup approaches Hybrid cloud for its customers
Powerup is one of the few companies in the world to have achieved launch-partner status with AWS Outposts, with experience working on over 200 projects across various verticals and top-tier certified expertise with all 3 major cloud providers in the market. We can bring an agile, secure, and seamless hybrid experience to the table. Outposts is a fully managed service, so it eliminates the hassle of managing an on-prem data center and lets enterprises concentrate on optimizing their infrastructure.
Written by Aparna M, Associate Solutions Architect at Powerupcloud Technologies.
Multicast is a communication protocol used for delivering a single stream of data to multiple receiving computers simultaneously.
Now AWS Transit Gateway multicast makes it easy for customers to build multicast applications in the cloud and distribute data across thousands of connected Virtual Private Cloud networks. Multicast delivers a single stream of data to many users simultaneously. It is a preferred protocol to stream multimedia content and subscription data such as news articles and stock quotes, to a group of subscribers.
Now let’s understand the key concepts of Multicast:
Multicast domain – Multicast domain allows the segmentation of a multicast network into different domains and makes the transit gateway act as multiple multicast routers. This is defined at the subnet level.
Multicast Group – A multicast group is used to identify a set of sources and receivers that will send and receive the same multicast traffic. It is identified by a group IP address.
Multicast source – An elastic network interface associated with a supported EC2 instance that sends multicast traffic.
Multicast group member – An elastic network interface associated with a supported EC2 instance that receives multicast traffic. A multicast group has multiple group members.
Key Considerations for setting up Multicast in AWS:
Create a new transit gateway to enable multicast
You cannot share multicast-enabled transit gateways with other accounts
On the navigation pane, choose Create Transit Gateway.
For Name tag, enter a name to identify the Transit gateway.
Enable Multicast support.
Choose Create Transit Gateway.
Create a Transit Gateway Multicast Domain
On the navigation pane, choose the Transit Gateway Multicast.
Choose Create Transit Gateway Multicast domain.
(Optional) For Name tag, enter a name to identify the domain.
For Transit Gateway ID, select the transit gateway that processes the multicast traffic.
Choose Create Transit Gateway multicast domain.
Associate VPC Attachments and Subnets with a Transit Gateway Multicast Domain
To associate VPC attachments with a transit gateway multicast domain using the console
1. On the navigation pane, choose Transit Gateway Multicast.
2. Select the transit gateway multicast domain, and then choose Actions, Create association.
3. For Transit Gateway ID, select the transit gateway attachment.
4. For Choose subnets to associate, select the subnets to include in the domain.
5. Choose Create association.
Register Sources with a Multicast Group
In order to register sources for transit gateway multicast:
On the navigation pane, choose Transit Gateway Multicast.
Select the transit gateway multicast domain, and then choose Actions, Add group sources.
For Group IP address, enter the IPv4 or IPv6 multicast group address to assign to the multicast domain. For IPv4, the address must be in the 224.0.0.0/4 range.
Under Choose network interfaces, select the multicast sender’s (ec2 servers) network interfaces.
Choose Add sources.
Register Members with a Multicast Group
To register members in the transit gateway multicast:
On the navigation pane, choose Transit Gateway Multicast.
Select the transit gateway multicast domain, and then choose Actions, Add group members.
For Group IP address, enter the same multicast group IP address that you specified while adding the sources.
Under Choose network interfaces, select the network interfaces of the multicast receivers (EC2 instances).
Choose Add members.
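Both registration steps above (sources and members) map elastic network interfaces to the multicast group IP. Below is a hedged boto3 sketch of the same; the domain and ENI IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption
DOMAIN_ID = "tgw-mcast-domain-0123456789abcdef0"     # placeholder
GROUP_IP = "224.0.0.50"                              # must fall within 224.0.0.0/4

# Register the sender's ENI as a multicast source
ec2.register_transit_gateway_multicast_group_sources(
    TransitGatewayMulticastDomainId=DOMAIN_ID,
    GroupIpAddress=GROUP_IP,
    NetworkInterfaceIds=["eni-0aaaaaaaaaaaaaaaa"],  # placeholder source ENI
)

# Register the receivers' ENIs as multicast group members
ec2.register_transit_gateway_multicast_group_members(
    TransitGatewayMulticastDomainId=DOMAIN_ID,
    GroupIpAddress=GROUP_IP,
    NetworkInterfaceIds=["eni-0bbbbbbbbbbbbbbbb", "eni-0cccccccccccccccc"],  # placeholder member ENIs
)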
Modify the security groups of the member servers (receivers):
Allow inbound traffic on custom UDP port 5001 (the default iperf UDP port used in the test below).
Once your setup is completed follow the below steps to test the multicast routing.
Log in to all the source and member servers.
Make sure you install the iperf package on all the servers in order to test the functionality.
Run the below command on the source machine:
iperf -s -u -B 224.0.0.50 -i 1
Here, 224.0.0.50 is the multicast group IP provided during the setup.
Run the below command on all the member servers:
iperf -c 224.0.0.50 -u -T 32 -t 100 -i 1
Once you start sending data from the source server, the same stream can be seen on all the member servers simultaneously.
Conclusion
This blog helps you to host multicast applications on AWS leveraging AWS Transit gateway. Hope you found it useful.
Written by Soumya. a, Software Developer at Powerupcloud Technologies.
In a common development environment, we write our own login logic and keep all the details in the project database. For large applications this becomes a risk and a maintenance burden for client data and user information, so we use third-party software to manage login details and make the application more secure. Keycloak even helps in maintaining multiple applications with different or shared users.
Problem statement:
Maintaining login details in a local database becomes a chore and a risk, so we use third-party software like Keycloak to make the application more secure.
How it works:
We add Keycloak to the working environment, add the required Keycloak details in code, add the application details in Keycloak, and run it in the working environment. Detailed steps for local and server environments are given below.
Keycloak:
Keycloak is an open source Identity and Access Management solution aimed at modern applications and services. It makes it easy to secure applications and services with little to no code.
Keycloak adds security to any application: once the Keycloak details are added to the application, it offers various options such as simple login, login with username and password, OTP-based login, and so on.
When we use Keycloak we do not need to maintain login details in our own database; they are all stored securely in the Keycloak server, and only the required details are kept in our database.
Different features of keycloak:
Single-Sign On
Users authenticate with Keycloak rather than individual applications. This means that your applications don't have to deal with login forms, authenticating users, or storing users. Once logged in to Keycloak, users don't have to log in again to access a different application.
This also applies to logout. Keycloak provides single-sign out, which means users only have to log out once to be logged out of all applications that use Keycloak.
Kerberos bridge
If your users authenticate to workstations with Kerberos (LDAP or Active Directory), they can also be automatically authenticated to Keycloak without having to provide their username and password again after they log on to the workstation.
Identity Brokering and Social Login
Enabling login with social networks is easy to add through the admin console. It's just a matter of selecting the social network you want to add. No code or changes to your application are required.
Keycloak can also authenticate users with existing OpenID Connect or SAML 2.0 Identity Providers. Again, this is just a matter of configuring the Identity Provider through the admin console.
User Federation
Keycloak has built-in support to connect to existing LDAP or Active Directory servers. You can also implement your own provider if you have users in other stores, such as a relational database.
Client Adapters
Keycloak Client Adapters makes it really easy to secure applications and services. We have adapters available for a number of platforms and programming languages, but if there’s not one available for your chosen platform don’t worry. Keycloak is built on standard protocols so you can use any OpenID Connect Resource Library or SAML 2.0 Service Provider library out there.
Gatekeeper
You can also opt to use a proxy to secure your applications which removes the need to modify your application at all.
Admin Console
Through the admin console administrators can centrally manage all aspects of the Keycloak server.
They can enable and disable various features. They can configure identity brokering and user federation.
They can create and manage applications and services, and define fine-grained authorization policies.
They can also manage users, including permissions and sessions.
Account Management Console
Through the account management console users can manage their own accounts. They can update the profile, change passwords, and setup two-factor authentication.
Users can also manage sessions as well as view history for the account.
If you’ve enabled social login or identity brokering users can also link their accounts with additional providers to allow them to authenticate to the same account with different identity providers.
Standard Protocols
Keycloak is based on standard protocols and provides support for OpenID Connect, OAuth 2.0, and SAML.
Authorization Services
If role-based authorization doesn’t cover your needs, Keycloak provides fine-grained authorization services as well. This allows you to manage permissions for all your services from the Keycloak admin console and gives you the power to define exactly the policies you need.
How to use:
In Local:
Prerequisites:
Download keycloak
Run Keycloak on port 8085 (the default Keycloak port is 8080): ./standalone.sh -Djboss.socket.binding.port-offset=5
Log in with master login which is registered while creating keycloak
In Server:
Prerequisites:
Download java
Download keycloak using wget
Run Keycloak on port 8085 (the default Keycloak port is 8080): ./standalone.sh -Djboss.socket.binding.port-offset=5
Add ssl certificate for keycloak
Log in with master login which is registered while creating keycloak
Steps for server :
Installation
Step 1: Log in to the Linux server.
Step 2: Download Keycloak.
cd /opt/
wget https://downloads.jboss.org/keycloak/7.0.0/keycloak-7.0.0.tar.gz
tar -xvzf keycloak-7.0.0.tar.gz
mv keycloak-7.0.0 keycloak
Step 3: Log in to the Nginx server and update the nginx.conf file.
upstream keycloak {
    # Use IP Hash for session persistence
    ip_hash;
    # List of Keycloak servers
    server 127.0.0.1:8080;
}

server {
    listen 80;
    server_name keycloak.domain.com;

    # Redirect all HTTP to HTTPS
    location / {
        return 301 https://$server_name$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name keycloak.domain.com;

    ssl_certificate /etc/pki/tls/certs/my-cert.cer;
    ssl_certificate_key /etc/pki/tls/private/my-key.key;
    ssl_session_cache shared:SSL:1m;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Proxy to the Keycloak upstream defined above
        proxy_pass http://keycloak;
    }
}
Once this is completed, restart the Nginx server for the changes to take effect. Now access the given URL to reach the Keycloak server and log in with the admin credentials you created during installation.
Steps for both server and local:
1. Create a new realm:
To create a new realm, complete the following steps:
From the Master drop-down menu, click Add Realm. When you are logged in to the master realm this drop-down menu lists all existing realms.
Type demo in the Name field and click Create.
When the realm is created, the main admin console page opens. Notice the current realm is now set to demo. Switch between managing the master realm and the realm you just created by clicking entries in the Select realm drop-down menu.
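The same realm can also be created from code using the Keycloak Admin REST API (Keycloak 7.x WildFly distribution, hence the /auth prefix). The sketch below is only illustrative: the server URL and master admin credentials are placeholders.
import requests

BASE = "https://keycloak.domain.com"  # placeholder Keycloak URL

# Obtain an admin access token from the master realm using the admin-cli client
token = requests.post(
    f"{BASE}/auth/realms/master/protocol/openid-connect/token",
    data={
        "grant_type": "password",
        "client_id": "admin-cli",
        "username": "admin",           # master admin created during installation
        "password": "admin-password",  # placeholder
    },
).json()["access_token"]

# Create the demo realm
resp = requests.post(
    f"{BASE}/auth/admin/realms",
    headers={"Authorization": f"Bearer {token}"},
    json={"realm": "demo", "enabled": True},
)
print(resp.status_code)  # 201 on success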
2. Create new client:
To define and register the client in the Keycloak admin console, complete the following steps:
In the top-left drop-down menu, select the demo realm. Click Clients in the left-side menu to open the Clients page.
2. On the right side, click Create.
3. Complete the fields as shown here:
4. Click Save to create the client application entry.
5. Change Access type to confidential
6. Click the Installation tab in the Keycloak admin console to obtain a configuration template.
7. Select Keycloak OIDC JSON to generate a JSON template. Copy the contents for use in the next section.
3. Role:
Roles identify a type or category of user. Keycloak often assigns access and permissions to specific roles rather than individual users for fine-grained access control.
Keycloak offers three types of roles:
Realm-level roles are in the global namespace shared by all clients.
Client roles are in a namespace dedicated to a particular client.
A composite role is a role that has one or more additional roles associated with it.
3.1. Create new role
Roles->Add Role->Role name and Description(admin_role)
3.2. To add manage user permission for the newly created role
Clients -> select your client -> Scope -> Realm Roles -> add the created role (admin_role).
5. Add permission to a new realm from master realm
5.1. Master realm -> Clients -> select your client (demo-realm) -> Roles -> manage-users (default false, make it true)
5.2. To make it true: enable Composite Roles -> Client Roles -> select your client (demo-realm) -> add manage-users
6. For adding the manage-users permission for the client in master:
Master -> Roles -> Default Roles -> Client Roles -> select your client (demo-realm) -> add manage-users.
7. Once the permissions are given, create the first user in the new realm; using this user, we can then create multiple users from code (outside Keycloak). A hedged example of doing this through the Admin REST API is shown after step 7.3 below.
7.1. Select your realm (demo) -> Users -> New user -> details (add email ID and name)
7.2. Credentials (password)
7.3. Role mappings -> Client role -> realm-management -> check that manage-users is present, else add it
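As a hedged illustration of creating users from code with the manage-users permission configured above, the sketch below uses the Admin REST API; the URL, credentials, and user details are placeholders.
import requests

BASE = "https://keycloak.domain.com"  # placeholder Keycloak URL
REALM = "demo"

# Admin token from the master realm (same pattern as the realm-creation sketch)
token = requests.post(
    f"{BASE}/auth/realms/master/protocol/openid-connect/token",
    data={"grant_type": "password", "client_id": "admin-cli",
          "username": "admin", "password": "admin-password"},  # placeholders
).json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# Create the user in the demo realm
resp = requests.post(
    f"{BASE}/auth/admin/realms/{REALM}/users",
    headers=headers,
    json={"username": "first.user", "email": "first.user@example.com", "enabled": True},
)
user_id = resp.headers["Location"].rsplit("/", 1)[-1]  # new user ID from the Location header

# Set a permanent password for the user
requests.put(
    f"{BASE}/auth/admin/realms/{REALM}/users/{user_id}/reset-password",
    headers=headers,
    json={"type": "password", "value": "S0meStr0ngPass!", "temporary": False},
)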
In React JS, to connect Keycloak and add authentication you need:
A Keycloak.js file
A JSON file with the Keycloak server details (the JSON from the Installation tab)
Written by Ishita Saha, Software Engineer, Powerupcloud Technologies
In this blog, we will discuss how we can integrate AWS Polly using Python & React JS to a chatbot application.
Use Case
We are developing a chatbot framework where we use AWS Polly to provide an exquisite and lively voice experience for our users.
Problem Statement
We are trying to showcase how we can integrate AWS Polly voice services with our existing chatbot application built on React JS & Python.
What is AWS Polly ?
Amazon Polly is a service that turns text into lifelike speech. Amazon Polly enables existing applications to speak as a first-class feature and creates the opportunity for entirely new categories of speech-enabled products, from mobile apps and cars to devices and appliances. Amazon Polly includes dozens of lifelike voices and support for multiple languages, so you can select the ideal voice and distribute your speech-enabled applications in many geographies. Amazon Polly is easy to use – you just send the text you want converted into speech to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to your application so you can play it directly or store it in a standard audio file format, such as MP3.
AWS Polly is easy to use. We only need an AWS subscription. We can test Polly directly from the AWS Console.
There is an option to select Voice from Different Languages & Regions.
Why Amazon Polly?
You can use Amazon Polly to power your application with high-quality spoken output. This cost-effective service has very low response times, and is available for virtually any use case, with no restrictions on storing and reusing generated speech.
Implementation
The user provides input to the chatbot. This input goes to our React JS frontend, which interacts with a Python application in the backend. The Python application is responsible for interacting with AWS Polly and sending the response back to the React app, which plays the audio stream as MP3.
React JS
In this implementation, we are using the Audio() constructor.
The Audio() constructor creates and returns a new HTMLAudioElement which can be either attached to a document for the user to interact with and/or listen to, or can be used offscreen to manage and play audio.
Syntax :
audio = new Audio(url);
Methods :
play – Make the media object play or resume after pausing.
pause – Pause the media object.
load – Reload the media object.
canPlayType – Determine if a media type can be played.
Here, we are using only play() and pause() methods in our implementation.
Step 1: We have to initialize a variable in the state.
Step 3: If an existing reply from the bot is already playing, we can stop it:
if (this.state.audio != undefined) {
this.state.audio.pause();
}
Step 4: This method interacts with our Python application. It sends a request to our Python backend with the required parameters and creates a new Audio() object. The parameters are passed dynamically to the speaker handler method.
We call the textToSpeech Flask API, which accepts the parameters sent by React and internally calls AWS Polly. The response is sent back to React as an MP3 file, and the React application then plays the audio for the user. A minimal sketch of such a backend endpoint follows.
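The sketch below assumes a /textToSpeech route and a "text"/"voice" request payload; both names are assumptions for illustration, not the exact production code.
import boto3
from flask import Flask, request, Response

app = Flask(__name__)
polly = boto3.client("polly", region_name="us-east-1")  # region is an assumption

@app.route("/textToSpeech", methods=["POST"])
def text_to_speech():
    body = request.get_json()
    # Ask Polly to synthesize the bot reply as an MP3 audio stream
    result = polly.synthesize_speech(
        Text=body["text"],
        OutputFormat="mp3",
        VoiceId=body.get("voice", "Joanna"),  # default voice is an assumption
    )
    # Return the MP3 bytes; the React app plays them with new Audio()
    return Response(result["AudioStream"].read(), mimetype="audio/mpeg")

if __name__ == "__main__":
    app.run(port=5000)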
This blog showcases the simple implementation of React JS integration with Python to utilize AWS Polly services. This can be used as a reference for such use cases with chatbots.
Written by Arun Kumar, Associate Cloud Architect at Powerupcloud Technologies
Introduction
Amazon WorkSpaces is a managed, secure Desktop-as-a-Service (DaaS) offering on the AWS cloud. WorkSpaces eliminates the need to provision hardware and software configurations, making it easy for IT admins to provision managed desktops on the cloud. End users can access their virtual desktops from any device or browser, including Windows, Linux, iPad, and Android. Managing corporate applications for end users becomes easier using WAM (WorkSpaces Application Manager) or by integrating with existing solutions like SCCM, WSUS, and more.
To manage end users and provide them access to WorkSpaces, the solutions below can be leveraged with AWS.
Extending the existing on-premises Active Directory by using AD Connector in AWS.
Create & configure AWS managed Simple AD or Microsoft Active Directory based on size of the organization.
WorkSpaces architecture with simple AD approach
In this architecture, WorkSpaces are deployed for both Windows and Linux virtual desktops, and both are associated with the VPC and the directory service (Simple AD) that stores and manages information about users and WorkSpaces.
The above architecture describes the flow of end users accessing Amazon WorkSpaces, with Simple AD authenticating the users. Users access their WorkSpaces using a client application from a supported device or a web browser, and they log in with their directory credentials. The login information is sent to an authentication gateway, which forwards the traffic to the directory for the WorkSpace. Once the user is authenticated, streaming traffic is processed through the streaming gateway, which works over the PCoIP protocol to provide the end user with a complete desktop experience.
Prerequisites
To use WorkSpaces, the following requirements need to be met:
A directory service to authenticate users and provide access to their WorkSpace.
The WorkSpaces client application for the user's device and an Internet connection.
For this demo we have created the Simple AD, this can be created from the workspace console.
Directory
Create the Simple AD
Choose the Directory size based on your organization size.
Enter the fully qualified domain name and the Administrator password. Make a note of your admin password somewhere for reference.
We’ll need a minimum of two subnets created for the AWS Directory Service which requires Multi-AZ deployment.
Directory is now created.
WorkSpace
Now let’s create the WorkSpace for employees.
Select the Directory which you need to create WorkSpace for the user access.
Select the appropriate subnets that we created in the previous section to provision the workspaces in a Multi-AZ deployment.
Ensure that self-service permissions are always set to "NO"; otherwise users will have the privilege to change WorkSpaces configurations on the fly without the WorkSpaces admin's knowledge.
Enable WorkDocs based on the user's requirement.
You can select the user from the directory list or create a new user on the fly.
Select the bundle of compute, operating system, and storage for each of your users.
You can select the running mode of the WorkSpaces based on your company's needs. This directly impacts the monthly bill: the "Always-On" mode has fixed pricing, whereas the "AutoStop" mode is an on-demand pricing model. Ensure the right running mode is selected during WorkSpaces creation based on the business requirements of the user.
Review and launch your WorkSpace.
Now your WorkSpace is being provisioned. Once it is available and ready to use, you will receive an email from Amazon with the WorkSpaces login details.
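The same provisioning can also be done programmatically. Below is a hedged boto3 sketch where the region, directory ID, bundle ID, and username are placeholders for the values chosen above.
import boto3

workspaces = boto3.client("workspaces", region_name="us-east-1")  # region is an assumption

response = workspaces.create_workspaces(
    Workspaces=[{
        "DirectoryId": "d-1234567890",  # Simple AD created earlier (placeholder)
        "UserName": "demo.user",         # user from the directory (placeholder)
        "BundleId": "wsb-xxxxxxxxx",     # chosen compute/OS/storage bundle (placeholder)
        "WorkspaceProperties": {
            "RunningMode": "AUTO_STOP",  # on-demand pricing model
            "RunningModeAutoStopTimeoutInMinutes": 60,
        },
    }]
)
print(response["PendingRequests"])  # WorkSpaces being provisioned
print(response["FailedRequests"])   # requests that failed validation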
Select the URL in the email to create a password for your user to access the WorkSpace.
Download the client based on your device or you have web login.
Install the WorkSpaces agent on your local machine.
Open the WorkSpace client and enter the registration code which you received in the email.
It prompts for username and password.
Now you are logged in to your virtual desktop.
Security and compliance of WorkSpace
Encryption in transit by default.
KMS can be used to encrypt our data at rest.
IP based restrictions.
Multi-factor authentication(RADIUS)
PCI DSS Level 1 compliant.
HIPAA-eligible with a business associate agreement.
Certifications – ISO 9001 and ISO 27001.
Cost
No upfront payment.
On-demand pricing – AutoStop mode – In this model, when the user is not using the virtual desktop, the WorkSpace is automatically stopped based on the AutoStop hours selected for the user.
Fixed pricing – Always-On mode – In this model, the WorkSpace virtual desktop cost is calculated on a fixed monthly basis based on the selected bundle.
Licensing
Built-in license – allows us to select the right Windows bundle as per business needs.
WorkSpaces additionally supports the BYOL (bring your own license) model for Windows 10.
Monitoring
CloudTrail can monitor the API calls.
CloudWatch monitoring can show the number of users connected to WorkSpaces, the latency of the sessions, and more.
Additional features
API support(SDK, AWS CLI)
WorkSpace Application Manager(WAM).
Custom images.
Audio input.
Pre-built applications from the AWS Marketplace, which can be added to your WorkSpace.
User control at the directory level.
Integration with WorkDocs.
Conclusion
By adopting AWS WorkSpaces, end users can securely access the business applications and documents that they currently use on their organization's devices or existing VDI solutions, and experience seamless desktop performance on the cloud. WorkSpaces can also be accessed in a highly secure way that helps prevent data breaches, by enabling encryption options and restricting client devices for users.
Additional benefits include reducing the overhead of maintaining existing hardware and purchasing new hardware. Monitoring and managing end-user WorkSpaces also becomes an easy task by integrating with AWS native services.