
How CTX used AWS Well-Architected to save infrastructure cost by 70%


Customer: One of the largest digital asset trading companies

Summary

Cyberdyne Tech Exchange (CTX) is one of the largest digital asset exchange companies, publishing and trading asset-backed security tokens for assets such as artwork, real estate and diamonds. Trading is carried out by qualified issuers and investors through newly developed applications that CTX plans to migrate to the AWS cloud. The deployed applications must provide high availability and scalability, automation and security, and follow the AWS Well-Architected Framework.

About the customer

CTX is one of the largest digital asset exchange companies where qualified issuers and investors publish and trade asset-backed security tokens. These security tokens are backed by curated investment-grade assets such as artwork, diamonds, real estate and equity securities.

Their platform offers a complete suite of services that include primary issuance, trading and settlement as well as custody services.

Global institutional investors trade 24/7 on their trading architecture that is powered by Nasdaq’s matching and market surveillance (SMARTS) engines. Clients have the assurance that they are trading on an institutional-grade platform with fair and transparent price discovery.

Problem Statement

CTX intends to deploy their newly developed application to AWS. The deployed application should adhere to the following:

  • Highly Available & Scalable environment
  • AWS Well-Architected Framework
  • MAS & PDPA compliance
  • Automated Infrastructure Provisioning
  • Automated Application deployment

Proposed Solution

Architecture Description

  • The AWS accounts are provisioned based on the landing zone concept, with a centralized logging account, security account, shared service account and separate accounts for different applications.
  • The Master Account hosts AWS Organizations, SCPs for member accounts, and consolidated billing.
  • The Shared Services account hosts all common services like Transit Gateway, ECR, Jenkins, Bastion, Route 53, etc.
  • Security Account hosts GuardDuty master and all other accounts will be added as members.
  • All security services like IPS & IDS, CyberArk and other future security-related services will be deployed in the Security account.
  • The Centralized Logging Account hosts all logs – VPC flow logs, ELB logs, CloudTrail, CloudWatch and Elasticsearch streaming live application logs from all member accounts.
  • Separate DEV, UAT, Staging and Production accounts are provisioned to host the application.
  • All infrastructure provisioning happens through CloudFormation templates [VPC, EC2, EKS, S3, RDS, TGW, TGW connections] (see the CLI sketch after this list).
  • CTX has two major applications – Business System[BS] and Business Application[BA].
  • BS & BA are provisioned in separate VPCs for all the environments.
  • BS is a monolithic application deployed on EC2 instances.
  • BA is a service layer that talks to the BS systems through APIs; BA is deployed in an EKS cluster.
  • Around 20 microservices are deployed in the EKS cluster.
  • An ALB ingress controller has been deployed, along with AWS WAF, to handle traffic from external users to the microservices.
  • The application deployment lifecycle is completely automated using a Jenkinsfile.
  • PostgreSQL RDS is used as a Database.
  • CloudWatch is used for monitoring, and SNS notifies users when alarms fire or metrics cross thresholds.
  • All snapshot backups are taken regularly and automated based on best practices.
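For illustration, a minimal AWS CLI sketch of how such CloudFormation-driven provisioning can be run per environment; the stack names, template files, parameters and region below are assumptions for this sketch, not the customer's actual templates.

# Illustrative stack names, template files and region -- not the actual CTX templates.
aws cloudformation deploy \
  --stack-name ctx-dev-network \
  --template-file network.yaml \
  --parameter-overrides Environment=dev \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-southeast-1

aws cloudformation deploy \
  --stack-name ctx-dev-eks \
  --template-file eks.yaml \
  --parameter-overrides Environment=dev \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-southeast-1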

Security & Logging

  • AWS SSO has been set up and IAM accounts have been created with least-privilege access.
  • MFA has been enabled on both root and IAM accounts.
  • Except for bastion, all servers will be placed in private subnets.
  • Security groups are used to control traffic at the VM level. Only the required ports will be opened, and access allowed from required IP addresses.
  • Network Access Control Lists (NACLs) are used to control traffic at the subnet level.
  • CloudTrail has been enabled to capture all API activity occurring in the accounts.
  • VPC flow logs are enabled to capture all network traffic.
  • Amazon GuardDuty is enabled for threat detection and to identify malicious activity such as account compromise (see the CLI sketch after this list).
  • AWS Config enabled, and all the AWS recommended config rules are created.
  • All servers will be encrypted using KMS. KMS keys are stored in the Security account, and admin access to the keys is restricted to Security Admins only.
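As a hedged illustration of how a few of these controls can be switched on from the CLI (the trail name, bucket name and VPC ID below are placeholders, not the customer's values):

aws cloudtrail create-trail --name org-trail --s3-bucket-name central-logging-bucket --is-multi-region-trail
aws cloudtrail start-logging --name org-trail

aws guardduty create-detector --enable

aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::central-logging-bucket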

AWS WAF

  • WAF is mandatory for compliance requirements (MAS).
  • AWS WAF has been used as a Web Application Firewall for the external-facing applications.
  • WAF managed rules are created to mitigate the OWASP Top 10 web application vulnerabilities.
  • A CloudFormation template has been created to deploy the WAF rules in all the required environments.
  • The following rules are created using the CloudFormation template.
    • Generic-detect-admin-access
    • Generic-detect-bad-auth-tokens
    • Generic-detect-blacklisted-ips
    • Generic-detect-php-insecure
    • Generic-detect-rfi-lfi-traversal
    • Generic-detect-ssi
    • Generic-enforce-csrf
    • Generic-mitigate-sqli
    • Generic-mitigate-xss
    • Generic-restrict-sizes
  • Web ACL logging has been enabled to capture information about all incoming requests.
  • AWS Cloudwatch has been used to monitor and alert based on WAF rules.
  • AWS WAF has been integrated with the ALB. The ALB is provisioned by the ALB ingress controller deployed in the EKS cluster (see the CLI sketch below).
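For reference, associating a (classic, regional) web ACL with the ingress-managed ALB can be sketched with a single CLI call; the ACL ID, ALB ARN and region below are placeholders. The ALB ingress controller also supports doing this through a WAF annotation on the ingress resource.

# Placeholder ACL ID and ALB ARN -- look these up from your WAF and load balancer.
aws waf-regional associate-web-acl \
  --web-acl-id a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 \
  --resource-arn arn:aws:elasticloadbalancing:ap-southeast-1:123456789012:loadbalancer/app/ctx-alb/50dc6c495c0c9188 \
  --region ap-southeast-1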

Benefits

  • Successfully provisioned the entire application in AWS using CloudFormation with all the necessary security measures as per MAS compliance specifications.
  • Spot Instances are used for the scalable workloads, reducing AWS infrastructure cost by 60% to 70%.
  • Application deployment is completely automated using Jenkins.
  • A highly secure environment has been provisioned with the help of AWS services like AWS WAF and GuardDuty, and third-party solutions like Trend Micro Deep Security Manager.

Cloud platform

AWS.

Technologies/Services used

EC2, RDS, S3, EKS, ECR, TGW, Guard Duty, Route53, ALB, AWS WAF, IAM, KMS.

AWS Lambda Java: Creating Deployment Package for Java 8/11 using Maven in Eclipse – Part 1


Written by Tejaswee Das, Software Engineer at Powerupcloud Technologies. Collaborator: Neenu Jose, Senior Software Engineer.

Introduction

When we talk about serverless, AWS Lambda is the first service that comes to mind. It was never this simple before – AWS Lambda has made life easy for developers and data engineers alike, and you will hardly find an AWS use case that doesn't involve it. To know more about AWS Lambda and some simple S3 event use cases around it, have a look at one of my earlier posts on AWS Lambda.

This post is part of a two-part blog series. This part 1 blog will guide you through the steps to create a Java (Java 8/Java 11) deployment package for AWS Lambda in Eclipse using Maven and use S3 Event triggers. 

In Part 2, we will discuss steps on using SES with S3 Event triggers to Lambda.

Use Case

One of our clients had their workloads running on Azure Cloud. They had a few serverless applications written in Java 8 on Azure Functions and wanted to upgrade from Java 8 to Java 11. Since Java 11 was not supported at the time (Java 11 for Azure Functions has only recently been released in preview), they wanted to try out other cloud services – that's where AWS Lambda came into the picture. We did a POC to check the feasibility of running Java 11 applications on AWS Lambda.

Deployment Steps

Step 1: Install AWS Toolkit in Eclipse

1.1 Open Eclipse → Go to Help → Install New Software

1.2 Enter https://aws.amazon.com/eclipse in Work with and select AWS Toolkit for Eclipse Core (Required)

1.3 Click Next and Install

Note: The toolkit requires Eclipse 4.4 (Luna) or higher.

Step 2: Create New AWS Lambda Function

You might need to restart Eclipse for the installation to take effect. Add your AWS access keys when asked. This step is optional; you can add or remove AWS credentials/accounts from the Preferences menu at any time.

2.1 Go to File → New → Other…

2.2 Select AWS → AWS Lambda Java Project → Next

2.3 Fill in your Project Name and other details

Class Name is your Lambda handler – in Lambda terms, it's like the main function of your project. You can use any name here; we are using the default.

For our demo we are using the built-in S3 Event. There are a lot of other events to choose from – DynamoDB Event, Stream Request Handler, SNS Event, Kinesis Event, Cognito Event – or even Custom if you want to build from scratch.

2.4 Click Finish

For demonstration and test purposes, you can go with the default code. We are using us-east-1. Make sure you set the region; you might encounter an error if it is not set.

Sample Code

package com.amazonaws.lambda.demo;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class LambdaFunctionHandler implements RequestHandler<S3Event, String> {

	private AmazonS3 s3 = AmazonS3ClientBuilder.standard()
    		.withRegion(Regions.US_EAST_1)
    		.build();

    public LambdaFunctionHandler() {}

    // Test purpose only.
    LambdaFunctionHandler(AmazonS3 s3) {
        this.s3 = s3;
    }

    @Override
    public String handleRequest(S3Event event, Context context) {
        context.getLogger().log("Received event: " + event);

        // Get the object from the event and show its content type
        String bucket = event.getRecords().get(0).getS3().getBucket().getName();
        String key = event.getRecords().get(0).getS3().getObject().getKey();
        try {
            S3Object response = s3.getObject(new GetObjectRequest(bucket, key));
            String contentType = response.getObjectMetadata().getContentType();
            context.getLogger().log("CONTENT TYPE: " + contentType);
            return contentType;
        } catch (Exception e) {
            e.printStackTrace();
            context.getLogger().log(String.format(
                "Error getting object %s from bucket %s. Make sure they exist and"
                + " your bucket is in the same region as this function.", key, bucket));
            throw e;
        }
    }
}

Step 4: Java Runtime Environment (JRE)

4.1 Go to Windows → Preferences → Java → Installed JREs

Step 5: Maven Build

Now to the final step where we will build our deployment package.

5.1 Right click on your project in the Project Explorer → Run As → Maven Build

5.2 Edit Configuration & Launch

Enter ‘package’ in Goals. Select your JRE; everything else can be left as default.

5.3 Run

Your build should work without any errors for Java 8, but with Java 11 you might run into a few errors. Make sure you use an updated mockito-core.

In the pom.xml of the generated project, change the version of mockito-core:

<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-core</artifactId>
  <!-- change the version from 2.7.22 to 3.3.3 -->
  <version>3.3.3</version>
  <scope>test</scope>
</dependency>

This version change is necessary for the Java 11 build to work.

On a successful build, you should see output similar to the following.

Sample build

[INFO] Scanning for projects...
[INFO] 
[INFO] ------< com.amazonaws.lambda:demo >----------------------
[INFO] Building demo 1.0.0
[INFO] -----------------[ jar ]---------------------------------
[INFO] 
[INFO] -- maven-resources-plugin:2.6:resources (default-resources) @ demo ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 0 resource
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.0:compile (default-compile) @ demo ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ demo ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 1 resource
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.0:testCompile (default-testCompile) @ demo ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ demo ---
[INFO] Surefire report directory: D:\Eclipse Workspace\Maven\demo-blog-s3\target\surefire-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.amazonaws.lambda.demo.LambdaFunctionHandlerTest
Received event: com.amazonaws.services.lambda.runtime.events.S3Event@124ac145
CONTENT TYPE: image/jpeg
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.117 sec

[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ demo ---
[INFO] Downloading from : https://repo.maven.apache.org/maven2/org/apache/maven/maven-archiver/2.5/maven-archiver-2.5.pom
.
.
.
.
.
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing D:\Eclipse Workspace\Maven\demo-blog-s3\target\demo-1.0.0.jar with D:\Eclipse Workspace\Maven\demo-blog-s3\target\demo-1.0.0-shaded.jar
[INFO] Dependency-reduced POM written at: D:\Eclipse Workspace\Maven\demo-blog-s3\dependency-reduced-pom.xml
[INFO] ----------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ----------------------------------------------------------------
[INFO] Total time:  03:10 min
[INFO] Finished at: 2020-08-03T14:53:09+05:30
[INFO] ---------------------------------------------------------------

Look at the path in the build output above (the line replacing the original artifact with the shaded artifact) to locate your .jar file. The exact path follows your workspace directory configuration.

My Eclipse workspace directory here is: D:\Eclipse Workspace\Maven\

Step 6: Create Test S3 Bucket

Create an S3 bucket, for uploading the files, from the AWS console.

Make sure this bucket is in the same region where you are planning to create the Lambda Function.
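If you prefer the CLI over the console, a minimal sketch (the bucket name is a placeholder; us-east-1 matches the region hard-coded in the sample handler):

aws s3api create-bucket --bucket my-lambda-java-demo-bucket --region us-east-1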

Step 7: Creating Lambda Function in AWS Console

There are a couple of ways of creating Lambda functions. The easiest way is through the AWS Console: choose a runtime, create the function and get going. You get a lot of runtimes to choose from and can start writing code on the go – this works great for interpreted languages such as Python, Node.js and Ruby.

But compiled languages like Java, Go and .NET do not allow in-line editing, so you need to upload a deployment package.

There are other ways to upload Lambda functions directly from Eclipse itself. We faced issues with that, so to get our task done we created a deployment package (a .jar for Java) and uploaded it to Lambda. Works great.

7.1 S3 Event Triggers & IAM

Please refer to one of my previous posts

Follow solution steps 1 & 2 there: create the required execution role and attach policies to provide the required permissions.
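For reference, the same S3 trigger wiring can be sketched with the AWS CLI; the function name, bucket name and account ID below are placeholders, and notification.json is a hypothetical file mapping s3:ObjectCreated:* events to the function ARN.

# Allow S3 to invoke the function (placeholder names and account ID)
aws lambda add-permission \
  --function-name demo-s3-handler \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-lambda-java-demo-bucket \
  --source-account 123456789012

# notification.json maps s3:ObjectCreated:* events to the function's ARN
aws s3api put-bucket-notification-configuration \
  --bucket my-lambda-java-demo-bucket \
  --notification-configuration file://notification.json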

Step 8: Deploying Lambda Function

8.1 With all the setup done, you just need to upload the Maven-built jar file from Step 5.3 here.

You can upload the file directly if it is smaller than 10 MB; for larger files, upload via Amazon S3.
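If you want to script the upload instead of using the console, here is a hedged CLI sketch; the function name, role ARN, memory and timeout values are assumptions, and the jar is the one built in Step 5.3.

aws lambda create-function \
  --function-name demo-s3-handler \
  --runtime java11 \
  --handler com.amazonaws.lambda.demo.LambdaFunctionHandler::handleRequest \
  --role arn:aws:iam::123456789012:role/lambda-s3-execution-role \
  --memory-size 512 --timeout 30 \
  --zip-file fileb://target/demo-1.0.0.jar

# For subsequent deployments, update the code in place:
aws lambda update-function-code \
  --function-name demo-s3-handler \
  --zip-file fileb://target/demo-1.0.0.jar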

Great! Your code is deployed successfully. Time to test it now.

Step 9: Testing

9.1 To test your deployment, Go to S3

9.2 Go to the bucket that you created in Step 6 and configured the trigger on in 7.1

Upload a test file
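You can also push the test file from the CLI (the bucket and file names are placeholders); the sample handler will then log the object's content type.

aws s3 cp test-image.jpg s3://my-lambda-java-demo-bucket/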

9.3 Go back to your Lambda Function. Click on Monitoring → View logs in CloudWatch

You can see the S3 trigger event logs here. When the file was uploaded to the bucket, it triggered the Lambda function.

Conclusion

This was a very simple proof of concept that we have demonstrated here. This was mainly to get AWS Lambda working with Java 11 for our client. In the next part of this series, we will try to demonstrate some more stuff we can try with AWS Lambda using Java 8/11 – using AWS Java SDK to send emails & notifications on file upload to S3.

Hope this was informative. We had a tough time figuring out the correct resources to use, so we planned to write this to help folks out there looking for help with different Java versions and AWS Lambda.

References

https://aws.amazon.com/eclipse/

Data lake setup aiding rapid insights with regulatory compliance


Summary

The customer is a leading US-based medical equipment company providing mainly cloud-connected medical devices that transform care for people with sleep apnea, COPD and other chronic diseases. They are looking at integrating their MyApp application's data into the MosaIQ Data Lake platform on the AWS cloud. MyApp is a self-monitoring sleep therapy progress application used extensively by medical representatives and caregivers.

About Customer

The customer is one of the top medical equipment companies, based in San Diego, California. They primarily provide cloud-connectable medical devices for the treatment of sleep apnea, chronic obstructive pulmonary disease (COPD) and other respiratory conditions. The company employs more than 7,500 people worldwide, has a presence in more than 120 countries, and has manufacturing facilities in Australia, France, Singapore and the United States.

Problem Statement

MyApp is the customer's patient self-monitoring application that helps track a patient's sleep therapy progress both online and on smartphones. MyApp facilitates tailored coaching and handy tips to make therapy more comfortable. The customer wanted to:

  • Integrate MyApp application data into the MosaIQ Data Lake platform on AWS.
  • Reuse and replicate data flow of AirView, inclusive of policy, pseudo rules, de-identification, Protected Health Information (PHI) and non-PHI.
  • Build code for data staging, data transformations for regulatory adherence and storage on AWS Simple Storage Service (S3).

Proposed Solution

Powerup was to analyze and define the scope of integration: obtain complete access to the AWS development, system integration test and production setups, and create AWS services catering to Virtual Private Clouds (VPCs), subnets, route tables and Internet gateways. Define fixed and incremental S3 buckets for PHI as well as non-PHI accounts.

Ensure that a detailed definition of MyApp S3 policies including source connections and scheduling is made available before coding in the development environment. Also, freeze all policies and pseudo rules for PHI and non-PHI data encryption until coding completion and migration to test environment.

Implement Database Migration Service (DMS) to migrate data from on-premise to AWS cloud storage (S3). All files are pushed into a single folder per table in the S3 bucket via Lambda functions. Change data capture (CDC) is implemented for incremental data transfer; each S3 event in turn triggers and pushes a request to Amazon Simple Queue Service (SQS).
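As a hedged sketch of what the DMS piece can look like from the CLI (all ARNs, names and the mappings file below are placeholders for illustration, not the customer's actual resources):

aws dms create-replication-task \
  --replication-task-identifier myapp-full-load-and-cdc \
  --source-endpoint-arn arn:aws:dms:us-west-2:123456789012:endpoint:SOURCEENDPOINT \
  --target-endpoint-arn arn:aws:dms:us-west-2:123456789012:endpoint:S3TARGETENDPOINT \
  --replication-instance-arn arn:aws:dms:us-west-2:123456789012:rep:REPLINSTANCE \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json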

Leverage Fargate containers to run scripts that check data against the IDs. Run an Amazon EMR (Elastic MapReduce) cluster that applies masking logic to this data, which is then sent for further analytics. Identify and save the results in S3 buckets. The next step is to create a test strategy for unit and integration tests.

Powerup DevOps was to configure CloudFormation templates (CFT) and implement a continuous integration and continuous deployment (CI/CD) process for the MyApp migration: create integration test scripts, test the CI/CD process before the actual system integration testing (SIT), prepare migration to the development and UAT environments and devise automation.

The next task is to migrate to SIT through CI/CD to validate all the resources and execute the full load, and to schedule the trigger for the CDC load before moving to production deployment. Repeat the process in the production environment and perform UAT.

Post the integration, Powerup took up the responsibility of architectural assessment and went ahead with the Well-Architected Review (WAR) framework. WAR is an architectural assessment based on AWS framework that is built on five pillars – operational efficiency, reliability, security, performance efficiency and cost optimization.

Powerup identified the workload to be reviewed and, once the relevant data was identified, arranged reviews with the stakeholders at the company. Reviews could be conducted onsite or remotely. A report aligned with AWS best practices – with findings categorized as critical, needs improvement, or meets best practices – was generated for the selected workload. The report highlights the priority with which remediation should be carried out.

 

Benefits

MyApp application data has been successfully integrated into MosaIQ on the AWS cloud. The platform can now provide capabilities to wider business team communities, as MosaIQ is a data lake platform built on top of the AWS stack that stores structured and unstructured data in raw format. It assists in the rapid discovery of actionable insights to improve patient care and business outcomes while maintaining security and regulatory compliance.

The MosaIQ platform allows analysts, engineers and data scientists to respond more efficiently and provide timely information to support better business decisions. This is mainly because data segregation is more organized, with data bifurcated into PHI and non-PHI.

Reusable design from MyApp integration can be utilized for similar use cases across the company. A significant improvement in performance was noticed due to features like scalability and reduction of in-memory processing.

Cloud platform

AWS.

Technologies used

AWS S3, Lambda, AWS Glue, AWS EMR, AWS DynamoDB, AWS Step Function, AWS CloudFormation, AWS DMS + CDC.

Greenfield Deployment for One of the top biopharmaceutical companies


Summary

The customer is an international clinical-stage biopharmaceutical company focusing on cellular immunotherapy treatments for cancer, adopting cloud services for the very first time. They plan to structure their database on the Google Cloud Platform. The intention is to enhance performance and have efficient research outputs from their applications, especially since they handle large volumes of data. They were also looking at the ability to scale at any point of time during peak loads, along with complete automation of the continuous integration and continuous deployment (CI/CD) process for easier deployments and better auditing, monitoring and log management.

About Customer

The customer is a clinical-stage biopharmaceutical organization with the scientific vision of revolutionizing the treatment of cancer. They specialize in the research, clinical development and commercialization of cancer immunotherapy treatments. The combination of technologies from its academic, clinical and commercial research partners have enabled the company to create a fully integrated approach to the treatment of cancer with immunotherapy. They plan to work with Powerup to use Google Cloud Platform (GCP) as its cloud platform for their Cancer Research program.

Problem Statement

The customer plans to use Google Cloud Platform (GCP) as its cloud platform for their Cancer Research program. Data scientists will be using a Secure File Transfer Protocol (SFTP) server to upload data on an average of one to two times a month with an estimated data volume of 2-6 TB per month.

The data transferred to GCP has to undergo a two-step cleansing process before uploading it on a database. The first step is to do a checksum to match the data schema against the sample database. The second step is transcoding and transformation of data after which the data is stored on a raw database.

Proposed Solution

Greenfield setup on GCP

Understanding customer needs while also understanding the current python models and workflows to be created were the first steps in initiating this project. Post these preliminary studies and sign-off, a detailed plan and solution architecture document formed a part of the greenfield project deliverables.

The setup included shared services, logging, UAT and production accounts. Cloud Deployment Manager (CDM) was configured to manage their servers, networks, infrastructure and web applications. Cloud Identity and Access Management (IAM) roles were created to access different GCP services as per customer specification, which helped in securely accessing other services.

On-premise connectivity is established via VPN tunnels.

The data science team has built 50+ Python/R models that help in data processing, all currently stored in GitHub. The Python models must meet performance expectations when deployed, and CI/CD pipelines are to be created for 48 of them.

Once the data arrives on the database, the customer wants the python code to process the data and store the results on an intermediate database.

Multiple folders were created to deploy production, UAT and management applications. Cloud NAT was set up to enable internet access, Virtual Private Cloud (VPC) peering done for inter-connectivity of required VPCs and SFTP server was deployed on Google Compute Engine.

Once data gets uploaded on the raw GCS, checksum function will be triggered to initiate data cleansing. In the first phase, the data schema will be verified against a sample database after which the data will be pushed to transcoding and transformation jobs. Processed data will be stored to GCS.

All the Python/R models will be deployed as Docker images on a Google-managed Kubernetes cluster, so that GCP takes care of high availability and scaling (see the sketch below).
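A minimal sketch of that deployment flow, assuming a GKE cluster and a Container Registry image; the project ID, cluster and deployment names, and autoscaling bounds are illustrative, not the customer's actual configuration.

# Create a managed cluster, build and push a model image, then deploy and autoscale it
gcloud container clusters create model-cluster --region us-central1 --num-nodes 1

docker build -t gcr.io/PROJECT_ID/r-model:v1 .
docker push gcr.io/PROJECT_ID/r-model:v1

kubectl create deployment r-model --image=gcr.io/PROJECT_ID/r-model:v1
kubectl autoscale deployment r-model --min=1 --max=5 --cpu-percent=80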

The customer will have multiple workflows created to process data that in turn would be able to define all the workflows for python model executions.

The customer team will view the current data through a web application.

The processed data also has to be synced back to the on-premise server. An opensource antivirus tool is used to scan and verify data before migrating to Google Cloud Storage (GCS).

Monitoring and Logging

Monitoring tools such as Stackdriver were used for infrastructure and application monitoring as well as log analytics, since Stackdriver supports features like tracing, debugging and profiling to monitor the overall performance of the application.

Additional tools were utilised: Sensu to monitor infrastructure, Cloud Audit Logging to check Application Program Interface (API) activities, VPC flow logs to capture network logs, and InfluxDB with Grafana to store the data and to visualize it in dashboards, respectively.

Stackdriver logging module ensures centralized logging and monitoring of the entire system.

Security and Compliance

IAM with least-privilege access and Multi-Factor Authentication (MFA) are enabled as additional layers of security for account access. There is no direct access to critical servers like database and app servers. Firewall rules are configured at the virtual networking level for effective protection and traffic control regardless of the operating system used. Only the required ports are opened, giving access to the necessary IP addresses.

Both data in transit and at rest are by default encrypted in GCP along with provisions for static code analysis and container image-level scanning.

CI/CD pipeline

The CI/CD pipeline was set up using Jenkins, an open-source tool that facilitates a modern DevOps environment. It bridges the gap between development and operations by automating the building, testing and deployment of applications.

Benefits

After the successful deployment of code, code integration and log auditing got simpler. The customer was able to handle large blocks of data efficiently and auto-scaling at any point of time during new product launches and marketing events became effortless. This improved their performance as well.

The customer was also able to scale up without worrying about storage and compute requirements. They could move into an Opex model on the cloud by paying as per usage.

Moving to GCP enabled the customer to save 20% of their total costs as they could adopt various pricing models and intelligent data tiering.

Cloud platform

GCP.

Technologies used

Shared VPC, Cloud VPN, Compute Engine, Kubernetes Engine, Cloud Storage, Cloud Security Scanner, Cloud IAM, Cloud Security Command Center, Cloud Registry.

Key evaluating factors in deciding the right Cloud Service Provider


Compiled by Kiran Kumar 

Since August 9, 2006, when then-Google CEO Eric Schmidt introduced the term at an industry conference, cloud computing has been driving the IT industry for the past decade and a half through its outright performance, ease of use, and industry adaptability. Broadly segmented into IaaS, PaaS, SaaS, BPaaS, and Management & Security – with the lines between each of them blurring – it can be challenging to find the right fit for your computing needs. So we have listed a few factors you should consider while evaluating, along with the core considerations for each.

  1. Infrastructure setup
  2. The learning curve
  3. The relevance of the service catalog
  4. Data governance and Security
  5. Partner relationships
  6. SLAs
  7. Consistency and Reliability
  8. Back-Up and Support
  9. Cost
  10. Flexibility and Exit strategy

Infrastructure Setup

Infrastructure setup can have a huge impact on your latency, network speeds, data transfer rates, and so on; a diversified infrastructure setup requires a data center (also referred to as an Availability Zone) that is close to your preferred location.

There are 4 tiers of data centers as defined by the Uptime Institute – the globally accepted standard for data center planning. The exact guidelines and protocols are not fully public, but some of the metrics include redundant electrical paths for power, uptime guarantee, cooling capacity and concurrent maintainability. Tier 4 is the highest standard for data centers. Minimum requirements for Tier 4 data centers are:

  • 99.995 % uptime in a  year. 
  • 2N+1 infrastructure (two times the amount required for operation plus a backup). 
  • Maximum allowed downtime per year of 26.3 minutes. 
  • 96-hour power outage protection.

Data centers can easily be affected by power outages, earthquakes, tornadoes, lightning strikes, etc., and hence require careful planning. Try to get a sense of the key design parameters adopted in setting up the provider's data centers to counter such occurrences. Also, make sure to evaluate the cloud provider's crisis management processes and guidelines, as they showcase how well equipped the provider is to quickly resolve an ongoing crisis.

If your enterprise is into IoT and edge computing check for highly redundant network connectivity (5g) and low latency services to improve response times and save bandwidth. They are key factors in supporting the edge computing environments like real-time securities market forecasting, autonomous vehicles, and transportation traffic routing, etc. 

Key considerations

  • Location (Availability Zone’s)
  • Datacenter tier
  • Crisis management guidelines and protocols
  • Roadmap for upcoming technology support

The learning curve 

Despite its popularity among enterprises – with almost all Fortune 500 companies having at least one cloud provider – there is still a shortage of cloud understanding. Cloud transformation can be challenging and deserves to be treated as a separate project of its own; the lack of necessary cloud skills can cause inefficient migration, leading to unintended consequences and unwanted costs.

Every organization has its strengths and weaknesses; identify your strengths and try to build your cloud infrastructure around them. Start by assessing each cloud provider and the type of offerings available. It is imperative to be cognizant of all the skills required for general operation, governance and compliance, among others. The storage of data is often an afterthought, stemming from a general absence of protocol-related knowledge. This needs to be addressed through upskilling programs and strategic talent acquisitions. It is therefore important to list down, for each provider, how steep the learning curve would be before and after the migration. As a remark, most companies prefer to outsource these functions to specialist managed services providers.

Key considerations

  • Ease of learning
  • Upskilling support from the cloud provider
  • Active communities and partnership 

The relevance of the service catalog

Each cloud provider offers different products and services, but it is important to make sure that your cloud goals align with the provider's vision for improvement. Do your preferences match the provider's standards, SLAs and your security needs? How much re-coding or customization is required at the architectural level to suit your workloads, and what are the associated costs?

Especially if you are looking for SaaS-based services with a high dependency on a particular application, understand the service development roadmap – how the provider continues to innovate, grow and support the product over time. Does their roadmap fit your needs in the long term?

Cloud providers also offer many services to assist you with your cloud transformation, assessment and planning; large public cloud providers may also provide offers custom-made for your organization. In support, they have well-established partner communities that can help you with all your cloud requirements.

Key considerations

  • Services in-line with your needs
  • Rich and broad marketplace and an active developer community
  • Compatibility over the long term

Data Governance and Security

Storing data can be tricky simply because of the diversities in the data law across the world. Every organization needs to be aware of the local data regulations and prevailing privacy laws.

Your choice will depend on which cloud provider offers the most flexibility and compliance; also be aware of the provider's data center locations, and verify them.

Data encryption is another factor that needs your attention. Assess the different modes of encryption available both for data in transit and at rest. Check the provider's history for any major incidents and understand what processes they have in place to quickly resolve issues. High-risk, highly sensitive data can be stored using more secure, encrypted storage solutions, while cheaper storage solutions can hold less sensitive data (inventory information, daily logs, etc.).

On the information-security side check what are the compliance standards followed and any recognizable certifications they hold. However, even with all this in place, it is important for the cloud provider to offer the flexibility to support your own security practices and your commitments to your clients.   

Key considerations

  • Flexible data access and management
  • Data Compliance & Security
  • Wide range of data services  

Partner relationships

It is common practice to have a partner ecosystem to guide and facilitate these transitions to the cloud. It is, therefore, important to assess the provider’s relationship with key vendors, their accreditation levels, technical capabilities, number of projects completed, staff certifications, and the overall expertise they bring to the table. Important to note here, expertise in multi-cloud is a bonus. Powerup, for example, is a top-tier partner with the big three cloud service providers.

If you are largely reliant on SaaS-based services, check for the overall compatibility of the product across the platforms, as some of the SaaS-based services are platform-specific. Look for an active marketplace to buy complementary services that are super compatible.

In some regions, cloud services are made available through a subcontractor, mainly due to local laws (as in the case of China). Such interdependencies have to be uncovered, and the primary SLAs must be guaranteed across all parts of the service.

Key factors

  • Check for accreditations
  • Level of expertise across platforms
  • Relationship with the cloud provider
  • Service compatibility across platforms 

SLAs

Cloud agreements seem complex simply because of the lack of industry standards defining how these contracts should be constructed. However, ISO/IEC 19086-1:2016 tries, to an extent, to facilitate a common understanding between cloud service providers and cloud service customers. Usually, agreements are a mixture of commonly agreed general terms and conditions and some negotiated terms.

Service Levels

Make sure each service objective – accessibility, service availability or uptime in percentages, service capacity and its upper limits in terms of users, connections and resources, response time, and deliverables – is defined. Be clearly aware of your roles and responsibilities related to delivery, provisioning, service management, monitoring, support and escalations, and how responsibilities are split between you and the provider.

Other scenarios include outages or natural disasters: the minimum and maximum accepted downtime, data loss, or recovery times have to be clearly analyzed against your requirements. Control over data access, data location, confidentiality, usage and ownership rights are crucial; check that the standards and commitments around data resilience and data backup are in line with your requirements, with the necessary provisions for a safe exit strategy.

Some of the key business considerations: 

  • Contractual and service governance includes to what extent the provider can unilaterally change the terms of service or contract
  • What are the policies on contract renewals and exits and what are the notice periods?
  • What insurance policies, guarantees, and penalties are included, and some exceptions.

Key considerations

  • Key SLAs
  • How compatible are the terms and conditions with your organization’s goals
  • Are they equipped to support their claims?
  • Are they negotiable?
  • Are the liabilities and responsibilities equally shared?  

Consistency and Reliability 

High availability and reliability are essential for both the CSP and the client in maintaining customer confidence and preventing revenue losses due to service level agreement (SLA) violation penalties. Cloud computing has appealed to a larger audience in recent years for supporting mission-critical systems. However, the lack of consistency in cloud services is quickly becoming a major issue. According to 2018 research reports, about $285 million has been lost yearly due to cloud service failures, with availability of about 99.91%.

No cloud platform is perfect and downtime will occur, so try to measure providers against their SLAs for the last 6-12 months; this data is mostly available online, if not on request. Check what learnings they take away from such occurrences and how consistent they have been in meeting the recovery times stated in their SLAs. Also, ensure the monitoring and reporting tools on offer are sufficient and can integrate neatly into your overall management and reporting systems.

Key factors

  • Check for consistency in delivery through past performance.
  • Fault management and reporting systems. 

Back-Up and Support

Check what back-up provisions and processes are in place and understand the limits of the provider's ability to support your data preservation expectations. Roles, responsibilities, escalation processes, and who has the burden of proof must all be clearly documented in the service agreement, taking into consideration the increasing criticality of data, data sources, scheduling, backup, restore, integrity checks, etc. Consider purchasing additional risk insurance if the costs associated with recovery are not covered by the provider's terms and conditions.

Cloud providers offer a wide range of support services to help their customers at each step – migration, managed services, etc. The support delivery medium is also important: where do you want them to be available – a phone call, chat, or email? And if support is offered through a partner, consider the expertise they bring to you. Staff certification is a good barometer of the quality of the support on offer.

Key considerations

  • Well equipped multi-channel support services (24/7)
  • Clear documentation around the roles and responsibilities
  • Insurance coverage

Cost

Don't just go by the list price: providers might offer services at a low cost but may not offer you the optimum level of performance, while more expensive does not necessarily mean better. The correct approach is to compare all of them against your core requirements as illustrated above. It is not uncommon to ask for offers, so do so with the ultimate goal of incorporating all the services you need in the desired price range. Studies suggest that just basic optimization can save about 10% of your cloud cost, and some cloud providers have tried and tested ways through which you can save costs. A multi-cloud approach can also give you far more value and flexibility – read more about it here.

Key considerations

  • Price to value comparison
  • Look for offers
  • Predefined guidelines to save cost 

Flexibility & Exit Strategy

Vendor lock-in is an important factor in most considerations; some of it cannot be avoided, so it's best to check whether the provider has minimal use of such services. This is where adopting open-source services can be an effective workaround. Also, stay aware of updates that drastically change the working model, policies or technologies of a product to favor a particular platform, and make sure to have policies in place to counter such situations.

Exiting a CSP can be tricky; it boils down to the exit provisions provided by the service provider and the service levels agreed upon by both parties. All your digital assets, starting from your data, products and services, need their own exit strategy, which needs to be integrated deeply into your cloud transformation plan. Most organizations don't include an exit strategy in their cloud adoption roadmap, which leads to lack of preparation, wasted effort, and penalties due to exceeding the exit duration.

Key considerations

  • Ease of transition
  • Support for open source
  • Exit provisions

Azure Data Factory – Self Hosted Integration Runtime Sharing


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Contributors: Sagar Gupta, Senior Software Engineer | Amruta Despande, Azure Associate Architect | Suraj R, Azure Cloud Engineer

Introduction

Continuing our discussion on Azure Data Factory (ADF) from our previous blogs: in the past we have discussed ADF and the configuration steps for a high-availability self-hosted integration runtime (IR). You can read more about that here: Azure Data Factory – Setting up Self-Hosted IR HA enabled

This is a quick, short post on IR sharing in ADF for better cost optimization and resource utilization. It also covers common shortcomings while creating ADFs using Terraform and/or the SDKs.

Use Case

This is again part of a major data migration assignment from AWS to Azure. We are extensively using ADF to set up ETL pipelines and migrate data effectively – both historical and incremental.

Problem Statement

Since the data migration activity involves different types of databases and complex data operations, we are using multiple ADFs to achieve this. Handling private production data requires self-hosted IRs to be configured to connect to the production environment. The general best practice for a self-hosted IR is a high-availability architecture: an IR can have a maximum of 4 nodes, with a minimum of 2 nodes for high availability. So here arises the problem – for multiple ADFs, how many such self-hosted IRs should one use?

Solution

This is where IR sharing comes into the picture. ADF has a brilliant feature wherein many ADFs can share the same IR, with the advantage of reducing both price and resources. Suppose you had to run 2 ADFs – one to perform various heavy migrations from AWS RDS MySQL to Azure, and the other for AWS RDS PostgreSQL. Ideally we would have created 2 different IRs, one each able to connect to MySQL and PostgreSQL separately. For a production-level implementation, this would mean 2 x 4 = 8 nodes (Windows VMs). Using IR sharing, we can instead create one self-hosted IR with 4 nodes and share it with both ADFs, cutting the cost of 4 extra nodes. Please note: IR node sizing depends on your workloads – that is a separate calculation; this is only a high-level view.

Steps to enable IR sharing between ADFs

Step 1: Log in to the Azure Portal.

Step 2: Search for Data Factories in the main search bar.

Step 3: Select your Data Factory. Click on Author & Monitor.

Click on the pencil icon to edit.

Step 4: Click on Connections. Open Management Hub.

Step 5: Click on Integration runtimes to view all your IRs. Select your self-hosted IR for which you want to enable sharing.

Refer to https://www.powerupcloud.com/azure-data-factory-setting-up-self-hosted-ir-ha-enabled/ for detailed information on creating self-hosted IRs.

Step 6: This opens the Edit integration runtime tab on the right side. Go to Sharing and Click on + Grant permission to another Data Factory.

Copy the Resource ID from this step. We will use it in Step 9.

This will list down all ADFs with which you can share this IR.

Step 7: You can either search for your ADF or manually enter the service identity application ID. Click on Add.

Note: You may sometimes be unable to find the ADF in this dropdown list. Even though your ADF is listed on the Data Factories page, it does not show up in this list, which can leave you puzzled. Not to worry – such a case can arise when you create ADFs programmatically through the Azure APIs or through Terraform. Don't forget to add the optional identity parameter while creating; this assigns a system-generated identity to the resource.

Sample Terraform for ADF

provider "azurerm" {
    version = "~>2.0"
  features {}
}

resource "azurerm_data_factory" "adf-demo" {
  name                = "adf-terraform-demo"
  location            = "East US 2"
  resource_group_name = "DEMO-ADF-RG"
  identity {
    type = "SystemAssigned"
  }
}

To locate the service identity ID of the ADF, go to the Data Factories page, select the ADF and click on Properties.
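If you prefer the CLI, the same principal ID can be read with a generic resource query; the resource group and factory names below match the Terraform sample above.

az resource show \
  --resource-group DEMO-ADF-RG \
  --name adf-terraform-demo \
  --resource-type "Microsoft.DataFactory/factories" \
  --query identity.principalId -o tsv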

Step 8: Click on Apply for the changes to effect.

In case you do not have the required permissions, you might get the following error:

Error occurred when grant permission to xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx. Error: {"error":{"code":"AuthorizationFailed","message":"The client 'xxxxxxxx@powerupcloud.com' with object id 'xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx/resourcegroups/DEMO-ADF-RG/providers/Microsoft.DataFactory/factories/adf-terraform-demo-Postgres-to-MySQL/integrationRuntimes/integrationRuntime3/providers/Microsoft.Authorization/roleAssignments/xxxxxxxx-xxxx-xxxx-xxx-xxxxxxxxx' or the scope is invalid. If access was recently granted, please refresh your credentials."}}

Step 9:

Now go to the ADF with which the IR has to be shared (the one added in the sharing list – adf-terraform-demo). Go to Connections → Integration runtimes → + New → Azure, Self-Hosted.

Here you will find the Type as Self-Hosted (Linked). Enter the Resource ID from Step 6 and click Create.

After successful creation, you can find the new IR with the sub-type Linked.

The IR sharing setup is complete. Be seamless with your ADF pipelines now.

Conclusion

Sharing IRs between ADFs saves greatly on infrastructure costs. Sharing is simple and effective. We will come up with more ADF use cases and share our problem statements, approaches and solutions.

Hope this was informative. Do leave your comments below for any questions.

Read the series here

Significance of BI tools in the Era of Big Data


Written by Anjali Sharma, Software Engineer at Powerupcloud Technologies

Demand for business intelligence (BI) tools in the big data world has boomed these days. Today, after big data, one of the most used buzzwords in the business world is business intelligence. So how do the two relate to each other? The ascendance of business intelligence to the highest priority of most companies has meant that BI analysts are highly sought after. BI tools have enabled organizations to get revealing insights into their operations and processes and use them to improve productivity, boost revenue, cut costs, etc.

BI refers to the business strategy and technological tools used for analysing business information, including analysis of historical data, analysis of current data and future predictions. Hence, BI is a business discipline as much as it is a technological discipline. As the technological part of BI, companies use various databases and data analytics tools, which comprise their enterprise BI infrastructure. BI tools have been around for decades; however, in recent years the advent of big data and artificial intelligence technologies has increased the number and broadened the functionalities of BI technologies.

Gone are the days when business was assumed to be like gambling. In those days, there was no option other than making ‘the perfect guess.’ But now, when it comes to a company’s future, this is no longer an appropriate way to arrive at a strategy. With the help of business intelligence software, one can have accurate data, real-time updates, and means for forecasting and even predicting conditions.

Assortments – a BI tool can take several forms according to business demand or technical requirements:

  • Data visualization tools
  • Data mining tools
  • Reporting tools
  • Querying tools
  • Analysis tools
  • Geolocation analysis tools, etc.

How Tableau becomes the most Powerful BI tool

Now let's understand how, among all BI tools, Tableau becomes the most powerful and user-friendly.

Tableau offers powerful and sophisticated data collection, analysis and visualizations. One of the claims on Tableau's website is "Tableau helps people see and understand their data." Tableau allows users to drill deep into data, create powerful visualizations to analyse the information, and automatically produce valuable business insights.

Several Data Source Connections

One of the main strengths of Tableau is that it can automatically connect with hundreds of data sources without any programming needed, including big data providers.

Tableau is one of the leading BI tools for big data and Hadoop. It provides connectivity to various Hadoop data sources like Hive, Cloudera, Hortonworks, etc. And not only Hadoop – Tableau can connect to over 50 different data sources, including AWS and SAP.

Drag & Drop facility

Tableau's drag-and-drop facility makes it really easy and user-friendly. Tableau is designed with most interaction taking place through drag-and-drop icons. You can quickly create visuals from data by dragging the icon for the relevant data set into the visualisation area. In other words, you can access visualisations that reveal important insights within a few clicks.

Live and Extracted Data Connection

Tableau allows users to connect to both live and extracted data, and to instantly switch between live data connections and pre-extracted data. You can also schedule extract refreshes and get notifications when live data connections fail.

Security

Users can collaborate securely across networks or the cloud, using Tableau Server and Tableau Online. This allows rapid sharing of insights, meaning that people can take action more quickly to save costs or make more money for the business.

The above-mentioned features of Tableau set it apart from other BI tools. Data is growing faster than ever; with the proliferation of the internet, we now generate even more information. According to IBM, 2.5 quintillion bytes of data are created every day – yet less than 0.5% of it is ever analysed and used. Therefore, the importance of data analysis tools has increased. For the past 6 years Tableau has been the leader among data analysis and visualisation tools. Specializing in beautiful visualizations, Tableau lets you perform complex tasks with simple drag-and-drop functionality and numerous types of charts.

If you are a beginner, for better understanding let's do some hands-on work in Tableau with sample data. Here I am using a skill registry dataset: we created a Google Form for the employees of our organisation and shared it with them, so they could fill in their name, email address, skills, total experience, etc. After collecting the data, we created a CEO dashboard.

Download and install the Tableau Desktop 14-day trial version:

https://www.tableau.com/en-gb/products/trial

You can also try the free Tableau Public version 2020.2.

Open Tableau and connect to the data source where your data resides; Tableau provides more than 100 data sources we can connect to.

After connecting the data source, check whether the data is in the correct format, whether any data source filter needs to be applied, or whether the Data Interpreter should be used, etc. Connections can be Live or Extract as per the requirements.

What is Live & Extract? (Refer the link given below)

https://www.tableau.com/about/blog/2016/4/tableau-online-tips-extracts-live-connections-cloud-data-53351

If the data in one table is not sufficient, you can bring in another table using joins.

Now go to the sheet. This is the first step toward creating your very first dashboard.

Tableau divides its data into two types: Measures and Dimensions.

Dimensions contain qualitative values like Name, Date, Country, etc.

Measures are fields that can be aggregated or used for mathematical operations; in short, the numeric values of the dimensions are measures.

As I am using employee data, I can plot their locations in one sheet using a map chart.

For another view, I have put employees' skills in two different sheets – skill categories and skill sub-categories – using a count of names as the measure, so that we can analyze how many resources we have in each skill category.

In the last view, I have added resource information like email address and service group; I have also added resumes using an action filter.

Now go to the dashboard icon, put all the sheets together and create a visual representation. You can apply filters according to the requirements and use the Format option to make your dashboard clean and colorful.

(Data security is the reason why I have hidden the counts and resource information)

For practice, you can download sample data from https://www.kaggle.com/datasets and create your own dashboard.

DevOps for Databases using Liquibase, Jenkins and CodeCommit


Written by Arun Kumar, Associate Cloud Architect at Powerupcloud Technologies

With infrastructure modernization, we are moving complete architectures to a microservices-based model with CI/CD deployment methods, and these techniques suit any application deployment. But most of the time, database deployments are still manual efforts. Applications and databases are growing day by day; the database size and operational activities in particular are getting complex, and maintaining this is a tedious task for a database administrator.

For enterprise organizations this is even more complex when it comes to managing multiple DB engines with hundreds of databases or multi-tenant databases. Below are a couple of the challenges that DBAs face today, all of which are manual activities:

  • Creating or modifying stored procedures, triggers and functions in the database.
  • Altering tables in the database.
  • Rolling back any database deployment.
  • Developers need to wait for the DBA to make any new changes in the database, which increases the TAT (turnaround time) to test new features, even in non-production environments.
  • Security concerns: granting access to the database to make changes, and maintaining that access, is a huge overhead.
  • With vertical scaling and different DB engines, the databases are difficult to manage.

One of our enterprise customers faced all of the above challenges. To overcome them, we explored various tools and settled on a strategy of using Liquibase for deployments. Liquibase supports standard SQL databases (SQL Server, MySQL, Oracle, PostgreSQL, Redshift, Snowflake, DB2, etc.), and the community is improving its support for NoSQL databases; it now supports MongoDB and Cassandra. Liquibase helps us with versioning, deployment and rollback.
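
To make the versioning and rollback claim concrete, below is a minimal sketch of the Liquibase CLI workflow, assuming Liquibase 3.8.x is installed under /opt/liquibase with the SQL Server JDBC driver alongside it (both are set up later in this post); the endpoint, database name, credentials and changelog path are placeholders.

# Sketch only: endpoint, database, credentials and changelog path are placeholders
export PATH=/opt/liquibase/:$PATH
LB_ARGS=(
  --driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
  --classpath=/opt/liquibase/mssql-jdbc-7.4.1.jre8.jar
  "--url=jdbc:sqlserver://mydb.xxxx.us-east-1.rds.amazonaws.com:1433;databaseName=employee;"
  --changeLogFile=/opt/db/db-script.sql
  --username=xxxx --password=xxxx
)
liquibase "${LB_ARGS[@]}" status --verbose   # list changesets that have not been applied yet
liquibase "${LB_ARGS[@]}" updateSQL          # preview the SQL that update would run
liquibase "${LB_ARGS[@]}" update             # apply pending changesets
liquibase "${LB_ARGS[@]}" rollbackCount 1    # roll back the most recently applied changeset

Every applied changeset is tracked in the DATABASECHANGELOG table, which is what makes controlled rollbacks possible.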

Drawing on our DevOps experience, we integrated two open-source tools, Liquibase and the Jenkins automation server, for continuous deployment. This solution can be implemented on any cloud platform or on-premise.

Architecture

For this demonstration we will be using the AWS platform, with MS SQL as the main database. Let's see how to set up a CI/CD pipeline for the database.

Prerequisites:

  • A sample repository set up in AWS CodeCommit.
  • A Jenkins server up and running.
  • A notification service configured in Jenkins.
  • RDS for MS SQL up and running.

Set up the AWS CodeCommit repo:

To create a code repository in AWS CodeCommit, refer to the following link.

https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-https-windows.html

Integration of CodeCommit with Jenkins:

To trigger the webhook from AWS CodeCommit, we need to configure AWS SQS and SNS. Please follow the link:

https://github.com/riboseinc/aws-codecommit-trigger-plugin

For the webhook connection from CodeCommit to Jenkins, we need to install the AWS CodeCommit Trigger Plugin.

Select -> Manage Jenkins -> Manage Plugins -> Available -> AWS CodeCommit Trigger Plugin.

  • In Jenkins, create a new freestyle project.
  • In Source Code Management, add your CodeCommit repository URL and credentials.

Jenkins -> Manage Jenkins -> Configure System -> AWS CodeCommit Trigger SQS Plugin.

Installation and configuration of Liquibase:

# Liquibase needs Java; OpenJDK 8 is available in the standard Ubuntu repositories
sudo apt update
sudo apt install -y openjdk-8-jdk
java -version
# Download Liquibase 3.8.1 and unpack it under /opt/liquibase
sudo mkdir -p /opt/liquibase
cd /opt/liquibase/
sudo wget https://github.com/liquibase/liquibase/releases/download/v3.8.1/liquibase-3.8.1.tar.gz
sudo tar -xvzf liquibase-3.8.1.tar.gz

Based on your database, you need to download the JDBC driver (jar file) into the same Liquibase directory. Go through the following link:

https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver15

Integration of Jenkins with Liquibase:

During the deployment, Jenkins will SSH into the Liquibase instance, so we need to generate an SSH key pair for the Jenkins user and add the public key to the Linux user on the Liquibase server. Here we use the ubuntu user on the Liquibase server; a sketch of the key setup follows.
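
A minimal sketch of the key exchange, assuming the default Jenkins home of /var/lib/jenkins; the Liquibase server address is a placeholder:

# Run on the Jenkins server: create a key pair for the jenkins user
sudo -u jenkins ssh-keygen -t rsa -b 4096 -f /var/lib/jenkins/.ssh/id_rsa -N ""
sudo cat /var/lib/jenkins/.ssh/id_rsa.pub
# Append the printed public key to /home/ubuntu/.ssh/authorized_keys on the Liquibase server,
# then verify the connection from Jenkins:
sudo -u jenkins ssh ubuntu@<liquibase-server-ip> "echo connected"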

Prepare the deployment scripts on the Liquibase server.

For Single database server deployment: singledb-deployment.sh

#!/bin/bash
# Script Name: singledb-deployment.sh
set -x
# Git commit id pushed by the Jenkins job; used to stamp the changeset
GIT_COMMIT=`cat /tmp/gitcommit.txt`
# Start from the changelog template and replace the placeholder change id with the commit id
sudo cp /opt/db/temp/temp.sql /opt/db/db-script.sql
old=$(sudo cat /opt/db/db-script.sql | grep 'change' | cut -d ":" -f 2)
sudo sed -i "s/$old/$GIT_COMMIT/g" /opt/db/db-script.sql
# The first line of the file "test" holds <RDS endpoint>:<database name>
dburl=`cat /home/ubuntu/test | head -1 | cut -d ":" -f 1`
dbname=`cat /home/ubuntu/test | head -1 | cut -d ":" -f 2`
# Drop the server line; the remaining lines are the SQL to deploy
sed -i -e "1d" /home/ubuntu/test
sudo sh -c 'cat /home/ubuntu/test >> /opt/db/db-script.sql'
export PATH=/opt/liquibase/:$PATH
echo DB_URLs is $dburl
echo DB_Names is $dbname
# Run a Liquibase update against each database name
for prepare in $dbname; do liquibase --driver=com.microsoft.sqlserver.jdbc.SQLServerDriver --classpath="/opt/liquibase/mssql-jdbc-7.4.1.jre8.jar" --url="jdbc:sqlserver://$dburl:1433;databaseName=$prepare;integratedSecurity=false;" --changeLogFile="/opt/db/db-script.sql" --username=xxxx --password=xxxxx update; done
# Clean up the working files
sudo rm -rf /opt/db/db-script.sql /home/ubuntu/test /tmp/gitcommit.txt

For Multi database server deployment: Multidb-deployment.sh

#!/bin/bash
# Script Name: Multidb-deployment.sh
set -x
# Git commit id pushed by the Jenkins job; used to stamp the changeset
GIT_COMMIT=`cat /tmp/gitcommit.txt`
sudo cp /opt/db/temp/temp.sql /opt/db/db-script.sql
old=$(sudo cat /opt/db/db-script.sql | grep 'change' | cut -d ":" -f 2)
sudo sed -i "s/$old/$GIT_COMMIT/g" /opt/db/db-script.sql

# Split the file "test" on the #----# separator:
# test00 -> server list (<RDS endpoint>:<database name> per line), test01 -> SQL statements
csplit -sk /home/ubuntu/test '/#----#/' --prefix=/home/ubuntu/test
# Strip the #----# separator line from the SQL half
sed -i -e "1d" /home/ubuntu/test01
# Append the SQL to the changelog once (appending inside the loop would duplicate it per server)
sudo sh -c 'cat /home/ubuntu/test01 >> /opt/db/db-script.sql'
export PATH=/opt/liquibase/:$PATH
# Run a Liquibase update against every server:database listed before the separator
while IFS=: read -r db_url db_name; do
echo "########"
echo db_url is $db_url
echo db_name is $db_name
for prepare in $db_name; do liquibase --driver=com.microsoft.sqlserver.jdbc.SQLServerDriver --classpath="/opt/liquibase/mssql-jdbc-7.4.1.jre8.jar" --url="jdbc:sqlserver://$db_url:1433;databaseName=$prepare;integratedSecurity=false;" --changeLogFile="/opt/db/db-script.sql" --username=xxxx --password=xxxx update; done
done < /home/ubuntu/test00
# Clean up the working files
sudo rm -rf /opt/db/db-script.sql /home/ubuntu/test* /tmp/gitcommit.txt
  • In your Jenkins job, use an "Execute shell" build step to run the commands; a sketch of this step is shown below.
  • The file test comes from your CodeCommit repo and contains the SQL server information and the SQL queries to deploy.
  • The example job below targets multiple database servers, so it triggers the Multidb-deployment.sh file. If you are using a single SQL Server deployment, use singledb-deployment.sh instead.
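
A minimal sketch of that shell build step, assuming the Jenkins Git plugin exposes GIT_COMMIT and the SSH access set up earlier is in place; the server address is a placeholder:

# Ship the commit id and the "test" file to the Liquibase server, then run the deployment script
LIQUIBASE_HOST=ubuntu@<liquibase-server-ip>
echo "$GIT_COMMIT" > gitcommit.txt
scp gitcommit.txt "$LIQUIBASE_HOST":/tmp/gitcommit.txt
scp test "$LIQUIBASE_HOST":/home/ubuntu/test
ssh "$LIQUIBASE_HOST" "bash /home/ubuntu/Multidb-deployment.sh"   # or singledb-deployment.sh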

Prepare a sample SQL database for the demo:

CREATE DATABASE employee;
use employee;
CREATE TABLE employees
( employee_id INT NOT NULL,
  last_name VARCHAR(30) NOT NULL,
  first_name VARCHAR(30),
  salary VARCHAR(30),
  phone VARCHAR(15) NOT NULL,   -- stored as text; the sample values would overflow an INT
  department VARCHAR(30),
  emp_role VARCHAR(30)
);
INSERT into [dbo].[employees] values ('1', 'kumar', 'arun', '1000000', '9999998888', 'devops', 'architect');
INSERT into [dbo].[employees] values ('2', 'hk', 'guna', '5000000', '9398899434', 'cloud', 'engineer');
INSERT into [dbo].[employees] values ('3', 'kumar', 'manoj', '900000', '98888', 'lead', 'architect');

Deployment 1: (for single SQL server deployment)

We are going to insert a new row through the CI/CD pipeline.

  • db-mssql: the CodeCommit repo.
  • test: the SQL Server information (RDS endpoint:DBname) and the SQL that we need to deploy; a sample file is sketched below.
  • Once we commit our code to the repository (CodeCommit), the webhook triggers the deployment.
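
A hypothetical test file for this single-server case (the endpoint and values are placeholders): the first line carries the server information and the remaining lines carry the SQL.

cat > test <<'EOF'
mydb.xxxx.us-east-1.rds.amazonaws.com:employee
INSERT into [dbo].[employees] values ('4', 'r', 'priya', '800000', '9876543210', 'data', 'engineer');
EOF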

Check the SQL Server to verify that the row was inserted:

Deployment 2: (multiple SQL Servers, deploying the same SQL statements)

  • db-mssql: the CodeCommit repo.
  • test: the SQL Server information (RDS endpoint:DBname) and the SQL that we need to deploy.
  • #----#: this is the separator between the server list and the SQL queries, so do not remove it; a sample layout is sketched below.
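
A hypothetical test file for the multi-server case (endpoints and values are placeholders): server entries come first, one per line, followed by the #----# separator and then the SQL to run on every server.

cat > test <<'EOF'
mydb-1.xxxx.us-east-1.rds.amazonaws.com:employee
mydb-2.xxxx.us-east-1.rds.amazonaws.com:employee
#----#
INSERT into [dbo].[employees] values ('5', 's', 'ram', '750000', '9876501234', 'cloud', 'engineer');
EOF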

Deployment 3: (multiple SQL Servers, deploying the same SQL stored procedure)

  • db-mssql: the CodeCommit repo.
  • test: the SQL Server information (RDS endpoint:DBname) and the SQL that we need to deploy.
  • #----#: this is the separator between the server list and the SQL queries, so do not remove it.

Notification:

  • Once the job is executed, you will receive an email notification.

Liquibase Limitations:

  • Comments inside a function or stored procedure do not get updated in the database.

Conclusion:

Here we used Liquibase on AWS, so the pipeline relied on RDS, CodeCommit and related services. However, you can use the same method to configure an automated deployment pipeline, with versioning and rollback, for databases on AWS RDS, Azure SQL Database, Google Cloud SQL or Snowflake using the open-source tool Liquibase together with Jenkins.

Migration: Assessment & Planning for one of the largest low-cost airlines

By | Case Study, Cloud Case Study | No Comments

About Customer

The customer is a UAE-based aviation corporation that has catered to over 70 million passengers to date. Their ticket booking application, the Passenger Service System (PSS), was a legacy system that they intended to migrate to a cloud environment while ensuring they could leverage managed cloud services, starting with a Migration Readiness Assessment & Planning (MRAP) exercise.

Problem Statement

The Passenger Service System (PSS) was the customer's existing ticket booking application. The objective of the MRAP assessment was to understand this legacy system and recommend how it could be migrated to AWS while leveraging cloud-native capabilities. The focus would be application modernization rather than a lift & shift migration to the cloud. The customer team intends to leverage managed cloud services and adopt serverless, containers and open source wherever possible. They also want to move away from the commercial Oracle database to Amazon Aurora PostgreSQL, owing to the high licensing costs imposed by Oracle.

MRAP is critical for any organization planning to adopt the cloud, as this tool-based assessment checks an application's readiness for migration. Powerup was approached to perform MRAP on the existing setup and, following the analysis, to propose a migration plan and roadmap.

Proposed Solution

The customer’s MRAP Process

To begin with, the RISC Networks RN150 virtual appliance, an application discovery tool offered as an optional deployment architecture, was configured and installed in the customer's existing PSS Equinix data centre (DC) to collect data and produce a detailed tool-based assessment of the existing setup's readiness for migration.

Application stacks were built for the applications in scope, and assessments as well as group interviews were conducted with all stakeholders. The data gathered from stakeholders was cross-verified with the information provided by the customer's IT and application teams to bridge any gaps. The Powerup team would then work on a proposed migration plan and roadmap.

MRAP Deliverables

A comprehensive and detailed MRAP report included the following information:

Existing overall architecture

The existing PSS system was bought from a vendor called Radixx International, which provided three major services:

  • Availability service, an essential core service mainly used by online travel agencies (OTAs), end users and global distribution systems (GDS) to check the availability of the customer's flights. Its base system contained modules such as Connect Point CP (core), payments and the enterprise application (Citrix app), all written in .NET, plus the enterprise application for operations and administration written in VB6.
  • Reservation service was used for booking passengers' tickets, with data stored in two sessions, Couchbase and the Oracle database. Web traffic was roughly 1000:1 compared to the availability service.
  • DCS system (Check-in & Departure Control Systems) is another core system of any airline, assisting with passenger check-in, baggage check-in and alerting the required officials. It is a desktop application used by airport officials to manage passengers from one location to another, with an online check-in module also available.

Existing database: Oracle is the current core database storing all critical information. It consists of 4 nodes: 2 read-write nodes in RAC1 and another 2 (read-only) nodes in RAC2. All availability checks are directed to the read-only Oracle nodes. The Oracle database nodes are heavily utilized, at roughly 60-70% on average, with 14 schemas within the Oracle database currently accessed by the various modules. Oracle Advanced Queuing is used in some cases to push data to the Oracle database.

Recommended AWS Landing zone structure

The purpose of AWS Landing Zone is to set up a secure, scalable, automated multi-account AWS environment derived from AWS best practices while implementing an initial security baseline through the creation of core accounts and resources.

The following Landing Zone Account structure was recommended for the customer:

AWS Organizations Account:

Primarily used to manage configuration and access to AWS Landing Zone managed accounts, the AWS organizations account provides the ability to create and financially manage member accounts.

Shared Services Account:

It is a reference for creating infrastructure shared services. In the customer’s case, Shared Services Account will have 2 VPCs – one for management applications like AD, Jenkins, Monitoring Server, Bastion etc. and other Shared services like NAT Gateway & Firewall. Palo Alto Firewall will be deployed in the shared services VPC across 2 Availability Zones (AZ)s and load balanced using AWS Application Load Balancer.

AWS SSM will be configured in this account for patch management of all the servers. AWS Pinpoint will be configured in this account to send notifications to customer – email, SMS and push notifications.

Centralized Logging Account:

The log archive account contains a central Amazon S3 bucket for storing copies of all logs like CloudTrail, Config, CloudWatch logs, ALB Access logs, VPC flow logs, Application Logs etc. The logging account will also host the Elasticsearch cluster, which can be used to create custom reports as per customer needs, and Kibana will be used to visualize those reports. All logs will be pushed to the current Splunk solution used by the customer for further analysis.

Security Account:

The Security account creates auditor (read-only) and administrator (full-access) cross-account roles from a security account to all AWS Landing Zone managed accounts. The organization’s security and compliance team can audit or perform emergency security operations with this setup and this account is also designated as the master Amazon GuardDuty account. Security Hub will be configured in this account to get a centralized view of security findings across all the AWS accounts and AWS KMS will be configured to encrypt sensitive data on S3, EBS volumes & RDS across all the accounts. Separate KMS keys will be configured for each account and each of the above-mentioned services as a best practice.

Powerup recommended Trend Micro as the preferred anti-malware solution and the management server can be deployed in the security account.

Production Account:

This account will be used to deploy the production PSS application and the supporting modules. High availability (HA) and DR will be considered for all deployments in this account, and auto-scaling will be enabled wherever possible.

UAT Account – Optimized Lift & Shift:

This account will be used to deploy the UAT version of the PSS application. HA and scalability are not a priority in this account. It is recommended to shut down the servers during off-hours to save cost.

DR Account:

Based on the understanding of the customer’s business a Hot Standby DR was recommended where a scaled-down version of the production setup will be always running and will be quickly scaled up in the event of a disaster.

UAT Account – Cloud-Native:

The account is where the customer’s developers will test all the architectures in scope. Once the team has made the required application changes, they will use this account to test the application on the cloud-native services like Lambda, EKS, Fargate, Cognito, DynamoDB etc.

Application Module – Global Distribution Systems (GDS)

A global distribution system (GDS) is one of the 15 modules of the PSS application. It is a computerized network system that enables transactions between travel industry service providers, mainly airlines, hotels, car rental companies and travel agencies, by exposing real-time inventory (e.g., the number of hotel rooms, flight seats or cars available) to service providers.

  • The customer gets bookings from various GDS systems like Amadeus, Sabre, Travelport etc.
  • ARINC is the provider, which connects the client with various GDS systems.
  • The request comes from GDS systems and is pushed into the IBM MQ cluster of ARINC where it’s further pushed to the customer IBM MQ.
  • The GMP application then polls the IBM MQ queue and sends the requests to the PSS core, which in turn reads/writes to the Oracle DB.
  • GNP application talks with the Order Middleware, which then talks with the PSS systems to book, cancel, edit/change tickets etc.
  • Pricing is provided by the Offer Middleware.

Topology Diagram from RISC tool showing interdependency of various applications and modules:

Any changes in the GDS architecture can break the interaction between applications and modules or cause a discrepancy in the system that might lead to a compromise in data security. In order to protect the system from becoming vulnerable, Powerup recommended migrating the architecture as is while leveraging the cloud capabilities.

Proposed Migration Plan

The IBM MQ cluster will be set up on EC2, and auto-scaling will be enabled to maintain the required number of nodes, ensuring availability of EC2 instances at all times. IBM MQ will be deployed in a private subnet.

Amazon Elastic File System (Amazon EFS) will be automatically mounted on the IBM MQ server instance for distributed storage, to ensure high availability of the queue manager service and the message data. If the IBM MQ server fails in one availability zone, a new server is created in the second availability zone and connected to the existing data, so that no persistent messages are lost.
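
A minimal sketch of that shared-storage setup on an IBM MQ node; the file system id and mount path are placeholders, and amazon-efs-utils is assumed to be available on the AMI:

# Mount the shared EFS file system that holds the queue manager data
sudo yum install -y amazon-efs-utils
sudo mkdir -p /var/mqm
sudo mount -t efs -o tls fs-0123456789abcdef0:/ /var/mqm
# Persist the mount across reboots
echo "fs-0123456789abcdef0:/ /var/mqm efs _netdev,tls 0 0" | sudo tee -a /etc/fstab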

An Application Load Balancer will be used to automatically distribute connections to the active IBM MQ server. The GMP application and the PNL & ADL application will be deployed on EC2 across 2 AZs for high availability. GMP will be deployed in an auto-scaling group that scales based on the queue length in the IBM MQ server, so that messages are consumed and processed as soon as possible, whereas PNL & ADL will scale out in case of high traffic.

APIS Inbound Application, AVS application, PSF & PR application and the Matip application will all be deployed on EC2 across 2 AZs for high availability in an auto-scaling group to scale out in case of high traffic.

Cloud-Native Architecture

  • The GMP and GMP code-sharing applications will be deployed as Lambda functions. The Lambda function will run when a new message arrives in IBM MQ.
  • The PNL & ADL application will be deployed as a Lambda function that runs when the PNR number changes, in which case a message must be sent to the airport.
  • The AVS application will be deployed as Lambda functions that run when a message is sent to the external systems.
  • The Matip application will be deployed as a Lambda function that runs when a message is sent using the MATIP protocol.
  • The PFS & PR application will be deployed as Lambda functions that run when a message is sent to the airport for booking.
  • The APIS Inbound application will be deployed as a Lambda function that runs when an APIS message is sent to the GDS systems.

For all the above, required compute resources will be assigned as per the requirement. Lambda function will scale based on the load.

Application modifications recommended

All the application components such as GMP, AVS, PNL & ADL, PFS & PR, Matip, etc. are currently in .NET and have to be moved to .NET Core to run as Lambda functions. The applications are also recommended to be broken down into microservices.

Oracle to Aurora Database Migration

The AWS Schema Conversion Tool (SCT) is run against the source database to generate a schema conversion report, which helps in understanding the interdependencies of the existing schemas and how they can be migrated to Aurora PostgreSQL. The report lists the database objects that can be converted directly by SCT and those that need manual intervention. For Oracle functionality that is not supported in Aurora PostgreSQL, the application team must write custom code as part of the migration. Once all the schemas are migrated, AWS Database Migration Service (DMS) will be used to migrate the entire data set from Oracle to Aurora.
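
As an illustration of that last step, here is a minimal sketch of creating and starting a DMS task from the CLI; the ARNs, identifiers and table-mappings file are placeholders, not values from the actual engagement.

# Create a full-load + CDC task from the Oracle source to the Aurora PostgreSQL target
aws dms create-replication-task \
  --replication-task-identifier pss-oracle-to-aurora \
  --source-endpoint-arn arn:aws:dms:ap-south-1:111122223333:endpoint:oracle-src \
  --target-endpoint-arn arn:aws:dms:ap-south-1:111122223333:endpoint:aurora-tgt \
  --replication-instance-arn arn:aws:dms:ap-south-1:111122223333:rep:pss-dms-instance \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json

# Start the task once it is ready
aws dms start-replication-task \
  --replication-task-arn arn:aws:dms:ap-south-1:111122223333:task:pss-oracle-to-aurora \
  --start-replication-task-type start-replication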

Oracle to Aurora-PostgreSQL Roadmap

  • Lift & shift:

The current Oracle database will be moved to AWS as-is, without any changes, in order to kick-start the migration. The Oracle database can run on the AWS RDS service or on EC2 instances. One RDS node will be the master database in read/write mode; the master instance is the only instance the application can write to. There will be 3 additional read replicas spread across 2 AZs of AWS to handle the incoming read requests. If the master node goes down, one of the read replicas is promoted to master.
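
A minimal sketch of this lift-and-shift stage using the AWS CLI; the instance identifiers, class and storage values are placeholders, and the engine edition is an assumption.

# Writer instance on RDS for Oracle
aws rds create-db-instance \
  --db-instance-identifier pss-oracle-master \
  --engine oracle-ee \
  --db-instance-class db.r5.4xlarge \
  --allocated-storage 1000 \
  --master-username admin --master-user-password '<password>' \
  --multi-az

# One of the read replicas that will serve the read-only availability traffic
aws rds create-db-instance-read-replica \
  --db-instance-identifier pss-oracle-replica-1 \
  --source-db-instance-identifier pss-oracle-master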

  • Migrate the Oracle schemas to Aurora:

Once the Oracle database is fully migrated to AWS, the next step is to gradually migrate the schemas one by one to Aurora PostgreSQL. The first step is to map all 14 schemas to the customer's application modules. The schemas will be migrated based on this mapping; schemas with no dependencies on other modules will be identified and migrated first.

The application will be modified to work with the new Aurora schema. Any functionality, which is not supported by Aurora, will be moved to application logic.

DB links can be established from Oracle to Aurora; however, they cannot be established from Aurora to the Oracle database.

Any new application development that is in progress should be compatible and aligned with the Aurora schema.

  • Final Database:

Finally, all the 14 schemas will be migrated onto Aurora and the data will be migrated using DMS service. The entire process is expected to take up to 1 year. There will be 4 Aurora nodes – One Master Write & 3 Read Replicas spread across 2 AZs of AWS for high availability.

Key Findings

The assessment provided a roadmap for moving away from Oracle to PostgreSQL, saving up to 30% in Oracle licence cost. It also provided a way forward towards cloud-native for each application.

The currently provisioned infrastructure was utilized at only around 40-50%, and a significant reduction in the overall total cost of ownership (TCO) was identified if the customer went ahead with cloud migration. Using AWS managed services also promised smooth, optimized functioning of the system with minimal administration effort.

With the MRAP assessment and findings in place, the customer now has greater visibility towards cloud migration and the benefits it would derive from implementing it.

Cloud platform

AWS.

Technologies used

EPS, ALB, PostgreSQL Aurora, Lambda, RDS Oracle, VPC.

Thundering Clouds – Technical overview of AWS vs Azure vs Google Cloud

By | Blogs, Powerlearnings | No Comments

Compiled by Kiran Kumar, Business Analyst at Powerupcloud Technologies.

The battle of the Big 3 Cloud Service Providers

The cloud ecosystem is in a constant state of evolution. With increasing maturity and adoption, the battle for the mind and wallet intensifies. While Amazon Web Services (AWS), Microsoft Azure, and Google Cloud (GCP) lead in IaaS maturity, players from Salesforce, SAP and Oracle to Workday, which recently reached $1B in quarterly revenue, are gaining ground and carving out niches in the 'X'aaS space. The recent COVID crisis has accelerated both adoption and consideration as enterprises transform to cope, differentiate, and sustain an advantage over the competition.

In this article, I will stick to AWS, Azure, and GCP and term them the BIG 3. A disclaimer: Powerup is a top-tier partner with all three, and the comparisons are purely objective, based on currently publicly available information; it is very likely that by the time you read this article a lot will already have changed. Having said that, the future will belong to those who excel in providing managed solutions around artificial intelligence, analytics, IoT, and edge computing. So let's dive right in:

Amazon Web Services – The oldest among the three and the most widely known, with the biggest spread of availability zones and an extensive roster of services. It has capitalized on its maturity to activate a developer ecosystem globally, which has proven to be a critical enabler of its widespread use.

Microsoft Azure – Azure is the closest one gets to AWS in terms of products and services. While AWS has fully leveraged its head start, Azure tapped into Microsoft's huge enterprise customer base and let those customers take advantage of their existing investments by providing better value through Windows support and interoperability.

Google Cloud Platform – Google Cloud was announced in 2011; for being less than a decade old, it has created a significant footprint. It was initially intended to strengthen Google's own products, but later grew into an enterprise offering. A lot is expected from its deep expertise in AI, ML, deep learning and data analytics to give it a significant edge over the other providers.

AWS vs. Azure vs. Google Cloud: Overall Pros and Cons

In this analysis, I dive into broad technical aspects of these 3 cloud providers based on the common parameters listed below.

  • Compute
  • Storage
  • Exclusives  

Compute

AWS Compute:

Amazon EC2: EC2, or Elastic Compute Cloud, is Amazon's compute offering. EC2 supports multiple instance types (bare metal, GPU, Windows, Linux, and more) and can be launched with different security and networking options; you can choose from a wide range of templates based on your use case. EC2 can both resize and autoscale to handle changes in requirements, which eliminates the need for complex governance.

Amazon Elastic Container Service is a highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications, manage and scale a cluster of VMs, or schedule containers on those VMs.

Amazon EKS makes it easy to deploy, manage, and scale containerized applications using Kubernetes on AWS.

AWS also has its Fargate service, which automates server and cluster management for containers, the Lightsail virtual private server offering, AWS Batch for batch computing jobs, Elastic Beanstalk for running and scaling web applications, and Lambda for serverless applications.

Container services also include Amazon Elastic Container Registry, a fully managed Docker container registry that allows you to store, manage, and deploy Docker container images.

Azure Compute:

Azure VMs: Azure Virtual Machines are a secure and highly scalable compute solution, with instance types optimized for high-performance computing, AI/ML workloads and container instances, and, given Azure's emphasis on hybrid computing, support for multiple OS types, Microsoft software and services. Virtual Machine Scale Sets are used to auto-scale your instances.

Azure container services include Azure Kubernetes Service, a fully managed Kubernetes-based container solution.

Container Registry lets you store and manage container images across all types of Azure deployments.

Service Fabric is a fully managed service that lets you develop microservices and orchestrate containers on Windows or Linux.

Other services include Web App for Containers, which lets you run, scale, and deploy containerized web apps, Azure Functions for serverless applications, and Azure Red Hat OpenShift for OpenShift support.

Google Compute Engine:

Google Compute Engine (GCE) is Google's compute service. Google is fairly new to the cloud compared with the other two CSPs, and that is reflected in its catalogue of services. GCE offers the standard array of features, from Windows and Linux instances, RESTful APIs, load balancing, data storage and networking to CLI and GUI interfaces and easy scaling. Backed by Google's infrastructure, GCE can spin up instances faster than most of its competition in many cases. It runs on carbon-neutral infrastructure and offers excellent value for money.

Google Kubernetes Engine (GKE) is based on Kubernetes, which was originally developed in-house; Google has the deepest expertise when it comes to Kubernetes and has integrated it tightly into the Google Cloud platform. The GKE service can be used to automate many of your deployment, maintenance, and management tasks, and can also be used with hybrid clouds via the Anthos service.
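
For a feel of the developer experience, here is an illustrative sketch of launching a comparable small Linux VM with each provider's CLI; the image ids, sizes and resource names are placeholders.

# AWS EC2
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro --count 1
# Azure Virtual Machines
az vm create --resource-group demo-rg --name demo-vm --image UbuntuLTS --size Standard_B1s
# Google Compute Engine
gcloud compute instances create demo-vm --machine-type=e2-micro --zone=us-central1-a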

Storage

AWS Storage:

Amazon S3 is an object storage service that offers scalability, data availability, security, and performance for most storage requirements. Amazon Elastic Block Store provides persistent block storage for use with Amazon EC2 instances, and Elastic File System provides scalable file storage.

Other storage services include S3 Glacier, a secure, durable, and extremely low-cost storage service for data archiving and long-term backup, Storage Gateway for hybrid storage, and Snowball, a device used for offline small to medium scale data transfer.

Database

Database services include Amazon Aurora, a SQL-compatible relational database, RDS (Relational Database Service), the DynamoDB NoSQL database, Amazon ElastiCache as an in-memory data store, the Redshift data warehouse, and Amazon Neptune, a graph database.

Azure Storage:

Azure Blobs is a massively scalable object storage solution that includes support for big data analytics through Data Lake Storage Gen2; Azure Files is a managed file storage solution with on-prem support; Azure Queues is a reliable messaging store; and Azure Tables is a NoSQL store for structured data.

Azure Disks provides block-level storage volumes for Azure VMs, similar to Amazon EBS.

Database

Database services include SQL-based databases such as Azure SQL Database, Azure Database for MySQL and Azure Database for PostgreSQL; Cosmos DB and Table storage for NoSQL; SQL Server Stretch Database, a hybrid storage service designed specifically for organizations leveraging Microsoft SQL Server on-prem; and Azure Cache for Redis as an in-memory data store.

Google Cloud Storage:

GCP's cloud storage services include Google Cloud Storage, a unified, scalable, and highly durable object store; Filestore, network-attached storage (NAS) for Compute Engine and GKE instances; Persistent Disk, block storage for VM instances; and Transfer Appliance for large data transfers.
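
Again as an illustrative sketch (bucket, container, and account names are placeholders), uploading an object to each provider's object store from the CLI:

aws s3 cp backup.tar.gz s3://demo-bucket/backup.tar.gz                       # Amazon S3
az storage blob upload --account-name demostorage --container-name demo \
  --name backup.tar.gz --file backup.tar.gz                                  # Azure Blob Storage
gsutil cp backup.tar.gz gs://demo-bucket/backup.tar.gz                       # Google Cloud Storage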

Database

On the database side, GCP has three NoSQL databases: Cloud Bigtable for big data, Firestore, a document database for mobile and web application data, and Firebase Realtime Database, a cloud database for storing and syncing data in real time. It also offers BigQuery for analytics, Memorystore for in-memory storage, the SQL-based Cloud SQL, and a globally distributed relational database called Cloud Spanner that is designed for mission-critical workloads.

Benchmarks Reports

An additional drill-down is to analyze performance figures for the three across network, storage, and CPU; here I quote research data from a study conducted by Cockroach Labs.

Network

GCP has taken significant strides in network throughput and latency compared to last year, and it now outperforms AWS and Azure in network performance.

  • Some of GCP's best performing machines hover around 40-60 GB/sec.
  • AWS machines stick to their claims and offer a consistent 20 to 25 GB/sec.
  • Azure's machines offered significantly less, at 8 GB/sec.
  • When it comes to latency, AWS outshines the competition by offering consistent performance across all of its machines.
  • GCP does undercut AWS in some cases but still lacks the consistency of AWS.
  • Azure's weaker network throughput is reflected in higher latency, making it the least performant of the three.

NOTE: GCP attributes the improvement on the network side to the Skylake processors used in its n1 family of machines.

Storage

AWS has superior performance in storage; neither GCP nor Azure comes close to its read-write speeds and latency figures. This is largely due to storage-optimized instances like the i3 series. Azure and GCP did not field storage-optimized instances in this comparison, and their performance is comparable to the non-storage-optimized instances from Amazon. Between the two, Azure offered slightly better read-write speed, while GCP offered better latency.

CPU

Comparing CPU performance, Azure machines showcased slightly higher figures thanks to using conventional 16-core CPUs: Azure machines use 16 cores with a single thread per core, while the other clouds use hyperthreading to reach 16 vCPUs by combining 8 cores with 2 threads each. After comparing each offering across the three platforms, here is the best each cloud platform has to offer:

  • AWS c5d.4xlarge: 25,000 – 50,000 Bogo ops per sec.
  • Azure Standard_DS14_v2: just over 75,000 Bogo ops per sec.
  • GCP c2-standard-16: 25,000 – 50,000 Bogo ops per sec.
  • While the AWS and GCP figures look similar, AWS overall offers slightly better performance than GCP.
  • Avoiding hyperthreading has inflated Azure's figures; while it may still be superior in performance, the numbers may not accurately represent the difference in performance it offers.

For detailed benchmarking reports visit Cockroach Labs  

Key Exclusives

Going forward, technologies like artificial intelligence, machine learning, the Internet of Things (IoT), and serverless computing will play a huge role in shaping the technology industry. Most new services and products will try to take advantage of these technologies to deliver solutions more efficiently and with greater precision. All of the BIG 3 providers have begun building out offerings in these areas, and this may very well become the key differentiator between them.

AWS Key Tools:

Some of the latest additions to the AWS portfolio include AWS Graviton processors built on 64-bit Arm Neoverse cores. EC2 M6g, C6g, and R6g instances are powered by these new-generation processors, and thanks to the power-efficient Arm architecture they are said to provide up to 40% better price performance over comparable x86-based instances.

AWS Outposts: Outposts is Amazon's answer for hybrid architecture; it is a fully managed solution that brings AWS products and services to practically any location by physically deploying AWS-managed infrastructure at your site. It is aimed at offering a consistent hybrid experience with the scalability and flexibility of AWS.

AWS has put a lot of time and effort into developing a relatively broad range of products and services in the AI and ML space. Some of the important ones include the Amazon SageMaker service for training and deploying machine learning models, the Lex conversational interface and the Polly text-to-speech service that power Alexa-style experiences, the Greengrass IoT edge service, and the Lambda serverless computing service.

There are also AI-powered offerings like DeepLens, a deep-learning-enabled camera that can be trained and used for OCR and image and character recognition, and Gluon, an open-source deep-learning library designed to build and quickly train neural networks without having to know AI programming.

Azure Key Tools:

When it comes to hybrid support, Azure offers a very strong proposition, with services like Azure Stack and Azure Arc minimizing your risk of going wrong. Knowing that many enterprises already use Microsoft's services, Azure deepens this relationship by offering enhanced security and flexibility through its hybrid services. With Azure Arc, customers can manage resources deployed both within and outside of Azure through the same control plane, enabling organizations to extend Azure services to their on-prem data centres.

Azure also offers a comprehensive family of AI services and cognitive APIs to help you build intelligent apps; services like the Bing Web Search API, Text Analytics API, Face API, Computer Vision API and Custom Vision Service fall under this umbrella. For IoT, it has several management and analytics services, and it also has a serverless computing service known as Azure Functions.

Google Cloud Key Tools:

AI and machine learning are big areas of focus for GCP. Google is a leader in AI development, thanks to TensorFlow, an open-source software library for building machine learning applications. It is the single most popular library in the market, with AWS also adding support for TensorFlow in an acknowledgment of this.

Google Cloud has strong offerings in APIs for natural language, speech, translation, and more. It also offers IoT and serverless services, though some are still in beta. Google has been working extensively on Anthos which, as Sundar Pichai put it, follows a "write once and run anywhere" approach by allowing organizations to run Kubernetes workloads on-premises, on AWS or on Azure; Azure support, however, is still in beta testing.

Verdict

Each of the three has its own set of features and comes with its own constraints and advantages. The selection of the appropriate cloud provider should therefore, as with most enterprise software, be based on your organizational goals over the long term.

However, we strongly believe that multi-cloud will be the way forward for most organizations. For example, if an organization is an existing user of Microsoft's services, it is natural for it to prefer Azure. Most small, web-based or digitally native companies looking to scale quickly by leveraging AI/ML and data services would want to take a good look at Google Cloud. And of course, AWS, with its sheer scale of products and services and its maturity, is very hard to ignore in any mix.

Hope this sheds some light on the technical considerations; we will follow this up with some of the other key evaluation factors that we think you should consider while selecting your cloud provider.