All Posts By powerupcloud

Data lake on cloud for India’s Largest Electric Car manufacturer


Customer: The pioneer of Electric vehicles and related technologies in India.

 

Problem Statement

All of the customer’s vehicles are equipped with Internet of Things (IoT) sensors, and the data they collect is used to track and improve performance. The majority of connected car services require bi-directional communication between the car and the cloud. Cars send data to the cloud to enable applications like predictive maintenance and assisted driving. Similarly, the car needs to be able to receive messages from the cloud to respond to remote commands, such as charging the battery, locking/unlocking doors remotely, and remotely activating the horn or lights. While scalable web technology like TCP/IP can be implemented for car-to-cloud communication, cloud-to-car communication would require a static IP address for each car in the system. This is not possible since cars move through cellular networks where there is no single IP address per device. Other technical challenges for connected car services include unreliable connectivity, network latency and security.

MQTT addresses many of the challenges of creating scalable and reliable connected car services by enabling a persistent always-on connection between the car and cloud. When a network connection is available, a vehicle will publish data to the MQTT broker and will receive subscribed data from the same broker in near real-time. If a network connection is not available, the vehicle will wait until the network is available before attempting to transmit data. While the vehicle is offline, the broker will buffer data, and as soon as the vehicle is back online, it will immediately deliver the data. MQTT’s advanced message retention policies and offline message queuing are essential to accommodating network latency and unreliable mobile networks. MQTT brokers can be deployed to cluster nodes running on a private or public cloud infrastructure. This allows the broker to scale up and down depending on the number of vehicles trying to connect. MQTT is a secure protocol as each car is responsible for establishing a secure persistent TCP connection, using TLS, with the MQTT broker in the cloud. This means no public Internet endpoint is exposed on the car so no one can directly connect to the car. This makes it virtually impossible for a car to be directly attacked by a hacker on the Internet.
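As a rough illustration of this publish/subscribe pattern (not the customer’s actual implementation), the sketch below uses the open-source Mosquitto command-line clients over TLS; the broker hostname, topic names and payload are placeholders.

# Vehicle side: publish a telemetry message over a TLS connection (QoS 1)
mosquitto_pub -h broker.example.com -p 8883 --cafile ca.crt -q 1 \
  -t "vehicles/VIN1234/telemetry" \
  -m '{"battery_pct": 18, "doors": "closed"}'

# Vehicle side: stay subscribed to a command topic to receive cloud-to-car messages
mosquitto_sub -h broker.example.com -p 8883 --cafile ca.crt -q 1 \
  -t "vehicles/VIN1234/commands"

Because the vehicle initiates both connections, no public endpoint needs to be exposed on the car itself.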

Almost 65% of the current fleet operates on legacy platforms which send sensor data over TCP/IP instead of MQTT. The major challenge with the current architecture is that critical notifications, such as low battery or door open, take ~10 minutes to reach the customer. The customer wanted to reduce the turnaround time (TAT) to near real time. Going forward, all new cars are expected to support MQTT and WebSockets. All new sensors support updates via OTA or SMS, and Secure File Transfer Protocol (SFTP) is also supported for downloading updates.

Proposed Solution

The following sections explain the current architecture in detail:

  1. Sensor data size over TCP is ~360 bytes and over MQTT is ~440 bytes.
  2. Azure public IPs are whitelisted in the IoT sensors for authentication during the manufacturing and assembly stage.
  3. Currently, Azure hosts a TCP/IP gateway and an MQTT gateway server with parsers that push all the IoT time-series data to a Cassandra database. The gateway and parser applications are Java-based, while the application itself is services-based and written in NodeJS.
  • 21 services are currently running, of which only 8 are containerized; the rest run as Node applications on plain VMs, with multiple services sharing the same VM on Azure. Docker Swarm is used for container orchestration.
  • Consul is used for service discovery and for storing the key-value pairs required by the OAuth service.
  • An API Gateway service connects to 2 backend secure Kong containers.
  • The customer uses Redis to store user sessions; the key-expire event is used to trigger scheduled notifications. RabbitMQ stores messages, and ELK is used for log management and custom reports. All of these run as Docker containers.
  • Two databases are used: a 3-node Cassandra cluster and PostgreSQL. Cassandra mostly stores the time-series data from the IoT sensors. The PostgreSQL database contains customer profile and vehicle data and is mostly used by the payment microservice. All transactional data is stored in PostgreSQL and accessed by the services. The total Cassandra database size is ~120 GB, while PostgreSQL is ~150 MB.

Solution approach

  • All application microservices and the MQTT/TCP IoT brokers will be containerized and deployed on AWS Fargate.
  • All data from the latest IoT sensors will be sent to the AWS environment. IoT sensor data will be pushed to a Kinesis stream, and a Lambda function will query the stream to detect critical data (low battery, door open, etc.) and call the notification microservice (a wiring sketch follows this list).
  • Data from old sensors will initially be sent to the Azure environment due to the existing public IP whitelisting. MQTT bridging and TCP port forwarding will be used to proxy requests from Azure to AWS. Once the old sensors are updated, traffic will be fully cut over to AWS.
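A minimal sketch of how the stream and the Lambda consumer could be wired up with the AWS CLI is shown below; the stream name, function name, account ID and region are illustrative placeholders, not values from the actual deployment.

# Create the Kinesis stream that the gateway/parser applications will write to
aws kinesis create-stream --stream-name iot-sensor-stream --shard-count 2

# Invoke an existing Lambda function with batches of records from the stream
aws lambda create-event-source-mapping \
  --function-name critical-event-filter \
  --event-source-arn arn:aws:kinesis:ap-south-1:123456789012:stream/iot-sensor-stream \
  --starting-position LATEST \
  --batch-size 100

The Lambda function would then inspect each record for critical values (low battery, door open, etc.) and call the notification microservice or SNS.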

The important steps in the architecture are explained below:

  • IAM roles will be created to access the different AWS services.
  • The network will be set up using the VPC service. Appropriate CIDR ranges, subnets, route tables, etc. will be created.
  • A NAT Gateway will be set up to enable internet access for servers in the private subnets.
  • All Docker images will be stored in Elastic Container Registry (ECR).
  • AWS ECS Fargate will be used to run the Docker containers. An ECS task definition will be configured for each container to be run.
  • AWS ECS Fargate will be used to deploy all the container images on the worker nodes. In Fargate, the control plane and worker nodes are managed by AWS, which also handles scaling, high availability (HA) and patching. An Application Load Balancer will be deployed as the front end to all the application microservices. The ALB will forward requests to the Kong API Gateway, which in turn will route them to the microservices.
  • Service-level scaling will be configured in Fargate so that more containers spin up based on load (see the sketch after this list).
  • The ElastiCache service with the Redis engine will be deployed across multiple Availability Zones (AZs) for HA. ElastiCache is a managed service from AWS, where HA, patching, updates, etc. are managed by AWS.
  • Aurora PostgreSQL will be used to host the PostgreSQL database. A SQL dump will be taken from the Azure PostgreSQL VM and then restored on Aurora.
  • A 3-node Cassandra cluster will be set up across multiple AZs in AWS for HA: 2 nodes in one AZ and the third node in a second AZ.
  • A 3-node Elasticsearch cluster will also be set up using the managed Amazon Elasticsearch Service, where all the nodes are managed by AWS.
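For example, a Fargate service fronted by the ALB target group could be created roughly as follows; the cluster, service, subnet, security group and ARN values are placeholders, and the actual task definitions would come from the migrated microservices.

aws ecs create-cluster --cluster-name app-cluster

aws ecs create-service \
  --cluster app-cluster \
  --service-name notification-service \
  --task-definition notification-service:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-aaaa1111,subnet-bbbb2222],securityGroups=[sg-cccc3333],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:ap-south-1:123456789012:targetgroup/notify/abc123def456,containerName=notification-service,containerPort=8080"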

Notification workflow

Bi-directional notification workflow is explained below:

  • TCP & MQTT gateways will be running on EC2 machines, and the Parser application on a different EC2 instance.
  • AWS Public IP addresses will be whitelisted on the IoT Sensor during manufacturing for the device to securely connect to AWS.
  • The Gateway Server will push the raw data coming from the sensors to a Kinesis Stream.
  • The Parser server will push the converted/processed data to the same or another Kinesis stream.
  • A Lambda function will query the data in the Kinesis stream to find fault/notification-type data and will invoke the notification microservice or SNS to notify the customer. This reduces the current notification time from 6-8 minutes to almost near real time.
  • Kinesis Data Firehose will act as a consumer, reading from the Kinesis streams and pushing processed data to a separate S3 bucket.
  • Another Firehose will push the processed data to the Cassandra database and a different S3 bucket.
  • AWS Glue will be used for the data aggregation previously done using Spark jobs, and will push the aggregated data to a separate S3 bucket.
  • Athena will be used to query the S3 buckets; standard SQL queries work with Athena (see the sketch below). Dashboards will be created using Tableau.
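As an illustration of the Athena step, a query against the processed data in S3 could be started from the CLI as below; the database, table, column and bucket names are hypothetical.

aws athena start-query-execution \
  --query-string "SELECT vin, event_type, event_time FROM sensor_events WHERE event_type = 'low_battery' LIMIT 100" \
  --query-execution-context Database=iot_datalake \
  --result-configuration OutputLocation=s3://example-athena-results/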

 

 

Cloud platform

AWS.

Technologies used

Cassandra, Amazon Kinesis, Amazon Redshift, Amazon Athena, Tableau.

Benefit

The customer’s vehicles are able to send and receive notifications in real time. Using AWS, applications are able to scale on a secure, fault-tolerant, and low-latency global cloud. With the implementation of a Continuous Integration (CI)/Continuous Delivery (CD) pipeline, the customer team no longer spends its valuable time on mundane administrative tasks. Powerup helped the customer achieve its goal of securing data while lowering cloud bills and simplifying compliance.

Amazon EBS Multi-Attach now available on Provisioned IOPS io1 volumes


Prepared by Srividhya T (Cloud Engineer) and Jaswinder kour (Cloud Engineer)

Starting today, customers running Linux on Amazon Elastic Compute Cloud (EC2) can take advantage of new support for attaching Provisioned IOPS (io1) Amazon Elastic Block Store (EBS) volumes to multiple EC2 instances. Each EBS volume, when configured with the new Multi-Attach option, can be attached to a maximum of 16 EC2 instances in a single Availability Zone. Additionally, each Nitro-based EC2 instance can support the attachment of multiple Multi-Attach enabled EBS volumes. Multi-Attach capability makes it easier to achieve higher availability for applications that provide write ordering to maintain storage consistency.

Applications can attach Multi-Attach volumes as non-boot data volumes, with full read and write permission. Snapshots can be taken of volumes configured for Multi-Attach, just as with regular volumes, but additionally, the snapshot can be initiated from any instance that the volume is attached to, and Multi-Attach volumes also support encryption. Multi-Attach enabled volumes can be monitored using Amazon CloudWatch metrics, and to monitor performance per instance, you can use the Linux iostat tool.

I mentioned above that your applications do need to provide write ordering to maintain storage consistency, as obviously, if multiple instances write data at the same time, there is a risk of data being overwritten and becoming inconsistent. One simple possibility for Linux is to use a single-writer, multiple-reader approach where the volume is mounted read-write on one instance and read-only on all others. Alternatively, you can choose to enforce write ordering and consistency within your application code.

Multi-Attach supports the following features:

  • Applications can attach Multi-Attach volumes as non-boot data volumes, with full read and write permission.
  • Snapshots can be taken of volumes configured for Multi-Attach, just as with regular volumes, and the Amazon EBS snapshot lifecycle can also be automated.
  • Multi-Attach volumes support EBS encryption.
  • Amazon CloudWatch metrics can be used to monitor Multi-Attach enabled volumes, and to monitor performance per instance you can use the Linux iostat tool.
  • Multi-Attach EBS volumes support Amazon CloudWatch Events.

Limitations

  • Multi-Attach enabled volumes can be attached to up to 16 Nitro-based instances that are in the same Availability Zone.
  • Multi-Attach is available in the N. Virginia (us-east-1), Oregon (us-west-2), Ireland (eu-west-1) and Asia Pacific (Seoul) Regions.
  • Multi-Attach enabled volumes can’t be created as boot volumes.
  • Multi-Attach enabled volumes can be attached to one block device mapping per instance.
  • Multi-Attach can only be enabled during volume creation; you can’t enable or disable it afterwards, nor change the volume type, size, or Provisioned IOPS of a Multi-Attach enabled volume.
  • Multi-Attach can’t be enabled during instance launch using either the Amazon EC2 console or the RunInstances API.
  • Multi-Attach enabled volumes that have an issue at the Amazon EBS infrastructure layer are unavailable to all attached instances. Issues at the Amazon EC2 or networking layer might only impact some attached instances.

Getting Started with Multi-Attach EBS Volumes

Configuring and using Multi-Attach volumes is a simple process for new volumes using either the AWS Command Line Interface (CLI) or the AWS Management Console.

Here I am going to create a volume configured for Multi-Attach and attach it to two Linux EC2 instances.

From one instance I will write a simple text file, and from the other instance I will read the contents.

In the AWS Management Console I first navigate to the EC2 homepage, select Volumes from the navigation panel and then click Create Volume.

Choosing Provisioned IOPS SSD (io1) for Volume Type, I enter my desired size and IOPS and then check the Multi-Attach option.

To instead do this using the AWS Command Line Interface (CLI), we simply use the ec2 create-volume command with the --multi-attach-enabled option, as shown below.
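A representative command, with placeholder size, IOPS and Availability Zone values, looks like this:

aws ec2 create-volume \
  --volume-type io1 \
  --size 100 \
  --iops 5000 \
  --availability-zone us-east-1a \
  --multi-attach-enabled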

I can verify that Multi-Attach is enabled on my volume from the Description tab when the volume is selected. The volume table also contains a Multi-Attach Enabled column that displays a simple ‘yes/no’ value.

With the volume created and ready for use, I next launch two T3 EC2 instances running Linux. Remember, Multi-Attach needs an AWS Nitro System based instance type and the instances have to be created in the same Availability Zone as my volume. My instances are running Amazon Linux 2, and have been placed into the us-east-1a Availability Zone, matching the placement of my new Multi-Attach enabled volume.

Once the instances are running, it’s time to attach my volume to both of them. I click Volumes from the EC2 dashboard, then select the Multi-Attach volume I created. From the Actions menu, I click Attach Volume. In the screenshot below you can see that I have already attached the volume to one instance, and am attaching to the second.

If I’m using the AWS Command Line Interface (CLI) to attach the volume, I make use of the ec2 attach-volume command, as I would for any other volume type:
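For example, with placeholder volume and instance IDs:

aws ec2 attach-volume --volume-id vol-0abcd1234ef567890 --instance-id i-0aaaa1111bbbb2222 --device /dev/sdf
# Repeat for the second instance
aws ec2 attach-volume --volume-id vol-0abcd1234ef567890 --instance-id i-0cccc3333dddd4444 --device /dev/sdf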

For a given volume, the AWS Management Console shows me which instances it is attached to, or those currently being attached, when I select the volume:

With the volume attached to both instances, let’s make use of it with a simple test. Selecting my first instance in the Instances view of the EC2 dashboard, I click Connect and then open a shell session onto the instance using AWS Systems Manager‘s Session Manager. Following the instructions here, I created a file system on the new volume attached as /dev/sdf, mounted it as /data, and using vim I write some text to a file.

sudo mkfs -t xfs /dev/sdf
sudo mkdir /data
sudo mount /dev/sdf /data
cd /data
sudo vim file1.txt

Selecting my second instance in the AWS Management Console, I repeat the connection steps. I don’t need to create a file system this time but I do again mount the /dev/sdf volume as /data (although I could use a different mount point if I chose). On changing directory to /data, I see that the file I wrote from my first instance exists, and contains the text I expect.

Creating and working with Multi-Attach volumes is simple! Just remember, the volume and the instances it is attached to must all be in the same Availability Zone.

Detaching an Amazon EBS Volume from an Instance

We can detach an Amazon EBS volume from an instance explicitly or by terminating the instance. However, if the instance is running, you must first unmount the volume from the instance. If an EBS volume is the root device of an instance, you must stop the instance before you can detach the volume.

We can reattach a volume that was detached (without unmounting it), but it might not get the same mount point. If there were writes to the volume in progress when it was detached, the data on the volume might be out of sync.

To unmount the /dev/sdf device, we use the following command:

umount -d /dev/sdf

In the navigation pane, choose Volumes.

Select a volume and choose Actions, Detach Volume.

In the confirmation dialog box, choose Yes, Detach.

Using Delete-on-Termination with Multi-Attach volumes

If you prefer to make use of the option to delete attached volumes on EC2 instance termination, we recommend using a consistent setting across all of the instances that a Multi-Attach volume is attached to, either all delete or all retain, to allow for predictable termination behavior. If you attach the volume to a set of instances that have differing values for Delete-on-Termination, then deletion of the volume depends on whether the last instance to detach is set to delete or not.
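The Delete-on-Termination setting can be checked or changed per instance after launch; a sketch with placeholder IDs is shown below.

aws ec2 modify-instance-attribute \
  --instance-id i-0aaaa1111bbbb2222 \
  --block-device-mappings '[{"DeviceName":"/dev/sdf","Ebs":{"DeleteOnTermination":true}}]'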

Monitoring

You can monitor a Multi-Attach enabled volume using CloudWatch metrics for Amazon EBS volumes.

Pricing and Billing

There are no additional charges for using Amazon EBS Multi-Attach. You are billed the standard charges that apply to Provisioned IOPS SSD (io1) volumes.

Difference between EBS and EFS

The comparison below contrasts EBS (Elastic Block Store) with EFS (Elastic File System):

  • Definition: Amazon EBS is the block storage offered on AWS; an EBS volume is a persistent storage device that can be used as a file system for databases, application hosting and storage, and plug-and-play devices. Amazon EFS is an NFS file system service offered by AWS; an EFS file system works as a managed network file system that can be shared across different Amazon EC2 instances, much like a NAS device.
  • Accessibility: EBS is accessible via a single EC2 instance (updated to multiple provisioned instances with Multi-Attach). EFS is accessible from multiple Availability Zones in the same region.
  • Performance: With EBS, you manually scale the size of the volumes without stopping the instance; baseline performance is 3 IOPS per GB for General Purpose volumes, and Provisioned IOPS can be used for increased performance. EFS is a highly scalable managed service that supports up to 7,000 file system operations per second.
  • Scalability: EBS requires manual scale-up; EFS is scalable.
  • Availability: EBS offers 99.99 percent availability; EFS has no publicly available SLA (Service Level Agreement).
  • Access control: EBS uses security groups and user-based authentication (IAM). EFS uses IAM user-based authentication and security groups.
  • Storage and file size limits: EBS has a maximum storage size of 16 TB and no file size limit on disk. EFS has no limit on the size of the file system, with a 52 TB maximum for individual files.
  • Encryption: Both use an AWS KMS-managed Customer Master Key (CMK) and AES 256-bit encryption standards.
  • Storage type: EBS is block storage; EFS is file storage.
  • Data stored: Data stored in EBS stays in the same Availability Zone, with replicas made within the AZ for higher durability. Data stored in EFS stays in the region, with replicas made within the region.
  • Data access: EBS can be accessed by a single Amazon EC2 instance; EFS can be accessed concurrently by 1 to 1000s of EC2 instances from multiple AZs.
  • File system: EBS supports various file systems, including ext3 and ext4. EFS is a file storage service for use with AWS EC2 and can also be used as a network file system for on-premise servers using AWS Direct Connect.
  • Durability: EBS is 20 times more reliable than normal hard disks; EFS is highly durable (no public SLA).
  • Availability Zone failure: EBS cannot withstand AZ failure without point-in-time EBS snapshots. Every EFS file system object is redundantly stored across multiple Availability Zones, so it can survive one AZ failure.
  • Data throughput and I/O: EBS offers SSD- and HDD-backed storage types; use of SSD-backed Provisioned IOPS is recommended for dedicated I/O operations as needed. EFS offers a default throughput of 3 GB/s for all connected clients.
  • Pricing: There is no additional charge for using EBS Multi-Attach; you are billed the standard charges that apply to Provisioned IOPS SSD (io1) volumes. With EFS, you pay only for the resources that you use, with no minimum fee or setup charges.

Customer Support Automation using AWS Connect


Customer: One of the largest Global Insurance providers

 

Problem Statement

  • The client was the first company globally to provide insurance to people travelling overseas. Today, it is one of the largest insurance groups in the world, offering services in roadside assistance, travel insurance & assistance, health & identity protection, along with senior care & concierge services.
  • The customer support team receives 5,000+ calls on a daily basis, and 80% of these calls are service-related, post-sales support. Since most of these queries are repetitive and standardized in nature, the client was looking to automate them on their call center so the customer care team could focus on more critical queries.
  • The client needed a solution that could integrate with Genesys for a seamless handoff to the automated system.

Solution

  • Automated flows were built for 7 use cases on AWS Connect, including user authentication using an alphanumeric policy number, filing a claim, claim refund, refund status and so on, using Amazon Lex for NLP classification and user query understanding.
  • AWS Connect was integrated with the Genesys dial-in numbers on the existing call center support system, with a seamless handover to the voice automated system.
  • The complete design was on a serverless architecture, with policy manipulation logic written in Lambda functions on AWS.
  • The system was integrated with the live policy database via REST-based APIs for live policy updates and reading the latest policy information.
  • The interaction is completely voice based; the system hands off to a human agent if it is not able to resolve the user query.

Architecture diagram

Demo Link

File a claim            Claim Status

Policy Cancellation  User Authentication

Business Impact

  • 35% Reduction in call volume to the agents
  • 90% Reduction in resolution time for customers

Enabling remote work at scale


Customer: A leading biotech company

 

Problem Statement

A leading biotechnology company has a lot of contractors joining them for temporary work. Below are the challenges faced in making sure contractors are productive:

  1. Allocating a hardened workstation to a contractor took weeks.
  2. Preventing data loss from these workstations.
  3. Security issues like viruses or malware attacks impacting the overall environment.
  4. A self-service option with an integrated approval workflow was needed.

Proposed Solution

Amazon WorkSpaces was recommended for this requirement. It is a secure, managed desktop-as-a-service offering in the cloud. With Amazon WorkSpaces, you can provide either a Windows or Linux desktop for your users in minutes and allow them to access their desktops from any supported device, from any location.

The workspace self-service portal was created to cater to the self-service requirement.

Using this portal, the users can provision their own WorkSpaces with an integrated approval workflow that doesn’t require IT intervention for each request.

This is entirely serverless leveraging AWS Lambda, S3, API Gateway, Step Functions, Cognito, and SES and provides continuous deployment through AWS CodePipeline, CodeBuild, CloudFormation with SAM, and GitHub.
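Under the hood, the provisioning step that the portal automates boils down to a single WorkSpaces API call; a hedged sketch with placeholder directory, user and bundle identifiers is shown below.

# Provision a WorkSpace for an approved contractor (IDs are placeholders)
aws workspaces create-workspaces \
  --workspaces "DirectoryId=d-1234567890,UserName=contractor01,BundleId=wsb-abcdefghi"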

Cloud platform

AWS.

Technologies used

Lambda, Amazon Workspaces, GitHub, CloudFormation, S3, API Gateway, Directory services.

Benefit

  • The time taken for the contractor to be productive has come down drastically due to the quick availability of the workspace.
  • Standardization w.r.t the configuration of the workspaces.
  • No security incidents related to malware or virus attacks.

Managed services for a leading ecommerce company


Problem Statement

Our e-commerce client has multiple websites, one for each country: Singapore, Malaysia, Japan and Australia. Each website has its own infrastructure. Very frequently, the development and application teams need a copy of the production DBs in the UAT, DEV and staging environments for their testing and bug fixing. Since it is an e-commerce site, customer data has to be removed from the DB before restoring it to the UAT or DEV environments. It was a time-consuming process to manually dump the DB, clean up customer data and restore it in the respective environments, and there was also a chance of human error every now and then.

Proposed Solution

In order to avoid the manual effort, the task has been automated with the help of Shell scripting, AWS spot instances & Jenkins.

Every day, a shell script is used to take a production DB dump and move it to S3, with a local copy kept on the AWS EC2 server for 7 days.

A spot instance is then launched using the backup volume, and multiple DB jobs run in the background to restore the production data, truncate the customer data tables, dump the cleaned DB and move it to S3.

Whenever the dev team requires it, they use a Jenkins job to fetch the cleaned DB file from S3 and restore it in their respective environments.

After some time, as the data in the production DB grew, the spot instance was getting terminated even before it finished the process; we then increased the spot price slightly and ran multiple restore jobs in parallel, which consumed less time.
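A simplified sketch of this pipeline is shown below, assuming a MySQL database (the engine is not named in the write-up); the database, table and bucket names are placeholders.

#!/bin/bash
# Illustrative only: dump production, clean customer data, publish the cleaned dump.
DATE=$(date +%F)
BUCKET=s3://example-db-backups

# 1. Daily production dump, copied to S3, with local copies kept for 7 days
mysqldump --single-transaction proddb | gzip > /backups/proddb-$DATE.sql.gz
aws s3 cp /backups/proddb-$DATE.sql.gz $BUCKET/raw/
find /backups -name 'proddb-*.sql.gz' -mtime +7 -delete

# 2. On the spot instance: restore, truncate customer tables, re-dump the cleaned DB
gunzip -c /backups/proddb-$DATE.sql.gz | mysql --database=cleandb
for t in customers customer_addresses payment_details; do
  mysql --database=cleandb -e "TRUNCATE TABLE $t;"
done
mysqldump --single-transaction cleandb | gzip > /backups/cleandb-$DATE.sql.gz
aws s3 cp /backups/cleandb-$DATE.sql.gz $BUCKET/cleaned/

The Jenkins job then simply copies the latest object from the cleaned prefix and restores it into the target environment.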

Cloud platform

AWS.

Technologies used

EC2, Jenkins, S3.

AWS Connect


Customer: Multinational home appliances manufacturer

 

Problem Statement

The customer wanted to replace their existing Avaya Systems which had an IVR set up to take consumer calls. The categories included Service Schedules/Inquiries, Spare part status, Service location for maintenance, Product Information, etc.

Agent pain points in the AS-IS Process which also needed to be sorted:

  • Spare part status – Resolution is based on Inventory Management
  • Appointment scheduling – 5 executions for a technician per day
  • Agents coaching – Send message/email based on the event to supervisor & Real-time call listening

The team also had Avaya, network and consumer pain points in the as-is process.

Proposed Solution

Powerup successfully helped the client set up a customer support environment for customer agents in Indonesia through AWS Connect, with the AWS services hosted in the Sydney, Australia region and the ability to conference and transfer calls. In addition, Powerup set up routing so that voice calls from consumers are directed to the appropriate agents based on the language support they provide (English/Bahasa), using AWS Connect. The solution also enabled call recording using AWS Connect capabilities, the ability for an agent to make an outbound call using the call information provided in the InstaEdge CRM, and out-of-the-box real-time and historical reports, along with login/logout reports, for the client.

Cloud platform

AWS.

Technologies used

Amazon Connect, S3, Lambda.

Benefit

  1. Demonstrates that a generated voice call is successfully routed to and addressed by an agent connected to AWS.
  2. Demonstrates that an agent connected to AWS Connect can make a successful outbound call to a consumer based on details provided in the CRM.
  3. An iframe of the AWS Connect Control Panel was demonstrated within a web application.

Migration to cloud


Customer: A leading provider of cloud-based software solutions

 

Problem Statement

Being part of the highly regulated life sciences industry, the customer recognized the benefits of cloud a long time ago and was one of the very first life sciences solution vendors to deliver SaaS solutions to its customers. That momentum continues as the business goes “all-in on AWS” by moving its entire cloud infrastructure to the AWS platform.

As their platform and solutions are powered entirely by the AWS cloud, the business wanted to find ways to reduce costs, strengthen security and increase the availability of the existing AWS environment. Powerup’s services were enlisted with the following objectives:

  1. Cost optimization of the existing AWS environment
  2. Deployment automation of the Safety infrastructure on AWS
  3. Architecture and deployment of a centralized Log Management solution
  4. Architecture review and migration of the client’s customer environment to AWS, including a POC for Database Migration Service (DMS)
  5. Evaluation of DevOps strategy

Proposed Solution

 

1. Cost optimization of the existing AWS environment

Here are the three steps followed by Powerup to optimize costs:

  • Addressing idle resources by proper server tagging, translating into instant savings
  • Right sizing recommendation for instances after a proper data analysis
  • Planning Amazon EC2 Reserved Instances (RI) purchasing for resized EC2 instances to capture long-term savings

Removing idle/unused resource clutter would fail to achieve its desired objective in the absence of a proper tagging strategy. Tags created to address wasted resources also help to properly size resources by improving capacity and usage analysis. After right-sizing, committing to Reserved Instances gets a lot easier. For example, the Powerup team was able to draw a price comparison chart for the running EC2 & RDS instances based on On-Demand vs. RI costs and share a detailed analysis explaining the RI pricing plans.
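As an example of the tagging step, a consistent set of cost-allocation tags can be applied with the CLI; the resource IDs, tag keys and values below are illustrative, not the client’s actual taxonomy.

aws ec2 create-tags \
  --resources i-0aaaa1111bbbb2222 vol-0abcd1234ef567890 \
  --tags Key=Environment,Value=Production Key=Application,Value=Safety Key=Owner,Value=PlatformTeam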

By following these steps, Powerup estimated a 30% reduction in the customer’s monthly spend on AWS.

2. Deployment automation of Safety infrastructure on AWS

In AWS, the client has leveraged key security features like CloudWatch and CloudTrail to closely monitor traffic and the actions performed at the API level. Critical functions like Identity & Access Management, encryption and log management are also handled using AWS features. Amazon GuardDuty, an ML-based tool that continuously monitors threats and adds industry intelligence to the alerts it generates, is used for 24/7 monitoring, along with Amazon Inspector, a vulnerability detection tool. To ensure end-to-end cyber security, they have deployed an Endpoint Detection and Response (EDR) solution called Trend Micro Deep Security. All their products are tested for security vulnerabilities using the IBM AppScan tool and manual code review, following OWASP Top 10 guidelines and NIST standards to ensure confidentiality, integrity and availability of data.

As part of deployment automation, Powerup used CloudFormation (CF) and/or Terraform templates to automate infrastructure provisioning and maintenance. In addition, Powerup’s team simplified all modules used to perform day-to-day tasks to make them re-usable for deployments across multiple AWS accounts. Logs generated for all provisioning tasks were stored in a centralized S3 bucket. The business had requested the incorporation of security parameters and tagging files, along with tracking of user actions in CloudTrail.
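A hedged sketch of such a reusable deployment, driven by a parameterized template and stack-level tags, could look like this (the template, stack, parameter and tag names are placeholders):

aws cloudformation deploy \
  --template-file safety-infra.yml \
  --stack-name safety-infra-prod \
  --parameter-overrides Environment=prod VpcCidr=10.20.0.0/16 \
  --capabilities CAPABILITY_NAMED_IAM \
  --tags Project=Safety Environment=prod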

3. Architecture and deployment of centralized Log Management solution

Multiple approaches for log management were shared with the customer. Powerup and the client team agreed on the “AWS CloudWatch Event Scheduler/SSM Agent” approach. Initially, the scope was the log management system for the Safety infrastructure account; later, it was expanded to other accounts as well. The Powerup team built the solution architecture for log management using the ELK stack and CloudWatch. Scripts were written so that they could be reused across the client’s environments on the AWS cloud. Separate scripts were written for Linux and Windows machines using shell scripting and PowerShell. No hard coding was done in the scripts; all inputs come from a CSV file containing the Instance ID, log path, retention period, backup folder path and S3 bucket path.
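A minimal sketch of such a CSV-driven script is shown below; the column order and file paths are assumptions based on the description above, not the actual implementation.

#!/bin/bash
# Reads instance_id,log_path,retention_days,backup_path,s3_path from a CSV and ships logs to S3
while IFS=',' read -r instance_id log_path retention backup_path s3_path; do
  mkdir -p "$backup_path"
  cp -r "$log_path" "$backup_path/"
  aws s3 sync "$backup_path" "$s3_path"
  # Purge local copies older than the configured retention period
  find "$backup_path" -type f -mtime +"$retention" -delete
done < /opt/logmgmt/log-config.csv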

Furthermore, live hands-on workshops were conducted by the Powerup team to train the client’s operations team for future implementations.

4. Architecture review and migration of the client’s environment to AWS including POC for Database Migration Service (DMS)

The client’s pharmacovigilance software and drug safety platform is now powered by the AWS Cloud, and currently more than 85 of their 200+ customers have been migrated, with more to quickly follow. In addition, the client wanted Powerup to support the migration of one of its customers to AWS. Powerup reviewed and validated the designed architecture, and the infrastructure was deployed as per the approved architecture. Once deployed, Powerup used the AWS Well-Architected Framework to evaluate the architecture and provide guidance on implementing designs that scale with the customer’s application needs over time. Powerup also supported the application team for the production go-live on AWS infrastructure, along with deploying and testing the DMS POC.

5. Evaluation of DevOps strategy

Powerup was responsible for evaluating DevOps automation processes and technologies to suit the products built by the client’s product engineering team.

Cloud platform

AWS.

Technologies used

EC2, RDS, CloudFormation, S3.

Benefit

Powerup equipped the client with efficient and completely on-demand infrastructure provisioning within hours, along with built-in redundancies, all managed by AWS. Eliminating idle and over-allocated capacity, RI management and continuous monitoring enabled them to optimize costs. They successfully realized 30% savings on overlooked AWS assets, resulting in an overall 10 percent optimization in AWS cost. In addition, the client can now schedule and automate application backups, scale up databases in minutes by changing the instance type, and have instances automatically moved to healthy infrastructure in less than 15 minutes in case of downtime, giving customers improved resiliency and availability.

The client continues to provide a globally unified, standardized solution on the AWS infrastructure-as-a-service (IaaS) platform to drive compliance and enhance the experiences of all its customers.

Sales prediction engine


Customer: One of the world’s largest corporate food catering firm

 

Problem Statement

One of the world’s largest corporate food catering companies wanted to understand their customers’ behaviour, including food ordering trends. This would help them discontinue less popular foods and combos, eventually helping them increase customer satisfaction and profit margins.

Proposed Solution

The POS data from the customer’s catering sites is pushed to a central data warehouse. The data is then processed by a machine learning-powered prediction engine to predict several important business parameters, including plate consumption, top combo foods and inventory.

Cloud platform

Azure.

Technologies used

Azure Machine Learning, Python, SQL Server, PowerBI.

Kubernetes Security Practices on AWS


Written by Praful Tamrakar, Senior Cloud Engineer, Powerupcloud Technologies

Security at the Cloud and Infra level

  1. Ensure the worker node AMIs meet the CIS benchmark. For Kubernetes itself, there are tools and resources that can be used to automate the validation of a cluster against the CIS Kubernetes Benchmark.
  2. Verify that the Security Groups and NACLs do not allow all traffic, and that the rules allow access only to the ports and protocols needed for the application and SSH.
  3. Make sure that you have encryption of data at rest. Amazon KMS can be used for encryption of data at rest. For example:
  • EBS volumes for control plane nodes and worker nodes can be encrypted via KMS.
  • You can encrypt the log data, either in CloudWatch Logs or in S3, using KMS.
  4. If instance(s) are behind an ELB, make sure you have the HTTPS encryption and decryption process (generally known as SSL termination) handled by the Elastic Load Balancer.
  5. Make sure the worker nodes and RDS are provisioned in private subnets.
  6. It’s always best practice to have a separate Kubernetes (EKS) cluster for each environment (Dev/UAT/Prod).
  7. Ensure to use AWS Shield/WAF to prevent DDoS attacks.

Container Level

  1. Ensure to use a minimal base image (e.g., an Alpine image to run the app).
  2. Ensure that the Docker image registry you are using is a trusted, authorized and private registry, e.g., Amazon ECR.
  3. Make sure you remove all the unnecessary files from your Docker image. E.g., in a Tomcat server, you need to remove:
  • $CATALINA_HOME/webapps/examples
  • $CATALINA_HOME/webapps/host-manager
  • $CATALINA_HOME/webapps/manager
  • $CATALINA_HOME/conf/Catalina/localhost/manager.xml
  4. Ensure to disable the display of the app-server version or server information. For example, the Tomcat server displays its server information by default; this can be mitigated using the procedure below.

Update server.info to an empty value (server.info="") in the file $CATALINA_HOME/lib/org/apache/catalina/util/ServerInfo.properties

  5. Ensure not to copy or add any sensitive files/data to the Docker image; it’s always recommended to use Secrets (Kubernetes Secrets are encrypted at rest by default from Kubernetes v1.13 onwards). You may also use another secret management tool of your choice, such as AWS Secrets Manager or HashiCorp Vault.
    • E.g., do not put database endpoints, usernames or passwords in the Dockerfile. Use K8s Secrets, and these secrets can be consumed as environment variables:
apiVersion: v1
kind: Pod
metadata:
  name: secret-env-pod
spec:
  containers:
  - name: myapp
    image: myapp
    env:
      - name: DB_USERNAME
        valueFrom:
          secretKeyRef:
            name: dbsecret
            key: username
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: dbsecret
            key: password
      - name: DB_ENDPOINT
        valueFrom:
          secretKeyRef:
            name: dbsecret
            key: endpoint

6. Ensure to disable Bash from the container images.

7. Endorse multi-stage builds for smaller, cleaner and more secure images.

To understand how you can leverage multi-stage builds, see:

https://docs.docker.com/develop/develop-images/multistage-build/

8. Verify that container images are scanned for vulnerabilities before they are pushed to the registry. AWS ECR repositories can be configured to scan on push (for example, as shown below), and assessment tools such as Clair or Aqua can be used to scan images. These tools can be embedded in the CI/CD pipeline, making sure the Docker image push is rejected/terminated if any vulnerability is found. Find a sample implementation at https://www.powerupcloud.com/email-va-report-of-docker-images-in-ecr/
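For example, scan-on-push can be turned on for an existing ECR repository with one command (the repository name is a placeholder):

aws ecr put-image-scanning-configuration \
  --repository-name myapp \
  --image-scanning-configuration scanOnPush=true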

K8s level

  1. Make sure to use or upgrade Kubernetes to the latest stable version.
  2. It’s recommended not to use the default namespace. Instead, create a namespace for each application, i.e. separate namespaces for separate sensitive workloads (a kubectl sketch follows the RBAC overview below).
  3. Make sure to enable Role-Based Access Control (RBAC) for clients (service accounts/users) to grant restricted privileges.

RBAC Elements:

  • Subjects: The set of users and processes that want to access the Kubernetes API.
  • Resources: The set of Kubernetes API Objects available in the cluster. Examples include Pods, Deployments, Services, Nodes, and PersistentVolumes, among others.
  • Verbs: The set of operations that can be executed to the resources above. Different verbs are available (examples: get, watch, create, delete, etc.), but ultimately all of them are Create, Read, Update or Delete (CRUD) operations.

These capabilities are what make RBAC essential for running Kubernetes as a production-ready platform:

  • Have multiple users with different properties, establishing a proper authentication mechanism.
  • Have full control over which operations each user or group of users can execute.
  • Have full control over which operations each process inside a pod can execute.
  • Limit the visibility of certain resources of namespaces.
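A minimal kubectl sketch of points 2 and 3 above (a dedicated namespace plus a narrowly scoped Role and RoleBinding) is shown below; the namespace and account names are illustrative.

# Dedicated namespace instead of 'default'
kubectl create namespace payments

# Service account with read-only access to pods in that namespace only
kubectl create serviceaccount app-reader -n payments
kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods -n payments
kubectl create rolebinding app-reader-binding --role=pod-reader \
  --serviceaccount=payments:app-reader -n payments

# Verify the resulting privileges
kubectl auth can-i list pods -n payments --as=system:serviceaccount:payments:app-reader   # yes
kubectl auth can-i delete pods -n payments --as=system:serviceaccount:payments:app-reader # no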

4. Make sure to standardize the naming and labeling convention of Pods, Deployments and Services. This will ease the operational burden for security management (Pod network policies).

5. Ensure to use Kubernetes network policies, which restrict pod communication, i.e. how groups of pods are allowed to communicate with each other and with other network endpoints. Please find how to implement network policies on Amazon EKS at https://blog.powerupcloud.com/restricting-k8s-services-access-on-amazon-eks-part-ix-7d75c97c9f3e

6. AWS Single Sign-On (SSO), AWS Managed Microsoft Active Directory Service, and the AWS IAM authenticator can be used to control access to your Amazon EKS cluster running on the AWS cloud.

7. Ensure to use the Pod Security Context.

  • Ensure to disable root access; the Docker image should run as a non-root user.
  • Make sure to configure a read-only root file system.
  • Security-Enhanced Linux (SELinux): You can assign SELinuxOptions objects using the seLinuxOptions field. Note that the SELinux module needs to be loaded on the underlying Linux nodes for these policies to take effect.
  • Make sure Linux capabilities are dropped and/or non-default Linux capabilities are added only if they are required.
  • Make sure not to run pods/containers as privileged unless you require access to all devices on the host. Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).

Please find the snippet for the Pod Security Context:

...
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    seLinuxOptions:
      level: "s0:c123,c456"
    capabilities:
      drop:
        - NET_RAW
        - CHOWN
      add: ["NET_ADMIN", "SYS_TIME"]
...

Note: The Pod Security Context can be used at the pod level as well as at the container level.

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-2
spec:
  #Pod level
  securityContext:
    runAsUser: 1000
  containers:
  - name: sec-ctx-demo-2
    image: gcr.io/google-samples/node-hello:1.0
   #container level
    securityContext:
      runAsUser: 2000
      allowPrivilegeEscalation: false

8. Make sure to embed these Kubernetes admission controllers wherever possible:

  • AlwaysPullImages – modifies every new Pod to force the image pull policy to Always. This is useful in a multitenant cluster so that users can be assured that their private images can only be used by those who have the credentials to pull them.
  • DenyEscalatingExec – will deny exec and attach commands to pods that run with escalated privileges that allow host access. This includes pods that run as privileged, have access to the host IPC namespace or have access to the host PID namespace.
  • ResourceQuota – will observe the incoming request and ensure that it does not violate any of the constraints enumerated in the ResourceQuota object in a Namespace.
  • LimitRanger – will observe the incoming request and ensure that it does not violate any of the constraints enumerated in the LimitRange object in a Namespace, e.g. CPU and memory.

9. Ensure to scan manifest files (YAML/JSON) in which any credentials are passed in objects (Deployments, charts), using tools such as Palo Alto Prisma or the Alcide Kubernetes Advisor.

10. Ensure to use TLS authentication for Tiller when Helm is being used.

11. It’s always recommended not to use the default service account.

  • The default service account has a very wide range of permissions in the cluster and should therefore be disabled.

12. Do not create a service account or a user with full cluster-admin privileges unless necessary; always follow the least privilege rule.

13. Make sure to disable anonymous access and send Unauthorized responses to unauthenticated requests. Verify the following Kubernetes security settings when configuring kubelet parameters:

  • anonymous-auth is set to false to disable anonymous access (it will send 401 Unauthorized responses to unauthenticated requests).
  • kubelet has a --client-ca-file flag, providing a CA bundle to verify client certificates.
  • --authorization-mode is not set to AlwaysAllow; the more secure Webhook mode will delegate authorization decisions to the Kubernetes API server.
  • --read-only-port is set to 0 to avoid unauthorized connections to the read-only endpoint (optional).

14. Ensure that access to etcd is restricted to only the API server and the nodes that need it. This can be enforced in the Security Group attached to the control plane.

K8s API call level

  1. Ensure that all communication from the client (pod/end user) to the Kubernetes API server is TLS encrypted.
    • Note that you may experience throttling if a huge number of API calls happen.
  2. Ensure that all communication from the Kubernetes API server to etcd, the kube-controller-manager, the kubelet/worker nodes, kube-proxy and the kube-scheduler is TLS encrypted.
  3. Enable control plane API call logging and auditing, e.g., EKS control plane logging (see the sketch below).
  4. If you are using a managed Kubernetes service such as Amazon EKS, GKE or Azure Kubernetes Service (AKS), these concerns are largely taken care of for you.
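On EKS, for example, control plane logging can be enabled per log type from the CLI (the cluster name is a placeholder):

aws eks update-cluster-config \
  --name prod-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'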

EKS Security Considerations

  • EKS does not support Kubernetes Network Policies or any other way to create firewall rules for Kubernetes deployment workloads, apart from Security Groups on the worker nodes, since it uses the VPC CNI plugin by default, which does not support network policy. Fortunately, this has a simple fix: the Calico CNI can be deployed in EKS to run alongside the VPC CNI, providing Kubernetes Network Policies support.
  • Ensure to protect EC2 instance role credentials and manage AWS IAM permissions for pods. By using the IAM Roles for Service Accounts feature, we no longer need to provide extended permissions to the worker node’s IAM role so that pods on that node can call AWS APIs. We can scope IAM permissions to a service account, and only pods that use that service account have access to those permissions. This feature also eliminates the need for third-party solutions such as kiam or kube2iam (a sketch using eksctl follows the link below).

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html
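One way to set this up (an assumption; the post itself does not prescribe tooling) is with eksctl, which creates the OIDC provider, the IAM role and the annotated service account in one step; the cluster, namespace and account names are placeholders.

# One-time: associate an OIDC identity provider with the cluster
eksctl utils associate-iam-oidc-provider --cluster prod-cluster --approve

# Service account whose pods receive only S3 read-only permissions
eksctl create iamserviceaccount \
  --cluster prod-cluster \
  --namespace payments \
  --name s3-reader \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve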

Security Monitoring of K8s

Sysdig Falco is an open-source container security monitor designed to detect anomalous activity in your containers. Sysdig Falco taps into your host’s (or node’s, in the case of Kubernetes) system calls to generate an event stream of all system activity. Falco’s rules engine then allows you to create rules based on this event stream, allowing you to alert on system events that seem abnormal. Since containers should have a very limited scope in what they run, you can easily create rules to alert on abnormal behavior inside a container.

Ref: https://sysdig.com/opensource/falco/

The Alcide Advisor is a continuous Kubernetes and Istio hygiene checks tool that provides a single-pane view for all your K8s-related issues, including audits, compliance, topology, networks, policies, and threats. This ensures that you get a better understanding and control of distributed and complex Kubernetes projects with continuous and dynamic analysis. A partial list of the checks it runs includes:

  • Kubernetes vulnerability scanning
  • Hunting misplaced secrets, or excessive secret access
  • Workload hardening from Pod Security to network policies
  • Istio security configuration and best practices
  • Ingress controllers for security best practices.
  • Kubernetes API server access privileges.
  • Kubernetes operators security best practices.

Ref :https://aws.amazon.com/blogs/apn/driving-continuous-security-and-configuration-checks-for-amazon-eks-with-alcide-advisor/

Migration to Amazon ECS and DevOps Setup


Customer: India’s largest trucking platform

Problem Statement

The customer’s environment on AWS was facing scalability challenges, as it was maintained across a heterogeneous set of software solutions with many different types of programming languages and systems, and there was no fault-tolerant mechanism implemented. The lead time to get a developer operational was high, as the developer ended up waiting for a long duration to get access to cloud resources like EC2, RDS, etc. Additionally, the deployment process was manual, which increased the chances of unforced human errors and configuration discrepancies, and configuration management took a long time, which further slowed down deployments. Furthermore, there was no centralized mechanism for user management, log management, or cron job monitoring.

Proposed Solution

For AWS cloud development the built-in choice for infrastructure as code (IAC) is AWS CloudFormation. However, before building the AWS Cloudformation (CF) templates, Powerup conducted a thorough assessment of customer’s existing infrastructure to identify the gaps and plan the template preparation phase accordingly. Below were a few key findings from their assessment:

  • Termination Protection was not enabled on many EC2 instances
  • An IAM password policy was not implemented
  • Root Multi-Factor Authentication (MFA) was not enabled
  • IAM roles were not used to access AWS services from EC2 instances
  • CloudTrail was not integrated with CloudWatch Logs
  • S3 access logs for the CloudTrail S3 bucket were not enabled
  • Log metrics were not enabled for unauthorized API calls; use of the root account to access the AWS Console; IAM policy changes; changes to CloudTrail, AWS Config or S3 bucket policies; or alarms for any security group, NACL, route table or VPC changes
  • SSH ports of a few security groups were open to the public
  • VPC Flow Logs were not enabled for a few VPCs

 

Powerup migrated their monolithic service into smaller independent services which are self-deployable, sustainable, and scalable. They also set up CI/CD using Jenkins and Ansible. Centralized user management was implemented using FreeIPA, while the ELK stack was used to implement centralized log management. Healthcheck.io was used to implement centralized cron job monitoring.

CloudFormation (CF) Templates were then used in the creation of the complete AWS environment. The template can be reused to create multiple environments in the future. 20 Microservices in the stage environment were deployed and handed over to the customer team for validation. Powerup also shared the Ansible playbook which helps in setting up the following components – Server Hardening / Jenkins / Metabase / FreeIPA / Repository.

The below illustrates the architecture:

  • Different VPCs are provisioned for Stage, Production and Infra management. VPC peering is established from Infra VPC to Production / Stage VPC.
  • VPN tunnel is established between customer office to  AWS Infra VPC for the SSH access / Infra tool access.
  • All layers except the Elastic Load Balancer are configured in private subnets.
  • A separate security group is configured for each layer (DB / Cache / Queue / App / ELB / Infra), with only the required inbound/outbound rules allowed.
  • Amazon ECS is configured in auto-scaling mode, so the ECS workers will scale horizontally based on the load on the entire ECS cluster.
  • Service-level scaling is implemented for each service to scale the individual services automatically based on load (see the sketch after this list).
  • Elasticache (Redis) is used to store the end user session
  • Highly available RabbitMQ cluster is configured. RabbitMQ is used as messaging broker between the micro services.
  • For MySQL and Postgresql RDS Multi-AZ is configured. MongoDB is configured in Master-slave mode.
  • IAM roles are configured for accessing the AWS resources like S3 from EC2 instances.
  • VPC Flow Logs / CloudTrail / AWS Config are enabled for logging purposes. The logs are streamed into the AWS Elasticsearch Service using AWS Lambda. Alerts are configured for critical events like instance termination, IAM user deletion, security group updates, etc.
  • AWS Systems Manager is used to collect OS, application and instance metadata from EC2 instances for inventory management.
  • AMIs and backups are configured for business continuity.
  • Jenkins is configured for CI / CD process.
  • CloudFormation template is being used for provisioning / updating of the environment.
  • Ansible is used as configuration management for all the server configurations like Jenkins / Bastion / FreeIPA etc.
  • Sensu monitoring system is configured to monitor system performance
  • New Relic is configured for application performance monitoring and deployment tracking
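A sketch of the service-level scaling mentioned above, expressed with the Application Auto Scaling CLI, is shown below; the cluster, service and threshold values are placeholders.

# Register the ECS service as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/prod-cluster/orders-service \
  --min-capacity 2 --max-capacity 10

# Scale the task count to track average CPU utilization around 60%
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/prod-cluster/orders-service \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue":60.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'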

Cloud platform

AWS.

Technologies used

Amazon Redshift, freeIPA, Amazon RDS, Redis.

Benefit

IaC enabled the customer to spin up an entire infrastructure architecture by running a script. This allows the customer to not only deploy virtual servers, but also launch pre-configured databases, network infrastructure, storage systems, load balancers, and any other cloud service that is needed. IaC completely standardized the setup of infrastructure, thereby decreasing the chances of incompatibility issues, so applications can run more smoothly. IaC is also helpful for risk mitigation because the code can be version-controlled: every change in the server configuration is documented, logged, and tracked, and these configurations can be tested just like code. So if there is an issue with a new setup configuration, it can be pinpointed and corrected much more easily, minimizing the risk of issues or failure.

Developer productivity drastically increases with the use of IaC. Cloud architectures can be easily deployed in multiple stages to make the software development life cycle much more efficient.