The customer, an international clinical-stage biopharmaceutical company focused on cellular immunotherapy treatments for cancer, is adopting cloud services for the first time. They plan to build their database on Google Cloud Platform, with the intention of enhancing performance and producing research outputs more efficiently from their applications, especially since they handle large volumes of data. They also want the ability to scale at any point during peak loads, along with complete automation of the continuous integration and continuous deployment (CI/CD) process for easier deployments and better auditing, monitoring and log management.
The customer is a clinical-stage biopharmaceutical organization with the scientific vision of revolutionizing the treatment of cancer. They specialize in the research, clinical development and commercialization of cancer immunotherapy treatments. The combination of technologies from its academic, clinical and commercial research partners has enabled the company to create a fully integrated approach to the treatment of cancer with immunotherapy. They plan to work with Powerup to use Google Cloud Platform (GCP) as the cloud platform for their Cancer Research program.
The customer plans to use Google Cloud Platform (GCP) as its cloud platform for their Cancer Research program. Data scientists will be using a Secure File Transfer Protocol (SFTP) server to upload data on an average of one to two times a month with an estimated data volume of 2-6 TB per month.
The data transferred to GCP has to undergo a two-step cleansing process before being uploaded to a database. The first step is to run a checksum to match the data schema against the sample database. The second step is transcoding and transformation of the data, after which the data is stored in a raw database.
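As a rough illustration of the first cleansing step, a schema checksum can be computed over the upload's column definitions and compared with the sample database's checksum. This is a minimal sketch with made-up field names, not the customer's actual implementation:

```python
import hashlib
import json

def schema_checksum(columns):
    """Compute a stable checksum over a schema, given as (name, type) pairs."""
    canonical = json.dumps(sorted(columns), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def matches_sample_schema(incoming_columns, sample_columns):
    """Step 1 of the cleansing process: accept an upload only if its schema
    checksum matches that of the sample database schema."""
    return schema_checksum(incoming_columns) == schema_checksum(sample_columns)

# Illustrative field names, not the customer's actual schema.
sample = [("patient_id", "STRING"), ("assay_value", "FLOAT")]
upload = [("assay_value", "FLOAT"), ("patient_id", "STRING")]  # column order differs
print(matches_sample_schema(upload, sample))  # True: checksum is order-insensitive
```

Sorting the column list before hashing makes the check insensitive to column order, so only a genuine schema mismatch rejects the upload.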
Greenfield setup on GCP
The first steps in this project were understanding the customer's needs as well as the existing Python models and the workflows to be created. After these preliminary studies and sign-off, a detailed plan and a solution architecture document formed part of the greenfield project deliverables.
The setup included shared services, logging, UAT and production accounts. Cloud Deployment Manager (CDM) was configured to manage their servers, networks, infrastructure and web applications. Cloud Identity and Access Management (IAM) roles were created for access to different GCP services as per customer specification, enabling services to access one another securely.
On-premises connectivity was established via VPN tunnels.
The data science team has built around 50 Python/R models that help in data processing, all of which are currently stored in GitHub. The Python models must meet performance expectations when deployed, and CI/CD pipelines were to be created for 48 of them.
Once the data arrives on the database, the customer wants the python code to process the data and store the results on an intermediate database.
Multiple folders were created to deploy production, UAT and management applications. Cloud NAT was set up to enable internet access, Virtual Private Cloud (VPC) peering was done for inter-connectivity of the required VPCs, and the SFTP server was deployed on Google Compute Engine.
Once data gets uploaded to the raw GCS bucket, a checksum function will be triggered to initiate data cleansing. In the first phase, the data schema will be verified against a sample database, after which the data will be pushed to transcoding and transformation jobs. Processed data will be stored in GCS.
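The trigger described above can be pictured as a Cloud Function style entry point. The sketch below only models the control flow; the event fields mimic a GCS object-finalize payload, and the validation, transcoding and storage callables are hypothetical stand-ins, not the customer's actual code:

```python
def on_raw_upload(event, validate, transcode, store):
    """Sketch of the GCS-triggered cleansing flow: schema validation first,
    then transcoding/transformation, then storage of the processed output.

    `event` mimics a GCS object-finalize payload (`bucket`, `name`);
    `validate`, `transcode` and `store` are injected stand-ins for the real
    pipeline steps, so the control flow can be exercised locally.
    """
    obj = f"gs://{event['bucket']}/{event['name']}"
    if not validate(obj):                 # phase 1: schema check vs. sample DB
        raise ValueError(f"schema mismatch for {obj}")
    processed = transcode(obj)            # phase 2: transcoding & transformation
    return store(processed)               # write result to the processed bucket

result = on_raw_upload(
    {"bucket": "raw-data", "name": "batch-001.csv"},
    validate=lambda obj: True,
    transcode=lambda obj: obj + " (transcoded)",
    store=lambda data: {"stored": data},
)
print(result)  # {'stored': 'gs://raw-data/batch-001.csv (transcoded)'}
```

Injecting the three steps as callables keeps the flow testable without GCP credentials; in production they would be real GCS and job-submission calls.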
All the Python/R models will be deployed as Docker images on a Google-managed Kubernetes cluster, so that GCP takes care of high availability and scaling.
The customer will create multiple workflows to process data, which in turn define how the Python models are executed.
The customer team will view the current data through a web application.
The processed data also has to be synced back to the on-premises server. An open-source antivirus tool is used to scan and verify data before migrating it to Google Cloud Storage (GCS).
Monitoring and Logging
Stackdriver was used for infrastructure and application monitoring as well as log analytics, as it supports features like tracing, debugging and profiling to monitor the overall performance of the application.
Additional tools were also utilised: Sensu to monitor infrastructure, Cloud Audit Logging to track Application Program Interface (API) activity, VPC flow logs to capture network logs, and InfluxDB together with Grafana to store time-series data and to visualize it in dashboards, respectively.
The Stackdriver logging module ensures centralized logging and monitoring of the entire system.
Security and Compliance
IAM with least-privilege access and Multi-Factor Authentication (MFA) will be enabled as an additional layer of security for account access. Direct access to critical servers such as database and application servers will not be allowed. Firewall rules will be configured at the virtual networking level for effective protection and traffic control regardless of the operating system used. Only the required ports will be opened, and only to the necessary IP addresses.
Both data in transit and at rest are by default encrypted in GCP along with provisions for static code analysis and container image-level scanning.
A CI/CD pipeline was set up using Jenkins, an open-source tool that facilitates a modern DevOps environment. It bridges the gap between development and operations by automating the building, testing and deployment of applications.
After the successful deployment of code, code integration and log auditing became simpler. The customer was able to handle large blocks of data efficiently, and auto-scaling at any point in time during new product launches and marketing events became effortless, which improved performance as well.
The customer was also able to scale up without worrying about storage and compute requirements, moving to an Opex model on the cloud by paying per usage.
Moving to GCP enabled the customer to save 20% of their total costs as they could adopt various pricing models and intelligent data tiering.
The customer is UAE's aviation corporation, catering to over 70 million passengers to date. Their ticket booking application, the Passenger Service System (PSS), was a legacy system that they intended to migrate to a cloud environment, leveraging managed cloud services, by first conducting a Migration Readiness Assessment & Planning (MRAP) exercise.
Passenger Service System (PSS) was the existing ticket booking application for the customer. The objective was to understand this legacy system and then recommend how it can be migrated to AWS while leveraging the cloud-native capabilities via an MRAP assessment. The focus would be application modernization rather than a lift & shift migration to the cloud. The customer team intends to leverage managed services of cloud and go serverless, containers, open source etc. wherever possible. The customer team also wants to move away from the commercial Oracle database to a more open-source AWS Aurora PostgreSQL database due to the high licensing costs imposed by Oracle.
MRAP is critical for any organization that plans to adopt the cloud, as this tool-based assessment checks their applications' readiness for the cloud. Powerup was approached to perform MRAP on the existing setup and, post its analysis, propose a migration plan as well as a roadmap.
The customer’s MRAP Process
To begin with, the RISC Networks RN150 virtual appliance, an application discovery tool, was configured and installed in the customer's existing PSS Equinix data centre (DC) to collect data and create a detailed tool-based assessment of the existing setup's readiness for migration.
Application stacks were built for the applications in scope, and assessments as well as group interviews were conducted with all stakeholders. Data gathered from stakeholders was cross-verified with the information provided by the customer's IT and application teams to bridge any gaps. The Powerup team would then work on creating a proposed migration plan and roadmap.
A comprehensive and detailed MRAP report included the following information:
Existing overall architecture
The existing PSS system was bought from a vendor called Radixx International, which provided three major services:
Availability service, an essential core service mainly used by online travel agencies (OTAs), end users and the global distribution system (GDS) to check the availability of the customer's flights. Its base system contained modules like Connect Point CP (core), payments and the enterprise application (Citrix app), all written in .NET, plus the enterprise application for operations and administration written in VB6.
Reservation service was used for booking passengers' tickets, with data stored in two stores, Couchbase and the Oracle database. Its webpage traffic was roughly 1000:1 when compared to the availability service.
DCS System (Check-in & Departure Control Systems) is another core system of any airline, which assists in passenger check-in, baggage check-in and alerting the required officials. It is a desktop application used by airport officials to manage passengers from one location to another with the availability of an online check-in module as well.
Existing Database: Oracle is the current core database storing all critical information. It consists of 4 nodes – 2 read-write nodes in RAC1 and 2 read-only nodes in RAC2. All availability checks are directed to the read-only Oracle nodes. The Oracle database nodes are heavily utilized, at roughly 60-70% on average, with 14 schemas within the Oracle database accessed by the various modules. Oracle Advanced Queuing is used in some cases to push data to the Oracle database.
Recommended AWS Landing zone structure
The purpose of AWS Landing Zone is to set up a secure, scalable, automated multi-account AWS environment derived from AWS best practices while implementing an initial security baseline through the creation of core accounts and resources.
The following Landing Zone Account structure was recommended for the customer:
AWS Organizations Account:
Primarily used to manage configuration of and access to AWS Landing Zone managed accounts, the AWS Organizations account provides the ability to create and financially manage member accounts.
Shared Services Account:
It is a reference for creating infrastructure shared services. In the customer's case, the Shared Services account will have 2 VPCs: one for management applications like AD, Jenkins, the monitoring server, bastion etc., and the other for shared services like the NAT Gateway and firewall. A Palo Alto firewall will be deployed in the shared services VPC across 2 Availability Zones (AZs) and load balanced using an AWS Application Load Balancer.
AWS SSM will be configured in this account for patch management of all the servers. AWS Pinpoint will be configured in this account to send notifications to customers via email, SMS and push notifications.
Centralized Logging Account:
The log archive account contains a central Amazon S3 bucket for storing copies of all logs like CloudTrail, Config, CloudWatch logs, ALB Access logs, VPC flow logs, Application Logs etc. The logging account will also host the Elasticsearch cluster, which can be used to create custom reports as per customer needs, and Kibana will be used to visualize those reports. All logs will be pushed to the current Splunk solution used by the customer for further analysis.
Security Account:
The Security account creates auditor (read-only) and administrator (full-access) cross-account roles from the security account into all AWS Landing Zone managed accounts. The organization's security and compliance team can audit or perform emergency security operations with this setup, and this account is also designated as the master Amazon GuardDuty account. Security Hub will be configured in this account to get a centralized view of security findings across all the AWS accounts, and AWS KMS will be configured to encrypt sensitive data on S3, EBS volumes and RDS across all the accounts. Separate KMS keys will be configured for each account and each of the above-mentioned services as a best practice.
Powerup recommended Trend Micro as the preferred anti-malware solution and the management server can be deployed in the security account.
Production Account – Optimized Lift & Shift:
This account will be used to deploy the production PSS application and the supporting modules. High availability (HA) and DR will be considered for all deployments in this account. Auto-scaling will be enabled wherever possible.
UAT Account – Optimized Lift & Shift:
This account will be used to deploy the UAT version of the PSS application. HA and scalability are not a priority in this account. It is recommended to shut down the servers during off-hours to save cost.
Based on the understanding of the customer's business, a hot-standby DR was recommended, where a scaled-down version of the production setup is always running and can be quickly scaled up in the event of a disaster.
UAT Account – Cloud-Native:
The account is where the customer’s developers will test all the architectures in scope. Once the team has made the required application changes, they will use this account to test the application on the cloud-native services like Lambda, EKS, Fargate, Cognito, DynamoDB etc.
Application Module – Global Distribution Systems (GDS)
A global distribution system (GDS) is one of the 15 modules of the PSS application. It is a computerized network system that enables transactions between travel industry service providers, mainly airlines, hotels, car rental companies and travel agencies, by using real-time inventory from service providers (e.g., the number of hotel rooms, flight seats or rental cars available).
The customer gets bookings from various GDS systems like Amadeus, Sabre, Travelport etc.
ARINC is the provider that connects the customer with the various GDS systems.
The request comes from the GDS systems and is pushed into ARINC's IBM MQ cluster, from where it is further pushed to the customer's IBM MQ.
The GMP application then polls the IBM MQ queue and sends the requests to the PSS core, which in turn reads/writes to the Oracle DB.
The GNP application talks to the Order Middleware, which then talks to the PSS systems to book, cancel or change tickets.
Pricing is provided by the Offer Middleware.
Topology Diagram from RISC tool showing interdependency of various applications and modules:
Any change in the GDS architecture could break the interaction between applications and modules or cause a discrepancy in the system that might compromise data security. To protect the system from becoming vulnerable, Powerup recommended migrating the architecture as-is while leveraging cloud capabilities.
Proposed Migration Plan
An IBM MQ cluster will be set up on EC2, with auto-scaling enabled to maintain the required number of nodes, ensuring availability of the EC2 instances at all times. IBM MQ will be deployed in a private subnet.
Amazon Elastic File System (Amazon EFS) will be automatically mounted on the IBM MQ server instance for distributed storage, to ensure high availability of the queue manager service and the message data. If the IBM MQ server fails in one availability zone, a new server is created in the second availability zone and connected to the existing data, so that no persistent messages are lost.
An Application Load Balancer will be used to automatically distribute connections to the active IBM MQ server. The GMP application and the PNL & ADL application will be deployed on EC2 across 2 AZs for high availability. GMP will be deployed in an auto-scaling group that scales based on the queue length in the IBM MQ server, so messages are consumed and processed as soon as possible, whereas PNL & ADL will scale out in case of high traffic.
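Scaling the GMP consumers on IBM MQ queue depth reduces to simple target-tracking arithmetic. A hedged sketch follows; the target of 100 messages per instance and the bounds are assumed figures, not values from the engagement:

```python
import math

def desired_gmp_instances(queue_length, msgs_per_instance=100, minimum=2, maximum=10):
    """Compute the desired auto-scaling group size from IBM MQ queue depth.
    `msgs_per_instance` is an assumed per-worker target; the floor of 2 keeps
    the service spread across 2 AZs for high availability."""
    wanted = math.ceil(queue_length / msgs_per_instance)
    return max(minimum, min(maximum, wanted))

print(desired_gmp_instances(0))     # 2  (AZ-redundant floor)
print(desired_gmp_instances(450))   # 5  (ceil(450 / 100))
print(desired_gmp_instances(5000))  # 10 (capped at the group maximum)
```

In practice the same arithmetic is what an auto-scaling policy performs when it tracks a "messages per instance" metric published to the monitoring system.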
APIS Inbound Application, AVS application, PSF & PR application and the Matip application will all be deployed on EC2 across 2 AZs for high availability in an auto-scaling group to scale out in case of high traffic.
GMP and the GMP code-sharing applications will be deployed as Lambda functions, triggered when a new message arrives in IBM MQ.
The PNL & ADL application will be deployed as a Lambda function that runs when a PNR number changes, in which case a message must be sent to the airport.
The AVS application will be deployed as a Lambda function that runs when a message is sent to the external systems.
The Matip application will be deployed as a Lambda function that runs when a message is sent using the MATIP protocol.
The PFS & PR application will be deployed as Lambda functions that run when a booking message is sent to the airport.
The APIS Inbound application will be deployed as a Lambda function that runs when an APIS message is sent to the GDS systems.
For all the above, the required compute resources will be assigned as needed; the Lambda functions scale based on load.
Application modifications recommended
All the application components like GMP, AVS, PNL & ADL, PFS & PR, Matip, etc. are currently in .NET and have to be moved to .NET Core to run as Lambda functions. It is also recommended that the applications be broken down into microservices.
Oracle to Aurora Database Migration
The AWS Schema Conversion Tool (SCT) is run on the source database to generate a schema conversion report, which helps in understanding the interdependencies of the existing schemas and how they can be migrated to Aurora PostgreSQL. The report classifies database objects into those that the SCT can convert directly and those that need manual intervention. For Oracle functionality that is not supported in Aurora PostgreSQL, the application team must write custom code. Once all the schemas are migrated, AWS Database Migration Service (DMS) will be used to migrate the entire data set from Oracle to Aurora.
Oracle to Aurora-PostgreSQL Roadmap
Lift & shift:
The current Oracle database will be moved to AWS as-is, without any changes, to kick-start the migration. The Oracle database can run on the AWS RDS service or on EC2 instances. One RDS node will be the master database in read/write mode; the master instance is the only instance the application can write to. There will be 3 additional read replicas spread across 2 AZs of AWS to handle the incoming read requests. If the master node goes down, one of the read replicas is promoted to master.
Migrate the Oracle schemas to Aurora:
Once the Oracle database is fully migrated to AWS, the next step is to gradually migrate the schemas one by one to Aurora PostgreSQL. The first step is to map all 14 schemas to the customer's application modules. The schemas will be migrated based on this mapping; schemas with no dependencies on other modules will be identified and migrated first.
The application will be modified to work with the new Aurora schema. Any functionality, which is not supported by Aurora, will be moved to application logic.
DB links can be established from Oracle to Aurora; however, they cannot be established from Aurora to the Oracle database.
Any new application development that is in progress should be compatible and aligned with the Aurora schema.
Finally, all 14 schemas will be migrated to Aurora and the data migrated using the DMS service. The entire process is expected to take up to a year. There will be 4 Aurora nodes (one master in write mode and 3 read replicas) spread across 2 AZs of AWS for high availability.
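The schema-by-schema migration described above is essentially a dependency-ordering problem: non-dependent schemas go first. Here is a sketch using Python's standard `graphlib`, with a hypothetical dependency map standing in for the real interdependencies surfaced by the SCT report:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical schema -> depends-on mapping; in practice this comes from the
# interdependency analysis in the AWS SCT report for the 14 Oracle schemas.
deps = {
    "booking": {"inventory", "customer"},
    "inventory": set(),
    "customer": set(),
    "reporting": {"booking"},
}

# static_order() yields schemas whose dependencies are already satisfied first,
# i.e. the non-dependent schemas get migrated to Aurora before dependent ones.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

With this ordering, every schema is migrated only after the schemas it depends on are already in Aurora, which is what allows the cut-over to proceed module by module.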
The assessment served as a roadmap for moving away from Oracle to PostgreSQL, saving up to 30% in Oracle licence costs. It also provided a way forward for each application towards cloud-native.
The currently provisioned infrastructure was utilized at only around 40-50%, and a significant reduction in the overall total cost of ownership (TCO) was identified if they went ahead with cloud migration. Reduced administration through AWS managed services also proved promising, facilitating smooth and optimized functioning of the system with minimal administration.
With the MRAP assessment and findings in place, the customer now has greater visibility towards cloud migration and the benefits it would derive from implementing it.
Customer: One of India's top media solutions companies
Powerupcloud helped the customer completely transform their business environment through end-to-end automation. Our design architecture and solution engineering improved business process efficiency without any manual intervention, decreasing turnaround time by more than 90%. With most of their applications now running on the cloud, the customer has become one of the most customer-friendly media companies in India.
The customer's team wants to concentrate on building applications rather than spending time on infrastructure setup and on installing and maintaining dependency packages on the servers. The proposed solution needed to be quick and scalable so that business performance would improve significantly.
Focusing on workload and transaction volume, we designed a customer-friendly, network-optimized, highly agile and scalable cloud platform that enabled cost optimization, effective management and easy deployment. This helped reduce manual interventions and cost overheads.
We used the AWS-native tool CloudFormation to deploy the infrastructure as code; the idea being that the same templates that deploy the infrastructure can also be used for disaster recovery.
The CloudFormation templates were implemented in the stage and prod environments following AWS best practices: the servers reside in private subnets, with internet routing via a NAT gateway.
To remove IP dependencies and better manage failures, the servers and websites point to Application Load Balancers; with a single load balancer serving multiple target groups, we also optimized cost.
Base Packages Dependency:
The solution had to remove the developers' dependency on installing packages on the server to support the application.
The packages need to be installed as part of the infra setup, so developers can deploy code using the code deployer service rather than spending time installing dependencies.
Hence, we proposed and implemented the solution via Ansible. With Ansible we can manage multiple servers under a single roof; we prepared a shell script that installs the required packages on each server.
The architecture is mainly differentiated into backend and frontend modules.
Backend module: runs the Java application. A shell script sets up the backend servers, installing Java 8 and creating a home path at a standard location, so the application's home-path requirement is always satisfied.
Frontend module: Nginx combined with Node.js, set up using the same methodology.
The application's logs and other backup artifacts are kept on a secondary EBS volume; its mount point and fstab entries are also automated.
The main part of deployment is achieved through the code deployer, so the code-deployment agents are installed on the servers during server setup, which is also done through Ansible.
User access is another solution area: access to the servers is restricted for some people on the development team, and access is granted only with the approval of their leads.
We had dev, QA, PSR, stage and prod environments. We grouped all the servers in the Ansible inventory, generated public/private key pairs and placed them on the standard path. When the user-add script runs, Ansible copies the public key to the destination server and creates the user, pasting the public key into the authorized_keys file.
This method hides the public key from the end user. When removal is requested, we delete those users from the servers using Ansible.
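The user-provisioning flow that Ansible performs can be sketched as pairing every approved user with every server in the inventory. The names below are illustrative, and the real work (creating the user, writing authorized_keys) is done by the playbook:

```python
def provisioning_actions(inventory, approved_users):
    """Given an Ansible-style inventory {env: [servers]} and approved users
    {username: public_key}, emit one (server, username, key) action per pair.
    This mirrors the playbook that creates each user on the destination server
    and writes the public key into authorized_keys; names are illustrative."""
    return [
        (server, user, key)
        for env, servers in inventory.items()
        for server in servers
        for user, key in approved_users.items()
    ]

actions = provisioning_actions(
    {"dev": ["dev-app-1"], "prod": ["prod-app-1", "prod-app-2"]},
    {"alice": "ssh-ed25519 AAAA... alice"},
)
print(len(actions))  # 3: alice is provisioned on all three servers
```

Deprovisioning is the inverse: removing the user from the approved set and re-running the playbook deletes the account (and its key) from every server.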
Monitoring with sensu:
The infra team is responsible for monitoring the infrastructure, so we created a shell script, pushed via Ansible, that installs Sensu on the destination server for monitoring.
By implementing these solutions, the development team worried less about package dependencies, allowing them to concentrate on app development and bug fixing, and user access got streamlined.
Bastion with MFA settings:
The servers in the environment can be accessed only through the bastion server, which acts as the entry point.
This bastion server was set up with an MFA mechanism, where each user must authenticate with MFA, as a security best practice.
In one of the legacy accounts, SSL was offloaded at the server level across a lot of vhosts, so renewing certificates took time. To reduce this, we used Ansible to rotate the SSL certificates quickly with less human effort.
Automation in Pipeline:
Base package installation on boot-up, which removes one installation step.
User access with automatic expiry condition.
In addition to the ongoing consulting engagement with the customer for enhancements and designing a system to meet the client's needs, Powerupcloud also faced some challenges. The infra had to be created in a short time, with 13 servers under Application Load Balancers, including networking, compute and load balancers with target groups. The instances had to be installed with certain dependencies to run the application smoothly, which made the setup process more involved.
The solution was also expected to meet the very high level of security, continuous monitoring, Non-stop 24X7 operation, High availability, agility, scalability, less turnaround time, and high performance, which was a considerable challenge given the high business criticality of the application.
To overcome these challenges, we established a predictive performance model for early problem detection and prevention. Also, started a dedicated performance analysis team with active participation from various client groups.
All configuration changes are executed smoothly and rapidly, with a view to minimizing load-balancer disruption and outage time.
Business Result & outcome
With the move to automation, the customer's turnaround time decreased by 30%. The new system also helped them reduce capital investment, as it is completely automated. The solution was designed in keeping with our approach of security, scalability, agility and reusability.
Successful implementation of the CloudFormation template.
The customer is one of the largest Indian entertainment companies, engaged in acquiring, co-producing and distributing Indian cinema across the globe. They believe that media and OTT platforms can derive maximum benefit from the multi-tenant media management solutions provided by the cloud. Therefore, they are looking at migrating their existing servers, databases, applications and content management system to the cloud for better scalability, maintenance of large volumes of data, modernization and cost-effectiveness. The customer also intends to look at alternative migration strategies, such as restructuring and refactoring, if need be.
The customer is a global Indian entertainment company that acquires, co-produces, and distributes Indian films across all available formats such as cinema, television and digital new media. The customer became the first Indian media company to list on the New York Stock Exchange. It has experience of over three decades in establishing a global platform for Indian cinema. The company has an extensive and growing movie library comprising over 3,000 films, which include Hindi, Tamil, and other regional language films for home entertainment distribution.
The company also owns the rapidly growing Over The Top (OTT) platform. With over 100 million registered users and 7.9 million paying subscribers, the customer is one of India’s leading OTT platforms with the biggest catalogue of movies and music across several languages.
Problem statement / Objective
The online video market has brought a paradigm shift in the way technology is being used to enhance the customer journey and user experience. Media companies have huge storage and serving needs as well as the requirement for high availability via disaster recovery plans so that a 24x7x365 continuous content serving is available for users. Cloud could help media and OTT platforms address some pressing business challenges. Media and OTT companies are under constant pressure to continuously upload more content cost-effectively. At the same time, they have to deal with changing patterns in media consumption and the ways in which it is delivered to the audience.
The customer was keen on migrating their flagship OTT platform from a key public cloud platform to Microsoft Azure. Some of the key requirements were improved maintainability, scalability, and modernization of technology platforms. The overall migration involved re-platforming and migrating multiple key components such as the content management system (CMS), the Application Program Interfaces (APIs), and the data layer.
Powerup worked closely with the client’s engineering teams and with the OEM partner (Microsoft) to re-architect and re-platform the CMS component by leveraging the right set of PaaS services. The deployment and management methodology changed to containers (Docker) and Kubernetes.
Key learnings from the project are listed below:
Creating a bridge between the old database (in MySQL) and the new database (in Postgres).
Migration of a massive volume of content from the source cloud platform to Microsoft Azure.
Rewriting the complete CMS app using a modern technology stack (using Python/Django) while incorporating functionality enhancements.
Setting up and maintaining the DevOps pipeline on Azure using open source components.
Modernized infrastructure powered by Azure provided improved scalability and stability. The customer was able to minimize infrastructure maintenance using PaaS services. The modular design enabled by the migration empowered developers to prototype new features faster.
The customer’s environment on AWS was facing scalability challenges as it was maintained across a heterogeneous set of software solutions with many different types of programming languages and systems and there was no fault-tolerant mechanism implemented. The lead time to get a developer operational was high as the developer ended up waiting for a longer duration to access cloud resources like EC2, RDS, etc. Additionally, the deployment process was manual which increased the chances of unforced human errors and configuration discrepancies. Configuration management took a long time which further slowed down the deployment process. Furthermore, there was no centralized mechanism for user management, log management, and cron job monitoring.
For AWS cloud development, the built-in choice for infrastructure as code (IaC) is AWS CloudFormation. However, before building the CloudFormation (CF) templates, Powerup conducted a thorough assessment of the customer's existing infrastructure to identify gaps and plan the template preparation phase accordingly. Below are a few key findings from the assessment:
Termination protection was not enabled on many EC2 instances
IAM password policy was not implemented
Root Multi-Factor Authentication (MFA) was not enabled
IAM roles were not used to access AWS services from EC2 instances
CloudTrail was not integrated with CloudWatch Logs
S3 access logs for the CloudTrail S3 bucket were not enabled
Log metrics were not enabled for: unauthorised API calls; use of the root account to access the AWS console; IAM policy changes; changes to CloudTrail, AWS Config and S3 bucket policies; alarms for changes to security groups, NACLs, route tables and VPCs
SSH ports in a few security groups were open to the public
VPC flow logs were not enabled for a few VPCs
Powerup decomposed the monolithic service into smaller independent services that are self-deployable, sustainable, and scalable. They also set up CI/CD using Jenkins and Ansible. Centralized user management was implemented using FreeIPA, while the ELK stack was used for centralized log management. Healthcheck.io was used for centralized cron job monitoring.
CloudFormation (CF) templates were then used to create the complete AWS environment. The templates can be reused to create multiple environments in the future. Twenty microservices were deployed in the stage environment and handed over to the customer team for validation. Powerup also shared the Ansible playbooks that set up the following components – server hardening / Jenkins / Metabase / FreeIPA / repository.
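As an illustration of how one audit gap (no log metric for unauthorized API calls) is typically closed in a template, below is a minimal sketch that generates the relevant CloudFormation fragment in Python. The resource names, log group, and SNS topic ARN are hypothetical, not the customer's actual template.

```python
import json

def unauthorized_api_alarm_template(log_group, topic_arn):
    """Build a CloudFormation fragment that alarms on unauthorized API
    calls recorded by CloudTrail (one of the audit findings above)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Metric filter: count CloudTrail records whose errorCode
            # indicates an unauthorized or denied call.
            "UnauthorizedApiMetricFilter": {
                "Type": "AWS::Logs::MetricFilter",
                "Properties": {
                    "LogGroupName": log_group,
                    "FilterPattern": '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }',
                    "MetricTransformations": [{
                        "MetricNamespace": "CloudTrailMetrics",
                        "MetricName": "UnauthorizedApiCalls",
                        "MetricValue": "1",
                    }],
                },
            },
            # Alarm: notify the security topic on the first occurrence
            # within a 5-minute window.
            "UnauthorizedApiAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "AlarmName": "unauthorized-api-calls",
                    "Namespace": "CloudTrailMetrics",
                    "MetricName": "UnauthorizedApiCalls",
                    "Statistic": "Sum",
                    "Period": 300,
                    "EvaluationPeriods": 1,
                    "Threshold": 1,
                    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                    "AlarmActions": [topic_arn],
                },
            },
        },
    }

template = unauthorized_api_alarm_template(
    "cloudtrail-logs",  # hypothetical CloudWatch Logs group
    "arn:aws:sns:us-east-1:111111111111:security-alerts",  # hypothetical topic
)
body = json.dumps(template, indent=2)
```

Generating templates programmatically like this keeps the same fragment reusable across the stage and production environments.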
The architecture is illustrated below:
Different VPCs are provisioned for Stage, Production, and Infra management. VPC peering is established from Infra VPC to Production / Stage VPC.
A VPN tunnel is established between the customer’s office and the AWS Infra VPC for SSH access and infra tool access.
All layers except the elastic load balancer are configured in a private subnet.
A separate security group is configured for each layer (DB, cache, queue, app, ELB, and infra), with only the required inbound/outbound rules allowed.
Amazon ECS is configured in auto-scaling mode, so the ECS workers scale horizontally based on the load on the entire ECS cluster.
Service level scaling is implemented for each service to scale the individual service automatically based on the load.
ElastiCache (Redis) is used to store end-user sessions.
A highly available RabbitMQ cluster is configured. RabbitMQ is used as messaging broker between the microservices.
For MySQL and PostgreSQL, RDS Multi-AZ is configured. MongoDB is configured in master-slave mode.
IAM roles are configured for accessing the AWS resources like S3 from EC2 instances.
VPC Flow Logs, CloudTrail, and AWS Config are enabled for logging purposes. The logs are streamed into Amazon Elasticsearch Service using AWS Lambda. Alerts are configured for critical events like instance termination, IAM user deletion, and security group updates.
AWS Systems Manager is used to collect the OS, application, and instance metadata of EC2 instances for inventory management.
AMIs and backups are configured for business continuity.
Jenkins is configured for CI / CD process.
CloudFormation templates are used for provisioning and updating the environment.
Ansible is used as configuration management for all the server configurations like Jenkins / Bastion / FreeIPA etc.
Sensu is configured to monitor system performance.
New Relic is configured for application performance monitoring and deployment tracking.
Amazon Redshift, FreeIPA, Amazon RDS, Redis.
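One of the items above streams logs into Elasticsearch via Lambda and alerts on critical events (instance termination, IAM user deletion, security group updates). A minimal sketch of that filtering step follows; the Elasticsearch shipping call is stubbed out as a comment, and the record shape is a simplified stand-in for the CloudTrail event schema.

```python
# CloudTrail event names the case study calls out as alert-worthy.
CRITICAL_EVENTS = {
    "TerminateInstances",
    "DeleteUser",
    "AuthorizeSecurityGroupIngress",
    "RevokeSecurityGroupIngress",
}

def handler(event, context=None):
    """Sketch of the Lambda that forwards CloudTrail records and flags
    critical events for alerting. Only the filtering logic is shown."""
    alerts = []
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name in CRITICAL_EVENTS:
            alerts.append({
                "event": name,
                "user": record.get("userIdentity", {}).get("arn"),
                "time": record.get("eventTime"),
            })
        # es_client.index(index="cloudtrail", body=record)  # shipping step omitted
    return alerts
```

In the deployed setup every record is shipped to Elasticsearch, and only matches against the critical set fan out to the alerting channel.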
IaC enabled the customer to spin up an entire infrastructure architecture by running a script, deploying not only virtual servers but also pre-configured databases, network infrastructure, storage systems, load balancers, and any other cloud service needed. IaC standardized the infrastructure setup, decreasing the chances of incompatibility issues so that applications run more smoothly. IaC also helps with risk mitigation: because the code can be version-controlled, every change in server configuration is documented, logged, and tracked, and these configurations can be tested, just like code. So if there is an issue with a new configuration, it can be pinpointed and corrected much more easily, minimizing the risk of failure.
Developer productivity drastically increases with the use of IaC. Cloud architectures can be easily deployed in multiple stages to make the software development life cycle much more efficient.
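The point that configurations "can be tested, just like code" can be made concrete with a small policy check of the kind run in CI before a template is deployed. The resource shapes below mirror CloudFormation, but the checks themselves are an illustrative sketch, not the customer's actual test suite.

```python
def check_template(template):
    """Run minimal policy checks against a parsed CloudFormation
    template: every EC2 instance must enable termination protection,
    and no security group may open SSH to the world (two of the audit
    findings from the assessment)."""
    errors = []
    for name, res in template.get("Resources", {}).items():
        props = res.get("Properties", {})
        if res.get("Type") == "AWS::EC2::Instance":
            if not props.get("DisableApiTermination", False):
                errors.append(f"{name}: termination protection disabled")
        if res.get("Type") == "AWS::EC2::SecurityGroup":
            for rule in props.get("SecurityGroupIngress", []):
                if rule.get("FromPort") == 22 and rule.get("CidrIp") == "0.0.0.0/0":
                    errors.append(f"{name}: SSH open to the public")
    return errors
```

Wired into the pipeline, a non-empty error list fails the build, so a bad configuration never reaches an environment.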
Tools & AWS services used – Compute: EC2, EKS, ECR, Lambda; Shared storage: SFTP, EFS; Database: RDS PostgreSQL; Advanced networking: Route 53, Route 53 Resolver, custom DHCP; Security: AWS IAM, Active Directory, CloudTrail, AWS Config; IaC: CloudFormation, Terraform; Other services: Jenkins with SonarQube, Nexus, and Clair
The customer is regarded as a global insurance giant of the financial sector. ABS is their consolidated insurance system, which they were looking to migrate to AWS along with its supporting applications. They also wanted Powerup to create a Disaster Recovery facility on AWS and make the ABS insurance system available as a backup solution for one of their esteemed banking clients, while also catering to a business continuity strategy, application automation, and security & compliance.
The customer is a German multinational and one of the leading integrated financial services companies, headquartered in Munich. Their core business offers products and services in insurance and asset management.
Problem statement / Objective
ABS is a monolithic application while the supporting applications are microservices-based. Hence, a microservice architecture which can seamlessly integrate with the customer’s core insurance module was needed.
They wanted Powerup to deploy their applications on production as well as on a secondary Disaster Recovery (DR) facility on AWS using a Continuous Integration (CI) / Continuous Deployment (CD) pipeline. This was to serve as a business continuity solution for one of their esteemed banking clients.
For business continuity, the customer anticipated a Recovery Time Objective (RTO) of less than 4 hours and a Recovery Point Objective (RPO) of not more than 24 hours.
In addition to infrastructure deployment, all application deployments were requested to be automated by the client. Being a financial services company, the customer is bound by multiple regulatory and compliance-related obligations for which Cloud Best Security practices were also to be instrumented.
AWS Landing Zone was set up with the following accounts – Organization Account, Production Account, Dev, Pre-Prod, Management, DR, Centralized Logging & Security Account.
The operational unit consisted of the customer business system, comprising CISL (Core Insurance Layer), RAP (Rich Application), MFDB (Core Application Database), and CTRLM (batch job automation), and the non-ABS (non-customer business) system, i.e., the dispatcher.
All logs will be centrally stored in the logging account. All management applications, like Control-M, AD, and Jenkins, will be deployed in the management account.
The ABS application is deployed across multiple AZs and load balanced using an AWS Application Load Balancer. The non-ABS applications are microservices-based and talk to ABS to process or fetch the required data based on the request. Over 10 microservices run on Docker within the EKS cluster.
Auto-scaling is enabled at the service level as well as EC2 level to scale out the microservices based on the load. The application uses Active Directory to authenticate.
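The service-level scaling decision can be sketched as target tracking: scale the service so the observed per-task metric moves back toward a target value. This is an illustrative model with assumed parameters, not the customer's actual scaling policy.

```python
import math

def desired_task_count(current_count, metric_value, target_value,
                       min_tasks=2, max_tasks=20):
    """Target-tracking style scaling for a microservice: propose a task
    count that would bring the observed metric (e.g. average CPU per
    task) back to the target, clamped to the service's bounds."""
    if metric_value <= 0:
        # No load observed: keep the current count within bounds.
        return max(min_tasks, min(current_count, max_tasks))
    proposed = math.ceil(current_count * metric_value / target_value)
    return max(min_tasks, min(proposed, max_tasks))
```

For example, a service at 4 tasks observing 90% average CPU against a 60% target would be scaled to 6 tasks, while one at 30% would shrink toward the floor of 2.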
Amazon Elastic Kubernetes Service (EKS) backed the highly available, reliable and decoupled API services which are accessible only inside the customer’s private global shared network. Each module is segregated with the namespace.
Jenkins pipelines were used for build automation, the Nexus tool to store artifacts, and Clair to check for vulnerabilities. Together with static code analysis from SonarQube, build artifact vulnerability management was made easy.
Active-passive disaster recovery
Actively synchronized AWS Secure File Transfer Protocol (SFTP), integrated with the active directory, serves as the private file storage space on the cloud.
Powerup methodically designed and tested a cross-account and cross-region disaster recovery strategy. At the time of live deployment, Docker images are tagged (with versioning) and shipped to the Amazon Elastic Container Registry (ECR) in the DR account. Encrypted Amazon Machine Images (AMIs) and Relational Database Service (RDS) snapshots are passively shipped to the DR account with a Recovery Point Objective (RPO) of 3 hours.
Custom Lambda functions are used to generate, ship, and eliminate encrypted AMIs and snapshots in the DR account with no human intervention, serving as the backup solution.
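The scheduling and pruning decisions those Lambda functions make can be sketched as follows. Only the 3-hour RPO comes from the case study; the 7-day retention window and the record shape are assumptions for illustration.

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=3)
RETENTION = timedelta(days=7)  # assumed retention window, not stated in the case study

def needs_shipping(last_shipped_at, now):
    """The generate-and-ship Lambda runs on a schedule; a new encrypted
    AMI/snapshot copy is due whenever the DR account would otherwise
    fall behind the 3-hour RPO."""
    return last_shipped_at is None or now - last_shipped_at >= RPO

def expired_copies(copies, now, retention=RETENTION):
    """The eliminate Lambda prunes DR copies older than the retention
    window so stale images do not accumulate in the DR account."""
    return [c for c in copies if now - c["created"] > retention]
```

The actual copy and deregister calls (cross-account `CopyImage`, `CopySnapshot`, and their cleanup counterparts) sit behind these two predicates.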
Advanced strategy to ensure the best security
A custom CloudFormation template helped monitor AWS API calls made to change or update configurations of IAM roles, security group inbound or outbound rules, and EC2. Granular rules are maintained in AWS Config for monitoring and remediating as per regulatory compliance.
The customer’s network setup was the biggest challenge. In AWS, the network was completely private: an environment without an Internet Gateway (i.e., direct internet access) or Network Address Translation (NAT). Because of this, a custom Dynamic Host Configuration Protocol (DHCP) option set had to be used to work with an existing custom DNS server set up in the customer’s Shared Services account, alongside a custom proxy setup for internet access. The most challenging part was registering the worker nodes with the EKS master, as some of the internal kubelet components were failing due to the enterprise proxy and custom DNS servers. To fix this, AWS PrivateLink, Route 53 Resolver, and the Kubernetes ConfigMap were fine-tuned.
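Worker-node registration in EKS ultimately hinges on the aws-auth ConfigMap, which maps the node instance role into the cluster's RBAC groups. Below is a sketch that generates that manifest; the role ARN is hypothetical, and the mapRoles layout follows the standard aws-auth format.

```python
def aws_auth_configmap(node_role_arn):
    """Build the aws-auth ConfigMap manifest that lets EKS worker nodes
    register with the control plane. The role ARN is a placeholder."""
    map_roles = (
        f"- rolearn: {node_role_arn}\n"
        # {{EC2PrivateDNSName}} is a literal template variable that EKS
        # substitutes per node; it must appear verbatim in the manifest.
        "  username: system:node:{{EC2PrivateDNSName}}\n"
        "  groups:\n"
        "    - system:bootstrappers\n"
        "    - system:nodes\n"
    )
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "aws-auth", "namespace": "kube-system"},
        "data": {"mapRoles": map_roles},
    }

manifest = aws_auth_configmap("arn:aws:iam::111111111111:role/eks-node-role")
```

Applying this manifest to the `kube-system` namespace is the step that, combined with the proxy and DNS fixes, lets kubelet complete its bootstrap.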
The ABS insurance application was successfully deployed on AWS environment while meeting all security & high availability guidelines as per the stated compliance directives.
During load tests, the application was found to be able to handle 200 concurrent users successfully.
Microservices made the application programming interface (API) services easier to build and maintain. Flexibility and scalability of the different API applications were also achieved.
The customer could maintain the lifecycle of identifying, investigating and prioritizing vulnerabilities in code as well as containers without any compromise.
The customer could now implement strong access and control measures and maintain an information security policy.
Tabletop run-through and DR scenario simulations ensured business continuity.
The customer offers a broad range of financial products and services to diversified customer segments and has a sizable presence in the large retail market segment through its life insurance, housing finance, mutual fund and retail financial businesses across domestic and global geographies.
The customer, together with a strong network of sub-brokers and authorized persons, serves an approximately 12-lakh-strong client base through 10,052 employees based out of 448 offices across all major cities in India.
Their business comprises multiple asset classes broadly divided into Credit (retail and corporate), Franchise and Advisory (asset and wealth management, capital markets), and Insurance (life and general insurance).
Cloud computing technology has gained significant momentum in the financial sector, and the customer is looking at building a digital organization to align technology with evolving customer needs and behavior. Though they have been on the cloud from the beginning, cloud migration has accelerated at a rapid pace, and they feel the need to keep pace with growing demands.
Problem statement / Objective
With such a manifold existence, the customer realized it was extremely necessary for them to set up an environment that would not just support diverse applications but also cater to teams and/or projects across multiple locations for their domestic as well as global customers. This was possible only if they migrated their applications to the cloud.
Powerup’s scope of work was to carry out a cloud readiness assessment in order to understand how well prepared the customer was for this technology-driven transitional shift. They were to define, plan, assess, and report the customer’s readiness via Migration Readiness Assessment & Planning (MRAP).
The customer’s MRAP Process:
Migration Readiness Assessment & Planning (MRAP) is the process of assessing the current on-premise environment to analyze how ready it is to migrate to the cloud; every organization intending to migrate to the cloud should undergo it. The analysis explains how the entire process works and in what order the events should occur.
The customer carried out MRAP for almost 250 applications and had intended to migrate all the applications that are a part of this assessment.
The first step in planning the MRAP exercise was to understand the number and type of applications, identify the appropriate stakeholders for interviews, tools to be installed, different types of installations, creation of project plans, to name a few.
To begin with, RISC Networks, an application discovery tool, was configured and installed in the customer environment. It allowed all customer-specific data to be kept onsite or in a location of the customer’s choice while gathering data from the existing on-premise servers. The application discovery service helped collect hardware and server specifications, credentials, details of running processes, and network connectivity and port details. It also helped acquire a list of all on-premise servers in scope along with their IP addresses, the business application names hosted on them, and the stakeholders using those apps.
Deployment and assessment:
Once the tool was deployed and had the necessary access, servers needed to be licensed so that the RISC tool could start collecting data. It is recommended to have the tool collect data for at least two weeks so that a significant amount of information is captured.
At the customer’s organization, a total of 363 servers were licensed and almost 216 applications that belonged to 7 different lines of businesses (LOBs) were covered in the process.
Application stacks were built for all applications in scope by grouping applications based on connectivity information. Assessments and group interviews were conducted with application users, namely the application, network, security, and DevOps teams, to cross-verify the data provided by the IT and application teams against RISC’s grouping and bridge any gaps. A proposed migration plan was then developed post-analysis, stating the identified migration patterns for the applications in scope and a customized or modernized target architecture to plan a rapid lift & shift migration strategy.
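The connectivity-based grouping step can be sketched as a union-find over discovered server-to-server connections: servers that talk to each other land in the same application stack. The server names here are hypothetical, and real discovery data would carry ports and process details as well.

```python
from collections import defaultdict

def group_applications(connections):
    """Group servers into candidate application stacks from observed
    connectivity, using a simple union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in connections:
        union(a, b)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    # Sort for a stable, reviewable report.
    return sorted(sorted(g) for g in groups.values())

stacks = group_applications([("web1", "db1"), ("web2", "db1"), ("app3", "db2")])
```

In practice such automatic grouping is only a first pass; the interviews described above are what validate and correct the stacks.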
A comprehensive and detailed MRAP report included information on the overall current on-premise architecture, infrastructure and architecture details for all identified applications, suggested migration methodology for each application, migration roadmap with migration waves, total cost of ownership (TCO) analysis and an executive presentation for business cases.
The purpose of an AWS Landing Zone is to provide a framework for creating, automating, baselining, and maintaining a multi-account environment. This is considered as a best practice usually recommended before deploying applications on AWS.
The customer, with Powerup’s guidance, decided to set up and maintain the following AWS Landing Zone accounts –
Business unit accounts – UAT Account & Production Account
Topology Diagram from RISC tool showing the interdependence of various applications and modules:
The report would also provide details on each application across LoBs that would cover the following information:
Current Application Architecture
To be Architecture on Cloud
Current Application Inventory Details with Utilization.
Recommended Sizing on Cloud
Network Topology for each application.
Migration Methodology – 7 Rs of Migration – Rehost, Refactor etc.
The MRAP report depicted in-depth details on the customized AWS Architecture for the customer:
Identifying the migration pattern for all applications was the key. Target architecture for applications was created in such a manner that it could be changed or improvised, if required, in the future. This architecture catered to not just application and network deployment but also covered non-functional requirements, data security, data sizes, operations, and monitoring and logging.
A VPN tunnel was set up between the customer’s premises and AWS Transit Gateway, while the Transit Gateway was deployed in the Shared Services account to communicate with Virtual Private Clouds (VPCs) in other accounts.
Sensu Monitoring Server and Palo Alto Firewall were deployed in the Shared Services Account.
A shared services account was used to host Active Directory (AD) and a bastion host.
The production environment was isolated, as the customer had been running development, test, and production applications from the same account.
Key Findings from the customer MRAP
● The current infrastructure provisioned was only 30% utilized.
● Close to 20% of servers are already outdated or will turn obsolete within the next year.
● OS split – 70% Windows, 20% RHEL, 10% open-source Linux distributions.
● Database (DB) split – 70% SQL Server, 20% Oracle, 10% MySQL, PostgreSQL, MariaDB, and MongoDB. Databases are shared across multiple applications.
● Up to 76 applications are running on the same servers.
● Multiple DB engines run on the same DB server.
● Servers are shared across LOBs.
● Close to 20% of applications are open source running on Windows/RHEL; these can easily be moved to Amazon Linux (open source) during migration.
● Close to 20% of applications can be moved to the newer AMD/ARM architectures to save costs.
● Up to 50% savings on TCO can be achieved over the next 5 years by moving to AWS.
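The first and last findings combine into a simple back-of-the-envelope model: if provisioned capacity is only 30% utilized, right-sizing toward a healthier target utilization roughly halves the required spend. The figures and the 60% target below are illustrative assumptions, not the customer's actuals.

```python
def projected_cloud_cost(onprem_annual, utilization=0.30, target_utilization=0.60):
    """Estimate the right-sized annual cloud run rate: capacity actually
    used, re-provisioned at the target utilization level."""
    return onprem_annual * utilization / target_utilization

def five_year_savings(onprem_annual, cloud_annual, years=5):
    """TCO savings fraction over the projection horizon, assuming flat
    annual run rates on both sides."""
    return (onprem_annual - cloud_annual) * years / (onprem_annual * years)
```

With these assumptions, an on-premise run rate of 100 units projects to a cloud run rate of about 50 units, i.e., the roughly 50% five-year TCO saving cited above (before migration and refactoring costs, which this sketch ignores).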
With the MRAP assessment and findings in place, the customer now has greater visibility towards cloud migration and the benefits it would derive from implementing it. With a rapid lift & shift migration strategy, they could now look at better resource utilization, enhanced productivity, and operational efficiency going forward.
The customer offers a broad range of financial products and services to diversified customer segments that include corporations, institutions, and individuals across domestic and global geographies. Financial service providers have long been at the forefront of cloud adoption and the customer has been no exception. Cloud migration has accelerated at a rapid pace across multiple business groups and the customer plans to stay abreast of the growing surge. The idea was to migrate their applications one- by- one to AWS.
For this purpose, a migration readiness assessment of almost 250 applications was conducted, which included stakeholder interviews and tool data analysis. A rapid lift and shift migration was intended to be implemented as quickly as possible.
Powerup’s scope of services was to define and plan a business case for the Migration Readiness Assessment & Planning (MRAP) by gathering data from the existing setup and validating the same in terms of understanding how well equipped the customer is for cloud migration. An MRAP report would then be drafted which would act as a roadmap to the actual migration.
Worxogo, a pioneer in AI-driven sales solutions with services extended across the globe, requires security for their data as well as at the network level for the infrastructure running on Azure.
PUC recommended the following to the client to enhance security:
1. Azure Security Center:
Azure Security Center is a unified infrastructure security management system that strengthens the security posture of data centers and provides advanced threat protection across hybrid workloads in the cloud (whether they are in Azure or not) as well as on-premises.
In the Security Center, we can set our policies to run on management groups, across subscriptions, and even for a whole tenant.
The advanced monitoring capabilities in Security Center also let us track and manage compliance and governance over time. The overall compliance score provides a measure of how compliant your subscriptions are with the policies associated with your workload.
Security Center continuously discovers new resources being deployed across your workloads and assesses whether they are configured according to security best practices. If not, they are flagged, and you get a prioritized list of recommendations for what to fix to protect your machines.

As our client adds new resources to the environment, this feature helps validate those resources and fix security issues based on the recommendations.

Security Center also shows the topology of the workloads, so we can see whether each node is properly configured and how the nodes are connected. This helps block unwanted connections that could make it easier for an attacker to move through your network.

Security Center makes mitigating security alerts one step easier by adding a Secure Score. A Secure Score is associated with each recommendation you receive, to help you understand how important each recommendation is to your overall security posture.
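The Secure Score idea can be illustrated with a weighted calculation: each recommendation carries a weight reflecting its impact, and the score is the weighted fraction that is currently healthy. This is a simplified sketch, not Microsoft's exact formula.

```python
def secure_score(recommendations):
    """Compute an illustrative secure score from a list of
    recommendations, each with a 'weight' (impact) and a 'healthy'
    flag. Returns a 0-100 percentage."""
    total = sum(r["weight"] for r in recommendations)
    if total == 0:
        return 100  # nothing to assess
    earned = sum(r["weight"] for r in recommendations if r["healthy"])
    return round(100 * earned / total)
```

The useful property this captures is prioritization: fixing a heavily weighted recommendation moves the score far more than fixing several minor ones.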
Azure Security Center protects the following:
Protect against threats
Integration with Microsoft Defender Advanced threat protection
Brute force attack
Protect IoT and hybrid cloud workloads
Hence, Azure Security Center speaks to the growing need for an enterprise-grade security management platform that encompasses both cloud and on-site resources with a unified, analytics-rich, actionable interface that helps you take control of the security of your resources on all fronts.
Azure managed disks automatically encrypt your data by default when persisting it to the cloud. Server-side encryption (SSE) protects your data and helps you meet your organizational security and compliance commitments.
Data in Azure managed disks is encrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and is FIPS 140-2 compliant.
Note: Encryption does not impact the performance of managed disks and there is no additional cost for the encryption.
If Azure Security Center is used, it raises an alert if you have VMs that aren’t encrypted. The alert shows High severity, and the recommendation is to encrypt these VMs.
Accessing Azure resources Using Secure VPN Connection
Organizational members access the VM resources daily for deployment and coding purposes, so there is a need for secure communication between the Azure resources and the members’ PCs.
A Point-to-Site (P2S) VPN gateway connection lets you create a secure connection to your virtual network from an individual client computer. A P2S connection is established by starting it from the client computer. Users use the native VPN clients on Windows and Mac devices for P2S. Azure provides a VPN client configuration zip file that contains the settings required by these native clients to connect to Azure.
For Windows devices, the VPN client configuration consists of an installer package that users install on their devices.
For Mac devices, it consists of the mobileconfig file that users install on their devices.
The zip file also provides the values of some of the important settings on the Azure side that you can use to create your profile for these devices. Some of the values include the VPN gateway address, configured tunnel types, routes, and the root certificate for gateway validation.
To add a layer of security, FortiGate firewalls have been configured to monitor incoming/outgoing traffic.
The FortiGate-VM on Microsoft Azure delivers next-generation firewall capabilities for organizations of all sizes, with the flexibility to be deployed as a next-generation firewall and/or VPN gateway. It protects against cyber threats with high performance, security efficacy, and deep visibility.
Google Services used: Compute Engine, Container Build, Google Kubernetes Engine, Container Registry, Cloud SQL, Google Storage Bucket, Cloud Identity & Access Management, Cloud VPN, Cloud DNS, Cloud Load Balancing.
The customer is a technology-based real estate platform that has been built to buy, sell, rent, and find a paying guest or a flatmate without paying any brokerage. They enable owners and tenants to interact with each other directly by using their technologically advanced platform. The plan is to re-architect and migrate their entire infrastructure to Google Cloud Platform (GCP) to reduce costs and increase their portal’s workload efficiency.
Headquartered in Bengaluru, India, the customer is a technology-based real estate search portal that connects property owners and tenants, and buyers and sellers, directly, eliminating the concept of a broker. The organization does not charge any brokerage for the services it provides. It was founded by an IIT alumnus in March 2014 and is a team of 350 people serving over 35 lakh customers today. They have worked endlessly to remove the information asymmetry caused by brokers.
Their goal is to lead India’s real estate industry towards an era of doing convenient housing transactions, brokerage-free. They currently save their customers over 250 crores per year in brokerage. They are proving to be a new and disruptive force in the real estate industry.
The customer has been running their infrastructure on Digital Ocean and after much research and analysis, they have evaluated GCP and its solution to be more suited to their requirements. Powerup is proposing to work with the customer to help them migrate their infrastructure to Google Cloud from Digital Ocean, which in turn would help them in running their workload on cloud in a very efficient and cost-effective manner.
The customer currently runs its infrastructure on Digital Ocean. This setup, however, restricts them from managing their network, containers, and storage connectivity. They have also been unable to use features like auto-scaling, managed Kubernetes, load balancers, and Google Cloud Storage due to the current constricted setup.
Powerup proposed to work with the customer team to help migrate their infrastructure from Digital Ocean to Google Cloud Platform (GCP), which will help them run their workload on the cloud more efficiently and cost-effectively. However, any application-level changes as part of the migration, and the on-premise or Digital Ocean VPN tunnel setup, were out of scope for this project.
Understanding the current Digital Ocean setup, the migration timelines, and the business needs behind them, along with the application and network architectures, were the key pre-migration tasks.
Project Set Up
To establish a hierarchy of ownership, a GCP organization node was set up. Separate folders were created for the production, UAT, and management environments. An Identity and Access Management (IAM) framework was adopted to control access to different GCP services. This was mainly to control user access to critical information within the organization, and it meant a GCP service could securely access data on another service.
The setup involved organizing shared Virtual Private Clouds (VPCs). Multiple VPCs were created to deploy the production, UAT, and management applications. Network configuration included configuring VPC peering between the required VPCs, creating the appropriate CIDR ranges, subnets, and route tables as per the architecture requirements, and establishing VPN tunnels between Digital Ocean/on-premise and GCP.
Cloud Network Address Translation (NAT) enabled Internet access for servers in the private subnet. Hypertext Transfer Protocol Secure (HTTPS) load balancers were created to distribute the traffic between the web and app servers.
Re-architecting and migrating to GKE
The CloudEndure service was used to migrate the applications and databases from Digital Ocean to GCP. CloudEndure agents were installed on the required servers, and ports were opened to connect to GCP. A staging VPC in the respective folders hosted the Linux instances (volumes) to which CloudEndure replicated the data at the block level. Once replication was done, CloudEndure deployed the application and database servers to the required subnets specified in the replication settings.
All Elasticsearch clusters were created and the data imported into the new clusters. Frontend application servers and Elasticsearch clusters were deployed on GKE. The SQL cluster was deployed on GCP in a master-master setup with Percona, and the MySQL database has multiple read replicas. A Redis instance hosts the Redis cache. MongoDB contains all the status data for emails and SMS sent to customers. The Kafka cluster was recreated on GCP, and the Storm application was migrated using CloudEndure. All static content was migrated from S3 to Cloud Storage on GCP, and the Content Delivery Network (CDN) was migrated from AWS CloudFront to GCP Cloud CDN. Jenkins was used to set up continuous integration / continuous deployment (CI/CD) for the application servers. Backups are taken based on recommended best practices and customer requirements.
Monitoring and logging
Stackdriver is used as the monitoring tool for infrastructure/application monitoring, log analytics, etc. Stackdriver also supports features like tracing, debugging, and profiling to monitor the overall performance of the application. Appropriate alarms and Stackdriver alerting policies were configured based on customer requirements. Sensu is used for additional infrastructure monitoring, with InfluxDB as the database to store all monitoring data and Grafana to visualize it and build dashboards.
Cloud Audit logging will be enabled to capture all API activities in the account. VPC flow logs will be enabled to capture all network logs. Centralized Logging and monitoring will be done using the Stackdriver Logging module.
Security & Compliance
IAM users and groups will be created with the least permissible access. MFA will be enabled for account access which will provide an additional layer of security.
All application, database, Elasticsearch (ES), Kafka, and similar servers will be deployed in the private subnet. This ensures there is no direct internet access to critical servers like the database and app servers. Firewall rules will be configured to control traffic to and from the resources attached to the VPC. Firewall rules are applied at the virtual networking level, so they provide effective protection and traffic control regardless of the operating system the instances use. Only the required ports will be opened, and access will be provided only to the required IP addresses. Both data in transit and at rest are encrypted by default in GCP, and all GCP API endpoints are SSL-enabled. VPN tunnels will be enabled between GCP, customer locations, and customer data centers.
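A least-privilege firewall rule of the kind described can be sketched as the JSON body passed to the Compute Engine firewalls API: only the named ports, only the listed source ranges, scoped to tagged instances. The names, network, and ranges below are hypothetical.

```python
def allow_rule(name, network, source_ranges, ports, target_tags):
    """Build a GCP firewall rule body allowing TCP ingress on specific
    ports from specific sources to instances carrying the target tags."""
    return {
        "name": name,
        "network": f"global/networks/{network}",
        "direction": "INGRESS",
        "allowed": [{"IPProtocol": "tcp",
                     "ports": [str(p) for p in ports]}],
        "sourceRanges": source_ranges,
        "targetTags": target_tags,
    }

# Example: app tier reachable on 8080 only from the load balancer subnet.
rule = allow_rule("allow-app-from-lb", "prod-vpc",
                  ["10.10.0.0/24"], [8080], ["app"])
```

Because the rule matches on network tags rather than IPs, instances can be replaced or autoscaled without the firewall configuration changing.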
Google Cloud Platform (GCP) helped the customer sail through a hassle-free digital transformation providing them with a highly efficient and cost-effective solution.
Re-architecting cut down their costs by approximately 40% and helped them gain more scalability and high availability.
Managing the current setup on GCP became more structured. This helped them gain more control from a security standpoint.
A combination of strategies (recreation/lift and shift and re-architecture of the current infrastructure running on Digital Ocean), planned with advanced technologies like managed Kubernetes, App Engine, BigQuery, and managed load balancers, led to a better experience for their customers. Additionally, the increase in web traffic on their site had no adverse impact on end users, and there were zero performance issues.
The customer team could even scale up the infrastructure at any point in time as and when required.
The customer: A leading therapeutics company in immuno-oncology
A leading therapeutics company in immuno-oncology was running more than 2 TB of research data in an on-premise setup and was facing multiple challenges when performing tests and other application validations, due to constraints on scaling and performance validation.
Google Kubernetes Engine (GKE) helped the customer in their digital transformation journey with a highly efficient and cost-effective solution. A complete re-architecture of their current setup, running on standalone on-premise virtual machines, was planned with advanced technologies like managed Kubernetes. Jenkins was implemented for the CI/CD pipeline, ensuring integration of individual jobs, easier code deployment to production, and effortless auditing of logs. Auto-scaling was flawless, and handling large data volumes became much easier.
Number of VMs – 180+
Number of applications migrated – 25+
Approximate size of DB – 2TB
Migration of all the containerized images to managed GKE on Google Cloud helped achieve high availability and scaling.
The customer was able to manage their complete application lifecycle and build lifecycle as a code; it additionally helped to meet required security compliances.
Tools and services used:
Tools used – Istio, Jenkins, MySQL, ClamAV, Elasticsearch & GitHub
Google services used – Compute Engine, Container Build, Google Kubernetes Engine, Container Registry, Cloud SQL, Stackdriver, Cloud Identity & Access Management, Cloud VPN, Cloud DNS, Cloud Load Balancing