Customer: The fastest-growing cinema business in the Middle East
The customer is the fastest-growing cinema business in the Middle East who is intending to migrate their core movie ticket management and booking application on to AWS. Currently, it is a colocation data center that needs to be migrated for better scalability and availability followed by implementation of DevOps optimisation, disaster recovery, log analysis, and managed services.
Case – Migration
Type – RePlatform
Number of VM’s – 50
Number of applications migrated – 5
Approximate size of DB – 250 GB
The customer is the fastest-growing cinema business in the Middle East, which is the leading shopping mall, communities, retail, and leisure pioneer across the Middle East, Africa, and Asia. They are a prominent exhibitor in the middle east with a significant regional presence.
The customer is planning to migrate their core movie ticket management and booking application from their colocation data center (DC) to AWS for better availability and scalability. Migration to AWS was to facilitate the customer to move their production workload as-is without many architectural changes and then focus on DevOps optimisation.
Powerup proposed a 2 phased approach in migrating the customer applications to AWS.
Phase 1 – As-is migration of applications from on-premise set up to AWS cloud along with the migration of databases for better scalability and availability.
Phase 2 – Implementation of DevOps optimisation.
Post this, Powerup’s scope of work for the customer also included Disaster Recovery (DR) implementation, setting up EKK for log analysis and AWS managed services.
Powerup will study the customer’s environment and prepare a blueprint of the overall architecture. They will identify potential servers and database failure points and will accordingly fix the automation of backups.
Powerup to work in coordination with the customer team to,
export application and data from on-premise architecture to AWS using either Amazon Import/Export functionality or over the internet.
Restore data on the cloud and enable database replication between on-premise and Amazon data stores to identify differential data.
Implement the monitoring agents and configure backups.
Conduct load testing if required as well as system and user acceptance accessibility tests to identify and rectify vulnerabilities.
Post-deployment and stabilization, Powerup completed the automation of the infrastructure using AWS Cloud formation and code deployment automation to save operational time and effort.
Post phase 1 as-is migration of the customer’s applications, Powerup’s DevOps team will perform weekly manual and automated audits and share reports with the customer team.
Weekly reports on consolidated uptime of applications, total tickets logged, issue resolution details and actionable plans will be shared with the customer team. Powerup will also run a Vulnerability Assessment & Penetration Testing (VAPT) on cloud coupled with quarterly Disaster recovery (DR) drills for one pre-selected application in the cloud platform to ensure governance is in place.
DevOps is also responsible for seamless continuous integration (CI) typically handled by managing a standard single-source repository, automating the build, track the build changes/progress and finally automating the deployment.
Disaster Recovery (DR)
Powerup to understand customer’s compliance, Recovery Point Objective (RPO) and Recovery Time Objective (RTO) expectations before designing the DR strategy.
Configure staging VPC, subnets and the entire network set up as per current architecture.
Set up network access controls, create NAT gateway and provision firewall for the DR region.
Initiate CloudEndure console, enable replication to AWS staging server, create failover replication to DR site from CloudENdure dashboard to conduct DR drill.
Verify and analyze RTO and RPO requirements.
EKK set up
The AWS EKK stack (Amazon Elasticsearch Service, Amazon Kinesis and Kibana) act as an alternative to ELK, an open-source tool by Amazon to engage in and visualize data logs. Powerup’s scope involved gathering information on environment and providing access to relevant users to create an AWS ElasticSearch service and AWS Kinesis service. The intent was to install and configure the Kinesis agent on the application servers in order to push the data logs and validate the log stream to the Kibana dashboard. This set up would help configure error and failure logs, alerts, anti-virus logs and transition failures. The EKK solution is known to provision for analyzing logs and debugging applications. Overall, it helps in managing a log aggregation system. Know more on EKK implementation here.
The first step is to study the customer’s cloud environment to identify potential failure points and loopholes if any.
Powerup DevOps team will continuously monitor the customer’s cloud infrastructure health by keeping a check on CPU, memory and storage utilization as well as URL uptime and application performance.
OpenVPN will be configured for the secure exchange of data between the customer’s production setup on AWS and the individual cinema locations. The web, application and database servers are implemented in High Availability (HA) mode. Databases are implemented on Amazon EC2 with replication enabled to a standby server. Relational database service (RDS) may be considered if there are no dependencies from the application end.
Security in the cloud is a shared responsibility between the customer, cloud platform provider and Powerup. Powerup will continuously analyze and assist the customer with best practices around application-level security.
The security monitoring scope includes creating an AWS organization account and proxy accounts with multi-factor authorization (MFA) for controlled access on AWS. Powerup to also set up an Identity and Access Management (IAM), security groups as well as network components on customer’s behalf.
The powerup team has helped setting up the VPN tunnel from AWS to different customer theatre locations [31 different locations].
Enable server-side encryption and manage Secure Sockets Layer (SSL) certificates for the website. Monitor logs for security analysis, resource change tracking, and compliance auditing. Powerup DevOps team to track and monitor firewall for the customer’s environment and additionally mitigate distributed denial-of-service (DDoS) attacks on their portals and websites.
Anti-virus tools and intrusion detection/prevention to be set up by Powerup along with data encryption at the server as well as storage level. Powerup DevOps team will continuously monitor the status of automated and manual backups and record the events in a tracker. In case of missed automatic backups, a manual backup will be taken as a corrective step. Alerts to also be configured for all metrics monitored at the cloud infrastructure level and application level.
Migration helped the customer achieve better scalability and availability.
DevOps tool helped automate manual tasks and facilitated seamless continuous delivery while AWS cloud managed services provisioned the customer to reduce operational costs and achieve maximum optimization of workload efficiency.
Customer: The fastest-growing cinema business in the Middle East
The customer is the fastest-growing cinema business in the Middle East wanted to manage the logs from multiple environments by setting up centralized logging and visualization, this was done by implementing the EKK(Amazon Elasticsearch, Amazon Kinesis and Kibana) solution in their AWS environment.
The customer is a cinema arm of a leading shopping mall, retail and leisure pioneer across the Middle East and North Africa. They are the Middle East’s most innovative and customer-focused exhibitor, and the fastest and rapidly growing cinema business in the MENA region.
The customer’s applications generate huge amounts of logs from multiple servers, if any error occurs in the application it is difficult for the development team to get the logs or view the logs in real-time to troubleshoot the issue. They do not have a centralized location to visualize logs and get notified if any errors occur.
In the ticket booking scenario, by analyzing the logs that are generated by the application, an organization can enable valuable features, such as notifying the developers that error occurred in the application server while customers are booking the ticket. If the application logs can be analyzed and monitored in real-time, developers can be notified immediately to investigate and fix the issues.
Powerup built a log analytics solution on AWS using ElasticSearch as the real-time analytics engine. AWS Kinesis firehose pushes the data to ElasticSearch. In some scenarios, the Customer wanted to transform or enhance data streaming before it is delivered to ElasticSearch. Since all the application logs are in an unstructured format in the server, the customer wanted to filter the unstructured data and transform it into JSON before delivering it to Amazon Elasticsearch Service. Logs from Web, App and DB were pushed to Elasticsearch for all the six applications.
Amazon Kinesis Agent
The Amazon Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Streams and Kinesis Firehose.
AWS Kinesis Firehose Agent – daemon installed on each EC2 instance that pipes logs to Amazon Kinesis Firehose.
The agent continuously monitors a set of files and sends new data to your delivery stream. It handles file rotation, checkpointing, and retry upon failures. It delivers all of your data in a reliable, timely, and simple manner.
Amazon Kinesis Firehose
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture, transform, and load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, enabling near real-time analytics with existing business intelligence tools and dashboards that you’re already using today.
Kinesis Data Firehose Stream – endpoint that accepts the incoming log data and forwards to ElasticSearch
Kinesis Data Firehose can invoke your Lambda function to transform incoming source data and deliver the transformed data to destinations. When you enable Kinesis Data Firehose data transformation, Kinesis Data Firehose buffers incoming data up to 3 MB by default. Kinesis Data Firehose then invokes the specified Lambda function asynchronously with each buffered batch using the AWS Lambda synchronous invocation model. The transformed data is sent from Lambda to Kinesis Data Firehose. Kinesis Data Firehose then sends it to the destination when the specified destination buffering size or buffering interval is reached, whichever happens first.
Store, analyze, and correlate application and infrastructure log data to find and fix issues faster and improve application performance. You can receive automated alerts if your application is underperforming, enabling you to proactively address any issues.
Provide a fast, personalized search experience for your applications, websites, and data lake catalogs, allowing users to quickly find relevant data.
Collect logs and metrics from your servers, routers, switches, and virtualized machines to get comprehensive visibility into your infrastructure, reducing mean time to detect (MTTD) and resolve (MTTR) issues and lowering system downtime.
Kibana is an open-source data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support. Also, it provides tight integration with Elasticsearch, a popular analytics and search engine, which makes Kibana the default choice for visualizing data stored in Elasticsearch.
Using Kibana’s pre-built aggregations and filters, you can run a variety of analytics like histograms, top-N queries, and trends with just a few clicks.
You can easily set up dashboards and reports and share them with others. All you need is a browser to view and explore the data.
Kibana comes with powerful geospatial capabilities so you can seamlessly layer in geographical information on top of your data and visualize results on maps.
Ingesting data to ElasticSearch using Amazon Kinesis Firehose.
Kinesis Data Firehose is part of the Kinesis streaming data platform, along with Kinesis Data Streams, Kinesis Video Streams, and Amazon Kinesis Data Analytics. With Kinesis Data Firehose, you don’t need to write applications or manage resources. You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified. You can also configure Kinesis Data Firehose to transform your data before delivering it.
The data of interest that your data producer sends to a Kinesis Data Firehose delivery stream. A record can be as large as 1000 KB.
Producers send records to Kinesis Data Firehose delivery streams. For example, a web server that sends log data to a delivery stream is a data producer. You can also configure your Kinesis Data Firehose delivery stream to automatically read data from an existing Kinesis data stream, and load it into destinations.
Writing Logs to Kinesis Data Firehose Using Kinesis Agent
Amazon Kinesis Agent is a standalone Java software application that offers an easy way to collect and send data to Kinesis Data Firehose. The agent continuously monitors a set of files and sends new data to your Kinesis Data Firehose delivery stream.
The agent handles file rotation, checkpointing, and retry upon failures. It delivers all of your data in a reliable, timely, and simple manner. It also emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process.
The Kinesis Agent has been installed on all the production server environments such as web servers, log servers, and application servers. After installing the agent, we need to configure it by specifying the log files to monitor and the delivery stream for the data. After the agent is configured, it durably collects data from those log files and reliably sends it to the delivery stream.
Since the data in the servers are unstructured and the customer wanted to send the specific format of data to ElasticSearch and visualize it on Kibana. So we configured an agent to preprocess the data and deliver the preprocessed data to AWS Kinesis Firehose. Preprocessed configuration used in the Kinesis Agent
Since the data in the logs are unstructured and needed to filter some specific records from the data. So we used the match pattern to send the record to filter the data and send it to Kinesis Firehose.
The agent has configured in a way to capture the unstructured data using regular expression and send it to the AWS Kinesis Firehose.
An Example How we filtered the data and sent it to the kinesis firehose.
The record in the server has been converted to JSON format. The Match pattern only captures the data in the data according to regular expression and sends the data to AWS Kinesis Firehose. AWS Kinesis Firehose sends the data to Elasticsearch and can be visualized on the Kibana.
Powerup Team successfully implemented the real-time centralized log analytics solution using AWS kinesis firehose and ElasticSearch.
Kinesis agent was used to filtering the applications and kinesis firehose streams the logs to Elasticsearch.
Separate indexes were created for all 6 applications in Elasticsearch based on access log and error log.
A Total of 20 dashboards were created in Kibana based on error types, for example, 4xx error, 5xx error, cron failure, auth failure.
Created Alerts were sent to the developers using AWS SNS. when the configured thresholds, so that developers can take immediate actions on the errors generated on the application and server.
Developer log analysis time has greatly decreased from a couple of hours to a few minutes.
The EKK setup implemented for the customer is a total log-analysis platform for search, analyses and visualization of log-generated data from different machines and perform centralized logging to help identify any server and application-related issues across multiple servers in the customer environment and correlate the logs in a particular time frame.
The data analysis and visualization of EKK setup have benefited the management and respective stakeholders to view the business reports from various application streams which led to easy business decision making.
The customer an integrated resort on the island in Singapore. They offer some world-class attractions one of which is the Battlestar Galactica, the most enticing duel rollercoaster ride at the resort. They decided to cater to preventive maintenance of the wheels of this ride to ensure top-class safety. They planned to adopt Machine Learning (ML) based solution via Google cloud platform (GCP).
The Customer’s Battlestar Galactica ride is financially quite demanding and requires high maintenance.
The wheel detection process is time-consuming and a high maintenance manual job.
Decision making on the good versus the bad wheel is based on human judgement and expert’s experience.
The ultimate goal was to remove human intervention and automate the decision making on the identification of a bad wheel using machine learning. The machine learning model needed to be trained on currently available data and ingest real-time data over a period of time to help identify patterns of range and intensity values of wheels. This would in turn help in identifying the wheel as good or bad at the end of every run.
Ordering of .dat files generated by SICK cameras to be maintained in a single date-time format for appropriate Radio-frequency identification (RFID) wheel mapping. Bad wheel data should be stored in the same format as a good wheel (.dat files) in order to train the classifier. The dashboard to contain the trend of intensity and height values. Single folder to be maintained for Cam_01 and another folder for Cam_02, folder name or location should not change.
Data ingestion and storage
An image capturing software tool named Ranger Studio was used to absorb the complete information on wheels. The Ranger Studio onsite machine generates .dat files for wheels post every run and stores in a local machine. An upload service picks these .dat files from the storage location at pre-defined intervals and runs C# code on it to provide CSV output with range and intensity values.
CSV files are pushed to Google Cloud Services (GCS) using Google Pub/Sub real-time messaging service. The publisher is used to publish files from the local machine using two separate python scripts for Cam01 and Cam02. The subscriber is then used to subscribe to the published files for Cam01 and Cam02.
Powerup is responsible to ingest the data into cloud storage or cloud SQL based on the defined format. Processing of data would include the timestamp and wheel run count. There is a pub tracker and a sub tracker maintained to track the files for both cameras so that the subscribed files can be stored on GCS for both the cameras separately. After CSV data is processed, it is removed from the local machine via a custom script to avoid memory issues.
Data modelling Cloud SQL
Once data is processed, Powerup to design the data model in cloud SQL where all the data points will be stored in relational format.
The CSV files of individual wheels are then used to train the classifier model. The classifier model is built with an application programming interface named Keras. The trained classifier helps generate a prediction model (.pkl file) to identify good and bad wheels. The prediction model resides on a GCP VM. The generated CSV files are passed through the prediction model and are classified as good or bad based on an accuracy value.
Big Query and ML Model
Once the prediction for a wheel is done, the predicted accuracy score, timestamp and wheel information is stored into the Big Query tables. The average wheel accuracy for wheels is then displayed on Google Data Studio.
Powerup to ensure optimization of data performance via tuning and build the ML model. This would enable the customer to obtain large volumes of height and intensity data, post which, they score the ML model with new data.
Current accuracy threshold for SMS trigger is set at 70. Accuracy of prediction is set to improve over a period 6 months when the model has enough bad wheel data reported for training the ML classifier model. SMS will be triggered if the accuracy value is below 70.
SMS will also be triggered if a file is not received from the local machine to Google Cloud Storage via Google Pub/Sub. The reason for file not being received needs to be checked by the client’s SICK team as it may be due to multiple reasons like source file not generated due to camera malfunction, system shutdown or maintenance and so on. Powerup team to be informed about the same as the restart of instances may be required in such cases. Twilio is the service used for SMS whereas SendGrid is used for email notifications.
Security and Deployment
Powerup to build a secure environment for all third party integrations. Deploy User Acceptance Test (UAT) environments, conduct regression tests and provide
Go Live Support to off-site applications. The number of servers and services supported with the production was 10 where support included server management in terms of security, network, DevOps, backup DR and audit. Support also included adjusting ML models to improvise training.
Since the request payload size was higher, Google ML / Online Predictor could not be used. A custom prediction model was built with Keras to overcome this.
Google Cloud Platform.
Cloud Storage, Bog Query, Data Studio, Compute Engine.
Powerup has successfully been able to train the classifier model with a limited set of good and bad wheel real-time data. The accuracy of the model is expected to improve over time. With current data, the accuracy of the model stands at 60% ensuring cost-effectiveness and world-class safety.
The customer is a leading US-based medical equipment company catering mainly to cloud-connected medical devices that transform care for people with sleep apnea, COPD and other chronic diseases. They are looking at integrating their MyApp application’s data to MosaIQ Data Lake platform on AWS cloud. MyApp is a self-monitoring sleep therapy progress application used extensively by medical representatives and caregivers.
The customer is one of the top medical equipment companies based in San Diego, California. They primarily provide cloud-connectable medical devices for the treatment of sleep apnea, chronic obstructive pulmonary disease (COPD) and other respiratory conditions. It employs more than 7,500 employees worldwide with a presence in more than 120 countries globally that have manufacturing facilities in Australia, France, Singapore and the United States.
MyApp is the customer’s patient self-monitoring application that helps track patient’s sleep therapy progress both online as well as on smartphones. MyApp facilitates tailored coaching and handy tips to make therapy more comfortable. The Customer wanted to,
To integrate MyApp application data to MosaIQ Data Lake platform on AWS.
Reuse and replicate data flow of AirView, inclusive of policy, pseudo rules, de-identification, Protected Health Information (PHI) and non-PHI.
Build code for data staging, data transformations for regulatory adherence and storage on AWS Simple Storage Service (S3).
Powerup to analyze and define the scope of integration. Obtain complete access to AWS development, system integration test and production setups and create AWS services catering to Virtual Private Network (VPC)s, subnets, route tables and Internet gateways. Define fixed and incremental S3 buckets for PHI as well as non-PHI accounts.
Ensure that a detailed definition of MyApp S3 policies including source connections and scheduling is made available before coding in the development environment. Also, freeze all policies and pseudo rules for PHI and non-PHI data encryption until coding completion and migration to test environment.
Implement Data Migration Service (DMS) to migrate data from on-prem to AWS cloud storage S3. Data with all the files to be pushed inside a single folder per table in the S3 bucket via lambda functions. CDC to be implemented for incremental data transfer to S3 event which in turn will trigger and push the requests to Amazon Simple Queue Service (SQS).
Leverage Fargate containers to run scripts in order to check data against the IDs. Run Electronic Medical Records (EMR) cluster by applying masking logic to this data which is sent for further analytics. Identify and save the same in S3 buckets. The next step is to create a test strategy for unit and integration tests.
Powerup DevOps to configure Complement Fixation Test (CFT) and implement continuous integration and continuous deployment (CI/CD) process for MyApp migration. Create integration test scripts, test CI/CD process before the actual system integration migration (SIT), prepare migration to development and UAT environments and devise automation.
The next task is to migrate to SIT through Ci/CD to validate all the resources and execute full load and schedule trigger for CDC load before moving to production deployment. Repeat the process in the production environment and perform UAT.
Post the integration, Powerup took up the responsibility of architectural assessment and went ahead with the Well-Architected Review (WAR) framework. WAR is an architectural assessment based on AWS framework that is built on five pillars – operational efficiency, reliability, security, performance efficiency and cost optimization.
Powerup identified the workload to be reviewed and once relevant data were identified, reviews were arranged with the stakeholders at the company. Review could be conducted onsite or remotely. A report aligning with AWS best practices, categorized as critical, needs improvement or meets best practices were generated for the selected workload. The report highlights the priority with which remediation should be carried out.
MyApp application data has been integrated to MosaIQ on AWS cloud successfully. This platform can now provide capabilities to wider business team communities as MosaIQ is a data lake platform built on top of AWS stack and stores structure and unstructured data in raw format. It assists in the rapid discovery of actionable insights to improve patient care and business outcomes while maintaining security and regulatory compliance.
MosaIQ platform allows analytics, engineers, and data scientists to respond more efficiently and provide timely information to support better business decisions. This is mainly because data segregation is more organized and bifurcated for PHI and non-PHI data.
Reusable design from MyApp integration can be utilized for similar use cases across the company. A significant improvement in performance was noticed due to features like scalability and reduction of in-memory processing.
The customer is an international clinical-stage biopharmaceutical company focusing on cellular immunotherapy treatments for cancer is looking at adopting cloud services for the very first time. They plan to structure their database on Google cloud platform. The intention is to enhance performance and have efficient research outputs from their applications especially since they handle large volumes of data. They were also looking at the ability to scale at any point of time during peak loads along with complete automation of continuous integration and continuous deployment (CI/CD) process for easier deployments and better auditing, monitoring and log management.
The customer is a clinical-stage biopharmaceutical organization with the scientific vision of revolutionizing the treatment of cancer. They specialize in the research, clinical development and commercialization of cancer immunotherapy treatments. The combination of technologies from its academic, clinical and commercial research partners have enabled the company to create a fully integrated approach to the treatment of cancer with immunotherapy. They plan to work with Powerup to use Google Cloud Platform (GCP) as its cloud platform for their Cancer Research program.
The customer plans to use Google Cloud Platform (GCP) as its cloud platform for their Cancer Research program. Data scientists will be using a Secure File Transfer Protocol (SFTP) server to upload data on an average of one to two times a month with an estimated data volume of 2-6 TB per month.
The data transferred to GCP has to undergo a two-step cleansing process before uploading it on a database. The first step is to do a checksum to match the data schema against the sample database. The second step is transcoding and transformation of data after which the data is stored on a raw database.
Greenfield setup on GCP
Understanding customer needs while also understanding the current python models and workflows to be created were the first steps in initiating this project. Post these preliminary studies and sign-off, a detailed plan and solution architecture document formed a part of the greenfield project deliverables.
The set up included shared services, logging UAT and production accounts. The Cloud Deployment Manager (CDM) was configured to manage their servers, networks, infrastructure and web applications. Cloud Identity and Access Management (IAM) roles were created to access different GCP services as per customer specification, which helped in securely accessing another service.
On-premise connectivity is established via VPN tunnels.
The data scientists team have built nearly 50+ python/R models that help in the data processing. All the models are stored in GitHub currently. Python model will meet performance expectations when deployed and CI/CD pipelines to be created for 48 python models.
Once the data arrives on the database, the customer wants the python code to process the data and store the results on an intermediate database.
Multiple folders were created to deploy production, UAT and management applications. Cloud NAT was set up to enable internet access, Virtual Private Cloud (VPC) peering done for inter-connectivity of required VPCs and SFTP server was deployed on Google Compute Engine.
Once data gets uploaded on the raw GCS, checksum function will be triggered to initiate data cleansing. In the first phase, the data schema will be verified against a sample database after which the data will be pushed to transcoding and transformation jobs. Processed data will be stored to GCS.
All the python/R models will be deployed as a docker image on a Kubernetes cluster that is managed by Google ensuring that GCP is taking care of high availability and scaling.
The customer will have multiple workflows created to process data that in turn would be able to define all the workflows for python model executions.
The customer team will view the current data through a web application.
The processed data also has to be synced back to the on-premise server. An opensource antivirus tool is used to scan and verify data before migrating to Google Cloud Storage (GCS).
Monitoring and Logging
Monitoring tools such as stackdriver for infrastructure and application monitoring as well as log analytics was used as it supported features like tracing, debugging and profiling to monitor the overall performance of the application.
Additional tools such as Sensu to monitor infrastructure, Cloud Audit logging that checks Application Program Interface (API) activities, VPC flow logs to capture network logs and FluxDB as well as Grafana to store data on the database and visualize and create dashboards respectively were utilised.
Stackdriver logging module ensures centralized logging and monitoring of the entire system.
Security and Compliance
IAM with least permissible access and Multi-Factor Authentication (MFA) be enabled as an additional layer of security for account access. The databases won’t have direct access to critical servers like database and app servers. Firewall rules will be configured at the virtual networking level for effective protection and traffic control regardless of the operating system used. Only the required ports will be opened to give access to the necessary IP addresses.
Both data in transit and at rest are by default encrypted in GCP along with provisions for static code analysis and container image-level scanning.
Setup CI/CD pipeline using Jenkins which is an open-source tool that facilitates modern DevOps environment. It bridges the gap between development and operations by automating building, testing and deployment of applications.
After the successful deployment of code, code integration and log auditing got simpler. The customer was able to handle large blocks of data efficiently and auto-scaling at any point of time during new product launches and marketing events became effortless. This improved their performance as well.
The customer was also able to scale up without worrying about storage and compute requirements. They could move into an Opex model on the cloud by paying as per usage.
Moving to GCP enabled the customer to save 20% of their total costs as they could adopt various pricing models and intelligent data tiering.
The customer is UAE’s aviation corporation catering over 70 million passengers as of date. Passenger service system (PSS), their ticket booking application was a legacy system that they intended to migrate to a cloud environment while ensuring they manage to leverage administered services of cloud by conducting a Migration Readiness Assessment & Planning (MRAP).
Passenger Service System (PSS) was the existing ticket booking application for the customer. The objective was to understand this legacy system and then recommend how it can be migrated to AWS while leveraging the cloud-native capabilities via an MRAP assessment. The focus would be application modernization rather than a lift & shift migration to the cloud. The customer team intends to leverage managed services of cloud and go serverless, containers, open source etc. wherever possible. The customer team also wants to move away from the commercial Oracle database to a more open-source AWS Aurora PostgreSQL database due to the high licensing costs imposed by Oracle.
MRAP is critical to any organization that plans to adapt to the cloud as this tool-based assessment checks their application’s ability to cloud. Powerup was approached to perform MRAP on their existing set up to propose a migration plan as well as a roadmap, post its analysis.
The customer’s MRAP Process
To begin with, the RISC Networks RN150 virtual appliance, an application discovery tool that poses as an optional deployment architecture was configured and installed on the customer’s existing PSS Equinix data centre (DC) to collect data and create a detailed tool-based assessment to understand the existing set up ‘s readiness to migration.
Application stacks were built for the applications in scope and assessments as well as group interviews were conducted with all stakeholders. Data gathered from stakeholders were cross-verified with the information provided by the customer’s IT and application team to bridge gaps if any. Powerup team would then work on creating a proposed migration plan and a roadmap.
A comprehensive and detailed MRAP report included the following information:
Existing overall architecture
The existing PSS system was bought from a vendor called Radixx International, which provided three major services:
Availability service, an essential core service mainly used by online travel agencies (OTAs), end-users and global distribution system (GDS) to check the availability of their customer’s flights. It’s base system contained modules like Connect Point CP (core), payments, the enterprise application (Citrix app) all written in .NET and the enterprise application for operation and administration written in VB6.
Reservation service was used in booking passengers’ tickets where data was stored in two sessions, Couchbase and the Oracle database. The webpage traffic was 1000:1 when compared to availability service.
DCS System (Check-in & Departure Control Systems) is another core system of any airline, which assists in passenger check-in, baggage check-in and alerting the required officials. It is a desktop application used by airport officials to manage passengers from one location to another with the availability of an online check-in module as well.
Existing Database: Oracle is the current core database that stores all critical information consisting of 4 nodes – 2 Read-Write nodes in RAC1 & another 2 (read-only nodes) in RAC2. All availability checks are directed to the read-only Oracle nodes. The Oracle database nodes are heavily utilized roughly at 60-70% on an average with currently 14 schemas within the Oracle database accessed by the various modules. Oracle Advanced Queuing is used is some cases to push the data to the Oracle database.
Recommended AWS Landing zone structure
The purpose of AWS Landing Zone is to set up a secure, scalable, automated multi-account AWS environment derived from AWS best practices while implementing an initial security baseline through the creation of core accounts and resources.
The following Landing Zone Account structure was recommended for the customer:
AWS Organizations Account:
Primarily used to manage configuration and access to AWS Landing Zone managed accounts, the AWS organizations account provides the ability to create and financially manage member accounts.
Shared Services Account:
It is a reference for creating infrastructure shared services. In the customer’s case, Shared Services Account will have 2 VPCs – one for management applications like AD, Jenkins, Monitoring Server, Bastion etc. and other Shared services like NAT Gateway & Firewall. Palo Alto Firewall will be deployed in the shared services VPC across 2 Availability Zones (AZ)s and load balanced using AWS Application Load Balancer.
AWS SSM will be configured in this account for patch management of all the servers. AWS Pinpoint will be configured in this account to send notifications to customer – email, SMS and push notifications.
Centralized Logging Account:
The log archive account contains a central Amazon S3 bucket for storing copies of all logs like CloudTrail, Config, CloudWatch logs, ALB Access logs, VPC flow logs, Application Logs etc. The logging account will also host the Elasticsearch cluster, which can be used to create custom reports as per customer needs, and Kibana will be used to visualize those reports. All logs will be pushed to the current Splunk solution used by the customer for further analysis.
The Security account creates auditor (read-only) and administrator (full-access) cross-account roles from a security account to all AWS Landing Zone managed accounts. The organization’s security and compliance team can audit or perform emergency security operations with this setup and this account is also designated as the master Amazon GuardDuty account. Security Hub will be configured in this account to get a centralized view of security findings across all the AWS accounts and AWS KMS will be configured to encrypt sensitive data on S3, EBS volumes & RDS across all the accounts. Separate KMS keys will be configured for each account and each of the above-mentioned services as a best practice.
Powerup recommended Trend Micro as the preferred anti-malware solution and the management server can be deployed in the security account.
This account will be used to deploy the production PSS application and the supporting modules. High availability (HA) and DR will be considered to all deployments in this account. Auto-scaling will be enabled wherever possible.
UAT Account – Optimized Lift & Shift:
This account will be used to deploy the UAT version of the PSS application. HA and scalability are not a priority in this account. It is recommended to shut down the servers during off-hours to save cost.
Based on the understanding of the customer’s business a Hot Standby DR was recommended where a scaled-down version of the production setup will be always running and will be quickly scaled up in the event of a disaster.
UAT Account – Cloud-Native:
The account is where the customer’s developers will test all the architectures in scope. Once the team has made the required application changes, they will use this account to test the application on the cloud-native services like Lambda, EKS, Fargate, Cognito, DynamoDB etc.
Application Module – Global Distribution Systems (GDS)
A global distribution system (GDS) is one of the 15 modules of the PSS application. It is a computerized network system that enables transactions between travel industry service providers, mainly airlines, hotels, car rental companies, and travel agencies by using real-time inventory (for e.g., number of hotel rooms available, number of flight seats available, or number of cars available) to service providers.
The customer gets bookings from various GDS systems like Amadeus, Sabre, Travelport etc.
ARINC is the provider, which connects the client with various GDS systems.
The request comes from GDS systems and is pushed into the IBM MQ cluster of ARINC where it’s further pushed to the customer IBM MQ.
The GMP application then polls the IBM MQ queue and sends the requests to the PSS core, which in turn reads/writes to the Oracle DB.
GNP application talks with the Order Middleware, which then talks with the PSS systems to book, cancel, edit/change tickets etc.
Pricing is provided by the Offer Middleware.
Topology Diagram from RISC tool showing interdependency of various applications and modules:
Any changes in the GDS architecture can break the interaction between applications and modules or cause a discrepancy in the system that might lead to a compromise in data security. In order to protect the system from becoming vulnerable, Powerup recommended migrating the architecture as is while leveraging the cloud capabilities.
Proposed Migration Plan
IBM MQ cluster will be setup on EC2, and auto-scaling will be enabled to maintain the required number of nodes thus ensuring availability of EC2 instances at all times. IBM MQ will be deployed in a private subnet.
Amazon Elastic File System (Amazon EFS) will be automatically mounted on the IBM MQ server instance for distributed storage, to ensure high availability of the queue manager service and the message data. If the IBM MQ server fails in one availability zone, a new server is created in the second availability zone and connected to the existing data, so that no persistent messages are lost.
Application Load Balancer will be used to automatically distribute connections to the active IBM MQ server. GMP Application and PNL & ADL application will be deployed on EC2 across 2 AZs for high availability. GMP will be deployed in an auto-scaling group to scale based on the queue length in the IBM MQ server and consume and process the messages as soon as possible whereas PNL & ADL to scale out in case of high traffic.
APIS Inbound Application, AVS application, PSF & PR application and the Matip application will all be deployed on EC2 across 2 AZs for high availability in an auto-scaling group to scale out in case of high traffic.
GMP and GMP code sharing applications will be deployed as Lambda functions. The lambda function will run when a new message comes to the IBM MQ.
PNL & ADL application will be deployed as a Lambda function and the function will run when there is a change in the PNR number in which case a message must be sent to the airport.
AVS application will be deployed as Lambda functions where it will run when a message will be sent to the external systems.
Matip application will be deployed as a Lambda function and will run when a message will be sent using the MATIP protocol.
PFS & PR application will be deployed as Lambda functions. The lambda function will run when a message will be sent to the airport for booking.
APIS Inbound application will be deployed as a Lambda function and it will run when an APIS message will be sent to the GDS systems.
For all the above, required compute resources will be assigned as per the requirement. Lambda function will scale based on the load.
Application modifications recommended
All the application components like GMP, AVS, PNL & ADL, PFS & PR, Matip, etc are currently in .NET. which have to be moved into .NET Core to be run as Lambda functions. The applications are recommended to be broken down into microservices.
Oracle to Aurora Database Migration
AWS schema conversion tool (SCT) is run on the source database, which will generate a schema conversion report that will help understand interdependencies of existing schemas, and how they can be migrated on to Aurora PostgreSQL. The report will contain database objects some that can be directly converted by the SCT tools and the rest, which would need manual intervention. For Oracle functionalities that are not supported in Aurora PostgreSQL, the application team must write custom code to migrate those. Once all the schemas are migrated, AWS Database Migration Service will be used to migrate the entire data set from Oracle to Aurora.
Oracle to Aurora-PostgreSQL Roadmap
Lift & shift:
The current Oracle database will be moved to AWS as-is without any changes in order to kick-start the migration. The Oracle database can run on AWS RDS service or EC2 instances. One RDS node will be the master database in read/write mode. The master instance is the only instance to which the application can write to. There will be 3 additional read-replicas spread across 2 AZs of AWS to handle the load that is coming in for read requests. In case the master node goes down one of the read replicas is promoted as the master node.
Migrate the Oracle schemas to Aurora:
Once the Oracle database is fully migrated to AWS, the next step is to gradually migrate the schemas one by one to Aurora – PostgreSQL. The first step is to map all the 14 schemas with each application module of the customer. The Schemas will be migrated based on this mapping, wherever there are non-dependent schemas on other modules, it will be identified and migrated first.
The application will be modified to work with the new Aurora schema. Any functionality, which is not supported by Aurora, will be moved to application logic.
DB links can be established from Oracle to Aurora, however, it cannot be established from Aurora to Oracle database.
Any new application development that is in progress should be compatible and aligned with the Aurora schema.
Finally, all the 14 schemas will be migrated onto Aurora and the data will be migrated using DMS service. The entire process is expected to take up to 1 year. There will be 4 Aurora nodes – One Master Write & 3 Read Replicas spread across 2 AZs of AWS for high availability.
The assessment posed as a roadmap to move away from Oracle to PostgreSQL saving up to 30% in Oracle License cost. It also provided a way forward for each application towards cloud-native.
Current infrastructure provisioned was utilized at around 40-50% and a significant reduction in the overall total cost of ownership (TCO) was identified if they went ahead with cloud migration. Less administration by using AWS managed services also proved to be promising, facilitating smooth and optimized functioning of the system while requiring minimum administration.
With the MRAP assessment and findings in place, the customer now has greater visibility towards cloud migration and the benefits it would derive from implementing it.
Customer: One of India’s top media solutions company
The powerup cloud helped the customer completely transform their business environment through complete automation. Our design architecture and solution engineering improved business process efficiency, without any manual intervention, resulting in turnaround time is decreased by more than 90%. Now most of their applications running on the cloud, the customer has become one of most customer-friendly media companies in India.
The customer’s team wants to concentrate on building applications rather than spending time on the infrastructure setup and dependencies packages installed and maintained on the servers. The proposing solution needs to be a quick & scalable one so that business performance will be improved significantly.
Focusing on workload and transaction volume, we designed a customer-friendly, network optimized, a highly agile and scalable cloud platform that enabled cost optimization, effective management, and easy deployment. This helped reducing interventions and cost overheads.
We used AWS native tool CloudFormation to deploy the infrastructure as code, the ideology behind this is deployed infra as well as we can use it for Disaster Recovery.
CloudFormation template implemented in Stage & prod environment based on the best practice of AWS by subjecting the severs to reside in private subnets and internet routing with the help of Nat-gateway.
To remove the IP dependencies for a better way to manage failures, the servers and the websites are pointed to the Application load balancers where a single load balancer we managed to have multiple target groups in the view of cost optimization.
Base Packages Dependency:
This solution must remove the dependency of the developer to install the packages on the server to support the application.
The packages need to be installed on the infra setup, so the developer can deploy the code using the code deployer services rather than spending time to install dependencies.
Hence, we proposed & implemented the solution via Ansible, With the help of ansible we can able to manage multiple servers under a single roof. We have prepared a shell script that will install the packages on the server.
The architecture majorly differentiated in the means of Backend & frontend Module.
Backend Module where the java application will be running, hence a shell script will run the backend servers which will install the Java-8 versions and creates a Home path based on standard path, so home path execution of application will be always satisfied by this condition.
Frontend Module which more likely of Nginx combined with node.js which achieved by the same methodology.
The application’s logs and other backup artifacts are managed in the secondary EBS volume which the mount point to the fstab entries are also automated.
The Main part of deployment achieved by the code-deployer hence the servers should be installed with code-deployment agents during the server setup which is also done through ansible.
User access is another solution, where the access to the servers restricted for some people in the development team and the access will be provided to the server with the approval of their leads.
We had, dev, qa, psr, stage and prod environments we clubbed all the servers in the ansible inventory and generated a public key and private key and passed them on the standard part. When the user adds scripts runs, ansible will copy the public keys and create a user on the destination server by pasting the public key in the authorized file.
This method will be hidden the pub key from the end-user when the user asked to removed using ansible we will delete those users from the server.
Monitoring with sensu:
Infra team is responsible for monitoring the infra, hence we created a shell script that will install the sensu on the destination server for monitoring using ansible.
By implementing these solutions, the development was less worried about the packages dependencies which allowed them to concentrate on their app development and fixing bugs and user access got streamlined.
Bastion with MFA settings:
The servers in the environment can get accessed only by the bastion server which acts as the entry point.
This bastion server was set up with the MFA mechanism, where each user must access the server with MFA authentication as a security best practice.
In one of the legacy account, SSL offloaded at the server level with a lot of Vhosts. Hence renewing certificates will take time to reduce the time we used SSL with ansible to rotate the certificates in a quick time with less human efforts.
Automation in Pipeline :
Base packages installation on bootup which reduces one step of installation.
User access with automatic expiry condition.
In addition to the on-going consulting engagement with the customer for enhancement, and designing a system to meet the client’s need, Powerupcloud also faced some challenges. The Infra has to be created in quick time with 13 servers under the application load balancers, which includes Networking, compute and load balancers with target groups. The Instances were required to install with certain dependencies to run the application smoothly. As a result, the development process became more complicated.
The solution was also expected to meet the very high level of security, continuous monitoring, Non-stop 24X7 operation, High availability, agility, scalability, less turnaround time, and high performance, which was a considerable challenge given the high business criticality of the application.
To overcome these challenges, we established a predictive performance model for early problem detection and prevention. Also, started a dedicated performance analysis team with active participation from various client groups.
All the changes in configuration are smoothly and rapidly executed from the viewpoint of minimizing load balance and outage time.
Business Result & outcome
With the move to automation, the customer’s turn-around time decreased by 30%. This new system also helped them reduced capital investments as it is completely automated. The solution was designed in-keeping with our approach of security, scalability, agility, and reusability.
Successful implementation of the CloudFormation template.
The customer is one of the largest Indian entertainment companies catering to acquiring, co-producing, and distributing Indian cinema across the globe. They believe that media and OTT platforms can derive maximum benefit in terms of multi-tenant media management solutions provided by the cloud. Therefore, they are looking at migrating their existing servers, databases, applications, and content management system on to cloud for better scalability, maintenance of large volumes of data, modernization, and cost-effectiveness. The customer intends to also look at alternative migration strategies like re-structuring and refactoring if need be.
The customer is a global Indian entertainment company that acquires, co-produces, and distributes Indian films across all available formats such as cinema, television and digital new media. The customer became the first Indian media company to list on the New York Stock Exchange. It has experience of over three decades in establishing a global platform for Indian cinema. The company has an extensive and growing movie library comprising over 3,000 films, which include Hindi, Tamil, and other regional language films for home entertainment distribution.
The company also owns the rapidly growing Over The Top (OTT) platform. With over 100 million registered users and 7.9 million paying subscribers, the customer is one of India’s leading OTT platforms with the biggest catalogue of movies and music across several languages.
Problem statement / Objective
The online video market has brought a paradigm shift in the way technology is being used to enhance the customer journey and user experience. Media companies have huge storage and serving needs as well as the requirement for high availability via disaster recovery plans so that a 24x7x365 continuous content serving is available for users. Cloud could help media and OTT platforms address some pressing business challenges. Media and OTT companies are under constant pressure to continuously upload more content cost-effectively. At the same time, they have to deal with changing patterns in media consumption and the ways in which it is delivered to the audience.
The customer was keen on migrating their flagship OTT platform from a key public cloud platform to Microsoft Azure. Some of the key requirements were improved maintainability, scalability, and modernization of technology platforms. The overall migration involved re-platforming and migrating multiple key components such as the content management system (CMS), the Application Program Interfaces (APIs), and the data layer.
Powerup worked closely with the client’s engineering teams and with the OEM partner (Microsoft) to re-architect and re-platform the CMS component by leveraging the right set of PaaS services. The deployment and management methodology changed to containers (Docker) and Kubernetes.
Key learnings from the project are listed below:
Creating a bridge between the old database (in MySql) and a new database (in Postgres).
Migration of a massive volume of content from the source cloud platform to Microsoft Azure.
Rewriting the complete CMS app using a modern technology stack (using Python/Django) while incorporating functionality enhancements.
Setting up and maintaining the DevOps pipeline on Azure using open source components.
Modernized infrastructure powered by Azure, provided improved scalability and stability. The customer was able to minimize infrastructure maintenance using PAAS services. Modular design set-up enabled by migration empowered the developers with the ability to prototype new features faster.
The customer’s environment on AWS was facing scalability challenges as it was maintained across a heterogeneous set of software solutions with many different types of programming languages and systems and there was no fault-tolerant mechanism implemented. The lead time to get a developer operational was high as the developer ended up waiting for a longer duration to access cloud resources like EC2, RDS, etc. Additionally, the deployment process was manual which increased the chances of unforced human errors and configuration discrepancies. Configuration management took a long time which further slowed down the deployment process. Furthermore, there was no centralized mechanism for user management, log management, and cron job monitoring.
For AWS cloud development the built-in choice for infrastructure as code (IAC) is AWS CloudFormation. However, before building the AWS Cloudformation (CF) templates, Powerup conducted a thorough assessment of the customer’s existing infrastructure to identify the gaps and plan the template preparation phase accordingly. Below were a few key findings from their assessment:
Termination Protection was not enabled to many EC2 instances
IAM Password policy was not implemented
Root Multi-Factor Authentication (MFA) was not enabled
IAM roles were not used to access the AWS services from EC2 instances
CloudTrail was not integrated with Cloudwatch logs
S3 Access logs for Cloudtrail S3 bucket was not enabled
Log Metric was not enabled for Unauthorised API Calls; Using ROOT Account to access the AWS Console; IAM Policy changes; Changes to CloudTrail, CloudConfig, S3 Bucket policy; Alarm for any security group changes, NACL, RouteTable, VPCs
SSH ports of few security groups were open to Public
VPC Flow logs were not enabled for few VPCs
Powerup migrated their monolithic service into smaller independent services which are self-deployable, sustainable, and scalable. They also set up CI/CD using Jenkins and Ansible. Centralized user management was implemented using FreeIPA, while ELK stack was used to implement centralized log management. Healthcheck.io was used to implement centralized cron job monitoring.
CloudFormation (CF) Templates were then used in the creation of the complete AWS environment. The template can be reused to create multiple environments in the future. 20 Microservices in the stage environment was deployed and handed over to the customer team for validation. Powerup also shared the Ansible playbook which helps in setting up the following components – Server Hardening / Jenkins / Metabase / FreeIPA / Repository.
The below illustrates the architecture:
Different VPCs are provisioned for Stage, Production, and Infra management. VPC peering is established from Infra VPC to Production / Stage VPC.
VPN tunnel is established between the customs office to AWS Infra VPC for the SSH access / Infra tool access.
All layers except the elastic load balancer are configured in a private subnet.
Separate security group configured for each layer like DB / Cache / Queue / App / ELB / Infra security groups. Only required Inbound / Outbound rules.
Amazon ECS is configured in Auto-scaling mode. So the ECS workers will scale horizontally based on the Load to the entire ECS cluster.
Service level scaling is implemented for each service to scale the individual service automatically based on the load.
Elasticache (Redis) is used to store the end-user session
A highly available RabbitMQ cluster is configured. RabbitMQ is used as messaging broker between the microservices.
For MySQL and Postgresql, RDS Multi-AZ is configured. MongoDB is configured in Master-slave mode.
IAM roles are configured for accessing the AWS resources like S3 from EC2 instances.
VPC flow logs/cloud trail / Cloud Config are enabled for logging purposes. The logs are streamed into AWS Elasticsearch services using AWS Lambda. Alerts are configured for critical events like instance termination, IAM user deletion, Security group updates, etc.
AWS system manager is used to manage to collect the OS, Application, instance metadata of EC2 instances for inventory management.
AMIs and backups are configured for business continuity.
Jenkins is configured for CI / CD process.
CloudFormation template is being used for provisioning/updating of the environment.
Ansible is used as configuration management for all the server configurations like Jenkins / Bastion / FreeIPA etc.
Sensu monitoring system is configured to monitor system performance
New Relic is configured for application performance monitoring and deployment tracking
Amazon Redshift, free IPA, Amazon RDS, Redis.
IaC enabled customers to spin up an entire infrastructure architecture by running a script. This will allow the customer to not only deploy virtual servers, but also launch pre-configured databases, network infrastructure, storage systems, load balancers, and any other cloud service that is needed. IaC completely standardized the setup of infrastructure, thereby decreasing the chances of any incompatibility issues with infrastructure and applications that can run more smoothly. IaC is helpful for risk mitigation because the code can be version-controlled, every change in the server configuration is documented, logged, and tracked. And these configurations can be tested, just like code. So if there is an issue with the new setup configuration, it can be pinpointed and corrected much more easily, minimizing the risk of issues or failure.
Developer productivity drastically increases with the use of IaC. Cloud architectures can be easily deployed in multiple stages to make the software development life cycle much more efficient.
Tools used & AWS Services used – Compute – EC2,EKS,ECR,Lambda, Shared storage – SFTP,EFS, Database – RDS Postgres, Advanced networking – Route53, route 53 resolver , custom DHCP, Security – AWS IAM, Active Directory, CloudTrail, AWS Config, IAAS CloudFormation, Other Services – Terraform, Jenkins with sonarQube, Nexus and Clair
The customer is regarded as global insurance giants of the financial sector. ABS is their consolidated insurance system that they are looking at migrating to AWS along with its supporting applications. They also wanted Powerup to create a Disaster Recovery facility on AWS and make the ABS insurance system available as a backup solution for one of their esteemed banking clients while also catering to a business continuity strategy, automation of applications and security & compliance.
The Customer is a German multinational, one of the leading integrated financial services company, headquartered in Munich. Their core business caters to offering products and services in insurance and asset management.
Problem statement / Objective
ABS is a monolithic application while the supporting applications are microservices-based. Hence, a microservice architecture which can seamlessly integrate with the customer’s core insurance module was needed.
They wanted Powerup to deploy their applications on production as well as on a secondary (Disaster Recovery) DR facility on AWS using a Continuous Integration (CI)/ Continuous Deployment (CD) pipeline. This was to serve as a Business Continuity solution for one of their esteemed banking clients.
For business continuity, the customer anticipated the Recovery Time Objective (RTO) of less than 4hr and Recovery Point Objective (RPO) of not more than 24 hours.
In addition to infrastructure deployment, all application deployments were requested to be automated by the client. Being a financial services company, the customer is bound by multiple regulatory and compliance-related obligations for which Cloud Best Security practices were also to be instrumented.
AWS Landing Zone was set up with the following accounts – Organization Account, Production Account, Dev, Pre-Prod, Management, DR, Centralized Logging & Security Account.
The operational unit consisted of the customer business system (i.e., CISL (Core insurance layer), RAP (Rich Application), MFDB (Core application Database), CTRLM (Batch job automation)) and Non-ABS (Non-customer business system i.e., dispatcher).
All Logs will be centrally stored in the logging account. All management applications like Control-M, AD, Jenkins etc. will be deployed in the management account.
ABS application is deployed across multiple AZs and load balanced using AWS Application load balancer. Non-ABS applications are microservices-based and talk to the ABS running to process or fetch the required data based on request. Close to over 10 microservices are running on Docker within the EKS cluster.
Auto-scaling is enabled at the service level as well as EC2 level to scale out the microservices based on the load. The application uses Active Directory to authenticate.
Amazon Elastic Kubernetes Service (EKS) backed the highly available, reliable and decoupled API services which are accessible only inside the customer’s private global shared network. Each module is segregated with the namespace.
Jenkins pipelines were used to build automation, Nexus tool to store artifacts and Clair for checking vulnerabilities. Build Artifact Vulnerability management was made easy with the aid of SonarQube.
Active passive Disaster recovery
Actively synchronised AWS Secure File Transfer Protocol (SFTP) is the active directory and private file storage space on the cloud.
Powerup methodically designed and tested a cross-account and cross-region disaster recovery strategy. At the time of live deployment, docker images are tagged (with versioning) and shipped to Amazon Elastic Container Register (ECR) DR account. Encrypted (Amazon Machine Image) AMIs and Relational Database Service (RDS) snapshots are passively shipped to DR account with a Recovery Point Objective (RPO) of 3 hours.
Custom coupled Lambda functions are used to generate, ship and eliminate encrypted AMIs and snapshots to DR accounts with no human intervention as a backup solution.
Advanced strategy to ensure the best security
Custom cloud formation template helped in monitoring AWS API calls made to Change or update configurations of IAM roles / SecurityGroups inbound or outbound rules/ EC2. Granular rules are followed in the AWS config for maintaining and remediating as per regulatory compliance.
The customer’s network setup was the biggest issue faced. In AWS, the network was completely private, an environment without Internet Gateway (i.e., direct internet access) and Network Address Translation (NAT) because of which a custom Dynamic Host Configuration Protocol (DHCP) option set had to be used to cope up with an existing custom DNS server which was set up in the customer Shared Service account alongside a custom proxy setup for the internet. The most challenging part was registering the worker nodes to Master in EKS as some of the internal kubelet components were failing due to enterprise proxy and custom DNS servers. In order to fix this, AWS private link, route 53 resolver and Kubernetes configMap were fine-tuned effectively.
The ABS insurance application was successfully deployed on AWS environment while meeting all security & high availability guidelines as per the stated compliance directives.
During load tests, the application was found to be able to handle 200 concurrent users successfully.
Microservices helped in ‘easier to build and maintain’ application programming interface (API) services. Flexibility and scalability of different API applications were also achieved.
The customer could maintain the lifecycle of identifying, investigating and prioritizing vulnerabilities in code as well as containers without any compromise.
The customer could now implement strong access and control measures and maintain an information security policy.
Tabletop run-through and DR scenario simulations ensured business continuity.