Managed Data Lake on Cloud Improved Driver Notification by 95%


Customer: The pioneer of Electric vehicles and related technologies in India.

 

Summary

The customer is the pioneer of Electric vehicles and related technologies in India, involved in the design and manufacture of compact electric vehicles. Moving to a fully managed and scalable infrastructure and configuration on AWS resulted in a cost saving of 30%.

About Customer

The customer is the pioneer of Electric Vehicle technology in India. Their mission is to bring tomorrow’s movement today. They have a wide variety of electric vehicles and will be expanding this range even further with products spanning personal and commercial segments. Their designs support the new paradigm of shared, electric, and connected mobility. Currently, the total number of connected cars stands at 7,000 and is expected to grow to 50,000.

Problem Statement

The customer was looking for:

  • A fully managed and scalable infrastructure setup and configuration on AWS.
  • Migration of applications and services from the existing Azure setup to AWS.
  • Setting up an Extract, Transform, Load (ETL) pipeline for analytics along with a managed data lake.
  • Historical data available in the same structure as live data for analytics, and
  • A framework for near real-time notifications.

They also wanted to maintain the reliability of data in PostgreSQL and Cassandra as well as on their backup servers.

Proposed Solution

All application microservices and MQTT/TCP IoT brokers will be containerized and deployed on AWS Fargate. All the latest IoT sensor data will be sent to the AWS environment: sensor data will be pushed to a Kinesis stream, and a Lambda function will query the stream to find critical data (low battery, door open, etc.) and call the notification microservice. Data from older sensors will initially be sent to the Azure environment due to existing public IP whitelisting; an MQTT bridge and TCP port forwarding will proxy these requests from Azure to AWS. Once the old sensors are updated, traffic will be fully cut over to AWS.
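For illustration, a minimal sketch of such a Lambda consumer is shown below. The event shape follows a standard Kinesis event source mapping; the field names (battery_level, door_open) and the SNS topic standing in for the notification microservice are assumptions, not the customer's actual schema.

```python
import base64
import json
import os

import boto3

# Hypothetical SNS topic used in place of the notification microservice
SNS_TOPIC_ARN = os.environ.get("NOTIFICATION_TOPIC_ARN", "")
sns = boto3.client("sns")


def handler(event, context):
    """Triggered by a Kinesis event source mapping; flags critical sensor readings."""
    for record in event.get("Records", []):
        # Kinesis record payloads arrive base64-encoded
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Illustrative field names; the real sensor schema may differ
        if payload.get("battery_level", 100) < 10 or payload.get("door_open"):
            sns.publish(
                TopicArn=SNS_TOPIC_ARN,
                Subject="Critical vehicle event",
                Message=json.dumps(payload),
            )
    return {"processed": len(event.get("Records", []))}
```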

Identity and Access Management (IAM) roles will be created to access the different AWS services. The network will be set up using the Virtual Private Cloud (VPC) service, with an appropriate CIDR range, subnets, and route tables created. A Network Address Translation (NAT) gateway will be set up to enable internet access for servers in the private subnets, and all Docker images will be stored in Elastic Container Registry (ECR).
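A minimal boto3 sketch of this network setup might look like the following. The CIDR ranges, region, and Availability Zones are placeholders, and a complete setup would also attach an internet gateway and route tables.

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")  # region is an assumption

# VPC with an illustrative CIDR range
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]

# One public and one private subnet (CIDRs and AZs are placeholders)
public = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24",
                           AvailabilityZone="ap-south-1a")["Subnet"]
private = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.2.0/24",
                            AvailabilityZone="ap-south-1b")["Subnet"]

# NAT gateway in the public subnet gives servers in the private subnet
# outbound internet access
eip = ec2.allocate_address(Domain="vpc")
nat = ec2.create_nat_gateway(SubnetId=public["SubnetId"],
                             AllocationId=eip["AllocationId"])
```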

AWS Elastic Container Service (ECS) with the Fargate launch type is used to run the Docker containers and deploy all the container images on the worker nodes. An ECS task definition is configured for each container to be run. In Fargate, the control plane and worker nodes are managed by AWS; scaling, high availability (HA), and patching are handled by AWS as well.
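A sketch of registering one such Fargate task definition with boto3 is shown below; the service name, image URI, role ARN, and resource sizes are illustrative assumptions only.

```python
import boto3

ecs = boto3.client("ecs")

# Register a Fargate task definition for one microservice image stored in ECR.
# Family name, image URI, and role ARN are placeholders.
ecs.register_task_definition(
    family="notification-service",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",            # required for Fargate tasks
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[{
        "name": "notification-service",
        "image": "123456789012.dkr.ecr.ap-south-1.amazonaws.com/notification-service:latest",
        "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
        "essential": True,
    }],
)
```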

An Application Load Balancer (ALB) will be deployed as the front end to all the application microservices. The ALB will forward requests to the Kong API gateway, which in turn will route them to the microservices. Service-level scaling will be configured in Fargate so that more containers spin up based on load. Amazon ElastiCache, a managed service running the Redis engine, will be deployed across multiple Availability Zones (AZs) for HA, with patching and updates handled by AWS.
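Service-level scaling for an ECS/Fargate service is typically configured through Application Auto Scaling. The sketch below shows one possible target-tracking policy; the cluster and service names and the 60% CPU target are assumptions.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Resource ID format: service/<cluster-name>/<service-name>; names are placeholders
resource_id = "service/prod-cluster/notification-service"

autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Target-tracking policy: add containers when average CPU exceeds ~60%
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```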

Aurora PostgreSQL will be used to host the PostgreSQL database: a SQL dump will be taken from the Azure PostgreSQL Virtual Machine (VM) and restored on Aurora. A 3-node Cassandra cluster will be set up for HA, with two nodes running in one AZ and the remaining node in a second AZ. A 3-node Elasticsearch cluster will also be set up using the AWS managed service.
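The dump-and-restore step could be scripted roughly as follows; the hostnames, database name, and user are hypothetical, and credentials are assumed to come from a .pgpass file or the environment.

```python
import subprocess

# Hypothetical hostnames and database name
AZURE_HOST = "azure-pg.example.internal"
AURORA_HOST = "aurora-cluster.cluster-abc123.ap-south-1.rds.amazonaws.com"

# Take a custom-format dump from the Azure PostgreSQL VM ...
subprocess.run(["pg_dump", "-h", AZURE_HOST, "-U", "appuser", "-Fc",
                "-d", "vehicledb", "-f", "vehicledb.dump"], check=True)

# ... and restore it into Aurora PostgreSQL
subprocess.run(["pg_restore", "-h", AURORA_HOST, "-U", "appuser",
                "-d", "vehicledb", "vehicledb.dump"], check=True)
```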


In the bi-directional notification workflow, the TCP and MQTT gateways will run on EC2 instances and the Parser application on a separate EC2 instance. AWS public IP addresses will be whitelisted on the IoT sensors during manufacturing so the devices can securely connect to AWS. The gateway server will push the raw data coming from the sensors to a Kinesis stream, and the Parser server will push the converted and processed data to the same or another Kinesis stream.
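A minimal sketch of the gateway-side push to Kinesis is shown below; the stream name and the vehicle_id partition key are illustrative assumptions.

```python
import json

import boto3

kinesis = boto3.client("kinesis")


def push_raw_reading(reading: dict) -> None:
    """Forward one raw sensor payload from the gateway to the Kinesis stream."""
    kinesis.put_record(
        StreamName="raw-sensor-stream",                      # placeholder stream name
        Data=json.dumps(reading).encode("utf-8"),
        PartitionKey=reading.get("vehicle_id", "unknown"),   # illustrative field
    )
```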

A Lambda function will query the data in the Kinesis stream to find fault or notification-type data and will invoke the notification microservice/SNS to notify the customer team. This reduces the notification time from 6-8 minutes to near real-time. The plan is to have Kinesis Firehose as a consumer reading from the Kinesis streams and pushing processed data to a separate S3 bucket. Another Firehose will push the processed data to the Cassandra database and a different S3 bucket.
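Creating such a Firehose delivery stream with the Kinesis stream as its source could look roughly like this; all ARNs, stream names, and bucket names are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

# Deliver processed records from the Kinesis stream to an S3 bucket.
# Stream, role, and bucket ARNs are placeholders.
firehose.create_delivery_stream(
    DeliveryStreamName="processed-to-s3",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:ap-south-1:123456789012:stream/processed-sensor-stream",
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-read-kinesis",
    },
    S3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-write-s3",
        "BucketARN": "arn:aws:s3:::processed-sensor-data",
    },
)
```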

AWS Glue will be used for the data aggregation previously done with Spark jobs, pushing the aggregated data to a separate S3 bucket. Athena, which supports standard SQL, will be used to query the data in the S3 buckets. Dashboards will be created using Tableau.
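A short example of running one such Athena query with boto3 follows; the database, table, column names, and results bucket are assumptions.

```python
import boto3

athena = boto3.client("athena")

# Run a standard SQL query against the aggregated data in S3.
# Database, table, and output bucket names are assumptions.
response = athena.start_query_execution(
    QueryString="SELECT vehicle_id, avg(battery_level) AS avg_battery "
                "FROM sensor_aggregates GROUP BY vehicle_id",
    QueryExecutionContext={"Database": "vehicle_analytics"},
    ResultConfiguration={"OutputLocation": "s3://athena-query-results-bucket/"},
)
print(response["QueryExecutionId"])
```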


Cloud platform

AWS.

Technologies used

Cassandra, Amazon Kinesis, Amazon Redshift, Amazon Athena, Tableau.

Business benefit

  • The customer can send and receive notifications in near real-time, and the time taken to send notifications to the driver is reduced by 95%.
  • Using AWS, applications can scale on a secure, fault-tolerant, and low-latency global cloud.
  • With the implementation of the CI/CD pipeline, the customer team no longer spends its valuable time on mundane administrative tasks.
  • Powerup helped the customer achieve its goal of securing data while lowering cloud bills and simplifying compliance.
  • API Gateway proved to be one of the most beneficial AWS services, with a wide range of functionality that helped Powerup address the customer’s issues.
  • Containerization of the microservices was a parallel, collaborative effort between the customer and the Powerup team.
  • Data-driven business decisions by the customer team made data movement easier and eliminated repetitive processes.
  • 30% cost savings with the new architecture.
