Data Lake & Data Warehouse for One of India’s largest media companies

January 2, 2020 / February 11, 2020 · Categories: AI, Alexa, Case Study

Customer: One of India’s largest media companies, running an OTT streaming application.

Problem statement:

One of India’s largest media companies uses various SaaS platforms to run its OTT streaming application, so its data is stored across several disparate sources. With around 20 such data sources, the overall raw data adds up to ~600 GB per day. This made extracting customer metadata complex and made search and building recommendations difficult.
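To give a rough sense of scale, the figures above can be turned into average rates. This is a back-of-the-envelope sketch that assumes the ~600 GB is spread evenly across the ~20 sources and across 24 hours, and uses decimal units (1 GB = 1000 MB). Real OTT traffic is likely to be bursty, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_ingest_rates(daily_gb: float, num_sources: int) -> dict:
    """Average per-source daily volume and aggregate throughput.

    Assumes volume is split evenly across sources and over the day,
    which is a simplification; it says nothing about peak load.
    """
    per_source_gb = daily_gb / num_sources
    aggregate_mb_per_s = daily_gb * 1000 / SECONDS_PER_DAY  # decimal GB -> MB
    return {
        "per_source_gb_per_day": per_source_gb,
        "aggregate_mb_per_s": aggregate_mb_per_s,
    }


rates = average_ingest_rates(daily_gb=600, num_sources=20)
print(f"{rates['per_source_gb_per_day']:.1f} GB/day per source")
print(f"{rates['aggregate_mb_per_s']:.2f} MB/s aggregate")
```

Under these assumptions, each source contributes about 30 GB per day on average, and the pipeline as a whole has to sustain roughly 6.9 MB/s of raw ingest.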

The Solution

The solution was to build a data lake that brings all of the company’s customer and operations data into one place so that it can understand its business better. Powerupcloud built real-time and batch ETL jobs to bring data from the various sources into S3, where the raw data was stored. The data was then loaded into Redshift for reporting, which was done in QuickSight, while advanced analytics ran on Hadoop-based ML engines on EMR.
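The case study does not say how the raw data is laid out in S3 or how it is loaded into Redshift. One common pattern, shown here as a hypothetical sketch, is to have both the real-time and batch ETL jobs write into a shared raw zone using Hive-style `key=value` prefixes (which Athena and Hive/Spark on EMR can treat as partitions), and then load one source's data for one day into Redshift with a `COPY` command. The bucket name `example-media-datalake-raw`, the `raw/source=…/dt=…` scheme, the table name, the IAM role ARN, the source name `playback_events` and the Parquet file format are all assumptions for illustration. The functions only build strings; they do not call AWS.

```python
from datetime import date, datetime, timezone

RAW_BUCKET = "example-media-datalake-raw"  # hypothetical bucket name


def raw_object_key(source: str, event_time: datetime, batch_id: str) -> str:
    """Build a Hive-style partitioned S3 key for one raw file (hypothetical layout).

    Partitions are keyed on the UTC date, so real-time and batch jobs
    writing the same source and day land under the same prefix.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    day = event_time.astimezone(timezone.utc).date().isoformat()
    return f"raw/source={source}/dt={day}/{batch_id}.parquet"


def redshift_copy_sql(table: str, source: str, day: date, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for one source/day prefix.

    COPY loads every object whose key starts with the given S3 prefix,
    so one statement picks up all files written for that partition.
    """
    prefix = f"s3://{RAW_BUCKET}/raw/source={source}/dt={day.isoformat()}/"
    return (
        f"COPY {table}\n"
        f"FROM '{prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS PARQUET;"
    )


key = raw_object_key(
    "playback_events",
    datetime(2020, 1, 2, 18, 30, tzinfo=timezone.utc),
    "batch-0001",
)
print(key)  # raw/source=playback_events/dt=2020-01-02/batch-0001.parquet

print(redshift_copy_sql(
    "staging.playback_events",
    "playback_events",
    date(2020, 1, 2),
    "arn:aws:iam::123456789012:role/example-redshift-copy",  # placeholder ARN
))
```

Partitioning on the UTC date is a design choice: an event at 01:00 IST falls under the previous UTC day, so reports that need IST day boundaries would either partition on local time or adjust at query time.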

The solution architecture

[Figure: solution architecture diagram]

Cloud Platform

AWS

Technologies Used

S3, DynamoDB, Amazon Elasticsearch Service, Kibana, EMR clusters, Redshift, QuickSight, Lambda, Cognito, API Gateway, Athena, MongoDB, Kinesis
