Data Lake on Cloud for One of India’s Largest Media Companies

By January 2, 2020 May 18th, 2020 Alexa, Case Study

Customer: One of India’s largest media companies

Problem statement:

One of the largest media houses in the country wanted to improve ad placements across its channels to raise conversion. It also wanted to factor in other inputs such as social media feedback (predominantly Twitter) and EPG (electronic programme guide) information. With the push towards digital content, the volume of data being generated was making on-premises infrastructure a growing cost concern.

The existing software provided TRP information only once a week. Certain reports, however, need to travel from source to destination within 6 to 12 minutes to support critical business decisions. The existing flow also had a critical failure that caused delays, because its processes were schedule-based. Since all media companies generate such reports, the time taken to produce them and to act on them by changing promos, ad placements, etc. is critical.

The Solution

At a high level, the solution transformed the process from a tightly coupled, synchronous architecture to an event-based, loosely coupled, asynchronous one, so that the end reports are generated as desired by the user.
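To illustrate the difference between schedule-based and event-based processing, here is a minimal sketch in which work starts when a file lands rather than when a timer fires. It assumes that new files arrive in Amazon S3 and that an S3 event notification invokes an AWS Lambda function; the case study lists both services but does not describe this wiring, so treat the flow, the bucket name and the "start ETL" step as hypothetical. The payload fields used (Records, s3.bucket.name, s3.object.key) follow the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def extract_new_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda-style entry point: runs once per notification, not on a timer."""
    new_objects = extract_new_objects(event)
    for bucket, key in new_objects:
        # Hypothetical next step: start the ETL job for this file.
        print(f"would start ETL for s3://{bucket}/{key}")
    return {"processed": len(new_objects)}


# Hand-written sample payload (hypothetical bucket and key).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-landing-bucket"},
                "object": {"key": "twitter/2020-01-02/feed+1.json"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(handler(sample_event, None))
```

Because the handler reacts to each arrival, a late or failed upstream job delays only its own file instead of stalling a whole scheduled batch.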

Powerup also helped this client take a cloud-first approach in which data from different on-prem sources (SAP, Chrome feeds, Twitter feeds, social media feedback in Excel files, etc.) was piped to the cloud. A data warehouse was created, and the data extracted from all the channels was moved into it. The data is then transformed using ETL jobs into a format that can easily be pushed to and visualized on Tableau. The system also has a built-in logging system that tracks parameters such as the time taken for each process, the success or failure of a process, and the reason for any failure.
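The logging described above (time taken, success or failure, and the reason for a failure) could be captured per process with a small wrapper. The following is a minimal sketch using only the Python standard library; the name tracked, the in-memory process_log list and the process names are hypothetical, and the real system may store and structure its logs quite differently.

```python
import time
from contextlib import contextmanager

# In-memory stand-in for wherever the real system stores its logs.
process_log = []


@contextmanager
def tracked(process_name):
    """Record duration, status and failure reason for one pipeline step."""
    start = time.perf_counter()
    entry = {"process": process_name, "status": "success", "reason": None}
    try:
        yield entry
    except Exception as exc:
        entry["status"] = "failure"
        entry["reason"] = f"{type(exc).__name__}: {exc}"
        raise  # let the caller decide how to recover
    finally:
        entry["seconds"] = round(time.perf_counter() - start, 3)
        process_log.append(entry)


# A step that succeeds.
with tracked("extract_twitter"):
    rows = [{"tweet": "great show"}]

# A step that fails; the reason is logged before the error propagates.
try:
    with tracked("load_warehouse"):
        raise ConnectionError("warehouse unreachable")
except ConnectionError:
    pass

for entry in process_log:
    print(entry)
```

Each entry carries the three parameters named in the text, so slow or failing steps can be spotted without reading through free-form log lines.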

The end-to-end time taken to generate the critical reports came down to 3 minutes, which improved the decision-making capability of business leaders.

An auto-recovery feature was built for the failures so that no data is lost. The solution was also made modular keeping in mind the addition of new channels and scalability so that components can be added or removed without any code changes.
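One way to picture auto-recovery and modularity together is a registry of per-channel handlers combined with retries and a holding area for records that still fail. This is a sketch under stated assumptions, not the implementation used in the project: the names channel, CHANNEL_HANDLERS, FAILED_RECORDS and process_with_recovery are hypothetical, and the retry count and delay are arbitrary. In this sketch a new channel still needs one handler to be registered, but the processing loop itself stays untouched.

```python
import time

CHANNEL_HANDLERS = {}   # channel name -> handler function
FAILED_RECORDS = []     # records kept for later replay so no data is lost


def channel(name):
    """Decorator that registers a handler for one data channel."""
    def register(func):
        CHANNEL_HANDLERS[name] = func
        return func
    return register


@channel("twitter")
def load_twitter(record):
    return {"source": "twitter", "text": record["text"].strip()}


@channel("epg")
def load_epg(record):
    return {"source": "epg", "programme": record["title"]}


def process_with_recovery(channel_name, record, max_attempts=3, delay_seconds=0.0):
    """Run the channel's handler, retrying on failure before parking the record."""
    handler = CHANNEL_HANDLERS[channel_name]
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(record)
        except Exception as exc:
            last_error = exc
            # Wait a little longer after each failed attempt.
            time.sleep(delay_seconds * attempt)
    # All attempts failed: keep the record instead of dropping it.
    FAILED_RECORDS.append((channel_name, record, repr(last_error)))
    return None


if __name__ == "__main__":
    print(process_with_recovery("twitter", {"text": "  great show  "}))
```

Records that end up in FAILED_RECORDS can be replayed once the underlying problem is fixed, which is the property the auto-recovery feature is meant to guarantee.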

The solution architecture

Cloud Platform

AWS

Technologies Used

Amazon S3, Lambda, Redshift, IAM, EMR cluster, ETL server, CloudTrail, CloudWatch.

Benefit

The solution helped management take business-critical decisions in time. With the reports now generated and refreshed every 3 minutes, the client can place ads strategically, and this has led to better conversion. TRP is also expected to increase further following this initiative.
