Enabling the leadership of a large OTT business to access business data just by asking for it

September 25, 2019 (updated May 18th, 2020) | Alexa, AWS, Blogs, data

Written By: Kartikeya Sinha, Lead Data Architect, Powerupcloud & Siva S, CEO, Powerupcloud Technologies

Just imagine the work life of a Chief Executive or a member of a company's senior leadership team. You see them going from one meeting to the next, always thinking about something. To make better business decisions, they need to understand their business data. With such a busy schedule, navigating complex Business Intelligence (BI) dashboards and tens or hundreds of reports to find the metrics they need is often cumbersome.

With the introduction of Natural Language Processing (NLP) APIs from leading public cloud providers like AWS, Azure and Google, we have started receiving many requests to integrate these NLP APIs with BI dashboards, so that senior business executives can simply ask for specific data and hear the answer instantly.

One such case is discussed in this blog post.


Problem Statement

One of our customers is a large video streaming company. They collect several kinds of metrics, including video streaming, customer behaviour, application usage and network usage. However, these metrics were spread across several tools they use for video streaming, such as Mixpanel, Youbora and Appsee. The customer had the following requirements:


  1. Build a data lake so that all data can be accessed from one centralized location
  2. Build ML engines for prediction and correlation on the app data
  3. Build a highly responsive and graphically rich reporting dashboard
  4. Enable NLP to search metrics using voice or text query

In this post, we cover the custom reporting dashboard and NLP integration modules.


Data Lake Solution

Powerupcloud’s data team built a data lake using Amazon Redshift and Amazon S3 to support the data analysis processes. Talend jobs loaded the data into Amazon S3. An ETL job then converts the raw data files into readable CSV files and pushes them to a target bucket. This allows the data to be queried directly from Amazon S3 with Redshift Spectrum or Athena, which brings down data storage costs considerably because the data does not all have to be stored in the Redshift cluster.
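The post does not say what format the raw files arrive in. As a minimal sketch, assuming the source exports are newline-delimited JSON and using a hypothetical four-column schema (`event_date`, `device_os`, `metric`, `value`), the "raw files to readable CSV" step could look like this in Python:

```python
import csv
import io
import json

# Hypothetical target schema; the real exports from Mixpanel, Youbora,
# Appsee, etc. each have their own fields.
COLUMNS = ["event_date", "device_os", "metric", "value"]


def json_lines_to_csv(raw_text: str, columns=COLUMNS) -> str:
    """Convert newline-delimited JSON records into CSV text with a fixed header.

    Keys missing from a record become empty cells and extra keys are dropped,
    so every row lines up with one external table definition.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        writer.writerow(json.loads(line))
    return out.getvalue()
```

Keeping every file on one fixed header is what lets a single external table (for Redshift Spectrum or Athena) cover the whole target bucket.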

Below is a high-level architecture diagram without the Redshift Spectrum or Athena component.



Tech Stack

– Amazon Redshift as the data warehouse (DWH).

– Amazon Lex to perform NLP on the query text and extract intent and slot values.

– A query processing engine written in Python 3, hosted on AWS Elastic Beanstalk.

– WebKit Speech Recognition API to convert speech to text.

– AWS Elastic Beanstalk to host the BI dashboard.

– Tech stack for the BI dashboard: Bootstrap, jQuery, Morris.js charts.


Rich Reporting Dashboard

Once the data lake was in place, we faced the next big problem: how do you integrate NLP into a BI platform? We tried several out-of-the-box BI platforms like Redash and Power BI, but integrating a browser-based voice-to-text converter with them was a challenge. So we decided to go with the WebKit speech recognition available in Google Chrome and a custom reporting dashboard.

As the customer needed a rich UI, we chose Morris.js charts running on a Bootstrap theme. Morris.js gave us rich colours and graphics in the charts, while the Bootstrap theme allowed a high degree of customization.
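Morris.js draws a chart from a plain options object (`element`, `data`, `xkey`, `ykeys` and `labels` for line and bar charts; `element` plus `label`/`value` data for donuts), so the backend only has to return JSON in that shape. As a sketch, assuming the ‘GraphType’ slot resolves to line, bar or donut, a Python helper could build that payload. The `chart`/`options` wrapper and the helper name are hypothetical conventions, not something the post describes.

```python
from typing import Dict, List, Optional


def morris_config(
    graph_type: str,
    element_id: str,
    rows: List[Dict[str, object]],
    x_key: str,
    y_keys: List[str],
    labels: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build a payload naming the Morris chart class and its options object."""
    kind = graph_type.strip().lower()
    if kind == "donut":
        # Morris.Donut expects [{"label": ..., "value": ...}, ...]
        y = y_keys[0]
        data = [{"label": str(r[x_key]), "value": r[y]} for r in rows]
        return {"chart": "Donut", "options": {"element": element_id, "data": data}}
    if kind not in ("line", "bar"):
        raise ValueError(f"unsupported graph type: {graph_type!r}")
    # Morris.Line and Morris.Bar share the same core options.
    options = {
        "element": element_id,
        "data": rows,
        "xkey": x_key,
        "ykeys": list(y_keys),
        "labels": list(labels or y_keys),
    }
    return {"chart": kind.capitalize(), "options": options}
```

On the page, the dashboard could then render the response with `new Morris[payload.chart](payload.options)`, so adding a chart type means touching only this helper.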



Integrating Amazon Lex

This architecture shows the flow of data from the browser to Redshift.

The text queries produced by the WebKit speech recognition are passed to Amazon Lex, which identifies the intent and its associated slots. Once the slots are identified, the parameters are passed to the Query Processing API, which queries Redshift for the relevant data. This data is then presented through the custom reports.
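A minimal sketch of the Lex call, assuming the Lex V1 runtime API (the current version when this was built) accessed through a boto3 `lex-runtime` client. The bot name, alias and the fake client are hypothetical; the fake only lets the function be exercised without AWS credentials.

```python
from typing import Any, Dict, Optional, Tuple


def extract_intent(
    lex_client: Any,
    text: str,
    user_id: str,
    bot_name: str = "BIQueryBot",  # hypothetical bot name
    bot_alias: str = "prod",  # hypothetical alias
) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Send a transcribed query to Amazon Lex (V1 runtime) and return (intent, slots).

    lex_client is expected to behave like boto3.client("lex-runtime").
    """
    response = lex_client.post_text(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        inputText=text,
    )
    intent = response.get("intentName")  # may be missing when no intent matched
    slots = response.get("slots") or {}  # unfilled slots come back as None
    return intent, slots


class FakeLexClient:
    """Stand-in for a dry run without AWS; returns a canned Lex-style response."""

    def post_text(self, **kwargs):
        self.last_request = kwargs  # keep the request for inspection
        return {
            "intentName": "ShowMetric",
            "slots": {"Metric": "visitors", "DeviceOS": "iOS", "NumberOfDays": "10"},
        }
```

In production, `boto3.client("lex-runtime")` would be passed in place of `FakeLexClient()`; the returned intent name and slot dictionary are what the query processing engine works from.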


How does the solution work?


  1. Click the ‘mic’ icon and ask your question.
  2. The BI tool converts the speech to text using the WebKit Speech API.
  3. The text query is sent to the query processing engine.
  4. The query processing engine sends a request to Amazon Lex to extract the intent and slot values from the query.
  5. Amazon Lex responds with the intent name and slot values.
  6. The query processing engine uses the intent name and slot values to form a SQL query against the backend DWH, Amazon Redshift.
  7. Using the result of the Redshift query, the query processing engine builds a response for the frontend BI dashboard.
  8. The frontend BI dashboard uses the response data to plot a graph or display a table.
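Step 6 is where the transcribed text turns into SQL, so it is worth keeping the user's words out of the query string. Below is a sketch under stated assumptions: the intent name `ShowMetric`, the slot names `Metric` and `NumberOfDays`, the table `daily_metrics` and the 7-day default are all hypothetical, and the `%s` placeholders follow the psycopg2 paramstyle, one common way to reach Redshift from Python (the post does not name its driver).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical whitelists: only these identifiers can ever reach the SQL text.
METRIC_COLUMNS = {"visitors": "visitors", "viewers": "viewers", "video views": "video_views"}
DEVICE_VALUES = {"ios": "iOS", "android": "Android", "tv": "TV", "pwa": "PWA"}
DEFAULT_DAYS = 7  # assumed window when the query names no time range


def build_query(intent: Optional[str], slots: Dict[str, Optional[str]]) -> Tuple[str, List[object]]:
    """Map a Lex intent and its slots to a parameterised Redshift query.

    Column names come only from the whitelist; the device and the day offset
    are bind parameters, so raw transcript text never enters the SQL string.
    """
    if intent != "ShowMetric":  # hypothetical intent name
        raise ValueError(f"no query template for intent {intent!r}")

    metric = (slots.get("Metric") or "").strip().lower()
    if metric not in METRIC_COLUMNS:
        raise ValueError(f"unknown metric {metric!r}")

    device = DEVICE_VALUES.get((slots.get("DeviceOS") or "").strip().lower())
    if device is None:
        raise ValueError(f"unknown device {slots.get('DeviceOS')!r}")

    days = int(slots.get("NumberOfDays") or DEFAULT_DAYS)
    if not 1 <= days <= 365:
        raise ValueError(f"day range out of bounds: {days}")

    column = METRIC_COLUMNS[metric]
    sql = (
        f"SELECT event_date, SUM({column}) AS {column} "
        "FROM daily_metrics "  # hypothetical table
        "WHERE device_os = %s AND event_date >= DATEADD(day, %s, CURRENT_DATE) "
        "GROUP BY event_date ORDER BY event_date"
    )
    return sql, [device, -days]
```

With psycopg2 the pair would run as `cursor.execute(sql, params)`; the rows returned are what step 7 packages for the dashboard. For the sample query in this post ("visitors from iOS for the last 10 days") the parameters come out as `["iOS", -10]`.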


Training Amazon Lex

The utterances are trained as shown below. Note that the more utterances you train the bot with, the better it gets at recognizing queries. Slots can be added to match the reports built in the dashboard. In this example, we chose ‘DeviceOS’, ‘GraphType’ and ‘# of days’ as the slots that must be supplied in the customer’s query.
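Since more utterances help, one option is to generate variants rather than typing each one. The sketch below is an assumption, not the authors' process: it combines hypothetical opening phrases with hypothetical sentence bodies that use Lex's `{SlotName}` placeholder syntax. `NumberOfDays` stands in for the ‘# of days’ slot and `Metric` is an illustrative extra slot.

```python
from itertools import product
from typing import Iterable, List

# Hypothetical phrasing fragments; {Metric}, {DeviceOS}, {NumberOfDays} and
# {GraphType} are illustrative slot names in Lex's curly-brace syntax.
OPENERS = ["show me", "can you show me", "give me"]
BODIES = [
    "the number of {Metric} from {DeviceOS} for the last {NumberOfDays} days",
    "{Metric} on {DeviceOS} in the last {NumberOfDays} days",
    "a {GraphType} chart of {Metric} from {DeviceOS} for the last {NumberOfDays} days",
]


def expand_utterances(openers: Iterable[str] = OPENERS, bodies: Iterable[str] = BODIES) -> List[str]:
    """Combine every opener with every body, keeping the first copy of any duplicate."""
    seen = set()
    utterances = []
    for opener, body in product(openers, bodies):
        text = f"{opener} {body}"
        if text not in seen:
            seen.add(text)
            utterances.append(text)
    return utterances
```

The resulting list can be pasted into the intent's sample utterances in the Lex console. Three openers and three bodies already yield nine utterances, and each new phrase multiplies the count.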




Challenges Faced


  1. The WebKit Speech API does a good job of converting speech to text. However, it works only in the Google Chrome browser. Firefox has recently added support for speech recognition, but it is still at a very early stage.
  2. Ideally, you could ask the BI tool any meaningful question and it would answer. To do that, however, the query processing engine would need to be smart enough to form dynamic SQL queries from any user query. We have not achieved that yet, and we are evolving the query processing engine to handle as many queries as possible without modification.


Voice-Based BI Engine in Action

The voice search can pull reports based on three inputs:


  • Metrics: Visitors, Viewers or Video Views
  • Devices: iOS, Android, TV or PWA
  • Time: last X days

Sample query: Can you show me the number of visitors from iOS for the last 10 days?

Note: Voice search for terms like ‘Video Views’ and ‘PWA’ can be difficult for Lex to comprehend. Text search works better.
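For terms that speech recognition tends to spell inconsistently, one possible mitigation (an assumption, not something the post describes) is to collapse the variants into canonical values before the query is built. Lex custom slot types can also list synonyms, which is the natural first place for these; the synonym table and function names below are a hypothetical server-side fallback.

```python
import re
from typing import Dict, List, Optional

# Hypothetical synonym table for values a transcript may spell oddly.
SYNONYMS: Dict[str, List[str]] = {
    "iOS": ["ios", "iphone"],
    "Android": ["android"],
    "TV": ["tv", "television"],
    "PWA": ["pwa", "progressive web app"],
    "Video Views": ["video views", "video view"],
}


def _key(text: str) -> str:
    """Lower-case and drop everything except letters and digits ("P.W.A." -> "pwa")."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Flatten the table once into {normalized spelling: canonical value}.
_LOOKUP = {_key(s): canonical for canonical, spellings in SYNONYMS.items() for s in spellings}


def normalize_slot(value: Optional[str]) -> Optional[str]:
    """Return the canonical value for a raw slot string, or None if unrecognized."""
    if not value:
        return None
    return _LOOKUP.get(_key(value))
```

Stripping spaces and punctuation means transcripts such as "i OS" or "P.W.A." map to the same key as the canonical spelling, so the table only needs genuinely different words.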

We hope this read was insightful. We believe the future belongs to voice-based platforms, be it apps, reports or customer service.

If you would like to know more details on this project or if you want us to build something similar for you, please write to us at data@powerupcloud.com.

