Building your first Alexa Skill — Part 1


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Technological advancement in the areas of Artificial Intelligence & Machine Learning has not only helped systems become more intelligent but has also given them a voice. You can simply speak to your phone to add items to your shopping list, or instruct your laptop to read out your email. In this fast-growing era of voice-enabled automation, Amazon’s Alexa-enabled devices are changing the way people go through their daily routines. In fact, the trend has added a new term to the dictionary: Intelligent Virtual Assistant (IVA).

Techopedia defines an Intelligent Virtual Assistant as “an engineered entity residing in software that interfaces with humans in a human way. This technology incorporates elements of interactive voice response and other modern artificial intelligence projects to deliver full-fledged ‘virtual identities’ that converse with users.”

Some of the most commonly used IVAs are Google Assistant, Amazon Alexa, Apple Siri, and Microsoft Cortana, with Samsung Bixby lately joining the already brimming list. Although IVAs may seem technically complex, they bring enormous automation & value. Not only do they make everyday jobs easier, they also optimize processes and reduce inefficiencies. These systems are so seamless that a simple voice command is all it takes to get tasks completed.

The future of personalized customer experience is inevitably tied to “Intelligent Assistance”. –Dan Miller, Founder, Opus Research

So let’s bring our focus to Alexa, Amazon’s IVA. Alexa is Amazon’s cloud-based voice service that can interface with multiple Amazon devices. Alexa gives you the power to create applications that interact in natural language, making it more intuitive for your users to interact with technology. Its capabilities mimic those of the other IVAs mentioned above.

The Alexa Voice Service (AVS) is Amazon’s intelligent voice recognition and natural language understanding service that allows you to voice-enable any connected device that has a microphone and a speaker.

Powerupcloud has worked on multiple use cases involving Alexa voice automation; one of the most successful & widely adopted was built for one of the largest General Insurance providers.

This blog series gives a high-level overview of building your first Alexa skill. It is divided into two parts: the first covers the configuration required to set up the skill, while the second focuses on the approach for training the model and the programming.

Before we dive in to start building our first skill, let’s have a look at some Alexa terminologies.

  • Alexa Skill — A set of actions or tasks that Alexa can accomplish. Alexa provides a set of built-in skills (such as playing music), and developers can use the Alexa Skills Kit to give Alexa new skills. A skill includes both the code (in the form of a cloud-based service) and the configuration provided on the developer console.
  • Alexa Skills Kit — A collection of APIs, tools, and documentation that helps us work with Alexa.
  • Utterances — The words, phrases, or sentences the user says to Alexa to convey a meaning.
  • Intents — A representation of the action that fulfils the user’s spoken request.

You can find the detailed glossary at https://developer.amazon.com/docs/ask-overviews/alexa-skills-kit-glossary.html
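To make intents & utterances concrete, here is a minimal sketch of the interaction model JSON you define in the developer console. The invocation name, intent name, and sample utterances below are purely illustrative, not from an actual skill:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "insurance helper",
      "intents": [
        {
          "name": "CheckPolicyIntent",
          "slots": [],
          "samples": [
            "what is my policy status",
            "check my policy",
            "tell me about my policy"
          ]
        },
        { "name": "AMAZON.HelpIntent", "samples": [] },
        { "name": "AMAZON.StopIntent", "samples": [] }
      ]
    }
  }
}
```

Each sample utterance maps a user phrasing to an intent; the more varied the samples, the better Alexa resolves natural speech to the right intent.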

Following are the prerequisites to get started with your first Alexa skill.

  1. Amazon Developer Account (Free: It’s the same as the account you use for Amazon.in)
  2. Amazon Web Services (AWS) Account (Recommended)
  3. Basic Programming knowledge

Let’s now spend some time going through each requirement in depth.

We need to use the Amazon Developer Portal to configure our skill and build our interaction model.

  • Click on Create Skill, and then select Custom Model to create your Custom Skill.

Please select your locale carefully. Alexa currently caters to English (AU), English (CA), English (IN), English (UK), English (US), German (DE), Japanese (JP), Spanish (ES), Spanish (MX), French (FR), and Italian (IT). We will use English (IN) while developing the current skill.

  • Select ‘Start from Scratch’.
  • Enter an Invocation Name for your skill. The invocation name should be unique, because it is what identifies your skill; it is what you say to Alexa to invoke or activate the skill.

There are certain requirements that your invocation name must strictly adhere to; a small validation sketch follows the list.

  • Invocation name should be two or more words and can contain only lowercase alphabetic characters, spaces between words, possessive apostrophes (for example, “sam’s science trivia”), or periods used in abbreviations (for example, “a. b. c.”). Other characters like numbers must be spelt out. For example, “twenty-one”.
  • Invocation names cannot contain any of the Alexa skill launch phrases such as “launch”, “ask”, “tell”, “load”, “begin”, and “enable”. Wake words including “Alexa”, “Amazon”, “Echo”, “Computer”, or the words “skill” or “app” are not allowed. Learn more about invocation names for custom skills.
  • Changes to your skill’s invocation name will not take effect until you have built your skill’s interaction model. In order to successfully build, your skill’s interaction model must contain an intent with at least one sample utterance. Learn more about creating interaction models for custom skills.
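Here is the validation sketch promised above: a simplified Python checker that illustrates the rules, not Amazon’s actual validation logic:

```python
import re

# Words an invocation name may not contain (per the rules above)
LAUNCH_PHRASES = {"launch", "ask", "tell", "load", "begin", "enable"}
WAKE_WORDS = {"alexa", "amazon", "echo", "computer", "skill", "app"}

def is_valid_invocation_name(name):
    """Simplified check of the invocation-name rules; not Amazon's validator."""
    words = name.split()
    if len(words) < 2:
        return False  # must be two or more words
    # Only lowercase letters, spaces, apostrophes, and periods are allowed
    if not re.fullmatch(r"[a-z .']+", name):
        return False
    # No launch phrases or wake words anywhere in the name
    return not any(w.strip(".'") in LAUNCH_PHRASES | WAKE_WORDS for w in words)

print(is_valid_invocation_name("sam's science trivia"))  # True
print(is_valid_invocation_name("alexa trivia"))          # False (wake word)
```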
  • Endpoint — The endpoint receives POST requests when a user interacts with your Alexa skill, so this is basically the backend for your skill. You can host your skill’s service endpoint either as an AWS Lambda ARN (recommended) or as a simple HTTPS endpoint. The advantages of using an AWS Lambda ARN are that Lambda runs your code only when needed and scales automatically, and that you do not have to manage your own server or SSL certificate.

To create the Lambda function:

  • Sign in to the AWS Management Console at https://aws.amazon.com/console/
  • Look up Lambda in AWS services

Note that Alexa skills can use Lambda functions only in the following regions:

  • US East (N. Virginia)
  • EU (Ireland)
  • US West (Oregon)
  • Asia Pacific (Tokyo)

We are using Lambda in the N. Virginia (us-east-1) region.

  • Once we are in a supported region, we can go ahead and create a new function. There are three options for creating your function: you can author it from scratch, use one of the available Blueprints, or pick one from the Serverless Application Repository. Whichever you choose, Lambda supports the following runtimes for your skill’s backend:

  • C# / .NET
  • Go
  • Java
  • Node.js
  • Python

We will discuss programming Alexa with different languages in the next part of this series.

While creating the function, you will also need to assign it an execution role, which grants the function permission to access AWS services. You can read more about IAM roles at https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html

  • Copy your Lambda function’s ARN, then go back to the Endpoint section in the Alexa Developer Console and add it under AWS Lambda ARN Default Region.

ARN format — arn:aws:lambda:us-east-1:XXXXX:function:function_name
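The programming itself is covered in Part 2, but to make the endpoint concrete, here is a minimal sketch of what the Lambda function can look like in Python. This is a bare-bones handler without the ASK SDK, and the response text is illustrative:

```python
def lambda_handler(event, context):
    """Entry point that Lambda invokes with Alexa's JSON request."""
    request_type = event["request"]["type"]

    if request_type == "LaunchRequest":
        speech = "Welcome to my first skill. Ask me anything."
    elif request_type == "IntentRequest":
        intent_name = event["request"]["intent"]["name"]
        speech = "You invoked the " + intent_name + " intent."
    else:  # SessionEndedRequest and anything else
        speech = "Goodbye."

    # Alexa expects the response in this JSON structure
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

Every POST request from Alexa carries a request type; the handler branches on it and returns the JSON structure Alexa expects to speak back to the user.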

In Part 2, we will discuss training our model — adding Intents & Utterances, finding workarounds for some interesting issues we faced, building workflows using dialog state, understanding the Alexa Request & Response JSON, and finally our programming approach in Python.

Bringing automation to Online Dating & Matrimony, saving big bucks!!


Written by Rishabh Sood, Associate Director, ASG at Powerupcloud Technologies.

Matchmaking is probably one of the oldest professions, as far back as documented history can be traced. We have all come across some form of matchmaking, be it the neighbour aunty constantly looking to set her daughter up with the NRI thirty-something, or a relative with a lifelong wish of setting you up with her niece/nephew. Those were simpler times, when the adults understood our requirements (or assumed they did) & with deep regard for our feelings (no pun intended!!), would search for the most (un)suitable match.

With the advancement in technology, the dating & matchmaking industry started migrating online &, with changing lifestyles, open dating & matchmaking no longer remained a taboo. In 2005, 29 percent of U.S. adults agreed with the statement, “People who use online dating sites are desperate”; by 2013, only 21 percent of adults agreed with it.

With the advent of technology, more than half of the dating industry moved online (according to IBISWorld, the dating industry accounts for $3 billion in revenue in the US alone) & is slated to grow at a 25% CAGR through 2020. With this digital revolution in the industry, companies started to accumulate a host of data in the form of images. These images, uploaded by users while creating their profiles, were a goldmine of information for deriving user insights & improving business metrics.

India, with a population of more than 1.3 billion people, is the 2nd largest online market across the globe, with over 460 million internet users. The online dating industry in India is expected to grow at a CAGR of 10.5% from 2018 to 2023. With such a huge user base & a bright future, companies have sprung up across the country in sizeable numbers, with a host of them already hitting profitability, unlike many other online ventures.

A very niche segment under online dating caters to the traditional audience, the Social matchmaking business. These companies bring individuals together, based on their preferences & lifestyle matches, allow them to connect, get to know each other & then take the plunge. Most of these companies, running the business legally, need to regulate their user base & digitally available data according to the Internet Censorship norms, which are quite stringent in India.

For one of the largest matrimonial players in India (& across the globe), the number of users registered & the number of images uploaded daily is quite sizeable. This matrimonial player would typically see around 10,000 profiles created & 30,000 images uploaded every month. These were the profile images that users would upload as part of their portfolio. Being one of the major criteria for getting matches, these images were mandatory for profile creation & had to go through a very stringent process of acceptance & rejection by the moderators.

The business process followed for profile activation is shown in the below image.

The business had invested heavily in the photo moderation process, as this was the core of the model. A 20-member team would manually assess each & every image uploaded & analyze it on the following parameters (automated equivalents are sketched after the list):

  • Age (should be between 25 and 60 years)
  • Gender match (for male profiles, it should be a male image)
  • Group photo (for profile images, group photos are not allowed)
  • Indecently dressed/nude image
  • Should not be a celebrity image
  • Should not be a selfie
  • Should not have a watermark across the image
  • Should be above a specified quality (no blur, poor contrast, reflections, etc. in the image)
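Several of these checks map naturally onto AWS Rekognition, which the implementation described in our tech blog (linked at the end) is built on. Here is a minimal sketch, using boto3, of how the group-photo, age, gender, indecency, and celebrity checks might look; the thresholds and the male-profile assumption are illustrative:

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

def moderate(bucket, key):
    """Run a few of the checks above; returns a list of rejection reasons."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    reasons = []

    # Age, gender & group-photo checks come from facial analysis
    faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])["FaceDetails"]
    if len(faces) != 1:
        reasons.append("group photo or no face detected")
    else:
        face = faces[0]
        if face["AgeRange"]["High"] < 25 or face["AgeRange"]["Low"] > 60:
            reasons.append("estimated age outside 25-60")
        if face["Gender"]["Value"] != "Male":  # assuming a male profile
            reasons.append("gender mismatch")

    # Indecent / unsafe content
    labels = rekognition.detect_moderation_labels(Image=image)["ModerationLabels"]
    if any(label["Confidence"] > 80 for label in labels):
        reasons.append("indecent content")

    # Globally known celebrity faces
    celebs = rekognition.recognize_celebrities(Image=image)["CelebrityFaces"]
    if celebs:
        reasons.append("celebrity image")

    return reasons
```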

The manual moderation process not only made profile activation very slow (up to 72 hours, or 3 days) but also left the decision to individual judgment: what looks like an indecent image to one moderator might be perfectly alright to another. For such a critical process, a misjudgment could even draw the ire of the legal watchdogs.

At Powerup, we specialize in image vision & building custom models that deliver business ROI. We worked with the leader of the matrimonial service on a feasibility study for automating the complete photo moderation process. Our team of ML solution experts analyzed the set of images from the customer. The images were sanitized, structured & then labelled to train the vision model. The team then tested multiple models that would suit the business problem. One peculiar problem to solve was celebrity detection: although a host of open-source libraries detect personalities known across the world, none of them can detect lesser-known faces, such as Indian television artists or Lollywood & Tollywood actors.

To resolve this, the database of celebrity images (which had been manually rejected during the photo moderation process) was borrowed from the business team & used, via reinforcement learning, to train the model.
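One plausible way to cover such lesser-known faces is a Rekognition face collection built from the manually rejected celebrity images. The sketch below assumes hypothetical bucket and collection names; it is an illustration, not necessarily the exact approach used:

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")
COLLECTION = "regional-celebrities"  # hypothetical collection name
# One-time setup: rekognition.create_collection(CollectionId=COLLECTION)

def index_rejected_celebrity(bucket, key, celeb_id):
    """Add a manually rejected celebrity image to the face collection.
    celeb_id must use letters, digits, _ . : - only, e.g. "tv_artist_042"."""
    rekognition.index_faces(
        CollectionId=COLLECTION,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        ExternalImageId=celeb_id,
    )

def is_known_celebrity(bucket, key):
    """Check a new upload against the indexed celebrity faces."""
    matches = rekognition.search_faces_by_image(
        CollectionId=COLLECTION,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        FaceMatchThreshold=90,
    )["FaceMatches"]
    return len(matches) > 0
```

New uploads that match an indexed face above the threshold can then be flagged as celebrity images automatically.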

The team followed a five-step approach (depicted below) to automate the photo moderation process, backed by powerful image processing models.

But how would such a model scale for a company that processes more than 30,000 images a month? Would it be able to identify the new data sets added every day? What if the system failed to recognize an anomaly & the image got activated on an incorrect approval?

The system was designed with a feedback loop, where the engine would constantly feed on the manual feedback on both classified & unclassified data. To keep the system scalable, the custom image processing models were backed by a strong reinforcement learning loop, which would constantly add to the dataset for enhanced accuracy in the photo moderation process.
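A minimal sketch of what such a feedback loop can look like, assuming a hypothetical S3 store that accumulates moderator corrections for the next training run (all names here are illustrative):

```python
import json
import boto3

s3 = boto3.client("s3")
FEEDBACK_BUCKET = "moderation-feedback"  # hypothetical bucket name

def record_verdict(image_key, model_decision, moderator_decision):
    """Store every moderator verdict; these records feed the next
    training run so the engine keeps learning from its mistakes."""
    record = {
        "image": image_key,
        "model": model_decision,          # e.g. "approve" / "reject"
        "moderator": moderator_decision,  # the human's final call
        "disagreement": model_decision != moderator_decision,
    }
    s3.put_object(
        Bucket=FEEDBACK_BUCKET,
        Key="verdicts/" + image_key + ".json",
        Body=json.dumps(record),
    )
```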

With a dataset of around 1 million images, the model delivered 60% accuracy on deployment. Within 6 months, the accuracy rose to 78%, with daily dataset builds & reinforcement learning on the custom-built engine.

With the engine’s accuracy increasing daily, it not only automated a critical function in the process but also helped achieve a positive ROI on the implementation. Within 6 months, the manual moderation team was reduced to 5 members handling only the exception scenarios, a 75% reduction in the task force. The profile activation time dropped from 72 hours to within a day, roughly a 67% improvement in TAT. With 78% accuracy on positive image classification, the engine was not only a compelling use case to implement but also a critical support system for the business.

To understand how the solution was implemented, please refer to our tech blog series at https://blog.powerupcloud.com/realtime-image-moderation-at-scale-using-aws-rekognition-d5e0a1969244