Making the Connected Car ‘Real-time Data Processing’ Dream a Reality


Written by Jeremiah Peter, Solution Specialist, Advanced Services Group; Contributors: Ravi Bharati, Tech Lead, and Ajay Muralidhar, Sr. Manager, Project Management, at Powerupcloud Technologies

Connected car landscape

Imagine driving your car on a busy dirt road in the monsoon, dodging unscrupulous bikers, jaywalking pedestrians and menacing potholes. Suddenly, a fellow driver makes wild gestures to inform you that the rear door is unlocked, averting an imminent disaster.

In a connected car system, these events are tracked in near real-time and pushed to the driver’s cell phone within seconds. Although the business relevance of real-time car notifications is apparent, the underlying technology and infrastructure are far less understood. This blog demystifies the inner workings of handling data at scale for an Indian automobile behemoth and equips you with a baseline understanding of storing and processing vast troves of data for IoT-enabled vehicles.

The paradigm of shared, electric, and connected mobility, which seemed a distant reality a few years ago, has been made possible through IoT sensors. Laced with tiny data-transmitting devices, vehicles can send valuable information such as battery percentage, Distance to Empty (DTE), AC on/off, door locked/unlocked, etc. to the OEM. Service providers use this information to send near real-time alerts to consumers, weaving an intelligent and connected car experience. Timely analysis and availability of data thus become the most critical success components in the connected car ecosystem.

Before reaching the OEM’s notification system, data is churned through various phases such as data collection, data transformation, data labeling, and data aggregation. With the goal of making data consumable, manufacturers often struggle to set up a robust data pipeline that can process, orchestrate and analyze information at scale.

The data conundrum

According to the industry consortium 5GAA, a connected vehicle ecosystem can generate up to 100 terabytes of data each day. The interplay of certain key factors in the data transmission process will help you foster a deeper understanding of the mechanics behind IoT-enabled cars. As IoT sensors send data to a TCP/IP server, parsers embedded within the servers push all the time-series data to a database. The parsing activity converts machine data (hexadecimal) into a human-readable format (JSON) and subsequently triggers a call to a notification service. The service enables OEMs to send key notifications over the app or through SMS to the end consumer.
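To make the parsing step concrete, here is a minimal Python sketch. The frame layout (one byte of battery percentage, two bytes of distance-to-empty, one byte of door status) is a hypothetical assumption for illustration; the real frame format is OEM-specific.

```python
import json

def parse_frame(hex_frame: str) -> str:
    """Convert a raw hexadecimal sensor frame into JSON.

    The field layout below (battery %, distance-to-empty, door flag)
    is illustrative only; real frames follow the OEM's own protocol.
    """
    raw = bytes.fromhex(hex_frame)
    record = {
        "battery_pct": raw[0],                      # 1 byte: 0-100
        "dte_km": int.from_bytes(raw[1:3], "big"),  # 2 bytes: distance to empty
        "door_locked": bool(raw[3]),                # 1 byte: 0 = unlocked
    }
    return json.dumps(record)

# Example frame: 0x55 = 85% battery, 0x00C8 = 200 km DTE, 0x00 = door unlocked
print(parse_frame("5500c800"))
```

A parser like this sits between the broker and the time-series database, so every downstream service works with JSON rather than raw hex.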

Given the scale and frequency of data exchange, the OEM’s earlier setup was constrained by the slow TCP/IP data transfer rate (sensor data size: TCP/IP, 360 bytes; MQTT, 440 bytes). The slow transfer rate had far-reaching implications for the user experience, delaying notifications by 6-7 minutes. As part of a solution-driven approach, Powerup experts replaced the existing TCP/IP servers with MQTT servers to enhance the data transfer rate. The change effected a significant drop in notification send-time, which presently stands at around 32-40 seconds.

Furthermore, the OEM’s infrastructure presented another unique challenge in that only 8 out of 21 services were containerized. The rest of the services ran on plain Azure VMs. To optimize costs, automate scalability and reduce operational overhead, all services are now deployed on Docker containers. Containers provide a comprehensive runtime environment that includes the dependencies, libraries, frameworks and configuration files applications need to run. However, containers require extensive orchestration to aid scalability and optimal resource management. AWS Fargate is leveraged to rid the OEM’s infrastructure management team of routine container maintenance chores such as provisioning, patching, and cluster and capacity management.

Moreover, the MQTT and TCP/IP brokers were also containerized and deployed on Fargate to ensure that all IoT sensor data is sent to the AWS environment. Once inside the AWS environment, sensor data is pushed to a Kinesis stream and a Lambda function that identifies critical data and calls the AWS notification service (SNS). However, the AWS solution could not be readily implemented since the first generation of electric vehicles operated on 2G SIM cards, which did not allow changes to the IP whitelisting configuration. To overcome this impediment, we set up an MQTT bridge and configured TCP port forwarding to proxy requests from Azure to AWS. Once the first-generation vehicles are called back, the new firmware will be updated over-the-air, enabling whitelisting of the new AWS IP addresses. This stopgap approach will help the OEM fully cut over to the AWS environment without downtime or loss of sensor data.
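The Kinesis-to-SNS step can be sketched in Python as follows. The thresholds, field names, and the injected `notify` callable are illustrative assumptions; a real handler would publish through boto3's `sns.publish` and use the OEM's own alert rules.

```python
import base64
import json

def handle_kinesis_batch(event, notify):
    """Scan a Kinesis event batch for critical readings and notify.

    `notify` stands in for an SNS publish call (e.g. boto3's
    sns.publish); the thresholds and field names are illustrative.
    """
    alerts = []
    for rec in event["Records"]:
        # Kinesis delivers record data base64-encoded
        payload = json.loads(base64.b64decode(rec["kinesis"]["data"]))
        if payload.get("battery_pct", 100) < 10:
            alerts.append(f"Low battery: {payload['battery_pct']}%")
        if not payload.get("door_locked", True):
            alerts.append("Door unlocked alert")
    for msg in alerts:
        notify(msg)
    return len(alerts)
```

In production, `notify` would be something like `lambda m: sns.publish(TopicArn=topic, Message=m)`, keeping the filtering logic testable without AWS credentials.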

On the database front, the OEM’s new infrastructure hinges on the dynamic capabilities of Cassandra and PostgreSQL. Cassandra is used for storing time-series data from the IoT sensors. The PostgreSQL database contains customer profile/vehicle data and is mostly used by the payment microservice. Transactional data is stored in PostgreSQL, which is frequently called upon by various services. While PostgreSQL holds a modest volume of 150 MB in total, the Cassandra database is close to 120 GB.

Reaping the benefits

While consumers will deeply benefit from the IoT led service notifications, fleet management operators can also adopt innovative measures to reduce operational inefficiencies and enhance cost savings. Most fleet management services today spend a significant proportion on administrative activities such as maintaining oversight on route optimization, tracking driver and vehicle safety, monitoring fuel utilization, etc. A modern fleet management system empowers operators to automate most of these tasks.

Additionally, preventive maintenance can help operators extend vehicle lifecycles by enabling fleet providers to proactively service vehicles based on vehicular telemetry data such as battery consumption, coolant temperature, tire pressure, engine performance and idling status (vehicle kept idle). For instance, if a truck were about to break down due to engine failure, the fleet operator could raise a ticket and notify the nearest service station before the event occurred, cutting down idle time.

Conclusion

With 7,000 cars in its current fleet, the OEM’s infrastructure is well-poised to meet a surge of more than 50,000 cars in the near future. Although the connected car and autonomous driving segment is still in its nascent stages of adoption, it will continue to draw heavily upon the OEM’s data ingestion capabilities to deliver a seamless experience, especially as the connected car domain transcends from single-vehicle applications to a more inclusive car-to-car communication mode. Buzzwords such as two-way data/telematics exchanges, proximity-based communications and real-time feedback are likely to become part of common parlance in mobility and fleet management solutions.

As the concept of the Intelligent Transport System gathers steam, technology partners will need to look at innovative avenues to handle high volume/velocity of data and build solutions that are future-ready. To know more about how you can transform your organization’s data ingestion capability, you can consult our solution experts here.

Transforming Invoice Processing through Automation


Written by Jeremiah Peter, Solution Specialist, Advanced Services Group; Contributor: Amita PM, Associate Tech Lead at Powerupcloud Technologies.

Automation Myth

According to a recent survey by a US-based consultancy firm, organizations spend anywhere between $12 and $20 on each invoice from the time they receive it until they reconcile it. The statistic is a stark reminder of how organizations, in pursuit of grand cost-cutting measures, often overlook gaping loopholes in their RPA adoption policy: all or nothing!

This blog makes a compelling case for implementing RPA incrementally in strategic processes to yield satisfactory results. Streamlining the invoice management process is, undoubtedly, a judicious leap in that direction.

Unstructured invoice dilemma

In a real-world scenario, data in invoices is not standardized, and the quality of submissions is often diverse and unpredictable. Under these circumstances, conventional data extraction tools lack the sophistication to parse the necessary parameters and often leave organizations with the short end of the stick.

Consequently, most invoice processing solutions available today fail to reconcile the format variance within invoices. The Powerup Invoice Processing Application is a simple web application (written in HTML and Python) that leverages cloud OCR (Optical Character Recognition) services to extract text from myriad invoice formats. Powered by an intelligent algorithm, the solution uses pattern matching to extract data (e.g. dates in MM-DD-YYYY format) and breaks free from the limitations of traditional data extraction solutions.

A high-level peek into the solution



Driven by a highly user-friendly interface, the Powerup Invoice Processing Application enables users to upload invoices (PNG, JPG) from their local workstations. The action invokes a seamless API call to the Google OCR service, which returns a long string object as the API response. A sample of the string is presented below:

Subsequently, the string is converted to a human-readable format through a script, which uses a Python-based regex library to identify desirable parameters in the invoice such as date, invoice number, order number, unit price, etc. The extracted parameters are passed back to the web application after successful validation. The entire process takes no more than 10 seconds. The video below demonstrates how Powerup has successfully deployed the complete process:
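The pattern-matching step can be sketched with Python's built-in `re` module. The patterns and field names below are simplified assumptions; a production deployment would maintain a richer pattern library per invoice format.

```python
import re

# Simplified, illustrative patterns; real invoices need a larger
# per-format pattern library.
PATTERNS = {
    "date": r"\b(\d{2}-\d{2}-\d{4})\b",             # e.g. 03-15-2019 (MM-DD-YYYY)
    "invoice_number": r"Invoice\s*#?\s*:?\s*(\w+)",  # e.g. "Invoice #: INV901"
    "total": r"Total\s*:?\s*\$?([\d,]+\.\d{2})",     # e.g. "Total: $1,240.50"
}

def extract_fields(ocr_text: str) -> dict:
    """Pull known parameters out of the raw OCR string."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, ocr_text, re.IGNORECASE)
        if match:
            fields[name] = match.group(1)
    return fields

sample = "Invoice #: INV901 Date: 03-15-2019 Total: $1,240.50"
print(extract_fields(sample))
```

Because each field is an independent expression, a new invoice layout only requires adding or adjusting patterns rather than rewriting the extraction pipeline.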

Another noteworthy feature of the solution is that it seamlessly integrates with popular ERP systems such as SAP, QuickBooks, Sage, Microsoft Dynamics, etc. Given that ERP systems stash critical accounts payable documents (purchase orders, invoices, shipping receipts), a versatile solution requires integration with the organization’s ERP software to complete the automation cycle. 

A brief look at the advantages offered by invoice processing automation can help you assess the value delivered by the solution. 

The Silver-lining


The adoption of Powerup Invoice Processing Application helps organizations reap the following benefits:

  • Deeply optimized invoice processing TAT resulting in quicker payment cycles
  • Up to 40% cost savings in procurement and invoice processing
  • Highly scalable solution that can process multiple invoices in a few minutes
  • Elimination of human data-entry errors
  • Free-form parameter pattern-matching 
  • Easy integration with ERP software
  • Readily implementable solution; no change required from vendor’s end 

Conclusion 

While procurement teams in various organizations struggle to strike a trade-off between low funds dispensation and high-cost savings, measures that enable them to cut expenses and improve efficiencies in the invoicing process are a welcome respite. 

Tools such as the Powerup Invoice Processing Application can help organizations infuse automation and agility into their processes, as well as break down process complexities into manageable parts. Moreover, the time and cost efficiencies achieved in these undertakings can be passed on to other functions, significantly bolstering the organization’s service offerings. To find out how your organization can be positively impacted, sign up for a free demo session here.

Running Kubernetes Workloads on AWS Spot Instances-Part VIII


Written by Priyanka Sharma, DevOps Architect, Powerupcloud Technologies

So far in this series, we have worked with the OnDemand nodes of a K8s cluster. This post demonstrates how to use Spot Instances as K8s worker nodes, covering the provisioning, automatic scaling, and interruption (termination) handling of worker nodes across your cluster. Spot Instances can save you up to 70-90% in cost compared to OnDemand. Though Spot Instances are cheaper, you cannot run all your worker nodes as Spot. You must keep some OnDemand Instances as a backup because Spot Instances can betray you anytime with interruptions 😉

In this article, we discuss how you can use Spot Instances on an EKS cluster as well as on a cluster you run yourself on EC2 servers.

Refer to our public Github repo, which contains the files/templates used in this implementation. This blog covers the below-mentioned points:

Kubernetes Operations with AWS EKS

AWS EKS is a managed service that simplifies the management of Kubernetes servers. It provides a highly available and secure K8s control plane. There are two major components associated with your EKS Cluster:

  • EKS control plane which consists of control plane nodes that run the Kubernetes software, like etcd and the Kubernetes API server.
  • EKS worker nodes that are registered with the control plane.

With EKS, you no longer need to manage the installation, scaling, or administration of master nodes, i.e. AWS takes care of the control plane and lets you focus on your worker nodes and applications.

Prerequisites

  • EC2 Server to provision the EKS cluster using AWSCLI commands.
  • The latest version of AWSCLI Installed on your Server
  • IAM Permissions to create the EKS Cluster. Create an IAM Instance profile with the permissions attached and assign to the EC2 Server.
  • EKS Service Role
  • Kubectl installed on the server.

Provision K8s Cluster with EKS

Execute the below command to provision an EKS Cluster:

aws eks create-cluster --name puck8s --role-arn arn:aws:iam::ACCOUNT:role/puc-eks-servicerole --resources-vpc-config subnetIds=subnet-xxxxx,subnet-xxxxx,subnet-xxxxxx,securityGroupIds=sg-xxxxx --region us-east-2

We have given private subnets available in our account to provision a private cluster.

Wait for the cluster to become available.

aws eks describe-cluster --name puck8s --query cluster.status --region us-east-2

Amazon EKS uses IAM to provide authentication to your Kubernetes cluster through the AWS IAM Authenticator for Kubernetes (link in the References section below). Install it using the below commands:

curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
cp ./aws-iam-authenticator /usr/bin/aws-iam-authenticator

Update ~/.kube/config file which will be used by kubectl to access the cluster.

aws eks update-kubeconfig --name puck8s --region us-east-2

Execute “kubectl get svc”.

Launch Spot and OnDemand Worker Nodes

We have provisioned the EKS worker nodes using a CloudFormation template provided by AWS. The template is available in our Github repo as well, i.e. provision-eks-worker-nodes/amazon-eks-nodegroup-with-spot.yaml. The template will provision three Autoscaling Groups:

  • 2 ASG with Spot Instances with two different Instance types as given in the parameters
  • 1 ASG with OnDemand Instance with Instance type as given in the parameter

Create a CloudFormation stack and provide the values for the parameters. For the AMI parameter, enter the ID from the below table:

| Region                     | AMI                   |
|----------------------------|-----------------------|
| US East (Ohio) (us-east-2) | ami-0958a76db2d150238 |

Launch the stack and wait for the stack to be completed. Note down the Instance ARN from the Outputs.

Now get the config map from our repo.

https://github.com/powerupcloud/kubernetes-spot-webinar/blob/master/provision-eks-worker-nodes/aws-cm-auth.yaml

Open the file “aws-cm-auth.yaml” and replace the <ARN of instance role (not instance profile)> snippet with the NodeInstanceRole value that you recorded in the previous procedure, and save the file.

kubectl apply -f aws-cm-auth.yaml
kubectl get nodes --watch

Wait for the nodes to be ready.

Kubernetes Operations with KOPS

Kops is an official Kubernetes project for managing production-grade Kubernetes clusters. It has commands for provisioning multi-node clusters, updating their settings (including nodes and masters), and applying infrastructure changes to an existing cluster. Kops is currently one of the best tools for managing k8s clusters on AWS.

Note: You can use kops in the AWS regions which AWS EKS doesn't support.

Prerequisites:

  • Ec2 Server to provision the cluster using CLI commands.
  • Route53 domain, (for example, k8sdemo.powerupcloud.com) in the same account from where you are provisioning the cluster. Kops uses DNS for identifying the cluster. It adds the records for APIs in your Route53 Hosted Zone.
Note: For a public hosted zone, you will have to add the NS records for the above domain to your actual DNS. For example, we have added an NS record for "k8sdemo.powerupcloud.com" to "powerupcloud.com". This will be used for DNS resolution. For a private hosted zone, ensure you add the VPCs.
  • IAM Permissions to create the cluster resources and update DNS records in Route53. Create an IAM Instance profile with the permissions attached and assign to the EC2 Server.
  • S3 bucket for the state store.
  • Kubectl installed.

Install Kops

Log into the EC2 server and execute the below command to install Kops on the Server:

curl -LO https://github.com/kubernetes/kops/releases/download/$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)/kops-linux-amd64
chmod +x kops-linux-amd64
sudo mv kops-linux-amd64 /usr/local/bin/kops

Provision K8s Cluster

kops create cluster k8sdemo.powerupcloud.com --ssh-public-key ~/.ssh/id_rsa.pub --master-zones ap-south-1a --zones ap-south-1a,ap-south-1b --master-size=t2.medium --node-count=1 --master-count 1 --node-size t2.medium --topology private --dns public --networking calico --vpc vpc-xxxx --state s3://k8sdemo-kops-state-store --subnets subnet-xxxx,subnet-xxxx --utility-subnets subnet-xxxx,subnet-xxxx --kubernetes-version 1.11.4 --admin-access xx.xxx.xxxx.xx/32 --ssh-access xx.xxx.xxx.xx/32 --cloud-labels "Environment=DEMO"

Refer to our previous blog for the explanation of the arguments in the above command.

kops update cluster --yes

Once the above command is successful, we will have a private K8s Cluster ready with Master and Nodes in the private subnets.

Use the command “kops validate cluster CLUSTER_NAME” to validate the nodes in your k8s cluster.

Create Instance Groups for Spot and OnDemand Instances

A Kops Instance Group groups similar instances and maps to an Autoscaling Group in AWS. We can use the “kops edit” command to edit the configuration of the nodes in an editor, and the “kops update” command to apply the changes to the existing nodes.

Once we have provisioned the cluster, we will have two Instance groups i.e. One for master and One for Nodes. Execute the below command to get available Instance Groups:

kops get ig

Edit nodes instance group to provision spot workers. Add the below Key Values. Set the max price property to your bid. For example, “0.10” represents a spot-price bid of $0.10 (10 cents) per hour.

spec:
  ...
  maxPrice: "1.05"
  nodeLabels:
    lifecycle: Ec2Spot
    node-role.kubernetes.io/spot-worker: "true"

The final configuration will look as shown in the screenshot below:

Create one more Spot Instance Group for a different instance type.

kops create ig nodes2 --subnet ap-south-1a,ap-south-1b --role Node
kops edit ig nodes2

Add maxPrice and the node labels, and the final configuration will look as shown in the screenshot below:

Now, we have configured two spot worker node groups for our cluster. Create an instance group for OnDemand Worker Nodes by executing the below command:

kops create ig ondemand-nodes --subnet ap-south-1a,ap-south-1b --role Node

kops edit ig ondemand-nodes

Add node labels for the OnDemand Workers.

Also, we have added taints to keep pods off the OnDemand worker nodes whenever possible. Preferably, new pods will be assigned to the Spot workers.

To apply the above configurations, execute the below command:

kops update cluster
kops update cluster --yes
kops rolling-update cluster --yes

Cluster Autoscaler

Cluster Autoscaler is an open-source tool which automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

  • there are pods that failed to run in the cluster due to insufficient resources
  • there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.

CA will run as a daemonset on the Cluster OnDemand Nodes. The YAML file for daemonset is provided in our Github Repo i.e. https://github.com/powerupcloud/kubernetes-spot-webinar/tree/master/cluster-autoscaler.

Update the following variables in cluster-autoscaler/cluster-autoscaler-ds.yaml:

  • Autoscaling Group names of the OnDemand and Spot groups
  • Minimum count of instances in the Autoscaling Group
  • Maximum count of instances in the Autoscaling Group
  • AWS Region
  • The node selector, which ensures the CA pods always run on the OnDemand nodes

Create the Cluster Autoscaler on both of the k8s clusters, EKS as well as the cluster provisioned using kops. Ensure that the below permissions are attached to the IAM role assigned to the cluster worker nodes:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}

The daemonset YAML file for the EKS cluster will look as shown in the below screenshot.

Similarly, for the cluster provisioned using Kops, the YAML file will be:


Create the DaemonSet.

kubectl create -f cluster-autoscaler/cluster-autoscaler-ds.yaml

Now create a Pod Disruption Budget for CA, which will ensure that at least one Cluster Autoscaler pod is always running.

kubectl create -f cluster-autoscaler/cluster-autoscaler-pdb.yaml

Verify the Cluster autoscaler pod logs in kube-system namespace:

kubectl get pods -n kube-system
kubectl logs -f pod/cluster-autoscaler-xxx-xxxx -n kube-system

Spot Termination Handler

The major drawbacks of a Spot Instance are:

  • it may take a long time to become available (or may never become available),
  • and it may be reclaimed by AWS at any time.

Amazon EC2 can interrupt your Spot Instance when the Spot price exceeds your maximum price, when the demand for Spot Instances rises, or when the supply of Spot Instances decreases. Whenever you are opting for Spot, you should always be prepared for the interruptions.

So, we are creating an interrupt handler on the clusters, which will run as a daemonset on the Spot worker nodes. The workflow of the Spot Interrupt Handler can be summarized as:

  • Identify that a Spot Instance is being reclaimed.
  • Use the 2-minute notification window to gracefully prepare the node for termination.
  • Taint the node and cordon it off to prevent new pods from being placed.
  • Drain connections on the running pods.
  • To maintain desired capacity, replace the pods on remaining nodes
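The detection and drain steps above can be sketched in Python as below. The functions `handle_node`, `fetch`, `cordon`, and `drain` are hypothetical stand-ins for the daemon's HTTP poll and kubectl calls; the metadata URL, however, is the real EC2 endpoint that returns the spot termination notice.

```python
SPOT_NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def check_termination(fetch):
    """Return the termination timestamp if an interruption is pending.

    `fetch` wraps an HTTP GET against the instance metadata service and
    should return the response body, or None on a 404 (no notice yet).
    """
    return fetch(SPOT_NOTICE_URL)  # e.g. "2019-03-04T17:11:44Z" or None

def handle_node(fetch, cordon, drain):
    """One polling iteration of the interrupt handler daemon."""
    when = check_termination(fetch)
    if when:
        cordon()  # taint/cordon: no new pods land on this node
        drain()   # evict running pods within the 2-minute window
        return True
    return False
```

Injecting the metadata call and the kubectl actions keeps the decision logic testable; the real daemonset would poll every few seconds and shell out to `kubectl cordon`/`kubectl drain` for the current node.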

Create the Spot Interrupt Handler DaemonSet on both the k8s clusters using the below command:

kubectl apply -f spot-termination-handler/deploy-k8-pod/spot-interrupt-handler.yaml

Deploy Microservices with Istio

We have taken a BookInfo Sample application to deploy on our cluster which uses Istio.

Istio is an open platform to connect, manage, and secure microservices. For more info, see the link in the References section below. To deploy Istio on the k8s cluster, follow the steps below:

wget https://github.com/istio/istio/releases/download/1.0.4/istio-1.0.4-linux.tar.gz
tar -xvzf istio-1.0.4-linux.tar.gz
cd istio-1.0.4

In our case, we have provisioned the worker nodes in private subnets. For Istio to provision a publicly accessible load balancer, tag the public subnets in your VPC with the below tag:

kubernetes.io/cluster/puck8s:shared

Install helm from the link below:

https://github.com/helm/helm

kubectl create -f install/kubernetes/helm/helm-service-account.yaml
helm init --service-account tiller --wait
helm install --wait --name istio --namespace istio-system install/kubernetes/helm/istio --set global.configValidation=false --set sidecarInjectorWebhook.enabled=false
kubectl get svc -n istio-system

You will get the LoadBalancer endpoint.

Create a gateway for the Bookinfo sample application.

kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml

The BookInfo sample application source code, Dockerfile, and Kubernetes deployment YAML files are available in the sample-app directory in our Github repo.

Build a docker image out of provided Dockerfiles and update the IMAGE variable in k8s/deployment.yaml for all the four services. Deploy each service using:

kubectl apply -f k8s

Hit http://LB_Endpoint/productpage and you will get the frontend of your application.

AutoScaling when the Application load is High

If the number of pods increases with the application load, the cluster autoscaler will provision more worker nodes in the Autoscaling Group. If the Spot Instance is not available, it will opt for OnDemand Instances.

Initial Settings in the ASG:

Scale up the number of pods for one deployment, for example, product page. Execute:

kubectl scale --replicas=200 deployment/productpage-v1

Watch the Cluster Autoscaler manage the ASG.

Similarly, if the application load is less, CA will manage the size of the ASG.

Note: We don’t recommend running stateful applications on Spot nodes. Use OnDemand nodes for your stateful services.

And that’s all! Hope you found it useful. Happy savings!

References:

Building your first Alexa Skill — Part 1


Written by Tejaswee Das, Software Engineer, Powerupcloud Technologies

Technological advancement in the areas of Artificial Intelligence and Machine Learning has not only helped systems become more intelligent but has also made them more vocal. You can just speak to your phone to add items to your shopping list, or instruct your laptop to read your email. In this fast-growing era of voice-enabled automation, Amazon’s Alexa-enabled devices are changing the way people go through their daily routines. In fact, the trend has introduced a new term into the dictionary: Intelligent Virtual Assistant (IVA).

Techopedia defines an Intelligent Virtual Assistant as “an engineered entity residing in software that interfaces with humans in a human way. This technology incorporates elements of interactive voice response and other modern artificial intelligence projects to deliver full-fledged ‘virtual identities’ that converse with users.”

Some of the most commonly used IVAs are Google Assistant, Amazon Alexa, Apple Siri and Microsoft Cortana, with Samsung Bixby joining the already brimming list lately. Although IVAs seem technically complex, they bring enormous automation and value. Not only do they make jobs easier for humans, but they also optimize processes and reduce inefficiencies. These systems are so seamless that just a simple voice command is required to get tasks completed.

The future of personalized customer experience is inevitably tied to “Intelligent Assistance”. –Dan Miller, Founder, Opus Research

So let’s bring our focus to Alexa, Amazon’s IVA. Alexa is Amazon’s cloud-based voice service, which can interface with multiple Amazon devices. Alexa gives you the power to create applications that can interact in natural language, making your systems more intuitive to interact with. Its capabilities mimic those of other IVAs such as Google Assistant, Apple Siri, Microsoft Cortana, and Samsung Bixby.

The Alexa Voice Service (AVS) is Amazon’s intelligent voice recognition and natural language understanding service that allows you to voice-enable any connected device that has a microphone and a speaker.

Powerupcloud has worked on multiple use cases involving Alexa voice automation. One of the most successful and widely adopted implementations was for one of the largest general insurance providers.

This blog series aims to give a high-level overview of building your first Alexa skill. It has been divided into two parts: the first covers the required configuration for setting up the Alexa skill, while the second focuses on the approach for training the model and programming.

Before we dive in to start building our first skill, let’s have a look at some Alexa terminologies.

  • Alexa Skill — It is a robust set of actions or tasks that are accomplished by Alexa. It provides a set of built-in skills (such as playing music), and developers can use the Alexa Skills Kit to give Alexa new skills. A skill includes both the code (in the form of a cloud-based service) and the configuration provided on the developer console.
  • Alexa Skills Kit — A collection of APIs, tools, and documentation that will help us work with Alexa.
  • Utterances — The words, phrases or sentences the user says to Alexa to convey a meaning.
  • Intents — A representation of the action that fulfils the user’s spoken request.

You can find the detailed glossary at

https://developer.amazon.com/docs/ask-overviews/alexa-skills-kit-glossary.html

Following are the prerequisites to get started with your 1st Alexa skill.

  1. Amazon Developer Account (Free: It’s the same as the account you use for Amazon.in)
  2. Amazon Web Services (AWS) Account (Recommended)
  3. Basic Programming knowledge

Let’s now spend some time going through each requirement in depth.

We need to use the Amazon Developer Portal to configure our skill and build our interaction model.

  • Click on Create Skill, and then select Custom Model to create your Custom Skill.

Please select your locale carefully. Alexa currently caters to English (AU), English (CA), English (IN), English (UK), German (DE), Japanese (JP), Spanish (ES), Spanish (MX), French (FR), and Italian (IT). We will use English (IN) while developing the current skill.

  • Select ‘Start from Scratch’
  • Alexa Developer Console
  • Enter an Invocation Name for your skill. The invocation name should be unique because it identifies your skill. The invocation name is what you say to Alexa to invoke or activate your skill.

There are certain requirements that your Invocation name must strictly adhere to.

  • Invocation name should be two or more words and can contain only lowercase alphabetic characters, spaces between words, possessive apostrophes (for example, “sam’s science trivia”), or periods used in abbreviations (for example, “a. b. c.”). Other characters like numbers must be spelt out. For example, “twenty-one”.
  • Invocation names cannot contain any of the Alexa skill launch phrases such as “launch”, “ask”, “tell”, “load”, “begin”, and “enable”. Wake words including “Alexa”, “Amazon”, “Echo”, “Computer”, or the words “skill” or “app” are not allowed. Learn more about invocation names for custom skills.
  • Changes to your skill’s invocation name will not take effect until you have built your skill’s interaction model. In order to successfully build, your skill’s interaction model must contain an intent with at least one sample utterance. Learn more about creating interaction models for custom skills.
  • Endpoint — The Endpoint will receive POST requests when a user interacts with your Alexa Skill. So this is basically the backend for your Alexa Skill. You can host your skill’s service endpoint either using AWS Lambda ARN, which is recommended, or a simple HTTPS endpoint. Advantages of using an AWS Lambda ARN are :
  • Sign in to AWS Management Console at https://aws.amazon.com/console/
  • Search for Lambda in the AWS services list. Alexa skills can invoke Lambda functions hosted in the following regions:
  • US East (N. Virginia)
  • EU (Ireland)
  • US West (Oregon)
  • Asia Pacific(Tokyo)

We are using Lambda in the N.Virginia (us-east-1) region.

  • Once we are in a supported region, we can create a new function. There are three options: author the function from scratch, start from one of the available Blueprints, or deploy an app from the Serverless Application Repository.
  • C# / .NET
  • Go
  • Java
  • NodeJS
  • Python

We will discuss programming Alexa with different languages in the next part of this series.

For details on the IAM role your function executes under, see: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html

  • Go back to the Endpoint section in Alexa Developer Console, and add the ARN we had copied from Lambda in AWS Lambda ARN Default Region.

ARN format — arn:aws:lambda:us-east-1:XXXXX:function:function_name
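For orientation, a minimal Python handler for this endpoint could look like the sketch below. The response envelope follows Alexa's JSON response format; the speech text and routing logic are placeholders, and the real programming approach is covered in Part 2:

```python
def lambda_handler(event, context):
    """Bare-bones Alexa endpoint: Alexa POSTs a JSON request, and expects
    a JSON response in this envelope."""
    request_type = event.get("request", {}).get("type", "")
    if request_type == "LaunchRequest":
        speech = "Welcome to the demo skill."  # placeholder text
    else:
        speech = "Sorry, I did not understand that."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }

print(lambda_handler({"request": {"type": "LaunchRequest"}}, None))
```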

In Part 2, we will discuss training our model — adding Intents & Utterances, finding workarounds for some interesting issues we faced, making workflows using dialog state, understanding the Alexa Request & Response JSON, and finally our programming approach in Python.

Chatbots 2.0 — The new Series of bots & their influence on Automation

By | AI, Artificial Intelligence, Blogs, Chatbot | No Comments

Written by Rishabh Sood, Associate Director — Advanced Services Group at Powerupcloud Technologies

Chatbots as a concept are not new. In fact, under the domain of Artificial Intelligence, the origin of chatbots is quite early. In 1950, Alan Turing published "Computing Machinery and Intelligence", starting an unending debate, "Can machines think?", laying the foundation of the Turing test & eventually leading to ELIZA in 1966, the first ever chatbot. It failed to pass the Turing test but did start a horde of chatbots to follow, each one more mature than its predecessor.

The next few years saw a host of chatbots, from PARRY to ALICE, but hardly any saw the light of day. The actual war on chatbots started when the larger players came into the picture. Apple led with Siri in 2010, followed closely by Google Now, Amazon's Alexa & Microsoft's Cortana. These assistants made life a tad easier for users, who could now ask Siri to book an Uber or tell Alexa to switch off the lights (another way to make our lives more cushioned). While these chatbots did create huge value for users by automating their daily chores (& offering a companion to speak to, for the lonely ones), business was still a long way from extracting benefits from the automated conversational channel.

Fast track to the world of today & we see chatbots part of every business. Every company has budgets allocated for automating at least 1 process on chatbots. Oracle says that 80% of the businesses are already using or have plans to start using chatbots for major business functions by 2020.
Chatbots have been implemented across companies & functions, primarily with a focus on automating support systems (internal as well as external). Most of the bots available in the market today respond to user queries based on keyword/phrase matches. The more advanced bots use intent matching & entity extraction to respond to more complex user queries. A handful of bots even interact with enterprise systems to provide real-time data to users. Most of the commercially successful bots in the market today are text-based interactions.

Most of the bots in action today augment tasks that are repeatable/predictable in nature. Such tasks, if not automated, would require considerable human effort. These chatbots are powered by Natural Language Processing engines to identify the user's intent (verb or action), which is then passed to the bot's brain to execute a series of steps and generate a response for the identified intent. A handful of bots also contain Natural Language Generation engines to generate conversations with a human touch to them. Sadly, 99.9% of today's implementations would still fail the more than 60-year-old Turing test.
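As a toy illustration of the intent-identification step described above (a drastic simplification of a real NLP engine; the intents and keywords here are invented for this sketch):

```python
# Toy keyword-based intent matcher -- roughly what early rule-based bots
# did; modern engines use trained NLU models instead.
INTENTS = {
    "check_balance": {"balance", "account"},
    "book_cab": {"book", "cab", "ride"},
}

def match_intent(utterance):
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for intent, keywords in INTENTS.items():
        score = len(words & keywords)  # count of overlapping keywords
        if score > best_score:
            best, best_score = intent, score
    return best

print(match_intent("please book me a cab"))  # book_cab
```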

It's true that conversational engines, as chatbots are often referred to, have been around for a couple of years, but the usefulness of their existence will now be brought to test. The last couple of months have seen a considerable improvement in how conversational engines add value to businesses, which some refer to as the chatbot 2.0 wave.

At Powerup, we continuously spend effort on researching & making our products & offerings better, to suit increasing market demands. So, what can one expect from this new wave of bots? For starters, the whole world is moving towards voice-based interactions; text remains only for the traditional few. So the bots need to be equipped with intelligent voice-to-text engines, which can understand different accents & word pronunciations, in addition to being able to extract the relevant text from the noise in the user's query, to deliver actual value. The likes of Google & Microsoft have spent billions of dollars on voice-to-text engines, but the above still remains a tough nut to crack, keeping the accuracy of voice-based systems limited in the business world.

Voice-based devices, such as Amazon Echo & Google Home, bring convenience & accessibility together. Being available cheap & in mass (the smart speakers' market is slated to grow to $11.79 billion by 2023) makes them a regular household item rather than a luxury. The bots will have to start interacting with users via such devices, not limited to the traditional channels of Web & Social. This will require not only the traditional voice-to-text layers to be built in, but also specific skills (such as Alexa Voice Services for Alexa compatible devices) to be written. A key factor here is making the user experience on a purely voice-based platform (although Echo Spot also has a small screen attached to it), where visual rendering is almost nil, as seamless & equally engaging for users as it is on traditional channels.

In 2017, 45% of people globally were reported to prefer speaking to a chatbot rather than a human agent. Two years down the line, chatbots are all set to become mainstream, rather than alternative, sources of communication. But this poses a greater challenge for the companies in the business. The bots will now have to start delivering business value in terms of ROI, conversions, conversation drops & the metrics that matter to the business. H&M uses a bot that quizzes users to understand their preferences & then shows clothing recommendations based on those identified preferences. This significantly increased their conversion on customer queries.

The new age of chatbots has already started moving in a more conversational direction, rather than the rule-based response generation the earlier bots were capable of. This means the bots now understand human speech better & are able to sustain conversations with humans for longer periods. This has been possible due to the movement from traditional intent & entity models in NLP to advances in neural networks & convolutional networks, building word clouds & deriving relations on them to understand user queries.

Traditionally, Retail has remained the biggest adopter of chatbots. According to Statista.com, Retail occupied more than 50% of the chatbot market till 2016. With advancements being brought into the world of chatbots at lightning speed, other sectors are picking up the pace. Healthcare & Telecommunications, followed by Banking, are joining the race of deriving business outputs via chatbots, reporting 27%, 25% & 20% acceptance in the area in 2018. The new wave of bots is slated to further narrow this gap in adoption across sectors. A study released by Deloitte this year highlights internal chatbot use-cases growing faster than customer-facing functions, with IT use-cases reported to be the highest.

Chatbots have always remained a way of conversing with users. Businesses have focused on how the experience on a chatbot can be improved for the end customer, while technology has focused on how chatbots can be made more intelligent. The bots, being one of the fastest growing channels of communication with customers, generate a host of data in the form of conversational logs. Businesses can derive many insights from this data as the adoption of bots among customers increases over the next couple of years. A challenge that most businesses will face is regulation, such as GDPR in the EU. How businesses work around these rules will be interesting to see.

Mobile apps remain the most widely adopted means of usage & communication in the 21st century, but customers are tired of installing multiple apps on their phones. An average user installs more than 50 apps on a smartphone, but the trend is going to change. With multiple players consolidating the usage of apps, users will limit the number of apps that get the coveted memory on their mobile phones. This will give businesses an opportunity to push chatbots as a communication channel, by integrating bots not only on their websites (mobile compatible, of course) but on other mobile adaptable channels, such as Google Assistant.

According to Harvard Business Review researchers, a 5-minute delay in responding to a customer query increases the chances of losing the customer by 100%, while a 10-minute delay increases this chance 4 times. This basic premise of customer service is addressed by automated conversational engines: chatbots.

Chatbots have a bright future, especially with the technological advancement, availability & adaptability increasing. How the new age bots add value to the business, remains to be seen and monitored.

It would be great to hear what you think the future of automated user engagement would be and their degree of influence.

AWS bulk Tagging tool -Part II: Graffiti Monkey

By | AWS, Blogs | No Comments

In our last blog post, we explained how to tag EC2, RDS, and S3 resources in bulk with aws-tagger.

Since aws-tagger does not support bulk volume/snapshot tagging, we configured another tool to complete the job.

In this blog post, we are going to explain Volume and Snapshot tagging with an amazing tool called Graffiti Monkey.

The Graffiti Monkey goes around tagging things. By looking at the tags an EC2 instance has, it copies those tags to the EBS Volumes that are attached to it, and then copies those tags to the EBS Snapshots.
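The propagation rule itself is easy to sketch. The helper below is illustrative only (not Graffiti Monkey's actual code); it shows the filtering step that decides which of an instance's tags move on to its volumes, and from volumes on to snapshots:

```python
def propagate_tags(source_tags, tags_to_propagate):
    # Keep only the tags named in the config's *_tags_to_propagate lists.
    return {k: v for k, v in source_tags.items() if k in tags_to_propagate}

instance_tags = {"Project": "phoenix", "Environment": "prod", "scratch": "tmp"}
print(propagate_tags(instance_tags, ["Project", "Environment", "Customer"]))
# {'Project': 'phoenix', 'Environment': 'prod'}
```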

Setup:

  1. Install Graffiti Monkey on the EC2 machine.
  2. Create an IAM user with an access key and secret key to give Graffiti Monkey the permissions it needs.
  3. Create the config file (YAML) listing all the tags that need to be copied from the EC2 instances.

Let’s start the hands-on:

Login to your EC2 Linux machine:

i)First, install pip on the machine.

yum install python-pip

ii)Second, install the graffiti-monkey

pip install graffiti_monkey

1-Config file(YAML):

  • Create the config file (YAML) in the AWS-EC2 machine. (Below is the sample)
  • We are going to use the same YAML file here for all the accounts as the tags are common for all of them.
  • If you add the new tags in EC2, then you need to add the new tags to this YAML file also as per the requirement.

tagging.yaml:

---
region: eu-west-1
instance_tags_to_propagate:
- 'Business Unit'
- 'Project'
- 'Customer'
- 'Environment'
- 'Product'
- 'Version'
- 'Requestor'
- 'Revenue_Type'
- 'Business_Model'
- 'Service'
volume_tags_to_propagate:
- 'Business Unit'
- 'Project'
- 'Customer'
- 'Environment'
- 'Product'
- 'Version'
- 'Requestor'
- 'Revenue_Type'
- 'Business_Model'
- 'Service'
- 'Name'
- 'instance_id'
- 'device'

2-AWS Credentials:

  • We will create an access key and secret key for this IAM user, giving our EC2 machine permission to tag the resources in the account.
  • We need to attach the below permissions to the IAM user in each respective account.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:Describe*",
        "ec2:CreateTags"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
  • Alternatively, we can attach an IAM role with the same policy directly to the EC2 machine.

Graffiti_monkey command:

  • Now run the below command from the EC2 machine CLI:
graffiti-monkey --region us-east-1 --config tagging.yaml

Output:

After the run, the propagated tags are visible on the volumes and snapshots in the AWS console (screenshots omitted).

I hope this is helpful, please comment below in case of any implementation issues.

References: https://github.com/Answers4AWS/graffiti-monkey

Creating a VM Snapshot in google cloud using Python

By | Blogs, GCP | One Comment

Written by Nirmal Prabhu, Cloud Engineer, Powerupcloud Technologies.

Information is eternal, computers are ephemeral, backup is the saviour.

Keeping it to the point, we have a Python script to automate VM disk snapshots in Google Cloud; it works with the help of labels.

This script will take a snapshot of all the disks of every VM whose label matches the condition.

Label the virtual machines whose disks are to be backed up. Here we used ['env': 'prod'], where "env" is the key and "prod" is the value.

import apiclient
from datetime import datetime

day = datetime.now()

# The current date, formatted; appended to the snapshot name.
currday = day.strftime('%d-%m-%Y')

compute = apiclient.discovery.build('compute', 'v1')

def list_instances(compute, project, zone):
    result = compute.instances().list(project=project, zone=zone).execute()
    desired_vms = []
    vmdisks = []
    for each_item in result['items']:
        if each_item['labels']['env'] == 'prod':  # Mention the VM label to snapshot
            desired_vms.append(str(each_item['name']))
            for disk in each_item['disks']:
                vmdisks.append(str(disk['deviceName']))
    data = {"vmname": desired_vms, "disk": vmdisks}
    for disk in vmdisks:
        # Name format for the new snapshot.
        snapshot_body = {'name': 'automated-snap-' + disk + currday}
        print("Creating snap for %s" % disk)
        request = compute.disks().createSnapshot(project='xxx', zone='asia-south1-c',
                                                 disk=disk, body=snapshot_body)  # Mention the project and zone
        response = request.execute()
    return data

print(list_instances(compute, 'xxx', 'asia-south1-c'))  # Mention the project and zone

That’s it… We are done. Happy Automating…Let us know what you think!

AWS bulk Tagging tool -Part I: aws-tagger

By | AWS, Blogs | No Comments

Written by Mudita Misra, Cloud Engineer, Powerupcloud Technologies

Why and how is aws-tagger useful for us?

Use case: "What if we have bulk untagged AWS resources and we need to get billing based on tags within a day or two? How will we do it?"

In this article, we are going to explain how we can do the AWS resource tagging for bulk in number resources in just a few minutes.

Scenario:

  1. One of our customers had multiple accounts with bulk resources: EC2, RDS and S3. These resources had to be tagged with 8–9 business tags for billing/segregation purposes. So we explored and implemented aws-tagger to make the tagging somewhat easier.
  2. Tagging AWS resources is hard because each resource type has a slightly different API. The AWS bulk tagging tool eliminates these differences so that you can simply specify the resource ID and the tags, and it takes care of the rest.

Note: Any tags that already exist on the resource will not be removed, but the values will be updated if the tag key already exists. Tags are case sensitive.
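The merge behaviour in the note can be pictured as a plain dictionary merge (an analogy, not aws-tagger's internals): existing keys survive, and a matching key takes the new value.

```python
existing = {"Name": "web-1", "Business": "Dev"}
new_tags = {"Business": "Production", "User": "Mudita"}

# Existing tags are kept; a tag key present in both takes the new value.
merged = {**existing, **new_tags}
print(merged)  # {'Name': 'web-1', 'Business': 'Production', 'User': 'Mudita'}
```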

Setup:

  1. Install aws-tagger on the local/EC2-machine
  2. Create IAM user with access key and secret key to provide permission to aws-tagger to apply the tags on the resources.
  3. Create the CSV file with all tags details.

Let’s start the hands-on:

  1. We can do it from our local machine, or from an AWS EC2 Linux/Windows machine inside the customer's private network (if that is a concern).

i)First, install pip on the machine.

yum install python-pip

ii)Second, install the aws-tagger

pip install aws-tagger

AWS Credentials:

  1. We will create an access key and secret key for this IAM user, giving our EC2/local machine permission to tag the resources in the account.
  2. We need to attach the below permissions to the IAM user in each respective account.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:Describe*",
"ec2:CreateTags",
"rds:Describe*",
"rds:AddTagsToResource",
"s3:Describe*",
"s3:PutBucketTagging",
"s3:GetBucketTagging"
],
"Effect": "Allow",
"Resource": "*"
}
]
}

Configure these credentials on the machine:

aws configure

How many ways are there to tag with aws-tagger?

We have below ways according to requirements:

1. Tag individual resource with a single tag

aws-tagger --resource i-07axxxxxxx --tag "Business:Production"

2. Tag multiple resources with multiple tags

aws-tagger --resource i-07axxxxxxxx --resource i-045xxxxxx --tag "Business:Production" --tag "User:Mudita"

3. Tag multiple resources from a CSV file (for bulk resources)

We need to create a CSV file containing the Resource ID, Region and the tag keys/values to be attached to the respective resources.

Note: Make sure no key or value is empty/blank; if you are not sure about a value, put 'NA' or '-'.

i) We can create the sheet in Google Sheets, later save it as a CSV file, and use it for tagging.

For example:
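Since the sample sheet is not reproduced here, the snippet below generates a hypothetical CSV of this shape with Python's csv module. The column headers are assumptions for illustration; check the aws-tagger README for the exact names your version expects:

```python
import csv

rows = [
    {"Id": "i-07axxxxxxx", "Region": "us-east-1",
     "Business": "Production", "User": "Mudita"},
    {"Id": "mudita-powerup-bucket", "Region": "us-east-1",
     "Business": "Production", "User": "NA"},  # no blanks: use NA or -
]

# One column per tag key; every resource row carries its ID and region.
with open("tagger-details.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Id", "Region", "Business", "User"])
    writer.writeheader()
    writer.writerows(rows)

print(open("tagger-details.csv").read())
```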

ii) Download/Copy the CSV file to the local/AWS-EC2 machine.

AWS TAGGER:

Now run the below command for CSV file:

aws-tagger --csv tagger-ec2-details-mudita\ -\ aws-tagger.csv

If the command returns to the next prompt with no output, there is no error and the resources have been tagged. We can verify the tags from the AWS console.

We have implemented AWS Tagger on the following AWS resource types:

1. EC2 instances

aws-tagger --resource i-07XXXXXXX --tag "Business:Production" --tag "User:Mudita"

2. S3 buckets

aws-tagger --resource mudita-powerup-bucket --tag "Business:Production" --tag "User:Raju"

3. RDS instances

aws-tagger --resource arn:aws:rds:us-east-1:1111XXXX:db:mudita-db --tag "Business:Production" --tag "User:Mudita"

I hope this is helpful, please comment below in case of any implementation issues.

Any EC2 volumes attached to a tagged instance will be tagged automatically, but for bulk volumes and snapshots we don't recommend aws-tagger. We will be coming up with a new method for tagging volumes and snapshots in our next part.

Keep following the blog post for the upcoming part on how to tag Volumes and Snapshots attached to EC2 instances.

For more resources, you can follow below Github link:

Reference: https://github.com/washingtonpost/aws-tagger

Automated Deployment of PHP Application using Gitlab CI on Kubernetes — Part VII

By | AWS, Blogs, Kubernetes | No Comments

Written by Priyanka Sharma, DevOps Architect, Powerupcloud Technologies

Recently, we got an opportunity to develop and deploy an application on a Kubernetes cluster running on AWS Cloud. We developed a sample PHP application which parses a CSV file and uploads its content into a MySQL RDS instance. The application UI also supports other functionalities, like updating/deleting a particular row from the database, storing and viewing processed files via an AWS S3 bucket, and viewing all the records of the MySQL database. The Kubernetes cluster is provisioned using the KOPS tool.

Prerequisites

  • Route53 hosted zone (required for KOPS)
  • One S3 bucket (required for KOPS to store state information)
  • One S3 bucket (required to store the processed CSV files, for example, pucdemo-processed-.csv)
  • One S3 bucket to store application access logs of the load balancer.
  • MySQL RDS in a private subnet. Port 3306 is opened to the Kubernetes cluster nodes.
  • A table to store the data from the CSV file in supported variables. In our case, we used the following commands to create a table in the database.
create database csvdb;
CREATE TABLE puc_csv(
sku INT,
name VARCHAR(200),
price DOUBLE
);
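The core import flow the application performs can be sketched as follows, here in Python with an in-memory SQLite database standing in for the MySQL RDS instance (the actual application is written in PHP):

```python
import csv
import sqlite3

# SQLite stands in for RDS MySQL; the table mirrors puc_csv above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE puc_csv (sku INTEGER, name TEXT, price REAL)")

def import_csv(conn, lines):
    """Parse CSV lines and insert each row into puc_csv."""
    for row in csv.DictReader(lines):
        conn.execute("INSERT INTO puc_csv VALUES (?, ?, ?)",
                     (int(row["sku"]), row["name"], float(row["price"])))
    conn.commit()

import_csv(conn, ["sku,name,price", "101,widget,9.99", "102,gadget,19.5"])
print(conn.execute("SELECT COUNT(*) FROM puc_csv").fetchone()[0])  # 2
```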

Setup

  • Cloud: Amazon Web Services
  • Scripting languages used: HTML, JavaScript and PHP
  • Kubernetes version: 1.11
  • K8s cluster instance type: t2.medium
  • Instances are launched in private subnets
  • 3 masters and 2 nodes (autoscaling configured)
  • K8s master/worker nodes are in autoscaling groups for HA/scalability/fault tolerance
  • S3 buckets to store data (details in Prerequisites)
  • Route53 has been used for DNS management
  • RDS — MySQL 5.7 (Multi-AZ enabled)

Provision Kubernetes Cluster on AWS

kops create cluster pucdemo.powerupcloud.com --ssh-public-key ~/.ssh/id_rsa.pub --master-zones ap-south-1a --zones ap-south-1a,ap-south-1b,ap-south-1a --master-size=t2.medium --node-count=2 --master-count 3 --node-size t2.small --topology private --dns public --networking calico --vpc vpc-xxxx --state s3://pucdemo-kops-state-store --subnets subnet-xxxx,subnet-xxxx --utility-subnets subnet-xxx,subnet-xxx --kubernetes-version 1.11.0 --api-loadbalancer-type internal --admin-access 172.31.0.0/16 --ssh-access 172.31.xx.xxx/32 --cloud-labels "Environment=TEST" --master-volume-size 100 --node-volume-size 100 --encrypt-etcd-storage;

where,

  • We have provided our public key in the argument --ssh-public-key. The respective private key will be used for SSH access to your master and nodes.
  • Private subnets are provided as arguments in --subnets: they will be used by the Kubernetes API (internal).
  • Public subnets are provided as arguments in --utility-subnets: they will be used by Kubernetes services (external).
  • --admin-access takes the IP CIDR for which the Kubernetes API port will be allowed.
  • --ssh-access takes the IP from which you will be able to SSH into the master nodes of the Kubernetes cluster.
  • pucdemo.powerupcloud.com is the hosted zone created in Route 53. KOPS will create API-related DNS records within it.

Attach the ECR full access policy to the cluster nodes' instance profile.

Create Required Kubernetes Resources

Clone the below Github repo:

https://github.com/powerupcloud/k8s-data-from-csvfile-to-database.git

Create Gitlab Instance:

Replace the values for the following variables in kubernetes-gitlab/gitlab-deployment.yml:

  • GITLAB_ROOT_EMAIL
  • GITLAB_ROOT_PASSWORD
  • GITLAB_HOST
  • GITLAB_SSH_HOST
kubectl create -f kubernetes-gitlab/gitlab-ns.yml
kubectl create -f kubernetes-gitlab/postgresql-deployment.yml
kubectl create -f kubernetes-gitlab/postgresql-svc.yml
kubectl create -f kubernetes-gitlab/redis-deployment.yml
kubectl create -f kubernetes-gitlab/redis-svc.yml
kubectl create -f kubernetes-gitlab/gitlab-deployment.yml
kubectl create -f kubernetes-gitlab/gitlab-svc.yml

"kubectl get svc -n gitlab" will give the provisioned load balancer endpoint. Create a DNS record for the endpoint, for example, git.demo.powerupcloud.com.

Create Gitlab Runner:

Replace the values for the following variables in gitlab-runners/configmap.yml:

  • Gitlab URL
  • Registration Token

Go to the Gitlab Runners section in the Gitlab console to get the above values.

kubectl create -f gitlab-runners/rbac.yaml
kubectl create -f gitlab-runners/configmap.yaml
kubectl create -f gitlab-runners/deployment.yaml

Create CSVParser Application:

Create a base Docker image with Nginx and PHP 7.0 installed, and push it to ECR. Reference the base image in csvparser/k8s/deployment.yaml.

kubectl create -f csvparser/k8s/deployment.yaml
kubectl create -f csvparser/k8s/service.yaml

"kubectl get svc" will give the provisioned load balancer endpoint. Create a DNS record for the endpoint, for example, app.demo.powerupcloud.com.

Application Functionality

  • Basic Authentication is enabled for the main page.
  • The browse field will accept the CSV file only.
  • After uploading, the data will be imported into the database by clicking the “Import” button.
  • The processed files can be viewed by clicking on the “View Files” button.
  • “View Data” button will list the records from the database in tabular format.
  • The data record can be edited inline and updated into the database by clicking the “Archive” button.
  • A particular row can be deleted from the database by clicking the “Delete” button.
  • The application is running on two different nodes in different subnets and is being deployed under a Classic LoadBalancer.

CI/CD

  • The Gitlab Instance and Runner are running as pods on the Kubernetes Cluster.
  • The application code is available in the Gitlab Repository along with Dockerfile and .gitlab-ci.yml
  • The pipeline is implemented in Gitlab Console using .gitlab-ci.yml file.
  • Whenever a commit is pushed to the repository, the pipeline is triggered and executes the following steps:
  • Build: builds a Docker image from the Dockerfile and pushes it to the AWS ECR repo.
  • Deploy: updates the Docker image for the already running application pod on the Kubernetes cluster.
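A minimal .gitlab-ci.yml for such a two-stage pipeline could look like the sketch below. The ECR URL, image name, and deployment name here are placeholders; the actual file lives in the repository linked above:

```yaml
stages:
  - build
  - deploy

build:
  stage: build
  script:
    # Build the image and push it to the (placeholder) ECR repository
    - docker build -t <account-id>.dkr.ecr.us-east-1.amazonaws.com/csvparser:$CI_COMMIT_SHORT_SHA .
    - docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/csvparser:$CI_COMMIT_SHORT_SHA

deploy:
  stage: deploy
  script:
    # Point the running deployment at the freshly pushed image
    - kubectl set image deployment/csvparser csvparser=<account-id>.dkr.ecr.us-east-1.amazonaws.com/csvparser:$CI_COMMIT_SHORT_SHA
```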

Application in Action

Hit the Gitlab Service:

Sign in with the credentials.

Create a new Project and push the code. It will look like:

The Pipelines will look like:



The Application

View Data:

View Processed Files:

Editable table:

“Archive” will update the database.

Delete will delete the row from the database.

Note: We don't recommend using this application code in any real scenario; it's just for our testing purposes and is not written using best practices. This article showcases the provisioning of a Kubernetes cluster using KOPS with best practices, and the deployment of a PHP application on the cluster using Gitlab pipelines.

Hope you found it useful. Keep following our blogs for the more interesting articles on Kubernetes. Do visit the previous parts of this series.


Configuring replication in Apache Solr

By | Apache, Blogs, Solr | No Comments

Written by Nirmal Prabhu, Former Cloud Engineer, Powerupcloud Technologies

Apache Solr Replication: In this Solr replication example, we will set up replication in Apache Solr and demonstrate how a new record gets replicated from the master to the slave core. For this, we will consider one master and one slave server. In a production environment, we would use different machines for hosting the master and the slave.

Step 1: [Install java]

Install Java and set Environment variable.

Step 2: [Install Apache Solr]

To begin with, let's download the latest version of Apache Solr from here.

Once the Solr zip file is downloaded, unzip it into a folder.

We can start the server using the command line script. Let’s go to the bin directory from the command prompt and issue the following command

  • solr start

This will start the Solr server under the default port 8983.

We can now open the following URL in the browser and validate that our Solr instance is running. The specifics of the Solr admin tool are beyond the scope of this example.

http://localhost:8983/solr/

Step 3: [Configuring Solr — master]

In this section, we will show you how to configure the master core for a Solr instance. Apache Solr ships with an option called Schemaless mode. This option allows users to construct an effective schema without manually editing the schema file. For this example, we will use the reference configset sample_techproducts_configs.

Step 4: [Creating master Core]

First, we need to create a core for indexing the data. The Solr create command has the following options:

  • -c <name> — Name of the core or collection to create (required).
  • -d <confdir> — The configuration directory, useful in the SolrCloud mode.
  • -n <configName> — The configuration name. This defaults to the same name as the core or collection.
  • -p <port> — Port of a local Solr instance to send the create command to; by default the script tries to detect the port by looking for running Solr instances.
  • -s <shards> — Number of shards to split a collection into, default is 1.
  • -rf <replicas> — Number of copies of each document in the collection. The default is 1.

In this example, we will use the -c parameter for core name, -rf parameter for replication and -d parameter for the configuration directory.

Now navigate to the solr-5.0.0\bin folder in the command window and issue the following command.

solr create -c master -d sample_techproducts_configs -p 8983 -rf 3

We can see the following output in the command window.

Now we can navigate to the following URL and see master core being populated in the core selector. You can also see the statistics of the core.

http://localhost:8983/solr/#/master

Step 5: [Modify solrconfig]

Open the file solrconfig.xml under the folder server\solr\master\conf

Solrconfig.xml

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:true}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://privateip:8983/solr</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Since we have modified the solrconfig we have to restart the solr server. Issue the following commands in the command window navigating to solr-5.0.0\bin

  • solr stop -all
  • solr start

Step 6: [Configuring Solr — slave]

The data from the master core will get replicated into the slave. We will run the slave on the same machine on a different port, along with the master core. To do so, extract another copy of the Solr server into a folder called solr1. Navigate to the solr-5.0.0\bin folder of solr1 in the command window and issue the following command.

  • solr start -p 9000

The -p option will start the Solr server on a different port. For the slave, we will use port 9000.

Now open the file solrconfig.xml under the folder server\solr\slave\conf and add the configuration for the slave under the requestHandler tag. In the configuration, we will point the slave to the masterUrl for replication. The poll interval is set to 20 seconds; it is the time difference between two poll requests made by the slave.

Solrconfig.xml

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Fully qualified URL for the replication handler of the master.
         It is possible to pass this as a request param for the fetchindex command. -->
    <str name="enable">${enable.slave:true}</str>
    <str name="masterUrl">http://privateip:8983/solr/master/replication</str>
    <!-- Interval at which the slave should poll the master. Format is HH:mm:ss.
         If this is absent, the slave does not poll automatically,
         but a fetchindex can be triggered from the admin console or the HTTP API. -->
    <str name="pollInterval">00:00:20</str>
    <str name="httpBasicAuthUser">Administrator</str>
    <str name="httpBasicAuthPassword">2z)DVL.7FNs</str>
  </lst>
</requestHandler>

Since we have modified the solrconfig we have to restart the solr server. Issue the following commands in the command window navigating to solr-5.0.0\bin

  • solr stop -all
  • solr start -p 9000

Now open the slave console using the following URL. The replication section will show the configuration reflecting the configuration we made in the solrconfig.

http://localhost:9000/solr/#/slave/replication
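As the slave configuration's comments note, a fetchindex can also be triggered on demand through the replication handler's HTTP API. Below is a small Python sketch of building such a request; the host, port, and user mirror the example values above, the password is a placeholder, and the request is built but not sent:

```python
import base64
import urllib.request

def build_fetchindex_request(slave_core_url, user, password):
    """Build (but do not send) a request asking the slave to pull the index now."""
    url = slave_core_url.rstrip("/") + "/replication?command=fetchindex"
    request = urllib.request.Request(url)
    # Matches the httpBasicAuthUser / httpBasicAuthPassword settings above.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    return request

req = build_fetchindex_request("http://localhost:9000/solr/slave",
                               "Administrator", "secret")
print(req.full_url)
```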

Step 7: [Indexing and Replication]

Now we will index the example data pointing to the master core. Apache Solr comes with a standalone Java program called the SimplePostTool. This program is packaged as a JAR and available with the installation under the folder example\exampledocs.

Now we navigate to the example\exampledocs folder in the command prompt and type the following command. You will see a bunch of options to use the tool.

java -jar post.jar -h

The usage format, in general, is as follows

Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

As we said earlier, we will index the data present in the “books.csv” file shipped with Solr installation. We will navigate to the solr-5.0.0\example\exampledocs in the command prompt and issue the following command.

java -Dtype=text/csv -Durl=http://localhost:8983/solr/master/update -jar post.jar books.csv

The System Properties used here are:

  • -Dtype — the content type of the data file.
  • -Durl — the update URL of the master core.

The file “books.csv” will now be indexed and the command prompt will display the following output.

Now open the console of the slave cores and we can see the data replicated automatically.

http://localhost:9000/solr/#/slave

Step 8: [Add new record]

Now we validate the replication further by adding a record to the master core. To do it, let’s open the master console URL.

http://localhost:8983/solr/#/master/documents

Navigate to the documents section and choose the document type as CSV and input the following content into the document text area and click on Submit.

id,cat,name,price,inStock,author,series_t,sequence_i,genre_s

123,book,Apache Solr,6.99,TRUE,Ram,JCG,1,Technical

The data will be added to the master core and get replicated to the slave server. To validate it, let's navigate to the slave core: the document count has increased to 11. We can also use the query section in the slave admin console to validate it. Open the following URL.

http://localhost:9000/solr/#/slave/query

Input name: apache in the q text area and click on Execute Query. The new record we inserted on the master core will be reflected in the slave core.