Transfer Data from MySQL to BigQuery Using a Data Fusion Pipeline


Written by Anjali Sharma, Software Engineer at Powerupcloud Technologies

What is a Cloud Data Fusion Pipeline?

Cloud Data Fusion is an enterprise data integration service provided by Google for quickly building and managing data pipelines. It is a fully managed, cloud-native platform where a source (MySQL) and a sink (BigQuery) can be connected easily without writing any code.

Since it is a code-free environment, anyone can use it without being held back by a lack of coding or other technical skills.

Cloud Data Fusion is built on the open-source project CDAP, and this open core ensures data pipeline portability for users. CDAP’s broad integration with on-premises and public cloud platforms gives Cloud Data Fusion users the ability to break down silos and deliver insights that were previously inaccessible.

How it looks on the Cloud Data Fusion platform:

Why Cloud Data Fusion?

Now the question arises: why use Cloud Data Fusion when there are other ways to connect a MySQL database to BigQuery for ETL/ELT?

Data Fusion pipelines provide a fully managed, fully scalable, and fully distributed platform with an easy-to-use visual interface that makes it simple to connect to many different data sources.

Data Fusion also gives you the flexibility to keep all your pipelines as code, and it exposes REST API calls to create and trigger pipelines. Hence Cloud Data Fusion is a complete package for developing data pipelines easily and efficiently.
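
For example, once a pipeline has been deployed, it can be started with a single REST call against the instance's CDAP API. Below is a minimal sketch in Python; the API endpoint and pipeline name are hypothetical placeholders, and the access token is taken from local gcloud credentials:

```python
import subprocess

import requests

# Hypothetical values: replace with your instance's API endpoint
# (shown on the instance details page) and your pipeline's name.
API_ENDPOINT = "https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api"
PIPELINE = "mysql-to-bigquery"

# Fetch an OAuth2 access token from the local gcloud installation.
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True
).strip()

# Batch pipelines built in Data Fusion run as the DataPipelineWorkflow
# program; POSTing to .../start triggers a run.
resp = requests.post(
    f"{API_ENDPOINT}/v3/namespaces/default/apps/{PIPELINE}"
    "/workflows/DataPipelineWorkflow/start",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print("Pipeline start requested, HTTP", resp.status_code)
```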

How do we create the data pipeline?

Creating a Data Fusion pipeline on the Google Cloud Platform is quite easy; it takes only a few steps.

Step 1: In the GCP console, find Cloud Data Fusion and click 'Create Instance'.

Step 2: Fill in the instance name and region, then click Create.
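
If you prefer automation over the console, the same instance can also be created through the Cloud Data Fusion REST API. A minimal sketch, with hypothetical project, region, and instance names:

```python
import subprocess

import requests

# Hypothetical identifiers: substitute your own project, region, and name.
PROJECT = "my-project"
LOCATION = "asia-south1"
INSTANCE_ID = "my-data-fusion-instance"

token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True
).strip()

# Create a Data Fusion instance; the call returns a long-running
# operation that can be polled until the instance is ready.
resp = requests.post(
    f"https://datafusion.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{LOCATION}/instances",
    params={"instanceId": INSTANCE_ID},
    headers={"Authorization": f"Bearer {token}"},
    json={"type": "BASIC"},  # or "ENTERPRISE"
)
resp.raise_for_status()
print(resp.json()["name"])  # operation name to poll
```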

Step 3: Instance creation takes 10-15 minutes. Once it is done, go to View Instance and click the redirect URL.

Step 4: You are now inside the Cloud Data Fusion instance. Click HUB and choose a pipeline (Import Data from MySQL).

Step 5: Along with pipelines, the HUB offers several other options. Choose 'Import data from MySQL'. Next we will install the driver.

Step 6: Install the MySQL JDBC driver, which makes the connection that lets the MySQL database communicate with BigQuery. The driver can be found right here in the HUB, but make sure it is the latest version.

Step 7: Open the navigation bar and click Control Center.

Step 8: Click the green plus symbol and upload the latest version of the JDBC driver.

Step 9: Give the driver a name and a class name in a valid format (e.g., com.example.myclass; for the MySQL Connector/J driver this is typically com.mysql.jdbc.Driver, or com.mysql.cj.jdbc.Driver from version 8.x), then click Finish.

Step 10: Go back to the HUB, click the 'Import data from MySQL' pipeline, and click Create. Give the pipeline a name and click Finish. You can now customize your pipeline.

Here in the Cloud Data Fusion Studio we can change the source and sink as needed; in this case we connect the database (source) to BigQuery (sink).

Step 11: Go to the database properties and fill in the plugin name and type. After filling in the details, browse the database and click 'Add connection'.

Step 12: You will find the installed MySQL driver here. Click it and enter the connection name, host, port, database name, user ID, and password.
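
For reference, here is a sketch of the values the connection form expects; every value below is a hypothetical placeholder. Under the hood, these fields are assembled into a standard MySQL JDBC URL:

```python
# Hypothetical connection details for the Data Fusion connection form.
connection = {
    "name": "mysql-source",   # connection name shown in Data Fusion
    "host": "10.0.0.12",      # MySQL host reachable from the instance
    "port": 3306,             # default MySQL port
    "database": "salesdb",
    "user": "cdf_reader",
    "password": "********",
}

# The driver connects using a standard MySQL JDBC URL of this shape:
jdbc_url = (
    f"jdbc:mysql://{connection['host']}:{connection['port']}"
    f"/{connection['database']}"
)
print(jdbc_url)  # jdbc:mysql://10.0.0.12:3306/salesdb
```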

Step 13: Test the connection and add it.

Step 14: Now you can import your query. Deploy the pipeline.
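
The import query is plain SQL. The table and columns below are hypothetical; $CONDITIONS is the placeholder the Database source plugin replaces with split predicates when it parallelizes the read:

```python
# Hypothetical import query for the Database source plugin.
IMPORT_QUERY = """
SELECT id, customer_name, order_total, created_at
FROM orders
WHERE $CONDITIONS
"""
```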

Step 15: You have deployed your Data Fusion pipeline successfully.

Conclusion

Cloud Data Fusion takes care of most ETL/ELT work for you. And since it is part of Google Cloud, you can take advantage of built-in security benefits when using Cloud Data Fusion rather than self-managed CDAP servers:

  • Cloud-native security control with Cloud IAM: identity management and authentication efforts are taken care of by Cloud Identity.
  • Full observability with Stackdriver Logging and Monitoring: logs include pipeline logs and audit logs.
  • Reduced exposure to the public internet with private networking.

Cloud Data Fusion offers both preconfigured transformations from an OSS library and the ability to create an internal library of custom connections and transformations that can be validated, shared, and reused across an organization. It lays the foundation for collaborative data engineering and improves productivity. That means less waiting for data engineers and, importantly, less sweating about code quality.

Modernizing Applications from VMs to Container Deployment: A Real-time Case Study


Customer: one of India's largest online marketplaces


Problem Statement

The customer decided to modernize its applications, moving from virtual machines (VMs) to a container-based deployment. Shifting from VMs to containers allows developers to deliver changes in a fraction of the time and in a cost-effective way. And once an app is in a container, it is portable: the team can move it freely from AWS to Azure to Google Cloud, or back on-premises, optimizing the benefits of a hybrid environment. The customer anticipated that the demand for women's fashion products would grow quickly and would spike during sales and other promotional events. As the team scaled, the deployment process became a bottleneck: the team was frequently troubleshooting deployment failures, which caused delays and missed target dates. The customer's existing Azure-based platform comprised approximately 25 servers, the majority based in the Central India region.

Proposed Solution

Powerup conducted detailed cloud compatibility assessments to chart out the migration of the existing platform to a scalable and highly available Google Kubernetes Engine (GKE) cluster, predominantly hosted in the Asia-South region. GKE is a managed, production-ready environment for deploying containerized applications. The migration was conducted in multiple waves to ensure that the customer's production environment was not affected. Post-migration, the customer's annual bill was expected to be around US$0.15 million.

The figure below illustrates the migration roadmap, and the steps are explained further below:

  • A separate Virtual Private Cloud (VPC) was created for the production/stage environments
  • A highly available (HA) private Kubernetes cluster was provisioned through GKE
  • A MySQL VM was provisioned and configured through Terraform modules
  • A Kafka VM was provisioned and configured through Terraform modules
  • Four node pools were created in Kubernetes
  • Each pool hosts the following stateful components:
    – Elasticsearch
    – Redis
    – MongoDB
    – Neo4j

Stateless application microservices were deployed in each pool according to the priority of the microservice. An Ingress load balancer was deployed in Kubernetes to route traffic to the microservices.
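
To illustrate the pool-per-workload layout, here is a minimal sketch using the official Kubernetes Python client; the service name, image, namespace, and pool name are hypothetical. GKE labels every node with its pool name, so a nodeSelector on that label pins a Deployment to one pool:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

# Hypothetical microservice pinned to one of the four GKE node pools.
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="catalog-service"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "catalog-service"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "catalog-service"}),
            spec=client.V1PodSpec(
                # GKE sets this label to the node pool's name on every node.
                node_selector={"cloud.google.com/gke-nodepool": "pool-1"},
                containers=[
                    client.V1Container(
                        name="catalog-service",
                        image="gcr.io/my-project/catalog-service:v1",
                        ports=[client.V1ContainerPort(container_port=8080)],
                    )
                ],
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="prod", body=deployment)
```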

Benefits

Going through the modernization process helped the team increase its velocity and optimize costs. Deployments became seamless and automated, which enabled the team to deploy more frequently to lower environments and reduced the time it takes to roll out a new version to production. Costs came down by 40% as the customer moved to the container platform. Moreover, now that the applications run in containers, they are portable between on-premises and public cloud environments.

Cloud platform

GCP