On July 28, at the Kangaroo Cloud 2022 product conference, Si Shu, head of Kangaroo Cloud Technology, officially announced the release of the company's big data basic platform, EasyMR.

EasyMR is a big data basic platform developed by Kangaroo Cloud. It provides Hadoop, Hive, Spark, Trino, HBase, Kafka, and other components and is fully compatible with the Apache open source ecosystem. It supports enterprise-level security control and can enable the LDAP + Kerberos + Ranger authentication and permission system with one click. It also provides a one-stop O&M management platform that helps enterprises quickly build a big data platform and reduce O&M costs.

Drawing on Kangaroo Cloud's many years of experience in digitalization, the newly released EasyMR closely follows the advanced technology of the open source ecosystem. It helps customers handle application scenarios such as massive data collection, storage, computation, analysis and mining, and data security, and it provides all-round intelligent O&M support for deployment, upgrades, and scaling out and in, so that it can serve as a convenient, intelligent, and efficient “data base” for enterprises.

01

Six features that make a domestic big data basic platform

Unlike ten years ago, when big data was still a novelty, everyone is now fully accustomed to living in the “era of big data”. Everyone can feel the changes and conveniences that big data brings to daily life, and the data explosion is pushing every individual, enterprise, industry, and even country forward.

At present, the changing international situation, the decoupling in relations between China and the United States, and strong state support for Xinchuang (IT application innovation) localization have had a major impact on the domestic big data industry and have also brought new opportunities.

As the foundation of everything else, the data basic platform has naturally become the top priority for domestic substitution. Only by being able to build an independent and controllable platform can we gradually establish our own underlying IT architecture and standards and form our own open ecosystem.

EasyMR is such an independently developed, fully controllable “enterprise data base”, committed to supporting enterprises in their informatization and intelligent transformation.

The key features of EasyMR, described below, show how it helps enterprises become intelligent.

● UI-based cluster O&M

Hadoop clusters and big data platforms involve a variety of O&M operations, such as scaling nodes out and in, starting and stopping components, rolling service restarts, modifying service parameters, and upgrading and rolling back versions. EasyMR exposes these operations through a logical, process-based product interface that makes it easy for O&M personnel to operate and monitor the cluster and improves O&M efficiency.

● Automated deployment

EasyMR builds product installation packages through standardized steps and parameter conventions. All services in an installation package are configured in the schema file in the release package, including each service's configuration parameters, health check parameters, and dependencies on other services. Based on the configuration in the schema, product deployment can be fully automated with one click.
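To make the role of the dependency information concrete, here is a minimal Python sketch of how a deployment order could be derived from a schema. The schema layout, the field names (`depends_on`, `health_check`), and the port numbers are assumptions made for illustration; they are not EasyMR's actual schema format.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema fragment: each service lists the services it depends on.
# Field names and ports are illustrative only, not EasyMR's real format.
SCHEMA = {
    "zookeeper": {"depends_on": [], "health_check": {"port": 2181}},
    "hdfs": {"depends_on": ["zookeeper"], "health_check": {"port": 8020}},
    "yarn": {"depends_on": ["hdfs"], "health_check": {"port": 8032}},
    "hive": {"depends_on": ["hdfs", "yarn"], "health_check": {"port": 10000}},
}


def deployment_order(schema):
    """Return service names so that every service follows its dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(spec["depends_on"]) for name, spec in schema.items()}
    return list(TopologicalSorter(graph).static_order())


order = deployment_order(SCHEMA)
print(order)
```

A topological sort is one common way to turn declared dependencies into a sequential plan; a circular dependency in the schema would raise `graphlib.CycleError` instead of producing an order.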

● Dashboard-based cluster monitoring

By integrating the open source Prometheus and Grafana, EasyMR monitors the core parameters of clusters, services, and nodes and displays the data on flexible dashboards. Monitored parameters include CPU usage, RAM usage, disk space, and I/O read and write rates, so the running status of clusters, services, and nodes can be followed in real time and the O&M failure rate reduced. Users can also build their own dashboards and define custom monitoring items.
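As a rough illustration of what sits behind such a dashboard panel, the sketch below builds a request URL for Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The host name is a placeholder, and the query assumes that node_exporter metrics such as `node_cpu_seconds_total` are being scraped; the source does not say which exporters EasyMR uses.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a URL for Prometheus's instant-query HTTP endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# PromQL for per-node CPU usage in percent.
# Assumption: node_exporter metrics are available in this Prometheus instance.
CPU_USAGE = (
    '100 * (1 - avg by (instance) '
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])))'
)

# "prometheus.example" is a placeholder host, not a real EasyMR address.
url = instant_query_url("http://prometheus.example:9090/", CPU_USAGE)
print(url)
```

The same PromQL expression could be pasted into a Grafana panel; building the URL by hand is only useful for scripts or ad hoc checks.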

● Real-time alarms

EasyMR monitors the operating metrics of the components and services in the cluster in real time, such as CPU, memory, disk, and read and write I/O. It supports SMS, DingTalk, and email alarm channels and integrates a variety of third-party message plug-ins. When a cluster service behaves abnormally and an alarm condition is triggered, the system notifies the recipients promptly.
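The alarm flow described here can be sketched as a set of threshold rules evaluated against a metric sample. The rule structure, metric names, and thresholds below are hypothetical and only illustrate the idea; they do not reflect EasyMR's internal rule format.

```python
import operator
from dataclasses import dataclass


@dataclass
class AlarmRule:
    metric: str        # hypothetical metric name
    comparator: str    # one of ">", ">=", "<", "<="
    threshold: float
    channels: tuple    # channel names to notify


_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def evaluate(rules, sample):
    """Return (channel, message) pairs for every rule the sample violates."""
    notifications = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # metric missing from this sample; nothing to compare
        if _OPS[rule.comparator](value, rule.threshold):
            message = f"{rule.metric}={value} {rule.comparator} {rule.threshold}"
            notifications.extend((channel, message) for channel in rule.channels)
    return notifications


rules = [
    AlarmRule("cpu_usage_pct", ">", 90.0, ("sms", "dingtalk")),
    AlarmRule("disk_free_pct", "<", 10.0, ("email",)),
]
alerts = evaluate(rules, {"cpu_usage_pct": 95.5, "disk_free_pct": 40.0})
print(alerts)
```

In this sample only the CPU rule fires, so the SMS and DingTalk channels each receive one message and the email channel receives none.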

● Strong scalability

The self-developed Easyagent Server abstracts seven major REST interfaces (install, start, stop, update, modify configuration, uninstall, and execute) for interacting with upper-level applications, so that agent types and functions can be extended easily and without limit.
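One way to picture this abstraction is a fixed set of seven operations with handlers registered per agent type. The sketch below is a hypothetical illustration in Python; the class names, the registry design, and the `hdfs` handler are assumptions and are not Easyagent Server's real API.

```python
from enum import Enum


class Operation(Enum):
    """The seven operations named in the text."""
    INSTALL = "install"
    START = "start"
    STOP = "stop"
    UPDATE = "update"
    CONFIGURE = "configure"
    UNINSTALL = "uninstall"
    EXECUTE = "execute"


class AgentRegistry:
    """Maps (agent type, operation) pairs to handler functions."""

    def __init__(self):
        self._handlers = {}

    def register(self, agent_type, operation):
        def decorator(func):
            self._handlers[(agent_type, operation)] = func
            return func
        return decorator

    def dispatch(self, agent_type, operation, **params):
        try:
            handler = self._handlers[(agent_type, operation)]
        except KeyError:
            raise LookupError(
                f"{agent_type} does not support {operation.value}"
            ) from None
        return handler(**params)


registry = AgentRegistry()


# Hypothetical handler: adding a new agent type only means registering handlers.
@registry.register("hdfs", Operation.START)
def start_hdfs(host):
    return f"starting hdfs on {host}"


result = registry.dispatch("hdfs", Operation.START, host="node-1")
print(result)
```

Because callers only see the fixed operation set, new agent types can be added without changing the upper-level application, which is the kind of extensibility the paragraph describes.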

● Safe and stable

Data security and product security are key issues for any big data product. By design, EasyMR filters out dangerous command lines such as rm and drop to prevent accidental damage to data and to ensure that such commands are executed in a safer way. In addition, rolling service restarts and automatic product restarts after a power failure keep services running during O&M and save O&M time.
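A command filter of this kind can be sketched as a deny-list check on the first word of a command. This is a simplified illustration, assuming the filtered commands are along the lines of `rm` and `drop`; the deny-list contents and the function name are hypothetical, and the source does not describe EasyMR's actual filtering logic.

```python
import shlex

# Hypothetical deny-list; a real filter would likely be more extensive.
BLOCKED_COMMANDS = {"rm", "drop"}


def is_blocked(command):
    """Return True if the command's first word is on the deny-list."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess what was meant.
        return True
    return bool(tokens) and tokens[0].lower() in BLOCKED_COMMANDS


checks = {
    cmd: is_blocked(cmd)
    for cmd in ("rm -rf /data", "DROP TABLE users;", "ls -l /data")
}
print(checks)
```

Checking only the first word is deliberately simple: it would not catch `sudo rm` or commands chained with `&&`, so a production filter would need proper parsing of the shell or SQL input.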

02

Rich big data components for a solid data base

EasyMR supports building big data clusters on Hadoop 2.8.5 and Hadoop 3.2.1 and offers a wealth of big data components, which users can select according to their business needs.

So which big data components does EasyMR support?

● YARN

Version support:

· YARN for Hadoop 2.8.5 and 3.2.1

YARN is the Hadoop resource scheduler, responsible for managing and scheduling the resources (CPU and memory) of the entire Hadoop cluster.

● HDFS

Version support:

· HDFS for Hadoop 2.8.5 and 3.2.1

HDFS, the Hadoop Distributed File System, is one of the three basic components of Hadoop. It mainly handles adding, deleting, modifying, querying, and splitting data in big data scenarios.

● Flink

Version support:

· Flink 1.12

A distributed, open-source computing framework for data stream processing and batch data processing.

● Spark

Version support:

· Spark 2.4.8

A new-generation distributed, open-source, in-memory big data framework that supports offline and real-time computing, SQL syntax, and machine learning.

EasyMR has enhanced the SQL DDL capabilities of the open source component and supports the Add Column syntax.

● Hive

Version support:

· Hive 2.3.8

· Hive 3.1.2

An offline data processing system built on Hadoop. It provides structured table data management on top of HDFS and a SQL-like query syntax for data analysis and processing.

● Trino

Version support:

· Trino 0.359

Distributed SQL query engine for high-speed, real-time data analysis.

EasyMR has enhanced Trino’s connector mechanism to support dynamic loading of connectors, and has extended the community connectors with support for the Transwarp Inceptor plugin.

● HBase

Version support:

· HBase 1.3.5

· HBase 2.3.4

A highly reliable, high-performance, column-oriented, scalable, real-time read-write distributed database.

● ZooKeeper

Version support:

· ZooKeeper 3.6.2

A coordination service for distributed applications. Applications can build synchronization, configuration maintenance, and naming services on top of it. It provides a reliable, scalable, distributed, and configurable coordination mechanism that keeps the system state consistent across a distributed cluster.

03

Let’s install and deploy together

Simplicity and ease of use are another major advantage of EasyMR. EasyMR aims not only to help enterprises integrate multi-source data and analyze their full data efficiently, but also to lower the barrier to using the platform, so that the difficulty of getting started does not hold back an enterprise's digital transformation.

So, join us in installing and deploying a big data product!

01

Create a cluster

EasyMR supports unified management of multiple clusters.

Host clusters can be created on physical machines or virtual machines.

“Add Host” adds a host node through either account access or command-line access.

02

Upload the component installation package

Select an existing component installation package on the platform to install and deploy, or upload your own component installation package via local upload or network upload.

03

Rapid and automated deployment with one click

EasyMR supports manual deployment of a single package and automatic deployment of multiple packages.

In automatic deployment, the component deployment process is defined by uploading a product line. The platform parses and filters the relevant components, arranges resources automatically according to the defined host roles, and deploys the components in sequence according to their dependencies, which greatly reduces the time spent on O&M deployment and resource configuration.

During installation, you can follow the service deployment progress and view the deployment logs in real time, so the deployment status is clear at a glance.

04

24/7 real-time monitoring and alarms

By integrating the open source Prometheus and Grafana, EasyMR monitors the core parameters of clusters, services, and nodes and displays the data on a flexible dashboard. Monitored parameters include CPU usage, RAM usage, disk space, and I/O read and write rates, so the running status of clusters, services, and nodes can be followed in real time and the O&M failure rate reduced. Users can also build their own dashboards and define custom monitoring items.

05

Set alarm rules

The platform provides five channel types: SMS, mail, DingTalk, enterprise WeChat, and custom. Users select the channel that suits their needs and fill in the channel configuration information, message template, address, and so on to complete the channel configuration.
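The relationship between a channel's address and its message template can be sketched as follows. The channel type identifiers, the dictionary layout, and the template placeholders are hypothetical and chosen only for illustration; EasyMR's real configuration fields may differ.

```python
from string import Template

# The five channel types named in the text, as hypothetical identifiers.
CHANNEL_TYPES = ("sms", "mail", "dingtalk", "wechat_work", "custom")


def build_channel(channel_type, address, template):
    """Validate the channel type and bundle its address and message template."""
    if channel_type not in CHANNEL_TYPES:
        raise ValueError(f"unknown channel type: {channel_type}")
    return {"type": channel_type, "address": address, "template": Template(template)}


def render(channel, **fields):
    """Fill the channel's message template; raises KeyError if a field is missing."""
    return channel["template"].substitute(**fields)


channel = build_channel(
    "mail", "ops@example.com", "[$level] $service on $host: $detail"
)
text = render(
    channel, level="WARN", service="hdfs", host="node-1",
    detail="disk free below 10%",
)
print(text)
```

Using `substitute` rather than `safe_substitute` makes a missing field fail loudly, which is usually preferable for alarm messages that must not go out half-filled.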

04

Closing words

EasyMR is exactly such a simple, easy-to-use, and efficient big data basic tool, covering a variety of application scenarios such as enterprise service monitoring and O&M, component upgrade and rollback, offline data analysis, and streaming data processing.

In the future, EasyMR will keep to independent innovation and continuous evolution, and will bring its accumulated big data practice experience to more enterprises.

This article has briefly introduced the main features and deployment methods of EasyMR.