When I first saw a requirement to export data from Elasticsearch, my reaction was: why would you want to export it?

Wouldn't it be enough to write the data straight to a file in the required format, such as CSV, at ingestion time?

In fact, the real business scenario turned out to be quite different from what I had assumed.

Elasticsearch serves as both a data store and a retrieval engine, and the data sources that feed it have long covered almost everything.

This is shown in the following figure:

Data from relational databases (MySQL, Oracle, PostgreSQL), non-relational databases (MongoDB), big data engines (Kafka, Spark, Hadoop, HBase, Flink), and in-memory databases (Redis) can all be imported into Elasticsearch.

The raw data is usually preprocessed through ETL (extract, transform, load) before it is written to Elasticsearch, and the data needed for core retrieval is what ends up stored there.

In certain business scenarios (banking, for example), data must be exported from Elasticsearch, and what is actually exported is data that has already been preprocessed and cleaned.

So, what’s the problem? How do I export it?

We take CSV as the example export format.

There are many ways for Elasticsearch to export data, including but not limited to:

Let’s demonstrate each one with Elasticsearch 8.X.

The results are as follows:

Generate the CSV file as follows:

Common error messages:

Solution: enable ssl, which defaults to false. On 8.X it must be enabled manually.

Parameter meaning:

Screenshot of the tool export implementation:

There are many similar tools; this one is shown as an example so that everyone can try it out.

A one-minute video is enough to demonstrate it.

The video is as follows.

A simple Python program is implemented below.

The logic is a simple three-step process:

This is just a simple from + size traversal; for larger volumes of data, it can be changed to a scroll-based implementation.
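As a rough illustration of the from + size traversal, here is a minimal sketch. It assumes a client object whose `search` method accepts `index`, `query`, `from_`, and `size` keyword arguments and returns a response shaped like `resp["hits"]["hits"]`, which is how the official Python client for 8.X behaves. The function name `export_index_to_csv` and its parameters are hypothetical, chosen only for this example. The `max_docs` default of 10000 reflects the default `index.max_result_window` limit on from + size.

```python
import csv


def export_index_to_csv(client, index, fieldnames, out, page_size=1000, max_docs=10000):
    """Page through `index` with from + size and write the chosen fields as CSV.

    `client` is assumed to expose search(index=..., query=..., from_=..., size=...)
    and to return {"hits": {"hits": [{"_source": {...}}, ...]}}.
    Returns the number of documents written.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()

    written = 0
    while written < max_docs:
        # Never request past max_docs, so from + size stays within the limit.
        size = min(page_size, max_docs - written)
        resp = client.search(
            index=index,
            query={"match_all": {}},
            from_=written,  # offset = number of documents already exported
            size=size,
        )
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            source = hit["_source"]
            # Missing fields become empty cells; unlisted fields are dropped.
            writer.writerow({name: source.get(name, "") for name in fieldnames})
        written += len(hits)
        if len(hits) < size:
            break  # a short page means there is nothing left to fetch
    return written
```

With a real cluster, `client` would be an `Elasticsearch(...)` instance and `out` a file opened with `newline=""`. A production export would normally also pass a sort so that page order is stable between requests.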

The result of exporting the CSV is as follows:

Explanation:

jq is a command-line JSON processor that is commonly used in shell scripts.

["regist_id", ****, "registration_number"] is the custom list of output fields, written as an array.

Details on jq usage can be found in the tutorial: https://stedolan.github.io/jq/tutorial/
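For readers more at home in Python than in jq, the same field-selection idea can be sketched as follows: pick a fixed list of fields from each hit, in order, and emit one CSV row per document. This is only an analogue of the jq filter, not a reproduction of it. The function name `hits_to_csv` is hypothetical, and only the two field names that appear in the text above are used; the elided middle fields are left out.

```python
import csv
import io
import json

# Field names taken from the text; the elided fields in between are omitted.
FIELDS = ["regist_id", "registration_number"]


def hits_to_csv(response_json, fields=FIELDS):
    """Turn a search response (as a JSON string) into CSV text, one row per hit."""
    response = json.loads(response_json)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        # Same idea as building an array of selected fields per document.
        writer.writerow([source.get(name, "") for name in fields])
    return out.getvalue()
```

The order of the names in `FIELDS` determines the column order of the output, just as the order of the array elements does in the jq version.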

The CSV exported by the shell script is as follows:

There are many ways to export data from Elasticsearch, and this article is only a starting point meant to prompt better ideas.

How do you choose an export approach?

It depends on your business needs. If you don't want to write code, you can rely on a third-party tool.

If you want to stay within the ELK stack, Logstash is recommended.

If you need your own targeted implementation, you can write either a Python script or a shell script.

If you know of other solutions, feel free to leave a comment and share them.