Elasticsearch Cluster Migration Made Easy with Elasticdump


October 03, 2023

Our lives would be easier if all deployments took place in the cloud and all components were connected by a managed network. But real life is more complex, especially when dealing with hybrid environments (a combination of on-prem and cloud environments).

So, you’ve decided to use Elasticsearch to manage your logs so that you can find relevant information quickly and easily. That’s great!

But what should you do if you can’t send your logs to one central place in the cloud? 

How do you send logs from an offline environment, with no open network port, to your developers? Or maybe you are simply looking for a migration tool between two or more Elasticsearch clusters?

If this is your case, this article has got you covered. Let’s dive in.

This article explains how to handle migration between multiple Elasticsearch clusters in the most efficient and straightforward way.

Should we start by mapping logs to files?

Some deployments map log files from running containers to local folders on their production servers and then copy them to a portable drive and send them to developers.

But in this case Elasticsearch is not used, and we lose the important metadata Elasticsearch collects for each log record, such as the container name, the image name and tag, the host IP, etc.

Also, it would mean sending large amounts of data every time.

Mapping logs to files creates a separate management system that will require your time and resources. Although it may be your go-to solution, this might not be the most efficient way to migrate your logs.

Time is money

We do not want to waste time managing logs, mapping and collecting log files (from many servers…), and then spending additional time sending them to developers.

We want our product to be managed in the fastest and most efficient way: logs reach developers quickly, issues and bugs get investigated, and fixes are made, so that customers stay satisfied.

Finding the right tool

There are several tools that allow us to perform a migration between Elasticsearch clusters.

Logstash, Snapshots, the Elasticsearch Migration Tool, and Elasticdump.

I found Elasticdump to be a great option, simpler and faster than the others, thanks to the following advantages:

✅ It preserves the Elasticsearch structure, along with all of its indexes.

✅ We can collect all the data from one main location.

✅ We can filter the logs and send only the relevant data, such as container name, namespace, timestamp, etc.

✅ We can easily encapsulate the command within a script, trigger it on a schedule or by any other automation method you prefer, and run it as part of the deployment for a fully automated solution (see the sketch after this list).

✅ It can work as part of an offline system.
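For example, the export command can live in a small wrapper script that runs on a schedule. The sketch below uses placeholders throughout: the host, index name, output directory, and script name (export_logs.sh) are all hypothetical and should be adapted to your setup.

#!/usr/bin/env bash
# export_logs.sh: sketch of a scheduled Elasticdump export wrapper.
# Host, index, and paths below are placeholder assumptions.
set -euo pipefail

SOURCE_ES="http://source_es_server:9200"   # source cluster
INDEX="source_index"                       # index to export
OUTPUT_DIR="/path/to/dumps"                # where dump files are collected

mkdir -p "$OUTPUT_DIR"

# Dump the index documents into a dated JSON file.
elasticdump \
  --input="${SOURCE_ES}/${INDEX}" \
  --output="${OUTPUT_DIR}/${INDEX}-$(date +%F).json" \
  --type=data

A crontab entry such as 0 2 * * * /path/to/export_logs.sh would then produce a fresh dump every night at 02:00.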

Let’s explore how the Elasticdump tool works

We will describe two sides of the migration – the input and the output.

The input is the Elasticsearch URL or file from which we want to take the migration data, and the output is the Elasticsearch URL or file where the data should be saved.

The base command is
elasticdump \
  --input=http://source_es_server:9200/source_index \
  --output=http://destination_es_server:9200/destination_index \
  --type=data
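The --type flag controls what gets copied. Besides data (the documents themselves), Elasticdump also supports other types such as mapping, and copying the mapping before the data ensures the destination index is created with the same field types. A minimal sketch, reusing the placeholder hosts from the base command above:

elasticdump \
  --input=http://source_es_server:9200/source_index \
  --output=http://destination_es_server:9200/destination_index \
  --type=mapping

elasticdump \
  --input=http://source_es_server:9200/source_index \
  --output=http://destination_es_server:9200/destination_index \
  --type=data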

 

Add filters

Would you like to include filters? Add the --searchBody flag.

The search body can contain any filters you would use in a regular Elasticsearch query. In the example below, I added a time range and a namespace filter to capture all logs from a specific namespace within a designated timeframe.

--searchBody='{
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "@timestamp": {
                  "gte": "'"$timestamp_start"'",
                  "lte": "'"$timestamp_end"'",
                  "format": "MMM dd, yyyy HH:mm:ss.SSS"
                }
              }
            },
            {
              "term": {
                "kubernetes.namespace": "'"$namespace"'"
              }
            }
          ]
        }
      }
    }'
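Note that the '"$var"' pattern closes the single-quoted JSON, injects the shell variable, and reopens the quote, so the variables must be defined before the command runs. For example (the values below are placeholders, and the timestamps must match the format declared in the query):

timestamp_start="Oct 01, 2023 00:00:00.000"
timestamp_end="Oct 02, 2023 00:00:00.000"
namespace="my-namespace"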

 

Working in an offline environment

We talked about an offline environment, right?

Let’s see how Elasticdump works by saving data to files.

Saving to file

Set --output to your file path.

elasticdump \
  --input=http://source_es_server:9200/source_index \
  --output=/path/to/output_file.json \
  --type=data
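If the destination cluster does not already have this index, it can help to dump the mapping to a separate file as well, so the index structure can be recreated offline. A sketch, where the mapping file path is an assumption:

elasticdump \
  --input=http://source_es_server:9200/source_index \
  --output=/path/to/output_mapping.json \
  --type=mapping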

 

Upload data from a file

Set --input to your file path.

elasticdump --input=/path/to/input_file.json \
--output="http://destination_es_server:9200/destination_index"

 

Example:

Let’s explore a full solution for an offline system.

On the source Elasticsearch cluster:

  1. Use the Elasticdump tool to save the data to a file.
  2. Relocate the dump file: copy it to the destination environment.

On the destination Elasticsearch cluster:

  1. Create an index template if needed.
  2. Use Elasticdump to upload the dump.
  3. Create a data view in Kibana, if needed, to be able to explore the dumped indexes.

Example bash script for each step

  1. Use the Elasticdump tool to save the data to a file.
elasticdump \
  --input=http://source_es_server:9200/source_index \
  --output=/path/to/output_file.json \
  --type=data \
  --searchBody='{
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "@timestamp": {
                  "gte": "'"$timestamp_start"'",
                  "lte": "'"$timestamp_end"'",
                  "format": "MMM dd, yyyy HH:mm:ss.SSS"
                }
              }
            },
            {
              "term": {
                "kubernetes.namespace": "'"$namespace"'"
              }
            }
          ]
        }
      }
    }'
  2. Relocate the dump file: copy /path/to/output_file.json to /path/to/input_file.json in your destination environment.
  3. Create an index template if needed. Run the command below; it will create the template if it does not already exist.
    • Replace the <PATTERN_NAME> according to your preference.
    • Change the lifecycle as required.
    • Set the environment variables beforehand:

ELASTIC_USERNAME, ELASTIC_PASSWORD, ELASTIC_URL

curl -XPUT -u $ELASTIC_USERNAME:$ELASTIC_PASSWORD -k "https://$ELASTIC_URL/_index_template/<PATTERN_NAME>" -H "Content-Type: application/json" -d '
    {
      "index_patterns": ["<PATTERN_NAME>-*-*"],
      "template": {
        "settings": {
          "index.lifecycle.name": "7-days-default"
        }
      }
    }'
  4. Upload the dump from the file at /path/to/input_file.json
elasticdump --input=/path/to/input_file.json \
   --output="http://destination_es_server:9200/destination_index"
  5. Create a data view in Kibana if it does not exist.

The environment variables to set beforehand:

KIBANA_USERNAME, KIBANA_PASSWORD, KIBANA_URL, DATA_VIEW_NAME, DATA_VIEW_TITLE

# Check if the data view already exists
    EXISTING_VIEW=$(curl -s -X GET "http://$KIBANA_USERNAME:$KIBANA_PASSWORD@${KIBANA_URL}/api/data_views" | jq '.data_view[] | select(.name == "'"${DATA_VIEW_NAME}"'")')


    if [[ -z "${EXISTING_VIEW}" ]]; then
      # Data view doesn't exist, create it
      curl -X POST "http://$KIBANA_USERNAME:$KIBANA_PASSWORD@${KIBANA_URL}/api/data_views/data_view" -H 'kbn-xsrf: true' \
      -H "Content-Type: application/json" -d '
      {
        "data_view": {
          "title": "'"${DATA_VIEW_TITLE}-*"'",
          "name": "'"${DATA_VIEW_NAME}"'"
        }
      }'
    else
      echo "Data view with name ${DATA_VIEW_NAME} already exists."
    fi
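
Once the import finishes, a quick sanity check is to list the migrated indexes on the destination cluster. This reuses the ELASTIC_* variables from the index template step; destination_index is a placeholder for your index or index pattern:

curl -s -u $ELASTIC_USERNAME:$ELASTIC_PASSWORD -k "https://$ELASTIC_URL/_cat/indices/destination_index?v"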

 

We looked at the Elasticdump tool for migrating between Elasticsearch clusters in both online and offline systems.
Even if your deployment is offline, you can still do it! 🙂
