Working with Elasticsearch Pipelines

When you need to modify your indexed documents in a reliable way, Elasticsearch pipelines are the solution your research will most likely turn up.

In this article, I will give you a taste of this powerful, easy-to-use feature of the ingest node, plus a guide on how to use it.

I will be using Elasticsearch 6.6.

If you are on any other version, please check the features presented here against the official Elasticsearch documentation for that version.

What is an Ingest Node Pipeline?

I like to define a pipeline as the middle processor between your original document and the document that you want to have after the “transformation”.

Mainly, the pipeline definition has two fields:

The description field is where you describe what the pipeline does.

The processors (plural) field is a list of the processors that you want to invoke to modify the document. It is important to note that the processors are executed in the order in which you define them.
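
To make the ordering concrete, here is a minimal sketch of a pipeline body with two processors. The field names first_name, last_name, and full_name are made up for illustration. The set processor builds full_name from a template, and only afterwards does the remove processor drop first_name. With the order reversed, the template would no longer have a first_name to read.

```json
{
  "description": "build full_name, then drop first_name",
  "processors": [
    {
      "set": {
        "field": "full_name",
        "value": "{{first_name}} {{last_name}}"
      }
    },
    {
      "remove": {
        "field": "first_name"
      }
    }
  ]
}
```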

Below, I will present some cool examples, but always consult the documentation to have at least an idea of what is available for your Elasticsearch version.

Creating a pipeline

To create a pipeline, you only need to send a PUT request to the ingest pipeline API.

PUT _ingest/pipeline/my-new-pipeline
{
  "description" : "that is my new pipeline",
  "processors" : [
    {
      // DESIRED PROCESSOR OBJECT IN HERE ...
    }
  ]
}
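
Once the pipeline exists, it still has to be referenced for anything to happen. A minimal sketch, assuming an index called my-index (a made-up name): pass the pipeline name in the pipeline query parameter when indexing a document.

PUT my-index/_doc/1?pipeline=my-new-pipeline

The body is simply the document to index, for example:

```json
{
  "first_name": "Ada",
  "last_name": "Lovelace"
}
```

The same pipeline parameter is accepted by POST my-index/_update_by_query, which is one way to run a pipeline over documents that are already indexed.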

Adding a new field to your document

To set up a new field in the document, you can invoke the set processor.

This processor only needs the desired field name and value, for example:

{
  "set": {
    "field": "full_name",
    "value": "my full name"
  }
}
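
By default, the set processor overwrites a value that is already there. As a small sketch, setting override to false leaves an existing non-null value untouched and only fills the field in when it is missing. The value can also be a template that reads other fields. Here first_name is an assumed field name.

```json
{
  "set": {
    "field": "full_name",
    "value": "{{first_name}}",
    "override": false
  }
}
```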

Convert a JSON string into a JSON object

Suppose a field in your document holds a JSON-formatted string. You can convert it into a JSON object with the json processor.

{
  "json" : {
    "field" : "my_string_json_field",
    "target_field" : "new_field_json_object"
  }
}

If you would like to add the parsed JSON object to the root of the document instead, set the add_to_root flag to true. You can define either target_field or add_to_root for this processor, but not both.
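
For comparison, here is a sketch of the add_to_root variant, reusing the field name from above. Assuming my_string_json_field holds a string such as {"city": "Lisbon"} (an invented value), the parsed key city should end up at the top level of the document rather than under a target field. Note that target_field is left out.

```json
{
  "json": {
    "field": "my_string_json_field",
    "add_to_root": true
  }
}
```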

Removing a field from the document

Removing a field from your document is as simple as adding one (shown above). Just try something like:

{
  "remove": {
    "field": "name"
  }
}

Using the Script Processor

This is by far my favorite processor!

Using the Painless scripting language, you have extreme flexibility to manipulate your documents as you please.

For instance, you can check whether a field already exists and, depending on the result, either create it or increment its value:

{
  "description": "test pipeline",
  "version": 1,
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.batch_number == null) {
            ctx.batch_number = 1;
          } else {
            ctx.batch_number = ctx.batch_number + 1;
          }
        """
      }
    }
  ]
}

You may be wondering what those triple quotes in the JSON object are. They are simply a nice feature of the Kibana Dev Tools Console. JSON itself does not support that syntax, so in the end the script is sent as an ordinary inline JSON string.
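
Outside of the Console, for example when sending the request with curl, triple quotes are not available. Here is a sketch of the same script processor in plain JSON, with the script collapsed onto a single line:

```json
{
  "script": {
    "source": "if (ctx.batch_number == null) { ctx.batch_number = 1; } else { ctx.batch_number = ctx.batch_number + 1; }"
  }
}
```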

To access the document fields, you can use the syntax ctx.field.

Even the metadata of your document can be manipulated, so instructions like ctx._index = "new_index" are totally valid!
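
As an illustration of changing metadata, the sketch below sends a document to a different index. The params object is how the script processor passes values into the script. The name target_index is picked just for this example.

```json
{
  "script": {
    "lang": "painless",
    "source": "ctx._index = params.target_index",
    "params": {
      "target_index": "new_index"
    }
  }
}
```

Keeping the index name in params rather than hard-coding it in the source means the script text stays the same when the target changes.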

Simulating a pipeline result

If you are not completely sure what results to expect from your pipeline, the simulate pipeline API is there to help.

POST _ingest/pipeline/_simulate

The body of this POST request is built around the pipeline and docs fields.

After defining the processors you want under pipeline, provide one or more sample documents under the docs field. These are the documents that will be run through the pipeline.

Example:

{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "set" : {
          "field" : "foo",
          "value" : "bar"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "_doc",
      "_id": "1",
      "_source": {
        "firstname": "My Name"
      }
    },
    {
      "_index": "index",
      "_type": "_doc",
      "_id": "2",
      "_source": {
        "firstname": "Another Name"
      }
    }
  ]
}
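
The simulate API can also test a pipeline that is already stored, so you do not have to repeat its definition. A sketch, assuming a stored pipeline named my-new-pipeline: put the pipeline name in the path and send only the docs field.

POST _ingest/pipeline/my-new-pipeline/_simulate

```json
{
  "docs": [
    {
      "_index": "index",
      "_type": "_doc",
      "_id": "1",
      "_source": {
        "firstname": "My Name"
      }
    }
  ]
}
```

Adding ?verbose to either form of the request reports the document after each processor, which helps when a pipeline has several steps.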

As you may have noticed, the possibilities for changing your documents are pretty extensive!

Give yourself a chance and try out these examples and the others available in the documentation.

Always remember that you don’t need to run a pipeline directly against your index: the simulate API is a good way to confirm that you get the results you need.