Building time-based Elasticsearch indices to perform downtime-free maintenance operations

anusha morappanavar
Tripping Engineering
Aug 28, 2018


Zero-downtime maintenance of Elasticsearch is no piece of cake, but it is achievable using time-based indices together with the “Hot-Warm” architecture recommended in Elastic’s blog post.

At Tripping, we use index aliases to add an abstraction layer between the Elasticsearch indices and our application. With index aliases, we don’t need to know the actual index names for search operations, and we can transparently change the indices referenced by an alias without impacting our users. Aliases can be added to or removed from indices in a single atomic API call, which facilitates zero-downtime maintenance. This is an incredibly useful feature with two main benefits: long-term data management and structural maintenance.
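As a minimal sketch of how this indirection looks in practice (the index and alias names here are hypothetical), the request body for Elasticsearch’s `POST /_aliases` endpoint that binds an alias to a concrete index can be built as a plain dictionary:

```python
# Hypothetical names: the application only ever knows "my_search_alias";
# the concrete time-based index name stays an operational detail.
add_alias = {
    "actions": [
        {
            "add": {
                "index": "listings-prod-2018-08-28-02-00",
                "alias": "my_search_alias",
            }
        }
    ]
}

# Searches then target the alias (GET /my_search_alias/_search), so the
# underlying index can be swapped without touching application code.
```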

The “Hot-Warm” architecture provides separation of concerns: hot nodes, configured with large disk space, are responsible for indexing, while warm nodes, configured with snappier hardware, are responsible for searching. The cluster needs to know which data nodes are hot and which are warm, and this is achieved by assigning arbitrary attributes to each node. We tag each data node by setting the following in elasticsearch.yml:

node.attr.temperature: hot
OR
node.attr.temperature: warm

Time-based indices can be built using another useful feature called index templates. Templates allow us to configure the mappings and settings of indices that haven’t been created yet. This ensures that any new index whose name matches the template’s pattern is created on demand and configured with the right mappings and settings. Example:

The index template pattern looks like:
listings-<ENV>-*
And a newly built time-based index name looks like:
listings-<ENV>-YYYY-MM-DD-HH-MM
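The timestamp suffix can be generated at build time with standard date formatting; here is a small sketch following the naming scheme above (the environment value and function name are hypothetical):

```python
from datetime import datetime

def time_based_index_name(env, now=None):
    """Build an index name like listings-<ENV>-YYYY-MM-DD-HH-MM."""
    now = now or datetime.utcnow()
    return "listings-{}-{}".format(env, now.strftime("%Y-%m-%d-%H-%M"))

name = time_based_index_name("prod", datetime(2018, 8, 28, 2, 0))
# → "listings-prod-2018-08-28-02-00"
```

Because the build time is embedded in the name, every daily index sorts chronologically and is self-describing.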

We assign every newly built index to the hot nodes by using the following in the index template’s settings:

"index.routing.allocation.require.temperature" : "hot"
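Putting the pattern and the routing setting together, a template body might look like the following sketch (legacy `PUT /_template/<name>` syntax of the Elasticsearch 6.x era; the mapping field is a hypothetical placeholder):

```python
# Sketch of an index-template body that matches the time-based names
# and pins every newly created index to the hot nodes.
listings_template = {
    "index_patterns": ["listings-prod-*"],
    "settings": {
        "index.routing.allocation.require.temperature": "hot"
    },
    "mappings": {
        "_doc": {
            "properties": {
                "title": {"type": "text"}  # hypothetical example field
            }
        }
    },
}
```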

A scheduled Jenkins job kicks off the indexing process daily; once it completes, we have a brand-new index, built from scratch and populated with the latest inventory. We run a data health check to ensure the index is ready to serve search traffic, and then perform the following steps:

1. Migrate the new index to the warm nodes by updating the setting:

PUT /new_index_name/_settings
{
  "index.routing.allocation.require.temperature": "warm"
}

2. Toggle the search alias from the old index to the new one:

POST /_aliases
{
  "actions": [
    {
      "remove": {
        "index": "old_index_name",
        "alias": "my_search_alias"
      }
    },
    {
      "add": {
        "index": "new_index_name",
        "alias": "my_search_alias"
      }
    }
  ]
}

3. Migrate the old index back to the hot nodes by updating the setting:

PUT /old_index_name/_settings
{
  "index.routing.allocation.require.temperature": "hot"
}
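The three steps above amount to three small HTTP calls; as a sketch, the request bodies can be built from two plain helpers (index and alias names are hypothetical, as in the snippets above):

```python
def allocation_setting(temperature):
    """Body for PUT /<index>/_settings to move an index between tiers."""
    return {"index.routing.allocation.require.temperature": temperature}

def alias_swap(old_index, new_index, alias):
    """Body for POST /_aliases: removes and adds the alias atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Step 1: PUT /new_index_name/_settings  with allocation_setting("warm")
# Step 2: POST /_aliases                 with alias_swap(old, new, "my_search_alias")
# Step 3: PUT /old_index_name/_settings  with allocation_setting("hot")
```

Because step 2 is a single atomic call, searches never see a moment where the alias points at no index at all.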

Since we build indices from scratch, DELETE operations on documents are not needed, which helps keep the GC happy. We keep older indices on the hot nodes, where they serve as backup indices if needed; when the time comes to reduce the total data in the cluster, we delete the stale indices without any downtime and reclaim the disk space.
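Because each index name embeds its build time, picking stale indices to delete is just a matter of parsing the suffix; a sketch under assumed names and an assumed retention window (the seven-day cutoff here is made up for illustration):

```python
from datetime import datetime, timedelta

def stale_indices(index_names, prefix, keep_days=7, now=None):
    """Return index names older than `keep_days`, judged by the
    YYYY-MM-DD-HH-MM suffix of time-based index names."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=keep_days)
    stale = []
    for name in index_names:
        stamp = name[len(prefix) + 1:]  # strip the "prefix-" part
        built = datetime.strptime(stamp, "%Y-%m-%d-%H-%M")
        if built < cutoff:
            stale.append(name)          # candidate for DELETE /<name>
    return stale

names = ["listings-prod-2018-08-28-02-00", "listings-prod-2018-08-01-02-00"]
stale = stale_indices(names, "listings-prod", keep_days=7,
                      now=datetime(2018, 8, 29))
# → ["listings-prod-2018-08-01-02-00"]
```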

With the “Hot-Warm” architecture, index templates, and index aliases in place, we can now perform routine reindexing operations to ingest fresh data into our system. It is also easy to update index mappings to enable new queries, change the data model, or optimize existing data and queries. We have encountered situations in the past where we had to change or delete an existing field type, which required a complete reindex into a separate index created with the right mapping and led to unhappy times. But not anymore! We can now easily change index shard counts, the maximum result window size, and other index parameters to optimize cluster performance.
