Q&A with Britta Weber: Elasticsearch Group Berlin

By Arsenia Nikolaeva, Sr. Communications Manager EMEA
Tuesday, February 3, 2015 / 4 min read
Left: Britta Weber and Fyber’s Juan Vidal kicked off the presentation.
Upper Right: The event drew a packed house.
Lower Right: Britta joined Fyber’s systems engineer, Robert Gardam, for post-talk discussions.

Many companies have to deal with large volumes of requests and data, especially in a high-traffic mobile environment, and the question arises of how to sort and make sense of that data in as close to real time as possible. Open source tools such as Elasticsearch significantly help with this challenge. Fyber was excited to host a meetup of the Elasticsearch User Group Berlin organised by @asquera and have the very knowledgeable Britta Weber give a talk entitled “Making sense of your logs with the ELK stack”. She spoke about Elasticsearch, Logstash, and Kibana in her presentation and provided us with lots of great advice and examples of practical applications.

We caught up with Britta after the talk to explore a few questions on our minds, such as: How can Elasticsearch, Logstash, and Kibana help your work with big data? What are some best practices for an administrator working with Elasticsearch? The ideas and opinions expressed in this interview are Britta’s alone and not Elasticsearch’s.

What determines a log retention period? At Fyber, we work with large quantities of data and split our logs by the hour. Each hour amounts to about 60-70GB. We index a lot, but what do you suggest?
The length of the retention period depends on company policy – and even laws – so it’s a decision you have to make for yourself. With this amount of data, it sometimes makes sense to index on a strong machine and move the index to a less powerful one once the indexing is done. If you need to keep the index around for a long time, but not necessarily search on it, you can close the index and store it somewhere. This reduces the need to keep all indices online. Your company policy and hardware constraints will determine what’s best for your company or project.
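
To make the close-and-keep approach concrete, here is a minimal sketch using the Python elasticsearch client; the hourly index name is hypothetical and would follow whatever naming scheme your Logstash setup uses.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Hypothetical hourly index name; adjust to your own naming scheme.
old_index = "logs-2015.02.03.14"

# Flush pending changes to disk, then close the index. A closed index
# stays on disk but is dropped from memory; it can be reopened later
# with es.indices.open() if you ever need to search it again.
es.indices.flush(index=old_index)
es.indices.close(index=old_index)
```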

How do you size an Elasticsearch cluster for logs?
This goes hand in hand with how you manage indices, and it can be a hard problem. First you need to know what you have to support: do you need more indexing speed, or will you be serving lots of queries? What are the constraints? For example, how long is a query allowed to take? Should it finish within 10 milliseconds, or can it take a minute? Some people just want to check summary statistics every morning, and then it’s fine for a query to run for an hour.

How big is a shard allowed to be for querying?
You need to run tests to see what works best for your situation. There’s a technique you can use to determine this: start off with one index and one shard and begin indexing into it, measuring your query performance as you go. Eventually query latency will exceed what you are willing to accept – that is the maximum size a shard is allowed to reach. Then look at how many lines/documents you expect per index, and you’ll know how many shards you need per index. Just remember that you can’t split indices once they are created. The good thing is you can always add another index and start indexing into the new one.
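
As a rough sketch of that experiment – assuming the Python elasticsearch client, a placeholder document shape, and a hypothetical latency budget – the loop below keeps filling a single-shard index until a representative query takes too long:

```python
import time
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://localhost:9200"])

# One index, one shard, no replicas: the baseline for the test.
es.indices.create(index="shard-test", body={
    "settings": {"number_of_shards": 1, "number_of_replicas": 0}})

LATENCY_BUDGET_MS = 100  # hypothetical limit; use your real requirement
BATCH = 10000
docs = 0

while True:
    # Index another batch of placeholder log documents.
    bulk(es, ({"_index": "shard-test", "_source": {"msg": "log line %d" % i}}
              for i in range(docs, docs + BATCH)))
    docs += BATCH
    es.indices.refresh(index="shard-test")

    # Time a query that is representative of your real workload.
    start = time.time()
    es.search(index="shard-test",
              body={"query": {"match": {"msg": "line"}}})
    elapsed_ms = (time.time() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        print("Budget exceeded at roughly %d documents per shard" % docs)
        break
```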

If a query comes in and the data is only on one shard, it will only ever run on that shard. Parallelization is achieved only by splitting the index into shards. If you have an extremely large number of parallel queries, it may be worth increasing the number of replicas and adding hardware. The best performance is achieved with one shard per node, but that’s not always necessary.
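
Unlike the shard count, the replica count is a dynamic setting, so scaling out for query throughput can look roughly like this (index name hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Raise the replica count on a live index; given enough nodes, more
# copies of each shard become available to answer queries in parallel.
es.indices.put_settings(index="logs-2015.02.03",
                        body={"index": {"number_of_replicas": 2}})
```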

How should you manage upgrades to your Elasticsearch cluster?
The guides for Elasticsearch tell you how to do this.

How much time should you spend configuring index mappings?
A lot (laughs). If you have a lot of data, then it makes sense to think about which fields you actually need to analyze. If you are using Logstash, you may not need the “_all” field, or you may not need every field analyzed – if you can get rid of analysis on a field, super! The mapping also plays a huge role in the quality of the search results, so experimenting with different settings is often inevitable.
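
As one example of what such tuning looked like in the Elasticsearch 1.x mapping syntax current at the time (field names hypothetical), the template below disables the catch-all “_all” field and leaves analysis off fields that are only filtered or aggregated on:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# 1.x-era template for Logstash-style indices: no _all field, and
# not_analyzed strings for fields that are only filtered on.
es.indices.put_template(name="logs", body={
    "template": "logs-*",
    "mappings": {
        "logs": {
            "_all": {"enabled": False},
            "properties": {
                "status":  {"type": "string", "index": "not_analyzed"},
                "host":    {"type": "string", "index": "not_analyzed"},
                "message": {"type": "string"}  # full-text searched: analyzed
            }
        }
    }
})
```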

Is there a good reason to use aliases in the ELK stack? (indexing and searching)
Yes, you should always use aliases, always. The reason is that they simplify operations. For example, if you want to reindex, you can switch the alias easily because it’s an atomic operation – you can switch quickly without having to change everything that points to the indices.
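
A minimal sketch of that reindex-and-switch pattern with the Python client (index and alias names hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# After reindexing everything into logs-v2, swap the alias in one
# atomic request: no query ever sees a half-switched state.
es.indices.update_aliases(body={"actions": [
    {"remove": {"index": "logs-v1", "alias": "logs"}},
    {"add":    {"index": "logs-v2", "alias": "logs"}},
]})
```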

When should you separate the roles in your cluster (i.e., master from data nodes)?
In general, if you have a big cluster – for example, 10 nodes – it makes sense to create dedicated master nodes. The reason is that your master node has to be quick, it has to be up and running, and the more work the master does, the more unstable your whole cluster could be. So it makes sense to relieve the master of all the data-node work, like indexing. Always make sure the master does not run out of memory. The master node can be lightweight, but it’s important to make sure it can handle the size of the cluster state, which can grow when you have lots of indices with lots of mappings. The cluster state is shared throughout the cluster, and if the master isn’t able to hold it, it will run out of memory.
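
On the Elasticsearch versions current at the time (1.x), separating the roles came down to two settings in each node’s elasticsearch.yml, roughly like this:

```yaml
# Dedicated master node: eligible for election, holds no data
# and does no indexing.
node.master: true
node.data: false

# A pure data node would use the inverse:
# node.master: false
# node.data: true
```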

What key metrics should an Elasticsearch administrator look at when determining the health of the cluster?
Heap – always look at the heap. Make sure you don’t see the bad garbage-collection pattern, the one that looks like a saw-tooth – that’s a bad sign. Give your node more heap, but not more than 32GB. There’s a webinar by my colleagues called “Pre-flight checklist” that provides a lot of guidance on this. Check it out.
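
To watch the heap without a full monitoring stack, the nodes stats API exposes the relevant numbers; a quick sketch with the Python client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Sample per-node JVM stats; graphing heap_used_percent over time
# reveals the garbage-collection pattern Britta describes.
stats = es.nodes.stats(metric="jvm")
for node in stats["nodes"].values():
    mem = node["jvm"]["mem"]
    print("%s: %d%% of %d GB heap used" % (
        node["name"],
        mem["heap_used_percent"],
        mem["heap_max_in_bytes"] // (1024 ** 3)))
```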

What should an administrator do when the cluster comes up red?
Buy Elasticsearch support (laughs). When the cluster comes up red, look at the health of the cluster first: how many shards are active, how many are initializing, and how many are unassigned (especially after a reboot). If some are initializing, wait a minute, be patient, and don’t panic. Look at the heap and the logs – the logs will tell you a lot. From there on, it’s hard to give general advice, since the best course of action depends on what you find.
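
Those first checks translate to a couple of API calls; a minimal sketch with the Python client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Overall status plus the shard counts worth watching after a restart.
health = es.cluster.health()
print("%s: %d active, %d initializing, %d unassigned shards" % (
    health["status"], health["active_shards"],
    health["initializing_shards"], health["unassigned_shards"]))

# Per-shard detail: which shards are unassigned, and for which index.
print(es.cat.shards(v=True))
```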

How many logs could a logstash stash if a logstash could stash logs?
(Laughs) A cagillion!

Many thanks to Britta for taking the time to sit down with us, and to all those who made it out for the event! If you missed the talk, the presentation slides are available for download here.

Tags: Events