This talk covers various performance aspects to keep in mind along with a high-level introduction about Elasticsearch and the different gotchas in distributed systems.
This talk was held at the Software Performance Meetup Munich in December 2014.
Latency is zero. 3. Bandwidth is infinite. 4. The network is secure. 5. Topology doesn't change. 6. There is one administrator. 7. Transport cost is zero. 8. The network is homogeneous. by Peter Deutsch https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing Distributed systems
compared to single process applications hard to predict/test on many different layers But... performance? Application & Libraries Runtime environment (JVM) OS Hardware Network
threads Main memory: No limit Disk: SAN vs. local, SSD vs. spindle Bare metal vs. virtualization https://speakerdeck.com/elasticsearch/life-after-ec2 Hardware
NUMA http://engineering.linkedin.com/performance/optimizing-linux- memory-management-low-latency-high-throughput-databases http://queue.acm.org/detail.cfm?id=2513149 Don't swap out if you need performance! OOM killer: Just dont... Operating systems
pointers Serialize everything yourself JVM versions tend to be incompatible use server vm, allocate all memory on startup reduce thread stack size http://rdiyewar-tech.blogspot.de/2013/02/outofmemoryerror- because-of-default.html JVM tricks
of them Single thread pool does not fit all Solution: Dedicated thread pools, based on the amount of available CPUs and their task complexity JVM Threads
object creation reduces young gen promoting pressure -XX:CMSInitiatingOccupancyFraction=75 Elasticsearch: Long GCs can result in nodes dropping out of the cluster and master reelections and data shifting (often happens due to GC pressure) Improving garbage collection
data caching per segment FSTs Blazing fast in-memory structures, allow thousands of qps Allow for complex searches like prefix/fuzzy searches or intersections Lucene
bare-metal, AWS, number of nodes Long running tests Avoid hitting the wrong caches and missing the right ones Rate limit the right things/things right Create your own benchmark numbers Performance test requirements