Elasticsearch(ES) is a distributed, full-text document store, search engine which is based on Apache Lucene as a core library. ES was originally designed for rich text-based search with advanced features to support complex queries, filters, analyzers with multiple languages. ES indexes and stores data so that it can be retrieved/searched in near real-time.
Distributed: ES saves the data in multiple nodes, data can be retrieved from any node at any time
Highlights of Lucene core:
- It can return search responses quickly.
- Based on the inverted index data structure. (stores mapping from content such as words or numbers to its location in a document(s))
Possibilities on using ES as primary database:
It depends on a lot of factors, such as,
(One ongoing WIP from the elastic team is that they are constantly working towards improving the resiliency)
○ Size of document ○ Number of concurrent requests of read per second ○ Writes per second
ES always does best in read requests, as the name suggests it is indexed data store.
If we prefer to increase the write (aka indexing rate) it can be achieved in several ways. refresh_interval, flush_threshold_size are some of the key parameters to be considered while improving writes.
Not impossible, however, is it advisable to go? My take;
Below are the pointers out of my experience in using ES as a database in production.
It is not preferred to use ES as the primary database when you have (WPS) writes per second dominant reads (RPS).
ES is very helpful when one of your primary use cases is to read, visualize the data using the complex combination.
Best works for structured data set with less number of nested items.
To mitigate the concerns of write in elastic search, add another layer over ES, that can be REDIS or KAFKA buffer to queue the incoming data and to avoid data loss.
The main question is it worth adding more complexity to architecture? Which adds maintenance and other ad-hoc items related to the new services.
The use-cases which use this architecture are;
- Realtime logs storing systems
- Data capturing systems, where continuous ingestion of data from frontend applications.
And if your primary goal is to save JSON data and with the best performance in writes and reads, (considering all the points above in this section )please go ahead with mainstream NoSQL databases like MongoDB.
Invention and Discovery always being the part of human nature,
I always love the idea of extending the capabilities of any such machine with a pinch of salt and pepper to adapt to our complex use-cases in our daily. However, this should happen without killing the nature, purpose for which the tool is built since we might have straight forward alternative for any such use case.
Did you find this article valuable?
Support Jothimani Radhakrishnan by becoming a sponsor. Any amount is appreciated!