Each Query Engine runs inside a YARN container. Query Engine memory has two components: heap memory (Xmx) and off heap memory.
The Xmx memory is used for query processing, and the off heap memory is used for cuboid caching.
QE Container memory = Xmx + MaxDirectMemorySize
The container memory holds two types of cache.
Look Ahead cache (local): This cache is maintained in the Xmx memory of the Query Engine. It exists only for one session and is cleared once the request is served.
Global/Cuboid cache (off heap): This cache is maintained in the MaxDirectMemorySize memory of the Query Engine. To ensure higher query throughput and preserve optimal performance, cuboids are cached in off heap memory. The cache is populated as blocks are queried, and the blocks remain in memory until they are cleaned up. The purge thread cleans up the cache when the following conditions are met:
- The cache has consumed 100% of its configured memory.
- Some blocks are older than one day (the time to live).
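The purge check described above can be sketched roughly as follows. This is only an illustration, not the Kyvos implementation: the names `CachedBlock` and `should_purge` are hypothetical, and the sketch assumes that either condition on its own is enough to trigger a cleanup, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List

TTL_SECONDS = 24 * 60 * 60  # one day, the time to live mentioned in the text


@dataclass
class CachedBlock:
    """Hypothetical stand-in for one block held in the global cache."""
    size_bytes: int
    cached_at: float  # epoch seconds when the block was cached


def should_purge(blocks: List[CachedBlock], capacity_bytes: int, now: float) -> bool:
    """Return True when the cache is full or holds a block older than the TTL.

    Assumption: either condition alone triggers the purge.
    """
    used = sum(b.size_bytes for b in blocks)
    cache_full = used >= capacity_bytes
    has_expired = any(now - b.cached_at > TTL_SECONDS for b in blocks)
    return cache_full or has_expired
```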
Consider the following example.
Assume that 3 Query Engines are configured on the Kyvos cluster and that memory is configured as:
Max Memory (Xmx) = 20 GB
Off Heap Memory = 12 GB
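As a quick check, the container size implied by these settings follows from the formula QE Container memory = Xmx + MaxDirectMemorySize. The helper name `container_memory_gb` below is made up for illustration.

```python
def container_memory_gb(xmx_gb: float, max_direct_memory_gb: float) -> float:
    """QE container memory = Xmx + MaxDirectMemorySize."""
    return xmx_gb + max_direct_memory_gb


# Values from the example: Xmx = 20 GB, off heap = 12 GB.
print(container_memory_gb(20, 12))  # 32
```

Under these settings, each Query Engine container needs 32 GB.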
Assume there is a Sales cube with 300 cuboids, and a user request comes in that needs to query all 300 cuboids. One thread runs from the OLAP Engine to the Query Engines for this request. The number of such threads is set by THRIFT_WORKER_THREAD; the default value is 10, meaning 10 users can query simultaneously.
Memory used for each worker thread is calculated as:
Total_Xmx/THRIFT_WORKER_THREAD = 20 GB/10 = 2 GB (approximately 2000 MB)
This means each Query Engine has 2 GB available per request, and the look ahead cache is allocated from this share.
On a Query Engine, one thread is used for each cuboid. Assuming the 300 cuboids are distributed evenly across the 3 Query Engines, each QE handles 100 cuboids and therefore needs 100 threads.
If PARALLEL_THREAD_PER_QUERY is configured as 20, then 20 cuboids can be read in parallel for each request on each Query Engine. Once these are done, the next 20 cuboids are read, and so on.
Each of these threads gets 2 GB/20 = 100 MB, and 50% of that, which is 50 MB, can be used for the look ahead cache per cuboid.
If the block size is 1 MB (the default is 512 KB), then each thread can keep 50 MB/1 MB = 50 blocks/cuboids in the look ahead cache.
Here, the number of cuboids per QE is 100, but only 50 cuboids can be stored in the look ahead cache at a time, so the remaining 50 cuboids are missed. In this case, the off heap (global) cache comes into the picture and the result is searched for there. If the result is still not found, the OLAP engine reads from disk, which increases the I/O time.
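The arithmetic of this example can be reproduced with a short script. It is a sketch of the calculation as described in the text, not of how Kyvos computes it internally. The function name and the `look_ahead_fraction` parameter are illustrative, 20 GB is taken as 20,000 MB to match the rounding used above, and blocks and cuboids are treated as interchangeable, as in the text.

```python
def look_ahead_blocks(xmx_mb: float, worker_threads: int, parallel_threads: int,
                      block_size_mb: float, look_ahead_fraction: float = 0.5) -> int:
    """Blocks that fit in the look ahead cache of one cuboid thread."""
    per_request_mb = xmx_mb / worker_threads           # 20000 / 10 = 2000 MB
    per_thread_mb = per_request_mb / parallel_threads  # 2000 / 20 = 100 MB
    look_ahead_mb = per_thread_mb * look_ahead_fraction  # 50% of 100 = 50 MB
    return int(look_ahead_mb // block_size_mb)


blocks = look_ahead_blocks(xmx_mb=20_000, worker_threads=10,
                           parallel_threads=20, block_size_mb=1)
cuboids_per_qe = 300 // 3              # 300 cuboids spread over 3 Query Engines
missed = max(cuboids_per_qe - blocks, 0)

print(blocks)          # 50
print(cuboids_per_qe)  # 100
print(missed)          # 50
```

With the default block size of 512 KB (`block_size_mb=0.5`), the same budget would hold 100 blocks.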
Hence, you must ensure that the caches are fully utilized and tweak the memory configuration if required.
If you increase the thrift worker threads for higher concurrency, you also need to increase memory, because the look ahead capacity available to each thread decreases. Decide these values as per your use case.
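To see how the worker thread count affects look ahead capacity, the per-cuboid budget can be tabulated for a few settings. This sketch reuses the example's numbers (20 GB Xmx taken as 20,000 MB, 20 parallel threads, 50% look ahead share). The function name is hypothetical.

```python
def look_ahead_mb_per_cuboid(xmx_mb: float, worker_threads: int,
                             parallel_threads: int, fraction: float = 0.5) -> float:
    """Look ahead budget in MB for one cuboid thread."""
    return xmx_mb / worker_threads / parallel_threads * fraction


for threads in (10, 20, 40):
    print(threads, look_ahead_mb_per_cuboid(20_000, threads, 20))
# 10 50.0
# 20 25.0
# 40 12.5
```

Doubling the worker threads halves the look ahead budget per cuboid unless Xmx is raised in proportion.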
If HDFS is slow, rely on the global cache (increase off heap memory and decrease look ahead memory). If HDFS is fast, rely on the local cache (increase look ahead cache memory and decrease off heap memory). This can be checked per Query Engine in the OLAP engine logs.