How to understand subresult reuse and result caching.
EXPLAIN (ANALYZE)
to check for subresult reuse.MaybeCache
operator above any node in the query plan, which may cache a subresult if it isn’t too large. The MaybeCache
operator may later retrieve and reuse the cached subresult if the same subplan, with the same underlying data, is evaluated again.
Currently, the optimizer places a MaybeCache
operator in the following places:
MaybeCache
operator is versatile, and it can be placed anywhere in the plan.
Join
operators in the FireCache, provided they are not too large. These hash tables are costly to compute, so reusing them when similar consecutive queries run offers significant performance advantages.
INSERT
, UPDATE
, or DELETE
), outdated cache entries are no longer used.
orders
, customer
, and nation
tables:
MaybeCache
operator positioned at the top of the plan caches the subresult from the first run into the FireCache. Additionally, both Join
operators store their respective hash tables in the cache.
On a subsequent run of exactly the same query (over unchanged data), the MaybeCache
operator fetches the subresult from the cache, allowing the entire evaluation to be skipped. As a result, query latency is reduced to mere milliseconds. In this example, it leads to a speed improvement of over 100x on a single node, medium engine running TPC-H with scale factor of 100.
If the WHERE
condition is changed to add ... AND o_orderdate >= '1998-01-01'::Date ...
, the subresult cached by the MaybeCache
operator cannot be used because the query plan below it has changed. However, the subplan below the upper Join
remains unchanged, allowing the previously cached hash table to be reused in that JOIN
operator. This eliminates the need to re-evaluate the subplan and rebuild the hash table.
This results in more than 5x speed improvement on subsequent queries, even when each query has a different date restriction.
Nothing was executed
in the EXPLAIN (ANALYZE) output. This shows that an operator was skipped because a higher level operator retrieved the subresult from the FireCache. For example, in the following EXPLAIN (ANALYZE)
output, the MaybeCache
operator retrieved the result from the cache, bypassing the need to run the entire query:
RANDOM
cannot cache subresults. If an operator in the query plan depends directly or indirectly on the output of a nondeterministic function, caching is disabled.left_side.colum1 < right_side.column2
) cannot use subresult caching directly. The result cache can still be used for queries using such joins.enable_result_cache
to FALSE
ensures that full query results aren’t retrieved from cache, while still allowing for semantic cross-query subresult reuse.enable_subresult_cache
to FALSE
disables Firebolt’s entire subresult caching layer.