We are currently seeing a number of strange things in the cache statistics in HAC on our production system (a multi-tenant system with 3 slave tenants and 1 master tenant, and 4 cluster nodes, 3 of them serving normal requests and 1 reserved for cronjobs, HMC and the cockpits).
Looking at the cache statistics on one normal instance, I can see that the cache is full, and I see a fairly high number of evictions in the query cache (some hundreds per second); the same goes for the number of misses and fetches. Yet when I enable JDBC logging on the instance, I see far fewer queries in the log (around 20 per second), which I can't understand, especially since queries that do not use the cache (e.g. direct JDBC queries done by Hybris) are also counted in that number. How could a fetch not result in a log file line?
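For reference, this is roughly how such a per-second count of the JDBC log can be done. It is a minimal sketch: the assumption that each log line starts with an `HH:mm:ss,SSS` timestamp is mine, and the pattern would need to be adjusted to the actual log format.

```java
import java.util.*;
import java.util.regex.*;

public class JdbcLogRate {
    // Assumed line format: each JDBC log line starts with "HH:mm:ss,SSS".
    // Adjust the pattern to whatever the real jdbc.log lines look like.
    private static final Pattern TS = Pattern.compile("^(\\d{2}:\\d{2}:\\d{2})");

    // Count logged statements per wall-clock second.
    public static Map<String, Integer> statementsPerSecond(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = TS.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "12:00:01,100 | SELECT * FROM medias WHERE p_code = ?",
            "12:00:01,200 | SELECT * FROM mediacontainers WHERE pk = ?",
            "12:00:02,050 | SELECT * FROM carts WHERE p_user = ?");
        // Two statements fall into second 12:00:01, one into 12:00:02.
        System.out.println(statementsPerSecond(sample));
    }
}
```

Comparing the per-second totals from a count like this with the fetch counter in HAC is what produces the ~20 vs. several-hundred discrepancy described above.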
How does Hybris behave in a multi-tenant environment regarding JDBC logging anyway? When I enable JDBC logging in one tenant's HAC, are the queries of all tenants logged, or only the ones issued by that tenant? The log file also gives no indication of which tenant ran a query (there is a column with the value "master", but that is the name of the data source, not the tenant), so I cannot determine the source tenant of a query.
When collecting queries via JDBC logging for a short time period (e.g. 2 minutes), I can see that quite a number of queries are executed several times (e.g. media/mediacontainer queries) with exactly the same placeholder values (some queries were executed 9 times during that window, and they are definitely FlexibleSearch queries). How is that possible? Such queries should be served by the cache. I cannot believe that these queries were evicted every single time in between so that they had to be rerun.
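This is, in sketch form, how the repeats can be spotted in the collected log: group entries by the exact combination of statement text and rendered placeholder values and keep only those that occur more than once. The "sql | values" entry format here is an assumption, not the actual log layout.

```java
import java.util.*;

public class DuplicateQueryFinder {
    // Group collected log entries by the exact combination of statement text
    // and rendered placeholder values; keep only combinations seen twice or
    // more. Each such survivor is a query that should have been a cache hit.
    public static Map<String, Integer> duplicates(List<String> entries) {
        Map<String, Integer> counts = new HashMap<>();
        for (String e : entries) {
            counts.merge(e, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < 2); // drop one-off queries
        return counts;
    }

    public static void main(String[] args) {
        List<String> entries = List.of(
            "SELECT ... FROM medias WHERE p_code = ? | values: [logo]",
            "SELECT ... FROM medias WHERE p_code = ? | values: [logo]",
            "SELECT ... FROM carts WHERE p_user = ? | values: [anonymous]");
        System.out.println(duplicates(entries));
        // {SELECT ... FROM medias WHERE p_code = ? | values: [logo]=2}
    }
}
```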
Is each query (with placeholder replacement and other parameters like the start/end window) really unique in the cache, or are (maybe for performance reasons) several versions of the same query, e.g. from different points in time, stored? I've dumped the whole query cache into a file and was surprised to find loads of query duplicates in there, which makes me question whether the caching really works correctly.
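What I would expect is that the range window and the placeholder values are part of the cache key, so that the same SQL text legitimately occupies several entries. A minimal sketch of that idea (the key shape and names are hypothetical, not the actual Hybris implementation):

```java
import java.util.*;

public class QueryCacheKeySketch {
    // Hypothetical shape of a query-cache key: if the range window is part of
    // the key, identical SQL text with different windows yields distinct
    // entries, while an exact repeat collapses onto the existing entry.
    record CacheKey(String sql, List<Object> values, int start, int count) {}

    // Number of distinct cache entries a list of lookups would occupy.
    static int distinctEntries(List<CacheKey> keys) {
        return new HashSet<>(keys).size();
    }

    public static void main(String[] args) {
        String sql = "SELECT {pk} FROM {Media} WHERE {code} = ?";
        List<CacheKey> lookups = List.of(
            new CacheKey(sql, List.of("logo"), 0, 20),
            new CacheKey(sql, List.of("logo"), 20, 20), // next page: new entry
            new CacheKey(sql, List.of("logo"), 0, 20)); // exact repeat: no new entry
        System.out.println(distinctEntries(lookups)); // 2
    }
}
```

If the dump only shows the SQL text and not the window or values, entries like these would look like duplicates even though they are distinct keys.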
Our query cache seems to grow endlessly. I have tried increasing it by a certain factor; it takes longer to fill, but afterwards the number of misses per second seems to be unchanged. Using JDBC logging I have verified that almost all of the queries performed should be cacheable (they don't contain timestamps, for example), but somehow they are not. I know that certain out-of-the-box queries are not cacheable, of course (e.g. queries for incremental Solr indexing, cronjob scheduler queries, etc.), but those make up only a very tiny fraction of the queries. Why are the rest not cached but run over and over? I must be missing something.
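For completeness, this is the kind of heuristic check I mean by "verified via JDBC logging": a statement that carries an inline timestamp literal instead of a ? placeholder differs on every execution and can never be served from the cache. The regex is a deliberate simplification and my own assumption about what such literals look like.

```java
import java.util.*;
import java.util.regex.*;

public class CacheabilityScan {
    // Heuristic: flag statements containing an inline "yyyy-MM-dd HH:mm:ss"
    // literal. Such a statement's text changes on every execution, so it can
    // never produce a query-cache hit. Real checks would need more patterns.
    private static final Pattern INLINE_TIMESTAMP =
        Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    public static List<String> neverCacheable(List<String> statements) {
        List<String> out = new ArrayList<>();
        for (String s : statements) {
            if (INLINE_TIMESTAMP.matcher(s).find()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> stmts = List.of(
            "SELECT ... FROM cronjobs WHERE p_endtime > '2024-01-15 10:30:00'",
            "SELECT ... FROM medias WHERE p_code = ?");
        System.out.println(neverCacheable(stmts).size()); // 1
    }
}
```

Running a scan like this over the 2-minute log sample is what tells me the uncacheable fraction is tiny, which makes the constant miss rate all the more puzzling.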
I see that the number of query cache invalidations is way too low (a couple of thousand right now), while we have billions of hits for the query cache and 2.2 million item invalidations. This sounds completely off to me. I would expect each item invalidation (item creation, modification, or removal) to trigger an invalidation of all cached queries that use the item's type (and I would assume that every invalidated query counts separately in that number). But then the figure for query invalidations should be much, much higher than the number of item invalidations. Most of the data is read-only, of course, but a lot of item types (carts, cart entries, orders, order entries, stock levels, ...) change all the time. What is happening here? Again, what am I missing?
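To put numbers on that expectation: even a conservative guess for how many cached queries each item invalidation flushes puts the counter orders of magnitude above a couple of thousand. The average of 10 flushed queries per invalidation below is a pure assumption for illustration.

```java
public class InvalidationEstimate {
    // Back-of-envelope check: if every item invalidation flushed on average
    // avgFlushedQueries cached queries of that type, the query-cache
    // invalidation counter should be roughly itemInvalidations * that average.
    public static long expectedQueryInvalidations(long itemInvalidations,
                                                  double avgFlushedQueries) {
        return Math.round(itemInvalidations * avgFlushedQueries);
    }

    public static void main(String[] args) {
        long itemInvalidations = 2_200_000L; // the figure reported in HAC
        double avgFlushedQueries = 10.0;     // assumed average, for illustration
        System.out.println(expectedQueryInvalidations(itemInvalidations, avgFlushedQueries));
        // 22000000 -- orders of magnitude above "a couple of thousand"
    }
}
```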