
Performance optimization: community-tested strategies

This guide is part of a collection of findings gained from community meetups. For more real-world solutions and insights, you can browse by specific problem. Having trouble with Materialized Views? Check out the Materialized Views community insights guide. If you're experiencing slow queries and want more examples, we also have a Query Optimization guide.

Order by cardinality (lowest to highest)

ClickHouse's primary index works best when low-cardinality columns come first, allowing it to skip large chunks of data efficiently. High-cardinality columns later in the key provide fine-grained sorting within those chunks. Start with columns that have few unique values (like status, category, country) and end with columns that have many unique values (like user_id, timestamp, session_id).
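For illustration, here is a minimal sketch of such a sorting key on a hypothetical error_events table (table and column names are assumptions, not from the guide):

```sql
-- Low-cardinality columns first, high-cardinality columns last
CREATE TABLE error_events
(
    status     LowCardinality(String),  -- a handful of values
    category   LowCardinality(String),  -- dozens of values
    country    LowCardinality(String),  -- ~200 values
    user_id    UInt64,                  -- millions of values
    event_time DateTime64(6),           -- microsecond timestamp, nearly unique per row
    message    String
)
ENGINE = MergeTree
ORDER BY (status, category, country, user_id, event_time);
```

A filter like WHERE status = 'error' AND country = 'DE' can then rule out most granules using only the leading, low-cardinality key columns.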

Check out the ClickHouse documentation for more on cardinality and primary indexes.

Time granularity matters

When using timestamps in your ORDER BY clause, consider the cardinality vs precision trade-off. Microsecond-precision timestamps create very high cardinality (nearly one unique value per row), which reduces the effectiveness of ClickHouse's sparse primary index. Rounded timestamps create lower cardinality that enables better index skipping, but you lose precision for time-based queries.
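One way to handle this trade-off, sketched against a hypothetical events table, is to keep the full-precision timestamp as a column but round it in the sorting key:

```sql
-- Sort on an hour-rounded timestamp to keep key cardinality low;
-- the column itself still stores microsecond precision
CREATE TABLE events
(
    event_time DateTime64(6),
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
ORDER BY (toStartOfHour(event_time), user_id);
```

Because toStartOfHour is monotonic, range filters on event_time can generally still use the primary index, while the key stays coarse enough to skip data effectively.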

Focus on individual queries, not averages

When debugging ClickHouse performance, don't rely on average query times or overall system metrics. Instead, identify why specific queries are slow. A system can have good average performance while individual queries suffer from memory exhaustion, poor filtering, or high cardinality operations.

According to Alexey Milovidov, CTO of ClickHouse: "The right way is to ask yourself why this particular query was processed in five seconds... I don't care if median and other queries process quickly. I only care about my query."

When a query is slow, don't just look at averages. Ask "Why was THIS specific query slow?" and examine the actual resource usage patterns.
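One way to do that in practice (a sketch, assuming query logging is enabled) is to pull the worst individual queries out of system.query_log and look at their resource usage directly:

```sql
-- Look at specific slow queries, not averages
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes)   AS data_read,
    formatReadableSize(memory_usage) AS peak_memory,
    query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 5000   -- e.g. anything slower than five seconds
ORDER BY query_duration_ms DESC
LIMIT 10;
```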

Memory and row scanning

Sentry is a developer-first error tracking platform processing billions of events daily from 4+ million developers. Their key insight: "The cardinality of the grouping key that's going to drive memory in this particular situation." In other words, high-cardinality aggregations kill performance through memory exhaustion, not row scanning.

When queries fail, determine if it's a memory problem (too many groups) or scanning problem (too many rows).

A query like GROUP BY user_id, error_message, url_path creates a separate aggregation state in memory for every unique combination of those three values. With large numbers of users, error types, and URL paths, you can easily end up with millions of aggregation states that must be held in memory simultaneously.
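Before running such an aggregation, you can estimate how many states it would create; a sketch against a hypothetical events table:

```sql
-- Approximate number of aggregation states that
-- GROUP BY user_id, error_message, url_path would create
SELECT uniq(user_id, error_message, url_path) AS estimated_groups
FROM events;
```

If the estimate runs into the millions, memory is the likely problem; if it is small but the table is huge, scanning is.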

For extreme cases, Sentry uses deterministic sampling. A 10% sample reduces memory usage by about 90% while keeping most aggregations accurate to within roughly 5%.
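A minimal sketch of the deterministic sampling pattern, assuming a hypothetical events table keyed by user_id (not Sentry's actual query):

```sql
-- Deterministic 10% sample: the same users are always in (or out of) the sample
SELECT
    error_message,
    count() * 10 AS estimated_errors   -- scale counts back up for the skipped 90%
FROM events
WHERE cityHash64(user_id) % 10 = 0
GROUP BY error_message;
```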

This ensures the same users appear in every query, providing consistent results across time periods. Because cityHash64() produces the same hash value for the same input, user_id = 12345 always hashes to the same bucket, so that user either always appears in your 10% sample or never does, with no flickering between queries.

Sentry's bit mask optimization

When aggregating by high-cardinality columns (like URLs), each unique value creates a separate aggregation state in memory, leading to memory exhaustion. Sentry's solution: instead of grouping by the actual URL strings, group by boolean expressions that collapse into bit masks.

Here is a query pattern that you can try on your own tables if this situation applies to you.
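The sketch below is illustrative rather than Sentry's exact query; it assumes a hypothetical events table with a url column and three URL patterns you care about:

```sql
-- Group by answers to yes/no questions about the URL instead of the URL itself.
-- The grouping key can take at most 8 values, no matter how many distinct URLs exist.
SELECT
    (url LIKE '%/checkout%')           -- bit 0
        + (url LIKE '%/settings%') * 2 -- bit 1
        + (url LIKE '%/error%') * 4    -- bit 2
        AS url_mask,
    count() AS events
FROM events
GROUP BY url_mask;
```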

Instead of storing every unique string in memory, you're storing the answers to questions about those strings as integers. The aggregation state becomes bounded and tiny, regardless of data diversity.

From Sentry's engineering team: "These heavy queries are more than 10x faster and our memory usage is 100x lower (and, more importantly, bounded). Our largest customers no longer see errors when searching for replays and we can now support customers of arbitrary size without running out of memory."

