Version: 3.0.0-alpha (Diátaxis)

FAQ — ClickHouse

What is the difference between shards and replicas?

Shards and replicas play different roles in the ClickHouse architecture:

  • Shards: horizontal data distribution. Each shard contains a portion of the total dataset. Adding shards increases storage and processing capacity.
  • Replicas: identical copies of data within the same shard. Each replica contains the same data to ensure high availability.
clickhouse.yaml

```yaml
spec:
  shards: 2     # Data is distributed across 2 shards
  replicas: 3   # Each shard has 3 copies (total: 6 pods)
```
tip

In production, use at least 2 replicas per shard for high availability. Increase the number of shards to handle larger data volumes.

What is ClickHouse Keeper for?

ClickHouse Keeper is the cluster coordination component, based on the Raft protocol. It replaces Apache ZooKeeper and provides:

  • Leader election for replicated tables
  • Coordination of replication operations between replicas
  • Metadata management for the cluster

The number of Keeper replicas must be odd (3 or 5) to guarantee quorum (majority required for leader election). The recommended minimum is 3 replicas.

clickhouse.yaml

```yaml
spec:
  clickhouseKeeper:
    enabled: true
    replicas: 3            # Always odd: 3 or 5
    resourcesPreset: micro
    size: 2Gi
```

Is ClickHouse suitable for transactional queries (OLTP)?

No. ClickHouse is an OLAP (Online Analytical Processing) database engine optimized for data analysis:

  • Column-oriented storage: highly efficient for aggregations and scans over large data volumes
  • Optimized for bulk reads and analytical queries
  • Not suited to frequent transactional operations (single-row UPDATE and DELETE)

If you need a transactional engine (OLTP), use PostgreSQL or MySQL on Hikube instead.
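To make the OLAP/OLTP distinction concrete, here is a hedged SQL sketch of the access patterns ClickHouse favors and penalizes. The table and column names (`page_views`, `ts`, `url`, `user_id`) are illustrative, not part of any Hikube default.

```sql
-- Good fit: wide-scan aggregation over many rows (column-oriented storage
-- reads only the ts and url columns, not whole rows).
SELECT url, count() AS hits
FROM page_views
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY url
ORDER BY hits DESC
LIMIT 10;

-- Poor fit: single-row modification. In ClickHouse this is an asynchronous
-- "mutation" that rewrites whole data parts in the background, not a
-- transactional in-place update as in PostgreSQL or MySQL.
ALTER TABLE page_views UPDATE url = '/home' WHERE user_id = 42;
```

If your workload is dominated by statements like the second one, an OLTP engine is the better choice.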

What is the difference between resourcesPreset and resources?

The resourcesPreset field selects a predefined resource profile for each ClickHouse replica. If the resources field (explicit CPU/memory) is set, resourcesPreset is ignored entirely.

| Preset  | CPU  | Memory |
|---------|------|--------|
| nano    | 250m | 128Mi  |
| micro   | 500m | 256Mi  |
| small   | 1    | 512Mi  |
| medium  | 1    | 1Gi    |
| large   | 2    | 2Gi    |
| xlarge  | 4    | 4Gi    |
| 2xlarge | 8    | 8Gi    |
clickhouse.yaml

```yaml
spec:
  # Using a preset
  resourcesPreset: large

  # OR explicit configuration (the preset is then ignored)
  resources:
    cpu: 4000m
    memory: 8Gi
```

How is data distributed between shards?

Data is distributed between shards via the ClickHouse Distributed engine:

  • Each shard stores a partition of the total dataset
  • The Distributed engine routes queries to all shards and merges the results
  • Data is replicated within each shard according to the configured number of replicas

To benefit from distribution, create tables with the ReplicatedMergeTree engine on each shard and a Distributed table for global queries.
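The setup described above can be sketched in SQL. This is a hedged example: the cluster name `default`, the ZooKeeper path, and the table names `events_local`/`events_all` are assumptions to adapt to your deployment; the `{shard}` and `{replica}` macros are the standard ClickHouse substitution macros.

```sql
-- Local table, created on every shard (ON CLUSTER runs the DDL cluster-wide).
-- ReplicatedMergeTree keeps the replicas within each shard in sync.
CREATE TABLE events_local ON CLUSTER 'default'
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (event_date, user_id);

-- Distributed table: routes queries to all shards and merges the results.
-- rand() spreads inserts evenly; a column-based key such as cityHash64(user_id)
-- would instead co-locate each user's rows on one shard.
CREATE TABLE events_all ON CLUSTER 'default' AS events_local
ENGINE = Distributed('default', currentDatabase(), 'events_local', rand());
```

Queries against `events_all` then fan out to every shard, while inserts are routed according to the sharding key.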

How to configure ClickHouse backups?

ClickHouse backups use Restic to ship encrypted snapshots to S3-compatible storage. Configure the backup section:

clickhouse.yaml

```yaml
spec:
  backup:
    enabled: true
    s3Region: eu-central-1
    s3Bucket: s3.example.com/clickhouse-backups
    schedule: "0 3 * * *"
    cleanupStrategy: "--keep-last=7 --keep-daily=7 --keep-weekly=4"
    s3AccessKey: your-access-key
    s3SecretKey: your-secret-key
    resticPassword: your-restic-password
```
warning

Store the resticPassword in a safe place. Without this password, backups cannot be decrypted or restored.