FAQ - ClickHouse

What is the difference between shards and replicas?

Shards and replicas play different roles in the ClickHouse architecture:

  • Shards: horizontal data distribution. Each shard contains a portion of the total dataset. Adding shards increases storage and processing capacity.
  • Replicas: identical copies of data within the same shard. Each replica contains the same data to ensure high availability.

clickhouse.yaml
spec:
  shards: 2 # Data is distributed across 2 shards
  replicas: 3 # Each shard has 3 copies (total: 6 pods)

Tip: In production, use at least 2 replicas per shard for high availability. Increase the number of shards to handle larger data volumes.
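
To make the pod arithmetic concrete, here is a minimal Python sketch that enumerates the shard/replica layout for the values used above. The function name `layout` and the index pairs are purely illustrative; they are not part of any Hikube or ClickHouse API.

```python
def layout(shards: int, replicas: int) -> list[tuple[int, int]]:
    """Return one (shard_index, replica_index) pair per ClickHouse pod."""
    if shards < 1 or replicas < 1:
        raise ValueError("shards and replicas must both be at least 1")
    return [(s, r) for s in range(shards) for r in range(replicas)]


pods = layout(shards=2, replicas=3)
print(len(pods))  # 6 pods in total: 2 shards x 3 replicas
```

Adding a shard adds a new slice of the data (and `replicas` more pods); adding a replica adds one more copy of every existing slice.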

What is ClickHouse Keeper for?

ClickHouse Keeper is the cluster coordination component, based on the Raft protocol. It replaces Apache ZooKeeper and provides:

  • Leader election for replicated tables
  • Coordination of replication operations between replicas
  • Metadata management for the cluster

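The number of Keeper replicas must be odd (3 or 5). Raft requires a majority (quorum) of replicas to elect a leader, so 3 replicas tolerate the loss of 1 and 5 tolerate the loss of 2, while an even count adds no extra fault tolerance. The recommended minimum is 3 replicas.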

clickhouse.yaml
spec:
  clickhouseKeeper:
    enabled: true
    replicas: 3 # Always odd: 3 or 5
    resourcesPreset: micro
    size: 2Gi
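
The quorum rule can be checked with a little arithmetic. The following Python sketch computes the majority size and the number of failures a Raft group can survive; the helper names `quorum` and `tolerated_failures` are illustrative only.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a Raft group with `replicas` members."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many members can fail while a majority still remains."""
    return replicas - quorum(replicas)


for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 3 replicas: quorum 2, tolerates 1 failure
# 4 replicas: quorum 3, tolerates 1 failure (no gain over 3)
# 5 replicas: quorum 3, tolerates 2 failures
```

This is why an even replica count is not useful: a fourth Keeper replica raises the quorum without raising the number of failures the cluster can absorb.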

Is ClickHouse suitable for transactional queries (OLTP)?

No. ClickHouse is an OLAP (Online Analytical Processing) database engine optimized for data analysis:

  • Column-oriented architecture: fast aggregations and scans over large data volumes
  • Optimized for high-volume reads and analytical queries
  • Not suitable for frequent transactional operations (row-level UPDATE, DELETE)

If you need a transactional engine (OLTP), use PostgreSQL or MySQL on Hikube instead.

What is the difference between resourcesPreset and resources?

The resourcesPreset field lets you choose a predetermined resource profile for each ClickHouse replica. If the resources field (explicit CPU/memory) is defined, resourcesPreset is completely ignored.

Preset    CPU    Memory
nano      250m   128Mi
micro     500m   256Mi
small     1      512Mi
medium    1      1Gi
large     2      2Gi
xlarge    4      4Gi
2xlarge   8      8Gi

clickhouse.yaml
spec:
  # Using a preset
  resourcesPreset: large

  # OR explicit configuration (the preset is then ignored)
  resources:
    cpu: 4000m
    memory: 8Gi
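
The precedence rule can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not the platform's actual implementation: the `PRESETS` mapping copies the preset table, and `effective_resources` is a hypothetical helper. How a partially filled `resources` block (for example, only `cpu`) is handled is not covered here.

```python
# Preset values copied from the preset table (CPU, memory per replica).
PRESETS = {
    "nano": {"cpu": "250m", "memory": "128Mi"},
    "micro": {"cpu": "500m", "memory": "256Mi"},
    "small": {"cpu": "1", "memory": "512Mi"},
    "medium": {"cpu": "1", "memory": "1Gi"},
    "large": {"cpu": "2", "memory": "2Gi"},
    "xlarge": {"cpu": "4", "memory": "4Gi"},
    "2xlarge": {"cpu": "8", "memory": "8Gi"},
}


def effective_resources(spec: dict) -> dict:
    """Explicit `resources` wins; otherwise fall back to the preset."""
    explicit = spec.get("resources")
    if explicit:
        return dict(explicit)
    return dict(PRESETS[spec["resourcesPreset"]])


print(effective_resources({"resourcesPreset": "large"}))
print(effective_resources({
    "resourcesPreset": "large",  # ignored because `resources` is set
    "resources": {"cpu": "4000m", "memory": "8Gi"},
}))
```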

How is data distributed between shards?

Data is distributed between shards via the ClickHouse Distributed engine:

  • Each shard stores a subset of the total dataset
  • The Distributed engine routes queries to all shards and merges the results
  • Data is replicated within each shard according to the configured number of replicas

To benefit from distribution, create tables with the ReplicatedMergeTree engine on each shard and a Distributed table for global queries.
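
The write and read paths described above can be illustrated with a small Python model. It is a conceptual sketch, not ClickHouse code: it assumes the Distributed table is given a sharding key (here a hypothetical `user_id` column) and that all shards have equal weight, so a row goes to the shard whose index equals the key modulo the number of shards. The names `shard_for`, `insert`, and `count_by_event` are made up for the example.

```python
from collections import Counter


def shard_for(sharding_key: int, num_shards: int) -> int:
    """Pick a shard as the remainder of the key by the shard count
    (assumes every shard has weight 1)."""
    return sharding_key % num_shards


def insert(shards: list[list[dict]], row: dict) -> None:
    """Write path: route one row to exactly one shard."""
    shards[shard_for(row["user_id"], len(shards))].append(row)


def count_by_event(shards: list[list[dict]]) -> Counter:
    """Read path: run the aggregation on every shard, then merge."""
    merged = Counter()
    for local_rows in shards:
        merged += Counter(r["event"] for r in local_rows)  # partial result
    return merged


shards = [[], []]  # 2 shards, as in the example spec
for uid, event in [(1, "click"), (2, "view"), (3, "click"), (4, "click")]:
    insert(shards, {"user_id": uid, "event": event})

print([len(s) for s in shards])  # [2, 2]
print(count_by_event(shards))    # Counter({'click': 3, 'view': 1})
```

In this model each inner list plays the role of the local ReplicatedMergeTree table on one shard, and `count_by_event` plays the role of a query against the Distributed table.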

How do I configure ClickHouse backups?

ClickHouse backups use Restic to upload data to S3-compatible storage. Configure the backup section:

clickhouse.yaml
spec:
  backup:
    enabled: true
    s3Region: eu-central-1
    s3Bucket: s3.example.com/clickhouse-backups
    schedule: "0 3 * * *"
    cleanupStrategy: "--keep-last=7 --keep-daily=7 --keep-weekly=4"
    s3AccessKey: your-access-key
    s3SecretKey: your-secret-key
    resticPassword: your-restic-password

Warning: Keep the resticPassword in a safe place. Without this password, backups cannot be decrypted.