ClickHouse Data Generation and Drill Tools

This article explains the helper tools under /usr/local/dbbot/clickhouse_ansible/examples. They are not deployment entry points. They exist to complete the drill flow of “deployment -> data generation -> backup -> restore -> data comparison”.

1. Tool location

Typical path:

/usr/local/dbbot/clickhouse_ansible/examples

The main tools are:

Path | Purpose | Typical usage
---- | ------- | -------------
examples/business_demo/generate_ck_business_demo.sh | Generate a one-off business-style demo dataset | Post-deployment acceptance, query demos, preparing baseline data before backup
examples/hot_backup_lab/init_lab.sh | Initialize the hot-backup lab database and tables | Prepare an exercise environment before backup and restore drills
examples/hot_backup_lab/writer_ctl.sh | Start, stop, and inspect the continuous writer | Verify ongoing writes during the backup window and compare data after restore
examples/hot_backup_lab/query_examples.sql | Provide a set of exercise SQL queries | Record pre-backup baselines and do quick post-restore checks

Notes:

  • examples/README.md gives a short directory overview.
  • examples/*/run/, logs, and PID files are runtime artifacts and should not be committed.
  • The IPs, passwords, and cluster names in the examples are placeholders. Replace them with your real environment values.

2. Position in the drill flow

For a full end-to-end ClickHouse drill, use this order:

  1. Finish ClickHouse Cluster Deployment
  2. Use the tools in this article to generate static data or continuous write traffic
  3. Run baseline SQL before backup and keep the results
  4. Run ClickHouse Backup
  5. Run ClickHouse Restore
  6. Use ClickHouse Data Comparison and Restore Validation or your baseline SQL to compare the restored result

3. business_demo: one-off business data generation

generate_ck_business_demo.sh is suitable when you want a business-like dataset right after deployment. Typical goals:

  • Verify distributed tables, replicated tables, and rollup tables
  • Demonstrate order, action, inventory, and slow-log analytics
  • Prepare a stable baseline dataset for later backup and restore drills

The default target database is lab_ck_biz. The script creates and fills these table groups:

  • Dimension tables: dim_shop, dim_category, dim_sku
  • Fact tables: fact_order_item, fact_user_action, fact_inventory_snapshot
  • Slow-log table: mysql_slowlog_raw
  • Rollup table: dws_sku_day
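
A quick sanity check after generation is to list the tables the script created; connection flags here are the same placeholders used in the examples below.

```shell
# List the dimension, fact, slow-log, and rollup tables in the demo database.
clickhouse-client \
  --host 192.0.2.11 --port 9000 \
  --user default --password '<clickhouse_password>' \
  --query "SHOW TABLES FROM lab_ck_biz"
```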

Minimal example:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/business_demo/generate_ck_business_demo.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --db lab_ck_biz \
  --reset

Common parameters:

  • --reset: drop and recreate the database before loading data
  • --orders: order-item row count, default 3000000
  • --actions: action-log row count, default 9000000
  • --slowlogs: slow-log row count, default 3000000
  • --days: time span in days, default 60
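
As a rough sizing aid, the default flags add up to 15 million fact rows spread over 60 days; this is plain shell arithmetic and needs no cluster access.

```shell
# Default row counts from the flags above.
ORDERS=3000000
ACTIONS=9000000
SLOWLOGS=3000000
TOTAL=$((ORDERS + ACTIONS + SLOWLOGS))
echo "defaults generate ${TOTAL} fact rows over 60 days"
```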

For a smaller smoke test:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/business_demo/generate_ck_business_demo.sh \
  --host 192.0.2.11 \
  --password '<clickhouse_password>' \
  --orders 500000 \
  --actions 1500000 \
  --slowlogs 500000

4. hot_backup_lab: continuous-write drill

hot_backup_lab is better for backup drills where writes should keep flowing during the backup window. It contains three parts:

  1. init_lab.sh: initialize lab databases, tables, and seed data
  2. writer_ctl.sh: continuously write into fact_order_item, fact_user_action, and mysql_slowlog_raw
  3. query_examples.sql: provide real-time write, ranking, funnel, slow SQL, and baseline queries

4.1 Initialize the lab database

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/init_lab.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --db lab_hot_backup \
  --reset

To prepare multiple lab databases at once, use --dbs:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/init_lab.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --dbs lab_hot_backup,lab_hot_backup_ext,lab_hot_backup_analytics \
  --reset
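
After initialization, a simple sanity check is to confirm the lab databases exist (same placeholder credentials as above):

```shell
# Should print one line per lab database created by init_lab.sh.
clickhouse-client \
  --host 192.0.2.11 --port 9000 \
  --user default --password '<clickhouse_password>' \
  --query "SHOW DATABASES LIKE 'lab_hot_backup%'"
```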

4.2 Start continuous writes

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh start \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --db lab_hot_backup \
  --order-batch 3000 \
  --action-batch 9000 \
  --slow-batch 1500 \
  --interval 2
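
The batch flags imply an approximate steady-state insert rate, assuming the writer inserts one batch per table every --interval seconds; this is an assumption about writer_ctl.sh semantics, so verify it against your version.

```shell
# Approximate rows/s implied by the flags above, if one batch per table
# is inserted every --interval seconds (assumption, not verified).
ORDER_BATCH=3000
ACTION_BATCH=9000
SLOW_BATCH=1500
INTERVAL=2
RATE=$(( (ORDER_BATCH + ACTION_BATCH + SLOW_BATCH) / INTERVAL ))
echo "approx ${RATE} rows/s across the three tables"
```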

Common actions:

  • start: launch the background writer
  • run: run in the foreground
  • status: check process status and recent write volume
  • stop: stop the background writer
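
Assuming run takes the same connection flags as start, a foreground sketch looks like this; stop it with Ctrl-C:

```shell
cd /usr/local/dbbot/clickhouse_ansible
# Foreground mode: output stays attached to the terminal
# (assumption: run shares the connection flags of start).
bash examples/hot_backup_lab/writer_ctl.sh run \
  --host 192.0.2.11 --port 9000 --user default \
  --password '<clickhouse_password>' --db lab_hot_backup
```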

Check status:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh status \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --db lab_hot_backup

Stop the writer:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh stop

Runtime files are written to:

  • examples/hot_backup_lab/run/writer.pid
  • examples/hot_backup_lab/run/writer.log
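
The PID file allows a quick liveness check outside writer_ctl.sh status; a minimal sketch, run from the clickhouse_ansible directory:

```shell
# Report whether the background writer is alive, based on the PID file.
RUN_DIR=examples/hot_backup_lab/run
if [ -f "$RUN_DIR/writer.pid" ] && kill -0 "$(cat "$RUN_DIR/writer.pid")" 2>/dev/null; then
  STATE=running
else
  STATE=stopped
fi
echo "writer is $STATE"
# tail -f "$RUN_DIR/writer.log"   # follow inserts while the writer runs
```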

5. How to compare data

5.1 Quick baseline comparison

For a simple drill, the fastest way is to run the same SQL once before backup and again after restore:

cd /usr/local/dbbot/clickhouse_ansible
clickhouse-client \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --multiquery < examples/hot_backup_lab/query_examples.sql

The last query block returns row totals for fact_order_item, fact_user_action, and mysql_slowlog_raw; record them before backup and compare them after restore.
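
To mechanize that baseline, one approach is to capture the query output to files and diff them; the /tmp paths are arbitrary, and the diff is only clean when writes are stopped for the drill.

```shell
# Capture the same baseline output before backup and after restore, then diff.
CK="clickhouse-client --host 192.0.2.11 --port 9000 \
    --user default --password <clickhouse_password> --multiquery"
$CK < examples/hot_backup_lab/query_examples.sql > /tmp/ck_baseline_before.txt
# ... run the backup and restore drill here ...
$CK < examples/hot_backup_lab/query_examples.sql > /tmp/ck_baseline_after.txt
diff /tmp/ck_baseline_before.txt /tmp/ck_baseline_after.txt && echo "baseline unchanged"
```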

5.2 Cross-cluster data comparison

For formal acceptance between the source cluster and the restored target cluster, use the dedicated playbook:

cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
  -i ../inventory/hosts.backup.ini \
  -i ../inventory/hosts.restore.ini \
  validate_restore_consistency.yml

Detailed usage is covered in ClickHouse Data Comparison and Restore Validation.

A minimal check example:

validate_checks:
  - name: "order_items_full_count"
    database: "lab_hot_backup"
    local_table: "fact_order_item_local"

  - name: "slowlog_recent_window"
    database: "lab_hot_backup"
    local_table: "mysql_slowlog_raw_local"
    where: "log_time >= toDateTime('2026-03-01 00:00:00')"

Notes:

  • Non-TTL tables are good candidates for full-table count() comparison.
  • Tables with TTL ... DELETE are better validated with a fixed time window.
  • validate_checks is maintained in playbooks/vars/validate_restore_config.yml.
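
The fixed-window rule can also be applied by hand: run one identical windowed query on the source and on the restored target, then compare the two counts. The target host 192.0.2.21 is a placeholder.

```shell
# Closed time window, so TTL deletions outside the window cannot skew the result.
SQL="SELECT count() FROM lab_hot_backup.mysql_slowlog_raw
     WHERE log_time >= toDateTime('2026-03-01 00:00:00')
       AND log_time <  toDateTime('2026-03-02 00:00:00')"
for HOST in 192.0.2.11 192.0.2.21; do   # source cluster, then restored target
  clickhouse-client --host "$HOST" --port 9000 \
    --user default --password '<clickhouse_password>' --query "$SQL"
done
```

Note that the window must still fall inside the TTL retention period on both clusters; once it expires, both counts collapse to zero and the check proves nothing.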

6. Notes

  1. business_demo is better for generating a stable sample dataset. hot_backup_lab is better for continuous-write drills.
  2. For TTL tables such as mysql_slowlog_raw_local, do not use the full-table row count from source and target as the only acceptance rule.
  3. If the source cluster is still receiving writes, strict cross-cluster comparison should be limited to fixed windows or write-stopped drills.
  4. Before generating large volumes of data, evaluate disk space, background merge pressure, and backup capacity.

7. Related articles

  1. ClickHouse Cluster Deployment
  2. ClickHouse Backup
  3. ClickHouse Restore
  4. ClickHouse Data Comparison and Restore Validation