ClickHouse Data Generation and Drill Tools
This article explains the helper tools under /usr/local/dbbot/clickhouse_ansible/examples. They are not deployment entry points. They exist to complete the drill flow of “deployment -> data generation -> backup -> restore -> data comparison”.
1. Tool location
Typical path:
/usr/local/dbbot/clickhouse_ansible/examples
The main tools are:
| Path | Purpose | Typical usage |
|---|---|---|
| `examples/business_demo/generate_ck_business_demo.sh` | Generate a one-off business-style demo dataset | Post-deployment acceptance, query demos, preparing baseline data before backup |
| `examples/hot_backup_lab/init_lab.sh` | Initialize the hot-backup lab database and tables | Prepare an exercise environment before backup and restore drills |
| `examples/hot_backup_lab/writer_ctl.sh` | Start, stop, and inspect the continuous writer | Verify ongoing writes during the backup window and compare data after restore |
| `examples/hot_backup_lab/query_examples.sql` | Provide a set of exercise SQL queries | Record pre-backup baselines and do quick post-restore checks |
Notes:
- `examples/README.md` gives a short directory overview.
- `examples/*/run/`, logs, and PID files are runtime artifacts and should not be committed.
- The IPs, passwords, and cluster names in the examples are placeholders. Replace them with your real environment values.
2. Recommended drill flow
For a full end-to-end ClickHouse drill, use this order:
- Finish ClickHouse Cluster Deployment
- Use the tools in this article to generate static data or continuous write traffic
- Run baseline SQL before backup and keep the results
- Run ClickHouse Backup
- Run ClickHouse Restore
- Use ClickHouse Data Comparison and Restore Validation or your baseline SQL to compare the restored result
3. business_demo: one-off business data generation
`generate_ck_business_demo.sh` is suitable when you want a business-like dataset right after deployment. Typical goals:
- Verify distributed tables, replicated tables, and rollup tables
- Demonstrate order, action, inventory, and slow-log analytics
- Prepare a stable baseline dataset for later backup and restore drills
The default target database is `lab_ck_biz`. The script creates and fills these table groups:
- Dimension tables: `dim_shop`, `dim_category`, `dim_sku`
- Fact tables: `fact_order_item`, `fact_user_action`, `fact_inventory_snapshot`
- Slow-log table: `mysql_slowlog_raw`
- Rollup table: `dws_sku_day`
Minimal example:
```bash
cd /usr/local/dbbot/clickhouse_ansible
bash examples/business_demo/generate_ck_business_demo.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --db lab_ck_biz \
  --reset
```
Common parameters:
- `--reset`: drop and recreate the database before loading data
- `--orders`: order-item row count, default `3000000`
- `--actions`: action-log row count, default `9000000`
- `--slowlogs`: slow-log row count, default `3000000`
- `--days`: time span in days, default `60`
For a smaller smoke test:
```bash
cd /usr/local/dbbot/clickhouse_ansible
bash examples/business_demo/generate_ck_business_demo.sh \
  --host 192.0.2.11 \
  --password '<clickhouse_password>' \
  --orders 500000 \
  --actions 1500000 \
  --slowlogs 500000
```
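After generation, it helps to snapshot per-table row counts as a baseline for the later restore comparison. The sketch below is not part of the shipped tools: `capture_baseline` and the `CK_CLIENT` hook are hypothetical names, and the default connection flags are the placeholders used throughout this article.

```bash
#!/usr/bin/env bash
# Sketch: capture per-table row counts into a tab-separated baseline file.
# CK_CLIENT is a hypothetical hook: any command that takes a SQL string as its
# last argument and prints a single number. It defaults to clickhouse-client
# with placeholder connection flags; adjust them to your environment.
set -euo pipefail

CK_CLIENT="${CK_CLIENT:-clickhouse-client --host 192.0.2.11 --user default --query}"

capture_baseline() {
  local db="$1" outfile="$2"
  shift 2
  : > "$outfile"   # truncate or create the baseline file
  local tbl
  for tbl in "$@"; do
    printf '%s\t%s\n' "$tbl" "$($CK_CLIENT "SELECT count() FROM ${db}.${tbl}")" >> "$outfile"
  done
}

# Example with the business_demo fact tables:
# capture_baseline lab_ck_biz baseline_before.tsv fact_order_item fact_user_action mysql_slowlog_raw
```

The same file format can be reused after restore, so that acceptance becomes a plain file comparison instead of eyeballing query output.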
4. hot_backup_lab: continuous-write drill
`hot_backup_lab` is better for backup drills where writes should keep flowing during the backup window. It contains three parts:
- `init_lab.sh`: initialize lab databases, tables, and seed data
- `writer_ctl.sh`: continuously write into `fact_order_item`, `fact_user_action`, and `mysql_slowlog_raw`
- `query_examples.sql`: provide real-time write, ranking, funnel, slow SQL, and baseline queries
4.1 Initialize the lab database
```bash
cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/init_lab.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --db lab_hot_backup \
  --reset
```
To prepare multiple lab databases at once, use `--dbs`:
```bash
cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/init_lab.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --dbs lab_hot_backup,lab_hot_backup_ext,lab_hot_backup_analytics \
  --reset
```
4.2 Start continuous writes
```bash
cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh start \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --db lab_hot_backup \
  --order-batch 3000 \
  --action-batch 9000 \
  --slow-batch 1500 \
  --interval 2
```
Common actions:
- `start`: launch the background writer
- `run`: run in the foreground
- `status`: check process status and recent write volume
- `stop`: stop the background writer
Check status:
```bash
cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh status \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --db lab_hot_backup
```
Stop the writer:
```bash
cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh stop
```
Runtime files are written to:
- `examples/hot_backup_lab/run/writer.pid`
- `examples/hot_backup_lab/run/writer.log`
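When scripting around the writer (for example, to confirm it kept running through the backup window), a quick liveness check against the PID file can be handy. This is a sketch, not part of `writer_ctl.sh`; `writer_alive` is a hypothetical helper name.

```bash
# Sketch: check whether the process recorded in the writer PID file is alive.
# kill -0 sends no signal; it only tests that the process exists and that we
# are allowed to signal it.
writer_alive() {
  local pidfile="${1:-examples/hot_backup_lab/run/writer.pid}"
  [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Example:
# writer_alive && echo "writer running" || echo "writer not running"
```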
5. How to compare data
5.1 Quick baseline comparison
For a simple drill, the fastest way is to run the same SQL before and after backup:
```bash
cd /usr/local/dbbot/clickhouse_ansible
clickhouse-client \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --multiquery < examples/hot_backup_lab/query_examples.sql
```
The last query block returns row totals for `fact_order_item`, `fact_user_action`, and `mysql_slowlog_raw`, which can serve as your baseline before and after restore.
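To make the before/after check mechanical rather than visual, the baseline output can be saved to files and diffed. A minimal sketch, assuming both files hold one `table<TAB>count` line per table; the `compare_baseline` helper and the file names are illustrative, not part of the shipped tools.

```bash
# Sketch: diff a pre-backup baseline file against a post-restore one.
# Succeeds (and prints "baseline match") only when the files are identical.
compare_baseline() {
  local before="$1" after="$2"
  if diff -u "$before" "$after" > /dev/null; then
    echo "baseline match"
  else
    echo "baseline MISMATCH" >&2
    diff -u "$before" "$after" >&2 || true
    return 1
  fi
}

# Example:
# compare_baseline baseline_before.tsv baseline_after.tsv
```

Because the function returns non-zero on mismatch, it slots directly into a CI job or drill checklist script.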
5.2 Cross-cluster data comparison
For formal acceptance between the source cluster and the restored target cluster, use the dedicated playbook:
```bash
cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
  -i ../inventory/hosts.backup.ini \
  -i ../inventory/hosts.restore.ini \
  validate_restore_consistency.yml
```
Detailed usage is covered in ClickHouse Data Comparison and Restore Validation.
A minimal check example:
```yaml
validate_checks:
  - name: "order_items_full_count"
    database: "lab_hot_backup"
    local_table: "fact_order_item_local"
  - name: "slowlog_recent_window"
    database: "lab_hot_backup"
    local_table: "mysql_slowlog_raw_local"
    where: "log_time >= toDateTime('2026-03-01 00:00:00')"
```
Notes:
- Non-TTL tables are good candidates for full-table `count()` comparison.
- Tables with `TTL ... DELETE` are better validated with a fixed time window.
- `validate_checks` is maintained in `playbooks/vars/validate_restore_config.yml`.
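The fixed-window idea can be scripted: run the same bounded `count()` query on source and target so that TTL expiry outside the window cannot skew the comparison. A sketch, where `window_count_sql` is a hypothetical helper and the `log_time` column comes from the example above.

```bash
# Sketch: build a count() query bounded to a fixed half-open time window
# [from_ts, to_ts), suitable for TTL tables where full-table counts drift.
window_count_sql() {
  local db="$1" tbl="$2" col="$3" from_ts="$4" to_ts="$5"
  printf "SELECT count() FROM %s.%s WHERE %s >= toDateTime('%s') AND %s < toDateTime('%s')" \
    "$db" "$tbl" "$col" "$from_ts" "$col" "$to_ts"
}

# Run the generated query on both clusters, then compare the two numbers:
# window_count_sql lab_hot_backup mysql_slowlog_raw_local log_time \
#   '2026-03-01 00:00:00' '2026-03-02 00:00:00'
```

Keeping the window strictly in the past (older than the backup start, newer than the TTL horizon) avoids counting rows that either cluster may still be deleting or receiving.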
6. Notes
- `business_demo` is better for generating a stable sample dataset; `hot_backup_lab` is better for continuous-write drills.
- For TTL tables such as `mysql_slowlog_raw_local`, do not use the full-table row count from source and target as the only acceptance rule.
- If the source cluster is still receiving writes, strict cross-cluster comparison should be limited to fixed windows or write-stopped drills.
- Before generating large volumes of data, evaluate disk space, background merge pressure, and backup capacity.