ClickHouse Data Generation and Drill Tools

This article explains the helper tools under /usr/local/dbbot/clickhouse_ansible/examples. They are not deployment entry points. They exist to complete the drill flow of “deployment -> data generation -> backup -> restore -> data comparison”.

1. Tool location

Typical path:

/usr/local/dbbot/clickhouse_ansible/examples

The main tools are:

Path | Purpose | Typical usage
---- | ------- | -------------
examples/business_demo/generate_ck_business_demo.sh | Generate a one-off business-style demo dataset | Post-deployment acceptance, query demos, preparing baseline data before backup
examples/hot_backup_lab/init_lab.sh | Initialize the hot-backup lab database and tables | Prepare an exercise environment before backup and restore drills
examples/hot_backup_lab/writer_ctl.sh | Start, stop, and inspect the continuous writer | Verify ongoing writes during the backup window and compare data after restore
examples/hot_backup_lab/query_examples.sql | Provide a set of exercise SQL queries | Record pre-backup baselines and do quick post-restore checks

Notes:

  • examples/README.md gives a short directory overview.
  • examples/*/run/, logs, and PID files are runtime artifacts and should not be committed.
  • The IPs, passwords, and cluster names in the examples are placeholders. Replace them with your real environment values.

2. Position in the drill flow

For a full end-to-end ClickHouse drill, use this order:

  1. Finish ClickHouse Cluster Deployment
  2. Use the tools in this article to generate static data or continuous write traffic
  3. Run baseline SQL before backup and keep the results
  4. Run ClickHouse Backup
  5. Run ClickHouse Restore
  6. Use ClickHouse Data Comparison and Restore Validation or your baseline SQL to compare the restored result

3. business_demo: one-off business data generation

generate_ck_business_demo.sh is suitable when you want a business-like dataset right after deployment. Typical goals:

  • Verify distributed tables, replicated tables, and rollup tables
  • Demonstrate order, action, inventory, and slow-log analytics
  • Prepare a stable baseline dataset for later backup and restore drills

The default target database is lab_ck_biz. The script creates and fills these table groups:

  • Dimension tables: dim_shop, dim_category, dim_sku
  • Fact tables: fact_order_item, fact_user_action, fact_inventory_snapshot
  • Slow-log table: mysql_slowlog_raw
  • Rollup table: dws_sku_day
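
A quick sanity check after generation is to list the tables the script created; connection flags here are the same placeholders used in the examples below.

```shell
# List the dimension, fact, slow-log, and rollup tables in the demo database.
clickhouse-client \
  --host 192.0.2.11 --port 9000 \
  --user default --password '<clickhouse_password>' \
  --query "SHOW TABLES FROM lab_ck_biz"
```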

Minimal example:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/business_demo/generate_ck_business_demo.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --db lab_ck_biz \
  --reset

Common parameters:

  • --reset: drop and recreate the database before loading data
  • --orders: order-item row count, default 3000000
  • --actions: action-log row count, default 9000000
  • --slowlogs: slow-log row count, default 3000000
  • --days: time span in days, default 60
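
As a rough sizing aid, the default flags add up to 15 million fact rows spread over 60 days; this is plain shell arithmetic and needs no cluster access.

```shell
# Default row counts from the flags above.
ORDERS=3000000
ACTIONS=9000000
SLOWLOGS=3000000
TOTAL=$((ORDERS + ACTIONS + SLOWLOGS))
echo "defaults generate ${TOTAL} fact rows over 60 days"
```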

For a smaller smoke test:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/business_demo/generate_ck_business_demo.sh \
  --host 192.0.2.11 \
  --password '<clickhouse_password>' \
  --orders 500000 \
  --actions 1500000 \
  --slowlogs 500000

4. hot_backup_lab: continuous-write drill

hot_backup_lab is better for backup drills where writes should keep flowing during the backup window. It contains three parts:

  1. init_lab.sh: initialize lab databases, tables, and seed data
  2. writer_ctl.sh: continuously write into fact_order_item, fact_user_action, and mysql_slowlog_raw
  3. query_examples.sql: provide real-time write, ranking, funnel, slow SQL, and baseline queries

4.1 Initialize the lab database

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/init_lab.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --db lab_hot_backup \
  --reset

To prepare multiple lab databases at once, use --dbs:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/init_lab.sh \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --cluster 'example_3shards_2replicas' \
  --dbs lab_hot_backup,lab_hot_backup_ext,lab_hot_backup_analytics \
  --reset
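
After initialization, a simple sanity check is to confirm the lab databases exist (same placeholder credentials as above):

```shell
# Should print one line per lab database created by init_lab.sh.
clickhouse-client \
  --host 192.0.2.11 --port 9000 \
  --user default --password '<clickhouse_password>' \
  --query "SHOW DATABASES LIKE 'lab_hot_backup%'"
```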

4.2 Start continuous writes

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh start \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --db lab_hot_backup \
  --order-batch 3000 \
  --action-batch 9000 \
  --slow-batch 1500 \
  --interval 2
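
The batch flags imply an approximate steady-state insert rate, assuming the writer inserts one batch per table every --interval seconds; this is an assumption about writer_ctl.sh semantics, so verify it against your version.

```shell
# Approximate rows/s implied by the flags above, if one batch per table
# is inserted every --interval seconds (assumption, not verified).
ORDER_BATCH=3000
ACTION_BATCH=9000
SLOW_BATCH=1500
INTERVAL=2
RATE=$(( (ORDER_BATCH + ACTION_BATCH + SLOW_BATCH) / INTERVAL ))
echo "approx ${RATE} rows/s across the three tables"
```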

Common actions:

  • start: launch the background writer
  • run: run in the foreground
  • status: check process status and recent write volume
  • stop: stop the background writer
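
Assuming run takes the same connection flags as start, a foreground sketch looks like this; stop it with Ctrl-C:

```shell
cd /usr/local/dbbot/clickhouse_ansible
# Foreground mode: output stays attached to the terminal
# (assumption: run shares the connection flags of start).
bash examples/hot_backup_lab/writer_ctl.sh run \
  --host 192.0.2.11 --port 9000 --user default \
  --password '<clickhouse_password>' --db lab_hot_backup
```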

Check status:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh status \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --db lab_hot_backup

Stop the writer:

cd /usr/local/dbbot/clickhouse_ansible
bash examples/hot_backup_lab/writer_ctl.sh stop

Runtime files are written to:

  • examples/hot_backup_lab/run/writer.pid
  • examples/hot_backup_lab/run/writer.log
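
The PID file allows a quick liveness check outside writer_ctl.sh status; a minimal sketch, run from the clickhouse_ansible directory:

```shell
# Report whether the background writer is alive, based on the PID file.
RUN_DIR=examples/hot_backup_lab/run
if [ -f "$RUN_DIR/writer.pid" ] && kill -0 "$(cat "$RUN_DIR/writer.pid")" 2>/dev/null; then
  STATE=running
else
  STATE=stopped
fi
echo "writer is $STATE"
# tail -f "$RUN_DIR/writer.log"   # follow inserts while the writer runs
```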

5. How to compare data

5.1 Quick baseline comparison

For a simple drill, the fastest way is to run the same SQL once before backup and again after restore:

cd /usr/local/dbbot/clickhouse_ansible
clickhouse-client \
  --host 192.0.2.11 \
  --port 9000 \
  --user default \
  --password '<clickhouse_password>' \
  --multiquery < examples/hot_backup_lab/query_examples.sql

The last query block returns row totals for fact_order_item, fact_user_action, and mysql_slowlog_raw; record them before backup and compare them after restore.
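
To mechanize that baseline, one approach is to capture the query output to files and diff them; the /tmp paths are arbitrary, and the diff is only clean when writes are stopped for the drill.

```shell
# Capture the same baseline output before backup and after restore, then diff.
CK="clickhouse-client --host 192.0.2.11 --port 9000 \
    --user default --password <clickhouse_password> --multiquery"
$CK < examples/hot_backup_lab/query_examples.sql > /tmp/ck_baseline_before.txt
# ... run the backup and restore drill here ...
$CK < examples/hot_backup_lab/query_examples.sql > /tmp/ck_baseline_after.txt
diff /tmp/ck_baseline_before.txt /tmp/ck_baseline_after.txt && echo "baseline unchanged"
```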

5.2 Cross-cluster data comparison

For formal acceptance between the source cluster and the restored target cluster, use the dedicated playbook:

cd /usr/local/dbbot/clickhouse_ansible/playbooks
ansible-playbook \
  -i ../inventory/hosts.backup.ini \
  -i ../inventory/hosts.restore.ini \
  validate_restore_consistency.yml

Detailed usage is covered in ClickHouse Data Comparison and Restore Validation.

A minimal check example:

validate_checks:
  - name: "order_items_full_count"
    database: "lab_hot_backup"
    local_table: "fact_order_item_local"

  - name: "slowlog_recent_window"
    database: "lab_hot_backup"
    local_table: "mysql_slowlog_raw_local"
    where: "log_time >= toDateTime('2026-03-01 00:00:00')"

Notes:

  • Non-TTL tables are good candidates for full-table count() comparison.
  • Tables with TTL ... DELETE are better validated with a fixed time window.
  • validate_checks is maintained in playbooks/vars/validate_restore_config.yml.
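
The fixed-window rule can also be applied by hand: run one identical windowed query on the source and on the restored target, then compare the two counts. The target host 192.0.2.21 is a placeholder.

```shell
# Closed time window, so TTL deletions outside the window cannot skew the result.
SQL="SELECT count() FROM lab_hot_backup.mysql_slowlog_raw
     WHERE log_time >= toDateTime('2026-03-01 00:00:00')
       AND log_time <  toDateTime('2026-03-02 00:00:00')"
for HOST in 192.0.2.11 192.0.2.21; do   # source cluster, then restored target
  clickhouse-client --host "$HOST" --port 9000 \
    --user default --password '<clickhouse_password>' --query "$SQL"
done
```

Note that the window must still fall inside the TTL retention period on both clusters; once it expires, both counts collapse to zero and the check proves nothing.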

6. Notes

  1. business_demo is better for generating a stable sample dataset. hot_backup_lab is better for continuous-write drills.
  2. For TTL tables such as mysql_slowlog_raw_local, do not use the full-table row count from source and target as the only acceptance rule.
  3. If the source cluster is still receiving writes, strict cross-cluster comparison should be limited to fixed windows or write-stopped drills.
  4. Before generating large volumes of data, evaluate disk space, background merge pressure, and backup capacity.

7. Related articles

  1. ClickHouse Cluster Deployment
  2. ClickHouse Backup
  3. ClickHouse Restore
  4. ClickHouse Data Comparison and Restore Validation