MySQL MHA-Go Deployment Guide
This guide explains the support boundaries and basic usage of mha_go.yml. mha_go.yml deploys the Go rewrite of MHA (mha-go) — it drops the Perl MHA toolchain and runs as a single static binary plus one YAML config on the manager node.
1. How it works
mha-go is built around a controller main loop plus replica-side observers, with failover logic centered on GTID replication:
- Main loop: the manager probes every node on a `controller.monitor.interval` cadence (default `2s`). It declares a node unhealthy only after `failure_threshold` consecutive misses (default `3`) followed by a `reconfirm_timeout` re-check (default `5s`), filtering out transient noise.
- Secondary observer: in addition to the manager’s own probes, `cluster.yaml` registers a `secondary_checks` entry per replica, so replicas independently confirm the primary is alive. This prevents a single network partition at the manager from triggering an erroneous failover.
- Lease: `controller.lease.backend: local-memory` with a `15s` TTL ensures only one manager instance owns the switch decision at any time.
- GTID-only recovery: replication mode is fixed to `gtid`, so recovery does not rely on binlog-position or relay-log diff patching. `replication.salvage.policy: salvage-if-possible` governs whether to attempt to rescue transactions lost on a failed primary.
- Semi-sync preference: `semi_sync.policy: preferred` plus `wait_for_replica_count` keeps semi-sync on the happy path and degrades gracefully on timeout.
- Candidate priority: the order of `slave_ips` maps directly to `candidate_priority` (100, 90, 80, …) and drives new-primary selection at failover time.
- Writer endpoint abstraction: `writer_endpoint` is a pluggable layer. It defaults to `none`; enabling `vip` mode makes the manager invoke `/usr/local/bin/mha_ip_failover.sh`. The same abstraction can later target proxy- or DNS-based endpoints without touching the control plane.
- Runtime shape: the manager runs as the `mysql` user under `systemd`, emitting JSON logs to the journal. All CLI actions are consolidated under a single `mha` binary with subcommands.
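Pulling the knobs above together, a minimal sketch of the relevant `cluster.yaml` sections could look like the following. This is illustrative only — the key names quoted above are from this guide, but their exact nesting and the `wait_for_replica_count` value are assumptions; the authoritative file is the one the playbook renders:

```yaml
controller:
  monitor:
    interval: 2s              # probe cadence (default)
    failure_threshold: 3      # consecutive misses before a node is suspect
    reconfirm_timeout: 5s     # re-check window before declaring it dead
  lease:
    backend: local-memory     # single manager owns the switch decision
    ttl: 15s
replication:
  mode: gtid                  # fixed; no binlog-position fallback
  salvage:
    policy: salvage-if-possible
semi_sync:
  policy: preferred
  wait_for_replica_count: 1   # illustrative value, not a documented default
```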
2. Comparison with the legacy MHA
mha.yml deploys the community Perl MHA; mha_go.yml deploys the Go rewrite. Both target the same “one primary + multiple replicas + dedicated manager” topology, but they differ materially in implementation and support scope:
| Dimension | Legacy MHA (mha.yml) | MHA-Go (mha_go.yml) |
|---|---|---|
| Language | Perl | Go, single static binary |
| OS support | CentOS 7.5 / EL7 only | CentOS 7/8, RHEL 7/8, Rocky 9, BigCloud 7/8/21, openEuler 20/22/24, Anolis OS 8, Kylin V10 |
| MySQL support | 5.7 | 8.0 / 8.4 |
| External deps | Perl + MHA RPM + DBD::mysql, … | None beyond the shipped binary |
| Runtime shape | masterha_manager foreground / nohup | systemd service (mha-manager.service) |
| Runtime user | Usually root | mysql user — least privilege |
| Config format | INI (app1.cnf) | YAML (cluster.yaml) |
| Logs | Plain text | Structured JSON, ingested by journal |
| Replication | Binlog-position or GTID | GTID only |
| Failure detection | Single-point manager ping | Manager main loop plus replica-side secondary_checks reconfirmation |
| Failure threshold | ping_interval × count | failure_threshold and reconfirm_timeout decoupled |
| Lost-txn rescue | SSH into the dead primary, diff binlog / relay log | Native GTID-based recovery gated by salvage.policy |
| Candidate priority | candidate_master / no_master tags | candidate_priority 0–100, auto-decreased by slave_ips order |
| Writer endpoint | Hard-coded VIP script | writer_endpoint abstraction (none / vip, extensible) |
| CLI | Multiple masterha_* binaries | Single mha <subcommand> (check-repl / manager / switch / failover-plan / failover-execute / version) |
| Dry-run | None | --dry-run is a first-class flag |
| Distribution | Separate RPM, independent version | Bundled with dbbot releases |
Rule of thumb: prefer mha_go.yml for new clusters. Stick with mha.yml only when you must keep supporting legacy MySQL 5.7 + CentOS 7.5 topologies.
3. Support boundaries
- Target architecture: one primary + multiple replicas + one dedicated MHA-Go manager
- Replication: `GTID` replication is mandatory; `semi_sync` is preferred
- Applicable versions: MySQL `8.0` / `8.4`
- Supported OS matches the rest of `mysql_ansible`: `CentOS 7/8`, `RHEL 7/8`, `Rocky 9`, `BigCloud 7/8/21`, `openEuler 20/22/24`, `Anolis OS 8`, `Kylin V10`
4. Topology conventions
mha_go.yml reuses the [dbbot_mysql] host group and distinguishes roles through three variables — master_ip, slave_ips, and manager_ip:
- `master_ip`: the primary that accepts writes. Registered as `db1` with role `primary` in `cluster.yaml`.
- `slave_ips`: a list of at least one replica. Registered as `db2`, `db3`, … with role `replica`.
- `manager_ip`: the node that runs the `mha-manager` process. It must be one of `slave_ips` and cannot equal `master_ip`. Defaults to the last entry in `slave_ips`.
In cluster.yaml, replicas’ candidate_priority decreases with position in slave_ips — first replica 100, second 90, and so on. This priority drives candidate selection when the primary fails.
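With the `slave_ips` list from the variables example in this guide, the rendered replica entries would carry priorities like the following. The node names and the priority sequence follow the conventions stated above; the surrounding `nodes:` key layout is an assumption for illustration:

```yaml
# Rendered from slave_ips order: index 0 → 100, index 1 → 90, …
nodes:
  db2:
    host: 192.168.199.132
    role: replica
    candidate_priority: 100   # first entry in slave_ips
  db3:
    host: 192.168.199.133
    role: replica
    candidate_priority: 90    # second entry in slave_ips
```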
5. Inventory example
```ini
[dbbot_mysql]
192.168.199.131 ansible_user=root ansible_ssh_pass="'<your_ssh_password>'"
192.168.199.132 ansible_user=root ansible_ssh_pass="'<your_ssh_password>'"
192.168.199.133 ansible_user=root ansible_ssh_pass="'<your_ssh_password>'"

[all:vars]
ansible_python_interpreter=auto_silent
```
6. Key variables
Edit mysql_ansible/playbooks/vars/var_mha_go.yml:
```yaml
master_ip: 192.168.199.131
slave_ips:
  - 192.168.199.132
  - 192.168.199.133

# manager defaults to the last slave — override explicitly if you want a different node
manager_ip: "{{ slave_ips[-1] }}"

# Cluster name surfaced in cluster.yaml, mha log lines, and the systemd unit description
mha_go_cluster_name: app1

# --- Writer endpoint (optional VIP failover script) ---
# mha_go_writer_endpoint_enabled: true
# vip: 192.168.199.130
# vip_netmask: "32"
# net_work_interface: "ens33"
```
Other role variables you can override (defaults are usually fine):
| Variable | Default | Purpose |
|---|---|---|
| mha_go_binary_dest | /usr/local/bin/mha | Destination of the mha binary on the manager node |
| mha_go_config_dir | /etc/mha | Directory holding cluster.yaml |
| mha_go_log_dir | /var/log/mha | Manager log directory |
| mha_go_service_enabled | true | Whether to systemctl enable the manager service |
| mha_go_writer_endpoint_enabled | false | Enable VIP failover (requires vip / vip_netmask / net_work_interface) |
7. Prerequisites
Before executing the main tasks, make_mha_go validates the following. Any unmet condition fails the run immediately:
- Every node’s `datadir` must contain `master_slave_finish.flag` — i.e. MySQL was installed by dbbot and the primary/replica topology was built.
- The manager node must have `/tmp/ssh_finish.flag` — i.e. `make_ssh_passwordless` has already run.
- `SELECT @@gtid_mode` must return `ON` on every node.
mha_go.yml itself chains the required roles so you do not need to run anything by hand first:
pre_check_and_set → mysql_server → make_replication → make_ssh_passwordless → make_mha_go
Running mha_go.yml alone is enough.
8. Entry point
```shell
cd /usr/local/dbbot/mysql_ansible/playbooks
ansible-playbook mha_go.yml
```
For non-interactive runs (CI or automation), bypass the confirmation.yml pause prompt:
```shell
ansible-playbook mha_go.yml -e dbbot_confirmation_input=confirm
```
9. Artifacts produced
On the manager node after a successful run:
- `/usr/local/bin/mha`: the Go static binary. `mha version` prints `mha-go 0.x.y`.
- `/etc/mha/cluster.yaml`: cluster definition with `db1` = `primary` and `db2..` = `replica`, replication mode `gtid`, semi-sync policy `preferred`.
- `/etc/systemd/system/mha-manager.service`: a `Type=simple` systemd unit that starts `mha manager --config /etc/mha/cluster.yaml --log-format json` as `{{ mysql_user }}`.
- `/var/log/mha/`: log directory (JSON output; systemd also captures it via the journal).
Every node (primary, replicas, manager) gets mha_go_finish.flag in its datadir so downstream playbooks can recognize it.
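For orientation, here is a sketch of what `mha-manager.service` plausibly contains, based only on the facts above (`Type=simple`, the `mysql` user, and the manager command line). Any other directives shown are common-practice assumptions, not guarantees about what the playbook renders:

```ini
[Unit]
Description=MHA-Go manager (app1)

[Service]
Type=simple
User=mysql
ExecStart=/usr/local/bin/mha manager --config /etc/mha/cluster.yaml --log-format json
# Restart behavior is an assumption — not stated in this guide
Restart=on-failure

[Install]
WantedBy=multi-user.target
```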
10. Common commands
All of these run on the manager node, where `mha` is `/usr/local/bin/mha`:
```shell
# Print version
mha version

# One-shot config and replication health check, no manager loop
mha check-repl --config /etc/mha/cluster.yaml

# Print the failover plan only (dry run, no action)
mha failover-plan --config /etc/mha/cluster.yaml

# Execute failover promoting a specific candidate
mha failover-execute --config /etc/mha/cluster.yaml --candidate db2

# Controlled switchover to a chosen new primary
mha switch --config /etc/mha/cluster.yaml --new-primary db2 --dry-run
```
Service status and logs:
```shell
systemctl status mha-manager
journalctl -u mha-manager -f
```
11. Enabling the VIP writer endpoint
By default the deployment does not use a VIP and cluster.yaml writes writer_endpoint.kind: none. To expose a stable write VIP:
1. Enable it in `vars/var_mha_go.yml`:

```yaml
mha_go_writer_endpoint_enabled: true
vip: 192.168.199.130
vip_netmask: "32"
net_work_interface: ens33
```

2. The playbook then runs `edit_sudoer.yml` and `deploy_vip_script.yml`, adding `ip addr` / `arping` sudo rules for the `mysql` user on every MySQL node and dropping `/usr/local/bin/mha_ip_failover.sh` on the manager.

3. `cluster.yaml`’s `writer_endpoint` becomes:

```yaml
writer_endpoint:
  kind: vip
  target: <vip>
  command: /usr/local/bin/mha_ip_failover.sh
```

4. `vip` / `vip_netmask` / `net_work_interface` must match the real network: `vip` must be a valid IP address and `vip_netmask` must be in `0–32`.
12. Things to note
- `manager_ip` must be a member of `slave_ips`. `validate_mha_go.yml` hard-fails on inventory/vars mismatch or a manager pointed at the primary.
- Every IP in `slave_ips` must also appear in `inventory/hosts.ini`, and vice versa — “in inventory but not in vars” is rejected.
- `mha-manager.service` runs as the `mysql` user; `cluster.yaml` is mode `0640`, owned by `mysql:mysql`.
- Before re-running after a failure, check: the `mha-manager` service state, whether `cluster.yaml` was edited by hand, and that GTID / replication state is consistent across nodes.
- Air-gapped environments must pre-stage yum dependencies (`python3-libselinux`, `ncurses-compat-libs`, `numactl`, `libaio`, `tar`); otherwise `pre_check_and_set` will fail when yum tries to refresh metadata.