MySQL MHA-Go Deployment Guide

This article explains the support boundaries and basic usage of mha_go.yml. mha_go.yml deploys the Go rewrite of MHA (mha-go): it drops the Perl MHA toolchain and runs as a single static binary plus one YAML config on the manager node.

1. How it works

mha-go is built around a controller main loop plus replica-side observers, with failover logic centered on GTID replication:

  • Main loop: the manager probes every node on a controller.monitor.interval cadence (default 2s). It declares a node unhealthy only after failure_threshold consecutive misses (default 3) followed by a reconfirm_timeout re-check (default 5s), filtering out transient noise.
  • Secondary observer: in addition to the manager’s own probes, cluster.yaml registers a secondary_checks entry per replica, so replicas independently confirm the primary is alive. This prevents a single network partition at the manager from triggering an erroneous failover.
  • Lease: controller.lease.backend: local-memory with a 15s TTL ensures only one manager instance owns the switch decision at any time.
  • GTID-only recovery: replication mode is fixed to gtid, so recovery does not rely on binlog-position or relay-log diff patching. replication.salvage.policy: salvage-if-possible governs whether to attempt to rescue transactions lost on a failed primary.
  • Semi-sync preference: semi_sync.policy: preferred plus wait_for_replica_count keeps semi-sync on the happy path and degrades gracefully on timeout.
  • Candidate priority: the order of slave_ips maps directly to candidate_priority (100, 90, 80, …) and drives new-primary selection at failover time.
  • Writer endpoint abstraction: writer_endpoint is a pluggable layer. It defaults to none; enabling vip mode makes the manager invoke /usr/local/bin/mha_ip_failover.sh. The same abstraction can later target proxy- or DNS-based endpoints without touching the control plane.
  • Runtime shape: the manager runs as the mysql user under systemd, emitting JSON logs to the journal. All CLI actions are consolidated under a single mha binary with subcommands.
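
Pulled together, the knobs above would land in cluster.yaml roughly as follows. This is a sketch reconstructed from the field names in this section, not a verbatim file; the playbook-generated schema may nest or name these differently:

```yaml
# Hypothetical cluster.yaml sketch assembled from the settings named above.
controller:
  monitor:
    interval: 2s            # probe cadence (default 2s)
    failure_threshold: 3    # consecutive misses before a node is suspect
    reconfirm_timeout: 5s   # re-check window before declaring it down
  lease:
    backend: local-memory   # single-manager lease
    ttl: 15s
replication:
  mode: gtid                # GTID-only; no binlog-position recovery
  salvage:
    policy: salvage-if-possible
semi_sync:
  policy: preferred
  wait_for_replica_count: 1 # assumed value, for illustration only
writer_endpoint:
  kind: none                # default; vip mode invokes mha_ip_failover.sh
```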

2. Comparison with the legacy MHA

mha.yml deploys the community Perl MHA; mha_go.yml deploys the Go rewrite. Both target the same “one primary + multiple replicas + dedicated manager” topology, but they differ materially in implementation and support scope:

Dimension          | Legacy MHA (mha.yml)                               | MHA-Go (mha_go.yml)
-------------------|----------------------------------------------------|--------------------
Language           | Perl                                               | Go, single static binary
OS support         | CentOS 7.5 / EL7 only                              | CentOS 7/8, RHEL 7/8, Rocky 9, BigCloud 7/8/21, openEuler 20/22/24, Anolis OS 8, Kylin V10
MySQL support      | 5.7                                                | 8.0 / 8.4
External deps      | Perl + MHA RPM + DBD::mysql, …                     | None beyond the shipped binary
Runtime shape      | masterha_manager foreground / nohup                | systemd service (mha-manager.service)
Runtime user       | Usually root                                       | mysql user (least privilege)
Config format      | INI (app1.cnf)                                     | YAML (cluster.yaml)
Logs               | Plain text                                         | Structured JSON, ingested by the journal
Replication        | Binlog-position or GTID                            | GTID only
Failure detection  | Single-point manager ping                          | Manager main loop plus replica-side secondary_checks reconfirmation
Failure threshold  | ping_interval × count                              | failure_threshold and reconfirm_timeout decoupled
Lost-txn rescue    | SSH into the dead primary, diff binlog / relay log | Native GTID-based recovery gated by salvage.policy
Candidate priority | candidate_master / no_master tags                  | candidate_priority 0–100, auto-decreased by slave_ips order
Writer endpoint    | Hard-coded VIP script                              | writer_endpoint abstraction (none / vip, extensible)
CLI                | Multiple masterha_* binaries                       | Single mha <subcommand> (check-repl / manager / switch / failover-plan / failover-execute / version)
Dry-run            | None                                               | --dry-run is a first-class flag
Distribution       | Separate RPM, independent version                  | Bundled with dbbot releases

Rule of thumb: prefer mha_go.yml for new clusters. Stick with mha.yml only when you must keep supporting legacy MySQL 5.7 + CentOS 7.5 topologies.

3. Support boundaries

  • Target architecture: one primary + multiple replicas + one dedicated MHA-Go manager
  • Replication: GTID replication is mandatory, semi_sync is preferred
  • Applicable versions: MySQL 8.0 / 8.4
  • Supported OS matches the rest of mysql_ansible: CentOS 7/8, RHEL 7/8, Rocky 9, BigCloud 7/8/21, openEuler 20/22/24, Anolis OS 8, Kylin V10

4. Topology conventions

mha_go.yml reuses the [dbbot_mysql] host group and distinguishes roles through three variables — master_ip, slave_ips, and manager_ip:

  • master_ip: the primary that accepts writes. Registered as db1 with role primary in cluster.yaml.
  • slave_ips: a list of at least one replica. Registered as db2, db3, … with role replica.
  • manager_ip: the node that runs the mha-manager process. It must be one of slave_ips and cannot equal master_ip. Defaults to the last entry in slave_ips.

In cluster.yaml, replicas’ candidate_priority decreases with position in slave_ips — first replica 100, second 90, and so on. This priority drives candidate selection when the primary fails.
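
As a sketch, the generated node entries would carry decreasing priorities in slave_ips order. The exact key nesting is an assumption; only the priority rule (100, 90, …) comes from the documented behavior:

```yaml
# Hypothetical cluster.yaml excerpt illustrating the priority ordering rule.
nodes:
  - name: db1
    host: 192.168.199.131
    role: primary
  - name: db2
    host: 192.168.199.132
    role: replica
    candidate_priority: 100   # first entry in slave_ips
  - name: db3
    host: 192.168.199.133
    role: replica
    candidate_priority: 90    # second entry in slave_ips
```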

5. Inventory example

[dbbot_mysql]
192.168.199.131 ansible_user=root ansible_ssh_pass="'<your_ssh_password>'"
192.168.199.132 ansible_user=root ansible_ssh_pass="'<your_ssh_password>'"
192.168.199.133 ansible_user=root ansible_ssh_pass="'<your_ssh_password>'"

[all:vars]
ansible_python_interpreter=auto_silent

6. Key variables

Edit mysql_ansible/playbooks/vars/var_mha_go.yml:

master_ip: 192.168.199.131
slave_ips:
  - 192.168.199.132
  - 192.168.199.133

# manager defaults to the last slave — override explicitly if you want a different node
manager_ip: "{{ slave_ips[-1] }}"

# Cluster name surfaced in cluster.yaml, mha log lines, and the systemd unit description
mha_go_cluster_name: app1

# --- Writer endpoint (optional VIP failover script) ---
# mha_go_writer_endpoint_enabled: true
# vip: 192.168.199.130
# vip_netmask: "32"
# net_work_interface: "ens33"

Other role variables you can override (defaults are usually fine):

Variable                       | Default            | Purpose
-------------------------------|--------------------|--------
mha_go_binary_dest             | /usr/local/bin/mha | Destination of the mha binary on the manager node
mha_go_config_dir              | /etc/mha           | Directory holding cluster.yaml
mha_go_log_dir                 | /var/log/mha       | Manager log directory
mha_go_service_enabled         | true               | Whether to systemctl enable the manager service
mha_go_writer_endpoint_enabled | false              | Enable VIP failover (requires vip / vip_netmask / net_work_interface)
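
These are ordinary role variables, so any of them can be overridden in vars/var_mha_go.yml (or via -e on the command line). For example, the values below are illustrative overrides, not recommended settings:

```yaml
# Example overrides using the variables from the table above.
mha_go_binary_dest: /opt/mha/bin/mha   # non-default install path (example)
mha_go_log_dir: /data/logs/mha         # move logs onto a data volume
mha_go_service_enabled: false          # install the unit but do not enable it at boot
```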

7. Prerequisites

Before executing the main tasks, make_mha_go validates the following. Any unmet condition fails the run immediately:

  • Every node’s datadir must contain master_slave_finish.flag — i.e. MySQL was installed by dbbot and the primary/replica topology was built.
  • The manager node must have /tmp/ssh_finish.flag — i.e. make_ssh_passwordless has already run.
  • SELECT @@gtid_mode must return ON on every node.
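
In Ansible terms, those pre-flight checks amount to something like the sketch below. The task names, module choices, and variable names are illustrative, not the real make_mha_go tasks:

```yaml
# Illustrative pre-flight checks; the actual role's tasks may differ.
- name: Check replication build flag on every node
  ansible.builtin.stat:
    path: "{{ mysql_datadir }}/master_slave_finish.flag"   # mysql_datadir is assumed
  register: repl_flag

- name: Fail fast when the topology was not built by dbbot
  ansible.builtin.assert:
    that: repl_flag.stat.exists
    fail_msg: "master_slave_finish.flag missing; run make_replication first"

- name: Read gtid_mode on every node
  ansible.builtin.command: mysql -NBe "SELECT @@gtid_mode"
  register: gtid
  changed_when: false

- name: Abort unless GTID replication is enabled
  ansible.builtin.assert:
    that: gtid.stdout == "ON"
```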

mha_go.yml itself chains the required roles so you do not need to run anything by hand first:

pre_check_and_set → mysql_server → make_replication → make_ssh_passwordless → make_mha_go

Running mha_go.yml alone is enough.

8. Entry point

cd /usr/local/dbbot/mysql_ansible/playbooks
ansible-playbook mha_go.yml

For non-interactive runs (CI or automation), bypass the confirmation.yml pause prompt:

ansible-playbook mha_go.yml -e dbbot_confirmation_input=confirm

9. Artifacts produced

On the manager node after a successful run:

  • /usr/local/bin/mha: the Go static binary. mha version prints mha-go 0.x.y.
  • /etc/mha/cluster.yaml: cluster definition with db1 as primary and db2, db3, … as replicas, replication mode gtid, semi-sync policy preferred.
  • /etc/systemd/system/mha-manager.service: a Type=simple systemd unit that starts mha manager --config /etc/mha/cluster.yaml --log-format json as {{ mysql_user }}.
  • /var/log/mha/: log directory (JSON output, systemd also captures it via journal).

Every node (primary, replicas, manager) gets mha_go_finish.flag in its datadir so downstream playbooks can recognize it.

10. Common commands

All of these run on the manager node, where mha is /usr/local/bin/mha:

# Print version
mha version

# One-shot config and replication health check, no manager loop
mha check-repl --config /etc/mha/cluster.yaml

# Print the failover plan only (dry run, no action)
mha failover-plan --config /etc/mha/cluster.yaml

# Execute failover promoting a specific candidate
mha failover-execute --config /etc/mha/cluster.yaml --candidate db2

# Controlled switchover to a chosen new primary
mha switch --config /etc/mha/cluster.yaml --new-primary db2 --dry-run

Service status and logs:

systemctl status mha-manager
journalctl -u mha-manager -f

11. Enabling the VIP writer endpoint

By default the deployment does not use a VIP and cluster.yaml writes writer_endpoint.kind: none. To expose a stable write VIP:

  1. Enable it in vars/var_mha_go.yml:

    mha_go_writer_endpoint_enabled: true
    vip: 192.168.199.130
    vip_netmask: "32"
    net_work_interface: ens33
    
  2. The playbook then runs edit_sudoer.yml and deploy_vip_script.yml, adding ip addr / arping sudo rules for the mysql user on every MySQL node and dropping /usr/local/bin/mha_ip_failover.sh on the manager.

  3. cluster.yaml’s writer_endpoint becomes:

    writer_endpoint:
      kind: vip
      target: <vip>
      command: /usr/local/bin/mha_ip_failover.sh
    
  4. vip / vip_netmask / net_work_interface must match the real network. vip must be a valid IP and vip_netmask must be in 0–32.
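
Regarding step 2: the exact sudo rules are not spelled out here, but a plausible shape for what edit_sudoer.yml drops (file location and binary paths are assumptions) is:

```
# Hypothetical /etc/sudoers.d/ entry; paths and file name are assumed.
mysql ALL=(root) NOPASSWD: /usr/sbin/ip, /usr/sbin/arping
```

The point is least privilege: the mysql user gets passwordless sudo only for the two commands the VIP script needs (moving the address and broadcasting gratuitous ARP), not a blanket root grant.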

12. Things to note

  • manager_ip must be a member of slave_ips. validate_mha_go.yml hard-fails on inventory/vars mismatch or a manager pointed at the primary.
  • Every IP in slave_ips must also appear in inventory/hosts.ini, and vice versa — “in inventory but not in vars” is rejected.
  • mha-manager.service runs as the mysql user; cluster.yaml is mode 0640 owned by mysql:mysql.
  • Before re-running after a failure, check: the mha-manager service state, whether cluster.yaml was edited by hand, and that GTID / replication state is consistent across nodes.
  • Air-gapped environments must pre-stage yum dependencies (python3-libselinux, ncurses-compat-libs, numactl, libaio, tar); otherwise pre_check_and_set will fail when yum tries to refresh metadata.