1. 概述
它由日本DeNA公司youshimaton(现就职于Facebook公司)开发,MHA来保证数据库系统的高可用.在宕机的时间内(通常10—30秒内),完成故障切换,部署MHA,可避免主从一致性问题,节约购买新服务器的费用,不影响服务器性能,易安装,不改变现有部署。
还支持在线切换,从当前运行master切换到一个新的master上面,只需要很短的时间(0.5-2秒内),此时仅仅阻塞写操作,并不影响读操作,便于主机硬件维护。

工作原理
(1)从宕机崩溃的master保存二进制日志事件(binlog events);
(2)识别含有最新更新的slave;
(3)应用差异的中继日志(relay log)到其他的slave;
(4)应用从master保存的二进制日志事件(binlog events);
(5)提升一个slave为新的master;
(6)使其他的slave连接新的master进行复制;
2. 搭建环境
1. 架构说明:
IP | 主机名 | 数据库ID | 类型 |
192.168.2.51 | mysql01 | 251 | MySQL Master + mha-node |
192.168.2.52 | mysql02 | 252 | Mha-node+ Candidate Slave |
192.168.2.53 | mysql03 | 223 | Mha-node |
192.168.2.54 | mysql-mha01 | | mha-manager |
OS:CentOS 6.9
Mysql:5.7.20
Mha:0.57
2:配置/etc/my.cnf相关参数,在3各节点中分别配置GTID
log-bin=binlog
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
enforce-gtid-consistency=true
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
slave-parallel-workers=2
binlog-checksum=CRC32
master-verify-checksum=1
slave-sql-verify-checksum=1
binlog-rows-query-log_events=1
重新初始化MySQL(非必须,这部是由于自动化脚本无法判定密码,要自动化必须剔除密码)
mysqld --initialize-insecure --user=mysql
service mysqld restart
设置root密码,创建复制用户:
mysql> use mysql;
mysql> GRANT ALL PRIVILEGES ON *.* TO root@"%" IDENTIFIED BY "123456";
mysql> GRANT GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' identified by '123456';
mysql> grant all privileges on *.* to 'mha'@'%' identified by '123456';
mysql> flush privileges;
在主导出mysqldump -uroot -p --all-databases --triggers --routines --events > /root/all.sql
在备库导入
3:在mysql2、mysql3配置Gtid复制
CHANGE MASTER TO
MASTER_HOST = '192.168.2.52',
MASTER_PORT = 3306,
MASTER_USER = 'repl',
MASTER_PASSWORD = '123456',
MASTER_AUTO_POSITION = 1;
3. MHA Manager安装
3.1. manager节点安装依赖软件
yum install perl-DBD-MySQL perl-ExtUtils-MakeMaker perl-CPAN perl-Mail-Sender perl-Log-Dispatch perl-Time-HiRes perl-Config-Tiny perl-Parallel-ForkManager perl-Module-Install perl-Config-IniFiles –y
3.2. 安装MHA-Manager的RPM
git clone https://gitee.com/gibsonxue/MHA.git
cd mha ; yum install mha4mysql-manager-0.57-0.el6.noarch.rpm ; mha4mysql-node-0.57-0.el6.noarch.rpm
3.3. 免秘钥登录配置
在manager节点需要配置到所有node节点的ssh免密码登录。
# ssh-keygen
#依次添加信任sh-copy-id root@ip{x.x.x.x}
#scp –r ~/.ssh root@
-rw------- 1 root root 398 Nov 19 11:24 authorized_keys
-rw------- 1 root root 1671 Nov 19 11:19 id_rsa
-rw-r--r-- 1 root root 398 Nov 19 11:20 id_rsa.pub
-rw-r--r-- 1 root root 1576 Nov 19 11:51 known_hosts
拷贝整个.ssh目录到所有其他服务器
3.4. 脚本定义
定义的脚本如下放入/usr/bin/下面:
3.4.1. master_ip_failover_01 故障切换时执行脚本
#!/usr/bin/env perl
use strict;
use warnings FATAL =>'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.2.50/24'; # Virtual IP
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "\n\n\n***************************************************************\n";
print "Disabling the VIP - $vip on old master: $orig_master_host\n";
print "***************************************************************\n\n\n\n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "\n\n\n***************************************************************\n";
print "Enabling the VIP - $vip on new master: $new_master_host \n";
print "***************************************************************\n\n\n\n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover –command=start|stop|stopssh|status –orig_master_host=host –orig_master_ip=ip –orig_master_port=po
rt –new_master_host=host –new_master_ip=ip –new_master_port=port\n";
}
3.4.2. master_ip_online_change_01无故障维护切换使用
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;
my $_tstart;
my $_running_interval = 0.1;
my $vip = '192.168.2.50'; # Virtual IP
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I eth0 -c 1";
my (
$command, $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip, $orig_master_port, $orig_master_user,
$orig_master_password, $orig_master_ssh_user,
$new_master_host, $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, $new_master_ssh_user,
);
GetOptions(
'command=s' => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
'orig_master_password=s' => \$orig_master_password,
'orig_master_ssh_user=s' => \$orig_master_ssh_user,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
'new_master_ssh_user=s' => \$new_master_ssh_user,
);
exit &main();
sub start_vip(){
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}
sub stop_vip(){
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}
sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}
sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type = 0 unless ($type);
my @threads;
my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();
while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= 1 );
if ( $type >= 1 ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
}
if ( $type >= 2 ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
}
push @threads, $ref;
}
return @threads;
}
sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect();
# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 );
## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
$orig_master_handler->disable_log_bin_local();
# print current_time_us() . " Drpping app user on the orig master..\n";
#drop_app_user($orig_master_handler);
## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > 0 && $#threads >= 0 ) {
if ( $time_until_read_only % 5 == 0 ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_read_only * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
## Setting read_only=1 on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
if ( $time_until_kill_threads % 5 == 0 ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
print current_time_us() . " done.\n";
$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect();
## Droping the VIP
print "Disabling the VIP an old master: $orig_master_host \n";
&stop_vip();
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master's ip to the catalog database
# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
## Set read_only=0 on the new master
$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();
## Creating an app user on the new master
#print current_time_us() . " Creating app user on the new master..\n";
# create_app_user($new_master_handler);
print "Enabling the VIP $vip on the new master: $new_master_host \n";
&start_vip();
$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();
## Update master ip on the catalog database, etc
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}
sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}
3.4.3. 发送报告使用send_report.py
#!/usr/bin/env python
#-*- encoding:utf-8 -*-
#-------------------------------------------------------------------------------
# Name: send_report.py
# Author: zhoujy
#----------------------------------------------
import os
import sys
import time
import datetime
import smtplib
import subprocess
import fileinput
import getopt
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.Utils import COMMASPACE, formatdate
reload(sys)
sys.setdefaultencoding('utf8')
def send_mail(to, subject, text, from_mail, server="localhost"):
message = MIMEMultipart()
message['From'] = from_mail
message['To'] = COMMASPACE.join(to)
message['Date'] = formatdate(localtime=True)
message['Subject'] = subject
message.attach(MIMEText(text,_charset='utf-8'))
smtp = smtplib.SMTP(server)
smtp.sendmail(from_mail, to, message.as_string())
smtp.close()
if __name__ == "__main__":
opts,args = getopt.getopt(sys.argv[1:],"h",["orig_master_host=","new_master_host=","new_slave_hosts=","conf=","subject=","body=","app_vip=","new_master_ssh_port=","ssh_user="])
# print opts,args
for lines in opts:
key,values = lines
if key == '--orig_master_host':
orig_master_host = values
if key == '--new_master_host':
new_master_host = values
if key == '--new_slave_hosts':
new_slave_hosts = values
if key == '--subject':
subject = values
if key == '--body':
body = values
# text = sys.stdin.read()
mail_list = ['xueyuanfeng@cangoonline.com']
send_mail(mail_list, subject.encode("utf8"), body, "MHA_Monitor@smtp.cangoonline.com", server="127.0.0.1")
3.4.4. 编辑配置文件
[server default]
manager_log=/var/log/masterha/app01/manager.log
manager_workdir=/var/log/masterha/app01
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/bin/master_ip_failover_01
master_ip_online_change_script=/usr/bin/master_ip_online_change_01
user=mha
password=123456
ping_interval=2
remote_workdir=/tmp
repl_user=repl
repl_password=123456
report_script=/usr/bin/send_report.py
secondary_check_script=/usr/bin/masterha_secondary_check -s MYSQL01 --user=root --port=22 --master_host=MYSQL02 --master_port=3306
shutdown_script=""
ssh_user=root
ssh_port=22
ping_type=CONNECT
[server1]
candidate_master=1
hostname=MYSQL01
port=3306
[server2]
candidate_master=1
hostname=MYSQL02
port=3306
[server3]
hostname=MYSQL03
port=3306
3.4.5. 启动MHA
nohup masterha_manager --conf=/etc/masterha/app01/app01.conf < /dev/null > /var/log/masterha/app01/manager.log 2>&1 &
4. MHA Node安装
4.1. manager节点需要安装node软件。
yum install perl-DBD-MySQL perl-ExtUtils-MakeMaker perl-CPAN perl-Mail-Sender perl-Log-Dispatch perl-Time-HiRes perl-Config-Tiny perl-Parallel-ForkManager perl-Module-Install perl-Config-IniFiles –y
4.2. 安装MHA-Node的RPM
git clone https://gitee.com/gibsonxue/MHA.git
cd mha ; mha4mysql-node-0.57-0.el6.noarch.rpm
5. MHA部署检查
5.1. 参数文件检查
检查文件权限是否为600。
5.2. masterha_check_ssh
masterha_check_ssh --conf=/etc/app01.cnf
检查结果必须是:
All SSH connection tests passed successfully.
5.3. masterha_check_repl
masterha_check_repl --conf=/etc/app01.cnf
检查结果必须是
MySQL Replication Health is OK.
5.4. masterha_secondary_check
具体脚本见配置文件里面secondary_check_script后面的内容。
检查结果必须是:
Master is reachable from
5.5. send_report
send_report --orig_master_host=x.x.x.x --new_master_host=y.y.y.y --new_slave_hosts=x.x.x.x --subject=/etc/masterha/app01.cnf --body="这是一封测试邮件"
必须要可以发送成功。
5.6. master_ip_failover_script
重点检查 vip 和 网卡 设置等是否正确。
5.7. master_ip_online_change_script
重点检查 vip 和 网卡 设置等是否正确。
5.8. manager.log
检查日志 /var/log/masterha/app01/manager.log 是否有报错。