7. HDFS NameNode HA
7.1 NameNode HA Overview
HA (High Availability) means the service stays up 7x24 without interruption. It is implemented with an active/standby NameNode pair: if the active NameNode fails, service switches over to the standby NameNode, which removes the NameNode single point of failure.
- The key goal of high availability is to eliminate single points of failure. Strictly speaking, HA is provided per component: HDFS HA and YARN HA.
- In Hadoop 1.x the NameNode is a single point of failure in an HDFS cluster; from Hadoop 2.0 onward, NameNode HA removes that single point of failure.
- Hadoop 2.x supports HA with two NameNodes, one active and one standby. Hadoop 3.x supports two or more NameNodes: one active with one or more standbys.
- The NameNode affects HDFS cluster availability in two main ways:
  - If the NameNode machine fails unexpectedly (e.g. crashes), the cluster is unusable until an administrator repairs and restarts it.
  - If the NameNode machine needs a software or hardware upgrade, the cluster is likewise unusable during the maintenance.
- HDFS HA addresses both problems by configuring a pair of active/standby NameNodes as a hot standby inside the cluster. When a failure occurs, such as a machine crash, or a machine needs upgrade maintenance, the NameNode role can be switched to the other machine very quickly.
7.2 Automatic NameNode HA Overview

Automatic active/standby NameNode failover removes the single point of failure:
- The active NameNode serves clients; the standby NameNode keeps its metadata in sync with the active NameNode so that it is ready to take over.
- All DataNodes report block information (locations) to both NameNodes at the same time.
- JNN: the JournalNode cluster synchronizes the edits log between the NameNodes.
- Standby: the standby NameNode merges fsimage + edits log into a new fsimage and pushes it back to the ANN (active NameNode).
- Automatic failover: a ZooKeeper-based automatic failover scheme.
- ZooKeeper Failover Controller (ZKFC): monitors NameNode health and registers the NameNode with ZooKeeper. When the active NameNode goes down, the ZKFCs compete for a lock in ZooKeeper, and the NameNode whose ZKFC acquires the lock becomes active.
7.3 Building the Automatic NameNode HA Cluster
7.3.1 Planning

- If you revert to the "初始化" (initialization) snapshot:
  - Confirm the firewall stays disabled (Linux stage)
  - Configure passwordless SSH between the 4 servers (ZooKeeper stage)
  - Install the JDK on all 4 servers and configure the environment variables (Linux stage)
  - Install ZooKeeper on node2-node4 (ZooKeeper stage)
  - Configure ssh not to prompt
  - Configure HDFS HA
- If you revert to the "format_pre" snapshot (taken before the NameNode was formatted):
  - Configure ssh not to prompt
  - Configure HDFS HA
- If you do not revert to a snapshot and continue from the current state:
  - Delete everything under /var/itbaizhan/hadoop/full and /opt/hadoop-3.1.3/logs
  - Configure ssh not to prompt
  - Configure HDFS HA
7.3.2 Suppressing SSH Prompts
Later we will write shell scripts to start and stop the HDFS HA cluster. Those scripts run commands like ssh nodeX ..., which by default print an annoying host-key fingerprint prompt. How do we stop ssh from prompting about fingerprints?
Edit /etc/ssh/ssh_config (the client configuration file, as opposed to sshd_config, the server configuration file):
[root@node0 ~]# vim /etc/ssh/ssh_config
#   StrictHostKeyChecking ask
# Find the line above and change it to:
StrictHostKeyChecking no
# Copy the file to the other virtual machines
[root@node0 ~]# scp /etc/ssh/ssh_config node1:/etc/ssh/
ssh_config                100% 2271     1.9MB/s   00:00
[root@node0 ~]# scp /etc/ssh/ssh_config node2:/etc/ssh/
ssh_config                100% 2271     2.0MB/s   00:00
[root@node0 ~]# scp /etc/ssh/ssh_config node3:/etc/ssh/
ssh_config                100% 2271     2.2MB/s   00:00
7.3.3 HDFS Configuration
After stopping the HDFS cluster, delete everything under /var/itbaizhan/hadoop/full and /opt/hadoop-3.1.3/logs on all four nodes:
rm -rf /var/itbaizhan/hadoop/full
rm -rf /opt/hadoop3.1.3/logs
Do all of the following on node0, then scp the files to node1, node2 and node3.
- Configure the JDK in hadoop-env.sh
[root@node0 ~]# cd /opt/hadoop-3.1.3/etc/hadoop/
[root@node0 hadoop]# vim hadoop-env.sh
# Add the code shown in the original screenshot (a sketch of typical contents follows below)
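The screenshot is not reproduced here. A minimal sketch of what is typically added to hadoop-env.sh in this setup; the JAVA_HOME path is an assumption based on the $PATH output later in this section, which includes /usr/java/default/bin:

# Assumed JDK location; point Hadoop at the JDK installed earlier
export JAVA_HOME=/usr/java/default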

- Edit workers to specify where the DataNodes run (a sketch of its contents follows below)
[root@node0 hadoop]# pwd
/opt/hadoop-3.1.3/etc/hadoop
[root@node0 hadoop]# vim workers
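The contents of workers are not shown in the original. A sketch under the assumption that the DataNodes stay where the jps output later in this section shows DataNode processes (node0, node1 and node2); adjust the hostnames to your own cluster:

node0
node1
node2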

- Edit core-site.xml
[root@node0 hadoop]# pwd
/opt/hadoop-3.1.3/etc/hadoop
[root@node0 hadoop]# vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- Data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/itbaizhan/hadoop/ha</value>
    </property>
    <!-- Location and client port of each ZooKeeper server -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node1:2181,node2:2181,node3:2181</value>
    </property>
    <!-- Work around "permission denied" when creating/deleting files from the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
</configuration>

- hdfs-site.xml
[root@node0 hadoop]# pwd
/opt/hadoop-3.1.3/etc/hadoop
[root@node0 hadoop]# vim hdfs-site.xml
<configuration>
    <!-- JournalNode data storage directory -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>${hadoop.tmp.dir}/dfs/journalnode/</value>
    </property>
    <!-- Nameservice (cluster) name -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- The NameNodes in the cluster -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn0,nn1</value>
    </property>
    <!-- RPC addresses of the NameNodes -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn0</name>
        <value>node0:9820</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>node1:9820</value>
    </property>
    <!-- HTTP addresses of the NameNodes -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn0</name>
        <value>node0:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>node1:9870</value>
    </property>
    <!-- Where the NameNode shared edits are stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node0:8485;node1:8485;node2:8485/mycluster</value>
    </property>
    <!-- Failover proxy provider: how clients determine which NameNode is active -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing method, so that only one NameNode responds at any moment -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- sshfence logs in via passwordless SSH, so point it at the private key -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_dsa</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>
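The sshfence method configured above assumes a passphrase-less private key at /root/.ssh/id_dsa on the NameNode hosts. If it was not already created during the earlier passwordless-SSH setup, it could be generated and distributed roughly like this (a sketch; node1 is the peer NameNode host in this layout):

ssh-keygen -t dsa -P '' -f /root/.ssh/id_dsa
ssh-copy-id -i /root/.ssh/id_dsa.pub node1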

- First sync the configuration files to node1, node2 and node3:
[root@node0 hadoop]# scp hadoop-env.sh core-site.xml hdfs-site.xml node1:`pwd`
[root@node0 hadoop]# scp hadoop-env.sh core-site.xml hdfs-site.xml node2:`pwd`
[root@node0 hadoop]# scp hadoop-env.sh core-site.xml hdfs-site.xml node3:`pwd`
hadoop-env.sh             100%   16KB  10.6MB/s   00:00
core-site.xml             100% 1337     1.7MB/s   00:00
hdfs-site.xml             100% 2593     3.5MB/s   00:00
Configure the Hadoop environment variables the same way as in the fully distributed setup (see the earlier posts in this series).
[root@node0 hadoop]# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/default/bin:/usr/java/default/bin:/opt/hadoop-3.1.3/bin:/opt/hadoop-3.1.3/sbin:/root/bin
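For reference, a sketch of the environment variables implied by the $PATH output above; the exact file (e.g. /etc/profile) and variable names follow the earlier post and are assumptions here:

# e.g. appended to /etc/profile, then: source /etc/profile
export HADOOP_HOME=/opt/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin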
7.3.4 Starting the HDFS HA Cluster for the First Time
a) Start the ZooKeeper cluster and check its status. On node1, node2 and node3 run:
zkServer.sh start
Check whether the ZooKeeper cluster is running on the four virtual machines (node0 does not have ZooKeeper installed):
[root@node0 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@node1 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... already running as process 69903.
[root@node2 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... already running as process 60415.
[root@node3 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... already running as process 25958.
Check the status and mode:
[root@node1 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.5.7/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@node2 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.5.7/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
[root@node3 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.5.7/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
b) Start the three JournalNodes, on node0, node1 and node2:
[root@node0 ~]# cd /opt/hadoop-3.1.3/etc/hadoop
[root@node0 hadoop]# jps
15331 NameNode
71333 Jps
16379 DataNode
[root@node0 hadoop]# hdfs --daemon start journalnode
[root@node0 hadoop]# jps
71555 Jps
15331 NameNode
71496 JournalNode
16379 DataNode
[root@node1 ~]# cd /opt/hadoop-3.1.3/etc/hadoop
[root@node1 hadoop]# jps
22500 SecondaryNameNode
22824 DataNode
88458 Jps
69903 QuorumPeerMain
[root@node1 hadoop]# hdfs --daemon start journalnode
[root@node1 hadoop]# jps
22500 SecondaryNameNode
88661 Jps
22824 DataNode
69903 QuorumPeerMain
88623 JournalNode
[root@node2 ~]# cd /opt/hadoop-3.1.3/etc/hadoop
[root@node2 hadoop]# jps
22437 DataNode
60415 QuorumPeerMain
78270 Jps
[root@node2 hadoop]# hdfs --daemon start journalnode
[root@node2 hadoop]# jps
22437 DataNode
78468 Jps
78431 JournalNode
60415 QuorumPeerMain
[root@node3 ~]# cd /opt/hadoop-3.1.3/etc/hadoop
[root@node3 hadoop]# jps
25958 QuorumPeerMain
43768 Jps
[root@node3 hadoop]# jps
25958 QuorumPeerMain
43901 Jps
c) On node0 (which does not have ZooKeeper installed), format HDFS:
[root@node0 hadoop]# hdfs namenode -format
# The following message indicates that the format succeeded
2021-10-15 13:21:33,318 INFO common.Storage: Storage directory /var/itbaizhan/hadoop/ha/dfs/name has been successfully formatted.
An fsimage file is created under /var/itbaizhan/hadoop/ha/dfs/name/current/:
[root@node0 hadoop]# ll /var/itbaizhan/hadoop/ha/dfs/name/current/
total 16
-rw-r--r--. 1 root root 391 Mar  5 00:56 fsimage_0000000000000000000
-rw-r--r--. 1 root root  62 Mar  5 00:56 fsimage_0000000000000000000.md5
-rw-r--r--. 1 root root   2 Mar  5 00:56 seen_txid
-rw-r--r--. 1 root root 219 Mar  5 00:56 VERSION
After formatting, start the NameNode process:
[root@node0 hadoop]# hdfs --daemon start namenode
[root@node0 hadoop]# jps
62032 Jps
61923 NameNode
60228 QuorumPeerMain
61252 JournalNode
22437 DataNode
d) On the other NameNode host, node1, sync the metadata, then start the NameNode there.
[root@node1 hadoop]# hdfs namenode -bootstrapStandby
# Output like the following indicates success
About to bootstrap Standby ID nn1 from:
    Nameservice ID: mycluster
    Other Namenode ID: nn2
    Other NN's HTTP address: http://node2:9870
    Other NN's IPC address: node2/192.168.188.140:9820
    Namespace ID: 365299465
    Block pool ID: BP-2070874415-192.168.188.140-1678006602170
    Cluster ID: CID-f5b32a07-6a91-4044-b097-fd5ba69fd631
    Layout version: -64
    isUpgradeFinalized: true
# Start the NameNode
[root@node1 hadoop]# hdfs --daemon start namenode
e) Initialize the HA state in ZooKeeper. The format command must be run on a NameNode host (node1 or node2). Before running it, check what is currently stored in ZooKeeper from any of node2-node4:
[root@node4 hadoop]# zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /
[itbaizhan, registry, wzyy, zk001, zookeeper]
Next, on node0 execute:
[root@node0 ~]# hdfs zkfc -formatZK
2021-10-15 13:30:20,048 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
Then continue on node4:
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper, hadoop-ha]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[mycluster]
[zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha/mycluster
[]
At this point the 3 DataNode processes and the 2 ZKFC processes have not been started yet.
f) Start the Hadoop cluster, on node0:
[root@node0 ~]# start-dfs.sh
# The following errors appear
ERROR: Attempting to operate on hdfs journalnode as root
ERROR: but there is no HDFS_JOURNALNODE_USER defined. Aborting operation.
Starting ZK Failover Controllers on NN hosts [node1 node2]
ERROR: Attempting to operate on hdfs zkfc as root
ERROR: but there is no HDFS_ZKFC_USER defined. Aborting operation.
# Fix: edit start-dfs.sh
[root@node1 ~]# vim /opt/hadoop-3.1.3/sbin/start-dfs.sh
# Add
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
# To avoid similar errors on shutdown, also edit stop-dfs.sh
[root@node0 ~]# vim /opt/hadoop-3.1.3/sbin/stop-dfs.sh
# Add
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
# Start again
[root@node0 hadoop]# start-dfs.sh
On node4, where zkCli.sh is still running, observe:
[zk: localhost:2181(CONNECTED) 5] ls /hadoop-ha/mycluster
[ActiveBreadcrumb, ActiveStandbyElectorLock]
[zk: localhost:2181(CONNECTED) 6] get -s /hadoop-ha/mycluster/ActiveStandbyElectorLock
myclusternn1node1 �l(�>
cZxid = 0x600000008
ctime = Fri Oct 15 13:40:10 CST 2021
mZxid = 0x600000008
mtime = Fri Oct 15 13:40:10 CST 2021
pZxid = 0x600000008
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x300006fd40a0002
dataLength = 29
numChildren = 0
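Besides inspecting the lock znode in ZooKeeper, the active/standby role of each NameNode can also be queried with the standard haadmin command (nn0 and nn1 are the NameNode IDs configured in hdfs-site.xml above):

hdfs haadmin -getServiceState nn0
hdfs haadmin -getServiceState nn1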

Kill the NameNode process on node0, the host of the current active NameNode:
[root@node0 hadoop]# jps
10337 Jps
7347 JournalNode
9701 DFSZKFailoverController
7689 NameNode
[root@node0 hadoop]# kill -9 7689
# or
[root@node0 hadoop]# hdfs --daemon stop namenode
[root@node0 hadoop]# jps
7347 JournalNode
9701 DFSZKFailoverController
10381 Jps
Keep watching from the node running zkCli.sh:
[zk: localhost:2181(CONNECTED) 12] get -s /hadoop-ha/mycluster/ActiveStandbyElectorLock
myclusternn2node2 �l(�>
cZxid = 0x60000006c
......
However, when you check in the browser, the active NameNode does not fail over automatically. This is because an RPM package is missing: psmisc (sshfence relies on the fuser command it provides). Install psmisc on all four nodes:
yum install -y psmisc
After that, node0 becomes unreachable and node1 switches from standby to active.
Start the NameNode on node0 again:
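The original post does not show the command here; presumably it is the same daemon command used earlier:

[root@node0 ~]# hdfs --daemon start namenode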
7.3.5 Writing HDFS HA Start and Stop Scripts
Write the ZooKeeper/HDFS startup scripts in the /root/bin directory on node0:
[root@node1 ~]# mkdir bin
[root@node1 ~]# ls
anaconda-ks.cfg  bin  hh.txt
[root@node1 ~]# cd bin/
[root@node1 bin]# vim alljps.sh
#!/bin/bash
# Show the role processes on the current node
echo "-----------node1 jps--------------"
jps
# ...and on the other nodes
for node in node2 node3 node4
do
    echo "-----------$node jps--------------"
    ssh $node "source /etc/profile;jps"
done
[root@node1 bin]# chmod +x alljps.sh
[root@node1 bin]# vim starthdfs.sh
#!/bin/bash
# Start the ZooKeeper cluster
for node in node2 node3 node4
do
    ssh $node "source /etc/profile;zkServer.sh start"
done
# Sleep for 1 second
sleep 1
# Start the HDFS cluster
start-dfs.sh
# Show the role processes on all nodes
alljps.sh
# Esc -> :wq
[root@node1 bin]# chmod +x starthdfs.sh
Write the ZooKeeper/HDFS shutdown script in the /root/bin directory on node0:
[root@node1 bin]# vim stophdfs.sh
[root@node1 bin]# cat stophdfs.sh
#!/bin/bash
# Stop the HDFS cluster
stop-dfs.sh
# Sleep for 1 second
sleep 1
# Stop the ZooKeeper cluster
for node in node2 node3 node4
do
    ssh $node "source /etc/profile;zkServer.sh stop"
done
# Show the role processes on the four nodes
alljps.sh
[root@node1 bin]# chmod +x stophdfs.sh
Test: run stophdfs.sh to shut everything down and starthdfs.sh to start it up again.