6. etcd Notes

etcd is a distributed, reliable key-value store and a cornerstone of cloud-native technology, yet over the past few years I have rarely had reason to operate it directly... I only have to step in and fight fires when something goes seriously wrong, for example:
- etcd split brain: cluster anomalies caused by time-synchronization problems on ARM64 virtual machines
- etcd write timeouts: writes taking too long due to poor cloud-disk IO and weak CPUs on ARM64 virtual machines
- etcd data loss: an operator mistakenly destroying database nodes outright
Firefighting at that cadence means my etcd knowledge is perpetually half-forgotten, so I am writing these notes as a precaution.
The examples below use etcd 3.5.15 with API version 3.5; the official documentation is the v3.5 docs.
In the v2 days we could operate etcd directly with curl over HTTP+JSON, but starting with v3, gRPC+protobuf replaced HTTP+JSON as the communication protocol; to keep an HTTP JSON interface we must either use etcdctl or go through a gRPC gateway. This section is therefore about using etcdctl well.
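As an aside, etcd 3.5 embeds such a grpc-gateway on the client port, translating HTTP+JSON requests under /v3/ into gRPC; keys and values travel base64-encoded. A minimal sketch against a local, TLS-less endpoint:
# put foo=bar ("Zm9v" and "YmFy" are base64 for "foo" and "bar")
curl -s http://127.0.0.1:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'
# read it back
curl -s http://127.0.0.1:2379/v3/kv/range -X POST -d '{"key": "Zm9v"}'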
The etcd documentation mentions that "Global flags (e.g., --dial-timeout, --cacert, --cert, --key) can be set with environment variables", and these correspond to the OPTIONS printed by etcdctl's help output:
OPTIONS:
      --cacert=""                          verify certificates of TLS-enabled secure servers using this CA bundle
      --cert=""                            identify secure client using this TLS certificate file
      --command-timeout=5s                 timeout for short running command (excluding dial timeout)
      --debug[=false]                      enable client-side debug logging
      --dial-timeout=2s                    dial timeout for client connections
  -d, --discovery-srv=""                   domain name to query for SRV records describing cluster endpoints
      --discovery-srv-name=""              service name to query when using DNS discovery
      --endpoints=[127.0.0.1:2379]         gRPC endpoints
  -h, --help[=false]                       help for etcdctl
      --hex[=false]                        print byte strings as hex encoded strings
      --insecure-discovery[=true]          accept insecure SRV records describing cluster endpoints
      --insecure-skip-tls-verify[=false]   skip server certificate verification (CAUTION: this option should be enabled only for testing purposes)
      --insecure-transport[=true]          disable transport security for client connections
      --keepalive-time=2s                  keepalive time for client connections
      --keepalive-timeout=6s               keepalive timeout for client connections
      --key=""                             identify secure client using this TLS key file
      --password=""                        password for authentication (if this option is used, --user option shouldn't include password)
      --user=""                            username[:password] for authentication (prompt if password is not supplied)
  -w, --write-out="simple"                 set the output format (fields, json, protobuf, simple, table)
Although etcdctl has no config-file support for default parameters, every Global flag can be set through an environment variable named ETCDCTL_ plus the uppercased flag name (dashes become underscores). For example, to use the v3 API, connect to three specific servers, and set a default CA bundle, client certificate, and key:
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=192.168.123.101:2379,192.168.123.102:2379,192.168.123.103:2379
export ETCDCTL_CACERT=/etc/etcdctl/cacert.pem
export ETCDCTL_CERT=/etc/etcdctl/cert.pem
export ETCDCTL_KEY=/etc/etcdctl/key.pem
With these set, etcdctl can reach the servers without any extra command-line flags, which also leaves less room for slips:
➜ ~ etcdctl member list
96224218d3d7de4, started, machine-3, http://192.168.123.103:2380, http://192.168.123.103:2379, false
d32a4b16a0f9191, started, machine-1, http://192.168.123.101:2380, http://192.168.123.101:2379, false
47c5ecfae8fccdb1, started, machine-2, http://192.168.123.102:2380, http://192.168.123.102:2379, false
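With the environment variables in place, everyday usage is mostly plain key-value operations. A few representative commands (the /config keys here are only illustrative):
etcdctl put /config/feature-x on      # write a key
etcdctl get /config/feature-x         # read one key
etcdctl get /config --prefix          # read everything under a prefix
etcdctl watch /config --prefix        # block and stream subsequent changes
etcdctl del /config/feature-x         # delete a key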
To make testing easier, I built a 3.5.15 container image, wbuntu/etcd:3.5.15, which runs in single-node mode by default. The Dockerfile and startup script are as follows:
Dockerfile
FROM alpine:3.18
ARG TARGETARCH
RUN apk add --no-cache git zsh
RUN sh -c "$(wget -O- https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
VOLUME /data
WORKDIR /data
COPY ${TARGETARCH}/* /usr/bin/
COPY standalone.sh /usr/bin/standalone.sh
CMD ["/usr/bin/standalone.sh"]
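Since the Dockerfile consumes TARGETARCH, the image can be built for multiple architectures in one go with docker buildx; a sketch, assuming the etcd binaries were pre-downloaded into amd64/ and arm64/ directories beside the Dockerfile:
docker buildx build --platform linux/amd64,linux/arm64 -t wbuntu/etcd:3.5.15 --push .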
standalone.sh
#!/bin/sh
NAME=$(hostname)
etcd \
  --name ${NAME} \
  --data-dir etcd \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://0.0.0.0:2379 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --initial-advertise-peer-urls http://0.0.0.0:2380 \
  --initial-cluster ${NAME}=http://0.0.0.0:2380 \
  --initial-cluster-token tkn \
  --initial-cluster-state new
The following command starts a single-node etcd directly:
docker run --name etcd -d --restart unless-stopped -p 2379:2379 -p 2380:2380 -v etcd-data:/data wbuntu/etcd:3.5.15
You can then exec into the container and check the etcd status:
➜ ~ docker exec -it etcd zsh
➜ /data etcdctl member list -w=table
+------------------+---------+--------------+---------------------+---------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+--------------+---------------------+---------------------+------------+
| 59a9c584ea2c3f35 | started | 3a5b5ccf6ca6 | http://0.0.0.0:2380 | http://0.0.0.0:2379 | false |
+------------------+---------+--------------+---------------------+---------------------+------------+
➜ /data etcdctl endpoint status -w=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 59a9c584ea2c3f35 | 3.5.15 | 20 kB | true | false | 2 | 4 | 4 | |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
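Besides endpoint status, endpoint health is a quick liveness probe that checks whether each configured endpoint can serve a request:
etcdctl endpoint health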
A single-node etcd is only suitable for development and testing; production clusters generally run at least three or five nodes for high availability. Here the cluster.sh script below serves as the startup script for etcd on three machines:
cluster.sh
#!/bin/sh
TOKEN=tkn
CLUSTER_STATE=new
NAME_1=machine-1
NAME_2=machine-2
NAME_3=machine-3
HOST_1=192.168.123.101
HOST_2=192.168.123.102
HOST_3=192.168.123.103
CLUSTER=${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380
THIS_NAME=${NAME_1}
THIS_IP=${HOST_1}
etcd \
  --name ${THIS_NAME} \
  --data-dir etcd \
  --listen-client-urls http://${THIS_IP}:2379 \
  --advertise-client-urls http://${THIS_IP}:2379 \
  --listen-peer-urls http://${THIS_IP}:2380 \
  --initial-advertise-peer-urls http://${THIS_IP}:2380 \
  --initial-cluster ${CLUSTER} \
  --initial-cluster-state ${CLUSTER_STATE} \
  --initial-cluster-token ${TOKEN}
The three machines correspond to NAME_1~NAME_3 and HOST_1~HOST_3. Replace THIS_NAME and THIS_IP for each machine, copy the script over, and mount it into the container; host networking is used here to keep the setup simple:
docker run --name etcd -d --restart unless-stopped --network host -v etcd-data:/data -v $PWD/cluster.sh:/usr/bin/cluster.sh wbuntu/etcd:3.5.15 /usr/bin/cluster.sh
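To avoid hand-editing THIS_NAME and THIS_IP on each machine, the three variants can also be generated with sed (a convenience sketch; the cluster-N.sh output names are arbitrary):
for i in 1 2 3; do
  sed -e "s/^THIS_NAME=.*/THIS_NAME=\${NAME_$i}/" \
      -e "s/^THIS_IP=.*/THIS_IP=\${HOST_$i}/" \
      cluster.sh > cluster-$i.sh
done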
Finally, point etcdctl at the three-node cluster:
➜ ~ export ETCDCTL_API=3
➜ ~ export ETCDCTL_ENDPOINTS=192.168.123.101:2379,192.168.123.102:2379,192.168.123.103:2379
➜ ~ etcdctl member list
5640a3d876739168, started, machine-3, http://192.168.123.103:2380, http://192.168.123.103:2379, false
cecb22d998638446, started, machine-2, http://192.168.123.102:2380, http://192.168.123.102:2379, false
cf50412f3d98e8eb, started, machine-1, http://192.168.123.101:2380, http://192.168.123.101:2379, false
Reference: How to Add and Remove Members
Adding and removing members requires the cluster to be functioning normally: of its N members, a majority (at least ⌊N/2⌋+1, i.e., more than half) must be healthy, so a 3-node cluster tolerates one failure and a 5-node cluster two.
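Before touching membership, it is worth confirming that the quorum is actually there; the --cluster flag makes etcdctl probe every member from the member list rather than only the configured endpoints:
etcdctl endpoint health --cluster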
First, remove 192.168.123.103 from the three-node cluster:
➜ ~ etcdctl endpoint status -w=table
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.123.101:2379 | d32a4b16a0f9191 | 3.5.15 | 1.9 MB | false | false | 8 | 17809240 | 17809240 | |
| 192.168.123.102:2379 | 47c5ecfae8fccdb1 | 3.5.15 | 1.9 MB | true | false | 8 | 17809240 | 17809240 | |
| 192.168.123.103:2379 | 96224218d3d7de4 | 3.5.15 | 1.9 MB | false | false | 8 | 17809240 | 17809240 | |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
➜ ~ etcdctl member remove 96224218d3d7de4
Member 96224218d3d7de4 removed from cluster a01e30bfaa44f6a5
➜ ~ export ETCDCTL_ENDPOINTS=192.168.123.101:2379,192.168.123.102:2379
➜ ~ etcdctl endpoint status -w=table
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.123.101:2379 | d32a4b16a0f9191 | 3.5.15 | 1.9 MB | false | false | 8 | 17809247 | 17809247 | |
| 192.168.123.102:2379 | 47c5ecfae8fccdb1 | 3.5.15 | 1.9 MB | true | false | 8 | 17809247 | 17809247 | |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Next, add a new node, 192.168.123.104:
➜ ~ export ETCDCTL_ENDPOINTS=192.168.123.101:2379,192.168.123.102:2379
➜ ~ etcdctl member add machine-4 --peer-urls=http://192.168.123.104:2380
Member aabe77f74a8b8fd8 added to cluster a01e30bfaa44f6a5
ETCD_NAME="machine-4"
ETCD_INITIAL_CLUSTER="machine-1=http://192.168.123.101:2380,machine-2=http://192.168.123.102:2380,machine-4=http://192.168.123.104:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.123.104:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
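As an aside, those printed ETCD_* lines are meant to be consumed directly: etcd accepts an ETCD_-prefixed environment variable for every flag, so instead of editing a script the new member could be launched roughly like this (a sketch):
export ETCD_NAME="machine-4"
export ETCD_INITIAL_CLUSTER="machine-1=http://192.168.123.101:2380,machine-2=http://192.168.123.102:2380,machine-4=http://192.168.123.104:2380"
export ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.123.104:2380"
export ETCD_INITIAL_CLUSTER_STATE="existing"
etcd --data-dir etcd \
  --listen-client-urls http://192.168.123.104:2379 \
  --advertise-client-urls http://192.168.123.104:2379 \
  --listen-peer-urls http://192.168.123.104:2380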
Then start etcd on 192.168.123.104 with the startup script below; note that CLUSTER_STATE is now existing:
#!/bin/sh
TOKEN=tkn
CLUSTER_STATE=existing
NAME_1=machine-1
NAME_2=machine-2
NAME_4=machine-4
HOST_1=192.168.123.101
HOST_2=192.168.123.102
HOST_4=192.168.123.104
CLUSTER=${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_4}=http://${HOST_4}:2380
THIS_NAME=${NAME_4}
THIS_IP=${HOST_4}
etcd \
  --name ${THIS_NAME} \
  --data-dir etcd \
  --listen-client-urls http://${THIS_IP}:2379 \
  --advertise-client-urls http://${THIS_IP}:2379 \
  --listen-peer-urls http://${THIS_IP}:2380 \
  --initial-advertise-peer-urls http://${THIS_IP}:2380 \
  --initial-cluster ${CLUSTER} \
  --initial-cluster-state ${CLUSTER_STATE} \
  --initial-cluster-token ${TOKEN}
Save the script as cluster.sh and start the container:
docker run --name etcd -d --restart unless-stopped --network host -v etcd-data:/data -v $PWD/cluster.sh:/usr/bin/cluster.sh wbuntu/etcd:3.5.15 /usr/bin/cluster.sh
Finally, check the cluster status; it is back to normal:
➜ ~ export ETCDCTL_ENDPOINTS=192.168.123.101:2379,192.168.123.102:2379,192.168.123.104:2379
➜ ~ etcdctl member list -w=table
+------------------+---------+-----------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-----------+-----------------------------+-----------------------------+------------+
| d32a4b16a0f9191 | started | machine-1 | http://192.168.123.101:2380 | http://192.168.123.101:2379 | false |
| 47c5ecfae8fccdb1 | started | machine-2 | http://192.168.123.102:2380 | http://192.168.123.102:2379 | false |
| aabe77f74a8b8fd8 | started | machine-4 | http://192.168.123.104:2380 | http://192.168.123.104:2379 | false |
+------------------+---------+-----------+-----------------------------+-----------------------------+------------+
Reference: Disaster Recovery
Now suppose a serious failure: two of the three etcd nodes are down and unrecoverable. How do we bring the remaining node back to a working state? As shown below, the first two nodes are gone and the survivor reports etcdserver: no leader:
➜ ~ etcdctl endpoint status -w=table
{"level":"warn","ts":"2024-09-14T15:47:47.017106+0800","logger":"etcd-client","caller":"v3@v3.5.15/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0005a4000/192.168.123.101:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 192.168.123.101:2379: connect: no route to host\""}
Failed to get the status of endpoint 192.168.123.101:2379 (context deadline exceeded)
{"level":"warn","ts":"2024-09-14T15:47:52.017513+0800","logger":"etcd-client","caller":"v3@v3.5.15/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0005a4000/192.168.123.101:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 192.168.123.102:2379: connect: no route to host\""}
Failed to get the status of endpoint 192.168.123.102:2379 (context deadline exceeded)
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+-----------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+-----------------------+
| 192.168.123.103:2379 | cba698a4889e4798 | 3.5.15 | 1.9 MB | false | false | 11 | 17809258 | 17809258 | etcdserver: no leader |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+-----------------------+
First, take a data snapshot of the surviving node and save it as a db file; thanks to the Raft consensus algorithm's replication, every member should hold a complete copy of the data. If the snapshot cannot be taken, try recovering directly from the etcd data directory; if that is lost as well, the last hope is a historical snapshot backup, typically shipped to S3 on a schedule.
➜ ~ export ETCDCTL_ENDPOINTS=192.168.123.103:2379
➜ ~ etcdctl endpoint status -w=table
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+-----------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+-----------------------+
| 192.168.123.103:2379 | cba698a4889e4798 | 3.5.15 | 1.9 MB | false | false | 11 | 17809258 | 17809258 | etcdserver: no leader |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+-----------------------+
➜ ~ etcdctl snapshot save etcd.db
{"level":"info","ts":"2024-09-14T15:52:19.780262+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"etcd.db.part"}
{"level":"info","ts":"2024-09-14T15:52:19.782948+0800","logger":"client","caller":"v3@v3.5.15/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-09-14T15:52:19.783015+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"192.168.123.103:2379"}
{"level":"info","ts":"2024-09-14T15:52:19.798245+0800","logger":"client","caller":"v3@v3.5.15/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-09-14T15:52:19.803647+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"192.168.123.103:2379","size":"1.9 MB","took":"now"}
{"level":"info","ts":"2024-09-14T15:52:19.803755+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"etcd.db"}
Snapshot saved at etcd.db
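Before wiping anything, it is prudent to sanity-check the snapshot file; etcdutl can report its hash, revision, and key count:
etcdutl snapshot status etcd.db -w table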
Next, stop the etcd service on this node, clear its data, and restore the snapshot into the target directory; in v3.5 the restore is performed with etcdutl:
➜ ~ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d0642a2f7199 wbuntu/etcd:3.5.15 "/usr/bin/cluster.sh" 22 minutes ago Up 22 minutes etcd
➜ ~ docker stop etcd
etcd
➜ ~ docker volume ls
DRIVER VOLUME NAME
local etcd-data
➜ ~ docker volume inspect etcd-data
[
    {
        "CreatedAt": "2024-09-14T07:37:20Z",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/etcd-data/_data",
        "Name": "etcd-data",
        "Options": null,
        "Scope": "local"
    }
]
➜ ~ tree /var/lib/docker/volumes/etcd-data/_data
/var/lib/docker/volumes/etcd-data/_data
`-- etcd
    `-- member
        |-- snap
        |   |-- 000000000000000a-00000000010fbf68.snap
        |   `-- db
        `-- wal
            |-- 0.tmp
            `-- 0000000000000000-0000000000000000.wal
➜ ~ mv /var/lib/docker/volumes/etcd-data/_data/etcd ./etcd_backup
➜ ~ etcdutl snapshot restore etcd.db --data-dir /var/lib/docker/volumes/etcd-data/_data/etcd
2024-09-14T08:11:11Z info snapshot/v3_snapshot.go:265 restoring snapshot {"path": "etcd.db", "wal-dir": "/var/lib/docker/volumes/etcd-data/_data/etcd/member/wal", "data-dir": "/var/lib/docker/volumes/etcd-data/_data/etcd", "snap-dir": "/var/lib/docker/volumes/etcd-data/_data/etcd/member/snap", "initial-memory-map-size": 10737418240}
2024-09-14T08:11:11Z info membership/store.go:141 Trimming membership information from the backend...
2024-09-14T08:11:11Z info membership/cluster.go:421 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2024-09-14T08:11:11Z info snapshot/v3_snapshot.go:293 restored snapshot {"path": "etcd.db", "wal-dir": "/var/lib/docker/volumes/etcd-data/_data/etcd/member/wal", "data-dir": "/var/lib/docker/volumes/etcd-data/_data/etcd", "snap-dir": "/var/lib/docker/volumes/etcd-data/_data/etcd/member/snap", "initial-memory-map-size": 10737418240}
➜ ~ tree /var/lib/docker/volumes/etcd-data/_data
/var/lib/docker/volumes/etcd-data/_data
`-- etcd
    `-- member
        |-- snap
        |   |-- 0000000000000001-0000000000000001.snap
        |   `-- db
        `-- wal
            `-- 0000000000000000-0000000000000000.wal
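Note that the restore log above registered a default member (peer URL http://localhost:2380). etcdutl snapshot restore also accepts member-identity flags, so the correct name and peer URL can be stamped at restore time instead; a sketch for this node:
etcdutl snapshot restore etcd.db \
  --name machine-3 \
  --initial-cluster machine-3=http://192.168.123.103:2380 \
  --initial-advertise-peer-urls http://192.168.123.103:2380 \
  --data-dir /var/lib/docker/volumes/etcd-data/_data/etcd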
Then modify cluster.sh so that this node starts as a single-node etcd cluster:
#!/bin/sh
TOKEN=tkn
CLUSTER_STATE=new
NAME_3=machine-3
HOST_3=192.168.123.103
CLUSTER=${NAME_3}=http://${HOST_3}:2380
THIS_NAME=${NAME_3}
THIS_IP=${HOST_3}
etcd \
  --name ${THIS_NAME} \
  --data-dir etcd \
  --listen-client-urls http://${THIS_IP}:2379 \
  --advertise-client-urls http://${THIS_IP}:2379 \
  --listen-peer-urls http://${THIS_IP}:2380 \
  --initial-advertise-peer-urls http://${THIS_IP}:2380 \
  --initial-cluster ${CLUSTER} \
  --initial-cluster-state ${CLUSTER_STATE} \
  --initial-cluster-token ${TOKEN}
Checking the cluster with etcdctl again, the node is back to normal:
➜ ~ export ETCDCTL_ENDPOINTS=192.168.123.103:2379
➜ ~ etcdctl endpoint status -w=table
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.123.103:2379 | 8e9e05c52164694d | 3.5.15 | 1.9 MB | true | false | 2 | 4 | 4 | |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
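From here the cluster can be grown back to three members with the same member add procedure shown earlier, one node at a time, e.g. (machine-5 / 192.168.123.105 being a hypothetical replacement node):
etcdctl member add machine-5 --peer-urls=http://192.168.123.105:2380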
The last step is repairing the Kubernetes cluster. The three-node etcd above is the external datastore of my K3s cluster, so it is enough to change the K3s service's external etcd address to keep only 192.168.123.103:2379 and then restart the K3s process:
➜ ~ grep ExecStart= -A9 /etc/systemd/system/k3s.service
ExecStart=/usr/local/bin/k3s \
    server \
    '--datastore-endpoint' \
    'http://192.168.123.103:2379' \
    '--tls-san=192.168.123.90' \
    '--kube-apiserver-arg=--max-requests-inflight=800' \
    '--kube-apiserver-arg=--max-mutating-requests-inflight=400' \
    '--kube-controller-manager-arg=--kube-api-qps=100' \
    '--kube-scheduler-arg=--kube-api-qps=200' \
    '--kubelet-arg=--kube-api-qps=100' \
➜ ~ systemctl daemon-reload
➜ ~ systemctl restart k3s
➜ ~ kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-8b9777675-2phdr 1/1 Running 4 (3m55s ago) 27d
kube-system helm-install-traefik-crd-5v247 0/1 Completed 0 27d
kube-system helm-install-traefik-sfbxv 0/1 Completed 2 27d
kube-system local-path-provisioner-69dff9496c-xtt22 1/1 Running 8 (3m21s ago) 27d
kube-system metrics-server-854c559bd-gz2pw 1/1 Running 8 (3m21s ago) 27d
kube-system svclb-traefik-3d992c42-6qkrl 2/2 Running 8 (3m55s ago) 27d
kube-system traefik-54c4f4ffd8-wnzgv 1/1 Running 4 (3m55s ago) 27d