Deploying Kubernetes 1.33 on a Three-Node Ubuntu 24.04 Cluster on Alibaba Cloud (US, Silicon Valley)
Networking
Cluster information
Cluster overview
| Instance ID | Name | IP | Hostname | 
|---|---|---|---|
| i-rj9976wzpibxv39zlxv3 | node1 | 10.0.1.1 | iZrj9976wzpibxv39zlxv3Z | 
| i-rj9b9nu5j7lbkcipzqj1 | node2 | 10.0.1.2 | iZrj9b9nu5j7lbkcipzqj1Z | 
| i-rj9hztrcp8hoxgfe9x8c | node3 | 10.0.1.3 | iZrj9hztrcp8hoxgfe9x8cZ | 
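Node overview
All three nodes share the same configuration.
| Category | Resource | Configuration | Notes | 
|---|---|---|---|
| Basic information | Instance ID | … | |
| | Name | … | User-defined |
| | Region/Zone | US (Silicon Valley) / Zone A | Affects latency and disaster recovery |
| Compute | Instance type | ecs.c8i.xlarge | 4 vCPU, 8 GiB |
| | CPU utilization (7-day peak) | 2 % | Cloud Monitor data |
| Memory | Memory capacity | 8 GiB | |
| | Memory utilization (7-day peak) | 15 % | |
| Storage | System disk | 50 GiB ESSD Entry | |
| | Data disk | | |
| Network | VPC | vpc-rj9y86j6gag9djuvyh6cw | IPv4 CIDR: 10.0.0.0/16 |
| | vSwitch | vsw-rj97amv6sv9jrx3zhnwli | IPv4 CIDR: 10.0.1.0/24 |
| | Public IP/EIP | … | 8 Mbps |
| Image | Operating system | Ubuntu 24.04 64-bit | |
| Security | Security group | sg-rj9976wzpibxv39wgzak | Allows ports 22, 3389, 6443 |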
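Disable swap
# Turn swap off for the current boot
sudo swapoff -a
# Comment out the swap entry in /etc/fstab so swap stays off after a reboot
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab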
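The sed expression above matches only lines where "swap" is surrounded by spaces, and it adds another `#` each time it is re-run. The following is a minimal sketch of an idempotent variant, assuming GNU sed; `comment_out_swap` is a hypothetical helper name, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: comment out active swap entries in an fstab-style file.
# Lines that already start with '#' are skipped, so repeated runs are safe.
# [[:space:]] matches tabs as well as spaces between the fstab fields.
comment_out_swap() {
  fstab="$1"
  sed -i -E '/^[[:space:]]*#/!{/[[:space:]]swap[[:space:]]/s/^/#/;}' "$fstab"
}

# Example: try it on a copy, then on the real file as root.
#   cp /etc/fstab /tmp/fstab.test && comment_out_swap /tmp/fstab.test
#   swapon --show   # prints nothing once swap is off
```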
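Load kernel modules
# Load the modules for the current boot (lost after a reboot)
# modprobe loads and unloads Linux kernel modules at runtime
sudo modprobe overlay
sudo modprobe br_netfilter
# Load the modules automatically at every boot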
sudo tee /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
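After `sysctl --system` it can help to confirm that the three settings are actually live. The sketch below reads them back from /proc/sys; `check_k8s_sysctls` and the `SYSCTL_ROOT` override are hypothetical names introduced here, and the override exists only so the check can be dry-run against a test directory.

```shell
# Hypothetical helper: print each setting and fail unless all three are 1.
# A value of "missing" for the bridge keys usually means br_netfilter is not loaded.
check_k8s_sysctls() {
  root="${SYSCTL_ROOT:-/proc/sys}"
  status=0
  for key in net.bridge.bridge-nf-call-iptables \
             net.bridge.bridge-nf-call-ip6tables \
             net.ipv4.ip_forward; do
    # sysctl keys map to paths under /proc/sys with dots replaced by slashes
    path="$root/$(printf '%s' "$key" | tr . /)"
    value=$(cat "$path" 2>/dev/null || echo missing)
    printf '%s = %s\n' "$key" "$value"
    [ "$value" = "1" ] || status=1
  done
  return "$status"
}

# Example:
#   check_k8s_sysctls && echo "kernel settings OK"
```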
Install the container runtime
Run the same steps on every node. The container runtime is containerd.
# Install prerequisites
sudo apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates
# Add the Docker apt repository, which ships the containerd.io package
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/containerd.gpg
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
# Install containerd
sudo apt update
sudo apt install containerd.io -y
# Generate the default configuration and switch the cgroup driver to systemd
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null 2>&1
sudo sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml
# Restart containerd to apply the change
sudo systemctl restart containerd
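The kubeadm init output later in this section warns that containerd's sandbox image (pause:3.8) differs from the one kubeadm expects (pause:3.10). One way to align them is sketched below. It assumes the containerd 1.x configuration layout, where the key is named `sandbox_image`; other containerd versions may use a different key. `set_sandbox_image` is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the sandbox_image value in a containerd config file.
# Assumption: the file contains a line of the form  sandbox_image = "..."
set_sandbox_image() {
  config="$1"
  image="$2"
  sed -i -E "s#^([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*).*#\1\"${image}\"#" "$config"
}

# Example (as root), followed by a restart so containerd picks up the change:
#   set_sandbox_image /etc/containerd/config.toml registry.k8s.io/pause:3.10
#   systemctl restart containerd
```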
Install Kubernetes
Run the same steps on every node.
# The Kubernetes packages are not in the default Ubuntu 24.04 repositories, so add the Kubernetes repository first.
# Download the public signing key of the Kubernetes package repository with curl.
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add the Kubernetes apt repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Install kubelet, kubeadm, and kubectl
sudo apt update
sudo apt install kubelet kubeadm kubectl -y
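Cluster initialization
Run the initialization on the control-plane node
- --control-plane-endpoint is set to the IP address of the control-plane node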
sudo kubeadm init --control-plane-endpoint=10.0.1.1
root@iZrj9976wzpibxv39zlxv3Z:~# sudo kubeadm init --control-plane-endpoint=10.0.1.1
[init] Using Kubernetes version: v1.33.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0814 00:37:57.883097    7459 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [izrj9976wzpibxv39zlxv3z kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.1.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [izrj9976wzpibxv39zlxv3z localhost] and IPs [10.0.1.1 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [izrj9976wzpibxv39zlxv3z localhost] and IPs [10.0.1.1 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.323647ms
[control-plane-check] Waiting for healthy control plane components. This can take up to 4m0s
[control-plane-check] Checking kube-apiserver at https://10.0.1.1:6443/livez
[control-plane-check] Checking kube-controller-manager at https://127.0.0.1:10257/healthz
[control-plane-check] Checking kube-scheduler at https://127.0.0.1:10259/livez
[control-plane-check] kube-controller-manager is healthy after 1.634933149s
[control-plane-check] kube-scheduler is healthy after 1.931743994s
[control-plane-check] kube-apiserver is healthy after 3.500579433s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node izrj9976wzpibxv39zlxv3z as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node izrj9976wzpibxv39zlxv3z as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: jgb353.v9qxwp1uic5944zj
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
  kubeadm join 10.0.1.1:6443 --token jgb353.v9qxwp1uic5944zj \
        --discovery-token-ca-cert-hash sha256:c9a75316ca750f7e1fb350f5059d575f3c6dff85c501a256927ab681787f1b6a \
        --control-plane 
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.1.1:6443 --token jgb353.v9qxwp1uic5944zj \
        --discovery-token-ca-cert-hash sha256:c9a75316ca750f7e1fb350f5059d575f3c6dff85c501a256927ab681787f1b6a
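On the control-plane node, add the export line below to ~/.bashrc, then reload the file
vi ~/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
source ~/.bashrc
Install a network plugin on the control-plane node (Calico as an example)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Join the worker nodes to the cluster
Run the join command once on each worker node.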
root@iZrj9b9nu5j7lbkcipzqj1Z:~# kubeadm join 10.0.1.1:6443 --token jgb353.v9qxwp1uic5944zj \
        --discovery-token-ca-cert-hash sha256:c9a75316ca750f7e1fb350f5059d575f3c6dff85c501a256927ab681787f1b6a
[preflight] Running pre-flight checks
[preflight] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
[preflight] Use 'kubeadm init phase upload-config --config your-config-file' to re-upload it.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.000644935s
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
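On the control-plane node, check the status of all nodes. A worker node is NotReady right after joining and turns Ready about one to two minutes later.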
kubectl get nodes
root@iZrj9976wzpibxv39zlxv3Z:~# kubectl get nodes
NAME                      STATUS     ROLES           AGE     VERSION
izrj9976wzpibxv39zlxv3z   Ready      control-plane   5m22s   v1.33.3
izrj9b9nu5j7lbkcipzqj1z   NotReady   <none>          49s     v1.33.3
izrj9hztrcp8hoxgfe9x8cz   NotReady   <none>          40s     v1.33.3
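Rather than re-running `kubectl get nodes` by hand, the listing can be filtered for nodes that are not yet Ready. This is a small sketch; `not_ready_nodes` is a hypothetical helper name, and the 180-second timeout in the second example is an arbitrary choice.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS column does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Example, on the control-plane node:
#   kubectl get nodes --no-headers | not_ready_nodes
# Or block until every node reports Ready:
#   kubectl wait --for=condition=Ready nodes --all --timeout=180s
```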
On the control-plane node, check the status of the system Pods
kubectl get pods -n kube-system
root@iZrj9976wzpibxv39zlxv3Z:~# kubectl get pods -n kube-system
NAME                                              READY   STATUS    RESTARTS   AGE
calico-kube-controllers-7498b9bb4c-289wz          1/1     Running   0          22m
calico-node-7s6sv                                 1/1     Running   0          22m
calico-node-gqttc                                 1/1     Running   0          19m
calico-node-tg9c6                                 1/1     Running   0          19m
coredns-674b8bbfcf-2kqhr                          1/1     Running   0          24m
coredns-674b8bbfcf-7k7s2                          1/1     Running   0          24m
etcd-izrj9976wzpibxv39zlxv3z                      1/1     Running   2          24m
kube-apiserver-izrj9976wzpibxv39zlxv3z            1/1     Running   2          24m
kube-controller-manager-izrj9976wzpibxv39zlxv3z   1/1     Running   2          24m
kube-proxy-8gc7r                                  1/1     Running   0          24m
kube-proxy-q4554                                  1/1     Running   0          19m
kube-proxy-r5n49                                  1/1     Running   0          19m
kube-scheduler-izrj9976wzpibxv39zlxv3z            1/1     Running   2          24m
Deploying Kubernetes 1.33 on a Three-Node Ubuntu 24.04 Cluster on Alibaba Cloud (China, Beijing)
Networking
Cluster information
Cluster overview
| Instance ID | Name | IP | Hostname | 
|---|---|---|---|
| i-2zeddz86r5qtupvq5bdz | node1 | 10.0.1.1 | ip-10-0-1-1 | 
| i-2ze19ijurr8l445yywvd | node2 | 10.0.1.2 | ip-10-0-1-2 | 
| i-2ze8qn58gz4qt24fdsh5 | node3 | 10.0.1.3 | ip-10-0-1-3 | 
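Node overview
All three nodes share the same configuration.
| Category | Resource | Configuration | Notes | 
|---|---|---|---|
| Basic information | Instance ID | … | |
| | Name | … | User-defined |
| | Region/Zone | China North 2 (Beijing) / Zone H | Affects latency and disaster recovery |
| Compute | Instance type | ecs.u1-c1m2.large | 2 vCPU, 4 GiB |
| | CPU utilization (7-day peak) | 5 % | Cloud Monitor data |
| Memory | Memory capacity | 4 GiB | |
| | Memory utilization (7-day peak) | 22 % | |
| Storage | System disk | 40 GiB ESSD Entry | |
| | Data disk | | |
| Network | VPC | vpc-2zeo6i8vg8l355t5yv6fp | IPv4 CIDR: 10.0.0.0/16 |
| | vSwitch | vsw-2zebjepezqbs5zgyayb5n | IPv4 CIDR: 10.0.1.0/24 |
| | Public IP/EIP | … | 8 Mbps |
| Image | Operating system | Ubuntu 24.04 64-bit | |
| Security | Security group | sg-2zeddz86r5qtupvstjhc | Allows ports 22, 3389, 6443 |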
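Disable swap
# Turn swap off for the current boot
sudo swapoff -a
# Comment out the swap entry in /etc/fstab so swap stays off after a reboot
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
Load kernel modules
# Load the modules for the current boot (lost after a reboot)
sudo modprobe overlay
sudo modprobe br_netfilter
# Load the modules automatically at every boot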
sudo tee /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Install the container runtime
Run the same steps on every node. The container runtime is Docker.
# Step 1: refresh the local package index and install the tools the later steps need
sudo apt-get update                                        # refresh the package lists
sudo apt-get install ca-certificates curl gnupg            # ca-certificates: root certificates for HTTPS verification
                                                           # curl: download tool, used below to fetch the GPG public key
                                                           # gnupg: GNU Privacy Guard, used to handle GPG signatures
# Step 2: import Docker's official GPG public key so that packages downloaded later can be verified
sudo install -m 0755 -d /etc/apt/keyrings                  # create /etc/apt/keyrings if it does not exist, mode 0755
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg     # fetch the key from the Alibaba Cloud mirror, convert it to binary form, and save it
sudo chmod a+r /etc/apt/keyrings/docker.gpg                # make the key file readable by all users
# Step 3: add the Alibaba Cloud Docker CE repository to the system
sudo mkdir -p /etc/apt/sources.list.d/                     # make sure /etc/apt/sources.list.d/ exists
echo \
  "deb [arch=$(dpkg --print-architecture) \
   signed-by=/etc/apt/keyrings/docker.gpg] \
   https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
   $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Notes:
#   deb [...]  : declares a binary package repository
#   arch=...   : detects the system architecture automatically (amd64, arm64, and so on)
#   signed-by  : the GPG public key file used to verify package signatures
#   https://...: the Alibaba Cloud Docker CE mirror
#   $(...)     : reads VERSION_CODENAME from /etc/os-release (for example jammy or focal)
#   stable     : use only the components officially marked stable
# Step 4: update the index again and install the latest Docker CE and related components
# Resync the package index so the newly added Docker repository takes effect
sudo apt-get update
# Install the Docker packages
# docker-ce              Docker Engine (Community Edition)
# docker-ce-cli          Docker command-line client
# containerd.io          container runtime (Docker uses containerd by default)
# docker-buildx-plugin   Docker Buildx plugin (next-generation build tool)
# docker-compose-plugin  Docker Compose v2 plugin (the docker compose subcommand)
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# To install a specific Docker CE version instead, follow these two steps manually:
# Step 1: list the Docker CE versions available in the repository
# apt-cache madison docker-ce
# Sample output:
#   docker-ce | 17.03.1~ce-0~ubuntu-xenial | https://mirrors.aliyun.com/... xenial/stable amd64 Packages
#   docker-ce | 17.03.0~ce-0~ubuntu-xenial | https://mirrors.aliyun.com/... xenial/stable amd64 Packages
# Step 2: install that version (replace [VERSION] with the full version string from step 1)
# sudo apt-get -y install docker-ce=[VERSION]
# After installation, restart the Docker service and enable it at boot
systemctl restart docker
systemctl enable docker
# 1. Make sure the Docker configuration directory exists (create it if missing)
mkdir -p /etc/docker
# 2. Write /etc/docker/daemon.json in one go; EOF marks the end of the content.
#    JSON has no comment syntax, so the keys are explained here instead of inline:
#    exec-opts                 use systemd as the cgroup driver, as Kubernetes recommends, to avoid
#                              resource-management conflicts from mixing "cgroupfs" and "systemd"
#    registry-mirrors          registry mirrors, tried in order: 1Panel, Mirrorify, DaoCloud public mirror,
#                              DockerMirror, the Dodo (渡渡鸟) image sync site, AnyHub, dockerhub.icu, AWSL;
#                              all are located in mainland China
#    insecure-registries       private registries that may be reached over plain HTTP (typically a
#                              company intranet test environment), e.g. "https://xxx.xxx.xxx.xxx"; empty here
#    max-concurrent-downloads  maximum number of concurrent downloads, so that pulls do not saturate the bandwidth
#    log-driver                log driver: json-file
#    log-level                 log level: warn (warnings and above only)
#    log-opts                  log rotation: each log file is at most 10 MB (max-size), at most 3 files are kept (max-file)
#    data-root                 directory that holds all Docker data (images, containers, volumes, and so on)
cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ],
  "registry-mirrors": [
    "https://docker.1panel.live",
    "https://hub.mirrorify.net",
    "https://docker.m.daocloud.io",
    "https://registry.dockermirror.com",
    "https://docker.aityp.com",
    "https://docker.anyhub.us.kg",
    "https://dockerhub.icu",
    "https://docker.awsl9527.cn"
  ],
  "insecure-registries": [],
  "max-concurrent-downloads": 10,
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "data-root": "/var/lib/docker"
}
EOF
# 3. Enable Docker at boot
systemctl enable docker
# 4. Restart Docker so the new daemon.json takes effect
systemctl restart docker
# 5. Check the service status to confirm Docker started without errors
systemctl status docker
# 6. Pull nginx:latest through one of the mirrors to verify the acceleration
docker pull hub.mirrorify.net/library/nginx:latest
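dockerd will not start if daemon.json is not valid JSON. A stray comment or trailing comma is enough, so a quick syntax check whenever the file is edited can save a failed restart. The sketch below assumes python3 is available on the node; `validate_json` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the given file parses as JSON.
# Assumption: python3 is installed; its json.tool module exits non-zero on a parse error.
validate_json() {
  python3 -m json.tool "$1" > /dev/null
}

# Example:
#   validate_json /etc/docker/daemon.json && systemctl restart docker
```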
# The cgroup driver reported by Docker should be systemd, not cgroupfs
docker info | grep -i "Cgroup Driver"
docker pull nginx
docker images
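cri-docker
cri-docker is a CRI (Container Runtime Interface) adapter that lets Kubernetes talk to Docker through the standard CRI.
Kubernetes deprecated dockershim in 1.20 and removed it completely in 1.24, so Docker is no longer supported as a container runtime out of the box. On Kubernetes 1.24 and later, prefer containerd as the container runtime. If Docker is still required, for example for compatibility with legacy systems, cri-docker is needed as an intermediate layer: one side speaks CRI to the kubelet and the other side speaks the Docker API.
Download and extract
If GitHub cannot be reached directly from your network, visit https://github.akams.cn and pick a reachable proxy link for the download.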
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.18/cri-dockerd-0.3.18.amd64.tgz
tar -zxvf cri-dockerd-*.amd64.tgz
cp cri-dockerd/cri-dockerd /usr/bin/
chmod +x /usr/bin/cri-dockerd
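Configuration
Official configuration templates: https://github.com/Mirantis/cri-dockerd/tree/master/packaging/systemd
/etc/systemd/system/cri-docker.service
On top of the official template, add the following to ExecStart. Note that kubeadm 1.33 later warns that it expects pause:3.10 rather than pause:3.9 as the sandbox image.
--network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
# The heredoc delimiter is quoted so that the shell writes $MAINPID literally instead of expanding it to an empty string
cat > /etc/systemd/system/cri-docker.service <<'EOF'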
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket
[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
/etc/systemd/system/cri-docker.socket
Use the official template as is.
cat > /etc/systemd/system/cri-docker.socket <<EOF
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service
[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker
[Install]
WantedBy=sockets.target
EOF
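# Make systemd rescan the unit files on disk (.service, .socket, .mount, and so on). This is required after adding, changing, or deleting any systemd unit file (such as /etc/systemd/system/cri-docker.service); otherwise systemd keeps using its cached copy.
systemctl daemon-reload
# Enable cri-docker at boot and start it immediately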
systemctl enable cri-docker --now
Install Kubernetes
Run the same steps on every node.
# The Kubernetes packages are not in the default Ubuntu 24.04 repositories, so add a repository first.
# Download the public signing key of the Kubernetes package repository with curl. For network reasons the Alibaba Cloud mirror is used: https://developer.aliyun.com/mirror/kubernetes, https://mirrors.aliyun.com/kubernetes-new.
curl -fsSL https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add the Kubernetes apt repository
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Install kubelet, kubeadm, and kubectl
sudo apt update
sudo apt install kubelet kubeadm kubectl -y
# Keep the kubelet cgroup driver consistent with Docker (both use systemd)
# Edit /etc/default/kubelet and set --cgroup-driver=systemd in KUBELET_EXTRA_ARGS, that is, KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
vi /etc/default/kubelet
systemctl enable kubelet
kubeadm config images list --kubernetes-version=stable
root@ip-10-0-1-1:~# kubeadm config images list --kubernetes-version=stable
registry.k8s.io/kube-apiserver:v1.33.4
registry.k8s.io/kube-controller-manager:v1.33.4
registry.k8s.io/kube-scheduler:v1.33.4
registry.k8s.io/kube-proxy:v1.33.4
registry.k8s.io/coredns/coredns:v1.12.0
registry.k8s.io/pause:3.10
registry.k8s.io/etcd:3.5.21-0
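The manual edit of /etc/default/kubelet above has to be repeated on every node, so a non-interactive version may be convenient. This is a minimal sketch; `set_kubelet_extra_args` is a hypothetical helper name, and it assumes the arguments contain no `|` or `&` characters, which would confuse the sed replacement.

```shell
# Hypothetical helper: set KUBELET_EXTRA_ARGS in a kubelet defaults file.
# Replaces an existing KUBELET_EXTRA_ARGS line or appends one, so it is safe to re-run.
set_kubelet_extra_args() {
  file="$1"
  args="$2"
  touch "$file"
  if grep -q '^KUBELET_EXTRA_ARGS=' "$file"; then
    sed -i "s|^KUBELET_EXTRA_ARGS=.*|KUBELET_EXTRA_ARGS=\"${args}\"|" "$file"
  else
    printf 'KUBELET_EXTRA_ARGS="%s"\n' "$args" >> "$file"
  fi
}

# Example (as root):
#   set_kubelet_extra_args /etc/default/kubelet "--cgroup-driver=systemd"
```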
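Cluster initialization
Run the initialization on the control-plane node
- --control-plane-endpoint is set to the IP address of the control-plane node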
sudo kubeadm init \
  --kubernetes-version=v1.33.4 \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=10.0.1.1 \
  --image-repository registry.aliyuncs.com/google_containers \
  --cri-socket unix:///run/cri-dockerd.sock \
  --control-plane-endpoint=10.0.1.1
root@ip-10-0-1-1:~# sudo kubeadm init \
  --kubernetes-version=v1.33.4 \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=10.0.1.1 \
  --image-repository registry.aliyuncs.com/google_containers \
  --cri-socket unix:///run/cri-dockerd.sock \
  --control-plane-endpoint=10.0.1.1
[init] Using Kubernetes version: v1.33.4
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0815 00:28:36.788919   12149 checks.go:846] detected that the sandbox image "registry.aliyuncs.com/google_containers/pause:3.9" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.aliyuncs.com/google_containers/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [ip-10-0-1-1.cn-beijing.ecs.internal kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.1.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [ip-10-0-1-1.cn-beijing.ecs.internal localhost] and IPs [10.0.1.1 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [ip-10-0-1-1.cn-beijing.ecs.internal localhost] and IPs [10.0.1.1 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 507.128098ms
[control-plane-check] Waiting for healthy control plane components. This can take up to 4m0s
[control-plane-check] Checking kube-apiserver at https://10.0.1.1:6443/livez
[control-plane-check] Checking kube-controller-manager at https://127.0.0.1:10257/healthz
[control-plane-check] Checking kube-scheduler at https://127.0.0.1:10259/livez
[control-plane-check] kube-controller-manager is healthy after 8.489155157s
[control-plane-check] kube-scheduler is healthy after 9.314309793s
[control-plane-check] kube-apiserver is healthy after 10.501890143s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node ip-10-0-1-1.cn-beijing.ecs.internal as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node ip-10-0-1-1.cn-beijing.ecs.internal as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: c5h4fy.vuksyytbu0bztxqb
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
  kubeadm join 10.0.1.1:6443 --token c5h4fy.vuksyytbu0bztxqb \
        --discovery-token-ca-cert-hash sha256:c5f38f474fc44b29977d4fbb09e009eb7ebfc530c934dfb72876516aa902be49 \
        --control-plane 
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.1.1:6443 --token c5h4fy.vuksyytbu0bztxqb \
        --discovery-token-ca-cert-hash sha256:c5f38f474fc44b29977d4fbb09e009eb7ebfc530c934dfb72876516aa902be49 
On the control-plane node, add the export line below to ~/.bashrc, then reload the file
vi ~/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
source ~/.bashrc
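Editing ~/.bashrc in vi works, but the same result can be scripted without adding the line twice. The sketch below is one way to do it; `ensure_line` is a hypothetical helper name.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already present.
ensure_line() {
  file="$1"
  line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example:
#   ensure_line ~/.bashrc 'export KUBECONFIG=/etc/kubernetes/admin.conf'
#   . ~/.bashrc
```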
Install a network plugin on the control-plane node (Calico as an example)
Reference: https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises
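# Option 1: apply the manifest directly from the docs.projectcalico.org URL
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Option 2: download a pinned release (v3.30.2 here) and apply the local copy
curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/calico.yaml -O
kubectl apply -f calico.yaml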
Join the worker nodes to the cluster
Run the join command once on each worker node.
# Note: the --cri-socket parameter must be given explicitly here
kubeadm join 10.0.1.1:6443 --token c5h4fy.vuksyytbu0bztxqb \
        --discovery-token-ca-cert-hash sha256:c5f38f474fc44b29977d4fbb09e009eb7ebfc530c934dfb72876516aa902be49  --cri-socket=unix:///run/cri-dockerd.sock
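Bootstrap tokens expire after 24 hours by default, so the join command printed by kubeadm init cannot be reused indefinitely. A fresh one can be generated on the control-plane node with `kubeadm token create --print-join-command`. The sketch below appends the extra `--cri-socket` flag to whatever join command it receives; `with_cri_socket` is a hypothetical helper name.

```shell
# Hypothetical helper: read a kubeadm join command on stdin and append --cri-socket.
# Trailing whitespace on the input line is dropped before the flag is added.
with_cri_socket() {
  socket="$1"
  sed -E "s#[[:space:]]*\$# --cri-socket=${socket}#"
}

# Example, on the control-plane node:
#   kubeadm token create --print-join-command | with_cri_socket unix:///run/cri-dockerd.sock
```

The printed line can then be run as root on each worker node.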
root@ip-10-0-1-2:~# kubeadm join 10.0.1.1:6443 --token c5h4fy.vuksyytbu0bztxqb \
        --discovery-token-ca-cert-hash sha256:c5f38f474fc44b29977d4fbb09e009eb7ebfc530c934dfb72876516aa902be49  --cri-socket=unix:///run/cri-dockerd.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
[preflight] Use 'kubeadm init phase upload-config --config your-config-file' to re-upload it.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.000750357s
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
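On the control-plane node, check the status of all nodes. A worker node is NotReady right after joining and turns Ready about one to two minutes later.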
kubectl get nodes
root@ip-10-0-1-1:~# kubectl get nodes
NAME                                  STATUS   ROLES           AGE   VERSION
ip-10-0-1-1.cn-beijing.ecs.internal   Ready    control-plane   66m   v1.33.4
ip-10-0-1-2.cn-beijing.ecs.internal   Ready    <none>          16m   v1.33.4
ip-10-0-1-3.cn-beijing.ecs.internal   Ready    <none>          16m   v1.33.4
On the control-plane node, check the status of the system Pods
kubectl get pods -n kube-system
root@ip-10-0-1-1:~# kubectl get pods -n kube-system
NAME                                                          READY   STATUS    RESTARTS   AGE
calico-kube-controllers-7498b9bb4c-996fm                      1/1     Running   0          35m
calico-node-kvtzj                                             1/1     Running   0          35m
calico-node-m6c86                                             1/1     Running   0          15m
calico-node-xlxsx                                             1/1     Running   0          16m
coredns-757cc6c8f8-jpllc                                      1/1     Running   0          65m
coredns-757cc6c8f8-xhwks                                      1/1     Running   0          65m
etcd-ip-10-0-1-1.cn-beijing.ecs.internal                      1/1     Running   0          65m
kube-apiserver-ip-10-0-1-1.cn-beijing.ecs.internal            1/1     Running   0          65m
kube-controller-manager-ip-10-0-1-1.cn-beijing.ecs.internal   1/1     Running   0          65m
kube-proxy-lwwmb                                              1/1     Running   0          16m
kube-proxy-srrsr                                              1/1     Running   0          65m
kube-proxy-vdtf8                                              1/1     Running   0          15m
kube-scheduler-ip-10-0-1-1.cn-beijing.ecs.internal            1/1     Running   0          65m
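- Network plugin (Calico): calico-kube-controllers and the three calico-node Pods are all running, which indicates the Pod network is up on every node.
- DNS (CoreDNS): both replicas are running, so in-cluster service discovery is available.
- Control-plane components:
  - etcd: the cluster state store is running.
  - kube-apiserver, kube-controller-manager, kube-scheduler: all running on the control-plane node, so the control plane is healthy.
- kube-proxy: one proxy Pod per node, which forwards Service traffic.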
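The checks in the list above can be reduced to one filter over the Pod table. This is a sketch under the assumption that every kube-system Pod is a long-running one (a Completed Job Pod would be reported too); `unhealthy_pods` is a hypothetical helper name.

```shell
# Print the names of Pods that are not fully ready or not in the Running state.
# Reads the output of `kubectl get pods -n kube-system` on stdin.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")               # READY column, for example "1/1"
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Example usage on the control-plane node:
#   kubectl get pods -n kube-system | unhealthy_pods
```

An empty result corresponds to the healthy state described in the list.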
Deploying Kubernetes 1.33 on a three-node Ubuntu 24.04 cluster of VMware Workstation virtual machines
https://www.ccgooes.com/2025/05/29/k8s/
High availability: https://www.cnblogs.com/lldhsds/p/18261304
Fully manual setup: https://www.cnblogs.com/xuweiweiwoaini/p/13884112.html
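Networking
https://kubernetes.io/zh-cn/docs/reference/networking/ports-and-protocols/
The VMs are networked in NAT mode. Open the VMware Virtual Network Editor to find the start IP address, end IP address, and gateway IP address.
Cluster information
Cluster overview
| Hostname | IP |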
|---|---|
| node1 | 192.168.211.128 | 
| node2 | 192.168.211.129 | 
| node3 | 192.168.211.130 | 
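Node overview
All three nodes have the same configuration.
| Category | Resource | Configuration | Notes |
|---|---|---|---|
| Basic information | | | |
| Compute | CPU | 2 vCPU | |
| Memory | Memory capacity | 4 GiB | |
| Storage | System disk | 200 GiB | |
| Storage | Data disk | | |
| Network | Subnet | 192.168.211.0/24 (IPv4) | |
| Network | NAT gateway | 192.168.211.2 | |
| Image | Operating system | Ubuntu 24.04 64bit | |
| Security | Firewall | ufw | Allows ports 22 and 6443 |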
# Configure a static IP on each node (the address shown is node1's; change it per node)
sudo vi /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  ethernets:
    ens33:
      dhcp4: no
      addresses:
        # This node's private IP
        - 192.168.211.128/24
      routes:
        - to: default
          # Gateway
          via: 192.168.211.2
      nameservers:
        addresses:
          # Alibaba DNS
          - 223.5.5.5
          - 223.6.6.6
          # 114 DNS
          - 114.114.114.114
          # Google Public DNS
          - 8.8.8.8
          - 8.8.4.4
sudo netplan apply
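# Set the hostname (node1, node2, or node3, depending on the node)
sudo hostnamectl set-hostname "node1"
# Appending to /etc/hosts requires root, so write through sudo tee -a
sudo tee -a /etc/hosts <<EOF
192.168.211.128 node1
192.168.211.129 node2
192.168.211.130 node3
EOF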
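Running the append above twice leaves duplicate lines in /etc/hosts. A minimal sketch of an idempotent variant follows; `add_host_entry` is a hypothetical helper, and it only checks whether the hostname is already mapped, not whether the mapped IP is the same.

```shell
# Append "IP NAME" to a hosts file only if NAME is not already mapped there.
# $1: hosts file, $2: IP address, $3: hostname
add_host_entry() {
  file=$1 ip=$2 name=$3
  if ! grep -Eq "[[:space:]]${name}([[:space:]]|\$)" "$file"; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Example usage (run as root, because /etc/hosts is root-owned):
#   add_host_entry /etc/hosts 192.168.211.128 node1
#   add_host_entry /etc/hosts 192.168.211.129 node2
#   add_host_entry /etc/hosts 192.168.211.130 node3
```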
Time synchronization
https://www.kimi.com/share/d2fc7vm0ftlma44rv2tg
sudo timedatectl set-timezone Asia/Shanghai
sudo apt install -y ntpsec-ntpdate
sudo ntpdate ntp.aliyun.com  # one-off sync against the Alibaba Cloud NTP server
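Disable swap
# Turn swap off for the current boot
sudo swapoff -a
# Comment out the swap entries in /etc/fstab so that swap stays off after a reboot
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab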
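The sed expression above matches only when `swap` is surrounded by spaces, and it prefixes another `#` to lines that are already commented out. The following sketch shows a variant that also accepts tab-separated fields and leaves commented lines alone; `comment_swap` is a hypothetical helper that prints the result instead of editing the file in place, so the output can be reviewed first.

```shell
# Print the given fstab file with every active swap entry commented out.
# Lines that already start with "#" are passed through unchanged.
comment_swap() {
  sed -E '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$1"
}

# Example usage: review the result before replacing /etc/fstab by hand.
#   comment_swap /etc/fstab
```

After `sudo swapoff -a`, `swapon --show` printing nothing confirms that no swap device is active.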
Load kernel modules
# Load the modules now (lost after a reboot)
sudo modprobe overlay
sudo modprobe br_netfilter
# Load the modules on every boot
sudo tee /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
IPVS
sudo apt install -y ipset ipvsadm
sudo modprobe -- ip_vs
sudo modprobe -- ip_vs_rr
sudo modprobe -- ip_vs_wrr
sudo modprobe -- ip_vs_sh
sudo modprobe -- nf_conntrack
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
sudo sysctl --system
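Module names are case-sensitive, so a misspelled entry in a modules-load.d file fails to load at boot without any obvious error. A minimal sketch for checking which of the expected modules are actually loaded is shown below; `missing_modules` is a hypothetical helper, and modules compiled into the kernel do not appear in /proc/modules, so they would be reported as missing.

```shell
# Print every wanted module that is absent from a module list.
# $1: file in /proc/modules format; remaining arguments: module names.
missing_modules() {
  list=$1
  shift
  for m in "$@"; do
    grep -q "^${m} " "$list" || echo "$m"
  done
}

# Example usage:
#   missing_modules /proc/modules overlay br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
```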
Install the container runtime
Install Docker
Perform the same steps on every node. Docker is used as the container runtime.
# Step 1: refresh the local package index and install the tools the later steps need
sudo apt-get update                                        # refresh the package lists
sudo apt-get install -y ca-certificates curl gnupg         # ca-certificates: root certificates for HTTPS verification
                                                           # curl: download tool, used below to fetch the GPG public key
                                                           # gnupg: GNU Privacy Guard, used to process the GPG key
# Step 2: import Docker's official GPG public key so that packages downloaded later can be verified
sudo install -m 0755 -d /etc/apt/keyrings                  # create /etc/apt/keyrings (if missing) with mode 0755
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg     # fetch the key from the Alibaba Cloud mirror, convert it to binary format, and save it
sudo chmod a+r /etc/apt/keyrings/docker.gpg                # make the key file readable by all users
# Step 3: add the Alibaba Cloud Docker CE repository to the system
sudo mkdir -p /etc/apt/sources.list.d/                     # make sure /etc/apt/sources.list.d/ exists
echo \
  "deb [arch=$(dpkg --print-architecture) \
   signed-by=/etc/apt/keyrings/docker.gpg] \
   https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
   $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Notes:
#   deb [...]  : declares a binary package repository
#   arch=...   : detects the current system architecture (amd64, arm64, and so on)
#   signed-by  : the GPG public key file used to verify package signatures
#   https://...: the Alibaba Cloud Docker CE mirror
#   $(...)     : reads VERSION_CODENAME from /etc/os-release (for example noble or jammy)
#   stable     : use only the stable channel
# Step 4: update the index again and install the latest Docker CE and related components
# Resync the package index so that the newly added Docker repository takes effect
sudo apt-get update
# Install the Docker packages
# docker-ce              Docker Engine (Community Edition)
# docker-ce-cli          Docker command-line client
# containerd.io          container runtime (Docker uses containerd by default)
# docker-buildx-plugin   Docker Buildx plugin (next-generation build tool)
# docker-compose-plugin  Docker Compose v2 plugin (the docker compose subcommand)
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# To install a specific Docker CE version instead, follow these two steps manually:
# Step 1: list the Docker CE versions available in the repository
# apt-cache madison docker-ce
# Sample output:
#   docker-ce | 17.03.1~ce-0~ubuntu-xenial | https://mirrors.aliyun.com/... xenial/stable amd64 Packages
#   docker-ce | 17.03.0~ce-0~ubuntu-xenial | https://mirrors.aliyun.com/... xenial/stable amd64 Packages
# Step 2: install the chosen version (replace [VERSION] with a full version string from step 1)
# sudo apt-get -y install docker-ce=[VERSION]
# After installation, restart the Docker service and enable it at boot
sudo systemctl restart docker
sudo systemctl enable docker
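# 1. Make sure the Docker configuration directory exists (create it if it does not)
sudo mkdir -p /etc/docker
# 2. Annotated reference for /etc/docker/daemon.json.
#    JSON does not allow comments, so this annotated form is for reading only and must not be
#    written to disk. The comment-free version in step 3 is the one to install.
{
  # Use systemd as the cgroup driver, as recommended for Kubernetes; this avoids resource-management conflicts from mixing "cgroupfs" and "systemd"
  "exec-opts": ["native.cgroupdriver=systemd"],
  # Registry mirrors; Docker tries them in order
  "registry-mirrors": [
    "https://docker.1panel.live",     # 1Panel mirror (mainland China)
    "https://hub.mirrorify.net",      # Mirrorify mirror (mainland China)
    "https://docker.m.daocloud.io",   # DaoCloud public mirror (mainland China)
    "https://registry.dockermirror.com", # DockerMirror mirror (mainland China)
    "https://docker.aityp.com",       # aityp image sync site (mainland China)
    "https://docker.anyhub.us.kg",    # AnyHub mirror (mainland China)
    "https://dockerhub.icu",          # DockerHub mirror (mainland China)
    "https://docker.awsl9527.cn"      # AWSL mirror (mainland China)
  ],
  # Private registries that may be reached over plain HTTP or with an untrusted certificate
  # (typically an intranet test environment); entries take the form host[:port], without a scheme
  "insecure-registries": [
    "xxx.xxx.xxx.xxx"
  ],
  # Maximum number of concurrent layer downloads per pull, to avoid saturating the bandwidth
  "max-concurrent-downloads": 10,
  # Log driver and log level
  "log-driver": "json-file",        # use the json-file log driver
  "log-level": "warn",              # daemon log level: warn (warnings and above only)
  "log-opts": {                     # log rotation parameters
    "max-size": "10m",              # each log file is at most 10 MB
    "max-file": "3"                 # keep at most 3 log files per container
  },
  # Directory that holds all Docker data (images, containers, volumes, and so on)
  "data-root": "/var/lib/docker"
}
# 3. Write the comment-free configuration (through sudo tee, because a shell redirection does not run with sudo's privileges)
sudo tee /etc/docker/daemon.json > /dev/null << EOF
{
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ],
  "registry-mirrors": [
    "https://docker.1panel.live",
    "https://hub.mirrorify.net",
    "https://docker.m.daocloud.io",
    "https://registry.dockermirror.com",
    "https://docker.aityp.com",
    "https://docker.anyhub.us.kg",
    "https://dockerhub.icu",
    "https://docker.awsl9527.cn"
  ],
  "insecure-registries": [],
  "max-concurrent-downloads": 10,
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "data-root": "/var/lib/docker"
}
EOF
# 4. Enable Docker at boot
sudo systemctl enable docker
# 5. Restart Docker so that the new daemon.json takes effect
sudo systemctl restart docker
# 6. Check the Docker service status to confirm that it started without errors
systemctl status docker
# 7. The Cgroup Driver reported by Docker should be systemd, not the default cgroupfs
docker info | grep -i "Cgroup Driver"
# 8. Pull nginx:latest through one of the mirrors to verify that the mirror works
docker pull hub.mirrorify.net/library/nginx:latest
docker pull nginx
docker images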
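A daemon.json that is not strict JSON (for example one that still contains `#` comments) keeps dockerd from starting after the restart. A minimal sketch for checking the file before restarting is shown below; it assumes python3 is available on the node, and `json_ok` is a hypothetical helper name.

```shell
# Return 0 if the file parses as strict JSON, non-zero otherwise.
# Uses Python's built-in json.tool module purely as a parser.
json_ok() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Example usage: restart Docker only when the configuration parses.
#   json_ok /etc/docker/daemon.json && sudo systemctl restart docker
```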
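cri-dockerd
https://github.com/Mirantis/cri-dockerd
cri-dockerd is a CRI (Container Runtime Interface) adapter that lets Kubernetes talk to Docker through the standard CRI.
Early Kubernetes releases used Docker as the default container runtime: the kubelet talked to Docker through the built-in dockershim. As other container runtimes such as containerd emerged, Kubernetes introduced the CRI so that the kubelet can talk to any runtime that implements it.
Kubernetes deprecated dockershim in 1.20 and removed it completely in 1.24, so Docker is no longer supported as a container runtime out of the box. For Kubernetes 1.24 and later, containerd is the usual choice. If Docker is still required, for example for compatibility with older systems, cri-dockerd is needed as an intermediate layer because Docker itself does not implement the CRI: one side speaks CRI to the kubelet and the other side calls the Docker API.
Download and extract
If GitHub cannot be reached directly from your network, visit https://github.akams.cn and pick a working proxy link for the download.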
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.18/cri-dockerd-0.3.18.amd64.tgz
tar -zxvf cri-dockerd-*.amd64.tgz
sudo cp cri-dockerd/cri-dockerd /usr/bin/
sudo chmod +x /usr/bin/cri-dockerd
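Configuration
Official systemd unit templates: https://github.com/Mirantis/cri-dockerd/tree/master/packaging/systemd
/etc/systemd/system/cri-docker.service
Starting from the official template, append the following flags to ExecStart. kubeadm 1.33 expects pause:3.10 (see the warning in the kubeadm init log further down), so the 3.10 tag can be used here to avoid that mismatch.
--network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9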
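# The heredoc delimiter is quoted so that the shell does not expand $MAINPID while writing the file
sudo tee /etc/systemd/system/cri-docker.service > /dev/null <<'EOF'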
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket
[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
/etc/systemd/system/cri-docker.socket
Use the official template unchanged.
sudo tee /etc/systemd/system/cri-docker.socket > /dev/null <<'EOF'
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service
[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker
[Install]
WantedBy=sockets.target
EOF
# Make systemd rescan the unit files on disk (.service, .socket, .mount, and so on). This is required after adding, changing, or removing any unit file such as /etc/systemd/system/cri-docker.service; otherwise systemd keeps using its cached copy.
sudo systemctl daemon-reload
# Enable cri-docker at boot and start it immediately
sudo systemctl enable cri-docker --now
docker --version
Install Kubernetes
Perform the same steps on every node.
# The Kubernetes packages are not in the default Ubuntu 24.04 repositories, so the repository has to be added before installing.
# Download the public signing key of the Kubernetes package repository with curl. For network reasons the Alibaba Cloud mirror is used: https://developer.aliyun.com/mirror/kubernetes, https://mirrors.aliyun.com/kubernetes-new.
curl -fsSL https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add the Kubernetes apt repository
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Install kubelet, kubeadm, and kubectl
sudo apt update
sudo apt install kubelet kubeadm kubectl -y
# Keep the kubelet cgroup driver consistent with Docker (both use systemd).
# Edit /etc/default/kubelet and set KUBELET_EXTRA_ARGS="--cgroup-driver=systemd".
# kubeadm has defaulted the kubelet cgroup driver to systemd since v1.22, so this step is usually redundant.
sudo vi /etc/default/kubelet
sudo systemctl enable kubelet
Pin the versions
To keep apt from upgrading the packages automatically, run this on all three machines:
sudo apt-mark hold kubelet kubeadm kubectl
To upgrade later, release the hold first:
sudo apt-mark unhold kubelet kubeadm kubectl
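Whether the hold is actually in place can be confirmed from the output of `apt-mark showhold`, which prints one held package per line. The following is a minimal sketch; `not_held` is a hypothetical helper name.

```shell
# Print each package named in the arguments that is missing from the hold list on stdin.
not_held() {
  held=$(cat)
  for p in "$@"; do
    printf '%s\n' "$held" | grep -qx "$p" || echo "$p"
  done
}

# Example usage:
#   apt-mark showhold | not_held kubelet kubeadm kubectl
```

An empty result means all three packages are held.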
kubeadm config images list --kubernetes-version=stable
ubuntu@node1:~$ kubeadm config images list --kubernetes-version=stable
registry.k8s.io/kube-apiserver:v1.33.4
registry.k8s.io/kube-controller-manager:v1.33.4
registry.k8s.io/kube-scheduler:v1.33.4
registry.k8s.io/kube-proxy:v1.33.4
registry.k8s.io/coredns/coredns:v1.12.0
registry.k8s.io/pause:3.10
registry.k8s.io/etcd:3.5.21-0
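The pause image in this list is the one that the `--pod-infra-container-image` flag of cri-dockerd should match. A minimal sketch for extracting it from the list is shown below; `pause_image` is a hypothetical helper name.

```shell
# Print the pause image from the output of `kubeadm config images list` read on stdin.
pause_image() {
  # The last "/"-separated field is "name:tag"; select the entry whose name is "pause".
  awk -F/ '$NF ~ /^pause:/ { print; exit }'
}

# Example usage, with the same image repository that is passed to kubeadm init:
#   kubeadm config images list --kubernetes-version=v1.33.4 \
#     --image-repository registry.aliyuncs.com/google_containers | pause_image
```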
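Cluster initialization
Initialize the control-plane node
Option 1: initialize from a configuration file
# Generate the default configuration, then edit it (for example advertiseAddress, criSocket, imageRepository, podSubnet) before running init
sudo kubeadm config print init-defaults > kubeadm-config.yaml
sudo kubeadm init --config kubeadm-config.yaml
Option 2: initialize from the command line on the control-plane node
- --control-plane-endpoint is set to the control-plane node's IP address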
sudo kubeadm init \
  --kubernetes-version=v1.33.4 \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.211.128 \
  --image-repository registry.aliyuncs.com/google_containers \
  --cri-socket unix:///run/cri-dockerd.sock \
  --control-plane-endpoint=192.168.211.128
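The Pod network CIDR must not overlap with the network the nodes themselves sit on (192.168.211.0/24 here), which is one reason to pass 10.244.0.0/16 explicitly. The overlap test can be sketched in plain shell arithmetic as follows; the function names are hypothetical, and only IPv4 CIDRs are handled.

```shell
# Convert a dotted IPv4 address to its 32-bit integer value.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  cr_base=$(ip_to_int "${1%/*}")
  cr_size=$(( 1 << (32 - ${1#*/}) ))
  cr_first=$(( cr_base - cr_base % cr_size ))
  echo "$cr_first $(( cr_first + cr_size - 1 ))"
}

# Return 0 if the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Example usage:
#   cidrs_overlap 10.244.0.0/16 192.168.211.0/24 || echo "no overlap"
```

With these definitions, 10.244.0.0/16 does not overlap the node network, whereas a pool such as 192.168.0.0/16 would.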
ubuntu@node1:~$ sudo kubeadm init \
  --kubernetes-version=v1.33.4 \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.211.128 \
  --image-repository registry.aliyuncs.com/google_containers \
  --cri-socket unix:///run/cri-dockerd.sock \
  --control-plane-endpoint=192.168.211.128
[init] Using Kubernetes version: v1.33.4
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0815 06:19:57.097678    5480 checks.go:846] detected that the sandbox image "registry.aliyuncs.com/google_containers/pause:3.9" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.aliyuncs.com/google_containers/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local node1] and IPs [10.96.0.1 192.168.211.128]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost node1] and IPs [192.168.211.128 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost node1] and IPs [192.168.211.128 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 508.417821ms
[control-plane-check] Waiting for healthy control plane components. This can take up to 4m0s
[control-plane-check] Checking kube-apiserver at https://192.168.211.128:6443/livez
[control-plane-check] Checking kube-controller-manager at https://127.0.0.1:10257/healthz
[control-plane-check] Checking kube-scheduler at https://127.0.0.1:10259/livez
[control-plane-check] kube-controller-manager is healthy after 7.010604153s
[control-plane-check] kube-scheduler is healthy after 8.348199286s
[control-plane-check] kube-apiserver is healthy after 10.003951502s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node node1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node node1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: 1dxtvx.whlw008htysk186v
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
  kubeadm join 192.168.211.128:6443 --token 1dxtvx.whlw008htysk186v \
        --discovery-token-ca-cert-hash sha256:57dd28c43642daf8f27edd4f37fba50031596846e925c78527a5843ab5795091 \
        --control-plane 
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.211.128:6443 --token 1dxtvx.whlw008htysk186v \
        --discovery-token-ca-cert-hash sha256:57dd28c43642daf8f27edd4f37fba50031596846e925c78527a5843ab5795091
On the control-plane node, set up kubectl access for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Install the network plugin on the control-plane node (Calico as the example)
https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises
# Legacy, unversioned manifest URL; prefer the pinned release below and apply only one of the two
# kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/calico.yaml -O
kubectl apply -f calico.yaml
Join the worker nodes to the cluster
Run the command once on every worker node.
# Note: the --cri-socket flag has to be specified explicitly here
sudo kubeadm join 192.168.211.128:6443 --token 1dxtvx.whlw008htysk186v \
        --discovery-token-ca-cert-hash sha256:57dd28c43642daf8f27edd4f37fba50031596846e925c78527a5843ab5795091  --cri-socket=unix:///run/cri-dockerd.sock
ubuntu@node2:~$ sudo kubeadm join 192.168.211.128:6443 --token 1dxtvx.whlw008htysk186v \
        --discovery-token-ca-cert-hash sha256:57dd28c43642daf8f27edd4f37fba50031596846e925c78527a5843ab5795091  --cri-socket=unix:///run/cri-dockerd.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
[preflight] Use 'kubeadm init phase upload-config --config your-config-file' to re-upload it.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.01902253s
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
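Bootstrap tokens expire after 24 hours by default, so a node added later needs a fresh join command. `kubeadm token create --print-join-command` prints one on the control-plane node, but without the CRI socket flag this setup requires. A minimal sketch for appending it follows; `with_cri_socket` is a hypothetical helper, and the socket path is the one used throughout this section.

```shell
# Append the cri-dockerd socket flag to a join command read on stdin.
with_cri_socket() {
  sed 's|[[:space:]]*$| --cri-socket=unix:///run/cri-dockerd.sock|'
}

# Example usage on the control-plane node:
#   sudo kubeadm token create --print-join-command | with_cri_socket
```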
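On the control-plane node, check the status of every node. A worker node is NotReady right after it joins and usually turns Ready within about 1 to 2 minutes.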
kubectl get nodes
ubuntu@node1:~$ kubectl get nodes
NAME    STATUS   ROLES           AGE    VERSION
node1   Ready    control-plane   21m    v1.33.4
node2   Ready    <none>          3m3s   v1.33.4
node3   Ready    <none>          105s   v1.33.4
On the control-plane node, check the status of the system Pods
kubectl get pods -n kube-system
ubuntu@node1:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS      AGE
calico-kube-controllers-576865d959-z5gm9   1/1     Running   0             11m
calico-node-669cl                          1/1     Running   0             11m
calico-node-cth4h                          1/1     Running   0             2m1s
calico-node-lz5s5                          1/1     Running   0             3m19s
coredns-757cc6c8f8-cn7mb                   1/1     Running   0             21m
coredns-757cc6c8f8-jcddf                   1/1     Running   0             21m
etcd-node1                                 1/1     Running   0             21m
kube-apiserver-node1                       1/1     Running   0             21m
kube-controller-manager-node1              1/1     Running   1 (11m ago)   21m
kube-proxy-b64rv                           1/1     Running   0             2m1s
kube-proxy-c5dkm                           1/1     Running   0             3m19s
kube-proxy-j6sg5                           1/1     Running   0             21m
kube-scheduler-node1                       1/1     Running   1 (11m ago)   21m
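- Network plugin (Calico): calico-kube-controllers and the three calico-node Pods are all running, which indicates the Pod network is up on every node.
- DNS (CoreDNS): both replicas are running, so in-cluster service discovery is available.
- Control-plane components:
  - etcd: the cluster state store is running.
  - kube-apiserver, kube-controller-manager, kube-scheduler: all running on the control-plane node, so the control plane is healthy.
- kube-proxy: one proxy Pod per node, which forwards Service traffic.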
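Rollback
# kubeadm reset needs root; name the CRI socket explicitly, as with kubeadm join
sudo kubeadm reset --cri-socket unix:///run/cri-dockerd.sock
rm -rf $HOME/.kube
sudo rm -rf /etc/cni/net.d
sudo rm -rf /etc/kubernetes/*
# Restart the container runtime
sudo systemctl restart docker
# The node can now be initialized again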