Pending 說明 Pod 還沒有調度到某個 Node 上面。可以通過 kubectl describe pod <pod-name> 命令查看到當前 Pod 的事件,進而判斷爲什麼沒有調度。如
$ kubectl describe pod mypod
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 12s (x6 over 27s) default-scheduler 0/4 nodes are available: 2 Insufficient cpu.
可能的原因包括
資源不足,集群內所有的 Node 都不滿足該 Pod 請求的 CPU、內存、GPU 或者臨時存儲空間等資源。解決方法是刪除集群內不用的 Pod 或者增加新的 Node。
HostPort 端口已被佔用,通常推薦使用 Service 對外開放服務端口
Pod 一直處於 Waiting 或 ContainerCreating 狀態
首先還是通過 kubectl describe pod <pod-name> 命令查看到當前 Pod 的事件
$ kubectl -n kube-system describe pod nginx-pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned nginx-pod to node1
Normal SuccessfulMountVolume 1m kubelet, gpu13 MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 1m kubelet, gpu13 MountVolume.SetUp succeeded for volume "coredns-token-sxdmc"
Warning FailedSync 2s (x4 over 46s) kubelet, gpu13 Error syncing pod
Normal SandboxChanged 1s (x4 over 46s) kubelet, gpu13 Pod sandbox changed, it will be killed and re-created.
可以發現,該 Pod 的 Sandbox 容器無法正常啓動,具體原因需要查看 Kubelet 日誌:
$ journalctl -u kubelet
...
Mar 14 04:22:04 node1 kubelet[29801]: E0314 04:22:04.649912 29801 cni.go:294] Error adding network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.4.1/24
Mar 14 04:22:04 node1 kubelet[29801]: E0314 04:22:04.649941 29801 cni.go:243] Error while adding to cni network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.4.1/24
Mar 14 04:22:04 node1 kubelet[29801]: W0314 04:22:04.891337 29801 cni.go:258] CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "c4fd616cde0e7052c240173541b8543f746e75c17744872aa04fe06f52b5141c"
Mar 14 04:22:05 node1 kubelet[29801]: E0314 04:22:05.965801 29801 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "nginx-pod" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.4.1/24
發現是 cni0 網橋配置了一個不同網段的 IP 地址導致,刪除該網橋(網絡插件會自動重新創建)即可修復
$ kubectl describe pod mypod
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 36s default-scheduler Successfully assigned sh to k8s-agentpool1-38622806-0
Normal SuccessfulMountVolume 35s kubelet, k8s-agentpool1-38622806-0 MountVolume.SetUp succeeded for volume "default-token-n4pn6"
Normal Pulling 17s (x2 over 33s) kubelet, k8s-agentpool1-38622806-0 pulling image "a1pine"
Warning Failed 14s (x2 over 29s) kubelet, k8s-agentpool1-38622806-0 Failed to pull image "a1pine": rpc error: code = Unknown desc = Error response from daemon: repository a1pine not found: does not exist or no pull access
Warning Failed 14s (x2 over 29s) kubelet, k8s-agentpool1-38622806-0 Error: ErrImagePull
Normal SandboxChanged 4s (x7 over 28s) kubelet, k8s-agentpool1-38622806-0 Pod sandbox changed, it will be killed and re-created.
Normal BackOff 4s (x5 over 25s) kubelet, k8s-agentpool1-38622806-0 Back-off pulling image "a1pine"
Warning Failed 1s (x6 over 25s) kubelet, k8s-agentpool1-38622806-0 Error: ImagePullBackOff