Kubernetes-15: Pod and Node Scheduling Rules Explained (Affinity, Taints, Tolerations)
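Kubernetes Pod Scheduling

Introduction

The Scheduler is Kubernetes' scheduler. Its main job is to assign the Pods you define to nodes in the cluster. That sounds simple, but there are many aspects to consider.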
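The Scheduler runs as a separate service. Once started, it keeps watching the API Server for Pods whose spec.nodeName is empty. For each such Pod it creates a binding that records which node the Pod should be placed on.

Scheduling process

The scheduling flow has three steps:

1. Nodes that do not meet the Pod's requirements are filtered out. This step is called predicate.
2. The remaining nodes are ranked by priority. This step is called priority.
3. The node with the highest priority is selected.

If any step returns an error, the error is returned immediately.

A series of algorithms is available for the predicate step.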
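If no node passes the predicate step, the Pod stays in the Pending state, and scheduling is retried until some node satisfies the conditions.

If more than one node passes, the priority step follows and the nodes are sorted by priority. Priorities are made up of key/value pairs: the key is the name of a priority item and the value is its weight.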
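The scheduler combines all priority items with their weights to compute a final score for each node.

Custom scheduler

Besides the scheduler that ships with Kubernetes, you can also write your own. The spec.schedulerName field selects which scheduler handles a Pod. In the example below, the Pod is scheduled by my-scheduler instead of the default default-scheduler. If no scheduler named my-scheduler is running, the Pod stays Pending.

    apiVersion: v1
    kind: Pod
    metadata:
      name: scheduler-test
      labels:
        name: example-scheduler
    spec:
      schedulerName: my-scheduler   # the field name is camelCase
      containers:
      - name: pod-test              # container names must be lowercase
        image: nginx:v1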
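The predicate/priority flow described above can be pictured with a toy model. A custom scheduler such as my-scheduler has to make the same kind of decision. The Python sketch below is only an illustration under stated assumptions:

- The Node/Pod fields are hypothetical.
- The two filters and the two "free capacity left" scores are hypothetical. The scores are loosely in the spirit of a least-requested priority.
- None of these are kube-scheduler's real code or plugin names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_cpu_m: int            # hypothetical field: free CPU in millicores
    free_mem_mi: int           # hypothetical field: free memory in MiB
    schedulable: bool = True   # False stands in for e.g. a master with a NoSchedule taint


@dataclass
class Pod:
    name: str
    cpu_m: int                 # requested CPU in millicores
    mem_mi: int                # requested memory in MiB


Predicate = Callable[[Pod, Node], bool]
Priority = Callable[[Pod, Node], float]


def is_schedulable(pod: Pod, node: Node) -> bool:
    return node.schedulable


def fits_resources(pod: Pod, node: Node) -> bool:
    return node.free_cpu_m >= pod.cpu_m and node.free_mem_mi >= pod.mem_mi


def cpu_left_ratio(pod: Pod, node: Node) -> float:
    # Fraction of the node's free CPU still left after placing the Pod
    if node.free_cpu_m == 0:
        return 0.0
    return (node.free_cpu_m - pod.cpu_m) / node.free_cpu_m


def mem_left_ratio(pod: Pod, node: Node) -> float:
    # Fraction of the node's free memory still left after placing the Pod
    if node.free_mem_mi == 0:
        return 0.0
    return (node.free_mem_mi - pod.mem_mi) / node.free_mem_mi


PREDICATES: List[Predicate] = [is_schedulable, fits_resources]
PRIORITIES: List[Tuple[Priority, float]] = [(cpu_left_ratio, 1.0), (mem_left_ratio, 1.0)]


def schedule(pod: Pod, nodes: List[Node]) -> Optional[str]:
    # Predicate step: keep only nodes that pass every filter
    feasible = [n for n in nodes if all(p(pod, n) for p in PREDICATES)]
    if not feasible:
        return None  # no node fits: the Pod stays Pending and is retried later

    # Priority step: weighted sum of every priority function
    def score(node: Node) -> float:
        return sum(weight * fn(pod, node) for fn, weight in PRIORITIES)

    # Select the highest-scoring node
    return max(feasible, key=score).name


cluster = [Node("centos8", 4000, 8192, schedulable=False), Node("testcentos7", 2000, 4096)]
print(schedule(Pod("demo", 500, 256), cluster))   # testcentos7
print(schedule(Pod("big", 3000, 256), cluster))   # None -> Pending
```

Filtering first and scoring second keeps the expensive ranking limited to nodes that could actually run the Pod.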
Now for the main part: the various ways to control where Pods are scheduled.

I. Affinity

Note: all of the following tests were run on a cluster with 1 master and 1 node:

    [root@Centos8 scheduler]# kubectl get node
    NAME          STATUS   ROLES    AGE    VERSION
    centos8       Ready    master   134d   v1.15.1
    testcentos7   Ready    <none>   133d   v1.15.1

1. Node affinity: pod.spec.affinity.nodeAffinity
requiredDuringSchedulingIgnoredDuringExecution (hard policy)

    vim node-affinity-required.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: affinity-required
      labels:
        app: node-affinity-pod
    spec:
      containers:
      - name: with-node-required
        image: nginx:1.2.1
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname   # node name label
                operator: NotIn               # must not be
                values:
                - testcentos7                 # the worker node

    [root@Centos8 ~]# kubectl get node --show-labels   # show the node labels
    NAME          STATUS   ROLES    AGE    VERSION   LABELS
    centos8       Ready    master   133d   v1.15.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=centos8,kubernetes.io/os=linux,node-role.kubernetes.io/master=
    testcentos7   Ready    <none>   133d   v1.…hostname=testcentos7,kubernetes.io/os=linux
    ## There are only two nodes, one master and one worker; this policy says the Pod must NOT run on testcentos7
    ## After creation the Pod stays Pending: apart from testcentos7 there is no other worker node, and the master
    ## cannot be scheduled onto (it carries a NoSchedule taint; see the Taints and Tolerations section)

Change NotIn to In in the yaml file:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname   # node name label
                operator: In                  # must be
                values:
                - testcentos7                 # the worker node

Create it again. This time it lands on the specified node:

    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME                READY   STATUS    RESTARTS   AGE   IP             NODE
    affinity-required   1/1     Running   0          11s   10.244.3.219   testcentos7
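The hard-policy semantics exercised above can be sketched in Python:

- nodeSelectorTerms are ORed.
- The matchExpressions inside one term are ANDed.
- NotIn also matches a node that lacks the label.

This is an illustrative model, not the scheduler's code. The label data comes from the kubectl get node --show-labels output above. Leaving centos8 out of the candidates is a simplification standing in for its NoSchedule taint.

```python
from typing import Dict, List

Labels = Dict[str, str]
Expr = Dict[str, object]      # {"key": ..., "operator": ..., "values": [...]}
Term = List[Expr]             # one nodeSelectorTerm = a list of matchExpressions


def expr_matches(labels: Labels, expr: Expr) -> bool:
    key, op, values = expr["key"], expr["operator"], expr["values"]
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn
        return key not in labels or labels[key] not in values
    raise ValueError(f"operator not covered by this sketch: {op}")


def node_matches(labels: Labels, terms: List[Term]) -> bool:
    # Terms are ORed; the expressions inside one term are ANDed
    return any(all(expr_matches(labels, e) for e in term) for term in terms)


NODE_LABELS = {
    "centos8": {"kubernetes.io/hostname": "centos8", "kubernetes.io/os": "linux"},
    "testcentos7": {"kubernetes.io/hostname": "testcentos7", "kubernetes.io/os": "linux"},
}
CANDIDATES = ["testcentos7"]  # centos8 left out to mimic its master NoSchedule taint


def feasible(terms: List[Term]) -> List[str]:
    return [n for n in CANDIDATES if node_matches(NODE_LABELS[n], terms)]


not_in = [[{"key": "kubernetes.io/hostname", "operator": "NotIn", "values": ["testcentos7"]}]]
in_ = [[{"key": "kubernetes.io/hostname", "operator": "In", "values": ["testcentos7"]}]]
print(feasible(not_in))  # [] -> the Pod stays Pending
print(feasible(in_))     # ['testcentos7']
```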
preferredDuringSchedulingIgnoredDuringExecution (soft policy)

    vim node-affinity-preferred.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: affinity-preferred
      labels:
        app: node-affinity-pod
    spec:
      containers:
      - name: with-node-preferred
        image: nginx:1.2.1
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1                 # weight 1; in a soft policy a higher weight gives a better chance of matching
            preference:               # prefer nodes that match
              matchExpressions:
              - key: kubernetes.io/…  # node name label
                operator: In          # equals
                values:
                - testcentos7         # the actual node name

    ## The Pod would rather land on the node named testcentos7
    [root@Centos8 scheduler]# kubectl create -f node-affinity-preferred.yaml
    pod/affinity-preferred created
    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME                 READY   STATUS    RESTARTS   AGE   IP             NODE
    affinity-preferred   1/1     Running   0          9s    10.244.3.220   testcentos7

Now change the policy by replacing the node name with one that does not exist, for example kube-node2:

      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: kubernetes.io/…  # node name label
                operator: In          # equals
                values:
                - kube-node2          # a node name that does not exist

    [root@Centos8 scheduler]# kubectl create -f node-affinity-preferred.yaml
    pod/affinity-preferred created
    ## It again lands on testcentos7: it would prefer kube-node2, but no such node exists, so it settles for testcentos7
    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME                 READY   STATUS    RESTARTS   AGE   IP             NODE
    affinity-preferred   1/1     Running   0          17s   10.244.3.221   testcentos7

Combining hard and soft policies

    vim node-affinity-common.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: …node
      labels:
        app: node-affinity-pod
    spec:
      containers:
      - name: with-affinity-node
        image: nginx:v1
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: …
                operator: NotIn
                values:
                - k8s-node2
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: …
            preference:
              matchExpressions:
              - key: source
                operator: In
                values:
                - hello

The hard policy must be satisfied first: the Pod must not run on k8s-node2. Among the nodes that pass it, those labeled source=hello are preferred.
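The soft policy and the combined example can be modeled the same way:

- The hard terms filter the nodes.
- Each remaining node's score is the sum of the weights of the preferred terms it matches. Real weights range from 1 to 100.
- A preference for a node that does not exist, like kube-node2, scores 0 everywhere. That is why the Pod still landed on testcentos7.

This Python sketch is illustrative only:

- The preferred terms are flattened: matchExpressions sits directly beside weight.
- node-a, node-b and their labels are hypothetical.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def expr_matches(labels: Labels, expr: dict) -> bool:
    key, op, values = expr["key"], expr["operator"], expr["values"]
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    raise ValueError(f"operator not covered by this sketch: {op}")


def passes_required(labels: Labels, terms: Optional[List[List[dict]]]) -> bool:
    if not terms:
        return True  # no hard policy: every node passes
    return any(all(expr_matches(labels, e) for e in term) for term in terms)


def preferred_score(labels: Labels, preferred: List[dict]) -> int:
    # Add the weight of every preferred term whose expressions all match
    return sum(p["weight"] for p in preferred
               if all(expr_matches(labels, e) for e in p["matchExpressions"]))


def pick_node(nodes: Dict[str, Labels], required, preferred) -> Optional[str]:
    feasible = [n for n, labels in nodes.items() if passes_required(labels, required)]
    if not feasible:
        return None  # hard policy unmet: the Pod stays Pending
    return max(feasible, key=lambda n: preferred_score(nodes[n], preferred))


host = "kubernetes.io/hostname"
workers = {"testcentos7": {host: "testcentos7"}}
prefer_kube_node2 = [{"weight": 1, "matchExpressions": [
    {"key": host, "operator": "In", "values": ["kube-node2"]}]}]
print(pick_node(workers, None, prefer_kube_node2))  # testcentos7

# Hypothetical two-worker cluster for the combined hard + soft example
two = {"node-b": {host: "node-b"}, "node-a": {host: "node-a", "source": "hello"}}
not_k8s_node2 = [[{"key": host, "operator": "NotIn", "values": ["k8s-node2"]}]]
prefer_hello = [{"weight": 1, "matchExpressions": [
    {"key": "source", "operator": "In", "values": ["hello"]}]}]
print(pick_node(two, not_k8s_node2, prefer_hello))  # node-a
```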
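Key/value operators

The operators available in node affinity matchExpressions are:

- In: the label's value is in the given list
- NotIn: the label's value is not in the given list
- Gt: the label's value is greater than the given value
- Lt: the label's value is less than the given value
- Exists: the label exists
- DoesNotExist: the label does not exist

Gt and Lt compare the values as integers.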
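A small evaluator pins down the six operators. This Python sketch follows the rules listed above, with Gt and Lt expecting exactly one integer value. The label keys (disk, cpu-count, gpu) are hypothetical. It is an illustration, not Kubernetes' implementation.

```python
from typing import Dict, Sequence


def requirement_matches(labels: Dict[str, str], key: str, operator: str,
                        values: Sequence[str] = ()) -> bool:
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator in ("Gt", "Lt"):
        # Exactly one value is given and both sides must parse as integers
        if key not in labels or len(values) != 1:
            return False
        try:
            actual, limit = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return actual > limit if operator == "Gt" else actual < limit
    raise ValueError(f"unknown operator: {operator}")


node = {"disk": "ssd", "cpu-count": "8"}   # hypothetical node labels
print(requirement_matches(node, "disk", "In", ["ssd", "nvme"]))   # True
print(requirement_matches(node, "cpu-count", "Gt", ["4"]))        # True
print(requirement_matches(node, "gpu", "DoesNotExist"))           # True
```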
2. Pod affinity: pod.spec.affinity.podAffinity / podAntiAffinity
First create a test Pod:

    vim pod.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-1
      labels:
        app: nginx
        type: web
    spec:
      containers:
      - name: pod-…
        image: nginx:…
        imagePullPolicy: IfNotPresent
        ports:
        - name: web
          containerPort: 80

    [root@Centos8 scheduler]# kubectl create -f pod.yaml
    pod/pod-1 created
    [root@Centos8 scheduler]# kubectl get pod --show-labels
    NAME    READY   STATUS    RESTARTS   AGE   LABELS
    pod-1   1/1     Running   0          4s    app=nginx,type=web

requiredDuringSchedulingIgnoredDuringExecution (Pod hard policy)

    vim pod-affinity-required.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: affinity-required
      labels:
        app: pod-3
    spec:
      containers:
      - name: with-pod-required
        image: nginx:…
        imagePullPolicy: IfNotPresent
      affinity:
        podAffinity:            # in the same topology domain
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app        # label key
                operator: In
                values:
                - nginx         # label value
            topologyKey: kubernetes.io/hostname   # the topology domain is the node name
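Test it:

    [root@Centos8 scheduler]# kubectl create -f pod-affinity-required.yaml
    pod/affinity-required created
    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME                READY   STATUS    RESTARTS   AGE   IP             NODE
    affinity-required   1/1     Running   0          43s   10.244.3.224   testcentos7
    pod-1               1/1     Running   0          10m   10.244.3.223   testcentos7
    ## It runs on the same node as the Pod carrying the matching label

Next, change podAffinity to podAntiAffinity so that the two Pods are not placed on the same node:

    apiVersion: v1
    kind: Pod
    metadata:
      name: required-pod2
      labels:
        app: pod-…
    spec:
      containers:
      - name: …
        image: …
        imagePullPolicy: IfNotPresent
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname   # the topology domain is the node name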
Test it:

    [root@Centos8 scheduler]# kubectl create -f pod-affinity-required.yaml
    pod/required-pod2 created
    [root@Centos8 scheduler]# kubectl get pod
    NAME                READY   STATUS    RESTARTS   AGE
    affinity-required   1/1     Running   0          9m40s
    pod-1               1/1     Running   0          19m
    required-pod2       0/1     Pending   0          51s
    ## I have only one schedulable node here, so required-pod2 can only stay Pending

preferredDuringSchedulingIgnoredDuringExecution (Pod soft policy)

    vim pod-affinity-preferred.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: affinity-preferred
      labels:
        app: pod-preferred
    spec:
      containers:
      - name: …
        image: nginx:v1
        imagePullPolicy: IfNotPresent
      affinity:
        podAntiAffinity:        # not in the same topology domain
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: …
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - pod-2
              topologyKey: kubernetes.io/…
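The role of topologyKey in the required podAffinity/podAntiAffinity tests can be sketched as follows:

- A node's topology domain is the value of its topologyKey label.
- A candidate node passes podAffinity if some running Pod matching the label selector sits on a node in the same domain.
- podAntiAffinity requires that no such Pod does.

The Python below is a simplified model that supports only the In operator. The cluster data mirrors the tests above, with centos8 again left out of the candidates to stand in for its taint.

```python
from typing import Dict, List, Tuple

NODE_LABELS: Dict[str, Dict[str, str]] = {
    "centos8": {"kubernetes.io/hostname": "centos8"},
    "testcentos7": {"kubernetes.io/hostname": "testcentos7"},
}
# (node the Pod runs on, Pod labels): pod-1 from the test above
RUNNING: List[Tuple[str, Dict[str, str]]] = [("testcentos7", {"app": "nginx", "type": "web"})]
CANDIDATES = ["testcentos7"]  # centos8 excluded to mimic its NoSchedule taint


def same_domain_match(candidate: str, key: str, values: List[str], topology_key: str) -> bool:
    domain = NODE_LABELS[candidate].get(topology_key)
    if domain is None:
        return False  # the node has no topology label, so it is in no domain
    return any(pod_labels.get(key) in values
               and NODE_LABELS[node].get(topology_key) == domain
               for node, pod_labels in RUNNING)


def allowed_nodes(key: str, values: List[str], topology_key: str, anti: bool) -> List[str]:
    result = []
    for node in CANDIDATES:
        together = same_domain_match(node, key, values, topology_key)
        # podAffinity needs a matching Pod in the domain; podAntiAffinity needs none
        if together != anti:
            result.append(node)
    return result


print(allowed_nodes("app", ["nginx"], "kubernetes.io/hostname", anti=False))  # ['testcentos7']
print(allowed_nodes("app", ["nginx"], "kubernetes.io/hostname", anti=True))   # [] -> Pending
```

With kubernetes.io/hostname as the topologyKey, each node is its own domain. A coarser label shared by several nodes would widen the domain.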
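The affinity and anti-affinity scheduling policies compare as follows:

- nodeAffinity:
  - matches: node labels
  - operators: In, NotIn, Exists, DoesNotExist, Gt, Lt
  - topology domains: not supported
  - effect: places the Pod on the specified nodes
- podAffinity:
  - matches: Pod labels
  - operators: In, NotIn, Exists, DoesNotExist
  - topology domains: supported
  - effect: places the Pod in the same topology domain as the specified Pods
- podAntiAffinity:
  - matches: Pod labels
  - operators: In, NotIn, Exists, DoesNotExist
  - topology domains: supported
  - effect: keeps the Pod out of the topology domain of the specified Pods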
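II. Taints and Tolerations

Node affinity is a property of Pods, either a preference or a hard requirement, that attracts them to a particular set of nodes. Taints are the opposite: they let a node repel a particular set of Pods.

Taints and tolerations work together to keep Pods off unsuitable nodes. One or more taints can be applied to each node, and the node will not accept Pods that do not tolerate those taints. Applying a toleration to a Pod means the Pod is allowed (but not required) to be scheduled onto nodes with matching taints.

Note: all of the following tests were run on a cluster with 1 master and 1 node.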
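1. Taint

(1) Composition of a taint

The kubectl taint command sets a taint on a node. Once a node is tainted, it repels Pods: it can refuse to have Pods scheduled onto it, and it can even evict Pods that are already running on it.

Each taint has the form:

    key=value:effect

Each taint has a key and a value as its label; the value may be empty. The effect describes what the taint does. Three taint effects are currently supported:

- NoSchedule: Kubernetes will not schedule Pods that do not tolerate the taint onto the node
- PreferNoSchedule: Kubernetes tries to avoid scheduling Pods that do not tolerate the taint onto the node
- NoExecute: Kubernetes will not schedule such Pods onto the node and evicts those already running on it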
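A tiny parser can capture the key=value:effect format, plus the trailing - used to remove a taint in the commands that follow. This Python sketch handles only the forms that appear in this article. It is not kubectl's own parser, which accepts more variants, and the Taint type is a hypothetical helper.

```python
from typing import NamedTuple, Tuple

EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: str    # may be empty
    effect: str


def parse_taint_arg(arg: str) -> Tuple[Taint, bool]:
    """Parse 'key=value:effect' or 'key:effect'; a trailing '-' means removal."""
    remove = arg.endswith("-")
    if remove:
        arg = arg[:-1]
    # The effect follows the last ':' (keys such as node-role.kubernetes.io/master contain no ':')
    key_value, sep, effect = arg.rpartition(":")
    if not sep or effect not in EFFECTS:
        raise ValueError(f"missing or unknown effect in {arg!r}")
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in {arg!r}")
    return Taint(key, value, effect), remove


print(parse_taint_arg("check=vfan:NoExecute"))
print(parse_taint_arg("key1:NoSchedule-"))
```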
(2) Setting, viewing, and removing taints
    ## The master node already carries a taint
    [root@Centos8 scheduler]# kubectl describe node centos8
    Taints: node-role.kubernetes.io/master:NoSchedule

    ## Set a taint
    kubectl taint nodes [node name] key1=value:NoSchedule
    ## View the Taints field in the node description
    kubectl describe node [node name]
    ## Remove a taint
    kubectl taint nodes [node name] key1:NoSchedule-

Test the effect:

    ## List the current Pods; they are all on testcentos7
    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME                READY   STATUS    RESTARTS   AGE   IP       NODE
    affinity-required   1/1     Running   0          68m   …        testcentos7
    pod-1               1/1     Running   0          78m   …        testcentos7
    required-pod2       0/1     Pending   0          59m   <none>   <none>
    ## Give testcentos7 a NoExecute taint
    [root@Centos8 scheduler]# kubectl taint nodes testcentos7 check=vfan:NoExecute
    node/testcentos7 tainted
    ## Check whether the Pods were evicted
    [root@Centos8 scheduler]# kubectl get pod
    NAME            READY   STATUS    RESTARTS   AGE
    required-pod2   0/1     Pending   0          62m
    ## Only the Pending Pod is left: it was never scheduled, so it has no node and there was nothing to evict
    ## Check the testcentos7 node information
    [root@Centos8 scheduler]# kubectl describe node testcentos7
    Taints: check=vfan:NoExecute
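The eviction just observed can be modeled with three rules:

- A Pod running on the node with no toleration matching the NoExecute taint is evicted at once.
- A matching toleration without tolerationSeconds lets the Pod stay indefinitely.
- A matching toleration with tolerationSeconds lets the Pod stay that many seconds before eviction.

The Python sketch below is a simplified illustration. It looks only at the first matching toleration, and the two tolerant Pods are hypothetical.

```python
import math
from typing import Dict, List

Taint = Dict[str, str]


def tolerates(tol: dict, taint: Taint) -> bool:
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False  # an empty effect matches every effect
    if tol.get("operator", "Equal") == "Exists":
        return not tol.get("key") or tol["key"] == taint["key"]
    return tol.get("key") == taint["key"] and tol.get("value", "") == taint.get("value", "")


def seconds_until_eviction(tolerations: List[dict], taint: Taint) -> float:
    """0 = evicted immediately, math.inf = never evicted by this taint."""
    if taint["effect"] != "NoExecute":
        return math.inf  # only NoExecute evicts Pods that are already running
    matching = [t for t in tolerations if tolerates(t, taint)]
    if not matching:
        return 0
    secs = matching[0].get("tolerationSeconds")  # simplification: first match only
    return math.inf if secs is None else float(secs)


taint = {"key": "check", "value": "vfan", "effect": "NoExecute"}
on_testcentos7 = {
    "affinity-required": [],
    "pod-1": [],
    # hypothetical Pods with tolerations
    "tolerant-pod": [{"key": "check", "operator": "Equal", "value": "vfan", "effect": "NoExecute"}],
    "grace-pod": [{"key": "check", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 3600}],
}
for name, tols in on_testcentos7.items():
    print(name, seconds_until_eviction(tols, taint))
```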
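Now every node carries a taint. Create a new Pod to see what happens:

    [root@Centos8 scheduler]# kubectl create -f pod.yaml
    pod/pod-1 created
    [root@Centos8 scheduler]# kubectl get pod
    NAME            READY   STATUS    RESTARTS   AGE
    pod-1           0/1     Pending   0          4s
    required-pod2   0/1     Pending   0          7h18m
    ## pod-1 tolerates neither taint, so it also stays Pending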
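2. Toleration

A tainted node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule or NoExecute), so Pods are kept off that node to some degree. However, we can set tolerations on a Pod. A Pod with a matching toleration tolerates the taint and can be scheduled onto the tainted node.

Pod.spec.tolerations

    tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
    - key: …
      …
      effect: "NoExecute"
      tolerationSeconds: 3600
    - key: "key2"
      operator: "Exists"

- key, value and effect must match the taint set on the node.
- With operator "Exists", the value is ignored and must be left empty.
- tolerationSeconds is only valid with effect "NoExecute". It sets how long the Pod may keep running on the node after the taint appears before it is evicted.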
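(1) When key is not specified, the toleration matches every taint key (this requires operator "Exists"):

    tolerations:
    - operator: "Exists"

For example:

    vim pod3.yaml

    spec:
      containers:
      …
            containerPort: 80
      tolerations:
      - operator: "Exists"
        effect: …

    [root@Centos8 scheduler]# kubectl create -f pod3.yaml
    pod/pod-3 created
    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME    READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
    pod-1   0/1     Pending   0          14m   <none>         <none>
    pod-2   1/1     Running   0          11m   10.244.3.229   testcentos7
    pod-3   1/1     Running   0          …     10.244.0.107   centos8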
(2) When effect is not specified, the toleration matches every taint effect.
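The matching rules in (1) and (2), together with the node filtering they drive, can be sketched in Python. A node is a scheduling candidate only if every NoSchedule or NoExecute taint on it is tolerated. PreferNoSchedule only lowers a node's ranking and is ignored here.

The taints are the ones on this cluster at this point. The third call models a toleration like pod-2's written without an effect, which is an assumption. The code is an illustration, not the scheduler's implementation.

```python
from typing import Dict, List

HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def tolerates(tol: dict, taint: dict) -> bool:
    # (2) an empty effect matches every effect
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        # (1) an empty key with Exists matches every key
        return not tol.get("key") or tol["key"] == taint["key"]
    return tol.get("key") == taint["key"] and tol.get("value", "") == taint.get("value", "")


def schedulable_nodes(tolerations: List[dict], cluster: Dict[str, List[dict]]) -> List[str]:
    # Keep a node only if each of its hard taints is tolerated by at least one toleration
    return [node for node, taints in cluster.items()
            if all(any(tolerates(t, taint) for t in tolerations)
                   for taint in taints if taint["effect"] in HARD_EFFECTS)]


CLUSTER = {
    "centos8": [{"key": "node-role.kubernetes.io/master", "value": "", "effect": "NoSchedule"}],
    "testcentos7": [{"key": "check", "value": "vfan", "effect": "NoExecute"}],
}
print(schedulable_nodes([], CLUSTER))                          # []
print(schedulable_nodes([{"operator": "Exists"}], CLUSTER))    # ['centos8', 'testcentos7']
print(schedulable_nodes([{"key": "check", "operator": "Equal", "value": "vfan"}], CLUSTER))  # ['testcentos7']
```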
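(3) A Pod toleration test case:

    vim pod2.yaml

    spec:
      …
      tolerations:
      - key: check
        operator: "Equal"
        value: vfan
        effect: …

    [root@Centos8 scheduler]# kubectl create -f pod2.yaml
    pod/pod-2 created
    [root@Centos8 scheduler]# kubectl get pod
    NAME    READY   STATUS    RESTARTS   AGE
    pod-1   0/1     Pending   0          3m25s
    pod-2   1/1     Running   0          4s
    ## pod-2 tolerates the check=vfan taint on testcentos7, so it can run there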
Finally, remove the taint from the node:

    kubectl taint nodes testcentos7 check=vfan:NoExecute-
III. Assigning Pods to specific nodes

Note: all of the following tests were run on a cluster with 1 master and 1 node.
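1. Pod.spec.nodeName

Setting nodeName places the Pod directly onto the named node and skips the Scheduler's policies entirely. This is a forced match.

    vim nodeName1.yaml

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: nodename-1
      labels:
        app: web
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: web
        spec:
          nodeName: testcentos7
          containers:
          - name: nodename-…
            image: nginx:…
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 80

    [root@Centos8 scheduler]# kubectl apply -f nodeName1.yaml
    deployment.extensions/nodename-1 created
    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME                          READY   STATUS    RESTARTS   AGE   IP             NODE
    nodename-1-7f4c7db4d4-hdcjv   1/1     Running   0          92s   10.244.3.240   testcentos7
    nodename-1-7f4c7db4d4-xxrj8   1/1     Running   0          93s   10.244.3.238   testcentos7
    nodename-1-7f4c7db4d4-zkt2c   …       …         …          …     10.244.3.239   testcentos7

extensions/v1beta1 Deployments work on this v1.15 cluster. That API version was removed in Kubernetes 1.16, where apps/v1 with an explicit spec.selector is required.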
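For comparison, change the node in the yaml file to centos8:

    nodeName: centos8

Create it again:

    [root@Centos8 scheduler]# kubectl delete -f nodeName1.yaml
    [root@Centos8 scheduler]# kubectl apply -f nodeName1.yaml
    deployment.extensions/nodename-1 created
    [root@Centos8 scheduler]# kubectl get pod -o wide
    NAME                          READY   STATUS    RESTARTS   AGE    IP             NODE
    nodename-1-7d49bd7849-ct9w5   1/1     Running   0          2m2s   10.244.0.112   centos8
    nodename-1-7d49bd7849-qk9mm   …       …         …          …      10.244.0.113   centos8
    nodename-1-7d49bd7849-zdphd   …       …         …          …      10.244.0.111   centos8
    ## The Pods run on centos8 even though the master carries a NoSchedule taint, because nodeName bypasses the scheduler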
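The difference between the two runs can be sketched like this. When spec.nodeName is set, the scheduler is skipped and the Pod is bound to that node as-is. Taint checks such as the master's NoSchedule therefore never run.

This Python sketch is a hypothetical toy: toy_scheduler and place are illustrative names. It also ignores the admission checks the kubelet on the named node still performs, for example resource checks.

```python
from typing import Dict, List, Optional

NODE_TAINT_EFFECTS: Dict[str, List[str]] = {"centos8": ["NoSchedule"], "testcentos7": []}


def toy_scheduler(pod: dict, nodes: Dict[str, List[str]]) -> Optional[str]:
    # Picks the first node without a NoSchedule/NoExecute taint (tolerations not modeled)
    for node, effects in nodes.items():
        if not {"NoSchedule", "NoExecute"} & set(effects):
            return node
    return None


def place(pod: dict, nodes: Dict[str, List[str]]) -> Optional[str]:
    if pod.get("nodeName"):
        return pod["nodeName"]  # scheduler bypassed: no predicate or taint check
    return toy_scheduler(pod, nodes)


print(place({"name": "via-scheduler"}, NODE_TAINT_EFFECTS))                  # testcentos7
print(place({"name": "pinned", "nodeName": "centos8"}, NODE_TAINT_EFFECTS))  # centos8
```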
2. Pod.spec.nodeSelector

nodeSelector selects nodes through Kubernetes' label-selector mechanism. The scheduler matches the labels and then schedules the Pod onto a target node. This is a mandatory constraint.

    vim nodeSelect1.yaml

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: node-1
      labels:
        app: web
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: myweb
        spec:
          nodeSelector:
            type: ssd          # a label the node must carry
          containers:
          - name: myweb
            image: nginx:…
            ports:
            - containerPort: …

    [root@Centos8 scheduler]# kubectl apply -f nodeSelect1.yaml
    deployment.extensions/node-1 created
    [root@Centos8 scheduler]# kubectl get pod
    NAME                      READY   STATUS   RESTARTS   AGE
    node-1-684b6cc685-9lzbn   …       …        …          3s
    node-1-684b6cc685-lwzrm   …       …        …          …
    node-1-684b6cc685-qlgjq   …       …        …          3s
    [root@Centos8 scheduler]# kubectl get node --show-labels
    NAME          STATUS   ROLES    AGE    VERSION   LABELS
    centos8       Ready    master   135d   v1.…
    testcentos7   Ready    <none>   134d   v1.…
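nodeSelector matching is an exact AND over key/value pairs: a node qualifies only if it carries every listed label with exactly that value. The label output above is truncated, so this Python sketch uses hypothetical label sets for the two nodes. It assumes neither node carries type=ssd yet.

The sketch shows why nothing matches type: ssd until some node gets that label, for example with kubectl label nodes testcentos7 type=ssd. It illustrates the rule, not the scheduler's code.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches_node_selector(node_labels: Labels, selector: Labels) -> bool:
    # Every key in the selector must be present with exactly the same value
    return all(node_labels.get(k) == v for k, v in selector.items())


def candidates(nodes: Dict[str, Labels], selector: Labels) -> List[str]:
    return [name for name, labels in nodes.items() if matches_node_selector(labels, selector)]


# Hypothetical label sets: assume neither node has type=ssd yet
nodes = {
    "centos8": {"kubernetes.io/hostname": "centos8"},
    "testcentos7": {"kubernetes.io/hostname": "testcentos7"},
}
selector = {"type": "ssd"}
print(candidates(nodes, selector))   # [] -> the Deployment's Pods stay Pending

nodes["testcentos7"]["type"] = "ssd"  # as if labeled with: kubectl label nodes testcentos7 type=ssd
print(candidates(nodes, selector))   # ['testcentos7']
```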