
GKE Kubernetes blue/green deployment error window


We have a container cluster running in GKE with mode: Autopilot. We currently receive errors during a short window when performing a "blue/green" deployment from Jenkins.

When we switch the service to the new deployment, there is a window of under 100 ms that generates the following error:

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>

I assume this is because one of the pods has not started yet, but traffic is already being routed to the deployment.

We check that the deployment has finished rolling out after it is created. With the Jenkins plugin (https://github.com/jenkinsci/google-kubernetes-engine-plugin) we have the verifyDeployments attribute set to true:

step([
  $class: 'KubernetesEngineBuilder',
  projectId: env.PROJECT_ID,
  clusterName: env.CLUSTER_NAME,
  namespace: env.NAMESPACE,
  location: env.CLUSTER_LOCATION,
  manifestPattern: './apps/app/deployments/green.yaml',
  credentialsId: env.APP_CREDENTIALS_ID,
  verifyDeployments: true
])

We also added a second check to really verify that the deployment has rolled out, since the Jenkins plugin doesn't seem to do this reliably:

kubectl rollout status deployment app-deployment --namespace app-namespace --watch --timeout=5m
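Note that kubectl rollout status only confirms that the Deployment's pods report Ready; it says nothing about whether the Service's endpoints have caught up. As an extra guard (a sketch; the service and namespace names are taken from the manifests below), one could wait until the Service's Endpoints object actually lists ready addresses before relying on it:

```shell
# Wait until the Service has at least one ready endpoint address.
# "app-service" / "app" are assumed to match the Service manifest below.
until kubectl get endpoints app-service --namespace app \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | grep -q .; do
  echo "waiting for ready endpoints on app-service..."
  sleep 1
done
```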

We also noticed that the deployment can sometimes fail while the service is still created in a subsequent step, which crashes the application. That is a separate case we still need to figure out how to solve, probably related to the Jenkins plugin.

Our deployment YAML looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  namespace: app
  labels: {app.kubernetes.io/managed-by: graphite-jenkins-gke}
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  selector:
    matchLabels: {app: app-blue}
  template:
    metadata:
      labels: {app: app-blue}
    spec:
      automountServiceAccountToken: true
      containers:
      - image: eu.gcr.io/container-registry-project/app:latest
        imagePullPolicy: Always
        name: app
        ports:
        - {containerPort: 8080, name: http, protocol: TCP}
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 5
        resources:
          limits: {cpu: 500m, ephemeral-storage: 1Gi, memory: 512Mi}
          requests: {cpu: 500m, ephemeral-storage: 1Gi, memory: 512Mi}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: [NET_RAW]
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: false
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: app
      serviceAccountName: app
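
One more note: if this Service sits behind a Google Cloud HTTP(S) load balancer (Ingress with NEGs), the data plane keeps routing to old pods for a short time after they are removed as endpoints. If the old color is deleted or scaled down right after the cutover, a preStop sleep lets those pods keep serving until the load balancer catches up. This is a sketch, assuming the container image ships a shell:

```yaml
# Sketch: added under the container spec of both color deployments.
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 15"]  # keep serving while the LB deprograms this pod
```

terminationGracePeriodSeconds on the pod spec should then be comfortably larger than the sleep.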

Our service YAML looks like this:

apiVersion: v1
kind: Service
metadata:
  name: app-service
  namespace: app
spec:
  selector:
    app: app-blue
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
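
For reference, the selector flip itself can be done as a single server-side patch, so there is no window where the Service spec is partially written (names taken from the manifests above; app-green is the cutover target here):

```shell
kubectl patch service app-service --namespace app \
  --type merge \
  -p '{"spec":{"selector":{"app":"app-green"}}}'
```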

We simply switch the app: selector in the service between the app-blue and app-green deployments to cut over to the new deployment, but we always get a small window of errors when doing this. Does anyone have an idea what we're doing wrong?

Comment from Sergiusz: Have you tried increasing initialDelaySeconds on the readinessProbe?

