
GKE Kubernetes blue/green deployment error window


We have a container cluster running in GKE with mode: Autopilot. We currently receive errors during a short window when performing a "blue/green" deployment from Jenkins.

When we switch the service to the new deployment, there is a window of under 100 ms that generates the following error:

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>

I assume this is because one of the pods has not started yet, but traffic is already being routed to the deployment.

We check that the deployment has finished rolling out after it is created. With the Jenkins plugin (https://github.com/jenkinsci/google-kubernetes-engine-plugin) we have the verifyDeployments attribute set to true:

step([
  $class: 'KubernetesEngineBuilder',
  projectId: env.PROJECT_ID,
  clusterName: env.CLUSTER_NAME,
  namespace: env.NAMESPACE,
  location: env.CLUSTER_LOCATION,
  manifestPattern: './apps/app/deployments/green.yaml',
  credentialsId: env.APP_CREDENTIALS_ID,
  verifyDeployments: true
])

We also added a second check to really verify that the deployment has rolled out, since the Jenkins plugin doesn't seem to do this reliably:

kubectl rollout status deployment app-deployment --namespace app-namespace --watch --timeout=5m
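Note that kubectl rollout status only confirms that the Deployment's pods report Ready; it says nothing about whether the Service's endpoints have caught up. As an extra guard (a sketch; the service and namespace names are taken from the manifests below), one could wait until the Service's Endpoints object actually lists ready addresses before relying on it:

```shell
# Wait until the Service has at least one ready endpoint address.
# "app-service" / "app" are assumed to match the Service manifest below.
until kubectl get endpoints app-service --namespace app \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | grep -q .; do
  echo "waiting for ready endpoints on app-service..."
  sleep 1
done
```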

We also noticed that the deployment can sometimes fail while the service is still created in a subsequent step, which crashes the application. That is a separate case we still need to figure out how to solve, probably related to the Jenkins plugin.

Our deployment YAML looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  namespace: app
  labels: {app.kubernetes.io/managed-by: graphite-jenkins-gke}
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  selector:
    matchLabels: {app: app-blue}
  template:
    metadata:
      labels: {app: app-blue}
    spec:
      automountServiceAccountToken: true
      containers:
      - image: eu.gcr.io/container-registry-project/app:latest
        imagePullPolicy: Always
        name: app
        ports:
        - {containerPort: 8080, name: http, protocol: TCP}
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 5
        resources:
          limits: {cpu: 500m, ephemeral-storage: 1Gi, memory: 512Mi}
          requests: {cpu: 500m, ephemeral-storage: 1Gi, memory: 512Mi}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: [NET_RAW]
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: false
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: app
      serviceAccountName: app
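
One more note: if this Service sits behind a Google Cloud HTTP(S) load balancer (Ingress with NEGs), the data plane keeps routing to old pods for a short time after they are removed as endpoints. If the old color is deleted or scaled down right after the cutover, a preStop sleep lets those pods keep serving until the load balancer catches up. This is a sketch, assuming the container image ships a shell:

```yaml
# Sketch: added under the container spec of both color deployments.
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 15"]  # keep serving while the LB deprograms this pod
```

terminationGracePeriodSeconds on the pod spec should then be comfortably larger than the sleep.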

Our service YAML looks like this:

apiVersion: v1
kind: Service
metadata:
  name: app-service
  namespace: app
spec:
  selector:
    app: app-blue
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
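
For reference, the selector flip itself can be done as a single server-side patch, so there is no window where the Service spec is partially written (names taken from the manifests above; app-green is the cutover target here):

```shell
kubectl patch service app-service --namespace app \
  --type merge \
  -p '{"spec":{"selector":{"app":"app-green"}}}'
```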

We simply switch the app: selector in the service between the app-blue and app-green deployments to cut over to the new deployment, but we always get a small window of errors when doing this. Does anyone have an idea what we're doing wrong?

Comment from Sergiusz: Have you tried increasing initialDelaySeconds on the readinessProbe?

