A more complete example for using container agent self-hosted runner?

I’m having a little trouble getting a sample working that uses the new container agent self-hosted runner on GCP/GKE. I’m new to CircleCI in general, so I think I’m just missing some understanding.

I’ve installed the container agent on our cluster and configured it with the token, and from the logs it’s clearly talking to CircleCI just fine: it’s receiving jobs and attempting to run them.

I’m trying to run a job to deploy into our cluster. So I have this job:

  deploy:
    docker:
      - image: google/cloud-sdk
    resource_class: our-organization/denis-test-resource-class
    steps:
      - checkout
      - run:
          name: authenticate gcloud CLI and set project
          command: |
            echo $GCLOUD_SERVICE_KEY | gcloud auth activate-service-account --key-file=-
            gcloud --quiet config set project ${GOOGLE_PROJECT_ID}
            gcloud --quiet config set compute/zone ${GOOGLE_COMPUTE_ZONE}
      - gcp-gke/update-kubeconfig-with-credentials:
          cluster: em-alpha
      - gcp-gke/rollout-image:
          cluster: em-alpha
          container: $IMAGE_NAME
          deployment: $IMAGE_NAME
          image: $DOCKER_FULL_IMAGE_NAME
          tag: $BRANCH
          namespace: em-services-alpha-01

When I run the job, it looks like the authenticate gcloud CLI and set project step runs fine:

#!/bin/bash -eo pipefail
echo $GCLOUD_SERVICE_KEY | gcloud auth activate-service-account --key-file=-
gcloud --quiet config set project ${GOOGLE_PROJECT_ID}
gcloud --quiet config set compute/zone ${GOOGLE_COMPUTE_ZONE}

Activated service account credentials for: [circleci-gcp-access@*******.iam.gserviceaccount.com]
Updated property [core/project].
WARNING: Property validation for compute/zone was skipped.
Updated property [compute/zone].
CircleCI received exit code 0

but I think gcp-gke/update-kubeconfig-with-credentials is failing (and I’m not sure I even need it?). In the UI output, the step is labelled “Install latest gcloud CLI version, if not available”:

#!/bin/bash -eo pipefail
install () {
  # Set sudo to work whether logged in as root user or non-root user
  if [[ $EUID == 0 ]]; then export SUDO=""; else export SUDO="sudo"; fi
  cd ~/
  curl -Ss --retry 5 https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-283.0.0-linux-x86_64.tar.gz | tar xz
  echo 'source ~/google-cloud-sdk/path.bash.inc' >> $BASH_ENV
}

if grep 'docker\|lxc' /proc/1/cgroup > /dev/null 2>&1; then
  if [[ $(command -v gcloud) == "" ]]; then
    install
  else
    echo "gcloud CLI is already installed."
  fi
else
  echo "----------------------------------------------------------------------------------------------------"
  echo "this is a machine executor job, replacing default installation of gcloud CLI"
  echo "----------------------------------------------------------------------------------------------------"
  sudo rm -rf /opt/google-cloud-sdk
  install
fi

----------------------------------------------------------------------------------------------------
this is a machine executor job, replacing default installation of gcloud CLI
----------------------------------------------------------------------------------------------------
/bin/bash: line 18: sudo: command not found

Exited with code exit status 127
CircleCI received exit code 127

Is there a more complete example of something like this somewhere? I’m piecing together bits and pieces from here and there…

(cc @sebastian-lerner 🙂)

Hi @denis

In your .circleci/config.yml file, are the two steps after the run step meant to be specific “jobs” you want to run, or are they configuration for your cluster?

      - gcp-gke/update-kubeconfig-with-credentials:
          cluster: em-alpha
      - gcp-gke/rollout-image:
          cluster: em-alpha
          container: $IMAGE_NAME
          deployment: $IMAGE_NAME
          image: $DOCKER_FULL_IMAGE_NAME
          tag: $BRANCH
          namespace: em-services-alpha-01

The way you redirect the job to the cluster itself is by using the resource class associated with your container-agent. So you shouldn’t need to specify cluster: or namespace:, etc.

Here’s a full example CircleCI config file that uses a container-agent: CircleCI’s self-hosted runner FAQs - CircleCI

Let me know if you still have questions; happy to help.

I left out some details I should have made more clear.

The gcp-gke/update-kubeconfig-with-credentials and gcp-gke/rollout-image steps are from the gcp-gke Orb – CircleCI Developer Hub - circleci/gcp-gke

So all those references to cluster, etc., are just parameters to those steps. I want to deploy a Docker image from GCP’s Artifact Registry to our cluster (which is behind a firewall, which is why I’m using the container agent self-hosted runner for this).

I’m pretty sure the job is running in the container agent on our cluster. I have the resource_class defined at the job level:

    docker:
      - image: google/cloud-sdk
    resource_class: our-organization/denis-test-resource-class

Should I not be using the google/cloud-sdk docker image?

It’s weird that at one step (where I’m explicitly specifying commands) the gcloud CLI command is working fine… but in the steps from the orb, it seems to think gcloud is not installed and is trying to install it.

Thanks!

Interesting, thanks for clarifying. Let me ask the engineering team I work with since I am stumped at this point. I’ll let you know when I have more, thanks for your patience.

@denis we think we found the issue: it seems to be a problem with some assumptions in the orb logic that aren’t compatible with a k8s environment.

We’re working internally to see if we can get the orb updated in a way to get around this issue. I’ll reach out when I have more. Thanks for the patience.
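
For anyone following along, here is a rough sketch of the assumption in question, based only on the orb script quoted earlier in this thread. The script decides it is in a Docker executor only if /proc/1/cgroup mentions "docker" or "lxc"; otherwise it takes the machine executor branch, which calls sudo. The detect_executor helper and the two sample cgroup lines are hypothetical and only for illustration; what a real pod shows depends on the container runtime and cgroup setup.

```shell
#!/bin/bash
# Mirrors the check in the quoted orb script: treat the job as a Docker
# executor job only if the cgroup file mentions "docker" or "lxc".
# Taking the file path as an argument is a hypothetical change for
# illustration; the orb script reads /proc/1/cgroup directly.
detect_executor() {
  local cgroup_file="${1:-/proc/1/cgroup}"
  if grep 'docker\|lxc' "$cgroup_file" > /dev/null 2>&1; then
    echo "docker"
  else
    echo "machine"
  fi
}

# Illustrative sample contents (assumptions, not captured from a real cluster).
sample_dir="$(mktemp -d)"
printf '12:pids:/docker/0123456789abcdef\n' > "$sample_dir/docker.cgroup"
printf '0::/\n' > "$sample_dir/k8s.cgroup"

echo "docker sample: $(detect_executor "$sample_dir/docker.cgroup")"
echo "k8s sample:    $(detect_executor "$sample_dir/k8s.cgroup")"
```

If the pod's cgroup file looks like the second sample, the script falls into the machine executor branch even though gcloud is already present in the image, which would match the "sudo: command not found" failure above.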

@denis an update here: an engineer at CircleCI took a look at this issue this week.

We think a workaround is simply to install sudo, which should turn the build green. We’ll be updating the orb to check whether sudo is installed and, if it isn’t, fail the job with a proper message, since the current error is cryptic.
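
A minimal sketch of what that workaround could look like as the command of a run step placed before the orb steps. It assumes the job image is Debian-based with apt-get available and that the job runs as root; neither is confirmed in this thread, so adjust for your image. The has_sudo and ensure_sudo names are hypothetical helpers, not part of the orb.

```shell
#!/bin/bash
# Succeeds if a sudo executable is found on PATH.
has_sudo() {
  command -v sudo > /dev/null 2>&1
}

# Installs sudo only when it is missing.
# Assumption: Debian-based image with apt-get, job running as root.
ensure_sudo() {
  if has_sudo; then
    echo "sudo is already installed."
    return 0
  fi
  if [ "$(id -u)" -ne 0 ]; then
    echo "sudo is missing and this job is not running as root." >&2
    return 1
  fi
  apt-get update && apt-get install -y sudo
}
```

Calling ensure_sudo at the end of that step, before gcp-gke/update-kubeconfig-with-credentials runs, should let the orb's sudo call succeed.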