Error cleaning up the working directory on self-hosted runners

I am trying to diagnose why this error keeps coming up on my self-hosted machine runner when cleanup_working_directory is set to true.
Error:
Detail: failed to remove working directory before starting task: unlinkat /var/lib/circleci/workdir/%s/.gradle/buildOutputCleanup/buildOutputCleanup.lock: permission denied

Below is my runner installation script, which also writes the runner config. I believe the permissions are OK.

#!/bin/bash

set -euo pipefail
#-------------------------------------------------------------------------------
#
# CircleCI Runner installation script
# https://circleci.com/docs/runner-installation/
#
#-------------------------------------------------------------------------------
platform="linux/amd64"      
CONFIG_PATH="/etc/circleci-runner/circleci-runner-config.yaml"    # Determines where Runner config will be stored
SERVICE_PATH="/etc/systemd/system/circleci-runner.service"           # Determines where the Runner service definition will be stored
TIMESTAMP=$(date +"%y%m%d-%H%M%S-%3N")                  # Used to avoid Runner naming collisions
HOST_NAME=$(hostname)

AUTH_TOKEN="${auth_token}"                                           # Auth token for CircleCI
RUNNER_NAME="${runner_name}"                                        # A runner name - this is not the same as the Resource class - keep it short, and only with letters/numbers/dashes/underscores
UNIQUE_RUNNER_NAME="$RUNNER_NAME-$HOST_NAME-$TIMESTAMP"           
USERNAME="circleci"                                

# Create circleci user and working directory
id -u circleci &> /dev/null || sudo adduser --disabled-password --gecos GECOS circleci

# Set up the runner directories
echo "Setting up CircleCI Runner directories"
sudo mkdir -p /var/lib/circleci/workdir
sudo chmod 0750 /var/lib/circleci/workdir
sudo chown -R circleci /var/lib/circleci/workdir

# This enables code to execute root commands on the instance and changes to the system may persist after the job is run
echo "circleci ALL=(ALL) NOPASSWD:ALL" | sudo tee -a /etc/sudoers

sudo mkdir -p /etc/circleci-runner && sudo touch /etc/circleci-runner/circleci-runner-config.yaml
sudo chown -R circleci: /etc/circleci-runner
sudo chmod 600 /etc/circleci-runner/circleci-runner-config.yaml

echo "Installing CircleCI Runner for $platform"
curl -s "https://packagecloud.io/install/repositories/circleci/runner/script.deb.sh?any=true" | sudo bash
sudo apt-get install -y -o Dpkg::Options::="--force-confold" circleci-runner

#-------------------------------------------------------------------------------
# Install the CircleCI runner configuration
# CircleCI Runner will be executing as the configured $USERNAME
# Note the short idle timeout - this script is designed for auto-scaling scenarios - if a runner is unclaimed, it will quit and the system will shut down as defined in the below service definition
#-------------------------------------------------------------------------------

cat << EOF >$CONFIG_PATH
api:
  auth_token: $AUTH_TOKEN
runner:
  name: $UNIQUE_RUNNER_NAME
  command_prefix: ["sudo", "-niHu", "$USERNAME", "--"]
  working_directory: /var/lib/circleci/workdir/%s
  cleanup_working_directory: true
  idle_timeout: 1h
  max_run_time: 5h
  mode: continuous
logging:
  file: /var/log/com.circleci.runner.log
EOF

#-------------------------------------------------------------------------------
# Create the service to override the default one in /lib/systemd/system/
# The service will always restart
#-------------------------------------------------------------------------------
cat << EOF >$SERVICE_PATH
[Unit]
Description=CircleCI Runner
After=network.target
[Service]
ExecStart=/usr/bin/circleci-runner machine -c $CONFIG_PATH
Restart=always
User=circleci
Group=circleci
NotifyAccess=exec
TimeoutStopSec=18300
[Install]
WantedBy = multi-user.target
EOF

#-------------------------------------------------------------------------------
# Configure your runner environment
# This script must be able to run unattended - without user input
#-------------------------------------------------------------------------------
sudo apt update && sudo apt upgrade -y
sudo apt install coreutils curl tar gzip zip unzip -y

#-------------------------------------------------------------------------------
# Enable CircleCI Runner service and start it
# This MUST be done last, as it will immediately advertise to the CircleCI server that the runner is ready to use
#-------------------------------------------------------------------------------
sudo systemctl enable circleci-runner && sudo systemctl start circleci-runner

# Check status
sudo systemctl status circleci-runner

Can you post your config.yml or a cut-down version that results in the same error?

The public docs do not make clear what is going on here, and two things in particular look inconsistent (this part is aimed at CircleCI team members).

  • cleanup_working_directory is a flag that causes the working directory to be cleaned up after a job, not, as the error message indicates, before ‘starting task’.

  • the %s is meant to be replaced with the unique working directory name at runtime, which does not seem to have happened by the time unlinkat is called.

At the moment my best guess is that the cleanup process was not written with the %s option in mind. That would make some sense, as the %s option just injects a temporary directory name, which at least in the past was the agent's default behaviour, with the starting location simply set to /tmp.
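To make the expected substitution concrete, here is a minimal shell sketch. The function names and the job-1234 value are hypothetical, and this is only a model of what %s is meant to do, not the runner's actual code. The second helper checks whether a path still contains the literal placeholder, as the path in the error message above does.

```shell
# Hypothetical model of expanding a "%s" working_directory template.
# Assumes the template contains no other printf-style sequences.
expand_workdir() {
  # $1: template containing %s, $2: unique directory name
  printf "$1\n" "$2"
}

# Succeeds if the given path still contains an unexpanded "%s".
has_unexpanded_placeholder() {
  case "$1" in
    *%s*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Prints the expanded path for a made-up unique name.
expand_workdir "/var/lib/circleci/workdir/%s" "job-1234"
```

If the cleanup step received the template before expansion, the second check would succeed for the path it tries to remove.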

Here is the workflow that uses our internal orb to run a static analysis scan:

workflows:
  build-test-deploy-image-to-ecr-and-update-helm-charts:
    jobs:
      - scan-utils/scan:
          name: Scan using Dependency Track prod
          project: hyg-partner-manager
          resource_class: resource-org/my-machine
          context:
            - aws-dev
            - dependency-track-prod

Here is the source code of the scan-utils orb and the scan job:

description: |
  Scan the repository and block the PR when a vulnerability is detected.

machine: true
resource_class: << parameters.resource_class >>
parameters:
  project:
    description: >-
      Name of the project to be created on Dependency Track.
    type: string
  resource_class:
    description: >-
      Resource class of the machine runner: "resource-org/my-machine" or "resource-org/my-dev-machine"
    type: string
steps:
  - checkout
  - run:
      name: Setup env-vars
      command: |
        cat \<< EOF > env-vars
        ORG_NAME=my-org
        dependency_track_api_host=${dependency_track_api_host}
        dependency_track_console_host=${dependency_track_console_host}
        WORKSPACE_DIR=/app
        dependency_track_api_key=${dependency_track_api_key}
        BLOCKING=${BLOCKING}
        EOF
  - run:
      name: Remove config and credential files
      command: |
        if [ -f "$HOME/.docker/config.json" ]; then
          rm -f ~/.docker/config.json
        fi

        if [ -f "$HOME/.aws/credentials" ]; then
          rm -f ~/.aws/credentials
        fi
  - aws-ecr/ecr-login:
      role-arn: "${CIRCLECI_AWS_ROLE_ARN}"
      assume-web-identity: true
      role-session-name: "circleci-aws-access"
      session-duration: "1800"
      source-profile: "OIDC-PROFILE"
  - run:
      name: Pull image dependency-track-sca from ECR
      command: |
        docker pull <image>
  - run:
      name: Docker run scanner
      command: |
        set -x
        docker run --rm \
            -v /tmp:/tmp \
            --env-file env-vars \
            -e REPO_NAME="<< parameters.project >>" \
            -v "${PWD}":/app \
            -t <image>

OK, that looks simple, with no complications, so can you try the following?

Set the runner to use:

   working_directory: /var/lib/circleci/workdir
   cleanup_working_directory: true

That way no random directory is created under your working directory. This is the configuration the cleanup flag is meant to support: once the job has completed, it is meant to empty the working directory.

To do this you may have to clean up your working directory by hand first, as past job runs will have left files in place and your job may not be able to complete otherwise.
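A one-off cleanup could look like the sketch below. The function name empty_workdir is hypothetical and the path in the example is the one from the runner config. It deletes everything inside the directory but keeps the directory itself. It would need to run as a user allowed to remove the leftovers, which may mean root if earlier jobs left files owned by another user.

```shell
# Hypothetical helper: remove everything inside a directory, keep the directory.
empty_workdir() {
  dir="$1"
  # Refuse an empty argument, "/", or a non-directory to avoid deleting the wrong tree.
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "refusing to clean: '$dir'" >&2
    return 1
  fi
  # -mindepth 1 skips the directory itself; -delete removes children depth-first.
  find "$dir" -mindepth 1 -delete
}

# Example (run on the runner host, typically via sudo):
#   empty_workdir /var/lib/circleci/workdir
```

On instances that are created automatically, the same function could be called from whatever provisioning or boot script already runs there, so no interactive step is needed.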

Manual cleanups are not feasible for us, as everything is automated and runs behind an Auto Scaling group.
After changing the working directory path to

 working_directory: /var/lib/circleci/workdir
 cleanup_working_directory: true

I still get the same error. I think this is a genuine bug.

CircleCI failed to run this build, check your config. Try re-running the 
build and if this issue persists, open a Support ticket. Detail: failed to 
remove working directory before starting task: unlinkat 
/var/lib/circleci/workdir: permission denied
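
Two things may be worth checking on an affected instance. Both are assumptions, not a confirmed diagnosis. First, removing an entry requires write permission on its parent directory, and the install script above only changes ownership of /var/lib/circleci/workdir, not of /var/lib/circleci. Second, a job that writes into the working directory as another user (for example a container running as root with the directory bind-mounted) could leave root-owned subdirectories whose contents the circleci user cannot delete. The helper names below are hypothetical.

```shell
# Hypothetical diagnostics; run as the user the runner service runs as.

# List entries under a directory that are not owned by the given user
# (user name or numeric ID).
list_foreign_entries() {
  find "$1" ! -user "$2" -print
}

# Succeed if the current user has write and search permission on the parent
# of the given path, which is what removing the path itself requires
# (sticky-bit directories and ACLs are ignored here).
parent_allows_removal() {
  parent=$(dirname "$1")
  [ -w "$parent" ] && [ -x "$parent" ]
}

# Example:
#   list_foreign_entries /var/lib/circleci/workdir circleci
#   parent_allows_removal /var/lib/circleci/workdir || echo "cannot remove the workdir itself"
```

Any output from the first helper, or a failure from the second, would point at ownership on the host rather than at the runner's handling of the path.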