Job suddenly started failing due to a file missing from the attached workspace

Hi,

I have a CircleCI job that abruptly started failing because a file is missing from the attached workspace (without any new changes on my side).

In my workflow, the docker job first creates a Docker image and stores its version in a text file (say, version.txt), which is persisted to the workspace.
This workspace is then attached in the next job, deploy, which reads the Docker image version from that file.
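
For readers following along, the persist/attach flow described above might look roughly like the sketch below. This is not the actual config from the repository: the job names (build-image, deploy), the cimg/base:stable image, the workspace directory name, and the way the version string is built are assumptions made for illustration. Only the persist_to_workspace and attach_workspace steps and the nukengprodservice/helm image name come from CircleCI and from this thread.

```yaml
version: 2.1

jobs:
  build-image:                      # hypothetical name for the "docker" job
    docker:
      - image: cimg/base:stable     # assumption: any image with a shell works here
    steps:
      - checkout
      - run:
          name: Record the image version
          command: |
            mkdir -p workspace
            # Assumption: the version is derived from the commit SHA
            echo "0.0.1-${CIRCLE_SHA1}" > workspace/version.txt
      - persist_to_workspace:
          root: workspace           # relative to the job's working directory
          paths:
            - version.txt           # relative to root

  deploy:
    docker:
      - image: nukengprodservice/helm
    steps:
      - attach_workspace:
          at: workspace             # where the persisted files are restored
      - run:
          name: Read the image version
          command: cat workspace/version.txt

workflows:
  build-and-deploy:
    jobs:
      - build-image
      - deploy:
          requires:
            - build-image           # deploy can only attach what build-image persisted
```

Note that both root and at are relative paths here, so they resolve against each job's own working directory, which may differ between images.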

The job started failing around 2020-11-10 & the last successful deployment was on 2020-10-21.

Latest Failed job:

Last Successful Job:

The code repo is linked in my reply (as a new user, I can’t share more than 2 URLs).

Thanks,
Abhishek

GitHub URL: https://github.com/abhisheksr01/spring-boot-microservice-best-practices

Hello,

Could you try to SSH into the deploy app job and see if the file exists anywhere at all? You could scan the entire filesystem for it with: sudo find / -name "version.txt". If it’s not being misplaced for some reason in the attaching job, then the next step would be to see if it’s being generated properly in the job which persists it. Let me know if you get to that point.
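
If rerunning with SSH is inconvenient, the same check could be added as a temporary debug step in the deploy job. The fragment below is only a sketch: the /tmp/workspace attach path and the step name are assumptions, and sudo is left out because the deploy image may not ship it.

```yaml
# Fragment of the deploy job's steps (temporary, for debugging only)
steps:
  - attach_workspace:
      at: /tmp/workspace            # hypothetical attach path; use the job's real one
  - run:
      name: Locate version.txt (debug)
      command: |
        # Show what was actually restored from the workspace
        ls -laR /tmp/workspace || true
        # Search the whole filesystem; permission errors are silenced,
        # and "|| true" keeps the step from failing on find's exit code
        find / -name "version.txt" 2>/dev/null || true
```

An empty listing from the first command would suggest the file never reached the workspace, while a hit from find in some other directory would point to a path mismatch.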

Hi, Thanks for your reply,

I tried running the find command in both jobs; below are the findings, with the respective attachments.

Persisting Job: The file is created properly, and I can see the version getting populated.

Deploy Job: Ran the find command, but there is no trace of the file.

I tried the same steps with a new file name as well, but that doesn’t work either.

I’m not sure if there is any workaround I can try here to get it working again.

Regards,
Abhishek

Attaching the Persisting Job screenshot here because of the limit of one media item per post.

Hello,

Passing workflow:
Build - circleci/openjdk@sha256:1e11bbe2854ecf5c467d99d5760f48a94227f88b503efc4e6ca6d96ca9f7b139
Deploy - nukengprodservice/helm@sha256:9aaf8f88769c84b3e26d51dc81829a3d678770d9f83b871f4918d5b1f5dbf6fa

Failing workflow:
Build - circleci/openjdk@sha256:1e11bbe2854ecf5c467d99d5760f48a94227f88b503efc4e6ca6d96ca9f7b139
Deploy - nukengprodservice/helm@sha256:23011d4a9c9d0699fb37a25d72c98ace94cfb52212dba8325b5aae4c11345b2d

One thing I noticed is that the deploy job image changed at some point between the passing and the failing workflow, while the image of the job that persists the file did not change. A quick way to test whether that image change is the reason for the sudden failure is to pin the old image using the hash above. You can see examples in this article. It’s possible that a change in the underlying image, such as a different base image or different permissions, could cause issues with the workspace.

docker:
  - image: nukengprodservice/helm@sha256:9aaf8f88769c84b3e26d51dc81829a3d678770d9f83b871f4918d5b1f5dbf6fa

Another thing to consider is persisting and restoring the file in a different directory, such as /tmp. But changing the image back is a good, simple first step.
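
Both suggestions could be combined along the lines of the sketch below, which pins the deploy image to the digest from the passing workflow and uses an absolute /tmp/workspace path on both sides. The job names, the cimg/base:stable image, and the contents written to version.txt are assumptions for illustration, not taken from the repository.

```yaml
version: 2.1

jobs:
  build-image:                      # hypothetical name for the persisting job
    docker:
      - image: cimg/base:stable     # assumption: stands in for the real build image
    steps:
      - run:
          name: Write the version file under /tmp
          command: |
            mkdir -p /tmp/workspace
            # Assumption: the commit SHA stands in for the real image version
            echo "${CIRCLE_SHA1}" > /tmp/workspace/version.txt
      - persist_to_workspace:
          root: /tmp/workspace      # absolute, so independent of the working directory
          paths:
            - version.txt

  deploy:
    docker:
      # Digest of the deploy image used by the last passing workflow
      - image: nukengprodservice/helm@sha256:9aaf8f88769c84b3e26d51dc81829a3d678770d9f83b871f4918d5b1f5dbf6fa
    steps:
      - attach_workspace:
          at: /tmp/workspace        # same absolute path as the persisting job's root
      - run:
          name: Verify the restored file
          command: |
            ls -la /tmp/workspace
            cat /tmp/workspace/version.txt

workflows:
  build-and-deploy:
    jobs:
      - build-image
      - deploy:
          requires:
            - build-image
```

Using the same absolute path in both jobs removes any dependence on each image's default working directory or home directory, which is one of the things an image change could have altered.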

Unfortunately, neither of them seems to work.

I tried the old hash for the deploy stage here:

I also tried storing “docker-version.txt” in a tmp directory.