Pipeline.git.base_revision is completely unreliable

Why do we sometimes get pipeline.git.base_revision and at other times it’s empty? I understand it won’t be defined in the first commit to the repo, but this is in the thousands.

Is there a reliable way to get a commit range

<< pipeline.git.base_revision >>...<< pipeline.git.revision >> 

This is with version: 2.1 and pipelines enabled.

Here is an example:

PR: https://github.com/huggingface/transformers/pull/8850
CircleCI job: https://app.circleci.com/pipelines/github/huggingface/transformers/16541/workflows/17b20230-8d7c-4b36-813c-2681f2c8a977/jobs/128232)

It’s missing << pipeline.git.base_revision >> in

if git diff --name-only << pipeline.git.base_revision >>...<< pipeline.git.revision >> | egrep -qv '\.(md|rst)$'

resulting in:

if git diff --name-only ...5170e5381b9fccdfb9405d665ecee0515efc6453 | egrep -qv '\.(md|rst)$'

Thank you.

1 Like

this proved to be worse than just not having pipeline.git.base_revision defined. There are situations when it’s defined but it’s bogus if a developer force pushed. As shown here: https://github.com/huggingface/transformers/pull/8853#issuecomment-736781950

In such situation the commit range is bogus again and can’t be relied upon.

1 Like

So far pipeline.git.base_revision is consistently undefined when making a PR via direct file edit on github.

2 Likes

Further monitoring showed that pipeline.git.base_revision is completely unreliable.

e.g. in this PR #8918 it changed pipeline.git.base_revision on every commit resulting only the changes from last commit appearing as a change for the whole PR, which is very bad, since the PR could be failing tests, but the last commit’s changes in doc file only will make it appear that everything is green, which could be very misleading.

1 Like

Hey @stas! Thanks for the updates on this. I know you also have a Support ticket open and they’re helping sort this out. We appreciate the details, and I hope we can update this thread when we know more from Support!

Looking forward to the fixes - wasted so much time on this seemingly simple task :frowning:

Any reason why I can’t edit my posts anymore?

The title needs to be changed to “pipeline.git.base_revision is completely unreliable” - since now we have identified a whole slew of problems with it - but the software doesn’t let me do it. Thanks.

I don’t see a reason why you shouldn’t be able to edit still, but I will update the title for you!

1 Like

Thank you for editing this for me, @thekatertot

I think your discuss instance is configured to lock users out from editing their posts after a relatively short period of time. I still have the edit button available for the posts from today, but for all the posts from the previous days it’s gone.

Since there is a diff available for all changes, so no information is ever lost, it should be safe to give users some trust to be able to edit their own posts.

1 Like

Thanks! I’m looking at changing the settings now :slight_smile:

1 Like

Thank you for adjusting the config, @thekatertot - I can edit again - yay!

So I tried a whole bunch of other approaches.

CircleCI support suggested using https://api.github.com/repos/${CIRCLE_USERNAME}/${CIRCLE_REPO_NAME}/pulls/${CIRCLE_PR_NUMBER}") to get the correct data about the PR, in particular base.ref/base.sha:

            resp=$(curl -Ls https://api.github.com/repos/huggingface/transformers/pulls/${CIRCLE_PR_NUMBER})
            base_sha=$(jq -r .base.sha \<<< $resp)

but first I found a whole bunch of reports where base.sha is unreliable, and then when I still tried to test it out it sort of worked for a few tests and then sent me totally bogus information. It set base.sha to a commit from many days back and now it’s telling me that I made 92 commits in a PR. horrors. :frowning:

edit - later I discovered that someone force pushed into master and thus rewriting history - so that was not github’s problem. so I don’t know still then whether base.sha coming from the above github API is reliable or not.

I also tried to use commit count:

            resp=$(curl -Ls https://api.github.com/repos/huggingface/transformers/pulls/${CIRCLE_PR_NUMBER})  
            commits=$(jq -r .commits \<<< $resp)
            echo count backward $commits commits
            git --no-pager diff --name-only HEAD~$commits

but again with bogus info from github, I can’t even test how reliable HEAD~$commits is - somehow I feel there will be wrong info if someone rewrites PR’s history.

Probably the only solution that would possibly work is by checking out the real user branch in the forked repo and then doing all the figuring out there. But this only seems to work when a normal PR is submitted. If it’s done via github UI file edit, $CIRCLE_PR_NUMBER is not set!

Here is my emulate-user’s-clone-solution:

commands:
  skip-job-on-doc-only-changes:
    description: "Do not continue this job and exit with success for PRs with only doc changes"
    steps:
      - run:
          name: docs-only changes skip check
          command: |
            if test -n "$CIRCLE_PR_NUMBER"
            then
                echo $CIRCLE_PR_NUMBER
                resp=$(curl -Ls https://api.github.com/repos/huggingface/transformers/pulls/${CIRCLE_PR_NUMBER})
                user=$(jq -r .user.login \<<< $resp)   # PR creator username
                head_ref=$(jq -r .head.ref \<<< $resp) # PR user's branch name
                echo head_ref=$head_ref, user=$user
            fi
            
            if test -n "$user" && test -n "$head_ref"
            then
                git clone https://github.com/$user/transformers user-clone
                cd user-clone
                git checkout $head_ref
                fork_point_sha=$(git merge-base --fork-point master)
                cd -
            fi

            if test -n "$fork_point_sha" && test -n "$(git diff --name-only $fork_point_sha)"
            then
                git --no-pager diff --name-only $fork_point_sha
                if git diff --name-only $fork_point_sha | egrep -qv '\.(md|rst)$'
                then
                    echo "Non-docs were modified in this PR, proceeding normally"
                else
                    echo "Only docs were modified in this PR, quitting this job"
                    # enable skipping once we get this sorted out
                    # circleci step halt
                fi
            else
                echo "Not enough data to perform a skipping check - continuing the job"
            fi

It probably can be further simplified, but I wanted lots of debug info for now.

edit: I can see that this code will probably fail if the branch is on the original repo and not in the PR submitter’s forked repo, since that branch won’t exist in their repo. Grrr.

Update: communicating with CircleCI support it seems that for whatever reason they originally chose not to set $CIRCLE_PR_NUMBER for pull requests from the same repository (which is what happens if you create a PR by editing a file directly on github or intentionally branching off the original repo and not via a personal fork).

And chances are that this will get rectified at some point in the future, and then $CIRCLE_PR_NUMBER could be used in a reliable manner.

At the moment we are experimenting with the version of the command I posted at the end of the last comment. And CircleCI support offered another version which I will paste below - but I haven’t tested it:

      - run: |
          echo $CIRCLE_PR_NUMBER
          if [ ! -z "$CIRCLE_PR_NUMBER" ] ; then

            # Get base ref from Github API
            resp=$(curl -Ls "https://api.github.com/repos/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}/pulls/${CIRCLE_PR_NUMBER}")
            head_ref=$(jq -r .base.ref \<<< $resp)

            # Create new directory and clone again
            mkdir ../clone
            cd ../clone
            git clone "$CIRCLE_REPOSITORY_URL" .
            git fetch --force origin "pull/${CIRCLE_PR_NUMBER}/head:remotes/origin/pull/${CIRCLE_PR_NUMBER}"
            git fetch --all

            CHANGED=$(git --no-pager diff --name-only remotes/origin/${CIRCLE_BRANCH}..remotes/origin/${head_ref})
          else
            # No pull request data
            git fetch --all
            CHANGED=$(git --no-pager diff --name-only remotes/origin/${CIRCLE_BRANCH}..remotes/origin/master)
          fi

          echo "\n==============="
          echo $CHANGED

I am not sure that last bit (inside else) will do the right thing, because we want the changes of this PR and not changes between PR’s branch and master, which can be a totally different thing resulting in completely incorrect results.

1 Like

Hi @stas,

I just wanted to say that your attitude is highly commendable.

Sharing the Support ticket updates in this forum thread allows other users to benefit from the investigation and findings.

Thank you so much for taking the time to do this!

2 Likes

Well, after many experiments none of this working on CircleCI - too many edge cases where things fail. You might have a better luck on other platforms, but here at the moment it’s a no-go. The process of sorting this out interferes with many ongoing PRs and so this morning we removed this altogether :frowning:

When I started this process 2 weeks ago I didn’t think it’d be so difficult to get such a trivial thing as reliable commit range from CI, but well, it proved to be far from trivial and far from reliable.

So unfortunately doc-only PRs will continue burning resources for nothing.

:frowning:

1 Like

Just came in here with a very similar usecase. We’ve been using this script up until now https://github.com/labs42io/circleci-monorepo/blob/master/.circleci/circle_trigger.sh
to detect changes in certain folders only. But this requires defining a workflow per folder, and also doesn’t work on forks. This script might help your usecase.

But now we can’t really use that script since our pipelines don’t map 1-1 with folders, and it requires having too much config in the circleci config file, which we’re trying to move away from and have the config sit in other parts of the repo.

I just want to be able to get the HEAD commit of the branch that I’m trying to merge into (usually main, but not always), and compare the differences with the HEAD commit of my PR branch.

1 Like

Thank you for sharing your script, @jamesnicolas

I read through it and I’m not sure how it can help to reliably find a branching point. It looks like your main need is to find sha of the last build, whereas here we need sha at which branching happened. Could you please elaborate at how our use case could benefit from the serious work you have done while perfecting your script? Thank you!

Thanks all for sharing this discussion. :handshake:

1 Like

I’ve got a similar need. I need to get the target branch of the PR (from the same repo, no remote) so that I can rebase before running tests/builds.

This works fine when the PRs are posted from remote forks. But when the PR is from the same repo, the pipeline.git.base_revision gives error: no such commit when used with git branch --contains.

I’m glad to see that a support ticket was opened. However, it’s unfortunate that this does not appear to be resolved.