Hi friends! I’m building link checking into my automated tests on Circle. We have a lot of links to drupal.org (references to modules), and that site ends up blocking the tests with a 403 response.
I’m wondering if others have encountered and resolved similar issues with sites that block bot crawling.
That’s probably a bot blocker on the Drupal domain. In general, you should try to avoid running automation against other people’s servers: it offloads costs onto the third party, and it is quite normal for them to try to block it.
You could:
- Omit testing these links completely, by adding a blocklist in your link checker. This may be appropriate if the links to this domain do not change very often, and can be checked manually once.
- If you really need to check these links automatically, then add delays in your link checker when hitting external domains (e.g. 1 second between each request). That will make your system look much less like a crawler.
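If you are writing the checker yourself, both ideas could look roughly like the sketch below. This is only an illustration, not how any particular link checker works: the names `SKIP_DOMAINS`, `EXTERNAL_DELAY_SECONDS`, `is_skipped`, `fetch_status` and `check_links` are made up for this example, and it assumes that a HEAD request is enough to tell whether a link is alive (some servers reject HEAD, in which case you would need GET). The two options are shown side by side so you can pick one: a domain in the skip list is never requested, and everything else gets a pause between requests to the same host.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlparse

# Hypothetical settings for this sketch; adjust to your own project.
SKIP_DOMAINS = {"drupal.org"}      # domains that are never requested
EXTERNAL_DELAY_SECONDS = 1.0       # minimum gap between hits on one host


def is_skipped(url, skip_domains=SKIP_DOMAINS):
    """True if the URL's host is a skipped domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in skip_domains)


def fetch_status(url):
    """Return the HTTP status code for a HEAD request to the URL.

    Network-level failures (DNS errors, timeouts) are not handled here
    and will raise urllib.error.URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions; report the code instead.
        return error.code


def check_links(urls, skip_domains=SKIP_DOMAINS, delay=EXTERNAL_DELAY_SECONDS,
                fetch=fetch_status, sleep=time.sleep):
    """Check each URL, skipping listed domains and pausing between
    consecutive requests to the same host."""
    results = {}
    last_hit = {}  # host -> monotonic timestamp of the previous request
    for url in urls:
        if is_skipped(url, skip_domains):
            results[url] = "skipped"
            continue
        host = (urlparse(url).hostname or "").lower()
        if host in last_hit:
            wait = delay - (time.monotonic() - last_hit[host])
            if wait > 0:
                sleep(wait)
        results[url] = fetch(url)
        last_hit[host] = time.monotonic()
    return results
```

The `fetch` and `sleep` parameters are there so that the checker's own tests can pass in fakes and never touch the network. In a real run you would call `check_links(list_of_urls)` and fail the build on any value that is neither `"skipped"` nor a status code you accept.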