[Product Launch] Chunk Tasks - fixing flaky tests

Hello CircleCI community!

Now in beta: a new agentic capability that identifies and provides fixes for flaky tests in your CircleCI projects, helping you ship quickly with confidence by reducing time spent debugging intermittent failures.

Getting Started

Prerequisites:

  1. An Anthropic or OpenAI API key to enable the agent to process and generate flaky test fixes using your existing model provider. Your source code is neither stored nor used for training purposes by CircleCI.

    For OpenAI, make sure your org has gpt-5 model access and your organization is verified. You can read more at OpenAI Organization Verification Guide. If organization verification isn’t possible, read our FAQ section What if I can’t get my organization verified when using OpenAI?.

  2. Test results stored on CircleCI: To enable CircleCI to detect test flakiness, you need to store your test results using the store_test_results step in your CircleCI YML configuration file. Learn more about collecting test data here.
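
As a minimal sketch of this prerequisite, the job below writes JUnit XML and uploads it with store_test_results. The job name, image, pytest command, and test-results directory are assumptions for illustration; substitute your own test runner and paths.

```yaml
version: 2.1

jobs:
  test:  # hypothetical job name
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt pytest
      - run:
          name: Run tests
          # Assumes pytest; any runner that emits JUnit XML works
          command: |
            mkdir -p test-results
            pytest --junitxml=test-results/junit.xml
      - store_test_results:
          # Directory containing the JUnit XML written above
          path: test-results

workflows:
  test:
    jobs:
      - test
```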

Setup:

  1. Navigate to the CircleCI web app → Your organization → Chunk Tasks (from the left-hand navigation) → Get Started

  2. Install the CircleCI GitHub App in your desired GitHub Organization so the agent can create Pull Requests with recommended fixes.

  3. Add your Anthropic or OpenAI API key.

  4. Select the followed project where the task should be assigned.

  5. Configure your preferred settings:

    • Run frequency (daily/weekly/monthly)

      • Daily: Runs Sunday-Thursday at 22:00 UTC
      • Weekly: Runs on Sunday at 22:00 UTC
      • Monthly: Runs on the first day of the month at 22:00 UTC
    • Maximum tests to fix per run

    • Number of solutions to try per test

    • Number of validation runs per test

    • Maximum number of concurrent open PRs

  6. The agent creates an environment to run your tests by inferring setup from your repository. If you’d like more control, you can customize it with a .circleci/cci-agent-setup.yml file on your default branch. Learn more in our FAQ: Unable to run verification tests.

How It Works

  1. When running, the agent identifies flaky tests based on the tests that are marked flaky in CircleCI’s Test Insights.
  2. The agent generates potential fixes for the flakiness, up to the Number of solutions to try per test setting that was configured during setup.
  3. The agent validates each solution through multiple test runs to check that the flakiness has been removed, based on the Number of validation runs per test setting that was configured during setup.
  4. The agent opens a pull request with the proposed fix.

The Agent tasks list, under CircleCI web app → Your organization → Chunk Tasks, shows one row for each test being analyzed.

If the agent does not run into an error while analyzing and attempting to fix a given test’s flakiness, it opens a PR with a proposed fix.

Each agent task will have two tabs:

  • Code Diff: the proposed code changes

  • Logs: the agent’s reasoning and analysis

If the agent lacks confidence in the fixes or runs into an error during execution, a PR is not created, but logs and analysis remain available for review.

Known limitations

Editing agent task configurations

Currently, there is no way to directly edit task configuration settings including post-run commands once an agent task is created. The workaround for now is to delete the Chunk task and recreate it:

    • Navigate to Organization settings > Chunk settings > Delete the current agent task.
    • Create a new agent task > customize your settings

OpenAI Zero Data Retention Compatibility Issue
If you’re using OpenAI as your model provider and see all Chunk tasks marked as “Not fixed” with “Could not diagnose a fix” messages and empty Logs tabs, the issue may be that your OpenAI account has Zero Data Retention enabled. Chunk does not yet support OpenAI accounts with Zero Data Retention.

Ad-hoc tasks

In the CircleCI web app, navigate to Organization settings > Chunk settings > “…” > Submit ad-hoc task. From a branch that already exists, you can ask Chunk to accomplish any task you’d like (e.g., “remove the outdated call-to-action from my web app’s home page”). It will push its changes to the branch that you select. For these tasks, Chunk runs in the environment that you define in your cci-agent-setup.yml file (read more about Chunk’s environment here).

Join the Beta

Join the waitlist to get access

Share your feedback by commenting below or email sebastian@circleci.com


No extra cost during beta. Uses compute credits and your AI provider tokens. This will be a paid feature after beta.

Chunk Tasks - Latest updates (09/05/2025)

We’ve been actively improving the agent based on your feedback. Here are the latest enhancements:

Better User Experience

  • We’ve improved agent logs within Agent tasks to be more human-friendly and conversation-like. Logs now clearly distinguish between Assistant (instructions being executed by the agent) and User (output from the agent), making it easier to follow the agent’s workflow.

  • In the Agent task Logs tab, we’ve added an Expand/Collapse All toggle to streamline troubleshooting. This allows you to quickly expand all logs and use Cmd+F (or Ctrl+F) to search for specific words or commands.

  • We’ve enhanced PR bodies and run summaries with human-readable descriptions for better clarity.

  • Agent-generated branches now use circleci/fix-flaky-test-<id> instead of the previous fix-flaky-test-<id> format, making them easier to identify and filter in your repository.
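
For example, the circleci/ prefix can be matched with a branch filter. The sketch below is one possible use, assuming a hypothetical extra-verification job that should run only on agent-generated branches; the job and workflow names are illustrative and not part of Chunk itself.

```yaml
version: 2.1

jobs:
  extra-verification:  # hypothetical job
    docker:
      - image: cimg/base:stable
    steps:
      - checkout
      - run: echo "Running on agent branch ${CIRCLE_BRANCH}"

workflows:
  agent-branches:
    jobs:
      - extra-verification:
          filters:
            branches:
              # CircleCI regex filters must match the whole branch name
              only: /circleci\/fix-flaky-test-.*/
```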

Behind the Scenes

  • Enhanced Execution Environment: The agent now uses a machine executor for better performance and reliability when running tests.

Chunk - FAQs

Does CircleCI use my data to train the models?

Your source code is neither stored nor used for training purposes by CircleCI.

What data does Chunk access?

Chunk accesses historical build data and repository contents to identify and fix flaky tests. This is the same information that CircleCI already has access to through your existing CircleCI configuration.

Will my test results be shared with other customers?

Your usage data, including test results, will not be used for any other customer. Each customer’s data remains isolated and is only used to support their own Chunk tasks.


How long are agent logs stored?

We store agent logs for 90 days. This is a fixed retention period that applies to all organizations, regardless of your plan’s standard data retention policy. After 90 days, logs are automatically deleted to keep your workspace at optimal performance.


OpenAI organization verification required. Please verify your organization at…

When encountering the message:
OpenAI organization verification required. Please verify your organization at https://platform.openai.com/settings/organization/general and see our community forum for more debugging help
inside an agent task, it indicates that your OpenAI organization verification is still pending.

To fix this: In OpenAI Platform navigate to General > Organization settings and click ‘Verify Organization’ to follow the necessary steps to have your organization verified.

Additional help: OpenAI Organization Verification Guide


What if I can’t get my organization verified when using OpenAI?

If organization verification isn’t possible, you can bypass this requirement by adding an environment variable:

  1. Go to Organization Settings > Contexts > circleci-agents

  2. Add new Environment Variable:

    1. Name: CCI_AGENT_OPENAI_MODEL

    2. Value: gpt-5-nano


Invalid OpenAI model specified. Please check the model name and ensure it is available for your account.

When encountering the message:
Invalid OpenAI model specified. Please check the model name and ensure it is available for your account.
you need to make sure your organization has gpt-5 model access.

To verify this: In OpenAI Platform

  1. Switch to the project you want to check (top-left dropdown).

  2. Go to Settings → Limits in the left-hand menu.

    • This page shows the models and rate limits for your project.

    • If gpt-5 is listed, you have access. If not, that project doesn’t.

  3. You can also check in the Playground:

    • Select the project (top-left).

    • Open the Model dropdown. Only available models will appear.


Action required - agent execution error

When encountering the message:
Action required - agent execution error
The agent ran into an error while executing this task. See our community forum for how to solve this error.
email us at sebastian@circleci.com and we’ll help you figure it out.


Unable to run verification tests

Chunk runs in a Linux Machine VM with basic software installed by default. To verify that a proposed fix resolves flakiness, it re-runs the affected test several times. To do this, the agent may install additional software needed to set up the test environment, using clues from your .circleci/config.yml to determine how to run the tests.

You can view these attempts in the CircleCI web app: open Chunk Tasks → select a task → Logs → Expand All, then search for “run the command for each attempt.” This takes you to the sections where the agent is trying to run the tests.

Improving verification success

Create an “agent environment” CircleCI YML file. This file lets you copy the environment-setup parts of your existing CircleCI config into a dedicated file for Chunk. Name the file cci-agent-setup.yml and ensure that it is present in your .circleci directory and on the default branch.

Chunk supports all standard CircleCI configuration options. This includes executors, resource classes, caching, contexts, environment variables, service containers, orbs, and everything else you’d use in a normal CircleCI pipeline. If it works in your .circleci/config.yml, it works in cci-agent-setup.yml. For a complete reference of available configuration options, see the CircleCI Configuration Reference.

Example cci-agent-setup.yml files:

Basic Python setup
version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup
jobs:
  cci-agent-setup:
    docker:
      - image: cimg/python:3.12
      - image: cimg/postgres:15.3
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: |
            pip install -r requirements.txt

With Caching and Contexts
version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup:
          context: 
            - my-team-context  # Includes any secrets/env vars from this context
jobs:
  cci-agent-setup:
    docker:
      - image: cimg/node:18.0
    steps:
      - checkout
      - restore_cache:
          keys:
            - v1-dependencies-{{ checksum "package-lock.json" }}
      - run:
          name: Install dependencies
          command: npm install
      - save_cache:
          paths:
            - node_modules
          key: v1-dependencies-{{ checksum "package-lock.json" }}

With multiple services
version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup
jobs:
  cci-agent-setup:
    docker:
      - image: cimg/ruby:3.2
      - image: cimg/postgres:15.3
        environment:
          POSTGRES_USER: circleci
          POSTGRES_DB: test_db
      - image: redis:7.0
    steps:
      - checkout
      - run:
          name: Wait for DB
          command: dockerize -wait tcp://localhost:5432 -timeout 1m
      - run:
          name: Install dependencies
          command: bundle install
      - run:
          name: Setup database
          command: bundle exec rake db:setup

With custom resource class and machine executor
version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup
jobs:
  cci-agent-setup:
    machine:
      image: ubuntu-2204:2024.01.2
    resource_class: large
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: |
            sudo apt-get update
            sudo apt-get install -y build-essential

Environment Variables & Contexts

Project environment variables: Chunk automatically has access to any environment variables you’ve configured at the project level in CircleCI. You don’t need to recreate or reference these; they’re already available.

Contexts: If you’re using CircleCI contexts to manage secrets or environment variables, include the context in your cci-agent-setup job (as shown in the caching example above). Chunk will have access to all variables from that context, so there is no need to recreate them manually.

Testing Your Environment Setup

To build & iterate on Chunk’s environment, navigate to Organization Settings → Chunk Tasks → Identify desired Agent Task → Select [ … ] → Select [ Chunk Environment ]. This page lets you run the contents of your cci-agent-setup.yml file on a specific branch and immediately see the results from those ad-hoc tasks. Use the Custom button to submit a task to Chunk and see the results.

Merge the cci-agent-setup.yml file to your default branch when the results on the environment setup page are satisfactory.

Additional Guidance for Chunk

To improve Chunk’s ability to run tests & produce fixes that are aligned with stylistic/architectural preferences, many users also include a markdown file (claude.md or agents.md) in the root of their repo with instructions for running tests. Chunk should pick this up automatically.


Changing Chunk’s model provider

Currently, Chunk can only have one model provider installed at a time. To change your model provider:

  1. Navigate to Organization settings > Chunk tasks from the left navigation

  2. Click Edit in Contexts

  3. Select circleci-agents, scroll down to Environment variables and delete your current model provider API key

  4. Click on Add environment variable and input your new API key information:

    • Environment variable name: Enter OPENAI_API_KEY or ANTHROPIC_API_KEY (depending on your model provider)

    • Value: Enter your model provider API key

    • Click Add environment variable


Task Summary or Pull Request Body Too Long or Poorly Formatted

If you’re noticing that your Chunk Task responses appear incomplete or poorly formatted, this may indicate that your API key needs to be configured for a more capable model.

Identifying the Issue
When viewing a Chunk Task through the CircleCI UI → Chunk Tasks, you might observe these indicators of suboptimal model performance:

  • Inconsistent formatting: The task body lacks properly bolded section headers for Run Summary, Root Cause, Proposed Fix, and Verification

  • Interactive prompts: Chunk Task body ends with open-ended questions like:

“Would you like me to implement the robust wait pattern in the test now, and add a small helper for future tests? If yes, I’ll apply the changes and run the targeted tests.”

To guide you, here are some examples of how a Chunk Task body should look:

Root Cause
These formatting and content issues typically occur when Chunk is using a less powerful language model to analyze and propose fixes for flaky tests.

Resolution
To fix this, ensure that your organization has access to gpt-5 and is properly verified.
For details on verification requirements, see “OpenAI organization verification required. Please verify your organization at…” in our FAQs.

Important: If your team previously overrode the model used by Chunk, you’ll need to remove that configuration to prevent using a lower-performance model:

  1. Navigate to CircleCI web app > Organization Settings > Contexts > circleci-agents

  2. Remove the CCI_AGENT_OPENAI_MODEL environment variable from Environment variables section


Start Task button disabled

We’ve noticed some users experiencing an issue where the Start Task button remains disabled in Chunk Tasks > Assign new task, even after filling in all required inputs.

Resolution

  1. Look for a callout in the top navigation bar that says “Your GitHub identity is not verified” and click Authorize.

  2. Select “Authorize CircleCI App” when asked.
  3. Try assigning a new Chunk Task again.

This should resolve the issue and allow you to proceed with assigning a Chunk task to a project.

Chunk Tasks - Latest Updates (10/16/2025)

Better User Experience

  • Chunk now uses failed job step context to improve fix accuracy.

  • We’ve fixed an issue where the Chunk Tasks nav bar item wasn’t highlighting when selected, improving navigation clarity.

  • We’ve resolved a bug where pasting an API key would automatically close the setup modal, ensuring a smoother setup experience.

  • Chunk’s commits now include verified signatures. All commits created by Chunk are now properly signed and authored by circleci-app[bot]. This resolves issues where unsigned commits would require special handling or higher privileges to merge.

  • You can now guide Chunk with a custom instruction file. Create a fix-flaky-test.md file in your .circleci/ directory to provide specific guidance about how you want the agent to approach fixing flaky tests in your project. This gives you fine-grained control over the agent’s behavior and lets you encode your team’s testing best practices directly into the fix generation process. Example .circleci/fix-flaky-test.md file:


## Command Restrictions

- You MUST NOT use the `sleep()` command or `setTimeout()` for delays in any scripts
- You MUST NOT use `eval()` as it poses security risks
- Avoid using shell wildcards in destructive operations (e.g., `rm -rf *`)

## Code Style Preferences

- Prefer functional components over class components in React
- Use TypeScript `type` definitions instead of `interface` (this project enforces this via ESLint)
- Favor explicit error handling over try-catch-all patterns
- Use async/await syntax over Promise chains for readability

## Security Considerations

- Always flag use of `dangerouslySetInnerHTML` in React components
- Highlight any potential SQL injection vulnerabilities
- Point out hardcoded credentials or API keys
- Flag any use of `eval()` or `Function()` constructors

## Documentation Standards
- Complex algorithms MUST include explanatory comments

Enhanced Functionality

  • Chunk is now using the latest model from Anthropic by default: Claude Sonnet 4.5.

  • The cci-agent-setup.yml configuration now works seamlessly with orbs and user-specified resource classes, giving you more flexibility in how you set up your agent environment.
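
A sketch of what that might look like, assuming the circleci/node orb and a Docker resource class that your plan supports. The pinned orb version and the image tag are assumptions; use whichever versions you already rely on in your main config.

```yaml
version: 2.1

orbs:
  node: circleci/node@5.2.0  # version is an assumption

workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup

jobs:
  cci-agent-setup:
    docker:
      - image: cimg/node:20.11
    resource_class: large  # user-specified resource class
    steps:
      - checkout
      # Orb command that installs and caches npm packages
      - node/install-packages
```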

Behind the Scenes

  • We’ve added file protection safeguards to ensure the agent respects and excludes sensitive files from commits.

  • We’ve restored full detail in execution logs when using OpenAI as model provider. A regression that caused Chunk execution logs to show significantly less output has been resolved. When in CCI web app > Chunk Tasks > Chunk Task, Logs now provide complete visibility into the agent’s actions, making troubleshooting much easier when reviewing task history.