How to configure CircleCI Continuous Delivery at scale?

I’m currently working on a project made up of many teams that commit to a single Git master branch daily. We’re building a distributed system with many AWS compute services (SOA). Each of these services has its own independent CircleCI workflow/pipeline. Within each pipeline, we can push to our many non-prod environments as well as production. We’ve gated each environment with a CircleCI approval step that requires manual sign-off. We also have automated tests that run upon pushing to any given environment.

Lately, we’ve seen some issues with our current pipeline structure. We often work on features that span multiple AWS compute services. Before pushing to production, we want to ensure all compute services are updated with their respective changes, and only then promote all feature-related work items to production together. This usually leads to pushing to production late in an iteration/sprint, once every service has been updated, rather than treating each service as an independent micro-service.

As one could imagine, this causes issues when these commits sit in our CircleCI pipelines during an iteration/sprint. If we have a commit that is not part of a given feature, the feature-supporting commits already in the pipeline prevent us from pushing that commit to prod in isolation. Since our system is still in its infancy, we’ve not yet incorporated feature flags or canary releases.

Open Questions

  1. How can we get more control over what we promote to higher environments?
  2. If we’d like to withhold commits related to a given feature from production while allowing testing to continue in lower environments, how can this be done in CircleCI? Is there a way to select and/or tag commits to promote to production? Would such a solution also ensure that no other commits are promoted to a higher environment?
  3. Could feature flags and treating each service as an independent micro-service be a saving grace here?
  4. Should we completely reimagine our CI/CD strategy altogether?
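
To make question 3 concrete, here is a minimal sketch of the kind of feature flag we have in mind, assuming flags are supplied per environment through an environment variable. The names (`FEATURE_FLAGS`, `is_enabled`, `new-checkout`) are hypothetical and not taken from our codebase; a real setup might use a flag service instead.

```python
import os


def enabled_flags(env=None):
    """Parse a comma-separated flag list, e.g. FEATURE_FLAGS="a,b".

    `env` defaults to the process environment; passing a dict makes
    the function easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("FEATURE_FLAGS", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def is_enabled(flag, env=None):
    """Return True if `flag` is switched on in the given environment."""
    return flag in enabled_flags(env)


def handle_request(env=None):
    """Hypothetical service entry point guarded by a flag."""
    if is_enabled("new-checkout", env):
        # Unfinished feature: deployed everywhere, active only where flagged.
        return "new code path"
    return "existing code path"
```

With something like this, a commit for a half-finished feature could reach production with the flag off there while lower environments run with `FEATURE_FLAGS=new-checkout`, so an unrelated commit would no longer be blocked behind it.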

Any thoughts or opinions are greatly appreciated.