One of our CircleCI builds runs tests against a dataset roughly 200 GB in size. Currently that data is downloaded from a third-party site on every build, but the download tends to make builds time out.
What strategy would best suit providing on-disk access to this very large dataset? Should we use the CircleCI cache, or perhaps pre-bake the data into a Docker image?
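For context, the caching approach we were considering looks roughly like the sketch below (job and step names are placeholders; `download_dataset.sh` stands in for our actual fetch script, and `/data` for wherever the dataset lands):

```yaml
jobs:
  test:
    steps:
      - checkout
      # Restore the dataset from a previous build, if a matching cache exists
      - restore_cache:
          keys:
            - dataset-v1
      # Hypothetical script that downloads the 200 GB dataset,
      # skipping the download when /data is already populated
      - run: ./download_dataset.sh
      # Persist the dataset for subsequent builds
      - save_cache:
          key: dataset-v1
          paths:
            - /data
```

Our concern is whether the cache is even intended for data at this scale, or whether save/restore at 200 GB would be as slow as the download itself.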