Hi, from both the time and disk space it takes to run save_cache and restore_cache on a single large file, it seems like a compression / decompression step is being run. Can you tell me what compression algorithm or command is being used? I’m asking because too much time seems to be spent on compression / decompression for a particular CI workload, and I’m wondering whether that could be addressed by using a different utility, or by giving users the option to skip compression and handle it themselves, for example with a pre-compression step like the sketch below. Thank you.
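To make that last option concrete, here is roughly what I have in mind. This is only a sketch: it assumes GNU tar and zstd are available in the build image, and the cache key, checksum file, and node_modules path are placeholders for whatever the workload actually caches. (The built-in step would presumably still run its own compression over the already-compressed archive, which is part of why I’m asking for a way to turn it off.)

```yaml
steps:
  # Restore side: fetch the pre-compressed archive and unpack it ourselves.
  - restore_cache:
      keys:
        - v1-deps-{{ checksum "package-lock.json" }}
  - run:
      name: Decompress dependencies on a cache hit
      command: |
        if [ -f /tmp/deps.tar.zst ]; then
          tar -I 'zstd -d -T0' -xf /tmp/deps.tar.zst
        fi

  # ... install / build steps go here ...

  # Save side: compress the large directory ourselves so save_cache stores a single file.
  - run:
      name: Compress dependencies with zstd
      command: tar -I 'zstd -T0 -3' -cf /tmp/deps.tar.zst node_modules
  - save_cache:
      key: v1-deps-{{ checksum "package-lock.json" }}
      paths:
        - /tmp/deps.tar.zst
```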
In the past, I came across a reference indicating that the standard cache feature uses a Maven repo as its store, but I cannot find the link at the moment. Using such a store provides a lot of functionality, but at a cost in performance. So while I am not sure which compression algorithm is used, the overall performance may be affected more by the repo store than by the compression itself.
Rather than extending the feature set of the built-in cache, which would then limit the support team’s ability to provide support, there are other ways to handle caching once you know what your workflow needs:
- One user at the end of last year reported a 2-2.5x performance increase by switching to zstd and an S3 store (see the first sketch below).
- My environment is built around self-hosted runners, which means I have persistent storage, so jobs keep persistent cache areas across runs (see the second sketch below). The result is that things are very fast, at the cost of additional operational work.
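For the zstd + S3 route, the shape is roughly the following. This is only a sketch: the bucket name, key prefix, and node_modules path are made up, and it assumes the aws CLI, tar, and zstd are installed on the executor with credentials already configured.

```yaml
steps:
  - run:
      name: Restore cache from S3 (bucket and key are placeholders)
      command: |
        KEY="deps-$(sha256sum package-lock.json | cut -c1-16)"
        # A cache miss should not fail the job, hence the `|| true`.
        aws s3 cp "s3://my-ci-cache/${KEY}.tar.zst" /tmp/deps.tar.zst || true
        if [ -f /tmp/deps.tar.zst ]; then
          tar -I 'zstd -d -T0' -xf /tmp/deps.tar.zst
        fi

  # ... install / build steps go here ...

  - run:
      name: Save cache to S3
      command: |
        KEY="deps-$(sha256sum package-lock.json | cut -c1-16)"
        tar -I 'zstd -T0 -3' -cf /tmp/deps.tar.zst node_modules
        aws s3 cp /tmp/deps.tar.zst "s3://my-ci-cache/${KEY}.tar.zst"
```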
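And for the self-hosted setup, the caching step collapses to almost nothing, because the directory lives on the runner’s own disk and survives between jobs. Again only a sketch: /var/cache/ci is just an example of a host path that outlives individual jobs, and it assumes the fresh checkout does not already contain node_modules.

```yaml
steps:
  - run:
      name: Reuse a cache directory that persists on the runner's disk
      command: |
        # The runner's filesystem is not discarded between jobs, so this
        # directory is already warm after the first run.
        mkdir -p /var/cache/ci/node_modules
        ln -sfn /var/cache/ci/node_modules node_modules
```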