Caching pg_dump files by migrations



I have three repos that I run migrations on before doing downstream integration tests. We use Postgres as our database, and the migrated repos use Django and Flask. In one job, we run the migrations in each repo, call pg_dump to write the data to files on an attached workspace, then call persist_to_workspace to save those files. This process takes around 10 minutes. In parallel downstream jobs, we call attach_workspace to access those files and restore the database before running the integration tests.

I’d like to come up with a strategy to cache those pg_dump files based on the last migration in each repo: if no new migrations have been introduced, use the cached files; otherwise, run all of the migrations and cache the new pg_dump files.

Has anyone done something similar?


You could use the caching feature, with a hash of the migration file as the cache key. If the migration file changes, the old cache becomes invalid and the migrations get re-run.
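As a rough sketch, the config could key the cache on a checksum file generated from the migrations earlier in the job. Everything here is hypothetical (job names, paths, the `run_migrations.sh` script, the `/tmp/migrations.checksum` file), not a drop-in config:

```yaml
# Hypothetical sketch: restore/save the pg_dump output keyed on a
# checksum of the migration files, written to /tmp/migrations.checksum
# by an earlier step in the same job.
steps:
  - restore_cache:
      keys:
        - v1-pgdump-{{ checksum "/tmp/migrations.checksum" }}
  - run:
      name: Run migrations and dump DB (only on cache miss)
      command: |
        if [ ! -f /tmp/workspace/dump.sql ]; then
          ./run_migrations.sh            # hypothetical script
          pg_dump -f /tmp/workspace/dump.sql "$DATABASE_URL"
        fi
  - save_cache:
      key: v1-pgdump-{{ checksum "/tmp/migrations.checksum" }}
      paths:
        - /tmp/workspace/dump.sql
```

On a cache hit, `restore_cache` repopulates the dump file and the expensive migration step is skipped; on a miss, the dump is rebuilt and saved under the new key.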


Unfortunately there isn’t a single migration file. Each ‘app’ in the repo has a ‘migrations’ directory where migrations for that app live.


That’s probably OK - just write something that creates an overall hash of all of them and use that as your cache key.
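A minimal sketch of that "overall hash" step, assuming Django-style `*/migrations/*.py` files (adjust the `find` pattern to your repo layout):

```shell
# Hash every migration file (paths + contents), then hash the combined
# listing to produce one checksum usable as a cache key.
find . -path '*/migrations/*' -type f -name '*.py' \
  | sort \
  | xargs sha256sum \
  | sha256sum \
  | cut -d ' ' -f 1 \
  > /tmp/migrations.checksum
```

Sorting the file list first keeps the checksum stable across runs; any added, removed, or edited migration in any app changes it.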