Docker dependencies loading undesired previous state

I have a CircleCI 2.0 setup with 3 docker dependencies: postgres, redis and elasticsearch.

Their initial loading is currently up to 12min…

The ElasticSearch container is finding former index files to load.

[2018-08-01T06:03:30,343][INFO ][o.e.n.Node               ] [] initializing ...
...
[2018-08-01T06:03:37,834][INFO ][o.e.n.Node               ] [PXl-T9L] started
[2018-08-01T06:03:37,836][INFO ][o.e.g.GatewayService     ] [PXl-T9L] recovered [0] indices into cluster_state
[2018-08-01T06:05:24,345][INFO ][o.e.c.m.MetaDataCreateIndexService] [PXl-T9L] [XXXXXXXXXX_index_v4] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2018-08-01T06:06:06,869][INFO ][o.e.c.m.MetaDataCreateIndexService] [PXl-T9L] [ZZZZZZZZZZZZZZZZZZ_index_0_2] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
...
...
[2018-08-01T06:12:40,523][INFO ][o.e.c.m.MetaDataMappingService] [PXl-T9L] [YYYYYYYYYYYYYYYYYYYYYYYY_0_20180801161240252/jNMx_koYRxeQ8xYBIIYxlA] create_mapping [YYYYYYYYYYYYYYYYYY]

Job was canceled

Given their position at the top of the config, I’d have assumed those containers would start with a default state, not with any previous files.

Ultimately, I don’t want any files that I didn’t explicitly cache/restore to be present in the next build.

If anyone could help, I’d love to know what I missed.

Since they are all logged as taking 12:07, I assume it is 12:07 total, and they are waiting until they are all started. If that is the case, can you get the times for each one? I am guessing from your description that Postgres and Redis start quickly, and it is Elasticsearch that takes 12+ minutes. Is that right?

I’d assume as well that you are using the default command to start this image. Have a look at the configuration reference in the docs: you can add a command key to an image entry to provide a custom start command. I would guess that you could turn off the initial indexing there.
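Something along these lines, perhaps (an untested sketch; the -E setting shown is only an example of passing Elasticsearch 5.x flags through the command key, not a fix in itself):

    # Untested sketch: `command` overrides the container's default start command,
    # and Elasticsearch 5.x accepts settings as -E flags. The setting shown here
    # is only illustrative.
    - image: elasticsearch:5.1.2-alpine
      command: [elasticsearch, "-Ecluster.name=ci-test"]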

Thanks for the reply.

Good idea on trying to stop Elasticsearch from loading previous indexes, but it feels slightly wrong: no indexes should be there in the first place!

I had a quick search through the Elasticsearch settings and couldn’t find a way to prevent it from loading the indexes - it seems to be an essential part of its “initial recovery” feature.

I then looked more closely at the Elasticsearch Docker image and saw that it declares a VOLUME for its data directory.

So my thinking now is to force the volume to an empty/new one!

Here is the head of our .circleci/config.yml.

defaults: &defaults
  docker:
    - image: circleci/ruby:2.4.2-node
      environment:
        ...
    - image: circleci/postgres:9.6-alpine-postgis
      environment:
        ...
    - image: elasticsearch:5.1.2-alpine
    - image: redis:latest

Is there a way to specify a blank volume for an image?

Something like:

    - image: elasticsearch:5.1.2-alpine
      volumes:
        - /tmp/surely-inexistant:/usr/share/elasticsearch/data

One question remains, though: why would the volume be cached from a previous build?!

Thanks for the help

You’re not using a CircleCI image here, so if the container contains index data, it will also be in the image on Docker Hub. Isn’t this an upstream issue for that project?

Ah, are you running several jobs in a workflow? I would not have assumed that your secondary containers would cache data from one job to the next in a single workflow, but perhaps you can shed some light on how you are running this job?

If you mean previous runs of a single job not in a workflow, then yes, secondary containers should absolutely not be caching anything. They are meant to be fresh every time.

Can you dump the indexes to see what is in them? That might give you a clue as to when they are accumulated.
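Something like this as an extra step might show it (a rough sketch, assuming Elasticsearch is listening on the default localhost:9200):

    # Rough sketch: list all indices with doc counts and sizes, to see where
    # they come from. Assumes Elasticsearch on the default localhost:9200.
    - run:
        name: Dump Elasticsearch indices
        command: curl -s 'http://localhost:9200/_cat/indices?v'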

Thanks again, your reasoning process is of great help!

You’re not using a CircleCI image here, so if the container contains index data, it will also be in the image on Docker Hub. Isn’t this an upstream issue for that project?

Sorry, I didn’t specify. Yes, the container is not from CircleCI, but it’s a well-known public Docker image, and I can confirm there are no index files contained in that image. The files appear only in the CircleCI environment.

are you running several jobs in a workflow?

Yes, 3 actually: a build and 2 deploys that depend on the build. The currently delayed job is the build (first) one.
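Roughly this shape in the workflows section (job names are placeholders):

    # Roughly our workflow (job names are placeholders)
    workflows:
      version: 2
      build_and_deploy:
        jobs:
          - build
          - deploy_staging:
              requires:
                - build
          - deploy_production:
              requires:
                - build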

I’ll dig into the source of the index files and potentially clean them at the end of the build, ready for the next one, if I can’t find another way :confused:
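If it comes to that, it would probably just be a final step along these lines (a sketch only, assuming Elasticsearch on the default localhost:9200):

    # Sketch of a cleanup step: delete every index at the end of the build so
    # nothing could carry over. Assumes Elasticsearch on localhost:9200.
    - run:
        name: Clean Elasticsearch indices
        command: curl -s -XDELETE 'http://localhost:9200/_all'
        when: always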


Hmm, intriguing. I wonder if secondary containers are preserved across the lifetime of the workflow? I don’t use secondary containers, but I would have thought everything would be stopped and started with each job.

Still, I can understand the benefit of such an arrangement - it would allow the preservation of server state between related jobs.

Yeah, that’s a good idea.

Or if you want to make sure ES is completely fresh each time, you could always drop it as a secondary container and just install it in the primary/build container. It’s a bit messier, but getting it from apt-get will guarantee it is as clean as possible :smiley_cat:.
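Something roughly like this in the build job, perhaps (an untested sketch based on Elastic’s documented 5.x apt repository; the exact packages and start command may need adjusting):

    # Untested sketch: install and start Elasticsearch 5.x inside the primary
    # container via Elastic's apt repository. A JRE is needed as well; the
    # package names/URLs follow Elastic's 5.x docs but are unverified here.
    - run:
        name: Install Elasticsearch in the primary container
        command: |
          wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
          echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-5.x.list
          sudo apt-get update && sudo apt-get install -y default-jre elasticsearch
          sudo service elasticsearch start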

They all say 12:07 because they were running for the duration of the build. If one were to die early, the timestamp would reflect it. It’s a shortcoming of the UI; what is happening with those containers is clearer while the build is still running.

We always destroy containers after they run. There is no data being pulled into the image that isn’t committed into the image itself.

Wow. True. :smirk:

Thanks.
