On Glowstone, tests are being killed for no clear reason. (I thought we were running out of memory, but lowering the JVM heap sizes doesn’t seem to help.) How do I log what process is sending a kill signal? I’ve tried installing auditd, but it fails to start. Also, is there a way to graph memory usage over the course of the test?
Have you ruled out out-of-memory (OOM) kills? If not, take a look at dmesg - if the kernel has to terminate a process in a memory emergency, it will be logged there. Grep for the alarmingly titled “sacrifice child” if you have a lot of log output.
Found OOM kills:
circleci@02e7580ee912:~$ dmesg | grep -B20 sacrifice
[97587.420888] [50871] 168970 50871 7021 1639 19 3 0 0 run-mailcap
[97587.420890] [50873] 168970 50873 24703 4523 51 3 0 0 display-im6
[97587.420896] [51370] 168970 51370 1086 415 7 3 0 0 xdg-open
[97587.420898] [51505] 168970 51505 7021 1642 18 3 0 0 run-mailcap
[97587.420899] [51509] 168970 51509 24720 4631 50 3 0 0 display-im6
[97587.420903] [51975] 168970 51975 1086 412 8 3 0 0 xdg-open
[97587.420907] [52123] 168970 52123 7021 1665 19 3 0 0 run-mailcap
[97587.420908] [52152] 168970 52152 24703 4543 49 3 0 0 display-im6
[97587.420912] [52456] 168970 52456 1086 417 8 3 0 0 xdg-open
[97587.420914] [52589] 168970 52589 7021 1632 17 3 0 0 run-mailcap
[97587.420916] [52605] 168970 52605 24703 4513 49 3 0 0 display-im6
[97587.420922] [53035] 168970 53035 1086 418 8 3 0 0 xdg-open
[97587.420924] [53200] 168970 53200 7021 1641 17 3 0 0 run-mailcap
[97587.420926] [53202] 168970 53202 24702 4507 47 3 0 0 display-im6
[97587.420928] [53333] 168970 53333 1086 407 7 3 0 0 xdg-open
[97587.420930] [53470] 168970 53470 7021 1645 18 3 0 0 run-mailcap
[97587.420932] [53472] 168970 53472 24703 4501 51 3 0 0 display-im6
[97587.420937] [53911] 168970 53911 1086 389 8 3 0 0 xdg-open
[97587.420939] [54068] 168970 54068 7021 1643 18 3 0 0 run-mailcap
[97587.420941] [54070] 168970 54070 24702 4586 47 4 0 0 display-im6
[97587.420944] Memory cgroup out of memory: Kill process 9743 (Xvfb) score 319 or sacrifice child
--
[105544.706776] [<ffffffff811f431d>] mem_cgroup_oom_synchronize+0x30d/0x330
[105544.706779] [<ffffffff811ef660>] ? mem_cgroup_events.constprop.50+0x30/0x30
[105544.706781] [<ffffffff8118a9f5>] pagefault_out_of_memory+0x35/0x90
[105544.706785] [<ffffffff8107b711>] mm_fault_error+0x67/0x140
[105544.706789] [<ffffffff81068fc8>] __do_page_fault+0x3f8/0x430
[105544.706790] [<ffffffff81069022>] do_page_fault+0x22/0x30
[105544.706796] [<ffffffff818127d8>] page_fault+0x28/0x30
[105544.706797] Task in /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39 killed as a result of limit of /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024
[105544.706803] memory: usage 4194304kB, limit 4194304kB, failcnt 13828
[105544.706804] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.706805] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.706806] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024: cache:12KB rss:8612KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:8624KB inactive_file:0KB active_file:0KB unevictable:0KB
[105544.706837] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39: cache:460KB rss:4185220KB rss_huge:20480KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:32KB active_anon:4185368KB inactive_file:80KB active_file:20KB unevictable:0KB
[105544.706863] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[105544.706967] [86575] 0 86575 7980 5249 20 5 0 -1000 circleci
[105544.706973] [86783] 168970 86783 1073 193 8 3 0 0 sh
[105544.706976] [86887] 168970 86887 8444 5052 22 5 0 0 picard
[105544.706981] [87639] 168970 87639 5313275 575930 1316 12 0 0 java
[105544.706984] [88120] 168970 88120 1073 178 8 3 0 0 sh
[105544.706986] [88121] 168970 88121 5151400 475435 1081 12 0 0 java
[105544.706988] Memory cgroup out of memory: Kill process 87639 (java) score 550 or sacrifice child
--
[105544.723138] [<ffffffff811f35ff>] mem_cgroup_out_of_memory+0x26f/0x2c0
[105544.723140] [<ffffffff811f431d>] mem_cgroup_oom_synchronize+0x30d/0x330
[105544.723142] [<ffffffff811ef660>] ? mem_cgroup_events.constprop.50+0x30/0x30
[105544.723144] [<ffffffff8118a9f5>] pagefault_out_of_memory+0x35/0x90
[105544.723147] [<ffffffff8107b711>] mm_fault_error+0x67/0x140
[105544.723149] [<ffffffff81068fc8>] __do_page_fault+0x3f8/0x430
[105544.723150] [<ffffffff81069022>] do_page_fault+0x22/0x30
[105544.723153] [<ffffffff818127d8>] page_fault+0x28/0x30
[105544.723154] Task in /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39 killed as a result of limit of /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024
[105544.723158] memory: usage 4194304kB, limit 4194304kB, failcnt 13927
[105544.723159] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.723160] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.723161] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024: cache:12KB rss:8612KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:8624KB inactive_file:0KB active_file:0KB unevictable:0KB
[105544.723182] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39: cache:452KB rss:4185228KB rss_huge:20480KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:32KB active_anon:4185404KB inactive_file:68KB active_file:24KB unevictable:0KB
[105544.723202] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[105544.723241] [86575] 0 86575 7980 5249 20 5 0 -1000 circleci
[105544.723245] [86783] 168970 86783 1073 193 8 3 0 0 sh
[105544.723247] [86887] 168970 86887 8444 5052 22 5 0 0 picard
[105544.723249] [87639] 168970 87639 5313275 575933 1316 12 0 0 java
[105544.723251] [88121] 168970 88121 5151400 475483 1082 12 0 0 java
[105544.723253] Memory cgroup out of memory: Kill process 87639 (java) score 550 or sacrifice child
Lovely. OK, so you need to use less memory.
May we see the run commands you use to start your Java program(s)?
I don’t know the “proper” way to do this, but you could write a background shell script that collects free output in a loop, sleeping a second between samples, and appends it to a file you can plot after the build.
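A very rough sketch (the log path and one-second interval are arbitrary; also note that inside a Docker container free typically reports host-wide memory rather than the cgroup’s usage, which on cgroup v1 lives in /sys/fs/cgroup/memory/memory.usage_in_bytes):

# crude memory sampler; start it in the background before the tests begin
while true; do
  echo "$(date +%s) $(free -m | awk '/^Mem:/ {print $3}')"  # epoch seconds, used MiB
  sleep 1
done >> /tmp/memory.log &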
If you get really stuck, you could switch to a machine executor rather than docker. It’ll give you 8G of RAM to play with, rather than the standard 4G (or, if you are on a paid plan, you can ask Circle to raise your RAM limit even with docker). Note though that machine may be chargeable in the future, so it’s worth treating it as a short-term fix (unless you need it anyway for other reasons).
- run: mvn -T 1C -B dependency:go-offline
- save_cache:
    paths:
      - ~/.m2
    key: glowstone-{{ checksum "pom.xml" }}
- run: mvn -T 1C -B package
Java is only invoked by Maven and its plugins; our POM is at https://github.com/GlowstoneMC/Glowstone/blob/dev/pom.xml
OK. I was looking for memory min/max settings - do you rely on the Java defaults, or are they specified in a file somewhere? I would pursue this avenue of research first - I think you need to change these settings.
We use -Xms1024m -Xmx1024m in _JAVA_OPTIONS and MAVEN_OPTS. But I’ve also tried -Xms512m -Xmx512m -Xss8m -XX:MaxMetaspaceSize=256m, so that heap + stack + metaspace adds up to only 3GiB for our 2 concurrent JVM instances (the main one and the JUnit slave) with up to 64 threads between them (with test parallelization turned off, lest there actually be more than 64 threads), and it doesn’t help. Running the job with SSH and watching top indicates that the parent JVM is still allocating over 2GiB.
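For reference, the settings end up being applied roughly like this (a paraphrase rather than our literal config; _JAVA_OPTIONS is picked up by every HotSpot JVM that starts, including the forked test JVM, while MAVEN_OPTS only affects the Maven launcher itself):

export _JAVA_OPTIONS="-Xms512m -Xmx512m -Xss8m -XX:MaxMetaspaceSize=256m"
export MAVEN_OPTS="-Xms512m -Xmx512m -Xss8m -XX:MaxMetaspaceSize=256m"
mvn -T 1C -B package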
I suspect one of our libraries’ own dependencies must not be cgroup-aware and must be using direct ByteBuffers or JNI code that calls malloc (or something else that Java counts as neither heap, stack, nor metaspace), but I have no idea how to track down which one (unless Java can somehow instrument JNI’s calls to malloc, brk, sbrk and mmap and log them with their Java stack traces).
OK, great info. If you want to get unstuck for the short term, try the machine executor to see if 8G helps.
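One other thing that might be worth a look is HotSpot’s Native Memory Tracking, which reports what the JVM itself allocates outside the Java heap (thread stacks, metaspace, GC structures, code cache and so on), though it won’t see raw malloc calls made inside JNI libraries. A rough sketch, assuming a HotSpot JDK with jcmd on the PATH; the 120-second wait is an arbitrary guess at when the forked test JVM is up:

export _JAVA_OPTIONS="$_JAVA_OPTIONS -XX:NativeMemoryTracking=summary"
mvn -T 1C -B package &        # run the build in the background
sleep 120                     # give the forked test JVM time to start
for pid in $(pgrep -f java); do
  echo "=== PID $pid ==="
  jcmd "$pid" VM.native_memory summary
done
wait                          # let the build finish before the step exits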
However, my minimal Java knowledge is at this point exhausted. In my view, this is not a question specific to Circle, so I suggest asking a question on Stack Overflow, where a Maven/Java person can make suggestions. At least you know what the problem is!