How to log kill signals?


#1

On Glowstone, tests are being killed for no clear reason. (I thought we were running out of memory, but lowering the JVM heap sizes doesn’t seem to help.) How do I log what process is sending a kill signal? I’ve tried installing auditd, but it fails to start. Also, is there a way to graph memory usage over the course of the test?


#2

Have you ruled out Out-Of-Memory kill signals? If not, take a look at dmesg - if the OS has to kill a process during a memory emergency, it will log it there. Grep for the alarmingly titled “sacrifice child” if you have a lot of log output.
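For example, something like this over an SSH rerun of a failed build (the exact message wording varies between kernel versions, so treat the patterns as a starting point):

    # Pull OOM-killer lines out of the kernel log
    dmesg | grep -iE "out of memory|killed process|sacrifice child"

    # Or keep some surrounding context for each hit
    dmesg | grep -i -B 10 -A 2 "sacrifice child"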


#4

Found OOM kills:

circleci@02e7580ee912:~$ dmesg | grep -B20 sacrifice
[97587.420888] [50871] 168970 50871     7021     1639      19       3        0             0 run-mailcap
[97587.420890] [50873] 168970 50873    24703     4523      51       3        0             0 display-im6
[97587.420896] [51370] 168970 51370     1086      415       7       3        0             0 xdg-open
[97587.420898] [51505] 168970 51505     7021     1642      18       3        0             0 run-mailcap
[97587.420899] [51509] 168970 51509    24720     4631      50       3        0             0 display-im6
[97587.420903] [51975] 168970 51975     1086      412       8       3        0             0 xdg-open
[97587.420907] [52123] 168970 52123     7021     1665      19       3        0             0 run-mailcap
[97587.420908] [52152] 168970 52152    24703     4543      49       3        0             0 display-im6
[97587.420912] [52456] 168970 52456     1086      417       8       3        0             0 xdg-open
[97587.420914] [52589] 168970 52589     7021     1632      17       3        0             0 run-mailcap
[97587.420916] [52605] 168970 52605    24703     4513      49       3        0             0 display-im6
[97587.420922] [53035] 168970 53035     1086      418       8       3        0             0 xdg-open
[97587.420924] [53200] 168970 53200     7021     1641      17       3        0             0 run-mailcap
[97587.420926] [53202] 168970 53202    24702     4507      47       3        0             0 display-im6
[97587.420928] [53333] 168970 53333     1086      407       7       3        0             0 xdg-open
[97587.420930] [53470] 168970 53470     7021     1645      18       3        0             0 run-mailcap
[97587.420932] [53472] 168970 53472    24703     4501      51       3        0             0 display-im6
[97587.420937] [53911] 168970 53911     1086      389       8       3        0             0 xdg-open
[97587.420939] [54068] 168970 54068     7021     1643      18       3        0             0 run-mailcap
[97587.420941] [54070] 168970 54070    24702     4586      47       4        0             0 display-im6
[97587.420944] Memory cgroup out of memory: Kill process 9743 (Xvfb) score 319 or sacrifice child
--
[105544.706776]  [<ffffffff811f431d>] mem_cgroup_oom_synchronize+0x30d/0x330
[105544.706779]  [<ffffffff811ef660>] ? mem_cgroup_events.constprop.50+0x30/0x30
[105544.706781]  [<ffffffff8118a9f5>] pagefault_out_of_memory+0x35/0x90
[105544.706785]  [<ffffffff8107b711>] mm_fault_error+0x67/0x140
[105544.706789]  [<ffffffff81068fc8>] __do_page_fault+0x3f8/0x430
[105544.706790]  [<ffffffff81069022>] do_page_fault+0x22/0x30
[105544.706796]  [<ffffffff818127d8>] page_fault+0x28/0x30
[105544.706797] Task in /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39 killed as a result of limit of /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024
[105544.706803] memory: usage 4194304kB, limit 4194304kB, failcnt 13828
[105544.706804] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.706805] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.706806] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024: cache:12KB rss:8612KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:8624KB inactive_file:0KB active_file:0KB unevictable:0KB
[105544.706837] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39: cache:460KB rss:4185220KB rss_huge:20480KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:32KB active_anon:4185368KB inactive_file:80KB active_file:20KB unevictable:0KB
[105544.706863] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[105544.706967] [86575]     0 86575     7980     5249      20       5        0         -1000 circleci
[105544.706973] [86783] 168970 86783     1073      193       8       3        0             0 sh
[105544.706976] [86887] 168970 86887     8444     5052      22       5        0             0 picard
[105544.706981] [87639] 168970 87639  5313275   575930    1316      12        0             0 java
[105544.706984] [88120] 168970 88120     1073      178       8       3        0             0 sh
[105544.706986] [88121] 168970 88121  5151400   475435    1081      12        0             0 java
[105544.706988] Memory cgroup out of memory: Kill process 87639 (java) score 550 or sacrifice child
--
[105544.723138]  [<ffffffff811f35ff>] mem_cgroup_out_of_memory+0x26f/0x2c0
[105544.723140]  [<ffffffff811f431d>] mem_cgroup_oom_synchronize+0x30d/0x330
[105544.723142]  [<ffffffff811ef660>] ? mem_cgroup_events.constprop.50+0x30/0x30
[105544.723144]  [<ffffffff8118a9f5>] pagefault_out_of_memory+0x35/0x90
[105544.723147]  [<ffffffff8107b711>] mm_fault_error+0x67/0x140
[105544.723149]  [<ffffffff81068fc8>] __do_page_fault+0x3f8/0x430
[105544.723150]  [<ffffffff81069022>] do_page_fault+0x22/0x30
[105544.723153]  [<ffffffff818127d8>] page_fault+0x28/0x30
[105544.723154] Task in /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39 killed as a result of limit of /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024
[105544.723158] memory: usage 4194304kB, limit 4194304kB, failcnt 13927
[105544.723159] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.723160] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[105544.723161] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024: cache:12KB rss:8612KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:8624KB inactive_file:0KB active_file:0KB unevictable:0KB
[105544.723182] Memory cgroup stats for /docker/6041cd4c8ff74a5b6aca5dd65c59ce35cc02295ce47235490646176aeab6d024/02e7580ee912bc3bcc6333e934b12a1c3baec861953bf87c536d4c2dda4c0d39: cache:452KB rss:4185228KB rss_huge:20480KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:32KB active_anon:4185404KB inactive_file:68KB active_file:24KB unevictable:0KB
[105544.723202] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[105544.723241] [86575]     0 86575     7980     5249      20       5        0         -1000 circleci
[105544.723245] [86783] 168970 86783     1073      193       8       3        0             0 sh
[105544.723247] [86887] 168970 86887     8444     5052      22       5        0             0 picard
[105544.723249] [87639] 168970 87639  5313275   575933    1316      12        0             0 java
[105544.723251] [88121] 168970 88121  5151400   475483    1082      12        0             0 java
[105544.723253] Memory cgroup out of memory: Kill process 87639 (java) score 550 or sacrifice child

#5

Lovely. OK, so you need to use less memory :stuck_out_tongue:

May we see the run commands you use to start your Java program(s)?

I don’t know the “proper” way to do this, but you could run a background shell script that collects the output of free in a loop, sleeping for a second between samples - something like the sketch below.
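A minimal sketch of what I mean (untested; /tmp/memory.log is just a hypothetical path - store it as a build artifact if you want to graph it afterwards):

    # Run in the background for the whole job: sample memory usage once a second
    while true; do
        date >> /tmp/memory.log
        free -m >> /tmp/memory.log
        sleep 1
    done &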

If you get really stuck, you could switch to the machine executor rather than docker. It’ll give you 8G of RAM to play with, rather than the standard 4G (or, if you are on a paid plan, you can ask Circle to raise your RAM limit even with docker). Note, though, that machine may become chargeable in the future, so it’s worth treating it as a short-term fix (unless you need it anyway for other reasons).


#6

            - run: mvn -T 1C -B dependency:go-offline

            - save_cache:
                paths:
                    - ~/.m2
                key: glowstone-{{ checksum "pom.xml" }}

            - run: mvn -T 1C -B package

Java is only invoked by Maven and its plugins; our POM is at https://github.com/GlowstoneMC/Glowstone/blob/dev/pom.xml


#7

OK. I was looking for memory min/max settings - do you rely on Java defaults, or are they specified in a file somewhere? I would go with this avenue of research first - I think you need to change these settings.


#8

We use -Xms1024m -Xmx1024m in _JAVA_OPTIONS and MAVEN_OPTS. But I’ve also tried -Xms512m -Xmx512m -Xss8m -XX:MaxMetaspaceSize=256m, so that heap + stack + metaspace adds up to only 3GiB for our 2 concurrent JVM instances (the main one and the JUnit slave), which have up to 64 threads between them (with test parallelization turned off, lest there actually be more than 64 threads) - and that doesn’t help either. Rerunning the job with SSH and watching top shows that the parent JVM is still allocating over 2GiB.
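Concretely, the second set of flags I tried amounts to this (a sketch - in the real build these are environment variables in the CircleCI config rather than inline exports):

    # Limit heap, thread stacks and metaspace for Maven and every forked JVM
    export _JAVA_OPTIONS="-Xms512m -Xmx512m -Xss8m -XX:MaxMetaspaceSize=256m"
    export MAVEN_OPTS="-Xms512m -Xmx512m -Xss8m -XX:MaxMetaspaceSize=256m"
    mvn -T 1C -B package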

I suspect one of our transitive dependencies isn’t cgroup-aware and is using direct ByteBuffers or JNI code that calls malloc (or something else that Java counts as neither stack nor heap nor metaspace), but I have no idea how to track down which one (unless Java can somehow instrument JNI’s calls to malloc, brk, sbrk and mmap in order to log them with their Java stack traces).
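One thing I may try next (untested on our build; the flags below are standard HotSpot options, nothing Glowstone-specific) is the JVM’s Native Memory Tracking. It won’t see raw mallocs made by JNI libraries, but it does break down the JVM’s own native usage - thread stacks, metaspace, code cache, GC, and internal allocations such as direct buffers - which should at least rule some categories in or out:

    # Append NMT to the existing JVM options (summary mode keeps the overhead modest)
    export _JAVA_OPTIONS="$_JAVA_OPTIONS -XX:NativeMemoryTracking=summary"

    # While the build runs (over an SSH rerun), list the JVMs and dump a summary
    # of where each one's native memory is going
    jps -l
    jcmd <pid> VM.native_memory summary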


#9

OK, great info. If you want to get unstuck for the short term, try the machine executor to see if 8G helps.

However, my minimal Java knowledge is at this point exhausted. :smile: In my view, this is not a question specific to Circle, so I suggest asking a question on Stack Overflow, where a Maven/Java person can make suggestions. At least you know what the problem is!

