Friday, June 26, 2015

Java: Leak, or Memory Pressure?

I recently encountered a memory allocation issue that I hadn’t seen before in (many) years of programming in GC languages (primarily Lisp/Java). The initial indication of a problem occurred a few months ago when a long running process (fetch data from web service, store data in mysql db, analyze data, store results in mysql) ran into a GC overhead error:

java.lang.OutOfMemoryError: GC overhead limit exceeded
running with a -Xmx ~ 200M. 

I did some profiling, found a few leaks, but couldn’t make the problem disappear. I also looked at it again in Netbeans, profiling recording the allocation stack trace. Although I did find a leak caused by some of my Hibernate query constructs (repetitively generating a string rather than using a named query), a complete diagnosis evaded me.

This week, as part of optimizing the process for final hosting on AWS, I revisited the issue: tightening the memory allocation to more quickly induce failure, and watching the GC logs. The GC logs yielded this surprising result:

11665.495: [Full GC (Ergonomics) 77812K->73371K(79872K), 0.0841821 secs]
11665.602: [Full GC (Ergonomics) 77812K->73371K(79872K), 0.0847730 secs]
11665.708: [Full GC (Ergonomics) 77812K->73395K(79872K), 0.0770183 secs]
11665.800: [Full GC (Ergonomics) 77812K->73371K(79872K), 0.0828496 secs]
11665.905: [Full GC (Ergonomics) 77812K->73369K(79872K), 0.0871271 secs]
11666.019: [Full GC (Ergonomics) 77812K->73369K(79872K), 0.0756355 secs]

There was no memory leak (explaining, in part, why it was so hard to find…)! The process ran for ~ 2 hours before finally hitting

java.lang.OutOfMemoryError: GC overhead limit exceeded

This isn’t a surprise, given the log messages above: 6 FULL gc’s in ~ half a second — the process was spending a lot of time in the GC.

Poking around the web a bit I found a thread on StackOverflow suggesting use of 


With this setting, the process reached the final “tenured memory” value quickly and stayed there for a while. However, the process began running very slowly, and the GC still couldn’t keep up, eventually yielding a straight up, vanilla 

java.lang.OutOfMemoryError: Java heap space

Again, this happened even though the tenured heap memory was stable and the value hadn’t increased for hours.

So, what’s going on? Memory pressure. Memory pressure indicates the ability of the memory management infrastructure to keep up with the processes that depend on it. In this case the management infrastructure consists of GC, marking (+ potentially compacting) free areas, and putting them on the free list. The dependent process is the Java analysis program. 

For background, here’s some memory pressure links

I think we all realize that this would possible in the abstract, although I’ve never personally encountered it before in my code. 

At the moment my solution is “punt & restart the process every few hours”, so this post is a work in progress. Since the tenured memory value is stable for each run, but varies with the -Xmx value, I assume that not only are there another tunable values in the system that might impact this performance, but that I can also reduce memory pressure by more clever caching and retrieval methods.

Longer term the options break down into

Run Slower: reduce the frequency of the demands that the Java process is making on the memory subsystem, aka slow down the repetition rate for updates

Run Leaner: reduce the impact of the demands that the Java process is making on the memory subsystem: create less garbage by creating fewer transient objects, etc.


PS: I tried using -XX:+UseParallelGC, and the process crashed in < 2 minutes, which was spectacular

No comments: