Reset CallTraceStorage counters before reporting live objects. #1009

krk · 2024-09-25T14:08:13Z

Allows collecting allocation and live object traces at the same time.
Also fixes "sometimes missing" stack traces in allocations.

Related issues

#928

Motivation and context

With live object tracing enabled, allocation samples were intermittently recorded without stack traces.

How has this been tested?

Unit tests assert on the total size of live objects with varying GC probability.

For the stack traces, a JFR recording is created and its allocation samples are verified to have stack traces.

Sample test failure without the patch, observed intermittently:

INFO: Running AllocTests.livenessJfr...
WARNING: AllocTests.livenessJfr failed
java.lang.AssertionError: Stack trace missing for id 4939
        at test.alloc.AllocTests.livenessJfr(AllocTests.java:139)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at one.profiler.test.Runner.run(Runner.java:152)
        at one.profiler.test.Runner.run(Runner.java:168)
        at one.profiler.test.Runner.main(Runner.java:217)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

apangin · 2024-09-25T15:32:50Z

src/callTraceStorage.cpp

+        for (u32 slot = 0; slot < capacity; slot++) {
+            if (keys[slot] != 0) {
+                CallTraceSample& s = values[slot];
+                storeRelease(s.counter, 0);


Don't we need to reset s.samples too?
Check that flame graph is generated correctly both with and without --total option.

apangin · 2024-09-27T22:56:24Z

test/test/alloc/AllocTests.java

+    @Test(mainClass = RandomBlockRetainer.class, args = "0.7", inputs = { "true" })
+    @Test(mainClass = RandomBlockRetainer.class, args = "0.8", inputs = { "true" })
+    @Test(mainClass = RandomBlockRetainer.class, args = "0.9", inputs = { "true" })
+    @Test(mainClass = RandomBlockRetainer.class, args = "1.0", inputs = { "true" })


Let's keep the number of test cases reasonably small so that they don't run for ages.
For the future, it may be useful to have different test levels, where we'll run only a subset of tests on each commit.

apangin · 2024-09-27T22:58:56Z

test/test/alloc/AllocTests.java

+        System.out.println("totalBytes: " + totalBytes + " keepChance: " + keepChance + " lowerLimit: " + lowerLimit
+                + " upperLimit: " + upperLimit);
+
+        assert lowerLimit <= totalBytes;


Assert.isGreater/isLess can be helpful: it prints actual values when the assertion fails (so there is no need to have println statement above).

apangin · 2024-09-27T23:01:28Z

test/test/alloc/AllocTests.java

+        final boolean live = Boolean.parseBoolean(p.inputs()[0]);
+        final double keepChance = live ? Double.parseDouble(p.test().args()) : 1.0;
+
+        Output out = p.profile("--alloc 1k --total -o collapsed" + (live ? " --live" : ""));


--live is supported on JDK 11+ only. The test should not run or should fail on JDK 8.

apangin · 2024-10-11T16:59:38Z

src/callTraceStorage.cpp

-                storeRelease(s.samples, 0);
+                if (resetSamples) {
+                    storeRelease(s.samples, 0);
+                }


When resetting counters, we always need to reset both samples and counter together. Otherwise, one of --total or non-total report will be broken. Also, to calculate metrics like average object size we need to have samples and counter in sync.
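The point about keeping the two fields in sync can be illustrated with a minimal sketch. `SampleRecord` and the helper functions below are hypothetical stand-ins written for this illustration, not the actual `CallTraceSample` from async-profiler; only the field names `samples` and `counter` come from the diff above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the per-trace record in the diff above.
struct SampleRecord {
    std::atomic<uint64_t> samples{0};  // number of recorded events
    std::atomic<uint64_t> counter{0};  // accumulated total, e.g. allocated bytes

    void add(uint64_t bytes) {
        samples.fetch_add(1, std::memory_order_relaxed);
        counter.fetch_add(bytes, std::memory_order_relaxed);
    }

    // Reset both fields together so they keep describing the same set of events.
    void reset() {
        samples.store(0, std::memory_order_release);
        counter.store(0, std::memory_order_release);
    }

    // Average size per event; returns 0 when nothing has been recorded.
    uint64_t average() const {
        uint64_t n = samples.load(std::memory_order_acquire);
        return n == 0 ? 0 : counter.load(std::memory_order_acquire) / n;
    }
};

// Average obtained when only `counter` is cleared: the stale sample count skews it.
inline uint64_t averageAfterPartialReset() {
    SampleRecord r;
    r.add(1024);
    r.add(1024);
    r.counter.store(0, std::memory_order_release);  // samples stays at 2
    r.add(4096);
    return r.average();  // 4096 / 3 with integer division
}

// Average obtained when both fields are cleared together.
inline uint64_t averageAfterFullReset() {
    SampleRecord r;
    r.add(1024);
    r.add(1024);
    r.reset();
    r.add(4096);
    return r.average();  // 4096 / 1
}

inline void demoReset() {
    assert(averageAfterPartialReset() == 1365);
    assert(averageAfterFullReset() == 4096);
}
```

In this sketch a report based on `samples` would still count the two discarded events after the partial reset, while a report based on `counter` would not, which is the kind of mismatch between the --total and non-total reports the comment warns about.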

apangin · 2024-10-11T17:03:33Z

src/objectSampler.cpp

@@ -100,6 +100,10 @@ class LiveRefs {
        jvmtiEnv* jvmti = VM::jvmti();
        Profiler* profiler = Profiler::instance();

+        // Set CallTraceStorage counters to zero, so only live objects
+        // will be reported to collapsed and flamegraph outputs.
+        profiler->resetCounters();


The idea is to reset counters only for non-JFR recording.
In JFR recording, each sample is recorded individually, so accumulated counters are not actually used.

apangin · 2024-10-14T12:11:02Z

test/one/profiler/test/Assert.java

@@ -12,6 +12,9 @@ public class Assert {
    private static final Logger log = Logger.getLogger(Assert.class.getName());

    public static void isGreater(double value, double threshold) {
+        if (value <= threshold) {
+            throw new AssertionError("Expected " + value + " > " + threshold);
+        }


Redundant lines after merge?

krk · 2024-10-14

Good catch, yes!

apangin · 2024-10-15T10:25:29Z

test/test/alloc/RandomBlockRetainer.java

+        }
+
+        // Allow test framework to find the pid.
+        Thread.sleep(1000);


This is a sure way to catch intermittent test failures. Is it possible to avoid timed sleeping? If you want to profile allocations from the beginning, maybe it's better to attach profiler as an agent.

apangin · 2024-10-15T13:07:21Z

test/one/profiler/test/Assert.java

+        if (value < threshold) {
+            throw new AssertionError("Expected " + value + " >= " + threshold);
+        }
+    }


For assertions consistency, let's add this after #1027.

krk · 2024-10-15

These are used in the current commit, happy to replace them with #1027 when it is merged.

apangin · 2024-10-15T13:16:36Z

src/objectSampler.cpp

+        if(!profiler->jfrActive()) {
+            profiler->resetCounters();
+        }


To reduce amount of changes and to expose only one function, I'd encapsulate _jfr.active() check under Profiler::resetCounters (or however it will be named). Also because jfrActive() is inherently prone to races: it can be called only in situations when JFR recording state is known not to change.
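A minimal sketch of the suggested encapsulation, assuming hypothetical `RecorderSketch` and `ProfilerSketch` classes that stand in for the real recorder and `Profiler`. The name `tryResetCounters` matches the later diff in this thread; the `bool` return value, the single `_counter` field, and everything else here are illustrative assumptions. The sketch is single-threaded and does not model the race mentioned above; it only shows that callers no longer need a public `jfrActive()` accessor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the JFR recorder; it only tracks whether it is running.
class RecorderSketch {
    bool _active = false;

  public:
    void start() { _active = true; }
    void stop() { _active = false; }
    bool active() const { return _active; }
};

// Hypothetical stand-in for the profiler; a single counter replaces the real storage.
class ProfilerSketch {
    RecorderSketch _jfr;
    uint64_t _counter = 0;

  public:
    void startJfr() { _jfr.start(); }
    void stopJfr() { _jfr.stop(); }
    void record(uint64_t bytes) { _counter += bytes; }
    uint64_t counter() const { return _counter; }

    // The only function exposed to callers: the recorder check stays private.
    // Returns true if the counter was actually reset.
    bool tryResetCounters() {
        if (_jfr.active()) {
            return false;  // counters are left untouched while a recording runs
        }
        _counter = 0;
        return true;
    }
};

inline void demoTryReset() {
    ProfilerSketch p;
    p.record(100);

    p.startJfr();
    assert(!p.tryResetCounters());  // recording active: nothing is reset
    assert(p.counter() == 100);

    p.stopJfr();
    assert(p.tryResetCounters());   // no recording: counter is cleared
    assert(p.counter() == 0);
}
```

With this shape the call site in the live-object reporting code reduces to a single call, and the decision about when the recorder state may be inspected stays in one place.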

apangin · 2024-10-15T21:37:17Z

src/profiler.cpp

@@ -773,6 +773,12 @@ void Profiler::recordEventOnly(EventType event_type, Event* event) {
    _locks[lock_index].unlock();
 }

+void Profiler::tryResetCounters() {
+    if(!_jfr.active()) {


apangin · 2024-10-15T21:39:31Z

src/objectSampler.cpp

@@ -100,6 +100,9 @@ class LiveRefs {
        jvmtiEnv* jvmti = VM::jvmti();
        Profiler* profiler = Profiler::instance();

+        // Reset counters only for non-JFR recording.


A slightly expanded comment would be helpful: why we reset counters here and why only for non-JFR recording.

krk force-pushed the lives branch 2 times, most recently from c7a8195 to 0383d9b on September 25, 2024 14:15

apangin reviewed Sep 25, 2024

apangin reviewed Sep 27, 2024

krk force-pushed the lives branch from 0383d9b to 12ba4d5 on October 7, 2024 14:18

Reset CallTraceStorage counters before reporting live objects.

e84d12a

Allows collecting allocation and live object traces at the same time. Also fixes "sometimes missing" stack traces in allocations.

krk force-pushed the lives branch from 12ba4d5 to e84d12a on October 7, 2024 15:16

krk added 2 commits October 11, 2024 15:38

Merge remote-tracking branch 'upstream/master' into lives

eced671

Do not reset samples while jfr is active.

017f171

This keeps the allocation stack traces in the jfr file while not inflating collapsed and flamegraph samples.

apangin reviewed Oct 11, 2024

address comments

29c3c02

apangin reviewed Oct 14, 2024

address comments

0d1f56c

apangin reviewed Oct 15, 2024

krk added 2 commits October 15, 2024 14:14

address comments

b0f7a13

address comments

b907d69

apangin reviewed Oct 15, 2024

krk and others added 2 commits October 16, 2024 10:17

spaces and comments

f0a6417

Removed extra whitespace changes

1825ce0

Signed-off-by: Andrei Pangin <[email protected]>

apangin merged commit da3f5f3 into async-profiler:master Oct 16, 2024
1 check passed

apangin mentioned this pull request Oct 20, 2024

Why Cannot StackTrace Be Found Using stackTraceId After LiveObject Is Enabled? #928

Closed
