Skip to content

Reset CallTraceStorage counters before reporting live objects. #1009

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 16, 2024

Conversation

krk
Copy link
Contributor

@krk krk commented Sep 25, 2024

Allows collecting allocation and live object traces at the same time.
Also fixes "sometimes missing" stack traces in allocations.

Related issues

#928

Motivation and context

Enabling live object tracing missed stack traces "sometimes".

How has this been tested?

Unit tests asserting on live object's size with varying GC probability.

For the stack traces, JFR recording is created and allocation samples are verified to have stack traces in the JFR.

Sample test assertion without the patch, that is observed "sometimes":

INFO: Running AllocTests.livenessJfr...
WARNING: AllocTests.livenessJfr failed
java.lang.AssertionError: Stack trace missing for id 4939
        at test.alloc.AllocTests.livenessJfr(AllocTests.java:139)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at one.profiler.test.Runner.run(Runner.java:152)
        at one.profiler.test.Runner.run(Runner.java:168)
        at one.profiler.test.Runner.main(Runner.java:217)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@krk krk force-pushed the lives branch 2 times, most recently from c7a8195 to 0383d9b Compare September 25, 2024 14:15
for (u32 slot = 0; slot < capacity; slot++) {
if (keys[slot] != 0) {
CallTraceSample& s = values[slot];
storeRelease(s.counter, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to reset s.samples too?
Check that flame graph is generated correctly both with and without --total option.

@Test(mainClass = RandomBlockRetainer.class, args = "0.7", inputs = { "true" })
@Test(mainClass = RandomBlockRetainer.class, args = "0.8", inputs = { "true" })
@Test(mainClass = RandomBlockRetainer.class, args = "0.9", inputs = { "true" })
@Test(mainClass = RandomBlockRetainer.class, args = "1.0", inputs = { "true" })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the number of test cases reasonably small so that they don't run for ages.
For the future, it may be useful to have different test levels, where we'll run only a subset of tests on each commit.

System.out.println("totalBytes: " + totalBytes + " keepChance: " + keepChance + " lowerLimit: " + lowerLimit
+ " upperLimit: " + upperLimit);

assert lowerLimit <= totalBytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.isGreater/isLess can be helpful: it prints actual values when the assertion fails (so there is no need to have println statement above).

final boolean live = Boolean.parseBoolean(p.inputs()[0]);
final double keepChance = live ? Double.parseDouble(p.test().args()) : 1.0;

Output out = p.profile("--alloc 1k --total -o collapsed" + (live ? " --live" : ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--live is supported on JDK 11+ only. The does should not run or should fail on JDK 8.

Allows collecting allocation and live object traces at the same time.
Also fixes "sometimes missing" stack traces in allocations.
krk added 2 commits October 11, 2024 15:38
This keeps the allocation stack traces in the jfr file while not
inflating collapsed and flamegraph samples.
storeRelease(s.samples, 0);
if (resetSamples) {
storeRelease(s.samples, 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When resetting counters, we always need to reset both samples and counter together. Otherwise, one of --total or non-total report will be broken. Also, to calculate metrics like average object size we need to have samples and counter in sync.

@@ -100,6 +100,10 @@ class LiveRefs {
jvmtiEnv* jvmti = VM::jvmti();
Profiler* profiler = Profiler::instance();

// Set CallTraceStorage counters to zero, so only live objects
// will be reported to collapsed and flamegraph outputs.
profiler->resetCounters();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to reset counters only for non-JFR recording.
In JFR recording, each sample is recorded individually, so accumulated counters are not actually used.

@@ -12,6 +12,9 @@ public class Assert {
private static final Logger log = Logger.getLogger(Assert.class.getName());

public static void isGreater(double value, double threshold) {
if (value <= threshold) {
throw new AssertionError("Expected " + value + " > " + threshold);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant lines after merge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, yes!

}

// Allow test framework to find the pid.
Thread.sleep(1000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a sure way to catch intermittent test failures. Is it possible to avoid timed sleeping? If you want to profile allocations from the beginning, maybe it's better to attach profiler as an agent.

if (value < threshold) {
throw new AssertionError("Expected " + value + " >= " + threshold);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For assertions consistency, let's add this after #1027.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are used in the current commit, happy to replace them with #1027 when it is merged.

Comment on lines 104 to 106
if(!profiler->jfrActive()) {
profiler->resetCounters();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce amount of changes and to expose only one function, I'd encapsulate _jfr.active() check under Profiler::resetCounters (or however it will be named). Also because jfrActive() is inherently prone to races: it can be called only in situations when JFR recording state is known not to change.

src/profiler.cpp Outdated
@@ -773,6 +773,12 @@ void Profiler::recordEventOnly(EventType event_type, Event* event) {
_locks[lock_index].unlock();
}

void Profiler::tryResetCounters() {
if(!_jfr.active()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if_(

@@ -100,6 +100,9 @@ class LiveRefs {
jvmtiEnv* jvmti = VM::jvmti();
Profiler* profiler = Profiler::instance();

// Reset counters only for non-JFR recording.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly expanded comment would be helpful: why we reset counters here and why only for non-JFR recording.

@apangin apangin merged commit da3f5f3 into async-profiler:master Oct 16, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants