[WIP]KAFKA-19080 The constraint on segment.ms is not enforced at topic level #19371

Open: wants to merge 47 commits into base: trunk

Collaborator

@m1a2st m1a2st commented Apr 4, 2025

The root cause is that we forgot to require
TopicConfig.SEGMENT_BYTES_CONFIG to be at least 1024 * 1024. This PR
adds that constraint and a test for it.

@github-actions github-actions bot added triage PRs from the community tools storage Pull requests that target the storage module small Small PRs labels Apr 4, 2025
@m1a2st m1a2st changed the title KAFKA-19080 The constraint on segment.ms is not enforced at topic level [WIP]KAFKA-19080 The constraint on segment.ms is not enforced at topic level Apr 4, 2025
@github-actions github-actions bot added the core Kafka Broker label Apr 4, 2025
@m1a2st
Collaborator Author

m1a2st commented Apr 4, 2025

Hello @junrao, @chia7712
I have a question about this problem.
In KAFKA-16368, only the ServerLogConfigs minimum for log.segment.bytes was raised to 1 MB. However, the minimum for the topic-level segment.bytes (TopicConfig, defined in LogConfig) remains 14 bytes. Many tests rely on this small segment.bytes value to generate a large number of segments.

I think there are two possible approaches to resolve this:

  1. Add atLeast(1024 * 1024) validation in LogConfig, which would require fixing around 400 tests.
  2. Add a validator in LogManager#updateTopicConfig to validate the change request at runtime.
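
As a rough illustration of the second option, the following self-contained sketch checks a topic config change request at runtime. It does not use Kafka's classes: `TopicConfigUpdateValidator`, `validate`, and `MIN_SEGMENT_BYTES` are hypothetical names, and the 1 MiB floor is assumed from the 1024 * 1024 value mentioned in the PR description.

```java
import java.util.Map;

// Hypothetical stand-in for a check that LogManager#updateTopicConfig could run.
// This is an assumption-based sketch, not Kafka's actual implementation.
final class TopicConfigUpdateValidator {
    static final String SEGMENT_BYTES = "segment.bytes";
    // Assumed production floor: 1024 * 1024 bytes (1 MiB).
    static final int MIN_SEGMENT_BYTES = 1024 * 1024;

    private TopicConfigUpdateValidator() { }

    // Rejects a requested override whose segment.bytes is below the floor.
    // Requests that do not touch segment.bytes pass through unchecked.
    static void validate(Map<String, String> requested) {
        String raw = requested.get(SEGMENT_BYTES);
        if (raw == null) {
            return;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < MIN_SEGMENT_BYTES) {
            throw new IllegalArgumentException(
                SEGMENT_BYTES + " must be at least " + MIN_SEGMENT_BYTES + " but was " + value);
        }
    }
}
```

Because a check like this would sit only on the update path, tests that construct a LogConfig directly could presumably keep using small segments, which is the trade-off against the first option.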

@junrao
Contributor

junrao commented Apr 9, 2025

@m1a2st : Perhaps we could somehow allow the tests to set a small segment.bytes.

@github-actions github-actions bot removed the small Small PRs label Apr 9, 2025
@chia7712
Member

chia7712 commented Apr 9, 2025

Maybe we can add an internal config to allow tests to define a small size?

@chia7712
Member

for example:

    this.internalSegmentSize = getString(TopicConfig.INTERNAL_SEGMENT_BYTES_CONFIG);

    ...

    public long segmentSize() {
        // Prefer the internal (test-only) override when it is set.
        if (internalSegmentSize != null) return Long.parseLong(internalSegmentSize);
        return segmentSize;
    }
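
A minimal, self-contained sketch of this idea is shown here, assuming a 1 MiB floor on the public `segment.bytes` and no floor on the internal key. It models the behaviour with a plain map rather than Kafka's ConfigDef; `SegmentSizeConfig` and its constants are hypothetical names, and the 1 GiB default is an assumption for illustration.

```java
import java.util.Map;

// Illustrative model only; the real LogConfig is built on Kafka's ConfigDef.
final class SegmentSizeConfig {
    static final String SEGMENT_BYTES = "segment.bytes";
    // Internal, test-only key (name assumed for this sketch).
    static final String INTERNAL_SEGMENT_BYTES = "internal.segment.bytes";
    static final int MIN_SEGMENT_BYTES = 1024 * 1024;             // assumed public floor
    static final int DEFAULT_SEGMENT_BYTES = 1024 * 1024 * 1024;  // assumed default

    private final int segmentSize;
    private final Integer internalSegmentSize; // null when the internal key is absent

    SegmentSizeConfig(Map<String, String> props) {
        int publicValue = Integer.parseInt(
            props.getOrDefault(SEGMENT_BYTES, String.valueOf(DEFAULT_SEGMENT_BYTES)));
        // The public key keeps its production constraint.
        if (publicValue < MIN_SEGMENT_BYTES) {
            throw new IllegalArgumentException(
                SEGMENT_BYTES + " must be at least " + MIN_SEGMENT_BYTES);
        }
        this.segmentSize = publicValue;
        String internal = props.get(INTERNAL_SEGMENT_BYTES);
        this.internalSegmentSize = internal == null ? null : Integer.valueOf(internal);
    }

    // The internal override wins when present; otherwise the public value is used.
    int segmentSize() {
        return internalSegmentSize != null ? internalSegmentSize : segmentSize;
    }
}
```

With this shape, a test could pass `internal.segment.bytes=100` to roll segments quickly, while a user-supplied `segment.bytes=100` would still be rejected.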

@github-actions github-actions bot removed needs-attention triage PRs from the community labels Apr 15, 2025
@chia7712
Member

@m1a2st could you please check the failed tests?

Contributor

@junrao junrao left a comment

@m1a2st : Thanks for the PR. Left a couple of comments.

}

// visible for testing
def internalApply(
Contributor

Instead of having an internalApply, could we just use the existing apply and add INTERNAL_SEGMENT_BYTES_CONFIG to MetadataLogConfig?

Member

+1 to moving the internal config to MetadataLogConfig. It would be better to wait for #19465, which extracts the metadata-related configs from other classes into MetadataLogConfig.

Contributor

I just realized that the metadata log uses a different approach to let tests use a smaller segment size than is allowed in production. That approach defines the original segment bytes config with a small minimum, but adds METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG to enforce the actual minimum in production. This new config can be changed in tests to allow a smaller minimum. The benefit of this approach is that the existing config can be used directly to set a smaller value in tests. The downside is that the documented minimum is inaccurate and the validation is done through customized logic.

It would be useful to pick the same strategy for both the metadata log and the regular log. The metadata log approach seems slightly better since it's less intrusive. We could fix the inaccurate min value description for production somehow.
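
The metadata log strategy described in this comment can be modelled roughly as follows. This is a sketch under stated assumptions, not the actual MetadataLogConfig code: `AdjustableMinSegmentConfig` is a hypothetical name, the 12-byte definition floor is a placeholder for Records.LOG_OVERHEAD, and the 8 MiB production floor is assumed from the 8MB figure that appears elsewhere in this review.

```java
// Illustrative model of the "adjustable minimum" strategy; all names are hypothetical.
final class AdjustableMinSegmentConfig {
    // Small floor baked into the public config definition (placeholder value).
    static final int DEFINITION_MIN_BYTES = 12;
    // Floor enforced by custom validation unless an internal config lowers it (assumed value).
    static final int DEFAULT_ENFORCED_MIN_BYTES = 8 * 1024 * 1024;

    private final int segmentBytes;

    // internalMinBytes models the internal "min bytes" config; null means "not set".
    AdjustableMinSegmentConfig(int segmentBytes, Integer internalMinBytes) {
        if (segmentBytes < DEFINITION_MIN_BYTES) {
            throw new IllegalArgumentException("segment bytes below the definition minimum");
        }
        int enforcedMin = internalMinBytes != null ? internalMinBytes : DEFAULT_ENFORCED_MIN_BYTES;
        // Customized validation: the documented minimum above is not the real production floor.
        if (segmentBytes < enforcedMin) {
            throw new IllegalArgumentException(
                "segment bytes " + segmentBytes + " is below the enforced minimum " + enforcedMin);
        }
        this.segmentBytes = segmentBytes;
    }

    int segmentBytes() {
        return segmentBytes;
    }
}
```

Here the desired size always travels through the original config, and a test only lowers the enforced minimum, for example `new AdjustableMinSegmentConfig(10240, 1024)`.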

Member

Excuse me, the strategy used by the metadata log is to add an "internal" config (METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG) to change the (metadata) segment size in testing, and that is what we want to address in this PR: we add an "internal" config for the regular log so that tests can use the "smaller" segment size.

Contributor

I am just saying that we now have two different ways to achieve the same goal. In the metadata log approach, you set the desired value through the original config, which is segment.bytes. You then set an internal config to change the min constraint.

The approach in this PR is to set the desired value through a different internal config.

It would be useful to choose the same approach for both the metadata log and the regular log.

Member

Personally, I prefer the approach of adding an "internal".xxx config, as it provides a better user experience for public configs by letting users see the "correct" min value. Additionally, we can remove the customized validation logic.

In short, I suggest making the following changes in this PR.

  1. remove METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG
  2. remove MetadataLogConfig#logSegmentMinBytes
  3. add internal.metadata.log.segment.bytes
  4. customize MetadataLogConfig#logSegmentBytes as in the following code
    public int logSegmentBytes() {
        if (internalLogSegmentBytes != null) return internalLogSegmentBytes;
        return logSegmentBytes;
    }

Collaborator Author

Thanks @chia7712 and @junrao for the comments. Addressed :)

Contributor

Hmm, why do we need internalApply()? MetadataLogConfig has both the production and internal segment bytes configs, and we could just pass both into LogConfig in apply(), right?

@github-actions github-actions bot added the build Gradle build or GitHub Actions label Apr 21, 2025
Contributor

@junrao junrao left a comment

@m1a2st : Thanks for the updated PR. A few more comments.

@@ -85,14 +82,13 @@ public class MetadataLogConfig {
.define(METADATA_SNAPSHOT_MAX_INTERVAL_MS_CONFIG, LONG, METADATA_SNAPSHOT_MAX_INTERVAL_MS_DEFAULT, atLeast(0), HIGH, METADATA_SNAPSHOT_MAX_INTERVAL_MS_DOC)
.define(METADATA_LOG_DIR_CONFIG, STRING, null, null, HIGH, METADATA_LOG_DIR_DOC)
.define(METADATA_LOG_SEGMENT_BYTES_CONFIG, INT, METADATA_LOG_SEGMENT_BYTES_DEFAULT, atLeast(Records.LOG_OVERHEAD), HIGH, METADATA_LOG_SEGMENT_BYTES_DOC)
.defineInternal(METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG, INT, METADATA_LOG_SEGMENT_MIN_BYTES_DEFAULT, atLeast(Records.LOG_OVERHEAD), HIGH, METADATA_LOG_SEGMENT_MIN_BYTES_DOC)
Contributor

We need to set the constraint for METADATA_LOG_SEGMENT_BYTES_CONFIG to be at least 8MB.

Also, I thought the plan was to remove METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG but add something like METADATA_INTERNAL_LOG_SEGMENT_BYTES_CONFIG to match the design in LogConfig?

Member

@junrao I have discussed this with @m1a2st offline, and he will update the PR tomorrow.

val metadataLog: KafkaMetadataLog = createMetadataLog(topicPartition, topicId, dataDir, time, scheduler, config, nodeId, defaultLogConfig)

// Print a warning if users have overridden the internal config
if (config.logSegmentBytes() != KafkaRaftClient.MAX_BATCH_SIZE_BYTES) {
Contributor

Hmm, we don't need this if we add the constraint directly to METADATA_LOG_SEGMENT_BYTES_CONFIG, right?

@@ -112,15 +108,15 @@ public class MetadataLogConfig {
* @param deleteDelayMillis The amount of time to wait before deleting a file from the filesystem
*/
public MetadataLogConfig(int logSegmentBytes,
-int logSegmentMinBytes,
+int internalLogSegmentMinBytes,
Contributor

Hmm, this should be renamed to internalLogSegmentBytes since it's no longer the minimum, right?

@chia7712 chia7712 changed the title KAFKA-19080 The constraint on segment.ms is not enforced at topic level (WIP) KAFKA-19080 The constraint on segment.ms is not enforced at topic level Apr 24, 2025
@m1a2st m1a2st changed the title (WIP) KAFKA-19080 The constraint on segment.ms is not enforced at topic level KAFKA-19080 The constraint on segment.ms is not enforced at topic level Apr 25, 2025
@@ -601,7 +602,7 @@ class SaslSslAdminIntegrationTest extends BaseAdminIntegrationTest with SaslSetu
assertNotEquals(Uuid.ZERO_UUID, createResult.topicId(topic1).get())
assertEquals(topicIds(topic1), createResult.topicId(topic1).get())
assertFutureThrows(classOf[TopicAuthorizationException], createResult.topicId(topic2))

Member

Please revert this unrelated change.

@@ -567,7 +568,7 @@ class SaslSslAdminIntegrationTest extends BaseAdminIntegrationTest with SaslSetu
client.createAcls(List(denyAcl).asJava, new CreateAclsOptions()).all().get()

val topics = Seq(topic1, topic2)
-val configsOverride = Map(TopicConfig.SEGMENT_BYTES_CONFIG -> "100000").asJava
+val configsOverride = Map(LogConfig.INTERNAL_SEGMENT_BYTES_CONFIG -> "100000").asJava
Member

This test case verifies the custom topic-level config, so we can simply increase the value to make the test pass.

public static final String METADATA_LOG_SEGMENT_BYTES_CONFIG = "metadata.log.segment.bytes";
public static final String METADATA_LOG_SEGMENT_BYTES_DOC = "The maximum size of a single metadata log file.";
public static final int METADATA_LOG_SEGMENT_BYTES_DEFAULT = 1024 * 1024 * 1024;

public static final String INTERNAL_METADATA_LOG_SEGMENT_BYTES_CONFIG = "internal.metadata.log.segment.bytes";
Member

Do we need another internal config? Users can configure internal.segment.bytes if they want to create small segments, right?

@m1a2st m1a2st changed the title KAFKA-19080 The constraint on segment.ms is not enforced at topic level [WIP]KAFKA-19080 The constraint on segment.ms is not enforced at topic level Apr 27, 2025
Labels
build Gradle build or GitHub Actions clients core Kafka Broker KIP-932 Queues for Kafka kraft storage Pull requests that target the storage module streams tools
3 participants