Skip to content

SeqMap.keys guarantees only key iteration order #10766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

som-snytt
Copy link
Contributor

@som-snytt som-snytt commented Apr 29, 2024

Fixes scala/bug#12991

Ensure that keys of a SeqMap is a Seq in expected order.

That satisfies the unwritten expectation that keys is ordered and mutually comparable.

The implementation of the collection returned by keys is not specified. As an implementation detail or convenience, it is a Vector for small SeqMaps.

@scala-jenkins scala-jenkins added this to the 2.13.15 milestone Apr 29, 2024
@som-snytt
Copy link
Contributor Author

@lrytz deprecated overriding of SeqMap#keys may have been hasty. IF it is useful (a big IF), it is usefully specialized. WDYT?

#10544

@SethTisue SethTisue requested a review from a team April 29, 2024 23:57
@SethTisue SethTisue added the library:collections PRs involving changes to the standard collection library label Apr 29, 2024
@som-snytt som-snytt marked this pull request as ready for review April 30, 2024 05:46
@lrytz
Copy link
Member

lrytz commented Apr 30, 2024

I think either keys is an alais of keySet, or we guarantee additional properties. I assume these would be "strict copy" and "same iteration order for ordered / sorted maps"? Maybe we can add overrides that reutrn Seq after SIP-51, but even without that the properties can be expressed in scaladoc and tests, and it's clear whether something is a bug or not.

Doing something in between can probably avoid some surprises but not really prevent bugs.

@som-snytt
Copy link
Contributor Author

Not sure if lrytz agreed to undeprecate, so I pushed it as a separate commit, unsquashed. Sorry, Jenkins.

To be clear, this PR was motivated by Ichoran's endorsement on the ticket, that even if keys is not (yet) Seq, go ahead and change it to return one anyway, as nobody could possibly be relying on the current behavior.

with nobody being the wiser for it

I can report that I do not feel wiser.

@lrytz
Copy link
Member

lrytz commented May 8, 2024

I see the point that returning a Seq doesn't hurt and can prevent surprises.

But if we don't actually guarantee anything then the value is very limited, surprises when using keys are just delayed.

I think either keys should be a final / deprecatedOverriding alias, or it should have documented properties that clients can rely on, anything in between is useful only by accident. But changing keys to always return a strict Seq copy is a (performance) breaking change, so it's not clear whether we can do that.

@som-snytt
Copy link
Contributor Author

som-snytt commented May 8, 2024

This is only for SeqMaps that don't yet override. It could be restricted further to SeqMap1..4, to be more conservative.

Then we can sort whether it's useful API etc under SIP-51.

Otherwise, I'll have to at-notify Ichoran.

@som-snytt som-snytt changed the title SeqMap.keys is Seq SeqMap.keys is Seq [ci: last-only] May 8, 2024
@som-snytt som-snytt force-pushed the issue/12991-seqmap-keys branch from 67027de to bff3d35 Compare May 8, 2024 20:19
@som-snytt
Copy link
Contributor Author

it's not clear what the API intends:

      assertEquals(/*"Expect the same keys to appear in the join taken either way around.",*/
        ourChanges.mergeByKey(theirChangesRedux).keySet,
        theirChangesRedux.mergeByKey(ourChanges).keys,
      )

@SethTisue
Copy link
Member

SethTisue commented Jul 18, 2024

@som-snytt where does this stand, in your opinion?

also interested in @scala/collections opinions

@som-snytt
Copy link
Contributor Author

Doing something in between can probably avoid some surprises but not really prevent bugs.

This standard is too high for me.

@NthPortal
Copy link
Contributor

I'm pretty sure I've mentioned this before somewhere, but I'd personally prefer that keys be of the as-yet-nonexistent type SeqSet.

I would also like keys in general to be an alias of keySet (maybe eventually the primary and later only method!)

@som-snytt
Copy link
Contributor Author

I don't remember why I reopened, but probably because Seth said

if anyone else cares enough

which is nerd-sniping as moral imperative.

I'll refresh my memory and the branch.

@som-snytt
Copy link
Contributor Author

I see the trouble began with this spurious override in VectorMap:

override def keys: Vector[K] = keysIterator.toVector

The override would make sense on grounds of efficiency, but per the docs, anyone who wants this behavior should write it out.

Lukas's comment is that the doc for keys must specify its guarantee.

If keys are ordered,

c.keys.iterator.sameElements(c.keySet.iterator)

Addtionally, keySet is ordered if the collection is sorted.

It is a non-goal of this PR to guarantee that keys returns a Set. That is what keySet is for. If you want a set that preserves iteration order, then the doc could show how to build that, too.

As an example, ListMap yields a list of keys. If you want an unordered set, use keys.toSet, or keySet for ordered keys.

The only reason to prefer keys.iterator to keySet.iterator is when it's more efficient for unsorted keys. (Speculation.)

I see the ticket is about checking equality. That is an especially weak expectation. Rex suggests the solution in this PR, just make keys return a Seq. But if the purpose of keys is "give me cheap iteration order", then producing a Seq is not desirable. That is Lukas's point about a performance regression, which would be severe for large maps. (The ticket is about tiny maps.)

When I touch the code, perhaps my perspective will change, and I'll note that here.

@som-snytt som-snytt force-pushed the issue/12991-seqmap-keys branch from bff3d35 to f580d8d Compare November 27, 2024 18:16
@som-snytt
Copy link
Contributor Author

The Ichoranism on scala/bug#12808

promise less, but make it work

@som-snytt
Copy link
Contributor Author

som-snytt commented Nov 27, 2024

That is just java.io.IOException: No space left on device

This PR will be squashable: it reverts the deprecated overriding, adds clarifying doc, and also slips in the Ichoranist "make it work" because why not. That calls for a mima exception for the private object, which I did not drill into during this expedition.

@som-snytt som-snytt changed the title SeqMap.keys is Seq [ci: last-only] SeqMap.keys is key iteration order only [ci: last-only] Nov 29, 2024
@som-snytt som-snytt force-pushed the issue/12991-seqmap-keys branch from f580d8d to 87815f5 Compare November 29, 2024 18:04
@som-snytt som-snytt force-pushed the issue/12991-seqmap-keys branch from 87815f5 to 2b71cd0 Compare December 3, 2024 13:29
@som-snytt
Copy link
Contributor Author

Note to self: I ran sbt:scala2> library/mimaReportBinaryIssues locally but was somehow careless and thought it said I was OK without an exclusion (because I had changed the signature?). Now I see the correct report, so I don't know what my mistake was.

@som-snytt som-snytt changed the title SeqMap.keys is key iteration order only [ci: last-only] SeqMap.keys is key iteration order only Dec 3, 2024
@som-snytt som-snytt requested a review from lrytz December 3, 2024 22:22
Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're keeping all 3 methods around I'd like the docs to say when to use which - I don't know at this point.

Is there a big risk that existing custom maps would violate the new iteration order guarantee?

@@ -130,6 +132,7 @@ object SeqMap extends MapFactory[SeqMap] {
f(key1, value1)
f(key2, value2)
}
override def keys: Iterable[K] = Vector(key1, key2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc these overrides wouldn't be needed to support the new iteration order guarantee?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is only for the unreasonable expectation that a.keys == b.keys for a couple of SeqMap.

@som-snytt
Copy link
Contributor Author

I gave it some thought again, because what else are the winter holidays for, and I wavered a moment about whether to keep the deprecation of keys. Also, I noticed that in Java, Dictionary has keys (an Enumeration) but Map has keySet.

There is no justification for keys in the standard library, assuming ListMap is always small. The important optimization via retronym is to override keysIterator and avoid Map#iterator which produces (k,v).

The clever pattern is that keySet delegates to keysIterator, which could be protected, if you guarantee keySet.iterator preserves order. Maybe the thinking was that unsorted set does not let you make that guarantee?

It occurs to me that SeqMap could be a SortedMap with an Ordering that reports arrival order, which would solve this API question. That would be for a future version.

assertSameElements(keys.iterator, sm.keys.iterator)
assertSameElements(vm.keys, sm.keys)
assertEquals(vm.keys, sm.keys)
assertEquals(lm.keys, sm.keys)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final edit added the unreasonable expectation back in. Previously, keys changed from Set to Seq after four elements. "But that's a change of only one letter!"

Undeprecate overriding Map keys
@som-snytt som-snytt force-pushed the issue/12991-seqmap-keys branch from 6511259 to 4170e40 Compare February 16, 2025 00:51
@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Feb 28, 2025
@SethTisue
Copy link
Member

Is there a big risk that existing custom maps would violate the new iteration order guarantee?

as far as I can see, it should be sufficient to release-note it to give custom map implementers the heads-up

@SethTisue
Copy link
Member

@som-snytt mind giving the PR title and description your best user-facing, included-in-release notes effort?

@som-snytt
Copy link
Contributor Author

som-snytt commented Feb 28, 2025

The reading list is my discursive comment from the winter holiday followed by my archeological investigation from the Thanksgiving break.

Edit: I feel the need to add, "But don't tell them I told you that."

@som-snytt som-snytt changed the title SeqMap.keys is key iteration order only SeqMap.keys guarantees only key iteration order Feb 28, 2025
Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I keep pushing back... I know you're not responsible for where we are today and you're investing time in improving things.

I'm just still completely unclear when to use keys vs keySet. The deprecatedOverriding said that the two are interchangeable; if we change that, I should be able to tell from the docs which does what (including keysIterator).

// scala/scala#10766
ProblemFilters.exclude[DirectMissingMethodProblem]("scala.collection.immutable.SeqMap#EmptySeqMap.keys"),

// IMPROVE YOUR KARMA use trailing comma
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aka, trailing karma

Comment on lines +184 to +187
* By default, the set uses `keysIterator` for iteration and delegates `contains` to the `Map`.
*
* For `Map` implementations with ordered but not sorted keys, such as a `SeqMap`,
* `keys` may efficiently build a representation that preserves order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two phrases tell me some implementation details and that maybe the order is preserved (but only maybe). I think this is potentially more confusing than helpful...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think may governs efficiently, not preserves order.

@@ -229,6 +242,11 @@ trait MapOps[K, +V, +CC[_, _] <: IterableOps[_, AnyConstr, _], +C]
}

/** An [[Iterator]] of the keys contained by this map.
*
* If the collection maintains ordered keys, then `keysIterator` respects that order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is helpful.

* If the collection maintains ordered keys, then `keysIterator` respects that order.
*
* The default implementation uses the iterator of this `Map`,
* but may be overridden for efficiency.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation detail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turtles all the way down.

@@ -207,6 +211,16 @@ trait MapOps[K, +V, +CC[_, _] <: IterableOps[_, AnyConstr, _], +C]
}

/** An [[Iterable]] collection of the keys contained by this map.
*
* The default implementation returns `keySet`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but then my next question as a user is: when does it not return keySet? When should I use this over keySet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They must read the user manual for their implementation.

Comment on lines +217 to +218
* For `Map` implementations with ordered but not sorted keys, such as a `SeqMap`,
* `keys` may efficiently build a representation that preserves order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation detail?

Comment on lines +220 to +223
* Its iterator respects the iteration order of `keysIterator`, such that
* ```
* keys.iterator.sameElements(keySet.iterator)
* ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a property that always holds? "Its iterator" - what is that referring to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subject is keys, hence keys.iterator. This is the property, yes. That is, don't use equals as in OP.

@som-snytt
Copy link
Contributor Author

Ah, I didn't read far enough back in the discussion.

I only re-opened because Seth

#10766 (comment)

but it's impossible to meet the standard set by

#10766 (comment)

My opinion would have been just to revert the deprecation, but I don't have actual opinions like Ichoran and NthPortal.

#10766 (comment)

This is way too much discussion driven by one user misusing the API (which may be open to abuse).

@som-snytt som-snytt closed this Feb 28, 2025
@som-snytt som-snytt deleted the issue/12991-seqmap-keys branch February 28, 2025 16:18
@som-snytt
Copy link
Contributor Author

I may also have re-opened just for trailing karma.

@SethTisue SethTisue removed this from the 2.13.17 milestone Feb 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
library:collections PRs involving changes to the standard collection library release-notes worth highlighting in next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SeqMap.keys != VectorMap.keys (with identical keys) for sizes 0 to 4
7 participants