[bug#12506] Type-test-override `IndexedSeqOps#foreach` #9820

NthPortal · 2021-12-05T01:28:45Z

Effectively override IndexedSeqOps#foreach to loop
over indices instead of using an Iterator by adding a
type test to IterableOnceOps#foreach.

Fixes scala/bug#12506

Effectively override `IndexedSeqOps#foreach` to loop over indices instead of using an `Iterator` by adding a type test to `IterableOnceOps#foreach`.

julienrf

Thank you April!

Would you have some time to perform some execution time measurements?

src/library/scala/collection/IterableOnce.scala

NthPortal · 2021-12-05T08:51:30Z

I'll try to get to benchmarking. unfortunately, my server is hosting increasingly many things, making it increasingly unsuited to benchmarking, but hey—it's probably still fine

lrytz · 2021-12-06T14:16:12Z

I think we have other such "overrides" in place where a real override would be binary incompatible. At least in 2.12 we accumulated a bunch of them over time.

som-snytt · 2021-12-07T13:31:26Z

The linked commit from strawman overrides all the similar methods such as forall. Edit: probably too much trouble.

I think testing is important because it's easy to accidentally deoptimize on the jvm.

som-snytt · 2021-12-07T13:33:36Z

src/library/scala/collection/IterableOnce.scala

+        }
+      case _ =>
+        val it = iterator
+        while(it.hasNext) f(it.next())


I see the inconsistent whitespace was inherited. Dr Odersky just told someone "no spaceless if(" [indirect quote] on Scala 3.

I'll fix that, and also make the "override" its own method that gets called, for clarity

julienrf · 2021-12-13T18:42:15Z

I just ran the following benchmark on my machine:

package scala.collection

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

import scala.collection.immutable.ArraySeq

@BenchmarkMode(Array(Mode.AverageTime))
@Fork(1)
@Threads(1)
@Warmup(iterations = 5)
@Measurement(iterations = 8)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class ForEachBenchmark {

  @Param(Array("0", "1", "10", "1000", "10000", "100000"))
  var size: Int = _
  var array: Array[String] = _
  var vector: Vector[String] = _
  var arraySeq: ArraySeq[String] = _
  var arrayBuffer: mutable.ArrayBuffer[String] = _

  @Setup(Level.Trial) def initNumbers: Unit = {
    array = Array.fill(size)("foo")
    vector = Vector.fill(size)("foo")
    arraySeq = ArraySeq.fill(size)("foo")
    arrayBuffer = mutable.ArrayBuffer.fill(size)("foo")
  }

  @Benchmark def array_foreach(bh: Blackhole): Unit =
    bh.consume(array.foreach(bh.consume))

  @Benchmark def vector_foreach(bh: Blackhole): Unit =
    bh.consume(vector.foreach(bh.consume))

  @Benchmark def arraySeq_foreach(bh: Blackhole): Unit =
    bh.consume(arraySeq.foreach(bh.consume))

  @Benchmark def arrayBuffer_foreach(bh: Blackhole): Unit =
    bh.consume(arrayBuffer.foreach(bh.consume))

}

Here are the results:

Before:

[info] ForEachBenchmark.arrayBuffer_foreach       0  avgt    8       3.607 ±     0.111  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach       1  avgt    8       8.205 ±     0.217  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach      10  avgt    8      52.256 ±     2.819  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach    1000  avgt    8    4652.913 ±   104.411  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach   10000  avgt    8   47493.288 ±  1164.611  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach  100000  avgt    8  497097.001 ± 21642.157  ns/op

After:

[info] ForEachBenchmark.arrayBuffer_foreach       0  avgt    8       3.342 ±      0.039  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach       1  avgt    8       7.650 ±      0.593  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach      10  avgt    8      48.327 ±      1.151  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach    1000  avgt    8    4598.888 ±    626.385  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach   10000  avgt    8   48920.076 ±   3978.785  ns/op
[info] ForEachBenchmark.arrayBuffer_foreach  100000  avgt    8  494307.976 ±  51521.829  ns/op

Before:

[info] ForEachBenchmark.arraySeq_foreach          0  avgt    8       3.909 ±     0.096  ns/op
[info] ForEachBenchmark.arraySeq_foreach          1  avgt    8       7.309 ±     0.168  ns/op
[info] ForEachBenchmark.arraySeq_foreach         10  avgt    8      45.904 ±     0.638  ns/op
[info] ForEachBenchmark.arraySeq_foreach       1000  avgt    8    3455.772 ±   112.868  ns/op
[info] ForEachBenchmark.arraySeq_foreach      10000  avgt    8   34451.301 ±   254.424  ns/op
[info] ForEachBenchmark.arraySeq_foreach     100000  avgt    8  344377.487 ±  3330.170  ns/op

After:

[info] ForEachBenchmark.arraySeq_foreach          0  avgt    8       4.248 ±      0.652  ns/op
[info] ForEachBenchmark.arraySeq_foreach          1  avgt    8       8.714 ±      0.354  ns/op
[info] ForEachBenchmark.arraySeq_foreach         10  avgt    8      52.733 ±      4.276  ns/op
[info] ForEachBenchmark.arraySeq_foreach       1000  avgt    8    4399.159 ±    868.108  ns/op
[info] ForEachBenchmark.arraySeq_foreach      10000  avgt    8   44252.763 ±   7538.913  ns/op
[info] ForEachBenchmark.arraySeq_foreach     100000  avgt    8  526270.677 ± 611607.069  ns/op

Before:

[info] ForEachBenchmark.array_foreach             0  avgt    8       3.355 ±     0.051  ns/op
[info] ForEachBenchmark.array_foreach             1  avgt    8       6.762 ±     0.169  ns/op
[info] ForEachBenchmark.array_foreach            10  avgt    8      70.455 ±     0.758  ns/op
[info] ForEachBenchmark.array_foreach          1000  avgt    8    3878.833 ±  3844.431  ns/op
[info] ForEachBenchmark.array_foreach         10000  avgt    8   31033.782 ±   840.745  ns/op
[info] ForEachBenchmark.array_foreach        100000  avgt    8  314350.427 ± 23601.784  ns/op

After (Arrays should not be affected by the PR):

[info] ForEachBenchmark.array_foreach             0  avgt    8       3.505 ±      0.334  ns/op
[info] ForEachBenchmark.array_foreach             1  avgt    8       7.555 ±      0.797  ns/op
[info] ForEachBenchmark.array_foreach            10  avgt    8      38.023 ±      5.219  ns/op
[info] ForEachBenchmark.array_foreach          1000  avgt    8    3358.272 ±    408.329  ns/op
[info] ForEachBenchmark.array_foreach         10000  avgt    8   35310.950 ±   6418.211  ns/op
[info] ForEachBenchmark.array_foreach        100000  avgt    8  367379.432 ±  45226.684  ns/op

Before:

[info] ForEachBenchmark.vector_foreach            0  avgt    8       3.366 ±     0.127  ns/op
[info] ForEachBenchmark.vector_foreach            1  avgt    8       7.075 ±     0.180  ns/op
[info] ForEachBenchmark.vector_foreach           10  avgt    8      35.302 ±     0.917  ns/op
[info] ForEachBenchmark.vector_foreach         1000  avgt    8    3763.463 ±   244.501  ns/op
[info] ForEachBenchmark.vector_foreach        10000  avgt    8   37467.608 ±   569.759  ns/op
[info] ForEachBenchmark.vector_foreach       100000  avgt    8  384386.185 ± 29821.773  ns/op

After:

[info] ForEachBenchmark.vector_foreach            0  avgt    8       3.668 ±      0.558  ns/op
[info] ForEachBenchmark.vector_foreach            1  avgt    8       7.121 ±      0.155  ns/op
[info] ForEachBenchmark.vector_foreach           10  avgt    8      36.102 ±      3.690  ns/op
[info] ForEachBenchmark.vector_foreach         1000  avgt    8    3977.236 ±    561.623  ns/op
[info] ForEachBenchmark.vector_foreach        10000  avgt    8   37354.477 ±   1814.811  ns/op
[info] ForEachBenchmark.vector_foreach       100000  avgt    8  375880.352 ±  17324.349  ns/op

Not sure if I did something wrong. But there seems to be no impact on the performance.

Ichoran · 2021-12-13T19:09:04Z

If there's no difference either way for anything, is it actually worth adding to the code complexity? Nominally it avoids an object creation, which one would think was good, but it looks to me like the object creation is getting optimized out or you'd be able to see the initializer cost.

Can you check GC churn before/after to verify?

I'd imagine in cases with less optimization, you'd get increased memory usage and initializer costs, but then you'd get a Vector slowdown because it's not very good at indexing.

SethTisue · 2021-12-13T21:08:28Z

@WojciechMazur you were the original reporter on Discord, I believe. Could you test this PR with your code, or supply your code in benchmark form?

lrytz · 2021-12-14T09:48:42Z

One potential issue with the benchmark is that it's monomorphic in the collection type.

  @Benchmark def arraySeq_foreach(bh: Blackhole): Unit =
    bh.consume(arraySeq.foreach(bh.consume))

In the JVM running the above, the IterableOnce.foreach method is only ever called on an ArraySeq receiver. So the JVM can inline through the ArraySeq iterator in val it = iterator; while(it.hasNext) f(it.next()), which uses an ArrayIterator.

That doesn't explain why the while (i < len) { f(seq(i)); i += 1 } version ends up running slower on an ArraySeq. But in a real world application, it's not likely that IterableOnce.foreach is going to be monomorphic in the receiver across the JVM lifetime.

I assume it should be possible to pollute the profile (run foreach on various collection types) before starting to measure an individual collection.

WojciechMazur · 2021-12-14T16:38:10Z

Hey, the original issue was discovered in one of Scala Native old benchmarks, which are pretty deprecated and in some cases not very idiomatic. The gist with benchmark adopted to be run with Jmh can be found in this link.

Unfortunately, when testing locally it looks this change would introduce regression, here are my results, the last one refers to a situation when changed change is moved to IndexSeqOps as an overridden method

	Benchmark	Mode	Cnt	Score	Error	Units
before	KmeansBenchmark.run	avgt	20	44323720.686	636910.669	ns/op
after	KmeansBenchmark.run	avgt	20	94648934.033	897606.542	ns/op
as-override	KmeansBenchmark.run	avgt	20	69258123.109	1885916.721	ns/op

I have no idea why these results are so different from the ones I was seeing before. Maybe I have some interference due to some background processes I'm not aware of.
Even though it maybe not be directly linked, still in Scala Native there are two benchmarks for which performance regression is clearly visible between Scala 2.11 and 2.13 (~2.4x longer execution time here's link to comparison). When tracking it down in the assembly I've seen that most of the execution time was spent while iterating over collections. Additionally, after applying overrides in SN (via the mechanism of Scala stdlib overrides when compiling it) I was observing speedup.

SethTisue · 2022-01-19T03:58:04Z

Marking as draft since the prospects of this progressing appear uncertain. @WojciechMazur are you interested in pursuing this further? @NthPortal, you?

NthPortal · 2022-01-30T21:40:38Z

@SethTisue I don't particularly intend to pursue this further. if justified by benchmarks, I'd be happy to tweak the code to be more in line with other type-test-overrides, but unless someone else justifies it, it's not clear to me that it's worth working on. happy for someone else to reopen or copy if they think it's worth it

[bug#12506] Type-test-override IndexedSeqOps#foreach

f80fab1

Effectively override `IndexedSeqOps#foreach` to loop over indices instead of using an `Iterator` by adding a type test to `IterableOnceOps#foreach`.

NthPortal added performance the need for speed. usually compiler performance, sometimes runtime performance. library:collections PRs involving changes to the standard collection library labels Dec 5, 2021

NthPortal requested a review from a team December 5, 2021 01:28

scala-jenkins added this to the 2.13.8 milestone Dec 5, 2021

julienrf reviewed Dec 5, 2021

View reviewed changes

src/library/scala/collection/IterableOnce.scala Show resolved Hide resolved

som-snytt reviewed Dec 7, 2021

View reviewed changes

dwijnand modified the milestones: 2.13.8, 2.13.9 Dec 15, 2021

SethTisue modified the milestones: 2.13.8, 2.13.9 Dec 15, 2021

SethTisue marked this pull request as draft January 19, 2022 03:56

NthPortal closed this Jan 30, 2022

SethTisue removed this from the 2.13.9 milestone Jan 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[bug#12506] Type-test-override `IndexedSeqOps#foreach` #9820

[bug#12506] Type-test-override `IndexedSeqOps#foreach` #9820

Uh oh!

NthPortal commented Dec 5, 2021

Uh oh!

julienrf left a comment

Uh oh!

Uh oh!

NthPortal commented Dec 5, 2021

Uh oh!

lrytz commented Dec 6, 2021

Uh oh!

som-snytt commented Dec 7, 2021 •

edited

Loading

Uh oh!

som-snytt Dec 7, 2021

Uh oh!

NthPortal Dec 9, 2021

Uh oh!

julienrf commented Dec 13, 2021

Uh oh!

Ichoran commented Dec 13, 2021

Uh oh!

SethTisue commented Dec 13, 2021

Uh oh!

lrytz commented Dec 14, 2021 •

edited

Loading

Uh oh!

WojciechMazur commented Dec 14, 2021 •

edited

Loading

Uh oh!

SethTisue commented Jan 19, 2022

Uh oh!

NthPortal commented Jan 30, 2022

Uh oh!

Uh oh!

[bug#12506] Type-test-override IndexedSeqOps#foreach #9820

[bug#12506] Type-test-override IndexedSeqOps#foreach #9820

Uh oh!

Conversation

NthPortal commented Dec 5, 2021

Uh oh!

julienrf left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NthPortal commented Dec 5, 2021

Uh oh!

lrytz commented Dec 6, 2021

Uh oh!

som-snytt commented Dec 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

som-snytt Dec 7, 2021

Choose a reason for hiding this comment

Uh oh!

NthPortal Dec 9, 2021

Choose a reason for hiding this comment

Uh oh!

julienrf commented Dec 13, 2021

Uh oh!

Ichoran commented Dec 13, 2021

Uh oh!

SethTisue commented Dec 13, 2021

Uh oh!

lrytz commented Dec 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WojciechMazur commented Dec 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SethTisue commented Jan 19, 2022

Uh oh!

NthPortal commented Jan 30, 2022

Uh oh!

Uh oh!

[bug#12506] Type-test-override `IndexedSeqOps#foreach` #9820

[bug#12506] Type-test-override `IndexedSeqOps#foreach` #9820

som-snytt commented Dec 7, 2021 •

edited

Loading

lrytz commented Dec 14, 2021 •

edited

Loading

WojciechMazur commented Dec 14, 2021 •

edited

Loading