Add methods on already-sorted sequences that remove or count duplicates. #257

Open
wants to merge 6 commits into
base: main
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,12 @@ This project follows semantic versioning.

- Bidirectional collections have a new `ends(with:)` method that matches
the behavior of the standard library's `starts(with:)` method. ([#224])
- Sequences that are already sorted can use the `countSortedDuplicates` and
`deduplicateSorted` methods, with eager and lazy versions.
The former returns each unique value paired with the count of
that value's occurrences.
The latter returns each unique value,
turning a non-decreasing sequence into a strictly increasing one.

<!-- *No new changes.* -->

2 changes: 2 additions & 0 deletions Guides/README.md
@@ -25,6 +25,7 @@ These guides describe the design and intention behind the APIs included in the `
#### Subsetting operations

- [`compacted()`](https://github.com/apple/swift-algorithms/blob/main/Guides/Compacted.md): Drops the `nil`s from a sequence or collection, unwrapping the remaining elements.
- [`deduplicateSorted()`, `deduplicateSorted(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/SortedDuplicates.md): Given an already-sorted sequence and its sorting predicate, reduces each run of equal values to a single element. Has eager and lazy variants.
- [`partitioned(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Partition.md): Returns the elements in a sequence or collection that do and do not match a given predicate.
- [`randomSample(count:)`, `randomSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection.
- [`randomStableSample(count:)`, `randomStableSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection, preserving their original relative order.
@@ -42,6 +43,7 @@ These guides describe the design and intention behind the APIs included in the `

- [`adjacentPairs()`](https://github.com/apple/swift-algorithms/blob/main/Guides/AdjacentPairs.md): Lazily iterates over tuples of adjacent elements.
- [`chunked(by:)`, `chunked(on:)`, `chunks(ofCount:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chunked.md): Eager and lazy operations that break a collection into chunks based on either a binary predicate or when the result of a projection changes or chunks of a given count.
- [`countSortedDuplicates()`, `countSortedDuplicates(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/SortedDuplicates.md): Given an already-sorted sequence and its sorting predicate, returns each unique value paired with its number of occurrences. Has eager and lazy variants.
- [`firstNonNil(_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/FirstNonNil.md): Returns the first non-`nil` result from transforming a sequence's elements.
- [`grouped(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Grouped.md): Group up elements using the given closure, returning a Dictionary of those groups, keyed by the results of the closure.
- [`indexed()`](https://github.com/apple/swift-algorithms/blob/main/Guides/Indexed.md): Iterate over tuples of a collection's indices and elements.
65 changes: 65 additions & 0 deletions Guides/SortedDuplicates.md
@@ -0,0 +1,65 @@
# Sorted Duplicates
[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/SortedDuplicates.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/SortedDuplicatesTests.swift)]

Given a sequence that is already sorted, recognize each run of
identical values.
Use that to determine the length of each run,
or to filter out the duplicate values by removing all occurrences of
a given value besides the first.

```swift
// Put examples here
```
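
As a rough illustration of the behavior described above, the sketch below defines stand-in versions of the two eager, predicate-taking methods so that it runs on its own. The signatures follow the ones listed under Detailed Design, but the bodies are an assumption made for this example and are not taken from the pull request's source.

```swift
// Stand-in implementations so this example runs without the pull request.
// The bodies are an assumption, not the code from this change.
extension Sequence {
  func countSortedDuplicates(
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows -> [(value: Element, count: Int)] {
    var result: [(value: Element, count: Int)] = []
    for element in self {
      // In a sorted sequence, an element that is not ordered after the
      // current run's value must be equivalent to it.
      if let last = result.last, try !areInIncreasingOrder(last.value, element) {
        result[result.count - 1].count += 1
      } else {
        result.append((value: element, count: 1))
      }
    }
    return result
  }

  func deduplicateSorted(
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows -> [Element] {
    // Keep only the first value of each run.
    try countSortedDuplicates(by: areInIncreasingOrder).map { $0.value }
  }
}

let numbers = [0, 1, 2, 2, 2, 3, 5, 6, 6, 9, 10, 10]

let counted = numbers.countSortedDuplicates(by: <)
// counted.map { $0.value } == [0, 1, 2, 3, 5, 6, 9, 10]
// counted.map { $0.count } == [1, 1, 3, 1, 1, 2, 1, 2]

let deduplicated = numbers.deduplicateSorted(by: <)
// deduplicated == [0, 1, 2, 3, 5, 6, 9, 10]
```

Both stand-ins make a single pass and call the predicate once per adjacent pair, relying on the input already being sorted by that same predicate.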

## Detailed Design

```swift
extension Sequence {
  public func countSortedDuplicates(
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows -> [(value: Element, count: Int)]

  public func deduplicateSorted(
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows -> [Element]
}

extension Sequence where Self.Element : Comparable {
  public func countSortedDuplicates() -> [(value: Element, count: Int)]

  public func deduplicateSorted() -> [Element]
}

extension LazySequenceProtocol {
  public func countSortedDuplicates(
    by areInIncreasingOrder: @escaping (Element, Element) -> Bool
  ) -> LazyCountDuplicatesSequence<Elements>

  public func deduplicateSorted(
    by areInIncreasingOrder: @escaping (Element, Element) -> Bool
  ) -> some (Sequence<Element> & LazySequenceProtocol)
}

extension LazySequenceProtocol where Self.Element : Comparable {
  public func countSortedDuplicates()
    -> LazyCountDuplicatesSequence<Elements>

  public func deduplicateSorted()
    -> some (Sequence<Element> & LazySequenceProtocol)
}

public struct LazyCountDuplicatesSequence<Base: Sequence>
  : LazySequenceProtocol
{ /*...*/ }

public struct CountDuplicatesIterator<Base: IteratorProtocol>
  : IteratorProtocol
{ /*...*/ }
```
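
The bodies of the lazy types are elided above. To give a feel for how a run-counting iterator could work, here is a minimal sketch under stated assumptions: `RunCountingIterator` is a hypothetical name chosen for this example, and its logic is a guess at one reasonable approach rather than the implementation of `CountDuplicatesIterator`.

```swift
// Hypothetical iterator, deliberately named differently from the PR's
// `CountDuplicatesIterator` to make clear it is only a sketch.
struct RunCountingIterator<Base: IteratorProtocol>: IteratorProtocol {
  var base: Base
  let areInIncreasingOrder: (Base.Element, Base.Element) -> Bool
  // First element of the next run, read ahead while finishing the previous run.
  var pending: Base.Element?
  var finished = false

  init(
    _ base: Base,
    by areInIncreasingOrder: @escaping (Base.Element, Base.Element) -> Bool
  ) {
    self.base = base
    self.areInIncreasingOrder = areInIncreasingOrder
  }

  mutating func next() -> (value: Base.Element, count: Int)? {
    guard !finished else { return nil }

    // Start the run from the read-ahead element, or from the base iterator.
    let value: Base.Element
    if let carried = pending {
      value = carried
      pending = nil
    } else if let fresh = base.next() {
      value = fresh
    } else {
      finished = true
      return nil
    }

    var count = 1
    while let candidate = base.next() {
      if areInIncreasingOrder(value, candidate) {
        // A strictly later value starts the next run; stash it and stop.
        pending = candidate
        return (value: value, count: count)
      }
      count += 1
    }
    // The base iterator is exhausted; never call its `next()` again.
    finished = true
    return (value: value, count: count)
  }
}

var iterator = RunCountingIterator([1, 1, 2, 3, 3, 3].makeIterator(), by: <)
var runs: [(value: Int, count: Int)] = []
while let run = iterator.next() {
  runs.append(run)
}
// runs.map { $0.value } == [1, 2, 3]
// runs.map { $0.count } == [2, 1, 3]
```

Each call to `next()` consumes only as much of the base iterator as is needed to finish one run, which is what lets a lazy wrapper defer all work until iteration.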

### Complexity

Calling the lazy methods, those defined on `LazySequenceProtocol`, is O(_1_).
Calling the eager methods, those returning an array, is O(_n_),
where _n_ is the number of elements in the sequence.
15 changes: 15 additions & 0 deletions Sources/Algorithms/Documentation.docc/Filtering.md
@@ -21,6 +21,14 @@ let withNoNils = array.compacted()
// Array(withNoNils) == [10, 30, 2, 3, 5]
```

The `deduplicateSorted()` methods collapse each run of equivalent consecutive elements in an already-sorted sequence to its first element, turning a non-decreasing sequence into a strictly increasing one. A custom sorting predicate can be supplied through `deduplicateSorted(by:)`.

```swift
let numbers = [0, 1, 2, 2, 2, 3, 5, 6, 6, 9, 10, 10]
let deduplicated = numbers.deduplicateSorted()
// Array(deduplicated) == [0, 1, 2, 3, 5, 6, 9, 10]
```
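
When the sequence is sorted by something other than `<`, the same predicate must be passed along. The sketch below shows the idea for a descending sequence. It uses a hypothetical free function, `removingSortedDuplicates(from:by:)`, written only for this example so that it runs without the new methods; it is an assumption about the behavior, not the library's implementation.

```swift
// Hypothetical helper for illustration; not part of swift-algorithms.
func removingSortedDuplicates<S: Sequence>(
  from sorted: S,
  by areInIncreasingOrder: (S.Element, S.Element) -> Bool
) -> [S.Element] {
  var result: [S.Element] = []
  for element in sorted {
    // Skip an element that is not ordered after the last kept one:
    // in a sorted sequence that means the two are equivalent.
    if let last = result.last, !areInIncreasingOrder(last, element) {
      continue
    }
    result.append(element)
  }
  return result
}

let descending = [9, 9, 7, 4, 4, 1]
let unique = removingSortedDuplicates(from: descending, by: >)
// unique == [9, 7, 4, 1]
```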

## Topics

### Uniquing Elements
@@ -34,6 +42,13 @@ let withNoNils = array.compacted()
- ``Swift/Collection/compacted()``
- ``Swift/Sequence/compacted()``

### Removing Duplicates from a Sorted Sequence

- ``Swift/Sequence/deduplicateSorted(by:)``
- ``Swift/Sequence/deduplicateSorted()``
- ``Swift/LazySequenceProtocol/deduplicateSorted(by:)``
- ``Swift/LazySequenceProtocol/deduplicateSorted()``

### Supporting Types

- ``UniquedSequence``
12 changes: 12 additions & 0 deletions Sources/Algorithms/Documentation.docc/Keying.md
@@ -12,3 +12,15 @@ Convert a sequence to a dictionary, providing keys to individual elements or to
### Grouping Elements by Key

- ``Swift/Sequence/grouped(by:)``

### Counting Each Element in a Sorted Sequence

- ``Swift/Sequence/countSortedDuplicates(by:)``
- ``Swift/Sequence/countSortedDuplicates()``
- ``Swift/LazySequenceProtocol/countSortedDuplicates(by:)``
- ``Swift/LazySequenceProtocol/countSortedDuplicates()``

### Supporting Types

- ``LazyCountDuplicatesSequence``
- ``CountDuplicatesIterator``