Skip to content

Data flow: Replace hasLocationInfo with getLocation #15853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 18, 2024

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Mar 8, 2024

This PR changes the API of the shared data flow library from using hasLocationInfo, to instead use getLocation via the existing LocationSig signature for locations. In addition to having a nicer API, it also makes for a small performance improvement, as the 5-column hasLocationInfo predicate no longer needs to be materialized for all (Path)Nodes arising from instantiating the data flow library, we only ever need to materialize Location.hasLocationInfo.

The transformation is mostly straightforward, except for Go, where we need to account for the fact that not all existing tuples in Node.hasLocationInfo may correspond to an extracted location. This is handled by backing Location up by a newtype, which can represent both extracted and synthetic locations. In order to avoid a dependency on Node.hasLocationInfo in Locatable.getLocation, Locatable.getLocation returns not a Location, but instead a DbLocation, which can be cast to a Location, but does not extend Location.

@hvitved hvitved force-pushed the dataflow/get-location branch 2 times, most recently from 0dc9f0c to da423bb Compare March 8, 2024 08:59
@github-actions github-actions bot added the C++ label Mar 8, 2024
@asgerf
Copy link
Contributor

asgerf commented Mar 8, 2024

I just want to call out that this change will add more friction for the JS data flow migration, since JS data flow nodes have locations that don't correspond to any Location. I'm just wondering if this is change is unblocking something, in which case I'll just have to deal with it, but if it's mainly because it's feels "nicer" I'd appreciate it if we can try to hold off on breaking changes like this. 🙏

@hvitved hvitved force-pushed the dataflow/get-location branch 3 times, most recently from 802e254 to fe212db Compare March 11, 2024 09:08
@hvitved hvitved force-pushed the dataflow/get-location branch from fe212db to 1c19b09 Compare March 11, 2024 09:45
@hvitved hvitved force-pushed the dataflow/get-location branch from 555e907 to 01c1cca Compare March 11, 2024 13:43
@hvitved hvitved force-pushed the dataflow/get-location branch from 01c1cca to 73a59de Compare March 11, 2024 15:34
@hvitved hvitved force-pushed the dataflow/get-location branch 3 times, most recently from bfdafa8 to b0e5bd2 Compare March 12, 2024 12:55
@hvitved
Copy link
Contributor Author

hvitved commented Mar 12, 2024

I just want to call out that this change will add more friction for the JS data flow migration, since JS data flow nodes have locations that don't correspond to any Location. I'm just wondering if this is change is unblocking something, in which case I'll just have to deal with it, but if it's mainly because it's feels "nicer" I'd appreciate it if we can try to hold off on breaking changes like this. 🙏

As discussed offline, we agreed to go ahead with this change. The issue with missing Locations is the same in Go (see PR description).

@hvitved hvitved marked this pull request as ready for review March 12, 2024 13:54
@hvitved hvitved requested review from a team as code owners March 12, 2024 13:54
owen-mc
owen-mc previously approved these changes Mar 12, 2024
Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to say: the Go changes look good to me.

Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. One suggestion for the TLocation type for go.

Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, but I wonder whether the change to source- and sink-groups is intentional? They now have no location instead of having the empty location.

Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the Go changes, but everything else LGTM.

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 👍

@hvitved hvitved merged commit fc55567 into github:main Mar 18, 2024
@hvitved hvitved deleted the dataflow/get-location branch March 18, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants