Skip to content

Migrate elementwise_util callers to the variants with out_dtypes in template arguments #9841

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 68 commits into from
Apr 23, 2025

Conversation

swolchok
Copy link
Contributor

@swolchok swolchok commented Apr 2, 2025

This is necessary to take advantage of the previous PR #9388, which creates dtype-specialized implementations for the non-mixed dtype case.

The size cost of this approach looking at __TEXT of size_test_all_optimized_ops built with bash test/build_optimized_size_test.sh is 104212 bytes, a 2.2% increase.

swolchok added 30 commits March 18, 2025 17:32
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@swolchok
Copy link
Contributor Author

swolchok commented Apr 2, 2025

Replaces #9741 due to ghstack issues. for your convenience @manuelcandales you reviewed that one already

[ghstack-poisoned]
[ghstack-poisoned]
@swolchok
Copy link
Contributor Author

pretty clear flake and this has been green previously today, merging soon

Base automatically changed from gh/swolchok/383/head to main April 23, 2025 01:05
@swolchok swolchok merged commit b01c7de into main Apr 23, 2025
76 of 83 checks passed
@swolchok swolchok deleted the gh/swolchok/418/head branch April 23, 2025 01:05
jackzhxng added a commit that referenced this pull request Apr 23, 2025
swolchok added a commit that referenced this pull request Apr 25, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```


ghstack-source-id: 9c6751f
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
swolchok added a commit that referenced this pull request Apr 25, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```


ghstack-source-id: 58e2b05
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants