Skip to content

Reapply #9841: Migrate elementwise_util callers to the variants with out_dtypes in template arguments #10491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: gh/swolchok/430/head
Choose a base branch
from

Conversation

swolchok
Copy link
Contributor

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Apr 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10491

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3842103 with merge base 1bd7260 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Apr 25, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```


ghstack-source-id: 9c6751f
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 25, 2025
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Apr 25, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```


ghstack-source-id: 58e2b05
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
@swolchok
Copy link
Contributor Author

internal diff number for size check on this stack is D73691545

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants