Skip to content

Fix windows builds & add some travis windows builds #592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 10, 2018

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Nov 9, 2018

i686-pc-windows-gnu is still not working on travis, but everything else should be green.

@@ -1109,7 +1111,8 @@ pub unsafe fn _mm256_extractf128_pd(a: __m256d, imm8: i32) -> __m128d {
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_extractf128_si256)
#[inline]
#[target_feature(enable = "avx")]
#[cfg_attr(test, assert_instr(vextractf128, imm8 = 1))]
#[cfg_attr(all(test, not(target_os = "windows")),
assert_instr(vextractf128, imm8 = 1))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm do you know why these tests need to be ignored on Windows? I haven't had a chance to dig in yet but this seems somewhat bad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are just getting different codegen on windows AFAICT. What was being generated looked "ok".

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 9, 2018

By ok I mean that the fma intrinsics have a number (e.g. 213 or 123) indicating in which xmm{1-3} registers each argument is, and on windows a different one was being generated than on linux. For all others, extracts / inserts it appears that on windows things are being read/written to memory instead of to vectors.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 9, 2018

For example:

---- coresimd::x86::avx::assert__mm256_extractf128_si256_vextractf128 stdout ----
disassembly for coresimd::coresimd::x86::avx::assert__mm256_extractf128_si256_vextractf128::_mm256_extractf128_si256_shim: 
	 0: lea rax,[140150599h] 
	 1: mov rdx,qword ptr [__imp__ZN12stdsimd_test11_DONT_DEDUP17hd862070486ee008dE] 
	 2: mov qword ptr [rdx],rax 
	 3: mov qword ptr [rdx+8],49h 
	 4: vmovaps xmm0,xmmword ptr [rcx+10h] 
	 5: ret 
	 6: � 
thread 'coresimd::x86::avx::assert__mm256_extractf128_si256_vextractf128' panicked at 'failed to find instruction `vextractf128` in the disassembly', crates\stdsimd-test\src\lib.rs:163:9

Here it appears that the vector register is just directly read to memory instead of using vextractf128. I don't know why this is, I suspect that the vector arguments are passed somewhat differently on windows, and llvm decided that doing this is better than using vextractf128. The run-time tests still run and pass.

@alexcrichton
Copy link
Member

Hm so in theory that should have all been fixed by this which tries to ensure the ABI has the registers coming in by-value so LLVM doesn't try to do anything tricky with optimizing the fact that they're already in memory.

It may be though that the ABI isn't what we want or that it was tweaked perhaps. In any case this seems fine by me, it seems highly unlikely anyway to have a Windows-specific bug here

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 9, 2018

@alexcrichton do you have a codegen tests for that windows vectorcall ABI thing ? (I mean in rustc).

@alexcrichton
Copy link
Member

Ah not in rustc afaik, apart from the standard "if we specify this does LLVM not crash?"

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 9, 2018

I don't have a windows machine to test this, but maybe vectorcall is just not working correctly here ?

On godbolt things just work without any ABI: https://rust.godbolt.org/z/Kkyzl2 - and they work better with vectorcall: https://rust.godbolt.org/z/ofFtAy

@alexcrichton
Copy link
Member

Oh hey looks like this may work! -- https://rust.godbolt.org/z/GARM_K

@alexcrichton
Copy link
Member

Oh I should mention the ABI isn't required, the problem is that the ABI for Windows (I think) differs from the ABI for Unix, where Unix passes more things in registers than Windows. The fact that the Windows default ABI passes more things via memory causes the codegen differences on Windows/Unix. The attempt with vectorcall was to change ABIs on Windows to something that passes more arguments via registers, but it looks like that didn't work. The sysv64 ABI though may work!

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 9, 2018

The sysv64 ABI though may work!

I've pushed a commit using it instead of vectorcall, we'll see what happens. Is there a list of ABIs that we support ?

@alexcrichton
Copy link
Member

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 9, 2018

LLVM ERROR: stack allocation size is not a multiple of 8

:(

@alexcrichton
Copy link
Member

alas!

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 10, 2018

so I've reverted the last commit :/

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 10, 2018

I've opened #594 to track this.

@gnzlbg gnzlbg merged commit 53537b3 into rust-lang:master Nov 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants