Loop without side-effect is not eliminated. Leads to O(n) instead of O(1) runtime

Adding the follow method as part of a benchmark to library/alloc/benches/vec.rs

```rust
#[repr(transparent)]
pub struct Foo(usize);


#[inline(never)]
pub fn vec_cast(input: Vec<Foo>) -> Vec<usize> {
    input.into_iter().map(|e| unsafe { std::mem::transmute(e) }).collect()
}
```

which exercises [this specialization in Vec](https://github.com/rust-lang/rust/blob/5d5ff84130da0d74c6ece368dbe821d8f83fa526/library/alloc/src/vec.rs#L2451-L2497)

results in the following assembly (extracted with objdump):

```assembly
0000000000086130 <collectionsbenches::vec::vec_cast>:
   86130:	48 8b 0e             	mov    (%rsi),%rcx
   86133:	48 89 f8             	mov    %rdi,%rax
   86136:	48 8b 56 08          	mov    0x8(%rsi),%rdx
   8613a:	48 8b 7e 10          	mov    0x10(%rsi),%rdi
   8613e:	48 89 ce             	mov    %rcx,%rsi
   86141:	48 85 ff             	test   %rdi,%rdi
   86144:	74 10                	je     86156 <collectionsbenches::vec::vec_cast+0x26>
   86146:	48 8d 34 f9          	lea    (%rcx,%rdi,8),%rsi
   8614a:	48 c1 e7 03          	shl    $0x3,%rdi
   8614e:	66 90                	xchg   %ax,%ax
   86150:	48 83 c7 f8          	add    $0xfffffffffffffff8,%rdi ; <= RDI unused from here onwards
   86154:	75 fa                	jne    86150 <collectionsbenches::vec::vec_cast+0x20>
   86156:	48 29 ce             	sub    %rcx,%rsi
   86159:	48 89 08             	mov    %rcx,(%rax)
   8615c:	48 89 50 08          	mov    %rdx,0x8(%rax)
   86160:	48 c1 fe 03          	sar    $0x3,%rsi
   86164:	48 89 70 10          	mov    %rsi,0x10(%rax)
   86168:	c3                   	retq   
```

The ghidra decompile for the same function (comments are mine):

```C
void collectionsbenches::vec::vec_cast(long *param_1,long *param_2)

{
  long lVar1;
  long lVar2;
  long lVar3;
  long lVar4;
  
  lVar1 = *param_2; // pointer
  lVar2 = param_2[1]; // capacity
  lVar4 = param_2[2]; // len
  lVar3 = lVar1;
  if (lVar4 != 0) {
    lVar3 = lVar1 + lVar4 * 8; // end pointer of vec::IntoIter
    lVar4 = lVar4 << 3; // len in bytes
    do {
      lVar4 = lVar4 + -8;
    } while (lVar4 != 0); // <== lVar4 unused from here onwards
  }
  *param_1 = lVar1; // pointer
  param_1[1] = lVar2; // capacity
  param_1[2] = lVar3 - lVar1 >> 3; // len from pointer difference
  return;
}
```

Note the useless loop.

The number of loop iterations (or rather the pointer increments) is needed to calculate the new length of the output `Vec`. LLVM already manages to hoist `lVar3 = lVar1 + lVar4 * 8;` but then it fails to eliminate the now-useless loop.

The issue does not occur if one uses `input.into_iter().flat_map(|e| None).collect()` instead, which always results in length == 0.

I tried several variations of the loop (e.g. replacing `try_fold` with a simple `while let Some() ...`) but it generally results in the same or worse assembly.

Note: The assembly looks somewhat different if I [run this on godbolt](https://rust.godbolt.org/z/cv1zKM7jn) but the decrementing loop without side-effect is still there. I assume the differences are due to LTO or some other compiler settings.

Tested on commit a1a13b2bc4f 2020-11-21 22:46

@rustbot modify labels: +I-slow

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Loop without side-effect is not eliminated. Leads to O(n) instead of O(1) runtime #79308

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Loop without side-effect is not eliminated. Leads to O(n) instead of O(1) runtime #79308

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions