Missed combining shr and shrx in collatz_f1() #137983

Open
@BreadTom

See godbolt and GCC bug.

#include <stdint.h>
#include <stdbool.h>

uint64_t
collatz_onlyoddstep (uint64_t oddnum)
{
  return (3 * oddnum + 1);
}

uint64_t
collatz_oddstep (uint64_t oddnum)
{
  return (3 * oddnum + 1) / 2;
}

uint64_t
collatz_div2tillodd (uint64_t num)
{
  num >>= __builtin_ctzg (num);
  return num;
}

uint64_t
collatz_f0 (uint64_t oddnum)
{
  oddnum = collatz_onlyoddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}

uint64_t
collatz_f1 (uint64_t oddnum)
{
  oddnum = collatz_oddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}

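collatz_f1() compiles to shr, then tzcnt, then shrx.
collatz_f0() compiles to only tzcnt, then shrx.

For odd oddnum, 3 * oddnum + 1 is even, so both functions return the same value and the leading shr in collatz_f1() is redundant.

In my testing, collatz_f0() was about 10% faster than collatz_f1().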
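
To make the claimed redundancy concrete, here is a minimal sketch that compares the two functions over a range of odd inputs. It is not part of the original report: count_mismatches is a hypothetical helper, and __builtin_ctzll is swapped in for __builtin_ctzg on the assumption that it is accepted by more compiler versions.

```c
#include <stdint.h>

/* Same arithmetic as the functions in the report, but using
   __builtin_ctzll instead of __builtin_ctzg (an assumption made
   here so that older GCC and Clang releases accept the code).  */
static uint64_t
div2tillodd (uint64_t num)
{
  /* Undefined for num == 0, exactly like the original.  */
  return num >> __builtin_ctzll (num);
}

uint64_t
collatz_f0 (uint64_t oddnum)
{
  return div2tillodd (3 * oddnum + 1);
}

uint64_t
collatz_f1 (uint64_t oddnum)
{
  return div2tillodd ((3 * oddnum + 1) / 2);
}

/* Hypothetical helper: counts the odd inputs in [1, limit] for which
   the two functions disagree.  limit is assumed to be far below
   UINT64_MAX so that neither n nor 3 * n + 1 wraps around.  */
uint64_t
count_mismatches (uint64_t limit)
{
  uint64_t mismatches = 0;
  for (uint64_t n = 1; n <= limit; n += 2)
    if (collatz_f0 (n) != collatz_f1 (n))
      mismatches++;
  return mismatches;
}
```

The equivalence holds only for odd inputs. For an even input such as 2, collatz_f0 returns 7 while collatz_f1 returns 3, which suggests the shr can only be folded away when the compiler knows the low bit of the input is set.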
