
libextra takes a very long time to build #6820

Closed
@Aatch

Most of this time is spent in the LLVM passes, with the following breakdown:

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 6.9833 seconds (6.7659 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.7633 ( 26.6%)   0.1033 ( 29.8%)   1.8667 ( 26.7%)   1.8131 ( 26.8%)  Instruction Selection
   1.6800 ( 25.3%)   0.0867 ( 25.0%)   1.7667 ( 25.3%)   1.6579 ( 24.5%)  Instruction Scheduling
   0.9167 ( 13.8%)   0.0567 ( 16.3%)   0.9733 ( 13.9%)   0.9297 ( 13.7%)  Instruction Creation
   0.7833 ( 11.8%)   0.0233 (  6.7%)   0.8067 ( 11.6%)   0.8096 ( 12.0%)  DAG Combining 1
   0.4667 (  7.0%)   0.0267 (  7.7%)   0.4933 (  7.1%)   0.5465 (  8.1%)  DAG Legalization
   0.4567 (  6.9%)   0.0133 (  3.8%)   0.4700 (  6.7%)   0.4290 (  6.3%)  DAG Combining 2
   0.3900 (  5.9%)   0.0200 (  5.8%)   0.4100 (  5.9%)   0.3805 (  5.6%)  Type Legalization
   0.0967 (  1.5%)   0.0167 (  4.8%)   0.1133 (  1.6%)   0.0888 (  1.3%)  Instruction Scheduling Cleanup
   0.0500 (  0.8%)   0.0000 (  0.0%)   0.0500 (  0.7%)   0.0671 (  1.0%)  Vector Legalization
   0.0333 (  0.5%)   0.0000 (  0.0%)   0.0333 (  0.5%)   0.0438 (  0.6%)  DAG Combining after legalize types
   6.6367 (100.0%)   0.3467 (100.0%)   6.9833 (100.0%)   6.7659 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.2067 seconds (0.2064 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2000 ( 98.4%)   0.0033 (100.0%)   0.2033 ( 98.4%)   0.2007 ( 97.2%)  DWARF Exception Writer
   0.0033 (  1.6%)   0.0000 (  0.0%)   0.0033 (  1.6%)   0.0057 (  2.8%)  DWARF Debug Writer
   0.2033 (100.0%)   0.0033 (100.0%)   0.2067 (100.0%)   0.2064 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 595.2133 seconds (595.9404 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  344.4500 ( 58.6%)   0.0133 (  0.2%)  344.4633 ( 57.9%)  344.8825 ( 57.9%)  Loop Invariant Code Motion
  90.9533 ( 15.5%)   2.7400 ( 35.9%)  93.6933 ( 15.7%)  93.8664 ( 15.8%)  Value Propagation
  82.5733 ( 14.1%)   2.4733 ( 32.4%)  85.0467 ( 14.3%)  85.1964 ( 14.3%)  Value Propagation
  10.7167 (  1.8%)   0.5633 (  7.4%)  11.2800 (  1.9%)  11.2990 (  1.9%)  X86 DAG->DAG Instruction Selection
   7.8900 (  1.3%)   0.0200 (  0.3%)   7.9100 (  1.3%)   7.9360 (  1.3%)  Dead Store Elimination
   6.7900 (  1.2%)   0.1200 (  1.6%)   6.9100 (  1.2%)   6.9070 (  1.2%)  Global Value Numbering
   4.1200 (  0.7%)   0.0333 (  0.4%)   4.1533 (  0.7%)   4.1638 (  0.7%)  Lazy Value Information Analysis
   3.8533 (  0.7%)   0.1333 (  1.7%)   3.9866 (  0.7%)   3.9785 (  0.7%)  Function Integration/Inlining
   3.8600 (  0.7%)   0.1667 (  2.2%)   4.0267 (  0.7%)   3.9677 (  0.7%)  Combine redundant instructions
   3.7167 (  0.6%)   0.0267 (  0.3%)   3.7433 (  0.6%)   3.7838 (  0.6%)  Lazy Value Information Analysis
   3.4300 (  0.6%)   0.0500 (  0.7%)   3.4800 (  0.6%)   3.5477 (  0.6%)  Combine redundant instructions
   2.8533 (  0.5%)   0.0800 (  1.0%)   2.9333 (  0.5%)   2.9895 (  0.5%)  Combine redundant instructions
   1.8967 (  0.3%)   0.3433 (  4.5%)   2.2400 (  0.4%)   2.2682 (  0.4%)  Dominator Tree Construction
   1.7967 (  0.3%)   0.2333 (  3.1%)   2.0300 (  0.3%)   2.0185 (  0.3%)  Simplify the CFG
   1.9900 (  0.3%)   0.0500 (  0.7%)   2.0400 (  0.3%)   2.0104 (  0.3%)  Combine redundant instructions
   1.2500 (  0.2%)   0.0667 (  0.9%)   1.3167 (  0.2%)   1.3386 (  0.2%)  X86 Assembly / Object Emitter
   1.2533 (  0.2%)   0.0200 (  0.3%)   1.2733 (  0.2%)   1.2525 (  0.2%)  Fast Register Allocator
   1.1433 (  0.2%)   0.0133 (  0.2%)   1.1567 (  0.2%)   1.1257 (  0.2%)  MemCpy Optimization
   0.9367 (  0.2%)   0.0133 (  0.2%)   0.9500 (  0.2%)   0.9755 (  0.2%)  Early CSE
   0.8500 (  0.1%)   0.0100 (  0.1%)   0.8600 (  0.1%)   0.8805 (  0.1%)  Early CSE
   0.7400 (  0.1%)   0.0567 (  0.7%)   0.7967 (  0.1%)   0.7889 (  0.1%)  Combine redundant instructions
   0.6600 (  0.1%)   0.0167 (  0.2%)   0.6767 (  0.1%)   0.5870 (  0.1%)  Reassociate expressions
   0.5967 (  0.1%)   0.0133 (  0.2%)   0.6100 (  0.1%)   0.5797 (  0.1%)  Sparse Conditional Constant Propagation
   0.5600 (  0.1%)   0.0100 (  0.1%)   0.5700 (  0.1%)   0.5748 (  0.1%)  Module Verifier
   0.4800 (  0.1%)   0.0067 (  0.1%)   0.4867 (  0.1%)   0.5019 (  0.1%)  Jump Threading
   0.5000 (  0.1%)   0.0133 (  0.2%)   0.5133 (  0.1%)   0.5009 (  0.1%)  Jump Threading
   0.4300 (  0.1%)   0.0100 (  0.1%)   0.4400 (  0.1%)   0.4926 (  0.1%)  Module Verifier
   0.4133 (  0.1%)   0.0133 (  0.2%)   0.4267 (  0.1%)   0.4251 (  0.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.3567 (  0.1%)   0.0367 (  0.5%)   0.3933 (  0.1%)   0.4110 (  0.1%)  Induction Variable Simplification
   0.3333 (  0.1%)   0.0100 (  0.1%)   0.3433 (  0.1%)   0.3968 (  0.1%)  Remove redundant instructions
   0.3867 (  0.1%)   0.0033 (  0.0%)   0.3900 (  0.1%)   0.3657 (  0.1%)  Aggressive Dead Code Elimination
   0.3533 (  0.1%)   0.0167 (  0.2%)   0.3700 (  0.1%)   0.3580 (  0.1%)  Natural Loop Information
   0.2233 (  0.0%)   0.0100 (  0.1%)   0.2333 (  0.0%)   0.2630 (  0.0%)  Dominator Tree Construction
   0.3033 (  0.1%)   0.0033 (  0.0%)   0.3067 (  0.1%)   0.2572 (  0.0%)  Dominator Tree Construction
   0.2333 (  0.0%)   0.0100 (  0.1%)   0.2433 (  0.0%)   0.2502 (  0.0%)  Simplify the CFG
   0.2633 (  0.0%)   0.0000 (  0.0%)   0.2633 (  0.0%)   0.2483 (  0.0%)  Two-Address instruction pass
   0.2167 (  0.0%)   0.0067 (  0.1%)   0.2233 (  0.0%)   0.2252 (  0.0%)  Interprocedural Sparse Conditional Constant Propagation
   0.2400 (  0.0%)   0.0133 (  0.2%)   0.2533 (  0.0%)   0.2229 (  0.0%)  Natural Loop Information
   0.2200 (  0.0%)   0.0067 (  0.1%)   0.2267 (  0.0%)   0.2041 (  0.0%)  Machine Function Analysis
   0.1767 (  0.0%)   0.0267 (  0.3%)   0.2033 (  0.0%)   0.1948 (  0.0%)  Early CSE
   0.2067 (  0.0%)   0.0033 (  0.0%)   0.2100 (  0.0%)   0.1879 (  0.0%)  Loop-Closed SSA Form Pass
   0.1667 (  0.0%)   0.0000 (  0.0%)   0.1667 (  0.0%)   0.1751 (  0.0%)  Dominator Tree Construction
   0.1633 (  0.0%)   0.0000 (  0.0%)   0.1633 (  0.0%)   0.1733 (  0.0%)  Simplify the CFG
   0.1867 (  0.0%)   0.0000 (  0.0%)   0.1867 (  0.0%)   0.1629 (  0.0%)  Loop-Closed SSA Form Pass
   0.1533 (  0.0%)   0.0067 (  0.1%)   0.1600 (  0.0%)   0.1571 (  0.0%)  Loop-Closed SSA Form Pass
   0.1367 (  0.0%)   0.0033 (  0.0%)   0.1400 (  0.0%)   0.1532 (  0.0%)  Loop-Closed SSA Form Pass
   0.1067 (  0.0%)   0.0033 (  0.0%)   0.1100 (  0.0%)   0.1364 (  0.0%)  Canonicalize natural loops
   0.1333 (  0.0%)   0.0000 (  0.0%)   0.1333 (  0.0%)   0.1325 (  0.0%)  Global Variable Optimizer
   0.1100 (  0.0%)   0.0267 (  0.3%)   0.1367 (  0.0%)   0.1225 (  0.0%)  Module Verifier
   0.1100 (  0.0%)   0.0033 (  0.0%)   0.1133 (  0.0%)   0.1223 (  0.0%)  X86 FP Stackifier
   0.1400 (  0.0%)   0.0033 (  0.0%)   0.1433 (  0.0%)   0.1222 (  0.0%)  Dominator Tree Construction
   0.1033 (  0.0%)   0.0100 (  0.1%)   0.1133 (  0.0%)   0.1170 (  0.0%)  Unroll loops
   0.1167 (  0.0%)   0.0000 (  0.0%)   0.1167 (  0.0%)   0.1162 (  0.0%)  Dead Global Elimination
   0.1367 (  0.0%)   0.0000 (  0.0%)   0.1367 (  0.0%)   0.1149 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.1000 (  0.0%)   0.0100 (  0.1%)   0.1100 (  0.0%)   0.1107 (  0.0%)  Dominator Tree Construction
   0.0967 (  0.0%)   0.0000 (  0.0%)   0.0967 (  0.0%)   0.1028 (  0.0%)  Simplify the CFG
   0.0900 (  0.0%)   0.0133 (  0.2%)   0.1033 (  0.0%)   0.1023 (  0.0%)  Deduce function attributes
   0.0767 (  0.0%)   0.0067 (  0.1%)   0.0833 (  0.0%)   0.0962 (  0.0%)  Dominator Tree Construction
   0.1067 (  0.0%)   0.0067 (  0.1%)   0.1133 (  0.0%)   0.0935 (  0.0%)  Canonicalize natural loops
   0.1000 (  0.0%)   0.0067 (  0.1%)   0.1067 (  0.0%)   0.0899 (  0.0%)  Loop-Closed SSA Form Pass
   0.0767 (  0.0%)   0.0000 (  0.0%)   0.0767 (  0.0%)   0.0744 (  0.0%)  Dead Argument Elimination
   0.0500 (  0.0%)   0.0033 (  0.0%)   0.0533 (  0.0%)   0.0663 (  0.0%)  No target information
   0.0533 (  0.0%)   0.0000 (  0.0%)   0.0533 (  0.0%)   0.0614 (  0.0%)  Remove unused exception handling info
   0.0367 (  0.0%)   0.0000 (  0.0%)   0.0367 (  0.0%)   0.0613 (  0.0%)  Simplify well-known library calls
   0.0533 (  0.0%)   0.0067 (  0.1%)   0.0600 (  0.0%)   0.0612 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0767 (  0.0%)   0.0000 (  0.0%)   0.0767 (  0.0%)   0.0610 (  0.0%)  Unswitch loops
   0.0567 (  0.0%)   0.0000 (  0.0%)   0.0567 (  0.0%)   0.0572 (  0.0%)  Basic CallGraph Construction
   0.0433 (  0.0%)   0.0033 (  0.0%)   0.0467 (  0.0%)   0.0550 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0433 (  0.0%)   0.0067 (  0.1%)   0.0500 (  0.0%)   0.0547 (  0.0%)  Canonicalize natural loops
   0.0533 (  0.0%)   0.0000 (  0.0%)   0.0533 (  0.0%)   0.0487 (  0.0%)  Tail Call Elimination
   0.0467 (  0.0%)   0.0000 (  0.0%)   0.0467 (  0.0%)   0.0433 (  0.0%)  Expand ISel Pseudo-instructions
   0.0367 (  0.0%)   0.0033 (  0.0%)   0.0400 (  0.0%)   0.0405 (  0.0%)  Canonicalize natural loops
   0.0567 (  0.0%)   0.0033 (  0.0%)   0.0600 (  0.0%)   0.0395 (  0.0%)  Memory Dependence Analysis
   0.0400 (  0.0%)   0.0033 (  0.0%)   0.0433 (  0.0%)   0.0384 (  0.0%)  Remove unreachable blocks from the CFG
   0.0467 (  0.0%)   0.0067 (  0.1%)   0.0533 (  0.0%)   0.0355 (  0.0%)  Scalar Evolution Analysis
   0.0200 (  0.0%)   0.0033 (  0.0%)   0.0233 (  0.0%)   0.0300 (  0.0%)  Memory Dependence Analysis
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0271 (  0.0%)  Scalar Evolution Analysis
   0.0300 (  0.0%)   0.0033 (  0.0%)   0.0333 (  0.0%)   0.0265 (  0.0%)  Bundle Machine CFG Edges
   0.0167 (  0.0%)   0.0067 (  0.1%)   0.0233 (  0.0%)   0.0227 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0200 (  0.0%)   0.0067 (  0.1%)   0.0267 (  0.0%)   0.0225 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0213 (  0.0%)  Memory Dependence Analysis
   0.0067 (  0.0%)   0.0000 (  0.0%)   0.0067 (  0.0%)   0.0170 (  0.0%)  Preliminary module verification
   0.0100 (  0.0%)   0.0033 (  0.0%)   0.0133 (  0.0%)   0.0160 (  0.0%)  Preliminary module verification
   0.0067 (  0.0%)   0.0067 (  0.1%)   0.0133 (  0.0%)   0.0151 (  0.0%)  Delete dead loops
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0123 (  0.0%)  Exception handling preparation
   0.0167 (  0.0%)   0.0000 (  0.0%)   0.0167 (  0.0%)   0.0121 (  0.0%)  Rotate Loops
   0.0067 (  0.0%)   0.0000 (  0.0%)   0.0067 (  0.0%)   0.0113 (  0.0%)  Inline Cost Analysis
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0105 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0067 (  0.0%)   0.0033 (  0.0%)   0.0100 (  0.0%)   0.0104 (  0.0%)  X86 Target Transform Info
   0.0067 (  0.0%)   0.0033 (  0.0%)   0.0100 (  0.0%)   0.0103 (  0.0%)  Target independent code generator's TTI
   0.0100 (  0.0%)   0.0000 (  0.0%)   0.0100 (  0.0%)   0.0102 (  0.0%)  No target information
   0.0067 (  0.0%)   0.0000 (  0.0%)   0.0067 (  0.0%)   0.0063 (  0.0%)  Insert stack protectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0058 (  0.0%)  Unroll loops
   0.0000 (  0.0%)   0.0033 (  0.0%)   0.0033 (  0.0%)   0.0054 (  0.0%)  Preliminary module verification
   0.0067 (  0.0%)   0.0000 (  0.0%)   0.0067 (  0.0%)   0.0043 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0042 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)  Recognize loop idioms
   0.0033 (  0.0%)   0.0000 (  0.0%)   0.0033 (  0.0%)   0.0040 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0033 (  0.0%)   0.0033 (  0.0%)   0.0036 (  0.0%)  Target Library Information
   0.0033 (  0.0%)   0.0000 (  0.0%)   0.0033 (  0.0%)   0.0018 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  Strip Unused Function Prototypes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)  Merge Duplicate Global Constants
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
  587.5766 (100.0%)   7.6367 (100.0%)  595.2133 (100.0%)  595.9404 (100.0%)  Total

That is nearly 10 minutes in total (595.9 s wall clock). And yes, over 5 minutes of it (344.9 s, about 58%) is spent in Loop Invariant Code Motion.
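These figures can be checked mechanically. The sketch below is a minimal Rust example. It assumes the five-column layout of the pass execution timing report above: four `value ( pct%)` columns, then the pass name. It sums wall-clock time per pass name, so a pass that appears more than once, such as the two Value Propagation entries, is counted together. The helper names `parse_row`, `aggregate` and `total_wall` are hypothetical and are not part of rustc or LLVM.

```rust
use std::collections::BTreeMap;

/// Parse one data row of the timing report (assumed layout: four
/// "value ( pct%)" columns followed by the pass name).
/// Returns (pass name, wall-clock seconds), or None for headers and blank lines.
fn parse_row(line: &str) -> Option<(String, f64)> {
    // splitn keeps any ')' inside the pass name itself,
    // e.g. "No Alias Analysis (always returns 'may' alias)".
    let parts: Vec<&str> = line.splitn(5, ')').collect();
    if parts.len() != 5 {
        return None;
    }
    let name = parts[4].trim().to_string();
    // The fourth column is the wall time: the number before its '('.
    let wall = parts[3].split('(').next()?.trim().parse::<f64>().ok()?;
    Some((name, wall))
}

/// Sum wall-clock seconds per pass name, skipping the "Total" row.
fn aggregate(report: &str) -> BTreeMap<String, f64> {
    let mut totals = BTreeMap::new();
    for (name, wall) in report.lines().filter_map(parse_row) {
        if name == "Total" {
            continue;
        }
        *totals.entry(name).or_insert(0.0) += wall;
    }
    totals
}

/// Wall-clock seconds reported on the "Total" row, if present.
fn total_wall(report: &str) -> Option<f64> {
    report
        .lines()
        .filter_map(parse_row)
        .find(|(name, _)| name == "Total")
        .map(|(_, wall)| wall)
}

fn main() {
    // A few rows copied from the pass execution timing report above.
    let report = "\
  344.4500 ( 58.6%)   0.0133 (  0.2%)  344.4633 ( 57.9%)  344.8825 ( 57.9%)  Loop Invariant Code Motion
   90.9533 ( 15.5%)   2.7400 ( 35.9%)  93.6933 ( 15.7%)  93.8664 ( 15.8%)  Value Propagation
   82.5733 ( 14.1%)   2.4733 ( 32.4%)  85.0467 ( 14.3%)  85.1964 ( 14.3%)  Value Propagation
    0.0533 (  0.0%)   0.0067 (  0.1%)   0.0600 (  0.0%)   0.0612 (  0.0%)  No Alias Analysis (always returns 'may' alias)
  587.5766 (100.0%)   7.6367 (100.0%)  595.2133 (100.0%)  595.9404 (100.0%)  Total";

    let totals = aggregate(report);
    // Rank passes by accumulated wall-clock time, largest first.
    let mut ranked: Vec<(&String, &f64)> = totals.iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(a.1).unwrap());
    for (name, secs) in &ranked {
        println!("{:>9.2}s  {}", secs, name);
    }

    let total = total_wall(report).expect("report has a Total row");
    let licm = totals["Loop Invariant Code Motion"];
    println!("total wall time: {:.1} min", total / 60.0);
    println!("LICM: {:.1} min ({:.1}% of total)", licm / 60.0, 100.0 * licm / total);
}
```

To analyse the whole report, paste every row into `report`. The header and "Total Execution Time" lines are skipped automatically because they do not contain exactly four `)`-terminated columns.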

Part of #6819
