Closed
Description
This time is mostly spent in the LLVM passes. The breakdown:
===-------------------------------------------------------------------------===
Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
Total Execution Time: 6.9833 seconds (6.7659 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.7633 ( 26.6%) 0.1033 ( 29.8%) 1.8667 ( 26.7%) 1.8131 ( 26.8%) Instruction Selection
1.6800 ( 25.3%) 0.0867 ( 25.0%) 1.7667 ( 25.3%) 1.6579 ( 24.5%) Instruction Scheduling
0.9167 ( 13.8%) 0.0567 ( 16.3%) 0.9733 ( 13.9%) 0.9297 ( 13.7%) Instruction Creation
0.7833 ( 11.8%) 0.0233 ( 6.7%) 0.8067 ( 11.6%) 0.8096 ( 12.0%) DAG Combining 1
0.4667 ( 7.0%) 0.0267 ( 7.7%) 0.4933 ( 7.1%) 0.5465 ( 8.1%) DAG Legalization
0.4567 ( 6.9%) 0.0133 ( 3.8%) 0.4700 ( 6.7%) 0.4290 ( 6.3%) DAG Combining 2
0.3900 ( 5.9%) 0.0200 ( 5.8%) 0.4100 ( 5.9%) 0.3805 ( 5.6%) Type Legalization
0.0967 ( 1.5%) 0.0167 ( 4.8%) 0.1133 ( 1.6%) 0.0888 ( 1.3%) Instruction Scheduling Cleanup
0.0500 ( 0.8%) 0.0000 ( 0.0%) 0.0500 ( 0.7%) 0.0671 ( 1.0%) Vector Legalization
0.0333 ( 0.5%) 0.0000 ( 0.0%) 0.0333 ( 0.5%) 0.0438 ( 0.6%) DAG Combining after legalize types
6.6367 (100.0%) 0.3467 (100.0%) 6.9833 (100.0%) 6.7659 (100.0%) Total
===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.2067 seconds (0.2064 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2000 ( 98.4%) 0.0033 (100.0%) 0.2033 ( 98.4%) 0.2007 ( 97.2%) DWARF Exception Writer
0.0033 ( 1.6%) 0.0000 ( 0.0%) 0.0033 ( 1.6%) 0.0057 ( 2.8%) DWARF Debug Writer
0.2033 (100.0%) 0.0033 (100.0%) 0.2067 (100.0%) 0.2064 (100.0%) Total
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 595.2133 seconds (595.9404 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
344.4500 ( 58.6%) 0.0133 ( 0.2%) 344.4633 ( 57.9%) 344.8825 ( 57.9%) Loop Invariant Code Motion
90.9533 ( 15.5%) 2.7400 ( 35.9%) 93.6933 ( 15.7%) 93.8664 ( 15.8%) Value Propagation
82.5733 ( 14.1%) 2.4733 ( 32.4%) 85.0467 ( 14.3%) 85.1964 ( 14.3%) Value Propagation
10.7167 ( 1.8%) 0.5633 ( 7.4%) 11.2800 ( 1.9%) 11.2990 ( 1.9%) X86 DAG->DAG Instruction Selection
7.8900 ( 1.3%) 0.0200 ( 0.3%) 7.9100 ( 1.3%) 7.9360 ( 1.3%) Dead Store Elimination
6.7900 ( 1.2%) 0.1200 ( 1.6%) 6.9100 ( 1.2%) 6.9070 ( 1.2%) Global Value Numbering
4.1200 ( 0.7%) 0.0333 ( 0.4%) 4.1533 ( 0.7%) 4.1638 ( 0.7%) Lazy Value Information Analysis
3.8533 ( 0.7%) 0.1333 ( 1.7%) 3.9866 ( 0.7%) 3.9785 ( 0.7%) Function Integration/Inlining
3.8600 ( 0.7%) 0.1667 ( 2.2%) 4.0267 ( 0.7%) 3.9677 ( 0.7%) Combine redundant instructions
3.7167 ( 0.6%) 0.0267 ( 0.3%) 3.7433 ( 0.6%) 3.7838 ( 0.6%) Lazy Value Information Analysis
3.4300 ( 0.6%) 0.0500 ( 0.7%) 3.4800 ( 0.6%) 3.5477 ( 0.6%) Combine redundant instructions
2.8533 ( 0.5%) 0.0800 ( 1.0%) 2.9333 ( 0.5%) 2.9895 ( 0.5%) Combine redundant instructions
1.8967 ( 0.3%) 0.3433 ( 4.5%) 2.2400 ( 0.4%) 2.2682 ( 0.4%) Dominator Tree Construction
1.7967 ( 0.3%) 0.2333 ( 3.1%) 2.0300 ( 0.3%) 2.0185 ( 0.3%) Simplify the CFG
1.9900 ( 0.3%) 0.0500 ( 0.7%) 2.0400 ( 0.3%) 2.0104 ( 0.3%) Combine redundant instructions
1.2500 ( 0.2%) 0.0667 ( 0.9%) 1.3167 ( 0.2%) 1.3386 ( 0.2%) X86 Assembly / Object Emitter
1.2533 ( 0.2%) 0.0200 ( 0.3%) 1.2733 ( 0.2%) 1.2525 ( 0.2%) Fast Register Allocator
1.1433 ( 0.2%) 0.0133 ( 0.2%) 1.1567 ( 0.2%) 1.1257 ( 0.2%) MemCpy Optimization
0.9367 ( 0.2%) 0.0133 ( 0.2%) 0.9500 ( 0.2%) 0.9755 ( 0.2%) Early CSE
0.8500 ( 0.1%) 0.0100 ( 0.1%) 0.8600 ( 0.1%) 0.8805 ( 0.1%) Early CSE
0.7400 ( 0.1%) 0.0567 ( 0.7%) 0.7967 ( 0.1%) 0.7889 ( 0.1%) Combine redundant instructions
0.6600 ( 0.1%) 0.0167 ( 0.2%) 0.6767 ( 0.1%) 0.5870 ( 0.1%) Reassociate expressions
0.5967 ( 0.1%) 0.0133 ( 0.2%) 0.6100 ( 0.1%) 0.5797 ( 0.1%) Sparse Conditional Constant Propagation
0.5600 ( 0.1%) 0.0100 ( 0.1%) 0.5700 ( 0.1%) 0.5748 ( 0.1%) Module Verifier
0.4800 ( 0.1%) 0.0067 ( 0.1%) 0.4867 ( 0.1%) 0.5019 ( 0.1%) Jump Threading
0.5000 ( 0.1%) 0.0133 ( 0.2%) 0.5133 ( 0.1%) 0.5009 ( 0.1%) Jump Threading
0.4300 ( 0.1%) 0.0100 ( 0.1%) 0.4400 ( 0.1%) 0.4926 ( 0.1%) Module Verifier
0.4133 ( 0.1%) 0.0133 ( 0.2%) 0.4267 ( 0.1%) 0.4251 ( 0.1%) Prologue/Epilogue Insertion & Frame Finalization
0.3567 ( 0.1%) 0.0367 ( 0.5%) 0.3933 ( 0.1%) 0.4110 ( 0.1%) Induction Variable Simplification
0.3333 ( 0.1%) 0.0100 ( 0.1%) 0.3433 ( 0.1%) 0.3968 ( 0.1%) Remove redundant instructions
0.3867 ( 0.1%) 0.0033 ( 0.0%) 0.3900 ( 0.1%) 0.3657 ( 0.1%) Aggressive Dead Code Elimination
0.3533 ( 0.1%) 0.0167 ( 0.2%) 0.3700 ( 0.1%) 0.3580 ( 0.1%) Natural Loop Information
0.2233 ( 0.0%) 0.0100 ( 0.1%) 0.2333 ( 0.0%) 0.2630 ( 0.0%) Dominator Tree Construction
0.3033 ( 0.1%) 0.0033 ( 0.0%) 0.3067 ( 0.1%) 0.2572 ( 0.0%) Dominator Tree Construction
0.2333 ( 0.0%) 0.0100 ( 0.1%) 0.2433 ( 0.0%) 0.2502 ( 0.0%) Simplify the CFG
0.2633 ( 0.0%) 0.0000 ( 0.0%) 0.2633 ( 0.0%) 0.2483 ( 0.0%) Two-Address instruction pass
0.2167 ( 0.0%) 0.0067 ( 0.1%) 0.2233 ( 0.0%) 0.2252 ( 0.0%) Interprocedural Sparse Conditional Constant Propagation
0.2400 ( 0.0%) 0.0133 ( 0.2%) 0.2533 ( 0.0%) 0.2229 ( 0.0%) Natural Loop Information
0.2200 ( 0.0%) 0.0067 ( 0.1%) 0.2267 ( 0.0%) 0.2041 ( 0.0%) Machine Function Analysis
0.1767 ( 0.0%) 0.0267 ( 0.3%) 0.2033 ( 0.0%) 0.1948 ( 0.0%) Early CSE
0.2067 ( 0.0%) 0.0033 ( 0.0%) 0.2100 ( 0.0%) 0.1879 ( 0.0%) Loop-Closed SSA Form Pass
0.1667 ( 0.0%) 0.0000 ( 0.0%) 0.1667 ( 0.0%) 0.1751 ( 0.0%) Dominator Tree Construction
0.1633 ( 0.0%) 0.0000 ( 0.0%) 0.1633 ( 0.0%) 0.1733 ( 0.0%) Simplify the CFG
0.1867 ( 0.0%) 0.0000 ( 0.0%) 0.1867 ( 0.0%) 0.1629 ( 0.0%) Loop-Closed SSA Form Pass
0.1533 ( 0.0%) 0.0067 ( 0.1%) 0.1600 ( 0.0%) 0.1571 ( 0.0%) Loop-Closed SSA Form Pass
0.1367 ( 0.0%) 0.0033 ( 0.0%) 0.1400 ( 0.0%) 0.1532 ( 0.0%) Loop-Closed SSA Form Pass
0.1067 ( 0.0%) 0.0033 ( 0.0%) 0.1100 ( 0.0%) 0.1364 ( 0.0%) Canonicalize natural loops
0.1333 ( 0.0%) 0.0000 ( 0.0%) 0.1333 ( 0.0%) 0.1325 ( 0.0%) Global Variable Optimizer
0.1100 ( 0.0%) 0.0267 ( 0.3%) 0.1367 ( 0.0%) 0.1225 ( 0.0%) Module Verifier
0.1100 ( 0.0%) 0.0033 ( 0.0%) 0.1133 ( 0.0%) 0.1223 ( 0.0%) X86 FP Stackifier
0.1400 ( 0.0%) 0.0033 ( 0.0%) 0.1433 ( 0.0%) 0.1222 ( 0.0%) Dominator Tree Construction
0.1033 ( 0.0%) 0.0100 ( 0.1%) 0.1133 ( 0.0%) 0.1170 ( 0.0%) Unroll loops
0.1167 ( 0.0%) 0.0000 ( 0.0%) 0.1167 ( 0.0%) 0.1162 ( 0.0%) Dead Global Elimination
0.1367 ( 0.0%) 0.0000 ( 0.0%) 0.1367 ( 0.0%) 0.1149 ( 0.0%) Post-RA pseudo instruction expansion pass
0.1000 ( 0.0%) 0.0100 ( 0.1%) 0.1100 ( 0.0%) 0.1107 ( 0.0%) Dominator Tree Construction
0.0967 ( 0.0%) 0.0000 ( 0.0%) 0.0967 ( 0.0%) 0.1028 ( 0.0%) Simplify the CFG
0.0900 ( 0.0%) 0.0133 ( 0.2%) 0.1033 ( 0.0%) 0.1023 ( 0.0%) Deduce function attributes
0.0767 ( 0.0%) 0.0067 ( 0.1%) 0.0833 ( 0.0%) 0.0962 ( 0.0%) Dominator Tree Construction
0.1067 ( 0.0%) 0.0067 ( 0.1%) 0.1133 ( 0.0%) 0.0935 ( 0.0%) Canonicalize natural loops
0.1000 ( 0.0%) 0.0067 ( 0.1%) 0.1067 ( 0.0%) 0.0899 ( 0.0%) Loop-Closed SSA Form Pass
0.0767 ( 0.0%) 0.0000 ( 0.0%) 0.0767 ( 0.0%) 0.0744 ( 0.0%) Dead Argument Elimination
0.0500 ( 0.0%) 0.0033 ( 0.0%) 0.0533 ( 0.0%) 0.0663 ( 0.0%) No target information
0.0533 ( 0.0%) 0.0000 ( 0.0%) 0.0533 ( 0.0%) 0.0614 ( 0.0%) Remove unused exception handling info
0.0367 ( 0.0%) 0.0000 ( 0.0%) 0.0367 ( 0.0%) 0.0613 ( 0.0%) Simplify well-known library calls
0.0533 ( 0.0%) 0.0067 ( 0.1%) 0.0600 ( 0.0%) 0.0612 ( 0.0%) No Alias Analysis (always returns 'may' alias)
0.0767 ( 0.0%) 0.0000 ( 0.0%) 0.0767 ( 0.0%) 0.0610 ( 0.0%) Unswitch loops
0.0567 ( 0.0%) 0.0000 ( 0.0%) 0.0567 ( 0.0%) 0.0572 ( 0.0%) Basic CallGraph Construction
0.0433 ( 0.0%) 0.0033 ( 0.0%) 0.0467 ( 0.0%) 0.0550 ( 0.0%) Eliminate PHI nodes for register allocation
0.0433 ( 0.0%) 0.0067 ( 0.1%) 0.0500 ( 0.0%) 0.0547 ( 0.0%) Canonicalize natural loops
0.0533 ( 0.0%) 0.0000 ( 0.0%) 0.0533 ( 0.0%) 0.0487 ( 0.0%) Tail Call Elimination
0.0467 ( 0.0%) 0.0000 ( 0.0%) 0.0467 ( 0.0%) 0.0433 ( 0.0%) Expand ISel Pseudo-instructions
0.0367 ( 0.0%) 0.0033 ( 0.0%) 0.0400 ( 0.0%) 0.0405 ( 0.0%) Canonicalize natural loops
0.0567 ( 0.0%) 0.0033 ( 0.0%) 0.0600 ( 0.0%) 0.0395 ( 0.0%) Memory Dependence Analysis
0.0400 ( 0.0%) 0.0033 ( 0.0%) 0.0433 ( 0.0%) 0.0384 ( 0.0%) Remove unreachable blocks from the CFG
0.0467 ( 0.0%) 0.0067 ( 0.1%) 0.0533 ( 0.0%) 0.0355 ( 0.0%) Scalar Evolution Analysis
0.0200 ( 0.0%) 0.0033 ( 0.0%) 0.0233 ( 0.0%) 0.0300 ( 0.0%) Memory Dependence Analysis
0.0200 ( 0.0%) 0.0000 ( 0.0%) 0.0200 ( 0.0%) 0.0271 ( 0.0%) Scalar Evolution Analysis
0.0300 ( 0.0%) 0.0033 ( 0.0%) 0.0333 ( 0.0%) 0.0265 ( 0.0%) Bundle Machine CFG Edges
0.0167 ( 0.0%) 0.0067 ( 0.1%) 0.0233 ( 0.0%) 0.0227 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0200 ( 0.0%) 0.0067 ( 0.1%) 0.0267 ( 0.0%) 0.0225 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0200 ( 0.0%) 0.0000 ( 0.0%) 0.0200 ( 0.0%) 0.0213 ( 0.0%) Memory Dependence Analysis
0.0067 ( 0.0%) 0.0000 ( 0.0%) 0.0067 ( 0.0%) 0.0170 ( 0.0%) Preliminary module verification
0.0100 ( 0.0%) 0.0033 ( 0.0%) 0.0133 ( 0.0%) 0.0160 ( 0.0%) Preliminary module verification
0.0067 ( 0.0%) 0.0067 ( 0.1%) 0.0133 ( 0.0%) 0.0151 ( 0.0%) Delete dead loops
0.0200 ( 0.0%) 0.0000 ( 0.0%) 0.0200 ( 0.0%) 0.0123 ( 0.0%) Exception handling preparation
0.0167 ( 0.0%) 0.0000 ( 0.0%) 0.0167 ( 0.0%) 0.0121 ( 0.0%) Rotate Loops
0.0067 ( 0.0%) 0.0000 ( 0.0%) 0.0067 ( 0.0%) 0.0113 ( 0.0%) Inline Cost Analysis
0.0200 ( 0.0%) 0.0000 ( 0.0%) 0.0200 ( 0.0%) 0.0105 ( 0.0%) No Alias Analysis (always returns 'may' alias)
0.0067 ( 0.0%) 0.0033 ( 0.0%) 0.0100 ( 0.0%) 0.0104 ( 0.0%) X86 Target Transform Info
0.0067 ( 0.0%) 0.0033 ( 0.0%) 0.0100 ( 0.0%) 0.0103 ( 0.0%) Target independent code generator's TTI
0.0100 ( 0.0%) 0.0000 ( 0.0%) 0.0100 ( 0.0%) 0.0102 ( 0.0%) No target information
0.0067 ( 0.0%) 0.0000 ( 0.0%) 0.0067 ( 0.0%) 0.0063 ( 0.0%) Insert stack protectors
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0058 ( 0.0%) Unroll loops
0.0000 ( 0.0%) 0.0033 ( 0.0%) 0.0033 ( 0.0%) 0.0054 ( 0.0%) Preliminary module verification
0.0067 ( 0.0%) 0.0000 ( 0.0%) 0.0067 ( 0.0%) 0.0043 ( 0.0%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0042 ( 0.0%) Local Stack Slot Allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) Recognize loop idioms
0.0033 ( 0.0%) 0.0000 ( 0.0%) 0.0033 ( 0.0%) 0.0040 ( 0.0%) Lower Garbage Collection Instructions
0.0000 ( 0.0%) 0.0033 ( 0.0%) 0.0033 ( 0.0%) 0.0036 ( 0.0%) Target Library Information
0.0033 ( 0.0%) 0.0000 ( 0.0%) 0.0033 ( 0.0%) 0.0018 ( 0.0%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) Strip Unused Function Prototypes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%) Merge Duplicate Global Constants
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
587.5766 (100.0%) 7.6367 (100.0%) 595.2133 (100.0%) 595.9404 (100.0%) Total
That is roughly 10 minutes (595 seconds) of pass time. And yes, over 5 minutes of it (about 345 seconds, or 58%) is spent doing Loop Invariant Code Motion.
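Several passes appear more than once in the report (for example, Value Propagation runs twice), so the per-row percentages understate how much a single pass costs overall. As a rough sketch, the rows can be parsed and the wall-clock time summed per pass name. The helper name `wall_time_by_pass` and the `SAMPLE` excerpt are illustrative only, and the regex assumes the four "seconds (percent)" columns shown in the report above.

```python
import re
from collections import defaultdict

# A few rows copied from the report above; a real script would read the
# whole report from a file or stdin instead.
SAMPLE = """\
344.4500 ( 58.6%) 0.0133 ( 0.2%) 344.4633 ( 57.9%) 344.8825 ( 57.9%) Loop Invariant Code Motion
90.9533 ( 15.5%) 2.7400 ( 35.9%) 93.6933 ( 15.7%) 93.8664 ( 15.8%) Value Propagation
82.5733 ( 14.1%) 2.4733 ( 32.4%) 85.0467 ( 14.3%) 85.1964 ( 14.3%) Value Propagation
10.7167 ( 1.8%) 0.5633 ( 7.4%) 11.2800 ( 1.9%) 11.2990 ( 1.9%) X86 DAG->DAG Instruction Selection
587.5766 (100.0%) 7.6367 (100.0%) 595.2133 (100.0%) 595.9404 (100.0%) Total
"""

# Four "seconds (percent%)" columns, then the pass name.
ROW = re.compile(r"^\s*((?:[\d.]+\s*\(\s*[\d.]+%\)\s+){4})(.+?)\s*$")
# The seconds value is the number that directly precedes each "(".
SECONDS = re.compile(r"([\d.]+)\s*\(")


def wall_time_by_pass(report):
    """Sum the wall-clock column (4th) per pass name; non-data lines are skipped."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # banner, header, or prose line
        columns = [float(value) for value in SECONDS.findall(match.group(1))]
        totals[match.group(2)] += columns[3]
    return dict(totals)


if __name__ == "__main__":
    totals = wall_time_by_pass(SAMPLE)
    grand_total = totals.pop("Total")
    # Print passes from most to least expensive, with their share of the total.
    for name, seconds in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{seconds:10.4f}s  {seconds / grand_total:6.1%}  {name}")
```

On these rows, Loop Invariant Code Motion and the two Value Propagation runs together account for close to 88% of the reported wall-clock time, so those two passes are the ones worth investigating first.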
Part of #6819