
Commit 7162ad0

Author: Brox Chen
FPGA: remove fp-relaxed from compile CLI, use fp-precise for Windows emulator (#1321)
1 parent e561b41

File tree: 8 files changed, +53 -63 lines changed

DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md (+2 -6)
```diff
@@ -83,16 +83,13 @@ The following list shows the key optimization techniques included in the referen
 2. Using two copies of the compute matrix to read a full row and a full column per cycle.
 3. Converting the nested loop into a single merged loop and applying Triangular Loop optimizations. This approach enables the ability to generate a design that is pipelined efficiently.
 4. Fully vectorizing the dot products using loop unrolling.
-5. Using the `-Xsfp-relaxed` compiler option to reorder floating point operations and allowing the inference of a specialized dot-product DSP. This option further reduces the number of DSP blocks needed by the implementation, the overall latency, and pipeline depth.
-6. Using an efficient memory banking scheme to generate high performance hardware (all local memories are single-read, single-write).
-7. Using the `fpga_reg` attribute to insert more pipeline stages where needed to improve the frequency achieved by the design.
+5. Using an efficient memory banking scheme to generate high performance hardware (all local memories are single-read, single-write).
+6. Using the `fpga_reg` attribute to insert more pipeline stages where needed to improve the frequency achieved by the design.
 
 ### Matrix Dimensions and FPGA Resources
 
 In this reference design, the Cholesky decomposition algorithm is used to factor a real _n_ × _n_ matrix. The algorithm computes the vector dot product of two rows of the matrix. In our FPGA implementation, the dot product is computed in a loop over the _n_ elements in the row. The loop is fully unrolled to maximize throughput, so *n* real multiplication operations are performed in parallel on the FPGA and followed by sequential additions to compute the dot product result.
 
-The sample uses the `-fp-relaxed` compiler option, which permits the compiler to reorder floating point additions (for example, to assume that floating point addition is commutative). The compiler reorders the additions so that the dot product arithmetic can be optimally implemented using the specialized floating point Digital Signal Processing (DSP) hardware on the FPGA.
-
 With this optimization, our FPGA implementation requires _n_ DSPs to compute the real floating point dot product. The input matrix is also replicated two times in order to be able to read two full rows per cycle. The matrix size is constrained by the total FPGA DSP and RAM resources available.
 
 ### Compiler Flags Used
```
```diff
@@ -101,7 +98,6 @@ With this optimization, our FPGA implementation requires _n_ DSPs to compute the
 |:--- |:---
 |`-Xshardware` | Target FPGA hardware (as opposed to FPGA emulator)
 |`-Xsclock=<target fmax>MHz` | The FPGA backend attempts to achieve <target fmax> MHz
-|`-Xsfp-relaxed` | Allows the FPGA backend to re-order floating point arithmetic operations (for example, permit assuming $(a + b + c) == (c + a + b)$ )
 |`-Xsparallel=2` | Use 2 cores when compiling the bitstream through Quartus
 |`-Xsseed` | Specifies the Quartus compile seed, to potentially yield slightly higher fmax
 
```

DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt (+12 -10)
```diff
@@ -66,11 +66,13 @@ endif()
 
 # This is a Windows-specific flag that enables error handling in host code
 if(WIN32)
-    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall /fp:precise")
-    set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes /fp:precise")
+    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall")
+    set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes ")
+    set(EMULATOR_PLATFORM_FLAGS "/fp:precise")
 else()
-    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only -fp-model=precise")
-    set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise")
+    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only ")
+    set(PLATFORM_SPECIFIC_LINK_FLAGS "")
+    set(EMULATOR_PLATFORM_FLAGS "")
 endif()
 
 if(IGNORE_DEFAULT_SEED)
```
```diff
@@ -98,12 +100,12 @@ message(STATUS "SEED=${SEED}")
 # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
 # 2. The "link" stage invokes the compiler's FPGA backend before linking.
 # For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_SIMULATOR_FLAGS} ${BSP_FLAG}")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}")
-set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${EMULATOR_PLATFORM_FLAGS} ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${EMULATOR_PLATFORM_FLAGS} ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} ${USER_SIMULATOR_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_HARDWARE ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
 # use cmake -D USER_HARDWARE_FLAGS=<flags> to set extra flags for FPGA backend compilation
 
 
```
DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md (+4 -8)
```diff
@@ -86,8 +86,6 @@ Performance results are based on testing as of April 26, 2022.
 
 In this reference design, the Cholesky decomposition algorithm is used to factor a real _n_ × _n_ matrix. The algorithm computes the vector dot product of two rows of the matrix. In our FPGA implementation, the dot product is computed in a loop over the row's _n_ elements. The loop is fully unrolled to maximize throughput. As a result, *n* real multiplication operations are performed in parallel on the FPGA, followed by sequential additions to compute the dot product result.
 
-We use the compiler option `-fp-relaxed`, which permits the compiler to reorder floating point additions (i.e. to assume that floating point addition is commutative). The compiler uses this freedom to reorder the additions so that the dot product arithmetic can be optimally implemented using the FPGA's specialized floating point DSP (Digital Signal Processing) hardware.
-
 With this optimization, our FPGA implementation requires _n_ DSPs to compute the real floating point dot product. The input matrix is also replicated two times in order to be able to read two full rows per cycle. The matrix size is constrained by the total FPGA DSP and RAM resources available.
 
 The matrix inversion algorithm used in this reference design performs a Gaussian elimination to invert the triangular matrix _L_ obtained by the Cholesky decomposition. To do so, another _n_ DSPs are required to perform the associated dot-product. Finally, the matrix product of $LI^{\star}LI$ also requires _n_ DSPs.
```
```diff
@@ -111,10 +109,9 @@ The design uses the following key optimization techniques:
 2. Using two copies of the compute matrix in order to be able to read a full row and a full column per cycle.
 3. Converting the nested loop into a single merged loop and applying Triangular Loop optimizations. This allows us to generate a design that is very well pipelined.
 4. Fully vectorizing the dot products using loop unrolling.
-5. Using the compiler flag -Xsfp-relaxed to re-order floating point operations and allowing the inference of a specialized dot-product DSP. This further reduces the number of DSP blocks needed by the implementation, the overall latency, and pipeline depth.
-6. Using an efficient memory banking scheme to generate high performance hardware (all local memories are single-read, single-write).
-7. Using the `fpga_reg` attribute to insert more pipeline stages where needed to improve the frequency achieved by the design.
-8. Using the input matrices properties (hermitian positive matrices) to reduce the number of operations. For example, the (_LI_*) * _LI_ computation only requires to compute half of the output matrix as the result is symmetric.
+5. Using an efficient memory banking scheme to generate high performance hardware (all local memories are single-read, single-write).
+6. Using the `fpga_reg` attribute to insert more pipeline stages where needed to improve the frequency achieved by the design.
+7. Using the input matrices properties (hermitian positive matrices) to reduce the number of operations. For example, the (_LI_*) * _LI_ computation only requires to compute half of the output matrix as the result is symmetric.
 
 ### Source Code Breakdown
 
```
```diff
@@ -134,7 +131,6 @@ For descriptions of `streaming_cholesky.hpp`, `streaming_cholesky_inversion.hpp`
 |:--- |:---
 |`-Xshardware` | Target FPGA hardware (as opposed to FPGA emulator)
 |`-Xsclock=<target fmax>MHz` | The FPGA backend attempts to achieve <target fmax> MHz
-|`-Xsfp-relaxed` | Allows the FPGA backend to re-order floating point arithmetic operations (for example, permit assuming $(a + b + c) == (c + a + b)$)
 |`-Xsparallel=2` | Use 2 cores when compiling the bitstream through Quartus
 |`-Xsseed` | Specifies the Quartus compile seed to yield slightly higher, possibly, fmax
 
```

```diff
@@ -332,4 +328,4 @@ PASSED
 
 Code samples are licensed under the MIT license. See [License.txt](/License.txt) for details.
 
-Third party program Licenses can be found here: [third-party-programs.txt](/third-party-programs.txt).
+Third party program Licenses can be found here: [third-party-programs.txt](/third-party-programs.txt).
```

DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt (+12 -10)
```diff
@@ -41,11 +41,13 @@ endif()
 
 # This is a Windows-specific flag that enables error handling in host code
 if(WIN32)
-    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall /fp:precise")
-    set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes /fp:precise")
+    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall ")
+    set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes ")
+    set(EMULATOR_PLATFORM_FLAGS "/fp:precise")
 else()
-    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only -fp-model=precise ")
-    set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise ")
+    set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only ")
+    set(PLATFORM_SPECIFIC_LINK_FLAGS "")
+    set(EMULATOR_PLATFORM_FLAGS "")
 endif()
 
 if(DEVICE_FLAG MATCHES "A10")
```
```diff
@@ -106,12 +108,12 @@ message(STATUS "SEED=${SEED}")
 # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V).
 # 2. The "link" stage invokes the compiler's FPGA backend before linking.
 # For this reason, FPGA backend flags must be passed as link flags in CMake.
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}")
-set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}")
-set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
-set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
-set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}")
-set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}")
+set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${EMULATOR_PLATFORM_FLAGS} ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}")
+set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${EMULATOR_PLATFORM_FLAGS} ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
+set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} ${BSP_FLAG}")
+set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_HARDWARE ${BSP_FLAG}")
+set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}")
 # use cmake -D USER_HARDWARE_FLAGS=<flags> to set extra flags for FPGA backend compilation
 
 ###############################################################################
```

DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md (+2 -6)
```diff
@@ -74,8 +74,6 @@ Performance results are based on testing as of July 29, 2020.
 
 The QR decomposition algorithm factors a complex _m_ × _n_ matrix, where _m_ ≥ _n_. The algorithm computes the vector dot product of two columns of the matrix. In our FPGA implementation, the dot product is computed in a loop over the column's _m_ elements. The loop is unrolled fully to maximize throughput. The *m* complex multiplication operations are performed in parallel on the FPGA followed by sequential additions to compute the dot product result.
 
-The design uses the `-fp-relaxed` option, which permits the compiler to reorder floating point additions (to assume that floating point addition is commutative). The compiler reorders the additions so that the dot product arithmetic can be optimally implemented using the specialized floating point DSP (Digital Signal Processing) hardware in the FPGA.
-
 With this optimization, our FPGA implementation requires 4*m* DSPs to compute the complex floating point dot product or 2*m* DSPs for the real case. The matrix size is constrained by the total FPGA DSP resources available.
 
 By default, the design is parameterized to process 128 × 128 matrices when compiled targeting an Intel® Arria® 10 FPGA. It is parameterized to process 256 × 256 matrices when compiled targeting a Intel® Stratix® 10 or Intel® Agilex™ FPGA; however, the design can process matrices from 4 x 4 to 512 x 512.
```
```diff
@@ -92,17 +90,15 @@ The key optimization techniques used are as follows:
 1. Refactoring the original Gram-Schmidt algorithm to merge two dot products into one, reducing the total number of dot products needed from three to two. This helps us reduce the DSPs required for the implementation.
 2. Converting the nested loop into a single merged loop and applying Triangular Loop optimizations. This allows us to generate a design that is very well pipelined.
 3. Fully vectorizing the dot products using loop unrolling.
-4. Using the compiler flag -Xsfp-relaxed to re-order floating point operations and allowing the inference of a specialized dot-product DSP. This further reduces the number of DSP blocks needed by the implementation, the overall latency, and pipeline depth.
-5. Using an efficient memory banking scheme to generate high performance hardware.
-6. Using the `fpga_reg` attribute to insert more pipeline stages where needed to improve the frequency achieved by the design.
+4. Using an efficient memory banking scheme to generate high performance hardware.
+5. Using the `fpga_reg` attribute to insert more pipeline stages where needed to improve the frequency achieved by the design.
 
 ### Compiler Flags Used
 
 | Flag | Description
 |:--- |:---
 | `-Xshardware` | Target FPGA hardware (as opposed to FPGA emulator)
 | `-Xsclock=360MHz` | The FPGA backend attempts to achieve 360 MHz
-| `-Xsfp-relaxed` | Allows the FPGA backend to re-order floating point arithmetic operations (e.g. permit assuming (a + b + c) == (c + a + b) )
 | `-Xsparallel=2` | Use 2 cores when compiling the bitstream through Intel® Quartus®
 | `-Xsseed` | Specifies the Intel® Quartus® compile seed, to yield slightly higher fmax
 
```
