Skip to content

Conversation

@lemire
Copy link
Member

@lemire lemire commented Mar 9, 2025

We have added a few options (such as json_fmt). These trigger a short series of branches in our parse_number_string function. Unfortunately, this is costing us a bit of performance. We can get it back by turning the runtime parameters into template parameters... there is still a branch, but it is more direct and likely extremely cheap.

Building:

cmake -B build -D FASTFLOAT_BENCHMARKS=ON
cmake --build build

Let us benchmark on an Apple M2 processor (LLVM 16). These benchmarks are part of our library.

Before:

⚡  ./build/benchmarks/realbenchmark
####
# reading /Users/dlemire/CVS/github/fast_float/bui/benchmarks/data/canada.txt
####
# read 111126 lines
ASCII volume = 1.93374 MB
fastfloat (64)                          :  1166.05 MB/s (+/- 8.5 %)    67.01 Mfloat/s
fastfloat (32)                          :  1211.52 MB/s (+/- 7.5 %)    69.62 Mfloat/s
UTF-16 volume = 3.86749 MB
fastfloat (64)                          :  2187.60 MB/s (+/- 5.6 %)    62.86 Mfloat/s
fastfloat (32)                          :  2269.15 MB/s (+/- 4.6 %)    65.20 Mfloat/s
####
# reading /Users/dlemire/CVS/github/fast_float/bui/benchmarks/data/mesh.txt
####
# read 73019 lines
ASCII volume = 0.536009 MB
fastfloat (64)                          :   773.74 MB/s (+/- 4.7 %)   105.40 Mfloat/s
fastfloat (32)                          :   686.86 MB/s (+/- 4.3 %)    93.57 Mfloat/s
UTF-16 volume = 1.07202 MB
fastfloat (64)                          :  1551.87 MB/s (+/- 4.8 %)   105.70 Mfloat/s
fastfloat (32)                          :  1396.92 MB/s (+/- 6.0 %)    95.15 Mfloat/s

After:

⚡  ./build/benchmarks/realbenchmark
####
# reading /Users/dlemire/CVS/github/fast_float/build/benchmarks/data/canada.txt
####
# read 111126 lines
ASCII volume = 1.93374 MB
fastfloat (64)                          :  1175.62 MB/s (+/- 4.9 %)    67.56 Mfloat/s
fastfloat (32)                          :  1224.92 MB/s (+/- 4.1 %)    70.39 Mfloat/s
UTF-16 volume = 3.86749 MB
fastfloat (64)                          :  2220.99 MB/s (+/- 4.4 %)    63.82 Mfloat/s
fastfloat (32)                          :  2292.35 MB/s (+/- 4.4 %)    65.87 Mfloat/s
####
# reading /Users/dlemire/CVS/github/fast_float/build/benchmarks/data/mesh.txt
####
# read 73019 lines
ASCII volume = 0.536009 MB
fastfloat (64)                          :   873.69 MB/s (+/- 2.0 %)   119.02 Mfloat/s
fastfloat (32)                          :   785.65 MB/s (+/- 4.3 %)   107.03 Mfloat/s
UTF-16 volume = 1.07202 MB
fastfloat (64)                          :  1638.44 MB/s (+/- 1.8 %)   111.60 Mfloat/s
fastfloat (32)                          :  1548.04 MB/s (+/- 4.2 %)   105.44 Mfloat/s

On the mesh dataset, we get a performance boost of about 15%.

This might be related to PR #307 by @IRainman

@dalle
Copy link
Collaborator

dalle commented Mar 9, 2025

Looks good. Do you think there be a macro for if-constexpr for C++17 enabled compilers?

Something like this?

#if __cplusplus >= 201703L || (defined(_MSVC_LANG) && _MSVC_LANG >= 201703L)
#define FASTFLOAT_IF_CONSTEXPR17(x) if constexpr (x)
#else
#define FASTFLOAT_IF_CONSTEXPR17(x) if (x)
#endif

...
FASTFLOAT_IF_CONSTEXPR17(basic_json_fmt)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move the #include "event_counter.h" at the top of the file outside of the ifdef? That way it will compile on Windows too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@IRainman
Copy link

IRainman commented Mar 9, 2025

Nice! Also the code definitely more cache friendly, it's shown by more constant time results on output.

Copy link
Collaborator

@dalle dalle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I commented somewhere else that we could explore using if-constexpr, but it's up to you. I'm a bit worried of "constant expression in if-statement" warnings appearing.

@IRainman
Copy link

IRainman commented Mar 9, 2025

Also about std::from_chars and compatibility, as you can remember, the standard regexp isn't ideal and the fmt library, especially with compile time parsing macros
fmt::format(FMT_COMPILE("some value: {}"), value));
much faster than std implementation and generates very compact code.

@IRainman
Copy link

IRainman commented Mar 9, 2025

Looks good. I commented somewhere else that we could explore using if-constexpr, but it's up to you. I'm a bit worried of "constant expression in if-statement" warnings appearing.

It's probably fine because it's old code and in the std function should be consteval in used in constexpr if. Maybe I should write a defect report for standard std::is_constant_evaluated. What do you think?

@lemire
Copy link
Member Author

lemire commented Mar 9, 2025

@dalle

I am adding

#if defined(__cpp_if_constexpr) && __cpp_if_constexpr >= 201606L
#define FASTFLOAT_IF_CONSTEXPR17(x) if constexpr (x)
#else
#define FASTFLOAT_IF_CONSTEXPR17(x) if (x)
#endif

@IRainman
Copy link

IRainman commented Mar 9, 2025

Maybe it's better to convert all options into a template parameter?

@lemire
Copy link
Member Author

lemire commented Mar 10, 2025

Maybe it's better to convert all options into a template parameter?

Well. That's an empirical question (i.e., see benchmarks).

@lemire lemire merged commit e019768 into main Mar 10, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants