diff --git a/doc/charconv.adoc b/doc/charconv.adoc
index f5ecb79..425b5e4 100644
--- a/doc/charconv.adoc
+++ b/doc/charconv.adoc
@@ -19,13 +19,16 @@ Matt Borland
 include::charconv/overview.adoc[]
 include::charconv/build.adoc[]
+include::charconv/basic_usage.adoc[]
+include::charconv/api_reference.adoc[]
 include::charconv/from_chars.adoc[]
 include::charconv/to_chars.adoc[]
 include::charconv/chars_format.adoc[]
 include::charconv/limits.adoc[]
-include::charconv/reference.adoc[]
+//include::charconv/reference.adoc[]
 include::charconv/benchmarks.adoc[]
 include::charconv/sources.adoc[]
+include::charconv/acknowledgments.adoc[]
 include::charconv/copyright.adoc[]
 :leveloffset: -1
diff --git a/doc/charconv/acknowledgments.adoc b/doc/charconv/acknowledgments.adoc
new file mode 100644
index 0000000..c2d3b63
--- /dev/null
+++ b/doc/charconv/acknowledgments.adoc
@@ -0,0 +1,16 @@
+////
+Copyright 2024 Matt Borland
+Distributed under the Boost Software License, Version 1.0.
+https://www.boost.org/LICENSE_1_0.txt
+////
+
+[#acknowledgments]
+= Acknowledgments
+:idprefix: ack_
+
+Special thanks to the following people (non-exhaustive list):
+
+- Peter Dimov for providing technical guidance and contributing to the library throughout development
+- Chris Kormanyos for serving as the library review manager
+- Stephan T. Lavavej for providing the basis for the benchmarks
+- Everyone who reviewed the library and provided feedback to make it better
diff --git a/doc/charconv/api_reference.adoc b/doc/charconv/api_reference.adoc
new file mode 100644
index 0000000..39fe9c2
--- /dev/null
+++ b/doc/charconv/api_reference.adoc
@@ -0,0 +1,34 @@
+////
+Copyright 2023 Matt Borland
+Distributed under the Boost Software License, Version 1.0.
+https://www.boost.org/LICENSE_1_0.txt
+////
+
+[#api_reference]
+= API Reference
+:idprefix: api_ref_
+
+== Functions
+
+- <>
+- <>
+- <>
+
+== Structures
+
+- <>
+- <>
+
+== Enums
+
+- <>
+
+== Constants
+
+- <>
+- <>
+
+== Macros
+
+- <>
+- <>
diff --git a/doc/charconv/basic_usage.adoc b/doc/charconv/basic_usage.adoc
new file mode 100644
index 0000000..00f389b
--- /dev/null
+++ b/doc/charconv/basic_usage.adoc
@@ -0,0 +1,28 @@
+////
+Copyright 2024 Matt Borland
+Distributed under the Boost Software License, Version 1.0.
+https://www.boost.org/LICENSE_1_0.txt
+////
+
+[#basic_usage]
+= Basic Usage Examples
+:idprefix: basic_usage_
+
+== Usage Examples
+[source, c++]
+----
+#include
+
+const char* buffer = "42";
+int v = 0;
+boost::charconv::from_chars_result r = boost::charconv::from_chars(buffer, buffer + std::strlen(buffer), v);
+assert(r.ec == std::errc());
+assert(v == 42);
+
+char out[64];
+int value = 123456;
+boost::charconv::to_chars_result res = boost::charconv::to_chars(out, out + sizeof(out), value);
+assert(res.ec == std::errc());
+assert(!std::strncmp(out, "123456", 6)); // strncmp returns 0 on match
+
+----
diff --git a/doc/charconv/benchmarks.adoc b/doc/charconv/benchmarks.adoc
index 638b8e2..0d18cde 100644
--- a/doc/charconv/benchmarks.adoc
+++ b/doc/charconv/benchmarks.adoc
@@ -7,10 +7,14 @@ https://www.boost.org/LICENSE_1_0.txt
 = Benchmarks
 :idprefix: benchmarks
 
+This section describes performance benchmarks comparing this library with the standard library, and explains how to run your own benchmarks if required.
+
 The values are relative to the performance of `std::printf` and `std::strtoX`. Larger numbers are more performant (e.g. 2.00 means twice as fast, and 0.50 means it takes twice as long).
+`std::printf` and `std::strtoX` are always listed first because they are the reference values.
 == How to run the Benchmarks
+[#run_benchmarks_]
 To run the benchmarks yourself, navigate to the test folder and define `BOOST_CHARCONV_RUN_BENCHMARKS` when running the tests.
 An example on Linux with b2: `../../../b2 cxxstd=20 toolset=gcc-13 define=BOOST_CHARCONV_RUN_BENCHMARKS STL_benchmark linkflags="-lfmt" -a release` .
@@ -23,11 +27,14 @@ Additionally, you will need the following:
 * https://github.com/google/double-conversion[libdouble-conversion]
 * https://github.com/fmtlib/fmt[{fmt}]
 
-== x86_64 Linux
+== Results
+[#benchmark_results_]
+
+=== x86_64 Linux
 
 Data in tables 1 - 4 were run on Ubuntu 23.04 with x86_64 architecture using GCC 13.1.0 with libstdc++.
 
-=== Floating Point
+==== Floating Point
 
 .to_chars floating point with the shortest representation
 |===
@@ -67,7 +74,7 @@ Data in tables 1 - 4 were run on Ubuntu 23.04 with x86_64 architecture using GCC
 |1.16 / 1.30
 |===
 
-=== Integral
+==== Integral
 
 .to_chars base 10 integers
 |===
@@ -103,11 +110,11 @@ Data in tables 1 - 4 were run on Ubuntu 23.04 with x86_64 architecture using GCC
 |2.54 / 1.78
 |===
 
-== x86_64 Windows
+=== x86_64 Windows
 
 Data in tables 5 - 8 were run on Windows 11 with x86_64 architecture using MSVC 14.3 (V17.7.0).
 
-=== Floating Point
+==== Floating Point
 
 .to_chars floating point with the shortest representation
 |===
@@ -141,7 +148,7 @@ Data in tables 5 - 8 were run on Windows 11 with x86_64 architecture using MSVC
 |2.06 / 5.21
 |===
 
-=== Integral
+==== Integral
 
 .to_chars base 10 integers
 |===
@@ -175,11 +182,11 @@ Data in tables 5 - 8 were run on Windows 11 with x86_64 architecture using MSVC
 |2.68 / 2.27
 |===
 
-== ARM MacOS
+=== ARM MacOS
 
 Data in tables 9-12 were run on MacOS Ventura 13.5.2 with M1 Pro architecture using Homebrew GCC 13.2.0 with libstdc++.
-=== Floating Point
+==== Floating Point
 
 .to_chars floating point with the shortest representation
 |===
@@ -220,7 +227,7 @@ Data in tables 9-12 were run on MacOS Ventura 13.5.2 with M1 Pro architecture us
 |===
 
-=== Integral
+==== Integral
 
 .to_chars base 10 integers
 |===
@@ -255,6 +262,3 @@ Data in tables 9-12 were run on MacOS Ventura 13.5.2 with M1 Pro architecture us
 |Boost.Charconv.from_chars
 |2.27 / 1.65
 |===
-
-Special thanks to Stephan T. Lavavej for providing the basis for the benchmarks.
-
diff --git a/doc/charconv/build.adoc b/doc/charconv/build.adoc
index 51e9de9..d08ae34 100644
--- a/doc/charconv/build.adoc
+++ b/doc/charconv/build.adoc
@@ -4,7 +4,7 @@ Distributed under the Boost Software License, Version 1.0.
 https://www.boost.org/LICENSE_1_0.txt
 ////
 
-= Building the Library
+= Getting Started
 :idprefix: build_
 
 == B2
@@ -29,6 +29,8 @@ To install the development environment, run:
 sudo ./b2 install cxxstd=11
 ----
 
+The value of `cxxstd` must be at least 11. See the https://www.boost.org/doc/libs/1_84_0/tools/build/doc/html/index.html[b2 documentation] under `cxxstd` for all valid values.
+
 == vcpkg
 
 Run the following commands to clone the latest version of Charconv and install it using vcpkg:
@@ -39,7 +41,7 @@ cd charconv
 vcpkg install charconv --overlay-ports=ports/charconv
 ----
 
-Any required Boost packages that do not already exist will be installed automatically.
+Any required Boost packages not currently installed in your development environment will be installed automatically.
 
 == Conan
 
@@ -67,3 +69,7 @@ For example, using a `conanfile.txt`:
 [requires]
 boost_charconv/1.0.0
 ----
+
+== Dependencies
+
+This library depends on Boost.Assert, Boost.Config, and Boost.Core, as well as https://gcc.gnu.org/onlinedocs/libquadmath/[libquadmath] on supported platforms (e.g. Linux with x86, x86_64, PPC64, and IA64).
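The from_chars.adoc changes that follow say that, on `std::errc::result_out_of_range`, floating-point parsing returns ±0 or ±HUGE_VAL to match `std::strtod`. That return value is what lets a caller tell the four out-of-range cases apart. Below is a minimal sketch of that classification. It uses only standard C++ and `std::strtod` itself, not Boost.Charconv. `range_status` and `classify_strtod` are hypothetical names. The sketch assumes IEEE 754 doubles and an `strtod` that sets `errno` to `ERANGE` on underflow as well as overflow (glibc documents this; the C standard leaves underflow reporting implementation-defined).

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdlib>

// Outcome of parsing a decimal string into a double, distinguishing the
// four out-of-range cases the way std::strtod reports them.
// Hypothetical type for illustration; not part of Boost.Charconv.
enum class range_status
{
    ok,
    invalid,
    overflow_positive,   // strtod returned +HUGE_VAL
    overflow_negative,   // strtod returned -HUGE_VAL
    underflow_positive,  // strtod returned +0 (or a tiny positive value)
    underflow_negative   // strtod returned -0 (or a tiny negative value)
};

// Hypothetical helper: parses `str` with std::strtod and classifies the result.
// Assumes HUGE_VAL is infinity (IEEE 754) and that strtod sets errno to
// ERANGE on both overflow and underflow, as glibc does.
range_status classify_strtod(const char* str, double& out)
{
    char* end = nullptr;
    errno = 0;
    out = std::strtod(str, &end);

    if (end == str)
        return range_status::invalid;   // no characters could be converted
    if (errno != ERANGE)
        return range_status::ok;

    // On ERANGE the sign bit is preserved in both ±HUGE_VAL and ±0,
    // so infinity-vs-finite plus the sign identifies the case.
    if (std::isinf(out))
        return std::signbit(out) ? range_status::overflow_negative
                                 : range_status::overflow_positive;
    return std::signbit(out) ? range_status::underflow_negative
                             : range_status::underflow_positive;
}

// Example checks using the magnitudes mentioned in the from_chars usage notes.
void strtod_range_examples()
{
    double d = 0.0;
    assert(classify_strtod("1.0e+99999", d) == range_status::overflow_positive);
    assert(classify_strtod("-1.0e+99999", d) == range_status::overflow_negative);
    assert(classify_strtod("1.0e-99999", d) == range_status::underflow_positive);
    assert(classify_strtod("-1.0e-99999", d) == range_status::underflow_negative);
    assert(classify_strtod("42.5", d) == range_status::ok && d == 42.5);
}
```

`std::strtod` also skips leading whitespace and accepts a leading `+`. `from_chars` rejects both with `std::errc::invalid_argument`, so this sketch models only the out-of-range reporting, not the full parsing rules.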
diff --git a/doc/charconv/chars_format.adoc b/doc/charconv/chars_format.adoc
index fe9953f..c6b89fe 100644
--- a/doc/charconv/chars_format.adoc
+++ b/doc/charconv/chars_format.adoc
@@ -8,6 +8,12 @@ https://www.boost.org/LICENSE_1_0.txt
 :idprefix: chars_format_
 
 == chars_format overview
+
+`boost::charconv::chars_format` is an `enum class` used to specify the format of floating-point values for `from_chars` and `to_chars`.
+
+== Definition
+[#chars_format_defintion_]
+
 [source, c++]
 ----
 namespace boost { namespace charconv {
@@ -22,7 +28,8 @@ enum class chars_format : unsigned
 }} // Namespace boost::charconv
 ----
 
-`boost::charconv::chars_format` is used to specify the format of floating point types with `from_chars` and `to_chars`.
+
+== Formats
 
 === Scientific Format
 Scientific format will be of the form `1.3e+03`.
diff --git a/doc/charconv/from_chars.adoc b/doc/charconv/from_chars.adoc
index e03816c..6c4dc04 100644
--- a/doc/charconv/from_chars.adoc
+++ b/doc/charconv/from_chars.adoc
@@ -1,5 +1,5 @@
 ////
-Copyright 2023 Matt Borland
+Copyright 2023 - 2024 Matt Borland
 Distributed under the Boost Software License, Version 1.0.
 https://www.boost.org/LICENSE_1_0.txt
 ////
@@ -10,7 +10,11 @@ https://www.boost.org/LICENSE_1_0.txt
 == from_chars overview
 `from_chars` is a set of functions that parse a string from `[first, last)` in an attempt to convert the string into `value` according to the `chars_format` specified (if applicable).
-The result of `from_chars` is `from_chars_result` which on success returns `ptr == last` and `ec == std::errc()`, and on failure returns `ptr` equal to the last valid character parsed or `last` on underflow/overflow, and `ec == std::errc::invalid_argument` or `std::errc::result_out_of_range` respectively.
+Parsing is locale-independent (i.e. equivalent to the "C" locale).
+The result of `from_chars` is `from_chars_result`, which on success returns `ptr == last` and `ec == std::errc()`, and on failure returns `ptr` equal to the last valid character parsed (or `last` on underflow/overflow) and `ec == std::errc::invalid_argument` or `std::errc::result_out_of_range`, respectively. `from_chars` does not require the character sequence to be null-terminated.
+
+== Definitions
+[#from_chars_definitions_]
 
 [source, c++]
 ----
@@ -33,51 +37,67 @@ BOOST_CXX14_CONSTEXPR from_chars_result from_chars(const char* first, cons
 template
 from_chars_result from_chars(const char* first, const char* last, Real& value, chars_format fmt = chars_format::general) noexcept;
 
-// See note below in from_chars for floating point types
+// See "Usage notes for from_chars for floating point types" below
+
 template
-from_chars_result from_chars_strict(const char* first, const char* last, Real& value, chars_format fmt = chars_format::general) noexcept;
+from_chars_result from_chars_erange(const char* first, const char* last, Real& value, chars_format fmt = chars_format::general) noexcept;
 
 }} // Namespace boost::charconv
 ----
 
-== from_chars_result
-* `ptr` - On return from `from_chars` it is a pointer to the first character not matching the pattern, or pointer to `last` if all characters are successfully parsed.
-* `ec` - the error code. Values returned by `from_chars` are:
-** `std::errc()` - successful parsing
-** `std::errc::invalid_argument` - invalid argument (e.g. parsing a negative number into an unsigned type)
-** `std::errc::result_out_of_range` - result out of range (e.g. overflow)
-* `operator==` - compares the values of ptr and ec for equality
-
-== from_chars
-* `first`, `last` - valid range to parse
+== from_chars parameters
+* `first`, `last` - pointers to a valid range to parse
 * `value` - where the output is stored upon successful parsing
 * `base` (integer only) - the integer base to use. Must be between 2 and 36 inclusive
 * `fmt` (floating point only) - The format of the buffer.
 See <> for description.
 
-=== from_chars for integral types
+== from_chars_result
+* `ptr` - On return from `from_chars` it is a pointer to the first character not matching the pattern, or pointer to `last` if all characters are successfully parsed.
+* `ec` - https://en.cppreference.com/w/cpp/error/errc[the error code]. Values returned by `from_chars` are:
+
+|===
+|Return Value | Description
+| `std::errc()` | Successful parsing
+| `std::errc::invalid_argument` | 1) Parsing a negative number into an unsigned type
+
+2) Leading `+` sign
+
+3) Leading whitespace
+
+4) Incompatible formatting (e.g. an exponent with `chars_format::fixed`, or `p` as the exponent character for a format other than `chars_format::hex`). See <>
+
+| `std::errc::result_out_of_range` | 1) Overflow
+
+2) Underflow
+|===
+
+* `operator==` - compares the values of ptr and ec for equality
+
+== Usage Notes
+
+=== Usage notes for from_chars for integral types
 * All built-in integral types are allowed except bool which is deleted
 * These functions have been tested to support `\__int128` and `unsigned __int128`
 * from_chars for integral types is constexpr when compiled using `-std=c++14` or newer
 ** One known exception is GCC 5 which does not support constexpr comparison of `const char*`.
+* Parsing stops at the first character that cannot be part of the number. Leading whitespace is not skipped and results in `std::errc::invalid_argument`.
 
-=== from_chars for floating point types
+=== Usage notes for from_chars for floating point types
 * On `std::errc::result_out_of_range` we return ±0 for small values (e.g. 1.0e-99999) or ±HUGE_VAL for large values (e.g. 1.0e+99999) to match the handling of `std::strtod`. This is a divergence from the standard which states we should return the `value` argument unmodified.
-** The rationale for this divergence is an open issue with LWG here: https://cplusplus.github.io/LWG/lwg-active.html#3081.
+** `from_chars` has an open issue with LWG here: https://cplusplus.github.io/LWG/lwg-active.html#3081.
 The standard for does not distinguish between underflow and overflow like strtod does.
 Let's say you are writing a JSON library, and you replace `std::strtod` with `boost::charconv::from_chars` for performance reasons.
 Charconv returns std::errc::result_out_of_range on some conversion.
 You would then have to parse the string again yourself to figure out which of the four possible reasons you got `std::errc::result_out_of_range`.
-Charconv already had this information but could not give it to you.
+Charconv can give you that information if you use `boost::charconv::from_chars_erange` instead of `boost::charconv::from_chars` throughout the code base.
 By implementing the resolution to the LWG issue that matches the established strtod behavior I think we are providing the correct behavior without waiting on the committee's decision.
-** If you prefer the handling required by the standard (e.g. value is returned unmodified on `std::errc::result_out_of_range`) use `boost::charconv::from_chars_strict`.
-The handling of `std::errc::result_out_of_range` is the only difference between `from_chars` and `from_chars_strict`.
-
 * These functions have been tested to support all built-in floating-point types and those from C++23's ``
 ** Long doubles can be 64, 80, or 128-bit, but must be IEEE 754 compliant. An example of a non-compliant, and therefore unsupported, format is `__ibm128`.
 ** Use of `__float128` or `std::float128_t` requires compiling with `-std=gnu++xx` and linking GCC's `libquadmath`.
+Linking `libquadmath` is done automatically when building with CMake.
 
 == Examples
@@ -126,6 +146,9 @@ assert(v == 8.0427e-18);
 ----
 
 === std::errc::invalid_argument
+
+The following example is invalid because a negative value is parsed into an unsigned integer.
+
 [source, c++]
 ----
 const char* buffer = "-123";
@@ -134,6 +157,9 @@ auto r = boost::charconv::from_chars(buffer, buffer + std::strlen(buffer), v);
 assert(r.ec == std::errc::invalid_argument);
 assert(!r); // Same as above but less verbose. Added in C++26.
 ----
+
+The following example is invalid because a fixed-format floating-point value cannot have an exponent.
+
 [source, c++]
 ----
 const char* buffer = "-1.573e-3";
@@ -142,7 +168,7 @@ auto r = boost::charconv::from_chars(buffer, buffer + std::strlen(buffer), v, bo
 assert(r.ec == std::errc::invalid_argument);
 assert(!r); // Same as above but less verbose. Added in C++26.
 ----
-Note: In the event of std::errc::invalid_argument, v is not modified by `from_chars`
+Note: In the event of `std::errc::invalid_argument`, `v` is not modified by `from_chars`.
 
 === std::errc::result_out_of_range
 [source, c++]
diff --git a/doc/charconv/limits.adoc b/doc/charconv/limits.adoc
index 847b356..541e377 100644
--- a/doc/charconv/limits.adoc
+++ b/doc/charconv/limits.adoc
@@ -11,6 +11,9 @@ https://www.boost.org/LICENSE_1_0.txt
 
 The contents of `` are designed to help the user optimize the size of the buffer required for `to_chars`.
 
+== Definitions
+[#limits_definitions_]
+
 [source, c++]
 ----
 namespace boost { namespace charconv {
@@ -35,6 +38,8 @@ The minimum size of the buffer that needs to be passed to `to_chars` to guarant
 
 == Examples
 
+The following two examples use `max_chars10` to size the buffer passed to `to_chars` for an integral type and a floating-point type, respectively.
+
 [source, c++]
 ----
 char buffer [boost::charconv::limits::max_chars10;
@@ -55,6 +60,8 @@ assert(r); // Same as above but less verbose. Added in C++26.
 assert(!strcmp(buffer, "3.40282347e+38")); // strcmp returns 0 on match
 ----
 
+The following example uses `max_chars` to size the buffer when serializing an integer in binary (`base` = 2).
+
 [source, c++]
 ----
 char buffer [boost::charconv::limits::max_chars;
diff --git a/doc/charconv/overview.adoc b/doc/charconv/overview.adoc
index 4967a56..8fc0b7a 100644
--- a/doc/charconv/overview.adoc
+++ b/doc/charconv/overview.adoc
@@ -11,39 +11,25 @@ https://www.boost.org/LICENSE_1_0.txt
 
 == Description
 
-Charconv is a collection of parsing functions that are locale-independent, non-allocating, and non-throwing.
-This library requires a minimum of C++11.
+Boost.Charconv converts character buffers to numbers, and numbers to character buffers.
+It is a small library of two overloaded functions that do the heavy lifting, plus several supporting enums, structures, templates, and constants, with a particular focus on performance and consistency
+across the supported development environments.
 
-== Usage Examples
-[source, c++]
-----
-#include
+Why should I be interested in this library? Charconv is locale-independent, non-allocating^1^, and non-throwing, and it requires only C++11.
+It provides functionality similar to that found in `std::printf` or `std::strtod` with <>.
+This library can also be used in place of the standard library `` when it is unavailable in your toolchain.
+Currently only https://en.cppreference.com/w/cpp/compiler_support/17.html[GCC 11+ and MSVC 19.24+] support both integer and floating-point conversions in their implementation of ``. +
+
+If you are using either of those compilers, Boost.Charconv is at least as performant as ``, and can be up to several times faster.
+See: <>
 
-const char* buffer = "42";
-int v = 0;
-boost::charconv::from_chars_result r = boost::charconv::from_chars(buffer, buffer + std::strlen(buffer), v);
-assert(r.ec == std::errc());
-assert(v == 42);
+^1^ The one edge case where allocation may occur is when you are parsing a string into an 80- or 128-bit `long double` or a `__float128`, and the string is over 1024 bytes long.
-char buffer[64];
-int v = 123456;
-boost::charconv:to_chars_result r = boost::charconv::to_chars(buffer, buffer + sizeof(buffer) - 1, v);
-assert(r.ec == std::errc());
-assert(!strncmp(buffer, "123456", 6)); // Strncmp returns 0 on match
+== Supported Compilers / OS
 
-----
-
-== Supported Compilers
+Boost.Charconv is tested on Ubuntu, macOS, and Windows with the following compilers:
 
 * GCC 5 or later
 * Clang 3.8 or later
 * Visual Studio 2015 (14.0) or later
 
-Tested on https://github.com/cppalliance/charconv/actions[Github Actions] and https://drone.cpp.al/cppalliance/charconv[Drone].
-
-== Why use Boost.Charconv over ?
-
-Currently only https://en.cppreference.com/w/cpp/compiler_support/17[GCC 11+ and MSVC 19.24+] support both integer and floating-point conversions in their implementation of ``. +
-
-If you are using either of those compilers, Boost.Charconv is at least as performant as ``, and can be up to several times faster.
-See: <>
+Tested on https://github.com/cppalliance/charconv/actions[GitHub Actions] and https://drone.cpp.al/cppalliance/charconv[Drone].
diff --git a/doc/charconv/sources.adoc b/doc/charconv/sources.adoc
index ed564a6..38c58d3 100644
--- a/doc/charconv/sources.adoc
+++ b/doc/charconv/sources.adoc
@@ -6,6 +6,9 @@ https://www.boost.org/LICENSE_1_0.txt
 
 [#sources]
 = Sources
+
+The following papers and blog posts serve as the basis for the algorithms used in the library:
+
 :idprefix:
 :linkattrs:
 
diff --git a/doc/charconv/to_chars.adoc b/doc/charconv/to_chars.adoc
index 687dc36..bb41851 100644
--- a/doc/charconv/to_chars.adoc
+++ b/doc/charconv/to_chars.adoc
@@ -9,7 +9,12 @@ https://www.boost.org/LICENSE_1_0.txt
 
 == to_chars overview
 
-`to_chars` is a set of functions that attempts to convert `value` into a character buffer specified by `[first, last)`. The result of `to_chars` is `to_chars_result` which on success returns `ptr` equal to one-past-the-end of the characters written and `ec == std::errc()` and on failure returns `std::errc::result_out_of_range` and `ptr == last`.
+`to_chars` is a set of functions that attempt to convert `value` into a character buffer specified by `[first, last)`.
+The result of `to_chars` is `to_chars_result`, which on success returns `ptr` equal to one-past-the-end of the characters written and `ec == std::errc()`, and on failure returns `ec == std::errc::result_out_of_range` and `ptr == last`.
+`to_chars` does not null-terminate the characters it writes.
+
+== Definitions
+[#to_chars_definitions_]
 
 [source, c++]
 ----
@@ -36,14 +41,7 @@ to_chars_result to_chars(char* first, char* last, Real value, chars_format fmt =
 }} // Namespace boost::charconv
 ----
 
-== to_chars_result
-* `ptr` - On return from `to_chars` points to one-past-the-end of the characters written on success or `last` on failure
-* `ec` - the error code from `to_chars`. Returned values are:
-** `std::errc()` - successful parsing
-** `std::errc::result_out_of_range` - result out of range (e.g. overflow)
-* `operator==` - compares the value of ptr and ec for equality
-
-== to_chars
+== to_chars parameters
 * `first, last` - pointers to the beginning and end of the character buffer
 * `value` - the value to be parsed into the buffer
 * `base` (integer only) - the integer base to use. Must be between 2 and 36 inclusive
@@ -51,14 +49,30 @@ to_chars_result to_chars(char* first, char* last, Real value, chars_format fmt =
 See <> for description.
 * `precision` (float only) - the number of decimal places required
 
-=== to_chars for integral types
+== to_chars_result
+* `ptr` - On return from `to_chars` points to one-past-the-end of the characters written on success or `last` on failure
+* `ec` - https://en.cppreference.com/w/cpp/error/errc[the error code]. Values returned by `to_chars` are:
+
+|===
+|Return Value | Description
+|`std::errc()` | Successful conversion
+| `std::errc::result_out_of_range` | Overflow: the result does not fit in the buffer `[first, last)`
+|===
+
+* `operator==` - compares the value of ptr and ec for equality
+
+== Usage Notes
+
+=== Usage notes for to_chars for integral types
+[#integral_usage_notes_]
 * All built-in integral types are allowed except bool which is deleted
 * from_chars for integral type is constexpr (BOOST_CHARCONV_CONSTEXPR is defined) when:
 ** compiled using `-std=c++14` or newer
 ** using a compiler with `\__builtin_ is_constant_evaluated`
 * These functions have been tested to support `\__int128` and `unsigned __int128`
 
-=== to_chars for floating point types
+=== Usage notes for to_chars for floating point types
 * The following will be returned when handling different values of `NaN`
 ** `qNaN` returns "nan"
 ** `-qNaN` returns "-nan(ind)"
@@ -67,6 +81,7 @@ See <> for description.
 * These functions have been tested to support all built-in floating-point types and those from C++23's ``
 ** Long doubles can be 64, 80, or 128-bit, but must be IEEE 754 compliant. An example of a non-compliant, and therefore unsupported, format is `ibm128`.
 ** Use of `__float128` or `std::float128_t` requires compiling with `-std=gnu++xx` and linking GCC's `libquadmath`.
+Linking `libquadmath` is done automatically when building with CMake.
 
 == Examples
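The to_chars.adoc text above makes two points that shape the calling pattern:

- `to_chars` does not null-terminate its output, and signals failure through `ec` with `ptr == last`.
- `from_chars` stops at the first character that is not part of the number.

Below is a minimal sketch of that pattern. It is written against the C++17 standard `<charconv>` functions `std::to_chars` and `std::from_chars` so that it compiles without Boost. The overview says Boost.Charconv can be used in place of `<charconv>`, so switching to `boost::charconv::` should mainly be a header and namespace change, but treat that as an assumption. `to_c_string` and `parse_whole_int` are hypothetical helper names.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstring>
#include <system_error>

// Hypothetical helper: writes `value` in `base` into buf[0, size) and appends
// a '\0', because to_chars never writes a terminator itself.
// Returns false if the digits plus the terminator do not fit.
bool to_c_string(int value, char* buf, std::size_t size, int base = 10)
{
    if (size == 0)
        return false;

    // Give to_chars one byte less than the buffer so the terminator always fits.
    std::to_chars_result r = std::to_chars(buf, buf + size - 1, value, base);
    if (r.ec != std::errc())
        return false;   // on failure r.ptr == buf + size - 1 and the contents are unspecified

    *r.ptr = '\0';      // r.ptr is one past the last character written
    return true;
}

// Hypothetical helper: succeeds only if the entire string is a base-10 int.
// from_chars reports success after a valid prefix and leaves r.ptr at the
// first unparsed character, so whole-string validation needs r.ptr == last.
bool parse_whole_int(const char* str, int& out)
{
    const char* last = str + std::strlen(str);
    std::from_chars_result r = std::from_chars(str, last, out);
    return r.ec == std::errc() && r.ptr == last;
}
```

The helpers compare `ec` only against `std::errc()`, so they do not depend on which error value a failure reports. `std::to_chars` reports a too-small buffer as `std::errc::value_too_large`, while this documentation describes `std::errc::result_out_of_range` for Boost.Charconv.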