diff --git a/doc/getting_started.qbk b/doc/getting_started.qbk index 94b0b19c..b0dcf29b 100644 --- a/doc/getting_started.qbk +++ b/doc/getting_started.qbk @@ -1,4 +1,4 @@ -[section Getting started] +[section:getting_started Getting started] To get you started quickly, here are some heavily commented examples to copy paste from. If you prefer a more traditional, structured exposition, check out the [link histogram.guide full user guide]. @@ -220,7 +220,7 @@ except ImportError: [section Make and use a 1d-histogram in Python without Numpy] -Building the library with Numpy support is highly recommended, but just for completeness, here is an example on how to use the library without Numpy support. +Building the library with Numpy support is highly recommended, but here is an example on how to use the library without Numpy support for completeness. [python]`` import histogram as hg diff --git a/doc/guide.qbk b/doc/guide.qbk index 1577acb7..c98c8587 100644 --- a/doc/guide.qbk +++ b/doc/guide.qbk @@ -1,18 +1,28 @@ [section:guide User guide] -How to create and work with histograms is described here. This library is designed to make simple things simple, yet complex things possible. For a quick start, you don't need to read the complete user guide; have a look into the tutorial and the examples instead. This guide covers the basic and more advanced usage of the library. +How to create and work with histograms is described here. This library is designed to make simple things simple, yet complex things possible. For a quick start, you don't need to read the complete user guide; have a look at the [link histogram.getting_started Getting started] section. This guide covers the basic and more advanced usage of the library. + +[section Use case for multi-dimensional histograms] + +This library provides a class with a simple interface, which implements a general multi-dimensional histogram for multi-dimensional data. 
A histogram represents a finite number of discrete cells over the data space. A datum passed to the histogram is sorted into one of these cells, called bins here, and instead of storing the datum, the counter of the bin is incremented. Keeping the bin counts around requires much less memory than keeping all the original data around. + +Data can be one-dimensional or multi-dimensional. A multi-dimensional datum is an entity that has more than one describing feature. A point in space is an example. You need three values to describe a single point, the datum, and you probably want to use a 3-d histogram to capture a 3-d point distribution in space. + +The advantage of using a 3-d histogram over three separate 1-d histograms, one for each coordinate, is that the multi-dimensional histogram is able to capture more structure. For example, you could have a point distribution that looks like a checker board, alternating high and low density along each coordinate. Then the 1-d histograms would look like flat distributions, completely hiding the structure, while the 3-d histogram would retain the structure for further analysis. + +[endsect] [section C++ usage] [section Create a histogram] -This library provides a class with a simple interface, which implements a general multi-dimensional histogram for multi-dimensional input values. The histogram class comes in two variants with a common interface, see the [link histogram.rationale.histogram_types rationale] for more information. Using [classref boost::histogram::histogram] is recommended whenever possible. You need [classref boost::histogram::histogram] if: +The histogram class comes in two variants with a common interface, see the [link histogram.rationale.histogram_types rationale] for more information. Using [classref boost::histogram::histogram] is recommended when it is possible. 
You need [classref boost::histogram::histogram] if: -* you need to create histogram configurations based on input you only have at runtime +* you only know the histogram configurations at runtime, not at compile-time * you want to interoperate with Python -Use the factory function [funcref boost::histogram::make_static_histogram] (or [funcref boost::histogram::make_dynamic_histogram], respectively) to make histograms with default options. The default options make sure that the histogram is safe to use, very fast, and memory efficient. If you are curious about changing these options, have a look at the expert section below. +Use the factory function [funcref boost::histogram::make_static_histogram] (or [funcref boost::histogram::make_dynamic_histogram], respectively) to make histograms with default policies. The default policies make sure that the histogram is safe to use, very fast, and memory efficient. If you are curious about trying other policies or using your own, have a look at the expert section below. [c++]`` #include @@ -29,46 +39,6 @@ int main() { The function `make_static_histogram(...)` takes a variable number of axis objects as arguments. An axis object defines how input values are mapped to bins, which means that it defines the mapping function and the number bins. If you provide one axis, the histogram is one-dimensional. If you provide two, it is two-dimensional, and so on. -The library comes with a number of builtin axis classes (you can write your own, too, see [link histogram.concepts.axis axis concept]). The [classref boost::histogram::axis::regular regular axis] should be your default choice, because it is easy to use and fast. If you have a continous range of integers, the [classref boost::histogram::axis::integer integer axis] is faster. If you have data which wraps around, like angles, use a [classref boost::histogram::axis::circular circular axis]. 
- -Check the class descriptions of [classref boost::histogram::axis::regular regular axis], [classref boost::histogram::axis::variable variable axis], [classref boost::histogram::axis::circular circular axis], [classref boost::histogram::axis::integer integer axis], and [classref boost::histogram::axis::category category axis] for advice. See the [link histogram.rationale.axis_types rationale about axis types] for more information. - -In addition to the required parameters for an axis, you can provide an optional label as a string to any axis, which helps to remember what the axis is categorising. Example: you have census data and you want to investigate how yearly income correlates with age, you could do: - -[c++]`` -#include - -namespace bh = boost::histogram; - -int main() { - // create a 2d-histogram in default configuration with an "age" axis - // and an "income" axis - auto h = bh::make_static_histogram(bh::axis::regular<>(20, 0, 100, "age in years"), - bh::axis::regular<>(20, 0, 100, "yearly income in $1000")); - // do something with h -} -`` - -Without the labels it would be difficult to remember which axis was covering which quantity. Beware, for safety reasons, labels cannot be changed once the axis is created. Axes objects which differ in their label do not compare equal with `operator==`. - -By default, under- and overflow bins are added automatically for each axis range. Therefore, if you create an axis with 20 bins, the histogram will actually have 22 bins in that dimension. The two extra bins are very useful and in most cases you want to have them. 
However, if you know for sure that the input is strictly covered by the axis, you can disable them and save memory: - -[c++]`` -#include - -namespace bh = boost::histogram; - -int main() { - // create a 1d-histogram for dice throws with eye values from 1 to 6 - auto h = bh::make_static_histogram(bh::axis::integer<>(1, 7, "eyes", bh::axis::uoflow::off)); - // do something with h -} -`` - -Using a [classref boost::histogram::axis::integer integer axis] in this example is convenient, because the input values are integers and we want one bin for each eye value. The intervals in all axes are always semi-open, the last value is never included. That's why the upper end is 7 and not 6, here. This is similar to iterator -ranges from `begin` to `end`, where `end` is also not included. -[note The specialised [classref boost::histogram::axis::circular circular axis] never creates under- and overflow bins, because the axis is circular. The highest bin wrapps around to the lowest bin and vice versa, so there is no need for extra bins.] - When you work with [classref boost::histogram::histogram], you can also create a histogram from a run-time compiled collection of axis objects: [c++]`` @@ -87,47 +57,99 @@ int main() { } `` -[note In all these examples, memory for bin counters is allocated lazily, because the default policy [classref boost::histogram::adaptive_storage] is used. Allocation is deferred to the first call to `fill(...)`, which are described in the next section. Therefore memory allocation exceptions are not thrown when the histogram is created, but possibly later on the first fill.] +[note In all these examples, memory for bin counters is allocated lazily, because the default policy [classref boost::histogram::adaptive_storage] is used. Allocation is deferred to the first call to `fill(...)`, which are described in the next section. Therefore memory allocation exceptions are not thrown when the histogram is created, but possibly later on the first fill. 
This also gives you a chance to check how much memory the histogram will allocate and whether that is prohibitive. Use the method `bincount()` to see how many bins your axis layout requires. At the first fill, `bincount()` bytes will be allocated, which may grow again later when the size of the bin counters needs to be increased.] + +[section Axis configuration] + +The library comes with a number of builtin axis classes (you can write your own, too, see [link histogram.concepts.axis axis concept]). The [classref boost::histogram::axis::regular regular axis] should be your default choice, because it is easy to use and fast. If you have a continuous range of integers, the [classref boost::histogram::axis::integer integer axis] is faster. If you have data which wraps around, like angles, use a [classref boost::histogram::axis::circular circular axis]. + +[note All axes which define bins in terms of intervals always use semi-open intervals by convention. The last value is never included. For example, the axis `axis::integer<>(0, 3)` has three bins with intervals `[0, 1), [1, 2), [2, 3)`. To remember this, think of iterator ranges from `begin` to `end`, where `end` is also not included.] + +Check the class descriptions of [classref boost::histogram::axis::regular regular axis], [classref boost::histogram::axis::variable variable axis], [classref boost::histogram::axis::circular circular axis], [classref boost::histogram::axis::integer integer axis], and [classref boost::histogram::axis::category category axis] for advice. See the [link histogram.rationale.axis_types rationale about axis types] for more information. + +In addition to the required parameters for an axis, you can assign an optional label to any axis, which helps to remember what the axis is categorising. 
Example: if you have census data and want to investigate how yearly income correlates with age, you could do: + +[c++]`` +#include + +namespace bh = boost::histogram; + +int main() { + // create a 2d-histogram in default configuration with an "age" axis + // and an "income" axis + auto h = bh::make_static_histogram(bh::axis::regular<>(20, 0, 100, "age in years"), + bh::axis::regular<>(20, 0, 100, "yearly income in $1000")); + // do something with h +} +`` + +Without the labels it would be difficult to remember which axis was covering which quantity. Labels are the only property of an axis that can be changed later. Axis objects which differ in their label do not compare equal with `operator==`. + +By default, under- and overflow bins are added automatically for each axis range. Therefore, if you create an axis with 20 bins, the histogram will actually have 22 bins in that dimension. The two extra bins are very useful and in most cases you want to have them. However, if you know for sure that the input is strictly covered by the axis, you can disable them to save memory. This is done by passing an additional parameter to the axis constructor: + +[c++]`` +#include + +namespace bh = boost::histogram; + +int main() { + // create a 1d-histogram for dice throws with eye values from 1 to 6 + auto h = bh::make_static_histogram(bh::axis::integer<>(1, 7, "eyes", bh::axis::uoflow::off)); + // do something with h +} +`` + +We use an [classref boost::histogram::axis::integer integer axis] here, because the input values are integers. + +[note The specialised [classref boost::histogram::axis::circular circular axis] never creates under- and overflow bins, because the axis is circular. The highest bin wraps around to the lowest bin and vice versa, so there is no need for extra bins.] + +[endsect] [endsect] [section Fill a histogram with data] -The histogram (either type) supports three kinds of fills. 
- -* `fill(...)` initiates a normal fill, which increments an internal counter by one. - -* `fill(..., count(n))` initiates a fill, which increments an internal counter by the integer number `n`. - -* `fill(..., weight(x))` initiates a weighted fill, which increments an internal counter a weight `x` (a real number) when a value is in the bin range. - -Why weighted fills are sometimes useful is explained [link histogram.rationale.weights in the rationale]. This is mostly required in a scientific context. If you don't see the point, you can just ignore this type of call. Especially, do not use the form `fill(..., weight(x))` if you just wanted to avoid calling `fill(...)` repeatedly with the same arguments. Use `fill(..., count(n))` for that, because it is way more efficient. Apart for that, you are free to mix these calls in any order, meaning, you can start calling `fill(...)` and later switch to `fill(..., weight(x))` on the same histogram or vice versa. - -Here is an example which fills a 2d-histogram with 1000 pairs of normal distributed numbers taken from a generator: +After you have created the histogram, you want to insert potentially multi-dimensional data. This is done with the flexible `fill(...)` method, which you call in a handcrafted loop. 
The histogram supports three kinds of fills, as shown in the example: [c++]`` -// also see examples/example_2d.cpp #include -#include -#include -namespace br = boost::random; namespace bh = boost::histogram; int main() { - br::mt19937 gen; - br::normal_distribution<> norm; - auto h = bh::make_static_histogram( - bh::axis::regular<>(100, -5, 5, "x"), - bh::axis::regular<>(100, -5, 5, "y") - ); - for (int i = 0; i < 1000; ++i) - h.fill(norm(gen), norm(gen)); - // h is now filled + auto h = bh::make_dynamic_histogram(bh::axis::integer<>(0, 4), + bh::axis::regular<>(10, 0, 5)); + + // fill histogram, number of arguments must be equal to number of axes + h.fill(0, 4.1); // increases bin counter by one + h.fill(1, 1.2, bh::count(3)); // increases bin counter by 3 + h.fill(bh::count(3), 2, 2.3); // also increases bin counter by 3 + h.fill(3, 3.4, bh::weight(1.5)); // increases bin counter by weight 1.5 + h.fill(bh::weight(1.5), 3, 3.4); // does the same as the previous call + + // a dynamic histogram also supports fills from an iterator range while + // a static histogram does not allow it; using a range of wrong length + // is an error + std::vector<double> values = {4, 3.1}; + h.fill(values.begin(), values.end()); } `` -Here is a second example which using a weighted fill in a functional programming style. The input values are taken from a container: +Here is a breakdown regarding fills. + +* `fill(...)` initiates a normal fill, which takes `N` arguments, where `N` is equal to the number of axes of the histogram, finds the corresponding bin, and increments the counter of that bin by one. + +* `fill(..., count(n))` initiates a fill, which increments the associated bin counter by the integer number `n`. There are no restrictions on the position of the `count` helper class; it can appear at the beginning or the end or at any other position. Using more than one `count` instance is an error. 
+ +* `fill(..., weight(x))` initiates a weighted fill, which increments the value counter of the associated bin by a real number `x` and an associated variance counter by `x*x`. The position of the `weight` helper class is free, just like for `count`. Having more than one `weight` instance or `weight` and `count` in the same call is an error. + +You can freely mix the order of these calls. For example, you can start calling `fill(...)` and later switch to `fill(..., weight(x))` on the same histogram and vice versa. + +Why weighted fills are useful to some is explained [link histogram.rationale.weights in the rationale]. This is mostly required in a scientific context. If you don't see the point, you can just ignore this type of call. In particular, do not use a weighted fill if you just want to avoid calling `fill(...)` repeatedly with the same arguments. Use `fill(..., count(n))` for that, because it is more efficient. + +[note The first call to a weighted fill will internally cause a switch from integral bin counters to a new data structure, which holds two double counters per bin, one for the sum of weights (the value), and another for the sum of weights squared (the variance). This is necessary, because in case of weighted fills, the variance cannot be trivially computed from the integral count anymore.] + +In contrast to [@boost:/libs/accumulators/index.html Boost.Accumulators], this library asks you to write the filling loop yourself, because that is the most flexible solution for multi-dimensional data. If you prefer a functional programming style, you can use a lambda, as shown in this example. [c++]`` #include @@ -148,14 +170,69 @@ int main() { [endsect] -[section Work with a filled histogram] +[section Extract data from a filled histogram] -TODO: explain how to access values and variances, operators +Once the histogram has been filled, you want to access the bin counts at some point. 
You may want to visualize the histogram, or compute some quantities like the mean of the distribution approximately represented by the histogram. -The histogram provides a non-parametric variance estimate for the bin count in either case. +To access the count of each bin, you use a multi-dimensional index, which consists of a sequence of bin indices for the axes in order. You can use this index to access the value and variance of each bin, using the methods with the same name, as demonstrated in the next example. + +[c++]`` +#include +#include + +namespace bh = boost::histogram; + +int main() { + // make a histogram with 2 x 2 = 4 bins (not counting under- and overflow bins) + auto h = bh::make_dynamic_histogram(bh::axis::regular<>(2, -1, 1), + bh::axis::regular<>(2, 2, 4)); + h.fill(-0.5, 2.5, bh::count(1)); // low, low + h.fill(-0.5, 3.5, bh::count(2)); // low, high + h.fill( 0.5, 2.5, bh::count(3)); // high, low + h.fill( 0.5, 3.5, bh::weight(4)); // high, high + + // access value of bin count, number of arguments must be equal to number of axes + std::cout << h.value(0, 0) << " " // low, low + << h.value(0, 1) << " " // low, high + << h.value(1, 0) << " " // high, low + << h.value(1, 1) // high, high + << std::endl; + + // prints: 1 2 3 4 + + // access variance of bin count + std::cout << h.variance(0, 0) << " " // low, low + << h.variance(0, 1) << " " // low, high + << h.variance(1, 0) << " " // high, low + << h.variance(1, 1) // high, high + << std::endl; + + // prints: 1 2 3 16 + + // a dynamic histogram also supports access via an iterator range, while + // a static histogram does not allow it; using a range of wrong length + // is an error + std::vector idx(2); + idx = {0, 1}; + std::cout << h.value(idx.begin(), idx.end()) << " " + << h.variance(idx.begin(), idx.end()) << std::endl; +} +`` + +[note The numbers returned by `value(...)` and `variance(...)` are equal, as long as the bin has not been filled with a weighted datum. 
The internal structure, which handles the bin counters, has been optimised for this common case. It uses only a single counter per bin for normal fills, and only switches to two counters once the first weighted fill is performed.] + +[endsect] + +[section Supported arithmetic operators] Histograms can be added if they have the same signature. This is convenient if histograms are filled in parallel on a cluster and then merged (added). +[endsect] + +[section Streaming and serialization] + +The histogram provides a non-parametric variance estimate for the bin count in either case. + The histogram can be serialized to disk for persistent storage from C++ and pickled in Python. It comes with Numpy support, too. The histogram can be fast-filled with Numpy arrays for maximum performance, and viewed as a Numpy array without copying its memory buffer. [endsect] @@ -208,6 +285,8 @@ h.fill(v) # fills the histogram with each value in the array `fill(...)` accepts any sequence that can be converted into a numpy array with `dtype=float`. To get the best performance, avoid the conversion and work with such numpy arrays directly. +The histogram can be serialized to disk for persistent storage from C++ and pickled in Python. It comes with Numpy support, too. The histogram can be fast-filled with Numpy arrays for maximum performance, and viewed as a Numpy array without copying its memory buffer. 
+ [endsect] [endsect] diff --git a/doc/sync_code.py b/doc/sync_code.py index e5e885cd..2bcb30f4 100755 --- a/doc/sync_code.py +++ b/doc/sync_code.py @@ -8,13 +8,11 @@ def is_more_recent(a, b): out_dir = os.path.dirname(__file__) + "/../examples" -exi = 1 for qbk in glob.glob(os.path.dirname(__file__) + "/*.qbk"): base = os.path.splitext(os.path.basename(qbk))[0] - if base != "getting_started": continue with open(qbk) as fi: qbk_content = fi.read() - qbk_needs_update = False + exi = 1 for m in re.finditer("\[([^\]]+)\]``\n*", qbk_content): tag = m.group(1) start = m.end() @@ -26,7 +24,7 @@ for qbk in glob.glob(os.path.dirname(__file__) + "/*.qbk"): ext = "py" else: raise NotImplementedError("can only handle tags c++ and python") - foname = out_dir + "/%s_listing_%i.%s" % (base, exi, ext) + foname = out_dir + "/%s_listing_%02i.%s" % (base, exi, ext) if os.path.exists(foname): with open(foname) as fi: code2 = fi.read() @@ -34,13 +32,7 @@ for qbk in glob.glob(os.path.dirname(__file__) + "/*.qbk"): if is_more_recent(qbk, foname): with open(foname, "w") as fo: fo.write(code) - else: - qbk_content = qbk_content[:start] + code2 + qbk_content[end:] - qbk_needs_update = True else: with open(foname, "w") as fo: fo.write(code) exi += 1 - if qbk_needs_update: - with open(qbk, "w") as fo: - fo.write(qbk_content) diff --git a/examples/getting_started_listing_1.cpp b/examples/getting_started_listing_1.cpp deleted file mode 100644 index 45b0320c..00000000 --- a/examples/getting_started_listing_1.cpp +++ /dev/null @@ -1,87 +0,0 @@ -#include -#include - -int main(int, char**) { - namespace bh = boost::histogram; - using namespace bh::literals; // enables _c suffix - - /* - create a static 1d-histogram with an axis that has 10 equidistant - bins on the real line from -1.0 to 2.0, and label it as "x" - */ - auto h = bh::make_static_histogram( - bh::axis::regular<>(10, -1.0, 2.0, "x") - ); - - // fill histogram with data, typically this would happen in a loop - h.fill(-1.5); // put in 
underflow bin - h.fill(-1.0); // included in first bin, bin interval is semi-open - h.fill(-0.5); - h.fill(1.1); - h.fill(0.3); - h.fill(1.7); - h.fill(2.0); // put in overflow bin, bin interval is semi-open - h.fill(20.0); // put in overflow bin - - /* - use bh::count(N) if you would otherwise call h.fill(...) with - *same* argument N times, N is an integer argument - */ - h.fill(1.0, bh::count(4)); - - /* - do a weighted fill using bh::weight, which accepts a double - - don't mix this with bh::count, both have a different effect on the - variance (see Rationale for an explanation regarding weights) - - if you don't know what this is good for, use bh::count instead, - it is most likeliy what you want and it is more efficient - */ - h.fill(0.1, bh::weight(2.5)); - - /* - iterate over bins, loop excludes under- and overflow bins - - index 0_c is a compile-time number, the only way in C++ to make - axis(...) to return a different type for each index - - for-loop yields instances of `std::pair`, where - `bin_type` usually is a semi-open interval representing the bin, - whose edges can be accessed with methods `lower()` and `upper()`, - but the [bin type] depends on the axis, look it up in the reference - - `value(index)` method returns the bin count at index - - `variance(index)` method returns a variance estimate of the bin - count at index (see Rationale section for what this means) - */ - for (const auto& bin : h.axis(0_c)) { - std::cout << "bin " << bin.first << " x in [" - << bin.second.lower() << ", " << bin.second.upper() << "): " - << h.value(bin.first) << " +/- " - << std::sqrt(h.variance(bin.first)) - << std::endl; - } - - // accessing under- and overflow bins is easy, use indices -1 and 10 - std::cout << "underflow bin [" << h.axis(0_c)[-1].lower() - << ", " << h.axis(0_c)[-1].upper() << "): " - << h.value(-1) << " +/- " << std::sqrt(h.variance(-1)) - << std::endl; - std::cout << "overflow bin [" << h.axis(0_c)[10].lower() - << ", " << 
h.axis(0_c)[10].upper() << "): " - << h.value(10) << " +/- " << std::sqrt(h.variance(10)) - << std::endl; - - /* program output: - - bin 0 x in [-1, -0.7): 1 +/- 1 - bin 1 x in [-0.7, -0.4): 1 +/- 1 - bin 2 x in [-0.4, -0.1): 0 +/- 0 - bin 3 x in [-0.1, 0.2): 2.5 +/- 2.5 - bin 4 x in [0.2, 0.5): 1 +/- 1 - bin 5 x in [0.5, 0.8): 0 +/- 0 - bin 6 x in [0.8, 1.1): 4 +/- 2 - bin 7 x in [1.1, 1.4): 1 +/- 1 - bin 8 x in [1.4, 1.7): 0 +/- 0 - bin 9 x in [1.7, 2): 1 +/- 1 - underflow bin [-inf, -1): 1 +/- 1 - overflow bin [2, inf): 2 +/- 1.41421 - - */ -} diff --git a/examples/getting_started_listing_2.cpp b/examples/getting_started_listing_2.cpp deleted file mode 100644 index 4793694f..00000000 --- a/examples/getting_started_listing_2.cpp +++ /dev/null @@ -1,47 +0,0 @@ -#include -#include -#include -#include -#include - -namespace br = boost::random; -namespace bh = boost::histogram; - -int main() { - /* - create a dynamic histogram with the factory `make_dynamic_histogram` - - axis can be passed directly just like for `make_static_histogram` - - in addition, the factory also accepts iterators over a sequence of - axis::any, the polymorphic type that can hold concrete axis types - */ - std::vector> axes; - axes.emplace_back(bh::axis::category({"red", "blue"})); - axes.emplace_back(bh::axis::regular<>(5, -5, 5, "x")); - axes.emplace_back(bh::axis::regular<>(5, -5, 5, "y")); - auto h = bh::make_dynamic_histogram(axes.begin(), axes.end()); - - // fill histogram with random numbers - br::mt19937 gen; - br::normal_distribution<> norm; - for (int i = 0; i < 1000; ++i) - h.fill(i % 2 ? 
"red" : "blue", norm(gen), norm(gen)); - - /* - print dynamic histogram by iterating over bins - - for most axis types, the for loop looks just like for a static - histogram, except that we can pass runtime numbers, too - - if the [bin type] of the axis is not convertible to a - double interval, one needs to cast axis::any before looping; - this is here the case for the category axis - */ - using cas = bh::axis::category; - for (const auto& cbin : bh::axis::cast(h.axis(0))) { - std::printf("%s\n", cbin.second.c_str()); - for (const auto& ybin : h.axis(2)) { // rows - for (const auto& xbin : h.axis(1)) { // columns - std::printf("%3.0f ", h.value(cbin.first, xbin.first, ybin.first)); - } - std::printf("\n"); - } - } -} diff --git a/examples/getting_started_listing_2.py b/examples/getting_started_listing_2.py deleted file mode 100644 index 4d26e5b8..00000000 --- a/examples/getting_started_listing_2.py +++ /dev/null @@ -1,37 +0,0 @@ -import histogram as hg -import numpy as np - -# create 2d-histogram with two axes with 10 equidistant bins from -3 to 3 -h = hg.histogram(hg.axis.regular(10, -3, 3, "x"), - hg.axis.regular(10, -3, 3, "y")) - -# generate some numpy arrays with data to fill into histogram, -# in this case normal distributed random numbers in x and y -x = np.random.randn(1000) -y = 0.5 * np.random.randn(1000) - -# fill histogram with numpy arrays, this is very fast -h.fill(x, y) - -# get representations of the bin edges as Numpy arrays, this representation -# differs from `list(h.axis(0))`, because it is optimised for compatibility -# with existing Numpy code, i.e. 
to replace numpy.histogram -x = np.array(h.axis(0)) -y = np.array(h.axis(1)) - -# creates a view of the counts (no copy involved) -count_matrix = np.asarray(h) - -# cut off the under- and overflow bins (no copy involved) -reduced_count_matrix = count_matrix[:-2,:-2] - -try: - # draw the count matrix - import matplotlib.pyplot as plt - plt.pcolor(x, y, reduced_count_matrix.T) - plt.xlabel(h.axis(0).label) - plt.ylabel(h.axis(1).label) - plt.savefig("example_2d_python.png") -except ImportError: - # ok, no matplotlib, then just print it - print count_matrix diff --git a/examples/getting_started_listing_3.py b/examples/getting_started_listing_3.py deleted file mode 100644 index 2d5b4b03..00000000 --- a/examples/getting_started_listing_3.py +++ /dev/null @@ -1,52 +0,0 @@ -import histogram as hg -import numpy as np - -# create 2d-histogram with two axes with 10 equidistant bins from -3 to 3 -h = hg.histogram(hg.axis.regular(10, -3, 3, "x"), - hg.axis.regular(10, -3, 3, "y")) - -# generate some numpy arrays with data to fill into histogram, -# in this case normal distributed random numbers in x and y -x = np.random.randn(1000) -y = 0.5 * np.random.randn(1000) - -# fill histogram with numpy arrays, this is very fast -h.fill(x, y) - -# get representations of the bin edges as Numpy arrays, this representation -# differs from `list(h.axis(0))`, because it is optimised for compatibility -# with existing Numpy code, i.e. 
to replace numpy.histogram -x = np.array(h.axis(0)) -y = np.array(h.axis(1)) - -# creates a view of the counts (no copy involved) -count_matrix = np.asarray(h) - -# cut off the under- and overflow bins to not confuse matplotib (no copy) -reduced_count_matrix = count_matrix[:-2,:-2] - -try: - # draw the count matrix - import matplotlib.pyplot as plt - plt.pcolor(x, y, reduced_count_matrix.T) - plt.xlabel(h.axis(0).label) - plt.ylabel(h.axis(1).label) - plt.savefig("example_2d_python.png") -except ImportError: - # ok, no matplotlib, then just print the full count matrix - print count_matrix - - # output of the print looks something like this, the two right-most rows - # and two down-most columns represent under-/overflow bins - # [[ 0 0 0 1 5 0 0 1 0 0 0 0] - # [ 0 0 0 1 17 11 6 0 0 0 0 0] - # [ 0 0 0 5 31 26 4 1 0 0 0 0] - # [ 0 0 3 20 59 62 26 4 0 0 0 0] - # [ 0 0 1 26 96 89 16 1 0 0 0 0] - # [ 0 0 4 21 86 84 20 1 0 0 0 0] - # [ 0 0 1 24 71 50 15 2 0 0 0 0] - # [ 0 0 0 6 26 37 7 0 0 0 0 0] - # [ 0 0 0 0 11 10 2 0 0 0 0 0] - # [ 0 0 0 1 2 3 1 0 0 0 0 0] - # [ 0 0 0 0 0 2 0 0 0 0 0 0] - # [ 0 0 0 0 0 1 0 0 0 0 0 0]] diff --git a/examples/getting_started_listing_4.py b/examples/getting_started_listing_4.py deleted file mode 100644 index e2bab209..00000000 --- a/examples/getting_started_listing_4.py +++ /dev/null @@ -1,13 +0,0 @@ -import histogram as hg - -# make 1-d histogram with 5 logarithmic bins from 1e0 to 1e5 -h = hg.histogram(hg.axis.regular_log(5, 1e0, 1e5, "x")) - -# fill histogram with numbers -for x in (2e0, 2e1, 2e2, 2e3, 2e4): - h.fill(x, count=2) - -# iterate over bins and access bin counter -for idx, (lower, upper) in enumerate(h.axis(0)): - print "bin {0} x in [{1}, {2}): {3} +/- {4}".format( - idx, lower, upper, h.value(idx), h.variance(idx) ** 0.5) diff --git a/examples/guide_listing_1.cpp b/examples/guide_listing_1.cpp deleted file mode 100644 index b962238b..00000000 --- a/examples/guide_listing_1.cpp +++ /dev/null @@ -1,10 +0,0 @@ -#include - 
-namespace bh = boost::histogram; - -int main() { - // create a 1d-histogram in default configuration which - // covers the real line from -1 to 1 in 100 bins - auto h = bh::make_static_histogram(bh::axis::regular<>(100, -1, 1)); - // do something with h -} diff --git a/examples/guide_listing_10.py b/examples/guide_listing_10.py deleted file mode 100644 index 924c64f2..00000000 --- a/examples/guide_listing_10.py +++ /dev/null @@ -1,12 +0,0 @@ -import histogram as bh -import numpy as np - -h = bh.histogram(bh.axis.integer(0, 9)) - -# don't do this, it is very slow -for i in range(10): - h.fill(i) - -# do this instead, it is very fast -v = np.arange(10, dtype=float) -h.fill(v) # fills the histogram with each value in the array diff --git a/examples/guide_listing_2.cpp b/examples/guide_listing_2.cpp deleted file mode 100644 index b962238b..00000000 --- a/examples/guide_listing_2.cpp +++ /dev/null @@ -1,10 +0,0 @@ -#include - -namespace bh = boost::histogram; - -int main() { - // create a 1d-histogram in default configuration which - // covers the real line from -1 to 1 in 100 bins - auto h = bh::make_static_histogram(bh::axis::regular<>(100, -1, 1)); - // do something with h -} diff --git a/examples/guide_listing_3.cpp b/examples/guide_listing_3.cpp deleted file mode 100644 index a6c5e05e..00000000 --- a/examples/guide_listing_3.cpp +++ /dev/null @@ -1,11 +0,0 @@ -#include - -namespace bh = boost::histogram; - -int main() { - // create a 2d-histogram in default configuration with an "age" axis - // and an "income" axis - auto h = bh::make_static_histogram(bh::axis::regular<>(20, 0, 100, "age in years"), - bh::axis::regular<>(20, 0, 100, "yearly income in $1000")); - // do something with h -} diff --git a/examples/guide_listing_4.cpp b/examples/guide_listing_4.cpp deleted file mode 100644 index 93d4d5de..00000000 --- a/examples/guide_listing_4.cpp +++ /dev/null @@ -1,9 +0,0 @@ -#include - -namespace bh = boost::histogram; - -int main() { - // create a 1d-histogram for 
dice throws with eye values from 1 to 6 - auto h = bh::make_static_histogram(bh::axis::integer<>(1, 7, "eyes", bh::axis::uoflow::off)); - // do something with h -} diff --git a/examples/guide_listing_5.cpp b/examples/guide_listing_5.cpp deleted file mode 100644 index f7a04292..00000000 --- a/examples/guide_listing_5.cpp +++ /dev/null @@ -1,13 +0,0 @@ -#include -#include - -namespace bh = boost::histogram; - -int main() { - using hist_type = bh::histogram; - auto v = std::vector>(); - v.push_back(bh::axis::regular<>(100, -1, 1)); - v.push_back(bh::axis::integer<>(1, 7)); - auto h = hist_type(v.begin(), v.end()); - // do something with h -} diff --git a/examples/guide_listing_6.cpp b/examples/guide_listing_6.cpp deleted file mode 100644 index 7a9bb259..00000000 --- a/examples/guide_listing_6.cpp +++ /dev/null @@ -1,19 +0,0 @@ -// also see examples/example_2d.cpp -#include -#include -#include - -namespace br = boost::random; -namespace bh = boost::histogram; - -int main() { - br::mt19937 gen; - br::normal_distribution<> norm; - auto h = bh::make_static_histogram( - bh::axis::regular<>(100, -5, 5, "x"), - bh::axis::regular<>(100, -5, 5, "y") - ); - for (int i = 0; i < 1000; ++i) - h.fill(norm(gen), norm(gen)); - // h is now filled -} diff --git a/examples/guide_listing_7.cpp b/examples/guide_listing_7.cpp deleted file mode 100644 index 4afd4667..00000000 --- a/examples/guide_listing_7.cpp +++ /dev/null @@ -1,14 +0,0 @@ -#include -#include -#include - -namespace bh = boost::histogram; - -int main() { - auto h = bh::make_static_histogram(bh::axis::integer<>(0, 9)); - std::vector v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; - std::for_each(v.begin(), v.end(), - [&h](int x) { h.fill(x, bh::weight(2.0)); } - ); - // h is now filled -} diff --git a/examples/guide_listing_7.py b/examples/guide_listing_7.py deleted file mode 100644 index dffcb89c..00000000 --- a/examples/guide_listing_7.py +++ /dev/null @@ -1,10 +0,0 @@ -# also see examples/create_python_fill_cpp.py and 
examples/module_cpp_filler.cpp -import histogram as bh -import cpp_filler - -h = bh.histogram(bh.axis.regular(100, -1, 1), - bh.axis.integer(0, 10)) - -cpp_filler.process(h) # histogram is filled with input values - -# continue with statistical analysis of h diff --git a/examples/guide_listing_8.cpp b/examples/guide_listing_8.cpp deleted file mode 100644 index 4afd4667..00000000 --- a/examples/guide_listing_8.cpp +++ /dev/null @@ -1,14 +0,0 @@ -#include -#include -#include - -namespace bh = boost::histogram; - -int main() { - auto h = bh::make_static_histogram(bh::axis::integer<>(0, 9)); - std::vector v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; - std::for_each(v.begin(), v.end(), - [&h](int x) { h.fill(x, bh::weight(2.0)); } - ); - // h is now filled -} diff --git a/examples/guide_listing_8.py b/examples/guide_listing_8.py deleted file mode 100644 index dffcb89c..00000000 --- a/examples/guide_listing_8.py +++ /dev/null @@ -1,10 +0,0 @@ -# also see examples/create_python_fill_cpp.py and examples/module_cpp_filler.cpp -import histogram as bh -import cpp_filler - -h = bh.histogram(bh.axis.regular(100, -1, 1), - bh.axis.integer(0, 10)) - -cpp_filler.process(h) # histogram is filled with input values - -# continue with statistical analysis of h diff --git a/examples/guide_listing_9.py b/examples/guide_listing_9.py deleted file mode 100644 index 924c64f2..00000000 --- a/examples/guide_listing_9.py +++ /dev/null @@ -1,12 +0,0 @@ -import histogram as bh -import numpy as np - -h = bh.histogram(bh.axis.integer(0, 9)) - -# don't do this, it is very slow -for i in range(10): - h.fill(i) - -# do this instead, it is very fast -v = np.arange(10, dtype=float) -h.fill(v) # fills the histogram with each value in the array diff --git a/examples/module_cpp_filler.cpp b/examples/module_cpp_filler.cpp index b7d55054..c6d113e9 100644 --- a/examples/module_cpp_filler.cpp +++ b/examples/module_cpp_filler.cpp @@ -4,6 +4,8 @@ // (See accompanying file LICENSE_1_0.txt // or copy at 
http://www.boost.org/LICENSE_1_0.txt) +// also see python_fill_cpp.py + #include #include #include diff --git a/src/python/histogram.cpp b/src/python/histogram.cpp index a1c9adc9..4cbb4467 100644 --- a/src/python/histogram.cpp +++ b/src/python/histogram.cpp @@ -371,27 +371,27 @@ void register_histogram() { .add_property("dim", &dynamic_histogram::dim) .def("axis", histogram_axis, python::arg("i") = 0, ":param int i: axis index" - "\nReturns axis with index i.") + "\n:return: corresponding axis object") .def("fill", python::raw_function(histogram_fill), - "Pass N values where N is equal to the dimensions" - "\nof the histogram, and optionally another value with the keyword" - "\n*weight*. All values must be convertible to double." + ":param double args: values (number must match dimension)" + "\n:keyword double weight: optional weight" + "\n:keyword uint32_t count: optional count" "\n" "\nIf Numpy support is enabled, 1d-arrays can be passed instead of" "\nvalues, which must be equal in lenght. 
Arrays and values can" - "\nbe mixed in the same call.") + "\nbe mixed arbitrarily in the same call.") .add_property("bincount", &dynamic_histogram::bincount, - "Returns total number of bins, including under- and overflow.") + ":return: total number of bins, including under- and overflow") .add_property("sum", &dynamic_histogram::sum, - "Returns sum of all entries, including under- and overflow bins.") + ":return: sum of all entries, including under- and overflow bins") .def("value", python::raw_function(histogram_value), - ":param int args: indices of the bin" + ":param int args: indices of the bin (number must match dimension)" "\n:return: count for the bin") .def("variance", python::raw_function(histogram_variance), - ":param int args: indices of the bin" + ":param int args: indices of the bin (number must match dimension)" "\n:return: variance estimate for the bin") .def("__repr__", histogram_repr, - ":returns: string representation of the histogram") + ":return: string representation of the histogram") .def(python::self == python::self) .def(python::self += python::self) .def(python::self *= double()) diff --git a/test/histogram_test.cpp b/test/histogram_test.cpp index eaed9b56..0d69877d 100644 --- a/test/histogram_test.cpp +++ b/test/histogram_test.cpp @@ -779,15 +779,29 @@ int main() { // init { auto v = std::vector>(); - v.push_back(axis::regular<>(100, -1, 1)); + v.push_back(axis::regular<>(4, -1, 1)); v.push_back(axis::integer<>(1, 7)); - auto h = histogram(v.begin(), v.end()); - BOOST_TEST_EQ(h.axis(0_c), v[0]); - BOOST_TEST_EQ(h.axis(1_c), v[1]); + auto h = make_dynamic_histogram(v.begin(), v.end()); BOOST_TEST_EQ(h.axis(0), v[0]); BOOST_TEST_EQ(h.axis(1), v[1]); } + // using iterator ranges + { + auto h = make_dynamic_histogram(axis::regular<>(2, -1, 1), + axis::regular<>(2, 2, 4)); + auto v = std::vector(2); + v = {-0.5, 2.5}; + h.fill(v.begin(), v.end()); + v = { 0.5, 3.5}; + h.fill(v.begin(), v.end()); + auto i = std::vector(2); + i = {0, 0}; + 
BOOST_TEST_EQ(h.value(i.begin(), i.end()), 1); + i = {1, 1}; + BOOST_TEST_EQ(h.variance(i.begin(), i.end()), 1); + } + // axis methods { enum { A, B };