stats

`stats`#

Selector helpers for bookkeeping of selection and event weight statistics, to be aggregated over datasets.

Classes:

`increment_stats`(*args, **kwargs)
`increment_event_stats`(*args, **kwargs)

class increment_stats(*args, **kwargs)#

Bases: Selector

Attributes:

`call_force`
`data_only`
`mc_only`
`nominal_only`
`shifts_only`

Methods:

call_func(events, results, stats[, ...])

Unexposed selector that does not select objects itself, but instead increments selection metrics in a given dictionary stats, given a chunk of events and the corresponding selection results.

setup_func(reqs, inputs, reader_targets)

Return type:: None

call_force = True#

call_func(events, results, stats, weight_map=None, group_map=None, group_combinations=None, **kwargs)#

Return type:: tuple[ak.Array, SelectionResult]

A weight_map can be defined to configure the actual fields to be added. The key of each entry should start either with "num", to state that it refers to a plain number of events, or with "sum", to state that the field describes the sum of a specific column (usually weights). Different types of values are accepted, depending on the type of "operation":

"num": An event mask, or an Ellipsis to select all events.

"sum": Either a column to sum over, or a 2-tuple containing the column to sum and an event mask to only sum over certain events.

Example:

# weight map definition
weight_map = {
    # "num" operations
    "num_events": Ellipsis,  # all events
    "num_events_selected": results.event,  # selected events only
    # "sum" operations
    "sum_mc_weight": events.mc_weight,  # weights of all events
    "sum_mc_weight_selected": (events.mc_weight, results.event),  # weights of selected events
}

# usage within an exposed selector
# (where results are generated, and events and stats were passed by SelectEvents)
self[increment_stats](events, results, stats, weight_map=weight_map, **kwargs)
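The "num" and "sum" semantics described above can be sketched in plain NumPy. This is a minimal illustration rather than the selector's actual implementation: the helper `increment` is hypothetical, NumPy arrays stand in for the awkward arrays used in practice, and the accumulation into `stats` across two chunks is an assumption about how the counters add up.

```python
import numpy as np


def increment(stats, weight_map, n_events):
    """Hypothetical helper mimicking the documented "num" / "sum" semantics."""
    for name, obj in weight_map.items():
        if name.startswith("num"):
            # an Ellipsis selects all events, otherwise obj is a boolean event mask
            n = n_events if obj is Ellipsis else int(np.sum(obj))
            stats[name] = stats.get(name, 0) + n
        elif name.startswith("sum"):
            # either a plain column, or a (column, mask) 2-tuple
            column, mask = obj if isinstance(obj, tuple) else (obj, Ellipsis)
            stats[name] = stats.get(name, 0.0) + float(np.sum(column[mask]))
        else:
            raise ValueError(f"key '{name}' must start with 'num' or 'sum'")
    return stats


# toy chunk of four events (stand-ins for events.mc_weight and results.event)
mc_weight = np.array([1.0, 2.0, 0.5, 1.5])
selected = np.array([True, False, True, True])

weight_map = {
    "num_events": Ellipsis,
    "num_events_selected": selected,
    "sum_mc_weight": mc_weight,
    "sum_mc_weight_selected": (mc_weight, selected),
}

stats = {}
increment(stats, weight_map, len(mc_weight))  # first chunk
increment(stats, weight_map, len(mc_weight))  # same chunk again, to show accumulation
print(stats)
```

Because the same dictionary is updated for every chunk, the values in `stats` grow into totals that can later be aggregated over a dataset.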

Each sum of weights can also be extracted for each unique element in a so-called group, such as per process id, or per jet multiplicity bin. For this purpose, a group_map can be defined, mapping the name of a group (e.g. "process" or "njet") to a dictionary with the fields:

"values", unique values to loop over,

"mask_fn", a function that is supposed to return a mask given a single value, and

"combinations_only" (optional), a boolean flag (False by default) that decides whether this group is evaluated only as part of a combination with other groups rather than on its own (see below).

Example:

group_map = {
    "process": {
        "values": events.process_id,
        "mask_fn": (lambda v: events.process_id == v),
    },
    "njet": {
        "values": results.x.n_jets,
        "mask_fn": (lambda v: results.x.n_jets == v),
    },
}

Based on the weight_map in the example above, this will result in eight additional fields in stats, e.g. "sum_mc_weight_per_process", "sum_mc_weight_selected_per_process", "sum_mc_weight_per_njet", "sum_mc_weight_selected_per_njet", etc. (and likewise for the "num" fields). Each of these new fields refers to a dictionary with keys corresponding to the unique values defined in the group_map above.
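To make the shape of such a per-group field concrete, the following sketch builds a dictionary like the one stored under "sum_mc_weight_per_process". It is an illustration under assumptions, not the library's code: NumPy arrays replace the event columns, unique values are obtained with `np.unique`, and keys are converted to plain Python integers.

```python
import numpy as np

# toy columns standing in for events.process_id and events.mc_weight
process_id = np.array([100, 100, 200, 200, 200])
mc_weight = np.array([1.0, 2.0, 0.5, 1.5, 2.0])

# one group entry, shaped like the "process" entry of a group_map
group = {
    "values": process_id,
    "mask_fn": (lambda v: process_id == v),
}

# assumed evaluation: loop over unique values and sum the weights passing each mask
sum_mc_weight_per_process = {}
for v in np.unique(group["values"]):
    mask = group["mask_fn"](v)
    sum_mc_weight_per_process[int(v)] = float(np.sum(mc_weight[mask]))

print(sum_mc_weight_per_process)  # {100: 3.0, 200: 4.0}
```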

In addition, combinations of groups can be configured using group_combinations. It accepts a sequence of tuples whose elements should be names of groups defined in group_map. As the name suggests, combinations of all possible values between groups are evaluated and stored in a nested dictionary.

Example:

group_combinations = [("process", "njet")]

In this case, stats will obtain additional fields, such as "sum_mc_weight_per_process_and_njet" and "sum_mc_weight_selected_per_process_and_njet", referring to nested dictionaries whose structure depends on the exact order of group names per tuple.
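The nesting can be pictured with a small sketch for the combination ("process", "njet"). It assumes that the outer dictionary level follows the first group name in the tuple and the inner level the second, that all value combinations are formed with `itertools.product`, and that the per-group masks are combined with a logical AND. None of this is taken from the actual implementation.

```python
import itertools

import numpy as np

# toy columns standing in for events.process_id, results.x.n_jets and events.mc_weight
process_id = np.array([100, 100, 200, 200, 200])
n_jets = np.array([2, 3, 2, 2, 3])
mc_weight = np.array([1.0, 2.0, 0.5, 1.5, 2.0])

group_map = {
    "process": {"values": process_id, "mask_fn": (lambda v: process_id == v)},
    "njet": {"values": n_jets, "mask_fn": (lambda v: n_jets == v)},
}
combination = ("process", "njet")

nested = {}
unique_values = [np.unique(group_map[name]["values"]) for name in combination]
for values in itertools.product(*unique_values):
    # AND the masks of all groups taking part in the combination
    mask = np.ones(len(mc_weight), dtype=bool)
    for name, v in zip(combination, values):
        mask &= group_map[name]["mask_fn"](v)
    # descend one dictionary level per group, in the order given by the tuple
    d = nested
    for v in values[:-1]:
        d = d.setdefault(int(v), {})
    d[int(values[-1])] = float(np.sum(mc_weight[mask]))

print(nested)  # {100: {2: 1.0, 3: 2.0}, 200: {2: 2.0, 3: 2.0}}
```

Swapping the tuple to ("njet", "process") would, under the same assumptions, put the jet multiplicity on the outer level instead.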

data_only = False#

mc_only = False#

nominal_only = False#

setup_func(reqs, inputs, reader_targets)#

Return type:: None

shifts_only = None#

class increment_event_stats(*args, **kwargs)#

Bases: Selector

Attributes:

`call_force`
`data_only`
`mc_only`
`nominal_only`
`produces`
`shifts_only`
`uses`

Methods:

call_func(events, results, stats, **kwargs)

Simplified version of increment_stats that only increments the number of events and the number of selected events.

call_force = True#

call_func(events, results, stats, **kwargs)#

Simplified version of increment_stats that only increments the number of events and the number of selected events.

Return type:: tuple[Array, SelectionResult]
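As a rough picture of what "only the number of events and the number of selected events" amounts to, consider the sketch below. The function `count_events` is a hypothetical stand-in, and the field names "num_events" and "num_events_selected" are assumed here by analogy with the weight_map example of increment_stats.

```python
import numpy as np


def count_events(stats, event_mask):
    """Hypothetical stand-in that updates two counters for one chunk."""
    stats["num_events"] = stats.get("num_events", 0) + len(event_mask)
    stats["num_events_selected"] = (
        stats.get("num_events_selected", 0) + int(np.sum(event_mask))
    )
    return stats


stats = {}
# two chunks with their boolean event selection masks (stand-ins for results.event)
for chunk_mask in (
    np.array([True, False, True]),
    np.array([False, False, True, True]),
):
    count_events(stats, chunk_mask)

print(stats)  # {'num_events': 7, 'num_events_selected': 4}
```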

data_only = False#

mc_only = False#

nominal_only = False#

produces = {<class 'columnflow.selection.stats.increment_stats'>}#

shifts_only = None#

uses = {<class 'columnflow.selection.stats.increment_stats'>}#
