stats#

Selector helpers for bookkeeping of selection and event weight statistics for aggregation over datasets.

Classes:

increment_stats(*args, **kwargs)

increment_event_stats(*args, **kwargs)

class increment_stats(*args, **kwargs)[source]#

Bases: Selector

Attributes:

Methods:

call_func(events, results, stats[, ...])

Unexposed selector that does not actually select objects but instead increments selection metrics in a given dictionary stats, based on a chunk of events and the corresponding selection results.

setup_func(task, reqs, inputs, ...)

skip_func(**kwargs)

update_cls_dict(cls_name, cls_dict, get_attr)

cache_instances = True#
call_force = True#
call_func(events, results, stats, weight_map=None, group_map=None, group_combinations=None, skip_func=None, **kwargs)#

Unexposed selector that does not actually select objects but instead increments selection metrics in a given dictionary stats, based on a chunk of events and the corresponding selection results.

Return type:

tuple[ak.Array, SelectionResult]

A weight_map can be defined to configure the actual fields to be added. The key of each entry should start with either "num", to state that it refers to a plain number of events, or "sum", to state that the field describes the sum of a specific column (usually weights). Different types of values are accepted, depending on the type of “operation”:

  • "num": An event mask, or an Ellipsis to select all events.

  • "sum": Either a column to sum over, or a 2-tuple containing the column to sum and an event mask to only sum over certain events.

Example:

# weight map definition
weight_map = {
    # "num" operations
    "num_events": Ellipsis,  # all events
    "num_events_selected": results.event,  # selected events only
    # "sum" operations
    "sum_mc_weight": events.mc_weight,  # weights of all events
    "sum_mc_weight_selected": (events.mc_weight, results.event),  # weights of selected events
}

# usage within an exposed selector
# (where results are generated, and events and stats were passed by SelectEvents)
self[increment_stats](events, results, stats, weight_map=weight_map, **kwargs)
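The "num" and "sum" rules above can be illustrated with a minimal, library-free sketch. It is not the columnflow implementation: `toy_increment` and its `n_events` argument are hypothetical, and plain Python lists stand in for the event columns and masks.

```python
def toy_increment(stats, n_events, weight_map):
    """Hypothetical helper mimicking the "num"/"sum" rules on plain lists."""
    for name, obj in weight_map.items():
        if name.startswith("num"):
            # obj is an event mask, or Ellipsis to count every event
            count = n_events if obj is Ellipsis else sum(1 for m in obj if m)
            stats[name] = stats.get(name, 0) + count
        elif name.startswith("sum"):
            # obj is a column, or a 2-tuple of (column, event mask)
            column, mask = obj if isinstance(obj, tuple) else (obj, Ellipsis)
            if mask is Ellipsis:
                mask = [True] * len(column)
            total = sum(w for w, m in zip(column, mask) if m)
            stats[name] = stats.get(name, 0.0) + total
        else:
            raise ValueError(f"key {name!r} must start with 'num' or 'sum'")
    return stats


# toy chunk of four events (made-up values)
mc_weight = [1.0, 2.0, 0.5, 1.5]
selected = [True, False, True, True]

stats = {}
toy_increment(stats, len(mc_weight), {
    "num_events": Ellipsis,
    "num_events_selected": selected,
    "sum_mc_weight": mc_weight,
    "sum_mc_weight_selected": (mc_weight, selected),
})
print(stats)
```

Calling the helper again with another chunk and the same stats dictionary adds to the existing entries, which is the aggregation behavior the text describes.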

Each sum of weights can also be extracted for each unique element in a so-called group, such as per process id, or per jet multiplicity bin. For this purpose, a group_map can be defined, mapping the name of a group (e.g. "process" or "njet") to a dictionary with the fields

  • "values", unique values to loop over,

  • "mask_fn", a function that is supposed to return a mask given a single value, and

  • "combinations_only" (optional), a boolean flag (False by default) that decides whether this group is not to be evaluated on its own, but only as part of a combination with other groups (see below).

Example:

group_map = {
    "process": {
        "values": events.process_id,
        "mask_fn": (lambda v: events.process_id == v),
    },
    "njet": {
        "values": results.x.n_jets,
        "mask_fn": (lambda v: results.x.n_jets == v),
    },
}

Based on the weight_map in the example above, this will result in eight additional fields in stats, e.g. "sum_mc_weight_per_process", "sum_mc_weight_selected_per_process", "sum_mc_weight_per_njet", "sum_mc_weight_selected_per_njet", etc. (and likewise for the "num" fields). Each of these new fields refers to a dictionary whose keys correspond to the unique values defined in the group_map above.
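As a rough illustration of the per-group layout, the sketch below fills one such field by hand. It assumes plain lists instead of event arrays, made-up process ids, and dictionary keys that are the raw group values; the key type used by the real selector may differ.

```python
# toy chunk of four events (made-up values)
process_id = [100, 100, 200, 200]
mc_weight = [1.0, 2.0, 0.5, 1.5]

group_map = {
    "process": {
        # unique values to loop over
        "values": sorted(set(process_id)),
        # returns a per-event mask for a single value
        "mask_fn": (lambda v: [p == v for p in process_id]),
    },
}

stats = {}
for group_name, group in group_map.items():
    # one dictionary per (weight, group) pair, keyed by group value
    per_group = stats.setdefault(f"sum_mc_weight_per_{group_name}", {})
    for v in group["values"]:
        mask = group["mask_fn"](v)
        total = sum(w for w, m in zip(mc_weight, mask) if m)
        per_group[v] = per_group.get(v, 0.0) + total

print(stats)
```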

In addition, combinations of groups can be configured using group_combinations. It accepts a sequence of tuples whose elements should be names of groups defined in group_map. As the name suggests, combinations of all possible values between groups are evaluated and stored in a nested dictionary.

Example:

group_combinations = [("process", "njet")]

In this case, stats will obtain additional fields, such as "sum_mc_weight_per_process_and_njet" and "sum_mc_weight_selected_per_process_and_njet", referring to nested dictionaries whose structure depends on the exact order of group names per tuple.

To reduce the number of entries in stats while still making use of this combinatorics feature, a skip_func can be defined that receives the weight name and the names of the groups of an entry. If the function returns True, the entry will be skipped.
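The nesting and the role of skip_func can be sketched as follows. This is only an illustration under assumptions: the positional signature `skip_func(weight_name, group_names)` is hypothetical, plain lists replace the event arrays, and the way masks are combined (a logical AND across groups) is inferred from the description rather than taken from the implementation.

```python
from itertools import product

# toy chunk of four events (made-up values)
process_id = [100, 100, 200, 200]
n_jets = [2, 3, 2, 2]
mc_weight = [1.0, 2.0, 0.5, 1.5]

group_map = {
    "process": {"values": [100, 200], "mask_fn": lambda v: [p == v for p in process_id]},
    "njet": {"values": [2, 3], "mask_fn": lambda v: [n == v for n in n_jets]},
}
group_combinations = [("process", "njet")]


def skip_func(weight_name, group_names):
    # hypothetical rule: skip plain event counts for group combinations
    return weight_name.startswith("num") and len(group_names) > 1


stats = {}
weight_name = "sum_mc_weight"
for group_names in group_combinations:
    if skip_func(weight_name, group_names):
        continue
    key = f"{weight_name}_per_" + "_and_".join(group_names)
    # loop over all value combinations between the groups
    for values in product(*(group_map[g]["values"] for g in group_names)):
        masks = [group_map[g]["mask_fn"](v) for g, v in zip(group_names, values)]
        combined = [all(ms) for ms in zip(*masks)]
        # descend into nested dictionaries following the group order
        node = stats.setdefault(key, {})
        for v in values[:-1]:
            node = node.setdefault(v, {})
        total = sum(w for w, m in zip(mc_weight, combined) if m)
        node[values[-1]] = node.get(values[-1], 0.0) + total

print(stats)
```

Swapping the tuple to ("njet", "process") would change both the field name and the nesting order, which is why the order of group names matters.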

data_only = False#
mc_only = False#
setup_func(task, reqs, inputs, reader_targets, **kwargs)#
Return type:

None

skip_func(**kwargs) → bool#
Return type:

bool

static update_cls_dict(cls_name, cls_dict, get_attr)#
class increment_event_stats(*args, **kwargs)[source]#

Bases: Selector

Attributes:

Methods:

call_func(events, results, stats, **kwargs)

Simplified version of increment_stats that only increments the number of events and the number of selected events.

skip_func(**kwargs)

update_cls_dict(cls_name, cls_dict, get_attr)

cache_instances = True#
call_force = True#
call_func(events, results, stats, **kwargs)#

Simplified version of increment_stats that only increments the number of events and the number of selected events.

Return type:

tuple[Array, SelectionResult]
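In effect, the simplified selector corresponds to a weight_map with only two "num" entries. The sketch below mimics that on plain lists; the helper name is hypothetical, and the field names "num_events" and "num_events_selected" are an assumption borrowed from the weight_map example of increment_stats.

```python
def toy_increment_event_stats(stats, event_mask):
    """Hypothetical stand-in: count all events and the selected ones."""
    n_events = len(event_mask)
    n_selected = sum(1 for m in event_mask if m)
    stats["num_events"] = stats.get("num_events", 0) + n_events
    stats["num_events_selected"] = stats.get("num_events_selected", 0) + n_selected
    return stats


# two chunks of events sharing one stats dictionary (made-up masks)
stats = {}
toy_increment_event_stats(stats, [True, False, True])
toy_increment_event_stats(stats, [False, True])
print(stats)
```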

data_only = False#
mc_only = False#
produces = {<class 'columnflow.selection.stats.increment_stats'>}#
skip_func(**kwargs) → bool#
Return type:

bool

static update_cls_dict(cls_name, cls_dict, get_attr)#
uses = {<class 'columnflow.selection.stats.increment_stats'>}#