columnflow.config_util

Collection of general helpers and utilities.

Functions:

get_events_from_categories(events, categories)

Helper function that returns all events from an awkward array events that are categorized into one of the leaves of one of the categories.

get_category_name_columns(category_ids, ...)

Function that transforms column of category ids to column of category names.

get_root_processes_from_campaign(campaign)

Extracts all root process objects from datasets contained in an order campaign and returns them in a unique object index.

get_datasets_from_process(config, process[, ...])

Given a process and the config it belongs to, returns a list of order dataset objects that contain matching processes.

add_shift_aliases(config, shift_source, aliases)

Extracts the two up and down shift instances from a config corresponding to a shift_source (i.e. the name of a shift without directions) and assigns aliases to their auxiliary data.

get_shift_from_configs(configs, shift[, silent])

Given a list of configs and a shift name or instance, returns the corresponding shift instance from the first config that contains it.

get_shifts_from_sources(config, *shift_sources)

Takes a config object and returns a list of shift instances for both directions given a sequence shift_sources.

group_shifts(shifts)

Takes several order.Shift instances shifts and groups them according to their shift source.

expand_shift_sources(shifts)

Given a sequence shifts containing either shift names (<source>_<direction>) or shift sources, the latter ones are expanded with both possible directions and returned in a common list.

create_category_id(config, category_name[, ...])

Creates a unique id for an order.Category named category_name in an order.Config object config and returns it.

add_category(config[, parent])

Creates an order.Category instance by forwarding all kwargs to its constructor, adds it to a parent object, and returns it.

create_category_combinations(config, ...[, ...])

Given a config object and sequences of categories in a dict, creates all combinations of possible leaf categories at different depths, connects them with parent - child relations (see order.Category) and returns the number of newly created categories.

verify_config_processes(config[, warn])

Verifies for all datasets contained in a config object that the linked processes are covered by any process object registered in config and raises an exception if not.

Classes:

CategoryGroup(categories, is_complete, ...)

Container to store information about a group of categories, mostly used for creating combinations in create_category_combinations().

get_events_from_categories(events, categories, config_inst=None)

Helper function that returns all events from an awkward array events that are categorized into one of the leaves of one of the categories.

Parameters:
  • events (ak.Array) – Awkward array. Requires the ‘category_ids’ field to be present.

  • categories (Sequence[str | od.Category]) – Sequence of category instances. Can also be a sequence of strings when passing a config_inst.

  • config_inst (od.Config | None, default: None) – Optional config instance to load category instances.

Raises:

ValueError – If “category_ids” is not present in the events fields.

Return type:

ak.Array

Returns:

Awkward array of all events that are categorized into one of the leaves of one of the categories.
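The selection described above can be pictured with a minimal pure-Python sketch. It is not columnflow's implementation: events are modeled as a list of dicts instead of an awkward array, the leaf category ids are passed in directly rather than resolved from category objects, and the names select_events and leaf_ids are hypothetical.

```python
def select_events(events, leaf_ids):
    """Keep events whose category_ids contain at least one of leaf_ids."""
    wanted = set(leaf_ids)
    selected = []
    for event in events:
        # mirror the documented ValueError for a missing field
        if "category_ids" not in event:
            raise ValueError("events must provide a 'category_ids' field")
        if wanted.intersection(event["category_ids"]):
            selected.append(event)
    return selected


# hypothetical events and category ids, for illustration only
events = [
    {"pt": 25.0, "category_ids": [1, 11]},
    {"pt": 40.0, "category_ids": [2, 21]},
    {"pt": 31.5, "category_ids": [1, 12]},
]
picked = select_events(events, leaf_ids=[11, 12])
```

Here picked holds the first and third event, since only those carry one of the requested leaf ids.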

get_category_name_columns(category_ids, config_inst)

Function that transforms column of category ids to column of category names.

Parameters:
  • category_ids (Array) – Awkward array of category ids.

  • config_inst (Config) – Config instance from which to load category instances.

Raises:

ValueError – If any of the category ids is not defined in the config_inst.

Return type:

Array

Returns:

Awkward array of category names with the same shape as category_ids
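As a rough illustration of the id-to-name transformation, the following sketch uses nested Python lists in place of an awkward array and a plain dictionary in place of the config lookup. The helper ids_to_names and the mapping id_to_name are hypothetical and only mimic the documented behavior, including the ValueError for unknown ids.

```python
def ids_to_names(category_ids, id_to_name):
    """Map each category id to its name, keeping the nested shape."""
    names = []
    for ids in category_ids:
        row = []
        for cat_id in ids:
            if cat_id not in id_to_name:
                raise ValueError(f"category id {cat_id} is not defined")
            row.append(id_to_name[cat_id])
        names.append(row)
    return names


# hypothetical id -> name mapping standing in for the config
id_to_name = {1: "incl", 11: "e", 12: "mu"}
names = ids_to_names([[1, 11], [1], [1, 12]], id_to_name)
```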

get_root_processes_from_campaign(campaign)

Extracts all root process objects from datasets contained in an order campaign and returns them in a unique object index.

Parameters:

campaign (Campaign) – Campaign object containing information about relevant datasets

Return type:

UniqueObjectIndex

Returns:

Unique indices for Process instances of root processes associated with these datasets

get_datasets_from_process(config, process, strategy='inclusive', only_first=True, check_deep=False)

Given a process and the config it belongs to, returns a list of order dataset objects that contain matching processes. This is done by walking through process and its child processes and checking whether they are contained in known datasets. strategy controls how possible ambiguities are resolved:

  • "all": The full process tree is traversed and all matching datasets are considered. Note that this might lead to a potential over-representation of the phase space.

  • "inclusive": If a dataset is found to match a process, its child processes are not checked further.

  • "exclusive": If any (deep) subprocess of process is found to be contained in a dataset, return datasets of subprocesses but not that of process itself (if any).

  • "exclusive_strict": If all (deep) subprocesses of process are found to be contained in a dataset, return these datasets but not that of process itself (if any).

As an example, consider the process tree

        flowchart BT
    A[single top]
    B{s channel}
    C{t channel}
    D{tw channel}
    E(t)
    F(tbar)
    G(t)
    H(tbar)
    I(t)
    J(tbar)

    B --> A
    C --> A
    D --> A

    E --> B
    F --> B

    G --> C
    H --> C

    I --> D
    J --> D
    

and datasets existing for

  1. single top - s channel - t

  2. single top - s channel - tbar

  3. single top - t channel

  4. single top - t channel - t

  5. single top - tw channel

  6. single top - tw channel - t

  7. single top - tw channel - tbar

in the config. Depending on strategy, the returned datasets for process single top are:

  • "all": [1, 2, 3, 4, 5, 6, 7]. Simply all datasets matching any subprocess.

  • "inclusive": [1, 2, 3, 5]. Skipping single top - t channel - t, single top - tw channel - t, and single top - tw channel - tbar, since more inclusive datasets (single top - t channel and single top - tw channel) exist.

  • "exclusive": [1, 2, 4, 6, 7]. Skipping single top - t channel and single top - tw channel since more exclusive datasets (single top - t channel - t, single top - tw channel - t, and single top - tw channel - tbar) exist.

  • "exclusive_strict": [1, 2, 3, 6, 7]. Like "exclusive", but not skipping single top - t channel since not all subprocesses of t channel match a dataset (there is no single top - t channel - tbar dataset).

In addition, two arguments configure how the check of whether a process is contained in a dataset is performed. If only_first is True, only the first matching dataset is considered. Otherwise, all datasets matching a specific process are returned. For the check itself, check_deep is forwarded to order.Dataset.has_process().

Parameters:
  • config (od.config.Config) – Config instance containing the information about known datasets.

  • process (str | od.process.Process) – Process instance or process name for which to obtain the list of datasets.

  • strategy (str, default: 'inclusive') – Controls how possible ambiguities are resolved. Choices: ["all", "inclusive", "exclusive", "exclusive_strict"].

  • only_first (bool, default: True) – If True, only the first matching dataset is considered.

  • check_deep (bool, default: False) – Forwarded to order.Dataset.has_process().

Raises:

ValueError – If strategy is not in the list of allowed choices.

Return type:

list[od.dataset.Dataset]

Returns:

List of datasets that correspond to process, depending on the specifics of the query
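The four strategies can be reproduced on the single top example with a small recursive sketch. This is a simplified model under stated assumptions, not the library's code: the process tree and the dataset numbering are hard-coded as plain dictionaries, each process has at most one dataset (so only_first and check_deep play no role), and the names CHILDREN, DATASETS and find_datasets are hypothetical.

```python
# Hypothetical stand-ins for the process tree and datasets 1-7 from the example.
CHILDREN = {
    "st": ["st_s", "st_t", "st_tw"],
    "st_s": ["st_s_t", "st_s_tbar"],
    "st_t": ["st_t_t", "st_t_tbar"],
    "st_tw": ["st_tw_t", "st_tw_tbar"],
}
DATASETS = {
    "st_s_t": 1, "st_s_tbar": 2, "st_t": 3, "st_t_t": 4,
    "st_tw": 5, "st_tw_t": 6, "st_tw_tbar": 7,
}
STRATEGIES = ("all", "inclusive", "exclusive", "exclusive_strict")


def find_datasets(process, strategy="inclusive"):
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy '{strategy}'")

    own = [DATASETS[process]] if process in DATASETS else []

    # inclusive: a match stops the descent into child processes
    if strategy == "inclusive" and own:
        return own

    per_child = [find_datasets(child, strategy) for child in CHILDREN.get(process, [])]
    from_children = [d for found in per_child for d in found]

    if strategy == "all":
        return own + from_children
    if strategy == "inclusive":
        return from_children
    if strategy == "exclusive":
        # prefer subprocess datasets as soon as any of them matched
        return from_children or own
    # exclusive_strict: prefer subprocess datasets only if every child is covered
    if per_child and all(per_child):
        return from_children
    return own


results = {strategy: find_datasets("st", strategy) for strategy in STRATEGIES}
```

With these inputs, results maps each strategy to the dataset numbers listed in the example above.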

add_shift_aliases(config, shift_source, aliases)

Extracts the two up and down shift instances from a config corresponding to a shift_source (i.e. the name of a shift without directions) and assigns aliases to their auxiliary data.

Aliases should be given in a dictionary, mapping alias targets (keys) to sources (values). In both strings, template variables are injected with fields corresponding to all od.Shift attributes, such as name, id, and direction.

Example:

add_shift_aliases(config, "pdf", {"pdf_weight": "pdf_weight_{direction}"})
# adds {"pdf_weight": "pdf_weight_up"} to the "pdf_up" shift in "config"
# plus {"pdf_weight": "pdf_weight_down"} to the "pdf_down" shift in "config"

Return type:

None
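The template injection can be pictured with Python's str.format. The sketch below is only an approximation of the described behavior: it fills the name and direction fields (the real function is documented to expose all od.Shift attributes), it returns the resolved aliases instead of writing them to auxiliary data, and resolve_aliases is a hypothetical helper.

```python
def resolve_aliases(shift_source, aliases):
    """Return the resolved alias dictionaries for the up and down shifts."""
    resolved = {}
    for direction in ("up", "down"):
        # fields available to the templates (subset, assumed for illustration)
        fields = {"name": f"{shift_source}_{direction}", "direction": direction}
        resolved[fields["name"]] = {
            target.format(**fields): source.format(**fields)
            for target, source in aliases.items()
        }
    return resolved


resolved = resolve_aliases("pdf", {"pdf_weight": "pdf_weight_{direction}"})
```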

get_shift_from_configs(configs, shift, silent=False)

Given a list of configs and a shift name or instance, returns the corresponding shift instance from the first config that contains it. If silent is True, None is returned instead of raising an exception in case the shift is not found.

Return type:

od.Shift | None

get_shifts_from_sources(config, *shift_sources)

Takes a config object and returns a list of shift instances for both directions given a sequence shift_sources.

Return type:

list[Shift]

group_shifts(shifts)

Takes several order.Shift instances shifts and groups them according to their shift source. The nominal shift, if present, is returned separately. The remaining shifts are grouped by their source and the corresponding up and down shifts are stored in a dictionary.

Example:

# assuming the following shifts exist
group_shifts([nominal, x_up, y_up, y_down, x_down])
# -> (nominal, {"x": (x_up, x_down), "y": (y_up, y_down)})

An exception is raised in case a shift source is represented only by its up or down shift.

Return type:

tuple[od.Shift | None, dict[str, tuple[od.Shift, od.Shift]]]
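A self-contained sketch of the grouping logic follows. It replaces order.Shift with a hypothetical namedtuple carrying name, source and direction, and it raises ValueError for incomplete sources, which is an assumption since the exception type is not stated here.

```python
from collections import namedtuple

# hypothetical stand-in for order.Shift
Shift = namedtuple("Shift", ["name", "source", "direction"])


def group_shifts_sketch(shifts):
    nominal = None
    by_source = {}
    for shift in shifts:
        if shift.name == "nominal":
            nominal = shift
        else:
            by_source.setdefault(shift.source, {})[shift.direction] = shift

    grouped = {}
    for source, directions in by_source.items():
        # both directions must be present (exception type is an assumption)
        if set(directions) != {"up", "down"}:
            raise ValueError(f"shift source '{source}' needs both up and down shifts")
        grouped[source] = (directions["up"], directions["down"])
    return nominal, grouped


nominal = Shift("nominal", "nominal", "nominal")
x_up, x_down = Shift("x_up", "x", "up"), Shift("x_down", "x", "down")
y_up, y_down = Shift("y_up", "y", "up"), Shift("y_down", "y", "down")
result = group_shifts_sketch([nominal, x_up, y_up, y_down, x_down])
```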

expand_shift_sources(shifts)

Given a sequence shifts containing either shift names (<source>_<direction>) or shift sources, the latter ones are expanded with both possible directions and returned in a common list.

Example:

expand_shift_sources(["jes", "jer_up"])
# -> ["jes_up", "jes_down", "jer_up"]

Return type:

list[str]
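The expansion rule can be sketched in a few lines of plain Python. The helper expand_sources is hypothetical; dropping duplicates while preserving order is an assumption of this sketch, and special names such as a nominal shift are not handled.

```python
def expand_sources(shifts):
    """Expand bare shift sources into their up and down variants."""
    expanded = []
    for shift in shifts:
        if shift.endswith(("_up", "_down")):
            # already a full shift name
            candidates = [shift]
        else:
            candidates = [f"{shift}_up", f"{shift}_down"]
        for name in candidates:
            # assumption: keep each name once, in order of first appearance
            if name not in expanded:
                expanded.append(name)
    return expanded


expanded = expand_sources(["jes", "jer_up"])
```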

create_category_id(config, category_name, hash_len=7, salt=None)

Creates a unique id for an order.Category named category_name in an order.Config object config and returns it. Internally, law.util.create_hash() is used which receives hash_len. In case of an unintentional (yet unlikely) collision of two ids, there is the option to add a custom salt value.

Note

Please note that the size of the returned id depends on hash_len. When storing the id subsequently in an array, please be aware that hash_len values of 8 or more require a np.int64.

Return type:

int
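The relation between hash_len and the integer width can be checked with a short sketch. It assumes that the id is obtained by reading hash_len hexadecimal digits of a hash as an integer; the use of hashlib.sha256, the joined input string and the helper category_id_sketch are illustrative assumptions and not the behavior of law.util.create_hash().

```python
import hashlib


def category_id_sketch(config_name, category_name, hash_len=7, salt=None):
    """Derive a deterministic integer id from names (illustrative only)."""
    parts = [config_name, category_name]
    if salt is not None:
        parts.append(str(salt))
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    # interpret the first hash_len hex digits as an integer
    return int(digest[:hash_len], 16)


INT32_MAX = 2**31 - 1
max_id_7 = 16**7 - 1  # largest value representable with 7 hex digits
max_id_8 = 16**8 - 1  # largest value representable with 8 hex digits

example_id = category_id_sketch("cfg", "e__eq0j")
```

Under this assumption, seven hex digits always fit into a signed 32-bit integer, whereas eight hex digits can exceed it, which matches the note above.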

add_category(config, parent=None, **kwargs)

Creates an order.Category instance by forwarding all kwargs to its constructor, adds it to a parent object, such as an order.Config or another order.Category, and returns it. When kwargs do not contain a field id, create_category_id() is used to create one.

Parameters:
  • config (od.Config) – order.Config object for which the category is created.

  • parent (od.Config | od.Category | od.Channel | None, default: None) – Parent object to which the category is added. If None, config is used.

  • kwargs – Keyword arguments forwarded to the category constructor.

Return type:

od.Category

Returns:

The newly created category instance.

class CategoryGroup(categories, is_complete, has_overlap, warn=True)

Bases: object

Container to store information about a group of categories, mostly used for creating combinations in create_category_combinations().

Parameters:
  • categories (list[od.Category | str]) – List of order.Category objects or names that refer to the desired category.

  • is_complete (bool) – Should be True if the union of category selections covers the full phase space (no gaps).

  • has_overlap (bool) – Should be False if all categories are pairwise disjoint (no overlap).

  • warn (bool, default: True) – If True, a warning is issued when summing over the group of categories.

Attributes:

categories

is_complete

has_overlap

warn

is_partition

Returns True if the group of categories is a full partition of the phase space (no overlap, no gaps).

categories: list[od.Category | str]
is_complete: bool
has_overlap: bool
warn: bool = True
property is_partition: bool

Returns True if the group of categories is a full partition of the phase space (no overlap, no gaps).
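The meaning of is_partition can be illustrated with a small dataclass that mirrors the documented attributes. CategoryGroupSketch is a hypothetical stand-in, and the property body is inferred from the description (no overlap, no gaps) rather than copied from the library.

```python
from dataclasses import dataclass


@dataclass
class CategoryGroupSketch:
    categories: list
    is_complete: bool
    has_overlap: bool
    warn: bool = True

    @property
    def is_partition(self):
        # a partition covers the full phase space without overlaps
        return self.is_complete and not self.has_overlap


n_jets = CategoryGroupSketch(["eq0j", "eq1j", "ge2j"], is_complete=True, has_overlap=False)
lepton = CategoryGroupSketch(["e", "mu"], is_complete=False, has_overlap=False)
```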

create_category_combinations(config, categories, name_fn, kwargs_fn=None, skip_existing=True, skip_fn=None)

Given a config object and sequences of categories in a dict, creates all combinations of possible leaf categories at different depths, connects them with parent - child relations (see order.Category) and returns the number of newly created categories.

categories should be a dictionary that maps string names to CategoryGroup objects which are thin wrappers around sequences of categories (objects or names). Group names (dictionary keys) are used as keys of the dictionary passed to the callable name_fn, which is supposed to return the name of newly created categories (see example below).

Note

The CategoryGroup.is_complete and CategoryGroup.has_overlap attributes are essential for columnflow to determine whether the summation over specific categories is valid or may result in under- or over-counting when combining leaf categories. These checks may be performed by other functions and tools based on information derived from groups and stored in auxiliary fields of the newly created categories.

Each newly created category is instantiated with this name as well as arbitrary keyword arguments as returned by kwargs_fn. This function is called with the categories (in a dictionary, mapped to the sequence names as given in categories) that contribute to the newly created category and should return a dictionary. If the fields "id" and "selection" are missing, they are filled with reasonable defaults leading to an auto-generated, deterministic id and a list of all parent selection statements.

If the name of a new category is already known to config it is skipped unless skip_existing is False. In addition, skip_fn can be a callable that receives a dictionary mapping group names to categories that represents the combination of categories to be added. In case skip_fn returns True, the combination is skipped.

Example:

categories = {
    "lepton": CategoryGroup(categories=["e", "mu"], is_complete=False, has_overlap=False),
    "n_jets": CategoryGroup(categories=["eq0j", "eq1j", "ge2j"], is_complete=True, has_overlap=False),
    "n_tags": CategoryGroup(categories=["0t", "1t"], is_complete=False, has_overlap=False),
}

def name_fn(categories):
    # simple implementation: join names in defined order if existing
    return "__".join(cat.name for cat in categories.values() if cat)

def kwargs_fn(categories):
    # return arguments that are forwarded to the category init
    # (use id "+" here which simply increments the last taken id, see order.Category)
    # (note that this is also the default)
    return {"id": "+"}

create_category_combinations(cfg, categories, name_fn, kwargs_fn)

Parameters:
  • config (od.Config) – order.Config object for which the categories are created.

  • categories (dict[str, CategoryGroup | list[od.Category]]) – Dictionary that maps group names to CategoryGroup containers.

  • name_fn (Callable[[Any], str]) – Callable that receives a dictionary mapping group names to categories and returns the name of the newly created category.

  • kwargs_fn (Callable[[Any], dict] | None, default: None) – Callable that receives a dictionary mapping group names to categories and returns a dictionary of keyword arguments that are forwarded to the category constructor.

  • skip_existing (bool, default: True) – If True, skip the creation of a category when it already exists in config.

  • skip_fn (Callable[[dict[str, od.Category], str], bool] | None, default: None) – Callable that receives a dictionary mapping group names to categories and returns True if the combination should be skipped.

Raises:
  • TypeError – If name_fn is not a callable.

  • TypeError – If kwargs_fn is not a callable when set.

  • ValueError – If a non-unique category id is detected.

Return type:

int

Returns:

Number of newly created categories.
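To get a feeling for which combinations arise, the following sketch enumerates combination names with itertools. It works on plain strings instead of order.Category objects and creates nothing in a config. It also assumes that only combinations of at least two groups are built, since the single categories already exist; combination_names and join_names are hypothetical helpers.

```python
import itertools


def combination_names(groups, name_fn, skip_fn=None):
    """List names of all cross-group combinations (two or more groups)."""
    names = []
    group_names = list(groups)
    for n_groups in range(2, len(group_names) + 1):
        for chosen in itertools.combinations(group_names, n_groups):
            for members in itertools.product(*(groups[g] for g in chosen)):
                combo = dict(zip(chosen, members))
                # optionally drop unwanted combinations
                if skip_fn is not None and skip_fn(combo):
                    continue
                names.append(name_fn(combo))
    return names


def join_names(combo):
    return "__".join(combo.values())


groups = {
    "lepton": ["e", "mu"],
    "n_jets": ["eq0j", "eq1j", "ge2j"],
    "n_tags": ["0t", "1t"],
}
all_names = combination_names(groups, join_names)
no_mu_names = combination_names(
    groups, join_names, skip_fn=lambda combo: combo.get("lepton") == "mu"
)
```

Under these assumptions the three groups yield 16 two-group and 12 three-group combinations, and the skip_fn variant removes every combination involving the mu category.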

verify_config_processes(config, warn=False)

Verifies for all datasets contained in a config object that the linked processes are covered by any process object registered in config and raises an exception if not. If warn is True, a warning is printed instead.

Return type:

None
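The check can be sketched with plain dictionaries and sets. This is a simplified model: datasets map to lists of process names, coverage is a flat membership test rather than a lookup through the process hierarchy, and both the helper verify_processes_sketch and the use of ValueError are assumptions.

```python
import warnings


def verify_processes_sketch(dataset_processes, registered, warn=False):
    """Complain about dataset processes that are not registered."""
    missing = [
        (dataset, process)
        for dataset, processes in dataset_processes.items()
        for process in processes
        if process not in registered
    ]
    if not missing:
        return
    msg = f"uncovered processes: {missing}"
    if warn:
        warnings.warn(msg)
    else:
        # exception type is an assumption of this sketch
        raise ValueError(msg)


# hypothetical dataset and process names
verify_processes_sketch({"tt_powheg": ["tt"]}, registered={"tt", "st"})
```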