columnflow.config_util#

Collection of general helpers and utilities.

Functions:

get_root_processes_from_campaign(campaign)

Extracts all root process objects from datasets contained in an order campaign and returns them in a unique object index.

get_datasets_from_process(config, process[, ...])

Given a process and the config it belongs to, returns a list of order dataset objects that contain matching processes.

add_shift_aliases(config, shift_source, aliases)

Extracts the two up and down shift instances from a config corresponding to a shift_source (i.e. the name of a shift without directions) and assigns aliases to their auxiliary data.

get_shifts_from_sources(config, *shift_sources)

Takes a config object and returns a list of shift instances for both directions given a sequence shift_sources.

expand_shift_sources(shifts)

Given a sequence shifts containing either shift names (<source>_<direction>) or shift sources, the latter ones are expanded with both possible directions and returned in a common list.

create_category_id(config, category_name[, ...])

Creates a unique id for an order.Category named category_name in an order.Config object config and returns it.

add_category(config, **kwargs)

Creates an order.Category instance by forwarding all kwargs to its constructor, adds it to an order.Config object config and returns it.

create_category_combinations(config, ...[, ...])

Given a config object and sequences of categories in a dict, creates all combinations of possible leaf categories at different depths, connects them with parent-child relations (see order.Category) and returns the number of newly created categories.

verify_config_processes(config[, warn])

Verifies for all datasets contained in a config object that the linked processes are covered by any process object registered in config and raises an exception if not.

get_root_processes_from_campaign(campaign)[source]#

Extracts all root process objects from datasets contained in an order campaign and returns them in a unique object index.
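
Example (a minimal sketch; campaign stands for any order.Campaign instance whose datasets carry process links):

from columnflow.config_util import get_root_processes_from_campaign

roots = get_root_processes_from_campaign(campaign)
# roots is a UniqueObjectIndex of order.Process instances
print([proc.name for proc in roots])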

Parameters:

campaign (Campaign) – Campaign object containing information about relevant datasets

Return type:

UniqueObjectIndex

Returns:

Unique indices for Process instances of root processes associated with these datasets

get_datasets_from_process(config, process, strategy='inclusive', only_first=True, check_deep=False)[source]#

Given a process and the config it belongs to, returns a list of order dataset objects that contain matching processes. This is done by walking through process and its child processes and checking whether they are contained in known datasets. strategy controls how possible ambiguities are resolved:

  • "all": The full process tree is traversed and all matching datasets are considered.

    Note that this might lead to a potential over-representation of the phase space.

  • "inclusive": If a dataset is found to match a process, its child processes are not

    checked further.

  • "exclusive": If any (deep) subprocess of process is found to be contained in a

    dataset, return datasets of subprocesses but not that of process itself (if any).

  • "exclusive_strict": If all (deep) subprocesses of process are found to be

    contained in a dataset, return these datasets but not that of process itself (if any).

As an example, consider the process tree

single top
├── s channel
│   ├── t
│   └── tbar
├── t channel
│   ├── t
│   └── tbar
└── tw channel
    ├── t
    └── tbar

and datasets existing for

  1. single top - s channel - t

  2. single top - s channel - tbar

  3. single top - t channel

  4. single top - t channel - t

  5. single top - tw channel

  6. single top - tw channel - t

  7. single top - tw channel - tbar

in the config. Depending on strategy, the returned datasets for process single top are:

  • "all": [1, 2, 3, 4, 5, 6, 7]. Simply all datasets matching any subprocess.

  • "inclusive": [1, 2, 3, 5]. Skipping single top - t channel - t,

    single top - tw channel - t, and single top - tw channel - tbar, since more inclusive datasets (single top - t channel and single top - tw channel) exist.

  • "exclusive": [1, 2, 4, 6, 7]. Skipping single_top - t_channel and

    single top - tw channel since more exclusive datasets (single top - t channel - t, single top - tw channel - t, and single top - tw channel - tbar) exist.

  • "exclusive_strict": [1, 2, 3, 6, 7]. Like "exclusive", but not skipping

    single top - t channel since not all subprocesses of t channel match a dataset (there is no single top - t channel - tbar dataset).

In addition, two arguments configure how the check whether a process is contained in a dataset is performed. If only_first is True, only the first matching dataset is considered; otherwise, all datasets matching a specific process are returned. For the check itself, check_deep is forwarded to order.Dataset.has_process().
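
Example (a minimal sketch comparing the four strategies; the process name "st", standing for the single top tree above, is an assumed entry in config):

for strategy in ["all", "inclusive", "exclusive", "exclusive_strict"]:
    datasets = get_datasets_from_process(config, "st", strategy=strategy, only_first=False)
    print(strategy, [d.name for d in datasets])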

Parameters:
  • config (order.config.Config) – Config instance containing the information about known datasets.

  • process (str | order.process.Process) – Process instance or process name for which you want to obtain list of datasets.

  • strategy (str, default: 'inclusive') – controls how possible ambiguities are resolved. Choices: ["all", "inclusive", "exclusive", "exclusive_strict"]

  • only_first (bool, default: True) – If True, only the first matching dataset is considered.

  • check_deep (bool, default: False) – Forwarded to order.Dataset.has_process()

Raises:

ValueError – If strategy is not in the list of allowed choices

Return type:

list[order.dataset.Dataset]

Returns:

List of datasets that correspond to process, depending on the specifics of the query

add_shift_aliases(config, shift_source, aliases)[source]#

Extracts the two up and down shift instances from a config corresponding to a shift_source (i.e. the name of a shift without directions) and assigns aliases to their auxiliary data.

Aliases should be given in a dictionary, mapping alias targets (keys) to sources (values). In both strings, template variables are injected with fields corresponding to all od.Shift attributes, such as name, id, and direction.

Example:

add_shift_aliases(config, "pdf", {"pdf_weight": "pdf_weight_{direction}"})
# adds {"pdf_weight": "pdf_weight_up"} to the "pdf_up" shift in "config"
# plus {"pdf_weight": "pdf_weight_down"} to the "pdf_down" shift in "config"
Return type:

None

get_shifts_from_sources(config, *shift_sources)[source]#

Takes a config object and returns a list of shift instances for both directions given a sequence shift_sources.
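
Example (assuming shift sources "jes" and "jer" are registered in config with both directions):

get_shifts_from_sources(config, "jes", "jer")
# -> list of order.Shift instances for
#    jes_up, jes_down, jer_up, jer_down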

Return type:

list[Shift]

expand_shift_sources(shifts)[source]#

Given a sequence shifts containing either shift names (<source>_<direction>) or shift sources, the latter ones are expanded with both possible directions and returned in a common list.

Example:

expand_shift_sources(["jes", "jer_up"])
# -> ["jes_up", "jes_down", "jer_up"]
Return type:

list[str]

create_category_id(config, category_name, hash_len=7, salt=None)[source]#

Creates a unique id for an order.Category named category_name in an order.Config object config and returns it. Internally, law.util.create_hash() is used, which receives hash_len. In case of an unintentional (yet unlikely) collision of two ids, there is the option to add a custom salt value.
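
Example (the category name "2j" and the salt value are illustrative):

cat_id = create_category_id(config, "2j")

# in the unlikely event of an id collision, a custom salt
# changes the hash input and thus the resulting id
cat_id = create_category_id(config, "2j", salt="v2")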

Note

Please note that the size of the returned id depends on hash_len. When subsequently storing the id in an array, be aware that a hash_len of 8 or more requires an np.int64.

Return type:

int

add_category(config, **kwargs)[source]#

Creates an order.Category instance by forwarding all kwargs to its constructor, adds it to an order.Config object config and returns it. When kwargs do not contain a field id, create_category_id() is used to create one.
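
Example (the name, selection and label values are illustrative):

cat = add_category(config, name="2j", selection="cat_2j", label="2 jets")
# since no "id" was passed, one is auto-generated via create_category_id()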

Return type:

Category

create_category_combinations(config, categories, name_fn, kwargs_fn=None, skip_existing=True, skip_fn=None)[source]#

Given a config object and sequences of categories in a dict, creates all combinations of possible leaf categories at different depths, connects them with parent-child relations (see order.Category) and returns the number of newly created categories.

categories should be a dictionary that maps string names to sequences of categories that should be combined. The names are used as keyword arguments in a callable name_fn that is supposed to return the name of newly created categories (see example below).

Each newly created category is instantiated with this name as well as arbitrary keyword arguments as returned by kwargs_fn. This function is called with the categories (in a dictionary, mapped to the sequence names as given in categories) that contribute to the newly created category and should return a dictionary. If the fields "id" and "selection" are missing, they are filled with reasonable defaults, leading to an auto-generated, deterministic id and a list of all parent selection statements.

If the name of a new category is already known to config, it is skipped unless skip_existing is False. In addition, skip_fn can be a callable that receives a dictionary mapping group names to categories, representing the combination of categories to be added. In case skip_fn returns True, the combination is skipped.

Example:

categories = {
    "lepton": [cfg.get_category("e"), cfg.get_category("mu")],
    "n_jets": [cfg.get_category("1j"), cfg.get_category("2j")],
    "n_tags": [cfg.get_category("0t"), cfg.get_category("1t")],
}

def name_fn(categories):
    # simple implementation: join names in defined order if existing
    return "__".join(cat.name for cat in categories.values() if cat)

def kwargs_fn(categories):
    # return arguments that are forwarded to the category init
    # (use id "+" here which simply increments the last taken id, see order.Category)
    # (note that this is also the default)
    return {"id": "+"}

create_category_combinations(cfg, categories, name_fn, kwargs_fn)
Return type:

int

verify_config_processes(config, warn=False)[source]#

Verifies for all datasets contained in a config object that the linked processes are covered by any process object registered in config and raises an exception if not. If warn is True, a warning is printed instead.
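
Example:

# raises an exception if a dataset links a process unknown to config
verify_config_processes(config)

# prints a warning per uncovered process instead of raising
verify_config_processes(config, warn=True)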

Return type:

None