columnflow.config_util
Collection of general helpers and utilities.
Functions:
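- get_root_processes_from_campaign: Extracts all root process objects from datasets contained in an order campaign and returns them in a unique object index.
- get_datasets_from_process: Given a process and the config it belongs to, returns a list of order dataset objects that contain matching processes.
- add_shift_aliases: Extracts the two up and down shift instances from a config corresponding to a shift_source (i.e. the name of a shift without directions) and assigns aliases to their auxiliary data.
- get_shifts_from_sources: Takes a config object and returns a list of shift instances for both directions given a sequence of shift_sources.
- expand_shift_sources: Given a sequence shifts containing either shift names (<source>_<direction>) or shift sources, the latter are expanded with both possible directions and returned in a common list.
- create_category_id: Creates a unique id for an order.Category named category_name in an order.Config object config and returns it.
- add_category: Creates an order.Category instance, adds it to an order.Config object config and returns it.
- create_category_combinations: Given a config object and sequences of categories in a dict, creates all combinations of possible leaf categories at different depths, connects them with parent-child relations (see order.Category) and returns the number of newly created categories.
- Verifies for all datasets contained in a config object that the linked processes are covered by any process object registered in config and raises an exception if not.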
- get_root_processes_from_campaign(campaign)
Extracts all root process objects from datasets contained in an order campaign and returns them in a unique object index.
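A minimal sketch of the idea in plain Python, assuming that a root process is one without any parent processes. The Process class, the list of per-dataset process lists standing in for a campaign, and get_root_processes_sketch are hypothetical; a name-keyed dict replaces order's unique object index, so this illustrates the concept rather than the library's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for an order process with links to its parents."""
    name: str
    parents: list[Process] = field(default_factory=list)


def find_roots(process: Process) -> list[Process]:
    # assumption: a root process is a process without any parents
    if not process.parents:
        return [process]
    roots = []
    for parent in process.parents:
        roots.extend(find_roots(parent))
    return roots


def get_root_processes_sketch(datasets: list[list[Process]]) -> dict[str, Process]:
    # datasets: hypothetical list of the processes contained in each dataset
    index = {}
    for dataset_processes in datasets:
        for process in dataset_processes:
            for root in find_roots(process):
                # keying by name keeps every root process only once
                index.setdefault(root.name, root)
    return index


st = Process("single top")
st_t = Process("st t channel", [st])
st_tw = Process("st tw channel", [st])
tt = Process("tt")
roots = get_root_processes_sketch([[st_t], [st_tw], [tt]])
print(sorted(roots))  # ['single top', 'tt']
```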
- get_datasets_from_process(config, process, strategy='inclusive', only_first=True, check_deep=False)
Given a process and the config it belongs to, returns a list of order dataset objects that contain matching processes. This is done by walking through process and its child processes and checking whether they are contained in known datasets. strategy controls how possible ambiguities are resolved:
  - "all": The full process tree is traversed and all matching datasets are considered. Note that this might lead to an over-representation of the phase space.
  - "inclusive": If a dataset is found to match a process, its child processes are not checked further.
  - "exclusive": If any (deep) subprocess of process is found to be contained in a dataset, the datasets of the subprocesses are returned, but not that of process itself (if any).
  - "exclusive_strict": If all (deep) subprocesses of process are found to be contained in a dataset, these datasets are returned, but not that of process itself (if any).
As an example, consider the process tree

    single top
    ├── s channel
    │   ├── t
    │   └── tbar
    ├── t channel
    │   ├── t
    │   └── tbar
    └── tw channel
        ├── t
        └── tbar

and datasets existing for

  1. single top - s channel - t
  2. single top - s channel - tbar
  3. single top - t channel
  4. single top - t channel - t
  5. single top - tw channel
  6. single top - tw channel - t
  7. single top - tw channel - tbar

in the config. Depending on strategy, the returned datasets for process ``single top`` are:
  - "all": [1, 2, 3, 4, 5, 6, 7]. Simply all datasets matching any subprocess.
  - "inclusive": [1, 2, 3, 5]. Skipping single top - t channel - t, single top - tw channel - t, and single top - tw channel - tbar, since more inclusive datasets (single top - t channel and single top - tw channel) exist.
  - "exclusive": [1, 2, 4, 6, 7]. Skipping single top - t channel and single top - tw channel, since more exclusive datasets (single top - t channel - t, single top - tw channel - t, and single top - tw channel - tbar) exist.
  - "exclusive_strict": [1, 2, 3, 6, 7]. Like "exclusive", but not skipping single top - t channel, since not all subprocesses of t channel match a dataset (there is no single top - t channel - tbar dataset).
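The strategies can be reproduced on the single top example with a small, self-contained sketch. It is not columnflow's implementation: Process, get_datasets_sketch and the (index, process name) pairs standing in for the config's datasets are hypothetical, containment is a plain name comparison instead of order.Dataset.has_process() (so check_deep is not modeled), and falling back to partial subprocess datasets in "exclusive_strict" when the process itself has no dataset is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for order.Process: a name and child processes."""
    name: str
    children: list[Process] = field(default_factory=list)


def _matching(proc, datasets, only_first):
    # datasets are (index, process name) pairs; a plain name comparison
    # replaces order.Dataset.has_process(), so check_deep is not modeled
    found = [idx for idx, name in datasets if name == proc.name]
    return found[:1] if only_first else found


def _exclusive(proc, datasets, only_first, strict):
    own = _matching(proc, datasets, only_first)
    if not proc.children:
        return own
    results = [_exclusive(c, datasets, only_first, strict) for c in proc.children]
    merged = [idx for res in results for idx in res]
    covered = [bool(res) for res in results]
    # "exclusive": any covered subprocess wins, "exclusive_strict": all must be covered
    if (all(covered) if strict else any(covered)):
        return merged
    # assumption: fall back to partial subprocess datasets if the process has none
    return own or merged


def get_datasets_sketch(root, datasets, strategy="inclusive", only_first=True):
    if strategy in ("all", "inclusive"):
        found = []
        stack = [root]  # pre-order depth-first traversal
        while stack:
            proc = stack.pop()
            matches = _matching(proc, datasets, only_first)
            found.extend(matches)
            # "inclusive" does not descend below a process that has a dataset
            if not (matches and strategy == "inclusive"):
                stack.extend(reversed(proc.children))
    elif strategy in ("exclusive", "exclusive_strict"):
        found = _exclusive(root, datasets, only_first, strategy == "exclusive_strict")
    else:
        raise ValueError(f"unknown strategy '{strategy}'")
    # remove duplicates while keeping the order
    return list(dict.fromkeys(found))


def channel(name):
    return Process(name, [Process(f"{name} - t"), Process(f"{name} - tbar")])


single_top = Process("single top", [
    channel("single top - s channel"),
    channel("single top - t channel"),
    channel("single top - tw channel"),
])
datasets = list(enumerate([
    "single top - s channel - t",
    "single top - s channel - tbar",
    "single top - t channel",
    "single top - t channel - t",
    "single top - tw channel",
    "single top - tw channel - t",
    "single top - tw channel - tbar",
], start=1))

for strategy in ("all", "inclusive", "exclusive", "exclusive_strict"):
    print(strategy, get_datasets_sketch(single_top, datasets, strategy))
```

Running the loop prints the same four dataset lists as listed for the four strategies.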
In addition, two arguments configure how the check whether a process is contained in a dataset is performed. If only_first is True, only the first dataset matching a process is considered; otherwise, all datasets matching a specific process are returned. For the check itself, check_deep is forwarded to order.Dataset.has_process().
- Parameters:
  - config (order.config.Config) – Config instance containing the information about known datasets.
  - process (str | order.process.Process) – Process instance or process name for which you want to obtain a list of datasets.
  - strategy (str, default: 'inclusive') – Controls how possible ambiguities are resolved. Choices: ["all", "inclusive", "exclusive", "exclusive_strict"]
  - only_first (bool, default: True) – If True, only the first matching dataset is considered.
  - check_deep (bool, default: False) – Forwarded to order.Dataset.has_process().
- Raises:
ValueError – If strategy is not in the list of allowed choices.
- Returns:
List of datasets that correspond to process, depending on the specifics of the query
- add_shift_aliases(config, shift_source, aliases)
Extracts the two up and down shift instances from a config corresponding to a shift_source (i.e. the name of a shift without directions) and assigns aliases to their auxiliary data.
Aliases should be given in a dictionary, mapping alias targets (keys) to sources (values). In both strings, template variables are injected with fields corresponding to all od.Shift attributes, such as name, id, and direction.

Example:

    add_shift_aliases(config, "pdf", {"pdf_weight": "pdf_weight_{direction}"})
    # adds {"pdf_weight": "pdf_weight_up"} to the "pdf_up" shift in "config"
    # plus {"pdf_weight": "pdf_weight_down"} to the "pdf_down" shift in "config"
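The template injection can be pictured with Python's str.format, assuming the shift attributes are passed as keyword arguments. The helper render_aliases and the plain dict of shift fields are hypothetical stand-ins for an od.Shift and its auxiliary data.

```python
from __future__ import annotations


def render_aliases(aliases: dict[str, str], shift_fields: dict) -> dict[str, str]:
    # inject the shift attributes into both the alias target (key) and source (value)
    return {
        target.format(**shift_fields): source.format(**shift_fields)
        for target, source in aliases.items()
    }


aliases = {"pdf_weight": "pdf_weight_{direction}"}
for direction in ("up", "down"):
    # hypothetical attribute values of the "pdf_up" / "pdf_down" shifts
    fields = {"name": f"pdf_{direction}", "id": 0, "direction": direction}
    print(fields["name"], render_aliases(aliases, fields))
# pdf_up {'pdf_weight': 'pdf_weight_up'}
# pdf_down {'pdf_weight': 'pdf_weight_down'}
```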
- get_shifts_from_sources(config, *shift_sources)
Takes a config object and returns a list of shift instances for both directions given a sequence of shift_sources.
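Assuming shifts follow the <source>_<direction> naming used by expand_shift_sources, the lookup can be sketched as follows. The name-to-shift dict and get_shifts_sketch are hypothetical, strings stand in for shift instances, and the up-before-down ordering is an assumption; an unknown source raises a KeyError in this sketch.

```python
from __future__ import annotations


def get_shifts_sketch(shifts: dict[str, object], *shift_sources: str) -> list[object]:
    # shifts: hypothetical name -> shift mapping standing in for the config's shifts;
    # each source is assumed to map to the two shifts "<source>_up" and "<source>_down"
    return [
        shifts[f"{source}_{direction}"]
        for source in shift_sources
        for direction in ("up", "down")
    ]


shifts = {name: f"<Shift {name}>" for name in ("nominal", "jes_up", "jes_down", "jer_up", "jer_down")}
print(get_shifts_sketch(shifts, "jes", "jer"))
# ['<Shift jes_up>', '<Shift jes_down>', '<Shift jer_up>', '<Shift jer_down>']
```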
- expand_shift_sources(shifts)
Given a sequence shifts containing either shift names (<source>_<direction>) or shift sources, the latter are expanded with both possible directions and returned in a common list.

Example:

    expand_shift_sources(["jes", "jer_up"])
    # -> ["jes_up", "jes_down", "jer_up"]
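A minimal sketch of the expansion, assuming a name counts as a full shift name exactly when it ends in _up or _down. expand_shift_sources_sketch is hypothetical; it would also expand a direction-less shift such as nominal, which the actual function may treat differently.

```python
from __future__ import annotations


def expand_shift_sources_sketch(shifts: list[str]) -> list[str]:
    expanded = []
    for shift in shifts:
        # names already ending in a direction are kept as-is
        if shift.endswith(("_up", "_down")):
            expanded.append(shift)
        else:
            # anything else is treated as a source and expanded into both directions
            expanded.extend([f"{shift}_up", f"{shift}_down"])
    return expanded


print(expand_shift_sources_sketch(["jes", "jer_up"]))  # ['jes_up', 'jes_down', 'jer_up']
```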
- create_category_id(config, category_name, hash_len=7, salt=None)
Creates a unique id for an order.Category named category_name in an order.Config object config and returns it. Internally, law.util.create_hash() is used, which receives hash_len. In case of an unintentional (yet unlikely) collision of two ids, a custom salt value can be added.
- Return type:
int
Note
The size of the returned id depends on hash_len. When storing the id in an array, be aware that hash_len values of 8 or more require np.int64.
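The note on hash_len can be checked with a small sketch, assuming the id is formed by reading the first hash_len hexadecimal digits of a hash as an integer. hashlib.sha256 replaces law.util.create_hash() here, and the hash input built from config name, category name and salt is hypothetical. Under this assumption, the largest possible id for hash_len 7 still fits into a signed 32-bit integer, while hash_len 8 does not.

```python
import hashlib


def create_category_id_sketch(config_name: str, category_name: str, hash_len: int = 7, salt=None) -> int:
    # hypothetical hash input: config name, category name and optional salt
    inp = f"{config_name}__{category_name}" + ("" if salt is None else f"__{salt}")
    digest = hashlib.sha256(inp.encode("utf-8")).hexdigest()
    # keep the first hash_len hex digits and read them as an integer
    return int(digest[:hash_len], 16)


INT32_MAX = 2**31 - 1
for hash_len in (7, 8):
    max_id = 16**hash_len - 1
    print(hash_len, max_id, max_id <= INT32_MAX)
# 7 268435455 True
# 8 4294967295 False
```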
- add_category(config, **kwargs)
Creates an order.Category instance by forwarding all kwargs to its constructor, adds it to an order.Config object config and returns it. When kwargs do not contain a field id, create_category_id() is used to create one.
- Return type:
order.Category
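The id fallback can be sketched with hypothetical Category and Config dataclasses standing in for the order classes. make_id is a placeholder for create_category_id(), assuming a sha256 hex digest truncated to hash_len digits, and is not its actual implementation.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical stand-in for order.Category."""
    name: str
    id: int
    selection: str | None = None


@dataclass
class Config:
    """Hypothetical stand-in for order.Config holding categories by name."""
    name: str
    categories: dict[str, Category] = field(default_factory=dict)


def make_id(config: Config, name: str, hash_len: int = 7) -> int:
    # placeholder for create_category_id(), assuming a truncated hex digest
    digest = hashlib.sha256(f"{config.name}__{name}".encode("utf-8")).hexdigest()
    return int(digest[:hash_len], 16)


def add_category_sketch(config: Config, **kwargs) -> Category:
    # only create an id when the caller did not pass one
    if "id" not in kwargs:
        kwargs["id"] = make_id(config, kwargs["name"])
    category = Category(**kwargs)
    config.categories[category.name] = category
    return category


cfg = Config("run2")
incl = add_category_sketch(cfg, name="incl", id=1, selection="n_jets >= 1")
e = add_category_sketch(cfg, name="e")
print(incl.id, e.id == make_id(cfg, "e"))  # 1 True
```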
- create_category_combinations(config, categories, name_fn, kwargs_fn=None, skip_existing=True, skip_fn=None)
Given a config object and sequences of categories in a dict, creates all combinations of possible leaf categories at different depths, connects them with parent-child relations (see order.Category) and returns the number of newly created categories.

categories should be a dictionary that maps string names to sequences of categories that should be combined. The names are used as keyword arguments in a callable name_fn that is supposed to return the name of newly created categories (see example below).

Each newly created category is instantiated with this name as well as arbitrary keyword arguments as returned by kwargs_fn. This function is called with the categories (in a dictionary, mapped to the sequence names as given in categories) that contribute to the newly created category and should return a dictionary. If the fields "id" and "selection" are missing, they are filled with reasonable defaults, leading to an auto-generated, deterministic id and a list of all parent selection statements.

If the name of a new category is already known to config, it is skipped unless skip_existing is False. In addition, skip_fn can be a callable that receives a dictionary mapping group names to categories that represents the combination of categories to be added. In case skip_fn returns True, the combination is skipped.
Example:
    categories = {
        "lepton": [cfg.get_category("e"), cfg.get_category("mu")],
        "n_jets": [cfg.get_category("1j"), cfg.get_category("2j")],
        "n_tags": [cfg.get_category("0t"), cfg.get_category("1t")],
    }

    def name_fn(categories):
        # simple implementation: join names in defined order if existing
        return "__".join(cat.name for cat in categories.values() if cat)

    def kwargs_fn(categories):
        # return arguments that are forwarded to the category init
        # (use id "+" here which simply increments the last taken id, see order.Category)
        # (note that this is also the default)
        return {"id": "+"}

    create_category_combinations(cfg, categories, name_fn, kwargs_fn)
- Return type:
int
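The combinatorics can be sketched on plain category names, assuming depths run from two groups up to all groups and that the parents of a combination are the combinations with one group dropped (the single categories at depth two). combine_categories_sketch is hypothetical: it only computes names and parent links instead of creating order.Category objects with ids and selections, and it returns the parent map, whose length corresponds to the number of newly created categories.

```python
import itertools


def combine_categories_sketch(categories, name_fn, skip_fn=None, existing=()):
    """Return a dict mapping each new category name to the names of its parents."""
    groups = list(categories)
    known = set(existing)
    for names in categories.values():
        known.update(names)
    created = {}
    # depth = number of groups that contribute to a combined category
    for depth in range(2, len(groups) + 1):
        for group_combo in itertools.combinations(groups, depth):
            for cats in itertools.product(*(categories[g] for g in group_combo)):
                combo = dict(zip(group_combo, cats))
                name = name_fn(combo)
                # mimics skip_existing=True and the optional skip_fn
                if name in known or (skip_fn is not None and skip_fn(combo)):
                    continue
                # parents are the combinations one level up, i.e. with one group dropped
                created[name] = {
                    name_fn({g: c for g, c in combo.items() if g != dropped})
                    for dropped in group_combo
                }
                known.add(name)
    return created


categories = {
    "lepton": ["e", "mu"],
    "n_jets": ["1j", "2j"],
    "n_tags": ["0t", "1t"],
}


def name_fn(combo):
    # same idea as the documented example, but on plain category names
    return "__".join(combo.values())


created = combine_categories_sketch(categories, name_fn)
print(len(created))  # 20
print(sorted(created["e__1j__0t"]))  # ['1j__0t', 'e__0t', 'e__1j']
```

With three groups of two categories each, the 20 new categories are 12 pairings at depth two plus 8 full combinations at depth three.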