mixins#

Lightweight mixin classes for tasks.

Classes:

CalibratorMixin(*args, **kwargs)

Mixin to include a single Calibrator into tasks.

CalibratorsMixin(*args, **kwargs)

Mixin to include multiple Calibrator instances into tasks.

SelectorMixin(*args, **kwargs)

Mixin to include a single Selector instance into tasks.

SelectorStepsMixin(*args, **kwargs)

Mixin to include multiple selector steps into tasks.

ProducerMixin(*args, **kwargs)

Mixin to include a single Producer into tasks.

ProducersMixin(*args, **kwargs)

Mixin to include multiple Producer instances into tasks.

MLModelMixinBase(*args, **kwargs)

Base mixin to include machine learning applications into tasks.

MLModelTrainingMixin(*args, **kwargs)

A mixin class for training machine learning models.

MLModelMixin(*args, **kwargs)

MLModelDataMixin(*args, **kwargs)

MLModelsMixin(*args, **kwargs)

InferenceModelMixin(*args, **kwargs)

CategoriesMixin(*args, **kwargs)

VariablesMixin(*args, **kwargs)

DatasetsProcessesMixin(*args, **kwargs)

ShiftSourcesMixin(*args, **kwargs)

WeightProducerMixin(*args, **kwargs)

ChunkedIOMixin(*args, **kwargs)

class CalibratorMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Mixin to include a single Calibrator into tasks.

Inheriting from this mixin will allow a task to instantiate and access a Calibrator instance with name calibrator, which is an input parameter for this task.

Attributes:

calibrator

register_calibrator_shifts

calibrator_inst

Access current Calibrator instance.

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

get_calibrator_inst(calibrator[, kwargs])

Initialize Calibrator instance.

resolve_param_values(params)

Resolve parameter values params relevant for the CalibratorMixin and all classes it inherits from.

get_known_shifts(config_inst, params)

Adds set of shifts that the current calibrator_inst registers to the set of known shifts and upstream_shifts.

req_params(inst, **kwargs)

Returns the required parameters for the task.

store_parts()

Create parts to create the output path to store intermediary results for the current Task.

find_keep_columns(collection)

Finds the columns to keep based on the collection.

calibrator = <luigi.parameter.Parameter object>#
register_calibrator_shifts = False#
classmethod get_calibrator_inst(calibrator, kwargs=None)[source]#

Initialize Calibrator instance.

Extracts the relevant kwargs for this calibrator instance using the get_calibrator_kwargs() method. Afterwards, the Calibrator class with the name calibrator is retrieved using the get_cls() method and initialized with the relevant keyword arguments.

Parameters:
  • calibrator (str) – Name of the calibrator instance

  • kwargs (default: None) – Any set keyword argument that is potentially relevant for this Calibrator instance

Raises:

RuntimeError – if requested Calibrator instance is not exposed

Return type:

Calibrator

Returns:

The initialized Calibrator instance.
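
The lookup-then-initialize pattern described above can be sketched without columnflow. This is a minimal illustration under stated assumptions, not the library's implementation: ToyCalibrator, its registry, the RELEVANT_KEYS tuple and the helper get_inst are hypothetical stand-ins for Calibrator, get_cls(), get_calibrator_kwargs() and get_calibrator_inst().

```python
class ToyCalibrator:
    """Hypothetical stand-in for a Calibrator base class with a name-based registry."""

    _registry = {}

    def __init_subclass__(cls, **kwargs):
        # every subclass registers itself under its class name
        super().__init_subclass__(**kwargs)
        ToyCalibrator._registry[cls.__name__] = cls

    def __init__(self, inst_dict=None):
        self.inst_dict = dict(inst_dict or {})

    @classmethod
    def get_cls(cls, name):
        # mirrors the documented behavior: unknown names raise a RuntimeError
        if name not in cls._registry:
            raise RuntimeError(f"calibrator '{name}' is not registered")
        return cls._registry[name]


class ExampleCalib(ToyCalibrator):
    pass


# assumption: only these keyword arguments are relevant for a calibrator
RELEVANT_KEYS = ("dataset", "shift")


def get_inst(name, kwargs=None):
    # keep only the relevant keyword arguments, then look up and initialize the class
    relevant = {k: v for k, v in (kwargs or {}).items() if k in RELEVANT_KEYS}
    return ToyCalibrator.get_cls(name)(inst_dict=relevant)
```

Calling get_inst("ExampleCalib", {"dataset": "tt", "unused": 1}) returns an ExampleCalib whose inst_dict only contains the "dataset" entry, while an unknown name raises a RuntimeError.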

classmethod resolve_param_values(params)[source]#

Resolve parameter values params relevant for the CalibratorMixin and all classes it inherits from.

Loads the config_inst and reads the parameter "calibrator". If the parameter is not set, it falls back to "default_calibrator". Finally, this function adds the keyword "calibrator_inst", which contains the Calibrator instance obtained using the get_calibrator_inst() method.

Parameters:

params (dict[str, Any]) – Dictionary with parameters provided by the user at commandline level.

Return type:

dict[str, Any]

Returns:

Dictionary of parameters that now includes new value for "calibrator_inst".
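
The fallback to a default value can be illustrated with plain dictionaries. In this sketch, the NOT_SET sentinel, the config_aux dictionary and the make_inst callable are assumptions made for illustration; the real method operates on the task parameters and the config_inst.

```python
NOT_SET = "NO_STR"  # hypothetical sentinel meaning "parameter not given"


def resolve_param_values(params, config_aux, make_inst):
    """Return a copy of params with "calibrator" resolved and "calibrator_inst" added."""
    resolved = dict(params)
    # fall back to the configured default when no calibrator was given
    if resolved.get("calibrator", NOT_SET) == NOT_SET:
        resolved["calibrator"] = config_aux["default_calibrator"]
    # make_inst stands in for get_calibrator_inst()
    resolved["calibrator_inst"] = make_inst(resolved["calibrator"])
    return resolved
```

An explicitly given calibrator name is left untouched; only a missing or unset value triggers the fallback.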

classmethod get_known_shifts(config_inst, params)[source]#

Adds set of shifts that the current calibrator_inst registers to the set of known shifts and upstream_shifts.

First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that CalibratorMixin inherits from. Afterwards, check if the current calibrator_inst registers shifts. If register_calibrator_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the calibrator_inst to upstream_shifts.

Parameters:
  • config_inst (Config) – Config instance for the current task.

  • params (dict[str, Any]) – Dictionary containing the current set of parameters provided by the user at commandline level

Return type:

tuple[set[str], set[str]]

Returns:

Tuple with updated sets of shifts and upstream_shifts.
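
The routing of shifts into one of the two sets can be sketched as a small, self-contained function. The function name and its arguments are hypothetical; base_shifts and base_upstream stand in for the sets returned by the get_known_shifts methods of the parent classes.

```python
def route_shifts(base_shifts, base_upstream, inst_shifts, register):
    """Add inst_shifts to shifts if register is True, else to upstream_shifts."""
    shifts = set(base_shifts)
    upstream_shifts = set(base_upstream)
    if register:
        # the task itself registers the shifts of the calibrator
        shifts |= set(inst_shifts)
    else:
        # the shifts are only known to affect upstream tasks
        upstream_shifts |= set(inst_shifts)
    return shifts, upstream_shifts
```

With register set to False, which corresponds to the default value of register_calibrator_shifts, the shifts of the calibrator only end up in upstream_shifts.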

classmethod req_params(inst, **kwargs)[source]#

Returns the required parameters for the task. It prefers --calibrator set on task-level via command line.

Parameters:
  • inst (Task) – The current task instance.

  • kwargs – Additional keyword arguments.

Return type:

dict[str, Any]

Returns:

Dictionary of required parameters.

property calibrator_inst: Calibrator#

Access current Calibrator instance.

This method loads the current Calibrator calibrator_inst from the cache or initializes it. If the calibrator requests a specific sandbox, set this sandbox as the environment for the current Task.

Returns:

Current Calibrator instance
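
A lazily cached property of this kind can be sketched as follows. ToyTask, the dictionary used as the instance and the sandbox names are hypothetical and only illustrate the "load from cache or initialize" behavior and the sandbox handover described above.

```python
class ToyTask:
    """Hypothetical task that creates its calibrator instance on first access."""

    def __init__(self, calibrator, requested_sandbox=None):
        self.calibrator = calibrator
        self.sandbox = "default_sandbox"
        self._requested_sandbox = requested_sandbox
        self._calibrator_inst = None
        self.n_created = 0  # counts how often an instance was created

    def _create_inst(self):
        self.n_created += 1
        return {"name": self.calibrator, "sandbox": self._requested_sandbox}

    @property
    def calibrator_inst(self):
        if self._calibrator_inst is None:
            inst = self._create_inst()
            # adopt the sandbox requested by the calibrator, if any
            if inst["sandbox"] is not None:
                self.sandbox = inst["sandbox"]
            self._calibrator_inst = inst
        return self._calibrator_inst
```

Repeated access returns the same cached object, so the instance is created only once per task.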

store_parts()[source]#

Create parts to create the output path to store intermediary results for the current Task.

This method calls store_parts() of the super class and inserts {"calibrator": "calib__{self.calibrator}"} before keyword version. For more information, see e.g. store_parts().

Return type:

InsertableDict[str, str]

Returns:

Updated parts to create output path to store intermediary results.
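
Inserting an entry before a given key of an ordered mapping can be sketched with a plain dict, used here as a stand-in for InsertableDict. The helper name insert_before and the example keys are assumptions made for illustration.

```python
def insert_before(parts, before_key, key, value):
    """Return a new dict with (key, value) placed directly before before_key."""
    out = {}
    inserted = False
    for k, v in parts.items():
        if k == before_key:
            out[key] = value
            inserted = True
        out[k] = v
    # append at the end if before_key does not exist
    if not inserted:
        out[key] = value
    return out


parts = {"task_family": "ExampleTask", "version": "v1"}
updated = insert_before(parts, "version", "calibrator", "calib__example")
```

Since the order of the parts determines the order of the directories in the output path, the calibrator part ends up directly in front of the version part.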

find_keep_columns(collection)[source]#

Finds the columns to keep based on the collection.

If the collection is ALL_FROM_CALIBRATOR, it includes the columns produced by the calibrator.

Parameters:

collection (ColumnCollection) – The collection of columns.

Return type:

set[Route]

Returns:

Set of columns to keep.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class CalibratorsMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Mixin to include multiple Calibrator instances into tasks.

Inheriting from this mixin will allow a task to instantiate and access a set of Calibrator instances with names calibrators, which is a comma-separated list of calibrator names and is an input parameter for this task.

Attributes:

calibrators

register_calibrators_shifts

calibrator_insts

Access current list of Calibrator instances.

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

get_calibrator_insts(calibrators[, kwargs])

Get all requested calibrators.

resolve_param_values(params)

Resolve values params and check against possible default values and calibrator groups.

get_known_shifts(config_inst, params)

Adds set of all shifts that the list of calibrator_insts register to the set of known shifts and upstream_shifts.

req_params(inst, **kwargs)

Returns the required parameters for the task.

store_parts()

Create parts to create the output path to store intermediary results for the current Task.

find_keep_columns(collection)

Finds the columns to keep based on the collection.

calibrators = <law.parameter.CSVParameter object>#
register_calibrators_shifts = False#
classmethod get_calibrator_insts(calibrators, kwargs=None)[source]#

Get all requested calibrators.

Calibrator instances are either initialized or loaded from cache.

Parameters:
  • calibrators (Iterable[str]) – Names of Calibrators to load

  • kwargs (default: None) – Additional keyword arguments to forward to individual Calibrator instances

Raises:

RuntimeError – if requested calibrators are not exposed

Return type:

list[Calibrator]

Returns:

List of Calibrator instances.

classmethod resolve_param_values(params)[source]#

Resolve values params and check against possible default values and calibrator groups.

Check the values in params against the default value "default_calibrator" and possible group definitions "calibrator_groups" in the current config inst. For more information, see resolve_config_default_and_groups().

Parameters:

params (InsertableDict[str, Any]) – Parameter values to resolve

Return type:

InsertableDict[str, Any]

Returns:

Dictionary of parameters that contains the list of requested Calibrator instances under the keyword "calibrator_insts". See get_calibrator_insts() for more information.

classmethod get_known_shifts(config_inst, params)[source]#

Adds set of all shifts that the list of calibrator_insts register to the set of known shifts and upstream_shifts.

First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that CalibratorsMixin inherits from. Afterwards, loop through the list of Calibrator instances and check if they register shifts. If register_calibrators_shifts is True, add them to the current set of shifts. Otherwise, add the shifts to upstream_shifts.

Parameters:
  • config_inst (Config) – Config instance for the current task.

  • params (dict[str, Any]) – Dictionary containing the current set of parameters provided by the user at commandline level

Return type:

tuple[set[str], set[str]]

Returns:

Tuple with updated sets of shifts and upstream_shifts.

classmethod req_params(inst, **kwargs)[source]#

Returns the required parameters for the task.

It prefers --calibrators set on task-level via command line.

Parameters:
  • inst (Task) – The current task instance.

  • kwargs – Additional keyword arguments.

Return type:

dict[str, Any]

Returns:

Dictionary of required parameters.

property calibrator_insts: list[columnflow.calibration.Calibrator]#

Access current list of Calibrator instances.

Loads the current Calibrator instances calibrator_insts from the cache or initializes them.

Returns:

Current list of Calibrator instances

store_parts()[source]#

Create parts to create the output path to store intermediary results for the current Task.

Calls store_parts() of the super class and inserts {"calibrator": "calib__{HASH}"} before keyword version. Here, HASH is the joined string of the first five calibrator names + a hash created with law.util.create_hash() based on the rest of the list of calibrators, starting at its sixth element (i.e. self.calibrators[5:]). For more information, see e.g. store_parts().

Returns:

Updated parts to create output path to store intermediary results.
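
The naming scheme for multiple calibrators can be sketched as follows. This is not the library's code: hashlib.sha1 replaces law.util.create_hash(), and the "__" separator, the digest length and the function name are assumptions. The behavior for an empty list is not described above and is not handled here.

```python
import hashlib


def calibrators_repr(calibrators, n_named=5):
    """Join the first n_named names and append a hash of the remaining ones."""
    named = list(calibrators[:n_named])
    rest = list(calibrators[n_named:])
    parts = named
    if rest:
        # stand-in for law.util.create_hash(), applied to calibrators[5:]
        digest = hashlib.sha1(",".join(rest).encode("utf-8")).hexdigest()[:10]
        parts = named + [digest]
    return "calib__" + "__".join(parts)
```

The hash keeps the path short for long lists while still giving different output directories for lists that differ only after the fifth name.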

find_keep_columns(collection)[source]#

Finds the columns to keep based on the collection.

If the collection is ALL_FROM_CALIBRATORS, it includes the columns produced by the calibrators.

Parameters:

collection (ColumnCollection) – The collection of columns.

Return type:

set[Route]

Returns:

Set of columns to keep.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class SelectorMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Mixin to include a single Selector instance into tasks.

Inheriting from this mixin will allow a task to instantiate and access a Selector instance with name selector, which is an input parameter for this task.

Attributes:

selector

register_selector_shifts

selector_inst

Access current Selector instance.

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

get_selector_inst(selector[, kwargs])

Get requested selector.

resolve_param_values(params)

Resolve values params and check against possible default values and selector groups.

get_known_shifts(config_inst, params)

Adds set of shifts that the current selector_inst registers to the set of known shifts and upstream_shifts.

req_params(inst, **kwargs)

Get the required parameters for the task, preferring the --selector set on task-level via CLI.

store_parts()

Create parts to create the output path to store intermediary results for the current Task.

find_keep_columns(collection)

Returns a set of Route objects describing columns that should be kept given a type of column collection.

selector = <luigi.parameter.Parameter object>#
register_selector_shifts = False#
classmethod get_selector_inst(selector, kwargs=None)[source]#

Get requested selector.

Selector instance is either initialized or loaded from cache.

Parameters:
  • selector (str) – Name of Selector to load

  • kwargs (default: None) – Additional keyword arguments to forward to the Selector instance

Return type:

Selector

Returns:

Selector instance.

classmethod resolve_param_values(params)[source]#

Resolve values params and check against possible default values and selector groups.

Check the values in params against the default value "default_selector" in the current config inst. For more information, see resolve_config_default().

Parameters:

params (dict[str, Any]) – Parameter values to resolve

Return type:

dict

Returns:

Dictionary of parameters that contains the requested Selector instance under the keyword "selector_inst".

classmethod get_known_shifts(config_inst, params)[source]#

Adds set of shifts that the current selector_inst registers to the set of known shifts and upstream_shifts.

First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that SelectorMixin inherits from. Afterwards, check if the current selector_inst registers shifts. If register_selector_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the selector_inst to upstream_shifts.

Parameters:
  • config_inst (Config) – Config instance for the current task.

  • params (dict[str, Any]) – Dictionary containing the current set of parameters provided by the user at commandline level

Return type:

tuple[set[str], set[str]]

Returns:

Tuple with updated sets of shifts and upstream_shifts.

classmethod req_params(inst, **kwargs)[source]#

Get the required parameters for the task, preferring the --selector set on task-level via CLI.

This method first checks if the --selector parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params() method of the superclass with the updated kwargs.

Parameters:
  • inst (Task) – The current task instance.

  • kwargs – Additional keyword arguments that may contain parameters for the task.

Return type:

dict[str, Any]

Returns:

A dictionary of parameters required for the task.
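
The effect of the "_prefer_cli" key can be illustrated with a toy version of the two-step call. Both functions below are hypothetical simplifications: base_req_params stands in for the req_params method of the superclass, and parameters are modeled as plain dictionaries.

```python
def base_req_params(inst_params, cli_params, _prefer_cli=()):
    """Toy superclass method: CLI values win for all names in _prefer_cli."""
    params = dict(inst_params)
    for name in _prefer_cli:
        if name in cli_params:
            params[name] = cli_params[name]
    return params


def req_params(inst_params, cli_params, **kwargs):
    """Toy mixin method: add "selector" to _prefer_cli, then delegate."""
    prefer = set(kwargs.get("_prefer_cli") or ()) | {"selector"}
    kwargs["_prefer_cli"] = prefer
    return base_req_params(inst_params, cli_params, **kwargs)
```

In this sketch, a selector given on the command line overrides the value of the requesting task instance, whereas other parameters such as a version are taken from the instance unless they are also listed in "_prefer_cli".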

property selector_inst#

Access current Selector instance.

Loads the current Selector selector_inst from the cache or initializes it. If the selector requests a specific sandbox, set this sandbox as the environment for the current Task.

Returns:

Current Selector instance

store_parts()[source]#

Create parts to create the output path to store intermediary results for the current Task.

Calls store_parts() of the super class and inserts {"selector": "sel__{SELECTOR_NAME}"} before keyword version. Here, SELECTOR_NAME is the name of the current selector_inst.

Returns:

Updated parts to create output path to store intermediary results.

find_keep_columns(collection)[source]#

Returns a set of Route objects describing columns that should be kept given a type of column collection.

Parameters:

collection (ColumnCollection) – The collection to return.

Return type:

set[Route]

Returns:

A set of Route objects.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class SelectorStepsMixin(*args, **kwargs)[source]#

Bases: SelectorMixin

Mixin to include multiple selector steps into tasks.

Inheriting from this mixin will allow a task to access selector steps, which can be a comma-separated list of selector step names and is an input parameter for this task.

Attributes:

selector_steps

exclude_params_repr_empty

selector_steps_order_sensitive

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

Resolve values params and check against possible default values and selector step groups.

req_params(inst, **kwargs)

Get the required parameters for the task, preferring the --selector-steps set on task-level via CLI.

store_parts()

Create parts to create the output path to store intermediary results for the current Task.

selector_steps = <law.parameter.CSVParameter object>#
exclude_params_repr_empty = {'selector_steps'}#
selector_steps_order_sensitive = False#
classmethod resolve_param_values(params)[source]#

Resolve values params and check against possible default values and selector step groups.

Check the values in params against the default value "default_selector_steps" and the group "selector_step_groups" in the current config inst. For more information, see resolve_config_default(). If SelectorStepsMixin.selector_steps_order_sensitive is False, the selector steps are sorted.

Parameters:

params (dict[str, Any]) – Parameter values to resolve

Return type:

dict[str, Any]

Returns:

Dictionary of parameters that contains the requested selector steps under the keyword "selector_steps".

classmethod req_params(inst, **kwargs)[source]#

Get the required parameters for the task, preferring the --selector-steps set on task-level via CLI.

This method first checks if the --selector-steps parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params() method of the superclass with the updated kwargs.

Parameters:
  • inst (Task) – The current task instance.

  • kwargs – Additional keyword arguments that may contain parameters for the task.

Return type:

dict[str, Any]

Returns:

A dictionary of parameters required for the task.

store_parts()[source]#

Create parts to create the output path to store intermediary results for the current Task.

Calls store_parts() of the super class and inserts {"selector": "__steps__LIST_OF_STEPS"}, where LIST_OF_STEPS is the sorted list of selector steps. For more information, see e.g. store_parts().

Return type:

InsertableDict

Returns:

Updated parts to create output path to store intermediary results.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class ProducerMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Mixin to include a single Producer into tasks.

Inheriting from this mixin will allow a task to instantiate and access a Producer instance with name producer, which is an input parameter for this task.

Attributes:

producer

register_producer_shifts

producer_inst

Access current Producer instance.

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

get_producer_inst(producer[, kwargs])

Initialize Producer instance.

resolve_param_values(params)

Resolve parameter values params relevant for the ProducerMixin and all classes it inherits from.

get_known_shifts(config_inst, params)

Adds set of shifts that the current producer_inst registers to the set of known shifts and upstream_shifts.

req_params(inst, **kwargs)

Get the required parameters for the task, preferring the --producer set on task-level via CLI.

store_parts()

Create parts to create the output path to store intermediary results for the current Task.

find_keep_columns(collection)

Finds the columns to keep based on the collection.

producer = <luigi.parameter.Parameter object>#
register_producer_shifts = False#
classmethod get_producer_inst(producer, kwargs=None)[source]#

Initialize Producer instance.

Extracts the relevant kwargs for this producer instance using the get_producer_kwargs() method. Afterwards, the Producer class with the name producer is retrieved using the get_cls() method and initialized with the relevant keyword arguments.

Parameters:
  • producer (str) – Name of the Producer instance

  • kwargs (default: None) – Any set keyword argument that is potentially relevant for this Producer instance

Raises:

RuntimeError – if requested Producer instance is not exposed

Return type:

Producer

Returns:

The initialized Producer instance.

classmethod resolve_param_values(params)[source]#

Resolve parameter values params relevant for the ProducerMixin and all classes it inherits from.

Loads the config_inst and reads the parameter "producer". If the parameter is not set, it falls back to "default_producer". Finally, this function adds the keyword "producer_inst", which contains the Producer instance obtained using the get_producer_inst() method.

Parameters:

params (dict[str, Any]) – Dictionary with parameters provided by the user at commandline level.

Return type:

dict[str, Any]

Returns:

Dictionary of parameters that now includes new value for "producer_inst".

classmethod get_known_shifts(config_inst, params)[source]#

Adds set of shifts that the current producer_inst registers to the set of known shifts and upstream_shifts.

First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that ProducerMixin inherits from. Afterwards, check if the current producer_inst registers shifts. If register_producer_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the producer_inst to upstream_shifts.

Parameters:
  • config_inst (Config) – Config instance for the current task.

  • params (dict[str, Any]) – Dictionary containing the current set of parameters provided by the user at commandline level

Return type:

tuple[set[str], set[str]]

Returns:

Tuple with updated sets of shifts and upstream_shifts.

classmethod req_params(inst, **kwargs)[source]#

Get the required parameters for the task, preferring the --producer set on task-level via CLI.

This method first checks if the --producer parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params() method of the superclass with the updated kwargs.

Parameters:
  • inst (Task) – The current task instance.

  • kwargs – Additional keyword arguments that may contain parameters for the task.

Return type:

dict[str, Any]

Returns:

A dictionary of parameters required for the task.

property producer_inst: Producer#

Access current Producer instance.

Loads the current Producer producer_inst from the cache or initializes it. If the producer requests a specific sandbox, set this sandbox as the environment for the current Task.

Returns:

Current Producer instance

store_parts()[source]#

Create parts to create the output path to store intermediary results for the current Task.

Calls store_parts() of the super class and inserts {"producer": "prod__{self.producer}"} before keyword version. For more information, see e.g. store_parts().

Return type:

InsertableDict[str, str]

Returns:

Updated parts to create output path to store intermediary results.

find_keep_columns(collection)[source]#

Finds the columns to keep based on the collection.

This method first calls the find_keep_columns() method of the superclass with the given collection. If the collection is equal to ALL_FROM_PRODUCER, it adds the columns produced by the producer instance to the set of columns.

Parameters:

collection (ColumnCollection) – The collection of columns.

Return type:

set[Route]

Returns:

A set of columns to keep.
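
The two steps described above, taking the columns of the superclass and extending them for one specific collection, can be sketched as follows. ToyCollection and its OTHER member, as well as the plain string column names, are hypothetical stand-ins for ColumnCollection and Route objects.

```python
from enum import Enum, auto


class ToyCollection(Enum):
    """Hypothetical stand-in for ColumnCollection."""

    ALL_FROM_PRODUCER = auto()
    OTHER = auto()


def find_keep_columns(collection, base_columns, produced_columns):
    """Start from the superclass columns and add produced ones if requested."""
    columns = set(base_columns)
    if collection is ToyCollection.ALL_FROM_PRODUCER:
        columns |= set(produced_columns)
    return columns
```

For any other collection, the set returned by the superclass is passed through unchanged.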

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class ProducersMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Mixin to include multiple Producer instances into tasks.

Inheriting from this mixin will allow a task to instantiate and access a set of Producer instances with names producers, which is a comma-separated list of producer names and is an input parameter for this task.

Attributes:

producers

register_producers_shifts

producer_insts

Access current list of Producer instances.

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

get_producer_insts(producers[, kwargs])

Get all requested producers.

resolve_param_values(params)

Resolve values params and check against possible default values and producer groups.

get_known_shifts(config_inst, params)

Adds set of all shifts that the list of producer_insts register to the set of known shifts and upstream_shifts.

req_params(inst, **kwargs)

Get the required parameters for the task, preferring the --producers set on task-level via CLI.

store_parts()

Create parts to create the output path to store intermediary results for the current Task.

find_keep_columns(collection)

Finds the columns to keep based on the collection.

producers = <law.parameter.CSVParameter object>#
register_producers_shifts = False#
classmethod get_producer_insts(producers, kwargs=None)[source]#

Get all requested producers.

Producer instances are either initialized or loaded from cache.

Parameters:
  • producers (Iterable[str]) – Names of Producer instances to load

  • kwargs (default: None) – Additional keyword arguments to forward to individual Producer instances

Raises:

RuntimeError – if requested producers are not exposed

Return type:

list[Producer]

Returns:

List of Producer instances.

classmethod resolve_param_values(params)[source]#

Resolve values params and check against possible default values and producer groups.

Check the values in params against the default value "default_producer" and possible group definitions "producer_groups" in the current config inst. For more information, see resolve_config_default_and_groups().

Parameters:

params (InsertableDict[str, Any]) – Parameter values to resolve

Return type:

InsertableDict[str, Any]

Returns:

Dictionary of parameters that contains the list of requested Producer instances under the keyword "producer_insts". See get_producer_insts() for more information.

classmethod get_known_shifts(config_inst, params)[source]#

Adds set of all shifts that the list of producer_insts register to the set of known shifts and upstream_shifts.

First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that ProducersMixin inherits from. Afterwards, loop through the list of Producer instances and check if they register shifts. If register_producers_shifts is True, add them to the current set of shifts. Otherwise, add the shifts to upstream_shifts.

Parameters:
  • config_inst (Config) – Config instance for the current task.

  • params (dict[str, Any]) – Dictionary containing the current set of parameters provided by the user at commandline level

Return type:

tuple[set[str], set[str]]

Returns:

Tuple with updated sets of shifts and upstream_shifts.

classmethod req_params(inst, **kwargs)[source]#

Get the required parameters for the task, preferring the --producers set on task-level via CLI.

This method first checks if the --producers parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params() method of the superclass with the updated kwargs.

Parameters:
  • inst (Task) – The current task instance.

  • kwargs – Additional keyword arguments that may contain parameters for the task.

Return type:

dict[str, Any]

Returns:

A dictionary of parameters required for the task.

property producer_insts: list[columnflow.production.Producer]#

Access current list of Producer instances.

Loads the current Producer instances producer_insts from the cache or initializes them.

Returns:

Current list of Producer instances

store_parts()[source]#

Create parts to create the output path to store intermediary results for the current Task.

Calls store_parts() of the super class and inserts {"producers": "prod__{HASH}"} before keyword version. Here, HASH is the joined string of the first five producer names + a hash created with law.util.create_hash() based on the rest of the list of producers, starting at its sixth element (i.e. self.producers[5:]). For more information, see e.g. store_parts().

Returns:

Updated parts to create output path to store intermediary results.

find_keep_columns(collection)[source]#

Finds the columns to keep based on the collection.

This method first calls the find_keep_columns() method of the superclass with the given collection. If the collection is equal to ALL_FROM_PRODUCERS, it adds the columns produced by all producer instances to the set of columns.

Parameters:

collection (ColumnCollection) – The collection of columns.

Return type:

set[Route]

Returns:

A set of columns to keep.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class MLModelMixinBase(*args, **kwargs)[source]#

Bases: AnalysisTask

Base mixin to include machine learning applications into tasks.

Inheriting from this mixin will allow a task to instantiate and access an MLModel instance with name ml_model, which is an input parameter for this task.

Attributes:

ml_model

exclude_params_repr_empty

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

req_params(inst, **kwargs)

Get the required parameters for the task, preferring the --ml-model set on task-level via CLI.

get_ml_model_inst(ml_model, analysis_inst[, ...])

Get requested ml_model instance.

events_used_in_training(config_inst, ...)

Evaluate whether the events for the combination of dataset_inst and shift_inst shall be used in the training.

ml_model = <luigi.parameter.Parameter object>#
exclude_params_repr_empty = {'ml_model'}#
classmethod req_params(inst, **kwargs)[source]#

Get the required parameters for the task, preferring the --ml-model set on task-level via CLI.

This method first checks if the --ml-model parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params() method of the superclass with the updated kwargs.

Parameters:
  • inst (Task) – The current task instance.

  • kwargs – Additional keyword arguments that may contain parameters for the task.

Return type:

dict[str, Any]

Returns:

A dictionary of parameters required for the task.

classmethod get_ml_model_inst(ml_model, analysis_inst, requested_configs=None, **kwargs)[source]#

Get requested ml_model instance.

This method retrieves the requested ml_model instance. If requested_configs are provided, they are used for the training of the ML application.

Parameters:
  • ml_model (str) – Name of MLModel to load.

  • analysis_inst (od.Analysis) – The analysis instance that is forwarded to the init function of the new MLModel subclass.

  • requested_configs (list[str] | None, default: None) – Configs needed for the training of the ML application.

  • kwargs – Additional keyword arguments to forward to the MLModel instance.

Return type:

MLModel

Returns:

MLModel instance.

events_used_in_training(config_inst, dataset_inst, shift_inst)[source]#

Evaluate whether the events for the combination of dataset_inst and shift_inst shall be used in the training.

This method checks if the dataset_inst is in the set of datasets of the current ml_model_inst based on the given config_inst. Additionally, the function checks that the shift_inst does not have the tag “disjoint_from_nominal”.

Parameters:
  • config_inst (Config) – The configuration instance.

  • dataset_inst (Dataset) – The dataset instance.

  • shift_inst (Shift) – The shift instance.

Return type:

bool

Returns:

True if the events shall be used in the training, False otherwise.
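
The two conditions described above can be sketched with simple stand-in types. The Shift dataclass and the training_datasets argument below are hypothetical simplifications; the actual method works with config, dataset and shift instances and queries the current ml_model_inst.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    # hypothetical stand-in for a shift object that carries tags
    name: str
    tags: frozenset = frozenset()

    def has_tag(self, tag):
        return tag in self.tags


def events_used_in_training(training_datasets, dataset_name, shift):
    """Return True if the dataset is a training dataset and the shift is
    not tagged as disjoint from the nominal one."""
    return (
        dataset_name in training_datasets
        and not shift.has_tag("disjoint_from_nominal")
    )


nominal = Shift("nominal")
disjoint = Shift("some_shift_up", frozenset({"disjoint_from_nominal"}))
```

With these definitions, a training dataset combined with the nominal shift is used, while the same dataset combined with the tagged shift is not.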

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class MLModelTrainingMixin(*args, **kwargs)[source]#

Bases: MLModelMixinBase

A mixin class for training machine learning models.

This class provides parameters for configuring the training of machine learning models.

Attributes:

configs

calibrators

selectors

producers

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_calibrators(ml_model_inst, params)

Resolve the calibrators for the given ML model instance.

resolve_selectors(ml_model_inst, params)

Resolve the selectors for the given ML model instance.

resolve_producers(ml_model_inst, params)

Resolve the producers for the given ML model instance.

resolve_param_values(params)

Resolve the parameter values for the given parameters.

store_parts()

Generate a dictionary of store parts for the current instance.

configs = <law.parameter.CSVParameter object>#
calibrators = <law.parameter.MultiCSVParameter object>#
selectors = <law.parameter.CSVParameter object>#
producers = <law.parameter.MultiCSVParameter object>#
classmethod resolve_calibrators(ml_model_inst, params)[source]#

Resolve the calibrators for the given ML model instance.

This method retrieves the calibrators from the parameters params and broadcasts them to the configs if necessary. It also resolves calibrator_groups and default_calibrator from the config(s) associated with this ML model instance, and validates the number of sequences. Finally, it checks the retrieved calibrators against the training calibrators of the model using training_calibrators() and instantiates them if necessary.

Parameters:
  • ml_model_inst (MLModel) – The ML model instance.

  • params (dict[str, Any]) – A dictionary of parameters that may contain the calibrators.

Return type:

tuple[tuple[str]]

Returns:

A tuple of tuples containing the resolved calibrators.

Raises:

Exception – If the number of calibrator sequences does not match the number of configs used by the ML model.
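
The broadcasting and validation step can be sketched as follows. This is an illustration under the assumption that "broadcasting" means repeating a single sequence once per config; the helper name broadcast_sequences is hypothetical, and the sketch raises ValueError where the documented method raises a generic Exception.

```python
def broadcast_sequences(sequences, n_configs):
    """Return one sequence of names per config.

    A single sequence is repeated for every config (assumed meaning of
    "broadcast"); any other mismatch in length is an error.
    """
    sequences = tuple(tuple(seq) for seq in sequences)
    if len(sequences) == 1 and n_configs != 1:
        sequences = sequences * n_configs
    if len(sequences) != n_configs:
        raise ValueError(
            f"got {len(sequences)} sequences for {n_configs} configs"
        )
    return sequences


# one calibrator sequence, two configs: the sequence is used for both
resolved = broadcast_sequences([["calib_a", "calib_b"]], 2)
```

The result has the tuple[tuple[str]] shape documented above, with one inner tuple per config.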

classmethod resolve_selectors(ml_model_inst, params)[source]#

Resolve the selectors for the given ML model instance.

This method retrieves the selectors from the parameters params and broadcasts them to the configs if necessary. It also resolves default_selector from the config(s) associated with this ML model instance and validates the number of sequences. Finally, it checks the retrieved selectors against the training selectors of the model, using training_selector(), and instantiates them.

Parameters:
  • ml_model_inst (MLModel) – The ML model instance.

  • params (dict[str, Any]) – A dictionary of parameters that may contain the selectors.

Return type:

tuple[str]

Returns:

A tuple containing the resolved selectors.

Raises:

Exception – If the number of selector sequences does not match the number of configs used by the ML model.

classmethod resolve_producers(ml_model_inst, params)[source]#

Resolve the producers for the given ML model instance.

This method retrieves the producers from the parameters params and broadcasts them to the configs if necessary. It also resolves producer_groups and default_producer from the config(s) associated with this ML model instance and validates the number of sequences. Finally, it checks the retrieved producers against the training producers of the model, using training_producers(), and instantiates them.

Parameters:
  • ml_model_inst (MLModel) – The ML model instance.

  • params (dict[str, Any]) – A dictionary of parameters that may contain the producers.

Return type:

tuple[tuple[str]]

Returns:

A tuple of tuples containing the resolved producers.

Raises:

Exception – If the number of producer sequences does not match the number of configs used by the ML model.

classmethod resolve_param_values(params)[source]#

Resolve the parameter values for the given parameters.

This method retrieves the parameters and resolves the ML model instance, configs, calibrators, selectors, and producers. It also calls the model’s setup hook.

Parameters:

params (dict[str, Any]) – A dictionary of parameters that may contain the analysis instance and ML model.

Return type:

dict[str, Any]

Returns:

A dictionary containing the resolved parameters.

Raises:

Exception – If the ML model instance received configs to define training configs, but did not define any.

store_parts()[source]#

Generate a dictionary of store parts for the current instance.

This method extends the base method with additional parts for the machine learning model configs, for the calibrators, selectors and producers (CSPs), and for the ML model instance itself. If the list for a CSP is empty, the corresponding part is set to "none"; otherwise, its first two elements are joined with "__". If the list contains more than two elements, the part is extended with the number of elements and a hash of the remaining elements, which is created with law.util.create_hash(). The parts are strings and are used to create unique identifiers for the instance’s output.

Return type:

InsertableDict[str, str]

Returns:

An InsertableDict containing the store parts.
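
The rule for building a single part from a list of names can be sketched as below. The helper names are hypothetical, short_hash is only a stand-in for law.util.create_hash(), and the exact way the count and the hash are appended is an assumption made for illustration.

```python
import hashlib


def short_hash(obj, length=10):
    # stand-in for law.util.create_hash(); the real hash values differ
    return hashlib.sha256(repr(obj).encode("utf-8")).hexdigest()[:length]


def sequence_part(names):
    """Build a store part from a list of calibrator, selector or producer names."""
    names = tuple(names)
    if not names:
        return "none"
    # the first two names stay readable
    part = "__".join(names[:2])
    if len(names) > 2:
        # assumed layout: total count, then a hash of the remaining names
        part += f"__{len(names)}_{short_hash(names[2:])}"
    return part
```

An empty list yields "none", two names yield a part such as "a__b", and longer lists keep the first two names readable while still producing distinct parts for different remainders.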

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class MLModelMixin(*args, **kwargs)[source]#

Bases: ConfigTask, MLModelMixinBase

Attributes:

ml_model

allow_empty_ml_model

exclude_params_repr_empty

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

Return type: dict[str, Any]

store_parts()

Returns a law.util.InsertableDict whose values are used to create a store path.

find_keep_columns(collection)

Returns a set of Route objects describing columns that should be kept given a type of column collection.

ml_model = <luigi.parameter.Parameter object>#
allow_empty_ml_model = True#
exclude_params_repr_empty = {'ml_model'}#
classmethod resolve_param_values(params)[source]#
Return type:

dict[str, Any]

store_parts()[source]#

Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.

Return type:

InsertableDict

Returns:

Dictionary with parts to create a path to store intermediary results.
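
The mapping from parts to a path can be illustrated with a plain dict, which keeps insertion order and stands in here for law.util.InsertableDict. The helper store_path is hypothetical and only mirrors the example given above.

```python
def store_path(parts):
    """Join the values of the parts dictionary, in order, into a path."""
    return "/".join(str(value) for value in parts.values())


parts = {"keyA": "a", "keyB": "b", 2: "c"}
path = store_path(parts)  # "a/b/c"

# a subclassing task can overwrite a value through its key
parts["keyB"] = "other"
new_path = store_path(parts)  # "a/other/c"
```

Because only the values enter the path, the keys serve purely as handles for inserting or overwriting parts.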

find_keep_columns(collection)[source]#

Returns a set of Route objects describing columns that should be kept given a type of column collection.

Parameters:

collection (ColumnCollection) – The collection to return.

Return type:

set[Route]

Returns:

A set of Route objects.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class MLModelDataMixin(*args, **kwargs)[source]#

Bases: MLModelMixin

Attributes:

allow_empty_ml_model

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

store_parts()

Returns a law.util.InsertableDict whose values are used to create a store path.

allow_empty_ml_model = False#
store_parts()[source]#

Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.

Return type:

InsertableDict

Returns:

Dictionary with parts to create a path to store intermediary results.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class MLModelsMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Attributes:

ml_models

allow_empty_ml_models

exclude_params_repr_empty

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

Return type: dict[str, Any]

req_params(inst, **kwargs)

Returns parameters that are jointly defined in this class and another task instance of some other class.

store_parts()

Returns a law.util.InsertableDict whose values are used to create a store path.

find_keep_columns(collection)

Returns a set of Route objects describing columns that should be kept given a type of column collection.

ml_models = <law.parameter.CSVParameter object>#
allow_empty_ml_models = True#
exclude_params_repr_empty = {'ml_models'}#
classmethod resolve_param_values(params)[source]#
Return type:

dict[str, Any]

classmethod req_params(inst, **kwargs)[source]#

Returns parameters that are jointly defined in this class and another task instance of some other class. The parameters are used when calling Task.req(self).

Return type:

dict

store_parts()[source]#

Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.

Return type:

InsertableDict

Returns:

Dictionary with parts to create a path to store intermediary results.

find_keep_columns(collection)[source]#

Returns a set of Route objects describing columns that should be kept given a type of column collection.

Parameters:

collection (ColumnCollection) – The collection to return.

Return type:

set[Route]

Returns:

A set of Route objects.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class InferenceModelMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Attributes:

inference_model

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

Return type: dict[str, Any]

get_inference_model_inst(inference_model, ...)

Return type: InferenceModel

req_params(inst, **kwargs)

Returns parameters that are jointly defined in this class and another task instance of some other class.

store_parts()

Returns a law.util.InsertableDict whose values are used to create a store path.

inference_model = <luigi.parameter.Parameter object>#
classmethod resolve_param_values(params)[source]#
Return type:

dict[str, Any]

classmethod get_inference_model_inst(inference_model, config_inst)[source]#
Return type:

InferenceModel

classmethod req_params(inst, **kwargs)[source]#

Returns parameters that are jointly defined in this class and another task instance of some other class. The parameters are used when calling Task.req(self).

Return type:

dict

store_parts()[source]#

Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.

Return type:

InsertableDict

Returns:

Dictionary with parts to create a path to store intermediary results.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class CategoriesMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Attributes:

categories

default_categories

allow_empty_categories

categories_repr

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

categories = <law.parameter.CSVParameter object>#
default_categories = None#
allow_empty_categories = False#
classmethod resolve_param_values(params)[source]#
property categories_repr#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class VariablesMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Attributes:

variables

default_variables

allow_empty_variables

variables_repr

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

split_multi_variable(variable)

Splits a multi-dimensional variable given in the format "var_a[-var_b[-...]]" into separate variable names using a delimiter ("-") and returns a tuple.

join_multi_variable(variables)

Joins the name of multiple variables using a delimiter ("-") into a single string that represents a multi-dimensional variable and returns it.

variables = <law.parameter.CSVParameter object>#
default_variables = None#
allow_empty_variables = False#
classmethod resolve_param_values(params)[source]#
classmethod split_multi_variable(variable)[source]#

Splits a multi-dimensional variable given in the format "var_a[-var_b[-...]]" into separate variable names using a delimiter ("-") and returns a tuple.

Return type:

tuple[str]

classmethod join_multi_variable(variables)[source]#

Joins the name of multiple variables using a delimiter ("-") into a single string that represents a multi-dimensional variable and returns it.

Return type:

str
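
The two helpers are inverse operations on the "-" delimiter. A minimal standalone sketch, written as plain functions rather than classmethods and using hypothetical variable names, could look like this:

```python
DELIMITER = "-"


def split_multi_variable(variable):
    """Split "var_a-var_b" into a tuple of single variable names."""
    return tuple(variable.split(DELIMITER))


def join_multi_variable(variables):
    """Join single variable names into one multi-dimensional variable name."""
    return DELIMITER.join(variables)


names = split_multi_variable("jet1_pt-jet1_eta")
joined = join_multi_variable(names)
```

A one-dimensional variable splits into a tuple with a single element, and joining the result of a split restores the original string.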

property variables_repr#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class DatasetsProcessesMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Attributes:

datasets

processes

allow_empty_datasets

allow_empty_processes

datasets_repr

processes_repr

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

get_known_shifts(config_inst, params)

Returns two sets of shifts in a tuple: shifts implemented by this task, and dependent shifts implemented by upstream tasks.

datasets = <law.parameter.CSVParameter object>#
processes = <law.parameter.CSVParameter object>#
allow_empty_datasets = False#
allow_empty_processes = False#
classmethod resolve_param_values(params)[source]#
classmethod get_known_shifts(config_inst, params)[source]#

Returns two sets of shifts in a tuple: shifts implemented by this task, and dependent shifts implemented by upstream tasks.

property datasets_repr#
property processes_repr#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class ShiftSourcesMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Attributes:

shift_sources

allow_empty_shift_sources

shift_sources_repr

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

resolve_param_values(params)

expand_shift_sources(sources)

Return type: list[str]

reduce_shifts(shifts)

Return type: list[str]

shift_sources = <law.parameter.CSVParameter object>#
allow_empty_shift_sources = False#
classmethod resolve_param_values(params)[source]#
classmethod expand_shift_sources(sources)[source]#
Return type:

list[str]

classmethod reduce_shifts(shifts)[source]#
Return type:

list[str]

property shift_sources_repr#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class WeightProducerMixin(*args, **kwargs)[source]#

Bases: ConfigTask

Attributes:

weight_producer

register_weight_producer_shifts

weight_producer_inst

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

get_weight_producer_inst(weight_producer[, ...])

Return type: WeightProducer

resolve_param_values(params)

Return type: dict[str, Any]

get_known_shifts(config_inst, params)

Returns two sets of shifts in a tuple: shifts implemented by this task, and dependent shifts implemented by upstream tasks.

store_parts()

Returns a law.util.InsertableDict whose values are used to create a store path.

weight_producer = <luigi.parameter.Parameter object>#
register_weight_producer_shifts = False#
classmethod get_weight_producer_inst(weight_producer, kwargs=None)[source]#
Return type:

WeightProducer

classmethod resolve_param_values(params)[source]#
Return type:

dict[str, Any]

classmethod get_known_shifts(config_inst, params)[source]#

Returns two sets of shifts in a tuple: shifts implemented by this task, and dependent shifts implemented by upstream tasks.

Return type:

tuple[set[str], set[str]]

property weight_producer_inst: WeightProducer#
store_parts()[source]#

Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.

Return type:

InsertableDict[str, str]

Returns:

Dictionary with parts to create a path to store intermediary results.

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
class ChunkedIOMixin(*args, **kwargs)[source]#

Bases: AnalysisTask

Attributes:

check_finite_output

check_overlapping_inputs

exclude_params_req

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

Methods:

raise_if_not_finite(ak_array)

Checks whether all values in array ak_array are finite.

raise_if_overlapping(ak_arrays)

Checks whether fields of ak_arrays overlap.

iter_chunked_io(*args, **kwargs)

check_finite_output = <luigi.parameter.BoolParameter object>#
check_overlapping_inputs = <luigi.parameter.BoolParameter object>#
exclude_params_req = {'check_finite_output', 'check_overlapping_inputs'}#
classmethod raise_if_not_finite(ak_array)[source]#

Checks whether all values in array ak_array are finite.

The check is performed using the numpy.isfinite() function.

Parameters:

ak_array (Array) – Array with events to check.

Raises:

ValueError – If any value in ak_array is not finite.

Return type:

None
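
The idea of the check can be sketched without array libraries. Note the substitution: the documented method applies numpy.isfinite() to an array of events, whereas this illustration uses math.isfinite() on nested Python lists as a stand-in, and the helper iter_leaves is hypothetical.

```python
import math


def iter_leaves(values):
    """Yield all numbers of an arbitrarily nested list or tuple."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from iter_leaves(value)
        else:
            yield value


def raise_if_not_finite(values):
    """Raise a ValueError if any number is NaN or infinite."""
    for value in iter_leaves(values):
        if not math.isfinite(value):
            raise ValueError(f"found non-finite value: {value!r}")


# jagged input with only finite values passes silently
raise_if_not_finite([[1.0, 2.5], [], [3.0]])
```

Passing a structure that contains float("nan") or float("inf") raises a ValueError instead.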

classmethod raise_if_overlapping(ak_arrays)[source]#

Checks whether fields of ak_arrays overlap.

Parameters:

ak_arrays (Sequence[Array]) – Arrays with fields to check.

Raises:

ValueError – If at least one overlap is found.

Return type:

None
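
The overlap check can be sketched on plain collections of field names. This is a simplified stand-in: the documented method inspects the fields of the given arrays, while this hypothetical version receives the field names directly.

```python
from collections import Counter


def raise_if_overlapping(field_sets):
    """Raise a ValueError if a field name occurs in more than one input."""
    # count in how many inputs each field name appears
    counts = Counter(name for fields in field_sets for name in set(fields))
    overlap = sorted(name for name, n in counts.items() if n > 1)
    if overlap:
        raise ValueError(f"fields found in more than one input: {overlap}")


# disjoint field names pass silently
raise_if_overlapping([{"Jet.pt", "Jet.eta"}, {"Muon.pt"}])
```

If two inputs both provide, for example, "Jet.pt", the function raises a ValueError that names the overlapping field.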

exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
iter_chunked_io(*args, **kwargs)[source]#