mixins
Lightweight mixin classes for tasks.
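Classes:
| CalibratorMixin | Mixin to include a single Calibrator into tasks. |
| CalibratorsMixin | Mixin to include multiple Calibrator instances into tasks. |
| SelectorMixin | Mixin to include a single Selector instance into tasks. |
| SelectorStepsMixin | Mixin to include multiple selector steps into tasks. |
| ProducerMixin | Mixin to include a single Producer into tasks. |
| ProducersMixin | Mixin to include multiple Producer instances into tasks. |
| MLModelMixinBase | Base mixin to include a machine learning application into tasks. |
| MLModelTrainingMixin | A mixin class for training machine learning models. |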
- class CalibratorMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include a single Calibrator into tasks.
Inheriting from this mixin allows a task to instantiate and access a Calibrator instance with name calibrator, which is an input parameter for this task.
Attributes:
calibrator_inst – Access current Calibrator instance.
Methods:
get_calibrator_inst(calibrator[, kwargs]) – Initialize Calibrator instance.
resolve_param_values(params) – Resolve parameter values params relevant for the CalibratorMixin and all classes it inherits from.
get_known_shifts(config_inst, params) – Adds the set of shifts that the current calibrator_inst registers to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs) – Returns the required parameters for the task.
store_parts() – Create the parts of the output path used to store intermediary results for the current Task.
find_keep_columns(collection) – Finds the columns to keep based on the collection.
- calibrator = <luigi.parameter.Parameter object>#
- register_calibrator_shifts = False#
- classmethod get_calibrator_inst(calibrator, kwargs=None)[source]#
Initialize Calibrator instance.
Extracts relevant kwargs for this calibrator instance using the get_calibrator_kwargs() method. After this process, the Calibrator with the name calibrator is initialized using the get_cls() method with the relevant keyword arguments.
- Parameters:
calibrator (str) – Name of the calibrator instance
kwargs (default: None) – Any set keyword argument that is potentially relevant for this Calibrator instance
- Raises:
RuntimeError – if requested Calibrator instance is not exposed
- Returns:
The initialized Calibrator instance.
- classmethod resolve_param_values(params)[source]#
Resolve parameter values params relevant for the CalibratorMixin and all classes it inherits from.
Loads the config_inst and the parameter "calibrator". If the parameter is not found, it defaults to "default_calibrator". Finally, this function adds the keyword "calibrator_inst", which contains the Calibrator instance obtained using the get_calibrator_inst() method.
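As a rough illustration of the fallback described above, the lookup can be sketched with plain dicts. The function resolve_calibrator_name and the params and config_aux dicts are hypothetical stand-ins and not part of the documented API; the actual method works on the config_inst object.

```python
from typing import Optional


def resolve_calibrator_name(params: dict, config_aux: dict) -> Optional[str]:
    """Return the requested calibrator name, falling back to the config default."""
    name = params.get("calibrator")
    if name is None:
        # assumption: the config-level default is stored under "default_calibrator"
        name = config_aux.get("default_calibrator")
    return name


# hypothetical config content, for illustration only
config_aux = {"default_calibrator": "example_calibrator"}

# no explicit value: the config default is used
print(resolve_calibrator_name({}, config_aux))
# an explicit value takes precedence over the default
print(resolve_calibrator_name({"calibrator": "custom"}, config_aux))
```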
- classmethod get_known_shifts(config_inst, params)[source]#
Adds the set of shifts that the current calibrator_inst registers to the set of known shifts and upstream_shifts.
First, the sets of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that CalibratorMixin inherits from. Afterwards, check if the current calibrator_inst registers shifts. If register_calibrator_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the calibrator_inst to upstream_shifts.
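The branching on the register flag can be sketched as follows. This is a minimal illustration with plain Python sets; the function add_registered_shifts and the shift names are hypothetical and only mirror the rule stated above.

```python
def add_registered_shifts(shifts: set, upstream_shifts: set, inst_shifts: set, register: bool):
    """Route the shifts of an instance to shifts or upstream_shifts, depending on register."""
    if register:
        # the task itself registers the shifts
        shifts = shifts | inst_shifts
    else:
        # the shifts are only known to upstream tasks
        upstream_shifts = upstream_shifts | inst_shifts
    return shifts, upstream_shifts


# hypothetical shift names
shifts, upstream_shifts = add_registered_shifts(
    {"nominal"}, set(), {"jec_up", "jec_down"}, register=False,
)
print(sorted(shifts), sorted(upstream_shifts))
```

With register=True, the same call would place the two shifts into the first set instead.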
- classmethod req_params(inst, **kwargs)[source]#
Returns the required parameters for the task. It prefers the --calibrator parameter set on task-level via the command line.
- property calibrator_inst: Calibrator#
Access current Calibrator instance.
This method loads the current Calibrator calibrator_inst from the cache or initializes it. If the calibrator requests a specific sandbox, set this sandbox as the environment for the current Task.
- Returns:
Current Calibrator instance
- store_parts()[source]#
Create the parts of the output path used to store intermediary results for the current Task.
This method calls store_parts() of the super class and inserts {"calibrator": "calib__{self.calibrator}"} before the keyword version. For more information, see e.g. store_parts().
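The effect of inserting a part before the version keyword can be sketched with a plain dict standing in for law.util.InsertableDict. The helper insert_before, the task family, and the dataset name are hypothetical and serve illustration only.

```python
def insert_before(parts: dict, before_key: str, key: str, value: str) -> dict:
    """Return a copy of parts with key: value placed directly before before_key."""
    result = {}
    inserted = False
    for k, v in parts.items():
        if k == before_key:
            result[key] = value
            inserted = True
        result[k] = v
    if not inserted:
        # fall back to appending when the anchor key is missing
        result[key] = value
    return result


# hypothetical parts as they might be returned by the super class
parts = {"task_family": "ExampleTask", "dataset": "example_dataset", "version": "v1"}
calibrator = "example_calibrator"

parts = insert_before(parts, "version", "calibrator", f"calib__{calibrator}")
print("/".join(parts.values()))
```

Because dicts preserve insertion order, the calibrator part ends up between the dataset and the version in the resulting path.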
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
If the collection is ALL_FROM_CALIBRATOR, it includes the columns produced by the calibrator.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Returns:
Set of columns to keep.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class CalibratorsMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include multiple Calibrator instances into tasks.
Inheriting from this mixin will allow a task to instantiate and access a set of Calibrator instances with names calibrators, which is a comma-separated list of calibrator names and is an input parameter for this task.
Attributes:
calibrator_insts – Access current list of Calibrator instances.
Methods:
get_calibrator_insts(calibrators[, kwargs]) – Get all requested calibrators.
resolve_param_values(params) – Resolve values params and check against possible default values and calibrator groups.
get_known_shifts(config_inst, params) – Adds the set of all shifts that the list of calibrator_insts register to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs) – Returns the required parameters for the task.
store_parts() – Create the parts of the output path used to store intermediary results for the current Task.
find_keep_columns(collection) – Finds the columns to keep based on the collection.
- calibrators = <law.parameter.CSVParameter object>#
- register_calibrators_shifts = False#
- classmethod get_calibrator_insts(calibrators, kwargs=None)[source]#
Get all requested calibrators.
Calibrator instances are either initialized or loaded from cache.
- Parameters:
kwargs (default: None) – Additional keyword arguments to forward to individual Calibrator instances
- Raises:
RuntimeError – if requested calibrators are not exposed
- Returns:
List of Calibrator instances.
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and calibrator groups.
Check the values in params against the default value "default_calibrator" and possible group definitions "calibrator_groups" in the current config inst. For more information, see resolve_config_default_and_groups().
- Parameters:
params (InsertableDict[str, Any]) – Parameter values to resolve
- Returns:
Dictionary of parameters that contains the list of requested Calibrator instances under the keyword "calibrator_insts". See get_calibrator_insts() for more information.
- classmethod get_known_shifts(config_inst, params)[source]#
Adds the set of all shifts that the list of calibrator_insts register to the set of known shifts and upstream_shifts.
First, the sets of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that CalibratorsMixin inherits from. Afterwards, loop through the list of Calibrator instances and check if they register shifts. If register_calibrators_shifts is True, add them to the current set of shifts. Otherwise, add the shifts to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Returns the required parameters for the task.
It prefers --calibrators set on task-level via command line.
- property calibrator_insts: list[columnflow.calibration.Calibrator]#
Access current list of Calibrator instances.
Loads the current Calibrator calibrator_insts from the cache or initializes them.
- Returns:
Current list of Calibrator instances
- store_parts()[source]#
Create the parts of the output path used to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"calibrator": "calib__{HASH}"} before the keyword version. Here, HASH is the joint string of the first five calibrator names plus a hash created with law.util.create_hash() based on the remaining calibrators, starting at index 5 (i.e. self.calibrators[5:]). For more information, see e.g. store_parts().
- Returns:
Updated parts to create output path to store intermediary results.
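A minimal sketch of how such a HASH string could be assembled is shown below. It uses hashlib as a stand-in for law.util.create_hash(), and both the "__" separator and the ten-character digest length are assumptions made for this example; the function joined_names_with_hash is hypothetical.

```python
import hashlib


def joined_names_with_hash(names, n_plain=5, hash_len=10):
    """Join the first n_plain names verbatim and append a short hash of the rest."""
    segments = list(names[:n_plain])
    rest = list(names[n_plain:])
    if rest:
        # stand-in for law.util.create_hash(); the real hash will differ
        digest = hashlib.sha256("|".join(rest).encode("utf-8")).hexdigest()[:hash_len]
        segments.append(digest)
    return "__".join(segments)


# hypothetical calibrator names
few = ["a", "b"]
many = ["a", "b", "c", "d", "e", "f", "g"]

print(f"calib__{joined_names_with_hash(few)}")
print(f"calib__{joined_names_with_hash(many)}")
```

Keeping the first names readable while hashing the tail bounds the length of the path component without losing uniqueness for long lists.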
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
If the collection is ALL_FROM_CALIBRATORS, it includes the columns produced by the calibrators.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Returns:
Set of columns to keep.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class SelectorMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include a single Selector instance into tasks.
Inheriting from this mixin will allow a task to instantiate and access a Selector instance with name selector, which is an input parameter for this task.
Attributes:
selector_inst – Access current Selector instance.
Methods:
get_selector_inst(selector[, kwargs]) – Get requested selector.
resolve_param_values(params) – Resolve values params and check against possible default values and selector groups.
get_known_shifts(config_inst, params) – Adds the set of shifts that the current selector_inst registers to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs) – Get the required parameters for the task, preferring the --selector set on task-level via CLI.
store_parts() – Create the parts of the output path used to store intermediary results for the current Task.
find_keep_columns(collection) – Returns a set of Route objects describing columns that should be kept given a type of column collection.
- selector = <luigi.parameter.Parameter object>#
- register_selector_shifts = False#
- classmethod get_selector_inst(selector, kwargs=None)[source]#
Get requested selector.
The Selector instance is either initialized or loaded from cache.
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and selector groups.
Check the values in params against the default value "default_selector" in the current config inst. For more information, see resolve_config_default().
- classmethod get_known_shifts(config_inst, params)[source]#
Adds the set of shifts that the current selector_inst registers to the set of known shifts and upstream_shifts.
First, the sets of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that SelectorMixin inherits from. Afterwards, check if the current selector_inst registers shifts. If register_selector_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the selector_inst to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --selector set on task-level via CLI.
This method first checks if the --selector parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params method of the superclass with the updated kwargs.
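The update of the "_prefer_cli" key can be sketched on a plain kwargs dictionary. The helper with_preferred_cli_param is hypothetical and only models the bookkeeping step; the check against the command line and the call to the superclass are left out.

```python
def with_preferred_cli_param(kwargs: dict, param_name: str) -> dict:
    """Return a copy of kwargs whose "_prefer_cli" entry also contains param_name."""
    updated = dict(kwargs)
    # normalize a missing or None entry to an empty set before extending it
    prefer_cli = set(updated.get("_prefer_cli") or ())
    prefer_cli.add(param_name)
    updated["_prefer_cli"] = prefer_cli
    return updated


# hypothetical kwargs as they might be passed to req_params
kwargs = {"_prefer_cli": {"version"}, "branch": 0}
updated = with_preferred_cli_param(kwargs, "selector")
print(sorted(updated["_prefer_cli"]))
```

Copying the dictionary keeps the caller's kwargs unchanged, which makes the step easy to reason about in isolation.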
- property selector_inst#
Access current Selector instance.
Loads the current Selector selector_inst from the cache or initializes it. If the selector requests a specific sandbox, set this sandbox as the environment for the current Task.
- Returns:
Current Selector instance
- store_parts()[source]#
Create the parts of the output path used to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"selector": "sel__{SELECTOR_NAME}"} before the keyword version. Here, SELECTOR_NAME is the name of the current selector_inst.
- Returns:
Updated parts to create output path to store intermediary results.
- find_keep_columns(collection)[source]#
Returns a set of Route objects describing columns that should be kept given a type of column collection.
- Parameters:
collection (ColumnCollection) – The collection to return.
- Returns:
A set of Route objects.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class SelectorStepsMixin(*args, **kwargs)[source]#
Bases: SelectorMixin
Mixin to include multiple selector steps into tasks.
Inheriting from this mixin will allow a task to access selector steps, which can be a comma-separated list of selector step names and is an input parameter for this task.
Methods:
resolve_param_values(params) – Resolve values params and check against possible default values and selector step groups.
req_params(inst, **kwargs) – Get the required parameters for the task, preferring the --selector-steps set on task-level via CLI.
store_parts() – Create the parts of the output path used to store intermediary results for the current Task.
- selector_steps = <law.parameter.CSVParameter object>#
- exclude_params_repr_empty = {'selector_steps'}#
- selector_steps_order_sensitive = False#
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and selector step groups.
Check the values in params against the default value "default_selector_steps" and the group "selector_step_groups" in the current config inst. For more information, see resolve_config_default(). If SelectorStepsMixin.selector_steps_order_sensitive is False, i.e. the order of the steps does not matter, sort the selector steps.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --selector-steps set on task-level via CLI.
This method first checks if the --selector-steps parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params method of the superclass with the updated kwargs.
- store_parts()[source]#
Create the parts of the output path used to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"selector": "__steps__LIST_OF_STEPS"}, where LIST_OF_STEPS is the sorted list of selector steps. For more information, see e.g. store_parts().
- Return type:
InsertableDict
- Returns:
Updated parts to create output path to store intermediary results.
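One possible reading of this step is sketched below. It assumes that the "__steps__" suffix is appended to the selector part produced by the super class and that the sorted step names are joined with "_"; both details, as well as the function selector_part_with_steps, are assumptions of this example and are not confirmed by the text above.

```python
def selector_part_with_steps(selector_part: str, steps) -> str:
    """Append the sorted selector steps to an existing selector part."""
    sorted_steps = sorted(steps)
    if not sorted_steps:
        # nothing to append when no steps were requested
        return selector_part
    return f"{selector_part}__steps__{'_'.join(sorted_steps)}"


# hypothetical selector name and step names
print(selector_part_with_steps("sel__example", ["muon", "jet"]))
print(selector_part_with_steps("sel__example", []))
```

Sorting before joining means that two tasks requesting the same steps in a different order write to the same location.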
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class ProducerMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include a single Producer into tasks.
Inheriting from this mixin allows a task to instantiate and access a Producer instance with name producer, which is an input parameter for this task.
Attributes:
producer_inst – Access current Producer instance.
Methods:
get_producer_inst(producer[, kwargs]) – Initialize Producer instance.
resolve_param_values(params) – Resolve parameter values params relevant for the ProducerMixin and all classes it inherits from.
get_known_shifts(config_inst, params) – Adds the set of shifts that the current producer_inst registers to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs) – Get the required parameters for the task, preferring the --producer set on task-level via CLI.
store_parts() – Create the parts of the output path used to store intermediary results for the current Task.
find_keep_columns(collection) – Finds the columns to keep based on the collection.
- producer = <luigi.parameter.Parameter object>#
- register_producer_shifts = False#
- classmethod get_producer_inst(producer, kwargs=None)[source]#
Initialize Producer instance.
Extracts relevant kwargs for this producer instance using the get_producer_kwargs() method. After this process, the Producer with the name producer is initialized using the get_cls() method with the relevant keyword arguments.
- classmethod resolve_param_values(params)[source]#
Resolve parameter values params relevant for the ProducerMixin and all classes it inherits from.
Loads the config_inst and the parameter "producer". If the parameter is not found, it defaults to "default_producer". Finally, this function adds the keyword "producer_inst", which contains the Producer instance obtained using the get_producer_inst() method.
- classmethod get_known_shifts(config_inst, params)[source]#
Adds the set of shifts that the current producer_inst registers to the set of known shifts and upstream_shifts.
First, the sets of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that ProducerMixin inherits from. Afterwards, check if the current producer_inst registers shifts. If register_producer_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the producer_inst to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --producer set on task-level via CLI.
This method first checks if the --producer parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params method of the superclass with the updated kwargs.
- property producer_inst: Producer#
Access current Producer instance.
Loads the current Producer producer_inst from the cache or initializes it. If the producer requests a specific sandbox, set this sandbox as the environment for the current Task.
- Returns:
Current Producer instance
- store_parts()[source]#
Create the parts of the output path used to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"producer": "prod__{self.producer}"} before the keyword version. For more information, see e.g. store_parts().
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
This method first calls the find_keep_columns method of the superclass with the given collection. If the collection is equal to ALL_FROM_PRODUCER, it adds the columns produced by the producer instance to the set of columns.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Returns:
A set of columns to keep.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class ProducersMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include multiple Producer instances into tasks.
Inheriting from this mixin will allow a task to instantiate and access a set of Producer instances with names producers, which is a comma-separated list of producer names and is an input parameter for this task.
Attributes:
producer_insts – Access current list of Producer instances.
Methods:
get_producer_insts(producers[, kwargs]) – Get all requested producers.
resolve_param_values(params) – Resolve values params and check against possible default values and producer groups.
get_known_shifts(config_inst, params) – Adds the set of all shifts that the list of producer_insts register to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs) – Get the required parameters for the task, preferring the --producers set on task-level via CLI.
store_parts() – Create the parts of the output path used to store intermediary results for the current Task.
find_keep_columns(collection) – Finds the columns to keep based on the collection.
- producers = <law.parameter.CSVParameter object>#
- register_producers_shifts = False#
- classmethod get_producer_insts(producers, kwargs=None)[source]#
Get all requested producers.
Producer instances are either initialized or loaded from cache.
- Raises:
RuntimeError – if requested producers are not exposed
- Returns:
List of Producer instances.
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and producer groups.
Check the values in params against the default value "default_producer" and possible group definitions "producer_groups" in the current config inst. For more information, see resolve_config_default_and_groups().
- classmethod get_known_shifts(config_inst, params)[source]#
Adds the set of all shifts that the list of producer_insts register to the set of known shifts and upstream_shifts.
First, the sets of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that ProducersMixin inherits from. Afterwards, loop through the list of Producer instances and check if they register shifts. If register_producers_shifts is True, add them to the current set of shifts. Otherwise, add the shifts to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --producers set on task-level via CLI.
This method first checks if the --producers parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params method of the superclass with the updated kwargs.
- property producer_insts: list[columnflow.production.Producer]#
Access current list of Producer instances.
Loads the current Producer producer_insts from the cache or initializes them.
- Returns:
Current list of Producer instances
- store_parts()[source]#
Create the parts of the output path used to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"producers": "prod__{HASH}"} before the keyword version. Here, HASH is the joint string of the first five producer names plus a hash created with law.util.create_hash() based on the remaining producers, starting at index 5 (i.e. self.producers[5:]). For more information, see e.g. store_parts().
- Returns:
Updated parts to create output path to store intermediary results.
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
This method first calls the find_keep_columns method of the superclass with the given collection. If the collection is equal to ALL_FROM_PRODUCERS, it adds the columns produced by all producer instances to the set of columns.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Returns:
A set of columns to keep.
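The accumulation of columns can be sketched with a small stand-in enum. The ExampleCollection class, the function keep_columns, and the column names are hypothetical; they only mirror the rule that producer columns are added when the collection matches.

```python
from enum import Enum, auto


class ExampleCollection(Enum):
    """Hypothetical stand-in for ColumnCollection."""
    ALL_FROM_PRODUCERS = auto()
    OTHER = auto()


def keep_columns(collection, base_columns, produced_per_producer):
    """Start from the superclass result and add producer columns if requested."""
    columns = set(base_columns)
    if collection is ExampleCollection.ALL_FROM_PRODUCERS:
        for produced in produced_per_producer:
            columns |= set(produced)
    return columns


# hypothetical column names
base = {"event"}
produced = [{"Jet.pt_corr"}, {"weight", "Jet.pt_corr"}]

print(sorted(keep_columns(ExampleCollection.ALL_FROM_PRODUCERS, base, produced)))
print(sorted(keep_columns(ExampleCollection.OTHER, base, produced)))
```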
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class MLModelMixinBase(*args, **kwargs)[source]#
Bases: AnalysisTask
Base mixin to include a machine learning application into tasks.
Inheriting from this mixin will allow a task to instantiate and access an MLModel instance with name ml_model, which is an input parameter for this task.
Methods:
req_params(inst, **kwargs) – Get the required parameters for the task, preferring the --ml-model set on task-level via CLI.
get_ml_model_inst(ml_model, analysis_inst[, ...]) – Get requested ml_model instance.
events_used_in_training(config_inst, ...) – Evaluate whether the events for the combination of dataset_inst and shift_inst shall be used in the training.
- ml_model = <luigi.parameter.Parameter object>#
- exclude_params_repr_empty = {'ml_model'}#
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --ml-model set on task-level via CLI.
This method first checks if the --ml-model parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the "_prefer_cli" key in the kwargs dictionary. The method then calls the req_params method of the superclass with the updated kwargs.
- classmethod get_ml_model_inst(ml_model, analysis_inst, requested_configs=None, **kwargs)[source]#
Get requested ml_model instance.
This method retrieves the requested ml_model instance. If requested_configs are provided, they are used for the training of the ML application.
- Parameters:
analysis_inst (od.Analysis) – Forward this analysis inst to the init function of new MLModel sub class.
requested_configs (list[str] | None, default: None) – Configs needed for the training of the ML application.
kwargs – Additional keyword arguments to forward to the MLModel instance.
- Returns:
MLModel instance.
- events_used_in_training(config_inst, dataset_inst, shift_inst)[source]#
Evaluate whether the events for the combination of dataset_inst and shift_inst shall be used in the training.
This method checks if the dataset_inst is in the set of datasets of the current ml_model_inst based on the given config_inst. Additionally, the function checks that the shift_inst does not have the tag “disjoint_from_nominal”.
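The two conditions can be sketched as a standalone predicate. The function used_in_training and its plain-Python arguments are hypothetical; the actual method reads the datasets from the ml_model_inst and the tags from the shift_inst.

```python
def used_in_training(dataset_name: str, training_datasets: set, shift_tags: set) -> bool:
    """Combine the dataset membership check with the shift tag check."""
    is_training_dataset = dataset_name in training_datasets
    is_disjoint = "disjoint_from_nominal" in shift_tags
    return is_training_dataset and not is_disjoint


# hypothetical dataset names and tags
training_datasets = {"signal", "background"}

print(used_in_training("signal", training_datasets, set()))
print(used_in_training("signal", training_datasets, {"disjoint_from_nominal"}))
print(used_in_training("data", training_datasets, set()))
```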
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class MLModelTrainingMixin(*args, **kwargs)[source]#
Bases: MLModelMixinBase
A mixin class for training machine learning models.
This class provides parameters for configuring the training of machine learning models.
Methods:
resolve_calibrators(ml_model_inst, params) – Resolve the calibrators for the given ML model instance.
resolve_selectors(ml_model_inst, params) – Resolve the selectors for the given ML model instance.
resolve_producers(ml_model_inst, params) – Resolve the producers for the given ML model instance.
resolve_param_values(params) – Resolve the parameter values for the given parameters.
store_parts() – Generate a dictionary of store parts for the current instance.
- configs = <law.parameter.CSVParameter object>#
- calibrators = <law.parameter.MultiCSVParameter object>#
- selectors = <law.parameter.CSVParameter object>#
- producers = <law.parameter.MultiCSVParameter object>#
- classmethod resolve_calibrators(ml_model_inst, params)[source]#
Resolve the calibrators for the given ML model instance.
This method retrieves the calibrators from the parameters params and broadcasts them to the configs if necessary. It also resolves calibrator_groups and default_calibrator from the config(s) associated with this ML model instance, and validates the number of sequences. Finally, it checks the retrieved calibrators against the training calibrators of the model using training_calibrators() and instantiates them if necessary.
- Returns:
A tuple of tuples containing the resolved calibrators.
- Raises:
Exception – If the number of calibrator sequences does not match the number of configs used by the ML model.
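The broadcasting and validation step can be sketched as follows. The function broadcast_to_configs is hypothetical, and it raises a ValueError for clarity, whereas the text above only states that an Exception is raised.

```python
def broadcast_to_configs(sequences, n_configs: int) -> tuple:
    """Repeat a single sequence for every config and validate the final count."""
    sequences = tuple(tuple(seq) for seq in sequences)
    if len(sequences) == 1 and n_configs > 1:
        # one sequence was given: use it for all configs
        sequences = sequences * n_configs
    if len(sequences) != n_configs:
        raise ValueError(
            f"got {len(sequences)} sequences, but the model uses {n_configs} configs",
        )
    return sequences


# hypothetical calibrator names
print(broadcast_to_configs([["calib_a", "calib_b"]], 2))
print(broadcast_to_configs([["calib_a"], ["calib_b"]], 2))
```

Passing three sequences for two configs would fail the validation and raise the error.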
- classmethod resolve_selectors(ml_model_inst, params)[source]#
Resolve the selectors for the given ML model instance.
This method retrieves the selectors from the parameters params and broadcasts them to the configs if necessary. It also resolves default_selector from the config(s) associated with this ML model instance, and validates the number of sequences. Finally, it checks the retrieved selectors against the training selectors of the model, using training_selector(), and instantiates them.
- Returns:
A tuple containing the resolved selectors.
- Raises:
Exception – If the number of selector sequences does not match the number of configs used by the ML model.
- classmethod resolve_producers(ml_model_inst, params)[source]#
Resolve the producers for the given ML model instance.
This method retrieves the producers from the parameters params and broadcasts them to the configs if necessary. It also resolves producer_groups and default_producer from the config(s) associated with this ML model instance, and validates the number of sequences. Finally, it checks the retrieved producers against the training producers of the model, using training_producers(), and instantiates them.
- Returns:
A tuple of tuples containing the resolved producers.
- Raises:
Exception – If the number of producer sequences does not match the number of configs used by the ML model.
- classmethod resolve_param_values(params)[source]#
Resolve the parameter values for the given parameters.
This method retrieves the parameters and resolves the ML model instance, configs, calibrators, selectors, and producers. It also calls the model’s setup hook.
- Parameters:
params (dict[str, Any]) – A dictionary of parameters that may contain the analysis instance and ML model.
- Returns:
A dictionary containing the resolved parameters.
- Raises:
Exception – If the ML model instance received configs to define training configs, but did not define any.
- store_parts()[source]#
Generate a dictionary of store parts for the current instance.
This method extends the base method to include additional parts related to machine learning model configurations, calibrators, selectors, producers (CSP), and the ML model instance itself. If the list of any of the CSPs is empty, the corresponding part is set to "none"; otherwise, the first two elements of the list are joined with "__". If the list contains more than two elements, the part is extended with the number of elements and a hash of the remaining elements, which is created with law.util.create_hash(). The parts are represented as strings and are used to create unique identifiers for the instance's output.
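The three cases can be sketched in a small helper. The function csp_part is hypothetical, hashlib stands in for law.util.create_hash(), and the exact layout of the extension (count followed by a ten-character digest) is an assumption of this example.

```python
import hashlib


def csp_part(names, n_plain=2, hash_len=10) -> str:
    """Build a store part for a list of calibrators, selectors, or producers."""
    names = list(names)
    if not names:
        return "none"
    part = "__".join(names[:n_plain])
    rest = names[n_plain:]
    if rest:
        # stand-in for law.util.create_hash(); the real hash will differ
        digest = hashlib.sha256("|".join(rest).encode("utf-8")).hexdigest()[:hash_len]
        part += f"__{len(names)}_{digest}"
    return part


# hypothetical producer names
print(csp_part([]))
print(csp_part(["prod_a", "prod_b"]))
print(csp_part(["prod_a", "prod_b", "prod_c"]))
```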
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {'ml_model'}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class MLModelMixin(*args, **kwargs)[source]#
Bases: ConfigTask, MLModelMixinBase
Methods:
resolve_param_values(params)
store_parts() – Returns a law.util.InsertableDict whose values are used to create a store path.
find_keep_columns(collection) – Returns a set of Route objects describing columns that should be kept given a type of column collection.
- ml_model = <luigi.parameter.Parameter object>#
- allow_empty_ml_model = True#
- exclude_params_repr_empty = {'ml_model'}#
- store_parts()[source]#
Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path "a/b/c". The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
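The mapping from parts to a path can be illustrated with a plain dict, which preserves insertion order like the InsertableDict does. The helper parts_to_path is hypothetical and only reproduces the example given above.

```python
def parts_to_path(parts: dict) -> str:
    """Join the values of the parts in insertion order to form a path."""
    return "/".join(str(value) for value in parts.values())


parts = {"keyA": "a", "keyB": "b", 2: "c"}
path = parts_to_path(parts)
print(path)

# a subclassing task can overwrite a value through its key
overwritten = dict(parts)
overwritten["keyB"] = "other"
print(parts_to_path(overwritten))
```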
- find_keep_columns(collection)[source]#
Returns a set of Route objects describing columns that should be kept given a type of column collection.
- Parameters:
collection (ColumnCollection) – The collection to return.
- Returns:
A set of Route objects.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class MLModelDataMixin(*args, **kwargs)[source]#
Bases: MLModelMixin
Methods:
store_parts() – Returns a law.util.InsertableDict whose values are used to create a store path.
- allow_empty_ml_model = False#
- store_parts()[source]#
Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path "a/b/c". The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {'ml_model'}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class MLModelsMixin(*args, **kwargs)[source]#
Bases:
ConfigTask
Attributes:
Methods:
resolve_param_values
(params)
req_params
(inst, **kwargs)Returns parameters that are jointly defined in this class and another task instance of some other class.
store_parts
()Returns a
law.util.InsertableDict
whose values are used to create a store path.
find_keep_columns
(collection)Returns a set of
Route
objects describing columns that should be kept given a type of column collection.
- ml_models = <law.parameter.CSVParameter object>#
- allow_empty_ml_models = True#
- exclude_params_repr_empty = {'ml_models'}#
- classmethod req_params(inst, **kwargs)[source]#
Returns parameters that are jointly defined in this class and another task instance of some other class. The parameters are used when calling
Task.req(self).
- Return type:
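The idea of "jointly defined" parameters can be pictured as an intersection of parameter names. The following is a simplified sketch under that assumption, not the actual `req_params` logic: `joint_params` and the example parameter names are hypothetical, and the real method additionally has to honor settings such as the `exclude_params_req*` attributes.

```python
def joint_params(cls_param_names: set, inst_params: dict) -> dict:
    """Return the values of ``inst_params`` whose names the requesting class also defines.

    Hypothetical stand-in for the parameter matching done when calling Task.req(self).
    """
    return {
        name: value
        for name, value in inst_params.items()
        if name in cls_param_names
    }


# parameters defined by the class that is being requested (illustrative names)
cls_param_names = {"version", "ml_models"}

# parameter values of the task instance that issues the request
inst_params = {"version": "v1", "ml_models": ("model_a",), "categories": ("incl",)}

# only parameters known to both sides are passed on
print(joint_params(cls_param_names, inst_params))
```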
- store_parts()[source]#
Returns a
law.util.InsertableDict
whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"}
lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
- find_keep_columns(collection)[source]#
Returns a set of
Route
objects describing columns that should be kept given a type of column collection.
- Parameters:
collection (ColumnCollection) – The type of column collection for which the columns to keep are returned.
- Return type:
- Returns:
A set of
Route
objects.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class InferenceModelMixin(*args, **kwargs)[source]#
Bases:
ConfigTask
Attributes:
Methods:
resolve_param_values
(params)
get_inference_model_inst
(inference_model, ...)
- rtype:
req_params
(inst, **kwargs)Returns parameters that are jointly defined in this class and another task instance of some other class.
store_parts
()Returns a
law.util.InsertableDict
whose values are used to create a store path.
- inference_model = <luigi.parameter.Parameter object>#
- classmethod req_params(inst, **kwargs)[source]#
Returns parameters that are jointly defined in this class and another task instance of some other class. The parameters are used when calling
Task.req(self).
- Return type:
- store_parts()[source]#
Returns a
law.util.InsertableDict
whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"}
lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class CategoriesMixin(*args, **kwargs)[source]#
Bases:
ConfigTask
Attributes:
Methods:
resolve_param_values
(params)
- categories = <law.parameter.CSVParameter object>#
- default_categories = None#
- allow_empty_categories = False#
- property categories_repr#
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class VariablesMixin(*args, **kwargs)[source]#
Bases:
ConfigTask
Attributes:
Methods:
resolve_param_values
(params)
split_multi_variable
(variable)Splits a multi-dimensional variable given in the format
"var_a[-var_b[-...]]"
into separate variable names using a delimiter ("-") and returns a tuple.
join_multi_variable
(variables)Joins the names of multiple variables using a delimiter ("-") into a single string that represents a multi-dimensional variable and returns it.
- variables = <law.parameter.CSVParameter object>#
- default_variables = None#
- allow_empty_variables = False#
- classmethod split_multi_variable(variable)[source]#
Splits a multi-dimensional variable given in the format
"var_a[-var_b[-...]]"
into separate variable names using a delimiter ("-") and returns a tuple.
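The delimiter-based splitting described here, together with its inverse joining, can be sketched with plain string operations. The functions `split_variable` and `join_variables` and the variable names are hypothetical stand-ins for illustration and do not claim to reproduce the classmethods' actual code.

```python
from typing import Iterable, Tuple

# delimiter separating the one-dimensional variable names, as stated above
DELIMITER = "-"


def split_variable(variable: str) -> Tuple[str, ...]:
    """Split "var_a[-var_b[-...]]" into a tuple of variable names."""
    return tuple(variable.split(DELIMITER))


def join_variables(variables: Iterable[str]) -> str:
    """Join variable names into a single multi-dimensional variable string."""
    return DELIMITER.join(variables)


# a two-dimensional variable is split into its two components
names = split_variable("jet1_pt-jet1_eta")
print(names)

# joining the components restores the original string
print(join_variables(names))

# a one-dimensional variable yields a tuple with a single entry
print(split_variable("jet1_pt"))
```

One consequence of this scheme is that individual variable names must not contain the delimiter themselves, otherwise the round trip is no longer unique.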
- classmethod join_multi_variable(variables)[source]#
Joins the names of multiple variables using a delimiter ("-") into a single string that represents a multi-dimensional variable and returns it.
- Return type:
- property variables_repr#
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class DatasetsProcessesMixin(*args, **kwargs)[source]#
Bases:
ConfigTask
Attributes:
Methods:
resolve_param_values
(params)
get_known_shifts
(config_inst, params)Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts implemented by upstream tasks.
- datasets = <law.parameter.CSVParameter object>#
- processes = <law.parameter.CSVParameter object>#
- allow_empty_datasets = False#
- allow_empty_processes = False#
- classmethod get_known_shifts(config_inst, params)[source]#
Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts implemented by upstream tasks.
- property datasets_repr#
- property processes_repr#
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class ShiftSourcesMixin(*args, **kwargs)[source]#
Bases:
ConfigTask
Attributes:
Methods:
resolve_param_values
(params)
expand_shift_sources
(sources)
- rtype:
list[str]
reduce_shifts
(shifts)
- rtype:
list[str]
- shift_sources = <law.parameter.CSVParameter object>#
- allow_empty_shift_sources = False#
- property shift_sources_repr#
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class WeightProducerMixin(*args, **kwargs)[source]#
Bases:
ConfigTask
Attributes:
Methods:
get_weight_producer_inst
(weight_producer[, ...])
- rtype:
WeightProducer
resolve_param_values
(params)
get_known_shifts
(config_inst, params)Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts implemented by upstream tasks.
store_parts
()Returns a
law.util.InsertableDict
whose values are used to create a store path.
- weight_producer = <luigi.parameter.Parameter object>#
- register_weight_producer_shifts = False#
- classmethod get_weight_producer_inst(weight_producer, kwargs=None)[source]#
- Return type:
WeightProducer
- classmethod get_known_shifts(config_inst, params)[source]#
Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts implemented by upstream tasks.
- property weight_producer_inst: WeightProducer#
- store_parts()[source]#
Returns a
law.util.InsertableDict
whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"}
lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- class ChunkedIOMixin(*args, **kwargs)[source]#
Bases:
AnalysisTask
Attributes:
Methods:
raise_if_not_finite
(ak_array)Checks whether all values in array ak_array are finite.
raise_if_overlapping
(ak_arrays)Checks whether fields of ak_arrays overlap.
iter_chunked_io
(*args, **kwargs)
- check_finite_output = <luigi.parameter.BoolParameter object>#
- check_overlapping_inputs = <luigi.parameter.BoolParameter object>#
- exclude_params_req = {'check_finite_output', 'check_overlapping_inputs'}#
- classmethod raise_if_not_finite(ak_array)[source]#
Checks whether all values in array ak_array are finite.
The check is performed using the
numpy.isfinite()
function.
- Parameters:
ak_array (Array) – Array with events to check.
- Raises:
ValueError – If any value in ak_array is not finite.
- Return type:
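To make the behavior of such a finiteness check concrete, here is a minimal pure-Python sketch. It deliberately swaps in `math.isfinite` on nested lists as a stand-in for `numpy.isfinite()` on an awkward array, so it only mirrors the described contract (raise `ValueError` on any non-finite value). The names `check_all_finite` and `_flatten` are hypothetical.

```python
import math


def _flatten(values):
    """Yield all scalar entries of arbitrarily nested lists or tuples."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from _flatten(value)
        else:
            yield value


def check_all_finite(values) -> None:
    """Raise a ValueError if any entry is NaN or infinite.

    Stand-in sketch: nested lists replace the awkward array and
    math.isfinite replaces numpy.isfinite.
    """
    if not all(math.isfinite(value) for value in _flatten(values)):
        raise ValueError("found one or more non-finite values")


# jagged, event-like structure with finite values only: passes silently
check_all_finite([[1.0, 2.5], [], [3.0]])

# a NaN anywhere in the structure triggers the error
try:
    check_all_finite([[1.0, float("nan")], [3.0]])
except ValueError as exc:
    print(f"check failed: {exc}")
```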
- classmethod raise_if_overlapping(ak_arrays)[source]#
Checks whether fields of ak_arrays overlap.
- Parameters:
- Raises:
ValueError – If at least one overlap is found.
- Return type:
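An overlap check of this kind can be sketched by comparing the field names of every pair of inputs. The example below is an assumption-laden simplification: plain dictionaries keyed by top-level field name stand in for awkward arrays, nested fields are ignored, and `check_no_overlap` is a hypothetical name.

```python
from itertools import combinations


def check_no_overlap(arrays) -> None:
    """Raise a ValueError if any two inputs share a field name.

    Stand-in sketch: each input is a dict whose keys play the role
    of the top-level fields of an awkward array.
    """
    for (i, first), (j, second) in combinations(enumerate(arrays), 2):
        overlap = set(first) & set(second)
        if overlap:
            raise ValueError(
                f"inputs {i} and {j} have overlapping fields: {sorted(overlap)}"
            )


# disjoint field names: passes silently
check_no_overlap([{"Jet": [1, 2]}, {"Muon": [3]}, {"weight": [0.5]}])

# the field "Jet" appears in two inputs and triggers the error
try:
    check_no_overlap([{"Jet": [1, 2]}, {"Jet": [4], "Muon": [3]}])
except ValueError as exc:
    print(f"check failed: {exc}")
```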
- exclude_index = False#
- exclude_params_index = {}#
- exclude_params_repr = {}#
- exclude_params_repr_empty = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#