mixins#
Lightweight mixin classes for tasks.
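Classes:
| CalibratorMixin | Mixin to include a single Calibrator into tasks. |
| CalibratorsMixin | Mixin to include multiple Calibrator instances into tasks. |
| SelectorMixin | Mixin to include a single Selector instance into tasks. |
| SelectorStepsMixin | Mixin to include multiple selector steps into tasks. |
| ProducerMixin | Mixin to include a single Producer into tasks. |
| ProducersMixin | Mixin to include multiple Producer instances into tasks. |
| MLModelMixinBase | Base mixin to include a machine learning application into tasks. |
| MLModelTrainingMixin | A mixin class for training machine learning models. |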
- class CalibratorMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include a single Calibrator into tasks.
Inheriting from this mixin lets a task instantiate and access a Calibrator instance with name calibrator, which is an input parameter for this task.
Attributes:
calibrator_inst: Access current Calibrator instance.
calibrator_repr: Return a string representation of the calibrator.
Methods:
get_calibrator_inst(calibrator[, kwargs]): Initialize Calibrator instance.
resolve_param_values(params): Resolve parameter values params relevant for the CalibratorMixin and all classes it inherits from.
get_known_shifts(config_inst, params): Adds set of shifts that the current calibrator_inst registers to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs): Returns the required parameters for the task.
store_parts(): Create parts to create the output path to store intermediary results for the current Task.
find_keep_columns(collection): Finds the columns to keep based on the collection.
get_config_lookup_keys(inst_or_params): Returns a dictionary with keys that can be used to lookup state specific values in a config or dictionary, such as default task versions or output locations.
- calibrator = <luigi.parameter.Parameter object>#
- register_calibrator_sandbox = False#
- register_calibrator_shifts = False#
- classmethod get_calibrator_inst(calibrator, kwargs=None)[source]#
Initialize Calibrator instance.
Extracts the kwargs relevant for this calibrator instance using the get_calibrator_kwargs() method. The Calibrator class registered under the name calibrator is then obtained with the get_cls() method and instantiated with the relevant keyword arguments.
- Parameters:
calibrator (str) – Name of the calibrator instance
kwargs (default: None) – Any set keyword argument that is potentially relevant for this Calibrator instance
- Raises:
RuntimeError – if requested Calibrator instance is not exposed
- Return type:
Calibrator
- Returns:
The initialized Calibrator instance.
- classmethod resolve_param_values(params)[source]#
Resolve parameter values params relevant for the CalibratorMixin and all classes it inherits from.
Loads the config_inst and reads the parameter "calibrator". If the parameter is not found, it defaults to "default_calibrator". Finally, this function adds the keyword "calibrator_inst", which contains the Calibrator instance obtained using the get_calibrator_inst() method.
- classmethod get_known_shifts(config_inst, params)[source]#
Adds set of shifts that the current calibrator_inst registers to the set of known shifts and upstream_shifts.
First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that CalibratorMixin inherits from. Afterwards, check if the current calibrator_inst registers shifts. If register_calibrator_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the calibrator_inst to upstream_shifts.
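The bookkeeping described above can be sketched with plain Python sets. This is only an illustration: the helper name update_known_shifts and its arguments are hypothetical, and the real get_known_shifts(config_inst, params) works on the task's own shift containers, which are not shown here.

```python
def update_known_shifts(shifts, upstream_shifts, registered, register_shifts):
    """Return updated copies of (shifts, upstream_shifts).

    registered: shift names announced by the calibrator instance (assumed input).
    register_shifts: stands in for the register_calibrator_shifts class flag.
    """
    shifts = set(shifts)
    upstream_shifts = set(upstream_shifts)
    if register_shifts:
        # the task itself is sensitive to these shifts
        shifts |= set(registered)
    else:
        # the shifts only matter for tasks further upstream
        upstream_shifts |= set(registered)
    return shifts, upstream_shifts
```

With the default register_calibrator_shifts = False, shifts announced by the calibrator therefore end up in upstream_shifts only.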
- classmethod req_params(inst, **kwargs)[source]#
Returns the required parameters for the task. It prefers --calibrator set on task-level via command line.
- property calibrator_inst: Calibrator#
Access current Calibrator instance.
This method loads the current Calibrator calibrator_inst from the cache or initializes it. If the calibrator requests a specific sandbox, set this sandbox as the environment for the current Task.
- Returns:
Current Calibrator instance
- property calibrator_repr#
Return a string representation of the calibrator.
- store_parts()[source]#
Create parts to create the output path to store intermediary results for the current Task.
This method calls store_parts() of the super class and inserts {"calibrator": "calib__{self.calibrator}"} before keyword version. For more information, see e.g. store_parts().
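As a rough illustration of inserting an entry before the version keyword, the following sketch uses a plain dict in place of the ordered container returned by the real store_parts(). The helper insert_before, the example keys and the calibrator name are hypothetical.

```python
def insert_before(parts, before_key, key, value):
    """Return a new dict with (key, value) placed directly before before_key.

    Plain-dict stand-in for the ordered container used by the real task;
    if before_key is missing, the new entry is appended at the end.
    """
    result = {}
    inserted = False
    for k, v in parts.items():
        if k == before_key and not inserted:
            result[key] = value
            inserted = True
        result[k] = v
    if not inserted:
        result[key] = value
    return result


# hypothetical parts as they might be returned by the super class
parts = {"task_family": "MyTask", "version": "v1"}
calibrator = "example_calib"  # hypothetical calibrator name
parts = insert_before(parts, "version", "calibrator", f"calib__{calibrator}")
```

Because the order of the parts determines the directory layout of the output path, the calibrator part ends up one level above the version.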
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
If the collection is ALL_FROM_CALIBRATOR, it includes the columns produced by the calibrator.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Return type:
set[Route]
- Returns:
Set of columns to keep.
- classmethod get_config_lookup_keys(inst_or_params)[source]#
Returns a dictionary with keys that can be used to lookup state specific values in a config or dictionary, such as default task versions or output locations.
- Parameters:
inst_or_params (CalibratorMixin | dict[str, Any]) – The task instance or its parameters.
- Return type:
law.util.InsertableDict
- Returns:
A dictionary with keys that can be used for nested lookup.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class CalibratorsMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include multiple Calibrator instances into tasks.
Inheriting from this mixin will allow a task to instantiate and access a set of Calibrator instances with names calibrators, which is a comma-separated list of calibrator names and is an input parameter for this task.
Attributes:
calibrator_insts: Access current list of Calibrator instances.
calibrators_repr: Return a string representation of the calibrators.
Methods:
get_calibrator_insts(calibrators[, kwargs]): Get all requested calibrators.
resolve_param_values(params): Resolve values params and check against possible default values and calibrator groups.
get_known_shifts(config_inst, params): Adds set of all shifts that the list of calibrator_insts register to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs): Returns the required parameters for the task.
store_parts(): Create parts to create the output path to store intermediary results for the current Task.
find_keep_columns(collection): Finds the columns to keep based on the collection.
- calibrators = <law.parameter.CSVParameter object>#
- register_calibrators_shifts = False#
- classmethod get_calibrator_insts(calibrators, kwargs=None)[source]#
Get all requested calibrators.
Calibrator instances are either initialized or loaded from cache.
- Parameters:
kwargs (default: None) – Additional keyword arguments to forward to individual Calibrator instances
- Raises:
RuntimeError – if requested calibrators are not exposed
- Return type:
list[Calibrator]
- Returns:
List of Calibrator instances.
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and calibrator groups.
Check the values in params against the default value "default_calibrator" and possible group definitions "calibrator_groups" in the current config inst. For more information, see resolve_config_default_and_groups().
- Parameters:
params (InsertableDict[str, Any]) – Parameter values to resolve
- Returns:
Dictionary of parameters that contains the list of requested Calibrator instances under the keyword "calibrator_insts". See get_calibrator_insts() for more information.
- classmethod get_known_shifts(config_inst, params)[source]#
Adds set of all shifts that the list of calibrator_insts register to the set of known shifts and upstream_shifts.
First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that CalibratorsMixin inherits from. Afterwards, loop through the list of Calibrator instances and check if they register shifts. If register_calibrators_shifts is True, add them to the current set of shifts. Otherwise, add the shifts to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Returns the required parameters for the task.
It prefers --calibrators set on task-level via command line.
- property calibrator_insts: list[columnflow.calibration.Calibrator]#
Access current list of Calibrator instances.
Loads the current Calibrator calibrator_insts from the cache or initializes them.
- Returns:
Current list of Calibrator instances
- store_parts()[source]#
Create parts to create the output path to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"calibrator": "calib__{HASH}"} before keyword version. Here, HASH is the joint string of the first five calibrator names + a hash created with law.util.create_hash() based on the remaining calibrators, starting at index 5 (i.e. self.calibrators[5:]). For more information, see e.g. store_parts().
- Returns:
Updated parts to create output path to store intermediary results.
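A minimal sketch of how such a part could be assembled is given below. It makes several assumptions: the helper short_hash is a stand-in for law.util.create_hash() based on hashlib (the real algorithm and hash length may differ), the "__" separator between names is assumed, and the case of an empty list is not covered by the description above.

```python
import hashlib


def short_hash(obj, length=10):
    # stand-in for law.util.create_hash; algorithm and length are assumptions
    return hashlib.sha256(str(obj).encode("utf-8")).hexdigest()[:length]


def calibrators_part(calibrators, n_names=5, sep="__"):
    """Build the "calib__..." part from a sequence of calibrator names."""
    names = list(calibrators)
    # the first n_names names stay human-readable
    part = sep.join(names[:n_names])
    if len(names) > n_names:
        # everything from index n_names onwards is condensed into a hash
        part += sep + short_hash(names[n_names:])
    return f"calib{sep}{part}"
```

Keeping the first names readable while hashing the rest bounds the length of the path segment without making long calibrator lists collide.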
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
If the collection is ALL_FROM_CALIBRATORS, it includes the columns produced by the calibrators.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Return type:
set[Route]
- Returns:
Set of columns to keep.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class SelectorMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include a single Selector instance into tasks.
Inheriting from this mixin will allow a task to instantiate and access a Selector instance with name selector, which is an input parameter for this task.
Attributes:
selector_inst: Access current Selector instance.
selector_repr: Return a string representation of the selector.
Methods:
get_selector_inst(selector[, kwargs]): Get requested selector.
resolve_param_values(params): Resolve values params and check against possible default values and selector groups.
get_known_shifts(config_inst, params): Adds set of shifts that the current selector_inst registers to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs): Get the required parameters for the task, preferring the --selector set on task-level via CLI.
store_parts(): Create parts to create the output path to store intermediary results for the current Task.
find_keep_columns(collection): Returns a set of Route objects describing columns that should be kept given a type of column collection.
get_config_lookup_keys(inst_or_params): Returns a dictionary with keys that can be used to lookup state specific values in a config or dictionary, such as default task versions or output locations.
- selector = <luigi.parameter.Parameter object>#
- register_selector_sandbox = False#
- register_selector_shifts = False#
- classmethod get_selector_inst(selector, kwargs=None)[source]#
Get requested selector.
The Selector instance is either initialized or loaded from cache.
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and selector groups.
Check the values in params against the default value "default_selector" in the current config inst. For more information, see resolve_config_default().
- classmethod get_known_shifts(config_inst, params)[source]#
Adds set of shifts that the current selector_inst registers to the set of known shifts and upstream_shifts.
First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that SelectorMixin inherits from. Afterwards, check if the current selector_inst registers shifts. If register_selector_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the selector_inst to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --selector set on task-level via CLI.
This method first checks if the --selector parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the '_prefer_cli' key in the kwargs dictionary. The method then calls the 'req_params' method of the superclass with the updated kwargs.
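The override pattern described here can be sketched with two plain classes. BaseTask and SelectorLikeMixin are hypothetical stand-ins, and the check of whether a value was actually given on the command line is assumed to happen in the superclass, so the sketch only shows how the parameter name is added to '_prefer_cli' before delegating.

```python
class BaseTask:
    @classmethod
    def req_params(cls, inst, **kwargs):
        # stand-in for the superclass implementation: report what it received
        return {"_prefer_cli": set(kwargs.get("_prefer_cli", ()))}


class SelectorLikeMixin(BaseTask):
    @classmethod
    def req_params(cls, inst, **kwargs):
        # copy to avoid mutating a caller-owned collection, then add "selector"
        prefer_cli = set(kwargs.get("_prefer_cli") or ())
        prefer_cli.add("selector")
        kwargs["_prefer_cli"] = prefer_cli
        # delegate to the next class in the method resolution order
        return super().req_params(inst, **kwargs)
```

Because every mixin adds its own parameter name and then delegates, stacking several mixins accumulates all preferred parameters in a single '_prefer_cli' collection.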
- property selector_inst#
Access current Selector instance.
Loads the current Selector selector_inst from the cache or initializes it. If the selector requests a specific sandbox, set this sandbox as the environment for the current Task.
- Returns:
Current Selector instance
- property selector_repr#
Return a string representation of the selector.
- store_parts()[source]#
Create parts to create the output path to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"selector": "sel__{SELECTOR_NAME}"} before keyword version. Here, SELECTOR_NAME is the name of the current selector_inst.
- Returns:
Updated parts to create output path to store intermediary results.
- find_keep_columns(collection)[source]#
Returns a set of Route objects describing columns that should be kept given a type of column collection.
- Parameters:
collection (ColumnCollection) – The collection to return.
- Return type:
set[Route]
- Returns:
A set of Route objects.
- classmethod get_config_lookup_keys(inst_or_params)[source]#
Returns a dictionary with keys that can be used to lookup state specific values in a config or dictionary, such as default task versions or output locations.
- Parameters:
inst_or_params (SelectorMixin | dict[str, Any]) – The task instance or its parameters.
- Return type:
law.util.InsertableDict
- Returns:
A dictionary with keys that can be used for nested lookup.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class SelectorStepsMixin(*args, **kwargs)[source]#
Bases: SelectorMixin
Mixin to include multiple selector steps into tasks.
Inheriting from this mixin will allow a task to access selector steps, which can be a comma-separated list of selector step names and is an input parameter for this task.
Methods:
resolve_param_values(params): Resolve values params and check against possible default values and selector step groups.
req_params(inst, **kwargs): Get the required parameters for the task, preferring the --selector-steps set on task-level via CLI.
store_parts(): Create parts to create the output path to store intermediary results for the current Task.
- selector_steps = <law.parameter.CSVParameter object>#
- exclude_params_repr_empty = {'selector_steps'}#
- selector_steps_order_sensitive = False#
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and selector step groups.
Check the values in params against the default value "default_selector_steps" and the group "selector_step_groups" in the current config inst. For more information, see resolve_config_default(). If SelectorStepsMixin.selector_steps_order_sensitive is False (the default), the selector steps are sorted.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --selector-steps set on task-level via CLI.
This method first checks if the --selector-steps parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the '_prefer_cli' key in the kwargs dictionary. The method then calls the 'req_params' method of the superclass with the updated kwargs.
- store_parts()[source]#
Create parts to create the output path to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"selector": "__steps__LIST_OF_STEPS"}, where LIST_OF_STEPS is the sorted list of selector steps. For more information, see e.g. store_parts().
- Return type:
InsertableDict
- Returns:
Updated parts to create output path to store intermediary results.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class ProducerMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include a single Producer into tasks.
Inheriting from this mixin lets a task instantiate and access a Producer instance with name producer, which is an input parameter for this task.
Attributes:
producer_inst: Access current Producer instance.
producer_repr: Return a string representation of the producer.
Methods:
get_producer_inst(producer[, kwargs]): Initialize Producer instance.
resolve_param_values(params): Resolve parameter values params relevant for the ProducerMixin and all classes it inherits from.
get_known_shifts(config_inst, params): Adds set of shifts that the current producer_inst registers to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs): Get the required parameters for the task, preferring the --producer set on task-level via CLI.
store_parts(): Create parts to create the output path to store intermediary results for the current Task.
find_keep_columns(collection): Finds the columns to keep based on the collection.
get_config_lookup_keys(inst_or_params): Returns a dictionary with keys that can be used to lookup state specific values in a config or dictionary, such as default task versions or output locations.
- producer = <luigi.parameter.Parameter object>#
- register_producer_sandbox = False#
- register_producer_shifts = False#
- classmethod get_producer_inst(producer, kwargs=None)[source]#
Initialize Producer instance.
Extracts the kwargs relevant for this producer instance using the get_producer_kwargs() method. The Producer class registered under the name producer is then obtained with the get_cls() method and instantiated with the relevant keyword arguments.
- classmethod resolve_param_values(params)[source]#
Resolve parameter values params relevant for the ProducerMixin and all classes it inherits from.
Loads the config_inst and reads the parameter "producer". If the parameter is not found, it defaults to "default_producer". Finally, this function adds the keyword "producer_inst", which contains the Producer instance obtained using the get_producer_inst() method.
- classmethod get_known_shifts(config_inst, params)[source]#
Adds set of shifts that the current producer_inst registers to the set of known shifts and upstream_shifts.
First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that ProducerMixin inherits from. Afterwards, check if the current producer_inst registers shifts. If register_producer_shifts is True, add them to the current set of shifts. Otherwise, add the shifts obtained from the producer_inst to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --producer set on task-level via CLI.
This method first checks if the --producer parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the '_prefer_cli' key in the kwargs dictionary. The method then calls the 'req_params' method of the superclass with the updated kwargs.
- property producer_inst: Producer#
Access current Producer instance.
Loads the current Producer producer_inst from the cache or initializes it. If the producer requests a specific sandbox, set this sandbox as the environment for the current Task.
- Returns:
Current Producer instance
- store_parts()[source]#
Create parts to create the output path to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"producer": "prod__{self.producer}"} before keyword version. For more information, see e.g. store_parts().
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
This method first calls the 'find_keep_columns' method of the superclass with the given collection. If the collection is equal to ALL_FROM_PRODUCER, it adds the columns produced by the producer instance to the set of columns.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Return type:
set[Route]
- Returns:
A set of columns to keep.
- classmethod get_config_lookup_keys(inst_or_params)[source]#
Returns a dictionary with keys that can be used to lookup state specific values in a config or dictionary, such as default task versions or output locations.
- Parameters:
inst_or_params (ProducerMixin | dict[str, Any]) – The task instance or its parameters.
- Return type:
law.util.InsertableDict
- Returns:
A dictionary with keys that can be used for nested lookup.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class ProducersMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Mixin to include multiple Producer instances into tasks.
Inheriting from this mixin will allow a task to instantiate and access a set of Producer instances with names producers, which is a comma-separated list of producer names and is an input parameter for this task.
Attributes:
producer_insts: Access current list of Producer instances.
producers_repr: Return a string representation of the producers.
Methods:
get_producer_insts(producers[, kwargs]): Get all requested producers.
resolve_param_values(params): Resolve values params and check against possible default values and producer groups.
get_known_shifts(config_inst, params): Adds set of all shifts that the list of producer_insts register to the set of known shifts and upstream_shifts.
req_params(inst, **kwargs): Get the required parameters for the task, preferring the --producers set on task-level via CLI.
store_parts(): Create parts to create the output path to store intermediary results for the current Task.
find_keep_columns(collection): Finds the columns to keep based on the collection.
- producers = <law.parameter.CSVParameter object>#
- register_producers_shifts = False#
- classmethod get_producer_insts(producers, kwargs=None)[source]#
Get all requested producers.
Producer instances are either initialized or loaded from cache.
- Raises:
RuntimeError – if requested producers are not exposed
- Return type:
list[Producer]
- Returns:
List of Producer instances.
- classmethod resolve_param_values(params)[source]#
Resolve values params and check against possible default values and producer groups.
Check the values in params against the default value "default_producer" and possible group definitions "producer_groups" in the current config inst. For more information, see resolve_config_default_and_groups().
- classmethod get_known_shifts(config_inst, params)[source]#
Adds set of all shifts that the list of producer_insts register to the set of known shifts and upstream_shifts.
First, the set of shifts and upstream_shifts are obtained from the config_inst and the current set of parameters params using the get_known_shifts methods of all classes that ProducersMixin inherits from. Afterwards, loop through the list of Producer instances and check if they register shifts. If register_producers_shifts is True, add them to the current set of shifts. Otherwise, add the shifts to upstream_shifts.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --producers set on task-level via CLI.
This method first checks if the --producers parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the '_prefer_cli' key in the kwargs dictionary. The method then calls the 'req_params' method of the superclass with the updated kwargs.
- property producer_insts: list[columnflow.production.Producer]#
Access current list of Producer instances.
Loads the current Producer producer_insts from the cache or initializes them.
- Returns:
Current list of Producer instances
- store_parts()[source]#
Create parts to create the output path to store intermediary results for the current Task.
Calls store_parts() of the super class and inserts {"producers": "prod__{HASH}"} before keyword version. Here, HASH is the joint string of the first five producer names + a hash created with law.util.create_hash() based on the remaining producers, starting at index 5 (i.e. self.producers[5:]). For more information, see e.g. store_parts().
- Returns:
Updated parts to create output path to store intermediary results.
- find_keep_columns(collection)[source]#
Finds the columns to keep based on the collection.
This method first calls the 'find_keep_columns' method of the superclass with the given collection. If the collection is equal to ALL_FROM_PRODUCERS, it adds the columns produced by all producer instances to the set of columns.
- Parameters:
collection (ColumnCollection) – The collection of columns.
- Return type:
set[Route]
- Returns:
A set of columns to keep.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class MLModelMixinBase(*args, **kwargs)[source]#
Bases: AnalysisTask
Base mixin to include a machine learning application into tasks.
Inheriting from this mixin will allow a task to instantiate and access an MLModel instance with name ml_model, which is an input parameter for this task.
Attributes:
ml_model_repr: Returns a string representation of the ML model instance.
Methods:
req_params(inst, **kwargs): Get the required parameters for the task, preferring the --ml-model set on task-level via CLI.
get_ml_model_inst(ml_model, analysis_inst[, ...]): Get requested ml_model instance.
events_used_in_training(config_inst, ...): Evaluate whether the events for the combination of dataset_inst and shift_inst shall be used in the training.
- ml_model = <luigi.parameter.Parameter object>#
- ml_model_settings = <columnflow.tasks.framework.parameters.SettingsParameter object>#
- exclude_params_repr_empty = {'ml_model'}#
- property ml_model_repr#
Returns a string representation of the ML model instance.
- classmethod req_params(inst, **kwargs)[source]#
Get the required parameters for the task, preferring the --ml-model set on task-level via CLI.
This method first checks if the --ml-model parameter is set at the task-level via the command line. If it is, this parameter is preferred and added to the '_prefer_cli' key in the kwargs dictionary. The method then calls the 'req_params' method of the superclass with the updated kwargs.
- classmethod get_ml_model_inst(ml_model, analysis_inst, requested_configs=None, **kwargs)[source]#
Get requested ml_model instance.
This method retrieves the requested ml_model instance. If requested_configs are provided, they are used for the training of the ML application.
- Parameters:
analysis_inst (od.Analysis) – Forward this analysis inst to the init function of new MLModel sub class.
requested_configs (list[str] | None, default: None) – Configs needed for the training of the ML application.
kwargs – Additional keyword arguments to forward to the MLModel instance.
- Return type:
MLModel
- Returns:
MLModel instance.
- events_used_in_training(config_inst, dataset_inst, shift_inst)[source]#
Evaluate whether the events for the combination of dataset_inst and shift_inst shall be used in the training.
This method checks if the dataset_inst is in the set of datasets of the current ml_model_inst based on the given config_inst. Additionally, the function checks that the shift_inst does not have the tag "disjoint_from_nominal".
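The two conditions can be sketched as a single boolean expression. The Shift class and the training_datasets argument below are hypothetical stand-ins for the shift object and for the datasets that the ML model instance reports for a given config; the real method receives config_inst, dataset_inst and shift_inst.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Shift:
    # hypothetical stand-in for a shift object carrying tags
    name: str
    tags: frozenset = field(default_factory=frozenset)

    def has_tag(self, tag):
        return tag in self.tags


def events_used_in_training(training_datasets, dataset_name, shift):
    """training_datasets: names of datasets the model trains on for one config."""
    # both conditions must hold: known training dataset and no disjoint shift
    return dataset_name in training_datasets and not shift.has_tag("disjoint_from_nominal")
```

A shift tagged "disjoint_from_nominal" is excluded even when its dataset is part of the training.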
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class MLModelTrainingMixin(*args, **kwargs)[source]#
Bases: MLModelMixinBase
A mixin class for training machine learning models.
This class provides parameters for configuring the training of machine learning models.
Methods:
resolve_calibrators(ml_model_inst, params): Resolve the calibrators for the given ML model instance.
resolve_selectors(ml_model_inst, params): Resolve the selectors for the given ML model instance.
resolve_producers(ml_model_inst, params): Resolve the producers for the given ML model instance.
resolve_param_values(params): Resolve the parameter values for the given parameters.
store_parts(): Generate a dictionary of store parts for the current instance.
- configs = <law.parameter.CSVParameter object>#
- calibrators = <law.parameter.MultiCSVParameter object>#
- selectors = <law.parameter.CSVParameter object>#
- producers = <law.parameter.MultiCSVParameter object>#
- classmethod resolve_calibrators(ml_model_inst, params)[source]#
Resolve the calibrators for the given ML model instance.
This method retrieves the calibrators from the parameters params and broadcasts them to the configs if necessary. It also resolves calibrator_groups and default_calibrator from the config(s) associated with this ML model instance, and validates the number of sequences. Finally, it checks the retrieved calibrators against the training calibrators of the model using training_calibrators() and instantiates them if necessary.
- Returns:
A tuple of tuples containing the resolved calibrators.
- Raises:
Exception – If the number of calibrator sequences does not match the number of configs used by the ML model.
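The broadcasting and validation step can be pictured as follows. This is a sketch under the assumption that "broadcasting" means repeating a single sequence once per config; the helper broadcast_to_configs is hypothetical and the resolution of groups and defaults is left out.

```python
def broadcast_to_configs(sequences, n_configs):
    """Return one sequence per config.

    A single sequence is repeated for every config (assumed meaning of
    "broadcast"); any other mismatch in length raises an exception.
    """
    sequences = tuple(tuple(seq) for seq in sequences)
    if len(sequences) == 1 and n_configs > 1:
        sequences = sequences * n_configs
    if len(sequences) != n_configs:
        raise Exception(
            f"number of sequences ({len(sequences)}) does not match "
            f"number of configs ({n_configs})"
        )
    return sequences
```

The result is a tuple of tuples, one inner tuple per config, which matches the shape described in the return value above.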
- classmethod resolve_selectors(ml_model_inst, params)[source]#
Resolve the selectors for the given ML model instance.
This method retrieves the selectors from the parameters params and broadcasts them to the configs if necessary. It also resolves default_selector from the config(s) associated with this ML model instance, and validates the number of sequences. Finally, it checks the retrieved selectors against the training selectors of the model, using training_selector(), and instantiates them.
- Returns:
A tuple containing the resolved selectors.
- Raises:
Exception – If the number of selector sequences does not match the number of configs used by the ML model.
- classmethod resolve_producers(ml_model_inst, params)[source]#
Resolve the producers for the given ML model instance.
This method retrieves the producers from the parameters params and broadcasts them to the configs if necessary. It also resolves producer_groups and default_producer from the config(s) associated with this ML model instance, and validates the number of sequences. Finally, it checks the retrieved producers against the training producers of the model, using training_producers(), and instantiates them.
- Returns:
A tuple of tuples containing the resolved producers.
- Raises:
Exception – If the number of producer sequences does not match the number of configs used by the ML model.
- classmethod resolve_param_values(params)[source]#
Resolve the parameter values for the given parameters.
This method retrieves the parameters and resolves the ML model instance, configs, calibrators, selectors, and producers. It also calls the model’s setup hook.
- Parameters:
params (dict[str, Any]) – A dictionary of parameters that may contain the analysis instance and ML model.
- Return type:
- Returns:
A dictionary containing the resolved parameters.
- Raises:
Exception – If the ML model instance received configs to define training configs, but did not define any.
- store_parts()[source]#
Generate a dictionary of store parts for the current instance.
This method extends the base method with additional parts for the machine learning model configurations, the calibrators, selectors, and producers (CSPs), and the ML model instance itself. If one of the CSP lists is empty, the corresponding part is set to "none"; otherwise, its first two elements are joined with "__". If a CSP list contains more than two elements, the part is extended with the number of elements and a hash of the remaining elements, created with law.util.create_hash(). The parts are strings and are used to create unique identifiers for the instance’s output.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {'ml_model'}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class MLModelMixin(*args, **kwargs)[source]#
Bases: ConfigTask, MLModelMixinBase
Attributes:
Methods:
resolve_param_values(params)
store_parts()
Returns a law.util.InsertableDict whose values are used to create a store path.
find_keep_columns(collection)
Returns a set of Route objects describing columns that should be kept given a type of column collection.
- ml_model = <luigi.parameter.Parameter object>#
- allow_empty_ml_model = True#
- exclude_params_repr_empty = {'ml_model'}#
- store_parts()[source]#
Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
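The mapping from parts to a path can be illustrated with a plain dict standing in for law.util.InsertableDict. This is only a sketch of the idea: the helper parts_to_path is hypothetical, and the library's own path construction may differ in detail.

```python
def parts_to_path(parts):
    """Join the values of an ordered mapping into a path (illustrative helper)."""
    # keys only label the parts; values are joined in insertion order
    return "/".join(str(value) for value in parts.values())


parts = {"keyA": "a", "keyB": "b", 2: "c"}
path = parts_to_path(parts)

# a subclassing task could overwrite a value through its key
parts["keyB"] = "custom"
custom_path = parts_to_path(parts)
```

Because the keys stay stable while the values change, a subclass can replace a single part without knowing its position in the path.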
- find_keep_columns(collection)[source]#
Returns a set of Route objects describing columns that should be kept given a type of column collection.
- Parameters:
collection (ColumnCollection) – The collection to return.
- Return type:
- Returns:
A set of Route objects.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class MLModelDataMixin(*args, **kwargs)[source]#
Bases: MLModelMixin
Attributes:
Methods:
store_parts()
Returns a law.util.InsertableDict whose values are used to create a store path.
- allow_empty_ml_model = False#
- store_parts()[source]#
Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {'ml_model'}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class MLModelsMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
ml_models_repr
Returns a string representation of the ML models.
Methods:
resolve_param_values(params)
req_params(inst, **kwargs)
Returns parameters that are jointly defined in this class and another task instance of some other class.
store_parts()
Returns a law.util.InsertableDict whose values are used to create a store path.
find_keep_columns(collection)
Returns a set of Route objects describing columns that should be kept given a type of column collection.
- ml_models = <law.parameter.CSVParameter object>#
- allow_empty_ml_models = True#
- exclude_params_repr_empty = {'ml_models'}#
- property ml_models_repr#
Returns a string representation of the ML models.
- classmethod req_params(inst, **kwargs)[source]#
Returns parameters that are jointly defined in this class and another task instance of some other class. The parameters are used when calling Task.req(self).
- Return type:
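The notion of "jointly defined" parameters can be pictured as an intersection of parameter names. The sketch below uses plain sets and dicts; the helper joint_params and its exclude argument are hypothetical simplifications and do not reproduce the actual req_params() logic.

```python
def joint_params(cls_param_names, inst_params, exclude=()):
    """Keep values of an instance whose names are also defined on the requesting class."""
    return {
        name: value
        for name, value in inst_params.items()
        if name in cls_param_names and name not in exclude
    }


# parameter names known to the requesting class (illustrative)
cls_param_names = {"version", "ml_models", "user"}
# parameter values of the other task instance (illustrative)
inst_params = {"version": "v1", "ml_models": ("dnn",), "user": "alice", "shift": "nominal"}

shared = joint_params(cls_param_names, inst_params, exclude={"user"})
```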
- store_parts()[source]#
Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
- find_keep_columns(collection)[source]#
Returns a set of Route objects describing columns that should be kept given a type of column collection.
- Parameters:
collection (ColumnCollection) – The collection to return.
- Return type:
- Returns:
A set of Route objects.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class InferenceModelMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
Methods:
resolve_param_values(params)
get_inference_model_inst(inference_model, ...)
req_params(inst, **kwargs)
Returns parameters that are jointly defined in this class and another task instance of some other class.
store_parts()
Returns a law.util.InsertableDict whose values are used to create a store path.
- inference_model = <luigi.parameter.Parameter object>#
- classmethod req_params(inst, **kwargs)[source]#
Returns parameters that are jointly defined in this class and another task instance of some other class. The parameters are used when calling Task.req(self).
- Return type:
- property inference_model_repr#
- store_parts()[source]#
Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- Return type:
InsertableDict
- Returns:
Dictionary with parts to create a path to store intermediary results.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class CategoriesMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
Methods:
resolve_param_values(params)
- categories = <law.parameter.CSVParameter object>#
- default_categories = None#
- allow_empty_categories = False#
- property categories_repr#
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class VariablesMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
Methods:
resolve_param_values(params)
split_multi_variable(variable)
Splits a multi-dimensional variable given in the format "var_a[-var_b[-...]]" into separate variable names using a delimiter ("-") and returns a tuple.
join_multi_variable(variables)
Joins the names of multiple variables using a delimiter ("-") into a single string that represents a multi-dimensional variable and returns it.
- variables = <law.parameter.CSVParameter object>#
- default_variables = None#
- allow_empty_variables = False#
- allow_missing_variables = False#
- classmethod split_multi_variable(variable)[source]#
Splits a multi-dimensional variable given in the format "var_a[-var_b[-...]]" into separate variable names using a delimiter ("-") and returns a tuple.
- classmethod join_multi_variable(variables)[source]#
Joins the names of multiple variables using a delimiter ("-") into a single string that represents a multi-dimensional variable and returns it.
- Return type:
- property variables_repr#
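The behaviour documented for split_multi_variable() and join_multi_variable() above can be sketched with two standalone functions. These are stand-ins written for illustration (the documented versions are classmethods of VariablesMixin), and the variable names used are made up.

```python
DELIMITER = "-"  # delimiter named in the docstrings above


def split_multi_variable(variable):
    """Split "var_a-var_b" into a tuple of variable names."""
    return tuple(variable.split(DELIMITER))


def join_multi_variable(variables):
    """Join variable names into a single multi-dimensional variable name."""
    return DELIMITER.join(variables)


names = split_multi_variable("jet1_pt-jet1_eta")
joined = join_multi_variable(names)
```

In this sketch the two functions are inverses of each other as long as individual variable names do not contain the delimiter.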
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class DatasetsProcessesMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
Methods:
resolve_param_values(params)
get_known_shifts(config_inst, params)
Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts that are implemented by upstream tasks.
- datasets = <law.parameter.CSVParameter object>#
- processes = <law.parameter.CSVParameter object>#
- allow_empty_datasets = False#
- allow_empty_processes = False#
- classmethod get_known_shifts(config_inst, params)[source]#
Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts that are implemented by upstream tasks.
- property datasets_repr#
- property processes_repr#
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class ShiftSourcesMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
Methods:
resolve_param_values(params)
expand_shift_sources(sources)
- rtype: list[str]
reduce_shifts(shifts)
- rtype: list[str]
- shift_sources = <law.parameter.CSVParameter object>#
- allow_empty_shift_sources = False#
- property shift_sources_repr#
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class WeightProducerMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
Methods:
get_weight_producer_inst(weight_producer[, ...])
- rtype: WeightProducer
resolve_param_values(params)
get_known_shifts(config_inst, params)
Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts that are implemented by upstream tasks.
store_parts()
Returns a law.util.InsertableDict whose values are used to create a store path.
- weight_producer = <luigi.parameter.Parameter object>#
- register_weight_producer_sandbox = False#
- register_weight_producer_shifts = False#
- classmethod get_known_shifts(config_inst, params)[source]#
Returns two sets of shifts in a tuple: shifts implemented by _this_ task, and dependent shifts that are implemented by upstream tasks.
- property weight_producer_inst: WeightProducer#
- store_parts()[source]#
Returns a law.util.InsertableDict whose values are used to create a store path. For instance, the parts {"keyA": "a", "keyB": "b", 2: "c"} lead to the path “a/b/c”. The keys can be used by subclassing tasks to overwrite values.
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class ChunkedIOMixin(*args, **kwargs)[source]#
Bases: AnalysisTask
Attributes:
Methods:
raise_if_not_finite(ak_array)
Checks whether all values in array ak_array are finite.
raise_if_overlapping(ak_arrays)
Checks whether fields of ak_arrays overlap.
iter_chunked_io(*args, **kwargs)
- check_finite_output = <luigi.parameter.BoolParameter object>#
- check_overlapping_inputs = <luigi.parameter.BoolParameter object>#
- exclude_params_req = {'check_finite_output', 'check_overlapping_inputs', 'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- default_chunk_size = 100000#
- default_pool_size = 2#
- classmethod raise_if_not_finite(ak_array)[source]#
Checks whether all values in array ak_array are finite.
The check is performed using the numpy.isfinite() function.
- Parameters:
ak_array (Array) – Array with events to check.
- Raises:
ValueError – If any value in ak_array is not finite.
- Return type:
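The finiteness check can be illustrated without any array library. The sketch below walks nested Python lists and uses math.isfinite() as a stand-in for the numpy.isfinite() call on an awkward array that the documented method relies on; the helper names are hypothetical.

```python
import math


def iter_values(data):
    """Yield all scalar values of arbitrarily nested lists or tuples."""
    if isinstance(data, (list, tuple)):
        for item in data:
            yield from iter_values(item)
    else:
        yield data


def raise_if_not_finite(data):
    """Raise a ValueError if any value is NaN or infinite (stand-in for the documented check)."""
    if not all(math.isfinite(value) for value in iter_values(data)):
        raise ValueError("found one or more non-finite values")


# jagged, event-like structure with only finite values: passes silently
raise_if_not_finite([[1.0, 2.5], [], [3.0]])
```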
- classmethod raise_if_overlapping(ak_arrays)[source]#
Checks whether fields of ak_arrays overlap.
- Parameters:
- Raises:
ValueError – If at least one overlap is found.
- Return type:
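A comparable sketch for the overlap check treats each input as a collection of field names and reports names that occur in more than one input. This is a simplified illustration; the documented method operates on the fields of awkward arrays, and the helper shown here is hypothetical.

```python
from collections import Counter


def raise_if_overlapping(field_sets):
    """Raise a ValueError if a field name appears in more than one input."""
    # count in how many inputs each field name appears
    counts = Counter(field for fields in field_sets for field in set(fields))
    overlap = sorted(field for field, n in counts.items() if n > 1)
    if overlap:
        raise ValueError(f"fields overlap across inputs: {', '.join(overlap)}")


# disjoint field names: passes silently
raise_if_overlapping([["Jet.pt", "Jet.eta"], ["Muon.pt"]])
```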
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- class HistHookMixin(*args, **kwargs)[source]#
Bases: ConfigTask
Attributes:
Return a string representation of the hist hooks.
Methods:
invoke_hist_hooks(hists)
Invoke hooks to update histograms before plotting.
- hist_hooks = <law.parameter.CSVParameter object>#
- exclude_index = False#
- exclude_params_branch = {'user'}#
- exclude_params_index = {'user'}#
- exclude_params_repr = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_repr_empty = {}#
- exclude_params_req = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#
- exclude_params_req_get = {}#
- exclude_params_req_set = {}#
- exclude_params_sandbox = {'log_file', 'sandbox'}#
- exclude_params_workflow = {'notify_custom', 'notify_mattermost', 'notify_slack', 'user'}#