ml#

Tasks related to ML workflows.

Classes:

PrepareMLEvents(*args, **kwargs)

PrepareMLEventsWrapper(*args, **kwargs)

MergeMLStats(*args, **kwargs)

MergeMLStatsWrapper(*args, **kwargs)

MergeMLEvents(*args, **kwargs)

MergeMLEventsWrapper(*args, **kwargs)

MLTraining(*args, **kwargs)

MLEvaluation(*args, **kwargs)

MLEvaluationWrapper(*args, **kwargs)

MergeMLEvaluation(*args, **kwargs)

Task to merge events for a dataset, where the MLEvaluation produces multiple parquet files.

MergeMLEvaluationWrapper(*args, **kwargs)

PlotMLResultsBase(*args, **kwargs)

A base class used to implement the ML plotting tasks.

PlotMLResults(*args, **kwargs)

A task that generates plots for machine learning results.

class PrepareMLEvents(*args, **kwargs)[source]#

Bases: MLModelDataMixin, ProducersMixin, SelectorMixin, CalibratorsMixin, ChunkedIOMixin, MergeReducedEventsUser, LocalWorkflow, RemoteWorkflow

Attributes:

sandbox

allow_empty_ml_model

reqs

missing_column_alias_strategy

preparation_producer_inst

check_finite_output

check_overlapping_inputs

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

workflow_requires()

Hook to add workflow requirements.

requires()

The Tasks that this Task depends on.

output()

The output that this Task produces.

run()

The task run method, to be overridden in a subclass.

sandbox = 'bash::$CF_BASE/sandboxes/venv_columnar.sh'#
allow_empty_ml_model = False#
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'MergeReducedEvents': <class 'columnflow.tasks.reduction.MergeReducedEvents'>, 'MergeReductionStats': <class 'columnflow.tasks.reduction.MergeReductionStats'>, 'ProduceColumns': <class 'columnflow.tasks.production.ProduceColumns'>}#
missing_column_alias_strategy = 'original'#
property preparation_producer_inst#
workflow_requires()[source]#

Hook to add workflow requirements. This method is expected to return a dictionary. When this method is called from a branch task, an exception is raised.

requires()[source]#

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

output()[source]#

The output that this Task produces.

The output of the Task determines if the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output
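The completion semantics described above can be sketched in plain Python; Target here is a hypothetical stand-in for a luigi/law target, not the actual class:

```python
class Target:
    """Hypothetical stand-in for a luigi/law target with an exists() check."""

    def __init__(self, exists: bool):
        self._exists = exists

    def exists(self) -> bool:
        return self._exists


def task_complete(outputs: list) -> bool:
    # a task is considered finished if and only if all of its outputs exist
    return all(target.exists() for target in outputs)


# only when every output target exists is the task skipped on the next run
print(task_complete([Target(True), Target(True)]))   # True
print(task_complete([Target(True), Target(False)]))  # False
```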

run()[source]#

The task run method, to be overridden in a subclass.

See Task.run

check_finite_output = <luigi.parameter.BoolParameter object>#
check_overlapping_inputs = <luigi.parameter.BoolParameter object>#
exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}#
exclude_params_htcondor_workflow = {}#
exclude_params_index = {'effective_workflow', 'local_shift'}#
exclude_params_remote_workflow = {'local_shift'}#
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {'check_finite_output', 'check_overlapping_inputs', 'effective_workflow', 'local_shift'}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}#
exclude_params_slurm_workflow = {}#
exclude_params_workflow = {'branch'}#
class PrepareMLEventsWrapper(*args, **kwargs)#

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

skip_configs

skip_datasets

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>#
datasets = <law.parameter.CSVParameter object>#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
requires() Requirements#

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.

skip_configs = <law.parameter.CSVParameter object>#
skip_datasets = <law.parameter.CSVParameter object>#
update_wrapper_params(params)#
class MergeMLStats(*args, **kwargs)[source]#

Bases: MLModelDataMixin, ProducersMixin, SelectorMixin, CalibratorsMixin, DatasetTask, ForestMerge

Attributes:

merge_factor

exclude_params_req_get

reqs

exclude_index

exclude_params_branch

exclude_params_forest_merge

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_set

exclude_params_sandbox

exclude_params_workflow

Methods:

create_branch_map()

Define the branch map for when this task is used as a workflow.

merge_workflow_requires()

merge_requires(start_branch, end_branch)

merge_output()

trace_merge_inputs(inputs)

run()

The task run method, to be overridden in a subclass.

merge(inputs, output)

merge_counts(dst, src)

Recursively adds counts (integers or floats) from a src dictionary into a dst dictionary.

merge_factor = 20#
exclude_params_req_get = {'workflow'}#
reqs = {'PrepareMLEvents': <class 'columnflow.tasks.ml.PrepareMLEvents'>}#
create_branch_map()[source]#

Define the branch map for when this task is used as a workflow. By default, use the merging information provided by file_merging_factor to return a dictionary which maps branches to one or more input file indices. E.g. 1 -> [3, 4, 5] would mean that branch 1 is simultaneously handling input file indices 3, 4 and 5.
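The mapping described above can be sketched with a small helper; make_branch_map is hypothetical, as the actual task derives the chunking from file_merging_factor:

```python
def make_branch_map(n_files: int, merge_factor: int) -> dict:
    # map each branch number to the chunk of input file indices it handles
    return {
        branch: list(range(start, min(start + merge_factor, n_files)))
        for branch, start in enumerate(range(0, n_files, merge_factor))
    }


# with 7 input files and a merge factor of 3, branch 1 handles indices 3, 4 and 5
print(make_branch_map(7, 3))  # {0: [0, 1, 2], 1: [3, 4, 5], 2: [6]}
```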

merge_workflow_requires()[source]#
merge_requires(start_branch, end_branch)[source]#
merge_output()[source]#
trace_merge_inputs(inputs)[source]#
run()[source]#

The task run method, to be overridden in a subclass.

See Task.run

merge(inputs, output)[source]#
classmethod merge_counts(dst, src)[source]#

Recursively adds counts (integers or floats) from a src dictionary into a dst dictionary. dst is updated in-place and also returned.

Return type:

dict
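The recursive merging can be sketched in plain Python; this is a simplified stand-in for the actual classmethod:

```python
def merge_counts(dst: dict, src: dict) -> dict:
    # recursively add numeric counts from src into dst; dst is updated in-place
    for key, value in src.items():
        if isinstance(value, dict):
            merge_counts(dst.setdefault(key, {}), value)
        else:
            dst[key] = dst.get(key, 0) + value
    return dst


stats = {"num_events": 10, "sum_weights": {"nominal": 1.5}}
merge_counts(stats, {"num_events": 4, "sum_weights": {"nominal": 0.5, "up": 2.0}})
print(stats)  # {'num_events': 14, 'sum_weights': {'nominal': 2.0, 'up': 2.0}}
```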

exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'pilot', 'tolerance'}#
exclude_params_forest_merge = {'branch', 'branches', 'keep_nodes', 'tree_depth', 'tree_index'}#
exclude_params_index = {'effective_workflow', 'local_shift'}#
exclude_params_remote_workflow = {'local_shift'}#
exclude_params_repr = {'workflow'}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {'effective_workflow', 'local_shift'}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}#
exclude_params_workflow = {'branch'}#
class MergeMLStatsWrapper(*args, **kwargs)#

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

shifts

skip_configs

skip_datasets

skip_shifts

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>#
datasets = <law.parameter.CSVParameter object>#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
requires() Requirements#

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.

shifts = <law.parameter.CSVParameter object>#
skip_configs = <law.parameter.CSVParameter object>#
skip_datasets = <law.parameter.CSVParameter object>#
skip_shifts = <law.parameter.CSVParameter object>#
update_wrapper_params(params)#
class MergeMLEvents(*args, **kwargs)[source]#

Bases: MLModelDataMixin, ProducersMixin, SelectorMixin, CalibratorsMixin, DatasetTask, ForestMerge, RemoteWorkflow

Attributes:

sandbox

fold

shift

effective_shift

allow_empty_shift

merge_factor

allow_empty_ml_model

reqs

exclude_index

exclude_params_branch

exclude_params_forest_merge

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

create_branch_map()

Define the branch map for when this task is used as a workflow.

merge_workflow_requires()

merge_requires(start_leaf, end_leaf)

trace_merge_inputs(inputs)

merge_output()

run()

The task run method, to be overridden in a subclass.

merge(inputs, output)

sandbox = 'bash::$CF_BASE/sandboxes/venv_columnar.sh'#
fold = <luigi.parameter.IntParameter object>#
shift = None#
effective_shift = None#
allow_empty_shift = True#
merge_factor = 10#
allow_empty_ml_model = False#
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'PrepareMLEvents': <class 'columnflow.tasks.ml.PrepareMLEvents'>}#
create_branch_map()[source]#

Define the branch map for when this task is used as a workflow. By default, use the merging information provided by file_merging_factor to return a dictionary which maps branches to one or more input file indices. E.g. 1 -> [3, 4, 5] would mean that branch 1 is simultaneously handling input file indices 3, 4 and 5.

merge_workflow_requires()[source]#
merge_requires(start_leaf, end_leaf)[source]#
trace_merge_inputs(inputs)[source]#
merge_output()[source]#
run()[source]#

The task run method, to be overridden in a subclass.

See Task.run

merge(inputs, output)[source]#
exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}#
exclude_params_forest_merge = {'branch', 'branches', 'keep_nodes', 'tree_depth', 'tree_index'}#
exclude_params_htcondor_workflow = {}#
exclude_params_index = {'effective_workflow', 'local_shift'}#
exclude_params_remote_workflow = {'local_shift'}#
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {'effective_workflow', 'local_shift'}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}#
exclude_params_slurm_workflow = {}#
exclude_params_workflow = {'branch'}#
class MergeMLEventsWrapper(*args, **kwargs)#

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

skip_configs

skip_datasets

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>#
datasets = <law.parameter.CSVParameter object>#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
requires() Requirements#

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.

skip_configs = <law.parameter.CSVParameter object>#
skip_datasets = <law.parameter.CSVParameter object>#
update_wrapper_params(params)#
class MLTraining(*args, **kwargs)[source]#

Bases: MLModelTrainingMixin, LocalWorkflow, RemoteWorkflow

Attributes:

allow_empty_ml_model

reqs

sandbox

Parameter whose value is a str, and a base class for other parameter types.

accepts_messages

For configuring which scheduler messages can be received.

fold

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

create_branch_map()

Abstract method that must be overwritten by inheriting tasks to define the branch map.

workflow_requires()

Hook to add workflow requirements.

requires()

The Tasks that this Task depends on.

output()

The output that this Task produces.

run()

The task run method, to be overridden in a subclass.

allow_empty_ml_model = False#
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'MergeMLEvents': <class 'columnflow.tasks.ml.MergeMLEvents'>, 'MergeMLStats': <class 'columnflow.tasks.ml.MergeMLStats'>}#
property sandbox#

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, e.g. MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, e.g. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, it will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (e.g. --param xyz)

    • Then to the class, using the qualified task name syntax (e.g. --TaskA-param xyz).

  • With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion.

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.

property accepts_messages#

For configuring which scheduler messages can be received. When falsy, this task does not accept any messages. When True, all messages are accepted.

property fold#
create_branch_map()[source]#

Abstract method that must be overwritten by inheriting tasks to define the branch map.

workflow_requires()[source]#

Hook to add workflow requirements. This method is expected to return a dictionary. When this method is called from a branch task, an exception is raised.

requires()[source]#

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

output()[source]#

The output that this Task produces.

The output of the Task determines if the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

run()[source]#

The task run method, to be overridden in a subclass.

See Task.run

exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}#
exclude_params_htcondor_workflow = {}#
exclude_params_index = {'effective_workflow'}#
exclude_params_remote_workflow = {}#
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {'effective_workflow'}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
exclude_params_slurm_workflow = {}#
exclude_params_workflow = {'branch'}#
class MLEvaluation(*args, **kwargs)[source]#

Bases: MLModelMixin, ProducersMixin, SelectorMixin, CalibratorsMixin, ChunkedIOMixin, MergeReducedEventsUser, LocalWorkflow, RemoteWorkflow

Attributes:

allow_empty_ml_model

missing_column_alias_strategy

reqs

sandbox

preparation_producer_inst

check_finite_output

check_overlapping_inputs

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

workflow_requires()

Hook to add workflow requirements.

requires()

The Tasks that this Task depends on.

output()

The output that this Task produces.

run()

The task run method, to be overridden in a subclass.

allow_empty_ml_model = False#
missing_column_alias_strategy = 'original'#
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'MLTraining': <class 'columnflow.tasks.ml.MLTraining'>, 'MergeReducedEvents': <class 'columnflow.tasks.reduction.MergeReducedEvents'>, 'MergeReductionStats': <class 'columnflow.tasks.reduction.MergeReductionStats'>, 'ProduceColumns': <class 'columnflow.tasks.production.ProduceColumns'>}#
sandbox = None#
property preparation_producer_inst#
workflow_requires()[source]#

Hook to add workflow requirements. This method is expected to return a dictionary. When this method is called from a branch task, an exception is raised.

requires()[source]#

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

output()[source]#

The output that this Task produces.

The output of the Task determines if the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

run()[source]#

The task run method, to be overridden in a subclass.

See Task.run

check_finite_output = <luigi.parameter.BoolParameter object>#
check_overlapping_inputs = <luigi.parameter.BoolParameter object>#
exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}#
exclude_params_htcondor_workflow = {}#
exclude_params_index = {'effective_workflow', 'local_shift'}#
exclude_params_remote_workflow = {'local_shift'}#
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {'check_finite_output', 'check_overlapping_inputs', 'effective_workflow', 'local_shift'}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}#
exclude_params_slurm_workflow = {}#
exclude_params_workflow = {'branch'}#
class MLEvaluationWrapper(*args, **kwargs)#

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

shifts

skip_configs

skip_datasets

skip_shifts

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>#
datasets = <law.parameter.CSVParameter object>#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
requires() Requirements#

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.

shifts = <law.parameter.CSVParameter object>#
skip_configs = <law.parameter.CSVParameter object>#
skip_datasets = <law.parameter.CSVParameter object>#
skip_shifts = <law.parameter.CSVParameter object>#
update_wrapper_params(params)#
class MergeMLEvaluation(*args, **kwargs)[source]#

Bases: MLModelMixin, ProducersMixin, SelectorMixin, CalibratorsMixin, DatasetTask, ForestMerge, RemoteWorkflow

Task to merge events for a dataset, where the MLEvaluation produces multiple parquet files. The task serves as a helper task for plotting the ML evaluation results in the PlotMLResults task.

Attributes:

sandbox

merge_factor

reqs

exclude_index

exclude_params_branch

exclude_params_forest_merge

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

create_branch_map()

Define the branch map for when this task is used as a workflow.

merge_workflow_requires()

merge_requires(start_branch, end_branch)

merge_output()

merge(inputs, output)

sandbox = 'bash::$CF_BASE/sandboxes/venv_columnar.sh'#
merge_factor = 20#
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'MLEvaluation': <class 'columnflow.tasks.ml.MLEvaluation'>}#
create_branch_map()[source]#

Define the branch map for when this task is used as a workflow. By default, use the merging information provided by file_merging_factor to return a dictionary which maps branches to one or more input file indices. E.g. 1 -> [3, 4, 5] would mean that branch 1 is simultaneously handling input file indices 3, 4 and 5.

merge_workflow_requires()[source]#
merge_requires(start_branch, end_branch)[source]#
merge_output()[source]#
merge(inputs, output)[source]#
exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}#
exclude_params_forest_merge = {'branch', 'branches', 'keep_nodes', 'tree_depth', 'tree_index'}#
exclude_params_htcondor_workflow = {}#
exclude_params_index = {'effective_workflow', 'local_shift'}#
exclude_params_remote_workflow = {'local_shift'}#
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}#
exclude_params_repr_empty = {'ml_model'}#
exclude_params_req = {'effective_workflow', 'local_shift'}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}#
exclude_params_slurm_workflow = {}#
exclude_params_workflow = {'branch'}#
class MergeMLEvaluationWrapper(*args, **kwargs)#

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

shifts

skip_configs

skip_datasets

skip_shifts

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>#
datasets = <law.parameter.CSVParameter object>#
exclude_index = False#
exclude_params_index = {}#
exclude_params_repr = {}#
exclude_params_repr_empty = {}#
exclude_params_req = {}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
requires() Requirements#

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.

shifts = <law.parameter.CSVParameter object>#
skip_configs = <law.parameter.CSVParameter object>#
skip_datasets = <law.parameter.CSVParameter object>#
skip_shifts = <law.parameter.CSVParameter object>#
update_wrapper_params(params)#
class PlotMLResultsBase(*args, **kwargs)[source]#

Bases: ProcessPlotSettingMixin, CategoriesMixin, MLModelMixin, ProducersMixin, SelectorStepsMixin, CalibratorsMixin, LocalWorkflow, RemoteWorkflow

A base class used to implement the ML plotting tasks. It provides a plot_function parameter for choosing the desired plotting function and a prepare_inputs method that returns a dict with the selected events.

Attributes:

sandbox

plot_function

skip_processes

plot_sub_processes

skip_uncertainties

reqs

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

store_parts()

Create parts to create the output path to store intermediary results for the current Task.

create_branch_map()

Abstract method that must be overwritten by inheriting tasks to define the branch map.

requires()

The Tasks that this Task depends on.

workflow_requires([only_super])

Hook to add workflow requirements.

output()

The output that this Task produces.

prepare_inputs()

Prepare the inputs for the plot function, based on the given configuration and category.

sandbox = 'bash::$CF_BASE/sandboxes/venv_columnar.sh'#
plot_function = <luigi.parameter.Parameter object>#
skip_processes = <law.parameter.CSVParameter object>#
plot_sub_processes = <luigi.parameter.BoolParameter object>#
skip_uncertainties = <luigi.parameter.BoolParameter object>#
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'MergeMLEvaluation': <class 'columnflow.tasks.ml.MergeMLEvaluation'>}#
store_parts()[source]#

Create parts to create the output path to store intermediary results for the current Task.

Calls store_parts() of the super class and inserts {"producers": "prod__{HASH}"} before the keyword version. Here, HASH is the joined string of the first five producer names plus a hash created with law.util.create_hash() from the remaining producers (i.e. self.producers[5:]). For more information, see e.g. store_parts().

Returns:

Updated parts to create output path to store intermediary results.
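The naming scheme can be illustrated with a hypothetical helper; law.util.create_hash() is mimicked here with hashlib, so the exact hash value differs from columnflow's:

```python
import hashlib


def producers_part(producers: list) -> str:
    # join the first five producer names with double underscores
    part = "__".join(producers[:5])
    if len(producers) > 5:
        # stand-in for law.util.create_hash() applied to the remaining producers
        part += "__" + hashlib.sha1(str(producers[5:]).encode()).hexdigest()[:10]
    return f"prod__{part}"


# short producer lists are spelled out in full, longer ones get a hash suffix
print(producers_part(["jets", "leptons"]))  # prod__jets__leptons
```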

create_branch_map()[source]#

Abstract method that must be overwritten by inheriting tasks to define the branch map.

requires()[source]#

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

workflow_requires(only_super=False)[source]#

Hook to add workflow requirements. This method is expected to return a dictionary. When this method is called from a branch task, an exception is raised.

output()[source]#

The output that this Task produces.

The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.

Return type:

dict[str, list]

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

prepare_inputs()[source]#

Prepare the inputs for the plot function, based on the given configuration and category.

Raises:
  • NotImplementedError – This error is raised if a given dataset contains more than one process.

  • ValueError – This error is raised if plot_sub_processes is used without providing the process_ids column in the data.

Return type:

dict[str, Array]

Returns:

dict[str, ak.Array]: A dictionary with the dataset names as keys and the corresponding predictions as values.
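The documented contract of prepare_inputs() can be sketched with plain dictionaries in place of awkward arrays. This is a simplified illustration of the raise conditions listed above, not the task's code; the helper name group_predictions and the event-record shape are assumptions, while the "process_ids" column name and the ValueError condition come from the docstring.

```python
def group_predictions(events_by_dataset: dict, plot_sub_processes: bool = False) -> dict:
    """Illustrative sketch of prepare_inputs' documented behavior: map each
    dataset name to its prediction records, and fail loudly when sub-process
    splitting is requested but no 'process_ids' column is present."""
    out = {}
    for dataset, events in events_by_dataset.items():
        if plot_sub_processes and "process_ids" not in events:
            raise ValueError(
                f"dataset '{dataset}' lacks the 'process_ids' column "
                "required when plot_sub_processes is set",
            )
        out[dataset] = events
    return out
```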

exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}#
exclude_params_htcondor_workflow = {}#
exclude_params_index = {'effective_workflow'}#
exclude_params_remote_workflow = {}#
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}#
exclude_params_repr_empty = {'ml_model', 'selector_steps'}#
exclude_params_req = {'effective_workflow'}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
exclude_params_slurm_workflow = {}#
exclude_params_workflow = {'branch'}#
class PlotMLResults(*args, **kwargs)[source]#

Bases: PlotMLResultsBase

A task that generates plots for machine learning results.

This task generates plots for machine learning results based on the given configuration and category. The plots can be either a confusion matrix (CM) or a receiver operating characteristic (ROC) curve. This task uses the output of the MergeMLEvaluation task as input and saves the plots with the corresponding array used to create the plot.
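For the confusion-matrix case, the kind of quantity this task plots can be sketched from per-event network scores and true class labels. The snippet below is a minimal, dependency-free illustration of a row-normalized confusion matrix; the actual plotting backend and its conventions are configurable via plot_function, so treat this only as a sketch of the underlying computation.

```python
def confusion_matrix(scores, true_labels, n_classes):
    """Build a row-normalized confusion matrix: each event is assigned to the
    class with the highest score, then counts are normalized per true class.
    Illustrative only; not the task's actual implementation."""
    cm = [[0.0] * n_classes for _ in range(n_classes)]
    for row, t in zip(scores, true_labels):
        p = max(range(n_classes), key=lambda i: row[i])
        cm[t][p] += 1
    # normalize each row so every true class sums to 1 (empty rows stay zero)
    for r in cm:
        s = sum(r)
        if s:
            for i in range(n_classes):
                r[i] /= s
    return cm
```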

Attributes:

plot_function

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

prepare_plot_parameters()

Helper function to prepare the plot parameters for the plot function.

output()

The output that this Task produces.

run()

The task run method, to be overridden in a subclass.

plot_function = <luigi.parameter.ChoiceParameter object>#
prepare_plot_parameters()[source]#

Helper function that prepares the parameters for the plot function, e.g. by parsing the axis labels from the general settings.

exclude_index = False#
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}#
exclude_params_htcondor_workflow = {}#
exclude_params_index = {'effective_workflow'}#
exclude_params_remote_workflow = {}#
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}#
exclude_params_repr_empty = {'ml_model', 'selector_steps'}#
exclude_params_req = {'effective_workflow'}#
exclude_params_req_get = {}#
exclude_params_req_set = {}#
exclude_params_sandbox = {'log_file', 'sandbox'}#
exclude_params_slurm_workflow = {}#
exclude_params_workflow = {'branch'}#
output()[source]#

The output that this Task produces.

The output of the Task determines if the Task needs to be run: the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

run()[source]#

The task run method, to be overridden in a subclass.

See Task.run