histograms

Tasks to produce and merge histograms.

Classes:

CreateHistograms(*args, **kwargs)

CreateHistogramsWrapper(*args, **kwargs)

MergeHistograms(*args, **kwargs)

MergeHistogramsWrapper(*args, **kwargs)

MergeShiftedHistograms(*args, **kwargs)

MergeShiftedHistogramsWrapper(*args, **kwargs)

class CreateHistograms(*args, **kwargs)

Bases: VariablesMixin, WeightProducerMixin, MLModelsMixin, ProducersMixin, SelectorStepsMixin, CalibratorsMixin, ChunkedIOMixin, MergeReducedEventsUser, LocalWorkflow, RemoteWorkflow

Attributes:

sandbox

reqs

missing_column_alias_strategy

category_id_columns

register_weight_producer_shifts

mandatory_columns

check_overlapping_inputs

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

workflow_requires()

Hook to add workflow requirements.

requires()

The Tasks that this Task depends on.

output()

The output that this Task produces.

run()

The task run method, to be overridden in a subclass.

sandbox = 'bash::$CF_BASE/sandboxes/venv_columnar.sh'
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'MLEvaluation': <class 'columnflow.tasks.ml.MLEvaluation'>, 'MergeReducedEvents': <class 'columnflow.tasks.reduction.MergeReducedEvents'>, 'MergeReductionStats': <class 'columnflow.tasks.reduction.MergeReductionStats'>, 'ProduceColumns': <class 'columnflow.tasks.production.ProduceColumns'>}
missing_column_alias_strategy = 'original'
category_id_columns = {'category_ids'}
register_weight_producer_shifts = True
mandatory_columns = {'category_ids', 'process_id'}

workflow_requires()

Hook to add workflow requirements. This method is expected to return a dictionary. When this method is called from a branch task, an exception is raised.
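
The contract described above can be sketched in plain Python. This is an illustration only, not the law implementation: the class names, the convention that a negative branch number marks the workflow itself, and the requirement label are assumptions made for this example.

```python
class SketchWorkflow:
    """Hypothetical stand-in for a workflow task (not a law class)."""

    def __init__(self, branch: int = -1) -> None:
        # assumption of this sketch: a negative branch number marks the workflow itself
        self.branch = branch

    def is_branch(self) -> bool:
        return self.branch >= 0

    def workflow_requires(self) -> dict:
        # the hook is only meaningful on the workflow, never on a single branch
        if self.is_branch():
            raise RuntimeError("workflow_requires() called from a branch task")
        return {}


class SketchHistWorkflow(SketchWorkflow):
    def workflow_requires(self) -> dict:
        # extend the dictionary returned by the parent class
        reqs = super().workflow_requires()
        reqs["events"] = "upstream-events-task"  # hypothetical placeholder value
        return reqs
```

Calling SketchHistWorkflow().workflow_requires() returns the extended dictionary, while the same call on SketchHistWorkflow(branch=0) raises.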

requires()

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires
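
The three accepted return shapes (a single Task, a list of Tasks, a dict whose values are Tasks) can be illustrated with a small dependency check. The sketch below uses a hypothetical stand-in class instead of the real luigi Task, and the helper names flatten and can_run are chosen for this example only.

```python
from typing import Any


class SketchTask:
    """Hypothetical stand-in for a task; not the luigi class."""

    def __init__(self, name: str, done: bool = False) -> None:
        self.name = name
        self.done = done

    def requires(self) -> Any:
        # default: no requirements, so nothing needs to be overridden
        return []

    def complete(self) -> bool:
        return self.done


class SketchMerge(SketchTask):
    def __init__(self, name: str, inputs: list) -> None:
        super().__init__(name)
        self.inputs = inputs

    def requires(self) -> Any:
        # a dict whose values are tasks (here: a list of tasks)
        return {"hists": self.inputs}


def flatten(reqs: Any) -> list:
    """Flatten a single task, a list of tasks, or a dict of tasks into a list."""
    if reqs is None:
        return []
    if isinstance(reqs, dict):
        return [t for value in reqs.values() for t in flatten(value)]
    if isinstance(reqs, (list, tuple)):
        return [t for value in reqs for t in flatten(value)]
    return [reqs]


def can_run(task: SketchTask) -> bool:
    """A task may run only once all of its requirements are complete."""
    return all(dep.complete() for dep in flatten(task.requires()))
```

A task without requirements is always allowed to run in this sketch, because the check over an empty list succeeds.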

output()

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a distributed file system (DFS) or a database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output
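
The completeness rule above can be sketched with plain file paths. This is a simplified illustration, not the luigi or law code: is_complete is a hypothetical helper, outputs are modelled as local paths rather than Target objects, and treating an empty output list as unfinished is a choice made for this sketch.

```python
from pathlib import Path


def is_complete(outputs) -> bool:
    """Return True only if there is at least one output and all of them exist."""
    # accept a single path as well as a list of paths
    if isinstance(outputs, (str, Path)):
        outputs = [outputs]
    paths = [Path(p) for p in outputs]
    # choice made for this sketch: a task without outputs never counts as finished
    if not paths:
        return False
    return all(p.exists() for p in paths)
```

With several outputs, a single missing file is enough for the task to be scheduled again, which is why partially written outputs should be avoided.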

run()

The task run method, to be overridden in a subclass.

See Task.run

check_overlapping_inputs = <luigi.parameter.BoolParameter object>
exclude_index = False
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}
exclude_params_htcondor_workflow = {}
exclude_params_index = {'effective_workflow', 'local_shift'}
exclude_params_remote_workflow = {'local_shift'}
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}
exclude_params_repr_empty = {'ml_models', 'selector_steps'}
exclude_params_req = {'check_finite_output', 'check_overlapping_inputs', 'effective_workflow', 'local_shift'}
exclude_params_req_get = {}
exclude_params_req_set = {}
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}
exclude_params_slurm_workflow = {}
exclude_params_workflow = {'branch'}

class CreateHistogramsWrapper(*args, **kwargs)

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

shifts

skip_configs

skip_datasets

skip_shifts

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>
datasets = <law.parameter.CSVParameter object>
exclude_index = False
exclude_params_index = {}
exclude_params_repr = {}
exclude_params_repr_empty = {}
exclude_params_req = {}
exclude_params_req_get = {}
exclude_params_req_set = {}
exclude_params_sandbox = {'log_file', 'sandbox'}

requires() -> Requirements

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.
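
How the wrapper parameters might translate into a set of wrapped tasks can be sketched as follows. This is an assumption-laden illustration rather than the columnflow implementation: the helper names, the use of shell-style patterns for the skip_* values, and the config, dataset and shift names in the usage note are all hypothetical.

```python
from fnmatch import fnmatch
from itertools import product


def select(values, skip_patterns):
    """Drop every value that matches at least one skip pattern."""
    return [v for v in values if not any(fnmatch(v, p) for p in skip_patterns)]


def wrapper_params(configs, datasets, shifts,
                   skip_configs=(), skip_datasets=(), skip_shifts=()):
    """Build one parameter dict per remaining (config, dataset, shift) combination."""
    combos = product(
        select(configs, skip_configs),
        select(datasets, skip_datasets),
        select(shifts, skip_shifts),
    )
    # each dict stands for the parameters of one wrapped task instance
    return [{"config": c, "dataset": d, "shift": s} for c, d, s in combos]
```

For example, wrapper_params(["run2"], ["tt_sl", "data_mu_a"], ["nominal"], skip_datasets=["data_*"]) yields a single entry for the dataset tt_sl.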

shifts = <law.parameter.CSVParameter object>
skip_configs = <law.parameter.CSVParameter object>
skip_datasets = <law.parameter.CSVParameter object>
skip_shifts = <law.parameter.CSVParameter object>

update_wrapper_params(params)

class MergeHistograms(*args, **kwargs)

Bases: VariablesMixin, WeightProducerMixin, MLModelsMixin, ProducersMixin, SelectorStepsMixin, CalibratorsMixin, DatasetTask, LocalWorkflow, RemoteWorkflow

Attributes:

only_missing

remove_previous

sandbox

reqs

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

create_branch_map()

Define the branch map for when this task is used as a workflow.

workflow_requires()

Hook to add workflow requirements.

requires()

The Tasks that this Task depends on.

output()

The output that this Task produces.

run()

The task run method, to be overridden in a subclass.

only_missing = <luigi.parameter.BoolParameter object>
remove_previous = <luigi.parameter.BoolParameter object>
sandbox = 'bash::$CF_BASE/sandboxes/venv_columnar.sh'
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'CreateHistograms': <class 'columnflow.tasks.histograms.CreateHistograms'>}

create_branch_map()

Define the branch map for when this task is used as a workflow. By default, the merging information provided by file_merging_factor is used to return a dictionary that maps each branch to one or more input file indices. For example, 1 -> [3, 4, 5] means that branch 1 handles the input file indices 3, 4 and 5 together.
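
The default mapping described above can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming the merging information boils down to a single integer factor; the function name and its arguments are hypothetical and do not mirror the columnflow signature.

```python
def sketch_branch_map(n_files: int, merging_factor: int) -> dict:
    """Map consecutive branch numbers to chunks of input file indices."""
    if n_files < 0 or merging_factor < 1:
        raise ValueError("n_files must be >= 0 and merging_factor must be >= 1")
    indices = list(range(n_files))
    # cut the file indices into chunks of at most merging_factor entries
    chunks = [
        indices[start:start + merging_factor]
        for start in range(0, n_files, merging_factor)
    ]
    # branch numbers start at 0 and follow the chunk order
    return dict(enumerate(chunks))
```

With seven input files and a factor of three, branch 1 is mapped to the indices 3, 4 and 5, and the last branch receives the single remaining file.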

workflow_requires()

Hook to add workflow requirements. This method is expected to return a dictionary. When this method is called from a branch task, an exception is raised.

requires()

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

output()

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a distributed file system (DFS) or a database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

run()

The task run method, to be overridden in a subclass.

See Task.run

exclude_index = False
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}
exclude_params_htcondor_workflow = {}
exclude_params_index = {'effective_workflow', 'local_shift'}
exclude_params_remote_workflow = {'local_shift'}
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}
exclude_params_repr_empty = {'ml_models', 'selector_steps'}
exclude_params_req = {'effective_workflow', 'local_shift'}
exclude_params_req_get = {}
exclude_params_req_set = {}
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}
exclude_params_slurm_workflow = {}
exclude_params_workflow = {'branch'}

class MergeHistogramsWrapper(*args, **kwargs)

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

shifts

skip_configs

skip_datasets

skip_shifts

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>
datasets = <law.parameter.CSVParameter object>
exclude_index = False
exclude_params_index = {}
exclude_params_repr = {}
exclude_params_repr_empty = {}
exclude_params_req = {}
exclude_params_req_get = {}
exclude_params_req_set = {}
exclude_params_sandbox = {'log_file', 'sandbox'}

requires() -> Requirements

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.

shifts = <law.parameter.CSVParameter object>
skip_configs = <law.parameter.CSVParameter object>
skip_datasets = <law.parameter.CSVParameter object>
skip_shifts = <law.parameter.CSVParameter object>

update_wrapper_params(params)

class MergeShiftedHistograms(*args, **kwargs)

Bases: VariablesMixin, ShiftSourcesMixin, WeightProducerMixin, MLModelsMixin, ProducersMixin, SelectorStepsMixin, CalibratorsMixin, DatasetTask, LocalWorkflow, RemoteWorkflow

Attributes:

sandbox

shift

effective_shift

allow_empty_shift

allow_empty_shift_sources

reqs

exclude_index

exclude_params_branch

exclude_params_htcondor_workflow

exclude_params_index

exclude_params_remote_workflow

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

exclude_params_slurm_workflow

exclude_params_workflow

Methods:

create_branch_map()

Define the branch map for when this task is used as a workflow.

workflow_requires()

Hook to add workflow requirements.

requires()

The Tasks that this Task depends on.

store_parts()

Build the parts of the output path under which intermediate results of the current Task are stored.

output()

The output that this Task produces.

run()

The task run method, to be overridden in a subclass.

sandbox = 'bash::$CF_BASE/sandboxes/venv_columnar.sh'
shift = None
effective_shift = None
allow_empty_shift = True
allow_empty_shift_sources = True
reqs = {'BuildBashSandbox': <class 'columnflow.tasks.framework.remote.BuildBashSandbox'>, 'BundleBashSandbox': <class 'columnflow.tasks.framework.remote.BundleBashSandbox'>, 'BundleCMSSWSandbox': <class 'columnflow.tasks.framework.remote.BundleCMSSWSandbox'>, 'BundleRepo': <class 'columnflow.tasks.framework.remote.BundleRepo'>, 'BundleSoftware': <class 'columnflow.tasks.framework.remote.BundleSoftware'>, 'MergeHistograms': <class 'columnflow.tasks.histograms.MergeHistograms'>}

create_branch_map()

Define the branch map for when this task is used as a workflow. By default, the merging information provided by file_merging_factor is used to return a dictionary that maps each branch to one or more input file indices. For example, 1 -> [3, 4, 5] means that branch 1 handles the input file indices 3, 4 and 5 together.

workflow_requires()

Hook to add workflow requirements. This method is expected to return a dictionary. When this method is called from a branch task, an exception is raised.

requires()

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

store_parts()

Build the parts of the output path under which intermediate results of the current Task are stored.

Calls store_parts() of the super class and inserts {"producers": "prod__{HASH}"} before the keyword version. Here, HASH is the joined string of the first five producer names plus a hash created with law.util.create_hash() from the remaining producers, starting at index 5 (i.e. self.producers[5:]). For more information, see e.g. store_parts().

Returns:

Updated parts to create output path to store intermediary results.
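
The naming scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the columnflow code: hashlib stands in for law.util.create_hash(), the "__" separator, the ten-character digest and the "none" placeholder for an empty producer list are choices made for this sketch, and the parts are modelled as a plain dict.

```python
import hashlib


def producers_part(producers, n_plain=5):
    """Join the first n_plain producer names and hash the remaining ones."""
    if not producers:
        # placeholder chosen for this sketch only
        return "prod__none"
    part = "__".join(producers[:n_plain])
    rest = list(producers[n_plain:])
    if rest:
        # stand-in for law.util.create_hash(); the real digest will differ
        digest = hashlib.sha1(repr(rest).encode("utf-8")).hexdigest()[:10]
        part += f"__{digest}"
    return f"prod__{part}"


def insert_before(parts: dict, before: str, key: str, value: str) -> dict:
    """Return a copy of parts with (key, value) placed in front of `before`."""
    if before not in parts:
        raise KeyError(before)
    result = {}
    for k, v in parts.items():
        if k == before:
            result[key] = value
        result[k] = v
    return result
```

Keeping the first few names readable makes output directories easy to identify by eye, while the hash keeps the path length bounded when many producers are requested.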

output()

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a distributed file system (DFS) or a database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

run()

The task run method, to be overridden in a subclass.

See Task.run

exclude_index = False
exclude_params_branch = {'acceptance', 'branches', 'cancel_jobs', 'cleanup_jobs', 'htcondor_cpus', 'htcondor_flavor', 'htcondor_gpus', 'htcondor_logs', 'htcondor_memory', 'htcondor_pool', 'htcondor_scheduler', 'htcondor_share_software', 'ignore_submission', 'job_workers', 'max_runtime', 'no_poll', 'parallel_jobs', 'pilot', 'poll_fails', 'poll_interval', 'retries', 'shuffle_jobs', 'slurm_flavor', 'slurm_partition', 'submission_threads', 'tasks_per_job', 'tolerance', 'transfer_logs', 'walltime'}
exclude_params_htcondor_workflow = {}
exclude_params_index = {'effective_workflow', 'local_shift'}
exclude_params_remote_workflow = {'local_shift'}
exclude_params_repr = {'cancel_jobs', 'cleanup_jobs', 'workflow'}
exclude_params_repr_empty = {'ml_models', 'selector_steps'}
exclude_params_req = {'effective_workflow', 'local_shift'}
exclude_params_req_get = {}
exclude_params_req_set = {}
exclude_params_sandbox = {'local_shift', 'log_file', 'sandbox'}
exclude_params_slurm_workflow = {}
exclude_params_workflow = {'branch'}

class MergeShiftedHistogramsWrapper(*args, **kwargs)

Bases: AnalysisTask, WrapperTask

Attributes:

configs

datasets

exclude_index

exclude_params_index

exclude_params_repr

exclude_params_repr_empty

exclude_params_req

exclude_params_req_get

exclude_params_req_set

exclude_params_sandbox

skip_configs

skip_datasets

Methods:

requires()

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

update_wrapper_params(params)

configs = <law.parameter.CSVParameter object>
datasets = <law.parameter.CSVParameter object>
exclude_index = False
exclude_params_index = {}
exclude_params_repr = {}
exclude_params_repr_empty = {}
exclude_params_req = {}
exclude_params_req_get = {}
exclude_params_req_set = {}
exclude_params_sandbox = {'log_file', 'sandbox'}

requires() -> Requirements

Collect requirements defined by the underlying require_cls of the WrapperTask depending on optional additional parameters.

Return type:

Requirements

Returns:

Requirements for the WrapperTask instance.

skip_configs = <law.parameter.CSVParameter object>
skip_datasets = <law.parameter.CSVParameter object>

update_wrapper_params(params)