Plotting#

In columnflow, there are multiple tasks to create plots. This section showcases how to create and customize a plot based on the PlotVariables1D task. The usage of other plotting tasks is mostly analogous to the PlotVariables1D task. The most important differences compared to PlotVariables1D are presented in a separate section for each of the other plotting tasks. An overview of all plotting tasks is given in the Plotting tasks section.

Creating your first plot#

Assuming you used the analysis template to set up your analysis, you can create a first plot by running

law run cf.PlotVariables1D --version v1 \
    --calibrators example --selector example --producers example \
    --processes data,tt,st --variables n_jet --categories incl,2j

This will run the full analysis chain for the given processes (data, tt, st) and should create plots looking like this:

../_images/cf.PlotVariables1D_tpl_config_analy__1__12dfac316a__plot__proc_3_7727a49dc2__cat_incl__var_n_jet.pdf

../_images/cf.PlotVariables1D_tpl_config_analy__1__12dfac316a__plot__proc_3_7727a49dc2__cat_2j__var_n_jet.pdf

The PlotVariables1D task is located at the bottom of our task graph. This means that all tasks leading to PlotVariables1D are run for all datasets corresponding to the requested --processes, using the requested Calibrators, Selector, and Producers (often referred to as CSPs). In the following examples, we skip the --calibrators, --selector, and --producers parameters, so the defaults defined in the config are used automatically. Examples of how to implement your own CSPs can be found in the calibrators, selectors, and producers sections of the user guide.

The --variables parameter defines the variables for which histograms and plots are created. Variables are order objects that need to be defined in the config, as shown in the config objects section. The column corresponding to the variable's expression needs to be stored either after the ReduceEvents task or as part of a Producer used in the ProduceColumns task. One plot is produced for each category given with the --categories parameter. A detailed guide on how to implement categories in columnflow is given in the categories section.

To define which processes and datasets to consider when plotting, you can use the --processes and --datasets parameters. When only processes are given, all datasets corresponding to the requested processes are considered. When only datasets are given, all processes in the config are considered. The --processes parameter can also be used to change the order of processes in the stack and the legend (try for example --processes st,tt instead) and to further distinguish between sub-processes (e.g. via --processes tt_sl,tt_dl,tt_fh).

Customization of plots#

Many parameters are available to customize the style of a plot. A short overview of all plotting parameters is given in the Plotting tasks section. In the following, a few exemplary task calls present the usage of our plotting parameters, using the PlotVariables1D task. Most parameters are shared between the different plotting tasks. The most important changes regarding the task parameters are discussed in separate sections for each type of plotting task.

By default, the PlotVariables1D task creates one plot per variable, with all Monte Carlo processes included in a stack and data shown as separate points. The bottom subplot shows the ratio between data and the sum of all processes included in the stack and can be disabled via the --skip-ratio parameter. To change the text next to the CMS label, you can add the --cms-label parameter.
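To make the content of the bottom subplot concrete, here is a minimal sketch in plain Python of a bin-wise ratio between the points and the summed stack. The function name `ratio_to_stack` and the toy bin contents are assumptions for illustration only; this is not the code columnflow uses internally.

```python
from __future__ import annotations


def ratio_to_stack(points: list[float], stack: dict[str, list[float]]) -> list[float]:
    """Divide each bin of `points` by the bin-wise sum of all stacked processes."""
    # sum the stacked processes bin by bin
    totals = [sum(bin_counts) for bin_counts in zip(*stack.values())]
    # avoid a division by zero for empty bins by returning NaN there
    return [p / t if t else float("nan") for p, t in zip(points, totals)]


# toy bin contents, purely illustrative
stack = {"tt": [80.0, 40.0, 10.0], "st": [20.0, 10.0, 0.0]}
points = [110.0, 45.0, 12.0]
ratio = ratio_to_stack(points, stack)
print(ratio)
```

A ratio close to one in every bin indicates that the stack describes the points well.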

To compare shapes of multiple processes, you might want to plot each process separately as one line. To achieve this, you can use the unstack option of the --process-settings parameter. This parameter can also be used to change other attributes of your process instances, such as color, label, and the scale. To better compare shapes of processes, we can normalize each line with the --shape-norm parameter. Combining all the previously discussed parameters might lead to a task call such as

law run cf.PlotVariables1D --version v1 --processes tt,st --variables n_jet,jet1_pt \
    --skip-ratio --shape-norm --cms-label simpw \
    --process-settings "tt,unstack,color=#e41a1c:st,unstack,label=Single Top"

to produce the following plots:

../_images/cf.PlotVariables1D_tpl_config_analy__1__0191de868f__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt__c1.pdf

../_images/cf.PlotVariables1D_tpl_config_analy__1__0191de868f__plot__proc_2_a2211e799f__cat_incl__var_n_jet__c1.pdf
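The structure of the --process-settings value used above can be read as follows: entries for different processes are separated by `:`, and within one entry the first item is the process name, followed by comma-separated options. The sketch below mimics this format in plain Python. The function `parse_multi_settings` is a hypothetical helper written for this illustration, not columnflow's actual parser, and it keeps all values as strings.

```python
from __future__ import annotations


def parse_multi_settings(value: str) -> dict[str, dict[str, object]]:
    """Parse 'name,flag,key=val:name2,...' into {name: {flag: True, key: 'val'}}."""
    settings: dict[str, dict[str, object]] = {}
    for entry in value.split(":"):
        if not entry:
            continue
        # the first item names the object, the remaining items are its options
        name, *options = entry.split(",")
        parsed: dict[str, object] = {}
        for option in options:
            key, sep, val = option.partition("=")
            # a bare option such as "unstack" is treated as a boolean flag
            parsed[key] = val if sep else True
        settings[name] = parsed
    return settings


process_settings = parse_multi_settings(
    "tt,unstack,color=#e41a1c:st,unstack,label=Single Top",
)
print(process_settings)
```

Under these assumptions, the string from the task call corresponds to one dictionary of options per process.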

Parameters that only contain a single value can also be passed via the --general-settings parameter, which is a single comma-separated list of parameters in which the name and the value are separated by a =. The value of each parameter is automatically resolved to either a float, a bool, or a string. When no = is present, the parameter is automatically set to True.
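The resolution rules described above can be sketched in a few lines of plain Python. The helper names `resolve_value` and `parse_general_settings` are hypothetical, and the order in which the types are tried (bool, then float, then string) is an assumption of this sketch rather than a statement about columnflow's implementation.

```python
from __future__ import annotations


def resolve_value(text: str) -> float | bool | str:
    """Interpret a raw string as bool, float, or string (in this order)."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return float(text)
    except ValueError:
        return text


def parse_general_settings(value: str) -> dict[str, float | bool | str]:
    """Parse 'a,b=1.5,c=text' into {'a': True, 'b': 1.5, 'c': 'text'}."""
    settings: dict[str, float | bool | str] = {}
    for item in value.split(","):
        if not item:
            continue
        key, sep, raw = item.partition("=")
        # without "=", the parameter is simply switched on
        settings[key] = resolve_value(raw) if sep else True
    return settings


settings = parse_general_settings("skip_ratio,shape_norm,yscale=log,example_param=2.5")
print(settings)
```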

We can also change the y-scale of the plot to a log scale by adding --yscale log and change some properties of specific variables via the --variable-settings parameter. For example, we might want to create the plots of our two observables in one call, but try out a rebinned version of jet1_pt that merges bins by a factor of 10. A corresponding task call might be

law run cf.PlotVariables1D --version v1 --processes tt,st --variables n_jet,jet1_pt \
    --general-settings "skip_ratio,shape_norm,yscale=log,cms_label=simpw" \
    --variable-settings "n_jet,y_title=Events,x_title=N jets:jet1_pt,rebin=10,x_title=Leading jet \$p_{T}\$"

../_images/cf.PlotVariables1D_tpl_config_analy__1__c80529af83__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt__c2.pdf

../_images/cf.PlotVariables1D_tpl_config_analy__1__c80529af83__plot__proc_2_a2211e799f__cat_incl__var_n_jet__c2.pdf
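As a rough illustration of what rebinning by a factor and shape normalization mean for the bin contents, consider the following plain-Python sketch. The functions `rebin` and `shape_norm` are hypothetical stand-ins, and normalizing to a unit sum is an assumption about what a shape normalization amounts to; the actual task operates on histogram objects.

```python
from __future__ import annotations


def rebin(counts: list[float], factor: int) -> list[float]:
    """Merge every `factor` adjacent bins by summing their contents."""
    if factor < 1 or len(counts) % factor != 0:
        raise ValueError("factor must be >= 1 and divide the number of bins")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def shape_norm(counts: list[float]) -> list[float]:
    """Scale bin contents so that they sum to one (unchanged if all bins are empty)."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)


counts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
merged = rebin(counts, 2)
normalized = shape_norm(merged)
print(merged)  # [3.0, 7.0, 11.0]
print(normalized)
```

Merging bins reduces statistical fluctuations per bin, while the normalization removes differences in overall yield so that only the shapes are compared.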

For the general_settings, process_settings, and variable_settings you can define defaults and groups in the config, e.g. via

config_inst.x.default_variable_settings = {"jet1_pt": {"rebin": 4, "x_title": r"Leading jet $p_{T}$"}}
config_inst.x.process_settings_groups = {
    "unstack_processes": {proc: {"unstack": True} for proc in ("tt", "st")},
}
config_inst.x.general_settings_groups = {
    "compare_shapes": {"skip_ratio": True, "shape_norm": True, "yscale": "log", "cms_label": "simpw"},
}

The default is automatically used when no parameter is given in the task call, and the groups can be used directly on the command line and will be resolved automatically. Our previously defined defaults and groups will be used e.g. by the following task call:

law run cf.PlotVariables1D --version v1 --processes tt,st --variables n_jet,jet1_pt \
    --process-settings unstack_processes --general-settings compare_shapes

../_images/cf.PlotVariables1D_tpl_config_analy__1__be60d3bca7__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt__c3.pdf

../_images/cf.PlotVariables1D_tpl_config_analy__1__be60d3bca7__plot__proc_2_a2211e799f__cat_incl__var_n_jet__c3.pdf
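The lookup order described above (default when nothing is passed, group when the value matches a group name, inline settings otherwise) can be pictured with the following plain-Python sketch. The function `resolve_process_settings` and the empty `default_process_settings` are hypothetical and only handle bare flags such as `unstack`; they are not part of columnflow.

```python
from __future__ import annotations

# group definition as in the config example above
process_settings_groups = {
    "unstack_processes": {proc: {"unstack": True} for proc in ("tt", "st")},
}
# hypothetical default, used when the parameter is not given at all
default_process_settings: dict[str, dict[str, bool]] = {}


def resolve_process_settings(
    cli_value: str | None,
    groups: dict[str, dict[str, dict[str, bool]]],
    default: dict[str, dict[str, bool]],
) -> dict[str, dict[str, bool]]:
    """Return the default, a named group, or settings parsed from an inline value."""
    if not cli_value:
        return default
    if cli_value in groups:
        return groups[cli_value]
    # inline form "proc,flag:proc2,flag" (flags only in this simplified sketch)
    return {
        name: {flag: True for flag in flags}
        for name, *flags in (part.split(",") for part in cli_value.split(":"))
    }


from_group = resolve_process_settings(
    "unstack_processes", process_settings_groups, default_process_settings,
)
inline = resolve_process_settings(
    "tt,unstack:st,unstack", process_settings_groups, default_process_settings,
)
print(from_group == inline)
```

In this sketch, passing the group name and spelling out the settings inline lead to the same result.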

Creating 2D plots#

Columnflow also provides the PlotVariables2D task to create two-dimensional plots. Two-dimensional histograms are created by passing two variables to the --variables parameter, separated by a -. Here is an exemplary task call and its outputs.

law run cf.PlotVariables2D --version v1 \
    --processes tt,st --variables n_jet-jet1_pt,jet1_pt-n_jet

../_images/cf.PlotVariables2D_tpl_config_analy__1__b27b994979__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt-n_jet.pdf

../_images/cf.PlotVariables2D_tpl_config_analy__1__b27b994979__plot__proc_2_a2211e799f__cat_incl__var_n_jet-jet1_pt.pdf

While most of the plotting parameters used in the PlotVariables1D task can be reused for this task, there are also some additional parameters only available for 2D plotting tasks. For more information on the task parameters of the PlotVariables2D task, take a look into the plotting task overview.

Creating cutflow plots#

The previously discussed plotting tasks only create plots after applying the full event selection. To allow inspecting and optimizing an event and object selection, columnflow also includes plotting tasks that can produce plots after each individual selection step.

To create a simple cutflow plot, displaying event yields after each individual selection step, you can use the PlotCutflow task, e.g. via calling

law run cf.PlotCutflow --version v1 \
    --calibrators example --selector example --categories incl,2j \
    --shape-norm --process-settings tt,unstack:st,unstack \
    --processes tt,st --selector-steps jet,muon

../_images/cf.PlotCutflow_tpl_config_analy__1__12a17bf79c__cutflow__cat_incl.pdf

../_images/cf.PlotCutflow_tpl_config_analy__1__12a17bf79c__cutflow__cat_2j.pdf

This produces a plot with three bins, containing the event yield before applying any selection and after each selector step, where each step is combined with all previous selector steps via a logical AND.
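The cumulative logic behind these three bins can be sketched in plain Python. The function `cutflow_yields` and the toy masks are assumptions for illustration, and the sketch counts unweighted events, whereas a real analysis would typically sum event weights.

```python
from __future__ import annotations


def cutflow_yields(step_masks: dict[str, list[bool]], n_events: int) -> dict[str, int]:
    """Count events before any selection and after each cumulative selector step."""
    yields = {"Initial": n_events}
    passed = [True] * n_events
    for step, mask in step_masks.items():
        # logical AND of this step with all previous steps
        passed = [p and m for p, m in zip(passed, mask)]
        yields[step] = sum(passed)
    return yields


# toy per-event decisions of the two selector steps, purely illustrative
masks = {
    "jet": [True, True, False, True, False],
    "muon": [True, False, True, True, False],
}
yields = cutflow_yields(masks, n_events=5)
print(yields)  # {'Initial': 5, 'jet': 3, 'muon': 2}
```

Note that the third event passes the muon step on its own but does not count in the muon bin, because it already failed the jet step.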

To create plots of variables as part of the cutflow, we also provide the PlotCutflowVariables1D task, which mostly behaves the same as the PlotVariables1D task.

law run cf.PlotCutflowVariables1D --version v1 \
    --calibrators example --selector example \
    --processes tt,st --variables cf_jet1_pt --categories incl \
    --selector-steps jet,muon --per-plot processes

../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__d8a37d3da9__plot__step0_Initial__proc_2_a2211e799f__cat_incl__var_cf_jet1_pt.pdf

../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__d8a37d3da9__plot__step1_jet__proc_2_a2211e799f__cat_incl__var_cf_jet1_pt.pdf

../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__d8a37d3da9__plot__step2_muon__proc_2_a2211e799f__cat_incl__var_cf_jet1_pt.pdf

The --per-plot parameter defines whether to produce one plot per selector step (--per-plot processes) or one plot per process (--per-plot steps). For the --per-plot steps option, try the following task call:

law run cf.PlotCutflowVariables1D --version v1 \
    --calibrators example --selector example \
    --processes tt,st --variables cf_jet1_pt --categories incl \
    --selector-steps jet,muon --per-plot steps

../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__c3947accbb__plot__proc_st__cat_incl__var_cf_jet1_pt.pdf

../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__c3947accbb__plot__proc_tt__cat_incl__var_cf_jet1_pt.pdf

Creating plots for different shifts#

Like most tasks, our plotting tasks also contain the --shift parameter that allows requesting the outputs for a certain type of systematic variation. Per default, the shift parameter is set to “nominal”, but you could also produce your plot with a certain systematic uncertainty varied up or down, e.g. via running

law run cf.PlotVariables1D --version v1 \
    --processes tt,st --variables n_jet --shift mu_up

If you already ran the same task call with --shift nominal before, only new histograms and plots need to be produced, since a shift such as mu_up is typically implemented as an event weight and therefore does not require reproducing any columns. Other shifts such as jec_up also affect the event selection and therefore require re-running everything starting from SelectEvents. A detailed overview on how to implement different types of systematic uncertainties is given in the systematics section (TODO: not existing).

For directly comparing differences introduced by one shift source, we provide the PlotShiftedVariables1D task. Instead of the --shift parameter, this task implements the --shift-sources option and creates one plot per shift source displaying the nominal distribution (black) compared to the shift source varied up (red) and down (blue). The task can be called e.g. via

law run cf.PlotShiftedVariables1D --version v1 \
    --processes tt,st --variables jet1_pt,n_jet --shift-sources mu

and produces the following plots:

../_images/cf.PlotShiftedVariables1D_tpl_config_analy__1__42b45aba89__plot__proc_2_a2211e799f__unc_mu__cat_incl__var_jet1_pt.pdf

../_images/cf.PlotShiftedVariables1D_tpl_config_analy__1__42b45aba89__plot__proc_2_a2211e799f__unc_mu__cat_incl__var_n_jet.pdf

By default, this produces only one plot containing the sum of all processes. To produce this plot per process, you can use the PlotShiftedVariablesPerProcess1D task.
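The sum over processes shown in this plot can be pictured with a small plain-Python sketch. The toy bin contents, the shift name `mu_down`, and the helper `sum_processes` are assumptions for illustration; the task itself works on the histograms produced earlier in the task chain.

```python
from __future__ import annotations


def sum_processes(hists: dict[str, list[float]]) -> list[float]:
    """Add up the bin contents of all processes bin by bin."""
    return [sum(values) for values in zip(*hists.values())]


# toy bin contents per shift and process, purely illustrative
hists_per_shift = {
    "nominal": {"tt": [10.0, 20.0], "st": [1.0, 2.0]},
    "mu_up": {"tt": [11.0, 21.0], "st": [1.0, 3.0]},
    "mu_down": {"tt": [9.0, 19.0], "st": [1.0, 1.0]},
}
summed = {shift: sum_processes(hists) for shift, hists in hists_per_shift.items()}
print(summed)
```

Each of the three summed distributions then corresponds to one line in the plot: nominal, varied up, and varied down.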

Directly displaying plots in the terminal#

All plotting tasks also include a --view-cmd parameter that allows directly displaying the plot during the runtime of the task:

law run cf.PlotVariables1D --version v1 \
    --processes tt,st --variables n_jet --view-cmd evince-previewer

Using your own plotting function#

While all plotting tasks provide default plotting functions which implement many parameters to customize the plot, it might be necessary to write your own plotting functions if you want to create a specific type of plot. In that case, you can simply write a function that follows the signature of all other plotting functions and call a plotting task with this function using the --plot-function parameter.

An example of how to implement such a plotting function is shown in the following:

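from collections import OrderedDict

import hist
import matplotlib.pyplot as plt
import mplhep
import order as od

# helper functions, assumed to be provided by columnflow.plotting.plot_util
from columnflow.plotting.plot_util import (
    apply_process_settings,
    apply_variable_settings,
    remove_residual_axis,
)


def my_plot1d_func(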
    hists: OrderedDict[od.Process, hist.Hist],
    config_inst: od.Config,
    category_inst: od.Category,
    variable_insts: list[od.Variable],
    style_config: dict | None = None,
    yscale: str | None = "",
    process_settings: dict | None = None,
    variable_settings: dict | None = None,
    example_param: str | float | bool | None = None,
    **kwargs,
) -> tuple[plt.Figure, tuple[plt.Axes]]:
    """
    This is an exemplary custom plotting function.

    Exemplary task call:

    .. code-block:: bash

        law run cf.PlotVariables1D --version v1 --processes st,tt --variables jet1_pt \
            --plot-function __cf_module_name__.plotting.example.my_plot1d_func \
            --general-settings example_param=some_text
    """
    # we can add arbitrary parameters via the `general_settings` parameter to access them in the
    # plotting function. They are automatically parsed either to a bool, float, or string
    print(f"The example_param has been set to '{example_param}' (type: {type(example_param)})")

    # call helper function to remove shift axis from histogram
    remove_residual_axis(hists, "shift")

    # call helper functions to apply the variable_settings and process_settings
    variable_inst = variable_insts[0]
    hists = apply_variable_settings(hists, variable_insts, variable_settings)
    hists = apply_process_settings(hists, process_settings)

    # use the mplhep CMS style
    plt.style.use(mplhep.style.CMS)

    # create a figure and fill it with content
    fig, ax = plt.subplots()
    for proc_inst, h in hists.items():
        h.plot1d(
            ax=ax,
            label=proc_inst.label,
            color=proc_inst.color1,
        )

    # styling and parameter implementation (e.g. `yscale`)
    ax.set(
        yscale=yscale or "linear",  # fall back to a linear scale if unset or empty
        ylabel=variable_inst.get_full_y_title(),
        xlabel=variable_inst.get_full_x_title(),
        xscale="log" if variable_inst.log_x else "linear",
    )
    ax.legend()
    mplhep.cms.label(ax=ax, fontsize=22, llabel="private work")

    # task expects a figure and a tuple of axes as output
    return fig, (ax,)
