Plotting#

In columnflow, there are multiple tasks to create plots. This section showcases how to create and customize a plot based on the PlotVariables1D task. The usage of other plotting tasks is mostly analogous to the PlotVariables1D task. The most important differences compared to PlotVariables1D are presented in a separate section for each of the other plotting tasks. An overview of all plotting tasks is given in the Plotting tasks section.

Creating your first plot#

Assuming you used the analysis template to setup your analysis, you can create a first plot by running

law run cf.PlotVariables1D --version v1 \
    --calibrators example --selector example --producer example \
    --processes data,tt,st --variables n_jet --categories incl,2j

This will run the full analysis chain for the given processes (data, tt, st) and should create plots looking like this:

../_images/cf.PlotVariables1D_tpl_config_analy__1__12dfac316a__plot__proc_3_7727a49dc2__cat_incl__var_n_jet.pdf
../_images/cf.PlotVariables1D_tpl_config_analy__1__12dfac316a__plot__proc_3_7727a49dc2__cat_2j__var_n_jet.pdf
Where do I find that plot?

You can add --print-output 0 to every task call, which will print the full filename of all outputs of the requested task. Alternatively, you can add --fetch-output 0,a to directly copy all outputs of this task into the directory you are currently in. Finally, there is the --view-cmd parameter you can add to directly display the plot during the runtime of the task, e.g. via --view-cmd evince-previewer.

The PlotVariables1D task is located at the bottom of our task graph, which means that all tasks leading to PlotVariables1D will be run for all datasets corresponding to the --processes we requested using the Calibrators, Selector, and Producers (often referred to as CSPs) as requested. In the following examples, we will skip the --calibrators, --selector and --producers parameters, which means that the defaults defined in the config will be used automatically. Examples on how to implement your own CSPs can be found in the calibrators, selectors, and producers sections of the user guide. The --variables parameter defines, for which variables we want to create histograms and plots. Variables are order objects that need to be defined in the config as shown in the config objects section. The column corresponding to the expression statement needs to be stored either after the ReduceEvents task or as part of a Producer used in the ProduceColumns task. For each of the category given with the --categories parameter, one plot will be produced. A detailed guide on how to implement categories in Columnflow is given in the categories section.

To define which processes and datasets to consider when plotting, you can use the --processes and --datasets parameter. When only processes are given, all datasets corresponding to the requested processes will be considered. When only datasets are given, all processes in the config will be considered. The --processes parameter can be used to change the order of processes in the stack and the legend (try for example --processes st,tt instead) and to further distinguish between sub-processes (e.g. via --processses tt_sl,tt_dl,tt_fh).

IMPORTANT! Do not add the same dataset via multiple processes!

At the time of writing this documentation, there is still an issue present that histograms corresponding to a dataset can accidentally be used multiple times. For example, when adding --processes tt,tt_sl, the events corresponding to the dataset tt_sl_powheg will be displayed twice in the resulting plot.

Customization of plots#

There are many different parameters implemented that allow customizing the style of a plot. A short overview to all plotting parameters is given in the Plotting tasks. In the following, a few exemplary task calls are given to present the usage of our plotting parameters, using the PlotVariables1D task. Most paramaters are shared between the different plotting tasks. The most important changes regarding the task parameters are discussed in separate sections for each type of plotting task.

Per default, the PlotVariables1D task creates one plot per variable with all Monte Carlo processes being included in a stack and data being shown as separate points. The bottom subplot shows the ratio between signal and all processes included in the stack and can be disabled via the --skip_ratio parameter. To change the text next to the label, you can add the --cms-label parameter.

What are the cms-label options?

In general, this parameter accepts all types of strings, but there is a set of shortcuts for commonly used labels that will automatically be resolved:

To compare shapes of multiple processes, you might want to plot each process separately as one line. To achieve this, you can use the unstack option of the --process-settings parameter. This parameter can also be used to change other attributes of your process instances, such as color, label, and the scale. To better compare shapes of processes, we can normalize each line with the --shape-norm parameter. Combining all the previously discussed parameters might lead to a task call such as

law run cf.PlotVariables1D --version v1 --processes tt,st --variables n_jet,jet1_pt \
    --skip-ratio --shape-norm --cms-label simpw \
    --process-settings "tt,unstack,color=#e41a1c:st,unstack,label=Single Top"

to produce the following plot:

../_images/cf.PlotVariables1D_tpl_config_analy__1__0191de868f__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt__c1.pdf
../_images/cf.PlotVariables1D_tpl_config_analy__1__0191de868f__plot__proc_2_a2211e799f__cat_incl__var_n_jet__c1.pdf

Parameters that only contain a single value can also be passed via the --general-settings, which is a single comma-separated list of parameters, where the name and the value are separated via a =. The value of each parameter is automatically resolved to either a float, bool, or a string. When no = is present, the parameter is automatically set to True.

What is the advantage of setting parameters via the --general-settings parameter?

While there is no direct advantage of setting parameters via the --general-settings, this parameter provides some convenience by allowing you to define defaults and groups in the config (will be discussed later in the guide).

Additionally, this parameter allows you to set parameters on the command line that are not directly implemented as task parameters. This is especially helpful when you want to parametrize custom plotting functions.

We can also change the y-scale of the plot to a log scale by adding --yscale log and change some properties of specific variables via the variable-settings parameter. For example, we might want to create the plots of our two obserables in one call, but would like to try out a rebinned version of jet1_pt that merges bins by a factor of 10. A corresponding task call might be

law run cf.PlotVariables1D --version v1 --processes tt,st --variables n_jet,jet1_pt \
    --general-settings "skip_ratio,shape_norm,yscale=log,cms-label=simpw" \
    --variable-settings "n_jet,y_title=Events,x_title=N jets:jet1_pt,rebin=10,x_title=Leading jet \$p_{T}\$"
../_images/cf.PlotVariables1D_tpl_config_analy__1__c80529af83__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt__c2.pdf
../_images/cf.PlotVariables1D_tpl_config_analy__1__c80529af83__plot__proc_2_a2211e799f__cat_incl__var_n_jet__c2.pdf
Limitations of the variable_settings

While in theory we can change anything inside the variable and process instances via the variable_settings parameter, there are certain attributes that are already used during the creation of the histograms (e.g. the expression and the binning). Since our variable_settings parameter only modifies these attributes during the runtime of our plotting task, this will not impact our final results.

For the general_settings, process_settings, and variable_settings you can define defaults and groups in the config, e.g. via

config_inst.x.default_variable_settings = {"jet1_pt": {"rebin": 4, "x_title": r"Leading jet $p_{T}$"}}
config_inst.x.process_settings_groups = {
    "unstack_processes": {proc: {"unstack": True} for proc in ("tt", "st")},
}
config_inst.x.general_settings_groups = {
    "compare_shapes": {"skip_ratio": True, "shape_norm": True, "yscale": "log", "cms_label": "simpw"},
}

The default is automatically used when no parameter is given in the task call, and the groups can be used directly on the command line and will be resolved automatically. Our previously defined defaults and groups will be used e.g. by the following task call:

law run cf.PlotVariables1D --version v1 --processes tt,st --variables n_jet,jet1_pt \
    --process-settings unstack_processes --general-settings compare_shapes
../_images/cf.PlotVariables1D_tpl_config_analy__1__be60d3bca7__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt__c3.pdf
../_images/cf.PlotVariables1D_tpl_config_analy__1__be60d3bca7__plot__proc_2_a2211e799f__cat_incl__var_n_jet__c3.pdf

Creating 2D plots#

Columnflow also provides the PlotVariables2D task to create two-dimensional plots. Two-dimensional histograms are created by passing two variables to the --variables parameter, separated by a -. Here is an exemplary task call and their outputs.

law run cf.PlotVariables2D --version v1 \
    --processes tt,st --variables n_jet-jet1_pt,jet1_pt-n_jet
../_images/cf.PlotVariables2D_tpl_config_analy__1__b27b994979__plot__proc_2_a2211e799f__cat_incl__var_jet1_pt-n_jet.pdf
../_images/cf.PlotVariables2D_tpl_config_analy__1__b27b994979__plot__proc_2_a2211e799f__cat_incl__var_n_jet-jet1_pt.pdf

While most of the plotting parameters used in the PlotVariables1D task can be reused for this task, there are also some additional parameters only available for 2D plotting tasks. For more information on the task parameters of the PlotVariables2D task, take a look into the plotting task overview.

Creating cutflow plots#

The previously discussed plotting functions only create plots after applying the full event selection. To allow inspecting and optimizing an event and object selection, Columnflow also includes plotting tasks that can produce plots after each individual selection step.

To create a simple cutflow plot, displaying event yields after each individual selection step, you can use the PlotCutflow task, e.g. via calling

law run cf.PlotCutflow --version v1 \
    --calibrators example --selector example --categories incl,2j \
    --shape-norm --process-settings tt,unstack:st,unstack \
    --processes tt,st --selector-steps jet,muon
../_images/cf.PlotCutflow_tpl_config_analy__1__12a17bf79c__cutflow__cat_incl.pdf
../_images/cf.PlotCutflow_tpl_config_analy__1__12a17bf79c__cutflow__cat_2j.pdf

This will produce a plot with three bins, containing the event yield before applying any selection and after each selector step, where we always apply the logical and of all previous selector steps.

What are the options for the --selector-steps? Can I customize the step labels?

The steps listed in the --selector-steps parameter need to be defined by the Selector that has been used. A detailed guide on how to implement your own selector can be found in the Selections guide.

Per default, the name of the selector step is used on the x-axis, but you can also provide custom step labels via the config:

config_inst.x.selector_step_labels = {
    "muon": r"$N_{muon} = 1$",
    "jet": r"$N_{jets}^{AK4} \geq 1$",
}

To create plots of variables as part of the cutflow, we also provide the PlotCutflowVariables1D, which mostly behaves the same as the PlotVariables1D task.

law run cf.PlotCutflowVariables1D --version v1 \
    --calibrators example --selector example \
    --processes tt,st --variables cf_jet1_pt --categories incl \
    --selector-steps jet,muon --per-plot processes
../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__d8a37d3da9__plot__step0_Initial__proc_2_a2211e799f__cat_incl__var_cf_jet1_pt.pdf
../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__d8a37d3da9__plot__step1_jet__proc_2_a2211e799f__cat_incl__var_cf_jet1_pt.pdf
../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__d8a37d3da9__plot__step2_muon__proc_2_a2211e799f__cat_incl__var_cf_jet1_pt.pdf

The per-plot parameter defines whether to produce one plot per selector step (per-plot processes) or one plot per process (per-plot steps). For the per-plot steps option, try the following task call:

law run cf.PlotCutflowVariables1D --version v1 \
    --calibrators example --selector example \
    --processes tt,st --variables cf_jet1_pt --categories incl \
    --selector-steps jet,muon --per-plot steps
../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__c3947accbb__plot__proc_st__cat_incl__var_cf_jet1_pt.pdf
../_images/cf.PlotCutflowVariables1D_tpl_config_analy__1__c3947accbb__plot__proc_tt__cat_incl__var_cf_jet1_pt.pdf

Creating plots for different shifts#

Like most tasks, our plotting tasks also contain the --shift parameter that allows requesting the outputs for a certain type of systematic variation. Per default, the shift parameter is set to “nominal”, but you could also produce your plot with a certain systematic uncertainty varied up or down, e.g. via running

law run cf.PlotVariables1D --version v1 \
    --processes tt,st --variables n_jet --shift mu_up

If you already ran the same task call with --shift nominal before, this will only require to produce new histograms and plots, as a shift such as the mu_up is typically implemented as an event weight and therefore does not require to reproduce any columns. Other shifts such as jec_up also impact our event selection and therefore also need to re-run anything starting from SelectEvents. A detailed overview on how to implement different types of systematic uncertainties is given in the systematics section (TODO: not existing).

For directly comparing differences introduced by one shift source, we provide the PlotShiftedVariables1D task. Instead of the --shift parameter, this task implements the --shift-sources option and creates one plot per shift source displaying the nominal distribution (black) compared to the shift source varied up (red) and down (blue). The task can be called e.g. via

law run cf.PlotShiftedVariables1D --version v1 \
    --processes tt,st --variables jet1_pt,n_jet --shift-sources mu

and produces the following plot:

../_images/cf.PlotShiftedVariables1D_tpl_config_analy__1__42b45aba89__plot__proc_2_a2211e799f__unc_mu__cat_incl__var_jet1_pt.pdf
../_images/cf.PlotShiftedVariables1D_tpl_config_analy__1__42b45aba89__plot__proc_2_a2211e799f__unc_mu__cat_incl__var_n_jet.pdf

This produces per default only one plot containing the sum of all processes. To produce this plot per process, you can use the PlotShiftedVariablesPerProcess1D task

Directly displaying plots in the terminal#

All plotting tasks also include a --view-cmd parameter that allows directly printing the plot during the runtime of the task:

law run cf.PlotVariables1D --version v1 \
    --processes tt,st --variables n_jet --view-cmd evince-previewer

Using your own plotting function#

While all plotting tasks provide default plotting functions which implement many parameters to customize the plot, it might be necessary to write your own plotting functions if you want to create a specific type of plot. In that case, you can simply write a function that follows the signature of all other plotting functions and call a plotting task with this function using the --plot-function parameter.

An example on how to implement such a plotting function is shown in the following:

def my_plot1d_func(
    hists: OrderedDict[od.Process, hist.Hist],
    config_inst: od.Config,
    category_inst: od.Category,
    variable_insts: list[od.Variable],
    style_config: dict | None = None,
    yscale: str | None = "",
    process_settings: dict | None = None,
    variable_settings: dict | None = None,
    example_param: str | float | bool | None = None,
    **kwargs,
) -> tuple(plt.Figure, tuple(plt.Axis,)):
    """
    This is an exemplary custom plotting function.

    Exemplary task call:

    .. code-block:: bash
        law run cf.PlotVariables1D --version v1 --processes st,tt --variables jet1_pt \
            --plot-function __cf_module_name__.plotting.example.my_plot1d_func \
            --general-settings example_param=some_text
    """
    # we can add arbitrary parameters via the `general_settings` parameter to access them in the
    # plotting function. They are automatically parsed either to a bool, float, or string
    print(f"The example_param has been set to '{example_param}' (type: {type(example_param)})")

    # call helper function to remove shift axis from histogram
    remove_residual_axis(hists, "shift")

    # call helper functions to apply the variable_settings and process_settings
    variable_inst = variable_insts[0]
    hists = apply_variable_settings(hists, variable_insts, variable_settings)
    hists = apply_process_settings(hists, process_settings)

    # use the mplhep CMS stype
    plt.style.use(mplhep.style.CMS)

    # create a figure and fill it with content
    fig, ax = plt.subplots()
    for proc_inst, h in hists.items():
        h.plot1d(
            ax=ax,
            label=proc_inst.label,
            color=proc_inst.color1,
        )

    # styling and parameter implementation (e.g. `yscale`)
    ax.set(
        yscale=yscale,
        ylabel=variable_inst.get_full_y_title(),
        xlabel=variable_inst.get_full_x_title(),
        xscale="log" if variable_inst.log_x else "linear",
    )
    ax.legend()
    mplhep.cms.label(ax=ax, fontsize=22, llabel="private work")

    # task expects a figure and a tuple of axes as output
    return fig, (ax,)