columnflow.util

Collection of general helpers and utilities.

Data:

UNSET

Placeholder for an unset value.

primes

List of the first 200 primes.

Functions:

maybe_import(name[, package, force])

Calls importlib.import_module internally and returns the module if it exists, or otherwise a MockModule instance with the same name.

import_plt()

Lazily imports and configures matplotlib pyplot.

import_ROOT()

Lazily imports and configures ROOT.

import_file(path[, attr])

Loads the content of a python file located at path and returns its package content as a dictionary.

ipython_shell([confirm_exit, pretty_print, ...])

Starts an IPython shell with configurable parameters.

create_random_name()

Returns a random string based on UUID v4.

expand_path(*path)

Takes path fragments, joins them and recursively expands all contained environment variables.

real_path(*path)

Takes path fragments and returns the real, absolute location with all variables expanded.

ensure_dir(path)

Ensures that a directory at path (and its subdirectories) exists and returns the full, expanded path.

wget(src, dst[, force])

Downloads a file from a remote src to a local destination dst, creating intermediate directories when needed.

call_thread(func[, args, kwargs, timeout])

Executes a function func in a thread and aborts the call when timeout is reached.

call_proc(func[, args, kwargs, timeout])

Executes a function func in a process and aborts the call when timeout is reached.

ensure_proxy(fn, opts, task, *args, **kwargs)

Law task decorator that checks whether either a voms or arc proxy exists before calling the decorated method.

dev_sandbox(sandbox[, add, remove])

Takes a sandbox key sandbox and adds or removes the substring "_dev" right before the file extension (if any), depending on whether the current environment is used for development (see env_is_dev) and the add and remove flags.

freeze(cont)

Constructs an immutable version of a native Python container.

memoize(f)

Function decorator that implements memoization.

safe_div(a, b)

Returns a divided by b if b is not zero, and zero otherwise.

try_float(f)

Tests whether a value f can be converted to a float.

try_complex(f)

Tests whether a value f can be converted to a complex number.

try_int(i)

Tests whether a value i can be converted to an integer.

maybe_int(i)

Returns i as an integer if it is a whole number, and as a float otherwise.

is_pattern(s)

Returns True if a string s contains pattern characters such as "*" or "?", and False otherwise.

is_regex(s)

Returns True if a string s is a regular expression starting with "^" and ending with "$", and False otherwise.

pattern_matcher(pattern[, mode])

Takes a string pattern which might be an actual pattern for fnmatching, a regular expression or just a plain string, and returns a function that can be used to test if a string matches that pattern.

dict_add_strict(d, key, value)

Adds a key-value pair to a dictionary, but only if it does not change an existing value; raises a KeyError otherwise.

get_source_code(obj[, indent])

Returns the source code of any object obj as a string.

classproperty(func)

Property decorator for class-level methods.

load_correction_set(target)

Loads a correction set using the correctionlib from a file target.

Classes:

DotDict

Subclass of OrderedDict that provides read and write access to items via attributes by implementing __getattr__ and __setattr__.

MockModule(name)

Mockup object that resembles a module with arbitrarily deep structure.

FunctionArgs(*args, **kwargs)

Light-weight utility class that wraps all passed args and kwargs and allows invoking different functions with them.

ClassPropertyDescriptor(fget[, fset])

Generic descriptor class that is used by classproperty().

DerivableMeta(cls_name, bases, cls_dict)

Meta class for Derivable objects providing class-level features such as improved tracing and lookup of subclasses, and single-line subclassing for partial-like overwriting of class-level attributes.

Derivable()

Derivable base class with features provided by the meta DerivableMeta.

KeyValueMessage(*args, key, value, **kwargs)

Subclass of luigi.worker.SchedulerMessage that adds key and value attributes, parsed from the incoming message assuming a format key = value.

UNSET = <object object>#

Placeholder for an unset value.

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 1063, 1069, 1087, 1091, 1093, 1097, 1103, 1109, 1117, 1123, 1129, 1151, 1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213, 1217, 1223]#

List of the first 200 primes.

maybe_import(name, package=None, force=False)[source]#

Calls importlib.import_module internally and returns the module if it exists, or otherwise a MockModule instance with the same name. When force is True and the import fails, an ImportError is raised.

Return type:

ModuleType | MockModule

import_plt()[source]#

Lazily imports and configures matplotlib pyplot.

Return type:

ModuleType

import_ROOT()[source]#

Lazily imports and configures ROOT.

Return type:

ModuleType

import_file(path, attr=None)[source]#

Loads the content of a python file located at path and returns its package content as a dictionary. When attr is set, only the attribute with that name is returned.

The file is not required to be importable as its content is loaded directly into the interpreter. While this approach is not necessarily clean, it can be useful in places where custom code must be loaded.
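
As an illustration of loading file content directly into the interpreter, the following sketch uses the built-in `compile` and `exec`. It is an assumption about how such a helper could work, not the columnflow code; `import_file_sketch` is a hypothetical name.

```python
def import_file_sketch(path, attr=None):
    # execute the file content in a fresh namespace dictionary
    content = {}
    with open(path, "r") as f:
        exec(compile(f.read(), path, "exec"), content)

    # return either a single attribute or the whole namespace
    return content[attr] if attr else content
```

Since the file is executed rather than imported, it does not need to live in a package or on `sys.path`.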

ipython_shell(confirm_exit=False, pretty_print=True, banner=False)[source]#

Starts an IPython shell with configurable parameters.

Parameters:
  • confirm_exit (bool, default: False) – Whether to ask for confirmation before exiting the shell.

  • pretty_print (bool, default: True) – Whether to use pretty printing.

  • banner (bool, default: False) – Whether to display the IPython banner.

create_random_name()[source]#

Returns a random string based on UUID v4.

Return type:

str

expand_path(*path)[source]#

Takes path fragments, joins them and recursively expands all contained environment variables.

Return type:

str

real_path(*path)[source]#

Takes path fragments and returns the real, absolute location with all variables expanded.

Return type:

str

ensure_dir(path)[source]#

Ensures that a directory at path (and its subdirectories) exists and returns the full, expanded path.

Return type:

str
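
The three path helpers above build on each other. A minimal sketch of that layering, assuming only standard library calls, could look as follows; the `*_sketch` names are hypothetical and the iteration limit of 10 is an arbitrary safeguard chosen for this example.

```python
import os


def expand_path_sketch(*path):
    # join fragments, then expand variables repeatedly so that variables
    # whose values contain other variables are resolved as well
    joined = os.path.join(*(str(p) for p in path))
    for _ in range(10):
        expanded = os.path.expandvars(joined)
        if expanded == joined:
            break
        joined = expanded
    return joined


def real_path_sketch(*path):
    # resolve symlinks and make the expanded path absolute
    return os.path.realpath(expand_path_sketch(*path))


def ensure_dir_sketch(path):
    # create the directory including missing parents, return the full path
    full = real_path_sketch(path)
    os.makedirs(full, exist_ok=True)
    return full
```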

wget(src, dst, force=False)[source]#

Downloads a file from a remote src to a local destination dst, creating intermediate directories when needed. When dst refers to an existing file, an exception is raised unless force is True.

The full, normalized destination path is returned.

Return type:

str

call_thread(func, args=(), kwargs=None, timeout=None)[source]#

Executes a function func in a thread and aborts the call when timeout is reached. args and kwargs are forwarded to the function.

The return value is a 3-tuple (finished_in_time, func(), err).

Return type:

tuple[bool, Any, str | None]
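
The 3-tuple return value can be illustrated with a small sketch based on `threading`. This is an assumption about one possible approach, not the columnflow implementation; `call_thread_sketch` is a hypothetical name. Note that Python threads cannot be killed, so "aborting" here only means that the caller stops waiting.

```python
import threading


def call_thread_sketch(func, args=(), kwargs=None, timeout=None):
    result = {}

    def target():
        # store either the return value or the error message
        try:
            result["value"] = func(*args, **(kwargs or {}))
        except Exception as exc:
            result["error"] = str(exc)

    # daemon thread so a hanging call does not block interpreter exit
    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    thread.join(timeout)

    finished_in_time = not thread.is_alive()
    return finished_in_time, result.get("value"), result.get("error")


print(call_thread_sketch(lambda x: x + 1, args=(1,)))  # -> (True, 2, None)
```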

call_proc(func, args=(), kwargs=None, timeout=None)[source]#

Executes a function func in a process and aborts the call when timeout is reached. args and kwargs are forwarded to the function.

The return value is a 3-tuple (finished_in_time, func(), err).

Return type:

tuple[bool, Any, str | None]

ensure_proxy(fn, opts, task, *args, **kwargs)[source]#

Law task decorator that checks whether either a voms or arc proxy exists before calling the decorated method.

Return type:

tuple[Callable, Callable, Callable]

dev_sandbox(sandbox, add=True, remove=True)[source]#

Takes a sandbox key sandbox and adds or removes the substring “_dev” right before the file extension (if any), depending on whether the current environment is used for development (see env_is_dev) and the add and remove flags.

If sandbox does not contain the “_dev” postfix and both env_is_dev and add are True, the postfix is appended.

If sandbox does (!) contain the “_dev” postfix, env_is_dev is False and remove is True, the postfix is removed.

In any other case, sandbox is returned unchanged.

Examples:

# if env_is_dev and /path/to/script_dev.sh exists
dev_sandbox("bash::/path/to/script.sh")
# -> "bash::/path/to/script_dev.sh"

# otherwise
dev_sandbox("bash::/path/to/script.sh")
# -> "bash::/path/to/script.sh"
Return type:

str
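
The three rules above can be condensed into a short sketch. It is a simplified assumption: the development flag is passed explicitly as a hypothetical `env_is_dev` argument instead of being read from the environment, and the file existence check mentioned in the example is ignored.

```python
import os


def dev_sandbox_sketch(sandbox, add=True, remove=True, env_is_dev=False):
    # split off the file extension (if any)
    base, ext = os.path.splitext(sandbox)
    has_dev = base.endswith("_dev")

    # rule 1: append the postfix in development environments
    if not has_dev and env_is_dev and add:
        return f"{base}_dev{ext}"

    # rule 2: strip the postfix outside of development environments
    if has_dev and not env_is_dev and remove:
        return f"{base[:-len('_dev')]}{ext}"

    # rule 3: return the key unchanged in any other case
    return sandbox
```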

freeze(cont)[source]#

Constructs an immutable version of a native Python container.

Recursively replaces all mutable containers (dict, list, set) encountered within cont by an immutable equivalent: Lists are converted to tuples, sets to frozenset objects, and dictionaries to tuples of (key, value) pairs.

Return type:

Any
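
The recursive conversion can be sketched in a few lines. `freeze_sketch` is a hypothetical re-implementation of the rules stated above; recursing into existing tuples is an additional assumption of this example.

```python
def freeze_sketch(cont):
    if isinstance(cont, dict):
        # dictionaries become tuples of (key, value) pairs
        return tuple((key, freeze_sketch(value)) for key, value in cont.items())
    if isinstance(cont, (list, tuple)):
        # lists become tuples (tuples are traversed as well, by assumption)
        return tuple(freeze_sketch(item) for item in cont)
    if isinstance(cont, set):
        # sets become frozensets
        return frozenset(freeze_sketch(item) for item in cont)
    return cont


frozen = freeze_sketch({"a": [1, 2, {3}], "b": {"c": [4]}})
# -> (("a", (1, 2, frozenset({3}))), ("b", (("c", (4,)),)))
```

The result is hashable as long as all leaf values are, which makes it usable as a dictionary key or cache key.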

memoize(f)[source]#

Function decorator that implements memoization. Function results are cached on first call and returned from cache on every subsequent call with the same arguments.

Return type:

Callable
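
A minimal sketch of such a decorator is shown below. It assumes that all arguments are hashable; `memoize_sketch` is a hypothetical name and the actual caching strategy in columnflow may differ.

```python
import functools


def memoize_sketch(f):
    cache = {}

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        # build a hashable key from positional and keyword arguments
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = f(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []


@memoize_sketch
def square(x):
    calls.append(x)
    return x * x


square(3)
square(3)
print(calls)  # -> [3], the second call was served from the cache
```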

safe_div(a, b)[source]#

Returns a divided by b if b is not zero, and zero otherwise.

Return type:

float

try_float(f)[source]#

Tests whether a value f can be converted to a float.

Return type:

bool

try_complex(f)[source]#

Tests whether a value f can be converted to a complex number.

Return type:

bool

try_int(i)[source]#

Tests whether a value i can be converted to an integer.

Return type:

bool

maybe_int(i)[source]#

Returns i as an integer if it is a whole number, and as a float otherwise.

Return type:

Any
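
The numeric helpers above are small enough to sketch directly. The following functions are hypothetical re-implementations of the documented behavior, written only for illustration.

```python
def safe_div_sketch(a, b):
    # zero denominator yields zero instead of raising
    return a / b if b else 0.0


def try_float_sketch(f):
    # True if float() accepts the value, False otherwise
    try:
        float(f)
        return True
    except (ValueError, TypeError):
        return False


def maybe_int_sketch(i):
    # whole numbers are returned as int, everything else as float
    f = float(i)
    return int(f) if f.is_integer() else f


print(safe_div_sketch(1, 4))   # -> 0.25
print(safe_div_sketch(1, 0))   # -> 0.0
print(maybe_int_sketch(2.0))   # -> 2
print(maybe_int_sketch(2.5))   # -> 2.5
```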

is_pattern(s)[source]#

Returns True if a string s contains pattern characters such as “*” or “?”, and False otherwise.

Return type:

bool

is_regex(s)[source]#

Returns True if a string s is a regular expression starting with “^” and ending with “$”, and False otherwise.

Return type:

bool

pattern_matcher(pattern, mode=<built-in function any>)[source]#

Takes a string pattern which might be an actual pattern for fnmatching, a regular expression or just a plain string, and returns a function that can be used to test if a string matches that pattern.

When pattern is a sequence, all its patterns are compared the same way and the result is the combination given a mode which typically should be any or all.

Example:

matcher = pattern_matcher("foo*")
matcher("foo123")  # -> True
matcher("bar123")  # -> False

matcher = pattern_matcher(r"^foo\d+.*$")
matcher("foox")  # -> False
matcher("foo1")  # -> True

matcher = pattern_matcher(("foo*", "*bar"), mode=any)
matcher("foo123")  # -> True
matcher("123bar")  # -> True

matcher = pattern_matcher(("foo*", "*bar"), mode=all)
matcher("foo123")     # -> False
matcher("123bar")     # -> False
matcher("foo123bar")  # -> True
Return type:

Callable[[str], bool]
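
How the three kinds of patterns could be dispatched is sketched below, using the `fnmatch` and `re` modules from the standard library. The `*_sketch` functions are hypothetical and only follow the descriptions of is_pattern, is_regex and pattern_matcher given above; the order of the checks is an assumption.

```python
import fnmatch
import re


def is_pattern_sketch(s):
    return "*" in s or "?" in s


def is_regex_sketch(s):
    return s.startswith("^") and s.endswith("$")


def pattern_matcher_sketch(pattern, mode=any):
    # sequences: build one matcher per pattern and combine the results with mode
    if isinstance(pattern, (list, tuple, set)):
        matchers = [pattern_matcher_sketch(p) for p in pattern]
        return lambda s: mode(matcher(s) for matcher in matchers)

    # regular expressions are checked first since they may contain "*" or "?"
    if is_regex_sketch(pattern):
        cre = re.compile(pattern)
        return lambda s: cre.match(s) is not None

    # fnmatch-style patterns
    if is_pattern_sketch(pattern):
        return lambda s: fnmatch.fnmatchcase(s, pattern)

    # plain strings require an exact match
    return lambda s: s == pattern
```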

dict_add_strict(d, key, value)[source]#

Adds a key-value pair to a dictionary, but only if it does not change an existing value; raises a KeyError otherwise.

Return type:

None
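
The strict insertion rule can be sketched as follows. `dict_add_strict_sketch` is a hypothetical name and the error message is made up for this example.

```python
def dict_add_strict_sketch(d, key, value):
    # re-adding an identical value is fine, changing an existing one is not
    if key in d and d[key] != value:
        raise KeyError(f"cannot overwrite key {key!r}: {d[key]!r} != {value!r}")
    d[key] = value


d = {"a": 1}
dict_add_strict_sketch(d, "b", 2)  # new key, added
dict_add_strict_sketch(d, "a", 1)  # same value, no error
```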

get_source_code(obj, indent=None)[source]#

Returns the source code of any object obj as a string. When indent is not None, the code indentation is first removed and then re-applied with indent if it is a string, or by that many spaces in case it is an integer.

Return type:

str

class DotDict[source]#

Bases: OrderedDict

Subclass of OrderedDict that provides read and write access to items via attributes by implementing __getattr__ and __setattr__. In case an item is accessed via attribute and it does not exist, an AttributeError is raised rather than a KeyError. Example:

d = DotDict()
d["foo"] = 1

print(d["foo"])
# => 1

print(d.foo)
# => 1

print(d["bar"])
# => KeyError

print(d.bar)
# => AttributeError

d.bar = 123
print(d.bar)
# => 123

# use wrap() to convert a nested dict
d = DotDict.wrap({"foo": {"bar": 1}})
print(d.foo.bar)
# => 1

Methods:

__getattr__(attr)

rtype:

Any

__setattr__(attr, value)

Implement setattr(self, name, value).

copy()

rtype:

DotDict

wrap(*args, **kwargs)

Takes a dictionary d and recursively replaces it and all other nested dictionary types with DotDict instances for deep attribute-style access.

Attributes:

__getattr__(attr)[source]#
Return type:

Any

__setattr__(attr, value)[source]#

Implement setattr(self, name, value).

Return type:

None

copy()[source]#
Return type:

DotDict

classmethod wrap(*args, **kwargs)[source]#

Takes a dictionary d and recursively replaces it and all other nested dictionary types with DotDict instances for deep attribute-style access.

Return type:

DotDict

class MockModule(name)[source]#

Bases: object

Mockup object that resembles a module with arbitrarily deep structure such that, e.g.,

coffea = MockModule("coffea")
print(coffea.nanoevents.NanoEventsArray)
# -> "<MockupModule 'coffea' at 0x981jald1>"

will always succeed at declaration, but most likely fail at execution time. In fact, each attribute access will return the mock object again. This might only be useful in places where a module might not exist (e.g. due to sandboxing) but one wants to import it either way a) to perform only one top-level import as opposed to imports in all functions of a package, or b) to provide type hints for documentation purposes.

_name#

type: str

The name of the mock module.

Methods:

__init__(name)

__getattr__(attr)

rtype:

MockModule

__repr__()

Return repr(self).

__call__(*args, **kwargs)

Call self as a function.

__nonzero__()

rtype:

bool

__bool__()

rtype:

bool

__or__(other)

rtype:

Any

Attributes:

__dict__

__module__

__weakref__

list of weak references to the object (if defined)

__init__(name)[source]#
__getattr__(attr)[source]#
Return type:

MockModule

__repr__()[source]#

Return repr(self).

Return type:

str

__call__(*args, **kwargs)[source]#

Call self as a function.

Return type:

None

__nonzero__()[source]#
Return type:

bool

__bool__()[source]#
Return type:

bool

__or__(other)[source]#
Return type:

Any

__weakref__#

list of weak references to the object (if defined)

class FunctionArgs(*args, **kwargs)[source]#

Bases: object

Light-weight utility class that wraps all passed args and kwargs and allows invoking different functions with them.
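
The idea of storing arguments once and applying them to several functions can be sketched like this. `FunctionArgsSketch` is a hypothetical class that mirrors the documented `__init__` and `__call__` signatures; the attribute names are assumptions.

```python
class FunctionArgsSketch:
    def __init__(self, *args, **kwargs):
        # remember the arguments for later invocations
        self.args = args
        self.kwargs = kwargs

    def __call__(self, func):
        # invoke any function with the stored arguments
        return func(*self.args, **self.kwargs)


fa = FunctionArgsSketch(2, 5)
print(fa(pow))  # -> 32
print(fa(max))  # -> 5
```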

Methods:

__init__(*args, **kwargs)

__call__(func)

Call self as a function.

Attributes:

__dict__

__module__

__weakref__

list of weak references to the object (if defined)

__init__(*args, **kwargs)[source]#
__call__(func)[source]#

Call self as a function.

Return type:

Any

__weakref__#

list of weak references to the object (if defined)

class ClassPropertyDescriptor(fget, fset=None)[source]#

Bases: object

Generic descriptor class that is used by classproperty().

Methods:

__init__(fget[, fset])

__get__(obj[, cls])

rtype:

Any

__set__(obj, value)

rtype:

None

Attributes:

__dict__

__module__

__weakref__

list of weak references to the object (if defined)

__init__(fget, fset=None)[source]#
__get__(obj, cls=None)[source]#
Return type:

Any

__set__(obj, value)[source]#
Return type:

None

__weakref__#

list of weak references to the object (if defined)

classproperty(func)[source]#

Property decorator for class-level methods.

Return type:

ClassPropertyDescriptor
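
A read-only class-level property can be built on the descriptor protocol, as the following sketch shows. It is an illustration under assumptions, not the columnflow code: `ClassPropertySketch` and `classproperty_sketch` are hypothetical names, and setter support (fset) is left out.

```python
class ClassPropertySketch:
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, cls=None):
        # works for access on the class and on instances
        if cls is None:
            cls = type(obj)
        return self.fget(cls)


def classproperty_sketch(func):
    return ClassPropertySketch(func)


class Base:
    @classproperty_sketch
    def cls_name(cls):
        return cls.__name__


class Child(Base):
    pass


print(Base.cls_name)   # -> Base
print(Child.cls_name)  # -> Child
```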

class DerivableMeta(cls_name: str, bases: tuple, cls_dict: dict)[source]#

Bases: ABCMeta

Meta class for Derivable objects providing class-level features such as improved tracing and lookup of subclasses, and single-line subclassing for partial-like overwriting of class-level attributes.

Methods:

__new__(metacls, cls_name, bases, cls_dict)

Class creation.

has_cls(cls_name[, deep])

Returns True if this class has a subclass named cls_name and False otherwise.

get_cls(cls_name[, deep, silent])

Returns a previously created subclass named cls_name.

derive(cls_name[, bases, cls_dict, module])

Creates a subclass named cls_name inheriting from this class and additional, optional bases.

derived_by(other)

Returns True if a class other is either this or derived from this class, and False otherwise.

Attributes:

static __new__(metacls, cls_name, bases, cls_dict)[source]#

Class creation.

Return type:

DerivableMeta

has_cls(cls_name, deep=True)[source]#

Returns True if this class has a subclass named cls_name and False otherwise. When deep is True, the lookup is recursive through all levels of subclasses.

Return type:

bool

get_cls(cls_name, deep=True, silent=False)[source]#

Returns a previously created subclass named cls_name.

When deep is True, the lookup is recursive through all levels of subclasses. When no such subclass was found an exception is raised, unless silent is True in which case None is returned.

Parameters:
  • cls_name (str) – Name of the subclass to load

  • deep (bool, default: True) – Search for the subclass cls_name throughout the whole inheritance tree of this class (True) or just in the direct inheritance line (False)

  • silent (bool, default: False) – If True, return None when no subclass cls_name was found, otherwise raise an error

Raises:
  • ValueError – If deep is False and the name cls_name is not found in the direct line of subclasses of this class

  • ValueError – If deep is True and the name cls_name is not found at any level of the inheritance tree starting at this class

Return type:

DerivableMeta | None

Returns:

The requested subclass

derive(cls_name, bases=(), cls_dict=None, module=None)[source]#

Creates a subclass named cls_name inheriting from this class and additional, optional bases.

cls_dict will be attached as class-level attributes.

Parameters:
  • cls_name (str) – Name of the newly-derived class

  • bases (tuple, default: ()) – Additional bases to derive new class from

  • cls_dict (dict[str, Any] | None, default: None) – Dictionary to forward to init function of derived class

  • module (str | None, default: None) – extract module name from this module

Return type:

DerivableMeta

Returns:

Newly derived class instance

derived_by(other)[source]#

Returns True if a class other is either this or derived from this class, and False otherwise.

Return type:

bool
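
Single-line subclassing and subclass lookup can be illustrated with a reduced metaclass. The sketch below is an assumption-based simplification: it derives from `type` instead of ABCMeta, omits the module argument and derived_by, and uses hypothetical names (`DerivableMetaSketch`, `Producer`, `_subclasses` as a per-class registry).

```python
class DerivableMetaSketch(type):
    def __new__(metacls, cls_name, bases, cls_dict):
        cls = super().__new__(metacls, cls_name, bases, cls_dict)
        # each class keeps a registry of its direct subclasses by name
        cls._subclasses = {}
        for base in bases:
            if isinstance(base, metacls):
                base._subclasses[cls_name] = cls
        return cls

    def get_cls(cls, cls_name, deep=True, silent=False):
        if cls_name in cls._subclasses:
            return cls._subclasses[cls_name]
        if deep:
            # recursive lookup through all levels of subclasses
            for sub in cls._subclasses.values():
                found = sub.get_cls(cls_name, deep=True, silent=True)
                if found is not None:
                    return found
        if silent:
            return None
        raise ValueError(f"no subclass named {cls_name!r} found")

    def has_cls(cls, cls_name, deep=True):
        return cls.get_cls(cls_name, deep=deep, silent=True) is not None

    def derive(cls, cls_name, bases=(), cls_dict=None):
        # create the subclass through the metaclass in a single call
        return type(cls)(cls_name, (cls,) + tuple(bases), dict(cls_dict or {}))


class Producer(metaclass=DerivableMetaSketch):
    scale = 1.0


Scaled = Producer.derive("Scaled", cls_dict={"scale": 2.0})
Nested = Scaled.derive("Nested")

print(Producer.get_cls("Nested") is Nested)        # -> True
print(Producer.has_cls("Nested", deep=False))      # -> False
```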

class Derivable[source]#

Bases: object

Derivable base class with features provided by the meta DerivableMeta.

classattribute cls_name#

type: str read-only

A shorthand to access the name of the class.

Attributes:

cls_name

__abstractmethods__

__dict__

__module__

__weakref__

list of weak references to the object (if defined)

cls_name = 'Derivable'#
__abstractmethods__ = frozenset({})#
__weakref__#

list of weak references to the object (if defined)

class KeyValueMessage(*args, key, value, **kwargs)[source]#

Bases: SchedulerMessage

Subclass of luigi.worker.SchedulerMessage that adds key and value attributes, parsed from the incoming message assuming a format key = value.

Attributes:

Methods:

from_message(message)

Factory for KeyValueMessage instances that takes an existing message object and splits its content into a key value pair.

__init__(*args, key, value, **kwargs)

__str__()

Return str(self).

message_cre = re.compile('^\\s*([^\\=\\:]+)\\s*(\\=|\\:)\\s*(.*)\\s*$')#
classmethod from_message(message)[source]#

Factory for KeyValueMessage instances that takes an existing message object and splits its content into a key value pair. The instance is returned if the parsing is successful, and None otherwise.

Return type:

KeyValueMessage | None
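
The parsing step can be tried out in isolation with the message_cre pattern listed above. The helper `parse_key_value` is hypothetical and operates on plain strings instead of scheduler message objects; stripping whitespace from key and value is an assumption of this sketch.

```python
import re

# pattern as shown for the message_cre attribute
message_cre = re.compile(r"^\s*([^\=\:]+)\s*(\=|\:)\s*(.*)\s*$")


def parse_key_value(content):
    m = message_cre.match(content)
    if m is None:
        # mirrors the documented behavior of returning None on failure
        return None
    return m.group(1).strip(), m.group(3).strip()


print(parse_key_value("workers = 4"))   # -> ('workers', '4')
print(parse_key_value("branch: 1,2"))   # -> ('branch', '1,2')
print(parse_key_value("no separator"))  # -> None
```

Note that the pattern accepts both "=" and ":" as separators between key and value.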

__init__(*args, key, value, **kwargs)[source]#
__str__()[source]#

Return str(self).

Return type:

str

load_correction_set(target)[source]#

Loads a correction set using the correctionlib from a file target.

Return type:

Any