Modules

plumbium is a module for recording the activity of file processing pipelines.

Main plumbium module containing the Pipeline class and function recording methods.

class plumbium.processresult.OutputRecorder

Holds commands used via the call function and their resulting output.

reset()

Clear the stored commands and output.

class plumbium.processresult.Pipeline

Main class managing the recording of a processing pipeline.

record(process)

Record a process in this pipeline.

Parameters:process (plumbium.processresult.ProcessOutput) – The new result.
run(name, pipeline_func, base_dir, *inputs, **kwargs)

Execute a function as a recorded pipeline.

Parameters:
  • name (str) – The name of the pipeline - used to name the output file.
  • pipeline_func (function) – The function to be run.
  • base_dir (str) – The directory in which to save the pipeline output, also used as the root directory for input filenames if the filenames given are not absolute.
  • *inputs – The inputs to the pipeline.
Keyword Arguments:
 
  • metadata (dict) – Additional information to be included in the result JSON.
  • filename (str) – String template for the result filename.
  • result_recorder (object) – An instance of a class implementing a write() method that accepts the report dictionary.
  • result_names (iterable of str) – The names for any values returned by the pipeline.
  • report_name (str) – Filename for the JSON report (default: report.json).
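
As described above, result_recorder only needs a write() method that accepts the report dictionary. The following is a minimal sketch of such an object. The class name ListRecorder and the sample report keys are hypothetical; a real report will contain whatever keys plumbium writes.

```python
class ListRecorder(object):
    """Hypothetical recorder that keeps every report it receives in memory."""

    def __init__(self):
        self.reports = []

    def write(self, results):
        # 'results' stands for the report dictionary handed over after a run.
        self.reports.append(dict(results))


recorder = ListRecorder()
# Stand-in report; these keys are illustrative only.
recorder.write({'name': 'example', 'metadata': {'subject': 1}})
```

An instance like this would be passed to run() as result_recorder=recorder.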
save(exception=None, report_name='report.json')

Save a record of the pipeline execution.

Creates a JSON file with information about the pipeline then saves it to a gzipped tar file along with all files used in the pipeline.

Keyword Arguments:
 exception (exceptions.Exception or None) – The exception which caused the pipeline run to fail.
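
The general idea of writing a JSON report into a gzipped tar file together with the pipeline's files can be sketched with the standard library. This is not plumbium's implementation: save_archive, its arguments and the archive layout are assumptions made for illustration.

```python
import io
import json
import os
import tarfile
import tempfile


def save_archive(archive_path, report, files, report_name='report.json'):
    """Write `report` as JSON, plus each path in `files`, into a .tar.gz."""
    with tarfile.open(archive_path, 'w:gz') as archive:
        payload = json.dumps(report, indent=2).encode('utf-8')
        info = tarfile.TarInfo(name=report_name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
        for path in files:
            # Store each file under its bare name (an assumed layout).
            archive.add(path, arcname=os.path.basename(path))


work_dir = tempfile.mkdtemp()
data_file = os.path.join(work_dir, 'input.txt')
with open(data_file, 'w') as handle:
    handle.write('example input\n')

archive_path = os.path.join(work_dir, 'example.tar.gz')
save_archive(archive_path, {'name': 'example'}, [data_file])

# Read the archive back to list what was stored.
with tarfile.open(archive_path, 'r:gz') as archive:
    archived_names = sorted(archive.getnames())
```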
class plumbium.processresult.ProcessOutput(func, args, kwargs, commands, output, exception, started, finished, **output_images)

A record of one stage within a pipeline.

Parameters:
  • func (function) – The function that was run.
  • args (list) – The arguments passed to the function.
  • kwargs (dict) – The keyword arguments passed to the function.
  • output (str) – Text printed to stdout or stderr during execution.
  • exception (exceptions.Exception or None) – The exception that occurred running the stage if applicable.
  • started (datetime.datetime) – When the stage was started.
  • finished (datetime.datetime) – When the stage finished executing.
  • **output_images (plumbium.artefacts.Artefact) – Images produced by the stage.
__getitem__(key)

Get the item corresponding to key in the _results dictionary.

__iter__()

Get an iterable over the keys in the _results dictionary.

__len__()

Get the length of the _results dictionary.

as_dict()

Serialize this output as a dict.
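
The three special methods above give the record a read-only, dictionary-like interface. The sketch below shows that pattern using collections.abc.Mapping. StageResult is a hypothetical stand-in, and whether ProcessOutput itself derives from Mapping is not stated here.

```python
from collections.abc import Mapping


class StageResult(Mapping):
    """Hypothetical record backed by a private _results dictionary."""

    def __init__(self, **results):
        self._results = dict(results)

    def __getitem__(self, key):
        return self._results[key]

    def __iter__(self):
        return iter(self._results)

    def __len__(self):
        return len(self._results)


stage = StageResult(mask='mask.nii.gz', log='stage.txt')
names = sorted(stage)      # iterating yields the keys
first = stage['mask']      # lookup by output name
```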

plumbium.processresult.call(cmd, cwd=None, shell=False)

Execute scripts and applications in a pipeline with output capturing.

Parameters:
  • cmd (list) – List containing the program to be called and any arguments e.g. ['tar', '-x', '-f', 'file.tgz'].
  • cwd (str) – Working directory in which to execute the command.
  • shell (bool) – Execute the command in a shell.
Returns:

The output from the called command on stdout and stderr.

Return type:

str
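
Capturing stdout and stderr together as one string can be sketched with the standard subprocess module. run_and_capture is a hypothetical helper, not the body of call; raising on a non-zero exit status is an assumption of this sketch.

```python
import subprocess
import sys


def run_and_capture(cmd, cwd=None, shell=False):
    """Run cmd and return stdout and stderr merged into one string."""
    completed = subprocess.run(
        cmd,
        cwd=cwd,
        shell=shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the stdout stream
        universal_newlines=True,   # decode bytes to str
        check=True,                # assumed: raise CalledProcessError on failure
    )
    return completed.stdout


text = run_and_capture([sys.executable, '-c', 'print("hello")'])
```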

plumbium.processresult.record(*output_names)

Decorator for wrapping pipeline stages.

Parameters:*output_names (str) – The names of each returned variable.
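
A decorator that takes output names and pairs them with a function's return values might look like the sketch below. name_outputs and summarise are hypothetical names, and the real decorator also records the stage in the pipeline, which this sketch does not attempt.

```python
import functools


def name_outputs(*output_names):
    """Hypothetical decorator: pair each returned value with a name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            returned = func(*args, **kwargs)
            if not isinstance(returned, tuple):
                returned = (returned,)
            if len(returned) != len(output_names):
                raise ValueError('expected %d outputs, got %d'
                                 % (len(output_names), len(returned)))
            return dict(zip(output_names, returned))
        return wrapper
    return decorator


@name_outputs('total', 'count')
def summarise(values):
    return sum(values), len(values)


summary = summarise([1, 2, 3])
```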

Module containing the plumbium.artefacts.Artefact base class and subclasses.

class plumbium.artefacts.Artefact(filename, extension, exists=True)

Base class for Plumbium artefacts (files consumed by and generated by processes).

Parameters:
  • filename (str) – The filename of the artefact.
  • extension (str) – The extension of the artefact’s filename.
Keyword Arguments:
 exists (boolean) – If true, raise an exception if the file does not exist.

Raises:
  • exceptions.ValueError – If filename does not end with extension.
  • exceptions.IOError – If filename does not exist.
abspath

The file’s absolute path.

basename

The filename without the extension.

>>> Artefact('/dir/file.txt', '.txt', exists=False).basename
'/dir/file'
checksum()

Calculate the SHA-1 checksum of the file.
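
For reference, a SHA-1 checksum of a file can be computed with hashlib as sketched below. sha1_of_file and the chunk size are assumptions made for illustration, and the format in which checksum() returns its digest is not stated here.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, 'rb') as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Write a small temporary file, hash it, then clean up.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'abc')
sample_digest = sha1_of_file(sample_path)
os.remove(sample_path)
```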

dereference()

Remove any directory components from the filename.

>>> a = Artefact('/dir/file.txt', '.txt', exists=False)
>>> a.dereference()
>>> a.filename
'file.txt'
dirname

Return the directory component of the filename.

>>> Artefact('/dir/file.txt', '.txt', exists=False).dirname
'/dir'
exists()

Return True if Artefact.filename exists.

filename

The artefact’s filename.

justname

The filename without the extension and directory components.

>>> Artefact('/dir/file.txt', '.txt', exists=False).justname
'file'
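
The difference between the path attributes is easiest to see with a multi-part extension such as .nii.gz, where os.path.splitext would only remove .gz. The sketch below reproduces the three values with os.path; strip_extension is a hypothetical helper, not part of plumbium.

```python
import os


def strip_extension(filename, extension):
    """Drop a known extension, which may contain several dots."""
    if not filename.endswith(extension):
        raise ValueError('%s does not end with %s' % (filename, extension))
    return filename[:-len(extension)]


path = '/dir/scan.nii.gz'
basename = strip_extension(path, '.nii.gz')                    # directory kept
justname = os.path.basename(strip_extension(path, '.nii.gz'))  # name only
dirname = os.path.dirname(path)                                # directory only
```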
class plumbium.artefacts.NiiGzImage(filename, exists=True)

An artefact for .nii.gz images.

Parameters:filename (str) – The filename of the artefact.
Keyword Arguments:
 exists (boolean) – If true raise an exception if the file does not exist.
class plumbium.artefacts.TextFile(filename, exists=True)

An artefact for .txt files.

Parameters:filename (str) – The filename of the artefact.
Keyword Arguments:
 exists (boolean) – If true raise an exception if the file does not exist.

Module containing the get_environment function.

plumbium.environment.get_environment()

Obtain information about the executing environment.

Captures:
  • installed Python packages using pip (if available),
  • hostname
  • uname
  • environment variables
Returns:a dict with the keys python_packages, hostname, uname and environ
Return type:dict
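
A dictionary with the same four keys can be assembled from the standard library, as sketched below. describe_environment is a hypothetical function. It lists packages with importlib.metadata rather than pip, and the exact format of each value returned by get_environment is not stated here.

```python
import os
import platform
import socket

try:
    from importlib import metadata  # available from Python 3.8
except ImportError:
    metadata = None


def describe_environment():
    """Collect package, host and environment details into a dict."""
    packages = []
    if metadata is not None:
        for dist in metadata.distributions():
            packages.append('%s==%s' % (dist.metadata['Name'], dist.version))
    return {
        'python_packages': sorted(packages),
        'hostname': socket.gethostname(),
        'uname': list(platform.uname()),
        'environ': dict(os.environ),
    }


environment = describe_environment()
```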

Module containing classes for recording results to files and databases.

class plumbium.recorders.CSVFile(path, values)

Records results to a CSV file.

Parameters:
  • path (str) – The file to which results should be written
  • values (dict) – a mapping from table columns to values
write(results)

Write results to the file specified.

Parameters:results (dict) – A dictionary of results to record

Note

If the specified file does not exist it will be created and a header will be written; otherwise the new result is appended.
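
The create-with-header, otherwise-append behaviour described in the note can be sketched with csv.DictWriter. append_row is a hypothetical helper that takes a ready-made row, so it does not show how CSVFile derives column values from the results dictionary.

```python
import csv
import os
import tempfile


def append_row(path, row):
    """Append `row` (a dict) to a CSV file, writing a header on creation."""
    is_new = not os.path.exists(path)
    with open(path, 'a', newline='') as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


csv_path = os.path.join(tempfile.mkdtemp(), 'results.csv')
append_row(csv_path, {'subject': 1, 'volume': 10.5})  # creates file and header
append_row(csv_path, {'subject': 2, 'volume': 11.0})  # appends only

with open(csv_path, newline='') as handle:
    rows = list(csv.reader(handle))
```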

class plumbium.recorders.SQLDatabase(uri, table, values)

Record results to a database supported by SQLAlchemy.

Parameters:
  • uri (str) – database server URI e.g. mysql://username:password@localhost/dbname
  • table (str) – table name
  • values (dict) – a mapping from database table columns to values
write(results)

Write the results to the database table specified at initialisation.

Parameters:results (dict) – A dictionary of results to record
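
Writing one row of column values to a table can be sketched as follows. This sketch uses the standard library's sqlite3 module in place of SQLAlchemy, and insert_row, the results table and its columns are all hypothetical. Table and column names are interpolated into the statement, so they must come from trusted code; values are passed as bound parameters.

```python
import sqlite3


def insert_row(connection, table, row):
    """Insert `row` (column -> value) into `table` using bound parameters."""
    columns = list(row)
    statement = 'INSERT INTO "%s" (%s) VALUES (%s)' % (
        table,
        ', '.join('"%s"' % column for column in columns),
        ', '.join('?' for _ in columns),
    )
    with connection:  # commits on success, rolls back on error
        connection.execute(statement, [row[column] for column in columns])


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE results (subject INTEGER, volume REAL)')
insert_row(connection, 'results', {'subject': 1, 'volume': 10.5})
stored = connection.execute('SELECT subject, volume FROM results').fetchall()
```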
class plumbium.recorders.MongoDB(uri, database, collection)

Records results to a MongoDB database.

Parameters:
  • uri (str) – MongoDB server URI e.g. mongodb://localhost:27017
  • database (str) – database name
  • collection (str) – collection name

Note

Use of this class requires the installation of the pymongo module.

See also

MongoDB tutorial

write(results)

Insert results into the database.

class plumbium.recorders.StdOut(values)

Print results to stdout.

Parameters:values (dict) – key-value pairs to be printed
write(results)

Print the results to stdout.

class plumbium.recorders.Slack(url, channel, values)

Send a Slack notification when a pipeline completes.

Parameters:
  • url (str) – Slack Webhook URL
  • channel (str) – The channel name to post to
  • values (dict) – A mapping of result keys to report

Note

Use of this class requires the installation of the slackclient module.

write(results)

Send a message to Slack.

Parameters:results (dict) – A dictionary of results to record

Exposes the CSVFile result recorder.

class plumbium.recorders.csvfile.CSVFile(path, values)

Records results to a CSV file.

Parameters:
  • path (str) – The file to which results should be written
  • values (dict) – a mapping from table columns to values
write(results)

Write results to the file specified.

Parameters:results (dict) – A dictionary of results to record

Note

If the specified file does not exist it will be created and a header will be written; otherwise the new result is appended.

Exposes the MongoDB recorder class.

class plumbium.recorders.mongodb.MongoDB(uri, database, collection)

Records results to a MongoDB database.

Parameters:
  • uri (str) – MongoDB server URI e.g. mongodb://localhost:27017
  • database (str) – database name
  • collection (str) – collection name

Note

Use of this class requires the installation of the pymongo module.

See also

MongoDB tutorial

write(results)

Insert results into the database.

Exposes the Slack result recorder.

class plumbium.recorders.slack.Slack(url, channel, values)

Send a Slack notification when a pipeline completes.

Parameters:
  • url (str) – Slack Webhook URL
  • channel (str) – The channel name to post to
  • values (dict) – A mapping of result keys to report

Note

Use of this class requires the installation of the slackclient module.

write(results)

Send a message to Slack.

Parameters:results (dict) – A dictionary of results to record

Exposes the SQLDatabase result recorder.

class plumbium.recorders.sqldatabase.SQLDatabase(uri, table, values)

Record results to a database supported by SQLAlchemy.

Parameters:
  • uri (str) – database server URI e.g. mysql://username:password@localhost/dbname
  • table (str) – table name
  • values (dict) – a mapping from database table columns to values
write(results)

Write the results to the database table specified at initialisation.

Parameters:results (dict) – A dictionary of results to record

Exposes the StdOut recorder.

class plumbium.recorders.stdout.StdOut(values)

Print results to stdout.

Parameters:values (dict) – key-value pairs to be printed
write(results)

Print the results to stdout.