Modules
plumbium is a module for recording the activity of file processing pipelines.
Main plumbium module containing the Pipeline class and function recording methods.
class plumbium.processresult.OutputRecorder
Holds commands used via the call function and their resulting output.
reset()
Clear the stored commands and output.
class plumbium.processresult.Pipeline
Main class managing the recording of a processing pipeline.
record(process)
Record a process in this pipeline.
Parameters: process (plumbium.processresult.ProcessOutput) – The new result.
run(name, pipeline_func, base_dir, *inputs, **kwargs)
Execute a function as a recorded pipeline.
Parameters: - name (str) – The name of the pipeline, used to name the output file.
- pipeline_func (function) – The function to be run.
- base_dir (str) – The directory in which to save the pipeline output, also used as the root directory for input filenames if the filenames given are not absolute.
- *inputs – The inputs to the pipeline.
Keyword Arguments: - metadata (dict) – Additional information to be included in the result JSON.
- filename (str) – String template for the result filename.
- result_recorder (object) – An instance of a class implementing a write() method that accepts the report dictionary.
- result_names (iterable of str) – The names for any values returned by the pipeline.
- report_name (str) – Filename for the JSON report (default: report.json).
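The result_recorder keyword only requires an object with a write() method that accepts the report dictionary. A minimal sketch of such an object follows. InMemoryRecorder and the sample report keys are hypothetical and are not part of plumbium.

```python
class InMemoryRecorder:
    """Hypothetical recorder: keeps every report it is given in a list."""

    def __init__(self):
        self.reports = []

    def write(self, report):
        # Store a shallow copy so later changes to the caller's dict
        # do not alter what was recorded.
        self.reports.append(dict(report))


recorder = InMemoryRecorder()
# The keys below are made up for illustration only.
recorder.write({"name": "example", "metadata": {"subject": "001"}})
```

An instance like this could then be supplied as result_recorder=recorder when calling run.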
save(exception=None, report_name='report.json')
Save a record of the pipeline execution.
Creates a JSON file with information about the pipeline, then saves it to a gzipped tar file along with all files used in the pipeline.
Keyword Arguments: exception (exceptions.Exception or None) – The exception which caused the pipeline run to fail.
class plumbium.processresult.ProcessOutput(func, args, kwargs, commands, output, exception, started, finished, **output_images)
A record of one stage within a pipeline.
Parameters: - func (function) – The function that was run.
- args (list) – The arguments passed to the function.
- kwargs (dict) – The keyword arguments passed to the function.
- output (str) – Text printed to stdout or stderr during execution.
- exception (exceptions.Exception or None) – The exception that occurred while running the stage, if applicable.
- started (datetime.datetime) – When the stage was started.
- finished (datetime.datetime) – When the stage finished executing.
- **output_images (plumbium.artefacts.Artefact) – Images produced by the stage.
__getitem__(key)
Get the item corresponding to key in the _results dictionary.
__iter__()
Get an iterable over the keys in the _results dictionary.
__len__()
Get the length of the _results dictionary.
as_dict()
Serialize this output as a dict.
plumbium.processresult.call(cmd, cwd=None, shell=False)
Execute scripts and applications in a pipeline with output capturing.
Parameters: - cmd (list) – List containing the program to be called and any arguments, e.g. ['tar', '-x', '-f', 'file.tgz'].
- cwd (str) – Working directory in which to execute the command.
- shell (bool) – Execute the command in a shell.
Returns: The output from the called command on stdout and stderr.
Return type: str
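To illustrate what capturing combined stdout and stderr looks like, here is a minimal sketch built on the standard library's subprocess module. It is not plumbium's implementation: run_and_capture is a hypothetical name, and raising on a non-zero exit status (check=True) is an assumption of this sketch.

```python
import subprocess
import sys


def run_and_capture(cmd, cwd=None, shell=False):
    """Run cmd and return its combined stdout and stderr as text."""
    completed = subprocess.run(
        cmd,
        cwd=cwd,
        shell=shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout stream
        universal_newlines=True,   # decode the captured bytes to str
        check=True,                # assumption: fail loudly on non-zero exit
    )
    return completed.stdout


# Use the current interpreter so the example does not depend on other tools.
captured = run_and_capture([sys.executable, "-c", "print('hello')"])
```

Passing the command as a list, as in the tar example above, avoids shell quoting problems when shell is left at False.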
plumbium.processresult.record(*output_names)
Decorator for wrapping pipeline stages.
Parameters: *output_names (str) – The names of each returned variable.
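The idea of naming each returned variable can be sketched with a plain decorator. The example below is illustrative only: name_outputs and summarise are hypothetical, and how plumbium's own record decorator stores or reports results is not described here.

```python
import functools


def name_outputs(*output_names):
    """Hypothetical decorator: label a function's return values by name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Treat a single return value as a one-element tuple.
            values = result if isinstance(result, tuple) else (result,)
            if len(values) != len(output_names):
                raise ValueError(
                    "expected %d outputs, got %d"
                    % (len(output_names), len(values))
                )
            return dict(zip(output_names, values))
        return wrapper
    return decorator


@name_outputs("total", "count")
def summarise(numbers):
    return sum(numbers), len(numbers)


summary = summarise([1, 2, 3])
```

Each name is paired with the return value in the same position, so the order of the names must match the order of the returned values.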
Module containing the plumbium.artefacts.Artefact base class and subclasses.
class plumbium.artefacts.Artefact(filename, extension, exists=True)
Base class for Plumbium artefacts (files consumed by and generated by processes).
Parameters: - filename (str) – The filename of the artefact.
- extension (str) – The extension of the artefact’s filename.
Keyword Arguments: exists (boolean) – If true, raise an exception if the file does not exist.
Raises: - exceptions.ValueError – If filename does not end with extension.
- exceptions.IOError – If filename does not exist.
abspath
The file’s absolute path.
basename
The filename without the extension (directory components are kept).
>>> Artefact('/dir/file.txt', '.txt').basename
'/dir/file'
checksum()
Calculate the SHA-1 checksum of the file.
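A SHA-1 checksum of a file can be computed with the standard library's hashlib module. The sketch below shows one common way to do it and is not taken from plumbium's source. sha1_of_file and the chunk size are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA-1 digest of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Write a small temporary file, hash it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
checksum = sha1_of_file(tmp.name)
os.remove(tmp.name)
```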
dereference()
Remove any directory components from the filename.
>>> a = Artefact('/dir/file.txt', '.txt')
>>> a.dereference()
>>> a.filename
'file.txt'
dirname
Return the directory component of the filename.
>>> Artefact('/dir/file.txt', '.txt').dirname
'/dir'
exists()
Return True if Artefact.filename exists.
filename
The artefact’s filename.
justname
The filename without the extension and directory components.
>>> Artefact('/dir/file.txt', '.txt').justname
'file'
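The relationship between basename, dirname and justname can be reproduced with os.path. This is a sketch, not plumbium's code: split_artefact_name is a hypothetical helper, and it strips the extension explicitly so that multi-part extensions such as .nii.gz are handled. POSIX-style paths are assumed.

```python
import os


def split_artefact_name(filename, extension):
    """Return (basename, dirname, justname) for filename."""
    if not filename.endswith(extension):
        raise ValueError(
            "%r does not end with %r" % (filename, extension)
        )
    # Drop the extension but keep any directory components.
    basename = filename[:-len(extension)]
    # Directory component only.
    dirname = os.path.dirname(filename)
    # Neither extension nor directory components.
    justname = os.path.basename(basename)
    return basename, dirname, justname


text_parts = split_artefact_name("/dir/file.txt", ".txt")
image_parts = split_artefact_name("/dir/scan.nii.gz", ".nii.gz")
```

Using os.path.splitext instead would only remove the final .gz from a .nii.gz filename, which is why the extension is passed in.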
class plumbium.artefacts.NiiGzImage(filename, exists=True)
An artefact for .nii.gz images.
Parameters: filename (str) – The filename of the artefact.
Keyword Arguments: exists (boolean) – If true, raise an exception if the file does not exist.
class plumbium.artefacts.TextFile(filename, exists=True)
An artefact for .txt files.
Parameters: filename (str) – The filename of the artefact.
Keyword Arguments: exists (boolean) – If true, raise an exception if the file does not exist.
Module containing the get_environment function.
plumbium.environment.get_environment()
Obtain information about the executing environment.
Captures:
- installed Python packages using pip (if available)
- hostname
- uname
- environment variables
Returns: a dict with the keys python_packages, hostname, uname and environ
Return type: dict
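A dictionary with the same four keys can be assembled from the standard library, as sketched below. This is an illustration, not plumbium's implementation: describe_environment is a hypothetical name, and importlib.metadata is used in place of pip to list installed packages, so the format of the package list may differ.

```python
import importlib.metadata
import os
import platform
import socket


def describe_environment():
    """Collect basic facts about the machine and Python installation."""
    # Assumption: importlib.metadata stands in for pip here.
    packages = sorted(
        "%s==%s" % (dist.metadata["Name"], dist.version)
        for dist in importlib.metadata.distributions()
    )
    return {
        "python_packages": packages,
        "hostname": socket.gethostname(),
        "uname": list(platform.uname()),
        "environ": dict(os.environ),
    }


environment = describe_environment()
```

Environment variables can contain credentials, so a report built this way may need filtering before it is shared.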
Module containing functions for recording results to files and databases.
class plumbium.recorders.CSVFile(path, values)
Records results to a CSV file.
Parameters: - path (str) – The file to which results should be written
- values (dict) – a mapping from table columns to values
write(results)
Write results to the file specified.
Parameters: results (dict) – A dictionary of results to record
Note
If the specified file does not exist, it will be created and a header will be written; otherwise the new result is appended.
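The create-with-header, otherwise-append behaviour described in the note can be sketched with csv.DictWriter. The code below is an assumption-laden illustration rather than the CSVFile implementation: append_row, the file name and the column names are all hypothetical.

```python
import csv
import os
import tempfile


def append_row(path, row):
    """Append row to the CSV at path, writing a header if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            # Only a freshly created file gets a header line.
            writer.writeheader()
        writer.writerow(row)


csv_path = os.path.join(tempfile.mkdtemp(), "results.csv")
append_row(csv_path, {"subject": "001", "volume": 1.5})
append_row(csv_path, {"subject": "002", "volume": 1.7})

with open(csv_path, newline="") as handle:
    rows = list(csv.reader(handle))
```

The sketch assumes every row uses the same columns in the same order. A mismatch would produce a file whose later rows no longer line up with the header.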
class plumbium.recorders.SQLDatabase(uri, table, values)
Record results to a database supported by SQLAlchemy.
Parameters: - uri (str) – database server URI, e.g. mysql://username:password@localhost/dbname
- table (str) – table name
- values (dict) – a mapping from database table columns to values
write(results)
Write the results to the database table specified at initialisation.
Parameters: results (dict) – A dictionary of results to record
class plumbium.recorders.MongoDB(uri, database, collection)
Records results to a MongoDB database.
Parameters: - uri (str) – MongoDB server URI, e.g. mongodb://localhost:27017
- database (str) – database name
- collection (str) – collection name
Note
Use of this class requires the installation of the pymongo module.
write(results)
Insert results into the database.
class plumbium.recorders.StdOut(values)
Print results to stdout.
Parameters: values (dict) – key-value pairs to be printed
write(results)
Print the results to stdout.
class plumbium.recorders.Slack(url, channel, values)
Send a Slack notification when a pipeline completes.
Parameters: - url (str) – Slack Webhook URL
- channel (str) – The channel name to post to
- values (dict) – A mapping of result keys to report
Note
Use of this class requires the installation of the slackclient module.
write(results)
Send a message to Slack.
Parameters: results (dict) – A dictionary of results to record
Exposes the CSVFile result recorder.
class plumbium.recorders.csvfile.CSVFile(path, values)
Records results to a CSV file.
Parameters: - path (str) – The file to which results should be written
- values (dict) – a mapping from table columns to values
write(results)
Write results to the file specified.
Parameters: results (dict) – A dictionary of results to record
Note
If the specified file does not exist, it will be created and a header will be written; otherwise the new result is appended.
Exposes the MongoDB recorder class.
class plumbium.recorders.mongodb.MongoDB(uri, database, collection)
Records results to a MongoDB database.
Parameters: - uri (str) – MongoDB server URI, e.g. mongodb://localhost:27017
- database (str) – database name
- collection (str) – collection name
Note
Use of this class requires the installation of the pymongo module.
write(results)
Insert results into the database.
Exposes the Slack result recorder.
class plumbium.recorders.slack.Slack(url, channel, values)
Send a Slack notification when a pipeline completes.
Parameters: - url (str) – Slack Webhook URL
- channel (str) – The channel name to post to
- values (dict) – A mapping of result keys to report
Note
Use of this class requires the installation of the slackclient module.
write(results)
Send a message to Slack.
Parameters: results (dict) – A dictionary of results to record
Exposes the SQLDatabase result recorder.
class plumbium.recorders.sqldatabase.SQLDatabase(uri, table, values)
Record results to a database supported by SQLAlchemy.
Parameters: - uri (str) – database server URI, e.g. mysql://username:password@localhost/dbname
- table (str) – table name
- values (dict) – a mapping from database table columns to values
write(results)
Write the results to the database table specified at initialisation.
Parameters: results (dict) – A dictionary of results to record
Exposes the StdOut recorder.