Next steps
Adding metadata
Often it is useful to add extra information to an analysis record, such as software versions or patient identification numbers. This information can be added to an analysis using the metadata keyword argument.
pipeline.run(
    'example',
    my_pipeline,
    base_directory,
    metadata={'site': 5, 'subject': 1, 'version': '0.1beta2'}
)
This metadata dictionary will be included in the saved JSON file and can be used by result recorders and to name output files.
Pipeline result names
If your pipeline function returns values, these can be named in the report file using the result_names keyword argument.
pipeline.run(
    'example',
    my_pipeline,
    base_directory,
    result_names=('foo', 'bar')
)
Output file naming
By default the results of an analysis run are saved as '[analysis_name]-[start_date]_[start_time].tar.gz'. This behaviour can be changed by adding the filename keyword to your pipeline.run call.
pipeline.run(
    'example',
    my_pipeline,
    base_directory,
    metadata={'site': 5, 'subject': 1},
    filename='{name}-{metadata[site]:03d}-{metadata[subject]:02d}-{start_date:%Y%m%d}'
)
The filename argument should be given as a string using Python’s format string syntax. When the file is saved, the fields in this string are replaced using the results structure; the layout of this structure can be seen by inspecting the JSON file that Plumbium produces.
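To see how such a template expands, the sketch below applies the same format string to a hand-written stand-in for the results structure using plain str.format. The contents of the results dictionary here are hypothetical, and the sketch assumes start_date is a datetime object at the point of formatting; how Plumbium performs the substitution internally is not shown in this document.

```python
from datetime import datetime

# Hypothetical stand-in for the results structure (not produced by Plumbium).
results = {
    'name': 'example',
    'metadata': {'site': 5, 'subject': 1},
    'start_date': datetime(2016, 1, 1, 11, 59),
}

template = '{name}-{metadata[site]:03d}-{metadata[subject]:02d}-{start_date:%Y%m%d}'

# metadata[site] indexes into the nested dict; :03d zero-pads the integer;
# the text after the colon in start_date is passed to the datetime as a
# strftime pattern.
filename = template.format(**results)
print(filename)  # example-005-01-20160101
```

Trying a template against a small dictionary like this is a quick way to catch a misspelled field or a wrong format code before starting a long analysis.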
Recording results
In addition to archiving analysis results to a file, Plumbium can record analysis outcomes to a number of other destinations.
CSV file
The CSVFile recorder outputs selected fields from the results structure to a CSV file (which will be created or appended to as appropriate). To use CSVFile, first create an instance of the class.
from collections import OrderedDict

csvfile = CSVFile(
    'csv_results.csv',
    OrderedDict([
        ('start_date', lambda x: x['start_date']),
        ('data_val', lambda x: x['processes'][-1]['printed_output'].strip().split(' ')[0])
    ])
)
The first argument is the path of the CSV file you want to record to. The second argument is a dictionary whose keys are the column names in your CSV file and whose values are functions that take the results structure and return the appropriate value for each column. An OrderedDict should be used so that the columns are ordered as expected (a regular dict gives an arbitrary column order on Python versions before 3.7, where insertion order is not guaranteed).
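The way the column functions turn a results structure into a row can be sketched with the standard csv module. This is only an illustration of the idea, not Plumbium's implementation: the results dictionary and the build_row helper are hypothetical, and the row is written to an in-memory buffer rather than a file.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical results structure with the two fields used by the example.
results = {
    'start_date': '20160101 11:59',
    'processes': [{'printed_output': '55 words.txt\n'}],
}

columns = OrderedDict([
    ('start_date', lambda x: x['start_date']),
    ('data_val', lambda x: x['processes'][-1]['printed_output'].strip().split(' ')[0]),
])


def build_row(columns, results):
    """Call each column function on the results, in column order."""
    return [extract(results) for extract in columns.values()]


buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(list(columns))                # header row: the column names
writer.writerow(build_row(columns, results))  # one row per analysis run
print(buffer.getvalue())
```

Because each function receives the whole results structure, a column can draw on any part of it, such as the printed output of the last process as shown here.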
SQL database
To record to any SQL database supported by SQLAlchemy, use the SQLDatabase class.
db = SQLDatabase(
    'sqlite:///db.sqlite',
    'results',
    {
        'wordcount': lambda x: x['processes'][-1]['printed_output'].strip().split(' ')[0],
        'start_date': lambda x: x['start_date']
    }
)
The first argument should be a database URL in a form recognised by SQLAlchemy. The second argument is the name of the database table to insert the new result into; this table must already exist, as Plumbium won’t try to create it. The last argument is a dictionary of column names and functions that output values, as described above.
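Since the table has to exist before the first run, one option for the SQLite example above is to create it beforehand with Python's built-in sqlite3 module. The sketch below is a minimal version under stated assumptions: the create_results_table helper is hypothetical, and the TEXT column types are a guess that should be adjusted to the values your functions actually return.

```python
import sqlite3


def create_results_table(db_path):
    """Create the 'results' table in the SQLite file if it is missing."""
    connection = sqlite3.connect(db_path)
    try:
        # Column names match the keys of the dictionary passed to SQLDatabase;
        # the TEXT types are an assumption.
        connection.execute(
            'CREATE TABLE IF NOT EXISTS results ('
            'wordcount TEXT, '
            'start_date TEXT)'
        )
        connection.commit()
    finally:
        connection.close()


if __name__ == '__main__':
    # 'sqlite:///db.sqlite' refers to db.sqlite in the current directory.
    create_results_table('db.sqlite')
```

Using CREATE TABLE IF NOT EXISTS makes the helper safe to call at the start of every analysis script.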
MongoDB
Plumbium can save the complete JSON result structure to a MongoDB server using the MongoDB class.
mongodb = MongoDB('mongodb://localhost:27017/', 'plumbium', 'results')
The first argument is a MongoDB URL (see the PyMongo tutorial for details). The second argument is the database name and the final argument is the collection to insert into.
Slack
The Slack recorder allows a message to be sent to a Slack channel configured with a Webhook. You will need the name of the channel to post to and the Webhook URL from the Slack website.
from collections import OrderedDict

slack = Slack(
    'https://hooks.slack.com/services/...',
    '#channel',
    OrderedDict([
        ('start_date', lambda x: x['start_date']),
        ('data_val', lambda x: x['processes'][-1]['printed_output'].strip().split(' ')[0])
    ])
)
The first argument is the Webhook URL and the second is the channel to post to (the channel name should include the leading #). The example shown will send a message like the following to Slack upon completion:
Plumbium task complete
start date: 20160101 11:59
data_val: 55