The core framework parses and interprets microbial physiology data by organizing it into an object-oriented, database-driven hierarchy. The data schema follows the workflow of designing experiments and analyzing the resulting data.
This is an overview of the models:
Model | Function |
---|---|
TrialIdentifier | Describes a trial (time, analyte, strain, media, etc.) |
AnalyteData | Time, data points and vectors for quantified data (g/L product, OD, etc.) |
SingleTrial | All analytes for a given unit (e.g. a tube, well on plate, bioreactor, etc.) |
ReplicateTrial | Contains a set of SingleTrials with replicates grouped to calculate statistics |
Experiment | All of the trials performed on a given date |
Project | Groups of experiments with overall goals |
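The nesting in the table can be pictured as containers inside containers. The following is a minimal sketch using plain dataclasses, not impact's actual classes: every class body, field name, method name (`average`, `spread`) and number below is an assumption made for illustration. It shows one way replicates could be grouped to calculate per-time-point statistics.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Dict, List


@dataclass
class SingleTrial:
    """Hypothetical stand-in: all analytes measured in one tube or well."""
    # analyte name -> measured values, one per time point
    analytes: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class ReplicateTrial:
    """Hypothetical stand-in: replicate SingleTrials grouped for statistics."""
    single_trials: List[SingleTrial] = field(default_factory=list)

    def average(self, analyte: str) -> List[float]:
        # Mean across replicates at each time point
        series = [st.analytes[analyte] for st in self.single_trials]
        return [mean(values) for values in zip(*series)]

    def spread(self, analyte: str) -> List[float]:
        # Sample standard deviation across replicates at each time point
        series = [st.analytes[analyte] for st in self.single_trials]
        return [stdev(values) for values in zip(*series)]


@dataclass
class Experiment:
    """Hypothetical stand-in: the replicate trials run on one date."""
    date: str
    replicate_trials: List[ReplicateTrial] = field(default_factory=list)


@dataclass
class Project:
    """Hypothetical stand-in: experiments sharing an overall goal."""
    name: str
    experiments: List[Experiment] = field(default_factory=list)


# Illustrative values only, not measured data
rep = ReplicateTrial([
    SingleTrial({"OD600": [0.05, 0.10, 0.20]}),
    SingleTrial({"OD600": [0.07, 0.14, 0.24]}),
])
project = Project("demo", [Experiment("2017-01-01", [rep])])

od_mean = rep.average("OD600")
od_std = rep.spread("OD600")
print(od_mean)
print(od_std)
```

Keeping the statistics on the replicate level means a single trial never needs to know about its siblings; it only stores what was measured in one unit.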
TrialIdentifier class
Before importing any data, a description of the data should be generated. Data is described by the strain (organism, plasmids, etc.), the media (salts, carbon sources, nitrogen sources, etc.) and the environment (temperature, labware, shaking speed, etc.). A trial identifier holds everything required to uniquely identify a data point.
The three fundamental units of a trial identifier are the Strain, Media and Environment classes.
Model | Function |
---|---|
Strain | Describes the organism being characterized (e.g. strain, knockouts, plasmids, etc.) |
Media | Describes the medium used to characterize the organism (e.g. M9 + 0.02% glc_D) |
Environment | The conditions and labware used (e.g. 96-well plate, 250RPM, 37C) |
We will build a trial identifier with all its components and use it to describe some data.
In [12]:
import impact as impt
from importlib import reload
reload(impt)
strain = impt.Strain()
strain.name = 'LMSE001'
strain.plasmids.append(impt.Plasmid(name='pTrc99a'))
print(strain)
media = impt.Media()
media.add_component('IPTG',concentration= 20,unit='ng/mL')
print(media)
env = impt.Environment(labware=impt.Labware(name='96MTP'),
                       shaking_speed=250,
                       temperature=37)
print(env)
LMSE001+pTrc99a
20g/L IPTG
96MTP 250RPM 37C
With these fundamental units, we can construct a trial identifier.
In [13]:
ti = impt.TimeCourseIdentifier(strain=strain,media=media,environment=env)
print(ti)
strain: LMSE001+pTrc99a, media: 20g/L IPTG, env: 96MTP 250RPM 37C, analyte: None, rep: -1
We see the strain, media and environment set and some empty values for analyte and replicate. Let’s fill in the missing values required to fully describe the analyte.
In [14]:
ti.analyte_name = 'glc__D'
ti.analyte_type = 'substrate'
ti.replicate_id = 1
print(ti)
strain: LMSE001+pTrc99a, media: 20g/L IPTG, env: 96MTP 250RPM 37C, analyte: glc__D, rep: 1
We can now use this trial identifier to build objects with experimental data.
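To illustrate why a complete identifier matters, here is a minimal sketch in which the identifier acts as a lookup key for data. `TrialKey` and `condition` are hypothetical names introduced for this example and are not part of impact; the field values mirror the printed identifier above, and the data values are made up.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrialKey:
    """Hypothetical stand-in for a trial identifier; not impact's class."""
    strain: str
    media: str
    environment: str
    analyte_name: str
    replicate_id: int


key_rep1 = TrialKey("LMSE001+pTrc99a", "20g/L IPTG", "96MTP 250RPM 37C", "glc__D", 1)
key_rep2 = replace(key_rep1, replicate_id=2)  # same condition, second replicate

# Frozen dataclasses are hashable, so each identifier can index its own data.
data: Dict[TrialKey, List[float]] = {
    key_rep1: [10.0, 8.0, 5.0],  # illustrative values only
    key_rep2: [10.0, 8.5, 5.5],
}


def condition(key: TrialKey) -> Tuple[str, str, str, str]:
    # Everything except the replicate id; replicates of one condition share this.
    return (key.strain, key.media, key.environment, key.analyte_name)


same_condition = condition(key_rep1) == condition(key_rep2)
print(key_rep1 != key_rep2, same_condition)
```

The two keys differ only in `replicate_id`, so they address separate data series while still being recognizable as replicates of the same condition.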
AnalyteData
TimePoint and TimeCourse
Time points are rarely used directly; they exist so that data can be flattened into a relational database. They can be created and added individually, or a time vector and a data vector can be provided, in which case the associated time points are generated automatically.
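The idea of expanding vectors into individual time points and flattening them into a relational table can be sketched with the standard library alone. This is an illustration under stated assumptions, not impact's storage code: `Point`, `points_from_vectors`, the `time_point` table and its column names are all hypothetical.

```python
import sqlite3
from typing import List, NamedTuple, Sequence


class Point(NamedTuple):
    """Hypothetical stand-in for a single time point."""
    analyte_name: str
    t: float
    value: float


def points_from_vectors(analyte_name: str,
                        time_vector: Sequence[float],
                        data_vector: Sequence[float]) -> List[Point]:
    # One record per (time, value) pair; the vectors must line up.
    if len(time_vector) != len(data_vector):
        raise ValueError("time_vector and data_vector differ in length")
    return [Point(analyte_name, t, v) for t, v in zip(time_vector, data_vector)]


points = points_from_vectors("glc__D", [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5])

# Each point becomes one row of a flat table (table and column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_point (analyte_name TEXT, t REAL, value REAL)")
conn.executemany("INSERT INTO time_point VALUES (?, ?, ?)", points)

# The time course can be rebuilt by selecting the rows back in time order.
rows = conn.execute(
    "SELECT t, value FROM time_point WHERE analyte_name = ? ORDER BY t",
    ("glc__D",),
).fetchall()
conn.close()
print(rows)
```

Storing one row per point keeps the table schema fixed regardless of how many samples a time course contains.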
In [15]:
substrate = impt.Substrate()
# Add each time point individually
for t, data in zip([0,1,2,3,4,5],[0,1,2,3,4,5]):
    tp = impt.TimePoint(trial_identifier=ti,time=t,data=data)
    substrate.add_timepoint(tp)
# or, add the vectors
substrate = impt.Substrate(trial_identifier=ti,time_vector=[0,1,2,3,4,5],data_vector=[0,1,2,3,4,5])
print(substrate.pd_series)
0 0
1 1
2 2
3 3
4 4
5 5
dtype: int64
Here we instantiated a Substrate object because we are dealing with a
substrate. This distinction allows impact to choose the appropriate
model for the data and to calculate features. Any time series data
can be imported as an impt.TimeCourse, but additional details can be
extracted if a specific data type is chosen.
Analyte type | Function |
---|---|
Substrate | An analyte which is consumed |
Product | An analyte which is produced |
Biomass | A measurement of the biomass concentration |
Reporter | A reporter, such as fluorescence from gfp or mCherry |
In [25]:
import numpy as np
import impact.plotting as implot
def exp_growth(t):
    X0 = 0.05
    mu = 0.1
    return X0 * np.exp(mu*t)
# def production(t, product_yield, biomass_concentration):
# rate =
x = np.linspace(0,20,20)
y = exp_growth(x)
implot.plot([implot.go.Scatter(x=x,y=y)])
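One feature that could be calculated from a biomass time course like the synthetic curve above is the specific growth rate. The sketch below recovers `mu` and `X0` with a least-squares line through (t, ln X), assuming purely exponential growth. It uses only the standard library; `fit_growth` is a hypothetical helper and this is not necessarily how impact computes its features.

```python
import math
from typing import Sequence, Tuple


def exp_growth(t: float, x0: float = 0.05, mu: float = 0.1) -> float:
    # Same model as the cell above: X(t) = X0 * exp(mu * t)
    return x0 * math.exp(mu * t)


def fit_growth(times: Sequence[float], biomass: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (t, ln X); returns (x0, mu)."""
    logs = [math.log(x) for x in biomass]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    mu = sxy / sxx                         # slope of ln X versus t
    x0 = math.exp(y_mean - mu * t_mean)    # intercept, back-transformed
    return x0, mu


times = [20 * i / 19 for i in range(20)]  # 20 evenly spaced points on [0, 20]
biomass = [exp_growth(t) for t in times]

x0_fit, mu_fit = fit_growth(times, biomass)
print(round(x0_fit, 4), round(mu_fit, 4))
```

With noise-free synthetic data the fit returns the parameters used to generate the curve; with real measurements, the fit would normally be restricted to the exponential phase.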
SingleTrial
ReplicateTrial
Experiment
Project