Core structure

The core framework parses and interprets microbial physiology data, organizing it into an object-oriented, database-driven hierarchy. The data schema follows the workflow of designing experiments and analyzing the resulting data.

This is an overview of the models:

Model            Function
TrialIdentifier  Describes a trial (time, analyte, strain, media, etc.)
AnalyteData      Time, data points and vectors for quantified data (g/L product, OD, etc.)
SingleTrial      All analytes for a given unit (e.g. a tube, well on plate, bioreactor, etc.)
ReplicateTrial   Contains a set of SingleTrials with replicates grouped to calculate statistics
Experiment       All of the trials performed on a given date
Project          Groups of experiments with overall goals
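
To make the grouping of replicates concrete, here is a minimal sketch in plain Python. The class names `SingleTrialSketch` and `ReplicateTrialSketch` and the method `mean_and_std` are hypothetical stand-ins written for this illustration; they are not impact's classes, and impact may compute its statistics differently.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class SingleTrialSketch:
    """Hypothetical stand-in: one culture vessel, analyte name -> data vector."""
    analytes: dict


@dataclass
class ReplicateTrialSketch:
    """Hypothetical stand-in: groups replicate single trials."""
    single_trials: list = field(default_factory=list)

    def mean_and_std(self, analyte):
        # Transpose so each column holds one time point across all replicates.
        per_timepoint = zip(*(st.analytes[analyte] for st in self.single_trials))
        columns = [list(col) for col in per_timepoint]
        return [mean(c) for c in columns], [stdev(c) for c in columns]


# Two replicates of the same condition, sampled at the same three time points.
replicates = ReplicateTrialSketch([
    SingleTrialSketch({"OD": [0.05, 0.10, 0.20]}),
    SingleTrialSketch({"OD": [0.07, 0.14, 0.24]}),
])
avg, std = replicates.mean_and_std("OD")
print(avg)
print(std)
```

The sketch assumes every replicate was sampled at the same time points; real data would need alignment or interpolation first.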

TrialIdentifier class

Before importing any data, generate a description of it. Data is described by the strain (organism, plasmids, etc.), the media (salts, carbon sources, nitrogen sources, etc.) and the environment (temperature, labware, shaking speed, etc.). A trial identifier contains everything required to uniquely identify a data point.

The three fundamental units of a trial identifier are the Strain, Media and Environment classes.

Model        Function
Strain       Describes the organism being characterized (e.g. strain, knockouts, plasmids, etc.)
Media        Describes the medium used to characterize the organism (e.g. M9 + 0.02% glc_D)
Environment  The conditions and labware used (e.g. 96-well plate, 250RPM, 37C)

We will build a trial identifier with all its components and use it to describe some data.

In [12]:
import impact as impt

# Strain: the organism plus any plasmids it carries
strain = impt.Strain()
strain.name = 'LMSE001'
strain.plasmids.append(impt.Plasmid(name='pTrc99a'))
print(strain)

# Media: add each component with its concentration and unit
media = impt.Media()
media.add_component('IPTG', concentration=20, unit='ng/mL')
print(media)

# Environment: labware, shaking speed (RPM) and temperature (C)
env = impt.Environment(labware=impt.Labware(name='96MTP'),
                       shaking_speed=250,
                       temperature=37)
print(env)
LMSE001+pTrc99a
20g/L IPTG
96MTP 250RPM 37C

With these fundamental units, we can construct a trial identifier.

In [13]:
ti = impt.TimeCourseIdentifier(strain=strain,media=media,environment=env)
print(ti)
strain: LMSE001+pTrc99a,        media: 20g/L IPTG,      env: 96MTP 250RPM 37C,  analyte: None,  rep: -1

The strain, media and environment are set, but the analyte and replicate are still empty. Let’s fill in the missing values required to fully describe the analyte.

In [14]:
ti.analyte_name = 'glc__D'
ti.analyte_type = 'substrate'
ti.replicate_id = 1
print(ti)
strain: LMSE001+pTrc99a,        media: 20g/L IPTG,      env: 96MTP 250RPM 37C,  analyte: glc__D,        rep: 1

We can now use this trial identifier to build objects with experimental data.
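
Because an identifier holds everything needed to single out a data point, it can act as a lookup key. The sketch below illustrates that idea with a frozen dataclass. `IdentifierSketch` and its fields are hypothetical names chosen for this example; they only mirror the values printed above and say nothing about how impact stores or compares its identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierSketch:
    """Hypothetical, hashable stand-in for a trial identifier."""
    strain: str
    media: str
    environment: str
    analyte_name: str
    replicate_id: int


store = {}
key = IdentifierSketch("LMSE001+pTrc99a", "20g/L IPTG", "96MTP 250RPM 37C", "glc__D", 1)
store[key] = [0, 1, 2, 3, 4, 5]

# An equal identifier built independently retrieves the same data.
same = IdentifierSketch("LMSE001+pTrc99a", "20g/L IPTG", "96MTP 250RPM 37C", "glc__D", 1)
# Changing any single field, here the replicate, yields a different key.
other_replicate = IdentifierSketch("LMSE001+pTrc99a", "20g/L IPTG", "96MTP 250RPM 37C", "glc__D", 2)

print(store[same])
print(other_replicate in store)
```

Freezing the dataclass makes instances immutable and hashable, which is what allows them to be used as dictionary keys.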

AnalyteData

TimePoint and TimeCourse

Time points are rarely used directly, but they are included so that data can be flattened into a relational database. Time points can either be created and added individually, or a time vector and a data vector can be provided, in which case the associated time points are generated automatically.
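
As an illustration of what flattening means here, the following sketch pairs a time vector with a data vector and stores one row per point in an in-memory SQLite table. The table name `time_point` and its columns are assumptions made for this example and do not reflect impact's actual database schema.

```python
import sqlite3

time_vector = [0, 1, 2, 3, 4, 5]
data_vector = [0, 1, 2, 3, 4, 5]

# One row per (time, value) pair; the table and column names are hypothetical.
rows = [("glc__D", t, d) for t, d in zip(time_vector, data_vector)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_point (analyte TEXT, time REAL, data REAL)")
conn.executemany("INSERT INTO time_point VALUES (?, ?, ?)", rows)

# Reading the rows back in time order recovers the original vectors.
fetched = conn.execute(
    "SELECT time, data FROM time_point WHERE analyte = ? ORDER BY time",
    ("glc__D",),
).fetchall()
recovered_time = [t for t, _ in fetched]
recovered_data = [d for _, d in fetched]
conn.close()

print(recovered_time == time_vector, recovered_data == data_vector)
```

Storing each point as its own row is what lets a relational database filter and join on time, while the vector form stays more convenient for analysis.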

In [15]:
substrate = impt.Substrate()
# Add each time point individually
for t, data in zip([0,1,2,3,4,5],[0,1,2,3,4,5]):
    tp = impt.TimePoint(trial_identifier=ti,time=t,data=data)
    substrate.add_timepoint(tp)

# or, add the vectors
substrate = impt.Substrate(trial_identifier=ti,time_vector=[0,1,2,3,4,5],data_vector=[0,1,2,3,4,5])
print(substrate.pd_series)
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int64

Here we instantiated a Substrate object because we are dealing with a substrate. This distinction allows impact to choose the appropriate model for the data and to calculate features. Any time series data can be imported as an impt.TimeCourse, but additional details can be extracted if a specific analyte type is chosen.

Analyte type  Function
Substrate     An analyte which is consumed
Product       An analyte which is produced
Biomass       A measurement of the biomass concentration
Reporter      A reporter, such as fluorescence from gfp or mCherry

In [25]:
import numpy as np
import impact.plotting as implot

def exp_growth(t):
    X0 = 0.05
    mu = 0.1
    return X0 * np.exp(mu*t)

# def production(t, product_yield, biomass_concentration):
#     rate =

x = np.linspace(0,20,20)
y = exp_growth(x)
implot.plot([implot.go.Scatter(x=x,y=y)])
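
The exponential model above, X(t) = X0 * exp(mu * t), is linear in log space: ln X = ln X0 + mu * t. One common way to recover the growth rate from biomass data is therefore a least-squares line through the log-transformed values. The sketch below shows this with the standard library only; `estimate_growth_rate` is a hypothetical helper written for this example, and it is not necessarily the method impact uses to calculate biomass features.

```python
import math


def estimate_growth_rate(times, biomass):
    """Least-squares fit of ln(X) = ln(X0) + mu * t; returns (X0, mu)."""
    log_x = [math.log(x) for x in biomass]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_x) / n
    # Slope of the regression line is the specific growth rate mu.
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_x)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), slope


# Noise-free data generated with the same parameters as exp_growth above.
times = [0, 5, 10, 15, 20]
biomass = [0.05 * math.exp(0.1 * t) for t in times]
x0_fit, mu_fit = estimate_growth_rate(times, biomass)
print(round(x0_fit, 4), round(mu_fit, 4))
```

With real measurements the fit should be restricted to the exponential phase, since lag and stationary phases do not follow this model.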

SingleTrial

ReplicateTrial

Experiment

Project