Core structure

The core framework parses and interprets microbial physiology data by organizing it into an object-oriented, database-driven hierarchy. The data schema follows the workflow of designing experiments and analyzing the resulting data.

This is an overview of the models:

| Model | Function |
| --- | --- |
| TrialIdentifier | Describes a trial (time, analyte, strain, media, etc.) |
| AnalyteData | Time, data points and vectors for quantified data (g/L product, OD, etc.) |
| SingleTrial | All analytes for a given unit (e.g. a tube, well on plate, bioreactor, etc.) |
| ReplicateTrial | Contains a set of `SingleTrial`s with replicates grouped to calculate statistics |
| Experiment | All of the trials performed on a given date |
| Project | Groups of experiments with overall goals |
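The nesting in the table above can be pictured with plain dataclasses. This is only a sketch of the containment relationships and of grouping replicates to calculate statistics; the `*Sketch` class names, their fields, the date and the OD values are hypothetical placeholders and are not impact's actual classes or data.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Dict, List


@dataclass
class SingleTrialSketch:
    """All analytes measured in one unit (e.g. one well); hypothetical stand-in."""
    # Maps an analyte name to its data vector, one value per time point.
    analytes: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class ReplicateTrialSketch:
    """Groups replicate single trials so statistics can be computed."""
    single_trials: List[SingleTrialSketch] = field(default_factory=list)

    def average(self, analyte: str) -> List[float]:
        # Point-wise mean across replicates for one analyte.
        vectors = [trial.analytes[analyte] for trial in self.single_trials]
        return [mean(values) for values in zip(*vectors)]

    def spread(self, analyte: str) -> List[float]:
        # Point-wise sample standard deviation; needs at least two replicates.
        vectors = [trial.analytes[analyte] for trial in self.single_trials]
        return [stdev(values) for values in zip(*vectors)]


@dataclass
class ExperimentSketch:
    """All replicate trials performed on a given date."""
    date: str
    replicate_trials: List[ReplicateTrialSketch] = field(default_factory=list)


@dataclass
class ProjectSketch:
    """A group of experiments that share an overall goal."""
    goal: str
    experiments: List[ExperimentSketch] = field(default_factory=list)


# Placeholder data: two replicate wells measuring the same analyte.
well_a = SingleTrialSketch(analytes={"OD600": [0.1, 0.2, 0.4]})
well_b = SingleTrialSketch(analytes={"OD600": [0.3, 0.4, 0.6]})
replicates = ReplicateTrialSketch(single_trials=[well_a, well_b])
experiment = ExperimentSketch(date="2017-01-01", replicate_trials=[replicates])
project = ProjectSketch(goal="illustration only", experiments=[experiment])

od_mean = replicates.average("OD600")
od_spread = replicates.spread("OD600")
print(od_mean)
print(od_spread)
```

Each level only holds a list of the level below it, which is what makes the structure straightforward to map onto database tables.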

`TrialIdentifier` class

Before importing any data, create a description of it. Data is described by the strain (organism, plasmids, etc.), the media (salts, carbon sources, nitrogen sources, etc.) and the environment (temperature, labware, shaking speed, etc.). A trial identifier holds everything required to uniquely identify a point of data.

The three fundamental units of a trial identifier are the `Strain`, `Media` and `Environment` classes.

| Model | Function |
| --- | --- |
| Strain | Describes the organism being characterized (e.g. strain, knockouts, plasmids, etc.) |
| Media | Describes the medium used to characterize the organism (e.g. M9 + 0.02% glc__D) |
| Environment | The conditions and labware used (e.g. 96-well plate, 250RPM, 37C) |
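To illustrate what "uniquely identify" means in practice, here is a minimal sketch using a frozen dataclass. `IdentifierSketch` and its fields are hypothetical and only mimic the idea; impact's own identifier classes are built differently, as the cells that follow show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierSketch:
    """Hypothetical stand-in for a trial identifier; not impact's class."""
    strain: str
    media: str
    environment: str
    analyte_name: str
    replicate_id: int


# Two descriptions with the same fields compare equal and hash identically,
# so the identifier can serve as a dictionary key for the data it describes.
first = IdentifierSketch("LMSE001+pTrc99a", "M9", "96MTP 250RPM 37C", "glc__D", 1)
same = IdentifierSketch("LMSE001+pTrc99a", "M9", "96MTP 250RPM 37C", "glc__D", 1)
other_replicate = IdentifierSketch("LMSE001+pTrc99a", "M9", "96MTP 250RPM 37C", "glc__D", 2)

data_by_identifier = {first: [0, 1, 2]}
data_by_identifier[other_replicate] = [0, 1, 3]

print(first == same)
print(first == other_replicate)
print(len(data_by_identifier))
```

Changing any single field, here the replicate number, yields a different key, so data from different replicates never collide.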

We will build a trial identifier with all its components and use it to describe some data.

In [12]:

import impact as impt

# Strain: the organism and any plasmids it carries
strain = impt.Strain()
strain.name = 'LMSE001'
strain.plasmids.append(impt.Plasmid(name='pTrc99a'))
print(strain)

# Media: components with their concentration and unit
media = impt.Media()
media.add_component('IPTG', concentration=20, unit='ng/mL')
print(media)

# Environment: labware, shaking speed (RPM) and temperature (°C)
env = impt.Environment(labware=impt.Labware(name='96MTP'),
                       shaking_speed=250,
                       temperature=37)
print(env)

LMSE001+pTrc99a
20g/L IPTG
96MTP 250RPM 37C

With these fundamental units, we can construct a trial identifier.

In [13]:

ti = impt.TimeCourseIdentifier(strain=strain, media=media, environment=env)
print(ti)

strain: LMSE001+pTrc99a,        media: 20g/L IPTG,      env: 96MTP 250RPM 37C,  analyte: None,  rep: -1

The strain, media and environment are now set, while the analyte and replicate are still empty. Let’s fill in the missing values required to fully describe the analyte.

In [14]:

ti.analyte_name = 'glc__D'
ti.analyte_type = 'substrate'
ti.replicate_id = 1
print(ti)

strain: LMSE001+pTrc99a,        media: 20g/L IPTG,      env: 96MTP 250RPM 37C,  analyte: glc__D,        rep: 1

We can now use this trial identifier to build objects with experimental data.

`AnalyteData`

`TimePoint` and `TimeCourse`

`TimePoint` objects are rarely used directly, but they exist so that data can be flattened into a relational database. Time points can either be created and added individually, or a time vector and a data vector can be provided, in which case the associated time points are generated automatically.
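As a rough picture of what flattening into a relational database involves, the sketch below turns a time vector and a data vector into one row per time point and stores the rows with Python's built-in `sqlite3` module. The table name, column names and in-memory database are assumptions made for illustration; they are not impact's actual schema.

```python
import sqlite3

time_vector = [0, 1, 2, 3, 4, 5]
data_vector = [0, 1, 2, 3, 4, 5]

# One row per time point: this is the "flattened" form of the two vectors.
rows = [("glc__D", t, value) for t, value in zip(time_vector, data_vector)]

# Hypothetical table layout, kept in memory so nothing is written to disk.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE time_point (analyte_name TEXT, time REAL, data REAL)"
)
connection.executemany("INSERT INTO time_point VALUES (?, ?, ?)", rows)

# The vectors can be rebuilt from the rows by ordering on time.
stored = connection.execute(
    "SELECT time, data FROM time_point WHERE analyte_name = ? ORDER BY time",
    ("glc__D",),
).fetchall()
rebuilt_time = [t for t, _ in stored]
rebuilt_data = [value for _, value in stored]
connection.close()

print(rebuilt_time == time_vector and rebuilt_data == data_vector)
```

Storing individual points rather than whole vectors means each measurement is an ordinary row that can be filtered and joined like any other record.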

In [15]:

substrate = impt.Substrate()
# Option 1: add each time point individually
for t, data in zip([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]):
    tp = impt.TimePoint(trial_identifier=ti, time=t, data=data)
    substrate.add_timepoint(tp)

# Option 2: provide the vectors and let the time points be generated
substrate = impt.Substrate(trial_identifier=ti,
                           time_vector=[0, 1, 2, 3, 4, 5],
                           data_vector=[0, 1, 2, 3, 4, 5])
print(substrate.pd_series)

0    0
1    1
2    2
3    3
4    4
5    5
dtype: int64

Here we instantiated a `Substrate` object because the analyte is a substrate. This distinction allows impact to choose the appropriate model for the data and to calculate features. Any time series data can be imported as an `impt.TimeCourse`, but additional details can be extracted if a specific data type is chosen.

| Analyte type | Function |
| --- | --- |
| `Substrate` | An analyte which is consumed |
| `Product` | An analyte which is produced |
| `Biomass` | A measurement of the biomass concentration |
| `Reporter` | A reporter, such as fluorescence from gfp or mCherry |
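The idea of picking a class by analyte type so that type-specific features become available can be sketched as follows. The `*Sketch` classes, the `ANALYTE_CLASSES` mapping, the `build_analyte` helper and the feature names are all hypothetical, and the concentrations are made-up numbers; impact's real classes compute their own features.

```python
from typing import Dict, List, Type


class TimeCourseSketch:
    """Generic time series; hypothetical stand-in, not impact's TimeCourse."""

    def __init__(self, time_vector: List[float], data_vector: List[float]):
        self.time_vector = time_vector
        self.data_vector = data_vector

    def features(self) -> Dict[str, float]:
        # A generic series only knows its overall change.
        return {"net_change": self.data_vector[-1] - self.data_vector[0]}


class SubstrateSketch(TimeCourseSketch):
    def features(self) -> Dict[str, float]:
        # A substrate is consumed, so report the drop as a positive amount.
        return {"consumed": self.data_vector[0] - self.data_vector[-1]}


class ProductSketch(TimeCourseSketch):
    def features(self) -> Dict[str, float]:
        # A product is produced, so report the rise as a positive amount.
        return {"produced": self.data_vector[-1] - self.data_vector[0]}


# Assumed mapping from an analyte_type string to the class that handles it.
ANALYTE_CLASSES: Dict[str, Type[TimeCourseSketch]] = {
    "substrate": SubstrateSketch,
    "product": ProductSketch,
}


def build_analyte(analyte_type, time_vector, data_vector):
    # Fall back to the generic class for types without a specialised model.
    analyte_class = ANALYTE_CLASSES.get(analyte_type, TimeCourseSketch)
    return analyte_class(time_vector, data_vector)


glucose = build_analyte("substrate", [0, 1, 2], [10.0, 6.0, 1.0])
ethanol = build_analyte("product", [0, 1, 2], [0.0, 2.0, 4.5])
print(glucose.features())
print(ethanol.features())
```

The same vectors yield different features depending on the chosen type, which is why declaring the type up front carries more information than a generic time course.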

In [25]:

import numpy as np
import impact.plotting as implot

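def exp_growth(t):
    """Exponential growth: biomass at time t for fixed X0 and mu."""
    X0 = 0.05  # initial biomass concentration
    mu = 0.1   # specific growth rate
    return X0 * np.exp(mu * t)

# def production(t, product_yield, biomass_concentration):
#     rate =

# Evaluate the growth curve at 20 evenly spaced time points and plot it
x = np.linspace(0, 20, 20)
y = exp_growth(x)
implot.plot([implot.go.Scatter(x=x, y=y)])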
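One feature that could be extracted from a biomass curve like the one above is the specific growth rate. The sketch below recovers it with an ordinary least-squares fit of ln(biomass) against time, using only the standard library. `fit_growth_rate` is a hypothetical helper written for this illustration and is not necessarily how impact calculates growth rates.

```python
import math


def exp_growth(t, x0=0.05, mu=0.1):
    # Same exponential model and parameter values as the cell above.
    return x0 * math.exp(mu * t)


def fit_growth_rate(times, biomass):
    """Least-squares line through ln(biomass) versus time.

    Returns (slope, exp(intercept)), i.e. estimates of mu and X0.
    """
    logs = [math.log(value) for value in biomass]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    covariance = sum((t - mean_t) * (y - mean_log) for t, y in zip(times, logs))
    variance = sum((t - mean_t) ** 2 for t in times)
    slope = covariance / variance
    intercept = mean_log - slope * mean_t
    return slope, math.exp(intercept)


# 20 evenly spaced points from 0 to 20, matching np.linspace(0, 20, 20).
times = [20 * i / 19 for i in range(20)]
biomass = [exp_growth(t) for t in times]

mu_estimate, x0_estimate = fit_growth_rate(times, biomass)
print(round(mu_estimate, 6), round(x0_estimate, 6))
```

Because the synthetic data are noise-free, the fit returns the parameters used to generate them; with real measurements the estimate would depend on which part of the curve is actually exponential.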

`SingleTrial`

`ReplicateTrial`

`Experiment`

`Project`
