{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Core structure\n", "The core framework parses and interprets microbial physiology data, organizing it into an object-oriented, database-driven hierarchy. The data schema follows the workflow of designing experiments and the associated data analysis process. \n", "\n", "This is an overview of the models:\n", "\n", "| Model | Function |\n", "|-----------------------|----------------------------------------------------------------------------------|\n", "| TrialIdentifier | Describes a trial (time, analyte, strain, media, etc.) |\n", "| AnalyteData | Time, data points and vectors for quantified data (g/L product, OD, etc.) |\n", "| SingleTrial | All analytes for a given unit (e.g. a tube, well on plate, bioreactor, etc.) |\n", "| ReplicateTrial | Contains a set of `SingleTrial`s with replicates grouped to calculate statistics |\n", "| Experiment | All of the trials performed on a given date |\n", "| Project | Groups of experiments with overall goals | \n", "\n", "## `TrialIdentifier` class\n", "\n", "Before importing any data, create a description of it. Data is described based on the strain (organism, plasmids, etc.), the media (salts, carbon sources, nitrogen sources, etc.) and the environment (temperature, labware, shaking speed, etc.). A trial identifier contains everything required to uniquely identify a data point.\n", "\n", "The three fundamental units of a trial identifier are the `Strain`, `Media` and `Environment` classes. \n", "\n", "| Model | Function |\n", "|-------------------|--------------------------------------------------------------------------------------|\n", "| Strain | Describes the organism being characterized (e.g. strain, knockouts, plasmids, etc.) |\n", "| Media | Describes the medium used to characterize the organism (e.g. M9 + 0.02% glc_D) |\n", "| Environment | The conditions and labware used (e.g. 
96-well plate, 250RPM, 37C) |\n", "\n", "We will build a trial identifier with all its components and use it to describe some data." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LMSE001+pTrc99a\n", "20g/L IPTG\n", "96MTP 250RPM 37C\n" ] } ], "source": [ "import impact as impt\n", "from importlib import reload\n", "\n", "# Reload the module to pick up local changes (only needed during development)\n", "reload(impt)\n", "\n", "# Describe the organism: strain name plus any plasmids it carries\n", "strain = impt.Strain()\n", "strain.name = 'LMSE001'\n", "strain.plasmids.append(impt.Plasmid(name='pTrc99a'))\n", "print(strain)\n", "\n", "# Describe the medium by adding its components\n", "media = impt.Media()\n", "media.add_component('IPTG', concentration=20, unit='ng/mL')\n", "print(media)\n", "\n", "# Describe the labware and growth conditions\n", "env = impt.Environment(labware=impt.Labware(name='96MTP'),\n", "                       shaking_speed=250,\n", "                       temperature=37)\n", "print(env)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With these fundamental units, we can construct a trial identifier." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "strain: LMSE001+pTrc99a,\tmedia: 20g/L IPTG,\tenv: 96MTP 250RPM 37C,\tanalyte: None,\trep: -1\n" ] } ], "source": [ "ti = impt.TimeCourseIdentifier(strain=strain, media=media, environment=env)\n", "print(ti)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The strain, media and environment are now set, while the analyte and replicate still hold placeholder values. Let's fill in the missing values required to fully describe the analyte." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "strain: LMSE001+pTrc99a,\tmedia: 20g/L IPTG,\tenv: 96MTP 250RPM 37C,\tanalyte: glc__D,\trep: 1\n" ] } ], "source": [ "ti.analyte_name = 'glc__D'\n", "ti.analyte_type = 'substrate'\n", "ti.replicate_id = 1\n", "print(ti)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "We can now use this trial identifier to build objects with experimental data.\n", "\n", "## `AnalyteData`\n", "### `TimePoint` and `TimeCourse`\n", "`TimePoint` objects are rarely used directly, but are included so that data can be flattened into a relational database. Time points can be created and added individually, or a time vector and data vector can be provided, in which case the associated time points are generated automatically." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 0\n", "1 1\n", "2 2\n", "3 3\n", "4 4\n", "5 5\n", "dtype: int64\n" ] } ], "source": [ "substrate = impt.Substrate()\n", "# Add each time point individually\n", "for t, data in zip([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]):\n", "    tp = impt.TimePoint(trial_identifier=ti, time=t, data=data)\n", "    substrate.add_timepoint(tp)\n", "\n", "# Or pass the time and data vectors directly (this replaces the object built above)\n", "substrate = impt.Substrate(trial_identifier=ti,\n", "                           time_vector=[0, 1, 2, 3, 4, 5],\n", "                           data_vector=[0, 1, 2, 3, 4, 5])\n", "print(substrate.pd_series)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we instantiated a `Substrate` object because we are dealing with a substrate. This differentiation allows `impact` to choose the appropriate model for the data and to calculate features. Any time series data can be imported as an `impt.TimeCourse`, but additional details can be extracted if a specific data type is chosen. 
\n", "\n", "| Analyte type | Function |\n", "|----|----|\n", "| `Substrate` | An analyte which is consumed |\n", "| `Product` | An analyte which is produced |\n", "| `Biomass` | A measurement of the biomass concentration |\n", "| `Reporter`| A reporter, such as fluorescence from GFP or mCherry |" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "