{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Core structure\n",
    "The core framework is responsible for parsing and interpreting microbial physiology data by parsing it into a object-oriented and database-driven hierarchy. The data schema is based on the workflow of designing experiments and the associated data analysis process. \n",
    "\n",
    "This is an overview of the models:\n",
    "\n",
    "| Model                 | Function                                                                         |\n",
    "|-----------------------|----------------------------------------------------------------------------------|\n",
    "| TrialIdentifier       | Describes a trial (time, analyte, strain, media, etc.)                           |\n",
    "| AnalyteData           | Time, data points and vectors for quantified data (g/L product, OD, etc.)        |\n",
    "| SingleTrial           | All analytes for a given unit (e.g. a tube, well on plate, bioreactor, etc.)     |\n",
    "| ReplicateTrial        | Contains a set of `SingleTrial`s with replicates grouped to calculate statistics |\n",
    "| Experiment            | All of the trials performed on a given date                                      |\n",
    "| Project               | Groups of experiments with overall goals                                         | \n",
    "\n",
    "## `TrialIdentifier` class\n",
    "\n",
    "Before any data importing, a description should be generated. Data is described based on the strain (organism, plasmids, etc.), the media (salts, carbon sources, nitrogen sources, etc.) and the environment (temperature, labware, shaking speed, etc.) A trial identifier is everything required to uniquely identify a point of data.\n",
    "\n",
    "The three fundamental units of a trial identifier are the `Strain`, `Media` and `Environment` classes. \n",
    "\n",
    "| Model             | Function                                                                             |\n",
    "|-------------------|--------------------------------------------------------------------------------------|\n",
    "| Strain            | Describes the organism being characterized (e.g. strain, knockouts, plasmids, etc.)  |\n",
    "| Media             | Described the medium used to characterize the organism (e.g. M9 + 0.02% glc_D)       |\n",
    "| Environment       | The conditions and labware used (e.g. 96-well plate, 250RPM, 37C)                    |\n",
    "\n",
    "We will build a trial identifier with all its components and use it to describe some data.s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LMSE001+pTrc99a\n",
      "20g/L IPTG\n",
      "96MTP 250RPM 37C\n"
     ]
    }
   ],
   "source": [
    "import impact as impt\n",
    "from importlib import reload\n",
    "reload(impt)\n",
    "strain = impt.Strain()\n",
    "strain.name = 'LMSE001'\n",
    "strain.plasmids.append(impt.Plasmid(name='pTrc99a'))\n",
    "print(strain)\n",
    "\n",
    "media = impt.Media()\n",
    "media.add_component('IPTG',concentration= 20,unit='ng/mL')\n",
    "print(media)\n",
    "\n",
    "env = impt.Environment(labware=impt.Labware(name='96MTP'),\n",
    "                      shaking_speed = 250,\n",
    "                      temperature = 37)\n",
    "print(env)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With these fundamental units, we can construct a trial identifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "strain: LMSE001+pTrc99a,\tmedia: 20g/L IPTG,\tenv: 96MTP 250RPM 37C,\tanalyte: None,\trep: -1\n"
     ]
    }
   ],
   "source": [
    "ti = impt.TimeCourseIdentifier(strain=strain,media=media,environment=env)\n",
    "print(ti)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see the strain, media and environment set and some empty values for analyte and replicate. Let's fill in the missing values required to fully describe the analyte."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "strain: LMSE001+pTrc99a,\tmedia: 20g/L IPTG,\tenv: 96MTP 250RPM 37C,\tanalyte: glc__D,\trep: 1\n"
     ]
    }
   ],
   "source": [
    "ti.analyte_name = 'glc__D'\n",
    "ti.analyte_type = 'substrate'\n",
    "ti.replicate_id = 1\n",
    "print(ti)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "We can now use this trial identifier to build objects with experimental data.\n",
    "\n",
    "## `AnalyteData`\n",
    "### `TimePoint` and `TimeCourse`\n",
    "These time points are rarely used directly, but are included in order to flatten data into a relational database. These time points can either be created and added individually, or a time vector and data vector can be provided and the associated time points will automatically be generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0    0\n",
      "1    1\n",
      "2    2\n",
      "3    3\n",
      "4    4\n",
      "5    5\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "substrate = impt.Substrate()\n",
    "# Add each time point individually\n",
    "for t, data in zip([0,1,2,3,4,5],[0,1,2,3,4,5]):\n",
    "    tp = impt.TimePoint(trial_identifier=ti,time=t,data=data)\n",
    "    substrate.add_timepoint(tp)\n",
    "\n",
    "# or, add the vectors\n",
    "substrate = impt.Substrate(trial_identifier=ti,time_vector=[0,1,2,3,4,5],data_vector=[0,1,2,3,4,5])\n",
    "print(substrate.pd_series)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we instantiated a substrate object because we are dealing with a substrate - this differentiation allows impact to choose the appropriate model for the data, as well as calculate features. Any time series data can be imported as a `impt.TimeCourse`, but additional details can be extracted if a specific data type is chosen. \n",
    "\n",
    "| Analyte type | Function |\n",
    "|----|----|\n",
    "| `Substrate` | An analyte which is consumed |\n",
    "| `Product` | An analyte which is produced |\n",
    "| `Biomass` | A measurement of the biomass concentration |\n",
    "| `Reporter`| A reporter, such as fluorescence from gfp or mCherry |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div id=\"7e235684-e6e3-4958-902a-148b22eb9552\" style=\"height: 525px; width: 100%;\" class=\"plotly-graph-div\"></div><script type=\"text/javascript\">require([\"plotly\"], function(Plotly) { window.PLOTLYENV=window.PLOTLYENV || {};window.PLOTLYENV.BASE_URL=\"https://plot.ly\";Plotly.newPlot(\"7e235684-e6e3-4958-902a-148b22eb9552\", [{\"y\": [0.05, 0.05555014705422354, 0.061716376754917215, 0.0685670760877903, 0.0761782231950977, 0.0846342300163428, 0.09402887846457647, 0.10446636052101872, 0.11606243378324244, 0.12894570528260402, 0.14325905780918444, 0.15916123456299414, 0.17682859970612186, 0.19645709434134995, 0.2182644096101492, 0.24249240101094138, 0.26940977071379074, 0.2993150476199145, 0.3325398982165634, 0.36945280494653254], \"type\": \"scatter\", \"x\": [0.0, 1.0526315789473684, 2.1052631578947367, 3.1578947368421053, 4.2105263157894735, 5.263157894736842, 6.315789473684211, 7.368421052631579, 8.421052631578947, 9.473684210526315, 10.526315789473683, 11.578947368421051, 12.631578947368421, 13.68421052631579, 14.736842105263158, 15.789473684210526, 16.842105263157894, 17.894736842105264, 18.94736842105263, 20.0]}], {}, {\"linkText\": \"Export to plot.ly\", \"showLink\": true})});</script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import impact.plotting as implot\n",
    "\n",
    "def exp_growth(t):\n",
    "    X0 = 0.05\n",
    "    mu = 0.1\n",
    "    return X0 * np.exp(mu*t)\n",
    "\n",
    "# def production(t, product_yield, biomass_concentration):\n",
    "#     rate = \n",
    "\n",
    "x = np.linspace(0,20,20)\n",
    "y = exp_growth(x)\n",
    "implot.plot([implot.go.Scatter(x=x,y=y)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `SingleTrial`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `ReplicateTrial`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `Experiment`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `Project`"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}