{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Adding custom functionality\n", "\n", "## Writing a new parser\n", "\n", "Here we will go through the process of writing a parser that parses data from an arbitrary format, in this case the format is provided in the file `new_parser_data.xlsx` and consists of one sheet with data grouped by time point, and another sheet with the identifiers\n", "\n", "Generally, it makes sense to start with one of the existing parsers as a guide. In this case, the `spectromax_OD` parser would be most relevant. The sample data is available in `tests/test_data/sample_parser_docs_data.xlsx`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\Naveen\\Anaconda3\\lib\\site-packages\\IPython\\html.py:14: ShimWarning: The `IPython.html` package has been deprecated since IPython 4.0. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.\n", " \"`IPython.html.widgets` has moved to `ipywidgets`.\", ShimWarning)\n" ] } ], "source": [ "from impact.parsers import Parser, parse_raw_identifier, parse_time_point_list\n", "from impact import TimePoint\n", "\n", "def new_parser(experiment, data, id_type='traverse'):\n", " # Define the layout of our data\n", " first_row_index = 0\n", " plate_size = 8\n", " spacing = 1\n", " time_row_col = [0,0]\n", " data_row_col = [1,0]\n", " \n", " # Define the type of data this parser accepts\n", " analyte_name = 'OD600'\n", " analyte_type = 'biomass'\n", " \n", " # In this case, we can first prepare the data by extracting the relevant information from each sheet \n", " unparsed_identifiers = data['identifiers']\n", " raw_data = data['data']\n", "\n", " # The data starts at (1,1) and is in a 8x12 format\n", " timepoint_list = []\n", "\n", " # We first parse the identifiers, as these can be recycled (the only thing that is changing is the time)\n", " identifiers = []\n", " for i, row in enumerate(unparsed_identifiers):\n", " parsed_row = []\n", " for j, data in enumerate(row):\n", " # Here we can implement logic to exclude any data which is not present, for example when a plate is not full\n", " # In this case, any cell which is empty, 0, or None will be excluded\n", " if unparsed_identifiers[i][j] not in ['', 0, '0', None]:\n", " temp_trial_identifier = parse_raw_identifier(unparsed_identifiers[i][j], id_type)\n", " parsed_row.append(temp_trial_identifier)\n", " else:\n", " parsed_row.append(None)\n", " identifiers.append(parsed_row)\n", "\n", " \n", " for start_row_index in range(first_row_index, len(raw_data), plate_size+spacing):\n", " if raw_data[start_row_index][0] != '~End':\n", " time = int(raw_data[start_row_index+time_row_col[0]][time_row_col[1]])\n", "\n", " # Define the data for a single plate, single timepoint\n", " plate_data = [row[2:14] for row in raw_data[start_row_index:start_row_index+plate_size]]\n", "\n", " # Load the data point by point\n", " for i, row in enumerate(plate_data):\n", " for j, data in enumerate(row):\n", " # Skip wells where no identifier is listed or no data present\n", " if identifiers[i][j] is not None and data not in [None,'']:\n", " ti = identifiers[i][j]\n", " ti.analyte_type, ti.analyte_name = analyte_type, analyte_name\n", " time_point = TimePoint(ti, time, float(data))\n", " timepoint_list.append(time_point)\n", " else:\n", " break\n", "\n", " # Finally we parse all of the time points (into their logical strucutre based on identifiers)\n", " # And add them to the experiment\n", " replicate_trial_list = parse_time_point_list(timepoint_list)\n", " for rep in replicate_trial_list:\n", " experiment.add_replicate_trial(rep)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the new parser defined, we can register it to the Parser class, and directly parse our data. The parser will return an `Experiment` instance, containing all the data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Importing data from ../tests/test_data/sample_parser_docs_data.xlsx...0.0s\n", "Parsing time point list...Parsed 246 time points in 0.2s\n", "Parsing analyte list...Parsed 82 analytes in 482.0ms\n", "Parsing single trial list...Parsed 32 replicates in 0.1s\n" ] } ], "source": [ "Parser.register_parser('my_new_format',new_parser)\n", "expt = Parser.parse_raw_data('my_new_format',file_name='../tests/test_data/sample_parser_docs_data.xlsx')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "strain media environment analytes\n", "----------------- -------------------- ------------- ----------\n", "3KO-D1 + pKDL071 Base + 1.0 a.u. aTc ['OD600']\n", "3KO-D1 + pKDL071 Base + 2.0 a.u. IPTG ['OD600']\n", "3KO-D28 + pKDL071 Base + 1.0 a.u. aTc ['OD600']\n", "3KO-D28 + pKDL071 Base + 2.0 a.u. IPTG ['OD600']\n", "3KO-D59 + pKDL071 Base + 1.0 a.u. aTc ['OD600']\n", "3KO-D59 + pKDL071 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT1 + pIMPT001 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT1 + pIMPT001 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT1 + pIMPT002 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT1 + pIMPT002 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT1 + pIMPT003 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT1 + pIMPT003 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT1 + pIMPT004 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT1 + pIMPT004 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT2 + pIMPT001 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT2 + pIMPT001 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT2 + pIMPT002 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT2 + pIMPT002 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT2 + pIMPT003 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT2 + pIMPT003 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT2 + pIMPT004 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT2 + pIMPT004 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT3 + pIMPT001 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT3 + pIMPT001 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT3 + pIMPT002 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT3 + pIMPT002 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT3 + pIMPT003 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT3 + pIMPT003 Base + 2.0 a.u. IPTG ['OD600']\n", "IMPT3 + pIMPT004 Base + 1.0 a.u. aTc ['OD600']\n", "IMPT3 + pIMPT004 Base + 2.0 a.u. IPTG ['OD600']\n", "dlacI + pKDL071 Base + 1.0 a.u. aTc ['OD600']\n", "dlacI + pKDL071 Base + 2.0 a.u. IPTG ['OD600']\n" ] } ], "source": [ "print(expt)" ] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 2 }