"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.heatmap(assignments)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"where each row corresponds to a cell, and each column to a cell type, with the entry being the probability of that cell belonging to a particular cell type.\n",
"\n",
"To fetch an array corresponding to the most likely cell type assignments, call"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
" cell_type\n",
"BaselTMA_SP43_115_X4Y8_1 Unknown\n",
"BaselTMA_SP43_115_X4Y8_2 Unknown\n",
"BaselTMA_SP43_115_X4Y8_3 Unknown\n",
"BaselTMA_SP43_115_X4Y8_4 Unknown\n",
"BaselTMA_SP43_115_X4Y8_5 Unknown\n",
"... ...\n",
"BaselTMA_SP43_115_X4Y8_4927 epithelial(basal)\n",
"BaselTMA_SP43_115_X4Y8_4928 Unknown\n",
"BaselTMA_SP43_115_X4Y8_4929 epithelial(basal)\n",
"BaselTMA_SP43_115_X4Y8_4930 epithelial(luminal)\n",
"BaselTMA_SP43_115_X4Y8_4931 epithelial(luminal)\n",
"\n",
"[4931 rows x 1 columns]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ast.get_celltypes()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cell type diagnostics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is important to run diagnostics to ensure that each cell type expresses its markers at higher levels than other cell types do. To do this, run the `diagnostics_celltype()` function, which reports an issue whenever a cell type does not express one of its markers significantly higher than an alternative cell type (for which that protein is not a marker):"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
" feature should be expressed higher in than \\\n",
"0 Fibronectin stromal B cells \n",
"1 Vimentin stromal B cells \n",
"2 CD20 B cells stromal \n",
"3 CD20 B cells T cells \n",
"4 CD20 B cells macrophage \n",
"5 CD20 B cells epithelial(basal) \n",
"6 CD20 B cells epithelial(luminal) \n",
"7 CD20 B cells Other \n",
"8 CD45 B cells stromal \n",
"9 CD45 B cells epithelial(basal) \n",
"\n",
" mean cell type 1 mean cell type 2 p-value \\\n",
"0 2.045195 1.565420 inf \n",
"1 2.994335 1.095477 inf \n",
"2 0.403677 0.081944 inf \n",
"3 0.403677 0.123195 inf \n",
"4 0.403677 0.212008 inf \n",
"5 0.403677 0.118228 inf \n",
"6 0.403677 0.131991 inf \n",
"7 0.403677 0.017356 inf \n",
"8 0.077596 0.227257 inf \n",
"9 0.077596 0.182078 inf \n",
"\n",
" note \n",
"0 Only 1 cell in a type: comparison not possible \n",
"1 Only 1 cell in a type: comparison not possible \n",
"2 Only 1 cell in a type: comparison not possible \n",
"3 Only 1 cell in a type: comparison not possible \n",
"4 Only 1 cell in a type: comparison not possible \n",
"5 Only 1 cell in a type: comparison not possible \n",
"6 Only 1 cell in a type: comparison not possible \n",
"7 Only 1 cell in a type: comparison not possible \n",
"8 Only 1 cell in a type: comparison not possible \n",
"9 Only 1 cell in a type: comparison not possible "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ast.diagnostics_celltype().head(n=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
".. note:: \n",
" In this tutorial, we end up with many \"Only 1 cell in a type: comparison not possible\" notes - this is simply because the small dataset size results in only a single cell assigned to many types, making statistical testing infeasible."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling `ast.diagnostics_celltype()` returns a `pd.DataFrame`, where each row corresponds to a particular protein and two cell types, with a warning if the protein is not expressed at higher levels in the cell type for which it is a marker than in the cell type for which it is not."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The diagnostics proceed as follows:\n",
"\n",
"1. Iterate through every cell type and every marker for that cell type.\n",
"\n",
"2. Given a cell type *c* and marker *g*, find the set of cell types *D* that don't have *g* as a marker.\n",
"\n",
"3. For each cell type *d* in *D*, perform a t-test between the expression of marker *g* in *c* vs *d*.\n",
"\n",
"4. If *g* is not expressed significantly higher in *c* than in *d* (at significance level *alpha*), output a diagnostic explaining this for further investigation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If multiple issues are found, the markers and cell types may need to be refined."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Fitting cell state "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As before, to fit the cell state model, call"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/jinelles.h/Documents/Camlab/astir-top-level/astir/astir/models/cellstate.py:176: UserWarning: Delta loss batch size is greater than the number of epochs\n",
" warnings.warn(\"Delta loss batch size is greater than the number of epochs\")\n",
"training restart 1/5: 100%|██████████| 5/5 [ 3.41s/epochs, current loss: 59.8]\n",
"training restart 2/5: 100%|██████████| 5/5 [ 3.40s/epochs, current loss: 98.0] \n",
"training restart 3/5: 100%|██████████| 5/5 [ 3.51s/epochs, current loss: 78.2]\n",
"training restart 4/5: 100%|██████████| 5/5 [ 3.48s/epochs, current loss: 88.0] \n",
"training restart 5/5: 100%|██████████| 5/5 [ 3.37s/epochs, current loss: 60.0]\n",
"training restart (final): 100%|██████████| 50/50 [ 3.60s/epochs, current loss: 14.5]\n",
"/Users/jinelles.h/Documents/Camlab/astir-top-level/astir/astir/astir.py:268: UserWarning: Maximum epochs reached. More iteration may be needed to complete the training.\n",
" warnings.warn(msg)\n"
]
}
],
"source": [
"ast.fit_state(batch_size = 1024, learning_rate=1e-3, max_epochs=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and similarly plot the losses via"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 0, 'Epoch')"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(\n",
" states['RTK_signalling'],\n",
" ast.get_state_dataset().get_exprs_df()['Her2']\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cell state diagnostics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is important to run diagnostics on the cell state model for the same reasons\n",
"stated for the cell type model. `Astir.diagnostics_cellstate()` flags any pathway\n",
"and non-marker protein pair whose correlation is higher than that of the\n",
"pathway's least-correlated marker protein."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
" pathway protein A correlation of protein A protein B \\\n",
"0 RTK_signalling EGFR 0.823298 Cleaved Caspase3 \n",
"1 RTK_signalling EGFR 0.823298 cleaved PARP \n",
"2 cell_growth Ki-67 0.058879 Cleaved Caspase3 \n",
"3 cell_growth Ki-67 0.058879 EGFR \n",
"4 cell_growth Ki-67 0.058879 Her2 \n",
"5 cell_growth Ki-67 0.058879 cleaved PARP \n",
"6 cell_growth Ki-67 0.058879 phospho S6 \n",
"7 mTOR_signalling phospho S6 0.699387 EGFR \n",
"8 mTOR_signalling phospho S6 0.699387 Her2 \n",
"\n",
" correlation of protein B note \n",
"0 0.831128 EGFR is marker for RTK_signalling but Cleaved ... \n",
"1 0.831128 EGFR is marker for RTK_signalling but cleaved ... \n",
"2 0.862459 Ki-67 is marker for cell_growth but Cleaved Ca... \n",
"3 0.897650 Ki-67 is marker for cell_growth but EGFR isn't \n",
"4 0.750985 Ki-67 is marker for cell_growth but Her2 isn't \n",
"5 0.862459 Ki-67 is marker for cell_growth but cleaved PA... \n",
"6 0.272989 Ki-67 is marker for cell_growth but phospho S6... \n",
"7 0.720013 phospho S6 is marker for mTOR_signalling but E... \n",
"8 0.757867 phospho S6 is marker for mTOR_signalling but H... "
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ast.diagnostics_cellstate().head(n=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling `ast.diagnostics_cellstate()` returns a `pd.DataFrame`, where each\n",
"row corresponds to a pathway (cell state), one of its marker proteins (protein A)\n",
"and a non-marker protein (protein B), with a note when the non-marker protein is\n",
"more strongly correlated with the pathway than the marker protein."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The diagnostics proceed as follows:\n",
"\n",
"1. Compute the correlations between all cell states and proteins.\n",
"\n",
"2. For each cell state *c*, find the smallest correlation between *c* and any of its markers.\n",
"\n",
"3. For each cell state *c* and each non-marker protein *g*, check whether the correlation between *c* and *g* is\n",
"larger than the smallest marker correlation for *c* found in step 2.\n",
"\n",
"4. Any *c* and *g* pairs found in step 3 are included in the output of\n",
"`Astir.diagnostics_cellstate()`, together with an explanation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If multiple issues are found, the markers and cell states may need to be refined.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Saving results "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both cell type and cell state information can easily be saved to disk via"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"ast.type_to_csv(\"data/cell-types.csv\")\n",
"ast.state_to_csv(\"data/cell-states.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
",stromal,B cells,T cells,macrophage,epithelial(basal),epithelial(luminal),Other\r\n",
"BaselTMA_SP43_115_X4Y8_1,5.9812641356773485e-05,0.43514272442399615,4.4133780759206555e-06,4.778704083811438e-05,0.00590960852651017,1.6351770806453748e-06,0.5588340188121422\r\n",
"BaselTMA_SP43_115_X4Y8_2,7.080264289768121e-05,0.38496458175145926,5.889272783675721e-06,4.3142984449199766e-05,0.04471036731616506,7.2442437475747445e-06,0.5701979717884975\r\n"
]
}
],
"source": [
"!head -n 3 data/cell-types.csv"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
",RTK_signalling,cell_growth,mTOR_signalling,apoptosis\r\n",
"BaselTMA_SP43_115_X4Y8_1,0.2191776340692214,0.17884886731101723,0.13343558969907734,0.1441052515306596\r\n",
"BaselTMA_SP43_115_X4Y8_2,0.29646752007327026,0.11967817733248733,0.20014708539822942,0.1579368118796618\r\n"
]
}
],
"source": [
"!head -n 3 data/cell-states.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"where the first (unnamed) column always corresponds to the cell name/ID."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Accessing internal functions and data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data stored in `astir` objects is in the form of an `SCDataSet`. These can be retrieved via"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"celltype_data = ast.get_type_dataset()\n",
"celltype_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and similarly for cell state via `ast.get_state_dataset()`.\n",
"\n",
"These datasets have several helper functions to retrieve information about the dataset:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['BaselTMA_SP43_115_X4Y8_1',\n",
" 'BaselTMA_SP43_115_X4Y8_2',\n",
" 'BaselTMA_SP43_115_X4Y8_3',\n",
" 'BaselTMA_SP43_115_X4Y8_4']"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"celltype_data.get_cell_names()[0:4] # cell names"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['stromal',\n",
" 'B cells',\n",
" 'T cells',\n",
" 'macrophage',\n",
" 'epithelial(basal)',\n",
" 'epithelial(luminal)']"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"celltype_data.get_classes() # cell type names"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6\n",
"14\n"
]
}
],
"source": [
"print(celltype_data.get_n_classes()) # number of cell types\n",
"print(celltype_data.get_n_features()) # number of features / proteins"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[4.2121e-02, 5.2044e-02, 1.7348e-03, ..., 6.1314e-01, 4.4827e-02,\n",
" 3.8841e-01],\n",
" [0.0000e+00, 3.4770e-02, 0.0000e+00, ..., 9.4025e-01, 1.9424e-02,\n",
" 5.6716e-01],\n",
" [1.8200e-02, 7.8596e-02, 0.0000e+00, ..., 6.2852e-01, 4.0905e-02,\n",
" 8.4946e-01],\n",
" ...,\n",
" [1.4403e-01, 7.0877e-02, 2.0325e-01, ..., 1.7303e+00, 2.1434e-01,\n",
" 9.4889e-01],\n",
" [3.6400e-02, 9.9307e-02, 1.1815e-01, ..., 1.6467e+00, 1.3463e-01,\n",
" 1.9238e+00],\n",
" [5.4069e-02, 1.1861e-01, 5.3894e-02, ..., 1.4265e+00, 4.9443e-01,\n",
" 1.6126e+00]], dtype=torch.float64)"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"celltype_data.get_exprs() # Return a torch tensor corresponding to the expression data used"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
" CD20 CD3 CD45 CD68 \\\n",
"BaselTMA_SP43_115_X4Y8_1 0.008424 0.010409 0.000347 0.008586 \n",
"BaselTMA_SP43_115_X4Y8_2 0.000000 0.006954 0.000000 0.016915 \n",
"BaselTMA_SP43_115_X4Y8_3 0.003640 0.015719 0.000000 0.008586 \n",
"BaselTMA_SP43_115_X4Y8_4 0.000000 0.003740 0.003136 0.021657 \n",
"BaselTMA_SP43_115_X4Y8_5 0.019956 0.014758 0.018645 0.024490 \n",
"... ... ... ... ... \n",
"BaselTMA_SP43_115_X4Y8_4927 0.000000 0.007295 0.028565 0.076878 \n",
"BaselTMA_SP43_115_X4Y8_4928 0.000000 0.020123 0.017928 0.018262 \n",
"BaselTMA_SP43_115_X4Y8_4929 0.028802 0.014175 0.040638 0.100518 \n",
"BaselTMA_SP43_115_X4Y8_4930 0.007280 0.019860 0.023629 0.089328 \n",
"BaselTMA_SP43_115_X4Y8_4931 0.010813 0.023720 0.010779 0.096799 \n",
"\n",
" Cytokeratin 14 Cytokeratin 19 Cytokeratin 5 \\\n",
"BaselTMA_SP43_115_X4Y8_1 0.009215 0.015349 0.022756 \n",
"BaselTMA_SP43_115_X4Y8_2 0.011813 0.011297 0.019773 \n",
"BaselTMA_SP43_115_X4Y8_3 0.005251 0.025360 0.033187 \n",
"BaselTMA_SP43_115_X4Y8_4 0.014276 0.010551 0.004497 \n",
"BaselTMA_SP43_115_X4Y8_5 0.000000 0.047767 0.038716 \n",
"... ... ... ... \n",
"BaselTMA_SP43_115_X4Y8_4927 0.009306 0.000123 0.027927 \n",
"BaselTMA_SP43_115_X4Y8_4928 0.000000 0.019686 0.000000 \n",
"BaselTMA_SP43_115_X4Y8_4929 0.010249 0.040175 0.027628 \n",
"BaselTMA_SP43_115_X4Y8_4930 0.057028 0.054079 0.021476 \n",
"BaselTMA_SP43_115_X4Y8_4931 0.068105 0.026760 0.030130 \n",
"\n",
" Cytokeratin 7 Cytokeratin 8/18 E-Cadherin \\\n",
"BaselTMA_SP43_115_X4Y8_1 0.014714 0.022104 0.159329 \n",
"BaselTMA_SP43_115_X4Y8_2 0.003848 0.030560 0.193268 \n",
"BaselTMA_SP43_115_X4Y8_3 0.062041 0.050745 0.220020 \n",
"BaselTMA_SP43_115_X4Y8_4 0.018720 0.012455 0.238691 \n",
"BaselTMA_SP43_115_X4Y8_5 0.000000 0.025077 0.242644 \n",
"... ... ... ... \n",
"BaselTMA_SP43_115_X4Y8_4927 0.000000 0.000000 0.339496 \n",
"BaselTMA_SP43_115_X4Y8_4928 0.000000 0.031084 0.180707 \n",
"BaselTMA_SP43_115_X4Y8_4929 0.009111 0.011427 0.415244 \n",
"BaselTMA_SP43_115_X4Y8_4930 0.084552 0.046873 0.368578 \n",
"BaselTMA_SP43_115_X4Y8_4931 0.073124 0.032700 0.360796 \n",
"\n",
" Fibronectin Her2 Vimentin pan Cytokeratin \n",
"BaselTMA_SP43_115_X4Y8_1 0.064861 0.122322 0.008965 0.077604 \n",
"BaselTMA_SP43_115_X4Y8_2 0.076731 0.186959 0.003885 0.113190 \n",
"BaselTMA_SP43_115_X4Y8_3 0.051344 0.125376 0.008181 0.169086 \n",
"BaselTMA_SP43_115_X4Y8_4 0.132006 0.105818 0.018769 0.083547 \n",
"BaselTMA_SP43_115_X4Y8_5 0.124412 0.098638 0.000000 0.059350 \n",
"... ... ... ... ... \n",
"BaselTMA_SP43_115_X4Y8_4927 0.046690 0.252541 0.034240 0.079310 \n",
"BaselTMA_SP43_115_X4Y8_4928 0.082254 0.073101 0.022108 0.157243 \n",
"BaselTMA_SP43_115_X4Y8_4929 0.167446 0.339493 0.042855 0.188656 \n",
"BaselTMA_SP43_115_X4Y8_4930 0.152866 0.323661 0.026924 0.375847 \n",
"BaselTMA_SP43_115_X4Y8_4931 0.230238 0.281564 0.098726 0.317179 \n",
"\n",
"[4931 rows x 14 columns]"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ast.get_type_dataset().get_exprs_df()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"source": [
"## 6. Saving models "
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"source": [
"After fitting the models, we can save the cell type/state assignments, the losses, the parameters (e.g. `mu`, `rho`, `log_sigma`, etc.) and the run information (e.g. `batch_size`, `learning_rate`, `delta_loss`, etc.) to an hdf5 file."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"ast.save_models(\"data/astir_summary.hdf5\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The hierarchy of the hdf5 file would be:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Only the models that have been trained are saved (`CellTypeModel`, `CellStateModel`, or both). If the function is called before any model is trained, an exception is raised. Data saved in the file is either `int` or `np.array`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Plot clustermap of expression data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After fitting the cell type model, we can also plot a heatmap of protein expression with cells clustered by type. The heatmap is saved to the location given by `plot_name`, which defaults to `\"./celltype_protein_cluster.png\"`."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"ast.type_clustermap(plot_name=\"./img/celltype_protein_cluster.png\", threshold = 0.7, figsize=(7, 5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: `threshold` is the probability threshold above which a cell is assigned to a cell type; it defaults to 0.7."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Hierarchical model specification "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the marker YAML file, the user can also add a section called `hierarchy`, which specifies the hierarchical structure of cell types. Here's an example:\n",
"```\n",
"hierarchy:\n",
"  immune:\n",
"    - B cells\n",
"    - T cells\n",
"    - macrophage\n",
"  epithelial:\n",
"    - epithelial(basal)\n",
"    - epithelial(luminal)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some notes:\n",
"1. The section is accessed by the key `hierarchy`.\n",
"2. Within the section, the higher-level cell type names are the keys.\n",
"3. Each value in the section must also exist as a cell type name in the `cell_types` section (e.g. if we have `\"B cells\"` in `marker[\"hierarchy\"][\"immune\"]`, we should also be able to get `marker[\"cell_types\"][\"B cells\"]`).\n",
"\n",
"This section can be used to summarize the cell type assignments at a higher hierarchical level (e.g. a cell is predicted as \"immune\" instead of \"B cells\" or \"T cells\")."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" immune epithelial\n",
"BaselTMA_SP43_115_X4Y8_1 0.435195 0.005911\n",
"BaselTMA_SP43_115_X4Y8_2 0.385014 0.044718\n",
"BaselTMA_SP43_115_X4Y8_3 0.188510 0.330741\n",
"BaselTMA_SP43_115_X4Y8_4 0.444039 0.011956\n",
"BaselTMA_SP43_115_X4Y8_5 0.468241 0.017085"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hierarchy_probs = ast.assign_celltype_hierarchy()\n",
"hierarchy_probs.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this clearer, here is a heatmap of the cell assignments at the higher hierarchical level:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each higher-level probability is calculated by summing the assignment probabilities of the cell types that fall under the same hierarchy group."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}