astir.models package

Module contents

Classes

CellTypeModel(dset[, random_seed, dtype])

Class to perform statistical inference to assign cells to cell types.

CellStateModel(dset[, const, dropout_rate, …])

Class to perform statistical inference on the activation of states (pathways) across cells.

AstirModel(dset, random_seed, dtype)

Abstract base class for the statistical inference models.

TypeRecognitionNet(C, G[, hidden_size])

Type Recognition Neural Network.

StateRecognitionNet(C, G[, const, …])

State Recognition Neural Network to get mean of z and standard deviation of z.

class astir.models.CellTypeModel(dset, random_seed=1234, dtype=torch.float64)[source]

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference to assign cells to cell types.

Parameters
  • dset (SCDataset) – the input gene expression dataframe

  • random_seed (int, optional) – the random seed for parameter initialization, defaults to 1234

  • dtype (torch.dtype, optional) – the data type of parameters, should be the same as dset, defaults to torch.float64

Methods

diagnostics(cell_type_assignments, alpha)

Run diagnostics on cell type assignments

fit([max_epochs, learning_rate, batch_size, …])

Return type: None

fit_yield_loss([max_epochs, learning_rate, …])

Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times.

get_assignment()

Get the final assignment of the dataset.

get_celltypes([threshold])

Get the most likely cell types

get_recognet()

Getter for the recognition net.

plot_clustermap([plot_name, threshold, figsize])

Save the heatmap of protein content in cells with cell types labeled.

predict(new_dset)

Feed new_dset to the recognition net to get a prediction.

diagnostics(cell_type_assignments, alpha)[source]

Run diagnostics on cell type assignments

See astir.Astir.diagnostics_celltype() for full documentation

Return type

DataFrame

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, msg='')[source]

Return type

None

fit_yield_loss(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, msg='')[source]

Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times.

Parameters
  • max_epochs (int) – number of train loop iterations, defaults to 50

  • learning_rate (float) – the learning rate, defaults to 0.001

  • batch_size (int) – the batch size, defaults to 128

  • delta_loss (float) – stops iteration once the loss rate reaches delta_loss, defaults to 0.001

  • msg (str) – iterator bar message, defaults to empty string

Return type

None

get_assignment()[source]

Get the final assignment of the dataset.

Returns

the final assignment of the dataset

Return type

np.array

get_celltypes(threshold=0.7)[source]

Get the most likely cell types

A cell is assigned to a cell type if its probability is greater than threshold. If no cell type has a probability higher than threshold, then “Unknown” is returned.

Parameters

threshold (float, optional) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7

Return type

DataFrame

Returns

a data frame with the most likely cell type for each cell
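
The thresholding rule described for get_celltypes can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name assign_cell_types, the list-of-dicts input, and the example cell type names are assumptions made for the example, and the real method returns a DataFrame.

```python
from typing import Dict, List


def assign_cell_types(
    probabilities: List[Dict[str, float]], threshold: float = 0.7
) -> List[str]:
    """Return the most likely type per cell, or "Unknown" if none exceeds threshold.

    Hypothetical helper: each dict maps cell type name -> probability for one cell.
    """
    labels = []
    for cell_probs in probabilities:
        # Pick the cell type with the highest probability for this cell.
        best_type = max(cell_probs, key=cell_probs.get)
        # Keep it only if its probability is strictly greater than threshold.
        if cell_probs[best_type] > threshold:
            labels.append(best_type)
        else:
            labels.append("Unknown")
    return labels


# Illustrative probabilities for two cells (made-up values).
example = [
    {"T cell": 0.9, "B cell": 0.1},
    {"T cell": 0.5, "B cell": 0.5},
]
print(assign_cell_types(example))
```

The first cell clears the default threshold of 0.7 and keeps its label; the second does not, so it is reported as “Unknown”.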

get_recognet()[source]

Getter for the recognition net.

Return type

TypeRecognitionNet

Returns

the trained recognition net

plot_clustermap(plot_name='celltype_protein_cluster.png', threshold=0.7, figsize=(7, 5))[source]

Save the heatmap of protein content in cells with cell types labeled.

Parameters
  • plot_name (str, optional) – name of the plot; a file extension (e.g. .png or .jpg) is required, defaults to “celltype_protein_cluster.png”

  • threshold (float, optional) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7

Return type

None

predict(new_dset)[source]

Feed new_dset to the recognition net to get a prediction.

Parameters

new_dset (pd.DataFrame) – the dataset to be predicted

Returns

the resulting cell type assignment

Return type

np.array

class astir.models.CellStateModel(dset, const=2, dropout_rate=0, batch_norm=False, random_seed=42, dtype=torch.float64)[source]

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference on the activation of states (pathways) across cells.

Methods

diagnostics()

Run diagnostics on cell state assignments

fit([max_epochs, learning_rate, batch_size, …])

Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times

get_correlations()

Return type: array

get_data()

Returns data parameter

get_final_mu_z([new_dset])

Returns the mean of the predicted z values for each core

get_losses()

Getter for losses

get_recognet()

Getter for the recognition net

get_scdataset()

Returns the input dataset

get_variables()

Returns all variables

is_converged()

Returns True if the model converged

Parameters
  • dset (SCDataset) – the input gene expression dataset

  • marker_dict – the gene marker dictionary

  • random_seed (int) – seed number to reproduce results, defaults to 42

  • dtype (dtype) – torch datatype to use in the model

diagnostics()[source]

Run diagnostics on cell state assignments

Return type

DataFrame

Returns

diagnostics

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')[source]

Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times

Parameters
  • max_epochs (int) – number of train loop iterations, defaults to 50

  • learning_rate (float) – the learning rate, defaults to 0.001

  • batch_size (int) – the batch size, defaults to 128

  • delta_loss (float) – stops iteration once the loss rate reaches delta_loss, defaults to 0.001

  • delta_loss_batch (int) – the batch size to consider delta loss, defaults to 10

  • msg (str) – iterator bar message, defaults to empty string

Return type

List[float]
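
The interplay of max_epochs, delta_loss and delta_loss_batch can be illustrated with a small stopping-rule sketch. The exact convergence test used by fit is not stated here, so the rule below (relative change between the mean loss of the last delta_loss_batch epochs and the mean of the window before it) is an assumption, and run_until_converged and train_one_epoch are hypothetical names.

```python
from typing import Callable, List


def run_until_converged(
    train_one_epoch: Callable[[], float],
    max_epochs: int = 50,
    delta_loss: float = 0.001,
    delta_loss_batch: int = 10,
) -> List[float]:
    """Call train_one_epoch until the loss plateaus or max_epochs is reached.

    Assumed rule (not taken from the library): stop when the relative change
    between two consecutive windows of delta_loss_batch losses is below delta_loss.
    """
    losses: List[float] = []
    for _ in range(max_epochs):
        losses.append(train_one_epoch())
        # Two full windows are needed before the change can be measured.
        if len(losses) >= 2 * delta_loss_batch:
            recent = sum(losses[-delta_loss_batch:]) / delta_loss_batch
            previous = (
                sum(losses[-2 * delta_loss_batch:-delta_loss_batch]) / delta_loss_batch
            )
            if previous != 0 and abs(previous - recent) / abs(previous) < delta_loss:
                break
    return losses


# A made-up loss curve that decays towards 1.0.
decaying = (1.0 + 100.0 * 0.5 ** k for k in range(1000))
losses = run_until_converged(lambda: next(decaying), delta_loss_batch=3)
print(len(losses), losses[-1])
```

Like fit, the sketch returns the list of per-epoch losses, so the caller can inspect how many epochs actually ran.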

get_correlations()[source]

Return type

array

get_data()[source]

Returns data parameter

Return type

Dict[str, Tensor]

Returns

self._data

get_final_mu_z(new_dset=None)[source]

Returns the mean of the predicted z values for each core

Parameters

new_dset (Optional[SCDataset]) – returns the predicted z values of this dataset on the existing model. If None, it predicts using the existing dataset

Return type

Tensor

Returns

the mean of the predicted z values for each core

get_losses()[source]

Getter for losses

Return type

array

Returns

a torch tensor of losses for each training iteration the model runs

get_recognet()[source]

Getter for the recognition net

Return type

StateRecognitionNet

Returns

the trained recognition net

get_scdataset()[source]

Returns the input dataset

Return type

SCDataset

Returns

self._dset

get_variables()[source]

Returns all variables

Return type

Dict[str, Tensor]

Returns

self._variables

is_converged()[source]

Returns True if the model converged

Return type

bool

Returns

self._is_converged

class astir.models.AstirModel(dset, random_seed, dtype)[source]

Bases: object

Abstract class to perform statistical inference. This class is the superclass of CellTypeModel and CellStateModel and is not meant to be instantiated directly.
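
The relationship between an abstract parent and its concrete models can be sketched with Python's abc module. Whether AstirModel itself is built on abc is not stated here, so treat this as an assumption; BaseModel and ToyModel are hypothetical stand-ins, and only the method names fit, get_losses and is_converged mirror the listing on this page.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseModel(ABC):
    """Hypothetical stand-in for an abstract model base class."""

    def __init__(self, random_seed: int) -> None:
        self._random_seed = random_seed
        self._losses: List[float] = []
        self._is_converged = False

    @abstractmethod
    def fit(self, max_epochs: int) -> None:
        """Subclasses provide the actual training loop."""

    def get_losses(self) -> List[float]:
        return self._losses

    def is_converged(self) -> bool:
        return self._is_converged


class ToyModel(BaseModel):
    """Concrete subclass with a fake training loop, for illustration only."""

    def fit(self, max_epochs: int) -> None:
        self._losses = [1.0 / (epoch + 1) for epoch in range(max_epochs)]
        self._is_converged = True


# The abstract parent cannot be instantiated; Python raises TypeError.
try:
    BaseModel(random_seed=1234)
except TypeError as err:
    print("cannot instantiate:", err)

model = ToyModel(random_seed=1234)
model.fit(max_epochs=3)
print(model.get_losses(), model.is_converged())
```

Shared getters live on the parent, while each subclass supplies its own fit.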

Methods

fit(max_epochs, learning_rate, batch_size, …)

Return type: None

get_data()

get_losses()

Getter for losses.

get_scdataset()

Getter for the SCDataset.

get_variables()

Returns all variables

is_converged()

Returns True if the model converged

fit(max_epochs, learning_rate, batch_size, delta_loss, msg)[source]

Return type

None

get_data()[source]

get_losses()[source]

Getter for losses.

Returns

self.losses

Return type

float

get_scdataset()[source]

Getter for the SCDataset.

Returns

self._dset

Return type

SCDataset

get_variables()[source]

Returns all variables

Returns

self._variables

is_converged()[source]

Returns True if the model converged

Return type

bool

Returns

self._is_converged

class astir.models.TypeRecognitionNet(C, G, hidden_size=10)[source]

Bases: torch.nn.modules.module.Module

Type Recognition Neural Network.

Parameters
  • C (int) – number of classes

  • G (int) – number of features

  • hidden_size (int, optional) – size of hidden layers, defaults to 10

Methods

forward(x)

One forward pass.

forward(x)[source]

One forward pass.

Parameters

x (torch.Tensor) – the input vector

Returns

the calculated cost value

Return type

torch.Tensor

class astir.models.StateRecognitionNet(C, G, const=2, dropout_rate=0, batch_norm=False)[source]

Bases: torch.nn.modules.module.Module

State Recognition Neural Network to get the mean and standard deviation of z. The neural network architecture looks like this: G -> const * C -> const * C -> G (for mu) or -> G (for std). Batch normalization layers can be applied after each activation output layer, and dropout can be applied to the activation units.

Parameters
  • C (int) – number of classes

  • G (int) – number of proteins

  • const (int) – multiplier for the hidden layer size; each hidden layer has const * C units

  • dropout_rate (float) – the dropout rate

  • batch_norm (bool) – apply batch normalization layers if True
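
The layer widths implied by the architecture string G -> const * C -> const * C -> G can be worked out with a few lines of plain Python. This is only a shape calculation based on that description, not the network itself; the function name and the returned dictionary keys are hypothetical.

```python
from typing import Dict, List, Tuple


def state_recognition_layer_shapes(
    C: int, G: int, const: int = 2
) -> Dict[str, List[Tuple[int, int]]]:
    """Return (in_features, out_features) pairs for each linear layer.

    Follows the documented layout: two shared hidden layers of width
    const * C, then one output head for mu and one for std, both of width G.
    """
    hidden = const * C
    return {
        "shared": [(G, hidden), (hidden, hidden)],
        "mu_head": [(hidden, G)],
        "std_head": [(hidden, G)],
    }


# Example with made-up sizes: 3 classes, 10 proteins, default const=2.
shapes = state_recognition_layer_shapes(C=3, G=10)
print(shapes)
```

With C=3 and const=2 the hidden width is 6, so the shared layers map 10 -> 6 -> 6 and each head maps 6 -> 10.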

Methods

forward(x)

One forward pass of the StateRecognitionNet

forward(x)[source]

One forward pass of the StateRecognitionNet

Return type

Tuple[Tensor, Tensor]