astir.models package

Module contents

Classes

`CellTypeModel`(dset[, random_seed, dtype])	Class to perform statistical inference to assign cells to cell types.
`CellStateModel`(dset[, const, dropout_rate, …])	Class to perform statistical inference on the activation of states (pathways) across cells.
`AstirModel`(dset, random_seed, dtype)	Abstract class to perform statistical inference.
`TypeRecognitionNet`(C, G[, hidden_size])	Type Recognition Neural Network.
`StateRecognitionNet`(C, G[, const, …])	State Recognition Neural Network to get mean of z and standard deviation of z.

class astir.models.CellTypeModel(dset, random_seed=1234, dtype=torch.float64)

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference to assign cells to cell types.

Parameters

dset (SCDataset) – the input gene expression dataframe
random_seed (int, optional) – the random seed for parameter initialization, defaults to 1234
dtype (torch.dtype, optional) – the data type of parameters, should be the same as dset, defaults to torch.float64

Methods

`diagnostics`(cell_type_assignments, alpha)	Run diagnostics on cell type assignments
`fit`([max_epochs, learning_rate, batch_size, …])	Trains the model. Return type: `None`
`fit_yield_loss`([max_epochs, learning_rate, …])	Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times
`get_assignment`()	Get the final assignment of the dataset.
`get_celltypes`([threshold])	Get the most likely cell types
`get_recognet`()	Getter for the recognition net.
`plot_clustermap`([plot_name, threshold, figsize])	Save the heatmap of protein content in cells with cell types labeled.
`predict`(new_dset)	Feed new_dset to the recognition net to get a prediction.

diagnostics(cell_type_assignments, alpha)

Run diagnostics on cell type assignments

See astir.Astir.diagnostics_celltype() for full documentation

Return type: DataFrame

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, msg='')

Return type: None

fit_yield_loss(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, msg='')

Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times.

Parameters

max_epochs (int) – number of train loop iterations, defaults to 50
learning_rate (float) – the learning rate, defaults to 0.001
batch_size (int) – the batch size, defaults to 128
delta_loss (float) – stops iteration once the loss rate reaches delta_loss, defaults to 0.001
msg (str) – iterator bar message, defaults to empty string

Return type

None

get_assignment()

Get the final assignment of the dataset.

Returns: the final assignment of the dataset
Return type: np.array

get_celltypes(threshold=0.7)

Get the most likely cell types

A cell is assigned to a cell type if the probability is greater than threshold. If no cell type has a probability higher than threshold, then “Unknown” is returned.

Parameters: threshold – the probability threshold above which a cell is assigned to a cell type
Return type: DataFrame
Returns: a data frame with the most likely cell type for each cell
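
The thresholding rule described above can be illustrated with a minimal, library-free sketch. It does not call astir. The function name `most_likely_celltypes` and the list-of-dicts input are hypothetical stand-ins for the model's assignment probabilities, chosen only to show how a cell falls back to "Unknown".

```python
from typing import Dict, List


def most_likely_celltypes(
    probabilities: List[Dict[str, float]], threshold: float = 0.7
) -> List[str]:
    """Return the most likely cell type per cell, or "Unknown"."""
    labels = []
    for cell_probs in probabilities:
        # Pick the cell type with the highest assignment probability.
        best_type = max(cell_probs, key=cell_probs.get)
        # Keep it only if its probability is strictly greater than the threshold.
        if cell_probs[best_type] > threshold:
            labels.append(best_type)
        else:
            labels.append("Unknown")
    return labels


# Hypothetical probabilities for two cells (each row sums to 1).
example = [
    {"T cell": 0.92, "B cell": 0.05, "Other": 0.03},
    {"T cell": 0.45, "B cell": 0.40, "Other": 0.15},
]
labels = most_likely_celltypes(example)
```

With the default threshold of 0.7, the first cell is labeled "T cell" and the second, whose best probability is only 0.45, is labeled "Unknown".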

get_recognet()

Getter for the recognition net.

Return type: TypeRecognitionNet
Returns: the trained recognition net

plot_clustermap(plot_name='celltype_protein_cluster.png', threshold=0.7, figsize=(7, 5))

Save the heatmap of protein content in cells with cell types labeled.

Parameters

plot_name (str, optional) – name of the plot; a file extension (e.g. .png or .jpg) is required, defaults to “celltype_protein_cluster.png”
threshold (float, optional) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7

Return type

None

predict(new_dset)

Feed new_dset to the recognition net to get a prediction.

Parameters: new_dset (pd.DataFrame) – the dataset to be predicted
Returns: the resulting cell type assignment
Return type: np.array

class astir.models.CellStateModel(dset, const=2, dropout_rate=0, batch_norm=False, random_seed=42, dtype=torch.float64)

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference on the activation of states (pathways) across cells.

Methods

`diagnostics`()	Run diagnostics on cell state assignments
`fit`([max_epochs, learning_rate, batch_size, …])	Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times
`get_correlations`()	Return type: `array`
`get_data`()	Returns data parameter
`get_final_mu_z`([new_dset])	Returns the mean of the predicted z values for each core
`get_losses`()	Getter for losses
`get_recognet`()	Getter for the recognition net
`get_scdataset`()	Returns the input dataset
`get_variables`()	Returns all variables
`is_converged`()	Returns True if the model converged

Parameters

df_gex – the input gene expression dataframe
marker_dict – the gene marker dictionary
random_seed (int) – seed number to reproduce results, defaults to 42
dtype (dtype) – torch datatype to use in the model

diagnostics()

Run diagnostics on cell state assignments

Return type: DataFrame
Returns: diagnostics

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')

Runs train loops until the convergence reaches delta_loss for delta_loss_batch sizes or for max_epochs number of times

Parameters

max_epochs (int) – number of train loop iterations, defaults to 50
learning_rate (float) – the learning rate, defaults to 0.001
batch_size (int) – the batch size, defaults to 128
delta_loss (float) – stops iteration once the loss rate reaches delta_loss, defaults to 0.001
delta_loss_batch (int) – the batch size to consider delta loss, defaults to 10
msg (str) – iterator bar message, defaults to empty string

Return type

List[float]
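
The stopping rule described above (stop once the loss change falls to delta_loss over a window of delta_loss_batch, or after max_epochs) can be sketched without astir or torch. The exact criterion astir applies is not stated here, so the relative-change test below is an assumption, and `train_until_converged` and `fake_epoch` are hypothetical names used only for illustration.

```python
import itertools
from typing import Callable, List


def train_until_converged(
    run_epoch: Callable[[], float],
    max_epochs: int = 50,
    delta_loss: float = 0.001,
    delta_loss_batch: int = 10,
) -> List[float]:
    """Call run_epoch until the loss plateaus or max_epochs is reached."""
    losses: List[float] = []
    for _ in range(max_epochs):
        losses.append(run_epoch())
        # Only test convergence once a full window of past losses exists.
        if len(losses) > delta_loss_batch:
            prev = losses[-delta_loss_batch - 1]
            curr = losses[-1]
            # Assumed criterion: relative loss change across the window.
            if abs(prev - curr) / max(abs(prev), 1e-12) < delta_loss:
                break
    return losses


# A stand-in for one training epoch: the loss decays geometrically towards 1.0.
counter = itertools.count()


def fake_epoch() -> float:
    n = next(counter)
    return 1.0 + 10.0 * 0.5 ** n


losses = train_until_converged(
    fake_epoch, max_epochs=50, delta_loss=0.001, delta_loss_batch=3
)
```

In this toy run the loop stops well before max_epochs because the loss has flattened. A loss that keeps changing by more than delta_loss would instead run for all max_epochs iterations.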

get_correlations()

Return type: array

get_data()

Returns data parameter

Return type: Dict[str, Tensor]
Returns: self._data

get_final_mu_z(new_dset=None)

Returns the mean of the predicted z values for each core

Parameters: new_dset (Optional[SCDataset]) – returns the predicted z values of this dataset on the existing model. If None, it predicts using the existing dataset
Return type: Tensor
Returns: the mean of the predicted z values for each core

get_losses()

Getter for losses

Return type: array
Returns: a torch tensor of losses for each training iteration the model runs

get_recognet()

Getter for the recognition net

Return type: StateRecognitionNet
Returns: the trained recognition net

get_scdataset()

Returns the input dataset

Return type: SCDataset
Returns: self._dset

get_variables()

Returns all variables

Return type: Dict[str, Tensor]
Returns: self._variables

is_converged()

Returns True if the model converged

Return type: bool
Returns: self._is_converged

class astir.models.AstirModel(dset, random_seed, dtype)

Bases: object

Abstract class to perform statistical inference. This class is the superclass of CellTypeModel and CellStateModel and is not supposed to be instantiated.
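
The "shared base class that is not instantiated directly" pattern can be sketched with Python's standard `abc` module. Whether astir itself uses `abc` is not stated here, so treat this as an assumption. `BaseModelSketch` and `ToyModel` are hypothetical names, and only a few of the getters listed below are mirrored.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BaseModelSketch(ABC):
    """Holds shared state; subclasses supply the training logic."""

    def __init__(self, dset: Any, random_seed: int) -> None:
        self._dset = dset
        self._random_seed = random_seed
        self._losses: List[float] = []
        self._is_converged = False

    @abstractmethod
    def fit(self, max_epochs, learning_rate, batch_size, delta_loss, msg) -> None:
        """Each concrete model must implement its own training loop."""

    def get_scdataset(self) -> Any:
        return self._dset

    def is_converged(self) -> bool:
        return self._is_converged


class ToyModel(BaseModelSketch):
    def fit(self, max_epochs=50, learning_rate=0.001, batch_size=128,
            delta_loss=0.001, msg="") -> None:
        # A real model would run its training loop here.
        self._is_converged = True


# The abstract base cannot be instantiated because fit() is abstract.
try:
    BaseModelSketch(dset=[1, 2, 3], random_seed=1234)
    instantiated = True
except TypeError:
    instantiated = False

toy = ToyModel(dset=[1, 2, 3], random_seed=1234)
toy.fit()
```

The concrete subclass inherits the getters unchanged and only overrides `fit`, which is the division of labor the class description suggests.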

Methods

`fit`(max_epochs, learning_rate, batch_size, …)	Return type: `None`
`get_data`()
`get_losses`()	Getter for losses.
`get_scdataset`()	Getter for the SCDataset.
`get_variables`()	Returns all variables
`is_converged`()	Returns True if the model converged

fit(max_epochs, learning_rate, batch_size, delta_loss, msg)

Return type: None

get_data()

get_losses()

Getter for losses.

Returns: self.losses
Return type: float

get_scdataset()

Getter for the SCDataset.

Returns: self._dset
Return type: SCDataset

get_variables()

Returns all variables

Returns: self._variables

is_converged()

Returns True if the model converged

Return type: bool
Returns: self._is_converged

class astir.models.TypeRecognitionNet(C, G, hidden_size=10)

Bases: torch.nn.modules.module.Module

Type Recognition Neural Network.

Parameters

C (int) – number of classes
G (int) – number of features
hidden_size – size of hidden layers

Methods

`forward`(x)	One forward pass.

forward(x)

One forward pass.

Parameters: x (torch.Tensor) – the input vector
Returns: the calculated cost value
Return type: torch.Tensor

class astir.models.StateRecognitionNet(C, G, const=2, dropout_rate=0, batch_norm=False)

Bases: torch.nn.modules.module.Module

State Recognition Neural Network to get the mean and standard deviation of z. The neural network architecture looks like this: G -> const * C -> const * C -> G (for mu) or -> G (for std), with batch normalization layers after each activation output layer and dropout applied to the activation units.

Parameters

C (int) – number of classes
G (int) – number of proteins
const (int) – the size of each hidden layer is const times C
dropout_rate (float) – the dropout rate
batch_norm (bool) – apply batch normalization layers if True
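
To make the layer sizes concrete, the sketch below computes the (input, output) dimensions implied by the architecture string G -> const * C -> const * C -> G. It is plain Python and does not build a torch module. The helper name `state_recognition_layer_shapes` is hypothetical, and reading the diagram as two shared hidden layers followed by separate mu and std output layers is an assumption.

```python
from typing import Dict, List, Tuple


def state_recognition_layer_shapes(
    C: int, G: int, const: int = 2
) -> Dict[str, List[Tuple[int, int]]]:
    """Return (in_features, out_features) pairs for each group of layers."""
    hidden = const * C  # hidden layer width scales with the number of classes
    return {
        # Two hidden layers shared by both outputs: G -> const*C -> const*C.
        "hidden": [(G, hidden), (hidden, hidden)],
        # Separate output layers, each mapping back to G values.
        "mu": [(hidden, G)],
        "std": [(hidden, G)],
    }


shapes = state_recognition_layer_shapes(C=3, G=10, const=2)
```

For C=3, G=10 and const=2 the hidden width is 6, so the shapes are 10 -> 6 -> 6, followed by a 6 -> 10 layer for each of mu and std.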

Methods

`forward`(x)	One forward pass of the StateRecognitionNet

forward(x)

One forward pass of the StateRecognitionNet

Return type: Tuple[Tensor, Tensor]