astir.data package
Module contents
Functions
from_csv_yaml | Create an Astir object from an expression CSV and marker YAML
from_csv_dir_yaml | Create an Astir object from a directory containing multiple CSV files
from_loompy_yaml | Create an Astir object from a loom file and a marker YAML
from_anndata_yaml | Create an Astir object from an anndata.AnnData file and a marker YAML
Classes
SCDataset | Container for single-cell proteomic data in the form of a PyTorch dataset
-
astir.data.from_csv_yaml(csv_input, marker_yaml, design_csv=None, random_seed=1234, dtype=torch.float64)
Create an Astir object from an expression CSV and marker YAML
- Parameters
csv_input (str) – Path to input CSV containing expression for cells (rows) by proteins (columns). The first column is the cell identifier, and the remaining column names are gene identifiers.
marker_yaml (str) – Path to input YAML file containing marker gene information. Should include cell_type and cell_state entries. See the documentation.
design_csv – Path to design matrix as a CSV. Rows should be cells, and columns covariates. The first column is the cell identifier, and the remaining column names are covariate identifiers.
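As a rough sketch of the two input files described above, the snippet below writes a tiny expression CSV (first column is the cell identifier, remaining columns are proteins) and a marker YAML using only the standard library. The protein names, the `cell_id` header, and the top-level YAML keys `cell_types` and `cell_states` are assumptions made for illustration; the exact key spelling should be checked against the astir documentation. The helper `load_astir` is hypothetical and only wraps the documented `from_csv_yaml` call, so it needs the astir package installed to run.

```python
import csv
import tempfile
from pathlib import Path


def write_example_inputs(out_dir):
    """Write a tiny expression CSV and marker YAML; return their paths."""
    out_dir = Path(out_dir)
    csv_path = out_dir / "expression.csv"
    yaml_path = out_dir / "markers.yml"

    # First column: cell identifier; remaining columns: one per protein.
    rows = [
        ["cell_id", "CD45", "CD3", "Ki67"],
        ["cell_1", 2.1, 1.7, 0.2],
        ["cell_2", 0.3, 0.1, 1.9],
    ]
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)

    # Assumed layout: one block of cell types, one block of cell states.
    yaml_path.write_text(
        "cell_types:\n"
        "  T cells: [CD45, CD3]\n"
        "cell_states:\n"
        "  proliferation: [Ki67]\n"
    )
    return csv_path, yaml_path


def load_astir(csv_path, yaml_path):
    """Hypothetical wrapper around the documented loader (requires astir)."""
    from astir.data import from_csv_yaml  # imported lazily on purpose

    return from_csv_yaml(str(csv_path), marker_yaml=str(yaml_path), random_seed=1234)


with tempfile.TemporaryDirectory() as tmp:
    csv_path, yaml_path = write_example_inputs(tmp)
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle))
    yaml_text = yaml_path.read_text()
```

With astir installed, `load_astir(csv_path, yaml_path)` would be called while the temporary directory still exists.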
-
astir.data.from_csv_dir_yaml(input_dir, marker_yaml, random_seed=1234, dtype=torch.float64)
Create an Astir object from a directory containing multiple CSV files
- Parameters
input_dir (str) – Path to a directory containing multiple CSV files, each in the format expected by from_csv_yaml
marker_yaml (str) – Path to input YAML file containing marker gene information. Should include cell_type and cell_state entries. See the documentation.
design_csv – Path to design matrix as a CSV. Rows should be cells, and columns covariates. The first column is the cell identifier, and the remaining column names are covariate identifiers.
-
astir.data.from_loompy_yaml(loom_file, marker_yaml, protein_name_attr='protein', cell_name_attr='cell_name', batch_name_attr='batch', random_seed=1234, dtype=torch.float64)
Create an Astir object from a loom file and a marker YAML
- Parameters
loom_file (str) – Path to a loom file, where rows correspond to proteins and columns to cells
marker_yaml (str) – Path to input YAML file containing marker gene information. Should include cell_type and cell_state entries. See the documentation.
protein_name_attr (str) – The attribute (key) in the row attributes that identifies the protein names (required to match with the marker gene information)
cell_name_attr (str) – The attribute (key) in the column attributes that identifies the name of each cell
batch_name_attr (str) – The attribute (key) in the column attributes that identifies the batch. If present, it is used to build a one-hot encoded design matrix that controls for batch.
random_seed (int) – The random seed to be used to initialize variables
- Returns
An object of class astir_bash.py.Astir using data imported from the loom file
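The one-hot encoding of batch labels mentioned for the batch attribute can be illustrated with a short standard-library sketch. This is not astir's own code: the function name `one_hot_design`, the alphabetical column ordering, and the absence of an intercept column are all assumptions made for the example.

```python
def one_hot_design(batch_labels):
    """Return (batch_names, rows); each row is a 0/1 indicator list per cell."""
    batch_names = sorted(set(batch_labels))
    column = {name: j for j, name in enumerate(batch_names)}
    rows = []
    for label in batch_labels:
        row = [0.0] * len(batch_names)
        row[column[label]] = 1.0  # mark the batch this cell belongs to
        rows.append(row)
    return batch_names, rows


# One batch label per cell, in cell order (hypothetical batch names).
batches = ["slide_A", "slide_B", "slide_A", "slide_C"]
names, design = one_hot_design(batches)
```

Each row of `design` has exactly one nonzero entry, so the matrix has one row per cell and one column per distinct batch.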
-
astir.data.from_anndata_yaml(anndata_file, marker_yaml, protein_name=None, cell_name=None, batch_name='batch', random_seed=1234, dtype=torch.float64)
Create an Astir object from an anndata.AnnData file and a marker YAML
- Parameters
anndata_file (str) – Path to an anndata.AnnData h5py file
marker_yaml (str) – Path to input YAML file containing marker gene information. Should include cell_type and cell_state entries. See the documentation.
protein_name (Optional[str]) – The column of adata.var containing protein names. If this is None, defaults to adata.var_names
cell_name (Optional[str]) – The column of adata.obs containing cell names. If this is None, defaults to adata.obs_names
batch_name (str) – The column of adata.obs containing batch names. If present, it is used to build a one-hot encoded design matrix that controls for batch.
random_seed (int) – The random seed to be used to initialize variables
- Returns
An object of class astir_bash.py.Astir using data imported from the anndata file
-
class astir.data.SCDataset(expr_input, marker_dict, include_other_column, design=None, dtype=torch.float64)
Bases: torch.utils.data.dataset.Dataset
Container for single-cell proteomic data in the form of a PyTorch dataset
- Parameters
expr_input (Union[DataFrame, Tuple[Union[array, Tensor], List[str], List[str]]]) – Input expression data, given as either a pd.DataFrame or a three-element tuple. For a pd.DataFrame, the index should hold the cell names and the columns the feature names. For a tuple, the form is Tuple[Union[np.array, torch.Tensor], List[str], List[str]]: the first element is the data itself as an np.array or torch.Tensor, the second is the list of column (feature) names, and the third is the list of index (cell) names.
marker_dict (Dict[str, List[str]]) – Marker dictionary containing cell type/state information. The dictionary maps the name of each cell type/state to its protein features.
design (Union[array, DataFrame, None]) – A design matrix
include_other_column (bool) – Should an additional ‘other’ column be included?
dtype (dtype) – torch datatype of the model
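To make the shape of `marker_dict` and the three-element `expr_input` tuple concrete, here is a minimal standard-library sketch that turns a marker dictionary into a binary feature-by-class matrix. It is an illustration only, not SCDataset's implementation: the function name `marker_matrix`, the example proteins, the nested lists standing in for an np.array, and the all-zero treatment of the 'Other' column are assumptions.

```python
def marker_matrix(features, marker_dict, include_other_column=False):
    """Return (classes, matrix) with matrix[i][j] == 1 if feature i marks class j."""
    classes = list(marker_dict)
    matrix = [
        [1 if feature in marker_dict[cls] else 0 for cls in classes]
        for feature in features
    ]
    if include_other_column:
        # Assumed behaviour: the extra class has no marker proteins.
        classes = classes + ["Other"]
        matrix = [row + [0] for row in matrix]
    return classes, matrix


# Three-element tuple form: (data, feature names, cell names).
# Nested lists stand in for the np.array / torch.Tensor the class expects.
expr_input = (
    [[2.1, 1.7, 0.2], [0.3, 0.1, 1.9]],
    ["CD45", "CD3", "Ki67"],
    ["cell_1", "cell_2"],
)
marker_dict = {"T cells": ["CD45", "CD3"], "Proliferating": ["Ki67"]}

classes, matrix = marker_matrix(expr_input[1], marker_dict, include_other_column=True)
```

Rows of `matrix` follow the feature order in the tuple's second element, and columns follow the insertion order of `marker_dict`.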
Methods
Get the cell names.
get_classes() | Get the cell types/states.
Get the design matrix.
Get the dtype of the SCDataset.
Return the expression data as a torch.Tensor.
Return the expression data as a pandas.DataFrame.
get_features() | Get the features (proteins).
Return the marker matrix as a torch.Tensor.
get_mu() | Get the mean expression of each protein as a torch.Tensor.
get_n_cells() | Get the number of cells.
get_n_classes() | Get the number of ‘classes’: either the number of cell types or cell states.
Get the number of features (proteins).
Return type: Tensor
normalize([percentile_lower, …]) | Normalize the expression data.
rescale() | Normalize the expression data.
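The per-protein mean described for get_mu() amounts to a column-wise average over cells. A minimal sketch with plain lists follows; `feature_means` is a hypothetical helper, and whether astir averages raw or transformed expression is not stated in this reference, so the values here are illustrative only.

```python
from statistics import fmean


def feature_means(expr_rows):
    """Average each column (protein) across all rows (cells)."""
    return [fmean(column) for column in zip(*expr_rows)]


# Two cells (rows) by three proteins (columns), hypothetical values.
expr = [
    [2.0, 1.0, 0.0],
    [4.0, 3.0, 1.0],
]
mu = feature_means(expr)
```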
-
get_classes()
Get the cell types/states.
- Returns
The cell type/state names (self._classes)
- Return type
List[str]
-
get_features()
Get the features (proteins).
- Returns
The feature (protein) names (self._m_features)
- Return type
List[str]
-
get_n_cells()
Get the number of cells.
- Return type
int
-
get_n_classes()
Get the number of ‘classes’: either the number of cell types or cell states.
- Return type
int