2. SCIV API
Download (.dl)
Data download interface, used to download single-cell data and trait files.
- sciv.dl.download_sc_atac_file(is_force: bool = False) None
Download the scATAC-seq file from the remote server to the local cache.
Parameters
- is_forcebool, optional
If True, force re-download even if the file exists. Default is False.
Examples
>>> sciv.dl.download_sc_atac_file()
- sciv.dl.download_trait_file(is_force: bool = False) None
Download the trait file from the remote server to the local cache.
Parameters
- is_forcebool, optional
If True, force re-download even if the file exists. Default is False.
Examples
>>> sciv.dl.download_trait_file()
- sciv.dl.download_trs_file(is_force: bool = False) None
Download the TRS file from the remote server to the local cache.
Parameters
- is_forcebool, optional
If True, force re-download even if the file exists. Default is False.
Examples
>>> sciv.dl.download_trs_file()
- sciv.dl.download_trs_score_file(is_force: bool = False) None
Download the TRS score file from the remote server to the local cache.
Parameters
- is_forcebool, optional
If True, force re-download even if the file exists. Default is False.
Examples
>>> sciv.dl.download_trs_score_file()
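The is_force flag shared by the download functions above follows a common cache pattern: reuse the local copy unless a re-download is forced. The following is a minimal sketch of that pattern only, not sciv's implementation; download_to_cache and fake_fetch are hypothetical names introduced for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def download_to_cache(cache_file: Path, fetch: Callable[[], bytes], is_force: bool = False) -> bool:
    """Write fetch() to cache_file unless a cached copy already exists.

    Returns True if a download happened, False if the cached copy was reused.
    """
    if cache_file.exists() and not is_force:
        return False  # cached copy is reused, nothing is fetched
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(fetch())
    return True


def fake_fetch() -> bytes:
    # Hypothetical stand-in for the remote server.
    return b"trait data"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cache" / "trait.bin"
    first = download_to_cache(target, fake_fetch)                  # downloads
    second = download_to_cache(target, fake_fetch)                 # reuses the cache
    forced = download_to_cache(target, fake_fetch, is_force=True)  # downloads again
```

With this pattern, repeated calls are cheap, and is_force=True is only needed when the cached file is suspected to be stale or corrupted.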
- sciv.dl.read_sc_atac_file() AnnData
Read the scATAC-seq file from the local cache.
Returns
- AnnData
The scATAC-seq data.
Examples
>>> adata = sciv.dl.read_sc_atac_file()
- sciv.dl.read_trait_file() Tuple[dict, DataFrame]
Read the trait files from the local cache.
Returns
- Tuple[dict, DataFrame]
The variant data for each trait (dict) and the trait annotation information (DataFrame).
Examples
>>> variants, trait_info = sciv.dl.read_trait_file()
File (.fl)
File read-write interface, used for processing single-cell ATAC data, H5AD, H5 and other format files.
- sciv.fl.barcodes_add_anno(annotation_file: str | Path, cell_anno: DataFrame, clusters: str | None = None) DataFrame
Add user-supplied cell information to the cell annotation data.
Parameters
- annotation_filepath
User-supplied file with additional cell information. It must contain a column named barcodes.
- cell_annoDataFrame
Cell annotation data read from the scATAC-seq data file.
- clustersstr, optional
The column name for cell clusters or cell types. (In most cases, this column can be ignored.) Only the values in this column are checked for NA values. Any NA values are replaced with unknown; otherwise, no operation is performed.
Returns
- DataFrame
Complete cell annotation data, including the user-supplied cell information.
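The merge-by-barcodes step and the NA handling of the clusters column can be sketched with plain dictionaries. This is an illustration under assumptions, not sciv's code: add_annotation is a hypothetical helper, rows are modeled as dicts rather than DataFrame rows, and treating None, "" and "NA" as missing is an assumption.

```python
from typing import Dict, List, Optional


def add_annotation(
    cell_anno: List[Dict[str, object]],
    annotation: List[Dict[str, object]],
    clusters: Optional[str] = None,
) -> List[Dict[str, object]]:
    """Left-join annotation rows onto cell_anno using the 'barcodes' key."""
    by_barcode = {row["barcodes"]: row for row in annotation}
    merged = []
    for cell in cell_anno:
        row = dict(cell)
        extra = by_barcode.get(cell["barcodes"], {})
        for key, value in extra.items():
            if key != "barcodes":
                row[key] = value
        # Only the clusters column is checked for missing values (assumed NA markers).
        if clusters is not None and row.get(clusters) in (None, "", "NA"):
            row[clusters] = "unknown"
        merged.append(row)
    return merged


cells = [{"barcodes": "AAAC-1"}, {"barcodes": "TTTG-1"}]
user_rows = [{"barcodes": "AAAC-1", "cell_type": "B cell"}]
annotated = add_annotation(cells, user_rows, clusters="cell_type")
```

Cells without a matching row in the annotation file keep their original fields and receive unknown in the clusters column.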
- sciv.fl.read_barcodes_file(barcodes_file: str | Path, clusters: str | None = None, barcode_split_character: str = '-', annotation_file: str | Path | None = None) DataFrame
Read a barcodes file.
Parameters
- barcodes_filepath
Path to the barcodes file.
- clustersstr, optional
The column name for cell clusters or cell types. (In most cases, this column can be ignored.) Only the values in this column are checked for NA values. Any NA values are replaced with unknown; otherwise, no operation is performed.
- barcode_split_characterstr, default='-'
Character used to split barcode information (for metadata).
- annotation_filepath, optional
User-supplied file with additional cell information. It must contain a column named barcodes.
Returns
- DataFrame
Cell annotation data, including any user-supplied cell information.
- sciv.fl.read_h5(file: str | Path, is_close: bool = False)
Read AnnData data from an h5 file.
Parameters
- filepath
Path to the h5 file.
- is_closebool, default=False
If True, close the file. Default is False.
Returns
- AnnData data.
The loaded AnnData data from the h5 file.
- sciv.fl.read_h5ad(file: str | Path, is_verbose: bool = True) AnnData
Read AnnData from an h5ad file.
Parameters
- filepath
Path to the h5ad file.
- is_verbosebool, default=True
If True, print log information. Default is True.
Returns
- AnnData
The loaded AnnData object.
- sciv.fl.read_pkl(file: str | Path, is_verbose: bool = True)
Read data from a pickle file.
Parameters
- filepath
Path to the pickle file.
- is_verbosebool, default=True
If True, print log information. Default is True.
Returns
- Python variable data.
The loaded Python variable data from the pickle file.
- sciv.fl.read_sc_atac(resource: str | Path | None = None, is_transpose: bool = True, barcode_split_character: str = '-', on_barcode_split_character: str | None = None, annotation_file: str | Path | None = None, clusters: str | None = None, peak_split_character: Tuple = (':', '-')) AnnData
Read scATAC-seq data and return it in AnnData format.
Parameters
- resourcepath, optional
Input data source. Can be one of the following:
1. Path to a directory containing the matrix, bed file, etc. (output from cell-ranger)
2. H5 file obtained through cell-ranger
3. A comprehensive h5ad file
4. A table file with cell or peak columns and indexes, where the content is fragment counts
Default is None.
- is_transposebool, default=True
Whether transpose is required to read the matrix file.
- barcode_split_characterstr, default='-'
Character used to split barcode information (for metadata).
- on_barcode_split_characterstr, optional
Character used to split barcode information (for matrix). If None, uses barcode_split_character. Default is None.
- annotation_filepath, optional
File containing additional cell information. Must contain a ‘barcodes’ column. Default is None.
- clustersstr, optional
Column name for cell clusters or cell types. If NA values exist in this column, they will be assigned as ‘unknown’. Default is None.
- peak_split_characterTuple, default=(":", "-")
Characters used to split peak information (chromosome, start, end). First element splits chromosome from start, second splits start from end.
Returns
- AnnData
scATAC-seq data in AnnData format with cell and peak annotations.
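The role of peak_split_character can be illustrated with a small parser: the first element separates the chromosome from the start, the second separates the start from the end. This is a sketch only; split_peak is a hypothetical helper and not part of sciv.

```python
from typing import Tuple


def split_peak(peak: str, peak_split_character: Tuple[str, str] = (":", "-")) -> Tuple[str, int, int]:
    """Split a peak name such as 'chr1:100-200' into (chromosome, start, end)."""
    chrom_sep, pos_sep = peak_split_character
    # Split from the right so chromosome names containing a separator stay intact.
    body, _, end = peak.rpartition(pos_sep)
    chrom, _, start = body.rpartition(chrom_sep)
    return chrom, int(start), int(end)


default_peak = split_peak("chr1:100-200")
underscore_peak = split_peak("chrX_5_10", ("_", "_"))
```

If the peak names in a dataset use a different layout, such as chr1_100_200, the tuple passed as peak_split_character has to match that layout.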
- sciv.fl.read_sc_atac_10x_h5(file: str | Path, clusters: str | None = None, barcode_split_character: str = '-', annotation_file: str | Path | None = None, peak_split_character: Tuple = (':', '-')) AnnData
Read an HDF5 file from Cell Ranger v3 or later.
Parameters
- filepath
Path to the H5 file. (It can be obtained through cell-ranger.)
- clustersstr, optional
The column name for cell clusters or cell types. (In most cases, this column can be ignored.) Only the values in this column are checked for NA values. Any NA values are replaced with unknown; otherwise, no operation is performed.
- barcode_split_characterstr, default='-'
Character used to split barcode information (for metadata).
- annotation_filepath, optional
File containing additional cell information. It must contain a column named barcodes.
- peak_split_charactertuple, default=(':', '-')
Characters used to split peak information.
Returns
- AnnData
scATAC-seq data.
- sciv.fl.read_variants(base_path: str | Path | None = None, files: list | set | Tuple | ndarray | None = None, labels: dict | None = None, column_map: dict | None = None, repeat_symbol: str = '_#') Tuple[dict, DataFrame]
Read a set of variant files.
Parameters
- base_pathpath, optional
Path for storing mutation trait data. The file must contain the following column names: chr, position, rsId, pp, where ID represents the representative of the trait name. Default is None.
- filescollection, optional
Collection of mutation trait data file paths. Default is None.
- labelsdict, optional
Classification labels for each trait or disease. Default is None.
- column_mapdict, optional
Mapping from the column names in the variant file to the required column names. For example: {0: "chr", 1: "position", 2: "rsId", 3: "pp"}. Default is None.
- repeat_symbolstr, default="_#"
Symbol used to distinguish duplicate trait names. If two files have the same name abbreviation, a symbol and numerical value will be added to one of the abbreviations.
Returns
- dict
Dictionary containing AnnData objects for each trait or disease, where keys are trait names and values are AnnData objects with variant information.
- DataFrame
Annotated information on traits or diseases, including summary statistics such as pp_sum, pp_mean, count, and filename.
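The column_map and repeat_symbol parameters can be illustrated in isolation. The sketch below is an assumption-laden illustration, not sciv's implementation: apply_column_map and unique_trait_names are hypothetical helpers, and the exact suffix format (symbol followed by a running number) is assumed from the parameter description.

```python
from typing import Dict, List, Sequence


def apply_column_map(row: Sequence[object], column_map: Dict[int, str]) -> Dict[str, object]:
    """Name the positional fields of one variant row using column_map."""
    return {column_map[i]: value for i, value in enumerate(row) if i in column_map}


def unique_trait_names(names: List[str], repeat_symbol: str = "_#") -> List[str]:
    """Append repeat_symbol plus a number to repeated trait names (assumed format)."""
    used = set()
    result = []
    for name in names:
        candidate, suffix = name, 0
        while candidate in used:
            suffix += 1
            candidate = f"{name}{repeat_symbol}{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


record = apply_column_map(["chr1", 12345, "rs1", 0.9], {0: "chr", 1: "position", 2: "rsId", 3: "pp"})
traits = unique_trait_names(["height", "bmi", "height", "height"])
```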
- sciv.fl.save_h5(data: dict, save_file: str | Path, group_name: str = 'matrix') None
Save data to an H5 file.
Parameters
- datadict
Input data to save.
- save_filepath
Path of the output file.
- group_namestr, default="matrix"
The group name.
Returns
- None
Writes the data to the specified H5 file.
- sciv.fl.save_h5ad(data: AnnData, file: str | Path) AnnData
Save an AnnData object to an h5ad file.
Parameters
- dataAnnData
Input AnnData object to save.
- filepath
Path to save file.
Returns
- AnnData
The input AnnData object.
- sciv.fl.save_pkl(data, save_file: str | Path, is_verbose: bool = False) None
Save data to a pickle file.
Parameters
- dataany
Input data to save.
- save_filepath
Path of the output file.
- is_verbosebool, default=False
If True, print log information. Default is False.
Returns
- None
Writes the data to the specified pickle file.
- sciv.fl.to_fragments(adata: AnnData, fragments: str, layer: str | None = None, batch_size: int = 100000, is_sort: bool = True, is_gz: bool = True, is_keep: bool = False) None
Convert AnnData data into a fragments-format file.
Parameters
- adataAnnData
Input AnnData object containing single-cell data.
- fragmentsstr
Output file path for the fragments file.
- layerstr, optional
The layer of data to use for generating fragments file. If None, uses the main data matrix (adata.X).
- batch_sizeint, default=100000
Batch size for processing data. Smaller values reduce memory consumption.
- is_sortbool, default=True
Whether to sort the output by chromosome and start position. Sorts chromosomes in natural order (chr1, chr2, …, chrX, chrY, chrM).
- is_gzbool, default=True
Whether to compress the output file using gzip. Uses pysam.tabix_compress for compression.
- is_keepbool, default=False
Whether to keep the uncompressed fragments file after compression. Only effective when is_gz is True. If False, the uncompressed file is deleted after successful compression.
Returns
- None
Writes fragments file to the specified path.
Note
To export results processed by SnapATAC2, please use snapatac2.ex.export_fragments directly. Using this function is not recommended.
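The natural chromosome order described for is_sort (chr1, chr2, …, chrX, chrY, chrM) differs from plain string sorting, which would place chr10 before chr2. A minimal sketch of such a sort key follows; chrom_sort_key and sort_fragments are hypothetical helpers, and the handling of other contigs is an assumption.

```python
from typing import List, Tuple


def chrom_sort_key(chrom: str) -> Tuple[int, int, str]:
    """Order numbered chromosomes numerically, then X, Y, M, then anything else."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (1, special.get(name.upper(), 4), name)


def sort_fragments(fragments: List[tuple]) -> List[tuple]:
    """Sort (chrom, start, end, barcode, count) records by chromosome and start."""
    return sorted(fragments, key=lambda f: (chrom_sort_key(f[0]), f[1]))


ordered = sorted(["chrM", "chr10", "chrX", "chr2", "chrY", "chr1"], key=chrom_sort_key)
records = sort_fragments([
    ("chr2", 5, 10, "c1", 1),
    ("chr1", 50, 60, "c2", 1),
    ("chr1", 5, 9, "c1", 2),
])
```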
- sciv.fl.to_meta(adata: AnnData, dir_path: str | Path, layer: str | None = None, feature_name: str = 'peaks.bed', field: Literal['real', 'complex', 'pattern', 'integer'] | None = None) None
Convert an AnnData object into a metadata directory containing the matrix, feature files, etc.
This function exports single-cell data into standard 10x Genomics format, including:
  - matrix.mtx: Sparse matrix file in Matrix Market format
  - annotation.txt: Cell annotation information
  - barcodes.tsv: Cell barcodes list
  - peaks.bed or specified feature file: Genomic feature information
Parameters
- adataAnnData
Input AnnData object containing single-cell data.
- dir_pathpath
Output directory path for storing generated metadata files.
- layerstr, optional
The layer of data used to generate the meta files. If None, uses adata.X as the main data matrix.
- feature_namestr, default="peaks.bed"
Output name for the feature file. If it starts with "peaks", feature indices will be parsed by chromosome position into BED format.
- field_Field, optional
Matrix data type field. Available values:
  - 'real': Real numbers
  - 'complex': Complex numbers
  - 'pattern': Pattern matrix (no values)
  - 'integer': Integer values
If None, automatically determined from the data type.
Returns
- None
Writes the metadata files to the specified directory.
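The field parameter corresponds to the data-type keyword in the Matrix Market header line. The sketch below writes a small coordinate-format matrix by hand to show where that keyword appears; it is not how sciv writes matrix.mtx, to_matrix_market is a hypothetical helper, and the 'complex' field is left out to keep the example short.

```python
import io
from typing import Iterable, Tuple


def to_matrix_market(
    entries: Iterable[Tuple[int, int, float]],
    n_rows: int,
    n_cols: int,
    field: str = "integer",
) -> str:
    """Return Matrix Market coordinate text for 0-based (row, col, value) entries."""
    if field not in ("real", "integer", "pattern"):
        raise ValueError(f"field not supported in this sketch: {field}")
    entries = list(entries)
    out = io.StringIO()
    out.write(f"%%MatrixMarket matrix coordinate {field} general\n")
    out.write(f"{n_rows} {n_cols} {len(entries)}\n")
    for row, col, value in entries:
        # Matrix Market indices are 1-based.
        if field == "pattern":
            out.write(f"{row + 1} {col + 1}\n")
        elif field == "integer":
            out.write(f"{row + 1} {col + 1} {int(value)}\n")
        else:
            out.write(f"{row + 1} {col + 1} {float(value)}\n")
    return out.getvalue()


counts_text = to_matrix_market([(0, 0, 3), (1, 2, 1)], n_rows=2, n_cols=3)
pattern_text = to_matrix_market([(0, 1, 7)], n_rows=1, n_cols=2, field="pattern")
```

A 'pattern' matrix stores only the positions of non-zero entries, which is why its data lines carry no value.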
Model (.ml)
Core model interface, providing functions for cell-type association analysis and causal variant identification.
- sciv.ml.association_score(adata: AnnData, score_name: str = 'association_score', layer: str = 'trs_source', axis: Literal[0, 1] = 0) None
Calculate association score for traits or diseases. This function calculates the association score for traits or diseases based on the TRS (Trait Relevance Score) data in the input AnnData object.
Parameters
- adataAnnData
Input AnnData object containing TRS data.
- score_namestr, optional
Name of the score column in the AnnData object. Default is “association_score”.
- layerstr, optional
Layer name in the AnnData object containing TRS data. Default is “trs_source”.
- axisLiteral[0, 1], optional
Axis to calculate the score (0 for traits, 1 for diseases). Default is 0.
Returns
- None
- sciv.ml.core(adata: AnnData, variants: dict, trait_info: DataFrame, cell_rate: float | None = None, peak_rate: float | None = None, max_epochs: int = 500, lr: float = 1e-05, batch_size: int = 128, eps: float = 1e-08, early_stopping: bool = True, early_stopping_patience: int = 50, strategy: str = 'ddp_notebook_find_unused_parameters_true', batch_key: str | None = None, resolution: float = 0.5, k: int = 30, or_k: int = 10, weight: float = 0.5, kernel: Literal['laplacian', 'gaussian'] = 'gaussian', local_k: int = 10, kernel_gamma: float | str | list | set | Tuple | ndarray | None = None, epsilon: float = 1e-05, max_steps: int = 300, gamma: float = 0.05, enrichment_gamma: float = 0.05, p: int = 2, n_jobs: int = -1, min_seed_cell_rate: float = 0.01, max_seed_cell_rate: float = 0.05, credible_threshold: float = 0, diff_peak_value: Literal['emp_effect', 'bayes_factor', 'emp_prob1', 'all'] = 'emp_effect', enrichment_threshold: Literal['golden', 'half', 'e', 'pi', 'none'] | float = 'golden', is_ablation: bool = False, model_dir: str | Path | None = None, save_path: str | Path | None = None, is_simple: bool = True, is_save_random_walk_model: bool = False, is_file_exist_loading: bool = False, filename_dict: dict | None = None, block_size: int = -1) AnnData
The core algorithm of sciv. It runs the complete analysis workflow, including plotting and saving data. Throughout the algorithm, samples are placed in rows and traits or diseases in columns. Traits or diseases do not interact with one another, which keeps the results stable.
- Meaning of main variables:
overlap_adata, (obs: peaks, var: traits/diseases) Peaks-traits/diseases data obtained by overlaying variant data with peaks.
da_peaks, (obs: clusters (Leiden), var: peaks) Differential peak data of cell clustering, used for weight correction of cells.
init_score, (obs: cells, var: traits/diseases) This is the initial TRS data.
cc_data, (obs: cells, var: cells) Cell similarity data.
random_walk, RandomWalk class.
trs, (obs: cells, var: traits/diseases) This is the final TRS data.
Parameters
- adataAnnData
scATAC-seq data.
- variantsdict
Variant data. This data is recommended to be obtained by executing the fl.read_variants method.
- trait_infoDataFrame
Variant annotation file information.
- cell_rateOptional[float], default None
Percentage of the total cell count used as the removal threshold. It only takes effect when the min_cells parameter is None.
- peak_rateOptional[float], default None
Percentage of the total peak count used as the removal threshold. It only takes effect when the min_peaks parameter is None.
- max_epochsint, default 500
The maximum number of epochs for PoissonVI training.
- lrfloat, default 1e-5
Learning rate for optimization.
- batch_sizeint, default 128
Minibatch size to use during training.
- epsfloat, default 1e-08
Optimizer eps.
- early_stoppingbool, default True
Whether to perform early stopping with respect to the validation set.
- early_stopping_patienceint, default 50
How many epochs to wait for improvement before early stopping.
- strategystr, default “ddp_notebook_find_unused_parameters_true”
DDP strategy.
- batch_keyOptional[str], default None
Batch information in scATAC-seq data.
- resolutionfloat, default 0.5
Resolution for Leiden clustering. Recommended values are 0.4, 0.9, 1.3, or 1.5.
- kint, default 30
Number of nodes connected to each node when building the mKNN network (AND operation).
- or_kint, default 10
Number of nodes connected to each node when building the mKNN network (OR operation).
- weightfloat, default 0.5
The weight of interactions or operations.
- kernelLiteral["laplacian", "gaussian"], default "gaussian"
Determines the kernel function to use.
- local_kint, default 10
Number of neighbors used by the adaptive kernel.
- kernel_gammaOptional[Union[float, str, collection]], default None
When kernel is "laplacian" and this value is None, it defaults to the reciprocal of the dimension of the cell latent representation. When kernel is "gaussian" and this value is None, it defaults to an adaptive value obtained from local information controlled by local_k. Otherwise, it must be strictly positive.
- epsilonfloat, default 1e-05
Convergence threshold for stopping the random walk.
- max_stepsint, default 300
Maximum number of steps in a random walk with restart.
- gammafloat, default 0.05
Reset (restart) weight for the random walk.
- enrichment_gammafloat, default 0.05
Reset (restart) weight for the random walk used for enrichment.
- pint, default 2
Distance used for loss {1: Manhattan distance, 2: Euclidean distance}.
- n_jobsint, default -1
The maximum number of concurrently running jobs.
- min_seed_cell_ratefloat, default 0.01
The minimum percentage of seed cells in all cells.
- max_seed_cell_ratefloat, default 0.05
The maximum percentage of seed cells in all cells.
- credible_thresholdfloat, default 0
The threshold for determining the credibility of enriched cells in the context of enrichment, i.e. the threshold for judging enriched cells.
- diff_peak_valuedifference_peak_optional, default 'emp_effect'
Specifies the correction value used in the peak correction based on differences between cluster types. {'emp_effect', 'bayes_factor', 'emp_prob1', 'all'}
- enrichment_thresholdUnion[enrichment_optional, float], default 'golden'
Threshold applied to the standardized output TRS to obtain the enriched portion of the results. Supports the strings {'golden', 'half', 'e', 'pi', 'none'} or a float within the range (0, log1p(1)).
- is_ablationbool, default False
If True, obtain the results of the ablation experiment. This parameter only takes effect when is_simple is False.
- model_dirOptional[path], default None
Directory in which the trained model is saved. If a trained model file (model.pt) already exists in this path, it is loaded automatically and training of the PoissonVI model is skipped.
- save_pathOptional[path], default None
Save path for process files and result files.
- is_simplebool, default True
If True, unnecessary intermediate variables are not added and only the final result is added. When set to True, the is_ablation parameter has no effect; is_ablation only takes effect when this is set to False.
- is_save_random_walk_modelbool, default False
If True, save the random walk model. Please ensure sufficient storage, as the saved pkl file is relatively large.
- is_file_exist_loadingbool, default False
By default, existing files are overwritten. If True and the file exists, the corresponding step is skipped and the file is read directly as the result.
- filename_dictOptional[dict], default None
The names of the files that exist. Default:
{
"sc_atac": "sc_atac.h5ad",
"da_peaks": "da_peaks.h5ad",
"atac_overlap": "atac_overlap.h5ad",
"init_score": "init_score.h5ad",
"cc_data": "cc_data.h5ad",
"random_walk": "random_walk.h5ad",
"trs": "trs.h5ad"
}
- block_sizeint, default -1
Block size used for block-wise matrix multiplication. This sacrifices some time and space to reduce memory consumption to a certain extent. If the value is less than or equal to zero, no block operation is performed.
Returns
- AnnData
AnnData object containing TRS (Trait Relevance Score) results. (obs: cells, var: traits/diseases) This is the final TRS data.
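The roles of gamma, epsilon, max_steps and p can be illustrated with a textbook random walk with restart. This is a generic sketch, not sciv's RandomWalk class: the function name, the column-stochastic transition matrix, and the use of the p-norm of the change as the stopping rule are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def random_walk_with_restart(
    transition: Sequence[Sequence[float]],
    seed: Sequence[float],
    gamma: float = 0.05,
    epsilon: float = 1e-5,
    max_steps: int = 300,
    p: int = 2,
) -> Tuple[List[float], int]:
    """Iterate s <- (1 - gamma) * W @ s + gamma * seed until the change is below epsilon.

    transition[i][j] is the probability of moving from node j to node i
    (each column sums to 1).
    """
    n = len(seed)
    current = list(seed)
    steps = 0
    for steps in range(1, max_steps + 1):
        updated = [
            (1 - gamma) * sum(transition[i][j] * current[j] for j in range(n)) + gamma * seed[i]
            for i in range(n)
        ]
        # p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
        change = sum(abs(a - b) ** p for a, b in zip(updated, current)) ** (1.0 / p)
        current = updated
        if change < epsilon:
            break
    return current, steps


# Path graph 0 - 1 - 2, with node 0 as the only seed cell.
transition = [
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
]
scores, steps = random_walk_with_restart(transition, [1.0, 0.0, 0.0], gamma=0.5)
```

A larger gamma keeps the scores closer to the seed distribution, while a smaller gamma lets the signal diffuse further through the cell-cell network and needs more steps to converge.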
- sciv.ml.knock(trs: AnnData, sc_atac: AnnData, da_peaks: AnnData, cc_data: AnnData, knock_trait: str, knock_info: dict[str, Union[str, list, set, Tuple, numpy.ndarray]], knock_value: float = 0, is_add_control: bool = False) AnnData
Perform gene knockdown or knockout analysis on a specific trait.
This function simulates the effect of knocking down or knocking out specific variants associated with a trait, and re-runs the random walk algorithm to compute the resulting TRS (Trait Relevance Score) changes.
Parameters
- trsAnnData
TRS result data from ml.core, containing parameters, variants, trait_info and trs_source.
- sc_atacAnnData
scATAC-seq data used in the original analysis.
- da_peaksAnnData
Differential accessibility peaks data from the original analysis.
- cc_dataAnnData
Cell-cell similarity network data from the original analysis.
- knock_traitstr
The trait ID to perform knockdown/knockout on.
- knock_infodict[str, Union[str, collection]]
Dictionary mapping knock group names to variant IDs (rsId) to be knocked down. Each key is a group name, and each value is either a single variant ID (str) or a collection of variant IDs to knock down together.
- knock_valuefloat, default 0
The value to set for knocked-down variants. Default is 0 (complete knockout). Values >= 1e-3 are not recommended as they may not achieve the desired effect.
- is_add_controlbool, default False
Whether to add control experiments (knocking out background variants).
Returns
- AnnData
AnnData object containing TRS results after knockdown/knockout. Includes knock parameters in .uns["params"] and knock-specific metadata.
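The shape of knock_info, where each group maps to either one variant ID or a collection of IDs, can be normalized before use. The sketch below only illustrates that input structure and the effect of knock_value; normalize_knock_info and apply_knock are hypothetical helpers, and modeling variant values as a plain dict keyed by rsId is an assumption.

```python
from typing import Dict, Iterable, List, Union


def normalize_knock_info(knock_info: Dict[str, Union[str, Iterable[str]]]) -> Dict[str, List[str]]:
    """Turn every group value into a sorted list of unique variant IDs."""
    normalized = {}
    for group, variants in knock_info.items():
        if isinstance(variants, str):
            normalized[group] = [variants]  # a single rsId, not a sequence of characters
        else:
            normalized[group] = sorted(set(variants))
    return normalized


def apply_knock(values: Dict[str, float], variant_ids: Iterable[str], knock_value: float = 0.0) -> Dict[str, float]:
    """Return a copy of values with the selected variants set to knock_value."""
    targets = set(variant_ids)
    return {rs: (knock_value if rs in targets else value) for rs, value in values.items()}


groups = normalize_knock_info({"single": "rs1", "pair": {"rs3", "rs2"}})
knocked = apply_knock({"rs1": 0.8, "rs2": 0.1}, groups["single"])
```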
Plot (.pl)
Visualization interface, including multiple chart types for data analysis and presentation.
Graph
Network diagram visualization function.
- sciv.pl.communities_graph(adata: AnnData, labels: list | set | Tuple | ndarray, layer: str | None = None, groupby: str = 'clusters', x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 2, height: float = 2, bottom: float = 0, node_size: float = 2.0, line_widths: float = 0.001, start_color_index: int = 0, color_step_size: int = 0, output: str | Path | None = None, show: bool = True, close: bool = False)
Plot a cell-cell network diagram with community detection coloring.
This function visualizes a network graph where nodes represent cells and edges represent connections between cells. Nodes are colored based on their community assignments.
Parameters
- adataAnnData
Annotated data matrix with observations (cells) and variables (genes).
- labelscollection
Community labels for grouping nodes. Each community is a collection of node indices.
- layerstr, optional
Name of the layer in adata to use for adjacency matrix. If None, uses adata.X.
- groupbystr, default="clusters"
Column name in adata.obs containing cluster information for color assignment.
- x_namestr, optional
Label for the x-axis.
- y_namestr, optional
Label for the y-axis.
- titlestr, optional
Title of the plot.
- widthfloat, default=2
Width of the figure in inches.
- heightfloat, default=2
Height of the figure in inches.
- bottomfloat, default=0
Bottom margin adjustment.
- node_sizefloat, default=2.0
Size of the nodes in the network.
- line_widthsfloat, default=0.001
Width of the node edges and network edges.
- start_color_indexint, default=0
Starting index for color selection from the color palette.
- color_step_sizeint, default=0
Step size for selecting colors from the palette for different communities.
- outputpath, optional
Path to save the figure.
- showbool, default=True
Whether to display the figure.
- closebool, default=False
Whether to close the figure after display.
Returns
- None
The function displays and/or saves the network plot.
- sciv.pl.graph(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, labels: list | set | Tuple | ndarray | None = None, node_size: int = 50, name: str | None = None, x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 2, height: float = 2, bottom: float = 0, is_font: bool = False, output: str | Path | None = None, show: bool = True, close: bool = False) None
Plot a graph from an adjacency matrix.
Parameters
- datamatrix_data
Adjacency matrix representing the graph connections.
- labelscollection, optional
Labels for each node in the graph.
- node_sizeint, default=50
Size of the nodes in the plot.
- namestr, optional
Name of the graph.
- x_namestr, optional
Label for the x-axis.
- y_namestr, optional
Label for the y-axis.
- titlestr, optional
Title of the plot.
- widthfloat, default=2
Width of the figure in inches.
- heightfloat, default=2
Height of the figure in inches.
- bottomfloat, default=0
Bottom margin adjustment.
- is_fontbool, default=False
Whether to display node labels.
- outputpath, optional
Path to save the figure.
- showbool, default=True
Whether to display the figure.
- closebool, default=False
Whether to close the figure after display.
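The data argument of sciv.pl.graph is an adjacency matrix, and the signature lists a plain list among the accepted types. A minimal sketch of building such a matrix from an edge list follows; adjacency_from_edges is a hypothetical helper, and the graph is assumed to be undirected and unweighted.

```python
from typing import Iterable, List, Tuple


def adjacency_from_edges(n_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Build a symmetric 0/1 adjacency matrix from undirected edges."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for a, b in edges:
        matrix[a][b] = 1
        matrix[b][a] = 1
    return matrix


adjacency = adjacency_from_edges(3, [(0, 1), (1, 2)])
node_labels = ["cell_a", "cell_b", "cell_c"]
```

The resulting matrix and labels correspond to the data and labels parameters described above.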
- sciv.pl.network_two_types(data_pairs: list, type1_scores: dict, type2_scores: dict, type1_node_size: dict | list | float | None = 50, type2_node_size: dict | list | float | None = 50, label_nodes: list | None = None, width: float = 4, height: float = 3, k: float | None = None, iterations: int = 50, scale: float = 1, radius: float = 0.35, type1_node_shape: str = 'o', type2_node_shape: str = 's', type1_bar_label: str = 'Score', type2_bar_label: str = 'Score', type1_cmap_str: str = 'winter', type2_cmap_str: str = 'YlOrRd', node_alpha: float = 0.8, edge_alpha: float = 0.8, is_fluctuate: bool = True, layout_type: str = 'spring', output: str | Path | None = None, show: bool = True, close: bool = False)
Plot a bipartite network graph with two types of nodes.
This function visualizes a network where nodes are divided into two distinct types (e.g., genes and variations), with edges representing connections between them. Each node type can have different sizes, colors, and shapes based on their scores.
Parameters
- data_pairslist
List of tuples representing edges between type1 and type2 nodes.
- type1_scoresdict
Dictionary mapping type1 node names to their score values for color mapping.
- type2_scoresdict
Dictionary mapping type2 node names to their score values for color mapping.
- type1_node_sizeUnion[dict, list, float], default=50
Size of type1 nodes. Can be a single value, list, or dict mapping nodes to sizes.
- type2_node_sizeUnion[dict, list, float], default=50
Size of type2 nodes. Can be a single value, list, or dict mapping nodes to sizes.
- label_nodeslist, optional
List of node names to display labels for.
- widthfloat, default=4
Width of the figure in inches.
- heightfloat, default=3
Height of the figure in inches.
- kfloat, optional
Optimal distance between nodes for spring layout. If None, uses default.
- iterationsint, default=50
Number of iterations for spring layout optimization.
- scalefloat, default=1
Scale factor for the layout positions.
- radiusfloat, default=0.35
Radius for positioning connected nodes around their parent nodes in custom layouts.
- type1_node_shapestr, default='o'
Matplotlib marker shape for type1 nodes.
- type2_node_shapestr, default='s'
Matplotlib marker shape for type2 nodes.
- type1_bar_labelstr, default='Score'
Label for the color bar of type1 nodes.
- type2_bar_labelstr, default='Score'
Label for the color bar of type2 nodes.
- type1_cmap_strstr, default="winter"
Colormap name for type1 node colors.
- type2_cmap_strstr, default="YlOrRd"
Colormap name for type2 node colors.
- node_alphafloat, default=0.8
Transparency level for nodes (0-1).
- edge_alphafloat, default=0.8
Transparency level for edges (0-1).
- is_fluctuatebool, default=True
Whether to add random fluctuation to node positions in custom layouts.
- layout_typestr, default='spring'
Layout algorithm to use. Options: 'spring', 'kamada_kawai', 'circular', 'shell', 'circular_type1', 'circular_type2', 'square_type1', 'square_type2'.
- outputpath, optional
Path to save the figure.
- showbool, default=True
Whether to display the figure.
- closebool, default=False
Whether to close the figure after display.
Returns
- None
The function displays and/or saves the network plot.
Heatmap
Heatmap visualization function.
- sciv.pl.heatmap(adata: AnnData, layer: str | None = None, title: str | None = None, width: float = 4, height: float = 4, bottom: float = 0, annot: bool = False, square: bool = True, is_cluster: bool = False, cmap: str = 'Oranges', line_widths: float = 1, fmt: str = '.2f', rotation: float = 65, x_name: str | None = None, y_name: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Generate a simple heatmap using seaborn.
Parameters
- adataAnnData
Input AnnData object containing the data matrix.
- layerstr, default None
Layer name in adata.layers to use for plotting. If None, uses adata.X.
- titleOptional[str], default None
Title of the figure.
- widthfloat, default 4
Width of the figure in inches.
- heightfloat, default 4
Height of the figure in inches.
- bottomfloat, default 0
Bottom margin of the figure.
- annotbool, default False
Whether to annotate each cell with its numeric value.
- squarebool, default True
Whether to make cells square-shaped.
- is_clusterbool, default False
Whether to perform hierarchical clustering (uses clustermap instead of heatmap).
- cmapstr, default “Oranges”
Colormap for the heatmap.
- line_widthsfloat, default 1
Width of the lines that divide cells.
- fmtstr, default “.2f”
String formatting code for annotations.
- rotationfloat, default 65
Rotation angle for x-axis labels.
- x_namestr, default None
Label for the x-axis.
- y_namestr, default None
Label for the y-axis.
- outputpath, default None
File path to save the figure. If None, figure is not saved.
- showbool, default True
Whether to display the figure.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to seaborn heatmap or clustermap.
Returns
- None
Displays or saves the heatmap figure.
- sciv.pl.heatmap_annotation(adata: AnnData, layer: str | None = None, width: float = 4, height: float = 4, title: str | None = None, label: str = 'value', row_name: str | None = None, col_name: str | None = None, row_names: str | None = None, col_names: str | None = None, row_anno_label: bool = False, col_anno_label: bool = False, row_anno_text: bool = False, col_anno_text: bool = False, row_legend: bool = False, col_legend: bool = False, row_show_names: bool = False, col_show_names: bool = False, row_cluster: bool = False, col_cluster: bool = False, cluster_method: str = 'average', cluster_metric: str = 'correlation', row_names_side: str = 'left', col_names_side: str = 'bottom', bottom: float = 0.01, label_size: float = 9, fontsize: float = 9, level_bar_height: float | None = None, anno_specific_labels: list | None = None, x_label_rotation: float = 245, y_label_rotation: float = 0, row_color_start_index: int = 0, col_color_start_index: int = 10, row_split: int | Series | None = None, col_split: int | Series | None = None, row_split_order: list | str | None = None, col_split_order: list | str | None = None, row_split_gap: float = 0.5, col_split_gap: float = 0.2, frac: float = 0.2, relpos: Tuple = (0, 1), anno_label_height: float | None = None, selected_anno_label_height: float = 2.5, category_height: float | None = 2.5, x_name: str | None = None, y_name: str | None = None, row_score_name: str = 'association_score', cmap: str = 'Oranges', is_sort: bool = True, show: bool = True, close: bool = False, output: str | Path | None = None, **kwargs) None
Generate a heatmap with row and column annotations.
Parameters
- adataAnnData
Input AnnData object containing the data matrix and metadata.
- layerOptional[str], default None
Layer name in adata.layers to use for plotting. If None, uses adata.X.
- widthfloat, default 4
Width of the figure in inches.
- heightfloat, default 4
Height of the figure in inches.
- titleOptional[str], default None
Title of the figure.
- labelstr, default “value”
Label for the heatmap color bar.
- row_nameOptional[str], default None
Column name in adata.obs for row annotations.
- col_nameOptional[str], default None
Column name in adata.var for column annotations.
- row_namesOptional[str], default None
Column name in adata.obs to use as row index labels.
- col_namesOptional[str], default None
Column name in adata.var to use as column index labels.
- row_anno_labelbool, default False
Whether to display merged labels for row annotations.
- col_anno_labelbool, default False
Whether to display merged labels for column annotations.
- row_anno_textbool, default False
Whether to display text labels on row annotation bars.
- col_anno_textbool, default False
Whether to display text labels on column annotation bars.
- row_legendbool, default False
Whether to show legend for row annotations.
- col_legendbool, default False
Whether to show legend for column annotations.
- row_show_namesbool, default False
Whether to display row names (index labels) on the heatmap.
- col_show_namesbool, default False
Whether to display column names (index labels) on the heatmap.
- row_clusterbool, default False
Whether to perform hierarchical clustering on rows.
- col_clusterbool, default False
Whether to perform hierarchical clustering on columns.
- cluster_methodstr, default “average”
Linkage method for hierarchical clustering (e.g., “average”, “single”, “complete”).
- cluster_metricstr, default “correlation”
Distance metric for hierarchical clustering (e.g., “correlation”, “euclidean”).
- row_names_sidestr, default “left”
Side to display row names (“left” or “right”).
- col_names_sidestr, default “bottom”
Side to display column names (“top” or “bottom”).
- bottomfloat, default 0.01
Bottom margin of the figure.
- label_sizefloat, default 9
Font size for row and column name labels.
- fontsizefloat, default 9
Font size for axis titles.
- level_bar_heightOptional[float], default None
Height of the association score bar plot annotation.
- anno_specific_labelsOptional[list], default None
List of specific row labels to highlight in the annotation.
- x_label_rotationfloat, default 245
Rotation angle for x-axis labels (column names).
- y_label_rotationfloat, default 0
Rotation angle for y-axis labels (row names).
- row_color_start_indexint, default 0
Starting index in the color palette for row annotations.
- col_color_start_indexint, default 10
Starting index in the color palette for column annotations.
- row_splitOptional[Union[int, pd.Series]], default None
Number of clusters, or a grouping series, used to split rows.
- col_splitOptional[Union[int, pd.Series]], default None
Number of clusters, or a grouping series, used to split columns.
- row_split_orderOptional[Union[list, str]], default None
Order of the row splits, or ‘cluster_between_groups’ to order them by clustering.
- col_split_orderOptional[Union[list, str]], default None
Order of the column splits, or ‘cluster_between_groups’ to order them by clustering.
- row_split_gapfloat, default 0.5
Gap size between row splits in mm.
- col_split_gapfloat, default 0.2
Gap size between column splits in mm.
- fracfloat, default 0.2
Fraction parameter for annotation label positioning.
- relposTuple, default (0, 1)
Relative position for annotation labels.
- anno_label_heightOptional[float], default None
Height of the annotation label bar.
- selected_anno_label_heightfloat, default 2.5
Height of the selected annotation label bar.
- category_heightOptional[float], default 2.5
Height of the category annotation bar.
- x_nameOptional[str], default None
Label for the x-axis.
- y_nameOptional[str], default None
Label for the y-axis.
- row_score_namestr, default “association_score”
Column name in adata.obs for the association score bar plot.
- cmapstr, default “Oranges”
Colormap for the heatmap.
- is_sortbool, default True
Whether to sort rows and columns before plotting.
- showbool, default True
Whether to display the figure.
- closebool, default False
Whether to close the figure after saving.
- outputOptional[path], default None
File path to save the figure. If None, figure is not saved.
- **kwargs
Additional keyword arguments passed to ClusterMapPlotter.
Returns
- None
Displays or saves the heatmap figure.
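The cluster_metric default, “correlation”, conventionally means the correlation distance, 1 minus the Pearson correlation, as in SciPy's pdist. The sketch below assumes that convention (this page does not state which implementation sciv relies on) and illustrates why rows with the same profile shape but different magnitude end up close together. correlation_distance is a hypothetical helper, not part of sciv.

```python
import math


def correlation_distance(u, v):
    """Return 1 - Pearson correlation between two equal-length sequences."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    norm = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return 1.0 - sum(a * b for a, b in zip(du, dv)) / norm


# Same shape, different scale: distance close to 0.
same_shape = correlation_distance([1, 2, 3], [10, 20, 30])
# Mirrored profile: distance close to 2, the maximum.
opposite = correlation_distance([1, 2, 3], [3, 2, 1])
```

With this metric, scaling a row does not change its position in the dendrogram; “euclidean” would instead separate rows by magnitude.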
Scatter
Scatter chart visualization function.
- sciv.pl.manhattan_causal_variant(df: DataFrame, y: str = 'pp', chr_name: str = 'chr', label: str = 'rsId', size: int = 30, labels: list | None = None, colors: list | None = None, width: float = 8, height: float = 2, bottom: float = 0, title: str | None = None, is_sort: bool = True, line_width: float = 0.5, y_round: int = 3, x_name: str | None = 'Chromosome', y_name: str | None = 'pp', y_limit: Tuple[float, float] = (0, 1), output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a Manhattan plot for causal variant visualization across chromosomes.
Parameters
- dfDataFrame
Input data containing variant information with chromosome and position data
- ystr, default “pp”
Column name for y-axis values (typically posterior probability or p-value)
- chr_namestr, default “chr”
Column name for chromosome identifiers
- labelstr, default “rsId”
Column name for variant labels/identifiers
- sizeint, default 30
Size of scatter points
- labelsOptional[list], optional
List of specific variant labels to annotate on the plot
- colorsOptional[list], optional
Custom color palette for different chromosomes
- widthfloat, default 8
Figure width in inches
- heightfloat, default 2
Figure height in inches
- bottomfloat, default 0
Bottom margin adjustment
- titlestr, optional
Plot title
- is_sortbool, default True
Whether to sort data by chromosome
- line_widthfloat, default 0.5
Width of separator lines between chromosomes and grid lines
- y_roundint, default 3
Number of decimal places for y-value annotations
- x_nameOptional[str], default “Chromosome”
Label for x-axis
- y_nameOptional[str], default “pp”
Label for y-axis
- y_limitTuple[float, float], default (0, 1)
Y-axis limits for the plot
- outputpath, optional
Output file path
- showbool, default True
Whether to display the plot
- closebool, default False
Whether to close the figure after saving
- **kwargsAny
Additional arguments passed to ax.axvline
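A minimal sketch of preparing input for manhattan_causal_variant, using the default column names from the signature (chr, rsId, pp). The variant identifiers and values are made up, the “chr1”-style chromosome encoding is an assumption, and labels_above and plot_causal_variants are hypothetical helpers. Any additional columns sciv may expect (for example a position column) are not documented here.

```python
def labels_above(rows, min_pp):
    """Return the rsId of every variant whose pp is at least min_pp."""
    return [row["rsId"] for row in rows if row["pp"] >= min_pp]


def plot_causal_variants(rows, min_pp=0.5):
    # Imported lazily so the data preparation runs without the plotting stack.
    import pandas as pd
    import sciv

    df = pd.DataFrame(rows)
    sciv.pl.manhattan_causal_variant(
        df,
        y="pp",
        chr_name="chr",
        label="rsId",
        labels=labels_above(rows, min_pp),  # annotate only these variants
        y_limit=(0, 1),
        show=False,
        output="causal_variants.png",  # hypothetical output path
    )


# Made-up variants for illustration only.
rows = [
    {"chr": "chr1", "rsId": "rs_example_1", "pp": 0.92},
    {"chr": "chr1", "rsId": "rs_example_2", "pp": 0.03},
    {"chr": "chr2", "rsId": "rs_example_3", "pp": 0.61},
]
highlighted = labels_above(rows, 0.5)
```

Calling plot_causal_variants(rows) would then annotate only the variants in highlighted.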
- sciv.pl.pseudo_time_score(df: DataFrame, x: str, y: str, x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 2, height: float = 1.2, bottom: float = 0, alpha: float = 0.65, line_width: float = 1.5, step_length: int = 5, polyorder: int = 1, size: float | list | set | Tuple | ndarray = 1.0, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a scatter plot showing pseudo-time scores with a smoothed trend line.
Parameters
- dfDataFrame
Input data containing pseudo-time and score values
- xstr
Column name for pseudo-time values (x-axis)
- ystr
Column name for score values (y-axis)
- x_namestr, optional
Label for x-axis
- y_namestr, optional
Label for y-axis
- titlestr, optional
Plot title
- widthfloat, default 2
Figure width in inches
- heightfloat, default 1.2
Figure height in inches
- bottomfloat, default 0
Bottom margin adjustment
- alphafloat, default 0.65
Transparency of scatter points
- line_widthfloat, default 1.5
Width of the smoothed trend line
- step_lengthint, default 5
Step length for determining Savitzky-Golay filter window size
- polyorderint, default 1
Polynomial order for Savitzky-Golay filter
- sizeUnion[float, collection], default 1.0
Size of scatter points
- outputpath, optional
Output file path
- showbool, default True
Whether to display the plot
- closebool, default False
Whether to close the figure after saving
- **kwargsAny
Additional arguments passed to ax.scatter
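The trend line is described as Savitzky-Golay smoothing with polyorder defaulting to 1. For an odd, symmetric window, a degree-1 least-squares fit evaluated at the window center equals the window mean, so the default behaves like a moving average on interior points. The sketch below shows that special case only. How sciv derives the window size from step_length is not documented here, so the window argument is an assumption, and savgol_linear is a hypothetical helper that leaves the edge points unchanged.

```python
def savgol_linear(values, window):
    """Savitzky-Golay smoothing for polyorder=1 on interior points.

    With an odd, symmetric window the fitted line evaluated at the
    window center equals the mean of the window.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    smoothed = [float(v) for v in values]  # edge points are kept as-is
    for i in range(half, len(values) - half):
        chunk = values[i - half:i + half + 1]
        smoothed[i] = sum(chunk) / window
    return smoothed


# A straight line is reproduced exactly; a zig-zag is flattened.
line = savgol_linear([0, 1, 2, 3, 4, 5, 6], window=3)
zigzag = savgol_linear([0, 3, 0, 3, 0], window=3)
```

A higher polyorder follows curvature more closely, at the cost of less smoothing for the same window.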
- sciv.pl.scatter_3d(df: DataFrame, x: str, y: str, z: str, hue: str | None = None, x_name: str | None = None, y_name: str | None = None, z_name: str | None = None, title: str | None = None, width: float = 7, height: float = 7, elev: float = 30, azim: float = -60, is_add_legend: bool = True, cmap: str | ListedColormap = 'tab20', font_size: int = 14, edge_color: str | None = None, size: float | list | set | Tuple | ndarray = 0.1, legend_name: str | None = None, is_add_max_label: bool = False, text_left_offset: float = 0.5, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any)
Create a 3D scatter plot with customizable aesthetics.
Parameters
- dfDataFrame
Input data containing x, y, z coordinates
- xstr
Column name for x-axis values
- ystr
Column name for y-axis values
- zstr
Column name for z-axis values
- huestr, optional
Column name for color grouping
- x_namestr, optional
Label for x-axis
- y_namestr, optional
Label for y-axis
- z_namestr, optional
Label for z-axis
- titlestr, optional
Plot title
- widthfloat, default 7
Figure width in inches
- heightfloat, default 7
Figure height in inches
- elevfloat, default 30
Elevation angle for 3D view
- azimfloat, default -60
Azimuth angle for 3D view
- is_add_legendbool, default True
Whether to add legend
- cmapUnion[str, ListedColormap], default ‘tab20’
Colormap for coloring
- font_sizeint, default 14
Font size for labels and title
- edge_colorstr, optional
Edge color for scatter points
- sizeUnion[float, collection], default 0.1
Size of scatter points
- legend_namestr, optional
Title for legend
- is_add_max_labelbool, default False
Whether to add label for maximum z value point
- text_left_offsetfloat, default 0.5
Horizontal offset for max value label
- outputpath, optional
Output file path
- showbool, default True
Whether to display the plot
- closebool, default False
Whether to close the figure after saving
- **kwargsAny
Additional arguments passed to ax.scatter
- sciv.pl.scatter_atac(adata: AnnData, columns: Tuple[str, str] = ('UMAP1', 'UMAP2'), groupby: str = 'clusters', hue_order: list | None = None, width: float = 2, height: float = 2, x_name: str | None = None, y_name: str | None = None, start_color_index: int = 0, color_step_size: int = 0, type_colors: list | set | Tuple | ndarray | None = None, edge_color: str | None = None, size: float = 1.0, text_fontsize: float = 7, legend_fontsize: float = 7, is_text: bool = False, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a scatter plot for ATAC-seq data with cluster coloring.
Parameters
- adataAnnData
AnnData object containing observations and coordinates
- columnsTuple[str, str], default (“UMAP1”, “UMAP2”)
Column names for x and y coordinates in adata.obs
- groupbystr, default “clusters”
Column name for cluster labels in adata.obs
- hue_orderlist, optional
Order of clusters for legend
- widthfloat, default 2
Figure width in inches
- heightfloat, default 2
Figure height in inches
- x_namestr, optional
Label for x-axis
- y_namestr, optional
Label for y-axis
- start_color_indexint, default 0
Starting index in color palette
- color_step_sizeint, default 0
Step size for color selection
- type_colorscollection, optional
Custom color palette
- edge_colorstr, optional
Edge color for scatter points
- sizefloat, default 1.0
Size of scatter points
- text_fontsizefloat, default 7
Font size for annotation text
- legend_fontsizefloat, default 7
Font size for legend text
- is_textbool, default False
Whether to add text annotations
- outputpath, optional
Output file path
- showbool, default True
Whether to display the plot
- closebool, default False
Whether to close the figure after saving
- **kwargsAny
Additional arguments passed to scatter_base
- sciv.pl.scatter_base(df: DataFrame, x: str, y: str, hue: str | None = None, hue_order: list | None = None, x_name: str | None = None, y_name: str | None = None, title: str | None = None, bar_label: str | None = None, cmap: str = 'Oranges', width: float = 2, height: float = 2, right: float = 0.9, bottom: float = 0, text_fontsize: float = 7, legend_fontsize: float = 7, start_color_index: int = 0, color_step_size: int = 0, type_colors: list | set | Tuple | ndarray | None = None, edge_color: str | None = None, size: float | list | set | Tuple | ndarray = 1.0, legend: dict | None = None, number: bool = False, is_text: bool = False, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a base scatter plot with customizable aesthetics.
Parameters
- dfDataFrame
Input data containing x, y coordinates and optional hue values
- xstr
Column name for x-axis values
- ystr
Column name for y-axis values
- huestr, optional
Column name for color grouping
- hue_orderlist, optional
Order of hue categories for legend
- x_namestr, optional
Label for x-axis
- y_namestr, optional
Label for y-axis
- titlestr, optional
Plot title
- bar_labelstr, optional
Label for colorbar when number=True
- cmapstr, default “Oranges”
Colormap for continuous coloring
- widthfloat, default 2
Figure width in inches
- heightfloat, default 2
Figure height in inches
- rightfloat, default 0.9
Position for legend anchor
- bottomfloat, default 0
Bottom margin adjustment
- text_fontsizefloat, default 7
Font size for annotation text
- legend_fontsizefloat, default 7
Font size for legend text
- start_color_indexint, default 0
Starting index in color palette
- color_step_sizeint, default 0
Step size for color selection
- type_colorscollection, optional
Custom color palette
- edge_colorstr, optional
Edge color for scatter points
- sizeUnion[float, collection], default 1.0
Size of scatter points
- legenddict, optional
Mapping to rename hue categories
- numberbool, default False
Whether to use continuous color scale
- is_textbool, default False
Whether to add text annotations
- outputpath, optional
Output file path
- showbool, default True
Whether to display the plot
- closebool, default False
Whether to close the figure after saving
- **kwargsAny
Additional arguments passed to sns.scatterplot
- sciv.pl.scatter_trait(trait_adata: AnnData, title: str | None = None, bar_label: str | None = None, trait_name: str = 'All', layers: list | set | Tuple | ndarray | None = None, columns: Tuple[str, str] = ('UMAP1', 'UMAP2'), cmap: str = 'viridis', width: float = 2, height: float = 2, right: float = 0.9, x_name: str | None = None, y_name: str | None = None, number: bool = True, edge_color: str | None = None, size: float | list | set | Tuple | ndarray = 1.0, text_fontsize: float = 7, legend_fontsize: float = 7, start_color_index: int = 0, color_step_size: int = 0, type_colors: list | set | Tuple | ndarray | None = None, is_text: bool = False, legend: dict | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create scatter plots of trait scores for one trait or for all traits.
Parameters
- trait_adataAnnData
AnnData object containing trait/disease scores and cell metadata
- titlestr, optional
Title prefix for the plot
- bar_labelstr, optional
Label for colorbar when number=True
- trait_namestr, default “All”
Name of trait/disease to plot, or “All” to plot all traits
- layersUnion[None, collection], optional
List of layer names to plot from trait_adata.layers
- columnsTuple[str, str], default (“UMAP1”, “UMAP2”)
Column names for x and y coordinates in trait_adata.obs
- cmapstr, default “viridis”
Colormap for continuous coloring
- widthfloat, default 2
Figure width in inches
- heightfloat, default 2
Figure height in inches
- rightfloat, default 0.9
Position for legend anchor
- x_namestr, optional
Label for x-axis
- y_namestr, optional
Label for y-axis
- numberbool, default True
Whether to use continuous color scale for trait scores
- edge_colorstr, optional
Edge color for scatter points
- sizeUnion[float, collection], default 1.0
Size of scatter points
- text_fontsizefloat, default 7
Font size for annotation text
- legend_fontsizefloat, default 7
Font size for legend text
- start_color_indexint, default 0
Starting index in color palette
- color_step_sizeint, default 0
Step size for color selection
- type_colorscollection, optional
Custom color palette
- is_textbool, default False
Whether to add text annotations
- legenddict, optional
Mapping to rename hue categories
- outputpath, optional
Output directory path for saving plots
- showbool, default True
Whether to display the plot
- closebool, default False
Whether to close the figure after saving
- **kwargsAny
Additional arguments passed to scatter_base
- sciv.pl.volcano_base(df: DataFrame, x: str = 'Log2(Fold change)', y: str = '-Log10(P value)', hue: str = 'type', size: int = 3, palette: list | None = None, width: float = 2, height: float = 2, bottom: float = 0, y_min: float = 0, axh_value: float = 3.0, axv_left_value: float = -1, axv_right_value: float = 1, title: str | None = None, x_name: str | None = None, y_name: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
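Create a volcano plot of fold change against statistical significance.
Parameters
- dfDataFrame
Input data containing the fold-change, significance and grouping columns.
- xstr, default “Log2(Fold change)”
Column name for x-axis values.
- ystr, default “-Log10(P value)”
Column name for y-axis values.
- huestr, default “type”
Column name for color grouping.
- sizeint, default 3
Size of scatter points.
- paletteOptional[list], optional
Color palette for the hue categories.
- widthfloat, default 2
Figure width in inches.
- heightfloat, default 2
Figure height in inches.
- bottomfloat, default 0
Bottom margin adjustment.
- y_minfloat, default 0
Lower limit of the y-axis.
- axh_valuefloat, default 3.0
Y position of the horizontal reference line.
- axv_left_valuefloat, default -1
X position of the left vertical reference line.
- axv_right_valuefloat, default 1
X position of the right vertical reference line.
- titlestr, optional
Plot title.
- x_nameOptional[str], optional
Label for x-axis.
- y_nameOptional[str], optional
Label for y-axis.
- outputpath, optional
Output file path.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to sns.scatterplot.
Returns
- None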
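A minimal sketch of building the three columns that volcano_base reads by default (“Log2(Fold change)”, “-Log10(P value)” and “type”). The cutoffs mirror the defaults axh_value=3.0, axv_left_value=-1 and axv_right_value=1, on the assumption that those values mark significance thresholds. The category labels, the input values and the helpers volcano_row and plot_volcano are hypothetical.

```python
import math


def volcano_row(name, fold_change, p_value, log2_fc_cut=1.0, neg_log10_p_cut=3.0):
    """Return one input row with transformed axes and a category label."""
    log2_fc = math.log2(fold_change)
    neg_log10_p = -math.log10(p_value)
    significant = neg_log10_p >= neg_log10_p_cut
    if significant and log2_fc >= log2_fc_cut:
        kind = "up"
    elif significant and log2_fc <= -log2_fc_cut:
        kind = "down"
    else:
        kind = "not significant"
    return {
        "name": name,
        "Log2(Fold change)": log2_fc,
        "-Log10(P value)": neg_log10_p,
        "type": kind,
    }


def plot_volcano(rows):
    # Imported lazily so the data preparation runs without the plotting stack.
    import pandas as pd
    import sciv

    sciv.pl.volcano_base(pd.DataFrame(rows), hue="type", show=False)


# Made-up measurements for illustration only.
rows = [
    volcano_row("feature_a", fold_change=4.0, p_value=1e-4),
    volcano_row("feature_b", fold_change=0.25, p_value=1e-5),
    volcano_row("feature_c", fold_change=1.5, p_value=0.5),
]
types = [row["type"] for row in rows]
```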
Violin
Violin chart visualization function.
- sciv.pl.violin_base(df: DataFrame, value: str = 'value', x_name: str | None = None, y_name: str = 'value', kind: Literal['strip', 'swarm', 'box', 'violin', 'boxen', 'point', 'bar', 'count'] = 'violin', groupby: str = 'clusters', palette: Tuple | list | None = None, hue: str | None = None, width: float = 2, height: float = 2, bottom: float = 0.3, rotation: float = 65, line_width: float = 0.5, title: str | None = None, split: bool = False, is_sort: bool = True, order_names: list | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
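Create a violin plot, or another categorical plot selected by kind.
Parameters
- dfDataFrame
Input data containing the values and group labels to plot.
- valuestr, default “value”
Column name containing the values to plot.
- x_namestr, optional
Label for the x-axis.
- y_namestr, default “value”
Label for the y-axis.
- kindLiteral[‘strip’, ‘swarm’, ‘box’, ‘violin’, ‘boxen’, ‘point’, ‘bar’, ‘count’], default “violin”
Type of categorical plot to create.
- groupbystr, default “clusters”
Column name containing the group (cluster) labels.
- paletteUnion[Tuple, list], optional
Color palette for the plot.
- huestr, optional
Column name for color grouping.
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- bottomfloat, default 0.3
Bottom margin of the figure.
- rotationfloat, default 65
Rotation angle for x-axis labels in degrees.
- line_widthfloat, default 0.5
Width of the plot lines.
- titlestr, optional
Title of the plot.
- splitbool, default False
Whether to split the violins when using hue.
- is_sortbool, default True
Whether to sort the groups before plotting.
- order_nameslist, optional
Custom order for the groups.
- outputpath, optional
File path to save the figure.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to the underlying plotting call.
Returns
- None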
- sciv.pl.violin_trait(trait_df: DataFrame, trait_name: str | list = 'All', trait_column_name: str = 'id', value: str = 'value', groupby: str = 'clusters', kind: Literal['strip', 'swarm', 'box', 'violin', 'boxen', 'point', 'bar', 'count'] = 'violin', x_name: str | None = None, y_name: str = 'value', palette: Tuple | None = None, width: float = 2, height: float = 2, rotation: float = 65, line_width: float = 0.1, bottom: float = 0.3, split: bool = False, is_sort: bool = True, order_names: list | None = None, title: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Plot violin plot for trait data.
This function creates violin plots (or other categorical plots) for trait data, allowing visualization of trait distributions across different clusters.
Parameters
- trait_dfDataFrame
Input trait data containing trait information and values.
- trait_nameUnion[str, list], optional
Name(s) of the trait(s) to plot. Use “All” to plot all traits.
- trait_column_namestr, optional
Column name in trait_df that contains trait identifiers.
- valuestr, optional
Column name containing the values to plot.
- groupbystr, optional
Column name containing cluster assignments.
- kindLiteral[‘strip’, ‘swarm’, ‘box’, ‘violin’, ‘boxen’, ‘point’, ‘bar’, ‘count’], default “violin”
Type of categorical plot to create (e.g., “violin”, “box”, “strip”).
- x_namestr, optional
Label for the x-axis.
- y_namestr, optional
Label for the y-axis.
- paletteTuple, optional
Color palette for the plot.
- widthfloat, optional
Width of the figure in inches.
- heightfloat, optional
Height of the figure in inches.
- rotationfloat, optional
Rotation angle for x-axis labels in degrees.
- line_widthfloat, optional
Width of the plot lines.
- bottomfloat, optional
Bottom margin of the figure.
- splitbool, optional
Whether to split the violin plot when using hue.
- is_sortbool, optional
Whether to sort clusters by median value.
- order_nameslist, optional
Custom order for cluster names.
- titlestr, optional
Title prefix for the plot.
- outputpath, optional
Directory path to save the output files.
- showbool, optional
Whether to display the plot.
- closebool, optional
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to violin_base.
Returns
- None
Box
Box plot visualization function.
- sciv.pl.box_base(df: DataFrame, x: str = 'clusters', y: str = 'value', x_name: str | None = None, y_name: str = 'value', palette: Tuple | list | None = None, width: float = 2, height: float = 2, bottom: float = 0.3, line_width: float = 0.3, marker_size: float = 0.2, rotation: float = 65, orient: str | None = None, title: str | None = None, whis: float = 1.5, show_fliers: bool = True, is_sort: bool = True, order_names: list | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a box plot with customizable styling options.
Parameters
- dfDataFrame
Input data containing the values to plot.
- xstr, default “clusters”
Column name for the x-axis categorical variable.
- ystr, default “value”
Column name for the y-axis numerical variable.
- x_namestr, optional
Custom label for the x-axis. If None, uses the x column name.
- y_namestr, default “value”
Custom label for the y-axis.
- paletteUnion[Tuple, list], optional
Color palette for the boxes. If None and “color” column exists, uses that.
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- bottomfloat, default 0.3
Bottom margin adjustment for the plot.
- line_widthfloat, default 0.3
Width of lines in the plot (box edges, whiskers, etc.).
- marker_sizefloat, default 0.2
Size of outlier markers.
- rotationfloat, default 65
Rotation angle for x-axis tick labels in degrees.
- orientstr, optional
Orientation of the plot (“v” for vertical, “h” for horizontal).
- titlestr, optional
Title of the plot.
- whisfloat, default 1.5
Proportion of the IQR past the low and high quartiles to extend the whiskers.
- show_fliersbool, default True
Whether to display outlier points beyond the whiskers.
- is_sortbool, default True
Whether to sort boxes by median value in descending order.
- order_nameslist, optional
Custom order for x-axis categories. Only used if is_sort is False.
- outputpath, optional
File path to save the plot. If None, plot is not saved.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after displaying.
- **kwargsAny
Additional keyword arguments passed to seaborn.boxplot.
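The whis parameter follows the usual box plot rule: each whisker reaches the most extreme data point that lies within whis times the IQR of its quartile, and points beyond that are outliers (shown when show_fliers is True). The sketch below works through that rule with the standard library. It assumes linearly interpolated quartiles, which is the Matplotlib default as far as we know; whisker_summary is a hypothetical helper, not part of sciv.

```python
import statistics


def whisker_summary(values, whis=1.5):
    """Return whisker ends and outliers under the whis * IQR rule."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    fliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "fliers": fliers,
    }


# Q1 = 3.25, Q3 = 7.75, IQR = 4.5, so the fences are -3.5 and 14.5.
summary = whisker_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Raising whis widens the fences, so fewer points are drawn as outliers.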
- sciv.pl.box_trait(trait_df: DataFrame, trait_name: str = 'All', trait_column_name: str = 'id', value: str = 'value', groupby: str = 'clusters', x_name: str | None = None, y_name: str = 'value', palette: Tuple | list | None = None, orient: str | None = None, width: float = 2, height: float = 2, line_width: float = 0.1, marker_size: float = 0.5, bottom: float = 0.3, rotation: float = 65, whis: float = 1.5, show_fliers: bool = True, is_sort: bool = True, order_names: list | None = None, title: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create box plots for trait/disease data across different clusters.
This function generates box plots for each trait or a specific trait from the input dataframe. It filters data by trait and creates individual box plots using the box_base function.
Parameters
- trait_dfDataFrame
Input data containing trait/disease information and values to plot.
- trait_namestr, default “All”
Name of the trait/disease to plot. Use “All” to plot all traits.
- trait_column_namestr, default “id”
Column name in trait_df that contains trait/disease identifiers.
- valuestr, default “value”
Column name for the numerical values to be plotted on y-axis.
- groupbystr, default “clusters”
Column name for the cluster categories to be plotted on x-axis.
- x_namestr, optional
Custom label for the x-axis. If None, uses the clusters column name.
- y_namestr, default “value”
Custom label for the y-axis.
- paletteUnion[Tuple, list], optional
Color palette for the boxes.
- orientstr, optional
Orientation of the plot (“v” for vertical, “h” for horizontal).
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- line_widthfloat, default 0.1
Width of lines in the plot.
- marker_sizefloat, default 0.5
Size of outlier markers.
- bottomfloat, default 0.3
Bottom margin adjustment for the plot.
- rotationfloat, default 65
Rotation angle for x-axis tick labels in degrees.
- whisfloat, default 1.5
Proportion of the IQR to extend the whiskers.
- show_fliersbool, default True
Whether to display outlier points beyond the whiskers.
- is_sortbool, default True
Whether to sort boxes by median value.
- order_nameslist, optional
Custom order for x-axis categories.
- titlestr, optional
Base title for the plots. Trait name will be appended.
- outputpath, optional
Directory path to save the plots. If None, plots are not saved.
- showbool, default True
Whether to display the plots.
- closebool, default False
Whether to close the figure after displaying.
- **kwargsAny
Additional keyword arguments passed to box_base function.
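box_trait expects long-format data: one row per value, with a trait identifier column (“id” by default), a cluster column (“clusters”) and a value column (“value”). The sketch below builds such rows and reproduces the ordering that is_sort is documented to apply in box_base (median value, descending). The data and the helpers cluster_order_by_median and plot_trait_boxes are hypothetical, and sciv's exact tie-breaking is not documented here.

```python
import statistics
from collections import defaultdict


def cluster_order_by_median(rows, trait, trait_column_name="id",
                            value="value", groupby="clusters"):
    """Return the clusters of one trait, highest median first."""
    grouped = defaultdict(list)
    for row in rows:
        if row[trait_column_name] == trait:
            grouped[row[groupby]].append(row[value])
    return sorted(grouped, key=lambda c: statistics.median(grouped[c]),
                  reverse=True)


def plot_trait_boxes(rows, trait):
    # Imported lazily so the data preparation runs without the plotting stack.
    import pandas as pd
    import sciv

    sciv.pl.box_trait(pd.DataFrame(rows), trait_name=trait, show=False)


# Made-up scores for illustration only.
rows = [
    {"id": "T1", "clusters": "A", "value": 0.1},
    {"id": "T1", "clusters": "A", "value": 0.2},
    {"id": "T1", "clusters": "A", "value": 0.3},
    {"id": "T1", "clusters": "B", "value": 0.7},
    {"id": "T1", "clusters": "B", "value": 0.9},
    {"id": "T1", "clusters": "C", "value": 0.4},
    {"id": "T1", "clusters": "C", "value": 0.5},
    {"id": "T2", "clusters": "A", "value": 0.9},
]
order_t1 = cluster_order_by_median(rows, "T1")
```

To fix the order manually instead, pass is_sort=False together with order_names.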
KDE
Kernel density estimation (KDE) plot visualization function.
- sciv.pl.kde(adata: AnnData, layer: str | None = None, x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 4, height: float = 2, bottom: float = 0.3, axis: Literal[-1, 0, 1] = -1, sample_number: int = 1000000, is_legend: bool = True, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Plot Kernel Density Estimation (KDE) for single-cell data.
Parameters
- adataAnnData
Annotated data matrix with observations (rows) and variables (columns).
- layerstr, optional
Which layer of adata to use. If None, uses adata.X.
- x_namestr, optional
Label for the x-axis.
- y_namestr, optional
Label for the y-axis.
- titlestr, optional
Title of the plot.
- widthfloat, default=4
Width of the figure in inches.
- heightfloat, default=2
Height of the figure in inches.
- bottomfloat, default=0.3
Bottom margin of the figure.
- axisLiteral[-1, 0, 1], default=-1
Axis along which to compute KDE:
axis=-1: flatten all data and compute a single KDE.
axis=0: compute a KDE for each column (variable).
axis=1: compute a KDE for each row (observation).
- sample_numberint, default=1000000
Maximum number of samples to use for KDE computation. If data exceeds this, random downsampling is applied.
- is_legendbool, default=True
Whether to display legend when axis is 0 or 1.
- outputpath, optional
Path to save the figure. If None, figure is not saved.
- showbool, default=True
Whether to display the figure.
- closebool, default=False
Whether to close the figure after displaying.
- **kwargsAny
Additional keyword arguments passed to seaborn.kdeplot.
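The axis and sample_number parameters decide which values feed each density curve. The sketch below mirrors the documented grouping on a plain nested list. Whether sciv downsamples per curve or over the whole matrix is not stated on this page, so per-curve sampling is an assumption, and values_for_kde with its group names is a hypothetical helper.

```python
import random


def values_for_kde(matrix, axis=-1, sample_number=1_000_000, seed=0):
    """Group matrix values the way the axis parameter is documented."""
    if axis == -1:
        groups = {"all": [v for row in matrix for v in row]}
    elif axis == 0:
        n_vars = len(matrix[0])
        groups = {f"var{j}": [row[j] for row in matrix] for j in range(n_vars)}
    elif axis == 1:
        groups = {f"obs{i}": list(row) for i, row in enumerate(matrix)}
    else:
        raise ValueError("axis must be -1, 0 or 1")
    rng = random.Random(seed)
    # Random downsampling only when a group exceeds sample_number.
    return {
        name: rng.sample(vals, sample_number) if len(vals) > sample_number else vals
        for name, vals in groups.items()
    }


matrix = [[1, 2], [3, 4], [5, 6]]  # 3 observations x 2 variables
flat = values_for_kde(matrix, axis=-1)
per_variable = values_for_kde(matrix, axis=0)
per_observation = values_for_kde(matrix, axis=1)
downsampled = values_for_kde(matrix, axis=-1, sample_number=4)
```

With axis=1 on a large dataset one curve is drawn per cell, so axis=-1 or axis=0 is usually the more readable choice.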
Line
Line chart visualization function.
- sciv.pl.base_line(data: AnnData | DataFrame, x: str, y: str, layer: str | None = None, width: float = 2, height: float = 2, bottom: float = 0, title: str | None = None, x_name: str | None = None, y_name: str | None = None, label: str | None = None, legend: str | None = None, legend_list: list | None = None, start_color_index: int = 0, color_step_size: int = 0, color_type: str = 'set', colors: list | None = None, line_width: float = 1.5, x_name_rotation: float = 65, x_ticks: int | list | set | Tuple | ndarray | None = None, y_limit: Tuple[float, float] = (0, 1), output: str | Path | None = None, is_str: bool = True, show: bool = True, close: bool = False, **kwargs: Any) None
Base line plot function for visualizing data trends over time or categories.
This function creates a line plot from either AnnData or DataFrame objects, supporting grouped data visualization with customizable colors, legends, and styling.
Parameters
- dataUnion[AnnData, DataFrame]
Input data object, can be either AnnData (single-cell data) or pandas DataFrame.
- xstr
Column name to use for x-axis values.
- ystr
Column name to use for y-axis values.
- layerOptional[str], default None
Specific layer to use from AnnData.layers when data is AnnData.
- widthfloat, default 2
Figure width in inches.
- heightfloat, default 2
Figure height in inches.
- bottomfloat, default 0
Bottom margin adjustment for the plot.
- titleOptional[str], default None
Title of the plot.
- x_nameOptional[str], default None
Label for x-axis. If None, uses x column name.
- y_nameOptional[str], default None
Label for y-axis. If None, uses y column name.
- labelOptional[str], default None
Column name used for grouping data (creates separate lines).
- legendOptional[str], default None
Title for the legend. If None and label is provided, uses “category”.
- legend_listOptional[list], default None
List of specific group values to include in the plot.
- start_color_indexint, default 0
Starting index for color selection from the color palette.
- color_step_sizeint, default 0
Step size for selecting colors from the palette.
- color_typestr, default “set”
Type of color palette to use (key from plot_color_types).
- colorsOptional[list], default None
Custom list of colors to use for the plot.
- line_widthfloat, default 1.5
Width of the lines in the plot.
- x_name_rotationfloat, default 65
Rotation angle for x-axis tick labels (in degrees).
- x_ticksOptional[Union[int, collection]], default None
Custom tick positions or number of ticks for x-axis.
- y_limitTuple[float, float], default (0, 1)
Y-axis limits as (min, max) tuple.
- outputOptional[path], default None
File path to save the figure. If None, figure is not saved.
- is_strbool, default True
Whether to treat x-axis values as strings (affects tick formatting).
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after display.
- **kwargsAny
Additional keyword arguments passed to seaborn.lineplot.
Returns
- None
The function displays and/or saves the plot but does not return any value.
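Because y_limit defaults to (0, 1), values outside that range would presumably fall outside the visible axis unless the limits are set explicitly. The sketch below derives padded limits from the data and passes them on. It assumes y_limit is applied as plain axis limits; the column names step, score and group and the helpers padded_limits and plot_trend are hypothetical.

```python
def padded_limits(values, pad=0.05):
    """Return (low, high) limits with a relative margin on both sides."""
    low, high = min(values), max(values)
    margin = (high - low) * pad
    if margin == 0:
        margin = pad  # constant data: fall back to an absolute margin
    return (low - margin, high + margin)


def plot_trend(rows):
    # Imported lazily so the helper above runs without the plotting stack.
    import pandas as pd
    import sciv

    df = pd.DataFrame(rows)
    sciv.pl.base_line(
        df,
        x="step",
        y="score",
        label="group",  # one line per group value
        y_limit=padded_limits(df["score"].tolist()),
        show=False,
    )


low, high = padded_limits([2, 4, 6], pad=0.1)
```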
Bar
Bar chart visualization function.
- sciv.pl.bar(ax_x: list | set | Tuple | ndarray, ax_y: list | set | Tuple | ndarray, x_name: str | None = None, y_name: str | None = None, title: str | None = None, color: str = '#70b5de', text_color: str = '#000205', width: float = 2, height: float = 2, bottom: float = 0, text_left_move: float = 0.1, direction: Literal['vertical', 'horizontal'] = 'vertical', output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a simple bar chart with optional value labels.
This function generates a bar plot (vertical or horizontal) with customizable appearance and automatically adds numerical value labels on each bar.
Parameters
- ax_xcollection
Categories or labels for the x-axis (or y-axis if horizontal).
- ax_ycollection
Numerical values for the bar heights (or widths if horizontal).
- x_namestr, optional
Label for the x-axis. Default is None.
- y_namestr, optional
Label for the y-axis. Default is None.
- titlestr, optional
Title of the plot. Default is None.
- colorstr, default “#70b5de”
Color of the bars.
- text_colorstr, default “#000205”
Color of the value labels on bars.
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- bottomfloat, default 0
Bottom margin adjustment.
- text_left_movefloat, default 0.1
Horizontal adjustment for text position on bars.
- directionLiteral[‘vertical’, ‘horizontal’], default “vertical”
Orientation of the bars.
- outputpath, optional
File path to save the figure. Default is None.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to matplotlib’s bar/barh function.
Returns
- None
The function displays and/or saves the plot but does not return any value.
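A minimal usage sketch for sciv.pl.bar, using only parameters from the signature above. The cell-type labels and counts are made-up illustration data, and the import guard is an addition so the snippet still runs where sciv is not installed.

```python
# Hypothetical illustration data: one count per cell type.
cell_types = ["T cell", "B cell", "Monocyte", "NK cell"]
cell_counts = [420, 310, 180, 90]

# Keyword arguments taken from the documented signature of sciv.pl.bar.
bar_args = {
    "ax_x": cell_types,          # categories on the x-axis
    "ax_y": cell_counts,         # one bar height per category
    "x_name": "Cell type",
    "y_name": "Number of cells",
    "width": 3,                  # figure width in inches
    "height": 2,                 # figure height in inches
}

try:
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("sciv is not installed; skipping the plot call")
else:
    sciv.pl.bar(**bar_args)
```

Passing output with a file path would additionally save the figure, as described in the parameter list.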
- sciv.pl.bar_significance(df: DataFrame, x: str, y: str, hue: str, x_name: str | None = None, y_name: str | None = None, anchor: str | None = None, legend: str | None = None, legend_list: list | None = None, hue_order: list | None = None, width: float = 2, height: float = 2, bottom: float = 0, legend_gap: float = 1.15, line_width: float = 0.5, capsize: float = 0.1, errcolor: str = 'k', start_color_index: int = 0, color_step_size: int = 0, color_type: str = 'set', test: str = 't-test_ind', ci: str | float = 'sd', x_rotation: float = 0, x_deviation: float = 0.02, y_deviation: float = 0.02, y_limit: Tuple[float, float] = (0, 1), anno: bool = False, anno_fontsize: float = 7, line_height: float = 0.01, line_offset: float = 0.01, colors: list | dict | None = None, title: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a bar chart with statistical significance annotations relative to an anchor group.
This function generates a grouped bar plot with error bars and performs pairwise statistical significance testing between an anchor group and other groups within each category. It supports custom color palettes, legend positioning, and various statistical tests.
Parameters
- dfDataFrame
Input DataFrame containing the data to plot.
- xstr
Column name for x-axis categories.
- ystr
Column name for y-axis values.
- huestr
Column name for grouping bars by color.
- x_namestr, optional
Label for x-axis. Default is None.
- y_namestr, optional
Label for y-axis. Default is None.
- anchorstr, optional
Reference group name for pairwise significance testing. If provided, statistical comparisons will be made between this group and all other groups within each x category.
- legendstr, optional
Legend title. Default is “category”.
- legend_listlist, optional
Subset of hue values to include in the plot. If provided, only these values will be plotted. Default is None.
- hue_orderlist, optional
Order of hue categories for plotting and legend. Default is None.
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- bottomfloat, default 0
Bottom margin adjustment.
- legend_gapfloat, default 1.15
Vertical gap between plot and legend, specified as a ratio of the y-axis height.
- line_widthfloat, default 0.5
Width of error bars and significance annotation lines.
- capsizefloat, default 0.1
Width of the error bar caps.
- errcolorstr, default “k”
Color of the error bars.
- start_color_indexint, default 0
Starting index in the color palette for the first hue category.
- color_step_sizeint, default 0
Step size when cycling through the color palette for subsequent hue categories.
- color_typestr, default “set”
Name of the seaborn color palette to use. Must be a key in plot_color_types.
- teststr, default “t-test_ind”
Statistical test for pairwise comparisons. Options include: {“t-test_ind”, “t-test_welch”, “t-test_paired”, “Mann-Whitney”, “Mann-Whitney-gt”, “Mann-Whitney-ls”, “Levene”, “Wilcoxon”, “Kruskal”, “Brunner-Munzel”}.
- ciUnion[str, float], default “sd”
Confidence interval type or value for error bars. Can be “sd” for standard deviation or a float for confidence interval percentage.
- x_rotationfloat, default 0
Rotation angle for x-axis tick labels in degrees.
- x_deviationfloat, default 0.02
Horizontal offset for bar value annotations.
- y_deviationfloat, default 0.02
Vertical offset adjustment for bar value annotations.
- y_limitTuple[float, float], default (0, 1)
Y-axis limits for the plot.
- annobool, default False
Whether to annotate bars with their numerical values.
- anno_fontsizefloat, default 7
Font size for bar value annotations.
- line_heightfloat, default 0.01
Height of significance annotation lines as a fraction of y-axis range.
- line_offsetfloat, default 0.01
Vertical offset for significance annotation lines from the bar tops.
- colorsUnion[list, dict], optional
Custom color list or dictionary mapping hue values to colors. If provided, overrides the default color palette. Default is None.
- titlestr, optional
Title of the plot. Default is None.
- outputpath, optional
File path to save the figure. Default is None.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to seaborn’s barplot function.
Returns
- None
The function displays and/or saves the plot but does not return any value.
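The sketch below shows one way to lay out the input for sciv.pl.bar_significance: a long-format table with one row per observation, so that each (x, hue) group has several values to test. The column names dataset, method and score and all numbers are hypothetical; only the keyword names come from the signature above.

```python
# Hypothetical replicate scores per (dataset, method) group.
scores = {
    ("dataset_1", "method_A"): [0.81, 0.79, 0.84],
    ("dataset_1", "method_B"): [0.62, 0.66, 0.60],
    ("dataset_2", "method_A"): [0.74, 0.77, 0.72],
    ("dataset_2", "method_B"): [0.70, 0.68, 0.73],
}

# Flatten to long format: one record per single observation.
records = [
    {"dataset": dataset, "method": method, "score": value}
    for (dataset, method), values in scores.items()
    for value in values
]

try:
    import pandas as pd
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("pandas or sciv is not installed; skipping the plot call")
else:
    df = pd.DataFrame(records)
    sciv.pl.bar_significance(
        df,
        x="dataset",
        y="score",
        hue="method",
        anchor="method_A",     # every other method is compared against this group
        test="Mann-Whitney",   # one of the documented test options
        y_limit=(0, 1),
    )
```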
- sciv.pl.bar_trait(trait_df: DataFrame, trait_name: str = 'All', trait_column_name: str = 'id', value: str = 'rate', groupby: str = 'clusters', x_name: str = 'Cell type', y_name: str = 'Enrichment ratio', color: Tuple = ('#2e6fb7', '#f7f7f7'), legend: Tuple = ('Enrichment', 'Conservative'), text_color: str = '#000205', groupby_sort: list | None = None, width: float = 2, height: float = 2, bottom: float = 0, rotation: float = 65, title: str | None = None, text_left_move: float = 0.15, y_limit: Tuple[float, float] = (0, 1), output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any)
Create stacked bar charts for multiple traits or a specific trait.
This function generates enrichment bar plots for traits (e.g., diseases, gene sets) in the input DataFrame. It plots either every trait or a single trait, depending on the trait_name parameter. Each trait’s enrichment data is visualized with the class_bar function, and when output is set, each plot is saved to its own file.
Parameters
- trait_dfDataFrame
Input DataFrame containing trait enrichment data. Must include columns for trait identifiers, cluster labels, and enrichment values.
- trait_namestr, default “All”
The specific trait to plot. If “All”, plots bar charts for all unique traits in the trait_column_name column.
- trait_column_namestr, default “id”
Column name in trait_df that contains trait identifiers.
- valuestr, default “rate”
Column name containing the numerical enrichment values to plot.
- groupbystr, default “clusters”
Column name containing cluster or cell type labels.
- x_namestr, default “Cell type”
Label for the x-axis.
- y_namestr, default “Enrichment ratio”
Label for the y-axis.
- colorTuple, default (“#2e6fb7”, “#f7f7f7”)
Colors for the two bar segments (enrichment color, conservative color).
- legendTuple, default (“Enrichment”, “Conservative”)
Labels for the legend corresponding to the two bar segments.
- text_colorstr, default “#000205”
Color of the value labels on bars.
- groupby_sortOptional[list], default None
Custom order for clusters. If provided, clusters will be sorted according to this list. If None, clusters are sorted by enrichment value.
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- bottomfloat, default 0
Bottom margin adjustment.
- rotationfloat, default 65
Rotation angle for x-axis tick labels in degrees.
- titlestr, optional
Base title of the plot. The trait name will be appended to this title. Default is None.
- text_left_movefloat, default 0.15
Horizontal adjustment for text position on bars.
- y_limitTuple[float, float], default (0, 1)
The y-axis limits for the plot.
- outputpath, optional
Directory path to save the figures. If provided, each trait’s plot will be saved as a PDF file in this directory. Default is None.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to the class_bar function.
Returns
- None
The function displays and/or saves the plots but does not return any value.
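A hedged sketch of a trait_df that matches the parameter descriptions above. The exact layout sciv expects is an assumption here: two rows per (trait, cluster), distinguished by a binary value column (the default by column of class_bar), with rate holding the fraction of cells in each category. Trait names, cluster names and numbers are invented.

```python
# Hypothetical enriched fraction per (trait, cluster).
enriched_fraction = {
    ("trait_1", "T cell"): 0.62,
    ("trait_1", "B cell"): 0.35,
    ("trait_2", "T cell"): 0.10,
    ("trait_2", "B cell"): 0.48,
}

records = []
for (trait, cluster), fraction in enriched_fraction.items():
    # Assumed layout: value == 1 marks the enriched share, value == 0 the rest.
    records.append({"id": trait, "clusters": cluster, "value": 1, "rate": fraction})
    records.append({"id": trait, "clusters": cluster, "value": 0, "rate": round(1 - fraction, 2)})

try:
    import pandas as pd
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("pandas or sciv is not installed; skipping the plot call")
else:
    trait_df = pd.DataFrame(records)
    sciv.pl.bar_trait(
        trait_df,
        trait_name="trait_1",        # use "All" to plot every trait
        trait_column_name="id",
        value="rate",
        groupby="clusters",
    )
```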
- sciv.pl.class_bar(df: DataFrame, value: str = 'rate', by: str = 'value', groupby: str = 'clusters', color: Tuple = ('#2e6fb7', '#f7f7f7'), x_name: str = 'Cell type', y_name: str = 'Enrichment ratio', legend: Tuple = ('Enrichment', 'Conservative'), text_color: str = '#000205', groupby_sort: list | None = None, width: float = 2, height: float = 2, bottom: float = 0, rotation: float = 65, title: str | None = None, text_left_move: float = 0.15, y_limit: Tuple[float, float] = (0, 1), output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any)
Create a stacked bar chart for enrichment analysis with two categories.
This function filters a DataFrame by a binary column, sorts the data by clusters, and generates a stacked bar plot using the two_bar function. It is typically used to visualize enrichment ratios where one category represents enriched values and the other represents conservative values.
Parameters
- dfDataFrame
Input DataFrame containing the data to plot.
- valuestr, default “rate”
Column name containing the numerical values to plot.
- bystr, default “value”
Column name used to filter the DataFrame into two categories (typically binary: 0 and 1).
- groupbystr, default “clusters”
Column name containing the cluster labels or categories.
- colorTuple, default (“#2e6fb7”, “#f7f7f7”)
Colors for the two bar segments (enrichment color, conservative color).
- x_namestr, default “Cell type”
Label for the x-axis.
- y_namestr, default “Enrichment ratio”
Label for the y-axis.
- legendTuple, default (“Enrichment”, “Conservative”)
Labels for the legend corresponding to the two bar segments.
- text_colorstr, default “#000205”
Color of the value labels on bars.
- groupby_sortOptional[list], default None
Custom order for clusters. If provided, clusters will be sorted according to this list. If None, clusters will be sorted by value in descending order.
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- bottomfloat, default 0
Bottom margin adjustment.
- rotationfloat, default 65
Rotation angle for x-axis tick labels in degrees.
- titlestr, optional
Title of the plot. Default is None.
- text_left_movefloat, default 0.15
Horizontal adjustment for text position on bars.
- y_limitTuple[float, float], default (0, 1)
The y-axis limits for the plot.
- outputpath, optional
File path to save the figure. Default is None.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to the two_bar function.
Returns
- None
The function displays and/or saves the plot but does not return any value.
- sciv.pl.rate_bar_plot(adata: AnnData, layer: str | None = None, trait_name: str = 'All', dir_name: str = 'feature', column: str = 'value', groupby: str = 'clusters', color: Tuple = ('#2e6fb7', '#f7f7f7'), legend: Tuple = ('Enrichment', 'Conservative'), x_name: str = 'Cell type', y_name: str = 'Enrichment ratio', groupby_sort: list | None = None, text_color: str = '#000205', width: float = 2, height: float = 2, bottom: float = 0, rotation: float = 65, title: str | None = None, text_left_move: float = 0.15, y_limit: Tuple[float, float] = (0, 1), plot_output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Generate a bar plot showing enrichment ratios for trait-cluster combinations.
This function calculates the completion ratio using the complete_ratio function and visualizes the results as a bar plot. It handles directory creation for output files and passes appropriate parameters to the bar_trait plotting function.
Parameters
- adataAnnData
Input AnnData object containing the data to be visualized.
- layerstr, optional
Specify the layer of the matrix to be processed. If None, uses the main matrix.
- trait_namestr, default “All”
The name of the trait being analyzed, used for filtering data.
- dir_namestr, default “feature”
Folder name for generating and saving bar plot outputs.
- columnstr, default “value”
The column name containing the binary enrichment values.
- groupbystr, default “clusters”
The column name in adata.obs that defines the cell clusters.
- colorTuple, default (“#2e6fb7”, “#f7f7f7”)
Color tuple for the bar plot (enrichment color, conservative color).
- legendTuple, default (“Enrichment”, “Conservative”)
Legend labels for the two categories in the plot.
- x_namestr, default “Cell type”
The label for the x-axis.
- y_namestr, default “Enrichment ratio”
The label for the y-axis.
- groupby_sortOptional[list], optional
Custom order for clusters. If None, uses default sorting.
- text_colorstr, default “#000205”
Color for text annotations in the plot.
- widthfloat, default 2
The width of the output figure in inches.
- heightfloat, default 2
The height of the output figure in inches.
- bottomfloat, default 0
Bottom margin adjustment for the plot.
- rotationfloat, default 65
Rotation angle for x-axis labels in degrees.
- titlestr, optional
The title of the plot. If None, no title is displayed.
- text_left_movefloat, default 0.15
Horizontal adjustment for text position.
- y_limitTuple[float, float], default (0, 1)
The y-axis limits for the plot.
- plot_outputpath, optional
Directory path for saving output files. If None, figures are not saved.
- showbool, default True
If True, display the figure interactively.
- closebool, default False
If True, close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to bar_trait function.
Returns
- None
This function does not return any value. Outputs are saved to files or displayed.
- sciv.pl.two_bar(ax_x: list | set | Tuple | ndarray, ax_y: Tuple, x_name: str | None = None, y_name: str | None = None, legend: Tuple = ('1', '2'), color: Tuple = ('#2e6fb7', '#f7f7f7'), text_color: str = '#000205', width: float = 2, height: float = 2, bottom: float = 0, rotation: float = 65, text_left_move: float = 0.15, y_limit: Tuple[float, float] = (0, 1), title: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any)
Create a stacked bar chart with two categories.
This function generates a stacked bar plot where two sets of values are displayed as stacked bars. It automatically adds numerical value labels on the first bar segment and includes a legend for the two categories.
Parameters
- ax_xcollection
Categories or labels for the x-axis.
- ax_yTuple
A tuple containing two collections of numerical values for the two bar segments. The second segment will be stacked on top of the first.
- x_namestr, optional
Label for the x-axis. Default is None.
- y_namestr, optional
Label for the y-axis. Default is None.
- legendTuple, default (“1”, “2”)
Labels for the legend corresponding to the two bar segments.
- colorTuple, default (“#2e6fb7”, “#f7f7f7”)
Colors for the two bar segments (first segment, second segment).
- text_colorstr, default “#000205”
Color of the value labels on bars.
- widthfloat, default 2
Width of the figure in inches.
- heightfloat, default 2
Height of the figure in inches.
- bottomfloat, default 0
Bottom margin adjustment.
- rotationfloat, default 65
Rotation angle for x-axis tick labels in degrees.
- text_left_movefloat, default 0.15
Horizontal adjustment for text position on bars.
- y_limitTuple[float, float], default (0, 1)
The y-axis limits for the plot.
- titlestr, optional
Title of the plot. Default is None.
- outputpath, optional
File path to save the figure. Default is None.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after saving.
- **kwargsAny
Additional keyword arguments passed to matplotlib’s bar function.
Returns
- None
The function displays and/or saves the plot but does not return any value.
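A short sketch of how the ax_y tuple for sciv.pl.two_bar can be assembled. Making the second segment the complement of the first, so that each stacked bar reaches 1 and fits the default y_limit of (0, 1), is an assumption for illustration; the labels and values are made up.

```python
cell_types = ["T cell", "B cell", "Monocyte"]

# First segment: hypothetical enriched fraction per cell type.
enriched = [0.62, 0.35, 0.10]
# Second segment: the remainder, stacked on top of the first.
conservative = [round(1 - value, 2) for value in enriched]

ax_y = (enriched, conservative)

try:
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("sciv is not installed; skipping the plot call")
else:
    sciv.pl.two_bar(
        cell_types,
        ax_y,
        x_name="Cell type",
        y_name="Enrichment ratio",
        legend=("Enrichment", "Conservative"),
    )
```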
Barcode
Barcode visualization function.
- sciv.pl.barcode_base(df: DataFrame, groupby_list: list, sort_column: str = 'value', column: str = 'clusters', width: float = 1, height: float = 3, trait_column_name: str = 'id', title: str | None = None, cmap: str = 'Oranges', bar_label: str = 'TRS', is_ticks: bool = True, colors: list | None = None, ground_true: list | None = None, output: str | Path | None = None, show: bool = True, close: bool = False) None
Create a barcode plot of sorted values grouped by cluster.
Parameters
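- dfDataFrame
Input DataFrame containing the values and cluster labels to plot.
- groupby_listlist
List of cluster labels.
- sort_columnstr, optional
Column name used for sorting values. Default is “value”.
- columnstr, optional
Column name containing cluster labels. Default is “clusters”.
- widthfloat, optional
Width of the figure. Default is 1.
- heightfloat, optional
Height of the figure. Default is 3.
- trait_column_namestr, optional
Column name containing trait identifiers. Default is “id”.
- titlestr, optional
Title of the plot. Default is None.
- cmapstr, optional
Colormap name. Default is “Oranges”.
- bar_labelstr, optional
Bar label. Default is “TRS”.
- is_ticksbool, optional
Whether to show ticks. Default is True.
- colorslist, optional
Custom list of colors. Default is None.
- ground_truelist, optional
Ground truth cluster labels. Default is None.
- outputpath, optional
File path to save the figure. Default is None.
- showbool, optional
Whether to display the plot. Default is True.
- closebool, optional
Whether to close the figure after display. Default is False.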
- sciv.pl.barcode_trait(trait_df: DataFrame, trait_name: str = 'All', trait_column_name: str = 'id', sort_column: str = 'value', groupby: str = 'clusters', cmap: str = 'viridis', width: float = 1, height: float = 3, is_ticks: bool = True, colors: list | None = None, ground_true: list | None = None, title: str | None = None, suffix: str = 'pdf', output: str | Path | None = None, show: bool = True, close: bool = False) None
Plot barcode plots for traits/diseases.
This function generates barcode visualizations for specified traits or all traits in the dataset. It creates individual plots for each trait showing the distribution of trait scores across different clusters.
Parameters
- trait_dfDataFrame
Input DataFrame containing trait scores and cluster information.
- trait_namestr, optional
Name of the trait/disease to plot. Use “All” to plot all traits. Default is “All”.
- trait_column_namestr, optional
Column name in the DataFrame that contains trait identifiers. Default is “id”.
- sort_columnstr, optional
Column name used for sorting values in the barcode plot. Default is “value”.
- groupbystr, optional
Column name in the DataFrame that contains cluster assignments. Default is “clusters”.
- cmapstr, optional
Colormap name for the value heatmap. Default is “viridis”.
- widthfloat, optional
Width of the figure in inches. Default is 1.
- heightfloat, optional
Height of the figure in inches. Default is 3.
- is_ticksbool, optional
Whether to display colorbar ticks. Default is True.
- colorslist, optional
Custom color list for cluster visualization. If None, uses default colors. Default is None.
- ground_truelist, optional
Ground truth cluster labels for ordering. Default is None.
- titlestr, optional
Base title for the plots. Trait name will be appended. Default is None.
- suffixstr, optional
File extension for output plots (e.g., “pdf”, “png”). Default is “pdf”.
- outputpath, optional
Directory path for saving output files. Default is None.
- showbool, optional
Whether to display the plots interactively. Default is True.
- closebool, optional
Whether to close the figure after display. Default is False.
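The following sketch builds a small trait_df for sciv.pl.barcode_trait. It assumes one row per cell with a trait identifier, a cluster label and a score, which is an interpretation of the parameter descriptions above rather than a confirmed input format; all names and numbers are invented.

```python
import random

rng = random.Random(0)  # fixed seed so the illustration data is reproducible

# Hypothetical per-cell trait scores: five cells per cluster around a cluster mean.
cluster_means = [("T cell", 0.8), ("B cell", 0.4), ("Monocyte", 0.1)]
records = [
    {"id": "trait_1", "clusters": cluster, "value": mean + rng.uniform(-0.05, 0.05)}
    for cluster, mean in cluster_means
    for _ in range(5)
]

try:
    import pandas as pd
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("pandas or sciv is not installed; skipping the plot call")
else:
    trait_df = pd.DataFrame(records)
    sciv.pl.barcode_trait(
        trait_df,
        trait_name="trait_1",
        trait_column_name="id",
        sort_column="value",
        groupby="clusters",
        cmap="Oranges",
    )
```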
Pie
Pie chart visualization function.
- sciv.pl.base_pie(values: list, labels: list, x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 2, height: float = 2, bottom: float = 0, pct_distance: float = 0.6, label_distance: float = 1.1, colors: list | None = None, autopct: str = '%1.2f%%', output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a basic pie chart with customizable parameters.
This function generates a simple pie chart using matplotlib, with support for custom colors, labels, and various display options.
Parameters
- valueslist
The values to be plotted in the pie chart.
- labelslist
The labels corresponding to each value in the pie chart.
- x_namestr, optional
The label for the x-axis. Default is None.
- y_namestr, optional
The label for the y-axis. Default is None.
- titlestr, optional
The title of the pie chart. Default is None.
- widthfloat, optional
The width of the figure in inches. Default is 2.
- heightfloat, optional
The height of the figure in inches. Default is 2.
- bottomfloat, optional
The bottom margin of the figure. Default is 0.
- pct_distancefloat, optional
The distance of the percentage labels from the center of the pie. Default is 0.6.
- label_distancefloat, optional
The distance of the labels from the center of the pie. Default is 1.1.
- colorslist, optional
A list of colors to use for the pie slices. If None, default colors will be used. Default is None.
- autopctstr, optional
The format string for the percentage labels. Default is ‘%1.2f%%’.
- outputpath, optional
The file path to save the figure. If None, the figure will not be saved. Default is None.
- showbool, optional
Whether to display the figure. Default is True.
- closebool, optional
Whether to close the figure after displaying. Default is False.
- **kwargsAny
Additional keyword arguments passed to matplotlib’s pie function.
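To make the autopct format string concrete, the sketch below renders the default ‘%1.2f%%’ by hand before calling sciv.pl.base_pie. Matplotlib applies a string autopct with the % operator to each slice's share in percent; the values and labels are made-up illustration data.

```python
values = [5, 4, 3]
labels = ["T cell", "B cell", "Monocyte"]
autopct = "%1.2f%%"  # the documented default format string

# Each slice's share of the total, formatted the way matplotlib formats it.
total = sum(values)
rendered = [autopct % (100 * value / total) for value in values]
print(rendered)  # ['41.67%', '33.33%', '25.00%']

try:
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("sciv is not installed; skipping the plot call")
else:
    sciv.pl.base_pie(values, labels, title="Cell types", autopct=autopct)
```

A format such as "%1.0f%%" would round the same shares to whole percentages.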
- sciv.pl.pie_label(df: DataFrame, map_groupby: str | list | set | Tuple | ndarray, value: str = 'value', groupby: str = 'clusters', x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 2, height: float = 2, bottom: float = 0, radius: float = 0.6, fontsize: float = 17, pct_distance: float = 0.6, label_distance: float = 1.1, colors: list | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create a donut-style pie chart showing cluster label distribution.
This function generates a pie chart with a central hole (donut chart) to visualize the distribution of predicted cluster labels against true labels. The chart displays the percentage of correctly predicted labels in the center.
Parameters
- dfDataFrame
The input data containing cluster and value information.
- map_groupbyUnion[str, collection]
The mapping of clusters, can be a column name or a collection of cluster labels.
- valuestr, optional
The column name for values in the DataFrame. Default is “value”.
- groupbystr, optional
The column name for cluster labels in the DataFrame. Default is “clusters”.
- x_namestr, optional
The label for the x-axis. Default is None.
- y_namestr, optional
The label for the y-axis. Default is None.
- titlestr, optional
The title of the pie chart. Default is None.
- widthfloat, optional
The width of the figure in inches. Default is 2.
- heightfloat, optional
The height of the figure in inches. Default is 2.
- bottomfloat, optional
The bottom margin of the figure. Default is 0.
- radiusfloat, optional
The radius of the inner white circle to create donut effect. Default is 0.6.
- fontsizefloat, optional
The font size for the percentage text in the center. Default is 17.
- pct_distancefloat, optional
The distance of the percentage labels from the center of the pie. Default is 0.6.
- label_distancefloat, optional
The distance of the labels from the center of the pie. Default is 1.1.
- colorslist, optional
A list of colors to use for the pie slices. If None, default colors will be used. Default is None.
- outputpath, optional
The file path to save the figure. If None, the figure will not be saved. Default is None.
- showbool, optional
Whether to display the figure. Default is True.
- closebool, optional
Whether to close the figure after displaying. Default is False.
- **kwargsAny
Additional keyword arguments passed to matplotlib’s pie function.
- sciv.pl.pie_trait(trait_df: DataFrame, trait_groupby_map: dict, trait_name: str = 'All', groupby: str = 'clusters', trait_column_name: str = 'id', value: str = 'value', x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 2, height: float = 2, radius: float = 0.6, fontsize: float = 17, pct_distance: float = 0.6, label_distance: float = 1.1, colors: list | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Create pie charts for trait/disease cluster distribution analysis.
This function generates donut-style pie charts to visualize the distribution of trait-specific scores across different cell clusters. It supports batch processing for multiple traits or single trait analysis.
Parameters
- trait_dfDataFrame
The input data containing trait information, cluster labels, and values.
- trait_groupby_mapdict
A dictionary mapping trait names to their corresponding cluster mappings. Keys are trait names, values are cluster label mappings.
- trait_namestr, optional
The specific trait to plot. Use “All” to plot all traits in the data. Default is “All”.
- groupbystr, optional
The column name for cluster labels in the DataFrame. Default is “clusters”.
- trait_column_namestr, optional
The column name for trait identifiers in the DataFrame. Default is “id”.
- valuestr, optional
The column name for values/scores in the DataFrame. Default is “value”.
- x_namestr, optional
The label for the x-axis. Default is None.
- y_namestr, optional
The label for the y-axis. Default is None.
- titlestr, optional
The base title for the pie charts. Trait name will be appended if provided. Default is None.
- widthfloat, optional
The width of the figure in inches. Default is 2.
- heightfloat, optional
The height of the figure in inches. Default is 2.
- radiusfloat, optional
The radius of the inner white circle to create donut effect. Default is 0.6.
- fontsizefloat, optional
The font size for the percentage text in the center. Default is 17.
- pct_distancefloat, optional
The distance of the percentage labels from the center of the pie. Default is 0.6.
- label_distancefloat, optional
The distance of the labels from the center of the pie. Default is 1.1.
- colorslist, optional
A list of colors to use for the pie slices. If None, default colors will be used. Default is None.
- outputpath, optional
The directory path to save the figures. If None, figures will not be saved. Default is None.
- showbool, optional
Whether to display the figure. Default is True.
- closebool, optional
Whether to close the figure after displaying. Default is False.
- **kwargsAny
Additional keyword arguments passed to the pie_label function.
Bubble
Bubble chart visualization function.
- sciv.pl.bubble(df: DataFrame, x: str, y: str, hue: str | None = None, size: str | None = None, x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 2, height: float = 2, bottom: float = 0, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any)
Create a bubble plot using seaborn’s relplot.
Parameters
- dfDataFrame
Input data structure.
- xstr
Column name for x-axis values.
- ystr
Column name for y-axis values.
- huestr, optional
Column name for color encoding.
- sizestr, optional
Column name for size encoding.
- x_namestr, optional
Custom label for x-axis.
- y_namestr, optional
Custom label for y-axis.
- titlestr, optional
Plot title.
- widthfloat, default 2
Figure width in inches.
- heightfloat, default 2
Figure height in inches.
- bottomfloat, default 0
Bottom margin adjustment.
- outputpath, optional
File path to save the figure.
- showbool, default True
Whether to display the plot.
- closebool, default False
Whether to close the figure after display.
- **kwargsAny
Additional arguments passed to seaborn.relplot.
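A usage sketch for sciv.pl.bubble, mapping one column to bubble color and another to bubble size. The table of traits and clusters, its column names and its numbers are hypothetical; only the keyword names come from the signature above.

```python
# Hypothetical (enrichment ratio, mean score) per (trait, cluster).
table = {
    ("trait_1", "T cell"): (0.62, 1.8),
    ("trait_1", "B cell"): (0.35, 0.9),
    ("trait_1", "Monocyte"): (0.10, 0.2),
    ("trait_2", "T cell"): (0.15, 0.4),
    ("trait_2", "B cell"): (0.48, 1.3),
    ("trait_2", "Monocyte"): (0.55, 1.6),
}

records = [
    {"id": trait, "clusters": cluster, "rate": rate, "score": score}
    for (trait, cluster), (rate, score) in table.items()
]

try:
    import pandas as pd
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("pandas or sciv is not installed; skipping the plot call")
else:
    df = pd.DataFrame(records)
    sciv.pl.bubble(
        df,
        x="clusters",
        y="id",
        hue="score",   # color encodes the mean score
        size="rate",   # bubble size encodes the enrichment ratio
        x_name="Cell type",
        y_name="Trait",
    )
```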
Radar
Radar visualization function.
- sciv.pl.base_radar(df: DataFrame, ax_x: str, ax_y: str, hue: str, x_name: str | None = None, y_name: str | None = None, title: str | None = None, width: float = 4, height: float = 4, bottom: float = 0, colors: list | set | Tuple | ndarray | None = None, line_width: float = 0.5, y_limit: Tuple = (0, 1), bbox_to_anchor: Tuple = (1.3, 1.1), is_fill: bool = True, fill_alpha: float = 0.2, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Plot a radar chart with multiple groups.
Parameters
- dfDataFrame
Input data containing the values to plot.
- ax_xstr
Column name for category labels (x-axis categories).
- ax_ystr
Column name for values to plot (y-axis values).
- huestr
Column name for grouping different lines.
- x_namestr, optional
Label for the x-axis.
- y_namestr, optional
Label for the y-axis.
- titlestr, optional
Title of the chart.
- widthfloat, optional
Width of the chart figure.
- heightfloat, optional
Height of the chart figure.
- bottomfloat, optional
Bottom margin adjustment.
- colorscollection, optional
Colors for each group line.
- line_widthfloat, optional
Width of the radar lines.
- y_limitTuple, optional
Y-axis limit range.
- bbox_to_anchorTuple, optional
Position for the legend box.
- is_fillbool, optional
Whether to fill the radar area.
- fill_alphafloat, optional
Transparency level for the filled area.
- outputpath, optional
Output path to save the figure.
- showbool, optional
Whether to display the figure.
- closebool, optional
Whether to close the figure after display.
- **kwargsAny
Additional keyword arguments for plotting.
Returns
- None
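A sketch of the long-format input that sciv.pl.base_radar takes: one row per (category, group) pair, with ax_x naming the category column, ax_y the value column and hue the grouping column. The metric names, method names and scores are invented for illustration.

```python
# Hypothetical scores for two methods across four metrics.
metrics = ["precision", "recall", "F1", "AUROC"]
values_by_method = {
    "method_A": [0.82, 0.75, 0.78, 0.90],
    "method_B": [0.70, 0.80, 0.75, 0.85],
}

# One record per (metric, method) pair.
records = [
    {"metric": metric, "score": score, "method": method}
    for method, scores in values_by_method.items()
    for metric, score in zip(metrics, scores)
]

try:
    import pandas as pd
    import sciv  # assumption: the sciv package is installed
except ImportError:
    print("pandas or sciv is not installed; skipping the plot call")
else:
    df = pd.DataFrame(records)
    sciv.pl.base_radar(
        df,
        ax_x="metric",
        ax_y="score",
        hue="method",      # one radar line per method
        y_limit=(0, 1),
        is_fill=True,
        fill_alpha=0.2,
    )
```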
- sciv.pl.radar(ax_x: list | set | Tuple | ndarray, ax_y: list | set | Tuple | ndarray, x_name: str | None = None, y_name: str | None = None, title: str | None = None, colors: list | set | Tuple | ndarray | None = None, width: float = 4, height: float = 4, bottom: float = 0, center_text: str | None = None, rotation: float = 25, value_top: float = 0.1, text_top: float = 1.2, is_fixed: bool = False, is_angle: bool = True, y_limit: Tuple = (-0.5, 1), y_axis_scale: Tuple = (0, 1), output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Plot a radar chart.
Parameters
- ax_xcollection
Category labels for the radar chart.
- ax_ycollection
Data values for each category.
- x_namestr, optional
Label for the x-axis.
- y_namestr, optional
Label for the y-axis.
- titlestr, optional
Title of the chart.
- colorscollection, optional
Colors for the radar chart.
- widthfloat, optional
Width of the chart.
- heightfloat, optional
Height of the chart.
- bottomfloat, optional
Bottom margin adjustment.
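- center_textstr, optional
Text displayed at the center of the chart. Default is None.
- rotationfloat, optional
Rotation angle in degrees. Default is 25.
- value_topfloat, optional
Vertical offset for value labels above bars. Default is 0.1.
- text_topfloat, optional
Radial position for category labels. Default is 1.2.
- is_fixedbool, optional
If True, value labels are placed at a fixed position. Default is False.
- is_anglebool, optional
If True, rotate labels to align with radar angles. Default is True.
- y_limitTuple, optional
Y-axis limits as (min, max). Default is (-0.5, 1).
- y_axis_scaleTuple, optional
Scale range for y-axis ticks as (min, max). Default is (0, 1).
- outputpath, optional
File path to save the figure. Default is None.
- showbool, optional
Whether to display the plot. Default is True.
- closebool, optional
Whether to close the figure after display. Default is False.
- **kwargsAny
Additional keyword arguments for plotting.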
Returns
- None
- sciv.pl.radar_trait(trait_df: DataFrame, trait_name: str = 'All', trait_column_name: str = 'id', value: str = 'rate', clusters: str = 'clusters', color: list | set | Tuple | ndarray | str | None = None, clusters_sort: list | None = None, width: float = 4, height: float = 4, rotation: float = 65, title: str | None = None, value_top: float = 0.1, text_top: float = 1.2, is_fixed: bool = False, is_angle: bool = True, y_limit: Tuple = (-0.5, 1), y_axis_scale: Tuple = (0, 1), output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any)
Plot radar charts for trait enrichment analysis.
This function creates radar charts to visualize trait/disease enrichment scores across different clusters. It can plot either a single trait or all traits in the dataset.
Parameters
- trait_dfDataFrame
Input dataframe containing trait enrichment data.
- trait_namestr, optional
Name of the trait to plot. Use “All” to plot all traits. Default is “All”.
- trait_column_namestr, optional
Column name in trait_df that contains trait identifiers. Default is “id”.
- valuestr, optional
Column name containing the enrichment values to plot. Default is “rate”.
- clustersstr, optional
Column name containing cluster identifiers. Default is “clusters”.
- colorUnion[collection, str], optional
Colors for the radar chart bars. Can be a column name (str) or a collection of colors.
- clusters_sortOptional[list], optional
Custom order for clusters. If None, clusters are sorted by value in descending order.
- widthfloat, optional
Width of the figure in inches. Default is 4.
- heightfloat, optional
Height of the figure in inches. Default is 4.
- rotationfloat, optional
Rotation angle for text labels in degrees. Default is 65.
- titlestr, optional
Base title for the plot. Trait name will be appended if provided.
- value_topfloat, optional
Vertical offset for value labels above bars. Default is 0.1.
- text_topfloat, optional
Radial position for category labels. Default is 1.2.
- is_fixedbool, optional
If True, value labels are placed at a fixed position. Default is False.
- is_anglebool, optional
If True, rotate labels to align with radar angles. Default is True.
- y_limitTuple, optional
Y-axis limits as (min, max). Default is (-0.5, 1).
- y_axis_scaleTuple, optional
Scale range for y-axis ticks as (min, max). Default is (0, 1).
- outputpath, optional
Directory path to save output PDF files. If None, files are not saved.
- showbool, optional
Whether to display the plot. Default is True.
- closebool, optional
Whether to close the figure after display. Default is False.
- kwargsAny, optional
Additional keyword arguments passed to the radar function.
Returns
- None
The function saves plots to files and/or displays them based on parameters.
- sciv.pl.rate_circular_bar_plot(adata: AnnData, layer: str | None = None, trait_name: str = 'All', dir_name: str = 'feature', column: str = 'value', groupby: str = 'clusters', color: list | set | Tuple | ndarray | str | None = None, groupby_sort: list | None = None, width: float = 2, height: float = 2, rotation: float = 25, title: str | None = None, value_top: float = 0.1, text_top: float = 1.2, is_fixed: bool = False, is_angle: bool = True, y_limit: Tuple = (-0.5, 1), y_axis_scale: Tuple = (0, 1), plot_output: str | Path | None = None, show: bool = True, close: bool = False) None
Generate a circular bar plot (radar chart) showing enrichment ratios for trait-cluster combinations.
This function calculates the completion ratio using the complete_ratio function and visualizes the results as a circular bar plot (radar chart). It handles directory creation for output files and passes appropriate parameters to the radar_trait plotting function.
Parameters
- adataAnnData
Input AnnData object containing the data to be visualized.
- layerstr, optional
Specify the layer of the matrix to be processed. If None, uses the main matrix.
- trait_namestr, default “All”
The name of the trait being analyzed, used for filtering data.
- dir_namestr, default “feature”
Folder name for generating and saving circular bar plot outputs.
- columnstr, default “value”
The column name containing the binary enrichment values.
- groupbystr, default “clusters”
The column name in adata.obs that defines the cell clusters.
- colorUnion[collection, str], optional
Color specification for the plot. Can be a color collection or a column name to use for coloring bars based on data values.
- groupby_sortOptional[list], optional
Custom order for clusters. If None, uses default sorting.
- widthfloat, default 2
The width of the output figure in inches.
- heightfloat, default 2
The height of the output figure in inches.
- rotationfloat, default 25
Rotation angle for the circular plot in degrees.
- titlestr, optional
The title of the plot. If None, no title is displayed.
- value_topfloat, default 0.1
Vertical offset for value labels in the plot.
- text_topfloat, default 1.2
Vertical offset for text labels in the plot.
- is_fixedbool, default False
If True, use fixed scaling for the plot.
- is_anglebool, default True
If True, use angular positioning for bars.
- y_limitTuple, default (-0.5, 1)
The y-axis limits for the plot.
- y_axis_scaleTuple, default (0, 1)
The scale range for the y-axis values.
- plot_outputpath, optional
Directory path for saving output files. If None, figures are not saved.
- showbool, default True
If True, display the figure interactively.
- closebool, default False
If True, close the figure after saving.
Returns
- None
This function does not return any value. Outputs are saved to files or displayed.
Venn
Venn diagram visualization functions.
- sciv.pl.three_venn(set1: list | set | Tuple | ndarray, set2: list | set | Tuple | ndarray, set3: list | set | Tuple | ndarray, name1: str = 'Set1', name2: str = 'Set2', name3: str = 'Set3', width: float = 2, height: float = 2, bottom: float = 0, colors: list | None = None, x_name: str | None = None, y_name: str | None = None, title: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Plot a three-set Venn diagram.
Parameters
- set1collection
First set of elements.
- set2collection
Second set of elements.
- set3collection
Third set of elements.
- name1str, optional
Name of the first set.
- name2str, optional
Name of the second set.
- name3str, optional
Name of the third set.
- widthfloat, optional
Width of the diagram.
- heightfloat, optional
Height of the diagram.
- bottomfloat, optional
Bottom of the diagram.
- colorslist, optional
Colors for the sets.
- x_namestr, optional
X name.
- y_namestr, optional
Y name.
- titlestr, optional
Title of the diagram.
- outputpath, optional
Output path.
- showbool, optional
Whether to show.
- closebool, optional
Whether to close.
- kwargsAny, optional
Keyword arguments.
Returns
None
- sciv.pl.two_venn(set1: list | set | Tuple | ndarray, set2: list | set | Tuple | ndarray, name1: str = 'Set1', name2: str = 'Set2', width: float = 2, height: float = 2, bottom: float = 0, colors: list | None = None, x_name: str | None = None, y_name: str | None = None, title: str | None = None, output: str | Path | None = None, show: bool = True, close: bool = False, **kwargs: Any) None
Plot a two-set Venn diagram.
Parameters
- set1collection
First set of elements.
- set2collection
Second set of elements.
- name1str, optional
Name of the first set.
- name2str, optional
Name of the second set.
- widthfloat, optional
Width of the diagram.
- heightfloat, optional
Height of the diagram.
- bottomfloat, optional
Bottom of the diagram.
- colorslist, optional
Colors for the sets.
- x_namestr, optional
X name.
- y_namestr, optional
Y name.
- titlestr, optional
Title of the diagram.
- outputpath, optional
Output path.
- showbool, optional
Whether to show.
- closebool, optional
Whether to close.
- kwargsAny, optional
Keyword arguments.
Returns
None
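The region sizes that a two-set Venn diagram displays can be checked with plain Python sets before plotting. The following is a minimal sketch that does not call sciv; the helper name venn2_sizes is hypothetical.

```python
def venn2_sizes(set1, set2):
    """Return the sizes of the (only set1, only set2, shared) regions."""
    a, b = set(set1), set(set2)
    return len(a - b), len(b - a), len(a & b)


sizes = venn2_sizes(["g1", "g2", "g3"], ["g2", "g3", "g4", "g5"])
print(sizes)
```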
Preprocessing (.pp)
Data preprocessing interface, used for single-cell data cleaning, differential analysis, and enrichment analysis.
- sciv.pp.adata_group(adata: AnnData, groupby: str, extra_column: str | None = None, axis: Literal[0, 1] = 1, layer: str | None = None, method: list | set | Tuple | ndarray | str = ('mean', 'sum', 'median')) AnnData
Return reshaped AnnData organized by given column values.
Parameters
- adataAnnData
input data;
- groupbystr
grouping column;
- extra_columnOptional[str], optional
Extra column to retain alongside the grouping column;
- axisLiteral[0, 1], optional
Which dimension is used for grouping. {1: adata.obs, 0: adata.var};
- layerstr, optional
Specify the matrix to be processed;
- methodcollection | str, optional
Aggregation method(s) applied within each group. Five methods are supported, {“mean”, “sum”, “median”, “max”, “min”}, and they can be given singly or in combination.
Returns
- AnnData
Grouped AnnData.
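The grouping idea can be illustrated on plain lists: rows that share a label are reduced column by column with the chosen method. This is a sketch of the concept only, not the sciv implementation; the helper name group_rows is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def group_rows(rows, labels, method="mean"):
    """Aggregate rows that share a label, column by column."""
    reducers = {"mean": mean, "sum": sum, "median": median, "max": max, "min": min}
    reduce_fn = reducers[method]
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    # zip(*members) transposes the member rows into columns
    return {
        label: [reduce_fn(col) for col in zip(*members)]
        for label, members in grouped.items()
    }


rows = [[1, 0], [3, 2], [5, 4]]
labels = ["A", "A", "B"]
print(group_rows(rows, labels, "mean"))
print(group_rows(rows, labels, "sum"))
```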
- sciv.pp.adata_map_df(adata: AnnData, column: str = 'value', layer: str | None = None) DataFrame
Convert AnnData to long format, with one (row, column, value) record per matrix entry.
Parameters
- adataAnnData
The AnnData to convert;
- columnstr
Specify the column name of the value;
- layerstr, optional
Specify the matrix to be processed;
Returns
- DataFrame
DataFrame in (row, column, value) form.
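A long-format conversion of this kind can be sketched without any dependencies. The record keys "row" and "col" below are assumptions chosen for illustration; the column names that sciv actually emits are not stated here, and the helper name to_long_format is hypothetical.

```python
def to_long_format(matrix, row_names, col_names, column="value"):
    """Flatten a 2-D matrix into one record per (row, column) pair."""
    return [
        {"row": r, "col": c, column: matrix[i][j]}
        for i, r in enumerate(row_names)
        for j, c in enumerate(col_names)
    ]


records = to_long_format([[1, 2], [3, 4]], ["cell1", "cell2"], ["peakA", "peakB"])
for record in records:
    print(record)
```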
- sciv.pp.filter_data(adata: AnnData, min_cells: int = 1, min_peaks: int = 1, min_peaks_counts: int = 1, min_cells_counts: int = 1, cell_rate: float | None = None, peak_rate: float | None = None, is_copy: bool = False, is_min_cell: bool = True, is_min_peak: bool = False) AnnData
Filter scATAC-seq data.
Parameters
- adataAnnData
scATAC-seq data
- min_peaks_countsint, optional
Minimum number of counts required for a peak to pass filtering
- min_cellsint, optional
Minimum number of cells in which a peak must be detected to pass filtering
- min_cells_countsint, optional
Minimum number of counts required for a cell to pass filtering
- min_peaksint, optional
Minimum number of peaks that must be detected in a cell for it to pass filtering
- cell_rateOptional[float], optional
Threshold expressed as a fraction of the total cell count. Only takes effect when the min_cells parameter is None
- peak_rateOptional[float], optional
Threshold expressed as a fraction of the total peak count. Only takes effect when the min_peaks parameter is None
- is_copybool, optional
Whether to deep-copy the data.
- is_min_cellbool, optional
Whether to filter cells
- is_min_peakbool, optional
Whether to filter peaks
Returns
- AnnData
scATAC-seq data
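The min_cells and min_peaks thresholds can be pictured on a small cell-by-peak count matrix. The sketch below filters peaks first and cells second; that order is an assumption, and the helper name filter_counts is hypothetical rather than part of sciv.

```python
def filter_counts(matrix, min_cells=1, min_peaks=1):
    """Drop peaks seen in too few cells, then cells with too few peaks."""
    n_cells = len(matrix)
    n_peaks = len(matrix[0]) if matrix else 0
    # A peak is kept if it is non-zero in at least min_cells cells
    keep_peaks = [
        j for j in range(n_peaks)
        if sum(1 for i in range(n_cells) if matrix[i][j] > 0) >= min_cells
    ]
    reduced = [[row[j] for j in keep_peaks] for row in matrix]
    # A cell is kept if at least min_peaks of the remaining peaks are non-zero
    keep_cells = [
        i for i, row in enumerate(reduced)
        if sum(1 for v in row if v > 0) >= min_peaks
    ]
    return [reduced[i] for i in keep_cells], keep_cells, keep_peaks


counts = [[1, 0, 2], [0, 0, 1], [0, 0, 0]]
filtered, kept_cells, kept_peaks = filter_counts(counts, min_cells=1, min_peaks=1)
print(filtered, kept_cells, kept_peaks)
```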
- sciv.pp.get_difference_genes(adata: AnnData, groupby: str, method: Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'] | None = 'wilcoxon', cell_anno: DataFrame | None = None, diff_genes_file: str | None = None) AnnData
Get differentially expressed/active genes.
Parameters
- adataAnnData
scATAC-seq data
- groupbystr
Name of the column used to group cells.
- method_Method, optional
Method to use for differentially expressed gene analysis.
- cell_annoOptional[DataFrame], optional
Cell annotation DataFrame.
- diff_genes_fileOptional[str], optional
Output file name.
Returns
- AnnData
scATAC-seq data
- sciv.pp.get_difference_peaks(adata: AnnData, genome_anno, groupby: str, cell_anno: DataFrame | None = None, min_log_fc: float = 0.25, min_pct: float = 0.05, peak_matrix_save_file: str | Path | None = None, diff_peaks_save_file: str | Path | None = None) AnnData
Get differential peaks.
Parameters
- adataAnnData
scATAC-seq data.
- genome_annoDataFrame
Genome annotation.
- groupbystr
Cluster name.
- cell_annoOptional[DataFrame], optional
Cell annotation.
- min_log_fcfloat, optional
Minimum log2 fold change.
- min_pctfloat, optional
Minimum percentage.
- peak_matrix_save_fileOptional[path], optional
Peak matrix save file.
- diff_peaks_save_fileOptional[path], optional
Difference peaks save file.
Returns
- AnnData
Differential peak data.
- sciv.pp.get_gene_enrichment(adata: AnnData, top_number: int = 50, threshold: float = 0.01, layer: str | None = None, is_order_or_lt: bool = True, is_top: bool = True, gene_sets: list[str] | set = ('GO_Biological_Process_2023', 'GO_Cellular_Component_2023', 'GO_Molecular_Function_2023', 'GWAS_Catalog_2023', 'KEGG_2016'), organism: Literal['Human', 'Mouse', 'Yeast', 'Fly', 'Fish', 'Worm'] | None = 'human', output_dir: str | None = None) DataFrame
Get gene enrichment analysis.
Parameters
- adataAnnData
Input data;
- top_numberint, optional
Top number of genes to use.
- thresholdfloat, optional
Threshold to use.
- layerOptional[str], optional
Specify the matrix to be processed;
- is_order_or_ltbool, optional
Whether to order or filter by threshold.
- is_topbool, optional
Whether to get top top_number genes.
- gene_setsUnion[list[str], set], optional
Gene sets to use.
- organism_Datasets, optional
Organism to use.
- output_dirOptional[str], optional
Output directory.
Returns
- DataFrame
GSEA enrichr results DataFrame.
- sciv.pp.get_gene_expression(adata: AnnData, genome_anno, min_cells: int = 5, gene_save_file: str | Path | None = None) AnnData
Get gene expression matrix.
Parameters
- adataAnnData
scATAC-seq data
- genome_annoDataFrame
Genome annotation.
- min_cellsint, optional
Minimum cells.
- gene_save_fileOptional[path], optional
Gene save file path.
Returns
- AnnData
Gene expression matrix.
- sciv.pp.get_peak_matrix(adata: AnnData, genome_anno, groupby: str, cell_anno: DataFrame | None = None, peak_matrix_save_file: str | Path | None = None) AnnData
Generate peak matrix from scATAC-seq data.
This function processes scATAC-seq fragment files to generate a cell-by-peak matrix through peak calling using MACS3. It performs quality control, tile matrix generation, feature selection, and peak calling at the specified cluster level.
Parameters
- adataAnnData
scATAC-seq data
- genome_annoDataFrame
Genome annotation information.
- groupbystr
Column name in cell annotation indicating cluster labels for peak calling.
- cell_annoOptional[DataFrame], optional
Cell annotation DataFrame containing cluster information.
- peak_matrix_save_fileOptional[path], optional
Path to save the output peak matrix h5ad file.
Returns
- AnnData
Cell-by-peak matrix.
- sciv.pp.get_sc_atac(fragment_file: str | Path, genome_anno, h5ad_file: str | Path | None = None, min_num_fragments: int = 200, sorted_by_barcode: bool = False, bin_size: int = 500, min_tsse: float = 5.0, counting_strategy: Literal['fragment', 'insertion', 'paired-insertion'] = 'paired-insertion', need_features: int | float | None = None, is_filter_doublets: bool = True) AnnData
Get scATAC-seq data from fragment file or h5ad file.
This function processes scATAC-seq data by importing fragment files, performing quality control, adding tile matrices, selecting features, and filtering doublets. It can also read pre-processed h5ad files.
Parameters
- fragment_filepath
Path to the fragment file or h5ad file.
- genome_annoDataFrame
Genome annotation.
- h5ad_fileOptional[path], optional
Path to save the h5ad file. If None, a temporary cache file will be used.
- min_num_fragmentsint, optional
Minimum number of fragments required for a cell to pass filtering.
- sorted_by_barcodebool, optional
Whether the input fragment file is sorted by barcode.
- bin_sizeint, optional
Size of consecutive genomic regions used to record the counts.
- min_tssefloat, optional
Minimum TSS enrichment score required for a cell to pass filtering.
- counting_strategyLiteral[‘fragment’, ‘insertion’, ‘paired-insertion’], optional
Strategy to count fragments in bins.
- need_featuresOptional[Union[int | float]], optional
Number or proportion of features to select.
- is_filter_doubletsbool, optional
Whether to filter doublets.
Returns
- AnnData
Processed scATAC-seq data.
- sciv.pp.get_tf_data(adata: AnnData, genome_anno, groupby: str, cell_anno: DataFrame | None = None, p_value: float = 0.01, peak_matrix_save_file: str | Path | None = None, tf_save_file: str | Path | None = None) AnnData
Get TF data.
Parameters
- adataAnnData
scATAC-seq data
- genome_annoDataFrame
Genome annotation.
- groupbystr
Cluster name.
- cell_annoOptional[DataFrame], optional
Cell annotation.
- p_valuefloat, optional
P-value threshold.
- peak_matrix_save_fileOptional[path], optional
Peak matrix save file.
- tf_save_fileOptional[path], optional
TF save file.
Returns
- AnnData
TF data.
- sciv.pp.gsea_enrichr(gene_list: list[str], gene_sets: list[str] | set = ('GO_Biological_Process_2023', 'GO_Cellular_Component_2023', 'GO_Molecular_Function_2023', 'GWAS_Catalog_2023', 'KEGG_2016'), organism: Literal['Human', 'Mouse', 'Yeast', 'Fly', 'Fish', 'Worm'] | None = 'human', is_verbose: bool = True, output_dir: str | None = None) DataFrame
GSEA enrichr analysis.
Parameters
- gene_listlist[str]
Gene list.
- gene_setsUnion[list[str], set], optional
Gene sets to use.
- organism_Datasets, optional
Organism to use.
- is_verbosebool, optional
Whether to print verbose messages.
- output_dirOptional[str], optional
Output directory.
Returns
- DataFrame
GSEA enrichr results DataFrame.
- sciv.pp.merge_sc_atac(files: dict, genome_anno, merge_key: str = 'merge_sc_atac', min_num_fragments: int = 200, sorted_by_barcode: bool = False, bin_size: int = 500, min_tsse: float = 5.0, counting_strategy: Literal['fragment', 'insertion', 'paired-insertion'] = 'paired-insertion', max_iter_harmony: int = 20, harmony_groupby: str | list[str] | None = None, is_selected: bool = False, is_batch: bool = True, need_features: int | float | None = None, output_path: str | Path | None = None) AnnData
Integrate multiple scATAC-seq data through snapATAC2.
This function integrates multiple scATAC-seq datasets using snapATAC2. Reference: https://kzhang.org/SnapATAC2/tutorials/integration.html
Note: Please do not move the generated files during this processing.
Parameters
- filesdict
Dictionary mapping sample names to file paths of scATAC-seq data. Format: {file_key: file_path, …}
- genome_annoDataFrame
Genome annotation. Commonly snap.genome.hg38 or snap.genome.hg19.
- merge_keystr, optional
Key used to form the final H5AD file name. Default is “merge_sc_atac”.
- min_num_fragmentsint, optional
Minimum number of unique fragments required for a cell to pass filtering. Default is 200.
- sorted_by_barcodebool, optional
Whether the input fragment file is sorted by barcode. Default is False.
- bin_sizeint, optional
Size of consecutive genomic regions used to record counts. Default is 500.
- min_tssefloat, optional
Minimum TSS enrichment score required for a cell to pass filtering. Default is 5.0.
- counting_strategyLiteral[‘fragment’, ‘insertion’, ‘paired-insertion’], optional
Strategy to count fragments in bins. Default is ‘paired-insertion’.
- max_iter_harmonyint, optional
Maximum number of iterations for the harmony algorithm. Default is 20.
- harmony_groupbyOptional[Union[str, list[str]]], optional
If specified, split data into groups and perform batch correction on each group separately.
- is_selectedbool, optional
If True, perform additional filtering based on feature selection from each sample using the snap.pp.select_features method.
- is_batchbool, optional
If True, perform batch correction by sample. Default is True.
- need_featuresOptional[Union[int, float]], optional
Number or proportion of features to select. If <= 1, interpreted as a proportion of total features. If > 1, interpreted as absolute number.
- output_pathOptional[path], optional
Directory path for output files. If None, temporary files are used.
Returns
- AnnData
Integrated scATAC-seq data.
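The need_features rule described above (values up to 1 are a proportion, larger values an absolute count) can be sketched as follows. The helper name resolve_feature_count is hypothetical, and truncating the proportion with int() is an assumption about rounding.

```python
def resolve_feature_count(need_features, total_features):
    """Turn a proportion (<= 1) or an absolute number (> 1) into a feature count."""
    if need_features <= 1:
        # Interpreted as a proportion of all features (rounding is an assumption)
        return int(total_features * need_features)
    return int(need_features)


print(resolve_feature_count(0.5, 1000))
print(resolve_feature_count(2000, 10000))
```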
- sciv.pp.paga_trajectory(adata: AnnData, layer: str | None = None, latent: str = 'X_pca', groups: str = 'louvain', position: list | set | Tuple | ndarray | None = None, lsi_components: int = 50, root_cluster: str | None = None, n_neighbors: int = 15, resolution: float = 1.0, is_denoise: bool = True) None
Compute a PAGA trajectory.
Parameters
- adataAnnData
scATAC-seq data
- layerOptional[str], optional
Specify the matrix to be processed;
- latentstr, optional
Latent space to use.
- groupsstr, optional
Group name to use.
- positionOptional[collection], optional
Position to use.
- lsi_componentsint, optional
Number of components to use.
- root_clusterOptional[str], optional
Root cluster to use.
- n_neighborsint, optional
Number of neighbors to use.
- resolutionfloat, optional
Resolution to use.
- is_denoisebool, optional
Whether to denoise.
Returns
None
- sciv.pp.poisson_vi(adata: AnnData, max_epochs: int = 500, lr: float = 0.0001, batch_size: int = 128, eps: float = 1e-08, early_stopping: bool = True, early_stopping_patience: int = 50, strategy: str = 'ddp_notebook_find_unused_parameters_true', batch_key: str | None = None, resolution: float = 0.5, dp_delta: float = 0.05, latent_name: str = 'latent', model_dir: str | Path | None = None) AnnData
Process the data with PoissonVI to obtain the latent sample representation and, after Leiden clustering, the differential peak data for the clusters.
Parameters
- adataAnnData
Input data to be processed.
- max_epochsint, default 500
The maximum number of epochs for PoissonVI training.
- lrfloat, default 1e-4
Learning rate for optimization.
- batch_sizeint, default 128
Minibatch size to use during training.
- epsfloat, default 1e-08
Optimizer epsilon.
- early_stoppingbool, default True
Whether to perform early stopping with respect to the validation set.
- early_stopping_patienceint, default 50
How many epochs to wait for improvement before early stopping.
- strategystr, default “ddp_notebook_find_unused_parameters_true”
DDP strategy.
- batch_keystr, optional
Batch information in scATAC-seq data.
- resolutionfloat, default 0.5
Resolution of the Leiden clustering.
- dp_deltafloat, default 0.05
Empirical effect size threshold for PeakVI method in differential analysis.
- latent_namestr, default “latent”
The name of latent representation.
- model_dirstr, optional
The folder name for saving the trained model.
Returns
- AnnData
Differential peak data for each cluster.
Tool (.tl)
Tool function interface, including core computing functions such as algorithms, matrix operations, and random walks.
Algorithm
Algorithm related functions.
- sciv.tl.add_bernoulli_fluctuation_noise(counts_matrix: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, noise_level: float = 0.1) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Add Bernoulli fluctuation noise to the counts matrix (add 1 with probability noise_level)
Parameters
- counts_matrixmatrix_data
Input counts matrix
- noise_levelfloat, default 0.1
Noise level, i.e., the probability of randomly adding 1 (range: 0.0 - 1.0)
Returns
- matrix_data
Matrix after adding noise
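The noise model described above, adding 1 to each entry with probability noise_level, can be sketched on a dense list of lists. The helper name add_bernoulli_noise and its seed argument are hypothetical and only serve the illustration.

```python
import random


def add_bernoulli_noise(counts, noise_level=0.1, seed=None):
    """Add 1 to each entry independently with probability noise_level."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0.0 and 1.0")
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so noise_level=1.0 always adds 1
    return [
        [value + (1 if rng.random() < noise_level else 0) for value in row]
        for row in counts
    ]


counts = [[0, 2], [1, 0]]
print(add_bernoulli_noise(counts, noise_level=0.3, seed=0))
```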
- sciv.tl.add_noise_perturb(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, rate: float) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Add peak percentage noise to each cell
Parameters
- datamatrix_data
Input counts matrix
- ratefloat
Noise level, i.e., the probability of randomly adding 1 (range: 0.0 - 1.0)
Returns
- matrix_data
Matrix after adding noise
- sciv.tl.ami(labels_pred: list | set | Tuple | ndarray, labels_true: list | set | Tuple | ndarray) float
Adjusted Mutual Information (AMI) score, nominally in the range (0, 1).
Parameters
- labels_predcollection
Predicted cluster labels;
- labels_truecollection
Ground-truth cluster labels.
Returns
- float
AMI score.
- sciv.tl.ari(labels_pred: list | set | Tuple | ndarray, labels_true: list | set | Tuple | ndarray) float
Adjusted Rand Index (ARI) score, in the range (-1, 1).
Parameters
- labels_predcollection
Predicted cluster labels;
- labels_truecollection
Ground-truth cluster labels.
Returns
- float
ARI score.
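For reference, the ARI can be computed from the contingency table of the two labelings using the standard pair-counting formula. The sketch below is independent of sciv (whose internal implementation is not shown here); the helper name adjusted_rand_index is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_pred, labels_true):
    """ARI from pair counts in the contingency table of two labelings."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no sample pairs to compare
    joint = Counter(zip(labels_pred, labels_true))
    # Pairs of samples that are grouped together in both labelings
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    expected = sum_pred * sum_true / comb(n, 2)
    maximum = 0.5 * (sum_pred + sum_true)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (sum_joint - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The first call compares identical partitions with swapped label names; the second compares partitions that agree less often than chance, which is why the score can be negative.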
- sciv.tl.binary_indicator(labels_true: list | set | Tuple | ndarray, labels_pred: list | set | Tuple | ndarray) Tuple[float, float, float, float, float, float, float]
Accuracy, Recall, F1, FPR, TPR, AUROC, AUPRC.
Parameters
- labels_truecollection
Ground-truth labels;
- labels_predcollection
Predicted labels.
Returns
- tuple
Binary Indicators.
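The threshold-based indicators in this tuple follow directly from the confusion counts. A minimal sketch for 0/1 labels is shown below; it covers accuracy, recall (equal to TPR), F1 and FPR only, returns 0.0 for undefined ratios as an assumption, and uses the hypothetical helper name confusion_metrics.

```python
def confusion_metrics(labels_true, labels_pred):
    """Return (accuracy, recall, f1, fpr) for binary 0/1 labels."""
    pairs = list(zip(labels_true, labels_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the TPR
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, recall, f1, fpr


print(confusion_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```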
- sciv.tl.calculate_fragment_weighted_accessibility(input_data: dict, block_size: int = -1) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Calculate the initial trait- or disease-related cell score.
Parameters
- input_datadict
data: Fragments matrix, obtained by converting the counts matrix with scvi.data.reads_to_fragments;
overlap_data: Peaks-traits/diseases data.
- block_sizeint
Block size used for block-wise matrix multiplication. If the value is less than or equal to zero, the multiplication is not split into blocks.
Returns
- matrix_data
Initial TRS.
- sciv.tl.calculate_init_score_weight(adata: AnnData, da_peaks_adata: AnnData, overlap_adata: AnnData, layer: str | None = 'fragments', diff_peak_value: Literal['emp_effect', 'bayes_factor', 'emp_prob1', 'all'] = 'emp_effect', is_simple: bool = True, block_size: int = -1) AnnData
Calculate the initial trait- or disease-related cell score with weight.
Parameters
- adataAnnData
scATAC-seq data;
- da_peaks_adataAnnData
Differential peak data;
- overlap_adataAnnData
Peaks-traits/diseases data;
- layerstr
Optional. The layer value of scATAC-seq data;
- diff_peak_valuedifference_peak_optional
Value from the cluster-level differential peak analysis used for peak correction. One of {‘emp_effect’, ‘bayes_factor’, ‘emp_prob1’, ‘all’}.
- is_simplebool
If True, only the final result is added and unnecessary intermediate variables are omitted. Note that the is_ablation parameter has no effect when this is True; it takes effect only when this is False;
- block_sizeint
Block size used for block-wise matrix multiplication. If the value is less than or equal to zero, the multiplication is not split into blocks.
Returns
- AnnData
Initial TRS with weight.
- sciv.tl.calinski_harabasz(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, labels: list | set | Tuple | ndarray) float
The Calinski-Harabasz index evaluates the quality of a clustering by measuring the compactness within clusters and the separation between clusters. Larger values indicate better clustering.
Parameters
- datamatrix_data
Input data.
- labelscollection
Predicted labels for each sample.
Returns
- float
Calinski-Harabasz index.
- sciv.tl.coefficient_of_variation(matrix: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1, -1] = 0, default: float = 0) float | list | set | Tuple | ndarray
Calculate the coefficient of variation.
Parameters
- matrixmatrix_data
Input matrix data.
- axisLiteral[0, 1, -1], optional
Axis to calculate the coefficient of variation. Default is 0.
- defaultfloat, optional
Default value for division by zero. Default is 0.
Returns
- Union[float, collection]
Coefficient of variation.
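The coefficient of variation is the standard deviation divided by the mean, with the default value returned when the mean is zero. The sketch below handles a single list of values and uses the population standard deviation; whether sciv uses the population or sample form is not stated here, so that choice is an assumption, as is the helper name cv.

```python
from statistics import mean, pstdev


def cv(values, default=0.0):
    """Standard deviation divided by the mean; default when the mean is zero."""
    mu = mean(values)
    if mu == 0:
        return default
    return pstdev(values) / mu


print(cv([1, 3]))
print(cv([0, 0], default=0.0))
```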
- sciv.tl.davies_bouldin(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, labels: list | set | Tuple | ndarray) float
Davies-Bouldin index (DBI).
Parameters
- datamatrix_data
A list of n_features-dimensional data points. Each row corresponds to a single data point;
- labelscollection
Predicted labels for each sample.
Returns
- float
Davies-Bouldin index.
- sciv.tl.euclidean_distances(data1: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, data2: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | None = None, block_size: int = -1) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Calculate the Euclidean distance between two matrices.
Parameters
- data1matrix_data
First data;
- data2matrix_data
Second data. If None, the first data is used.
- block_sizeint
Block size used for block-wise matrix multiplication. If the value is less than or equal to zero, the computation is not split into blocks.
Returns
- matrix_data
Data of Euclidean distance.
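The role of block_size can be illustrated with a pure-Python pairwise distance routine that processes the first matrix in row blocks; the result is the same with or without blocking, only peak memory differs in a real implementation. This is a sketch, not the sciv code, and the helper name pairwise_euclidean is hypothetical.

```python
import math


def pairwise_euclidean(data1, data2=None, block_size=-1):
    """Distances between every row of data1 and every row of data2."""
    if data2 is None:
        data2 = data1
    # A non-positive block_size means the rows are processed in one block
    step = block_size if block_size > 0 else max(len(data1), 1)
    result = []
    for start in range(0, len(data1), step):
        block = data1[start:start + step]
        result.extend([[math.dist(a, b) for b in data2] for a in block])
    return result


points = [[0, 0], [3, 4]]
print(pairwise_euclidean(points))
print(pairwise_euclidean(points, block_size=1))
```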
- sciv.tl.is_asc_sort(positions_list: list) bool
Check whether the positions are in ascending order.
Parameters
- positions_listlist
Positions list.
Returns
- bool
True for ascending order, otherwise False.
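An ascending-order check of this kind can be written by comparing neighbouring elements. Whether sciv treats equal neighbours as ascending is not stated here; the sketch below accepts them, which is an assumption, and the helper name is_ascending is hypothetical.

```python
def is_ascending(positions):
    """True if every element is <= the one that follows it."""
    return all(a <= b for a, b in zip(positions, positions[1:]))


print(is_ascending([100, 250, 250, 900]))
print(is_ascending([100, 90, 300]))
```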
- sciv.tl.jaccard_similarity(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_jobs: int = -1, is_to_dense: bool = False) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Calculate the Jaccard similarity matrix.
Parameters
- datamatrix_data
Input cell feature data;
- n_jobsint, optional
The number of jobs to use for the computation.
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- matrix_data
Jaccard similarity matrix.
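For binary cell-by-feature data, the Jaccard similarity between two cells is the number of features shared by both divided by the number present in either. A small dense sketch follows; the helper name jaccard_matrix is hypothetical, and returning 0.0 for two empty rows is an assumption.

```python
def jaccard_matrix(rows):
    """Pairwise Jaccard similarity between the non-zero supports of each row."""
    supports = [{j for j, v in enumerate(row) if v} for row in rows]
    n = len(supports)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            union = supports[i] | supports[k]
            score = len(supports[i] & supports[k]) / len(union) if union else 0.0
            sim[i][k] = sim[k][i] = score  # the matrix is symmetric
    return sim


cells = [[1, 1, 0], [1, 0, 1]]
print(jaccard_matrix(cells))
```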
- sciv.tl.k_means(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_clusters: int = 8, is_to_dense: bool = False) list | set | Tuple | ndarray
Perform k-means clustering on data.
Parameters
- datamatrix_data
Input data matrix;
- n_clustersint, optional
The number of clusters to form as well as the number of centroids to generate.
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- collection
Cluster labels from k-means clustering.
- sciv.tl.kl_divergence(data1: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, data2: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list) float
Calculate KL divergence for two data.
Parameters
- data1matrix_data
First data.
- data2matrix_data
Second data.
Returns
- float
KL divergence score.
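The KL divergence between two discrete distributions p and q is the sum of p_i * log(p_i / q_i). The sketch below first normalizes both inputs to sum to 1 and assumes q is positive wherever p is; how sciv normalizes or flattens its inputs is not stated here, so these are assumptions, as is the helper name kl.

```python
import math


def kl(p, q):
    """KL(p || q) for two non-negative vectors, normalized to sum to 1."""
    p_total, q_total = sum(p), sum(q)
    # Terms with p_i == 0 contribute nothing and are skipped
    return sum(
        (a / p_total) * math.log((a / p_total) / (b / q_total))
        for a, b in zip(p, q)
        if a > 0
    )


print(kl([0.5, 0.5], [0.25, 0.75]))
print(kl([1, 1], [2, 2]))
```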
- sciv.tl.lsi(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_components: int = 50, is_to_dense: bool = False) ndarray | matrix | list
Latent semantic indexing (LSI) via SVD.
Parameters
- datamatrix_data
Input cell feature data;
- n_componentsint, optional
Number of dimensions to reduce to.
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- dense_data
Reduced dimensional data (SVD LSI model).
- sciv.tl.marginal_normalize(matrix: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1] = 0, default: float = 1e-50) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Marginal standardization.
Parameters
- matrixmatrix_data
Data matrix to be standardized.
- axisLiteral[0, 1], optional
Axis along which to standardize.
- defaultfloat, optional
To prevent division by 0, this value needs to be added to the denominator.
Returns
- matrix_data
Standardized data.
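As an illustration of why a small default is added to the denominator, the following is a minimal sketch of marginal normalization, not the sciv implementation. It assumes that "marginal" means dividing each entry by the sum taken along axis; the helper name marginal_normalize_sketch is hypothetical.

```python
import numpy as np

def marginal_normalize_sketch(matrix, axis=0, default=1e-50):
    """Divide every entry by the sum along `axis` (assumed meaning of 'marginal')."""
    x = np.asarray(matrix, dtype=float)
    # Adding `default` keeps an all-zero row or column from causing 0 / 0.
    totals = x.sum(axis=axis, keepdims=True) + default
    return x / totals

normalized = marginal_normalize_sketch([[1.0, 3.0], [1.0, 1.0]], axis=0)
with_zero_column = marginal_normalize_sketch([[0.0, 1.0], [0.0, 1.0]], axis=0)
```

With axis=0 each column of the result sums to one, and the all-zero column stays zero instead of turning into NaN.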
- sciv.tl.mean_symmetric_scale(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1, -1] = -1, is_verbose: bool = True) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Calculate the mean symmetric scaling of the data.
Parameters
- datamatrix_data
Input data.
- axisLiteral[0, 1, -1], optional
Axis along which to standardize.
- is_verbosebool, optional
Whether to log information.
Returns
- matrix_data
Data standardized by mean symmetric scaling.
- sciv.tl.min_max_norm(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1, -1] = -1) ndarray | matrix | list
Calculate min-max normalized data.
Parameters
- datamatrix_data
Input data.
- axisLiteral[0, 1, -1], optional
Axis along which to normalize.
Returns
- dense_data
Standardized data.
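For orientation, here is a minimal sketch of min-max normalization. It is not the sciv implementation: the helper name min_max_sketch is hypothetical, and treating axis=-1 as "use the global minimum and maximum" is an assumption about what that value means.

```python
import numpy as np

def min_max_sketch(data, axis=-1):
    """Rescale values to [0, 1]; axis=-1 is assumed to mean the whole matrix."""
    x = np.asarray(data, dtype=float)
    if axis == -1:
        lo, hi = x.min(), x.max()
    else:
        lo = x.min(axis=axis, keepdims=True)
        hi = x.max(axis=axis, keepdims=True)
    # Replace a zero range by 1 so constant data does not divide by zero.
    span = np.where(hi - lo == 0, 1.0, hi - lo)
    return (x - lo) / span

scaled = min_max_sketch([[1.0, 2.0], [3.0, 5.0]])
```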
- sciv.tl.obtain_cell_cell_network(adata: AnnData, k: int = 30, or_k: int = 10, weight: float = 0.5, kernel: Literal['laplacian', 'gaussian'] = 'gaussian', local_k: int = 10, gamma: float | str | list | set | Tuple | ndarray | None = None, is_simple: bool = True) AnnData
Calculate cell-cell correlation.
Parameters
- adataAnnData
scATAC-seq data.
- kint
When building the M-KNN network, the number of neighbors each node connects to (AND).
- or_kint
When building the M-KNN network, the number of neighbors each node connects to (OR).
- weightfloat
The weight of interactions or operations.
- local_kint
Number of neighbors for the adaptive kernel.
- kernelLiteral['laplacian', 'gaussian']
The kernel function to use.
- gammaOptional[Union[float, str, collection]]
If None and kernel is 'laplacian', it is the reciprocal of the dimension of the cells' latent representation. If None and kernel is 'gaussian', it defaults to an adaptive value obtained from the local information controlled by local_k. Otherwise, it must be strictly positive.
- is_simplebool
If True, only the final result is added and unnecessary intermediate variables are omitted. Note that the is_ablation parameter takes effect only when this is set to False.
Returns
- AnnData
Cell similarity data.
- sciv.tl.overlap(regions: DataFrame, variants: DataFrame) DataFrame
Relate the peak region and variant site.
Parameters
- regionsDataFrame
Information of peaks.
- variantsDataFrame
Information of variants.
Returns
- DataFrame
Variants mapped to the peak regions.
- sciv.tl.overlap_sum(regions: AnnData, variants: dict, trait_info: DataFrame, n_jobs: int = -1) AnnData
Overlap regional data and mutation data and sum the PP values of all mutations in a region as the values for that region.
Parameters
- regionsAnnData
Data of peaks.
- variantsdict
Data of variants.
- trait_infoDataFrame
Information of traits.
- n_jobsint
The maximum number of concurrently running jobs.
- sciv.tl.pca(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_components: int = 50, is_to_dense: bool = False) ndarray | matrix | list
PCA.
Parameters
- datamatrix_data
Input cell feature data.
- n_componentsint, optional
Number of dimensions to reduce to.
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- dense_data
Reduced dimensional data.
- sciv.tl.perturb_data(data: list | set | Tuple | ndarray, percentage: float) list | set | Tuple | ndarray
Randomly perturbs the positions of a percentage of data.
Parameters
- datacollection
List of data elements to be perturbed.
- percentagefloat
Percentage of data to be perturbed.
Returns
- collection
Perturbed data list.
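One way to read "perturbs the positions of a percentage of data" is sketched below. This is not the sciv implementation: the helper name perturb_sketch and the seed argument are hypothetical, and percentage is assumed to be a fraction between 0 and 1.

```python
import random

def perturb_sketch(data, percentage, seed=None):
    """Shuffle the values held at a randomly chosen fraction of positions."""
    rng = random.Random(seed)
    items = list(data)
    count = int(len(items) * percentage)           # number of positions to disturb
    chosen = rng.sample(range(len(items)), count)  # positions that will be disturbed
    shuffled = chosen[:]
    rng.shuffle(shuffled)                          # new home for each chosen value
    result = items[:]
    for src, dst in zip(chosen, shuffled):
        result[dst] = items[src]
    return result

original = list(range(10))
perturbed = perturb_sketch(original, 0.5, seed=0)
```

The result contains exactly the same values as the input; at most the chosen fraction of positions can differ.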
- sciv.tl.safe_kl_divergence(p: list | set | Tuple | ndarray, q: list | set | Tuple | ndarray, epsilon: float = 1e-10) float
Safe KL divergence calculation to avoid division by zero.
Parameters
- pcollection
First data.
- qcollection
Second data.
- epsilonfloat, optional
The small value to add to the denominator to avoid zeros.
Returns
- float
KL divergence score.
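The idea of an epsilon-protected KL divergence can be sketched as follows. This is an illustration under stated assumptions, not the sciv implementation: the helper name safe_kl_sketch is hypothetical, and the sketch adds epsilon to every entry of both inputs and renormalizes, which may differ from how the library applies epsilon.

```python
import math

def safe_kl_sketch(p, q, epsilon=1e-10):
    """KL(p || q) with epsilon added to every entry before renormalizing."""
    p = [v + epsilon for v in p]
    q = [v + epsilon for v in q]
    p_sum, q_sum = sum(p), sum(q)
    p = [v / p_sum for v in p]
    q = [v / q_sum for v in q]
    # No entry is zero any more, so the log and the division are always defined.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

same = safe_kl_sketch([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
diff = safe_kl_sketch([0.9, 0.1, 0.0], [0.1, 0.9, 0.0])
```

Identical inputs give a score of zero, and the zero entries no longer cause a division by zero or log(0).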
- sciv.tl.semi_mutual_knn_weight(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, k: int = 30, or_k: int = 10, weight: float = 0.5, is_for: bool = True, is_mknn_fully_connected: bool = True) Tuple[coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list]
Semi-mutual KNN (SM-KNN) with weights.
Parameters
- datamatrix_data
Input data matrix.
- kint, optional
The number of nearest neighbors (AND).
- or_kint, optional
The number of nearest neighbors (OR).
- weightfloat, optional
The weight of interactions or operations.
- is_forbool, optional
Whether to obtain the nearest neighbors of each node by looping over the matrix row by row. Setting it to True suits large samples with limited memory.
- is_mknn_fully_connectedbool, optional
Whether the M-KNN network should be a fully connected graph. If True, each node is guaranteed to be connected to at least its closest node other than itself. This parameter does not affect the SM-KNN result (the first result); it only affects the traditional M-KNN result (the second result).
Returns
- matrix_data, matrix_data
Adjacency weight matrices: the SM-KNN result first, the traditional M-KNN result second.
- sciv.tl.sigmoid(data: list | set | Tuple | ndarray | coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | matrix) list | set | Tuple | ndarray | coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | matrix
Sigmoid function.
Parameters
- datacollection, matrix_data
Input data.
Returns
- collection, matrix_data
Sigmoid output.
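For reference, the logistic sigmoid is 1 / (1 + exp(-x)). The sketch below (hypothetical helper name sigmoid_sketch, plain Python lists only) evaluates it in a numerically stable way; it is not the sciv implementation and does not handle sparse input.

```python
import math

def sigmoid_sketch(values):
    """Element-wise logistic function, written to avoid overflow in exp."""
    out = []
    for x in values:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)          # x < 0, so exp(x) cannot overflow
            out.append(e / (1.0 + e))
    return out

probs = sigmoid_sketch([-1000.0, 0.0, 1000.0])
```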
- sciv.tl.silhouette(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, labels: list | set | Tuple | ndarray) float
Silhouette score.
Parameters
- datamatrix_data
An array of pairwise distances between samples, or a feature array.
- labelscollection
Predicted labels for each sample.
Returns
- float
Silhouette score.
- sciv.tl.spectral_clustering(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_clusters: int = 8, n_components=30, eigen_solver='arpack', is_to_dense: bool = False) list | set | Tuple | ndarray
Spectral clustering on data.
Parameters
- datamatrix_data
Input data matrix.
- n_clustersint, optional
The number of clusters to form.
- n_componentsint, optional
The dimension of the projection subspace.
- eigen_solverstr, optional
The eigenvalue decomposition strategy to use. Default is 'arpack'.
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- collection
Cluster labels after spectral clustering.
- sciv.tl.spectral_eigenmaps(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_components: int = 30, affinity: Literal['nearest_neighbors', 'rbf', 'precomputed', 'precomputed_nearest_neighbors', 'jaccard'] = 'nearest_neighbors', eigen_solver: Literal['arpack', 'lobpcg', 'amg'] | None = None, n_jobs: int = -1, is_to_dense: bool = False) ndarray | matrix | list
Spectral Eigenmaps.
Parameters
- datamatrix_data
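Input cell feature data.
- n_componentsint, optional
Number of dimensions to reduce to.
- eigen_solverOptional[_EigenSolver], optional
The eigenvalue decomposition strategy to use.
- affinityLiteral['nearest_neighbors', 'rbf', 'precomputed', 'precomputed_nearest_neighbors', 'jaccard'], optional
Method used to construct the affinity matrix.
- n_jobsint, optional
The number of jobs to use for the computation.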
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- dense_data
Reduced dimensional data.
- sciv.tl.symmetric_scale(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, scale: int | float | list | set | Tuple | ndarray = 2.0, axis: Literal[0, 1, -1] = -1, is_verbose: bool = True) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Symmetric scale function.
Parameters
- datamatrix_data
Input data.
- axisLiteral[0, 1, -1], optional
Axis along which to standardize.
- scaleUnion[number, collection], optional
Scaling factor.
- is_verbosebool, optional
Whether to log information.
Returns
- matrix_data
Standardized data.
- sciv.tl.tf_idf(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, ri_sparse: bool = True) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
TF-IDF transformer.
Parameters
- datamatrix_data
Matrix data to be transformed.
- ri_sparsebool, optional
(return_is_sparse) Whether to return sparse matrix.
Returns
- matrix_data
Matrix processed by TF-IDF.
- sciv.tl.tsne(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_components: int = 2, is_to_dense: bool = False) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
t-SNE dimensionality reduction on data.
Parameters
- datamatrix_data
Data matrix that requires dimensionality reduction.
- n_componentsint, optional
Dimension of the embedded space.
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- matrix_data
Reduced dimensional data matrix.
- sciv.tl.umap(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, n_neighbors: float = 15, n_components: int = 2, min_dist: float = 0.15, is_to_dense: bool = False) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
UMAP dimensionality reduction on data.
Parameters
- datamatrix_data
Data matrix that requires dimensionality reduction.
- n_neighborsfloat, optional
The size of the local neighborhood (in terms of number of neighboring sample points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general, values should be in the range 2 to 100.
- n_componentsint, optional
The dimension of the space to embed into. This defaults to 2 to provide easy visualization, but can reasonably be set to any integer value in the range 2 to 100.
- min_distfloat, optional
The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.
- is_to_densebool, optional
Whether to convert the data into a dense matrix.
Returns
- matrix_data
Reduced dimensional data matrix.
- sciv.tl.z_score_marginal(matrix: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1] = 0) Tuple[coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list]
Matrix standardization (z-score, marginal).
Parameters
- matrixmatrix_data
Data matrix to be standardized.
- axisLiteral[0, 1], optional
Axis along which to standardize.
Returns
- matrix_data, matrix_data
Standardized data. First element is the z-score data, second element is the mean data.
- sciv.tl.z_score_normalize(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, with_mean: bool = True, ri_sparse: bool | None = None, is_sklearn: bool = False) ndarray | matrix | list | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix
Matrix standardization (z-score).
Parameters
- datamatrix_data
Data matrix to be standardized.
- with_meanbool, optional
If True, center the data before scaling.
- ri_sparsebool | None, optional
(return_is_sparse) Whether to return a sparse matrix.
- is_sklearnbool, optional
Whether to use the sklearn package.
Returns
- dense_data, sparse_matrix
Standardized matrix.
- sciv.tl.z_score_to_p_value(z_score: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Convert z-score to p-value.
Parameters
- z_scorematrix_data
Input z-score data.
Returns
- matrix_data
P-value data.
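The conversion from a z-score to a p-value under the standard normal distribution can be sketched with the complementary error function. The reference does not say whether the p-value is one-sided or two-sided, so the sketch exposes both through a hypothetical two_sided flag; the helper name z_to_p_sketch is also hypothetical, and this is not the sciv implementation.

```python
import math

def z_to_p_sketch(z_scores, two_sided=False):
    """Upper-tail (or two-sided) p-values under the standard normal."""
    p_values = []
    for z in z_scores:
        if two_sided:
            # P(|Z| >= |z|)
            p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
        else:
            # P(Z >= z)
            p_values.append(0.5 * math.erfc(z / math.sqrt(2.0)))
    return p_values

p_one = z_to_p_sketch([0.0, 1.96])
p_two = z_to_p_sketch([1.96], two_sided=True)
```

A z-score of 1.96 maps to roughly 0.025 in the upper tail and roughly 0.05 two-sided.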
Random Walk
Random walk related functions.
- class sciv.tl.RandomWalk(cc_adata: AnnData, init_status: AnnData, epsilon: float = 1e-05, max_steps: int = 300, gamma: float = 0.05, enrichment_gamma: float = 0.05, p: int = 2, n_jobs: int = -1, min_seed_cell_rate: float = 0.01, max_seed_cell_rate: float = 0.05, credible_threshold: float = 0, enrichment_threshold: Literal['golden', 'half', 'e', 'pi', 'none'] | float = 'golden', benchmark_count: int = 10, is_ablation: bool = False, is_simple: bool = True)
Bases: object
Random walk analysis.
- run_ablation_ncsw() None
Remove cell weights in the random walk and cluster type weights in the initial scores.
- run_en_ablation_ncsw() None
Remove cell weights in the random walk and cluster type weights in the initial scores.
- run_knock(trs: AnnData, knock_trait: str, is_control: bool = False) None
Knockout analysis.
Parameters
- trsAnnData
Input AnnData object.
- knock_traitstr
Knockout trait or disease.
- is_controlbool, optional
Whether to control the knockout. Default is False.
- static scale_norm(score: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, is_verbose: bool = False) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Scale normalization of the score matrix.
Parameters
- scorematrix_data
Score matrix.
- is_verbosebool, optional
Whether to print the progress. Defaults to False.
Returns
- matrix_data
The normalized score matrix.
- class sciv.tl.TraitDataParallel(module: T, device_ids: Sequence[int | device] | None = None, output_device: int | device | None = None, dim: int = 0)
Bases: DataParallel
Data parallel module for trait analysis.
- gather(outputs, output_device)
Collect the results after parallel processing, check for the existence of results, and merge the results by column (each result matrix has the same number of rows but different numbers of columns).
Parameters
- outputslist
Output results of each device.
- output_deviceint
Output device ID.
Returns
- Tensor
The merged results sorted by column.
- scatter(inputs, kwargs, device_ids)
Scatter the input data to multiple devices.
Parameters
- inputslist
List of input data to be scattered.
- kwargsdict
Dictionary of keyword arguments to be scattered.
- device_idslist
List of device IDs to scatter the data to.
Returns
- tuple
Tuple of scattered input data and keyword arguments.
- sciv.tl.random_walk(seed_cell_weight: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, weight: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, gamma: float = 0.05, epsilon: float = 1e-05, max_steps: int = 300, p: int = 2, n_jobs: int = -1, device: str = 'auto') coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Random walk analysis.
Parameters
- seed_cell_weightmatrix_data
Seed cell weight matrix, where each column represents a seed cell.
- weightmatrix_data
Transition probability matrix (weight matrix).
- gammafloat
Random walk parameter. Defaults to 0.05.
- epsilonfloat
Convergence threshold. Defaults to 1e-5.
- max_stepsint
Maximum number of steps. Defaults to 300.
- pint
Order of the random walk. Defaults to 2.
- n_jobsint
Number of jobs to run in parallel. Defaults to -1, which means using all available processors.
- devicestr
Device to run the analysis on. Defaults to 'auto'.
Returns
- matrix_data
The association score matrix, where each column represents the association score of a seed cell.
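To make the roles of epsilon and max_steps concrete, here is a minimal sketch of a random walk with restart. It is an illustration, not the sciv implementation: the helper name random_walk_sketch is hypothetical, gamma is assumed to be the restart probability (the reference only calls it a "random walk parameter"), convergence is checked with the largest absolute change, and the p, n_jobs and device parameters are left out.

```python
import numpy as np

def random_walk_sketch(seed_weight, transition, gamma=0.05, epsilon=1e-5, max_steps=300):
    """Iterate p <- (1 - gamma) * W @ p + gamma * p0 until the update is small."""
    p0 = np.asarray(seed_weight, dtype=float)
    w = np.asarray(transition, dtype=float)
    p = p0.copy()
    for _ in range(max_steps):
        p_next = (1.0 - gamma) * (w @ p) + gamma * p0
        if np.abs(p_next - p).max() < epsilon:   # converged
            return p_next
        p = p_next
    return p                                      # step limit reached

# Three cells on a path graph; every column of W sums to one.
w = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
seed = np.array([[1.0], [0.0], [0.0]])            # one column per seed vector
scores = random_walk_sketch(seed, w, gamma=0.3)
```

Each column of the result stays a probability distribution, and the seed cell keeps a higher score than the cell farthest from it.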
- sciv.tl.trs_scale_norm(score: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1, -1] = 0, is_verbose: bool = True) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Standardize and normalize the cell scores.
Parameters
- scorematrix_data
Cell scores matrix.
- axisLiteral[0, 1, -1]
Axis to apply the standardization and normalization. Defaults to 0.
- is_verbosebool
Whether to print the progress. Defaults to True.
Returns
- matrix_data
The standardized and normalized cell scores matrix.
Matrix
Matrix operation related functions.
- sciv.tl.down_sampling_data(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | set | Tuple, sample_number: int = 1000000) list
Down-sampling data.
Parameters
- dataUnion[matrix_data | collection]
Data that requires down-sampling.
- sample_numberint, optional
Number of samples (values) to draw.
Returns
- list
Data after down-sampling.
- sciv.tl.matrix_callback_block_storage(matrix: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, callback, block_size: int = 10000, data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | None = None) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Apply a callback function to a matrix block by block.
Parameters
- matrixmatrix_data
Input matrix.
- callbackcallable
Callback function.
- block_sizeint, optional
The block size used for block-wise processing. If the value is less than or equal to zero, no block operation will be performed.
- datamatrix_data, optional
Placeholder variable for the result matrix. If provided, it reduces memory consumption.
Returns
- matrix_data
Result matrix (CSR format).
- sciv.tl.matrix_division_block_storage(matrix: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, value: float | int | list | set | Tuple | ndarray | coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | matrix, block_size: int = 10000, data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | None = None) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Divide a matrix by a value, vector, or matrix.
Parameters
- matrixmatrix_data
Input matrix.
- valueUnion[float, int, collection, matrix_data]
Value, vector, or matrix to divide by.
- block_sizeint, optional
The block size used for block-wise processing. If the value is less than or equal to zero, no block operation will be performed.
- datamatrix_data, optional
Placeholder variable for the result matrix. If provided, it reduces memory consumption.
Returns
- matrix_data
Result matrix (CSR format).
- sciv.tl.matrix_dot_block_storage(data1: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, data2: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, block_size: int = 10000, is_return_sparse: bool = False, data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | None = None) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Perform the matrix (dot) product of two matrices using block storage.
Parameters
- data1matrix_data
Matrix 1.
- data2matrix_data
Matrix 2.
- block_sizeint, optional
The block size used for block-wise matrix multiplication. If the value is less than or equal to zero, no block operation will be performed.
- is_return_sparsebool, optional
Whether to return a sparse matrix.
- datamatrix_data, optional
Placeholder variable for the result matrix. If provided, it reduces memory consumption.
Returns
- matrix_data
Matrix product result.
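The block storage idea can be sketched as computing the product one row block at a time, so that only one block of intermediate results is produced per step. The sketch takes the operation to be the ordinary matrix product, which is an assumption based on the function name; the helper name blockwise_dot_sketch is hypothetical, and this is not the sciv implementation.

```python
import numpy as np

def blockwise_dot_sketch(a, b, block_size=10000):
    """Compute a @ b one row block of `a` at a time."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if block_size <= 0:                       # no blocking requested
        return a @ b
    out = np.empty((a.shape[0], b.shape[1]))
    for start in range(0, a.shape[0], block_size):
        stop = min(start + block_size, a.shape[0])
        out[start:stop] = a[start:stop] @ b   # fill this block of result rows
    return out

a = np.arange(12.0).reshape(4, 3)
b = np.arange(6.0).reshape(3, 2)
product = blockwise_dot_sketch(a, b, block_size=2)
```

The blocked result is identical to the unblocked product; only the peak size of the intermediate computation changes.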
- sciv.tl.matrix_multiply_block_storage(data1: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, data2: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, block_size: int = 10000, data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | None = None) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Perform the Hadamard (element-wise) product of two matrices using block storage.
Parameters
- data1matrix_data
Matrix 1.
- data2matrix_data
Matrix 2.
- block_sizeint, optional
The block size used for block-wise matrix multiplication. If the value is less than or equal to zero, no block operation will be performed.
- datamatrix_data, optional
Placeholder variable for the result matrix. If provided, it reduces memory consumption.
Returns
- matrix_data
Hadamard product result.
- sciv.tl.matrix_operation_memory_efficient(data1: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, data2: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | int | float, chunk_size: int = 10000, default: float = 100000000.0, operation: Literal['+', '-', '*', '/'] = '*') coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix
Perform element-wise addition, subtraction, multiplication, and division on two sparse matrices by blocks, supporting memory-efficient processing.
Parameters
- data1matrix_data
Sparse matrix 1.
- data2Union[matrix_data, number]
Sparse matrix 2, or a number.
- chunk_sizeint, optional
The chunk size used for block-wise element-wise operations. If the value is less than or equal to zero, no block operation will be performed.
- defaultfloat, optional
Default value for the division operation when the denominator is 0. If this value is 0, a ValueError is raised.
- operationLiteral['+', '-', '*', '/'], optional
Element-wise operation type: one of '+', '-', '*', '/'.
Returns
- sparse_matrix
Result sparse matrix (CSR format).
- sciv.tl.merge_matrix(datas: list, axis: Literal[0, 1] = 0) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Merge multiple matrix data into one matrix.
Parameters
- dataslist
List of matrix data.
- axisLiteral[0, 1], optional
Axis to merge the matrix. Default is 0.
Returns
- matrix_data
Merged matrix data.
- sciv.tl.split_matrix(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1] = 0, chunk_number: int = 1000) list
Split matrix into multiple parts.
Parameters
- datamatrix_data
Input matrix data.
- axisLiteral[0, 1], optional
Axis to split the matrix. Default is 0.
- chunk_numberint, optional
Number of parts to split the matrix. Default is 1000.
Returns
- list
List of split matrix data.
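Splitting and merging are inverse operations when the same axis is used. The sketch below illustrates that round trip with NumPy; the helper names split_sketch and merge_sketch are hypothetical, the sketch handles dense arrays only, and capping the number of parts at the axis length is an assumption rather than documented sciv behavior.

```python
import numpy as np

def split_sketch(data, axis=0, chunk_number=1000):
    """Split into at most `chunk_number` nearly equal parts along `axis`."""
    x = np.asarray(data)
    parts = min(chunk_number, x.shape[axis])  # never ask for more parts than rows/columns
    return np.array_split(x, parts, axis=axis)

def merge_sketch(datas, axis=0):
    """Concatenate the parts back along the same axis."""
    return np.concatenate(datas, axis=axis)

matrix = np.arange(20).reshape(5, 4)
chunks = split_sketch(matrix, axis=0, chunk_number=2)
restored = merge_sketch(chunks, axis=0)
```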
- sciv.tl.vector_multiply_block_storage(data1: list | set | Tuple | ndarray, data2: list | set | Tuple | ndarray, block_size: int = 10000, data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list | None = None) coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list
Broadcast two vectors along rows and columns respectively and multiply them element-wise (Hadamard product).
Parameters
- data1collection
Vector 1.
- data2collection
Vector 2.
- block_sizeint, optional
The block size used for block-wise matrix multiplication. If the value is less than or equal to zero, no block operation will be performed.
- datamatrix_data, optional
Placeholder variable for the result matrix. If provided, it reduces memory consumption.
Returns
- matrix_data
Result matrix (CSR format).
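Broadcasting one vector along rows and the other along columns and multiplying element-wise yields their outer product. The sketch below builds that result block by block and returns a dense array, whereas the reference states a CSR result; the helper name vector_outer_blocks_sketch is hypothetical, and this is not the sciv implementation.

```python
import numpy as np

def vector_outer_blocks_sketch(v1, v2, block_size=10000):
    """Outer product of v1 and v2, filled in one block of rows at a time."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    out = np.empty((v1.size, v2.size))
    step = block_size if block_size > 0 else max(v1.size, 1)
    for start in range(0, v1.size, step):
        stop = min(start + step, v1.size)
        # v1 block as a column, v2 as a row: broadcasting gives the product block.
        out[start:stop] = v1[start:stop, None] * v2[None, :]
    return out

outer = vector_outer_blocks_sketch([1, 2, 3], [10, 20], block_size=2)
```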
Util (.ul)
A universal tool interface that includes constant definitions, logging, and auxiliary functions.
- sciv.ul.check_adata_get(adata: AnnData, layer: str | None = None, is_dense: bool = True, is_matrix: bool = False) AnnData
Check if layer is in .layers, and instantiate a new AnnData with it as .X.
Parameters
- adataAnnData
Input AnnData object.
- layerstr, optional
Layer of the data. Default is None.
- is_densebool, optional
Whether to return dense matrix. Default is True.
- is_matrixbool, optional
Whether to return matrix. Default is False.
Returns
- AnnData
Data.
- sciv.ul.check_gpu_availability(verbose: bool = False) bool
Check the availability of GPU.
Parameters
- verbosebool, optional
Whether to print the information. Default is False.
Returns
- bool
Whether the GPU is available.
Examples
>>> availability = sciv.ul.check_gpu_availability()
- sciv.ul.file_method(name: str | None = None, is_verbose: bool = False) StaticMethod
Create file method handler class
Create a StaticMethod class instance based on the given name for handling file operations. If a name is provided, it will be combined with the project name as the handler file name; otherwise, only the project name will be used.
Parameters
- namestr, optional
File handler name suffix, default is None
- is_verbosebool, optional
Whether to display log information. Default is False.
Returns
- StaticMethod
Configured StaticMethod class instance
- sciv.ul.generate_hex_colors(num_colors: int) list
Generate random hex colors.
Parameters
- num_colorsint
Number of colors to generate.
Returns
- list
List of random hex colors.
Examples
>>> colors3 = sciv.ul.generate_hex_colors(3)
>>> colors5 = sciv.ul.generate_hex_colors(5)
>>> print(f"Generate three colors: {colors3}")
>>> print(f"Generate five colors: {colors5}")
- sciv.ul.generate_str(length: int = 10) str
Generate a random string.
Parameters
- lengthint, optional
Length of the string. Default is 10.
Returns
- str
Random string.
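A fixed-length random string can be produced with the standard library alone. The sketch below is an assumption about the general approach. The reference does not state which characters sciv draws from, so the alphabet (ASCII letters and digits) and the helper name `generate_str_sketch` are hypothetical.

```python
import random
import string


def generate_str_sketch(length=10):
    """Hypothetical stand-in: a random string of the requested length."""
    # Alphabet is an assumption; the real function may use a different character set.
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=length))


token = generate_str_sketch()
```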
- sciv.ul.get_index(position: int | float, positions_list: list, is_sort: bool = True) int | Tuple[int, int]
Search for position information, similar to a binary search.
If the position exists in the list, return its index. If it does not exist, return the two indexes between which it lies.
Parameters
- positionnumber
Position to search for.
- positions_listlist
Position list.
- is_sortbool, optional
Whether to sort the list. Default is True.
Returns
- Union[int, Tuple[int, int]]
Position index.
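The lookup described above resembles what Python's `bisect` module provides. The following is a minimal sketch under stated assumptions, not sciv's code: `get_index_sketch` is a hypothetical name, `is_sort=True` is taken to mean that the list is sorted before searching, and positions outside the list range yield a pair containing `-1` or `len(list)`.

```python
from bisect import bisect_left


def get_index_sketch(position, positions_list, is_sort=True):
    """Hypothetical stand-in: exact index, or the pair of neighbouring indexes."""
    # Assumption: is_sort=True means "sort a copy of the list before searching".
    values = sorted(positions_list) if is_sort else list(positions_list)
    i = bisect_left(values, position)
    if i < len(values) and values[i] == position:
        return i
    # Not found: the position lies between indexes i - 1 and i.
    return i - 1, i


hit = get_index_sketch(30, [10, 20, 30, 40])
miss = get_index_sketch(25, [10, 20, 30, 40])
```

Here `hit` is a single index and `miss` is a pair, which matches the `int | Tuple[int, int]` return annotation.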
- sciv.ul.get_real_predict_label(df: DataFrame, map_groupby: str | list | set | Tuple | ndarray, groupby: str = 'clusters', value: str = 'value') Tuple[DataFrame, int, list]
Get the real and predicted labels of the trait.
Parameters
- dfDataFrame
Input data.
- map_groupbyUnion[str, collection]
Map of the cluster.
- groupbystr, optional
Name of the column of the cluster. Default is “clusters”.
- valuestr, optional
Name of the column of the value. Default is “value”.
Returns
- Tuple[DataFrame, int, list]
Sorted DataFrame, number of clusters, and list of clusters.
- sciv.ul.list_duplicate_set(data: list) list
Append numbering to duplicate information. If data is None, return an empty list. If data is a list, return it as is. If data is a collection, return it converted to a list.
Parameters
- datalist
Input data.
Returns
- list
Data made unique, with the number of elements unchanged.
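One way to make every entry unique without changing the list length is to number the repeats. The sketch below illustrates that idea only. The `_<n>` suffix format and the helper name `number_duplicates` are hypothetical, since the reference does not specify how sciv numbers duplicates.

```python
from collections import Counter


def number_duplicates(data):
    """Hypothetical sketch: keep first occurrences, suffix later repeats."""
    seen = Counter()
    result = []
    for item in data:
        seen[item] += 1
        # First occurrence stays as-is; the n-th repeat becomes "<item>_<n>".
        result.append(item if seen[item] == 1 else f"{item}_{seen[item] - 1}")
    return result


labels = number_duplicates(["a", "b", "a", "a"])
```

A production version would also need to handle an input that already contains a generated name such as `a_1`; this sketch does not.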
- sciv.ul.list_index(data: list) Tuple[list, list | set | Tuple | ndarray]
Get the index of each element in a list.
Parameters
- datalist
Input data.
Returns
- Tuple[list, collection]
Index of each element in the list. Types of the elements in the list.
- sciv.ul.log(name: str | None = None) Logger
Create log handler class
Create a Logger class instance based on the given name for logging. If a name is provided, it will be combined with the project name as the log file name; otherwise, only the project name will be used.
Parameters
- namestr, optional
Log handler name suffix, default is None
Returns
- Logger
Configured Logger class instance
- sciv.ul.merge_matrix(datas: list, axis: Literal[0, 1] = 0) list
Merge multiple matrices into one matrix.
Parameters
- dataslist
Input data.
- axisLiteral[0, 1], optional
Axis to merge the matrices. Default is 0.
Returns
- list
Merged data.
- sciv.ul.numerical_bisection_step(min_value: float, max_value: float, step_length: float) Tuple[list | set | Tuple | ndarray, int]
Get the numerical bisection step.
Parameters
- min_valuefloat
Minimum value of the step.
- max_valuefloat
Maximum value of the step.
- step_lengthfloat
Step length of the bisection.
Returns
- Tuple[collection, int]
Numerical bisection step. Number of steps.
- sciv.ul.set_inf_value(matrix: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list) None
Set the infinite value of the matrix to the maximum value of the matrix.
Parameters
- matrixmatrix_data
Input matrix.
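For a dense float array, the replacement can be sketched in a few lines of NumPy. This is an illustration under assumptions, not sciv's implementation: `set_inf_value_sketch` is a hypothetical name, only positive infinity is replaced, the maximum is taken over the remaining entries, and sparse inputs are not handled.

```python
import numpy as np


def set_inf_value_sketch(matrix):
    """Hypothetical stand-in: overwrite +inf entries in place; returns None."""
    mask = np.isposinf(matrix)
    # Nothing to do if there is no +inf, or if no other value exists to take a maximum from.
    if mask.any() and not mask.all():
        matrix[mask] = matrix[~mask].max()


scores = np.array([[1.0, np.inf], [3.0, 2.0]])
set_inf_value_sketch(scores)
```

The array is modified in place, mirroring the `None` return type in the signature above.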
- sciv.ul.split_matrix(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: Literal[0, 1] = 0, chunk_number: int = 1000) list
Split a matrix into multiple parts.
Parameters
- datamatrix_data
Input data.
- axisLiteral[0, 1], optional
Axis to split the matrix. Default is 0.
- chunk_numberint, optional
Number of parts to split the matrix. Default is 1000.
Returns
- list
Split data.
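Splitting a matrix into chunks and merging the chunks back (the role of merge_matrix) should round-trip. The sketch below shows this with NumPy's own `array_split` and `concatenate` rather than the sciv functions. Whether sciv distributes uneven remainders the same way is an assumption.

```python
import numpy as np

matrix = np.arange(12).reshape(6, 2)

# Split the 6 rows into 4 parts; array_split allows parts of unequal size.
parts = np.array_split(matrix, 4, axis=0)

# Merging along the same axis restores the original matrix.
merged = np.concatenate(parts, axis=0)
```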
- sciv.ul.strings_map_numbers(str_list: list, start: int = 0) list
Map strings to numerical values.
Parameters
- str_listlist
Input strings.
- startint, optional
Start value of the mapping. Default is 0.
Returns
- list
Mapped numerical values.
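Mapping labels to integer codes can be sketched as follows. The reference does not say how codes are assigned, so this example assumes that each distinct string receives the next integer in order of first appearance, beginning at `start`. The helper name `strings_map_numbers_sketch` is hypothetical.

```python
def strings_map_numbers_sketch(str_list, start=0):
    """Hypothetical stand-in: one integer code per distinct string."""
    codes = {}
    for s in str_list:
        # A new string gets the next unused code; a known string keeps its code.
        codes.setdefault(s, start + len(codes))
    return [codes[s] for s in str_list]


encoded = strings_map_numbers_sketch(["T", "B", "T", "NK"])
shifted = strings_map_numbers_sketch(["T", "B", "T", "NK"], start=1)
```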
- sciv.ul.sum_min_max(data: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, axis: int = 1) Tuple[int | float, int | float]
Obtain the minimum and maximum row sums of the matrix.
If data is None, return (0, 0). Otherwise, return the minimum and maximum row sums, whether data is a dense or a sparse matrix.
Returns
- Tuple[number, number]
Minimum row sum, maximum row sum.
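For a dense input, the computation amounts to summing along one axis and taking the extremes. The sketch below uses NumPy only and is an illustration, not sciv's code. The name `sum_min_max_sketch` is hypothetical and sparse inputs are left out.

```python
import numpy as np


def sum_min_max_sketch(data, axis=1):
    """Hypothetical stand-in: (smallest sum, largest sum) along the given axis."""
    if data is None:
        return 0, 0
    # axis=1 sums across columns, giving one total per row.
    sums = np.asarray(data).sum(axis=axis)
    return sums.min(), sums.max()


counts = [[1, 2, 3], [0, 0, 1], [4, 0, 0]]
smallest, largest = sum_min_max_sketch(counts)
```

The row sums here are 6, 1 and 4, so the pair is the smallest and largest of those three totals.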
- sciv.ul.to_dense(sm: coo_array | csr_array | csc_array | dok_array | lil_array | bsr_array | dia_array | sparray | coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix | ndarray | matrix | list, is_array: bool = False) ndarray | matrix | list
Convert sparse matrix to dense matrix
Convert a sparse matrix to a dense matrix. If sm is None, return an empty dense matrix. If sm is a dense matrix, return it as is. If sm is a sparse matrix, return it converted to array form.
Returns
- dense_data
Converted dense matrix.
- sciv.ul.to_sparse(dm: ndarray | matrix | list, way_callback=csr_matrix, is_matrix: bool = True) coo_matrix | csr_matrix | csc_matrix | dok_matrix | lil_matrix | bsr_matrix | dia_matrix | spmatrix
Convert dense matrix to sparse matrix
Convert a dense matrix to a sparse matrix. If dm is None, return an empty sparse matrix. If dm is a sparse matrix, return it as is. If dm is a dense matrix, return it converted to sparse form.
Returns
- sparse_matrix
Converted sparse matrix.
- sciv.ul.track_with_memory(interval: float = 60) Callable
Create memory tracking decorator
Create a decorator function that records memory usage at fixed intervals during function execution. Returns the result, elapsed time, and memory list.
Parameters
- intervalfloat, optional
Sampling interval (seconds), default is 60 seconds.
Returns
- Callable
Decorator function. When the wrapped function is called, it returns a dictionary containing:
- 'result': the original function's return value.
- 'time': function execution time (seconds) if is_monitor is True, otherwise None.
- 'memory': list of sampled memory usage (bytes) if is_monitor is True, otherwise None.
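The general pattern of a decorator that samples memory on a background thread can be sketched with the standard library. This is not sciv's implementation. The name `track_with_memory_sketch` is hypothetical, memory is read via `tracemalloc` (sciv may measure process memory differently), and the `is_monitor` switch mentioned above is not modelled, so time and memory are always recorded.

```python
import functools
import threading
import time
import tracemalloc


def track_with_memory_sketch(interval=60):
    """Hypothetical stand-in: wrap a function and sample memory while it runs."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            samples = []
            stop = threading.Event()

            def sample():
                # Record once immediately, then once per interval until stopped.
                while True:
                    current, _peak = tracemalloc.get_traced_memory()
                    samples.append(current)
                    if stop.wait(interval):
                        break

            started_here = not tracemalloc.is_tracing()
            if started_here:
                tracemalloc.start()
            sampler = threading.Thread(target=sample, daemon=True)
            start = time.perf_counter()
            sampler.start()
            try:
                result = func(*args, **kwargs)
            finally:
                # Stop sampling and wait for the sampler before tracing ends.
                stop.set()
                sampler.join()
                elapsed = time.perf_counter() - start
                if started_here:
                    tracemalloc.stop()
            return {"result": result, "time": elapsed, "memory": samples}

        return wrapper

    return decorator


@track_with_memory_sketch(interval=0.01)
def build_squares(n):
    return [i * i for i in range(n)]


report = build_squares(1000)
```

The caller receives a dictionary instead of the bare return value, so existing call sites need to read `report["result"]` after the decorator is applied.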