# Usage

ACSCeND provides a Python API for two core functionalities:

- Stem Cell Type Classification: classifies transcriptomic samples into one of three categories:
  - Pluripotent
  - Multipotent
  - Unipotent
- Transcriptomic Deconvolution: estimates the relative proportions of different cell types within bulk RNA-seq samples.

This page outlines the steps for using both modules with example data and scripts.

## 📁 Example Data

Example input files for testing can be found in the test directory of the GitHub repository:

- `classifier_test.csv`: input samples for stem cell type classification
- `PBMC_PSEUDOBULK.csv`: bulk RNA-seq input data used to train the deconvolution model
- `signature_matrix_pbmc.csv`: signature matrix for deconvolution
- `cell_fractions_PBMC.csv`: true cell-type fractions for the training data
- `PBMC_PSEUDOBULK_INDEPENDENT.csv`: independent bulk data to which the trained model is applied

## 🛠️ Installation

Install the ACSCeND package:

```bash
pip install ACSCeND
```

---

## 📦 Step 1: Import Required Modules
Start by importing the required classes from the package. Make sure your transcriptomic data is loaded into a `pandas.DataFrame`.

```python
import pandas as pd
from ACSCeND import Predictor, Deconvoluter
```

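
If you want to catch malformed input before it reaches the API, a small pre-flight check can help. The sketch below uses a hypothetical helper, `load_expression`, which is not part of ACSCeND. It reads a CSV with `index_col=0`, as the examples on this page do, and rejects non-numeric columns, duplicated row labels and missing values.

```python
import io

import pandas as pd


def load_expression(path_or_buffer) -> pd.DataFrame:
    """Read an expression CSV and run basic sanity checks.

    Hypothetical helper, not part of ACSCeND. The first column is used as
    the row index, matching index_col=0 in the examples on this page.
    """
    df = pd.read_csv(path_or_buffer, index_col=0)
    # Non-numeric columns usually mean a mis-parsed header or an extra
    # annotation column that should be dropped before analysis.
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")
    # Repeated row labels (e.g. duplicated gene names) make alignment ambiguous.
    if df.index.duplicated().any():
        raise ValueError("duplicate row labels found")
    if df.isna().to_numpy().any():
        raise ValueError("missing values found")
    return df


# Usage with an in-memory CSV; pass a file path such as 'classifier_test.csv' instead.
toy_csv = io.StringIO("gene,sample_1,sample_2\nGENE_A,10,0\nGENE_B,3,7\n")
expr = load_expression(toy_csv)
print(expr.shape)
```

The helper only validates; it does not normalize, since the classifier handles preprocessing itself.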
## 🔬 Step 2a: Stem Cell Type Classification

The `Predictor` class classifies transcriptomic samples into stem cell types.

⚠️ Note: The classifier handles all necessary preprocessing internally, so you may use either raw counts or normalized expression data.

❗ Do not use the raw model file from GitHub directly; use the Python API so that preprocessing and normalization are applied correctly.

### ✅ Example Usage

```python
import pandas as pd
from ACSCeND import Predictor

# Load sample input data
df_pred = pd.read_csv('classifier_test.csv', index_col=0)

# Initialize the classifier
predictor = Predictor()

# Predict stem cell types (returns class labels)
predictions = predictor(df_pred, prob=False)
print(predictions)

# Predict class probabilities
probabilities = predictor(df_pred, prob=True)
print(probabilities)
```

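
When you request probabilities, a per-sample summary of the most likely class and its probability can be handy. This sketch assumes, without confirmation from the ACSCeND documentation, that `probabilities` is a `pandas.DataFrame` with one row per sample and one column per class; check the actual return type first. `top_class` is a hypothetical helper and the toy numbers are invented.

```python
import pandas as pd


def top_class(prob: pd.DataFrame) -> pd.DataFrame:
    """Summarise a samples x classes probability table.

    Hypothetical helper. Assumes rows are samples and columns are class
    labels such as 'Pluripotent', 'Multipotent' and 'Unipotent'.
    """
    return pd.DataFrame(
        {
            "predicted": prob.idxmax(axis=1),  # most probable class per sample
            "confidence": prob.max(axis=1),    # probability of that class
        }
    )


# Made-up probabilities for two samples, for illustration only.
toy = pd.DataFrame(
    {"Pluripotent": [0.7, 0.1], "Multipotent": [0.2, 0.3], "Unipotent": [0.1, 0.6]},
    index=["sample_A", "sample_B"],
)
summary = top_class(toy)
print(summary)
```

Sorting `summary` by `confidence` is a quick way to find samples worth a closer look.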
## 🧬 Step 2b: Bulk RNA-seq Deconvolution

The `Deconvoluter` class infers cell-type compositions from bulk RNA-seq samples.

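
To make the idea of deconvolution concrete: a bulk expression profile can be modelled as a mixture of the signature profiles, weighted by cell-type fractions. The sketch below fits those weights with non-negative least squares (`scipy.optimize.nnls`), a classic baseline swapped in purely for illustration; it is not the model ACSCeND trains. It assumes the orientation listed under Required Inputs (bulk: samples × genes, signature: cell types × genes), and `nnls_fractions` is a hypothetical name.

```python
import pandas as pd
from scipy.optimize import nnls


def nnls_fractions(bulk: pd.DataFrame, sig: pd.DataFrame) -> pd.DataFrame:
    """Estimate cell-type fractions per sample with non-negative least squares.

    bulk: samples x genes; sig: cell types x genes. Illustrative baseline
    only, not ACSCeND's method.
    """
    genes = bulk.columns.intersection(sig.columns)  # fit on shared genes only
    A = sig[genes].to_numpy(dtype=float).T          # genes x cell types
    estimates = []
    for _, profile in bulk[genes].iterrows():
        # Solve min ||A w - b|| subject to w >= 0 for this sample.
        w, _residual = nnls(A, profile.to_numpy(dtype=float))
        total = w.sum()
        # Rescale so fractions sum to 1, removing the overall expression scale.
        estimates.append(w / total if total > 0 else w)
    return pd.DataFrame(estimates, index=bulk.index, columns=sig.index)


# Toy signature: two cell types, three genes (made-up values).
sig = pd.DataFrame([[10.0, 0.0, 5.0], [0.0, 8.0, 5.0]],
                   index=["T_cell", "B_cell"], columns=["G1", "G2", "G3"])
# Sample s1 mixes 25% T cells and 75% B cells, then doubles the total signal.
bulk = pd.DataFrame([2 * (0.25 * sig.loc["T_cell"] + 0.75 * sig.loc["B_cell"])],
                    index=["s1"])
print(nnls_fractions(bulk, sig).round(3))
```

Because the weights are rescaled to sum to 1, doubling the bulk signal leaves the recovered fractions (0.25 and 0.75) unchanged.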
### ✅ Required Inputs

- `data`: bulk RNA-seq gene expression matrix (samples × genes) [for training]
- `sig`: signature gene expression matrix (cell types × genes) [for training]
- `freq`: known cell-type fractions [for training]
- `org`: independent bulk test dataset [for inference]
- `normalized`: whether the input data is already normalized (`True`/`False`)

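
Before training, it can be worth checking that the three training tables agree with one another. The function below is a hypothetical pre-flight check, not part of ACSCeND. It assumes `data` is samples × genes and `sig` is cell types × genes, as listed above, and additionally assumes `freq` is samples × cell types with fractions summing to 1 per sample; transpose with `.T` if your files are stored the other way round.

```python
import pandas as pd


def check_training_inputs(data: pd.DataFrame, sig: pd.DataFrame,
                          freq: pd.DataFrame, tol: float = 0.01) -> list[str]:
    """Return a list of problems found in the training inputs (empty if none).

    Hypothetical helper, not part of ACSCeND. Assumed orientation:
    data = samples x genes, sig = cell types x genes, freq = samples x cell types.
    """
    problems = []
    if len(data.columns.intersection(sig.columns)) == 0:
        problems.append("data and sig share no gene labels")
    if set(sig.index) != set(freq.columns):
        problems.append("cell types in sig and freq do not match")
    if set(data.index) != set(freq.index):
        problems.append("samples in data and freq do not match")
    # Each sample's fractions should sum to about 1; percentages are flagged here.
    off = (freq.sum(axis=1) - 1).abs() > tol
    if off.any():
        problems.append(f"fractions do not sum to 1 for samples: {off[off].index.tolist()}")
    return problems


genes = ["G1", "G2", "G3"]
data = pd.DataFrame([[5, 12, 10], [6, 3.2, 5]], index=["s1", "s2"], columns=genes)
sig = pd.DataFrame([[10, 0, 5], [0, 8, 5]], index=["T_cell", "B_cell"], columns=genes)
freq = pd.DataFrame([[0.25, 0.75], [0.6, 0.4]], index=["s1", "s2"],
                    columns=["T_cell", "B_cell"])
print(check_training_inputs(data, sig, freq))  # prints []
```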
### ✅ Example Usage

```python
from ACSCeND import Deconvoluter

# File paths
data_path = 'PBMC_PSEUDOBULK.csv'
signature_path = 'signature_matrix_pbmc.csv'
true_frequencies = 'cell_fractions_PBMC.csv'
independent_data = 'PBMC_PSEUDOBULK_INDEPENDENT.csv'

# Run deconvolution (set normalized=False for raw counts)
org_freq = Deconvoluter(
    data_path,
    signature_path,
    true_frequencies,
    independent_data,
    normalized=False,
)

# org_freq contains the estimated cell-type frequencies for each bulk sample
print(org_freq)
```

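
When known fractions are available for the samples you deconvolved (for example, with pseudobulk data), you can score the estimates. This sketch assumes `org_freq` can be arranged as a `pandas.DataFrame` of samples × cell types, which the documentation does not confirm. `compare_fractions` is a hypothetical helper that reports per-cell-type Pearson correlation and RMSE over the shared samples and cell types.

```python
import numpy as np
import pandas as pd


def compare_fractions(est: pd.DataFrame, truth: pd.DataFrame) -> pd.DataFrame:
    """Per-cell-type Pearson r and RMSE between estimated and known fractions.

    Hypothetical helper, not part of ACSCeND. Both tables are assumed to be
    samples x cell types; only shared samples and cell types are compared.
    """
    samples = est.index.intersection(truth.index)
    cells = est.columns.intersection(truth.columns)
    e, t = est.loc[samples, cells], truth.loc[samples, cells]
    return pd.DataFrame(
        {
            "pearson_r": e.corrwith(t),                # column-wise correlation
            "rmse": np.sqrt(((e - t) ** 2).mean()),   # root-mean-square error
        }
    )


# Made-up example: estimates offset by a constant 0.1 from the truth.
truth = pd.DataFrame({"T_cell": [0.2, 0.5, 0.8], "B_cell": [0.8, 0.5, 0.2]},
                     index=["s1", "s2", "s3"])
est = truth.copy()
est["T_cell"] += 0.1
est["B_cell"] -= 0.1
result = compare_fractions(est, truth)
print(result)
```

The toy case shows why both metrics are useful: a constant bias leaves the correlation perfect while RMSE still reports the 0.1 error.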
## 📎 Notes

- Input gene expression data should be provided as a `.csv` file where rows are genes and columns are samples.
- Ensure consistent gene naming across the bulk data and signature matrices.
- If you are running on a large dataset, ensure enough memory is available for processing.

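
Two common sources of inconsistent gene naming are stray whitespace and versioned Ensembl IDs (for example `ENSG00000141510.16` versus `ENSG00000141510`). The sketch below is a hypothetical helper, not part of ACSCeND, that strips both and merges labels that become identical by summing them; summing suits counts, while averaging or keeping the first occurrence are reasonable alternatives. `axis=1` treats genes as columns and `axis=0` as rows, so it works with either orientation.

```python
import pandas as pd

# Matches versioned Ensembl gene IDs such as ENSG00000141510.16 (or ENSMUSG...).
ENSEMBL_VERSION = r"^(ENS[A-Z]*G\d+)\.\d+$"


def harmonize_genes(df: pd.DataFrame, axis: int = 1) -> pd.DataFrame:
    """Clean gene labels on the given axis (1 = columns, 0 = rows).

    Hypothetical helper, not part of ACSCeND.
    """
    genes_as_rows = df.T if axis == 1 else df
    labels = (genes_as_rows.index.astype(str)
              .str.strip()                                        # drop stray whitespace
              .str.replace(ENSEMBL_VERSION, r"\1", regex=True))   # drop version suffix
    # Sum rows whose labels collapsed onto the same gene, keeping first-seen order.
    merged = genes_as_rows.set_axis(labels, axis=0).groupby(level=0, sort=False).sum()
    return merged.T if axis == 1 else merged


bulk = pd.DataFrame([[1, 2, 3]], index=["s1"],
                    columns=["ENSG00000141510.16", "ENSG00000136997.21",
                             " ENSG00000136997.22"])
clean = harmonize_genes(bulk)
print(list(clean.columns))
```

After cleaning both tables the same way, `clean.columns.intersection(...)` against the signature's gene labels shows how many genes actually overlap.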
## 📚 References

- For full details and source code: ACSCeND GitHub Repository
- Package: ACSCeND