mioXpektron.utils.analysis
Comprehensive analysis for Cancer vs Control ToF-SIMS-like intensity tables.
Input CSV requirements
- Columns:
‘SampleName’ : sample ID (string)
‘Group’ : class label (‘Cancer’ or ‘Control’)
Remaining cols : numeric features (m/z intensities); header names are m/z values
File should already be imputed and non-negative if you enable cNMF.
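The requirements above can be checked before running the analysis. The following is a minimal sketch using only the standard library. The helper name `check_input_table` is hypothetical and not part of this module, and it assumes that every feature header parses as a number. It treats missing or negative values as problems, which matters mainly when cNMF is enabled.

```python
import csv
import io
import math

ID_COLUMNS = ("SampleName", "Group")
LABELS = ("Cancer", "Control")


def check_input_table(handle):
    """Return a list of problems found in the table (empty if none).

    Hypothetical helper for illustration; not part of mioXpektron.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in ID_COLUMNS if c not in header]
    features = [c for c in header if c not in ID_COLUMNS]
    for name in features:
        try:
            float(name)  # assumption: feature headers parse as m/z numbers
        except ValueError:
            problems.append(f"header is not an m/z value: {name}")
    # Data rows start on line 2 of the file (line 1 is the header).
    for line_no, row in enumerate(reader, start=2):
        if row.get("Group") not in LABELS:
            problems.append(f"line {line_no}: unexpected Group {row.get('Group')!r}")
        for name in features:
            try:
                value = float(row[name])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: non-numeric value in {name}")
                continue
            if math.isnan(value) or value < 0:
                problems.append(f"line {line_no}: NaN or negative value in {name}")
    return problems


demo = io.StringIO(
    "SampleName,Group,12.01,27.02\n"
    "s1,Cancer,1.0,2.5\n"
    "s2,Healthy,0.0,-3\n"
)
problems = check_input_table(demo)
for problem in problems:
    print(problem)
```

On a real file, pass an open file handle, for example `check_input_table(open(path, newline=""))`.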
What this script produces
In the output directory (default: ./analysis_outputs), it writes:
- label_counts.csv
- univariate_results.csv (Welch t-test per feature, log2 fold-change, BH-FDR q-values)
- volcano.png
- pca.png
- (optional) umap.png, if the --umap flag is set and umap-learn is installed
- roc_logistic.png, roc_random_forest.png
- model_performance.csv
- importance_lr_l1.png, importance_rf.png
- heatmap_top{N}.png: heatmap of top-N features by FDR
- embeddings.csv: PCA (and UMAP if requested)
- If --cnmf is provided:
cnmf_summary_k{K}.txt
cnmf_consensus_k{K}.npy
cnmf_PAC_vs_k.csv
cnmf_consensus_best.png
cnmf_W_best.npy, cnmf_H_best.npy
cnmf_factor_{j}_top_features.csv (top m/z contributors per factor)
cnmf_factor_{j}_bar.png (bar plot of top contributors)
Usage
python analyze_breast_spectra.py --input aligned_peaks_intensity_breast_new_imputed_rf.csv --outdir analysis_outputs --topn 25 --umap --cnmf --k_list 3 4 5 6 7 --cnmf_reps 30 --cnmf_beta KL
Notes
Welch t-tests (unequal variances) + Benjamini–Hochberg FDR control.
PCA on log1p-standardized intensities.
Classifiers: Logistic Regression (L1, saga) and Random Forest.
cNMF implements multiple NMF runs per k, aligns factors (Hungarian matching), builds a consensus co-clustering matrix, computes PAC stability, and selects k.
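The "log1p-standardized" preprocessing named in the PCA note can be sketched as follows. This is an illustration under assumptions rather than the module's actual code. The function name is hypothetical. It z-scores each feature with the population standard deviation, and it leaves constant features at zero instead of dividing by zero.

```python
import math
import statistics


def log1p_standardize(rows):
    """log1p-transform each intensity, then z-score each feature (column).

    Hypothetical helper for illustration; `rows` is a list of samples,
    each a list of non-negative intensities.
    """
    logged = [[math.log1p(v) for v in row] for row in rows]
    columns = list(zip(*logged))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column has spread 0,
    # so its scale falls back to 1.0 and its z-scores are all 0.
    scales = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, scales)]
        for row in logged
    ]


intensities = [
    [0.0, 10.0],
    [math.e - 1.0, 10.0],
]
scaled = log1p_standardize(intensities)
```

The log1p step compresses the wide dynamic range typical of intensity data, and the per-feature scaling keeps high-intensity peaks from dominating the principal components.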
References (methods; general, not version-specific)
Welch’s t-test: Welch, 1947; BH-FDR: Benjamini & Hochberg, 1995.
PCA: Pearson, 1901; Hotelling, 1933.
Logistic regression & L1 regularization: Tibshirani, 1996 (Lasso).
Random Forest: Breiman, 2001.
UMAP: McInnes et al., 2018.
NMF (MU updates): Lee & Seung, 2001; cNMF: Brunet et al., 2004; survey in Berry et al., 2007.
- mioXpektron.utils.analysis.bh_fdr(pvals)
Benjamini–Hochberg FDR for a 1D array of p-values.
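For orientation, the Benjamini–Hochberg adjustment can be written in a few lines of plain Python. This is a sketch of the standard procedure, not the source of `bh_fdr`. The name `bh_qvalues` is hypothetical, and the input is assumed to contain no NaN values.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order.

    Hypothetical helper for illustration; assumes no NaN values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never increase
    # as p-values decrease (the step-up monotonicity correction).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
```

A feature is then called significant at FDR level alpha when its q-value is at most alpha.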
- mioXpektron.utils.analysis.compute_univariate_tests(X, y)
Welch t-test per feature and log2 fold-change (Cancer / Control).
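The two per-feature quantities can be illustrated for a single feature as below. This is a sketch, not the function's implementation. Both helper names are hypothetical, and the small `eps` pseudocount in the fold-change is an assumption made here to avoid division by zero. Turning the statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def log2_fold_change(cancer, control, eps=1e-9):
    """log2 of mean(Cancer) / mean(Control); eps is an assumed pseudocount."""
    return math.log2((statistics.fmean(cancer) + eps) / (statistics.fmean(control) + eps))


cancer = [4.0, 6.0, 8.0]
control = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(cancer, control)
lfc = log2_fold_change(cancer, control)
```

A positive fold-change means the feature is higher on average in Cancer than in Control, matching the Cancer / Control orientation stated above.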
- mioXpektron.utils.analysis.run_cnmf(X_pos, k_list, R=30, max_iter=1000, beta='frobenius', random_seeds=None, outdir=None)
Consensus NMF across k values. Returns dict[k] with W_mean, H_mean, consensus, PAC, W_list, H_list.
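The consensus and PAC entries of the result can be understood through the sketch below. It is an illustration under assumptions, not the code of `run_cnmf`. Each run is assumed to yield one cluster label per sample (for instance the factor with the largest loading in W), and the 0.1 and 0.9 bounds for PAC are a common convention assumed here. Both function names are hypothetical.

```python
def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples shares a cluster label."""
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(label_runs)
    return [[c / runs for c in row] for row in counts]


def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of distinct sample pairs whose consensus is ambiguous,
    meaning strictly between the lower and upper bounds."""
    n = len(consensus)
    pairs = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for c in pairs if lower < c < upper)
    return ambiguous / len(pairs)


# Three runs over four samples; sample 1 switches cluster in the last run.
runs = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
C = consensus_matrix(runs)
score = pac(C)
```

Because only co-membership is counted, the result does not depend on how each run numbers its clusters. A lower PAC indicates a more stable clustering, so comparing PAC across values of k is one way to choose k.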