mioXpektron.utils.analysis

Comprehensive analysis for Cancer vs Control ToF-SIMS-like intensity tables.

Input CSV requirements

  • Columns:
    • ‘SampleName’ : sample ID (string)

    • ‘Group’ : class label (‘Cancer’ or ‘Control’)

    • Remaining cols : numeric features (m/z intensities); header names are m/z values

  • File should already be imputed and non-negative if you enable cNMF.
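The input requirements above can be checked before running the analysis. The following is a minimal sketch, not the module's own loader: the helper name `load_intensity_table` and the demo m/z headers are hypothetical, and only the column names 'SampleName' and 'Group' and the two class labels come from the description above.

```python
import io

import pandas as pd


def load_intensity_table(source):
    """Hypothetical helper: split the CSV into sample IDs, labels, and features."""
    df = pd.read_csv(source)
    missing = {"SampleName", "Group"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    unexpected = set(df["Group"]) - {"Cancer", "Control"}
    if unexpected:
        raise ValueError(f"unexpected Group labels: {sorted(map(str, unexpected))}")
    # Every remaining column is treated as an m/z intensity and must be numeric.
    features = df.drop(columns=["SampleName", "Group"]).apply(
        pd.to_numeric, errors="raise"
    )
    return df["SampleName"], df["Group"], features


# Tiny in-memory table standing in for a real file (made-up values).
demo_csv = io.StringIO(
    "SampleName,Group,27.0229,41.0386\n"
    "s1,Cancer,10.0,0.5\n"
    "s2,Control,3.0,1.5\n"
)
names, groups, X = load_intensity_table(demo_csv)

# cNMF needs a complete, non-negative matrix.
cnmf_ready = bool(X.notna().all().all() and (X >= 0).all().all())
```

Passing a file path instead of the `StringIO` object works the same way, since `pandas.read_csv` accepts both.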

What this script produces

In the output directory (default: ./analysis_outputs), it writes:

  • label_counts.csv

  • univariate_results.csv (Welch t-test per feature, log2 fold-change, BH-FDR q-values)

  • volcano.png

  • pca.png

  • umap.png (optional; written only if the --umap flag is set and umap-learn is installed)

  • roc_logistic.png, roc_random_forest.png

  • model_performance.csv

  • importance_lr_l1.png, importance_rf.png

  • heatmap_top{N}.png (heatmap of the top-N features by FDR)

  • embeddings.csv (PCA coordinates, plus UMAP if requested)

If --cnmf is provided, it also writes:

  • cnmf_summary_k{K}.txt

  • cnmf_consensus_k{K}.npy

  • cnmf_PAC_vs_k.csv

  • cnmf_consensus_best.png

  • cnmf_W_best.npy, cnmf_H_best.npy

  • cnmf_factor_{j}_top_features.csv (top m/z contributors per factor)

  • cnmf_factor_{j}_bar.png (bar plot of top contributors)

Usage

python analyze_breast_spectra.py --input aligned_peaks_intensity_breast_new_imputed_rf.csv --outdir analysis_outputs --topn 25 --umap --cnmf --k_list 3 4 5 6 7 --cnmf_reps 30 --cnmf_beta KL

Notes

  • Welch t-tests (unequal variances) + Benjamini–Hochberg FDR control.

  • PCA on log1p-standardized intensities.

  • Classifiers: Logistic Regression (L1, saga) and Random Forest.

  • cNMF performs multiple NMF runs per k, aligns factors (Hungarian matching), builds a consensus co-clustering matrix, computes the PAC (proportion of ambiguous clustering) stability score, and selects k from it.
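The classifier pairing named in the notes (L1-penalized logistic regression with the saga solver, and a random forest) can be sketched with scikit-learn as follows. The synthetic data, the hyperparameters, and the 5-fold cross-validated AUC are assumptions made for illustration; the notes do not say how the module splits or evaluates the data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for standardized intensities: 40 samples, 8 features,
# with only the first feature carrying class signal.
rng = np.random.default_rng(0)
y01 = np.repeat([0, 1], 20)
X_scaled = rng.normal(size=(40, 8))
X_scaled[:, 0] += 2.0 * y01

models = {
    # saga is one of the scikit-learn solvers that supports the L1 penalty.
    "lr_l1": LogisticRegression(
        penalty="l1", solver="saga", C=1.0, max_iter=5000, random_state=0
    ),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Assumed evaluation scheme: out-of-fold probabilities from stratified 5-fold CV.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = {}
for name, model in models.items():
    proba = cross_val_predict(
        model, X_scaled, y01, cv=cv, method="predict_proba"
    )[:, 1]
    aucs[name] = roc_auc_score(y01, proba)
```

With an L1 penalty, many coefficients are driven to exactly zero, which is what makes the logistic model's coefficients usable as a sparse feature-importance ranking.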

References (methods; general, not version-specific)

  • Welch’s t-test: Welch, 1947; BH-FDR: Benjamini & Hochberg, 1995.

  • PCA: Pearson, 1901; Hotelling, 1933.

  • Logistic regression & L1 regularization: Tibshirani, 1996 (Lasso).

  • Random Forest: Breiman, 2001.

  • UMAP: McInnes et al., 2018.

  • NMF (MU updates): Lee & Seung, 2001; cNMF: Brunet et al., 2004; survey in Berry et al., 2007.

Author’s note

  • I recommend KL loss for count-like data. This is common practice in mass-spec intensity modeling and is consistent with Poisson-like noise assumptions and the NMF literature (an opinion grounded in the works cited above).
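To make the KL recommendation concrete, here is a minimal sketch using scikit-learn's `NMF` with the Kullback-Leibler beta-loss, which requires the multiplicative-update solver. The Poisson-sampled matrix, the rank of 3, and the `nndsvda` initialization are assumptions for illustration; the text above does not say which NMF implementation or settings the module uses internally.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic count-like data: Poisson samples around a rank-3 non-negative product.
rng = np.random.default_rng(0)
W_true = rng.gamma(2.0, 1.0, size=(20, 3))
H_true = rng.gamma(2.0, 1.0, size=(3, 10))
X_pos = rng.poisson(W_true @ H_true).astype(float)

# In scikit-learn, any beta_loss other than 'frobenius' needs solver='mu'.
nmf_kl = NMF(
    n_components=3,
    solver="mu",
    beta_loss="kullback-leibler",
    init="nndsvda",
    max_iter=1000,
    random_state=0,
)
W = nmf_kl.fit_transform(X_pos)  # samples x factors
H = nmf_kl.components_           # factors x features
```

Minimizing the generalized KL divergence corresponds to maximum likelihood under a Poisson noise model, which is the link to count-like intensities mentioned in the note.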

Functions

bh_fdr(pvals)

Benjamini–Hochberg FDR for a 1D array of p-values.

choose_k_by_pac(results)

compute_univariate_tests(X, y)

Welch t-test per feature and log2 fold-change (Cancer / Control).

ensure_dir(path)

main(input_file, outdir[, topn, umap, cnmf, ...])

plot_heatmap_top_features(X, y, res, outpath)

plot_pca(X_scaled, y, outpath)

plot_umap(X_scaled, y, outpath[, ...])

plot_volcano(res, outpath[, q_thresh, fc_thresh])

run_cnmf(X_pos, k_list[, R, max_iter, beta, ...])

Consensus NMF across k values.

run_models(X_scaled, y01, features, outdir)

save_consensus_heatmap(consensus, labels, ...)

save_factor_bars(H, feature_names, outdir[, ...])

mioXpektron.utils.analysis.ensure_dir(path)[source]
Parameters:

path (str)

mioXpektron.utils.analysis.bh_fdr(pvals)[source]

Benjamini–Hochberg FDR for a 1D array of p-values.

Parameters:

pvals (ndarray)

Return type:

ndarray
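For orientation, the standard Benjamini–Hochberg step-up adjustment can be written in a few lines of NumPy. This is a sketch of the textbook procedure under the name `bh_fdr_sketch` (hypothetical); it is not claimed to match the module's implementation line for line.

```python
import numpy as np


def bh_fdr_sketch(pvals):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q-value is the minimum of itself and all
    # scaled values at larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


q = bh_fdr_sketch([0.01, 0.04, 0.03, 0.20])
```

In this example the raw scaled value at rank 2 (0.06) exceeds the one at rank 3 (about 0.053), so the monotonicity step pulls it down to the rank-3 value.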

mioXpektron.utils.analysis.compute_univariate_tests(X, y)[source]

Welch t-test per feature and log2 fold-change (Cancer / Control).

Return type:

DataFrame
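A per-feature Welch test with a Cancer / Control log2 fold-change might look like the sketch below. The function name, the output column names, and the small `eps` added to the group means to avoid division by zero are all assumptions; the entry above does not describe how the module handles zero means or names its columns.

```python
import numpy as np
import pandas as pd
from scipy import stats


def univariate_sketch(X, y, feature_names, eps=1e-9):
    """Hypothetical helper: Welch t-test and log2(Cancer / Control) per column."""
    X = np.asarray(X, dtype=float)
    is_cancer = np.asarray(y) == "Cancer"
    cancer, control = X[is_cancer], X[~is_cancer]
    # equal_var=False selects Welch's unequal-variance t-test.
    t_stat, p_val = stats.ttest_ind(cancer, control, axis=0, equal_var=False)
    # eps is an illustrative guard against zero means, not a documented choice.
    log2fc = np.log2((cancer.mean(axis=0) + eps) / (control.mean(axis=0) + eps))
    return pd.DataFrame(
        {"feature": list(feature_names), "t": t_stat, "p": p_val, "log2fc": log2fc}
    )


# Made-up intensities: the first feature differs between groups, the second does not.
X_demo = [[8, 1.0], [10, 1.2], [12, 0.8], [2, 1.0], [3, 1.1], [1, 0.9]]
y_demo = ["Cancer"] * 3 + ["Control"] * 3
res = univariate_sketch(X_demo, y_demo, ["27.0229", "41.0386"])
```

Here the first feature has group means of 10 and 2, so its log2 fold-change is log2(5); the p-values from `res["p"]` are what a BH-FDR step would then adjust.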

mioXpektron.utils.analysis.plot_volcano(res, outpath, q_thresh=0.05, fc_thresh=1.0)[source]

mioXpektron.utils.analysis.plot_pca(X_scaled, y, outpath)[source]

Return type:

Tuple[ndarray, ndarray]
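The notes describe PCA on log1p-standardized intensities. A minimal sketch of that preprocessing and projection with scikit-learn follows; the synthetic data and the choice of two components are assumptions. The entry above does not say what the two returned arrays are, so computing the scores and the explained-variance ratios here is only a guess at a plausible pair.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic non-negative intensities: 12 samples, 6 features.
rng = np.random.default_rng(0)
X_raw = rng.gamma(shape=2.0, scale=50.0, size=(12, 6))

# log1p compresses the dynamic range; standardizing then gives each feature
# zero mean and unit variance so no single peak dominates the components.
X_scaled = StandardScaler().fit_transform(np.log1p(X_raw))

pca = PCA(n_components=2, random_state=0)
scores = pca.fit_transform(X_scaled)       # sample coordinates on PC1 and PC2
explained = pca.explained_variance_ratio_  # variance fraction per component
```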

mioXpektron.utils.analysis.plot_umap(X_scaled, y, outpath, n_neighbors=15, min_dist=0.1)[source]

mioXpektron.utils.analysis.plot_heatmap_top_features(X, y, res, outpath, top_n=25)[source]

mioXpektron.utils.analysis.run_models(X_scaled, y01, features, outdir, seed=0)[source]

mioXpektron.utils.analysis.run_cnmf(X_pos, k_list, R=30, max_iter=1000, beta='frobenius', random_seeds=None, outdir=None)[source]

Consensus NMF across k values. Returns dict[k] with W_mean, H_mean, consensus, PAC, W_list, H_list.

Return type:

Dict[int, Dict[str, object]]
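The three building blocks named for cNMF (Hungarian factor alignment, a consensus co-clustering matrix, and the PAC score) can be sketched independently of any NMF solver. All function names below are hypothetical, cosine similarity is an assumed matching criterion, and the PAC bounds of 0.1 and 0.9 are commonly used defaults rather than values confirmed for this module.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_factors(H_ref, H_run):
    """Row order of H_run that best matches H_ref (assumes no all-zero rows)."""
    a = H_ref / np.linalg.norm(H_ref, axis=1, keepdims=True)
    b = H_run / np.linalg.norm(H_run, axis=1, keepdims=True)
    similarity = a @ b.T
    # Hungarian matching minimizes cost, so negate the cosine similarity.
    _, cols = linear_sum_assignment(-similarity)
    return cols


def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples shares a cluster label."""
    labels = np.asarray(label_runs)                      # shape (runs, samples)
    together = labels[:, :, None] == labels[:, None, :]  # (runs, n, n)
    return together.mean(axis=0)


def pac_score(consensus, lower=0.1, upper=0.9):
    """Proportion of sample pairs whose consensus is ambiguous (between bounds)."""
    vals = consensus[np.triu_indices_from(consensus, k=1)]
    return float(np.mean((vals > lower) & (vals < upper)))


# Alignment demo: a second run returns the same factors in a different order.
H_ref = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]])
H_run = H_ref[[2, 0, 1]]
order = align_factors(H_ref, H_run)

# Stability demo: identical labels across runs versus labels that keep changing.
pac_stable = pac_score(consensus_matrix([[0, 0, 1, 1]] * 3))
pac_unstable = pac_score(
    consensus_matrix([[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]])
)
```

In a full pipeline, each run's hard labels would typically come from the dominant factor per sample (the argmax over a row of the aligned W), which is why alignment has to happen before the consensus matrix is built.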

mioXpektron.utils.analysis.choose_k_by_pac(results)[source]
Parameters:

results (Dict[int, Dict[str, object]])

Return type:

int
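Given the `run_cnmf` result structure (a dict keyed by k, each entry holding a PAC value), selecting k can be as simple as taking the minimum PAC. The sketch below assumes the key is spelled "PAC" as in the `run_cnmf` description and that ties are broken toward the smaller k; the tie-break rule and the function name are assumptions.

```python
def choose_k_sketch(results):
    """Pick the k with the lowest PAC; ties resolve to the smallest k."""
    # Iterating over sorted keys makes min() return the smallest k on ties.
    return min(sorted(results), key=lambda k: results[k]["PAC"])


# Made-up PAC values for illustration only.
results = {
    3: {"PAC": 0.21},
    4: {"PAC": 0.08},
    5: {"PAC": 0.08},
    6: {"PAC": 0.30},
}
best_k = choose_k_sketch(results)
```

A lower PAC means fewer sample pairs with ambiguous co-clustering, so the selected k is the one whose consensus matrix is closest to a clean block structure.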

mioXpektron.utils.analysis.save_consensus_heatmap(consensus, labels, outpath)[source]

mioXpektron.utils.analysis.save_factor_bars(H, feature_names, outdir, topm=15)[source]

mioXpektron.utils.analysis.main(input_file, outdir, topn=25, umap=False, cnmf=False, k_list=None, cnmf_reps=30, cnmf_beta='frobenius')[source]