mioXpektron.utils.analysis

Deprecated study-level analysis entry point.

Use mioXpektron.analysis instead. This module re-exports the public API and keeps a CLI-compatible main() for legacy scripts.

Functions

`ensure_dir`(path)
`main`(input_file, outdir[, topn, umap, tsne, ...])	Legacy CLI wrapper around `AnalysisWorkflow`.

class mioXpektron.utils.analysis.AnalysisConfig(outdir='analysis_outputs', label_col='Group', sample_col='SampleName', group_a=None, group_b=None, reference_group=None, top_n_features=25, transform='log1p', random_state=0, embedding_methods=None, run_umap=False, run_tsne=False, umap_n_neighbors=15, umap_min_dist=0.1, tsne_perplexity=30.0, run_ml_benchmark=False, include_xgboost=True, ml_top_n_plot=10, run_ml_tuning=False, ml_tune_top_n=3, run_shap=False, run_cnmf=False, cnmf_k_list=None, cnmf_reps=30, cnmf_beta='frobenius', cnmf_top_features=15)[source]

Bases: object

Configuration for AnalysisWorkflow.

Parameters:

outdir (str)
label_col (str)
sample_col (str)
group_a (str | None)
group_b (str | None)
reference_group (str | None)
top_n_features (int)
transform (str)
random_state (int)
embedding_methods (List[str] | None)
run_umap (bool)
run_tsne (bool)
umap_n_neighbors (int)
umap_min_dist (float)
tsne_perplexity (float)
run_ml_benchmark (bool)
include_xgboost (bool)
ml_top_n_plot (int)
run_ml_tuning (bool)
ml_tune_top_n (int)
run_shap (bool)
run_cnmf (bool)
cnmf_k_list (List[int] | None)
cnmf_reps (int)
cnmf_beta (str)
cnmf_top_features (int)

outdir: str = 'analysis_outputs'

label_col: str = 'Group'

sample_col: str = 'SampleName'

group_a: str | None = None

group_b: str | None = None

reference_group: str | None = None

top_n_features: int = 25

transform: str = 'log1p'

random_state: int = 0

embedding_methods: List[str] | None = None

run_umap: bool = False

run_tsne: bool = False

umap_n_neighbors: int = 15

umap_min_dist: float = 0.1

tsne_perplexity: float = 30.0

run_ml_benchmark: bool = False

include_xgboost: bool = True

ml_top_n_plot: int = 10

run_ml_tuning: bool = False

ml_tune_top_n: int = 3

run_shap: bool = False

run_cnmf: bool = False

cnmf_k_list: List[int] | None = None

cnmf_reps: int = 30

cnmf_beta: str = 'frobenius'

cnmf_top_features: int = 15

class mioXpektron.utils.analysis.AnalysisWorkflow(data, config=None, *, models=None)[source]

Bases: object

Orchestrate univariate stats, embeddings, ML, and optional cNMF.

Parameters:

data (pd.DataFrame)
config (Optional[AnalysisConfig])
models (Optional[Mapping[str, Any]])

results: Dict[str, Any]

run()[source]

Execute the configured analysis pipeline and write outputs.

Return type:

Dict[str, Any]

mioXpektron.utils.analysis.bh_fdr(pvals)[source]

Benjamini–Hochberg FDR correction for a 1D array of p-values.

Parameters:

pvals (ndarray)

Return type:

ndarray
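
For readers unfamiliar with the procedure, the following is a minimal NumPy sketch of the standard Benjamini–Hochberg adjustment. It is not taken from the library source; the name bh_fdr_sketch is hypothetical, and details such as NaN handling in the real bh_fdr may differ.

```python
import numpy as np

def bh_fdr_sketch(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1D array.

    Hypothetical helper for illustration; not the library's implementation.
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                          # indices of ascending p-values
    ranked = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # Enforce monotonicity, working down from the largest rank.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n, dtype=float)
    q[order] = np.clip(ranked, 0.0, 1.0)           # restore the input order
    return q

q = bh_fdr_sketch([0.01, 0.04, 0.03, 0.20])
```

The adjusted values keep the order of the input array, so they can be attached directly to a per-feature results table.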

mioXpektron.utils.analysis.choose_k_by_pac(results)[source]

Select the rank with the lowest PAC score.

Parameters:

results (Dict[int, Dict[str, object]])

Return type:

int
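
PAC (proportion of ambiguous clustering) is commonly defined as the fraction of sample pairs whose consensus value lies strictly between a lower and an upper bound. The sketch below assumes the usual bounds of 0.1 and 0.9 and a plain mapping from rank to consensus matrix. The layout of the real results dictionary is not documented here, so pac_score and choose_k_sketch are hypothetical helpers.

```python
import numpy as np

def pac_score(consensus, lower=0.1, upper=0.9):
    """Fraction of off-diagonal pairs with an ambiguous consensus value.

    The bounds 0.1 / 0.9 are a common convention, assumed here.
    """
    c = np.asarray(consensus, dtype=float)
    vals = c[np.triu_indices_from(c, k=1)]     # each sample pair once
    return float(np.mean((vals > lower) & (vals < upper)))

def choose_k_sketch(consensus_by_k):
    """Return the rank with the lowest PAC, plus all scores."""
    scores = {k: pac_score(c) for k, c in consensus_by_k.items()}
    return min(scores, key=scores.get), scores

# Toy input: a crisp two-cluster consensus and an ambiguous one.
crisp = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], float)
fuzzy = np.array([[1, .5, 0, 0], [.5, 1, .5, 0], [0, .5, 1, .5], [0, 0, .5, 1]], float)
best_k, pac_by_k = choose_k_sketch({2: crisp, 3: fuzzy})
```

A low PAC means most sample pairs are either always or never clustered together across runs, which is why the lowest score is preferred.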

mioXpektron.utils.analysis.compute_univariate_tests(X, y, *, group_a=None, group_b=None, reference_group=None, eps=1e-12)[source]

Welch t-test per feature with log2 fold-change (group_a / group_b).

When group_a and group_b are omitted, the two largest groups by sample count are compared. reference_group sets the denominator for log2 fold-change and defaults to group_b.

Parameters:

X (DataFrame)
y (Series)
group_a (str | None)
group_b (str | None)
reference_group (str | None)
eps (float)

Return type:

DataFrame
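
The statistics described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's code: welch_and_log2fc is a hypothetical name, the inputs are already split into two sample-by-feature arrays, and eps is assumed to guard the fold-change ratio against division by zero.

```python
import numpy as np

def welch_and_log2fc(a, b, eps=1e-12):
    """Per-feature Welch t statistic, degrees of freedom, and log2 fold-change.

    a, b: (samples x features) arrays for group_a and group_b.
    Hypothetical helper; the role of eps is an assumption.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Variance of each group mean: unbiased sample variance divided by n.
    va = a.var(axis=0, ddof=1) / a.shape[0]
    vb = b.var(axis=0, ddof=1) / b.shape[0]
    t = (mean_a - mean_b) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.shape[0] - 1) + vb ** 2 / (b.shape[0] - 1))
    # group_b acts as the reference (denominator) here.
    log2fc = np.log2((mean_a + eps) / (mean_b + eps))
    return t, df, log2fc

group_a = np.array([[4.0, 1.0], [6.0, 3.0]])
group_b = np.array([[1.0, 1.0], [3.0, 3.0]])
t_stat, dof, lfc = welch_and_log2fc(group_a, group_b)
```

A two-sided p-value follows from the Student t distribution with the computed degrees of freedom; scipy.stats.ttest_ind(a, b, equal_var=False) performs the same test in one call.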

mioXpektron.utils.analysis.main(input_file, outdir, topn=25, umap=False, tsne=False, shap=False, cnmf=False, k_list=None, cnmf_reps=30, cnmf_beta='frobenius', ml_benchmark=False, ml_tuning=False)[source]

Legacy CLI wrapper around AnalysisWorkflow.

mioXpektron.utils.analysis.plot_heatmap_top_features(X, y, res, savepath, *, top_n=25, label_col='Group')[source]

Heatmap of top differential features (z-scored), samples ordered by group.

Parameters:

X (DataFrame)
y (Series)
res (DataFrame)
savepath (str)
top_n (int)
label_col (str)

Return type:

None
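
The two data-preparation steps named in the description, z-scoring each feature and ordering samples by group, might look like the sketch below. The function name and the choice of population standard deviation are assumptions; the plotting itself is omitted.

```python
import numpy as np

def zscore_and_order(X, labels):
    """Z-score each feature column, then sort rows by group label.

    Hypothetical helper; the library may scale or order differently.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=0)
    sd[sd == 0] = 1.0              # constant features stay at zero instead of NaN
    Z = (X - mu) / sd
    order = np.argsort(labels, kind="stable")   # keeps within-group order
    return Z[order], labels[order]

X_demo = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
Z_ord, labels_ord = zscore_and_order(X_demo, ["B", "A", "B"])
```

Z-scoring puts every feature on a comparable colour scale, so high-intensity m/z values do not dominate the heatmap.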

mioXpektron.utils.analysis.plot_pca(X_scaled, y, savepath, *, random_state=0)[source]

PCA scatter plot coloured by group labels.

Parameters:

X_scaled (ndarray)
y (Series)
savepath (str)
random_state (int)

Return type:

Tuple[ndarray, ndarray]
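
The meaning of the two returned arrays is not stated here. A plausible reading is the 2D sample coordinates and the explained variance ratio of the plotted components, and the sketch below computes both with a plain SVD. This is an assumption, as is the helper name pca_2d; the library's use of random_state suggests it may rely on a different solver.

```python
import numpy as np

def pca_2d(X_scaled):
    """First two principal-component scores and their explained variance ratio.

    Hypothetical helper using a deterministic SVD; not the library's code.
    """
    X = np.asarray(X_scaled, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * S[:2]                   # sample coordinates on PC1/PC2
    explained = S ** 2 / np.sum(S ** 2)         # variance share per component
    return scores, explained[:2]

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
scores, explained = pca_2d(X_demo)
```

The scores can be passed to any scatter-plot routine, with the variance ratios used as axis annotations.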

mioXpektron.utils.analysis.plot_umap(X_scaled, y, savepath, *, n_neighbors=15, min_dist=0.1, random_state=0)[source]

UMAP embedding plot when umap-learn is installed.

Parameters:

X_scaled (ndarray)
y (Series)
savepath (str)
n_neighbors (int)
min_dist (float)
random_state (int)

Return type:

ndarray | None

mioXpektron.utils.analysis.plot_volcano(res, savepath, *, group_a=None, group_b=None, q_thresh=0.05, fc_thresh=1.0)[source]

Volcano plot of log2 fold-change versus -log10(p-value).

Parameters:

res (DataFrame)
savepath (str)
group_a (str | None)
group_b (str | None)
q_thresh (float)
fc_thresh (float)

Return type:

None
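
How the two thresholds partition features can be sketched as follows. The sketch assumes q_thresh applies to FDR-adjusted p-values and fc_thresh to the absolute log2 fold-change; the exact comparison operators and the helper name volcano_classes are not taken from the library.

```python
import numpy as np

def volcano_classes(log2fc, qvals, q_thresh=0.05, fc_thresh=1.0):
    """Label each feature as 'up', 'down', or 'ns' (not significant).

    Hypothetical helper; strictness of the comparisons is an assumption.
    """
    log2fc = np.asarray(log2fc, dtype=float)
    qvals = np.asarray(qvals, dtype=float)
    significant = qvals < q_thresh
    classes = np.full(log2fc.shape, "ns", dtype=object)
    classes[significant & (log2fc >= fc_thresh)] = "up"
    classes[significant & (log2fc <= -fc_thresh)] = "down"
    return classes

classes = volcano_classes([2.0, -1.5, 0.3, 3.0], [0.01, 0.02, 0.001, 0.5])
```

With the default fc_thresh of 1.0, a feature must change at least two-fold between groups to be highlighted.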

mioXpektron.utils.analysis.prepare_matrix(df, *, label_col='Group', sample_col='SampleName', meta_cols=None, feature_cols=None, coerce_numeric=True, fill_na=0.0)[source]

Build a sample-by-feature matrix and group labels from pipeline output.

Accepts either:

A flat table with SampleName / Group columns and one column per m/z feature (typical exported CSV), or
An aligned matrix from align_peaks() where SampleName and optionally Group are index levels.

Parameters:

df (DataFrame) – Input table or aligned feature matrix.
label_col (str) – Column or index level containing group labels.
sample_col (str) – Column or index level containing sample identifiers.
meta_cols (Sequence[str] | None) – Additional metadata columns to exclude from features. Defaults to SampleName and Group only.
feature_cols (Sequence[str] | None) – Explicit feature column names. When omitted, all non-metadata columns are used.
coerce_numeric (bool) – If True, coerce feature columns to numeric (invalid values become NaN).
fill_na (float) – Value used to fill missing feature values after coercion.

Returns:

X – Feature matrix (samples x m/z), index aligned with meta.
y – Group labels indexed like X.
meta – Metadata frame with at least sample_col and label_col.

Return type:

Tuple[DataFrame, Series, DataFrame]
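
The behaviour described above can be approximated in a few lines of pandas. The sketch below is a simplified stand-in, not the library function: prepare_matrix_sketch is a hypothetical name, it ignores meta_cols and feature_cols, and it requires Group to be present in both input layouts.

```python
import pandas as pd

def prepare_matrix_sketch(df, label_col="Group", sample_col="SampleName", fill_na=0.0):
    """Split a table into features X, labels y, and metadata.

    Hypothetical, simplified stand-in for prepare_matrix.
    """
    # Aligned-matrix case: identifiers live in the index, so move them to columns.
    if sample_col not in df.columns:
        df = df.reset_index()
    meta = df[[sample_col, label_col]].set_index(sample_col, drop=False)
    feature_cols = [c for c in df.columns if c not in (sample_col, label_col)]
    # Coerce to numeric (invalid entries become NaN), then fill the gaps.
    X = df[feature_cols].apply(pd.to_numeric, errors="coerce").fillna(fill_na)
    X.index = meta.index
    y = meta[label_col]
    return X, y, meta

table = pd.DataFrame({
    "SampleName": ["s1", "s2"],
    "Group": ["A", "B"],
    "100.05": ["1.5", "bad"],     # a non-numeric entry to show coercion
    "200.10": [3.0, None],
})
X, y, meta = prepare_matrix_sketch(table)
X_idx, y_idx, _ = prepare_matrix_sketch(table.set_index(["SampleName", "Group"]))
```

Both input layouts produce the same matrix, which is the point of accepting either form.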

mioXpektron.utils.analysis.run_cnmf(X_pos, k_list, *, R=30, max_iter=1000, beta='frobenius', random_seeds=None, outdir=None)[source]

Run consensus NMF across multiple rank values.

Parameters:

X_pos (ndarray)
k_list (List[int])
R (int)
max_iter (int)
beta (str)
random_seeds (List[int] | None)
outdir (str | None)

Return type:

Dict[int, Dict[str, object]]
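
The general idea behind consensus NMF is to repeat the factorisation with different seeds and record how often each pair of samples lands in the same cluster. The sketch below illustrates this for a single rank using hand-written multiplicative updates for the Frobenius loss. It is an assumption-laden illustration: the function names are hypothetical, n_runs plays the role that R appears to play above, and the library's solver and output format are not documented here.

```python
import numpy as np

def nmf_frobenius(X, k, seed, max_iter=200, tiny=1e-10):
    """Multiplicative-update NMF minimising the Frobenius reconstruction error.

    Hypothetical helper; not the solver used by the library.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + tiny
    H = rng.random((k, m)) + tiny
    for _ in range(max_iter):
        H *= (W.T @ X) / (W.T @ W @ H + tiny)   # update factor loadings
        W *= (X @ H.T) / (W @ H @ H.T + tiny)   # update sample weights
    return W, H

def consensus_matrix(X, k, n_runs=10):
    """Fraction of runs in which each pair of samples shares a cluster."""
    n = X.shape[0]
    consensus = np.zeros((n, n))
    for seed in range(n_runs):
        W, _ = nmf_frobenius(X, k, seed)
        assign = W.argmax(axis=1)               # hard cluster = dominant factor
        consensus += assign[:, None] == assign[None, :]
    return consensus / n_runs

# Toy non-negative matrix with two blocks of samples.
X_pos = np.array([[5, 4, 0, 0], [4, 5, 0, 0], [5, 5, 0, 0],
                  [0, 0, 4, 5], [0, 0, 5, 4], [0, 0, 5, 5]], dtype=float)
C = consensus_matrix(X_pos, k=2)
```

Repeating this for every rank in k_list gives one consensus matrix per rank, which a stability measure such as PAC can then compare.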

mioXpektron.utils.analysis.save_consensus_heatmap(consensus, labels, savepath, *, label_col='Group')[source]

Plot a consensus matrix ordered by group labels.

Parameters:

consensus (ndarray)
labels (Series)
savepath (str)
label_col (str)

Return type:

None

mioXpektron.utils.analysis.save_factor_bars(H, feature_names, outdir, *, topm=15)[source]

Save per-factor top m/z contributors as CSV and bar plots.

Parameters:

H (ndarray)
feature_names (List[str])
outdir (str)
topm (int)

Return type:

None
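
Selecting the top contributors per factor can be sketched as below, assuming H is laid out as factors by features. The helper name and the returned structure are hypothetical; writing the CSV files and bar plots is omitted.

```python
import numpy as np

def top_features_per_factor(H, feature_names, topm=15):
    """Map each factor index to its topm (feature, weight) pairs, largest first.

    Hypothetical helper; assumes H has shape (factors, features).
    """
    H = np.asarray(H, dtype=float)
    names = np.asarray(feature_names)
    out = {}
    for factor, row in enumerate(H):
        idx = np.argsort(row)[::-1][:topm]      # indices of the largest weights
        out[factor] = [(str(names[i]), float(row[i])) for i in idx]
    return out

H_demo = np.array([[0.1, 0.9, 0.5], [0.7, 0.2, 0.0]])
top = top_features_per_factor(H_demo, ["100.0", "150.1", "200.2"], topm=2)
```

Each entry of the result corresponds to one per-factor table or bar plot.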