mioXpektron.normalization.normalization_eval

Normalization method evaluation for ToF-SIMS data.

Evaluates multiple normalization strategies on a set of labelled spectra using unsupervised, supervised, and spectral-quality metrics, then ranks them with composite scores. The approach follows the one established in xpectrass for FTIR data, adapted to the specifics of ToF-SIMS (Poisson counting statistics, high dynamic range, ion-yield variation).

Usage

>>> from mioXpektron.normalization import NormalizationEvaluator
>>> evaluator = NormalizationEvaluator(files=["spectra/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()

Functions

`evaluate_one_method`(X_raw, groups, ...[, ...])	Evaluate a single normalization method on the spectra matrix.
`spectral_angle`(a, b[, eps])	Spectral Angle Mapper (SAM) in radians; lower => more similar shape.
`within_group_mean_sam`(X, groups)	Mean SAM across all pairs within each group (technical replicates).

Classes

`NormalizationEvaluator`([files, methods, ...])	Evaluate normalization methods on labelled ToF-SIMS spectra.

mioXpektron.normalization.normalization_eval.spectral_angle(a, b, eps=1e-12)[source]

Spectral Angle Mapper (SAM) in radians; lower => more similar shape.

Parameters:

a (ndarray)
b (ndarray)
eps (float)

Return type:

float
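
The angle itself can be sketched with NumPy as below. This is an illustrative reimplementation, not the library's source: the name spectral_angle_sketch is hypothetical, and treating eps as a guard against division by zero for all-zero spectra is an assumption about its role.

```python
import numpy as np

def spectral_angle_sketch(a, b, eps=1e-12):
    """Angle in radians between two spectra treated as vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Assumption: eps only keeps the denominator non-zero for empty spectra.
    denom = np.linalg.norm(a) * np.linalg.norm(b) + eps
    # Clip to the valid arccos domain to absorb floating-point rounding.
    cos_theta = np.clip(np.dot(a, b) / denom, -1.0, 1.0)
    return float(np.arccos(cos_theta))

a = np.array([1.0, 2.0, 3.0])
# A rescaled copy has the same shape, so the angle is close to zero.
same_shape = spectral_angle_sketch(a, 10.0 * a)
# Spectra with no shared channels are orthogonal (angle of pi/2).
orthogonal = spectral_angle_sketch(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Because the angle ignores overall scale, it compares spectral shape only, which is why it is useful for judging normalization independently of total ion yield.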

mioXpektron.normalization.normalization_eval.within_group_mean_sam(X, groups)[source]

Mean SAM across all pairs within each group (technical replicates).

Parameters:

X (ndarray)
groups (ndarray)

Return type:

float
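
A minimal sketch of the pairwise averaging follows. It assumes that pairs from all groups are pooled before taking the mean and that groups with a single spectrum contribute nothing; the library may aggregate differently (for example, a mean of per-group means). The helper names are hypothetical.

```python
from itertools import combinations
import numpy as np

def _angle(a, b, eps=1e-12):
    # Spectral angle in radians between two 1-D spectra.
    denom = np.linalg.norm(a) * np.linalg.norm(b) + eps
    return float(np.arccos(np.clip(np.dot(a, b) / denom, -1.0, 1.0)))

def within_group_mean_sam_sketch(X, groups):
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    angles = []
    for label in np.unique(groups):
        rows = X[groups == label]
        # Every unordered pair of replicates inside this group.
        angles.extend(_angle(a, b) for a, b in combinations(rows, 2))
    # Assumption: pool all pairs; return NaN when no group has two spectra.
    return float(np.mean(angles)) if angles else float("nan")

X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
groups = np.array(["A", "A", "B", "B"])
mean_sam = within_group_mean_sam_sketch(X, groups)
```

In this toy input, group A holds two rescaled copies (angle near 0) and group B holds spectra 45 degrees apart, so the pooled mean is close to pi/8.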

mioXpektron.normalization.normalization_eval.evaluate_one_method(X_raw, groups, mz_values, method, method_kwargs=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True)[source]

Evaluate a single normalization method on the spectra matrix.

Parameters:

X_raw (np.ndarray) – (n_samples, n_channels) raw intensity matrix.
groups (np.ndarray) – (n_samples,) label per sample.
mz_values (np.ndarray) – (n_channels,) m/z axis shared by all spectra.
method (str) – Normalization method name.
method_kwargs (dict, optional) – Extra keyword arguments forwarded to normalize().
n_clusters (int, optional) – Number of clusters. Defaults to number of unique groups.
cluster_bootstrap_rounds (int) – Bootstrap rounds for cluster stability.
cluster_bootstrap_frac (float) – Fraction of samples per bootstrap round.
random_state (int) – RNG seed.
compute_supervised (bool) – If True and scikit-learn is available, run supervised CV.

Returns:

Keys include method, all metric values, and compute_time_sec.

Return type:

dict
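
To illustrate how cluster_bootstrap_rounds, cluster_bootstrap_frac, and random_state interact, the sketch below draws the sample subsets that such a stability estimate would cluster. It is an assumption-laden illustration only: the function name is hypothetical, sampling without replacement is assumed, and how the library turns the resampled clusterings into a stability metric is not shown here.

```python
import numpy as np

def bootstrap_subsets(n_samples, rounds=30, frac=0.8, random_state=0):
    """Return one index array per round, each covering `frac` of the samples."""
    rng = np.random.default_rng(random_state)
    # Keep at least two samples and never more than are available.
    size = min(n_samples, max(2, int(round(frac * n_samples))))
    # Assumption: subsets are drawn without replacement.
    return [rng.choice(n_samples, size=size, replace=False) for _ in range(rounds)]

subsets = bootstrap_subsets(n_samples=20)
```

With the documented defaults and 20 spectra, this yields 30 subsets of 16 spectra each, and a fixed seed makes the draws repeatable.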

class mioXpektron.normalization.normalization_eval.NormalizationEvaluator(files=<factory>, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True, n_jobs=-1, group_patterns=None, group_fn=None)[source]

Bases: object

Evaluate normalization methods on labelled ToF-SIMS spectra.

Parameters:

files (list of str or Path) – Paths or glob patterns expanding to spectrum text files.
methods (list of str, optional) – Normalization method names. Defaults to a sensible subset.
method_kwargs_map (dict, optional) – {method_name: {kwarg: value, ...}} for method-specific params.
mz_min (float, optional) – Lower bound of the m/z range to import.
mz_max (float, optional) – Upper bound of the m/z range to import.
n_clusters (int, optional) – Number of clusters for KMeans evaluation. Auto-detected if omitted.
cluster_bootstrap_rounds (int) – Bootstrap rounds for stability metric.
random_state (int) – RNG seed for reproducibility.
compute_supervised (bool) – Run supervised classification (requires scikit-learn + >=2 groups).
n_jobs (int) – Parallel workers (joblib). -1 = all CPUs, 1 = sequential.
cluster_bootstrap_frac (float) – Fraction of samples per bootstrap round.
group_patterns (Dict[str, str] | None)
group_fn (Any | None)

Examples

>>> evaluator = NormalizationEvaluator(files=["data/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()

files: List[str | Path]

methods: List[str] | None = None

method_kwargs_map: Dict[str, Dict[str, Any]] | None = None

mz_min: float | None = None

mz_max: float | None = None

n_clusters: int | None = None

cluster_bootstrap_rounds: int = 30

cluster_bootstrap_frac: float = 0.8

random_state: int = 0

compute_supervised: bool = True

n_jobs: int = -1

group_patterns: Dict[str, str] | None = None

group_fn: Any | None = None

evaluate()[source]

Evaluate all methods and return a scored DataFrame.

Returns:

One row per method, sorted by score_combined (descending). Includes raw metrics, z-scored metrics, and four composite scores.

Return type:

pd.DataFrame
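
The general idea of z-scoring metrics and combining them into a rank can be sketched as follows. This is not the library's scoring formula: the method names, metric names, values, and equal weighting are all hypothetical, chosen only to show why lower-is-better metrics need their sign flipped before averaging.

```python
import numpy as np

# Hypothetical methods and metric values, for illustration only.
methods = ["tic", "median", "sqrt"]
silhouette = np.array([0.40, 0.55, 0.25])   # higher is better
within_sam = np.array([0.10, 0.05, 0.20])   # lower is better

def zscore(v):
    # Standardize across methods; a constant metric carries no ranking signal.
    std = v.std()
    return (v - v.mean()) / std if std > 0 else np.zeros_like(v)

# Flip lower-is-better metrics so every z-score points the same way,
# then average with equal weights (an assumption).
score = (zscore(silhouette) - zscore(within_sam)) / 2.0
# Sort methods by descending composite score.
ranking = [methods[i] for i in np.argsort(-score)]
```

Z-scoring puts metrics with different units on a common scale, so no single metric dominates the composite merely because its raw values are larger.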

plot(out_dir='normalization_selection_output', save=True)[source]

Generate evaluation plots (box plots, bar charts, radar).

Parameters:

out_dir (str or Path) – Sub-folder inside OUTPUT_DIR for saved figures.
save (bool) – Persist plots as PNG + PDF.

Returns:

Saved file paths.

Return type:

list of Path

print_summary(top_n=5)[source]

Print a ranked summary of evaluation results.

Parameters:

top_n (int, default 5) – Number of top methods to display per score variant.

Return type:

None

preview_overlay(file, methods=None, max_methods=5, mz_min=None, mz_max=None, save_to='normalization_selection_output')[source]

Plot raw vs. normalized overlays for quick visual comparison.

Parameters:

file (str or Path) – Single spectrum file to visualise.
methods (list of str, optional) – Methods to overlay. Defaults to top methods from evaluation.
max_methods (int) – Cap on the number of overlays.
mz_min (float, optional) – Lower bound of the m/z window for the plot.
mz_max (float, optional) – Upper bound of the m/z window for the plot.
save_to (str, Path, or None) – Save directory (relative to OUTPUT_DIR). None skips saving.

Return type:

None