mioXpektron.normalization.normalization_eval

Normalization method evaluation for ToF-SIMS data.

Evaluates multiple normalization strategies on a set of labelled spectra using unsupervised, supervised, and spectral-quality metrics, then ranks them with composite scores. The approach follows the one established in xpectrass for FTIR data, adapted to the specifics of ToF-SIMS: Poisson counting statistics, high dynamic range, and ion-yield variation.

Usage

>>> from mioXpektron.normalization import NormalizationEvaluator
>>> evaluator = NormalizationEvaluator(files=["spectra/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()

Functions

evaluate_one_method(X_raw, groups, ...[, ...])

Evaluate a single normalization method on the spectra matrix.

spectral_angle(a, b[, eps])

Spectral Angle Mapper (SAM) in radians; lower values indicate more similar spectral shapes.

within_group_mean_sam(X, groups)

Mean SAM across all pairs within each group (technical replicates).

Classes

NormalizationEvaluator([files, methods, ...])

Evaluate normalization methods on labelled ToF-SIMS spectra.

mioXpektron.normalization.normalization_eval.spectral_angle(a, b, eps=1e-12)

Spectral Angle Mapper (SAM) in radians; lower values indicate more similar spectral shapes.

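Parameters:
  • a (np.ndarray) – First spectrum (intensity vector).

  • b (np.ndarray) – Second spectrum, on the same channel axis as a.

  • eps (float) – Small constant that guards against division by zero.
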
Return type:

float
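The angle returned by spectral_angle can be sketched with the standard SAM definition, θ = arccos(a·b / (‖a‖‖b‖)). This is a minimal sketch, not the library's code: the name spectral_angle_sketch is hypothetical, and applying eps as a floor on the denominator and clipping the cosine to [-1, 1] are assumptions about how the real function handles edge cases.

```python
import numpy as np


def spectral_angle_sketch(a, b, eps=1e-12):
    """Hypothetical SAM sketch: angle (radians) between two spectra.

    Assumption: eps is used as a lower bound on the norm product so that
    all-zero spectra do not cause a division by zero.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    cos_theta = np.dot(a, b) / denom
    # Rounding can push the cosine slightly outside [-1, 1]; clip before arccos.
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

Because SAM depends only on the direction of each vector, multiplying a spectrum by a positive constant leaves the angle unchanged: spectral_angle_sketch([1, 2, 3], [2, 4, 6]) is essentially zero, while two orthogonal spectra such as [1, 0] and [0, 1] give π/2.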

mioXpektron.normalization.normalization_eval.within_group_mean_sam(X, groups)

Mean SAM across all pairs within each group (technical replicates).

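Parameters:
  • X (np.ndarray) – (n_samples, n_channels) spectra matrix.

  • groups (np.ndarray) – (n_samples,) label per sample.
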
Return type:

float
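Read literally, within_group_mean_sam averages the SAM of every pair of samples that share a group label. The sketch below rests on two assumptions: pairs from all groups are pooled into a single mean (the library might instead average per-group means), and groups containing a single sample contribute no pairs. Both helper names are hypothetical.

```python
from itertools import combinations

import numpy as np


def _sam(a, b, eps=1e-12):
    # Standard SAM; eps floors the denominator (assumption).
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return float(np.arccos(np.clip(np.dot(a, b) / denom, -1.0, 1.0)))


def within_group_mean_sam_sketch(X, groups):
    """Hypothetical sketch: mean SAM over all same-label sample pairs.

    Returns NaN when no group has at least two samples.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    angles = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)          # rows belonging to group g
        for i, j in combinations(idx, 2):          # every unordered pair
            angles.append(_sam(X[i], X[j]))
    return float(np.mean(angles)) if angles else float("nan")
```

Lower values mean replicates of the same sample look more alike. Since SAM ignores overall scale, a normalization that only rescales each spectrum by a constant leaves this metric unchanged.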

mioXpektron.normalization.normalization_eval.evaluate_one_method(X_raw, groups, mz_values, method, method_kwargs=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True)

Evaluate a single normalization method on the spectra matrix.

Parameters:
  • X_raw (np.ndarray) – (n_samples, n_channels) raw intensity matrix.

  • groups (np.ndarray) – (n_samples,) label per sample.

  • mz_values (np.ndarray) – (n_channels,) m/z axis shared by all spectra.

  • method (str) – Normalization method name.

  • method_kwargs (dict, optional) – Extra keyword arguments forwarded to normalize().

  • n_clusters (int, optional) – Number of clusters. Defaults to the number of unique groups.

  • cluster_bootstrap_rounds (int) – Bootstrap rounds for cluster stability.

  • cluster_bootstrap_frac (float) – Fraction of samples per bootstrap round.

  • random_state (int) – RNG seed.

  • compute_supervised (bool) – If True and scikit-learn is available, run supervised CV.

Returns:

Keys include method, all metric values, and compute_time_sec.

Return type:

dict
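To make the expected input shapes concrete, the sketch below builds synthetic inputs for evaluate_one_method: Poisson-distributed counts (counting statistics) multiplied by a random per-spectrum factor as a stand-in for ion-yield variation. The peak positions, group labels, and scale range are invented for illustration; this is not real ToF-SIMS data.

```python
import numpy as np

rng = np.random.default_rng(0)

n_per_group, n_channels = 4, 200
mz_values = np.linspace(1.0, 200.0, n_channels)  # shared m/z axis, shape (n_channels,)

# Two invented "sample types" with a single Gaussian peak on a flat baseline.
base_a = 500 * np.exp(-0.5 * ((mz_values - 50.0) / 2.0) ** 2) + 5
base_b = 500 * np.exp(-0.5 * ((mz_values - 120.0) / 2.0) ** 2) + 5

rows, labels = [], []
for label, base in (("A", base_a), ("B", base_b)):
    for _ in range(n_per_group):
        yield_factor = rng.uniform(0.5, 2.0)           # per-spectrum intensity scale
        rows.append(rng.poisson(base * yield_factor))  # Poisson counting noise
        labels.append(label)

X_raw = np.vstack(rows).astype(float)  # (n_samples, n_channels)
groups = np.array(labels)              # (n_samples,)
n_clusters = len(np.unique(groups))    # matches the documented default for n_clusters
```

These arrays could then be passed as evaluate_one_method(X_raw, groups, mz_values, method=...), with method set to one of the package's normalization method names, which this section does not list.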

class mioXpektron.normalization.normalization_eval.NormalizationEvaluator(files=<factory>, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True, n_jobs=-1, group_patterns=None, group_fn=None)

Bases: object

Evaluate normalization methods on labelled ToF-SIMS spectra.

Parameters:
  • files (list of str or Path) – Paths or glob patterns expanding to spectrum text files.

  • methods (list of str, optional) – Normalization method names. Defaults to a predefined subset of methods.

  • method_kwargs_map (dict, optional) – {method_name: {kwarg: value, ...}} for method-specific params.

  • mz_min (float, optional) – Lower bound of the m/z range to import.

  • mz_max (float, optional) – Upper bound of the m/z range to import.

  • n_clusters (int, optional) – Number of clusters for KMeans evaluation. Auto-detected if omitted.

  • cluster_bootstrap_rounds (int) – Bootstrap rounds for stability metric.

  • random_state (int) – RNG seed for reproducibility.

  • compute_supervised (bool) – Run supervised classification (requires scikit-learn and at least two groups).

  • n_jobs (int) – Parallel workers (joblib). -1 = all CPUs, 1 = sequential.

  • cluster_bootstrap_frac (float) – Fraction of samples per bootstrap round.

  • group_patterns (Dict[str, str] | None)

  • group_fn (Any | None)
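The files parameter accepts literal paths and glob patterns. How the evaluator expands them internally is not documented here; the sketch below shows one plausible approach using the standard-library glob module, returning a sorted list without duplicates. expand_file_patterns is a hypothetical helper, not part of mioXpektron.

```python
import glob
from pathlib import Path


def expand_file_patterns(patterns):
    """Hypothetical helper: expand paths/glob patterns into sorted unique Paths.

    A literal path that exists matches itself, so plain paths and patterns
    can be mixed; non-matching patterns simply contribute nothing.
    """
    paths = set()
    for pattern in patterns:
        paths.update(Path(p) for p in glob.glob(str(pattern)))
    return sorted(paths)
```

Sorting gives a deterministic sample order, which helps keep seeded results (random_state) reproducible across runs; whether the library sorts its file list is an assumption.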

Examples

>>> evaluator = NormalizationEvaluator(files=["data/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()

files: List[str | Path]
methods: List[str] | None = None
method_kwargs_map: Dict[str, Dict[str, Any]] | None = None
mz_min: float | None = None
mz_max: float | None = None
n_clusters: int | None = None
cluster_bootstrap_rounds: int = 30
cluster_bootstrap_frac: float = 0.8
random_state: int = 0
compute_supervised: bool = True
n_jobs: int = -1
group_patterns: Dict[str, str] | None = None
group_fn: Any | None = None

evaluate()

Evaluate all methods and return a scored DataFrame.

Returns:

One row per method, sorted by score_combined (descending). Includes raw metrics, z-scored metrics, and four composite scores.

Return type:

pd.DataFrame
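evaluate() reports z-scored metrics and composite scores. The exact metrics, their weights, and the four composite variants are not specified in this section, so the sketch below only illustrates the general idea under stated assumptions: each metric is z-scored across methods, its sign is flipped when lower is better (for example, within-group SAM), and the oriented z-scores are averaged with equal weights into score_combined. The metric names and composite_score_sketch are hypothetical.

```python
import pandas as pd


def composite_score_sketch(metrics, higher_is_better):
    """Hypothetical composite ranking.

    metrics: DataFrame with one row per method and one column per raw metric.
    higher_is_better: {metric_name: bool} giving each metric's direction.
    """
    out = metrics.copy()
    z_cols = []
    for col, better_high in higher_is_better.items():
        values = metrics[col].astype(float)
        std = values.std(ddof=0)
        # Constant metrics carry no ranking information: give them z = 0.
        z = (values - values.mean()) / std if std > 0 else values * 0.0
        out[f"z_{col}"] = z if better_high else -z  # orient so higher = better
        z_cols.append(f"z_{col}")
    out["score_combined"] = out[z_cols].mean(axis=1)  # equal weights (assumption)
    return out.sort_values("score_combined", ascending=False)


df = pd.DataFrame(
    {"silhouette": [0.2, 0.5, 0.8], "within_group_sam": [0.30, 0.20, 0.10]},
    index=["m1", "m2", "m3"],  # placeholder method names
)
ranked = composite_score_sketch(df, {"silhouette": True, "within_group_sam": False})
```

Z-scoring puts metrics with very different units on a common scale before they are combined. Equal weighting is the simplest choice; a real implementation may weight metrics differently or compute several composites over different metric subsets.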

plot(out_dir='normalization_selection_output', save=True)

Generate evaluation plots (box plots, bar charts, radar).

Parameters:
  • out_dir (str or Path) – Sub-folder inside OUTPUT_DIR for saved figures.

  • save (bool) – If True, save each plot as PNG and PDF.

Returns:

Saved file paths.

Return type:

list of Path

print_summary(top_n=5)

Print a ranked summary of evaluation results.

Parameters:

top_n (int, default 5) – Number of top methods to display per score variant.

Return type:

None

preview_overlay(file, methods=None, max_methods=5, mz_min=None, mz_max=None, save_to='normalization_selection_output')

Overlay a raw spectrum and its normalized versions for quick visual comparison.

Parameters:
  • file (str or Path) – Single spectrum file to visualise.

  • methods (list of str, optional) – Methods to overlay. Defaults to the top-ranked methods from the evaluation.

  • max_methods (int) – Maximum number of methods to overlay.

  • mz_min (float, optional) – Lower bound of the m/z window to plot.

  • mz_max (float, optional) – Upper bound of the m/z window to plot.

  • save_to (str, Path, or None) – Save directory (relative to OUTPUT_DIR). None skips saving.

Return type:

None