mioXpektron.normalization

Normalization utilities for the Xpektron toolkit.

mioXpektron.normalization.normalize(intensities, method='tic', **kwargs)[source]

Apply a named normalization method to a 1-D intensity array.

Parameters:
  • intensities (array-like) – Raw intensity values (1-D).

  • method (str, default "tic") – Name of the normalization method. Call normalization_method_names() for the full list.

  • **kwargs – Method-specific keyword arguments forwarded to the underlying function (e.g. target_tic for TIC, reference_idx for selected-ion normalization).

Returns:

Normalized intensity values.

Return type:

np.ndarray

Raises:

ValueError – If method is not recognised.
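
A name-based dispatcher of this kind can be sketched as a dictionary lookup. This is a minimal illustration, not the library's implementation: the registry `_REGISTRY`, the function `normalize_sketch`, and the two toy methods are hypothetical, and plain lists stand in for NumPy arrays.

```python
from statistics import median
from typing import Callable, Dict, List


def _max_norm(values: List[float]) -> List[float]:
    """Scale so the largest value equals 1."""
    peak = max(values)
    return [v / peak for v in values]


def _median_norm(values: List[float], target_median: float = 1.0) -> List[float]:
    """Scale so the median equals target_median."""
    factor = target_median / median(values)
    return [v * factor for v in values]


# Hypothetical registry; the real module may organise its methods differently.
_REGISTRY: Dict[str, Callable[..., List[float]]] = {
    "max": _max_norm,
    "median": _median_norm,
}


def normalize_sketch(intensities, method="median", **kwargs):
    """Look up `method` by name and forward **kwargs to the chosen function."""
    if method not in _REGISTRY:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {sorted(_REGISTRY)}"
        )
    return _REGISTRY[method]([float(v) for v in intensities], **kwargs)
```

Keeping the method table in one place also makes a helper such as normalization_method_names() a one-liner over the registry keys.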

mioXpektron.normalization.normalization_method_names()[source]

Return a sorted list of available 1-D normalization method names.

Return type:

List[str]

mioXpektron.normalization.tic_normalization(intensities, target_tic=1000000.0)[source]

Scale intensities so the total-ion current equals target_tic.

This is the most common normalisation in ToF-SIMS. Each spectrum is multiplied by target_tic / sum(intensities) so that all spectra share the same TIC.

Parameters:
  • intensities (array-like) – Raw ion counts or intensities.

  • target_tic (float or None, default 1e6) – Desired total-ion current after scaling. Pass None to skip normalization.

Return type:

np.ndarray
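
The scaling rule target_tic / sum(intensities) can be sketched in a few lines. The function name `tic_scale` is hypothetical, and the handling of an all-zero spectrum is an assumption rather than documented behaviour.

```python
def tic_scale(counts, target_tic=1e6):
    """Multiply every channel by target_tic / sum(counts)."""
    values = [float(c) for c in counts]
    if target_tic is None:
        return values  # None means "skip normalization"
    total = sum(values)
    if total == 0:
        return values  # assumption: an empty spectrum is returned unchanged
    factor = target_tic / total
    return [v * factor for v in values]
```

For example, `tic_scale([10, 30, 60], target_tic=1000.0)` uses a factor of 10, so the scaled channels sum to 1000.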

mioXpektron.normalization.median_normalization(intensities, target_median=1.0)[source]

Scale intensities so the median equals target_median.

More robust than TIC when a few dominant peaks (e.g. substrate ions) inflate the total-ion current.

Parameters:
  • intensities (array-like)

  • target_median (float, default 1.0)

Return type:

np.ndarray

mioXpektron.normalization.rms_normalization(intensities, target_rms=1.0)[source]

Scale intensities so the root-mean-square equals target_rms.

A compromise between TIC (dominated by big peaks) and median (ignores peak structure).

Parameters:
  • intensities (array-like)

  • target_rms (float, default 1.0)

Return type:

np.ndarray

mioXpektron.normalization.max_normalization(intensities)[source]

Scale intensities so the maximum value equals 1.

Parameters:

intensities (array-like)

Return type:

np.ndarray

mioXpektron.normalization.vector_normalization(intensities)[source]

Scale intensities to unit L2 norm (vector length = 1).

Useful for comparing spectral shape irrespective of total signal.

Parameters:

intensities (array-like)

Return type:

np.ndarray

mioXpektron.normalization.snv_normalization(intensities)[source]

Standard Normal Variate: centre and scale to unit variance.

Commonly used before multivariate analysis (PCA, PLS-DA) to remove multiplicative scatter effects.

Parameters:

intensities (array-like)

Returns:

Mean-centred, variance-scaled spectrum. Note: values can be negative, which is expected for SNV.

Return type:

np.ndarray
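
A minimal sketch of the SNV transform, assuming the population standard deviation (ddof=0); the library may use a different estimator. The name `snv` and the zero-variance fallback are illustrative choices.

```python
from statistics import fmean, pstdev


def snv(intensities):
    """Centre on the spectrum mean and divide by the spectrum standard deviation."""
    values = [float(v) for v in intensities]
    centre = fmean(values)
    spread = pstdev(values)  # assumption: population standard deviation (ddof=0)
    if spread == 0:
        return [0.0 for _ in values]  # assumption: a flat spectrum maps to zeros
    return [(v - centre) / spread for v in values]
```

After the transform the spectrum has zero mean and unit variance, so roughly half of the channels become negative.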

mioXpektron.normalization.robust_snv_normalization(intensities, mad_scale=1.4826)[source]

Robust SNV using median and MAD instead of mean and standard deviation.

This is less sensitive to a few dominant ions than classical SNV and is therefore a better fit when substrate/matrix peaks dominate part of the spectrum.

Parameters:
  • intensities (array-like)

  • mad_scale (float, default 1.4826) – Consistency factor turning MAD into a robust standard deviation estimate for approximately Gaussian data.

Returns:

Median-centred, MAD-scaled spectrum. Negative values are expected.

Return type:

np.ndarray
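
The median/MAD variant can be sketched the same way. The name `robust_snv` and the zero-MAD fallback are assumptions made for illustration; only the default mad_scale comes from the signature above.

```python
from statistics import median


def robust_snv(intensities, mad_scale=1.4826):
    """Centre on the median and divide by mad_scale * MAD."""
    values = [float(v) for v in intensities]
    centre = median(values)
    # MAD: median of the absolute deviations from the median.
    mad = median([abs(v - centre) for v in values])
    scale = mad_scale * mad
    if scale == 0:
        return [0.0 for _ in values]  # assumption: degenerate spectrum maps to zeros
    return [(v - centre) / scale for v in values]
```

With `[1, 2, 3, 4, 100]` the dominant value 100 does not move the centre (3) or the MAD (1), which is the robustness the description refers to.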

mioXpektron.normalization.poisson_scaling(intensities)[source]

Poisson (square-root mean) scaling for count data.

Each channel is divided by sqrt(mean_intensity) across the spectrum. This equalises the weight of low- and high-count channels when ToF-SIMS data follow Poisson statistics. Widely used before PCA.

Parameters:

intensities (array-like)

Return type:

np.ndarray
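
As described above, the divisor is the square root of the mean intensity of the spectrum. A minimal sketch under that reading (the name `poisson_scale` and the non-positive-mean guard are assumptions):

```python
from math import sqrt
from statistics import fmean


def poisson_scale(counts):
    """Divide every channel by sqrt(mean count) of the spectrum."""
    values = [float(c) for c in counts]
    mean_count = fmean(values)
    if mean_count <= 0:
        return values  # assumption: nothing to scale for an empty spectrum
    divisor = sqrt(mean_count)
    return [v / divisor for v in values]
```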

mioXpektron.normalization.sqrt_normalization(intensities)[source]

Square-root variance-stabilising transform.

sqrt(intensity) stabilises the variance of Poisson-distributed ion counts. Often combined with mean-centering before PCA.

Parameters:

intensities (array-like)

Return type:

np.ndarray

mioXpektron.normalization.log_normalization(intensities, pseudo_count=1.0)[source]

Log(1 + intensity) transform for high-dynamic-range spectra.

Parameters:
  • intensities (array-like)

  • pseudo_count (float, default 1.0) – Added before taking the log to avoid log(0).

Return type:

np.ndarray

mioXpektron.normalization.selected_ion_normalization(intensities, reference_idx=None, reference_intensity=None, target=1.0)[source]

Normalise to a single reference peak (e.g. substrate or matrix ion).

Provide either reference_idx (index into the intensity array) or reference_intensity (the absolute value to divide by).

Parameters:
  • intensities (array-like)

  • reference_idx (int, optional) – Index of the reference peak in intensities.

  • reference_intensity (float, optional) – Absolute intensity value to normalise against.

  • target (float, default 1.0) – Target value for the reference peak after normalisation.

Return type:

np.ndarray
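
A minimal sketch of reference-peak scaling. The function name `selected_ion_scale` is hypothetical; letting reference_intensity take precedence when both arguments are given, and raising on a zero reference, are assumptions rather than documented behaviour.

```python
def selected_ion_scale(intensities, reference_idx=None,
                       reference_intensity=None, target=1.0):
    """Scale the spectrum so the reference peak equals `target`."""
    values = [float(v) for v in intensities]
    if reference_intensity is None:
        if reference_idx is None:
            raise ValueError("Provide reference_idx or reference_intensity")
        reference_intensity = values[reference_idx]
    if reference_intensity == 0:
        raise ValueError("Reference intensity is zero")  # assumption: treated as an error
    return [v * target / reference_intensity for v in values]
```

Calling `selected_ion_scale([5, 10, 20], reference_idx=1)` divides by the second channel, so that channel becomes 1.0.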

mioXpektron.normalization.multi_ion_reference_normalization(intensities, reference_indices=None, reference_values=None, target=1.0)[source]

Normalize using multiple reference ions and a robust median ratio.

Parameters:
  • intensities (array-like)

  • reference_indices (sequence of int) – Indices of stable reference ions in the spectrum.

  • reference_values (sequence of float, optional) – Expected intensities for the same reference ions. When provided the spectrum is scaled by the median observed/reference ratio. When omitted, the median observed intensity is scaled to target.

  • target (float, default 1.0) – Target robust centre when reference_values is omitted.

Return type:

np.ndarray
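
The two modes described for reference_values can be sketched as follows. This reads "scaled by the median ratio" as dividing the spectrum by that ratio, which is an interpretation; the name `multi_ion_scale` is hypothetical.

```python
from statistics import median


def multi_ion_scale(intensities, reference_indices, reference_values=None, target=1.0):
    """Divide the spectrum by a median-based size factor from several reference ions."""
    values = [float(v) for v in intensities]
    observed = [values[i] for i in reference_indices]
    if reference_values is not None:
        # Median of observed/expected ratios; one drifting ion cannot dominate.
        ratios = [obs / ref for obs, ref in zip(observed, reference_values) if ref > 0]
        size_factor = median(ratios)
    else:
        # No expected values: bring the median reference intensity to `target`.
        size_factor = median(observed) / target
    return [v / size_factor for v in values]
```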

mioXpektron.normalization.pqn_normalization(intensities, reference=None)[source]

Probabilistic Quotient Normalization.

Designed for compositional data where a few species dominate. Divides each channel by the median quotient relative to a reference spectrum.

Parameters:
  • intensities (array-like)

  • reference (array-like or None) – Reference spectrum (e.g. median of a dataset). If None, falls back to TIC normalization with a warning.

Return type:

np.ndarray
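
The quotient step of PQN can be sketched as below. This illustration omits any preliminary TIC scaling and the documented fallback for a missing reference; the name `pqn` and the rule of skipping non-positive channels are assumptions.

```python
from statistics import median


def pqn(intensities, reference):
    """Divide the spectrum by the median of its channel-wise quotients to `reference`."""
    values = [float(v) for v in intensities]
    # Only channels that are positive in both spectra give a meaningful quotient.
    quotients = [v / r for v, r in zip(values, reference) if v > 0 and r > 0]
    if not quotients:
        return values  # assumption: nothing to estimate, return unchanged
    dilution = median(quotients)
    return [v / dilution for v in values]
```

For `pqn([3, 6, 9, 40], [1, 2, 3, 4])` the quotients are 3, 3, 3 and 10; the median is 3, so the single inflated channel does not distort the scaling of the other three.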

mioXpektron.normalization.mass_stratified_pqn_normalization(intensities, mz_values=None, reference=None, strata=None)[source]

Apply PQN separately across coarse m/z strata.

This keeps a global TIC-normalised baseline while estimating local PQN size factors for different m/z regions.

Parameters:
  • intensities (array-like)

  • mz_values (array-like) – m/z axis shared with intensities.

  • reference (array-like) – Dataset-level reference spectrum on the same m/z grid.

  • strata (sequence of tuple(float, float), optional) – Inclusive/exclusive m/z windows [(lo, hi), ...]. Defaults to [(0, 100), (100, 400), (400, inf)].

Return type:

np.ndarray
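
A minimal sketch of the per-stratum size factors, using the default windows listed above. It leaves out the global TIC baseline mentioned in the description, and the name `stratified_pqn` and the treatment of empty strata are assumptions.

```python
from math import inf
from statistics import median

# Default windows as listed in the parameter description; each is [lo, hi).
DEFAULT_STRATA = ((0, 100), (100, 400), (400, inf))


def stratified_pqn(intensities, mz_values, reference, strata=DEFAULT_STRATA):
    """Estimate one PQN factor per m/z window and apply it inside that window."""
    out = [float(v) for v in intensities]
    for lo, hi in strata:
        idx = [i for i, mz in enumerate(mz_values) if lo <= mz < hi]
        quotients = [out[i] / reference[i] for i in idx
                     if out[i] > 0 and reference[i] > 0]
        if not quotients:
            continue  # assumption: windows without usable channels stay untouched
        factor = median(quotients)
        for i in idx:
            out[i] /= factor
    return out
```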

mioXpektron.normalization.median_of_ratios_normalization(intensities, reference=None)[source]

DESeq2-style median-of-ratios normalization.

Uses the geometric-mean spectrum of the dataset as reference, then normalises each sample by its median ratio to that reference. Robust to compositional effects.

Parameters:
  • intensities (array-like)

  • reference (array-like or None) – Pre-computed geometric-mean reference. If None, falls back to TIC normalization with a warning.

Return type:

np.ndarray
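
A minimal sketch of the two steps, assuming the reference is built once for the whole dataset and then passed in. Both function names are hypothetical, and channels that are non-positive in any spectrum are excluded from the ratio, following the usual median-of-ratios convention.

```python
from math import exp, log
from statistics import fmean, median


def geometric_mean_reference(spectra):
    """Per-channel geometric mean; channels with any non-positive value get 0."""
    reference = []
    for channel in zip(*spectra):
        if all(v > 0 for v in channel):
            reference.append(exp(fmean([log(v) for v in channel])))
        else:
            reference.append(0.0)
    return reference


def median_of_ratios(intensities, reference):
    """Divide a spectrum by the median of its ratios to the reference."""
    ratios = [v / r for v, r in zip(intensities, reference) if v > 0 and r > 0]
    size_factor = median(ratios)
    return [v / size_factor for v in intensities]
```

Two spectra that differ only by a constant factor, such as `[1, 2, 4]` and `[2, 4, 8]`, end up identical after this normalization.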

mioXpektron.normalization.vsn_normalization(intensities)[source]

Variance-stabilising normalization via arcsinh transform.

arcsinh(x) behaves like log(2x) for large values but handles zeros and small values gracefully. Suitable for high-dynamic-range ToF-SIMS spectra.

Parameters:

intensities (array-like)

Return type:

np.ndarray
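
The log(2x) behaviour is easy to check numerically. The sketch below applies a bare arcsinh with no scale parameter, which is an assumption; the library function may rescale its input first.

```python
from math import asinh, log


def arcsinh_transform(intensities):
    """Apply arcsinh channel by channel; zero stays zero."""
    return [asinh(float(v)) for v in intensities]


# For large x, arcsinh(x) = log(x + sqrt(x*x + 1)) approaches log(2x).
large = 1000.0
gap = abs(arcsinh_transform([large])[0] - log(2 * large))
```

Here `gap` is on the order of 1/(4x²), so the two curves are indistinguishable for intense peaks while zero counts remain finite.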

mioXpektron.normalization.minmax_normalization(intensities, feature_range=(0.0, 1.0))[source]

Scale intensities to a fixed range (default [0, 1]).

Parameters:
  • intensities (array-like)

  • feature_range (tuple of float, default (0.0, 1.0))

Return type:

np.ndarray
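
A minimal sketch of range scaling with an arbitrary feature_range. The name `minmax_scale` and the choice to map a flat spectrum to the lower bound are assumptions.

```python
def minmax_scale(intensities, feature_range=(0.0, 1.0)):
    """Map the smallest value to feature_range[0] and the largest to feature_range[1]."""
    values = [float(v) for v in intensities]
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [lo for _ in values]  # assumption: flat spectrum maps to the lower bound
    span = v_max - v_min
    return [lo + (v - v_min) / span * (hi - lo) for v in values]
```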

mioXpektron.normalization.pareto_normalization(intensities, mean=None, std=None, eps=1e-12)[source]

Pareto scale a spectrum using dataset-level feature statistics.

Pareto scaling is a dataset-level transform commonly used before PCA: each feature is mean-centred and divided by sqrt(std_feature). This down-weights very intense ions less aggressively than autoscaling while still reducing dominance by a few channels.

Parameters:
  • intensities (array-like)

  • mean (array-like) – Per-feature dataset mean with the same shape as intensities.

  • std (array-like) – Per-feature dataset standard deviation with the same shape as intensities.

  • eps (float, default 1e-12) – Numerical floor preventing division by zero.

Returns:

Mean-centred, Pareto-scaled spectrum. Negative values are expected.

Return type:

np.ndarray

Raises:

ValueError – If dataset-level mean/std arrays are not provided.
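
Because Pareto scaling needs dataset-level statistics, a typical workflow computes them once and reuses them for every spectrum. The sketch below assumes population standard deviations and applies eps as a floor on std before the square root; both choices, and the helper names, are illustrative rather than the library's exact behaviour.

```python
from math import sqrt
from statistics import fmean, pstdev


def feature_stats(spectra):
    """Per-channel mean and standard deviation across a list of spectra."""
    columns = list(zip(*spectra))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


def pareto_scale(intensities, mean=None, std=None, eps=1e-12):
    """Mean-centre each channel and divide by sqrt(std) of that channel."""
    if mean is None or std is None:
        raise ValueError("Pareto scaling needs dataset-level mean and std")
    return [(v - m) / sqrt(max(s, eps))
            for v, m, s in zip(intensities, mean, std)]
```

With the dataset `[[1, 10], [3, 30]]` the second channel has ten times the spread of the first, yet after scaling `[3, 30]` it is only about three times larger, which illustrates the milder down-weighting compared with autoscaling.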

class mioXpektron.normalization.NormalizationEvaluator(files=<factory>, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True, n_jobs=-1, group_patterns=None, group_fn=None)[source]

Bases: object

Evaluate normalization methods on labelled ToF-SIMS spectra.

Parameters:
  • files (list of str or Path) – Paths or glob patterns expanding to spectrum text files.

  • methods (list of str, optional) – Normalization method names. Defaults to a sensible subset.

  • method_kwargs_map (dict, optional) – {method_name: {kwarg: value, ...}} for method-specific params.

  • mz_min (float, optional) – Lower bound of the m/z range to import.

  • mz_max (float, optional) – Upper bound of the m/z range to import.

  • n_clusters (int, optional) – Number of clusters for KMeans evaluation. Auto-detected if omitted.

  • cluster_bootstrap_rounds (int) – Bootstrap rounds for stability metric.

  • random_state (int) – RNG seed for reproducibility.

  • compute_supervised (bool) – Run supervised classification (requires scikit-learn + >=2 groups).

  • n_jobs (int) – Parallel workers (joblib). -1 = all CPUs, 1 = sequential.

  • cluster_bootstrap_frac (float)

  • group_patterns (Dict[str, str] | None)

  • group_fn (Any | None)

Examples

>>> evaluator = NormalizationEvaluator(files=["data/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()
files: List[str | Path]
methods: List[str] | None = None
method_kwargs_map: Dict[str, Dict[str, Any]] | None = None
mz_min: float | None = None
mz_max: float | None = None
n_clusters: int | None = None
cluster_bootstrap_rounds: int = 30
cluster_bootstrap_frac: float = 0.8
random_state: int = 0
compute_supervised: bool = True
n_jobs: int = -1
group_patterns: Dict[str, str] | None = None
group_fn: Any | None = None

evaluate()[source]

Evaluate all methods and return a scored DataFrame.

Returns:

One row per method, sorted by score_combined (descending). Includes raw metrics, z-scored metrics, and four composite scores.

Return type:

pd.DataFrame

plot(out_dir='normalization_selection_output', save=True)[source]

Generate evaluation plots (box plots, bar charts, radar).

Parameters:
  • out_dir (str or Path) – Sub-folder inside OUTPUT_DIR for saved figures.

  • save (bool) – Persist plots as PNG + PDF.

Returns:

Saved file paths.

Return type:

list of Path

print_summary(top_n=5)[source]

Print a ranked summary of evaluation results.

Parameters:

top_n (int, default 5) – Number of top methods to display per score variant.

Return type:

None

preview_overlay(file, methods=None, max_methods=5, mz_min=None, mz_max=None, save_to='normalization_selection_output')[source]

Plot raw vs normalised overlays for quick visual comparison.

Parameters:
  • file (str or Path) – Single spectrum file to visualise.

  • methods (list of str, optional) – Methods to overlay. Defaults to top methods from evaluation.

  • max_methods (int) – Cap on the number of overlays.

  • mz_min (float, optional) – m/z window for the plot.

  • mz_max (float, optional) – m/z window for the plot.

  • save_to (str, Path, or None) – Save directory (relative to OUTPUT_DIR). None skips saving.

Return type:

None

class mioXpektron.normalization.NormalizationMethods(mz_values, raw_intensities)[source]

Bases: object

Evaluate and apply normalization strategies for ToF-SIMS data.

Parameters:
  • mz_values (array-like) – The m/z axis shared by all spectra.

  • raw_intensities (array-like) – Raw intensity values aligned with mz_values.

apply(method='tic', **kwargs)[source]

Apply a named normalization to the stored spectrum.

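Parameters:
  • method (str, default "tic") – Name of the normalization method.

  • **kwargs – Method-specific keyword arguments forwarded to the underlying function.

Returns: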

Normalized intensity array.

Return type:

np.ndarray

compare_visual(methods=None, method_kwargs_map=None, mz_min=0, mz_max=500, sample_name='test', group=None, figsize=(12, 8), save_plot=True)[source]

Plot the raw spectrum alongside several normalized versions.

Parameters:
  • methods (list of str, optional) – Normalization methods to overlay. Defaults to a curated set.

  • method_kwargs_map (dict, optional) – {method: {kwarg: value}} for method-specific parameters.

  • mz_min (float) – Lower m/z bound of the preview window.

  • mz_max (float) – Upper m/z bound of the preview window.

  • sample_name (str) – Label used for file naming.

  • group (str or None) – Group identifier.

  • figsize (tuple) – Figure size.

  • save_plot (bool) – Persist the rendered figure.

Return type:

matplotlib.axes.Axes

normalize_and_check(method='tic', method_kwargs=None, *, sample_name='test', group=None, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)[source]

Apply one normalization and visualise the result with peak overlay.

Parameters:
  • method (str) – Normalization method.

  • method_kwargs (dict, optional) – Extra kwargs forwarded to normalize().

  • sample_name (str) – Sample name used in plot labels.

  • group (str) – Group name used in plot labels.

  • mz_min (float) – Lower m/z bound of the plot window.

  • mz_max (float) – Upper m/z bound of the plot window.

  • show_peaks (bool) – Annotate detected peaks.

  • peak_height (float) – Peak detection tuning passed to PlotPeak.

  • peak_prominence (float) – Peak detection tuning passed to PlotPeak.

  • min_peak_width (int) – Peak detection tuning passed to PlotPeak.

  • max_peak_width (int | None) – Peak detection tuning passed to PlotPeak.

  • figsize (tuple)

  • save_plot (bool)

Return type:

matplotlib.axes.Axes

static evaluate(files, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_jobs=-1, compute_supervised=True, save_results=True)[source]

Evaluate normalization methods across multiple spectra files.

Thin wrapper around NormalizationEvaluator that runs evaluation, prints a summary, and optionally saves results.

Parameters:
  • files (list of str or Path) – Spectrum file paths or glob patterns.

  • methods (list of str, optional) – Method names to evaluate.

  • method_kwargs_map (dict, optional) – Per-method keyword arguments.

  • mz_min (float, optional) – m/z range for data import.

  • mz_max (float, optional) – m/z range for data import.

  • n_jobs (int) – Parallel workers (-1 = all CPUs).

  • compute_supervised (bool) – Run supervised classification (requires scikit-learn).

  • save_results (bool) – Save CSV + JSON + plots to OUTPUT_DIR.

Returns:

The evaluator instance (call .plot() for figures).

Return type:

NormalizationEvaluator

static available_methods()[source]

Return sorted list of available normalization method names.

Return type:

List[str]

mioXpektron.normalization.batch_tic_norm(input_pattern, output_dir='normalized_spectra', mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=False)[source]

Batch‑import and preprocess multiple ToF‑SIMS spectra, then save the (m/z, normalized_intensity) arrays for each file as a tab‑separated text file in output_dir.

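Parameters:
  • input_pattern (str) – Glob pattern for input files.

  • output_dir (str, default "normalized_spectra") – Directory to save normalized files.

  • mz_min (float, optional) – Lower bound of the m/z range to import.

  • mz_max (float, optional) – Upper bound of the m/z range to import.

  • normalization_target (float, default 1e6) – Target TIC for normalization.

  • verbose (bool, default False) – Print progress information.

Returns: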

Paths of the files written, in processing order.

Return type:

List[str]

mioXpektron.normalization.data_preprocessing(file_path, mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=True, return_all=False)[source]

Import and preprocess ToF-SIMS data from a text file.

Parameters:

  • file_path (str) – Path to the ToF-SIMS data file.

  • mz_min, mz_max (float, optional) – m/z range to import.

  • normalization_target (float or None) – Target TIC for normalization, or None to skip.

  • verbose (bool) – Print progress if True.

  • return_all (bool) – If True, return all intermediate arrays.

Returns:

  • mz_values (numpy.ndarray)

  • normalized_intensities (numpy.ndarray)

  • sample_name (str)

  • group (str)

  • (optionally) intermediate arrays

mioXpektron.normalization.resample_spectrum(mz_values, intensity_values, target_mz, method='linear')[source]

Resample a spectrum onto a target m/z grid.

The input axis is sorted, duplicate m/z positions are collapsed to their first occurrence, and values outside the native m/z range are filled with zero. Supported interpolation methods are linear, pchip, akima, makima, and cubic.
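
The documented pre-processing (sort, keep the first duplicate, zero-fill outside the native range) can be sketched for the linear case as follows. The name `resample_linear` is hypothetical, and the other interpolation methods are not covered here.

```python
from bisect import bisect_left


def resample_linear(mz_values, intensity_values, target_mz):
    """Linearly interpolate a spectrum onto target_mz, zero outside the native range."""
    # sorted() is stable, so the first occurrence of a duplicated m/z stays first.
    pairs = sorted(zip(mz_values, intensity_values), key=lambda p: p[0])
    xs, ys = [], []
    for x, y in pairs:
        if not xs or x != xs[-1]:  # drop later duplicates of the same m/z
            xs.append(float(x))
            ys.append(float(y))
    out = []
    for t in target_mz:
        if not xs or t < xs[0] or t > xs[-1]:
            out.append(0.0)  # outside the native range
            continue
        i = bisect_left(xs, t)
        if xs[i] == t:
            out.append(ys[i])
        else:
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            out.append(y0 + (y1 - y0) * (t - x0) / (x1 - x0))
    return out
```

Resampling every spectrum onto one shared grid is what makes dataset-level references (for PQN or median-of-ratios) and per-feature statistics (for Pareto scaling) well defined.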

mioXpektron.normalization.normalization_target(files, mz_min=None, mz_max=None)[source]

Normalize peak intensities or areas to a target value.

Parameters:
  • files (list of str) – List of file paths to process.

  • mz_min (float or None) – Lower bound of the m/z window for data import (if supported).

  • mz_max (float or None) – Upper bound of the m/z window for data import (if supported).

  • baseline_method (str) – Method for baseline correction.

  • noise_method (str) – Noise filtering method.

  • missing_value_method (str) – Method for handling missing values.

Returns:

normalized_df – Normalized DataFrame.

Return type:

pd.DataFrame

class mioXpektron.normalization.BatchTicNorm(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]

Bases: object

Batch TIC normalization for multiple spectra files using Polars and concurrent.futures.

Supports both CSV and TXT file formats:

  • CSV: Uses ‘corrected_intensity’ if available, otherwise ‘intensity’.

  • TXT: Tab-separated m/z and intensity values.

Output files contain: channel, mz, intensity (normalized)

Parameters:
  • input_pattern (str)

  • output_dir (str)

  • normalization_target (float)

  • n_workers (int)

  • verbose (bool)

__init__(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]

Initialize BatchTicNorm processor.

Parameters:
  • input_pattern (str) – Glob pattern for input files (e.g., ‘data/*.csv’ or ‘data/*.txt’)

  • output_dir (str) – Directory to save normalized files

  • normalization_target (float) – Target TIC value for normalization (default: 1e6)

  • n_workers (int) – Number of parallel workers (default: -1)

  • verbose (bool) – Print progress information

process()[source]

Process all files using concurrent.futures.

Returns:

List of output file paths that were successfully created

Return type:

List[str]

get_tic_statistics()[source]

Calculate TIC statistics for all input files before normalization.

Returns:

DataFrame with columns: filename, tic_original, tic_million

Return type:

pl.DataFrame

Modules

main

High-level orchestration helpers for normalising ToF-SIMS spectra.

normalization

Normalization methods for ToF-SIMS mass spectrometry data.

normalization_eval

Normalization method evaluation for ToF-SIMS data.

preprocessing

tic_count