Normalization
The normalization module provides 18 normalization strategies for ToF-SIMS spectra, ranging from simple scaling (TIC, max) to variance-stabilising transforms (Poisson, sqrt, VSN), robust methods (median, robust SNV, PQN), and reference-aware strategies such as selected-ion, multi-ion reference, and mass-stratified PQN. An evaluation framework helps choose the best method for your dataset.
Quick Example
from mioXpektron import normalize
# TIC normalization (default)
normalized = normalize(intensity, method="tic", target_tic=1e6)
# Or use any of 18 available methods
normalized = normalize(intensity, method="poisson")
Available Methods
| Method | Name | Use Case |
|---|---|---|
| tic | Total Ion Current | General-purpose normalization (default) |
| median | Median scaling | Robust when dominant peaks inflate TIC (e.g. substrate ions) |
| rms | Root Mean Square | Compromise between TIC and median sensitivity |
| max | Maximum normalization | Quick comparison (max value = 1) |
| vector | L2 vector norm | Spectral shape comparison (unit length) |
| snv | Standard Normal Variate | Before PCA/PLS-DA; removes multiplicative scatter |
| robust_snv | Median/MAD SNV | More robust than SNV when a few dominant peaks distort the mean/std |
| poisson | Poisson scaling | Before PCA on count data; equalises channel weights |
| sqrt | Square-root transform | Variance stabilisation for Poisson-distributed counts |
| log | Log(1+x) transform | High dynamic range spectra |
| vsn | arcsinh transform | Variance stabilisation; handles zeros gracefully |
| minmax | Min-Max scaling | Fixed range [0, 1] |
| selected_ion | Single-peak reference | Normalize to a known reference ion |
| multi_ion_reference | Robust multi-ion reference | Normalize to a stable ion panel using a median ratio across reference ions |
| pqn | Probabilistic Quotient | Compositional effects; requires dataset reference |
| mass_stratified_pqn | Regional PQN | Correct region-specific matrix effects across coarse m/z strata |
| median_of_ratios | DESeq2-style | Multi-batch experiments; requires geometric-mean reference |
| pareto | Dataset-level Pareto scaling | Downstream multivariate analysis where mean-centering with reduced variance scaling is useful |
Method Details
TIC Normalization — Scales each spectrum so the sum of all intensities equals a target value (default 1 million):
from mioXpektron import tic_normalization
normalized = tic_normalization(intensity, target_tic=1e6)
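As a rough illustration of the arithmetic described above, the following NumPy sketch rescales a spectrum so its intensities sum to a target. The helper name tic_scale and its handling of all-zero spectra are assumptions made for this example, not the library's implementation.

```python
import numpy as np

def tic_scale(intensity, target_tic=1e6):
    """Scale a 1-D spectrum so its intensities sum to target_tic (illustrative only)."""
    intensity = np.asarray(intensity, dtype=float)
    total = intensity.sum()
    if total <= 0:
        # Assumption: an empty or all-zero spectrum is returned unchanged.
        return intensity.copy()
    return intensity * (target_tic / total)

spectrum = np.array([10.0, 30.0, 60.0])
scaled = tic_scale(spectrum, target_tic=1e6)
```

Relative peak ratios are preserved; only the overall scale changes.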
Poisson Scaling — Divides by sqrt(mean_intensity), equalising
the weight of low- and high-count channels. Nearly universal for
multivariate analysis of ToF-SIMS count data:
from mioXpektron import normalize
scaled = normalize(intensity, method="poisson")
Selected-Ion Normalization — Normalizes to a reference peak (by index or absolute intensity):
# By index
normalized = normalize(intensity, method="selected_ion", reference_idx=42)
# By absolute value
normalized = normalize(intensity, method="selected_ion",
                       reference_intensity=5000.0)
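The two call styles above amount to dividing the spectrum by one number, taken either from the spectrum itself or supplied directly. A minimal NumPy sketch of that idea follows; the helper scale_to_reference and its error handling are hypothetical and only mirror the documented parameter names.

```python
import numpy as np

def scale_to_reference(intensity, reference_idx=None, reference_intensity=None, target=1.0):
    """Divide a spectrum by a reference intensity so the reference maps to target."""
    intensity = np.asarray(intensity, dtype=float)
    if reference_intensity is None:
        if reference_idx is None:
            raise ValueError("provide reference_idx or reference_intensity")
        # Look the reference intensity up inside the spectrum itself.
        reference_intensity = intensity[reference_idx]
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return intensity * (target / reference_intensity)

spectrum = np.array([200.0, 5000.0, 1250.0])
by_index = scale_to_reference(spectrum, reference_idx=1)
by_value = scale_to_reference(spectrum, reference_intensity=5000.0)
```

Both calls give the same result here because channel 1 holds exactly 5000 counts.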
PQN (Probabilistic Quotient Normalization) — Handles compositional effects. Requires a reference spectrum (e.g. median of the dataset):
import numpy as np
from mioXpektron import normalize
# Compute reference from all spectra
reference = np.median(all_spectra, axis=0)
normalized = normalize(intensity, method="pqn", reference=reference)
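To make the quotient step concrete, here is a small NumPy sketch of PQN as described above: compute per-channel quotients against the reference, take their median, and divide the spectrum by it. The function pqn_sketch and its choice to ignore non-positive channels are assumptions for illustration and may differ from the library's behaviour.

```python
import numpy as np

def pqn_sketch(intensity, reference):
    """Divide a spectrum by its median quotient relative to a reference spectrum."""
    intensity = np.asarray(intensity, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Assumption: only channels positive in both spectra contribute quotients.
    valid = (reference > 0) & (intensity > 0)
    if not valid.any():
        return intensity.copy()
    factor = np.median(intensity[valid] / reference[valid])
    return intensity / factor

reference = np.array([100.0, 200.0, 50.0, 400.0])
diluted = 0.5 * reference      # overall signal halved
diluted[3] *= 4.0              # one channel changes independently
corrected = pqn_sketch(diluted, reference)
```

Because the median ignores the single deviating channel, the global dilution is removed while the genuine change in channel 3 is kept.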
Robust SNV — Uses the median and median absolute deviation (MAD) instead of the mean and standard deviation:
normalized = normalize(intensity, method="robust_snv")
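A minimal NumPy sketch of the median/MAD idea, assuming the usual consistency factor 1.4826 (the documented default of mad_scale). The function name and the zero-MAD fallback are assumptions for this example.

```python
import numpy as np

def robust_snv_sketch(intensity, mad_scale=1.4826):
    """Centre on the median and scale by the MAD-based standard deviation estimate."""
    intensity = np.asarray(intensity, dtype=float)
    centre = np.median(intensity)
    mad = np.median(np.abs(intensity - centre))
    scale = mad_scale * mad
    if scale == 0:
        # Assumption: with zero spread, only centre the spectrum.
        return intensity - centre
    return (intensity - centre) / scale

spectrum = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])  # one dominant peak
z = robust_snv_sketch(spectrum)
```

The dominant peak at 1000 does not shift the centre or inflate the scale, which is the point of using the median and MAD.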
Multi-Ion Reference — Uses a panel of stable ions and a robust median ratio across the panel:
normalized = normalize(
    intensity,
    method="multi_ion_reference",
    reference_indices=[120, 245, 910],
    reference_values=[5200.0, 1800.0, 25000.0],
)
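The median-ratio idea can be sketched in a few lines of NumPy, assuming the spectrum is divided by the median of the observed/expected ratios across the panel. The helper multi_ion_sketch is hypothetical and only follows the description above.

```python
import numpy as np

def multi_ion_sketch(intensity, reference_indices, reference_values):
    """Divide a spectrum by the median observed/expected ratio over a reference-ion panel."""
    intensity = np.asarray(intensity, dtype=float)
    observed = intensity[list(reference_indices)]
    ratios = observed / np.asarray(reference_values, dtype=float)
    return intensity / np.median(ratios)

base = np.array([100.0, 7.0, 50.0, 3.0, 200.0])
observed = 2.0 * base          # whole spectrum measured at twice the expected signal
observed[4] = 900.0            # one reference ion is unstable in this spectrum
result = multi_ion_sketch(observed, [0, 2, 4], [100.0, 50.0, 200.0])
```

With three reference ions, one unstable ion does not change the median ratio, so the scale factor stays at 2.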
Mass-Stratified PQN — Applies PQN within coarse m/z windows while still using a dataset-level reference spectrum:
normalized = normalize(
    intensity,
    method="mass_stratified_pqn",
    mz_values=mz,
    reference=reference,
    strata=[(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))],
)
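The per-window logic might look roughly like the sketch below, which estimates one PQN factor per m/z stratum. It is a simplified illustration: the function name is hypothetical, windows are treated as lower-inclusive and upper-exclusive, and any global pre-scaling the library applies is omitted.

```python
import numpy as np

def stratified_pqn_sketch(intensity, mz_values, reference, strata):
    """Apply a separate median-quotient factor inside each m/z window."""
    intensity = np.asarray(intensity, dtype=float)
    mz_values = np.asarray(mz_values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    out = intensity.copy()
    for lo, hi in strata:
        window = (mz_values >= lo) & (mz_values < hi)
        valid = window & (reference > 0) & (intensity > 0)
        if not valid.any():
            continue  # empty stratum: leave those channels untouched
        factor = np.median(intensity[valid] / reference[valid])
        out[window] = intensity[window] / factor
    return out

mz = np.array([10.0, 50.0, 90.0, 150.0, 300.0, 350.0])
reference = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])
# Low-mass region enhanced two-fold, mid-mass region suppressed two-fold.
intensity = reference * np.array([2.0, 2.0, 2.0, 0.5, 0.5, 0.5])
strata = [(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))]
corrected = stratified_pqn_sketch(intensity, mz, reference, strata)
```

A single global PQN factor could not undo both regional effects at once; the per-stratum factors can.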
Pareto Scaling — Requires dataset-level mean and standard deviation and is most useful for multivariate modelling rather than raw signal correction:
normalized = normalize(
    intensity,
    method="pareto",
    mean=dataset_mean,
    std=dataset_std,
)
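The dataset-level statistics can be computed from a stacked matrix of spectra on a shared m/z grid. The sketch below shows one way to do this with NumPy and applies the Pareto formula directly; using the sample standard deviation (ddof=1) and flooring the standard deviation at eps are assumptions, not confirmed library behaviour.

```python
import numpy as np

# Toy dataset: 3 spectra x 3 channels (rows are spectra on a shared m/z grid).
all_spectra = np.array([
    [10.0, 100.0, 5.0],
    [14.0, 400.0, 5.0],
    [12.0, 250.0, 5.0],
])

dataset_mean = all_spectra.mean(axis=0)
dataset_std = all_spectra.std(axis=0, ddof=1)  # ddof=1 is an assumption

def pareto_sketch(intensity, mean, std, eps=1e-12):
    """Mean-centre each channel and divide by the square root of its standard deviation."""
    intensity = np.asarray(intensity, dtype=float)
    # Floor the standard deviation so constant channels do not divide by zero.
    return (intensity - mean) / np.sqrt(np.maximum(std, eps))

scaled = pareto_sketch(all_spectra[0], dataset_mean, dataset_std)
```

The constant third channel maps to zero, and the intense second channel is down-weighted by sqrt(std) rather than by std as in autoscaling.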
Unified Dispatcher
All methods are accessible through the normalize() function:
from mioXpektron import normalize, normalization_method_names
# List available methods
print(normalization_method_names())
# ['log', 'mass_stratified_pqn', 'max', 'median', 'median_of_ratios',
# 'minmax', 'multi_ion_reference', 'pareto', 'poisson', 'pqn',
# 'rms', 'robust_snv', 'selected_ion', 'snv', 'sqrt', 'tic',
# 'vector', 'vsn']
# Apply any method
result = normalize(intensity, method="rms", target_rms=1.0)
Method Evaluation
The NormalizationEvaluator compares methods on your actual data using
unsupervised, supervised, and spectral-quality metrics — following the
approach from xpectrass adapted for ToF-SIMS:
from mioXpektron import NormalizationEvaluator
evaluator = NormalizationEvaluator(
    files=["spectra/"],
    methods=["tic", "robust_snv", "pqn", "mass_stratified_pqn", "log"],
    method_kwargs_map={
        "mass_stratified_pqn": {
            "strata": [(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))],
        },
    },
)
# Run evaluation
results = evaluator.evaluate()
# Print ranked summary
evaluator.print_summary()
# Generate plots
evaluator.plot()
NormalizationEvaluator accepts directories, file paths, and glob patterns.
When a directory is given it will load both traditional .txt spectra and
baseline-corrected .csv spectra.
Notebook Workflow
For baseline-corrected CSV cohorts with non-identical native m/z axes, use the
repository notebook NoteBooks/_06_Normalization.ipynb. It:
builds a shared m/z grid before evaluation
supports linear, pchip, akima, makima (SciPy >= 1.13), and cubic resampling
includes mass_stratified_pqn in the default evaluation set
enables multi_ion_reference when multi_ion_reference_mz is provided
exports the winning method to a timestamped normalized output folder
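For readers working outside the notebook, the shared-grid step can be approximated with plain linear interpolation. The sketch below uses np.interp only; the helper name, the choice of the overlapping m/z range, and the grid size are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np

def resample_to_grid(mz, intensity, grid):
    """Linearly interpolate one spectrum onto a common m/z grid (zero outside its range)."""
    return np.interp(grid, mz, intensity, left=0.0, right=0.0)

# Two spectra with different native m/z axes.
spectra = [
    (np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0, 30.0])),
    (np.array([1.5, 2.5, 3.5]), np.array([5.0, 15.0, 25.0])),
]

# Assumption: restrict the shared grid to the m/z range covered by every spectrum.
lo = max(mz.min() for mz, _ in spectra)
hi = min(mz.max() for mz, _ in spectra)
grid = np.linspace(lo, hi, 4)

matrix = np.vstack([resample_to_grid(mz, y, grid) for mz, y in spectra])
```

The resulting matrix has one row per spectrum and one column per grid point, which is the layout that dataset-level references (for PQN or Pareto scaling) require.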
Metrics computed:
CV of TIC — Coefficient of variation of total-ion current (lower = more consistent normalization)
Within-group SAM — Spectral Angle Mapper between technical replicates (lower = more similar shapes)
Within-group correlation — Mean Pearson correlation within groups (higher = better consistency)
Clustering — Adjusted Rand Index, NMI, silhouette, and stability via KMeans
Supervised — Macro F1 and balanced accuracy via stratified cross-validation (requires scikit-learn and ≥2 groups)
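Two of the metrics above are simple enough to compute by hand. The sketch below shows the coefficient of variation of TIC and the spectral angle between two replicates; the function names and the use of the sample standard deviation are assumptions and may not match the evaluator's internals.

```python
import numpy as np

def cv_of_tic(spectra):
    """Coefficient of variation of the per-spectrum total ion current."""
    tic = np.asarray(spectra, dtype=float).sum(axis=1)
    return tic.std(ddof=1) / tic.mean()

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors (0 = identical shape)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Two technical replicates with the same shape but different total signal.
replicates = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
normalised = replicates / replicates.sum(axis=1, keepdims=True)

cv_raw = cv_of_tic(replicates)
cv_norm = cv_of_tic(normalised)
angle = spectral_angle(replicates[0], replicates[1])
orthogonal = spectral_angle([1.0, 0.0], [0.0, 1.0])
```

Scaling to a common TIC drives the CV to zero, while the spectral angle is insensitive to overall scale and only reflects differences in shape.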
Composite scores (z-scored, higher = better):
score_combined: Balanced across all metric categories (recommended)
score_unsupervised: For unlabelled data
score_supervised: Classification-focused
score_efficient: Includes computation time
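The exact weighting behind these scores is not described here. As a generic illustration of a z-scored, higher-is-better composite, the sketch below standardises each metric across methods, flips the sign of lower-is-better metrics, and averages. The metric values and the equal weighting are invented for the example.

```python
import numpy as np

methods = ["tic", "median", "sqrt"]

# Hypothetical metric values per method, with a flag for "higher is better".
metrics = {
    "cv_tic": (np.array([0.05, 0.20, 0.35]), False),
    "within_group_corr": (np.array([0.98, 0.95, 0.90]), True),
}

def zscore(values):
    """Standardise values across methods; constant metrics become all zeros."""
    std = values.std()
    if std == 0:
        return np.zeros_like(values)
    return (values - values.mean()) / std

# Orient every metric so that a larger z-score always means a better method.
oriented = [zscore(v) if higher_better else -zscore(v)
            for v, higher_better in metrics.values()]
score = np.mean(oriented, axis=0)
best = methods[int(np.argmax(score))]
```

Z-scoring puts metrics with different units on a common scale before they are combined.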
Visual Comparison
Preview how different methods affect a single spectrum:
from mioXpektron import NormalizationMethods
nm = NormalizationMethods(mz_values, raw_intensity)
# Side-by-side comparison
nm.compare_visual(
    methods=["tic", "median", "poisson", "sqrt"],
    mz_min=0, mz_max=200,
)
# Apply one method and visualise with peak overlay
nm.normalize_and_check(
    method="poisson",
    show_peaks=True,
    mz_min=0, mz_max=200,
)
Data Preprocessing
Combined import, filtering, and normalization in one step:
from mioXpektron import data_preprocessing
sample_name, group, mz, normalized = data_preprocessing(
    "spectrum.txt",
    mz_min=1.0,
    mz_max=300.0,
    normalization_target=1e6,
)
Batch Normalization
Functional Interface
from mioXpektron import batch_tic_norm
written_files = batch_tic_norm(
    "spectra/*.txt",
    output_dir="normalized/",
    normalization_target=1e6,
)
Class-Based Interface
For more control with Polars-based parallel processing:
from mioXpektron import BatchTicNorm
normalizer = BatchTicNorm(
    input_pattern="spectra/*.csv",
    output_dir="normalized/",
    normalization_target=1e6,
    n_workers=4,
)
# View TIC statistics before processing
stats = normalizer.get_tic_statistics()
# Run batch normalization
output_files = normalizer.process()
Batch Evaluation
Evaluate methods across an entire dataset:
from mioXpektron import NormalizationMethods
evaluator = NormalizationMethods.evaluate(
    files=["spectra/*.txt"],
    methods=["tic", "median", "rms", "poisson", "sqrt", "vsn"],
    n_jobs=-1,  # use all CPUs
)
# The returned evaluator has .plot() and .print_summary()
evaluator.plot()
API Reference
- mioXpektron.normalization.normalize(intensities, method='tic', **kwargs)[source]
Apply a named normalization method to a 1-D intensity array.
- Parameters:
intensities (array-like) – Raw intensity values (1-D).
method (str, default "tic") – Name of the normalization method. Call normalization_method_names() for the full list.
**kwargs – Method-specific keyword arguments forwarded to the underlying function (e.g. target_tic for TIC, reference_idx for selected-ion normalization).
- Returns:
Normalized intensity values.
- Return type:
np.ndarray
- Raises:
ValueError – If method is not recognised.
- mioXpektron.normalization.normalization_method_names()[source]
Return a sorted list of available 1-D normalization method names.
- mioXpektron.normalization.tic_normalization(intensities, target_tic=1000000.0)[source]
Scale intensities so the total-ion current equals target_tic.
This is the most common normalisation in ToF-SIMS. Each spectrum is multiplied by target_tic / sum(intensities) so that all spectra share the same TIC.
- Parameters:
intensities (array-like) – Raw ion counts or intensities.
target_tic (float or None) – Desired total-ion current after scaling. Pass None to skip.
- Return type:
np.ndarray
- mioXpektron.normalization.median_normalization(intensities, target_median=1.0)[source]
Scale intensities so the median equals target_median.
More robust than TIC when a few dominant peaks (e.g. substrate ions) inflate the total-ion current.
- Parameters:
intensities (array-like)
target_median (float, default 1.0)
- Return type:
np.ndarray
- mioXpektron.normalization.rms_normalization(intensities, target_rms=1.0)[source]
Scale intensities so the root-mean-square equals target_rms.
A compromise between TIC (dominated by big peaks) and median (ignores peak structure).
- Parameters:
intensities (array-like)
target_rms (float, default 1.0)
- Return type:
np.ndarray
- mioXpektron.normalization.max_normalization(intensities)[source]
Scale intensities so the maximum value equals 1.
- Parameters:
intensities (array-like)
- Return type:
np.ndarray
- mioXpektron.normalization.vector_normalization(intensities)[source]
Scale intensities to unit L2 norm (vector length = 1).
Useful for comparing spectral shape irrespective of total signal.
- Parameters:
intensities (array-like)
- Return type:
np.ndarray
- mioXpektron.normalization.snv_normalization(intensities)[source]
Standard Normal Variate: centre and scale to unit variance.
Commonly used before multivariate analysis (PCA, PLS-DA) to remove multiplicative scatter effects.
- Parameters:
intensities (array-like)
- Returns:
Mean-centred, variance-scaled spectrum. Note: values can be negative, which is expected for SNV.
- Return type:
np.ndarray
- mioXpektron.normalization.robust_snv_normalization(intensities, mad_scale=1.4826)[source]
Robust SNV using median and MAD instead of mean and standard deviation.
This is less sensitive to a few dominant ions than classical SNV and is therefore a better fit when substrate/matrix peaks dominate part of the spectrum.
- Parameters:
intensities (array-like)
mad_scale (float, default 1.4826) – Consistency factor turning MAD into a robust standard deviation estimate for approximately Gaussian data.
- Returns:
Median-centred, MAD-scaled spectrum. Negative values are expected.
- Return type:
np.ndarray
- mioXpektron.normalization.poisson_scaling(intensities)[source]
Poisson (square-root mean) scaling for count data.
Each channel is divided by sqrt(mean_intensity) across the spectrum. This equalises the weight of low- and high-count channels when ToF-SIMS data follow Poisson statistics. Widely used before PCA.
- Parameters:
intensities (array-like)
- Return type:
np.ndarray
- mioXpektron.normalization.sqrt_normalization(intensities)[source]
Square-root variance-stabilising transform.
sqrt(intensity) stabilises the variance of Poisson-distributed ion counts. Often combined with mean-centering before PCA.
- Parameters:
intensities (array-like)
- Return type:
np.ndarray
- mioXpektron.normalization.log_normalization(intensities, pseudo_count=1.0)[source]
Log(1 + intensity) transform for high-dynamic-range spectra.
- Parameters:
intensities (array-like)
pseudo_count (float, default 1.0) – Added before taking the log to avoid log(0).
- Return type:
np.ndarray
- mioXpektron.normalization.vsn_normalization(intensities)[source]
Variance-stabilising normalization via arcsinh transform.
arcsinh(x) behaves like log(2x) for large values but handles zeros and small values gracefully. Suitable for high-dynamic-range ToF-SIMS spectra.
- Parameters:
intensities (array-like)
- Return type:
np.ndarray
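The relationship between arcsinh and the logarithm can be checked numerically. This short NumPy sketch compares the two on a few values; it relies only on standard NumPy functions and makes no claim about any scaling the library may apply before the transform.

```python
import numpy as np

x = np.array([0.0, 1.0, 1e3, 1e6])

arcsinh_x = np.arcsinh(x)
with np.errstate(divide="ignore"):
    # log(2x) is undefined at zero; NumPy returns -inf there.
    log_2x = np.log(2.0 * x)

# Difference between the two transforms for the non-zero entries.
gap = arcsinh_x[1:] - log_2x[1:]
```

At zero, arcsinh returns 0 while the logarithm diverges; for large counts the gap shrinks roughly as 1/(4x^2), so the two transforms become indistinguishable.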
- mioXpektron.normalization.minmax_normalization(intensities, feature_range=(0.0, 1.0))[source]
Scale intensities to a fixed range (default [0, 1]).
- mioXpektron.normalization.selected_ion_normalization(intensities, reference_idx=None, reference_intensity=None, target=1.0)[source]
Normalise to a single reference peak (e.g. substrate or matrix ion).
Provide either reference_idx (index into the intensity array) or reference_intensity (the absolute value to divide by).
- Return type:
np.ndarray
- mioXpektron.normalization.multi_ion_reference_normalization(intensities, reference_indices=None, reference_values=None, target=1.0)[source]
Normalize using multiple reference ions and a robust median ratio.
- Parameters:
intensities (array-like)
reference_indices (sequence of int) – Indices of stable reference ions in the spectrum.
reference_values (sequence of float, optional) – Expected intensities for the same reference ions. When provided the spectrum is scaled by the median observed/reference ratio. When omitted, the median observed intensity is scaled to target.
target (float, default 1.0) – Target robust centre when reference_values is omitted.
- Return type:
np.ndarray
- mioXpektron.normalization.pqn_normalization(intensities, reference=None)[source]
Probabilistic Quotient Normalization.
Designed for compositional data where a few species dominate. Divides each channel by the median quotient relative to a reference spectrum.
- Parameters:
intensities (array-like)
reference (array-like or None) – Reference spectrum (e.g. median of a dataset). If None, falls back to TIC normalization with a warning.
- Return type:
np.ndarray
- mioXpektron.normalization.mass_stratified_pqn_normalization(intensities, mz_values=None, reference=None, strata=None)[source]
Apply PQN separately across coarse m/z strata.
This keeps a global TIC-normalised baseline while estimating local PQN size factors for different m/z regions.
- Parameters:
intensities (array-like)
mz_values (array-like) – m/z axis shared with intensities.
reference (array-like) – Dataset-level reference spectrum on the same m/z grid.
strata (sequence of tuple(float, float), optional) – Inclusive/exclusive m/z windows [(lo, hi), ...]. Defaults to [(0, 100), (100, 400), (400, inf)].
- Return type:
np.ndarray
- mioXpektron.normalization.median_of_ratios_normalization(intensities, reference=None)[source]
DESeq2-style median-of-ratios normalization.
Computes the geometric mean spectrum as reference, then normalises each sample by the median ratio to that reference. Robust to compositional effects.
- Parameters:
intensities (array-like)
reference (array-like or None) – Pre-computed geometric-mean reference. If None, falls back to TIC normalization with a warning.
- Return type:
np.ndarray
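Since the reference must be pre-computed, a sketch of building a geometric-mean reference from a stacked dataset may help. The variable all_spectra, the restriction to channels that are positive in every spectrum, and the helper name are assumptions for illustration.

```python
import numpy as np

# Toy dataset: the same spectral shape measured at three different overall scales.
all_spectra = np.array([
    [100.0, 200.0, 50.0],
    [200.0, 400.0, 100.0],
    [50.0, 100.0, 25.0],
])

# Geometric mean per channel, computed in log space.
# Assumption: channels with any non-positive value get a reference of zero.
positive = (all_spectra > 0).all(axis=0)
reference = np.zeros(all_spectra.shape[1])
reference[positive] = np.exp(np.log(all_spectra[:, positive]).mean(axis=0))

def median_of_ratios_sketch(intensity, reference):
    """Divide a spectrum by its median ratio to the geometric-mean reference."""
    intensity = np.asarray(intensity, dtype=float)
    valid = (reference > 0) & (intensity > 0)
    factor = np.median(intensity[valid] / reference[valid])
    return intensity / factor

normalised = np.vstack([median_of_ratios_sketch(s, reference) for s in all_spectra])
```

In this toy case all three spectra collapse onto the reference because they differ only by a global scale factor.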
- mioXpektron.normalization.pareto_normalization(intensities, mean=None, std=None, eps=1e-12)[source]
Pareto scale a spectrum using dataset-level feature statistics.
Pareto scaling is a dataset-level transform commonly used before PCA: each feature is mean-centred and divided by sqrt(std_feature). This down-weights very intense ions less aggressively than autoscaling while still reducing dominance by a few channels.
- Parameters:
intensities (array-like)
mean (array-like) – Per-feature dataset mean with the same shape as intensities.
std (array-like) – Per-feature dataset standard deviation with the same shape as intensities.
eps (float, default 1e-12) – Numerical floor preventing division by zero.
- Returns:
Mean-centred, Pareto-scaled spectrum. Negative values are expected.
- Return type:
np.ndarray
- Raises:
ValueError – If dataset-level mean/std arrays are not provided.
- class mioXpektron.normalization.NormalizationEvaluator(files=<factory>, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True, n_jobs=-1, group_patterns=None, group_fn=None)[source]
Bases: object
Evaluate normalization methods on labelled ToF-SIMS spectra.
- Parameters:
files (list of str or Path) – Paths or glob patterns expanding to spectrum text files.
methods (list of str, optional) – Normalization method names. Defaults to a sensible subset.
method_kwargs_map (dict, optional) – {method_name: {kwarg: value, ...}} for method-specific params.
mz_min (float, optional) – m/z range to import.
mz_max (float, optional) – m/z range to import.
n_clusters (int, optional) – Number of clusters for KMeans evaluation. Auto-detected if omitted.
cluster_bootstrap_rounds (int) – Bootstrap rounds for stability metric.
random_state (int) – RNG seed for reproducibility.
compute_supervised (bool) – Run supervised classification (requires scikit-learn + >=2 groups).
n_jobs (int) – Parallel workers (joblib). -1 = all CPUs, 1 = sequential.
cluster_bootstrap_frac (float)
group_fn (Any | None)
Examples
>>> evaluator = NormalizationEvaluator(files=["data/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()
- evaluate()[source]
Evaluate all methods and return a scored DataFrame.
- Returns:
One row per method, sorted by score_combined (descending). Includes raw metrics, z-scored metrics, and four composite scores.
- Return type:
pd.DataFrame
- plot(out_dir='normalization_selection_output', save=True)[source]
Generate evaluation plots (box plots, bar charts, radar).
- print_summary(top_n=5)[source]
Print a ranked summary of evaluation results.
- Parameters:
top_n (int, default 5) – Number of top methods to display per score variant.
- Return type:
None
- preview_overlay(file, methods=None, max_methods=5, mz_min=None, mz_max=None, save_to='normalization_selection_output')[source]
Plot raw vs normalised overlays for quick visual comparison.
- Parameters:
file (str or Path) – Single spectrum file to visualise.
methods (list of str, optional) – Methods to overlay. Defaults to top methods from evaluation.
max_methods (int) – Cap on the number of overlays.
mz_min (float, optional) – m/z window for the plot.
mz_max (float, optional) – m/z window for the plot.
save_to (str, Path, or None) – Save directory (relative to OUTPUT_DIR). None skips saving.
- Return type:
None
- class mioXpektron.normalization.NormalizationMethods(mz_values, raw_intensities)[source]
Bases: object
Evaluate and apply normalization strategies for ToF-SIMS data.
- Parameters:
mz_values (array-like) – The m/z axis shared by all spectra.
raw_intensities (array-like) – Raw intensity values aligned with mz_values.
- apply(method='tic', **kwargs)[source]
Apply a named normalization to the stored spectrum.
- Parameters:
method (str) – Normalization method name (see normalization_method_names()).
**kwargs – Method-specific keyword arguments.
- Returns:
Normalized intensity array.
- Return type:
np.ndarray
- compare_visual(methods=None, method_kwargs_map=None, mz_min=0, mz_max=500, sample_name='test', group=None, figsize=(12, 8), save_plot=True)[source]
Plot the raw spectrum alongside several normalized versions.
- Parameters:
methods (list of str, optional) – Normalization methods to overlay. Defaults to a curated set.
method_kwargs_map (dict, optional) – {method: {kwarg: value}} for method-specific parameters.
mz_min (float) – m/z bounds for the preview window.
mz_max (float) – m/z bounds for the preview window.
sample_name (str) – Label used for file naming.
group (str or None) – Group identifier.
figsize (tuple) – Figure size.
save_plot (bool) – Persist the rendered figure.
- Return type:
- normalize_and_check(method='tic', method_kwargs=None, *, sample_name='test', group=None, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)[source]
Apply one normalization and visualise the result with peak overlay.
- Parameters:
method (str) – Normalization method.
method_kwargs (dict, optional) – Extra kwargs forwarded to normalize().
sample_name (str) – Plot labels.
group (str) – Plot labels.
mz_min (float) – m/z window for the plot.
mz_max (float) – m/z window for the plot.
show_peaks (bool) – Annotate detected peaks.
peak_height (float) – Peak detection tuning passed to PlotPeak.
peak_prominence (float) – Peak detection tuning passed to PlotPeak.
min_peak_width (int) – Peak detection tuning passed to PlotPeak.
max_peak_width (int | None) – Peak detection tuning passed to PlotPeak.
figsize (tuple)
save_plot (bool)
- Return type:
- static evaluate(files, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_jobs=-1, compute_supervised=True, save_results=True)[source]
Evaluate normalization methods across multiple spectra files.
Thin wrapper around NormalizationEvaluator that runs evaluation, prints a summary, and optionally saves results.
- Parameters:
files (list of str or Path) – Spectrum file paths or glob patterns.
method_kwargs_map (dict, optional) – Per-method keyword arguments.
mz_min (float, optional) – m/z range for data import.
mz_max (float, optional) – m/z range for data import.
n_jobs (int) – Parallel workers (-1 = all CPUs).
compute_supervised (bool) – Run supervised classification (requires scikit-learn).
save_results (bool) – Save CSV + JSON + plots to OUTPUT_DIR.
- Returns:
The evaluator instance (call .plot() for figures).
- mioXpektron.normalization.data_preprocessing(file_path, mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=True, return_all=False)[source]
Import and preprocess ToF-SIMS data from a text file.
- Parameters:
file_path (str) – Path to the ToF-SIMS data file.
mz_min (float, optional) – m/z range to import.
mz_max (float, optional) – m/z range to import.
normalization_target (float or None) – Target TIC for normalization, or None to skip.
verbose (bool) – Print progress if True.
return_all (bool) – If True, return all intermediate arrays.
- Returns:
mz_values (numpy.ndarray), normalized_intensities (numpy.ndarray), sample_name (str), group (str); optionally the intermediate arrays.
- mioXpektron.normalization.batch_tic_norm(input_pattern, output_dir='normalized_spectra', mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=False)[source]
Batch‑import and preprocess multiple ToF‑SIMS spectra, then save the (m/z, normalized_intensity) arrays for each file as a tab‑separated text file in output_dir.
- Parameters:
input_pattern (str) – Glob pattern (e.g. ‘spectra/*.txt’) that expands to the input files.
output_dir (str) – Folder where ‘<original‑name>_normalized.txt’ will be written; created if it does not already exist.
mz_min (float | None) – Passed through to data_preprocessing().
mz_max (float | None) – Passed through to data_preprocessing().
normalization_target (float | None) – Passed through to data_preprocessing().
verbose (bool) – Passed through to data_preprocessing().
- Returns:
Paths of the files written, in processing order.
- Return type:
List[str]
- class mioXpektron.normalization.BatchTicNorm(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]
Bases: object
Batch TIC normalization for multiple spectra files using Polars and concurrent.futures.
Supports both CSV and TXT file formats:
- CSV: Uses ‘corrected_intensity’ if available, otherwise ‘intensity’
- TXT: Tab-separated m/z and intensity values
Output files contain: channel, mz, intensity (normalized)
- Parameters:
- __init__(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]
Initialize BatchTicNorm processor.
- Parameters:
input_pattern (str) – Glob pattern for input files (e.g., ‘data/*.csv’ or ‘data/*.txt’)
output_dir (str) – Directory to save normalized files
normalization_target (float) – Target TIC value for normalization (default: 1e6)
n_workers (int) – Number of parallel workers (default: -1)
verbose (bool) – Print progress information