Normalization

The normalization module provides 18 normalization strategies for ToF-SIMS spectra, ranging from simple scaling (TIC, max) to variance-stabilising transforms (Poisson, sqrt, VSN), robust methods (median, robust SNV, PQN), and reference-aware strategies such as selected-ion, multi-ion reference, and mass-stratified PQN. An evaluation framework helps choose the best method for your dataset.

Quick Example

from mioXpektron import normalize

# TIC normalization (default)
normalized = normalize(intensity, method="tic", target_tic=1e6)

# Or use any of 18 available methods
normalized = normalize(intensity, method="poisson")

Available Methods

Method               Name                          Use Case
tic                  Total Ion Current             General-purpose normalization (default)
median               Median scaling                Robust when dominant peaks inflate TIC (e.g. substrate ions)
rms                  Root Mean Square              Compromise between TIC and median sensitivity
max                  Maximum normalization         Quick comparison (max value = 1)
vector               L2 vector norm                Spectral shape comparison (unit length)
snv                  Standard Normal Variate       Before PCA/PLS-DA; removes multiplicative scatter
robust_snv           Median/MAD SNV                More robust than SNV when a few dominant peaks distort the mean/std
poisson              Poisson scaling               Before PCA on count data; equalises channel weights
sqrt                 Square-root transform         Variance stabilisation for Poisson-distributed counts
log                  Log(1+x) transform            High dynamic range spectra
vsn                  arcsinh transform             Variance stabilisation; handles zeros gracefully
minmax               Min-Max scaling               Fixed range [0, 1]
selected_ion         Single-peak reference         Normalize to a known reference ion
multi_ion_reference  Robust multi-ion reference    Normalize to a stable ion panel using a median ratio across reference ions
pqn                  Probabilistic Quotient        Compositional effects; requires dataset reference
mass_stratified_pqn  Regional PQN                  Correct region-specific matrix effects across coarse m/z strata
median_of_ratios     DESeq2-style                  Multi-batch experiments; requires geometric-mean reference
pareto               Dataset-level Pareto scaling  Downstream multivariate analysis where mean-centering with reduced variance scaling is useful

Method Details

TIC Normalization — Scales each spectrum so the sum of all intensities equals a target value (default 1 million):

from mioXpektron import tic_normalization

normalized = tic_normalization(intensity, target_tic=1e6)
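
As a rough illustration of the arithmetic (not the library's implementation), the scaling can be written in plain Python. The helper name tic_scale and its handling of an all-zero spectrum are assumptions made for this sketch.

```python
def tic_scale(intensities, target_tic=1e6):
    """Multiply every channel by target_tic / sum(intensities)."""
    total = sum(intensities)
    # Assumption: leave the spectrum unchanged when there is no signal
    # or when scaling is switched off with target_tic=None.
    if target_tic is None or total <= 0:
        return list(intensities)
    factor = target_tic / total
    return [x * factor for x in intensities]


spectrum = [120.0, 30.0, 0.0, 850.0]   # raw TIC = 1000
scaled = tic_scale(spectrum)           # every channel multiplied by 1000
```

Relative peak ratios are unchanged; only the overall scale moves.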

Poisson Scaling — Divides by sqrt(mean_intensity), equalising the weight of low- and high-count channels. Nearly universal for multivariate analysis of ToF-SIMS count data:

from mioXpektron import normalize

scaled = normalize(intensity, method="poisson")
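
The division by sqrt(mean_intensity) can be sketched as follows. This is a minimal plain-Python illustration of the formula as described here, with the mean taken over the channels of one spectrum; poisson_scale is a hypothetical helper, and returning an all-zero spectrum unchanged is an assumption.

```python
import math


def poisson_scale(intensities):
    """Divide every channel by the square root of the spectrum mean."""
    mean_intensity = sum(intensities) / len(intensities)
    if mean_intensity <= 0:
        return list(intensities)      # assumption: nothing to scale
    root = math.sqrt(mean_intensity)
    return [x / root for x in intensities]


counts = [0.0, 4.0, 16.0, 44.0]       # mean = 16, sqrt(mean) = 4
scaled = poisson_scale(counts)
```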

Selected-Ion Normalization — Normalizes to a reference peak (by index or absolute intensity):

# By index
normalized = normalize(intensity, method="selected_ion", reference_idx=42)

# By absolute value
normalized = normalize(intensity, method="selected_ion",
                       reference_intensity=5000.0)
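
Both calling styles reduce to the same division. The sketch below is a plain-Python illustration, not the library code; selected_ion_scale is a hypothetical name, and giving the index precedence when both arguments are passed is an assumption.

```python
def selected_ion_scale(intensities, reference_idx=None,
                       reference_intensity=None, target=1.0):
    """Scale a spectrum so the reference intensity maps to target."""
    if reference_idx is not None:
        ref = intensities[reference_idx]   # intensity of the reference peak
    elif reference_intensity is not None:
        ref = reference_intensity          # externally supplied value
    else:
        raise ValueError("give reference_idx or reference_intensity")
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return [x * target / ref for x in intensities]


spectrum = [50.0, 200.0, 100.0]
by_index = selected_ion_scale(spectrum, reference_idx=1)
by_value = selected_ion_scale(spectrum, reference_intensity=100.0)
```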

PQN (Probabilistic Quotient Normalization) — Handles compositional effects. Requires a reference spectrum (e.g. median of the dataset):

import numpy as np
from mioXpektron import normalize

# Compute reference from all spectra
reference = np.median(all_spectra, axis=0)
normalized = normalize(intensity, method="pqn", reference=reference)
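
To show why the median quotient is robust, here is a minimal plain-Python sketch of the core PQN step. It assumes channels with a non-positive reference or sample value are skipped and omits any preliminary TIC scaling; pqn_scale is a hypothetical helper, not the library function.

```python
from statistics import median


def pqn_scale(intensities, reference):
    """Divide a spectrum by its median quotient against a reference."""
    quotients = [
        x / r
        for x, r in zip(intensities, reference)
        if r > 0 and x > 0            # assumption: ignore empty channels
    ]
    if not quotients:
        return list(intensities)
    factor = median(quotients)
    return [x / factor for x in intensities]


reference = [10.0, 20.0, 40.0, 80.0]
# Half the overall signal, plus one channel that is genuinely elevated.
sample = [5.0, 10.0, 20.0, 400.0]
normalized = pqn_scale(sample, reference)
```

The quotients are 0.5, 0.5, 0.5, and 5.0, so the median is 0.5 and the elevated channel does not drag the scale factor with it, as it would for a TIC-based factor.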

Robust SNV — Uses the median and median absolute deviation (MAD) instead of the mean and standard deviation:

normalized = normalize(intensity, method="robust_snv")
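
A minimal sketch of the median/MAD computation, in plain Python for illustration only. The factor 1.4826 matches the documented mad_scale default; returning a centred but unscaled spectrum when the MAD is zero is an assumption.

```python
from statistics import median


def robust_snv(intensities, mad_scale=1.4826):
    """Centre on the median and scale by mad_scale * MAD."""
    centre = median(intensities)
    mad = median([abs(x - centre) for x in intensities])
    spread = mad_scale * mad
    if spread == 0:
        return [x - centre for x in intensities]   # assumption
    return [(x - centre) / spread for x in intensities]


# One dominant peak (1000) barely affects the median or the MAD.
spectrum = [1.0, 2.0, 3.0, 4.0, 1000.0]
result = robust_snv(spectrum)
```

Here the median is 3 and the MAD is 1, so the four small channels land within about 1.35 units of zero while the dominant peak stays a clear outlier.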

Multi-Ion Reference — Uses a panel of stable ions and a robust median ratio across the panel:

normalized = normalize(
    intensity,
    method="multi_ion_reference",
    reference_indices=[120, 245, 910],
    reference_values=[5200.0, 1800.0, 25000.0],
)
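
The median-ratio idea can be sketched in plain Python as below. This is an illustration under assumptions: the spectrum is divided by the median observed/expected ratio, and without expected values the median observed reference intensity is mapped to target. multi_ion_scale is a hypothetical name.

```python
from statistics import median


def multi_ion_scale(intensities, reference_indices,
                    reference_values=None, target=1.0):
    """Scale a spectrum using a panel of reference ions."""
    observed = [intensities[i] for i in reference_indices]
    if reference_values is not None:
        # Median of observed/expected ratios across the panel.
        ratios = [o / r for o, r in zip(observed, reference_values) if r > 0]
        factor = median(ratios)
    else:
        # No expected values: bring the median reference ion to target.
        factor = median(observed) / target
    if factor <= 0:
        return list(intensities)
    return [x / factor for x in intensities]


spectrum = [100.0, 2600.0, 50.0, 1260.0, 12500.0]
panel = [1, 3, 4]
scaled = multi_ion_scale(spectrum, panel, [5200.0, 1800.0, 25000.0])
unit = multi_ion_scale(spectrum, panel)
```

The panel ratios are 0.5, 0.7, and 0.5; the median (0.5) ignores the one ion that drifted.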

Mass-Stratified PQN — Applies PQN within coarse m/z windows while still using a dataset-level reference spectrum:

normalized = normalize(
    intensity,
    method="mass_stratified_pqn",
    mz_values=mz,
    reference=reference,
    strata=[(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))],
)
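
A simplified sketch of the per-window idea, in plain Python. It treats each stratum as a half-open window [lo, hi), computes one median quotient per window, and leaves out the global TIC step; stratified_pqn is a hypothetical helper, not the library routine.

```python
from statistics import median


def stratified_pqn(intensities, mz_values, reference, strata):
    """Apply a separate median-quotient factor inside each m/z window."""
    out = list(intensities)
    for lo, hi in strata:
        idx = [i for i, mz in enumerate(mz_values) if lo <= mz < hi]
        quotients = [
            intensities[i] / reference[i]
            for i in idx
            if reference[i] > 0 and intensities[i] > 0
        ]
        if not quotients:
            continue                  # empty window: leave values as they are
        factor = median(quotients)
        for i in idx:
            out[i] = intensities[i] / factor
    return out


mz = [12.0, 28.0, 150.0, 220.0, 450.0]
reference = [100.0, 200.0, 50.0, 80.0, 10.0]
# Low-mass region suppressed by half, mid-mass region enhanced twofold.
sample = [50.0, 100.0, 100.0, 160.0, 10.0]
strata = [(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))]
corrected = stratified_pqn(sample, mz, reference, strata)
```

A single global factor could not undo both distortions at once; the per-window factors (0.5, 2.0, 1.0) can.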

Pareto Scaling — Requires dataset-level mean and standard deviation and is most useful for multivariate modelling rather than raw signal correction:

normalized = normalize(
    intensity,
    method="pareto",
    mean=dataset_mean,
    std=dataset_std,
)
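
The dataset_mean and dataset_std arrays are per-channel statistics over all spectra. The sketch below shows one way to obtain them and apply the (x - mean) / sqrt(std) formula in plain Python. Using the sample standard deviation and applying eps as a floor are assumptions; pareto_scale is a hypothetical helper.

```python
import math
from statistics import mean, stdev


def pareto_scale(intensities, feature_mean, feature_std, eps=1e-12):
    """Mean-centre each channel and divide by sqrt of its dataset std."""
    return [
        (x - m) / math.sqrt(max(s, eps))   # eps floor avoids division by zero
        for x, m, s in zip(intensities, feature_mean, feature_std)
    ]


# Three spectra (rows) measured on two channels (columns).
dataset = [
    [10.0, 100.0],
    [14.0, 300.0],
    [18.0, 500.0],
]
columns = list(zip(*dataset))
dataset_mean = [mean(c) for c in columns]    # per-channel mean
dataset_std = [stdev(c) for c in columns]    # per-channel sample std
scaled = pareto_scale(dataset[0], dataset_mean, dataset_std)
```

The second channel varies 50 times more than the first (std 200 versus 4) but ends up only about 7 times larger after scaling, which is the intended softening compared with leaving the data unscaled.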

Unified Dispatcher

All methods are accessible through the normalize() function:

from mioXpektron import normalize, normalization_method_names

# List available methods
print(normalization_method_names())
# ['log', 'mass_stratified_pqn', 'max', 'median', 'median_of_ratios',
#  'minmax', 'multi_ion_reference', 'pareto', 'poisson', 'pqn',
#  'rms', 'robust_snv', 'selected_ion', 'snv', 'sqrt', 'tic',
#  'vector', 'vsn']

# Apply any method
result = normalize(intensity, method="rms", target_rms=1.0)

Method Evaluation

The NormalizationEvaluator compares methods on your actual data using unsupervised, supervised, and spectral-quality metrics — following the approach from xpectrass adapted for ToF-SIMS:

from mioXpektron import NormalizationEvaluator

evaluator = NormalizationEvaluator(
    files=["spectra/"],
    methods=["tic", "robust_snv", "pqn", "mass_stratified_pqn", "log"],
    method_kwargs_map={
        "mass_stratified_pqn": {
            "strata": [(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))],
        },
    },
)

# Run evaluation
results = evaluator.evaluate()

# Print ranked summary
evaluator.print_summary()

# Generate plots
evaluator.plot()

NormalizationEvaluator accepts directories, file paths, and glob patterns. When a directory is given it will load both traditional .txt spectra and baseline-corrected .csv spectra.

Notebook Workflow

For baseline-corrected CSV cohorts with non-identical native m/z axes, use the repository notebook NoteBooks/_06_Normalization.ipynb. It:

  • builds a shared m/z grid before evaluation

  • supports linear, pchip, akima, makima (SciPy >= 1.13), and cubic resampling

  • includes mass_stratified_pqn in the default evaluation set

  • enables multi_ion_reference when multi_ion_reference_mz is provided

  • exports the winning method to a timestamped normalized output folder

Metrics computed by NormalizationEvaluator:

  • CV of TIC — Coefficient of variation of total-ion current (lower = more consistent normalization)

  • Within-group SAM — Spectral Angle Mapper between technical replicates (lower = more similar shapes)

  • Within-group correlation — Mean Pearson correlation within groups (higher = better consistency)

  • Clustering — Adjusted Rand Index, NMI, silhouette, and stability via KMeans

  • Supervised — Macro F1 and balanced accuracy via stratified cross-validation (requires scikit-learn and ≥2 groups)
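
Two of the metrics above are simple enough to sketch directly. The plain-Python functions below illustrate the definitions (coefficient of variation of the TIC, and the spectral angle between two spectra); they are not the evaluator's code, and the names cv_of_tic and spectral_angle are hypothetical.

```python
import math
from statistics import mean, pstdev


def cv_of_tic(spectra):
    """Std of the per-spectrum TIC divided by its mean."""
    tics = [sum(s) for s in spectra]
    return pstdev(tics) / mean(tics)


def spectral_angle(a, b):
    """Angle in radians between two spectra viewed as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    # Clamp to guard against rounding just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


same_shape = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = spectral_angle([1.0, 0.0], [0.0, 1.0])
spread = cv_of_tic([[1.0, 1.0], [2.0, 4.0]])   # TICs are 2 and 6
```

Spectra that differ only by a scale factor have an angle of (numerically almost) zero, which is why SAM measures shape rather than intensity.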

Composite scores (z-scored, higher = better):

  • score_combined — Balanced across all metric categories (recommended)

  • score_unsupervised — For unlabelled data

  • score_supervised — Classification-focused

  • score_efficient — Includes computation time
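
One plausible way to build such a z-scored composite is sketched below. The equal weighting, the sign flip for lower-is-better metrics, and the metric values themselves are illustrative assumptions, not the evaluator's actual weighting or measured results.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def combined_score(metrics, lower_is_better):
    """Average the z-scored metrics so that higher always means better."""
    n_methods = len(next(iter(metrics.values())))
    totals = [0.0] * n_methods
    for name, values in metrics.items():
        sign = -1.0 if name in lower_is_better else 1.0
        for i, z in enumerate(zscores(values)):
            totals[i] += sign * z
    return [t / len(metrics) for t in totals]


methods = ["tic", "pqn", "log"]
metrics = {                              # made-up numbers for illustration
    "cv_tic": [0.30, 0.10, 0.20],        # lower is better
    "within_group_corr": [0.80, 0.95, 0.90],
}
scores = combined_score(metrics, lower_is_better={"cv_tic"})
best = methods[scores.index(max(scores))]
```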

Visual Comparison

Preview how different methods affect a single spectrum:

from mioXpektron import NormalizationMethods

nm = NormalizationMethods(mz_values, raw_intensity)

# Side-by-side comparison
nm.compare_visual(
    methods=["tic", "median", "poisson", "sqrt"],
    mz_min=0, mz_max=200,
)

# Apply one method and visualise with peak overlay
nm.normalize_and_check(
    method="poisson",
    show_peaks=True,
    mz_min=0, mz_max=200,
)

Data Preprocessing

Combined import, filtering, and normalization in one step:

from mioXpektron import data_preprocessing

sample_name, group, mz, normalized = data_preprocessing(
    "spectrum.txt",
    mz_min=1.0,
    mz_max=300.0,
    normalization_target=1e6,
)

Batch Normalization

Functional Interface

from mioXpektron import batch_tic_norm

written_files = batch_tic_norm(
    "spectra/*.txt",
    output_dir="normalized/",
    normalization_target=1e6,
)

Class-Based Interface

For more control with Polars-based parallel processing:

from mioXpektron import BatchTicNorm

normalizer = BatchTicNorm(
    input_pattern="spectra/*.csv",
    output_dir="normalized/",
    normalization_target=1e6,
    n_workers=4,
)

# View TIC statistics before processing
stats = normalizer.get_tic_statistics()

# Run batch normalization
output_files = normalizer.process()

Batch Evaluation

Evaluate methods across an entire dataset:

from mioXpektron import NormalizationMethods

evaluator = NormalizationMethods.evaluate(
    files=["spectra/*.txt"],
    methods=["tic", "median", "rms", "poisson", "sqrt", "vsn"],
    n_jobs=-1,  # use all CPUs
)

# The returned evaluator has .plot() and .print_summary()
evaluator.plot()

API Reference

mioXpektron.normalization.normalize(intensities, method='tic', **kwargs)

Apply a named normalization method to a 1-D intensity array.

Parameters:
  • intensities (array-like) – Raw intensity values (1-D).

  • method (str, default "tic") – Name of the normalization method. Call normalization_method_names() for the full list.

  • **kwargs – Method-specific keyword arguments forwarded to the underlying function (e.g. target_tic for TIC, reference_idx for selected-ion normalization).

Returns:

Normalized intensity values.

Return type:

np.ndarray

Raises:

ValueError – If method is not recognised.

mioXpektron.normalization.normalization_method_names()

Return a sorted list of available 1-D normalization method names.

Return type:

List[str]

mioXpektron.normalization.tic_normalization(intensities, target_tic=1000000.0)

Scale intensities so the total-ion current equals target_tic.

This is the most common normalisation in ToF-SIMS. Each spectrum is multiplied by target_tic / sum(intensities) so that all spectra share the same TIC.

Parameters:
  • intensities (array-like) – Raw ion counts or intensities.

  • target_tic (float or None) – Desired total-ion current after scaling. Pass None to skip.

Return type:

np.ndarray

mioXpektron.normalization.median_normalization(intensities, target_median=1.0)

Scale intensities so the median equals target_median.

More robust than TIC when a few dominant peaks (e.g. substrate ions) inflate the total-ion current.

Parameters:
  • intensities (array-like)

  • target_median (float, default 1.0)

Return type:

np.ndarray

mioXpektron.normalization.rms_normalization(intensities, target_rms=1.0)

Scale intensities so the root-mean-square equals target_rms.

A compromise between TIC (dominated by big peaks) and median (ignores peak structure).

Parameters:
  • intensities (array-like)

  • target_rms (float, default 1.0)

Return type:

np.ndarray

mioXpektron.normalization.max_normalization(intensities)

Scale intensities so the maximum value equals 1.

Parameters:

intensities (array-like)

Return type:

np.ndarray

mioXpektron.normalization.vector_normalization(intensities)

Scale intensities to unit L2 norm (vector length = 1).

Useful for comparing spectral shape irrespective of total signal.

Parameters:

intensities (array-like)

Return type:

np.ndarray

mioXpektron.normalization.snv_normalization(intensities)

Standard Normal Variate: centre and scale to unit variance.

Commonly used before multivariate analysis (PCA, PLS-DA) to remove multiplicative scatter effects.

Parameters:

intensities (array-like)

Returns:

Mean-centred, variance-scaled spectrum. Note: values can be negative, which is expected for SNV.

Return type:

np.ndarray

mioXpektron.normalization.robust_snv_normalization(intensities, mad_scale=1.4826)

Robust SNV using median and MAD instead of mean and standard deviation.

This is less sensitive to a few dominant ions than classical SNV and is therefore a better fit when substrate/matrix peaks dominate part of the spectrum.

Parameters:
  • intensities (array-like)

  • mad_scale (float, default 1.4826) – Consistency factor turning MAD into a robust standard deviation estimate for approximately Gaussian data.

Returns:

Median-centred, MAD-scaled spectrum. Negative values are expected.

Return type:

np.ndarray

mioXpektron.normalization.poisson_scaling(intensities)

Poisson (square-root mean) scaling for count data.

Each channel is divided by sqrt(mean_intensity) across the spectrum. This equalises the weight of low- and high-count channels when ToF-SIMS data follow Poisson statistics. Widely used before PCA.

Parameters:

intensities (array-like)

Return type:

np.ndarray

mioXpektron.normalization.sqrt_normalization(intensities)

Square-root variance-stabilising transform.

sqrt(intensity) stabilises the variance of Poisson-distributed ion counts. Often combined with mean-centering before PCA.

Parameters:

intensities (array-like)

Return type:

np.ndarray

mioXpektron.normalization.log_normalization(intensities, pseudo_count=1.0)

Log(1 + intensity) transform for high-dynamic-range spectra.

Parameters:
  • intensities (array-like)

  • pseudo_count (float, default 1.0) – Added before taking the log to avoid log(0).

Return type:

np.ndarray

mioXpektron.normalization.vsn_normalization(intensities)

Variance-stabilising normalization via arcsinh transform.

arcsinh(x) behaves like log(2x) for large values but handles zeros and small values gracefully. Suitable for high-dynamic-range ToF-SIMS spectra.

Parameters:

intensities (array-like)

Return type:

np.ndarray
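
The log(2x) behaviour follows from arcsinh(x) = ln(x + sqrt(x^2 + 1)). A short standard-library check, given as an illustration only (arcsinh_transform is a hypothetical stand-in, not the library function):

```python
import math


def arcsinh_transform(intensities):
    """Element-wise arcsinh of a spectrum."""
    return [math.asinh(x) for x in intensities]


transformed = arcsinh_transform([0.0, 1.0, 1000.0])

# Gap between arcsinh(x) and log(2x); it shrinks roughly as 1 / (4 x^2).
gap_small = math.asinh(1.0) - math.log(2.0)
gap_large = math.asinh(1000.0) - math.log(2000.0)
```

Zero maps to zero (no pseudo-count needed), while large counts are compressed almost exactly like a logarithm.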

mioXpektron.normalization.minmax_normalization(intensities, feature_range=(0.0, 1.0))

Scale intensities to a fixed range (default [0, 1]).

Parameters:
  • intensities (array-like)

  • feature_range (tuple of float, default (0.0, 1.0))

Return type:

np.ndarray

mioXpektron.normalization.selected_ion_normalization(intensities, reference_idx=None, reference_intensity=None, target=1.0)

Normalise to a single reference peak (e.g. substrate or matrix ion).

Provide either reference_idx (index into the intensity array) or reference_intensity (the absolute value to divide by).

Parameters:
  • intensities (array-like)

  • reference_idx (int, optional) – Index of the reference peak in intensities.

  • reference_intensity (float, optional) – Absolute intensity value to normalise against.

  • target (float, default 1.0) – Target value for the reference peak after normalisation.

Return type:

np.ndarray

mioXpektron.normalization.multi_ion_reference_normalization(intensities, reference_indices=None, reference_values=None, target=1.0)

Normalize using multiple reference ions and a robust median ratio.

Parameters:
  • intensities (array-like)

  • reference_indices (sequence of int) – Indices of stable reference ions in the spectrum.

  • reference_values (sequence of float, optional) – Expected intensities for the same reference ions. When provided the spectrum is scaled by the median observed/reference ratio. When omitted, the median observed intensity is scaled to target.

  • target (float, default 1.0) – Target robust centre when reference_values is omitted.

Return type:

np.ndarray

mioXpektron.normalization.pqn_normalization(intensities, reference=None)

Probabilistic Quotient Normalization.

Designed for compositional data where a few species dominate. Divides each channel by the median quotient relative to a reference spectrum.

Parameters:
  • intensities (array-like)

  • reference (array-like or None) – Reference spectrum (e.g. median of a dataset). If None, falls back to TIC normalization with a warning.

Return type:

np.ndarray

mioXpektron.normalization.mass_stratified_pqn_normalization(intensities, mz_values=None, reference=None, strata=None)

Apply PQN separately across coarse m/z strata.

This keeps a global TIC-normalised baseline while estimating local PQN size factors for different m/z regions.

Parameters:
  • intensities (array-like)

  • mz_values (array-like) – m/z axis shared with intensities.

  • reference (array-like) – Dataset-level reference spectrum on the same m/z grid.

  • strata (sequence of tuple(float, float), optional) – Half-open m/z windows [(lo, hi), ...], with lo inclusive and hi exclusive. Defaults to [(0, 100), (100, 400), (400, inf)].

Return type:

np.ndarray

mioXpektron.normalization.median_of_ratios_normalization(intensities, reference=None)

DESeq2-style median-of-ratios normalization.

Takes the dataset's geometric-mean spectrum as reference, then normalises the sample by its median ratio to that reference. Robust to compositional effects.

Parameters:
  • intensities (array-like)

  • reference (array-like or None) – Pre-computed geometric-mean reference. If None, falls back to TIC normalization with a warning.

Return type:

np.ndarray
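
Because the reference must be pre-computed, a sketch of how a geometric-mean reference could be built and applied may help. This plain-Python illustration follows the DESeq2 convention of excluding channels that are zero in any spectrum; both helper names are hypothetical and this is not the library's implementation.

```python
import math
from statistics import median


def geometric_mean_reference(spectra):
    """Per-channel geometric mean; 0.0 marks channels with any zero."""
    reference = []
    for column in zip(*spectra):
        if all(v > 0 for v in column):
            log_mean = sum(math.log(v) for v in column) / len(column)
            reference.append(math.exp(log_mean))
        else:
            reference.append(0.0)
    return reference


def median_of_ratios(intensities, reference):
    """Divide a spectrum by its median ratio to the reference."""
    ratios = [x / r for x, r in zip(intensities, reference) if r > 0]
    factor = median(ratios) if ratios else 0.0
    if factor <= 0:
        return list(intensities)
    return [x / factor for x in intensities]


# Two spectra with the same composition but a fourfold intensity difference.
spectra = [[1.0, 4.0, 10.0], [4.0, 16.0, 40.0]]
reference = geometric_mean_reference(spectra)
normalized = [median_of_ratios(s, reference) for s in spectra]
```

Both spectra collapse onto the reference, approximately [2, 8, 20], since they differ only by a size factor.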

mioXpektron.normalization.pareto_normalization(intensities, mean=None, std=None, eps=1e-12)

Pareto scale a spectrum using dataset-level feature statistics.

Pareto scaling is a dataset-level transform commonly used before PCA: each feature is mean-centred and divided by sqrt(std_feature). This down-weights very intense ions less aggressively than autoscaling while still reducing dominance by a few channels.

Parameters:
  • intensities (array-like)

  • mean (array-like) – Per-feature dataset mean with the same shape as intensities.

  • std (array-like) – Per-feature dataset standard deviation with the same shape as intensities.

  • eps (float, default 1e-12) – Numerical floor preventing division by zero.

Returns:

Mean-centred, Pareto-scaled spectrum. Negative values are expected.

Return type:

np.ndarray

Raises:

ValueError – If dataset-level mean/std arrays are not provided.

class mioXpektron.normalization.NormalizationEvaluator(files=<factory>, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True, n_jobs=-1, group_patterns=None, group_fn=None)

Bases: object

Evaluate normalization methods on labelled ToF-SIMS spectra.

Parameters:
  • files (list of str or Path) – Paths or glob patterns expanding to spectrum text files.

  • methods (list of str, optional) – Normalization method names. Defaults to a sensible subset.

  • method_kwargs_map (dict, optional) – {method_name: {kwarg: value, ...}} for method-specific params.

  • mz_min (float, optional) – Lower bound of the m/z range to import.

  • mz_max (float, optional) – Upper bound of the m/z range to import.

  • n_clusters (int, optional) – Number of clusters for KMeans evaluation. Auto-detected if omitted.

  • cluster_bootstrap_rounds (int) – Bootstrap rounds for stability metric.

  • random_state (int) – RNG seed for reproducibility.

  • compute_supervised (bool) – Run supervised classification (requires scikit-learn + >=2 groups).

  • n_jobs (int) – Parallel workers (joblib). -1 = all CPUs, 1 = sequential.

  • cluster_bootstrap_frac (float)

  • group_patterns (Dict[str, str] | None)

  • group_fn (Any | None)

Examples

>>> evaluator = NormalizationEvaluator(files=["data/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()

files: List[str | Path]
methods: List[str] | None = None
method_kwargs_map: Dict[str, Dict[str, Any]] | None = None
mz_min: float | None = None
mz_max: float | None = None
n_clusters: int | None = None
cluster_bootstrap_rounds: int = 30
cluster_bootstrap_frac: float = 0.8
random_state: int = 0
compute_supervised: bool = True
n_jobs: int = -1
group_patterns: Dict[str, str] | None = None
group_fn: Any | None = None

evaluate()

Evaluate all methods and return a scored DataFrame.

Returns:

One row per method, sorted by score_combined (descending). Includes raw metrics, z-scored metrics, and four composite scores.

Return type:

pd.DataFrame

plot(out_dir='normalization_selection_output', save=True)

Generate evaluation plots (box plots, bar charts, radar).

Parameters:
  • out_dir (str or Path) – Sub-folder inside OUTPUT_DIR for saved figures.

  • save (bool) – Persist plots as PNG + PDF.

Returns:

Saved file paths.

Return type:

list of Path

print_summary(top_n=5)

Print a ranked summary of evaluation results.

Parameters:

top_n (int, default 5) – Number of top methods to display per score variant.

Return type:

None

preview_overlay(file, methods=None, max_methods=5, mz_min=None, mz_max=None, save_to='normalization_selection_output')

Plot raw vs normalised overlays for quick visual comparison.

Parameters:
  • file (str or Path) – Single spectrum file to visualise.

  • methods (list of str, optional) – Methods to overlay. Defaults to top methods from evaluation.

  • max_methods (int) – Cap on the number of overlays.

  • mz_min (float, optional) – Lower bound of the m/z window for the plot.

  • mz_max (float, optional) – Upper bound of the m/z window for the plot.

  • save_to (str, Path, or None) – Save directory (relative to OUTPUT_DIR). None skips saving.

Return type:

None

class mioXpektron.normalization.NormalizationMethods(mz_values, raw_intensities)

Bases: object

Evaluate and apply normalization strategies for ToF-SIMS data.

Parameters:
  • mz_values (array-like) – The m/z axis shared by all spectra.

  • raw_intensities (array-like) – Raw intensity values aligned with mz_values.

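apply(method='tic', **kwargs)

Apply a named normalization to the stored spectrum.

Parameters:
  • method (str, default "tic") – Name of the normalization method.

  • **kwargs – Method-specific keyword arguments.

Returns: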

Normalized intensity array.

Return type:

np.ndarray

compare_visual(methods=None, method_kwargs_map=None, mz_min=0, mz_max=500, sample_name='test', group=None, figsize=(12, 8), save_plot=True)

Plot the raw spectrum alongside several normalized versions.

Parameters:
  • methods (list of str, optional) – Normalization methods to overlay. Defaults to a curated set.

  • method_kwargs_map (dict, optional) – {method: {kwarg: value}} for method-specific parameters.

  • mz_min (float) – Lower m/z bound of the preview window.

  • mz_max (float) – Upper m/z bound of the preview window.

  • sample_name (str) – Label used for file naming.

  • group (str or None) – Group identifier.

  • figsize (tuple) – Figure size.

  • save_plot (bool) – Persist the rendered figure.

Return type:

matplotlib.axes.Axes

normalize_and_check(method='tic', method_kwargs=None, *, sample_name='test', group=None, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)

Apply one normalization and visualise the result with peak overlay.

Parameters:
  • method (str) – Normalization method.

  • method_kwargs (dict, optional) – Extra kwargs forwarded to normalize().

  • sample_name (str) – Plot labels.

  • group (str) – Plot labels.

  • mz_min (float) – Lower bound of the m/z window for the plot.

  • mz_max (float) – Upper bound of the m/z window for the plot.

  • show_peaks (bool) – Annotate detected peaks.

  • peak_height (float) – Peak detection tuning passed to PlotPeak.

  • peak_prominence (float) – Peak detection tuning passed to PlotPeak.

  • min_peak_width (int) – Peak detection tuning passed to PlotPeak.

  • max_peak_width (int | None) – Peak detection tuning passed to PlotPeak.

  • figsize (tuple)

  • save_plot (bool)

Return type:

matplotlib.axes.Axes

static evaluate(files, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_jobs=-1, compute_supervised=True, save_results=True)

Evaluate normalization methods across multiple spectra files.

Thin wrapper around NormalizationEvaluator that runs evaluation, prints a summary, and optionally saves results.

Parameters:
  • files (list of str or Path) – Spectrum file paths or glob patterns.

  • methods (list of str, optional) – Method names to evaluate.

  • method_kwargs_map (dict, optional) – Per-method keyword arguments.

  • mz_min (float, optional) – Lower bound of the m/z range for data import.

  • mz_max (float, optional) – Upper bound of the m/z range for data import.

  • n_jobs (int) – Parallel workers (-1 = all CPUs).

  • compute_supervised (bool) – Run supervised classification (requires scikit-learn).

  • save_results (bool) – Save CSV + JSON + plots to OUTPUT_DIR.

Returns:

The evaluator instance (call .plot() for figures).

Return type:

NormalizationEvaluator

static available_methods()

Return sorted list of available normalization method names.

Return type:

List[str]

mioXpektron.normalization.data_preprocessing(file_path, mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=True, return_all=False)

Import and preprocess ToF-SIMS data from a text file.

Parameters:

  • file_path (str) – Path to the ToF-SIMS data file.

  • mz_min (float, optional) – m/z range to import.

  • mz_max (float, optional) – m/z range to import.

  • normalization_target (float or None) – Target TIC for normalization, or None to skip.

  • verbose (bool) – Print progress if True.

  • return_all (bool) – If True, return all intermediate arrays.

Returns:

  • mz_values (numpy.ndarray)

  • normalized_intensities (numpy.ndarray)

  • sample_name (str)

  • group (str)

  • optionally, the intermediate arrays

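mioXpektron.normalization.batch_tic_norm(input_pattern, output_dir='normalized_spectra', mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=False)

Batch-import and preprocess multiple ToF-SIMS spectra, then save the (m/z, normalized_intensity) arrays for each file as a tab-separated text file in output_dir.

Parameters:
  • input_pattern (str) – Glob pattern for input files.

  • output_dir (str, default "normalized_spectra") – Directory for the normalized files.

  • mz_min (float, optional) – Lower bound of the m/z range to import.

  • mz_max (float, optional) – Upper bound of the m/z range to import.

  • normalization_target (float, default 1e6) – Target TIC for normalization.

  • verbose (bool, default False) – Print progress if True.

Returns: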

Paths of the files written, in processing order.

Return type:

List[str]

class mioXpektron.normalization.BatchTicNorm(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)

Bases: object

Batch TIC normalization for multiple spectra files using Polars and concurrent.futures.

Supports both CSV and TXT file formats:

  • CSV: Uses ‘corrected_intensity’ if available, otherwise ‘intensity’

  • TXT: Tab-separated m/z and intensity values

Output files contain: channel, mz, intensity (normalized)

Parameters:
  • input_pattern (str)

  • output_dir (str)

  • normalization_target (float)

  • n_workers (int)

  • verbose (bool)

__init__(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)

Initialize BatchTicNorm processor.

Parameters:
  • input_pattern (str) – Glob pattern for input files (e.g., ‘data/*.csv’ or ‘data/*.txt’)

  • output_dir (str) – Directory to save normalized files

  • normalization_target (float) – Target TIC value for normalization (default: 1e6)

  • n_workers (int) – Number of parallel workers (default: -1)

  • verbose (bool) – Print progress information

process()

Process all files using concurrent.futures.

Returns:

List of output file paths that were successfully created

Return type:

List[str]

get_tic_statistics()

Calculate TIC statistics for all input files before normalization.

Returns:

DataFrame with columns: filename, tic_original, tic_million

Return type:

pl.DataFrame