Pipeline Reference

The mioXpektron pipeline provides end-to-end batch processing of ToF-SIMS spectra. It chains recalibration, denoising, baseline correction, normalization, peak detection, and cross-sample alignment into a single call.

Pipeline Steps

The pipeline executes these steps in order:

  1. Recalibration (optional) — convert channel numbers to m/z values using reference masses and a TOF model.

  2. Denoising — reduce noise with wavelet, Gaussian, median, or Savitzky-Golay filters.

  3. Baseline correction — remove broad background with AirPLS, AsLS, or other pybaselines methods.

  4. TIC normalization — scale spectra to a common Total Ion Current.

  5. Peak detection — find peaks using local maximum or CWT algorithms, with automatic noise estimation.

  6. Alignment — align detected peaks across samples by m/z tolerance, producing unified intensity and area matrices.
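Here is a minimal, self-contained sketch of steps 2 to 5 applied to a single NumPy intensity array. It is not mioXpektron's implementation. It assumes Savitzky-Golay smoothing (one of the listed denoising options), a textbook asymmetric least squares (AsLS) baseline written directly with SciPy instead of pybaselines, TIC scaling, and SciPy's find_peaks local-maximum search with a MAD-based noise estimate. The helper names asls_baseline and process_spectrum, and every default parameter value, are hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.signal import find_peaks, savgol_filter

# Hypothetical sketch of pipeline steps 2-5; not the mioXpektron implementation.


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline: a smooth curve hugging the lower envelope."""
    n = y.size
    # Second-difference operator D, shape (n - 2, n); lam * D^T D penalizes curvature
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the current baseline (likely peaks) get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z


def process_spectrum(intensity, window=11, polyorder=3, tic_target=1e6, min_snr=5.0):
    """Denoise, subtract the baseline, TIC-normalize, and detect peaks in one spectrum."""
    y = np.asarray(intensity, dtype=float)
    smoothed = savgol_filter(y, window, polyorder)        # step 2: denoising
    corrected = smoothed - asls_baseline(smoothed)        # step 3: baseline removal
    # Noise floor and robust noise level (scaled MAD), measured before clipping
    floor = np.median(corrected)
    noise = 1.4826 * np.median(np.abs(corrected - floor))
    corrected = np.clip(corrected, 0.0, None)             # cf. clip_negative_after_baseline
    total = corrected.sum()
    if total <= 0:
        raise ValueError("no positive signal left after baseline correction")
    scale = tic_target / total
    normalized = corrected * scale                        # step 4: TIC normalization
    # Step 5: local maxima at least min_snr noise units above the floor (rescaled)
    peaks, _ = find_peaks(normalized, height=(floor + min_snr * noise) * scale)
    return normalized, peaks
```

Recalibration (step 1) and alignment (step 6) are left out of this sketch because they act on the m/z axis and across samples rather than on one intensity trace.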

Configuration

from mioXpektron import PipelineConfig

config = PipelineConfig(
    # Recalibration
    use_recalibration=True,
    reference_masses=[1.008, 22.99, 38.96, 58.07],
    output_folder_calibrated="calibrated_spectra",

    # Denoising
    denoise_method="wavelet",     # wavelet | gaussian | median | savitzky_golay | none
    denoise_params=None,          # dict of method-specific keyword arguments

    # Baseline
    baseline_method="airpls",
    baseline_params=None,
    clip_negative_after_baseline=True,

    # Normalization
    normalization_target=1e6,

    # Peak alignment
    mz_min=None,                  # optional m/z range filter
    mz_max=None,
    mz_tolerance=0.2,             # Da tolerance for cross-sample alignment
    mz_rounding_precision=1,

    # Parallelism
    max_workers=None,             # None = use all available cores

    # Adaptive parameterization (opt-in)
    auto_tune=False,              # True = derive mz_tolerance and normalization_target from data
)

Running the Pipeline

from mioXpektron import run_pipeline

files = ["sample_01.txt", "sample_02.txt", "sample_03.txt"]

intensity_df, area_df = run_pipeline(files, config=config)
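For larger batches you may prefer to build the file list programmatically. The sketch below uses only the standard library; collect_spectra is a hypothetical helper, and it assumes, as the example above does, that run_pipeline accepts a list of path strings. Sorting keeps the file order reproducible between runs.

```python
from pathlib import Path


def collect_spectra(folder, pattern="*.txt"):
    """Hypothetical helper: sorted list of spectrum files in folder matching pattern."""
    return sorted(str(path) for path in Path(folder).glob(pattern))


# Usage (assumes the spectra are .txt files in a folder named "spectra"):
# files = collect_spectra("spectra")
# intensity_df, area_df = run_pipeline(files, config=config)
```

If you also pass calib_channels_dict, note that the calibration example keys it by exactly the strings that appear in files, so keep the two consistent.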

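To recalibrate, also pass calib_channels_dict, a dictionary that maps each file name to the channel numbers of its reference peaks (here, four channels matching the four reference_masses set in the configuration above):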

calib_channels = {
    "sample_01.txt": [100, 500, 1000, 2000],
    "sample_02.txt": [101, 502, 998, 2001],
}

intensity_df, area_df = run_pipeline(
    files,
    calib_channels_dict=calib_channels,
    config=config,
)
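Each calib_channels_dict entry pairs a file's observed channels with reference_masses. One common way to turn such pairs into an m/z axis is the quadratic TOF relation, in which flight time (and therefore channel number) is linear in the square root of m/z. The sketch below fits that relation by least squares. It is an assumption for illustration only: mioXpektron's own TOF model may include additional terms, and fit_tof_calibration and channels_to_mz are hypothetical names.

```python
import numpy as np

# Hypothetical sketch of a quadratic TOF calibration; the library's model may differ.


def fit_tof_calibration(channels, reference_masses):
    """Least-squares fit of sqrt(m/z) = a * channel + b; returns (a, b)."""
    channels = np.asarray(channels, dtype=float)
    masses = np.asarray(reference_masses, dtype=float)
    if channels.shape != masses.shape:
        raise ValueError("need exactly one channel per reference mass")
    a, b = np.polyfit(channels, np.sqrt(masses), deg=1)
    return a, b


def channels_to_mz(channels, a, b):
    """Convert channel numbers to m/z with the fitted (a, b)."""
    return (a * np.asarray(channels, dtype=float) + b) ** 2


# Usage with the objects defined above:
# a, b = fit_tof_calibration(calib_channels["sample_01.txt"], config.reference_masses)
```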

Output Format

The pipeline returns two pandas.DataFrame objects:

intensity_df

Rows = m/z values (aligned across samples), columns = sample names. Values are peak intensities after processing.

area_df

Same structure as intensity_df but values are integrated peak areas.

Both DataFrames share the same m/z index, making them ready for downstream statistical analysis.
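How the shared m/z index is built is not specified here. As an illustration only, the sketch below assumes a simple greedy approach: pool every sample's peaks, sort them by m/z, start a new row whenever the gap to the previous peak exceeds mz_tolerance, and round each row's mean m/z to mz_rounding_precision decimals. align_peaks and its input format are hypothetical; passing peak areas instead of intensities would give the area_df counterpart.

```python
import numpy as np
import pandas as pd


def align_peaks(peak_tables, mz_tolerance=0.2, rounding=1):
    """Hypothetical greedy alignment of per-sample peak lists into one matrix.

    peak_tables maps sample name -> (array of peak m/z, array of peak values).
    Returns a DataFrame indexed by aligned m/z with one column per sample.
    """
    # Pool every peak as (mz, sample, value) and sort by m/z
    pooled = sorted(
        (float(mz), name, float(value))
        for name, (mzs, values) in peak_tables.items()
        for mz, value in zip(mzs, values)
    )
    clusters = []
    for peak in pooled:
        # Join the current row if the gap to the previous peak is within tolerance
        if clusters and peak[0] - clusters[-1][-1][0] <= mz_tolerance:
            clusters[-1].append(peak)
        else:
            clusters.append([peak])

    centers, records = [], []
    for cluster in clusters:
        centers.append(np.mean([mz for mz, _, _ in cluster]))
        record = {}
        for _, name, value in cluster:
            # If one sample has two peaks in the same row, add them together
            record[name] = record.get(name, 0.0) + value
        records.append(record)

    index = pd.Index(np.round(centers, rounding), name="mz")
    frame = pd.DataFrame(records, index=index, columns=list(peak_tables))
    # Samples without a peak in a row get 0 instead of NaN
    return frame.fillna(0.0)
```

Because rows are split on gaps between neighboring peaks, a dense run of peaks, each within mz_tolerance of the next, can merge into one row wider than mz_tolerance; a stricter variant would cap the total width of each row.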

Adaptive Parameterization

Set auto_tune=True to let the pipeline derive mz_tolerance and normalization_target from the data instead of using hardcoded defaults:

config = PipelineConfig(auto_tune=True)
intensity_df, area_df = run_pipeline(files, config=config)

When auto_tune is active the pipeline:

  1. Estimates mz_tolerance from the median m/z spacing in a pilot sample.

  2. Estimates normalization_target from median raw TIC across the batch.

All other parameters keep their defaults and can still be overridden manually. See Adaptive Parameterization for details on each estimator.
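The two estimators can be pictured as in the sketch below. It is an assumption, not the library's code: it takes the median gap between adjacent m/z values of the pilot sample, multiplied by a hypothetical factor, and the median of per-spectrum sums of raw intensities. The function names and the factor parameter are hypothetical. A median keeps a single unusually bright or sparse sample from skewing either estimate.

```python
import numpy as np

# Hypothetical sketches of the two auto_tune estimators.


def estimate_mz_tolerance(pilot_mz, factor=1.0):
    """Median gap between adjacent m/z values of one pilot sample, times factor."""
    mz = np.sort(np.asarray(pilot_mz, dtype=float))
    if mz.size < 2:
        raise ValueError("need at least two m/z values to measure spacing")
    # factor is an illustrative knob; the library may scale the spacing differently
    return factor * float(np.median(np.diff(mz)))


def estimate_normalization_target(raw_spectra):
    """Median raw TIC (sum of intensities) over all spectra in the batch."""
    tics = [float(np.sum(spectrum)) for spectrum in raw_spectra]
    return float(np.median(tics))
```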

Reference Masses

The pipeline provides a canonical reference mass list, DEFAULT_REFERENCE_MASSES (18 ions), which is used as the fallback when reference_masses is not provided. Import it directly:

from mioXpektron import DEFAULT_REFERENCE_MASSES
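The fallback rule can be pictured as the small function below. It is a hypothetical sketch, not the library's code: it assumes "not provided" means reference_masses is None (the PipelineConfig default) and takes the default list as an argument rather than hard-coding the 18 ions.

```python
from typing import List, Optional, Sequence


def resolve_reference_masses(
    reference_masses: Optional[Sequence[float]],
    default_masses: Sequence[float],
) -> List[float]:
    """Hypothetical fallback rule: use the caller's masses unless none were given."""
    # Only None means "not provided"; any explicit list is used as-is
    if reference_masses is None:
        return list(default_masses)
    return list(reference_masses)


# Usage with the objects from this page:
# masses = resolve_reference_masses(config.reference_masses, DEFAULT_REFERENCE_MASSES)
```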

PipelineConfig Reference

class mioXpektron.PipelineConfig(use_recalibration=True, reference_masses=None, output_folder_calibrated='calibrated_spectra', denoise_method='wavelet', denoise_params=None, baseline_method='airpls', baseline_params=None, clip_negative_after_baseline=True, normalization_target=1000000.0, mz_min=None, mz_max=None, mz_tolerance=0.2, mz_rounding_precision=1, max_workers=None, auto_tune=False)

Bases: object

High-level pipeline configuration for batch ToF‑SIMS processing.

Parameters:
  • use_recalibration (bool)

  • reference_masses (List[float] | None)

  • output_folder_calibrated (str)

  • denoise_method (str)

  • denoise_params (Dict | None)

  • baseline_method (str)

  • baseline_params (Dict | None)

  • clip_negative_after_baseline (bool)

  • normalization_target (float)

  • mz_min (float | None)

  • mz_max (float | None)

  • mz_tolerance (float)

  • mz_rounding_precision (int)

  • max_workers (int | None)

  • auto_tune (bool)

use_recalibration: bool = True
reference_masses: List[float] | None = None
output_folder_calibrated: str = 'calibrated_spectra'
denoise_method: str = 'wavelet'
denoise_params: Dict | None = None
baseline_method: str = 'airpls'
baseline_params: Dict | None = None
clip_negative_after_baseline: bool = True
normalization_target: float = 1000000.0
mz_min: float | None = None
mz_max: float | None = None
mz_tolerance: float = 0.2
mz_rounding_precision: int = 1
max_workers: int | None = None
auto_tune: bool = False