Quick Start

This guide walks through a typical mioXpektron workflow: loading data, processing it step by step, and extracting peaks.

Importing Data

mioXpektron reads tab- or comma-separated ToF-SIMS spectra with columns for m/z values and intensities:

import mioXpektron as mx

# Load a single spectrum
mz, intensity, sample_name, group = mx.import_data("path/to/spectrum.txt")

The loader automatically detects separators, skips comment lines, and infers sample names from the filename.
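
The loader's internals are not shown here. The standard-library sketch below illustrates the three behaviours just described. load_spectrum_sketch is a hypothetical name, not part of mioXpektron. It drops '#' comment lines, lets csv.Sniffer choose between tab and comma, and takes the sample name from the file stem. It does not infer a group.

```python
import csv
from pathlib import Path


def load_spectrum_sketch(path):
    """Hypothetical loader, not mioXpektron's import_data.

    Returns (mz, intensity, sample_name) from a two-column text file.
    """
    path = Path(path)
    # Keep only non-blank lines that are not '#' comments.
    lines = [
        line for line in path.read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    # Let csv.Sniffer pick tab or comma from the first few remaining lines.
    dialect = csv.Sniffer().sniff("\n".join(lines[:5]), delimiters="\t,")
    mz, intensity = [], []
    for row in csv.reader(lines, dialect):
        try:
            x, y = float(row[0]), float(row[1])
        except (ValueError, IndexError):
            continue  # skip a header row such as "mz,intensity"
        mz.append(x)
        intensity.append(y)
    # Infer the sample name from the file name: "sample_A.txt" -> "sample_A".
    return mz, intensity, path.stem
```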

Step-by-Step Processing

Denoising

Reduce noise while preserving peak shapes:

denoised = mx.noise_filtering(intensity, method="wavelet")

Available methods: "wavelet", "gaussian", "median", "savitzky_golay", "none".
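
Of these, median filtering is the easiest to show in isolation. The plain-Python sketch below replaces each point with the median of its neighbourhood. median_filter_sketch is a hypothetical helper, not the library's implementation. On a flat background, it removes isolated spikes up to window // 2 points wide and leaves wider features intact:

```python
from statistics import median


def median_filter_sketch(values, window=5):
    """Hypothetical sliding-window median filter.

    Edges use a truncated window, so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    return [
        median(values[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]
```

For example, with window=3 a one-point spike is removed, whereas a three-point plateau passes through unchanged.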

Baseline Correction

Remove broad background signals:

corrected = mx.baseline_correction(denoised, method="airpls")

More than 20 methods from the pybaselines library are available, including "airpls", "asls", "mor", and "snip".
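
pybaselines does the actual fitting. To show what a baseline estimate does, the sketch below uses a plain grey-scale morphological opening, which is a rolling minimum followed by a rolling maximum. It is a simplified stand-in related to morphological methods such as "mor", not pybaselines' implementation. opening_baseline_sketch is a hypothetical name:

```python
def _rolling(values, half, reducer):
    # Apply reducer (min or max) over +/- half points, truncated at the edges.
    n = len(values)
    return [reducer(values[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


def opening_baseline_sketch(intensity, half_window=10):
    """Hypothetical baseline by grey-scale opening: erosion, then dilation.

    Peaks narrower than 2 * half_window + 1 points are excluded from the
    baseline. The opening never exceeds the input, so the corrected
    spectrum is non-negative.
    """
    eroded = _rolling(intensity, half_window, min)
    baseline = _rolling(eroded, half_window, max)
    corrected = [y - b for y, b in zip(intensity, baseline)]
    return corrected, baseline
```

Choose half_window so that 2 * half_window + 1 points spans more than your widest peak. Otherwise, the opening follows part of the peak and removes it from the corrected spectrum.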

Normalization

Normalize spectra using any of 14 available methods:

from mioXpektron import normalize

# TIC normalization (default)
normalized = normalize(corrected, method="tic", target_tic=1e6)

# Poisson scaling (recommended before PCA)
scaled = normalize(corrected, method="poisson")

# Or use the direct function
from mioXpektron import tic_normalization
normalized = tic_normalization(corrected, target_tic=1e6)
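
TIC normalization scales each spectrum so that its total ion count (the sum of all intensities) equals a common target. The minimal sketch below assumes target_tic has that meaning in mioXpektron. tic_normalize_sketch is a hypothetical name:

```python
def tic_normalize_sketch(intensity, target_tic=1e6):
    """Hypothetical TIC normalization: rescale so intensities sum to target_tic."""
    tic = sum(intensity)  # total ion count of this spectrum
    if tic <= 0:
        raise ValueError("total ion count must be positive")
    factor = target_tic / tic
    return [value * factor for value in intensity]
```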

Peak Detection

Detect peaks with area integration:

peaks_df = mx.detect_peaks_with_area(corrected, snr_threshold=3.0)

For detection based on the continuous wavelet transform (CWT):

peaks_df = mx.detect_peaks_cwt_with_area(corrected, min_snr=3.0)
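
This guide does not document the detectors' exact noise model or integration bounds. The sketch below shows one common way to combine an SNR threshold with area integration. It estimates noise from the median absolute deviation, keeps local maxima that pass the SNR threshold, and computes trapezoidal areas between the flanking minima. detect_peaks_sketch is hypothetical, takes m/z explicitly, and is not the CWT method:

```python
from statistics import median


def detect_peaks_sketch(mz, intensity, snr_threshold=3.0):
    """Hypothetical peak picker, not mioXpektron's detect_peaks_with_area.

    A local maximum is kept when its height above the median intensity is at
    least snr_threshold times the noise. Noise is the median absolute
    deviation (MAD) scaled by 1.4826 to approximate a Gaussian standard
    deviation. Area is the trapezoidal integral between the nearest local
    minima on either side, so the input should already be baseline-corrected.
    """
    med = median(intensity)
    noise = 1.4826 * median(abs(y - med) for y in intensity)
    noise = max(noise, 1e-12)  # avoid division by zero on perfectly flat data
    peaks = []
    for i in range(1, len(intensity) - 1):
        if not intensity[i - 1] < intensity[i] >= intensity[i + 1]:
            continue
        snr = (intensity[i] - med) / noise
        if snr < snr_threshold:
            continue
        # Walk downhill to the nearest local minimum on each side.
        left = i
        while left > 0 and intensity[left - 1] < intensity[left]:
            left -= 1
        right = i
        while right < len(intensity) - 1 and intensity[right + 1] < intensity[right]:
            right += 1
        area = sum(
            (intensity[k] + intensity[k + 1]) * (mz[k + 1] - mz[k]) / 2
            for k in range(left, right)
        )
        peaks.append({"mz": mz[i], "height": intensity[i], "snr": snr, "area": area})
    return peaks
```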

Visualization

Plot spectra with annotated peaks:

mx.PlotPeak(corrected, peaks_df)

Automated Pipeline

For batch processing, use the built-in pipeline that chains all steps:

from mioXpektron import run_pipeline, PipelineConfig

config = PipelineConfig(
    denoise_method="wavelet",
    baseline_method="airpls",
    normalization_target=1e6,
    mz_tolerance=0.2,
)

files = ["sample_01.txt", "sample_02.txt", "sample_03.txt"]
intensity_df, area_df = run_pipeline(files, config=config)

The pipeline returns two DataFrames: an intensity matrix and an area matrix, both aligned by m/z across all samples.
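
This guide does not describe how the pipeline aligns peaks across samples. One simple approach that produces matrices of this shape is to pool all peaks, sort them by m/z, and group neighbours that lie within mz_tolerance of each other. The sketch below assumes that greedy, gap-based grouping. align_peak_lists_sketch is hypothetical. Calling it once with peak heights and once with areas would give an intensity matrix and an area matrix:

```python
def align_peak_lists_sketch(peak_lists, mz_tolerance=0.2):
    """Hypothetical alignment of per-sample peak lists onto shared m/z bins.

    peak_lists maps sample name -> list of (mz, value) pairs.
    Returns (bin_centers, matrix), where matrix[sample][j] is that sample's
    value in bin j, or 0.0 if it has no peak there.
    """
    # Pool every peak, remembering its sample, and sort by m/z.
    pooled = sorted(
        (mz, sample, value)
        for sample, peaks in peak_lists.items()
        for mz, value in peaks
    )
    # Start a new bin whenever the gap to the previous peak exceeds mz_tolerance.
    bins = []
    for mz, sample, value in pooled:
        if bins and mz - bins[-1][-1][0] <= mz_tolerance:
            bins[-1].append((mz, sample, value))
        else:
            bins.append([(mz, sample, value)])
    centers = [sum(p[0] for p in b) / len(b) for b in bins]
    matrix = {sample: [0.0] * len(bins) for sample in peak_lists}
    for j, b in enumerate(bins):
        for _, sample, value in b:
            matrix[sample][j] += value  # two peaks from one sample in a bin are summed
    return centers, matrix
```

Because grouping is chained, a run of closely spaced peaks can produce a bin that is wider than mz_tolerance overall.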

Adaptive Parameterization

Set auto_tune=True to derive thresholds from your data instead of using fixed defaults. For example, with FlexibleCalibrator:

from mioXpektron import FlexibleCalibrator, FlexibleCalibConfig

config = FlexibleCalibConfig(
    reference_masses=[1.0073, 27.0229, 29.0386, 41.0386, 57.0699, 104.1075],
    calibration_method="quad_sqrt",
    auto_tune=True,
)

calibrator = FlexibleCalibrator(config)
summary = calibrator.calibrate(files)

The pipeline also supports this flag:

config = PipelineConfig(auto_tune=True)
intensity_df, area_df = run_pipeline(files, config=config)

When auto_tune is active, parameters like calibration tolerance, outlier threshold, screening thresholds, normalization target, and alignment tolerance are estimated from the spectra. See Adaptive Parameterization for details on each estimator.
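
The estimators that mioXpektron actually uses are documented on the Adaptive Parameterization page. To illustrate the general idea of deriving a setting from the data, here are two robust choices that are common for this kind of task: the median TIC across spectra as a normalization target, and a median + k * MAD cutoff for outliers. Both helper names and the k=3.0 default are assumptions:

```python
from statistics import median


def estimate_normalization_target(spectra):
    """Hypothetical estimator: median total ion count across spectra."""
    return median(sum(s) for s in spectra)


def mad_outlier_threshold(values, k=3.0):
    """Hypothetical upper cutoff: median + k * scaled MAD (k=3.0 is assumed)."""
    med = median(values)
    mad = 1.4826 * median(abs(v - med) for v in values)  # ~ Gaussian std
    return med + k * mad
```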

Mass Calibration

Calibrate channel-based spectra to m/z:

from mioXpektron import AutoCalibrator, AutoCalibConfig

config = AutoCalibConfig(
    reference_masses=[12.0, 28.0, 56.0],
    model="quadratic",
)

calibrator = AutoCalibrator(config)
calibrated_data = calibrator.calibrate(data)

For more control, use FlexibleCalibrator with explicit channel-to-mass mappings. See Calibration for details.
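
In a time-of-flight analyser, flight time grows with the square root of mass, t ≈ t0 + k·√m. The square root of mass is therefore approximately linear in channel number. The sketch below fits that two-parameter model to reference peaks by least squares, assuming you already know the channel of each reference mass. It is not necessarily what model="quadratic" or calibration_method="quad_sqrt" implements. fit_tof_calibration is a hypothetical name:

```python
from math import sqrt


def fit_tof_calibration(channels, masses):
    """Hypothetical least-squares fit of sqrt(m) = a + b * channel.

    Returns a function that maps a channel number to m/z.
    """
    if len(channels) != len(masses) or len(channels) < 2:
        raise ValueError("need at least two (channel, mass) pairs")
    ys = [sqrt(m) for m in masses]
    n = len(channels)
    mean_x = sum(channels) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in channels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(channels, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Square the fitted sqrt(m) to return m/z.
    return lambda channel: (a + b * channel) ** 2
```

With more reference masses than parameters, the residuals at the reference peaks show how well the two-parameter model fits. Large residuals suggest that a higher-order model is needed.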

Batch Processing

Process entire directories of spectra:

from mioXpektron import BatchDenoising, batch_tic_norm

# Batch denoising
denoiser = BatchDenoising(method="savgol", window_length=11)
denoised_files = denoiser.process_directory("data/")

# Batch normalization
normalized = batch_tic_norm("data/", output_dir="normalized/")
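
BatchDenoising and batch_tic_norm handle file discovery and output for you. To show what a directory-level step involves, the standard-library sketch below TIC-normalizes every matching file and writes the result under the same name in an output directory. batch_tic_sketch is hypothetical and assumes headerless, tab-separated two-column files:

```python
from pathlib import Path


def batch_tic_sketch(input_dir, output_dir, target_tic=1e6, pattern="*.txt"):
    """Hypothetical batch step: TIC-normalize every matching file in input_dir.

    Assumes headerless, tab-separated (mz, intensity) files; '#' lines are skipped.
    Returns the list of files written to output_dir.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).glob(pattern)):
        pairs = [
            tuple(float(field) for field in line.split("\t")[:2])
            for line in path.read_text().splitlines()
            if line.strip() and not line.startswith("#")
        ]
        tic = sum(intensity for _, intensity in pairs)
        if tic <= 0:
            continue  # skip empty or all-zero spectra
        scale = target_tic / tic
        target = out_dir / path.name
        target.write_text(
            "".join(f"{mz}\t{intensity * scale}\n" for mz, intensity in pairs)
        )
        written.append(target)
    return written
```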

Method Comparison

Compare denoising strategies on your data:

from mioXpektron import compare_denoising_methods

results = compare_denoising_methods(
    data,
    methods=["wavelet", "gaussian", "savgol"],
    metric="snr",
)
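
This guide does not spell out how compare_denoising_methods computes metric="snr". A common definition is peak height divided by the standard deviation of a signal-free region. The sketch below applies that definition to two simple filters so you can see what such a comparison measures. All names are hypothetical:

```python
from statistics import mean, median, pstdev


def _window(values, i, half):
    # Truncated window at the edges keeps the output length unchanged.
    return values[max(0, i - half): i + half + 1]


def moving_average(values, window=5):
    half = window // 2
    return [mean(_window(values, i, half)) for i in range(len(values))]


def moving_median(values, window=5):
    half = window // 2
    return [median(_window(values, i, half)) for i in range(len(values))]


def peak_snr(values, peak_index, noise_slice):
    """Peak height divided by the standard deviation of a signal-free slice."""
    noise = pstdev(values[noise_slice])
    return values[peak_index] / noise if noise > 0 else float("inf")


def compare_filters_sketch(signal, peak_index, noise_slice, filters):
    """Return {filter_name: SNR after filtering} for each callable in filters."""
    return {
        name: peak_snr(apply(signal), peak_index, noise_slice)
        for name, apply in filters.items()
    }
```

On a broad Gaussian peak with white noise, both filters raise the SNR relative to the raw signal. The moving average typically gains more on white noise, while the median filter is more robust to isolated spikes.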

Evaluate baseline correction approaches:

import glob
import random
from mioXpektron import BaselineMethodEvaluator, ScanForFlatRegion

files = sorted(glob.glob("output_files/denoised_spectrums_*/*.txt"))
sample = sorted(random.Random(42).sample(files, min(10, len(files))))

windows = ScanForFlatRegion(files=sample).run()
param_grid = {
    "pspline_lsrpls": [{"lam": 1e6}],
    "pspline_drpls": [{"lam": 1e6}],
    "aspls": [{"lam": 1e6}],
    "imodpoly": [{"poly_order": 3}],
}

evaluator = BaselineMethodEvaluator(
    files=sample,
    methods=list(param_grid),
    param_grid=param_grid,
    flat_windows=windows,
)
summary = evaluator.evaluate()
best_method = summary["overall_best_spec"]
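
ScanForFlatRegion supplies regions without peaks. A natural way to score a baseline method against them is to measure how close the corrected spectrum stays to zero inside those regions. This guide does not show the exact criterion behind BaselineMethodEvaluator or the format of its windows. The sketch below assumes that windows are (low, high) m/z pairs and uses the root-mean-square residual. The helper names are hypothetical:

```python
from math import sqrt


def flat_window_rms(mz, corrected, windows):
    """RMS of corrected intensities inside flat (low, high) m/z windows; lower is better."""
    residuals = [
        y for x, y in zip(mz, corrected)
        if any(low <= x <= high for low, high in windows)
    ]
    if not residuals:
        raise ValueError("no points fall inside the flat windows")
    return sqrt(sum(r * r for r in residuals) / len(residuals))


def rank_methods(mz, corrected_by_method, windows):
    """Return (method names from best to worst, {method: RMS score})."""
    scores = {
        name: flat_window_rms(mz, y, windows)
        for name, y in corrected_by_method.items()
    }
    return sorted(scores, key=scores.get), scores
```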

Evaluate normalization strategies:

from mioXpektron import NormalizationEvaluator

evaluator = NormalizationEvaluator(
    files=["spectra/"],
    methods=["tic", "robust_snv", "pqn", "mass_stratified_pqn", "log"],
    method_kwargs_map={
        "mass_stratified_pqn": {
            "strata": [(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))],
        },
    },
)
results = evaluator.evaluate()
evaluator.print_summary()
evaluator.plot()
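
Probabilistic quotient normalization (PQN) divides each spectrum by the median of its intensity ratios to a reference spectrum. The strata argument suggests that mass_stratified_pqn applies this separately within each m/z range. The sketch below assumes exactly that: a per-channel median reference, one PQN factor per stratum, and half-open ranges. It omits the integral pre-normalization that classic PQN often includes. The names are hypothetical:

```python
from statistics import median


def pqn_factor(spectrum, reference):
    """Median of spectrum/reference ratios over channels where both are positive."""
    ratios = [s / r for s, r in zip(spectrum, reference) if s > 0 and r > 0]
    return median(ratios) if ratios else 1.0


def mass_stratified_pqn_sketch(mz, spectra, strata):
    """Hypothetical per-stratum PQN on spectra that share one m/z grid.

    strata holds half-open (low, high) m/z ranges, e.g.
    [(0.0, 100.0), (100.0, 400.0), (400.0, float("inf"))].
    """
    # Reference spectrum: the per-channel median across all spectra.
    reference = [median(column) for column in zip(*spectra)]
    normalized = [list(map(float, s)) for s in spectra]
    for low, high in strata:
        idx = [i for i, x in enumerate(mz) if low <= x < high]
        if not idx:
            continue
        for out, spectrum in zip(normalized, spectra):
            # One quotient per spectrum and stratum.
            factor = pqn_factor([spectrum[i] for i in idx], [reference[i] for i in idx])
            for i in idx:
                out[i] = spectrum[i] / factor
    return normalized
```

For example, suppose one spectrum matches the reference below m/z 100 but is three times too intense above it. A single median over all channels can then land between 1 and 3, which over-corrects one region and under-corrects the other. Per-stratum factors of 1 and 3 correct both.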

For baseline-corrected CSV cohorts, use the repository notebook NoteBooks/_06_Normalization.ipynb to resample spectra onto a shared m/z grid before ranking methods. The notebook includes mass_stratified_pqn by default and can enable multi_ion_reference when reference ions are known.
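
Resampling gives every spectrum the same m/z axis, which channel-wise methods such as PQN require. This guide does not describe the notebook's exact scheme. A common choice is linear interpolation onto a fixed grid, as in the sketch below. resample_linear is a hypothetical name. Setting values outside a spectrum's measured range to 0.0 is an assumption:

```python
from bisect import bisect_left


def resample_linear(mz, intensity, grid):
    """Linearly interpolate (mz, intensity) onto grid; 0.0 outside the measured range.

    mz must be strictly increasing.
    """
    out = []
    for x in grid:
        if x < mz[0] or x > mz[-1]:
            out.append(0.0)
            continue
        j = bisect_left(mz, x)  # mz[j - 1] < x <= mz[j]
        if mz[j] == x:
            out.append(float(intensity[j]))
            continue
        x0, x1 = mz[j - 1], mz[j]
        y0, y1 = intensity[j - 1], intensity[j]
        out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out
```

In NumPy code, numpy.interp(grid, mz, intensity, left=0.0, right=0.0) performs the same interpolation.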

Next Steps