Baseline Correction

The baseline module removes broad background signals from ToF-SIMS spectra. It wraps the pybaselines library and adds batch processing, method evaluation, and flexible column name handling.

Quick Example

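from mioXpektron import baseline_correction

# intensity: 1-D array-like of raw counts for a single spectrum
corrected, baseline = baseline_correction(
    intensity,
    method="airpls",
    lam=1e6,  # forwarded to the chosen algorithm via **kwargs
    return_baseline=True,  # return (corrected, baseline) instead of corrected alone
)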

Available Methods

mioXpektron supports the 1-D baseline methods exposed by pybaselines, plus two lightweight filters and one convenience alias:

  • "median_filter"

  • "adaptive_window"

  • "poly" as a convenience alias

  • the available pybaselines methods returned by baseline_method_names()
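
The two lightweight filters are not described in detail on this page. As a rough illustration of the idea behind a median-filter baseline, the sketch below estimates the baseline as the median of a centered sliding window. It is an assumption about the general technique, not the library's implementation: the function name rolling_median_baseline and the edge handling (truncating the window at the spectrum ends) are hypothetical.

```python
from statistics import median


def rolling_median_baseline(intensity, window_size=5):
    """Estimate a baseline as the median of a centered sliding window.

    Hypothetical helper for illustration; not part of mioXpektron.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2
    n = len(intensity)
    baseline = []
    for i in range(n):
        # Assumption: truncate the window at the spectrum edges instead of padding.
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        baseline.append(median(intensity[lo:hi]))
    return baseline


# A flat background of 10 counts with one narrow peak at index 3.
raw = [10, 10, 10, 50, 10, 10, 10]
baseline = rolling_median_baseline(raw, window_size=5)
corrected = [y - b for y, b in zip(raw, baseline)]
```

Because the peak is narrower than half the window, the median ignores it and the background level is recovered. A window that is too narrow relative to the peak width would start to follow the peak itself.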

For the current method list in your environment:

from mioXpektron import baseline_method_names

print(baseline_method_names())

Each method accepts its own keyword arguments, which are passed through to the underlying implementation. Parameterized evaluator labels such as "aspls(lam=1000000.0)" can also be passed back into the baseline utilities.
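
To make the label format concrete, the sketch below builds a label of the form "aspls(lam=1000000.0)" from a method name and its keyword arguments, and parses it back. Both format_label and parse_label are hypothetical helpers written for this illustration; the library's own label handling is not shown here and may differ.

```python
import ast
import re

# Method name, optionally followed by a parenthesised keyword-argument list.
_LABEL_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*(?:\((.*)\))?\s*$")


def format_label(method, kwargs):
    """Build a label such as 'aspls(lam=1000000.0)' (hypothetical helper)."""
    if not kwargs:
        return method
    args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    return f"{method}({args})"


def parse_label(label):
    """Split a label into (method, kwargs); a bare name yields empty kwargs."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a valid method label: {label!r}")
    method, arg_text = match.group(1), match.group(2)
    if not arg_text or not arg_text.strip():
        return method, {}
    # Parse the arguments as a call expression so values are read as literals only.
    call = ast.parse(f"_({arg_text})", mode="eval").body
    if call.args:
        raise ValueError("only keyword arguments are supported in labels")
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return method, kwargs


label = format_label("aspls", {"lam": 1e6})
method, kwargs = parse_label(label)
```

Using ast.literal_eval rather than eval keeps the parser restricted to plain literals, which matters when labels are read back from files or logs.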

Batch Baseline Correction

Process multiple files in parallel:

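from mioXpektron import BaselineBatchCorrector

corrector = BaselineBatchCorrector(
    in_dir="denoised_spectra",
    pattern="*.txt",  # the default pattern is "*.csv"
    method="airpls",
    method_kwargs={"lam": 1e6},  # keyword arguments for the chosen method
    n_jobs=4,  # number of parallel workers
)

# run() returns the output directory as a Path
out_dir = corrector.run(out_root="output_files")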

Method Evaluation

Systematically compare baseline methods using quality metrics:

import glob
import random
from mioXpektron import BaselineMethodEvaluator, ScanForFlatRegion

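# Collect the denoised spectra and draw a reproducible random subset (at most 10 files).
files = sorted(glob.glob("output_files/denoised_spectrums_*/*.txt"))
sample = sorted(random.Random(42).sample(files, min(10, len(files))))

# Detect flat (signal-free) windows on the same subset for use in the evaluation.
windows = ScanForFlatRegion(files=sample).run()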

param_grid = {
    "pspline_lsrpls": [{"lam": 1e6}],
    "pspline_drpls": [{"lam": 1e6}],
    "pspline_iarpls": [{"lam": 1e6}],
    "pspline_arpls": [{"lam": 1e6}],
    "pspline_airpls": [{"lam": 1e6}],
    "aspls": [{"lam": 1e6}],
    "imodpoly": [{"poly_order": 3}],
}

evaluator = BaselineMethodEvaluator(
    files=sample,
    methods=list(param_grid),
    param_grid=param_grid,
    flat_windows=windows,
)
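summary = evaluator.evaluate(n_jobs=4)

# Report the overall best candidate: its label, method name, and keyword arguments.
best = summary["overall_best_spec"]
print(best["label"], best["method"], best["kwargs"])

# Overlay the first three candidates of the overall ordering on one spectrum.
evaluator.preview_overlay(
    file=sample[0],
    methods=[spec["label"] for spec in summary["overall_order_specs"][:3]],
)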

If param_grid is provided and methods is omitted, the evaluator uses the grid keys as the candidate set. For large cohorts, evaluating a representative random subset of spectra first is usually much faster than scoring every file.

The evaluator computes six metrics:

  • RFZN: Residual Flatness in Zero-signal regions (Noise)

  • NAR: Negative Area Ratio (how much of the corrected signal falls below zero)

  • SNR: Signal-to-Noise Ratio improvement

  • BBI: Baseline-Below-Input indicator

  • BR: Baseline Roughness

  • NBC: Negative Bin Count
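
The exact formulas behind these metrics are not given on this page. As a rough guide to what three of them measure, the sketch below uses plausible textbook-style definitions: NAR as the negative area divided by the total absolute area, NBC as the number of bins below zero, and BR as the mean squared second difference of the baseline. These definitions and function names are assumptions for illustration and may not match the evaluator's implementation.

```python
def negative_area_ratio(corrected):
    """Fraction of the total absolute area that lies below zero (assumed definition)."""
    total = sum(abs(y) for y in corrected)
    if total == 0:
        return 0.0
    negative = sum(-y for y in corrected if y < 0)
    return negative / total


def negative_bin_count(corrected):
    """Number of bins whose corrected intensity is negative (assumed definition)."""
    return sum(1 for y in corrected if y < 0)


def baseline_roughness(baseline):
    """Mean squared second difference of the baseline (assumed definition)."""
    second_diffs = [
        baseline[i - 1] - 2 * baseline[i] + baseline[i + 1]
        for i in range(1, len(baseline) - 1)
    ]
    if not second_diffs:
        return 0.0
    return sum(d * d for d in second_diffs) / len(second_diffs)


# Unclipped corrected spectrum with two bins that dipped below zero.
corrected = [0.0, -1.0, 4.0, -1.0, 2.0]
nar = negative_area_ratio(corrected)
nbc = negative_bin_count(corrected)

# A straight line has zero roughness; a kinked baseline does not.
smooth = baseline_roughness([1.0, 2.0, 3.0, 4.0])
kinked = baseline_roughness([0.0, 1.0, 0.0])
```

Metrics of this kind are only informative on unclipped output, since clipping removes the negative values they count; this is presumably why eval_clip_negative defaults to False.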

Flat Region Detection

Identify flat (signal-free) regions for baseline evaluation:

from mioXpektron import ScanForFlatRegion

# sample: list of spectrum file paths, e.g. the subset built in Method Evaluation
scanner = ScanForFlatRegion(files=sample)
flat_regions = scanner.run()
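
How ScanForFlatRegion decides that a region is flat is governed by its flat_params and agg_params, which are not documented on this page. The sketch below shows one generic way to find flat windows in a single spectrum: mark every sliding window whose intensity range stays within a tolerance, then merge adjacent marked points into (start, end) pairs on the x axis. The function find_flat_windows, its window and tolerance parameters, and the range criterion are all assumptions for illustration, not the library's algorithm.

```python
def find_flat_windows(x, y, window=4, tolerance=1.0):
    """Return (x_start, x_end) pairs where y varies by at most `tolerance`.

    Hypothetical helper for illustration; not part of mioXpektron.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    flat = [False] * len(y)
    # Mark every point covered by at least one window with a small intensity range.
    for start in range(len(y) - window + 1):
        segment = y[start:start + window]
        if max(segment) - min(segment) <= tolerance:
            for i in range(start, start + window):
                flat[i] = True
    # Merge consecutive marked points into contiguous windows.
    windows = []
    run_start = None
    for i, is_flat in enumerate(flat):
        if is_flat and run_start is None:
            run_start = i
        elif not is_flat and run_start is not None:
            windows.append((x[run_start], x[i - 1]))
            run_start = None
    if run_start is not None:
        windows.append((x[run_start], x[-1]))
    return windows


# Low, noisy background on both sides of a single peak.
x = [float(i) for i in range(12)]
y = [2, 2, 3, 2, 2, 40, 90, 35, 3, 2, 2, 3]
flat_windows = find_flat_windows(x, y, window=4, tolerance=1.0)
```

The result has the same shape as the evaluator's flat_windows argument, a list of (float, float) tuples.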

Column Name Flexibility

The baseline module automatically recognizes common column naming conventions:

  • Channel: channel, chan, ch, index, idx

  • m/z: m/z, mz, mass, moverz, m_over_z

  • Intensity: intensity, counts, signal, y, ion_counts

Matching is case-insensitive.
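
Case-insensitive alias matching of this kind can be sketched as a lookup from lower-cased header names to canonical roles. The alias lists below are copied from the list above; the resolve_columns helper, the canonical role names, and the whitespace stripping are assumptions made for this example rather than the module's actual code.

```python
# Aliases as listed in this section; the role names on the left are illustrative.
COLUMN_ALIASES = {
    "channel": ("channel", "chan", "ch", "index", "idx"),
    "mz": ("m/z", "mz", "mass", "moverz", "m_over_z"),
    "intensity": ("intensity", "counts", "signal", "y", "ion_counts"),
}


def resolve_columns(header):
    """Map each recognised role to the column name found in `header`.

    Hypothetical helper for illustration; not part of mioXpektron.
    """
    resolved = {}
    for name in header:
        # Assumption: surrounding whitespace is ignored in addition to case.
        key = name.strip().lower()
        for role, aliases in COLUMN_ALIASES.items():
            # Keep the first match for each role.
            if key in aliases and role not in resolved:
                resolved[role] = name
    return resolved


full = resolve_columns(["Channel", "M/Z", "Counts"])
partial = resolve_columns(["time", "Signal"])
```

Roles that cannot be matched are simply absent from the result, so a caller can check for the columns it needs and raise a clear error if one is missing.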

API Reference

mioXpektron.baseline.baseline_correction(intensities, method='airpls', window_size=101, poly_order=4, clip_negative=True, return_baseline=False, **kwargs)

Baseline-correct a 1‑D spectrum with pybaselines or custom filters.

Parameters:
  • intensities (array-like) – Raw y values.

  • method (str) – Algorithm name; see baseline_method_names().

  • window_size (int) – Kernel width for the two custom filters.

  • poly_order (int) – Polynomial order for the ‘poly’ alias.

  • clip_negative (bool) – If True, negative corrected values are set to 0.

  • return_baseline (bool) – If True, also return the estimated baseline.

  • **kwargs – Forwarded to the chosen algorithm (e.g. lam=1e6, p=0.01).

Return type:

corrected or (corrected, baseline)
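
The effect of clip_negative and return_baseline on the return value can be sketched as follows, assuming the corrected spectrum is the raw intensity minus the estimated baseline. The helper apply_baseline is hypothetical and takes an already estimated baseline; it only illustrates the final subtraction, clipping, and return shape described above.

```python
def apply_baseline(intensities, baseline, clip_negative=True, return_baseline=False):
    """Subtract a baseline and optionally clip negatives to zero.

    Hypothetical helper for illustration; not part of mioXpektron.
    """
    if len(intensities) != len(baseline):
        raise ValueError("intensities and baseline must have the same length")
    # Assumption: corrected = raw - baseline.
    corrected = [y - b for y, b in zip(intensities, baseline)]
    if clip_negative:
        # Negative corrected values are set to 0.
        corrected = [max(value, 0.0) for value in corrected]
    if return_baseline:
        return corrected, list(baseline)
    return corrected


raw = [5.0, 7.0, 20.0, 6.0]
flat_baseline = [6.0, 6.0, 6.0, 6.0]

clipped = apply_baseline(raw, flat_baseline)
unclipped = apply_baseline(raw, flat_baseline, clip_negative=False)
both = apply_baseline(raw, flat_baseline, return_baseline=True)
```

Clipping is convenient for downstream peak picking, while the unclipped result shows where the baseline overshoots the signal.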

class mioXpektron.baseline.BaselineMethodEvaluator(files=<factory>, methods=None, param_grid=None, use_small_param_preset=False, auto_scale_window_size=True, eval_clip_negative=False, topk_for_snr=5, raw_noise_quantile=0.2, flat_windows=None, metrics_for_composite=('rfzn', 'nar', 'snr', 'bbi', 'br', 'nbc'), n_jobs=-1)

Bases: object

Evaluate baseline algorithms on ToF‑SIMS files supplied as paths or globs.

Parameters:
  • files (List[str | Path])

  • methods (List[str] | None) – Default: None.

  • param_grid (Dict[str, List[Dict]] | None) – Default: None.

  • use_small_param_preset (bool) – Default: False.

  • auto_scale_window_size (bool) – Default: True.

  • eval_clip_negative (bool) – Default: False.

  • topk_for_snr (int) – Default: 5.

  • raw_noise_quantile (float) – Default: 0.2.

  • flat_windows (List[Tuple[float, float]] | None) – Default: None.

  • metrics_for_composite (Tuple[str, ...]) – Default: ('rfzn', 'nar', 'snr', 'bbi', 'br', 'nbc').

  • n_jobs (int) – Default: -1.

Additional attributes (not constructor arguments):
  • labels (List[str])

  • specs (List[Tuple[str, Dict]])

evaluate(noise_quantile=None, n_jobs=None)
Parameters:
  • noise_quantile (float | None)

  • n_jobs (int | None)

warning_log()
Return type:

DataFrame

plot(out_dir='baseline_selection_output')
Parameters:

out_dir (str | Path)

Return type:

List[Path]

preview_overlay(file, methods=None, max_methods=5, save_to='baseline_selection_output', show_errors=True)

Plot raw, baseline and corrected overlays for a few methods on a single file.

Parameters:
  • file (str or Path) – Path to a single spectrum file (not a list!)

  • methods (list of str, optional) – Method names to plot. If None, uses top methods from evaluation.

  • max_methods (int) – Maximum number of methods to plot (default: 5)

  • save_to (str or Path, optional) – Directory to save plots. Set to None to skip saving.

  • show_errors (bool) – If True (default), print errors when methods fail instead of silently ignoring them.

class mioXpektron.baseline.BaselineBatchCorrector(in_dir, pattern='*.csv', recursive=False, method='airpls', method_kwargs=<factory>, clip_negative=True, per_file_best=False, best_method_map=None, n_jobs=-1, save_plots=False)

Bases: object

Parameters:
  • in_dir (str | Path)

  • pattern (str) – Default: '*.csv'.

  • recursive (bool) – Default: False.

  • method (str) – Default: 'airpls'.

  • method_kwargs (Dict)

  • clip_negative (bool) – Default: True.

  • per_file_best (bool) – Default: False.

  • best_method_map (Dict[str, str] | None) – Default: None.

  • n_jobs (int) – Default: -1.

  • save_plots (bool) – Default: False.

run(out_root=None)
Parameters:

out_root (str | Path | None)

Return type:

Path

class mioXpektron.baseline.ScanForFlatRegion(files=<factory>, out_dir='flat_windows_out', n_jobs=-1, flat_params=<factory>, agg_params=<factory>, auto_tune=False)

Bases: object

Parameters:
  • files (List[str | Path])

  • out_dir (str | Path) – Default: 'flat_windows_out'.

  • n_jobs (int) – Default: -1.

  • flat_params (FlatParams)

  • agg_params (AggregateParams)

  • auto_tune (bool) – Default: False.

run()