Pipeline Reference
==================

The mioXpektron pipeline provides end-to-end batch processing of ToF-SIMS
spectra. It chains recalibration, denoising, baseline correction,
normalization, peak detection, and cross-sample alignment into a single call.

Pipeline Steps
--------------

The pipeline executes these steps in order:

1. **Recalibration** (optional) --- convert channel numbers to m/z values
   using reference masses and a TOF model.
2. **Denoising** --- reduce noise with wavelet, Gaussian, median, or
   Savitzky-Golay filters.
3. **Baseline correction** --- remove broad background with AirPLS, AsLS, or
   other pybaselines methods.
4. **TIC normalization** --- scale spectra to a common Total Ion Current.
5. **Peak detection** --- find peaks using local maximum or CWT algorithms,
   with automatic noise estimation.
6. **Alignment** --- align detected peaks across samples by m/z tolerance,
   producing unified intensity and area matrices.

Configuration
-------------

.. code-block:: python

   from mioXpektron import PipelineConfig

   config = PipelineConfig(
       # Recalibration
       use_recalibration=True,
       reference_masses=[1.008, 22.99, 38.96, 58.07],
       output_folder_calibrated="calibrated_spectra",

       # Denoising
       denoise_method="wavelet",   # wavelet | gaussian | median | savitzky_golay | none
       denoise_params=None,        # dict of method-specific keyword arguments

       # Baseline
       baseline_method="airpls",
       baseline_params=None,
       clip_negative_after_baseline=True,

       # Normalization
       normalization_target=1e6,

       # Peak alignment
       mz_min=None,                # optional m/z range filter
       mz_max=None,
       mz_tolerance=0.2,           # Da tolerance for cross-sample alignment
       mz_rounding_precision=1,

       # Parallelism
       max_workers=None,           # None = use all available cores

       # Adaptive parameterization (opt-in)
       auto_tune=False,            # True = derive mz_tolerance and normalization_target from data
   )

Running the Pipeline
--------------------

.. code-block:: python

   from mioXpektron import run_pipeline

   files = ["sample_01.txt", "sample_02.txt", "sample_03.txt"]

   intensity_df, area_df = run_pipeline(files, config=config)

With calibration data:

.. code-block:: python

   calib_channels = {
       "sample_01.txt": [100, 500, 1000, 2000],
       "sample_02.txt": [101, 502, 998, 2001],
   }

   intensity_df, area_df = run_pipeline(
       files,
       calib_channels_dict=calib_channels,
       config=config,
   )

Output Format
-------------

The pipeline returns two ``pandas.DataFrame`` objects:

**intensity_df**
   Rows = m/z values (aligned across samples), columns = sample names.
   Values are peak intensities after processing.

**area_df**
   Same structure as ``intensity_df`` but values are integrated peak areas.

Both DataFrames share the same m/z index, making them ready for downstream
statistical analysis.

Adaptive Parameterization
-------------------------

Set ``auto_tune=True`` to let the pipeline derive ``mz_tolerance`` and
``normalization_target`` from the data instead of using hardcoded defaults:

.. code-block:: python

   config = PipelineConfig(auto_tune=True)
   intensity_df, area_df = run_pipeline(files, config=config)

When ``auto_tune`` is active the pipeline:

1. Estimates ``mz_tolerance`` from median m/z spacing across a pilot sample.
2. Estimates ``normalization_target`` from median raw TIC across the batch.

All other parameters keep their defaults and can still be overridden
manually. See :doc:`modules/adaptive` for details on each estimator.

Reference Masses
^^^^^^^^^^^^^^^^

The pipeline now provides a canonical reference mass list
``DEFAULT_REFERENCE_MASSES`` (18 ions) used as the fallback when
``reference_masses`` is not provided. Import it directly:

.. code-block:: python

   from mioXpektron import DEFAULT_REFERENCE_MASSES

PipelineConfig Reference
------------------------

.. autoclass:: mioXpektron.PipelineConfig
   :members:
   :undoc-members:
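
To make ``normalization_target`` and ``mz_tolerance`` more concrete, the TIC normalization step and the two ``auto_tune`` estimates described earlier can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the mioXpektron implementation: the function names ``tic_normalize``, ``estimate_normalization_target``, and ``estimate_mz_tolerance`` and the ``factor`` argument are hypothetical, and the library may filter or weight the data differently.

.. code-block:: python

   import numpy as np


   def tic_normalize(intensities, target=1e6):
       """Scale a spectrum so that its summed intensity (TIC) equals ``target``."""
       intensities = np.asarray(intensities, dtype=float)
       tic = intensities.sum()
       if tic <= 0:
           # An empty or all-zero spectrum cannot be scaled; return it unchanged.
           return intensities.copy()
       return intensities * (target / tic)


   def estimate_normalization_target(spectra):
       """Median raw TIC across a batch of intensity arrays."""
       return float(np.median([np.sum(s) for s in spectra]))


   def estimate_mz_tolerance(mz, factor=1.0):
       """Median spacing between adjacent m/z values of one pilot spectrum.

       ``factor`` is a hypothetical scale knob, not a mioXpektron parameter.
       """
       mz = np.asarray(mz, dtype=float)
       return float(factor * np.median(np.diff(mz)))


   # Toy batch: three spectra with raw TICs of 6, 12, and 60.
   batch = [
       np.array([1.0, 2.0, 3.0]),
       np.array([2.0, 4.0, 6.0]),
       np.array([10.0, 20.0, 30.0]),
   ]
   target = estimate_normalization_target(batch)
   normalized = [tic_normalize(s, target) for s in batch]

   # Toy m/z axis with spacings of 0.5, 0.5, and 1.0 Da.
   tolerance = estimate_mz_tolerance(np.array([100.0, 100.5, 101.0, 102.0]))

Taking the median rather than the mean keeps a single unusually intense sample or an irregular gap in the m/z axis from dominating either estimate.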