Peak Detection

The detection module identifies peaks in processed spectra, computes areas by integration, and aligns peaks across multiple samples.

Quick Example

from mioXpektron import detect_peaks_with_area

peaks_df = detect_peaks_with_area(
    mz_values=mz,
    intensities=corrected,
    sample_name="sample_01",
    group="control",
    min_snr=3.0,
    noise_model="mz_binned",
)

Detection Algorithms

Local Maximum Detection

The default method. Finds peaks as local maxima above a noise-based threshold:

from mioXpektron import detect_peaks_with_area, detect_peaks_with_area_v2

# Standard version
peaks = detect_peaks_with_area(
    mz_values=mz,
    intensities=corrected,
    sample_name="sample_01",
    group="control",
    min_snr=3.0,
)

# Enhanced version with additional peak properties
peaks = detect_peaks_with_area_v2(
    mz=mz,
    intens=corrected,
    sample_name="sample_01",
    group="control",
    min_snr=3.0,
    noise_model="mz_binned",
    noise_bins=20,
)
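As a rough illustration of the idea (not the library's implementation), the sketch below picks local maxima that rise above a noise-based threshold. The function name, the toy spectrum, and the threshold form `median + min_snr * sigma` are assumptions made for this example only.

```python
import statistics

def local_maxima_above_threshold(intensities, min_snr=3.0):
    """Return (peak indices, threshold) for local maxima above a MAD-based threshold.

    Hypothetical helper: the threshold form is an assumption, not mioXpektron's code.
    """
    positive = [y for y in intensities if y > 0]
    median = statistics.median(positive)
    # Gaussian-equivalent MAD: 1.4826 * median(|y - median|)
    sigma = 1.4826 * statistics.median(abs(y - median) for y in positive)
    threshold = median + min_snr * sigma
    peaks = [
        i for i in range(1, len(intensities) - 1)
        if intensities[i] > threshold
        and intensities[i] > intensities[i - 1]
        and intensities[i] >= intensities[i + 1]  # flat tops resolve to the left edge
    ]
    return peaks, threshold

spectrum = [1, 2, 1, 2, 1, 40, 1, 2, 1, 2, 1, 2, 25, 2, 1]
peak_idx, threshold = local_maxima_above_threshold(spectrum)
print(peak_idx)  # [5, 12]
```

The small ripples at height 2 are also local maxima, but they fall below the threshold and are discarded.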

CWT-Based Detection

Uses the Continuous Wavelet Transform for multi-scale peak detection, which is more robust to varying peak widths:

from mioXpektron import detect_peaks_cwt_with_area

peaks = detect_peaks_cwt_with_area(
    mz_values=mz,
    intensities=corrected,
    sample_name="sample_01",
    group="control",
    min_snr=3.0,
)
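To show why a multi-scale transform copes with varying peak widths, here is a simplified plain-Python sketch. It correlates the signal with Ricker (Mexican-hat) wavelets at several scales and keeps local maxima of the summed response. This is a deliberate simplification of ridge-line CWT detection, and all names, scales, and the relative threshold are assumptions for illustration.

```python
import math

def ricker(width):
    """Ricker wavelet of integer scale `width`, sampled on [-5*width, 5*width]."""
    norm = 2.0 / (math.sqrt(3.0 * width) * math.pi ** 0.25)
    half = 5 * width
    return [norm * (1.0 - (k / width) ** 2) * math.exp(-0.5 * (k / width) ** 2)
            for k in range(-half, half + 1)]

def cwt_response(signal, widths):
    """Sum over scales of the signal correlated with each wavelet (zero-padded edges)."""
    n = len(signal)
    total = [0.0] * n
    for width in widths:
        wavelet = ricker(width)
        half = len(wavelet) // 2
        for i in range(n):
            acc = 0.0
            for k, w in enumerate(wavelet):
                j = i + k - half
                if 0 <= j < n:
                    acc += signal[j] * w
            total[i] += acc
    return total

def cwt_peaks(signal, widths=(1, 2, 4), rel_threshold=0.1):
    """Local maxima of the summed response above a fraction of its maximum."""
    resp = cwt_response(signal, widths)
    floor = rel_threshold * max(resp)
    return [i for i in range(1, len(resp) - 1)
            if resp[i] > floor and resp[i] > resp[i - 1] and resp[i] > resp[i + 1]]

# Two Gaussian peaks with different widths (sigma = 2 and sigma = 5 samples).
signal = [100 * math.exp(-0.5 * ((i - 30) / 2) ** 2)
          + 60 * math.exp(-0.5 * ((i - 70) / 5) ** 2) for i in range(100)]
cwt_found = cwt_peaks(signal)
print(cwt_found)  # [30, 70]
```

Both the narrow and the broad peak are recovered because each responds strongly at the scale closest to its own width.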

Noise Estimation

Noise is estimated with the Median Absolute Deviation (MAD). Peak regions are excluded first, so the estimate reflects the background rather than the signal:

from mioXpektron import robust_noise_estimation

median_noise, std_noise = robust_noise_estimation(corrected)

The default global thresholding path uses a Gaussian-equivalent MAD estimate on positive intensities after masking the measured width of detected peaks plus an additional point margin. This is a robust heuristic for thresholding, not a full physical Poisson detector model.
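The masking-plus-MAD idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's code: `mad_noise`, `peak_spans`, and `margin` are hypothetical names, and the peak extents are supplied by hand rather than measured.

```python
import statistics

def mad_noise(intensities, peak_spans, margin=2):
    """Estimate (median, robust_std) of the background.

    peak_spans: (left, right) index pairs covering each peak's measured width.
    margin: extra points masked on each side of every span.
    """
    masked = [False] * len(intensities)
    for left, right in peak_spans:
        lo = max(0, left - margin)
        hi = min(len(intensities) - 1, right + margin)
        for i in range(lo, hi + 1):
            masked[i] = True
    # Keep only unmasked, positive intensities.
    noise = [y for y, m in zip(intensities, masked) if not m and y > 0]
    median = statistics.median(noise)
    mad = statistics.median(abs(y - median) for y in noise)
    # 1.4826 scales the MAD to a Gaussian-equivalent standard deviation.
    return median, 1.4826 * mad

trace = [2, 3, 2, 4, 3, 10, 50, 12, 3, 2, 4, 3, 2, 3]
noise_median, noise_std = mad_noise(trace, peak_spans=[(5, 7)], margin=1)
print(noise_median, round(noise_std, 4))  # 3 1.4826
```

Without the mask, the peak at index 6 and its shoulders would enter the sample and inflate the spread.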

For spectra whose background varies across the mass range, the detection entry points also support noise_model="mz_binned". This estimates local background statistics in m/z bins and interpolates them back to a per-point threshold profile:

peaks = detect_peaks_with_area_v2(
    mz=mz,
    intens=corrected,
    sample_name="sample_01",
    group="control",
    noise_model="mz_binned",
    noise_bins=20,
    noise_min_points=25,
)

Available noise models:

  • "global": one threshold for the full spectrum

  • "mz_binned": interpolated m/z-dependent thresholds

For spectra with strong mass-dependent background changes, "mz_binned" is the preferred choice. The global model remains useful as a fast default, but its SNR interpretation should be treated as heuristic.
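A minimal sketch of the binned model, assuming equal-width m/z bins, a fallback to the global estimate for sparse bins, and linear interpolation between bin centers. These choices and all names are assumptions for illustration; the library may differ in each of them.

```python
import statistics

def robust_std(values):
    """Gaussian-equivalent MAD of a list of values."""
    med = statistics.median(values)
    return 1.4826 * statistics.median(abs(v - med) for v in values)

def binned_noise_profile(mz, noise, n_bins=2, min_points=3):
    """Per-point robust-std profile from equal-width m/z bins, linearly interpolated."""
    lo, hi = mz[0], mz[-1]
    width = (hi - lo) / n_bins
    global_std = robust_std(noise)
    centers, stds = [], []
    for b in range(n_bins):
        left, right = lo + b * width, lo + (b + 1) * width
        last = b == n_bins - 1
        vals = [y for x, y in zip(mz, noise)
                if left <= x < right or (last and x == hi)]
        centers.append(0.5 * (left + right))
        # Sparse bins fall back to the global estimate.
        stds.append(robust_std(vals) if len(vals) >= min_points else global_std)
    profile = []
    for x in mz:
        if x <= centers[0]:
            profile.append(stds[0])
        elif x >= centers[-1]:
            profile.append(stds[-1])
        else:
            k = max(i for i, c in enumerate(centers) if c <= x)
            t = (x - centers[k]) / (centers[k + 1] - centers[k])
            profile.append(stds[k] + t * (stds[k + 1] - stds[k]))
    return profile

mz_axis = list(range(20))
background = [10, 11, 10, 12, 10, 11, 10, 12, 10, 11,
              10, 15, 10, 20, 10, 15, 10, 20, 10, 15]
std_profile = binned_noise_profile(mz_axis, background)
print(round(std_profile[0], 4), round(std_profile[-1], 4))  # 0.7413 3.7065
```

A per-point threshold could then be formed from such a profile, so that the noisier high-mass half demands a larger peak height than the quiet low-mass half.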

Area Integration

Peak areas are computed from peak widths and corrected baselines. The current integration path:

  • handles empty or invalid background regions defensively

  • integrates over the true floating-point (fractional) peak boundaries

  • reports the area definition and integration method in the output table
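To illustrate what integrating on fractional boundaries means, the sketch below interpolates the spectrum at non-integer sample positions and applies the trapezoidal rule between them. The helper names are hypothetical and the trapezoidal rule is an assumption; the library reports its actual integration method in the output table.

```python
import math

def interp_at(x, y, pos):
    """Linearly interpolate (x, y) at fractional sample index pos."""
    i = int(pos)
    if i >= len(x) - 1:
        return x[-1], y[-1]
    t = pos - i
    return x[i] + t * (x[i + 1] - x[i]), y[i] + t * (y[i + 1] - y[i])

def area_between(mz, intensities, left_pos, right_pos):
    """Trapezoidal area between two fractional sample positions."""
    pts = [interp_at(mz, intensities, left_pos)]
    # Whole samples strictly inside the interval.
    for i in range(math.floor(left_pos) + 1, math.ceil(right_pos)):
        pts.append((mz[i], intensities[i]))
    pts.append(interp_at(mz, intensities, right_pos))
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

mz_grid = [0.0, 1.0, 2.0, 3.0, 4.0]
triangle = [0.0, 2.0, 4.0, 2.0, 0.0]
fractional_area = area_between(mz_grid, triangle, 1.5, 2.5)
widened_area = area_between(mz_grid, triangle, 1.0, 3.0)
print(fractional_area, widened_area)  # 3.5 6.0
```

Snapping the boundaries 1.5 and 2.5 outward to whole samples nearly doubles the area in this toy case, which is why the fractional positions matter for narrow peaks.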

Batch Peak Collection

collect_peak_properties_batch() runs the full preprocessing and peak collection workflow across many spectra and forwards the detector-specific options consistently:

from mioXpektron.detection import collect_peak_properties_batch

peaks_df = collect_peak_properties_batch(
    files=file_list,
    method="Gaussian",
    min_intensity=5,
    min_snr=3.0,
    noise_model="mz_binned",
    noise_bins=20,
)

For analytic fit methods that enable overlapping-peak deconvolution, the implementation uses a conservative two-stage acceptance rule:

  • nearby peaks must overlap on an adaptive width-based spacing criterion

  • the two-Gaussian fit must improve BIC over a single-Gaussian window fit by at least deconvolution_min_bic_delta (default 10)

Component widths are also checked against the configured peak-width bounds before the deconvoluted peaks are accepted.
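The two-stage rule can be sketched as follows. This is an illustration under assumptions, not the library's code: the spacing gate is taken as `overlap_factor` times the mean peak width, the BIC uses the standard least-squares form `n*ln(RSS/n) + k*ln(n)`, and a Gaussian is counted as three parameters. The function names and the residual sums of squares are hypothetical inputs.

```python
import math

def bic(rss, n_points, n_params):
    """BIC of a least-squares fit with Gaussian residuals (up to a constant)."""
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)

def accept_deconvolution(spacing, widths, rss_single, rss_double, n_points,
                         overlap_factor=0.75, min_bic_delta=10.0):
    """Return True only if both the spacing gate and the BIC test pass."""
    # Stage 1: peaks must be closer than a width-based gate (same units as widths).
    gate = overlap_factor * sum(widths) / len(widths)
    if spacing > gate:
        return False
    # Stage 2: the two-Gaussian fit (6 params) must beat one Gaussian (3 params).
    delta = bic(rss_single, n_points, 3) - bic(rss_double, n_points, 6)
    return delta >= min_bic_delta

clear_win = accept_deconvolution(4, [6, 6], rss_single=200.0, rss_double=40.0, n_points=50)
marginal = accept_deconvolution(4, [6, 6], rss_single=200.0, rss_double=190.0, n_points=50)
too_far = accept_deconvolution(10, [6, 6], rss_single=200.0, rss_double=40.0, n_points=50)
print(clear_win, marginal, too_far)  # True False False
```

The marginal case is rejected because the three extra parameters cost more in the BIC penalty than the small drop in residuals buys back.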

Cross-Sample Alignment

Align peaks across multiple samples by m/z tolerance:

from mioXpektron import align_peaks, PeakAlignIntensityArea

# Align peak lists from multiple samples
aligned = align_peaks(peak_list, mz_tolerance=0.2)

# Full alignment with intensity and area matrices
aligner = PeakAlignIntensityArea(
    mz_tolerance=0.2,
    method="Gaussian",
    noise_model="mz_binned",
    noise_bins=20,
    deconvolution_min_bic_delta=10.0,
)
# run() takes paths to normalized-spectrum CSV files (see the API reference below)
intensity_table, area_table, peaks_df = aligner.run(csv_files)

PeakAlignIntensityArea now exposes the underlying peak-detection method and the same noise-model options as the batch collector, so alignment runs can be compared on equal footing.

Overlapping Peak Analysis

Detect and visualize overlapping peaks:

from mioXpektron import check_overlapping_peaks, check_overlapping_peaks2

# Basic overlap check
overlaps = check_overlapping_peaks(peaks, resolution_threshold=0.5)

# Enhanced analysis with visualization
check_overlapping_peaks2(peaks, data, resolution_threshold=0.5)
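This page does not state how `resolution_threshold` is applied. One common convention, borrowed from chromatography, is the half-height resolution R = 1.18 * |Δm/z| / (FWHM1 + FWHM2). The sketch below flags adjacent peaks whose R falls under the threshold. Treat both the formula and the helper names as assumptions, not as a description of `check_overlapping_peaks`.

```python
def resolution(mz1, fwhm1, mz2, fwhm2):
    """Chromatography-style resolution from half-height widths."""
    return 1.18 * abs(mz2 - mz1) / (fwhm1 + fwhm2)

def overlapping_pairs(peaks, resolution_threshold=0.5):
    """peaks: list of (mz, fwhm). Return adjacent pairs resolved worse than the threshold."""
    ordered = sorted(peaks)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if resolution(a[0], a[1], b[0], b[1]) < resolution_threshold]

toy_peaks = [(100.00, 0.10), (100.05, 0.10), (101.00, 0.10)]
flagged = overlapping_pairs(toy_peaks)
print(flagged)  # [((100.0, 0.1), (100.05, 0.1))]
```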

API Reference

mioXpektron.detection.detect_peaks_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)

Fast peak detection in ToF-SIMS or similar spectra, including peak area.

Returns:

  • peak_indices (np.ndarray) – Indices of detected peaks.

  • peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas.

mioXpektron.detection.detect_peaks_with_area_v2(mz, intens, sample_name, group, *, min_intensity=1, min_snr=3, min_distance=2, prominence=10, min_peak_width=1, max_peak_width=75, rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, noise_window=10, verbose=False)

mioXpektron.detection.detect_peaks_cwt_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)

Peak detection using Continuous Wavelet Transform (CWT) for ToF-SIMS spectra.

Returns:

  • peak_properties (pd.DataFrame) – Contains: mz, intensities, widths (approx), amplitudes, areas.

mioXpektron.detection.robust_peak_detection(mz_values, intensities, sample_name, group, method='Gaussian', min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, distance_threshold=0.1, combined=False, use_cwt=False, noise_model='global', noise_bins=20, noise_min_points=25, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, verbose=False)

Fast peak detection in ToF-SIMS or similar spectra, including peak area.

Returns:

  • peak_indices (np.ndarray) – Indices of detected peaks.

  • peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas.

Notes

Overlapping-peak deconvolution now requires both geometric overlap and a BIC improvement over a single-Gaussian window fit. Fitted component widths must also remain within the user-specified peak-width bounds.

mioXpektron.detection.robust_noise_estimation(intensities, peak_indices=None, window=2, peak_height=None, peak_prominence=None, min_peak_width=1, max_peak_width=75)

Robust noise estimation by excluding regions near detected peaks.

Parameters:
  • intensities (np.ndarray) – Denoised, baseline-corrected intensities.

  • peak_indices (np.ndarray or None) – Indices of detected peaks. If None, function will detect peaks automatically.

  • window (int) – Extra number of data points to exclude on each side of the detected peak width. The measured peak extent is always masked first.

  • peak_height (float or None) – Minimum height for peak detection. If None, defaults to the median of positive intensities (data-adaptive).

  • peak_prominence (float or None) – Minimum prominence for peak detection. If None, defaults to 3x the MAD of positive intensities (data-adaptive).

Returns:

  • median_intensity (float) – Median intensity of noise region.

  • robust_std (float) – Robust standard deviation (Gaussian-equivalent MAD) of noise region.

mioXpektron.detection.robust_noise_estimation_mz_dependent(mz_values, intensities, peak_indices=None, window=2, peak_height=None, peak_prominence=None, min_peak_width=1, max_peak_width=75, n_bins=20, min_points_per_bin=25)

Estimate local noise as piecewise-constant m/z bins interpolated over the spectrum.

Returns:

  • median_profile (np.ndarray) – Per-point local median noise estimate.

  • std_profile (np.ndarray) – Per-point local Gaussian-equivalent robust std estimate.

mioXpektron.detection.collect_peak_properties_batch(files, mz_min=None, mz_max=None, baseline_method='airpls', noise_method='wavelet', missing_value_method='interpolation', normalization_target=100000000.0, method='Gaussian', min_intensity=1, min_snr=3, min_distance=5, window_size=10, peak_height=50, prominence=50, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, distance_threshold=0.01, combined=False, noise_model='global', noise_bins=20, noise_min_points=25, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True)

Collect peak properties from a batch of ToF-SIMS files.

Parameters:
  • files (list of str) – List of file paths to process.

  • mz_min (float or None) – Lower bound of the m/z window for data import (if supported).

  • mz_max (float or None) – Upper bound of the m/z window for data import (if supported).

  • baseline_method (str) – Method for baseline correction.

  • noise_method (str) – Noise filtering method.

  • missing_value_method (str) – Method for handling missing values.

  • normalization_target (float) – Target TIC normalization value.

  • min_snr (int or float) – Minimum signal-to-noise ratio for peak detection.

  • min_distance (int) – Minimum distance between peaks (in data points).

  • prominence (int or float or None) – Minimum peak prominence for detection.

  • width_rel_height (float) – Relative height for width calculation (e.g., 0.5 = FWHM).

  • noise_model ({"global", "mz_binned"}) – Noise model used to derive peak thresholds.

  • noise_bins (int) – Number of m/z bins for noise_model="mz_binned".

  • noise_min_points (int) – Minimum positive noise points per bin before using local estimates.

Returns:

peaks_df – DataFrame with all peak properties for all files.

Return type:

pd.DataFrame

mioXpektron.detection.align_peaks(peaks_df, mz_tolerance=0.2, mz_rounding_precision=1, output='intensity')

Cluster peaks by m/z and return an aligned feature matrix.

Uses a greedy sorted-bin algorithm that guarantees every aligned bin spans at most mz_tolerance in m/z.
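One way to obtain the stated guarantee is to sort the m/z values and open a new bin whenever the next value lies more than `mz_tolerance` above the first member of the current bin. The sketch below shows that idea on a plain list. It is an assumed reading of "greedy sorted-bin", and `greedy_bins` is a hypothetical helper, not the library function.

```python
def greedy_bins(mz_values, mz_tolerance=0.2):
    """Group sorted m/z values so that each bin spans at most mz_tolerance."""
    bins = []
    for mz in sorted(mz_values):
        # Compare against the bin's first (smallest) member, not its last one.
        if bins and mz - bins[-1][0] <= mz_tolerance:
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return bins

grouped = greedy_bins([100.30, 100.02, 150.00, 100.21, 100.10])
print(grouped)  # [[100.02, 100.1, 100.21], [100.3], [150.0]]
```

Comparing against the first member is what bounds the span. Chaining on the gap to the previous value instead would let a run of closely spaced peaks grow into an arbitrarily wide bin.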

class mioXpektron.detection.PeakAlignIntensityArea(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)

Bases: object

Process normalized ToF-SIMS spectra from CSV files, detect peaks, align them across samples, and calculate both intensity and area tables for each aligned m/z value.

Parameters:
  • mz_tolerance (float, optional (default=0.2)) – Maximum distance (in m/z units) for clustering peaks across samples.

  • mz_rounding_precision (int, optional (default=1)) – Number of decimal places for rounding aligned m/z values in output tables.

  • min_intensity (float, optional (default=1)) – Minimum intensity threshold for considering data points.

  • min_snr (float, optional (default=3)) – Minimum signal-to-noise ratio for peak detection.

  • min_distance (int, optional (default=2)) – Minimum distance (in data points) between peaks.

  • peak_height (float, optional (default=50)) – Minimum peak height for initial peak detection.

  • prominence (float, optional (default=10)) – Minimum prominence for peak detection.

  • min_peak_width (int, optional (default=1)) – Minimum peak width (in data points).

  • max_peak_width (int, optional (default=75)) – Maximum peak width (in data points).

  • width_rel_height (float, optional (default=0.5)) – Relative height for peak width calculation (0.5 = FWHM).

  • noise_model ({"global", "mz_binned"}, optional (default="global")) – Noise model used for threshold estimation.

  • noise_bins (int, optional (default=20)) – Number of m/z bins when using noise_model="mz_binned".

  • noise_min_points (int, optional (default=25)) – Minimum positive noise points per bin for the local model.

  • method (str or None, optional (default=None)) – Peak-detection / fitting method. None uses simple local-max detection (detect_peaks_with_area_v2), 'cwt' uses CWT detection, and 'Gaussian' / 'Lorentzian' / 'Voigt' use curve-fit detection via robust_peak_detection.

  • deconvolution_min_bic_delta (float, optional (default=10.0)) – Minimum BIC improvement required before accepting a two-Gaussian deconvolution over a single-peak fit.

  • deconvolution_overlap_factor (float, optional (default=0.75)) – Scale factor applied to the mean measured peak width when deriving the adaptive deconvolution spacing gate.

  • deconvolution_replace_singles (bool, optional (default=True)) – If True, replace overlapping single-peak fits with the accepted deconvoluted components in the output table.

  • output_dir (str or None, optional) – Directory to save output CSV files. If None, files are not saved.

  • verbose (bool, optional (default=False)) – If True, print progress information.

Examples

>>> from mioXpektron.detection import PeakAlignIntensityArea
>>> import glob
>>>
>>> # Get all normalized spectra
>>> csv_files = glob.glob('output_files/normalized_spectra/*.csv')
>>>
>>> # Create analyzer instance
>>> analyzer = PeakAlignIntensityArea(
...     mz_tolerance=0.1,
...     min_snr=3,
...     output_dir='output_files/peak_analysis'
... )
>>>
>>> # Process with m/z cutoff
>>> intensity_table, area_table, peaks_df = analyzer.run(
...     csv_files,
...     mz_min=50,
...     mz_max=500
... )
>>>
>>> print(f"Detected {len(peaks_df)} peaks across {len(csv_files)} samples")
>>> print(f"Aligned to {intensity_table.shape[1]} unique m/z values")

__init__(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)

Initialize the PeakAlignIntensityArea analyzer with the given parameters; any parameter left unset takes its default.

run(csv_files, mz_min=None, mz_max=None)

Process CSV files and perform peak detection, alignment, and quantification.

Parameters:
  • csv_files (list of str) – List of paths to normalized spectrum CSV files. Each CSV should have columns: ‘channel’, ‘mz’, ‘intensity’

  • mz_min (float or None, optional) – Minimum m/z value to consider for peak detection. If None, use full range.

  • mz_max (float or None, optional) – Maximum m/z value to consider for peak detection. If None, use full range.

Returns:

  • intensity_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak intensities (amplitudes). Missing peaks are filled with 0.

  • area_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak areas. Missing peaks are filled with 0.

  • peaks_df (pd.DataFrame) – DataFrame containing all detected peaks with their properties before alignment.
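The shape of the returned tables (samples as rows, aligned m/z values as columns, zeros for missing peaks) can be pictured with a small plain-Python pivot. This is only a sketch: the helper name is hypothetical, and keeping the largest value when two peaks of one sample round to the same column is an assumption.

```python
def pivot_zero_filled(rows, precision=1):
    """rows: list of (sample, aligned_mz, value). Returns (columns, table)."""
    columns = sorted({round(mz, precision) for _, mz, _ in rows})
    samples = sorted({s for s, _, _ in rows})
    # Start from an all-zero table so that missing peaks stay at 0.
    table = {s: {c: 0.0 for c in columns} for s in samples}
    for sample, mz, value in rows:
        key = round(mz, precision)
        table[sample][key] = max(table[sample][key], value)
    return columns, table

rows = [("s1", 100.04, 500.0), ("s1", 150.01, 80.0), ("s2", 100.02, 450.0)]
columns, table = pivot_zero_filled(rows)
print(columns)              # [100.0, 150.0]
print(table["s2"][150.0])   # 0.0
```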