mioXpektron.detection

Peak detection utilities for the Xpektron toolkit.

mioXpektron.detection.align_peaks(peaks_df, mz_tolerance=0.2, mz_rounding_precision=1, output='intensity')[source]

Cluster peaks by m/z and return an aligned feature matrix.

Uses a greedy sorted-bin algorithm that guarantees every aligned bin spans at most mz_tolerance in m/z.
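
The binning idea can be illustrated with a minimal sketch. It is not the library's implementation: the helper name greedy_mz_bins is hypothetical, and it assumes each bin is anchored at its smallest member, which is one way to keep every bin within mz_tolerance.

```python
def greedy_mz_bins(mz_values, mz_tolerance=0.2):
    """Group m/z values into bins whose span never exceeds mz_tolerance.

    Hypothetical helper for illustration; not part of mioXpektron.
    """
    bins = []
    for mz in sorted(mz_values):
        # Extend the current bin only while the value stays within the
        # tolerance of the bin's first (smallest) member; otherwise open a new bin.
        if bins and mz - bins[-1][0] <= mz_tolerance:
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return bins


bins = greedy_mz_bins([100.02, 100.10, 100.25, 100.31, 250.00], mz_tolerance=0.2)
# One representative m/z per bin, rounded as mz_rounding_precision=1 would suggest.
centers = [round(sum(b) / len(b), 1) for b in bins]
```

Because the span is measured from the first member of each bin rather than from the previous peak, a long chain of closely spaced peaks cannot drift into a single oversized bin.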

mioXpektron.detection.check_overlapping_peaks(data_dir, file_name, mz_min, mz_max, norm_tic=False, alpha=0.2)[source]
mioXpektron.detection.check_overlapping_peaks2(data_dir, file_pattern, mz_min, mz_max, norm_tic=False, alpha=0.18, bin_width=0.001, show_median=True, show_group_cumulative=True)[source]

Overlay spectra with two colors (Cancer vs Control) inferred from file names.

Parameters:
  • data_dir (str) – Directory containing spectra.

  • file_pattern (str) – Glob pattern (e.g., “*.txt”).

  • mz_min (float) – Lower bound of the m/z window to visualize.

  • mz_max (float) – Upper bound of the m/z window to visualize.

  • norm_tic (bool, default False) – Normalize each spectrum by its TIC prior to plotting.

  • alpha (float, default 0.18) – Line transparency for individual spectra.

  • bin_width (float, default 0.001) – Common grid step for interpolation (used for medians/cumulative plots).

  • show_median (bool, default True) – If True, overlay per-group median curves (thicker lines).

  • show_group_cumulative (bool, default True) – If True, plot per-group cumulative intensity curves on a separate figure.

Notes

  • Group detection is based on substrings in filenames: “_CC” (Cancer), “_CT” (Control).

  • Files without these markers are labeled “Unknown” and plotted in grey.
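
A minimal sketch of the filename rule described in these notes follows. The helper name infer_group is hypothetical, and two details are assumptions: matching is case-sensitive, and "_CC" wins if a name contains both markers.

```python
from pathlib import Path

# Marker-to-label mapping taken from the notes above.
GROUP_MARKERS = {"_CC": "Cancer", "_CT": "Control"}


def infer_group(file_path):
    """Return 'Cancer', 'Control', or 'Unknown' based on the file name.

    Hypothetical helper for illustration; not part of mioXpektron.
    """
    name = Path(file_path).name  # ignore directory components
    for marker, label in GROUP_MARKERS.items():
        if marker in name:
            return label
    return "Unknown"


labels = [infer_group(p) for p in ["data/S01_CC.txt", "data/S02_CT.txt", "data/blank.txt"]]
```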

mioXpektron.detection.collect_peak_properties_batch(files, mz_min=None, mz_max=None, baseline_method='airpls', noise_method='wavelet', missing_value_method='interpolation', normalization_target=100000000.0, method='Gaussian', min_intensity=1, min_snr=3, min_distance=5, window_size=10, peak_height=50, prominence=50, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, distance_threshold=0.01, combined=False, noise_model='global', noise_bins=20, noise_min_points=25, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True)[source]

Collect peak properties from a batch of ToF-SIMS files.

Parameters:
  • files (list of str) – List of file paths to process.

  • mz_min (float or None) – Lower bound of the m/z window for data import (if supported).

  • mz_max (float or None) – Upper bound of the m/z window for data import (if supported).

  • baseline_method (str) – Method for baseline correction.

  • noise_method (str) – Noise filtering method.

  • missing_value_method (str) – Method for handling missing values.

  • normalization_target (float) – Target TIC normalization value.

  • min_snr (int or float) – Minimum signal-to-noise ratio for peak detection.

  • min_distance (int) – Minimum distance between peaks (in data points).

  • prominence (int or float or None) – Minimum peak prominence for detection.

  • width_rel_height (float) – Relative height for width calculation (e.g., 0.5 = FWHM).

  • noise_model ({"global", "mz_binned"}) – Noise model used to derive peak thresholds.

  • noise_bins (int) – Number of m/z bins for noise_model="mz_binned".

  • noise_min_points (int) – Minimum positive noise points per bin before using local estimates.

Returns:

peaks_df – DataFrame with all peak properties for all files.

Return type:

pd.DataFrame

mioXpektron.detection.detect_peaks_cwt_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)[source]

Peak detection using Continuous Wavelet Transform (CWT) for ToF-SIMS spectra.

Returns:

peak_properties (pd.DataFrame) – Contains: mz, intensities, widths (approximate), amplitudes, areas.

mioXpektron.detection.detect_peaks_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)[source]

Fast peak detection in ToF-SIMS or similar spectra, including peak area.

Returns:

  • peak_indices (np.ndarray) – Indices of detected peaks.

  • peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas.

mioXpektron.detection.detect_peaks_with_area_v2(mz, intens, sample_name, group, *, min_intensity=1, min_snr=3, min_distance=2, prominence=10, min_peak_width=1, max_peak_width=75, rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, noise_window=10, verbose=False)[source]
class mioXpektron.detection.PeakAlignIntensityArea(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)[source]

Bases: object

Process normalized ToF-SIMS spectra from CSV files, detect peaks, align them across samples, and calculate both intensity and area tables for each aligned m/z value.

Parameters:
  • mz_tolerance (float, optional (default=0.2)) – Maximum distance (in m/z units) for clustering peaks across samples.

  • mz_rounding_precision (int, optional (default=1)) – Number of decimal places for rounding aligned m/z values in output tables.

  • min_intensity (float, optional (default=1)) – Minimum intensity threshold for considering data points.

  • min_snr (float, optional (default=3)) – Minimum signal-to-noise ratio for peak detection.

  • min_distance (int, optional (default=2)) – Minimum distance (in data points) between peaks.

  • peak_height (float, optional (default=50)) – Minimum peak height for initial peak detection.

  • prominence (float, optional (default=10)) – Minimum prominence for peak detection.

  • min_peak_width (int, optional (default=1)) – Minimum peak width (in data points).

  • max_peak_width (int, optional (default=75)) – Maximum peak width (in data points).

  • width_rel_height (float, optional (default=0.5)) – Relative height for peak width calculation (0.5 = FWHM).

  • noise_model ({"global", "mz_binned"}, optional (default="global")) – Noise model used for threshold estimation.

  • noise_bins (int, optional (default=20)) – Number of m/z bins when using noise_model="mz_binned".

  • noise_min_points (int, optional (default=25)) – Minimum positive noise points per bin for the local model.

  • method (str or None, optional (default=None)) – Peak-detection / fitting method. None uses simple local-max detection (detect_peaks_with_area_v2), 'cwt' uses CWT detection, and 'Gaussian' / 'Lorentzian' / 'Voigt' use curve-fit detection via robust_peak_detection.

  • deconvolution_min_bic_delta (float, optional (default=10.0)) – Minimum BIC improvement required before accepting a two-Gaussian deconvolution over a single-peak fit.

  • deconvolution_overlap_factor (float, optional (default=0.75)) – Scale factor applied to the mean measured peak width when deriving the adaptive deconvolution spacing gate.

  • deconvolution_replace_singles (bool, optional (default=True)) – If True, replace overlapping single-peak fits with the accepted deconvoluted components in the output table.

  • output_dir (str or None, optional) – Directory to save output CSV files. If None, files are not saved.

  • verbose (bool, optional (default=False)) – If True, print progress information.

Examples

>>> from mioXpektron.detection import PeakAlignIntensityArea
>>> import glob
>>>
>>> # Get all normalized spectra
>>> csv_files = glob.glob('output_files/normalized_spectra/*.csv')
>>>
>>> # Create analyzer instance
>>> analyzer = PeakAlignIntensityArea(
...     mz_tolerance=0.1,
...     min_snr=3,
...     output_dir='output_files/peak_analysis'
... )
>>>
>>> # Process with m/z cutoff
>>> intensity_table, area_table, peaks_df = analyzer.run(
...     csv_files,
...     mz_min=50,
...     mz_max=500
... )
>>>
>>> print(f"Detected {len(peaks_df)} peaks across {len(csv_files)} samples")
>>> print(f"Aligned to {intensity_table.shape[1]} unique m/z values")
__init__(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)[source]

Initialize the PeakAlignIntensityArea analyzer. Every argument is optional and falls back to the default shown in the signature.

run(csv_files, mz_min=None, mz_max=None)[source]

Process CSV files and perform peak detection, alignment, and quantification.

Parameters:
  • csv_files (list of str) – List of paths to normalized spectrum CSV files. Each CSV should have the columns ‘channel’, ‘mz’, and ‘intensity’.

  • mz_min (float or None, optional) – Minimum m/z value to consider for peak detection. If None, use full range.

  • mz_max (float or None, optional) – Maximum m/z value to consider for peak detection. If None, use full range.

Returns:

  • intensity_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak intensities (amplitudes). Missing peaks are filled with 0.

  • area_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak areas. Missing peaks are filled with 0.

  • peaks_df (pd.DataFrame) – DataFrame containing all detected peaks with their properties before alignment.

mioXpektron.detection.robust_noise_estimation(intensities, peak_indices=None, window=2, peak_height=None, peak_prominence=None, min_peak_width=1, max_peak_width=75)[source]

Robust noise estimation by excluding regions near detected peaks.

Parameters:
  • intensities (np.ndarray) – Denoised, baseline-corrected intensities.

  • peak_indices (np.ndarray or None) – Indices of detected peaks. If None, function will detect peaks automatically.

  • window (int) – Extra number of data points to exclude on each side of the detected peak width. The measured peak extent is always masked first.

  • peak_height (float or None) – Minimum height for peak detection. If None, defaults to the median of positive intensities (data-adaptive).

  • peak_prominence (float or None) – Minimum prominence for peak detection. If None, defaults to 3x the MAD of positive intensities (data-adaptive).

Returns:

  • median_intensity (float) – Median intensity of noise region.

  • robust_std (float) – Robust standard deviation (Gaussian-equivalent MAD) of noise region.
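
The two return values can be illustrated with a small sketch. It is a simplification, not the library's code: the helper name noise_stats is hypothetical, and it masks a fixed half-width around each peak index instead of the measured peak extent plus window. The factor 1.4826 is the standard constant that converts a MAD into a Gaussian-equivalent standard deviation.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # scales the MAD to a Gaussian-equivalent standard deviation


def noise_stats(intensities, peak_indices, half_width=3):
    """Median and MAD-based robust std of the points not near any peak.

    Hypothetical helper; half_width is a fixed mask size (an assumption).
    """
    masked = set()
    for idx in peak_indices:
        # Exclude the peak apex plus half_width points on each side.
        lo = max(0, idx - half_width)
        hi = min(len(intensities), idx + half_width + 1)
        masked.update(range(lo, hi))
    noise = [v for i, v in enumerate(intensities) if i not in masked]
    if not noise:
        raise ValueError("no noise points left after masking peaks")
    median = statistics.median(noise)
    mad = statistics.median([abs(v - median) for v in noise])
    return median, MAD_TO_SIGMA * mad


spectrum = [1, 2, 1, 3, 2, 50, 120, 45, 2, 1, 3, 2, 1]
median_intensity, robust_std = noise_stats(spectrum, peak_indices=[6], half_width=1)
```

Using the median and MAD rather than the mean and standard deviation keeps a few unmasked peak shoulders from inflating the noise estimate.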

mioXpektron.detection.robust_noise_estimation_mz(mz_values, intensities, min_mz, max_mz)[source]

Estimate noise from a user-specified m/z baseline region.

Parameters:
  • mz_values (np.ndarray) – m/z axis.

  • intensities (np.ndarray) – Corresponding intensity values.

  • min_mz (float) – Lower bound of the m/z window that defines the baseline region.

  • max_mz (float) – Upper bound of the m/z window that defines the baseline region.

Returns:

  • median_intensity (float) – Median intensity of the baseline region.

  • robust_std (float) – Robust standard deviation (MAD-scaled) of the baseline region.

mioXpektron.detection.robust_noise_estimation_mz_dependent(mz_values, intensities, peak_indices=None, window=2, peak_height=None, peak_prominence=None, min_peak_width=1, max_peak_width=75, n_bins=20, min_points_per_bin=25)[source]

Estimate local noise as piecewise-constant m/z bins interpolated over the spectrum.

Returns:

  • median_profile (np.ndarray) – Per-point local median noise estimate.

  • std_profile (np.ndarray) – Per-point local Gaussian-equivalent robust std estimate.
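
One way to picture the binned noise model is the sketch below. It rests on several assumptions that the documentation does not confirm: equal-width bins, linear interpolation between bin centers, a global fallback for sparse bins, and input values that already exclude peak regions. The helper name binned_noise_profile is hypothetical.

```python
import statistics
from bisect import bisect_right


def binned_noise_profile(mz_values, noise_values, n_bins=4, min_points=3):
    """Per-point median noise, interpolated between equal-width m/z bin centers.

    Hypothetical helper; assumes peak regions were removed beforehand.
    """
    lo, hi = min(mz_values), max(mz_values)
    global_median = statistics.median(noise_values)
    if hi == lo:
        return [global_median] * len(mz_values)
    width = (hi - lo) / n_bins

    # Collect the noise values that fall in each bin.
    buckets = [[] for _ in range(n_bins)]
    for mz, value in zip(mz_values, noise_values):
        index = min(int((mz - lo) / width), n_bins - 1)
        buckets[index].append(value)

    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Sparse bins fall back to the global estimate.
    levels = [statistics.median(b) if len(b) >= min_points else global_median
              for b in buckets]

    profile = []
    for mz in mz_values:
        j = bisect_right(centers, mz)
        if j == 0:
            profile.append(levels[0])    # left of the first bin center
        elif j == n_bins:
            profile.append(levels[-1])   # right of the last bin center
        else:
            t = (mz - centers[j - 1]) / (centers[j] - centers[j - 1])
            profile.append(levels[j - 1] + t * (levels[j] - levels[j - 1]))
    return profile


mz_axis = [float(i) for i in range(40)]
noise = [1.0] * 20 + [3.0] * 20
profile = binned_noise_profile(mz_axis, noise, n_bins=2, min_points=3)
```

The same pattern applied to per-bin MAD values would give a per-point robust std profile.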

mioXpektron.detection.robust_peak_detection(mz_values, intensities, sample_name, group, method='Gaussian', min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, distance_threshold=0.1, combined=False, use_cwt=False, noise_model='global', noise_bins=20, noise_min_points=25, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, verbose=False)[source]

Fast peak detection in ToF-SIMS or similar spectra, including peak area.

Returns:

  • peak_indices (np.ndarray) – Indices of detected peaks.

  • peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas.

Notes

Overlapping-peak deconvolution now requires both geometric overlap and a BIC improvement over a single-Gaussian window fit. Fitted component widths must also remain within the user-specified peak-width bounds.
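
The BIC criterion in this note can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses the common least-squares form BIC = n ln(RSS/n) + k ln(n), counts three parameters per Gaussian (amplitude, center, width), and leaves out the geometric-overlap and width-bound checks. Both function names are hypothetical.

```python
import math


def bic_from_rss(rss, n_points, n_params):
    """BIC for a least-squares fit, assuming Gaussian residuals."""
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)


def accept_two_peak_fit(rss_single, rss_double, n_points, min_bic_delta=10.0):
    """True when the two-Gaussian fit lowers the BIC by at least min_bic_delta."""
    bic_single = bic_from_rss(rss_single, n_points, n_params=3)  # one Gaussian
    bic_double = bic_from_rss(rss_double, n_points, n_params=6)  # two Gaussians
    return (bic_single - bic_double) >= min_bic_delta


# A large drop in residuals justifies the extra parameters ...
clear_improvement = accept_two_peak_fit(400.0, 100.0, n_points=50)
# ... while a marginal drop does not.
marginal_improvement = accept_two_peak_fit(400.0, 390.0, n_points=50)
```

Raising deconvolution_min_bic_delta makes the split more conservative, since the second component must then explain more of the residual before it is accepted.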

Modules

check_overlapping_peaks(data_dir, file_name, ...)

check_overlapping_peaks2(data_dir, ...[, ...])

Overlay spectra with two colors (Cancer vs Control) inferred from file names.

detection

peak_analysis

test_noise_models