mioXpektron.detection
Peak detection utilities for the Xpektron toolkit.
- mioXpektron.detection.align_peaks(peaks_df, mz_tolerance=0.2, mz_rounding_precision=1, output='intensity')[source]
Cluster peaks by m/z and return an aligned feature matrix.
Uses a greedy sorted-bin algorithm that guarantees every aligned bin spans at most mz_tolerance in m/z.
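As an illustration of the idea only, a greedy sorted-bin pass can be sketched in plain Python as follows. The function name `greedy_mz_bins` is hypothetical and is not part of mioXpektron; the library's actual implementation, its tie-breaking, and how it builds the feature matrix are not shown in this reference. The sketch assumes each bin is anchored at its smallest m/z value, which is what keeps the bin span within the tolerance.

```python
def greedy_mz_bins(mz_values, mz_tolerance=0.2):
    """Group m/z values into bins whose span never exceeds mz_tolerance.

    Hypothetical helper for illustration; not the library's code.
    """
    bins = []
    for mz in sorted(mz_values):
        # Compare against the FIRST (smallest) value of the open bin, so the
        # total span of a bin is bounded by mz_tolerance.
        if bins and mz - bins[-1][0] <= mz_tolerance:
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return bins


bins = greedy_mz_bins([100.05, 100.00, 100.30, 100.15, 100.22, 27.02])
for b in bins:
    print(f"{b[0]:.2f} to {b[-1]:.2f}: {len(b)} peak(s)")
```

Anchoring on the first member rather than the previous one avoids chaining, where a run of closely spaced peaks would otherwise grow into a bin much wider than the tolerance.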
- mioXpektron.detection.check_overlapping_peaks(data_dir, file_name, mz_min, mz_max, norm_tic=False, alpha=0.2)[source]
- mioXpektron.detection.check_overlapping_peaks2(data_dir, file_pattern, mz_min, mz_max, norm_tic=False, alpha=0.18, bin_width=0.001, show_median=True, show_group_cumulative=True)[source]
Overlay spectra with two colors (Cancer vs Control) inferred from file names.
- Parameters:
data_dir (str) – Directory containing spectra.
mz_min (float) – Lower bound of the m/z window to visualize.
mz_max (float) – Upper bound of the m/z window to visualize.
norm_tic (bool, default False) – Normalize each spectrum by its TIC prior to plotting.
alpha (float, default 0.18) – Line transparency for individual spectra.
bin_width (float, default 0.001) – Common grid step for interpolation (used for medians/cumulative plots).
show_median (bool, default True) – If True, overlay per-group median curves (thicker lines).
show_group_cumulative (bool, default True) – If True, plot per-group cumulative intensity curves on a separate figure.
Notes
Group detection is based on substrings in filenames: “_CC” (Cancer), “_CT” (Control).
Files without these markers are labeled “Unknown” and plotted in grey.
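The substring rule described in the notes can be sketched as below. The helper `infer_group` is hypothetical and only mirrors the documented behavior; in particular, checking "_CC" before "_CT" when both appear in a name is an assumption, since the precedence is not stated here.

```python
from pathlib import Path


def infer_group(file_name):
    """Map a spectrum file name to a group label (hypothetical helper)."""
    stem = Path(file_name).stem
    if "_CC" in stem:   # assumed to take precedence if both markers occur
        return "Cancer"
    if "_CT" in stem:
        return "Control"
    return "Unknown"    # such files are plotted in grey


labels = [infer_group(name) for name in
          ["s01_CC.csv", "data/s02_CT.csv", "blank.csv"]]
print(labels)
```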
- mioXpektron.detection.collect_peak_properties_batch(files, mz_min=None, mz_max=None, baseline_method='airpls', noise_method='wavelet', missing_value_method='interpolation', normalization_target=100000000.0, method='Gaussian', min_intensity=1, min_snr=3, min_distance=5, window_size=10, peak_height=50, prominence=50, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, distance_threshold=0.01, combined=False, noise_model='global', noise_bins=20, noise_min_points=25, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True)[source]
Collect peak properties from a batch of ToF-SIMS files.
- Parameters:
mz_min (float or None) – Lower bound of the m/z window for data import (if supported).
mz_max (float or None) – Upper bound of the m/z window for data import (if supported).
baseline_method (str) – Method for baseline correction.
noise_method (str) – Noise filtering method.
missing_value_method (str) – Method for handling missing values.
normalization_target (float) – Target TIC normalization value.
min_snr (int or float) – Minimum signal-to-noise ratio for peak detection.
min_distance (int) – Minimum distance between peaks (in data points).
prominence (int or float or None) – Minimum peak prominence for detection.
width_rel_height (float) – Relative height for width calculation (e.g., 0.5 = FWHM).
noise_model ({"global", "mz_binned"}) – Noise model used to derive peak thresholds.
noise_bins (int) – Number of m/z bins for noise_model="mz_binned".
noise_min_points (int) – Minimum positive noise points per bin before using local estimates.
- Returns:
peaks_df – DataFrame with all peak properties for all files.
- Return type:
pd.DataFrame
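For orientation, TIC normalization to a target value (the role of normalization_target above) is commonly done by rescaling a spectrum so that its summed intensity equals the target. The sketch below assumes that convention; `normalize_tic` is a hypothetical name, and the library's own normalization step may differ in detail.

```python
def normalize_tic(intensities, target=1e8):
    """Rescale intensities so their sum (the TIC) equals `target`.

    Hypothetical helper for illustration; not the library's code.
    """
    tic = sum(intensities)
    if tic <= 0:
        # Nothing sensible to scale by; return the values unchanged.
        return list(intensities)
    scale = target / tic
    return [value * scale for value in intensities]


scaled = normalize_tic([1.0, 2.0, 7.0], target=100.0)
print(scaled)
```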
- mioXpektron.detection.detect_peaks_cwt_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)[source]
Peak detection using Continuous Wavelet Transform (CWT) for ToF-SIMS spectra.
Returns:
- peak_properties (pd.DataFrame) – Contains: mz, intensities, widths (approx), amplitudes, areas
- mioXpektron.detection.detect_peaks_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)[source]
Fast peak detection in ToF-SIMS or similar spectra, including peak area.
Returns:
- peak_indices (np.ndarray) – Indices of detected peaks
- peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas
- mioXpektron.detection.detect_peaks_with_area_v2(mz, intens, sample_name, group, *, min_intensity=1, min_snr=3, min_distance=2, prominence=10, min_peak_width=1, max_peak_width=75, rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, noise_window=10, verbose=False)[source]
- class mioXpektron.detection.PeakAlignIntensityArea(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)[source]
Bases: object
Process normalized ToF-SIMS spectra from CSV files, detect peaks, align them across samples, and calculate both intensity and area tables for each aligned m/z value.
- Parameters:
mz_tolerance (float, optional (default=0.2)) – Maximum distance (in m/z units) for clustering peaks across samples.
mz_rounding_precision (int, optional (default=1)) – Number of decimal places for rounding aligned m/z values in output tables.
min_intensity (float, optional (default=1)) – Minimum intensity threshold for considering data points.
min_snr (float, optional (default=3)) – Minimum signal-to-noise ratio for peak detection.
min_distance (int, optional (default=2)) – Minimum distance (in data points) between peaks.
peak_height (float, optional (default=50)) – Minimum peak height for initial peak detection.
prominence (float, optional (default=10)) – Minimum prominence for peak detection.
min_peak_width (int, optional (default=1)) – Minimum peak width (in data points).
max_peak_width (int, optional (default=75)) – Maximum peak width (in data points).
width_rel_height (float, optional (default=0.5)) – Relative height for peak width calculation (0.5 = FWHM).
noise_model ({"global", "mz_binned"}, optional (default="global")) – Noise model used for threshold estimation.
noise_bins (int, optional (default=20)) – Number of m/z bins when using noise_model="mz_binned".
noise_min_points (int, optional (default=25)) – Minimum positive noise points per bin for the local model.
method (str or None, optional (default=None)) – Peak-detection / fitting method. None uses simple local-max detection (detect_peaks_with_area_v2), 'cwt' uses CWT detection, and 'Gaussian' / 'Lorentzian' / 'Voigt' use curve-fit detection via robust_peak_detection.
deconvolution_min_bic_delta (float, optional (default=10.0)) – Minimum BIC improvement required before accepting a two-Gaussian deconvolution over a single-peak fit.
deconvolution_overlap_factor (float, optional (default=0.75)) – Scale factor applied to the mean measured peak width when deriving the adaptive deconvolution spacing gate.
deconvolution_replace_singles (bool, optional (default=True)) – If True, replace overlapping single-peak fits with the accepted deconvoluted components in the output table.
output_dir (str or None, optional) – Directory to save output CSV files. If None, files are not saved.
verbose (bool, optional (default=False)) – If True, print progress information.
Examples
>>> from mioXpektron.detection import PeakAlignIntensityArea
>>> import glob
>>>
>>> # Get all normalized spectra
>>> csv_files = glob.glob('output_files/normalized_spectra/*.csv')
>>>
>>> # Create analyzer instance
>>> analyzer = PeakAlignIntensityArea(
...     mz_tolerance=0.1,
...     min_snr=3,
...     output_dir='output_files/peak_analysis'
... )
>>>
>>> # Process with m/z cutoff
>>> intensity_table, area_table, peaks_df = analyzer.run(
...     csv_files,
...     mz_min=50,
...     mz_max=500
... )
>>>
>>> print(f"Detected {len(peaks_df)} peaks across {len(csv_files)} samples")
>>> print(f"Aligned to {intensity_table.shape[1]} unique m/z values")
- __init__(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)[source]
Initialize the PeakAlignIntensityArea analyzer. Every argument is optional and falls back to the default shown in the signature.
- run(csv_files, mz_min=None, mz_max=None)[source]
Process CSV files and perform peak detection, alignment, and quantification.
- Parameters:
csv_files (list of str) – List of paths to normalized spectrum CSV files. Each CSV should have columns: ‘channel’, ‘mz’, ‘intensity’
mz_min (float or None, optional) – Minimum m/z value to consider for peak detection. If None, use full range.
mz_max (float or None, optional) – Maximum m/z value to consider for peak detection. If None, use full range.
- Returns:
intensity_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak intensities (amplitudes). Missing peaks are filled with 0.
area_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak areas. Missing peaks are filled with 0.
peaks_df (pd.DataFrame) – DataFrame containing all detected peaks with their properties before alignment.
- mioXpektron.detection.robust_noise_estimation(intensities, peak_indices=None, window=2, peak_height=None, peak_prominence=None, min_peak_width=1, max_peak_width=75)[source]
Robust noise estimation by excluding regions near detected peaks.
- Parameters:
intensities (np.ndarray) – Denoised, baseline-corrected intensities.
peak_indices (np.ndarray or None) – Indices of detected peaks. If None, function will detect peaks automatically.
window (int) – Extra number of data points to exclude on each side of the detected peak width. The measured peak extent is always masked first.
peak_height (float or None) – Minimum height for peak detection. If None, defaults to the median of positive intensities (data-adaptive).
peak_prominence (float or None) – Minimum prominence for peak detection. If None, defaults to 3x the MAD of positive intensities (data-adaptive).
- Returns:
median_intensity (float) – Median intensity of noise region.
robust_std (float) – Robust standard deviation (Gaussian-equivalent MAD) of noise region.
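A "Gaussian-equivalent MAD" usually means the median absolute deviation multiplied by about 1.4826, which makes it match the standard deviation for normally distributed data. The sketch below assumes that convention and omits the peak-masking step described above; `robust_noise` is a hypothetical name, not the library function.

```python
from statistics import median


def robust_noise(values):
    """Return (median, 1.4826 * MAD) of the given noise samples.

    Hypothetical helper; assumes peak regions were already removed.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, 1.4826 * mad


# One large outlier (500.0) barely affects either estimate.
noise = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5, 10.0, 500.0]
med, robust_std = robust_noise(noise)
print(med, round(robust_std, 4))
```

An ordinary standard deviation of the same samples would be dominated by the single outlier, which is why a MAD-based estimate is preferred when residual peaks may remain in the noise region.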
- mioXpektron.detection.robust_noise_estimation_mz(mz_values, intensities, min_mz, max_mz)[source]
Estimate noise from a user-specified m/z baseline region.
- Parameters:
- Returns:
median_intensity (float) – Median intensity of the baseline region.
robust_std (float) – Robust standard deviation (MAD-scaled) of the baseline region.
- mioXpektron.detection.robust_noise_estimation_mz_dependent(mz_values, intensities, peak_indices=None, window=2, peak_height=None, peak_prominence=None, min_peak_width=1, max_peak_width=75, n_bins=20, min_points_per_bin=25)[source]
Estimate local noise as piecewise-constant m/z bins interpolated over the spectrum.
- Returns:
median_profile (np.ndarray) – Per-point local median noise estimate.
std_profile (np.ndarray) – Per-point local Gaussian-equivalent robust std estimate.
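One way to picture a binned noise model is sketched below: the m/z range is split into equal-width bins, each bin gets its own median and MAD-based standard deviation, and bins with too few points fall back to the global estimate. This is an assumption-laden illustration. `binned_noise_profile` is a hypothetical name, peak masking is omitted, the global fallback rule is a guess, and the result is left piecewise-constant rather than interpolated between bins.

```python
from statistics import median


def mad_std(values):
    """Return (median, 1.4826 * MAD) for a sequence of numbers."""
    med = median(values)
    return med, 1.4826 * median(abs(v - med) for v in values)


def binned_noise_profile(mz, intensities, n_bins=20, min_points_per_bin=25):
    """Per-point (median, std) noise estimates from equal-width m/z bins."""
    fallback = mad_std(intensities)          # global estimate
    lo, hi = min(mz), max(mz)
    width = (hi - lo) / n_bins or 1.0        # guard against a zero-width range
    members = [[] for _ in range(n_bins)]
    bin_of_point = []
    for m, y in zip(mz, intensities):
        b = min(int((m - lo) / width), n_bins - 1)
        bin_of_point.append(b)
        members[b].append(y)
    # Use local statistics only where a bin holds enough points.
    stats = [mad_std(v) if len(v) >= min_points_per_bin else fallback
             for v in members]
    return ([stats[b][0] for b in bin_of_point],
            [stats[b][1] for b in bin_of_point])


mz_axis = list(range(200))
signal = [1.0 if i % 2 == 0 else 3.0 for i in range(100)] + \
         [10.0 if i % 2 == 0 else 30.0 for i in range(100)]
median_profile, std_profile = binned_noise_profile(mz_axis, signal, n_bins=2)
print(median_profile[0], median_profile[-1])
```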
- mioXpektron.detection.robust_peak_detection(mz_values, intensities, sample_name, group, method='Gaussian', min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, distance_threshold=0.1, combined=False, use_cwt=False, noise_model='global', noise_bins=20, noise_min_points=25, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, verbose=False)[source]
Fast peak detection in ToF-SIMS or similar spectra, including peak area.
Returns:
- peak_indices (np.ndarray) – Indices of detected peaks
- peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas
Notes
Overlapping-peak deconvolution now requires both geometric overlap and a BIC improvement over a single-Gaussian window fit. Fitted component widths must also remain within the user-specified peak-width bounds.
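The BIC part of this acceptance rule can be illustrated with the standard least-squares form, BIC = n ln(RSS/n) + k ln(n), where k is the number of fitted parameters. The sketch below is not the library's code: the helper names are hypothetical, the model parameters are fixed by hand instead of being fitted, and the geometric-overlap and width-bound checks are left out.

```python
import math
import random


def gaussian(x, amp, center, sigma):
    return amp * math.exp(-0.5 * ((x - center) / sigma) ** 2)


def bic(residuals, n_params):
    """BIC for a least-squares fit with Gaussian errors."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


def accept_deconvolution(bic_single, bic_double, min_bic_delta=10.0):
    """Accept two components only if BIC improves by at least the threshold."""
    return (bic_single - bic_double) >= min_bic_delta


# Synthetic window containing two overlapping peaks plus small noise.
rng = random.Random(0)
xs = [-5 + 0.05 * i for i in range(201)]
ys = [gaussian(x, 1.0, -1.0, 0.5) + gaussian(x, 0.8, 1.0, 0.5)
      + rng.gauss(0, 0.01) for x in xs]

# Hand-picked stand-ins for a single-Gaussian fit (3 parameters) and a
# two-Gaussian fit (6 parameters).
single = [y - gaussian(x, 0.9, 0.0, 1.2) for x, y in zip(xs, ys)]
double = [y - gaussian(x, 1.0, -1.0, 0.5) - gaussian(x, 0.8, 1.0, 0.5)
          for x, y in zip(xs, ys)]

accepted = accept_deconvolution(bic(single, 3), bic(double, 6))
print("two-component model accepted:", accepted)
```

Because BIC penalizes the three extra parameters, a second component is accepted only when it reduces the residuals substantially, which guards against splitting a single noisy peak in two.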