mioXpektron.denoise.denoise_select
ToF-SIMS 1D denoising & smoothing utilities
This module provides a battery of denoising and smoothing methods for 1D ToF-SIMS spectra and a noise-aware evaluation framework that selects methods based on both peak preservation and explicit noise reduction criteria. It supports:
Wavelet shrinkage (universal, SURE, Bayes, etc.) with optional variance-stabilizing transform (Anscombe).
Classical smoothers (Savitzky–Golay, Gaussian, Median, and a no-op baseline).
Robust, peak-centric measurements (height, FWHM, area, m/z shift) before/after denoising.
Explicit noise quantification:
  – Global background σ̂ via MAD on baseline regions (outside expanded FWHM windows)
  – Local σ̂ around each peak (flanking bands) and ΔSNR per peak (in dB)
  – High-frequency residual power via PSD (Welch by default), integrated above a cutoff
Parallel execution (threads or processes) and optional progress bars.
Ranking with tunable weights and hard constraints to forbid methods that do not denoise.
Convenience plotting (Pareto: ΔSNR vs |%height|) and multi-window evaluation helpers.
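The noise quantities listed above can be illustrated with a minimal, standard-library sketch. It is not the module's implementation: the helper names `mad_sigma` and `delta_snr_db` are hypothetical, the 1.4826 factor is the usual Gaussian consistency constant for MAD, and the 20·log10 convention for ΔSNR (amplitude ratio) is an assumption.

```python
import math
import statistics


def mad_sigma(values):
    """Robust sigma estimate: 1.4826 * median(|v - median(v)|)."""
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return 1.4826 * mad


def delta_snr_db(height_raw, sigma_raw, height_den, sigma_den):
    """Change in SNR (dB) from raw to denoised; amplitude convention assumed."""
    snr_raw = height_raw / sigma_raw
    snr_den = height_den / sigma_den
    return 20.0 * math.log10(snr_den / snr_raw)


# One outlier barely moves the MAD-based estimate.
print(mad_sigma([1, 2, 3, 4, 100]))
# Halving the noise at constant peak height gains about 6 dB.
print(delta_snr_db(100.0, 10.0, 100.0, 5.0))
```

MAD is used here because a few residual peak points in a baseline region would inflate an ordinary standard deviation but have little effect on a median-based estimate.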
Design notes
Baseline regions are defined by excluding each reference peak’s FWHM window expanded by baseline_expand.
Local noise is estimated in flanking bands at [1.5W, 3W] on each side of a peak by default.
PSD is computed on uniformly resampled baseline segments to make frequency analysis well-defined.
Threads are the recommended backend because most heavy NumPy/SciPy routines release the GIL.
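The baseline and flanking-band definitions in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `baseline_mask` and `flank_indices` are hypothetical names, the expanded window is assumed to be centred on the peak with total width `baseline_expand * W`, and flank distances are assumed to be measured from the peak centre.

```python
def baseline_mask(x, peaks, baseline_expand=2.0):
    """Return True where x lies outside every expanded FWHM window.

    peaks is a list of (center, fwhm) pairs (hypothetical structure).
    """
    mask = []
    for xi in x:
        inside = any(
            abs(xi - center) <= 0.5 * baseline_expand * fwhm
            for center, fwhm in peaks
        )
        mask.append(not inside)
    return mask


def flank_indices(x, center, fwhm, flank_inner=1.5, flank_outer=3.0):
    """Indices whose distance from the peak centre is within [inner*W, outer*W]."""
    lo, hi = flank_inner * fwhm, flank_outer * fwhm
    return [i for i, xi in enumerate(x) if lo <= abs(xi - center) <= hi]


x = [float(i) for i in range(11)]   # toy m/z axis 0..10
peaks = [(5.0, 1.0)]                # one peak at 5.0 with FWHM 1.0
print(baseline_mask(x, peaks))      # False only at x = 4, 5, 6
print(flank_indices(x, 5.0, 1.0))   # [2, 3, 7, 8]
```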
Functions
| aggregate_method_summaries | Aggregate per-method summaries across windows, spectra, or other units. |
| compare_denoising_methods | Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction. |
| compare_methods_in_windows | Evaluate denoising methods across multiple m/z windows and aggregate results. |
| decode_method_label | Translate a compact label back into noise_filtering parameters. |
| plot_pareto_delta_snr_vs_height | Render ΔSNR vs. |%height| with Pareto annotations. |
| rank_method | Dispatch ranking to pandas or polars implementation with identical semantics. |
| rank_methods_pandas | Rank methods using explicit gates plus a dimensionless tie-break score. |
| rank_methods_polars | Rank methods (polars) using the pandas implementation for identical semantics. |
| select_methods | |
| to_ppm | Convert an absolute m/z shift to parts-per-million (ppm). |
Classes
| PeakMeasurement | Container for per-peak measurements. |
- class mioXpektron.denoise.denoise_select.PeakMeasurement(mz_center, idx_center, height, fwhm_pts, mz_left, mz_right, area, prominence=nan)[source]
Bases: object
Container for per-peak measurements.
- Parameters:
- mz_left, mz_right
Left/right m/z boundaries where the peak crosses the chosen relative height (e.g., 50%).
- Type:
- prominence
Peak prominence measured on the same signal; used to reject weak shoulders when re-matching peaks after denoising.
- Type:
- mioXpektron.denoise.denoise_select.compare_denoising_methods(x, y, *, min_mz=None, max_mz=None, max_peaks=300, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None)[source]
Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction.
- Parameters:
x (array-like or None) – m/z axis. If None, an index axis [0..N-1] is used.
y (array-like) – Raw intensities.
min_mz (float, optional) – Lower bound of an optional range restriction on x, applied prior to peak detection and evaluation.
max_mz (float, optional) – Upper bound of an optional range restriction on x, applied prior to peak detection and evaluation.
max_peaks (int, default 300) – Maximum number of reference peaks (by prominence) to evaluate in the selected range.
min_prominence (float, optional) – Prominence threshold for scipy.signal.find_peaks during reference detection.
rel_height (float, default 0.5) – Relative height used for FWHM measurements (e.g., 0.5 = half-height).
search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.
match_min_prominence_ratio (float, default 0.1) – Minimum post-denoise prominence required for a matched peak, expressed as a fraction of the raw reference prominence.
match_min_prominence_abs (float, default 0.0) – Absolute lower bound for post-denoise peak prominence.
match_min_width_pts (float, default 0.25) – Minimum acceptable peak width in index points for a post-denoise match.
resample_to_uniform (bool, default False) – If True, allow denoisers to resample to a uniform grid internally when beneficial.
include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay (deriv>0) and Gaussian (order>0) operators in the candidate grid. By default the search includes only smoothing/denoising variants.
target_dx (float, optional) – Desired spacing when resample_to_uniform=True.
return_format ({"pandas","polars"}, default "pandas") – Backend for output DataFrames.
n_jobs (int, default -1) – Number of workers used to evaluate methods in parallel (1 disables parallelism).
parallel_backend ({"thread","process"}, default "thread") – Parallelism backend. Threads are often efficient because NumPy/SciPy/PyWavelets drop the GIL.
progress (bool, default True) – Show a progress bar if tqdm is available.
baseline_expand (float, default 2.0) – Multiplier to expand each peak’s FWHM window when masking baseline regions used for noise/PSD estimates.
flank_inner (float, default 1.5) – Inner distance (in FWHM multiples) of the flanking bands used for local noise estimation.
flank_outer (float, default 3.0) – Outer distance (in FWHM multiples) of the flanking bands used for local noise estimation.
hf_enabled (bool, default True) – If True, compute high-frequency (HF) residual power metrics on baseline regions via PSD.
hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency (cycles per m/z). If None, uses hf_cutoff_frac * Nyquist.
hf_cutoff_frac (float, default 0.3) – Fraction of the Nyquist frequency used as the HF band when hf_cutoff_hz is None.
hf_resample_dx (float, optional) – Δx used to resample baseline segments to a uniform grid for PSD; defaults to median Δx if None.
hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator for HF power. Welch provides lower-variance estimates on finite windows.
hf_welch_nperseg (int, optional) – Segment length for Welch PSD. If None, chosen automatically (≈ max(16, N/8), power-of-two, ≤1024).
- Returns:
summary_df, per_peak_df – If return_format="pandas", both are pandas.DataFrame; if "polars", both are polars.DataFrame. summary_df contains method-level medians/IQRs plus noise and HF metrics; per_peak_df has per-peak rows.
- Return type:
DataFrame
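The HF cutoff and Welch segment-length defaults described in the parameter list can be sketched in plain Python. This is a hedged reading of the documented rules, not the library's code: the names `hf_cutoff_sketch` and `auto_nperseg_sketch` are hypothetical, Nyquist is taken as 1 / (2·Δx), and "power-of-two" is assumed to mean rounding down.

```python
def hf_cutoff_sketch(dx, hf_cutoff_hz=None, hf_cutoff_frac=0.3):
    """Cutoff in cycles per m/z: explicit value, else a fraction of Nyquist."""
    nyquist = 0.5 / dx
    if hf_cutoff_hz is not None:
        return hf_cutoff_hz
    return hf_cutoff_frac * nyquist


def auto_nperseg_sketch(n):
    """Approximate max(16, N/8), rounded down to a power of two, capped at 1024."""
    target = max(16, n // 8)
    power_of_two = 1 << (target.bit_length() - 1)  # largest 2**k <= target
    return min(power_of_two, 1024)


print(hf_cutoff_sketch(0.01))        # uniform spacing of 0.01 m/z
print(auto_nperseg_sketch(1000))     # 64
print(auto_nperseg_sketch(100000))   # 1024
```

A real PSD call would also need the segment length to be no longer than the baseline segment itself; that check is omitted here.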
- mioXpektron.denoise.denoise_select.aggregate_method_summaries(summary, *, unit_label='windows', return_format='pandas')[source]
Aggregate per-method summaries across windows, spectra, or other units.
The aggregation intentionally gives each unit equal weight by taking the median of unit-level metrics, while match fractions remain peak-weighted. This avoids a single busy window or spectrum dominating the ranking.
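A minimal sketch of this equal-weight aggregation, using only the standard library. The record layout (`method`, `unit`, `delta_snr_db`, `n_matched`, `n_ref`) and the function name `aggregate_sketch` are assumptions made for illustration; the real summary tables may use different columns.

```python
import statistics
from collections import defaultdict


def aggregate_sketch(rows):
    """Median of unit-level metrics per method; match fraction pooled over peaks."""
    by_method = defaultdict(list)
    for row in rows:
        by_method[row["method"]].append(row)

    out = {}
    for method, unit_rows in by_method.items():
        matched = sum(r["n_matched"] for r in unit_rows)
        total = sum(r["n_ref"] for r in unit_rows)
        out[method] = {
            # Each unit counts once, however many peaks it contains.
            "delta_snr_db": statistics.median(
                [r["delta_snr_db"] for r in unit_rows]
            ),
            # Peak-weighted: busy units contribute more peaks.
            "match_frac": matched / total if total else float("nan"),
            "n_units": len(unit_rows),
        }
    return out


rows = [
    {"method": "sg", "unit": "A", "delta_snr_db": 2.0, "n_matched": 9, "n_ref": 10},
    {"method": "sg", "unit": "B", "delta_snr_db": 4.0, "n_matched": 1, "n_ref": 2},
    {"method": "sg", "unit": "C", "delta_snr_db": 10.0, "n_matched": 90, "n_ref": 100},
]
result = aggregate_sketch(rows)
print(result["sg"])
```

In this toy input the busy window C (100 peaks) does not pull the ΔSNR summary toward 10 dB; the median stays at 4 dB, while the match fraction still reflects all 112 peaks.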
- mioXpektron.denoise.denoise_select.compare_methods_in_windows(x, y, windows, *, per_window_max_peaks=50, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None, auto_tune=False, auto_tune_files=None)[source]
Evaluate denoising methods across multiple m/z windows and aggregate results.
- Parameters:
x (np.ndarray) – m/z axis.
y (np.ndarray) – Intensity values.
windows (list[tuple[float, float]]) – Each tuple is (min_mz, max_mz) for a window to evaluate.
per_window_max_peaks (int, default 50) – Max number of strongest peaks (by prominence) to measure within each window.
min_prominence (float, optional) – Minimum prominence passed to signal.find_peaks for reference peak detection.
rel_height (float, default 0.5) – Relative height used to define FWHM when measuring peaks.
search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.
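match_min_prominence_ratio (float, default 0.1) – Forwarded to the peak re-matching logic used after denoising: minimum post-denoise prominence as a fraction of the raw reference prominence.
match_min_prominence_abs (float, default 0.0) – Forwarded to the peak re-matching logic used after denoising: absolute lower bound for post-denoise prominence.
match_min_width_pts (float, default 0.25) – Forwarded to the peak re-matching logic used after denoising: minimum acceptable peak width in index points.
resample_to_uniform (bool, default False) – Passed through to denoisers that support resampling.
target_dx (float, optional) – Desired spacing, passed through to denoisers that support resampling.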
include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay and Gaussian candidates inside each window’s method grid.
return_format ({"pandas","polars"}) – Backend for output DataFrames.
n_jobs (int, default -1) – Workers used within each window’s call to compare_denoising_methods.
parallel_backend ({"thread","process"}, default "thread") – Parallelism backend.
progress (bool, default True) – Show progress bars during evaluation.
baseline_expand (float, default 2.0) – Baseline mask expansion multiplier forwarded to noise metrics.
flank_inner (float, default 1.5) – Inner flanking-band multiplier forwarded to noise metrics.
flank_outer (float, default 3.0) – Outer flanking-band multiplier forwarded to noise metrics.
hf_enabled (bool, default True) – Enable high-frequency PSD metrics; forwarded to noise metrics.
hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency; forwarded to noise metrics.
hf_cutoff_frac (float, default 0.3) – Fraction of Nyquist used as the HF cutoff when hf_cutoff_hz is None; forwarded to noise metrics.
hf_resample_dx (float, optional) – Δx for uniform resampling of baseline segments before PSD; forwarded to noise metrics.
hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator forwarded to noise metrics (Welch is lower-variance).
hf_welch_nperseg (int, optional) – Segment length for Welch PSD; forwarded to noise metrics.
auto_tune (bool)
- Returns:
If return_format == "pandas":
rollup (pd.DataFrame) – Method-level aggregation across all windows.
summary_all (pd.DataFrame) – Per-window, per-method summary table (noise and peak metrics).
detail_all (pd.DataFrame) – Per-peak detail table across all windows.
If return_format == "polars": rollup, summary_all, and detail_all are pl.DataFrame.
- mioXpektron.denoise.denoise_select.to_ppm(mz_shift_med, mz_ref_median)[source]
Convert an absolute m/z shift to parts-per-million (ppm).
ppm = 1e6 * Δm / m_ref
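The formula can be checked with a few lines of plain Python. The names `to_ppm_sketch` and `ppm_halfwidth` are hypothetical; the second helper applies the same relation in reverse to show what a ±search_ppm window means in m/z units, and is an illustration rather than part of the documented API.

```python
def to_ppm_sketch(mz_shift, mz_ref):
    """ppm = 1e6 * delta_m / m_ref, as stated in the formula above."""
    return 1e6 * mz_shift / mz_ref


def ppm_halfwidth(mz_ref, ppm):
    """Half-width in m/z of a +/- ppm window centred on mz_ref."""
    return mz_ref * ppm * 1e-6


print(to_ppm_sketch(0.001, 100.0))   # a 0.001 shift at m/z 100 is about 10 ppm
print(ppm_halfwidth(100.0, 20.0))    # +/- 20 ppm at m/z 100 is about 0.002
```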
- mioXpektron.denoise.denoise_select.rank_methods_pandas(summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)[source]
Rank methods using explicit gates plus a dimensionless tie-break score.
The primary scientific selection rule is:
1. Require peak-preservation and minimum-denoising criteria to pass.
2. Use a dimensionless score only as a secondary tie-break, not as the primary scientific claim.
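The gate-then-score ordering can be sketched as below. This is an assumption-laden illustration, not the library's ranking: the field names (`noise_db`, `delta_snr_db`, `peaks_ok`, `score`) and the function `rank_with_gates` are hypothetical, a lower score is assumed to be better, and placing failed methods after all passing ones is one possible reading of the rule.

```python
def rank_with_gates(rows, min_noise_db=0.5, min_delta_snr_db=1.0):
    """Flag each method against the gates, then order by (pass first, score)."""
    ranked = []
    for row in rows:
        passes_denoise = (
            row["noise_db"] >= min_noise_db
            and row["delta_snr_db"] >= min_delta_snr_db
        )
        passes = passes_denoise and row["peaks_ok"]
        ranked.append({**row, "passes": passes})
    # False sorts before True, so passing methods come first;
    # the score only breaks ties within each group.
    ranked.sort(key=lambda r: (not r["passes"], r["score"]))
    return ranked


candidates = [
    {"method": "none", "noise_db": 0.0, "delta_snr_db": 0.0, "peaks_ok": True, "score": 0.1},
    {"method": "sg_7", "noise_db": 3.0, "delta_snr_db": 2.5, "peaks_ok": True, "score": 0.8},
    {"method": "wav", "noise_db": 4.0, "delta_snr_db": 3.0, "peaks_ok": True, "score": 0.5},
]
for row in rank_with_gates(candidates):
    print(row["method"], row["passes"], row["score"])
```

The no-op baseline has the best raw score in this toy input yet ranks last, because it fails the minimum-denoising gates.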
- mioXpektron.denoise.denoise_select.rank_methods_polars(summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)[source]
Rank methods (polars) using the pandas implementation for identical semantics.
- mioXpektron.denoise.denoise_select.rank_method(input_format, summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)[source]
Dispatch ranking to pandas or polars implementation with identical semantics.
Returns a DataFrame (pandas or polars) sorted by ascending score and includes explicit pass/fail flags for denoising and peak-preservation criteria.
- mioXpektron.denoise.denoise_select.decode_method_label(label)[source]
Translate a compact label back into noise_filtering parameters.
- mioXpektron.denoise.denoise_select.select_methods(summary, basis='constrained_pareto_then_snr', top_k=12, require_pass=True, require_finite_metrics=True)[source]
- Returns:
filtered_df – the post-filter DataFrame (shared)
frontier_df – DataFrame of Pareto points (or None if basis='score' and Pareto not computed)
selected_df – the DataFrame of selected rows to annotate/return (top_k)
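A Pareto frontier over ΔSNR (higher is better) and |%height| change (lower is better) can be sketched as follows. The field names `delta_snr_db` and `abs_height_pct`, the helper names, and the "frontier first, then highest ΔSNR" selection are assumptions suggested by the default basis name, not confirmed behaviour of select_methods.

```python
def pareto_front(rows):
    """Keep rows that no other row beats on both objectives."""
    front = []
    for a in rows:
        dominated = any(
            b["delta_snr_db"] >= a["delta_snr_db"]
            and b["abs_height_pct"] <= a["abs_height_pct"]
            and (
                b["delta_snr_db"] > a["delta_snr_db"]
                or b["abs_height_pct"] < a["abs_height_pct"]
            )
            for b in rows
        )
        if not dominated:
            front.append(a)
    return front


def select_top(rows, top_k=12):
    """Frontier points ordered by descending delta-SNR, truncated to top_k."""
    front = sorted(pareto_front(rows), key=lambda r: -r["delta_snr_db"])
    return front[:top_k]


methods = [
    {"method": "a", "delta_snr_db": 6.0, "abs_height_pct": 8.0},
    {"method": "b", "delta_snr_db": 4.0, "abs_height_pct": 2.0},
    {"method": "c", "delta_snr_db": 3.0, "abs_height_pct": 5.0},  # dominated by b
    {"method": "d", "delta_snr_db": 1.0, "abs_height_pct": 0.5},
]
print([r["method"] for r in select_top(methods, top_k=2)])
```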
- mioXpektron.denoise.denoise_select.plot_pareto_delta_snr_vs_height(summary, annotate=True, top_k=12, out_path=None, ax=None, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_plot=True, save_pareto=True)[source]
Render ΔSNR vs. |%height| with Pareto annotations.
Parameters mirror select_methods(); see DenoisingMethods.plot for additional discussion. The helper creates the Matplotlib figure when ax is omitted and optionally saves both the chart and the frontier table.