mioXpektron.denoise.denoise_select

ToF-SIMS 1D denoising & smoothing utilities

This module provides a battery of denoising and smoothing methods for 1D ToF-SIMS spectra and a noise-aware evaluation framework that selects methods based on both peak preservation and explicit noise reduction criteria. It supports:

  • Wavelet shrinkage (universal, SURE, Bayes, etc.) with optional variance-stabilizing transform (Anscombe).

  • Classical smoothers (Savitzky–Golay, Gaussian, Median, and a no-op baseline).

  • Robust, peak-centric measurements (height, FWHM, area, m/z shift) before/after denoising.

  • Explicit noise quantification:

    – Global background σ̂ via MAD on baseline regions (outside expanded FWHM windows)

    – Local σ̂ around each peak (flanking bands) and ΔSNR per peak (in dB)

    – High-frequency residual power via PSD (Welch by default), integrated above a cutoff

  • Parallel execution (threads or processes) and optional progress bars.

  • Ranking with tunable weights and hard constraints to forbid methods that do not denoise.

  • Convenience plotting (Pareto: ΔSNR vs |%height|) and multi-window evaluation helpers.
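
The per-peak ΔSNR in dB can be read as the change in height-to-noise ratio between the raw and the denoised signal. The following is a minimal sketch, assuming SNR is peak height divided by the local σ̂ and using the amplitude convention 20·log10; the module's exact definition is not shown on this page, and `snr` and `delta_snr_db` are hypothetical helper names.

```python
import math

def snr(height, sigma):
    """Amplitude signal-to-noise ratio: peak height over local noise sigma."""
    return height / sigma

def delta_snr_db(height_raw, sigma_raw, height_den, sigma_den):
    """Change in SNR (dB) from raw to denoised; positive means improvement."""
    ratio = snr(height_den, sigma_den) / snr(height_raw, sigma_raw)
    return 20.0 * math.log10(ratio)

# Toy case: the peak keeps 95% of its height while local noise drops 4x.
gain = delta_snr_db(1000.0, 20.0, 950.0, 5.0)
print(f"delta SNR = {gain:.2f} dB")
```

A method that lowers noise but also flattens the peak gains less ΔSNR than the noise reduction alone would suggest, which is why height change is tracked as a separate metric.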

Design notes

  • Baseline regions are defined by excluding each reference peak’s FWHM window expanded by baseline_expand.

  • Local noise is estimated in flanking bands at [1.5W, 3W] on each side of a peak by default.

  • PSD is computed on uniformly resampled baseline segments to make frequency analysis well-defined.

  • Threads are the recommended backend because most heavy NumPy/SciPy routines release the GIL.
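
The baseline mask, flanking bands, and MAD-based σ̂ described in these notes can be sketched in a few lines. This is an illustration under stated assumptions, not the module's implementation: the expanded window is taken to be centered on the peak with total width `baseline_expand * FWHM`, W is taken to be the FWHM in x units, and the helper names are hypothetical.

```python
from statistics import median

def mad_sigma(values):
    """Robust sigma: 1.4826 * median absolute deviation (Gaussian-consistent)."""
    values = list(values)
    center = median(values)
    return 1.4826 * median(abs(v - center) for v in values)

def baseline_mask(x, peaks, baseline_expand=2.0):
    """True where x lies outside every expanded peak window.

    peaks: (center, fwhm) pairs in x units. Assumption: the expanded window
    is centered on the peak with total width baseline_expand * fwhm.
    """
    return [
        all(abs(xi - c) > 0.5 * baseline_expand * w for c, w in peaks)
        for xi in x
    ]

def flank_mask(x, center, fwhm, flank_inner=1.5, flank_outer=3.0):
    """True inside the two flanking bands [inner*W, outer*W] around one peak."""
    return [flank_inner * fwhm <= abs(xi - center) <= flank_outer * fwhm for xi in x]

x = [float(i) for i in range(101)]
y = [(i % 5) - 2.0 for i in range(101)]   # toy sawtooth standing in for noise
peaks = [(50.0, 4.0)]                     # one peak: center 50, FWHM 4

keep = baseline_mask(x, peaks)
sigma_global = mad_sigma(yi for yi, k in zip(y, keep) if k)

flank = flank_mask(x, 50.0, 4.0)
sigma_local = mad_sigma(yi for yi, f in zip(y, flank) if f)
```

The global estimate pools every sample outside the peak windows, while the local estimate uses only the samples near one peak, so the two can differ when noise varies along the m/z axis.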

Functions

aggregate_method_summaries(summary, *[, ...])

Aggregate per-method summaries across windows, spectra, or other units.

compare_denoising_methods(x, y, *[, min_mz, ...])

Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction.

compare_methods_in_windows(x, y, windows, *)

Evaluate denoising methods across multiple m/z windows and aggregate results.

decode_method_label(label)

Translate a compact label back into noise_filtering parameters.

plot_pareto_delta_snr_vs_height(summary[, ...])

Render ΔSNR vs. |%height| with Pareto annotations.

rank_method(input_format, summary_df, ...[, ...])

Dispatch ranking to pandas or polars implementation with identical semantics.

rank_methods_pandas(summary_df, per_peak_df)

Rank methods using explicit gates plus a dimensionless tie-break score.

rank_methods_polars(summary_df, per_peak_df)

Rank methods (polars) using the pandas implementation for identical semantics.

select_methods(summary[, basis, top_k, ...])

to_ppm(mz_shift_med, mz_ref_median)

Convert an absolute m/z shift to parts-per-million (ppm).

Classes

PeakMeasurement(mz_center, idx_center, ...)

Container for per-peak measurements.

class mioXpektron.denoise.denoise_select.PeakMeasurement(mz_center, idx_center, height, fwhm_pts, mz_left, mz_right, area, prominence=nan)[source]

Bases: object

Container for per-peak measurements.

Parameters:
mz_center

Peak center m/z (from the provided x-axis) at the local maximum index.

Type:

float

idx_center

Index of the peak center in the array (integer sample index).

Type:

int

height

Peak height (y at the local maximum) on the measured signal.

Type:

float

fwhm_pts

Full width at half maximum, measured in index points on the measured signal.

Type:

float

mz_left, mz_right

Left/right m/z boundaries where the peak crosses the chosen relative height (e.g., 50%).

Type:

float

area

Trapezoidal integral of y over [mz_left, mz_right].

Type:

float

prominence

Peak prominence measured on the same signal; used to reject weak shoulders when re-matching peaks after denoising.

Type:

float

mz_center: float
idx_center: int
height: float
fwhm_pts: float
mz_left: float
mz_right: float
area: float
prominence: float = nan
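
To make the field layout concrete, here is a hedged sketch that mirrors the attributes above with a plain dataclass. `PeakRecord` and `trapezoid_area` are hypothetical stand-ins written for this example and are not the library's class or helpers; the numbers are toy values.

```python
from dataclasses import dataclass
import math

@dataclass
class PeakRecord:  # hypothetical stand-in, not the library class
    mz_center: float
    idx_center: int
    height: float
    fwhm_pts: float
    mz_left: float
    mz_right: float
    area: float
    prominence: float = math.nan

def trapezoid_area(x, y):
    """Trapezoidal integral of y over x (x must be increasing)."""
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1)
    )

# Toy samples spanning the half-height window of a peak at m/z 28.0.
x = [27.95, 28.0, 28.05]
y = [50.0, 100.0, 50.0]

peak = PeakRecord(
    mz_center=28.0,
    idx_center=1,
    height=100.0,
    fwhm_pts=2.0,
    mz_left=x[0],
    mz_right=x[-1],
    area=trapezoid_area(x, y),
)
```

Leaving `prominence` at its NaN default marks it as not measured, which callers can detect with `math.isnan`.
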
mioXpektron.denoise.denoise_select.compare_denoising_methods(x, y, *, min_mz=None, max_mz=None, max_peaks=300, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None)[source]

Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction.

Parameters:
  • x (array-like or None) – m/z axis. If None, an index axis [0..N-1] is used.

  • y (array-like) – Raw intensities.

  • min_mz (float, optional) – Optional range restriction on x prior to peak detection and evaluation.

  • max_mz (float, optional) – Optional range restriction on x prior to peak detection and evaluation.

  • max_peaks (int, default 300) – Maximum number of reference peaks (by prominence) to evaluate in the selected range.

  • min_prominence (float, optional) – Prominence threshold for scipy.signal.find_peaks during reference detection.

  • rel_height (float, default 0.5) – Relative height used for FWHM measurements (e.g., 0.5 = half-height).

  • search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.

  • match_min_prominence_ratio (float, default 0.1) – Minimum post-denoise prominence required for a matched peak, expressed as a fraction of the raw reference prominence.

  • match_min_prominence_abs (float, default 0.0) – Absolute lower bound for post-denoise peak prominence.

  • match_min_width_pts (float, default 0.25) – Minimum acceptable peak width in index points for a post-denoise match.

  • resample_to_uniform (bool, default False) – If True, allow denoisers to resample to a uniform grid internally when beneficial.

  • include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay (deriv>0) and Gaussian (order>0) operators in the candidate grid. By default the search includes only smoothing/denoising variants.

  • target_dx (float, optional) – Desired spacing when resample_to_uniform=True.

  • return_format ({"pandas","polars"}, default "pandas") – Backend for output DataFrames.

  • n_jobs (int, default -1) – Number of workers used to evaluate methods in parallel (1 disables parallelism).

  • parallel_backend ({"thread","process"}, default "thread") – Parallelism backend. Threads are often efficient because NumPy/SciPy/PyWavelets drop the GIL.

  • progress (bool, default True) – Show a progress bar if tqdm is available.

  • baseline_expand (float, default 2.0) – Multiplier to expand each peak’s FWHM window when masking baseline regions used for noise/PSD estimates.

  • flank_inner (float, default 1.5) – Inner edge, in FWHM multiples, of the flanking bands on each side of a peak used for local noise estimation.

  • flank_outer (float, default 3.0) – Outer edge, in FWHM multiples, of the flanking bands on each side of a peak used for local noise estimation.

  • hf_enabled (bool, default True) – If True, compute high-frequency (HF) residual power metrics on baseline regions via PSD.

  • hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency (cycles per m/z). If None, uses hf_cutoff_frac * Nyquist.

  • hf_cutoff_frac (float, default 0.3) – Fraction of the Nyquist frequency used as the HF band when hf_cutoff_hz is None.

  • hf_resample_dx (float, optional) – Δx used to resample baseline segments to a uniform grid for PSD; defaults to median Δx if None.

  • hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator for HF power. Welch provides lower-variance estimates on finite windows.

  • hf_welch_nperseg (int, optional) – Segment length for Welch PSD. If None, chosen automatically (≈ max(16, N/8), power-of-two, ≤1024).

Returns:

summary_df, per_peak_df – pandas.DataFrame objects if return_format="pandas"; polars.DataFrame objects if return_format="polars". summary_df contains method-level medians/IQRs plus noise and HF metrics; per_peak_df has per-peak rows.

Return type:

DataFrame
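
The HF cutoff and Welch segment-length defaults described above reduce to a little arithmetic. The sketch below shows one reading of them; rounding down to a power of two is an assumption about the "≈ max(16, N/8), power-of-two, ≤1024" rule, and `hf_cutoff` and `auto_nperseg` are hypothetical names, not functions exported by the module.

```python
def hf_cutoff(dx, hf_cutoff_hz=None, hf_cutoff_frac=0.3):
    """HF band edge in cycles per m/z for a uniform grid with spacing dx."""
    nyquist = 1.0 / (2.0 * dx)
    if hf_cutoff_hz is not None:
        return hf_cutoff_hz              # an absolute cutoff takes precedence
    return hf_cutoff_frac * nyquist

def auto_nperseg(n, floor=16, cap=1024):
    """Assumed heuristic: largest power of two <= max(floor, n // 8), capped."""
    target = max(floor, n // 8)
    nperseg = 1 << (target.bit_length() - 1)   # round down to a power of two
    return min(nperseg, cap)

cutoff = hf_cutoff(dx=0.001)     # grid spacing of 0.001 m/z
segments = [auto_nperseg(n) for n in (100, 1000, 4096, 100000)]
```

With a spacing of 0.001 m/z the Nyquist frequency is 500 cycles per m/z, so the default fraction of 0.3 places the cutoff near 150 cycles per m/z.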

mioXpektron.denoise.denoise_select.aggregate_method_summaries(summary, *, unit_label='windows', return_format='pandas')[source]

Aggregate per-method summaries across windows, spectra, or other units.

The aggregation intentionally gives each unit equal weight by taking the median of unit-level metrics, while match fractions remain peak-weighted. This avoids a single busy window or spectrum dominating the ranking.

Parameters:
  • unit_label (str)

  • return_format (Literal['pandas', 'polars'])
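
The difference between unit-weighted medians and peak-weighted match fractions is easiest to see on a toy table. The sketch below uses plain dictionaries; the column names (`delta_snr_db`, `n_matched`, `n_ref`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical unit-level rows: one per (window, method).
rows = [
    {"unit": "w1", "method": "sg", "delta_snr_db": 4.0, "n_matched": 48, "n_ref": 50},
    {"unit": "w2", "method": "sg", "delta_snr_db": 6.0, "n_matched": 2, "n_ref": 5},
    {"unit": "w3", "method": "sg", "delta_snr_db": 5.0, "n_matched": 5, "n_ref": 5},
]

def aggregate(rows):
    by_method = defaultdict(list)
    for row in rows:
        by_method[row["method"]].append(row)
    out = {}
    for method, group in by_method.items():
        matched = sum(r["n_matched"] for r in group)
        total = sum(r["n_ref"] for r in group)
        out[method] = {
            # Equal weight per unit: median over unit-level values.
            "delta_snr_db": median(r["delta_snr_db"] for r in group),
            # Peak-weighted: pool the counts before dividing.
            "match_frac": matched / total,
            "n_units": len(group),
        }
    return out

rollup = aggregate(rows)
```

Here the busy window `w1` contributes 50 of the 60 reference peaks to the match fraction, yet counts as only one of three values when the median ΔSNR is taken.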

mioXpektron.denoise.denoise_select.compare_methods_in_windows(x, y, windows, *, per_window_max_peaks=50, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None, auto_tune=False, auto_tune_files=None)[source]

Evaluate denoising methods across multiple m/z windows and aggregate results.

Parameters:
  • x (np.ndarray) – m/z axis.

  • y (np.ndarray) – Intensity values.

  • windows (list[tuple[float, float]]) – Each tuple is (min_mz, max_mz) for a window to evaluate.

  • per_window_max_peaks (int, default 50) – Max number of strongest peaks (by prominence) to measure within each window.

  • min_prominence (float, optional) – Minimum prominence passed to scipy.signal.find_peaks for reference peak detection.

  • rel_height (float, default 0.5) – Relative height used to define FWHM when measuring peaks.

  • search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.

  • match_min_prominence_ratio (float, default 0.1) – Forwarded to the peak re-matching logic used after denoising.

  • match_min_prominence_abs (float, default 0.0) – Forwarded to the peak re-matching logic used after denoising.

  • match_min_width_pts (float, default 0.25) – Forwarded to the peak re-matching logic used after denoising.

  • resample_to_uniform (optional) – Passed through to denoisers that support resampling.

  • target_dx (optional) – Passed through to denoisers that support resampling.

  • include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay and Gaussian candidates inside each window’s method grid.

  • return_format ({"pandas","polars"}) – Backend for output DataFrames.

  • n_jobs (int, default -1) – Workers used within each window’s call to compare_denoising_methods.

  • parallel_backend ({"thread","process"}, default "thread") – Parallelism backend.

  • progress (bool, default True) – Show progress bars during evaluation.

  • baseline_expand (float, default 2.0) – Baseline mask expansion multiplier forwarded to noise metrics.

  • flank_inner (float, default 1.5) – Inner flanking-band multiplier forwarded to noise metrics.

  • flank_outer (float, default 3.0) – Outer flanking-band multiplier forwarded to noise metrics.

  • hf_enabled (bool, default True) – If True, compute high-frequency (HF) PSD metrics on baseline regions; forwarded to noise metrics.

  • hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency; forwarded to noise metrics.

  • hf_cutoff_frac (float, default 0.3) – Fraction of the Nyquist frequency used as the HF cutoff when hf_cutoff_hz is None.

  • hf_resample_dx (float, optional) – Δx used to resample baseline segments to a uniform grid for PSD.

  • hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator for HF power (Welch is lower-variance).

  • hf_welch_nperseg (int, optional) – Segment length for Welch PSD.

  • auto_tune (bool)

  • auto_tune_files (list[str] | None)

Returns:

  • If return_format == "pandas":

    rollup (pd.DataFrame) – Method-level aggregation across all windows.

    summary_all (pd.DataFrame) – Per-window, per-method summary table (noise and peak metrics).

    detail_all (pd.DataFrame) – Per-peak detail table across all windows.

  • If return_format == "polars": rollup, summary_all, and detail_all are returned as pl.DataFrame.
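
The relationship between the per-window table and the windows argument can be sketched as a simple loop. This is an illustration only: `in_window`, `evaluate_windows`, and `stub_evaluate` are hypothetical names, the stub stands in for a real per-window method comparison, and treating window bounds as inclusive is an assumption.

```python
def in_window(x, y, lo, hi):
    """Return the (x, y) samples with lo <= x <= hi."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if lo <= xi <= hi]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def evaluate_windows(x, y, windows, evaluate):
    """Call evaluate(x_win, y_win) per window and tag each row with its window."""
    summary_all = []
    for lo, hi in windows:
        xw, yw = in_window(x, y, lo, hi)
        for row in evaluate(xw, yw):
            summary_all.append({"window": (lo, hi), **row})
    return summary_all

def stub_evaluate(xw, yw):
    """Stand-in evaluator: one row per method with trivial metrics."""
    return [{"method": "none", "n_points": len(xw), "max_y": max(yw)}]

x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.0, 5.0, 2.0, 8.0, 3.0]
table = evaluate_windows(x, y, [(10.0, 30.0), (35.0, 50.0)], stub_evaluate)
```

Each row keeps its window tag, so a later roll-up can group by method while still treating every window as one unit.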

mioXpektron.denoise.denoise_select.to_ppm(mz_shift_med, mz_ref_median)[source]

Convert an absolute m/z shift to parts-per-million (ppm).

ppm = 1e6 * Δm / m_ref

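Parameters:
  • mz_shift_med – Absolute m/z shift (Δm in the formula above).

  • mz_ref_median – Reference m/z (m_ref in the formula above).
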
Return type:

float
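
As a worked instance of the formula, the sketch below converts a shift to ppm and, in the other direction, turns a ±ppm tolerance such as search_ppm into an m/z half-width. Both helper names are hypothetical and written only for this example.

```python
def shift_to_ppm(delta_mz, mz_ref):
    """ppm = 1e6 * delta_mz / mz_ref."""
    return 1e6 * delta_mz / mz_ref

def ppm_half_width(mz_ref, ppm):
    """Half-width in m/z of a +/-ppm window centered on mz_ref."""
    return mz_ref * ppm * 1e-6

shift = shift_to_ppm(0.0005, 100.0)   # a shift of 0.0005 at m/z 100
half = ppm_half_width(100.0, 20.0)    # +/-20 ppm search window at m/z 100
```

At m/z 100, a shift of 0.0005 corresponds to 5 ppm, and a ±20 ppm window is ±0.002 wide. The same ppm tolerance therefore covers a wider absolute m/z range at higher masses.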

mioXpektron.denoise.denoise_select.rank_methods_pandas(summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)[source]

Rank methods using explicit gates plus a dimensionless tie-break score.

The primary scientific selection rule is:

  1. Require peak-preservation and minimum-denoising criteria to pass.

  2. Use a dimensionless score only as a secondary tie-break, not as the primary scientific claim.

Parameters:

selection_criteria (Dict[str, float] | None)
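
The gate-then-score rule can be sketched as a two-key sort. The example below is a simplified illustration: the column names (`noise_db`, `delta_snr_db`, `peaks_ok`, `score`) are hypothetical, the weighted score is taken as given, and placing failing methods after passing ones (instead of dropping them) is an assumption.

```python
def rank(rows, min_noise_db=0.5, min_delta_snr_db=1.0):
    ranked = []
    for row in rows:
        passes = (
            row["noise_db"] >= min_noise_db
            and row["delta_snr_db"] >= min_delta_snr_db
            and row["peaks_ok"]
        )
        ranked.append({**row, "pass": passes})
    # Passing methods first; the score (lower is better) only orders
    # methods within the same pass/fail group.
    ranked.sort(key=lambda r: (not r["pass"], r["score"]))
    return ranked

rows = [
    {"method": "none", "noise_db": 0.0, "delta_snr_db": 0.0, "peaks_ok": True, "score": 0.1},
    {"method": "wavelet", "noise_db": 6.0, "delta_snr_db": 5.0, "peaks_ok": True, "score": 0.4},
    {"method": "gauss", "noise_db": 9.0, "delta_snr_db": 7.0, "peaks_ok": False, "score": 0.2},
    {"method": "sg", "noise_db": 4.0, "delta_snr_db": 3.0, "peaks_ok": True, "score": 0.3},
]
order = [r["method"] for r in rank(rows)]
```

The no-op baseline has the lowest score in this toy table but still ranks below both passing methods, because it fails the minimum-denoising gates.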

mioXpektron.denoise.denoise_select.rank_methods_polars(summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)[source]

Rank methods (polars) using the pandas implementation for identical semantics.

Parameters:
  • summary_df (DataFrame)

  • per_peak_df (DataFrame)

  • min_noise_db (float)

  • min_delta_snr_db (float)

  • selection_criteria (Dict[str, float] | None)

Return type:

DataFrame

mioXpektron.denoise.denoise_select.rank_method(input_format, summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)[source]

Dispatch ranking to pandas or polars implementation with identical semantics.

Returns a DataFrame (pandas or polars) sorted by ascending score and includes explicit pass/fail flags for denoising and peak-preservation criteria.

mioXpektron.denoise.denoise_select.decode_method_label(label)[source]

Translate a compact label back into noise_filtering parameters.

Parameters:

label (str)

Return type:

dict

mioXpektron.denoise.denoise_select.select_methods(summary, basis='constrained_pareto_then_snr', top_k=12, require_pass=True, require_finite_metrics=True)[source]

Returns:

  • filtered_df – The post-filter DataFrame (shared).

  • frontier_df – DataFrame of Pareto points (or None if basis='score' and the Pareto frontier is not computed).

  • selected_df – DataFrame of the selected rows to annotate/return (top_k).

mioXpektron.denoise.denoise_select.plot_pareto_delta_snr_vs_height(summary, annotate=True, top_k=12, out_path=None, ax=None, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_plot=True, save_pareto=True)[source]

Render ΔSNR vs. |%height| with Pareto annotations.

Parameters mirror select_methods(); see DenoisingMethods.plot for additional discussion. The helper creates the Matplotlib figure when ax is omitted and optionally saves both the chart and frontier table.
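
For readers unfamiliar with the Pareto view, the sketch below computes a frontier over (ΔSNR, |%height|) pairs, where higher ΔSNR and lower height change are both better. It is a generic illustration of Pareto dominance, offered as one plausible reading of the 'constrained_pareto_then_snr' basis name; `pareto_front` and the toy points are hypothetical.

```python
def pareto_front(points):
    """Keep the points that no other point dominates.

    Each point is (label, delta_snr_db, abs_height_pct). A point dominates
    another if its delta SNR is >= and its height change is <=, with at
    least one of the two comparisons strict.
    """
    front = []
    for label, snr, dh in points:
        dominated = any(
            (s2 >= snr and d2 <= dh) and (s2 > snr or d2 < dh)
            for _, s2, d2 in points
        )
        if not dominated:
            front.append((label, snr, dh))
    # Order the frontier by delta SNR, best first.
    return sorted(front, key=lambda p: -p[1])

points = [
    ("a", 3.0, 1.0),
    ("b", 6.0, 2.5),
    ("c", 5.0, 4.0),   # worse than "b" on both axes
    ("d", 8.0, 6.0),
    ("e", 2.0, 0.5),
]
frontier = [label for label, _, _ in pareto_front(points)]
```

Every point on the frontier represents a different trade-off between noise reduction and peak distortion; only "c" is removed, because "b" improves on it in both respects.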