mioXpektron.denoise.denoise_select

ToF-SIMS 1D denoising & smoothing utilities

This module provides a battery of denoising and smoothing methods for 1D ToF-SIMS spectra and a noise-aware evaluation framework that selects methods based on both peak preservation and explicit noise reduction criteria. It supports:

Wavelet shrinkage (universal, SURE, Bayes, etc.) with optional variance-stabilizing transform (Anscombe).
Classical smoothers (Savitzky–Golay, Gaussian, Median, and a no-op baseline).
Robust, peak-centric measurements (height, FWHM, area, m/z shift) before/after denoising.
Explicit noise quantification:
– Global background σ̂ via MAD on baseline regions (outside expanded FWHM windows)
– Local σ̂ around each peak (flanking bands) and ΔSNR per peak (in dB)
– High-frequency residual power via PSD (Welch by default), integrated above a cutoff
Parallel execution (threads or processes) and optional progress bars.
Ranking with tunable weights and hard constraints to forbid methods that do not denoise.
Convenience plotting (Pareto: ΔSNR vs |%height|) and multi-window evaluation helpers.
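A minimal sketch of the global noise estimate and the per-peak ΔSNR listed above. It assumes the Gaussian-consistent MAD scale factor 1.4826 and an amplitude-ratio definition ΔSNR = 20·log10(SNR_after / SNR_before) with SNR = height / σ̂; the module does not state its exact constants here, and `mad_sigma` and `delta_snr_db` are hypothetical names.

```python
import numpy as np


def mad_sigma(y: np.ndarray, baseline_mask: np.ndarray) -> float:
    """Robust noise scale from baseline samples: sigma ~= 1.4826 * MAD."""
    r = np.asarray(y, dtype=float)[baseline_mask]
    med = np.median(r)
    # 1.4826 makes the MAD consistent with the standard deviation for Gaussian noise.
    return 1.4826 * float(np.median(np.abs(r - med)))


def delta_snr_db(height_raw: float, sigma_raw: float,
                 height_den: float, sigma_den: float) -> float:
    """Change in peak SNR (dB), assuming SNR = height / local sigma."""
    snr_raw = height_raw / sigma_raw
    snr_den = height_den / sigma_den
    # Amplitude ratio, hence 20*log10 rather than 10*log10.
    return float(20.0 * np.log10(snr_den / snr_raw))


# Usage: white noise with sigma = 2 on a constant baseline.
rng = np.random.default_rng(0)
y = 5.0 + rng.normal(0.0, 2.0, 20_000)
mask = np.ones_like(y, dtype=bool)
print(round(mad_sigma(y, mask), 2))  # close to 2.0
print(round(delta_snr_db(100.0, 2.0, 100.0, 1.0), 2))  # halving sigma: +6.02 dB
```

The median-based scale is used because residual peak tails or spikes in the baseline mask would inflate a plain standard deviation.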

Design notes

Baseline regions are defined by excluding each reference peak’s FWHM window expanded by baseline_expand.
Local noise is estimated in flanking bands at [1.5W, 3W] on each side of a peak by default.
PSD is computed on uniformly resampled baseline segments to make frequency analysis well-defined.
Threads are the recommended backend because most heavy NumPy/SciPy routines release the GIL.
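The baseline mask and flanking bands described above can be sketched with index arithmetic. This assumes peak centers and FWHM are given in index points, that the expanded window is centered on the peak with total width baseline_expand × FWHM, and that flank distances are measured from the peak center; the module may define these edges differently, and `baseline_mask` and `flank_indices` are hypothetical helpers.

```python
import numpy as np


def baseline_mask(n, centers, fwhm_pts, expand=2.0):
    """True where a sample lies outside every peak's expanded FWHM window."""
    idx = np.arange(n)
    mask = np.ones(n, dtype=bool)
    for c, w in zip(centers, fwhm_pts):
        # Assumption: expanded window has total width expand * FWHM, centered on c.
        mask &= np.abs(idx - c) > 0.5 * expand * w
    return mask


def flank_indices(n, center, fwhm, inner=1.5, outer=3.0):
    """Indices in [inner*W, outer*W] from the peak center, on both sides."""
    idx = np.arange(n)
    d = np.abs(idx - center)
    return idx[(d >= inner * fwhm) & (d <= outer * fwhm)]


# One peak at index 50 with a 4-point FWHM.
m = baseline_mask(100, centers=[50], fwhm_pts=[4.0])
print(int(m.sum()))                 # 91 baseline samples (indices 46..54 masked out)
print(flank_indices(100, 50, 4.0))  # indices 38..44 and 56..62
```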

Functions

`aggregate_method_summaries`(summary, *[, ...])	Aggregate per-method summaries across windows, spectra, or other units.
`compare_denoising_methods`(x, y, *[, min_mz, ...])	Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction.
`compare_methods_in_windows`(x, y, windows, *)	Evaluate denoising methods across multiple m/z windows and aggregate results.
`decode_method_label`(label)	Translate a compact label back into `noise_filtering` parameters.
`plot_pareto_delta_snr_vs_height`(summary[, ...])	Render ΔSNR vs. |%height| with Pareto annotations.
`rank_method`(input_format, summary_df, ...[, ...])	Dispatch ranking to pandas or polars implementation with identical semantics.
`rank_methods_pandas`(summary_df, per_peak_df)	Rank methods using explicit gates plus a dimensionless tie-break score.
`rank_methods_polars`(summary_df, per_peak_df)	Rank methods (polars) using the pandas implementation for identical semantics.
`select_methods`(summary[, basis, top_k, ...])	Filter a method summary and select up to top_k methods to annotate or return.
`to_ppm`(mz_shift_med, mz_ref_median)	Convert an absolute m/z shift to parts-per-million (ppm).

Classes

PeakMeasurement(mz_center, idx_center, ...)

Container for per-peak measurements.

class mioXpektron.denoise.denoise_select.PeakMeasurement(mz_center, idx_center, height, fwhm_pts, mz_left, mz_right, area, prominence=nan)

Bases: object

Container for per-peak measurements.

Parameters:

mz_center (float)
idx_center (int)
height (float)
fwhm_pts (float)
mz_left (float)
mz_right (float)
area (float)
prominence (float)

mz_center

Peak center m/z (from the provided x-axis) at the local maximum index.

Type: float

idx_center

Index of the peak center in the array (integer sample index).

Type: int

height

Peak height (y at the local maximum) on the measured signal.

Type: float

fwhm_pts

Full width at half maximum, measured in index points on the measured signal.

Type: float

mz_left, mz_right

Left/right m/z boundaries where the peak crosses the chosen relative height (e.g., 50%).

Type: float

area

Trapezoidal integral of y over [mz_left, mz_right].

Type: float

prominence

Peak prominence measured on the same signal; used to reject weak shoulders when re-matching peaks after denoising.

Type: float

mz_center: float

idx_center: int

height: float

fwhm_pts: float

mz_left: float

mz_right: float

area: float

prominence: float = nan
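A sketch of how the documented fields could be filled for one spectrum with SciPy, using `scipy.signal.find_peaks`, `scipy.signal.peak_widths`, and a trapezoidal area over [mz_left, mz_right]. `PeakMeasurementSketch` and `measure_peaks` are hypothetical stand-ins that mirror the fields above, not the module's class or code. Note that `peak_widths` measures width at `rel_height` of the prominence, which equals the half-height only when the local baseline is zero.

```python
from dataclasses import dataclass

import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import find_peaks, peak_widths


@dataclass
class PeakMeasurementSketch:
    """Hypothetical stand-in mirroring the documented PeakMeasurement fields."""
    mz_center: float
    idx_center: int
    height: float
    fwhm_pts: float
    mz_left: float
    mz_right: float
    area: float
    prominence: float = float("nan")


def measure_peaks(x, y, rel_height=0.5, min_prominence=None):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # prominence=0.0 still makes find_peaks report prominences for every peak.
    peaks, props = find_peaks(y, prominence=0.0 if min_prominence is None else min_prominence)
    widths, _, left_ips, right_ips = peak_widths(y, peaks, rel_height=rel_height)
    idx = np.arange(len(x))
    out = []
    for k, p in enumerate(peaks):
        # Map fractional crossing positions (index units) onto the m/z axis.
        mz_l = float(np.interp(left_ips[k], idx, x))
        mz_r = float(np.interp(right_ips[k], idx, x))
        sel = (x >= mz_l) & (x <= mz_r)
        out.append(PeakMeasurementSketch(
            mz_center=float(x[p]), idx_center=int(p), height=float(y[p]),
            fwhm_pts=float(widths[k]), mz_left=mz_l, mz_right=mz_r,
            area=float(trapezoid(y[sel], x[sel])),
            prominence=float(props["prominences"][k]),
        ))
    return out


# Gaussian peak at m/z 100.5 with sigma = 10 samples, so FWHM ~= 23.55 points.
x = np.linspace(100.0, 101.0, 1001)
y = np.exp(-0.5 * ((x - 100.5) / 0.01) ** 2)
(pk,) = measure_peaks(x, y)
print(pk.idx_center, round(pk.fwhm_pts, 1))
```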

mioXpektron.denoise.denoise_select.compare_denoising_methods(x, y, *, min_mz=None, max_mz=None, max_peaks=300, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None)

Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction.

Parameters:

x (array-like or None) – m/z axis. If None, an index axis [0..N-1] is used.
y (array-like) – Raw intensities.
min_mz (float, optional) – Lower m/z bound applied to x before peak detection and evaluation.
max_mz (float, optional) – Upper m/z bound applied to x before peak detection and evaluation.
max_peaks (int, default 300) – Maximum number of reference peaks (by prominence) to evaluate in the selected range.
min_prominence (float, optional) – Prominence threshold for scipy.signal.find_peaks during reference detection.
rel_height (float, default 0.5) – Relative height used for FWHM measurements (e.g., 0.5 = half-height).
search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.
match_min_prominence_ratio (float, default 0.1) – Minimum post-denoise prominence required for a matched peak, expressed as a fraction of the raw reference prominence.
match_min_prominence_abs (float, default 0.0) – Absolute lower bound for post-denoise peak prominence.
match_min_width_pts (float, default 0.25) – Minimum acceptable peak width in index points for a post-denoise match.
resample_to_uniform (bool, default False) – If True, allow denoisers to resample to a uniform grid internally when beneficial.
include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay (deriv>0) and Gaussian (order>0) operators in the candidate grid. By default the search includes only smoothing/denoising variants.
target_dx (float, optional) – Desired spacing when resample_to_uniform=True.
return_format ({"pandas","polars"}, default "pandas") – Backend for output DataFrames.
n_jobs (int, default -1) – Number of workers used to evaluate methods in parallel (1 disables parallelism).
parallel_backend ({"thread","process"}, default "thread") – Parallelism backend. Threads are often efficient because many NumPy/SciPy/PyWavelets routines release the GIL.
progress (bool, default True) – Show a progress bar if tqdm is available.
baseline_expand (float, default 2.0) – Multiplier to expand each peak’s FWHM window when masking baseline regions used for noise/PSD estimates.
flank_inner (float, default 1.5) – Inner distance (in FWHM multiples) of the flanking bands used for local noise estimation.
flank_outer (float, default 3.0) – Outer distance (in FWHM multiples) of the flanking bands used for local noise estimation.
hf_enabled (bool, default True) – If True, compute high-frequency (HF) residual power metrics on baseline regions via PSD.
hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency (cycles per m/z). If None, uses hf_cutoff_frac * Nyquist.
hf_cutoff_frac (float, default 0.3) – Fraction of the Nyquist frequency used as the HF band when hf_cutoff_hz is None.
hf_resample_dx (float, optional) – Δx used to resample baseline segments to a uniform grid for PSD; defaults to median Δx if None.
hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator for HF power. Welch provides lower-variance estimates on finite windows.
hf_welch_nperseg (int, optional) – Segment length for Welch PSD. If None, chosen automatically (≈ max(16, N/8), power-of-two, ≤1024).

Returns:

summary_df, per_peak_df – If return_format="pandas", both are pandas.DataFrame objects; if "polars", both are polars.DataFrame objects. summary_df contains method-level medians/IQRs plus noise and HF metrics; per_peak_df has per-peak rows.

Return type:

tuple[DataFrame, DataFrame]
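A minimal sketch of the HF residual-power metric controlled by the hf_* parameters: a baseline segment is linearly resampled to a uniform Δx (median Δx by default), a Welch PSD is computed with sampling rate 1/Δx so that frequencies are in cycles per m/z, and the PSD is integrated above hf_cutoff_frac × Nyquist. The rounding rule in `auto_nperseg` (round down to a power of two, never above N) is an assumption about the documented heuristic, and both helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import welch


def auto_nperseg(n: int) -> int:
    """Approx. max(16, N/8), rounded down to a power of two, capped at 1024 and N."""
    target = max(16, n // 8)
    p = 1 << (int(target).bit_length() - 1)  # largest power of two <= target
    return int(min(p, 1024, n))


def hf_power(x_seg, y_seg, dx=None, cutoff_frac=0.3, nperseg=None):
    """Integrated Welch PSD above cutoff_frac * Nyquist on a uniform resample."""
    x_seg = np.asarray(x_seg, dtype=float)
    y_seg = np.asarray(y_seg, dtype=float)
    if dx is None:
        dx = float(np.median(np.diff(x_seg)))
    xu = np.arange(x_seg[0], x_seg[-1], dx)
    yu = np.interp(xu, x_seg, y_seg)
    fs = 1.0 / dx  # samples per unit m/z -> frequencies in cycles per m/z
    f, pxx = welch(yu, fs=fs, nperseg=nperseg or auto_nperseg(len(yu)))
    band = f >= cutoff_frac * (0.5 * fs)
    return float(trapezoid(pxx[band], f[band]))


# A 9-point moving average should remove most power above 0.3 * Nyquist.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 4001)
noisy = rng.normal(size=x.size)
smooth = np.convolve(noisy, np.ones(9) / 9.0, mode="same")
print(hf_power(x, smooth) / hf_power(x, noisy))
```

Resampling first matters because `welch` assumes a constant sampling interval; on a raw, non-uniform m/z grid the frequency axis would not be well defined.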

mioXpektron.denoise.denoise_select.aggregate_method_summaries(summary, *, unit_label='windows', return_format='pandas')

Aggregate per-method summaries across windows, spectra, or other units.

The aggregation intentionally gives each unit equal weight by taking the median of unit-level metrics, while match fractions remain peak-weighted. This avoids a single busy window or spectrum dominating the ranking.

Parameters:

unit_label (str)
return_format (Literal['pandas', 'polars'])
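The weighting rule above can be sketched in pandas: each method takes the median of its unit-level metrics, while the match fraction pools matched and reference peak counts across all units. The column names (`method`, `window`, `delta_snr_db`, `n_matched`, `n_ref`) and the helper `aggregate_equal_weight` are assumptions for illustration, not the module's actual schema or code.

```python
import pandas as pd


def aggregate_equal_weight(summary: pd.DataFrame, metric_cols, unit_col="window"):
    """Per-method median of unit-level metrics; match fraction pooled over peaks."""
    g = summary.groupby("method")
    rollup = g[list(metric_cols)].median()  # one value per unit, equally weighted
    # Peak-weighted: total matched peaks / total reference peaks across units.
    rollup["match_frac"] = g["n_matched"].sum() / g["n_ref"].sum()
    rollup["n_units"] = g[unit_col].nunique()
    return rollup.reset_index()


summary = pd.DataFrame({
    "method": ["A", "A", "A", "B", "B", "B"],
    "window": [0, 1, 2, 0, 1, 2],
    "delta_snr_db": [1.0, 2.0, 10.0, 3.0, 3.0, 3.0],
    "n_matched": [9, 1, 0, 10, 1, 9],
    "n_ref": [10, 1, 9, 10, 1, 9],
})
print(aggregate_equal_weight(summary, ["delta_snr_db"]))
```

For method A the median ΔSNR is 2.0 dB, so the single 10 dB window does not dominate, and the pooled match fraction is 10/20 = 0.5, whereas an unweighted mean of the per-window fractions would give about 0.63 because the one-peak window counts as much as the ten-peak windows.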

mioXpektron.denoise.denoise_select.compare_methods_in_windows(x, y, windows, *, per_window_max_peaks=50, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None, auto_tune=False, auto_tune_files=None)

Evaluate denoising methods across multiple m/z windows and aggregate results.

Parameters:

x (np.ndarray) – m/z axis.
y (np.ndarray) – Intensity values.
windows (list[tuple[float, float]]) – Each tuple is (min_mz, max_mz) for a window to evaluate.
per_window_max_peaks (int, default 50) – Max number of strongest peaks (by prominence) to measure within each window.
min_prominence (float, optional) – Minimum prominence passed to signal.find_peaks for reference peak detection.
rel_height (float, default 0.5) – Relative height used to define FWHM when measuring peaks.
search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.
match_min_prominence_ratio (float, default 0.1) – Forwarded to the peak re-matching logic used after denoising.
match_min_prominence_abs (float, default 0.0) – Forwarded to the peak re-matching logic used after denoising.
match_min_width_pts (float, default 0.25) – Forwarded to the peak re-matching logic used after denoising.
resample_to_uniform (bool, default False) – Passed through to denoisers that support resampling.
target_dx (float, optional) – Desired grid spacing when resample_to_uniform=True; passed through to denoisers that support resampling.
include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay and Gaussian candidates inside each window’s method grid.
return_format ({"pandas","polars"}, default "pandas") – Backend for output DataFrames.
n_jobs (int, default -1) – Workers used within each window’s call to compare_denoising_methods.
parallel_backend ({"thread","process"}, default "thread") – Parallelism backend.
progress (bool, default True) – Show progress bars during evaluation.
baseline_expand (float, default 2.0) – Multiplier for each peak's FWHM window when masking baseline regions; forwarded to the noise metrics.
flank_inner (float, default 1.5) – Inner distance (in FWHM multiples) of the flanking bands used for local noise estimation; forwarded to the noise metrics.
flank_outer (float, default 3.0) – Outer distance (in FWHM multiples) of the flanking bands used for local noise estimation; forwarded to the noise metrics.
hf_enabled (bool, default True) – If True, compute high-frequency (HF) PSD metrics on baseline regions.
hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency (cycles per m/z); if None, hf_cutoff_frac * Nyquist is used.
hf_cutoff_frac (float, default 0.3) – Fraction of the Nyquist frequency used as the HF band when hf_cutoff_hz is None.
hf_resample_dx (float, optional) – Δx used to resample baseline segments for the PSD; defaults to the median Δx.
hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator; Welch gives lower-variance estimates.
hf_welch_nperseg (int, optional) – Welch segment length; chosen automatically if None.
auto_tune (bool, default False)
auto_tune_files (list[str] | None, default None)

Returns:

If return_format == "pandas":

rollup (pd.DataFrame) – Method-level aggregation across all windows.
summary_all (pd.DataFrame) – Per-window, per-method summary table (noise and peak metrics).
detail_all (pd.DataFrame) – Per-peak detail table across all windows.

If return_format == "polars": rollup, summary_all, detail_all (pl.DataFrame).
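A sketch of how per-window results can be stacked into a summary_all-style table before aggregation. `evaluate` is a hypothetical callable standing in for the per-window call to compare_denoising_methods, and the inclusive window bounds and the `window`, `min_mz`, `max_mz` column names are assumptions, not the module's documented behavior.

```python
import numpy as np
import pandas as pd


def evaluate_windows(x, y, windows, evaluate):
    """Call evaluate(x_w, y_w) for each (min_mz, max_mz) window and stack the results."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    frames = []
    for k, (lo, hi) in enumerate(windows):
        sel = (x >= lo) & (x <= hi)  # assumption: both bounds inclusive
        df = evaluate(x[sel], y[sel])
        frames.append(df.assign(window=k, min_mz=lo, max_mz=hi))
    return pd.concat(frames, ignore_index=True)


# Toy evaluator: reports how many samples each window contained.
def count_samples(xw, yw):
    return pd.DataFrame({"method": ["noop"], "n_samples": [len(xw)]})


x = np.arange(10.0)
out = evaluate_windows(x, np.zeros_like(x), [(0.0, 4.0), (5.0, 9.0)], count_samples)
print(out)
```

Tagging every row with its window keeps the per-window table long-form, which is the shape an equal-weight, per-method roll-up needs.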

mioXpektron.denoise.denoise_select.to_ppm(mz_shift_med, mz_ref_median)

Convert an absolute m/z shift to parts-per-million (ppm).

ppm = 1e6 * Δm / m_ref

Parameters:

mz_shift_med (float)
mz_ref_median (float)

Return type:

float
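A standalone restatement of the formula above, plus a hypothetical inverse helper `ppm_window` showing how a ±search_ppm tolerance (as used for post-denoise re-matching) maps back to an absolute m/z window. This is a sketch, not the module's source.

```python
def to_ppm(mz_shift_med: float, mz_ref_median: float) -> float:
    """ppm = 1e6 * delta_m / m_ref."""
    return 1e6 * mz_shift_med / mz_ref_median


def ppm_window(mz_ref: float, search_ppm: float = 20.0) -> tuple[float, float]:
    """Hypothetical inverse: absolute (low, high) m/z bounds for a ±ppm tolerance."""
    half_width = mz_ref * search_ppm / 1e6
    return mz_ref - half_width, mz_ref + half_width


print(to_ppm(0.001, 500.0))  # a 0.001 shift at m/z 500 is 2 ppm
print(ppm_window(500.0))     # ±20 ppm at m/z 500 is ±0.01 m/z
```

Because the tolerance is relative, the same 20 ppm window is ±0.002 m/z at m/z 100 but ±0.02 m/z at m/z 1000.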

mioXpektron.denoise.denoise_select.rank_methods_pandas(summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)

Rank methods using explicit gates plus a dimensionless tie-break score.

The primary scientific selection rule is:

1. Require the peak-preservation and minimum-denoising criteria to pass.
2. Use a dimensionless score only as a secondary tie-break, not as the primary scientific claim.

Parameters: selection_criteria (Dict[str, float] | None)
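A minimal pandas sketch of the two-step rule: boolean gates first, then a dimensionless tie-break score built from z-scored metrics, sorted so that passing methods come first and lower scores rank higher. The column names, the peak-preservation thresholds (`max_abs_height_pct`, `min_match_frac`), the helper `rank_with_gates`, and the score formula are assumptions for illustration; only min_noise_db, min_delta_snr_db, and the w_* weight names come from the signature above, and the module's actual score may combine more metrics.

```python
import pandas as pd


def rank_with_gates(df: pd.DataFrame, min_noise_db=0.5, min_delta_snr_db=1.0,
                    max_abs_height_pct=5.0, min_match_frac=0.9,
                    w_noise_db=2.0, w_delta_snr_db=1.5, w_height=1.5) -> pd.DataFrame:
    out = df.copy()
    # Step 1: explicit pass/fail gates.
    out["pass_denoise"] = (out["noise_db"] >= min_noise_db) & (out["delta_snr_db"] >= min_delta_snr_db)
    out["pass_peaks"] = (out["abs_height_pct"] <= max_abs_height_pct) & (out["match_frac"] >= min_match_frac)
    out["pass_all"] = out["pass_denoise"] & out["pass_peaks"]

    def z(col: str) -> pd.Series:
        # z-score makes each metric dimensionless before weighting.
        s = out[col].astype(float)
        sd = s.std(ddof=0)
        return (s - s.mean()) / sd if sd > 0 else s * 0.0

    # Step 2: tie-break score, lower is better (more denoising, less height change).
    out["score"] = (-w_noise_db * z("noise_db")
                    - w_delta_snr_db * z("delta_snr_db")
                    + w_height * z("abs_height_pct"))
    return out.sort_values(["pass_all", "score"], ascending=[False, True]).reset_index(drop=True)


methods = pd.DataFrame({
    "method": ["noop", "savgol", "wavelet", "wide_gauss"],
    "noise_db": [0.0, 3.0, 5.0, 8.0],
    "delta_snr_db": [0.0, 2.0, 4.0, 6.0],
    "abs_height_pct": [0.0, 2.0, 1.0, 20.0],
    "match_frac": [1.0, 1.0, 0.95, 0.8],
})
print(rank_with_gates(methods)[["method", "pass_all", "score"]])
```

Sorting on the gate flag before the score is what keeps the score a tie-break: a method that fails a gate can never outrank one that passes, however good its weighted score.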

mioXpektron.denoise.denoise_select.rank_methods_polars(summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)

Rank methods (polars) using the pandas implementation for identical semantics.

Parameters:

summary_df (DataFrame)
per_peak_df (DataFrame)
min_noise_db (float)
min_delta_snr_db (float)
selection_criteria (Dict[str, float] | None)

Return type:

DataFrame

mioXpektron.denoise.denoise_select.rank_method(input_format, summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)

Dispatch ranking to pandas or polars implementation with identical semantics.

Returns a DataFrame (pandas or polars), sorted by ascending score, that includes explicit pass/fail flags for the denoising and peak-preservation criteria.

Parameters:

min_noise_db (float)
min_delta_snr_db (float)
selection_criteria (Dict[str, float] | None)

mioXpektron.denoise.denoise_select.decode_method_label(label)

Translate a compact label back into noise_filtering parameters.

Parameters: label (str)
Return type: dict

mioXpektron.denoise.denoise_select.select_methods(summary, basis='constrained_pareto_then_snr', top_k=12, require_pass=True, require_finite_metrics=True)

Filter a method summary and select up to top_k methods to annotate or return.

Returns:
filtered_df – the post-filter DataFrame (shared!).
frontier_df – DataFrame of Pareto points (or None if basis="score" and the Pareto front is not computed).
selected_df – DataFrame of the selected rows to annotate/return (up to top_k).

Return type: tuple (filtered_df, frontier_df, selected_df)
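The default basis name suggests a constrained Pareto front over (ΔSNR, |%height|) followed by ordering on ΔSNR. The sketch below shows one way to do that with NumPy, assuming ΔSNR is maximized, |%height| is minimized, and rows were already filtered by the pass gates; `pareto_front` and `select_top` are hypothetical helpers, not the module's implementation.

```python
import numpy as np


def pareto_front(delta_snr_db, abs_height_pct) -> np.ndarray:
    """Boolean mask of non-dominated points (maximize ΔSNR, minimize |%height|)."""
    snr = np.asarray(delta_snr_db, dtype=float)
    h = np.asarray(abs_height_pct, dtype=float)
    keep = np.ones(snr.size, dtype=bool)
    for i in range(snr.size):
        # Point j dominates i if it is no worse on both axes and strictly better on one.
        dominated_by = (snr >= snr[i]) & (h <= h[i]) & ((snr > snr[i]) | (h < h[i]))
        keep[i] = not dominated_by.any()
    return keep


def select_top(delta_snr_db, abs_height_pct, top_k=12) -> np.ndarray:
    """Indices of frontier points, highest ΔSNR first, truncated to top_k."""
    snr = np.asarray(delta_snr_db, dtype=float)
    idx = np.flatnonzero(pareto_front(snr, abs_height_pct))
    return idx[np.argsort(-snr[idx], kind="stable")][:top_k]


snr = [1.0, 2.0, 3.0, 2.5, 0.5, 1.5]
h = [0.5, 1.0, 4.0, 3.0, 0.2, 2.0]
print(pareto_front(snr, h))     # last point is dominated by (2.0, 1.0)
print(select_top(snr, h, top_k=3))
```

The quadratic dominance check is adequate here because the candidate grid of methods is small; the stable sort keeps the input order among frontier points with equal ΔSNR.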

mioXpektron.denoise.denoise_select.plot_pareto_delta_snr_vs_height(summary, annotate=True, top_k=12, out_path=None, ax=None, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_plot=True, save_pareto=True)

Render ΔSNR vs. |%height| with Pareto annotations.

Parameters mirror select_methods(); see DenoisingMethods.plot for additional discussion. The helper creates the Matplotlib figure when ax is omitted and optionally saves both the chart and frontier table.