mioXpektron.denoise

Denoising utilities for the Xpektron toolkit.

class mioXpektron.denoise.BatchDenoising(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]

Bases: object

Run denoising across a batch of spectra with stable outputs.

__init__(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]

Store batch processing parameters for later execution.

run(output_root=None, folder_name='denoised_spectrums', save_result=True)[source]

Execute the batch denoising run.

Parameters:
  • output_root (str | Path | None) – Directory where the result folder will be created. If omitted, defaults to OUTPUT_DIR.

  • folder_name (str, default "denoised_spectrums") – Name for the result folder.

  • save_result (bool, default True) – Persist the executor results dataframe to OUTPUT_DIR.

Returns:

Records describing each processed file.

Return type:

list[BatchResult]

class mioXpektron.denoise.DenoisingMethods(mz_values, raw_intensities)[source]

Bases: object

Evaluate and visualize denoising strategies for mass spectrometry data.

Parameters:
  • mz_values (np.ndarray | pl.Series) – The m/z axis of the spectrum.

  • raw_intensities (np.ndarray | pl.Series) – Raw intensity values aligned with mz_values.

__init__(mz_values, raw_intensities)[source]

Store the raw spectrum that downstream helpers will operate on.

classmethod compare_across_files(file_paths, *, windows=None, min_mz=None, max_mz=None, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, file_n_jobs=0, file_parallel_backend='thread', method_n_jobs=None, method_parallel_backend='thread', progress=True, save_summary=True)[source]

Rank denoising methods across a cohort of spectra files.

Each file contributes one per-method summary, and the final cohort ranking aggregates those summaries with equal file weighting. This is a stronger basis for selecting a default denoiser than evaluating a single arbitrary spectrum.

Parallelism

This method supports two levels of parallelism:

  • file-level via file_n_jobs / file_parallel_backend

  • method-level inside each file via method_n_jobs / method_parallel_backend

When file_n_jobs=0 (default), worker counts are chosen automatically to avoid nested oversubscription.

Returns:

(ranked_summary, sample_summary_all, detail_all) where sample_summary_all contains one aggregated row per file/method pair and detail_all contains all per-peak rows.

Return type:

tuple
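
The idea of avoiding nested oversubscription can be sketched as follows. This is a minimal illustration, not the heuristic the library actually uses (which is not documented here): split_workers is a hypothetical helper, and the policy of favouring file-level workers first is an assumption.

```python
import os


def split_workers(n_files, n_methods, total=None):
    """Divide one worker budget between file-level and method-level pools."""
    total = total or os.cpu_count() or 1
    # Assumed policy: prefer file-level parallelism, capped by the file count.
    file_jobs = max(1, min(n_files, total))
    # Each file worker then gets an equal share of the remaining budget,
    # so file_jobs * method_jobs never exceeds the total.
    method_jobs = max(1, min(n_methods, total // file_jobs))
    return file_jobs, method_jobs
```

With a budget of 8 workers, 4 files and 10 candidate methods this yields 4 file workers with 2 method workers each, so the product stays within the budget.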

compare(min_mz, max_mz, return_format='pandas', match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, include_derivatives=False, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]

Compare denoising methods across the full spectrum window.

Parameters:
  • min_mz (float) – Lower bound of the evaluation window.

  • max_mz (float) – Upper bound of the evaluation window.

  • return_format ({"pandas", "polars"}, default "pandas") – Determines the summary dataframe type returned by the lower-level evaluators.

  • w_match (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_mz (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_area (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_height (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_fwhm (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_spread (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_noise_db (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_delta_snr_db (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • selection_criteria (dict | None, optional) – Override the default peak-preservation and denoising thresholds used to define scientifically acceptable methods.

  • save_summary (bool, default True) – When True and the summary is a pandas object, persist an Excel copy in OUTPUT_DIR for later inspection.

Returns:

Ranked table whose concrete type depends on return_format.

Return type:

DataFrame or LazyFrame

compare_in_windows(windows, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]

Compare denoising methods within pre-defined m/z windows.

Parameters mirror compare() with additional controls for window segmentation. The return value matches return_format and includes a ranking aggregated across all windows.

Returns:

Ranked summary consistent with return_format.

Return type:

DataFrame or LazyFrame

plot(summary, annotate=True, top_k=3, save_plot=True, save_pareto=True)[source]

Visualize the Pareto front of SNR gain versus peak-height deviation.

Parameters:
  • summary (DataFrame or LazyFrame) – Ranking output generated by compare() or compare_in_windows().

  • annotate (bool, default True) – If True, label the top top_k points on the Pareto chart.

  • top_k (int, default 3) – Number of top-ranked methods to annotate.

  • save_plot (bool, default True) – Persist the Matplotlib figure via plot_pareto_delta_snr_vs_height().

  • save_pareto (bool, default True) – Persist the underlying data used to draw the plot.

Returns:

The axis used for further customization.

Return type:

matplotlib.axes.Axes
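
For readers unfamiliar with the term, a Pareto front over these two metrics can be sketched in a few lines. This is a generic illustration under stated assumptions: pareto_front and the method names in the test data are hypothetical, and the library's own frontier logic (see select_methods) may apply extra filters.

```python
def pareto_front(points):
    """Return the names of non-dominated methods, in input order.

    points maps a method name to (snr_gain_db, abs_height_dev_pct).
    Higher SNR gain is better; lower height deviation is better.
    """
    front = []
    for name, (gain, dev) in points.items():
        # A method is dominated if another one is at least as good on both
        # axes and strictly better on at least one of them.
        dominated = any(
            (g >= gain and d <= dev) and (g > gain or d < dev)
            for other, (g, d) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

A method stays on the front only when no other method matches or beats it on both axes at once.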

denoise_check(denoise_params, *, sample_name='test', group=None, log_scale_y=False, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)[source]

Preview a single denoising configuration by plotting selected peaks.

Parameters:
  • denoise_params (Mapping[str, Any]) – Keyword arguments forwarded directly to noise_filtering().

  • sample_name (str, default "test") – Label forwarded to PlotPeak for file naming.

  • group (str | None, optional) – Group identifier used by PlotPeak when saving plots.

  • log_scale_y (bool, default False) – Apply log1p before plotting, useful for high-dynamic-range spectra.

  • mz_min (float) – m/z bounds for the preview overlay.

  • mz_max (float) – m/z bounds for the preview overlay.

  • show_peaks (bool, default False) – Highlight top peaks using PlotPeak detection settings.

  • peak_height (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • peak_prominence (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • min_peak_width (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • max_peak_width (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • save_plot (bool, default True) – Persist the rendered preview when requested by PlotPeak.

Returns:

Axis returned by PlotPeak so callers can layer annotations.

Return type:

matplotlib.axes.Axes

method_parameters(summary, rank=0, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_selected=True)[source]

Extract the configuration for a ranked denoising method.

Parameters:
  • summary (DataFrame | pl.DataFrame) – Ranked output produced by the comparison helpers.

  • rank (int, default 0) – Zero-based index of the desired method after Pareto filtering.

  • basis (str, default "constrained_pareto_then_snr") – Strategy forwarded to select_methods() when Pareto filtering is available.

  • require_pass (bool, default True) – If True, discard rows that failed the minimum denoising constraint.

  • require_finite_metrics (bool, default True) – Drop methods with NaNs before ranking.

  • save_selected (bool, default True) – Persist the filtered table to OUTPUT_DIR for reproducibility.

Returns:

Parameters suitable for passing into noise_filtering().

Return type:

dict
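
A typical workflow chaining compare(), method_parameters() and noise_filtering() might look like the sketch below. It relies only on the signatures documented on this page; pick_and_apply is a hypothetical wrapper, and it assumes that the returned dict can be unpacked directly into noise_filtering(), as the return description suggests.

```python
def pick_and_apply(mz, intensity, min_mz, max_mz):
    """Rank denoisers on one spectrum, then apply the top-ranked one."""
    # Imported lazily so the sketch can be read without the package installed.
    from mioXpektron.denoise import DenoisingMethods, noise_filtering

    dm = DenoisingMethods(mz, intensity)
    # The save_* flags are switched off so nothing is written to OUTPUT_DIR.
    summary = dm.compare(min_mz, max_mz, save_summary=False)
    params = dm.method_parameters(summary, rank=0, save_selected=False)
    # Assumption: the dict keys map one-to-one onto noise_filtering keywords.
    return noise_filtering(intensity, **params), params
```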

mioXpektron.denoise.batch_denoise(files, output_dir, method='wavelet', n_workers=0, backend='threads', progress=True, params=None)[source]

Apply the configured denoising method to multiple spectrum files.

Parameters:
  • files (Iterable[str | Path]) – Collection of filesystem paths (glob results, manual list, etc.).

  • output_dir (str | Path) – Directory where the denoised outputs will be written.

  • method (str, default "wavelet") – Name of the smoothing routine forwarded to noise_filtering().

  • n_workers (int, default 0) – Worker count for the executor. 0 or None selects a CPU-aware default.

  • backend ({"threads", "processes"}, default "threads") – Execution strategy for the worker pool.

  • progress (bool, default True) – If True, wrap the executor iterator in tqdm when available.

  • params (dict | None) – Extra keyword arguments forwarded to noise_filtering().

Returns:

Status records describing each attempted file.

Return type:

list[BatchResult]

Raises:

ValueError – If no input paths exist or an unsupported backend name is provided.
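
The backend and n_workers semantics described above can be sketched with the standard library. This is an illustration rather than the library's code: make_executor is a hypothetical helper, and using all cores as the "CPU-aware default" is an assumption.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(backend="threads", n_workers=0):
    """Build a worker pool from a backend name and a worker count."""
    pools = {"threads": ThreadPoolExecutor, "processes": ProcessPoolExecutor}
    if backend not in pools:
        raise ValueError(f"unsupported backend: {backend!r}")
    # 0 or None falls back to a CPU-aware default (assumed here: all cores).
    workers = n_workers or os.cpu_count() or 1
    return pools[backend](max_workers=workers)


# Usage: map a stand-in task over three inputs with two threads.
with make_executor("threads", 2) as pool:
    squares = list(pool.map(lambda v: v * v, [1, 2, 3]))
```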

mioXpektron.denoise.compare_denoising_methods(x, y, *, min_mz=None, max_mz=None, max_peaks=300, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None)[source]

Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction.

Parameters:
  • x (array-like or None) – m/z axis. If None, an index axis [0..N-1] is used.

  • y (array-like) – Raw intensities.

  • min_mz (float, optional) – Optional range restriction on x prior to peak detection and evaluation.

  • max_mz (float, optional) – Optional range restriction on x prior to peak detection and evaluation.

  • max_peaks (int, default 300) – Maximum number of reference peaks (by prominence) to evaluate in the selected range.

  • min_prominence (float, optional) – Prominence threshold for scipy.signal.find_peaks during reference detection.

  • rel_height (float, default 0.5) – Relative height used for FWHM measurements (e.g., 0.5 = half-height).

  • search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.

  • match_min_prominence_ratio (float, default 0.1) – Minimum post-denoise prominence required for a matched peak, expressed as a fraction of the raw reference prominence.

  • match_min_prominence_abs (float, default 0.0) – Absolute lower bound for post-denoise peak prominence.

  • match_min_width_pts (float, default 0.25) – Minimum acceptable peak width in index points for a post-denoise match.

  • resample_to_uniform (bool, default False) – If True, allow denoisers to resample to a uniform grid internally when beneficial.

  • include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay (deriv>0) and Gaussian (order>0) operators in the candidate grid. By default the search includes only smoothing/denoising variants.

  • target_dx (float, optional) – Desired spacing when resample_to_uniform=True.

  • return_format ({"pandas","polars"}, default "pandas") – Backend for output DataFrames.

  • n_jobs (int, default -1) – Number of workers used to evaluate methods in parallel (1 disables parallelism).

  • parallel_backend ({"thread","process"}, default "thread") – Parallelism backend. Threads are often efficient because NumPy/SciPy/PyWavelets drop the GIL.

  • progress (bool, default True) – Show a progress bar if tqdm is available.

  • baseline_expand (float, default 2.0) – Multiplier to expand each peak’s FWHM window when masking baseline regions used for noise/PSD estimates.

  • flank_inner (float, default 1.5) – Inner distance (in FWHM multiples) of the flanking bands used for local noise estimation.

  • flank_outer (float, default 3.0) – Outer distance (in FWHM multiples) of the flanking bands used for local noise estimation.

  • hf_enabled (bool, default True) – If True, compute high-frequency (HF) residual power metrics on baseline regions via PSD.

  • hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency (cycles per m/z). If None, uses hf_cutoff_frac * Nyquist.

  • hf_cutoff_frac (float, default 0.3) – Fraction of the Nyquist frequency used as the HF band when hf_cutoff_hz is None.

  • hf_resample_dx (float, optional) – Δx used to resample baseline segments to a uniform grid for PSD; defaults to median Δx if None.

  • hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator for HF power. Welch provides lower-variance estimates on finite windows.

  • hf_welch_nperseg (int, optional) – Segment length for Welch PSD. If None, chosen automatically (≈ max(16, N/8), power-of-two, ≤1024).
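
The two automatic defaults described for the HF parameters can be written out as follows. This is a sketch of the stated rules only: hf_cutoff and auto_nperseg are hypothetical helpers, and rounding down to a power of two is an assumption, since the description only says "≈".

```python
def hf_cutoff(dx, hf_cutoff_hz=None, hf_cutoff_frac=0.3):
    """HF cutoff in cycles per m/z for a grid with uniform spacing dx."""
    if hf_cutoff_hz is not None:
        return hf_cutoff_hz
    # The Nyquist frequency of a uniformly sampled signal is 1 / (2 * dx).
    nyquist = 1.0 / (2.0 * dx)
    return hf_cutoff_frac * nyquist


def auto_nperseg(n):
    """Power-of-two segment length near max(16, n / 8), capped at 1024."""
    target = max(16, n // 8)
    # Round down to the nearest power of two (assumed rounding direction).
    nperseg = 1 << (target.bit_length() - 1)
    return min(nperseg, 1024)
```

For a baseline segment of 1000 points this gives a segment length of 64, which could then be passed as nperseg to a Welch estimator such as scipy.signal.welch.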

Returns:

summary_df, per_peak_df – If return_format="pandas", both are pandas.DataFrame; if "polars", both are polars.DataFrame. summary_df contains method-level medians/IQRs plus noise and HF metrics; per_peak_df has per-peak rows.

Return type:

DataFrame
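
The ±ppm matching window controlled by search_ppm scales with m/z, which the sketch below makes concrete. ppm_window and match_in_window are hypothetical helpers; the real matcher also applies the match_min_* prominence and width checks described above, which are omitted here.

```python
def ppm_window(mz, ppm=20.0):
    """Return the (low, high) m/z bounds of a ±ppm window around mz."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def match_in_window(ref_mz, candidates, ppm=20.0):
    """Pick the candidate closest to ref_mz inside the ppm window, or None."""
    low, high = ppm_window(ref_mz, ppm)
    inside = [c for c in candidates if low <= c <= high]
    return min(inside, key=lambda c: abs(c - ref_mz)) if inside else None
```

At m/z 500 a 20 ppm tolerance corresponds to roughly ±0.01, so a candidate at 500.004 matches while one at 500.1 does not.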

mioXpektron.denoise.compare_methods_in_windows(x, y, windows, *, per_window_max_peaks=50, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None, auto_tune=False, auto_tune_files=None)[source]

Evaluate denoising methods across multiple m/z windows and aggregate results.

Parameters:
  • x (np.ndarray) – m/z axis and intensity values.

  • y (np.ndarray) – m/z axis and intensity values.

  • windows (list[tuple[float, float]]) – Each tuple is (min_mz, max_mz) for a window to evaluate.

  • per_window_max_peaks (int, default 50) – Max number of strongest peaks (by prominence) to measure within each window.

  • min_prominence (float, optional) – Minimum prominence passed to signal.find_peaks for reference peak detection.

  • rel_height (float, default 0.5) – Relative height used to define FWHM when measuring peaks.

  • search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.

  • match_min_prominence_ratio (float) – Forwarded to the peak re-matching logic used after denoising.

  • match_min_prominence_abs (float) – Forwarded to the peak re-matching logic used after denoising.

  • match_min_width_pts (float) – Forwarded to the peak re-matching logic used after denoising.

  • resample_to_uniform (optional) – Passed through to denoisers that support resampling.

  • target_dx (optional) – Passed through to denoisers that support resampling.

  • include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay and Gaussian candidates inside each window’s method grid.

  • return_format ({"pandas","polars"}) – Backend for output DataFrames.

  • n_jobs (int, default -1) – Workers used within each window’s call to compare_denoising_methods.

  • parallel_backend ({"thread","process"}, default "thread") – Parallelism backend.

  • progress (bool, default True) – Show progress bars during evaluation.

  • baseline_expand (float) – Baseline mask expansion multiplier forwarded to noise metrics.

  • flank_inner (float) – Inner flanking-band multiplier forwarded to noise metrics.

  • flank_outer (float) – Outer flanking-band multiplier forwarded to noise metrics.

  • hf_enabled (optional) – High-frequency PSD controls forwarded to noise metrics (Welch is lower-variance).

  • hf_cutoff_hz (optional) – High-frequency PSD controls forwarded to noise metrics (Welch is lower-variance).

  • hf_cutoff_frac (optional) – High-frequency PSD controls forwarded to noise metrics (Welch is lower-variance).

  • hf_resample_dx (optional) – High-frequency PSD controls forwarded to noise metrics (Welch is lower-variance).

  • hf_psd_method (optional) – High-frequency PSD controls forwarded to noise metrics (Welch is lower-variance).

  • hf_welch_nperseg (optional) – High-frequency PSD controls forwarded to noise metrics (Welch is lower-variance).

  • auto_tune (bool)

  • auto_tune_files (list[str] | None)

Returns:

  • If return_format == "pandas":

    rollup (pd.DataFrame) – Method-level aggregation across all windows.

    summary_all (pd.DataFrame) – Per-window, per-method summary table (noise and peak metrics).

    detail_all (pd.DataFrame) – Per-peak detail table across all windows.

  • If return_format == "polars": rollup, summary_all, detail_all are pl.DataFrame objects with the same contents.
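
The windows argument is a plain list of (min_mz, max_mz) tuples, so it can be generated programmatically. The helper below is a hypothetical convenience, not part of the package; it assumes contiguous, non-overlapping windows are wanted.

```python
def make_windows(start, stop, width):
    """Split [start, stop] into consecutive (min_mz, max_mz) tuples."""
    if width <= 0 or stop <= start:
        raise ValueError("need width > 0 and stop > start")
    windows = []
    low = start
    while low < stop:
        # The last window is truncated so it never runs past stop.
        high = min(low + width, stop)
        windows.append((low, high))
        low = high
    return windows
```

The resulting list can be passed as the windows argument of compare_methods_in_windows() or DenoisingMethods.compare_in_windows().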

mioXpektron.denoise.decode_method_label(label)[source]

Translate a compact label back into noise_filtering parameters.

Parameters:

label (str)

Return type:

dict

mioXpektron.denoise.noise_filtering(intensities, *, method='wavelet', window_length=15, polyorder=3, deriv=0, gauss_sigma_pts=None, gaussian_order=0, wavelet='sym8', level=None, threshold_strategy='universal', threshold_mode='soft', sigma=None, sigma_strategy='per_level', variance_stabilize='none', anscombe_negative_policy='warn_clip', cycle_spins=0, pywt_mode='periodization', clip_nonnegative=True, preserve_tic=False, x=None, resample_to_uniform=False, target_dx=None, forward_interp='pchip')[source]

Apply 1D denoising/smoothing to ToF-SIMS spectra.

Notes

  • Savitzky–Golay / Gaussian / Median assume ~uniform sampling. If your m/z grid is nonuniform, pass x and set resample_to_uniform=True. The wavelet path can also resample when resample_to_uniform=True.

  • Wavelet shrinkage preserves narrow peaks; consider Bayes/SURE and cycle-spins.
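
Whether an axis counts as "~uniform" can be checked before choosing resample_to_uniform. The sketch below is one possible test, not the library's criterion: is_roughly_uniform and its rel_tol threshold are hypothetical.

```python
from statistics import median


def is_roughly_uniform(x, rel_tol=0.01):
    """True if every spacing in x is within rel_tol of the median spacing."""
    steps = [b - a for a, b in zip(x, x[1:])]
    if not steps:
        return True
    dx = median(steps)
    # A non-positive median spacing means the axis is not strictly increasing.
    return dx > 0 and all(abs(s - dx) <= rel_tol * dx for s in steps)
```

If the check fails, passing x together with resample_to_uniform=True is the route the note above recommends.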

Parameters:
  • intensities (np.ndarray) – 1D intensity array.

  • method ({'savitzky_golay','gaussian','median','wavelet','none'})

  • window_length (int) – Odd window for Savitzky–Golay or median; will be coerced to odd >=3.

  • polyorder (int) – For Savitzky–Golay, 0 ≤ polyorder < window_length.

  • deriv (int) – For Savitzky–Golay, derivative order (0 = smoothing; 1/2/… compute derivatives). Requires polyorder >= deriv.

  • gauss_sigma_pts (float or None) – If provided, overrides default sigma = window_length/6 for Gaussian filter.

  • gaussian_order (int) – For Gaussian filtering, derivative order for ndimage.gaussian_filter1d. 0 = smoothing; >0 computes derivatives.

  • wavelet (Literal['db4', 'db8', 'sym5', 'sym8', 'coif2', 'coif3']) – Passed to wavelet processing (see wavelet_denoise).

  • level (int | None) – Passed to wavelet processing (see wavelet_denoise).

  • threshold_strategy (Literal['universal', 'bayes', 'sure', 'sure_opt']) – Passed to wavelet processing (see wavelet_denoise).

  • threshold_mode (Literal['soft', 'hard']) – Passed to wavelet processing (see wavelet_denoise).

  • sigma (float | None) – Passed to wavelet processing (see wavelet_denoise).

  • cycle_spins (Literal[0, 4, 8, 16, 32]) – Passed to wavelet processing (see wavelet_denoise).

  • pywt_mode (str) – Passed to wavelet processing (see wavelet_denoise).

  • sigma_strategy ({"per_level","global"}) – Strategy if sigma is None. “per_level” = σ_j via MAD on each detail subband; “global” = one σ via MAD on the finest detail of the unshifted input.

  • variance_stabilize ({"none","anscombe"}) – Apply variance‑stabilizing transform before denoising. "anscombe" uses the classical Anscombe transform for non-negative Poisson-like input.

  • anscombe_negative_policy ({"warn_clip","clip","raise"}) – Handling policy for negative values before the classical Anscombe transform.

  • clip_nonnegative (bool) – Output behaviors.

  • preserve_tic (bool) – Output behaviors.

  • x (np.ndarray or None) – Optional m/z (or channel) axis aligned with intensities.

  • resample_to_uniform (bool) – If True and x is provided, internally resample to a uniform grid and back.

  • target_dx (float or None) – Target spacing for the uniform grid (if None, inferred).

  • forward_interp ({'pchip','linear'}) – Interpolant used when building the uniform-grid signal (PCHIP recommended).
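
The window_length coercion and the default Gaussian sigma described above amount to the following arithmetic. Both helpers are hypothetical, and bumping an even window up (rather than down) to the next odd value is an assumption; the description only states "coerced to odd >=3".

```python
def coerce_window(window_length):
    """Force window_length to an odd integer of at least 3."""
    w = max(3, int(window_length))
    # Assumed direction: even values are bumped up to the next odd value.
    return w if w % 2 == 1 else w + 1


def default_gauss_sigma(window_length, gauss_sigma_pts=None):
    """Gaussian sigma in points: the explicit value, else window_length / 6."""
    if gauss_sigma_pts is not None:
        return float(gauss_sigma_pts)
    return window_length / 6.0
```

With the default window_length=15 the implied Gaussian sigma is 2.5 points.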

Returns:

Filtered intensities aligned to the input grid/order.

Return type:

np.ndarray
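
The classical Anscombe transform behind variance_stabilize="anscombe" maps Poisson-like counts x to 2·sqrt(x + 3/8). The sketch below shows the transform, the three negative-value policies, and the simple algebraic inverse; the function names are hypothetical, and the library may use a different (for example bias-corrected) inverse.

```python
import math
import warnings


def anscombe(values, negative_policy="warn_clip"):
    """Classical Anscombe transform with a policy for negative inputs."""
    if any(v < 0 for v in values):
        if negative_policy == "raise":
            raise ValueError("negative input to Anscombe transform")
        if negative_policy == "warn_clip":
            warnings.warn("clipping negative values to 0 before Anscombe")
        # Both "clip" and "warn_clip" replace negatives with zero.
        values = [max(v, 0.0) for v in values]
    return [2.0 * math.sqrt(v + 0.375) for v in values]


def inverse_anscombe(values):
    """Simple algebraic inverse (not a bias-corrected variant)."""
    return [(v / 2.0) ** 2 - 0.375 for v in values]
```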

Raises:

ValueError – If intensities or x have mismatched shapes or if intensities is not 1D. If Savitzky–Golay has polyorder < deriv after clamping, or if method is unknown.

See also

wavelet_denoise

Core wavelet denoising routine used when method=”wavelet”.
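
The terms used by the wavelet options (MAD-based sigma, the "universal" threshold, "soft" versus "hard" mode) refer to standard wavelet-shrinkage building blocks, sketched below in textbook form. The function names are hypothetical and this is not the code of wavelet_denoise; it operates on a plain list of detail coefficients rather than a real wavelet decomposition.

```python
import math
from statistics import median


def mad_sigma(detail):
    """Robust noise estimate: median absolute coefficient divided by 0.6745."""
    return median(abs(d) for d in detail) / 0.6745


def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 ln n) for a signal of n samples."""
    return sigma * math.sqrt(2.0 * math.log(n))


def soft(coeffs, thr):
    """Soft shrinkage: pull every coefficient toward zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def hard(coeffs, thr):
    """Hard thresholding: zero small coefficients, keep the rest unchanged."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]
```

Soft mode shrinks the surviving coefficients as well, which tends to give smoother output at the cost of some peak height; hard mode leaves them untouched.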

mioXpektron.denoise.plot_pareto_delta_snr_vs_height(summary, annotate=True, top_k=12, out_path=None, ax=None, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_plot=True, save_pareto=True)[source]

Render ΔSNR vs. |%height| with Pareto annotations.

Parameters mirror select_methods(); see DenoisingMethods.plot for additional discussion. The helper creates the Matplotlib figure when ax is omitted and optionally saves both the chart and frontier table.

mioXpektron.denoise.rank_method(input_format, summary_df, per_peak_df, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, w_hf_db=1.5, w_hf_frac=1.0, min_noise_db=0.5, min_delta_snr_db=1.0, selection_criteria=None)[source]

Dispatch ranking to pandas or polars implementation with identical semantics.

Returns a DataFrame (pandas or polars) sorted by ascending score and includes explicit pass/fail flags for denoising and peak-preservation criteria.

mioXpektron.denoise.select_methods(summary, basis='constrained_pareto_then_snr', top_k=12, require_pass=True, require_finite_metrics=True)[source]

Returns:

  • filtered_df – The post-filter DataFrame (shared).

  • frontier_df – DataFrame of Pareto points (or None if basis='score' and the Pareto front is not computed).

  • selected_df – The DataFrame of selected rows to annotate/return (top_k).

Modules

denoise_batch

Batch-oriented helpers for running denoising over many spectra files.

denoise_main

ToF-SIMS 1D denoising & smoothing utilities. A focused toolkit for 1D spectra (e.g., ToF‑SIMS / MS) that implements wavelet‑shrinkage denoising and classical smoothing filters, plus resampling to a uniform grid when needed. The design emphasizes:

denoise_select

ToF-SIMS 1D denoising & smoothing utilities. This module provides a battery of denoising and smoothing methods for 1D ToF-SIMS spectra and a noise-aware evaluation framework that selects methods based on both peak preservation and explicit noise reduction criteria. It supports:

main

High-level orchestration helpers for denoising spectra and reviewing results.

test_denoise_selection