mioXpektron.denoise.main

High-level orchestration helpers for denoising spectra and reviewing results.

Classes

BatchDenoising(file_paths, *[, method, ...])

Run denoising across a batch of spectra with stable outputs.

DenoisingMethods(mz_values, raw_intensities)

Evaluate and visualize denoising strategies for mass spectrometry data.

class mioXpektron.denoise.main.DenoisingMethods(mz_values, raw_intensities)[source]

Bases: object

Evaluate and visualize denoising strategies for mass spectrometry data.

Parameters:
  • mz_values (np.ndarray | pl.Series) – The m/z axis of the spectrum.

  • raw_intensities (np.ndarray | pl.Series) – Raw intensity values aligned element-wise with mz_values.

__init__(mz_values, raw_intensities)[source]

Store the raw spectrum that downstream helpers will operate on.

classmethod compare_across_files(file_paths, *, windows=None, min_mz=None, max_mz=None, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, file_n_jobs=0, file_parallel_backend='thread', method_n_jobs=None, method_parallel_backend='thread', progress=True, save_summary=True)[source]

Rank denoising methods across a cohort of spectra files.

Each file contributes one per-method summary, and the final cohort ranking aggregates those summaries with equal file weighting. This is a stronger basis for selecting a default denoiser than evaluating a single arbitrary spectrum.
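The equal-file-weighting idea described above can be illustrated with a minimal sketch. It is not the library's implementation: the row layout, the column name delta_snr_db, the file names, and the method name "savgol" are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-file summaries: one row per (file, method) pair.
# Column and method names are illustrative, not taken from mioXpektron.
sample_summary = [
    {"file": "a.txt", "method": "wavelet", "delta_snr_db": 6.0},
    {"file": "a.txt", "method": "savgol", "delta_snr_db": 4.0},
    {"file": "b.txt", "method": "wavelet", "delta_snr_db": 2.0},
    {"file": "b.txt", "method": "savgol", "delta_snr_db": 5.0},
]


def aggregate_equal_weight(rows, metric):
    """Average one metric per method so that every file counts once."""
    per_method = defaultdict(list)
    for row in rows:
        per_method[row["method"]].append(row[metric])
    return {method: mean(values) for method, values in per_method.items()}


cohort = aggregate_equal_weight(sample_summary, "delta_snr_db")
# Rank methods by the cohort-level mean, best first.
ranking = sorted(cohort, key=cohort.get, reverse=True)
```

Because each file contributes exactly one row per method, a spectrum with many peaks cannot outweigh a sparse one in the cohort mean.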

Parallelism

This method supports two levels of parallelism:

  • file-level, via file_n_jobs / file_parallel_backend

  • method-level inside each file, via method_n_jobs / method_parallel_backend

When file_n_jobs=0 (default), worker counts are chosen automatically to avoid nested oversubscription.
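How the automatic worker counts are chosen is not documented here. One plausible way to avoid nested oversubscription is to keep the product of the two worker counts within a CPU budget. The helper split_workers below is hypothetical and only sketches that idea.

```python
import os


def split_workers(n_files, n_methods, cpu_budget=None):
    """Return (file_workers, method_workers) whose product stays within the budget.

    Hypothetical helper: files get workers first, and methods share what is left.
    """
    budget = max(1, cpu_budget or os.cpu_count() or 1)
    file_workers = max(1, min(n_files, budget))
    method_workers = max(1, min(n_methods, budget // file_workers))
    return file_workers, method_workers


few_files = split_workers(n_files=4, n_methods=10, cpu_budget=8)
many_files = split_workers(n_files=16, n_methods=10, cpu_budget=8)
single_file = split_workers(n_files=1, n_methods=10, cpu_budget=8)
```

With a single file, the whole budget goes to method-level workers; with many files, method-level work falls back to one worker per file.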

Returns:

(ranked_summary, sample_summary_all, detail_all), where ranked_summary is the cohort-level ranking, sample_summary_all contains one aggregated row per file/method pair, and detail_all contains all per-peak rows.

Return type:

tuple

compare(min_mz, max_mz, return_format='pandas', match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, include_derivatives=False, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]

Compare denoising methods across the full spectrum window.

Parameters:
  • min_mz (float) – Lower m/z bound of the evaluation window.

  • max_mz (float) – Upper m/z bound of the evaluation window.

  • return_format ({"pandas", "polars"}, default "pandas") – Determines the summary dataframe type returned by the lower-level evaluators.

  • w_match (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_mz (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_area (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_height (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_fwhm (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_spread (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_noise_db (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • w_delta_snr_db (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.

  • selection_criteria (dict | None, optional) – Override the default peak-preservation and denoising thresholds used to define scientifically acceptable methods.

  • save_summary (bool, default True) – When True and the summary is a pandas object, persist an Excel copy in OUTPUT_DIR for later inspection.

Returns:

Ranked table whose concrete type depends on return_format.

Return type:

DataFrame or LazyFrame

compare_in_windows(windows, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]

Compare denoising methods within pre-defined m/z windows.

Parameters mirror compare() with additional controls for window segmentation. The return value matches return_format and includes a ranking aggregated across all windows.

Returns:

Ranked summary consistent with return_format.

Return type:

DataFrame or LazyFrame
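The search_ppm argument is not described above. Assuming it is a conventional parts-per-million tolerance used when matching peaks by m/z, the window it implies can be sketched as follows. Both helper names are hypothetical.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds of a +/- ppm tolerance around mz."""
    tol = mz * ppm / 1e6
    return mz - tol, mz + tol


def within_ppm(observed, reference, ppm):
    """True if observed lies within the ppm tolerance of reference."""
    return abs(observed - reference) <= reference * ppm / 1e6


# At m/z 500, a 20 ppm tolerance corresponds to +/- 0.01 m/z units.
low, high = ppm_window(500.0, 20.0)
close_match = within_ppm(500.008, 500.0, 20.0)
far_match = within_ppm(500.02, 500.0, 20.0)
```

The absolute width of a ppm window grows with m/z, so the same search_ppm value is stricter in absolute terms at low masses.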

plot(summary, annotate=True, top_k=3, save_plot=True, save_pareto=True)[source]

Visualize the Pareto front of SNR gain versus peak-height deviation.

Parameters:
  • summary (DataFrame or LazyFrame) – Ranking output generated by compare() or compare_in_windows().

  • annotate (bool, default True) – If True, label the top top_k points on the Pareto chart.

  • top_k (int, default 3) – Number of top-ranked methods to annotate.

  • save_plot (bool, default True) – Persist the Matplotlib figure via plot_pareto_delta_snr_vs_height().

  • save_pareto (bool, default True) – Persist the underlying data used to draw the plot.

Returns:

The axis used for further customization.

Return type:

matplotlib.axes.Axes
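For readers unfamiliar with Pareto fronts, the sketch below shows the general technique for two objectives: maximize SNR gain and minimize peak-height deviation. It is a generic illustration, not the code behind plot(); the tuple layout and the method labels are assumptions.

```python
def pareto_front(points):
    """Return the names of points that no other point dominates.

    Each point is (name, delta_snr_db, height_dev). A higher delta_snr_db
    is better and a lower height_dev is better.
    """
    front = []
    for name, snr, dev in points:
        dominated = any(
            o_snr >= snr and o_dev <= dev and (o_snr > snr or o_dev < dev)
            for _, o_snr, o_dev in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical methods with (SNR gain in dB, relative peak-height deviation).
methods = [
    ("A", 8.0, 0.10),
    ("B", 6.0, 0.03),
    ("C", 5.0, 0.08),  # dominated by B: less SNR gain and larger deviation
    ("D", 9.0, 0.25),
]
front = pareto_front(methods)
```

Methods on the front represent distinct trade-offs: none can improve one objective without giving ground on the other.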

denoise_check(denoise_params, *, sample_name='test', group=None, log_scale_y=False, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)[source]

Preview a single denoising configuration by plotting selected peaks.

Parameters:
  • denoise_params (Mapping[str, Any]) – Keyword arguments forwarded directly to noise_filtering().

  • sample_name (str, default "test") – Label forwarded to PlotPeak for file naming.

  • group (str | None, optional) – Group identifier used by PlotPeak when saving plots.

  • log_scale_y (bool, default False) – Apply log1p before plotting, useful for high-dynamic-range spectra.

  • mz_min (float) – Lower m/z bound of the preview overlay.

  • mz_max (float) – Upper m/z bound of the preview overlay.

  • show_peaks (bool, default False) – Highlight top peaks using PlotPeak detection settings.

  • peak_height (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • peak_prominence (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • min_peak_width (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • max_peak_width (float) – Tuning knobs passed to PlotPeak when show_peaks is True.

  • save_plot (bool, default True) – Persist the rendered preview when requested by PlotPeak.

Returns:

Axis returned by PlotPeak so callers can layer annotations.

Return type:

matplotlib.axes.Axes
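The effect of the log1p transform mentioned for log_scale_y can be seen on a few sample intensities. The values below are made up for illustration.

```python
import math

# Intensities spanning five orders of magnitude, including an empty bin.
raw = [0.0, 9.0, 999.0, 99999.0]

# log1p(x) = ln(1 + x): zero stays zero, and large values are compressed.
scaled = [math.log1p(value) for value in raw]

dynamic_range_raw = max(raw) - min(raw)
dynamic_range_scaled = max(scaled) - min(scaled)
```

Unlike a plain logarithm, log1p is defined at zero, so empty bins do not need special handling before plotting.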

method_parameters(summary, rank=0, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_selected=True)[source]

Extract the configuration for a ranked denoising method.

Parameters:
  • summary (DataFrame | pl.DataFrame) – Ranked output produced by the comparison helpers.

  • rank (int, default 0) – Zero-based index of the desired method after Pareto filtering.

  • basis (str, default "constrained_pareto_then_snr") – Strategy forwarded to select_methods() when Pareto filtering is available.

  • require_pass (bool, default True) – If True, discard rows that failed the minimum denoising constraint.

  • require_finite_metrics (bool, default True) – Drop methods with NaNs before ranking.

  • save_selected (bool, default True) – Persist the filtered table to OUTPUT_DIR for reproducibility.

Returns:

Parameters suitable for passing into noise_filtering().

Return type:

dict
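The interaction of rank, require_pass, and require_finite_metrics can be sketched on plain dictionaries. This is an assumption-laden illustration, not the library's logic: the row keys "params", "passed", and "metrics", and the parameter name "level", are hypothetical.

```python
import math


def pick_method(rows, rank=0, require_pass=True, require_finite_metrics=True):
    """Filter already-ranked rows, then return the params at a zero-based rank."""
    kept = []
    for row in rows:
        if require_pass and not row["passed"]:
            continue  # failed the minimum denoising constraint
        if require_finite_metrics and not all(
            math.isfinite(value) for value in row["metrics"].values()
        ):
            continue  # NaN or infinite metric
        kept.append(row)
    return dict(kept[rank]["params"])


# Hypothetical ranked table, best candidate first.
rows = [
    {"params": {"method": "wavelet", "level": 3}, "passed": True,
     "metrics": {"delta_snr_db": float("nan")}},
    {"params": {"method": "wavelet", "level": 4}, "passed": False,
     "metrics": {"delta_snr_db": 6.0}},
    {"params": {"method": "wavelet", "level": 5}, "passed": True,
     "metrics": {"delta_snr_db": 5.0}},
    {"params": {"method": "wavelet", "level": 6}, "passed": True,
     "metrics": {"delta_snr_db": 4.0}},
]
best = pick_method(rows)
runner_up = pick_method(rows, rank=1)
```

Note that rank counts positions after filtering, so rank=0 here skips the first two rows.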

class mioXpektron.denoise.main.BatchDenoising(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]

Bases: object

Run denoising across a batch of spectra with stable outputs.

__init__(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]

Store batch processing parameters for later execution.

run(output_root=None, folder_name='denoised_spectrums', save_result=True)[source]

Execute the batch denoising run.

Parameters:
  • output_root (str | Path | None) – Directory where the result folder will be created. If omitted, defaults to OUTPUT_DIR.

  • folder_name (str, default "denoised_spectrums") – Name for the result folder.

  • save_result (bool, default True) – Persist the executor results dataframe to OUTPUT_DIR.

Returns:

Records describing each processed file.

Return type:

list[BatchResult]
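The sketch below illustrates two ideas from this class in isolation: resolving the result folder from output_root and folder_name, and running a thread-backed batch whose results come back in input order. It does not reproduce BatchDenoising. The helpers resolve_output_dir, smooth, and run_batch are hypothetical, the moving average is only a placeholder denoiser, and reading "stable outputs" as "input-ordered results" is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def resolve_output_dir(output_root, folder_name, default_root):
    """Place folder_name under output_root, or under default_root if omitted."""
    root = Path(output_root) if output_root is not None else Path(default_root)
    return root / folder_name


def smooth(values):
    """Placeholder denoiser: 3-point moving average; the window shrinks at edges."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def run_batch(spectra, n_workers=2):
    """Process spectra on a thread pool; map() yields results in input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(smooth, spectra))


target = resolve_output_dir(None, "denoised_spectrums", "out")
results = run_batch([[0.0, 3.0, 0.0], [2.0, 2.0, 2.0]])
```

Using map() rather than collecting futures as they complete keeps each result aligned with its input, regardless of which worker finishes first.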