mioXpektron.denoise.main
High-level orchestration helpers for denoising spectra and reviewing results.
Classes
BatchDenoising | Run denoising across a batch of spectra with stable outputs.
DenoisingMethods | Evaluate and visualize denoising strategies for mass spectrometry data.
- class mioXpektron.denoise.main.DenoisingMethods(mz_values, raw_intensities)[source]
Bases: object
Evaluate and visualize denoising strategies for mass spectrometry data.
- Parameters:
mz_values (np.ndarray | pl.Series) – The m/z axis of the spectrum.
raw_intensities (np.ndarray | pl.Series) – Raw intensity values aligned with mz_values.
- __init__(mz_values, raw_intensities)[source]
Store the raw spectrum that downstream helpers will operate on.
- classmethod compare_across_files(file_paths, *, windows=None, min_mz=None, max_mz=None, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, file_n_jobs=0, file_parallel_backend='thread', method_n_jobs=None, method_parallel_backend='thread', progress=True, save_summary=True)[source]
Rank denoising methods across a cohort of spectra files.
Each file contributes one per-method summary, and the final cohort ranking aggregates those summaries with equal file weighting. This is a stronger basis for selecting a default denoiser than evaluating a single arbitrary spectrum.
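The equal file weighting described above can be illustrated with a minimal sketch. The input layout (`{file_name: {method_name: score}}`), the function name, and the single scalar score per method are all assumptions made for illustration; they are not taken from the library.

```python
from collections import defaultdict
from statistics import mean


def aggregate_equal_file_weight(per_file_summaries):
    """Average each method's score across files, one value per file.

    per_file_summaries is a hypothetical layout:
    {file_name: {method_name: score}}.
    """
    scores_by_method = defaultdict(list)
    for file_summary in per_file_summaries.values():
        for method, score in file_summary.items():
            scores_by_method[method].append(score)
    # Each file contributes exactly one value per method, so a file with
    # many peaks cannot dominate the cohort mean.
    cohort = {method: mean(values) for method, values in scores_by_method.items()}
    return sorted(cohort.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-file scores for two placeholder methods.
per_file = {
    "spectrum_1": {"method_a": 1.0, "method_b": 0.5},
    "spectrum_2": {"method_a": 0.5, "method_b": 0.75},
}
ranking = aggregate_equal_file_weight(per_file)
```

Averaging per-file summaries rather than pooling all peaks is what keeps one peak-rich spectrum from deciding the ranking on its own.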
Parallelism
This method supports two levels of parallelism:
- file-level via file_n_jobs / file_parallel_backend
- method-level inside each file via method_n_jobs / method_parallel_backend
When file_n_jobs=0 (default), worker counts are chosen automatically to avoid nested oversubscription.
- Returns:
(ranked_summary, sample_summary_all, detail_all), where sample_summary_all contains one aggregated row per file/method pair and detail_all contains all per-peak rows.
- Return type:
tuple
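To make "nested oversubscription" concrete, here is a minimal sketch of one way a worker budget could be split between the file level and the method level. The rule below (files first, leftover capacity to methods) and the name `split_workers` are assumptions for illustration; the source does not state how the library chooses its counts.

```python
import os


def split_workers(n_files, cpu_budget=None):
    """Split a worker budget so file_workers * method_workers <= budget.

    Hypothetical heuristic: give files priority, then hand any leftover
    capacity to the per-file method loop.
    """
    budget = cpu_budget or os.cpu_count() or 1
    file_workers = max(1, min(n_files, budget))
    method_workers = max(1, budget // file_workers)
    return file_workers, method_workers


few_files = split_workers(2, cpu_budget=8)    # more cores than files
many_files = split_workers(16, cpu_budget=8)  # more files than cores
```

Keeping the product of the two counts within the budget is the point: running eight file workers that each start eight method workers would put 64 workers on eight cores.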
- compare(min_mz, max_mz, return_format='pandas', match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, include_derivatives=False, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]
Compare denoising methods across the full spectrum window.
- Parameters:
min_mz, max_mz (float) – Bounds for the evaluation window.
return_format ({"pandas", "polars"}, default "pandas") – Determines the summary dataframe type returned by the lower-level evaluators.
w_match, w_mz, w_area, w_height, w_fwhm, w_spread, w_noise_db, w_delta_snr_db (float) – Weights applied by rank_method() when building the secondary dimensionless tie-break score.
selection_criteria (dict | None, optional) – Override the default peak-preservation and denoising thresholds used to define scientifically acceptable methods.
save_summary (bool, default True) – When True and the summary is a pandas object, persist an Excel copy in OUTPUT_DIR for later inspection.
- Returns:
Ranked table whose concrete type depends on return_format.
- Return type:
DataFrame or LazyFrame
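The source does not give the formula that rank_method() uses for the tie-break score. As a rough illustration of how eight weights can combine into one dimensionless number, the sketch below takes a weighted mean of metrics assumed to be pre-normalized to [0, 1] with higher meaning better. The function name, the normalization, and the weighted-mean form are all assumptions; only the default weight values come from the signature above.

```python
# Default weights copied from the compare() signature.
DEFAULT_WEIGHTS = {
    "w_match": 3.0,
    "w_mz": 2.0,
    "w_area": 2.0,
    "w_height": 1.5,
    "w_fwhm": 1.0,
    "w_spread": 1.0,
    "w_noise_db": 2.0,
    "w_delta_snr_db": 1.5,
}


def tie_break_score(normalized_metrics, weights=DEFAULT_WEIGHTS):
    """Weighted mean of metrics already scaled to [0, 1] (hypothetical form).

    normalized_metrics maps each weight name to its normalized metric value.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * normalized_metrics[name] for name in weights)
    return weighted_sum / total_weight


perfect = tie_break_score({name: 1.0 for name in DEFAULT_WEIGHTS})
# A method that is perfect only on peak matching.
match_only = tie_break_score(
    {name: (1.0 if name == "w_match" else 0.0) for name in DEFAULT_WEIGHTS}
)
```

Under this assumed form, raising `w_match` relative to the others makes peak matching count for more whenever two methods are otherwise tied.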
- compare_in_windows(windows, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]
Compare denoising methods within pre-defined m/z windows.
Parameters mirror compare(), with additional controls for window segmentation. The return value matches return_format and includes a ranking aggregated across all windows.
- Returns:
Ranked summary consistent with return_format.
- Return type:
DataFrame or LazyFrame
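For a sense of scale on search_ppm, a parts-per-million tolerance converts to an absolute m/z width as mz × ppm / 10^6. The sketch below assumes the tolerance is applied as a symmetric half-width around a reference m/z; whether the library treats search_ppm this way is not stated in the source, and `ppm_window` is a hypothetical helper.

```python
def ppm_window(mz, ppm=20.0):
    """Return (low, high) m/z bounds for a symmetric ppm tolerance.

    Assumes ppm is a half-width around mz; this is an illustration,
    not the library's matching rule.
    """
    half_width = mz * ppm / 1e6
    return mz - half_width, mz + half_width


low, high = ppm_window(500.0, ppm=20.0)
```

At m/z 500 a 20 ppm half-width is 0.01 m/z units, and it scales linearly with m/z, so the same setting is tighter in absolute terms at low mass.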
- plot(summary, annotate=True, top_k=3, save_plot=True, save_pareto=True)[source]
Visualize the Pareto front of SNR gain versus peak-height deviation.
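As background, a method is on the Pareto front when no other method is at least as good on both axes and strictly better on one. The sketch below computes that set for (SNR gain, peak-height deviation) pairs, maximizing the first and minimizing the second. The tuple layout, the function name, and the sample values are hypothetical and independent of the library's own implementation.

```python
def pareto_front(points):
    """Return names of non-dominated points.

    points: list of (name, delta_snr_db, height_deviation) tuples, where a
    higher delta_snr_db and a lower height_deviation are both better.
    """
    front = []
    for name, snr, dev in points:
        dominated = any(
            (other_snr >= snr and other_dev <= dev)
            and (other_snr > snr or other_dev < dev)
            for _, other_snr, other_dev in points
        )
        if not dominated:
            front.append(name)
    return front


# Made-up values for four placeholder methods.
candidates = [
    ("A", 10.0, 0.05),
    ("B", 8.0, 0.02),
    ("C", 7.0, 0.04),  # worse than B on both axes
    ("D", 12.0, 0.10),
]
front = pareto_front(candidates)
```

Here C drops out because B denoises more and distorts less, while A, B, and D each represent a different trade-off between the two goals.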
- Parameters:
summary (DataFrame or LazyFrame) – Ranking output generated by compare() or compare_in_windows().
annotate (bool, default True) – If True, label the top top_k points on the Pareto chart.
top_k (int, default 3) – Number of top-ranked methods to annotate.
save_plot (bool, default True) – Persist the Matplotlib figure via plot_pareto_delta_snr_vs_height().
save_pareto (bool, default True) – Persist the underlying data used to draw the plot.
- Returns:
The axis used for further customization.
- Return type:
- denoise_check(denoise_params, *, sample_name='test', group=None, log_scale_y=False, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)[source]
Preview a single denoising configuration by plotting selected peaks.
- Parameters:
denoise_params (Mapping[str, Any]) – Keyword arguments forwarded directly to noise_filtering().
sample_name (str, default "test") – Label forwarded to PlotPeak for file naming.
group (str | None, optional) – Group identifier used by PlotPeak when saving plots.
log_scale_y (bool, default False) – Apply log1p before plotting, useful for high-dynamic-range spectra.
mz_min, mz_max (float) – m/z bounds for the preview overlay.
show_peaks (bool, default False) – Highlight top peaks using PlotPeak detection settings.
peak_height, peak_prominence, min_peak_width, max_peak_width (float) – Tuning knobs passed to PlotPeak when show_peaks is True.
save_plot (bool, default True) – Persist the rendered preview when requested by PlotPeak.
- Returns:
Axis returned by PlotPeak so callers can layer annotations.
- Return type:
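As an illustration of what log_scale_y does to the y-axis: log1p(x) = ln(1 + x) maps zero to zero and compresses large values, which is why it suits spectra whose intensities span several orders of magnitude. The sketch uses the standard library's math.log1p on made-up intensities; it is not the plotting code itself.

```python
import math

# Made-up intensities spanning five orders of magnitude.
intensities = [0.0, 9.0, 999.0, 99999.0]

# log1p keeps zero at zero (unlike log, which is undefined there)
# and turns each factor of ~100 into a roughly constant step.
log_scaled = [math.log1p(value) for value in intensities]
```

On the transformed axis the smallest non-zero peak and the base peak differ by a factor of about five instead of about ten thousand, so weak peaks stay visible next to strong ones.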
- method_parameters(summary, rank=0, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_selected=True)[source]
Extract the configuration for a ranked denoising method.
- Parameters:
summary (DataFrame | pl.DataFrame) – Ranked output produced by the comparison helpers.
rank (int, default 0) – Zero-based index of the desired method after Pareto filtering.
basis (str, default "constrained_pareto_then_snr") – Strategy forwarded to select_methods() when Pareto filtering is available.
require_pass (bool, default True) – If True, discard rows that failed the minimum denoising constraint.
require_finite_metrics (bool, default True) – Drop methods with NaNs before ranking.
save_selected (bool, default True) – Persist the filtered table to OUTPUT_DIR for reproducibility.
- Returns:
Parameters suitable for passing into noise_filtering().
- Return type:
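The filter-then-index behaviour implied by require_pass, require_finite_metrics, and rank can be sketched on plain dictionaries. The row keys (`passed`, `delta_snr_db`, `height_dev`, `params`) and the ordering by SNR gain are assumptions standing in for the library's summary columns and its `basis` strategy, neither of which the source spells out.

```python
import math


def select_ranked(rows, rank=0, require_pass=True, require_finite_metrics=True,
                  metric_keys=("delta_snr_db", "height_dev")):
    """Filter candidate rows, order them, and return the params at `rank`.

    Hypothetical stand-in: rows are dicts, and ordering is by SNR gain only.
    """
    candidates = list(rows)
    if require_pass:
        candidates = [row for row in candidates if row.get("passed", False)]
    if require_finite_metrics:
        candidates = [
            row for row in candidates
            if all(math.isfinite(row[key]) for key in metric_keys)
        ]
    candidates.sort(key=lambda row: row["delta_snr_db"], reverse=True)
    return candidates[rank]["params"]  # zero-based, raises IndexError if too few


# Made-up summary rows for four placeholder methods.
rows = [
    {"passed": True, "delta_snr_db": 6.0, "height_dev": 0.02, "params": {"name": "a"}},
    {"passed": False, "delta_snr_db": 9.0, "height_dev": 0.01, "params": {"name": "b"}},
    {"passed": True, "delta_snr_db": float("nan"), "height_dev": 0.03, "params": {"name": "c"}},
    {"passed": True, "delta_snr_db": 4.0, "height_dev": 0.01, "params": {"name": "d"}},
]
best = select_ranked(rows)
runner_up = select_ranked(rows, rank=1)
```

Note that `rank` indexes the table after filtering, so rank=0 here is method "a" even though "b" has the larger raw SNR gain.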
- class mioXpektron.denoise.main.BatchDenoising(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]
Bases: object
Run denoising across a batch of spectra with stable outputs.
- __init__(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]
Store batch processing parameters for later execution.
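The source does not define "stable outputs". One plausible reading is that results come back in the same order as the inputs regardless of which worker finishes first. The sketch below shows that property with the standard library's ThreadPoolExecutor.map, using a simple moving average as a stand-in denoiser. Neither `denoise_batch` nor `moving_average` is part of the library, and the wavelet method is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def moving_average(values, width=3):
    """Stand-in denoiser: centered moving average with shrinking edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def denoise_batch(spectra, n_workers=2):
    """Denoise each spectrum in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in submission order, not completion order.
        return list(pool.map(moving_average, spectra))


results = denoise_batch([[1.0, 4.0, 1.0], [2.0, 2.0, 2.0]])
```

Because map preserves submission order, `results[i]` always corresponds to the i-th input spectrum, which makes repeated batch runs comparable.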