mioXpektron.baseline.flat_window_suggester

flat_window_suggester_polars.py

Small application to discover common “flat” m/z windows across a set of ToF‑SIMS spectra. Inputs are 3‑column tables with the columns Channel, m/z, and intensity; column names are matched case‑insensitively.

Key changes vs. the original:

  • Replaced all pandas operations with Polars (Rust/Arrow backend).

  • Added support for providing an explicit list of file paths or glob patterns, so data can be spread across many folders (no need for a single root dir).

  • Kept the numerical core (NumPy/SciPy) for smoothing & derivatives.

What it does (unchanged conceptually)

  1. Per spectrum:

    • Smooth intensities (Savitzky–Golay) and compute 1st/2nd derivatives.

    • Flag baseline‑candidate points where simultaneously:

      y_raw <= q_y quantile AND |dy/dx| <= q_g quantile AND |d²y/dx²| <= q_c quantile

    • Merge contiguous candidate points into segments; keep segments that satisfy minimum width & minimum number of points.

  2. Across all spectra:

    • Discretize the global m/z range into bins (width = bin_width).

    • For each file, mark bins covered by any of its segments.

    • Compute the coverage fraction per bin (#files covering bin / #files total).

    • Extract contiguous regions whose coverage ≥ coverage_threshold.

    • Rank regions by mean coverage (then by width) and return top_k windows.
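The per-spectrum step (step 1) can be sketched in plain Python as follows. This is a minimal illustration, not the module's implementation: it swaps the Savitzky–Golay smoothing for unsmoothed central differences so that it needs no third-party packages, and the names `quantile`, `gradient`, and `flat_segments_sketch` are hypothetical. The default thresholds mirror the `FlatParams` defaults listed further down.

```python
from typing import List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a non-empty sequence (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def gradient(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Central differences in the interior, one-sided at both ends."""
    n = len(x)
    out = []
    for i in range(n):
        a, b = max(i - 1, 0), min(i + 1, n - 1)
        out.append((y[b] - y[a]) / (x[b] - x[a]))
    return out


def flat_segments_sketch(
    x: Sequence[float],
    y: Sequence[float],
    q_y: float = 0.2,
    q_g: float = 0.4,
    q_c: float = 0.4,
    min_width: float = 0.2,
    min_points: int = 20,
) -> List[Tuple[float, float, int]]:
    """Return (lo, hi, n_points) runs that are low, flat, and uncurved."""
    dy = gradient(x, y)
    d1 = [abs(v) for v in dy]
    d2 = [abs(v) for v in gradient(x, dy)]
    ty, tg, tc = quantile(y, q_y), quantile(d1, q_g), quantile(d2, q_c)
    # A point is a candidate only if all three conditions hold at once.
    mask = [a <= ty and b <= tg and c <= tc for a, b, c in zip(y, d1, d2)]

    segments = []
    start = None
    for i, ok in enumerate(mask + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            lo, hi, n = x[start], x[i - 1], i - start
            if hi - lo >= min_width and n >= min_points:
                segments.append((lo, hi, n))
            start = None
    return segments


# Synthetic spectrum (made-up data): zero baseline with one triangular peak.
x = [i * 0.01 for i in range(300)]
y = [float(max(0, 100 - 2 * abs(i - 150))) for i in range(300)]
segments = flat_segments_sketch(x, y)
```

On this synthetic input the sketch reports one flat segment on each side of the peak. With real spectra the smoothing step matters, because noise in the raw derivatives would otherwise fragment the candidate runs.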

Outputs

  • out_dir / per_file_segments.csv (Polars CSV)

  • out_dir / flat_windows_suggestions.csv (Polars CSV with coverage stats)

  • out_dir / flat_windows.json (list[[lo, hi], …])

  • out_dir / coverage_curve.(png|pdf) (plot of coverage vs m/z)
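Because `flat_windows.json` is described as a plain list of `[lo, hi]` pairs, it can be consumed with the standard library alone. The sketch below parses an inline string rather than a real file; the numeric values are made up for illustration, and `in_any_window` is a hypothetical helper, not part of the module.

```python
import json
from typing import List, Tuple

# Stand-in for the contents of out_dir / flat_windows.json (made-up values).
example_text = "[[10.2, 11.0], [45.5, 47.1]]"

windows: List[Tuple[float, float]] = [
    (float(lo), float(hi)) for lo, hi in json.loads(example_text)
]


def in_any_window(mz: float, windows: List[Tuple[float, float]]) -> bool:
    """True if mz falls inside at least one [lo, hi] window (inclusive)."""
    return any(lo <= mz <= hi for lo, hi in windows)
```

A mask built this way could then select the baseline points of a spectrum for a later fitting step.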

Functions

aggregate_common_windows(segments_by_file, ...)

Merge per-file segments into common windows via m/z bin coverage.

find_flat_segments(x, y, p)

Return list of (lo, hi, n_points) flat segments for one spectrum.

read_spectrum_table(path)

Robust reader that returns a Polars DataFrame with standardized columns.

Classes

AggregateParams([bin_width, ...])

FlatParams([y_quantile, grad_quantile, ...])

ScanForFlatRegion(files, out_dir, n_jobs, ...)

mioXpektron.baseline.flat_window_suggester.read_spectrum_table(path)[source]

Robust reader that returns a Polars DataFrame with standardized columns. Tries comma‑, then tab‑, then whitespace‑delimited parsing, and accepts ‘#’ comment lines.

Parameters:

path (str | Path)

Return type:

DataFrame
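The delimiter fallback described above can be illustrated with a small standard-library parser. This is only a sketch of the idea: it works on text rather than a path, returns plain tuples instead of a Polars DataFrame, and assumes a header row is present. The function name `parse_spectrum_text` and the exact validation rules are assumptions, not the module's behaviour.

```python
from typing import List, Tuple

Row = Tuple[int, float, float]  # (channel, m/z, intensity)


def parse_spectrum_text(text: str) -> List[Row]:
    """Parse a 3-column table, trying comma, tab, then whitespace."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Drop blank lines and '#' comment lines before guessing the delimiter.
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    for delim in (",", "\t", None):  # None splits on any run of whitespace
        rows = [[cell.strip() for cell in ln.split(delim)] for ln in lines]
        if rows and all(len(r) == 3 for r in rows):
            break
    else:
        raise ValueError("no delimiter yields a consistent 3-column table")

    # Header names are compared case-insensitively.
    header = [h.lower() for h in rows[0]]
    if header != ["channel", "m/z", "intensity"]:
        raise ValueError(f"unexpected header: {rows[0]}")
    return [(int(float(c)), float(mz), float(inten)) for c, mz, inten in rows[1:]]


comma_text = "# exported table\nChannel,m/z,Intensity\n1,1.0078,12\n2,1.0080,15\n"
tab_text = "channel\tM/Z\tINTENSITY\n1\t1.0078\t12\n2\t1.0080\t15\n"
space_text = "Channel   m/z   intensity\n1  1.0078  12\n2 1.0080 15\n"
parsed = [parse_spectrum_text(t) for t in (comma_text, tab_text, space_text)]
```

All three inputs yield the same rows, which is the point of trying the delimiters in a fixed order.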

class mioXpektron.baseline.flat_window_suggester.FlatParams(y_quantile: float = 0.2, grad_quantile: float = 0.4, curv_quantile: float = 0.4, savgol_window: int = 11, savgol_poly: int = 2, min_width: float = 0.2, min_points: int = 20)[source]

Bases: object

Parameters:
y_quantile: float = 0.2
grad_quantile: float = 0.4
curv_quantile: float = 0.4
savgol_window: int = 11
savgol_poly: int = 2
min_width: float = 0.2
min_points: int = 20
mioXpektron.baseline.flat_window_suggester.find_flat_segments(x, y, p)[source]

Return list of (lo, hi, n_points) flat segments for one spectrum.

Parameters:
Return type:

List[Tuple[float, float, int]]

class mioXpektron.baseline.flat_window_suggester.AggregateParams(bin_width: float = 0.1, coverage_threshold: float = 0.5, top_k: int = 6)[source]

Bases: object

Parameters:
bin_width: float = 0.1
coverage_threshold: float = 0.5
top_k: int = 6
mioXpektron.baseline.flat_window_suggester.aggregate_common_windows(segments_by_file, x_minmax, agg)[source]

Merge per-file segments into common windows via m/z bin coverage. Returns a tuple (windows, coverage_table), where coverage_table is a Polars DataFrame.

Parameters:
Return type:

Tuple[List[Tuple[float, float]], DataFrame]
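The bin-coverage idea behind this function can be sketched without Polars. The version below is a simplified stand-in under stated assumptions: `common_windows_sketch` is a hypothetical name, it takes `x_min` and `x_max` separately instead of `x_minmax`, and it returns the coverage curve as a plain list rather than a DataFrame. The keyword defaults mirror `AggregateParams`.

```python
import math
from typing import Dict, List, Tuple

Segment = Tuple[float, float, int]  # (lo, hi, n_points)


def common_windows_sketch(
    segments_by_file: Dict[str, List[Segment]],
    x_min: float,
    x_max: float,
    bin_width: float = 0.1,
    coverage_threshold: float = 0.5,
    top_k: int = 6,
) -> Tuple[List[Tuple[float, float]], List[float]]:
    n_bins = max(1, math.ceil((x_max - x_min) / bin_width))
    counts = [0] * n_bins
    for segs in segments_by_file.values():
        covered = set()  # each file counts at most once per bin
        for lo, hi, _n in segs:
            first = max(0, math.floor((lo - x_min) / bin_width))
            last = min(n_bins - 1, math.floor((hi - x_min) / bin_width))
            covered.update(range(first, last + 1))
        for b in covered:
            counts[b] += 1
    n_files = max(1, len(segments_by_file))
    coverage = [c / n_files for c in counts]

    # Collect contiguous runs of bins at or above the threshold.
    regions = []
    start = None
    for b in range(n_bins + 1):
        ok = b < n_bins and coverage[b] >= coverage_threshold
        if ok and start is None:
            start = b
        elif not ok and start is not None:
            mean_cov = sum(coverage[start:b]) / (b - start)
            lo = x_min + start * bin_width
            hi = x_min + b * bin_width
            regions.append((mean_cov, hi - lo, lo, hi))
            start = None

    # Rank by mean coverage, then by width, both descending.
    regions.sort(key=lambda r: (-r[0], -r[1]))
    return [(lo, hi) for _, _, lo, hi in regions[:top_k]], coverage


# Made-up segments for three files on an m/z range of 0 to 10.
demo = {
    "a.txt": [(1.0, 3.5, 40), (7.0, 8.5, 30)],
    "b.txt": [(2.0, 4.5, 50)],
    "c.txt": [(2.2, 3.2, 25), (7.1, 7.9, 22)],
}
windows, coverage = common_windows_sketch(demo, 0.0, 10.0, bin_width=1.0)
```

In this demo, bins 2 and 3 are covered by all three files and bin 7 by two of them, so the first returned window spans 2.0 to 4.0 and the second 7.0 to 8.0. Note that the window edges are snapped to bin boundaries, so a smaller bin_width gives tighter windows at the cost of a noisier coverage curve.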

class mioXpektron.baseline.flat_window_suggester.ScanForFlatRegion(files: List[Union[str, Path]] = <factory>, out_dir: Union[str, Path] = 'flat_windows_out', n_jobs: int = -1, flat_params: FlatParams = <factory>, agg_params: AggregateParams = <factory>, auto_tune: bool = False)[source]

Bases: object

Parameters:
files: List[str | Path]
out_dir: str | Path = 'flat_windows_out'
n_jobs: int = -1
flat_params: FlatParams
agg_params: AggregateParams
auto_tune: bool = False
run()[source]