mioXpektron 

mioXpektron.evaluate_all_models(models, data_dict, *, dataset_name='dataset')[source]

Benchmark a mapping of classifiers and return a sorted results table.

Parameters:

models (Mapping[str, Any])
data_dict (Mapping[str, Any])
dataset_name (str)

Return type:

mioXpektron.get_benchmark_models(*, random_state=42, include_boosting=True)[source]

Return a compact set of classifiers suitable for m/z matrices.

Parameters:

random_state (int)
include_boosting (bool)

Return type:

mioXpektron.infer_feature_columns(df, *, meta_cols=('SampleName', 'Group'))[source]

Return non-metadata columns, attempting to detect m/z-like headers.

Parameters:

df (DataFrame)
meta_cols (Sequence[str])

Return type:

List[str]
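
The detection step can be pictured with a small standard-library sketch. The helper name `guess_feature_columns` and the rule "a header is m/z-like when it parses as a positive float" are assumptions made for illustration; the library's actual heuristic is not documented here.

```python
from typing import Iterable, List, Sequence


def guess_feature_columns(
    columns: Iterable[str],
    meta_cols: Sequence[str] = ("SampleName", "Group"),
) -> List[str]:
    """Return non-metadata column names, preferring m/z-like headers.

    Hypothetical helper: a header counts as m/z-like when it parses as a
    positive float (e.g. "58.07"). If no header qualifies, every
    non-metadata column is returned instead.
    """
    candidates = [c for c in columns if c not in meta_cols]

    def looks_like_mz(name: str) -> bool:
        try:
            return float(name) > 0
        except ValueError:
            return False

    mz_like = [c for c in candidates if looks_like_mz(c)]
    return mz_like if mz_like else candidates


cols = ["SampleName", "Group", "12.0", "58.07", "notes"]
print(guess_feature_columns(cols))  # ['12.0', '58.07']
```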

mioXpektron.plot_heatmap_top_features(X, y, res, savepath, *, top_n=25, label_col='Group')[source]

Heatmap of top differential features (z-scored), samples ordered by group.

Parameters:

X (DataFrame)
y (Series)
res (DataFrame)
savepath (str)
top_n (int)
label_col (str)

Return type:

None
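
As a rough illustration of the two preprocessing ideas named in the description (z-scoring each feature and ordering samples by group), here is a standard-library sketch. The function name `zscore` and the use of the population standard deviation are assumptions; the library may standardize differently.

```python
from statistics import mean, pstdev
from typing import List


def zscore(values: List[float]) -> List[float]:
    """Standardize one feature column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed choice)
    if sigma == 0:
        # A constant feature carries no contrast; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Order sample positions so that members of the same group sit together.
labels = ["B", "A", "B", "A"]
order = sorted(range(len(labels)), key=lambda i: labels[i])
print(order)  # [1, 3, 0, 2]
print(zscore([1.0, 2.0, 3.0]))
```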

mioXpektron.plot_pca(X_scaled, y, savepath, *, random_state=0)[source]

PCA scatter plot coloured by group labels.

Parameters:

X_scaled (ndarray)
y (Series)
savepath (str)
random_state (int)

Return type:

Tuple[ndarray, ndarray]

mioXpektron.plot_tsne(X_scaled, y, savepath, *, perplexity=30.0, random_state=0)[source]

t-SNE scatter plot coloured by group labels.

Parameters:

X_scaled (ndarray)
y (Series)
savepath (str)
perplexity (float)
random_state (int)

Return type:

ndarray

mioXpektron.plot_umap(X_scaled, y, savepath, *, n_neighbors=15, min_dist=0.1, random_state=0)[source]

UMAP embedding plot when umap-learn is installed.

Parameters:

X_scaled (ndarray)
y (Series)
savepath (str)
n_neighbors (int)
min_dist (float)
random_state (int)

Return type:

ndarray | None

mioXpektron.run_embeddings(X_scaled, y, outdir, *, methods=None, run_umap=False, run_tsne=False, random_state=0, umap_n_neighbors=15, umap_min_dist=0.1, tsne_perplexity=30.0)[source]

Compute and save requested embeddings; return coordinate arrays.

Parameters:

X_scaled (ndarray)
y (Series)
outdir (str)
methods (Sequence[str] | None)
run_umap (bool)
run_tsne (bool)
random_state (int)
umap_n_neighbors (int)
umap_min_dist (float)
tsne_perplexity (float)

Return type:

Dict[str, ndarray]

mioXpektron.analysis_capabilities()[source]

Report which extended analysis features are available.

Return type:

Dict[str, bool]
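
A capability report of this shape is commonly built by probing for optional packages without importing them. The sketch below shows that general pattern only; the helper name `detect_capabilities` and the feature-to-package mapping are hypothetical, and the keys that mioXpektron actually reports are not listed in this reference.

```python
from importlib.util import find_spec
from typing import Dict, Mapping


def detect_capabilities(packages: Mapping[str, str]) -> Dict[str, bool]:
    """Map each feature name to whether its backing package is importable.

    `packages` maps a feature label to a top-level import name; both are
    chosen by the caller (illustrative, not the library's own table).
    """
    return {
        feature: find_spec(import_name) is not None
        for feature, import_name in packages.items()
    }


# umap-learn is imported as `umap`; the result depends on the environment.
print(detect_capabilities({"umap": "umap"}))
```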

mioXpektron.plot_volcano(res, savepath, *, group_a=None, group_b=None, q_thresh=0.05, fc_thresh=1.0)[source]

Volcano plot of log2 fold-change versus -log10(p-value).

Parameters:

res (DataFrame)
savepath (str)
group_a (str | None)
group_b (str | None)
q_thresh (float)
fc_thresh (float)

Return type:

None
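
The two axes of a volcano plot can be computed per feature as in the sketch below. It assumes that the fold-change is taken as group A over group B, that fc_thresh applies to the absolute log2 fold-change, and that q_thresh applies to an adjusted p-value; the reference does not state these conventions, and `volcano_point` is a hypothetical helper.

```python
import math
from typing import Tuple


def volcano_point(
    mean_a: float,
    mean_b: float,
    p_value: float,
    q_value: float,
    q_thresh: float = 0.05,
    fc_thresh: float = 1.0,
) -> Tuple[float, float, bool]:
    """Return (log2 fold-change, -log10 p-value, passes both thresholds)."""
    log2_fc = math.log2(mean_a / mean_b)  # x axis (assumed direction: A / B)
    neg_log10_p = -math.log10(p_value)    # y axis
    significant = q_value < q_thresh and abs(log2_fc) >= fc_thresh
    return log2_fc, neg_log10_p, significant


# A four-fold increase with a small adjusted p-value passes both thresholds.
print(volcano_point(8.0, 2.0, p_value=0.001, q_value=0.01))
```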

mioXpektron.prepare_matrix(df, *, label_col='Group', sample_col='SampleName', meta_cols=None, feature_cols=None, coerce_numeric=True, fill_na=0.0)[source]

Build a sample-by-feature matrix and group labels from pipeline output.

Accepts either:

A long table with SampleName / Group columns and m/z feature columns (typical exported CSV), or
An aligned matrix from align_peaks() where SampleName and optionally Group are index levels.

Parameters:

df (DataFrame) – Input table or aligned feature matrix.
label_col (str) – Column or index level containing group labels.
sample_col (str) – Column or index level containing sample identifiers.
meta_cols (Sequence[str] | None) – Additional metadata columns to exclude from features. Defaults to SampleName and Group only.
feature_cols (Sequence[str] | None) – Explicit feature column names. When omitted, all non-metadata columns are used.
coerce_numeric (bool) – If True, coerce feature columns to numeric (invalid values become NaN).
fill_na (float) – Value used to fill missing feature values after coercion.

Returns:

X – Feature matrix (samples x m/z), index aligned with meta.
y – Group labels indexed like X.
meta – Metadata frame with at least sample_col and label_col.

Return type:

Tuple[DataFrame, Series, DataFrame]

mioXpektron.prepare_ml_data(data, *, label_col='Group', sample_col='SampleName', test_size=0.2, random_state=42, transform='log1p', scale_features=True, handle_missing='zero')[source]

Prepare an aligned matrix for supervised classification benchmarks.

Parameters:

data (DataFrame | Tuple[DataFrame, Series]) – Either a table with metadata columns or a (X, y) tuple from prepare_matrix().
label_col (str)
sample_col (str)
test_size (float)
random_state (int)
transform (str)
scale_features (bool)
handle_missing (str)

Return type:

mioXpektron.run_analysis(data, *, config=None, **kwargs)[source]

Convenience wrapper around AnalysisWorkflow.

Parameters:

data (DataFrame | str)
config (AnalysisConfig | None)
kwargs (Any)

Return type:

mioXpektron.run_cnmf(X_pos, k_list, *, R=30, max_iter=1000, beta='frobenius', random_seeds=None, outdir=None)[source]

Run consensus NMF across multiple rank values.

Parameters:

X_pos (ndarray)
k_list (List[int])
R (int)
max_iter (int)
beta (str)
random_seeds (List[int] | None)
outdir (str | None)

Return type:

Dict[int, Dict[str, object]]

mioXpektron.calculate_multiclass_metrics(y_true, y_pred, *, y_proba=None, class_names=None, data_dict=None)[source]

Compute overall and per-class classification metrics.

Parameters:

y_true (ndarray)
y_pred (ndarray)
y_proba (ndarray | None)
class_names (Sequence[str] | None)
data_dict (Mapping[str, Any] | None)

Return type:

mioXpektron.compare_model_results(results_a, results_b, *, dataset_a='dataset_a', dataset_b='dataset_b')[source]

Merge two benchmark tables and compute accuracy deltas.

Parameters:

results_a (DataFrame)
results_b (DataFrame)
dataset_a (str)
dataset_b (str)

Return type:

mioXpektron.plot_confusion_matrix(y_true, y_pred, savepath, *, class_names=None, data_dict=None, normalize=False, title='Confusion matrix')[source]

Plot and save a confusion matrix heatmap.

Parameters:

y_true (ndarray)
y_pred (ndarray)
savepath (str)
class_names (Sequence[str] | None)
data_dict (Mapping[str, Any] | None)
normalize (bool)
title (str)

Return type:

ndarray

mioXpektron.run_multi_dataset_comparison(datasets, *, outdir='comparison_outputs', config=None, run_ml_benchmark=True)[source]

Run analysis workflows on multiple datasets and compare ML benchmarks.

Parameters:

datasets (Mapping[str, DataFrame | str])
outdir (str)
config (Any | None)
run_ml_benchmark (bool)

Return type:

mioXpektron.tune_top_models(data_dict, results_df, *, top_n=3, cv_folds=5, random_state=42, verbose=0)[source]

Grid-search hyperparameters for the top-performing models.

Parameters:

data_dict (Mapping[str, Any])
results_df (DataFrame)
top_n (int)
cv_folds (int)
random_state (int)
verbose (int)

Return type:

class mioXpektron.AggregateParams(bin_width: 'float' = 0.1, coverage_threshold: 'float' = 0.5, top_k: 'int' = 6)[source]

Bases: object

Parameters:

bin_width (float)
coverage_threshold (float)
top_k (int)

bin_width: float = 0.1

coverage_threshold: float = 0.5

top_k: int = 6

class mioXpektron.BaselineBatchCorrector(in_dir: 'Union[str, Path]', pattern: 'str' = '*.csv', recursive: 'bool' = False, method: 'str' = 'airpls', method_kwargs: 'Dict' = <factory>, clip_negative: 'bool' = True, per_file_best: 'bool' = False, best_method_map: 'Optional[Dict[str, str]]'=None, n_jobs: 'int' = -1, save_plots: 'bool' = False)[source]

Bases: object

Parameters:

in_dir (str | Path)
pattern (str)
recursive (bool)
method (str)
method_kwargs (Dict)
clip_negative (bool)
per_file_best (bool)
best_method_map (Dict[str, str] | None)
n_jobs (int)
save_plots (bool)

in_dir: str | Path

pattern: str = '*.csv'

recursive: bool = False

method: str = 'airpls'

method_kwargs: Dict

clip_negative: bool = True

per_file_best: bool = False

best_method_map: Dict[str, str] | None = None

n_jobs: int = -1

save_plots: bool = False

run(out_root=None)[source]

Parameters:

out_root (str | Path | None)

Return type:

Path

class mioXpektron.BaselineMethodEvaluator(files=<factory>, methods=None, param_grid=None, use_small_param_preset=False, auto_scale_window_size=True, eval_clip_negative=False, topk_for_snr=5, raw_noise_quantile=0.2, flat_windows=None, metrics_for_composite=('rfzn', 'nar', 'snr', 'bbi', 'br', 'nbc'), n_jobs=-1)[source]

Bases: object

Evaluate baseline algorithms on ToF‑SIMS files supplied as paths or globs.

Parameters:

files (List[str | Path])
methods (List[str] | None)
param_grid (Dict[str, List[Dict]] | None)
use_small_param_preset (bool)
auto_scale_window_size (bool)
eval_clip_negative (bool)
topk_for_snr (int)
raw_noise_quantile (float)
flat_windows (List[Tuple[float, float]] | None)
metrics_for_composite (Tuple[str, ...])
n_jobs (int)

files: List[str | Path]

methods: List[str] | None = None

param_grid: Dict[str, List[Dict]] | None = None

use_small_param_preset: bool = False

auto_scale_window_size: bool = True

eval_clip_negative: bool = False

topk_for_snr: int = 5

raw_noise_quantile: float = 0.2

flat_windows: List[Tuple[float, float]] | None = None

metrics_for_composite: Tuple[str, ...] = ('rfzn', 'nar', 'snr', 'bbi', 'br', 'nbc')

n_jobs: int = -1

labels: List[str]

specs: List[Tuple[str, Dict]]

evaluate(noise_quantile=None, n_jobs=None)[source]

Parameters:

noise_quantile (float | None)
n_jobs (int | None)

warning_log()[source]

Return type:

DataFrame

plot(out_dir='baseline_selection_output')[source]

Parameters:

out_dir (str | Path)

Return type:

List[Path]

preview_overlay(file, methods=None, max_methods=5, save_to='baseline_selection_output', show_errors=True)[source]

Plot raw, baseline and corrected overlays for a few methods on a single file.

Parameters:

file (str or Path) – Path to a single spectrum file (not a list!)
methods (list of str, optional) – Method names to plot. If None, uses top methods from evaluation.
max_methods (int) – Maximum number of methods to plot (default: 5)
save_to (str or Path, optional) – Directory to save plots. Set to None to skip saving.
show_errors (bool) – If True (default), print errors when methods fail instead of silently ignoring them.

class mioXpektron.BatchDenoising(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]

Bases: object

Run denoising across a batch of spectra with stable outputs.

__init__(file_paths, *, method='wavelet', n_workers=None, backend='threads', progress=True, params=None)[source]

Store batch processing parameters for later execution.

run(output_root=None, folder_name='denoised_spectrums', save_result=True)[source]

Execute the batch denoising run.

Parameters:

output_root (str | Path | None) – Directory where the result folder will be created. If omitted, defaults to OUTPUT_DIR.
folder_name (str, default "denoised_spectrums") – Name for the result folder.
save_result (bool, default True) – Persist the executor results dataframe to OUTPUT_DIR.

Returns:

Records describing each processed file.

Return type:

list[BatchResult]

class mioXpektron.DenoisingMethods(mz_values, raw_intensities)[source]

Bases: object

Evaluate and visualize denoising strategies for mass spectrometry data.

Parameters:

mz_values (np.ndarray | pl.Series) – The m/z axis of the spectrum.
raw_intensities (np.ndarray | pl.Series) – Raw intensity values aligned with mz_values.

__init__(mz_values, raw_intensities)[source]

Store the raw spectrum that downstream helpers will operate on.

classmethod compare_across_files(file_paths, *, windows=None, min_mz=None, max_mz=None, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, file_n_jobs=0, file_parallel_backend='thread', method_n_jobs=None, method_parallel_backend='thread', progress=True, save_summary=True)[source]

Rank denoising methods across a cohort of spectra files.

Each file contributes one per-method summary, and the final cohort ranking aggregates those summaries with equal file weighting. This is a stronger basis for selecting a default denoiser than evaluating a single arbitrary spectrum.

Parallelism

This method supports two levels of parallelism:

File-level, via file_n_jobs / file_parallel_backend.
Method-level inside each file, via method_n_jobs / method_parallel_backend.

When file_n_jobs=0 (default), worker counts are chosen automatically to avoid nested oversubscription.

Returns:

(ranked_summary, sample_summary_all, detail_all), where sample_summary_all contains one aggregated row per file/method pair and detail_all contains all per-peak rows.

Return type:

tuple
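
Equal file weighting means that a spectrum with many matched peaks does not dominate the cohort ranking. The sketch below illustrates the idea with a two-stage average: per-peak rows are first collapsed to one value per file/method pair, and those per-file values are then averaged per method. The helper name, the sample rows, and the use of the mean at both stages are assumptions for illustration; the library may aggregate with other statistics.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def equal_file_weighting(detail: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Aggregate (file, method, per-peak value) rows with one vote per file."""
    # Stage 1: one summary value per (file, method) pair.
    per_pair = defaultdict(list)
    for file, method, value in detail:
        per_pair[(file, method)].append(value)

    # Stage 2: average the per-file summaries for each method.
    per_method = defaultdict(list)
    for (_file, method), values in per_pair.items():
        per_method[method].append(mean(values))
    return {method: mean(file_means) for method, file_means in per_method.items()}


detail = [
    ("a.txt", "wavelet", 1.0),
    ("a.txt", "wavelet", 1.0),
    ("a.txt", "wavelet", 1.0),
    ("b.txt", "wavelet", 3.0),
]
# Pooling all four peaks would give 1.5; weighting files equally gives 2.0.
print(equal_file_weighting(detail))  # {'wavelet': 2.0}
```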

compare(min_mz, max_mz, return_format='pandas', match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, include_derivatives=False, w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]

Compare denoising methods across the full spectrum window.

Parameters:

min_mz (float) – Lower m/z bound of the evaluation window.
max_mz (float) – Upper m/z bound of the evaluation window.
return_format ({"pandas", "polars"}, default "pandas") – Determines the summary dataframe type returned by the lower-level evaluators.
w_match (float, default 3.0) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
w_mz (float, default 2.0) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
w_area (float, default 2.0) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
w_height (float, default 1.5) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
w_fwhm (float, default 1.0) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
w_spread (float, default 1.0) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
w_noise_db (float, default 2.0) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
w_delta_snr_db (float, default 1.5) – Weight applied by rank_method() when building the secondary dimensionless tie-break score.
selection_criteria (dict | None, optional) – Override the default peak-preservation and denoising thresholds used to define scientifically acceptable methods.
save_summary (bool, default True) – When True and the summary is a pandas object, persist an Excel copy in OUTPUT_DIR for later inspection.

Returns:

Ranked table whose concrete type depends on return_format.

Return type:

DataFrame or LazyFrame

compare_in_windows(windows, per_window_max_peaks=50, min_prominence=None, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=True, include_derivatives=False, return_format='pandas', w_match=3.0, w_mz=2.0, w_area=2.0, w_height=1.5, w_fwhm=1.0, w_spread=1.0, w_noise_db=2.0, w_delta_snr_db=1.5, selection_criteria=None, save_summary=True)[source]

Compare denoising methods within pre-defined m/z windows.

Parameters mirror compare() with additional controls for window segmentation. The return value matches return_format and includes a ranking aggregated across all windows.

Returns:

Ranked summary consistent with return_format.

Return type:

DataFrame or LazyFrame

plot(summary, annotate=True, top_k=3, save_plot=True, save_pareto=True)[source]

Visualize the Pareto front of SNR gain versus peak-height deviation.

Parameters:

summary (DataFrame or LazyFrame) – Ranking output generated by compare() or compare_in_windows().
annotate (bool, default True) – If True, label the top top_k points on the Pareto chart.
top_k (int, default 3) – Number of top-ranked methods to annotate.
save_plot (bool, default True) – Persist the Matplotlib figure via plot_pareto_delta_snr_vs_height().
save_pareto (bool, default True) – Persist the underlying data used to draw the plot.

Returns:

The axis used for further customization.
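
For readers unfamiliar with the term, a Pareto front over these two metrics keeps every method that no other method beats on both SNR gain (higher is better) and peak-height deviation (lower is better). The following sketch shows that selection in plain Python; the helper name and the numeric values are made up for illustration and do not come from the library.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[str, float, float]]) -> List[str]:
    """Return names of non-dominated (method, snr_gain, height_deviation) rows.

    A row is dominated when another row is at least as good on both
    metrics and strictly better on at least one.
    """
    front = []
    for name, gain, dev in points:
        dominated = any(
            (g >= gain and d <= dev) and (g > gain or d < dev)
            for _other, g, d in points
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative values only.
methods = [
    ("wavelet", 6.0, 0.05),
    ("savgol", 4.0, 0.02),
    ("gaussian", 3.0, 0.06),
]
print(pareto_front(methods))  # ['wavelet', 'savgol']
```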

Return type:

denoise_check(denoise_params, *, sample_name='test', group=None, log_scale_y=False, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)[source]

Preview a single denoising configuration by plotting selected peaks.

Parameters:

denoise_params (Mapping[str, Any]) – Keyword arguments forwarded directly to noise_filtering().
sample_name (str, default "test") – Label forwarded to PlotPeak for file naming.
group (str | None, optional) – Group identifier used by PlotPeak when saving plots.
log_scale_y (bool, default False) – Apply log1p before plotting, useful for high-dynamic-range spectra.
mz_min (float) – Lower m/z bound for the preview overlay.
mz_max (float) – Upper m/z bound for the preview overlay.
show_peaks (bool, default False) – Highlight top peaks using PlotPeak detection settings.
peak_height (float) – Tuning knobs passed to PlotPeak when show_peaks is True.
peak_prominence (float) – Tuning knobs passed to PlotPeak when show_peaks is True.
min_peak_width (float) – Tuning knobs passed to PlotPeak when show_peaks is True.
max_peak_width (float) – Tuning knobs passed to PlotPeak when show_peaks is True.
save_plot (bool, default True) – Persist the rendered preview when requested by PlotPeak.

Returns:

Axis returned by PlotPeak so callers can layer annotations.

Return type:

method_parameters(summary, rank=0, basis='constrained_pareto_then_snr', require_pass=True, require_finite_metrics=True, save_selected=True)[source]

Extract the configuration for a ranked denoising method.

Parameters:

summary (DataFrame | pl.DataFrame) – Ranked output produced by the comparison helpers.
rank (int, default 0) – Zero-based index of the desired method after Pareto filtering.
basis (str, default "constrained_pareto_then_snr") – Strategy forwarded to select_methods() when Pareto filtering is available.
require_pass (bool, default True) – If True, discard rows that failed the minimum denoising constraint.
require_finite_metrics (bool, default True) – Drop methods with NaNs before ranking.
save_selected (bool, default True) – Persist the filtered table to OUTPUT_DIR for reproducibility.

Returns:

Parameters suitable for passing into noise_filtering().

Return type:

dict

class mioXpektron.FlatParams(y_quantile: 'float' = 0.2, grad_quantile: 'float' = 0.4, curv_quantile: 'float' = 0.4, savgol_window: 'int' = 11, savgol_poly: 'int' = 2, min_width: 'float' = 0.2, min_points: 'int' = 20)[source]

Bases: object

Parameters:

y_quantile (float)
grad_quantile (float)
curv_quantile (float)
savgol_window (int)
savgol_poly (int)
min_width (float)
min_points (int)

y_quantile: float = 0.2

grad_quantile: float = 0.4

curv_quantile: float = 0.4

savgol_window: int = 11

savgol_poly: int = 2

min_width: float = 0.2

min_points: int = 20

class mioXpektron.ScanForFlatRegion(files: 'List[Union[str, Path]]'=<factory>, out_dir: 'Union[str, Path]'='flat_windows_out', n_jobs: 'int' = -1, flat_params: 'FlatParams' = <factory>, agg_params: 'AggregateParams' = <factory>, auto_tune: 'bool' = False)[source]

Bases: object

Parameters:

files (List[str | Path])
out_dir (str | Path)
n_jobs (int)
flat_params (FlatParams)
agg_params (AggregateParams)
auto_tune (bool)

files: List[str | Path]

out_dir: str | Path = 'flat_windows_out'

n_jobs: int = -1

flat_params: FlatParams

agg_params: AggregateParams

auto_tune: bool = False

run()[source]

mioXpektron.align_peaks(peaks_df, mz_tolerance=0.2, mz_rounding_precision=1, output='intensity')[source]

Cluster peaks by m/z and return an aligned feature matrix.

Uses a greedy sorted-bin algorithm that guarantees every aligned bin spans at most mz_tolerance in m/z.
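
A greedy sorted-bin pass of the kind described can be sketched as follows: sort the m/z values, open a bin at the first value, and keep adding values while they stay within mz_tolerance of the value that opened the bin. This is an illustrative reconstruction under that assumption (the helper `greedy_mz_bins` is hypothetical), not the library's code, but it shows why no bin can span more than mz_tolerance.

```python
from typing import List


def greedy_mz_bins(mz_values: List[float], mz_tolerance: float = 0.2) -> List[List[float]]:
    """Group sorted m/z values so each bin spans at most mz_tolerance."""
    bins: List[List[float]] = []
    for mz in sorted(mz_values):
        # Compare against the first (smallest) member of the open bin, so
        # the full width of the bin never exceeds the tolerance.
        if bins and mz - bins[-1][0] <= mz_tolerance:
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return bins


print(greedy_mz_bins([100.05, 100.0, 100.3, 100.15, 250.0]))
# [[100.0, 100.05, 100.15], [100.3], [250.0]]
```

Comparing each value against the start of the bin, rather than against its nearest neighbour, is what prevents chains of closely spaced peaks from drifting into one overly wide bin.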

mioXpektron.baseline_correction(intensities, method='airpls', window_size=101, poly_order=4, clip_negative=True, return_baseline=False, **kwargs)[source]

Baseline-correct a 1‑D spectrum with pybaselines or custom filters.

Parameters:

intensities (array-like) – Raw y values.
method (str) – Algorithm name; see baseline_method_names().
window_size (int) – Kernel width for the two custom filters.
poly_order (int) – Polynomial order for the ‘poly’ alias.
clip_negative (bool) – If True, negative corrected values are set to 0.
return_baseline (bool) – If True, also return the estimated baseline.
**kwargs – Forwarded to the chosen algorithm (e.g. lam=1e6, p=0.01).

Return type:

corrected or (corrected, baseline)

mioXpektron.baseline_method_names()[source]

Return a sorted list of available baseline algorithms.

Based on pybaselines.Baseline public callables, plus two custom filters (“median_filter”, “adaptive_window”) and a ‘poly’ alias. A few methods that are not 1‑D safe or impractically slow are removed.

Return type:

List[str]

mioXpektron.batch_denoise(files, output_dir, method='wavelet', n_workers=0, backend='threads', progress=True, params=None)[source]

Apply the configured denoising method to multiple spectrum files.

Parameters:

files (Iterable[str | Path]) – Collection of filesystem paths (glob results, manual list, etc.).
output_dir (str | Path) – Directory where the denoised outputs will be written.
method (str, default "wavelet") – Name of the smoothing routine forwarded to noise_filtering().
n_workers (int, default 0) – Worker count for the executor. 0 or None selects a CPU-aware default.
backend ({"threads", "processes"}, default "threads") – Execution strategy for the worker pool.
progress (bool, default True) – If True, wrap the executor iterator in tqdm when available.
params (dict | None) – Extra keyword arguments forwarded to noise_filtering().

Returns:

Status records describing each attempted file.

Return type:

list[BatchResult]

Raises:

ValueError – If no input paths exist or an unsupported backend name is provided.
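
The worker and backend options follow a common executor pattern, sketched below with the standard library. The helper names, the "CPU count minus one" default, and the empty-input check are assumptions for illustration; the library's actual default and validation rules may differ.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def resolve_workers(n_workers: Optional[int]) -> int:
    """Treat 0 or None as a request for a CPU-aware default."""
    if not n_workers:
        # Assumed default: leave one core free, but never go below one.
        return max(1, (os.cpu_count() or 1) - 1)
    return n_workers


def run_batch(
    func: Callable,
    items: Iterable,
    n_workers: Optional[int] = 0,
    backend: str = "threads",
) -> List:
    """Apply func to every item using a thread or process pool."""
    executors = {"threads": ThreadPoolExecutor, "processes": ProcessPoolExecutor}
    if backend not in executors:
        raise ValueError(f"unsupported backend: {backend!r}")
    items = list(items)
    if not items:
        raise ValueError("no input items")
    with executors[backend](max_workers=resolve_workers(n_workers)) as pool:
        # map() preserves input order in the returned results.
        return list(pool.map(func, items))


print(run_batch(str.upper, ["a.txt", "b.txt"]))  # ['A.TXT', 'B.TXT']
```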

class mioXpektron.FlexibleCalibrator(config=None)[source]

Bases: object

Single-model calibrator with user-selected method.

Unlike AutoCalibrator, this calibrator fits exactly one model and provides more control over outlier rejection, quality thresholds, and per-model parameters.

Parameters:

config (FlexibleCalibConfig | None)

calibrate(files, calib_channels_dict=None)[source]

Calibrate all files using the selected calibration method.

Parameters:

files (Sequence[str])
calib_channels_dict (Dict[str, Sequence[float]] | None)

Return type:

class mioXpektron.FlexibleCalibConfig(reference_masses, calibration_method='quad_sqrt', output_folder='calibrated_spectra', output_mz_range=None, max_workers=None, autodetect_tol_da=None, autodetect_tol_ppm=None, autodetect_method='gaussian', autodetect_fallback_policy='max', autodetect_strategy='mz', prefer_recompute_from_channel=False, outlier_threshold=3.0, use_outlier_rejection=True, max_iterations=3, min_calibrants=3, max_ppm_threshold=100.0, fail_on_high_error=False, retry_high_error_with_pruning=False, retry_high_error_with_mz_fallback=False, retry_high_error_max_removals=5, exclude_reference_masses=<factory>, auto_screen_reference_masses=False, screen_max_mean_abs_ppm=50.0, screen_max_median_abs_ppm=None, screen_min_valid_fraction=0.8, screen_min_count=3, screen_exclude_below_mz=1.5, spline_smoothing=None, multisegment_breakpoints=<factory>, instrument_params=<factory>, save_diagnostic_plots=False, verbose=True, auto_tune=False)[source]

Bases: object

Configuration for flexible calibration with a single user-selected method.

Parameters:

reference_masses (List[float])
calibration_method (Literal['quad_sqrt', 'linear_sqrt', 'poly2', 'reflectron', 'multisegment', 'spline', 'physical'])
output_folder (str)
output_mz_range (Tuple[float | None, float | None] | None)
max_workers (int | None)
autodetect_tol_da (float | Sequence[float] | None)
autodetect_tol_ppm (float | None)
autodetect_method (str)
autodetect_fallback_policy (str)
autodetect_strategy (str)
prefer_recompute_from_channel (bool)
outlier_threshold (float)
use_outlier_rejection (bool)
max_iterations (int)
min_calibrants (int)
max_ppm_threshold (float | None)
fail_on_high_error (bool)
retry_high_error_with_pruning (bool)
retry_high_error_with_mz_fallback (bool)
retry_high_error_max_removals (int)
exclude_reference_masses (List[float])
auto_screen_reference_masses (bool)
screen_max_mean_abs_ppm (float)
screen_max_median_abs_ppm (float | None)
screen_min_valid_fraction (float)
screen_min_count (int)
screen_exclude_below_mz (float)
spline_smoothing (float | None)
multisegment_breakpoints (List[float])
instrument_params (Dict[str, float])
save_diagnostic_plots (bool)
verbose (bool)
auto_tune (bool)

reference_masses: List[float]

calibration_method: Literal['quad_sqrt', 'linear_sqrt', 'poly2', 'reflectron', 'multisegment', 'spline', 'physical'] = 'quad_sqrt'

output_folder: str = 'calibrated_spectra'

output_mz_range: Tuple[float | None, float | None] | None = None

max_workers: int | None = None

autodetect_tol_da: float | Sequence[float] | None = None

autodetect_tol_ppm: float | None = None

autodetect_method: str = 'gaussian'

autodetect_fallback_policy: str = 'max'

autodetect_strategy: str = 'mz'

prefer_recompute_from_channel: bool = False

outlier_threshold: float = 3.0

use_outlier_rejection: bool = True

max_iterations: int = 3

min_calibrants: int = 3

max_ppm_threshold: float | None = 100.0

fail_on_high_error: bool = False

retry_high_error_with_pruning: bool = False

retry_high_error_with_mz_fallback: bool = False

retry_high_error_max_removals: int = 5

exclude_reference_masses: List[float]

auto_screen_reference_masses: bool = False

screen_max_mean_abs_ppm: float = 50.0

screen_max_median_abs_ppm: float | None = None

screen_min_valid_fraction: float = 0.8

screen_min_count: int = 3

screen_exclude_below_mz: float = 1.5

spline_smoothing: float | None = None

multisegment_breakpoints: List[float]

instrument_params: Dict[str, float]

save_diagnostic_plots: bool = False

verbose: bool = True

auto_tune: bool = False

mioXpektron.batch_tic_norm(input_pattern, output_dir='normalized_spectra', mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=False)[source]

Batch‑import and preprocess multiple ToF‑SIMS spectra, then save the (m/z, normalized_intensity) arrays for each file as a tab‑separated text file in output_dir.

Parameters:

input_pattern (str) – Glob pattern (e.g. ‘spectra/*.txt’) that expands to the input files.
output_dir (str) – Folder where ‘<original‑name>_normalized.txt’ will be written; created if it does not already exist.
mz_min (float | None) – Passed through to data_preprocessing().
mz_max (float | None) – Passed through to data_preprocessing().
normalization_target (float | None) – Passed through to data_preprocessing().
verbose (bool) – Passed through to data_preprocessing().

Returns:

Paths of the files written, in processing order.

Return type:

List[str]
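
TIC normalization in its usual sense rescales a spectrum so that its intensities sum to a chosen target. The sketch below shows that arithmetic together with the output naming described above. Both helpers are hypothetical, and what data_preprocessing() does beyond this scaling is not covered here.

```python
from pathlib import Path
from typing import List


def tic_normalize(intensities: List[float], normalization_target: float = 1_000_000.0) -> List[float]:
    """Scale intensities so that their sum equals normalization_target."""
    tic = sum(intensities)  # total ion count
    if tic <= 0:
        raise ValueError("total ion count must be positive")
    scale = normalization_target / tic
    return [value * scale for value in intensities]


def normalized_path(source: str, output_dir: str = "normalized_spectra") -> Path:
    """Build '<original-name>_normalized.txt' inside output_dir."""
    return Path(output_dir) / f"{Path(source).stem}_normalized.txt"


print(tic_normalize([10.0, 30.0, 60.0]))  # [100000.0, 300000.0, 600000.0]
print(normalized_path("spectra/sample1.txt").name)  # sample1_normalized.txt
```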

mioXpektron.check_overlapping_peaks(data_dir, file_name, mz_min, mz_max, norm_tic=False, alpha=0.2)[source]

mioXpektron.check_overlapping_peaks2(data_dir, file_pattern, mz_min, mz_max, norm_tic=False, alpha=0.18, bin_width=0.001, show_median=True, show_group_cumulative=True)[source]

Overlay spectra with two colors (Cancer vs Control) inferred from file names.

Parameters:

data_dir (str) – Directory containing spectra.
file_pattern (str) – Glob pattern (e.g., “*.txt”).
mz_min (float) – Lower bound of the m/z window to visualize.
mz_max (float) – Upper bound of the m/z window to visualize.
norm_tic (bool, default False) – Normalize each spectrum by its TIC prior to plotting.
alpha (float, default 0.18) – Line transparency for individual spectra.
bin_width (float, default 0.001) – Common grid step for interpolation (used for medians/cumulative plots).
show_median (bool, default True) – If True, overlay per-group median curves (thicker lines).
show_group_cumulative (bool, default True) – If True, plot per-group cumulative intensity curves on a separate figure.

Notes

Group detection is based on substrings in filenames: “_CC” (Cancer), “_CT” (Control).
Files without these markers are labeled “Unknown” and plotted in grey.
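
The naming rule in the notes above amounts to a substring lookup. A minimal sketch follows; the helper `infer_group` and the example file names are hypothetical, and the order in which markers are checked is an assumption that only matters if a name contains both.

```python
from pathlib import Path

# Substring markers taken from the notes above.
GROUP_MARKERS = {"_CC": "Cancer", "_CT": "Control"}


def infer_group(file_name: str) -> str:
    """Return the group label implied by a spectrum file name."""
    name = Path(file_name).name  # ignore any directory components
    for marker, group in GROUP_MARKERS.items():
        if marker in name:
            return group
    return "Unknown"


for example in ["P01_CC_1.txt", "P02_CT_1.txt", "blank.txt"]:
    print(example, "->", infer_group(example))
```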

mioXpektron.collect_peak_properties_batch(files, mz_min=None, mz_max=None, baseline_method='airpls', noise_method='wavelet', missing_value_method='interpolation', normalization_target=100000000.0, method='Gaussian', min_intensity=1, min_snr=3, min_distance=5, window_size=10, peak_height=50, prominence=50, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, distance_threshold=0.01, combined=False, noise_model='global', noise_bins=20, noise_min_points=25, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True)[source]

Collect peak properties from a batch of ToF-SIMS files.

Parameters:

files (list of str) – List of file paths to process.
mz_min (float or None) – Lower bound of the m/z window for data import (if supported).
mz_max (float or None) – Upper bound of the m/z window for data import (if supported).
baseline_method (str) – Method for baseline correction.
noise_method (str) – Noise filtering method.
missing_value_method (str) – Method for handling missing values.
normalization_target (float) – Target TIC normalization value.
min_snr (int or float) – Minimum signal-to-noise ratio for peak detection.
min_distance (int) – Minimum distance between peaks (in data points).
prominence (int or float or None) – Minimum peak prominence for detection.
width_rel_height (float) – Relative height for width calculation (e.g., 0.5 = FWHM).
noise_model ({"global", "mz_binned"}) – Noise model used to derive peak thresholds.
noise_bins (int) – Number of m/z bins for noise_model="mz_binned".
noise_min_points (int) – Minimum positive noise points per bin before using local estimates.

Returns:

peaks_df – DataFrame with all peak properties for all files.

Return type:

pd.DataFrame

mioXpektron.compare_denoising_methods(x, y, *, min_mz=None, max_mz=None, max_peaks=300, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None)[source]

Run a battery of denoising/smoothing methods and quantify both peak preservation and noise reduction.

Parameters:

x (array-like or None) – m/z axis. If None, an index axis [0..N-1] is used.
y (array-like) – Raw intensities.
min_mz (float, optional) – Lower bound of an optional range restriction on x, applied before peak detection and evaluation.
max_mz (float, optional) – Upper bound of an optional range restriction on x, applied before peak detection and evaluation.
max_peaks (int, default 300) – Maximum number of reference peaks (by prominence) to evaluate in the selected range.
min_prominence (float, optional) – Prominence threshold for scipy.signal.find_peaks during reference detection.
rel_height (float, default 0.5) – Relative height used for FWHM measurements (e.g., 0.5 = half-height).
search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.
match_min_prominence_ratio (float, default 0.1) – Minimum post-denoise prominence required for a matched peak, expressed as a fraction of the raw reference prominence.
match_min_prominence_abs (float, default 0.0) – Absolute lower bound for post-denoise peak prominence.
match_min_width_pts (float, default 0.25) – Minimum acceptable peak width in index points for a post-denoise match.
resample_to_uniform (bool, default False) – If True, allow denoisers to resample to a uniform grid internally when beneficial.
include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay (deriv>0) and Gaussian (order>0) operators in the candidate grid. By default the search includes only smoothing/denoising variants.
target_dx (float, optional) – Desired spacing when resample_to_uniform=True.
return_format ({"pandas","polars"}, default "pandas") – Backend for output DataFrames.
n_jobs (int, default -1) – Number of workers used to evaluate methods in parallel (1 disables parallelism).
parallel_backend ({"thread","process"}, default "thread") – Parallelism backend. Threads are often efficient because NumPy/SciPy/PyWavelets drop the GIL.
progress (bool, default True) – Show a progress bar if tqdm is available.
baseline_expand (float, default 2.0) – Multiplier to expand each peak’s FWHM window when masking baseline regions used for noise/PSD estimates.
flank_inner (float, default 1.5) – Inner distance (in FWHM multiples) of the flanking bands used for local noise estimation.
flank_outer (float, default 3.0) – Outer distance (in FWHM multiples) of the flanking bands used for local noise estimation.
hf_enabled (bool, default True) – If True, compute high-frequency (HF) residual power metrics on baseline regions via PSD.
hf_cutoff_hz (float, optional) – Absolute HF cutoff frequency (cycles per m/z). If None, uses hf_cutoff_frac * Nyquist.
hf_cutoff_frac (float, default 0.3) – Fraction of the Nyquist frequency used as the HF band when hf_cutoff_hz is None.
hf_resample_dx (float, optional) – Δx used to resample baseline segments to a uniform grid for PSD; defaults to median Δx if None.
hf_psd_method ({"welch","periodogram"}, default "welch") – PSD estimator for HF power. Welch provides lower-variance estimates on finite windows.
hf_welch_nperseg (int, optional) – Segment length for Welch PSD. If None, chosen automatically (≈ max(16, N/8), power-of-two, ≤1024).
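
The default HF band edge described by hf_cutoff_hz and hf_cutoff_frac can be worked out by hand. The sketch below assumes a uniform grid with spacing dx, so the Nyquist frequency is 1 / (2 * dx) cycles per m/z. The helper name hf_cutoff is hypothetical and not part of mioXpektron.

```python
from typing import Optional


def hf_cutoff(dx: float,
              hf_cutoff_hz: Optional[float] = None,
              hf_cutoff_frac: float = 0.3) -> float:
    """Return the HF band edge in cycles per m/z (illustrative helper)."""
    if dx <= 0:
        raise ValueError("dx must be positive")
    if hf_cutoff_hz is not None:
        return float(hf_cutoff_hz)      # an absolute cutoff takes priority
    nyquist = 0.5 / dx                  # Nyquist frequency of a uniform grid
    return hf_cutoff_frac * nyquist


# With dx = 0.001 m/z the Nyquist frequency is about 500 cycles per m/z,
# so the default fraction of 0.3 puts the cutoff near 150 cycles per m/z.
print(hf_cutoff(0.001))
```

On a nonuniform grid the same arithmetic would apply to whatever spacing the baseline segments are resampled to (hf_resample_dx, or the median spacing when it is None).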

Returns:

summary_df, per_peak_df – If return_format="pandas", both are pandas.DataFrame; if "polars", both are polars.DataFrame. summary_df contains method-level medians/IQRs plus noise and HF metrics; per_peak_df has per-peak rows.

Return type:

DataFrame
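
The ±ppm matching window controlled by search_ppm scales with m/z. As a rough illustration (ppm_window is a hypothetical helper, not a library function), the half-width in m/z units is mz_ref * search_ppm * 1e-6:

```python
from typing import Tuple


def ppm_window(mz_ref: float, search_ppm: float = 20.0) -> Tuple[float, float]:
    """Return (low, high) m/z bounds of a symmetric ppm window."""
    half_width = mz_ref * search_ppm * 1e-6   # ppm is parts per million of m/z
    return mz_ref - half_width, mz_ref + half_width


# At m/z 100 a 20 ppm window is only 0.002 wide on each side,
# while at m/z 500 the same setting spans 0.01 on each side.
for mz in (100.0, 500.0):
    print(mz, ppm_window(mz))
```

Because the window widens with mass, a single search_ppm value tolerates proportionally larger absolute shifts at high m/z.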

mioXpektron.compare_methods_in_windows(x, y, windows, *, per_window_max_peaks=50, min_prominence=None, rel_height=0.5, search_ppm=20.0, match_min_prominence_ratio=0.1, match_min_prominence_abs=0.0, match_min_width_pts=0.25, resample_to_uniform=False, include_derivatives=False, target_dx=None, return_format='pandas', n_jobs=-1, parallel_backend='thread', progress=True, baseline_expand=2.0, flank_inner=1.5, flank_outer=3.0, hf_enabled=True, hf_cutoff_hz=None, hf_cutoff_frac=0.3, hf_resample_dx=None, hf_psd_method='welch', hf_welch_nperseg=None, auto_tune=False, auto_tune_files=None)[source]

Evaluate denoising methods across multiple m/z windows and aggregate results.

Parameters:

x (np.ndarray) – m/z axis.
y (np.ndarray) – Intensity values aligned with x.
windows (list[tuple[float, float]]) – Each tuple is (min_mz, max_mz) for a window to evaluate.
per_window_max_peaks (int, default 50) – Max number of strongest peaks (by prominence) to measure within each window.
min_prominence (float, optional) – Minimum prominence passed to signal.find_peaks for reference peak detection.
rel_height (float, default 0.5) – Relative height used to define FWHM when measuring peaks.
search_ppm (float, default 20.0) – ±ppm window around each reference m/z used to re-detect peaks after denoising.
match_min_prominence_ratio (float, default 0.1) – Forwarded to the peak re-matching logic used after denoising.
match_min_prominence_abs (float, default 0.0) – Forwarded to the peak re-matching logic used after denoising.
match_min_width_pts (float, default 0.25) – Forwarded to the peak re-matching logic used after denoising.
resample_to_uniform (bool, default False) – Passed through to denoisers that support resampling.
target_dx (float, optional) – Passed through to denoisers that support resampling.
include_derivatives (bool, default False) – If True, include derivative-style Savitzky-Golay and Gaussian candidates inside each window’s method grid.
return_format ({"pandas","polars"}) – Backend for output DataFrames.
n_jobs (int, default -1) – Workers used within each window’s call to compare_denoising_methods.
parallel_backend ({"thread","process"}, default "thread") – Parallelism backend.
progress (bool, default True) – Show progress bars during evaluation.
baseline_expand (float, default 2.0) – Baseline mask expansion multiplier, forwarded to noise metrics.
flank_inner (float, default 1.5) – Inner flanking-band multiplier, forwarded to noise metrics.
flank_outer (float, default 3.0) – Outer flanking-band multiplier, forwarded to noise metrics.
hf_enabled (bool, default True) – High-frequency PSD control forwarded to noise metrics.
hf_cutoff_hz (float, optional) – High-frequency PSD control forwarded to noise metrics.
hf_cutoff_frac (float, default 0.3) – High-frequency PSD control forwarded to noise metrics.
hf_resample_dx (float, optional) – High-frequency PSD control forwarded to noise metrics.
hf_psd_method ({"welch","periodogram"}, default "welch") – High-frequency PSD control forwarded to noise metrics (Welch is lower-variance).
hf_welch_nperseg (int, optional) – High-frequency PSD control forwarded to noise metrics.
auto_tune (bool)
auto_tune_files (list[str] | None)

Returns:

If return_format == “pandas” –

rollup (pd.DataFrame) – Method-level aggregation across all windows.
summary_all (pd.DataFrame) – Per-window, per-method summary table (noise and peak metrics).
detail_all (pd.DataFrame) – Per-peak detail table across all windows.

If return_format == “polars” – rollup, summary_all, and detail_all are pl.DataFrame objects.

mioXpektron.data_preprocessing(file_path, mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=True, return_all=False)[source]

Import and preprocess ToF-SIMS data from a text file.

Parameters:

file_path (str) – Path to the ToF-SIMS data file.
mz_min (float, optional) – m/z range to import.
mz_max (float, optional) – m/z range to import.
normalization_target (float or None) – Target TIC for normalization, or None to skip.
verbose (bool) – Print progress if True.
return_all (bool) – If True, return all intermediate arrays.

Returns:

mz_values (numpy.ndarray)
normalized_intensities (numpy.ndarray)
sample_name (str)
group (str)

Optionally, the intermediate arrays are returned as well.

mioXpektron.decode_method_label(label)[source]

Translate a compact label back into noise_filtering parameters.

Parameters:: label (str)
Return type:: dict

mioXpektron.detect_peaks_cwt_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)[source]

Peak detection using Continuous Wavelet Transform (CWT) for ToF-SIMS spectra.

Returns:

peak_properties (pd.DataFrame) – Contains: mz, intensities, widths (approx), amplitudes, areas

mioXpektron.detect_peaks_with_area(mz_values, intensities, sample_name, group, min_intensity=1, min_snr=3, min_distance=2, window_size=10, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, verbose=False)[source]

Fast peak detection in ToF-SIMS or similar spectra, including peak area.

Returns:

peak_indices (np.ndarray) – Indices of detected peaks
peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas

mioXpektron.detect_peaks_with_area_v2(mz, intens, sample_name, group, *, min_intensity=1, min_snr=3, min_distance=2, prominence=10, min_peak_width=1, max_peak_width=75, rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, noise_window=10, verbose=False)[source]

mioXpektron.import_data(file_path, mz_min=None, mz_max=None, group_patterns=None, group_fn=None)[source]

Import ToF-SIMS data from a spectrum file.

Parameters:

file_path (str) – Path to the ToF-SIMS data file. Supports tab-delimited .txt exports with m/z + Intensity columns and CSV exports with mz + corrected_intensity or intensity columns.
mz_min (float, optional) – Minimum m/z value to be imported (inclusive).
mz_max (float, optional) – Maximum m/z value to be imported (inclusive).
group_patterns (dict[str, str], optional) – Mapping of {regex_pattern: group_label}. Patterns are tested against the sample name (filename without extension) in order; the first match determines the group. Defaults to {'_CC...': 'Cancer', '_CT...': 'Control'}.
group_fn (callable, optional) – A function (sample_name: str) -> str that returns the group label directly. When provided this takes priority over group_patterns.

Returns:

mz (np.ndarray) – Mass-to-charge ratio values.
intensity (np.ndarray) – Intensity values.
sample_name (str) – Sample name extracted from file name.
group (str) – Group label derived from the filename.

Return type:

Tuple[ndarray, ndarray, str, str]
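
The precedence between group_fn and group_patterns described above can be sketched as follows. This is an illustration only: infer_group is a hypothetical helper, the two default patterns are simplified stand-ins (the library's own defaults are elided in this reference), and both the use of re.search and the "Unknown" fallback are assumptions.

```python
import re
from typing import Callable, Mapping, Optional


def infer_group(sample_name: str,
                group_patterns: Optional[Mapping[str, str]] = None,
                group_fn: Optional[Callable[[str], str]] = None,
                fallback: str = "Unknown") -> str:
    """Resolve a group label from a sample name (illustrative helper)."""
    if group_fn is not None:
        return group_fn(sample_name)          # a callable takes priority
    # Stand-in patterns for illustration; not the library's real defaults.
    patterns = group_patterns or {r"_CC": "Cancer", r"_CT": "Control"}
    for pattern, label in patterns.items():   # dicts keep insertion order
        if re.search(pattern, sample_name):
            return label                      # first match wins
    return fallback


print(infer_group("S01_CC_rep1"))
print(infer_group("S01_CC_rep1", group_fn=lambda name: name.split("_")[0]))
```

Because the first matching pattern wins, more specific patterns should be listed before more general ones.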

mioXpektron.normalization_target(files, mz_min=None, mz_max=None)[source]

Normalize peak intensities or areas to a target value.

Parameters:

files (list of str) – List of file paths to process.
mz_min (float or None) – m/z window for data import (if supported).
mz_max (float or None) – m/z window for data import (if supported).
baseline_method (str) – Method for baseline correction.
noise_method (str) – Noise filtering method.
missing_value_method (str) – Method for handling missing values.

Returns:

normalized_df – Normalized DataFrame.

Return type:

pd.DataFrame

mioXpektron.noise_filtering(intensities, *, method='wavelet', window_length=15, polyorder=3, deriv=0, gauss_sigma_pts=None, gaussian_order=0, wavelet='sym8', level=None, threshold_strategy='universal', threshold_mode='soft', sigma=None, sigma_strategy='per_level', variance_stabilize='none', anscombe_negative_policy='warn_clip', cycle_spins=0, pywt_mode='periodization', clip_nonnegative=True, preserve_tic=False, x=None, resample_to_uniform=False, target_dx=None, forward_interp='pchip')[source]

Apply 1D denoising/smoothing to ToF-SIMS spectra.

Notes

Savitzky–Golay / Gaussian / Median assume ~uniform sampling. If your m/z grid is nonuniform, pass x and set resample_to_uniform=True. The wavelet path can also resample when resample_to_uniform=True.
Wavelet shrinkage preserves narrow peaks; consider Bayes/SURE and cycle-spins.

Parameters:

intensities (np.ndarray) – 1D intensity array.
method ({'savitzky_golay','gaussian','median','wavelet','none'})
window_length (int) – Odd window for Savitzky–Golay or median; will be coerced to odd >=3.
polyorder (int) – For Savitzky–Golay, 0 ≤ polyorder < window_length.
deriv (int) – For Savitzky–Golay, derivative order (0 = smoothing; 1/2/… compute derivatives). Requires polyorder >= deriv.
gauss_sigma_pts (float or None) – If provided, overrides default sigma = window_length/6 for Gaussian filter.
gaussian_order (int) – For Gaussian filtering, derivative order for ndimage.gaussian_filter1d. 0 = smoothing; >0 computes derivatives.
wavelet (Literal['db4', 'db8', 'sym5', 'sym8', 'coif2', 'coif3']) – Passed to wavelet processing (see wavelet_denoise).
level (int | None) – Passed to wavelet processing (see wavelet_denoise).
threshold_strategy (Literal['universal', 'bayes', 'sure', 'sure_opt']) – Passed to wavelet processing (see wavelet_denoise).
threshold_mode (Literal['soft', 'hard']) – Passed to wavelet processing (see wavelet_denoise).
sigma (float | None) – Passed to wavelet processing (see wavelet_denoise).
cycle_spins (Literal[0, 4, 8, 16, 32]) – Passed to wavelet processing (see wavelet_denoise).
pywt_mode (str) – Passed to wavelet processing (see wavelet_denoise).
sigma_strategy ({"per_level","global"}) – Strategy if sigma is None. “per_level” = σ_j via MAD on each detail subband; “global” = one σ via MAD on the finest detail of the unshifted input.
variance_stabilize ({"none","anscombe"}) – Apply variance‑stabilizing transform before denoising. "anscombe" uses the classical Anscombe transform for non-negative Poisson-like input.
anscombe_negative_policy ({"warn_clip","clip","raise"}) – Handling policy for negative values before the classical Anscombe transform.
clip_nonnegative (bool) – Output behaviors.
preserve_tic (bool) – Output behaviors.
x (np.ndarray or None) – Optional m/z (or channel) axis aligned with intensities.
resample_to_uniform (bool) – If True and x is provided, internally resample to a uniform grid and back.
target_dx (float or None) – Target spacing for the uniform grid (if None, inferred).
forward_interp ({'pchip','linear'}) – Interpolant used when building the uniform-grid signal (PCHIP recommended).
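
For reference, the classical Anscombe transform named by variance_stabilize="anscombe" maps a count x to 2 * sqrt(x + 3/8). The sketch below is not the library's implementation: anscombe and inverse_anscombe are hypothetical helpers, the policy names simply mirror anscombe_negative_policy, and the inverse shown is the simple algebraic one, which may differ from what mioXpektron uses.

```python
import math
import warnings
from typing import Iterable, List


def anscombe(values: Iterable[float], negative_policy: str = "warn_clip") -> List[float]:
    """Classical Anscombe transform 2*sqrt(x + 3/8) (illustrative helper)."""
    if negative_policy not in ("warn_clip", "clip", "raise"):
        raise ValueError(f"unknown policy: {negative_policy!r}")
    data = [float(v) for v in values]
    if any(v < 0 for v in data):
        if negative_policy == "raise":
            raise ValueError("negative values are not valid Poisson-like input")
        if negative_policy == "warn_clip":
            warnings.warn("clipping negative values to zero before Anscombe")
        data = [max(v, 0.0) for v in data]    # clip negatives to zero
    return [2.0 * math.sqrt(v + 0.375) for v in data]


def inverse_anscombe(values: Iterable[float]) -> List[float]:
    """Simple algebraic inverse (y/2)**2 - 3/8; biased at very low counts."""
    return [(v / 2.0) ** 2 - 0.375 for v in values]


counts = [0.0, 4.0, 100.0]
print(inverse_anscombe(anscombe(counts, "clip")))
```

After the transform, Poisson-like noise has approximately unit variance, which is what lets a single threshold work across low and high intensities.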

Returns:

Filtered intensities aligned to the input grid/order.

Return type:

np.ndarray

Raises:

ValueError – If intensities or x have mismatched shapes or if intensities is not 1D. If Savitzky–Golay has polyorder < deriv after clamping, or if method is unknown.
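
To make the wavelet options more concrete, here is a minimal sketch of the standard building blocks behind threshold_strategy="universal", the MAD-based sigma estimate, and threshold_mode. It operates on a plain list of detail coefficients and does not call PyWavelets or mioXpektron; mad_sigma, universal_threshold, and shrink are hypothetical names.

```python
import math
import statistics
from typing import List, Sequence


def mad_sigma(detail: Sequence[float]) -> float:
    """Robust noise estimate from detail coefficients: median(|d|) / 0.6745."""
    return statistics.median([abs(d) for d in detail]) / 0.6745


def universal_threshold(sigma: float, n: int) -> float:
    """Universal threshold sigma * sqrt(2 * ln(n)) for n samples."""
    return sigma * math.sqrt(2.0 * math.log(n))


def shrink(detail: Sequence[float], threshold: float, mode: str = "soft") -> List[float]:
    """Apply soft or hard thresholding to detail coefficients."""
    if mode == "soft":
        # Shrink every coefficient toward zero by the threshold.
        return [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    if mode == "hard":
        # Keep coefficients above the threshold unchanged, zero the rest.
        return [d if abs(d) > threshold else 0.0 for d in detail]
    raise ValueError(f"unknown mode: {mode!r}")


detail = [0.2, -0.1, 5.0, 0.15, -0.25, 0.1, -4.0, 0.05]
t = universal_threshold(mad_sigma(detail), len(detail))
print(shrink(detail, t, "soft"))
```

With sigma_strategy="per_level" the same estimate would be repeated on each detail subband, whereas "global" reuses one sigma from the finest level.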

Returns:

peak_indices (np.ndarray) – Indices of detected peaks
peak_properties (dict) – Contains: mz, intensities, widths, prominences, heights, areas

Notes

Overlapping-peak deconvolution now requires both geometric overlap and a BIC improvement over a single-Gaussian window fit. Fitted component widths must also remain within the user-specified peak-width bounds.
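
The BIC criterion in the note above can be illustrated with the usual least-squares form, BIC = n * ln(RSS / n) + k * ln(n). The sketch assumes Gaussian residuals and three free parameters per Gaussian component (amplitude, center, width). Both function names are hypothetical, and the library's exact likelihood and acceptance margin are not documented here.

```python
import math


def bic_from_rss(rss: float, n_points: int, n_params: int) -> float:
    """BIC for a least-squares fit with Gaussian residuals."""
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)


def prefer_two_components(rss_single: float, rss_double: float,
                          n_points: int, params_per_peak: int = 3) -> bool:
    """True when a two-Gaussian fit has a lower BIC than a single Gaussian."""
    bic_single = bic_from_rss(rss_single, n_points, params_per_peak)
    bic_double = bic_from_rss(rss_double, n_points, 2 * params_per_peak)
    return bic_double < bic_single


# A large drop in residual error justifies the extra parameters;
# a marginal drop does not.
print(prefer_two_components(100.0, 10.0, n_points=50))
print(prefer_two_components(100.0, 99.0, n_points=50))
```

The ln(n) penalty is what stops the two-component model from winning on noise alone.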

mioXpektron.small_param_grid_preset(n_points=None)[source]

A compact parameter grid for common methods.

Keys must match pybaselines.Baseline method names (plus ‘poly’ and our two filters).

Parameters:: n_points (int, optional) – Number of data points in spectrum. If provided, window_size will be calculated adaptively as a percentage of data size. If None, uses moderate defaults suitable for ~100K point spectra.
Returns:: Parameter grid with method names as keys
Return type:: dict

Notes

Window sizes are calculated as:

- Small: 0.05% of data (min 51)
- Medium: 0.10% of data (min 101)
- Large: 0.20% of data (min 501)

This adaptive scaling ensures that filter methods perform consistently across datasets of different sizes. Fixed window sizes work poorly:

- For 10K points: window=101 is 1.0% (OK)
- For 1M points: window=101 is 0.01% (too small, causes jagged baselines)

Examples

>>> # Auto-scale for 938K point spectrum
>>> grid = small_param_grid_preset(n_points=938000)
>>> grid['median_filter']
[{'window_size': 469}, {'window_size': 938}, {'window_size': 1876}]

>>> # Use defaults for unknown size
>>> grid = small_param_grid_preset()
>>> grid['median_filter']
[{'window_size': 501}, {'window_size': 1001}, {'window_size': 2001}]
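
The percentage rule in the Notes can be reproduced with a few lines of integer arithmetic. This is a sketch of the stated rule only: adaptive_windows is a hypothetical helper, and any further adjustment the library applies (for example forcing odd window lengths) is not modeled.

```python
from typing import List


def adaptive_windows(n_points: int) -> List[int]:
    """Small/medium/large window sizes as a share of the spectrum length."""
    # Each entry is (share in units of 0.01%, minimum window size).
    specs = ((5, 51), (10, 101), (20, 501))
    return [max(minimum, n_points * share // 10_000) for share, minimum in specs]


print(adaptive_windows(938_000))   # large spectrum: the percentages dominate
print(adaptive_windows(10_000))    # small spectrum: the minimums dominate
```

For the 938K-point spectrum this yields the same three window sizes as the documented example above.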

mioXpektron.tic_normalization(intensities, target_tic=1000000.0)[source]

Scale intensities so the total-ion current equals target_tic.

This is the most common normalisation in ToF-SIMS. Each spectrum is multiplied by target_tic / sum(intensities) so that all spectra share the same TIC.

Parameters:

intensities (array-like) – Raw ion counts or intensities.
target_tic (float or None) – Desired total-ion current after scaling. Pass None to skip.

Return type:

np.ndarray
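
The scaling rule above, target_tic / sum(intensities), is simple enough to sketch directly. The version below uses plain Python lists rather than NumPy arrays so that it runs without dependencies. tic_normalize is a hypothetical stand-in, and raising on a non-positive TIC is an assumption about edge-case handling.

```python
import math
from typing import List, Optional, Sequence


def tic_normalize(intensities: Sequence[float],
                  target_tic: Optional[float] = 1e6) -> List[float]:
    """Scale intensities so that their sum equals target_tic."""
    values = [float(v) for v in intensities]
    if target_tic is None:
        return values                     # None means: skip normalization
    tic = math.fsum(values)               # total-ion current of the spectrum
    if tic <= 0:
        # Assumed behavior; the library may handle empty spectra differently.
        raise ValueError("total-ion current must be positive")
    scale = target_tic / tic
    return [v * scale for v in values]


scaled = tic_normalize([1.0, 2.0, 7.0], target_tic=100.0)
print(scaled, sum(scaled))
```

Relative peak ratios are unchanged by this scaling; only the overall level is fixed.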

mioXpektron.run_pipeline(files, *, calib_channels_dict=None, config=None)[source]

Run the end‑to‑end ToF‑SIMS batch pipeline and return aligned matrices.

Steps

Optional recalibration (Channel→m/z)
Denoising
Baseline correction
TIC normalization
Peak detection and alignment → unified m/z × samples tables

Returns:

(intensity_df, area_df), aligned by m/z across samples.

Parameters:

files (Sequence[str])
calib_channels_dict (Dict[str, Sequence[float]] | None)
config (PipelineConfig | None)

Return type:

Tuple[DataFrame, DataFrame]

class mioXpektron.PipelineConfig(use_recalibration=True, reference_masses=None, output_folder_calibrated='calibrated_spectra', denoise_method='wavelet', denoise_params=None, baseline_method='airpls', baseline_params=None, clip_negative_after_baseline=True, normalization_target=1000000.0, mz_min=None, mz_max=None, mz_tolerance=0.2, mz_rounding_precision=1, max_workers=None, auto_tune=False)[source]

Bases: object

High-level pipeline configuration for batch ToF‑SIMS processing.

Parameters:

use_recalibration (bool)
reference_masses (List[float] | None)
output_folder_calibrated (str)
denoise_method (str)
denoise_params (Dict | None)
baseline_method (str)
baseline_params (Dict | None)
clip_negative_after_baseline (bool)
normalization_target (float)
mz_min (float | None)
mz_max (float | None)
mz_tolerance (float)
mz_rounding_precision (int)
max_workers (int | None)
auto_tune (bool)

use_recalibration: bool = True

reference_masses: List[float] | None = None

output_folder_calibrated: str = 'calibrated_spectra'

denoise_method: str = 'wavelet'

denoise_params: Dict | None = None

baseline_method: str = 'airpls'

baseline_params: Dict | None = None

clip_negative_after_baseline: bool = True

normalization_target: float = 1000000.0

mz_min: float | None = None

mz_max: float | None = None

mz_tolerance: float = 0.2

mz_rounding_precision: int = 1

max_workers: int | None = None

auto_tune: bool = False

class mioXpektron.AutoCalibrator(config=None)[source]

Bases: object

Automatic multi-model calibrator.

Fits all requested models, selects the best one per file, and writes calibrated spectra.

Parameters:: config (AutoCalibConfig | None)

calibrate(files, calib_channels_dict=None)[source]

Calibrate all files with automatic model selection.

Parameters:

files (Sequence[str])
calib_channels_dict (Dict[str, Sequence[float]] | None)

Return type:

class mioXpektron.AutoCalibConfig(reference_masses, output_folder='calibrated_spectra', max_workers=None, autodetect_tol_da=None, autodetect_tol_ppm=None, autodetect_method='gaussian', autodetect_fallback_policy='max', autodetect_strategy='mz', prefer_recompute_from_channel=False, outlier_threshold=3.0, use_outlier_rejection=True, max_iterations=3, model=None, models_to_try=None, prefer_physical_models=True, min_calibrants=3, max_ppm_warning=100.0, max_ppm_error=500.0, use_bootstrap_init=True, spline_smoothing=None, multisegment_breakpoints=<factory>, instrument_params=<factory>)[source]

Bases: object

Universal calibration configuration with robust options.

Parameters:

reference_masses (list of float) – Known calibrant ion masses (m/z).
model (str, optional) – Convenience shortcut — a single model name (or common alias like 'quadratic', 'tof', 'linear'). Resolved into models_to_try during __post_init__. Ignored when models_to_try is explicitly provided.
models_to_try (list of str, optional) – Explicit list of model names to fit. Default: all production-ready models (excludes experimental ones such as multisegment and physical).
output_folder (str)
max_workers (int | None)
autodetect_tol_da (float | None)
autodetect_tol_ppm (float | None)
autodetect_method (str)
autodetect_fallback_policy (str)
autodetect_strategy (str)
prefer_recompute_from_channel (bool)
outlier_threshold (float)
use_outlier_rejection (bool)
max_iterations (int)
prefer_physical_models (bool)
min_calibrants (int)
max_ppm_warning (float)
max_ppm_error (float)
use_bootstrap_init (bool)
spline_smoothing (float | None)
multisegment_breakpoints (List[float])
instrument_params (Dict[str, float])
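
The interaction between model and models_to_try follows a common dataclass pattern: resolve the shortcut in __post_init__ unless an explicit list was given. The sketch below shows that pattern in isolation. CalibConfigSketch, the alias table, and the resolved model names are all hypothetical, since the real model names are not listed in this reference.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical alias table and default list, for illustration only.
_ALIASES = {"tof": "tof_model", "quadratic": "quad_model", "linear": "linear_model"}
_DEFAULT_MODELS = ["tof_model", "quad_model", "linear_model"]


@dataclass
class CalibConfigSketch:
    reference_masses: List[float]
    model: Optional[str] = None
    models_to_try: Optional[List[str]] = None

    def __post_init__(self) -> None:
        if self.models_to_try is not None:
            return                        # explicit list wins; model is ignored
        if self.model is not None:
            # Resolve a single shortcut name (or alias) into a one-item list.
            self.models_to_try = [_ALIASES.get(self.model.lower(), self.model)]
        else:
            self.models_to_try = list(_DEFAULT_MODELS)


print(CalibConfigSketch([1.008, 22.990], model="tof").models_to_try)
print(CalibConfigSketch([1.008, 22.990]).models_to_try)
```

Keeping the resolution in __post_init__ means downstream code only ever needs to read models_to_try.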

reference_masses: List[float]

output_folder: str = 'calibrated_spectra'

max_workers: int | None = None

autodetect_tol_da: float | None = None

autodetect_tol_ppm: float | None = None

autodetect_method: str = 'gaussian'

autodetect_fallback_policy: str = 'max'

autodetect_strategy: str = 'mz'

prefer_recompute_from_channel: bool = False

outlier_threshold: float = 3.0

use_outlier_rejection: bool = True

max_iterations: int = 3

model: str | None = None

models_to_try: List[str] | None = None

prefer_physical_models: bool = True

min_calibrants: int = 3

max_ppm_warning: float = 100.0

max_ppm_error: float = 500.0

use_bootstrap_init: bool = True

spline_smoothing: float | None = None

multisegment_breakpoints: List[float]

instrument_params: Dict[str, float]

class mioXpektron.PlotPeaks(config=None)[source]

Bases: object

Class for plotting overlapping peaks from multiple spectra files.

Features:

- Load and group spectra by inferred labels (Cancer/Control/Unknown)
- Overlay individual spectra with customizable transparency
- Plot per-group median curves
- Plot cumulative intensity by group
- Flexible configuration through PlotPeaksConfig

Example:

>>> config = PlotPeaksConfig(
...     data_dir="data/spectra",
...     mz_min=100.0,
...     mz_max=200.0,
...     norm_tic=True,
...     save_fig=True,
...     save_path="../output_files/plots"
... )
>>> plotter = PlotPeaks(config)
>>> plotter.load_data()
>>> plotter.plot_overlay()
>>> plotter.plot_cumulative()

__init__(config=None)[source]

Initialize PlotPeaks.

Parameters:: config (PlotPeaksConfig, optional) – Configuration object. If None, must set attributes manually.

set_group_inference(func)[source]

Set custom group inference function.

Parameters:: func (callable) – Function that takes a file path and returns group label.

static load_window(file_path, mz_min, mz_max, norm_tic=False)[source]

Read one spectrum and return (m/z, intensity) in the requested window.

Parameters:

file_path (str) – Path to a tab or comma-separated spectrum with columns for m/z and intensity. Column names are case-insensitive and support variations:
- m/z: “mz”, “m/z”, “M/Z”, “MZ”, “Mz”
- intensity: “intensity”, “Intensity”, “INTENSITY”, “int”, “Int”
mz_min (float) – Inclusive m/z window to extract.
mz_max (float) – Inclusive m/z window to extract.
norm_tic (bool, default False) – If True, normalize intensities by total ion count (sum to 1).

Returns:

mz (np.ndarray)
inten (np.ndarray) – Intensities scaled by 1e6 (to keep values readable on plots).

Return type:

Tuple[ndarray, ndarray]

load_data()[source]

Load all files matching the pattern and group them by inferred labels.

Raises:: RuntimeError – If no files match the pattern.
Return type:: None

get_group_counts()[source]

Get counts of spectra per group.

Returns:: Dictionary with group names as keys and counts as values.
Return type:: dict

plot_overlay(ax=None, show=True)[source]

Plot overlapping spectra with optional median curves.

Parameters:

ax (matplotlib.axes.Axes, optional) – Axes to plot on. If None, creates new figure.
show (bool, default True) – If True, call plt.show() at the end.

Returns:

The figure object.

Return type:

matplotlib.figure.Figure

plot_cumulative(ax=None, show=True)[source]

Plot cumulative intensity curves by group.

Parameters:

ax (matplotlib.axes.Axes, optional) – Axes to plot on. If None, creates new figure.
show (bool, default True) – If True, call plt.show() at the end.

Returns:

The figure object.

Return type:

matplotlib.figure.Figure

plot_all()[source]

Convenience method to plot both overlay and cumulative plots.

Returns:

fig_overlay (matplotlib.figure.Figure) – The overlay plot figure.
fig_cumulative (matplotlib.figure.Figure) – The cumulative plot figure (or None if disabled).

Return type:

Tuple[Figure, Figure]

Parameters:: config (PlotPeaksConfig | None)

class mioXpektron.PlotPeaksConfig(data_dir, file_pattern='*.txt', mz_min=0.0, mz_max=1000.0, norm_tic=False, bin_width=0.001, alpha=0.18, show_median=True, show_group_cumulative=True, figsize=(10, 6), cumulative_figsize=(10, 4), color_map=None, save_fig=False, save_path='output_files/plots')[source]

Bases: object

Configuration for PlotPeaks class.

Parameters:

data_dir (str) – Directory containing spectra files.
file_pattern (str, default “*.txt”) – Glob pattern for matching spectrum files.
mz_min (float, default 0.0) – Minimum m/z value for the plotting window.
mz_max (float, default 1000.0) – Maximum m/z value for the plotting window.
norm_tic (bool, default False) – If True, normalize intensities by total ion count.
bin_width (float, default 0.001) – Bin width for interpolation grid.
alpha (float, default 0.18) – Transparency for individual spectra lines.
show_median (bool, default True) – If True, overlay median curves on the plot.
show_group_cumulative (bool, default True) – If True, create cumulative intensity plot.
figsize (tuple, default (10, 6)) – Figure size for overlay plot.
cumulative_figsize (tuple, default (10, 4)) – Figure size for cumulative plot.
color_map (dict, optional) – Dictionary mapping group names to colors.
save_fig (bool, default False) – If True, save figures as PDF files.
save_path (str, default "output_files/plots") – Directory path where PDF files will be saved.
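
bin_width sets the step of a common m/z grid onto which spectra are interpolated before per-group medians are taken. A minimal sketch of that idea follows, assuming linear interpolation with clamped ends (the interpolant the library actually uses is not stated here). make_grid, interp, and group_median are hypothetical helpers.

```python
import statistics
from bisect import bisect_right
from typing import List, Sequence, Tuple


def make_grid(mz_min: float, mz_max: float, bin_width: float) -> List[float]:
    """Evenly spaced m/z grid from mz_min to mz_max inclusive."""
    n = int(round((mz_max - mz_min) / bin_width)) + 1
    return [mz_min + i * bin_width for i in range(n)]


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation on ascending xs, clamped outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)               # xs[j-1] <= x < xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def group_median(spectra: Sequence[Tuple[Sequence[float], Sequence[float]]],
                 grid: Sequence[float]) -> List[float]:
    """Pointwise median of several (mz, intensity) spectra on a shared grid."""
    resampled = [[interp(g, mz, inten) for g in grid] for mz, inten in spectra]
    return [statistics.median(column) for column in zip(*resampled)]


spectra = [([0, 1, 2], [0, 10, 20]), ([0, 2], [0, 40]), ([0, 1, 2], [0, 0, 0])]
print(group_median(spectra, make_grid(0.0, 2.0, 1.0)))
```

A smaller bin_width gives smoother median curves at the cost of more interpolation work per spectrum.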

data_dir: str

file_pattern: str = '*.txt'

mz_min: float = 0.0

mz_max: float = 1000.0

norm_tic: bool = False

bin_width: float = 0.001

alpha: float = 0.18

show_median: bool = True

show_group_cumulative: bool = True

figsize: Tuple[int, int] = (10, 6)

cumulative_figsize: Tuple[int, int] = (10, 4)

color_map: Dict[str, str] | None = None

save_fig: bool = False

save_path: str = 'output_files/plots'

mioXpektron.plot_overlapping_peaks(data_dir, file_pattern, mz_min, mz_max, norm_tic=False, alpha=0.18, bin_width=0.001, show_median=True, show_group_cumulative=True)[source]

Overlay spectra with two colors (Cancer vs Control) inferred from file names.

DEPRECATED: This function is maintained for backwards compatibility. Use PlotPeaks class for new code.

Parameters:

data_dir (str) – Directory containing spectra.
file_pattern (str) – Glob pattern (e.g., “*.txt”).
mz_min (float) – m/z window to visualize.
mz_max (float) – m/z window to visualize.
norm_tic (bool, default False) – Normalize each spectrum by its TIC prior to plotting.
alpha (float, default 0.18) – Line transparency for individual spectra.
bin_width (float, default 0.001) – Common grid step for interpolation (used for medians/cumulative plots).
show_median (bool, default True) – If True, overlay per-group median curves (thicker lines).
show_group_cumulative (bool, default True) – If True, plot per-group cumulative intensity curves on a separate figure.

Notes

Group detection is based on substrings in filenames: “_CC” (Cancer), “_CT” (Control).
Files without these markers are labeled “Unknown” and plotted in grey.

Examples

>>> # New recommended approach
>>> config = PlotPeaksConfig(
...     data_dir="data/spectra",
...     mz_min=100.0,
...     mz_max=200.0
... )
>>> plotter = PlotPeaks(config)
>>> plotter.load_data()
>>> plotter.plot_all()

class mioXpektron.FlexibleCalibratorDebug(config=None)[source]

Bases: FlexibleCalibrator

Debug variant of FlexibleCalibrator with verbose peak-picking.

Inherits all calibration logic and overrides only _autodetect_channels to route through the diagnostic versions of _enhanced_pick_channels and _parabolic_peak_center.

Parameters:: config (FlexibleCalibConfig | None)

mioXpektron.FlexibleCalibConfigDebug: alias of FlexibleCalibConfig

class mioXpektron.BatchTicNorm(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]

Bases: object

Batch TIC normalization for multiple spectra files using Polars and concurrent.futures.

Supports both CSV and TXT file formats:

- CSV: Uses ‘corrected_intensity’ if available, otherwise ‘intensity’
- TXT: Tab-separated m/z and intensity values

Output files contain: channel, mz, intensity (normalized)

Parameters:

input_pattern (str)
output_dir (str)
normalization_target (float)
n_workers (int)
verbose (bool)

__init__(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]

Initialize BatchTicNorm processor.

Parameters:

input_pattern (str) – Glob pattern for input files (e.g., ‘data/*.csv’ or ‘data/*.txt’)
output_dir (str) – Directory to save normalized files
normalization_target (float) – Target TIC value for normalization (default: 1e6)
n_workers (int) – Number of parallel workers (default: -1)
verbose (bool) – Print progress information

process()[source]

Process all files using concurrent.futures.

Returns:: List of output file paths that were successfully created
Return type:: List[str]

get_tic_statistics()[source]

Calculate TIC statistics for all input files before normalization.

Returns:: DataFrame with columns: filename, tic_original, tic_million
Return type:: pl.DataFrame

mioXpektron.normalize(intensities, method='tic', **kwargs)[source]

Apply a named normalization method to a 1-D intensity array.

Parameters:

intensities (array-like) – Raw intensity values (1-D).
method (str, default "tic") – Name of the normalization method. Call normalization_method_names() for the full list.
**kwargs – Method-specific keyword arguments forwarded to the underlying function (e.g. target_tic for TIC, reference_mz_idx for selected-ion normalization).

Returns:

Normalized intensity values.

Return type:

np.ndarray

Raises:

ValueError – If method is not recognised.
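
Name-based dispatch of this kind is typically built on a small registry. The sketch below illustrates the pattern with two toy methods. Only "tic" and its target_tic keyword come from this reference; the "max" method and every function name here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def _tic(values: Sequence[float], target_tic: float = 1e6) -> List[float]:
    total = sum(values)
    return [v * target_tic / total for v in values]


def _max(values: Sequence[float]) -> List[float]:
    peak = max(values)
    return [v / peak for v in values]


# Registry mapping method names to callables ("max" is illustrative only).
_METHODS: Dict[str, Callable[..., List[float]]] = {"tic": _tic, "max": _max}


def method_names() -> List[str]:
    """Sorted list of registered method names."""
    return sorted(_METHODS)


def normalize_sketch(intensities: Sequence[float], method: str = "tic", **kwargs) -> List[float]:
    """Look up the method by name and forward method-specific kwargs."""
    try:
        func = _METHODS[method]
    except KeyError:
        raise ValueError(
            f"Unknown normalization method {method!r}; choose from {method_names()}"
        ) from None
    return func(list(intensities), **kwargs)


print(method_names())
print(normalize_sketch([1.0, 4.0], method="max"))
```

Forwarding **kwargs untouched keeps the dispatcher ignorant of each method's parameters, so adding a method only means adding a registry entry.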

mioXpektron.normalization_method_names()[source]

Return a sorted list of available 1-D normalization method names.

Return type:: List[str]

class mioXpektron.NormalizationEvaluator(files=<factory>, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_clusters=None, cluster_bootstrap_rounds=30, cluster_bootstrap_frac=0.8, random_state=0, compute_supervised=True, n_jobs=-1, group_patterns=None, group_fn=None)[source]

Bases: object

Evaluate normalization methods on labelled ToF-SIMS spectra.

Parameters:

files (list of str or Path) – Paths or glob patterns expanding to spectrum text files.
methods (list of str, optional) – Normalization method names. Defaults to a sensible subset.
method_kwargs_map (dict, optional) – {method_name: {kwarg: value, ...}} for method-specific params.
mz_min (float, optional) – m/z range to import.
mz_max (float, optional) – m/z range to import.
n_clusters (int, optional) – Number of clusters for KMeans evaluation. Auto-detected if omitted.
cluster_bootstrap_rounds (int) – Bootstrap rounds for stability metric.
random_state (int) – RNG seed for reproducibility.
compute_supervised (bool) – Run supervised classification (requires scikit-learn + >=2 groups).
n_jobs (int) – Parallel workers (joblib). -1 = all CPUs, 1 = sequential.
cluster_bootstrap_frac (float)
group_patterns (Dict[str, str] | None)
group_fn (Any | None)

Examples

>>> evaluator = NormalizationEvaluator(files=["data/*.txt"])
>>> summary = evaluator.evaluate()
>>> evaluator.plot()

files: List[str | Path]

methods: List[str] | None = None

method_kwargs_map: Dict[str, Dict[str, Any]] | None = None

mz_min: float | None = None

mz_max: float | None = None

n_clusters: int | None = None

cluster_bootstrap_rounds: int = 30

cluster_bootstrap_frac: float = 0.8

random_state: int = 0

compute_supervised: bool = True

n_jobs: int = -1

group_patterns: Dict[str, str] | None = None

group_fn: Any | None = None

evaluate()[source]

Evaluate all methods and return a scored DataFrame.

Returns:: One row per method, sorted by score_combined (descending). Includes raw metrics, z-scored metrics, and four composite scores.
Return type:: pd.DataFrame
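
How z-scored metrics can be folded into a single ranking is sketched below. This reference does not define the four composite scores, so this shows just one plausible equal-weight variant. The metric names, the direction flags, and the helper names are all assumptions.

```python
import statistics
from typing import Dict, List, Tuple


def zscores(values: List[float]) -> List[float]:
    """Standardize values across methods; constant metrics map to zeros."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def rank_methods(raw: Dict[str, Dict[str, float]],
                 higher_is_better: Dict[str, bool]) -> List[Tuple[str, float]]:
    """Average direction-corrected z-scores and sort methods best-first."""
    methods = list(raw)
    combined = {m: 0.0 for m in methods}
    for metric, better_high in higher_is_better.items():
        z = zscores([raw[m][metric] for m in methods])
        for m, zi in zip(methods, z):
            combined[m] += zi if better_high else -zi   # flip "lower is better"
    n = len(higher_is_better)
    scored = [(m, combined[m] / n) for m in methods]
    return sorted(scored, key=lambda item: item[1], reverse=True)


raw = {
    "tic": {"silhouette": 0.6, "cv": 0.1},
    "none": {"silhouette": 0.2, "cv": 0.5},
}
print(rank_methods(raw, {"silhouette": True, "cv": False}))
```

Z-scoring first puts metrics with different units on a common scale, so no single metric dominates the average.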

plot(out_dir='normalization_selection_output', save=True)[source]

Generate evaluation plots (box plots, bar charts, radar).

Parameters:

out_dir (str or Path) – Sub-folder inside OUTPUT_DIR for saved figures.
save (bool) – Persist plots as PNG + PDF.

Returns:

Saved file paths.

Return type:

list of Path

print_summary(top_n=5)[source]

Print a ranked summary of evaluation results.

Parameters:: top_n (int, default 5) – Number of top methods to display per score variant.
Return type:: None

preview_overlay(file, methods=None, max_methods=5, mz_min=None, mz_max=None, save_to='normalization_selection_output')[source]

Plot raw vs normalised overlays for quick visual comparison.

Parameters:

file (str or Path) – Single spectrum file to visualise.
methods (list of str, optional) – Methods to overlay. Defaults to top methods from evaluation.
max_methods (int) – Cap on the number of overlays.
mz_min (float, optional) – m/z window for the plot.
mz_max (float, optional) – m/z window for the plot.
save_to (str, Path, or None) – Save directory (relative to OUTPUT_DIR). None skips saving.

Return type:

None

class mioXpektron.NormalizationMethods(mz_values, raw_intensities)[source]

Bases: object

Evaluate and apply normalization strategies for ToF-SIMS data.

Parameters:

mz_values (array-like) – The m/z axis shared by all spectra.
raw_intensities (array-like) – Raw intensity values aligned with mz_values.

apply(method='tic', **kwargs)[source]

Apply a named normalization to the stored spectrum.

Parameters:

method (str) – Normalization method name (see normalization_method_names()).
**kwargs – Method-specific keyword arguments.

Returns:

Normalized intensity array.

Return type:

np.ndarray
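
As an illustration of what a method name such as 'tic' conventionally denotes (scaling by the total ion count), the following is a minimal standard-library sketch. It is not mioXpektron's implementation: the function name tic_normalize and its target argument are hypothetical, and the library's own scaling target and edge-case handling may differ.

```python
from typing import List, Sequence


def tic_normalize(intensities: Sequence[float], target: float = 1.0) -> List[float]:
    """Scale a spectrum so that the sum of its intensities equals `target`.

    Hypothetical helper for illustration only.
    """
    total = float(sum(intensities))
    if total <= 0.0:
        # An empty or all-zero spectrum cannot be rescaled; return zeros.
        return [0.0 for _ in intensities]
    scale = target / total
    return [value * scale for value in intensities]


raw = [10.0, 30.0, 60.0]
normalized = tic_normalize(raw)
```

After scaling, the relative peak heights are unchanged and only the overall sum is fixed, which is why spectra with different total counts become comparable.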

compare_visual(methods=None, method_kwargs_map=None, mz_min=0, mz_max=500, sample_name='test', group=None, figsize=(12, 8), save_plot=True)[source]

Plot the raw spectrum alongside several normalized versions.

Parameters:

methods (list of str, optional) – Normalization methods to overlay. Defaults to a curated set.
method_kwargs_map (dict, optional) – {method: {kwarg: value}} for method-specific parameters.
mz_min (float) – Lower m/z bound of the preview window.
mz_max (float) – Upper m/z bound of the preview window.
sample_name (str) – Label used for file naming.
group (str or None) – Group identifier.
figsize (tuple) – Figure size.
save_plot (bool) – Persist the rendered figure.

Return type:

normalize_and_check(method='tic', method_kwargs=None, *, sample_name='test', group=None, mz_min=0, mz_max=500, show_peaks=False, peak_height=1000, peak_prominence=50, min_peak_width=1, max_peak_width=None, figsize=(10, 6), save_plot=True)[source]

Apply one normalization and visualise the result with peak overlay.

Parameters:

method (str) – Normalization method.
method_kwargs (dict, optional) – Extra kwargs forwarded to normalize().
sample_name (str) – Sample name used in plot labels.
group (str or None) – Group identifier used in plot labels.
mz_min (float) – Lower bound of the m/z window for the plot.
mz_max (float) – Upper bound of the m/z window for the plot.
show_peaks (bool) – Annotate detected peaks.
peak_height (float) – Peak-height setting for peak detection, passed to PlotPeak.
peak_prominence (float) – Peak-prominence setting for peak detection, passed to PlotPeak.
min_peak_width (int) – Minimum peak width for peak detection, passed to PlotPeak.
max_peak_width (int | None) – Maximum peak width for peak detection, passed to PlotPeak.
figsize (tuple) – Figure size.
save_plot (bool) – Persist the rendered figure.

Return type:

static evaluate(files, methods=None, method_kwargs_map=None, mz_min=None, mz_max=None, n_jobs=-1, compute_supervised=True, save_results=True)[source]

Evaluate normalization methods across multiple spectra files.

Thin wrapper around NormalizationEvaluator that runs evaluation, prints a summary, and optionally saves results.

Parameters:

files (list of str or Path) – Spectrum file paths or glob patterns.
methods (list of str, optional) – Method names to evaluate.
method_kwargs_map (dict, optional) – Per-method keyword arguments.
mz_min (float, optional) – Lower bound of the m/z range for data import.
mz_max (float, optional) – Upper bound of the m/z range for data import.
n_jobs (int) – Parallel workers (-1 = all CPUs).
compute_supervised (bool) – Run supervised classification (requires scikit-learn).
save_results (bool) – Save CSV + JSON + plots to OUTPUT_DIR.

Returns:

The evaluator instance (call .plot() for figures).

Return type:

NormalizationEvaluator

static available_methods()[source]

Return sorted list of available normalization method names.

Return type:

List[str]

class mioXpektron.PeakAlignIntensityArea(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)[source]

Bases: object

Process normalized ToF-SIMS spectra from CSV files, detect peaks, align them across samples, and calculate both intensity and area tables for each aligned m/z value.

Parameters:

mz_tolerance (float, optional (default=0.2)) – Maximum distance (in m/z units) for clustering peaks across samples.
mz_rounding_precision (int, optional (default=1)) – Number of decimal places for rounding aligned m/z values in output tables.
min_intensity (float, optional (default=1)) – Minimum intensity threshold for considering data points.
min_snr (float, optional (default=3)) – Minimum signal-to-noise ratio for peak detection.
min_distance (int, optional (default=2)) – Minimum distance (in data points) between peaks.
peak_height (float, optional (default=50)) – Minimum peak height for initial peak detection.
prominence (float, optional (default=10)) – Minimum prominence for peak detection.
min_peak_width (int, optional (default=1)) – Minimum peak width (in data points).
max_peak_width (int, optional (default=75)) – Maximum peak width (in data points).
width_rel_height (float, optional (default=0.5)) – Relative height for peak width calculation (0.5 = FWHM).
noise_model ({"global", "mz_binned"}, optional (default="global")) – Noise model used for threshold estimation.
noise_bins (int, optional (default=20)) – Number of m/z bins when using noise_model="mz_binned".
noise_min_points (int, optional (default=25)) – Minimum positive noise points per bin for the local model.
method (str or None, optional (default=None)) – Peak-detection / fitting method. None uses simple local-max detection (detect_peaks_with_area_v2), 'cwt' uses CWT detection, and 'Gaussian' / 'Lorentzian' / 'Voigt' use curve-fit detection via robust_peak_detection.
deconvolution_min_bic_delta (float, optional (default=10.0)) – Minimum BIC improvement required before accepting a two-Gaussian deconvolution over a single-peak fit.
deconvolution_overlap_factor (float, optional (default=0.75)) – Scale factor applied to the mean measured peak width when deriving the adaptive deconvolution spacing gate.
deconvolution_replace_singles (bool, optional (default=True)) – If True, replace overlapping single-peak fits with the accepted deconvoluted components in the output table.
output_dir (str or None, optional) – Directory to save output CSV files. If None, files are not saved.
verbose (bool, optional (default=False)) – If True, print progress information.

Examples

>>> from mioXpektron.detection import PeakAlignIntensityArea
>>> import glob
>>>
>>> # Get all normalized spectra
>>> csv_files = glob.glob('output_files/normalized_spectra/*.csv')
>>>
>>> # Create analyzer instance
>>> analyzer = PeakAlignIntensityArea(
...     mz_tolerance=0.1,
...     min_snr=3,
...     output_dir='output_files/peak_analysis'
... )
>>>
>>> # Process with m/z cutoff
>>> intensity_table, area_table, peaks_df = analyzer.run(
...     csv_files,
...     mz_min=50,
...     mz_max=500
... )
>>>
>>> print(f"Detected {len(peaks_df)} peaks across {len(csv_files)} samples")
>>> print(f"Aligned to {intensity_table.shape[1]} unique m/z values")

__init__(mz_tolerance=0.2, mz_rounding_precision=1, min_intensity=1, min_snr=3, min_distance=2, peak_height=50, prominence=10, min_peak_width=1, max_peak_width=75, width_rel_height=0.5, noise_model='global', noise_bins=20, noise_min_points=25, method=None, deconvolution_min_bic_delta=10.0, deconvolution_overlap_factor=0.75, deconvolution_replace_singles=True, output_dir=None, verbose=False, group_patterns=None, group_fn=None)[source]

Initialize the PeakAlignIntensityArea analyzer. All arguments are optional and default to the values shown in the signature.

run(csv_files, mz_min=None, mz_max=None)[source]

Process CSV files and perform peak detection, alignment, and quantification.

Parameters:

csv_files (list of str) – List of paths to normalized spectrum CSV files. Each CSV should have the columns 'channel', 'mz', and 'intensity'.
mz_min (float or None, optional) – Minimum m/z value to consider for peak detection. If None, use full range.
mz_max (float or None, optional) – Maximum m/z value to consider for peak detection. If None, use full range.

Returns:

intensity_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak intensities (amplitudes). Missing peaks are filled with 0.
area_table (pd.DataFrame) – DataFrame with samples as rows and aligned m/z values as columns, containing peak areas. Missing peaks are filled with 0.
peaks_df (pd.DataFrame) – DataFrame containing all detected peaks with their properties before alignment.
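
To make the shape of the returned tables concrete, here is a hedged, standard-library sketch of aligning per-sample peaks within an m/z tolerance and filling missing entries with 0. It is not the algorithm used by run(): the function align_peaks, the chaining rule (a peak joins the current cluster when it lies within the tolerance of the previous peak), and the use of nested dicts instead of a DataFrame are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, float, float]  # (sample name, m/z, intensity)


def align_peaks(peaks: List[Peak], mz_tolerance: float = 0.2,
                precision: int = 1) -> Tuple[List[float], Dict[str, Dict[float, float]]]:
    """Group peaks whose m/z values chain within `mz_tolerance` (hypothetical helper)."""
    clusters: List[List[Peak]] = []
    for peak in sorted(peaks, key=lambda p: p[1]):
        # Join the current cluster if close enough to its last member.
        if clusters and peak[1] - clusters[-1][-1][1] <= mz_tolerance:
            clusters[-1].append(peak)
        else:
            clusters.append([peak])

    samples = sorted({p[0] for p in peaks})
    table: Dict[str, Dict[float, float]] = {s: {} for s in samples}
    columns: List[float] = []
    for cluster in clusters:
        # Column label: rounded mean m/z of the cluster.
        center = round(sum(p[1] for p in cluster) / len(cluster), precision)
        if center not in columns:
            columns.append(center)
        for s in samples:
            table[s].setdefault(center, 0.0)  # missing peaks are filled with 0
        for sample, _, intensity in cluster:
            table[sample][center] = max(table[sample][center], intensity)
    return columns, table


peaks = [("a", 27.02, 100.0), ("b", 27.05, 80.0), ("a", 41.00, 50.0)]
columns, table = align_peaks(peaks)
```

In this toy input, sample "b" has no peak near m/z 41, so its entry in that column is 0, mirroring the fill rule described for intensity_table and area_table.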

mioXpektron.auto_tune_calib_config(files, reference_masses, *, base_config=None, sample_n=10)[source]

Build a FlexibleCalibConfig with data-driven parameters.

Starts from base_config (or the default) and replaces tolerance, screening values, and breakpoints with adaptive estimates. The caller can further override any field afterwards.

Returns a FlexibleCalibConfig instance.

Parameters:

files (Sequence[str])
reference_masses (Sequence[float])
sample_n (int)

mioXpektron.estimate_autodetect_tolerance(files, reference_masses, *, sample_n=10, quantile=0.9)[source]

Estimate autodetect_tol_da from observed peak widths near calibrant m/z values.

Reads a sample of spectra, measures the FWHM of the strongest peak within ±1 Da of each reference mass, and returns the quantile-level quantile of those widths (the 0.9 quantile by default), clamped to [0.05, 2.0] Da.
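
The two steps described above (measure a full width at half maximum near a reference mass, then take a clamped quantile of the widths) can be sketched as follows. This is an illustrative reconstruction using only the standard library, not the library's code: fwhm_near, quantile, and tolerance_from_widths are hypothetical names, and the linear interpolation at the half-maximum crossings is an assumption.

```python
from typing import List, Optional, Sequence


def fwhm_near(mz: Sequence[float], intensity: Sequence[float],
              center: float, window: float = 1.0) -> Optional[float]:
    """FWHM of the strongest point within +/- `window` of `center` (hypothetical helper)."""
    idx = [i for i, m in enumerate(mz) if abs(m - center) <= window]
    if not idx:
        return None
    apex = max(idx, key=lambda i: intensity[i])
    half = intensity[apex] / 2.0
    if half <= 0:
        return None

    def crossing(step: int) -> float:
        # Walk away from the apex while the signal stays at or above half maximum.
        i = apex
        while 0 <= i + step < len(mz) and intensity[i + step] >= half:
            i += step
        j = i + step
        if not 0 <= j < len(mz):
            return mz[i]  # peak runs off the end of the axis
        # Linear interpolation between point i (>= half) and point j (< half).
        frac = (intensity[i] - half) / (intensity[i] - intensity[j])
        return mz[i] + frac * (mz[j] - mz[i])

    return crossing(+1) - crossing(-1)


def quantile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of a non-empty sequence."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def tolerance_from_widths(widths: List[float], q: float = 0.9,
                          lo: float = 0.05, hi: float = 2.0) -> float:
    """Clamp the q-quantile of the measured widths to [lo, hi] Da."""
    return min(max(quantile(widths, q), lo), hi)


# Toy triangular peak centred at m/z 27.0.
mz = [26.8, 26.9, 27.0, 27.1, 27.2]
intensity = [0.0, 50.0, 100.0, 50.0, 0.0]
width = fwhm_near(mz, intensity, 27.0)
```

Using a high quantile rather than the mean keeps the tolerance wide enough for the broader calibrant peaks, while the clamp guards against degenerate estimates from very narrow or very wide peaks.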

Parameters:

files (Sequence[str])
reference_masses (Sequence[float])
sample_n (int)
quantile (float)

Return type:

mioXpektron.estimate_bootstrap_heuristics(files, *, sample_n=10)[source]

Derive adaptive bootstrap peak-matching constants from observed channel statistics (noise, spacing, range).

Returns a dict whose keys match the _BOOTSTRAP_* constant names in _models.py (without the leading underscore).

Parameters:

files (Sequence[str])
sample_n (int)

Return type:

Dict[str, float]

mioXpektron.estimate_denoise_params(files, *, sample_n=5)[source]

Estimate hf_cutoff_frac and max_peaks for the denoise selection evaluator from pilot spectra.

Returns a dict of keyword overrides for compare_denoising_methods.

Parameters:

files (Sequence[str])
sample_n (int)

Return type:

Dict[str, object]

mioXpektron.estimate_flat_params(files, *, sample_n=10)[source]

Estimate savgol_window and quantile thresholds for FlatParams from the data.

Returns a dict of keyword overrides suitable for dataclasses.replace(FlatParams(), **result).
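
The override pattern mentioned above can be sketched with a stand-in dataclass. FlatParamsStandIn and its low_quantile field are hypothetical placeholders (the real FlatParams fields may differ), and the overrides dict is a made-up value standing in for the function's return value.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FlatParamsStandIn:
    """Hypothetical stand-in for FlatParams; field names other than
    savgol_window are invented for this example."""
    savgol_window: int = 11
    low_quantile: float = 0.1


# Pretend this dict came from estimate_flat_params(files).
overrides = {"savgol_window": 21}

# replace() builds a new instance, changing only the listed fields.
params = replace(FlatParamsStandIn(), **overrides)
```

Because replace() returns a new object, the defaults stay untouched and any field absent from the overrides keeps its default value.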

Parameters:

files (Sequence[str])
sample_n (int)

Return type:

Dict[str, object]

mioXpektron.estimate_multisegment_breakpoints(reference_masses, n_segments=3)[source]

Place segment breakpoints at quantile boundaries of the reference mass range so each segment contains roughly equal calibrant counts.

Parameters:

reference_masses (Sequence[float])
n_segments (int)

Return type:

List[float]
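
A minimal sketch of quantile-based breakpoint placement, assuming the breakpoints are the n_segments - 1 interior quantile cut points of the reference masses. The helper name quantile_breakpoints and the choice of the "inclusive" interpolation method are assumptions; the library may compute its quantiles differently.

```python
import statistics
from typing import List, Sequence


def quantile_breakpoints(reference_masses: Sequence[float],
                         n_segments: int = 3) -> List[float]:
    """Return n_segments - 1 cut points that split the masses into
    roughly equal-count groups (hypothetical helper)."""
    if n_segments < 2 or len(reference_masses) < 2:
        return []  # nothing to split
    return statistics.quantiles(sorted(reference_masses),
                                n=n_segments, method="inclusive")


# Made-up reference masses, evenly spaced for readability.
masses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
breakpoints = quantile_breakpoints(masses)
```

With evenly spaced masses the cut points land on existing values. With clustered calibrants they move toward the dense region, so each segment still has enough points for its own fit.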

mioXpektron.estimate_mz_tolerance(files, *, sample_n=10, multiplier=3.0)[source]

Estimate mz_tolerance from the observed median m/z spacing, scaled by multiplier. The result is clamped to [0.01, 1.0].
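
The rule above amounts to a single line of arithmetic: tolerance = clamp(multiplier × median spacing, 0.01, 1.0). The sketch below works on in-memory m/z axes rather than files. The helper name and the decision to pool spacings across all sampled spectra before taking the median are assumptions.

```python
import statistics
from typing import List, Sequence


def mz_tolerance_from_spacing(mz_axes: Sequence[Sequence[float]],
                              multiplier: float = 3.0,
                              lo: float = 0.01, hi: float = 1.0) -> float:
    """Clamp multiplier * median(m/z spacing) to [lo, hi] (hypothetical helper)."""
    spacings: List[float] = []
    for mz in mz_axes:
        # Keep only strictly positive steps between consecutive m/z values.
        spacings.extend(b - a for a, b in zip(mz, mz[1:]) if b > a)
    if not spacings:
        raise ValueError("need at least one spectrum with two increasing m/z values")
    estimate = multiplier * statistics.median(spacings)
    return min(max(estimate, lo), hi)


tol = mz_tolerance_from_spacing([[10.0, 10.25, 10.5]])
```

Here the median spacing is 0.25, so the default multiplier of 3.0 gives 0.75, which lies inside the clamp range and is returned unchanged.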

Parameters:

files (Sequence[str])
sample_n (int)
multiplier (float)

Return type:

mioXpektron.estimate_normalization_target(files, *, sample_n=20, mz_min=None, mz_max=None)[source]

Estimate normalization_target as the median raw TIC across a sample of spectra. Falls back to 1e6 on failure.
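
A hedged sketch of the median-TIC estimate on in-memory spectra. The helper name median_tic, the (mz, intensity) pair input format, and the interpretation of "failure" as "no spectrum with a positive total" are assumptions; the real function reads files and may define failure differently.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Spectrum = Tuple[Sequence[float], Sequence[float]]  # (m/z values, intensities)


def median_tic(spectra: Sequence[Spectrum],
               mz_min: Optional[float] = None,
               mz_max: Optional[float] = None,
               fallback: float = 1e6) -> float:
    """Median total ion count over the optional m/z window (hypothetical helper)."""
    tics: List[float] = []
    for mz, intensity in spectra:
        total = sum(
            y for x, y in zip(mz, intensity)
            if (mz_min is None or x >= mz_min) and (mz_max is None or x <= mz_max)
        )
        if total > 0:
            tics.append(total)
    # Assumed fallback rule: no usable spectra means return the default target.
    return statistics.median(tics) if tics else fallback


spectra = [
    ([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]),   # TIC 60
    ([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]),      # TIC 3
    ([1.0, 2.0, 3.0], [100.0, 0.0, 0.0]),    # TIC 100
]
target = median_tic(spectra)
```

The median is less sensitive than the mean to a few unusually bright or dim acquisitions, which makes it a reasonable common scale for the whole batch.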

Parameters:

files (Sequence[str])
sample_n (int)
mz_min (float | None)
mz_max (float | None)

Return type:

mioXpektron.estimate_outlier_threshold(residuals, *, target_false_rejection_rate=0.01, bounds=(2.0, 5.0))[source]

Derive outlier_threshold from observed residual spread.

Uses the empirical quantile corresponding to 1 - target_false_rejection_rate of absolute z-scores (MAD-scaled), clamped to bounds.
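
The rule above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper name is hypothetical, the MAD is scaled by 1.4826 (the usual consistency factor for normally distributed data), the quantile is linearly interpolated, and a zero MAD returns the lower bound.

```python
import statistics
from typing import Sequence, Tuple


def mad_outlier_threshold(residuals: Sequence[float],
                          target_false_rejection_rate: float = 0.01,
                          bounds: Tuple[float, float] = (2.0, 5.0)) -> float:
    """Clamped quantile of MAD-scaled absolute z-scores (hypothetical helper)."""
    lower, upper = bounds
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # assumed scaling; matches the std. dev. for normal data
    if scale == 0.0:
        return lower  # assumed behaviour when the residuals have no spread
    z = sorted(abs(r - center) / scale for r in residuals)
    # Linearly interpolated (1 - rate) quantile of the absolute z-scores.
    pos = (1.0 - target_false_rejection_rate) * (len(z) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(z) - 1)
    q_value = z[lo] + (pos - lo) * (z[hi] - z[lo])
    return min(max(q_value, lower), upper)


tight = mad_outlier_threshold([-2.0, -1.0, 0.0, 1.0, 2.0])
heavy = mad_outlier_threshold([-1.0, 0.0, 1.0, 0.0, 100.0])
```

In the first call the largest scaled residual is below 2, so the lower bound applies. In the second a single extreme residual pushes the quantile far above 5, so the upper bound applies.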

Parameters:

residuals (ndarray[tuple[Any, ...], dtype[float64]])
target_false_rejection_rate (float)
bounds (Tuple[float, float])

Return type: