Utilities

The utils module provides file I/O, batch processing orchestration, and statistical analysis tools.

Data Import

Load ToF-SIMS spectra from text files:

from mioXpektron.utils import import_data

mz, intensity, sample_name, group = import_data(
    "spectrum.txt",
    mz_min=1.0,
    mz_max=300.0,
)

The importer:

  • Auto-detects separators (tab, comma, space)

  • Skips comment lines (#, //)

  • Infers sample names from filenames

  • Infers sample groups from filename patterns

  • Supports optional m/z range filtering
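
The behaviours listed above can be sketched in plain Python. This is an illustration only, not the library's implementation: `parse_spectrum_text` and `sample_name_from_path` are hypothetical names, and skipping rows that do not parse as numbers (such as a header row) is an assumption.

```python
import re
from pathlib import Path


def parse_spectrum_text(text, mz_min=None, mz_max=None):
    """Hypothetical sketch: parse two-column spectrum text into (mz, intensity) lists."""
    mz, intensity = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines starting with '#' or '//'.
        if not line or line.startswith(("#", "//")):
            continue
        # Split on tabs, commas, or runs of spaces.
        fields = [f for f in re.split(r"[,\t ]+", line) if f]
        try:
            x, y = float(fields[0]), float(fields[1])
        except (ValueError, IndexError):
            # Assumption: header rows and malformed lines are ignored.
            continue
        # Optional inclusive m/z range filter.
        if mz_min is not None and x < mz_min:
            continue
        if mz_max is not None and x > mz_max:
            continue
        mz.append(x)
        intensity.append(y)
    return mz, intensity


def sample_name_from_path(path):
    """Sample name taken as the filename without its extension."""
    return Path(path).stem
```

For instance, a file containing a `#` comment, a header row, and rows separated by tabs, commas, or spaces would yield only the numeric rows whose m/z lies inside the requested range.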

Batch Processing

Run parallel peak extraction and alignment across many spectra:

from mioXpektron.utils import batch_processing

peaks_df, intensity_df, area_df = batch_processing(
    file_list,
    max_workers=4,
    mz_min=1.0,
    mz_max=300.0,
    normalization_target=1e6,
    mz_tolerance=0.2,
)
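
To give a feel for what `max_workers`, `normalization_target`, and `mz_tolerance` might control, here is a minimal sketch. It assumes that normalization rescales each spectrum so its intensities sum to the target, and that alignment groups peaks whose m/z values fall within the tolerance of a bin's first member. Both are assumptions; the library's actual algorithms may differ, and the helper names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_to_target(intensity, target=1e6):
    """Assumed behaviour: scale intensities so that they sum to `target`."""
    total = sum(intensity)
    if total == 0:
        return list(intensity)
    return [v * target / total for v in intensity]


def align_peaks(peak_lists, mz_tolerance=0.2):
    """Assumed behaviour: merge peak m/z values from several spectra into shared bins.

    A peak joins the current bin if it lies within `mz_tolerance` of the
    bin's first (smallest) member; otherwise it starts a new bin.
    Returns the mean m/z of each bin.
    """
    all_mz = sorted(mz for peaks in peak_lists for mz in peaks)
    bins = []
    for mz in all_mz:
        if bins and mz - bins[-1][0] <= mz_tolerance:
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return [sum(b) / len(b) for b in bins]


# Process several spectra in parallel, mirroring the role of max_workers.
spectra = [[1.0, 3.0], [2.0, 2.0]]
with ThreadPoolExecutor(max_workers=4) as pool:
    normalized = list(pool.map(normalize_to_target, spectra))

# Align peak positions detected in two spectra onto a common m/z axis.
shared_axis = align_peaks([[12.00, 27.10], [12.05, 27.00, 41.0]], mz_tolerance=0.2)
```

A tighter tolerance keeps neighbouring peaks separate at the cost of splitting the same peak across spectra when calibration drifts; a looser one does the opposite.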

Statistical Analysis

The analysis submodule provides tools for downstream statistical analysis of aligned peak matrices.

Benjamini-Hochberg FDR Correction

from mioXpektron.utils.analysis import bh_fdr

q_values = bh_fdr(p_values)
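
For reference, the standard Benjamini-Hochberg procedure can be written in a few lines of plain Python. This is a sketch of the textbook algorithm, not the source of `bh_fdr`; the name `bh_fdr_sketch` is hypothetical, and details such as NaN handling in the library version are not covered here.

```python
def bh_fdr_sketch(p_values):
    """Textbook Benjamini-Hochberg adjustment, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * m / rank over all ranks at or above its own (monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

Each q-value is the smallest false discovery rate at which the corresponding test would be called significant, so thresholding q at 0.05 controls the expected proportion of false positives among the selected features at 5%.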

Univariate Testing

from mioXpektron.utils.analysis import compute_univariate_tests

results = compute_univariate_tests(intensity_df, groups)

Visualization Helpers

The analysis submodule also includes plotting utilities for:

  • Volcano plots

  • PCA and UMAP projections

  • ROC curves

  • Heatmaps

API Reference

mioXpektron.utils.import_data(file_path, mz_min=None, mz_max=None, group_patterns=None, group_fn=None)

Import ToF-SIMS data from a spectrum file.

Parameters:
  • file_path (str) – Path to the ToF-SIMS data file. Two layouts are supported: tab-delimited .txt exports with m/z and Intensity columns, and CSV exports with an mz column plus either a corrected_intensity or an intensity column.

  • mz_min (float, optional) – Minimum m/z value to be imported (inclusive).

  • mz_max (float, optional) – Maximum m/z value to be imported (inclusive).

  • group_patterns (dict[str, str], optional) – Mapping of {regex_pattern: group_label}. Patterns are tested against the sample name (filename without extension) in order; the first match determines the group. Defaults to {'_CC...': 'Cancer', '_CT...': 'Control'}.

  • group_fn (callable, optional) – A function (sample_name: str) -> str that returns the group label directly. When provided, it takes priority over group_patterns.

Returns:

  • mz (np.ndarray) – Mass-to-charge ratio values.

  • intensity (np.ndarray) – Intensity values.

  • sample_name (str) – Sample name extracted from the filename (without extension).

  • group (str) – Group label derived from the filename.

Return type:

Tuple[ndarray, ndarray, str, str]
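
The group-resolution rules described for group_patterns and group_fn (group_fn wins when given; otherwise the first matching pattern decides) can be sketched as follows. `infer_group` is a hypothetical helper, the example patterns are illustrative rather than the library's defaults, and the "Unknown" fallback for unmatched names is an assumption.

```python
import re


def infer_group(sample_name, group_patterns=None, group_fn=None, default="Unknown"):
    """Hypothetical sketch of resolving a group label from a sample name."""
    # A user-supplied function takes priority over any patterns.
    if group_fn is not None:
        return group_fn(sample_name)
    # Dicts preserve insertion order, so patterns are tested in the order given.
    for pattern, label in (group_patterns or {}).items():
        if re.search(pattern, sample_name):
            return label
    # Assumption: names matching no pattern get a fallback label.
    return default


# Illustrative patterns only; not the library's default mapping.
patterns = {r"^tumor": "Tumor", r"^ctrl": "Control"}
label_from_patterns = infer_group("tumor_01", group_patterns=patterns)
label_from_fn = infer_group(
    "tumor_01",
    group_patterns=patterns,
    group_fn=lambda name: name.split("_")[0].upper(),
)
```

Passing a callable is the more flexible route when group membership depends on something a single regular expression cannot express, such as a lookup in a metadata table.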