mioXpektron.normalization.preprocessing

Functions

batch_tic_norm(input_pattern[, output_dir, ...])

Batch‑import and preprocess multiple ToF‑SIMS spectra, then save the (m/z, normalized_intensity) arrays for each file as a tab‑separated text file in output_dir.

data_preprocessing(file_path[, mz_min, ...])

Import and preprocess ToF-SIMS data from a text file.

resample_spectrum(mz_values, ...[, method])

Resample a spectrum onto a target m/z grid.

Classes

BatchTicNorm(input_pattern[, output_dir, ...])

Batch TIC normalization for multiple spectra files using Polars and concurrent.futures.

mioXpektron.normalization.preprocessing.resample_spectrum(mz_values, intensity_values, target_mz, method='linear')[source]

Resample a spectrum onto a target m/z grid.

The input axis is sorted, duplicate m/z positions are collapsed to their first occurrence, and values outside the native m/z range are filled with zero. Supported interpolation methods are linear, pchip, akima, makima, and cubic.
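The steps described above can be illustrated for the linear case with NumPy alone. This is a minimal sketch, not the library's implementation: the function name `resample_linear_sketch` is hypothetical, and it assumes that "first occurrence" means the first duplicate encountered after a stable sort.

```python
import numpy as np


def resample_linear_sketch(mz_values, intensity_values, target_mz):
    """Hypothetical stand-in for the linear case of resample_spectrum."""
    mz = np.asarray(mz_values, dtype=float)
    inten = np.asarray(intensity_values, dtype=float)

    # Sort the input axis; a stable sort keeps duplicates in input order.
    order = np.argsort(mz, kind="stable")
    mz, inten = mz[order], inten[order]

    # Collapse duplicate m/z positions to their first occurrence.
    mz_unique, first_idx = np.unique(mz, return_index=True)
    inten_unique = inten[first_idx]

    # Linear interpolation; points outside the native range become zero.
    return np.interp(target_mz, mz_unique, inten_unique, left=0.0, right=0.0)


resampled = resample_linear_sketch(
    mz_values=[3.0, 1.0, 2.0, 2.0],
    intensity_values=[30.0, 10.0, 20.0, 99.0],
    target_mz=[0.5, 1.5, 2.0, 3.5],
)
print(resampled)
```

The duplicate at m/z 2.0 keeps the intensity 20.0 and drops 99.0, and the two target points outside the range 1.0 to 3.0 are filled with zero. The non-linear methods would replace only the final interpolation step.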

mioXpektron.normalization.preprocessing.data_preprocessing(file_path, mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=True, return_all=False)[source]

Import and preprocess ToF-SIMS data from a text file.

Parameters:

file_path : str

Path to the ToF-SIMS data file

mz_min, mz_max : float, optional

m/z range to import

normalization_target : float or None

Target TIC for normalization, or None to skip

verbose : bool

Print progress if True

return_all : bool

If True, return all intermediate arrays

Returns:

mz_values : numpy.ndarray

normalized_intensities : numpy.ndarray

sample_name : str

group : str

(optionally: intermediate arrays)
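TIC normalization rescales a spectrum so that its summed intensity equals the target value. The sketch below shows that arithmetic together with the optional m/z window. It assumes the window is applied before the scaling and that an all-zero spectrum is left unscaled; `tic_normalize_sketch` is a hypothetical name and not part of this module.

```python
import numpy as np


def tic_normalize_sketch(mz, intensities, mz_min=None, mz_max=None,
                         normalization_target=1e6):
    """Hypothetical helper: crop to an m/z window, then rescale to a target TIC."""
    mz = np.asarray(mz, dtype=float)
    intensities = np.asarray(intensities, dtype=float)

    # Keep only points inside the requested m/z window (bounds are optional).
    keep = np.ones(mz.shape, dtype=bool)
    if mz_min is not None:
        keep &= mz >= mz_min
    if mz_max is not None:
        keep &= mz <= mz_max
    mz, intensities = mz[keep], intensities[keep]

    # normalization_target=None means "skip normalization".
    if normalization_target is None:
        return mz, intensities

    tic = intensities.sum()
    if tic <= 0:
        # Assumption: leave an empty or all-zero spectrum unscaled.
        return mz, intensities
    return mz, intensities * (normalization_target / tic)


mz_out, norm = tic_normalize_sketch(
    mz=[10.0, 20.0, 30.0, 40.0],
    intensities=[1.0, 2.0, 3.0, 4.0],
    mz_min=15.0,
)
mz_raw, raw = tic_normalize_sketch(
    mz=[10.0, 20.0], intensities=[1.0, 2.0], normalization_target=None
)
print(mz_out, norm.sum())
```

After cropping, the remaining intensities 2, 3 and 4 are each multiplied by 1e6 / 9, so relative peak heights are preserved while the total becomes the target.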

mioXpektron.normalization.preprocessing.batch_tic_norm(input_pattern, output_dir='normalized_spectra', mz_min=None, mz_max=None, normalization_target=1000000.0, verbose=False)[source]

Batch‑import and preprocess multiple ToF‑SIMS spectra, then save the (m/z, normalized_intensity) arrays for each file as a tab‑separated text file in output_dir.

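Parameters:
  • input_pattern (str) – Glob pattern for input files

  • output_dir (str) – Directory to save normalized files

  • mz_min, mz_max (float, optional) – m/z range to import

  • normalization_target (float) – Target TIC value for normalization

  • verbose (bool) – Print progress information

Returns: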

Paths of the files written, in processing order.

Return type:

List[str]
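The overall shape of such a batch run (expand a glob pattern, normalize each file, write a tab-separated result, collect the output paths) can be sketched as follows. This is an illustration under stated assumptions rather than the function's actual code: the input files are assumed to be headerless two-column tab-separated text, the output file is assumed to reuse the input file name, and `write_spectra_sketch` is a hypothetical name.

```python
import glob
import os
import tempfile

import numpy as np


def write_spectra_sketch(input_pattern, output_dir, normalization_target=1e6):
    """Hypothetical batch loop: read, TIC-normalize, write TSV, return paths."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    # sorted() makes the processing order deterministic.
    for path in sorted(glob.glob(input_pattern)):
        data = np.loadtxt(path, delimiter="\t", ndmin=2)
        mz, intensity = data[:, 0], data[:, 1]
        tic = intensity.sum()
        if tic > 0:
            intensity = intensity * (normalization_target / tic)
        # Assumption: the output keeps the input file name.
        out_path = os.path.join(output_dir, os.path.basename(path))
        np.savetxt(out_path, np.column_stack([mz, intensity]), delimiter="\t")
        written.append(out_path)
    return written


# Demonstration on two small synthetic spectra in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("b.txt", [(1.0, 2.0), (2.0, 6.0)]),
                       ("a.txt", [(1.0, 5.0), (2.0, 5.0)])]:
        np.savetxt(os.path.join(tmp, name), np.array(rows), delimiter="\t")
    out_paths = write_spectra_sketch(os.path.join(tmp, "*.txt"),
                                     os.path.join(tmp, "normalized_spectra"))
    out_names = [os.path.basename(p) for p in out_paths]
    first_tic = np.loadtxt(out_paths[0], delimiter="\t")[:, 1].sum()

print(out_names, first_tic)
```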

class mioXpektron.normalization.preprocessing.BatchTicNorm(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]

Bases: object

Batch TIC normalization for multiple spectra files using Polars and concurrent.futures.

Supports both CSV and TXT file formats:
  • CSV: Uses ‘corrected_intensity’ if available, otherwise ‘intensity’

  • TXT: Tab-separated m/z and intensity values

Output files contain: channel, mz, intensity (normalized)
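The CSV column fallback described above amounts to a simple preference order. The sketch below shows one way to express it with the standard library only (the class itself is documented as using Polars). The helper names are hypothetical, and raising KeyError when neither column exists is an assumption.

```python
import csv
import io


def pick_intensity_column(fieldnames):
    """Hypothetical helper: prefer 'corrected_intensity', else 'intensity'."""
    if "corrected_intensity" in fieldnames:
        return "corrected_intensity"
    if "intensity" in fieldnames:
        return "intensity"
    # Assumption: a file with neither column is treated as an error.
    raise KeyError("no intensity column found")


def read_csv_intensities(text):
    """Parse CSV text and return the chosen column name and its values."""
    reader = csv.DictReader(io.StringIO(text))
    column = pick_intensity_column(reader.fieldnames)
    return column, [float(row[column]) for row in reader]


raw_csv = "mz,intensity\n1.0,5\n2.0,7\n"
corrected_csv = "mz,intensity,corrected_intensity\n1.0,5,4.5\n2.0,7,6.5\n"

raw_result = read_csv_intensities(raw_csv)
corrected_result = read_csv_intensities(corrected_csv)
print(raw_result, corrected_result)
```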

Parameters:
  • input_pattern (str)

  • output_dir (str)

  • normalization_target (float)

  • n_workers (int)

  • verbose (bool)

__init__(input_pattern, output_dir='normalized_spectra', normalization_target=1000000.0, n_workers=-1, verbose=True)[source]

Initialize BatchTicNorm processor.

Parameters:
  • input_pattern (str) – Glob pattern for input files (e.g., ‘data/*.csv’ or ‘data/*.txt’)

  • output_dir (str) – Directory to save normalized files

  • normalization_target (float) – Target TIC value for normalization (default: 1e6)

  • n_workers (int) – Number of parallel workers (default: -1)

  • verbose (bool) – Print progress information

process()[source]

Process all files using concurrent.futures.

Returns:

List of output file paths that were successfully created

Return type:

List[str]
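A common pattern for this kind of parallel run with concurrent.futures is to submit one task per file and keep only the results of tasks that finish without raising. The sketch below illustrates that pattern. It is not the class's actual code: the use of a thread pool, the function names, and the choice to sort the returned paths are all assumptions, and `fake_worker` stands in for the real per-file normalization.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all_sketch(paths, worker, n_workers=4):
    """Hypothetical driver: run `worker` on every path, keep the successes."""
    succeeded = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Map each future back to its input path for error reporting.
        futures = {pool.submit(worker, p): p for p in paths}
        for future in as_completed(futures):
            try:
                succeeded.append(future.result())
            except Exception as exc:
                # A failed file is reported and skipped, not fatal.
                print(f"skipped {futures[future]}: {exc}")
    # Completion order is nondeterministic, so sort for a stable result.
    return sorted(succeeded)


def fake_worker(path):
    """Stand-in for per-file normalization; returns the would-be output path."""
    if path.endswith(".bad"):
        raise ValueError("unreadable file")
    return "normalized_spectra/" + path


outputs = process_all_sketch(["a.txt", "b.bad", "c.csv"], fake_worker)
print(outputs)
```

Catching the exception per future means that one unreadable file does not abort the batch, which matches a return value that lists only the files successfully created.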

get_tic_statistics()[source]

Calculate TIC statistics for all input files before normalization.

Returns:

DataFrame with columns: filename, tic_original, tic_million

Return type:

pl.DataFrame