Every spectrometer manufacturer has invented their own file format. Bruker stores FTIR spectra in proprietary binary .0 files. Thermo Fisher uses the SPC format from the Galactic Industries acquisition. Horiba stores Raman data in proprietary LabSpec binaries. Renishaw packs spectral maps into WDF binaries. Each format captures the same fundamental thing - an array of x-values paired with an array of y-values, plus metadata - but they are all incompatible with each other.
This creates a concrete engineering problem. If you are building software that processes spectra from multiple instruments - a clinical workflow platform, a multi-site data pipeline, a machine learning training system - you need to read all of them. And if you need to exchange data between systems, you need a common interchange format.
This article is the reference you bookmark. It covers every major spectral data format:
- Vendor-neutral interchange standards: JCAMP-DX, SPC, ANDI/netCDF
- Vendor-specific proprietary formats: Bruker OPUS, Renishaw WDF, Horiba LabSpec
- Plain CSV as the universal fallback
For each, you get the file structure, parsing code in Python, gotchas, and when to use it. At the end, a decision framework for choosing the right format for your use case.
The Format Landscape
Before diving into specifics, here is the taxonomy:
| Format | Extension | Type | Vendor | Standardized? | Multi-spectrum? |
|---|---|---|---|---|---|
| JCAMP-DX | .dx, .jdx | Text | Neutral | Yes (IUPAC) | Yes (NTUPLES, LINK) |
| SPC | .spc | Binary | Thermo/Galactic | De facto | Yes (subfiles) |
| ANDI/netCDF | .cdf | Binary | Neutral | Yes (ASTM) | Yes (variables) |
| Bruker OPUS | .0, .1, .2 | Binary | Bruker | No | Yes (data blocks) |
| Renishaw WDF | .wdf | Binary | Renishaw | No | Yes (spectral maps) |
| Horiba LabSpec | .l5s, .l6s, .ngs | Binary | Horiba | No | Yes |
| CSV/ASCII | .csv, .txt | Text | Neutral | No | By convention |
The standardized formats were designed for interchange. The proprietary formats were designed for the vendor's own software. You will encounter all of them.
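Since you will meet all of these in one pipeline, a first step is often to guess the format before choosing a parser. The sketch below is one way to do that, not an established API: the function name sniff_spectral_format and its return labels are made up for illustration. It relies on signatures that are well known (WDF files start with WDF1, classic netCDF with CDF, HDF5-based netCDF-4 with the HDF5 signature, JCAMP-DX with a ## record) and falls back to the extension. SPC has no magic string, so its version byte is only a weak hint and is checked last.

```python
from pathlib import Path


def sniff_spectral_format(head: bytes, filename: str = "") -> str:
    """Guess a spectral file format from its first bytes, then its extension."""
    if head.startswith(b"WDF1"):
        return "wdf"
    if head.startswith(b"CDF"):
        return "netcdf-classic"
    if head.startswith(b"\x89HDF\r\n\x1a\n"):
        return "hdf5"  # netCDF-4 files are HDF5 containers
    if head.lstrip().startswith(b"##"):
        return "jcamp-dx"

    ext = Path(filename).suffix.lstrip(".").lower()
    if ext in ("csv", "txt"):
        return "csv"
    if ext.isdigit():
        return "opus"  # Bruker sequence-number extension (.0, .1, ...)

    # Weak hint only: SPC stores a version byte at offset 1.
    if len(head) > 1 and head[1] in (0x4B, 0x4C, 0x4D):
        return "spc"
    return "unknown"
```

In practice you would read the first few hundred bytes of the file and pass them in together with the file name, then treat the result as a hint that the chosen parser still has to confirm.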
JCAMP-DX: The IUPAC Standard
JCAMP-DX (Joint Committee on Atomic and Molecular Physical Data - Data Exchange) is the only internationally standardized format for chemical spectral data. Published by IUPAC, it has been revised multiple times since its introduction in 1988, with the current version at 6.0.
File Structure
JCAMP-DX is a plain text format. Every file consists of labeled data records (LDRs), each starting with ## and a label name:
##TITLE= Polystyrene ATR-FTIR Spectrum
##JCAMP-DX= 5.01
##DATA TYPE= INFRARED SPECTRUM
##ORIGIN= Bruker OPUS 9.3
##OWNER= SpectraDx Lab
##XUNITS= 1/CM
##YUNITS= ABSORBANCE
##FIRSTX= 399.2308
##LASTX= 3999.6401
##DELTAX= 0.9643
##NPOINTS= 3734
##XFACTOR= 1.000000
##YFACTOR= 0.000001
##XYDATA= (X++(Y..Y))
399.2 1823456 1834567 1845678 1856789 1867890
404.1 1878901 1889012 1900123 1911234 1922345
...
##END=
The header records define the spectral metadata: data type (infrared, Raman, NIR, NMR, UV-Vis, mass spec), axis units, spectral range, and number of points. The data section contains the actual spectral values.
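To make the record structure concrete, here is a minimal sketch of reading the header records by hand. The helper names parse_ldr_header and implied_x_axis are illustrative, not part of any library. Labels are normalized the way the standard describes (case, spaces, dashes, slashes, and underscores are ignored), and the X axis is rebuilt from FIRSTX, LASTX, and NPOINTS rather than from DELTAX, because DELTAX is usually rounded: in the example above, 3733 steps of 0.9643 fall roughly 0.7 cm-1 short of LASTX.

```python
def parse_ldr_header(text: str) -> dict:
    """Collect the ##LABEL= value records that precede the data table."""
    header = {}
    for line in text.splitlines():
        if not line.startswith("##"):
            continue
        label, sep, value = line[2:].partition("=")
        if not sep:
            continue
        # Normalize the label: upper case, drop spaces, '-', '/', '_'
        key = "".join(ch for ch in label.upper() if ch not in " -/_")
        if key in ("XYDATA", "END"):
            break  # the data table starts here; stop collecting header records
        header[key] = value.strip()
    return header


def implied_x_axis(header: dict) -> list:
    """Rebuild evenly spaced X values from FIRSTX, LASTX, and NPOINTS."""
    first = float(header["FIRSTX"])
    last = float(header["LASTX"])
    n = int(header["NPOINTS"])
    if n == 1:
        return [first]
    step = (last - first) / (n - 1)
    return [first + i * step for i in range(n)]
```

This only covers single-block files with an X++(Y..Y) table; for anything else the jcamp library discussed below is the safer route.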
Data Encoding Schemes
JCAMP-DX was designed when disk space was expensive, so it includes several compression schemes for the ##XYDATA block:
AFFN (ASCII Free Format Numeric) - the simplest encoding. X and Y values as plain numbers separated by whitespace. Easy to parse, large files:
##XYDATA= (XY..XY)
399.23 0.182345
400.19 0.183456
401.16 0.184567
PAC (Packed) - Y values are encoded as integers by dividing by YFACTOR. Saves space. X values are implicit (calculated from FIRSTX + n × DELTAX):
##XYDATA= (X++(Y..Y))
399.2 1823456 1834567 1845678
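SQZ (Squeezed) - The leading digit of each Y value is replaced with a pseudo-digit letter that also carries the sign: @ABCDEFGHI for a positive leading digit 0-9, abcdefghi for a negative leading digit 1-9. The letter doubles as the delimiter, so no spaces or sign characters are needed. Saves about 30% over PAC:
##XYDATA= (X++(Y..Y))
399.2A823456A834567A845678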
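DIF (Difference) - Each value after the first is stored as the difference from the previous value, with the leading digit of each difference written as a DIF pseudo-digit (% for 0, J-R for +1 to +9, j-r for -1 to -9). Spectral data is usually smooth, so differences are small numbers, making for excellent compression:
##XYDATA= (X++(Y..Y))
399.2A823456J1111J1111J1111J1111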
DIFDUP (Difference + Duplicate) - Combines DIF with run-length encoding for repeated differences. The most compact encoding and the most common in practice.
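The compressed forms are easier to follow with a decoder in hand. The following is a minimal sketch, not the jcamp library's implementation: the function names are illustrative, it assumes values carry no exponent notation (an E would clash with the SQZ letters), it does not apply YFACTOR, and it drops the Y check value that starts a line following a DIF line without verifying it. The DUP letters S-Z and s stand for repeat counts 1-9, where the count includes the value already written.

```python
# Pseudo-digit tables for the JCAMP-DX compressed forms.
SQZ = {c: str(i) for i, c in enumerate("@ABCDEFGHI")}
SQZ.update({c: f"-{i}" for i, c in enumerate("abcdefghi", start=1)})
DIF = {c: str(i) for i, c in enumerate("%JKLMNOPQR")}
DIF.update({c: f"-{i}" for i, c in enumerate("jklmnopqr", start=1)})
DUP = {c: str(i) for i, c in enumerate("STUVWXYZs", start=1)}


def tokenize(line):
    """Split one data line into [kind, text] tokens (kind: abs, dif, dup)."""
    tokens = []
    for ch in line:
        if ch in SQZ:
            tokens.append(["abs", SQZ[ch]])
        elif ch in DIF:
            tokens.append(["dif", DIF[ch]])
        elif ch in DUP:
            tokens.append(["dup", DUP[ch]])
        elif ch in "+-":
            tokens.append(["abs", ch])
        elif ch.isspace() or ch == ",":
            tokens.append(None)  # delimiter: the next digit starts a new token
        elif tokens and tokens[-1] is not None:
            tokens[-1][1] += ch  # digit or '.' continues the current token
        else:
            tokens.append(["abs", ch])
    return [t for t in tokens if t is not None]


def decode_line(line):
    """Return (x, y_values, ends_with_dif) for one X++(Y..Y) line."""
    tokens = tokenize(line)
    x = float(tokens[0][1])
    ys, last_kind, last_diff = [], None, 0.0
    for kind, text in tokens[1:]:
        if kind == "dup":
            # The count includes the value already emitted: repeat count-1 times.
            for _ in range(int(text) - 1):
                ys.append(ys[-1] + last_diff if last_kind == "dif" else ys[-1])
            continue
        if kind == "dif":
            last_diff = float(text)
            ys.append(ys[-1] + last_diff)
        else:
            ys.append(float(text))
        last_kind = kind
    return x, ys, last_kind == "dif"


def decode_xydata(lines):
    """Decode consecutive lines, dropping the Y check value after DIF lines."""
    ys, drop_check = [], False
    for line in lines:
        if not line.strip():
            continue
        _, line_ys, ends_with_dif = decode_line(line)
        ys.extend(line_ys[1:] if drop_check else line_ys)
        drop_check = ends_with_dif
    return ys
```

For example, decode_line("399.2A823456J1111V") expands the single difference J1111 (+11111) four times, so one short line yields five Y values. Multiply the decoded integers by YFACTOR to get physical units.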
NTUPLES - An extension for multi-dimensional data (e.g., a spectrum with both real and imaginary components, or a series of spectra at different time points):
##NTUPLES= INFRARED SPECTRUM
##VAR_NAME= FREQUENCY, ABSORBANCE_REAL, ABSORBANCE_IMAG
##SYMBOL= X, Y, R
##VAR_TYPE= INDEPENDENT, DEPENDENT, DEPENDENT
##VAR_DIM= 3734, 3734, 3734
##UNITS= 1/CM, ABSORBANCE, ABSORBANCE
##PAGE= N=1
##DATA TABLE= (X++(Y..Y)), XYDATA
399.2 1823456 1834567 1845678
...
##END NTUPLES=
Parsing JCAMP-DX in Python
The jcamp library handles all encoding schemes:
import jcamp
import numpy as np
# Read a JCAMP-DX file
data = jcamp.jcamp_readfile('spectrum.dx')
# Access spectral data
wavenumbers = data['x'] # numpy array
absorbance = data['y'] # numpy array
# Access metadata
title = data.get('title', '')
data_type = data.get('data type', '')
origin = data.get('origin', '')
xunits = data.get('xunits', '')
yunits = data.get('yunits', '')
print(f"Title: {title}")
print(f"Type: {data_type}")
print(f"Range: {wavenumbers[0]:.1f} - {wavenumbers[-1]:.1f} {xunits}")
print(f"Points: {len(wavenumbers)}")
For files with NTUPLES (multi-page spectra):
# jcamp handles NTUPLES transparently
# but for multi-block files, you may need to iterate
data = jcamp.jcamp_readfile('multi_spectrum.jdx')
# If the file contains linked blocks:
if 'children' in data:
    for i, child in enumerate(data['children']):
        x = child['x']
        y = child['y']
        print(f"Block {i}: {len(x)} points")
Install with pip install jcamp.
Gotchas
- Encoding detection. The library must auto-detect which compression scheme is used. Older or non-standard JCAMP files sometimes mix schemes or use non-standard labels. If jcamp.jcamp_readfile() fails, try opening the file yourself (with an explicit encoding) and passing the open handle to jcamp.jcamp_read().
- Character encoding. JCAMP-DX files are nominally ASCII, but you will encounter Latin-1 and UTF-8 in metadata fields. Open the file with errors='replace' as a fallback.
- Precision loss. The compressed encodings (PAC, SQZ, DIF, DIFDUP) store Y values as integers, which quantizes the data. YFACTOR determines the precision floor. Check that YFACTOR is small enough for your application.
- Version differences. JCAMP-DX 4.24 (1988) only supports single spectra. Version 5.0 (1993) added NTUPLES for NMR and multi-dimensional data. Version 5.01 (1999) added GLP compliance fields and Y2K date fixes. Active development halted around 2006 - an XML replacement was proposed but never materialized. Most files in the wild are version 4.24 or 5.01.
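For the character-encoding gotcha, one defensive approach is to read the file as bytes and try a short list of encodings before giving up on strictness. This is a sketch using only the standard library; the helper name read_text_with_fallback and the default encoding order are assumptions, not something the jcamp library provides.

```python
def read_text_with_fallback(raw: bytes, encodings=("utf-8", "latin-1")) -> str:
    """Decode bytes with the first encoding that succeeds."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached when a custom encoding list rejects the data.
    return raw.decode("ascii", errors="replace")
```

Trying UTF-8 first matters: Latin-1 accepts any byte sequence, so it must come last or it would mask valid UTF-8 metadata as mojibake.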
SPC: The De Facto Binary Standard
The SPC format originated at Galactic Industries in the 1990s. When Thermo Fisher acquired Galactic (via the Nicolet and Mattson lineage), SPC became Thermo's standard format. Because Thermo Fisher is the largest instrument vendor, SPC became the de facto interchange format - most spectroscopy software can read it.
File Structure
SPC is a fixed-header binary format. The main header occupies the first 512 bytes and defines the file type, data dimensions, and encoding:
Offset Size Field Description
------ ---- ----- -----------
0 1 ftflgs File type flags (bit field)
1 1 fversn Version: 0x4B (new, little-endian), 0x4C (new, big-endian), or 0x4D (old)
2 1 fexper Experiment type (IR, Raman, UV, etc.)
3 1 fexp Exponent for Y values (2^fexp scaling)
4 4 fnpts Number of points per spectrum
8 8 ffirst First X value (double)
16 8 flast Last X value (double)
24 4 fnsub Number of subfiles (spectra in file)
...
512+ varies Subfile data Actual spectral data
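The first 28 bytes of that layout map directly onto a struct format string. The sketch below is an illustration under stated assumptions rather than a full reader: it handles only files whose version byte is 0x4B (the new, little-endian layout in the published SPC header definition), and the name read_spc_header_start is made up here. Note that fexp is a signed byte, and the value 0x80 (read as -128) conventionally marks floating-point Y data.

```python
import struct

# ftflgs, fversn, fexper, fexp, fnpts, ffirst, flast, fnsub
SPC_HEADER_START = struct.Struct("<BBBbIddI")  # 28 bytes, no padding

FIELD_NAMES = ("ftflgs", "fversn", "fexper", "fexp",
               "fnpts", "ffirst", "flast", "fnsub")


def read_spc_header_start(buf: bytes) -> dict:
    """Unpack the leading fields of a new-format, little-endian SPC header."""
    if len(buf) < 512:
        raise ValueError("SPC main header is 512 bytes long")
    if buf[1] != 0x4B:
        raise ValueError(f"unsupported SPC version byte: {buf[1]:#04x}")
    return dict(zip(FIELD_NAMES, SPC_HEADER_START.unpack_from(buf, 0)))
```

A call such as read_spc_header_start(open(path, 'rb').read(512)) is enough to learn the point count, X range, and number of subfiles without loading the spectra.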
The ftflgs byte encodes critical information as individual bits:
| Bit | Meaning when set |
|---|---|
| 0 | Y values are 16-bit integers (vs. 32-bit) |
| 1 | Experiment extension exists |
| 2 | Multi-file (multiple subfiles) |
| 3 | Z values in random order |
| 4 | Z values not even (non-uniform time/parameter spacing) |
| 5 | Custom axis labels in file |
| 6 | Each subfile has its own X array (only meaningful with bit 7) |
| 7 | X values are not evenly spaced (an explicit X array is stored) |
When bit 7 is clear (X evenly spaced), the X values are calculated from ffirst, flast, and fnpts. When bit 7 is set, an explicit X array precedes the Y data. If bit 6 is also set, each subfile carries its own X array - doubling the data size but supporting arbitrary X axes.
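Decoding the flag byte is a natural fit for an IntFlag enum. The member names below follow the constants in the published SPC header definition (TSPREC, TMULTI, TXVALS, and so on); the helper x_axis_layout and its return strings are illustrative. The sketch treats bit 6 (0x40) as the per-subfile X marker and bit 7 (0x80) as the explicit-X marker.

```python
from enum import IntFlag


class SpcFlags(IntFlag):
    TSPREC = 0x01  # Y values stored as 16-bit integers
    TCGRAM = 0x02  # experiment extension
    TMULTI = 0x04  # multiple subfiles
    TRANDM = 0x08  # Z values in random order
    TORDRD = 0x10  # Z values ordered but unevenly spaced
    TALABS = 0x20  # custom axis labels
    TXYXYS = 0x40  # each subfile has its own X array
    TXVALS = 0x80  # explicit (non-even) X values are stored


def x_axis_layout(ftflgs: int) -> str:
    """Say where the X values come from for a given flag byte."""
    flags = SpcFlags(ftflgs)
    if SpcFlags.TXVALS not in flags:
        return "computed"  # from ffirst, flast, and fnpts
    return "per-subfile" if SpcFlags.TXYXYS in flags else "shared"
```

Branching on this result up front keeps the rest of a hand-written parser simple: only the "computed" case needs the header's X range at all.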
Subfile Structure
For multi-spectrum files, each subfile (spectrum) follows the header:
Offset Size Field Description
------ ---- ----- -----------
0 1 subflgs Subfile flags
1 1 subexp Y exponent for this subfile
2 2 subindx Subfile index
4 4 subtime Z value (e.g., time in kinetics)
8 4 subnext Z value of the next subfile
12 4 subnois Noise level
16 4 subnpts Number of points (if different from main)
20 4 subscan Number of co-added scans
24 4 subwlevel W axis value
28 4 subresv Reserved
32 varies Y data Spectral data (float32, int32, or int16)
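As a sketch of how a hand-written parser might consume a subfile, the block below unpacks the subheader and rescales integer Y data. It assumes the little-endian new format and a 32-byte subheader (the named fields plus four reserved bytes), and it uses the fixed-point convention from the SPC specification, where a stored integer represents value × 2^(exponent - word size). The names SUBHEADER and scale_integer_y are illustrative.

```python
import struct

# subflgs, subexp, subindx, subtime, subnext, subnois, subnpts, subscan,
# subwlevel, reserved
SUBHEADER = struct.Struct("<BbHfffIIf4s")  # 32 bytes


def scale_integer_y(raw, exponent: int, bits: int = 32) -> list:
    """Convert fixed-point integer Y values to floats.

    bits is 32 for normal integer files and 16 when ftflgs bit 0 is set.
    """
    factor = 2.0 ** (exponent - bits)
    return [value * factor for value in raw]
```

With this convention a 32-bit value of 2^31 and an exponent of 1 decodes to exactly 1.0. Files that store floats directly (exponent byte 0x80) skip the scaling step entirely.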
Parsing SPC in Python
The spc-spectra package reads SPC files:
import spc_spectra as spc
import numpy as np
# Read an SPC file
f = spc.File('spectrum.spc')
# Single-spectrum file
x = f.x # wavenumber/wavelength array
y = f.sub[0].y # intensity/absorbance array
print(f"Points: {len(x)}")
print(f"Range: {x[0]:.1f} - {x[-1]:.1f}")
print(f"Experiment type: {f.exp_type}")
# Multi-spectrum file (e.g., kinetics series)
for i, sub in enumerate(f.sub):
    print(f"Spectrum {i}: {len(sub.y)} points, "
          f"z-value: {sub.subtime}")
# Access log block (text metadata)
if f.log_dict:
    for key, value in f.log_dict.items():
        print(f"  {key}: {value}")
Install with pip install spc-spectra.
The Log Block
SPC files include an optional log block - a free-form text section after the spectral data. Instrument software uses this to store metadata that does not fit the fixed header: sample name, operator, date, instrument serial number, and acquisition parameters. The log block has its own 64-byte header specifying its offset and size.
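Log text is free-form, but in practice it is usually a series of KEY=VALUE lines. If you have the raw text and want a dictionary, a small tolerant parser along these lines may be enough. The name parse_log_text is hypothetical, and lines without an equals sign are simply skipped.

```python
def parse_log_text(text: str) -> dict:
    """Turn KEY=VALUE lines into a dict, ignoring free-form lines."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            entries[key.strip()] = value.strip()
    return entries
```

Only the first equals sign splits the line, so values that contain their own equals signs survive intact.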
# Access the raw log text
if hasattr(f, 'log_content'):
    print(f.log_content)
# Typical output:
# DATE=05/08/2026
# TIME=14:25:00
# OPERATOR=J. Smith
# RESOLUTION=4 cm-1
# SCANS=32
Gotchas
- Old vs. new format. Files with fversn = 0x4D use the old Galactic format (pre-1996). Files with fversn = 0x4B use the new format. The spc-spectra library handles both, but some older files may have quirks.
- Y scaling. Y values may be stored as 32-bit floats or as integers scaled by 2^fexp. Check ftflgs bit 0 and fexp to interpret correctly.
- Byte order. SPC files are almost always little-endian (fversn = 0x4B); fversn = 0x4C marks the rare big-endian variant. This is only relevant if you are writing your own parser.
- Thermo-specific extensions. Newer Thermo Fisher instruments may write SPC files with proprietary extensions in the log block. These are generally safe to ignore.
ANDI/netCDF: The Self-Describing Format
ANDI (Analytical Data Interchange) uses the netCDF (Network Common Data Form) container format. Originally standardized by ASTM as E1947 for chromatographic data, it has been adapted for spectroscopic data as well.
netCDF is a self-describing binary format - the file includes its own schema (dimensions, variables, attributes) alongside the data. This makes it robust against format evolution: a reader can discover what the file contains without knowing the exact version.
Structure
A netCDF spectral file typically contains:
- Dimensions: point_number (number of data points), scan_number (for multi-spectrum), string_length (for text attributes)
- Variables: ordinate_values (Y data), abscissa_values (X data if not uniform), detector_name, scan_acquisition_time
- Global attributes: dataset_origin, experiment_type, operator_name, detector_unit, ordinate_unit
Parsing ANDI/netCDF in Python
Use scipy.io.netcdf_file for older ANDI files or netCDF4 for modern netCDF-4 files:
from scipy.io import netcdf_file
import numpy as np
# Read an ANDI/netCDF spectral file
with netcdf_file('spectrum.cdf', 'r') as f:
    # List all variables
    print("Variables:", list(f.variables.keys()))
    # Access spectral data
    y = f.variables['ordinate_values'][:].copy()
    # X values may be stored explicitly or calculated
    if 'abscissa_values' in f.variables:
        x = f.variables['abscissa_values'][:].copy()
    else:
        # Calculate from attributes
        first_x = f.variables['ordinate_values'].first_x
        last_x = f.variables['ordinate_values'].last_x
        x = np.linspace(first_x, last_x, len(y))
    # Access metadata
    print(f"Origin: {getattr(f, 'dataset_origin', 'Unknown')}")
    print(f"Type: {getattr(f, 'experiment_type', 'Unknown')}")
    print(f"Points: {len(x)}")
    print(f"Range: {x[0]:.1f} - {x[-1]:.1f}")
For netCDF-4 format files (newer instruments):
import netCDF4
ds = netCDF4.Dataset('spectrum.cdf', 'r')
# List dimensions and variables
print("Dimensions:", dict(ds.dimensions))
print("Variables:", list(ds.variables.keys()))
# Access data
y = ds.variables['ordinate_values'][:]
x = ds.variables['abscissa_values'][:]
ds.close()
Install netCDF4 with pip install netCDF4. The scipy.io.netcdf_file class is included with scipy and handles classic netCDF (v3) files - sufficient for most ANDI files.
Gotchas
- Variable naming. ANDI files from different vendors use different variable names for the same data. Agilent uses ordinate_values; some older instruments use intensity_values or signal. Inspect the file before hardcoding variable names.
- Chromatographic vs. spectroscopic. The ASTM E1947 standard was designed for chromatography. Spectroscopy ANDI files adapt the schema, but the fit is not perfect. Multi-spectrum files (e.g., hyphenated GC-IR data) use the scan_number dimension.
- netCDF versions. Classic netCDF (v3) and netCDF-4 (based on HDF5) have different internal structures. scipy.io.netcdf_file reads only v3. netCDF4 reads both.
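One way to cope with inconsistent variable names is to look them up through an ordered list of candidates instead of a hardcoded key. The sketch below works on any mapping, including the variables attribute of an open netCDF file. The helper name pick_variable is made up, and the candidate list simply reuses the names mentioned above; extend it after inspecting your own files.

```python
Y_CANDIDATES = ("ordinate_values", "intensity_values", "signal")


def pick_variable(variables, candidates=Y_CANDIDATES):
    """Return (name, value) for the first candidate present in a mapping."""
    for name in candidates:
        if name in variables:
            return name, variables[name]
    raise KeyError(
        f"none of {list(candidates)} found; available: {sorted(variables)}"
    )
```

Listing the available names in the error message turns a silent mismatch into a one-line diagnosis when a new vendor's files show up.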
Bruker OPUS: The Proprietary Binary
Bruker's OPUS format is the native file format for all Bruker FTIR instruments. Files have numeric extensions (.0, .1, .2, incrementing with each measurement) and are proprietary binary with no published specification.
We cover OPUS in detail in our Bruker OPUS interfaces guide and Python FTIR automation tutorial. Here is the format summary relevant to data interchange.
Structure
An OPUS file is a container of typed data blocks, each with a header specifying the block type, offset, and size. Known block types include:
| Block Type | Content | Typical Key |
|---|---|---|
| Absorbance spectrum | Processed absorbance data | a |
| Transmittance spectrum | Processed transmittance | t |
| Reflectance spectrum | Processed reflectance | r |
| Sample interferogram | Raw detector signal | igsm |
| Reference interferogram | Background detector signal | igrf |
| Sample phase spectrum | Phase correction data | phsm |
| Reference phase spectrum | Background phase | phrf |
| Instrument parameters | Acquisition settings | params |
| Optic parameters | Optical configuration | optic |
| Sample parameters | Sample identification | sample |
Each data block contains a header with the first X value, last X value, number of points, and a scaling factor, followed by the raw data as 32-bit floats.
Parsing in Python
Three libraries handle OPUS files, each with different trade-offs:
brukeropus - the most complete:
from brukeropus import read_opus
opus = read_opus('sample.0')
# List available data blocks
print(opus.data_keys) # e.g., ['a', 't', 'igsm', 'igrf']
# Access absorbance spectrum
x = opus.a.x # numpy array of wavenumbers
y = opus.a.y # numpy array of absorbance values
# Access all parameters
print(opus.params)
# {'INS': 'Alpha II', 'SRC': 'Internal', 'RES': '4', ...}
brukeropusreader - lighter weight, read-only:
from brukeropusreader import read_file
data = read_file('sample.0')
absorbance = data["AB"] # numpy array
opusFC - C-based, fastest for batch processing:
import opusFC
blocks = opusFC.listContents('sample.0')
for b in blocks:
    print(f"Type: {b.blocktype}, Points: {b.npt}")
data = opusFC.getOpusData('sample.0', blocks[0])
x, y = data.x, data.y
Install with pip install brukeropus, pip install brukeropusreader, or pip install opusFC respectively.
Gotchas
- No specification. The format is reverse-engineered. Each Python library handles a slightly different subset of OPUS versions and block types. If one library fails on a particular file, try another.
- File locking. OPUS locks files during acquisition. If you try to read a .0 file while OPUS is still writing it, you get corrupted data or a permission error. See the retry logic in our Python FTIR tutorial.
- Numeric extensions. The extension is not a file type indicator - it is a sequence counter. .0 is the first measurement, .1 is the second. The actual format is identical regardless of extension.
- Export alternative. If you control the instrument workflow, exporting to JCAMP-DX (.dx) from OPUS is sometimes simpler than parsing the native format. OPUS supports export to JCAMP-DX, SPC, CSV, and several other formats.
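For the file-locking gotcha, a common defensive pattern is to wait until the file size stops changing and to retry on permission errors. The sketch below is a generic illustration of that idea, not the retry logic from the tutorial referenced above. The name read_when_stable and its default timings are assumptions to tune for your acquisition times.

```python
import os
import time


def read_when_stable(path, attempts: int = 5, delay: float = 0.5) -> bytes:
    """Read a file's bytes once its size is unchanged across one delay."""
    last_error = None
    for _ in range(attempts):
        try:
            size_before = os.path.getsize(path)
            time.sleep(delay)
            if os.path.getsize(path) == size_before:
                with open(path, "rb") as fh:
                    return fh.read()
        except (PermissionError, FileNotFoundError) as err:
            last_error = err  # still locked or not yet created: retry
            time.sleep(delay)
    raise TimeoutError(f"{path} did not become readable") from last_error
```

The returned bytes can be written to a temporary copy and handed to whichever OPUS library you use, so the parser never touches the file the instrument software is holding.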
Renishaw WDF: Raman Spectral Maps
Renishaw's WiRE software stores Raman spectra in the WDF (WiRE Data Format) binary format. WDF files can contain single spectra, line scans, or full 2D spectral maps - a grid of spatial positions, each with a complete Raman spectrum.
Structure
The WDF file begins with a fixed header block, followed by a series of tagged data blocks. Each block has a 16-byte header: a 4-byte block type identifier, a 4-byte block UID, and an 8-byte block size. Major block types:
| Block Type ID | Content |
|---|---|
| WDF1 | Main header (version, measurement type, point counts) |
| DATA | Spectral intensity data (float32 array) |
| XLST | X-axis list (wavenumber/wavelength values) |
| YLST | Y-axis list (secondary axis values) |
| ORGN | Origin list (spatial coordinates for map data) |
| WMAP | Map metadata (dimensions, step sizes) |
| WHTL | White-light image (if captured) |
| TEXT | Text metadata (sample description, notes) |
For spectral maps, the DATA block contains all spectra concatenated: if you have a 10×10 map with 1024 points per spectrum, the DATA block contains 100 × 1024 = 102,400 float32 values.
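Because every block announces its own size, a file can be indexed without understanding any block's contents. The sketch below walks the block chain under assumptions drawn from open-source readers rather than from a published specification: a 16-byte little-endian block header made of a 4-byte ID, a 4-byte UID, and an 8-byte size, with the size counting the header itself. The name iter_wdf_blocks is illustrative.

```python
import struct

BLOCK_HEADER = struct.Struct("<4sIQ")  # id, uid, size (assumed to include header)


def iter_wdf_blocks(buf: bytes):
    """Yield (block_id, uid, offset, size) for each block in a WDF byte string."""
    pos = 0
    while pos + BLOCK_HEADER.size <= len(buf):
        block_id, uid, size = BLOCK_HEADER.unpack_from(buf, pos)
        if size < BLOCK_HEADER.size:
            break  # corrupt or truncated block: stop rather than loop forever
        yield block_id.decode("ascii", errors="replace"), uid, pos, size
        pos += size
```

Building a dict from block ID to offset this way lets you seek straight to DATA or XLST and read only what you need.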
Parsing in Python
The renishawWiRE library parses WDF files:
from renishawWiRE import WDFReader
import numpy as np
# Read a WDF file
reader = WDFReader('raman_map.wdf')
# Basic metadata
print(f"Title: {reader.title}")
print(f"Measurement type: {reader.measurement_type}")
print(f"Number of spectra: {reader.count}")
print(f"Points per spectrum: {reader.point_per_spectrum}")
# Wavenumber axis (same for all spectra in the map)
wavenumbers = reader.xdata # numpy array, shape (npoints,)
# All spectral data
spectra = reader.spectra # numpy array, shape (nspectra, npoints)
# For a single spectrum file:
single_spectrum = spectra[0]
# For a map: access spatial coordinates
if reader.measurement_type == 3: # 3 = map (1 = single, 2 = series)
    x_coords = reader.xpos # X positions
    y_coords = reader.ypos # Y positions
    map_shape = reader.map_shape # map dimensions; check the axis order for your library version
    # Reshape spectra into a spatial grid
    spectra_map = spectra.reshape(*map_shape, -1)
    # spectra_map[row, col, :] gives the spectrum at that position
# Access the white-light image if available
if reader.img is not None:
    img = reader.img # image data; depending on the library version you may need PIL to open it
Install with pip install renishawWiRE.
Gotchas
- Map orientation. Spatial coordinates in WDF files follow the instrument stage convention, which may differ from image convention (row 0 at top vs. bottom). Verify orientation against the white-light image.
- Large files. A 100×100 spectral map with 1024 points per spectrum produces a 40 MB DATA block. Load lazily if memory is a concern.
- WiRE version differences. Older WiRE versions (< 5.0) write slightly different block structures. The renishawWiRE library handles most versions but may fail on very old files.
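The size figure in the large-files gotcha is plain arithmetic, and the same arithmetic tells you where one spectrum sits inside the DATA payload if you want to read it lazily. A small sketch, assuming float32 values stored spectrum after spectrum as described above (both function names are illustrative):

```python
def data_block_bytes(n_spectra: int, points_per_spectrum: int,
                     bytes_per_value: int = 4) -> int:
    """Size of the spectral payload in bytes."""
    return n_spectra * points_per_spectrum * bytes_per_value


def spectrum_byte_range(index: int, points_per_spectrum: int,
                        bytes_per_value: int = 4) -> tuple:
    """(start, end) byte offsets of one spectrum within the payload."""
    start = index * points_per_spectrum * bytes_per_value
    return start, start + points_per_spectrum * bytes_per_value
```

For the 100×100 map with 1024 points, data_block_bytes(100 * 100, 1024) gives 40,960,000 bytes. The offsets are relative to the start of the payload, so add the DATA block's position and header length before seeking.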
Horiba LabSpec: Mixed Binary
Horiba's LabSpec software (for LabRAM and XploRA Raman systems) uses proprietary binary formats with extensions .l5s (LabSpec 5), .l6s (LabSpec 6), and .ngs. These are binary containers, not XML - though LabSpec can export to XML, which is the most accessible format for third-party parsing.
What Is Known
The LabSpec format stores:
- Spectral data (float32 or float64 arrays)
- Wavenumber/wavelength axis
- Acquisition parameters (laser wavelength, exposure time, grating position)
- Spatial coordinates (for mapping data)
- Embedded white-light images
Parsing Options
There is no established open-source Python library for native LabSpec files. Practical approaches:
- Export from LabSpec software. LabSpec can export to JCAMP-DX, SPC, ASCII/CSV, and several other formats. If you control the instrument workflow, configure LabSpec to auto-export to JCAMP-DX or CSV after each acquisition.
- Use Horiba's SDK. Horiba provides an SDK for programmatic access to LabSpec data, but it is Windows-only and requires a LabSpec license.
- Parse the binary directly. The format has been partially reverse-engineered. The file begins with a version-dependent header, followed by data blocks that can be located by searching for known magic bytes. This approach is fragile and not recommended unless you have a large corpus of LabSpec files and no other option.
import numpy as np
import struct
def read_labspec_spectrum(filepath: str) -> tuple:
    """
    Attempt to read spectral data from a LabSpec file.
    This is a best-effort parser for common LabSpec versions.
    Export to JCAMP-DX or SPC is preferred.
    """
    with open(filepath, 'rb') as f:
        raw = f.read()
    # Search for the data block marker (version-dependent)
    # This is inherently fragile - prefer export formats
    # Common pattern: look for a float64 array preceded by
    # a 4-byte count
    raise NotImplementedError(
        "Native LabSpec parsing is fragile. "
        "Export to JCAMP-DX from LabSpec instead: "
        "File > Export > JCAMP-DX"
    )
Recommendation: Do not invest in native LabSpec parsing. Export to JCAMP-DX or SPC from the LabSpec software and parse that instead. The export preserves all spectral data and metadata.
CSV and ASCII: The Universal Fallback
Every spectroscopy instrument can export to CSV or tab-delimited text. It is the lowest common denominator - universally readable, universally lossy.
Common Layouts
Two-column (X, Y):
# Wavenumber (cm-1), Absorbance
3999.64, 0.0234
3998.68, 0.0231
3997.71, 0.0228
...
Multi-column (X, Y1, Y2, ...) for multiple spectra:
Wavenumber, Sample_001, Sample_002, Sample_003
3999.64, 0.0234, 0.0198, 0.0267
3998.68, 0.0231, 0.0195, 0.0264
Header variations. Some instruments write metadata as comment lines (prefixed with #, %, or ;). Some write column headers. Some write nothing - just raw numbers.
Parsing CSV in Python
import numpy as np
import pandas as pd
# Simple two-column CSV
data = np.loadtxt('spectrum.csv', delimiter=',', skiprows=1)
wavenumbers = data[:, 0]
absorbance = data[:, 1]
# Multi-spectrum CSV with headers
df = pd.read_csv('spectra.csv')
wavenumbers = df.iloc[:, 0].values
spectra = df.iloc[:, 1:].values # shape: (npoints, nspectra)
# Handle comment lines
data = np.loadtxt('spectrum.txt', comments='#', delimiter='\t')
When CSV Works
CSV works when:
- You are exporting a single spectrum or a small batch for quick analysis
- You need to open data in Excel or Google Sheets
- You are exchanging data with a collaborator who does not have spectroscopy software
- You are feeding data into a generic ML pipeline that consumes tabular data
When CSV Fails
CSV loses metadata. You get the spectral data but not the instrument model, acquisition parameters, resolution, number of scans, sample name, operator, or timestamp. For clinical and regulatory use cases where data provenance matters, CSV is insufficient. Use JCAMP-DX or SPC.
CSV also has no standard schema. Two-column vs. multi-column, comma vs. tab, header vs. no header, comment prefix - every instrument exports slightly different CSV. Your parser needs to be defensive.
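A defensive parser can absorb most of that variation by dropping comment lines, guessing the delimiter from the first data line, and treating a non-numeric first row as a header. The sketch below uses only the standard library and takes the file contents as a string. The name read_spectral_csv is hypothetical, and it does not handle decimal commas or ragged rows.

```python
def read_spectral_csv(text: str, comment_prefixes=("#", "%", ";")):
    """Return (header, x, ys): header is a list or None, ys is one list per column."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith(comment_prefixes)]
    if not lines:
        raise ValueError("no data lines found")

    # Guess the delimiter: tab, then comma, else any run of whitespace.
    first = lines[0]
    delimiter = "\t" if "\t" in first else "," if "," in first else None
    rows = [[cell.strip() for cell in ln.split(delimiter)] for ln in lines]

    # A first row that does not parse as numbers is taken to be a header.
    header = None
    try:
        [float(cell) for cell in rows[0]]
    except ValueError:
        header, rows = rows[0], rows[1:]
    if not rows:
        raise ValueError("header found but no numeric rows")

    columns = list(zip(*[[float(cell) for cell in row] for row in rows]))
    return header, list(columns[0]), [list(col) for col in columns[1:]]
```

Returning the Y data as a list of columns keeps the two-column and multi-column layouts on one code path: a single spectrum is simply a list with one entry.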
Vendor Support Matrix
Which instruments output which formats, and what interchange formats they support:
| Vendor | Instrument | Native Format | Export: JCAMP-DX | Export: SPC | Export: CSV | Export: netCDF |
|---|---|---|---|---|---|---|
| Bruker | Alpha II, Vertex, Tensor | OPUS (.0) | Yes | Yes | Yes | No |
| Thermo Fisher | Nicolet iS50, iS20 | SPC (.spc) | Yes | Native | Yes | Yes |
| Thermo Fisher | DXR3 (Raman) | SPC (.spc) | Yes | Native | Yes | Yes |
| Horiba | LabRAM, XploRA | LabSpec (.l5s/.l6s) | Yes* | Yes | Yes | HDF5 (v6.3+) |
| Renishaw | inVia, Virsa | WDF (.wdf) | Yes | Yes | Yes | No |
| Agilent | Cary 630 FTIR | Agilent (.a2r) | Yes | No | Yes | No |
| PerkinElmer | Spectrum Two, Frontier | SP (.sp) | Yes | Yes | Yes | No |
| Shimadzu | IRSpirit, IRTracer | Shimadzu (.spc*) | Yes | Variant | Yes | No |
*Shimadzu uses a modified SPC format that is not fully compatible with Thermo SPC readers.
Key observation: JCAMP-DX is the only metadata-preserving format that every vendor supports for export (CSV is just as universal but drops the metadata). If you need a single interchange format, JCAMP-DX is the safe choice.
Decision Framework: Which Format Should You Use?
For Data Interchange Between Systems
Use JCAMP-DX. It is the only IUPAC-standardized format, universally supported for export, and text-based (human-inspectable, version-controllable, diffable). The compression encoding makes files smaller than you might expect.
Use SPC as a secondary interchange format if your ecosystem is Thermo-heavy. Avoid netCDF unless you are working in a chromatography-adjacent pipeline that already uses it.
For Archival and Regulatory Compliance
Use the vendor's native format plus JCAMP-DX. Regulatory submissions (FDA, EU IVDR) need raw data in the original instrument format for reproducibility. Store the native .0, .spc, or .wdf file as the primary record and a JCAMP-DX export as the human-readable companion.
For Machine Learning Pipelines
Use NumPy arrays (.npy) or HDF5 (.h5) internally. Parse the source format once, extract the spectral data, preprocess it (baseline correction, normalization), and save the result as a NumPy array or HDF5 dataset. ML frameworks consume arrays, not spectral files.
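import numpy as np
from brukeropus import read_opus
# Hypothetical (filepath, label) pairs; replace with your own dataset listing
dataset = [('sample_001.0', 'control'), ('sample_002.0', 'treated')]
# Parse once, save as numpy
wavenumbers, spectra, labels = None, [], []
for filepath, label in dataset:
    opus = read_opus(filepath)
    wavenumbers = opus.a.x # assumes every file shares the same X axis
    spectra.append(opus.a.y)
    labels.append(label)
np.save('wavenumbers.npy', wavenumbers)
np.save('spectra.npy', np.array(spectra))
np.save('labels.npy', np.array(labels))
For Real-Time Instrument Integration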
Read the native format directly. Parsing overhead matters when you are processing spectra in a clinical workflow with sub-second latency requirements. The native format avoids the export step and gives you access to all metadata (interferograms, quality metrics, instrument status) that export formats may omit.
Use brukeropus for Bruker, spc-spectra for Thermo, and renishawWiRE for Renishaw. See our Python FTIR automation tutorial for the complete real-time acquisition pipeline.
For Multi-Vendor Platforms
Build a parser adapter per vendor, normalize to a common internal representation. This is what SpectraDx does. Each instrument adapter reads the native format and produces a standardized internal spectrum object (wavenumber array, intensity array, metadata dict). Downstream processing - classification, result delivery, archival - operates on the normalized representation, not the vendor-specific format.
from dataclasses import dataclass
import numpy as np
@dataclass
class NormalizedSpectrum:
    wavenumbers: np.ndarray
    intensities: np.ndarray
    metadata: dict
    source_format: str
    source_path: str
def parse_any_format(filepath: str) -> NormalizedSpectrum:
    """Route to the correct parser based on file extension."""
    ext = filepath.rsplit('.', 1)[-1].lower()
    if ext.isdigit(): # Bruker OPUS (.0, .1, .2)
        from brukeropus import read_opus
        opus = read_opus(filepath)
        return NormalizedSpectrum(
            wavenumbers=opus.a.x,
            intensities=opus.a.y,
            metadata=dict(opus.params),
            source_format='opus',
            source_path=filepath,
        )
    elif ext == 'spc':
        import spc_spectra as spc
        f = spc.File(filepath)
        return NormalizedSpectrum(
            wavenumbers=f.x,
            intensities=f.sub[0].y,
            metadata=f.log_dict or {},
            source_format='spc',
            source_path=filepath,
        )
    elif ext in ('dx', 'jdx'):
        import jcamp
        data = jcamp.jcamp_readfile(filepath)
        return NormalizedSpectrum(
            wavenumbers=data['x'],
            intensities=data['y'],
            metadata={k: v for k, v in data.items()
                      if k not in ('x', 'y')},
            source_format='jcamp-dx',
            source_path=filepath,
        )
    elif ext == 'wdf':
        from renishawWiRE import WDFReader
        reader = WDFReader(filepath)
        return NormalizedSpectrum(
            wavenumbers=reader.xdata,
            intensities=reader.spectra[0],
            metadata={'title': reader.title},
            source_format='wdf',
            source_path=filepath,
        )
    elif ext in ('csv', 'txt'):
        data = np.loadtxt(filepath, delimiter=',', skiprows=1)
        return NormalizedSpectrum(
            wavenumbers=data[:, 0],
            intensities=data[:, 1],
            metadata={},
            source_format='csv',
            source_path=filepath,
        )
    else:
        raise ValueError(f"Unsupported format: .{ext}")
This adapter pattern is the foundation of instrument-agnostic spectroscopy software and is central to the SpectraDx platform approach. For more on how this fits into a clinical deployment architecture, see Building Clinical Workflow Software for Spectroscopy-Based Diagnostics. For details on moving parsed spectral data into classification models, see Building AI Pipelines for Spectral Classification.
Further Reading
- The Complete Guide to Bruker OPUS Programmatic Interfaces - deep dive into OPUS file parsing and instrument control
- Automating FTIR Measurements with Python - practical tutorial with batch processing code
- Connecting Spectroscopy Instruments to LIMS - how parsed spectral data feeds into laboratory information systems
- IUPAC JCAMP-DX standard: doi:10.1351/pac199163121781 (v4.24), doi:10.1351/pac199365122023 (v5.0)

