Description
Hi @effigies and @matthew-brett (pinging you both directly, because you've had the most involvement in the relevant bits of the nibabel code).
A colleague of mine, very competent, but fairly new to nibabel and numpy, recently found himself in a very confusing situation, ultimately brought about by the scl_slope
/ scl_inter
rescaling logic baked into nibabel, and I'm wondering if another section should be added to the nibabel documentation which gives recommendations on how to work with NIFTI images of integral type.
A full example is given below, but in brief, he was working with a NIFTI1 image of type int32
, but working with float64
arrays in memory. He was repeatedly loading an image in, manipulating it, and saving it back out as an int32
image. After a couple of iterations, the floating point imprecision in the scl_slope
and scl_inter
fields had accumulated enough that the integral values of his image data were no longer being re-constructed, which caused problems when he tried to create and use a binary mask.
The problem boils down to a combination of two coding patterns that are in common use:
-
Using
get_fdata
to retrieve the image data, so that it is always loaded and manipulated asfloat64
, regardless of the storage type. -
When creating a new image from an existing image (e.g. after some processing step), using the header from the original image to preserve the NIFTI header metadata (not that there's much of use in there, but that's another matter):
image1 = nib.load('image1.nii.gz') data = image1.get_fdata() # do stuff to data # Save new image, preserving header metadata # The catch here is that the data is float64, # but the header specifies int32 (or some # integral type), so the data gets re-scaled # to use up the full range of the storage type. image2 = nib.Nifti1Image(data, None, image1.header) image2.to_filename('image2.nii.gz')
Using the above patterns with images of integer type can lead to problems, as demonstrated in the example below. So what I'm wondering is whether a section should be added to the nibabel documentation, with some recommended strategies for working with integral data. I'd be happy to attempt a first pass if you think it's a good idea.
I can think of a few solutions/patterns to avoiding this problem, any of which could be the best approach depending on the situation:
- When loading, loading the input data as its storage type
- When loading, rounding rather than casting, to avoid truncation
- When writing, converting the data to the desired storage type
To reproduce the situation that my colleague found himself in:
import numpy as np
import nibabel as nib
# Create some random int32 data
# and save it as a NIFTI image
nib.Nifti1Image(
np.random.randint(0, 100, (20, 20, 20)).astype(np.int32),
np.eye(4)).to_filename('image1.nii.gz')
# Now load that image in using
# the recommended get_fdata
image1 = nib.load('image1.nii.gz')
data1 = image1.get_fdata()
# Create a copy of the image, preserving
# the heaader info from the original image.
# Because data1 is float64, and the image
# header specifies int32, the data will be
# rescaled to use the full int32 range.
image2 = nib.Nifti1Image(data1, None, header=image1.header)
image2.to_filename('image2.nii.gz')
# Load that image in, and create a binary
# mask from it.
#
# (Here, because scl_slope and scl_inter
# are set on the image, it doesn't matter
# whether we use get_fdata or dataobj -
# in either case we are given the data as
# float64).
#
# Again, we use the same header, and the
# data is thus rescaled again and saved
# as int32.
image3 = nib.load('image2.nii.gz')
data3 = np.asarray(image3.dataobj)
data3[data3 <= 50] = 0
data3[data3 > 50] = 1
image3 = nib.Nifti1Image(data3, None, header=image1.header)
image3.to_filename('image3.nii.gz')
print(f'number of voxels in binary mask: {data3.sum()}')
# Re-load the mask from file. We're
# using it as a mask, so we'll cast it
# to an integer type
image4 = nib.load('image3.nii.gz')
data4 = np.asarray(image4.dataobj).astype(np.int8)
# But now we have a problem - the re-scaling
# which took place when image3.nii.gz was
# loaded have produced values which are
# slightly less than 1, so the cast to int8
# has truncated them and produced a binary
# mask containing all zeros.
print(f'number of voxels in binary mask: {data4.sum()}')
print(f' Slope: {image4.dataobj.slope}')
print(f' Inter: {image4.dataobj.inter}')
Example output (note the teensy scaling factor):
number of voxels in binary mask: 3980.0
number of voxels in binary mask: 0
Slope: 2.3283064365386963e-10
Inter: 0.5