There are certainly some novel compression schemes out there, and different ways to represent numbers. For compression on disk, assuming a Python implementation, you could use compressed pickles with gzip, for example.
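For what it's worth, here is a minimal sketch of that approach using numpy plus the standard-library gzip and pickle modules; the file name and array shape are placeholders, not anything from the original discussion.

import gzip
import pickle
import numpy as np

# Placeholder embedding matrix, just for illustration.
embeddings = np.random.rand(10_000, 768).astype(np.float32)

# Write a gzip-compressed pickle to disk.
with gzip.open("embeddings.pkl.gz", "wb") as f:
    pickle.dump(embeddings, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back.
with gzip.open("embeddings.pkl.gz", "rb") as f:
    restored = pickle.load(f)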
I think the overhead with these ML packages, like TensorFlow, can easily get out of hand, and they aren't necessary beyond basic NumPy, or simple low-level routines in C/Rust, for doing embedding analysis. They may be required for truly massive workloads. Although a GPU could still come in handy for this problem if you have one lying around.
So with this in mind, there are simple ways to get full dynamic range for each embedding chunk by storing an extra FP number per vector.
For each embedding vector, you find the largest magnitude, then you rescale the values by that largest magnitude, assuming two's complement, like so for this 16-bit example:
Vectorized Python conversion for 16-bit:
import numpy as np

def convert_array_np(arr):
    arr = np.clip(arr, -1, 1)  # Ensure values are within [-1, 1]
    scaled_arr = np.round(arr * 32768).astype(int)
    scaled_arr = np.clip(scaled_arr, -32768, 32767)  # Clip to 16-bit signed int range
    return scaled_arr

# Example usage
arr = np.array([0.5, -0.5, 1, -1, 0.1])
converted_arr = convert_array_np(arr)
print(converted_arr)
In the code above, you pre-process the array arr by dividing the entire array by its maximum magnitude (positive or negative, but you store only the magnitude; note that this is not done in the code above, you would do it outside and before this function), and you record this magnitude for later. You would also have to recast this array to 16-bit signed integers, which is not shown.
You now have max dynamic range on this embedding vector for your chosen bit depth.
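Putting those steps together, a per-vector helper might look like the sketch below; quantize_vector is my own name, not anything standard, and it leans on the convert_array_np function defined above.

def quantize_vector(vec):
    # Largest magnitude in the vector, sign discarded; stored for later rescaling.
    max_mag = float(np.max(np.abs(vec)))
    if max_mag == 0.0:
        return np.zeros(len(vec), dtype=np.int16), 1.0
    normalized = vec / max_mag            # now spans [-1, 1] with full range used
    ints = convert_array_np(normalized)   # function defined above
    return ints.astype(np.int16), max_mag # recast to 16-bit signed integers

# Example usage
q_ints, q_mag = quantize_vector(np.array([0.02, -0.3, 0.15]))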
You do your integer math for the dot product, then you scale the result by both the input's max magnitude and the target knowledge vector's max magnitude, so essentially 2 extra FP multiplies. If the first one hit -0.734 and the second one hit 0.539, you correlate and multiply the correlation by 0.734*0.539. And of course you divide by 2^{15} per vector, i.e. by 2^{30} for the dot product, if you want to get back to \pm 1 cosine similarities.
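A quick sketch of that reconstruction step, assuming the original embeddings were already unit-normalized so that their dot product is the cosine similarity; int_cosine and its argument names are mine, used only for illustration.

def int_cosine(query_ints, query_mag, target_ints, target_mag):
    # Accumulate in int64 so the integer dot product cannot overflow int16.
    raw = np.dot(query_ints.astype(np.int64), target_ints.astype(np.int64))
    # Two extra FP multiplies for the stored magnitudes, then undo the
    # 2^15 scale factor each vector carries (2^30 in total for the product).
    return raw * query_mag * target_mag / float(2 ** 30)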
As for \mu-Law, my understanding is that it has more to do with speech and human hearing. But I have more of a DSP, non-audio background, so I'm not sure.
Instead, the embeddings lend themselves more to the non-audio, pure DSP signal processing domain, IMO, since they aren’t audio per se.