Package org.apache.lucene.util
Class VectorUtil
java.lang.Object
org.apache.lucene.util.VectorUtil
Utilities for computations with numeric arrays, especially algebraic operations like vector dot
products. This class uses SIMD vectorization if the corresponding Java module is available and
enabled. To enable vectorized code, pass
--add-modules jdk.incubator.vector to Java's
command line.
It uses the CPU's FMA instructions if they are known to perform faster than separate multiply+add. This requires Hotspot C2 to be enabled, which is the default for OpenJDK-based JVMs.
To explicitly disable or enable FMA usage, pass the following system properties:
- -Dlucene.useScalarFMA=(auto|true|false) for scalar operations
- -Dlucene.useVectorFMA=(auto|true|false) for vectorized operations (with vector incubator module)
The default is auto, which enables this for known CPU types and JVM settings. If
Hotspot C2 is disabled, FMA and vectorization are not used.
Vectorization and FMA are only supported on Hotspot-based JVMs; they do not work on OpenJ9-based
JVMs unless those JVMs provide HotSpotDiagnosticMXBean. Please also make
sure that you have the jdk.management module enabled in modularized applications.
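To illustrate what "FMA versus separate multiply+add" means for a scalar loop, here is a minimal sketch using the standard `Math.fma` method (Java 9+). It is not Lucene's implementation; the class and method names are hypothetical and chosen only for this example.

```java
// Hypothetical illustration only; not Lucene's implementation.
final class FmaSketch {
  // Separate multiply and add: the product and the sum are each rounded.
  static float dotMulAdd(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Fused multiply-add: a[i] * b[i] + sum is computed with a single rounding.
  static float dotFma(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum = Math.fma(a[i], b[i], sum);
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, 5f, 6f};
    System.out.println(dotMulAdd(a, b)); // 32.0
    System.out.println(dotFma(a, b));    // 32.0
  }
}
```

Because FMA rounds once per step, the two variants can differ in the last bits for inputs that are not exactly representable; for the small integers above they agree.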
-
Method Summary
Modifier and Type | Method | Description
static void | add(float[] u, float[] v) | Adds the second argument to the first.
static float[] | checkFinite(float[] v) | Checks if a float vector only has finite components.
static float | cosine(byte[] a, byte[] b) | Returns the cosine similarity between the two vectors.
static float | cosine(float[] a, float[] b) | Returns the cosine similarity between the two vectors.
static int | dotProduct(byte[] a, byte[] b) | Dot product computed over signed bytes.
static float | dotProduct(float[] a, float[] b) | Returns the vector dot product of the two vectors.
static float | dotProductScore(byte[] a, byte[] b) | Dot product score computed over signed bytes, scaled to be in [0, 1].
static int | findNextGEQ(int[] buffer, int target, int from, int to) | Given an array buffer that is sorted between indexes 0 inclusive and to exclusive, find the first array index whose value is greater than or equal to target.
static long | int4BitDotProduct(byte[] q, byte[] d) | Dot product computed over int4 (values between [0,15]) bytes and a binary vector.
static int | int4DotProduct(byte[] a, byte[] b) |
static int | int4DotProductPacked(byte[] unpacked, byte[] packed) | Dot product computed over int4 (values between [0,15]) bytes.
static boolean | isUnitVector(float[] v) |
static float[] | l2normalize(float[] v) | Modifies the argument to be unit length, dividing by its l2-norm.
static float[] | l2normalize(float[] v, boolean throwOnZero) | Modifies the argument to be unit length, dividing by its l2-norm.
static float | minMaxScalarQuantize(float[] vector, byte[] dest, float scale, float alpha, float minQuantile, float maxQuantile) | Scalar quantizes vector, putting the result into dest.
static float | recalculateOffset(byte[] vector, float oldAlpha, float oldMinQuantile, float scale, float alpha, float minQuantile, float maxQuantile) | Recalculates the offset for vector.
static float | scaleMaxInnerProductScore(float vectorDotProductSimilarity) |
static int | squareDistance(byte[] a, byte[] b) | Returns the sum of squared differences of the two vectors.
static float | squareDistance(float[] a, float[] b) | Returns the sum of squared differences of the two vectors.
static int | xorBitCount(byte[] a, byte[] b) | XOR bit count computed over signed bytes.
-
Method Details
-
dotProduct
public static float dotProduct(float[] a, float[] b)
Returns the vector dot product of the two vectors.
- Throws:
IllegalArgumentException - if the vectors' dimensions differ.
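The contract above (sum of element-wise products, with a dimension check) can be sketched as a plain scalar loop. This is a reference sketch only, assuming nothing about Lucene's vectorized code paths; the class name is hypothetical.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class DotProductSketch {
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i]; // accumulate element-wise products
    }
    return sum;
  }
}
```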
-
cosine
public static float cosine(float[] a, float[] b)
Returns the cosine similarity between the two vectors.
- Throws:
IllegalArgumentException - if the vectors' dimensions differ.
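Cosine similarity is the dot product divided by the product of the two l2-norms. A minimal sketch, assuming double accumulators for the intermediate sums (an illustrative choice, not a statement about Lucene's internals); the class name is hypothetical.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class CosineSketch {
  static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    // Note: an all-zero input makes the denominator 0 and the result NaN.
    return (float) (dot / Math.sqrt(normA * normB));
  }
}
```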
-
cosine
public static float cosine(byte[] a, byte[] b)
Returns the cosine similarity between the two vectors.
-
squareDistance
public static float squareDistance(float[] a, float[] b)
Returns the sum of squared differences of the two vectors.
- Throws:
IllegalArgumentException - if the vectors' dimensions differ.
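As a worked illustration of "sum of squared differences", the following scalar sketch computes the squared Euclidean distance. It is a reference under the contract stated above, not Lucene's code; the class name is hypothetical.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class SquareDistanceSketch {
  static float squareDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff; // squared difference per component
    }
    return sum;
  }
}
```

For example, the vectors (1, 2, 3) and (4, 6, 3) differ by (3, 4, 0), so the result is 9 + 16 + 0 = 25.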
-
squareDistance
public static int squareDistance(byte[] a, byte[] b)
Returns the sum of squared differences of the two vectors.
-
l2normalize
public static float[] l2normalize(float[] v)
Modifies the argument to be unit length, dividing by its l2-norm. IllegalArgumentException is thrown for zero vectors.
- Returns:
- the input array after normalization
-
isUnitVector
public static boolean isUnitVector(float[] v)
-
l2normalize
public static float[] l2normalize(float[] v, boolean throwOnZero)
Modifies the argument to be unit length, dividing by its l2-norm.
- Parameters:
v - the vector to normalize
throwOnZero - whether to throw an exception when v has all zeros
- Returns:
- the input array after normalization
- Throws:
IllegalArgumentException - when the vector is all zero and throwOnZero is true
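The in-place behaviour described for both l2normalize overloads can be sketched as follows. Returning an all-zero vector unchanged when throwOnZero is false is an assumption of this sketch, as is the exception message; the class name is hypothetical and this is not Lucene's implementation.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class L2NormalizeSketch {
  static float[] l2normalize(float[] v, boolean throwOnZero) {
    double sumSq = 0;
    for (float x : v) {
      sumSq += (double) x * x;
    }
    if (sumSq == 0) {
      if (throwOnZero) {
        throw new IllegalArgumentException("cannot normalize an all-zero vector");
      }
      return v; // assumption: leave the zero vector untouched
    }
    float norm = (float) Math.sqrt(sumSq);
    for (int i = 0; i < v.length; i++) {
      v[i] /= norm; // modifies the argument in place
    }
    return v; // same array instance, for call-chaining
  }
}
```

Note that the returned array is the same instance as the argument, so callers that need the original values must copy the vector first.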
-
add
public static void add(float[] u, float[] v)
Adds the second argument to the first.
- Parameters:
u - the destination
v - the vector to add to the destination
-
dotProduct
public static int dotProduct(byte[] a, byte[] b)
Dot product computed over signed bytes.
- Parameters:
a - bytes containing a vector
b - bytes containing another vector, of the same dimension
- Returns:
- the value of the dot product of the two vectors
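For the byte variant, each component is treated as a signed value in [-128, 127] and the products are accumulated in an int. A minimal scalar sketch of that arithmetic, with a hypothetical class name and no claim to match Lucene's vectorized code:

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class ByteDotProductSketch {
  static int dotProduct(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      // Java promotes bytes to int with sign extension before multiplying.
      total += a[i] * b[i];
    }
    return total;
  }
}
```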
-
int4DotProduct
public static int int4DotProduct(byte[] a, byte[] b)
-
int4DotProductPacked
public static int int4DotProductPacked(byte[] unpacked, byte[] packed)
Dot product computed over int4 (values between [0,15]) bytes. The second vector is considered "packed" (i.e. every byte representing two values). The following packing is assumed:
packed[0] = (raw[0] * 16) | raw[packed.length];
packed[1] = (raw[1] * 16) | raw[packed.length + 1];
...
packed[packed.length - 1] = (raw[packed.length - 1] * 16) | raw[2 * packed.length - 1];
- Parameters:
unpacked - the unpacked vector, of even length
packed - the packed vector, of length (unpacked.length + 1) / 2
- Returns:
- the value of the dot product of the two vectors
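The packing layout above puts raw[i] in the high nibble and raw[packed.length + i] in the low nibble of packed[i]. The sketch below follows that layout literally; the `pack` helper and the class name are hypothetical additions for illustration, and the loop is a scalar reference rather than Lucene's implementation.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class Int4PackedSketch {
  // Hypothetical helper: packs an even-length vector of int4 values
  // (each in [0, 15]) using the layout documented above.
  static byte[] pack(byte[] raw) {
    int half = raw.length / 2;
    byte[] packed = new byte[half];
    for (int i = 0; i < half; i++) {
      packed[i] = (byte) ((raw[i] * 16) | raw[half + i]);
    }
    return packed;
  }

  static int int4DotProductPacked(byte[] unpacked, byte[] packed) {
    int total = 0;
    for (int i = 0; i < packed.length; i++) {
      int high = (packed[i] & 0xFF) >> 4; // raw[i]
      int low = packed[i] & 0x0F;         // raw[packed.length + i]
      total += high * unpacked[i] + low * unpacked[packed.length + i];
    }
    return total;
  }
}
```

For raw = {1, 2, 3, 4} the packed bytes are {19, 36}; against unpacked = {5, 6, 7, 8} the result is 1*5 + 2*6 + 3*7 + 4*8 = 70.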
-
int4BitDotProduct
public static long int4BitDotProduct(byte[] q, byte[] d)
Dot product computed over int4 (values between [0,15]) bytes and a binary vector.
- Parameters:
q - the int4 query vector
d - the binary document vector
- Returns:
- the dot product
-
xorBitCount
public static int xorBitCount(byte[] a, byte[] b)
XOR bit count computed over signed bytes.
- Parameters:
a - bytes containing a vector
b - bytes containing another vector, of the same dimension
- Returns:
- the value of the XOR bit count of the two vectors
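The XOR bit count is the number of bit positions in which the two byte arrays differ (the Hamming distance over bits). A byte-at-a-time sketch, with a hypothetical class name; a production implementation would likely process wider words, which this sketch does not attempt.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class XorBitCountSketch {
  static int xorBitCount(byte[] a, byte[] b) {
    int distance = 0;
    for (int i = 0; i < a.length; i++) {
      // Mask to 8 bits so sign extension does not add spurious set bits.
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }
}
```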
-
dotProductScore
public static float dotProductScore(byte[] a, byte[] b)
Dot product score computed over signed bytes, scaled to be in [0, 1].
- Parameters:
a - bytes containing a vector
b - bytes containing another vector, of the same dimension
- Returns:
- the value of the similarity function applied to the two vectors
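One scaling that maps a signed-byte dot product into [0, 1] divides by the dimension times 2^15 and shifts by 0.5, since the largest product of two signed bytes is (-128) * (-128) = 2^14. The exact formula below is an assumption of this sketch, not confirmed by the text above; the class name is hypothetical.

```java
// Hypothetical sketch; the scaling formula is an assumption.
final class DotProductScoreSketch {
  static int dotProduct(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }

  static float dotProductScore(byte[] a, byte[] b) {
    // |dot| <= length * 2^14, so dot / (length * 2^15) lies in [-0.5, 0.5].
    float denom = (float) (a.length * (1 << 15));
    return 0.5f + dotProduct(a, b) / denom;
  }
}
```

Under this scaling, orthogonal vectors score 0.5 and two vectors whose components are all -128 score exactly 1.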
-
scaleMaxInnerProductScore
public static float scaleMaxInnerProductScore(float vectorDotProductSimilarity)
- Parameters:
vectorDotProductSimilarity - the raw similarity between two vectors
- Returns:
- A scaled score preventing negative scores for maximum-inner-product
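A common way to keep maximum-inner-product scores positive while preserving their order is to map negative values into (0, 1) and shift non-negative values up by 1. The piecewise formula below is an assumption of this sketch rather than something stated above; the class name is hypothetical.

```java
// Hypothetical sketch; the piecewise formula is an assumption.
final class MaxInnerProductScaleSketch {
  static float scaleMaxInnerProductScore(float vectorDotProductSimilarity) {
    if (vectorDotProductSimilarity < 0) {
      // Negative similarities map into (0, 1), still increasing with the input.
      return 1 / (1 + -1 * vectorDotProductSimilarity);
    }
    // Non-negative similarities map into [1, +inf).
    return vectorDotProductSimilarity + 1;
  }
}
```

The two branches meet at 1 for an input of 0, so the mapping is monotonic and never returns a negative score.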
-
checkFinite
public static float[] checkFinite(float[] v)
Checks if a float vector only has finite components.
- Parameters:
v - the float vector to check
- Returns:
- the vector for call-chaining
- Throws:
IllegalArgumentException - if any component of vector is not finite
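The finiteness check can be sketched with the standard `Float.isFinite`, which rejects NaN and both infinities. The exception message and class name are hypothetical; this is a reference sketch of the contract, not Lucene's code.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class CheckFiniteSketch {
  static float[] checkFinite(float[] v) {
    for (int i = 0; i < v.length; i++) {
      if (!Float.isFinite(v[i])) {
        throw new IllegalArgumentException(
            "non-finite value at index " + i + ": " + v[i]);
      }
    }
    return v; // same array, so the call can be chained
  }
}
```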
-
findNextGEQ
public static int findNextGEQ(int[] buffer, int target, int from, int to)
Given an array buffer that is sorted between indexes 0 inclusive and to exclusive, find the first array index whose value is greater than or equal to target. This index is guaranteed to be at least from. If there is no such array index, to is returned.
-
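The findNextGEQ contract can be illustrated with a simple linear scan starting at from. This sketch makes no claim about how Lucene searches the buffer (a vectorized or binary search would satisfy the same contract); the class name is hypothetical.

```java
// Hypothetical scalar reference; not Lucene's implementation.
final class FindNextGeqSketch {
  static int findNextGEQ(int[] buffer, int target, int from, int to) {
    for (int i = from; i < to; i++) {
      if (buffer[i] >= target) {
        return i; // first index >= from whose value is >= target
      }
    }
    return to; // no such index
  }
}
```

With buffer = {1, 3, 5, 7, 9}, from = 0 and to = 5, a target of 6 yields index 3 and a target of 10 yields 5 (the value of to).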
minMaxScalarQuantize
public static float minMaxScalarQuantize(float[] vector, byte[] dest, float scale, float alpha, float minQuantile, float maxQuantile)
Scalar quantizes vector, putting the result into dest.
- Parameters:
vector - the vector to quantize
dest - the destination vector
scale - the scaling factor
alpha - the alpha value
minQuantile - the lower quantile of the distribution
maxQuantile - the upper quantile of the distribution
- Returns:
- the corrective offset that needs to be applied to the score
-
recalculateOffset
public static float recalculateOffset(byte[] vector, float oldAlpha, float oldMinQuantile, float scale, float alpha, float minQuantile, float maxQuantile)
Recalculates the offset for vector.
- Parameters:
vector - the vector to quantize
oldAlpha - the previous alpha value
oldMinQuantile - the previous lower quantile
scale - the scaling factor
alpha - the alpha value
minQuantile - the lower quantile of the distribution
maxQuantile - the upper quantile of the distribution
- Returns:
- the new corrective offset
-