view README @ 11:13575ba795d3
impl/gcc: add native 256-bit and 512-bit intrinsics
these are simple to implement.
At some point I'd like to refactor vec into using a union and being
able to detect AVX512 and friends at compile time, so that the processors
that *can* use it are enabled at runtime. This would mean adding a vec_init
function, which isn't that big of a deal and can just be run at startup
anyway and will grab the CPU flags we need.
author | Paper <paper@tflc.us> |
---|---|
date | Mon, 18 Nov 2024 16:12:24 -0500 |
parents | f12b5dd4e18c |
children | e05c257c6a23 |
vec - a tiny SIMD vector header-only library written in C99

it comes with an extremely basic (and somewhat lacking) API,
where there are eight supported vector types, all 128-bit:

	vint8x16  - 16 signed 8-bit integers
	vint16x8  - 8 signed 16-bit integers
	vint32x4  - 4 signed 32-bit integers
	vint64x2  - 2 signed 64-bit integers
	vuint8x16 - 16 unsigned 8-bit integers
	vuint16x8 - 8 unsigned 16-bit integers
	vuint32x4 - 4 unsigned 32-bit integers
	vuint64x2 - 2 unsigned 64-bit integers

all of these have many operations that are prefixed with the
name of the type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'

the current supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector where all of the values are filled
		with the value of `x'

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory address stored at `x';
		the address is NOT required to be aligned

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory address
		stored at `x'

		like with load(), this does not require address alignment

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns
		the result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides the values in `vec1' by the corresponding values
		in `vec2'

		dividing by zero is considered defined behavior and should
		result in a zero; if this doesn't happen it's considered
		a bug

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors,
		i.e., div(add(vec1, vec2), splat(2))

there are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
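the semantics described above can be sketched with a plain scalar model. this is only an illustration and not the library's implementation: the `ref_u32x4' type and every `ref_*' function are hypothetical names made up for this sketch, standing in for a four-lane unsigned 32-bit vector. it assumes that unsigned addition wraps around on overflow, and it takes avg to be the average formula (add, then divide by a splat of 2) spelled out literally.

```c
#include <stdint.h>

/* hypothetical scalar model of a 4-lane unsigned 32-bit vector;
 * this is NOT the library's own vuint32x4 type */
typedef struct {
	uint32_t lane[4];
} ref_u32x4;

/* splat: fill every lane with x */
static ref_u32x4 ref_splat(uint32_t x)
{
	ref_u32x4 r;
	for (int i = 0; i < 4; i++)
		r.lane[i] = x;
	return r;
}

/* add: lane-wise addition (unsigned, so overflow wraps in this model) */
static ref_u32x4 ref_add(ref_u32x4 a, ref_u32x4 b)
{
	ref_u32x4 r;
	for (int i = 0; i < 4; i++)
		r.lane[i] = a.lane[i] + b.lane[i];
	return r;
}

/* div: lane-wise division; a zero divisor yields 0 in that lane */
static ref_u32x4 ref_div(ref_u32x4 a, ref_u32x4 b)
{
	ref_u32x4 r;
	for (int i = 0; i < 4; i++)
		r.lane[i] = (b.lane[i] == 0) ? 0 : a.lane[i] / b.lane[i];
	return r;
}

/* avg: written literally as div(add(vec1, vec2), splat(2)) */
static ref_u32x4 ref_avg(ref_u32x4 a, ref_u32x4 b)
{
	return ref_div(ref_add(a, b), ref_splat(2));
}

/* cmplt: all bits on where a < b, all bits off elsewhere */
static ref_u32x4 ref_cmplt(ref_u32x4 a, ref_u32x4 b)
{
	ref_u32x4 r;
	for (int i = 0; i < 4; i++)
		r.lane[i] = (a.lane[i] < b.lane[i]) ? UINT32_MAX : 0;
	return r;
}

/* cmpeq: all bits on where a == b, all bits off elsewhere */
static ref_u32x4 ref_cmpeq(ref_u32x4 a, ref_u32x4 b)
{
	ref_u32x4 r;
	for (int i = 0; i < 4; i++)
		r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0;
	return r;
}
```

a model like this can serve as a reference when testing a real backend: run the same inputs through both and compare lane by lane. because the comparisons produce all-ones or all-zeros per lane, their results can be fed straight into and/or/xor as masks.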