*: rearrange some things; add avx512bw support
author | Paper <paper@tflc.us> |
---|---|
date | Sat, 26 Apr 2025 15:31:39 -0400 |
parents | fd42f9b1b95e |
children | 55cadb1fac4b |
vec - a tiny SIMD vector header-only library written in C99

- Be prepared! Are you sure you want to know? :-)

------------------------------------------------------------------------------
THE VECTOR API
------------------------------------------------------------------------------

vec comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

	v[sign][bits]x[size]
		where `sign' is either nothing (for signed) or `u' (for unsigned),
		`bits' is the bit size of the integer format,
		and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not. For example, creating a 256-bit vector
on powerpc would simply create two consecutive 128-bit vectors.

All of these have many operations that are prefixed with the name of the
type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'

The currently supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector where all of the values are filled with the
		value of `x'
	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory address stored at `x';
		the address is NOT required to be aligned
	v[u]intAxB load_aligned(const [u]intA_t x[B])
		like `load', but the address is required to be aligned,
		which can cause some speed improvements if done correctly.
	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory address
		stored at `x'. like with load(), this does not require
		address alignment
	void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
		like `store', but the address is required to be aligned,
		which can cause some speed improvements if done correctly.
	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result
	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the
		result
	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result
	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides the values in `vec1' by the values in `vec2'.
		dividing by zero is considered defined behavior and should
		result in a zero; if this doesn't happen it's considered
		a bug
	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors
	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors
	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors
	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2
	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2
	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2
	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors,
		i.e., div(add(vec1, vec2), splat(2)), without
		the possibility of overflow.
	v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the minimum of the values in both vectors
	v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the maximum of the values in both vectors

There are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.
	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.
	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.
	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

This API will most definitely have more operations available as they are
requested (and as they are needed). Patches are accepted and encouraged!

------------------------------------------------------------------------------
USING VEC
------------------------------------------------------------------------------

To use vec, simply include `vec/vec.h` in your program. If you would like
your program to also be able to run on older systems, you can create
multiple translation units and pass different command line arguments
to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector
modes the CPU supports at runtime. vec provides an optional public API
specifically for this use-case within `vec/cpu.h`; bear in mind though
that it is not thread-safe, so if your program is multithreaded you'll want
to cache the results on startup.

The CPU vector detection API is extremely simple and self-explanatory.
You call `vec_get_CPU_features()', and it returns a bit-mask of the values
within the enum placed above the function definition.
From there, you can test for each value specifically.

vec should work perfectly fine with C++, though it is not tested as
thoroughly as C is. Your mileage may vary. You should probably be using
a library more tailored towards C++ such as Highway[1] or std::simd.

[1]: https://google.github.io/highway/en/master/

------------------------------------------------------------------------------
MEMORY ALLOCATION
------------------------------------------------------------------------------

vec allows for stack-based and heap-based aligned array allocation. The
stack-based API is simple, and goes along the lines of this:

	VINT16x32_ALIGNED_ARRAY(arr);

	/* arr is now either an array type or a pointer type, depending on whether
	 * the compiler supports the alignas specifier from C11 or later, or has
	 * its own extension to align arrays. vec will fall back to manual pointer
	 * alignment if the compiler does not support it. */

	/* this macro returns the full size of the array in bytes */
	int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);

	/* this macro returns the length of the array
	 * (basically a synonym for sizeof/sizeof[0]) */
	int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);

	/* no need to free the aligned array -- it is always on the stack */

The heap-based API is based off the good old C malloc API:

	/* heap allocation stuff is only defined here: */
	#include "vec/mem.h"

	vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));

	/* q is now aligned, and ready for use with a vector aligned load
	 * function. */
	vint32x16_load_aligned(q);

	/* Say we want to reallocate the memory with a different size.
	 * No problem there! */
	q = vec_realloc(q, 2048 * sizeof(vec_int32));

	/* In a real world program, you'll want to check that vec_malloc
	 * and vec_realloc do not fail, but this error checking has been
	 * withheld from this example, as it is the same as for regular
	 * malloc and realloc.
	 */

	vec_free(q);

	/* If you need it to be initialized, we have you covered: */
	q = vec_calloc(1024, sizeof(vec_int32));

	/* vec_calloc forwards to the real calloc, so there is no overhead of
	 * calling memset or something similar. */

	vec_free(q);

------------------------------------------------------------------------------
THE BOTTOM
------------------------------------------------------------------------------

vec is copyright (c) Paper 2024-2025.
See the file LICENSE in the distribution for more information.

Bugs? Questions? Suggestions? Patches?
Feel free to contact me at any of the following:

Website: https://tflc.us/
Email: paper@tflc.us
IRC: slipofpaper on Libera.chat
Discord: @slipofpaper

am I a real programmer now? :^)
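
P.S. If you want to sanity-check your reading of the operation list, the
rules stated there for `div' (a zero divisor gives zero) and for the
comparisons (all bits on or all bits off per value) can be modeled in plain
scalar C. This is only a sketch under assumptions: it does not include or
call vec, it says nothing about how vec implements these operations, and
MODEL_LANES and every model_* name are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LANES 4 /* hypothetical lane count, think of a vuint32x4 */

/* Scalar model of `div': a zero divisor produces zero in that lane. */
static void model_div(const uint32_t a[MODEL_LANES],
                      const uint32_t b[MODEL_LANES],
                      uint32_t out[MODEL_LANES])
{
	size_t i;
	for (i = 0; i < MODEL_LANES; i++)
		out[i] = (b[i] != 0) ? (a[i] / b[i]) : 0;
}

/* Scalar model of `cmplt': all bits on where a < b, all bits off otherwise. */
static void model_cmplt(const uint32_t a[MODEL_LANES],
                        const uint32_t b[MODEL_LANES],
                        uint32_t out[MODEL_LANES])
{
	size_t i;
	for (i = 0; i < MODEL_LANES; i++)
		out[i] = (a[i] < b[i]) ? UINT32_MAX : 0;
}

/* Picks a[i] where the mask is set and b[i] elsewhere. With a cmplt
 * mask this yields the per-lane minimum, which is why all-ones masks
 * are handy: they combine with plain and/or/xor. */
static void model_select(const uint32_t mask[MODEL_LANES],
                         const uint32_t a[MODEL_LANES],
                         const uint32_t b[MODEL_LANES],
                         uint32_t out[MODEL_LANES])
{
	size_t i;
	for (i = 0; i < MODEL_LANES; i++)
		out[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
}

/* Small self-check; call it from main() to try the model. */
static void model_demo(void)
{
	const uint32_t a[MODEL_LANES] = { 10, 7, 3, 8 };
	const uint32_t b[MODEL_LANES] = {  2, 0, 9, 8 };
	uint32_t q[MODEL_LANES], m[MODEL_LANES], lo[MODEL_LANES];

	model_div(a, b, q);
	assert(q[0] == 5 && q[1] == 0 && q[2] == 0 && q[3] == 1);

	model_cmplt(a, b, m);
	assert(m[0] == 0 && m[1] == 0 && m[2] == UINT32_MAX && m[3] == 0);

	model_select(m, a, b, lo);
	assert(lo[0] == 2 && lo[1] == 0 && lo[2] == 3 && lo[3] == 8);
}
```

The model uses unsigned values only, so it sidesteps signed-division corner
cases that this README does not describe.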