diff README @ 38:fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
I also slightly edited vec.h to use calloc directly rather
than malloc + memset.
author | Paper <paper@tflc.us> |
---|---|
date | Sat, 26 Apr 2025 02:54:44 -0400 |
parents | 677c03c382b8 |
children | f9ca85d2f14c |
--- a/README	Sat Apr 26 01:04:35 2025 -0400
+++ b/README	Sat Apr 26 02:54:44 2025 -0400
@@ -1,6 +1,11 @@
 vec - a tiny SIMD vector header-only library written in C99
 
-it comes with an extremely basic API that is similar to other intrinsics
+- Be prepared! Are you sure you want to know? :-)
+
+------------------------------------------------------------------------------
+THE VECTOR API
+------------------------------------------------------------------------------
+vec comes with an extremely basic API that is similar to other intrinsics
 libraries; each type is in the exact same format:
 
 	v[sign][bits]x[size]
@@ -10,16 +15,17 @@
 
 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
 on processors where vec has an implementation and falls back to array-based
-implementations where they are not.
+implementations where they are not. For example, creating a 256-bit vector
+on powerpc would simply create two consecutive 128-bit vectors.
 
-all of these have many operations that are prefixed with the name of the
+All of these have many operations that are prefixed with the name of the
 type and an underscore, for example:
 
-	vint8x16 vint8x16_splat(uint8_t x)
+	vint8x16 vint8x16_splat(int8_t x)
 	- creates a vint8x16 where all of the values are filled
 	  with the value of `x'
 
-the current supported operations are:
+The currently supported operations are:
 
 	v[u]intAxB splat([u]intA_t x)
 		creates a vector with all of the values are filled with
@@ -29,11 +35,18 @@
 		copies the values from the memory address stored at `x';
 		the address is NOT required to be aligned
 
+	v[u]intAxB load_aligned(const [u]intA_t x[B])
+		like `load', but the address is required to be aligned,
+		which can cause some speed improvements if done correctly.
+
 	void store(v[u]intAxB vec, [u]intA_t x[B])
 		copies the values from the vector into the memory address
-		stored at `x'
+		stored at `x'.
+		like with load(), this does not require address alignment
 
-		like with load(), this does not require address alignment
+	void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
+		like `store', but the address is required to be aligned,
+		which can cause some speed improvements if done correctly.
 
 	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
 		adds the value of `vec1' and `vec2' and returns it
@@ -73,9 +86,16 @@
 
 	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
 		returns the average of the values in both vectors
-		i.e., div(mul(vec1, vec2), splat(2))
+		i.e., div(add(vec1, vec2), splat(2)), without
+		the possibility of overflow.
 
-there are also a number of comparisons possible:
+	v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
+		returns the minimum of the values in both vectors
+
+	v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
+		returns the maximum of the values in both vectors
+
+There are also a number of comparisons possible:
 
 	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
 		turns on all bits of the corresponding value in
@@ -107,9 +127,94 @@
 		than or equal to the corresponding value in `vec2', else all
 		of the bits are turned off.
 
-to initialize vec, you MUST call `vec_init()' when your programs starts up.
+This API will most definitely have more operations available as they are
+requested (and as they are needed). Patches are accepted and encouraged!
+
+------------------------------------------------------------------------------
+USING VEC
+------------------------------------------------------------------------------
+To use vec, simply include `vec/vec.h` in your program. If you would like
+your program to also be able to run on older systems, you can create
+multiple translation units and pass different command line arguments
+to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector
+modes the CPU supports at runtime. vec provides an optional public API
+specifically for this use-case within `vec/impl/cpu.h`; bear in mind
+though that it is not thread-safe, so if your program is multithreaded
+you'll want to cache the results on startup.
+
+The CPU vector detection API is extremely simple, and self-explanatory.
+You call `vec_get_CPU_features()', and it returns a bit-mask of the
+values within the enum placed above the function definition. From there,
+you can test for each value specifically.
+
+vec should work perfectly fine with C++, though it is not tested as
+thoroughly as C is. Your mileage may vary. You should probably be using
+a library more tailored towards C++ such as Highway[1] or std::simd.
+
+[1]: https://google.github.io/highway/en/master/
+
+------------------------------------------------------------------------------
+MEMORY ALLOCATION
+------------------------------------------------------------------------------
+vec allows for stack-based and heap-based aligned array allocation. The
+stack-based API is simple, and goes among the lines of this:
+
+	VINT16x32_ALIGNED_ARRAY(arr);
+
+	/* arr is now either an array type or a pointer type, depending on whether
+	 * the compiler supports the alignas operator within C11 or later, or has
+	 * its own extension to align arrays. vec will fallback to manual pointer
+	 * alignment if the compiler does not support it. */
+
+	/* this macro returns the full size of the array in bytes */
+	int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);
+
+	/* this macro returns the length of the array
+	 * (basically a synonym for sizeof/sizeof[0]) */
+	int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);
 
-note that `vec_init()' is NOT thread-safe, and things can and will
-blow up if you call it simultaneously from different threads (i.e. you
-try to only initialize it when you need to... please just initialize
-it on startup so you don't have to worry about that!!!)
+	/* no need to free the aligned array -- it is always on the stack */
+
+The heap-based API is based off the good old C malloc API:
+
+	vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));
+
+	/* q is now aligned, and ready for use with a vector aligned load
+	 * function. */
+	vint32x16_load_aligned(q);
+
+	/* Say we want to reallocate the memory with a different size.
+	 * No problem there! */
+	q = vec_realloc(q, 2048 * sizeof(vec_int32));
+
+	/* In a real world program, you'll want to check that vec_malloc
+	 * and vec_realloc do not fail, but this error checking has been
+	 * withheld from this example, as it is the same as for regular
+	 * malloc and realloc. */
+
+	vec_free(q);
+
+	/* If you need it to be initialized, we have you covered: */
+	q = vec_calloc(1024, sizeof(vec_int32));
+
+	/* vec_calloc forwards to the real calloc, so there is no overhead of
+	 * calling memset or something similar. */
+
+	vec_free(q);
+
+------------------------------------------------------------------------------
+THE BOTTOM
+------------------------------------------------------------------------------
+vec is copyright (c) Paper 2024-2025.
+See the file LICENSE in the distribution for more information.
+
+Bugs? Questions? Suggestions? Patches?
+Feel free to contact me at any of the following:
+
+Website: https://tflc.us/
+Email: paper@tflc.us
+IRC: slipofpaper on Libera.chat
+Discord: @slipofpaper
+
+
+am I a real programmer now? :^)
\ No newline at end of file
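
The README text in this diff describes an array-based fallback, an `avg'
that behaves like div(add(vec1, vec2), splat(2)) without overflow, and
comparisons that turn on all bits of a matching value. A minimal C99
sketch of those three ideas follows. The type demo_u32x4 and every demo_*
function are hypothetical names made up for this illustration; they are
not part of vec, and vec's real fallback may well be written differently.
Rounding the average down is also an assumption taken from the
div(add(...), splat(2)) wording.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array-backed stand-in for a 4 x 32-bit unsigned vector.
 * This is NOT vec's real type, only an illustration of the idea. */
#define DEMO_LANES 4

typedef struct {
	uint32_t lane[DEMO_LANES];
} demo_u32x4;

/* splat: every value in the vector is filled with `x' */
static demo_u32x4 demo_u32x4_splat(uint32_t x)
{
	demo_u32x4 v;
	size_t i;

	for (i = 0; i < DEMO_LANES; i++)
		v.lane[i] = x;
	return v;
}

/* avg: floor((a + b) / 2) for each value.
 * Since a + b == 2 * (a & b) + (a ^ b), halving each term separately
 * never needs more than 32 bits, so the sum cannot wrap around. */
static demo_u32x4 demo_u32x4_avg(demo_u32x4 a, demo_u32x4 b)
{
	demo_u32x4 v;
	size_t i;

	for (i = 0; i < DEMO_LANES; i++)
		v.lane[i] = (a.lane[i] & b.lane[i])
		          + ((a.lane[i] ^ b.lane[i]) >> 1);
	return v;
}

/* cmplt: all bits on where a < b, all bits off otherwise */
static demo_u32x4 demo_u32x4_cmplt(demo_u32x4 a, demo_u32x4 b)
{
	demo_u32x4 v;
	size_t i;

	for (i = 0; i < DEMO_LANES; i++)
		v.lane[i] = (a.lane[i] < b.lane[i]) ? UINT32_MAX : 0;
	return v;
}

/* Usage: a naive (a + b) / 2 would wrap for these inputs. */
void demo_check(void)
{
	demo_u32x4 big = demo_u32x4_splat(UINT32_MAX);
	demo_u32x4 one = demo_u32x4_splat(1);
	demo_u32x4 r;

	r = demo_u32x4_avg(big, big);
	assert(r.lane[0] == UINT32_MAX);

	r = demo_u32x4_avg(big, one);
	assert(r.lane[3] == 0x80000000u);

	r = demo_u32x4_cmplt(one, big);
	assert(r.lane[2] == UINT32_MAX);

	r = demo_u32x4_cmplt(big, one);
	assert(r.lane[2] == 0);
}
```

Calling demo_check() from a program's main() exercises all three helpers.
The all-bits-on result of the comparison is what makes it usable as a
mask with bitwise operations afterwards.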