comparison README @ 38:fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
I slightly edited vec.h however to use calloc directly rather
than malloc + memset.
| author | Paper <paper@tflc.us> |
|---|---|
| date | Sat, 26 Apr 2025 02:54:44 -0400 |
| parents | 677c03c382b8 |
| children | f9ca85d2f14c |
| 37:4b5a557aa64f | 38:fd42f9b1b95e |
|---|---|
| 1 vec - a tiny SIMD vector header-only library written in C99 | 1 vec - a tiny SIMD vector header-only library written in C99 |
| 2 | 2 |
| 3 it comes with an extremely basic API that is similar to other intrinsics | 3 - Be prepared! Are you sure you want to know? :-) |
| 4 | |
| 5 ------------------------------------------------------------------------------ | |
| 6 THE VECTOR API | |
| 7 ------------------------------------------------------------------------------ | |
| 8 vec comes with an extremely basic API that is similar to other intrinsics | |
| 4 libraries; each type is in the exact same format: | 9 libraries; each type is in the exact same format: |
| 5 | 10 |
| 6 v[sign][bits]x[size] | 11 v[sign][bits]x[size] |
| 7 where `sign' is either nothing (for signed) or `u' (for unsigned), | 12 where `sign' is either nothing (for signed) or `u' (for unsigned), |
| 8 `bits' is the bit size of the integer format, | 13 `bits' is the bit size of the integer format, |
| 9 and `size' is how many integers are in the vector | 14 and `size' is how many integers are in the vector |
| 10 | 15 |
| 11 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics | 16 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics |
| 12 on processors where vec has an implementation and falls back to array-based | 17 on processors where vec has an implementation and falls back to array-based |
| 13 implementations where it does not. | 18 implementations where it does not. For example, creating a 256-bit vector |
| 14 | 19 on powerpc would simply create two consecutive 128-bit vectors. |
| 15 all of these have many operations that are prefixed with the name of the | 20 |
| 21 All of these have many operations that are prefixed with the name of the | |
| 16 type and an underscore, for example: | 22 type and an underscore, for example: |
| 17 | 23 |
| 18 vint8x16 vint8x16_splat(uint8_t x) | 24 vint8x16 vint8x16_splat(int8_t x) |
| 19 - creates a vint8x16 where all of the values are filled | 25 - creates a vint8x16 where all of the values are filled |
| 20 with the value of `x' | 26 with the value of `x' |
| 21 | 27 |
| 22 the current supported operations are: | 28 The currently supported operations are: |
| 23 | 29 |
| 24 v[u]intAxB splat([u]intA_t x) | 30 v[u]intAxB splat([u]intA_t x) |
| 25 creates a vector where all of the values are filled with | 31 creates a vector where all of the values are filled with |
| 26 the value of `x' | 32 the value of `x' |
| 27 | 33 |
| 28 v[u]intAxB load(const [u]intA_t x[B]) | 34 v[u]intAxB load(const [u]intA_t x[B]) |
| 29 copies the values from the memory address stored at `x'; | 35 copies the values from the memory address stored at `x'; |
| 30 the address is NOT required to be aligned | 36 the address is NOT required to be aligned |
| 31 | 37 |
| 38 v[u]intAxB load_aligned(const [u]intA_t x[B]) | |
| 39 like `load', but the address is required to be aligned, | |
| 40 which can cause some speed improvements if done correctly. | |
| 41 | |
| 32 void store(v[u]intAxB vec, [u]intA_t x[B]) | 42 void store(v[u]intAxB vec, [u]intA_t x[B]) |
| 33 copies the values from the vector into the memory address | 43 copies the values from the vector into the memory address |
| 34 stored at `x' | 44 stored at `x'. |
| 35 | |
| 36 like with load(), this does not require address alignment | 45 like with load(), this does not require address alignment |
| 46 | |
| 47 void store_aligned(v[u]intAxB vec, [u]intA_t x[B]) | |
| 48 like `store', but the address is required to be aligned, | |
| 49 which can cause some speed improvements if done correctly. | |
| 37 | 50 |
| 38 v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2) | 51 v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2) |
| 39 adds the value of `vec1' and `vec2' and returns it | 52 adds the value of `vec1' and `vec2' and returns it |
| 40 | 53 |
| 41 v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2) | 54 v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2) |
| 71 logical right shift of the values in vec1 by | 84 logical right shift of the values in vec1 by |
| 72 the corresponding values in vec2 | 85 the corresponding values in vec2 |
| 73 | 86 |
| 74 v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2) | 87 v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2) |
| 75 returns the average of the values in both vectors | 88 returns the average of the values in both vectors |
| 76 i.e., div(mul(vec1, vec2), splat(2)) | 89 i.e., div(add(vec1, vec2), splat(2)), without |
| 77 | 90 the possibility of overflow. |
| 78 there are also a number of comparisons possible: | 91 |
| 92 v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2) | |
| 93 returns the minimum of the values in both vectors | |
| 94 | |
| 95 v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2) | |
| 96 returns the maximum of the values in both vectors | |
| 97 | |
| 98 There are also a number of comparisons possible: | |
| 79 | 99 |
| 80 v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2) | 100 v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2) |
| 81 turns on all bits of the corresponding value in | 101 turns on all bits of the corresponding value in |
| 82 the result vector if the value in `vec1' is less | 102 the result vector if the value in `vec1' is less |
| 83 than the corresponding value in `vec2', else all | 103 than the corresponding value in `vec2', else all |
| 105 turns on all bits of the corresponding value in | 125 turns on all bits of the corresponding value in |
| 106 the result vector if the value in `vec1' is greater | 126 the result vector if the value in `vec1' is greater |
| 107 than or equal to the corresponding value in `vec2', | 127 than or equal to the corresponding value in `vec2', |
| 108 else all of the bits are turned off. | 128 else all of the bits are turned off. |
| 109 | 129 |
| 110 to initialize vec, you MUST call `vec_init()' when your programs starts up. | 130 This API will most definitely have more operations available as they are |
| 111 | 131 requested (and as they are needed). Patches are accepted and encouraged! |
| 112 note that `vec_init()' is NOT thread-safe, and things can and will | 132 |
| 113 blow up if you call it simultaneously from different threads (i.e. you | 133 ------------------------------------------------------------------------------ |
| 114 try to only initialize it when you need to... please just initialize | 134 USING VEC |
| 115 it on startup so you don't have to worry about that!!!) | 135 ------------------------------------------------------------------------------ |
| 136 To use vec, simply include `vec/vec.h` in your program. If you would like | |
| 137 your program to also be able to run on older systems, you can create | |
| 138 multiple translation units and pass different command line arguments | |
| 139 to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector | |
| 140 modes the CPU supports at runtime. vec provides an optional public API | |
| 141 specifically for this use-case within `vec/impl/cpu.h`; bear in mind | |
| 142 though that it is not thread-safe, so if your program is multithreaded | |
| 143 you'll want to cache the results on startup. | |
| 144 | |
| 145 The CPU vector detection API is extremely simple, and self-explanatory. | |
| 146 You call `vec_get_CPU_features()', and it returns a bit-mask of the | |
| 147 values within the enum placed above the function definition. From there, | |
| 148 you can test for each value specifically. | |
| 149 | |
| 150 vec should work perfectly fine with C++, though it is not tested as | |
| 151 thoroughly as C is. Your mileage may vary. You should probably be using | |
| 152 a library more tailored towards C++ such as Highway[1] or std::simd. | |
| 153 | |
| 154 [1]: https://google.github.io/highway/en/master/ | |
| 155 | |
| 156 ------------------------------------------------------------------------------ | |
| 157 MEMORY ALLOCATION | |
| 158 ------------------------------------------------------------------------------ | |
| 159 vec allows for stack-based and heap-based aligned array allocation. The | |
| 160 stack-based API is simple, and goes along the lines of this: | |
| 161 | |
| 162 VINT16x32_ALIGNED_ARRAY(arr); | |
| 163 | |
| 164 /* arr is now either an array type or a pointer type, depending on whether | |
| 165 * the compiler supports the alignas operator within C11 or later, or has | |
| 166 * its own extension to align arrays. vec will fall back to manual pointer | |
| 167 * alignment if the compiler does not support it. */ | |
| 168 | |
| 169 /* this macro returns the full size of the array in bytes */ | |
| 170 int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr); | |
| 171 | |
| 172 /* this macro returns the length of the array | |
| 173 * (basically a synonym for sizeof/sizeof[0]) */ | |
| 174 int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr); | |
| 175 | |
| 176 /* no need to free the aligned array -- it is always on the stack */ | |
| 177 | |
| 178 The heap-based API is based off the good old C malloc API: | |
| 179 | |
| 180 vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32)); | |
| 181 | |
| 182 /* q is now aligned, and ready for use with a vector aligned load | |
| 183 * function. */ | |
| 184 vint32x16_load_aligned(q); | |
| 185 | |
| 186 /* Say we want to reallocate the memory with a different size. | |
| 187 * No problem there! */ | |
| 188 q = vec_realloc(q, 2048 * sizeof(vec_int32)); | |
| 189 | |
| 190 /* In a real world program, you'll want to check that vec_malloc | |
| 191 * and vec_realloc do not fail, but this error checking has been | |
| 192 * withheld from this example, as it is the same as for regular | |
| 193 * malloc and realloc. */ | |
| 194 | |
| 195 vec_free(q); | |
| 196 | |
| 197 /* If you need it to be initialized, we have you covered: */ | |
| 198 q = vec_calloc(1024, sizeof(vec_int32)); | |
| 199 | |
| 200 /* vec_calloc forwards to the real calloc, so there is no overhead of | |
| 201 * calling memset or something similar. */ | |
| 202 | |
| 203 vec_free(q); | |
| 204 | |
| 205 ------------------------------------------------------------------------------ | |
| 206 THE BOTTOM | |
| 207 ------------------------------------------------------------------------------ | |
| 208 vec is copyright (c) Paper 2024-2025. | |
| 209 See the file LICENSE in the distribution for more information. | |
| 210 | |
| 211 Bugs? Questions? Suggestions? Patches? | |
| 212 Feel free to contact me at any of the following: | |
| 213 | |
| 214 Website: https://tflc.us/ | |
| 215 Email: paper@tflc.us | |
| 216 IRC: slipofpaper on Libera.chat | |
| 217 Discord: @slipofpaper | |
| 218 | |
| 219 | |
| 220 am I a real programmer now? :^) |
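
The new README text describes `avg' as div(add(vec1, vec2), splat(2)) "without the possibility of overflow", and the comparison operations as turning on all bits of a lane when the test passes. A scalar sketch of what that could look like for a single unsigned 8-bit lane is below. It does not use vec and is not vec's actual implementation: every name in it (`lane_avg_u8', `lane_cmplt_u8', `lane_select_u8', `avg_u8x16') is hypothetical, and rounding the average down is an assumption, since the README does not say which way `avg' rounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average of two unsigned 8-bit values without a wider intermediate.
 * (a & b) keeps the bits both inputs share; (a ^ b) >> 1 is half of the
 * bits that differ. Their sum equals (a + b) / 2 rounded down, and it
 * can never exceed 255. Rounding down is an assumption of this sketch. */
static uint8_t lane_avg_u8(uint8_t a, uint8_t b)
{
	return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* Comparison in the style the README describes: every bit of the
 * result is on if a < b, and every bit is off otherwise. */
static uint8_t lane_cmplt_u8(uint8_t a, uint8_t b)
{
	return (a < b) ? UINT8_MAX : 0;
}

/* An all-ones/all-zeros mask can pick between two values with plain
 * bitwise operators: x where the mask is set, y where it is clear. */
static uint8_t lane_select_u8(uint8_t mask, uint8_t x, uint8_t y)
{
	return (uint8_t)((mask & x) | (~mask & y));
}

/* Hypothetical array-based version over 16 lanes, in the spirit of the
 * array fallback the README mentions (not copied from vec). */
static void avg_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
	size_t i;

	assert(a != NULL && b != NULL && out != NULL);

	for (i = 0; i < 16; i++)
		out[i] = lane_avg_u8(a[i], b[i]);
}
```

With these helpers, `lane_select_u8(lane_cmplt_u8(a, b), a, b)' yields the smaller of `a' and `b', which is one way a per-lane minimum can be built from a comparison mask.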
