comparison README @ 38:fd42f9b1b95e

docs: update copyright for 2025, update the README with more info. I slightly edited vec.h, however, to use calloc directly rather than malloc + memset.
author Paper <paper@tflc.us>
date Sat, 26 Apr 2025 02:54:44 -0400
parents 677c03c382b8
children f9ca85d2f14c
comparison
37:4b5a557aa64f 38:fd42f9b1b95e
1 vec - a tiny SIMD vector header-only library written in C99 1 vec - a tiny SIMD vector header-only library written in C99
2 2
3 it comes with an extremely basic API that is similar to other intrinsics 3 - Be prepared! Are you sure you want to know? :-)
4
5 ------------------------------------------------------------------------------
6 THE VECTOR API
7 ------------------------------------------------------------------------------
8 vec comes with an extremely basic API that is similar to other intrinsics
4 libraries; each type is in the exact same format: 9 libraries; each type is in the exact same format:
5 10
6 v[sign][bits]x[size] 11 v[sign][bits]x[size]
7 where `sign' is either nothing (for signed) or `u' (for unsigned), 12 where `sign' is either nothing (for signed) or `u' (for unsigned),
8 `bits' is the bit size of the integer format, 13 `bits' is the bit size of the integer format,
9 and `size' is how many integers are in the vector 14 and `size' is how many integers are in the vector
10 15
11 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics 16 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
12 on processors where vec has an implementation and falls back to array-based 17 on processors where vec has an implementation and falls back to array-based
13 implementations where they are not. 18 implementations where they are not. For example, creating a 256-bit vector
14 19 on powerpc would simply create two consecutive 128-bit vectors.
15 all of these have many operations that are prefixed with the name of the 20
21 All of these have many operations that are prefixed with the name of the
16 type and an underscore, for example: 22 type and an underscore, for example:
17 23
18 vint8x16 vint8x16_splat(uint8_t x) 24 vint8x16 vint8x16_splat(int8_t x)
19 - creates a vint8x16 where all of the values are filled 25 - creates a vint8x16 where all of the values are filled
20 with the value of `x' 26 with the value of `x'
21 27
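As a rough illustration of the array-based fallback and the naming scheme
described above, a vint8x16-like type can be modelled as a plain struct of
16 int8_t values. The names `sketch_int8x16', `sketch_int8x16_splat' and
`sketch_int8x16_add' are hypothetical and are not taken from vec's headers;
this is only a sketch of the idea, not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical array-backed stand-in for a 16-lane vector of int8_t. */
typedef struct {
    int8_t lanes[16];
} sketch_int8x16;

/* Fill every lane with the same value, like the `splat' operation. */
static sketch_int8x16 sketch_int8x16_splat(int8_t x)
{
    sketch_int8x16 v;
    int i;

    for (i = 0; i < 16; i++)
        v.lanes[i] = x;

    return v;
}

/* Lane-wise addition. The sum is formed in unsigned arithmetic so that
 * it wraps instead of overflowing; converting an out-of-range result
 * back to int8_t is implementation-defined in C, though it wraps on
 * the usual two's complement targets. */
static sketch_int8x16 sketch_int8x16_add(sketch_int8x16 a, sketch_int8x16 b)
{
    sketch_int8x16 r;
    int i;

    for (i = 0; i < 16; i++)
        r.lanes[i] = (int8_t)(uint8_t)((uint8_t)a.lanes[i] + (uint8_t)b.lanes[i]);

    return r;
}
```

A real SIMD backend would replace the loops with a single instruction, but
the type name, prefix and behaviour seen by the caller stay the same.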
22 the current supported operations are: 28 The currently supported operations are:
23 29
24 v[u]intAxB splat([u]intA_t x) 30 v[u]intAxB splat([u]intA_t x)
25 creates a vector with all of the values filled with 31 creates a vector with all of the values filled with
26 the value of `x' 32 the value of `x'
27 33
28 v[u]intAxB load(const [u]intA_t x[B]) 34 v[u]intAxB load(const [u]intA_t x[B])
29 copies the values from the memory address stored at `x'; 35 copies the values from the memory address stored at `x';
30 the address is NOT required to be aligned 36 the address is NOT required to be aligned
31 37
38 v[u]intAxB load_aligned(const [u]intA_t x[B])
39 like `load', but the address is required to be aligned,
40 which can cause some speed improvements if done correctly.
41
32 void store(v[u]intAxB vec, [u]intA_t x[B]) 42 void store(v[u]intAxB vec, [u]intA_t x[B])
33 copies the values from the vector into the memory address 43 copies the values from the vector into the memory address
34 stored at `x' 44 stored at `x'.
35
36 like with load(), this does not require address alignment 45 like with load(), this does not require address alignment
46
47 void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
48 like `store', but the address is required to be aligned,
49 which can cause some speed improvements if done correctly.
37 50
38 v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2) 51 v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
39 adds the value of `vec1' and `vec2' and returns it 52 adds the value of `vec1' and `vec2' and returns it
40 53
41 v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2) 54 v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
71 logical right shift of the values in vec1 by 84 logical right shift of the values in vec1 by
72 the corresponding values in vec2 85 the corresponding values in vec2
73 86
74 v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2) 87 v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
75 returns the average of the values in both vectors 88 returns the average of the values in both vectors
76 i.e., div(mul(vec1, vec2), splat(2)) 89 i.e., div(add(vec1, vec2), splat(2)), without
77 90 the possibility of overflow.
78 there are also a number of comparisons possible: 91
92 v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
93 returns the minimum of the values in both vectors
94
95 v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
96 returns the maximum of the values in both vectors
97
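To show what "without the possibility of overflow" can mean for `avg', here
is a scalar sketch of one common way to average two unsigned values without
ever forming their sum. The function name `sketch_avg_u32' is hypothetical,
and rounding down is an assumption: this README does not say which way vec
rounds, and some hardware average instructions round up instead.

```c
#include <assert.h>
#include <stdint.h>

/* Average of two unsigned 32-bit values, rounded down.
 * Bits set in both inputs contribute fully (a & b); bits set in only
 * one input contribute half ((a ^ b) >> 1). The intermediate values
 * never exceed UINT32_MAX, unlike the naive (a + b) / 2. */
static uint32_t sketch_avg_u32(uint32_t a, uint32_t b)
{
    return (a & b) + ((a ^ b) >> 1);
}
```

With both inputs at UINT32_MAX, the naive (a + b) / 2 wraps around and
yields 0x7FFFFFFF, while the version above returns UINT32_MAX.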
98 There are also a number of comparisons possible:
79 99
80 v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2) 100 v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
81 turns on all bits of the corresponding value in 101 turns on all bits of the corresponding value in
82 the result vector if the value in `vec1' is less 102 the result vector if the value in `vec1' is less
83 than the corresponding value in `vec2', else all 103 than the corresponding value in `vec2', else all
105 turns on all bits of the corresponding value in 125 turns on all bits of the corresponding value in
106 the result vector if the value in `vec1' is greater 126 the result vector if the value in `vec1' is greater
107 than or equal to the corresponding value in `vec2', 127 than or equal to the corresponding value in `vec2',
108 else all of the bits are turned off. 128 else all of the bits are turned off.
109 129
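The all-bits-on / all-bits-off result of the comparisons is what makes them
useful as masks. Below is a scalar sketch of a single lane; the names
`sketch_cmplt_u32' and `sketch_select_u32' are hypothetical, and the select
helper is not part of the API listed above. It is only there to show how
such a mask is typically consumed.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of `cmplt': every bit is turned on if
 * a < b, otherwise every bit is turned off. */
static uint32_t sketch_cmplt_u32(uint32_t a, uint32_t b)
{
    return (a < b) ? UINT32_MAX : 0;
}

/* Branch-free select: takes the bits of x where the mask is on and
 * the bits of y where it is off. */
static uint32_t sketch_select_u32(uint32_t mask, uint32_t x, uint32_t y)
{
    return (x & mask) | (y & ~mask);
}
```

For example, sketch_select_u32(sketch_cmplt_u32(a, b), a, b) picks the
smaller of a and b without a branch.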
110 to initialize vec, you MUST call `vec_init()' when your programs starts up. 130 This API will most definitely have more operations available as they are
111 131 requested (and as they are needed). Patches are accepted and encouraged!
112 note that `vec_init()' is NOT thread-safe, and things can and will 132
113 blow up if you call it simultaneously from different threads (i.e. you 133 ------------------------------------------------------------------------------
114 try to only initialize it when you need to... please just initialize 134 USING VEC
115 it on startup so you don't have to worry about that!!!) 135 ------------------------------------------------------------------------------
136 To use vec, simply include `vec/vec.h` in your program. If you would like
137 your program to also be able to run on older systems, you can create
138 multiple translation units and pass different command line arguments
139 to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector
140 modes the CPU supports at runtime. vec provides an optional public API
141 specifically for this use-case within `vec/impl/cpu.h`; bear in mind
142 though that it is not thread-safe, so if your program is multithreaded
143 you'll want to cache the results on startup.
144
145 The CPU vector detection API is extremely simple, and self-explanatory.
146 You call `vec_get_CPU_features()', and it returns a bit-mask of the
147 values within the enum placed above the function definition. From there,
148 you can test for each value specifically.
149
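The caching advice above could look roughly like the following. Everything
here is a stand-in so that the sketch compiles on its own: the flag names
and `sketch_detect_features' are hypothetical, and in a real program the
detection call would be `vec_get_CPU_features()' with the enum values from
`vec/impl/cpu.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; the real ones are defined by vec. */
enum {
    SKETCH_CPU_HAS_SSE2 = 1 << 0,
    SKETCH_CPU_HAS_AVX2 = 1 << 1
};

/* Stand-in for the real detection call. It pretends that the CPU
 * supports SSE2 only. */
static uint32_t sketch_detect_features(void)
{
    return SKETCH_CPU_HAS_SSE2;
}

static uint32_t sketch_cached_features;

/* Call this once at startup, before any other thread is created. */
static void sketch_cache_features(void)
{
    sketch_cached_features = sketch_detect_features();
}

/* Only reads the cached mask, so threads created after the call to
 * sketch_cache_features() can use it freely. */
static int sketch_has_feature(uint32_t flag)
{
    return (sketch_cached_features & flag) != 0;
}
```

A dispatcher would then call sketch_has_feature() to choose between, say,
an AVX2 translation unit and a generic one.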
150 vec should work perfectly fine with C++, though it is not tested as
151 thoroughly as C is. Your mileage may vary. You should probably be using
152 a library more tailored towards C++ such as Highway[1] or std::simd.
153
154 [1]: https://google.github.io/highway/en/master/
155
156 ------------------------------------------------------------------------------
157 MEMORY ALLOCATION
158 ------------------------------------------------------------------------------
159 vec allows for stack-based and heap-based aligned array allocation. The
160 stack-based API is simple, and goes along the lines of this:
161
162 VINT16x32_ALIGNED_ARRAY(arr);
163
164 /* arr is now either an array type or a pointer type, depending on whether
165 * the compiler supports the alignas specifier from C11 or later, or has
166 * its own extension to align arrays. vec will fall back to manual pointer
167 * alignment if the compiler does not support it. */
168
169 /* this macro returns the full size of the array in bytes */
170 int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);
171
172 /* this macro returns the length of the array
173 * (basically a synonym for sizeof/sizeof[0]) */
174 int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);
175
176 /* no need to free the aligned array -- it is always on the stack */
177
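For the curious, "manual pointer alignment" generally means over-allocating
a buffer and rounding a pointer into it up to the next aligned address. The
helper below is a minimal sketch of that technique under that assumption;
`sketch_align_up' is a hypothetical name, and this is not necessarily how
vec's macros do it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `p' up to the next multiple of `align', which must be a power
 * of two. The buffer behind `p' must be at least align - 1 bytes larger
 * than the space actually needed, so that the result still fits. */
static void *sketch_align_up(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;

    assert(align != 0 && (align & (align - 1)) == 0);

    addr = (addr + (align - 1)) & ~(uintptr_t)(align - 1);
    return (void *)addr;
}
```

To get 64 usable bytes aligned to 32, one would declare a 64 + 31 byte
array and pass it through sketch_align_up(raw, 32).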
178 The heap-based API is based off the good old C malloc API:
179
180 vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));
181
182 /* q is now aligned, and ready for use with a vector aligned load
183 * function. */
184 vint32x16_load_aligned(q);
185
186 /* Say we want to reallocate the memory with a different size.
187 * No problem there! */
188 q = vec_realloc(q, 2048 * sizeof(vec_int32));
189
190 /* In a real world program, you'll want to check that vec_malloc
191 * and vec_realloc do not fail, but this error checking has been
192 * withheld from this example, as it is the same as for regular
193 * malloc and realloc. */
194
195 vec_free(q);
196
197 /* If you need it to be initialized, we have you covered: */
198 q = vec_calloc(1024, sizeof(vec_int32));
199
200 /* vec_calloc forwards to the real calloc, so there is no overhead of
201 * calling memset or something similar. */
202
203 vec_free(q);
204
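Since the error checking is said to be the same as for regular malloc and
realloc, here is what it usually looks like, written against the standard
library so that it compiles without vec. `sketch_resize' is a hypothetical
helper; swapping in vec_realloc and vec_free assumes they follow the same
conventions as their standard counterparts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Resize *buf to hold `count' int32_t values.
 * Returns 1 on success and 0 on failure. On failure *buf is left
 * untouched and is still owned by the caller, which avoids the leak
 * caused by writing `q = realloc(q, n)' directly. */
static int sketch_resize(int32_t **buf, size_t count)
{
    int32_t *tmp;

    /* reject empty requests and sizes that would overflow size_t */
    if (count == 0 || count > SIZE_MAX / sizeof(int32_t))
        return 0;

    tmp = realloc(*buf, count * sizeof(int32_t));
    if (tmp == NULL)
        return 0;

    *buf = tmp;
    return 1;
}
```

Starting from a NULL pointer, the first call behaves like malloc; later
calls keep the existing contents, as realloc does.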
205 ------------------------------------------------------------------------------
206 THE BOTTOM
207 ------------------------------------------------------------------------------
208 vec is copyright (c) Paper 2024-2025.
209 See the file LICENSE in the distribution for more information.
210
211 Bugs? Questions? Suggestions? Patches?
212 Feel free to contact me at any of the following:
213
214 Website: https://tflc.us/
215 Email: paper@tflc.us
216 IRC: slipofpaper on Libera.chat
217 Discord: @slipofpaper
218
219
220 am I a real programmer now? :^)