comparison README @ 39:f9ca85d2f14c

*: rearrange some things; add avx512bw support
author Paper <paper@tflc.us>
date Sat, 26 Apr 2025 15:31:39 -0400
parents fd42f9b1b95e
children 55cadb1fac4b
comparison
38:fd42f9b1b95e 39:f9ca85d2f14c
136 To use vec, simply include `vec/vec.h` in your program. If you would like 136 To use vec, simply include `vec/vec.h` in your program. If you would like
137 your program to also be able to run on older systems, you can create 137 your program to also be able to run on older systems, you can create
138 multiple translation units and pass different command line arguments 138 multiple translation units and pass different command line arguments
139 to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector 139 to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector
140 modes the CPU supports at runtime. vec provides an optional public API 140 modes the CPU supports at runtime. vec provides an optional public API
141 specifically for this use-case within `vec/impl/cpu.h`; bear in mind 141 specifically for this use-case within `vec/cpu.h`; bear in mind though
142 though that it is not thread-safe, so if your program is multithreaded 142 that it is not thread-safe, so if your program is multithreaded you'll want
143 you'll want to cache the results on startup. 143 to cache the results on startup.
144 144
145 The CPU vector detection API is extremely simple, and self-explanatory. 145 The CPU vector detection API is extremely simple, and self-explanatory.
146 You call `vec_get_CPU_features()`, and it returns a bit-mask of the 146 You call `vec_get_CPU_features()`, and it returns a bit-mask of the
147 values within the enum placed above the function definition. From there, 147 values within the enum placed above the function definition. From there,
148 you can test for each value specifically. 148 you can test for each value specifically.
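As a rough illustration of the "cache the results on startup" advice, the sketch below queries the feature mask once and picks an implementation through a function pointer. It does not use vec itself: `example_get_cpu_features()` is a stand-in for `vec_get_CPU_features()`, the `EXAMPLE_CPU_HAS_*` names and the `uint32_t` mask type are assumptions (the real values are in the enum in `vec/cpu.h`), and the three `add_*` routines are placeholders for code that would normally live in separate translation units built with different compiler flags.

```c
#include <stdint.h>

/* Hypothetical feature bits; the real names are in the enum in vec/cpu.h. */
enum {
    EXAMPLE_CPU_HAS_SSE2 = 1 << 0,
    EXAMPLE_CPU_HAS_AVX2 = 1 << 1
};

/* Stand-in for vec_get_CPU_features(). It returns a fixed mask so that
 * this sketch compiles without vec; the return type is an assumption. */
static uint32_t example_get_cpu_features(void)
{
    return EXAMPLE_CPU_HAS_SSE2;
}

/* Shared scalar loop. In a real build each variant below would sit in its
 * own translation unit, compiled with e.g. SSE2 or AVX2 flags. */
static void add_loop(int32_t *dst, const int32_t *a, const int32_t *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

static void add_generic(int32_t *dst, const int32_t *a, const int32_t *b, int n)
{
    add_loop(dst, a, b, n);
}

static void add_sse2(int32_t *dst, const int32_t *a, const int32_t *b, int n)
{
    add_loop(dst, a, b, n);
}

static void add_avx2(int32_t *dst, const int32_t *a, const int32_t *b, int n)
{
    add_loop(dst, a, b, n);
}

typedef void (*add_fn)(int32_t *, const int32_t *, const int32_t *, int);

/* Filled in once by example_init_dispatch(); only read afterwards. */
static uint32_t cached_features;
static add_fn cached_add = add_generic;

/* Call once at startup, before any other thread is created, since the
 * detection call is not thread-safe. */
static void example_init_dispatch(void)
{
    cached_features = example_get_cpu_features();

    if (cached_features & EXAMPLE_CPU_HAS_AVX2)
        cached_add = add_avx2;
    else if (cached_features & EXAMPLE_CPU_HAS_SSE2)
        cached_add = add_sse2;
    else
        cached_add = add_generic;
}
```

After `example_init_dispatch()` has run, worker threads can call `cached_add(...)` freely, because nothing writes to the cached values again.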
175 175
176 /* no need to free the aligned array -- it is always on the stack */ 176 /* no need to free the aligned array -- it is always on the stack */
177 177
178 The heap-based API is based off the good old C malloc API: 178 The heap-based API is based off the good old C malloc API:
179 179
180 /* heap allocation stuff is only defined here: */
181 #include "vec/mem.h"
182
180 vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32)); 183 vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));
181 184
182 /* q is now aligned, and ready for use with a vector aligned load 185 /* q is now aligned, and ready for use with a vector aligned load
183 * function. */ 186 * function. */
184 vint32x16_load_aligned(q); 187 vint32x16_load_aligned(q);