annotate README @ 39:f9ca85d2f14c
*: rearrange some things; add avx512bw support
author | Paper <paper@tflc.us> |
---|---|
date | Sat, 26 Apr 2025 15:31:39 -0400 |
parents | fd42f9b1b95e |
children | 55cadb1fac4b |
vec - a tiny SIMD vector header-only library written in C99

- Be prepared! Are you sure you want to know? :-)

------------------------------------------------------------------------------
THE VECTOR API
------------------------------------------------------------------------------
vec comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

    v[sign][bits]x[size]
        where `sign' is either nothing (for signed) or `u' (for unsigned),
        `bits' is the bit size of the integer format,
        and `size' is how many integers are in the vector
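To make the naming format concrete, here is a small sketch that splits a
type name into its three fields. The struct and the function are hypothetical
helpers written for this illustration; they are not part of vec.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type, not part of vec: the fields of a type name. */
struct vec_type_info {
    int is_unsigned; /* 1 if the name carries the `u' sign prefix */
    int bits;        /* width of each integer, in bits */
    int size;        /* number of integers in the vector */
};

/* Parses names such as "vint8x16" or "vuint32x4".
 * Returns 0 on success and -1 if the name does not fit the format;
 * on failure the contents of *out are unspecified. */
int parse_vec_type(const char *name, struct vec_type_info *out)
{
    char tail; /* only assigned if junk follows the size */
    int n;

    n = sscanf(name, "vuint%dx%d%c", &out->bits, &out->size, &tail);
    out->is_unsigned = 1;
    if (n != 2) {
        n = sscanf(name, "vint%dx%d%c", &out->bits, &out->size, &tail);
        out->is_unsigned = 0;
    }
    if (n != 2 || out->bits <= 0 || out->size <= 0)
        return -1;
    return 0;
}

void example_usage(void)
{
    struct vec_type_info t;

    assert(parse_vec_type("vuint16x8", &t) == 0);
    assert(t.is_unsigned == 1 && t.bits == 16 && t.size == 8);
    assert(t.bits * t.size == 128); /* eight 16-bit integers: a 128-bit vector */
}
```

The total vector width is simply `bits' times `size', so vint8x16 and
vuint16x8 are both 128-bit types.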

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not. For example, creating a 256-bit vector
on powerpc would simply create two consecutive 128-bit vectors.
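The "two consecutive 128-bit vectors" idea can be sketched in plain C. This
is only a model under assumed names (everything prefixed with model_ is made
up for this example), with a plain array standing in for the native 128-bit
register; vec's real definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a native 128-bit vector of four unsigned 32-bit integers.
 * A real backend would wrap a hardware vector type here. */
typedef struct { uint32_t lane[4]; } model_u32x4;

/* Model of a 256-bit vector on a target that only has 128-bit registers:
 * two consecutive 128-bit halves. */
typedef struct { model_u32x4 half[2]; } model_u32x8;

model_u32x4 model_u32x4_add(model_u32x4 a, model_u32x4 b)
{
    model_u32x4 r;
    int i;

    for (i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i]; /* unsigned, so it wraps on overflow */
    return r;
}

/* The wide operation forwards to the narrow one, once per half. */
model_u32x8 model_u32x8_add(model_u32x8 a, model_u32x8 b)
{
    model_u32x8 r;

    r.half[0] = model_u32x4_add(a.half[0], b.half[0]);
    r.half[1] = model_u32x4_add(a.half[1], b.half[1]);
    return r;
}

model_u32x8 model_u32x8_splat(uint32_t x)
{
    model_u32x8 r;
    int i;

    for (i = 0; i < 8; i++)
        r.half[i / 4].lane[i % 4] = x;
    return r;
}

void example_usage(void)
{
    model_u32x8 s = model_u32x8_add(model_u32x8_splat(3), model_u32x8_splat(4));
    int i;

    for (i = 0; i < 8; i++)
        assert(s.half[i / 4].lane[i % 4] == 7);
}
```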

All of these have many operations that are prefixed with the name of the
type and an underscore, for example:

    vint8x16 vint8x16_splat(int8_t x)
    - creates a vint8x16 where all of the values are filled
      with the value of `x'
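One way such type-prefixed names could be stamped out for every sign, bit
width, and size is token pasting. The macro below is purely hypothetical and
uses model_ names to make that clear; nothing here is taken from vec's
source, which may generate its functions in a completely different way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator macro. `sign' is either empty or u, matching the
 * v[sign][bits]x[size] format; the result is an array-based type plus a
 * splat function whose name is prefixed with the type name. */
#define MODEL_DEFINE_VECTOR(sign, bits, size) \
    typedef struct { \
        sign##int##bits##_t lane[size]; \
    } model_v##sign##int##bits##x##size; \
    \
    model_v##sign##int##bits##x##size \
    model_v##sign##int##bits##x##size##_splat(sign##int##bits##_t x) \
    { \
        model_v##sign##int##bits##x##size v; \
        int i; \
        for (i = 0; i < (size); i++) \
            v.lane[i] = x; \
        return v; \
    }

MODEL_DEFINE_VECTOR(, 8, 16)  /* model_vint8x16 and model_vint8x16_splat */
MODEL_DEFINE_VECTOR(u, 32, 4) /* model_vuint32x4 and model_vuint32x4_splat */

void example_usage(void)
{
    model_vint8x16 a = model_vint8x16_splat(-5);
    model_vuint32x4 b = model_vuint32x4_splat(7u);

    assert(a.lane[0] == -5 && a.lane[15] == -5);
    assert(b.lane[0] == 7u && b.lane[3] == 7u);
}
```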

The currently supported operations are:

    v[u]intAxB splat([u]intA_t x)
        creates a vector with all of its values set to `x'

    v[u]intAxB load(const [u]intA_t x[B])
        copies the values from the memory address stored at `x';
        the address is NOT required to be aligned

    v[u]intAxB load_aligned(const [u]intA_t x[B])
        like `load', but the address is required to be aligned,
        which can be faster.

    void store(v[u]intAxB vec, [u]intA_t x[B])
        copies the values from the vector into the memory address
        stored at `x'.
        like with load(), this does not require address alignment

    void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
        like `store', but the address is required to be aligned,
        which can be faster.

    v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
        adds the values of `vec1' and `vec2' and returns the result

    v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
        subtracts the values of `vec2' from `vec1' and returns the result

    v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
        multiplies the values of `vec1' and `vec2' together and
        returns the result

    v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
        divides the values in vec1 by the corresponding values in
        vec2. dividing by zero is considered defined behavior and
        should result in a zero; if this doesn't happen it's
        considered a bug

    v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise AND (&) of the values in both vectors

    v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise OR (|) of the values in both vectors

    v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise XOR (^) of the values in both vectors

    v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic right shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic left shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
        logical right shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the average of the values in both vectors,
        i.e., div(add(vec1, vec2), splat(2)), without
        the possibility of overflow.

    v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the minimum of the values in both vectors

    v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the maximum of the values in both vectors
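A few of the operations above have semantics that are easy to get wrong, so
here is a per-value scalar model of div, avg, rshift, and lrshift on 8-bit
values. The lane_ function names are invented for this sketch. Two things
are assumptions rather than facts from this README: shift counts are taken
to be smaller than the bit width, and avg rounds down on odd sums (which is
what the div(add(...), splat(2)) description gives for unsigned values).

```c
#include <assert.h>
#include <stdint.h>

/* div: a zero divisor yields zero rather than undefined behavior. */
uint8_t lane_div_u8(uint8_t a, uint8_t b)
{
    return b ? (uint8_t)(a / b) : 0;
}

/* avg: floor((a + b) / 2) without needing a wider type. Bits set in both
 * inputs count in full; bits set in only one input count for half. */
uint8_t lane_avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* rshift on a signed value: arithmetic shift, copies of the sign bit are
 * shifted in. Written so that no negative value is ever passed to >>,
 * since that is implementation-defined in C. Assumes n < 8. */
int8_t lane_rshift_s8(int8_t x, unsigned n)
{
    return (int8_t)(x < 0 ? ~(~x >> n) : x >> n);
}

/* lrshift: logical shift, zeros are shifted in regardless of the sign. */
int8_t lane_lrshift_s8(int8_t x, unsigned n)
{
    return (int8_t)((uint8_t)x >> n);
}

void example_usage(void)
{
    assert(lane_div_u8(200, 3) == 66);
    assert(lane_div_u8(200, 0) == 0);

    /* 250 + 255 = 505 does not fit in 8 bits, but the average does */
    assert(lane_avg_u8(250, 255) == 252);

    assert(lane_rshift_s8(-128, 1) == -64); /* sign is preserved */
    assert(lane_lrshift_s8(-128, 1) == 64); /* sign bit becomes a data bit */
}
```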

There are also a number of comparisons possible:

    v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is equal
        to the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.

    v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.
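The all-bits-on / all-bits-off result is what makes comparisons useful
without branches: the mask can be fed straight into the bitwise operations
to pick values from one vector or the other. The sketch below models this on
plain arrays of four unsigned 8-bit values; the model_ functions and the
LANES constant are made up for the example and are not vec functions.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* arbitrary small size for the example */

/* Scalar model of cmplt: 0xFF where a < b, 0x00 elsewhere. */
void model_cmplt_u8(const uint8_t *a, const uint8_t *b, uint8_t *mask)
{
    int i;

    for (i = 0; i < LANES; i++)
        mask[i] = (a[i] < b[i]) ? 0xFF : 0x00;
}

/* Branch-free select built from AND, OR, and NOT: takes the value from a
 * where the mask is set and the value from b where it is clear. */
void model_select_u8(const uint8_t *mask, const uint8_t *a,
                     const uint8_t *b, uint8_t *out)
{
    int i;

    for (i = 0; i < LANES; i++)
        out[i] = (uint8_t)((a[i] & mask[i]) | (b[i] & (uint8_t)~mask[i]));
}

void example_usage(void)
{
    const uint8_t a[LANES] = { 1, 9, 5, 0 };
    const uint8_t b[LANES] = { 4, 2, 5, 7 };
    uint8_t mask[LANES], lo[LANES];

    model_cmplt_u8(a, b, mask);
    assert(mask[0] == 0xFF && mask[1] == 0x00);
    assert(mask[2] == 0x00 && mask[3] == 0xFF);

    /* selecting with the cmplt mask yields the per-value minimum */
    model_select_u8(mask, a, b, lo);
    assert(lo[0] == 1 && lo[1] == 2 && lo[2] == 5 && lo[3] == 0);
}
```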

This API will most definitely have more operations available as they are
requested (and as they are needed). Patches are accepted and encouraged!

------------------------------------------------------------------------------
USING VEC
------------------------------------------------------------------------------
To use vec, simply include `vec/vec.h` in your program. If you would like
your program to also be able to run on older systems, you can create
multiple translation units and pass different command line arguments
to the compiler to enable SSE2/AVX2/Altivec etc., and detect the vector
modes the CPU supports at runtime. vec provides an optional public API
specifically for this use-case within `vec/cpu.h`; bear in mind though
that it is not thread-safe, so if your program is multithreaded you'll want
to cache the results on startup.

The CPU vector detection API is extremely simple and self-explanatory.
You call `vec_get_CPU_features()', and it returns a bit-mask of the
values within the enum placed above the function definition. From there,
you can test for each value specifically.
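The "detect once, cache, then dispatch" pattern might look roughly like the
sketch below. It does not include vec/cpu.h: model_get_cpu_features() stands
in for vec_get_CPU_features(), and the MODEL_CPU_HAS_* flags are invented
placeholders for the real enum values, whose names and return type are not
given in this README. In a real program each kernel would live in its own
translation unit compiled with the matching compiler flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag names; the real ones are in the enum in vec/cpu.h. */
enum {
    MODEL_CPU_HAS_SSE2 = 1 << 0,
    MODEL_CPU_HAS_AVX2 = 1 << 1
};

/* Stand-in for vec_get_CPU_features(). It pretends the CPU only has SSE2
 * so that the example behaves the same on every machine. */
static uint32_t model_get_cpu_features(void)
{
    return MODEL_CPU_HAS_SSE2;
}

typedef const char *(*kernel_fn)(void);

static const char *kernel_generic(void) { return "generic"; }
static const char *kernel_sse2(void)    { return "sse2"; }
static const char *kernel_avx2(void)    { return "avx2"; }

/* Written once by init_dispatch() and only read afterwards. */
static kernel_fn chosen_kernel = kernel_generic;

/* Call once at startup, before any other threads exist, because the
 * detection call itself is not thread-safe. */
void init_dispatch(void)
{
    uint32_t features = model_get_cpu_features();

    if (features & MODEL_CPU_HAS_AVX2)
        chosen_kernel = kernel_avx2;
    else if (features & MODEL_CPU_HAS_SSE2)
        chosen_kernel = kernel_sse2;
    else
        chosen_kernel = kernel_generic;
}

const char *run_kernel(void)
{
    return chosen_kernel();
}

void example_usage(void)
{
    init_dispatch();
    assert(strcmp(run_kernel(), "sse2") == 0);
}
```

Caching a function pointer rather than the raw mask means worker threads
never touch the detection code at all; they only read a value that was
fixed before they started.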

vec should work perfectly fine with C++, though it is not tested as
thoroughly as C is. Your mileage may vary. You should probably be using
a library more tailored towards C++ such as Highway[1] or std::simd.

[1]: https://google.github.io/highway/en/master/

------------------------------------------------------------------------------
MEMORY ALLOCATION
------------------------------------------------------------------------------
vec allows for stack-based and heap-based aligned array allocation. The
stack-based API is simple, and goes along the lines of this:

    VINT16x32_ALIGNED_ARRAY(arr);

    /* arr is now either an array type or a pointer type, depending on whether
     * the compiler supports the alignas specifier from C11 or later, or has
     * its own extension to align arrays. vec will fall back to manual pointer
     * alignment if the compiler supports neither. */

    /* this macro returns the full size of the array in bytes */
    int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);

    /* this macro returns the length of the array
     * (basically a synonym for sizeof/sizeof[0]) */
    int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);

    /* no need to free the aligned array -- it is always on the stack */
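The "manual pointer alignment" fallback mentioned in the comment above is a
well-known trick: reserve a little more stack space than needed and round
the start address up. The sketch below shows the general technique only; it
is not vec's macro, and the 64-byte alignment and all names in it are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ALIGNMENT 64                    /* assumed: one 512-bit vector */
#define MODEL_BYTES     (32 * sizeof(int16_t)) /* room for 32 int16_t values */

/* Rounds p up to the next multiple of `alignment', which must be a power
 * of two. Returns p itself if it is already aligned. */
static void *align_up(void *p, size_t alignment)
{
    uintptr_t addr = (uintptr_t)p;

    return (void *)((addr + (alignment - 1)) & ~(uintptr_t)(alignment - 1));
}

static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Returns 1 if the aligned region is correctly aligned and lies entirely
 * inside the raw buffer, 0 otherwise. */
int aligned_region_demo(void)
{
    /* Over-allocate by alignment - 1 bytes so an aligned start always fits. */
    unsigned char raw[MODEL_BYTES + MODEL_ALIGNMENT - 1];
    unsigned char *arr = align_up(raw, MODEL_ALIGNMENT);

    memset(arr, 0, MODEL_BYTES); /* the whole region is safe to write */

    return is_aligned(arr, MODEL_ALIGNMENT)
        && arr >= raw
        && arr + MODEL_BYTES <= raw + sizeof raw;
}
```

This also explains why `arr' may end up being a pointer rather than an
array: with the fallback, the name refers to the rounded-up address inside a
larger hidden buffer.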

The heap-based API is based off the good old C malloc API:

    /* heap allocation stuff is only defined here: */
    #include "vec/mem.h"

    vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));

    /* q is now aligned, and ready for use with a vector aligned load
     * function. */
    vint32x16_load_aligned(q);

    /* Say we want to reallocate the memory with a different size.
     * No problem there! */
    q = vec_realloc(q, 2048 * sizeof(vec_int32));

    /* In a real world program, you'll want to check that vec_malloc
     * and vec_realloc do not fail, but this error checking has been
     * withheld from this example, as it is the same as for regular
     * malloc and realloc. */

    vec_free(q);

    /* If you need it to be initialized, we have you covered: */
    q = vec_calloc(1024, sizeof(vec_int32));

    /* vec_calloc forwards to the real calloc, so there is no overhead of
     * calling memset or something similar. */

    vec_free(q);
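Since the error checking is the same as for regular malloc and realloc, here
is that usual pattern written against the standard library, so it compiles
without vec. The grow_buffer() helper is hypothetical; with vec you would
presumably swap in vec_malloc, vec_realloc, and vec_free. The important
detail is that `q = realloc(q, ...)' loses the old pointer if the call
fails, so the result goes into a temporary first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: grows *buf to hold new_count elements.
 * Returns 0 on success. On failure returns -1 and leaves *buf untouched,
 * still valid, and still owned by the caller. */
int grow_buffer(int32_t **buf, size_t new_count)
{
    int32_t *tmp;

    if (new_count == 0 || new_count > SIZE_MAX / sizeof(int32_t))
        return -1; /* zero-sized request, or the byte count would overflow */

    tmp = realloc(*buf, new_count * sizeof(int32_t));
    if (tmp == NULL)
        return -1;

    *buf = tmp;
    return 0;
}

int example_usage(void)
{
    int32_t *q = malloc(1024 * sizeof(int32_t));

    if (q == NULL)
        return -1;
    q[0] = 42;

    if (grow_buffer(&q, 2048) != 0) {
        free(q); /* the old block is still ours to release */
        return -1;
    }
    assert(q[0] == 42); /* contents survive the reallocation */

    free(q);
    return 0;
}
```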

------------------------------------------------------------------------------
THE BOTTOM
------------------------------------------------------------------------------
vec is copyright (c) Paper 2024-2025.
See the file LICENSE in the distribution for more information.

Bugs? Questions? Suggestions? Patches?
Feel free to contact me at any of the following:

Website: https://tflc.us/
Email: paper@tflc.us
IRC: slipofpaper on Libera.chat
Discord: @slipofpaper


am I a real programmer now? :^)