vec: these macros literally never worked, oops
| author | Paper <paper@tflc.us> | 
|---|---|
| date | Sat, 09 Aug 2025 16:03:34 -0400 | 
| parents | 55cadb1fac4b | 
| children | 

vec - a tiny SIMD vector header-only library written in C99

- Be prepared! Are you sure you want to know? :-)

------------------------------------------------------------------------------
THE VECTOR API
------------------------------------------------------------------------------
vec comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

	v[sign][bits]x[size]
		where `sign' is either nothing (for signed) or `u' (for unsigned),
		`bits' is the bit size of the integer format,
		and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not. For example, creating a 256-bit vector
on powerpc would simply create two consecutive 128-bit vectors.
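
As a rough illustration of the fallback described above, here is a minimal
scalar sketch in which a 256-bit vector is built from two 128-bit halves and
every wide operation just forwards to the narrow one. The `sketch_*' names and
the struct layout are assumptions made up for this example; they are not vec's
actual definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of sixteen uint8 lanes.
 * On a real target this would wrap the platform's native vector type. */
typedef struct {
	uint8_t lanes[16];
} sketch_uint8x16;

/* A 256-bit vector modelled as two consecutive 128-bit halves. */
typedef struct {
	sketch_uint8x16 half[2];
} sketch_uint8x32;

static sketch_uint8x16 sketch_uint8x16_splat(uint8_t x)
{
	sketch_uint8x16 r;
	int i;
	for (i = 0; i < 16; i++)
		r.lanes[i] = x;
	return r;
}

static sketch_uint8x16 sketch_uint8x16_add(sketch_uint8x16 a, sketch_uint8x16 b)
{
	sketch_uint8x16 r;
	int i;
	/* unsigned lanes wrap around modulo 256 */
	for (i = 0; i < 16; i++)
		r.lanes[i] = (uint8_t)(a.lanes[i] + b.lanes[i]);
	return r;
}

/* The wide operations apply the narrow operation to each half in turn. */
static sketch_uint8x32 sketch_uint8x32_splat(uint8_t x)
{
	sketch_uint8x32 r;
	r.half[0] = sketch_uint8x16_splat(x);
	r.half[1] = sketch_uint8x16_splat(x);
	return r;
}

static sketch_uint8x32 sketch_uint8x32_add(sketch_uint8x32 a, sketch_uint8x32 b)
{
	sketch_uint8x32 r;
	r.half[0] = sketch_uint8x16_add(a.half[0], b.half[0]);
	r.half[1] = sketch_uint8x16_add(a.half[1], b.half[1]);
	return r;
}
```

The point of the layout is that code written against the wide type keeps
working on hardware that only has the narrow one.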

All of these have many operations that are prefixed with the name of the
type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'
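
To make the `type name plus underscore' convention concrete, the following is
a small scalar model of three such functions for an eight-lane 16-bit type.
It only mirrors the naming pattern and the documented behaviour; the
`sketch_' prefix and the struct are hypothetical and stand in for the real
vec types, which are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical array-backed vector of eight int16 lanes. */
typedef struct {
	int16_t lanes[8];
} sketch_int16x8;

/* <type>_splat: fill every lane with the same value. */
static sketch_int16x8 sketch_int16x8_splat(int16_t x)
{
	sketch_int16x8 r;
	int i;
	for (i = 0; i < 8; i++)
		r.lanes[i] = x;
	return r;
}

/* <type>_load: copy eight values out of memory. memcpy places no
 * extra alignment requirement on `x'. */
static sketch_int16x8 sketch_int16x8_load(const int16_t x[8])
{
	sketch_int16x8 r;
	memcpy(r.lanes, x, sizeof(r.lanes));
	return r;
}

/* <type>_store: copy the eight lanes back into memory. */
static void sketch_int16x8_store(sketch_int16x8 v, int16_t x[8])
{
	memcpy(x, v.lanes, sizeof(v.lanes));
}
```

A load followed by a store should round-trip the data unchanged, which is a
handy first test when bringing up a new backend.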

The currently supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector with all of the values filled with
		the value of `x'

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory address stored at `x';
		the address is NOT required to be aligned

	v[u]intAxB load_aligned(const [u]intA_t x[B])
		like `load', but the address is required to be aligned,
		which can yield some speed improvements if done correctly.

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory address
		stored at `x'.
		like with load(), this does not require address alignment

	void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
		like `store', but the address is required to be aligned,
		which can yield some speed improvements if done correctly.

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the
		result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides vec1 by the values in vec2. dividing by zero is
		considered defined behavior and should result in a zero;
		if this doesn't happen it's considered a bug

	v[u]intAxB mod(v[u]intAxB vec1, v[u]intAxB vec2)
		gives the remainder of a division operation. as with div,
		divide-by-zero is defined behavior.

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors
		i.e., div(add(vec1, vec2), splat(2)), without
		the possibility of overflow. If you are familiar
		with AltiVec, this operation exactly mimics
		vec_avg.

	v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the minimum of the values in both vectors

	v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the maximum of the values in both vectors
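
A few of the operations above have semantics that differ from what plain C
operators give you, so here is a per-lane scalar sketch of div, rshift,
lrshift and avg. These are hypothetical helper functions written for this
example, not vec's implementation. Two assumptions are baked in: shift counts
are smaller than the lane width, and avg rounds up on odd sums, which is what
AltiVec's vec_avg does (it computes (a + b + 1) >> 1). The description of avg
as a division by two does not state a rounding direction, so treat that part
as an assumption.

```c
#include <stdint.h>

/* div: a lane whose divisor is zero yields zero instead of trapping. */
static uint8_t sketch_div_u8(uint8_t a, uint8_t b)
{
	return b ? (uint8_t)(a / b) : 0;
}

/* rshift: arithmetic right shift, the sign bit is replicated.
 * Written without relying on >> of a negative value, which is
 * implementation-defined in C. `n' is assumed to be 0..7. */
static int8_t sketch_rshift_i8(int8_t x, unsigned n)
{
	int wide = x;
	return (int8_t)(wide < 0 ? ~(~wide >> n) : wide >> n);
}

/* lrshift: logical right shift, zeros are shifted in from the top.
 * The lane is reinterpreted as unsigned, shifted, then mapped back
 * to the signed range by hand. `n' is assumed to be 0..7. */
static int8_t sketch_lrshift_i8(int8_t x, unsigned n)
{
	int bits = (uint8_t)x >> n; /* always in 0..255 */
	return (int8_t)(bits >= 128 ? bits - 256 : bits);
}

/* avg on narrow lanes: widen first so a + b + 1 cannot wrap. */
static uint8_t sketch_avg_u8(uint8_t a, uint8_t b)
{
	return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* avg on 32-bit lanes without a wider type: (a | b) - ((a ^ b) >> 1)
 * equals the rounded-up average and never overflows. */
static uint32_t sketch_avg_u32(uint32_t a, uint32_t b)
{
	return (a | b) - ((a ^ b) >> 1);
}
```

For example, shifting the 8-bit value -8 right by one gives -4 with the
arithmetic shift and 124 with the logical one, since the latter treats the
lane as the bit pattern 0xF8.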

There are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
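
Because a comparison produces all-ones or all-zeros per lane, its result can
be used directly as a mask with and/or to pick values without branching. The
sketch below models that on a four-lane unsigned type. The struct and the
`sketch_*' functions are hypothetical scalar stand-ins, and the select helper
is an illustration of how the mask can be used, not an operation the list
above provides.

```c
#include <stdint.h>

/* Hypothetical array-backed vector of four uint32 lanes. */
typedef struct {
	uint32_t lanes[4];
} sketch_uint32x4;

/* cmplt: all bits on where a < b, all bits off elsewhere. */
static sketch_uint32x4 sketch_uint32x4_cmplt(sketch_uint32x4 a, sketch_uint32x4 b)
{
	sketch_uint32x4 r;
	int i;
	for (i = 0; i < 4; i++)
		r.lanes[i] = (a.lanes[i] < b.lanes[i]) ? UINT32_MAX : 0;
	return r;
}

/* Branch-free select: take `a' where the mask is set, `b' elsewhere.
 * Equivalent to or(and(a, mask), and(b, not mask)) per lane. */
static sketch_uint32x4 sketch_uint32x4_select(sketch_uint32x4 mask,
	sketch_uint32x4 a, sketch_uint32x4 b)
{
	sketch_uint32x4 r;
	int i;
	for (i = 0; i < 4; i++)
		r.lanes[i] = (a.lanes[i] & mask.lanes[i])
		           | (b.lanes[i] & ~mask.lanes[i]);
	return r;
}
```

Selecting with the cmplt mask in this way reproduces a lane-wise minimum,
which is a convenient cross-check against min.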

This API will most definitely have more operations available as they are
requested (and as they are needed). Patches are accepted and encouraged!

------------------------------------------------------------------------------
USING VEC
------------------------------------------------------------------------------
To use vec, simply include `vec/vec.h` in your program. If you would like
your program to also be able to run on older systems, you can create
multiple translation units and pass different command line arguments
to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector
modes the CPU supports at runtime. vec provides an optional public API
specifically for this use-case within `vec/cpu.h`; bear in mind though
that it is not thread-safe, so if your program is multithreaded you'll want
to cache the results on startup.

The CPU vector detection API is extremely simple, and self-explanatory.
You call `vec_get_CPU_features()', and it returns a bit-mask of the
values within the enum placed above the function definition. From there,
you can test for each value specifically.
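
The dispatch pattern described above might look roughly like the sketch
below. Everything named `sketch_*' or `SKETCH_*' is hypothetical: the flag
names and values are invented for illustration (the real ones are in the enum
in `vec/cpu.h`), the return type is assumed to fit in a uint32_t, and
sketch_get_cpu_features() is a stub standing in for vec_get_CPU_features().

```c
#include <stdint.h>

/* Hypothetical feature flags; consult vec/cpu.h for the real enum. */
enum {
	SKETCH_CPU_HAS_SSE2    = 1 << 0,
	SKETCH_CPU_HAS_AVX2    = 1 << 1,
	SKETCH_CPU_HAS_ALTIVEC = 1 << 2
};

/* Which translation unit's implementation to call. */
typedef enum {
	SKETCH_IMPL_GENERIC,
	SKETCH_IMPL_SSE2,
	SKETCH_IMPL_AVX2
} sketch_impl;

/* Stub standing in for vec_get_CPU_features(); pretends the CPU
 * only supports SSE2. */
static uint32_t sketch_get_cpu_features(void)
{
	return SKETCH_CPU_HAS_SSE2;
}

/* Queried once at startup, before any threads are spawned, because
 * the detection call itself is documented as not thread-safe. */
static uint32_t sketch_cached_features;

static void sketch_init(void)
{
	sketch_cached_features = sketch_get_cpu_features();
}

/* Prefer the widest instruction set the mask advertises. */
static sketch_impl sketch_pick_impl(uint32_t features)
{
	if (features & SKETCH_CPU_HAS_AVX2)
		return SKETCH_IMPL_AVX2;
	if (features & SKETCH_CPU_HAS_SSE2)
		return SKETCH_IMPL_SSE2;
	return SKETCH_IMPL_GENERIC;
}
```

After sketch_init() has run, worker threads only read the cached mask, so no
locking is needed around the dispatch itself.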

vec should work perfectly fine with C++, though it is not tested as
thoroughly as C is. Your mileage may vary. You should probably be using
a library more tailored towards C++ such as Highway[1] or std::simd.

[1]: https://google.github.io/highway/en/master/

------------------------------------------------------------------------------
MEMORY ALLOCATION
------------------------------------------------------------------------------
vec allows for stack-based and heap-based aligned array allocation. The
stack-based API is simple, and goes along the lines of this:

	VINT16x32_ALIGNED_ARRAY(arr);

	/* arr is now either an array type or a pointer type, depending on whether
	 * the compiler supports the alignas specifier from C11 or later, or has
	 * its own extension to align arrays. vec will fall back to manual pointer
	 * alignment if the compiler does not support it. */

	/* this macro returns the full size of the array in bytes */
	int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);

	/* this macro returns the length of the array
	 * (basically a synonym for sizeof/sizeof[0]) */
	int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);

	/* no need to free the aligned array -- it is always on the stack */
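
The "manual pointer alignment" fallback mentioned in the comment above can be
sketched in plain C99 as follows: over-allocate a stack buffer, then round the
pointer up to the next aligned address. This is only an illustration of the
general technique, not the contents of vec's macros. The names and the
32-byte alignment are assumptions, the alignment must be a power of two, and
the code relies on uintptr_t round-tripping pointers in the usual way.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGNMENT 32 /* assumed alignment in bytes, power of two */

/* Round `p' up to the next multiple of `alignment'. */
static void *sketch_align_up(void *p, size_t alignment)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t mask = (uintptr_t)alignment - 1;
	return (void *)((addr + mask) & ~mask);
}

static int sketch_is_aligned(const void *p, size_t alignment)
{
	return ((uintptr_t)p & ((uintptr_t)alignment - 1)) == 0;
}

/* Example use: carve an aligned 32-element int16_t array out of an
 * over-sized stack buffer, fill it with 0..31 and return the sum. */
static long sketch_demo(void)
{
	/* extra elements leave room to slide forward to an aligned address */
	int16_t raw[32 + SKETCH_ALIGNMENT / sizeof(int16_t)];
	int16_t *arr = sketch_align_up(raw, SKETCH_ALIGNMENT);
	long sum = 0;
	int i;

	for (i = 0; i < 32; i++)
		arr[i] = (int16_t)i;
	for (i = 0; i < 32; i++)
		sum += arr[i];
	return sum;
}
```

This also shows why the array may end up being a pointer rather than an array
type: with the fallback, the usable storage starts somewhere inside a larger
buffer.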
| 
 
fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
 
Paper <paper@tflc.us> 
parents: 
36 
diff
changeset
 | 
183 | 
| 
 
fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
 
Paper <paper@tflc.us> 
parents: 
36 
diff
changeset
 | 
184 The heap-based API is based off the good old C malloc API: | 
| 
 
fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
 
Paper <paper@tflc.us> 
parents: 
36 
diff
changeset
 | 
185 | 
| 
39
 
f9ca85d2f14c
*: rearrange some things; add avx512bw support
 
Paper <paper@tflc.us> 
parents: 
38 
diff
changeset
 | 
186 /* heap allocation stuff is only defined here: */ | 
| 
 
f9ca85d2f14c
*: rearrange some things; add avx512bw support
 
Paper <paper@tflc.us> 
parents: 
38 
diff
changeset
 | 
187 #include "vec/mem.h" | 
| 
 
f9ca85d2f14c
*: rearrange some things; add avx512bw support
 
Paper <paper@tflc.us> 
parents: 
38 
diff
changeset
 | 
188 | 
| 
38
 
fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
 
Paper <paper@tflc.us> 
parents: 
36 
diff
changeset
 | 
189 vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32)); | 
| 
 
fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
 
Paper <paper@tflc.us> 
parents: 
36 
diff
changeset
 | 
190 | 
| 
 
fd42f9b1b95e
docs: update copyright for 2025, update the README with more info
 
Paper <paper@tflc.us> 
parents: 
36 
diff
changeset
 | 
	/* q is now aligned, and ready for use with a vector aligned load
	 * function. */
	vint32x16_load_aligned(q);

	/* Say we want to reallocate the memory with a different size.
	 * No problem there! */
	q = vec_realloc(q, 2048 * sizeof(vec_int32));

	/* In a real-world program, you'll want to check that vec_malloc
	 * and vec_realloc do not fail, but this error checking has been
	 * omitted from this example, as it is the same as for regular
	 * malloc and realloc. */

	vec_free(q);

	/* If you need it to be initialized, we have you covered: */
	q = vec_calloc(1024, sizeof(vec_int32));

	/* vec_calloc forwards to the real calloc, so there is no overhead of
	 * calling memset or something similar. */

	vec_free(q);

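The example above leaves out error checking. As a rough sketch of what that
checking might look like, here is the usual pattern for plain realloc, which
the comment above says carries over to vec_realloc. The helper name
resize_buffer is made up for this illustration and is not part of vec; it
uses only the standard C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, NOT part of vec: resizes *pp to hold `count`
 * elements of `size` bytes each. Returns 0 on success and -1 on
 * failure; on failure *pp is left untouched and is still valid. */
static int resize_buffer(void **pp, size_t count, size_t size)
{
	void *tmp;

	assert(pp != NULL);

	/* Reject zero-sized requests; realloc(p, 0) is not portable. */
	if (count == 0 || size == 0)
		return -1;

	/* Reject requests whose byte count would overflow size_t. */
	if (count > SIZE_MAX / size)
		return -1;

	/* Writing "p = realloc(p, n)" directly loses the only pointer to
	 * the old block if realloc fails, so go through a temporary. */
	tmp = realloc(*pp, count * size);
	if (tmp == NULL)
		return -1;

	*pp = tmp;
	return 0;
}
```

Assuming vec_realloc follows the same convention as realloc (it returns NULL
on failure and leaves the old block alone), the same shape should work with
vec_realloc swapped in; check vec's headers before relying on that.
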
------------------------------------------------------------------------------
THE BOTTOM
------------------------------------------------------------------------------
vec is copyright (c) Paper 2024-2025.
See the file LICENSE in the distribution for more information.

Bugs? Questions? Suggestions? Patches?
Feel free to contact me at any of the following:

Website: https://tflc.us/
Email: paper@tflc.us
IRC: slipofpaper on Libera.chat
Discord: @slipofpaper


am I a real programmer now? :^)
