annotate README @ 41:c6e0df09b86f default tip

*: performance improvements with old GCC, reimplement altivec
author Paper <paper@tflc.us>
date Mon, 28 Apr 2025 16:31:59 -0400
vec - a tiny SIMD vector header-only library written in C99

- Be prepared! Are you sure you want to know? :-)

------------------------------------------------------------------------------
THE VECTOR API
------------------------------------------------------------------------------
vec comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

    v[sign]int[bits]x[size]

where `sign' is either nothing (for signed) or `u' (for unsigned),
`bits' is the bit size of the integer format,
and `size' is how many integers are in the vector.
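
To make the naming scheme concrete, here is a small illustrative sketch in
C. The helpers `example_type_name' and `example_type_width' are
hypothetical and not part of vec. They only compose and decode names that
follow the pattern above: `v', an optional `u', `int', the bit size, `x',
and the number of integers. For instance, the sketch confirms that
vint8x16 is 128 bits wide and vuint16x32 is 512 bits wide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL helper (not part of vec): build a type name from its parts
 * following the v[sign]int[bits]x[size] pattern. */
static void example_type_name(char *buf, size_t len, int is_unsigned,
                              int bits, int lanes)
{
    assert(bits > 0 && lanes > 0);
    snprintf(buf, len, "v%sint%dx%d", is_unsigned ? "u" : "", bits, lanes);
}

/* HYPOTHETICAL helper: decode a type name and return its total width in
 * bits (bits * size), or -1 if the name does not fit the pattern. */
static int example_type_width(const char *name)
{
    int bits, lanes;

    if (strncmp(name, "vu", 2) == 0)
        name += 2;              /* unsigned: skip "vu" */
    else if (name[0] == 'v')
        name += 1;              /* signed: skip "v" */
    else
        return -1;

    if (sscanf(name, "int%dx%d", &bits, &lanes) != 2)
        return -1;
    return bits * lanes;
}
```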

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD vectors.
It uses native intrinsics on processors where vec has an implementation and
falls back to array-based implementations where it does not. For example,
creating a 256-bit vector on PowerPC would simply create two consecutive
128-bit vectors.
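
The fallback can be pictured with a plain-C sketch. The `example_' types
below are hypothetical scalar stand-ins, not vec's real types. A 128-bit
vector is modelled as sixteen int8_t lanes, and a 256-bit vector is two
such halves. Every 256-bit operation is then just the 128-bit operation
applied to each half.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in for a native 128-bit vector: sixteen int8 lanes.
 * On real SIMD hardware this would be a register-sized vector type. */
typedef struct { int8_t lane[16]; } example_i8x16;

/* A 256-bit vector made of two consecutive 128-bit halves. */
typedef struct { example_i8x16 half[2]; } example_i8x32;

/* 128-bit add, lane by lane. Converting an out-of-range sum back to
 * int8_t is implementation-defined; GCC and Clang wrap modulo 256. */
static example_i8x16 example_i8x16_add(example_i8x16 a, example_i8x16 b)
{
    example_i8x16 r;
    for (int i = 0; i < 16; i++)
        r.lane[i] = (int8_t)(a.lane[i] + b.lane[i]);
    return r;
}

/* 256-bit add: the same 128-bit operation applied to each half. */
static example_i8x32 example_i8x32_add(example_i8x32 a, example_i8x32 b)
{
    example_i8x32 r;
    r.half[0] = example_i8x16_add(a.half[0], b.half[0]);
    r.half[1] = example_i8x16_add(a.half[1], b.half[1]);
    return r;
}

static example_i8x32 example_i8x32_splat(int8_t x)
{
    example_i8x32 r;
    for (int h = 0; h < 2; h++)
        for (int i = 0; i < 16; i++)
            r.half[h].lane[i] = x;
    return r;
}

/* Lane i of the 256-bit vector lives in half i / 16, lane i % 16. */
static int8_t example_i8x32_get(example_i8x32 v, int i)
{
    assert(i >= 0 && i < 32);
    return v.half[i / 16].lane[i % 16];
}
```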

All of these types have many operations, each prefixed with the name of
the type and an underscore, for example:

    vint8x16 vint8x16_splat(int8_t x)
        - creates a vint8x16 where all of the values are filled
          with the value of `x'

The currently supported operations are listed below. `A' stands for the
bit size and `B' for the number of integers in the vector:

    v[u]intAxB splat([u]intA_t x)
        creates a vector with all of its values filled with
        the value of `x'

    v[u]intAxB load(const [u]intA_t x[B])
        copies B values from the memory pointed to by `x';
        the address is NOT required to be aligned

    v[u]intAxB load_aligned(const [u]intA_t x[B])
        like `load', but the address is required to be aligned,
        which can make the load faster on some architectures.

    void store(v[u]intAxB vec, [u]intA_t x[B])
        copies the values from the vector into the memory
        pointed to by `x'. Like `load', this does not require
        the address to be aligned.

    void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
        like `store', but the address is required to be aligned,
        which can make the store faster on some architectures.

    v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
        adds the values of `vec1' and `vec2' and returns the result

    v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
        subtracts the values of `vec2' from `vec1' and returns
        the result

    v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
        multiplies the values of `vec1' and `vec2' together and
        returns the result

    v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
        divides the values in `vec1' by the corresponding values
        in `vec2'. Dividing by zero is defined behavior and
        should result in zero; if it does not, that is a bug.

    v[u]intAxB mod(v[u]intAxB vec1, v[u]intAxB vec2)
        gives the remainder of a division operation. As with
        `div', dividing by zero is defined behavior.

    v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise AND (&) of the values in both vectors

    v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise OR (|) of the values in both vectors

    v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise XOR (^) of the values in both vectors

    v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic (sign-extending) right shift of the values
        in `vec1' by the corresponding values in `vec2'

    v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic left shift of the values in `vec1' by
        the corresponding values in `vec2'

    v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
        logical (zero-filling) right shift of the values
        in `vec1' by the corresponding values in `vec2'

    v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the average of the values in both vectors,
        roughly div(add(vec1, vec2), splat(2)) but without the
        possibility of overflow. If you are familiar with
        AltiVec, this operation exactly mimics vec_avg, which
        rounds halves up: each result is (vec1 + vec2 + 1) >> 1
        evaluated at a wider precision.

    v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the element-wise minimum of both vectors

    v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the element-wise maximum of both vectors
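
As a reference for the per-lane semantics described above, here is a
scalar model for single 8-bit lanes. It is a sketch under stated
assumptions, not vec's implementation:

- Division by zero yields zero, as documented for `div'.
- Mod by zero also yields zero. This is an assumption, since the README
  only says it is defined.
- `avg' follows AltiVec's vec_avg rounding.
- Shift counts are assumed to be smaller than the lane width.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference model for single 8-bit lanes (a sketch, not vec's code). */

/* floor(x / 2) for any int, without relying on >> of negative values */
static int floor_half(int x)
{
    return x >= 0 ? x / 2 : -((-x + 1) / 2);
}

/* div: division by zero yields zero. The quotient is computed in int, so
 * INT8_MIN / -1 cannot overflow; converting 128 back to int8_t is
 * implementation-defined (GCC and Clang give -128). */
static int8_t ref_div_i8(int8_t a, int8_t b)
{
    return b == 0 ? 0 : (int8_t)(a / b);
}

/* mod: ASSUMPTION: mod by zero also yields zero. */
static int8_t ref_mod_i8(int8_t a, int8_t b)
{
    return b == 0 ? 0 : (int8_t)(a % b);
}

/* avg, following AltiVec vec_avg: (a + b + 1) >> 1 evaluated in a wider
 * type, so it cannot overflow and halves round up. */
static uint8_t ref_avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b + 1) >> 1);
}

static int8_t ref_avg_i8(int8_t a, int8_t b)
{
    return (int8_t)floor_half((int)a + b + 1);
}

/* Shifts: ASSUMPTION: the count is below the lane width (0..7). */
static int8_t ref_rshift_i8(int8_t a, uint8_t n)   /* arithmetic */
{
    assert(n < 8);
    return (int8_t)(a >= 0 ? a >> n : -((-a - 1) >> n) - 1);
}

static int8_t ref_lrshift_i8(int8_t a, uint8_t n)  /* logical */
{
    assert(n < 8);
    return (int8_t)((uint8_t)a >> n);
}

static int8_t ref_lshift_i8(int8_t a, uint8_t n)
{
    assert(n < 8);
    /* shift the bit pattern as unsigned to avoid undefined behavior */
    return (int8_t)(uint8_t)((unsigned)(uint8_t)a << n);
}
```

The difference between `rshift' and `lrshift' shows up only for negative
values. Shifting -128 right by 7 gives -1 arithmetically but 1 logically.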

There are also a number of comparisons possible:

    v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is equal
        to the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.

    v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.
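
Every comparison produces an all-ones or all-zeros mask, so the results
combine naturally with `and', `or' and `xor' to select values without
branches. The scalar sketch below models sixteen int8_t lanes with a
hypothetical `lanes8' struct. With vec's own operations, the select step
would presumably be written as
or(and(mask, a), and(xor(mask, splat(-1)), b)), where splat(-1) has all
bits set. That formulation has not been checked against the library.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_LANES 16

/* HYPOTHETICAL scalar stand-in for a vint8x16. */
typedef struct { int8_t v[EXAMPLE_LANES]; } lanes8;

static int8_t lanes8_get(lanes8 x, int i)
{
    assert(i >= 0 && i < EXAMPLE_LANES);
    return x.v[i];
}

/* cmplt: every bit on (-1) where a < b, every bit off (0) elsewhere. */
static lanes8 lanes8_cmplt(lanes8 a, lanes8 b)
{
    lanes8 m;
    for (int i = 0; i < EXAMPLE_LANES; i++)
        m.v[i] = (int8_t)(a.v[i] < b.v[i] ? -1 : 0);
    return m;
}

/* Branch-free select: `a' where the mask is on, `b' where it is off.
 * Mirrors or(and(mask, a), and(xor(mask, splat(-1)), b)); the xor with
 * all-ones stands in for a bitwise NOT. Assumes two's complement. */
static lanes8 lanes8_select(lanes8 mask, lanes8 a, lanes8 b)
{
    lanes8 r;
    for (int i = 0; i < EXAMPLE_LANES; i++) {
        int inverted = mask.v[i] ^ -1;
        r.v[i] = (int8_t)((mask.v[i] & a.v[i]) | (inverted & b.v[i]));
    }
    return r;
}

/* Element-wise minimum built from a comparison and a select. */
static lanes8 lanes8_min(lanes8 a, lanes8 b)
{
    return lanes8_select(lanes8_cmplt(a, b), a, b);
}
```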

This API will most definitely have more operations available as they are
requested (and as they are needed). Patches are accepted and encouraged!

------------------------------------------------------------------------------
USING VEC
------------------------------------------------------------------------------
To use vec, simply include `vec/vec.h' in your program. If you would like
your program to also run on older systems, you can create multiple
translation units and pass different command-line arguments to the
compiler for each one to enable SSE2/AVX2/AltiVec etc. Your program then
detects at runtime which vector modes the CPU supports. vec provides an
optional public API for this use case in `vec/cpu.h'. Bear in mind,
though, that it is not thread-safe, so if your program is multithreaded
you'll want to cache the results on startup.

The CPU vector detection API is extremely simple and self-explanatory.
You call `vec_get_CPU_features()', and it returns a bit-mask of the
values within the enum placed above the function definition. From there,
you can test for each value specifically.
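
Here is a minimal sketch of that runtime-dispatch pattern, assuming one
implementation per translation unit. The following names are all
hypothetical:

- the EXAMPLE_CPU_HAS_* feature bits (the real enum values come from
  `vec/cpu.h');
- `example_get_cpu_features', a stub standing in for
  `vec_get_CPU_features()';
- the `sum_*' variants.

Detection runs once, from `example_init_dispatch()', and the chosen
function pointer is cached. Later calls therefore never touch the
non-thread-safe detection code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL feature bits; the real names are in the enum in vec/cpu.h. */
enum {
    EXAMPLE_CPU_HAS_SSE2 = 1 << 0,
    EXAMPLE_CPU_HAS_AVX2 = 1 << 1
};

/* HYPOTHETICAL stub standing in for vec_get_CPU_features(); it pretends
 * the CPU supports SSE2 but not AVX2. */
static uint32_t example_get_cpu_features(void)
{
    return EXAMPLE_CPU_HAS_SSE2;
}

/* Portable fallback: plain scalar loop. */
static int sum_generic(const int8_t *p, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* In a real program these would live in separate files compiled with
 * e.g. -msse2 / -mavx2 and use vec's vector types; here they simply
 * reuse the scalar loop so the sketch stays self-contained. */
static int sum_sse2(const int8_t *p, int n) { return sum_generic(p, n); }
static int sum_avx2(const int8_t *p, int n) { return sum_generic(p, n); }

typedef int (*sum_fn)(const int8_t *p, int n);

/* Cached choice: written once at startup, only read afterwards. */
static sum_fn sum_impl = NULL;

static void example_init_dispatch(void)
{
    uint32_t features = example_get_cpu_features();

    if (features & EXAMPLE_CPU_HAS_AVX2)
        sum_impl = sum_avx2;
    else if (features & EXAMPLE_CPU_HAS_SSE2)
        sum_impl = sum_sse2;
    else
        sum_impl = sum_generic;
}

static int example_sum(const int8_t *p, int n)
{
    assert(sum_impl != NULL); /* example_init_dispatch() must run first */
    return sum_impl(p, n);
}
```

Call `example_init_dispatch()' at the top of main(), before any threads
are created. After that, any thread can call `example_sum()', since it only
reads the cached pointer.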

vec should work perfectly fine with C++, though it is not tested as
thoroughly as C is. Your mileage may vary. You should probably be using
a library more tailored to C++, such as Highway[1] or std::simd.

[1]: https://google.github.io/highway/en/master/

------------------------------------------------------------------------------
MEMORY ALLOCATION
------------------------------------------------------------------------------
vec allows for stack-based and heap-based aligned array allocation. The
stack-based API is simple, and goes along the lines of this:

    VINT16x32_ALIGNED_ARRAY(arr);

    /* arr is now either an array type or a pointer type, depending on
     * whether the compiler supports the alignas specifier from C11 or
     * later, or has its own extension to align arrays. vec will fall back
     * to manual pointer alignment if the compiler supports neither. */

    /* this macro returns the full size of the array in bytes */
    int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);

    /* this macro returns the length of the array
     * (basically a synonym for sizeof(arr) / sizeof(arr[0])) */
    int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);

    /* no need to free the aligned array -- it is always on the stack */
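
When the compiler cannot align arrays itself, the fallback mentioned above
is manual pointer alignment. The sketch below shows the general technique:
over-allocate a stack array by enough spare elements to cover `align'
bytes, then round the start pointer up to the next multiple of `align'.
`example_align_up' and `EXAMPLE_ALIGNED_ARRAY' are hypothetical names, not
vec's macros. The 64-byte alignment used for a 512-bit vector is also an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `p' up to the next multiple of `align' (a power of two). */
static void *example_align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + (align - 1)) & ~(uintptr_t)(align - 1);
    return (void *)addr;
}

/* HYPOTHETICAL macro: reserve `count' elements plus `align' bytes of
 * slack, then point `name' at the first suitably aligned element.
 * Assumes an integer element type whose size divides `align'. */
#define EXAMPLE_ALIGNED_ARRAY(type, name, count, align)                 \
    type name##_storage[(count) + (align) / sizeof(type)];             \
    type *name = (type *)example_align_up(name##_storage, (align))

/* Demonstration with the shape of a vint16x32 (32 lanes of int16_t),
 * assuming 64-byte alignment for a 512-bit vector. */
static int example_stack_demo(void)
{
    EXAMPLE_ALIGNED_ARRAY(int16_t, arr, 32, 64);

    for (int i = 0; i < 32; i++)
        arr[i] = (int16_t)i;

    return ((uintptr_t)arr % 64) == 0 && arr[31] == 31;
}
```

The slack is what makes this safe. The aligned pointer can sit at most
`align' minus one element past the start of the storage, so the extra
elements guarantee that all `count' values still fit.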
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
183
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
184 The heap-based API is based off the good old C malloc API:
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
185
39
f9ca85d2f14c *: rearrange some things; add avx512bw support
Paper <paper@tflc.us>
parents: 38
diff changeset
186 /* heap allocation stuff is only defined here: */
f9ca85d2f14c *: rearrange some things; add avx512bw support
Paper <paper@tflc.us>
parents: 38
diff changeset
187 #include "vec/mem.h"
f9ca85d2f14c *: rearrange some things; add avx512bw support
Paper <paper@tflc.us>
parents: 38
diff changeset
188
38
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
189 vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
190
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
191 /* q is now aligned, and ready for use with a vector aligned load
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
192 * function. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
193 vint32x16_load_aligned(q);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
194
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
195 /* Say we want to reallocate the memory with a different size.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
196 * No problem there! */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
197 q = vec_realloc(q, 2048 * sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
198
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
199 /* In a real world program, you'll want to check that vec_malloc
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
200 * and vec_realloc do not fail, but this error checking has been
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
201 * withheld from this example, as it is the same as for regular
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
202 * malloc and realloc. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info
Paper <paper@tflc.us>
parents: 36
diff changeset
203
vec_free(q);

/* If you need the memory zero-initialized, we have you covered: */
q = vec_calloc(1024, sizeof(vec_int32));

/* vec_calloc forwards to the real calloc, so there is no extra
 * overhead from a separate memset or something similar. */

vec_free(q);

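The comments above leave out two details: checking allocation results, and how an allocator can hand back aligned memory that is already zeroed by forwarding to calloc. The sketch below illustrates both using only the standard C library. It is NOT vec's implementation: the sketch_* functions and the 64-byte SKETCH_ALIGN are hypothetical stand-ins, it uses the common "over-allocate and stash the raw pointer" technique, and vec_malloc, vec_realloc, vec_calloc and vec_free may well work differently. Since the README says error handling is the same as for malloc and realloc, sketch_demo shows that pattern, with the resize assigned to a temporary so the original block is not leaked on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ALIGN 64 /* hypothetical; vec's real alignment may differ */
#define SKETCH_PAD (sizeof(void *) + SKETCH_ALIGN - 1) /* header + padding */

/* First SKETCH_ALIGN-aligned address that leaves room for the raw pointer. */
static char *sketch_aligned_addr(void *raw)
{
	uintptr_t addr = (uintptr_t)raw + sizeof(void *);
	return (char *)((addr + SKETCH_ALIGN - 1) & ~(uintptr_t)(SKETCH_ALIGN - 1));
}

/* Align a raw block and stash the raw pointer just below the aligned one. */
static void *sketch_finish(void *raw)
{
	char *aligned;
	if (!raw)
		return NULL;
	aligned = sketch_aligned_addr(raw);
	memcpy(aligned - sizeof(void *), &raw, sizeof raw);
	return aligned;
}

static void *sketch_raw_of(void *ptr)
{
	void *raw;
	memcpy(&raw, (char *)ptr - sizeof(void *), sizeof raw);
	return raw;
}

static void *sketch_malloc(size_t size)
{
	if (size > SIZE_MAX - SKETCH_PAD)
		return NULL;
	return sketch_finish(malloc(size + SKETCH_PAD));
}

/* Forwards to calloc: the whole raw block comes back zeroed and the header
 * lives in the padding, so the aligned region needs no separate memset. */
static void *sketch_calloc(size_t count, size_t size)
{
	if (size && count > (SIZE_MAX - SKETCH_PAD) / size)
		return NULL;
	return sketch_finish(calloc(1, count * size + SKETCH_PAD));
}

static void *sketch_realloc(void *ptr, size_t size)
{
	void *old_raw, *new_raw;
	size_t old_off;
	char *aligned;
	if (!ptr)
		return sketch_malloc(size);
	if (size > SIZE_MAX - SKETCH_PAD)
		return NULL;
	old_raw = sketch_raw_of(ptr);
	old_off = (size_t)((char *)ptr - (char *)old_raw);
	new_raw = realloc(old_raw, size + SKETCH_PAD);
	if (!new_raw)
		return NULL; /* as with realloc, the old block is still valid */
	/* The moved block may be misaligned differently: shift the payload first,
	 * then write the header, whose slot may overlap the payload's old spot. */
	aligned = sketch_aligned_addr(new_raw);
	memmove(aligned, (char *)new_raw + old_off, size);
	memcpy(aligned - sizeof(void *), &new_raw, sizeof new_raw);
	return aligned;
}

static void sketch_free(void *ptr)
{
	if (ptr)
		free(sketch_raw_of(ptr));
}

/* The checks the README omits: test every result, and only overwrite q
 * once the resize is known to have succeeded. Returns 0 on success. */
static int sketch_demo(void)
{
	int32_t *q, *tmp;
	size_t i;
	if (!(q = sketch_malloc(1024 * sizeof *q)))
		return -1;
	for (i = 0; i < 1024; i++)
		q[i] = (int32_t)i;
	if (!(tmp = sketch_realloc(q, 2048 * sizeof *q))) {
		sketch_free(q); /* q is still valid here */
		return -1;
	}
	q = tmp;
	assert((uintptr_t)q % SKETCH_ALIGN == 0 && q[1023] == 1023);
	sketch_free(q);
	if (!(q = sketch_calloc(1024, sizeof *q)))
		return -1;
	assert(q[0] == 0 && q[1023] == 0);
	sketch_free(q);
	return 0;
}
```

One design point worth noting: realloc may return a block whose start has a different misalignment, so an aligned allocator built this way has to shift the payload after a successful resize. Forwarding straight to realloc without that step would leave the data at the wrong offset from the aligned pointer.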
------------------------------------------------------------------------------
THE BOTTOM
------------------------------------------------------------------------------
vec is copyright (c) Paper 2024-2025.
See the file LICENSE in the distribution for more information.

Bugs? Questions? Suggestions? Patches?
Feel free to contact me at any of the following:

Website: https://tflc.us/
Email: paper@tflc.us
IRC: slipofpaper on Libera.chat
Discord: @slipofpaper


am I a real programmer now? :^)