annotate README @ 38:fd42f9b1b95e

docs: update copyright for 2025, update the README with more info I slightly edited vec.h however to use calloc directly rather than malloc + memset.
author Paper <paper@tflc.us>
date Sat, 26 Apr 2025 02:54:44 -0400
parents 677c03c382b8
children f9ca85d2f14c
vec - a tiny SIMD vector header-only library written in C99

- Be prepared! Are you sure you want to know? :-)

------------------------------------------------------------------------------
THE VECTOR API
------------------------------------------------------------------------------
vec comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

    v[sign]int[bits]x[size]
        where `sign' is either nothing (for signed) or `u' (for unsigned),
        `bits' is the bit size of the integer format,
        and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not. For example, creating a 256-bit vector
on PowerPC would simply create two consecutive 128-bit vectors.

All of these have many operations that are prefixed with the name of the
type and an underscore, for example:

    vint8x16 vint8x16_splat(int8_t x)
    - creates a vint8x16 where all of the values are filled
      with the value of `x'

The currently supported operations are:

    v[u]intAxB splat([u]intA_t x)
        creates a vector with all of the values filled with
        the value of `x'

    v[u]intAxB load(const [u]intA_t x[B])
        copies the values from the memory at address `x';
        the address is NOT required to be aligned

    v[u]intAxB load_aligned(const [u]intA_t x[B])
        like `load', but the address is required to be aligned,
        which can cause some speed improvements if done correctly.

    void store(v[u]intAxB vec, [u]intA_t x[B])
        copies the values from the vector into the memory at
        address `x'.
        like `load', this does not require address alignment

    void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
        like `store', but the address is required to be aligned,
        which can cause some speed improvements if done correctly.
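
To make the semantics of `splat', `load' and `store' concrete, here is a
minimal scalar model in plain C. It is only a sketch of the behavior described
above, not vec's implementation: the `model_int8x16' struct and the `model_'
functions are hypothetical names invented for this example, and real vec types
are opaque SIMD values rather than a struct you index.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for vint8x16: 16 signed 8-bit lanes. */
typedef struct {
    int8_t lanes[16];
} model_int8x16;

/* splat: every lane receives the same value. */
static model_int8x16 model_int8x16_splat(int8_t x)
{
    model_int8x16 v;
    size_t i;

    for (i = 0; i < 16; i++)
        v.lanes[i] = x;
    return v;
}

/* load: copy 16 values out of memory. memcpy places no alignment
 * requirement on `x', which mirrors the unaligned `load'. */
static model_int8x16 model_int8x16_load(const int8_t x[16])
{
    model_int8x16 v;

    memcpy(v.lanes, x, sizeof(v.lanes));
    return v;
}

/* store: copy the 16 lanes back out to memory, again unaligned. */
static void model_int8x16_store(model_int8x16 v, int8_t x[16])
{
    memcpy(x, v.lanes, sizeof(v.lanes));
}
```

A load followed by a store round-trips the data unchanged, and storing a
splat fills the destination with one repeated value.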

    v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
        adds the values of `vec1' and `vec2' and returns the result

    v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
        subtracts the values of `vec2' from `vec1' and returns the result

    v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
        multiplies the values of `vec1' and `vec2' together and
        returns the result

    v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
        divides the values in `vec1' by the values in `vec2'.
        dividing by zero is considered defined behavior and should
        result in a zero; if this doesn't happen it's considered a bug
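
The divide-by-zero rule is easy to state in scalar code. The sketch below
models it for unsigned 8-bit lanes; `model_div_u8' is a hypothetical name for
this example, and the loop only illustrates the documented result, not how vec
computes it.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a lane-wise unsigned divide in which x / 0 yields 0
 * instead of being undefined, as the description of `div' requires. */
static void model_div_u8(const uint8_t *a, const uint8_t *b,
                         uint8_t *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (b[i] != 0) ? (uint8_t)(a[i] / b[i]) : 0;
}
```

Unsigned lanes are used here on purpose: a signed model would also have to
decide what the most negative value divided by -1 should produce, which the
text above does not specify.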

    v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise AND (&) of the values in both vectors

    v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise OR (|) of the values in both vectors

    v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise XOR (^) of the values in both vectors

    v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic right shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic left shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
        logical right shift of the values in vec1 by
        the corresponding values in vec2
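
The difference between `rshift' and `lrshift' only shows up on values whose
top bit is set. As a rough illustration, here is a scalar model of both for a
single 8-bit lane. The function names are hypothetical, and the shift count is
assumed to be smaller than the lane width; what vec does for larger counts is
not covered here.

```c
#include <stdint.h>

/* Arithmetic right shift: copies of the sign bit fill the vacated bits.
 * Negative inputs are handled via ~(~v >> n) so the result does not depend
 * on how the compiler implements >> for negative values. */
static int8_t model_rshift_i8(int8_t x, unsigned n)
{
    int v = x;

    return (int8_t)(v < 0 ? ~(~v >> n) : v >> n);
}

/* Logical right shift: zeros fill the vacated bits. The lane is passed as
 * its raw bit pattern, so the sign bit is shifted like any other bit. */
static uint8_t model_lrshift_u8(uint8_t bits, unsigned n)
{
    return (uint8_t)(bits >> n);
}
```

For the bit pattern 0xF8 (which is -8 as a signed lane), shifting right by one
gives -4 arithmetically, but 0x7C (124) logically.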

    v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the average of the values in both vectors,
        i.e., div(add(vec1, vec2), splat(2)), but without
        the possibility of overflow.
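
One well-known way to average two lanes without a wider intermediate is to
split the sum into shared and differing bits. The sketch below shows that
identity for one unsigned 8-bit lane. It is an assumption that this is how an
implementation might do it; `model_avg_u8' is a hypothetical name, and this
version rounds down, matching the div(add(...)) description above.

```c
#include <stdint.h>

/* a + b == 2 * (a & b) + (a ^ b), so halving gives
 * (a & b) + ((a ^ b) >> 1). Neither term nor their sum can exceed 255,
 * so the result always fits in the 8-bit lane. */
static uint8_t model_avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}
```

In scalar C the operands are promoted to int anyway, so the trick is not
needed there; it matters inside a SIMD lane, where add(vec1, vec2) would wrap
around before the division.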

    v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the minimum of the values in both vectors

    v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the maximum of the values in both vectors

There are also a number of comparisons possible:

    v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is equal
        to the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.

    v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.
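
Because a comparison yields all-ones or all-zeros per value, its result can be
used directly as a mask with `and' and `or' to choose between two inputs
without branching. The scalar sketch below shows this for one unsigned 8-bit
lane; all of the `model_' names are hypothetical and exist only for this
example.

```c
#include <stdint.h>

/* cmplt for one lane: 0xFF if a < b, otherwise 0x00. */
static uint8_t model_cmplt_u8(uint8_t a, uint8_t b)
{
    return (a < b) ? 0xFF : 0x00;
}

/* Take `a' where the mask is all-ones and `b' where it is all-zeros. */
static uint8_t model_select_u8(uint8_t mask, uint8_t a, uint8_t b)
{
    return (uint8_t)((a & mask) | (b & (uint8_t)~mask));
}

/* A minimum built only from a comparison mask and bitwise operations. */
static uint8_t model_min_u8(uint8_t a, uint8_t b)
{
    return model_select_u8(model_cmplt_u8(a, b), a, b);
}
```

The same pattern applied across a whole vector is how a comparison is
typically turned into a per-value choice.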

This API will most definitely have more operations available as they are
requested (and as they are needed). Patches are accepted and encouraged!

------------------------------------------------------------------------------
USING VEC
------------------------------------------------------------------------------
To use vec, simply include `vec/vec.h` in your program. If you would like
your program to also be able to run on older systems, you can create
multiple translation units, pass different command line arguments to the
compiler for each one to enable SSE2/AVX2/Altivec etc., and detect the
vector modes the CPU supports at runtime. vec provides an optional public
API specifically for this use case within `vec/impl/cpu.h`; bear in mind
though that it is not thread-safe, so if your program is multithreaded
you'll want to cache the results on startup.

The CPU vector detection API is extremely simple and self-explanatory.
You call `vec_get_CPU_features()', and it returns a bit-mask of the
values within the enum placed above the function definition. From there,
you can test for each value specifically.
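
As a rough sketch of the "cache on startup, then test bits" pattern, consider
the following. Everything named `example_' or `EXAMPLE_' is hypothetical: the
real flag names live in the enum in `vec/impl/cpu.h`, the return type of
`vec_get_CPU_features()' is assumed here to fit in a uint32_t, and the
detection call is replaced by a stand-in so that the snippet compiles on its
own.

```c
#include <stdint.h>

/* Hypothetical flag names; consult the enum in vec/impl/cpu.h for the
 * real ones. */
enum {
    EXAMPLE_CPU_HAS_SSE2    = 1 << 0,
    EXAMPLE_CPU_HAS_AVX2    = 1 << 1,
    EXAMPLE_CPU_HAS_ALTIVEC = 1 << 2
};

/* Stand-in for the detection call. A real program would call
 * vec_get_CPU_features() here instead of returning a fixed mask. */
static uint32_t example_detect_features(void)
{
    return EXAMPLE_CPU_HAS_SSE2 | EXAMPLE_CPU_HAS_AVX2;
}

static uint32_t example_cached_features;

/* Call once at startup, before any other threads are created, since the
 * detection API is not thread-safe. */
static void example_cache_features(void)
{
    example_cached_features = example_detect_features();
}

/* Safe to call from any thread once the cache has been filled. */
static int example_has_feature(uint32_t flag)
{
    return (example_cached_features & flag) != 0;
}
```

A dispatcher would then call example_has_feature() to decide which
translation unit's routines to use.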

vec should work perfectly fine with C++, though it is not tested as
thoroughly as C is. Your mileage may vary. You should probably be using
a library more tailored towards C++ such as Highway[1] or std::simd.

[1]: https://google.github.io/highway/en/master/

------------------------------------------------------------------------------
MEMORY ALLOCATION
------------------------------------------------------------------------------
vec allows for stack-based and heap-based aligned array allocation. The
stack-based API is simple, and goes along the lines of this:

    VINT16x32_ALIGNED_ARRAY(arr);

    /* arr is now either an array type or a pointer type, depending on whether
     * the compiler supports the alignas specifier from C11 or later, or has
     * its own extension to align arrays. vec will fall back to manual pointer
     * alignment if the compiler supports neither. */

    /* this macro returns the full size of the array in bytes */
    int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);

    /* this macro returns the length of the array
     * (basically a synonym for sizeof(arr)/sizeof(arr[0])) */
    int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);

    /* no need to free the aligned array -- it is always on the stack */
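
For readers wondering what "manual pointer alignment" can look like, here is
a minimal sketch of the general technique: over-allocate a byte buffer, then
advance to the first suitably aligned address inside it. This is an assumed
illustration, not the code behind vec's macros, and `example_align_up' is a
hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance `p' to the next address that is a multiple of `alignment'.
 * `alignment' must be a power of two, and the buffer behind `p' must have
 * at least (alignment - 1) spare bytes so the result stays in bounds. */
static void *example_align_up(void *p, size_t alignment)
{
    unsigned char *bytes = p;
    uintptr_t misalign = (uintptr_t)bytes & (uintptr_t)(alignment - 1);

    return misalign ? bytes + (alignment - misalign) : bytes;
}
```

For example, to get 64 usable bytes aligned to 32, declare a buffer of
64 + 31 bytes and pass it through example_align_up(buffer, 32). This explains
why such an array may end up being a pointer rather than a true array type.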

The heap-based API is based on the good old C malloc API:

    vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));

    /* q is now aligned, and ready for use with a vector aligned load
     * function. */
    vint32x16_load_aligned(q);

    /* Say we want to reallocate the memory with a different size.
     * No problem there! */
    q = vec_realloc(q, 2048 * sizeof(vec_int32));

    /* In a real-world program, you'll want to check that vec_malloc
     * and vec_realloc do not fail, but this error checking has been
     * omitted from this example, as it is the same as for regular
     * malloc and realloc. */

    vec_free(q);

    /* If you need it to be initialized, we have you covered: */
    q = vec_calloc(1024, sizeof(vec_int32));

    /* vec_calloc forwards to the real calloc, so there is no overhead of
     * calling memset or something similar. */

    vec_free(q);
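
Since the error checking is described as being the same as for regular malloc
and realloc, here is a sketch of the usual pattern using the standard library
functions as stand-ins. `example_resize' is a hypothetical helper; with vec
you would presumably substitute vec_realloc and vec_free, assuming they behave
like their standard counterparts on failure.

```c
#include <stdint.h>
#include <stdlib.h>

/* Resize *buf to hold `count' int32_t values.
 * Returns 0 on success and -1 on failure. On failure *buf is left
 * untouched, so the old block is still valid and can still be freed. */
static int example_resize(int32_t **buf, size_t count)
{
    int32_t *tmp;

    /* Reject empty requests and byte counts that would overflow. */
    if (count == 0 || count > SIZE_MAX / sizeof(int32_t))
        return -1;

    /* Assign to a temporary first: writing the result straight back into
     * *buf would lose the only pointer to the old block on failure. */
    tmp = realloc(*buf, count * sizeof(int32_t));
    if (tmp == NULL)
        return -1;

    *buf = tmp;
    return 0;
}
```

Starting from a NULL pointer, the first call acts as the initial allocation
and later calls grow or shrink the block while preserving its contents.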

------------------------------------------------------------------------------
THE BOTTOM
------------------------------------------------------------------------------
vec is copyright (c) Paper 2024-2025.
See the file LICENSE in the distribution for more information.

Bugs? Questions? Suggestions? Patches?
Feel free to contact me at any of the following:

Website: https://tflc.us/
Email: paper@tflc.us
IRC: slipofpaper on Libera.chat
Discord: @slipofpaper


am I a real programmer now? :^)