vec: annotate README @ 27:d00b95f95dd1 (default tip)
impl/arm/neon: it compiles again, but is untested
author: Paper <paper@tflc.us>
date: Mon, 25 Nov 2024 00:33:02 -0500
parents: e26874655738

vec - a tiny SIMD vector library written in C99

it comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

    v[sign][bits]x[size]

where `sign' is either nothing (for signed) or `u' (for unsigned),
`bits' is the bit size of the integer format,
and `size' is how many integers are in the vector
(for example, `vuint16x8' holds eight unsigned 16-bit integers, 128 bits in total)

vec provides 64-bit, 128-bit, 256-bit, and 512-bit vector types. on processors
where vec has an implementation these map to SIMD intrinsics; on processors
where it does not, vec falls back to array-based implementations.
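to picture the naming scheme and the array-based fallback together, here is a minimal sketch, assuming a fallback type is nothing more than a struct wrapping a C99 array of `size' lanes of `bits' bits each. the `_ref' type names and `vec_ref_width()' are hypothetical illustrations, not vec's real definitions:

```c
#include <stdint.h>

/* hypothetical array-based stand-ins following v[sign][bits]x[size];
 * these are NOT vec's actual type definitions */
typedef struct { uint8_t  v[16]; } vuint8x16_ref; /*  8 bits x 16 = 128 bits */
typedef struct { int16_t  v[8];  } vint16x8_ref;  /* 16 bits x  8 = 128 bits */
typedef struct { uint32_t v[2];  } vuint32x2_ref; /* 32 bits x  2 =  64 bits */
typedef struct { int64_t  v[8];  } vint64x8_ref;  /* 64 bits x  8 = 512 bits */

/* total vector width in bits for `size' lanes of `bits' bits each */
static unsigned vec_ref_width(unsigned bits, unsigned size)
{
    return bits * size;
}
```

the width is simply `bits' times `size', so every supported type lands on one of the 64/128/256/512-bit sizes listed above.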

to initialize vec, you MUST call `vec_init()' when your program starts up.

note that `vec_init()' is NOT thread-safe, and things can and will
blow up if you call it simultaneously from different threads (e.g. if
you try to initialize it lazily, only when you first need it... please
just initialize it on startup so you don't have to worry about that!!!)

all of these types have many operations that are prefixed with the name
of the type and an underscore, for example:

    vint8x16 vint8x16_splat(int8_t x)
        - creates a vint8x16 where all of the values are filled
          with the value of `x'

the currently supported operations are (shown below without the type prefix):

    v[u]intAxB splat([u]intA_t x)
        creates a vector with all of the values filled with
        the value of `x'

    v[u]intAxB load(const [u]intA_t x[B])
        copies `B' values from the memory pointed to by `x';
        the address is NOT required to be aligned

    void store(v[u]intAxB vec, [u]intA_t x[B])
        copies the values from the vector into the memory
        pointed to by `x'

        like with load(), this does not require address alignment
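the semantics of splat, load, and store can be sketched for an array-based fallback like this. the `vuint8x16_ref' type and its functions are hypothetical, not vec's API; memcpy copies bytes and has no alignment requirement, which matches the "address is NOT required to be aligned" rule. a real SIMD backend would need unaligned load/store instructions instead (for example SSE2's `_mm_loadu_si128'); whether vec uses that exact intrinsic is an assumption:

```c
#include <stdint.h>
#include <string.h>

/* hypothetical array-based stand-in for vuint8x16; not vec's real type */
typedef struct { uint8_t v[16]; } vuint8x16_ref;

/* splat: every lane receives the value `x' */
static vuint8x16_ref vuint8x16_ref_splat(uint8_t x)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = x;
    return r;
}

/* load: copy 16 values from `x'; memcpy works at any address */
static vuint8x16_ref vuint8x16_ref_load(const uint8_t x[16])
{
    vuint8x16_ref r;
    memcpy(r.v, x, sizeof(r.v));
    return r;
}

/* store: copy the 16 lanes out to `x', again with no alignment needs */
static void vuint8x16_ref_store(vuint8x16_ref vec, uint8_t x[16])
{
    memcpy(x, vec.v, sizeof(vec.v));
}
```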

    v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
        adds the values in `vec1' and `vec2' and returns the result

    v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
        subtracts the values in `vec2' from `vec1' and returns the result

    v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
        multiplies the values in `vec1' and `vec2' together and
        returns the result
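add, sub, and mul all work lane by lane. the sketch below assumes unsigned lanes wrap around modulo 2^8 on overflow, the way packed-add instructions such as x86 PADDB do; the README does not state vec's overflow behavior, and the `_ref' names are hypothetical:

```c
#include <stdint.h>

/* hypothetical array-based stand-in for vuint8x16; not vec's real type */
typedef struct { uint8_t v[16]; } vuint8x16_ref;

/* lane-wise a + b; the cast keeps the low 8 bits, so results wrap */
static vuint8x16_ref vuint8x16_ref_add(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (uint8_t)(a.v[i] + b.v[i]);
    return r;
}

/* lane-wise a - b; negative intermediates convert modulo 2^8 */
static vuint8x16_ref vuint8x16_ref_sub(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (uint8_t)(a.v[i] - b.v[i]);
    return r;
}

/* lane-wise a * b; only the low 8 bits of each product are kept */
static vuint8x16_ref vuint8x16_ref_mul(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (uint8_t)(a.v[i] * b.v[i]);
    return r;
}
```

for example, adding 10 to a lane holding 250 gives 4, and 255 * 255 gives 1.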

    v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
        divides the values in `vec1' by the corresponding values in
        `vec2'. dividing by zero is considered defined behavior and
        should result in zero; if it doesn't, that's a bug
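the x86 SSE/AVX and Arm NEON instruction sets have no packed integer divide, so a per-lane loop is one plausible shape for div. this minimal sketch (hypothetical `_ref' names, unsigned lanes only to sidestep signed-overflow cases) shows the documented divide-by-zero rule:

```c
#include <stdint.h>

/* hypothetical array-based stand-in for vuint8x16; not vec's real type */
typedef struct { uint8_t v[16]; } vuint8x16_ref;

/* lane-wise a / b, where a zero divisor yields 0 instead of trapping */
static vuint8x16_ref vuint8x16_ref_div(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = b.v[i] ? (uint8_t)(a.v[i] / b.v[i]) : 0;
    return r;
}
```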

    v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise AND (&) of the values in both vectors

    v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise OR (|) of the values in both vectors

    v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise XOR (^) of the values in both vectors
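the bitwise operations apply the matching C operator to each pair of lanes. a minimal sketch with hypothetical `_ref' names:

```c
#include <stdint.h>

/* hypothetical array-based stand-in for vuint8x16; not vec's real type */
typedef struct { uint8_t v[16]; } vuint8x16_ref;

/* lane-wise a & b */
static vuint8x16_ref vuint8x16_ref_and(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (uint8_t)(a.v[i] & b.v[i]);
    return r;
}

/* lane-wise a | b */
static vuint8x16_ref vuint8x16_ref_or(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (uint8_t)(a.v[i] | b.v[i]);
    return r;
}

/* lane-wise a ^ b; note that x ^ x clears every bit */
static vuint8x16_ref vuint8x16_ref_xor(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (uint8_t)(a.v[i] ^ b.v[i]);
    return r;
}
```

with 0xF0 and 0x3C in every lane, AND gives 0x30, OR gives 0xFC, and XOR gives 0xCC.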

    v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic right shift of the values in `vec1' by
        the corresponding values in `vec2'

    v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic left shift of the values in `vec1' by
        the corresponding values in `vec2'

    v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
        logical right shift of the values in `vec1' by
        the corresponding values in `vec2' (vacated high
        bits are filled with zeros)
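the difference between the shifts shows up on negative signed lanes: an arithmetic right shift copies the sign bit into the vacated bits, a logical right shift fills them with zeros, and a left shift is the same either way. this sketch on int8 lanes uses hypothetical `_ref' names and assumes each shift count is below the lane width, since the README does not say what larger counts do. it also avoids C's implementation-defined right shift of negative values and undefined left shift of negative values:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical array-based stand-ins; not vec's real types */
typedef struct { int8_t  v[16]; } vint8x16_ref;
typedef struct { uint8_t v[16]; } vuint8x16_ref;

/* arithmetic shift: for negative x, ~(~x >> s) only shifts a
 * non-negative value, yet still fills with copies of the sign bit */
static int8_t sra8(int8_t x, unsigned s)
{
    int xi = x;
    return (int8_t)(xi < 0 ? ~(~xi >> s) : (xi >> s));
}

/* left shift performed on the unsigned bit pattern, then truncated */
static int8_t sll8(int8_t x, unsigned s)
{
    return (int8_t)(uint8_t)((uint8_t)x << s);
}

/* logical shift: zeros enter from the top */
static int8_t srl8(int8_t x, unsigned s)
{
    return (int8_t)((uint8_t)x >> s);
}

static vint8x16_ref vint8x16_ref_rshift(vint8x16_ref a, vuint8x16_ref b)
{
    vint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++) {
        assert(b.v[i] < 8); /* assumed precondition */
        r.v[i] = sra8(a.v[i], b.v[i]);
    }
    return r;
}

static vint8x16_ref vint8x16_ref_lshift(vint8x16_ref a, vuint8x16_ref b)
{
    vint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++) {
        assert(b.v[i] < 8); /* assumed precondition */
        r.v[i] = sll8(a.v[i], b.v[i]);
    }
    return r;
}

static vint8x16_ref vint8x16_ref_lrshift(vint8x16_ref a, vuint8x16_ref b)
{
    vint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++) {
        assert(b.v[i] < 8); /* assumed precondition */
        r.v[i] = srl8(a.v[i], b.v[i]);
    }
    return r;
}
```

shifting -16 (0xF0) right by 2 gives -4 arithmetically but 60 (0x3C) logically.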

    v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the average of the values in both vectors,
        i.e., div(add(vec1, vec2), splat(2))
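taken literally, div(add(vec1, vec2), splat(2)) can overflow in the add, so this sketch widens each lane before dividing. it truncates like the formula does; note that x86's PAVGB computes (a + b + 1) >> 1, so a backend built on it would return 2 for avg(1, 2) where this sketch returns 1. which rounding vec actually uses is not stated here, and the `_ref' names are hypothetical:

```c
#include <stdint.h>

/* hypothetical array-based stand-in for vuint8x16; not vec's real type */
typedef struct { uint8_t v[16]; } vuint8x16_ref;

/* lane-wise (a + b) / 2 computed in unsigned int, so 255 + 255 cannot
 * wrap before the division; the result always fits back in 8 bits */
static vuint8x16_ref vuint8x16_ref_avg(vuint8x16_ref a, vuint8x16_ref b)
{
    vuint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (uint8_t)(((unsigned)a.v[i] + b.v[i]) / 2);
    return r;
}
```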

there are also a number of comparisons possible:

    v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is equal
        to the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.

    v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.
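because each comparison yields an all-ones or all-zeros lane, its result works as a mask for the bitwise operations. the sketch below generates the five comparisons for signed 8-bit lanes (all bits on is -1 in two's complement) and uses a mask to build a per-lane minimum. the `_ref' names and `vint8x16_ref_select()' are hypothetical; the README lists no select operation, so this only shows how a mask combines with AND/OR:

```c
#include <stdint.h>

/* hypothetical array-based stand-in for vint8x16; not vec's real type */
typedef struct { int8_t v[16]; } vint8x16_ref;

/* each comparison sets a lane to -1 (all bits on) or 0 (all bits off) */
#define DEFINE_CMP(name, op)                                          \
    static vint8x16_ref vint8x16_ref_##name(vint8x16_ref a,           \
                                            vint8x16_ref b)           \
    {                                                                 \
        vint8x16_ref r;                                               \
        int i;                                                        \
        for (i = 0; i < 16; i++)                                      \
            r.v[i] = (int8_t)((a.v[i] op b.v[i]) ? -1 : 0);           \
        return r;                                                     \
    }

DEFINE_CMP(cmplt, <)
DEFINE_CMP(cmpgt, >)
DEFINE_CMP(cmpeq, ==)
DEFINE_CMP(cmple, <=)
DEFINE_CMP(cmpge, >=)

/* lanes where `mask' is all ones take `a', lanes where it is zero take `b' */
static vint8x16_ref vint8x16_ref_select(vint8x16_ref mask, vint8x16_ref a,
                                        vint8x16_ref b)
{
    vint8x16_ref r;
    int i;
    for (i = 0; i < 16; i++)
        r.v[i] = (int8_t)((a.v[i] & mask.v[i]) | (b.v[i] & ~mask.v[i]));
    return r;
}
```

for example, select(cmplt(a, b), a, b) keeps the smaller value from each pair of lanes.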