Mercurial > vec
annotate README @ 25:92156fe32755
impl/ppc/altivec: update to new implementation
the signed average function is wrong; it needs to round up the number
when only one of them is odd, but that doesn't necessarily seem to be
true because altivec is weird, and that's what we need to emulate the
quirks for. ugh.
also the altivec backend uses the generic functions instead of fallbacks
because it does indeed use the exact same memory structure as the generic
implementation...
author | Paper <paper@tflc.us> |
---|---|
date | Sun, 24 Nov 2024 11:15:59 +0000 |
parents | e26874655738 |
children |
rev | line source |
---|---|
23
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
1 vec - a tiny SIMD vector library written in C99 |
0 | 2 |
15
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
3 it comes with an extremely basic API that is similar to other intrinsics |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
4 libraries; each type is in the exact same format: |
0 | 5 |
15
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
6 v[sign][bits]x[size] |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
7 where `sign' is either nothing (for signed) or `u' (for unsigned), |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
8 `bits' is the bit size of the integer format, |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
9 and `size' is the how many integers are in the vector |
0 | 10 |
15
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
11 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
12 on processors where vec has an implementation and falls back to array-based |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
13 implementations where they are not. |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
14 |
23
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
15 to initialize vec, you MUST call `vec_init()' when your program starts up. |
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
16 |
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
17 note that `vec_init()' is NOT thread-safe, and things can and will |
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
18 blow up if you call it simultaneously from different threads (i.e. you |
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
19 try to only initialize it when you need to... please just initialize |
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
20 it on startup so you don't have to worry about that!!!) |
e26874655738
*: huge refactor, new major release (hahaha)
Paper <paper@tflc.us>
parents:
15
diff
changeset
|
21 |
15
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
22 all of these have many operations that are prefixed with the name of the |
e05c257c6a23
*: huge refactor, add many new x86 intrinsics and the like
Paper <paper@tflc.us>
parents:
2
diff
changeset
|
23 type and an underscore, for example: |
0 | 24 |
25 vint8x16 vint8x16_splat(uint8_t x) | |
26 - creates a vint8x16 where all of the values are filled | |
27 with the value of `x' | |
28 | |
29 the current supported operations are: | |
30 | |
31 v[u]intAxB splat([u]intA_t x) | |
32 creates a vector with all of the values are filled with | |
33 the value of `x' | |
34 | |
35 v[u]intAxB load(const [u]intA_t x[B]) | |
36 copies the values from the memory address stored at `x'; | |
37 the address is NOT required to be aligned | |
38 | |
39 void store(v[u]intAxB vec, [u]intA_t x[B]) | |
40 copies the values from the vector into the memory address | |
41 stored at `x' | |
42 | |
43 like with load(), this does not require address alignment | |
44 | |
45 v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2) | |
46 adds the value of `vec1' and `vec2' and returns it | |
47 | |
48 v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2) | |
49 subtracts the value of `vec2' from `vec1' and returns it | |
50 | |
51 v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2) | |
52 multiplies the values of `vec1' and `vec2' together and | |
53 returns it | |
2
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
54 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
55 v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
56 divides vec1 by the values in vec2. dividing by zero is |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
57 considered defined behavior and should result in a zero; |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
58 if this doesn't happen it's considered a bug |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
59 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
60 v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
61 bitwise AND (&) of the values in both vectors |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
62 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
63 v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
64 bitwise OR (|) of the values in both vectors |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
65 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
66 v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
67 bitwise XOR (^) of the values in both vectors |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
68 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
69 v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
70 arithmetic right shift of the values in vec1 by |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
71 the corresponding values in vec2 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
72 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
73 v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
74 arithmetic left shift of the values in vec1 by |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
75 the corresponding values in vec2 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
76 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
77 v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
78 logical right shift of the values in vec1 by |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
79 the corresponding values in vec2 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
80 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
81 v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
82 returns the average of the values in both vectors |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
83 i.e., div(mul(vec1, vec2), splat(2)) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
84 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
85 there are also a number of comparisons possible: |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
86 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
87 v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
88 turns on all bits of the corresponding value in |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
89 the result vector if the value in `vec1' is less |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
90 than the corresponding value in `vec2', else all |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
91 of the bits are turned off. |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
92 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
93 v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
94 turns on all bits of the corresponding value in |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
95 the result vector if the value in `vec1' is greater |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
96 than the corresponding value in `vec2', else all |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
97 of the bits are turned off. |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
98 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
99 v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
100 turns on all bits of the corresponding value in |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
101 the result vector if the value in `vec1' are equal |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
102 to the corresponding value in `vec2', else all |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
103 of the bits are turned off. |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
104 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
105 v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
106 turns on all bits of the corresponding value in |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
107 the result vector if the value in `vec1' is less |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
108 than or equal to the corresponding value in `vec2', |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
109 else all of the bits are turned off. |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
110 |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
111 v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2) |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
112 turns on all bits of the corresponding value in |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
113 the result vector if the value in `vec1' is greater |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
114 than or equal to the corresponding value in `vec2', |
f12b5dd4e18c
*: many new operations and a real test suite
Paper <paper@tflc.us>
parents:
0
diff
changeset
|
115 else all of the bits are turned off. |