annotate README @ 25:92156fe32755

impl/ppc/altivec: update to new implementation

the signed average function is wrong; it needs to round up the number when only one of them is odd, but that doesn't necessarily seem to be true because altivec is weird, and that's what we need to emulate the quirks for. ugh.

also the altivec backend uses the generic functions instead of fallbacks because it does indeed use the exact same memory structure as the generic implementation...
author Paper <paper@tflc.us>
date Sun, 24 Nov 2024 11:15:59 +0000
parents e26874655738
children
vec - a tiny SIMD vector library written in C99

it comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

    v[sign][bits]x[size]
        where `sign' is either nothing (for signed) or `u' (for unsigned),
        `bits' is the bit size of the integer format,
        and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation and falls back to array-based
implementations where it does not.
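
to make the naming scheme and the array-based fallback concrete, here is a
minimal sketch. the `ref_' types and the `REF_VEC_BITS' macro are hypothetical
stand-ins invented for this example; they are not vec's real definitions, and
the real layout may differ.

```c
#include <limits.h>
#include <stdint.h>

/* hypothetical array-backed stand-ins, NOT vec's real definitions;
 * they only illustrate the v[sign][bits]x[size] naming scheme */
typedef struct { uint8_t  lanes[8]; } ref_vuint8x8;  /* unsigned,  8-bit x 8 =  64 bits */
typedef struct { int16_t  lanes[8]; } ref_vint16x8;  /* signed,   16-bit x 8 = 128 bits */
typedef struct { uint32_t lanes[8]; } ref_vuint32x8; /* unsigned, 32-bit x 8 = 256 bits */
typedef struct { int64_t  lanes[8]; } ref_vint64x8;  /* signed,   64-bit x 8 = 512 bits */

/* total width of a vector type in bits: lane width times lane count */
#define REF_VEC_BITS(type) (sizeof(((type *)0)->lanes) * CHAR_BIT)
```

reading a name right to left gives the lane count, then the lane width, then
the signedness; multiplying the first two gives one of the four vector widths
listed above.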

to initialize vec, you MUST call `vec_init()' when your program starts up.

note that `vec_init()' is NOT thread-safe, and things can and will
blow up if you call it simultaneously from different threads (i.e. you
try to only initialize it when you need to... please just initialize
it on startup so you don't have to worry about that!!!)
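
one way to follow that advice is sketched below. `ref_vec_init' is a
hypothetical stand-in for the real `vec_init()' (it only counts calls so the
example is self-contained), and `app_startup' is an assumed name for whatever
runs first in your program.

```c
/* stand-in for the library's vec_init(); this hypothetical body only
 * counts calls so the sketch compiles without the library */
static int ref_init_calls = 0;

static void ref_vec_init(void)
{
    ref_init_calls++;
}

/* hypothetical application startup: initialize exactly once, on the
 * main thread, before any worker thread exists */
static void app_startup(void)
{
    ref_vec_init();
    /* ... only after this point spawn threads that use vector operations ... */
}
```

the point of the design is ordering, not locking: if initialization finishes
before any other thread is created, no two threads can ever race on it.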

all of these have many operations that are prefixed with the name of the
type and an underscore, for example:

    vint8x16 vint8x16_splat(int8_t x)
    - creates a vint8x16 where all of the values are filled
      with the value of `x'

the currently supported operations are:

    v[u]intAxB splat([u]intA_t x)
        creates a vector with all of the values filled with
        the value of `x'
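
as a scalar reference for what splat computes, here is a minimal sketch. the
`ref_vint8x16' type and `ref_vint8x16_splat' function are hypothetical models
written for this example; they are not vec's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical array-backed model of a vint8x16 */
typedef struct { int8_t lanes[16]; } ref_vint8x16;

/* every lane of the result receives the same value `x' */
static ref_vint8x16 ref_vint8x16_splat(int8_t x)
{
    ref_vint8x16 v;
    size_t i;

    for (i = 0; i < 16; i++)
        v.lanes[i] = x;

    return v;
}
```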

    v[u]intAxB load(const [u]intA_t x[B])
        copies the values from the memory address stored at `x';
        the address is NOT required to be aligned

    void store(v[u]intAxB vec, [u]intA_t x[B])
        copies the values from the vector into the memory address
        stored at `x'

        like with load(), this does not require address alignment
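
a scalar sketch of load and store follows. the `ref_' names are hypothetical,
and copying with memcpy is just one way to avoid assuming vector-sized
alignment; it is not necessarily how vec does it. the sketch also assumes the
pointer is still suitably aligned for the element type itself.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical array-backed model of a vuint16x8 */
typedef struct { uint16_t lanes[8]; } ref_vuint16x8;

/* memcpy makes no assumption that `x' sits on a 16-byte boundary */
static ref_vuint16x8 ref_vuint16x8_load(const uint16_t x[8])
{
    ref_vuint16x8 v;
    memcpy(v.lanes, x, sizeof(v.lanes));
    return v;
}

/* copies all eight lanes back out to `x' */
static void ref_vuint16x8_store(ref_vuint16x8 v, uint16_t x[8])
{
    memcpy(x, v.lanes, sizeof(v.lanes));
}
```

loading from `buf + 1' of a uint16_t array is a simple way to exercise an
address that is not a multiple of the vector size.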

    v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
        adds the values of `vec1' and `vec2' and returns the result

    v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
        subtracts the values of `vec2' from `vec1' and returns the result

    v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
        multiplies the values of `vec1' and `vec2' together and
        returns the result
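
these three operations all work lane by lane, which the sketch below models
for unsigned 32-bit lanes. the `ref_' names and the `REF_DEFINE_BINOP' macro
are hypothetical. overflow simply follows C's unsigned arithmetic here; this
README does not say what vec does on overflow, so treat that as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical array-backed model of a vuint32x4 */
typedef struct { uint32_t lanes[4]; } ref_vuint32x4;

/* generates a lane-wise binary operation: r[i] = a[i] op b[i] */
#define REF_DEFINE_BINOP(name, op) \
    static ref_vuint32x4 ref_vuint32x4_##name(ref_vuint32x4 a, ref_vuint32x4 b) \
    { \
        ref_vuint32x4 r; \
        size_t i; \
        for (i = 0; i < 4; i++) \
            r.lanes[i] = a.lanes[i] op b.lanes[i]; \
        return r; \
    }

REF_DEFINE_BINOP(add, +)
REF_DEFINE_BINOP(sub, -)
REF_DEFINE_BINOP(mul, *)
```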

    v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
        divides the values in `vec1' by the corresponding values
        in `vec2'. dividing by zero is considered defined behavior
        and should result in a zero; if this doesn't happen it's
        considered a bug
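
the divide-by-zero rule can be written down as a scalar reference like the one
below. the `ref_' names are hypothetical, and the sketch uses unsigned lanes
only; how signed corner cases are handled is not covered by this README.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical array-backed model of a vuint16x4 */
typedef struct { uint16_t lanes[4]; } ref_vuint16x4;

/* lane-wise division where a zero divisor yields zero instead of trapping */
static ref_vuint16x4 ref_vuint16x4_div(ref_vuint16x4 a, ref_vuint16x4 b)
{
    ref_vuint16x4 r;
    size_t i;

    for (i = 0; i < 4; i++)
        r.lanes[i] = b.lanes[i] ? (uint16_t)(a.lanes[i] / b.lanes[i]) : 0;

    return r;
}
```

a reference like this is handy in a test suite: any backend whose result
differs for a zero divisor has, by the rule above, a bug.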

    v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise AND (&) of the values in both vectors

    v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise OR (|) of the values in both vectors

    v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise XOR (^) of the values in both vectors

    v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic right shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic left shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
        logical right shift of the values in vec1 by
        the corresponding values in vec2
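
the difference between rshift and lrshift only shows up for negative lanes,
so here is a scalar sketch of both for signed 32-bit lanes. all `ref_' names
are hypothetical. the sketch assumes every shift count is less than 32; what
vec does for larger counts is not stated in this README. lshift is left out
because it has no arithmetic/logical distinction to illustrate.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical array-backed models */
typedef struct { int32_t  lanes[4]; } ref_vint32x4;
typedef struct { uint32_t lanes[4]; } ref_vuint32x4;

/* arithmetic right shift: the sign bit fills the vacated high bits.
 * written without `>>' on a negative int, which is implementation-defined
 * in C99. assumes n < 32 */
static int32_t ref_sar32(int32_t x, uint32_t n)
{
    if (x >= 0)
        return (int32_t)((uint32_t)x >> n);
    /* for negative x, ~x is non-negative: shift that, then flip back */
    return -(int32_t)(~(uint32_t)x >> n) - 1;
}

/* logical right shift: zeros fill the vacated high bits. assumes n < 32 */
static int32_t ref_shr32(int32_t x, uint32_t n)
{
    if (n == 0)
        return x;
    return (int32_t)((uint32_t)x >> n);
}

static ref_vint32x4 ref_vint32x4_rshift(ref_vint32x4 a, ref_vuint32x4 n)
{
    ref_vint32x4 r;
    size_t i;
    for (i = 0; i < 4; i++)
        r.lanes[i] = ref_sar32(a.lanes[i], n.lanes[i]);
    return r;
}

static ref_vint32x4 ref_vint32x4_lrshift(ref_vint32x4 a, ref_vuint32x4 n)
{
    ref_vint32x4 r;
    size_t i;
    for (i = 0; i < 4; i++)
        r.lanes[i] = ref_shr32(a.lanes[i], n.lanes[i]);
    return r;
}
```

for non-negative lanes the two functions agree; for a lane holding -8 shifted
by 1, the arithmetic form gives -4 while the logical form gives a large
positive number.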

    v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the average of the values in both vectors
        i.e., div(add(vec1, vec2), splat(2))
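
two details of an average are easy to get wrong: the intermediate sum can
overflow the lane type, and the result can round down or up when the sum is
odd. the sketch below shows both roundings for unsigned 8-bit lanes, summing
in a wider type. the `ref_' names are hypothetical, and which rounding vec's
avg uses is not stated in this README (the changeset description at the top
suggests rounding up on AltiVec), so both variants are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical array-backed model of a vuint8x16 */
typedef struct { uint8_t lanes[16]; } ref_vuint8x16;

/* truncating average: the sum is formed in `unsigned' so 200 + 100
 * does not wrap around the way an 8-bit add would */
static ref_vuint8x16 ref_vuint8x16_avg_trunc(ref_vuint8x16 a, ref_vuint8x16 b)
{
    ref_vuint8x16 r;
    size_t i;
    for (i = 0; i < 16; i++)
        r.lanes[i] = (uint8_t)(((unsigned)a.lanes[i] + b.lanes[i]) / 2);
    return r;
}

/* rounding average: adds 1 before halving, so an odd sum rounds up */
static ref_vuint8x16 ref_vuint8x16_avg_round(ref_vuint8x16 a, ref_vuint8x16 b)
{
    ref_vuint8x16 r;
    size_t i;
    for (i = 0; i < 16; i++)
        r.lanes[i] = (uint8_t)(((unsigned)a.lanes[i] + b.lanes[i] + 1) / 2);
    return r;
}
```

the two variants differ only when exactly one of the inputs is odd, for
example averaging 1 and 2.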

there are also a number of comparisons possible:

    v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is equal
        to the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.

    v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.
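
because a comparison returns all-ones or all-zeros per lane, its result works
as a mask for the bitwise operations. the sketch below models cmplt for signed
16-bit lanes and uses the mask to pick the smaller lane of two vectors. the
`ref_' names, including `ref_vint16x4_min', are hypothetical helpers for this
example and not part of vec's documented API.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical array-backed model of a vint16x4 */
typedef struct { int16_t lanes[4]; } ref_vint16x4;

/* all bits on (-1 in two's complement) where a < b, else all bits off */
static ref_vint16x4 ref_vint16x4_cmplt(ref_vint16x4 a, ref_vint16x4 b)
{
    ref_vint16x4 r;
    size_t i;
    for (i = 0; i < 4; i++)
        r.lanes[i] = (a.lanes[i] < b.lanes[i]) ? -1 : 0;
    return r;
}

/* branch-free select: keep a's lane where the mask is on, b's lane where
 * it is off, i.e. or(and(a, mask), and(b, not mask)) */
static ref_vint16x4 ref_vint16x4_min(ref_vint16x4 a, ref_vint16x4 b)
{
    ref_vint16x4 m = ref_vint16x4_cmplt(a, b);
    ref_vint16x4 r;
    size_t i;
    for (i = 0; i < 4; i++)
        r.lanes[i] = (int16_t)((a.lanes[i] & m.lanes[i]) | (b.lanes[i] & ~m.lanes[i]));
    return r;
}
```

the same pattern with cmpgt in place of cmplt would give a lane-wise maximum.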