annotate README @ 27:d00b95f95dd1 default tip

impl/arm/neon: it compiles again, but is untested
author Paper <paper@tflc.us>
date Mon, 25 Nov 2024 00:33:02 -0500
parents e26874655738

vec - a tiny SIMD vector library written in C99

it comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

	v[sign][bits]x[size]
		where `sign' is either nothing (for signed) or `u' (for unsigned),
		`bits' is the bit size of the integer format,
		and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations on processors where it does not.
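
To make the naming scheme and the array-based fallback concrete, here is a minimal sketch of what a few such fallback types could look like. The `example_' names and the struct-of-array layout are assumptions for illustration only, not vec's actual definitions.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical array-based stand-ins named after v[sign][bits]x[size].
 * These are NOT vec's real type definitions. */
typedef struct { int8_t   v[16]; } example_vint8x16;  /* 8 bits x 16 = 128 */
typedef struct { uint16_t v[8];  } example_vuint16x8; /* 16 bits x 8 = 128 */
typedef struct { uint32_t v[2];  } example_vuint32x2; /* 32 bits x 2 = 64 */
typedef struct { int64_t  v[8];  } example_vint64x8;  /* 64 bits x 8 = 512 */

/* Number of integers (`size') held by a vector type. */
#define EXAMPLE_LANES(T) (sizeof(((T *)0)->v) / sizeof(((T *)0)->v[0]))

/* Total width in bits (`bits' times `size') of a vector type's lanes. */
#define EXAMPLE_WIDTH(T) (sizeof(((T *)0)->v) * CHAR_BIT)
```

For example, `EXAMPLE_WIDTH(example_vint64x8)' evaluates to 512, the largest of the vector sizes listed above.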

to initialize vec, you MUST call `vec_init()' when your program starts up.

note that `vec_init()' is NOT thread-safe, and things can and will
blow up if you call it simultaneously from different threads (e.g. if you
try to initialize it lazily, only when you need it... please just initialize
it on startup so you don't have to worry about that!!!)

all of these have many operations that are prefixed with the name of the
type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
		- creates a vint8x16 where all of the values are filled
		  with the value of `x'

the currently supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector with all of the values filled with
		the value of `x'
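
Here is a scalar sketch of what splat() means for a signed 8-bit, 16-lane vector, written in the spirit of the array-based fallback. The `example_' type and function are hypothetical. vec's real implementation may use SIMD intrinsics instead.

```c
#include <stdint.h>

/* Hypothetical array-based stand-in for vint8x16. */
typedef struct { int8_t v[16]; } example_vint8x16;

/* splat(): every one of the 16 lanes receives the same value x. */
example_vint8x16 example_vint8x16_splat(int8_t x)
{
	example_vint8x16 r;
	int i;

	for (i = 0; i < 16; i++)
		r.v[i] = x;
	return r;
}
```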

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory address stored at `x';
		the address is NOT required to be aligned

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory address
		stored at `x'

		like with load(), this does not require address alignment
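
The sketch below shows one way an array-based load()/store() pair can avoid alignment requirements: it copies with memcpy, which accepts any valid pointer. It assumes "aligned" here means aligned to the vector's width (16 bytes for a 128-bit vector). For that reason the example loads from `src + 1', which is element-aligned but generally not 16-byte aligned. All `example_' names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical array-based stand-in for vuint32x4. */
typedef struct { uint32_t v[4]; } example_vuint32x4;

/* load(): copy 4 values starting at x. memcpy has no alignment
 * requirement of its own, so x only needs the alignment of a
 * uint32_t, not that of the whole 16-byte vector. */
example_vuint32x4 example_vuint32x4_load(const uint32_t x[4])
{
	example_vuint32x4 r;

	memcpy(r.v, x, sizeof(r.v));
	return r;
}

/* store(): copy the 4 values back out to x, with the same (lack of)
 * vector alignment requirement as load(). */
void example_vuint32x4_store(example_vuint32x4 vec, uint32_t x[4])
{
	memcpy(x, vec.v, sizeof(vec.v));
}

/* Usage: load from, and store to, positions one element into a buffer,
 * which are typically not 16-byte aligned. Returns the sum of the
 * stored values so the round trip is easy to check. */
uint32_t example_roundtrip(void)
{
	uint32_t src[5] = { 0, 10, 20, 30, 40 };
	uint32_t dst[5] = { 0 };
	example_vuint32x4 vec = example_vuint32x4_load(src + 1);

	example_vuint32x4_store(vec, dst + 1);
	return dst[1] + dst[2] + dst[3] + dst[4];
}
```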

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result
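
Here is a lane-by-lane sketch of add(), sub(), and mul() on unsigned 8-bit lanes. It assumes that results wrap modulo 256 when they do not fit, which is what plain SIMD integer arithmetic usually does. The README does not state what vec does on overflow. The `example_' names and the generator macro are hypothetical.

```c
#include <stdint.h>

/* Hypothetical array-based stand-in for vuint8x16. */
typedef struct { uint8_t v[16]; } example_vuint8x16;

example_vuint8x16 example_vuint8x16_splat(uint8_t x)
{
	example_vuint8x16 r;
	int i;

	for (i = 0; i < 16; i++)
		r.v[i] = x;
	return r;
}

/* Defines example_vuint8x16_<name>(), which applies `op' to each pair of
 * corresponding lanes. The operands are promoted to int, and the cast
 * back to uint8_t keeps only the low 8 bits, so out-of-range results
 * wrap modulo 256 (an assumption, see above). */
#define EXAMPLE_LANEWISE(name, op) \
	example_vuint8x16 example_vuint8x16_##name(example_vuint8x16 vec1, \
	                                            example_vuint8x16 vec2) \
	{ \
		example_vuint8x16 r; \
		int i; \
		for (i = 0; i < 16; i++) \
			r.v[i] = (uint8_t)(vec1.v[i] op vec2.v[i]); \
		return r; \
	}

EXAMPLE_LANEWISE(add, +)
EXAMPLE_LANEWISE(sub, -)
EXAMPLE_LANEWISE(mul, *)
```

With these definitions, adding splat(250) and splat(10) gives 4 in every lane, because 260 wraps to 260 - 256.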

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides the values in `vec1' by the values in `vec2'. dividing
		by zero is considered defined behavior and should result in
		a zero; if this doesn't happen, it's considered a bug
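
This scalar sketch shows the divide-by-zero rule. Each lane is checked before dividing, so a zero divisor produces 0 instead of trapping. Unsigned 16-bit lanes are used to sidestep signed edge cases such as INT16_MIN / -1, which the README does not cover. The `example_' names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical array-based stand-in for vuint16x8. */
typedef struct { uint16_t v[8]; } example_vuint16x8;

example_vuint16x8 example_vuint16x8_div(example_vuint16x8 vec1,
                                        example_vuint16x8 vec2)
{
	example_vuint16x8 r;
	int i;

	for (i = 0; i < 8; i++)
		/* a zero divisor yields 0 rather than undefined behavior */
		r.v[i] = vec2.v[i] ? (uint16_t)(vec1.v[i] / vec2.v[i]) : 0;
	return r;
}
```

For instance, dividing {100, 7, 9, ...} by {10, 0, 2, ...} gives {10, 0, 4, ...}.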

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors
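
Bitwise operations act on each lane's bit pattern, so they behave the same for signed and unsigned lanes. This sketch uses signed 8-bit lanes to show that. It relies on int8_t being two's complement, which C guarantees for the exact-width types. The `example_' names and the generator macro are hypothetical.

```c
#include <stdint.h>

/* Hypothetical array-based stand-in for vint8x16. */
typedef struct { int8_t v[16]; } example_vint8x16;

/* Both operands are sign-extended to int before the operator runs, and
 * &, | and ^ of two sign-extended 8-bit values is itself a sign-extended
 * 8-bit value, so the cast back to int8_t is exact. */
#define EXAMPLE_BITWISE(name, op) \
	example_vint8x16 example_vint8x16_##name(example_vint8x16 vec1, \
	                                          example_vint8x16 vec2) \
	{ \
		example_vint8x16 r; \
		int i; \
		for (i = 0; i < 16; i++) \
			r.v[i] = (int8_t)(vec1.v[i] op vec2.v[i]); \
		return r; \
	}

EXAMPLE_BITWISE(and, &)
EXAMPLE_BITWISE(or, |)
EXAMPLE_BITWISE(xor, ^)
```

For example, -1 (11111111) AND 0x0F gives 15, and -128 (10000000) OR 1 gives -127.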

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2
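
The difference between rshift() and lrshift() only shows up for negative values. This sketch spells out all three shifts for signed 8-bit lanes without relying on C's implementation-defined behavior for shifting negative numbers. It assumes every shift count is smaller than the lane width (8 here), because the README does not say what larger counts do. The `example_' names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical array-based stand-ins for vint8x16 and vuint8x16. */
typedef struct { int8_t v[16]; } example_vint8x16;
typedef struct { uint8_t v[16]; } example_vuint8x16;

/* Map an 8-bit pattern (0..255) to its two's-complement value without
 * relying on implementation-defined unsigned-to-signed conversion. */
static int8_t example_to_int8(unsigned bits)
{
	return (int8_t)(bits < 128 ? (int)bits : (int)bits - 256);
}

/* rshift: arithmetic, so vacated high bits are copies of the sign bit */
example_vint8x16 example_vint8x16_rshift(example_vint8x16 vec1, example_vuint8x16 vec2)
{
	example_vint8x16 r;
	int i;

	for (i = 0; i < 16; i++) {
		int x = vec1.v[i];
		unsigned n = vec2.v[i];
		assert(n < 8); /* assumption: counts stay below the lane width */
		/* for negative x, ~x is non-negative, so no negative int is shifted */
		r.v[i] = (int8_t)(x < 0 ? ~(~x >> n) : x >> n);
	}
	return r;
}

/* lrshift: logical, so vacated high bits are filled with zeros */
example_vint8x16 example_vint8x16_lrshift(example_vint8x16 vec1, example_vuint8x16 vec2)
{
	example_vint8x16 r;
	int i;

	for (i = 0; i < 16; i++) {
		unsigned bits = (uint8_t)vec1.v[i]; /* raw 8-bit pattern */
		unsigned n = vec2.v[i];
		assert(n < 8);
		r.v[i] = example_to_int8(bits >> n);
	}
	return r;
}

/* lshift: low bits are filled with zeros; bits moved past bit 7 are lost */
example_vint8x16 example_vint8x16_lshift(example_vint8x16 vec1, example_vuint8x16 vec2)
{
	example_vint8x16 r;
	int i;

	for (i = 0; i < 16; i++) {
		unsigned bits = (uint8_t)vec1.v[i];
		unsigned n = vec2.v[i];
		assert(n < 8);
		r.v[i] = example_to_int8((bits << n) & 0xFFu);
	}
	return r;
}
```

With these definitions, shifting -16 (bit pattern 11110000) right by 2 gives -4 with rshift() but 60 (00111100) with lrshift().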

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors,
		i.e., div(add(vec1, vec2), splat(2))
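
An average is the lane-wise sum halved, i.e. div(add(vec1, vec2), splat(2)). The first function below composes that literally on unsigned 8-bit lanes. It assumes add() wraps modulo 256, which the README does not state, so large inputs overflow before the division. The second function uses the common identity (a >> 1) + (b >> 1) + (a & b & 1). This equals the floor of (a + b) / 2 and never overflows. Whether vec's own avg() rounds this way or guards against overflow is not stated here. All `example_' names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical array-based stand-in for vuint8x16. */
typedef struct { uint8_t v[16]; } example_vuint8x16;

/* Literal div(add(vec1, vec2), splat(2)): the sum is truncated to
 * 8 bits first, mimicking a wrapping add(). */
example_vuint8x16 example_vuint8x16_avg_formula(example_vuint8x16 vec1,
                                                example_vuint8x16 vec2)
{
	example_vuint8x16 r;
	int i;

	for (i = 0; i < 16; i++) {
		uint8_t sum = (uint8_t)(vec1.v[i] + vec2.v[i]);
		r.v[i] = (uint8_t)(sum / 2);
	}
	return r;
}

/* Overflow-free floor average: halve each operand, then add back the
 * 1 that is lost when both low bits are set. */
example_vuint8x16 example_vuint8x16_avg_safe(example_vuint8x16 vec1,
                                             example_vuint8x16 vec2)
{
	example_vuint8x16 r;
	int i;

	for (i = 0; i < 16; i++) {
		unsigned a = vec1.v[i], b = vec2.v[i];
		r.v[i] = (uint8_t)((a >> 1) + (b >> 1) + (a & b & 1u));
	}
	return r;
}
```

For lanes holding 200 and 100, the literal formula gives 22 (the sum 300 wraps to 44), while the overflow-free version gives 150.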

there are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
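
A comparison sets each lane to all ones or all zeros, so its result can be used as a mask. The sketch below implements cmplt() for signed 8-bit lanes. It then uses the mask to pick, per lane, between two vectors with only xor and and, which yields a lane-wise minimum. This select trick is a common SIMD idiom rather than something this README documents. The `example_' names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical array-based stand-in for vint8x16. */
typedef struct { int8_t v[16]; } example_vint8x16;

/* cmplt: -1 (all bits on in two's complement) where vec1 < vec2, else 0 */
example_vint8x16 example_vint8x16_cmplt(example_vint8x16 vec1, example_vint8x16 vec2)
{
	example_vint8x16 r;
	int i;

	for (i = 0; i < 16; i++)
		r.v[i] = (int8_t)(vec1.v[i] < vec2.v[i] ? -1 : 0);
	return r;
}

/* Per-lane select: where mask is all ones, (a ^ b) survives the AND and
 * b ^ (a ^ b) == a; where mask is zero, the result is just b. */
example_vint8x16 example_vint8x16_select(example_vint8x16 mask,
                                         example_vint8x16 a, example_vint8x16 b)
{
	example_vint8x16 r;
	int i;

	for (i = 0; i < 16; i++)
		r.v[i] = (int8_t)(b.v[i] ^ ((a.v[i] ^ b.v[i]) & mask.v[i]));
	return r;
}

/* Lane-wise minimum built from the two functions above. */
example_vint8x16 example_vint8x16_min(example_vint8x16 a, example_vint8x16 b)
{
	return example_vint8x16_select(example_vint8x16_cmplt(a, b), a, b);
}
```

For example, the lane-wise minimum of {-7, 9, ...} and {3, -2, ...} is {-7, -2, ...}.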