annotate README @ 24:e49e70f7012f
impl/x86: add static assertions for alignment and size
| author | Paper <paper@tflc.us> | 
|---|---|
| date | Sun, 24 Nov 2024 03:32:53 -0500 | 
| parents | e26874655738 | 
| children | 677c03c382b8 | 
vec - a tiny SIMD vector library written in C99

it comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

    v[sign]int[bits]x[size]
        where `sign' is either nothing (for signed) or `u' (for unsigned),
        `bits' is the bit size of the integer format,
        and `size' is how many integers are in the vector
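
as a quick illustration of the naming scheme, here is a small sketch that builds a type name from its parts and computes the total vector width. the helper names `example_type_name' and `example_type_width' are made up for this example and are not part of vec; the output follows the pattern seen in names like vint8x16.

```c
#include <stdio.h>

/* hypothetical helper, NOT part of vec: formats a type name from the
 * sign, the lane width in bits, and the lane count */
int example_type_name(char *buf, size_t len, int is_unsigned, int bits, int size)
{
    return snprintf(buf, len, "v%sint%dx%d", is_unsigned ? "u" : "", bits, size);
}

/* hypothetical helper: total width of the vector in bits */
int example_type_width(int bits, int size)
{
    return bits * size;
}
```

for instance, 16 lanes of 8 bits make up one 128-bit vector, and 4 lanes of 32 bits do as well.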

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation and falls back to array-based
implementations where it does not.

to initialize vec, you MUST call `vec_init()' when your program starts up.

note that `vec_init()' is NOT thread-safe, and things can and will
blow up if you call it simultaneously from different threads (i.e. you
try to only initialize it when you need to... please just initialize
it on startup so you don't have to worry about that!!!)

all of these have many operations that are prefixed with the name of the
type and an underscore, for example:

    vint8x16 vint8x16_splat(int8_t x)
        - creates a vint8x16 where all of the values are filled
          with the value of `x'

the currently supported operations are:

    v[u]intAxB splat([u]intA_t x)
        creates a vector with all of the values filled with
        the value of `x'

    v[u]intAxB load(const [u]intA_t x[B])
        copies the values from the memory address stored at `x';
        the address is NOT required to be aligned

    void store(v[u]intAxB vec, [u]intA_t x[B])
        copies the values from the vector into the memory address
        stored at `x'

        like with load(), this does not require address alignment
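
to make the semantics of splat, load, and store concrete, here is a scalar model in plain C. this is only a sketch: the `example_u32x4' type and the `example_*' functions are invented for illustration and say nothing about how vec actually lays out or implements its types.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical stand-in for a vector of four 32-bit unsigned lanes */
typedef struct { uint32_t lane[4]; } example_u32x4;

/* splat: every lane receives the same value */
example_u32x4 example_u32x4_splat(uint32_t x)
{
    example_u32x4 v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = x;
    return v;
}

/* load: memcpy imposes no vector-sized alignment on the source */
example_u32x4 example_u32x4_load(const uint32_t x[4])
{
    example_u32x4 v;
    memcpy(v.lane, x, sizeof v.lane);
    return v;
}

/* store: likewise copies out without assuming vector alignment */
void example_u32x4_store(example_u32x4 v, uint32_t x[4])
{
    memcpy(x, v.lane, sizeof v.lane);
}
```

a load followed by a store round-trips the array unchanged, and storing a splat fills every element with the same value.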

    v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
        adds the values of `vec1' and `vec2' and returns the result

    v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
        subtracts the values of `vec2' from `vec1' and returns the result

    v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
        multiplies the values of `vec1' and `vec2' together and
        returns the result
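
the arithmetic operations work lane by lane. the sketch below models that with a made-up `example_u32x4' type; unsigned lanes are used because C defines their wraparound, while this README does not say what vec does on signed overflow.

```c
#include <stdint.h>

/* hypothetical stand-in for a vector of four 32-bit unsigned lanes */
typedef struct { uint32_t lane[4]; } example_u32x4;

/* each result lane depends only on the matching lanes of a and b */
example_u32x4 example_u32x4_add(example_u32x4 a, example_u32x4 b)
{
    example_u32x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

example_u32x4 example_u32x4_sub(example_u32x4 a, example_u32x4 b)
{
    example_u32x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] - b.lane[i];
    return r;
}

example_u32x4 example_u32x4_mul(example_u32x4 a, example_u32x4 b)
{
    example_u32x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}
```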

    v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
        divides vec1 by the values in vec2. dividing by zero is
        considered defined behavior and should result in a zero;
        if this doesn't happen it's considered a bug
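
the divide-by-zero rule can be modeled in scalar C like this. the type and function names are hypothetical, and this is just one way to get the documented result, not a description of how vec does it.

```c
#include <stdint.h>

/* hypothetical stand-in for a vector of four 32-bit unsigned lanes */
typedef struct { uint32_t lane[4]; } example_u32x4;

/* lane-wise division; a zero divisor yields a zero lane instead of trapping */
example_u32x4 example_u32x4_div(example_u32x4 a, example_u32x4 b)
{
    example_u32x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (b.lane[i] != 0) ? a.lane[i] / b.lane[i] : 0;
    return r;
}
```

only the lanes whose divisor is zero are forced to zero; the other lanes divide normally.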

    v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise AND (&) of the values in both vectors

    v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise OR (|) of the values in both vectors

    v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise XOR (^) of the values in both vectors

    v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic right shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic left shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
        logical right shift of the values in vec1 by
        the corresponding values in vec2
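
the difference between the arithmetic and logical right shifts only shows up for negative lanes. here is a per-lane sketch in portable C; the function names are made up, and it assumes shift counts below 32, since this README does not say what larger counts do.

```c
#include <stdint.h>

/* arithmetic right shift of one lane: the sign bit is copied into the
 * vacated high bits. written without relying on the implementation-defined
 * behavior of >> on negative values. assumes n < 32. */
int32_t example_rshift_lane(int32_t x, uint32_t n)
{
    return (x < 0) ? ~(~x >> n) : (x >> n);
}

/* logical right shift of one lane: the vacated high bits become zero.
 * assumes n < 32. */
int32_t example_lrshift_lane(int32_t x, uint32_t n)
{
    if (n == 0)
        return x;
    return (int32_t)((uint32_t)x >> n);
}
```

for non-negative lanes the two agree; for a negative lane the arithmetic shift stays negative while the logical shift produces a large positive value.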

    v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the average of the values in both vectors
        i.e., div(add(vec1, vec2), splat(2))
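
a lane-wise average under the usual definition, (a + b) / 2, can be sketched as below. the function name is hypothetical. two things here are assumptions rather than documented behavior: the sum is widened so it cannot wrap in 8 bits, and the division truncates.

```c
#include <stdint.h>

/* hypothetical scalar model of avg on sixteen 8-bit unsigned lanes */
void example_avg_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        /* widen first so that e.g. 200 + 100 does not wrap around */
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum / 2);
    }
}
```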

there are also a number of comparisons possible:

    v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is equal
        to the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.

    v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.
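
all-ones/all-zeros results are handy because they can be combined with the bitwise operations to pick values without branching. the sketch below models cmplt and uses its mask to compute a lane-wise minimum; every name in it is hypothetical and it is a scalar illustration only.

```c
#include <stdint.h>

/* hypothetical stand-in for a vector of four 32-bit unsigned lanes */
typedef struct { uint32_t lane[4]; } example_u32x4;

/* cmplt: all bits on where a < b, all bits off elsewhere */
example_u32x4 example_u32x4_cmplt(example_u32x4 a, example_u32x4 b)
{
    example_u32x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (a.lane[i] < b.lane[i]) ? UINT32_MAX : 0;
    return r;
}

/* lane-wise minimum built from the mask: take a where the mask is on,
 * b where it is off */
example_u32x4 example_u32x4_min(example_u32x4 a, example_u32x4 b)
{
    example_u32x4 mask = example_u32x4_cmplt(a, b);
    example_u32x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (mask.lane[i] & a.lane[i]) | (~mask.lane[i] & b.lane[i]);
    return r;
}
```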
