vec: README annotate

annotate README @ 41:c6e0df09b86f

*: performance improvements with old GCC, reimplement altivec

author	Paper <paper@tflc.us>
date	Mon, 28 Apr 2025 16:31:59 -0400
parents	55cadb1fac4b
children

vec - a tiny SIMD vector header-only library written in C99

- Be prepared! Are you sure you want to know? :-)

------------------------------------------------------------------------------
THE VECTOR API
------------------------------------------------------------------------------
vec comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

    v[sign][bits]x[size]
        where `sign' is either nothing (for signed) or `u' (for unsigned),
        `bits' is the bit size of the integer format,
        and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not. For example, creating a 256-bit vector
on PowerPC would simply create two consecutive 128-bit vectors.
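
To make the fallback idea concrete, here is a minimal sketch of an
array-backed 128-bit vector and a 256-bit vector built from two of them.
Every `sketch_' name is hypothetical and is not part of vec; vec's real
types and internals may well differ.

```c
#include <stdint.h>

/* Hypothetical array-backed 128-bit vector: four 32-bit lanes. */
typedef struct { int32_t lanes[4]; } sketch_vint32x4;

/* Hypothetical 256-bit vector: two consecutive 128-bit halves. */
typedef struct { sketch_vint32x4 half[2]; } sketch_vint32x8;

static sketch_vint32x4 sketch_vint32x4_splat(int32_t x)
{
    sketch_vint32x4 v;
    int i;
    for (i = 0; i < 4; i++)
        v.lanes[i] = x;
    return v;
}

static sketch_vint32x4 sketch_vint32x4_add(sketch_vint32x4 a, sketch_vint32x4 b)
{
    int i;
    for (i = 0; i < 4; i++) {
        /* add as unsigned so that wraparound is well defined in C */
        a.lanes[i] = (int32_t)((uint32_t)a.lanes[i] + (uint32_t)b.lanes[i]);
    }
    return a;
}

/* The wider type simply forwards each operation to both halves. */
static sketch_vint32x8 sketch_vint32x8_splat(int32_t x)
{
    sketch_vint32x8 v;
    v.half[0] = sketch_vint32x4_splat(x);
    v.half[1] = sketch_vint32x4_splat(x);
    return v;
}

static sketch_vint32x8 sketch_vint32x8_add(sketch_vint32x8 a, sketch_vint32x8 b)
{
    a.half[0] = sketch_vint32x4_add(a.half[0], b.half[0]);
    a.half[1] = sketch_vint32x4_add(a.half[1], b.half[1]);
    return a;
}
```

On a target with native 128-bit SIMD, the inner type would be the native
register type instead of an array, and the outer type would stay as is.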

All of these have many operations that are prefixed with the name of the
type and an underscore, for example:

    vint8x16 vint8x16_splat(int8_t x)
    - creates a vint8x16 where all of the values are filled
      with the value of `x'

The currently supported operations are:

    v[u]intAxB splat([u]intA_t x)
        creates a vector with all of the values filled with
        the value of `x'

    v[u]intAxB load(const [u]intA_t x[B])
        copies the values from the memory that `x' points to;
        the address is NOT required to be aligned

    v[u]intAxB load_aligned(const [u]intA_t x[B])
        like `load', but the address is required to be aligned,
        which can be faster on some hardware

    void store(v[u]intAxB vec, [u]intA_t x[B])
        copies the values from the vector into the memory that
        `x' points to; as with `load', the address is NOT
        required to be aligned

    void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
        like `store', but the address is required to be aligned,
        which can be faster on some hardware

    v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
        adds the values of `vec1' and `vec2' and returns the result

    v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
        subtracts the values of `vec2' from `vec1' and returns
        the result

    v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
        multiplies the values of `vec1' and `vec2' together and
        returns the result

    v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
        divides the values in `vec1' by the values in `vec2'.
        dividing by zero is considered defined behavior and
        should result in a zero; if this doesn't happen it's
        considered a bug

    v[u]intAxB mod(v[u]intAxB vec1, v[u]intAxB vec2)
        gives the remainder of a division operation. as with div,
        divide-by-zero is defined behavior.

    v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise AND (&) of the values in both vectors

    v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise OR (|) of the values in both vectors

    v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
        bitwise XOR (^) of the values in both vectors

    v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic right shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
        arithmetic left shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
        logical right shift of the values in vec1 by
        the corresponding values in vec2

    v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the average of the values in both vectors,
        i.e., div(add(vec1, vec2), splat(2)), without
        the possibility of overflow. If you are familiar
        with AltiVec, this operation exactly mimics
        vec_avg.

    v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the minimum of the values in both vectors

    v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
        returns the maximum of the values in both vectors
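
A few of the operations above have semantics that are easy to get wrong, so
here is a scalar, one-lane model of them on 8-bit values. This is only a
sketch of the documented behavior, not vec's implementation, and the
`model_' names are hypothetical. Two points are assumptions: the result of
mod with a zero divisor (the text above only says it is defined), and the
rounding of avg, which is taken to round up as AltiVec's vec_avg does.

```c
#include <stdint.h>

/* div: a zero divisor yields zero, as documented above. */
static uint8_t model_div_u8(uint8_t a, uint8_t b)
{
    return b ? (uint8_t)(a / b) : 0;
}

/* mod: returning zero for a zero divisor is an ASSUMPTION made to
 * mirror div; the README only states that the case is defined. */
static uint8_t model_mod_u8(uint8_t a, uint8_t b)
{
    return b ? (uint8_t)(a % b) : 0;
}

/* avg: equals (a + b + 1) >> 1, computed without a wider intermediate.
 * Rounding up is ASSUMED from AltiVec's vec_avg. */
static uint8_t model_avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1) + ((a | b) & 1));
}

/* rshift: arithmetic right shift, the sign bit is replicated.
 * n is assumed to be less than 8. */
static int8_t model_rshift_s8(int8_t x, uint8_t n)
{
    /* avoid relying on implementation-defined >> of negative values */
    return (int8_t)(x < 0 ? ~(~x >> n) : x >> n);
}

/* lrshift: logical right shift, zeros are shifted in.
 * n is assumed to be less than 8. */
static int8_t model_lrshift_s8(int8_t x, uint8_t n)
{
    return (int8_t)((uint8_t)x >> n);
}
```

For a signed lane holding -16, shifting right by 2 gives -4 with rshift and
60 with lrshift, which is the whole difference between the two.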

A number of comparisons are also available:

    v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is equal
        to the corresponding value in `vec2', else all
        of the bits are turned off.

    v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is less
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.

    v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
        turns on all bits of the corresponding value in
        the result vector if the value in `vec1' is greater
        than or equal to the corresponding value in `vec2',
        else all of the bits are turned off.
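
Because a comparison yields all-ones or all-zeros per value, its result can
be used directly as a mask for a branchless select. The following is a
scalar model of that idea on four unsigned 8-bit lanes; the `model_' names
and the LANES constant are hypothetical and only illustrate the technique.

```c
#include <stdint.h>

#define LANES 4

/* Scalar model of cmplt: 0xFF where a < b, 0x00 elsewhere. */
static void model_cmplt_u8(const uint8_t a[LANES], const uint8_t b[LANES],
                           uint8_t out[LANES])
{
    int i;
    for (i = 0; i < LANES; i++)
        out[i] = (a[i] < b[i]) ? 0xFF : 0x00;
}

/* Branchless select: take a where the mask is set and b elsewhere.
 * b ^ ((a ^ b) & mask) needs only AND and XOR, both of which are in
 * the operation list above. */
static void model_select_u8(const uint8_t mask[LANES], const uint8_t a[LANES],
                            const uint8_t b[LANES], uint8_t out[LANES])
{
    int i;
    for (i = 0; i < LANES; i++)
        out[i] = (uint8_t)(b[i] ^ ((a[i] ^ b[i]) & mask[i]));
}
```

Selecting with the cmplt mask in this way produces the lane-wise minimum of
the two inputs, which is one way to check a min implementation.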

More operations will be added to this API as they are requested (and as
they are needed). Patches are accepted and encouraged!

------------------------------------------------------------------------------
USING VEC
------------------------------------------------------------------------------
To use vec, simply include `vec/vec.h` in your program. If you would like
your program to also be able to run on older systems, you can create
multiple translation units and pass different command line arguments
to the compiler to enable SSE2/AVX2/AltiVec etc., and detect the vector
modes the CPU supports at runtime. vec provides an optional public API
specifically for this use case within `vec/cpu.h`; bear in mind though
that it is not thread-safe, so if your program is multithreaded you'll want
to cache the results on startup.

The CPU vector detection API is simple. You call `vec_get_CPU_features()',
and it returns a bit-mask of the values within the enum placed above the
function definition. From there, you can test for each value specifically.
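
The "detect once, cache, then dispatch" pattern described above can be
sketched as follows. The detection function and the flag are hypothetical
stand-ins for `vec_get_CPU_features()' and its enum, whose real names and
values are not reproduced here; both kernels are plain C so that the sketch
stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL flag; the real values live in the enum in vec/cpu.h. */
enum { SKETCH_CPU_HAS_WIDE = 1 << 0 };

/* HYPOTHETICAL stand-in for vec_get_CPU_features(). */
static uint32_t sketch_detect_features(void)
{
    return 0; /* pretend only the baseline is available */
}

typedef void (*sketch_add_fn)(const int32_t *a, const int32_t *b,
                              int32_t *out, size_t n);

/* Baseline kernel, compiled with no special flags. */
static void sketch_add_baseline(const int32_t *a, const int32_t *b,
                                int32_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Placeholder for a kernel that would live in its own translation
 * unit, built with flags such as -mavx2 or -maltivec. */
static void sketch_add_wide(const int32_t *a, const int32_t *b,
                            int32_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Cached choice; written once, read-only afterwards. */
static sketch_add_fn sketch_add = NULL;

/* Call once at startup, before any threads are created, so that the
 * non-thread-safe detection only ever runs on one thread. */
static void sketch_init_dispatch(void)
{
    uint32_t features = sketch_detect_features();
    sketch_add = (features & SKETCH_CPU_HAS_WIDE) ? sketch_add_wide
                                                  : sketch_add_baseline;
}
```

After `sketch_init_dispatch()' has run, worker threads only read the cached
function pointer and never call the detection routine themselves.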

vec should work perfectly fine with C++, though it is not tested as
thoroughly as C is. Your mileage may vary. You should probably be using
a library more tailored towards C++ such as Highway[1] or std::simd.

[1]: https://google.github.io/highway/en/master/

------------------------------------------------------------------------------
MEMORY ALLOCATION
------------------------------------------------------------------------------
vec allows for stack-based and heap-based aligned array allocation. The
stack-based API is simple, and goes along the lines of this:

    VINT16x32_ALIGNED_ARRAY(arr);

    /* arr is now either an array type or a pointer type, depending on whether
     * the compiler supports the alignas specifier from C11 or later, or has
     * its own extension to align arrays. vec will fall back to manual pointer
     * alignment if the compiler does not support it. */

    /* this macro returns the full size of the array in bytes */
    int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);

    /* this macro returns the length of the array
     * (basically a synonym for sizeof(arr)/sizeof(arr[0])) */
    int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);

    /* no need to free the aligned array -- it is always on the stack */
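
For readers wondering what "manual pointer alignment" can look like, here is
a minimal sketch of one common approach: over-allocate a plain array and
round a pointer up to the next aligned address inside it. The macro and
function names are hypothetical, and vec's own macros may be implemented
differently. The sketch assumes `align' is a power of two that is at least
as large as the element size.

```c
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of align (a power of two). */
static void *sketch_align_up(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + (align - 1)) & ~(uintptr_t)(align - 1);
    return (void *)addr;
}

/* HYPOTHETICAL macro: declares name##_raw with enough spare elements
 * to absorb the rounding, then points `name' at the first aligned
 * element. `name' is therefore a pointer, not an array. */
#define SKETCH_ALIGNED_ARRAY(type, name, length, align) \
    type name##_raw[(length) + (align) / sizeof(type)]; \
    type *name = (type *)sketch_align_up(name##_raw, (align))
```

This also shows why `arr' may be a pointer rather than an array: with the
fallback, sizeof(arr) would give the size of a pointer, which is presumably
why dedicated SIZEOF and LENGTH macros exist.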
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	183
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	184 The heap-based API is based off the good old C malloc API:
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	185
39 f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	186 /* heap allocation stuff is only defined here: */
f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	187 #include "vec/mem.h"
f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	188
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	189 vec_int32 q = vec_malloc(1024 sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	190
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	191 /* q is now aligned, and ready for use with a vector aligned load
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	192 * function. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	193 vint32x16_load_aligned(q);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	194
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	195 /* Say we want to reallocate the memory with a different size.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	196 * No problem there! */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	197 q = vec_realloc(q, 2048 * sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	198
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	199 /* In a real world program, you'll want to check that vec_malloc
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	200 * and vec_realloc do not fail, but this error checking has been
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	201 * withheld from this example, as it is the same as for regular
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	202 * malloc and realloc. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	203
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	204 vec_free(q);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	205
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	206 /* If you need the memory to be zero-initialized, we have you covered: */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	207 q = vec_calloc(1024, sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	208
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	209 /* vec_calloc forwards to the real calloc, so there is no overhead of
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	210 * calling memset or something similar. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	211
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	212 vec_free(q);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	213
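The error checking left out of the example above could look roughly like this. It is only a sketch: it uses plain realloc and int32_t so that it builds without vec, relying on the statement above that the checks are the same as for regular malloc and realloc. The helper name resize_int32 is hypothetical and is not part of vec.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, NOT part of vec: resize *buf so it can hold
 * `count` int32_t elements. Returns 0 on success and -1 on failure.
 * On failure *buf is left untouched, so the caller can still use or
 * free the old block. */
static int resize_int32(int32_t **buf, size_t count)
{
	int32_t *tmp;

	/* Reject empty requests and sizes that would overflow size_t. */
	if (count == 0 || count > SIZE_MAX / sizeof(int32_t))
		return -1;

	/* Keep the old pointer until realloc has succeeded; writing the
	 * result straight back into *buf would leak the old block if
	 * realloc returned NULL. */
	tmp = realloc(*buf, count * sizeof(int32_t));
	if (!tmp)
		return -1;

	*buf = tmp;
	return 0;
}
```

With vec, the same shape should apply after swapping realloc for vec_realloc and int32_t for vec_int32, assuming vec_realloc reports failure the way realloc does.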
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	214 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	215 THE BOTTOM
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	216 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	217 vec is copyright (c) Paper 2024-2025.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	218 See the file LICENSE in the distribution for more information.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	219
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	220 Bugs? Questions? Suggestions? Patches?
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	221 Feel free to contact me at any of the following:
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	222
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	223 Website: https://tflc.us/
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	224 Email: paper@tflc.us
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	225 IRC: slipofpaper on Libera.chat
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	226 Discord: @slipofpaper
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	227
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	228
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	229 am I a real programmer now? :^)
