vec: README annotate

annotate README @ 39:f9ca85d2f14c

*: rearrange some things; add avx512bw support

author	Paper <paper@tflc.us>
date	Sat, 26 Apr 2025 15:31:39 -0400
parents	fd42f9b1b95e
children	55cadb1fac4b

rev	line source
36 677c03c382b8 Backed out changeset e26874655738 Paper <paper@tflc.us> parents: 23 diff changeset	1 vec - a tiny SIMD vector header-only library written in C99
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	2
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	3 - Be prepared! Are you sure you want to know? :-)
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	4
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	5 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	6 THE VECTOR API
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	7 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	8 vec comes with an extremely basic API that is similar to other intrinsics
15 e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	9 libraries; each type is in the exact same format:
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	10
15 e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	11 v[sign][bits]x[size]
e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	12 where `sign' is either nothing (for signed) or `u' (for unsigned),
e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	13 `bits' is the bit size of the integer format,
e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	14 and `size' is how many integers are in the vector
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	15
15 e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	16 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	17 on processors where vec has an implementation and falls back to array-based
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	18 implementations where it does not. For example, creating a 256-bit vector
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	19 on powerpc would simply create two consecutive 128-bit vectors.
15 e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	20
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	21 All of these have many operations that are prefixed with the name of the
15 e05c257c6a23 : huge refactor, add many new x86 intrinsics and the like Paper <paper@tflc.us>* parents: 2 diff changeset	22 type and an underscore, for example:
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	23
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	24 vint8x16 vint8x16_splat(int8_t x)
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	25 - creates a vint8x16 where all of the values are filled
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	26 with the value of `x'
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	27
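As a rough illustration of the array-based fallback idea, here is a minimal sketch of a 16-lane int8_t vector backed by a plain array. Every `sketch_' name is hypothetical and made up for this example; vec's real types and internals may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array-backed stand-in for a 16-lane vector of int8_t.
 * This is NOT vec's actual definition. */
typedef struct {
	int8_t lane[16];
} sketch_vint8x16;

/* Fill every lane with the same value, like the splat operation. */
sketch_vint8x16 sketch_vint8x16_splat(int8_t x)
{
	sketch_vint8x16 v;
	size_t i;

	for (i = 0; i < 16; i++)
		v.lane[i] = x;

	return v;
}

/* Lane-wise addition. Results that do not fit in int8_t are converted
 * back in an implementation-defined way in C99; this sketch does not
 * try to pin that down. */
sketch_vint8x16 sketch_vint8x16_add(sketch_vint8x16 a, sketch_vint8x16 b)
{
	sketch_vint8x16 r;
	size_t i;

	for (i = 0; i < 16; i++)
		r.lane[i] = (int8_t)(a.lane[i] + b.lane[i]);

	return r;
}

/* Copy the lanes out to memory; no alignment is needed for a plain array. */
void sketch_vint8x16_store(sketch_vint8x16 v, int8_t out[16])
{
	size_t i;

	for (i = 0; i < 16; i++)
		out[i] = v.lane[i];
}
```

The function names follow the same "type name, underscore, operation" pattern described above, which is why the splat for this type ends up as `sketch_vint8x16_splat'.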
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	28 The currently supported operations are:
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	29
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	30 v[u]intAxB splat([u]intA_t x)
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	31 creates a vector where all of the values are filled with
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	32 the value of `x'
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	33
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	34 v[u]intAxB load(const [u]intA_t x[B])
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	35 copies the values from the memory address stored at `x';
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	36 the address is NOT required to be aligned
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	37
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	38 v[u]intAxB load_aligned(const [u]intA_t x[B])
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	39 like `load', but the address is required to be aligned,
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	40 which can cause some speed improvements if done correctly.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	41
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	42 void store(v[u]intAxB vec, [u]intA_t x[B])
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	43 copies the values from the vector into the memory address
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	44 stored at `x'.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	45 like with load(), this does not require address alignment
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	46
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	47 void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	48 like `store', but the address is required to be aligned,
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	49 which can cause some speed improvements if done correctly.
0 02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	50
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	51 v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	52 adds the value of `vec1' and `vec2' and returns it
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	53
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	54 v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	55 subtracts the value of `vec2' from `vec1' and returns it
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	56
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	57 v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	58 multiplies the values of `vec1' and `vec2' together and
02a517e4c492 : initial commit Paper <paper@paper.us.eu.org>* parents: diff changeset	59 returns it
2 f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	60
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	61 v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	62 divides vec1 by the values in vec2. dividing by zero is
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	63 considered defined behavior and should result in a zero;
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	64 if this doesn't happen it's considered a bug
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	65
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	66 v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	67 bitwise AND (&) of the values in both vectors
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	68
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	69 v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	70 bitwise OR (|) of the values in both vectors
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	71
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	72 v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	73 bitwise XOR (^) of the values in both vectors
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	74
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	75 v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	76 arithmetic right shift of the values in vec1 by
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	77 the corresponding values in vec2
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	78
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	79 v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	80 arithmetic left shift of the values in vec1 by
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	81 the corresponding values in vec2
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	82
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	83 v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	84 logical right shift of the values in vec1 by
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	85 the corresponding values in vec2
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	86
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	87 v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	88 returns the average of the values in both vectors
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	89 i.e., div(add(vec1, vec2), splat(2)), without
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	90 the possibility of overflow.
2 f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	91
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	92 v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	93 returns the minimum of the values in both vectors
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	94
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	95 v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	96 returns the maximum of the values in both vectors
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	97
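The `div' and `avg' descriptions above can be modelled one lane at a time. The sketch below does this for unsigned 8-bit lanes only; the `sketch_' names are hypothetical, and the round-down behaviour of the average is an assumption taken from the div(add(vec1, vec2), splat(2)) wording, not something checked against vec itself.

```c
#include <stdint.h>

/* One lane of the documented div behaviour:
 * dividing by zero yields zero instead of trapping. */
uint8_t sketch_udiv_lane(uint8_t a, uint8_t b)
{
	return b ? (uint8_t)(a / b) : 0;
}

/* One lane of an overflow-free average, rounding down.
 * The bits shared by a and b count in full, and the bits that differ
 * count for half, so the sum never needs more than 8 bits. */
uint8_t sketch_uavg_lane(uint8_t a, uint8_t b)
{
	return (uint8_t)((a & b) + ((a ^ b) >> 1));
}
```

For comparison, adding 200 and 100 directly in 8 bits wraps around to 44 before the division ever happens, which is the overflow the `avg' operation is meant to avoid.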
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	98 There are also a number of comparisons possible:
2 f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	99
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	100 v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	101 turns on all bits of the corresponding value in
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	102 the result vector if the value in `vec1' is less
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	103 than the corresponding value in `vec2', else all
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	104 of the bits are turned off.
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	105
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	106 v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	107 turns on all bits of the corresponding value in
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	108 the result vector if the value in `vec1' is greater
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	109 than the corresponding value in `vec2', else all
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	110 of the bits are turned off.
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	111
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	112 v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	113 turns on all bits of the corresponding value in
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	114 the result vector if the value in `vec1' is equal
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	115 to the corresponding value in `vec2', else all
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	116 of the bits are turned off.
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	117
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	118 v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	119 turns on all bits of the corresponding value in
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	120 the result vector if the value in `vec1' is less
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	121 than or equal to the corresponding value in `vec2',
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	122 else all of the bits are turned off.
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	123
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	124 v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	125 turns on all bits of the corresponding value in
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	126 the result vector if the value in `vec1' is greater
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	127 than or equal to the corresponding value in `vec2',
f12b5dd4e18c : many new operations and a real test suite Paper <paper@tflc.us>* parents: 0 diff changeset	128 else all of the bits are turned off.
36 677c03c382b8 Backed out changeset e26874655738 Paper <paper@tflc.us> parents: 23 diff changeset	129
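Because the comparisons return all-ones or all-zeros per value, the result can be used as a mask with the bitwise operations to pick between two inputs without branching. A minimal scalar sketch of that idea for unsigned 8-bit lanes follows; the `sketch_' names are hypothetical, and this only models the documented semantics rather than how vec implements them.

```c
#include <stddef.h>
#include <stdint.h>

/* cmplt for one lane: all bits on if a < b, otherwise all bits off. */
uint8_t sketch_cmplt_lane(uint8_t a, uint8_t b)
{
	return (a < b) ? 0xFF : 0x00;
}

/* Pick `a' where the mask is all-ones and `b' where it is all-zeros,
 * using only XOR and AND. */
uint8_t sketch_select_lane(uint8_t mask, uint8_t a, uint8_t b)
{
	return (uint8_t)(b ^ ((a ^ b) & mask));
}

/* Lane-wise minimum built from the compare and the select above. */
void sketch_min_u8(const uint8_t *x, const uint8_t *y, uint8_t *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint8_t mask = sketch_cmplt_lane(x[i], y[i]);
		out[i] = sketch_select_lane(mask, x[i], y[i]);
	}
}
```

With real vectors the same pattern would be one compare followed by an xor, an and, and another xor on whole vectors instead of a loop over lanes.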
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	130 This API will most definitely have more operations available as they are
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	131 requested (and as they are needed). Patches are accepted and encouraged!
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	132
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	133 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	134 USING VEC
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	135 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	136 To use vec, simply include `vec/vec.h` in your program. If you would like
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	137 your program to also be able to run on older systems, you can create
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	138 multiple translation units and pass different command line arguments
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	139 to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	140 modes the CPU supports at runtime. vec provides an optional public API
39 f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	141 specifically for this use-case within `vec/cpu.h`; bear in mind though
f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	142 that it is not thread-safe, so if your program is multithreaded you'll want
f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	143 to cache the results on startup.
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	144
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	145 The CPU vector detection API is extremely simple, and self-explanatory.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	146 You call `vec_get_CPU_features()', and it returns a bit-mask of the
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	147 values within the enum placed above the function definition. From there,
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	148 you can test for each value specifically.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	149
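One way to cache the detection result on startup is sketched below. Everything named `sketch_' or `SKETCH_' is hypothetical: the flag names, the return type, and the stand-in detection function are assumptions for illustration only. In a real program the stand-in would be replaced with a call to `vec_get_CPU_features()' and the flags with the enum values from `vec/cpu.h`.

```c
#include <stdint.h>

/* Hypothetical flag names; the real ones are in the enum in vec/cpu.h. */
enum {
	SKETCH_CPU_HAS_SSE2 = 1 << 0,
	SKETCH_CPU_HAS_AVX2 = 1 << 1
};

/* Stand-in for vec_get_CPU_features(). It pretends that the CPU
 * supports SSE2 only, so that the sketch runs without vec. */
uint32_t sketch_get_cpu_features(void)
{
	return SKETCH_CPU_HAS_SSE2;
}

/* Filled in once by sketch_init_features(), read-only afterwards. */
static uint32_t sketch_cached_features;

/* Call this once at startup, before any other thread is created. */
void sketch_init_features(void)
{
	sketch_cached_features = sketch_get_cpu_features();
}

/* Only reads the cached mask, so it never calls the detection
 * function again after startup. */
int sketch_has_feature(uint32_t flag)
{
	return (sketch_cached_features & flag) != 0;
}
```

The point of the split is that the non-thread-safe call happens exactly once while the program is still single-threaded, and every later check is just a read of a value that no longer changes.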
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	150 vec should work perfectly fine with C++, though it is not tested as
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	151 thoroughly as C is. Your mileage may vary. You should probably be using
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	152 a library more tailored towards C++ such as Highway[1] or std::simd.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	153
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	154 [1]: https://google.github.io/highway/en/master/
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	155
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	156 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	157 MEMORY ALLOCATION
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	158 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	159 vec allows for stack-based and heap-based aligned array allocation. The
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	160 stack-based API is simple, and goes along the lines of this:
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	161
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	162 VINT16x32_ALIGNED_ARRAY(arr);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	163
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	164 /* arr is now either an array type or a pointer type, depending on whether
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	165 * the compiler supports the alignas specifier from C11 or later, or has
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	166 * its own extension to align arrays. vec will fall back to manual pointer
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	167 * alignment if the compiler does not support it. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	168
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	169 /* this macro returns the full size of the array in bytes */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	170 int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	171
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	172 /* this macro returns the length of the array
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	173 * (basically a synonym for sizeof/sizeof[0]) */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	174 int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);
36 677c03c382b8 Backed out changeset e26874655738 Paper <paper@tflc.us> parents: 23 diff changeset	175
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	176 /* no need to free the aligned array -- it is always on the stack */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	177
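The "manual pointer alignment" mentioned in the comment above usually means reserving a slightly larger buffer and rounding the start address up. A minimal sketch of that rounding step is shown here; `sketch_align_up' is a hypothetical helper, not part of vec, and it assumes the alignment is a power of two and that the platform provides uintptr_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Round `p' up to the next multiple of `alignment'.
 * `alignment' must be a power of two. The caller is responsible for
 * making the underlying buffer at least (alignment - 1) bytes larger
 * than the space it actually needs. */
void *sketch_align_up(void *p, size_t alignment)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t mask = (uintptr_t)alignment - 1;

	return (void *)((addr + mask) & ~mask);
}
```

This would also explain why `arr' can end up being a pointer type rather than an array type: the name then refers to the rounded-up address inside a larger hidden buffer.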
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	178 The heap-based API is based off the good old C malloc API:
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	179
39 f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	180 /* heap allocation stuff is only defined here: */
f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	181 #include "vec/mem.h"
f9ca85d2f14c : rearrange some things; add avx512bw support Paper <paper@tflc.us>* parents: 38 diff changeset	182
38 fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	183 vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	184
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	185 /* q is now aligned, and ready for use with a vector aligned load
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	186 * function. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	187 vint32x16_load_aligned(q);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	188
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	189 /* Say we want to reallocate the memory with a different size.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	190 * No problem there! */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	191 q = vec_realloc(q, 2048 * sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	192
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	193 /* In a real world program, you'll want to check that vec_malloc
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	194 * and vec_realloc do not fail, but this error checking has been
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	195 * withheld from this example, as it is the same as for regular
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	196 * malloc and realloc. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	197
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	198 vec_free(q);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	199
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	200 /* If you need it to be initialized, we have you covered: */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	201 q = vec_calloc(1024, sizeof(vec_int32));
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	202
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	203 /* vec_calloc forwards to the real calloc, so there is no overhead of
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	204 * calling memset or something similar. */
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	205
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	206 vec_free(q);
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	207
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	208 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	209 THE BOTTOM
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	210 ------------------------------------------------------------------------------
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	211 vec is copyright (c) Paper 2024-2025.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	212 See the file LICENSE in the distribution for more information.
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	213
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	214 Bugs? Questions? Suggestions? Patches?
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	215 Feel free to contact me at any of the following:
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	216
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	217 Website: https://tflc.us/
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	218 Email: paper@tflc.us
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	219 IRC: slipofpaper on Libera.chat
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	220 Discord: @slipofpaper
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	221
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	222
fd42f9b1b95e docs: update copyright for 2025, update the README with more info Paper <paper@tflc.us> parents: 36 diff changeset	223 am I a real programmer now? :^)
