view README @ 23:e26874655738

*: huge refactor, new major release (hahaha) I keep finding things that are broken... The problem NOW was that vec would unintentionally build some functions with extended instruction sets, which is Bad and would mean that for all intents and purposes the CPU detection was completely broken. Now vec is no longer header only either. Boohoo. However this gives a lot more flexibility to vec since we no longer want or need to care about C++ crap. The NEON and Altivec implementations have not been updated which means they won't compile hence why they're commented out in the cmake build file.
author Paper <paper@tflc.us>
date Sun, 24 Nov 2024 02:52:40 -0500
parents e05c257c6a23

vec - a tiny SIMD vector library written in C99

it comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

	v[sign][bits]x[size]
		where `sign' is either nothing (for signed) or `u' (for unsigned),
		`bits' is the bit size of the integer format,
		and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not.
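
as a rough illustration of the naming scheme, and of what an array-based
fallback could look like, the sketch below defines stand-in types that follow
the v[sign][bits]x[size] pattern. the `ref_' prefix, the struct layout, and the
helper function are assumptions made for this example; they are not vec's
actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical array-backed stand-ins; vec's real types may differ */
typedef struct { int8_t   lanes[16]; } ref_vint8x16;  /* signed,    8 bits x 16 = 128 bits */
typedef struct { uint16_t lanes[8];  } ref_vuint16x8; /* unsigned, 16 bits x 8  = 128 bits */
typedef struct { uint32_t lanes[8];  } ref_vuint32x8; /* unsigned, 32 bits x 8  = 256 bits */

/* total vector width in bits = bits per integer * number of integers */
static size_t ref_vector_bits(size_t bits, size_t size)
{
	return bits * size;
}
```

so a name such as vuint32x8 reads as "unsigned, 32 bits per integer, 8 integers",
which is a 256-bit vector.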

to initialize vec, you MUST call `vec_init()' when your program starts up.

note that `vec_init()' is NOT thread-safe, and things can and will
blow up if you call it simultaneously from different threads (e.g. if you
try to initialize it lazily, only when you need it... please just initialize
it on startup so you don't have to worry about that!!!)

every vector type has many operations, each prefixed with the name of the
type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'
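
to make the behavior concrete, here is a scalar model of splat on an
array-backed stand-in type. the `ref_' names and the struct are hypothetical
and exist only for this sketch; with vec itself you would call
vint8x16_splat() directly (after vec_init()).

```c
#include <stdint.h>

/* hypothetical array-backed stand-in for a 16-lane signed 8-bit vector */
typedef struct { int8_t lanes[16]; } ref_vint8x16;

/* scalar model of splat: every lane receives the same value */
static ref_vint8x16 ref_vint8x16_splat(int8_t x)
{
	ref_vint8x16 v;
	int i;

	for (i = 0; i < 16; i++)
		v.lanes[i] = x;

	return v;
}
```

the function name is the type name, an underscore, and the operation, which
is the same pattern the rest of the operations follow.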

the currently supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector with all of the values filled with
		the value of `x'

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory address stored at `x';
		the address is NOT required to be aligned

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory address
		stored at `x'

		like with load(), this does not require address alignment

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the
		result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides vec1 by the values in vec2. dividing by zero is
		considered defined behavior and should result in a zero;
		if this doesn't happen it's considered a bug

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors
		i.e., div(add(vec1, vec2), splat(2))
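
a few of the rules above are easier to see in scalar form. the sketch below
models unaligned load/store, division by zero, and the two right shifts on a
hypothetical array-backed type. all `ref_' names are made up for this example,
the shift models assume the shift count is smaller than the lane width, and
none of this is vec's actual implementation.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical array-backed stand-in for an 8-lane signed 16-bit vector */
typedef struct { int16_t lanes[8]; } ref_vint16x8;

/* load: memcpy places no alignment requirement on the source address */
static ref_vint16x8 ref_vint16x8_load(const void *x)
{
	ref_vint16x8 v;
	memcpy(v.lanes, x, sizeof v.lanes);
	return v;
}

/* store: likewise, the destination may be unaligned */
static void ref_vint16x8_store(ref_vint16x8 v, void *x)
{
	memcpy(x, v.lanes, sizeof v.lanes);
}

/* div, one lane: a zero divisor yields zero instead of trapping */
static int16_t ref_div_lane(int16_t a, int16_t b)
{
	return (b == 0) ? 0 : (int16_t)(a / b);
}

/* rshift, one lane: arithmetic shift, the sign bit is replicated.
 * written so it does not rely on how the compiler shifts negative
 * values; assumes n < 16 */
static int16_t ref_rshift_lane(int16_t a, uint16_t n)
{
	return (int16_t)((a < 0) ? ~(~a >> n) : (a >> n));
}

/* lrshift, one lane: logical shift, zeros come in from the top;
 * assumes n < 16 */
static int16_t ref_lrshift_lane(int16_t a, uint16_t n)
{
	uint16_t bits, shifted;
	int16_t out;

	memcpy(&bits, &a, sizeof bits);      /* reinterpret as unsigned */
	shifted = (uint16_t)(bits >> n);
	memcpy(&out, &shifted, sizeof out);  /* reinterpret back */
	return out;
}
```

for a negative lane the two right shifts differ: shifting -16 right by 2
gives -4 arithmetically, but 16380 (0x3FFC) logically. for non-negative
lanes they agree.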

there are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
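
because a comparison result is either all ones or all zeros per value, it can
be combined with the bitwise operations to pick values without branching. the
sketch below models cmplt on a hypothetical array-backed type and uses the
mask to build a lane-wise minimum. the `ref_' names are assumptions made for
this example and are not part of vec.

```c
#include <stdint.h>

/* hypothetical array-backed stand-in for a 4-lane unsigned 32-bit vector */
typedef struct { uint32_t lanes[4]; } ref_vuint32x4;

/* scalar model of cmplt: all bits on where a < b, all bits off elsewhere */
static ref_vuint32x4 ref_vuint32x4_cmplt(ref_vuint32x4 a, ref_vuint32x4 b)
{
	ref_vuint32x4 r;
	int i;

	for (i = 0; i < 4; i++)
		r.lanes[i] = (a.lanes[i] < b.lanes[i]) ? UINT32_MAX : 0;

	return r;
}

/* lane-wise minimum built from the mask: (a & mask) | (b & ~mask) */
static ref_vuint32x4 ref_vuint32x4_min(ref_vuint32x4 a, ref_vuint32x4 b)
{
	ref_vuint32x4 m = ref_vuint32x4_cmplt(a, b);
	ref_vuint32x4 r;
	int i;

	for (i = 0; i < 4; i++)
		r.lanes[i] = (a.lanes[i] & m.lanes[i]) | (b.lanes[i] & ~m.lanes[i]);

	return r;
}
```

with the operations listed above, the same selection could be written as
or(and(a, mask), and(b, xor(mask, splat(all ones)))), since xor against all
ones inverts the mask.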