view README @ 27:d00b95f95dd1 default tip

impl/arm/neon: it compiles again, but is untested
author Paper <paper@tflc.us>
date Mon, 25 Nov 2024 00:33:02 -0500 (8 weeks ago)
parents e26874655738
vec - a tiny SIMD vector library written in C99

it comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

	v[sign]int[bits]x[size]
		where `sign' is either nothing (for signed) or `u' (for unsigned),
		`bits' is the bit size of the integer format,
		and `size' is how many integers are in the vector
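
as a rough illustration of the naming scheme, the sketch below builds a
type name from its three parts and computes the total width of the vector.
`type_name' and `type_bits' are made-up helper names for this example and
are not part of vec.

```c
#include <stddef.h>
#include <stdio.h>

/* hypothetical helper, not part of vec: writes the type name for the
 * given sign, element width and element count into buf,
 * e.g. (1, 16, 8) -> "vuint16x8". returns whatever snprintf returns. */
int type_name(char *buf, size_t len, int is_unsigned, int bits, int size)
{
	return snprintf(buf, len, "v%sint%dx%d", is_unsigned ? "u" : "", bits, size);
}

/* hypothetical helper, not part of vec: total width of the vector in
 * bits, i.e. element width times element count */
int type_bits(int bits, int size)
{
	return bits * size;
}
```

with these helpers, (0, 8, 16) gives `vint8x16', which is a 128-bit vector.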

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not.

to initialize vec, you MUST call `vec_init()' when your program starts up.

note that `vec_init()' is NOT thread-safe, and things can and will
blow up if you call it simultaneously from different threads (e.g. if
you try to initialize it lazily, only when you first need it). please
just initialize it on startup so you don't have to worry about that!!!

all of the vector types have many operations that are prefixed with the
name of the type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'
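
to give an idea of what an array-based fallback for such a type could look
like, here is a minimal sketch of splat, load and store on a plain struct.
this is NOT vec's actual code: the `sketch_' names and the struct layout
are assumptions made for this example only.

```c
#include <stdint.h>
#include <string.h>

/* stand-in for an array-based vint8x16; the real vec type may be
 * laid out differently */
typedef struct {
	int8_t lanes[16];
} sketch_int8x16;

/* splat: every lane receives the same value */
sketch_int8x16 sketch_int8x16_splat(int8_t x)
{
	sketch_int8x16 v;
	int i;

	for (i = 0; i < 16; i++)
		v.lanes[i] = x;
	return v;
}

/* load: memcpy copies byte by byte, so nothing is assumed about the
 * alignment of `x' */
sketch_int8x16 sketch_int8x16_load(const int8_t x[16])
{
	sketch_int8x16 v;

	memcpy(v.lanes, x, sizeof(v.lanes));
	return v;
}

/* store: copies all 16 lanes back out to memory */
void sketch_int8x16_store(sketch_int8x16 v, int8_t x[16])
{
	memcpy(x, v.lanes, sizeof(v.lanes));
}
```

a load followed by a store should hand back the same 16 values that went in.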

the currently supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector with all of the values filled with
		the value of `x'

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory pointed to by `x';
		the address is NOT required to be aligned

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory pointed
		to by `x'; like with load(), the address is NOT required
		to be aligned

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides the values in vec1 by the corresponding values in
		vec2. dividing by zero is considered defined behavior and
		should result in a zero; if this doesn't happen it's
		considered a bug

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors
		i.e., div(add(vec1, vec2), splat(2))
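
two of the rules above are easy to get wrong, so here is a scalar sketch
of what they mean for a single value in a vector: division by zero giving
zero, and the difference between rshift (arithmetic) and lrshift (logical).
the `sketch_' functions are made up for this example and are not how vec
implements anything; they also assume a shift count below 32, since this
README does not say what larger counts do.

```c
#include <stdint.h>

/* div model for one unsigned value: a zero divisor gives zero
 * instead of trapping */
uint32_t sketch_div_lane(uint32_t a, uint32_t b)
{
	return (b != 0) ? (a / b) : 0;
}

/* rshift model for one signed value: copies of the sign bit are
 * shifted in from the left. written without `>>' on a negative
 * value, which is implementation-defined in C. assumes n < 32. */
int32_t sketch_rshift_lane(int32_t x, uint32_t n)
{
	uint32_t ux = (uint32_t)x;

	if (x < 0)
		return (int32_t)~(~ux >> n);
	return (int32_t)(ux >> n);
}

/* lrshift model for one signed value: zeros are shifted in from the
 * left no matter what the sign bit is. assumes n < 32. */
int32_t sketch_lrshift_lane(int32_t x, uint32_t n)
{
	return (int32_t)((uint32_t)x >> n);
}
```

for example, shifting -8 right by 1 gives -4 with the arithmetic model
and 2147483644 with the logical one.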

there are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
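
the all-bits-on / all-bits-off results are meant to be used as masks with
the bitwise operations. as a sketch of the idea, the scalar model below
builds a cmplt-style mask and uses it to pick the smaller value without
branching. the `sketch_' functions and the fixed size of four 32-bit
values are assumptions for this example, not vec's API.

```c
#include <stdint.h>

#define LANES 4

/* cmplt model: all bits on (0xFFFFFFFF) where a < b, all bits off
 * otherwise */
void sketch_cmplt(const uint32_t a[LANES], const uint32_t b[LANES],
                  uint32_t mask[LANES])
{
	int i;

	for (i = 0; i < LANES; i++)
		mask[i] = (a[i] < b[i]) ? UINT32_MAX : 0;
}

/* minimum of each pair of values, selected through the mask: where
 * the mask is all ones keep a, everywhere else keep b. the `~mask'
 * here is the same as an xor of the mask with all ones. */
void sketch_min(const uint32_t a[LANES], const uint32_t b[LANES],
                uint32_t out[LANES])
{
	uint32_t mask[LANES];
	int i;

	sketch_cmplt(a, b, mask);
	for (i = 0; i < LANES; i++)
		out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
}
```

the same select pattern works with any of the comparisons above, since
they all produce masks of the same shape.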