view README @ 21:697b9ba1c1de

impl/ppc/altivec: implement comparison functions had to prune some up eh, vec_cmple/vec_cmpge is only available in VSX
author Paper <paper@tflc.us>
date Thu, 21 Nov 2024 21:55:20 +0000

vec - a tiny SIMD vector header-only library written in C99

it comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

	v[sign][bits]x[size]
		where `sign' is either nothing (for signed) or `u' (for unsigned),
		`bits' is the bit size of the integer format,
		and `size' is how many integers are in the vector

vec provides 64-bit, 128-bit, 256-bit, and 512-bit vector types. these map
to SIMD intrinsics on processors where vec has an implementation, and fall
back to array-based implementations on processors where it does not.
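to make the naming scheme and the array-based fallback more concrete, here's a minimal sketch of what such a fallback type might look like. the `ref_' names are hypothetical stand-ins made up for this example, not vec's real definitions. note how `bits' times `size' gives the vector width (8 x 16 = 32 x 4 = 128 bits). lane-wise wrapping on overflow is an assumption of this sketch; the README doesn't specify it.

```c
#include <stdint.h>

/* hypothetical array-based stand-ins, NOT vec's real type definitions */

/* `u' + 8 bits + x16 lanes: 8 * 16 = 128 bits */
typedef struct {
	uint8_t lane[16];
} ref_vuint8x16;

/* (signed) + 32 bits + x4 lanes: 32 * 4 = 128 bits */
typedef struct {
	int32_t lane[4];
} ref_vint32x4;

/* a fallback operation is just a loop over the lanes; in this sketch
 * unsigned addition wraps modulo 2^8 within each lane */
static ref_vuint8x16 ref_vuint8x16_add(ref_vuint8x16 a, ref_vuint8x16 b)
{
	ref_vuint8x16 r;
	int i;

	for (i = 0; i < 16; i++)
		r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);

	return r;
}
```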

all of these have many operations that are prefixed with the name of the
type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'
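as a rough lane-by-lane model of what splat does, here's a scalar sketch. `ref_vint8x16' and `ref_vint8x16_splat' are hypothetical names used only for illustration; the real vint8x16 is backed by SIMD registers wherever vec has an implementation.

```c
#include <stdint.h>

/* hypothetical scalar stand-in for vint8x16: 16 signed 8-bit lanes */
typedef struct {
	int8_t lane[16];
} ref_vint8x16;

/* fill every lane with the same value `x' */
static ref_vint8x16 ref_vint8x16_splat(int8_t x)
{
	ref_vint8x16 r;
	int i;

	for (i = 0; i < 16; i++)
		r.lane[i] = x;

	return r;
}
```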

the currently supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector with all of its values set to
		the value of `x'

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory address stored at `x';
		the address is NOT required to be aligned

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory address
		stored at `x'

		like with load(), this does not require address alignment

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides the values in `vec1' by the corresponding values
		in `vec2'. dividing by zero is defined behavior and should
		result in zero; if it doesn't, that's considered a bug

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors
		i.e., div(add(vec1, vec2), splat(2))
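since load() and store() don't require the address to be aligned to the vector size, memcpy() is a natural portable model for them: it copies the bytes without assuming any particular alignment. `ref_vuint32x4' and its functions are hypothetical, written only for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical stand-in for vuint32x4: four unsigned 32-bit lanes */
typedef struct {
	uint32_t lane[4];
} ref_vuint32x4;

/* memcpy() only needs `x' to be a valid uint32_t array; it does not
 * need the 16-byte alignment an aligned vector load would */
static ref_vuint32x4 ref_vuint32x4_load(const uint32_t x[4])
{
	ref_vuint32x4 r;

	memcpy(r.lane, x, sizeof(r.lane));
	return r;
}

/* same idea in the other direction */
static void ref_vuint32x4_store(ref_vuint32x4 vec, uint32_t x[4])
{
	memcpy(x, vec.lane, sizeof(vec.lane));
}
```

on x86, SSE2's `_mm_loadu_si128' and `_mm_storeu_si128' are the unaligned load and store intrinsics an implementation could use for 128-bit vectors.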
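integer division by zero is undefined behavior in plain C, so a fallback that wants the documented "dividing by zero gives zero" result has to check for it explicitly. a minimal per-lane sketch, using a hypothetical eight-lane uint16_t helper on plain arrays:

```c
#include <stdint.h>

/* lane-by-lane model of div() for a hypothetical 8 x uint16_t vector.
 * a zero divisor is checked explicitly and produces 0 instead of
 * invoking undefined behavior */
static void ref_vuint16x8_div(const uint16_t a[8], const uint16_t b[8],
                              uint16_t out[8])
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = b[i] ? (uint16_t)(a[i] / b[i]) : 0;
}
```

x86 SSE2, for instance, has no packed integer divide instruction, so a per-lane loop like this is one plausible way to provide div() there.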
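the difference between the three shifts is easiest to see on a single 8-bit lane treated as a raw bit pattern. the helpers below are hypothetical per-lane models, not vec functions. working on uint8_t sidesteps the fact that right-shifting a negative signed value in C is implementation-defined. this sketch assumes shift counts of 0 through 7, since the README doesn't say what happens for larger ones.

```c
#include <stdint.h>

/* logical right shift: vacated high bits are filled with zeros */
static uint8_t ref_lrshift8(uint8_t x, unsigned s)
{
	return (uint8_t)(x >> s);
}

/* arithmetic right shift: vacated high bits copy the sign bit.
 * for a negative pattern, shift the complement and complement back */
static uint8_t ref_rshift8(uint8_t x, unsigned s)
{
	if (x & 0x80u)
		return (uint8_t)~((uint8_t)~x >> s);
	return (uint8_t)(x >> s);
}

/* left shift: zeros enter from the right; arithmetic and logical
 * left shifts produce the same bits */
static uint8_t ref_lshift8(uint8_t x, unsigned s)
{
	return (uint8_t)(x << s);
}
```

for example, 0xF0 (-16 as a signed lane) shifted right by 2 gives 0xFC (-4) with rshift but 0x3C (60) with lrshift.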
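adding first and then halving is the obvious definition of an average, but inside a fixed-width lane the add can wrap: in an 8-bit unsigned lane, 200 + 100 wraps to 44, and halving that gives 22 instead of 150. one overflow-free way to compute the rounded-down average of two unsigned lanes is (a & b) + ((a ^ b) >> 1). the helper name is hypothetical, and rounding down is an assumption: the README doesn't say how vec rounds, and SSE2's `_mm_avg_epu8', for instance, rounds up.

```c
#include <stdint.h>

/* rounded-down average of two unsigned 8-bit lanes without widening:
 * a & b keeps the bits both values share (they count twice, so they
 * stay whole), and (a ^ b) >> 1 adds half of the bits only one has */
static uint8_t ref_avg_u8(uint8_t a, uint8_t b)
{
	return (uint8_t)((a & b) + ((a ^ b) >> 1));
}
```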

there are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
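because every lane of a comparison result is either all ones or all zeros, the result works directly as a bit mask for and(), or() and xor(). the sketch below models a single lane with hypothetical helpers: `ref_cmplt_i16' builds the mask, and `ref_min_i16' uses it to pick the smaller of two values without a branch.

```c
#include <stdint.h>

/* one lane of cmplt(): all 16 bits on when a < b, else all off */
static uint16_t ref_cmplt_i16(int16_t a, int16_t b)
{
	return (a < b) ? 0xFFFFu : 0x0000u;
}

/* branch-free select using only AND, OR and XOR on the bit patterns:
 * (a & m) | (b & ~m), with ~m written as m ^ 0xFFFF */
static int16_t ref_min_i16(int16_t a, int16_t b)
{
	uint16_t m  = ref_cmplt_i16(a, b);
	uint16_t ua = (uint16_t)a;
	uint16_t ub = (uint16_t)b;

	return (int16_t)((ua & m) | (ub & (uint16_t)(m ^ 0xFFFFu)));
}
```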
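a target that lacks a native less-than-or-equal comparison can still provide cmple() and cmpge(): a <= b holds exactly when a > b does not, so inverting the cmpgt mask (xor with all ones) gives the cmple mask, and cmpge follows from cmplt the same way. the per-lane helpers below are hypothetical models; nothing here says this is the derivation vec itself uses.

```c
#include <stdint.h>

/* hypothetical one-lane models on signed 32-bit lanes */
static uint32_t ref_cmpgt_i32(int32_t a, int32_t b)
{
	return (a > b) ? 0xFFFFFFFFu : 0u;
}

static uint32_t ref_cmplt_i32(int32_t a, int32_t b)
{
	return (a < b) ? 0xFFFFFFFFu : 0u;
}

/* a <= b  <=>  !(a > b): invert the cmpgt mask with XOR */
static uint32_t ref_cmple_i32(int32_t a, int32_t b)
{
	return ref_cmpgt_i32(a, b) ^ 0xFFFFFFFFu;
}

/* a >= b  <=>  !(a < b): invert the cmplt mask with XOR */
static uint32_t ref_cmpge_i32(int32_t a, int32_t b)
{
	return ref_cmplt_i32(a, b) ^ 0xFFFFFFFFu;
}
```

an equivalent form is or(cmplt(a, b), cmpeq(a, b)), but that costs two comparisons instead of one comparison plus an xor.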

to initialize vec, you MUST call `vec_init()' when your program starts up.

note that `vec_init()' is NOT thread-safe, and things can and will
blow up if you call it simultaneously from different threads (e.g. if
you try to initialize it lazily, only when you first need it... please
just initialize it on startup so you don't have to worry about that!!!)