view README @ 11:13575ba795d3

impl/gcc: add native 256-bit and 512-bit intrinsics. These are simple to implement. At some point I'd like to refactor vec to use a union and to detect AVX512 and friends at compile time, so that the processors that *can* use them are enabled at runtime. This would mean adding a vec_init function, which isn't a big deal: it can just be run at startup and will grab the CPU flags we need.
author Paper <paper@tflc.us>
date Mon, 18 Nov 2024 16:12:24 -0500

vec - a tiny SIMD vector header-only library written in C99

it comes with an extremely basic (and somewhat lacking) API,
where there are eight supported vector types, all 128-bit:

	vint8x16  - 16 signed 8-bit integers
	vint16x8  - 8 signed 16-bit integers
	vint32x4  - 4 signed 32-bit integers
	vint64x2  - 2 signed 64-bit integers
	vuint8x16 - 16 unsigned 8-bit integers
	vuint16x8 - 8 unsigned 16-bit integers
	vuint32x4 - 4 unsigned 32-bit integers
	vuint64x2 - 2 unsigned 64-bit integers

all of these have many operations that are prefixed with the
name of the type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'
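
as a rough illustration of the naming scheme, here is a plain C99
stand-in for one type and its splat operation. `model_vint8x16' and
`model_vint8x16_splat' are hypothetical names made up for this sketch;
it only mirrors the documented behaviour and is not how the library
itself is implemented:

```c
#include <stdint.h>

/* hypothetical stand-in for vint8x16: 16 signed 8-bit lanes */
typedef struct {
	int8_t lane[16];
} model_vint8x16;

/* <type name>_<operation>: fill every lane with the value of `x' */
model_vint8x16 model_vint8x16_splat(int8_t x)
{
	model_vint8x16 v;
	int i;

	for (i = 0; i < 16; i++)
		v.lane[i] = x;

	return v;
}
```

the other seven types would follow the same pattern, with the lane
type and lane count taken from the type name.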

the current supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector where all of the values are filled with
		the value of `x'

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory address stored at `x';
		the address is NOT required to be aligned

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory address
		stored at `x'

		like with load(), this does not require address alignment

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides vec1 by the values in vec2. dividing by zero is
		considered defined behavior and should result in a zero;
		if this doesn't happen it's considered a bug

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors
		i.e., div(add(vec1, vec2), splat(2))
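
a few of the operations above have edge cases that are easier to see
in scalar form. the sketch below restates them for a single signed
32-bit lane in plain C99. the `model_' names are hypothetical and are
not part of the library. two things are assumptions of this sketch and
not stated above: INT32_MIN / -1 is made to wrap, and avg truncates
toward zero. shift counts are assumed to be smaller than the lane
width:

```c
#include <stdint.h>

/* div: a zero divisor yields zero instead of trapping */
int32_t model_div(int32_t a, int32_t b)
{
	if (b == 0)
		return 0;
	/* assumption: the one overflowing quotient wraps to INT32_MIN */
	if (a == INT32_MIN && b == -1)
		return INT32_MIN;
	return a / b;
}

/* rshift: arithmetic shift, vacated high bits copy the sign bit.
 * `>>' on a negative value is implementation-defined in C, so the
 * complement (which is never negative) is shifted instead.
 * requires n < 32 */
int32_t model_rshift(int32_t x, unsigned n)
{
	return (x < 0) ? ~(~x >> n) : (x >> n);
}

/* lrshift: logical shift, vacated high bits are filled with zeros.
 * requires n < 32 */
int32_t model_lrshift(int32_t x, unsigned n)
{
	if (n == 0)
		return x;
	return (int32_t)((uint32_t)x >> n);
}

/* avg: the sum is widened to 64 bits so that it cannot overflow.
 * assumption: the result truncates toward zero */
int32_t model_avg(int32_t a, int32_t b)
{
	return (int32_t)(((int64_t)a + (int64_t)b) / 2);
}
```

for example, with a lane value of -8 and a shift count of 1, the
arithmetic shift gives -4 while the logical shift gives 0x7FFFFFFC,
since the sign bit is replaced by a zero.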

there are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.