vec - a tiny SIMD vector header-only library written in C99

- Be prepared! Are you sure you want to know? :-)

------------------------------------------------------------------------------
THE VECTOR API
------------------------------------------------------------------------------
vec comes with an extremely basic API that is similar to other intrinsics
libraries; each type is in the exact same format:

	v[sign][bits]x[size]
		where `sign' is either nothing (for signed) or `u' (for unsigned),
		`bits' is the bit size of the integer format,
		and `size' is how many integers are in the vector

vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
on processors where vec has an implementation, and falls back to array-based
implementations where it does not. For example, creating a 256-bit vector
on powerpc would simply create two consecutive 128-bit vectors.
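
As a rough illustration of what an array-based fallback could look like,
here is a minimal sketch in plain C99. Every `sketch_' name is hypothetical
and not part of vec, and the real implementation may well differ; the point
is only to show a 256-bit type being built out of two 128-bit halves.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector of four uint32_t lanes.
 * This is NOT vec's actual type definition. */
typedef struct {
	uint32_t lanes[4];
} sketch_uint32x4;

/* A 256-bit vector built from two consecutive 128-bit halves. */
typedef struct {
	sketch_uint32x4 half[2];
} sketch_uint32x8;

/* Fill every lane with the same value. */
static sketch_uint32x4 sketch_uint32x4_splat(uint32_t x)
{
	sketch_uint32x4 v;
	int i;

	for (i = 0; i < 4; i++)
		v.lanes[i] = x;
	return v;
}

/* Copy four values out of memory; memcpy has no alignment requirement. */
static sketch_uint32x4 sketch_uint32x4_load(const uint32_t x[4])
{
	sketch_uint32x4 v;

	memcpy(v.lanes, x, sizeof(v.lanes));
	return v;
}

/* Copy the four lanes back out to memory. */
static void sketch_uint32x4_store(sketch_uint32x4 v, uint32_t x[4])
{
	memcpy(x, v.lanes, sizeof(v.lanes));
}

/* Lane-wise addition; unsigned arithmetic wraps around on overflow. */
static sketch_uint32x4 sketch_uint32x4_add(sketch_uint32x4 a, sketch_uint32x4 b)
{
	sketch_uint32x4 r;
	int i;

	for (i = 0; i < 4; i++)
		r.lanes[i] = a.lanes[i] + b.lanes[i];
	return r;
}

/* The 256-bit operation simply forwards to the 128-bit one twice. */
static sketch_uint32x8 sketch_uint32x8_add(sketch_uint32x8 a, sketch_uint32x8 b)
{
	sketch_uint32x8 r;

	r.half[0] = sketch_uint32x4_add(a.half[0], b.half[0]);
	r.half[1] = sketch_uint32x4_add(a.half[1], b.half[1]);
	return r;
}
```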

All of these have many operations that are prefixed with the name of the
type and an underscore, for example:

	vint8x16 vint8x16_splat(int8_t x)
	- creates a vint8x16 where all of the values are filled
	  with the value of `x'

The currently supported operations are:

	v[u]intAxB splat([u]intA_t x)
		creates a vector with all of the values filled with
		the value of `x'

	v[u]intAxB load(const [u]intA_t x[B])
		copies the values from the memory that `x' points to;
		the address is NOT required to be aligned

	v[u]intAxB load_aligned(const [u]intA_t x[B])
		like `load', but the address is required to be aligned,
		which can cause some speed improvements if done correctly.

	void store(v[u]intAxB vec, [u]intA_t x[B])
		copies the values from the vector into the memory that
		`x' points to.
		as with load(), the address is not required to be aligned

	void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
		like `store', but the address is required to be aligned,
		which can cause some speed improvements if done correctly.

	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
		adds the values of `vec1' and `vec2' and returns the result

	v[u]intAxB sub(v[u]intAxB vec1, v[u]intAxB vec2)
		subtracts the values of `vec2' from `vec1' and returns the result

	v[u]intAxB mul(v[u]intAxB vec1, v[u]intAxB vec2)
		multiplies the values of `vec1' and `vec2' together and
		returns the result

	v[u]intAxB div(v[u]intAxB vec1, v[u]intAxB vec2)
		divides vec1 by the values in vec2. dividing by zero is
		considered defined behavior and should result in a zero;
		if this doesn't happen it's considered a bug

	v[u]intAxB and(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise AND (&) of the values in both vectors

	v[u]intAxB or(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise OR (|) of the values in both vectors

	v[u]intAxB xor(v[u]intAxB vec1, v[u]intAxB vec2)
		bitwise XOR (^) of the values in both vectors

	v[u]intAxB rshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lshift(v[u]intAxB vec1, vuintAxB vec2)
		arithmetic left shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB lrshift(v[u]intAxB vec1, vuintAxB vec2)
		logical right shift of the values in vec1 by
		the corresponding values in vec2

	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the average of the values in both vectors
		i.e., div(add(vec1, vec2), splat(2)), without
		the possibility of overflow.

	v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the minimum of the values in both vectors

	v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
		returns the maximum of the values in both vectors
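
A few of the operations above have semantics that are easy to get wrong, so
here is a sketch of what a single lane is described as computing, written as
plain scalar C99. The `model_' names are hypothetical and not part of vec,
and the shift models assume the shift count is smaller than the lane width,
since nothing above says what larger counts do. The `avg' model follows the
div(add(vec1, vec2), splat(2)) description literally, so it rounds down;
whether vec rounds the same way is an assumption.

```c
#include <stdint.h>

/* div: dividing by zero yields zero instead of trapping. */
static uint8_t model_div_u8(uint8_t a, uint8_t b)
{
	return b ? (uint8_t)(a / b) : 0;
}

/* rshift: arithmetic right shift; copies of the sign bit come in from
 * the left. Written so it never right-shifts a negative value, which
 * is implementation-defined in C. Assumes n < 8. */
static int8_t model_rshift_i8(int8_t x, uint8_t n)
{
	return (int8_t)(x < 0 ? ~(~x >> n) : x >> n);
}

/* lrshift: logical right shift; zeros come in from the left, so the
 * shift is done on the unsigned bit pattern. Assumes n < 8. */
static int8_t model_lrshift_i8(int8_t x, uint8_t n)
{
	if (n == 0)
		return x; /* nothing to shift; keeps negative values intact */
	return (int8_t)((uint8_t)x >> n);
}

/* avg: (a + b) / 2 computed without needing a ninth bit. The shared
 * bits (a & b) count once in full, the differing bits (a ^ b) count
 * for half. */
static uint8_t model_avg_u8(uint8_t a, uint8_t b)
{
	return (uint8_t)((a & b) + ((a ^ b) >> 1));
}
```

For instance, averaging 200 and 100 in an 8-bit lane through a naive add
would first wrap 300 around to 44 and then halve it to 22, whereas the
overflow-free form gives 150.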

There are also a number of comparisons possible:

	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpgt(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmpeq(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is equal
		to the corresponding value in `vec2', else all
		of the bits are turned off.

	v[u]intAxB cmple(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is less
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.

	v[u]intAxB cmpge(v[u]intAxB vec1, v[u]intAxB vec2)
		turns on all bits of the corresponding value in
		the result vector if the value in `vec1' is greater
		than or equal to the corresponding value in `vec2',
		else all of the bits are turned off.
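
The all-bits-on/all-bits-off result is what makes the comparisons useful for
branch-free selection when combined with the bitwise operations. Below is a
scalar sketch of one lane, in plain C99; the `model_' names are hypothetical
and not part of vec.

```c
#include <stdint.h>

/* One lane of cmplt: all bits on if a < b, otherwise all bits off. */
static uint16_t model_cmplt_u16(uint16_t a, uint16_t b)
{
	return (a < b) ? UINT16_MAX : 0;
}

/* Branch-free select: take the bits of `a' where the mask is on,
 * and the bits of `b' where the mask is off. */
static uint16_t model_select_u16(uint16_t mask, uint16_t a, uint16_t b)
{
	return (uint16_t)((a & mask) | (b & (uint16_t)~mask));
}

/* min rebuilt from a comparison plus bitwise operations. */
static uint16_t model_min_u16(uint16_t a, uint16_t b)
{
	return model_select_u16(model_cmplt_u16(a, b), a, b);
}
```

No bitwise NOT is listed among the operations above, but inverting a mask is
the same as XORing it with a vector whose bits are all on.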

This API will most definitely have more operations available as they are
requested (and as they are needed). Patches are accepted and encouraged!

------------------------------------------------------------------------------
USING VEC
------------------------------------------------------------------------------
To use vec, simply include `vec/vec.h' in your program. If you would like
your program to also be able to run on older systems, you can create
multiple translation units, pass different command line arguments to the
compiler for each one to enable SSE2, AVX2, AltiVec, etc., and detect the
vector modes the CPU supports at runtime. vec provides an optional public
API specifically for this use case within `vec/cpu.h'; bear in mind though
that it is not thread-safe, so if your program is multithreaded you'll want
to cache the results on startup.

The CPU vector detection API is extremely simple, and self-explanatory.
You call `vec_get_CPU_features()', and it returns a bit-mask of the
values within the enum placed above the function definition. From there,
you can test for each value specifically.
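
One way to arrange the "detect once, cache, then dispatch" pattern is
sketched below. Because the enum values in `vec/cpu.h' are not listed here,
the sketch uses a hard-coded stand-in for `vec_get_CPU_features()' and a
made-up feature bit; every `sketch_' and `SKETCH_' name is hypothetical. In
a real program the two add routines would live in separate translation
units compiled with different compiler flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit; the real names are in the enum in vec/cpu.h. */
enum {
	SKETCH_CPU_HAS_WIDE = 1 << 0
};

/* Stand-in for vec_get_CPU_features(). Hard-coded so that the sketch
 * is self-contained: it pretends no wide vector mode is available. */
static uint32_t sketch_get_cpu_features(void)
{
	return 0;
}

typedef void (*sketch_add_fn)(const uint32_t *a, const uint32_t *b,
	uint32_t *out, size_t n);

/* Baseline version, safe on every CPU. */
static void sketch_add_generic(const uint32_t *a, const uint32_t *b,
	uint32_t *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = a[i] + b[i];
}

/* Placeholder for the version built with wider vectors enabled;
 * here it is the same loop so that the sketch stays portable. */
static void sketch_add_wide(const uint32_t *a, const uint32_t *b,
	uint32_t *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = a[i] + b[i];
}

/* Cached choice; set once by sketch_init(). */
static sketch_add_fn sketch_add = NULL;

/* Call once at startup, before any other threads exist, so that the
 * non-thread-safe detection runs exactly one time. */
static void sketch_init(void)
{
	uint32_t features = sketch_get_cpu_features();

	sketch_add = (features & SKETCH_CPU_HAS_WIDE)
		? sketch_add_wide : sketch_add_generic;
}
```

After `sketch_init()' has run, worker threads only ever read the cached
function pointer and never call the detection routine themselves.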

vec should work perfectly fine with C++, though it is not tested as
thoroughly as C is. Your mileage may vary. You should probably be using
a library more tailored towards C++ such as Highway[1] or std::simd.

[1]: https://google.github.io/highway/en/master/

------------------------------------------------------------------------------
MEMORY ALLOCATION
------------------------------------------------------------------------------
vec allows for stack-based and heap-based aligned array allocation. The
stack-based API is simple, and goes along the lines of this:

	VINT16x32_ALIGNED_ARRAY(arr);

	/* arr is now either an array type or a pointer type, depending on whether
	 * the compiler supports the alignas specifier from C11 or later, or has
	 * its own extension to align arrays. vec will fall back to manual pointer
	 * alignment if the compiler does not support it. */

	/* this macro returns the full size of the array in bytes */
	int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);

	/* this macro returns the length of the array
	 * (basically a synonym for sizeof(arr)/sizeof(arr[0])) */
	int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);

	/* no need to free the aligned array -- it is always on the stack */
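
The "manual pointer alignment" fallback mentioned above can be pictured
roughly as follows: reserve a slightly oversized buffer on the stack and
round a pointer into it up to the next aligned address. This is a sketch of
the general technique in plain C99, not the actual expansion of vec's
macros; `SKETCH_ALIGNMENT' and the `sketch_' functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGNMENT 32 /* hypothetical; must be a power of two */

/* Round `p' up to the next multiple of SKETCH_ALIGNMENT. */
static void *sketch_align_up(void *p)
{
	uintptr_t addr = (uintptr_t)p;

	addr = (addr + (SKETCH_ALIGNMENT - 1))
		& ~(uintptr_t)(SKETCH_ALIGNMENT - 1);
	return (void *)addr;
}

/* Fill an aligned 32-element view of a stack buffer with 0..31 and
 * return the sum of its elements. */
static int sketch_demo(void)
{
	/* 32 usable elements plus enough spare room to absorb the
	 * worst-case padding in front of the aligned address */
	int16_t raw[32 + SKETCH_ALIGNMENT / sizeof(int16_t)];
	int16_t *arr = sketch_align_up(raw);
	int i, sum = 0;

	for (i = 0; i < 32; i++)
		arr[i] = (int16_t)i;
	for (i = 0; i < 32; i++)
		sum += arr[i];
	return sum;
}
```

This also shows why `arr' may end up being a pointer rather than an array
in that case, and why the SIZEOF and LENGTH macros exist instead of relying
on plain sizeof.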

The heap-based API is based off the good old C malloc API:

	/* heap allocation stuff is only defined here: */
	#include "vec/mem.h"

	vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));

	/* q is now aligned, and ready for use with a vector aligned load
	 * function. */
	vint32x16_load_aligned(q);

	/* Say we want to reallocate the memory with a different size.
	 * No problem there! */
	q = vec_realloc(q, 2048 * sizeof(vec_int32));

	/* In a real world program, you'll want to check that vec_malloc
	 * and vec_realloc do not fail, but this error checking has been
	 * withheld from this example, as it is the same as for regular
	 * malloc and realloc. */

	vec_free(q);

	/* If you need it to be initialized, we have you covered: */
	q = vec_calloc(1024, sizeof(vec_int32));

	/* vec_calloc forwards to the real calloc, so there is no overhead of
	 * calling memset or something similar. */

	vec_free(q);
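
The error checking left out of the example above could look something like
the sketch below. It is written against plain `realloc' so that it stays
self-contained; with vec you would call `vec_realloc' instead, assuming it
follows the same return convention as the comment above says. The
`sketch_resize' helper is hypothetical and not part of vec.

```c
#include <stdint.h>
#include <stdlib.h>

/* Resize *buf so that it can hold `count' int32_t values.
 * Returns 0 on success and -1 on failure. On failure *buf is left
 * untouched, so the old block is neither leaked nor lost and the
 * caller still has to free it. */
static int sketch_resize(int32_t **buf, size_t count)
{
	int32_t *tmp;

	/* refuse sizes whose byte count would overflow size_t */
	if (count > SIZE_MAX / sizeof(int32_t))
		return -1;

	/* never assign the result straight back to *buf: if the call
	 * fails it returns NULL and the only pointer would be gone */
	tmp = realloc(*buf, count * sizeof(int32_t));
	if (!tmp)
		return -1;

	*buf = tmp;
	return 0;
}
```

Passing a pointer to a NULL pointer makes the first call behave like a
plain allocation, so one helper covers both the initial allocation and the
later resize.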

------------------------------------------------------------------------------
THE BOTTOM
------------------------------------------------------------------------------
vec is copyright (c) Paper 2024-2025.
See the file LICENSE in the distribution for more information.

Bugs? Questions? Suggestions? Patches?
Feel free to contact me at any of the following:

Website: https://tflc.us/
Email: paper@tflc.us
IRC: slipofpaper on Libera.chat
Discord: @slipofpaper


am I a real programmer now? :^)
author	Paper <paper@tflc.us>
date	Sat, 26 Apr 2025 15:31:39 -0400