diff README @ 38:fd42f9b1b95e

docs: update copyright for 2025, update the README with more info. I slightly edited vec.h, however, to use calloc directly rather than malloc + memset.
author Paper <paper@tflc.us>
date Sat, 26 Apr 2025 02:54:44 -0400
parents 677c03c382b8
children f9ca85d2f14c
--- a/README	Sat Apr 26 01:04:35 2025 -0400
+++ b/README	Sat Apr 26 02:54:44 2025 -0400
@@ -1,6 +1,11 @@
 vec - a tiny SIMD vector header-only library written in C99
 
-it comes with an extremely basic API that is similar to other intrinsics
+- Be prepared! Are you sure you want to know? :-)
+
+------------------------------------------------------------------------------
+THE VECTOR API
+------------------------------------------------------------------------------
+vec comes with an extremely basic API that is similar to other intrinsics
 libraries; each type is in the exact same format:
 
 	v[sign][bits]x[size]
@@ -10,16 +15,17 @@
 
 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
 on processors where vec has an implementation and falls back to array-based
-implementations where they are not.
+implementations where it does not. For example, creating a 256-bit vector
+on PowerPC would simply create two consecutive 128-bit vectors.
 
-all of these have many operations that are prefixed with the name of the
+All of these have many operations that are prefixed with the name of the
 type and an underscore, for example:
 
-	vint8x16 vint8x16_splat(uint8_t x)
+	vint8x16 vint8x16_splat(int8_t x)
 	- creates a vint8x16 where all of the values are filled
 	  with the value of `x'
 
-the current supported operations are:
+The currently supported operations are:
 
 	v[u]intAxB splat([u]intA_t x)
 		creates a vector with all of the values are filled with
@@ -29,11 +35,18 @@
 		copies the values from the memory address stored at `x';
 		the address is NOT required to be aligned
 
+	v[u]intAxB load_aligned(const [u]intA_t x[B])
+		like `load', but the address is required to be aligned,
+		which can be faster than an unaligned load on some CPUs.
+
 	void store(v[u]intAxB vec, [u]intA_t x[B])
 		copies the values from the vector into the memory address
-		stored at `x'
+		stored at `x'.
+		as with load(), the address is NOT required to be aligned
 
-		like with load(), this does not require address alignment
+	void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
+		like `store', but the address is required to be aligned,
+		which can be faster than an unaligned store on some CPUs.
 
 	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
 		adds the value of `vec1' and `vec2' and returns it
@@ -73,9 +86,16 @@
 
 	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
 		returns the average of the values in both vectors
-		i.e., div(mul(vec1, vec2), splat(2))
+		i.e., div(add(vec1, vec2), splat(2)), without
+		the possibility of overflow.
 
-there are also a number of comparisons possible:
+	v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
+		returns the minimum of the values in both vectors
+
+	v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
+		returns the maximum of the values in both vectors
+
+There are also a number of comparisons possible:
 
 	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
 		turns on all bits of the corresponding value in
@@ -107,9 +127,94 @@
 		than or equal to the corresponding value in `vec2',
 		else all of the bits are turned off.
 
-to initialize vec, you MUST call `vec_init()' when your programs starts up.
+More operations will be added to this API as they are requested or
+needed. Patches are accepted and encouraged!
+
+------------------------------------------------------------------------------
+USING VEC
+------------------------------------------------------------------------------
+To use vec, simply include `vec/vec.h' in your program. If you would like
+your program to also be able to run on older systems, you can create
+multiple translation units and pass different command line arguments
+to the compiler to enable SSE2, AVX2, AltiVec, etc., and detect the vector
+modes the CPU supports at runtime. vec provides an optional public API
+specifically for this use case within `vec/impl/cpu.h'; bear in mind
+though that it is not thread-safe, so if your program is multithreaded
+you'll want to cache the results on startup.
+
+The CPU vector detection API is simple: call `vec_get_CPU_features()',
+and it returns a bit-mask of the values within the enum placed above
+the function definition. From there, you can test each value
+individually.
+
+vec should work perfectly fine with C++, though it is tested less
+thoroughly there than with C. Your mileage may vary. You should probably
+be using a library more tailored towards C++ such as Highway[1] or std::simd.
+
+[1]: https://google.github.io/highway/en/master/
+
+------------------------------------------------------------------------------
+MEMORY ALLOCATION
+------------------------------------------------------------------------------
+vec allows for stack-based and heap-based aligned array allocation. The
+stack-based API is simple, and goes along the lines of this:
+
+	VINT16x32_ALIGNED_ARRAY(arr);
+
+	/* arr is now either an array type or a pointer type, depending on whether
+	 * the compiler supports the alignas specifier from C11 or later, or has
+	 * its own extension to align arrays. vec will fall back to manual pointer
+	 * alignment if the compiler does not support it. */
+
+	/* this macro returns the full size of the array in bytes */
+	int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);
+
+	/* this macro returns the length of the array
+	 * (basically a synonym for sizeof(arr) / sizeof(arr[0])) */
+	int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);
 
-note that `vec_init()' is NOT thread-safe, and things can and will
-blow up if you call it simultaneously from different threads (i.e. you
-try to only initialize it when you need to... please just initialize
-it on startup so you don't have to worry about that!!!)
+	/* no need to free the aligned array -- it is always on the stack */
+
+The heap-based API is based off the good old C malloc API:
+
+	vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));
+
+	/* q is now aligned, and ready for use with a vector aligned load
+	 * function. */
+	vint32x16 v = vint32x16_load_aligned(q);
+
+	/* Say we want to reallocate the memory with a different size.
+	 * No problem there! */
+	q = vec_realloc(q, 2048 * sizeof(vec_int32));
+
+	/* In a real-world program, you'll want to check that vec_malloc
+	 * and vec_realloc do not fail, but this error checking has been
+	 * omitted from this example, as it is the same as for regular
+	 * malloc and realloc. */
+
+	vec_free(q);
+
+	/* If you need it to be initialized, we have you covered: */
+	q = vec_calloc(1024, sizeof(vec_int32));
+
+	/* vec_calloc forwards to the real calloc, so there is no overhead of
+	 * calling memset or something similar. */
+
+	vec_free(q);
+
+------------------------------------------------------------------------------
+THE BOTTOM
+------------------------------------------------------------------------------
+vec is copyright (c) Paper 2024-2025.
+See the file LICENSE in the distribution for more information.
+
+Bugs? Questions? Suggestions? Patches?
+Feel free to contact me at any of the following:
+
+Website: https://tflc.us/
+Email: paper@tflc.us
+IRC: slipofpaper on Libera.chat
+Discord: @slipofpaper
+
+
+am I a real programmer now? :^)
\ No newline at end of file
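
A note on the avg() entry in the diff above: it is now described as div(add(vec1, vec2), splat(2)) "without the possibility of overflow". The README does not say how vec achieves this or which way it rounds, so the following is only a scalar sketch of one well-known way to average two 8-bit values without a wider intermediate type. The helper names are hypothetical and are not part of vec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these are vec API. */

/* Naive average inside an 8-bit lane: the sum wraps modulo 256
 * before it is halved, so large inputs give the wrong answer. */
static uint8_t avg_u8_naive(uint8_t a, uint8_t b)
{
	uint8_t sum = (uint8_t)(a + b); /* truncated back to 8 bits */
	return (uint8_t)(sum / 2);
}

/* Rounds down. Uses a + b == 2 * (a & b) + (a ^ b), so each term
 * can be halved separately and nothing exceeds 8 bits. */
static uint8_t avg_u8_floor(uint8_t a, uint8_t b)
{
	return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* Rounds up. Uses a + b == 2 * (a | b) - (a ^ b). */
static uint8_t avg_u8_ceil(uint8_t a, uint8_t b)
{
	return (uint8_t)((a | b) - ((a ^ b) >> 1));
}
```

With inputs 200 and 100, the naive version wraps (300 mod 256 is 44, halved to 22), while both overflow-free versions return 150. Whether vec rounds down or up is not stated in the README, so check the implementation before relying on either behaviour.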