# HG changeset patch
# User Paper
# Date 1745650484 14400
# Node ID fd42f9b1b95e67140be574ddab5586ae0dd20fa7
# Parent  4b5a557aa64f56a003ca3c8774d302e5845ec9bf
docs: update copyright for 2025, update the README with more info

I slightly edited vec.h however to use calloc directly rather than malloc + memset.

diff -r 4b5a557aa64f -r fd42f9b1b95e LICENSE
--- a/LICENSE	Sat Apr 26 01:04:35 2025 -0400
+++ b/LICENSE	Sat Apr 26 02:54:44 2025 -0400
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2024 Paper
+Copyright (c) 2024-2025 Paper
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e README
--- a/README	Sat Apr 26 01:04:35 2025 -0400
+++ b/README	Sat Apr 26 02:54:44 2025 -0400
@@ -1,6 +1,11 @@
 vec - a tiny SIMD vector header-only library written in C99
 
-it comes with an extremely basic API that is similar to other intrinsics
+- Be prepared! Are you sure you want to know? :-)
+
+------------------------------------------------------------------------------
+THE VECTOR API
+------------------------------------------------------------------------------
+vec comes with an extremely basic API that is similar to other intrinsics
 libraries; each type is in the exact same format:
 
 	v[sign][bits]x[size]
@@ -10,16 +15,17 @@
 
 vec provides types for 64-bit, 128-bit, 256-bit, and 512-bit SIMD intrinsics
 on processors where vec has an implementation and falls back to array-based
-implementations where they are not.
+implementations where they are not. For example, creating a 256-bit vector
+on powerpc would simply create two consecutive 128-bit vectors.
 
-all of these have many operations that are prefixed with the name of the
+All of these have many operations that are prefixed with the name of the
 type and an underscore, for example:
 
-	vint8x16 vint8x16_splat(uint8_t x)
+	vint8x16 vint8x16_splat(int8_t x)
 	- creates a vint8x16 where all of the values are filled
 	  with the value of `x'
 
-the current supported operations are:
+The currently supported operations are:
 
 	v[u]intAxB splat([u]intA_t x)
 		creates a vector with all of the values are filled with
@@ -29,11 +35,18 @@
 		copies the values from the memory address stored at `x';
 		the address is NOT required to be aligned
 
+	v[u]intAxB load_aligned(const [u]intA_t x[B])
+		like `load', but the address is required to be aligned,
+		which can cause some speed improvements if done correctly.
+
 	void store(v[u]intAxB vec, [u]intA_t x[B])
 		copies the values from the vector into the memory address
-		stored at `x'
+		stored at `x'.
+		like with load(), this does not require address alignment
 
-		like with load(), this does not require address alignment
+	void store_aligned(v[u]intAxB vec, [u]intA_t x[B])
+		like `store', but the address is required to be aligned,
+		which can cause some speed improvements if done correctly.
 
 	v[u]intAxB add(v[u]intAxB vec1, v[u]intAxB vec2)
 		adds the value of `vec1' and `vec2' and returns it
@@ -73,9 +86,16 @@
 
 	v[u]intAxB avg(v[u]intAxB vec1, v[u]intAxB vec2)
 		returns the average of the values in both vectors
-		i.e., div(mul(vec1, vec2), splat(2))
+		i.e., div(add(vec1, vec2), splat(2)), without
+		the possibility of overflow.
 
-there are also a number of comparisons possible:
+	v[u]intAxB min(v[u]intAxB vec1, v[u]intAxB vec2)
+		returns the minimum of the values in both vectors
+
+	v[u]intAxB max(v[u]intAxB vec1, v[u]intAxB vec2)
+		returns the maximum of the values in both vectors
+
+There are also a number of comparisons possible:
 
 	v[u]intAxB cmplt(v[u]intAxB vec1, v[u]intAxB vec2)
 		turns on all bits of the corresponding value in
@@ -107,9 +127,94 @@
 		than or equal to the corresponding value in `vec2',
 		else all of the bits are turned off.
 
-to initialize vec, you MUST call `vec_init()' when your programs starts up.
+This API will most definitely have more operations available as they are
+requested (and as they are needed). Patches are accepted and encouraged!
+
+------------------------------------------------------------------------------
+USING VEC
+------------------------------------------------------------------------------
+To use vec, simply include `vec/vec.h` in your program. If you would like
+your program to also be able to run on older systems, you can create
+multiple translation units and pass different command line arguments
+to the compiler to enable SSE2/AVX2/Altivec etc, and detect the vector
+modes the CPU supports at runtime. vec provides an optional public API
+specifically for this use-case within `vec/impl/cpu.h`; bear in mind
+though that it is not thread-safe, so if your program is multithreaded
+you'll want to cache the results on startup.
+
+The CPU vector detection API is extremely simple, and self-explanatory.
+You call `vec_get_CPU_features()', and it returns a bit-mask of the
+values within the enum placed above the function definition. From there,
+you can test for each value specifically.
+
+vec should work perfectly fine with C++, though it is not tested as
+thoroughly as C is. Your mileage may vary. You should probably be using
+a library more tailored towards C++ such as Highway[1] or std::simd.
+
+[1]: https://google.github.io/highway/en/master/
+
+------------------------------------------------------------------------------
+MEMORY ALLOCATION
+------------------------------------------------------------------------------
+vec allows for stack-based and heap-based aligned array allocation. The
+stack-based API is simple, and goes among the lines of this:
+
+	VINT16x32_ALIGNED_ARRAY(arr);
+
+	/* arr is now either an array type or a pointer type, depending on whether
+	 * the compiler supports the alignas operator within C11 or later, or has
+	 * its own extension to align arrays. vec will fallback to manual pointer
+	 * alignment if the compiler does not support it. */
+
+	/* this macro returns the full size of the array in bytes */
+	int size = VINT16x32_ALIGNED_ARRAY_SIZEOF(arr);
+
+	/* this macro returns the length of the array
+	 * (basically a synonym for sizeof/sizeof[0]) */
+	int length = VINT16x32_ALIGNED_ARRAY_LENGTH(arr);
 
-note that `vec_init()' is NOT thread-safe, and things can and will
-blow up if you call it simultaneously from different threads (i.e. you
-try to only initialize it when you need to... please just initialize
-it on startup so you don't have to worry about that!!!)
+	/* no need to free the aligned array -- it is always on the stack */
+
+The heap-based API is based off the good old C malloc API:
+
+	vec_int32 *q = vec_malloc(1024 * sizeof(vec_int32));
+
+	/* q is now aligned, and ready for use with a vector aligned load
+	 * function. */
+	vint32x16_load_aligned(q);
+
+	/* Say we want to reallocate the memory with a different size.
+	 * No problem there! */
+	q = vec_realloc(q, 2048 * sizeof(vec_int32));
+
+	/* In a real world program, you'll want to check that vec_malloc
+	 * and vec_realloc do not fail, but this error checking has been
+	 * withheld from this example, as it is the same as for regular
+	 * malloc and realloc. */
+
+	vec_free(q);
+
+	/* If you need it to be initialized, we have you covered: */
+	q = vec_calloc(1024, sizeof(vec_int32));
+
+	/* vec_calloc forwards to the real calloc, so there is no overhead of
+	 * calling memset or something similar. */
+
+	vec_free(q);
+
+------------------------------------------------------------------------------
+THE BOTTOM
+------------------------------------------------------------------------------
+vec is copyright (c) Paper 2024-2025.
+See the file LICENSE in the distribution for more information.
+
+Bugs? Questions? Suggestions? Patches?
+Feel free to contact me at any of the following:
+
+Website: https://tflc.us/
+Email: paper@tflc.us
+IRC: slipofpaper on Libera.chat
+Discord: @slipofpaper
+
+
+am I a real programmer now? :^)
\ No newline at end of file
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/arm/neon.h
--- a/include/vec/impl/arm/neon.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/arm/neon.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
- * vec - a tiny SIMD vector library in plain C99
+ * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/cpu.h
--- a/include/vec/impl/cpu.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/cpu.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -477,36 +477,38 @@
 
 #define VEC_CPU_FEATURES_RESET UINT32_C(0xFFFFFFFF)
 
-static vec_uint32 vec_CPU_features = VEC_CPU_FEATURES_RESET;
-
-static void vec_get_CPU_features(void)
+VEC_FUNC_IMPL uint32_t vec_get_CPU_features(void)
 {
-	vec_CPU_get_CPUID_features();
-	vec_CPU_features = 0;
-	if (vec_CPU_have_ALTIVEC())
-		vec_CPU_features |= VEC_CPU_HAS_ALTIVEC;
-	if (vec_CPU_have_ALTIVEC_VSX())
-		vec_CPU_features |= VEC_CPU_HAS_ALTIVEC_VSX;
-	if (vec_CPU_have_MMX())
-		vec_CPU_features |= VEC_CPU_HAS_MMX;
-	if (vec_CPU_have_SSE())
-		vec_CPU_features |= VEC_CPU_HAS_SSE;
-	if (vec_CPU_have_SSE2())
-		vec_CPU_features |= VEC_CPU_HAS_SSE2;
-	if (vec_CPU_have_SSE3())
-		vec_CPU_features |= VEC_CPU_HAS_SSE3;
-	if (vec_CPU_have_SSE41())
-		vec_CPU_features |= VEC_CPU_HAS_SSE41;
-	if (vec_CPU_have_SSE42())
-		vec_CPU_features |= VEC_CPU_HAS_SSE42;
-	if (vec_CPU_have_AVX())
-		vec_CPU_features |= VEC_CPU_HAS_AVX;
-	if (vec_CPU_have_AVX2())
-		vec_CPU_features |= VEC_CPU_HAS_AVX2;
-	if (vec_CPU_have_AVX512F())
-		vec_CPU_features |= VEC_CPU_HAS_AVX512F;
-	if (vec_CPU_have_NEON())
-		vec_CPU_features |= VEC_CPU_HAS_NEON;
+	static vec_uint32 vec_CPU_features = VEC_CPU_FEATURES_RESET;
+	if (vec_CPU_features == VEC_CPU_FEATURES_RESET) {
+		vec_CPU_get_CPUID_features();
+		vec_CPU_features = 0;
+		if (vec_CPU_have_ALTIVEC())
+			vec_CPU_features |= VEC_CPU_HAS_ALTIVEC;
+		if (vec_CPU_have_ALTIVEC_VSX())
+			vec_CPU_features |= VEC_CPU_HAS_ALTIVEC_VSX;
+		if (vec_CPU_have_MMX())
+			vec_CPU_features |= VEC_CPU_HAS_MMX;
+		if (vec_CPU_have_SSE())
+			vec_CPU_features |= VEC_CPU_HAS_SSE;
+		if (vec_CPU_have_SSE2())
+			vec_CPU_features |= VEC_CPU_HAS_SSE2;
+		if (vec_CPU_have_SSE3())
+			vec_CPU_features |= VEC_CPU_HAS_SSE3;
+		if (vec_CPU_have_SSE41())
+			vec_CPU_features |= VEC_CPU_HAS_SSE41;
+		if (vec_CPU_have_SSE42())
+			vec_CPU_features |= VEC_CPU_HAS_SSE42;
+		if (vec_CPU_have_AVX())
+			vec_CPU_features |= VEC_CPU_HAS_AVX;
+		if (vec_CPU_have_AVX2())
+			vec_CPU_features |= VEC_CPU_HAS_AVX2;
+		if (vec_CPU_have_AVX512F())
+			vec_CPU_features |= VEC_CPU_HAS_AVX512F;
+		if (vec_CPU_have_NEON())
+			vec_CPU_features |= VEC_CPU_HAS_NEON;
+	}
+	return vec_CPU_features;
 }
 
 #endif /* VEC_IMPL_CPU_H_ */
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/generic.h
--- a/include/vec/impl/generic.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/generic.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
- * vec - a tiny SIMD vector library in plain C99
+ * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/ppc/altivec.h
--- a/include/vec/impl/ppc/altivec.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/ppc/altivec.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
- * vec - a tiny SIMD vector library in plain C99
+ * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/x86/avx2.h
--- a/include/vec/impl/x86/avx2.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/x86/avx2.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/x86/avx512f.h
--- a/include/vec/impl/x86/avx512f.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/x86/avx512f.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/x86/mmx.h
--- a/include/vec/impl/x86/mmx.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/x86/mmx.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/x86/sse2.h
--- a/include/vec/impl/x86/sse2.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/x86/sse2.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/x86/sse3.h
--- a/include/vec/impl/x86/sse3.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/x86/sse3.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/x86/sse41.h
--- a/include/vec/impl/x86/sse41.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/x86/sse41.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/impl/x86/sse42.h
--- a/include/vec/impl/x86/sse42.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/impl/x86/sse42.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff -r 4b5a557aa64f -r fd42f9b1b95e include/vec/vec.h
--- a/include/vec/vec.h	Sat Apr 26 01:04:35 2025 -0400
+++ b/include/vec/vec.h	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
  * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -1109,15 +1109,15 @@
 	void *q;
 
 	size = count * nmemb;
-	if (size && size / count != nmemb)
+	if ((size && size / count != nmemb)
+		|| size > VEC_MALLOC_MAX_SIZE)
 		return NULL; /* nope */
 
-	q = vec_malloc(size);
+	q = calloc(size + VEC_MALLOC_ADDITIONAL_SIZE, 1);
+	if (!q)
+		return NULL;
 
-	if (q)
-		memset(q, 0, size);
-
-	return q;
+	return vec_internal_align_ptr_(q);
 }
 
 VEC_FUNC_IMPL void *vec_realloc(void *ptr, size_t newsize)
diff -r 4b5a557aa64f -r fd42f9b1b95e utils/gengeneric.c
--- a/utils/gengeneric.c	Sat Apr 26 01:04:35 2025 -0400
+++ b/utils/gengeneric.c	Sat Apr 26 02:54:44 2025 -0400
@@ -1,7 +1,7 @@
 /**
- * vec - a tiny SIMD vector library in plain C99
+ * vec - a tiny SIMD vector library in C99
  * 
- * Copyright (c) 2024 Paper
+ * Copyright (c) 2024-2025 Paper
  * 
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -37,9 +37,9 @@
  * and then unpacking it all? */
 static const char *header =
 	"/**\n"
-	" * vec - a tiny SIMD vector library in plain C99\n"
+	" * vec - a tiny SIMD vector library in C99\n"
 	" * \n"
-	" * Copyright (c) 2024 Paper\n"
+	" * Copyright (c) 2024-2025 Paper\n"
 	" * \n"
 	" * Permission is hereby granted, free of charge, to any person obtaining a copy\n"
 	" * of this software and associated documentation files (the \"Software\"), to deal\n"
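
Note on the vec_calloc hunk in include/vec/vec.h: it combines an overflow
check, a size cap, and a single calloc call followed by pointer alignment.
A minimal standalone sketch of that general pattern is given here for
reference. It is not vec's code: sketch_calloc, sketch_free, SKETCH_ALIGN,
SKETCH_EXTRA and SKETCH_MAX_SIZE are hypothetical names, the 64-byte
alignment is an assumed value, and stashing the original pointer just below
the aligned block is an assumption too, since this patch does not show how
vec_internal_align_ptr_ or vec_free work.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical values for illustration; vec's real constants are not
 * shown in this patch. */
#define SKETCH_ALIGN ((size_t)64)
/* Worst-case alignment slack plus room to remember calloc's pointer. */
#define SKETCH_EXTRA (SKETCH_ALIGN + sizeof(void *))
#define SKETCH_MAX_SIZE (SIZE_MAX - SKETCH_EXTRA)

static void *sketch_calloc(size_t count, size_t nmemb)
{
	unsigned char *raw;
	uintptr_t addr;

	/* Check by division before multiplying, so a product that wraps
	 * (even to exactly zero) or exceeds the cap is rejected. */
	if (count != 0 && nmemb > SKETCH_MAX_SIZE / count)
		return NULL;

	/* calloc hands back zeroed memory, so no memset is needed. */
	raw = calloc(count * nmemb + SKETCH_EXTRA, 1);
	if (!raw)
		return NULL;

	/* Round up to the next aligned address, leaving room for one
	 * pointer below it. */
	addr = ((uintptr_t)raw + sizeof(void *) + (SKETCH_ALIGN - 1))
	       & ~(uintptr_t)(SKETCH_ALIGN - 1);

	/* Remember the pointer calloc returned so it can be freed later. */
	((void **)addr)[-1] = raw;

	return (void *)addr;
}

static void sketch_free(void *ptr)
{
	if (ptr)
		free(((void **)ptr)[-1]);
}
```

A caller would pair sketch_calloc(1024, sizeof(int32_t)) with sketch_free on
the returned pointer. Passing that pointer to plain free() would be a bug in
this sketch, because it is not the address calloc returned.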