# HG changeset patch
# User Paper
# Date 1717971645 14400
# Node ID 52d59a351bf5bb7456cc3ad9c4fcb2bc26983902
# Parent  5914d06f72b44204c48bf3ef103dc41785592ae0
add post about unicode in schism

diff -r 5914d06f72b4 -r 52d59a351bf5 _posts/2024-05-19-how-do-I-blog.html
--- a/_posts/2024-05-19-how-do-I-blog.html	Wed Jun 05 21:02:44 2024 -0400
+++ b/_posts/2024-05-19-how-do-I-blog.html	Sun Jun 09 18:20:45 2024 -0400
@@ -8,4 +8,3 @@
 anyway, with the blog stuff, I'll write here when I actually have things to blog about ;)

 P.S.: you can also get a feed of these posts under /blog/feed.xml
-
diff -r 5914d06f72b4 -r 52d59a351bf5 _posts/2024-06-09-schism-unicode-and-you.html
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/_posts/2024-06-09-schism-unicode-and-you.html	Sun Jun 09 18:20:45 2024 -0400
@@ -0,0 +1,30 @@
+---
+layout: post
+author: Paper
+title: 'Schism Tracker, Unicode, and you'
+---
+Recently I've taken on adding real Unicode-awareness to Schism, and it was surprisingly easy, to say the least.
+

+I was expecting to have to convert lots of things to be real Unicode, but nope! All that really needed to be done was to convert UTF-8 to CP437 where necessary to actually *draw* the data while keeping the internal form pure UTF-8, and then bundle everything up into a neat macro to keep everything consistent:
+

+#define CHARSET_EASY_MODE_EX(MOD, in, inset, outset, x) \
+	do { \
+		MOD uint8_t* out; \
+		charset_error_t err = charset_iconv(in, (uint8_t**)&out, inset, outset); \
+		if (err) \
+			out = in; \
+	\
+		x \
+	\
+		if (!err) \
+			free((uint8_t*)out); \
+	} while (0)
+
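To make the "convert only when drawing" idea concrete, here is a minimal sketch of a UTF-8 to CP437 conversion. It is not Schism's `charset_iconv`; the names `cp437_from_codepoint` and `utf8_to_cp437_sketch` are hypothetical, and the table only covers ASCII plus a handful of Spanish letters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Schism): map one Unicode code point to
 * its CP437 byte. Only ASCII and a few Spanish letters are covered here. */
static uint8_t cp437_from_codepoint(uint32_t cp)
{
	if (cp < 0x80)
		return (uint8_t)cp; /* ASCII keeps the same byte value in CP437 */

	switch (cp) {
	case 0x00E1: return 0xA0; /* a with acute */
	case 0x00E9: return 0x82; /* e with acute */
	case 0x00ED: return 0xA1; /* i with acute */
	case 0x00F3: return 0xA2; /* o with acute */
	case 0x00FA: return 0xA3; /* u with acute */
	case 0x00FC: return 0x81; /* u with diaeresis */
	case 0x00F1: return 0xA4; /* n with tilde */
	case 0x00D1: return 0xA5; /* N with tilde */
	default:     return '?';  /* no mapping in this sketch */
	}
}

/* Hypothetical converter: NUL-terminated UTF-8 in, NUL-terminated CP437 out.
 * Only 1- and 2-byte UTF-8 sequences are decoded; every other byte becomes
 * '?'. `out` needs room for strlen(in) + 1 bytes. Returns the number of
 * bytes written, not counting the terminator. */
static size_t utf8_to_cp437_sketch(const char *in, uint8_t *out)
{
	const uint8_t *p = (const uint8_t *)in;
	size_t n = 0;

	while (*p) {
		uint32_t cp;

		if (*p < 0x80) {
			cp = *p++;
		} else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
			cp = ((uint32_t)(*p & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
			p += 2;
		} else {
			cp = 0xFFFD; /* unsupported or malformed byte */
			p++;
		}

		out[n++] = cp437_from_codepoint(cp);
	}

	out[n] = 0;
	return n;
}
```

Calling `utf8_to_cp437_sketch` on the UTF-8 bytes of "mañana" produces six CP437 bytes, with the "ñ" collapsed into the single byte 0xA4. The UTF-8 original stays untouched, which is the point of converting only at draw time.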

+I just shoved this macro anywhere necessary and it works perfectly fine for loading any Unicode path. For example, the Spanish word "mañana" gets displayed correctly now:
+

+

+
+The file sorting algorithms were a different beast though, and even now strverscmp doesn't have a real charset-independent variant. For strcasecmp, I had to implement (simple) Unicode case folding, which meant having a switch statement that is almost 1500 lines long and takes up about 20K of space in the binary.
+
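For a rough idea of what a case-folding comparison looks like, here is a tiny sketch. It is an illustration only and not the code from Schism: `fold_sketch`, `next_cp`, and `utf8_casecmp_sketch` are hypothetical names, and the fold covers just ASCII and the Latin-1 capitals instead of the full simple case folding table.

```c
#include <stdint.h>

/* Hypothetical fold: lowercases ASCII A-Z and the Latin-1 capitals
 * U+00C0..U+00DE (skipping U+00D7, the multiplication sign). A real
 * implementation needs the whole Unicode simple case folding table. */
static uint32_t fold_sketch(uint32_t cp)
{
	if (cp >= 'A' && cp <= 'Z')
		return cp + 0x20;
	if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
		return cp + 0x20;
	return cp;
}

/* Decodes one code point from 1- or 2-byte UTF-8 and advances *p.
 * Anything else is returned as its raw byte value, which is good enough
 * for a sketch but not for real input validation. */
static uint32_t next_cp(const uint8_t **p)
{
	const uint8_t *s = *p;

	if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
		*p = s + 2;
		return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
	}

	*p = s + 1;
	return s[0];
}

/* strcasecmp-like comparison over folded code points.
 * Returns <0, 0 or >0 like strcmp. */
static int utf8_casecmp_sketch(const char *a, const char *b)
{
	const uint8_t *pa = (const uint8_t *)a;
	const uint8_t *pb = (const uint8_t *)b;

	for (;;) {
		uint32_t ca = fold_sketch(next_cp(&pa));
		uint32_t cb = fold_sketch(next_cp(&pb));

		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (ca == 0)
			return 0; /* both strings ended together */
	}
}
```

With this, "MAÑANA" and "mañana" compare as equal even though "Ñ" and "ñ" are different byte sequences in UTF-8. Covering every script is what turns two `if` statements into a very long switch.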

+Schism currently does not do any Unicode normalization when comparing strings. This is primarily a problem with decomposed strings (which will likely not get converted properly), though with filenames that probably shouldn't exist anyway...
+
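To show why the lack of normalization matters, here is a small sketch with the same word spelled two ways: precomposed (U+00F1) and decomposed ("n" followed by the combining tilde U+0303). The names `manana_nfc`, `manana_nfd`, and `bytewise_equal` are made up for this example.

```c
#include <string.h>

/* "mañana" with a precomposed n-with-tilde (U+00F1 -> C3 B1). */
static const char manana_nfc[] = "ma\xC3\xB1" "ana";

/* "mañana" with a plain 'n' followed by U+0303 COMBINING TILDE (CC 83). */
static const char manana_nfd[] = "man\xCC\x83" "ana";

/* Byte-for-byte comparison with no normalization step. */
static int bytewise_equal(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Both strings render as the same word, but `bytewise_equal(manana_nfc, manana_nfd)` is false: one is 7 bytes long and the other is 8. Any comparison that skips normalization sees them as different names.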

+anyway, Unicode is easy, if you can't use it properly it's a skill issue :p
diff -r 5914d06f72b4 -r 52d59a351bf5 css/style.css
--- a/css/style.css	Wed Jun 05 21:02:44 2024 -0400
+++ b/css/style.css	Sun Jun 09 18:20:45 2024 -0400
@@ -158,3 +158,8 @@
 .blog-date-right {
 	float: inline-end;
 }
+
+.center-image {
+	display: block;
+	margin: 0 auto;
+}
diff -r 5914d06f72b4 -r 52d59a351bf5 index.html
--- a/index.html	Wed Jun 05 21:02:44 2024 -0400
+++ b/index.html	Sun Jun 09 18:20:45 2024 -0400
@@ -23,7 +23,10 @@

Socials

- E-mail (paper@paper.us.eu.org); please do not send me HTML-infested garbage
+ E-mail (paper@paper.us.eu.org); please send me plain text email if possible.
+
+ IRC; you can usually find me in #openmpt on Libera Chat.
 Discord (@slipofpaper)
diff -r 5914d06f72b4 -r 52d59a351bf5 media/blog/schism-spanish-file-listing.png
Binary file media/blog/schism-spanish-file-listing.png has changed