annotate _posts/2024-06-09-schism-unicode-and-you.html @ 101:d9223d9ab9ba

blog: this post feels incomplete without a conclusion-ish
author Paper <paper@tflc.us>
date Fri, 27 Dec 2024 20:27:24 -0500
parents 60f77a3de847
children
rev   line source
85
52d59a351bf5 add post about unicode in schism
Paper <paper@paper.us.eu.org>
parents:
diff changeset
1 ---
2 layout: post
3 author: Paper
4 title: 'Schism Tracker, Unicode, and you'
86
1fed81c848a5 html: add plugs
Paper <paper@paper.us.eu.org>
parents: 85
diff changeset
5 nowplaying: 'Holy Fuck - LP'
85
6 ---
7 <span>Recently I've taken on adding real Unicode-awareness to Schism, and it was <i>surprisingly</i> easy, to say the least.</span>
8 <br><br>
9 <span>I was expecting to have to convert lots of things to be real Unicode, but nope! All that really needed to be done was to convert UTF-8 to CP437 where necessary to actually <i>draw</i> the data while keeping the internal form pure UTF-8, and then bundle everything up into a neat macro to keep everything consistent:</span>
87
60f77a3de847 css: improve appearance on mobile
Paper <paper@paper.us.eu.org>
parents: 86
diff changeset
10 <figure><pre class="code-block"><code>#define CHARSET_EASY_MODE_EX(MOD, in, inset, outset, x) \
85
11 do { \
12     MOD uint8_t* out; \
13     charset_error_t err = charset_iconv(in, (uint8_t**)&out, inset, outset); \
14     if (err) \
15         out = in; \
16 \
17     x \
18 \
19     if (!err) \
20         free((uint8_t*)out); \
21 } while (0)
22 </code></pre></figure>
23 <span>I just shoved this macro anywhere necessary and it works perfectly fine for loading any Unicode path. For example, the Spanish word "mañana" gets displayed correctly now:</span>
24 <br><br>
25 <img class="drop-shadow-box center-image" src="/media/blog/schism-spanish-file-listing.png">
26 <br>
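<span>For a rough idea of what the draw-time conversion involves, here is a minimal sketch. It is not Schism's actual code: <i>utf8_decode</i>, <i>cp437_from_codepoint</i> and <i>utf8_to_cp437</i> are hypothetical names, and the mapping table only covers a handful of characters for illustration. Anything CP437 can't represent falls back to a question mark.</span>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence at s into *cp and return
 * the number of bytes consumed. Malformed bytes are consumed one at a time
 * as U+FFFD. This sketch does not reject overlong encodings. */
static size_t utf8_decode(const uint8_t *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18)
            | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Illustrative subset of the Unicode -> CP437 mapping. ASCII is shared
 * between the two; a real table would cover the whole upper half. */
static uint8_t cp437_from_codepoint(uint32_t cp)
{
    if (cp < 0x80)
        return (uint8_t)cp;

    switch (cp) {
    case 0x00FC: return 0x81; /* u with diaeresis */
    case 0x00E9: return 0x82; /* e with acute */
    case 0x00F1: return 0xA4; /* n with tilde */
    case 0x00D1: return 0xA5; /* N with tilde */
    default:     return '?';  /* not representable (in this sketch) */
    }
}

/* Convert a NUL-terminated UTF-8 string into a NUL-terminated CP437 buffer.
 * Returns the number of CP437 bytes written, not counting the terminator. */
static size_t utf8_to_cp437(const char *in, uint8_t *out, size_t outsize)
{
    const uint8_t *s = (const uint8_t *)in;
    size_t n = 0;

    if (outsize == 0)
        return 0;

    while (*s && n + 1 < outsize) {
        uint32_t cp;
        s += utf8_decode(s, &cp);
        out[n++] = cp437_from_codepoint(cp);
    }
    out[n] = '\0';
    return n;
}
```

<span>Calling <i>utf8_to_cp437</i> on the UTF-8 bytes of "mañana" gives six CP437 bytes, with the ñ turned into 0xA4. The original UTF-8 string is left alone, which is the whole point: only the copy handed to the drawing code is CP437.</span>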
27 <span>The file sorting algorithms were a different beast though, and even now strverscmp doesn't have a real charset-independent variant. For strcasecmp, I had to implement (simple) Unicode case folding, which meant having a <a class="prettylink" href="https://github.com/schismtracker/schismtracker/blob/b858a5917ee7e83f7cb4da1ad698dd24159f241b/schism/charset_data.c#L183">switch statement that is almost 1500 lines long</a> and takes up about 20K of space in the binary.</span>
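<br><br>
<span>To give an idea of the shape of it, here's a tiny sketch of a case-insensitive UTF-8 comparison built on simple case folding. This is an assumption-laden illustration rather than the code linked above: <i>utf8_decode</i>, <i>fold_simple</i> and <i>utf8_casecmp</i> are made-up names, and the folding covers only ASCII, Latin-1 and two extra code points where the real switch has well over a thousand cases.</span>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence at s into *cp and return
 * the number of bytes consumed. Malformed bytes become U+FFFD. */
static size_t utf8_decode(const uint8_t *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18)
            | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Simple (one code point to one code point) case folding for a very small
 * part of Unicode. Full folding, e.g. sharp s to "ss", is out of scope. */
static uint32_t fold_simple(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + 32;
    /* Latin-1 capitals U+00C0..U+00DE, skipping the multiplication sign */
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
        return cp + 32;

    switch (cp) {
    case 0x0178: return 0x00FF; /* Y with diaeresis */
    case 0x0391: return 0x03B1; /* Greek alpha */
    default:     return cp;
    }
}

/* strcasecmp-alike for UTF-8: compares folded code points in order. */
static int utf8_casecmp(const char *a, const char *b)
{
    const uint8_t *pa = (const uint8_t *)a;
    const uint8_t *pb = (const uint8_t *)b;

    for (;;) {
        uint32_t ca, cb;

        /* One (or both) strings ended: the shorter one sorts first. */
        if (!*pa || !*pb)
            return (int)*pa - (int)*pb;

        pa += utf8_decode(pa, &ca);
        pb += utf8_decode(pb, &cb);

        ca = fold_simple(ca);
        cb = fold_simple(cb);
        if (ca != cb)
            return (ca < cb) ? -1 : 1;
    }
}
```

<span>With this, "MAÑANA" and "mañana" compare equal, which a byte-wise ASCII-only strcasecmp would not manage since the two ñ's differ in their second UTF-8 byte.</span>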
28 <br><br>
29 <span>Schism currently does not do any Unicode normalization when comparing strings. This is primarily a problem with decomposed strings (which will likely not get converted properly), though filenames like that probably shouldn't exist anyway...</span>
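<br><br>
<span>To make the normalization issue concrete, here is a small sketch. The names are made up for illustration. It holds the same word twice, once with a precomposed ñ and once as "n" followed by a combining tilde. Without normalization, the two are simply different byte strings.</span>

```c
#include <string.h>

/* "mañana" with a precomposed n-with-tilde (U+00F1, bytes C3 B1).
 * The literals are split so the hex escapes don't swallow the next letter. */
static const char name_composed[] = "ma\xC3\xB1" "ana";

/* The same word as "n" followed by U+0303 COMBINING TILDE (bytes CC 83). */
static const char name_decomposed[] = "man\xCC\x83" "ana";

/* Byte-wise equality, which is all a comparison sees when no
 * normalization step runs first. */
static int same_without_normalization(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

<span>Here <i>same_without_normalization(name_composed, name_decomposed)</i> returns 0 even though both render as the same word, and the decomposed form is one byte longer. CP437 also has no combining tilde, so that form has no exact representation there.</span>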
30 <br><br>
31 <span>anyway, Unicode is easy, if you can't use it properly it's a skill issue :p</span>