comparison _posts/2024-06-09-schism-unicode-and-you.html @ 85:52d59a351bf5
add post about unicode in schism
author | Paper <paper@paper.us.eu.org> |
---|---|
date | Sun, 09 Jun 2024 18:20:45 -0400 |
parents | |
children | 1fed81c848a5 |
84:5914d06f72b4 | 85:52d59a351bf5 |
---|---|
1 --- | |
2 layout: post | |
3 author: Paper | |
4 title: 'Schism Tracker, Unicode, and you' | |
5 --- | |
6 <span>Recently I've taken on adding real Unicode-awareness to Schism, and it was <i>surprisingly</i> easy, to say the least.</span> | |
7 <br><br> | |
8 <span>I was expecting to have to convert lots of things to be real Unicode, but nope! All that really needed to be done was to convert UTF-8 to CP437 where necessary to actually <i>draw</i> the data while keeping the internal form pure UTF-8, and then bundle everything up into a neat macro to keep everything consistent:</span> | |
9 <figure><pre><code>#define CHARSET_EASY_MODE_EX(MOD, in, inset, outset, x) \ | |
10 do { \ | |
11 MOD uint8_t* out; \ | |
12 charset_error_t err = charset_iconv(in, (uint8_t**)&out, inset, outset); \ | |
13 if (err) \ | |
14 out = in; \ | |
15 \ | |
16 x \ | |
17 \ | |
18 if (!err) \ | |
19 free((uint8_t*)out); \ | |
20 } while (0) | |
21 </code></pre></figure> | |
22 <span>I just shoved this macro anywhere necessary and it works perfectly fine for loading any Unicode path. For example, the Spanish word "mañana" gets displayed correctly now:</span> | |
23 <br><br> | |
24 <img class="drop-shadow-box center-image" src="/media/blog/schism-spanish-file-listing.png"> | |
25 <br> | |
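<span>To make the convert, draw, free pattern a bit more concrete, here's a rough standalone sketch of the same shape written as a plain function. Everything in it is made up for illustration: <code>toy_utf8_to_cp437</code> is a hypothetical stand-in that only knows ASCII and ñ (it is not Schism's <code>charset_iconv</code>), and <code>drawn_length</code> just counts bytes where a real caller would draw text.</span>

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter: handles ASCII plus U+00F1 only.
 * Returns 0 on success (and *out is malloc'd), nonzero on failure. */
int toy_utf8_to_cp437(const uint8_t *in, uint8_t **out)
{
    size_t len = strlen((const char *)in);
    uint8_t *buf = malloc(len + 1); /* output is never longer than the input */
    size_t i = 0, o = 0;

    if (!buf)
        return 1;

    while (i < len) {
        if (in[i] < 0x80) {
            buf[o++] = in[i++];          /* ASCII maps to itself */
        } else if (in[i] == 0xC3 && in[i + 1] == 0xB1) {
            buf[o++] = 0xA4;             /* U+00F1 is byte 0xA4 in CP437 */
            i += 2;
        } else {
            free(buf);                   /* anything else: report failure */
            return 1;
        }
    }

    buf[o] = '\0';
    *out = buf;
    return 0;
}

/* Same shape as the macro: convert, fall back to the input on failure,
 * use the result, then free it only if the conversion allocated it. */
size_t drawn_length(const uint8_t *utf8)
{
    uint8_t *converted = NULL;
    int err = toy_utf8_to_cp437(utf8, &converted);
    const uint8_t *text = err ? utf8 : converted;

    size_t cells = strlen((const char *)text); /* stand-in for a draw call */

    if (!err)
        free(converted);

    return cells;
}
```

<span>With this, "mañana" (7 bytes of UTF-8) comes out as 6 CP437 bytes, and a string the toy converter can't handle is simply passed through untouched instead of failing.</span>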
26 <span>The file sorting algorithms were a different beast though, and even now strverscmp doesn't have a real charset-independent variant. For strcasecmp, I had to implement (simple) Unicode case folding, which meant having a <a class="prettylink" href="https://github.com/schismtracker/schismtracker/blob/b858a5917ee7e83f7cb4da1ad698dd24159f241b/schism/charset_data.c#L183">switch statement that is almost 1500 lines long</a> and takes up about 20 KB of space in the binary.</span> | |
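<span>For a feel of what simple case folding looks like, here's a tiny sketch of the idea: decode each UTF-8 code point, map it through a one-to-one fold table, and compare the results. The function names (<code>utf8_next</code>, <code>fold_simple</code>, <code>utf8_casecmp</code>) are invented for this example, the decoder assumes well-formed input, and the table only has a few entries where the real one has well over a thousand.</span>

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at *s and advance *s past it.
 * Assumes well-formed input; this sketch does no validation. */
uint32_t utf8_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t c = *p++;
    int extra = 0;

    if (c >= 0xF0)      { c &= 0x07; extra = 3; }
    else if (c >= 0xE0) { c &= 0x0F; extra = 2; }
    else if (c >= 0xC0) { c &= 0x1F; extra = 1; }

    while (extra-- > 0 && *p)
        c = (c << 6) | (*p++ & 0x3F);

    *s = p;
    return c;
}

/* Simple (one-to-one) case folding for a handful of code points.
 * These entries are only illustrative. */
uint32_t fold_simple(uint32_t c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 0x20;

    switch (c) {
    case 0x00C9: return 0x00E9; /* LATIN CAPITAL LETTER E WITH ACUTE */
    case 0x00D1: return 0x00F1; /* LATIN CAPITAL LETTER N WITH TILDE */
    case 0x0410: return 0x0430; /* CYRILLIC CAPITAL LETTER A */
    default:     return c;
    }
}

/* strcasecmp-like comparison over UTF-8, folding each code point first. */
int utf8_casecmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;;) {
        uint32_t ca = fold_simple(utf8_next(&pa));
        uint32_t cb = fold_simple(utf8_next(&pb));

        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0; /* both strings ended together */
    }
}
```

<span>Under those assumptions, "MAÑANA" and "mañana" compare equal. Full case folding (where one code point can fold to several) is deliberately left out, same as in the "simple" folding described above.</span>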
27 <br><br> | |
28 <span>Schism currently does not do any Unicode normalization when comparing strings. This is primarily a problem with decomposed strings (which will likely not get converted properly either), though filenames like that probably shouldn't exist anyway...</span> | |
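<span>To illustrate why skipping normalization matters, here's a small sketch with the same word in both forms. The names (<code>manana_nfc</code>, <code>manana_nfd</code>, <code>same_bytes</code>) are made up for the example; the point is only that a comparison without normalization sees two different byte sequences.</span>

```c
#include <string.h>

/* "mañana" with a precomposed U+00F1 (NFC form): 7 bytes of UTF-8. */
const char *const manana_nfc = "ma\xC3\xB1" "ana";

/* The same word as n followed by U+0303 COMBINING TILDE (NFD form): 8 bytes. */
const char *const manana_nfd = "man\xCC\x83" "ana";

/* Byte-wise equality, which is all a comparison without normalization
 * effectively sees. Returns 1 if the strings are identical, else 0. */
int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

<span>Both strings render as the same word, but <code>same_bytes(manana_nfc, manana_nfd)</code> returns 0, so they would sort and match as different names unless something normalizes them first.</span>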
29 <br><br> | |
30 <span>anyway, Unicode is easy, if you can't use it properly it's a skill issue :p</span> |