GPUTF8

[GameMaker] Shader-based UTF8 character string decoder.

This asset decodes given UTF8 character string (variable-length encoding) into UTF8 codepoints (fixed-length integer), which allows constant-time access to characters.

Using

There are few different Decoder for different input and outputs - but they behave the same.

As this uses shaders for decoding, you need to call the functions within Draw-event.

// Decode the UTF8 characters into codepoints.
// Then you may read codepoints anywhere in constant time.
var _buffer = GPUTF8_DecodeStringToBuffer(_string);
repeat(128)
{
  var _index = irandom(string_length(_string) - 1);
  var _offset = _index * buffer_sizeof(buffer_u32);
  var _codepoint = buffer_peek(_buffer, _offset, buffer_u32);
  show_debug_message($"{_index}th codepoint was: {_codepoint}");
}
// GameMaker allows accessing codepoints directly by using string_ord_at(...)
// -> But this is O(N) time; longer the string, longer it takes to look it up.
// -> Therefore accessing codepoints with large strings it can be slow with builtin function.
repeat(128)
{
  var _index = irandom(string_length(_string) - 1);
  var _codepointB = string_ord_at(_string, _index + 1);
  show_debug_message($"{_index}th codepoint was: {_codepoint}");
}


About implementation

This doesn't use floating-point textures, which would have made implementation easier and more efficient - but should allow broader support. As this uses rgba8unorm-surfaces instead, each pass in practice require encoding/decoding pixel information in some way or another.

After decoding UTF8 characters into codepoints, there are empty spaces between characters (if they required more than 1 byte).  Surfaces are compactified by doing prefix scan to generate indexes, and then binary search to find correct value for output position. 

Instead of binary search, point-vertices and VTF could have been used to do in constant time, but Windows export doesn't support VTF without GMD3D11 or similary extensions. To keep this asset dependency-free, I decided to use binary search instead. 

Also, Shady would have been great asset to use here, which allows importing functions to shaders. Now the functions between shaders are manually copy-pasted, which is prone to errors. 

Internally GPUTF8 creates surfaces and buffers whenever required, and destroys them when not needed. So it might not be great idea to spam it constantly. I could have made smarter solution, but eh.


Itchio example is running on HTML5 export.

To try GX/WASM export, you can test on GX games.

GPUTF8 in GX.games

Here is link to GitHub page:

GPUTF8 in GitHub


Known bugs

By time of writing, GameMaker's HTML5 export has it's unique problems, and during writing this asset I found couple of them related to buffers.

  • In HTML5 buffer_write doesn't write 4-byte UTF8 characters properly - garbling those characters.
  • In HTML5 buffer_poke misses some bytes with UTF8 characters, which use more than 1 byte (non-ASCII).

In practice, when using HTML5 avoid using buffer_poke with non-ASCII string, and assume emojis etc. 4-byte UTF8 characters don't work.

Published 12 hours ago
StatusReleased
CategoryTool
PlatformsHTML5, Windows
AuthorTeroHannula
Made withGameMaker
Tagscodepoint, GameMaker, gpu, Shaders, string, tool, utf8
ContentNo generative AI was used

Download

Download NowName your own price

Click download now to get access to the following files:

GPUTF8_Package.yymps 24 kB
GPUTF8_CompressedProject.yyz 200 kB
GPUTF8_WASM.zip 2 MB
GPUTF8_Windows.zip 3.1 MB

Leave a comment

Log in with itch.io to leave a comment.