Font#text_width erratic with non-standard characters (Ruby)

By erisdiscord Date 2010-10-22 16:00

If you're on Mac OS X, I believe it uses Unicode normalisation form D, where accented characters are stored as the base letter itself followed by combining forms of, e.g., ´, ¨, ˚, etc. In this case, Font#text_width (or the underlying library) might be incorrectly calculating the width by adding the "width" of the combining characters, giving you a result that is too wide.

Not that this is going to help you much, but it's a possible explanation. Probably something for Julian to look into.

By Spooner Date 2010-10-22 16:38

I'm on Win7x64, but that does sound like a reasonable explanation.

By erisdiscord Date 2010-10-22 16:57

Hmm. I get the impression that Windows prefers one of the composed forms, but ostensibly it supports all of them. I haven't found any resources that actually say for sure which is the preferred form, though!

What happens when you write "\u00F6" in a string literal instead of typing ö directly? What about "o\u0308"? These are equivalent in Unicode, but one is precomposed and the other is not.
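To make the difference between the two spellings concrete, here is a minimal sketch in plain Ruby (no Gosu involved). It assumes a Ruby new enough to have String#unicode_normalize (2.2 or later), which is newer than the Ruby discussed in this thread; the variable names are only for illustration.

```ruby
# Two spellings of the same visible character, "ö".
precomposed = "\u00F6"   # single code point: LATIN SMALL LETTER O WITH DIAERESIS
decomposed  = "o\u0308"  # "o" followed by COMBINING DIAERESIS

# As raw strings they differ: different code points and byte counts.
same_as_strings = (precomposed == decomposed)   # false
precomposed_len = precomposed.length            # 1 character, 2 bytes in UTF-8
decomposed_len  = decomposed.length             # 2 characters, 3 bytes in UTF-8

# Normalising to a composed form (NFC) makes them compare equal.
# Assumption: String#unicode_normalize is available (Ruby 2.2+).
same_after_nfc = (decomposed.unicode_normalize(:nfc) == precomposed)

puts same_as_strings, precomposed_len, decomposed_len, same_after_nfc
```

If a width calculation simply sums per-code-point widths, the decomposed spelling would plausibly come out wider than the precomposed one, which is the kind of discrepancy guessed at above.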

By Spooner Date 2010-10-22 17:13

This is based on typing them in via the Gosu TextInput (using alt+0246, because I just have a UK keyboard). Maybe I should try with a different keyboard layout?

However, I haven't set the encoding in my ruby app. That may be a contributing factor? I'll try setting UTF-8 everywhere and see if it helps.

OK, I'll get back to you to see if either of those or the suggestion you made works. Thanks for the explanations!

By Spooner Date 2010-10-22 19:09

OK, UTF-8ed everything.

Tried "\u00F6" displays correctly, Font#text_width value is too wide, and when edited in a TextInput, crashes Gosu (I assume it is editing in 8-bit, so you end up with a null if you delete the F6 part).

Tried "o\u0308" and get instant crash when trying to display it (or maybe when I try Font#text_width?).

Tried "ö" (entered in Ruby string), gets error: "invalid multibyte char (UTF-8)"

Tried "ö" (alt+0246 entered into TextInput) displays correctly, Font#text_width value is too wide, editing OK.

Tried "\224" which works in IRB, but shows a vertical line in Gosu :(

The key problem, however, is that I need to read/write these from a TextInput, so I have to live with the codes it gives me. Hmm, its strings are returned as "ASCII-8BIT", so maybe I should use that.

I tried the tests again in "ASCII-8bit" and that had slightly different failures. I think I'm missing something crucial here...

i18n is a pain!

By erisdiscord Date 2010-10-22 21:54

It turns out that "ö" and "\u00F6" display and calculate correctly on OS X; the decomposed form, "o\u0308", renders an o followed by a blank space and calculates the width "correctly" for that incorrect rendering. Your problem with Font#text_width might be a Windows specific bug at any rate.

To run your script with a literal ö you'll need to invoke Ruby with -KUTF-8, otherwise you get that invalid multibyte sequence error.

I haven't tried these with TextInput yet.

By Spooner Date 2010-10-22 22:51

I'm using the "# encoding: UTF-8" setting in Ruby 1.9.2 (I am not trying to be compatible with other versions). As I say, the issue is that the text being edited by the TextInput is in ASCII-8bit encoding and that is what I need to measure. It won't be a problem for the base game unless someone writes more i18n files, which is unlikely :P

jlnr said he'd look into it (IRC), so I'm going to stay hopeful of that working out.

By jlnr (dev) Date 2010-11-10 16:07

Can you post your exact editor, operating system and Ruby version? I tried my best to reproduce this on Windows but couldn't. I used Notepad++ and saved as "UTF-8 with no BOM", and both characters in the source files as well as characters entered via TextInput have the expected width.

By Spooner Date 2010-11-10 16:48

The encoding that the .rb file is saved in is irrelevant, as far as I'm aware, since it is loaded up into Ruby and all strings are converted to Ruby's default encoding unless you specify an encoding at the top of the file ("# encoding: whatever").

This example is saved in standard ANSI, but uses ascii-8bit so I can write the characters in the editor. I'm using Ruby 1.9.2 with win7 x64. When I use the standard Ruby encoding (second file in gist) it works fine.

https://gist.github.com/671052

Hmm, thinking further, however, I realise I don't actually need to put non-English characters in my ruby file (I'm using i18n gem) so it is sort of irrelevant in real usage. Sorry! Still, some understanding of why this fails might help...

By Spooner Date 2010-11-10 18:27 Edited 2010-11-10 19:39

Right, think I've got this down.

The string returned from TextInput.text is encoded as ASCII-8-bit, even though it is actually a UTF-8 string. Thus, when I #each_char it, it gives me every byte, which is not what I want and gives me a spurious width. So, I used TextInput.text.force_encoding('UTF-8') which then works correctly with #text_width ("ö" iterates to "\u00F6" which is correct and text_input is happy with that).
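A minimal sketch of that effect without Gosu: the variable raw below is a stand-in for what TextInput#text reportedly returns here (UTF-8 bytes tagged as ASCII-8BIT). That stand-in is an assumption for illustration, not output captured from Gosu.

```ruby
# Stand-in for TextInput#text: the UTF-8 bytes of "ö" (0xC3 0xB6),
# but tagged with the binary encoding ASCII-8BIT.
raw = "\xC3\xB6".dup.force_encoding("ASCII-8BIT")

# Iterating a binary string yields one "character" per byte.
chars_as_binary = raw.each_char.to_a.length    # 2

# Re-tagging the same bytes as UTF-8 changes no data, only how Ruby reads it.
fixed = raw.dup.force_encoding("UTF-8")
chars_as_utf8 = fixed.each_char.to_a.length    # 1

puts chars_as_binary, chars_as_utf8, fixed == "\u00F6"
```

force_encoding only relabels the bytes; it does not transcode them, which is why it suits a string that is already UTF-8 underneath. String#encode would be the wrong tool in this situation.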

I'd suggest that you manually specify the encoding of the string as UTF-8 at your end (inside TextInput class somewhere), since that is what it is, not what it is encoded as.

EDIT: Apparently, caret_pos also treats the string as ASCII-8bit, since if I _just_ type 'ö', caret_pos goes from 0 to 2! When I then press left or right, the caret_pos moves from 0<->2. I'll leave you to work out why, though I'm sure it would fix itself if the string is in UTF-8 encoding :)
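As a stopgap on the Ruby side, a byte-based caret position can be turned into a character index by counting the characters in the bytes before it. The helper below is hypothetical (char_index_for is not part of Gosu), assumes the caret always sits on a character boundary, and needs String#byteslice (Ruby 1.9.3 or later).

```ruby
# Hypothetical helper, not part of Gosu: convert a byte offset into a
# character index, assuming raw_text holds UTF-8 bytes and byte_pos
# falls on a character boundary.
def char_index_for(raw_text, byte_pos)
  raw_text.byteslice(0, byte_pos).force_encoding("UTF-8").length
end

# "aöb" as UTF-8 bytes tagged ASCII-8BIT: a (1 byte), ö (2 bytes), b (1 byte).
raw = "a\u00F6b".dup.force_encoding("ASCII-8BIT")

# Byte offsets 0, 1, 3, 4 are the valid caret positions in this string.
positions = [0, 1, 3, 4].map { |byte_pos| char_index_for(raw, byte_pos) }
puts positions.inspect
```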

By jlnr (dev) Date 2010-11-10 21:39

Ah, ok. Yeah, the last bit is intentional because this is the only way it can reasonably work in 1.8, but it needs to behave differently in 1.9. Good thing they're two binary gems and I can change that bit easily. Thanks, I'll give it a try. :)