Wait, hold on, I just realized.

jannem@fosstodon.org

@krans @mcc
The bigger problem is that on the web and in apps there's usually no information about what language something is written in. This means a browser or app can only guess which font to use when rendering Unicode Han characters. And when a user has installed support for more than one language, it is certain to go wrong frequently.

Edit: you don't need to know the language to always render "ä" correctly. You do need to know the language in order to render "骨".
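A quick way to see the point about "ä" versus "骨", as a minimal sketch assuming a Python 3 interpreter with the standard `unicodedata` module: each is a single code point, and nothing in the string records whether "骨" was typed as Chinese or Japanese. The helper name `describe` is illustrative only.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# "ä" looks the same in every language that uses it.
print(describe("\u00e4"))
# "骨" is one unified code point shared by Chinese, Japanese, etc.;
# its preferred glyph differs by locale, but the string carries no language.
print(describe("\u9aa8"))

chinese_text = "\u9aa8"   # imagine this came from a Chinese post
japanese_text = "\u9aa8"  # imagine this came from a Japanese post
# The two strings are identical, so a renderer cannot pick a font from the text alone.
print(chinese_text == japanese_text)
```

The language has to arrive out of band, for example as a tag on the post or the surrounding markup.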

krans@mastodon.me.uk

@jannem I agree. The root cause is that file formats, protocols and most programs are written almost entirely by English-speakers, who assume that only English-speaking people use computers and that all content will be in English.

For my entire lifetime, support for multilingual text has always been an afterthought — and many development frameworks make it incredibly difficult.

@mcc

krans@mastodon.me.uk

@jannem Also: “rendering” is necessary, but not sufficient. Collation, dictionary selection, punctuation, text-to-speech, etc. are all language-dependent.

@mcc

0xabad1dea@infosec.exchange

@mcc unfortunately there’s not really a good solution to this problem, and Android, like everyone else, just has to pick a resolution method and stick with it. If you’ve heard of “Han Unification,” well, it sounds like something that happened violently in 2200 BC, but it actually happened quite recently in a Unicode meeting room, and it causes this exact, specific, intractable issue.

noone2333@mastodon.social

@mcc You can just say 八人.
个 isn't needed here. It’s cleaner and more natural without it, especially in short, poetic or title-like phrases.

simon@tutut.delire.party

@mcc i once got homework graded as incorrect because the japanese dictionary website i used did not use "lang" html attributes and firefox ended up selecting a korean font
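For context on the `lang` attribute mentioned above: browsers typically use it when choosing a font for unified Han characters, and fall back to guessing from user or system settings when it is missing. A minimal sketch, assuming a server that knows each snippet's language; `with_lang` is a hypothetical helper, not part of any real site's code.

```python
from html import escape

def with_lang(text: str, lang: str) -> str:
    """Wrap text in a <span> carrying a BCP 47 language tag (hypothetical helper).

    Both the tag and the text are HTML-escaped so arbitrary input stays safe.
    """
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'

# Same character, two different language tags: the browser may pick different fonts.
print(with_lang("\u9aa8", "ja"))
print(with_lang("\u9aa8", "zh-Hans"))
```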

porglezomp@mastodon.social

@mcc I wonder if android has an API to indicate what language a text field is in? Phanpy web (iOS) handles the character variation just fine and I wonder if it’s because browsers let you set languages for text + it’s using the annotated post language?

porglezomp@mastodon.social

@mcc it seems like it is actually using the declared language on my end because if I switch the post language here to Japanese I see the Japanese variants of the characters, and if I switch it back to Chinese I see the Chinese variants of the characters.

Test post marked as Japanese: 八人入

misterdave@tilde.zone

@mcc @Heliograph @rk my mind went immediately to Knuth up-arrow, which gives numbers lots of friends

serriadh@social.treehouse.systems

@mcc do you ever feel like the time between you making a lighthearted shitpost and then uncovering a pit of writhing software horrors gets shorter every year.

ingalovinde@embracing.space

@jannem @mcc and it's like telling everybody in the West, plus the Greeks, plus everybody in Eastern Europe that actually all "A"s are the same character.
And that English "B" or "H" and Cyrillic "В" or "Н" are also the same (hint: these Cyrillic letters are actually for "v" and "n")
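The comparison works because Unicode did not unify lookalike letters across alphabets: Latin, Greek, and Cyrillic capitals that look identical still get separate code points. A minimal check, assuming Python 3's standard `unicodedata` module; the Cyrillic and Greek letters are written as escapes so they cannot be confused with their Latin twins.

```python
import unicodedata

# (Latin letter, lookalike from another alphabet)
pairs = [
    ("A", "\u0391"),  # Greek capital alpha
    ("A", "\u0410"),  # Cyrillic capital A
    ("B", "\u0412"),  # Cyrillic capital ve
    ("H", "\u041d"),  # Cyrillic capital en
]

for latin, other in pairs:
    # Each pair is visually alike but compares unequal as strings.
    print(unicodedata.name(latin), "vs", unicodedata.name(other), latin == other)
```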

abrasive@digipres.club

@0xabad1dea @mcc also an act of violence, I would argue

porglezomp@mastodon.social

@mcc I couldn’t find an Android API to do this right. I found what seems like a reasonable iOS API, but it doesn’t do what I was expecting, so I’m not actually sure it’s possible to do this well except with web technologies.

bigshellevent@toot.cat

@mcc you've been on Chinese fanfiction sites I see

slowtiger@berlin.social

@mcc @Heliograph @rk
It's a Totoro umbrella.

kranzi@mastodon.green

@mcc i think the first one is japanese, the second simplified chinese.

mcc@mastodon.social

@porglezomp I'm talking to someone and we think we found the android API

groxx@hachyderm.io

@0xabad1dea @mcc I suppose the only actually reliable approach would be to store the IME locale per character or something so that it can be accurately rendered as it was written... or are these truly identical graphemes, and there's no chance of confusion in context? Even when people use multiple languages simultaneously?

(late edit after reading a lot more: ah, I see they DID just add a variant-selector character to effectively specify the locale... that seems a bit unlikely to gain major use, but technically I like it I guess)

Maybe one day we'll have UTF-8-2 and it'll just be infinitely extendable, rather than using a limited length prefix.

mcc@mastodon.social

@groxx @0xabad1dea There are various existing solutions, but just because the solutions exist does not mean people follow them correctly.

groxx@hachyderm.io

@mcc @0xabad1dea definitely agreed. even technically, it seems very unlikely to me that any IME is going to choose to, like, add variant selectors *to every single character* and confuse their users when it's blended with other text or in a size-limited scenario. those characters already take up a ton of space, making it worse won't go over well.
