find the most similar unicode characters based on a sketch.
2 years agoNovember 11, 2011
Many languages around the world use the familiar Latin alphabet (A-Z), but in order to represent the sounds of the language accurately, their writing systems employ diacritical marks and other special characters.
Speakers of these languages have difficulty entering text into a computer because keyboards are often not available, and even when they are, typing special characters can be slow and cumbersome.
Also, in many cases, speakers may not be completely familiar with the “correct” writing system and may not always know where the special characters belong. The end result is that for many languages, the texts people type in emails, blogs, and social networking sites are left as plain ASCII, omitting any special characters, and leading to ambiguities and confusion.
To solve this problem, we have created a free and open source Firefox add-on called Accentuate.us that allows users to type texts in plain ASCII, and then automatically adds all diacritics and special characters in the correct places–a process we call “Unicodification”. Accentuate.us uses a machine learning approach, employing both character-level and word-level models trained on data crawled from the web for more than 100 languages.