[isf-wifidog] UTF-8 codebase and database

Mina Naguib webmaster at topfx.com
Sat Apr 2 11:34:18 EST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Francois Proulx wrote:
> WARNING TO ALL DEVELOPERS WITH CVS ACCESS
> As of yesterday's CVS commit (4:30 pm EDT), the codebase and the database schema are 100% UTF-8. We did this because we were starting to see inconsistencies: I translated the e-mail messages into French, and we had to make sure the whole system was either ISO-8859-1 or UTF-8. The HTML encoding is now UTF-8, and the .po file now contains UTF-8 characters (the e-mails are sent as text/plain with UTF-8 headers). I'll have to modify the gen.sh script to match UTF-8 on Monday...
> 
> WHAT THIS MEANS is that, as of now, everybody MUST use a UTF-8 compliant editor if they want to commit their work. Eclipse does this job beautifully (check the Tools...Preferences menu, I think). .po files should be edited with translation tools like KBabel to ensure valid UTF-8 encoding. In the near future, I suggest we change all the HTML entities in the .po file to UTF-8 characters (should be fairly easy with find and replace in KBabel). Using HTML entities prevents us from using spell checking, etc., and we can't reuse existing translations in e-mails, because they are sent as text/plain.
> 
> P.S.: Converting the live database from ISO-8859-1 to UTF-8 created possible MD5 password hash mismatches (if any user had a password containing special characters). That's why we now have a "legacy system" hack! Duh... :-( We must recode the password values we receive in UTF-8 to ISO-8859-1 so that they match the existing MD5 hashes in the database.
> 
> Keep UTF-8 in mind!
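
On the password P.S. above, here is a minimal sketch of what such a
legacy check could look like. It is written in Python, not taken from
the WiFiDog code. The function names are hypothetical, and it assumes
the database stores hex-encoded MD5 digests of the ISO-8859-1 bytes.

```python
import hashlib


def legacy_md5(password_utf8: bytes) -> str:
    """Hash a UTF-8 password the way the old ISO-8859-1 system would have."""
    text = password_utf8.decode("utf-8")
    # Raises UnicodeEncodeError if a character has no ISO-8859-1 equivalent.
    legacy_bytes = text.encode("iso-8859-1")
    return hashlib.md5(legacy_bytes).hexdigest()


def matches_legacy_hash(password_utf8: bytes, stored_md5_hex: str) -> bool:
    """Compare a password received as UTF-8 against a pre-migration hash."""
    try:
        return legacy_md5(password_utf8) == stored_md5_hex.lower()
    except UnicodeError:
        # Invalid UTF-8 input, or a character the legacy encoding cannot
        # represent: such a password cannot match a legacy hash.
        return False
```

Pure-ASCII passwords hash identically under both encodings, so only
passwords with accented or other special characters are affected.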

Slightly off-topic

UTF-8/Unicode is often considered a black art, but it doesn't have to be.
If anyone needs a nice, readable tutorial, check out this URL:

http://www.joelonsoftware.com/articles/Unicode.html

"The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)"

Also, in case you haven't noticed, most Unices come with the "iconv"
library and the accompanying "iconv" binary.  They make converting
between different character sets a breeze.  Most high-level languages
also have bindings for the iconv library.
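
For instance, on the command line "iconv -f ISO-8859-1 -t UTF-8 old.txt >
new.txt" converts a file (the file names are placeholders). The same
kind of conversion can be sketched in Python using its built-in codecs,
which do not go through iconv itself; "recode" is a hypothetical helper
name chosen for illustration.

```python
def recode(data: bytes, from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> bytes:
    """Convert raw bytes from one character encoding to another."""
    # Decode to Unicode text first, then encode into the target charset.
    # Either step raises a UnicodeError subclass if the data does not fit.
    return data.decode(from_enc).encode(to_enc)


# "café" in ISO-8859-1 is 4 bytes; in UTF-8 the "é" takes 2 bytes.
latin1_bytes = b"caf\xe9"
utf8_bytes = recode(latin1_bytes)
```

Going the other way (UTF-8 to ISO-8859-1) fails for any character that
ISO-8859-1 cannot represent, so callers may want to catch UnicodeError.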

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCTsmKeS99pGMif6wRAqeBAJwKeQQ7Xp5IwmjhZLgk31yRHIOShgCffR7w
xtGX0gXa7/Gr9va3vi20i3Q=
=b4dn
-----END PGP SIGNATURE-----

