wwd.ca

 

mon petit blogue sans importance...

Some tips for Unicode django

As of rev 5609, the unicode branch of Django has been merged into trunk. It's mostly transparent, they say, but I've had a few difficulties with it. Here are a couple of simple tips to help.


First, it seems to help if you convert your templates, and even your Python files, to UTF-8. I of course use vim, and vim has very good Unicode support. If your LANG is set correctly, vim should use the proper encoding; in my case, that's fr_CA.UTF-8. You can add set enc=utf-8 to your ~/.vimrc to make double-sure.

UTF-8 is neat, because up until you write a non-ASCII character, the file will look, walk and talk like an ASCII file. The second you enter a character outside the basic ASCII range, it gets written as a multi-byte UTF-8 sequence. Hence the -8 in UTF-8: it works in 8-bit units, but it can still encode any Unicode character.
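To make that ASCII compatibility concrete, here is a small sketch in Python 3 (not the Python of this post's era, and the sample strings are only illustrations). It encodes a plain-ASCII word and an accented letter, then compares the bytes.

```python
# Pure-ASCII text produces exactly the same bytes in ASCII and in UTF-8.
plain = "blogue"
same_bytes = plain.encode("utf-8") == plain.encode("ascii")
print(same_bytes)  # True

# A non-ASCII character needs more than one byte in UTF-8.
accented = "\u00e9"  # the letter é
encoded = accented.encode("utf-8")
print(len(accented), len(encoded))  # 1 2
print(encoded)  # b'\xc3\xa9'
```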

So, in the shell, make sure your file is actually UTF-8 by doing something like:

esj@titan:520@~$ file yo
yo: UTF-8 Unicode text
Note that if you haven't put any non-ascii characters in there, it'll still say ASCII.
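If you'd rather check from Python, the same ASCII versus UTF-8 distinction can be roughly sketched like this. The describe_encoding helper is hypothetical (it is not part of file or of Django), it is written for Python 3, and it only tries two decodings, so treat it as an approximation of what file reports.

```python
def describe_encoding(data: bytes) -> str:
    """Roughly mimic the ASCII vs UTF-8 distinction that `file` reports."""
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Neither ASCII nor valid UTF-8 (for example, Latin-1 bytes).
        return "unknown"


print(describe_encoding(b"hello"))                   # ASCII
print(describe_encoding("h\u00e9".encode("utf-8")))  # UTF-8
print(describe_encoding(b"h\xe9"))                   # unknown
```

To run it on the file from the example above, you would read the file in binary mode and pass its contents to describe_encoding.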

Then, make sure the file is properly tagged inside as UTF-8. In Python, add this line:

# -*- coding: utf-8 -*-
As the first line of the file (or second, if your first is #!/usr/bin/env python). For templates, make sure there's a line in the head of the resulting HTML that says
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
(Django tells the browser in the response headers anyway, but I guess that if the page gets saved to disk, there had better be a tag in it). If all this is done properly, you don't have to escape characters, like &eacute; for an é, in your Python code, your templates, or your data.
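To see how Python picks up the coding line, here is a sketch using tokenize.detect_encoding from the Python 3 standard library. The declared_encoding wrapper and the sample sources are hypothetical. Keep in mind that Python 3 defaults to UTF-8 when no declaration is found, whereas the Python 2 of this post defaulted to ASCII.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python 3's tokenizer detects for this source."""
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Declaration on the second line, right after the shebang: it is honoured.
utf8_source = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nprint('hi')\n"
print(declared_encoding(utf8_source))  # utf-8

# A different declaration shows that the line really is being read.
latin1_source = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nprint('hi')\n"
print(declared_encoding(latin1_source))  # iso-8859-1

# Declared too late (third line): ignored, so the default applies.
late_source = b"import os\n\n# -*- coding: latin-1 -*-\n"
print(declared_encoding(late_source))  # utf-8
```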

One last thing that bit me in the rear: when Django objects get translated into strings, like for example an entry from this blog:

class Entry(models.Model):
    ...blabla...
    def __str__(self):
        return str(self.headline)
Or maybe you rely on headline's type (CharField) to translate itself into a str, and you don't call str at all. Either way, you'll get UnicodeError exceptions with non-ASCII data. With the unicode branch, headline is a unicode object, and Python 2's str() tries to encode it with the default ASCII codec, which throws its arms in the air and proclaims that it shouldn't be seeing characters >127 in the data. Do this instead:
from django.utils.encoding import smart_str
(...)
        return smart_str(self.headline)
and it works: smart_str encodes the value as a UTF-8 bytestring instead of going through ASCII.
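The difference between the two approaches can be sketched without Django. This is Python 3, and to_utf8_bytes is a hypothetical stand-in for the idea behind smart_str, not Django's actual implementation. The ASCII encode below imitates what Python 2's str() attempted on a unicode value, and the headline is made up.

```python
def to_utf8_bytes(value) -> bytes:
    """Hypothetical stand-in: always produce a UTF-8 bytestring."""
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")


headline = "Caf\u00e9 au lait"  # made-up headline with a non-ASCII character

# Imitate the failing path: an ASCII encode chokes on the é.
try:
    headline.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)                 # False
print(to_utf8_bytes(headline))  # b'Caf\xc3\xa9 au lait'
```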

by wiswaud on 7 August 2007
Tags: django, english, geeky, python, web
