Trouble with Drupal, MySQL and non UTF-8 formats

I just figured out a fix for an annoyingly circuitous, tortuous, and baffling MySQL problem. I am not yet an expert of back-end system configuration, but I can tool around a bit (i.e. I know where the my.cnf, php.ini, etc. files live and when (mostly) to edit them). With that...

On our multi-lingual, multi-site Drupal platform we were recently dealing with what looked to be formatting errors in character encoding. We have a process by which the site admins will upload information in their local language (and therefore character set). We recently added our first fully Cyrillic site for the Russian region. We were finding that the encoding for the string

Шпионка

was being stored (and thus displayed on the site) as

Шпионка


I was able to pinpoint that the issue was occurring at the point when the information was being stored into MySQL.

Then I went through a plethora of possible solutions failed attempts. Some of my thwarted efforts included:

  • Saving the input files in multiple different character encodings
  • Changing the collation values for the affected tables
  • PHP functions and libraries that deal with character formatting. Just a few of those attempted: mb_convert_encoding, utf8_encode, iconv

This included chasing all of the additional tips and tricks posted by other users on the respective manual pages.

I even tried downloading additional text editors, as I found a dis/encouraging post, that my current one, TextMate, didn't yet support the text encoding I (thought) was looking for.


After much reading, googling and numerous head vs. desk showdowns, I finally stumbled along this Drupal.org post. It is a little outdated (Drupal 4.7 was the stable release at the time I think); but with some updates for the currently supported Drupal functions I was able to add the following call right before I insert the data into MySQL:

// make sure format is in UTF format
db_query('SET NAMES "UTF8"');
db_query("SET collation_connection='utf8_general_ci'");
db_query("SET collation_server='utf8_general_ci'");
db_query("SET character_set_client='utf8'");
db_query("SET character_set_connection='utf8'");
db_query("SET character_set_results='utf8'");
db_query("SET character_set_server='utf8'");
// Insert!
drupal_write_record('your_table_name', $insert_var);

(I tried placing these calls into my.cnf, but was unable to get mysqld to restart)

Some additional info that I found useful from this post was in response #18, from which I grabbed a MySQL command-line call that I used to verify that my tables affected were set to UTF8 format:

ALTER TABLE my_table CONVERT TO CHARACTER SET 'utf8'

Also in my search to transcode the gibberish strings that were being stored in MySQL, I found a nifty Cyrillic gibberish translator. It also gives you insight as to what format your input code *may* be transcoded from.


Lesson to take home: ALWAYS use UTF8 encoding!

I am aware that these suggestions may not work for everyone in their respective environments; but I felt obliged to share my experience, if only to save some time with suggestions not to try. :)

Filed under:

Comments

Hello,
It seems that we are having the same issues. I'm building a website with drupal. It is not an english website. The problem is that any time I sent the latin characters, and save the page, it will change to something like "?",
It seems that you have fix your problem. Could you please give me solution on how to fix the issues. Thank you

Add new comment

Restricted HTML

  • Allowed HTML tags: <a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>
  • You can enable syntax highlighting of source code with the following tags: <code>, <blockcode>, <cpp>, <java>, <php>. The supported tag styles are: <foo>, [foo].
  • Web page addresses and email addresses turn into links automatically.
  • Lines and paragraphs break automatically.

Ready for transformation?