TwitterFacebookGoogleLinkedInEmail

Moving MySQL database from LATIN1 to UTF-8 the easy way

I am a bad human being.  I confess that 7 years ago, when I was first putting together the Online Talent Manager personality testing system, I (I’m so ashamed to say this)  used ISO-8859-15 encoding in the MySQL database!  The first 6 months were fantastic, the application grew by leaps and bounds and the amount of data collected was impressive.

If you are a system’s administrator, you know exactly what has happened to me in the last 7 years.  A series of ‘little’ encoding problems with half-hearted fixes and a series of failed conversion schemes to ‘fix’ the problem.  Every new table, every new feature, every new anything became a two faze process:

  1. Build it.
  2. Play whack-a-mole with badly encoded characters.

And it wasn’t that the characters were badly encoded, it was just that I had to fit those characters into the bizarro scheme that had crudded up my applications over the years.  I thought many times about fixing it, really fixing it, and moving everything to UTF-8, but if you have seen any other posts on the internet about this topic, you know that there are horror stories enough to make a grown administrator quake in his sneakers.

But now, I HAVE DONE IT.  And it wasn’t really that painful or weird.  I’m assuming you are using MySQL and Apache on a flavor of Linux.  Here are the steps.

1)  Create a dump of your current database:

mysqldump -u youruser -pyourpass yourdatabasename > yourdatabase.sql

2) Shut down your web application and go through it removing ISO-8859 or any other encoding schemes from your pages and replacing everything with UTF-8.  If you are using some sort of templating system or CMS, this will be a breeze for you.

3) Now let’s clean up that .sql file with the iconv application.  Make sure you have it installed ( man iconv ).  Here is the formula to take your original dump and clean out all of the latin and replace it with gorgeous clean UTF8:

iconv –from-code LATIN1 –to-code UTF-8 yourdatabase.sql > utfclean_yourdatabase.sql

4) Open up your utfclean_yourdatabase.sql and replace all instances of “latin1″ with “utf8″.  I used vim on my 350 mb sql file and it took about 3 minutes:

vim utfclean_yourdatabase.sql

:%s/latin1/utf8/

:wq

5) Edit the mysql configuration file, in my Ubuntu installation, it was located at /etc/mysql/my.cnf.  Add the following line:

character-set-server=utf8

6) Make sure apache is talking utf8 by default.  Again, for me that was located in the /etc/apache2/conf.d/charset file.  I simply removed the comment from the following line:

AddDefaultCharset UTF-8

7) Restart Apache   /etc/init.d/apache2 restart

8) Restart MySQL /etc/init.d/mysql restart

9) Import your cleaned data back into the database:

mysql -u yourname -pyourpassword yourdatabase < utfclean_yourdatabase.sql

10) Your connections are all talking utf8 now, your database is utf8 now, and now it is just a case of cleaning up your own code to remove any hacks you might have installed.  That took about 2 hours for me and mostly it was just commenting out calls to encoding routines.

11) Enjoy your freedom.

 

Leave a comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

CyberChimps