in Programming, Web Development

Character encoding issues utf8 & MySQL – How To

Any developer working with php and MySQL will have come across this issue whilst developing.
And often the problem doesn’t rear it’s ugly head until after you’ve captured a lot of data.

If your getting strange characters in your database like question marks and , then follow this simple guide to check each part is set to UTF8.

UT8 is best practice for handling all types of characters and that’s the charset we will be using.

If your just trying to retrieve the unreadable data then see a solution at  the bottom of this post.

MySQL DB

First let’s check the MySQL server, open up your favourite GUI like MySQL Tools or via command line and run:

SHOW SESSION VARIABLES LIKE 'character_set%';

charset

The important parts to notice here

  • character_set_database
  • character_set_results
  • character_set_connection.

These should all be utf8 , if they aren’t then you need configure your database to use utf8.

You can reverse engineer your table using

SHOW CREATE TABLE table_name;

At the very end you will see “DEFAULT CHARSET=charset” change this to utf8.

As further final checks run

show variables like "collation_database";

This should show ‘utf8_general_ci’

And you can check each column individually like so.

SHOW FULL COLUMNS FROM table_name;

PHP Connection

All connections to MySQL need to use utf8, you can do this by using SET NAMES.

For PDO connections you can either include this in the SQL statement before “SET NAMES UTF8;” or you can include it in the connection e.g

$pdo = new PDO('mysql:host=myhost.com;dbname=db_name', 'db_user', 'db_password',array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8");

For standard mysqli connections use:

$mysqli->set_charset("utf8");

It’s important you place that straight after the connection before any queries are run.

What to do if you haven’t set this up from the start?

Thankfully there is a way to retrieve data that might otherwise look garbled in php you can use:


//ISO code for most browsers default encoding

mb_convert_encoding($text, "ISO-8859-1")

Note that you must connect to the DB the way you where doing before you fixed any issues. If you try and retrieve data using SET NAME utf8 then the above will not work as your requesting the data in a different format.

ISO-8859-1 is the character encoding used for most browsers, however there are others so this may not work in every case.

Write a Comment

Comment