Any developer working with php and MySQL will have come across this issue whilst developing.
And often the problem doesn’t rear it’s ugly head until after you’ve captured a lot of data.
If your getting strange characters in your database like question marks and , then follow this simple guide to check each part is set to UTF8.
UT8 is best practice for handling all types of characters and that’s the charset we will be using.
If your just trying to retrieve the unreadable data then see a solution at the bottom of this post.
First let’s check the MySQL server, open up your favourite GUI like MySQL Tools or via command line and run:
SHOW SESSION VARIABLES LIKE 'character_set%';
The important parts to notice here
These should all be utf8 , if they aren’t then you need configure your database to use utf8.
You can reverse engineer your table using
SHOW CREATE TABLE table_name;
At the very end you will see “DEFAULT CHARSET=charset” change this to utf8.
As further final checks run
show variables like "collation_database";
This should show ‘utf8_general_ci’
And you can check each column individually like so.
SHOW FULL COLUMNS FROM table_name;
All connections to MySQL need to use utf8, you can do this by using SET NAMES.
For PDO connections you can either include this in the SQL statement before “SET NAMES UTF8;” or you can include it in the connection e.g
$pdo = new PDO('mysql:host=myhost.com;dbname=db_name', 'db_user', 'db_password',array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8");
For standard mysqli connections use:
It’s important you place that straight after the connection before any queries are run.
What to do if you haven’t set this up from the start?
Thankfully there is a way to retrieve data that might otherwise look garbled in php you can use:
//ISO code for most browsers default encoding
Note that you must connect to the DB the way you where doing before you fixed any issues. If you try and retrieve data using SET NAME utf8 then the above will not work as your requesting the data in a different format.
ISO-8859-1 is the character encoding used for most browsers, however there are others so this may not work in every case.