Common MySQL traps

Published 6/9/2019

Note: Some of these might also exist in other database systems.

utf8 is not actually utf8

Ever ran into this SQL exception? Incorrect string value: โ€˜\xF0\x9F\x98\x83 <โ€ฆโ€™ for column...

You probably ran into the same trap as many others. Turns out mySQL's utf8 charset can only store around 6% of all possible unicode code points. This also excludes emoticons for example.

insert into emails(body) values ('๐Ÿฆ„๐Ÿฆ„๐Ÿฆ„');

That's why you should use utf8mb4 instead.

Read more about it and how to migrate: https://mathiasbynens.be/notes/mysql-utf8mb4

Check out my e-book!

Learn to simplify day-to-day code and the balance between over- and under-engineering.

comparing varchar_field with false or 0

Imagine the following contrived example

function normalizeEmail(email) {
    if (!validate(email)) {
        return false
    }

    return normalize(email)
}

// somewhere in the code
await User.where('email', normalizeEmail(email)).first()

The imaginary ORM would execute the query select * from users where email = <prepared value> LIMIT 1

If the validation inside normalizeEmail was successful, the query would be select * from users where email = '[email protected]' LIMIT 1.

If the validation was not successful, the query would be select * from users where email = false LIMIT 1.

Now go ahead and run something like select * from users where <varchar field> = false in your database system.

Since the fields are not comparable, MySQL will convert one to the other, make them match and return you all users. Our ORM will simply pick the first user and continue the logic with that. ๐Ÿ˜ฌ Pretty dangerous.

The same happens with field = 0

insert on duplicate key update creates primary key holes

Imagine we have a table statistics with the columns id(AI), fkid, title.

INSERT INTO statistics (fkid, title) VALUES (1, 'first');

This will insert a new record with the id 1. Let's imagine there is some batch which always inserts or updates the title. It might execute the following queries over some time:

INSERT INTO statistics (fkid, title) VALUES (1, 'second') ON DUPLICATE KEY UPDATE title = 'second';
INSERT INTO statistics (fkid, title) VALUES (1, 'third') ON DUPLICATE KEY UPDATE title = 'third';
INSERT INTO statistics (fkid, title) VALUES (1, 'fourth') ON DUPLICATE KEY UPDATE title = 'fourth';

Finally we want to insert a record with a new fkid.

INSERT INTO statistics (fkid, title) VALUES (100, 'first') ON DUPLICATE KEY UPDATE title = 'first';

This inserts a new record in the table, but guess what the id is? You'd might expect it to be 2, but in reality, everytime insert on duplicate key update failed to insert and processed the update, it incremented the auto increment internally. That means the id of this record would be 5.

It's not too important that the IDs are really sequential, but you might wonder what's the cause of it.

Read more about it: https://stackoverflow.com/questions/38347110/prevent-innodb-auto-increment-on-duplicate-key

int(2) is not what you think it means

The length on INTs have no real meaning in MySQL, you can still insert values like 9999999. They only restrict how many characters should be displayed in the command line client. ๐Ÿคจ

Read more about it: https://stackoverflow.com/questions/5634104/what-is-the-size-of-column-of-int11-in-mysql-in-bytes