32 Tips To Speed Up Your MySQL Queries

If you are interested in writing fast MySQL queries, this article is for you.

  1. Use persistent connections to the database to avoid connection overhead.
  2. Check that all tables have PRIMARY KEYs on columns with high cardinality (many distinct values, so few rows match any given key value). A `gender` column has low cardinality (selectivity), while a unique user id column has high cardinality and is a good candidate for a primary key.
  3. All references between different tables should usually be done with indexes (which also means the columns must have identical data types, so that joins on them are fast). Also check that columns you often search on (those that appear frequently in WHERE, ORDER BY or GROUP BY clauses) have indexes, but don't add too many: the worst thing you can do is to add an index on every column of a table (I haven't seen a table that needed more than 5 indexes, even tables 20-30 columns wide). If you never refer to a column in comparisons, there's no need to index it.
  4. Using simpler permissions when you issue GRANT statements enables MySQL to reduce permission-checking overhead when clients execute statements.
  5. Use less RAM per row by declaring columns only as large as they need to be to hold the values stored in them.
  6. Use the leftmost index prefix: in MySQL you can define an index on several columns, and the left part of that index can be used as a separate index, so you need fewer indexes (see the sketch after this list).
  7. When your index consists of many columns, why not create a hash column which is short, reasonably unique, and indexed? Then your query will look like the sketch after this list.
  8. Consider running ANALYZE TABLE (or myisamchk --analyze from command line) on a table after it has been loaded with data to help MySQL better optimize queries.
  9. Use the CHAR type when possible (instead of VARCHAR, BLOB or TEXT) for values of constant length: an MD5 hash (32 characters), an ICAO or IATA airport code (4 and 3 characters), an ISO country code (2 characters), etc. Data in fixed-length CHAR columns can be found faster than data in variable-length columns.
  10. Don't split a table just because it has many columns. When accessing a row, the biggest performance hit is the disk seek needed to find the first byte of the row; splitting the table into several would add extra seeks and joins.
  11. Declare a column as NOT NULL if it never holds NULL values; this speeds up table traversal a bit.
  12. If you usually retrieve rows in the same order, like expr1, expr2, ..., run ALTER TABLE ... ORDER BY expr1, expr2, ... to optimize the table.
  13. Don't use a PHP loop to fetch rows from the database one by one just because you can; use IN() instead (see the sketch after this list).
  14. Use column default values, and insert only those values that differ from the defaults. This reduces the query parsing time.
  15. Use INSERT DELAYED or INSERT LOW_PRIORITY (for MyISAM) to write to your change log table. Also, if it’s MyISAM, you can add DELAY_KEY_WRITE=1 option — this makes index updates faster because they are not flushed to disk until the table is closed.
  16. Think of storing user session data (or any other non-critical data) in a MEMORY table; it's very fast.
  17. For your web application, images and other binary assets should normally be stored as files. That is, store only a reference to the file rather than the file itself in the database.
  18. If you have to store big amounts of textual data, consider using a BLOB column to hold compressed data (MySQL's COMPRESS() seems to be slow, so gzipping on the PHP side may help) and decompressing the contents on the application server side. Either way, it must be benchmarked.
  19. If you often need to calculate COUNT or SUM over a lot of rows (article ratings, poll votes, user registration counts, etc.), it makes sense to create a separate table and update the counter in real time, which is much faster (see the sketch after this list). If you need to collect statistics from huge log tables, use a summary table instead of scanning the entire log table every time.
  20. Don't use REPLACE (which is DELETE+INSERT and wastes ids): use INSERT ... ON DUPLICATE KEY UPDATE instead (i.e. INSERT + UPDATE if a conflict takes place; see the sketch after this list). The same technique works when you would otherwise run a SELECT first to find out if the data is already in the database and then run either INSERT or UPDATE. Why choose yourself? Rely on the database side.
  21. Tune MySQL caching: allocate enough memory for the buffer (e.g. SET GLOBAL query_cache_size = 1000000) and define query_cache_min_res_unit depending on average query resultset size.
  22. Divide complex queries into several simpler ones — they have more chances to be cached, so will be quicker.
  23. Group several similar INSERTs into one long INSERT with multiple VALUES lists to insert several rows at a time (see the sketch after this list): the query will be quicker because connecting, sending, and parsing a query takes 5-7 times as long as the actual data insertion (depending on row size). If that is not possible, use START TRANSACTION and COMMIT if your table is InnoDB; otherwise use LOCK TABLES. This benefits performance because the index buffer is flushed to disk only once, after all INSERT statements have completed; in the LOCK TABLES case, unlock your tables every 1000 rows or so to allow other threads access to the table.
  24. When loading a table from a text file, use LOAD DATA INFILE (or my tool for that); it's 20-100 times faster (see the sketch after this list).
  25. Log slow queries on your dev/beta environment and investigate them. This way you can catch queries whose execution time is high, queries that don't use indexes, and also slow administrative statements (like OPTIMIZE TABLE and ANALYZE TABLE).
  26. Tune your database server parameters: for example, increase buffer sizes.
  27. If you have lots of DELETEs in your application, or updates of dynamic-format rows (if a MyISAM table has a VARCHAR, BLOB or TEXT column, its rows have dynamic format) to a longer total length (which may split the row), schedule an OPTIMIZE TABLE query every weekend via cron. This performs defragmentation, which means faster queries. If you don't use replication, add the LOCAL keyword (OPTIMIZE LOCAL TABLE) to skip writing to the binary log and make it faster.
  28. Don't use ORDER BY RAND() to fetch several random rows. Fetch 10-20 entries (the latest by time added or by ID) and pick from them with array_rand() on the PHP side. There are also other solutions.
  29. Consider avoiding the HAVING clause where a WHERE condition would do; HAVING is applied after rows are fetched and grouped, so it's rather slow.
  30. In most cases, a DISTINCT clause can be considered a special case of GROUP BY, so the optimizations applicable to GROUP BY queries also apply to queries with a DISTINCT clause. Also, if you use DISTINCT, try to use LIMIT (MySQL stops as soon as it finds row_count unique rows) and avoid ORDER BY (it requires a temporary table in many cases).
  31. When I read “Building Scalable Web Sites”, I found that it is sometimes worth de-normalizing some tables (Flickr does this), i.e. duplicating some data in several tables to avoid expensive JOINs. You can maintain data integrity with foreign keys or triggers.
  32. If you want to test a specific MySQL function or expression, use the BENCHMARK() function to do that (see the sketch after this list).
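
A minimal sketch for tip 6, assuming a hypothetical `users` table:

CREATE INDEX idx_name ON users (last_name, first_name);
-- uses the index through its leftmost prefix (last_name):
SELECT * FROM users WHERE last_name = 'Smith';
-- uses the full index:
SELECT * FROM users WHERE last_name = 'Smith' AND first_name = 'John';
-- cannot use this index, because first_name is not a leftmost prefix:
SELECT * FROM users WHERE first_name = 'John';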
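
For tip 7, the query might look like the following (a sketch: the `lookup` table and its columns are made up, and `hash_col` is assumed to be a short, indexed column filled with MD5(CONCAT(...)) of the long columns):

SELECT * FROM lookup
WHERE hash_col = MD5(CONCAT('value1', '#', 'value2'))
  AND col1 = 'value1' AND col2 = 'value2';
-- the short indexed hash narrows the search quickly; the explicit
-- col1/col2 conditions guard against hash collisions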
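
For tip 13, a sketch with a hypothetical `users` table:

-- instead of running "SELECT * FROM users WHERE id = ..." once per id
-- in a PHP loop, fetch all the rows with a single query:
SELECT * FROM users WHERE id IN (1, 7, 19, 42);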
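
For tip 19, a sketch of a real-time counter, assuming hypothetical `votes` and `article_stats` tables:

-- when a vote is cast, update the counter along with the detail row:
INSERT INTO votes (article_id, user_id) VALUES (42, 123);
UPDATE article_stats SET vote_count = vote_count + 1 WHERE article_id = 42;
-- reading the counter is then trivial and never scans the votes table:
SELECT vote_count FROM article_stats WHERE article_id = 42;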
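
For tip 20, a sketch assuming a hypothetical `visits` table with a unique key on user_id:

INSERT INTO visits (user_id, visit_count) VALUES (123, 1)
ON DUPLICATE KEY UPDATE visit_count = visit_count + 1;
-- inserts a new row, or only increments the counter if user_id 123 exists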
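
For tip 23, a multi-row INSERT into a hypothetical `log` table:

INSERT INTO log (user_id, action) VALUES
  (1, 'login'),
  (2, 'view'),
  (3, 'logout');
-- one connect + send + parse for three rows instead of three separate INSERTs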
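
For tip 24, a sketch of LOAD DATA INFILE, assuming a tab-separated text file:

LOAD DATA INFILE '/tmp/users.txt' INTO TABLE users
  FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';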
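
For tip 32, BENCHMARK() evaluates an expression repeatedly; the elapsed time reported by the client is the interesting number:

SELECT BENCHMARK(1000000, MD5('test'));
-- the function itself always returns 0; compare the reported timings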

About the author

This article was written by Alexander Skakunov, author of the “I want to be free” blog and a certified MySQL developer.

Comments

Nice tips. Thank you.

Most of the above tips are general RDBMS advice (mentioned in every RDBMS book) and not specific to MySQL. They can be applied to any database.

Thanks for collecting the tips and putting it here.

You are missing prepared queries. Using them for INSERTs (and then inserting one row after another) is quicker than INSERTing many rows at once without a prepared query.

This is a great list of tips, but most of the time, server-side tweaking and triggers are not accessible to developers.
Nevertheless, I already changed a few lines of code in my applications based on what you've just shared.

On the de-normalization part, I did that for a large scale app, and it paid off not only in 'database speed' but also in making writing queries a lot easier.

Excuse me, but you are offering harmful advice. "Use a hash column which is short, reasonably unique." That's a very simple re-implementation of an index, except your pseudo-index won't auto-update when data changes, to say nothing of performance (e.g. concatenating strings is expensive).

Please don't re-invent a square wheel to solve a nonexistent problem, __especially__ when an optimized, easy-to-use, well-functioning round wheel already exists in the system.

Many Thanks, really great article

Nice Tips!!

Great set of tips. In #9 you mention storing MD5 strings as CHAR(32).

An MD5 is 16 bytes and can be stored in a CHAR(16); it's the hexadecimal representation of it that is 32 bytes long. You can convert it either way with HEX() and UNHEX().

Seems petty, but if you're indexing it and can halve the field length, it becomes significant.
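
For example (a sketch, with a hypothetical `files` table; BINARY(16) avoids charset handling on the raw bytes):

CREATE TABLE files (md5 BINARY(16), name VARCHAR(255));
-- store the 16 raw bytes instead of the 32-character hex string:
INSERT INTO files VALUES (UNHEX(MD5('some content')), 'a.txt');
-- convert back to the familiar hex form when reading:
SELECT HEX(md5), name FROM files;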

Using persistent connections can be very dangerous, as they are per user, per Apache process, and cannot be shared between users. So every user may be using multiple database connections, which can lead to MySQL running out of connections. Connection pooling is a far better method to reduce connection overhead.

What a nice collection! Thanks a lot!

thanks...gr8 work..

Dear author,
this PHP code of mine makes slow queries:
$sql = "SELECT * FROM table order by id desc limit 0,12";
$result = mysql_query($sql, $db) or die(mysql_error());

How can I fix it?

This is a very simple and nice article.

Wow! I usually use TuningPrimer for MySQL tuning, but these tips are #1!

What a cool article... I can't believe I have found answers at last...

Wow, this is a great list, thanks. I wish you would elaborate on #13.

Thank you guys for the feedback! (I am the author of the article)

Andi Kalsch: You are missing prepared queries

yes, this is a great technique!

Horia Dragomir: server-side tweaking and triggers are not accessible to developers

Whom are they accessible to, then, I wonder? =)

Piskvor: That's a very simple re-implementation of an index

Hum, it seems you are right.

Jimbo: An MD5 is 16 bytes and can be stored in a CHAR(16)

Very nice, thank you!

Cameron Eure: Using persistent connections can be very dangerous

Yes, common sense must be used as well.

Hassan: SELECT * FROM table order by id desc limit 0,12

Add a primary key on the id field.

Thanks for sharing these tips

Just to add to the bit about 16-byte MD5s:

For those of you wondering how to do range searches on these, or how to partition by the MD5 (something which is more common than you would think, and for which you would need an integer representation), there is a neat little trick you can do.

create table Hashes (MD5_Byte tinyint unsigned, MD5_Hash binary(16));
alter table Hashes add index MD5_Byte (MD5_Byte);

A tinyint unsigned byte ranges from 0 to 255, which... when converting your 32-character hex representation, you take the first byte (the first two hex characters), say 'FF', and do unhex('FF'). Put that result in MD5_Byte.

I do this for two reasons....

1. You can do a quick range search on the MD5: select * from Hashes where MD5_Byte BETWEEN conv('00', 16, 10) AND conv('AA', 16, 10);

2. You don't necessarily have to have the MD5 indexed (unless you really need it to be unique). You have an index on the first byte, which tends to have a reasonably low cardinality unless you have hundreds of millions of MD5s. Just be aware that indexing 16 bytes can be quite taxing on your system; unless you need to look up and print out the MD5 data straight from the index, it's better to seek a subset through MD5_Byte and find pointers to the matching MD5s on disk.

If you do need VERY fast lookups, and do a lot of them on the MD5 column, then give it a unique index (if it's unique), provided you can afford the RAM; a normal index is loaded from disk and cached on demand. Remember: making the MD5 the primary key is a big no-no. Use a normal unique index, try to have an integer as a row id, and put the primary key on that. The primary key is held in RAM, and a 16-byte key with many rows will take up too much room.

NOTE: For those of you who think you can create a 1 or 2 byte partial index on the binary(16) column, think again. You cannot search binary representations this way.

Just a correction to my previous post.

When taking the first byte of the hex MD5, you should do conv('FF', 16, 10), NOT unhex('FF') (unhex makes a binary representation of the number).

SQL Query for MySQL is a useful tool that lets you quickly and simply build SQL queries to MySQL databases. Anyway, good article.

These tips helped me a lot! Thank you!

Good set of info!

Great tips... Thanks a lot

You can find more good examples here:
http://www.hibuddyz.com/forum

Great tips, thanks. I usually use Oracle though.

Helpful overall. I've got a 3.5M-row table and it's slow to do anything with, and I don't understand why. All I can think of is that MySQL isn't getting enough memory allotted to it. I'll keep looking for tips and tricks, but I did learn some things from this.

Hi All,
This is a very useful article.

Thanks
