I believe we wrote about this before, but this topic popups again and again.
Today I've read opinion that if we have clause WHERE has_something=1 we should have index on column `has_something` (the column has two values 0 and 1).
In reality the right answer is not so simple.
Let's look next table
PLAIN TEXT
SQL:
- [FONT='Courier New', Courier, monospace]CREATE TABLE `testr` ([/font]
- [FONT='Courier New', Courier, monospace] `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,[/font]
- [FONT='Courier New', Courier, monospace] `name` varchar(32) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `has_something` tinyint(3) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] PRIMARY KEY (`id`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `has_something` (`has_something`)[/font]
- [FONT='Courier New', Courier, monospace]) ENGINE=MyISAM [/font]
with 20.000.000 records.
And in first case has_something=0 for 90% of rows (with random distribution)
PLAIN TEXT
SQL:
- [FONT='Courier New', Courier, monospace]mysql> SELECT cnt0/cnt FROM (SELECT count(*) cnt0 FROM testr WHERE has_something=0) t, (SELECT count(*) cnt FROM testr) t1;[/font]
- [FONT='Courier New', Courier, monospace]+----------+[/font]
- [FONT='Courier New', Courier, monospace]| cnt0/cnt |[/font]
- [FONT='Courier New', Courier, monospace]+----------+[/font]
- [FONT='Courier New', Courier, monospace]| 0.9001 | [/font]
- [FONT='Courier New', Courier, monospace]+----------+[/font]
- [FONT='Courier New', Courier, monospace]1 row IN SET (7.56 sec) [/font]
Let's check execution time with and without index
mysql> select count(name) from testr force key (has_something) where has_something=0;
+-------------+
| count(name) |
+-------------+
| 18001245 |
+-------------+
1 row in set (35.96 sec)
mysql> select count(name) from testr ignore key (has_something) where has_something=0;
+-------------+
| count(name) |
+-------------+
| 18001245 |
+-------------+
1 row in set (10.46 sec)
As you see with index the time is by 3.5 times slower.
Good that mysql in this case choose do not use index
PLAIN TEXT
SQL:
- [FONT='Courier New', Courier, monospace]mysql> EXPLAIN SELECT count(name) FROM testr WHERE has_something=0; [/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| id | select_type | TABLE | type | possible_keys | KEY | key_len | ref | rows | Extra |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| 1 | SIMPLE | testr | ALL | has_something | NULL | NULL | NULL | 15000000 | USING WHERE | [/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]1 row IN SET (0.00 sec) [/font]
Let look the case when has_something = 0 for 50% of rows.
mysql> select count(name) from testr force key (has_something) where has_something=0;
1 row in set (20.27 sec)
mysql> select count(name) from testr ignore key (has_something) where has_something=0;
1 row in set (10.62 sec)
query with index is still 2 times slower.
and this time mysql is going to use index in execution plan:
PLAIN TEXT
SQL:
- [FONT='Courier New', Courier, monospace]mysql> EXPLAIN SELECT count(name) FROM testr WHERE has_something=0; +----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+[/font]
- [FONT='Courier New', Courier, monospace]| id | select_type | TABLE | type | possible_keys | KEY | key_len | ref | rows | Extra |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+[/font]
- [FONT='Courier New', Courier, monospace]| 1 | SIMPLE | testr | ref | has_something | has_something | 1 | const | 8890716 | |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+[/font]
- [FONT='Courier New', Courier, monospace]1 row IN SET (0.00 sec) [/font]
What about 30% rows with has_something=0 ?
mysql> select count(name) from testr force key (has_something) where has_something=0;
1 row in set (12.36 sec)
mysql> select count(name) from testr ignore key (has_something) where has_something=0;
1 row in set (10.51 sec)
Still query without index is faster.
And finally for case with 20% rows with has_someting=0
mysql> select count(name) from testr force key (has_something) where has_something=0;
1 row in set (8.39 sec)
mysql> select count(name) from testr ignore key (has_something) where has_something=0;
1 row in set (10.43 sec)
So only in the last case we really need the index on column `has_something`
Do you always need index on WHERE column ? - Read More...