| Revenue Source Veteran
Join Date: Oct 2005 Posts: 8,777 Jack of All Trades
CyberSpace
| Testing InnoDB ?Barracuda? format with compression -
04-23-2008
New features of InnoDB - compression format and fast index creation sound so promising so I spent some time to research time and sizes on data we have on our production. The schema of one of shards is PLAIN TEXT
SQL: - [FONT='Courier New', Courier, monospace]CREATE TABLE `article87` ([/font]
- [FONT='Courier New', Courier, monospace] `id` bigint(20) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `ext_key` varchar(32) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `site_id` int(10) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `forum_id` int(10) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `thread_id` varchar(255) CHARACTER SET latin1 NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `published` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',[/font]
- [FONT='Courier New', Courier, monospace] `crawled` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',[/font]
- [FONT='Courier New', Courier, monospace] `subject` varchar(255) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `title` varchar(255) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `url` varchar(255) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `num_links` smallint(6) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `links_in` int(10) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `cache_author` varchar(255) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `cache_site` varchar(255) DEFAULT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `anchor` varchar(255) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `isthread` tinyint(3) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `author_id` int(10) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `inserted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,[/font]
- [FONT='Courier New', Courier, monospace] `fromfile` varchar(255) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `language_id` tinyint(3) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `encoding` varchar(255) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `warning` mediumtext NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `is_thread_start` tinyint(3) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `source` mediumint(8) UNSIGNED NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `hash` char(32) NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] `mod_is` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',[/font]
- [FONT='Courier New', Courier, monospace] `is_adult` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',[/font]
- [FONT='Courier New', Courier, monospace] `bodyuc` mediumtext NOT NULL,[/font]
- [FONT='Courier New', Courier, monospace] PRIMARY KEY (`id`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `ext_key` (`ext_key`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `forum_id` (`forum_id`,`thread_id`,`published`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `site_id` (`site_id`,`published`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `hash` (`hash`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `forum_id_2` (`forum_id`,`is_thread_start`,`published`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `published` (`published`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `inserted` (`inserted`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `forum_id_3` (`forum_id`,`thread_id`,`is_thread_start`),[/font]
- [FONT='Courier New', Courier, monospace] KEY `site_id_2` (`site_id`,`author_id`)[/font]
- [FONT='Courier New', Courier, monospace]) ENGINE=InnoDB DEFAULT CHARSET=latin1; [/font]
in fact this is not exact schema - difference is we are using DEFAULT CHARSET=utf8, which seems not working with fast index creation ( bug 33650) , so I use latin1 for tests. Also we actually store 'bodyuc' as compressed field, doing compress()/uncompress() to store/restore. To test InnoDB compression I use only uncompressed text, though I tested compression on compress()-ed data just to see if there is any benefit from index compression.
For test I use dump created with mysqldump, final size 30286M.
To create InnoDB compression table I use ENGINE=InnoDB KEY_BLOCK_SIZE=8 and to test fast index creation I create table only with primary key, without additional indexes.
Numbers I got: load dump into table with default format and all indexes (case 1) - 152m34.792s
size of final .ibd file for (case 1) - 43032M What if load data and indexes separately: load dump into table with default format and without indexes (case 2a) - 59m12.575s
size of .ibd after (case 2a) - 36968M
create all indexes (case 2b) - 18min46.05s
size of .ibd after (case 2b) - 40400M So as you see time to load (case 2) (with fast create index way) is faster almost two times and space decreased by 6%.
Now with compressed table: load dump into table with default format and all indexes (case 3) - 228m55.251s (time took 1.5 times longer compare to (case 1) )
size of final .ibd file for (case 3) - 16284M (space ratio is 0.37 compare to (case 1) ) What if load data and indexes separately: load dump into table with default format and without indexes (case 4a) - 71m10.760s (longer 1.2 times compare to (case 2a))
size of .ibd after (case 4a) - 13208M (smaller 0.35 times)
create all indexes (case 4b) - 42m54.63sec (longer 2.28 times)
size of .ibd after (case 4b) - 14968M (smaller 0.37 times) Total time for (case 4) , 6844 sec, took 1.46 times longer compare to (case 2) , 4678 sec, but space decreased by even more ( 0.37 from original)
Also interesting to note that fast index creation allows to load time two time faster than usual load.
Ok, There are a lot of numbers, let me summarize it:
Test Load time, sec Size, MB Baseline (1), default format, usual load 9154 43032 default format, fast index creation 4678 (0.51)* 40400 (0.93) compress format, default load 13735 (1.50) 16284 (0.38) compress format, fast index creation 6844 (0.74) 14968 (0.35) * - ratio to baseline
So in conclusion
- Fast Index creation allows to speedup load 2 times (even indexes fit into memory)
- Load in compress format slower by 30-50%
- Table in compress format takes only 1/3 of original table
Of course we also expect significant performance gain for I/O bound queries in case of compression tables, this is topic for different research.
Entry posted by Vadim | No comment
Add to:  |  |  |  | Testing InnoDB ?Barracuda? format with compression - Read More... |