方案八:CockroachDB
cockroachdb是分布式、支持事务、支持SQL操作、K-V存储模式的数据库。CockroachDB的三位创始人全部来自Google,其架构受到Google的 Spanner和F1的启发,cockroach开源地址。
具有:
- 标准SQL接口, 使用PostgreSQL协议,支持标准SQL接口,兼容关系型数据库SQL生态;
- 扩展能力强、高并发,支持类MPP并行查询框架;
- 弹性扩容,持按需扩容, 自动负载均衡;
- 多副本强一致,使用raft算法保证数据一致性;
- 服务高可用,上去中心化,无SPOF;
- 分布式事务,基于MVCC实现事务控制,支持SI和SSI两种隔离级别;
调研
建表
DROP TABLE IF EXISTS "tracks"; CREATE TABLE IF NOT EXISTS "tracks" ( "id" SERIAL PRIMARY KEY , "third_tracks_id" varchar(32) NOT NULL DEFAULT '' , "tracks_title" varchar(255) NOT NULL DEFAULT '' , "tracks_title_other" varchar(255) NOT NULL DEFAULT '', "tracks_title_py" varchar(64) NOT NULL DEFAULT '' , "data_source" bigint DEFAULT 1 NOT NULL, "tags" varchar(255) NOT NULL DEFAULT ''SQL, "duration" bigint DEFAULT 0 NOT NULL, "status" int DEFAULT 0 NOT NULL, "pa" int DEFAULT 0 NOT NULL, "announcer_name" varchar(255) NOT NULL DEFAULT '', "anchor_name" varchar(255) NOT NULL DEFAULT '', "play_count" bigint DEFAULT 0 NOT NULL, "own_count" bigint DEFAULT 0 NOT NULL, "paid" int DEFAULT 0 NOT NULL, "info" text NOT NULL, "created_at" timestamp NOT NULL, "updated_at" timestamp NOT NULL, "data_updated" bigint NOT NULL, "created" timestamp NOT NULL, "updated" timestamp NOT NULL, "announcer_id" varchar(256) NOT NULL DEFAULT '', "anchor_id" varchar(256) NOT NULL DEFAULT '', UNIQUE INDEX "idx_thirdTrackId_dataSource" (third_tracks_id ASC, data_source ASC), INDEX "idx_announcerid_status_paid_playcount" (announcer_id ASC, status ASC, paid ASC, play_count DESC) );
注意点
1.暂不支持数据表字段加注释; 2.不支持大量数据删除:
> DELETE FROM tracks where id <100000; pq: kv/txn_coord_sender.go:428: transaction is too large to commit: 189948 intents
因为需要强一致性,如果删除大量数据会导致集群延时变大,如果需要删除大量数据可以采用:分段删除
alter table tracks rename to tracks_0907; for (i=1;i<count(*);i+=2000){ DELETE FROM tracks_0907 where id <= i; }
性能圧测
圧测工具 go语言实现圧测脚本
圧测方法 三台机器执行圧测脚本,每一次圧测时长5-10min,每行记录约为1.6kb。因为测试集群为3台性能如下:
M02-XI3 整机SN216486580 CPU【INTEL Xeon E5-2650 V4 12C 2.2GHZ】*2 内存【LANGCHAO PC4-19200 16G】*8 硬盘【LANGCHAO SATA 3T 7.2K】*4 FLASH【LANGCHAO NVMe SSD 800G】*1 网卡【LANGCHAO INTEL 82599】*1 加速卡 RAID无硬件RAID卡
尽可能模拟将线上数据从MySQL迁移到NewSQL的场景。很多人看到mysql或者其他的数据库性能测试报告时候,看到qps都是几万左右,但是大家注意看就会发现测试的单行记录才50byte,与线上的数据是不相符的。
3台比较好的连接数为1000左右,所以测试会以1000并发为限。
圧测表格
序号 | SQL | 数据量 | 并发 | qps | 99分位延时 | 90分位延时 | SQL Byte Traffic | 备注 |
1 | insert | 1500w | 100 | 666 | 152ms | 117ms | 832kb | 18个newsql索引 |
2 | insert | 1500w | 300 | 687 | 352ms | 187ms | 872kb | 18个newsql索引 |
3 | insert | 1500w | 900 | 778 | 700ms | 1500ms | 1.2MB | 18个newsql索引 |
4 | insert | 1500w | 1000 | 807 | 1500s | 1200ms | 1.3MB | 18个newsql索引 |
5 | insert | 1500w | 100 | 1051 | 42ms | 12ms | 1.2MB | 1个newsql索引 |
6 | insert | 1500w | 300 | 2254 | 92ms | 32ms | 3.2MB | 1个newsql索引 |
7 | insert | 1500w | 600 | 4021 | 130ms | 56ms | 6.1MB | 1个newsql索引 |
8 | insert | 1500w | 900 | 5938 | 250ms | 148ms | 8.4MB | 1个newsql索引 |
9 | insert | 1500w | 1000 | 6125 | 270ms | 171ms | 8.7MB | 1个newsql索引 |
10 | select * from tracks WHERE id = 随机id AND status = 0 | 1500w | 300 | 5625 | 30ms | 10ms | 7.3MB | 主键索引 |
11 | select * from tracks WHERE id = 随机id AND status = 0 | 1500w | 600 | 8713 | 45ms | 8ms | 11.7MB | 主键索引 |
12 | select * from tracks WHERE id = 随机id AND status = 0 | 1500w | 1000 | 12320 | 160ms | 130ms | 16.2MB | 主键索引 |
13 | select * from tracks WHERE id (随机20ids) AND status = 0 | 1500w | 300 | 2134 | 200ms | 140ms | 29.7MB | 主键索引 |
14 | select * from tracks WHERE id (随机20ids) AND status = 0 | 1500w | 600 | 2526 | 420ms | 350ms | 34.1MB | 主键索引 |
15 | select * from tracks WHERE id (随机20ids) AND status = 0 | 1500w | 1000 | 2650 | 771ms | 640ms | 36.1MB | 主键索引 |
16 | select * from tracks WHERE id (随机50ids) AND status = 0 | 1500w | 300 | 714 | 670ms | 540ms | 23.2MB | 主键索引 |
17 | select * from tracks WHERE id (随机50ids) AND status = 0 | 1500w | 600 | 672 | 1700ms | 1300ms | 21.4MB | 主键索引 |
18 | select * from tracks WHERE id (随机50ids) AND status = 0 | 1500w | 600 | 757 | 3000ms | 2490ms | 24.7MB | 主键索引 |
19 | SELECT *FROM "tracks" WHERE "third_tracks_id" IN (随机1个id ) AND "data_source" = 随机公司 AND "status" = 0 | 1500w | 300 | 5553 | 40ms | 5ms | 4.1MB | (third_tracks_id , data_source )索引 |
20 | SELECT *FROM "tracks" WHERE "third_tracks_id" IN (随机1个id ) AND "data_source" = 随机公司 AND "status" = 0 | 1500w | 600 | 8624 | 120ms | 18ms | 6.0MB | (third_tracks_id , data_source )索引 |
21 | SELECT *FROM "tracks" WHERE "third_tracks_id" IN (随机1个id ) AND "data_source" = 随机公司 AND "status" = 0 | 1500w | 1000 | 1145 | 310ms | 90ms | 8.7MB | (third_tracks_id , data_source )索引 |
22 | SELECT *FROM "tracks" WHERE "announcer_id" IN (随机1个id ) AND "paid" = 0 AND "status" = 0 order by play_count DESC | 1500w | 300 | 5493 | 160ms | 3ms | 5.1MB | (announcer_id ASC, status ASC, paid ASC, play_count DESC)索引 |
23 | SELECT *FROM "tracks" WHERE "announcer_id" IN (随机1个id ) AND "paid" = 0 AND "status" = 0 order by play_count DESC | 1500w | 600 | 7825 | 283ms | 23ms | 7.5MB | (announcer_id ASC, status ASC, paid ASC, play_count DESC)索引 |
24 | SELECT *FROM "tracks" WHERE "announcer_id" IN (随机1个id ) AND "paid" = 0 AND "status" = 0 order by play_count DESC | 1500w | 1000 | 11171 | 310ms | 68ms | 10.5MB | (announcer_id ASC, status ASC, paid ASC, play_count DESC )索引 |
25 | select * from tracks WHERE id = 随机id AND status = 0 | 3000w | 300 | 5614 | 123ms | 3ms | 8.3MB | 主键索引 |
26 | select * from tracks WHERE id = 随机id AND status = 0 | 3000w | 600 | 8723 | 130ms | 8ms | 12.4MB | 主键索引 |
27 | select * from tracks WHERE id = 随机id AND status = 0 | 3000w | 1000 | 10764 | 320ms | 30ms | 16.2MB | 主键索引 |
28 | select * from tracks WHERE id = 随机id AND status = 0 | 5000w | 300 | 5136 | 159ms | 8ms | 7.8MB | 主键索引 |
29 | select * from tracks WHERE id = 随机id AND status = 0 | 5000w | 600 | 8463 | 180ms | 13ms | 11.8MB | 主键索引 |
30 | select * from tracks WHERE id = 随机id AND status = 0 | 5000w | 1000 | 10848 | 220ms | 26ms | 16.3MB | 主键索引 |
数据分析
- 与MySQL等数据库一样,存在索引时候
insert
的数据会慢上许多,但是在cockroachDB中会更加明显一些。
图中可以得出:
- 大量索引对insert的影响是巨大,会导致写库操作qps大降,在提高并发数后qps亦没有显著提升;
- 在仅有主键索引时候,insert的qps随着并发数的提升得到相应的提升,到了6000qps后再提升并发数效果就不再明显了。
- 通过对主键id的获取数据量不同的压测,可以得到如下图:

图中可以得出:
- id数越多需要查询的range就越多qps就越低;
- id数越多得到的数据量也就越多,网络IO变大,性能也会有所下降;
- 对比主键索引和复合索引的查询效率,可以得到:

图中可以得出:
- 有索引的查询效率会有大幅提升,并且主键索引、唯一索引与复合索引的查询效率基本相同;
- 主键查询效率最优;
- 对比在不同数据记录数中查询效率:

图中可以得出:
- 表中数据量的大小对qps影响不是很大,起码在千万级别是可以接受的;
- 表中数据量越少速度就越快;