近期同事在讨论如何在PostgreSQL中一张大表,添加一个带有not null属性的,且具有缺省值的字段,并且要求在秒级完成。
因为此,有了以下的实验记录:
首先我们是在PostgreSQL 10下做的实验:
postgres=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit (1 row)
开启计时:
postgres=# \timing Timing is on.
建表,并查询表信息,插入数据:
postgres=# create table add_c_d_in_ms(id int, a1 text, a2 text, a3 text, a4 text, a5 text, a6 text, a7 text, a8 text not null default 'wangshuo'); CREATE TABLE Time: 72.243 ms postgres=# select oid,relname,relnatts from pg_class where relname='add_c_d_in_ms'; oid | relname | relnatts -------+---------------+---------- 16384 | add_c_d_in_ms | 9 (1 row) Time: 1.504 ms postgres=# insert into add_c_d_in_ms select generate_series(1,10000000),'wangshuo','highgo'||random(),'huang','wang',generate_series(1,10000000)::text,random()*100000000000,'shuo','ms'; INSERT 0 10000000 Time: 81261.870 ms (01:21.262)
我们看一下列信息:
postgres=# select * from pg_attribute where attrelid = 16384; attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atth asdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions ----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+----- ------+-------------+--------------+------------+-------------+--------------+--------+------------+--------------- 16384 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | 16384 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | 16384 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | 16384 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | 16384 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | 16384 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | | 16384 | id | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | 16384 | a1 | 25 | -1 | -1 | 2 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | 16384 | a2 | 25 | -1 | -1 | 3 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | 16384 | a3 | 25 | -1 | -1 | 4 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | 16384 | a4 | 25 | -1 | -1 | 5 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | 16384 | a5 | 25 | -1 | -1 | 6 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | 16384 | a6 | 25 | -1 | -1 | 7 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | 16384 | a7 | 25 | -1 | -1 | 8 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | 16384 | a8 | 25 | -1 | -1 | 9 | 0 | -1 | -1 | f | x | i | t | t | | f | t | 0 | 100 | | | (15 rows) Time: 1.574 ms
我们再来查询一条数据看一下:
postgres=# select * from add_c_d_in_ms where id=1; id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 ----+----------+-------------------------+-------+------+----+------------------+------+---- 1 | wangshuo | highgo0.460023149382323 | huang | wang | 1 | 25913513777.7776 | shuo | ms (1 row) Time: 806.036 ms
然后,我们看一下正常PostgreSQL加一个字段所花费的时间:
postgres=# alter table add_c_d_in_ms add a9 text not null default 'test'; ALTER TABLE Time: 36803.610 ms (00:36.804)
明显看到时间花费相当长,其实PostgreSQL在这里将数据完全重写了,主要原因就是就是添加的字段带有not null属性。
我们来看下一新家字段的列属性:
postgres=# select * from pg_attribute where attrelid = 16384 and attname='a9'; attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | attha sdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions ----------+---------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+------ -----+-------------+--------------+------------+-------------+--------------+--------+------------+--------------- 16384 | a9 | 25 | -1 | -1 | 10 | 0 | -1 | -1 | f | x | i | t | t | | f | t | 0 | 100 | | | (1 row) Time: 0.670 ms
这时候我们插入数据看一下:
postgres=# insert into add_c_d_in_ms (id) values (1); INSERT 0 1 Time: 14.658 ms postgres=# select * from add_c_d_in_ms where id=1; id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 ----+----------+-------------------------+-------+------+----+------------------+------+----------+------ 1 | wangshuo | highgo0.460023149382323 | huang | wang | 1 | 25913513777.7776 | shuo | ms | test 1 | | | | | | | | wangshuo | test (2 rows) Time: 850.982 ms
下面见证奇迹的时刻来了,如何快速添加这么一个字段:
首先,在这里我们涉及三张系统表,pg_class(表属性)、pg_attribute(列属性)、pg_attrdef(缺省值信息),接下来依次看一下三张表的信息:
#pg_class:oid表系统序列号,relname表名,relnatts列个数(主要修改属性) postgres=# select oid,relname,relnatts from pg_class where relname='add_c_d_in_ms'; oid | relname | relnatts -------+---------------+---------- 16384 | add_c_d_in_ms | 10 (1 row) Time: 0.418 ms #pg_attribute 这里还没有修改,和前面一致,在此就查看了 #pg_attrdef 缺省值信息,这里只有原来的a9带有缺省值 postgres=# select * from pg_attrdef ; adrelid | adnum | adbin | adsrc ---------+-------+------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------+------------------ 16384 | 9 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 123 :constvalue 12 [ 48 0 0 0 119 97 110 103 115 104 117 111 ]} | 'wangshuo'::text 16384 | 10 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 55 :constvalue 8 [ 32 0 0 0 116 101 115 116 ]} | 'test'::text (2 rows) Time: 0.363 ms
下面进行改造:
#添加字段属性 postgres=# insert into pg_attribute select attrelid,'new_n_d',atttypid,attstattarget,attlen,attnum+1,attndims,attcacheoff,atttypmod,attbyval,attstorage,attalign,'t',atthasdef,attidentity,attisdropped,attislocal,attinhcount,attcollation,attacl,attoptions,attfdwoptions from pg_attribute where attrelid=16384 and attname='a9'; INSERT 0 1 Time: 25.008 ms #修改pg_class字段个数 postgres=# update pg_class set relnatts=relnatts+1 where relname='add_c_d_in_ms'; UPDATE 1 Time: 43.979 ms #添加缺省值 postgres=# insert into pg_attrdef select adrelid,adnum+1,adbin,adsrc from pg_attrdef where adrelid=16384 and adnum=10; INSERT 16399 1 Time: 15.774 ms
再次查看三个系统表系统信息:
#pg_class postgres=# select oid,relname,relnatts from pg_class where relname='add_c_d_in_ms'; oid | relname | relnatts -------+---------------+---------- 16384 | add_c_d_in_ms | 11 (1 row) #pg_attribute postgres=# select * from pg_attribute where attrelid = 16384 and attname='new_n_d'; attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | attha sdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions ----------+---------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+------ -----+-------------+--------------+------------+-------------+--------------+--------+------------+--------------- 16384 | new_n_d | 25 | -1 | -1 | 11 | 0 | -1 | -1 | f | x | i | t | t | | f | t | 0 | 100 | | | (1 row) Time: 0.516 ms #pg_attrdef postgres=# select * from pg_attrdef; adrelid | adnum | adbin | adsrc ---------+-------+------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------+------------------ 16384 | 9 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 123 :constvalue 12 [ 48 0 0 0 119 97 110 103 115 104 117 111 ]} | 'wangshuo'::text 16384 | 10 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 55 :constvalue 8 [ 32 0 0 0 116 101 115 116 ]} | 'test'::text 16384 | 11 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 55 :constvalue 8 [ 32 0 0 0 116 101 115 116 ]} | 'test'::text (3 rows)
插入值实验:
postgres=# insert into add_c_d_in_ms (id) values (1); INSERT 0 1 Time: 8.407 ms postgres=# select * from add_c_d_in_ms where id=1; id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 | new_n_d ----+----------+-------------------------+-------+------+----+------------------+------+----------+------+--------- 1 | wangshuo | highgo0.460023149382323 | huang | wang | 1 | 25913513777.7776 | shuo | ms | test | 1 | | | | | | | | wangshuo | test | 1 | | | | | | | | wangshuo | test | test (3 rows) Time: 322.143 ms
问题:
#正常添加字段可以 postgres=# alter table add_c_d_in_ms add a10 text; ALTER TABLE #如果添加not null属性的字段,则会检测其他字段属性,将会报错 postgres=# alter table add_c_d_in_ms add a11 text not null default 'aaa'; 2018-01-11 00:21:55.587 EST [4217] ERROR: column "new_n_d" contains null values 2018-01-11 00:21:55.587 EST [4217] STATEMENT: alter table add_c_d_in_ms add a11 text not null default 'aaa'; ERROR: column "new_n_d" contains null values
另一种解决方法,添加字段,然后添加check约束:
postgres=# alter table add_c_d_in_ms_new add a9 text default 'abc'; ALTER TABLE Time: 549.182 ms postgres=# alter table add_c_d_in_ms_new add constraint ck_tbl_check_a check (a9 is not null); ALTER TABLE Time: 46.200 ms postgres=# insert into add_c_d_in_ms_new (a1) values (1); INSERT 0 1 Time: 30.716 ms postgres=# select * from add_c_d_in_ms_new where a1=1::text; id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 ----+----+----+----+----+----+----+----+----------+----- | 1 | | | | | | | wangshuo | abc (1 row) Time: 0.500 ms postgres=# insert into add_c_d_in_ms_new (a1,a9) values (1,null); 2018-01-11 01:07:56.456 EST [4217] ERROR: new row for relation "add_c_d_in_ms_new" violates check constraint "ck_tbl_check_a" 2018-01-11 01:07:56.456 EST [4217] DETAIL: Failing row contains (null, 1, null, null, null, null, null, null, wangshuo, null). 2018-01-11 01:07:56.456 EST [4217] STATEMENT: insert into add_c_d_in_ms_new (a1,a9) values (1,null); ERROR: new row for relation "add_c_d_in_ms_new" violates check constraint "ck_tbl_check_a" DETAIL: Failing row contains (null, 1, null, null, null, null, null, null, wangshuo, null). Time: 0.383 ms