Performance – Postgres hash join speed
I have 3 tables that I would like to join using inner joins in Postgres 9.1: reads, devices, and patient_devices. Below are abbreviated schemas for each table.
reads – ~250,000 rows

CREATE TABLE reads
(
  id serial NOT NULL,
  device_id integer NOT NULL,
  value bigint NOT NULL,
  read_datetime timestamp without time zone NOT NULL,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT reads_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE reads OWNER TO postgres;

CREATE INDEX index_reads_on_device_id ON reads USING btree (device_id);
CREATE INDEX index_reads_on_read_datetime ON reads USING btree (read_datetime);

devices – ~500 rows

CREATE TABLE devices
(
  id serial NOT NULL,
  serial_number character varying(20) NOT NULL,
  CONSTRAINT devices_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE devices OWNER TO postgres;

CREATE UNIQUE INDEX index_devices_on_serial_number ON devices USING btree (serial_number COLLATE pg_catalog."default");

patient_devices – ~25,000 rows

CREATE TABLE patient_devices
(
  id serial NOT NULL,
  patient_id integer NOT NULL,
  device_id integer NOT NULL, -- missing from the posted schema, but required by the indexes and queries below
  issuance_datetime timestamp without time zone NOT NULL,
  unassignment_datetime timestamp without time zone,
  CONSTRAINT patient_devices_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE patient_devices OWNER TO postgres;

CREATE INDEX index_patient_devices_on_device_id ON patient_devices USING btree (device_id);
CREATE INDEX index_patient_devices_on_issuance_datetime ON patient_devices USING btree (issuance_datetime);
CREATE INDEX index_patient_devices_on_patient_id ON patient_devices USING btree (patient_id);
CREATE INDEX index_patient_devices_on_unassignment_datetime ON patient_devices USING btree (unassignment_datetime);

patients – ~1,000 rows

CREATE TABLE patients
(
  id serial NOT NULL,
  first_name character varying(50) NOT NULL,
  middle_name character varying(50),
  last_name character varying(50) NOT NULL,
  CONSTRAINT participants_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE patients OWNER TO postgres;

Here is my abbreviated query.
SELECT patient_devices.patient_id, serial_number
FROM reads
INNER JOIN devices ON devices.id = reads.device_id
INNER JOIN patient_devices ON patient_devices.device_id = devices.id
WHERE (reads.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42')
  AND (read_datetime > issuance_datetime)
  AND ((unassignment_datetime IS NOT NULL AND read_datetime < unassignment_datetime)
       OR (unassignment_datetime IS NULL))
GROUP BY serial_number, patient_devices.patient_id
LIMIT 10

Ultimately this will be a small part of a larger query (there is no LIMIT in the real query; I only added the LIMIT to prove to myself that the long runtime was not due to returning a pile of rows), but I did a bunch of experimenting and determined that this is the slow part of the larger query. When I run EXPLAIN ANALYZE on this query I get the following output (also viewable here):

Limit  (cost=156442.31..156442.41 rows=10 width=13) (actual time=2815.435..2815.441 rows=10 loops=1)
  ->  HashAggregate  (cost=156442.31..159114.89 rows=267258 width=13) (actual time=2815.432..2815.437 rows=10 loops=1)
        ->  Hash Join  (cost=1157.78..151455.79 rows=997304 width=13) (actual time=30.930..2739.164 rows=250150 loops=1)
              Hash Cond: (reads.device_id = devices.id)
              Join Filter: ((reads.read_datetime > patient_devices.issuance_datetime) AND (((patient_devices.unassignment_datetime IS NOT NULL) AND (reads.read_datetime < patient_devices.unassignment_datetime)) OR (patient_devices.unassignment_datetime IS NULL)))
              ->  Seq Scan on reads  (cost=0.00..7236.94 rows=255396 width=12) (actual time=0.035..64.433 rows=255450 loops=1)
                    Filter: ((read_datetime >= '2012-01-01 10:30:01'::timestamp without time zone) AND (read_datetime <= '2013-05-18 03:03:42'::timestamp without time zone))
              ->  Hash  (cost=900.78..900.78 rows=20560 width=37) (actual time=30.830..30.830 rows=25015 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 1755kB
                    ->  Hash Join  (cost=19.90..900.78 rows=20560 width=37) (actual time=0.776..20.551 rows=25015 loops=1)
                          Hash Cond: (patient_devices.device_id = devices.id)
                          ->  Seq Scan on patient_devices  (cost=0.00..581.93 rows=24893 width=24) (actual time=0.014..7.867 rows=25545 loops=1)
                                Filter: ((unassignment_datetime IS NOT NULL) OR (unassignment_datetime IS NULL))
                          ->  Hash  (cost=13.61..13.61 rows=503 width=13) (actual time=0.737..0.737 rows=503 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 24kB
                                ->  Seq Scan on devices  (cost=0.00..13.61 rows=503 width=13) (actual time=0.016..0.466 rows=503 loops=1)
                                      Filter: (entity_id = 2)
Total runtime: 2820.392 ms

My question is: how can I speed this up? Right now I'm running this on my Windows machine for testing, but eventually it will be deployed on Ubuntu. Will that make a difference? Any insight into why this takes 2 seconds would be greatly appreciated. Thanks.

It was suggested that the LIMIT might be changing the query plan. Here is the same query without the LIMIT. The slow part still appears to be the hash join.
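Not part of the original post, but one cheap experiment when a hash join dominates the runtime is to raise work_mem for the session and re-run EXPLAIN ANALYZE. Here the plan already reports "Batches: 1", so hash memory is probably not the bottleneck, but the test rules that variable out in a minute. A minimal sketch (the 256MB value is an arbitrary assumption, not a recommendation):

-- Session-local experiment only; does not change postgresql.conf.
SET work_mem = '256MB';  -- assumed value for the experiment

EXPLAIN ANALYZE
SELECT patient_devices.patient_id, serial_number
FROM reads
INNER JOIN devices ON devices.id = reads.device_id
INNER JOIN patient_devices ON patient_devices.device_id = devices.id
WHERE (reads.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42')
  AND (read_datetime > issuance_datetime)
  AND ((unassignment_datetime IS NOT NULL AND read_datetime < unassignment_datetime)
       OR (unassignment_datetime IS NULL))
GROUP BY serial_number, patient_devices.patient_id
LIMIT 10;

RESET work_mem;  -- back to the server default

If the "Batches" count or the join strategy changes between runs, memory was part of the problem; if the plan is identical, look at the join itself.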
Also, here are the relevant tuning parameters. I'm only testing this on Windows right now, and I don't know what effect these will have on a Linux machine:

shared_buffers = 2GB

Here are the statistics for the reads table:

Statistic                  Value
Sequential Scans           130
Sequential Tuples Read     28865850
Index Scans                283630
Index Tuples Fetched       141421907
Tuples Inserted            255450
Tuples Updated             0
Tuples Deleted             0
Tuples HOT Updated         0
Live Tuples                255450
Dead Tuples                0
Heap Blocks Read           20441
Heap Blocks Hit            3493033
Index Blocks Read          8824
Index Blocks Hit           4840210
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum                2013-05-20 09:23:03.782-07
Last Autovacuum
Last Analyze               2013-05-20 09:23:03.91-07
Last Autoanalyze           2013-05-17 19:01:44.075-07
Vacuum counter             1
Autovacuum counter         0
Analyze counter            1
Autoanalyze counter        6
Table Size                 27 MB
Toast Table Size           none
Indexes Size               34 MB

Here are the statistics for the devices table:

Statistic                  Value
Sequential Scans           119
Sequential Tuples Read     63336
Index Scans                1053935
Index Tuples Fetched       1053693
Tuples Inserted            609
Tuples Updated             0
Tuples Deleted             0
Tuples HOT Updated         0
Live Tuples                609
Dead Tuples                0
Heap Blocks Read           32
Heap Blocks Hit            1054553
Index Blocks Read          32
Index Blocks Hit           2114305
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum
Last Autovacuum
Last Analyze
Last Autoanalyze           2013-05-17 19:02:49.692-07
Vacuum counter             0
Autovacuum counter         0
Analyze counter            0
Autoanalyze counter        2
Table Size                 48 kB
Toast Table Size           none
Indexes Size               128 kB

Here are the statistics for the patient_devices table:

Statistic                  Value
Sequential Scans           137
Sequential Tuples Read     3065400
Index Scans                853990
Index Tuples Fetched       46143763
Tuples Inserted            25545
Tuples Updated             24936
Tuples Deleted             0
Tuples HOT Updated         0
Live Tuples                25547
Dead Tuples                929
Heap Blocks Read           1959
Heap Blocks Hit            6099617
Index Blocks Read          1077
Index Blocks Hit           2462681
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum
Last Autovacuum            2013-05-17 19:01:44.576-07
Last Analyze
Last Autoanalyze           2013-05-17 19:01:44.697-07
Vacuum counter             0
Autovacuum counter         6
Analyze counter            0
Autoanalyze counter        6
Table Size                 2624 kB
Toast Table Size           none
Indexes Size               5312 kB

Here is the full query I'm trying to speed up, reproduced below. The smaller query is indeed faster now, but I couldn't get my full query any faster. As suggested, I added 4 new indexes: UNIQUE(device_id, issuance_datetime), UNIQUE(device_id, unassignment_datetime), UNIQUE(patient_id, issuance_datetime), UNIQUE(patient_id, unassignment_datetime).

SELECT first_name, last_name, MAX(max_read) AS read_datetime, SUM(value) AS value, serial_number
FROM (
    SELECT pa.first_name, pa.last_name, value,
           first_value(de.serial_number) OVER (PARTITION BY pa.id ORDER BY re.read_datetime DESC) AS serial_number, -- I'm not sure if this is a good way to do this, but I don't know of another way
           re.read_datetime,
           MAX(re.read_datetime) OVER (PARTITION BY pd.id) AS max_read
    FROM reads re
    INNER JOIN devices de ON de.id = re.device_id
    INNER JOIN patient_devices pd ON pd.device_id = de.id
        AND re.read_datetime >= pd.issuance_datetime
        AND re.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
    INNER JOIN patients pa ON pa.id = pd.patient_id
    WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
) AS foo
WHERE read_datetime = max_read
GROUP BY first_name, last_name, serial_number
ORDER BY value DESC
LIMIT 10

Sorry for not posting this originally, but I thought the query was too complicated and was trying to distill the problem down; apparently I still can't figure it out. It seems like it would be faster if I could filter the results of the nested select using the max_read value directly, but according to numerous sources Postgres doesn't allow a window function result to be referenced in the WHERE clause of the same query level.
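Not from the original post: since a window function can't appear in WHERE, the usual workaround is exactly the subquery wrapper used above. An equivalent formulation of the same pattern ranks reads within each assignment and keeps only the top row one query level up. This is a sketch against the schema above, not a drop-in replacement for the full aggregation:

SELECT first_name, last_name, read_datetime, value, serial_number
FROM (
    SELECT pa.first_name, pa.last_name, re.value, re.read_datetime,
           de.serial_number,
           -- rank each read within its patient_devices assignment, newest first
           rank() OVER (PARTITION BY pd.id ORDER BY re.read_datetime DESC) AS rn
    FROM reads re
    INNER JOIN devices de ON de.id = re.device_id
    INNER JOIN patient_devices pd ON pd.device_id = de.id
        AND re.read_datetime >= pd.issuance_datetime
        AND re.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
    INNER JOIN patients pa ON pa.id = pd.patient_id
    WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
) AS ranked
WHERE rn = 1  -- the filter on the window result happens one level up
ORDER BY value DESC
LIMIT 10;

Either way, Postgres computes the window over the full inner result before the outer filter applies, so this restructuring changes readability more than cost.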
Solution

FYI: cleaned-up query:
SELECT pd.patient_id, de.serial_number
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
    AND re.read_datetime >= pd.issuance_datetime -- changed this from '>' to '>='
    AND (re.read_datetime < pd.unissuance_datetime OR pd.unissuance_datetime IS NULL)
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42'
GROUP BY de.serial_number, pd.patient_id
LIMIT 10
;

Update: without the original typos:

EXPLAIN ANALYZE
SELECT pd.patient_id, de.serial_number
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
    AND re.read_datetime >= pd.issuance_datetime
    AND (re.read_datetime < pd.unassignment_datetime OR pd.unassignment_datetime IS NULL)
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01.000000' AND '2013-05-18 03:03:42'
GROUP BY de.serial_number, pd.patient_id
LIMIT 10
;

Update: this runs about 6x faster here (on synthetic data, and with a slightly changed data model):

-- Modified data model + synthetic data:

CREATE TABLE devices
    ( id serial NOT NULL
    , serial_number character varying(20) NOT NULL
    -- , created_at timestamp without time zone NOT NULL
    , CONSTRAINT devices_pkey PRIMARY KEY (id)
    , UNIQUE (serial_number)
    );

CREATE TABLE reads
    -- ( id serial NOT NULL PRIMARY KEY -- You don't need this surrogate key
    ( device_id integer NOT NULL REFERENCES devices (id)
    , read_datetime timestamp without time zone NOT NULL
    , value bigint NOT NULL
    -- , created_at timestamp without time zone NOT NULL
    , PRIMARY KEY (device_id, read_datetime)
    );

CREATE TABLE patient_devices
    -- ( id serial NOT NULL PRIMARY KEY -- You don't need this surrogate key
    ( patient_id integer NOT NULL -- REFERENCES patients (id)
    , device_id integer NOT NULL REFERENCES devices (id)
    , issuance_datetime timestamp without time zone NOT NULL
    , unassignment_datetime timestamp without time zone
    , PRIMARY KEY (device_id, issuance_datetime)
    , UNIQUE (device_id, unassignment_datetime)
    );

-- CREATE INDEX index_patient_devices_on_issuance_datetime ON patient_devices (device_id, unassignment_datetime);
-- may need some additional indices later

-- devices -- ~500 rows
INSERT INTO devices (serial_number)
SELECT 'No_' || gs::text
FROM generate_series(1, 500) gs;

-- reads -- ~100K rows
INSERT INTO reads (device_id, read_datetime, value)
SELECT de.id, gs, (random() * 1000000)::bigint
FROM devices de
JOIN generate_series('2012-01-01'::timestamp, '2013-05-01'::timestamp, '1 hour'::interval) gs
    ON random() < 0.02;

-- patient_devices -- ~25,000 rows
INSERT INTO patient_devices (device_id, issuance_datetime, patient_id)
SELECT DISTINCT ON (re.device_id, re.read_datetime)
    re.device_id, re.read_datetime, pa
FROM generate_series(1, 100) pa
JOIN reads re ON random() < 0.01;

-- close the open intervals
UPDATE patient_devices dst
SET unassignment_datetime = src.issuance_datetime
FROM patient_devices src
WHERE src.device_id = dst.device_id
  AND src.issuance_datetime > dst.issuance_datetime
  AND NOT EXISTS (
        SELECT *
        FROM patient_devices nx
        WHERE nx.device_id = src.device_id
          AND nx.issuance_datetime > dst.issuance_datetime
          AND nx.issuance_datetime < src.issuance_datetime
        );

VACUUM ANALYZE patient_devices;
VACUUM ANALYZE devices;
VACUUM ANALYZE reads;

EXPLAIN ANALYZE
SELECT pd.patient_id, de.serial_number
-- , COUNT(*) AS zcount
FROM reads re
INNER JOIN devices de ON de.id = re.device_id
INNER JOIN patient_devices pd ON pd.device_id = de.id
    AND re.read_datetime >= pd.issuance_datetime
    AND re.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
WHERE re.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
GROUP BY de.serial_number, pd.patient_id
LIMIT 10
;
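Not part of the original answer: the remodeled patient_devices relies on assignment intervals never overlapping for a given device (the natural PRIMARY KEY (device_id, issuance_datetime) enforces distinct starts, but not non-overlap). A quick sanity check against the synthetic schema above, which should return zero rows:

-- Any assignment b that starts after assignment a begins but before a is
-- closed indicates two patients holding the same device at once.
SELECT a.device_id,
       a.issuance_datetime      AS a_start,
       a.unassignment_datetime  AS a_end,
       b.issuance_datetime      AS b_start
FROM patient_devices a
JOIN patient_devices b
  ON b.device_id = a.device_id
 AND b.issuance_datetime > a.issuance_datetime
 AND b.issuance_datetime < COALESCE(a.unassignment_datetime, 'infinity'::timestamp);

The non-overlap property is what lets the join condition on the interval columns replace the OR-laden filter in the original query and use the composite keys directly.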