Sql:优化BETWEEN子句
我写了一个声明,花了差不多一个小时才能运行所以我在寻求帮助,所以我可以更快地做到这一点.所以我们走了:
我正在进行两个表的内连接: 我有许多时间间隔由间隔表示,我想从这些间隔内的度量获得测量数据. interval:有两列,一列是开始时间,另一列是间隔的结束时间(行数= 1295) 度量:有两列,一列有度量,另一列有度量的时间(行数=一百万) 我想得到的结果是一个表,在第一列中有度量,然后是度量完成的时间,所考虑的时间间隔的开始/结束时间(对于时间在所考虑范围内的行,它将重复) 这是我的代码: select measures.measure as measure,measures.time as time,intervals.entry_time as entry_time,intervals.exit_time as exit_time from intervals inner join measures on intervals.entry_time<=measures.time and measures.time <=intervals.exit_time order by time asc 谢谢 解决方法这是一个非常普遍的问题.普通的B-Tree索引不适合这样的查询: SELECT measures.measure as measure,intervals.exit_time as exit_time FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ASC 索引适用于搜索给定边界内的值,如下所示: ,但不是用于搜索包含给定值的边界,如下所示: 我的博客中的这篇文章更详细地解释了这个问题: > Adjacency list vs. nested sets: MySQL (嵌套集模型处理类似的谓词类型). 您可以按时间建立索引,这样区间将在连接中领先,范围时间将在嵌套循环内使用.这将需要按时分类. 您可以按间隔创建空间索引(在MySQL中使用MyISAM存储可用),其中包括一个几何列中的开始和结束.这样,测量可以导致连接,并且不需要排序. 然而,空间索引更慢,所以只有少量测量但间隔很多时,这才有效. 由于您只有很少的间隔但有很多度量,因此请确保您有一个measure.time的索引: CREATE INDEX ix_measures_time ON measures (time) 更新: 这是一个要测试的示例脚本: BEGIN DBMS_RANDOM.seed(20091223); END; / CREATE TABLE intervals ( entry_time NOT NULL,exit_time NOT NULL ) AS SELECT TO_DATE('23.12.2009','dd.mm.yyyy') - level,TO_DATE('23.12.2009','dd.mm.yyyy') - level + DBMS_RANDOM.value FROM dual CONNECT BY level <= 1500 / CREATE UNIQUE INDEX ux_intervals_entry ON intervals (entry_time) / CREATE TABLE measures ( time NOT NULL,measure NOT NULL ) AS SELECT TO_DATE('23.12.2009','dd.mm.yyyy') - level / 720,CAST(DBMS_RANDOM.value * 10000 AS NUMBER(18,2)) FROM dual CONNECT BY level <= 1080000 / ALTER TABLE measures ADD CONSTRAINT pk_measures_time PRIMARY KEY (time) / CREATE INDEX ix_measures_time_measure ON measures (time,measure) / 这个查询: SELECT SUM(measure),AVG(time - TO_DATE('23.12.2009','dd.mm.yyyy')) FROM ( SELECT * FROM ( SELECT /*+ ORDERED USE_NL(intervals measures) */ * FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ) WHERE rownum <= 500000 ) 使用NESTED LOOPS并在1.7秒内返回. 这个查询: SELECT SUM(measure),'dd.mm.yyyy')) FROM ( SELECT * FROM ( SELECT /*+ ORDERED USE_MERGE(intervals measures) */ * FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ) WHERE rownum <= 500000 ) 使用MERGE JOIN,我不得不在5分钟后停止它. 更新2: 您很可能需要使用如下提示强制引擎在连接中使用正确的表顺序: SELECT /*+ LEADING (intervals) USE_NL(intervals,measures) */ measures.measure as measure,intervals.exit_time as exit_time FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ASC Oracle的优化器不够智能,看不到间隔不相交.这就是为什么它最有可能使用度量作为主导表(如果间隔相交,这将是一个明智的决定). 更新3: WITH splits AS ( SELECT /*+ MATERIALIZE */ entry_range,exit_range,exit_range - entry_range + 1 AS range_span,entry_time,exit_time FROM ( SELECT TRUNC((entry_time - TO_DATE(1,'J')) * 2) AS entry_range,TRUNC((exit_time - TO_DATE(1,'J')) * 2) AS exit_range,exit_time FROM intervals ) ),upper AS ( SELECT /*+ MATERIALIZE */ MAX(range_span) AS max_range FROM splits ),ranges AS ( SELECT /*+ MATERIALIZE */ level AS chunk FROM upper CONNECT BY level <= max_range ),tiles AS ( SELECT /*+ MATERIALIZE USE_MERGE (r s) */ entry_range + chunk - 1 AS tile,exit_time FROM ranges r JOIN splits s ON chunk <= range_span ) SELECT /*+ LEADING(t) USE_HASH(m t) */ SUM(LENGTH(stuffing)) FROM tiles t JOIN measures m ON TRUNC((m.time - TO_DATE(1,'J')) * 2) = tile AND m.time BETWEEN t.entry_time AND t.exit_time 此查询将时间轴拆分为范围,并使用HASH JOIN连接范围值的度量和时间戳,稍后进行精细过滤. 有关其工作原理的更详细说明,请参阅我的博客中的这篇文章: > Oracle: joining timestamps and time intervals (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |