postgresql – OFFSET与ROW_NUMBER()

发布时间：2020-12-13 16:43:32 所属栏目：百科来源：网络整理

导读：我们知道，Postgresql的OFFSET要求它扫描所有的行，直到它到达你请求的位置，这使得它对于通过巨大的结果集的分页是无用的，随着OFFSET的增加越来越慢。 PG 8.4现在支持窗口功能。代替： SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500 你可以说

我们知道，Postgresql的OFFSET要求它扫描所有的行，直到它到达你请求的位置，这使得它对于通过巨大的结果集的分页是无用的，随着OFFSET的增加越来越慢。

PG 8.4现在支持窗口功能。代替：

SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500

你可以说：

SELECT * FROM (SELECT *,ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo
WHERE rownum > 500 AND rownum <= 510

后一种方法能帮助我们吗？或者我们必须继续使用标识列和临时表来进行大分页吗？

我构造了一个测试，它比较OFFSET，光标和ROW_NUMBER()。我对ROW_NUMBER()的印象是，无论您在结果集中的位置如何，它的速度都是一致的，是正确的。但是，这个速度比OFFSET或CURSOR的速度要慢得多，这也是我的印象，速度几乎是相同的，速度越慢，结果就越差。

结果：

offset(100,100): 0.016359
scroll(100,100): 0.018393
rownum(100,100): 15.535614

offset(100,480000): 1.761800
scroll(100,480000): 1.781913
rownum(100,480000): 15.158601

offset(100,999900): 3.670898
scroll(100,999900): 3.664517
rownum(100,999900): 14.581068

测试脚本使用sqlalchemy设置表和1000000行测试数据。然后，它使用psycopg2游标执行每个SELECT语句并使用三种不同的方法获取结果。

from sqlalchemy import *

metadata = MetaData()
engine = create_engine('postgresql://scott:tiger@localhost/test',echo=True)

t1 = Table('t1',metadata,Column('id',Integer,primary_key=True),Column('d1',String(50)),Column('d2',Column('d3',Column('d4',Column('d5',String(50))
)

if not engine.has_table('t1'):
    conn = engine.connect()
    t1.create(conn)

    # 1000000 rows
    for i in range(100):
        conn.execute(t1.insert(),[
            dict(
                ('d%d' % col,"data data data %d %d" % (col,(i * 10000) + j))
                for col in range(1,6)
            ) for j in xrange(1,10001)
        ])

import time

def timeit(fn,count,*args):
    now = time.time()
    for i in xrange(count):
        fn(*args)
    total = time.time() - now
    print "%s(%s): %f" % (fn.__name__,",".join(repr(x) for x in args),total)

# this is a raw psycopg2 connection.
conn = engine.raw_connection()

def offset(limit,offset):
    cursor = conn.cursor()
    cursor.execute("select * from t1 order by id limit %d offset %d" % (limit,offset))
    cursor.fetchall()
    cursor.close()

def rownum(limit,offset):
    cursor = conn.cursor()
    cursor.execute("select * from (select *,"
                    "row_number() over (order by id asc) as rownum from t1) as foo "
                    "where rownum>=%d and rownum<%d" % (offset,limit + offset))
    cursor.fetchall()
    cursor.close()

def scroll(limit,offset):
    cursor = conn.cursor('foo')
    cursor.execute("select * from t1 order by id")
    cursor.scroll(offset)
    cursor.fetchmany(limit)
    cursor.close()

print 

timeit(offset,10,100,100)
timeit(scroll,100)
timeit(rownum,100)

print 

timeit(offset,480000)
timeit(scroll,480000)
timeit(rownum,480000)

print 

timeit(offset,999900)
timeit(scroll,999900)
timeit(rownum,999900)

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!