李大同 (https://www.lidatong.com.cn/)

Loading all rows from Cassandra in parallel with multiple (Python) clients

Published: 2020-12-20 13:34:09 | Category: Python | Source: compiled from the web
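When using Cassandra's recommended RandomPartitioner (or Murmur3Partitioner), you cannot run meaningful range queries on keys, because rows are distributed around the cluster using a hash of the key (MD5 for RandomPartitioner, MurmurHash3 for Murmur3Partitioner). These hashes are called "tokens".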
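To see why key-range queries are not meaningful, here is a minimal sketch of how a RandomPartitioner-style token could be computed. It assumes the token is the absolute value of the key's MD5 digest read as a signed, big-endian 128-bit integer, and that a CQL `int` key is serialized as 4 big-endian bytes; the helper names `random_partitioner_token` and `int_key_bytes` are hypothetical. Consecutive keys such as 0, 1, 2, 3 land on unrelated tokens, so only ranges of tokens, not ranges of keys, correspond to contiguous slices of the ring.

```python
import hashlib
import struct


def random_partitioner_token(key_bytes):
    """Hypothetical helper: RandomPartitioner-style token for a serialized key.

    Assumption: the token is abs() of the 16-byte MD5 digest interpreted as a
    signed, big-endian 128-bit integer, so it lies in [0, 2**127].
    """
    digest = hashlib.md5(key_bytes).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def int_key_bytes(k):
    # Assumption: a CQL `int` partition key is serialized as 4 big-endian bytes.
    return struct.pack(">i", k)


if __name__ == "__main__":
    # Adjacent keys map to tokens scattered across the whole range.
    for k in range(4):
        print(k, random_partitioner_token(int_key_bytes(k)))
```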

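Nevertheless, it is very useful to split up a large table by assigning each compute worker a range of tokens. With CQL3 it seems possible to issue queries directly against the tokens, but the Python below did not work at first... EDIT: it works after switching to testing against the latest version of the Cassandra database (doh!); I also updated the syntax according to the notes in the code below: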

```python
## use the python cql module
import cql

## If running against an old version of Cassandra, this raises:
## TApplicationException: Invalid method name: 'set_cql_version'
conn = cql.connect('localhost', cql_version='3.0.2')

cursor = conn.cursor()

try:
    ## remove the previous attempt to make this work
    cursor.execute('DROP KEYSPACE test;')
except Exception as exc:
    print(exc)

## make a keyspace and a simple table
## (CQL3 syntax of that era; newer Cassandra versions expect
##  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1})
cursor.execute("CREATE KEYSPACE test WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 1;")
cursor.execute("USE test;")
cursor.execute('CREATE TABLE data (k int PRIMARY KEY, v varchar);')

## put some data in the table -- must use single quotes around literals, not double quotes
cursor.execute("INSERT INTO data (k, v) VALUES (0, 'a');")
cursor.execute("INSERT INTO data (k, v) VALUES (1, 'b');")
cursor.execute("INSERT INTO data (k, v) VALUES (2, 'c');")
cursor.execute("INSERT INTO data (k, v) VALUES (3, 'd');")

## split up the full range of tokens.
## RandomPartitioner tokens lie in [0, 2**127]; suppose there are 2**k workers:
k = 3  # --> eight workers
token_sub_range = 2**(127 - k)
worker_num = 2  # for example
start_token = worker_num * token_sub_range
end_token = (1 + worker_num) * token_sub_range

## put single quotes around the token strings
cql3_command = "SELECT k, v FROM data WHERE token(k) >= '%d' AND token(k) < '%d';" % (start_token, end_token)
print(cql3_command)

## against an old version of Cassandra, this fails with
## "ProgrammingError: Bad Request: line 1:28 no viable alternative at input 'token'"
cursor.execute(cql3_command)

for row in cursor:
    print(row)

cursor.close()
conn.close()
```
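The code above only handles a power-of-two number of workers. A hedged sketch of a more general split divides the token ring into any number of contiguous, non-overlapping, inclusive ranges. It assumes the inclusive token bounds in the two constants (0 to 2**127 for RandomPartitioner, -2**63 to 2**63 - 1 for Murmur3Partitioner); `split_token_range` is a hypothetical helper, not part of any driver.

```python
# Assumed inclusive (min, max) token bounds for each partitioner.
RANDOM_PARTITIONER_BOUNDS = (0, 2**127)
MURMUR3_PARTITIONER_BOUNDS = (-2**63, 2**63 - 1)


def split_token_range(lo, hi, n_workers):
    """Split the inclusive token interval [lo, hi] into n_workers contiguous,
    non-overlapping inclusive (start, end) ranges that together cover it."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    span = hi - lo + 1
    if n_workers > span:
        raise ValueError("more workers than tokens")
    # Python ints are arbitrary precision, so 2**127-sized spans are exact.
    bounds = [lo + (span * i) // n_workers for i in range(n_workers + 1)]
    # bounds[-1] == hi + 1, so the last range ends exactly at hi.
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n_workers)]


if __name__ == "__main__":
    for worker_num, (start, end) in enumerate(
            split_token_range(*RANDOM_PARTITIONER_BOUNDS, 8)):
        print(worker_num, start, end)
```

Inclusive bounds let each worker query `token(k) >= start AND token(k) <= end` without ever writing a literal outside the token type's range (with Murmur3Partitioner, an exclusive upper bound of 2**63 would not fit in a `bigint`). For 8 workers over the RandomPartitioner range, worker 2 gets `(2 * 2**124, 3 * 2**124 - 1)`, the same tokens as the question's `start_token`/`end_token` pair with its `<` upper bound.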

Ideally, I would like to be able to use pycassa, because I prefer its more Pythonic interface.

Is there a better way?

Solution

I have updated the question to include the answer.
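For the parallel load the title asks about, a minimal sketch could hand each token sub-range to its own thread and concatenate the results. `scan_in_parallel`, `fake_token`, `FAKE_TABLE`, and `fake_fetch_range` are hypothetical; the in-memory table and MD5-based token stand in for Cassandra so the sketch runs on its own. Against a real cluster, `fetch_range` is assumed to run the per-range `SELECT` through whichever driver you use.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_token_range(lo, hi, n_workers):
    # Contiguous, non-overlapping inclusive (start, end) ranges covering [lo, hi].
    span = hi - lo + 1
    bounds = [lo + (span * i) // n_workers for i in range(n_workers + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n_workers)]


def scan_in_parallel(fetch_range, lo, hi, n_workers):
    """Call fetch_range(start, end) for every sub-range concurrently and
    return all rows in one list (order follows the ranges, not the keys)."""
    ranges = split_token_range(lo, hi, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return [row for chunk in chunks for row in chunk]


def fake_token(k):
    # Hypothetical stand-in for a partitioner: MD5-based token in [0, 2**127].
    digest = hashlib.md5(str(k).encode()).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


FAKE_TABLE = {k: "v%d" % k for k in range(100)}


def fake_fetch_range(start, end):
    # Stand-in for one worker's query. With a real driver this is assumed to be
    # something like session.execute(
    #     "SELECT k, v FROM data WHERE token(k) >= %s AND token(k) <= %s",
    #     (start, end))
    return [(k, v) for k, v in FAKE_TABLE.items() if start <= fake_token(k) <= end]


if __name__ == "__main__":
    rows = scan_in_parallel(fake_fetch_range, 0, 2**127, 8)
    print(len(rows))  # 100: every row falls in exactly one range
```

Threads suit I/O-bound fetching. If the per-row work is CPU-heavy, the same `(start, end)` pairs could instead be given to separate processes or machines, one range each; because the ranges do not overlap, no row is read twice.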

(Editor: 李大同)
