
Connecting to Athena with Python and pyathenajdbc

Published: 2020-12-20 11:58:50  Category: Python  Source: compiled from the web
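I am trying to connect to AWS Athena from Python using pyathenajdbc. The problem is getting a connection: when I run the code below, I get an error saying it cannot find the AthenaDriver (java.lang.RuntimeException: Class com.amazonaws.athena.jdbc.AthenaDriver not found). I did download this file from AWS, and I have confirmed that it is in that directory.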

from mdpbi.rsi.config import *
from mdpbi.tools.functions import mdpLog
from pkg_resources import resource_string
import argparse
import os
import pyathenajdbc
import sys

SCRIPT_NAME = "Athena_Export"

ATHENA_JDBC_CLASSPATH = "/opt/amazon/athenajdbc/AthenaJDBC41-1.0.0.jar"
EXPORT_OUTFILE = "RSI_Export.txt"
EXPORT_OUTFILE_PATH = os.path.join(WORKINGDIR, EXPORT_OUTFILE)


def get_arg_parser():
    """This function returns the argument parser object to be used with this script"""
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    return parser


def main():
    args = get_arg_parser().parse_args(sys.argv[1:])
    logger = mdpLog(SCRIPT_NAME, LOGDIR)

    SQL = resource_string("mdpbi.rsi.athena.resources", "athena.sql")

    conn = pyathenajdbc.connect(
        s3_staging_dir="s3://athena",
        access_key=AWS_ACCESS_KEY_ID,
        secret_key=AWS_SECRET_ACCESS_KEY,
        region_name="us-east-1",
        log_path=LOGDIR,
        driver_path=ATHENA_JDBC_CLASSPATH
    )
    try:
        with conn.cursor() as cursor:
            cursor.execute(SQL)
            logger.info(cursor.description)
            logger.info(cursor.fetchall())
    finally:
        conn.close()

    return 0


if __name__ == '__main__':
    rtn = main()
    sys.exit(rtn)

Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ec2-user/jason_testing/mdpbi/rsi/athena/main.py", line 53, in <module>
    rtn = main()
  File "/home/ec2-user/jason_testing/mdpbi/rsi/athena/main.py", line 39, in main
    driver_path=athena_jdbc_driver_path
  File "/opt/mdpbi/Python_Envs/2.7.10/local/lib/python2.7/dist-packages/pyathenajdbc/__init__.py", line 65, in connect
    driver_path, **kwargs)
  File "/opt/mdpbi/Python_Envs/2.7.10/local/lib/python2.7/dist-packages/pyathenajdbc/connection.py", line 68, in __init__
    jpype.JClass(ATHENA_DRIVER_CLASS_NAME)
  File "/opt/mdpbi/Python_Envs/2.7.10/lib64/python2.7/dist-packages/jpype/_jclass.py", line 55, in JClass
    raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
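
A "class not found" error like this one can have several causes, so it may be worth first ruling out the simplest: that the jar is missing, unreadable, or does not contain the driver class. The following is a minimal sketch using only the standard library; the helper names class_entry_name and jar_contains_class are illustrative and are not part of pyathenajdbc. The jar path and class name are the ones quoted in the question.

```python
import zipfile

# Values taken from the question above.
ATHENA_JDBC_CLASSPATH = "/opt/amazon/athenajdbc/AthenaJDBC41-1.0.0.jar"
ATHENA_DRIVER_CLASS_NAME = "com.amazonaws.athena.jdbc.AthenaDriver"


def class_entry_name(class_name):
    """Map a dotted Java class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar, class_name):
    """Return True if the jar (a path or file-like object) holds the class."""
    try:
        with zipfile.ZipFile(jar) as archive:
            return class_entry_name(class_name) in archive.namelist()
    except (IOError, zipfile.BadZipfile):
        # Missing file, unreadable file, or not a zip archive at all.
        return False


if __name__ == "__main__":
    found = jar_contains_class(ATHENA_JDBC_CLASSPATH, ATHENA_DRIVER_CLASS_NAME)
    print("Driver class present in jar: %s" % found)
```

If this reports that the class is present, the jar itself is probably fine and the cause lies elsewhere, for example in the JVM that loads it.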

Solution

The JDBC driver requires Java 8, and I was running Java 7. I was able to install another version of Java on the EC2 instance.

https://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/#
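
To see which Java version a machine has before trying to connect, something like the following sketch could be used. It runs `java -version` and parses the major version; the function names parse_java_major and installed_java_major are hypothetical helpers written for this illustration. Note the assumption that the `java` found on PATH is the same JVM that JPype will load, which is not guaranteed when JAVA_HOME points elsewhere.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report "1.x"; Java 9+ report the major number first.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major(java_bin="java"):
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        # `java -version` writes to stderr, so merge it into the captured output.
        output = subprocess.check_output(
            [java_bin, "-version"], stderr=subprocess.STDOUT
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_java_major(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    major = installed_java_major()
    if major is None or major < 8:
        print("Java 8 or newer not found on PATH (detected: %s)" % major)
    else:
        print("Detected Java %d" % major)
```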

I also had to set the Java version in my code. With these changes, the code now runs as expected.

from mdpbi.rsi.config import *
from mdpbi.tools.functions import mdpLog
from pkg_resources import resource_string
import argparse
import os
import pyathenajdbc
import sys

SCRIPT_NAME = "Athena_Export"


def get_arg_parser():
    """This function returns the argument parser object to be used with this script"""
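    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    return parser


def main():
    args = get_arg_parser().parse_args(sys.argv[1:])
    logger = mdpLog(SCRIPT_NAME, LOGDIR)

    SQL = resource_string("mdpbi.rsi.athena.resources", "athena.sql")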

    os.environ["JAVA_HOME"] = "/opt/jdk1.8.0_121"
    os.environ["JRE_HOME"] = "/opt/jdk1.8.0_121/jre"
    os.environ["PATH"] = "/opt/jdk1.8.0_121/bin:/opt/jdk1.8.0_121/jre/bin"

    conn = pyathenajdbc.connect(
        s3_staging_dir="s3://mdpbi.data.rsi.out/",
        schema_name="rsi",
        region_name="us-east-1"
    )
    try:
        with conn.cursor() as cursor:
            cursor.execute(SQL)
            logger.info(cursor.description)
            logger.info(cursor.fetchall())
    finally:
        conn.close()

    return 0


if __name__ == '__main__':
    rtn = main()
    sys.exit(rtn)
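
One caveat with the code above: assigning os.environ["PATH"] outright replaces the existing search path, so directories such as /usr/bin are no longer on it for the rest of the process. A gentler variant, sketched below under the assumption that the JDK lives at /opt/jdk1.8.0_121 as in the answer, prepends the JDK directories and keeps whatever was already there. The helper name use_jdk is hypothetical.

```python
import os


def use_jdk(jdk_home, env=None):
    """Point JAVA_HOME/JRE_HOME at jdk_home and put its bin dirs first on PATH."""
    if env is None:
        env = os.environ
    jre_home = os.path.join(jdk_home, "jre")
    env["JAVA_HOME"] = jdk_home
    env["JRE_HOME"] = jre_home

    bin_dirs = [os.path.join(jdk_home, "bin"), os.path.join(jre_home, "bin")]
    # Keep existing PATH entries, dropping empties and any duplicates of bin_dirs.
    existing = [
        entry
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry and entry not in bin_dirs
    ]
    env["PATH"] = os.pathsep.join(bin_dirs + existing)
    return env
```

It would be called once, as use_jdk("/opt/jdk1.8.0_121"), before pyathenajdbc.connect(...), since the environment presumably needs to be in place before the JVM is started.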

