Connecting to Athena with Python and pyathenajdbc
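I am trying to connect to AWS Athena from Python using pyathenajdbc. The problem is establishing the connection. When I run the code below, I get an error saying it cannot find the AthenaDriver (java.lang.RuntimeException: Class com.amazonaws.athena.jdbc.AthenaDriver not found). I did download this file from AWS, and I have confirmed that it is in that directory.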

    from mdpbi.rsi.config import *
    from mdpbi.tools.functions import mdpLog
    from pkg_resources import resource_string
    import argparse
    import os
    import pyathenajdbc
    import sys

    SCRIPT_NAME = "Athena_Export"
    ATHENA_JDBC_CLASSPATH = "/opt/amazon/athenajdbc/AthenaJDBC41-1.0.0.jar"
    EXPORT_OUTFILE = "RSI_Export.txt"
    EXPORT_OUTFILE_PATH = os.path.join(WORKINGDIR, EXPORT_OUTFILE)


    def get_arg_parser():
        """This function returns the argument parser object to be used with this script"""
        parser = argparse.ArgumentParser(
            description=__doc__,
            formatter_class=argparse.RawDescriptionHelpFormatter)
        return parser


    def main():
        args = get_arg_parser().parse_args(sys.argv[1:])
        logger = mdpLog(SCRIPT_NAME, LOGDIR)
        SQL = resource_string("mdpbi.rsi.athena.resources", "athena.sql")
        conn = pyathenajdbc.connect(
            s3_staging_dir="s3://athena",
            access_key=AWS_ACCESS_KEY_ID,
            secret_key=AWS_SECRET_ACCESS_KEY,
            region_name="us-east-1",
            log_path=LOGDIR,
            driver_path=ATHENA_JDBC_CLASSPATH
        )
        try:
            with conn.cursor() as cursor:
                cursor.execute(SQL)
                logger.info(cursor.description)
                logger.info(cursor.fetchall())
        finally:
            conn.close()
        return 0


    if __name__ == '__main__':
        rtn = main()
        sys.exit(rtn)

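One way to rule out a wrong path or an incomplete download is to check that the jar really contains the class named in the error. The following is a minimal sketch using only the standard library; `jar_contains_class` is a hypothetical helper for illustration and is not part of pyathenajdbc.

```python
import zipfile

# Class name taken from the error message in the question.
DRIVER_CLASS = "com.amazonaws.athena.jdbc.AthenaDriver"


def jar_contains_class(jar_path, class_name=DRIVER_CLASS):
    """Return True if the jar at jar_path has a .class entry for class_name."""
    # A jar is a zip archive; classes are stored as path/to/Name.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling `jar_contains_class("/opt/amazon/athenajdbc/AthenaJDBC41-1.0.0.jar")` and getting `True` would suggest the path and the jar are fine, so the cause is more likely elsewhere, for example the Java version that loads the driver.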
Solution
The JDBC driver requires Java 8, and I was running Java 7. I was able to install another version of Java on the EC2 instance:
https://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/#

I also had to set the Java version in my code. With these changes, the code now runs as expected.

    from mdpbi.rsi.config import *
    from mdpbi.tools.functions import mdpLog
    from pkg_resources import resource_string
    import argparse
    import os
    import pyathenajdbc
    import sys

    SCRIPT_NAME = "Athena_Export"


    def get_arg_parser():
        """This function returns the argument parser object to be used with this script"""
        parser = argparse.ArgumentParser(
            description=__doc__,
            formatter_class=argparse.RawDescriptionHelpFormatter)
        return parser


    def main():
        args = get_arg_parser().parse_args(sys.argv[1:])
        logger = mdpLog(SCRIPT_NAME, LOGDIR)
        SQL = resource_string("mdpbi.rsi.athena.resources", "athena.sql")

        # Point the process at the Java 8 install before the JVM is started.
        # Note that this replaces PATH entirely rather than extending it.
        os.environ["JAVA_HOME"] = "/opt/jdk1.8.0_121"
        os.environ["JRE_HOME"] = "/opt/jdk1.8.0_121/jre"
        os.environ["PATH"] = "/opt/jdk1.8.0_121/bin:/opt/jdk1.8.0_121/jre/bin"

        conn = pyathenajdbc.connect(
            s3_staging_dir="s3://mdpbi.data.rsi.out/",
            schema_name="rsi",
            region_name="us-east-1"
        )
        try:
            with conn.cursor() as cursor:
                cursor.execute(SQL)
                logger.info(cursor.description)
                logger.info(cursor.fetchall())
        finally:
            conn.close()
        return 0


    if __name__ == '__main__':
        rtn = main()
        sys.exit(rtn)
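Before connecting, it may help to confirm which Java the script will actually launch. The following is a sketch under stated assumptions, not part of pyathenajdbc: `parse_java_major`, `java_major_version`, and `env_with_java_home` are hypothetical helpers. The last one prepends the JDK's `bin` directory to `PATH` instead of replacing `PATH` outright, which keeps other tools on the path reachable.

```python
import os
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_121" scheme (major is 8) and the newer
    "11.0.2" scheme (major is 11). Returns None if no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def java_major_version(java_bin="java"):
    """Run `java -version` and return the major version, or None."""
    # `java -version` writes its report to stderr, so merge it into stdout.
    proc = subprocess.Popen([java_bin, "-version"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return parse_java_major(output.decode("utf-8", "replace"))


def env_with_java_home(env, java_home):
    """Return a copy of env that points at java_home and keeps the old PATH."""
    new_env = dict(env)
    new_env["JAVA_HOME"] = java_home
    bin_dir = os.path.join(java_home, "bin")
    old_path = env.get("PATH", "")
    new_env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    return new_env
```

A possible use is `os.environ.update(env_with_java_home(os.environ, "/opt/jdk1.8.0_121"))` followed by a check that `java_major_version()` returns at least 8 before calling `pyathenajdbc.connect`. `java_major_version` raises `OSError` if no `java` executable can be found.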