python – Can pandas write a DataFrame with a text index to MySQL?

Published: 2020-12-20 11:53:03 | Category: Python | Source: compiled from the web


If I try to store a DataFrame with a text index in a MySQL database, I get the error "BLOB/TEXT column used in key specification without a key length". For example:

import pandas as pd
import sqlalchemy as sa

df = pd.DataFrame(
    {'Id': ['AJP2008H', 'BFA2010Z'],
     'Date': pd.to_datetime(['2010-05-05', '2010-07-05']),
     'Value': [74.2, 52.3]})
df.set_index(['Id', 'Date'], inplace=True)
# db_connection is a MySQL URL, e.g. 'mysql+pymysql://user:password@host/dbname'
engine = sa.create_engine(db_connection)
conn = engine.connect()
df.to_sql('test_table_index', conn, if_exists='replace')
conn.close()

This produces the error:

InternalError: (pymysql.err.InternalError) 
(1170,"BLOB/TEXT column 'Id' used in key specification without a key length") 
[SQL: 'CREATE INDEX `ix_test_table_index_Id` ON test_table_index (`Id`)']
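The error comes from MySQL, not pandas. By default pandas maps an object-dtype column (here the `Id` index level) to SQLAlchemy's `Text`, which MySQL renders as `TEXT`, and MySQL refuses a plain index on a `TEXT` column unless a prefix length is given. The minimal sketch below, assuming only that SQLAlchemy is installed (no MySQL server or driver is needed, because the DDL is only compiled), compares the statements emitted for an indexed `TEXT` column and an indexed `VARCHAR(8)` column; the table names are illustrative.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import mysql
from sqlalchemy.schema import CreateIndex, CreateTable

metadata = sa.MetaData()

# Approximates what pandas generates for a string index level: an indexed TEXT column.
text_table = sa.Table(
    "test_table_index", metadata,
    sa.Column("Id", sa.Text, index=True),
)
# The same column declared with an explicit length, which MySQL can index as-is.
varchar_table = sa.Table(
    "test_table_index_varchar", metadata,
    sa.Column("Id", sa.VARCHAR(8), index=True),
)

# Compile the DDL for the MySQL dialect without connecting to any server.
dialect = mysql.dialect()
statements = []
for table in (text_table, varchar_table):
    statements.append(str(CreateTable(table).compile(dialect=dialect)).strip())
    for index in table.indexes:
        statements.append(str(CreateIndex(index).compile(dialect=dialect)))

print("\n\n".join(statements))
```

The first CREATE TABLE declares `Id` as `TEXT`, so the CREATE INDEX that follows it is the statement MySQL rejects with error 1170; the second declares `VARCHAR(8)`, whose index MySQL accepts.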

It works fine if I don't set the index. Is there a way to store it without dropping down to SQLAlchemy to create the table directly?

This is my current SQLAlchemy workaround:

from sqlalchemy import Column, DateTime, String, Table

table = Table(
    name, self.metadata,
    Column('Id', String(ID_LENGTH), primary_key=True),
    Column('Date', DateTime),
    Column('Value', String(VALUE_LENGTH)))
self.metadata.create_all(engine)  # Creates the table if it doesn't exist

Solution

You can specify the SQLAlchemy data type explicitly through the dtype argument of to_sql():

In [48]: from sqlalchemy.types import VARCHAR

In [50]: df
Out[50]:
                     Value
Id       Date
AJP2008H 2010-05-05   74.2
BFA2010Z 2010-07-05   52.3

In [51]: df.to_sql('test_table_index', engine, if_exists='replace', dtype={'Id': VARCHAR(df.index.get_level_values('Id').str.len().max())})

Let's check it on the MySQL side:

mysql> show create table test_table_index\G
*************************** 1. row ***************************
       Table: test_table_index
Create Table: CREATE TABLE `test_table_index` (
  `Id` varchar(8) DEFAULT NULL,
  `Date` datetime DEFAULT NULL,
  `Value` double DEFAULT NULL,
  KEY `ix_test_table_index_Id` (`Id`),
  KEY `ix_test_table_index_Date` (`Date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
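The same check can be scripted. This is a sketch assuming pandas and SQLAlchemy are installed; an in-memory SQLite database stands in for MySQL so it runs without a server. SQLite itself would accept an index on a TEXT column, so this only demonstrates that the `dtype` override reaches the table definition, not the MySQL error. `sa.inspect(...).get_columns()` reflects the created column types, much as `SHOW CREATE TABLE` does.

```python
import pandas as pd
import sqlalchemy as sa
from sqlalchemy.types import VARCHAR

df = pd.DataFrame(
    {"Id": ["AJP2008H", "BFA2010Z"],
     "Date": pd.to_datetime(["2010-05-05", "2010-07-05"]),
     "Value": [74.2, 52.3]}
).set_index(["Id", "Date"])

# Assumption: in-memory SQLite stands in for MySQL so the sketch runs without a server.
# Against MySQL, create the engine from your own mysql+pymysql:// URL instead.
engine = sa.create_engine("sqlite://")

# Size the VARCHAR from the longest Id currently in the index.
id_len = int(df.index.get_level_values("Id").str.len().max())
df.to_sql("test_table_index", engine, if_exists="replace",
          dtype={"Id": VARCHAR(id_len)})

# Reflect the table to see which column types and indexes were actually created.
inspector = sa.inspect(engine)
columns = {c["name"]: c["type"] for c in inspector.get_columns("test_table_index")}
indexes = [ix["column_names"] for ix in inspector.get_indexes("test_table_index")]
print(columns)
print(indexes)
```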


mysql> select * from test_table_index;
+----------+---------------------+-------+
| Id       | Date                | Value |
+----------+---------------------+-------+
| AJP2008H | 2010-05-05 00:00:00 |  74.2 |
| BFA2010Z | 2010-07-05 00:00:00 |  52.3 |
+----------+---------------------+-------+
2 rows in set (0.00 sec)

Now let's read it back into a new DataFrame:

In [52]: x = pd.read_sql('test_table_index', engine, index_col=['Id', 'Date'])

In [53]: x
Out[53]:
                     Value
Id       Date
AJP2008H 2010-05-05   74.2
BFA2010Z 2010-07-05   52.3
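A self-contained version of this round trip follows, again with in-memory SQLite as an assumed stand-in for the MySQL engine. `index_col` rebuilds the MultiIndex from the stored `Id` and `Date` columns; `parse_dates` is added here only as a precaution so that `Date` comes back as a datetime even on backends that return it as text.

```python
import pandas as pd
import sqlalchemy as sa
from sqlalchemy.types import VARCHAR

df = pd.DataFrame(
    {"Id": ["AJP2008H", "BFA2010Z"],
     "Date": pd.to_datetime(["2010-05-05", "2010-07-05"]),
     "Value": [74.2, 52.3]}
).set_index(["Id", "Date"])

# Assumption: in-memory SQLite replaces the MySQL engine used in the answer.
engine = sa.create_engine("sqlite://")
# 8 is the longest Id in this data.
df.to_sql("test_table_index", engine, if_exists="replace", dtype={"Id": VARCHAR(8)})

# index_col turns the stored Id/Date columns back into a MultiIndex.
x = pd.read_sql("test_table_index", engine,
                index_col=["Id", "Date"], parse_dates=["Date"])
print(x)
```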

You can find the maximum length of an object column like this:

In [75]: df.index.get_level_values('Id').str.len().max()
Out[75]: 8
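When a frame has several string columns or index levels, the dtype mapping can be built in one pass. `varchar_dtypes` below is a hypothetical helper, not part of pandas. It flattens the index with `reset_index()`, uses `pandas.api.types.infer_dtype` to pick out columns whose non-missing values are all strings, measures each one's longest value, and returns a dict ready for `to_sql(..., dtype=...)`. Sizing from the current data is an assumption: if longer values may be written later, pass a larger `min_length` or choose the widths explicitly.

```python
import pandas as pd
from sqlalchemy.types import VARCHAR


def varchar_dtypes(df: pd.DataFrame, min_length: int = 1) -> dict:
    """Hypothetical helper: map each all-string column or index level to VARCHAR(longest)."""
    flat = df.reset_index()  # index levels become ordinary columns
    dtypes = {}
    for name in flat.columns:
        col = flat[name]
        # Skip numeric, datetime and mixed columns; missing values are ignored.
        if pd.api.types.infer_dtype(col, skipna=True) != "string":
            continue
        longest = int(col.dropna().str.len().max())
        dtypes[name] = VARCHAR(max(longest, min_length))
    return dtypes


df = pd.DataFrame(
    {"Id": ["AJP2008H", "BFA2010Z"],
     "Date": pd.to_datetime(["2010-05-05", "2010-07-05"]),
     "Value": [74.2, 52.3]}
).set_index(["Id", "Date"])

dtypes = varchar_dtypes(df)
print({name: t.length for name, t in dtypes.items()})  # {'Id': 8}
```

The result can be passed straight through, e.g. `df.to_sql('test_table_index', engine, if_exists='replace', dtype=varchar_dtypes(df))`.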

(Editor: 李大同)
