
UTF-8 encoding with dplyr and SQLite

Published: 2020-12-12 18:55:58   Category: Encyclopedia   Source: compiled from the web
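I have a table in SQLite that I want to open with dplyr. I'm using SQLite Expert version 35.58.2478 and RStudio version 0.98.1062 on a Windows 7 PC.

After connecting to the database with src_sqlite() and reading the table with tbl(), I get the table, but the special characters are wrong. Reading the same table from a csv file works simply by adding encoding = "UTF-8" to read.csv(), but in that case a different error appears in the name of the first column (see the minimal example below).

Note that the SQLite table is encoded in UTF-8, and SQLite displays the data correctly.

I tried changing the encoding in the RStudio options without success. Changing the region in Windows or the locale in R had no effect either.

Is there any way to get the characters from the table into R correctly using dplyr?

Minimal example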

library(dplyr)
db <- src_sqlite("C:/Users/Jens/Documents/SQLite/my_db.sqlite")
tbl(db,"prozesse")
## Source: sqlite 3.7.17 [C:/Users/Jens/Documents/SQLite/my_db.sqlite]
## From: prozesse [4 x 4]
## 
##   KH_ID EinschÃ¤tzung Prozess Gruppe
## 1     1             3 Buchung     IT
## 2     2             4 Buchung     IT
## 3     3             3 Buchung    OLP
## 4     4             5 Buchung    OLP

The name of the second column shows the wrong encoding. The same problem also occurs in columns whose values contain ä, ö, ü, etc.
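The garbled column name is typical of UTF-8 bytes being decoded as a single-byte Windows encoding: ä is stored in UTF-8 as the two bytes C3 A4, and Latin-1 or CP1252 shows those as Ã¤. A minimal base-R illustration, assuming the bytes returned from SQLite are intact and only misinterpreted (all variable names are made up for the example):

```r
# "Einschätzung" as a properly marked UTF-8 string
x <- "Einsch\u00e4tzung"

# In UTF-8, ä occupies two bytes
bytes <- charToRaw("\u00e4")
print(bytes)  # [1] c3 a4

# Decoding the same bytes as Latin-1 (identical to CP1252 for these two bytes)
# yields the mojibake string "EinschÃ¤tzung"
garbled <- iconv(rawToChar(charToRaw(x)), from = "latin1", to = "UTF-8")
print(garbled)

# If only the declared encoding is wrong, marking the untouched bytes
# as UTF-8 restores the original string
fixed <- rawToChar(charToRaw(x))
Encoding(fixed) <- "UTF-8"
stopifnot(identical(fixed, x))
```

The point of the sketch is that nothing is lost in the database: the same bytes give either the correct or the garbled text depending only on which encoding R assumes for them.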

With read.csv(), the name of the second column is displayed correctly, but the name of the first column is wrong:

read.csv("C:/Users/Jens/Documents/SQLite/prozess.csv",encoding = "UTF-8")
##   X.U.FEFF.KH_ID Einschätzung Gruppe Prozess
## 1              1            3     PO  visite
## 2              2            3     IT  visite
## 3              3            3     IT  visite
## 4              2            3     PO  visite
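The X.U.FEFF. prefix comes from the UTF-8 byte-order mark (BOM, U+FEFF) at the start of the csv file. encoding = "UTF-8" only declares how the strings read should be marked and does not re-encode the input, so in a non-UTF-8 locale such as the German_Germany.1252 session below, the BOM stays in the first header field and name checking mangles it. A hedged alternative, assuming R >= 3.0.0, is fileEncoding = "UTF-8-BOM", which re-encodes the file while reading and discards the BOM. The sketch writes a small hypothetical csv to a temporary file:

```r
# Hypothetical sample file: a UTF-8 BOM followed by a small ASCII csv
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw("KH_ID,Prozess\n1,Buchung\n2,Buchung\n")
writeBin(c(bom, body), path)

# encoding = "UTF-8" only marks strings; depending on the locale the BOM
# can end up in the first column name (the question shows X.U.FEFF.KH_ID)
bad <- read.csv(path, encoding = "UTF-8")
print(names(bad))

# fileEncoding = "UTF-8-BOM" strips the BOM while reading
good <- read.csv(path, fileEncoding = "UTF-8-BOM")
print(names(good))
```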


sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RSQLite.extfuns_0.0.1 RSQLite_0.11.4        DBI_0.3.0            
## [4] dplyr_0.2            
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1  digest_0.6.4    evaluate_0.5.5  formatR_1.0    
##  [5] htmltools_0.2.6 knitr_1.6       parallel_3.1.1  Rcpp_0.11.2    
##  [9] rmarkdown_0.3.3 stringr_0.6.2   tools_3.1.1     yaml_2.1.13

Solution

I had the same problem and solved it as shown below. However, I can't guarantee that the solution is rock-solid. Give it a try:

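library(dplyr)
library(sqldf) # loads RSQLite as a dependency
library(DBI)   # dbDisconnect()

# Modifying built-in mtcars dataset
# (the original special characters were lost when this page was scraped;
#  č, ž and š below are stand-ins)
special_chars <- c("č","ž","š","č","ž","š")

# 6 special characters + 26 letters = 32 values, one per row of mtcars
mtcars$test <- 
  c(special_chars,letters) %>% 
  enc2utf8(.)

# column with a non-ASCII name (stand-in name)
mtcars$`čžš` <- 
  c(special_chars,letters) %>% 
  enc2utf8(.)

# column names typed in a cp1250 (Windows) session are converted to UTF-8
names(mtcars) <- 
  iconv(names(mtcars),"cp1250","utf-8")

# Connecting to sqlite database (create = TRUE creates the file if needed)
my_db <- src_sqlite("my_db.sqlite3",create = TRUE)

# exporting mtcars dataset to database
copy_to(my_db,mtcars,temporary = FALSE)

# to re-run the export, drop the existing table first:
# dbSendQuery(my_db$con,"drop table mtcars")

# getting data from sqlite database
my_mtcars_from_db <-
  collect(tbl(my_db,"mtcars"))

# disconnecting from database
dbDisconnect(my_db$con)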

The convert_to_encoding() function

# a function that re-encodes
# column names and values in character columns
# from one encoding to another
convert_to_encoding <- 
  function(x,from_encoding = "UTF-8",to_encoding = "cp1250"){

    # names of columns are converted to the target encoding
    my_names <- 
      iconv(names(x),from_encoding,to_encoding) 

    # if any column name could not be converted (NA),
    # keep the old names; otherwise replace them with the new names
    if(!anyNA(my_names)){
      names(x) <- my_names
    }

    # identify character columns
    # (is.character() also copes with columns that have several classes)
    x_cols <- names(x)[vapply(x,is.character,logical(1))]

    # convert all string values in character columns
    # from from_encoding to to_encoding; iconv()'s second argument
    # is `from`, so both encodings must be passed.
    # The guard matters: mutate_each_() with no columns would
    # apply iconv() to every column.
    if(length(x_cols) > 0){
      x <- 
        x %>%
        mutate_each_(funs(iconv(.,from_encoding,to_encoding)),x_cols)
    }
    # return x
    return(x)
  }
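mutate_each_() and funs() were deprecated in later dplyr releases, so the helper above may not run on a current installation. A sketch of the same idea using base R only; the name convert_to_encoding_base is hypothetical, and it assumes the requested encodings are supported by the platform's iconv (latin1 is used in the example because R guarantees it on every platform, whereas cp1250 availability depends on the system):

```r
convert_to_encoding_base <- function(x, from_encoding = "UTF-8", to_encoding = "cp1250") {
  # re-encode the column names; keep the old names if any conversion fails (NA)
  new_names <- iconv(names(x), from_encoding, to_encoding)
  if (!anyNA(new_names)) names(x) <- new_names

  # re-encode every character column; other columns are left untouched
  is_chr <- vapply(x, is.character, logical(1))
  if (any(is_chr)) {
    x[is_chr] <- lapply(x[is_chr], iconv, from = from_encoding, to = to_encoding)
  }
  x
}

# Example data frame with a UTF-8 column name and value, converted to Latin-1
df <- data.frame(id = 1:2, label = c("\u00e4", "b"), stringsAsFactors = FALSE)
names(df)[2] <- "Einsch\u00e4tzung"
out <- convert_to_encoding_base(df, "UTF-8", "latin1")
Encoding(out[[2]])  # "latin1" "unknown" (pure ASCII strings are never marked)
```

Selecting columns with is.character() rather than comparing class() strings keeps the helper safe for columns with more than one class, such as POSIXct.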

# use
convert_to_encoding(my_mtcars_from_db,"UTF-8","cp1250")
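If the strings returned by collect() are valid UTF-8 bytes that simply are not marked as UTF-8, a different technique from the iconv() conversion above may be enough: declaring the encoding with Encoding() without changing any bytes. A minimal sketch, assuming R >= 3.3.0 for validUTF8(); the helper name mark_utf8 is hypothetical:

```r
mark_utf8 <- function(df) {
  # declare strings that are valid UTF-8 as UTF-8, leaving their bytes as they are
  mark <- function(s) {
    ok <- !is.na(s) & validUTF8(s)
    Encoding(s[ok]) <- "UTF-8"
    s
  }
  names(df) <- mark(names(df))
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) df[is_chr] <- lapply(df[is_chr], mark)
  df
}

# Simulate unmarked UTF-8: the right bytes, but encoding "unknown"
unmarked <- rawToChar(charToRaw("Einsch\u00e4tzung"))
df <- data.frame(value = unmarked, stringsAsFactors = FALSE)
names(df) <- unmarked
fixed <- mark_utf8(df)
Encoding(fixed[[1]])  # "UTF-8"
```

Because only the encoding mark changes, this approach cannot lose characters that are missing from the target code page, which an iconv() conversion to cp1250 can.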

Result

# before conversion
my_mtcars_from_db

Source: local data frame [32 x 13]

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb ??e?ˇ?????¤ test
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4          ??   ??
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4          ??   ??
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          ?ˇ   ?ˇ
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1          ??   ??
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2          ??   ??
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1          ?ˇ   ?ˇ
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4           a    a
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           b    b
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2           c    c
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4           d    d
..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...         ...  ...

# after conversion
convert_to_encoding(my_mtcars_from_db,"UTF-8","cp1250")

Source: local data frame [32 x 13]

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb test ?e????
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4    ?      ?
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4    ?      ?
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    ?      ?
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1    ?      ?
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2    ?      ?
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1    ?      ?
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4    a      a
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2    b      b
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2    c      c
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4    d      d
..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...  ...    ...

Session info

devtools::session_info()

Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.0 (2015-04-16)
 system   x86_64,mingw32             
 ui       RStudio (0.99.441)          
 language (EN)                        
 collate  Slovenian_Slovenia.1250     
 tz       Europe/Prague               

Packages -----------------------------------------------------------------------
 package    * version date       source        
 assertthat * 0.1     2013-12-06 CRAN (R 3.2.0)
 chron      * 2.3-45  2014-02-11 CRAN (R 3.2.0)
 DBI          0.3.1   2014-09-24 CRAN (R 3.2.0)
 devtools   * 1.7.0   2015-01-17 CRAN (R 3.2.0)
 dplyr        0.4.1   2015-01-14 CRAN (R 3.2.0)
 gsubfn       0.6-6   2014-08-27 CRAN (R 3.2.0)
 lazyeval   * 0.1.10  2015-01-02 CRAN (R 3.2.0)
 magrittr   * 1.5     2014-11-22 CRAN (R 3.2.0)
 proto        0.3-10  2012-12-22 CRAN (R 3.2.0)
 R6         * 2.0.1   2014-10-29 CRAN (R 3.2.0)
 Rcpp       * 0.11.6  2015-05-01 CRAN (R 3.2.0)
 RSQLite      1.0.0   2014-10-25 CRAN (R 3.2.0)
 rstudioapi * 0.3.1   2015-04-07 CRAN (R 3.2.0)
 sqldf        0.4-10  2014-11-07 CRAN (R 3.2.0)

(Editor: 李大同)
