UTF-8 encoding with dplyr and SQLite
Published: 2020-12-12 18:55:58 | Category: Encyclopedia | Source: compiled from the web
I have a table in SQLite that I want to open with dplyr. I'm using SQLite Expert version 35.58.2478 and RStudio version 0.98.1062 on a PC running Windows 7.
After connecting to the database with src_sqlite() and reading it with tbl(), I get the table, but the characters are wrong. Reading the same table from a csv file works once I add encoding = "UTF-8" to read.csv(), but then a different error appears in the name of the first column (see the minimal example below).

Note that the SQLite table is encoded in UTF-8 and SQLite displays the data correctly. I tried changing the encoding in the RStudio options, without success. Changing the region in Windows or in R had no effect either.

Is there any way to get the characters from the table into R correctly with dplyr?

Minimal example

    library(dplyr)
    db <- src_sqlite("C:/Users/Jens/Documents/SQLite/my_db.sqlite")
    tbl(db, "prozesse")
    ## Source: sqlite 3.7.17 [C:/Users/Jens/Documents/SQLite/my_db.sqlite]
    ## From: prozesse [4 x 4]
    ##
    ##   KH_ID EinschÃ¤tzung Prozess Gruppe
    ## 1     1             3 Buchung     IT
    ## 2     2             4 Buchung     IT
    ## 3     3             3 Buchung    OLP
    ## 4     4             5 Buchung    OLP

You can see the wrong encoding in the name of the second column. The same problem occurs in columns whose values contain ä, ö, ü, etc.

With read.csv(), the name of the second column is displayed correctly, but the first one is wrong:

    read.csv("C:/Users/Jens/Documents/SQLite/prozess.csv", encoding = "UTF-8")
    ##   X.U.FEFF.KH_ID Einschätzung Gruppe Prozess
    ## 1              1            3     PO  visite
    ## 2              2            3     IT  visite
    ## 3              3            3     IT  visite
    ## 4              2            3     PO  visite

    sessionInfo()
    ## R version 3.1.1 (2014-07-10)
    ## Platform: x86_64-w64-mingw32/x64 (64-bit)
    ##
    ## locale:
    ## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
    ## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
    ## [5] LC_TIME=German_Germany.1252
    ##
    ## attached base packages:
    ## [1] stats     graphics  grDevices utils     datasets  methods   base
    ##
    ## other attached packages:
    ## [1] RSQLite.extfuns_0.0.1 RSQLite_0.11.4        DBI_0.3.0
    ## [4] dplyr_0.2
    ##
    ## loaded via a namespace (and not attached):
    ##  [1] assertthat_0.1  digest_0.6.4    evaluate_0.5.5  formatR_1.0
    ##  [5] htmltools_0.2.6 knitr_1.6       parallel_3.1.1  Rcpp_0.11.2
    ##  [9] rmarkdown_0.3.3 stringr_0.6.2   tools_3.1.1     yaml_2.1.13

Solution

I had the same problem and solved it as shown below. However, I can't guarantee that the solution is rock-solid. Try it:

    library(dplyr)
    library(sqldf)

    # Modifying built-in mtcars dataset
    mtcars$test <- c("?","?","?","?",letters) %>% enc2utf8(.)
    mtcars$?e???? <- c("?",letters) %>% enc2utf8(.)
    names(mtcars) <- iconv(names(mtcars), "cp1250", "utf-8")

    # Connecting to sqlite database
    my_db <- src_sqlite("my_db.sqlite3", create = TRUE)

    # exporting mtcars dataset to database
    copy_to(my_db, mtcars, temporary = FALSE)
    # dbSendQuery(my_db$con, "drop table mtcars")

    # getting data from sqlite database
    my_mtcars_from_db <- collect(tbl(my_db, "mtcars"))

    # disconnecting from database
    dbDisconnect(my_db$con)

The convert_to_encoding() function

    # a function that re-encodes column names and the values
    # of character columns from one encoding to another
    convert_to_encoding <- function(x, from_encoding = "UTF-8", to_encoding = "cp1250") {
      # names of columns are converted to the specified encoding
      my_names <- iconv(names(x), from_encoding, to_encoding)
      # if any converted column name is NA, keep the old names;
      # otherwise replace them with the new names
      if (!any(is.na(my_names))) {
        names(x) <- my_names
      }
      # get column classes
      x_char_columns <- sapply(x, class)
      # identify character columns
      x_cols <- names(x_char_columns[x_char_columns == "character"])
      # convert all string values in character columns
      # from from_encoding to to_encoding
      x <- x %>% mutate_each_(funs(iconv(., from_encoding, to_encoding)), x_cols)
      # return x
      return(x)
    }

    # use
    convert_to_encoding(my_mtcars_from_db, "UTF-8", "cp1250")

Results

    # before conversion
    my_mtcars_from_db
    Source: local data frame [32 x 13]

        mpg cyl  disp  hp drat    wt  qsec vs am gear carb ??e?ˇ?????¤ test
     1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4          ??   ??
     2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4          ??   ??
     3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          ?ˇ   ?ˇ
     4 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1          ??   ??
     5 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2          ??   ??
     6 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1          ?ˇ   ?ˇ
     7 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4           a    a
     8 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           b    b
     9 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2           c    c
    10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4           d    d
    ..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...         ...  ...
    # after conversion
    convert_to_encoding(my_mtcars_from_db, "UTF-8", "cp1250")
    Source: local data frame [32 x 13]

        mpg cyl  disp  hp drat    wt  qsec vs am gear carb test ?e????
     1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4    ?      ?
     2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4    ?      ?
     3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    ?      ?
     4 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1    ?      ?
     5 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2    ?      ?
     6 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1    ?      ?
     7 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4    a      a
     8 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2    b      b
     9 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2    c      c
    10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4    d      d
    ..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...  ...    ...

Session info

    devtools::session_info()
    Session info -------------------------------------------------------------------
     setting  value
     version  R version 3.2.0 (2015-04-16)
     system   x86_64, mingw32
     ui       RStudio (0.99.441)
     language (EN)
     collate  Slovenian_Slovenia.1250
     tz       Europe/Prague

    Packages -----------------------------------------------------------------------
     package    * version date       source
     assertthat * 0.1     2013-12-06 CRAN (R 3.2.0)
     chron      * 2.3-45  2014-02-11 CRAN (R 3.2.0)
     DBI          0.3.1   2014-09-24 CRAN (R 3.2.0)
     devtools   * 1.7.0   2015-01-17 CRAN (R 3.2.0)
     dplyr        0.4.1   2015-01-14 CRAN (R 3.2.0)
     gsubfn       0.6-6   2014-08-27 CRAN (R 3.2.0)
     lazyeval   * 0.1.10  2015-01-02 CRAN (R 3.2.0)
     magrittr   * 1.5     2014-11-22 CRAN (R 3.2.0)
     proto        0.3-10  2012-12-22 CRAN (R 3.2.0)
     R6         * 2.0.1   2014-10-29 CRAN (R 3.2.0)
     Rcpp       * 0.11.6  2015-05-01 CRAN (R 3.2.0)
     RSQLite      1.0.0   2014-10-25 CRAN (R 3.2.0)
     rstudioapi * 0.3.1   2015-04-07 CRAN (R 3.2.0)
     sqldf        0.4-10  2014-11-07 CRAN (R 3.2.0)

(Editor: 李大同)
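Both symptoms in the question can be reproduced in base R, which makes it easier to see what each fix actually does. The sketch below is an illustration under assumptions, not the method of the answer above: the temporary file, the data and the helpers `as_undeclared()` and `mark_utf8()` are hypothetical. Part 1 reads a CSV that starts with a UTF-8 byte order mark through `fileEncoding = "UTF-8-BOM"`, which drops the BOM so the first column is not renamed to something like `X.U.FEFF.KH_ID`. Part 2 swaps in a different technique from `convert_to_encoding()`: instead of rewriting bytes with `iconv()`, it keeps the bytes and only declares them as UTF-8 with `Encoding<-`, which is appropriate only when the bytes really are UTF-8 but arrive with an undeclared ("unknown") encoding.

```r
# Base-R sketch; the temp file, the data, as_undeclared() and mark_utf8()
# are hypothetical and only mirror the question's table.

## Part 1: a CSV that starts with a UTF-8 byte order mark (BOM)
csv_path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("KH_ID,Gruppe\n1,IT\n2,OLP\n")), csv_path)

# "UTF-8-BOM" tells the connection to drop a leading BOM while reading,
# so the first header comes out as plain "KH_ID".
csv_data <- read.csv(csv_path, fileEncoding = "UTF-8-BOM")
print(names(csv_data))

## Part 2: strings whose bytes are UTF-8 but whose encoding is undeclared
# Hypothetical stand-in for what an old driver hands back: the UTF-8 bytes
# marked "unknown" (= native encoding). In a Windows-1252 locale the bytes
# C3 A4 of "ä" are then displayed as "Ã¤".
as_undeclared <- function(s) rawToChar(charToRaw(enc2utf8(s)))

from_db <- as_undeclared("Einsch\u00e4tzung")
db_data <- data.frame(KH_ID = 1:2, score = c(3L, 4L),
                      Prozess = c(as_undeclared("Pr\u00fcfung"), "Buchung"),
                      stringsAsFactors = FALSE)
names(db_data) <- c("KH_ID", from_db, "Prozess")

# Hypothetical helper: declare column names and character columns as UTF-8.
# Encoding<- only changes the declared encoding; the bytes stay the same.
mark_utf8 <- function(df) {
  nms <- names(df)
  Encoding(nms) <- "UTF-8"
  names(df) <- nms
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    Encoding(col) <- "UTF-8"
    col
  })
  df
}

fixed <- mark_utf8(db_data)
print(Encoding(names(fixed)))
```

Declaring the encoding leaves the bytes untouched, so R can translate the strings to the native encoding when printing them. `iconv()` instead rewrites the bytes into the target encoding and, by default, returns `NA` for any element containing characters the target code page cannot represent.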