R: Read CSV numeric values with a decimal comma using the sparklyr package
I need to read a ".csv" file with the "sparklyr" library. The numeric values in the file use a comma as the decimal separator.
I am using:
library(sparklyr)
library(dplyr)
df <- data.frame(DNI = c("22-e", "EE-4", "55-W"),
                 DD = c("33,2", "33.2", "14,55"),
                 CC = c("2", "44,4", "44,9"))
write.csv(df, "aff.csv")
sc <- spark_connect(master = "local", spark_home = "/home/tomas/spark-2.1.0-bin-hadoop2.7/", version = "2.1.0")
df <- spark_read_csv(sc, name = "data", path = "/home/tomas/Documentos/Clusterapp/aff.csv", header = TRUE, delimiter = ",")
# Note: C_tbl is not defined anywhere in this snippet
tbl <- sdf_copy_to(sc = sc, x = C_tbl, overwrite = TRUE)
The problem is that the numbers are read as factors instead of numeric values.
r apache-spark sparklyr
asked Dec 27 at 13:39 by Tomas Tapia
edited 17 hours ago by user6910411
2
You might want to include some sample data from the file.
– Tim Biegeleisen
Dec 27 at 13:46
2
Please add this data to your question, formatted as readable code.
– Tim Biegeleisen
Dec 27 at 13:58
The first column is an identifier and the others are numeric values in which the comma is the decimal separator. df <- data.frame(DNI = c("22-e", "EE-4", "55-W"), DD = c("33,2", "33.2", "14,55"), CC = c("2", "44,4", "44,9")); write.csv(df, "aff.csv")
– Tomas Tapia
Dec 27 at 14:36
I'm looking for something like R's read.csv2() function, but for sparklyr.
– Tomas Tapia
Dec 27 at 15:07
2 Answers
To manipulate strings inside a Spark data frame you can use the regexp_replace function, as mentioned here:
https://spark.rstudio.com/guides/textmining/
For your problem it would work out like this:
# df here is the local data frame created in the question
tbl <- sdf_copy_to(sc = sc, x = df, overwrite = TRUE)
tbl0 <- tbl %>%
  mutate(DD = regexp_replace(DD, ",", "."), CC = regexp_replace(CC, ",", ".")) %>%
  mutate_at(vars(c("DD", "CC")), as.numeric)
To check your result:
> glimpse(tbl0)
Observations: ??
Variables: 3
$ DNI <chr> "22-e", "EE-4", "55-W"
$ DD <dbl> 33.20, 33.20, 14.55
$ CC <dbl> 2.0, 44.4, 44.9
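The same replacement can probably be applied to a table read straight from the file with spark_read_csv, which is closer to what the question asks for. The following is only a minimal sketch under several assumptions: a local Spark installation that spark_connect() can find, a temporary file path (csv_path) chosen purely for illustration, and row.names = FALSE so that the file has no unnamed first column. Setting infer_schema = FALSE should make Spark read every column as a string, so the conversion happens inside Spark rather than in R.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and discoverable by sparklyr.
sc <- spark_connect(master = "local")

# Sample data from the question, written without row names.
df <- data.frame(DNI = c("22-e", "EE-4", "55-W"),
                 DD  = c("33,2", "33.2", "14,55"),
                 CC  = c("2", "44,4", "44,9"))
csv_path <- file.path(tempdir(), "aff.csv")  # illustrative path only
write.csv(df, csv_path, row.names = FALSE)

# Read all columns as strings; no type inference on the Spark side.
raw_tbl <- spark_read_csv(sc, name = "data", path = csv_path,
                          header = TRUE, delimiter = ",",
                          infer_schema = FALSE)

# Swap the decimal comma for a point in Spark, then cast to double.
num_tbl <- raw_tbl %>%
  mutate(DD = as.numeric(regexp_replace(DD, ",", ".")),
         CC = as.numeric(regexp_replace(CC, ",", ".")))

glimpse(num_tbl)
spark_disconnect(sc)
```

Because regexp_replace and the cast are translated to Spark SQL, the data should not need to be collected into the R session for the conversion.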
answered 20 hours ago
Antonis
You could replace the "," in the numbers with "." and convert them to numeric. For instance:
df$DD <- as.numeric(gsub(pattern = ",", replacement = ".", x = df$DD))
Does that help?
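The line above converts one column. As a minimal base R sketch, the same conversion could be applied to several columns at once, assuming the data is already in a local data frame. The names to_numeric_dec_comma and num_cols are hypothetical and not part of any package.

```r
# Hypothetical helper: turn strings such as "33,2" or "33.2" into numbers.
to_numeric_dec_comma <- function(x) {
  as.numeric(gsub(",", ".", as.character(x), fixed = TRUE))
}

# Sample data from the question, kept as character columns.
df <- data.frame(
  DNI = c("22-e", "EE-4", "55-W"),
  DD  = c("33,2", "33.2", "14,55"),
  CC  = c("2", "44,4", "44,9"),
  stringsAsFactors = FALSE
)

# Convert only the columns that hold numbers; DNI stays a string.
num_cols <- c("DD", "CC")
df[num_cols] <- lapply(df[num_cols], to_numeric_dec_comma)
str(df)
```

This runs in the local R session, so it only suits data that already fits in memory.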
answered Dec 28 at 10:32
Rage
It is a good strategy, but the objective is to read the data without first loading it into R, operating with sparklyr directly. R has the read.csv2() command, which reads numeric data that uses a decimal comma.
– Tomas Tapia
yesterday