R: Read CSV numeric values with a comma as the decimal separator (package sparklyr)

I need to read a ".csv" file with the "sparklyr" package. The numeric values in the file use a comma as the decimal separator.

I am using:

library(sparklyr)
library(dplyr)

# Sample data: DNI is an identifier; DD and CC are numbers that
# (mostly) use a comma as the decimal separator
df <- data.frame(DNI = c("22-e", "EE-4", "55-W"),
                 DD  = c("33,2", "33.2", "14,55"),
                 CC  = c("2", "44,4", "44,9"))

write.csv(df, "aff.csv")

sc <- spark_connect(master = "local",
                    spark_home = "/home/tomas/spark-2.1.0-bin-hadoop2.7/",
                    version = "2.1.0")

df <- spark_read_csv(sc, name = "data",
                     path = "/home/tomas/Documentos/Clusterapp/aff.csv",
                     header = TRUE, delimiter = ",")

# Note: C_tbl is not defined anywhere in this snippet
tbl <- sdf_copy_to(sc = sc, x = C_tbl, overwrite = TRUE)

The problem is that the numbers are read as factors instead of as numeric values.

edited 17 hours ago by user6910411
asked Dec 27 at 13:39 by Tomas Tapia

You might want to include some sample data from the file.
– Tim Biegeleisen
Dec 27 at 13:46

Please add this data to your question, formatted as readable code.
– Tim Biegeleisen
Dec 27 at 13:58

The first column is an identifier and the others are numeric values in which the comma marks the decimal point. df <- data.frame(DNI = c("22-e","EE-4","55-W"), DD = c("33,2","33.2","14,55"), CC = c("2","44,4","44,9")); write.csv(df, "aff.csv")
– Tomas Tapia
Dec 27 at 14:36

I'm looking for something like R's read.csv2() function, but for sparklyr.
– Tomas Tapia
Dec 27 at 15:07
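R's read.csv2() expects ";" between fields and "," as the decimal mark, while the file produced by write.csv() is comma-separated, so the nearest local counterpart is read.csv(..., dec = ","). The base-R sketch below writes the question's sample data to a temporary file (an assumption, instead of the question's aff.csv) and suggests a catch: because DD mixes "33,2" with "33.2", a dec = "," read is expected to leave DD as text and convert only CC. That is why a replace-then-cast step may be needed even outside Spark.

```r
# Write the question's sample data to a temporary CSV file.
# write.csv() quotes character fields, so "33,2" survives the comma separator.
csv_path <- tempfile(fileext = ".csv")
sample_df <- data.frame(DNI = c("22-e", "EE-4", "55-W"),
                        DD  = c("33,2", "33.2", "14,55"),
                        CC  = c("2", "44,4", "44,9"),
                        stringsAsFactors = FALSE)
write.csv(sample_df, csv_path, row.names = FALSE)

# read.csv2() would assume ";" as the field separator, so use read.csv()
# with dec = "," to get the same decimal handling on a comma-separated file
local_df <- read.csv(csv_path, dec = ",", stringsAsFactors = FALSE)

# Inspect the resulting column types
str(local_df)
```

The column that mixes both decimal styles is the one that a plain dec = "," read cannot convert on its own.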

Tags: r, apache-spark, sparklyr

2 Answers

To manipulate strings inside a Spark DataFrame, you can use the regexp_replace function, as described here:

https://spark.rstudio.com/guides/textmining/

For your problem, it would work like this:

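# df must be the local data.frame, not the Spark table from spark_read_csv
tbl <- sdf_copy_to(sc = sc, x = df, overwrite = TRUE)

tbl0 <- tbl %>%
    # swap the decimal comma for a dot inside Spark
    mutate(DD = regexp_replace(DD, ",", "."),
           CC = regexp_replace(CC, ",", ".")) %>%
    # cast the cleaned strings to double
    mutate_at(vars(c("DD", "CC")), as.numeric)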

To check the result:

> glimpse(tbl0)
Observations: ??
Variables: 3
$ DNI <chr> "22-e", "EE-4", "55-W"
$ DD  <dbl> 33.20, 33.20, 14.55
$ CC  <dbl> 2.0, 44.4, 44.9

answered 20 hours ago by Antonis
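The answer above copies the local data.frame into Spark, but the same regexp_replace step should also work on the table returned by spark_read_csv(), so the data never has to pass through R. This is a minimal sketch under stated assumptions: a local Spark installation is available (for example via spark_install()), the sample data is written to a temporary file rather than the question's paths, and the table name "aff_raw" is hypothetical. infer_schema = FALSE asks Spark to read every column as a string, and as.numeric() is translated by dbplyr into a Spark SQL cast to double.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation exists (e.g. installed via spark_install())
sc <- spark_connect(master = "local")

# Write the question's sample data to a temporary CSV (assumed path)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(DNI = c("22-e", "EE-4", "55-W"),
                     DD  = c("33,2", "33.2", "14,55"),
                     CC  = c("2", "44,4", "44,9")),
          csv_path, row.names = FALSE)

# Read every column as a string; the quoted "33,2" values stay intact
raw_tbl <- spark_read_csv(sc, name = "aff_raw", path = csv_path,
                          header = TRUE, delimiter = ",",
                          infer_schema = FALSE)

# Replace the decimal comma with a dot inside Spark, then cast to double
clean_tbl <- raw_tbl %>%
  mutate(DD = as.numeric(regexp_replace(DD, ",", ".")),
         CC = as.numeric(regexp_replace(CC, ",", ".")))

# Only the cleaned result is brought back into R
result <- collect(clean_tbl)
print(result)

spark_disconnect(sc)
```

Values such as "33.2" that already use a dot contain no comma, so regexp_replace leaves them unchanged before the cast.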

You could replace the "," in the numbers with "." and then convert them to numeric. For instance:

df$DD <- as.numeric(gsub(pattern = ",", replacement = ".", x = df$DD))

Does that help?

answered Dec 28 at 10:32 by Rage

It is a good strategy, but the goal is to read the data without loading it into R, working with sparklyr directly. In R there is the read.csv2() function, which can read numeric data that uses a comma as the decimal separator.
– Tomas Tapia
yesterday
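Rage's one-liner converts a single column. A minimal base-R sketch, assuming the question's sample data built with stringsAsFactors = FALSE, applies the same replacement to several columns at once; the helper name comma_to_numeric is hypothetical. fixed = TRUE treats "," as a literal string rather than a regular expression. Like the original answer, this still runs on a local data.frame, so it does not keep the work inside Spark.

```r
# Sample data from the question, kept as character columns
df <- data.frame(DNI = c("22-e", "EE-4", "55-W"),
                 DD  = c("33,2", "33.2", "14,55"),
                 CC  = c("2", "44,4", "44,9"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: replace the decimal comma with a dot, then convert
comma_to_numeric <- function(x) {
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}

# Apply the helper to every column that holds comma-decimal numbers
num_cols <- c("DD", "CC")
df[num_cols] <- lapply(df[num_cols], comma_to_numeric)

str(df)
```

Because the replacement only touches commas, a value such as "33.2" that already uses a dot is converted unchanged.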
