Expand multiple columns of data.table containing observations












1















I have a data.table where more than 2 columns are of the type list. I would like to expand these columns, so that each element of the list becomes a new column. I would like to have a more elegant way than to "manually" expand each column and then join the tables together.



the setup



Edit: (providing the json from which i obtained the data.table)



So i have a json file like this:



[
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.352245",
"dest_lat": "49.981314",
"local_id": "AD.22045848",
"full_address": "Zderazská 98/3, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.226975",
"dest_lat": "50.051702",
"local_id": "AD.27261433",
"full_address": "Západní 458, 25303 Chýně"
}
],
"destination_addresses": [
"Zderazská 98/3, 153 00 Praha-Radotín, Czechia",
"Západní 458, 253 01 Chýně, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.6 km",
"value": 1620
},
"duration": {
"text": "5 mins",
"value": 272
},
"duration_in_traffic": {
"text": "5 mins",
"value": 277
},
"status": "OK"
},
{
"distance": {
"text": "19.3 km",
"value": 19313
},
"duration": {
"text": "22 mins",
"value": 1343
},
"duration_in_traffic": {
"text": "24 mins",
"value": 1424
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "0.7 km",
"value": 691
},
"duration": {
"text": "2 mins",
"value": 101
},
"duration_in_traffic": {
"text": "2 mins",
"value": 99
},
"status": "OK"
},
{
"distance": {
"text": "18.7 km",
"value": 18655
},
"duration": {
"text": "21 mins",
"value": 1246
},
"duration_in_traffic": {
"text": "22 mins",
"value": 1336
},
"status": "OK"
}
]
}
],
"status": "OK"
},
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.36053",
"dest_lat": "49.981687",
"local_id": "AD.22047131",
"full_address": "Zítkova 235/7, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.361052",
"dest_lat": "49.988529",
"local_id": "AD.22054952",
"full_address": "Strážovská 1053/33, Radotín, 15300 Praha 5"
}
],
"destination_addresses": [
"Zítkova 235/7, 153 00 Praha-Radotín, Czechia",
"Strážovská 1053/33, 153 00 Praha-Radotín, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1445
},
"duration": {
"text": "4 mins",
"value": 248
},
"duration_in_traffic": {
"text": "4 mins",
"value": 247
},
"status": "OK"
},
{
"distance": {
"text": "1.9 km",
"value": 1933
},
"duration": {
"text": "4 mins",
"value": 264
},
"duration_in_traffic": {
"text": "4 mins",
"value": 267
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1374
},
"duration": {
"text": "4 mins",
"value": 232
},
"duration_in_traffic": {
"text": "4 mins",
"value": 241
},
"status": "OK"
},
{
"distance": {
"text": "1.3 km",
"value": 1274
},
"duration": {
"text": "3 mins",
"value": 167
},
"duration_in_traffic": {
"text": "3 mins",
"value": 174
},
"status": "OK"
}
]
}
],
"status": "OK"
}
]


Which I read in like:



library(jsonlite)
library(data.table)
data <- read_json('./path_to_that_json/that_json.json')


This results in a list of length 2.



I can covert this into data.table like:



dt <- rbindlist(lapply(data, as.data.table))


Which then results in a data.table like:



   origins destinations                             destination_addresses                                 origin_addresses
1: <list> <list> Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
2: <list> <list> Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
3: <list> <list> Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
4: <list> <list> Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
rows status
1: <list> OK
2: <list> OK
3: <list> OK
4: <list> OK


This means I have several columns containing list and i would like to expand them.



what kinda works



I know that to expand just one column, I can do:



dt[, r = as.character(.I)]
res1 <- dt[, rbindlist(setNames(origins, r), id = "r")]


(I found that here: Expand list column of data.tables )



Now, i could expand multiple columns by repeating this call and joining the results using the column r. This could look like:



res1 <- dt[dt[, rbindlist(origins, id = "r")][
, `:=`(r=as.character(r))], on = "r"][, `:=`(origins = NULL, destinations = NULL)][dt[
, rbindlist(destinations, id = "r")][
, `:=`(r=as.character(r))], on = "r"]


Which would give me the desired output of:



                               destination_addresses                                 origin_addresses   rows status r
1: Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 1
2: Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 2
3: Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 3
4: Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 4
orig_lon orig_lat local_id full_address dest_lon dest_lat i.local_id
1: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848
2: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433
3: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131
4: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952
i.full_address
1: Zderazská 98/3, Radotín, 15300 Praha 5
2: Západní 458, 25303 Chýne
3: Zítkova 235/7, Radotín, 15300 Praha 5
4: Strážovská 1053/33, Radotín, 15300 Praha 5


My question is:



Is there a more elegant and more efficient way of expanding several columns? In theory, i would like to have a list of columns to be expanded and then make one call which would expand all of them and return the above result.



Also, with the column rows, the expanding is a bit more complicated: so far i am creating a new column of type list, which does not include the status record. Something like:



dt[, rows2 := lapply(rows, function(x) list("distance" = (x[[1]][[1]]["distance"]),
"duration" = (x[[1]][[1]]["duration"]),
"duration_in_traffic" = (x[[1]][[1]]["duration_in_traffic"])))]


And then the above procedure can be used to expand rows2 into three columns of type list, which can be subsequently expanded using the same procedure. Now, this approach sucks for the obvious reason as not being really straightforward for anyone who reads the code after me. Moreover, it takes a lot of typing. I think there must be way more elegant way of wrangling this.










share|improve this question

























  • Oh, in case anyone notices that the: destination_addresses and origin_addresses columns do not match exactly the full_address columns, that is not a problem in my data set: it's just that google api sometimes matches coordinates to different address than another api provided by the cadastral office

    – ira
    Dec 29 '18 at 11:29











  • The real question is how did you get nested lists into data table? From JSON? Describe the source of data.

    – Parfait
    Dec 29 '18 at 15:33











  • Yes, i read it in from JSON, which has been generated by a python script used to query Google distance matrix api based on a list of coordinates stored in another Json.

    – ira
    Dec 29 '18 at 16:51











  • @Parfait okay, i added a subset of the json i am using. That is the source data i am reading into R and then wrangling.

    – ira
    Dec 29 '18 at 18:15
















1















I have a data.table where more than 2 columns are of the type list. I would like to expand these columns, so that each element of the list becomes a new column. I would like to have a more elegant way than to "manually" expand each column and then join the tables together.



the setup



Edit: (providing the json from which i obtained the data.table)



So i have a json file like this:



[
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.352245",
"dest_lat": "49.981314",
"local_id": "AD.22045848",
"full_address": "Zderazská 98/3, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.226975",
"dest_lat": "50.051702",
"local_id": "AD.27261433",
"full_address": "Západní 458, 25303 Chýně"
}
],
"destination_addresses": [
"Zderazská 98/3, 153 00 Praha-Radotín, Czechia",
"Západní 458, 253 01 Chýně, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.6 km",
"value": 1620
},
"duration": {
"text": "5 mins",
"value": 272
},
"duration_in_traffic": {
"text": "5 mins",
"value": 277
},
"status": "OK"
},
{
"distance": {
"text": "19.3 km",
"value": 19313
},
"duration": {
"text": "22 mins",
"value": 1343
},
"duration_in_traffic": {
"text": "24 mins",
"value": 1424
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "0.7 km",
"value": 691
},
"duration": {
"text": "2 mins",
"value": 101
},
"duration_in_traffic": {
"text": "2 mins",
"value": 99
},
"status": "OK"
},
{
"distance": {
"text": "18.7 km",
"value": 18655
},
"duration": {
"text": "21 mins",
"value": 1246
},
"duration_in_traffic": {
"text": "22 mins",
"value": 1336
},
"status": "OK"
}
]
}
],
"status": "OK"
},
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.36053",
"dest_lat": "49.981687",
"local_id": "AD.22047131",
"full_address": "Zítkova 235/7, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.361052",
"dest_lat": "49.988529",
"local_id": "AD.22054952",
"full_address": "Strážovská 1053/33, Radotín, 15300 Praha 5"
}
],
"destination_addresses": [
"Zítkova 235/7, 153 00 Praha-Radotín, Czechia",
"Strážovská 1053/33, 153 00 Praha-Radotín, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1445
},
"duration": {
"text": "4 mins",
"value": 248
},
"duration_in_traffic": {
"text": "4 mins",
"value": 247
},
"status": "OK"
},
{
"distance": {
"text": "1.9 km",
"value": 1933
},
"duration": {
"text": "4 mins",
"value": 264
},
"duration_in_traffic": {
"text": "4 mins",
"value": 267
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1374
},
"duration": {
"text": "4 mins",
"value": 232
},
"duration_in_traffic": {
"text": "4 mins",
"value": 241
},
"status": "OK"
},
{
"distance": {
"text": "1.3 km",
"value": 1274
},
"duration": {
"text": "3 mins",
"value": 167
},
"duration_in_traffic": {
"text": "3 mins",
"value": 174
},
"status": "OK"
}
]
}
],
"status": "OK"
}
]


Which I read in like:



library(jsonlite)
library(data.table)
data <- read_json('./path_to_that_json/that_json.json')


This results in a list of length 2.



I can covert this into data.table like:



dt <- rbindlist(lapply(data, as.data.table))


Which then results in a data.table like:



   origins destinations                             destination_addresses                                 origin_addresses
1: <list> <list> Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
2: <list> <list> Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
3: <list> <list> Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
4: <list> <list> Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
rows status
1: <list> OK
2: <list> OK
3: <list> OK
4: <list> OK


This means I have several columns containing list and i would like to expand them.



what kinda works



I know that to expand just one column, I can do:



dt[, r = as.character(.I)]
res1 <- dt[, rbindlist(setNames(origins, r), id = "r")]


(I found that here: Expand list column of data.tables )



Now, i could expand multiple columns by repeating this call and joining the results using the column r. This could look like:



res1 <- dt[dt[, rbindlist(origins, id = "r")][
, `:=`(r=as.character(r))], on = "r"][, `:=`(origins = NULL, destinations = NULL)][dt[
, rbindlist(destinations, id = "r")][
, `:=`(r=as.character(r))], on = "r"]


Which would give me the desired output of:



                               destination_addresses                                 origin_addresses   rows status r
1: Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 1
2: Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 2
3: Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 3
4: Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 4
orig_lon orig_lat local_id full_address dest_lon dest_lat i.local_id
1: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848
2: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433
3: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131
4: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952
i.full_address
1: Zderazská 98/3, Radotín, 15300 Praha 5
2: Západní 458, 25303 Chýne
3: Zítkova 235/7, Radotín, 15300 Praha 5
4: Strážovská 1053/33, Radotín, 15300 Praha 5


My question is:



Is there a more elegant and more efficient way of expanding several columns? In theory, i would like to have a list of columns to be expanded and then make one call which would expand all of them and return the above result.



Also, with the column rows, the expanding is a bit more complicated: so far i am creating a new column of type list, which does not include the status record. Something like:



dt[, rows2 := lapply(rows, function(x) list("distance" = (x[[1]][[1]]["distance"]),
"duration" = (x[[1]][[1]]["duration"]),
"duration_in_traffic" = (x[[1]][[1]]["duration_in_traffic"])))]


And then the above procedure can be used to expand rows2 into three columns of type list, which can be subsequently expanded using the same procedure. Now, this approach sucks for the obvious reason as not being really straightforward for anyone who reads the code after me. Moreover, it takes a lot of typing. I think there must be way more elegant way of wrangling this.










share|improve this question

























  • Oh, in case anyone notices that the: destination_addresses and origin_addresses columns do not match exactly the full_address columns, that is not a problem in my data set: it's just that google api sometimes matches coordinates to different address than another api provided by the cadastral office

    – ira
    Dec 29 '18 at 11:29











  • The real question is how did you get nested lists into data table? From JSON? Describe the source of data.

    – Parfait
    Dec 29 '18 at 15:33











  • Yes, i read it in from JSON, which has been generated by a python script used to query Google distance matrix api based on a list of coordinates stored in another Json.

    – ira
    Dec 29 '18 at 16:51











  • @Parfait okay, i added a subset of the json i am using. That is the source data i am reading into R and then wrangling.

    – ira
    Dec 29 '18 at 18:15














1












1








1


1






I have a data.table where more than 2 columns are of the type list. I would like to expand these columns, so that each element of the list becomes a new column. I would like to have a more elegant way than to "manually" expand each column and then join the tables together.



the setup



Edit: (providing the json from which i obtained the data.table)



So i have a json file like this:



[
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.352245",
"dest_lat": "49.981314",
"local_id": "AD.22045848",
"full_address": "Zderazská 98/3, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.226975",
"dest_lat": "50.051702",
"local_id": "AD.27261433",
"full_address": "Západní 458, 25303 Chýně"
}
],
"destination_addresses": [
"Zderazská 98/3, 153 00 Praha-Radotín, Czechia",
"Západní 458, 253 01 Chýně, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.6 km",
"value": 1620
},
"duration": {
"text": "5 mins",
"value": 272
},
"duration_in_traffic": {
"text": "5 mins",
"value": 277
},
"status": "OK"
},
{
"distance": {
"text": "19.3 km",
"value": 19313
},
"duration": {
"text": "22 mins",
"value": 1343
},
"duration_in_traffic": {
"text": "24 mins",
"value": 1424
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "0.7 km",
"value": 691
},
"duration": {
"text": "2 mins",
"value": 101
},
"duration_in_traffic": {
"text": "2 mins",
"value": 99
},
"status": "OK"
},
{
"distance": {
"text": "18.7 km",
"value": 18655
},
"duration": {
"text": "21 mins",
"value": 1246
},
"duration_in_traffic": {
"text": "22 mins",
"value": 1336
},
"status": "OK"
}
]
}
],
"status": "OK"
},
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.36053",
"dest_lat": "49.981687",
"local_id": "AD.22047131",
"full_address": "Zítkova 235/7, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.361052",
"dest_lat": "49.988529",
"local_id": "AD.22054952",
"full_address": "Strážovská 1053/33, Radotín, 15300 Praha 5"
}
],
"destination_addresses": [
"Zítkova 235/7, 153 00 Praha-Radotín, Czechia",
"Strážovská 1053/33, 153 00 Praha-Radotín, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1445
},
"duration": {
"text": "4 mins",
"value": 248
},
"duration_in_traffic": {
"text": "4 mins",
"value": 247
},
"status": "OK"
},
{
"distance": {
"text": "1.9 km",
"value": 1933
},
"duration": {
"text": "4 mins",
"value": 264
},
"duration_in_traffic": {
"text": "4 mins",
"value": 267
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1374
},
"duration": {
"text": "4 mins",
"value": 232
},
"duration_in_traffic": {
"text": "4 mins",
"value": 241
},
"status": "OK"
},
{
"distance": {
"text": "1.3 km",
"value": 1274
},
"duration": {
"text": "3 mins",
"value": 167
},
"duration_in_traffic": {
"text": "3 mins",
"value": 174
},
"status": "OK"
}
]
}
],
"status": "OK"
}
]


Which I read in like:



library(jsonlite)
library(data.table)
data <- read_json('./path_to_that_json/that_json.json')


This results in a list of length 2.



I can covert this into data.table like:



dt <- rbindlist(lapply(data, as.data.table))


Which then results in a data.table like:



   origins destinations                             destination_addresses                                 origin_addresses
1: <list> <list> Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
2: <list> <list> Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
3: <list> <list> Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
4: <list> <list> Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
rows status
1: <list> OK
2: <list> OK
3: <list> OK
4: <list> OK


This means I have several columns containing list and i would like to expand them.



what kinda works



I know that to expand just one column, I can do:



dt[, r = as.character(.I)]
res1 <- dt[, rbindlist(setNames(origins, r), id = "r")]


(I found that here: Expand list column of data.tables )



Now, i could expand multiple columns by repeating this call and joining the results using the column r. This could look like:



res1 <- dt[dt[, rbindlist(origins, id = "r")][
, `:=`(r=as.character(r))], on = "r"][, `:=`(origins = NULL, destinations = NULL)][dt[
, rbindlist(destinations, id = "r")][
, `:=`(r=as.character(r))], on = "r"]


Which would give me the desired output of:



                               destination_addresses                                 origin_addresses   rows status r
1: Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 1
2: Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 2
3: Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 3
4: Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 4
orig_lon orig_lat local_id full_address dest_lon dest_lat i.local_id
1: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848
2: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433
3: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131
4: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952
i.full_address
1: Zderazská 98/3, Radotín, 15300 Praha 5
2: Západní 458, 25303 Chýne
3: Zítkova 235/7, Radotín, 15300 Praha 5
4: Strážovská 1053/33, Radotín, 15300 Praha 5


My question is:



Is there a more elegant and more efficient way of expanding several columns? In theory, i would like to have a list of columns to be expanded and then make one call which would expand all of them and return the above result.



Also, with the column rows, the expanding is a bit more complicated: so far i am creating a new column of type list, which does not include the status record. Something like:



dt[, rows2 := lapply(rows, function(x) list("distance" = (x[[1]][[1]]["distance"]),
"duration" = (x[[1]][[1]]["duration"]),
"duration_in_traffic" = (x[[1]][[1]]["duration_in_traffic"])))]


And then the above procedure can be used to expand rows2 into three columns of type list, which can be subsequently expanded using the same procedure. Now, this approach sucks for the obvious reason as not being really straightforward for anyone who reads the code after me. Moreover, it takes a lot of typing. I think there must be way more elegant way of wrangling this.










share|improve this question
















I have a data.table where more than 2 columns are of the type list. I would like to expand these columns, so that each element of the list becomes a new column. I would like to have a more elegant way than to "manually" expand each column and then join the tables together.



the setup



Edit: (providing the json from which i obtained the data.table)



So i have a json file like this:



[
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.352245",
"dest_lat": "49.981314",
"local_id": "AD.22045848",
"full_address": "Zderazská 98/3, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.226975",
"dest_lat": "50.051702",
"local_id": "AD.27261433",
"full_address": "Západní 458, 25303 Chýně"
}
],
"destination_addresses": [
"Zderazská 98/3, 153 00 Praha-Radotín, Czechia",
"Západní 458, 253 01 Chýně, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.6 km",
"value": 1620
},
"duration": {
"text": "5 mins",
"value": 272
},
"duration_in_traffic": {
"text": "5 mins",
"value": 277
},
"status": "OK"
},
{
"distance": {
"text": "19.3 km",
"value": 19313
},
"duration": {
"text": "22 mins",
"value": 1343
},
"duration_in_traffic": {
"text": "24 mins",
"value": 1424
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "0.7 km",
"value": 691
},
"duration": {
"text": "2 mins",
"value": 101
},
"duration_in_traffic": {
"text": "2 mins",
"value": 99
},
"status": "OK"
},
{
"distance": {
"text": "18.7 km",
"value": 18655
},
"duration": {
"text": "21 mins",
"value": 1246
},
"duration_in_traffic": {
"text": "22 mins",
"value": 1336
},
"status": "OK"
}
]
}
],
"status": "OK"
},
{
"origins": [
{
"orig_lon": "14.36784",
"orig_lat": "49.985982",
"local_id": "AD.22045279",
"full_address": "Věštínská 36/9, Radotín, 15300 Praha 5"
},
{
"orig_lon": "14.352792",
"orig_lat": "49.983317",
"local_id": "AD.22055428",
"full_address": "Otínská 1102/37, Radotín, 15300 Praha 5"
}
],
"destinations": [
{
"dest_lon": "14.36053",
"dest_lat": "49.981687",
"local_id": "AD.22047131",
"full_address": "Zítkova 235/7, Radotín, 15300 Praha 5"
},
{
"dest_lon": "14.361052",
"dest_lat": "49.988529",
"local_id": "AD.22054952",
"full_address": "Strážovská 1053/33, Radotín, 15300 Praha 5"
}
],
"destination_addresses": [
"Zítkova 235/7, 153 00 Praha-Radotín, Czechia",
"Strážovská 1053/33, 153 00 Praha-Radotín, Czechia"
],
"origin_addresses": [
"U Jankovky 455/18, 153 00 Praha-Radotín, Czechia",
"Otínská 1102/37, 153 00 Praha-Radotín, Czechia"
],
"rows": [
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1445
},
"duration": {
"text": "4 mins",
"value": 248
},
"duration_in_traffic": {
"text": "4 mins",
"value": 247
},
"status": "OK"
},
{
"distance": {
"text": "1.9 km",
"value": 1933
},
"duration": {
"text": "4 mins",
"value": 264
},
"duration_in_traffic": {
"text": "4 mins",
"value": 267
},
"status": "OK"
}
]
},
{
"elements": [
{
"distance": {
"text": "1.4 km",
"value": 1374
},
"duration": {
"text": "4 mins",
"value": 232
},
"duration_in_traffic": {
"text": "4 mins",
"value": 241
},
"status": "OK"
},
{
"distance": {
"text": "1.3 km",
"value": 1274
},
"duration": {
"text": "3 mins",
"value": 167
},
"duration_in_traffic": {
"text": "3 mins",
"value": 174
},
"status": "OK"
}
]
}
],
"status": "OK"
}
]


Which I read in like:



library(jsonlite)
library(data.table)
data <- read_json('./path_to_that_json/that_json.json')


This results in a list of length 2.



I can covert this into data.table like:



dt <- rbindlist(lapply(data, as.data.table))


Which then results in a data.table like:



   origins destinations                             destination_addresses                                 origin_addresses
1: <list> <list> Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
2: <list> <list> Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
3: <list> <list> Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia
4: <list> <list> Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia
rows status
1: <list> OK
2: <list> OK
3: <list> OK
4: <list> OK


This means I have several columns containing list and i would like to expand them.



what kinda works



I know that to expand just one column, I can do:



dt[, r = as.character(.I)]
res1 <- dt[, rbindlist(setNames(origins, r), id = "r")]


(I found that here: Expand list column of data.tables )



Now, i could expand multiple columns by repeating this call and joining the results using the column r. This could look like:



res1 <- dt[dt[, rbindlist(origins, id = "r")][
, `:=`(r=as.character(r))], on = "r"][, `:=`(origins = NULL, destinations = NULL)][dt[
, rbindlist(destinations, id = "r")][
, `:=`(r=as.character(r))], on = "r"]


Which would give me the desired output of:



                               destination_addresses                                 origin_addresses   rows status r
1: Zderazská 98/3, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 1
2: Západní 458, 253 01 Chýne, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 2
3: Zítkova 235/7, 153 00 Praha-Radotín, Czechia U Jankovky 455/18, 153 00 Praha-Radotín, Czechia <list> OK 3
4: Strážovská 1053/33, 153 00 Praha-Radotín, Czechia Otínská 1102/37, 153 00 Praha-Radotín, Czechia <list> OK 4
orig_lon orig_lat local_id full_address dest_lon dest_lat i.local_id
1: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848
2: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433
3: 14.36784 49.985982 AD.22045279 Veštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131
4: 14.352792 49.983317 AD.22055428 Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952
i.full_address
1: Zderazská 98/3, Radotín, 15300 Praha 5
2: Západní 458, 25303 Chýne
3: Zítkova 235/7, Radotín, 15300 Praha 5
4: Strážovská 1053/33, Radotín, 15300 Praha 5


My question is:



Is there a more elegant and more efficient way of expanding several columns? In theory, i would like to have a list of columns to be expanded and then make one call which would expand all of them and return the above result.



Also, with the column rows, the expanding is a bit more complicated: so far i am creating a new column of type list, which does not include the status record. Something like:



dt[, rows2 := lapply(rows, function(x) list("distance" = (x[[1]][[1]]["distance"]),
"duration" = (x[[1]][[1]]["duration"]),
"duration_in_traffic" = (x[[1]][[1]]["duration_in_traffic"])))]


And then the above procedure can be used to expand rows2 into three columns of type list, which can be subsequently expanded using the same procedure. Now, this approach sucks for the obvious reason as not being really straightforward for anyone who reads the code after me. Moreover, it takes a lot of typing. I think there must be way more elegant way of wrangling this.







r data.table rbindlist






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 29 '18 at 18:22







ira

















asked Dec 29 '18 at 11:11









iraira

1,037921




1,037921













  • Oh, in case anyone notices that the: destination_addresses and origin_addresses columns do not match exactly the full_address columns, that is not a problem in my data set: it's just that google api sometimes matches coordinates to different address than another api provided by the cadastral office

    – ira
    Dec 29 '18 at 11:29











  • The real question is how did you get nested lists into data table? From JSON? Describe the source of data.

    – Parfait
    Dec 29 '18 at 15:33











  • Yes, i read it in from JSON, which has been generated by a python script used to query Google distance matrix api based on a list of coordinates stored in another Json.

    – ira
    Dec 29 '18 at 16:51











  • @Parfait okay, i added a subset of the json i am using. That is the source data i am reading into R and then wrangling.

    – ira
    Dec 29 '18 at 18:15



















  • Oh, in case anyone notices that the: destination_addresses and origin_addresses columns do not match exactly the full_address columns, that is not a problem in my data set: it's just that google api sometimes matches coordinates to different address than another api provided by the cadastral office

    – ira
    Dec 29 '18 at 11:29











  • The real question is how did you get nested lists into data table? From JSON? Describe the source of data.

    – Parfait
    Dec 29 '18 at 15:33











  • Yes, i read it in from JSON, which has been generated by a python script used to query Google distance matrix api based on a list of coordinates stored in another Json.

    – ira
    Dec 29 '18 at 16:51











  • @Parfait okay, i added a subset of the json i am using. That is the source data i am reading into R and then wrangling.

    – ira
    Dec 29 '18 at 18:15

















Oh, in case anyone notices that the: destination_addresses and origin_addresses columns do not match exactly the full_address columns, that is not a problem in my data set: it's just that google api sometimes matches coordinates to different address than another api provided by the cadastral office

– ira
Dec 29 '18 at 11:29





Oh, in case anyone notices that the: destination_addresses and origin_addresses columns do not match exactly the full_address columns, that is not a problem in my data set: it's just that google api sometimes matches coordinates to different address than another api provided by the cadastral office

– ira
Dec 29 '18 at 11:29













The real question is how did you get nested lists into data table? From JSON? Describe the source of data.

– Parfait
Dec 29 '18 at 15:33





The real question is how did you get nested lists into data table? From JSON? Describe the source of data.

– Parfait
Dec 29 '18 at 15:33













Yes, i read it in from JSON, which has been generated by a python script used to query Google distance matrix api based on a list of coordinates stored in another Json.

– ira
Dec 29 '18 at 16:51





Yes, i read it in from JSON, which has been generated by a python script used to query Google distance matrix api based on a list of coordinates stored in another Json.

– ira
Dec 29 '18 at 16:51













@Parfait okay, i added a subset of the json i am using. That is the source data i am reading into R and then wrangling.

– ira
Dec 29 '18 at 18:15





@Parfait okay, i added a subset of the json i am using. That is the source data i am reading into R and then wrangling.

– ira
Dec 29 '18 at 18:15












2 Answers
2






active

oldest

votes


















1














So one way to think about the problem is to process the list columns using an lapply to expand each separately and store into a list of data.tables and then merge all of those in the list at once.



To create the list of expanded variables you would just do the following:



    expandcols<-c("origins","destinations")

lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))


Also note that your original r column is a character vector and the idcol created by rbindlist is an integer so you will need consistency here. In my code I just converted your original to numeric.



To merge a list of data.tables I like to use the Reduce function like this:



     Reduce(function(...) merge(...,by="keys"), list())


The output will be one data.table where your key column is "r" and the list will be the result of the lapply call above. You can then merge the result with your original dataframe the data.table way. Putting it altogether the call would look like this:



    dtfinal<-Reduce(function(...) merge(...,by="r"),lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))[dt[,-expandcols,with=F],on="r"]


Here is the code for the function I made:



    list_expander_fn<-function(X){
'%notin%'<-Negate('%in%')##Helpful for selecting column names later
expandcols_fun<-function(Y){##Main function to be called recursively as needed and takes in a data.table object as its only argument.
listcols<-colnames(Y)[which(sapply(Y,is.list))] #Identify list columns
listdt<-lapply(listcols,function(i) tryCatch(rbindlist(Y[[i]],idcol = "r"),error=function(e) NULL)) #Expand lists using rbindlist and returns null on error.

invalidlists<-which(sapply(listdt,is.null)) #Rbindlist does not work unless list elements contain data.tables

##Simply unlists if character vector is created like in destination and origin addresses columns
if(length(invalidlists)!=0){
Y[,listcols[invalidlists]:=lapply(.SD,unlist),.SDcols = listcols[invalidlists]]

listcols<-listcols[-invalidlists] ##Update list columns to be merged
listdt<-listdt[-invalidlists]##Removes NULL elements from the listdt.
}

origcols<-colnames(Y)[colnames(Y)%notin%listcols]##Identifies nonlist columns for final merge
currentdt<-Reduce(function(...) merge(...,by="r"),listdt) ##merges list of data.tables
return(currentdt[Y[,origcols,with=F],on="r"])
}

repeat{
currentexpand<-expandcols_fun(X) #Executes the expandcols_fun
listcheck<-sapply(currentexpand,is.list) #Checks again if lists still exist
if(sum(listcheck)!=0){
X<-currentexpand #Updates the X for recursive calls

} else{
break
}
}

return(currentexpand)
}


It works but there are issues with variable names because of the final field names (text and value). I could probably tinker with that a bit if you like where this is going. It works on 'rows2' but not 'rows'. The code to call it will be of course simple:



    finaldt<-list_expander_fn(dt)


Does that help answer your question? Let me know if you want me to add anything to the explanation. Good luck!






share|improve this answer


























  • Thank you very much, this is certainly an improvement. Do you know also how to expand the rows column?

    – ira
    Dec 29 '18 at 21:48













  • Yes I am working on that too with the basic idea of applying that above code recursively until there are no more list columns in the data.table. Not sure how to do it without writing a function but I will post what I have here shortly which may work or at least generate some ideas.

    – Jason Johnson
    Dec 29 '18 at 21:54



















1














Rather than wrangling in a data.table, consider building a data table from the json data object which often imports as a heavily nested list of data frames or other lists. As a result, you need to migrate according to paths of the different level items:



library(jsonlite)
library(data.table)

json_data <- read_json('/path/to/posted.json')

df_list <- lapply(json_data, function(item)
data.frame(origin_address = unlist(item$origin_addresses), # TOP LEVEL
destination_address = unlist(item$destination_addresses), # TOP LEVEL
do.call(rbind, lapply(item$origins, data.frame)), # NESTED LEVEL
do.call(rbind, lapply(item$destinations, data.frame))) # NESTED LEVEL
)

final_df <- do.call(rbind, df_list) # SINGLE DATA FRAME
final_dt <- rbindlist(df_list) # SINGLE DATA TABLE


Output (be sure to rename full_address and local_id fields as origin_ or destination_)



final_dt

# origin_address destination_address orig_lon orig_lat local_id
# 1: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zderazská 98/3, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
# 2: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Západní 458, 253 01 Chýně, Czechia 14.352792 49.983317 AD.22055428
# 3: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zítkova 235/7, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
# 4: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Strážovská 1053/33, 153 00 Praha-Radotín, Czechia 14.352792 49.983317 AD.22055428
# full_address dest_lon dest_lat local_id.1 full_address.1
# 1: Věštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848 Zderazská 98/3, Radotín, 15300 Praha 5
# 2: Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433 Západní 458, 25303 Chýně
# 3: Věštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131 Zítkova 235/7, Radotín, 15300 Praha 5
# 4: Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952 Strážovská 1053/33, Radotín, 15300 Praha 5





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53969000%2fexpand-multiple-columns-of-data-table-containing-list-observations%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    So one way to think about the problem is to process the list columns using an lapply to expand each separately and store into a list of data.tables and then merge all of those in the list at once.



    To create the list of expanded variables you would just do the following:



        expandcols<-c("origins","destinations")

    lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))


    Also note that your original r column is a character vector and the idcol created by rbindlist is an integer so you will need consistency here. In my code I just converted your original to numeric.



    To merge a list of data.tables I like to use the Reduce function like this:



         Reduce(function(...) merge(...,by="keys"), list())


    The output will be one data.table where your key column is "r" and the list will be the result of the lapply call above. You can then merge the result with your original dataframe the data.table way. Putting it altogether the call would look like this:



        dtfinal<-Reduce(function(...) merge(...,by="r"),lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))[dt[,-expandcols,with=F],on="r"]


    Here is the code for the function I made:



        list_expander_fn<-function(X){
    '%notin%'<-Negate('%in%')##Helpful for selecting column names later
    expandcols_fun<-function(Y){##Main function to be called recursively as needed and takes in a data.table object as its only argument.
    listcols<-colnames(Y)[which(sapply(Y,is.list))] #Identify list columns
    listdt<-lapply(listcols,function(i) tryCatch(rbindlist(Y[[i]],idcol = "r"),error=function(e) NULL)) #Expand lists using rbindlist and returns null on error.

    invalidlists<-which(sapply(listdt,is.null)) #Rbindlist does not work unless list elements contain data.tables

    ##Simply unlists if character vector is created like in destination and origin addresses columns
    if(length(invalidlists)!=0){
    Y[,listcols[invalidlists]:=lapply(.SD,unlist),.SDcols = listcols[invalidlists]]

    listcols<-listcols[-invalidlists] ##Update list columns to be merged
    listdt<-listdt[-invalidlists]##Removes NULL elements from the listdt.
    }

    origcols<-colnames(Y)[colnames(Y)%notin%listcols]##Identifies nonlist columns for final merge
    currentdt<-Reduce(function(...) merge(...,by="r"),listdt) ##merges list of data.tables
    return(currentdt[Y[,origcols,with=F],on="r"])
    }

    repeat{
    currentexpand<-expandcols_fun(X) #Executes the expandcols_fun
    listcheck<-sapply(currentexpand,is.list) #Checks again if lists still exist
    if(sum(listcheck)!=0){
    X<-currentexpand #Updates the X for recursive calls

    } else{
    break
    }
    }

    return(currentexpand)
    }


    It works but there are issues with variable names because of the final field names (text and value). I could probably tinker with that a bit if you like where this is going. It works on 'rows2' but not 'rows'. The code to call it will be of course simple:



        finaldt<-list_expander_fn(dt)


    Does that help answer your question? Let me know if you want me to add anything to the explanation. Good luck!






    share|improve this answer


























    • Thank you very much, this is certainly an improvement. Do you know also how to expand the rows column?

      – ira
      Dec 29 '18 at 21:48













    • Yes I am working on that too with the basic idea of applying that above code recursively until there are no more list columns in the data.table. Not sure how to do it without writing a function but I will post what I have here shortly which may work or at least generate some ideas.

      – Jason Johnson
      Dec 29 '18 at 21:54
















    1














    So one way to think about the problem is to process the list columns using an lapply to expand each separately and store into a list of data.tables and then merge all of those in the list at once.



    To create the list of expanded variables you would just do the following:



        expandcols<-c("origins","destinations")

    lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))


    Also note that your original r column is a character vector and the idcol created by rbindlist is an integer so you will need consistency here. In my code I just converted your original to numeric.



    To merge a list of data.tables I like to use the Reduce function like this:



         Reduce(function(...) merge(...,by="keys"), list())


    The output will be one data.table where your key column is "r" and the list will be the result of the lapply call above. You can then merge the result with your original dataframe the data.table way. Putting it altogether the call would look like this:



        dtfinal<-Reduce(function(...) merge(...,by="r"),lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))[dt[,-expandcols,with=F],on="r"]


    Here is the code for the function I made:



        list_expander_fn<-function(X){
    '%notin%'<-Negate('%in%')##Helpful for selecting column names later
    expandcols_fun<-function(Y){##Main function to be called recursively as needed and takes in a data.table object as its only argument.
    listcols<-colnames(Y)[which(sapply(Y,is.list))] #Identify list columns
    listdt<-lapply(listcols,function(i) tryCatch(rbindlist(Y[[i]],idcol = "r"),error=function(e) NULL)) #Expand lists using rbindlist and returns null on error.

    invalidlists<-which(sapply(listdt,is.null)) #Rbindlist does not work unless list elements contain data.tables

    ##Simply unlists if character vector is created like in destination and origin addresses columns
    if(length(invalidlists)!=0){
    Y[,listcols[invalidlists]:=lapply(.SD,unlist),.SDcols = listcols[invalidlists]]

    listcols<-listcols[-invalidlists] ##Update list columns to be merged
    listdt<-listdt[-invalidlists]##Removes NULL elements from the listdt.
    }

    origcols<-colnames(Y)[colnames(Y)%notin%listcols]##Identifies nonlist columns for final merge
    currentdt<-Reduce(function(...) merge(...,by="r"),listdt) ##merges list of data.tables
    return(currentdt[Y[,origcols,with=F],on="r"])
    }

    repeat{
    currentexpand<-expandcols_fun(X) #Executes the expandcols_fun
    listcheck<-sapply(currentexpand,is.list) #Checks again if lists still exist
    if(sum(listcheck)!=0){
    X<-currentexpand #Updates the X for recursive calls

    } else{
    break
    }
    }

    return(currentexpand)
    }


    It works but there are issues with variable names because of the final field names (text and value). I could probably tinker with that a bit if you like where this is going. It works on 'rows2' but not 'rows'. The code to call it will be of course simple:



        finaldt<-list_expander_fn(dt)


    Does that help answer your question? Let me know if you want me to add anything to the explanation. Good luck!






    share|improve this answer


























    • Thank you very much, this is certainly an improvement. Do you know also how to expand the rows column?

      – ira
      Dec 29 '18 at 21:48













    • Yes I am working on that too with the basic idea of applying that above code recursively until there are no more list columns in the data.table. Not sure how to do it without writing a function but I will post what I have here shortly which may work or at least generate some ideas.

      – Jason Johnson
      Dec 29 '18 at 21:54














    1












    1








    1







    So one way to think about the problem is to process the list columns using an lapply to expand each separately and store into a list of data.tables and then merge all of those in the list at once.



    To create the list of expanded variables you would just do the following:



        expandcols<-c("origins","destinations")

    lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))


    Also note that your original r column is a character vector and the idcol created by rbindlist is an integer so you will need consistency here. In my code I just converted your original to numeric.



    To merge a list of data.tables I like to use the Reduce function like this:



         Reduce(function(...) merge(...,by="keys"), list())


    The output will be one data.table where your key column is "r" and the list will be the result of the lapply call above. You can then merge the result with your original dataframe the data.table way. Putting it altogether the call would look like this:



        dtfinal<-Reduce(function(...) merge(...,by="r"),lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))[dt[,-expandcols,with=F],on="r"]


    Here is the code for the function I made:



        list_expander_fn<-function(X){
    '%notin%'<-Negate('%in%')##Helpful for selecting column names later
    expandcols_fun<-function(Y){##Main function to be called recursively as needed and takes in a data.table object as its only argument.
    listcols<-colnames(Y)[which(sapply(Y,is.list))] #Identify list columns
    listdt<-lapply(listcols,function(i) tryCatch(rbindlist(Y[[i]],idcol = "r"),error=function(e) NULL)) #Expand lists using rbindlist and returns null on error.

    invalidlists<-which(sapply(listdt,is.null)) #Rbindlist does not work unless list elements contain data.tables

    ##Simply unlists if character vector is created like in destination and origin addresses columns
    if(length(invalidlists)!=0){
    Y[,listcols[invalidlists]:=lapply(.SD,unlist),.SDcols = listcols[invalidlists]]

    listcols<-listcols[-invalidlists] ##Update list columns to be merged
    listdt<-listdt[-invalidlists]##Removes NULL elements from the listdt.
    }

    origcols<-colnames(Y)[colnames(Y)%notin%listcols]##Identifies nonlist columns for final merge
    currentdt<-Reduce(function(...) merge(...,by="r"),listdt) ##merges list of data.tables
    return(currentdt[Y[,origcols,with=F],on="r"])
    }

    repeat{
    currentexpand<-expandcols_fun(X) #Executes the expandcols_fun
    listcheck<-sapply(currentexpand,is.list) #Checks again if lists still exist
    if(sum(listcheck)!=0){
    X<-currentexpand #Updates the X for recursive calls

    } else{
    break
    }
    }

    return(currentexpand)
    }


    It works but there are issues with variable names because of the final field names (text and value). I could probably tinker with that a bit if you like where this is going. It works on 'rows2' but not 'rows'. The code to call it will be of course simple:



        finaldt<-list_expander_fn(dt)


    Does that help answer your question? Let me know if you want me to add anything to the explanation. Good luck!






    share|improve this answer















    So one way to think about the problem is to process the list columns using an lapply to expand each separately and store into a list of data.tables and then merge all of those in the list at once.



    To create the list of expanded variables you would just do the following:



        expandcols<-c("origins","destinations")

    lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))


    Also note that your original r column is a character vector and the idcol created by rbindlist is an integer so you will need consistency here. In my code I just converted your original to numeric.



    To merge a list of data.tables I like to use the Reduce function like this:



         Reduce(function(...) merge(...,by="keys"), list())


    The output will be one data.table where your key column is "r" and the list will be the result of the lapply call above. You can then merge the result with your original dataframe the data.table way. Putting it altogether the call would look like this:



        dtfinal<-Reduce(function(...) merge(...,by="r"),lapply(expandcols,function(i) rbindlist(dt[[i]],idcol = "r")))[dt[,-expandcols,with=F],on="r"]


    Here is the code for the function I made:



        list_expander_fn<-function(X){
    '%notin%'<-Negate('%in%')##Helpful for selecting column names later
    expandcols_fun<-function(Y){##Main function to be called recursively as needed and takes in a data.table object as its only argument.
    listcols<-colnames(Y)[which(sapply(Y,is.list))] #Identify list columns
    listdt<-lapply(listcols,function(i) tryCatch(rbindlist(Y[[i]],idcol = "r"),error=function(e) NULL)) #Expand lists using rbindlist and returns null on error.

    invalidlists<-which(sapply(listdt,is.null)) #Rbindlist does not work unless list elements contain data.tables

    ##Simply unlists if character vector is created like in destination and origin addresses columns
    if(length(invalidlists)!=0){
    Y[,listcols[invalidlists]:=lapply(.SD,unlist),.SDcols = listcols[invalidlists]]

    listcols<-listcols[-invalidlists] ##Update list columns to be merged
    listdt<-listdt[-invalidlists]##Removes NULL elements from the listdt.
    }

    origcols<-colnames(Y)[colnames(Y)%notin%listcols]##Identifies nonlist columns for final merge
    currentdt<-Reduce(function(...) merge(...,by="r"),listdt) ##merges list of data.tables
    return(currentdt[Y[,origcols,with=F],on="r"])
    }

    repeat{
    currentexpand<-expandcols_fun(X) #Executes the expandcols_fun
    listcheck<-sapply(currentexpand,is.list) #Checks again if lists still exist
    if(sum(listcheck)!=0){
    X<-currentexpand #Updates the X for recursive calls

    } else{
    break
    }
    }

    return(currentexpand)
    }


    It works but there are issues with variable names because of the final field names (text and value). I could probably tinker with that a bit if you like where this is going. It works on 'rows2' but not 'rows'. The code to call it will be of course simple:



        finaldt<-list_expander_fn(dt)


    Does that help answer your question? Let me know if you want me to add anything to the explanation. Good luck!







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 30 '18 at 0:15

























    answered Dec 29 '18 at 19:23









    Jason JohnsonJason Johnson

    965




    965













    • Thank you very much, this is certainly an improvement. Do you know also how to expand the rows column?

      – ira
      Dec 29 '18 at 21:48













    • Yes I am working on that too with the basic idea of applying that above code recursively until there are no more list columns in the data.table. Not sure how to do it without writing a function but I will post what I have here shortly which may work or at least generate some ideas.

      – Jason Johnson
      Dec 29 '18 at 21:54



















    • Thank you very much, this is certainly an improvement. Do you know also how to expand the rows column?

      – ira
      Dec 29 '18 at 21:48













    • Yes I am working on that too with the basic idea of applying that above code recursively until there are no more list columns in the data.table. Not sure how to do it without writing a function but I will post what I have here shortly which may work or at least generate some ideas.

      – Jason Johnson
      Dec 29 '18 at 21:54

















    Thank you very much, this is certainly an improvement. Do you know also how to expand the rows column?

    – ira
    Dec 29 '18 at 21:48







    Thank you very much, this is certainly an improvement. Do you know also how to expand the rows column?

    – ira
    Dec 29 '18 at 21:48















    Yes I am working on that too with the basic idea of applying that above code recursively until there are no more list columns in the data.table. Not sure how to do it without writing a function but I will post what I have here shortly which may work or at least generate some ideas.

    – Jason Johnson
    Dec 29 '18 at 21:54





    Yes I am working on that too with the basic idea of applying that above code recursively until there are no more list columns in the data.table. Not sure how to do it without writing a function but I will post what I have here shortly which may work or at least generate some ideas.

    – Jason Johnson
    Dec 29 '18 at 21:54













    1














    Rather than wrangling in a data.table, consider building a data table from the json data object which often imports as a heavily nested list of data frames or other lists. As a result, you need to migrate according to paths of the different level items:



    library(jsonlite)
    library(data.table)

    json_data <- read_json('/path/to/posted.json')

    df_list <- lapply(json_data, function(item)
    data.frame(origin_address = unlist(item$origin_addresses), # TOP LEVEL
    destination_address = unlist(item$destination_addresses), # TOP LEVEL
    do.call(rbind, lapply(item$origins, data.frame)), # NESTED LEVEL
    do.call(rbind, lapply(item$destinations, data.frame))) # NESTED LEVEL
    )

    final_df <- do.call(rbind, df_list) # SINGLE DATA FRAME
    final_dt <- rbindlist(df_list) # SINGLE DATA TABLE


    Output (be sure to rename full_address and local_id fields as origin_ or destination_)



    final_dt

    # origin_address destination_address orig_lon orig_lat local_id
    # 1: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zderazská 98/3, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
    # 2: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Západní 458, 253 01 Chýně, Czechia 14.352792 49.983317 AD.22055428
    # 3: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zítkova 235/7, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
    # 4: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Strážovská 1053/33, 153 00 Praha-Radotín, Czechia 14.352792 49.983317 AD.22055428
    # full_address dest_lon dest_lat local_id.1 full_address.1
    # 1: Věštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848 Zderazská 98/3, Radotín, 15300 Praha 5
    # 2: Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433 Západní 458, 25303 Chýně
    # 3: Věštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131 Zítkova 235/7, Radotín, 15300 Praha 5
    # 4: Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952 Strážovská 1053/33, Radotín, 15300 Praha 5





    share|improve this answer




























      1














      Rather than wrangling in a data.table, consider building a data table from the json data object which often imports as a heavily nested list of data frames or other lists. As a result, you need to migrate according to paths of the different level items:



      library(jsonlite)
      library(data.table)

      json_data <- read_json('/path/to/posted.json')

      df_list <- lapply(json_data, function(item)
      data.frame(origin_address = unlist(item$origin_addresses), # TOP LEVEL
      destination_address = unlist(item$destination_addresses), # TOP LEVEL
      do.call(rbind, lapply(item$origins, data.frame)), # NESTED LEVEL
      do.call(rbind, lapply(item$destinations, data.frame))) # NESTED LEVEL
      )

      final_df <- do.call(rbind, df_list) # SINGLE DATA FRAME
      final_dt <- rbindlist(df_list) # SINGLE DATA TABLE


      Output (be sure to rename full_address and local_id fields as origin_ or destination_)



      final_dt

      # origin_address destination_address orig_lon orig_lat local_id
      # 1: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zderazská 98/3, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
      # 2: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Západní 458, 253 01 Chýně, Czechia 14.352792 49.983317 AD.22055428
      # 3: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zítkova 235/7, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
      # 4: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Strážovská 1053/33, 153 00 Praha-Radotín, Czechia 14.352792 49.983317 AD.22055428
      # full_address dest_lon dest_lat local_id.1 full_address.1
      # 1: Věštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848 Zderazská 98/3, Radotín, 15300 Praha 5
      # 2: Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433 Západní 458, 25303 Chýně
      # 3: Věštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131 Zítkova 235/7, Radotín, 15300 Praha 5
      # 4: Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952 Strážovská 1053/33, Radotín, 15300 Praha 5





      share|improve this answer


























        1












        1








        1







        Rather than wrangling in a data.table, consider building a data table from the json data object which often imports as a heavily nested list of data frames or other lists. As a result, you need to migrate according to paths of the different level items:



        library(jsonlite)
        library(data.table)

        json_data <- read_json('/path/to/posted.json')

        df_list <- lapply(json_data, function(item)
        data.frame(origin_address = unlist(item$origin_addresses), # TOP LEVEL
        destination_address = unlist(item$destination_addresses), # TOP LEVEL
        do.call(rbind, lapply(item$origins, data.frame)), # NESTED LEVEL
        do.call(rbind, lapply(item$destinations, data.frame))) # NESTED LEVEL
        )

        final_df <- do.call(rbind, df_list) # SINGLE DATA FRAME
        final_dt <- rbindlist(df_list) # SINGLE DATA TABLE


        Output (be sure to rename full_address and local_id fields as origin_ or destination_)



        final_dt

        # origin_address destination_address orig_lon orig_lat local_id
        # 1: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zderazská 98/3, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
        # 2: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Západní 458, 253 01 Chýně, Czechia 14.352792 49.983317 AD.22055428
        # 3: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zítkova 235/7, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
        # 4: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Strážovská 1053/33, 153 00 Praha-Radotín, Czechia 14.352792 49.983317 AD.22055428
        # full_address dest_lon dest_lat local_id.1 full_address.1
        # 1: Věštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848 Zderazská 98/3, Radotín, 15300 Praha 5
        # 2: Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433 Západní 458, 25303 Chýně
        # 3: Věštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131 Zítkova 235/7, Radotín, 15300 Praha 5
        # 4: Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952 Strážovská 1053/33, Radotín, 15300 Praha 5





        share|improve this answer













        Rather than wrangling in a data.table, consider building a data table from the json data object which often imports as a heavily nested list of data frames or other lists. As a result, you need to migrate according to paths of the different level items:



        library(jsonlite)
        library(data.table)

        json_data <- read_json('/path/to/posted.json')

        df_list <- lapply(json_data, function(item)
        data.frame(origin_address = unlist(item$origin_addresses), # TOP LEVEL
        destination_address = unlist(item$destination_addresses), # TOP LEVEL
        do.call(rbind, lapply(item$origins, data.frame)), # NESTED LEVEL
        do.call(rbind, lapply(item$destinations, data.frame))) # NESTED LEVEL
        )

        final_df <- do.call(rbind, df_list) # SINGLE DATA FRAME
        final_dt <- rbindlist(df_list) # SINGLE DATA TABLE


        Output (be sure to rename full_address and local_id fields as origin_ or destination_)



        final_dt

        # origin_address destination_address orig_lon orig_lat local_id
        # 1: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zderazská 98/3, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
        # 2: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Západní 458, 253 01 Chýně, Czechia 14.352792 49.983317 AD.22055428
        # 3: U Jankovky 455/18, 153 00 Praha-Radotín, Czechia Zítkova 235/7, 153 00 Praha-Radotín, Czechia 14.36784 49.985982 AD.22045279
        # 4: Otínská 1102/37, 153 00 Praha-Radotín, Czechia Strážovská 1053/33, 153 00 Praha-Radotín, Czechia 14.352792 49.983317 AD.22055428
        # full_address dest_lon dest_lat local_id.1 full_address.1
        # 1: Věštínská 36/9, Radotín, 15300 Praha 5 14.352245 49.981314 AD.22045848 Zderazská 98/3, Radotín, 15300 Praha 5
        # 2: Otínská 1102/37, Radotín, 15300 Praha 5 14.226975 50.051702 AD.27261433 Západní 458, 25303 Chýně
        # 3: Věštínská 36/9, Radotín, 15300 Praha 5 14.36053 49.981687 AD.22047131 Zítkova 235/7, Radotín, 15300 Praha 5
        # 4: Otínská 1102/37, Radotín, 15300 Praha 5 14.361052 49.988529 AD.22054952 Strážovská 1053/33, Radotín, 15300 Praha 5






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 30 '18 at 2:44









        ParfaitParfait

        50.6k84269




        50.6k84269






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53969000%2fexpand-multiple-columns-of-data-table-containing-list-observations%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Monofisismo

            Angular Downloading a file using contenturl with Basic Authentication

            Olmecas