Map a function by key path in nested dict including slices, wildcards and ragged hierarchies
This question is an extension based on here and here.
What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:
- List of keys at a given path position
- Key slices (assuming sorting)
- Wildcards (ie all keys at a path position)
- Handling ragged hierarchies by ignoring keys that don't appear at a given level
If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...))
.
However, the hierarchy can be ragged, eg:
data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}
Would like to be able to specify key path like this:
map_at(f, ['*',['b','c'],'d'])
To return:
{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}
Here f
is mapped to key paths [2,b,d]
and [3,c,d]
.
Slicing would be specified as, eg [0:3,b]
for instance.
I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f
would also be mapped at [0,b]` and other paths).
Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError
etc?
Please do not suggest Pandas as an alternative.
python dictionary functional-programming
|
show 3 more comments
This question is an extension based on here and here.
What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:
- List of keys at a given path position
- Key slices (assuming sorting)
- Wildcards (ie all keys at a path position)
- Handling ragged hierarchies by ignoring keys that don't appear at a given level
If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...))
.
However, the hierarchy can be ragged, eg:
data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}
Would like to be able to specify key path like this:
map_at(f, ['*',['b','c'],'d'])
To return:
{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}
Here f
is mapped to key paths [2,b,d]
and [3,c,d]
.
Slicing would be specified as, eg [0:3,b]
for instance.
I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f
would also be mapped at [0,b]` and other paths).
Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError
etc?
Please do not suggest Pandas as an alternative.
python dictionary functional-programming
1
Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includestry
?
– Davis Herring
Jan 1 at 1:06
@DavisHerring, the primary problem isKeyError
is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.
– alancalvitti
Jan 2 at 14:34
What if a key path resolves to adict
?
– Davis Herring
Jan 2 at 21:28
@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?
– alancalvitti
Jan 3 at 14:52
No ambiguity, but does “should return it” mean with or without applyingf
?
– Davis Herring
Jan 3 at 14:58
|
show 3 more comments
This question is an extension based on here and here.
What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:
- List of keys at a given path position
- Key slices (assuming sorting)
- Wildcards (ie all keys at a path position)
- Handling ragged hierarchies by ignoring keys that don't appear at a given level
If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...))
.
However, the hierarchy can be ragged, eg:
data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}
Would like to be able to specify key path like this:
map_at(f, ['*',['b','c'],'d'])
To return:
{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}
Here f
is mapped to key paths [2,b,d]
and [3,c,d]
.
Slicing would be specified as, eg [0:3,b]
for instance.
I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f
would also be mapped at [0,b]` and other paths).
Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError
etc?
Please do not suggest Pandas as an alternative.
python dictionary functional-programming
This question is an extension based on here and here.
What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:
- List of keys at a given path position
- Key slices (assuming sorting)
- Wildcards (ie all keys at a path position)
- Handling ragged hierarchies by ignoring keys that don't appear at a given level
If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...))
.
However, the hierarchy can be ragged, eg:
data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}
Would like to be able to specify key path like this:
map_at(f, ['*',['b','c'],'d'])
To return:
{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}
Here f
is mapped to key paths [2,b,d]
and [3,c,d]
.
Slicing would be specified as, eg [0:3,b]
for instance.
I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f
would also be mapped at [0,b]` and other paths).
Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError
etc?
Please do not suggest Pandas as an alternative.
python dictionary functional-programming
python dictionary functional-programming
asked Dec 31 '18 at 21:21
alancalvittialancalvitti
1528
1528
1
Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includestry
?
– Davis Herring
Jan 1 at 1:06
@DavisHerring, the primary problem isKeyError
is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.
– alancalvitti
Jan 2 at 14:34
What if a key path resolves to adict
?
– Davis Herring
Jan 2 at 21:28
@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?
– alancalvitti
Jan 3 at 14:52
No ambiguity, but does “should return it” mean with or without applyingf
?
– Davis Herring
Jan 3 at 14:58
|
show 3 more comments
1
Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includestry
?
– Davis Herring
Jan 1 at 1:06
@DavisHerring, the primary problem isKeyError
is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.
– alancalvitti
Jan 2 at 14:34
What if a key path resolves to adict
?
– Davis Herring
Jan 2 at 21:28
@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?
– alancalvitti
Jan 3 at 14:52
No ambiguity, but does “should return it” mean with or without applyingf
?
– Davis Herring
Jan 3 at 14:58
1
1
Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes
try
?– Davis Herring
Jan 1 at 1:06
Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes
try
?– Davis Herring
Jan 1 at 1:06
@DavisHerring, the primary problem is
KeyError
is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.– alancalvitti
Jan 2 at 14:34
@DavisHerring, the primary problem is
KeyError
is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.– alancalvitti
Jan 2 at 14:34
What if a key path resolves to a
dict
?– Davis Herring
Jan 2 at 21:28
What if a key path resolves to a
dict
?– Davis Herring
Jan 2 at 21:28
@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?
– alancalvitti
Jan 3 at 14:52
@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?
– alancalvitti
Jan 3 at 14:52
No ambiguity, but does “should return it” mean with or without applying
f
?– Davis Herring
Jan 3 at 14:58
No ambiguity, but does “should return it” mean with or without applying
f
?– Davis Herring
Jan 3 at 14:58
|
show 3 more comments
2 Answers
2
active
oldest
votes
I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:
map_at(func, path_pattern, data)
:
- if
path_pattern
is not empty
- if
data
is terminal, it's a failure : we did not match the fullpath_pattern
̀so there is no reason to apply the function. Just returndata
. - else, we have to explore every path in data. We consume the head of
path_pattern
if possible. That is return a dictdata key
->map_at(func, new_path, data value)
wherenew_path
is thetail
of thepath_pattern
if the key matches thehead
, else the `path_pattern itself.
- if
- else, it's a success, because all the
path_pattern
was consumed:
- if
data
is terminal, returnfunc(data)
- else, find the leaves and apply
func
: return return a dictdata key
->map_at(func, , data value)
- if
Notes:
- I assume that the pattern
*-b-d
matches the path0-a-b-c-d-e
; - it's an eager algorithm: the head of the path is always consumed when possible;
- if the path is fully consumed, every terminal should be mapped;
- it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.
Here's the code:
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)
Note that tail if matches(head, k) else path_pattern
means: consume head
if possible. To use a range in the pattern, just use range(...)
.
As you can see, you never escape from case 2. : if the path_pattern
is empty, you just have to map all leaves whatever happens. This is clearer in this version:
def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)
I expected it work with any nested combinations oflist
anddict
. Givendata2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
,map_at(type,['*',['b','c'],'d'],data2)
returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like0:2
and:
in place of the wildcard, resulting in syntax errors.
– alancalvitti
10 hours ago
@alancalvitti I did not implement slices, but you can userange(0,2)
instead of0:2
. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.
– jferard
10 hours ago
add a comment |
This is not very simple and less efficient, but it should work:
def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret
Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f
is never applied) but returns unmatched subdictionaries by reference. Note also that '*'
can be “quoted” by putting it in a list.
I triedmap_at(type,['*',['b','c'],'d'],data)
usingdata
as per original Q and gotAttributeError: 'int' object has no attribute 'items'
– alancalvitti
10 hours ago
Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.
– Davis Herring
2 hours ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991503%2fmap-a-function-by-key-path-in-nested-dict-including-slices-wildcards-and-ragged%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:
map_at(func, path_pattern, data)
:
- if
path_pattern
is not empty
- if
data
is terminal, it's a failure : we did not match the fullpath_pattern
̀so there is no reason to apply the function. Just returndata
. - else, we have to explore every path in data. We consume the head of
path_pattern
if possible. That is return a dictdata key
->map_at(func, new_path, data value)
wherenew_path
is thetail
of thepath_pattern
if the key matches thehead
, else the `path_pattern itself.
- if
- else, it's a success, because all the
path_pattern
was consumed:
- if
data
is terminal, returnfunc(data)
- else, find the leaves and apply
func
: return return a dictdata key
->map_at(func, , data value)
- if
Notes:
- I assume that the pattern
*-b-d
matches the path0-a-b-c-d-e
; - it's an eager algorithm: the head of the path is always consumed when possible;
- if the path is fully consumed, every terminal should be mapped;
- it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.
Here's the code:
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)
Note that tail if matches(head, k) else path_pattern
means: consume head
if possible. To use a range in the pattern, just use range(...)
.
As you can see, you never escape from case 2. : if the path_pattern
is empty, you just have to map all leaves whatever happens. This is clearer in this version:
def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)
I expected it work with any nested combinations oflist
anddict
. Givendata2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
,map_at(type,['*',['b','c'],'d'],data2)
returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like0:2
and:
in place of the wildcard, resulting in syntax errors.
– alancalvitti
10 hours ago
@alancalvitti I did not implement slices, but you can userange(0,2)
instead of0:2
. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.
– jferard
10 hours ago
add a comment |
I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:
map_at(func, path_pattern, data)
:
- if
path_pattern
is not empty
- if
data
is terminal, it's a failure : we did not match the fullpath_pattern
̀so there is no reason to apply the function. Just returndata
. - else, we have to explore every path in data. We consume the head of
path_pattern
if possible. That is return a dictdata key
->map_at(func, new_path, data value)
wherenew_path
is thetail
of thepath_pattern
if the key matches thehead
, else the `path_pattern itself.
- if
- else, it's a success, because all the
path_pattern
was consumed:
- if
data
is terminal, returnfunc(data)
- else, find the leaves and apply
func
: return return a dictdata key
->map_at(func, , data value)
- if
Notes:
- I assume that the pattern
*-b-d
matches the path0-a-b-c-d-e
; - it's an eager algorithm: the head of the path is always consumed when possible;
- if the path is fully consumed, every terminal should be mapped;
- it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.
Here's the code:
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)
Note that tail if matches(head, k) else path_pattern
means: consume head
if possible. To use a range in the pattern, just use range(...)
.
As you can see, you never escape from case 2. : if the path_pattern
is empty, you just have to map all leaves whatever happens. This is clearer in this version:
def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)
I expected it work with any nested combinations oflist
anddict
. Givendata2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
,map_at(type,['*',['b','c'],'d'],data2)
returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like0:2
and:
in place of the wildcard, resulting in syntax errors.
– alancalvitti
10 hours ago
@alancalvitti I did not implement slices, but you can userange(0,2)
instead of0:2
. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.
– jferard
10 hours ago
add a comment |
I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:
map_at(func, path_pattern, data)
:
- if
path_pattern
is not empty
- if
data
is terminal, it's a failure : we did not match the fullpath_pattern
̀so there is no reason to apply the function. Just returndata
. - else, we have to explore every path in data. We consume the head of
path_pattern
if possible. That is return a dictdata key
->map_at(func, new_path, data value)
wherenew_path
is thetail
of thepath_pattern
if the key matches thehead
, else the `path_pattern itself.
- if
- else, it's a success, because all the
path_pattern
was consumed:
- if
data
is terminal, returnfunc(data)
- else, find the leaves and apply
func
: return return a dictdata key
->map_at(func, , data value)
- if
Notes:
- I assume that the pattern
*-b-d
matches the path0-a-b-c-d-e
; - it's an eager algorithm: the head of the path is always consumed when possible;
- if the path is fully consumed, every terminal should be mapped;
- it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.
Here's the code:
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)
Note that tail if matches(head, k) else path_pattern
means: consume head
if possible. To use a range in the pattern, just use range(...)
.
As you can see, you never escape from case 2. : if the path_pattern
is empty, you just have to map all leaves whatever happens. This is clearer in this version:
def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)
I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:
map_at(func, path_pattern, data)
:
- if
path_pattern
is not empty
- if
data
is terminal, it's a failure : we did not match the fullpath_pattern
̀so there is no reason to apply the function. Just returndata
. - else, we have to explore every path in data. We consume the head of
path_pattern
if possible. That is return a dictdata key
->map_at(func, new_path, data value)
wherenew_path
is thetail
of thepath_pattern
if the key matches thehead
, else the `path_pattern itself.
- if
- else, it's a success, because all the
path_pattern
was consumed:
- if
data
is terminal, returnfunc(data)
- else, find the leaves and apply
func
: return return a dictdata key
->map_at(func, , data value)
- if
Notes:
- I assume that the pattern
*-b-d
matches the path0-a-b-c-d-e
; - it's an eager algorithm: the head of the path is always consumed when possible;
- if the path is fully consumed, every terminal should be mapped;
- it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.
Here's the code:
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)
Note that tail if matches(head, k) else path_pattern
means: consume head
if possible. To use a range in the pattern, just use range(...)
.
As you can see, you never escape from case 2. : if the path_pattern
is empty, you just have to map all leaves whatever happens. This is clearer in this version:
def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)
def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern
if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)
edited yesterday
answered yesterday
jferardjferard
1,597311
1,597311
I expected it work with any nested combinations oflist
anddict
. Givendata2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
,map_at(type,['*',['b','c'],'d'],data2)
returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like0:2
and:
in place of the wildcard, resulting in syntax errors.
– alancalvitti
10 hours ago
@alancalvitti I did not implement slices, but you can userange(0,2)
instead of0:2
. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.
– jferard
10 hours ago
add a comment |
I expected it work with any nested combinations oflist
anddict
. Givendata2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
,map_at(type,['*',['b','c'],'d'],data2)
returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like0:2
and:
in place of the wildcard, resulting in syntax errors.
– alancalvitti
10 hours ago
@alancalvitti I did not implement slices, but you can userange(0,2)
instead of0:2
. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.
– jferard
10 hours ago
I expected it work with any nested combinations of
list
and dict
. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
, map_at(type,['*',['b','c'],'d'],data2)
returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2
and :
in place of the wildcard, resulting in syntax errors.– alancalvitti
10 hours ago
I expected it work with any nested combinations of
list
and dict
. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
, map_at(type,['*',['b','c'],'d'],data2)
returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2
and :
in place of the wildcard, resulting in syntax errors.– alancalvitti
10 hours ago
@alancalvitti I did not implement slices, but you can use
range(0,2)
instead of 0:2
. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.– jferard
10 hours ago
@alancalvitti I did not implement slices, but you can use
range(0,2)
instead of 0:2
. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.– jferard
10 hours ago
add a comment |
This is not very simple and less efficient, but it should work:
def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret
Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f
is never applied) but returns unmatched subdictionaries by reference. Note also that '*'
can be “quoted” by putting it in a list.
I triedmap_at(type,['*',['b','c'],'d'],data)
usingdata
as per original Q and gotAttributeError: 'int' object has no attribute 'items'
– alancalvitti
10 hours ago
Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.
– Davis Herring
2 hours ago
add a comment |
This is not very simple and less efficient, but it should work:
def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret
Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f
is never applied) but returns unmatched subdictionaries by reference. Note also that '*'
can be “quoted” by putting it in a list.
I triedmap_at(type,['*',['b','c'],'d'],data)
usingdata
as per original Q and gotAttributeError: 'int' object has no attribute 'items'
– alancalvitti
10 hours ago
Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.
– Davis Herring
2 hours ago
add a comment |
This is not very simple and less efficient, but it should work:
def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret
Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f
is never applied) but returns unmatched subdictionaries by reference. Note also that '*'
can be “quoted” by putting it in a list.
This is not very simple and less efficient, but it should work:
def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret
Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f
is never applied) but returns unmatched subdictionaries by reference. Note also that '*'
can be “quoted” by putting it in a list.
edited 2 hours ago
answered Jan 4 at 1:48
Davis HerringDavis Herring
8,5351736
8,5351736
I triedmap_at(type,['*',['b','c'],'d'],data)
usingdata
as per original Q and gotAttributeError: 'int' object has no attribute 'items'
– alancalvitti
10 hours ago
Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.
– Davis Herring
2 hours ago
add a comment |
I triedmap_at(type,['*',['b','c'],'d'],data)
usingdata
as per original Q and gotAttributeError: 'int' object has no attribute 'items'
– alancalvitti
10 hours ago
Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.
– Davis Herring
2 hours ago
I tried
map_at(type,['*',['b','c'],'d'],data)
using data
as per original Q and got AttributeError: 'int' object has no attribute 'items'
– alancalvitti
10 hours ago
I tried
map_at(type,['*',['b','c'],'d'],data)
using data
as per original Q and got AttributeError: 'int' object has no attribute 'items'
– alancalvitti
10 hours ago
Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.
– Davis Herring
2 hours ago
Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.
– Davis Herring
2 hours ago
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991503%2fmap-a-function-by-key-path-in-nested-dict-including-slices-wildcards-and-ragged%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes
try
?– Davis Herring
Jan 1 at 1:06
@DavisHerring, the primary problem is
KeyError
is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.– alancalvitti
Jan 2 at 14:34
What if a key path resolves to a
dict
?– Davis Herring
Jan 2 at 21:28
@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?
– alancalvitti
Jan 3 at 14:52
No ambiguity, but does “should return it” mean with or without applying
f
?– Davis Herring
Jan 3 at 14:58