Map a function by key path in nested dict including slices, wildcards and ragged hierarchies












3















This question is an extension based on here and here.



What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:




  1. List of keys at a given path position

  2. Key slices (assuming sorting)

  3. Wildcards (ie all keys at a path position)

  4. Handling ragged hierarchies by ignoring keys that don't appear at a given level


If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).



However, the hierarchy can be ragged, eg:



data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}


Would like to be able to specify key path like this:



map_at(f, ['*',['b','c'],'d'])


To return:



{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}


Here f is mapped to key paths [2,b,d] and [3,c,d].



Slicing would be specified as, eg [0:3,b] for instance.



I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).



Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?



Please do not suggest Pandas as an alternative.










share|improve this question


















  • 1





    Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

    – Davis Herring
    Jan 1 at 1:06











  • @DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

    – alancalvitti
    Jan 2 at 14:34











  • What if a key path resolves to a dict?

    – Davis Herring
    Jan 2 at 21:28











  • @DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

    – alancalvitti
    Jan 3 at 14:52











  • No ambiguity, but does “should return it” mean with or without applying f?

    – Davis Herring
    Jan 3 at 14:58
















3















This question is an extension based on here and here.



What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:




  1. List of keys at a given path position

  2. Key slices (assuming sorting)

  3. Wildcards (ie all keys at a path position)

  4. Handling ragged hierarchies by ignoring keys that don't appear at a given level


If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).



However, the hierarchy can be ragged, eg:



data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}


Would like to be able to specify key path like this:



map_at(f, ['*',['b','c'],'d'])


To return:



{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}


Here f is mapped to key paths [2,b,d] and [3,c,d].



Slicing would be specified as, eg [0:3,b] for instance.



I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).



Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?



Please do not suggest Pandas as an alternative.










share|improve this question


















  • 1





    Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

    – Davis Herring
    Jan 1 at 1:06











  • @DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

    – alancalvitti
    Jan 2 at 14:34











  • What if a key path resolves to a dict?

    – Davis Herring
    Jan 2 at 21:28











  • @DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

    – alancalvitti
    Jan 3 at 14:52











  • No ambiguity, but does “should return it” mean with or without applying f?

    – Davis Herring
    Jan 3 at 14:58














3












3








3


0






This question is an extension based on here and here.



What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:




  1. List of keys at a given path position

  2. Key slices (assuming sorting)

  3. Wildcards (ie all keys at a path position)

  4. Handling ragged hierarchies by ignoring keys that don't appear at a given level


If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).



However, the hierarchy can be ragged, eg:



data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}


Would like to be able to specify key path like this:



map_at(f, ['*',['b','c'],'d'])


To return:



{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}


Here f is mapped to key paths [2,b,d] and [3,c,d].



Slicing would be specified as, eg [0:3,b] for instance.



I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).



Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?



Please do not suggest Pandas as an alternative.










share|improve this question














This question is an extension based on here and here.



What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:




  1. List of keys at a given path position

  2. Key slices (assuming sorting)

  3. Wildcards (ie all keys at a path position)

  4. Handling ragged hierarchies by ignoring keys that don't appear at a given level


If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).



However, the hierarchy can be ragged, eg:



data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}


Would like to be able to specify key path like this:



map_at(f, ['*',['b','c'],'d'])


To return:



{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}


Here f is mapped to key paths [2,b,d] and [3,c,d].



Slicing would be specified as, eg [0:3,b] for instance.



I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).



Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?



Please do not suggest Pandas as an alternative.







python dictionary functional-programming






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Dec 31 '18 at 21:21









alancalvittialancalvitti

1528




1528








  • 1





    Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

    – Davis Herring
    Jan 1 at 1:06











  • @DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

    – alancalvitti
    Jan 2 at 14:34











  • What if a key path resolves to a dict?

    – Davis Herring
    Jan 2 at 21:28











  • @DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

    – alancalvitti
    Jan 3 at 14:52











  • No ambiguity, but does “should return it” mean with or without applying f?

    – Davis Herring
    Jan 3 at 14:58














  • 1





    Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

    – Davis Herring
    Jan 1 at 1:06











  • @DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

    – alancalvitti
    Jan 2 at 14:34











  • What if a key path resolves to a dict?

    – Davis Herring
    Jan 2 at 21:28











  • @DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

    – alancalvitti
    Jan 3 at 14:52











  • No ambiguity, but does “should return it” mean with or without applying f?

    – Davis Herring
    Jan 3 at 14:58








1




1





Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

– Davis Herring
Jan 1 at 1:06





Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

– Davis Herring
Jan 1 at 1:06













@DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

– alancalvitti
Jan 2 at 14:34





@DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

– alancalvitti
Jan 2 at 14:34













What if a key path resolves to a dict?

– Davis Herring
Jan 2 at 21:28





What if a key path resolves to a dict?

– Davis Herring
Jan 2 at 21:28













@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

– alancalvitti
Jan 3 at 14:52





@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

– alancalvitti
Jan 3 at 14:52













No ambiguity, but does “should return it” mean with or without applying f?

– Davis Herring
Jan 3 at 14:58





No ambiguity, but does “should return it” mean with or without applying f?

– Davis Herring
Jan 3 at 14:58












2 Answers
2






active

oldest

votes


















1














I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:



map_at(func, path_pattern, data):




  1. if path_pattern is not empty


    • if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.

    • else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.



  2. else, it's a success, because all the path_pattern was consumed:


    • if data is terminal, return func(data)

    • else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)




Notes:




  • I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

  • it's an eager algorithm: the head of the path is always consumed when possible;

  • if the path is fully consumed, every terminal should be mapped;

  • it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.


Here's the code:



def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)


Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).



As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:



def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)

def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)





share|improve this answer


























  • I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

    – alancalvitti
    10 hours ago













  • @alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

    – jferard
    10 hours ago





















0














This is not very simple and less efficient, but it should work:



def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret


Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied) but returns unmatched subdictionaries by reference. Note also that '*' can be “quoted” by putting it in a list.






share|improve this answer


























  • I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

    – alancalvitti
    10 hours ago











  • Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

    – Davis Herring
    2 hours ago











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991503%2fmap-a-function-by-key-path-in-nested-dict-including-slices-wildcards-and-ragged%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:



map_at(func, path_pattern, data):




  1. if path_pattern is not empty


    • if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.

    • else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.



  2. else, it's a success, because all the path_pattern was consumed:


    • if data is terminal, return func(data)

    • else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)




Notes:




  • I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

  • it's an eager algorithm: the head of the path is always consumed when possible;

  • if the path is fully consumed, every terminal should be mapped;

  • it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.


Here's the code:



def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)


Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).



As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:



def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)

def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)





share|improve this answer


























  • I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

    – alancalvitti
    10 hours ago













  • @alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

    – jferard
    10 hours ago


















1














I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:



map_at(func, path_pattern, data):




  1. if path_pattern is not empty


    • if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.

    • else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.



  2. else, it's a success, because all the path_pattern was consumed:


    • if data is terminal, return func(data)

    • else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)




Notes:




  • I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

  • it's an eager algorithm: the head of the path is always consumed when possible;

  • if the path is fully consumed, every terminal should be mapped;

  • it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.


Here's the code:



def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)


Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).



As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:



def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)

def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)





share|improve this answer


























  • I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

    – alancalvitti
    10 hours ago













  • @alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

    – jferard
    10 hours ago
















1












1








1







I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:



map_at(func, path_pattern, data):




  1. if path_pattern is not empty


    • if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.

    • else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.



  2. else, it's a success, because all the path_pattern was consumed:


    • if data is terminal, return func(data)

    • else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)




Notes:




  • I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

  • it's an eager algorithm: the head of the path is always consumed when possible;

  • if the path is fully consumed, every terminal should be mapped;

  • it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.


Here's the code:



def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)


Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).



As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:



def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)

def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)





share|improve this answer















I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:



map_at(func, path_pattern, data):




  1. if path_pattern is not empty


    • if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.

    • else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.



  2. else, it's a success, because all the path_pattern was consumed:


    • if data is terminal, return func(data)

    • else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)




Notes:




  • I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

  • it's an eager algorithm: the head of the path is always consumed when possible;

  • if the path is fully consumed, every terminal should be mapped;

  • it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.


Here's the code:



def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern was not consumed
return data
else: # success: path_pattern is empty.
try: # not a leaf: map every leaf of every path
return {k: map_at(func, , v) for k,v in data.items()}
except AttributeError: # a leaf: map it
return func(data)


Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).



As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:



def map_all_leaves(func, data):
"""Apply func to all leaves"""
try:
return {k: map_all_leaves(func, v) for k,v in data.items()}
except AttributeError:
return func(data)

def map_at(func, path_pattern, data):
def matches(pattern, value):
return pattern == '*' or value == pattern or value in pattern

if path_pattern:
head, *tail = path_pattern
try: # try to consume head for each key of data
return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
except AttributeError: # fail: terminal data but path_pattern is not consumed
return data
else:
map_all_leaves(func, data)






share|improve this answer














share|improve this answer



share|improve this answer








edited yesterday

























answered yesterday









jferardjferard

1,597311




1,597311













  • I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

    – alancalvitti
    10 hours ago













  • @alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

    – jferard
    10 hours ago





















  • I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

    – alancalvitti
    10 hours ago













  • @alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

    – jferard
    10 hours ago



















I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago







I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago















@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago







@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago















0














This is not very simple and less efficient, but it should work:



def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret


Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied) but returns unmatched subdictionaries by reference. Note also that '*' can be “quoted” by putting it in a list.






share|improve this answer


























  • I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

    – alancalvitti
    10 hours ago











  • Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

    – Davis Herring
    2 hours ago
















0














This is not very simple and less efficient, but it should work:



def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret


Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied) but returns unmatched subdictionaries by reference. Note also that '*' can be “quoted” by putting it in a list.






share|improve this answer


























  • I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

    – alancalvitti
    10 hours ago











  • Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

    – Davis Herring
    2 hours ago














0












0








0







This is not very simple and less efficient, but it should work:



def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret


Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied) but returns unmatched subdictionaries by reference. Note also that '*' can be “quoted” by putting it in a list.






share|improve this answer















This is not very simple and less efficient, but it should work:



def map_at(f,kp,d): return map_at0(f,kp,d,0)
def slice_contains(s,i): # no negative-index support
a=s.start or 0
return i>=a and (s.end is None or i<s.end) and
not (i-a)%(s.step or 1)
def map_at0(f,kp,d,i):
if i==len(kp): return f(d)
if not isinstance(d,dict): return d # no such path here
ret={}
p=kp[i]
if isinstance(p,str) and p!='*': p=p,
for j,(k,v) in enumerate(sorted(d.items())):
if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
v=map_at0(f,kp,v,i+1)
ret[k]=v
return ret


Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied) but returns unmatched subdictionaries by reference. Note also that '*' can be “quoted” by putting it in a list.







share|improve this answer














share|improve this answer



share|improve this answer








edited 2 hours ago

























answered Jan 4 at 1:48









Davis HerringDavis Herring

8,5351736




8,5351736













  • I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

    – alancalvitti
    10 hours ago











  • Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

    – Davis Herring
    2 hours ago



















  • I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

    – alancalvitti
    10 hours ago











  • Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

    – Davis Herring
    2 hours ago

















I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago





I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago













Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago





Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991503%2fmap-a-function-by-key-path-in-nested-dict-including-slices-wildcards-and-ragged%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Monofisismo

Angular Downloading a file using contenturl with Basic Authentication

Olmecas