Map a function by key path in nested dict including slices, wildcards and ragged hierarchies

This question is an extension based on here and here.

What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:

List of keys at a given path position

Key slices (assuming sorting)

Wildcards (ie all keys at a path position)

Handling ragged hierarchies by ignoring keys that don't appear at a given level

If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).

However, the hierarchy can be ragged, eg:

data = {0: {'a': 1, 'b': 2},

 1: {'a': 10, 'c': 13},

 2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},

 3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

Would like to be able to specify key path like this:

map_at(f, ['*',['b','c'],'d'])

To return:

{0: {'a': 1, 'b': 2},

     1: {'a': 10, 'c': 13},

     2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},

     3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}

Here f is mapped to key paths [2,b,d] and [3,c,d].

Slicing would be specified as, eg [0:3,b] for instance.

I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).

Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?

Please do not suggest Pandas as an alternative.

asked Dec 31 '18 at 21:21

alancalvitti

1528

1

Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

– Davis Herring
Jan 1 at 1:06

@DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

– alancalvitti
Jan 2 at 14:34

What if a key path resolves to a dict?

– Davis Herring
Jan 2 at 21:28

@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

– alancalvitti
Jan 3 at 14:52

No ambiguity, but does “should return it” mean with or without applying f?

– Davis Herring
Jan 3 at 14:58

|
show 3 more comments

This question is an extension based on here and here.

What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:

List of keys at a given path position

Key slices (assuming sorting)

Wildcards (ie all keys at a path position)

Handling ragged hierarchies by ignoring keys that don't appear at a given level

If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).

However, the hierarchy can be ragged, eg:

data = {0: {'a': 1, 'b': 2},

 1: {'a': 10, 'c': 13},

 2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},

 3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

Would like to be able to specify key path like this:

map_at(f, ['*',['b','c'],'d'])

To return:

{0: {'a': 1, 'b': 2},

     1: {'a': 10, 'c': 13},

     2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},

     3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}

Here f is mapped to key paths [2,b,d] and [3,c,d].

Slicing would be specified as, eg [0:3,b] for instance.

I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).

Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?

Please do not suggest Pandas as an alternative.

asked Dec 31 '18 at 21:21

alancalvitti

1528

1

Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

– Davis Herring
Jan 1 at 1:06

@DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

– alancalvitti
Jan 2 at 14:34

What if a key path resolves to a dict?

– Davis Herring
Jan 2 at 21:28

@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

– alancalvitti
Jan 3 at 14:52

No ambiguity, but does “should return it” mean with or without applying f?

– Davis Herring
Jan 3 at 14:58

|
show 3 more comments

This question is an extension based on here and here.

What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:

List of keys at a given path position

Key slices (assuming sorting)

Wildcards (ie all keys at a path position)

Handling ragged hierarchies by ignoring keys that don't appear at a given level

If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).

However, the hierarchy can be ragged, eg:

data = {0: {'a': 1, 'b': 2},

 1: {'a': 10, 'c': 13},

 2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},

 3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

Would like to be able to specify key path like this:

map_at(f, ['*',['b','c'],'d'])

To return:

{0: {'a': 1, 'b': 2},

     1: {'a': 10, 'c': 13},

     2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},

     3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}

Here f is mapped to key paths [2,b,d] and [3,c,d].

Slicing would be specified as, eg [0:3,b] for instance.

I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).

Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?

Please do not suggest Pandas as an alternative.

asked Dec 31 '18 at 21:21

alancalvitti

1528

This question is an extension based on here and here.

What is a good approach to mapping a function to a specified key path in nested dicts, including these path specification:

List of keys at a given path position

Key slices (assuming sorting)

Wildcards (ie all keys at a path position)

Handling ragged hierarchies by ignoring keys that don't appear at a given level

If it is makes is simpler, can assume that only dicts are nested, no lists of dicts, since the former can be obtained with dict(enumerate(...)).

However, the hierarchy can be ragged, eg:

data = {0: {'a': 1, 'b': 2},

 1: {'a': 10, 'c': 13},

 2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},

 3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

Would like to be able to specify key path like this:

map_at(f, ['*',['b','c'],'d'])

To return:

{0: {'a': 1, 'b': 2},

     1: {'a': 10, 'c': 13},

     2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},

     3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}

Here f is mapped to key paths [2,b,d] and [3,c,d].

Slicing would be specified as, eg [0:3,b] for instance.

I think the path spec is unambiguous, though could be generalized to, for example, match key path prefix (in which case, f would also be mapped at [0,b]` and other paths).

Can this be implemented via comprehension and recursion or does it require heavy lifting to catch KeyError etc?

Please do not suggest Pandas as an alternative.

python dictionary functional-programming

asked Dec 31 '18 at 21:21

alancalvitti

1528

asked Dec 31 '18 at 21:21

alancalvitti

1528

asked Dec 31 '18 at 21:21

alancalvitti

1528

asked Dec 31 '18 at 21:21

alancalvitti

1528

asked Dec 31 '18 at 21:21

alancalvitti

1528

1

Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

– Davis Herring
Jan 1 at 1:06

@DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

– alancalvitti
Jan 2 at 14:34

What if a key path resolves to a dict?

– Davis Herring
Jan 2 at 21:28

@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

– alancalvitti
Jan 3 at 14:52

No ambiguity, but does “should return it” mean with or without applying f?

– Davis Herring
Jan 3 at 14:58

|
show 3 more comments

1

Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

– Davis Herring
Jan 1 at 1:06

@DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

– alancalvitti
Jan 2 at 14:34

What if a key path resolves to a dict?

– Davis Herring
Jan 2 at 21:28

@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

– alancalvitti
Jan 3 at 14:52

No ambiguity, but does “should return it” mean with or without applying f?

– Davis Herring
Jan 3 at 14:58

Anything can be implemented via recursion—exactly what kind of “heavy lifting” are you trying to avoid that includes try?

– Davis Herring
Jan 1 at 1:06

@DavisHerring, the primary problem is KeyError is thrown in ragged data when one or more branches do not have a specified key, as shown in the example.

– alancalvitti
Jan 2 at 14:34

What if a key path resolves to a dict?

– Davis Herring
Jan 2 at 21:28

@DavisHerring, if key path resolves to a dict, it should return it. Do you foresee any ambiguities there?

– alancalvitti
Jan 3 at 14:52

No ambiguity, but does “should return it” mean with or without applying f?

– Davis Herring
Jan 3 at 14:58

|
show 3 more comments

2 Answers
2

active

oldest

votes

I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:

map_at(func, path_pattern, data):

if path_pattern is not empty
- if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.
- else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.

else, it's a success, because all the path_pattern was consumed:
- if data is terminal, return func(data)
- else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)

Notes:

I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

it's an eager algorithm: the head of the path is always consumed when possible;

if the path is fully consumed, every terminal should be mapped;

it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.

Here's the code:

def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern was not consumed

            return data

    else: # success: path_pattern is empty.

        try: # not a leaf: map every leaf of every path

            return {k: map_at(func, , v) for k,v in data.items()}

        except AttributeError: # a leaf: map it

            return func(data)

Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).

As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:

def map_all_leaves(func, data):

    """Apply func to all leaves"""

    try:

        return {k: map_all_leaves(func, v) for k,v in data.items()}

    except AttributeError:

        return func(data)



def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else  path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern is not consumed

            return data

    else:

        map_all_leaves(func, data)

edited yesterday

answered yesterday

jferard

1,597311

I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago

@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago

add a comment |

This is not very simple and less efficient, but it should work:

def map_at(f,kp,d): return map_at0(f,kp,d,0)

def slice_contains(s,i):  # no negative-index support

  a=s.start or 0

  return i>=a and (s.end is None or i<s.end) and

    not (i-a)%(s.step or 1)

def map_at0(f,kp,d,i):

  if i==len(kp): return f(d)

  if not isinstance(d,dict): return d  # no such path here

  ret={}

  p=kp[i]

  if isinstance(p,str) and p!='*': p=p,

  for j,(k,v) in enumerate(sorted(d.items())):

    if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):

      v=map_at0(f,kp,v,i+1)

    ret[k]=v

  return ret

Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied) but returns unmatched subdictionaries by reference. Note also that '*' can be “quoted” by putting it in a list.

edited 2 hours ago

answered Jan 4 at 1:48

Davis Herring

8,5351736

I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago

Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991503%2fmap-a-function-by-key-path-in-nested-dict-including-slices-wildcards-and-ragged%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:

map_at(func, path_pattern, data):

if path_pattern is not empty
- if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.
- else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.

else, it's a success, because all the path_pattern was consumed:
- if data is terminal, return func(data)
- else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)

Notes:

I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

it's an eager algorithm: the head of the path is always consumed when possible;

if the path is fully consumed, every terminal should be mapped;

it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.

Here's the code:

def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern was not consumed

            return data

    else: # success: path_pattern is empty.

        try: # not a leaf: map every leaf of every path

            return {k: map_at(func, , v) for k,v in data.items()}

        except AttributeError: # a leaf: map it

            return func(data)

Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).

As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:

def map_all_leaves(func, data):

    """Apply func to all leaves"""

    try:

        return {k: map_all_leaves(func, v) for k,v in data.items()}

    except AttributeError:

        return func(data)



def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else  path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern is not consumed

            return data

    else:

        map_all_leaves(func, data)

edited yesterday

answered yesterday

jferard

1,597311

I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago

@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago

add a comment |

I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:

map_at(func, path_pattern, data):

if path_pattern is not empty
- if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.
- else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.

else, it's a success, because all the path_pattern was consumed:
- if data is terminal, return func(data)
- else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)

Notes:

I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

it's an eager algorithm: the head of the path is always consumed when possible;

if the path is fully consumed, every terminal should be mapped;

it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.

Here's the code:

def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern was not consumed

            return data

    else: # success: path_pattern is empty.

        try: # not a leaf: map every leaf of every path

            return {k: map_at(func, , v) for k,v in data.items()}

        except AttributeError: # a leaf: map it

            return func(data)

Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).

As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:

def map_all_leaves(func, data):

    """Apply func to all leaves"""

    try:

        return {k: map_all_leaves(func, v) for k,v in data.items()}

    except AttributeError:

        return func(data)



def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else  path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern is not consumed

            return data

    else:

        map_all_leaves(func, data)

edited yesterday

answered yesterday

jferard

1,597311

I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago

@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago

add a comment |

I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:

map_at(func, path_pattern, data):

if path_pattern is not empty
- if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.
- else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.

else, it's a success, because all the path_pattern was consumed:
- if data is terminal, return func(data)
- else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)

Notes:

I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

it's an eager algorithm: the head of the path is always consumed when possible;

if the path is fully consumed, every terminal should be mapped;

it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.

Here's the code:

def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern was not consumed

            return data

    else: # success: path_pattern is empty.

        try: # not a leaf: map every leaf of every path

            return {k: map_at(func, , v) for k,v in data.items()}

        except AttributeError: # a leaf: map it

            return func(data)

Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).

As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:

def map_all_leaves(func, data):

    """Apply func to all leaves"""

    try:

        return {k: map_all_leaves(func, v) for k,v in data.items()}

    except AttributeError:

        return func(data)



def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else  path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern is not consumed

            return data

    else:

        map_all_leaves(func, data)

edited yesterday

answered yesterday

jferard

1,597311

I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:

map_at(func, path_pattern, data):

if path_pattern is not empty
- if data is terminal, it's a failure : we did not match the full path_pattern ̀so there is no reason to apply the function. Just return data.
- else, we have to explore every path in data. We consume the head of path_pattern if possible. That is return a dict data key -> map_at(func, new_path, data value) where new_path is the tail of the path_pattern if the key matches the head, else the `path_pattern itself.

else, it's a success, because all the path_pattern was consumed:
- if data is terminal, return func(data)
- else, find the leaves and apply func: return return a dict data key -> map_at(func, , data value)

Notes:

I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;

it's an eager algorithm: the head of the path is always consumed when possible;

if the path is fully consumed, every terminal should be mapped;

it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.

Here's the code:

def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern was not consumed

            return data

    else: # success: path_pattern is empty.

        try: # not a leaf: map every leaf of every path

            return {k: map_at(func, , v) for k,v in data.items()}

        except AttributeError: # a leaf: map it

            return func(data)

Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).

As you can see, you never escape from case 2. : if the path_pattern is empty, you just have to map all leaves whatever happens. This is clearer in this version:

def map_all_leaves(func, data):

    """Apply func to all leaves"""

    try:

        return {k: map_all_leaves(func, v) for k,v in data.items()}

    except AttributeError:

        return func(data)



def map_at(func, path_pattern, data):

    def matches(pattern, value):

        return pattern == '*' or value == pattern or value in pattern



    if path_pattern:

        head, *tail = path_pattern

        try: # try to consume head for each key of data

            return {k: map_at(func, tail if matches(head, k) else  path_pattern, v) for k,v in data.items()}

        except AttributeError: # fail: terminal data but path_pattern is not consumed

            return data

    else:

        map_all_leaves(func, data)

edited yesterday

answered yesterday

jferard

1,597311

edited yesterday

answered yesterday

jferard

1,597311

answered yesterday

jferard

1,597311

answered yesterday

jferard

1,597311

I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago

@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago

add a comment |

I expected it work with any nested combinations of list and dict. Given data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}], map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago

@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago

I expected it work with any nested combinations of list and dict. Given

data2 = [{'a': 1, 'b': 2},  {'a': 10, 'c': 13},  {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},  {'a': 30, 'b': 31, 'c': {'d': 300}}]

, map_at(type,['*',['b','c'],'d'],data2) returns the input, but the top-level wildcard should map over all dicts in the list. Also tried slices like 0:2 and : in place of the wildcard, resulting in syntax errors.

– alancalvitti
10 hours ago

I expected it work with any nested combinations of list and dict. Given

data2 = [{'a': 1, 'b': 2},  {'a': 10, 'c': 13},  {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},  {'a': 30, 'b': 31, 'c': {'d': 300}}]

@alancalvitti I did not implement slices, but you can use range(0,2) instead of 0:2. For the lists, I assumed you had only dicts. If the lists are only at top level, it's easy to fix, but if you have values of dicts that are lists or lists inside lists, it becomes more complex and you will have to check the type of the elements.

– jferard
10 hours ago

add a comment |

This is not very simple and less efficient, but it should work:

def map_at(f,kp,d): return map_at0(f,kp,d,0)

def slice_contains(s,i):  # no negative-index support

  a=s.start or 0

  return i>=a and (s.end is None or i<s.end) and

    not (i-a)%(s.step or 1)

def map_at0(f,kp,d,i):

  if i==len(kp): return f(d)

  if not isinstance(d,dict): return d  # no such path here

  ret={}

  p=kp[i]

  if isinstance(p,str) and p!='*': p=p,

  for j,(k,v) in enumerate(sorted(d.items())):

    if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):

      v=map_at0(f,kp,v,i+1)

    ret[k]=v

  return ret

edited 2 hours ago

answered Jan 4 at 1:48

Davis Herring

8,5351736

I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago

Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago

add a comment |

This is not very simple and less efficient, but it should work:

def map_at(f,kp,d): return map_at0(f,kp,d,0)

def slice_contains(s,i):  # no negative-index support

  a=s.start or 0

  return i>=a and (s.end is None or i<s.end) and

    not (i-a)%(s.step or 1)

def map_at0(f,kp,d,i):

  if i==len(kp): return f(d)

  if not isinstance(d,dict): return d  # no such path here

  ret={}

  p=kp[i]

  if isinstance(p,str) and p!='*': p=p,

  for j,(k,v) in enumerate(sorted(d.items())):

    if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):

      v=map_at0(f,kp,v,i+1)

    ret[k]=v

  return ret

edited 2 hours ago

answered Jan 4 at 1:48

Davis Herring

8,5351736

I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago

Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago

add a comment |

This is not very simple and less efficient, but it should work:

def map_at(f,kp,d): return map_at0(f,kp,d,0)

def slice_contains(s,i):  # no negative-index support

  a=s.start or 0

  return i>=a and (s.end is None or i<s.end) and

    not (i-a)%(s.step or 1)

def map_at0(f,kp,d,i):

  if i==len(kp): return f(d)

  if not isinstance(d,dict): return d  # no such path here

  ret={}

  p=kp[i]

  if isinstance(p,str) and p!='*': p=p,

  for j,(k,v) in enumerate(sorted(d.items())):

    if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):

      v=map_at0(f,kp,v,i+1)

    ret[k]=v

  return ret

edited 2 hours ago

answered Jan 4 at 1:48

Davis Herring

8,5351736

This is not very simple and less efficient, but it should work:

def map_at(f,kp,d): return map_at0(f,kp,d,0)

def slice_contains(s,i):  # no negative-index support

  a=s.start or 0

  return i>=a and (s.end is None or i<s.end) and

    not (i-a)%(s.step or 1)

def map_at0(f,kp,d,i):

  if i==len(kp): return f(d)

  if not isinstance(d,dict): return d  # no such path here

  ret={}

  p=kp[i]

  if isinstance(p,str) and p!='*': p=p,

  for j,(k,v) in enumerate(sorted(d.items())):

    if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):

      v=map_at0(f,kp,v,i+1)

    ret[k]=v

  return ret

edited 2 hours ago

answered Jan 4 at 1:48

Davis Herring

8,5351736

edited 2 hours ago

answered Jan 4 at 1:48

Davis Herring

8,5351736

answered Jan 4 at 1:48

Davis Herring

8,5351736

answered Jan 4 at 1:48

Davis Herring

8,5351736

I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago

Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago

add a comment |

I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago

Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago

I tried map_at(type,['*',['b','c'],'d'],data) using data as per original Q and got AttributeError: 'int' object has no attribute 'items'

– alancalvitti
10 hours ago

Sorry--I added the obvious missing line to handle the ragged data. Your test case passes now.

– Davis Herring
2 hours ago

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtjtk