Python read xml with related child elements

I have an XML file with this structure:

<?DOMParser ?>
<logbook:LogBook xmlns:logbook="http://www/logbook/1.0"  version="1.2">
<product>
    <serialNumber value="764000606"/>
</product>
<visits>
<visit>
    <general>
        <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
    </general>
    <parts>
        <part number="03081" name="WSSA" index="0016"/>
    </parts>
</visit>
<visit>
    <general>
        <startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2013-03-11T13:51:31.480Z</endDateTime>
    </general>
    <parts>
        <part number="02081" name="PSSF" index="0017"/>
    </parts>
</visit>
</visits>
</logbook:LogBook>

I want two outputs from this XML:

1- Visits, including the serial number. For this I wrote:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse(filename)
root = tree.getroot()

visits = pd.DataFrame()
for general in root.iter('general'):
    for child in root.iter('serialNumber'):
        visits = visits.append({'startDateTime': general.find('startDateTime').text,
                                'endDateTime': general.find('endDateTime').text,
                                'serialNumber': child.attrib['value']}, ignore_index=True)

The output of this code is the following DataFrame:

 serialNumber | startDateTime          | endDateTime            |
 -------------|------------------------|------------------------|
  764000606   |2014-01-10T12:22:39.166Z|2014-03-11T13:51:31.480Z|
  764000606   |2013-01-10T12:22:39.166Z|2013-03-11T13:51:31.480Z|

2- Parts

For parts, I want the following output, so that visits are distinguished from each other by startDateTime and each part is shown with the visit it belongs to:

 serialNumber | startDateTime|number|name|index|
 -------------|--------------|------|----|-----|

For parts I wrote:

parts = pd.DataFrame()
for part in root.iter('part'):
    for child in root.iter('serialNumber'):
        parts = parts.append({'index': part.attrib['index'],
                              'znumber': part.attrib['number'],
                              'name': part.attrib['name'],
                              'serialNumber': child.attrib['value'],
                              'startDateTime': general.find('startDateTime').text}, ignore_index=True)

This is what I get from this code:

 index |name|serialNumber| startDateTime          |znumber|
 ------|----|------------|------------------------|-------|
 0016  |WSSA|  764000606 |2013-01-10T12:22:39.166Z| 03081 |
 0017  |PSSF|  764000606 |2013-01-10T12:22:39.166Z| 02081 |

While I want this (look at startDateTime):

 index |name|serialNumber| startDateTime          |znumber|
 ------|----|------------|------------------------|-------|
 0016  |WSSA|  764000606 |2014-01-10T12:22:39.166Z| 03081 |
 0017  |PSSF|  764000606 |2013-01-10T12:22:39.166Z| 02081 |

Any idea?
I am using XML ElementTree.

edited Jul 12 '17 at 11:18

asked Jul 12 '17 at 6:15

Safariba

84119

Shouldn't the </product> closing tag be at the end of the file? Your XML file should contain only one root node.

– CristiFati
Jul 12 '17 at 6:35

Is visits a pandas dataframe?

– mzjn
Jul 12 '17 at 8:54

@mzjn yes visit=pandas.DataFrame()

– Safariba
Jul 12 '17 at 8:58

1

You left that out from the code snippet. Please show us complete code that we can copy and execute, and tag the question "pandas".

– mzjn
Jul 12 '17 at 9:01

python xml pandas xml-parsing elementtree

2 Answers

Here's an example that gets the data from xml.

code.py:

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp


file_name = "a.xml"


def get_product_sn(product_node):
    # Return the "value" attribute of the first serialNumber child, if any.
    for product_node_child in list(product_node):
        if product_node_child.tag == "serialNumber":
            return product_node_child.attrib.get("value", None)
    return None


def get_parts_data(parts_node):
    # Build one dictionary per child of the parts node.
    ret = list()
    for parts_node_child in list(parts_node):
        attrs = parts_node_child.attrib
        ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
    return ret


def get_visit_node_data(visit_node):
    # Collect the start/end timestamps and the parts list of a single visit.
    ret = dict()
    for visit_node_child in list(visit_node):
        if visit_node_child.tag == "general":
            for general_node_child in list(visit_node_child):
                if general_node_child.tag == "startDateTime":
                    ret["startDateTime"] = general_node_child.text
                elif general_node_child.tag == "endDateTime":
                    ret["endDateTime"] = general_node_child.text
        elif visit_node_child.tag == "parts":
            ret["parts"] = get_parts_data(visit_node_child)
    return ret


def get_node_data(node):
    # Expects a node with (up to) 2 children: product and visits.
    ret = {"visits": list()}
    for node_child in list(node):
        if node_child.tag == "product":
            ret["serialNumber"] = get_product_sn(node_child)
        elif node_child.tag == "visits":
            for visits_node_child in list(node_child):
                ret["visits"].append(get_visit_node_data(visits_node_child))
    return ret


def main():
    tree = ET.parse(file_name)
    root_node = tree.getroot()
    data = get_node_data(root_node)
    pp(data)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:

It treats the XML as a tree, so the code mirrors the XML structure (if the XML structure changes, the code should be adapted as well)

It's designed to be general: get_node_data can be called on any node that has 2 children, product and visits. In our case that's the root node itself, but in the real world there could be a sequence of such nodes, each with the 2 children listed above

It's designed to be error-friendly: if the XML is incomplete, it gets as much data as it can. I chose this (greedy) approach over one that simply throws an exception when it encounters an error

As I haven't worked with pandas, instead of populating a DataFrame I simply return a Python dictionary (JSON-like); I think converting it to a DataFrame shouldn't be hard

I've run it with Python 2.7 and Python 3.5

The output (a dictionary containing 2 keys) - indented for readability:

serialNumber - the serial number (obviously)

visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node

Output:

(py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045049761>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

{'serialNumber': '764000606',
 'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
             'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
             'startDateTime': '2014-01-10T12:22:39.166Z'},
            {'endDateTime': '2013-03-11T13:51:31.480Z',
             'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
             'startDateTime': '2013-01-10T12:22:39.166Z'}]}

@EDIT0: added multiple part node handling as requested in one of the comments. That functionality has been moved to get_parts_data. Now, each entry in the visits list will have a parts key whose value will be a list consisting of dictionaries extracted from each part node (not the case for the provided xml).
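
As a rough sketch of the DataFrame conversion mentioned in the notes (this is not part of the script above and was not run against the original file): the nested dictionary could be flattened into one row per visit and one row per part. The helper names visit_rows and part_rows are made up for illustration, and data is simply the printed dictionary pasted in as a literal. Assuming pandas is installed, each list of rows can then be passed to pandas.DataFrame(...).

```python
# Hypothetical helpers: flatten the dictionary returned by get_node_data
# into flat rows. `data` is the output shown above, pasted in as a literal.
data = {
    "serialNumber": "764000606",
    "visits": [
        {"startDateTime": "2014-01-10T12:22:39.166Z",
         "endDateTime": "2014-03-11T13:51:31.480Z",
         "parts": [{"index": "0016", "name": "WSSA", "number": "03081"}]},
        {"startDateTime": "2013-01-10T12:22:39.166Z",
         "endDateTime": "2013-03-11T13:51:31.480Z",
         "parts": [{"index": "0017", "name": "PSSF", "number": "02081"}]},
    ],
}


def visit_rows(data):
    """One row per visit, each tagged with the product serial number."""
    return [
        {"serialNumber": data.get("serialNumber"),
         "startDateTime": visit.get("startDateTime"),
         "endDateTime": visit.get("endDateTime")}
        for visit in data.get("visits", [])
    ]


def part_rows(data):
    """One row per part, tagged with the startDateTime of its own visit."""
    rows = []
    for visit in data.get("visits", []):
        # A visit without a "parts" key simply contributes no rows.
        for part in visit.get("parts", []):
            row = {"serialNumber": data.get("serialNumber"),
                   "startDateTime": visit.get("startDateTime")}
            row.update(part)  # adds number, name and index
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in visit_rows(data):
        print(row)
    for row in part_rows(data):
        print(row)
    # With pandas available (an assumption, it is not imported here):
    #   import pandas as pd
    #   visits_df = pd.DataFrame(visit_rows(data))
    #   parts_df = pd.DataFrame(part_rows(data))
```

Because the start time is read from the visit that owns each part, a visit with several part nodes yields several rows that share the same startDateTime.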

edited Dec 30 '18 at 20:55

answered Jul 12 '17 at 13:40

CristiFati

13k72436

1

In this code, when there is more than one part per visit, only the last part is returned. It does not return all the parts for each visit.

– Safariba
Jul 28 '17 at 7:11

2

That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes ? (changes are trivial)

– CristiFati
Jul 28 '17 at 9:38

1

yes I want to handle multiple parts, I am less experienced in handling dictionaries, can you help me with that? Thanks.

– Safariba
Jul 28 '17 at 9:41

Thanks for updating the answer

– Safariba
Jul 31 '17 at 7:04

Try the following:

import xml.dom.minidom as minidom

doc = minidom.parse('filename')
memoryElem = doc.getElementsByTagName('part')[0]

print(memoryElem.getAttribute('number'))
print(memoryElem.getAttribute('name'))
print(memoryElem.getAttribute('index'))

Hope it helps.
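
The snippet above reads only the first part element. Below is a minimal sketch of how the same minidom approach might be extended so that every part is paired with the startDateTime of the visit that contains it. The function name parts_with_visit_start and the inline XML string (a trimmed copy of the question's sample) are assumptions for illustration, not something taken from the original file.

```python
import xml.dom.minidom as minidom

# Trimmed copy of the sample XML from the question (endDateTime omitted).
XML = """<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
<product><serialNumber value="764000606"/></product>
<visits>
<visit>
<general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime></general>
<parts><part number="03081" name="WSSA" index="0016"/></parts>
</visit>
<visit>
<general><startDateTime>2013-01-10T12:22:39.166Z</startDateTime></general>
<parts><part number="02081" name="PSSF" index="0017"/></parts>
</visit>
</visits>
</logbook:LogBook>"""


def parts_with_visit_start(doc):
    """Return one dict per part, carrying its own visit's startDateTime."""
    serial = doc.getElementsByTagName("serialNumber")[0].getAttribute("value")
    rows = []
    # Walk visit by visit so each part is looked up inside its own visit.
    for visit in doc.getElementsByTagName("visit"):
        start = visit.getElementsByTagName("startDateTime")[0].firstChild.data
        for part in visit.getElementsByTagName("part"):
            rows.append({
                "serialNumber": serial,
                "startDateTime": start,
                "number": part.getAttribute("number"),
                "name": part.getAttribute("name"),
                "index": part.getAttribute("index"),
            })
    return rows


if __name__ == "__main__":
    for row in parts_with_visit_start(minidom.parseString(XML)):
        print(row)
```

The key point is that getElementsByTagName is called on each visit element rather than on the whole document, so the parts found belong to that visit only.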

answered Jul 12 '17 at 9:40

shang

2915

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f45049761%2fpython-read-xml-with-related-child-elements%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Here's an example that gets the data from xml.

code.py:

#!/usr/bin/env python3



import sys

import xml.etree.ElementTree as ET

from pprint import pprint as pp





file_name = "a.xml"





def get_product_sn(product_node):

    for product_node_child in list(product_node):

        if product_node_child.tag == "serialNumber":

            return product_node_child.attrib.get("value", None)

    return None





def get_parts_data(parts_node):

    ret = list()

    for parts_node_child in list(parts_node):

        attrs = parts_node_child.attrib

        ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})

    return ret





def get_visit_node_data(visit_node):

    ret = dict()

    for visit_node_child in list(visit_node):

        if visit_node_child.tag == "general":

            for general_node_child in list(visit_node_child):

                if general_node_child.tag == "startDateTime":

                    ret["startDateTime"] = general_node_child.text

                elif general_node_child.tag == "endDateTime":

                    ret["endDateTime"] = general_node_child.text

        elif visit_node_child.tag == "parts":

            ret["parts"] = get_parts_data(visit_node_child)

    return ret





def get_node_data(node):

    ret = {"visits": list()}

    for node_child in list(node):

        if node_child.tag == "product":

            ret["serialNumber"] = get_product_sn(node_child)

        elif node_child.tag == "visits":

            for visits_node_child in list(node_child):

                ret["visits"].append(get_visit_node_data(visits_node_child))

    return ret





def main():

    tree = ET.parse(file_name)

    root_node = tree.getroot()

    data = get_node_data(root_node)

    pp(data)





if __name__ == "__main__":

    print("Python {:s} on {:s}n".format(sys.version, sys.platform))

    main()

Notes:

It treats the xml in a tree-like manner, so it maps (if you will) on the xml (if the xml structure changes, the code should be adapted as well)

It's designed to be general: get_node_data could be called on a node that has 2 children: product and visits. In our case it's the root node itself, but in the real world there could be a sequence of such nodes each with the 2 children that I listed above

It's designed to be error-friendly so if the xml is incomplete, it will get as much data as it can; I chose this (greedy) approach over the one that when it encounters an error it simply throws an exception

As I didn't work with pandas, instead of populating the object I simply return a Python dictionary (json); I think converting it to a DataFrame shouldn't be hard

I've run it with Python 2.7 and Python 3.5

The output (a dictionary containing 2 keys) - indented for readability:

serialNumber - the serial number (obviously)

visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node

Output:

(py_064_03.05.04_test0) e:WorkDevStackOverflowq045049761>"e:WorkDevVEnvspy_064_03.05.04_test0Scriptspython.exe" code.py

Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32



{'serialNumber': '764000606',

 'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',

             'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],

             'startDateTime': '2014-01-10T12:22:39.166Z'},

            {'endDateTime': '2013-03-11T13:51:31.480Z',

             'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],

             'startDateTime': '2013-01-10T12:22:39.166Z'}]}

edited Dec 30 '18 at 20:55

answered Jul 12 '17 at 13:40

CristiFati

13k72436

1

In this code when there are more than one part for each visit, only last part is returned. It does not return all the parts for each visit.

– Safariba
Jul 28 '17 at 7:11

2

That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes ? (changes are trivial)

– CristiFati
Jul 28 '17 at 9:38

1

yes I want to handle multiple parts, I am less experienced in handling dictionaries, can you help me with that? Thanks.

– Safariba
Jul 28 '17 at 9:41

Thanks for updating the answer

– Safariba
Jul 31 '17 at 7:04

add a comment |

Here's an example that gets the data from xml.

code.py:

#!/usr/bin/env python3



import sys

import xml.etree.ElementTree as ET

from pprint import pprint as pp





file_name = "a.xml"





def get_product_sn(product_node):

    for product_node_child in list(product_node):

        if product_node_child.tag == "serialNumber":

            return product_node_child.attrib.get("value", None)

    return None





def get_parts_data(parts_node):

    ret = list()

    for parts_node_child in list(parts_node):

        attrs = parts_node_child.attrib

        ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})

    return ret





def get_visit_node_data(visit_node):

    ret = dict()

    for visit_node_child in list(visit_node):

        if visit_node_child.tag == "general":

            for general_node_child in list(visit_node_child):

                if general_node_child.tag == "startDateTime":

                    ret["startDateTime"] = general_node_child.text

                elif general_node_child.tag == "endDateTime":

                    ret["endDateTime"] = general_node_child.text

        elif visit_node_child.tag == "parts":

            ret["parts"] = get_parts_data(visit_node_child)

    return ret





def get_node_data(node):

    ret = {"visits": list()}

    for node_child in list(node):

        if node_child.tag == "product":

            ret["serialNumber"] = get_product_sn(node_child)

        elif node_child.tag == "visits":

            for visits_node_child in list(node_child):

                ret["visits"].append(get_visit_node_data(visits_node_child))

    return ret





def main():

    tree = ET.parse(file_name)

    root_node = tree.getroot()

    data = get_node_data(root_node)

    pp(data)





if __name__ == "__main__":

    print("Python {:s} on {:s}n".format(sys.version, sys.platform))

    main()

Notes:

It treats the xml in a tree-like manner, so it maps (if you will) on the xml (if the xml structure changes, the code should be adapted as well)

It's designed to be general: get_node_data could be called on a node that has 2 children: product and visits. In our case it's the root node itself, but in the real world there could be a sequence of such nodes each with the 2 children that I listed above

It's designed to be error-friendly so if the xml is incomplete, it will get as much data as it can; I chose this (greedy) approach over the one that when it encounters an error it simply throws an exception

As I didn't work with pandas, instead of populating the object I simply return a Python dictionary (json); I think converting it to a DataFrame shouldn't be hard

I've run it with Python 2.7 and Python 3.5

The output (a dictionary containing 2 keys) - indented for readability:

serialNumber - the serial number (obviously)

visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node

Output:

(py_064_03.05.04_test0) e:WorkDevStackOverflowq045049761>"e:WorkDevVEnvspy_064_03.05.04_test0Scriptspython.exe" code.py

Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32



{'serialNumber': '764000606',

 'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',

             'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],

             'startDateTime': '2014-01-10T12:22:39.166Z'},

            {'endDateTime': '2013-03-11T13:51:31.480Z',

             'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],

             'startDateTime': '2013-01-10T12:22:39.166Z'}]}

edited Dec 30 '18 at 20:55

answered Jul 12 '17 at 13:40

CristiFati

13k72436

1

In this code when there are more than one part for each visit, only last part is returned. It does not return all the parts for each visit.

– Safariba
Jul 28 '17 at 7:11

2

That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes ? (changes are trivial)

– CristiFati
Jul 28 '17 at 9:38

1

yes I want to handle multiple parts, I am less experienced in handling dictionaries, can you help me with that? Thanks.

– Safariba
Jul 28 '17 at 9:41

Thanks for updating the answer

– Safariba
Jul 31 '17 at 7:04

add a comment |

Here's an example that gets the data from xml.

code.py:

#!/usr/bin/env python3



import sys

import xml.etree.ElementTree as ET

from pprint import pprint as pp





file_name = "a.xml"





def get_product_sn(product_node):

    for product_node_child in list(product_node):

        if product_node_child.tag == "serialNumber":

            return product_node_child.attrib.get("value", None)

    return None





def get_parts_data(parts_node):

    ret = list()

    for parts_node_child in list(parts_node):

        attrs = parts_node_child.attrib

        ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})

    return ret





def get_visit_node_data(visit_node):

    ret = dict()

    for visit_node_child in list(visit_node):

        if visit_node_child.tag == "general":

            for general_node_child in list(visit_node_child):

                if general_node_child.tag == "startDateTime":

                    ret["startDateTime"] = general_node_child.text

                elif general_node_child.tag == "endDateTime":

                    ret["endDateTime"] = general_node_child.text

        elif visit_node_child.tag == "parts":

            ret["parts"] = get_parts_data(visit_node_child)

    return ret





def get_node_data(node):

    ret = {"visits": list()}

    for node_child in list(node):

        if node_child.tag == "product":

            ret["serialNumber"] = get_product_sn(node_child)

        elif node_child.tag == "visits":

            for visits_node_child in list(node_child):

                ret["visits"].append(get_visit_node_data(visits_node_child))

    return ret





def main():

    tree = ET.parse(file_name)

    root_node = tree.getroot()

    data = get_node_data(root_node)

    pp(data)





if __name__ == "__main__":

    print("Python {:s} on {:s}n".format(sys.version, sys.platform))

    main()

Notes:

It treats the XML as a tree, so the code mirrors the XML structure (if the XML structure changes, the code has to be adapted as well)

It's designed to be general: get_node_data can be called on any node that has 2 children, product and visits. In our case that's the root node itself, but in the real world there could be a sequence of such nodes, each with the 2 children listed above

It's designed to be error-tolerant: if the XML is incomplete, it gets as much data as it can. I chose this (greedy) approach over one that simply throws an exception at the first error

As I haven't worked with pandas, instead of populating a DataFrame I simply return a Python dictionary (JSON-like); I think converting it to a DataFrame shouldn't be hard

I've run it with Python 2.7 and Python 3.5

The output (a dictionary containing 2 keys) - indented for readability:

serialNumber - the serial number (obviously)

visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node

Output:

(py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045049761>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py

Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32



{'serialNumber': '764000606',

 'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',

             'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],

             'startDateTime': '2014-01-10T12:22:39.166Z'},

            {'endDateTime': '2013-03-11T13:51:31.480Z',

             'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],

             'startDateTime': '2013-01-10T12:22:39.166Z'}]}
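
The conversion to the two tables from the question is left open above, so here is a minimal sketch of one way it might be done. It assumes the dictionary has exactly the shape printed in the output; the helper name flatten is hypothetical and not part of the original code. It only builds plain lists of rows, so it runs without pandas.

```python
def flatten(data):
    """Split the nested dictionary into visit rows and part rows.

    Hypothetical helper: assumes the {"serialNumber": ..., "visits": [...]}
    shape shown in the output above.
    """
    serial = data.get("serialNumber")
    visit_rows, part_rows = [], []
    for visit in data.get("visits", []):
        start = visit.get("startDateTime")
        visit_rows.append({
            "serialNumber": serial,
            "startDateTime": start,
            "endDateTime": visit.get("endDateTime"),
        })
        # One row per part, tagged with the startDateTime of its visit.
        for part in visit.get("parts", []):
            row = {"serialNumber": serial, "startDateTime": start}
            row.update(part)
            part_rows.append(row)
    return visit_rows, part_rows


# Sample data copied from the output above.
data = {
    "serialNumber": "764000606",
    "visits": [
        {"startDateTime": "2014-01-10T12:22:39.166Z",
         "endDateTime": "2014-03-11T13:51:31.480Z",
         "parts": [{"number": "03081", "name": "WSSA", "index": "0016"}]},
        {"startDateTime": "2013-01-10T12:22:39.166Z",
         "endDateTime": "2013-03-11T13:51:31.480Z",
         "parts": [{"number": "02081", "name": "PSSF", "index": "0017"}]},
    ],
}

visit_rows, part_rows = flatten(data)
```

With pandas installed, pd.DataFrame(visit_rows) and pd.DataFrame(part_rows) should give the two frames, since the DataFrame constructor accepts a list of dictionaries.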

edited Dec 30 '18 at 20:55

answered Jul 12 '17 at 13:40

CristiFati

13k72436

1

In this code, when there is more than one part per visit, only the last part is returned. It does not return all the parts for each visit.

– Safariba
Jul 28 '17 at 7:11

2

That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes? (The changes are trivial.)

– CristiFati
Jul 28 '17 at 9:38

1

Yes, I want to handle multiple parts. I am less experienced in handling dictionaries; can you help me with that? Thanks.

– Safariba
Jul 28 '17 at 9:41

Thanks for updating the answer

– Safariba
Jul 31 '17 at 7:04
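
For readers following the comment thread, the multiple-parts case can be sketched with ElementTree's findall, which returns every matching child rather than a single one. This is an illustration and not the answer's own code: the function name parts_per_visit is hypothetical, and the second part element in the sample XML is made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: the second <part> is invented to show the multi-part case.
XML = """<visits>
  <visit>
    <general>
      <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
    </general>
    <parts>
      <part number="03081" name="WSSA" index="0016"/>
      <part number="03082" name="WSSB" index="0018"/>
    </parts>
  </visit>
</visits>"""


def parts_per_visit(visits_node):
    """Return one dictionary per visit, holding all of its parts."""
    result = []
    for visit in visits_node.findall("visit"):
        start = visit.findtext("general/startDateTime")
        # findall collects every <part>, so none of them is overwritten.
        parts = [dict(part.attrib) for part in visit.findall("parts/part")]
        result.append({"startDateTime": start, "parts": parts})
    return result


rows = parts_per_visit(ET.fromstring(XML))
```

Collecting the parts into a list per visit is the same idea as get_parts_data in the answer; the path expressions just shorten the traversal.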

Try the following:

import xml.dom.minidom as minidom

doc = minidom.parse('filename')  # replace 'filename' with the path to the XML file

# First <part> element in the document

memoryElem = doc.getElementsByTagName('part')[0]



print(memoryElem.getAttribute('number'))

print(memoryElem.getAttribute('name'))

print(memoryElem.getAttribute('index'))

Hope it helps.
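
The snippet above reads only the first part element. A minimal sketch of extending the same minidom approach to every part is below; the helper name all_parts is hypothetical, and the XML is a trimmed copy of the question's sample, parsed from a string so the example is self-contained.

```python
import xml.dom.minidom as minidom

# Trimmed copy of the question's sample XML (visits and parts only).
XML = """<visits>
  <visit>
    <parts><part number="03081" name="WSSA" index="0016"/></parts>
  </visit>
  <visit>
    <parts><part number="02081" name="PSSF" index="0017"/></parts>
  </visit>
</visits>"""


def all_parts(doc):
    """Return the attributes of every <part> element in the document."""
    rows = []
    for part in doc.getElementsByTagName("part"):
        rows.append({name: part.getAttribute(name)
                     for name in ("number", "name", "index")})
    return rows


doc = minidom.parseString(XML)
rows = all_parts(doc)
for row in rows:
    print(row["number"], row["name"], row["index"])
```

Note that getElementsByTagName searches the whole document, so this version does not record which visit each part belongs to.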

answered Jul 12 '17 at 9:40

shang

2915
