Python read xml with related child elements

I have an XML file with this structure:

<?DOMParser ?>
<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
  <product>
    <serialNumber value="764000606"/>
  </product>
  <visits>
    <visit>
      <general>
        <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="03081" name="WSSA" index="0016"/>
      </parts>
    </visit>
    <visit>
      <general>
        <startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2013-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts>
        <part number="02081" name="PSSF" index="0017"/>
      </parts>
    </visit>
  </visits>
</logbook:LogBook>


I want to produce two outputs from this XML:

1- Visits, including the serial number. For this I wrote:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse(filename)
root = tree.getroot()
visits = pd.DataFrame()
for general in root.iter('general'):
    for child in root.iter('serialNumber'):
        visits = visits.append({'startDateTime': general.find('startDateTime').text,
                                'endDateTime': general.find('endDateTime').text,
                                'serialNumber': child.attrib['value']}, ignore_index=True)


The output of this code is the following DataFrame:

serialNumber | startDateTime            | endDateTime              |
-------------|--------------------------|--------------------------|
764000606    | 2014-01-10T12:22:39.166Z | 2014-03-11T13:51:31.480Z |
764000606    | 2013-01-10T12:22:39.166Z | 2013-03-11T13:51:31.480Z |


2- Parts

For parts, I want the following output, where visits are distinguished from each other by startDateTime and each part is shown with the visit it belongs to:

serialNumber | startDateTime | number | name | index |
-------------|---------------|--------|------|-------|


For parts I wrote:

parts = pd.DataFrame()
for part in root.iter('part'):
    for child in root.iter('serialNumber'):
        parts = parts.append({'index': part.attrib['index'],
                              'znumber': part.attrib['number'],
                              'name': part.attrib['name'],
                              'serialNumber': child.attrib['value'],
                              'startDateTime': general.find('startDateTime').text}, ignore_index=True)


This is what I get from this code:

index | name | serialNumber | startDateTime            | znumber |
------|------|--------------|--------------------------|---------|
0016  | WSSA | 764000606    | 2013-01-10T12:22:39.166Z | 03081   |
0017  | PSSF | 764000606    | 2013-01-10T12:22:39.166Z | 02081   |


What I want is this (look at startDateTime):

index | name | serialNumber | startDateTime            | znumber |
------|------|--------------|--------------------------|---------|
0016  | WSSA | 764000606    | 2014-01-10T12:22:39.166Z | 03081   |
0017  | PSSF | 764000606    | 2013-01-10T12:22:39.166Z | 02081   |

Any ideas? I am using xml.etree.ElementTree.










  • Shouldn't the </product> closing tag be at the end of the file? Your XML file should contain only one root node.
    – CristiFati
    Jul 12 '17 at 6:35

  • Is visits a pandas DataFrame?
    – mzjn
    Jul 12 '17 at 8:54

  • @mzjn Yes, visits = pandas.DataFrame()
    – Safariba
    Jul 12 '17 at 8:58

  • You left that out of the code snippet. Please show us complete code that we can copy and execute, and tag the question "pandas".
    – mzjn
    Jul 12 '17 at 9:01
















python xml pandas xml-parsing elementtree






edited Jul 12 '17 at 11:18
asked Jul 12 '17 at 6:15
Safariba













2 Answers
Here's an example that extracts the data from the XML.

code.py:

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp


file_name = "a.xml"


def get_product_sn(product_node):
    # Return the "value" attribute of the first serialNumber child (or None).
    for product_node_child in list(product_node):
        if product_node_child.tag == "serialNumber":
            return product_node_child.attrib.get("value", None)
    return None


def get_parts_data(parts_node):
    # Build one dictionary per child of the parts node.
    ret = list()
    for parts_node_child in list(parts_node):
        attrs = parts_node_child.attrib
        ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
    return ret


def get_visit_node_data(visit_node):
    # Collect the start/end timestamps and the parts list of a single visit.
    ret = dict()
    for visit_node_child in list(visit_node):
        if visit_node_child.tag == "general":
            for general_node_child in list(visit_node_child):
                if general_node_child.tag == "startDateTime":
                    ret["startDateTime"] = general_node_child.text
                elif general_node_child.tag == "endDateTime":
                    ret["endDateTime"] = general_node_child.text
        elif visit_node_child.tag == "parts":
            ret["parts"] = get_parts_data(visit_node_child)
    return ret


def get_node_data(node):
    # Expects a node whose children include product and visits.
    ret = {"visits": list()}
    for node_child in list(node):
        if node_child.tag == "product":
            ret["serialNumber"] = get_product_sn(node_child)
        elif node_child.tag == "visits":
            for visits_node_child in list(node_child):
                ret["visits"].append(get_visit_node_data(visits_node_child))
    return ret


def main():
    tree = ET.parse(file_name)
    root_node = tree.getroot()
    data = get_node_data(root_node)
    pp(data)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()


Notes:

  • It treats the XML as a tree, so the code mirrors the XML structure (if the XML structure changes, the code has to be adapted as well)

  • It's designed to be general: get_node_data can be called on any node that has the 2 children product and visits. Here that is the root node itself, but in the real world there could be a sequence of such nodes, each with those 2 children

  • It's designed to be error-friendly: if the XML is incomplete, it collects as much data as it can. I chose this (greedy) approach over raising an exception at the first error

  • As I haven't worked with pandas, instead of populating a DataFrame I simply return a Python dictionary (JSON-like); I think converting it to a DataFrame shouldn't be hard

  • I've run it with Python 2.7 and Python 3.5
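
As a follow-up to the note about pandas, here is a minimal sketch of one way to flatten such a dictionary into row lists. It assumes the dictionary has the shape get_node_data returns (a serialNumber key plus a visits list whose entries carry startDateTime, endDateTime and parts). The helper names to_visit_rows and to_part_rows are made up for this illustration and are not part of the answer's code.

```python
# Sample dictionary in the shape produced by get_node_data (values taken from the question's XML).
data = {
    "serialNumber": "764000606",
    "visits": [
        {"startDateTime": "2014-01-10T12:22:39.166Z",
         "endDateTime": "2014-03-11T13:51:31.480Z",
         "parts": [{"number": "03081", "name": "WSSA", "index": "0016"}]},
        {"startDateTime": "2013-01-10T12:22:39.166Z",
         "endDateTime": "2013-03-11T13:51:31.480Z",
         "parts": [{"number": "02081", "name": "PSSF", "index": "0017"}]},
    ],
}


def to_visit_rows(data):
    """Hypothetical helper: one row per visit, each carrying the serial number."""
    return [
        {"serialNumber": data.get("serialNumber"),
         "startDateTime": visit.get("startDateTime"),
         "endDateTime": visit.get("endDateTime")}
        for visit in data.get("visits", [])
    ]


def to_part_rows(data):
    """Hypothetical helper: one row per part, tagged with its own visit's startDateTime."""
    rows = []
    for visit in data.get("visits", []):
        for part in visit.get("parts", []):
            row = {"serialNumber": data.get("serialNumber"),
                   "startDateTime": visit.get("startDateTime")}
            row.update(part)  # adds number, name, index
            rows.append(row)
    return rows


visit_rows = to_visit_rows(data)
part_rows = to_part_rows(data)
for row in part_rows:
    print(row)
```

Since pandas.DataFrame accepts a list of dictionaries, passing visit_rows or part_rows to it should give the two tables the question asks for.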


The output is a dictionary containing 2 keys (indented for readability):

  • serialNumber - the serial number

  • visits - a list of dictionaries, each containing the data from one visit node (since the result is a dictionary, this data had to be placed under a key)

Output:

(py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045049761>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

{'serialNumber': '764000606',
 'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
             'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
             'startDateTime': '2014-01-10T12:22:39.166Z'},
            {'endDateTime': '2013-03-11T13:51:31.480Z',
             'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
             'startDateTime': '2013-01-10T12:22:39.166Z'}]}






@EDIT0: Added handling of multiple part nodes, as requested in one of the comments. That functionality has been moved to get_parts_data. Each entry in the visits list now has a parts key whose value is a list of dictionaries, one per part node (the provided XML has only one part per visit).
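
To illustrate the multiple-part case, here is a compact sketch that is not the answer's code: it uses ElementTree path lookups (find, findall, findtext) instead of walking every child. The XML is trimmed from the question, and the second part in the first visit (number 00000, name DEMO, index 0099) is invented purely for the demonstration.

```python
import xml.etree.ElementTree as ET

# Sample trimmed from the question; the DEMO <part> is an invented extra entry
# used only to show a visit with more than one part.
XML = """<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
  <product><serialNumber value="764000606"/></product>
  <visits>
    <visit>
      <general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime></general>
      <parts>
        <part number="03081" name="WSSA" index="0016"/>
        <part number="00000" name="DEMO" index="0099"/>
      </parts>
    </visit>
    <visit>
      <general><startDateTime>2013-01-10T12:22:39.166Z</startDateTime></general>
      <parts><part number="02081" name="PSSF" index="0017"/></parts>
    </visit>
  </visits>
</logbook:LogBook>"""

root = ET.fromstring(XML)
serial = root.find("product/serialNumber").get("value")

rows = []
# Iterate visit by visit so each part is paired with its own visit's start time.
for visit in root.findall("visits/visit"):
    start = visit.findtext("general/startDateTime")
    for part in visit.findall("parts/part"):
        rows.append({"serialNumber": serial,
                     "startDateTime": start,
                     "number": part.get("number"),
                     "name": part.get("name"),
                     "index": part.get("index")})

for row in rows:
    print(row)
```

Looping over visits first, and only then over that visit's parts, is what keeps the startDateTime tied to the right part; the question's code looked up startDateTime through a variable left over from an earlier loop.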






  • In this code, when there is more than one part per visit, only the last part is returned. It does not return all the parts for each visit.
    – Safariba
    Jul 28 '17 at 7:11

  • That's true. I thought that there could only be one part node per visit (as in the example XML). Do you want it to handle multiple part nodes? (The changes are trivial.)
    – CristiFati
    Jul 28 '17 at 9:38

  • Yes, I want to handle multiple parts. I am less experienced in handling dictionaries; can you help me with that? Thanks.
    – Safariba
    Jul 28 '17 at 9:41

  • Thanks for updating the answer.
    – Safariba
    Jul 31 '17 at 7:04



















Try the following:

import xml.dom.minidom as minidom

doc = minidom.parse('filename')  # replace 'filename' with the path to your XML file
# Take the first part element in the document
part_elem = doc.getElementsByTagName('part')[0]

print(part_elem.getAttribute('number'))
print(part_elem.getAttribute('name'))
print(part_elem.getAttribute('index'))

Hope it helps.
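
The snippet above reads only the first part element. A minimal sketch of how the same minidom approach could be extended to every visit, pairing each part with its own visit's startDateTime, might look like this. The inline XML string is a trimmed copy of the question's file, used here instead of a file path so the example runs on its own.

```python
import xml.dom.minidom as minidom

# Trimmed copy of the question's XML, inlined so the example is self-contained.
XML = """<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
<product><serialNumber value="764000606"/></product>
<visits>
<visit>
<general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime></general>
<parts><part number="03081" name="WSSA" index="0016"/></parts>
</visit>
<visit>
<general><startDateTime>2013-01-10T12:22:39.166Z</startDateTime></general>
<parts><part number="02081" name="PSSF" index="0017"/></parts>
</visit>
</visits>
</logbook:LogBook>"""

doc = minidom.parseString(XML)
serial = doc.getElementsByTagName("serialNumber")[0].getAttribute("value")

rows = []
# getElementsByTagName called on a visit element searches only inside that visit.
for visit in doc.getElementsByTagName("visit"):
    start = visit.getElementsByTagName("startDateTime")[0].firstChild.data
    for part in visit.getElementsByTagName("part"):
        rows.append((serial, start,
                     part.getAttribute("number"),
                     part.getAttribute("name"),
                     part.getAttribute("index")))

for row in rows:
    print(row)
```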






    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    Here's an example that gets the data from xml.



    code.py:



    #!/usr/bin/env python3

    import sys
    import xml.etree.ElementTree as ET
    from pprint import pprint as pp


    file_name = "a.xml"


    def get_product_sn(product_node):
    for product_node_child in list(product_node):
    if product_node_child.tag == "serialNumber":
    return product_node_child.attrib.get("value", None)
    return None


    def get_parts_data(parts_node):
    ret = list()
    for parts_node_child in list(parts_node):
    attrs = parts_node_child.attrib
    ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
    return ret


    def get_visit_node_data(visit_node):
    ret = dict()
    for visit_node_child in list(visit_node):
    if visit_node_child.tag == "general":
    for general_node_child in list(visit_node_child):
    if general_node_child.tag == "startDateTime":
    ret["startDateTime"] = general_node_child.text
    elif general_node_child.tag == "endDateTime":
    ret["endDateTime"] = general_node_child.text
    elif visit_node_child.tag == "parts":
    ret["parts"] = get_parts_data(visit_node_child)
    return ret


    def get_node_data(node):
    ret = {"visits": list()}
    for node_child in list(node):
    if node_child.tag == "product":
    ret["serialNumber"] = get_product_sn(node_child)
    elif node_child.tag == "visits":
    for visits_node_child in list(node_child):
    ret["visits"].append(get_visit_node_data(visits_node_child))
    return ret


    def main():
    tree = ET.parse(file_name)
    root_node = tree.getroot()
    data = get_node_data(root_node)
    pp(data)


    if __name__ == "__main__":
    print("Python {:s} on {:s}n".format(sys.version, sys.platform))
    main()


    Notes:




    • It treats the xml in a tree-like manner, so it maps (if you will) on the xml (if the xml structure changes, the code should be adapted as well)

    • It's designed to be general: get_node_data could be called on a node that has 2 children: product and visits. In our case it's the root node itself, but in the real world there could be a sequence of such nodes each with the 2 children that I listed above

    • It's designed to be error-friendly so if the xml is incomplete, it will get as much data as it can; I chose this (greedy) approach over the one that when it encounters an error it simply throws an exception

    • As I didn't work with pandas, instead of populating the object I simply return a Python dictionary (json); I think converting it to a DataFrame shouldn't be hard

    • I've run it with Python 2.7 and Python 3.5


    The output (a dictionary containing 2 keys) - indented for readability:





    • serialNumber - the serial number (obviously)


    • visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node


    Output:




    (py_064_03.05.04_test0) e:WorkDevStackOverflowq045049761>"e:WorkDevVEnvspy_064_03.05.04_test0Scriptspython.exe" code.py
    Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

    {'serialNumber': '764000606',
    'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
    'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
    'startDateTime': '2014-01-10T12:22:39.166Z'},
    {'endDateTime': '2013-03-11T13:51:31.480Z',
    'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
    'startDateTime': '2013-01-10T12:22:39.166Z'}]}






    @EDIT0: added multiple part node handling as requested in one of the comments. That functionality has been moved to get_parts_data. Now, each entry in the visits list will have a parts key whose value will be a list consisting of dictionaries extracted from each part node (not the case for the provided xml).






    share|improve this answer





















    • 1





      In this code when there are more than one part for each visit, only last part is returned. It does not return all the parts for each visit.

      – Safariba
      Jul 28 '17 at 7:11






    • 2





      That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes ? (changes are trivial)

      – CristiFati
      Jul 28 '17 at 9:38






    • 1





      yes I want to handle multiple parts, I am less experienced in handling dictionaries, can you help me with that? Thanks.

      – Safariba
      Jul 28 '17 at 9:41













    • Thanks for updating the answer

      – Safariba
      Jul 31 '17 at 7:04
















    2














    Here's an example that gets the data from xml.



    code.py:



    #!/usr/bin/env python3

    import sys
    import xml.etree.ElementTree as ET
    from pprint import pprint as pp


    file_name = "a.xml"


    def get_product_sn(product_node):
    for product_node_child in list(product_node):
    if product_node_child.tag == "serialNumber":
    return product_node_child.attrib.get("value", None)
    return None


    def get_parts_data(parts_node):
    ret = list()
    for parts_node_child in list(parts_node):
    attrs = parts_node_child.attrib
    ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
    return ret


    def get_visit_node_data(visit_node):
    ret = dict()
    for visit_node_child in list(visit_node):
    if visit_node_child.tag == "general":
    for general_node_child in list(visit_node_child):
    if general_node_child.tag == "startDateTime":
    ret["startDateTime"] = general_node_child.text
    elif general_node_child.tag == "endDateTime":
    ret["endDateTime"] = general_node_child.text
    elif visit_node_child.tag == "parts":
    ret["parts"] = get_parts_data(visit_node_child)
    return ret


    def get_node_data(node):
    ret = {"visits": list()}
    for node_child in list(node):
    if node_child.tag == "product":
    ret["serialNumber"] = get_product_sn(node_child)
    elif node_child.tag == "visits":
    for visits_node_child in list(node_child):
    ret["visits"].append(get_visit_node_data(visits_node_child))
    return ret


    def main():
    tree = ET.parse(file_name)
    root_node = tree.getroot()
    data = get_node_data(root_node)
    pp(data)


    if __name__ == "__main__":
    print("Python {:s} on {:s}n".format(sys.version, sys.platform))
    main()


    Notes:




    • It treats the xml in a tree-like manner, so it maps (if you will) on the xml (if the xml structure changes, the code should be adapted as well)

    • It's designed to be general: get_node_data could be called on a node that has 2 children: product and visits. In our case it's the root node itself, but in the real world there could be a sequence of such nodes each with the 2 children that I listed above

    • It's designed to be error-friendly so if the xml is incomplete, it will get as much data as it can; I chose this (greedy) approach over the one that when it encounters an error it simply throws an exception

    • As I didn't work with pandas, instead of populating the object I simply return a Python dictionary (json); I think converting it to a DataFrame shouldn't be hard

    • I've run it with Python 2.7 and Python 3.5


    The output (a dictionary containing 2 keys) - indented for readability:





    • serialNumber - the serial number (obviously)


    • visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node


    Output:




    (py_064_03.05.04_test0) e:WorkDevStackOverflowq045049761>"e:WorkDevVEnvspy_064_03.05.04_test0Scriptspython.exe" code.py
    Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

    {'serialNumber': '764000606',
    'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
    'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
    'startDateTime': '2014-01-10T12:22:39.166Z'},
    {'endDateTime': '2013-03-11T13:51:31.480Z',
    'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
    'startDateTime': '2013-01-10T12:22:39.166Z'}]}






    @EDIT0: added multiple part node handling as requested in one of the comments. That functionality has been moved to get_parts_data. Now, each entry in the visits list will have a parts key whose value will be a list consisting of dictionaries extracted from each part node (not the case for the provided xml).






    share|improve this answer





















    • 1





      In this code when there are more than one part for each visit, only last part is returned. It does not return all the parts for each visit.

      – Safariba
      Jul 28 '17 at 7:11






    • 2





      That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes ? (changes are trivial)

      – CristiFati
      Jul 28 '17 at 9:38






    • 1





      yes I want to handle multiple parts, I am less experienced in handling dictionaries, can you help me with that? Thanks.

      – Safariba
      Jul 28 '17 at 9:41













    • Thanks for updating the answer

      – Safariba
      Jul 31 '17 at 7:04














    2












    2








    2







    Here's an example that gets the data from the xml.

    code.py:

    #!/usr/bin/env python3

    import sys
    import xml.etree.ElementTree as ET
    from pprint import pprint as pp


    file_name = "a.xml"


    def get_product_sn(product_node):
        # The serial number is the "value" attribute of the serialNumber child
        for product_node_child in list(product_node):
            if product_node_child.tag == "serialNumber":
                return product_node_child.attrib.get("value", None)
        return None


    def get_parts_data(parts_node):
        # One dictionary per child of the parts node
        ret = list()
        for parts_node_child in list(parts_node):
            attrs = parts_node_child.attrib
            ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
        return ret


    def get_visit_node_data(visit_node):
        # Dates come from the general child, parts from the parts child
        ret = dict()
        for visit_node_child in list(visit_node):
            if visit_node_child.tag == "general":
                for general_node_child in list(visit_node_child):
                    if general_node_child.tag == "startDateTime":
                        ret["startDateTime"] = general_node_child.text
                    elif general_node_child.tag == "endDateTime":
                        ret["endDateTime"] = general_node_child.text
            elif visit_node_child.tag == "parts":
                ret["parts"] = get_parts_data(visit_node_child)
        return ret


    def get_node_data(node):
        # Expects a node whose children are product and visits
        ret = {"visits": list()}
        for node_child in list(node):
            if node_child.tag == "product":
                ret["serialNumber"] = get_product_sn(node_child)
            elif node_child.tag == "visits":
                for visits_node_child in list(node_child):
                    ret["visits"].append(get_visit_node_data(visits_node_child))
        return ret


    def main():
        tree = ET.parse(file_name)
        root_node = tree.getroot()
        data = get_node_data(root_node)
        pp(data)


    if __name__ == "__main__":
        print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
        main()


    Notes:

    • It walks the xml as a tree, so the code mirrors the xml structure (if the xml structure changes, the code has to be adapted as well)

    • It's designed to be general: get_node_data can be called on any node that has 2 children, product and visits. In our case that's the root node itself, but in the real world there could be a sequence of such nodes, each with those 2 children

    • It's designed to be error-friendly: if the xml is incomplete, it gets as much data as it can; I chose this (greedy) approach over throwing an exception at the first error

    • As I haven't worked with pandas, instead of populating a DataFrame I simply return a Python dictionary (json-like); I think converting it to a DataFrame shouldn't be hard

    • I've run it with Python 2.7 and Python 3.5
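To illustrate the greedy, error-friendly behaviour described in the notes, here is a minimal sketch. It is not the code from the answer: it uses ElementTree's find, findall and findtext instead of looping over children, and both the function name extract_tolerantly and the deliberately incomplete sample xml are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete input: there is no <product> node,
# the first visit has no endDateTime and the second visit has no <general>.
INCOMPLETE_XML = """
<LogBook>
  <visits>
    <visit>
      <general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime></general>
      <parts><part number="03081" name="WSSA" index="0016"/></parts>
    </visit>
    <visit>
      <parts><part number="02081" name="PSSF"/></parts>
    </visit>
  </visits>
</LogBook>
"""


def extract_tolerantly(root):
    """Collect whatever is present; anything missing comes back as None."""
    serial_node = root.find("product/serialNumber")
    data = {
        "serialNumber": serial_node.get("value") if serial_node is not None else None,
        "visits": [],
    }
    for visit in root.findall("visits/visit"):
        data["visits"].append({
            # findtext returns None when the path does not match
            "startDateTime": visit.findtext("general/startDateTime"),
            "endDateTime": visit.findtext("general/endDateTime"),
            "parts": [
                {key: part.get(key) for key in ("number", "name", "index")}
                for part in visit.findall("parts/part")
            ],
        })
    return data


result = extract_tolerantly(ET.fromstring(INCOMPLETE_XML.strip()))
print(result)
```

Nothing raises here even though several nodes are absent; the gaps simply show up as None values, which is the trade-off the greedy approach makes.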


    The output (a dictionary containing 2 keys) - indented for readability:

    • serialNumber - the serial number (obviously)

    • visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node

    Output:

    (py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045049761>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
    Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

    {'serialNumber': '764000606',
     'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
                 'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
                 'startDateTime': '2014-01-10T12:22:39.166Z'},
                {'endDateTime': '2013-03-11T13:51:31.480Z',
                 'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
                 'startDateTime': '2013-01-10T12:22:39.166Z'}]}






    @EDIT0: added handling of multiple part nodes, as requested in one of the comments. That functionality now lives in get_parts_data. Each entry in the visits list has a parts key whose value is a list of dictionaries, one per part node (the provided xml has only one part per visit).
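Since the question asks for a parts table with serialNumber, startDateTime, number, name and index columns, here is a rough sketch of how the nested dictionary could be flattened into one row per part. The helper name flatten_parts is made up for this illustration, and the second part of the first visit (number "03082") is a hypothetical addition to the sample data so that the multiple-parts case is visible.

```python
def flatten_parts(data):
    """Return one flat dict per part, tagged with its visit start and serial number."""
    rows = []
    for visit in data.get("visits", []):
        for part in visit.get("parts", []):
            rows.append({
                "serialNumber": data.get("serialNumber"),
                "startDateTime": visit.get("startDateTime"),
                "number": part.get("number"),
                "name": part.get("name"),
                "index": part.get("index"),
            })
    return rows


# Shaped like the dictionary printed by code.py; the "03082" part is hypothetical.
sample = {
    "serialNumber": "764000606",
    "visits": [
        {"startDateTime": "2014-01-10T12:22:39.166Z",
         "endDateTime": "2014-03-11T13:51:31.480Z",
         "parts": [{"index": "0016", "name": "WSSA", "number": "03081"},
                   {"index": "0018", "name": "WSSB", "number": "03082"}]},
        {"startDateTime": "2013-01-10T12:22:39.166Z",
         "endDateTime": "2013-03-11T13:51:31.480Z",
         "parts": [{"index": "0017", "name": "PSSF", "number": "02081"}]},
    ],
}

rows = flatten_parts(sample)
for row in rows:
    print(row)
```

If pandas is installed, the resulting list of dictionaries can be passed straight to pandas.DataFrame(rows) to get the table; that step is left out here so the sketch runs with the standard library alone.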














    edited Dec 30 '18 at 20:55

























    answered Jul 12 '17 at 13:40









    CristiFati





















    0














    Try the following:

    import xml.dom.minidom as minidom

    doc = minidom.parse('filename')  # replace 'filename' with the path to the xml file
    # getElementsByTagName returns every match; [0] takes the first part element only
    part_elem = doc.getElementsByTagName('part')[0]

    print(part_elem.getAttribute('number'))
    print(part_elem.getAttribute('name'))
    print(part_elem.getAttribute('index'))

    Hope it helps.
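The snippet above reads only the first part element in the file. As a rough sketch of how the same minidom calls could cover every part and keep it tied to its visit, the loop below walks each visit element and then the part elements inside it. The xml is a trimmed version of the one in the question (no namespace, no product node), and the second part of the first visit is a hypothetical addition.

```python
import xml.dom.minidom as minidom

# Trimmed sample based on the question; the "03082" part is hypothetical.
XML_TEXT = """<LogBook>
  <visits>
    <visit>
      <general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime></general>
      <parts>
        <part number="03081" name="WSSA" index="0016"/>
        <part number="03082" name="WSSB" index="0018"/>
      </parts>
    </visit>
    <visit>
      <general><startDateTime>2013-01-10T12:22:39.166Z</startDateTime></general>
      <parts><part number="02081" name="PSSF" index="0017"/></parts>
    </visit>
  </visits>
</LogBook>"""

doc = minidom.parseString(XML_TEXT)

rows = []
for visit in doc.getElementsByTagName("visit"):
    # Text of the visit's startDateTime element, or None when it is absent or empty
    start_nodes = visit.getElementsByTagName("startDateTime")
    start = None
    if start_nodes and start_nodes[0].firstChild is not None:
        start = start_nodes[0].firstChild.data
    # Searching from the visit element limits the matches to this visit's parts
    for part in visit.getElementsByTagName("part"):
        rows.append((start,
                     part.getAttribute("number"),
                     part.getAttribute("name"),
                     part.getAttribute("index")))

for row in rows:
    print(row)
```

Note that getAttribute returns an empty string, not None, when an attribute is missing, so absent values look different here than in the ElementTree version.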
















        answered Jul 12 '17 at 9:40









        shang





























