Python read xml with related child elements

I have an XML file with this structure:
<?DOMParser ?>
<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
    <product>
        <serialNumber value="764000606"/>
    </product>
    <visits>
        <visit>
            <general>
                <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
                <endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
            </general>
            <parts>
                <part number="03081" name="WSSA" index="0016"/>
            </parts>
        </visit>
        <visit>
            <general>
                <startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
                <endDateTime>2013-03-11T13:51:31.480Z</endDateTime>
            </general>
            <parts>
                <part number="02081" name="PSSF" index="0017"/>
            </parts>
        </visit>
    </visits>
</logbook:LogBook>
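Only the root element carries the `logbook:` prefix. Because the file declares no default namespace, the children (`product`, `visits`, `general`, `part`, ...) are in no namespace, which is why bare tag names such as `'general'` work with ElementTree's `iter`/`find`, while the root's own tag is reported in `{uri}local` form. A quick check, assuming the XML is held in a string (a trimmed copy with one visit is used here as a stand-in for the file):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML above (one visit only), held in a string for the demo
XML = """<?DOMParser ?>
<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
  <product><serialNumber value="764000606"/></product>
  <visits>
    <visit>
      <general>
        <startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
        <endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
      </general>
      <parts><part number="03081" name="WSSA" index="0016"/></parts>
    </visit>
  </visits>
</logbook:LogBook>"""

root = ET.fromstring(XML)
# The logbook: prefix on the root is expanded to the namespace URI
print(root.tag)
# The children are un-namespaced, so bare tag names match them
print([e.tag for e in root.iter("general")])
print(root.find("product/serialNumber").get("value"))
```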
I want two outputs from this XML:
1- visits, including the serial number. For this I wrote:
import pandas as pd
import xml.etree.ElementTree as ET
tree = ET.parse(filename)
root = tree.getroot()
visits = pd.DataFrame()
for general in root.iter('general'):
    for child in root.iter('serialNumber'):
        visits = visits.append({'startDateTime': general.find('startDateTime').text,
                                'endDateTime': general.find('endDateTime').text,
                                'serialNumber': child.attrib['value']}, ignore_index=True)
The output of this code is the following DataFrame:
serialNumber | startDateTime            | endDateTime              |
-------------|--------------------------|--------------------------|
764000606    | 2014-01-10T12:22:39.166Z | 2014-03-11T13:51:31.480Z |
764000606    | 2013-01-10T12:22:39.166Z | 2013-03-11T13:51:31.480Z |
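On current pandas the loop above no longer runs as written: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the same part-1 table that collects plain dictionaries first and builds the DataFrame once; it also reads the single serial number once instead of looping over it. The inline XML is a trimmed, hypothetical stand-in for the question's file (with a real file, use `ET.parse(filename).getroot()`).

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Trimmed stand-in for the question's file (parts omitted, not needed for part 1)
XML = """<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
<product><serialNumber value="764000606"/></product>
<visits>
<visit><general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
<endDateTime>2014-03-11T13:51:31.480Z</endDateTime></general></visit>
<visit><general><startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
<endDateTime>2013-03-11T13:51:31.480Z</endDateTime></general></visit>
</visits>
</logbook:LogBook>"""

root = ET.fromstring(XML)
# The file holds a single serial number, so read it once instead of looping
serial = root.find("product/serialNumber").get("value")

# Collect one plain dict per <general> element, then build the DataFrame once
rows = []
for general in root.iter("general"):
    rows.append({"serialNumber": serial,
                 "startDateTime": general.findtext("startDateTime"),
                 "endDateTime": general.findtext("endDateTime")})

visits = pd.DataFrame(rows, columns=["serialNumber", "startDateTime", "endDateTime"])
print(visits)
```

Passing `columns=` also fixes the column order, so it no longer depends on how the dictionaries were built.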
2- parts
For parts, I want the following output, where visits are distinguished from each other by startDateTime and each visit shows its own parts:
serialNumber | startDateTime | number | name | index |
-------------|---------------|--------|------|-------|
For parts I wrote:
parts = pd.DataFrame()
for part in root.iter('part'):
    for child in root.iter('serialNumber'):
        parts = parts.append({'index': part.attrib['index'],
                              'znumber': part.attrib['number'],
                              'name': part.attrib['name'],
                              'serialNumber': child.attrib['value'],
                              'startDateTime': general.find('startDateTime').text}, ignore_index=True)
This is what I get from this code:
index | name | serialNumber | startDateTime            | znumber |
------|------|--------------|--------------------------|---------|
0016  | WSSA | 764000606    | 2013-01-10T12:22:39.166Z | 03081   |
0017  | PSSF | 764000606    | 2013-01-10T12:22:39.166Z | 02081   |
What I want instead is this (note the startDateTime column):
index | name | serialNumber | startDateTime            | znumber |
------|------|--------------|--------------------------|---------|
0016  | WSSA | 764000606    | 2014-01-10T12:22:39.166Z | 03081   |
0017  | PSSF | 764000606    | 2013-01-10T12:22:39.166Z | 02081   |
Any ideas?
I am using xml.etree.ElementTree.
python xml pandas xml-parsing elementtree
edited Jul 12 '17 at 11:18
asked Jul 12 '17 at 6:15 by Safariba
Shouldn't the </product> termination tag be at the end of the file? Your XML file should contain only one root node.
– CristiFati
Jul 12 '17 at 6:35
Is visits a pandas DataFrame?
– mzjn
Jul 12 '17 at 8:54
@mzjn Yes, visits = pandas.DataFrame()
– Safariba
Jul 12 '17 at 8:58
You left that out of the code snippet. Please show us complete code that we can copy and execute, and tag the question "pandas".
– mzjn
Jul 12 '17 at 9:01
2 Answers
Here's an example that extracts the data from the XML.
code.py:
#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp

file_name = "a.xml"


def get_product_sn(product_node):
    # Value of the first <serialNumber> child, or None if there is none
    for product_node_child in list(product_node):
        if product_node_child.tag == "serialNumber":
            return product_node_child.attrib.get("value", None)
    return None


def get_parts_data(parts_node):
    # One dictionary per <part> child of <parts>
    ret = list()
    for parts_node_child in list(parts_node):
        attrs = parts_node_child.attrib
        ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
    return ret


def get_visit_node_data(visit_node):
    # Start/end times from <general>, plus the list of parts from <parts>
    ret = dict()
    for visit_node_child in list(visit_node):
        if visit_node_child.tag == "general":
            for general_node_child in list(visit_node_child):
                if general_node_child.tag == "startDateTime":
                    ret["startDateTime"] = general_node_child.text
                elif general_node_child.tag == "endDateTime":
                    ret["endDateTime"] = general_node_child.text
        elif visit_node_child.tag == "parts":
            ret["parts"] = get_parts_data(visit_node_child)
    return ret


def get_node_data(node):
    # Serial number from <product>, plus one entry per <visit> under <visits>
    ret = {"visits": list()}
    for node_child in list(node):
        if node_child.tag == "product":
            ret["serialNumber"] = get_product_sn(node_child)
        elif node_child.tag == "visits":
            for visits_node_child in list(node_child):
                ret["visits"].append(get_visit_node_data(visits_node_child))
    return ret


def main():
    tree = ET.parse(file_name)
    root_node = tree.getroot()
    data = get_node_data(root_node)
    pp(data)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
Notes:
- It walks the XML as a tree, so the code mirrors the XML structure (if the XML structure changes, the code has to be adapted as well)
- It's designed to be general: get_node_data can be called on any node that has the 2 children product and visits. Here that is the root node itself, but real data could contain a sequence of such nodes, each with those 2 children
- It's designed to be error-tolerant: if the XML is incomplete, it extracts as much data as it can. I chose this (greedy) approach over simply raising an exception at the first error
- As I haven't worked with pandas, instead of populating a DataFrame I simply return a Python dictionary (JSON-like); converting it to a DataFrame shouldn't be hard
- I've run it with Python 2.7 and Python 3.5
The output is a dictionary with 2 keys (pretty-printed for readability):
- serialNumber: the serial number
- visits: a list of dictionaries, each containing the data from one visit node (since the result is a dictionary, this data has to live under a key)
Output:
(py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045049761>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
{'serialNumber': '764000606',
 'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
             'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
             'startDateTime': '2014-01-10T12:22:39.166Z'},
            {'endDateTime': '2013-03-11T13:51:31.480Z',
             'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
             'startDateTime': '2013-01-10T12:22:39.166Z'}]}
@EDIT0: Added handling for multiple part nodes, as requested in the comments. That functionality now lives in get_parts_data. Each entry in the visits list has a parts key whose value is a list of dictionaries, one extracted from each part node (the provided XML has only one part per visit).
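The answer leaves the DataFrame conversion to the reader. A minimal sketch of that step, assuming pandas is installed; `data` here is a hypothetical literal copy of the dictionary printed above, standing in for the return value of `get_node_data`, and `to_dataframes` is a name chosen for illustration. Each part row takes the startDateTime of the visit it was found in, which is the pairing the question asks for.

```python
import pandas as pd

# Hypothetical stand-in for get_node_data(root_node), copied from the output above
data = {
    "serialNumber": "764000606",
    "visits": [
        {"startDateTime": "2014-01-10T12:22:39.166Z",
         "endDateTime": "2014-03-11T13:51:31.480Z",
         "parts": [{"number": "03081", "name": "WSSA", "index": "0016"}]},
        {"startDateTime": "2013-01-10T12:22:39.166Z",
         "endDateTime": "2013-03-11T13:51:31.480Z",
         "parts": [{"number": "02081", "name": "PSSF", "index": "0017"}]},
    ],
}


def to_dataframes(data):
    """Build the question's two tables: one row per visit, one row per part."""
    sn = data.get("serialNumber")
    visit_rows, part_rows = [], []
    # .get() with defaults keeps this tolerant of the keys get_node_data may omit
    for visit in data.get("visits", []):
        start = visit.get("startDateTime")
        visit_rows.append({"serialNumber": sn, "startDateTime": start,
                           "endDateTime": visit.get("endDateTime")})
        # Every part inherits the startDateTime of the visit it belongs to
        for part in visit.get("parts", []):
            part_rows.append({"serialNumber": sn, "startDateTime": start,
                              "number": part.get("number"),
                              "name": part.get("name"),
                              "index": part.get("index")})
    visits = pd.DataFrame(visit_rows, columns=["serialNumber", "startDateTime", "endDateTime"])
    parts = pd.DataFrame(part_rows, columns=["serialNumber", "startDateTime", "number", "name", "index"])
    return visits, parts


visits_df, parts_df = to_dataframes(data)
print(parts_df)
```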
In this code, when a visit has more than one part, only the last part is returned. It does not return all the parts for each visit.
– Safariba
Jul 28 '17 at 7:11
That's true. I thought there could only be one part node per visit (as in the example XML). Do you want it to handle multiple part nodes? (The changes are trivial.)
– CristiFati
Jul 28 '17 at 9:38
Yes, I want to handle multiple parts. I am less experienced with dictionaries; can you help me with that? Thanks.
– Safariba
Jul 28 '17 at 9:41
Thanks for updating the answer.
– Safariba
Jul 31 '17 at 7:04
Try the following:
import xml.dom.minidom as minidom
doc = minidom.parse('filename')
# First <part> element in the document
part_elem = doc.getElementsByTagName('part')[0]
print(part_elem.getAttribute('number'))
print(part_elem.getAttribute('name'))
print(part_elem.getAttribute('index'))
Hope it helps.
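The snippet above reads only the first `<part>` (`[0]`) and does not relate parts to visits. A minimal sketch of the same minidom approach scoped per `<visit>`, so that each part is paired with its own visit's startDateTime; the inline XML is a trimmed, hypothetical stand-in for the file (with a real file, use `minidom.parse(path)`).

```python
import xml.dom.minidom as minidom

# Trimmed stand-in for the question's file (endDateTime omitted)
XML = """<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
<product><serialNumber value="764000606"/></product>
<visits>
<visit><general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime></general>
<parts><part number="03081" name="WSSA" index="0016"/></parts></visit>
<visit><general><startDateTime>2013-01-10T12:22:39.166Z</startDateTime></general>
<parts><part number="02081" name="PSSF" index="0017"/></parts></visit>
</visits>
</logbook:LogBook>"""

doc = minidom.parseString(XML)
serial = doc.getElementsByTagName("serialNumber")[0].getAttribute("value")

rows = []
for visit in doc.getElementsByTagName("visit"):
    # Search only inside this <visit>, so the start time belongs to this visit
    start = visit.getElementsByTagName("startDateTime")[0].firstChild.data
    for part in visit.getElementsByTagName("part"):
        rows.append({"serialNumber": serial, "startDateTime": start,
                     "number": part.getAttribute("number"),
                     "name": part.getAttribute("name"),
                     "index": part.getAttribute("index")})

for row in rows:
    print(row)
```

The resulting `rows` list can be passed straight to `pandas.DataFrame(rows)` to get the parts table from the question.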
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f45049761%2fpython-read-xml-with-related-child-elements%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Here's an example that gets the data from xml.
code.py:
#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp
file_name = "a.xml"
def get_product_sn(product_node):
for product_node_child in list(product_node):
if product_node_child.tag == "serialNumber":
return product_node_child.attrib.get("value", None)
return None
def get_parts_data(parts_node):
ret = list()
for parts_node_child in list(parts_node):
attrs = parts_node_child.attrib
ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
return ret
def get_visit_node_data(visit_node):
ret = dict()
for visit_node_child in list(visit_node):
if visit_node_child.tag == "general":
for general_node_child in list(visit_node_child):
if general_node_child.tag == "startDateTime":
ret["startDateTime"] = general_node_child.text
elif general_node_child.tag == "endDateTime":
ret["endDateTime"] = general_node_child.text
elif visit_node_child.tag == "parts":
ret["parts"] = get_parts_data(visit_node_child)
return ret
def get_node_data(node):
ret = {"visits": list()}
for node_child in list(node):
if node_child.tag == "product":
ret["serialNumber"] = get_product_sn(node_child)
elif node_child.tag == "visits":
for visits_node_child in list(node_child):
ret["visits"].append(get_visit_node_data(visits_node_child))
return ret
def main():
tree = ET.parse(file_name)
root_node = tree.getroot()
data = get_node_data(root_node)
pp(data)
if __name__ == "__main__":
print("Python {:s} on {:s}n".format(sys.version, sys.platform))
main()
Notes:
- It treats the xml in a tree-like manner, so it maps (if you will) on the xml (if the xml structure changes, the code should be adapted as well)
- It's designed to be general: get_node_data could be called on a node that has 2 children: product and visits. In our case it's the root node itself, but in the real world there could be a sequence of such nodes each with the 2 children that I listed above
- It's designed to be error-friendly so if the xml is incomplete, it will get as much data as it can; I chose this (greedy) approach over the one that when it encounters an error it simply throws an exception
- As I didn't work with pandas, instead of populating the object I simply return a Python dictionary (json); I think converting it to a DataFrame shouldn't be hard
- I've run it with Python 2.7 and Python 3.5
The output (a dictionary containing 2 keys) - indented for readability:
serialNumber - the serial number (obviously)
visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node
Output:
(py_064_03.05.04_test0) e:WorkDevStackOverflowq045049761>"e:WorkDevVEnvspy_064_03.05.04_test0Scriptspython.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
{'serialNumber': '764000606',
'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
'startDateTime': '2014-01-10T12:22:39.166Z'},
{'endDateTime': '2013-03-11T13:51:31.480Z',
'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
'startDateTime': '2013-01-10T12:22:39.166Z'}]}
@EDIT0: added multiple part node handling as requested in one of the comments. That functionality has been moved to get_parts_data. Now, each entry in the visits list will have a parts key whose value will be a list consisting of dictionaries extracted from each part node (not the case for the provided xml).
1
In this code when there are more than one part for each visit, only last part is returned. It does not return all the parts for each visit.
– Safariba
Jul 28 '17 at 7:11
2
That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes ? (changes are trivial)
– CristiFati
Jul 28 '17 at 9:38
1
yes I want to handle multiple parts, I am less experienced in handling dictionaries, can you help me with that? Thanks.
– Safariba
Jul 28 '17 at 9:41
Thanks for updating the answer
– Safariba
Jul 31 '17 at 7:04
add a comment |
Here's an example that gets the data from xml.
code.py:
#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp
file_name = "a.xml"
def get_product_sn(product_node):
for product_node_child in list(product_node):
if product_node_child.tag == "serialNumber":
return product_node_child.attrib.get("value", None)
return None
def get_parts_data(parts_node):
ret = list()
for parts_node_child in list(parts_node):
attrs = parts_node_child.attrib
ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
return ret
def get_visit_node_data(visit_node):
ret = dict()
for visit_node_child in list(visit_node):
if visit_node_child.tag == "general":
for general_node_child in list(visit_node_child):
if general_node_child.tag == "startDateTime":
ret["startDateTime"] = general_node_child.text
elif general_node_child.tag == "endDateTime":
ret["endDateTime"] = general_node_child.text
elif visit_node_child.tag == "parts":
ret["parts"] = get_parts_data(visit_node_child)
return ret
def get_node_data(node):
ret = {"visits": list()}
for node_child in list(node):
if node_child.tag == "product":
ret["serialNumber"] = get_product_sn(node_child)
elif node_child.tag == "visits":
for visits_node_child in list(node_child):
ret["visits"].append(get_visit_node_data(visits_node_child))
return ret
def main():
tree = ET.parse(file_name)
root_node = tree.getroot()
data = get_node_data(root_node)
pp(data)
if __name__ == "__main__":
print("Python {:s} on {:s}n".format(sys.version, sys.platform))
main()
Notes:
- It treats the xml in a tree-like manner, so it maps (if you will) on the xml (if the xml structure changes, the code should be adapted as well)
- It's designed to be general: get_node_data could be called on a node that has 2 children: product and visits. In our case it's the root node itself, but in the real world there could be a sequence of such nodes each with the 2 children that I listed above
- It's designed to be error-friendly so if the xml is incomplete, it will get as much data as it can; I chose this (greedy) approach over the one that when it encounters an error it simply throws an exception
- As I didn't work with pandas, instead of populating the object I simply return a Python dictionary (json); I think converting it to a DataFrame shouldn't be hard
- I've run it with Python 2.7 and Python 3.5
The output (a dictionary containing 2 keys) - indented for readability:
serialNumber - the serial number (obviously)
visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node
Output:
(py_064_03.05.04_test0) e:WorkDevStackOverflowq045049761>"e:WorkDevVEnvspy_064_03.05.04_test0Scriptspython.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
{'serialNumber': '764000606',
'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
'startDateTime': '2014-01-10T12:22:39.166Z'},
{'endDateTime': '2013-03-11T13:51:31.480Z',
'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
'startDateTime': '2013-01-10T12:22:39.166Z'}]}
@EDIT0: added multiple part node handling as requested in one of the comments. That functionality has been moved to get_parts_data. Now, each entry in the visits list will have a parts key whose value will be a list consisting of dictionaries extracted from each part node (not the case for the provided xml).
1
In this code when there are more than one part for each visit, only last part is returned. It does not return all the parts for each visit.
– Safariba
Jul 28 '17 at 7:11
2
That's true. I thought that there could only be one part node per visit (as in the example xml). Do you want it to handle multiple part nodes ? (changes are trivial)
– CristiFati
Jul 28 '17 at 9:38
1
yes I want to handle multiple parts, I am less experienced in handling dictionaries, can you help me with that? Thanks.
– Safariba
Jul 28 '17 at 9:41
Thanks for updating the answer
– Safariba
Jul 31 '17 at 7:04
add a comment |
Here's an example that gets the data from xml.
code.py:
#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp
file_name = "a.xml"
def get_product_sn(product_node):
for product_node_child in list(product_node):
if product_node_child.tag == "serialNumber":
return product_node_child.attrib.get("value", None)
return None
def get_parts_data(parts_node):
ret = list()
for parts_node_child in list(parts_node):
attrs = parts_node_child.attrib
ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
return ret
def get_visit_node_data(visit_node):
ret = dict()
for visit_node_child in list(visit_node):
if visit_node_child.tag == "general":
for general_node_child in list(visit_node_child):
if general_node_child.tag == "startDateTime":
ret["startDateTime"] = general_node_child.text
elif general_node_child.tag == "endDateTime":
ret["endDateTime"] = general_node_child.text
elif visit_node_child.tag == "parts":
ret["parts"] = get_parts_data(visit_node_child)
return ret
def get_node_data(node):
ret = {"visits": list()}
for node_child in list(node):
if node_child.tag == "product":
ret["serialNumber"] = get_product_sn(node_child)
elif node_child.tag == "visits":
for visits_node_child in list(node_child):
ret["visits"].append(get_visit_node_data(visits_node_child))
return ret
def main():
tree = ET.parse(file_name)
root_node = tree.getroot()
data = get_node_data(root_node)
pp(data)
if __name__ == "__main__":
print("Python {:s} on {:s}n".format(sys.version, sys.platform))
main()
Notes:
- It treats the xml in a tree-like manner, so it maps (if you will) on the xml (if the xml structure changes, the code should be adapted as well)
- It's designed to be general: get_node_data could be called on a node that has 2 children: product and visits. In our case it's the root node itself, but in the real world there could be a sequence of such nodes each with the 2 children that I listed above
- It's designed to be error-friendly so if the xml is incomplete, it will get as much data as it can; I chose this (greedy) approach over the one that when it encounters an error it simply throws an exception
- As I didn't work with pandas, instead of populating the object I simply return a Python dictionary (json); I think converting it to a DataFrame shouldn't be hard
- I've run it with Python 2.7 and Python 3.5
The output (a dictionary containing 2 keys) - indented for readability:
serialNumber - the serial number (obviously)
visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node
Output:
(py_064_03.05.04_test0) e:WorkDevStackOverflowq045049761>"e:WorkDevVEnvspy_064_03.05.04_test0Scriptspython.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
{'serialNumber': '764000606',
'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
'startDateTime': '2014-01-10T12:22:39.166Z'},
{'endDateTime': '2013-03-11T13:51:31.480Z',
'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
'startDateTime': '2013-01-10T12:22:39.166Z'}]}
@EDIT0: added multiple part node handling as requested in one of the comments. That functionality has been moved to get_parts_data. Now, each entry in the visits list will have a parts key whose value will be a list consisting of dictionaries extracted from each part node (not the case for the provided xml).
edited Dec 30 '18 at 20:55
answered Jul 12 '17 at 13:40
CristiFati
In this code, when there is more than one part per visit, only the last part is returned. It does not return all the parts for each visit.
– Safariba
Jul 28 '17 at 7:11
That's true. I thought there could only be one part node per visit (as in the example XML). Do you want it to handle multiple part nodes? (The changes are trivial.)
– CristiFati
Jul 28 '17 at 9:38
Yes, I want to handle multiple parts. I am less experienced in handling dictionaries; can you help me with that? Thanks.
– Safariba
Jul 28 '17 at 9:41
Thanks for updating the answer.
– Safariba
Jul 31 '17 at 7:04
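The thread above is about visits that hold more than one part. An alternative sketch, not the answer's code: ElementTree's `findall` and `findtext` accept simple relative paths, so the loop can go visit by visit and read each visit's own startDateTime before collecting its parts. This also sidesteps the problem in the question's parts loop, where `general` still refers to the last <general> element left over from the earlier loop, so every part gets the same startDateTime. The embedded XML is a trimmed copy of the question's file; the second <part> in the first visit (number 03082) is hypothetical, added only to show that all parts are kept, and `parts_table` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML; the part with number 03082 is hypothetical.
XML = """<logbook:LogBook xmlns:logbook="http://www/logbook/1.0" version="1.2">
<product><serialNumber value="764000606"/></product>
<visits>
<visit>
<general>
<startDateTime>2014-01-10T12:22:39.166Z</startDateTime>
<endDateTime>2014-03-11T13:51:31.480Z</endDateTime>
</general>
<parts>
<part number="03081" name="WSSA" index="0016"/>
<part number="03082" name="XYZA" index="0018"/>
</parts>
</visit>
<visit>
<general>
<startDateTime>2013-01-10T12:22:39.166Z</startDateTime>
<endDateTime>2013-03-11T13:51:31.480Z</endDateTime>
</general>
<parts>
<part number="02081" name="PSSF" index="0017"/>
</parts>
</visit>
</visits>
</logbook:LogBook>"""


def parts_table(root):
    """Hypothetical helper: one row per <part>, tagged with its visit's startDateTime."""
    sn_node = root.find("product/serialNumber")
    sn = sn_node.get("value") if sn_node is not None else None
    rows = []
    # Walk visit by visit so each part is paired with its own <general> block.
    for visit in root.findall("visits/visit"):
        start = visit.findtext("general/startDateTime")
        for part in visit.findall("parts/part"):
            rows.append({"serialNumber": sn, "startDateTime": start,
                         "number": part.get("number"),
                         "name": part.get("name"),
                         "index": part.get("index")})
    return rows


for row in parts_table(ET.fromstring(XML)):
    print(row)
```

The child elements carry no namespace prefix and there is no default namespace, so plain tag names work in the paths; only the root tag is namespaced.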
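Try the following:
import xml.dom.minidom as minidom
doc = minidom.parse(filename)  # filename: path to your XML file, as in the question
# getElementsByTagName returns every <part> in document order; [0] takes only the first one
memoryElem = doc.getElementsByTagName('part')[0]
print(memoryElem.getAttribute('number'))
print(memoryElem.getAttribute('name'))
print(memoryElem.getAttribute('index'))
Hope it helps.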
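Indexing with `[0]` reads only the first part in the whole document and loses the link to its visit. A sketch of the same minidom approach that loops over every visit and every part inside it, assuming a trimmed sample of the question's XML embedded as a string (`parseString` instead of `parse`, so it runs standalone):

```python
import xml.dom.minidom as minidom

# Trimmed sample in the question's layout (endDateTime and product omitted).
XML = """<logbook:LogBook xmlns:logbook="http://www/logbook/1.0">
<visits>
<visit>
<general><startDateTime>2014-01-10T12:22:39.166Z</startDateTime></general>
<parts>
<part number="03081" name="WSSA" index="0016"/>
</parts>
</visit>
<visit>
<general><startDateTime>2013-01-10T12:22:39.166Z</startDateTime></general>
<parts>
<part number="02081" name="PSSF" index="0017"/>
</parts>
</visit>
</visits>
</logbook:LogBook>"""

doc = minidom.parseString(XML)
rows = []
for visit in doc.getElementsByTagName("visit"):
    # Searching from `visit` (not `doc`) limits the match to this visit's subtree.
    start = visit.getElementsByTagName("startDateTime")[0].firstChild.data
    for part in visit.getElementsByTagName("part"):
        rows.append((start, part.getAttribute("number"),
                     part.getAttribute("name"), part.getAttribute("index")))

for row in rows:
    print(row)
```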
add a comment |
try the following,
import xml.dom.minidom as minidom
doc = minidom.parse('filename')
memoryElem = doc.getElementsByTagName('part')[0]
print memoryElem.getAttribute('number')
print memoryElem.getAttribute('name')
print memoryElem.getAttribute('index')
Hope it will help u.
add a comment |
try the following,
import xml.dom.minidom as minidom
doc = minidom.parse('filename')
memoryElem = doc.getElementsByTagName('part')[0]
print memoryElem.getAttribute('number')
print memoryElem.getAttribute('name')
print memoryElem.getAttribute('index')
Hope it will help u.
try the following,
import xml.dom.minidom as minidom
doc = minidom.parse('filename')
memoryElem = doc.getElementsByTagName('part')[0]
print memoryElem.getAttribute('number')
print memoryElem.getAttribute('name')
print memoryElem.getAttribute('index')
Hope it will help u.
answered Jul 12 '17 at 9:40
shang
Shouldn't the </product> closing tag be at the end of the file? Your XML file should contain only one root node.
– CristiFati
Jul 12 '17 at 6:35
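One quick way to check the single-root rule mentioned in that comment is to let ElementTree parse a string: a document with exactly one root element parses, while two top-level elements are rejected with `ParseError`. The sample is a trimmed, hypothetical version of the question's file, and `parses` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Hypothetical helper: True if `text` is a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Single root (the question's layout, trimmed); the <?DOMParser ?> processing
# instruction is allowed before the root element.
root = ET.fromstring('<?DOMParser ?><logbook:LogBook xmlns:logbook="http://www/logbook/1.0">'
                     '<product><serialNumber value="764000606"/></product>'
                     '<visits/></logbook:LogBook>')
print(root.tag)  # {http://www/logbook/1.0}LogBook

# Two top-level elements: not well-formed.
print(parses("<product/><visits/>"))  # False
```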
Is visits a pandas DataFrame?
– mzjn
Jul 12 '17 at 8:54
@mzjn Yes, visits = pandas.DataFrame()
– Safariba
Jul 12 '17 at 8:58
You left that out of the code snippet. Please show us complete code that we can copy and execute, and tag the question "pandas".
– mzjn
Jul 12 '17 at 9:01