Parsing XML file in Node.js












I am using an Arch Linux system with KDE Plasma. I have an XML file of approximately 50 MB that I need to parse. The file has custom tags.

Example XML:

    <JMdict>
      <entry>
        <ent_seq>1000000</ent_seq>
        <r_ele>
          <reb>ヽ</reb>
        </r_ele>
        <sense>
          <pos>&unc;</pos>
          <gloss g_type="expl">repetition mark in katakana</gloss>
        </sense>
      </entry>
    </JMdict>

I have tried many solutions suggested on Stack Overflow, and none of them worked. Some packages, such as xml-stream and xml2json, could not even be installed on my system. I decided to use xml2js (which most answers suggest) and got the same result. How can I use it correctly?

I am using this code, but it always prints undefined:

    const fs = require('fs-extra');
    const xml2js = require('xml2js');
    const parser = new xml2js.Parser();

    const path = "test.xml";

    fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
        parser.parseString(data, function(err, res) {
            console.log(res);
        });
    });

Result: undefined

Is there any way to handle an XML file by hand (without a package)?







javascript node.js xml

asked Jan 1 at 15:53 by Kaan Taha Köken
edited Jan 1 at 16:03 by jonrsharpe

Your "XML" file is not well-formed: it contains an undefined entity reference &unc;. So parsing should fail.

– Michael Kay
Jan 1 at 18:58












3 Answers
A working example is below:

    const fs = require('fs');
    const path = require('path');
    const xml2js = require('xml2js');

    const parser = new xml2js.Parser();

    // Resolve the XML file relative to this script.
    const filename = path.join(__dirname, 'foo.xml');

    fs.readFile(filename, 'utf8', function (err, data) {
        if (err) {
            console.log('Failed to read the file');
            console.log(err);
            return;
        }

        // Escape every "&" that does not start a predefined XML entity
        // (&apos; &quot; &gt; &lt; &amp;) or a character reference (&#...;).
        const escaped = data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');

        parser.parseString(escaped, function (err, result) {
            if (err) {
                console.log('Failed to parse the XML');
                console.log(err);
            } else {
                console.log(JSON.stringify(result));
                console.log('Done');
            }
        });
    });

The exact change you need is this:

    data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;')

The only problem is the undefined entity &unc; in this tag:

    <pos>&unc;</pos>

Reference and thanks to @tim.
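
To see what the replacement does in isolation, here is a minimal sketch that runs the same regular expression on a few sample strings, with no file or parser involved. The helper name `escapeUnknownEntities` and the sample strings are made up for this illustration.

```javascript
// Hypothetical helper (not part of xml2js): escapes every "&" that does not
// begin one of the five predefined XML entities or a character reference.
function escapeUnknownEntities(xml) {
  return xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
}

const samples = [
  '<pos>&unc;</pos>',          // undefined entity: the "&" gets escaped
  '<gloss>A &amp; B</gloss>',  // predefined entity: left alone
  '<reb>&#x30FD;</reb>',       // character reference: left alone
];

for (const s of samples) {
  console.log(s, '->', escapeUnknownEntities(s));
}
```

After this step the parser sees `&amp;unc;`, so the text of `<pos>` should come out as the literal string `&unc;` rather than whatever the entity was meant to expand to.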







answered Jan 1 at 16:37 by Kitta
edited Jan 1 at 17:58

The way you use the xml2js package should be fine. However, the format of your XML is a little bit off.

If you add a console.log for the error, you can see what is causing it:

    fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
        parser.parseString(data, function(err, res) {
            if (err) console.log(err);

            console.log(res);
        });
    });

You'll see that it's the line <pos>&unc;</pos> that causes the problem: &unc; is not one of XML's predefined entities.
If you fix the entity references, the parser should work fine.
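
One possible way to fix the entities without discarding their meaning is to expand them before parsing. The sketch below assumes the full file declares its custom entities in a DOCTYPE internal subset, with lines of the form `<!ENTITY unc "unclassified">`. The excerpt in the question does not show this, so that layout, the sample value, and the helper names are all assumptions.

```javascript
// Collect <!ENTITY name "value"> declarations. Assumption: the document
// declares its custom entities this way (not shown in the question's excerpt).
function readEntityDeclarations(xml) {
  const entities = new Map();
  const declaration = /<!ENTITY\s+([\w.-]+)\s+"([^"]*)"\s*>/g;
  let match;
  while ((match = declaration.exec(xml)) !== null) {
    entities.set(match[1], match[2]);
  }
  return entities;
}

// Characters that must stay escaped once a value is inlined as text.
function escapeText(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Replace references to declared entities with their escaped values.
// References without a declaration (including &amp; and friends) are kept.
function expandEntities(xml) {
  const entities = readEntityDeclarations(xml);
  return xml.replace(/&([\w.-]+);/g, (reference, name) =>
    entities.has(name) ? escapeText(entities.get(name)) : reference
  );
}

// Made-up sample: a DOCTYPE with one declaration, then one entry using it.
const sample = [
  '<!DOCTYPE JMdict [',
  '<!ENTITY unc "unclassified">',
  ']>',
  '<JMdict><entry><sense><pos>&unc;</pos></sense></entry></JMdict>',
].join('\n');

console.log(expandEntities(sample));
```

The expanded string could then be passed to `parser.parseString` in place of the raw file contents. This is plain string processing rather than real DTD handling, so it ignores parameter entities, single-quoted values, and external DTD files.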







answered Jan 1 at 16:35 by Ray Chan

I think your problem is unescaped characters in your XML data.

I'm able to get your example to work by using this:

XML data:

    <JMdict>
      <entry>
        <ent_seq>1000000</ent_seq>
        <r_ele>
          <reb>ヽ</reb>
        </r_ele>
        <sense>
          <pos>YOUR PROBLEM WAS HERE</pos>
          <gloss g_type="expl">repetition mark in katakana</gloss>
        </sense>
      </entry>
    </JMdict>

Node.js code:

    const fs = require('fs-extra');
    const xml2js = require('xml2js');
    const parser = new xml2js.Parser();

    const path = "test.xml";

    fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
        if (error) return console.error(error);

        parser.parseString(data, function(err, res) {
            // res is undefined when parsing fails, so check err first.
            if (err) return console.error(err);

            console.log(JSON.stringify(res.JMdict.entry, null, 4));
        });
    });

In situations like this, when I know the code should work, I always check the input data for possible issues.
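
Checking the input can itself be automated. As a rough sketch, the function below lists every named entity reference that is not one of the five predefined XML entities, together with the line it appears on. The function name and the sample string are made up for this illustration, and the check does not account for entities that a DTD may declare.

```javascript
const PREDEFINED = new Set(['amp', 'lt', 'gt', 'apos', 'quot']);

// Hypothetical helper: returns [{ name, line }] for each suspicious reference.
function findUndefinedEntities(xml) {
  const found = [];
  xml.split('\n').forEach((text, index) => {
    // Named references only. Numeric ones such as &#12541; start with "#"
    // and are never matched by this pattern.
    for (const match of text.matchAll(/&([A-Za-z_][\w.-]*);/g)) {
      if (!PREDEFINED.has(match[1])) {
        found.push({ name: match[1], line: index + 1 });
      }
    }
  });
  return found;
}

const sample = '<sense>\n<pos>&unc;</pos>\n<gloss>this &amp; that</gloss>\n</sense>';
console.log(findUndefinedEntities(sample));
```

Running something like this over the file before parsing shows where the parser is likely to stop, which can be quicker than reading a 50 MB file by eye.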







answered Jan 1 at 16:43 by tamak
edited Jan 1 at 16:50