Databricks spark-xml: values are null when reading tags ending in "/>"

I'm using the latest version of spark-xml (0.4.1) with Scala 2.11. When I read XML that contains tags ending with "/>", the corresponding values are null. An example follows:



XML:



<Clients>
    <Client ID="1" name="teste1" age="10">
        <Operation ID="1" name="operation1">
        </Operation>
        <Operation ID="2" name="operation2">
        </Operation>
    </Client>
    <Client ID="2" name="teste2" age="20"/>
    <Client ID="3" name="teste3" age="30">
        <Operation ID="1" name="operation1">
        </Operation>
        <Operation ID="2" name="operation2">
        </Operation>
    </Client>
</Clients>


Dataframe:



+----+------+----+--------------------+
| _ID| _name|_age|           Operation|
+----+------+----+--------------------+
|   1|teste1|  10|[[1,operation1], ...|
|null|  null|null|                null|
+----+------+----+--------------------+


Code:



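import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Read each <Client> element as one row, using an explicit schema.
Dataset<Row> clients = sparkSession.sqlContext().read()
    .format("com.databricks.spark.xml")
    .option("rowTag", "Client")
    .schema(getSchemaClient())
    .load(dirtorio);

clients.show(10);

// XML attributes appear as columns with the "_" prefix (_ID, _name, _age).
public StructType getSchemaClient() {
    return new StructType(
        new StructField[] {
            new StructField("_ID", DataTypes.StringType, true, Metadata.empty()),
            new StructField("_name", DataTypes.StringType, true, Metadata.empty()),
            new StructField("_age", DataTypes.StringType, true, Metadata.empty()),
            // Nested <Operation> elements are read as an array of structs.
            new StructField("Operation", DataTypes.createArrayType(this.getSchemaOperation()), true, Metadata.empty())
        });
}

public StructType getSchemaOperation() {
    return new StructType(
        new StructField[] {
            new StructField("_ID", DataTypes.StringType, true, Metadata.empty()),
            new StructField("_name", DataTypes.StringType, true, Metadata.empty())
        });
}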









      apache-spark xml-parsing databricks
















      asked Jul 11 '18 at 18:21









      Thiago Zolinger

























          1 Answer














          Version 0.5.0 was just released, which resolved issues with self-closing tags. It may resolve this issue. See https://github.com/databricks/spark-xml/pull/352
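          Before upgrading, it can help to confirm that the input is well-formed and that the self-closing element really carries its attributes. The sketch below uses only the JDK's StAX parser, not spark-xml, so it says nothing about how spark-xml parses internally. The class name `SelfClosingCheck` and the method `clientAttributes` are made up for illustration, and the XML is a shortened copy of the sample from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of spark-xml.
class SelfClosingCheck {

    /** Returns one "ID,name,age" string per <Client> start tag, in document order. */
    static List<String> clientAttributes(String xml) {
        List<String> rows = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // A self-closing tag still produces a START_ELEMENT event
                    // (followed directly by END_ELEMENT), with its attributes attached.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "Client".equals(reader.getLocalName())) {
                        rows.add(reader.getAttributeValue(null, "ID") + ","
                            + reader.getAttributeValue(null, "name") + ","
                            + reader.getAttributeValue(null, "age"));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Malformed input is reported as an unchecked exception to keep the sketch short.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<Clients>"
            + "<Client ID=\"1\" name=\"teste1\" age=\"10\">"
            + "<Operation ID=\"1\" name=\"operation1\"></Operation>"
            + "</Client>"
            + "<Client ID=\"2\" name=\"teste2\" age=\"20\"/>"
            + "<Client ID=\"3\" name=\"teste3\" age=\"30\"></Client>"
            + "</Clients>";
        // Prints one line per client, including the self-closing one.
        for (String row : clientAttributes(xml)) {
            System.out.println(row);
        }
    }
}
```

          If this lists all three clients with their attributes, the file itself is fine and the nulls most likely come from the library version, which is consistent with trying 0.5.0.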
















                answered Dec 30 '18 at 20:28









                Sean Owen






























