pySpark and py4j: NoClassDefFoundError when upgrading a jar
We developed a Scala library called FV that runs on Spark, and we built Python wrappers for its public API using Py4J, following the same pattern Spark itself uses. For example, the main object is instantiated like this:



self._java_obj = self._new_java_obj("com.example.FV", self.uid)
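For reference, Py4J resolves a fully qualified class name attribute by attribute on its JVM view, and calling the resolved class constructs the instance on the JVM side. A minimal sketch of what such a _new_java_obj helper can look like, assuming it follows the pattern of pyspark.ml.wrapper.JavaWrapper (the argument conversion Spark performs via _py2java is omitted here):

from pyspark import SparkContext

def _new_java_obj(self, java_class, *args):
    # Walk the dotted name on the gateway's JVM view,
    # e.g. com -> example -> FV.
    sc = SparkContext._active_spark_context
    java_obj = sc._jvm
    for name in java_class.split("."):
        java_obj = getattr(java_obj, name)
    # Calling the resolved class creates the instance on the JVM side.
    return java_obj(*args)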


and methods on the object are called like this:



def add(self, r):
    self._java_obj.add(r)
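Any exception thrown on the JVM side reaches Python wrapped in a Py4JJavaError, which is how the error below surfaces in a user's session. An illustrative call site (fv and record are hypothetical names):

from py4j.protocol import Py4JJavaError

try:
    fv.add(record)
except Py4JJavaError as e:
    # e.java_exception is the JVM-side throwable,
    # e.g. java.lang.NoClassDefFoundError.
    print(e.java_exception)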


We are experiencing an annoying issue when running PySpark with this external library. We usually launch the PySpark shell like this:



pyspark --repositories <our-own-maven-release-repo> --packages <com.example.FV:latest.release>


Whenever we release a new version that changes the Scala API, things randomly break for some users. For example, version 0.44 had a class DateUtils (used by class Utils, which is in turn used by class FV in the add method) that was dropped in version 0.45. When version 0.45 was released and a user called the add method through the Python API, we got:



java.lang.NoClassDefFoundError: Could not initialize class DateUtils


Basically, the Python API runs the add method, which still contains a reference to the class DateUtils (i.e. the v0.44 bytecode), but when the JVM actually goes to load that class it cannot find it, because the jar that was loaded is v0.45 (as the Ivy log shows when the shell starts up).
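For what it's worth, one way to double-check from the running shell which jars the driver JVM actually registered (this goes through the private _jsc handle; sc is the shell's SparkContext):

# Each entry is a spark:// URL of a jar added to the driver;
# the FV entry shows the version Ivy actually resolved.
print(sc._jsc.sc().listJars().mkString("\n"))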



Do you have any idea what the problem might be? Does Py4J perhaps cache something, so that upgrading the classes produces this error?
apache-spark pyspark classloader py4j

asked Dec 29 '18 at 10:48 by alexlipa, edited Dec 30 '18 at 19:13