scrapy getting stuck after some time

0

I have a master-worker network on AWS EC2 using the dask.distributed library. For now I have one master machine and one worker machine. The master exposes a REST API (Flask) for scheduling Scrapy jobs on the worker machine. Both master and worker run in Docker containers, and the two containers communicate with each other through dask.distributed.
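
For context, here is a rough sketch of how the pieces are wired together (the endpoint, spider name, and helper function below are placeholders, not the actual code):

import subprocess

from dask.distributed import Client, fire_and_forget
from flask import Flask, jsonify

app = Flask(__name__)
client = Client("127.0.0.1:8786")  # dask scheduler runs next to the master container

def run_spider(spider_name):
    # Runs on the dask worker: launch Scrapy as a child process and
    # return its exit code (the real job also captures stdout/stderr).
    return subprocess.call(["scrapy", "crawl", spider_name])

@app.route("/crawl/<spider_name>", methods=["POST"])
def crawl(spider_name):
    future = client.submit(run_spider, spider_name)
    fire_and_forget(future)  # keep the task alive after this request returns
    return jsonify({"status": "scheduled", "key": future.key})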



When I schedule a Scrapy job, crawling starts successfully and Scrapy uploads data to S3 as well. But after some time Scrapy gets stuck at one point and nothing happens after that.



Please check the attached log file for more info.



log.txt



2019-01-02 08:05:30 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f1fe54adf28>>


Scrapy gets stuck at the above point.



Commands to run Docker:



sudo docker run --network host -d crawler-worker # for worker
sudo docker run -p 80:80 -p 8786:8786 -p 8787:8787 --net=host -d crawler-master # for master


I am facing this issue on a fresh EC2 machine as well.

python amazon-web-services docker scrapy dask-distributed

asked Jan 2 at 10:25, edited Jan 2 at 15:00
suraj deshmukh
666

2 Answers






          1

I solved the problem. It was in the subprocess call I was using to execute Scrapy with stdout=subprocess.PIPE: as per the subprocess documentation, wait() can deadlock when used with stdout=subprocess.PIPE or stderr=subprocess.PIPE.
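
A minimal sketch of the change, assuming Scrapy is launched with subprocess.Popen (the exact command line here is illustrative):

import subprocess

# Deadlock-prone pattern: with stdout=PIPE the child blocks as soon as the OS
# pipe buffer fills up, and wait() never returns because nothing drains the pipe.
#   proc = subprocess.Popen(["scrapy", "crawl", "myspider"], stdout=subprocess.PIPE)
#   proc.wait()

# Safer pattern: communicate() reads stdout/stderr to the end and then waits,
# so the pipe buffers can never fill up and block the child process.
proc = subprocess.Popen(
    ["scrapy", "crawl", "myspider"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()
print("scrapy exited with code", proc.returncode)

subprocess.run() with the same stdout/stderr arguments drains the pipes internally and avoids the deadlock as well.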






answered Jan 3 at 12:01
suraj deshmukh
666

0

(This would be a comment but I don't yet have the points to do so.)

You're probably encountering some sort of anti-DDoS protection. Have you tried scraping a control site?






answered Jan 2 at 16:19
benas
743