Scrapy getting stuck after some time

I have a master-worker network on AWS EC2 using the dask distributed library. For now I have one master machine and one worker machine. The master has a REST API (Flask) for scheduling Scrapy jobs on the worker machine. I am using Docker for both master and worker, which means the master container and the worker container communicate with each other through dask distributed.

When I schedule a Scrapy job, crawling starts successfully and Scrapy uploads data to S3 as well. But after some time Scrapy gets stuck at one point and nothing happens after that.

Please check the attached log file for more info:

log.txt

2019-01-02 08:05:30 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f1fe54adf28>>

Scrapy gets stuck at the point above.

Commands to run Docker:

sudo docker run --network host -d crawler-worker # for worker

sudo docker run -p 80:80 -p 8786:8786 -p 8787:8787 --net=host -d crawler-master # for master

I am facing this issue on a fresh EC2 machine as well.

edited Jan 2 at 15:00

asked Jan 2 at 10:25

suraj deshmukh

666

python amazon-web-services docker scrapy dask-distributed

2 Answers

I solved the problem. The cause was the subprocess call I was using to run Scrapy with the argument stdout=subprocess.PIPE. According to the subprocess documentation, wait() can deadlock when stdout=subprocess.PIPE or stderr=subprocess.PIPE is used and the child process writes enough output to fill the OS pipe buffer; the documentation recommends communicate() in that case.
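As a rough illustration of the fix described above, here is a minimal sketch that reads the pipes with communicate() instead of calling wait(). The helper name run_and_capture and the noisy child command are made up for this example; the real code would launch the Scrapy command instead, and the original launcher code is not shown in the question.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd with piped output and return (returncode, stdout, stderr).

    Hypothetical helper for illustration; not the original launcher code.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() keeps reading both pipes until EOF and then reaps the
    # child, so the child never blocks on a full OS pipe buffer.
    # Calling proc.wait() here instead could hang once the buffer fills.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in for the real crawl command: a child process that prints far more
# data than a typical OS pipe buffer holds.
noisy_child = [sys.executable, "-c", "print('x' * 1000000)"]

returncode, out, err = run_and_capture(noisy_child)
print(returncode, len(out.strip()))
```

If the output is not needed in the parent process, another option is to leave stdout and stderr unpiped or to point them at a log file, in which case wait() has no pipe to block on.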

answered Jan 3 at 12:01

suraj deshmukh

666

(This would be a comment but I don't yet have the points to do so.)

You're probably encountering some sort of anti-DDoS protection. Have you tried scraping a control site?

answered Jan 2 at 16:19

benas

743
