Populating Rails application with scraped content from another site
I need to seed or scrape the data from another site in order to have content for my project.
How do you go about scraping data from another site using your own rails app? Do you use a separate application/server to run some sort of cron job, then add that data to your rails app? Or is it possible to have your own site scrape the data and display it directly?
My first idea was to scrape a site using Mechanize, then add the data to the Fixtures in my rails app as seed data. Is there a better way? Maybe even a way to continuously scrape the other site to display the data using my own rails app?
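For context, the seed-data idea could be split in two: a separate scrape script (Mechanize/Nokogiri) writes rows to a JSON file, and db/seeds.rb just loads that file, keeping the scraper out of the app's boot path. This is only a sketch; the file name, row shape, and `Article` model are hypothetical, not from the original post.

```ruby
require "json"

# Sketch only: a scrape script would have written db/seed_data.json earlier;
# seeding then just parses the file. Row shape here is an assumption.
def load_seed_rows(json_text)
  JSON.parse(json_text, symbolize_names: true)
end

rows = load_seed_rows('[{"title": "First post", "url": "https://example.com/1"}]')

# In db/seeds.rb this would become something like (hypothetical model):
#   rows.each { |attrs| Article.create!(attrs) }
```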
ruby-on-rails
@Alie Please check my answer – Cryptex Technologies, Dec 28 '18 at 6:27
edited Dec 28 '18 at 0:46 by sawa
asked Dec 28 '18 at 0:39 by Ali Ove
3 Answers
You can use the rufus-scheduler and watir-dom-wait gems for this. I have done a similar task, scraping the Amazon KDP book list. With watir-dom-wait you can also fetch data that is loaded via Ajax requests; Mechanize and Nokogiri will not work for Ajax-rendered content.
require 'rufus-scheduler'
require 'watir-dom-wait'
require 'selenium-webdriver'

# Download the report from Amazon KDP
def download_report
  # Log in (the original also passed Chrome prefs via `options: { prefs: prefs }`,
  # omitted here because `prefs` was never defined)
  @browser = Watir::Browser.new :chrome
  @browser.goto 'https://kdp.amazon.com/en_US/reports-new'
  @browser.input(name: 'email').send_keys('test@gmail.com')
  @browser.input(name: 'password').send_keys('password')
  @browser.input(id: 'signInSubmit').click
  @browser.span(text: 'Generate Report').click
end

scheduler = Rufus::Scheduler.new
scheduler.in '1d' do  # runs once, a day from now; use `scheduler.every '1d'` to repeat
  download_report
end
scheduler.join        # keep the process alive so the scheduled job can fire
answered Dec 28 '18 at 6:25 by Cryptex Technologies
Exactly what I was looking for – Ali Ove, Dec 28 '18 at 10:04
I use Heroku, and it comes with an add-on called Scheduler that works quite well for my little project. It works very much like cron.
Heroku Scheduler
Once the data is scraped, it goes directly into the database (PostgreSQL); then you can display whatever you want through database queries.
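One detail worth sketching: if a scheduled job re-scrapes the same pages, the store step should be idempotent so rows aren't duplicated. Below is a pure-Ruby stand-in for that step; the URL-keyed upsert and the `Article` model named in the comment are assumptions, not part of the original answer.

```ruby
# Idempotent "store scraped rows" step: re-running the scheduled job
# updates existing rows (keyed by URL) instead of duplicating them.
# `store` is a plain Hash standing in for the database table.
def upsert_scraped(store, rows)
  rows.each do |row|
    store[row[:url]] = row  # last write wins for each URL
  end
  store
end

# In a Rails app the equivalent would be roughly (hypothetical model):
#   rows.each do |attrs|
#     Article.find_or_initialize_by(url: attrs[:url]).update!(attrs)
#   end
```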
answered Dec 28 '18 at 1:23 by MorboRe'
I use Nokogiri to scrape websites.
You don't need a separate application. You can have methods inside your models that handle the scraping and populate your database, and then create a rake file that runs those methods.
I name mine scheduler.rake, and it goes in /lib/tasks/.
Then, if you're using Heroku, you can add the Scheduler add-on (it's available for free as of 28/12/2018).
Heroku has some pretty good docs explaining how to configure things on the Heroku side.
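A minimal sketch of what such a scheduler.rake could look like; the namespace, task name, and the `Article.scrape_and_refresh!` model method are hypothetical, and in a real Rails app the task would also depend on `:environment` so models are loaded.

```ruby
require "rake"
include Rake::DSL  # makes the `namespace`/`task` DSL available outside a Rakefile

# lib/tasks/scheduler.rake — Heroku Scheduler would run: rake scrape:articles
namespace :scrape do
  desc "Scrape the source site and refresh stored records"
  task :articles do
    # Placeholder for the real work, e.g. a model method doing the
    # Nokogiri scraping and database writes:
    #   Article.scrape_and_refresh!
    puts "scraping..."
  end
end
```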
answered Dec 28 '18 at 3:54 by NemyaNation