Populating Rails application with scraped content from another site
I need to seed or scrape the data from another site in order to have content for my project.
How do you go about scraping data from another site using your own rails app? Do you use a separate application/server to run some sort of cron job, then add that data to your rails app? Or is it possible to have your own site scrape the data and display it directly?
My first idea was to scrape a site using Mechanize, then add the data to the Fixtures in my rails app as seed data. Is there a better way? Maybe even a way to continuously scrape the other site to display the data using my own rails app?
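For context, the seed-data idea could be split in two: a separate scrape script (Mechanize/Nokogiri) writes rows to a JSON file, and db/seeds.rb just loads that file, keeping the scraper out of the app's boot path. This is only a sketch; the file name, row shape, and `Article` model are hypothetical, not from the original post.

```ruby
require "json"

# Sketch only: a scrape script would have written db/seed_data.json earlier;
# seeding then just parses the file. Row shape here is an assumption.
def load_seed_rows(json_text)
  JSON.parse(json_text, symbolize_names: true)
end

rows = load_seed_rows('[{"title": "First post", "url": "https://example.com/1"}]')

# In db/seeds.rb this would become something like (hypothetical model):
#   rows.each { |attrs| Article.create!(attrs) }
```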
ruby-on-rails
@Alie Please check my answer – Cryptex Technologies, Dec 28 '18 at 6:27
edited Dec 28 '18 at 0:46 by sawa
asked Dec 28 '18 at 0:39 by Ali Ove
3 Answers
You can use the rufus-scheduler and watir-dom-wait gems for this. I have done a similar task, scraping the Amazon KDP book list. With watir-dom-wait you can also fetch data that is loaded via Ajax requests; Mechanize and Nokogiri will not work for Ajax-rendered content.
require 'rufus-scheduler'
require 'watir-dom-wait'
require 'selenium-webdriver'

# Download the report from Amazon KDP
def download_report
  # Log in (the original also passed Chrome prefs via `options: { prefs: prefs }`,
  # omitted here because `prefs` was never defined)
  @browser = Watir::Browser.new :chrome
  @browser.goto 'https://kdp.amazon.com/en_US/reports-new'
  @browser.input(name: 'email').send_keys('test@gmail.com')
  @browser.input(name: 'password').send_keys('password')
  @browser.input(id: 'signInSubmit').click
  @browser.span(text: 'Generate Report').click
end

scheduler = Rufus::Scheduler.new
scheduler.in '1d' do  # runs once, a day from now; use `scheduler.every '1d'` to repeat
  download_report
end
scheduler.join        # keep the process alive so the scheduled job can fire
answered Dec 28 '18 at 6:25 by Cryptex Technologies
Exactly what I was looking for – Ali Ove, Dec 28 '18 at 10:04
I use Heroku, and it comes with an add-on called Scheduler that works quite well for my little project. It works very much like cron.
Heroku Scheduler
Once the data is scraped, it goes directly into the database (PostgreSQL); then you can display whatever you want through database queries.
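One detail worth sketching: if a scheduled job re-scrapes the same pages, the store step should be idempotent so rows aren't duplicated. Below is a pure-Ruby stand-in for that step; the URL-keyed upsert and the `Article` model named in the comment are assumptions, not part of the original answer.

```ruby
# Idempotent "store scraped rows" step: re-running the scheduled job
# updates existing rows (keyed by URL) instead of duplicating them.
# `store` is a plain Hash standing in for the database table.
def upsert_scraped(store, rows)
  rows.each do |row|
    store[row[:url]] = row  # last write wins for each URL
  end
  store
end

# In a Rails app the equivalent would be roughly (hypothetical model):
#   rows.each do |attrs|
#     Article.find_or_initialize_by(url: attrs[:url]).update!(attrs)
#   end
```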
answered Dec 28 '18 at 1:23 by MorboRe'
I use Nokogiri to scrape websites.
You don't need a separate application. You can have methods inside your models that handle the scraping and populate your database, and then create a rake file that runs those methods.
I name mine scheduler.rake, and it goes in /lib/tasks/.
Then, if you're using Heroku, you can add the Scheduler add-on (it's available for free as of 28/12/2018).
Heroku has some pretty good docs explaining how to configure things on the Heroku side.
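A minimal sketch of what such a scheduler.rake could look like; the namespace, task name, and the `Article.scrape_and_refresh!` model method are hypothetical, and in a real Rails app the task would also depend on `:environment` so models are loaded.

```ruby
require "rake"
include Rake::DSL  # makes the `namespace`/`task` DSL available outside a Rakefile

# lib/tasks/scheduler.rake — Heroku Scheduler would run: rake scrape:articles
namespace :scrape do
  desc "Scrape the source site and refresh stored records"
  task :articles do
    # Placeholder for the real work, e.g. a model method doing the
    # Nokogiri scraping and database writes:
    #   Article.scrape_and_refresh!
    puts "scraping..."
  end
end
```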
answered Dec 28 '18 at 3:54 by NemyaNation