Populating Rails application with scraped content from another site

I need to seed or scrape data from another site in order to have content for my project.

How do you go about scraping data from another site using your own Rails app? Do you use a separate application/server to run some sort of cron job and then add that data to your Rails app? Or is it possible to have your own site scrape the data and display it directly?

My first idea was to scrape a site with Mechanize, then add the data to the fixtures in my Rails app as seed data. Is there a better way? Maybe even a way to continuously scrape the other site and display the data through my own Rails app?
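The seed-data idea above can be sketched roughly like this (the file name and the idea of an Article model are hypothetical, and the rows are hard-coded stand-ins for what Mechanize would return):

```ruby
require 'json'

# Rough sketch of the seed-data approach: scrape once, dump the rows to a
# JSON file checked into the repo, then load that file from db/seeds.rb.
rows = [
  { 'title' => 'First Post',  'url' => '/posts/1' },
  { 'title' => 'Second Post', 'url' => '/posts/2' }
]
File.write('seed_articles.json', JSON.pretty_generate(rows))

# Later, in db/seeds.rb, read the file back and create records;
# Article.create!(row) would replace the puts in a real app.
JSON.parse(File.read('seed_articles.json')).each do |row|
  puts "seeding: #{row['title']}"
end
```

This keeps the scrape itself out of the deploy path: seeding only reads the checked-in file.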

ruby-on-rails
edited Dec 28 '18 at 0:46 by sawa
asked Dec 28 '18 at 0:39 by Ali Ove
  • @Alie Please check my answer
    – Cryptex Technologies
    Dec 28 '18 at 6:27
3 Answers
You can use the rufus-scheduler and watir-dom-wait gems to solve this. I did a similar task, scraping the Amazon KDP book list. With watir-dom-wait you can also fetch data loaded via Ajax, which Mechanize and Nokogiri cannot do, since they only see the initial HTML and not content rendered by JavaScript.



require 'rufus-scheduler'
require 'watir-dom-wait'
require 'selenium-webdriver'

# Download the report from Amazon KDP: log in, then trigger report generation.
def download_report
  browser = Watir::Browser.new :chrome
  browser.goto 'https://kdp.amazon.com/en_US/reports-new'
  browser.input(name: 'email').send_keys('test@gmail.com')
  browser.input(name: 'password').send_keys('password')
  browser.input(id: 'signInSubmit').click
  browser.span(text: 'Generate Report').click
end

scheduler = Rufus::Scheduler.new

# Runs once, one day from now; use scheduler.every '1d' for a recurring daily job.
scheduler.in '1d' do
  download_report
end
answered Dec 28 '18 at 6:25 by Cryptex Technologies
  • Exactly what I was looking for
    – Ali Ove
    Dec 28 '18 at 10:04
I use Heroku, and it comes with an add-on called Heroku Scheduler that works quite well for my little project. It works very similarly to cron.

Heroku Scheduler

Once the data gets scraped, it goes directly into the database (PostgreSQL), and then you can display whatever you want through database queries.
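A minimal sketch of that scrape-then-store flow, with an in-memory array standing in for the database table and hard-coded rows standing in for the parsed page (a real app would use an ActiveRecord model and `find_or_initialize_by`):

```ruby
# Sketch of scraping straight into storage, written so re-running the
# scheduled job never duplicates records (an upsert keyed on slug).
Article = Struct.new(:slug, :title)

# Stand-in for parsing the remote page with Mechanize/Nokogiri.
def fetch_listing
  [{ slug: 'first-post',  title: 'First Post' },
   { slug: 'second-post', title: 'Second Post' }]
end

# Insert new rows and refresh existing ones; 'store' stands in for the table.
def sync_articles!(store)
  fetch_listing.each do |row|
    existing = store.find { |a| a.slug == row[:slug] }
    if existing
      existing.title = row[:title]
    else
      store << Article.new(row[:slug], row[:title])
    end
  end
  store
end

db = []
sync_articles!(db)
sync_articles!(db)   # idempotent: still only two records
```

Making the job idempotent like this matters because a cron-style scheduler will run it over and over against mostly unchanged source data.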
answered Dec 28 '18 at 1:23 by MorboRe'
I use Nokogiri to scrape websites.

You don't need a separate application. You can have methods inside your models that handle all the scraping and the populating of your database, and then you can create a rake file that runs those methods.

I name mine scheduler.rake

This goes in /lib/tasks/

And then, if you're using Heroku, you can add the Scheduler plugin (it's available for free as of 28/12/2018).

Heroku has some pretty good docs explaining how to configure things on the Heroku side.
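For reference, the extraction step inside such a model method looks roughly like this. A real app would use Nokogiri's css/xpath selectors; here a plain String#scan stands in so the sketch runs without gems or network access, and the markup is made up:

```ruby
# Stand-in HTML for the page the scheduled task would fetch.
SAMPLE_HTML = <<~HTML
  <ul id="posts">
    <li><a href="/posts/1">Hello</a></li>
    <li><a href="/posts/2">World</a></li>
  </ul>
HTML

# Pull [href, text] pairs out of the anchor tags. With Nokogiri this would
# be roughly: doc.css('#posts a').map { |a| [a['href'], a.text] }
def extract_links(html)
  html.scan(%r{<a href="([^"]+)">([^<]+)</a>})
end

puts extract_links(SAMPLE_HTML).inspect
# => [["/posts/1", "Hello"], ["/posts/2", "World"]]
```

The rake task in /lib/tasks/scheduler.rake would then just call the model method wrapping this logic, and Heroku Scheduler would invoke that task at whatever frequency you configure.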
answered Dec 28 '18 at 3:54 by NemyaNation