Persisting data in Google Colaboratory

Has anyone figured out a way to keep files persisted across sessions in Google's newly open sourced Colaboratory?



Using the sample notebooks, I'm successfully authenticating and transferring CSV files from my Google Drive, and I have stashed them in /tmp, my ~, and ~/datalab. Pandas can read them off disk just fine, too. But once the session times out, it looks like the whole filesystem is wiped and a new VM is spun up without the downloaded files.



I guess this isn't surprising given Google's Colaboratory FAQ:




Q: Where is my code executed? What happens to my execution state if I close the browser window?



A: Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle for a while, and have a maximum lifetime enforced by the system.




Given that, maybe this is a feature (i.e., "go use Google Cloud Storage, which works fine in Colaboratory")? When I first used the tool, I was hoping that any .csv files in the My File/Colab Notebooks Google Drive folder would also be loaded onto the VM instance that the notebook was running on :/

Tags: python, google-colaboratory

asked Nov 9 '17 at 4:43 by user3424705 (edited Nov 9 '17 at 4:48)

4 Answers

Your interpretation is correct. VMs are ephemeral and are recycled after periods of inactivity. There's currently no mechanism for persisting data on the VM itself.

For data to persist, you'll need to store it somewhere outside of the VM, e.g., Google Drive, Google Cloud Storage (GCS), or any other cloud hosting provider.
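
A minimal sketch of the Drive option, added for illustration and not part of the original answer. Inside Colab, `google.colab.drive.mount` attaches your Drive under a mount point, after which ordinary file copies land in Drive. The mount point `/content/drive` and the helper names `mount_drive` and `persist_file` are assumptions; the `google.colab` import is guarded so the helpers can also be imported outside Colab.

```python
import shutil
from pathlib import Path


def mount_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # asks for authorization in the notebook
    return True


def persist_file(local_path: str, persistent_dir: str) -> Path:
    """Copy a file from the ephemeral VM disk into a directory that outlives the VM."""
    target_dir = Path(persistent_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(local_path).name
    shutil.copy2(local_path, target)
    return target
```

In a notebook you might call `mount_drive()` once and then something like `persist_file("/tmp/results.csv", "/content/drive/MyDrive/colab_data")`; both paths are hypothetical and depend on your own files and Drive layout.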



Some recipes for loading and saving data from external sources are available in the I/O example notebook.
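
For the GCS option, a rough sketch along the same lines (again an illustration, not code from the I/O notebook). It assumes the `google-cloud-storage` package and an existing bucket; the bucket name, object names, and the helper names below are hypothetical. The transfer helpers take a client object as a parameter, so the Colab-specific authentication stays in one place.

```python
def upload_to_gcs(client, bucket_name: str, local_path: str, blob_name: str) -> None:
    """Upload a local file to a bucket; `client` is a google.cloud.storage.Client."""
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)


def download_from_gcs(client, bucket_name: str, blob_name: str, local_path: str) -> None:
    """Restore a previously uploaded object onto the disk of a fresh VM."""
    client.bucket(bucket_name).blob(blob_name).download_to_filename(local_path)


def make_client(project_id: str):
    """Authenticate inside Colab and build a storage client."""
    from google.colab import auth      # only importable inside Colab
    from google.cloud import storage   # provided by google-cloud-storage
    auth.authenticate_user()
    return storage.Client(project=project_id)
```

A session would typically call `download_from_gcs` right after the VM starts and `upload_to_gcs` before leaving the notebook idle.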







answered Nov 9 '17 at 4:54 by Bob Smith

I'm not sure whether this is the best solution, but you can sync your data between Colab and Drive with automated authentication, as in this gist: https://gist.github.com/rdinse/159f5d77f13d03e0183cb8f7154b170a
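
To give a feel for what such a sync step can look like, here is a simplified one-way sync using only the standard library. It is not the code from the gist: it assumes the persistent location is already reachable as a directory (for example a mounted Drive folder), and the function name `sync_dir` is made up for this sketch.

```python
import shutil
from pathlib import Path


def sync_dir(src: str, dst: str) -> list:
    """Copy files from src to dst when they are missing or older in dst.

    One-way sync only: nothing is ever deleted from dst.
    Returns the relative paths of the files that were copied.
    """
    src_root, dst_root = Path(src), Path(dst)
    copied = []
    for path in sorted(src_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src_root)
        target = dst_root / rel
        if not target.exists() or target.stat().st_mtime < path.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 preserves the modification time
            copied.append(rel.as_posix())
    return copied
```

Calling it in both directions (Drive to VM at session start, VM to Drive after each checkpoint) approximates a two-way sync, as long as the same file is not edited on both sides.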







answered May 2 '18 at 21:15 by Robin Dinse

Clouderizer may provide some data persistence, at the cost of a long setup (because you use Google Colab only as a host) and little space to work with.

But, in my opinion, that's better than having your file(s) "recycled" when you forget to save your progress.

answered Apr 18 '18 at 1:20 by LeandroHumb

As you pointed out, Google Colaboratory's file system is ephemeral. There are workarounds, though they come with a network latency penalty and some code overhead: e.g., you can add boilerplate code to your notebooks to mount external file systems such as Google Drive (see their example notebook).

Alternatively, although this is not supported in Colaboratory, other Jupyter hosting services, such as Jupyo, provision dedicated VMs with persistent file systems, so the data and the notebooks persist across sessions.
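
One way to soften the latency penalty of the mount-based workaround is to read each file over the mounted file system only once per session and work from a copy on the VM's local disk afterwards. The sketch below assumes the external file system is already mounted; the helper name `cached_copy` and the default cache directory are hypothetical.

```python
import shutil
from pathlib import Path


def cached_copy(remote_path: str, cache_dir: str = "/tmp/colab_cache") -> Path:
    """Return a local copy of remote_path, fetching it only if it is not cached yet."""
    source = Path(remote_path)
    local = Path(cache_dir) / source.name
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, local)  # single read over the mounted file system
    return local
```

The cache is keyed by file name only, so two remote files with the same name would collide, and a cached copy is not refreshed when the remote file changes. When the VM is recycled the cache disappears and is rebuilt on first access.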







answered Dec 30 '18 at 16:54 by artur (edited Dec 30 '18 at 17:17)