When it's worth the tradeoff of using local secondary index in DynamoDB?












1















I've read guidelines for secondary indexes but I'm not sure when the ability to search fast outweighs the disadvantage of scan over attributes. Let me give you an example.



I am saving game progress data for users. The PK is user ID. I need to be able to:




  1. Find out user progress about a particular game.


  2. Get all finished/in progress games for a user.



Thus, I can design my SK as progress_{state} to be able to query all games by progress fast (state represents started/finished) or I can design my SK as progress_{gameId} to be able to query progress of a given game fast. However, I can't have both using just SK. When I chose one, the other operation will require a scan.



Therefore, I was thinking about using LSI which will add an overhead to the whole table as noted by Amazon here:




Every secondary index means more work for DynamoDB. When you add, delete, or replace items in a table that has local secondary indexes, DynamoDB will use additional write capacity units to update the relevant indexes.




I estimate maximum thousands of types games and I wonder whether it's worth using LSI or whether it's better to use scans for the other operation I choose.



Does anyone has any real experience with such problem? I was not able to find anything on this topic.










share|improve this question





























    1















    I've read guidelines for secondary indexes but I'm not sure when the ability to search fast outweighs the disadvantage of scan over attributes. Let me give you an example.



    I am saving game progress data for users. The PK is user ID. I need to be able to:




    1. Find out user progress about a particular game.


    2. Get all finished/in progress games for a user.



    Thus, I can design my SK as progress_{state} to be able to query all games by progress fast (state represents started/finished) or I can design my SK as progress_{gameId} to be able to query progress of a given game fast. However, I can't have both using just SK. When I chose one, the other operation will require a scan.



    Therefore, I was thinking about using LSI which will add an overhead to the whole table as noted by Amazon here:




    Every secondary index means more work for DynamoDB. When you add, delete, or replace items in a table that has local secondary indexes, DynamoDB will use additional write capacity units to update the relevant indexes.




    I estimate maximum thousands of types games and I wonder whether it's worth using LSI or whether it's better to use scans for the other operation I choose.



    Does anyone has any real experience with such problem? I was not able to find anything on this topic.










    share|improve this question



























      1












      1








      1








      I've read guidelines for secondary indexes but I'm not sure when the ability to search fast outweighs the disadvantage of scan over attributes. Let me give you an example.



      I am saving game progress data for users. The PK is user ID. I need to be able to:




      1. Find out user progress about a particular game.


      2. Get all finished/in progress games for a user.



      Thus, I can design my SK as progress_{state} to be able to query all games by progress fast (state represents started/finished) or I can design my SK as progress_{gameId} to be able to query progress of a given game fast. However, I can't have both using just SK. When I chose one, the other operation will require a scan.



      Therefore, I was thinking about using LSI which will add an overhead to the whole table as noted by Amazon here:




      Every secondary index means more work for DynamoDB. When you add, delete, or replace items in a table that has local secondary indexes, DynamoDB will use additional write capacity units to update the relevant indexes.




      I estimate maximum thousands of types games and I wonder whether it's worth using LSI or whether it's better to use scans for the other operation I choose.



      Does anyone has any real experience with such problem? I was not able to find anything on this topic.










      share|improve this question
















      I've read guidelines for secondary indexes but I'm not sure when the ability to search fast outweighs the disadvantage of scan over attributes. Let me give you an example.



      I am saving game progress data for users. The PK is user ID. I need to be able to:




      1. Find out user progress about a particular game.


      2. Get all finished/in progress games for a user.



      Thus, I can design my SK as progress_{state} to be able to query all games by progress fast (state represents started/finished) or I can design my SK as progress_{gameId} to be able to query progress of a given game fast. However, I can't have both using just SK. When I chose one, the other operation will require a scan.



      Therefore, I was thinking about using LSI which will add an overhead to the whole table as noted by Amazon here:




      Every secondary index means more work for DynamoDB. When you add, delete, or replace items in a table that has local secondary indexes, DynamoDB will use additional write capacity units to update the relevant indexes.




      I estimate maximum thousands of types games and I wonder whether it's worth using LSI or whether it's better to use scans for the other operation I choose.



      Does anyone has any real experience with such problem? I was not able to find anything on this topic.







      amazon-web-services amazon-dynamodb amazon-dynamodb-index






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 31 '18 at 8:11









      Uwe Keim

      27.5k31130210




      27.5k31130210










      asked Dec 30 '18 at 23:08









      MartinMartin

      3527




      3527
























          1 Answer
          1






          active

          oldest

          votes


















          2














          When you are designing DynamoDB tables, the main cost factor comes with IOPS for reads and writes.



          This is why avoiding scans are usually better. Scans will consume a significant amount of read IOPS and it will increase with the number of items in the table since scan needs to read all the items in the table before returning the matching items.



          Then coming back to your use-case of using SK for progress, it would be better to use attributes and define Secondary Indexes, since you will need to update the state later on (Which is not possible with PK and SK in the table).



          So based on your use-case and the information given in the question you can define the schema as;



          PK- UserID
          SK- GameID
          GSI- Progress (PK)



          Query all games by progress fast
          GSI Progress (PK)



          Note: if this is for a particular user; you can change it to LSI Progress.



          Query progress of a given game fast (Assuming that for a given user)
          Query using UserID (PK) and GameID (SK) of the Table






          share|improve this answer


























          • Hi Ashan, thank you for the useful answer. May I ask one more thing? Currently, I save both game data (e.g. price, description of game) and progress data in one table with different PKs. Would you split this table into 2 (User_data/Game_data) so the game data queries are not 'hit' by the LSI overhead or is it ok to keep everything in one table.

            – Martin
            Dec 31 '18 at 10:02






          • 1





            Hi @Martin It seems splitting into two tables make sense. Rationale here is in keeping items small (to keep 1kb-write and 4kb-read limits per IOPS per table item) and also seperating data with different query patterns.

            – Ashan
            Dec 31 '18 at 10:18













          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53982164%2fwhen-its-worth-the-tradeoff-of-using-local-secondary-index-in-dynamodb%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2














          When you are designing DynamoDB tables, the main cost factor comes with IOPS for reads and writes.



          This is why avoiding scans are usually better. Scans will consume a significant amount of read IOPS and it will increase with the number of items in the table since scan needs to read all the items in the table before returning the matching items.



          Then coming back to your use-case of using SK for progress, it would be better to use attributes and define Secondary Indexes, since you will need to update the state later on (Which is not possible with PK and SK in the table).



          So based on your use-case and the information given in the question you can define the schema as;



          PK- UserID
          SK- GameID
          GSI- Progress (PK)



          Query all games by progress fast
          GSI Progress (PK)



          Note: if this is for a particular user; you can change it to LSI Progress.



          Query progress of a given game fast (Assuming that for a given user)
          Query using UserID (PK) and GameID (SK) of the Table






          share|improve this answer


























          • Hi Ashan, thank you for the useful answer. May I ask one more thing? Currently, I save both game data (e.g. price, description of game) and progress data in one table with different PKs. Would you split this table into 2 (User_data/Game_data) so the game data queries are not 'hit' by the LSI overhead or is it ok to keep everything in one table.

            – Martin
            Dec 31 '18 at 10:02






          • 1





            Hi @Martin It seems splitting into two tables make sense. Rationale here is in keeping items small (to keep 1kb-write and 4kb-read limits per IOPS per table item) and also seperating data with different query patterns.

            – Ashan
            Dec 31 '18 at 10:18


















          2














          When you are designing DynamoDB tables, the main cost factor comes with IOPS for reads and writes.



          This is why avoiding scans are usually better. Scans will consume a significant amount of read IOPS and it will increase with the number of items in the table since scan needs to read all the items in the table before returning the matching items.



          Then coming back to your use-case of using SK for progress, it would be better to use attributes and define Secondary Indexes, since you will need to update the state later on (Which is not possible with PK and SK in the table).



          So based on your use-case and the information given in the question you can define the schema as;



          PK- UserID
          SK- GameID
          GSI- Progress (PK)



          Query all games by progress fast
          GSI Progress (PK)



          Note: if this is for a particular user; you can change it to LSI Progress.



          Query progress of a given game fast (Assuming that for a given user)
          Query using UserID (PK) and GameID (SK) of the Table






          share|improve this answer


























          • Hi Ashan, thank you for the useful answer. May I ask one more thing? Currently, I save both game data (e.g. price, description of game) and progress data in one table with different PKs. Would you split this table into 2 (User_data/Game_data) so the game data queries are not 'hit' by the LSI overhead or is it ok to keep everything in one table.

            – Martin
            Dec 31 '18 at 10:02






          • 1





            Hi @Martin It seems splitting into two tables make sense. Rationale here is in keeping items small (to keep 1kb-write and 4kb-read limits per IOPS per table item) and also seperating data with different query patterns.

            – Ashan
            Dec 31 '18 at 10:18
















          2












          2








          2







          When you are designing DynamoDB tables, the main cost factor comes with IOPS for reads and writes.



          This is why avoiding scans are usually better. Scans will consume a significant amount of read IOPS and it will increase with the number of items in the table since scan needs to read all the items in the table before returning the matching items.



          Then coming back to your use-case of using SK for progress, it would be better to use attributes and define Secondary Indexes, since you will need to update the state later on (Which is not possible with PK and SK in the table).



          So based on your use-case and the information given in the question you can define the schema as;



          PK- UserID
          SK- GameID
          GSI- Progress (PK)



          Query all games by progress fast
          GSI Progress (PK)



          Note: if this is for a particular user; you can change it to LSI Progress.



          Query progress of a given game fast (Assuming that for a given user)
          Query using UserID (PK) and GameID (SK) of the Table






          share|improve this answer















          When you are designing DynamoDB tables, the main cost factor comes with IOPS for reads and writes.



          This is why avoiding scans are usually better. Scans will consume a significant amount of read IOPS and it will increase with the number of items in the table since scan needs to read all the items in the table before returning the matching items.



          Then coming back to your use-case of using SK for progress, it would be better to use attributes and define Secondary Indexes, since you will need to update the state later on (Which is not possible with PK and SK in the table).



          So based on your use-case and the information given in the question you can define the schema as;



          PK- UserID
          SK- GameID
          GSI- Progress (PK)



          Query all games by progress fast
          GSI Progress (PK)



          Note: if this is for a particular user; you can change it to LSI Progress.



          Query progress of a given game fast (Assuming that for a given user)
          Query using UserID (PK) and GameID (SK) of the Table







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Dec 31 '18 at 8:10

























          answered Dec 31 '18 at 7:52









          AshanAshan

          10.3k21935




          10.3k21935













          • Hi Ashan, thank you for the useful answer. May I ask one more thing? Currently, I save both game data (e.g. price, description of game) and progress data in one table with different PKs. Would you split this table into 2 (User_data/Game_data) so the game data queries are not 'hit' by the LSI overhead or is it ok to keep everything in one table.

            – Martin
            Dec 31 '18 at 10:02






          • 1





            Hi @Martin It seems splitting into two tables make sense. Rationale here is in keeping items small (to keep 1kb-write and 4kb-read limits per IOPS per table item) and also seperating data with different query patterns.

            – Ashan
            Dec 31 '18 at 10:18





















          • Hi Ashan, thank you for the useful answer. May I ask one more thing? Currently, I save both game data (e.g. price, description of game) and progress data in one table with different PKs. Would you split this table into 2 (User_data/Game_data) so the game data queries are not 'hit' by the LSI overhead or is it ok to keep everything in one table.

            – Martin
            Dec 31 '18 at 10:02






          • 1





            Hi @Martin It seems splitting into two tables make sense. Rationale here is in keeping items small (to keep 1kb-write and 4kb-read limits per IOPS per table item) and also seperating data with different query patterns.

            – Ashan
            Dec 31 '18 at 10:18



















          Hi Ashan, thank you for the useful answer. May I ask one more thing? Currently, I save both game data (e.g. price, description of game) and progress data in one table with different PKs. Would you split this table into 2 (User_data/Game_data) so the game data queries are not 'hit' by the LSI overhead or is it ok to keep everything in one table.

          – Martin
          Dec 31 '18 at 10:02





          Hi Ashan, thank you for the useful answer. May I ask one more thing? Currently, I save both game data (e.g. price, description of game) and progress data in one table with different PKs. Would you split this table into 2 (User_data/Game_data) so the game data queries are not 'hit' by the LSI overhead or is it ok to keep everything in one table.

          – Martin
          Dec 31 '18 at 10:02




          1




          1





          Hi @Martin It seems splitting into two tables make sense. Rationale here is in keeping items small (to keep 1kb-write and 4kb-read limits per IOPS per table item) and also seperating data with different query patterns.

          – Ashan
          Dec 31 '18 at 10:18







          Hi @Martin It seems splitting into two tables make sense. Rationale here is in keeping items small (to keep 1kb-write and 4kb-read limits per IOPS per table item) and also seperating data with different query patterns.

          – Ashan
          Dec 31 '18 at 10:18




















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53982164%2fwhen-its-worth-the-tradeoff-of-using-local-secondary-index-in-dynamodb%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Monofisismo

          Angular Downloading a file using contenturl with Basic Authentication

          Olmecas