DEFLATE: dynamic block structure












0















I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.



Considering this example of lit/len alphabet:



A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
and the rest of symbols having zero code lengths.



The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.



Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).



My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:



18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)



Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.




  1. Is it necessary to send the code lengths of 0? If so, why?


  2. If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?


  3. If not what would be the real solution?



Another problem:



In this case we have code length 2 with repetitions (3) extra bits value of 0.




  1. Is this last number also included in the code length tree construction?


If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.



P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.



More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.



I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.










share|improve this question





























    0















    I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.



    Considering this example of lit/len alphabet:



    A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
    and the rest of symbols having zero code lengths.



    The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.



    Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).



    My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:



    18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)



    Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.




    1. Is it necessary to send the code lengths of 0? If so, why?


    2. If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?


    3. If not what would be the real solution?



    Another problem:



    In this case we have code length 2 with repetitions (3) extra bits value of 0.




    1. Is this last number also included in the code length tree construction?


    If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.



    P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.



    More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.



    I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.










    share|improve this question



























      0












      0








      0








      I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.



      Considering this example of lit/len alphabet:



      A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
      and the rest of symbols having zero code lengths.



      The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.



      Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).



      My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:



      18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)



      Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.




      1. Is it necessary to send the code lengths of 0? If so, why?


      2. If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?


      3. If not what would be the real solution?



      Another problem:



      In this case we have code length 2 with repetitions (3) extra bits value of 0.




      1. Is this last number also included in the code length tree construction?


      If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.



      P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.



      More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.



      I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.










      share|improve this question
















      I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.



      Considering this example of lit/len alphabet:



      A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
      and the rest of symbols having zero code lengths.



      The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.



      Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).



      My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:



      18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)



      Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.




      1. Is it necessary to send the code lengths of 0? If so, why?


      2. If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?


      3. If not what would be the real solution?



      Another problem:



      In this case we have code length 2 with repetitions (3) extra bits value of 0.




      1. Is this last number also included in the code length tree construction?


      If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.



      P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.



      More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.



      I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.







      deflate






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 30 '18 at 19:53







      Franco Bosi

















      asked Dec 29 '18 at 21:23









      Franco BosiFranco Bosi

      94




      94
























          1 Answer
          1






          active

          oldest

          votes


















          0














          First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.



          To answer your questions in order:




          1. You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.


          2. It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.


          3. I don't understand this question, but I guess it doesn't apply.


          4. If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.







          share|improve this answer
























          • I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?

            – Franco Bosi
            Dec 30 '18 at 19:42











          • No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.

            – Mark Adler
            Dec 30 '18 at 20:55













          • Ok thank you Adler!

            – Franco Bosi
            Dec 30 '18 at 21:06











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53973435%2fdeflate-dynamic-block-structure%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0














          First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.



          To answer your questions in order:




          1. You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.


          2. It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.


          3. I don't understand this question, but I guess it doesn't apply.


          4. If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.







          share|improve this answer
























          • I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?

            – Franco Bosi
            Dec 30 '18 at 19:42











          • No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.

            – Mark Adler
            Dec 30 '18 at 20:55













          • Ok thank you Adler!

            – Franco Bosi
            Dec 30 '18 at 21:06
















          0














          First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.



          To answer your questions in order:




          1. You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.


          2. It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.


          3. I don't understand this question, but I guess it doesn't apply.


          4. If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.







          share|improve this answer
























          • I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?

            – Franco Bosi
            Dec 30 '18 at 19:42











          • No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.

            – Mark Adler
            Dec 30 '18 at 20:55













          • Ok thank you Adler!

            – Franco Bosi
            Dec 30 '18 at 21:06














          0












          0








          0







          First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.



          To answer your questions in order:




          1. You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.


          2. It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.


          3. I don't understand this question, but I guess it doesn't apply.


          4. If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.







          share|improve this answer













          First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.



          To answer your questions in order:




          1. You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.


          2. It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.


          3. I don't understand this question, but I guess it doesn't apply.


          4. If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.








          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Dec 30 '18 at 16:26









          Mark AdlerMark Adler

          57.8k763111




          57.8k763111













          • I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?

            – Franco Bosi
            Dec 30 '18 at 19:42











          • No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.

            – Mark Adler
            Dec 30 '18 at 20:55













          • Ok thank you Adler!

            – Franco Bosi
            Dec 30 '18 at 21:06



















          • I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?

            – Franco Bosi
            Dec 30 '18 at 19:42











          • No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.

            – Mark Adler
            Dec 30 '18 at 20:55













          • Ok thank you Adler!

            – Franco Bosi
            Dec 30 '18 at 21:06

















          I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?

          – Franco Bosi
          Dec 30 '18 at 19:42





          I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?

          – Franco Bosi
          Dec 30 '18 at 19:42













          No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.

          – Mark Adler
          Dec 30 '18 at 20:55







          No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.

          – Mark Adler
          Dec 30 '18 at 20:55















          Ok thank you Adler!

          – Franco Bosi
          Dec 30 '18 at 21:06





          Ok thank you Adler!

          – Franco Bosi
          Dec 30 '18 at 21:06


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53973435%2fdeflate-dynamic-block-structure%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Mossoró

          Error while reading .h5 file using the rhdf5 package in R

          Pushsharp Apns notification error: 'InvalidToken'