DEFLATE: dynamic block structure
I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.
Considering this example of lit/len alphabet:
A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
and the rest of symbols having zero code lengths.
The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.
Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).
My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:
18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)
Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.
Is it necessary to send the code lengths of 0? If so, why?
If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?
If not what would be the real solution?
Another problem:
In this case we have code length 2 with repetitions (3) extra bits value of 0.
- Is this last number also included in the code length tree construction?
If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.
P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.
More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.
I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.
deflate
add a comment |
I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.
Considering this example of lit/len alphabet:
A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
and the rest of symbols having zero code lengths.
The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.
Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).
My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:
18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)
Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.
Is it necessary to send the code lengths of 0? If so, why?
If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?
If not what would be the real solution?
Another problem:
In this case we have code length 2 with repetitions (3) extra bits value of 0.
- Is this last number also included in the code length tree construction?
If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.
P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.
More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.
I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.
deflate
add a comment |
I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.
Considering this example of lit/len alphabet:
A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
and the rest of symbols having zero code lengths.
The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.
Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).
My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:
18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)
Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.
Is it necessary to send the code lengths of 0? If so, why?
If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?
If not what would be the real solution?
Another problem:
In this case we have code length 2 with repetitions (3) extra bits value of 0.
- Is this last number also included in the code length tree construction?
If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.
P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.
More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.
I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.
deflate
I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.
Considering this example of lit/len alphabet:
A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2)
and the rest of symbols having zero code lengths.
The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.
Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).
My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:
18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)
Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.
Is it necessary to send the code lengths of 0? If so, why?
If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?
If not what would be the real solution?
Another problem:
In this case we have code length 2 with repetitions (3) extra bits value of 0.
- Is this last number also included in the code length tree construction?
If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.
P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.
More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.
I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.
deflate
deflate
edited Dec 30 '18 at 19:53
Franco Bosi
asked Dec 29 '18 at 21:23
Franco BosiFranco Bosi
94
94
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.
To answer your questions in order:
You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.
It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.
I don't understand this question, but I guess it doesn't apply.
If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.
I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?
– Franco Bosi
Dec 30 '18 at 19:42
No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.
– Mark Adler
Dec 30 '18 at 20:55
Ok thank you Adler!
– Franco Bosi
Dec 30 '18 at 21:06
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53973435%2fdeflate-dynamic-block-structure%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.
To answer your questions in order:
You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.
It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.
I don't understand this question, but I guess it doesn't apply.
If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.
I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?
– Franco Bosi
Dec 30 '18 at 19:42
No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.
– Mark Adler
Dec 30 '18 at 20:55
Ok thank you Adler!
– Franco Bosi
Dec 30 '18 at 21:06
add a comment |
First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.
To answer your questions in order:
You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.
It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.
I don't understand this question, but I guess it doesn't apply.
If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.
I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?
– Franco Bosi
Dec 30 '18 at 19:42
No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.
– Mark Adler
Dec 30 '18 at 20:55
Ok thank you Adler!
– Franco Bosi
Dec 30 '18 at 21:06
add a comment |
First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.
To answer your questions in order:
You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.
It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.
I don't understand this question, but I guess it doesn't apply.
If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.
First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.
To answer your questions in order:
You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.
It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.
I don't understand this question, but I guess it doesn't apply.
If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.
answered Dec 30 '18 at 16:26
Mark AdlerMark Adler
57.8k763111
57.8k763111
I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?
– Franco Bosi
Dec 30 '18 at 19:42
No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.
– Mark Adler
Dec 30 '18 at 20:55
Ok thank you Adler!
– Franco Bosi
Dec 30 '18 at 21:06
add a comment |
I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?
– Franco Bosi
Dec 30 '18 at 19:42
No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.
– Mark Adler
Dec 30 '18 at 20:55
Ok thank you Adler!
– Franco Bosi
Dec 30 '18 at 21:06
I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?
– Franco Bosi
Dec 30 '18 at 19:42
I really appreciate your effort answering my question and for this you have my gratitude. For the last question when I say extra bits I mean the bit-factor number that you add after the flag 16 that in this case is 00 as you said, not the actual extra bits that are added afer the huffman codes of data. The 00 in this case is also included as a possible symbol of the huffman code length alphabet? In other words, all the numbers in the sequence after the run-encoding are used as a possible symbol of the code length alphabet?
– Franco Bosi
Dec 30 '18 at 19:42
No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.
– Mark Adler
Dec 30 '18 at 20:55
No. Extra bits are extra bits that go directly in the stream. As I said, they have nothing to do with the Huffman codes.
– Mark Adler
Dec 30 '18 at 20:55
Ok thank you Adler!
– Franco Bosi
Dec 30 '18 at 21:06
Ok thank you Adler!
– Franco Bosi
Dec 30 '18 at 21:06
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53973435%2fdeflate-dynamic-block-structure%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown