how to batch a variable length spectogram in tensorflow
I have to train a denoising autoencoder but i need to batch the 5-frame noisy powerspectrum with 1 frame clean powerspectrum , but i dono how to batch the spectrogram since my data are all variable length in time-series.
def parse_line(noise_file,clean_file):
noise_binary = tf.read_file(noise_file)
noise_binary = tf.contrib.ffmpeg.decode_audio(noise_binary, file_format='wav', samples_per_second=16000, channel_count=1)
noise_stfts = tf.contrib.signal.stft(tf.reshape(noise_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
noise_powerspectrum = tf.log(tf.abs(noise_stfts)**2)
noise_data = tf.squeeze(tf.contrib.signal.frame(noise_powerspectrum,frame_length=5,frame_step=1,axis=1))
clean_binary = tf.read_file(clean_file)
clean_binary = tf.contrib.ffmpeg.decode_audio(clean_binary, file_format='wav', samples_per_second=16000, channel_count=1)
clean_stfts = tf.contrib.signal.stft(tf.reshape(clean_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
clean_powerspectrum = tf.log(tf.abs(clean_stfts)**2)
clean_data = tf.squeeze(clean_powerspectrum)[:-4]
return noise_data, clean_data
my tf.data pipeline is as shown below
shuffle_batch = 10
batch_size = 10
dataset = tf.data.Dataset.from_tensor_slices((noise_datalist,clean_datalist))
dataset = dataset.shuffle(shuffle_batch) # shuffle number of files perbatch
dataset = dataset.map(parse_line,num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)
dataset = dataset.make_one_shot_iterator()
next_element = dataset.get_next()
this is the errors that shows
InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [443,5,257] and element 1 had shape [280,5,257].
[[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
when i change the batch_size to 1 it works and get one data. How can I batch this variable length data or even maybe batch all data to 1 like [443,5,257] and [280,5,257] to [723,5,257]?
python-3.x tensorflow tensorflow-datasets
add a comment |
I have to train a denoising autoencoder but i need to batch the 5-frame noisy powerspectrum with 1 frame clean powerspectrum , but i dono how to batch the spectrogram since my data are all variable length in time-series.
def parse_line(noise_file,clean_file):
noise_binary = tf.read_file(noise_file)
noise_binary = tf.contrib.ffmpeg.decode_audio(noise_binary, file_format='wav', samples_per_second=16000, channel_count=1)
noise_stfts = tf.contrib.signal.stft(tf.reshape(noise_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
noise_powerspectrum = tf.log(tf.abs(noise_stfts)**2)
noise_data = tf.squeeze(tf.contrib.signal.frame(noise_powerspectrum,frame_length=5,frame_step=1,axis=1))
clean_binary = tf.read_file(clean_file)
clean_binary = tf.contrib.ffmpeg.decode_audio(clean_binary, file_format='wav', samples_per_second=16000, channel_count=1)
clean_stfts = tf.contrib.signal.stft(tf.reshape(clean_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
clean_powerspectrum = tf.log(tf.abs(clean_stfts)**2)
clean_data = tf.squeeze(clean_powerspectrum)[:-4]
return noise_data, clean_data
my tf.data pipeline is as shown below
shuffle_batch = 10
batch_size = 10
dataset = tf.data.Dataset.from_tensor_slices((noise_datalist,clean_datalist))
dataset = dataset.shuffle(shuffle_batch) # shuffle number of files perbatch
dataset = dataset.map(parse_line,num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)
dataset = dataset.make_one_shot_iterator()
next_element = dataset.get_next()
this is the errors that shows
InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [443,5,257] and element 1 had shape [280,5,257].
[[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
when i change the batch_size to 1 it works and get one data. How can I batch this variable length data or even maybe batch all data to 1 like [443,5,257] and [280,5,257] to [723,5,257]?
python-3.x tensorflow tensorflow-datasets
443 and 280 corresponds to shapes of your noisy data and clean data respectively?
– kvish
Jan 1 at 10:28
@kvish my noisy data are created from clean_data, so they are same size, but i have many different time in length of wav files(clean_data) and i read it from noise_datalist and clean_datalist . i have a list of noise and clean data like data1.wav, data2.wav which is different length of time which shows in the post, for example data1.wav have 443frames, data2.wav and 280 frames and etc.
– Leow
Jan 1 at 11:08
oh okay thanks for clearing that up. Would padding and batching change the context of what you are trying to do? It might produce different lengths of first component in different batches according to what is the maximum in that batch, if you do not know what is the overall max length that you would want to use.
– kvish
Jan 1 at 11:34
add a comment |
I have to train a denoising autoencoder but i need to batch the 5-frame noisy powerspectrum with 1 frame clean powerspectrum , but i dono how to batch the spectrogram since my data are all variable length in time-series.
def parse_line(noise_file,clean_file):
noise_binary = tf.read_file(noise_file)
noise_binary = tf.contrib.ffmpeg.decode_audio(noise_binary, file_format='wav', samples_per_second=16000, channel_count=1)
noise_stfts = tf.contrib.signal.stft(tf.reshape(noise_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
noise_powerspectrum = tf.log(tf.abs(noise_stfts)**2)
noise_data = tf.squeeze(tf.contrib.signal.frame(noise_powerspectrum,frame_length=5,frame_step=1,axis=1))
clean_binary = tf.read_file(clean_file)
clean_binary = tf.contrib.ffmpeg.decode_audio(clean_binary, file_format='wav', samples_per_second=16000, channel_count=1)
clean_stfts = tf.contrib.signal.stft(tf.reshape(clean_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
clean_powerspectrum = tf.log(tf.abs(clean_stfts)**2)
clean_data = tf.squeeze(clean_powerspectrum)[:-4]
return noise_data, clean_data
my tf.data pipeline is as shown below
shuffle_batch = 10
batch_size = 10
dataset = tf.data.Dataset.from_tensor_slices((noise_datalist,clean_datalist))
dataset = dataset.shuffle(shuffle_batch) # shuffle number of files perbatch
dataset = dataset.map(parse_line,num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)
dataset = dataset.make_one_shot_iterator()
next_element = dataset.get_next()
this is the errors that shows
InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [443,5,257] and element 1 had shape [280,5,257].
[[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
when i change the batch_size to 1 it works and get one data. How can I batch this variable length data or even maybe batch all data to 1 like [443,5,257] and [280,5,257] to [723,5,257]?
python-3.x tensorflow tensorflow-datasets
I have to train a denoising autoencoder but i need to batch the 5-frame noisy powerspectrum with 1 frame clean powerspectrum , but i dono how to batch the spectrogram since my data are all variable length in time-series.
def parse_line(noise_file,clean_file):
noise_binary = tf.read_file(noise_file)
noise_binary = tf.contrib.ffmpeg.decode_audio(noise_binary, file_format='wav', samples_per_second=16000, channel_count=1)
noise_stfts = tf.contrib.signal.stft(tf.reshape(noise_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
noise_powerspectrum = tf.log(tf.abs(noise_stfts)**2)
noise_data = tf.squeeze(tf.contrib.signal.frame(noise_powerspectrum,frame_length=5,frame_step=1,axis=1))
clean_binary = tf.read_file(clean_file)
clean_binary = tf.contrib.ffmpeg.decode_audio(clean_binary, file_format='wav', samples_per_second=16000, channel_count=1)
clean_stfts = tf.contrib.signal.stft(tf.reshape(clean_binary, [1, -1]), frame_length=512, frame_step=256,fft_length=512)
clean_powerspectrum = tf.log(tf.abs(clean_stfts)**2)
clean_data = tf.squeeze(clean_powerspectrum)[:-4]
return noise_data, clean_data
my tf.data pipeline is as shown below
shuffle_batch = 10
batch_size = 10
dataset = tf.data.Dataset.from_tensor_slices((noise_datalist,clean_datalist))
dataset = dataset.shuffle(shuffle_batch) # shuffle number of files perbatch
dataset = dataset.map(parse_line,num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)
dataset = dataset.make_one_shot_iterator()
next_element = dataset.get_next()
this is the errors that shows
InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [443,5,257] and element 1 had shape [280,5,257].
[[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
when i change the batch_size to 1 it works and get one data. How can I batch this variable length data or even maybe batch all data to 1 like [443,5,257] and [280,5,257] to [723,5,257]?
python-3.x tensorflow tensorflow-datasets
python-3.x tensorflow tensorflow-datasets
asked Dec 31 '18 at 6:20
LeowLeow
1209
1209
443 and 280 corresponds to shapes of your noisy data and clean data respectively?
– kvish
Jan 1 at 10:28
@kvish my noisy data are created from clean_data, so they are same size, but i have many different time in length of wav files(clean_data) and i read it from noise_datalist and clean_datalist . i have a list of noise and clean data like data1.wav, data2.wav which is different length of time which shows in the post, for example data1.wav have 443frames, data2.wav and 280 frames and etc.
– Leow
Jan 1 at 11:08
oh okay thanks for clearing that up. Would padding and batching change the context of what you are trying to do? It might produce different lengths of first component in different batches according to what is the maximum in that batch, if you do not know what is the overall max length that you would want to use.
– kvish
Jan 1 at 11:34
add a comment |
443 and 280 corresponds to shapes of your noisy data and clean data respectively?
– kvish
Jan 1 at 10:28
@kvish my noisy data are created from clean_data, so they are same size, but i have many different time in length of wav files(clean_data) and i read it from noise_datalist and clean_datalist . i have a list of noise and clean data like data1.wav, data2.wav which is different length of time which shows in the post, for example data1.wav have 443frames, data2.wav and 280 frames and etc.
– Leow
Jan 1 at 11:08
oh okay thanks for clearing that up. Would padding and batching change the context of what you are trying to do? It might produce different lengths of first component in different batches according to what is the maximum in that batch, if you do not know what is the overall max length that you would want to use.
– kvish
Jan 1 at 11:34
443 and 280 corresponds to shapes of your noisy data and clean data respectively?
– kvish
Jan 1 at 10:28
443 and 280 corresponds to shapes of your noisy data and clean data respectively?
– kvish
Jan 1 at 10:28
@kvish my noisy data are created from clean_data, so they are same size, but i have many different time in length of wav files(clean_data) and i read it from noise_datalist and clean_datalist . i have a list of noise and clean data like data1.wav, data2.wav which is different length of time which shows in the post, for example data1.wav have 443frames, data2.wav and 280 frames and etc.
– Leow
Jan 1 at 11:08
@kvish my noisy data are created from clean_data, so they are same size, but i have many different time in length of wav files(clean_data) and i read it from noise_datalist and clean_datalist . i have a list of noise and clean data like data1.wav, data2.wav which is different length of time which shows in the post, for example data1.wav have 443frames, data2.wav and 280 frames and etc.
– Leow
Jan 1 at 11:08
oh okay thanks for clearing that up. Would padding and batching change the context of what you are trying to do? It might produce different lengths of first component in different batches according to what is the maximum in that batch, if you do not know what is the overall max length that you would want to use.
– kvish
Jan 1 at 11:34
oh okay thanks for clearing that up. Would padding and batching change the context of what you are trying to do? It might produce different lengths of first component in different batches according to what is the maximum in that batch, if you do not know what is the overall max length that you would want to use.
– kvish
Jan 1 at 11:34
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53984242%2fhow-to-batch-a-variable-length-spectogram-in-tensorflow%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53984242%2fhow-to-batch-a-variable-length-spectogram-in-tensorflow%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
443 and 280 corresponds to shapes of your noisy data and clean data respectively?
– kvish
Jan 1 at 10:28
@kvish my noisy data are created from clean_data, so they are same size, but i have many different time in length of wav files(clean_data) and i read it from noise_datalist and clean_datalist . i have a list of noise and clean data like data1.wav, data2.wav which is different length of time which shows in the post, for example data1.wav have 443frames, data2.wav and 280 frames and etc.
– Leow
Jan 1 at 11:08
oh okay thanks for clearing that up. Would padding and batching change the context of what you are trying to do? It might produce different lengths of first component in different batches according to what is the maximum in that batch, if you do not know what is the overall max length that you would want to use.
– kvish
Jan 1 at 11:34