How to batch a variable-length spectrogram in TensorFlow

I need to train a denoising autoencoder. Each training example pairs a 5-frame window of the noisy power spectrum with 1 frame of the clean power spectrum, but I don't know how to batch the spectrograms, because my recordings all have different lengths in time.

import tensorflow as tf

def parse_line(noise_file, clean_file):
    # Noisy input: decode the WAV, take the STFT, then the log power spectrum.
    noise_binary = tf.read_file(noise_file)
    noise_binary = tf.contrib.ffmpeg.decode_audio(noise_binary, file_format='wav', samples_per_second=16000, channel_count=1)
    noise_stfts = tf.contrib.signal.stft(tf.reshape(noise_binary, [1, -1]), frame_length=512, frame_step=256, fft_length=512)
    noise_powerspectrum = tf.log(tf.abs(noise_stfts)**2)
    # Slide a 5-frame window over the time axis: [frames - 4, 5, 257].
    noise_data = tf.squeeze(tf.contrib.signal.frame(noise_powerspectrum, frame_length=5, frame_step=1, axis=1))

    # Clean target: same processing, without the 5-frame windowing.
    clean_binary = tf.read_file(clean_file)
    clean_binary = tf.contrib.ffmpeg.decode_audio(clean_binary, file_format='wav', samples_per_second=16000, channel_count=1)
    clean_stfts = tf.contrib.signal.stft(tf.reshape(clean_binary, [1, -1]), frame_length=512, frame_step=256, fft_length=512)
    clean_powerspectrum = tf.log(tf.abs(clean_stfts)**2)
    # Drop the last 4 frames so the target has as many rows as noise_data.
    clean_data = tf.squeeze(clean_powerspectrum)[:-4]

    return noise_data, clean_data

My tf.data pipeline is shown below:

shuffle_batch = 10  # shuffle buffer size, in files
batch_size = 10     # number of files per batch

dataset = tf.data.Dataset.from_tensor_slices((noise_datalist, clean_datalist))
dataset = dataset.shuffle(shuffle_batch)
dataset = dataset.map(parse_line, num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

This is the error that is raised:

InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [443,5,257] and element 1 had shape [280,5,257].

 [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]

When I change batch_size to 1, it works and returns the data for one file. How can I batch this variable-length data, or perhaps concatenate everything along the first axis, e.g. [443,5,257] and [280,5,257] into [723,5,257]?

asked Dec 31 '18 at 6:20

Leow

1209
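The concatenation idea at the end of the question can be sketched by splitting every file into per-frame examples and batching those frames instead of whole files. This is only a minimal sketch, assuming TensorFlow 2.x eager execution; `fake_parse`, the zero-filled tensors, and the frame batch size of 128 are hypothetical stand-ins for `parse_line` and the real audio, chosen so the shapes match the question.

```python
import tensorflow as tf

# Hypothetical stand-in for parse_line: same shapes as the question's
# features, but zero-filled instead of computed from audio files.
def fake_parse(num_frames):
    noise = tf.zeros(tf.stack([num_frames, 5, 257]))  # [T, 5, 257]
    clean = tf.zeros(tf.stack([num_frames, 257]))     # [T, 257]
    return noise, clean

lengths = tf.constant([443, 280], dtype=tf.int32)     # frames per file
dataset = tf.data.Dataset.from_tensor_slices(lengths).map(fake_parse)

# Turn each file into a stream of (5-frame window, clean frame) pairs,
# so the variable time axis disappears before batching.
dataset = dataset.flat_map(
    lambda noise, clean: tf.data.Dataset.from_tensor_slices((noise, clean)))
dataset = dataset.shuffle(1000).batch(128)            # 128 frames per batch

total_frames = 0
batch_shapes = []
for noise_batch, clean_batch in dataset:
    total_frames += int(noise_batch.shape[0])
    batch_shapes.append((tuple(noise_batch.shape.as_list()),
                         tuple(clean_batch.shape.as_list())))

print(total_frames)     # all frames from both files
print(batch_shapes[0])
```

With this layout every batch has a fixed shape apart from the final, smaller remainder batch, and frames from different files are mixed by the shuffle. Whether that mixing is acceptable depends on the model; it suits a frame-wise autoencoder but not one that needs whole utterances.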

Do 443 and 280 correspond to the shapes of your noisy data and clean data, respectively?

– kvish
Jan 1 at 10:28

@kvish My noisy data are created from the clean data, so each noisy/clean pair has the same size. However, the wav files themselves (the clean data) have many different durations; I read them from noise_datalist and clean_datalist. These are lists of files such as data1.wav, data2.wav, each with a different length, which is what the error in the post shows: for example, data1.wav has 443 frames, data2.wav has 280 frames, and so on.

– Leow
Jan 1 at 11:08

Oh okay, thanks for clearing that up. Would padding and batching change the context of what you are trying to do? If you do not know the overall maximum length you want to use, the first dimension may differ from batch to batch, because each batch is padded to the longest element it contains.

– kvish
Jan 1 at 11:34
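The padding-and-batching behaviour described in the comment above can be illustrated with `tf.data.Dataset.padded_batch`. This is a minimal sketch, assuming TensorFlow 2.x eager execution (the same method also exists on `tf.data.Dataset` in TF 1.x, where values would be fetched through an iterator and a session); `fake_parse` and the zero-filled tensors are hypothetical stand-ins for `parse_line`.

```python
import tensorflow as tf

# Hypothetical stand-in for parse_line: same shapes as the question's
# features, zero-filled, plus the true frame count for later masking.
def fake_parse(num_frames):
    noise = tf.zeros(tf.stack([num_frames, 5, 257]))  # [T, 5, 257]
    clean = tf.zeros(tf.stack([num_frames, 257]))     # [T, 257]
    return noise, clean, num_frames

lengths = tf.constant([443, 280, 310], dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices(lengths).map(fake_parse)

# None marks the axis to pad; each batch is padded to its own longest file.
dataset = dataset.padded_batch(
    2, padded_shapes=([None, 5, 257], [None, 257], []))

results = []
for noise, clean, true_len in dataset:
    results.append((tuple(noise.shape.as_list()),
                    tuple(clean.shape.as_list()),
                    true_len.numpy().tolist()))

for item in results:
    print(item)
```

The first batch here is padded to 443 frames and the second to 310, which is the per-batch variation the comment warns about. Padding uses zeros by default, and a zero is a valid log-power value, so the returned frame counts would be needed to mask padded frames out of the loss. Passing a fixed length instead of `None` in `padded_shapes` gives every batch the same time dimension, provided no file is longer than that length.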

python-3.x tensorflow tensorflow-datasets

