Unet architecture on Carvana dataset
I have followed zhixuhao's unet implementation and another implementation from Kaggle.
The two models are not very different from each other, except that the former has a few extra layers and thus almost 30 million more parameters.
My problem is that I cannot get either model to perform well in terms of binary_crossentropy loss, with accuracy or dice_coef as the metric (I get a loss of around -800 in both models). Please help me find where I am going wrong.
Here are some of my suspicions:
1) One interesting thing I noticed: dice_coef reaches up to 1.9 within a single epoch (which shouldn't be possible, as it should be at most 1). Here is the dice_coef function from the Kaggle link:
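from keras import backend as K

def dice_coef(y_true, y_pred, smooth=0):
    # Dice = 2 * |A intersect B| / (|A| + |B|); bounded by 1 only when y_true and y_pred lie in [0, 1]
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)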
2) The flow_from_directory() function in Keras doesn't read .gif files by default (the mask images are in .gif format), so I followed this advice and added gif in keras/preprocessing/image.py. Then, when reading the masks through flow_from_directory(), I set color_mode='grayscale' so that the target image has 1 channel, since the last layer of the UNet architecture has a 1-channel output. If I read the image myself with skimage.io.imread(), the gif image has size (1024, 1024), i.e. 1 channel.
3) I also thought that the image augmentations might be responsible. I have mainly used the default augmentations from Keras. Here is the whole image reading and augmentation part:
from keras.preprocessing.image import ImageDataGenerator

input_shape = (1024, 1024, 3)
batch_size = 4
# we create two instances with the same arguments
data_gen_args = dict(rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_generator = image_datagen.flow_from_directory(
    'Carvana/train',
    target_size=(input_shape[0], input_shape[1]),
    batch_size=batch_size,
    class_mode=None,
    seed=seed)
mask_generator = mask_datagen.flow_from_directory(
    'Carvana/train_masks',
    target_size=(input_shape[0], input_shape[1]),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode=None,
    seed=seed)
# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)
model2 = unet(input_shape)
model2.fit_generator(
    train_generator,
    steps_per_epoch=50,
    epochs=2)
The training output is:
Found 5088 images belonging to 1 classes.
Found 5088 images belonging to 1 classes.
Epoch 1/2
50/50 [==============================] - 66s 1s/step - loss: -724.1043 - dice_coef: 1.8661
Epoch 2/2
50/50 [==============================] - 64s 1s/step - loss: -829.2828 - dice_coef: 1.9626
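Both symptoms in this log, dice_coef above 1 and a negative binary_crossentropy, are exactly what the formulas produce when the targets are 0/255 instead of 0/1. Per pixel, the cross-entropy is -(t*log(p) + (1 - t)*log(1 - p)); for t in {0, 1} this is never negative, but with t = 255 the second term becomes 254*log(1 - p), which is large and negative when p is close to 1. The NumPy sketch below reproduces both effects on a toy mask. `dice_coef_np`, `bce_np` and the toy data are hypothetical; `bce_np` is the textbook formula rather than Keras' exact implementation; the clipping epsilon of 1e-7 is an assumption; and whether the real masks actually arrive as 0/255 still needs to be checked on a real batch.

```python
import numpy as np

EPS = 1e-7  # assumed clipping epsilon, mirroring Keras' default K.epsilon()

def dice_coef_np(y_true, y_pred, smooth=0.0):
    """NumPy port of the question's dice_coef (same formula, same smooth default)."""
    t = y_true.ravel().astype(np.float64)
    p = y_pred.ravel().astype(np.float64)
    return (2.0 * np.sum(t * p) + smooth) / (np.sum(t) + np.sum(p) + smooth)

def bce_np(y_true, y_pred):
    """Mean per-pixel binary cross-entropy, predictions clipped to [EPS, 1 - EPS]."""
    t = y_true.astype(np.float64)
    p = np.clip(y_pred.astype(np.float64), EPS, 1.0 - EPS)
    return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))

# Toy 4x4 mask: the "car" occupies the left half.
mask01 = np.zeros((4, 4))
mask01[:, :2] = 1.0
mask255 = mask01 * 255.0                    # the same mask stored as 0/255
pred = np.where(mask01 == 1.0, 0.99, 0.01)  # a confident, correct sigmoid output

print(dice_coef_np(mask01, pred), dice_coef_np(mask255, pred))  # ~0.99 vs ~1.97
print(bce_np(mask01, pred), bce_np(mask255, pred))              # small positive vs large negative
```

For a perfect prediction on a 0/255 mask, the Dice score works out to 2*255/(255 + 1), about 1.99, which is consistent with the 1.87 to 1.96 reported above.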
Finally, here is the whole network from this Kaggle kernel. My only modifications are switching the input to channels-last and changing axis=1 to axis=3 in the Concatenate layers:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from keras.optimizers import Adam

def unet(input_shape):
    input_ = Input(input_shape)
    # encoder: two 3x3 convolutions per level, then 2x2 max-pooling
    conv0 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(input_)
    conv0 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv0)
    pool0 = MaxPooling2D(pool_size=(2, 2))(conv0)
    conv1 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool0)
    conv1 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool1)
    conv2 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool2)
    conv3 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool3)
    conv4 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)
    # bottleneck
    conv5 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool4)
    conv5 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv5)
    # decoder: upsample, 2x2 conv, concatenate the skip connection along the channel axis (channels-last)
    up6 = Conv2D(128, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv5))
    merge6 = Concatenate(axis=3)([conv4, up6])
    conv6 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge6)
    conv6 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv6)
    up7 = Conv2D(64, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv6))
    merge7 = Concatenate(axis=3)([conv3, up7])
    conv7 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge7)
    conv7 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv7)
    up8 = Conv2D(32, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv7))
    merge8 = Concatenate(axis=3)([conv2, up8])
    conv8 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge8)
    conv8 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv8)
    up9 = Conv2D(16, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv8))
    merge9 = Concatenate(axis=3)([conv1, up9])
    conv9 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge9)
    conv9 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv9)
    # final upsampling back to the input resolution (no skip connection from conv0)
    up10 = Conv2D(16, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv9))
    conv10 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up10)
    # 1x1 convolution + sigmoid: one car probability per pixel
    conv11 = Conv2D(1, 1, activation='sigmoid')(conv10)
    model = Model(inputs=input_, outputs=conv11)
    model.compile(optimizer=Adam(lr=0.0005), loss='binary_crossentropy', metrics=[dice_coef])
    return model
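Because the network applies five 2x2 max-poolings and the decoder concatenates each upsampled map with conv4..conv1 along axis 3, the height and width at every level must line up, which requires the input side to be divisible by 2**5 = 32. A small plain-Python check makes this explicit; `unet_level_sizes` is a hypothetical helper, not part of the kernel.

```python
def unet_level_sizes(side, n_pools=5):
    """Spatial side length of conv0..conv5 for a U-Net with n_pools 2x2 poolings."""
    if side % (2 ** n_pools) != 0:
        raise ValueError(f"input side {side} is not divisible by {2 ** n_pools}")
    return [side // 2 ** level for level in range(n_pools + 1)]

sizes = unet_level_sizes(1024)
print(sizes)  # [1024, 512, 256, 128, 64, 32]
# Each UpSampling2D doubles the size, so up6..up9 match conv4..conv1,
# and up10 brings the output back to the 1024x1024 input resolution.
```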
Finally, here is how I test the model's mask output on the first image of the training set:
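import cv2
import numpy as np
from skimage import io

# cv2.resize expects (width, height); equivalent here because the input is square
pic = cv2.resize(io.imread('Carvana/train/train/0cdf5b5d0ce1_01.jpg'), input_shape[:2])
pic = pic.reshape(1, input_shape[0], input_shape[1], input_shape[2])
res = model2.predict(pic)
print(res[0].shape)
res = np.array(res[0])
# tint the (H, W, 1) probability map by scaling it into three colour channels
r = res * 200
g = res * 1
b = res * 70
res = np.concatenate((r, g, b), axis=2)
io.imshow(res)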
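For visual checks it may be clearer to threshold the sigmoid output and build a uint8 image, so that `io.imshow` does not receive floats far above 1. `mask_to_rgb` is a hypothetical helper; the 0.5 threshold and the colour (200, 1, 70), taken from the r/g/b factors above, are arbitrary choices.

```python
import numpy as np

def mask_to_rgb(prob, threshold=0.5, color=(200, 1, 70)):
    """Turn an (H, W, 1) sigmoid output into an (H, W, 3) uint8 image."""
    binary = prob[..., 0] > threshold              # (H, W) boolean car mask
    rgb = np.zeros(binary.shape + (3,), dtype=np.uint8)
    rgb[binary] = color                            # paint car pixels; background stays black
    return rgb

prob = np.array([[[0.2], [0.7]],
                 [[0.9], [0.1]]])
img = mask_to_rgb(prob)
print(img[0, 1], img[0, 0])  # [200   1  70] [0 0 0]
```

With the model above this would be used as `io.imshow(mask_to_rgb(model2.predict(pic)[0]))`.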
Sorry for such a long post, but I am unable to pinpoint the exact error I have made. Any help is much appreciated.
python keras computer-vision image-segmentation kaggle
asked 12 hours ago
Shubham Debnath
Many things could be going wrong. First off, point number 2: the Carvana dataset has coloured images, so 3 channels, not 1. The UNet's output has only 1 class because it predicts car/no car. As a tip, a good way to debug is usually to call the generator (next(train_generator(...))) and check what the network will be getting.
– lorenzori
11 hours ago
Input images are coloured, so I've read them with 3 channels, but the masks are binary; that's why I thought to use grayscale. Yes, usually segmentation has a mask with n channels for n objects, but here car and background are the only 2 classes. next(train_generator) is useful, but I already know the expected shape from model.summary().
– Shubham Debnath
9 hours ago
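Regarding binary masks read as grayscale: it is worth checking the actual pixel values, not just the channel count. As far as I know, Keras' image loader calls PIL's `Image.convert('L')` for color_mode='grayscale', and for a palette (GIF) image that conversion looks each index up in the colour table, so indices 0/1 with a black/white palette come out as 0/255. The sketch below builds such an image in memory; that the Carvana GIFs use exactly this palette is an assumption to confirm with `np.unique` on a real mask.

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for a mask: a palette ("P") image whose pixel indices are
# 0 (background) and 1 (car), with palette colours black and white.
mask = Image.new("P", (4, 4), 0)
mask.putpalette([0, 0, 0, 255, 255, 255])
mask.paste(1, (0, 0, 2, 4))  # left half gets palette index 1 (car)

raw_indices = np.array(mask)                 # palette indices as stored in the image
as_grayscale = np.array(mask.convert("L"))   # what a grayscale conversion produces

print(np.unique(raw_indices))   # [0 1]
print(np.unique(as_grayscale))  # [  0 255]
```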
Use next(train_generator) to make sure your data is in the right shape, not the model. Also, are you scaling or normalizing your data?
– lorenzori
8 hours ago
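Along the lines of this suggestion, here is a sketch that summarises one batch's values as well as its shapes. `describe_batch` is a hypothetical helper, and the arrays below are synthetic, only mimicking what unscaled generators would yield.

```python
import numpy as np

def describe_batch(images, masks):
    """Summarise one (images, masks) batch, e.g. x, y = next(train_generator)."""
    return {
        "images_shape": images.shape,
        "masks_shape": masks.shape,
        "images_range": (float(images.min()), float(images.max())),
        "masks_range": (float(masks.min()), float(masks.max())),
        # binary_crossentropy and dice_coef both expect targets in [0, 1]
        "masks_in_unit_range": bool(masks.min() >= 0.0 and masks.max() <= 1.0),
    }

# Synthetic stand-in for one batch from the question's generators (no rescaling).
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(4, 64, 64, 3)).astype(np.float32)
y = np.zeros((4, 64, 64, 1), dtype=np.float32)
y[:, 16:48, 16:48, :] = 255.0
report = describe_batch(x, y)
print(report["masks_range"], report["masks_in_unit_range"])  # (0.0, 255.0) False
```

On the real data this would be `x, y = next(train_generator)` followed by `print(describe_batch(x, y))`.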
I am only resizing. I was normalizing and scaling earlier but removed that for testing, and yes, I have checked the shape of both the input and target data.
– Shubham Debnath
8 hours ago
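Since the data is only resized, one way to rule out scaling problems without modifying Keras is to wrap the zipped generators so that images are divided by 255 and masks are binarized. `rescaled_pairs` is a hypothetical helper; the 0.5 threshold after scaling is an assumption, and it would also remove any intermediate values that interpolation during rotation or shifting might introduce. Passing `rescale=1./255` to both ImageDataGenerator instances is the built-in alternative for the scaling part.

```python
import numpy as np

def rescaled_pairs(pairs, threshold=0.5):
    """Wrap an iterator of (images, masks) batches whose values lie in [0, 255]."""
    for images, masks in pairs:
        x = images.astype(np.float32) / 255.0
        y = (masks.astype(np.float32) / 255.0 > threshold).astype(np.float32)
        yield x, y

# Usage with the question's generators (not run here):
#   train_generator = rescaled_pairs(zip(image_generator, mask_generator))
fake = [(np.full((1, 2, 2, 3), 255.0),
         np.array([[[[255.0], [0.0]], [[200.0], [30.0]]]]))]
x, y = next(rescaled_pairs(iter(fake)))
```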
OK, then if the data supplied to the net is good (not just the dimensions, but also the values vs. the masks), and the net and the loss look fine, I am not sure how to help! I would also scrap flow_from_directory and try writing my own generator to make sure it behaves as expected (it doesn't take long). Could you also try class_mode='binary' as a parameter?
– lorenzori
7 hours ago
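Following the suggestion to write a custom generator, here is a minimal sketch that pairs each image with its mask by index and shuffles both with one shared permutation, so they cannot drift apart the way two independently listed directories could. The loader callables `load_image` and `load_mask` are hypothetical stand-ins for reading and resizing a file, and augmentation is deliberately left out.

```python
import numpy as np

def paired_batches(image_paths, mask_paths, batch_size, load_image, load_mask, seed=1):
    """Yield (images, masks) batches forever, keeping each image paired with its own mask."""
    if len(image_paths) != len(mask_paths):
        raise ValueError("every image needs exactly one mask")
    rng = np.random.default_rng(seed)
    n = len(image_paths)
    while True:
        order = rng.permutation(n)  # one shared shuffle for images and masks
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            # images scaled from [0, 255] to [0, 1]
            x = np.stack([load_image(image_paths[i]) for i in idx]).astype(np.float32) / 255.0
            # assumption: load_mask returns an (H, W) array where any nonzero value means "car"
            y = np.stack([load_mask(mask_paths[i]) > 0 for i in idx]).astype(np.float32)
            yield x, y[..., np.newaxis]  # (batch, H, W, 1), matching the sigmoid output

# Tiny in-memory usage example with hypothetical loaders (dict lookups instead of file reads).
fake_images = {f"img{i}.jpg": np.full((4, 4, 3), 255, dtype=np.uint8) for i in range(3)}
fake_masks = {f"img{i}_mask.gif": np.eye(4, dtype=np.uint8) * 255 for i in range(3)}
gen = paired_batches(sorted(fake_images), sorted(fake_masks), batch_size=2,
                     load_image=fake_images.__getitem__, load_mask=fake_masks.__getitem__)
x, y = next(gen)
```

The resulting generator could then be passed to `fit_generator` in place of the zipped one.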
use
next(train_generator)to make sure your data is in the right shape not the model. Also, are you scaling or normalizing your data?– lorenzori
8 hours ago
use
next(train_generator)to make sure your data is in the right shape not the model. Also, are you scaling or normalizing your data?– lorenzori
8 hours ago
I am only resizing , i was normalizing and scaling earlier but then removed them for testing , and yes I have checked the shape of both input and target data
– Shubham Debnath
8 hours ago
I am only resizing , i was normalizing and scaling earlier but then removed them for testing , and yes I have checked the shape of both input and target data
– Shubham Debnath
8 hours ago
ok then if the data supplied to the net is good (not just dimensions, also values vs masks), the net looks fine and the loss as well then I am not sure how to help! I would also scrap the
flow_from_directory and try to write my own generator to make sure it behaves as expected (not long to do). Could you try the class_mode='binary' as parameter as well?– lorenzori
7 hours ago
ok then if the data supplied to the net is good (not just dimensions, also values vs masks), the net looks fine and the loss as well then I am not sure how to help! I would also scrap the
flow_from_directory and try to write my own generator to make sure it behaves as expected (not long to do). Could you try the class_mode='binary' as parameter as well?– lorenzori
7 hours ago
add a comment |
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53942634%2funet-architecture-on-carvana-dataset%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53942634%2funet-architecture-on-carvana-dataset%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
many things could be going wrong. First off, point number 2: Caravana dataset has coloured images so 3 channels, not 1. The UNET's output has only 1 class because it predicts car/no car. As a tip, a good way to debug usually is to call the generator (
next(train_generator(...))) and check what the network will be getting.– lorenzori
11 hours ago
input images are colored, so i've read them in 3 channels, but masks are binary, thats why i thought to use grayscale, yes usually segmentation has a mask with n channels for n objects, but here only car and background are the 2 classes
next(train_generator)is useful, but i already know the shape expected frommodel.summary()– Shubham Debnath
9 hours ago
use
next(train_generator)to make sure your data is in the right shape not the model. Also, are you scaling or normalizing your data?– lorenzori
8 hours ago
I am only resizing , i was normalizing and scaling earlier but then removed them for testing , and yes I have checked the shape of both input and target data
– Shubham Debnath
8 hours ago
ok then if the data supplied to the net is good (not just dimensions, also values vs masks), the net looks fine and the loss as well then I am not sure how to help! I would also scrap the
flow_from_directoryand try to write my own generator to make sure it behaves as expected (not long to do). Could you try theclass_mode='binary'as parameter as well?– lorenzori
7 hours ago