Unet architecture on Carvana dataset












I have followed zhixuhao's unet implementation and another implementation from Kaggle.

The two models are not very different from each other, except that the former has a few extra layers and thus almost 30 million more parameters.

My problem is that I cannot get either model to perform well: both end up with a binary_crossentropy loss of around -800, whether I use accuracy or dice_coef as the metric. Please help me find where I am going wrong. Here are some of my suspicions:



1) One interesting thing I noticed: dice_coef reaches up to 1.9 within a single epoch, which shouldn't be possible, as it should not exceed 1. Here is the dice_coef function from the Kaggle link:



from keras import backend as K

def dice_coef(y_true, y_pred, smooth=0):
    # Flatten both tensors so the sums run over every pixel in the batch
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / ((K.sum(y_true_f) + K.sum(y_pred_f)) + smooth)
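The bound of 1 only holds when both y_true and y_pred lie in [0, 1]. A minimal NumPy sketch of the same formula shows what happens otherwise. The helper name dice_np and the synthetic 4x4 mask are made up for illustration, and whether the masks in this pipeline really are unscaled is an assumption that would need to be checked against the actual generator output.

```python
import numpy as np

def dice_np(y_true, y_pred, smooth=0.0):
    """NumPy mirror of dice_coef above (hypothetical helper, for illustration only)."""
    y_true_f = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred_f = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

# Synthetic 4x4 mask: a 2x2 "car" region on an empty background.
mask01 = np.zeros((4, 4))
mask01[1:3, 1:3] = 1.0
perfect_pred = mask01.copy()  # a prediction that matches the mask exactly

dice_unit = dice_np(mask01, perfect_pred)         # mask values in {0, 1}
dice_raw = dice_np(mask01 * 255.0, perfect_pred)  # same mask left in {0, 255}
print(dice_unit, dice_raw)
```

With a {0, 1} mask a perfect prediction gives exactly 1, while the same mask stored as {0, 255} gives 510/256, which is close to the 1.96 seen in the training log.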


2) The flow_from_directory() function in Keras doesn't read .gif files by default (the mask images are in .gif format). So I followed this advice and added gif in keras/preprocessing/image.py. Then, when reading the masks through flow_from_directory(), I set color_mode = 'grayscale' so that the target image has 1 channel, since the last layer of the UNet architecture has a 1-channel output. If I read a mask myself with skimage.io.imread(), the gif image has shape (1024, 1024), i.e. 1 channel.



3) I also thought that maybe the image augmentations were responsible. I have mainly used the default augmentations from Keras. Here is the whole image reading and augmentation part:



from keras.preprocessing.image import ImageDataGenerator

input_shape = (1024, 1024, 3)
batch_size = 4
# we create two instances with the same arguments
data_gen_args = dict(rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1

image_generator = image_datagen.flow_from_directory(
    'Carvana/train',
    target_size=(input_shape[0], input_shape[1]),
    batch_size=batch_size,
    class_mode=None,
    seed=seed)

mask_generator = mask_datagen.flow_from_directory(
    'Carvana/train_masks',
    target_size=(input_shape[0], input_shape[1]),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode=None,
    seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

model2 = unet(input_shape)
model2.fit_generator(
    train_generator,
    steps_per_epoch=50,
    epochs=2)
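Before training, it can help to look at the values in one batch and not only at its shape. The sketch below is a hypothetical helper (describe_batch is not part of Keras) that runs on stand-in arrays; with the real pipeline the two arrays would come from next(train_generator). The {0, 255} mask values in the stand-in data are an assumption about what an unscaled grayscale mask might look like, not a measurement.

```python
import numpy as np

def describe_batch(name, batch):
    """Summarise one batch: shape, dtype and value range (hypothetical helper)."""
    batch = np.asarray(batch)
    summary = {
        "shape": batch.shape,
        "dtype": str(batch.dtype),
        "min": float(batch.min()),
        "max": float(batch.max()),
        "n_unique": int(np.unique(batch).size),
    }
    print(name, summary)
    return summary

# Stand-in arrays with the same rank as the real batches (tiny spatial size).
# With the real pipeline: images, masks = next(train_generator)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 16, 16, 3)).astype("float32")
masks = (rng.integers(0, 2, size=(4, 16, 16, 1)) * 255).astype("float32")

image_info = describe_batch("images", images)
mask_info = describe_batch("masks", masks)
```

For a sigmoid output trained with binary_crossentropy, the mask summary would be expected to show a minimum of 0 and a maximum of 1; more than two unique values can also appear after resizing or rotation because of interpolation.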


The training output is:



Found 5088 images belonging to 1 classes.
Found 5088 images belonging to 1 classes.
Epoch 1/2
50/50 [==============================] - 66s 1s/step - loss: -724.1043 - dice_coef: 1.8661
Epoch 2/2
50/50 [==============================] - 64s 1s/step - loss: -829.2828 - dice_coef: 1.9626
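Binary cross-entropy cannot be negative when targets and predictions both lie in [0, 1], so a loss of -800 suggests that at least one of them is outside that range. The NumPy sketch below writes the loss out by hand to show the effect; bce_np and the four sample values are illustrative assumptions, not taken from Keras or from the actual data.

```python
import numpy as np

def bce_np(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy written out in NumPy (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_pixel))

pred = np.array([0.9, 0.9, 0.1, 0.1])      # sigmoid outputs
target01 = np.array([1.0, 1.0, 0.0, 0.0])  # mask scaled to {0, 1}
target255 = target01 * 255.0               # same mask left in {0, 255}

loss_unit = bce_np(target01, pred)
loss_raw = bce_np(target255, pred)
print(loss_unit, loss_raw)
```

With targets above 1, the (1 - y_true) factor becomes negative, which flips the sign of the second term and lets the loss fall without bound as predictions approach 1.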


Finally, here is the whole network from this Kaggle kernel. My only modifications are changing the input to channels-last and using axis = 3 instead of axis = 1 in the concatenation layers:



from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from keras.optimizers import Adam

def unet(input_shape):
    input_ = Input(input_shape)
    # Encoder: two 3x3 convolutions per level, then 2x2 max pooling
    conv0 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(input_)
    conv0 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv0)
    pool0 = MaxPooling2D(pool_size=(2, 2))(conv0)

    conv1 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool0)
    conv1 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool1)
    conv2 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool2)
    conv3 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool3)
    conv4 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    # Bottleneck
    conv5 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool4)
    conv5 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv5)

    # Decoder: upsample, concatenate with the matching encoder level on the channel axis
    up6 = Conv2D(128, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv5))
    merge6 = Concatenate(axis=3)([conv4, up6])
    conv6 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge6)
    conv6 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv6)

    up7 = Conv2D(64, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv6))
    merge7 = Concatenate(axis=3)([conv3, up7])
    conv7 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge7)
    conv7 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv7)

    up8 = Conv2D(32, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv7))
    merge8 = Concatenate(axis=3)([conv2, up8])
    conv8 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge8)
    conv8 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv8)

    up9 = Conv2D(16, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv8))
    merge9 = Concatenate(axis=3)([conv1, up9])
    conv9 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge9)
    conv9 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv9)

    up10 = Conv2D(16, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv9))

    conv10 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up10)
    # 1x1 convolution with sigmoid: one probability per pixel
    conv11 = Conv2D(1, 1, activation='sigmoid')(conv10)

    model = Model(inputs=input_, outputs=conv11)

    model.compile(optimizer=Adam(lr=0.0005), loss='binary_crossentropy', metrics=[dice_coef])

    return model


Finally, here is how I test the model's mask output on the first image in the training dataset:



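import cv2
import numpy as np
from skimage import io

# Read the first training image and resize it to the network input size
pic = cv2.resize(io.imread('Carvana/train/train/0cdf5b5d0ce1_01.jpg'), input_shape[:2])
pic = pic.reshape(1, input_shape[0], input_shape[1], input_shape[2])
res = model2.predict(pic)
print(res[0].shape)
res = np.array(res[0])
# Tint the single-channel prediction so it can be shown as an RGB image
r = res * 200
g = res * 1
b = res * 70
res = np.concatenate((r, g, b), axis=2)
io.imshow(res)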
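As a possible alternative for the display step, the single-channel probability map can be thresholded and tinted into a uint8 image, which avoids passing float values above 1 to the viewer. This is only a sketch: mask_to_rgb, the 0.5 threshold and the fake prediction are assumptions for illustration, and the colour is the same (200, 1, 70) tint used above.

```python
import numpy as np

def mask_to_rgb(prob_mask, threshold=0.5, color=(200, 1, 70)):
    """Turn an (H, W, 1) probability map into an (H, W, 3) uint8 image (hypothetical helper)."""
    prob_mask = np.asarray(prob_mask, dtype=np.float32)
    binary = (prob_mask > threshold).astype(np.uint8)  # (H, W, 1), values 0 or 1
    # Broadcasting (H, W, 1) against (3,) yields (H, W, 3).
    return binary * np.array(color, dtype=np.uint8)

# Fake sigmoid output: a confident 4x4 square on an empty 8x8 background.
fake_pred = np.zeros((8, 8, 1), dtype=np.float32)
fake_pred[2:6, 2:6, 0] = 0.9
rgb = mask_to_rgb(fake_pred)
print(rgb.shape, rgb.dtype)
```

With the real model, res[0] from model2.predict(pic) would take the place of fake_pred.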


I am sorry for such a long post, but I am unable to pinpoint the exact error I have made. Any help is much appreciated.
































  • Many things could be going wrong. First off, point number 2: the Carvana dataset has coloured images, so 3 channels, not 1. The UNet's output has only 1 class because it predicts car/no car. As a tip, a good way to debug is usually to call the generator (next(train_generator(...))) and check what the network will be getting.
    – lorenzori
    11 hours ago












  • The input images are coloured, so I've read them with 3 channels, but the masks are binary, which is why I thought to use grayscale. Yes, segmentation usually has a mask with n channels for n objects, but here car and background are the only 2 classes. next(train_generator) is useful, but I already know the shape expected from model.summary().
    – Shubham Debnath
    9 hours ago










  • Use next(train_generator) to make sure your data is in the right shape, not the model. Also, are you scaling or normalizing your data?
    – lorenzori
    8 hours ago










  • I am only resizing. I was normalizing and scaling earlier but then removed both for testing. And yes, I have checked the shape of both the input and target data.
    – Shubham Debnath
    8 hours ago










  • OK, then if the data supplied to the net is good (not just dimensions, also values vs masks), and the net and the loss look fine, then I am not sure how to help! I would also scrap flow_from_directory and try writing my own generator to make sure it behaves as expected (it doesn't take long). Could you also try class_mode='binary' as a parameter?
    – lorenzori
    7 hours ago
















python keras computer-vision image-segmentation kaggle
















asked 12 hours ago









Shubham Debnath













  • many things could be going wrong. First off, point number 2: Caravana dataset has coloured images so 3 channels, not 1. The UNET's output has only 1 class because it predicts car/no car. As a tip, a good way to debug usually is to call the generator (next(train_generator(...))) and check what the network will be getting.
    – lorenzori
    11 hours ago












  • The input images are coloured, so I've read them with 3 channels, but the masks are binary, which is why I thought to use grayscale. Yes, segmentation usually has a mask with n channels for n objects, but here car and background are the only 2 classes. next(train_generator) is useful, but I already know the shape expected from model.summary().
    – Shubham Debnath
    9 hours ago










  • Use next(train_generator) to make sure your data is in the right shape, not the model. Also, are you scaling or normalizing your data?
    – lorenzori
    8 hours ago










  • I am only resizing. I was normalizing and scaling earlier but then removed both for testing, and yes, I have checked the shape of both the input and the target data.
    – Shubham Debnath
    8 hours ago










  • OK, then if the data supplied to the net is good (not just dimensions, also values vs. masks), and the net and the loss look fine as well, then I am not sure how to help! I would also scrap flow_from_directory and try writing my own generator to make sure it behaves as expected (it doesn't take long). Could you also try class_mode='binary' as a parameter?
    – lorenzori
    7 hours ago
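
The value check suggested in the comments can be sketched in plain NumPy. This is a minimal illustration, assuming (the post does not confirm it) that the masks reach the network unscaled as 0/255. The helper names describe, bce and dice, the 2x2 mask and the prediction values are all made up for the example. dice follows the same formula as dice_coef in the question, and bce is the standard binary cross-entropy written out by hand rather than the Keras implementation.

```python
import numpy as np

def describe(name, batch):
    """Summarise a batch: shape, dtype and value range (hypothetical helper)."""
    batch = np.asarray(batch)
    return "{}: shape={} dtype={} min={} max={}".format(
        name, batch.shape, batch.dtype, batch.min(), batch.max())

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, written out by hand in NumPy."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1.0 - y_true) * np.log(1.0 - y_pred))))

def dice(y_true, y_pred, smooth=0.0):
    """Same formula as dice_coef in the question, in NumPy."""
    intersection = np.sum(y_true * y_pred)
    return float((2.0 * intersection + smooth)
                 / (np.sum(y_true) + np.sum(y_pred) + smooth))

# Assumed: a mask as it might come out of the generator without rescaling.
raw_mask = np.array([[255.0, 0.0], [255.0, 255.0]])
scaled_mask = raw_mask / 255.0                  # values in {0, 1}
pred = np.array([[0.9, 0.1], [0.8, 0.95]])      # made-up sigmoid outputs

print(describe("raw mask", raw_mask))
print("unscaled: bce=%.2f dice=%.2f" % (bce(raw_mask, pred), dice(raw_mask, pred)))
print("scaled:   bce=%.2f dice=%.2f" % (bce(scaled_mask, pred), dice(scaled_mask, pred)))
```

With targets outside [0, 1] the cross-entropy turns negative and the Dice value exceeds 1, while the rescaled mask keeps both in their usual ranges. Calling describe on the images and masks returned by next(train_generator) would show whether this applies to the actual data.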

















