Unet architecture on Carvana dataset

I have followed zhixuhao's unet implementation and another implementation from kaggle.

The two models are not very different from each other, except that the former has a few extra layers and thus almost 30 million more parameters.

My problem is that I cannot get either model to perform well: with binary_crossentropy as the loss and accuracy or dice_coef as the metric, the loss drops to around -800 for both models. Please help me find where I am going wrong.
Here are some of my suspicions:

1) One interesting thing I noticed: dice_coef reaches up to 1.9 within a single epoch (which shouldn't be possible, as it should be at most 1). Here is the dice_coef function from the Kaggle link:

from keras import backend as K

def dice_coef(y_true, y_pred, smooth=0):
    # Flatten both tensors so the sums run over every pixel in the batch.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
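To make the expected range concrete, here is a NumPy sketch of the same formula. The name dice_coef_np and the toy arrays are hypothetical and not part of the original code. With both arrays in [0, 1] the value cannot exceed 1; the second call assumes, purely for illustration, a mask stored as 0/255, which is one way a value above 1 could arise.

```python
import numpy as np

def dice_coef_np(y_true, y_pred, smooth=0.0):
    """NumPy version of the Dice formula above (hypothetical helper)."""
    y_true_f = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred_f = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

# Toy 2x2 example: the mask marks two pixels, the prediction is confident on both.
mask_01 = np.array([[1, 1], [0, 0]])
pred = np.array([[0.9, 0.9], [0.1, 0.1]])

dice_scaled = dice_coef_np(mask_01, pred)          # mask values in {0, 1}
dice_unscaled = dice_coef_np(mask_01 * 255, pred)  # assumed mask values in {0, 255}
print(dice_scaled, dice_unscaled)
```

If the second number is above 1, the metric itself is behaving as written and the range of y_true is worth checking.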

2) The flow_from_directory() function in Keras doesn't read .gif files by default (the mask images are in .gif format), so I followed this advice and added gif in keras/preprocessing/image.py. Then, when reading the masks through flow_from_directory(), I set color_mode = 'grayscale' so that the target image has 1 channel, since the last layer of the UNet architecture has a 1-channel output. If I read a mask myself through skimage.io.imread(), the gif image has shape (1024, 1024), i.e. 1 channel.

3) I also thought that the image augmentations might be responsible. I have mainly used the default augmentations from Keras. Here is the whole image reading and augmentation part:

from keras.preprocessing.image import ImageDataGenerator

input_shape = (1024, 1024, 3)
batch_size = 4

# we create two instances with the same arguments
data_gen_args = dict(rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1

image_generator = image_datagen.flow_from_directory(
    'Carvana/train',
    target_size=(input_shape[0], input_shape[1]),
    batch_size=batch_size,
    class_mode=None,
    seed=seed)

mask_generator = mask_datagen.flow_from_directory(
    'Carvana/train_masks',
    target_size=(input_shape[0], input_shape[1]),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode=None,
    seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

model2 = unet(input_shape)
model2.fit_generator(
    train_generator,
    steps_per_epoch=50,
    epochs=2)

The training output is:

Found 5088 images belonging to 1 classes.
Found 5088 images belonging to 1 classes.
Epoch 1/2
50/50 [==============================] - 66s 1s/step - loss: -724.1043 - dice_coef: 1.8661
Epoch 2/2
50/50 [==============================] - 64s 1s/step - loss: -829.2828 - dice_coef: 1.9626
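For reference, binary cross-entropy is only guaranteed to be non-negative when every target lies in [0, 1]. The NumPy sketch below shows how a target outside that range can push the value below zero. The helper bce_np is hypothetical, and the 0/255 target is an assumption made for illustration, not something the log above confirms.

```python
import numpy as np

def bce_np(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy in NumPy (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_pixel))

pred = np.array([0.9, 0.9, 0.1, 0.1])
loss_scaled = bce_np([1, 1, 0, 0], pred)        # targets in {0, 1}
loss_unscaled = bce_np([255, 255, 0, 0], pred)  # assumed targets in {0, 255}
print(loss_scaled, loss_unscaled)
```

With targets in {0, 1} every per-pixel term is a negated log of a probability and therefore non-negative; once a target exceeds 1, the factor (1 - y_true) turns negative and the sum can fall below zero.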

Finally, here is the whole network from this Kaggle kernel. My only modifications are changing the input to channels-last and using axis = 3 instead of axis = 1 in the concatenation layers.

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from keras.optimizers import Adam

def unet(input_shape):
    input_ = Input(input_shape)

    # Encoder: two 3x3 convolutions per level, then 2x2 max pooling.
    conv0 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(input_)
    conv0 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv0)
    pool0 = MaxPooling2D(pool_size=(2, 2))(conv0)

    conv1 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool0)
    conv1 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool1)
    conv2 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool2)
    conv3 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool3)
    conv4 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    # Bottleneck.
    conv5 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool4)
    conv5 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv5)

    # Decoder: upsample, concatenate with the matching encoder level on the channel axis, convolve.
    up6 = Conv2D(128, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv5))
    merge6 = Concatenate(axis=3)([conv4, up6])
    conv6 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge6)
    conv6 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv6)

    up7 = Conv2D(64, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv6))
    merge7 = Concatenate(axis=3)([conv3, up7])
    conv7 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge7)
    conv7 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv7)

    up8 = Conv2D(32, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv7))
    merge8 = Concatenate(axis=3)([conv2, up8])
    conv8 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge8)
    conv8 = Conv2D(32, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv8)

    up9 = Conv2D(16, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv8))
    merge9 = Concatenate(axis=3)([conv1, up9])
    conv9 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge9)
    conv9 = Conv2D(16, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv9)

    up10 = Conv2D(16, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv9))

    conv10 = Conv2D(8, 3, activation='relu', padding='same', kernel_initializer='he_normal')(up10)
    # Single-channel sigmoid output: per-pixel probability of "car".
    conv11 = Conv2D(1, 1, activation='sigmoid')(conv10)

    model = Model(inputs=input_, outputs=conv11)

    model.compile(optimizer=Adam(lr=0.0005), loss='binary_crossentropy', metrics=[dice_coef])

    return model
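As a quick sanity check on the architecture, the spatial sizes can be traced by hand: five 2x2 poolings halve the side five times and five 2x2 upsamplings double it back. The pure-Python sketch below does that arithmetic; unet_spatial_sizes is a hypothetical helper and does not build any Keras layers.

```python
def unet_spatial_sizes(input_size=1024, n_pool=5):
    """Trace the side length through n_pool poolings and n_pool upsamplings."""
    sizes = [input_size]
    for _ in range(n_pool):
        sizes.append(sizes[-1] // 2)  # effect of MaxPooling2D(pool_size=(2, 2))
    for _ in range(n_pool):
        sizes.append(sizes[-1] * 2)   # effect of UpSampling2D(size=(2, 2))
    return sizes

sizes = unet_spatial_sizes()
print(sizes)
```

For a 1024-pixel side the bottleneck works at 32 pixels and the output returns to 1024, so each concatenation joins tensors of equal height and width.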

Finally, here is how I test the model's mask output on the first image of the training dataset:

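import cv2
import numpy as np
from skimage import io

# Read the first training image and resize it to the network's input size.
pic = cv2.resize(io.imread('Carvana/train/train/0cdf5b5d0ce1_01.jpg'), input_shape[:2])
# Add the batch axis expected by predict().
pic = pic.reshape(1, input_shape[0], input_shape[1], input_shape[2])
res = model2.predict(pic)
print(res[0].shape)

# Tint the single-channel prediction so it can be shown as an RGB image.
res = np.array(res[0])
r = res * 200
g = res * 1
b = res * 70
res = np.concatenate((r, g, b), axis=2)
io.imshow(res)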

I am sorry for such a long post, but I am unable to pinpoint the exact error I have made. Any help is much appreciated.

asked 12 hours ago

Shubham Debnath

235

Many things could be going wrong. First off, point number 2: the Carvana dataset has coloured images, so 3 channels, not 1. The UNet's output has only 1 class because it predicts car/no car. As a tip, a good way to debug is usually to call the generator (next(train_generator(...))) and check what the network will be getting.
– lorenzori
11 hours ago

The input images are colored, so I've read them with 3 channels, but the masks are binary; that's why I chose grayscale. Yes, segmentation usually has a mask with n channels for n objects, but here car and background are the only 2 classes. next(train_generator) is useful, but I already know the shape expected from model.summary().
– Shubham Debnath
9 hours ago

Use next(train_generator) to make sure your data is in the right shape, not the model. Also, are you scaling or normalizing your data?
– lorenzori
8 hours ago
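A sketch of the kind of check suggested here: pull one batch and look at shape, dtype and value range, not just shape. fake_generator below is a hypothetical stand-in for train_generator so the snippet runs without the dataset, and its 0/255 mask values are an assumption chosen for illustration.

```python
import numpy as np

def describe(name, batch):
    """Summarise one batch: shape, dtype and value range."""
    batch = np.asarray(batch)
    return "%s: shape=%s dtype=%s min=%s max=%s" % (
        name, batch.shape, batch.dtype, batch.min(), batch.max())

def fake_generator(batch_size=4, size=8):
    """Hypothetical stand-in for train_generator; yields (images, masks) forever."""
    rng = np.random.RandomState(0)
    while True:
        images = rng.randint(0, 256, size=(batch_size, size, size, 3)).astype("float32")
        masks = (rng.rand(batch_size, size, size, 1) > 0.5).astype("float32") * 255.0
        yield images, masks

images, masks = next(fake_generator())
print(describe("images", images))
print(describe("masks", masks))
```

With the real generator the same two describe() calls would show whether the masks arrive as 0/1 or in some other range.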

I am only resizing. I was normalizing and scaling earlier but then removed them for testing, and yes, I have checked the shape of both input and target data.
– Shubham Debnath
8 hours ago

OK, then if the data supplied to the net is good (not just dimensions, also values vs. masks), the net looks fine and the loss as well, then I am not sure how to help! I would also scrap flow_from_directory and try to write my own generator to make sure it behaves as expected (not long to do). Could you try class_mode='binary' as a parameter as well?
– lorenzori
7 hours ago
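A minimal sketch of what such a hand-written generator might look like, assuming the images and masks are already loaded into NumPy arrays. paired_batches and the toy arrays are hypothetical, no augmentation is applied, and binarising the masks to {0, 1} is an assumption of this sketch, not something taken from the post.

```python
import numpy as np

def paired_batches(images, masks, batch_size=4, seed=1):
    """Yield (image_batch, mask_batch) forever, reshuffling on every pass.

    images: array of shape (N, H, W, 3); masks: array of shape (N, H, W).
    Masks are binarised to {0, 1} and given a trailing channel axis.
    """
    images = np.asarray(images, dtype=np.float32)
    masks = (np.asarray(masks) > 0).astype(np.float32)[..., np.newaxis]
    n = len(images)
    if n < batch_size:
        raise ValueError("need at least batch_size samples")
    rng = np.random.RandomState(seed)
    while True:
        order = rng.permutation(n)
        # The same index array selects images and masks, so pairs stay aligned.
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield images[idx], masks[idx]

# Toy data standing in for images and masks already loaded into memory.
toy_images = np.zeros((8, 16, 16, 3), dtype=np.uint8)
toy_masks = np.zeros((8, 16, 16), dtype=np.uint8)
toy_masks[:, 4:12, 4:12] = 255  # hypothetical square "car" stored as 255

x, y = next(paired_batches(toy_images, toy_masks, batch_size=4))
print(x.shape, y.shape, np.unique(y))
```

Because one index array drives both selections, image and mask cannot drift apart the way two separately seeded generators might.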

python keras computer-vision image-segmentation kaggle

asked 12 hours ago

Shubham Debnath

235

asked 12 hours ago

Shubham Debnath

235

asked 12 hours ago

Shubham Debnath

235

asked 12 hours ago

Shubham Debnath

235

asked 12 hours ago

Shubham Debnath

235

many things could be going wrong. First off, point number 2: Caravana dataset has coloured images so 3 channels, not 1. The UNET's output has only 1 class because it predicts car/no car. As a tip, a good way to debug usually is to call the generator (next(train_generator(...))) and check what the network will be getting.
– lorenzori
11 hours ago

input images are colored, so i've read them in 3 channels, but masks are binary, thats why i thought to use grayscale, yes usually segmentation has a mask with n channels for n objects, but here only car and background are the 2 classes next(train_generator) is useful, but i already know the shape expected from model.summary()
– Shubham Debnath
9 hours ago

use next(train_generator)to make sure your data is in the right shape not the model. Also, are you scaling or normalizing your data?
– lorenzori
8 hours ago

I am only resizing , i was normalizing and scaling earlier but then removed them for testing , and yes I have checked the shape of both input and target data
– Shubham Debnath
8 hours ago

ok then if the data supplied to the net is good (not just dimensions, also values vs masks), the net looks fine and the loss as well then I am not sure how to help! I would also scrap the flow_from_directory and try to write my own generator to make sure it behaves as expected (not long to do). Could you try the class_mode='binary' as parameter as well?
– lorenzori
7 hours ago
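
A rough sketch of what such a hand-written generator could look like, assuming the images and masks are already loaded into memory as NumPy arrays. Reading the files from disk is left out, and the name paired_batches and the toy data are hypothetical.

```python
import numpy as np

def paired_batches(images, masks, batch_size, seed=0):
    """Yield (image_batch, mask_batch) forever, reshuffling every epoch.

    `images` and `masks` are in-memory arrays here; loading them from disk
    is left out and would depend on your own file layout.
    """
    images = np.asarray(images, dtype="float32")
    masks = np.asarray(masks, dtype="float32")
    if len(images) != len(masks):
        raise ValueError("images and masks must have the same length")
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(len(images))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            # The same indices select both arrays, so pairs stay aligned.
            yield images[idx], masks[idx]

# Hypothetical data: image i is filled with the value i, and so is its mask.
images = np.stack([np.full((16, 16, 3), i) for i in range(10)])
masks = np.stack([np.full((16, 16, 1), i) for i in range(10)])

gen = paired_batches(images, masks, batch_size=4)
x, y = next(gen)
print(x.shape, y.shape)
```

Because one index array selects both the images and the masks, there is no reliance on two separate generators staying in sync through a shared seed. Any augmentation would have to be applied identically to each image and its mask inside the loop.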

