Evaluation and Visualization of DeeplabV3 completed or not?

by Satya Harish   Last Updated October 09, 2019 15:26 PM

Disclaimer: this is my first time trying machine learning! We need to automatically segment objects in an image from the background. Searching online, we found that DeepLab should suit our purpose. We downloaded DeepLab from its official site and followed all the instructions given there. We trained on the pascal_voc_2012 dataset with the command below:

python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=30000 \
    --train_split="train" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=513 \
    --train_crop_size=513 \
    --train_batch_size=1 \
    --dataset="pascal_voc_seg" \
    --tf_initial_checkpoint=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/checkpoint \
    --train_logdir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train$ \
    --dataset_dir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/tfrecord

Training finished after 50 hours. I then started the evaluation with the command below:

python deeplab/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size=513 \
    --eval_crop_size=513 \
    --dataset="pascal_voc_seg" \
    --checkpoint_dir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train/ \
    --eval_logdir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/eval/ \
    --dataset_dir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/tfrecord

After I ran the above command, it found one checkpoint correctly, but after that it stayed at this message:

"Waiting for checkpoint at home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train/"

So I terminated eval after 2 hours and started the visualization with the command below:

python deeplab/vis.py \
    --logtostderr \
    --vis_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --vis_crop_size=513 \
    --vis_crop_size=513 \
    --dataset="pascal_voc_seg" \
    --checkpoint_dir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train/ \
    --vis_logdir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/vis/ \
    --dataset_dir=/home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/tfrecord/

Visualization also ran for one checkpoint and then showed the same message as eval:

"Waiting for checkpoint at home/ktpl13/Desktop/models-master/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train/"

Again I terminated vis. A folder named "segmentation_results" was generated under vis. It contains a "prediction.png" for each input image, and every one of them is a completely black image.

Now my questions are:

  1. Did my evaluation and visualization complete, or am I doing something wrong?
  2. Why are all the predicted images black?

Answers 2

For future reference, I ran into the same problem. After I found out what happened I laughed so hard.

Both eval and vis ran as expected.

For eval, right above your output of "waiting for checkpoints," there should be a line that says "miou[your model accuracy here]". It is a tiny line and easy to miss.

For vis, you will find your segmented result in the vis logdir you provided in your vis command.

In more depth: both eval and vis have successfully analyzed the network you trained, and, as a feature, they keep waiting for more checkpoints in case you decide to train more networks to compare.
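To confirm that there really is only one checkpoint for eval and vis to pick up, you can inspect the training directory yourself. The sketch below assumes the usual TensorFlow 1.x layout, where the saver writes a small text file named `checkpoint` listing the latest and all retained checkpoints. The helper name `read_checkpoint_state` is made up for illustration and is not part of DeepLab.

```python
import re
from pathlib import Path


def read_checkpoint_state(train_dir):
    """Parse the text `checkpoint` index file in a training directory.

    Assumes the file holds lines such as
        model_checkpoint_path: "model.ckpt-30000"
        all_model_checkpoint_paths: "model.ckpt-30000"
    Returns (latest_checkpoint_or_None, list_of_all_checkpoints).
    """
    text = (Path(train_dir) / "checkpoint").read_text()
    latest = re.search(r'^model_checkpoint_path:\s*"([^"]*)"', text, re.MULTILINE)
    every = re.findall(r'^all_model_checkpoint_paths:\s*"([^"]*)"', text, re.MULTILINE)
    return (latest.group(1) if latest else None), every
```

If the latest entry is the checkpoint that eval already reported on, the "Waiting for checkpoint" message most likely just means there is nothing newer to process.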

Mong H. Ng
November 02, 2018 18:43 PM

About eval waiting for another checkpoint: by default it expects to run alongside the training process. To run the eval script only once, after training, add this flag to the eval.sh script:

--max_number_of_evaluations=1

And you can view the value using TensorBoard.
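As a rough sketch, here is how the eval command from the question could be assembled with that flag appended, written without spaces around the equals sign so the shell passes it as a single argument. The helper names `build_eval_command` and `run_eval_once` are hypothetical, and the three directory arguments stand in for the paths used in the question.

```python
import subprocess


def build_eval_command(checkpoint_dir, eval_logdir, dataset_dir):
    """Return the eval.py argument list from the question plus a one-shot flag."""
    return [
        "python", "deeplab/eval.py",
        "--logtostderr",
        "--eval_split=val",
        "--model_variant=xception_65",
        "--atrous_rates=6",
        "--atrous_rates=12",
        "--atrous_rates=18",
        "--output_stride=16",
        "--decoder_output_stride=4",
        "--eval_crop_size=513",
        "--eval_crop_size=513",
        "--dataset=pascal_voc_seg",
        f"--checkpoint_dir={checkpoint_dir}",
        f"--eval_logdir={eval_logdir}",
        f"--dataset_dir={dataset_dir}",
        # Stop after a single evaluation pass instead of waiting for new checkpoints.
        "--max_number_of_evaluations=1",
    ]


def run_eval_once(checkpoint_dir, eval_logdir, dataset_dir):
    """Run eval.py once; raises CalledProcessError if the script fails."""
    subprocess.run(
        build_eval_command(checkpoint_dir, eval_logdir, dataset_dir),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting issues. Run it from the `research` directory so that `deeplab/eval.py` resolves as it does in the question.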

The vis.sh script appears to be running correctly, since it is saving the images to the directory. The all-black images are a different problem (e.g., dataset configuration, label weights, colormap removal, etc.).
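One way to narrow down the black-image problem is to look at the actual pixel values in a prediction file. The sketch below is a hypothetical diagnostic, not part of DeepLab: a single value everywhere suggests the model predicted one class (for example background) for every pixel, while several small values suggest the file stores raw class indices that only look black in an image viewer. The loader assumes Pillow is installed, and the threshold of 30 is an arbitrary choice.

```python
def diagnose_prediction(pixels, dark_threshold=30):
    """Classify a flat sequence of grayscale pixel values (0-255).

    Returns "uniform" if every pixel has the same value, "dark-labels" if
    several distinct values exist but all are at or below dark_threshold,
    and "visible" otherwise.
    """
    values = set(pixels)
    if not values:
        raise ValueError("no pixels given")
    if len(values) == 1:
        return "uniform"
    if max(values) <= dark_threshold:
        return "dark-labels"
    return "visible"


def load_gray_pixels(path):
    """Load an image as a flat list of grayscale values (requires Pillow)."""
    from PIL import Image  # imported lazily so the diagnostic works without it

    with Image.open(path) as img:
        return list(img.convert("L").getdata())
```

For example, `diagnose_prediction(load_gray_pixels(path))` on one of the files in "segmentation_results" returning "uniform" would point toward a training or dataset issue rather than a display issue.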

October 09, 2019 15:25 PM
