In this project, I used the deep neural networks and convolutional neural networks I learned about in the course to classify traffic signs. I trained and validated the model on the German Traffic Sign Dataset. After training, I tested the model on images of German traffic signs that I found on the Internet.
The goals / steps of this project are the following:
- Load the data set
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
First, I imported the pickle module to load the data for this project.
Data Path: ./traffic-signs-data/
import pickle
training_file = "./traffic-signs-data/train.p"
validation_file= "./traffic-signs-data/valid.p"
testing_file = "./traffic-signs-data/test.p"
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']
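The loading step can be wrapped in a small helper that also checks that features and labels line up. This is only a sketch: the helper name `load_split` and the toy dictionary are hypothetical, and the real files hold image arrays rather than short lists.

```python
import os
import pickle
import tempfile


def load_split(path):
    """Load one pickled split and return (features, labels)."""
    with open(path, mode="rb") as f:
        data = pickle.load(f)
    features, labels = data["features"], data["labels"]
    # Every image needs exactly one label.
    if len(features) != len(labels):
        raise ValueError("features and labels differ in length")
    return features, labels


# Hypothetical stand-in for a real split such as train.p.
toy = {"features": [[0, 1], [2, 3], [4, 5]], "labels": [7, 7, 9]}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.p")
    with open(toy_path, mode="wb") as f:
        pickle.dump(toy, f)
    toy_X, toy_y = load_split(toy_path)

print(len(toy_X), len(toy_y))
```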
Let's first understand the overall size of the data.
import collections

label_counts = collections.Counter(y_train)
labels_title = [x[0] for x in sorted(label_counts.most_common())]
n_train = X_train.shape[0]
n_validation = X_valid.shape[0]
n_test = X_test.shape[0]
image_shape = (X_train.shape[1], X_train.shape[2], X_train.shape[3])
n_classes = len(labels_title)
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Number of training examples = 258000
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43
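Beyond the totals, it helps to know how unevenly the classes are represented. The sketch below summarizes a label list with `collections.Counter`; the function name `summarize_labels` and the toy labels are hypothetical and only illustrate the idea.

```python
import collections


def summarize_labels(labels):
    """Return the class count and the smallest/largest class sizes."""
    counts = collections.Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        "n_classes": len(counts),
        "smallest": smallest,
        "largest": largest,
        # A ratio far above 1 means the classes are imbalanced.
        "imbalance_ratio": largest / smallest,
    }


# Toy labels standing in for y_train.
toy_labels = [0, 0, 0, 0, 1, 1, 2]
summary = summarize_labels(toy_labels)
print(summary)
```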
Next, I use a bar chart to visualize the class distribution and display random training and test images to check that the two sets look consistent.
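import csv
import matplotlib.pyplot as plt

# Read the sign names. Assumes signnames.csv has a header row followed by (ClassId, SignName) rows.
with open('signnames.csv', newline='') as csvfile:
    rows = csv.reader(csvfile)
    next(rows)  # skip the header row (assumed)
    label_define = []
    for row in rows:
        label_define.append(row[1])
labels_total = [x[1] for x in sorted(label_counts.most_common())]
#print(labels_total)
# Alternate blue and red bars, one bar per class
plt.bar(labels_title, labels_total, 1, color=['b', 'r'], alpha=0.5)
plt.title('The count of each sign')
plt.ylabel('Numbers of Labels')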
print("**********Random Training Data!!**************")
#Show example of dataset
H_imgs = 4
V_imgs = 3
fig, axs = plt.subplots(H_imgs,V_imgs, figsize=(20, 25))
axs = axs.ravel()
for i in range(0 , H_imgs*V_imgs):
    # random.randint includes both endpoints, so stop at len - 1
    index = random.randint(0, len(X_train) - 1)
    image = X_train[index].squeeze()
    axs[i].imshow(image, cmap='gray', aspect='auto')
    axs[i].set_title(label_define[y_train[index]], fontsize=20)
print("**********Random Test Data!!**************")
H_imgs = 4
V_imgs = 3
fig, axs = plt.subplots(H_imgs,V_imgs, figsize=(20, 25))
axs = axs.ravel()
for i in range(0 , H_imgs*V_imgs):
    # random.randint includes both endpoints, so stop at len - 1
    index = random.randint(0, len(X_test) - 1)
    image = X_test[index].squeeze()
    axs[i].imshow(image, cmap='gray', aspect='auto')
    axs[i].set_title(label_define[y_test[index]], fontsize=20)
Because the training set is too small to train the model well, I generated additional training data by applying geometric transformations and by converting images to grayscale.
The transformation function, which combines rotation, translation, and shear, is adapted from a useful example I found on the Internet.
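To make the rotation step concrete, here is a minimal pure-Python sketch of the 2x3 matrix that `cv2.getRotationMatrix2D` is documented to return, applied to single points instead of whole images. The helper names `rotation_matrix_2d` and `apply_affine` are hypothetical and are not part of OpenCV or of this project.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Build a 2x3 affine matrix that rotates around `center`."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(matrix, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


# Rotate a 32x32 image frame by 10 degrees around its center.
M = rotation_matrix_2d(center=(16, 16), angle_deg=10)
center_after = apply_affine(M, (16, 16))
corner_after = apply_affine(M, (0, 0))
print(center_after, corner_after)
```

The center of rotation stays where it is, and with a scale of 1 every other point keeps its distance from the center, which is why small random angles produce plausible new views of the same sign.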
import cv2
import numpy as np

every_class_target = 6000
def transform_image(img,ang_range,shear_range,trans_range):
    # Rotation: random angle in [-ang_range/2, ang_range/2)
    ang_rot = ang_range*np.random.uniform()-ang_range/2
    rows,cols,ch = img.shape
    Rot_M = cv2.getRotationMatrix2D((cols/2,rows/2),ang_rot,1)
    # Translation
    tr_x = trans_range*np.random.uniform()-trans_range/2
    tr_y = trans_range*np.random.uniform()-trans_range/2
    Trans_M = np.float32([[1,0,tr_x],[0,1,tr_y]])
    # Shear
    pts1 = np.float32([[5,5],[20,5],[5,20]])
    pt1 = 5+shear_range*np.random.uniform()-shear_range/2
    pt2 = 20+shear_range*np.random.uniform()-shear_range/2
    pts2 = np.float32([[pt1,5],[pt2,pt1],[5,pt2]])
    shear_M = cv2.getAffineTransform(pts1,pts2)
    # Apply the three transforms in sequence
    img = cv2.warpAffine(img,Rot_M,(cols,rows))
    img = cv2.warpAffine(img,Trans_M,(cols,rows))
    img = cv2.warpAffine(img,shear_M,(cols,rows))
    return img
def gray_img(img):
    # Average the channels, then repeat the result so the image keeps 3 channels
    mean = np.mean(img, axis=2, keepdims=True).astype(np.uint8)
    mean_data = np.concatenate((mean,mean,mean),axis=2)
    return np.array(mean_data, np.int32)
def generate_images():
    new_x_train = []
    new_y_train = []
    index = 0
    for label_total in labels_total:
        images = []
        #gather images of the current class in a list
        for i in range(0, len(X_train)):
            if index == y_train[i]:
                images.append(X_train[i])
        #generate new images until the class reaches the target count
        while label_total < every_class_target:
            image = random.choice(images)
            pres = [transform_image(image,20,10,5), transform_image(image,20,10,5), transform_image(image,20,10,5), gray_img(image)]
            #chose 3:1 (transformed : grayscale)
            image = random.choice(pres)
            #image = transform_image(image,20,10,5)
            new_x_train.append(image)
            new_y_train.append(index)
            label_total += 1
        index += 1
    return new_x_train, new_y_train
print('Generating images')
new_x_train, new_y_train = generate_images()
print("**********Random Generated Data!!**************")
H_imgs = 4
V_imgs = 3
fig, axs = plt.subplots(H_imgs,V_imgs, figsize=(20, 25))
axs = axs.ravel()
for i in range(0 , H_imgs*V_imgs):
    # random.randint includes both endpoints, so stop at len - 1
    index = random.randint(0, len(new_x_train) - 1)
    image = np.array(new_x_train[index].squeeze(),np.uint8)
    axs[i].imshow(image, cmap='gray', aspect='auto')
    axs[i].set_title(label_define[new_y_train[index]], fontsize=20)
print('input images: ' + str(len(X_train)))
print('generated images: ' + str(len(new_x_train)))
X_train = np.append(X_train, new_x_train).reshape((-1,32,32,3))
y_train = np.append(y_train, new_y_train)
print('Total training images: ' + str(len(X_train)))
Then show the generated results
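The number of images generated for each class follows directly from `every_class_target`: a class is topped up until it reaches the target. The sketch below computes that plan for a toy label list. The function name `images_to_generate` is hypothetical and is not part of the project code.

```python
import collections


def images_to_generate(labels, target_per_class):
    """Return {class_id: number of extra images needed to reach the target}."""
    counts = collections.Counter(labels)
    return {
        label: max(0, target_per_class - n)
        for label, n in sorted(counts.items())
    }


# Toy labels standing in for y_train, with a small target for readability.
toy_labels = [0, 0, 0, 1, 2, 2]
plan = images_to_generate(toy_labels, target_per_class=4)
total_after = len(toy_labels) + sum(plan.values())
print(plan, total_after)
```

With 43 classes, a target of 6000, and no class above the target, the same arithmetic gives 43 x 6000 = 258000 images, which equals the training-set size reported earlier.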
Normalize the data:
def preprocess(data):
    # Center and scale the data with its own mean and standard deviation
    mean = np.mean(data)
    std = np.std(data)
    imgs = center_normalize(data, mean, std)
    return imgs
def center_normalize(data, mean, std):
    data = data.astype('float32')
    data -= mean
    data /= std
    return data
X_train = preprocess(X_train)
X_test = preprocess(X_test)
X_valid = preprocess(X_valid)
X_train, y_train = shuffle(X_train, y_train)
print("**********Random Training Data!!**************")
#Show example of dataset
H_imgs = 4
V_imgs = 3
fig, axs = plt.subplots(H_imgs,V_imgs, figsize=(20, 25))
axs = axs.ravel()
for i in range(0 , H_imgs*V_imgs):
    # random.randint includes both endpoints, so stop at len - 1
    index = random.randint(0, len(X_train) - 1)
    image = X_train[index].squeeze()
    axs[i].imshow(image, cmap='gray', aspect='auto')
    axs[i].set_title(label_define[y_train[index]], fontsize=20)
The code above displays a random sample of the preprocessed data.
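The project's `preprocess` function normalizes each split with that split's own mean and standard deviation. A common alternative, sketched here with NumPy, computes the statistics once on the training set and reuses them for the validation and test sets, so all splits go through the same mapping. This is a different choice from the one in the project, and the names `fit_normalizer` and `apply_normalizer` are hypothetical.

```python
import numpy as np


def fit_normalizer(train_data):
    """Compute the global mean and standard deviation of the training data."""
    data = train_data.astype("float32")
    return float(data.mean()), float(data.std())


def apply_normalizer(data, mean, std):
    """Center and scale any split with the given statistics."""
    return (data.astype("float32") - mean) / std


# Random toy images standing in for the real 32x32x3 data.
rng = np.random.default_rng(0)
toy_train = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
toy_test = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

mean, std = fit_normalizer(toy_train)
train_norm = apply_normalizer(toy_train, mean, std)
test_norm = apply_normalizer(toy_test, mean, std)
print(train_norm.mean(), train_norm.std(), test_norm.shape)
```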
Because this was my first time designing a model, I started from LeNet and modified it.
I use three convolutional layers, one more than LeNet, because my Nvidia GTX 1070 can handle the extra layer.
| Layer | Description |
|:---|:---|
| Input | 32x32x3 RGB image |
| Convolution 3x3 | 1x1 stride, same padding, outputs 32x32x16 |
| Max pooling 2x2 | 2x2 stride, outputs 16x16x16 |
| Convolution 3x3 | 1x1 stride, same padding, outputs 16x16x32 |
| Max pooling 2x2 | 2x2 stride, outputs 8x8x32 |
| Convolution 3x3 | 1x1 stride, same padding, outputs 8x8x64 |
| Max pooling 3x3 | 2x2 stride, outputs 3x3x64 |
| Flatten | output 576 |
| Fully connected | output 120 |
| Dropout | |
| Fully connected | output 84 |
| Dropout | |
| Fully connected | output 43 |
| Softmax | |
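The output sizes in the table can be checked with the usual size formulas: with SAME padding the output is `ceil(input / stride)`, and with VALID padding it is `floor((input - kernel) / stride) + 1`. The sketch below walks through the table's layers. It assumes the pooling layers use VALID padding, which the table does not state, and the helper name `output_size` is hypothetical.

```python
import math


def output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")


# (kind, kernel, stride, padding, output depth) for each row of the table.
layers = [
    ("conv", 3, 1, "SAME", 16),
    ("pool", 2, 2, "VALID", 16),
    ("conv", 3, 1, "SAME", 32),
    ("pool", 2, 2, "VALID", 32),
    ("conv", 3, 1, "SAME", 64),
    ("pool", 3, 2, "VALID", 64),
]

size = 32  # input images are 32x32
shapes = []
for kind, kernel, stride, padding, depth in layers:
    size = output_size(size, kernel, stride, padding)
    shapes.append((size, size, depth))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes, flattened)
```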
import tensorflow as tf  # TensorFlow 1.x API
from tensorflow.contrib.layers import flatten

def LeNet(x):
    # Mean and standard deviation for tf.truncated_normal, which randomly initializes the weights of each layer
    mu = 0
    sigma = 0.1
    Conv1_filter = 3
    Conv1_feature = 16
    Conv2_filter = 3
    Conv2_feature = 32
    Conv3_filter = 3
    Conv3_feature = 64
    Fc1_feature = 120
    Fc2_feature = 84
    # Layer 1: Convolutional. For a 32x32x3 input, 3x3 VALID convolution gives 30x30x16.
    print("Input Data shape:"+ str(x.shape))
    conv1_W = tf.Variable(tf.truncated_normal(shape=(Conv1_filter, Conv1_filter, int(x.shape[3]), Conv1_feature), mean = mu, stddev = sigma))
    conv1_b = tf.Variable(tf.zeros(Conv1_feature))
    conv1 = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID', use_cudnn_on_gpu=True) + conv1_b
    conv1 = tf.nn.relu(conv1)
    # Pooling. Input = 30x30x16. Output = 15x15x16.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print("Conv1.shape :" + str(conv1.shape))
    # Layer 2: Convolutional. Output = 13x13x32.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(Conv2_filter, Conv2_filter, Conv1_feature, Conv2_feature), mean = mu, stddev = sigma))
    conv2_b = tf.Variable(tf.zeros(Conv2_feature))
    conv2 = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID', use_cudnn_on_gpu=True) + conv2_b
    conv2 = tf.nn.relu(conv2)
    # Pooling. Input = 13x13x32. Output = 6x6x32.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print("Conv2.shape :" + str(conv2.shape))
    # Layer 3: Convolutional. Output = 4x4x64.
    conv3_W = tf.Variable(tf.truncated_normal(shape=(Conv3_filter, Conv3_filter, Conv2_feature, Conv3_feature), mean = mu, stddev = sigma))
    conv3_b = tf.Variable(tf.zeros(Conv3_feature))
    conv3 = tf.nn.conv2d(conv2, conv3_W, strides=[1, 1, 1, 1], padding='VALID', use_cudnn_on_gpu=True) + conv3_b
    conv3 = tf.nn.relu(conv3)
    # Pooling. Input = 4x4x64. Output = 2x2x64.
    conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print("Conv3.shape :" + str(conv3.shape))
    # Flatten the last feature map into a vector
    fc0 = flatten(conv3)
    #fc0 = flatten(conv2)
    print("Fc0.shape :" + str(fc0.shape))
    # Layer 4: Fully Connected. Output = 120.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(int(fc0.shape[1]), Fc1_feature), mean = mu, stddev = sigma))
    fc1_b = tf.Variable(tf.zeros(Fc1_feature))
    fc1 = tf.matmul(fc0, fc1_W) + fc1_b
    fc1 = tf.nn.relu(fc1)
    print("Fc1.shape :" + str(fc1.shape))
    # Layer 5: Fully Connected. Output = 84.
    fc2_W = tf.Variable(tf.truncated_normal(shape=(Fc1_feature, Fc2_feature), mean = mu, stddev = sigma))
    fc2_b = tf.Variable(tf.zeros(Fc2_feature))
    fc2 = tf.matmul(fc1, fc2_W) + fc2_b
    fc2 = tf.nn.relu(fc2)
    print("Fc2.shape :" + str(fc2.shape))
    # Layer 6: Fully Connected. Output = n_classes logits.
    fc3_W = tf.Variable(tf.truncated_normal(shape=(Fc2_feature, n_classes), mean = mu, stddev = sigma))
    fc3_b = tf.Variable(tf.zeros(n_classes))
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    return logits
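The layer table lists dropout after the first two fully connected layers. As an illustration of what such a layer does during training, and not as the project's implementation, here is a minimal NumPy sketch of inverted dropout. The function name `inverted_dropout` and the `keep_prob` value are assumptions chosen for the example.

```python
import numpy as np


def inverted_dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob and rescale the rest."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged,
    # so no rescaling is needed at prediction time.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
fc1_out = np.ones((4, 120), dtype=np.float32)  # toy stand-in for a layer output

dropped = inverted_dropout(fc1_out, keep_prob=0.5, rng=rng)
unchanged = inverted_dropout(fc1_out, keep_prob=1.0, rng=rng)
print(dropped.shape, float(dropped.mean()))
```

At prediction time the layer is disabled, which corresponds to `keep_prob=1.0` in this sketch.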
After designing the model, I set the training hyperparameters.
EPOCHS = 100
Learning rate = 0.0005
cross_entropy = softmax
optimizer = AdamOptimizer
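Because there are more than two classes, I use softmax cross-entropy rather than a logistic (sigmoid) loss.
I use Adam because its momentum and adaptive step sizes make the optimizer less likely to stall on plateaus or in poor local minima than plain gradient descent.
I calculate the validation accuracy after each epoch and plot it, which is very useful when troubleshooting.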
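To show what the softmax cross-entropy loss measures, here is a small pure-Python sketch for a single example. It is only an illustration of the formula, not the TensorFlow implementation, and the function names are chosen for this example.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)
    # Subtracting the maximum avoids overflow in exp without changing the result.
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_class])


# An untrained model that scores all 43 classes equally.
uniform_loss = cross_entropy([0.0] * 43, true_class=5)

# A model that strongly prefers the correct class.
confident_logits = [10.0 if i == 5 else 0.0 for i in range(43)]
confident_loss = cross_entropy(confident_logits, true_class=5)

print(uniform_loss, confident_loss)
```

A model that guesses uniformly over 43 classes has a loss of ln(43), about 3.76, so a training loss near that value suggests the network has not learned anything yet.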
x = tf.placeholder(tf.float32, (None, int(X_train.shape[1]), int(X_train.shape[2]), int(X_train.shape[3])))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, 43)
p = IntProgress()
EPOCHS = 100
rate = 0.001
p.max = EPOCHS
p.description = 'Progress:'
p.bar_style = 'info'
logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()
def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        # Weight each batch by its size so a short final batch is counted correctly
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples
with tf.Session() as sess:
    # Variables must be initialized before the first training step
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train)
    validation_accuracy_group = []
    print("Training...")
    for i in range(EPOCHS):
        p.value = i+1
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})
        validation_accuracy = evaluate(X_valid, y_valid)
        # Keep the accuracy of every epoch for the report and the plot
        validation_accuracy_group.append(validation_accuracy)
    for i in range(1,EPOCHS+1):
        print("EPOCH"+str(i)+" Validation Accuracy= {:.3f}".format(validation_accuracy_group[i-1]))
    plt.plot(np.array(range(1, EPOCHS+1)), np.array(validation_accuracy_group))
    plt.xlabel('Epoch')
    plt.ylabel('Validation Accuracy')
    plt.savefig("./output/Validation_Accuracy.png")
    saver.save(sess, './mynet')
    print("Model saved")
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    test_accuracy = evaluate(X_test, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy))
Input Data shape:(?, 32, 32, 3)
Conv1.shape :(?, 16, 16, 32)
Conv2.shape :(?, 8, 8, 128)
Conv3.shape :(?, 4, 4, 256)
Fc0.shape :(?, 4096)
Fc1.shape :(?, 120)
Fc2.shape :(?, 84)
Training...
Model saved
INFO:tensorflow:Restoring parameters from .\mynet
Test Accuracy = 0.964
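The `evaluate` function multiplies each batch accuracy by the batch length before averaging. The pure-Python sketch below shows why: when the last batch is shorter than the others, a plain mean of batch accuracies gives it too much weight. The function name `batched_accuracy` and the toy predictions are hypothetical.

```python
def batched_accuracy(predictions, labels, batch_size):
    """Accuracy computed batch by batch, weighted by batch length."""
    num_examples = len(labels)
    total = 0.0
    for offset in range(0, num_examples, batch_size):
        batch_pred = predictions[offset:offset + batch_size]
        batch_true = labels[offset:offset + batch_size]
        hits = sum(p == t for p, t in zip(batch_pred, batch_true))
        batch_acc = hits / len(batch_true)
        # Weighting by batch length makes the result equal to the overall accuracy.
        total += batch_acc * len(batch_true)
    return total / num_examples


preds = [1, 1, 0, 2, 2]
truth = [1, 0, 0, 2, 1]
weighted = batched_accuracy(preds, truth, batch_size=2)
print(weighted)
```

Here three of five predictions are correct, so the weighted result is 0.6, while the unweighted mean of the three batch accuracies (0.5, 1.0, 0.0) would be 0.5.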
Because I am not German, I used Google Images to find a few traffic sign photos to test the model. These images go through the same preprocessing before prediction.
import os
import matplotlib.image as mpimg

# IMAGE_SIZE is assumed to be 32 so that it matches the reshape below
for image_name in os.listdir("testimage/"):
    img = mpimg.imread("testimage/"+image_name)
    resized_img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE), interpolation=cv2.INTER_CUBIC)
    X_data = np.array(resized_img, dtype = np.float32).reshape((-1,32,32,3))
    X_data = preprocess(X_data)
    fig, axs = plt.subplots(1,2, figsize=(20, 10))
    axs = axs.ravel()
    index = random.randint(0, len(new_x_train))
    fig.suptitle("img" + str(index), fontsize=20)
    axs[0].imshow(img, cmap='gray', aspect='auto')
    axs[0].set_title("Input_img", fontsize=15)
    axs[1].imshow(np.array(X_data.squeeze(),np.uint8), cmap='gray', aspect='auto')
    axs[1].set_title("Preprocess_img", fontsize=15)
    plt.savefig("./output/img" + str(index) + ".png")
    index += 1
Because the project requires "Analyze the softmax probabilities of the new images", I also print the predicted probability of each likely class when I predict.
def evaluate_external_image(X_data):
    with tf.Session() as sess:
        saver.restore(sess, tf.train.latest_checkpoint('.'))
        # `prediction` is not defined in this excerpt; it is assumed to be the softmax of the logits
        sign_chance = sess.run([prediction], feed_dict={x: X_data})
        sign_chance = sign_chance[0][0]
        # Print every class whose probability exceeds 1%
        for i in range(0, 43):
            if sign_chance[i] > 0.01:
                print( str(int(sign_chance[i]*100)) + '%: ' + label_define[i] )
INFO:tensorflow:Restoring parameters from .\mynet
100%: Right-of-way at the next intersection
INFO:tensorflow:Restoring parameters from .\mynet
100%: Speed limit (60km/h)
INFO:tensorflow:Restoring parameters from .\mynet
100%: Keep right
INFO:tensorflow:Restoring parameters from .\mynet
100%: Roundabout mandatory
INFO:tensorflow:Restoring parameters from .\mynet
100%: Stop
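Printing only the classes above 1% hides how the remaining probability is spread. A common way to analyze the softmax output is to list the top few classes for each image. The sketch below does this in pure Python with made-up logits and a short toy list of sign names, and the function name `top_k` is hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, class_names, k=5):
    """Return the k most likely (class name, probability) pairs, best first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(class_names[i], p) for i, p in ranked[:k]]


# Toy class list and logits, not taken from the trained model.
names = ["Stop", "Yield", "Keep right", "Speed limit (60km/h)"]
probs = softmax([4.0, 1.0, 0.5, 2.0])
best = top_k(probs, names, k=3)

for name, p in best:
    print("{:.1f}%: {}".format(p * 100, name))
```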
When I first started the project, I didn't know where to begin. I followed the LeNet instructions and took time to review Andrew Ng's course on Coursera. I also searched Google for CNN material until I really understood the whole architecture. After long training runs and more research, I arrived at my best result. Although the predictions still have some problems, I believe that more data and better-quality photos would produce a better model. I could also try a deeper network and tune dropout to avoid overfitting. In addition, a resolution of 32x32 is not ideal training material. Increasing it to 64x64 would preserve more texture and give the model more features to work with.
This lab requires:
The lab environment can be created with the CarND Term1 Starter Kit.
- Download the data set. The classroom has a link to the data set in the "Project Instructions" content. This is a pickled dataset in which we've already resized the images to 32x32. It contains a training, validation and test set.
- Clone the project, which contains the IPython notebook and the writeup template.
git clone
cd CarND-Traffic-Sign-Classifier-Project
jupyter notebook Traffic_Sign_Classifier.ipynb