Perception Technical Documentation

Chart

[Figure: Perception Technical Documentation Chart]

NOTE

Much of the functionality in the following nodes is determined by the capabilities of the camera vendor's SDK and sample code, which we have yet to test. We will update this technical document with more specific details once we finish our testing and assess the necessary configuration of the nodes for those who depend on our outputs (such as the behavior team). Until then, we will include the advertised functions.

ZED ROS2 Wrapper

Inputs

The ZED camera's video feed is the input to the wrapper.

Functionality

The ZED SDK performs its own object detection for vehicles and pedestrians, producing bounding boxes along with other data such as colored point clouds.

Outputs

The ZED node publishes the video feed to our traffic sign detector node. It also publishes the bounding boxes for the vehicles and pedestrians it detects directly to the behavior planner.
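
As a concrete illustration of consuming these outputs, here is a minimal rclpy sketch that subscribes to the wrapper's image and object-detection topics. The topic names follow the zed-ros2-wrapper defaults and the zed_interfaces field names are assumptions; both depend on how the wrapper is launched and configured.

```python
# A minimal sketch of a node consuming the ZED wrapper's outputs.
# Topic names and the zed_interfaces message layout are assumptions
# based on zed-ros2-wrapper defaults and may differ per configuration.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from zed_interfaces.msg import ObjectsStamped  # assumed message package

class ZedConsumer(Node):
    def __init__(self):
        super().__init__('zed_consumer')
        # Rectified color frames feed the traffic sign detector.
        self.create_subscription(
            Image, '/zed/zed_node/rgb/image_rect_color', self.on_image, 10)
        # SDK-side vehicle/pedestrian detections go to the behavior planner.
        self.create_subscription(
            ObjectsStamped, '/zed/zed_node/obj_det/objects', self.on_objects, 10)

    def on_image(self, msg: Image):
        self.get_logger().debug(f'frame {msg.width}x{msg.height}')

    def on_objects(self, msg: ObjectsStamped):
        for obj in msg.objects:
            self.get_logger().info(f'{obj.label} ({obj.confidence:.0f}%)')

def main():
    rclpy.init()
    rclpy.spin(ZedConsumer())

if __name__ == '__main__':
    main()
```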

Traffic Sign Detector

Inputs

2-D images from the output of the ZED ROS2 Wrapper.

Functionality

The traffic sign detector is a convolutional neural network (CNN) that works on a concept similar to the YOLO algorithm and the Google Segmentation Algorithm. For training, the model takes the image dataset (in this case, road signs) and subdivides each image into a grid of 100x100-pixel tiles; these tiles are then exported and sorted by class. The network itself is an 8-layer CNN consisting of 3 convolutional layers and 5 dense layers, and a hyperparameter tuner is set up to tune the adjustable parameters and further increase accuracy.

At inference time, a given input image is subdivided exactly as the training dataset was. The model predicts on each tile and determines whether it contains a road sign; a temperature value we assign determines whether the prediction is confident enough to confirm a sign. The tiles are then reassembled and color-coded according to whether they contain a road sign. Finally, we find the outer edges of the color-coded region and return the bounding box.
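
The inference path described above can be summarized in a short sketch. This is a minimal illustration, assuming a trained Keras tile classifier with a single sigmoid output; the model file name, threshold value, and preprocessing are hypothetical, not our actual configuration.

```python
# A minimal sketch of the tile-based inference path, assuming a trained
# Keras classifier that scores 100x100 tiles. The model file name,
# threshold value, and normalization are illustrative only.
import numpy as np
from tensorflow import keras

TILE = 100          # tile edge length in pixels (from the training setup)
THRESHOLD = 0.8     # the "temperature" confidence cutoff we assign

model = keras.models.load_model('sign_tile_classifier.h5')  # hypothetical file

def detect_sign(image: np.ndarray):
    """Return an (x, y, w, h) box around positive tiles, or None."""
    h, w = image.shape[:2]
    hits = []
    # Subdivide the frame into the same 100x100 grid used for training.
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            tile = image[y:y + TILE, x:x + TILE]
            score = float(model.predict(tile[np.newaxis] / 255.0, verbose=0)[0, 0])
            if score >= THRESHOLD:   # confident enough to mark this tile
                hits.append((x, y))
    if not hits:
        return None
    # The bounding box is the outer edge of all positively marked tiles.
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) + TILE - x0, max(ys) + TILE - y0)
```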

Outputs

2-D bounding boxes with (x, y, w, h) coordinates, or a similar data format, specifying the bounding box location for each detected road sign.
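
If the boxes are published as ROS2 messages, one possible representation uses the standard vision_msgs package; the conversion from our corner-based (x, y, w, h) format to its center-based box is shown below. The field layout varies across vision_msgs versions, so treat this as a sketch rather than our actual interface.

```python
# One possible ROS2 representation of the detector's (x, y, w, h) output,
# sketched with vision_msgs. Field layout varies by vision_msgs version.
from vision_msgs.msg import Detection2D

def to_detection(x: float, y: float, w: float, h: float) -> Detection2D:
    det = Detection2D()
    # vision_msgs boxes are center-based, so shift the corner by half the size.
    det.bbox.center.x = x + w / 2.0
    det.bbox.center.y = y + h / 2.0
    det.bbox.size_x = w
    det.bbox.size_y = h
    return det
```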
