Vehicle detection for self-driving cars
Building a data pipeline to detect vehicles on the road
I am working on building a data pipeline to detect vehicles from a video feed for a self-driving car. Various computer vision techniques are used, including Histogram of Oriented Gradients (HOG), as well as a sliding window approach combined with a machine-learned classifier.
The general steps for creating this data pipeline are as follows:
1. Perform a histogram of oriented gradients (HOG) feature extraction process on a labeled training set of images.
2. Use the output of the HOG to train a supervised classifier (SVM, neural network, etc.)
3. Implement a sliding window technique, using windows of various sizes and the trained classifier to search for vehicles in the images.
4. Create a heat map of recurring detections and apply an overlap threshold to reject false positives. Also estimate a bounding box for each vehicle based on the detected pixels.
The dataset:
We have a dataset of images that we can use to build a model to detect whether an image contains a car or not. The dataset consists of roughly 9,000 images of cars and 9,000 images that do not contain cars. Below is a random sample of images from the dataset.
Histogram of Oriented Gradients
We use a method called histogram of oriented gradients (HOG) to extract features from these images. HOG is a feature extraction technique that works by counting the occurrences of gradient orientations in localized portions of an image. Essentially, HOG extracts the edges/gradients from each part of the image, which we believe will be useful for distinguishing cars from non-cars.
Below is an example of what a HOG feature vector looks like after performing the HOG extraction technique:
Using the OpenCV library, we implement a method called `extract_hog_features`, which takes images and HOG parameters as input and outputs a flattened HOG feature vector for each image in the dataset.
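A minimal sketch of what `extract_hog_features` might look like, assuming OpenCV's `cv2.HOGDescriptor`; the window, block, and cell sizes below are illustrative defaults, not the tuned values from the project:

```python
import cv2
import numpy as np

def extract_hog_features(images, win_size=(64, 64), block_size=(16, 16),
                         block_stride=(8, 8), cell_size=(8, 8), nbins=9):
    """Return a flattened HOG feature vector for each input image.

    The parameter values are illustrative defaults, not the tuned
    values used in the final pipeline.
    """
    hog = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, nbins)
    features = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # assumes BGR input from cv2.imread
        gray = cv2.resize(gray, win_size)             # image must match the HOG window size
        features.append(hog.compute(gray).ravel())    # compute() returns a flattened vector
    return np.array(features)
```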
Next, these feature vectors are combined with a label vector (1 for cars, 0 for non-cars) to be used for training the model. The data is shuffled and split into training and test sets.
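For example, with scikit-learn (assuming `car_features` and `notcar_features` hold the HOG vectors from above; the 80/20 split ratio is an assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stack car and non-car feature vectors and build the matching label vector
X = np.vstack((car_features, notcar_features))
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

# train_test_split shuffles the data by default before splitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```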
The optimal HOG parameters are chosen based on the performance of the classifier. Since this system will be used in a self-driving car, not only is accuracy important, but also prediction speed. Thus, a balance must be struck between speed and accuracy.
Training a classifier for detecting cars:
A number of models from the scikit-learn library are trained and tested to determine the best classifier for the pipeline.

| Classifier | Training Accuracy | Test Accuracy | Prediction Time |
|---|---|---|---|
| LinearSVC | 0.9998 | 0.951 | 0.000147 seconds |
| Logistic Regression | 1.00 | 0.956 | 0.000143 seconds |
| Neural Network | 1.00 | 0.993 | 0.000308 seconds |

Although it is the slowest to predict, at roughly twice the prediction time of the other classifiers, the neural network (multi-layer perceptron) has the highest test accuracy by a significant margin, so it is used in the data pipeline.
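The multi-layer perceptron can be trained with scikit-learn; the hidden-layer size and iteration count below are assumptions, since the source does not specify the architecture:

```python
from sklearn.neural_network import MLPClassifier

# Hypothetical configuration; tune hidden_layer_sizes and max_iter as needed
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```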
Sliding Window Search
A sliding window approach is taken to locate cars in the images. Windows of various sizes scan the image as the classifier looks for cars. Instead of performing HOG feature extraction on each individual window, which would be too computationally expensive, the HOG features are extracted once for the entire image, and the subset of those features covered by each sliding window is fed into the classifier.
Below shows how the sliding window moves across the image. The overlap in the x direction was set to 50%, while the overlap in the y direction was set to 75%. This pattern proved fairly effective at producing redundant true positive detections, which is useful later for weeding out false positives using a heatmap strategy (explained below).
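A sketch of how these windows could be generated for a single window size (the function name is hypothetical, and the real pipeline repeats this over several window sizes):

```python
def slide_window(img_shape, window=64, x_overlap=0.5, y_overlap=0.75):
    """Yield (x1, y1, x2, y2) windows covering the image.

    With x_overlap=0.5 the window steps half its width in x;
    with y_overlap=0.75 it steps a quarter of its height in y.
    """
    height, width = img_shape[:2]
    x_step = int(window * (1 - x_overlap))
    y_step = int(window * (1 - y_overlap))
    for y in range(0, height - window + 1, y_step):
        for x in range(0, width - window + 1, x_step):
            yield (x, y, x + window, y + window)
```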
Filtering out false positives with a heatmap approach:
A heatmap approach is used to improve the robustness of the model as well as to draw the final bounding boxes around the detected cars.
During the sliding window process, true positive detections typically occur frequently and close together, whereas false positive detections are few and far between. A heatmap is very effective at differentiating between the two.
We create a function called `add_heat`, which increments the brightness of an all-black image every time a positive classification occurs during the sliding window process. Areas with many overlapping classifications will be brighter than areas with less overlap.
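A minimal version of `add_heat` might look like this (window coordinates are assumed to be `(x1, y1, x2, y2)` tuples):

```python
import numpy as np

def add_heat(heatmap, bbox_list):
    """Increment heatmap pixels inside each positive-detection window.

    heatmap: an all-black (zero) array the size of the frame.
    bbox_list: windows where the classifier reported a car.
    """
    for x1, y1, x2, y2 in bbox_list:
        heatmap[y1:y2, x1:x2] += 1  # overlapping windows stack up and get brighter
    return heatmap
```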
In the video feed, we take a rolling sum of the last 12 frames to calculate the heatmap.
At the end of this process, we apply a minimum cutoff threshold, setting all pixels that don't meet the threshold value to zero.
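Combining the rolling 12-frame sum with the cutoff might look like this (the 12-frame window comes from the text; the threshold value is a tunable assumption):

```python
from collections import deque

import numpy as np

frame_heatmaps = deque(maxlen=12)  # rolling window of per-frame heatmaps

def thresholded_heatmap(new_heatmap, threshold=10):
    """Sum the heatmaps of the last 12 frames and zero out weak pixels."""
    frame_heatmaps.append(new_heatmap)          # oldest frame drops off automatically
    combined = np.sum(frame_heatmaps, axis=0)   # rolling sum over the deque
    combined[combined < threshold] = 0          # reject pixels below the cutoff
    return combined
```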
Next, we group contiguous areas together:
Finally, each grouped area is extended into a rectangular bounding box and drawn on the original image.
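One common way to implement the grouping and box drawing is `scipy.ndimage.label`, which assigns each contiguous nonzero region its own integer label (an assumption here; the notebook may group regions differently):

```python
import cv2
from scipy.ndimage import label

def draw_labeled_bboxes(img, heatmap):
    """Draw one bounding box around each contiguous hot region."""
    labels, n_cars = label(heatmap)  # label connected nonzero regions
    for car in range(1, n_cars + 1):
        ys, xs = (labels == car).nonzero()  # pixels belonging to this car
        cv2.rectangle(img, (int(xs.min()), int(ys.min())),
                      (int(xs.max()), int(ys.max())), (0, 0, 255), 4)
    return img
```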
Here are some more examples of the pipeline working on other images from the video feed:
Implementing the data pipeline on a video feed:
Finally, all the steps above are used to process a video feed for a self-driving car.
A deque data structure is used to store bounding boxes from the last 12 frames of the video as it is being processed. The final list of bounding rectangles is generated from the last 12 frames of the video instead of just one frame. The OpenCV function `cv2.groupRectangles` is used to group overlapping boxes together. A threshold of 10 is used, meaning a minimum of 10 overlapping rectangles must occur before a detection is made. Doing this weeds out false positives and makes the model more robust, as it is unlikely for more than 10 of the last 12 frames to contain false positive detections.
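A sketch of this aggregation step (the function name is hypothetical; boxes are assumed to be in the `(x, y, w, h)` form that `cv2.groupRectangles` expects, and the `eps` value is an assumption):

```python
from collections import deque

import cv2

recent_boxes = deque(maxlen=12)  # bounding boxes from the last 12 frames

def group_detections(frame_boxes):
    """Merge boxes accumulated over the last 12 frames into final detections."""
    recent_boxes.append(frame_boxes)
    all_boxes = [list(box) for frame in recent_boxes for box in frame]
    # groupRectangles keeps only clusters with more than groupThreshold members,
    # which enforces the minimum of ~10 overlapping rectangles described above
    grouped, weights = cv2.groupRectangles(all_boxes, 10, 0.2)
    return grouped
```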
Conclusion:
The most difficult parts of this project included:
- Figuring out an effective method for eliminating false positives. I had to find the appropriate threshold for the model as well as tune the sliding window behavior (i.e. window size, overlap percentage) to minimize false positives. To further improve the model, I could build a much larger dataset by downloading more images or augmenting the dataset.
- Figuring out a way to create discrete bounding boxes around each car. I had to do a lot of experimentation until I found the optimal threshold to apply to the heatmap and the optimal strategy for grouping the boxes together.
- Implementing the pipeline in real time. Although the video was only 46 seconds long, it took my laptop 31 minutes to process the video feed and detect the vehicles. Therefore, my model would need a roughly 40x speedup to run in real time on my laptop. It's not clear to me yet whether increased computational power alone could enable real-time deployment of this model.
Areas for further exploration:
- Instead of using HOG feature extraction followed by a classifier, I could explore using a convolutional neural network. Omitting the HOG feature extraction step could improve the performance of the model without sacrificing prediction time, because convolutional neural networks are effective at doing both feature extraction and prediction.
Additional Resources / Further Reading:
link to my code for this project on GitHub:
https://github.com/luweizhang/self-driving-car/blob/master/CarND-Vehicle-Detection/pipeline.ipynb
some papers on vehicle detection:
http://cmp.felk.cvut.cz/~vojirtom/publications/itsc2012.pdf
http://ijiset.com/vol2/v2s7/IJISET_V2_I6_08.pdf
http://old.cescg.org/CESCG-2014/papers/Sochor-Fully_Automated_Real-Time_Vehicles_Detection_and_Tracking_with_Lanes_Analysis.pdf
more info on histogram of oriented gradients (hog):
http://www.learnopencv.com/histogram-of-oriented-gradients/
http://www.pyimagesearch.com/2014/11/10/histogram-oriented-gradients-object-detection/
convolutional neural networks with hard negative mining:
http://www.mdpi.com/1424-8220/17/2/336/pdf