This post is a companion explanation to the KBotics Vision System for the 2013 challenge Ultimate Ascent. The full source code is available at: https://github.com/QueensFRC/2809_vision2013
I think our vision system this year is pretty unique, I doubt there are any other teams doing vision the same way we are. I learned a lot from last year and I had two new ideas for this years vision system. I’m going to start by describing the one we didn’t do and end with what we did.
Last years vision system used raw thresholding on the input image and then filtering to identify the targets – which did work it was just highly susceptible to changes in the lighting conditions. We remedied this by saving images we took on the field and then tuning our vision system in between matches. The system worked well but I knew we could do better. The trick was going to be how to get away from absolute thresholding and use a “relative” approach, this is where the two new ideas come in.
The first approach requires the LED ring to be wired to either a spike (relay) or into the solenoid breakout board – this gives you control over when the light is on and when it is off (I hope you can see where I’m going with this!!), then using some back and fourth with the Smartdashboard have your robot turn the lights off then take an image, turn the lights back on and take a second image. Assuming you stopped your robot to aim then simply subtract the 2 images and you should get a very cleanly segmented target. Pretty simple, very effective, I am surprised more teams aren’t doing something like this. We tried this out but we eventually chose a slightly different approach due to the stopped assumption (wish I’d kept some of the images though for this blog post to demonstrate my point!).
Okay so what did we do? We implemented a two layer vision system, a low level system identifies possible targets and then a machine learning algorithm classifies the candidates as target or not. The inspiration for this approach came from my new favourite book “Mastering OpenCV with Practical Computer Vision Projects” Chapter 5 – Number Plate Recognition Using SVM and Neural Networks.
The low level vision system:
The input image:
The low level vision system runs a sobel filter over the image to highlight the horizontal edges (we know the targets will have quite strong horizontal edges). The following figure shows what the output from this looks like:
Next we run a morphological filter to join the bottom of the target to the top, the result looks like this:
The next step is to run a connected components algorithm and keep only the blobs that are roughly the correct size and aspect ratio. The contours that passed our filter are drawn in blue on the following image:
From here we take each blob that has passed all the filters thus far as a candidate target. We take each candidate and rotate it to be oriented horizontally and resize it to a constant size. The low level system returns all these normalized sub-images along with their locations. 2 such images are shown below:
Top: Target, Bottom: False Positive (looks like a light)
As seen above the love level vision system is pretty easy to fool – that’s what we have a second part to our algorithm!
The second part of the vision system sends the candidate targets through a Support Vector Machine (SVM) which gets the final say on whether it really is indeed a target or not. SVMs are a type of machine learning algorithm, the goal of machine learning is to teach the computer how to perform a task through showing it examples rather than giving it explicit instructions. So in a separate step before running the vision system we trained SVM by showing it many examples of targets and “not-targets” allowing it to learn how to classify new inputs on its own. SVMs are implemented in OpenCV and the training code is in the trainSVM.cpp file on GitHub (along with a few python scripts for generating file lists).
The neat part about our vision system is that because we save the candidate targets when we are out on the field we can add them to the pool of training data in between matches and our algorithm will actually get smarter and improve as we play more matches and get more data!
Here is a video of the system in action (using my cppVision framework) ! At 12 seconds the operator turns the vision system on and you can see the targets light up as the LEDs turn on to. A small green cross-hair appears in the middle of the top target and when the operator is aligned the fullscreen cross-hair changes from red to green.