20170129; Sunday, 29 Jan. 2017 Basic Autonomous Driving Research
I had a number of motivations, now historical, for looking into how to autonomously drive a ground-based vehicle. I will discuss those later, if time permits, or in some future paper, since I think they are quite interesting; this paper is ancillary to those questions.
The question raised was simply: can I program a working, even if trivial, autonomously piloted vehicle? A ground-based device, a Parrot "Jumping Sumo," was chosen because of its cost, the ease of getting replacement devices, and, most importantly, the "open" nature of its control protocols. Although a C library and SDK with bindings for various languages was available, neither was required: one could, in an otherwise uncomplicated and straightforward manner, connect to the vehicle via an unencrypted on-device wifi hotspot and read and write packets to control and gain information from the device.
At first, the goal was to have the entire autonomous driving done onboard, but there was no easy facility that would allow a software plug-in module to be loaded into the device. The Parrot device runs a Linux operating system that is periodically updated and can be flashed onto the device by its owner, so it obviously would be possible to rewrite the OS and change the device's controls, and there are projects one can fairly easily find on the internet to do this. Doing so, however, would obviously void the device's warranty, and, further, I don't think such a project would remain "easy"; the goal of this inquiry was to do relatively easy things that any programmer could do. It would be quite nice and wise for Parrot to further "open" up the device controls to allow modification of the control loop (similar in nature to a modperl.so plugin, or something like it) via some published API, and I expect that they or some intrepid company soon will. Alternatively, one could attach a Raspberry Pi with a camera to a remote controlled car and achieve the same.
So, since the Parrot "Jumping Sumo" has a camera that takes fair 640x480px jpgs, I decided to simply have the device transmit its video signal to some base computer for off-board analysis and processing, then have that base computer transmit the necessary navigation commands back to the device, and to repeat this as a control loop. This paper, then, is about the software procedures used, and some other discoveries made, in this navigation.
As an easy case, I chose to have the device follow colored tape laid out in some fashion on the floor, and I found blue painter's tape which could be easily placed down, moved, and removed without much trouble.
The horizon line of the camera occurs a bit under half the height of the image, so it is possible to immediately chop the image in half, or even a bit more, at row 220, and not consider anything in the top part of the original image, without loss of information in terms of navigating from a line on the floor.
Since we are only considering moving the device along a path of a single blue line on a floor, and, generally speaking, we don't have other falsely positive blue information in the environment, we can use OpenCV, which can quickly operate on matrices of data, to transform the pixels of the jpg from an RGB colorspace to HSV, and then filter the image for the hue ranges that the blue we are looking for is most likely to be in. The result is a new black and white "image" whose pixels are on (black) wherever we find a "blue" color in our chosen HSV ranges and off (white) elsewhere. In practice, finding a range of blue colors was a bit difficult because the camera returned blue-range pixels in low light, as well as in areas around light flares and reflections. These flares and the other noted features are visible in the first still image. Further, a particular blue hue from the camera would change with light conditions.
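The crop and color filter described above can be sketched in a few lines. This is only an illustration in plain Python, using the standard library's colorsys in place of OpenCV; the hue, saturation, and value limits are assumed numbers, not the ranges I actually settled on, and the minimum-value test is one assumed way to reject the dark pixels that the camera tends to report as blue.

```python
import colorsys

HORIZON_ROW = 220  # rows above this are discarded, as described in the text

# Assumed limits for "blue", with every HSV component scaled to 0..1.
BLUE_HUE = (0.55, 0.70)
MIN_SAT = 0.40
MIN_VAL = 0.25     # assumed guard against low-light pixels reading as blue


def is_blue(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the assumed blue range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return BLUE_HUE[0] <= h <= BLUE_HUE[1] and s >= MIN_SAT and v >= MIN_VAL


def blue_mask(rows, horizon=HORIZON_ROW):
    """Drop every row above the horizon, then threshold what is left.

    rows is a list of image rows, each a list of (r, g, b) tuples.
    Returns a list of rows of booleans, True where the pixel looks blue.
    """
    return [[is_blue(*pixel) for pixel in row] for row in rows[horizon:]]
```

Because a given hue shifts with the lighting, the limits would in practice need to be tuned against sample frames from the camera rather than fixed in advance.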
Further, we can, still quickly, using OpenCV, reduce this "image" into a smaller one by sliding a 4x4 or 9x9 "Region Of Interest" (ROI) window across this matrix and setting a true (black) value in a new bit matrix wherever the window covers some minimum number of true bits in the old one. In this fashion, we can quickly reduce our consideration of roughly 500KB of data down to about 600 bytes.
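A minimal sketch of this reduction follows, again in plain Python rather than OpenCV. It assumes the window moves in non-overlapping steps of its own size and that partial windows at the right and bottom edges are ignored; the function name and the min_hits threshold are illustrative choices.

```python
def reduce_mask(mask, block=4, min_hits=4):
    """Shrink a boolean mask by tiling it with block x block windows.

    An output cell is True when at least min_hits pixels inside its
    window are True. Windows do not overlap (assumed), and any partial
    window at the right or bottom edge is dropped.
    """
    height = len(mask)
    width = len(mask[0]) if mask else 0
    reduced = []
    for top in range(0, height - block + 1, block):
        out_row = []
        for left in range(0, width - block + 1, block):
            # Count the "on" pixels that fall inside this window.
            hits = sum(
                mask[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            out_row.append(hits >= min_hits)
        reduced.append(out_row)
    return reduced
```

With these assumptions, a 640x260 mask and a 4x4 window give a grid of 160 columns by 65 rows. Raising min_hits also helps suppress isolated noise pixels from flares and reflections.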
There are a number of choices we could make to determine which way to go. Simply enough, we can weight "on" pixels that are closer to us, that is, at the bottom of our matrix, more heavily than those at the top, and, further, those in the middle of the image more heavily than those at the sides. We can sum the weighted values in each column and then determine whether the preponderance of weight has us move towards the center, left, or right, and, if a side is chosen, by what angle as an offset from the center. The below image shows an enlarged visualization of the mapping I proposed to weight pixels: those falling into darker areas are counted greater than those falling into lighter areas, and the final sum of the columns determines where we drive the device. If the camera finds no blue pixels to sum in our initial photograph, the driver can return NONE and the device can turn left or right, some amount, and then re-attempt to find a direction from a new position. If none is found after a certain number of direction-less turns, the program can end.
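One way to express this weighting is sketched below. The linear weights, the dead band around the center, and the function name are all assumptions made for illustration; the actual weighting map is the one shown in the image.

```python
def choose_direction(grid, dead_band=0.15):
    """Pick a steering direction from a reduced boolean grid.

    Returns ("NONE", 0.0) when no cell is on; otherwise returns
    ("LEFT" | "CENTER" | "RIGHT", offset) with offset between -1 and 1.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    centre = (cols - 1) / 2.0
    half = max(centre, 1.0)              # avoids dividing by zero on tiny grids
    column_sums = [0.0] * cols
    for r, row in enumerate(grid):
        row_weight = (r + 1) / rows      # bottom rows (nearer to us) count more
        for c, on in enumerate(row):
            if on:
                # Centre columns count fully, edge columns count half (assumed).
                col_weight = 1.0 - 0.5 * abs(c - centre) / half
                column_sums[c] += row_weight * col_weight
    total = sum(column_sums)
    if total == 0.0:
        return ("NONE", 0.0)
    # Weighted mean column position, scaled so the far edges are -1 and +1.
    moment = sum(w * (c - centre) for c, w in enumerate(column_sums))
    offset = moment / (total * half)
    if offset < -dead_band:
        return ("LEFT", offset)
    if offset > dead_band:
        return ("RIGHT", offset)
    return ("CENTER", offset)
```

The offset could then be scaled into a turn angle; how many degrees a full offset should map to would have to be found by experiment on the device.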
Because of its simplicity and ease of use, particularly as a prototyping tool, I used Perl to communicate with the device, following the published ARSDK protocol document. In short, packets are read and written in a format that the protocol document describes in good detail. The operating system of the host computer managed the wifi connection to the device's hotspot, and the Perl code started by making a socket connection to a published interface, first via TCP and then, after a configuration was received, via UDP. The published information about the Jumping Sumo device suggested that a "video feed" of 16 jpeg images was sent per second, but this was frequently not true. At first I thought that it would be possible, within this timeframe, to receive an image, process it, and then send back a direction command, but I soon discovered that various conditions on both my host computer and the device prevented that from happening in a realtime fashion. Once driving or MOVE commands stopped being received by the device in a time-bound, consistent fashion, the video feed itself would shut off. Other Parrot devices did not have this feature, and, at least as far as I was concerned, the feature itself was undocumented.
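To give a feel for what reading and writing such packets involves, here is a small sketch in Python rather than Perl. The layout it assumes, a 7-byte header holding a data type, a buffer id, a sequence number, and a little-endian total frame size, followed by the payload, is an assumption that should be checked field by field against the protocol document; the helper names are illustrative only.

```python
import struct

# Assumed header layout: type (1 byte), buffer id (1 byte),
# sequence number (1 byte), total frame size (4 bytes, little-endian).
HEADER = struct.Struct("<BBBI")


def pack_frame(frame_type, buffer_id, seq, payload):
    """Build one frame; the size field counts the header plus the payload."""
    total = HEADER.size + len(payload)
    return HEADER.pack(frame_type, buffer_id, seq & 0xFF, total) + payload


def unpack_frames(data):
    """Split received bytes into (type, buffer_id, seq, payload) tuples."""
    frames = []
    offset = 0
    while offset + HEADER.size <= len(data):
        frame_type, buffer_id, seq, total = HEADER.unpack_from(data, offset)
        if total < HEADER.size or offset + total > len(data):
            break  # truncated or malformed frame: stop rather than guess
        payload = data[offset + HEADER.size:offset + total]
        frames.append((frame_type, buffer_id, seq, payload))
        offset += total
    return frames
```

The sequence number is masked to one byte so that a simple counter on the host wraps around instead of overflowing the field.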
The video feed from the device could be shut down entirely, and it was possible to send commands to the device to have it take a single photo and place that photo into a directory on the device. It was possible to determine, somewhat painstakingly, which particular commands to send the device by capturing and then analyzing the packets sent from Parrot's published iPhone app created to drive the device. The Perl code on the host could then connect to the device's ftp server and read the most recent image, deleting any others. This received image could then be processed in the manner described above, and I found that I could safely complete two iterations of this control loop per second. One issue of note was that if the camera was in motion, carried along by the motion of the device, the camera's relatively slow shutter speed produced blurred images that were much more difficult to process correctly, and so it helped to have the device still while acquiring images.
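The overall loop can be sketched as follows. The four callables are hypothetical placeholders for the real work (requesting a photo and fetching it over ftp, analyzing it, and sending the MOVE or turn commands), and the limit on blind turns and the half-second period are assumed values chosen to match the two loops per second mentioned above.

```python
import time


def drive(capture, analyze, move, turn,
          max_blind_turns=6, period=0.5, max_steps=None):
    """Run the stop, photograph, analyze, move loop.

    capture() returns an image taken while the device is standing still,
    analyze(image) returns (direction, offset), move(direction, offset)
    and turn() send commands to the device. All four are hypothetical
    callables supplied by the caller.
    Returns "LOST" after too many turns without seeing the line, or
    "STOPPED" once max_steps iterations have run.
    """
    blind_turns = 0
    steps = 0
    while max_steps is None or steps < max_steps:
        started = time.monotonic()
        image = capture()                 # device is stationary: no motion blur
        direction, offset = analyze(image)
        if direction == "NONE":
            if blind_turns >= max_blind_turns:
                return "LOST"
            turn()                        # rotate a little and look again
            blind_turns += 1
        else:
            blind_turns = 0
            move(direction, offset)
        steps += 1
        # Pace the loop so it does not run faster than one pass per period.
        remaining = period - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
    return "STOPPED"
```

Passing the device operations in as functions keeps the loop testable on the host without a device attached.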
Another interesting problem, which is apparent in the above low angle photo of the device, was that the camera on the "Jumping Sumo" could not have its angle changed, and so pointed outward in such a fashion that a path on the floor 4 inches away or closer could not be seen; a device moving too fast across a sharp angle would soon lose its way. One line of inquiry was that this problem could be ameliorated (short of opening up and modifying the device) by placing a fixed mirror on the device that pointed downward by some degree, thereby serving as a direction change for the lens, and the first picture in this paper shows some investigations towards that. In the end, though, I determined that adding a mirror was unnecessary, as such an addition would change the scope of the project a bit.
I include a link to fairly rough, but working, versions of both the Perl control code and the OpenCV matrix analysis C++ code, for further analysis and discussion.
I was deeply impressed the moment, now a number of years ago, that I saw a youtube demonstration of a UPenn GRASP Lab project with quadrotor copters, particularly where the drone manages flight planning through the middle of a moving hoop. While I have done no particular research into the particulars of the project shown, I think that cameras and other sensing devices off board would make the work easier, although the obvious gold standard would be to have the device itself carry enough sensors and computing power, including a faster camera with sufficient shutter speed and other sensing devices, such as sonar and radar, to aid in its own automatic navigation.
Further immediate and inexpensive lines of inquiry with this "Jumping Sumo" device could be to set up cameras outside the device, perhaps even only one, and have the third party driving computer both plan and navigate the drone along paths similar to those shown above. Another obvious inquiry would be, using three or more cameras (for three dimensions) in a space, to have the device respond to real-time changes in path and pilot through a hoop in either a vertical or horizontal fashion, or even both.