In this assignment, we are given two image sequences of partially segmented data, one of bats and one of fish, and we need to track the movement of these animals across multiple frames.
Method and Implementation
To track the objects, we combined the built-in Kalman filter, which predicts the area where a given object will appear next, with a greedy matching method: each object is matched to the detection closest to its predicted location, subject to a maximum distance threshold. For each tracked point in a frame, the filter predicts where that point will be in the next frame. We then search a circle around this prediction for the closest detection in the next frame that is still available, checking every candidate not already marked as taken and requiring it to lie within the maximum threshold distance of the predicted location. The track data and the new point are then pushed into the data structure. If an object is new, it is initialized with a new array of points and a color. If an object dies, its data is deleted; we do not try to reconnect lost tracks, nor to rejoin the tracks of an object that re-enters as a separate object.
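A minimal sketch of the greedy, distance-gated matching described above, assuming the Kalman predictions are already available as 2D points. The names `Point2`, `greedyAssociate`, and `maxDist` are hypothetical and are not taken from part1.cpp or part2.cpp. Tracks are visited in a fixed order, so an earlier track can claim a detection that a later track was actually closer to; that is the trade-off of a greedy method. Detections left unclaimed would become newly born objects.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2D point type (the real code may use an OpenCV point type).
struct Point2 {
    double x;
    double y;
};

// Greedy nearest-neighbour association with a distance gate.
// For each track, in order, pick the closest detection that has not been
// claimed by an earlier track and lies within maxDist of that track's
// predicted position. Returns, for each track, the index of its matched
// detection, or -1 if no detection qualified (the track would die).
std::vector<int> greedyAssociate(const std::vector<Point2>& predicted,
                                 const std::vector<Point2>& detections,
                                 double maxDist) {
    assert(maxDist >= 0.0);
    std::vector<bool> taken(detections.size(), false);
    std::vector<int> match(predicted.size(), -1);
    for (std::size_t t = 0; t < predicted.size(); ++t) {
        double bestDist = std::numeric_limits<double>::infinity();
        int best = -1;
        for (std::size_t d = 0; d < detections.size(); ++d) {
            if (taken[d]) continue;  // already claimed by an earlier track
            double dist = std::hypot(detections[d].x - predicted[t].x,
                                     detections[d].y - predicted[t].y);
            if (dist <= maxDist && dist < bestDist) {  // inside the gate and closer
                bestDist = dist;
                best = static_cast<int>(d);
            }
        }
        if (best >= 0) {
            taken[best] = true;  // mark so later tracks cannot reuse it
            match[t] = best;
        }
    }
    return match;
}
```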
Note: The processing for the bat dataset is contained entirely in part1.cpp, and the processing for the fish dataset entirely in part2.cpp; both include a shared header file, "filterBundle.h", which implements the Kalman filter tracking itself.
In each frame, every object falls into one of three cases:
- Object tracked in the previous frame, with one or more available detections within the maximum distance of its Kalman prediction: simply select the detection closest to the predicted location.
- Object tracked in the previous frame, but no (available) detections within range: the object dies and its data is removed.
- Object detected in the current frame with no previous track: a new object is born and given a data slot with a color that it keeps thereafter.
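The three cases above can be sketched as a single per-frame update, assuming the matching step has already produced, for each track, the index of its chosen detection (or -1). `Point2`, `Track`, `updateTracks`, and the integer `color` are hypothetical stand-ins for the bundle data in filterBundle.h, whose actual layout is not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 {
    double x;
    double y;
};

// Hypothetical per-object record: centroid history plus a color fixed at birth.
struct Track {
    std::vector<Point2> path;  // past and current centroid positions
    int color = -1;            // stand-in for an RGB color, assigned once
};

// Applies one frame of the three cases:
//  - matched track (match[t] >= 0): append the new centroid, track continues
//  - unmatched track (match[t] == -1): the object dies, its data is dropped
//  - unclaimed detection: a new object is born with a fresh color
void updateTracks(std::vector<Track>& tracks, const std::vector<int>& match,
                  const std::vector<Point2>& detections, int& nextColor) {
    assert(match.size() == tracks.size());
    std::vector<bool> claimed(detections.size(), false);
    std::vector<Track> survivors;
    for (std::size_t t = 0; t < tracks.size(); ++t) {
        if (match[t] < 0) continue;  // died: not copied, so its history is erased
        std::size_t d = static_cast<std::size_t>(match[t]);
        assert(d < detections.size());
        claimed[d] = true;
        tracks[t].path.push_back(detections[d]);
        survivors.push_back(tracks[t]);
    }
    for (std::size_t d = 0; d < detections.size(); ++d) {
        if (claimed[d]) continue;
        Track born;  // no past track could claim this detection
        born.path.push_back(detections[d]);
        born.color = nextColor++;
        survivors.push_back(born);
    }
    tracks.swap(survivors);
}
```

Because dead tracks are simply discarded, this sketch shares the limitation noted in the report: there is no memory beyond one frame.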
The main functions are:
- drawObjectDetections (part 1 & 2): Takes the current frame and the global data for each tracked object, and draws each object's centroid and tail (its trail of past positions) for that frame.
- getLabelMatrix (part 1): Reads in the provided bat segmentation data, which uses its own specific file format.
- convertFileToMatWithColor (part 1 & 2): Recolors the current frame's objects so that their colors are consistent with previous frames. Its input is the global bundle data, which holds an entry for every tracked object containing its current and past positions and its fixed color.
- getHashMapArea (part 1): Computes and returns the area of an object, which is used as an additional criterion when choosing the best track match.
- semgent2binary (part 2): Segments the original input image into a set of binary objects, where each object is assigned its own non-zero label. The output is a vector of 1-channel images, each with a black background and objects labelled from 1 up to the number found, along with a parallel vector holding the centroid of each object.
- readImages (part 2): Reads in the image files of the original fish sequence and pushes them into a vector.
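The labelling and centroid step that semgent2binary performs can be illustrated with a small connected-component sketch. It assumes 4-connectivity and a plain row-major integer grid instead of an image type; `labelComponents` and `Centroid` are hypothetical names, and this is not the code used in part2.cpp. In OpenCV, `cv::connectedComponentsWithStats` provides labels and centroids directly.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Centroid {
    double x;
    double y;
};

// Labels 4-connected foreground regions of a binary image stored row-major
// (non-zero = foreground). Writes labels 1..N into `labels` (0 = background)
// and returns one centroid per label, in label order.
std::vector<Centroid> labelComponents(const std::vector<int>& binary,
                                      int width, int height,
                                      std::vector<int>& labels) {
    assert(static_cast<int>(binary.size()) == width * height);
    labels.assign(binary.size(), 0);
    std::vector<Centroid> centroids;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    int next = 0;
    for (int start = 0; start < width * height; ++start) {
        if (binary[start] == 0 || labels[start] != 0) continue;
        ++next;  // unlabelled foreground pixel: flood-fill a new object
        double sumX = 0.0, sumY = 0.0;
        long count = 0;
        std::queue<int> frontier;
        frontier.push(start);
        labels[start] = next;
        while (!frontier.empty()) {
            int idx = frontier.front();
            frontier.pop();
            int x = idx % width, y = idx / width;
            sumX += x;  // accumulate pixel coordinates for the centroid
            sumY += y;
            ++count;
            for (int k = 0; k < 4; ++k) {
                int nx = x + dx[k], ny = y + dy[k];
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                int nidx = ny * width + nx;
                if (binary[nidx] != 0 && labels[nidx] == 0) {
                    labels[nidx] = next;
                    frontier.push(nidx);
                }
            }
        }
        centroids.push_back({sumX / count, sumY / count});
    }
    return centroids;
}
```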
Describe your experiments, including the number of tests that you performed, and the relevant parameter values.
Detection rate: Since labeled segmentation was provided directly only in part 1, detection can really only be evaluated for part 2, and even there the segmentation was more or less provided in the form of red contours. Visually, our detections nearly perfectly match the red contours in the original images; fish that the provided contours missed were, of course, not detected either.
Accuracy rate: The tracking itself is reasonably accurate overall. It reliably tracks isolated points as they move and change direction, and it still performs decently when many points are close together, although errors do occur in those cases.
Our primary results are the tracked bat and tracked fish image sequences shown below. The bat sequence was provided with labeled objects, whereas for the fish sequence we had to re-extract the contours of the already segmented fish.
Results (Dataset | Source Image | Result Image): Tracking the bats; Tracking the fish.
Discuss your method and results:
- The strength of our method is that it handles changes in velocity and direction gracefully. Judging from the tracked points, it falls short only when the provided segmentation fails and an object disappears and then reappears.
- Our method struggles when several fish are close together: a track's color can "jump around", or a fish can suddenly change color. This happens when a track "died" in the previous frame because all nearby detections were already taken by other tracks, yet a detection still exists at the same location, so a new object is "born" in the same place.
- Our results differed from what we expected in that we anticipated somewhat better tracking in cases with large area differences, such as when one bat occludes another. Our algorithm uses object areas to avoid matches with large changes in area, but this cannot help when one segmented object overlaps another, since the two merge into a single blob.
- There are many potential improvements. First, our method currently has no notion of fish that "die" and are "reborn" in the middle of the image. While fish that leave and re-enter the frame may be harder to recognize faithfully, we should be able to detect when, for example, two fish overlap (merging into one segmentation object) and then separate again (back into two objects). We could then "connect" the old and new tracks of each fish based on its velocity direction and position before and after the merge.
- Question 1: The animation in the results section shows the whole sequence. One place where the method does very well in the fish sequence is when the dark green fish darts close to two other fish that eventually overlap and are colored yellow: it moves fast and passes close to other blobs, yet its track stays correct. It fails more often with smaller, denser blobs, such as the bats in the lower part of the image.
- Question 2: New tracks are started when a detection has no available past track to map to. Old tracks are considered dead when they have no detection to link to in the current frame. There is no memory beyond one frame.
- Question 3: When they touch or occlude each other, they merge into one blob with a single common centroid. As mentioned above, extra analysis of the frames before and after the contact could keep the correct tracks together; for example, the situation could be detected when two points suddenly become one with a large percentage change in area.
- Question 4: When this happens, the object is simply born and dies within a single frame.
- Question 5: We did model changes in velocity; although this made the model more complex, it made the tracking much more robust. Velocity matters because the animals are so densely packed that guessing which detection maps to which track without any sense of direction or motion would be very difficult and unlikely to succeed.
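The report does not state the filter's state vector or noise settings, so the following is only a sketch of how velocity can enter a Kalman filter, under assumptions: a constant-velocity model with state [position, velocity] per axis, a simplified diagonal process noise, and illustrative noise values `q` and `r`. `AxisKalman` is a hypothetical name; running one instance for x and one for y is equivalent to a 4-state filter whose matrices are block-diagonal per axis.

```cpp
#include <cassert>

// One axis of a constant-velocity Kalman filter: state = [position, velocity].
struct AxisKalman {
    double p = 0.0, v = 0.0;                 // state estimate
    double P00 = 1.0, P01 = 0.0, P11 = 1.0;  // symmetric covariance entries
    double q;                                // process noise (illustrative)
    double r;                                // measurement noise (illustrative)

    AxisKalman(double pos, double q_, double r_) : p(pos), q(q_), r(r_) {}

    // Predict: x' = F x with F = [[1, dt], [0, 1]]; P' = F P F^T + q I.
    double predict(double dt) {
        assert(dt > 0.0);
        p += v * dt;  // velocity carries the position forward
        double n00 = P00 + 2.0 * dt * P01 + dt * dt * P11;
        double n01 = P01 + dt * P11;
        P00 = n00 + q;
        P01 = n01;
        P11 = P11 + q;
        return p;  // predicted position, used as the centre of the search gate
    }

    // Correct with a measured position z (measurement matrix H = [1, 0]).
    void correct(double z) {
        double s = P00 + r;   // innovation covariance
        double k0 = P00 / s;  // gain for position
        double k1 = P01 / s;  // gain for velocity
        double y = z - p;     // innovation
        p += k0 * y;
        v += k1 * y;          // velocity is learned from position residuals
        // P = (I - K H) P, written out for the 2x2 symmetric case
        double n00 = (1.0 - k0) * P00;
        double n01 = (1.0 - k0) * P01;
        double n11 = P11 - k1 * P01;
        P00 = n00;
        P01 = n01;
        P11 = n11;
    }
};
```

With velocity in the state, the prediction moves ahead of the last observed position, so the search circle is centred where a fast-moving animal is likely to be rather than where it was.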
My conclusion is that Kalman filter tracking is robust and can yield good results. I expect most of the remaining improvement to come from the logic that matches detections to tracks, since the Kalman filter's predictions clearly work fairly well. Occlusion and densely packed targets are inherently hard problems, and they are what most needs attention to improve our method. That said, even with a greedy track selection method, it worked quite well.
Credits and Bibliography
Work was done with Timothy Chong.