Abstract
Multiple Object Tracking is a task commonly used to study human visual attention during the observation and monitoring of several moving objects. Human participants show varied patterns of success and failure in tracking tasks, and these are frequently attributed to limits on a targeting system, on tracking capacity, or on some other specialized cognitive structure.
Computational analyses are used to determine which human errors arise from cognitive limits and which arise from unavoidable perceptual uncertainty in tracking tasks. In dealing with a changing world, individuals are able to maintain selective attention on a set of moving items in their surroundings.
Introduction
In everyday life, one encounters many situations that require attending to several objects at the same time within one's field of view. Performance in tracking many items is bounded by a few key factors: the number of items to be tracked, the speed at which they move, and how close the items are to one another.
For instance, crossing a busy street requires one to attend to the vehicles, the traffic lights at the crosswalk and other obstacles (Zikan, 1988).
Other applications, such as security and surveillance, vehicle navigation, video editing and compression, augmented reality, behavior analysis and medical imaging, also require Multiple Object Tracking because they involve monitoring several objects (Mole, Smithies, & Wu, 2011).
Humans are adept at handling situations such as these, in which attention must be spread over a number of items (Wolf, 2007). The ease with which several moving objects are monitored in everyday tasks has led to the intuitive expectation that these objects are tracked in parallel.
Nevertheless, the question of whether the focus of attention is spread serially or in parallel over many objects led to much debate in the 20th century (Challa, Morelande, Misicki, & Evans, 2011).
This has yielded several models of Multiple Object Tracking, including the Serial model, the FINST (FINgers of INSTantiation) model, the Yantis Grouping model, the Double Serial model and the parallel FLEX (FLEXibly allocated indEXes) model.
This study aims to discuss the five models of Multiple Object Tracking while identifying their strengths and weaknesses, and how they can be improved.
The Serial Model
This model was first proposed by Pylyshyn and Storm in 1988. It posits a single focus of attention that visits each target in succession. It therefore requires rapid cycling through the objects so as to index their positions and return to each one before it has moved too far away (Pylyshyn & Storm, 1988).
Every time attention revisits a target's remembered position, the object closest to that position is taken to be the target, and its current position is stored for the following round (Turek, 2007).
A shortcoming of this model is that tracking degrades whenever an object is not correctly located during a revisit. The precision with which an object can be located depends on several factors, such as the speed at which the objects move, which can leave outdated location information in memory (Smith, 2011).
An error occurs if the target has travelled so far since it was last attended that it is no longer the nearest item to its previous location. Similarly, when the spacing between items decreases, a nearby item is more likely to be mistaken for the intended target (Cui, Sun, & Yang, 2011).
Therefore, the performance of this model degrades with an increase in the length of the path linking the objects, in the number of objects to be tracked, in the density of the items being monitored, and in the speed of the items (Cremers, 2007).
These factors affect the serial model by changing how quickly the information about the display becomes outdated relative to the sampling rate of the serial mechanism (Seiffert, 1996).
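The revisit rule described in this section can be illustrated with a minimal Python sketch. It is not an implementation taken from Pylyshyn and Storm; the function names (`nearest_index`, `serial_revisit`) and all coordinates are hypothetical, chosen only to show how the nearest-object rule succeeds when a target has moved little and fails when a distractor ends up closer to the remembered position.

```python
import math


def nearest_index(position, objects):
    """Return the index of the object closest to a remembered position."""
    return min(range(len(objects)), key=lambda i: math.dist(position, objects[i]))


def serial_revisit(remembered, objects):
    """One cycle of the serial rule: visit each remembered target position in
    turn, adopt the nearest object as the target, and store that object's
    position for the next cycle."""
    picked = []
    updated = []
    for position in remembered:
        i = nearest_index(position, objects)
        picked.append(i)
        updated.append(objects[i])
    return picked, updated


# Hypothetical display: object 0 is the true target, object 1 is a distractor.
remembered = [(0.0, 0.0)]

# Slow motion: the target is still the nearest item to its old position.
slow = [(1.0, 0.0), (5.0, 0.0)]

# Fast motion: the target has moved far away and the distractor is now nearer.
fast = [(4.0, 0.0), (1.5, 0.0)]

print(serial_revisit(remembered, slow)[0])  # [0]  target kept
print(serial_revisit(remembered, fast)[0])  # [1]  distractor adopted by mistake
```

In the second display the target has travelled 4 units while a distractor lies 1.5 units from the remembered position, so the rule adopts the distractor. This is the kind of error the serial model predicts at high speeds or small spacings.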
The FINST Model
This model was also first proposed by Pylyshyn and Storm in 1988. It argues that pre-attentive mental pointers track objects in parallel, automatically and without effort. Once attached to a target, an index does not have to be refreshed but stays bound to the target while it moves (Pylyshyn & Storm, 1988).
These indexes, Pylyshyn argues, do not need attention in order to maintain contact with the positions of the targets; instead, they act as pointers that give focal attention rapid access to an item, but only to a single item at a time.
The number of pointers varies from person to person but is limited to 3-5 (Pylyshyn, 2004). Thus one would expect only that many objects to be tracked. Although the pointers themselves track objects pre-attentively, attaching them to an object in the first place may require attention (Scholl & Pylyshyn, 1999).
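The contrast with the serial model can be sketched in Python as follows. This is an illustrative toy, not Pylyshyn's formulation: the class name `PointerPool`, its methods, and the default capacity of 4 (an arbitrary value inside the 3-5 range quoted above) are all assumptions. The point is that a pointer stays bound to an object's identity, so positions are read through the pointer rather than re-found by proximity, and that targets beyond the pool's capacity simply go unindexed.

```python
class PointerPool:
    """A fixed pool of indexes that bind to objects rather than to locations."""

    def __init__(self, capacity=4):
        # Hypothetical capacity; the essay gives a range of 3-5 pointers.
        self.capacity = capacity
        self.bound = []  # ids of the objects currently indexed

    def attach(self, object_id):
        """Bind a free pointer to an object; fail if the pool is exhausted."""
        if len(self.bound) >= self.capacity:
            return False
        self.bound.append(object_id)
        return True

    def locate(self, display):
        """Read current positions through the pointers.

        `display` maps object id -> current (x, y) position.
        """
        return {object_id: display[object_id] for object_id in self.bound}


pool = PointerPool()
attached = [pool.attach(object_id) for object_id in range(6)]
print(attached)  # [True, True, True, True, False, False]

# However far the objects have moved, a bound pointer still yields its object.
display = {object_id: (10.0 * object_id, 7.0) for object_id in range(6)}
print(pool.locate(display)[3])  # (30.0, 7.0)
```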
Yantis Grouping Model
Yantis proposed that the targets are grouped into a single higher-order object, with each target forming a vertex of a virtual polygon. To track this changing shape, only one channel of attention is required.
By tracking the single polygon, the positions of the targets can be inferred. When the targets share a common motion, they group more strongly, which makes tracking less difficult (Yantis, 1992).
In short, the model allows all the targets to be tracked at once with only a single focus of attention; that is, attention is on the polygon and not on the targets as such (Cohen, Horowitz, & Wolfe, 2010). This shows that the observer can take advantage of redundancy among the objects, but it does not establish that grouping is the mechanism by which individual moving targets are tracked.
Its disadvantages are that the objects might not move in unison, which reduces tracking performance, and that it disregards target identities, which might lead to confusion (Luck, 2007). The polygon might also collapse into a concave polygon (Sperling, 1960).
For the model to perform better, the objects need to form a more rigid shape. The motion of the objects also needs to be restricted so as to avoid the collapse of the polygon (Kunar, Carter, Cohen, & Horowitz, 2008).
All the same, if an observer loses or mixes up target identities, this method is the most suitable for recovering the lost target, since it groups all targets into one object and so makes the target's location easy to trace (Sharan, 2008).
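The collapse of the virtual polygon mentioned above can be made concrete with a small Python sketch. It is an illustration only, not part of Yantis's account: the function name `is_convex` and the example coordinates are hypothetical, and the check assumes the targets are listed in order around a simple (non-self-intersecting) polygon. It uses the standard cross-product test, under which a polygon is convex when every turn between consecutive edges has the same direction.

```python
def is_convex(vertices):
    """Return True if the polygon, given as ordered (x, y) vertices,
    turns in the same direction at every corner."""
    n = len(vertices)
    sign = 0
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        cx, cy = vertices[(i + 2) % n]
        # z-component of the cross product of edge a->b with edge b->c
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False
    return True


# Four hypothetical targets forming a square: the virtual polygon is convex.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

# One target drifts inward: the polygon becomes concave.
collapsed = [(0.0, 0.0), (2.0, 0.0), (0.5, 0.5), (0.0, 2.0)]

print(is_convex(square))     # True
print(is_convex(collapsed))  # False
```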
The Double Serial Model
This model was first proposed by Alvarez and Cavanagh in 2005. It resembles the Serial Model in some ways, except that there is a separate focus of attention in each visual hemifield. It suggests that each cerebral hemisphere represents the opposite visual hemifield and tracks targets independently of the other hemisphere (Brady, Konkle, Alvarez, & Oliva, 2008).
It follows that every time a target crosses the vertical midline, responsibility for tracking it has to be handed over from one mechanism to the other (Howe, Sagreiya, Curtis, Zheng, & Livingstone, 2008).
Because the two mechanisms track independently, this model can track two objects in each hemifield but not four objects within a single hemifield. Nonetheless, it has the advantage that it can track more objects per unit time than a single serial mechanism (Cavanagh & Alvarez, 2005).
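The two consequences described in this section, handoffs at the vertical midline and a per-hemifield limit, can be sketched in Python. This is a toy illustration under stated assumptions, not Alvarez and Cavanagh's model: the function names are hypothetical, the midline is placed at x = 0, and the default limit of two objects per hemifield is taken from the paragraph above.

```python
def hemifield(x):
    """Assign a horizontal position to a hemifield (midline assumed at x = 0)."""
    return "left" if x < 0 else "right"


def count_handoffs(xs):
    """Count how often a target's trajectory (a list of x positions)
    crosses the midline, forcing a handoff between the two mechanisms."""
    sides = [hemifield(x) for x in xs]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


def within_capacity(target_xs, per_hemifield=2):
    """Check that neither hemifield holds more targets than its mechanism
    is assumed to handle."""
    sides = [hemifield(x) for x in target_xs]
    return (sides.count("left") <= per_hemifield
            and sides.count("right") <= per_hemifield)


print(count_handoffs([-2.0, -1.0, 1.0, 2.0, -1.0]))  # 2
print(within_capacity([-3.0, -1.0, 1.0, 2.0]))       # True: two per hemifield
print(within_capacity([1.0, 2.0, 3.0, 4.0]))         # False: four in one hemifield
```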
The FLEX Model
This model was introduced by Alvarez and Franconeri in 2007. Like the FINST model, it assumes that items are tracked by mental pointers operating in parallel (Blakeslee & McCourt, 1999).
However, unlike the FINST model, it assumes that one can have any number of pointers, but that creating a pointer consumes an effortful resource, which in the end limits the number of pointers one can have (Franconeri, Jonathan, & Scimeca, 2010).
Hence performance declines as the number of tracked items grows (Winawer, 2005). Likewise, the faster the items move, the more attention each item requires, and so fewer items can be tracked (Anderson, 2008).
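The trade-off between speed and the number of trackable items can be illustrated with a short Python sketch. The numbers and the linear cost function are assumptions made purely for illustration; they are not taken from Alvarez and Franconeri, and the names `trackable_items`, `base_cost` and `cost_per_speed` are hypothetical. The sketch only shows the qualitative claim: with a fixed resource pool, a higher per-item cost leaves room for fewer pointers.

```python
def trackable_items(total_resource, speed, base_cost=1.0, cost_per_speed=0.5):
    """Number of pointers a fixed resource pool can sustain.

    Assumes (hypothetically) that the cost of one pointer grows linearly
    with the speed of the tracked item.
    """
    cost = base_cost + cost_per_speed * speed
    return int(total_resource // cost)


# With 8 units of resource, faster items leave room for fewer pointers.
print(trackable_items(8.0, speed=0.0))  # 8
print(trackable_items(8.0, speed=2.0))  # 4
print(trackable_items(8.0, speed=6.0))  # 2
```

A fixed-capacity account such as FINST would instead predict the same limit at every speed, which is the distinction the paragraph above draws between the two models.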
Conclusion
The information capacity of human memory plays a crucial role in cognitive and neural models of memory, recognition, and categorization, since models of these processes implicitly or explicitly make claims about the amount of information stored in memory. Multiple object tracking addresses the question of how attention can be divided.
Generally, the first three models of Multiple Object Tracking (Serial, Yantis Grouping and FINST) assume that human attention cannot be divided, whereas the other two differ and propose that attention can be divided. However, all agree that attention is vital in tracking an object.
For instance, using a hands-free phone while driving could endanger a motorist, since it impairs his/her attention to an alarming degree. In this case, the danger lies more in what the conversation does to the motorist's brain than in what his/her hands do when taking the call.
All the models except FINST agree that objects can be tracked through occlusion, which is relevant to real-life situations. Tracking objects through occlusion requires predicting where the object will reappear, which would seem improbable for the FINST model because it is pre-attentive.
Ultimately, in my view, the FINST model holds up as the most plausible model of object tracking. This is because, despite its inability to track occluded objects or to track many targets at the same time, it can track objects pre-attentively, in parallel and with less effort.
Its efficiency could be improved by devising a variant of the model in which a single pointer pre-attentively tracks several objects at once.
References
Anderson, B. L. (2008). Transparency and Occlusion. University of New South Wales.
Blakeslee, B., & McCourt, M. (1999). A multiscale spatial filtering account of the White effect. Vision Research.
Brady, T., Konkle, T., Alvarez, G., & Oliva, A. (2008). Visual long-term memory has a massive storage capacity for object details. Massachusetts Institute of Technology.
Cavanagh, P., & Alvarez, G. (2005). Tracking Multiple Targets with Multifocal Attention. Harvard University.
Challa, S., Morelande, M., Misicki, D., & Evans, R. (2011). Fundamentals of Object Tracking. Cambridge University Press.
Cohen, M., Horowitz, T., & Wolfe, J. (2010). Auditory recognition memory is inferior to visual. PNAS.
Cremers, D. (2007). Energy minimization methods in computer vision and pattern recognition. Springer.
Cui, P., Sun, L., & Yang, S. (2011). Adaptive mixture observation models for multiple object tracking. Springer.
Franconeri, S., Jonathan, S., & Scimeca, J. (2010). Tracking Multiple Objects Is Limited Only by Object Spacing, Not by Speed, Time, or Capacity. SAGE.
Howe, P., Sagreiya, H., Curtis, D., Zheng, C., & Livingstone, M. (2008). The Double-Anchoring Theory of Lightness Perception. Harvard Medical School.
Kunar, M., Carter, R., Cohen, M., & Horowitz, T. (2008). Telephone conversation impairs sustained visual attention via a central bottleneck. Psychonomic Bulletin, 1135-1140.
Luck, S. (2007). The Capacity of Visual Working Memory for Features and Conjunctions. Macmillan Publishers Ltd.
Mole, C., Smithies, D., & Wu, W. (2011). Attention: Philosophical and Psychological Essays. Oxford University Press.
Pylyshyn, Z. (2004). Tracking Without Keeping Track of Object Identities. Visual Cognition, 11, 801-822.
Pylyshyn, Z., & Storm, R. (1988). Tracking multiple independent targets. Spatial Vision, 179-197.
Scholl, B., & Pylyshyn, Z. (1999). Tracking Multiple Items Through Occlusion. Cognitive Psychology, 38, 259-290.
Seiffert, A. (1996). Attentional costs in multiple object tracking. Cognition. Oxford Press.
Sharan, L. (2008). Image statistics for surface reflectance perception. Cambridge University Press.
Smith, K. (2011). Reversible-jump Markov chain Monte Carlo multi-object tracking tutorial. Web.
Sperling, G. (1960). The Information Available in Brief Visual Representations. Harvard University.
Turek, M. (2007). Combinatorial optimization for tracking and low-level computer vision problems. ProQuest.
Winawer, A. (2005). Letters to Nature. Nature Publishing Group.
Wolf, J. (2007). Current Progress With a Model of Visual Search. New York: Oxford.
Yantis, S. (1992). Multielement Visual Tracking: Attention and Perceptual Organization. Cognitive Psychology, 295-340.
Zikan, K. (1988). Track initialization in the multiple-object tracking problem. Stanford University.