<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.0" specific-use="sps-1.6" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">dyna</journal-id>
			<journal-title-group>
				<journal-title>DYNA</journal-title>
				<abbrev-journal-title abbrev-type="publisher">Dyna rev.fac.nac.minas</abbrev-journal-title>
			</journal-title-group>
			<issn pub-type="ppub">0012-7353</issn>
			<publisher>
				<publisher-name>Universidad Nacional de Colombia</publisher-name>
			</publisher>
		</journal-meta>
		<article-meta>
			<article-id pub-id-type="doi">10.15446/dyna.v84n200.57028</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Articles</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>Pedestrian tracking using probability fields and a movement feature space</article-title>
				<trans-title-group xml:lang="es">
					<trans-title>Seguimiento de peatones utilizando campos probabilísticos y un espacio de descriptores dinámicos</trans-title>
				</trans-title-group>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<name>
						<surname>Negri</surname>
						<given-names>Pablo</given-names>
					</name>
<xref ref-type="aff" rid="aff1"><sup><italic>a</italic></sup></xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Garayalde</surname>
						<given-names>Damián</given-names>
					</name>
<xref ref-type="aff" rid="aff2"><sup><italic>b</italic></sup></xref>
				</contrib>
			</contrib-group>
			<aff id="aff1">
				<label>a</label>
				<institution content-type="original"> Universidad Argentina de la Empresa (UADE). CONICET. Buenos Aires, Argentina. pnegri@uade.edu.ar </institution>
				<institution content-type="normalized">Universidad Argentina de la Empresa</institution>
				<institution content-type="orgname">Universidad Argentina de la Empresa</institution>
				<addr-line>
					<named-content content-type="city">Buenos Aires</named-content>
				</addr-line>
				<country country="AR">Argentina</country>
				<email>pnegri@uade.edu.ar</email>
			</aff>
			<aff id="aff2">
				<label>b</label>
				<institution content-type="original"> Instituto Tecnológico de Buenos Aires (ITBA), Buenos Aires, Argentina. dgarayal@itba.edu.ar </institution>
				<institution content-type="orgname">Instituto Tecnológico de Buenos Aires (ITBA)</institution>
				<addr-line>
					<named-content content-type="city">Buenos Aires</named-content>
				</addr-line>
				<country country="AR">Argentina</country>
				<email>dgarayal@itba.edu.ar</email>
			</aff>
			<pub-date pub-type="epub-ppub">
				<season>Jan-Mar</season>
				<year>2017</year>
			</pub-date>
			<volume>84</volume>
			<issue>200</issue>
			<fpage>217</fpage>
			<lpage>227</lpage>
			<history>
				<date date-type="received">
					<day>18</day>
					<month>04</month>
					<year>2016</year>
				</date>
				<date date-type="rev-recd">
					<day>01</day>
					<month>11</month>
					<year>2016</year>
				</date>
				<date date-type="accepted">
					<day>02</day>
					<month>12</month>
					<year>2016</year>
				</date>
			</history>
			<permissions>
				<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by-nc-nd/4.0/" xml:lang="en">
					<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License</license-p>
				</license>
			</permissions>
			<abstract>
				<title>Abstract</title>
<p>Retrieving useful information from video sequences, such as the dynamics of pedestrians and other moving objects, leads to further knowledge of what is happening in a scene. In this paper, a Target Framework associates each person with an autonomous entity, modeling its trajectory and speed using a state machine. The particularity of our methodology is the use of a Movement Feature Space (MFS) to generate descriptors for classifiers and trackers. The approach is applied to two public sequences (PETS2009 and TownCentre). The tracking results outperform other algorithms reported in the literature, which have, however, a higher computational complexity.</p>
			</abstract>
			<trans-abstract xml:lang="es">
				<title>Resumen</title>
<p>Recuperar información de secuencias de video, como la dinámica de peatones u otros objetos en movimiento, representa una herramienta indispensable para interpretar qué está ocurriendo en la escena. Este artículo propone el uso de una Arquitectura basada en Targets, que asocia a cada persona una entidad autónoma y modeliza su dinámica con una máquina de estados. Nuestra metodología utiliza una familia de descriptores calculados en el Movement Feature Space (MFS) para realizar la detección y el seguimiento de las personas. Esta arquitectura fue evaluada usando dos bases de datos públicas (PETS2009 y TownCentre) y, comparada con algoritmos de la literatura, arrojó mejores resultados, aun cuando estos algoritmos poseen una mayor complejidad computacional.</p>
			</trans-abstract>
			<kwd-group xml:lang="en">
<title><bold><italic>Keywords</italic></bold>: </title>
				<kwd>pedestrian tracking</kwd>
				<kwd>movement feature space</kwd>
				<kwd>target framework</kwd>
			</kwd-group>
			<kwd-group xml:lang="es">
<title><bold><italic>Palabras clave</italic></bold>: </title>
				<kwd>seguimiento de peatones</kwd>
				<kwd>espacio de descriptores dinámicos</kwd>
				<kwd>target framework</kwd>
			</kwd-group>
			<counts>
				<fig-count count="9"/>
				<table-count count="3"/>
				<equation-count count="8"/>
				<ref-count count="31"/>
				<page-count count="11"/>
			</counts>
		</article-meta>
	</front>
	<body>
		<sec sec-type="intro">
			<title>1. Introduction</title>
<p>Vision-based object tracking is an important task and a useful source of information. It analyzes video sequences to retrieve the motion of an object at each frame [<xref ref-type="bibr" rid="B1">1</xref>]. Recovered metrics can include location, orientation, speed, and acceleration, computed in image-plane (2D) or real-world (3D) reference coordinates. In general, the complexity is closely related to the object being tracked: its articulated nature, or abrupt motion changes. Complex scenarios with illumination changes, noise, and object-to-object and/or object-to-scene occlusions also degrade tracking performance, particularly on non-controlled, real-life video sequences. Examples of object tracking applications include motion-based recognition [<xref ref-type="bibr" rid="B2">2</xref>], automatic surveillance [<xref ref-type="bibr" rid="B3">3</xref>], traffic monitoring [<xref ref-type="bibr" rid="B4">4</xref>], and vehicle navigation [<xref ref-type="bibr" rid="B5">5</xref>]. Among these, pedestrians are one of the most interesting objects to track for researchers and developers. Pedestrian tracking also remains an open subject because of its high complexity, given a person’s changing appearance, non-rigid structure, and occasionally abrupt motion.</p>
			<sec>
<title>1.1. Related work</title>
<p>There are different approaches to tackle human tracking. Some trackers start from a bounding box given at an initial frame of the sequence. Among them, two methods from the Tracking-by-Matching group [<xref ref-type="bibr" rid="B1">1</xref>] are distinguished by their simplicity and good performance. The first one is the Lucas-Kanade (LKT) algorithm [<xref ref-type="bibr" rid="B6">6</xref>]. It seeks to locate an object, moving or not, from one frame to the next in the sequence. This method iteratively minimizes a dissimilarity measure in the neighborhood of a point of the tracked object. Shi and Tomasi [<xref ref-type="bibr" rid="B7">7</xref>] proved that corners are the best choice to obtain optimal tracking results. The second method is Mean Shift [<xref ref-type="bibr" rid="B8">8</xref>], which locates the object position within the next frame by maximizing a similarity coefficient calculated with the Bhattacharyya distance. This coefficient compares the color distribution of the target object against the possible object positions on the following image.</p>
				<p>Online discriminative classification [<xref ref-type="bibr" rid="B9">9</xref>] also uses initial bounding boxes. This method trains an adaptive classifier considering the first bounding box as positive, and the surrounding background as negative. Exhaustive research on the image at t+1 (the consecutive frame) will provide a new positive and negative sample which updates the classifier, and the loop repeats itself.</p>
				<p>When pedestrians (or vehicles) are the only moving objects on the scene, background suppression methods can estimate their motion. In [<xref ref-type="bibr" rid="B4">4</xref>], the dynamics of collected blobs are described by a collection of key-points. They are tracked by similarity functions matching the new blobs with stored objects.</p>
<p>In surveillance applications, where new pedestrians continuously enter the scene, pre-trained trackers using a target model which is known <italic>a priori</italic> are employed. This methodology, called Tracking-by-Detection, is generally implemented starting with pedestrian location hypotheses generated by a person detector. Dalal and Triggs’s people detector [<xref ref-type="bibr" rid="B10">10</xref>], provided by OpenCV [<xref ref-type="bibr" rid="B11">11</xref>], is perhaps the most widely used in the literature. Those hypotheses are associated with previously saved tracks. This matching can be performed using the Hungarian algorithm [<xref ref-type="bibr" rid="B12">12</xref>]. Finally, the tracking itself can be performed by particle filters [<xref ref-type="bibr" rid="B13">13</xref>], Kalman-inspired Event Cones [<xref ref-type="bibr" rid="B14">14</xref>], or the evolution of a state machine [<xref ref-type="bibr" rid="B4">4</xref>]. The procedures of object detection and trajectory estimation can be combined into a coupled optimization problem [<xref ref-type="bibr" rid="B14">14</xref>], enhancing their individual performance. This methodology is robust to changing backgrounds, moving cameras, and the presence of other moving objects, making it the best-suited approach for real-world, non-controlled video tracking sequences.</p>
<p>Offline tracking systems are a variant of the Tracking-by-Detection pipeline [<xref ref-type="bibr" rid="B15">15</xref>,<xref ref-type="bibr" rid="B16">16</xref>]. They seek a global optimization of people's trajectories, scanning forwards and backwards through the hypothesis locations at each frame to find the best path explaining the collected data. Ben Shitrit <italic>et al.</italic> [<xref ref-type="bibr" rid="B15">15</xref>] divide the ground plane into cells, and associate a probability occupancy map built from people-detector outputs. They use the K-Shortest Paths (KSP) algorithm to find the trajectories on the grid cells. The identities along each path are recovered by running a Linear Programming procedure.</p>
			</sec>
			<sec>
<title>1.2. Proposed methodology</title>
<p>This paper aims to track pedestrians in monocular video sequences captured by a calibrated camera with a fixed view of real outdoor scenes. Pedestrians adopt different postures and walk at different distances from the camera. Given that the recorded sequences have cluttered and changing backgrounds, our tracking system follows a methodology based on the Tracking-by-Detection framework.</p>
<p>The first contribution of this paper is an adapted tracking procedure using the Movement Feature Space (MFS). In [<xref ref-type="bibr" rid="B17">17</xref>,<xref ref-type="bibr" rid="B18">18</xref>], the MFS was successfully used to detect vehicles and pedestrians, respectively. The advantages of this detector include efficient calculation time, increased robustness, and minimum loss of information. Also, in the MFS all the operations are performed on motion information, so the presence of cluttered backgrounds does not interfere with the tracking algorithm. Since the MFS does not have a notion of pixel intensity or color with which to compute an image gradient [<xref ref-type="bibr" rid="B6">6</xref>] or a color histogram [<xref ref-type="bibr" rid="B8">8</xref>], the tracking approach is based on tracking fields: an object detection field and an appearance field. The former is constructed using the people detector output scores, and provides the likelihood of a person at a given location. The appearance field is computed using a corner analysis applied to the MFS, capturing a pedestrian texture which is robust enough to perform the tracking when only partial information is available. </p>
				<p>This paper also proposes an architecture where each pedestrian is considered as an autonomous entity, and their evolution on the scene is followed individually by a <italic>Target Framework</italic>. The framework associates one target with one pedestrian from his/her first view on the scene until he/she disappears from sight. A state machine models the dynamics of the target, which is continuously stored at a repository inside the framework. The evolution of the states of each target is employed in the data association stage, filtering false alarms, or concatenating broken trajectories.</p>
				<p>The Target Framework combines on-line and off-line tracking methods. Firstly, the video sequence is analyzed on-line, populating with targets a repository which saves the temporal information about all the hypotheses generated during the detection and tracking. The next stage implements off-line algorithms to filter false alarms and concatenate broken trajectories. The performance of the detection and tracking system is evaluated in two public datasets, and compared against two state-of-the-art tracking systems [<xref ref-type="bibr" rid="B12">12</xref>,<xref ref-type="bibr" rid="B16">16</xref>]. The sensitivity of the procedure with different people detectors is also analyzed.</p>
				<p>This paper is organized as follows: the section below details the detection and tracking procedure on the MFS, as well as the state machine associated with each target. Section 3 develops the procedures pruning the target framework in order to improve the results. Next, the Implementation System Setup is presented. In the Results and Discussion section the performance of the system on the tests datasets is described, followed by the Conclusions of the paper.</p>
			</sec>
		</sec>
		<sec>
<title>2. Online target framework generation</title>
<p>The Target Framework builds an on-line target repository based on the scene dynamics. <xref ref-type="fig" rid="f1">Fig. 1</xref> details the pipeline that generates this repository. Frame F<sub>t</sub> at instant <italic>t</italic> is projected on the MFS, capturing the motion in the scene. The <italic>Pedestrian Detection</italic> block uses the MFS in a people detector to generate pedestrian hypotheses <bold>r</bold><sub>d</sub>, also referred to as Regions of Interest (ROIs or rois). The <italic>ROI Filtering</italic> block filters neighboring and superposed rois in <bold>r</bold><sub>d</sub> using the Non-Maximal Suppression (NMS) algorithm [<xref ref-type="bibr" rid="B19">19</xref>]. The resulting set <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i001.png"/> is used in the data association step. <bold>r</bold><sub>d</sub> also generates a probability map G<sub>t</sub> in the <italic>Detection Field</italic> block. The MFS data are used in the <italic>Appearance Field</italic> step to generate a Gaussian corner map K<sub>t</sub>. The <italic>Target Repository</italic> block saves all the targets produced within the sequence. It is defined as the set of active targets Γ<sub>t-1</sub> = {T<sub>1,t-1</sub>, … , T<sub>n,t-1</sub>}, using those present in the previous frame. The <italic>Target Tracking</italic> block employs probability fields G<sub>t</sub> and K<sub>t</sub>, detected rois <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i001.png"/>, and the active target set Γ<sub>t-1</sub> to update Γ<sub>t</sub>, opening new targets and closing others.</p>
			<p>
				<fig id="f1">
					<label>Figure 1</label>
					<caption>
						<title>Block diagram of the pedestrian on-line tracking system and the Target Framework.</title>
					</caption>
					<graphic xlink:href="0012-7353-dyna-84-200-00217-gf1.png"/>
					<attrib>Source: The authors.</attrib>
				</fig>
			</p>
			<sec>
				<title>2.1. Pedestrian detection on the MFS</title>
<p>The MFS is an adaptive motion extraction model. It uses level lines and their orientations as features to generate an adaptive background model. The motion in the frame at time <italic>t</italic> corresponds to the set of level lines that do not belong to the background model. It is encoded in two arrays, S<sub>t</sub> and O<sub>t</sub>, as shown in <xref ref-type="fig" rid="f2">Fig. 2</xref>. Matrix S<sub>t</sub>(p) counts the number of moving level lines passing through pixel p, and O<sub>t</sub>(p) indicates the orientation of the level lines, shown with different colors. The background details are not present in the S<sub>t</sub>(p) and O<sub>t</sub>(p) matrices, as can be seen in the figure. New static objects entering the scene are integrated into the background model after a temporal window.</p>
				<p>
<xref ref-type="fig" rid="f2">Fig. 2</xref> shows an example of the pedestrian detector output. The detector consists of a cascade of boosted classifiers trained using the Real AdaBoost approach. The feature family encoding the information of the MFS is the histograms of oriented level lines (HO2L). They are computed by accumulating the number of pixels in O<sub>t</sub>(p) having the same orientation (see [<xref ref-type="bibr" rid="B18">18</xref>] for further details). The detector output is a list of rois with their associated confidence scores, as depicted in <xref ref-type="fig" rid="f2">Fig. 2</xref>: <bold>r</bold><sub>d</sub> = {r<sub>i</sub>, s<sub>i</sub>}<sub>i=0,…,n-1</sub>. Each roi is defined as r<sub>i</sub> = [x<sub>i</sub><sup>c</sup>, y<sub>i</sub><sup>c</sup>, w<sub>i</sub>, h<sub>i</sub>], where (x<sub>i</sub><sup>c</sup>, y<sub>i</sub><sup>c</sup>) is its central position and (w<sub>i</sub>, h<sub>i</sub>) are its width and height, respectively. The NMS filtering is applied to <bold>r</bold><sub>d</sub> in order to determine the estimated pedestrian positions <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i003.png"/>.</p>
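The ROI filtering step described above can be sketched as a greedy NMS; a minimal Python version, assuming ROIs in the (x_c, y_c, w, h) form used in the paper and the PASCAL VOC overlap ratio as the suppression criterion (the 0.5 threshold is an illustrative assumption):

```python
import numpy as np

def pascal_overlap(a, b):
    """PASCAL VOC overlap ratio: intersection area over union area.
    ROIs are (xc, yc, w, h) with (xc, yc) the central point."""
    ax0, ay0 = a[0] - a[2] / 2, a[1] - a[3] / 2
    bx0, by0 = b[0] - b[2] / 2, b[1] - b[3] / 2
    ix = max(0.0, min(ax0 + a[2], bx0 + b[2]) - max(ax0, bx0))
    iy = max(0.0, min(ay0 + a[3], by0 + b[3]) - max(ay0, by0))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(rois, scores, thr=0.5):
    """Greedy non-maximal suppression: keep the highest-scoring rois,
    drop any roi overlapping an already-kept one by more than thr."""
    order = np.argsort(scores)[::-1]          # best score first
    keep = []
    for i in order:
        if all(pascal_overlap(rois[i], rois[k]) <= thr for k in keep):
            keep.append(i)
    return keep
```

Applied to the detector output r_d = {r_i, s_i}, `nms` returns the indices of the filtered pedestrian positions.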
				<p>
					<fig id="f2">
						<label>Figure 2</label>
						<caption>
							<title>MFS computation in the PETS2009 dataset and the result of a pedestrian detector. The Fig. shows the MFS information of the St and the Ot arrays of the entire capture, and a zoom on the pedestrian position. It also shows set rd of pedestrian detected rectangles, and filtered pedestrian position<inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i003.png"/>.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf2.png"/>
						<attrib>Source: The authors.</attrib>
					</fig>
				</p>
			</sec>
			<sec>
				<title>2.2. Association of targets and detections</title>
<p>A target is an autonomous entity individually tracking a pedestrian moving in the scene. Target i is described by the parameters T<sub>i,t-1</sub> = {b, id, e, m}. T<sub>i,t-1</sub>.b = {x, y, w, h} is the bounding box containing the pedestrian, T<sub>i,t-1</sub>.id is a label identifying the target, T<sub>i,t-1</sub>.e is the state of T<sub>i</sub>, and T<sub>i,t-1</sub>.m is the motion history, consisting of the last z displacement vectors: m = {d<sub>t-z</sub>, …, d<sub>t-1</sub>}.</p>
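The target parameters T = {b, id, e, m} can be mirrored in a small data structure; a minimal Python sketch, where the history length z = 5 and the state label strings are illustrative assumptions:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Target:
    """One tracked pedestrian, mirroring T = {b, id, e, m}:
    b: bounding box (x, y, w, h); id: target label; e: state;
    m: the last z displacement vectors (z = 5 assumed here)."""
    b: tuple
    id: int
    e: str = "INIT"
    m: deque = field(default_factory=lambda: deque(maxlen=5))

    def mean_motion(self):
        """Average of the stored displacement vectors, used as the
        expected motion direction in the association stage."""
        if not self.m:
            return (0.0, 0.0)
        xs, ys = zip(*self.m)
        return (sum(xs) / len(xs), sum(ys) / len(ys))
```

The bounded `deque` keeps only the most recent z displacements, as the motion history m requires.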
<p>The association of the active targets in Γ<sub>t-1</sub> with the pedestrian hypotheses r at time t is key to the Tracking-by-Detection approach. As a result, it is possible to validate targets, filter false alarms, or resort to alternative tracking procedures when the detector fails.</p>
<p>The association task is as follows: each pair (r<sub>j</sub>, T<sub>i</sub>), with r<sub>j</sub> being a detection in <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i003.png"/> and T<sub>i</sub> one of the n targets in Γ<sub>t-1</sub>, generates a displacement vector <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i006.jpg"/> from the central point of T<sub>i</sub>.b to the central point of r<sub>j</sub>, where <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i007.jpg"/> and θ are the modulus and the angle of <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i008.png"/>. The overlap ratio a<sub>0</sub> between T<sub>i</sub>.b and r<sub>j</sub> is used as a confidence criterion, and is evaluated employing the PASCAL VOC formula [<xref ref-type="bibr" rid="B19">19</xref>]:</p>
				<p>
					<disp-formula id="e1">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e1.png"/>
						<label>(1)</label>
					</disp-formula>
				</p>
<p>Assuming slow changes in the pedestrian dynamics, <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i010.png"/> should have low values, θ should be similar to <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i011.jpg"/>, where <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i012.png"/> is the average of the motion vectors saved in T<sub>i</sub>.<bold>m</bold>, and a<sub>0</sub> should be greater than zero. The angle between <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i013.png"/> and <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i012.png"/> is computed using the dot product:</p>
				<p>
					<disp-formula id="e2">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e2.png"/>
						<label>(2)</label>
					</disp-formula>
				</p>
<p>After all the (r<sub>j</sub>, T<sub>i</sub>) pairs are evaluated, target T<sub>i</sub> is associated with the detection r<sub>j</sub> that best matches its historical motion. The best match is the pair that minimizes <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i015.jpg"/> and <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i016.jpg"/> while maximizing a<sub>0</sub>. If r<sub>j</sub> is not associated with any T<sub>i</sub>, a new target T is created and saved in the Γ<sub>t</sub> set. A T<sub>i</sub> of Γ<sub>t-1</sub> is considered lost if no detection r<sub>j</sub> matches its dynamics.</p>
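The association stage above can be sketched as follows; a greedy Python version, where the combined cost and the `angle_weight` trade-off between displacement modulus and angle φ are illustrative assumptions (the paper additionally requires overlap a0 > 0, omitted here for brevity):

```python
import math

def angle_between(d, d_mean):
    """Angle between displacement d and the target's mean motion,
    computed from the dot product (cf. eq. (2))."""
    dot = d[0] * d_mean[0] + d[1] * d_mean[1]
    n = math.hypot(d[0], d[1]) * math.hypot(d_mean[0], d_mean[1])
    return math.acos(max(-1.0, min(1.0, dot / n))) if n > 0 else 0.0

def associate(targets, detections, angle_weight=10.0):
    """Greedy data association. targets: list of (center, mean_motion)
    pairs; detections: list of detection centers. Each target takes the
    free detection minimising displacement modulus plus weighted angle."""
    matches, taken = {}, set()
    for ti, (c, dm) in enumerate(targets):
        best, best_cost = None, float("inf")
        for dj, dc in enumerate(detections):
            if dj in taken:
                continue
            d = (dc[0] - c[0], dc[1] - c[1])
            cost = math.hypot(d[0], d[1]) + angle_weight * angle_between(d, dm)
            if cost < best_cost:
                best, best_cost = dj, cost
        if best is not None:
            matches[ti] = best   # unmatched detections spawn new targets;
            taken.add(best)      # unmatched targets are marked lost
    return matches
```

A target moving to the right therefore prefers a detection ahead of it over an equally distant one off its course.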
			</sec>
			<sec>
				<title>2.3. Probability fields for tracking targets</title>
<p>To track one target, the procedure estimates its position on the current frame F<sub>t</sub> from its position recorded in Γ<sub>t-1</sub>. In this paper, tracking is conducted using two types of tracking fields: a detection field and an appearance field. The detection field is computed using the output scores s<sub>i</sub> of the people detector on the detected set <bold>r</bold><sub>d</sub> = {r<sub>i</sub>, s<sub>i</sub>}<sub>i=0,…,n-1</sub>. The appearance field is based on corner extraction on the MFS.</p>
				<p>2.3.1. Detection field</p>
<p>The Detection Field G<sub>t</sub>(x) is a probability map generated using the roi set <bold>r</bold><sub>d</sub> = {r<sub>i</sub>, s<sub>i</sub>}<sub>i=0,…,n-1</sub>, where each detected roi is r<sub>i</sub> = {x<sub>i</sub>, y<sub>i</sub>, w<sub>i</sub>, h<sub>i</sub>}. To compute the G<sub>t</sub> field, the ROIs r<sub>i</sub> in <bold>r</bold><sub>d</sub> generate the map M(x) = Σ<sub>i</sub> s<sub>i</sub> δ(x − x<sub>i</sub>), where δ(x) is a Kronecker delta in <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i018.png"/>, and x<sub>i</sub> is the central point of r<sub>i</sub>. The detection field is computed by convolving the map M(x) with a 2D Gaussian filter: </p>
				<p>
					<disp-formula id="e3">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e3.png"/>
						<label>(3)</label>
					</disp-formula>
				</p>
				<p>
					<disp-formula id="e4">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e4.png"/>
						<label>(4)</label>
					</disp-formula>
				</p>
<p>where the parameters of the Gaussian filter include the covariance matrix Σ = 0.12 [(w<sub>i</sub>)<sup>2</sup> 0; 0 (h<sub>i</sub>)<sup>2</sup>], the central point of the patch x<sub>c</sub> = (w<sub>i</sub>/2, h<sub>i</sub>/2), and the position inside the patch x′ = (x′, y′), where x′ = 0,…,w<sub>i</sub>−1 and y′ = 0,…,h<sub>i</sub>−1. The highest values in G<sub>t</sub>(x) are associated with a high-confidence output of the people detector, and suggest the presence of a person at position x of the image.</p>
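Since convolving the impulse map M(x) with the ROI-sized Gaussian is equivalent to stamping a score-weighted Gaussian patch at every ROI centre, the detection field can be sketched as follows; the `scale` factor standing in for the covariance coefficient is an assumption:

```python
import numpy as np

def gaussian_kernel(w, h, scale=0.1):
    """Separable 2D Gaussian patch of size h x w, with per-axis std
    proportional to the ROI dimensions (Sigma ~ diag(w^2, h^2))."""
    sx, sy = scale * w, scale * h
    xs = np.arange(w) - (w - 1) / 2
    ys = np.arange(h) - (h - 1) / 2
    gx = np.exp(-xs ** 2 / (2 * sx ** 2))
    gy = np.exp(-ys ** 2 / (2 * sy ** 2))
    k = np.outer(gy, gx)
    return k / k.max()

def detection_field(shape, rois, scores, scale=0.1):
    """G_t: accumulate a score-weighted Gaussian patch at each ROI
    centre (xc, yc), clipped to the image borders."""
    G = np.zeros(shape)
    for (xc, yc, w, h), s in zip(rois, scores):
        k = s * gaussian_kernel(w, h, scale)
        y0, x0 = yc - h // 2, xc - w // 2
        ys = slice(max(y0, 0), min(y0 + h, shape[0]))
        xs = slice(max(x0, 0), min(x0 + w, shape[1]))
        G[ys, xs] += k[ys.start - y0 : ys.stop - y0,
                       xs.start - x0 : xs.stop - x0]
    return G
```

The maximum of G_t sits at the centre of the highest-confidence detection, as the text describes.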
				<p>2.3.2. Appearance Field</p>
<p>The tracking using the Appearance Field is activated when a target T<sub>i</sub> ∈ Γ<sub>t-1</sub> is not associated with a detection in <bold>r</bold><sub>d</sub>. As an example, if Γ<sub>t-1</sub> has only one target T<sub>0</sub> and <bold>r</bold><sub>d</sub> is empty (the detector failed to detect the pedestrian), all the elements of the Detection Field G<sub>t</sub> will be zero. In those cases, the Appearance Field is used to compare the pedestrian characteristics between the previous and the present frame of the sequence. Furthermore, this procedure is robust enough to track the target with only partial information.</p>
				<p>
					<fig id="f3">
						<label>Figure 3</label>
						<caption>
							<title>Appearance field generation from MFS corners of two consecutive captures from an urban video sequence.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf3.png"/>
						<attrib>Source: The authors.</attrib>
					</fig>
				</p>
<p>The Appearance Field uses arrays S<sub>t</sub> and O<sub>t</sub> of the MFS in a vector-based corner detector [<xref ref-type="bibr" rid="B17">17</xref>]. This corner detector considers that, in the neighborhood B<sub>p</sub> of a corner point p, there are pixels with significant gradients of different orientations. The average of the cross products between all the pixels in B<sub>p</sub> is, in general, greater than the value computed in the neighborhood of a pixel that is not a corner.</p>
<p>The average cross product in neighborhood B<sub>p</sub> can be computed as: K<sub>t</sub> = I<sub>x</sub><sup>2</sup>&lt;I<sub>y</sub><sup>2</sup>&gt; + I<sub>y</sub><sup>2</sup>&lt;I<sub>x</sub><sup>2</sup>&gt; − 2 I<sub>x</sub>I<sub>y</sub>&lt;I<sub>x</sub>I<sub>y</sub>&gt;, where &lt;·&gt; denotes convolution with a 5×5 mask whose elements are all ones, except for the center, which is zero. Assuming that the orientations O<sub>t</sub> are given in radians, the values I<sub>x</sub> and I<sub>y</sub> are defined as I<sub>x</sub> = S<sub>t</sub> cos(O<sub>t</sub>) and I<sub>y</sub> = S<sub>t</sub> sin(O<sub>t</sub>).</p>
<p>Higher values of K<sub>t</sub> suggest the presence of corners, shown as darker regions in <xref ref-type="fig" rid="f3">Fig. 3</xref>. A Gaussian filter with standard deviation σ=3 is applied to smooth the map K<sub>t</sub>. As can be seen from O<sub>t</sub> and S<sub>t</sub> in <xref ref-type="fig" rid="f3">Fig. 3</xref>, the vehicle at the rear does not generate corners because it belongs to the background model, which is a great advantage of the methodology. Two consecutive captures of the dataset and the corresponding Appearance Fields are shown in <xref ref-type="fig" rid="f3">Fig. 3</xref>. Both fields are similar, and the tracking system can successfully follow the person. <xref ref-type="fig" rid="f3">Fig. 3</xref> also shows, for comparison, the corner map obtained with the Harris corner detector, which works on the gray-scale image; there, the background behind the person introduces considerable noise for the tracking system. </p>
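The corner measure above can be sketched in NumPy, with &lt;·&gt; implemented as the centre-less 5×5 neighbourhood described in the text (normalising the sum by the 24 neighbours, i.e. taking an average, is an assumption):

```python
import numpy as np

def corner_field(S, O):
    """Vector-based corner measure on the MFS arrays S_t (level-line
    counts) and O_t (orientations in radians):
    K = Ix^2 <Iy^2> + Iy^2 <Ix^2> - 2 Ix Iy <Ix Iy>."""
    Ix, Iy = S * np.cos(O), S * np.sin(O)

    def avg(A):
        # <.> : average over the 5x5 neighbourhood excluding the centre,
        # accumulated by shifting the array over every non-zero offset
        out = np.zeros_like(A)
        H, W = A.shape
        for dy in range(-2, 3):
            for dx in range(-2, 3):
                if dy == 0 and dx == 0:
                    continue
                out[max(0, dy):H + min(0, dy), max(0, dx):W + min(0, dx)] += \
                    A[max(0, -dy):H + min(0, -dy), max(0, -dx):W + min(0, -dx)]
        return out / 24.0

    return Ix ** 2 * avg(Iy ** 2) + Iy ** 2 * avg(Ix ** 2) \
        - 2 * Ix * Iy * avg(Ix * Iy)
```

With a single orientation everywhere the measure vanishes, while mixed orientations in a neighbourhood (a corner) yield positive values, as the detector expects.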
			</sec>
			<sec>
				<title>2.4. Tracking procedure</title>
				<p>This section describes the iterative tracking method. It is inspired by the LKT and Mean Shift trackers. Instead of using image intensities [<xref ref-type="bibr" rid="B13">13</xref>] or colors [<xref ref-type="bibr" rid="B15">15</xref>], the algorithm uses tracking fields: Detection and Appearance Fields. </p>
<p>2.4.1. Iterative Tracking <xref ref-type="table" rid="t0">Algorithm</xref></p>
<p>For a given T<sub>i,t-1</sub> = {b, id, e, m} ∈ Γ<sub>t-1</sub>, the methodology seeks its most probable position in F<sub>t</sub> using T<sub>i,t-1</sub>.b as the first hypothesis. Algorithm 1 presents a pseudo-code of the iterative tracking. It has four inputs: two tracking fields, Q and P, obtained from the previous frame and the current frame respectively, the initial position y<sub>0</sub>, and a first displacement hypothesis g. If the association stage matched a detection ROI to T<sub>i</sub>, the tracking fields are the Gaussian fields Q = G<sub>t-1</sub> and P = G<sub>t</sub>. Otherwise, the appearance fields are employed: Q = K<sub>t-1</sub> and P = K<sub>t</sub>. y<sub>0</sub> is the central point of T<sub>i,t-1</sub>.b: y<sub>0</sub> = {b.x + b.w/2, b.y + b.h/2}. Subsection 2.4.2 details the use of the displacement hypothesis.</p>
				<p>
					<table-wrap id="t0">
						<label>Algorithm 1</label>
						<caption>
							<title>Iterative Tracking Algorithm</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gt0.png"/>
						<table-wrap-foot>
							<fn id="TFN1">
								<p>Source: The authors.</p>
							</fn>
						</table-wrap-foot>
					</table-wrap>
				</p>
<p>Distributions from Q and P are compared to obtain a similarity score between position y<sub>0</sub> at time t−1 and a new position y<sub>i</sub> at time t. We define the target distribution q = {q<sub>u</sub>}<sub>u=1,…,g</sub> as the pixel values of field Q inside the ROI centered at y<sub>0</sub>. Distribution p(y<sub>i</sub>) consists of the pixel values of field P inside the ROI centered at y<sub>i</sub>. These vectors are normalized to obtain distributions, i.e. <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i023.jpg"/>. The similarity score is calculated using the Bhattacharyya coefficient [<xref ref-type="bibr" rid="B8">8</xref>]: </p>
				<p>
					<disp-formula id="e5">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e5.png"/>
						<label>(5)</label>
					</disp-formula>
				</p>
<p>In (5), both distributions have the same length g (the ROIs have the same size). The algorithm seeks the new position y<sub>1</sub> = y<sub>0</sub> + d<sub>t</sub> in P, where d<sub>t</sub> is a displacement vector, that maximizes the similarity criterion.</p>
<p>The initial position of y<sub>1</sub> is the location of the target in the previous frame, y<sub>0</sub>. At each iteration, y<sub>1</sub> moves one pixel toward the largest Bhattacharyya coefficient in the 8-connected neighborhood B<sub>y1</sub>. The algorithm converges to a local maximum due to the nature of the Gaussian fields. However, a necessary condition for accurate convergence is that the local maximum lies in the neighborhood of y<sub>0</sub>. To ensure this condition, and fast convergence, the next location y<sub>1</sub> of the object is searched in a pyramidal representation of tracking field P. The number of iterations of the algorithm was fixed at MAXIT = 20 in our tests but, in general, convergence is reached after two or three iterations.</p>
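The iteration just described can be sketched as a hill climb on the Bhattacharyya coefficient; a single-level Python version, assuming odd-sized square ROIs that stay inside the field and leaving out the pre-translation vector g for clarity:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two discrete distributions (eq. (5))."""
    return float(np.sum(np.sqrt(p * q)))

def patch_dist(F, y, size):
    """Normalised pixel values of field F inside the ROI centred at y."""
    h, w = size
    r = F[y[0] - h // 2 : y[0] + h // 2 + 1,
          y[1] - w // 2 : y[1] + w // 2 + 1].ravel()
    s = r.sum()
    return r / s if s > 0 else r

def iterative_track(Q, P, y0, size, maxit=20):
    """Move one pixel per iteration toward the 8-connected neighbour with
    the largest Bhattacharyya coefficient against the target distribution
    q taken from Q at y0; stop at a local maximum or after maxit steps."""
    q = patch_dist(Q, y0, size)
    y = y0
    for _ in range(maxit):
        best, best_rho = y, bhattacharyya(patch_dist(P, y, size), q)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                cand = (y[0] + dy, y[1] + dx)
                rho = bhattacharyya(patch_dist(P, cand, size), q)
                if rho > best_rho:
                    best, best_rho = cand, rho
        if best == y:          # converged to a local maximum
            break
        y = best
    return (y[0] - y0[0], y[1] - y0[1])   # displacement vector d_t
```

On smooth Gaussian-like fields the climb reaches the peak in a few one-pixel steps, matching the two-to-three-iteration convergence reported above.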
				<p>
<xref ref-type="fig" rid="f4">Fig. 4</xref> shows an example of the computation of the displacement vector d<sub>t</sub> using the probability fields. In the upper box, the people detector found the person in both frames F<sub>t-1</sub> and F<sub>t</sub>. Detection fields G<sub>t-1</sub> and G<sub>t</sub> could therefore be created using the output ROIs <bold>r</bold><sub>d,t-1</sub> and <bold>r</bold><sub>d,t</sub>, and the tracking algorithm uses both probability fields to find the flow vector d<sub>t</sub> associated with the dynamics of the pedestrian. The second box simulates the case in which the pedestrian is missed by the detector; the system then uses the appearance fields to obtain the flow motion vector d<sub>t</sub>.</p>
<p>2.4.2. Pyramidal search</p>
<p>The pyramidal search is carried out on downsampled versions of the original probability fields P and Q, denoted P<sup>L</sup> and Q<sup>L</sup>, with L = {0, 1, 2}. The size of P<sup>L</sup> and Q<sup>L</sup> corresponds to the original size of P and Q divided by the factor 2<sup>L</sup>. The tracking of target T<sub>k</sub> starts at the third level, L = 2, of the pyramidal representation. Algorithm 1 is executed using P<sup>2</sup>, Q<sup>2</sup>, and y<sub>0</sub><sup>2</sup> = y<sub>0</sub>/4 as parameters. We also define a vector g<sup>L</sup>, which pre-translates tracking field Q<sup>L</sup> and represents a first displacement hypothesis. For L = 2, g<sup>L</sup> = [0 0]<sup>T</sup>. Algorithm 1 returns the flow vector d<sup>L</sup>, which is used to compute the pre-translation vector of the next level: g<sup>L-1</sup> = 2 ∙ (g<sup>L</sup> + d<sup>L</sup>). The algorithm is then executed again at the next level of the pyramidal representation. The final solution d of the pyramidal tracking is obtained by computing the displacement at level 0: d = g<sup>0</sup> + d<sup>0</sup>. Motion vector d<sub>t</sub> is stored in target k, and its new location is computed as T<sub>k,t</sub>.b = T<sub>k,t-1</sub>.b + d<sub>t</sub>.</p>
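The coarse-to-fine loop can be sketched as follows; the single-level tracker is passed in as a callback (its `track(QL, PL, yL, g)` interface is an assumption), and 2×2 block averaging is an assumed reduction scheme:

```python
import numpy as np

def downsample(F):
    """Halve a field by 2x2 block averaging."""
    H, W = F.shape[0] // 2 * 2, F.shape[1] // 2 * 2
    A = F[:H, :W]
    return 0.25 * (A[0::2, 0::2] + A[0::2, 1::2] + A[1::2, 0::2] + A[1::2, 1::2])

def pyramidal_track(Q, P, y0, track):
    """Coarse-to-fine search over levels L = 2, 1, 0. 'track' is a
    single-level tracker returning a flow vector dL; g is the
    pre-translation hypothesis, updated as g_{L-1} = 2 * (g_L + d_L),
    and the final displacement is d = g_0 + d_0."""
    pyrQ, pyrP = [Q], [P]
    for _ in range(2):
        pyrQ.append(downsample(pyrQ[-1]))
        pyrP.append(downsample(pyrP[-1]))
    g = np.zeros(2)
    for L in (2, 1, 0):
        yL = (y0[0] // 2 ** L, y0[1] // 2 ** L)   # target position at level L
        dL = track(pyrQ[L], pyrP[L], yL, g)
        g = 2 * (g + dL) if L > 0 else g + dL
    return g
```

Each level only has to recover the residual displacement left over after the pre-translation, which keeps the per-level search within a small neighbourhood.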
				<p>
					<fig id="f4">
						<label>Figure 4</label>
						<caption>
							<title>Flow motion vectors d<sub>t</sub> obtained from probability fields: using detection fields in the upper box, and using appearance fields in the lower box.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf4.jpg"/>
						<attrib>Source: The authors.</attrib>
					</fig>
				</p>
				<p>
					<fig id="f5">
						<label>Figure 5</label>
						<caption>
							<title>State machine of an individual target.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf5.png"/>
						<attrib>Source: The authors.</attrib>
					</fig>
				</p>
			</sec>
			<sec>
				<title>4.3. State machine of a target</title>
				<p>The target framework models each pedestrian as an autonomous agent with a state machine. Throughout the target lifetime, from the first time it appears in the sequence until it exits the view, the tracking system collects information about its evolution, generating events that trigger transitions between the states. <xref ref-type="fig" rid="f5">Fig. 5</xref> shows the states of a target and the events generating the transitions to the following states.</p>
				<p>
					<list list-type="bullet">
						<list-item>
							<p>INIT STATE:</p>
						</list-item>
						<list-item>
							<p>A detection roi r<sub>k</sub> in <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i027.png"/> that is not associated with an existing target in Γ<sub><italic>t-1</italic></sub> generates a new target T<sub>new</sub>={b,id,e,m}. The initial position T<sub>new</sub>.b copies the value from r<sub>k</sub>. The number identifying the target, T<sub>new</sub>.id, is computed by incrementing the last id number by one. Motion history T<sub>new</sub>.m is created as an empty array. The state T<sub>new</sub>.e is initialized with the INIT value. Target T<sub>new</sub> is then stored in the Γ<sub><italic>t</italic></sub> set of the Target Repository. In the next frame, if the association stage does not find a corresponding detection on <inline-graphic xlink:href="0012-7353-dyna-84-200-00217-i027.png"/>, the lost event is generated and the new state will be VERIFY. When a detection is associated with the target, the tracking procedure generates a motion vector d. If the modulus of d is near zero, the stop event triggers the transition to the STILL state. Otherwise, the move event is generated and the new state of the target will be WALKING.</p>
						</list-item>
						<list-item>
							<p>STILL STATE</p>
						</list-item>
						<list-item>
							<p>The target remains in this state until a move event, generated when motion vector d takes a value other than zero, triggers a transition to the WALKING state. A lost event triggers a transition to the VERIFY state.</p>
						</list-item>
						<list-item>
							<p>WALKING STATE</p>
						</list-item>
						<list-item>
							<p>In this state, the target is assumed to be continuously moving. The average of the last three motion vectors, d<sub>t-2</sub>, d<sub>t-1</sub> and d<sub>t</sub>, saved in the m array, estimates this movement. If this value is near zero, the stop event is generated and the target changes to the STILL state. Otherwise, the target remains in the WALKING state. This procedure helps to filter some tracking errors and to generate smooth transitions between the states. If the target goes beyond the limits of the scene, a go out event is generated and there is a transition to the END state.</p>
						</list-item>
					</list>
				</p>
				<p>
					<fig id="f6">
						<label>Figure 6</label>
						<caption>
							<title>This figure shows different situations using the target framework. In this schema, real pedestrians p<sub>i</sub> are represented by filled circles, false positives by empty circles, and T<sub>i</sub> are shown by their different states.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf6.png"/>
						<attrib>Source: The authors.</attrib>
					</fig>
				</p>
				<p>
					<list list-type="bullet">
						<list-item>
							<p>VERIFY STATE</p>
						</list-item>
						<list-item>
							<p>This state is triggered when the association task does not match any detection to the target and the lost event is generated. The transition to this state initializes a counter c of the number of captures in which the target is lost. In this state, the tracking of the target is executed using the Appearance Field. If counter c reaches a threshold C, an erase event is generated, and the state machine changes to the END state. However, when the association procedure finds a detection corresponding to the target position, the found event is generated and the target transitions to STILL or WALKING, depending on the modulus of motion vector d<sub>t</sub>.</p>
						</list-item>
						<list-item>
							<p>END STATE</p>
						</list-item>
						<list-item>
							<p>This state closes the target, which will not be present in the next set Γ<sub><italic>t</italic></sub>.</p>
						</list-item>
						<list-item>
							<p>
								<xref ref-type="fig" rid="f6">Fig. 6</xref> shows some examples of the tracking procedure with different detection results. Target T<sub>1</sub> correctly tracks pedestrian p<sub>1</sub>. T<sub>3</sub> is an example of a lost pedestrian, and T<sub>2</sub>, which was generated by a false alarm f, was immediately closed because f was the only detection.</p>
						</list-item>
					</list>
				</p>
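The state machine described above can be sketched as a transition table. The event names follow Fig. 5; the split of the found event into found_still/found_walking (encoding the modulus test on d) and the lost transition out of WALKING are our reading of the text, not literal labels from the paper:

```python
# Transition table for the target state machine (Fig. 5).
TRANSITIONS = {
    ("INIT",    "lost"):          "VERIFY",
    ("INIT",    "stop"):          "STILL",
    ("INIT",    "move"):          "WALKING",
    ("STILL",   "move"):          "WALKING",
    ("STILL",   "lost"):          "VERIFY",
    ("WALKING", "stop"):          "STILL",
    ("WALKING", "lost"):          "VERIFY",   # assumed: a lost walking target is verified
    ("WALKING", "go_out"):        "END",
    ("VERIFY",  "found_still"):   "STILL",
    ("VERIFY",  "found_walking"): "WALKING",
    ("VERIFY",  "erase"):         "END",
}

def step(state, event):
    """Advance the machine; unmodeled (state, event) pairs leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```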
			</sec>
		</sec>
		<sec>
			<title>5. Off-line target framework pruning</title>
			<p>This section develops the off-line evaluation of the target framework over the sequence. All the targets are analyzed in order to filter false alarms and to concatenate broken tracks. The procedures detailed in the next sections are, however, causal. This implies that the false-positive filtering and the concatenation algorithm could be adapted to work on-line, but this is beyond the scope of this paper.</p>
			<sec>
				<title>5.1. Filtering of false positives</title>
				<p>False positives (detections that are not pedestrians) can be easily identified when the only states of the corresponding target are INIT and VERIFY, e.g. target T<sub>2</sub> in <xref ref-type="fig" rid="f6">Fig. 6</xref>. The example depicts the case where a single detection, possibly due to light conditions, shadows, etc., generates target T<sub>2</sub>, which is closed four frames later. To filter those cases, the target is eliminated if #N<sub>V</sub> &gt; #N<sub>S</sub> + #N<sub>W</sub>, where #N<sub>V</sub>, #N<sub>S</sub>, #N<sub>W</sub> are the number of times the target is in the VERIFY, STILL and WALKING states, respectively.</p>
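The filtering rule reduces to a count over the target's state history, e.g.:

```python
from collections import Counter

def is_false_positive(state_history):
    """Eliminate a target that was lost (VERIFY) more often than it was
    confirmed moving or standing (STILL + WALKING): #N_V > #N_S + #N_W."""
    n = Counter(state_history)
    return n["VERIFY"] > n["STILL"] + n["WALKING"]
```

A target like T2 above, seen once and then lost for four frames, satisfies the inequality and is removed.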
			</sec>
			<sec>
				<title>5.2. Target concatenation with the Kalman filter</title>
				<p>The concatenation procedure aims to connect a target in the VERIFY state with a new target, created afterwards, that corresponds to the same pedestrian. The Kalman filter is used for its robust estimation of trajectories, providing metrics to match the lost target with one of the potential new ones, as proposed by Deriche &amp; Faugeras [<xref ref-type="bibr" rid="B20">20</xref>].</p>
				<p>A Kalman filter computes the optimal estimation of state vector X<sub>t</sub> (positions, speeds and accelerations) from noisy measurements V<sub>t</sub>, consisting of the ground plane positions of the target at time t [<xref ref-type="bibr" rid="B21">21</xref>]. Those positions are obtained using the calibration parameters of the camera and the scene. The model of our application, following the system dynamics and measurements, is shown below:</p>
				<p>
					<disp-formula id="e6">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e6.png"/>
						<label>(6)</label>
					</disp-formula>
				</p>
				<p>
					<disp-formula id="e7">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e7.png"/>
						<label>(7)</label>
					</disp-formula>
				</p>
				<p>where ω<sub>t</sub> and υ<sub>t</sub> are assumed to be normally distributed with zero mean, with covariance Q<sub><italic>t</italic></sub> representing the model error and R<sub><italic>t</italic></sub> the covariance of the measurement error. Φ<sub>t+1,t</sub> is the evolution matrix and H<sub>t</sub> the selection matrix. A detailed description of these matrices can be found in [<xref ref-type="bibr" rid="B22">22</xref>].</p>
				<p>Let T<sub>0</sub> be the tracked target which changes to the VERIFY state at time t<sub>v</sub> and whose position was estimated by the Kalman filter. Let T<sub>i=1, …,n</sub> be all the other targets generated at t<sub>v</sub>. The matching procedure consists of evaluating those targets that are near T<sub>0</sub> and have a similar evolution. Proximity is measured by the Mahalanobis distance between the filtered position of T<sub>0</sub> and the candidate position [<xref ref-type="bibr" rid="B20">20</xref>]:</p>
				<p>
					<disp-formula id="e8">
						<graphic xlink:href="0012-7353-dyna-84-200-00217-e8.png"/>
						<label>(8)</label>
					</disp-formula>
				</p>
				<p>where S, the covariance of the difference X<sub>x,y</sub><sup>T<sub>0</sub></sup> − V<sup>T<sub>i</sub></sup>, is the sum of the Kalman covariance matrix P<sup>T<sub>0</sub></sup> and R. Distance d<sub>i</sub> has a χ<sup>2</sup> distribution with one degree of freedom. A threshold of d<sub>i</sub> &lt; 3.84 is proposed in order to account for a 95 percent probability of matching targets related to the same pedestrian. More distant targets are discarded. Candidate targets should also have an evolution similar to that of T<sub>0</sub>; this is checked by comparing the displacement vector of T<sub>0</sub> with that of the candidate target. From those targets which fulfill both conditions, the one with the minimum distance is matched with T<sub>0</sub> by unifying their IDs. If none applies at time t<sub>v</sub>, the targets in the next frame are evaluated. This procedure is repeated for W frames before considering the target as not concatenable.</p>
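The gating test of eq. (8) can be sketched as follows; the function and argument names are ours, with `P` and `R` standing for the Kalman and measurement covariances defined above:

```python
import numpy as np

CHI2_95 = 3.84  # threshold from the text: 95th percentile of the chi^2 distribution

def mahalanobis_gate(x_filtered, P, v_candidate, R):
    """d_i = (x - v)^T S^{-1} (x - v), with S = P + R (eq. 8).
    Returns the distance and whether the candidate passes the 95% gate."""
    diff = np.asarray(x_filtered, float) - np.asarray(v_candidate, float)
    S = np.asarray(P, float) + np.asarray(R, float)
    d = float(diff @ np.linalg.inv(S) @ diff)
    return d, d < CHI2_95
```

A candidate one ground-plane unit away under unit covariances gives d = 0.5 and passes; one ten units away gives d = 50 and is discarded.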
				<p>
					<xref ref-type="fig" rid="f7">Fig. 7</xref> shows an example where a pedestrian has an abrupt change in direction, and the original T<sub>0</sub> lost the track. The search area where T<sub>0</sub> changes to the VERIFY state is defined by S. There are two candidates, T<sub>i</sub> and T<sub>k</sub>, initiating inside the area. However, the evolution of T<sub>k</sub> is different from that of T<sub>0</sub>, and the second criterion is not fulfilled. Then, T<sub>i</sub> is retained and concatenated with T<sub>0</sub> updating its ID (T<sub>i</sub>.id = T<sub>0</sub>.id).</p>
				<p>
					<fig id="f7">
						<label>Figure 7</label>
						<caption>
							<title>The figure shows the search area around the position of pedestrian p<sub>a</sub>, target T<sub>0</sub>, which is closed, and target candidates T<sub>i</sub> and T<sub>k</sub> for the matching procedure.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf7.png"/>
						<attrib>Source: The authors.</attrib>
					</fig>
				</p>
				<p>
					<fig id="f8">
						<label>Figure 8</label>
						<caption>
							<title>This figure shows the result of the concatenation procedure on a set of targets. The first row shows unconcatenated targets corresponding to a single pedestrian, and the second row shows concatenated results. The labels indicate the ID number of the targets at the beginning of their paths.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf8.png"/>
						<attrib>Source: The Authors.</attrib>
					</fig>
				</p>
				<p>
					<xref ref-type="fig" rid="f8">Fig. 8</xref> shows an example of the concatenation procedure for targets 29 and 65. The upper image of column (a) shows the trajectories of targets 29, 32, 35 and 42. Here, the track is broken due to abrupt changes in direction. A kind of tail can be noticed on their paths before closing. This corresponds to the positions where the targets are in the VERIFY state and lose the pedestrian's track. The lower image of the left column shows the concatenated trajectory, from which the tails were filtered. The right column shows a different case. The trajectories of targets 65, 66, 83, and 94 correspond to the same pedestrian, but target 66 is lost due to occlusion by the pedestrian tagged with target 62. The lower image shows the concatenated result.</p>
			</sec>
		</sec>
		<sec>
			<title>6. Experimental setup</title>
			<sec>
				<title>6.1. Video sequences used for the tests</title>
				<p>The system was evaluated on two public datasets. The first one is the PETS2009 dataset, task S2.L1 (view 1) [<xref ref-type="bibr" rid="B22">22</xref>]. The sequence consists of 795 frames with 4,650 annotated ROI positions corresponding to 19 pedestrians. The second is the Oxford Town Centre dataset [<xref ref-type="bibr" rid="B23">23</xref>,<xref ref-type="bibr" rid="B24">24</xref>], which captures a pedestrian street with hundreds of people walking. The sequence was captured at 25 fps, and the total number of labeled frames is 4,500. The total number of pedestrians is 230, from 71,460 annotated ROIs. </p>
			</sec>
			<sec>
				<title>6.2. Pedestrian detector training</title>
				<p>The detection step of the target framework is performed by a cascade of boosted classifiers trained with the Real Adaboost algorithm. The detector was trained following the guidelines from [<xref ref-type="bibr" rid="B18">18</xref>], using a three-fold cross-validation. The input features are the histograms of oriented level lines, HO2L, which are computed on the MFS. The resulting cascade has, on average, 23 strong classifiers, which are composed of a combination of generative and discriminative classification functions [<xref ref-type="bibr" rid="B25">25</xref>,<xref ref-type="bibr" rid="B26">26</xref>]. The positive training patches consist of pedestrian images captured from an outdoor street sequence, for a total of 6,726 positive samples [<xref ref-type="bibr" rid="B27">27</xref>]. The motion information of each patch is captured by the MFS, and the information employed to compute the HO2L features comes from the S<sub>t</sub> and O<sub>t</sub> matrices, as shown in Figs. 2 and 3. It is important to note that the pedestrian detector is trained using people from views and appearances different from those of the TownCentre and PETS2009 datasets (cross-database evaluation). The negative training set is PASCAL VOC 2012, composed of 7,166 images without people [<xref ref-type="bibr" rid="B28">28</xref>]. The number of negatives is increased by rotating the images by 90 degrees three times. </p>
			</sec>
			<sec>
				<title>6.3. Tracking system benchmark</title>
				<p>Two publicly available tracking systems were evaluated in this paper as baselines for the Target Framework. The first is the HierarchyEnsemble on-line tracking algorithm proposed by Zhang [<xref ref-type="bibr" rid="B12">12</xref>,<xref ref-type="bibr" rid="B29">29</xref>]. The methodology is based on a Tracking-by-Detection algorithm combining Mean Shift with a color appearance model and a Kalman filter to follow the target motion dynamics. It also introduces a Tracker Hierarchy, used to label each tracker as novice or expert depending on the number of templates accumulated during tracking. The advantage of this methodology is that it does not need any calibration (it works in 2D) and the available code runs on-line. The Continuous Energy Minimization (CEM) is an off-line algorithm introduced by Milan [<xref ref-type="bibr" rid="B16">16</xref>,<xref ref-type="bibr" rid="B30">30</xref>]. It is a traditional forward-and-backward methodology which employs pedestrian detections to connect paths using linear approximations or splines. The final trajectories minimize a Total Energy computed using different metrics. It uses both 2D and 3D information (it needs calibration) and gives good output results.</p>
				<p>The three tracking systems, HierarchyEnsemble, CEM, and the MFS Target Framework (MFS-TF), employ pedestrian location hypotheses as input. In order to evaluate the sensitivity of the tracking algorithms to different inputs, the following files of pedestrian ROIs were used in the tests:</p>
				<p>
					<list list-type="bullet">
						<list-item>
							<p>det_opencv: these files are the result of the HOG detector [<xref ref-type="bibr" rid="B10">10</xref>,<xref ref-type="bibr" rid="B11">11</xref>] applied to the TownCentre and PETS2009 datasets, and shared on Zhang’s webpage [<xref ref-type="bibr" rid="B29">29</xref>]. </p>
						</list-item>
						<list-item>
							<p>det_autre: there are two additional detection files, one for the PETS2009 provided by Milan [<xref ref-type="bibr" rid="B30">30</xref>], and the other for the TownCentre dataset shared at [<xref ref-type="bibr" rid="B24">24</xref>].</p>
						</list-item>
						<list-item>
							<p>ada_mfs_color: the pedestrian detector uses the MFS (see sec. 4.2) and a Color Texton Space (CTS), which better captures transitions between colored regions that could be lost in a grayscale transformation [<xref ref-type="bibr" rid="B31">31</xref>].</p>
						</list-item>
					</list>
				</p>
			</sec>
			<sec>
				<title>6.4. Tracking evaluation</title>
				<p>We used Milan's code [<xref ref-type="bibr" rid="B30">30</xref>], implementing the CLEAR MOT metrics, to test the performance of the target framework on the datasets. The multiple object tracking accuracy (MOTA) evaluates the tracking performance considering the false negative rate, the false positive rate, and the number of identity switches. The multiple object tracking precision (MOTP) measures the precision of the tracker by computing an overlap ratio of the estimated position for matched true pedestrian-target pairs. For the MOTP, as in [<xref ref-type="bibr" rid="B13">13</xref>], a score of 50 percent was considered significant for tracking, as in the Pascal VOC Challenge [<xref ref-type="bibr" rid="B20">20</xref>]. False Negatives (FN) indicate the number of missed pedestrians; this score is closely related to the performance of the detector. False Positives (FP) measure the number of target ROIs not matched to a pedestrian position. The other scores are False Alarms per Image (FPPI), the number of ground truth (GT) unique pedestrians in the sequence, and the number of Mostly Tracked (MT) pedestrians.</p>
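As a reference for the scores discussed below, the two main CLEAR MOT metrics reduce to the following (a simplified per-sequence form; the full metric accumulates these terms frame by frame):

```python
def mota(misses, false_positives, id_switches, num_gt_objects):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDsw) / GT objects."""
    return 1.0 - (misses + false_positives + id_switches) / float(num_gt_objects)

def motp(overlap_ratios):
    """Multiple Object Tracking Precision: mean overlap ratio of matched
    pedestrian-target pairs."""
    return sum(overlap_ratios) / len(overlap_ratios)
```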
			</sec>
		</sec>
		<sec sec-type="results|discussion">
			<title>7. Results and discussions</title>
			<p>This section presents and discusses the results of the different detection files and the performance of the three tracking systems on the datasets.</p>
			<sec>
				<title>7.1. Pedestrian detection performance</title>
				<p>The performance of the tracking algorithms is closely related to the pedestrian hypotheses provided by the detectors. <xref ref-type="table" rid="t1">Table 1</xref> presents the results of the different detection files on the datasets. The first column indicates the name of the detection file, and the second the total number of pedestrian ROIs in the sequence. The correct detection percentage is shown in column Det. The other metrics, FN, FP and FPPI, were introduced in the “Tracking Evaluation” section. It is worth noticing that the PETS2009 dataset is evaluated without using a region of interest, contrary to what is done in many state-of-the-art papers. This reduces our Det ratio and increases the number of FP compared with those papers. The best performance of the detectors is found on the PETS2009 dataset. This is the least noisy sequence, with a simple static background: people walk without blocking each other, and there are no other moving objects in the scene.</p>
				<p>
					<table-wrap id="t1">
						<label>Table 1</label>
						<caption>
							<title>Performance of detection files on the datasets. </title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gt1.png"/>
						<table-wrap-foot>
							<fn id="TFN2">
								<p>Source: The Authors.</p>
							</fn>
						</table-wrap-foot>
					</table-wrap>
				</p>
				<p>The Town Centre sequence has a cluttered background; for instance, mannequins in the shop windows are considered pedestrians by the classifiers. In addition, pedestrians mask, either totally or partially, other people along their paths. This effect is especially remarkable when people are far away from the camera and their size is small. It is thus possible to conclude that the number of distracters for the detectors is higher in this sequence than in the first one, which would account for the decreased performance. As can be seen in <xref ref-type="table" rid="t1">Table 1</xref>, the MFS Adaboost detector works at an operational point which minimizes FP. On the other hand, the operational points of PETS2009-S2L1-c1-det and TownCentre_Body_DET maximize the Det ratio, but increase the FP number by factors of 15 and 1.9 for PETS2009 and TownCentre, respectively. The corresponding opencv classifier files have a behavior comparable to that of the MFS classifiers. However, with similar FP values, the MFS classifiers have lower FN values (miss rate).</p>
			</sec>
			<sec>
				<title>7.2. Results of the tracking systems</title>
				<p>The best performance on the MOTA metric was obtained by MFS-TF using the ada_mfs_color detection file. MFS-TF also obtains the highest MOTP ratio, which shows that the tracking algorithm generates targets that correctly overlap the pedestrian positions. When MFS-TF works with accurate detection files with a low miss rate and few FP, such as the ada_mfs_color and det_opencv files, it outperforms the other tracking systems on almost all the datasets. In the case of ada_mfs_color, our tracking algorithm improves the FN metric (related to the miss rate), adding 290 detections to the original ones for PETS2009, and 3,600 for TownCentre. At the same time, the number of FP remains very low. The number of targets created by the algorithm is low, as is the SWIDs number, showing that each pedestrian is well tracked. When MFS-TF works with detectors at different operational points, the improvements are less remarkable.</p>
				<p>
					<table-wrap id="t2">
						<label>Table 2</label>
						<caption>
							<title>Evaluation scores for tracking results. The best scores are in bold and the second best results are underlined. </title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gt2.png"/>
						<table-wrap-foot>
							<fn id="TFN3">
								<p>Source: The Authors</p>
							</fn>
						</table-wrap-foot>
					</table-wrap>
				</p>
				<p>The HierarchyEnsemble algorithm [<xref ref-type="bibr" rid="B12">12</xref>] creates the lowest number of tracks among the methodologies. This is related to its use of several color templates for each track: a new track is created with a novice state only if a detection matches neither the active tracks nor the saved templates. It is also robust when occlusions happen, or when a pedestrian is lost from sight for some frames. This is the reason why HierarchyEnsemble has the highest MT score on the TownCentre set. This methodology is, however, very sensitive to detectors with a high number of FN and FP. The analysis shows that those detections are noisy (central point and size of the roi), hindering the collection of the pedestrian templates and increasing the number of false tracks. </p>
				<p>The CEM algorithm, developed by Milan et al. [<xref ref-type="bibr" rid="B16">16</xref>], uses backward and forward path growing to complete non-detected pedestrian positions, obtaining a high MT score. In general, it shows a better performance than Zhang's algorithm. It is also robust against noisy detections, because the track paths are obtained using interpolations (linear or spline). The main problem of this algorithm arises when the number of FP is high, as in the case of the det_autre file for the TownCentre dataset. In those situations, the non-filtered FP draw erroneous paths, which, in turn, increase the FP number.</p>
				<p>
					<fig id="f9">
						<label>Figure 9</label>
						<caption>
							<title>The figure shows the targets and their trajectories computed by our analysis. Each target trajectory is plotted with a different color.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-84-200-00217-gf9.png"/>
						<attrib>Source: The Authors.</attrib>
					</fig>
				</p>
				<p>The results of our analysis are plotted in <xref ref-type="fig" rid="f9">Fig. 9</xref>, where the pedestrians' trajectories are depicted using different colors, each color indicating that a new target captured the path. <xref ref-type="table" rid="t2">Table 2</xref> reports the scores of the Target Framework and of the other two tracking methodologies applied to the four sequences, following the CLEAR MOT metrics [<xref ref-type="bibr" rid="B19">19</xref>]. The best results are in bold and the second best results are underlined.</p>
			</sec>
		</sec>
		<sec sec-type="conclusions">
			<title>8. Conclusions</title>
			<p>The dynamics of pedestrians were successfully detected and tracked using the proposed Target Framework. The MFS used at the detection stage provided accurate detection hypotheses. It was also used to build an Appearance field which correctly followed pedestrians lost by the detector. </p>
			<p>Future work will involve improving the overall system. For tracking purposes, the development of additional features should make it possible to perform the tracking on the MFS itself, without the need to work on still-image spaces (color histograms, etc.). Furthermore, the target state machine could be improved by adding other states, for example, to detect complex pedestrian behavior.</p>
		</sec>
	</body>
	<back>
		<ack>
			<title>Acknowledgments</title>
			<p>This paper was supported by PICT-2283 of ANPCyT, CONICET (Argentina), and ACyT A15T14 of UADE. </p>
		</ack>
		<ref-list>
			<title>References</title>
			<ref id="B1">
				<label>[1]</label>
				<mixed-citation>[1]  Smeulders, A.W., Chu, D., Cucchiara, R., Calderara, S., Dehghan, A. and Shah, M., Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 36(7), pp. 1442-1468, 2014. DOI: 10.1109/TPAMI.2013.230</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Smeulders</surname>
							<given-names>A.W.</given-names>
						</name>
						<name>
							<surname>Chu</surname>
							<given-names>D.</given-names>
						</name>
						<name>
							<surname>Cucchiara</surname>
							<given-names>R.</given-names>
						</name>
						<name>
							<surname>Calderara</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Dehghan</surname>
							<given-names>A.</given-names>
						</name>
						<name>
							<surname>Shah</surname>
							<given-names>M</given-names>
						</name>
					</person-group>
					<article-title>Visual tracking: An experimental survey</article-title>
					<source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
					<volume>36</volume>
					<issue>7</issue>
					<fpage>1442</fpage>
					<lpage>1468</lpage>
					<year>2014</year>
					<pub-id pub-id-type="doi">10.1109/TPAMI.2013.230</pub-id>
				</element-citation>
			</ref>
			<ref id="B2">
				<label>[2]</label>
				<mixed-citation>[2]  Goldhammer, M., Gerhard, M., Zernetsch, S., Doll, K. and Brunsmann, U., Early prediction of a pedestrians trajectory at intersections. IEEE Proceedings on Intelligent Transport Systems. pp. 237-242, 2013. DOI: 10.1109/ITSC.2013.6728239</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Goldhammer</surname>
							<given-names>M.</given-names>
						</name>
						<name>
							<surname>Gerhard</surname>
							<given-names>M.</given-names>
						</name>
						<name>
							<surname>Zernetsch</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Doll</surname>
							<given-names>K.</given-names>
						</name>
						<name>
							<surname>Brunsmann</surname>
							<given-names>U</given-names>
						</name>
					</person-group>
					<source>Early prediction of a pedestrians trajectory at intersections</source>
					<conf-name>IEEE Proceedings on Intelligent Transport Systems</conf-name>
					<fpage>237</fpage>
					<lpage>242</lpage>
					<year>2013</year>
					<pub-id pub-id-type="doi">10.1109/ITSC.2013.6728239</pub-id>
				</element-citation>
			</ref>
			<ref id="B3">
				<label>[3]</label>
				<mixed-citation>[3]  Zhang, Y., Ji, Q. and Lu, H., Event detection in complex scenes using interval temporal constraints. In: IEEE International Conference on Computer Vision. pp. 3184-3191, 2013. DOI: 10.1109/ICCV.2013.395 </mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Zhang</surname>
							<given-names>Y.</given-names>
						</name>
						<name>
							<surname>Ji</surname>
							<given-names>Q.</given-names>
						</name>
						<name>
							<surname>Lu</surname>
							<given-names>H</given-names>
						</name>
					</person-group>
					<source>Event detection in complex scenes using interval temporal constraints</source>
					<conf-name>IEEE International Conference on Computer Vision</conf-name>
					<conf-date>2013</conf-date>
					<pub-id pub-id-type="doi">10.1109/ICCV.2013.395</pub-id>
				</element-citation>
			</ref>
			<ref id="B4">
				<label>[4]</label>
				<mixed-citation>[4]  Jodoin, J., Bilodeau, G. and Saunier, N., Urban tracker: Multiple object tracking in urban mixed traffic. In: IEEE Winter Conf. on App. of Comp. Vision. pp. 885-892, 2014. DOI: 10.1109/WACV.2014.6836010</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Jodoin</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Bilodeau</surname>
							<given-names>G.</given-names>
						</name>
						<name>
							<surname>Saunier</surname>
							<given-names>N</given-names>
						</name>
					</person-group>
					<article-title>Urban tracker: Multiple object tracking in urban mixed traffic</article-title>
					<source>IEEE Winter Conf. on App. of Comp. Vision</source>
					<fpage>885</fpage>
					<lpage>892</lpage>
					<year>2014</year>
					<pub-id pub-id-type="doi">10.1109/WACV.2014.6836010</pub-id>
				</element-citation>
			</ref>
			<ref id="B5">
				<label>[5]</label>
				<mixed-citation>[5]  Keller, C. and Gavrila, D., Will the pedestrian cross?. A study on pedestrian path prediction. IEEE Trans. on Intell. Transp. Systems 15(2), pp. 494-506, 2014. DOI: 10.1109/TITS.2013.2280766</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Keller</surname>
							<given-names>C.</given-names>
						</name>
						<name>
							<surname>Gavrila</surname>
							<given-names>D</given-names>
						</name>
					</person-group>
					<article-title>Will the pedestrian cross?. A study on pedestrian path prediction</article-title>
					<source>IEEE Trans. on Intell. Transp. Systems</source>
					<volume>15</volume>
					<issue>2</issue>
					<fpage>494</fpage>
					<lpage>506</lpage>
					<year>2014</year>
					<pub-id pub-id-type="doi">10.1109/TITS.2013.2280766</pub-id>
				</element-citation>
			</ref>
			<ref id="B6">
				<label>[6]</label>
				<mixed-citation>[6]  Lucas, B. and Kanade, T., An iterative image registration technique with an application to stereo vision. In: Proc. Int. Joint Conf. on Artificial Intell. 2, pp. 674-679, August, 1981. </mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Lucas</surname>
							<given-names>B.</given-names>
						</name>
						<name>
							<surname>Kanade</surname>
							<given-names>T.</given-names>
						</name>
					</person-group>
					<article-title>An iterative image registration technique with an application to stereo vision</article-title>
					<source>Proc. Int. Joint Conf. on Artificial Intell</source>
					<volume>2</volume>
					<fpage>674</fpage>
					<lpage>679</lpage>
					<month>08</month>
					<year>1981</year>
				</element-citation>
			</ref>
			<ref id="B7">
				<label>[7]</label>
				<mixed-citation>[7]  Shi, J. and Tomasi, C., Good features to track. In: IEEE Proc. Comp. Vis. and Pattern Recognit. pp. 593-600, June, 1994. DOI: 10.1109/CVPR.1994.323794</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Shi</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Tomasi</surname>
							<given-names>C.</given-names>
						</name>
					</person-group>
					<article-title>Good features to track</article-title>
					<source>IEEE Proc. Comp. Vis. and Pattern Recognit</source>
					<fpage>593</fpage>
					<lpage>600</lpage>
					<month>06</month>
					<year>1994</year>
					<pub-id pub-id-type="doi">10.1109/CVPR.1994.323794</pub-id>
				</element-citation>
			</ref>
			<ref id="B8">
				<label>[8]</label>
				<mixed-citation>[8]  Comaniciu, D., Ramesh, V. and Meer, P., Real-time tracking of non-rigid objects using mean shift. In: IEEE Proc. Comp. Vis. and Pattern Recognit. 2, pp. 142-149, 2000. DOI: 10.1109/CVPR.2000.854761</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Comaniciu</surname>
							<given-names>D.</given-names>
						</name>
						<name>
							<surname>Ramesh</surname>
							<given-names>V.</given-names>
						</name>
						<name>
							<surname>Meer</surname>
							<given-names>P.</given-names>
						</name>
					</person-group>
					<article-title>Real-time tracking of non-rigid objects using mean shift</article-title>
					<source>IEEE Proc. Comp. Vis. and Pattern Recognit</source>
					<volume>2</volume>
					<fpage>142</fpage>
					<lpage>149</lpage>
					<year>2000</year>
					<pub-id pub-id-type="doi">10.1109/CVPR.2000.854761</pub-id>
				</element-citation>
			</ref>
			<ref id="B9">
				<label>[9]</label>
				<mixed-citation>[9]  Kalal, Z., Mikolajczyk, K. and Matas, J., Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), pp. 1409-1422, 2012. DOI: 10.1109/TPAMI.2011.239</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Kalal</surname>
							<given-names>Z.</given-names>
						</name>
						<name>
							<surname>Mikolajczyk</surname>
							<given-names>K.</given-names>
						</name>
						<name>
							<surname>Matas</surname>
							<given-names>J.</given-names>
						</name>
					</person-group>
					<article-title>Tracking-learning-detection</article-title>
					<source>IEEE Trans. Pattern Anal. Mach. Intell</source>
					<volume>34</volume>
					<issue>7</issue>
					<fpage>1409</fpage>
					<lpage>1422</lpage>
					<year>2012</year>
					<pub-id pub-id-type="doi">10.1109/TPAMI.2011.239</pub-id>
				</element-citation>
			</ref>
			<ref id="B10">
				<label>[10]</label>
				<mixed-citation>[10]  Dalal, N. and Triggs, B., Histograms of oriented gradients for human detection. In: IEEE Proc. Comp. Vis. and Pattern Recognit.1, pp. 886-893, 2005. DOI: 10.1109/CVPR.2005.177</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Dalal</surname>
							<given-names>N.</given-names>
						</name>
						<name>
							<surname>Triggs</surname>
							<given-names>B.</given-names>
						</name>
					</person-group>
					<article-title>Histograms of oriented gradients for human detection</article-title>
					<source>IEEE Proc. Comp. Vis. and Pattern Recognit</source>
					<volume>1</volume>
					<fpage>886</fpage>
					<lpage>893</lpage>
					<year>2005</year>
					<pub-id pub-id-type="doi">10.1109/CVPR.2005.177</pub-id>
				</element-citation>
			</ref>
			<ref id="B11">
				<label>[11]</label>
				<mixed-citation>[11]  OpenCV library v. 2.4.9.0, [online]. [Accessed April 2016]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://opencv.org/">http://opencv.org/</ext-link>
					</comment>
				</mixed-citation>
				<element-citation publication-type="webpage">
					<source>OpenCV library v. 2.4.9.0</source>
					<comment>online</comment>
					<date-in-citation content-type="access-date" iso-8601-date="2016-04">Accessed April 2016</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://opencv.org/">http://opencv.org/</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B12">
				<label>[12]</label>
				<mixed-citation>[12]  Zhang, J., Presti, L. and Sclaroff, S., Online multi-person tracking by tracker hierarchy. In: Proc. Int. Conf. on Adv. Video and Signal-Based Surveillance. pp. 379-385, September, 2012. DOI: 10.1109/AVSS.2012.51</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Zhang</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Presti</surname>
							<given-names>L.</given-names>
						</name>
						<name>
							<surname>Sclaroff</surname>
							<given-names>S.</given-names>
						</name>
					</person-group>
					<article-title>Online multi-person tracking by tracker hierarchy</article-title>
					<source>Proc. Int. Conf. on Adv. Video and Signal-Based Surveillance</source>
					<fpage>379</fpage>
					<lpage>385</lpage>
					<month>09</month>
					<year>2012</year>
					<pub-id pub-id-type="doi">10.1109/AVSS.2012.51</pub-id>
				</element-citation>
			</ref>
			<ref id="B13">
				<label>[13]</label>
				<mixed-citation>[13]  Breitenstein, M., Reichlin, F., Leibe, B., Koller-Meier, E. and Van Gool, L., Online multiperson tracking-by-detection from a single, uncalibrated camera. IEEE Trans. Pattern Anal. Mach. Intell. 33(9), pp. 1820-1833, 2011. DOI: 10.1109/TPAMI.2010.232</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Breitenstein</surname>
							<given-names>M.</given-names>
						</name>
						<name>
							<surname>Reichlin</surname>
							<given-names>F.</given-names>
						</name>
						<name>
							<surname>Leibe</surname>
							<given-names>B.</given-names>
						</name>
						<name>
							<surname>Koller-Meier</surname>
							<given-names>E.</given-names>
						</name>
						<name>
							<surname>Van Gool</surname>
							<given-names>L.</given-names>
						</name>
					</person-group>
					<article-title>Online multiperson tracking-by-detection from a single, uncalibrated camera</article-title>
					<source>IEEE Trans. Pattern Anal. Mach. Intell</source>
					<volume>33</volume>
					<issue>9</issue>
					<fpage>1820</fpage>
					<lpage>1833</lpage>
					<year>2011</year>
					<pub-id pub-id-type="doi">10.1109/TPAMI.2010.232</pub-id>
				</element-citation>
			</ref>
			<ref id="B14">
				<label>[14]</label>
				<mixed-citation>[14]  Leibe, B., Schindler, K., Cornelis, N. and Van Gool, L., Coupled object detection and tracking from static cameras and moving vehicles. IEEE Trans. Pattern Anal. Mach. Intell., 30(10), pp. 1683-1698, 2008.</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Leibe</surname>
							<given-names>B.</given-names>
						</name>
						<name>
							<surname>Schindler</surname>
							<given-names>K.</given-names>
						</name>
						<name>
							<surname>Cornelis</surname>
							<given-names>N.</given-names>
						</name>
						<name>
							<surname>Van Gool</surname>
							<given-names>L.</given-names>
						</name>
					</person-group>
					<article-title>Coupled object detection and tracking from static cameras and moving vehicles</article-title>
					<source>IEEE Trans. Pattern Anal. Mach. Intell</source>
					<volume>30</volume>
					<issue>10</issue>
					<fpage>1683</fpage>
					<lpage>1698</lpage>
					<year>2008</year>
				</element-citation>
			</ref>
			<ref id="B15">
				<label>[15]</label>
				<mixed-citation>[15]  Ben-Shitrit, H., Berclaz, J., Fleuret, F. and Fua, P., Tracking multiple people under global appearance constraints. IEEE Int. Conf. on Comp. Vision. pp. 137-144, 2011. DOI: 10.1109/ICCV.2011.6126235</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Ben-Shitrit</surname>
							<given-names>H.</given-names>
						</name>
						<name>
							<surname>Berclaz</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Fleuret</surname>
							<given-names>F.</given-names>
						</name>
						<name>
							<surname>Fua</surname>
							<given-names>P.</given-names>
						</name>
					</person-group>
					<article-title>Tracking multiple people under global appearance constraints</article-title>
					<source>IEEE Int. Conf. on Comp. Vision</source>
					<fpage>137</fpage>
					<lpage>144</lpage>
					<year>2011</year>
					<pub-id pub-id-type="doi">10.1109/ICCV.2011.6126235</pub-id>
				</element-citation>
			</ref>
			<ref id="B16">
				<label>[16]</label>
				<mixed-citation>[16]  Milan, A., Roth, S. and Schindler, K., Continuous energy minimization for multitarget tracking. IEEE Trans. Pattern Anal. Mach. Intell., 36(1), pp. 58-72, 2014. DOI: 10.1109/TPAMI.2013.103</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Milan</surname>
							<given-names>A.</given-names>
						</name>
						<name>
							<surname>Roth</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Schindler</surname>
							<given-names>K.</given-names>
						</name>
					</person-group>
					<article-title>Continuous energy minimization for multitarget tracking</article-title>
					<source>IEEE Trans. Pattern Anal. Mach. Intell</source>
					<volume>36</volume>
					<issue>1</issue>
					<fpage>58</fpage>
					<lpage>72</lpage>
					<year>2014</year>
					<pub-id pub-id-type="doi">10.1109/TPAMI.2013.103</pub-id>
				</element-citation>
			</ref>
			<ref id="B17">
				<label>[17]</label>
				<mixed-citation>[17]  Negri, P., Estimating the queue length at street intersections by using a movement feature space approach. IET Image Processing 8, pp. 406-416, 2014. DOI: 10.1049/iet-ipr.2013.0496</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Negri</surname>
							<given-names>P.</given-names>
						</name>
					</person-group>
					<article-title>Estimating the queue length at street intersections by using a movement feature space approach</article-title>
					<source>IET Image Processing</source>
					<volume>8</volume>
					<fpage>406</fpage>
					<lpage>416</lpage>
					<year>2014</year>
					<pub-id pub-id-type="doi">10.1049/iet-ipr.2013.0496</pub-id>
				</element-citation>
			</ref>
			<ref id="B18">
				<label>[18]</label>
				<mixed-citation>[18]  Negri, P., Goussies, N. and Lotito, P., Detecting pedestrians on a movement feature space. Pattern Recognition 47(1), pp. 56-71, 2014. DOI: 10.1016/j.patcog.2013.05.020</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Negri</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Goussies</surname>
							<given-names>N.</given-names>
						</name>
						<name>
							<surname>Lotito</surname>
							<given-names>P.</given-names>
						</name>
					</person-group>
					<article-title>Detecting pedestrians on a movement feature space</article-title>
					<source>Pattern Recognition</source>
					<volume>47</volume>
					<issue>1</issue>
					<fpage>56</fpage>
					<lpage>71</lpage>
					<year>2014</year>
					<pub-id pub-id-type="doi">10.1016/j.patcog.2013.05.020</pub-id>
				</element-citation>
			</ref>
			<ref id="B19">
				<label>[19]</label>
				<mixed-citation>[19]  Everingham, M., Van Gool, L., Williams, C.K., Winn, J. and Zisserman, A., The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comp. Vis. 88(2), pp. 303-338, 2010.</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Everingham</surname>
							<given-names>M.</given-names>
						</name>
						<name>
							<surname>Van Gool</surname>
							<given-names>L.</given-names>
						</name>
						<name>
							<surname>Williams</surname>
							<given-names>C.K.</given-names>
						</name>
						<name>
							<surname>Winn</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Zisserman</surname>
							<given-names>A.</given-names>
						</name>
					</person-group>
					<article-title>The PASCAL Visual Object Classes (VOC) Challenge</article-title>
					<source>Int. J. Comp. Vis</source>
					<volume>88</volume>
					<issue>2</issue>
					<fpage>303</fpage>
					<lpage>338</lpage>
					<year>2010</year>
				</element-citation>
			</ref>
			<ref id="B20">
				<label>[20]</label>
				<mixed-citation>[20]  Deriche, R. and Faugeras, O.D., Tracking line segments. In: European Conf. on Comp. Vis. pp. 259-268, 1990. DOI: 10.1016/0262-8856(90)80002-B </mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Deriche</surname>
							<given-names>R.</given-names>
						</name>
						<name>
							<surname>Faugeras</surname>
							<given-names>O.D.</given-names>
						</name>
					</person-group>
					<article-title>Tracking line segments</article-title>
					<source>European Conf. on Comp. Vis</source>
					<fpage>259</fpage>
					<lpage>268</lpage>
					<year>1990</year>
					<pub-id pub-id-type="doi">10.1016/0262-8856(90)80002-B</pub-id>
				</element-citation>
			</ref>
			<ref id="B21">
				<label>[21]</label>
				<mixed-citation>[21]  Negri, P. and Garayalde, D., Concatenating multiple trajectories using Kalman filter for pedestrian tracking. In: IEEE Biennial Cong. of Argentina. pp. 364-369, 2014. DOI: 10.1109/ARGENCON.2014.6868520</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Negri</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Garayalde</surname>
							<given-names>D.</given-names>
						</name>
					</person-group>
					<source>Concatenating multiple trajectories using Kalman filter for pedestrian tracking</source>
					<conf-name>IEEE Biennial Cong. of Argentina</conf-name>
					<conf-date>2014</conf-date>
					<pub-id pub-id-type="doi">10.1109/ARGENCON.2014.6868520</pub-id>
				</element-citation>
			</ref>
			<ref id="B22">
				<label>[22]</label>
				<mixed-citation>[22]  PETS2009. [online]. [Accessed June 2016]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.cvg.reading.ac.uk/PETS2009/a.html">http://www.cvg.reading.ac.uk/PETS2009/a.html</ext-link>
					</comment>
				</mixed-citation>
				<element-citation publication-type="webpage">
					<source>PETS2009</source>
					<comment>online</comment>
					<date-in-citation content-type="access-date" iso-8601-date="2016-06">Accessed June 2016</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.cvg.reading.ac.uk/PETS2009/a.html">http://www.cvg.reading.ac.uk/PETS2009/a.html</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B23">
				<label>[23]</label>
				<mixed-citation>[23]  Benfold, B. and Reid, I., Stable multi-target tracking in real-time surveillance video. In: IEEE Proc. Comp. Vis. Pattern Recognit., pp. 3457-3464, 2011. DOI: 10.1109/CVPR.2011.5995667 </mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Benfold</surname>
							<given-names>B.</given-names>
						</name>
						<name>
							<surname>Reid</surname>
							<given-names>I.</given-names>
						</name>
					</person-group>
					<source>Stable multi-target tracking in real-time surveillance video</source>
					<conf-name>IEEE Proc. Comp. Vis. Pattern Recognit</conf-name>
					<conf-date>2011</conf-date>
					<pub-id pub-id-type="doi">10.1109/CVPR.2011.5995667</pub-id>
				</element-citation>
			</ref>
			<ref id="B24">
				<label>[24]</label>
				<mixed-citation>[24]  Town Centre video and data, [online]. [Accessed June 2016]. <comment>Available at <ext-link ext-link-type="uri" xlink:href="http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html">http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html</ext-link>
					</comment>
				</mixed-citation>
				<element-citation publication-type="webpage">
					<source>Town Centre video and data</source>
					<comment>online</comment>
					<date-in-citation content-type="access-date" iso-8601-date="2016-06">Accessed June 2016</date-in-citation>
					<comment>Available at <ext-link ext-link-type="uri" xlink:href="http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html">http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B25">
				<label>[25]</label>
				<mixed-citation>[25]  Negri, P., Clady, X. and Prevost, L., Benchmarking Haar and histograms of oriented gradients features applied to vehicle detection. In: Int. Conf. on Informat. in Control, Automat. and Robotics. pp. 359-364, 2007.</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Negri</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Clady</surname>
							<given-names>X.</given-names>
						</name>
						<name>
							<surname>Prevost</surname>
							<given-names>L.</given-names>
						</name>
					</person-group>
					<source>Benchmarking Haar and histograms of oriented gradients features applied to vehicle detection</source>
					<conf-name>Int. Conf. on Informat. in Control, Automat. and Robotics</conf-name>
					<conf-date>2007</conf-date>
				</element-citation>
			</ref>
			<ref id="B26">
				<label>[26]</label>
				<mixed-citation>[26]  Negri, P., Clady, X., Hanif, S. and Prevost, L., A cascade of boosted generative and discriminative classifiers for vehicle detection. EURASIP JASP 2008, pp. 1-12, 2008. DOI: 10.1155/2008/782432.</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Negri</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Clady</surname>
							<given-names>X.</given-names>
						</name>
						<name>
							<surname>Hanif</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Prevost</surname>
							<given-names>L.</given-names>
						</name>
					</person-group>
					<article-title>A cascade of boosted generative and discriminative classifiers for vehicle detection</article-title>
					<source>EURASIP JASP</source>
					<volume>2008</volume>
					<fpage>1</fpage>
					<lpage>12</lpage>
					<year>2008</year>
					<pub-id pub-id-type="doi">10.1155/2008/782432</pub-id>
				</element-citation>
			</ref>
			<ref id="B27">
				<label>[27]</label>
				<mixed-citation>[27]  Pedestrian Patches-PANKit. [online]. [Accessed June 2016]. <comment>Available at <ext-link ext-link-type="uri" xlink:href="http://pablonegri.free.fr/Downloads/PedestrianPatchesDataset-PANKit.htm">http://pablonegri.free.fr/Downloads/PedestrianPatchesDataset-PANKit.htm</ext-link>
					</comment>
				</mixed-citation>
				<element-citation publication-type="webpage">
					<source>Pedestrian Patches-PANKit</source>
					<comment>online</comment>
					<date-in-citation content-type="access-date" iso-8601-date="2016-06">Accessed June 2016</date-in-citation>
					<comment>Available at <ext-link ext-link-type="uri" xlink:href="http://pablonegri.free.fr/Downloads/PedestrianPatchesDataset-PANKit.htm">http://pablonegri.free.fr/Downloads/PedestrianPatchesDataset-PANKit.htm</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B28">
				<label>[28]</label>
				<mixed-citation>[28]  PASCAL VOC2012 dataset. [online]. [Accessed June 2016]. <comment>Available at (registration required) <ext-link ext-link-type="uri" xlink:href="http://host.robots.ox.ac.uk:8080/">http://host.robots.ox.ac.uk:8080/</ext-link>
					</comment>. </mixed-citation>
				<element-citation publication-type="webpage">
					<source>PASCAL VOC2012 dataset</source>
					<comment>online</comment>
					<date-in-citation content-type="access-date" iso-8601-date="2016-06">Accessed June 2016</date-in-citation>
					<comment>registration required</comment>
					<comment>Available at (registration required) <ext-link ext-link-type="uri" xlink:href="http://host.robots.ox.ac.uk:8080/">http://host.robots.ox.ac.uk:8080/</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B29">
				<label>[29]</label>
				<mixed-citation>[29]  Online multi-person tracking by tracker hierarchy code. [online]. [Accessed June 2016]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://cs-people.bu.edu/jmzhang/tracker_hierarchy/Tracker_Hierarchy.htm">http://cs-people.bu.edu/jmzhang/tracker_hierarchy/Tracker_Hierarchy.htm</ext-link>
					</comment>
				</mixed-citation>
				<element-citation publication-type="webpage">
					<source>Online multi-person tracking by tracker hierarchy code</source>
					<comment>online</comment>
					<date-in-citation content-type="access-date" iso-8601-date="2016-06">Accessed June 2016</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://cs-people.bu.edu/jmzhang/tracker_hierarchy/Tracker_Hierarchy.htm">http://cs-people.bu.edu/jmzhang/tracker_hierarchy/Tracker_Hierarchy.htm</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B30">
				<label>[30]</label>
				<mixed-citation>[30]  Continuous energy minimization for multi-target tracking MATLAB code. [online]. [Accessed June 2016]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.milanton.de/contracking/">http://www.milanton.de/contracking/</ext-link>
					</comment>
				</mixed-citation>
				<element-citation publication-type="webpage">
					<source>Continuous energy minimization for multi-target tracking MATLAB code</source>
					<comment>online</comment>
					<date-in-citation content-type="access-date" iso-8601-date="2016-06">Accessed June 2016</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.milanton.de/contracking/">http://www.milanton.de/contracking/</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B31">
				<label>[31]</label>
				<mixed-citation>[31]  Negri, P., and Lotito, P., Pedestrian detection using a feature space based on colored level lines. In: Alvarez, L., Mejail, M., Gomez, L. and Jacobo, J. (Eds). Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2012. Lecture Notes in Computer Science, 7441, pp 885-892, Springer, Berlin, Heidelberg, 2012. DOI: 10.1007/978-3-642-33275-3_109.</mixed-citation>
				<element-citation publication-type="book">
					<person-group person-group-type="author">
						<name>
							<surname>Negri</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Lotito</surname>
							<given-names>P.</given-names>
						</name>
					</person-group>
					<chapter-title>Pedestrian detection using a feature space based on colored level lines</chapter-title>
					<person-group person-group-type="editor">
						<name>
							<surname>Alvarez</surname>
							<given-names>L.</given-names>
						</name>
						<name>
							<surname>Mejail</surname>
							<given-names>M.</given-names>
						</name>
						<name>
							<surname>Gomez</surname>
							<given-names>L.</given-names>
						</name>
						<name>
							<surname>Jacobo</surname>
							<given-names>J.</given-names>
						</name>
					</person-group>
					<source>Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2012. Lecture Notes in Computer Science</source>
					<volume>7441</volume>
					<fpage>885</fpage>
					<lpage>892</lpage>
					<publisher-loc>Berlin, Heidelberg</publisher-loc>
					<publisher-name>Springer</publisher-name>
					<year>2012</year>
					<pub-id pub-id-type="doi">10.1007/978-3-642-33275-3_109</pub-id>
				</element-citation>
			</ref>
		</ref-list>
		<fn-group>
			<fn fn-type="other" id="fn1">
				<label>1</label>
				<p><bold>How to cite:</bold> Negri, P. &amp; Garayalde, D., Pedestrian tracking using probability fields and a movement feature space, DYNA 84 (200) 217-227, 2017.</p>
			</fn>
		</fn-group>
	</back>
</article>