<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "https://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.1" specific-use="sps-1.8" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">dyna</journal-id>
			<journal-title-group>
				<journal-title>DYNA</journal-title>
				<abbrev-journal-title abbrev-type="publisher">Dyna rev.fac.nac.minas</abbrev-journal-title>
			</journal-title-group>
			<issn pub-type="ppub">0012-7353</issn>
			<publisher>
				<publisher-name>Universidad Nacional de Colombia</publisher-name>
			</publisher>
		</journal-meta>
		<article-meta>
			<article-id pub-id-type="doi">10.15446/dyna.v86n211.81639</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Artículos</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>EspiNet V2: a region based deep learning model for detecting motorcycles in urban scenarios</article-title>
				<trans-title-group xml:lang="es">
					<trans-title>EspiNet V2: un modelo basado en regiones de aprendizaje profundo para detectar motocicletas en escenarios urbanos</trans-title>
				</trans-title-group>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<name>
						<surname>Espinosa-Oviedo</surname>
						<given-names>Jorge Ernesto</given-names>
					</name>
					<xref ref-type="aff" rid="aff1"><sup>
 <italic>a</italic>
</sup></xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Velastín</surname>
						<given-names>Sergio A.</given-names>
					</name>
					<xref ref-type="aff" rid="aff2"><sup>
 <italic>b</italic>
</sup></xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Branch-Bedoya</surname>
						<given-names>John William</given-names>
					</name>
					<xref ref-type="aff" rid="aff3"><sup>
 <italic>c</italic>
</sup></xref>
				</contrib>
			</contrib-group>
			<aff id="aff1">
				<label>a</label>
				<institution content-type="original"> Facultad de Ingeniería, Politécnico Colombiano Jaime Isaza Cadavid, Medellín, Colombia. jeespinosa@elpoli.edu.co</institution>
				<institution content-type="normalized">Politécnico Colombiano Jaime Isaza Cadavid</institution>
				<institution content-type="orgdiv1">Facultad de Ingeniería</institution>
				<institution content-type="orgname">Politécnico Colombiano Jaime Isaza Cadavid</institution>
				<addr-line>
					<city>Medellín</city>
				</addr-line>
				<country country="CO">Colombia</country>
				<email>jeespinosa@elpoli.edu.co</email>
			</aff>
			<aff id="aff2">
				<label>b</label>
				<institution content-type="original"> Cortexica Vision Systems Ltd. UK, Universidad Carlos III de Madrid, Spain and Queen Mary University of London, UK. sergio.velastin@ieee.org</institution>
				<institution content-type="normalized">Universidad Carlos III de Madrid</institution>
				<institution content-type="orgname">Universidad Carlos III de Madrid</institution>
				<institution content-type="orgname">Mary University of London</institution>
				<country country="ES">Spain</country>
				<email>sergio.velastin@ieee.org</email>
			</aff>
			<aff id="aff3">
				<label>c</label>
				<institution content-type="original"> Facultad de Minas, Universidad Nacional de Colombia, Medellín, Colombia. jwbranch@unal.edu.co</institution>
				<institution content-type="normalized">Universidad Nacional de Colombia</institution>
				<institution content-type="orgdiv1">Facultad de Minas</institution>
				<institution content-type="orgname">Universidad Nacional de Colombia</institution>
				<addr-line>
					<city>Medellín</city>
				</addr-line>
				<country country="CO">Colombia</country>
				<email>jwbranch@unal.edu.co</email>
			</aff>
			<pub-date pub-type="collection">
				<season>Oct-Dec</season>
				<year>2019</year>
			</pub-date>
			<volume>86</volume>
			<issue>211</issue>
			<fpage>317</fpage>
			<lpage>326</lpage>
			<history>
				<date date-type="received">
					<day>12</day>
					<month>08</month>
					<year>2019</year>
				</date>
				<date date-type="rev-recd">
					<day>08</day>
					<month>11</month>
					<year>2019</year>
				</date>
				<date date-type="accepted">
					<day>28</day>
					<month>11</month>
					<year>2019</year>
				</date>
			</history>
			<permissions>
				<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0" xml:lang="en">
					<license-p>The author; licensee Universidad Nacional de Colombia.</license-p>
				</license>
			</permissions>
			<abstract>
				<title>Abstract</title>
				<p>This paper presents “EspiNet V2”, a deep learning model based on the region-based detector Faster R-CNN. The model is used for the detection of motorcycles in urban environments, where occlusion is likely. For training, two datasets are used: the Urban Motorbike Dataset (UMD-10K) of 10,000 annotated images, and the new SMMD (Secretaría de Movilidad Motorbike Dataset) of 5,000 images captured from the Traffic Control CCTV System in Medellín (Colombia). Results on the UMD-10K dataset reach 88.8% average precision (AP), even though 60% of the motorcycles are occluded and the images were captured from a low angle by a moving camera. Meanwhile, an AP of 79.5% is reached on SMMD. EspiNet V2 outperforms popular models such as YOLO V3 and Faster R-CNN (VGG16 based) trained end-to-end on these datasets.</p>
			</abstract>
			<trans-abstract xml:lang="es">
				<title>Resumen</title>
				<p>Este artículo presenta &quot;EspiNet V2&quot;, un modelo de aprendizaje profundo, fundamentado en el detector basado regiones Faster R-CNN. El modelo es usado para la detección de motocicletas en entornos urbanos, donde se presenta algún nivel de oclusión. Para el entrenamiento de dicho modelo, se utilizaron dos conjuntos de datos: el conjunto de datos de motocicletas urbanas (UMD-10K) que cuenta con 10,000 imágenes anotadas, y el nuevo conjunto de datos de motos de la Secretaría de Movilidad (SMMD), con 5,000 imágenes capturadas obtenidas del Sistema CCTV de Control de Tráfico de la ciudad de Medellín (Colombia). Los resultados obtenidos en el conjunto de datos UMD-10K alcanzan el 88.8% en precisión promedio (AP), incluso con niveles de oclusión de un 60 %, utilizando imágenes capturadas desde un ángulo bajo y desde una cámara en movimiento. Por otro lado se alcanza un AP de 79.5 % para conjunto de datos de motos de la Secretaría de Movilidad (SMMD). EspiNet V2 supera modelos populares como YOLO V3 y Faster R-CNN (basado en VGG16), siendo estos entrenados de extremo a extremo utilizando los conjuntos de datos mencionados.</p>
			</trans-abstract>
			<kwd-group xml:lang="en">
				<title><bold><italic>Keywords</italic>:</bold></title>
				<kwd>vehicle detection</kwd>
				<kwd>motorcycle detection</kwd>
				<kwd>Faster R-CNN</kwd>
				<kwd>region-based detectors</kwd>
				<kwd>convolutional neural network</kwd>
				<kwd>deep learning</kwd>
			</kwd-group>
			<kwd-group xml:lang="es">
				<title><bold><italic>Palabras clave</italic>:</bold></title>
				<kwd>detección de vehículos</kwd>
				<kwd>detección de motocicletas</kwd>
				<kwd>Faster R-CNN</kwd>
				<kwd>detectores basados en regiones</kwd>
				<kwd>redes neuronales convolucionales</kwd>
				<kwd>aprendizaje profundo</kwd>
			</kwd-group>
			<counts>
				<fig-count count="7"/>
				<table-count count="3"/>
				<equation-count count="5"/>
				<ref-count count="57"/>
				<page-count count="10"/>
			</counts>
		</article-meta>
	</front>
	<body>
		<sec sec-type="intro">
			<title>1. Introduction</title>
			<p>The World Health Organization (WHO) reports in the <italic>Global status report on road safety 2018</italic> that more than half (54%) of road traffic deaths correspond to Vulnerable Road Users (pedestrians, cyclists, motorcyclists) [<xref ref-type="bibr" rid="B1">1</xref>], and 28% of these deaths correspond to motorcyclists. The annual report on traffic accidents of the Andean Community (Bolivia, Colombia, Ecuador and Perú) [<xref ref-type="bibr" rid="B2">2</xref>] documented 347,642 traffic accidents, 88% of which occurred in urban areas. Within this region, Colombia accounted for 57.35% of all traffic accidents in 2017, reporting 171,571 occurrences with 6,479 fatal victims. Although deaths due to transport accidents fell by 7.23% in 2017 compared to 2016, these numbers are still high compared to world statistics [<xref ref-type="bibr" rid="B3">3</xref>]. Motorcyclists are the road users most affected by traffic accidents in Colombia, accounting for 49.82% of deaths and 56.36% of injured victims [<xref ref-type="bibr" rid="B3">3</xref>]. This high accident rate can be partially explained by the fact that 58% of the 14,880,823 vehicles registered in Colombia as of the second quarter of 2019 are motorcycles [<xref ref-type="bibr" rid="B4">4</xref>], and 76.64% of these motorcycles belong to the street/sport segment, which is used as a regular means of transport. </p>
			<p>Air quality is also an issue in the main cities of Colombia. The National Planning Department (DNP) estimated that, during 2015, the effects of air pollution were associated with 10,527 deaths and 67.8 million symptoms and diseases [<xref ref-type="bibr" rid="B5">5</xref>]. The contaminant with the greatest potential for harm is particulate matter smaller than 2.5 microns (PM2.5), made up of very small particles produced mainly by heavy vehicles that use diesel as fuel; these particles can carry materials highly dangerous to the human body, such as heavy metals, organic compounds and viruses, affecting the respiratory tract [<xref ref-type="bibr" rid="B6">6</xref>]. In Colombia, 59% of PM2.5 is produced by land transportation, of which 40% corresponds to motorcycles. It is therefore desirable to monitor urban motorcycle traffic to reduce incidents and air pollution on what are becoming very congested roads.</p>
			<p>Video analytic techniques for vehicle detection have been used in urban traffic analysis, reporting success in detecting regular vehicles (buses, cars, trucks), but the literature is scarce on the analysis of motorcycles, despite their being major road users in many urban environments characterised by frequent occlusion between vehicles in congested traffic conditions. </p>
			<p>In this paper we introduce EspiNet V2, a deep learning model based on the two-stage detector Faster R-CNN [<xref ref-type="bibr" rid="B7">7</xref>] (Faster Regions with Convolutional Neural Network features). The model is used to detect motorcycles in congested urban traffic scenes. The paper is structured as follows: section 2 reviews the literature on motorcycle detection; section 3 gives a brief introduction to deep CNNs and Faster R-CNN; section 4 explains the proposed EspiNet V2 model, detailing its architecture and main differences with respect to Faster R-CNN; and section 5 describes the experiments performed on the UMD-10K and SMMD datasets, providing an analysis of the results. The article closes with section 6, which presents conclusions and proposed future work.</p>
		</sec>
		<sec>
			<title>2. Motorcycle detection</title>
			<p>Video analytics supports most current urban traffic analysis and vehicle detection systems. Traditional approaches to vehicle detection extract discriminative features for vehicle representation and then perform classification, usually with classifiers trained on those features. Features are generally extracted from object appearance or derived from motion information [<xref ref-type="bibr" rid="B8">8</xref>]. </p>
			<p>Motorcycle detection based on appearance features such as edge maps is introduced in [<xref ref-type="bibr" rid="B9">9</xref>], using Gabor filters and the Sobel operator [<xref ref-type="bibr" rid="B10">10</xref>] to reduce illumination variance. Other approaches use corner detection with Harris corners [<xref ref-type="bibr" rid="B11">11</xref>], or Haar-like features [<xref ref-type="bibr" rid="B12">12</xref>,<xref ref-type="bibr" rid="B13">13</xref>], despite their poor correlation under different view angles. Feature descriptors such as Histograms of Oriented Gradients (HOG), the Scale-Invariant Feature Transform (SIFT) and Local Binary Patterns (LBP) are compared in [<xref ref-type="bibr" rid="B14">14</xref>] and [<xref ref-type="bibr" rid="B15">15</xref>] for motorcycle detection. For helmet detection on motorcycle riders, Speeded-Up Robust Features (SURF), Haar-like features (HAAR) and HOG [<xref ref-type="bibr" rid="B16">16</xref>] have been used as feature descriptors, while [<xref ref-type="bibr" rid="B17">17</xref>] uses a hybrid colour-based descriptor for helmet identification. Appearance features based on computer-generated 3D models are used to discriminate between motorcycles and bicycles in [<xref ref-type="bibr" rid="B18">18</xref>], and between car/taxi, bus/lorry, motorbike/bicycle, van and pedestrian in [<xref ref-type="bibr" rid="B19">19</xref>]. Background subtraction uses spatio-temporal information to detect moving objects in a given scene; motorcycle detection in [<xref ref-type="bibr" rid="B20">20</xref>,<xref ref-type="bibr" rid="B21">21</xref>] starts with this technique and uses segmentation to detect and separate individual motorcycles. Similar approaches are used in several works, even to detect motorcycle riders without a helmet [<xref ref-type="bibr" rid="B24">24</xref>-<xref ref-type="bibr" rid="B29">29</xref>]. </p>
			<p>The most widely used algorithm for background subtraction is Gaussian Mixture Models (GMM) [<xref ref-type="bibr" rid="B22">22</xref>], as in [<xref ref-type="bibr" rid="B23">23</xref>,<xref ref-type="bibr" rid="B24">24</xref>]. To deal with object shadows and to continuously update parameters, Self-Adaptive GMM [<xref ref-type="bibr" rid="B25">25</xref>] is used in [<xref ref-type="bibr" rid="B26">26</xref>], while adaptive background modelling is used in [<xref ref-type="bibr" rid="B14">14</xref>] and [<xref ref-type="bibr" rid="B15">15</xref>]. Nevertheless, background subtraction may fail in congested scenarios where objects overlap each other, making their detection difficult; under camera movement; or when objects become part of the background after a prolonged static sequence, as is typical in traffic jams.</p>
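			<p>As an illustration, GMM-style background subtraction is available off the shelf; the following minimal Python/OpenCV sketch (the input file name is hypothetical) shows the technique and hints at its failure mode, since vehicles that stay static for long are gradually absorbed into the background model:</p>
			<p>
				<preformat>import cv2

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input clip
# GMM-based subtractor [22]; shadow pixels are marked in grey in the mask.
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # per-pixel foreground mask
    # Blobs in 'mask' would next be segmented into candidate motorcycles.
cap.release()</preformat>
			</p>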
			<p>Motorcycle detection in [<xref ref-type="bibr" rid="B24">24</xref>] uses spatial features in conjunction with motion features obtained from optical flow; this type of feature is also useful for obstacle detection in a Lane Change Assistant (LCA) system [<xref ref-type="bibr" rid="B10">10</xref>].</p>
			<p>The classifiers most frequently used for motorcycle classification are Support Vector Machines (SVMs), used for classifying and counting motorcycles in [<xref ref-type="bibr" rid="B9">9</xref>], where object occlusion is avoided by capturing images from a top-view point. For helmet detection, different types of kernels are compared in [<xref ref-type="bibr" rid="B14">14</xref>] and [<xref ref-type="bibr" rid="B15">15</xref>], using background subtraction for object detection. Head regions described by histograms are also used for helmet detection in [<xref ref-type="bibr" rid="B27">27</xref>] and later classified by a linear SVM, although this method may fail under drastic changes of illumination. SVMs are also used for classifying vehicles with a multi-shape descriptor [<xref ref-type="bibr" rid="B25">25</xref>,<xref ref-type="bibr" rid="B26">26</xref>], demanding high computational resources for descriptor construction and evaluation. A Real-Time on-Road Vehicle Detection system [<xref ref-type="bibr" rid="B10">10</xref>] uses hierarchical binary SVM classification, boosting its performance thanks to an Integrated Memory Array Processor (IMAP) architecture; nonetheless, the model can fail in adverse weather conditions with low illumination. SVMs for motorcycle detection are also used in conjunction with Bag of Visual Words (BoVW) [<xref ref-type="bibr" rid="B28">28</xref>] with a Radial Basis Function (RBF) kernel, using HOG as a feature descriptor [<xref ref-type="bibr" rid="B29">29</xref>], or even with 3D models as appearance features [<xref ref-type="bibr" rid="B18">18</xref>,<xref ref-type="bibr" rid="B19">19</xref>]. </p>
			<p>Other classifiers used include decision trees for overhead real-time motorbike counting [<xref ref-type="bibr" rid="B30">30</xref>], where the method relies on the camera specification for constructing the decision-tree rules. Neural networks (NNs) such as the Multilayer Perceptron (MLP) have been proposed for motorcycle detection and classification, even though their architectures require tuning many parameters and the loss function may converge only to a local optimum; NNs are nevertheless used for helmet detection in [<xref ref-type="bibr" rid="B16">16</xref>,<xref ref-type="bibr" rid="B31">31</xref>]. A Fuzzy Neural Network (FNN) is used in [<xref ref-type="bibr" rid="B24">24</xref>], but without a significant number of motorcycles to detect in its dataset. Finally, K-Nearest Neighbours (KNN) is also used for helmet detection [<xref ref-type="bibr" rid="B23">23</xref>]; however, this model relies on the accuracy of background subtraction for motorcycle individualisation, which may fail in occluded scenarios.</p>
			<sec>
				<title><italic>2.1. Deep learning for motorcycle detection</italic></title>
				<p>In recent years deep learning has erupted in the field of computer vision, showing impressive results, mainly due to the computing capacity that GPUs (Graphics Processing Units) provide for training models, as well as the creation of vast manually labelled datasets of generic objects.</p>
				<p>The work of Vishnu et al. [<xref ref-type="bibr" rid="B32">32</xref>] uses Convolutional Neural Networks (CNNs) as feature extractors in combination with background subtraction for object detection: once an object is detected, for instance using GMM, the features extracted with a CNN model (e.g., AlexNet) are used to perform classification [<xref ref-type="bibr" rid="B33">33</xref>]. Instead of background subtraction, object localisation can also use selective search, as in [<xref ref-type="bibr" rid="B34">34</xref>]. The work in [<xref ref-type="bibr" rid="B35">35</xref>] proposes a straightforward CNN for detecting and classifying motorcycles: the input image is passed through the feature extraction layers, generating a motorcycle score map, which is thresholded and followed by non-maximal suppression to obtain individual motorcycle detections. The most recent works are oriented towards detecting helmet violations by motorcycle users. For instance, in [<xref ref-type="bibr" rid="B36">36</xref>], motorcycles are detected using HOG+SVM and the rider's head area is then supplied to a CNN model for helmet-presence detection; the work in [<xref ref-type="bibr" rid="B32">32</xref>] proposes a similar approach. Meanwhile, in [<xref ref-type="bibr" rid="B37">37</xref>], moving objects are detected using motion detection algorithms, a pedestrian CNN model is used to detect humans, and a CNN is then used again to detect the presence and colour of a helmet.</p>
				<p>Unfortunately, the analysed literature lacks a unified metric for reporting results, and most of the methods use proprietary datasets which are seldom available for comparison and use by the research community.</p>
			</sec>
			<sec>
				<title>3. Deep CNN networks and Faster R-CNN</title>
				<p>Convolutional Neural Networks (CNNs) are a type of neural network whose architecture is based on convolutional filters able to capture spatial patterns while reducing the computational burden of learning parameters. This approach produces features invariant to scale, shift or rotation: the receptive fields give neurons access to primitive features such as oriented edges and corners in the initial convolutional layers, which are then aggregated into more complex features deeper in the model. Features derived from CNNs often outperform feature descriptors such as HOG, SIFT, SURF and LBP [<xref ref-type="bibr" rid="B38">38</xref>,<xref ref-type="bibr" rid="B39">39</xref>].</p>
				<p>While features obtained from CNNs are very useful for classification, the problem of object detection involves not only the classification of objects but also their localisation in the image. When spatio-temporal information is available (video sequences), approaches such as background subtraction, optical flow or motion detection algorithms help to identify moving objects, extracting features from the detected blobs, which are later classified. These approaches may fail due to camera movement, static objects, or even illumination changes. The lack of spatio-temporal information, as in single or static images (frames), forces the use of approaches that combine sliding-window search (which slides a window over the image, e.g. from left to right and from top to bottom, extracting patches later used for classification) with binary classifiers (object vs. background). Object proposal algorithms such as Branch &amp; Bound [<xref ref-type="bibr" rid="B40">40</xref>], selective search [<xref ref-type="bibr" rid="B41">41</xref>], Spatial Pyramid Pooling [<xref ref-type="bibr" rid="B42">42</xref>] and Edge Boxes [<xref ref-type="bibr" rid="B43">43</xref>] are designed to deal with the large number of windows needed to cover different aspect ratios and scales.</p>
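				<p>To make the sliding-window search concrete, the following minimal Python sketch (window size and stride are illustrative assumptions) enumerates the patches that such a search would feed to a binary object-vs-background classifier:</p>
				<p>
					<preformat>def sliding_windows(image_h, image_w, win=64, stride=32):
    """Yield (x, y, w, h) of fixed-size windows scanned over the image."""
    for y in range(0, image_h - win + 1, stride):
        for x in range(0, image_w - win + 1, stride):
            yield x, y, win, win  # one candidate patch for the classifier

# Even a modest 640x480 image produces hundreds of windows per scale and
# aspect ratio, which is why proposal algorithms such as selective search
# [41] or Edge Boxes [43] are used to prune the search space.
print(len(list(sliding_windows(480, 640))))</preformat>
				</p>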
				<p>Two-stage detectors such as R-CNN (Regions with CNN features) [<xref ref-type="bibr" rid="B44">44</xref>] use selective search to generate up to 2,000 regions, which are passed to a CNN to produce a feature vector that is later fed into an SVM to determine the occurrence of an object and the values necessary to adjust the bounding box to the detected object. Since the number of selective-search proposals is fixed and their generation is a time-consuming task, Fast R-CNN [<xref ref-type="bibr" rid="B45">45</xref>] feeds the input image to the CNN to generate a feature map, identifying the proposal regions, which are then warped and fed into a fully connected layer through a Region of Interest (RoI) pooling layer. This model reduces computational time by using only one convolution operation per image instead of the 2,000 of the R-CNN model. Nevertheless, region proposal generation is still the bottleneck at test time. </p>
				<p>Faster R-CNN [<xref ref-type="bibr" rid="B7">7</xref>] speeds up the detection process, eliminating the use of selective search and using a CNN model which simultaneously learns region proposals and performs object detection. As in Fast R-CNN, the input image is passed through the CNN model, generating a feature map; over this feature map the Region Proposal Network (RPN) deploys a sliding window to generate <italic>n</italic> bounding boxes with their associated scores per window. These <italic>n</italic> boxes are called anchor boxes and represent common sizes and aspect ratios that objects can have. A RoI pooling layer is used to reshape the predicted region proposals, classifying the image inside each proposed region and generating the offset values for the bounding boxes using regression (<xref ref-type="fig" rid="f1">Fig. 1</xref>).</p>
				<p>
					<fig id="f1">
						<label>Figure 1</label>
						<caption>
							<title>The components of Faster R-CNN.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gf1.png"/>
						<attrib>Source: Image modified from [<xref ref-type="bibr" rid="B46">46</xref>]</attrib>
					</fig>
				</p>
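				<p>As an illustration of the anchor mechanism, the sketch below generates the <italic>n</italic> anchor boxes the RPN scores at one sliding-window position (the scales and aspect ratios shown are assumptions, since they are implementation choices; 3x3 = 9 anchors per position are used in the original Faster R-CNN [<xref ref-type="bibr" rid="B7">7</xref>]):</p>
				<p>
					<preformat>import itertools

def anchors_at(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) anchor boxes centred on one window position."""
    boxes = []
    for s, r in itertools.product(scales, ratios):
        w = s * (r ** 0.5)  # width grows with sqrt(aspect ratio)
        h = s / (r ** 0.5)  # height shrinks accordingly, keeping area ~ s*s
        boxes.append((cx, cy, w, h))
    return boxes  # n = len(scales) * len(ratios) anchors per position

print(len(anchors_at(16, 16)))  # 9 anchors at this position</preformat>
				</p>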
			</sec>
			<sec>
				<title>4. EspiNet V2 </title>
				<p>EspiNet V2 (<xref ref-type="fig" rid="f2">Fig. 2</xref>) is the deep learning model proposed here, based on the region-based detector Faster R-CNN. The model is used to detect motorcycles in congested urban traffic scenes, where occlusion is frequent (<xref ref-type="fig" rid="f3">Fig. 3</xref>). General vehicle detection in urban conditions has been studied by many authors, and occluded situations have been analysed using the KITTI dataset [<xref ref-type="bibr" rid="B47">47</xref>], which unfortunately lacks a motorcycle category. EspiNet V2 is an improved version of the model presented in [<xref ref-type="bibr" rid="B48">48</xref>]. This new model increases the number of convolutional layers, aiming to capture more aggregate features that help identify motorcycles in the given images. </p>
				<p>
					<fig id="f2">
						<label>Figure 2</label>
						<caption>
							<title>EspiNet V2, the Proposed CNN Model. The same model implements RPN and classification.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gf2.png"/>
						<attrib>Source: The authors.</attrib>
					</fig>
				</p>
				<p>
					<fig id="f3">
						<label>Figure 3</label>
						<caption>
							<title>Example image of the Urban Motorbike Dataset. The smallest object size is 25 px. Occlusions are frequent between motorcycles and other vehicles.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gf3.png"/>
						<attrib>Source: The Authors.</attrib>
					</fig>
				</p>
				<p>EspiNet V2 is publicly available for download (<ext-link ext-link-type="uri" xlink:href="https://github.com/muratayoshio/EspiNet">https://github.com/muratayoshio/EspiNet</ext-link>). The model can detect motorcycles in congested urban scenarios and, as in Faster R-CNN, unifies two networks: a Region Proposal Network (RPN) and a Fast R-CNN [<xref ref-type="bibr" rid="B45">45</xref>] detector, sharing the convolutional layers between the two architectures. The main difference between EspiNet V2 and Faster R-CNN lies in the CNN implemented. The best results of Faster R-CNN are obtained with quite deep models such as VGG-16 [<xref ref-type="bibr" rid="B49">49</xref>], which has 16 weight layers (13 of them convolutional) and ~138 million parameters to be learned. EspiNet V2 uses a more concise CNN with only six layers (four convolutional), reducing the number of parameters to learn (~2 million) while still outperforming Faster R-CNN on the chosen task (see section 5).</p>
				<p>
					<xref ref-type="table" rid="t1">Table 1</xref> shows in detail the configuration and parameters of the EspiNet V2 model.</p>
				<p>
					<table-wrap id="t1">
						<label>Table 1</label>
						<caption>
							<title>Architecture and learnable parameters of EspiNet V2.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gt1.jpg"/>
						<table-wrap-foot>
							<fn id="TFN1">
								<p>Source: The Authors.</p>
							</fn>
						</table-wrap-foot>
					</table-wrap>
				</p>
				<p>The input size for classification is the size of the training images. For the detection task, the input layer is a tensor of 32x32x3 (32x32 pixels, 3 channels), considering that in the UMD-10K and SMMD datasets the smallest annotated object has a size of 25 pixels. This input layer is zero-centre normalised, and its size is determined by the processing time and the spatial detail the CNN model has to resolve. The first convolutional layer has 64 filters of size 3x3. The same filter size is used for all the convolutional layers to produce a small receptive field, capture small but complex features in the image, and optimise the weight-sharing process. Each convolutional layer is followed by a ReLU (rectified linear unit) layer, making the learning process computationally efficient, speeding up convergence and reducing the vanishing-gradient effect.</p>
				<p>The last two convolutional layers duplicate the number of filters, capturing more complex features that are later used for motorcycle recognition thanks to their enriched image representation [<xref ref-type="bibr" rid="B50">50</xref>]. As in the Faster R-CNN [<xref ref-type="bibr" rid="B7">7</xref>] architecture, a max RoI pooling layer is used after the four convolutional layers for detection purposes; it removes redundant spatial information and reduces the feature map to a fixed spatial size. </p>
				<p>This layer is set to a 15x15-pixel grid covering the smallest detected object. It is the only max-pooling layer in the model, since prematurely down-sampling data can lead to the loss of important information necessary for learning [<xref ref-type="bibr" rid="B51">51</xref>]. The first fully connected (FC) layer (64 neurons) combines all the features extracted in the previous layers; its output is rectified by a ReLU layer and finally combined in the second fully connected layer. The last layer of the model is a softmax layer, which normalises the output of the previous FC layer, providing a confidence measure and computing the loss of the model. <xref ref-type="fig" rid="f2">Fig. 2</xref> shows the schematic model of the EspiNet V2 network.</p>
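				<p>The layer sequence just described can be summarised in the following PyTorch-style sketch. This is a minimal reconstruction from the text and <xref ref-type="table" rid="t1">Table 1</xref>, not the released code: the filter counts of the middle layers, the padding and the class count are assumptions.</p>
				<p>
					<preformat>import torch.nn as nn

class EspiNetV2Sketch(nn.Module):
    """Illustrative reconstruction: 4 conv+ReLU layers, one pooling
    stage (stand-in for the max RoI pooling), and 2 FC layers."""
    def __init__(self, num_classes=2):  # motorcycle vs background (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),     # conv1: 64 filters, 3x3
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),    # conv2 (count assumed)
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),   # conv3: filters duplicated
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU())  # conv4: filters duplicated
        self.pool = nn.AdaptiveMaxPool2d((15, 15))  # 15x15 grid, as in the text
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 15 * 15, 64), nn.ReLU(),  # FC1: 64 neurons
            nn.Linear(64, num_classes))               # FC2, then softmax loss

    def forward(self, x):  # x: N x 3 x 32 x 32, zero-centre normalised
        return self.classifier(self.pool(self.features(x)))</preformat>
				</p>
				<p>Counting the weights of such a stack gives roughly 2 million learnable parameters, dominated by the first FC layer, which is consistent with the figure quoted above.</p>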
				<p>The multi-task loss function defined for one image is:</p>
				<p>
					<disp-formula id="e1">
						<graphic xlink:href="0012-7353-dyna-86-211-317-e1.png"/>
					</disp-formula>
				</p>
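				<p>Since eq. (1) appears above only as an image, the standard Faster R-CNN multi-task loss from [<xref ref-type="bibr" rid="B7">7</xref>], which the definitions below follow, is reproduced here for readability:</p>
				<p>
					<disp-formula>
						<tex-math><![CDATA[L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^{*}) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^{*} L_{reg}(t_i, t_i^{*})]]></tex-math>
					</disp-formula>
				</p>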
				<p>In <xref ref-type="disp-formula" rid="e1">eq. (1)</xref>, <italic>i</italic> is the anchor index in a mini-batch (with positive and negative example anchors) and <italic>p</italic><sub><italic>i</italic></sub> is the predicted probability that anchor <italic>i</italic> is an object. The ground truth (gt) <italic>p</italic><sub><italic>i</italic></sub><sup>*</sup> has label 1 if the anchor is positive and 0 if the anchor is negative. <italic>t</italic><sub><italic>i</italic></sub> represents the predicted bounding box as a vector of 4 parametrised coordinates, and <italic>t</italic><sub><italic>i</italic></sub><sup>*</sup> is the gt box coordinate vector related to a positive anchor. </p>
				<p>The classification loss <italic>L</italic><sub><italic>cls</italic></sub> uses a logistic regression cost function, while for the bounding-box regression loss <italic>L</italic><sub><italic>reg</italic></sub>(<italic>t</italic><sub><italic>i</italic></sub>, <italic>t</italic><sub><italic>i</italic></sub><sup>*</sup>) the robust loss function (smooth <italic>L</italic><sub><italic>1</italic></sub>) is used.</p>
				<p>
					<disp-formula id="e2">
						<graphic xlink:href="0012-7353-dyna-86-211-317-e2.png"/>
					</disp-formula>
				</p>
				<p>in which</p>
				<p>
					<disp-formula id="e3">
						<graphic xlink:href="0012-7353-dyna-86-211-317-e3.png"/>
					</disp-formula>
				</p>
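				<p>Eqs. (2)-(3) are likewise rendered as images; for reference, the smooth <italic>L</italic><sub><italic>1</italic></sub> function they use is, as defined in [<xref ref-type="bibr" rid="B45">45</xref>]:</p>
				<p>
					<disp-formula>
						<tex-math><![CDATA[\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}]]></tex-math>
					</disp-formula>
				</p>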
				<p>In this bounding-box regression, each coordinate is parametrised as follows:</p>
				<p>
					<disp-formula id="e4">
						<graphic xlink:href="0012-7353-dyna-86-211-317-e4.png"/>
					</disp-formula>
				</p>
				<p>where <italic>x</italic> and <italic>y</italic> correspond to the box's centre coordinates, and <italic>w</italic> and <italic>h</italic> to its width and height. The variable <italic>x</italic> corresponds to the predicted box, <italic>x</italic><sub><italic>a</italic></sub> to the anchor box and <italic>x</italic><sup><italic>*</italic></sup> to the ground-truth box (and similarly for the <italic>y</italic>, <italic>w</italic> and <italic>h</italic> variables). This can be seen as a bounding-box regression from an anchor box to the closest ground-truth box. The coordinates of the bounding box are values in [0,1] relative to a specific anchor. For example, <italic>t</italic><sub><italic>y</italic></sub> denotes the coefficient for the box-centre <italic>y</italic> coordinate: if <italic>t</italic><sub><italic>y</italic></sub> is multiplied by <italic>h</italic><sub><italic>a</italic></sub> and <italic>y</italic><sub><italic>a</italic></sub> is then added, we obtain the predicted <italic>y</italic>. The rest of the parameters can be calculated in the same way. </p>
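				<p>Eq. (4) is also rendered as an image; the parameterisation it describes is the standard one from [<xref ref-type="bibr" rid="B7">7</xref>], reproduced here:</p>
				<p>
					<disp-formula>
						<tex-math><![CDATA[t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a}]]></tex-math>
					</disp-formula>
				</p>
				<p>with the starred coefficients <italic>t</italic><sup>*</sup> defined analogously from the ground-truth box instead of the predicted box.</p>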
				<p>Training comprises four steps using an alternating optimisation. The first two steps train the RPN and the detector network separately. For these first two steps, EspiNet V2 uses a learning rate of <italic>1e-5</italic>, seeking fast convergence, as the network is trained from scratch and no pre-trained models are used for the shared convolutional layers [<xref ref-type="bibr" rid="B7">7</xref>]. Once the shared convolutional layers are trained and fixed, the last two steps fine-tune the layers unique to the RPN and to the Fast R-CNN detector, using a learning rate of <italic>1e-6</italic> for a smoother process.</p>
				<p>The optimisation algorithm used in all the described steps is Stochastic Gradient Descent with Momentum (SGDM) (<xref ref-type="disp-formula" rid="e5">eq. (5)</xref>).</p>
				<p>
					<disp-formula id="e5">
						<graphic xlink:href="0012-7353-dyna-86-211-317-e5.png"/>
					</disp-formula>
				</p>
				<p>where <inline-formula id="e6">
						<inline-graphic xlink:href="0012-7353-dyna-86-211-317-ie6.png"/>
					</inline-formula> is the iteration number, the learning rate is defined as α &gt; 0, weights and biases define the parameter vector <italic>θ</italic>, and <italic>E</italic>(<italic>θ</italic>) is the loss function. The algorithm is stochastic since it uses a subset of the training set (mini-batch) to evaluate the gradient and update the parameter vector; one iteration corresponds to each evaluation of the gradient using a mini-batch, and at each iteration the algorithm takes one step towards minimising the loss function. One epoch encompasses a full pass of the training algorithm over the entire training set using mini-batches. For EspiNet V2, the number of epochs was defined after training analysis [<xref ref-type="bibr" rid="B48">48</xref>]. The momentum term γ regulates the contribution of the previous gradient step to the current iteration and is used to avoid oscillation along the steepest descent towards the optimum.</p>
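				<p>Since eq. (5) is rendered above as an image, the standard SGDM update is reproduced here for readability (writing <italic>ℓ</italic> for the iteration number):</p>
				<p>
					<disp-formula>
						<tex-math><![CDATA[\theta_{\ell+1} = \theta_{\ell} - \alpha \nabla E(\theta_{\ell}) + \gamma\,(\theta_{\ell} - \theta_{\ell-1})]]></tex-math>
					</disp-formula>
				</p>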
			</sec>
		</sec>
		<sec sec-type="results">
			<title>5. Experiments and results</title>
			<sec>
				<title><italic>5.1. Motorbikes datasets</italic></title>
				<p>To train and evaluate the proposed model, two datasets are used. The first is the UMD-10K dataset, an extension of [<xref ref-type="bibr" rid="B48">48</xref>], with 10,000 annotated images including 317 motorcycles with 56,975 individual annotations (bounding boxes); 60% of the annotated data corresponds to occluded motorcycles (see <xref ref-type="fig" rid="f3">Fig. 3</xref>). In addition, the Secretaría de Movilidad de Medellín created the Sistema Inteligente de Movilidad de Medellín (Intelligent Mobility System of Medellín) [<xref ref-type="bibr" rid="B52">52</xref>], which includes a CCTV network of 80 cameras to monitor urban traffic conditions in this Colombian city. From this network of cameras, we selected six strategically located surveillance cameras (<xref ref-type="fig" rid="f4">Fig. 4</xref>) to create the SMMD dataset of 5,000 images, containing 21,625 annotated motorcycles (817 different motorcycles) (<xref ref-type="fig" rid="f5">Fig. 5</xref>). These datasets are available from <ext-link ext-link-type="uri" xlink:href="http://videodatasets.org/UrbanMotorbike">http://videodatasets.org/UrbanMotorbike</ext-link>.</p>
				<p>
					<fig id="f4">
						<label>Figure 4</label>
						<caption>
							<title>Localisation map of the 80 cameras of the Secretaría de Movilidad de Medellín CCTV network [<xref ref-type="bibr" rid="B52">52</xref>]. Six cameras were selected for this research. </title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gf4.png"/>
						<attrib>Source: The Authors.</attrib>
					</fig>
				</p>
				<p>
					<fig id="f5">
						<label>Figure 5</label>
						<caption>
							<title>Example images from the six selected cameras. Each camera covers an important urban zone; from left to right: Belalcazar, Carlos E., Oriental 1 and 2, Zenú and, finally, Sura. Note the rather poor quality of the images.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gf5.png"/>
						<attrib>Source: The Authors.</attrib>
					</fig>
				</p>
			</sec>
			<sec>
				<title><italic>5.2. Results on the UMD-10K dataset</italic></title>
				<p>Previous experiments in [<xref ref-type="bibr" rid="B48">48</xref>] achieved 75.23% Average Precision (AP) [<xref ref-type="bibr" rid="B53">53</xref>], training and evaluating on UMD-7.5K, a version of the dataset with 7,500 examples. </p>
				<p>EspiNet is now compared with two models: YOLO V3 [<xref ref-type="bibr" rid="B54">54</xref>] as a single-stage detector and, as a two-stage detector, the original Faster R-CNN [<xref ref-type="bibr" rid="B49">49</xref>] (VGG16 based). We selected these models because they have been extensively used to benchmark new proposals, because of their good performance, and because they are available in the public domain. All models were trained end to end from scratch using the challenging UMD-10K dataset.</p>
				<p>As recommended by [<xref ref-type="bibr" rid="B55">55</xref>], and due to the large number of examples needed for training, the three models use 90% (9,000 images) of the UMD-10K dataset as training data, while the remaining 10% (1,000 images) are used for validation. The selection of the training and test sets is done randomly to avoid any bias in the distribution. </p>
				<p>The proposed EspiNet V2 model obtains 88.8% AP and a 91.8% F1-score [<xref ref-type="bibr" rid="B56">56</xref>], outperforming the results of YOLO and Faster R-CNN (VGG16 based). <xref ref-type="table" rid="t2">Table 2</xref> shows the comparative results. <xref ref-type="fig" rid="f6">Fig. 6</xref> presents a graphical comparison of the Average Precision (AP) of the three models.</p>
				<p>
					<table-wrap id="t2">
						<label>Table 2</label>
						<caption>
							<title>EspiNet model against Faster-RCNN (VGG16 based) [<xref ref-type="bibr" rid="B49">49</xref>] and YOLO V3 [<xref ref-type="bibr" rid="B54">54</xref>], comparative results - Results on UMD dataset.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gt2.png"/>
						<table-wrap-foot>
							<fn id="TFN2">
								<p>Source: The Authors.</p>
							</fn>
						</table-wrap-foot>
					</table-wrap>
				</p>
				<p>
					<fig id="f6">
						<label>Figure 6</label>
						<caption>
							<title>Average Precision (AP) of the model compared with YOLO V3 and Faster R-CNN (VGG16 based). Results on UMD-10K dataset.</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gf6.png"/>
						<attrib>Source: The Authors</attrib>
					</fig>
				</p>
				<p>On all metrics, EspiNet obtains better results than the other two detectors, with YOLO V3 the closest in performance. YOLO achieved almost equal precision but reduced recall, since its single-stage architecture has no Region Proposal Network (RPN) and fails to detect objects that are too small or that appear too close to each other. </p>
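				<p>For reference, the precision, recall and F1 metrics compared here follow the usual detection definitions, computed from true positives, false positives and false negatives at a fixed IoU matching threshold (0.5 is common for AP [<xref ref-type="bibr" rid="B53">53</xref>]); a minimal sketch with hypothetical counts:</p>
				<p>
					<preformat>def precision_recall_f1(tp, fp, fn):
    """Detection metrics from counts at a fixed IoU matching threshold."""
    precision = tp / (tp + fp)  # fraction of detections that are correct
    recall = tp / (tp + fn)     # fraction of ground-truth objects found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# A detector with few missed objects but some false alarms (hypothetical):
print(precision_recall_f1(tp=900, fp=80, fn=60))</preformat>
				</p>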
				<p>The results of the detectors applied to the UMD-10K dataset can be seen at <ext-link ext-link-type="uri" xlink:href="https://goo.gl/bJM3HF">https://goo.gl/bJM3HF</ext-link>.
				</p>
			</sec>
			<sec>
				<title><italic>5.3. Results on SMMD dataset</italic></title>
				<p>On the Secretaría de Movilidad de Medellín dataset (SMMD), we trained EspiNet, Faster R-CNN (VGG16 based) and YOLO V3 end to end, using the same training/validation split proportions as for UMD-10K. </p>
				<p>
					<xref ref-type="table" rid="t3">Table 3</xref> shows that EspiNet V2 outperforms YOLO V3 and Faster R-CNN in terms of AP, reaching 79.52% with a recall of 83.39%. This can again be explained by the absence of an RPN in YOLO V3, which fails to detect objects that appear too close together or are too small. Nevertheless, YOLO V3 deals better with false detections, outperforming the region-based detectors (EspiNet V2 and Faster R-CNN) in terms of precision and consequently improving its final F1 score. <xref ref-type="fig" rid="f7">Fig. 7</xref> shows the comparative performance of the three detectors in terms of Average Precision (AP).</p>
				<p>
					<table-wrap id="t3">
						<label>Table 3</label>
						<caption>
							<title>Comparative detection results - Results for the SMMD dataset</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gt3.png"/>
						<table-wrap-foot>
							<fn id="TFN3">
								<p>Source: The Authors.</p>
							</fn>
						</table-wrap-foot>
					</table-wrap>
				</p>
				<p>
					<fig id="f7">
						<label>Figure 7</label>
						<caption>
							<title>Average Precision (AP) of the model compared with YOLO V3 and Faster R-CNN (VGG16 based). Results on Secretaría de Movilidad de Medellín Dataset (SMMD).</title>
						</caption>
						<graphic xlink:href="0012-7353-dyna-86-211-317-gf7.png"/>
						<attrib>Source: The Authors.</attrib>
					</fig>
				</p>
				<p>EspiNet V2 and the Faster R-CNN (VGG16 based) models were trained on a Windows 10 machine with a 7th-generation Intel Core i7 CPU at 4.7 GHz and 32 GB of RAM, using an NVIDIA Titan X (Pascal) 1531 MHz GPU.</p>
				<p>On the UMD-10K dataset, training the EspiNet V2 model took 32 hours, while training the Faster R-CNN (VGG16) model took 47 hours. A Linux machine running Ubuntu 16.04.3, with a Xeon E5-2683 v4 2.10 GHz CPU, 64 GB of RAM and an NVIDIA Titan Xp 1582 MHz GPU, was used for training YOLO V3, which took 18 hours on the UMD-10K dataset. All models were trained end to end from scratch. </p>
				<p>The training times on the SMMD dataset were 24 hours for EspiNet V2, 35 hours for Faster R-CNN (VGG16) and 14 hours for YOLO V3, using the same environments described previously.</p>
			</sec>
		</sec>
		<sec sec-type="conclusions">
			<title>6. Conclusions and future work</title>
			<p>This paper has introduced EspiNet V2, a model derived from Faster R-CNN, for motorcycle detection in urban scenarios. The model can deal with occluded objects, achieving an Average Precision (AP) of nearly 90% on UMD-10K, as far as we know the most challenging urban motorbike detection dataset at present. It achieves almost 80% AP on the new SMMD, also a challenging dataset, made public for other researchers to improve on these baseline results.</p>
			<p>EspiNet V2 is compared in this study with deep learning detection models such as YOLO V3 and Faster R-CNN (VGG16 based). The models were trained on the UMD-10K and SMMD datasets, and EspiNet V2 was found to outperform the others in terms of Average Precision (AP).</p>
			<p>As with most deep learning architectures, and as also evaluated in [<xref ref-type="bibr" rid="B57">57</xref>], the model obtains better results as the number of training examples increases. It is important to have enough representative data for each distribution of examples used to train a deep learning model; the amount and distribution of examples in these two datasets explain the quality of the classification obtained.</p>
			<p>Spatio-temporal information could be integrated into the model to improve its detection capabilities. EspiNet V2 could be used within a neural network layer that incorporates not only the input information of the current time step (frame) but also the activation values of previous time steps (previous frames). This architecture corresponds to Recurrent Neural Networks (RNNs), such as Gated Recurrent Units (GRUs) or Long Short-Term Memory (LSTM) networks, which apply sequence modelling to predict subsequent states after an initial detection according to historical information. This improvement could lead to detection by tracking, where the models broaden their detection-class scope to include other urban road users such as pedestrians, cyclists, trucks and buses.</p>
		</sec>
	</body>
	<back>
		<ack>
			<title>Acknowledgements </title>
			<p>Sergio A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for Research, Technological Development and Demonstration under grant agreement No. 600371, the Ministerio de Economía, Industria y Competitividad (COFUND2013-51509), the Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.</p>
			<p>This work was partially supported by COLCIENCIAS project: Reduccion de Emisiones Vehiculares Mediante el Modelado y Gestion Optima de Trafico en Areas Metropolitanas - Caso Medellin - Area Metropolitana del Valle de Aburra, codigo 111874558167, CT 049-2017. Universidad Nacional de Colombia- Politécnico Colombiano Jaime Isaza Cadavid. Proyecto HERMES 25374.</p>
			<p>The authors also gratefully acknowledge the support of NVIDIA Corporation with the donation of the two GPUs used for this research. </p>
			<p>The datasets and code used in this work are available upon request from the authors.</p>
		</ack>
		<ref-list>
			<title>References</title>
			<ref id="B1">
				<label>[1]</label>
				<mixed-citation>[1] WHO, Global status report on road safety, [Online]. 2018, WHO. [Accessed: June 10th of 2019]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.who.int/violence_injury_prevention/road_safety_status/2018/en/">http://www.who.int/violence_injury_prevention/road_safety_status/2018/en/</ext-link>
					</comment>. </mixed-citation>
				<element-citation publication-type="book">
					<person-group person-group-type="author">
						<collab>WHO</collab>
					</person-group>
					<source>Global status report on road safety</source>
					<year>2018</year>
					<publisher-name>WHO</publisher-name>
					<date-in-citation content-type="access-date" iso-8601-date="2019-00-00">June 10th of 2019</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.who.int/violence_injury_prevention/road_safety_status/2018/en/">http://www.who.int/violence_injury_prevention/road_safety_status/2018/en/</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B2">
				<label>[2]</label>
				<mixed-citation>[2] Accidentes de tránsito en la Comunidad Andina, 2007-2016, 48 P.</mixed-citation>
				<element-citation publication-type="book">
					<source>Accidentes de tránsito en la Comunidad Andina, 2007-2016</source>
					<fpage>48</fpage>
					<lpage>48</lpage>
				</element-citation>
			</ref>
			<ref id="B3">
				<label>[3]</label>
				<mixed-citation>[3] Así Vamos en Salud, Mortalidad por accidentes de tránsito, [Online]. 2018. [Accessed: August 23rd of 2018]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.asivamosensalud.org/salud-para-ciudadanos/mortalidad-por-accidentes-de-transito">https://www.asivamosensalud.org/salud-para-ciudadanos/mortalidad-por-accidentes-de-transito</ext-link>
					</comment>.</mixed-citation>
				<element-citation publication-type="book">
					<source>Así Vamos en Salud., Mortalidad por accidentes de tránsito</source>
					<year>2018</year>
					<date-in-citation content-type="access-date" iso-8601-date="2018-00-00">2August 23th of 2018</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.asivamosensalud.org/salud-para-ciudadanos/mortalidad-por-accidentes-de-transito">https://www.asivamosensalud.org/salud-para-ciudadanos/mortalidad-por-accidentes-de-transito</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B4">
				<label>[4]</label>
				<mixed-citation>[4] RUNT, Estadísticas del RUNT, [Online]. [Accessed: August 9th of 2019]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.runt.com.co/cifras">https://www.runt.com.co/cifras</ext-link>
					</comment>
				</mixed-citation>
				<element-citation publication-type="webpage">
					<person-group person-group-type="author">
						<collab>RUNT</collab>
					</person-group>
					<source>Estadísticas del RUNT</source>
					<date-in-citation content-type="access-date" iso-8601-date="2019-00-00">August 09th of 2019</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.runt.com.co/cifras">https://www.runt.com.co/cifras</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B5">
				<label>[5]</label>
				<mixed-citation>[5] IDEAM, Calidad del aire, [Online]. [Accessed: August 9th of 2019]. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.ideam.gov.co/web/contaminacion-y-calidad-ambiental/calidad-del-aire">http://www.ideam.gov.co/web/contaminacion-y-calidad-ambiental/calidad-del-aire</ext-link>
					</comment>. </mixed-citation>
				<element-citation publication-type="webpage">
					<person-group person-group-type="author">
						<collab>IDEAM</collab>
					</person-group>
					<source>Calidad del aire</source>
					<date-in-citation content-type="access-date" iso-8601-date="2019-00-00">August 09th of 2019</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.ideam.gov.co/web/contaminacion-y-calidad-ambiental/calidad-del-aire">http://www.ideam.gov.co/web/contaminacion-y-calidad-ambiental/calidad-del-aire</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B6">
				<label>[6]</label>
				<mixed-citation>[6] Walsh, M.P., PM 2.5: global progress in controlling the motor vehicle contribution, Front. Environ. Sci. Eng., 8(1), pp. 1-17, 2014. DOI: 10.1007/s11783-014-0634-4</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Walsh</surname>
							<given-names>M.P</given-names>
						</name>
					</person-group>
					<article-title>PM 2.5: global progress in controlling the motor vehicle contribution</article-title>
					<source>Front. Environ. Sci. Eng</source>
					<volume>8</volume>
					<issue>1</issue>
					<fpage>1</fpage>
					<lpage>17</lpage>
					<year>2014</year>
					<pub-id pub-id-type="doi">10.1007/s11783-014-0634-4</pub-id>
				</element-citation>
			</ref>
			<ref id="B7">
				<label>[7]</label>
				<mixed-citation>[7] Ren, S., He, K., Girshick, R. and Sun, J., Faster r-cnn: towards real-time object detection with region proposal networks, in: Advances in neural information processing systems, [online]. 2015, pp. 91-99. Available at: <ext-link ext-link-type="uri" xlink:href="http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks">http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks</ext-link>
				</mixed-citation>
				<element-citation publication-type="book">
					<person-group person-group-type="author">
						<name>
							<surname>Ren</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>He</surname>
							<given-names>K.</given-names>
						</name>
						<name>
							<surname>Girshick</surname>
							<given-names>R.</given-names>
						</name>
						<name>
							<surname>Sun</surname>
							<given-names>J</given-names>
						</name>
					</person-group>
					<chapter-title>Faster r-cnn: towards real-time object detection with region proposal networks</chapter-title>
					<source>Advances in neural information processing systems</source>
					<year>2015</year>
					<fpage>91</fpage>
					<lpage>99</lpage>
					<ext-link ext-link-type="uri" xlink:href="http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks">http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks</ext-link>
				</element-citation>
			</ref>
			<ref id="B8">
				<label>[8]</label>
				<mixed-citation>[8] Tian, B. et al., Hierarchical and networked vehicle surveillance in ITS: a survey, IEEE Trans. Intell. Transp. Syst., 18(1), pp. 25-48, 2017. DOI: 10.1109/TITS.2016.2552778</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Tian</surname>
							<given-names>B.</given-names>
						</name>
						<etal/>
					</person-group>
					<article-title>Hierarchical and networked vehicle surveillance in ITS: a survey</article-title>
					<source>IEEE Trans. Intell. Transp. Syst</source>
					<volume>18</volume>
					<issue>1</issue>
					<fpage>25</fpage>
					<lpage>48</lpage>
					<year>2017</year>
					<pub-id pub-id-type="doi">10.1109/TITS.2016.2552778</pub-id>
				</element-citation>
			</ref>
			<ref id="B9">
				<label>[9]</label>
				<mixed-citation>[9] Le, T.S. and Huynh, C.K., An unified framework for motorbike counting and detecting in traffic videos, in: 2015 International Conference on Advanced Computing and Applications (ACOMP), 2015, pp. 162-168. DOI: 10.1109/ACOMP.2015.32</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Le</surname>
							<given-names>T.S.</given-names>
						</name>
						<name>
							<surname>Huynh</surname>
							<given-names>C.K</given-names>
						</name>
					</person-group>
					<source>An unified framework for motorbike counting and detecting in traffic videos</source>
					<conf-date>2015</conf-date>
					<conf-name>International Conference on Advanced Computing and Applications (ACOMP)</conf-name>
					<year>2015</year>
					<fpage>162</fpage>
					<lpage>168</lpage>
					<pub-id pub-id-type="doi">10.1109/ACOMP.2015.32</pub-id>
				</element-citation>
			</ref>
			<ref id="B10">
				<label>[10]</label>
				<mixed-citation>[10] Duan B., Liu W., Fu P., Yang C., Wen X., and Yuan H., Real-time on-road vehicle and motorcycle detection using a single camera, in Industrial Technology, 2009. ICIT 2009. IEEE International Conference on, 2009, pp. 1-6. DOI: 10.1109/ICIT.2009.4939585</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Duan</surname>
							<given-names>B.</given-names>
						</name>
						<name>
							<surname>Liu</surname>
							<given-names>W.</given-names>
						</name>
						<name>
							<surname>Fu</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Yang</surname>
							<given-names>C.</given-names>
						</name>
						<name>
							<surname>Wen</surname>
							<given-names>X.</given-names>
						</name>
						<name>
							<surname>Yuan</surname>
							<given-names>H</given-names>
						</name>
					</person-group>
					<source>Real-time on-road vehicle and motorcycle detection using a single camera, in Industrial Technology, 2009</source>
					<conf-sponsor>ICIT</conf-sponsor>
					<conf-date>2009</conf-date>
					<conf-name>IEEE International Conference on</conf-name>
					<year>2009</year>
					<fpage>1</fpage>
					<lpage>6</lpage>
					<pub-id pub-id-type="doi">10.1109/ICIT.2009.4939585</pub-id>
				</element-citation>
			</ref>
			<ref id="B11">
				<label>[11]</label>
				<mixed-citation>[11] Muzammel, M., Yusoff, M.Z. and Meriaudeau, F., Rear-end vision-based collision detection system for motorcyclists, J. Electron. Imaging, 26(3), pp. 033002, 2017. DOI: 10.1117/1.JEI.26.3.033002</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Muzammel</surname>
							<given-names>M.</given-names>
						</name>
						<name>
							<surname>Yusoff</surname>
							<given-names>M.Z.</given-names>
						</name>
						<name>
							<surname>Meriaudeau</surname>
							<given-names>F</given-names>
						</name>
					</person-group>
					<article-title>Rear-end vision-based collision detection system for motorcyclists</article-title>
					<source>J. Electron. Imaging</source>
					<volume>26</volume>
					<issue>3</issue>
					<fpage>033002</fpage>
					<lpage>033002</lpage>
					<year>2017</year>
					<pub-id pub-id-type="doi">10.1117/1.JEI.26.3.033002</pub-id>
				</element-citation>
			</ref>
			<ref id="B12">
				<label>[12]</label>
				<mixed-citation>[12] Shuo, Y. and Choi, E.-J., A driving support system base on traffic environment analysis, Indian J. Sci. Technol., 9(47), 2016. DOI: 10.17485/ijst/2016/v9i47/108374</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Shuo</surname>
							<given-names>Y.</given-names>
						</name>
						<name>
							<surname>Choi</surname>
							<given-names>E.-J</given-names>
						</name>
					</person-group>
					<article-title>A driving support system base on traffic environment analysis</article-title>
					<source>Indian J. Sci. Technol</source>
					<volume>9</volume>
					<issue>47</issue>
					<year>2016</year>
					<pub-id pub-id-type="doi">10.17485/ijst/2016/v9i47/108374</pub-id>
				</element-citation>
			</ref>
			<ref id="B13">
				<label>[13]</label>
				<mixed-citation>[13] Wonghabut, P., Kumphong, J., Satiennam, T., Ungarunyawee, R. and Leelapatra, W., Automatic helmet-wearing detection for law enforcement using CCTV cameras, in: IOP Conference Series: Earth and Environmental Science, 2018, 143, pp. 012063. DOI: 10.1088/1755-1315/143/1/012063</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Wonghabut</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Kumphong</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Satiennam</surname>
							<given-names>T.</given-names>
						</name>
						<name>
							<surname>Ungarunyawee</surname>
							<given-names>R.</given-names>
						</name>
						<name>
							<surname>Leelapatra</surname>
							<given-names>W</given-names>
						</name>
					</person-group>
					<source>Automatic helmet-wearing detection for law enforcement using CCTV cameras</source>
					<conf-name>IOP Conference Series: Earth and Environmental Science</conf-name>
					<year>2018</year>
					<volume>143</volume>
					<fpage>012063</fpage>
					<lpage>012063</lpage>
					<pub-id pub-id-type="doi">10.1088/1755-1315/143/1/012063</pub-id>
				</element-citation>
			</ref>
			<ref id="B14">
				<label>[14]</label>
				<mixed-citation>[14] Dahiya, K., Singh, D. and Mohan, C.K., Automatic detection of bike-riders without helmet using surveillance videos in real-time, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp. 3046-3051. DOI: 10.1109/IJCNN.2016.7727586</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Dahiya</surname>
							<given-names>K.</given-names>
						</name>
						<name>
							<surname>Singh</surname>
							<given-names>D.</given-names>
						</name>
						<name>
							<surname>Mohan</surname>
							<given-names>C.K</given-names>
						</name>
					</person-group>
					<source>Automatic detection of bike-riders without helmet using surveillance videos in real-time</source>
					<conf-date>2016</conf-date>
					<conf-name>International Joint Conference on Neural Networks (IJCNN)</conf-name>
					<year>2016</year>
					<fpage>3046</fpage>
					<lpage>3051</lpage>
					<pub-id pub-id-type="doi">10.1109/IJCNN.2016.7727586</pub-id>
				</element-citation>
			</ref>
			<ref id="B15">
				<label>[15]</label>
				<mixed-citation>[15] Singh, D., Vishnu, C. and Mohan, C.K., Visual big data analytics for traffic monitoring in smart city, in: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016, pp. 886-891. DOI: 10.1109/ICMLA.2016.0159</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Singh</surname>
							<given-names>D.</given-names>
						</name>
						<name>
							<surname>Vishnu</surname>
							<given-names>C.</given-names>
						</name>
						<name>
							<surname>Mohan</surname>
							<given-names>C.K</given-names>
						</name>
					</person-group>
					<source>Visual big data analytics for traffic monitoring in smart city</source>
					<conf-date>2016</conf-date>
					<conf-name>15th IEEE International Conference on Machine Learning and Applications (ICMLA)</conf-name>
					<year>2016</year>
					<fpage>886</fpage>
					<lpage>891</lpage>
					<pub-id pub-id-type="doi">10.1109/ICMLA.2016.0159</pub-id>
				</element-citation>
			</ref>
			<ref id="B16">
				<label>[16]</label>
				<mixed-citation>[16] e Silva, R.R., Aires, K.R. and Veras, R. de M.S., Detection of helmets on motorcyclists, Multimed. Tools Appl., 77(5), pp. 5659-5683, 2017. DOI: 10.1007/s11042-017-4482-7</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>e Silva</surname>
							<given-names>R.R.</given-names>
						</name>
						<name>
							<surname>Aires</surname>
							<given-names>K.R.</given-names>
						</name>
						<name>
							<surname>Veras</surname>
							<given-names>R. de M.S</given-names>
						</name>
					</person-group>
					<article-title>Detection of helmets on motorcyclists</article-title>
					<source>Multimed. Tools Appl</source>
					<volume>77</volume>
					<issue>5</issue>
					<fpage>5659</fpage>
					<lpage>5683</lpage>
					<year>2017</year>
					<pub-id pub-id-type="doi">10.1007/s11042-017-4482-7</pub-id>
				</element-citation>
			</ref>
			<ref id="B17">
				<label>[17]</label>
				<mixed-citation>[17] Wu, H. and Zhao, J., An intelligent vision-based approach for helmet identification for work safety, Comput. Ind., 100, pp. 267-277, 2018. DOI: 10.1016/j.compind.2018.03.037</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Wu</surname>
							<given-names>H.</given-names>
						</name>
						<name>
							<surname>Zhao</surname>
							<given-names>J</given-names>
						</name>
					</person-group>
					<article-title>An intelligent vision-based approach for helmet identification for work safety</article-title>
					<source>Comput. Ind</source>
					<volume>100</volume>
					<fpage>267</fpage>
					<lpage>277</lpage>
					<year>2018</year>
					<pub-id pub-id-type="doi">10.1016/j.compind.2018.03.037</pub-id>
				</element-citation>
			</ref>
			<ref id="B18">
				<label>[18]</label>
				<mixed-citation>[18] Messelodi, S., Modena, C.M. and Cattoni, G., Vision-based bicycle/motorcycle classification, Pattern Recognit. Lett., 28(13), pp. 1719-1726, 2007. DOI: 10.1016/j.patrec.2007.04.014</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Messelodi</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Modena</surname>
							<given-names>C.M.</given-names>
						</name>
						<name>
							<surname>Cattoni</surname>
							<given-names>G</given-names>
						</name>
					</person-group>
					<article-title>Vision-based bicycle/motorcycle classification</article-title>
					<source>Pattern Recognit. Lett</source>
					<volume>28</volume>
					<issue>13</issue>
					<fpage>1719</fpage>
					<lpage>1726</lpage>
					<year>2007</year>
					<pub-id pub-id-type="doi">10.1016/j.patrec.2007.04.014</pub-id>
				</element-citation>
			</ref>
			<ref id="B19">
				<label>[19]</label>
				<mixed-citation>[19] Buch, N., Orwell, J. and Velastin, S.A., Urban road user detection and classification using 3D wire frame models, IET Comput. Vis., 4(2), pp. 105-116, 2010. DOI: 10.1049/iet-cvi.2008.0089</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Buch</surname>
							<given-names>N.</given-names>
						</name>
						<name>
							<surname>Orwell</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Velastin</surname>
							<given-names>S.A</given-names>
						</name>
					</person-group>
					<article-title>Urban road user detection and classification using 3D wire frame models</article-title>
					<source>IET Comput. Vis</source>
					<volume>4</volume>
					<issue>2</issue>
					<fpage>105</fpage>
					<lpage>116</lpage>
					<year>2010</year>
					<pub-id pub-id-type="doi">10.1049/iet-cvi.2008.0089</pub-id>
				</element-citation>
			</ref>
			<ref id="B20">
				<label>[20]</label>
				<mixed-citation>[20] Chiu, C.-C., Ku, M.-Y. and Chen, H.-T., Motorcycle detection and tracking system with occlusion segmentation, in: Image Analysis for Multimedia Interactive Services, 2007. WIAMIS07. Eighth International Workshop on, 2007, pp. 32-32. DOI: 10.1109/WIAMIS.2007.60</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Chiu</surname>
							<given-names>C.-C.</given-names>
						</name>
						<name>
							<surname>Ku</surname>
							<given-names>M.-Y.</given-names>
						</name>
						<name>
							<surname>Chen</surname>
							<given-names>H.-T</given-names>
						</name>
					</person-group>
					<source>Motorcycle detection and tracking system with occlusion segmentation</source>
					<conf-name>Eighth International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2007)</conf-name>
					<year>2007</year>
					<fpage>32</fpage>
					<lpage>32</lpage>
					<pub-id pub-id-type="doi">10.1109/WIAMIS.2007.60</pub-id>
				</element-citation>
			</ref>
			<ref id="B21">
				<label>[21]</label>
				<mixed-citation>[21] Ku, M.-Y., Chiu, C.-C., Chen, H.-T. and Hong, S.-H., Visual motorcycle detection and tracking algorithms, WSEAS Trans. Electron., [online]. pp. 121-131, 2008. Available at: <ext-link ext-link-type="uri" xlink:href="http://www.wseas.us/e-library/transactions/electronics/2008/30-863.pdf">http://www.wseas.us/e-library/transactions/electronics/2008/30-863.pdf</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Ku</surname>
							<given-names>M.-Y.</given-names>
						</name>
						<name>
							<surname>Chiu</surname>
							<given-names>C.-C.</given-names>
						</name>
						<name>
							<surname>Chen</surname>
							<given-names>H.-T.</given-names>
						</name>
						<name>
							<surname>Hong</surname>
							<given-names>S.-H</given-names>
						</name>
					</person-group>
					<article-title>Visual motorcycle detection and tracking algorithms</article-title>
					<source>WSEAS Trans. Electron</source>
					<fpage>121</fpage>
					<lpage>131</lpage>
					<year>2008</year>
					<ext-link ext-link-type="uri" xlink:href="http://www.wseas.us/e-library/transactions/electronics/2008/30-863.pdf">http://www.wseas.us/e-library/transactions/electronics/2008/30-863.pdf</ext-link>
				</element-citation>
			</ref>
			<ref id="B22">
				<label>[22]</label>
				<mixed-citation>[22] Stauffer, C. and Grimson, W.E.L., Adaptive background mixture models for real-time tracking, in: Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on, 1999, pp. 246-252. DOI: 10.1109/CVPR.1999.784637</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Stauffer</surname>
							<given-names>C.</given-names>
						</name>
						<name>
							<surname>Grimson</surname>
							<given-names>W.E.L</given-names>
						</name>
					</person-group>
					<source>Adaptive background mixture models for real-time tracking</source>
					<conf-name>1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 1999)</conf-name>
					<year>1999</year>
					<fpage>246</fpage>
					<lpage>252</lpage>
					<pub-id pub-id-type="doi">10.1109/CVPR.1999.784637</pub-id>
				</element-citation>
			</ref>
			<ref id="B23">
				<label>[23]</label>
				<mixed-citation>[23] Waranusast, R., Bundon, N., Timtong, V., Tangnoi, C. and Pattanathaburt, P., Machine vision techniques for motorcycle safety helmet detection, in: 28th International Conference on Image and Vision Computing New Zealand (IVCNZ 2013), 2013, pp. 35-40. DOI: 10.1109/IVCNZ.2013.6726989</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Waranusast</surname>
							<given-names>R.</given-names>
						</name>
						<name>
							<surname>Bundon</surname>
							<given-names>N.</given-names>
						</name>
						<name>
							<surname>Timtong</surname>
							<given-names>V.</given-names>
						</name>
						<name>
							<surname>Tangnoi</surname>
							<given-names>C.</given-names>
						</name>
						<name>
							<surname>Pattanathaburt</surname>
							<given-names>P</given-names>
						</name>
					</person-group>
					<source>Machine vision techniques for motorcycle safety helmet detection</source>
					<conf-name>28th International Conference on Image and Vision Computing New Zealand (IVCNZ 2013)</conf-name>
					<year>2013</year>
					<fpage>35</fpage>
					<lpage>40</lpage>
					<pub-id pub-id-type="doi">10.1109/IVCNZ.2013.6726989</pub-id>
				</element-citation>
			</ref>
			<ref id="B24">
				<label>[24]</label>
				<mixed-citation>[24] Rashidan, M.A., Mustafah, Y.M., Shafie, A.A., Zainuddin, N.A., Aziz, N.N.A. and Azman, A.W., Moving object detection and classification using Neuro-Fuzzy approach, Int. J. Multimed. Ubiquitous Eng., 11(4), pp. 253-266, 2016. DOI: 10.14257/ijmue.2016.11.4.26</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Rashidan</surname>
							<given-names>M.A.</given-names>
						</name>
						<name>
							<surname>Mustafah</surname>
							<given-names>Y.M.</given-names>
						</name>
						<name>
							<surname>Shafie</surname>
							<given-names>A.A.</given-names>
						</name>
						<name>
							<surname>Zainuddin</surname>
							<given-names>N.A.</given-names>
						</name>
						<name>
							<surname>Aziz</surname>
							<given-names>N.N.A.</given-names>
						</name>
						<name>
							<surname>Azman</surname>
							<given-names>A.W</given-names>
						</name>
					</person-group>
					<article-title>Moving object detection and classification using Neuro-Fuzzy approach</article-title>
					<source>Int. J. Multimed. Ubiquitous Eng</source>
					<volume>11</volume>
					<issue>4</issue>
					<fpage>253</fpage>
					<lpage>266</lpage>
					<year>2016</year>
					<pub-id pub-id-type="doi">10.14257/ijmue.2016.11.4.26</pub-id>
				</element-citation>
			</ref>
			<ref id="B25">
				<label>[25]</label>
				<mixed-citation>[25] Chen, Z. and Ellis, T., Self-adaptive Gaussian mixture model for urban traffic monitoring system, in: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011, pp. 1769-1776. DOI: 10.1109/ICCVW.2011.6130463</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Chen</surname>
							<given-names>Z.</given-names>
						</name>
						<name>
							<surname>Ellis</surname>
							<given-names>T</given-names>
						</name>
					</person-group>
					<source>Self-adaptive Gaussian mixture model for urban traffic monitoring system</source>
					<conf-name>IEEE International Conference on Computer Vision Workshops (ICCV Workshops)</conf-name>
					<year>2011</year>
					<fpage>1769</fpage>
					<lpage>1776</lpage>
					<pub-id pub-id-type="doi">10.1109/ICCVW.2011.6130463</pub-id>
				</element-citation>
			</ref>
			<ref id="B26">
				<label>[26]</label>
				<mixed-citation>[26] Chen, Z., Ellis, T. and Velastin, S.A., Vehicle detection, tracking and classification in urban traffic, in: 15th International IEEE Conference on Intelligent Transportation Systems, 2012, pp. 951-956. DOI: 10.1109/ITSC.2012.6338852</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Chen</surname>
							<given-names>Z.</given-names>
						</name>
						<name>
							<surname>Ellis</surname>
							<given-names>T.</given-names>
						</name>
						<name>
							<surname>Velastin</surname>
							<given-names>S.A</given-names>
						</name>
					</person-group>
					<source>Vehicle detection, tracking and classification in urban traffic</source>
					<conf-name>15th International IEEE Conference on Intelligent Transportation Systems</conf-name>
					<year>2012</year>
					<fpage>951</fpage>
					<lpage>956</lpage>
					<pub-id pub-id-type="doi">10.1109/ITSC.2012.6338852</pub-id>
				</element-citation>
			</ref>
			<ref id="B27">
				<label>[27]</label>
				<mixed-citation>[27] Chiverton, J., Helmet presence classification with motorcycle detection and tracking, Intell. Transp. Syst. IET, 6(3), pp. 259-269, 2012. DOI: 10.1049/iet-its.2011.0138</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Chiverton</surname>
							<given-names>J</given-names>
						</name>
					</person-group>
					<article-title>Helmet presence classification with motorcycle detection and tracking</article-title>
					<source>Intell. Transp. Syst. IET</source>
					<volume>6</volume>
					<issue>3</issue>
					<fpage>259</fpage>
					<lpage>269</lpage>
					<year>2012</year>
					<pub-id pub-id-type="doi">10.1049/iet-its.2011.0138</pub-id>
				</element-citation>
			</ref>
			<ref id="B28">
				<label>[28]</label>
				<mixed-citation>[28] Thai, N.D., Le, T.S., Thoai, N. and Hamamoto, K., Learning bag of visual words for motorbike detection, in: 13th International Conference on Control Automation Robotics Vision (ICARCV), 2014, pp. 1045-1050. DOI: 10.1109/ICARCV.2014.7064450</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Thai</surname>
							<given-names>N.D.</given-names>
						</name>
						<name>
							<surname>Le</surname>
							<given-names>T.S.</given-names>
						</name>
						<name>
							<surname>Thoai</surname>
							<given-names>N.</given-names>
						</name>
						<name>
							<surname>Hamamoto</surname>
							<given-names>K</given-names>
						</name>
					</person-group>
					<source>Learning bag of visual words for motorbike detection</source>
					<conf-name>13th International Conference on Control Automation Robotics Vision (ICARCV)</conf-name>
					<year>2014</year>
					<fpage>1045</fpage>
					<lpage>1050</lpage>
					<pub-id pub-id-type="doi">10.1109/ICARCV.2014.7064450</pub-id>
				</element-citation>
			</ref>
			<ref id="B29">
				<label>[29]</label>
				<mixed-citation>[29] Mukhtar, A. and Tang, T.B., Vision based motorcycle detection using HOG features, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2015, pp. 452-456. DOI: 10.1109/ICSIPA.2015.7412234</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Mukhtar</surname>
							<given-names>A.</given-names>
						</name>
						<name>
							<surname>Tang</surname>
							<given-names>T.B</given-names>
						</name>
					</person-group>
					<source>Vision based motorcycle detection using HOG features</source>
					<conf-name>IEEE International Conference on Signal and Image Processing Applications (ICSIPA)</conf-name>
					<year>2015</year>
					<fpage>452</fpage>
					<lpage>456</lpage>
					<pub-id pub-id-type="doi">10.1109/ICSIPA.2015.7412234</pub-id>
				</element-citation>
			</ref>
			<ref id="B30">
				<label>[30]</label>
				<mixed-citation>[30] Dupuis, Y., Subirats, P. and Vasseur, P., Robust image segmentation for overhead real time motorbike counting, in: IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), 2014, pp. 3070-3075. DOI: 10.1109/ITSC.2014.6958183</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Dupuis</surname>
							<given-names>Y.</given-names>
						</name>
						<name>
							<surname>Subirats</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Vasseur</surname>
							<given-names>P</given-names>
						</name>
					</person-group>
					<source>Robust image segmentation for overhead real time motorbike counting</source>
					<conf-name>IEEE 17th International Conference on Intelligent Transportation Systems (ITSC)</conf-name>
					<year>2014</year>
					<fpage>3070</fpage>
					<lpage>3075</lpage>
					<pub-id pub-id-type="doi">10.1109/ITSC.2014.6958183</pub-id>
				</element-citation>
			</ref>
			<ref id="B31">
				<label>[31]</label>
				<mixed-citation>[31] Sutikno, S., Waspada, I., Bahtiar, N. and Sasongko, P.S., Classification of motorcyclists not wear helmet on digital image with backpropagation Neural Network, TELKOMNIKA Telecommun. Comput. Electron. Control, 14(3), pp. 1128-1133, 2016. DOI: 10.12928/telkomnika.v14i3.3486</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Sutikno</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Waspada</surname>
							<given-names>I.</given-names>
						</name>
						<name>
							<surname>Bahtiar</surname>
							<given-names>N.</given-names>
						</name>
						<name>
							<surname>Sasongko</surname>
							<given-names>P.S</given-names>
						</name>
					</person-group>
					<article-title>Classification of motorcyclists not wear helmet on digital image with backpropagation Neural Network</article-title>
					<source>TELKOMNIKA Telecommun. Comput. Electron. Control</source>
					<volume>14</volume>
					<issue>3</issue>
					<fpage>1128</fpage>
					<lpage>1133</lpage>
					<year>2016</year>
					<pub-id pub-id-type="doi">10.12928/telkomnika.v14i3.3486</pub-id>
				</element-citation>
			</ref>
			<ref id="B32">
				<label>[32]</label>
				<mixed-citation>[32] Vishnu, C., Singh, D., Mohan, C.K. and Babu, S., Detection of motorcyclists without helmet in videos using convolutional neural network, in: International Joint Conference on Neural Networks (IJCNN), 2017, pp. 3036-3041. DOI: 10.1109/IJCNN.2017.7966233</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Vishnu</surname>
							<given-names>C.</given-names>
						</name>
						<name>
							<surname>Singh</surname>
							<given-names>D.</given-names>
						</name>
						<name>
							<surname>Mohan</surname>
							<given-names>C.K.</given-names>
						</name>
						<name>
							<surname>Babu</surname>
							<given-names>S</given-names>
						</name>
					</person-group>
					<source>Detection of motorcyclists without helmet in videos using convolutional neural network</source>
					<conf-name>International Joint Conference on Neural Networks (IJCNN)</conf-name>
					<year>2017</year>
					<fpage>3036</fpage>
					<lpage>3041</lpage>
					<pub-id pub-id-type="doi">10.1109/IJCNN.2017.7966233</pub-id>
				</element-citation>
			</ref>
			<ref id="B33">
				<label>[33]</label>
				<mixed-citation>[33] Espinosa, J.E., Velastin, S.A. and Branch, J.W., Vehicle detection using Alex Net and Faster R-CNN deep learning models: a comparative study, in: International Visual Informatics Conference, 2017, pp. 3-15. DOI: 10.1007/978-3-319-70010-6_1</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Espinosa</surname>
							<given-names>J.E.</given-names>
						</name>
						<name>
							<surname>Velastin</surname>
							<given-names>S.A.</given-names>
						</name>
						<name>
							<surname>Branch</surname>
							<given-names>J.W</given-names>
						</name>
					</person-group>
					<source>Vehicle detection using Alex Net and Faster R-CNN deep learning models: a comparative study</source>
					<conf-name>International Visual Informatics Conference</conf-name>
					<year>2017</year>
					<fpage>3</fpage>
					<lpage>15</lpage>
					<pub-id pub-id-type="doi">10.1007/978-3-319-70010-6_1</pub-id>
				</element-citation>
			</ref>
			<ref id="B34">
				<label>[34]</label>
				<mixed-citation>[34] Adu-Gyamfi, Y.O., Asare, S.K., Sharma, A. and Titus, T., Automated vehicle recognition with deep convolutional Neural Networks, Transportation Research Record: Journal of the Transportation Research Board 2645(1), pp. 113-122, 2017. DOI: 10.3141/2645-13</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Adu-Gyamfi</surname>
							<given-names>Y.O.</given-names>
						</name>
						<name>
							<surname>Asare</surname>
							<given-names>S.K.</given-names>
						</name>
						<name>
							<surname>Sharma</surname>
							<given-names>A.</given-names>
						</name>
						<name>
							<surname>Titus</surname>
							<given-names>T</given-names>
						</name>
					</person-group>
					<article-title>Automated vehicle recognition with deep convolutional Neural Networks</article-title>
					<source>Transportation Research Record: Journal of the Transportation Research Board</source>
					<volume>2645</volume>
					<issue>1</issue>
					<fpage>113</fpage>
					<lpage>122</lpage>
					<year>2017</year>
					<pub-id pub-id-type="doi">10.3141/2645-13</pub-id>
				</element-citation>
			</ref>
			<ref id="B35">
				<label>[35]</label>
				<mixed-citation>[35] Huynh, C.K., Le, T.S. and Hamamoto, K., Convolutional neural network for motorbike detection in dense traffic, in: IEEE Sixth International Conference on Communications and Electronics (ICCE), 2016, pp. 369-374. DOI: 10.1109/CCE.2016.7562664</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Huynh</surname>
							<given-names>C.K.</given-names>
						</name>
						<name>
							<surname>Le</surname>
							<given-names>T.S.</given-names>
						</name>
						<name>
							<surname>Hamamoto</surname>
							<given-names>K</given-names>
						</name>
					</person-group>
					<source>Convolutional neural network for motorbike detection in dense traffic</source>
					<conf-name>IEEE Sixth International Conference on Communications and Electronics (ICCE)</conf-name>
					<year>2016</year>
					<fpage>369</fpage>
					<lpage>374</lpage>
					<pub-id pub-id-type="doi">10.1109/CCE.2016.7562664</pub-id>
				</element-citation>
			</ref>
			<ref id="B36">
				<label>[36]</label>
				<mixed-citation>[36] Raj, K.C.D., Chairat, A., Timtong, V., Dailey, M.N. and Ekpanyapong, M., Helmet violation processing using deep learning, in: International Workshop on Advanced Image Technology (IWAIT), 2018, pp. 1-4. DOI: 10.1109/IWAIT.2018.8369734</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Raj</surname>
							<given-names>K.C.D.</given-names>
						</name>
						<name>
							<surname>Chairat</surname>
							<given-names>A.</given-names>
						</name>
						<name>
							<surname>Timtong</surname>
							<given-names>V.</given-names>
						</name>
						<name>
							<surname>Dailey</surname>
							<given-names>M.N.</given-names>
						</name>
						<name>
							<surname>Ekpanyapong</surname>
							<given-names>M</given-names>
						</name>
					</person-group>
					<source>Helmet violation processing using deep learning</source>
					<conf-name>International Workshop on Advanced Image Technology (IWAIT)</conf-name>
					<year>2018</year>
					<fpage>1</fpage>
					<lpage>4</lpage>
					<pub-id pub-id-type="doi">10.1109/IWAIT.2018.8369734</pub-id>
				</element-citation>
			</ref>
			<ref id="B37">
				<label>[37]</label>
				<mixed-citation>[37] Wu, H. and Zhao, J., Automated visual helmet identification based on deep convolutional neural networks, in: Computer Aided Chemical Engineering, 44, Elsevier, 2018, pp. 2299-2304. DOI: 10.1016/B978-0-444-64241-7.50378-5</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Wu</surname>
							<given-names>H.</given-names>
						</name>
						<name>
							<surname>Zhao</surname>
							<given-names>J</given-names>
						</name>
					</person-group>
					<source>Automated visual helmet identification based on deep convolutional neural networks</source>
					<conf-name>Computer Aided Chemical Engineering</conf-name>
					<volume>44</volume>
					<publisher-name>Elsevier</publisher-name>
					<year>2018</year>
					<fpage>2299</fpage>
					<lpage>2304</lpage>
					<pub-id pub-id-type="doi">10.1016/B978-0-444-64241-7.50378-5</pub-id>
				</element-citation>
			</ref>
			<ref id="B38">
				<label>[38]</label>
				<mixed-citation>[38] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. and Fei-Fei, L., ImageNet: a large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009, pp. 248-255. DOI: 10.1109/CVPR.2009.5206848</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Deng</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Dong</surname>
							<given-names>W.</given-names>
						</name>
						<name>
							<surname>Socher</surname>
							<given-names>R.</given-names>
						</name>
						<name>
							<surname>Li</surname>
							<given-names>L.J.</given-names>
						</name>
						<name>
							<surname>Li</surname>
							<given-names>K.</given-names>
						</name>
						<name>
							<surname>Fei-Fei</surname>
							<given-names>L</given-names>
						</name>
					</person-group>
					<source>ImageNet: a large-scale hierarchical image database</source>
					<conf-name>IEEE Conference on Computer Vision and Pattern Recognition</conf-name>
					<year>2009</year>
					<fpage>248</fpage>
					<lpage>255</lpage>
					<pub-id pub-id-type="doi">10.1109/CVPR.2009.5206848</pub-id>
				</element-citation>
			</ref>
			<ref id="B39">
				<label>[39]</label>
				<mixed-citation>[39] Zeiler, M.D. and Fergus, R., Visualizing and understanding convolutional networks, in: European Conference on Computer Vision, 2014, pp. 818-833. DOI: 10.1007/978-3-319-10590-1_53</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Zeiler</surname>
							<given-names>M.D.</given-names>
						</name>
						<name>
							<surname>Fergus</surname>
							<given-names>R</given-names>
						</name>
					</person-group>
					<source>Visualizing and understanding convolutional networks</source>
					<conf-name>European Conference on Computer Vision</conf-name>
					<year>2014</year>
					<fpage>818</fpage>
					<lpage>833</lpage>
					<pub-id pub-id-type="doi">10.1007/978-3-319-10590-1_53</pub-id>
				</element-citation>
			</ref>
			<ref id="B40">
				<label>[40]</label>
				<mixed-citation>[40] Lampert, C.H., Blaschko, M.B. and Hofmann, T., Efficient subwindow search: a branch and bound framework for object localization, IEEE Trans. Pattern Anal. Mach. Intell., 31(12), pp. 2129-2142, 2009. DOI: 10.1109/TPAMI.2009.144</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Lampert</surname>
							<given-names>C.H.</given-names>
						</name>
						<name>
							<surname>Blaschko</surname>
							<given-names>M.B.</given-names>
						</name>
						<name>
							<surname>Hofmann</surname>
							<given-names>T</given-names>
						</name>
					</person-group>
					<article-title>Efficient subwindow search: a branch and bound framework for object localization</article-title>
					<source>IEEE Trans. Pattern Anal. Mach. Intell</source>
					<volume>31</volume>
					<issue>12</issue>
					<fpage>2129</fpage>
					<lpage>2142</lpage>
					<year>2009</year>
					<pub-id pub-id-type="doi">10.1109/TPAMI.2009.144</pub-id>
				</element-citation>
			</ref>
			<ref id="B41">
				<label>[41]</label>
				<mixed-citation>[41] Uijlings, J.R., Van De Sande, K.E., Gevers, T. and Smeulders, A.W., Selective search for object recognition, Int. J. Comput. Vis., 104(2), pp. 154-171, 2013. DOI: 10.1007/s11263-013-0620-5</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Uijlings</surname>
							<given-names>J.R.</given-names>
						</name>
						<name>
							<surname>Van De Sande</surname>
							<given-names>K.E.</given-names>
						</name>
						<name>
							<surname>Gevers</surname>
							<given-names>T.</given-names>
						</name>
						<name>
							<surname>Smeulders</surname>
							<given-names>A.W</given-names>
						</name>
					</person-group>
					<article-title>Selective search for object recognition</article-title>
					<source>Int. J. Comput. Vis</source>
					<volume>104</volume>
					<issue>2</issue>
					<fpage>154</fpage>
					<lpage>171</lpage>
					<year>2013</year>
					<pub-id pub-id-type="doi">10.1007/s11263-013-0620-5</pub-id>
				</element-citation>
			</ref>
			<ref id="B42">
				<label>[42]</label>
				<mixed-citation>[42] He, K., Zhang, X., Ren, S. and Sun, J., Spatial pyramid pooling in deep convolutional Networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell., 37(9), pp. 1904-1916, 2015. DOI: 10.1109/TPAMI.2015.2389824</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>He</surname>
							<given-names>K.</given-names>
						</name>
						<name>
							<surname>Zhang</surname>
							<given-names>X.</given-names>
						</name>
						<name>
							<surname>Ren</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Sun</surname>
							<given-names>J</given-names>
						</name>
					</person-group>
					<article-title>Spatial pyramid pooling in deep convolutional Networks for visual recognition</article-title>
					<source>IEEE Trans. Pattern Anal. Mach. Intell</source>
					<volume>37</volume>
					<issue>9</issue>
					<fpage>1904</fpage>
					<lpage>1916</lpage>
					<year>2015</year>
					<pub-id pub-id-type="doi">10.1109/TPAMI.2015.2389824</pub-id>
				</element-citation>
			</ref>
			<ref id="B43">
				<label>[43]</label>
				<mixed-citation>[43] Zitnick, C.L. and Dollár, P., Edge boxes: locating object proposals from edges, in: European Conference on Computer Vision, 2014, pp. 391-405. DOI: 10.1007/978-3-319-10602-1_26</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Zitnick</surname>
							<given-names>C.L.</given-names>
						</name>
						<name>
							<surname>Dollár</surname>
							<given-names>P</given-names>
						</name>
					</person-group>
					<source>Edge boxes: locating object proposals from edges</source>
					<conf-name>European Conference on Computer Vision</conf-name>
					<year>2014</year>
					<fpage>391</fpage>
					<lpage>405</lpage>
					<pub-id pub-id-type="doi">10.1007/978-3-319-10602-1_26</pub-id>
				</element-citation>
			</ref>
			<ref id="B44">
				<label>[44]</label>
				<mixed-citation>[44] Girshick, R., Donahue, J., Darrell, T. and Malik, J., Rich feature hierarchies for accurate object detection and semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587. DOI: 10.1109/CVPR.2014.81</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Girshick</surname>
							<given-names>R.</given-names>
						</name>
						<name>
							<surname>Donahue</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Darrell</surname>
							<given-names>T.</given-names>
						</name>
						<name>
							<surname>Malik</surname>
							<given-names>J</given-names>
						</name>
					</person-group>
					<source>Rich feature hierarchies for accurate object detection and semantic segmentation</source>
					<conf-name>IEEE Conference on Computer Vision and Pattern Recognition</conf-name>
					<year>2014</year>
					<fpage>580</fpage>
					<lpage>587</lpage>
					<pub-id pub-id-type="doi">10.1109/CVPR.2014.81</pub-id>
				</element-citation>
			</ref>
			<ref id="B45">
				<label>[45]</label>
				<mixed-citation>[45] Girshick, R., Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-1448. DOI: 10.1109/ICCV.2015.169</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Girshick</surname>
							<given-names>R</given-names>
						</name>
					</person-group>
					<source>Fast R-CNN</source>
					<conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>
					<year>2015</year>
					<fpage>1440</fpage>
					<lpage>1448</lpage>
					<pub-id pub-id-type="doi">10.1109/ICCV.2015.169</pub-id>
				</element-citation>
			</ref>
			<ref id="B46">
				<label>[46]</label>
				<mixed-citation>[46] Fan, Q., Brown, L. and Smith, J., A closer look at Faster R-CNN for vehicle detection, in: IEEE Intelligent Vehicles Symposium (IV), 2016, pp. 124-129. DOI: 10.1109/IVS.2016.7535375</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Fan</surname>
							<given-names>Q.</given-names>
						</name>
						<name>
							<surname>Brown</surname>
							<given-names>L.</given-names>
						</name>
						<name>
							<surname>Smith</surname>
							<given-names>J.</given-names>
						</name>
					</person-group>
					<source>A closer look at Faster R-CNN for vehicle detection</source>
					<conf-name>IEEE Intelligent Vehicles Symposium (IV)</conf-name>
					<year>2016</year>
					<fpage>124</fpage>
					<lpage>129</lpage>
					<pub-id pub-id-type="doi">10.1109/IVS.2016.7535375</pub-id>
				</element-citation>
			</ref>
			<ref id="B47">
				<label>[47]</label>
				<mixed-citation>[47] Geiger, A., Lenz, P. and Urtasun, R., Are we ready for autonomous driving? The KITTI vision benchmark suite, in: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2012, pp. 3354-3361. DOI: 10.1109/CVPR.2012.6248074</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Geiger</surname>
							<given-names>A.</given-names>
						</name>
						<name>
							<surname>Lenz</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Urtasun</surname>
							<given-names>R</given-names>
						</name>
					</person-group>
					<source>Are we ready for autonomous driving? The KITTI vision benchmark suite</source>
					<conf-name>2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>
					<year>2012</year>
					<fpage>3354</fpage>
					<lpage>3361</lpage>
					<pub-id pub-id-type="doi">10.1109/CVPR.2012.6248074</pub-id>
				</element-citation>
			</ref>
			<ref id="B48">
				<label>[48]</label>
				<mixed-citation>[48] Espinosa, J.E., Velastin, S.A. and Branch, J.W., Motorcycle detection and classification in urban Scenarios using a model based on Faster R-CNN, in: 9th International Conference on Pattern Recognition Systems (ICPRS 2018), 2018, 6 P. arXiv:1808.02299 [cs]. DOI: 10.1049/cp.2018.1292</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Espinosa</surname>
							<given-names>J.E.</given-names>
						</name>
						<name>
							<surname>Velastin</surname>
							<given-names>S.A.</given-names>
						</name>
						<name>
							<surname>Branch</surname>
							<given-names>J.W</given-names>
						</name>
					</person-group>
					<source>Motorcycle detection and classification in urban Scenarios using a model based on Faster R-CNN</source>
					<conf-name>9th International Conference on Pattern Recognition Systems (ICPRS 2018)</conf-name>
					<year>2018</year>
					<size units="pages">6</size>
					<pub-id pub-id-type="doi">10.1049/cp.2018.1292</pub-id>
				</element-citation>
			</ref>
			<ref id="B49">
				<label>[49]</label>
				<mixed-citation>[49] Huang, J. et al., Speed/accuracy trade-offs for modern convolutional object detectors, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. arXiv:1611.10012 [cs]. DOI: 10.1109/CVPR.2017.351</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Huang</surname>
							<given-names>J.</given-names>
						</name>
						<etal/>
					</person-group>
					<source>Speed/accuracy trade-offs for modern convolutional object detectors</source>
					<conf-name>IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>
					<year>2017</year>
					<pub-id pub-id-type="doi">10.1109/CVPR.2017.351</pub-id>
				</element-citation>
			</ref>
			<ref id="B50">
				<label>[50]</label>
				<mixed-citation>[50] Donahue, J. et al., DeCAF: A deep convolutional activation feature for generic visual recognition, in: ICML, [online]. 2014, pp. 647-655. Available at <ext-link ext-link-type="uri" xlink:href="http://www.jmlr.org/proceedings/papers/v32/donahue14.pdf">http://www.jmlr.org/proceedings/papers/v32/donahue14.pdf</ext-link>
				</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Donahue</surname>
							<given-names>J.</given-names>
						</name>
						<etal/>
					</person-group>
					<source>DeCAF: A deep convolutional activation feature for generic visual recognition</source>
					<conf-name>ICML</conf-name>
					<year>2014</year>
					<fpage>647</fpage>
					<lpage>655</lpage>
					<ext-link ext-link-type="uri" xlink:href="http://www.jmlr.org/proceedings/papers/v32/donahue14.pdf">http://www.jmlr.org/proceedings/papers/v32/donahue14.pdf</ext-link>
				</element-citation>
			</ref>
			<ref id="B51">
				<label>[51]</label>
				<mixed-citation>[51] Romanuke, V.V., Appropriate number of standard 2 X 2 max pooling layers and their allocation in convolutional neural networks for diverse and heterogeneous datasets, Inf. Technol. Manag. Sci., 20(1), pp. 12-19, 2017. DOI: 10.1515/itms-2017-0002</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Romanuke</surname>
							<given-names>V.V</given-names>
						</name>
					</person-group>
					<article-title>Appropriate number of standard 2 X 2 max pooling layers and their allocation in convolutional neural networks for diverse and heterogeneous datasets</article-title>
					<source>Inf. Technol. Manag. Sci</source>
					<volume>20</volume>
					<issue>1</issue>
					<fpage>12</fpage>
					<lpage>19</lpage>
					<year>2017</year>
					<pub-id pub-id-type="doi">10.1515/itms-2017-0002</pub-id>
				</element-citation>
			</ref>
			<ref id="B52">
				<label>[52]</label>
				<mixed-citation>[52] SIMM. Cámaras de CCTV. [Online]. [Accessed: October 31st, 2018]. Available at: <ext-link ext-link-type="uri" xlink:href="https://www.medellin.gov.co/simm/camaras-de-circuito-cerrado">https://www.medellin.gov.co/simm/camaras-de-circuito-cerrado</ext-link>.</mixed-citation>
				<element-citation publication-type="book">
					<person-group person-group-type="author">
						<collab>SIMM</collab>
					</person-group>
					<source>Cámaras de CCTV</source>
					<date-in-citation content-type="access-date" iso-8601-date="2018-10-31">October 31st, 2018</date-in-citation>
					<comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.medellin.gov.co/simm/camaras-de-circuito-cerrado">https://www.medellin.gov.co/simm/camaras-de-circuito-cerrado</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B53">
				<label>[53]</label>
				<mixed-citation>[53] Everingham, M., Van Gool, L., Williams, C.K., Winn, J. and Zisserman, A., The pascal visual object classes (voc) challenge, Int. J. Comput. Vis., 88(2), pp. 303-338, 2010. DOI: 10.1007/s11263-009-0275-4</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Everingham</surname>
							<given-names>M.</given-names>
						</name>
						<name>
							<surname>Van Gool</surname>
							<given-names>L.</given-names>
						</name>
						<name>
							<surname>Williams</surname>
							<given-names>C.K.</given-names>
						</name>
						<name>
							<surname>Winn</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Zisserman</surname>
							<given-names>A</given-names>
						</name>
					</person-group>
					<article-title>The pascal visual object classes (voc) challenge</article-title>
					<source>Int. J. Comput. Vis</source>
					<volume>88</volume>
					<issue>2</issue>
					<fpage>303</fpage>
					<lpage>338</lpage>
					<year>2010</year>
					<pub-id pub-id-type="doi">10.1007/s11263-009-0275-4</pub-id>
				</element-citation>
			</ref>
			<ref id="B54">
				<label>[54]</label>
				<mixed-citation>[54] Redmon, J. and Farhadi, A., YOLOv3: an incremental improvement, Tech. Report, in: Computer Vision and Pattern Recognition (cs.CV), [online]. 2018, 6 P. arXiv:1804.02767 [cs]. Available at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1804.02767">http://arxiv.org/abs/1804.02767</ext-link>
				</mixed-citation>
				<element-citation publication-type="book">
					<person-group person-group-type="author">
						<name>
							<surname>Redmon</surname>
							<given-names>J.</given-names>
						</name>
						<name>
							<surname>Farhadi</surname>
							<given-names>A</given-names>
						</name>
					</person-group>
					<source>YOLOv3: an incremental improvement, Tech. Report</source>
					<publisher-name>Computer Vision and Pattern Recognition (cs.CV)</publisher-name>
					<year>2018</year>
					<ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1804.02767">http://arxiv.org/abs/1804.02767</ext-link>
				</element-citation>
			</ref>
			<ref id="B55">
				<label>[55]</label>
				<mixed-citation>[55] Ng, A., Machine learning yearning, [online]. 2017. Available at: <ext-link ext-link-type="uri" xlink:href="http://www.mlyearning.org">http://www.mlyearning.org</ext-link></mixed-citation>
				<element-citation publication-type="book">
					<person-group person-group-type="author">
						<name>
							<surname>Ng</surname>
							<given-names>A</given-names>
						</name>
					</person-group>
					<source>Machine learning yearning</source>
					<ext-link ext-link-type="uri" xlink:href="http://www.mlyearning.org">http://www.mlyearning.org</ext-link>
					<year>2017</year>
				</element-citation>
			</ref>
			<ref id="B56">
				<label>[56]</label>
				<mixed-citation>[56] Yin, F., Makris, D. and Velastin, S.A., Performance evaluation of object tracking algorithms, in: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Rio De Janeiro, Brazil, [online]. 2007. Available at: <ext-link ext-link-type="uri" xlink:href="https://pdfs.semanticscholar.org/ad76/bdc7d06a7ec496ac788d667c6ad5fcc0fe41.pdf">https://pdfs.semanticscholar.org/ad76/bdc7d06a7ec496ac788d667c6ad5fcc0fe41.pdf</ext-link>
				</mixed-citation>
				<element-citation publication-type="confproc">
					<person-group person-group-type="author">
						<name>
							<surname>Yin</surname>
							<given-names>F.</given-names>
						</name>
						<name>
							<surname>Makris</surname>
							<given-names>D.</given-names>
						</name>
						<name>
							<surname>Velastin</surname>
							<given-names>S.A</given-names>
						</name>
					</person-group>
					<source>Performance evaluation of object tracking algorithms</source>
					<conf-name>IEEE International Workshop on Performance Evaluation of Tracking and Surveillance</conf-name>
					<conf-loc>Rio De Janeiro, Brazil</conf-loc>
					<year>2007</year>
					<ext-link ext-link-type="uri" xlink:href="https://pdfs.semanticscholar.org/ad76/bdc7d06a7ec496ac788d667c6ad5fcc0fe41.pdf">https://pdfs.semanticscholar.org/ad76/bdc7d06a7ec496ac788d667c6ad5fcc0fe41.pdf</ext-link>
				</element-citation>
			</ref>
			<ref id="B57">
				<label>[57]</label>
				<mixed-citation>[57] Espinosa-Oviedo, J.E., Detection and tracking of motorcycles in urban environments by using video sequences with high level of occlusion, PhD Thesis, Universidad Nacional de Colombia, Medellín campus, Medellín, Colombia, 2019.</mixed-citation>
				<element-citation publication-type="thesis">
					<person-group person-group-type="author">
						<name>
							<surname>Espinosa-Oviedo</surname>
							<given-names>J.E</given-names>
						</name>
					</person-group>
					<source>Detection and tracking of motorcycles in urban environments by using video sequences with high level of occlusion</source>
					<comment content-type="degree">PhD Thesis</comment>
					<publisher-name>Universidad Nacional de Colombia</publisher-name>
					<publisher-loc>Medellín, Colombia</publisher-loc>
					<year>2019</year>
				</element-citation>
			</ref>
		</ref-list>
		<fn-group>
			<fn fn-type="other" id="fn1">
				<label>J.E. Espinosa-Oviedo,</label>
				<p> was born in Bogotá D.C., Colombia, in 1973. He received his BSc. in Systems Engineering in 2001 from the Universidad Los Libertadores de Colombia, and his MSc. in Artificial Intelligence in 2003 from the Katholieke Universiteit Leuven, Belgium. He is currently a PhD candidate in Systems Engineering at the Universidad Nacional de Colombia. Since 2010, he has been a full-time professor at the Politécnico Colombiano Jaime Isaza Cadavid, Medellín, Colombia, teaching in areas such as programming and artificial intelligence. Over the last nine years he has published in national and international journals and conferences, mostly on his research themes: optimization, artificial intelligence, applied computer vision and related areas. ORCID: 0000-0002-0494-1276</p>
			</fn>
			<fn fn-type="other" id="fn2">
				<label>S.A. Velastin-Carroza,</label>
				<p> (M'90, SM'12) received the BSc. and MSc. (Research) in Electronics and the PhD. in 1978, 1979 and 1982, respectively, from the University of Manchester, Manchester, U.K., for research on vision systems for pedestrian tracking and road-traffic analysis. He worked in industrial R&amp;D before joining King's College London, University of London (UK), in 1991, and then Kingston University London, where he became director of its Digital Imaging Research Centre and full professor of applied computer vision. In 2013 he became a research professor at the University of Santiago, Chile, and in 2015 he moved to the University Carlos III of Madrid, Spain, where he was a Marie Curie Professor. He has worked on many EU-funded projects and is a Fellow of the IET. ORCID: 0000-0001-6775-1737</p>
			</fn>
			<fn fn-type="other" id="fn3">
				<label>J.W. Branch-Bedoya,</label>
				<p> received the BSc. in Mining Engineering, the MSc. in Systems Engineering and the PhD. in Engineering in 1995, 1997 and 2007, respectively, all of them from the Universidad Nacional de Colombia, Campus Medellín. Currently, he is a full professor in the Department of Computer Science at the Universidad Nacional de Colombia, Campus Medellín. His main research interests encompass computer vision, image processing and their applications in industry, and applications of pattern recognition. ORCID: 0000-0002-0378-028X</p>
			</fn>
			<fn fn-type="other" id="fn4">
				<label>How to cite:</label>
				<p> Espinosa-Oviedo, J.E., Velastín-Carroza, S.A. and Branch-Bedoya, J.W., EspiNet V2: a region based deep learning model for detecting motorcycles in urban scenarios. DYNA, 86(211), pp. 317-326, October - December, 2019.</p>
			</fn>
		</fn-group>
	</back>
</article>