Search | arXiv e-print repository

BaSeNet: A Learning-based Mobile Manipulator Base Pose Sequence Planning for Pickup Tasks

Authors: Lakshadeep Naik, Sinan Kalkan, Sune L. Sørensen, Mikkel B. Kjærgaard, Norbert Krüger

Abstract: In many applications, a mobile manipulator robot is required to grasp a set of objects distributed in space. This may not be feasible from a single base pose and the robot must plan the sequence of base poses for grasping all objects, minimizing the total navigation and grasping time. This is a Combinatorial Optimization problem that can be solved using exact methods, which provide optimal solutions but are computationally expensive, or approximate methods, which offer computationally efficient but sub-optimal solutions. Recent studies have shown that learning-based methods can solve Combinatorial Optimization problems, providing near-optimal and computationally efficient solutions. In this work, we present BASENET - a learning-based approach to plan the sequence of base poses for the robot to grasp all the objects in the scene. We propose a Reinforcement Learning based solution that learns the base poses for grasping individual objects and the sequence in which the objects should be grasped to minimize the total navigation and grasping costs using Layered Learning. As the problem has a varying number of states and actions, we represent states and actions as a graph and use Graph Neural Networks for learning. We show that the proposed method can produce comparable solutions to exact and approximate methods with significantly less computation time.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Submitted to IROS 2024

arXiv:2405.13264 [pdf, other]

Part-based Quantitative Analysis for Heatmaps

Authors: Osman Tursun, Sinan Kalkan, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Heatmaps have been instrumental in helping understand deep network decisions, and are a common approach for Explainable AI (XAI). While significant progress has been made in enhancing the informativeness and accessibility of heatmaps, heatmap analysis is typically very subjective and limited to domain experts. As such, developing automatic, scalable, and numerical analysis methods to make heatmap-based XAI more objective, end-user friendly, and cost-effective is vital. In addition, there is a need for comprehensive evaluation metrics to assess heatmap quality at a granular level.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.09692 [pdf, other]

XoFTR: Cross-modal Feature Matching Transformer

Authors: Önder Tuzcuoğlu, Aybora Köksal, Buğra Sofu, Sinan Kalkan, A. Aydın Alatan

Abstract: We introduce XoFTR, a cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions but present difficulties in matching due to significant texture and intensity differences. Current hand-crafted and learning-based methods for visible-TIR matching fall short in handling viewpoint, scale, and texture diversities. To address this, XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences. Additionally, we introduce a refined matching pipeline that adjusts for scale discrepancies and enhances match reliability through sub-pixel level refinement. To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms existing methods on many benchmarks.

Submitted 15 April, 2024; originally announced April 2024.

Comments: CVPR Image Matching Workshop, 2024. 12 pages, 7 figures, 5 tables. Codes and dataset are available at https://github.com/OnderT/XoFTR

arXiv:2403.01795 [pdf, other]

RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses

Authors: Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

Abstract: Detecting edges in images suffers from the problems of (P1) heavy imbalance between positive and negative classes as well as (P2) label uncertainty owing to disagreement between different annotators. Existing solutions address P1 using class-balanced cross-entropy loss and dice loss and P2 by only predicting edges agreed upon by most annotators. In this paper, we propose RankED, a unified ranking-based approach that addresses both the imbalance problem (P1) and the uncertainty problem (P2). RankED tackles these two problems with two components: One component which ranks positive pixels over negative pixels, and the second which promotes high confidence edge pixels to have more label certainty. We show that RankED outperforms previous studies and sets a new state-of-the-art on NYUD-v2, BSDS500 and Multi-cue datasets. Code is available at https://ranked-cvpr24.github.io.

Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: accepted to CVPR 2024

arXiv:2312.17031 [pdf, other]

Generalized Mask-aware IoU for Anchor Assignment for Real-time Instance Segmentation

Authors: Barış Can Çam, Kemal Öksüz, Fehmi Kahraman, Zeynep Sonat Baltacı, Sinan Kalkan, Emre Akbaş

Abstract: This paper introduces Generalized Mask-aware Intersection-over-Union (GmaIoU) as a new measure for positive-negative assignment of anchor boxes during training of instance segmentation methods. Unlike conventional IoU measure or its variants, which only consider the proximity of anchor and ground-truth boxes; GmaIoU additionally takes into account the segmentation mask. This enables GmaIoU to provide more accurate supervision during training. We demonstrate the effectiveness of GmaIoU by replacing IoU with our GmaIoU in ATSS, a state-of-the-art (SOTA) assigner. Then, we train YOLACT, a real-time instance segmentation method, using our GmaIoU-based ATSS assigner. The resulting YOLACT based on the GmaIoU assigner outperforms (i) ATSS with IoU by $\sim 1.0-1.5$ mask AP, (ii) YOLACT with a fixed IoU threshold assigner by $\sim 1.5-2$ mask AP over different image sizes and (iii) decreases the inference time by $25 \%$ owing to using less anchors. Taking advantage of this efficiency, we further devise GmaYOLACT, a faster and $+7$ mask AP points more accurate detector than YOLACT. Our best model achieves $38.7$ mask AP at $26$ fps on COCO test-dev establishing a new state-of-the-art for real-time instance segmentation.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 28 pages, 4 figures

arXiv:2312.11299 [pdf, other]

Uncertainty-based Fairness Measures

Authors: Selim Kuzucu, Jiaee Cheong, Hatice Gunes, Sinan Kalkan

Abstract: Unfair predictions of machine learning (ML) models impede their broad acceptance in real-world settings. Tackling this arduous challenge first necessitates defining what it means for an ML model to be fair. This has been addressed by the ML community with various measures of fairness that depend on the prediction outcomes of the ML models, either at the group level or the individual level. These fairness measures are limited in that they utilize point predictions, neglecting their variances, or uncertainties, making them susceptible to noise, missingness and shifts in data. In this paper, we first show that an ML model may appear to be fair with existing point-based fairness measures but biased against a demographic group in terms of prediction uncertainties. Then, we introduce new fairness measures based on different types of uncertainties, namely, aleatoric uncertainty and epistemic uncertainty. We demonstrate on many datasets that (i) our uncertainty-based measures are complementary to existing measures of fairness, and (ii) they provide more insights about the underlying issues leading to bias.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.14090 [pdf, other]

Class Uncertainty: A Measure to Mitigate Class Imbalance

Authors: Z. S. Baltaci, K. Oksuz, S. Kuzucu, K. Tezoren, B. K. Konar, A. Ozkan, E. Akbas, S. Kalkan

Abstract: Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples of classes follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches relying on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty" as the average predictive uncertainty of the training examples, and we show that this novel measure captures the differences across classes better than cardinality. We also curate SVCI-20 as a novel dataset in which the classes have equal number of training examples but they differ in terms of their hardness; thereby causing a type of class imbalance which cannot be addressed by the approaches relying on cardinality. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on our SVCI-20. Code and datasets will be made available.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2308.08062 [pdf, other]

doi 10.1051/0004-6361/202346892

A large topographic feature on the surface of the trans-Neptunian object (307261) 2002 MS$_4$ measured from stellar occultations

Authors: F. L. Rommel, F. Braga-Ribas, J. L. Ortiz, B. Sicardy, P. Santos-Sanz, J. Desmars, J. I. B. Camargo, R. Vieira-Martins, M. Assafin, B. E. Morgado, R. C. Boufleur, G. Benedetti-Rossi, A. R. Gomes-Júnior, E. Fernández-Valenzuela, B. J. Holler, D. Souami, R. Duffard, G. Margoti, M. Vara-Lubiano, J. Lecacheux, J. L. Plouvier, N. Morales, A. Maury, J. Fabrega, P. Ceravolo , et al. (179 additional authors not shown)

Abstract: This work aims at constraining the size, shape, and geometric albedo of the dwarf planet candidate 2002 MS4 through the analysis of nine stellar occultation events. Using multichord detection, we also studied the object's topography by analyzing the obtained limb and the residuals between observed chords and the best-fitted ellipse. We predicted and organized the observational campaigns of nine stellar occultations by 2002 MS4 between 2019 and 2022, resulting in two single-chord events, four double-chord detections, and three events with three to up to sixty-one positive chords. Using 13 selected chords from the 8 August 2020 event, we determined the global elliptical limb of 2002 MS4. The best-fitted ellipse, combined with the object's rotational information from the literature, constrains the object's size, shape, and albedo. Additionally, we developed a new method to characterize topography features on the object's limb. The global limb has a semi-major axis of 412 $\pm$ 10 km, a semi-minor axis of 385 $\pm$ 17 km, and the position angle of the minor axis is 121$^\circ$ $\pm$ 16$^\circ$. From this instantaneous limb, we obtained 2002 MS4's geometric albedo and the projected area-equivalent diameter. Significant deviations from the fitted ellipse in the northernmost limb are detected from multiple sites highlighting three distinct topographic features: one 11 km depth depression followed by a 25$^{+4}_{-5}$ km height elevation next to a crater-like depression with an extension of 322 $\pm$ 39 km and 45.1 $\pm$ 1.5 km deep.
Our results present an object that is $\approx$138 km smaller in diameter than derived from thermal data, possibly indicating the presence of a so-far unknown satellite. However, within the error bars, the geometric albedo in the V-band agrees with the results published in the literature, even with the radiometric-derived albedo.

Submitted 23 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Journal ref: A&A 678, A167 (2023)

arXiv:2301.01019 [pdf, other]

Correlation Loss: Enforcing Correlation between Classification and Localization

Authors: Fehmi Kahraman, Kemal Oksuz, Sinan Kalkan, Emre Akbas

Abstract: Object detectors are conventionally trained by a weighted sum of classification and localization losses. Recent studies (e.g., predicting IoU with an auxiliary head, Generalized Focal Loss, Rank & Sort Loss) have shown that forcing these two loss terms to interact with each other in non-conventional ways creates a useful inductive bias and improves performance. Inspired by these works, we focus on the correlation between classification and localization and make two main contributions: (i) We provide an analysis about the effects of correlation between classification and localization tasks in object detectors. We identify why correlation affects the performance of various NMS-based and NMS-free detectors, and we devise measures to evaluate the effect of correlation and use them to analyze common detectors. (ii) Motivated by our observations, e.g., that NMS-free detectors can also benefit from correlation, we propose Correlation Loss, a novel plug-in loss function that improves the performance of various object detectors by directly optimizing correlation coefficients: E.g., Correlation Loss on Sparse R-CNN, an NMS-free method, yields 1.6 AP gain on COCO and 1.8 AP gain on Cityscapes dataset. Our best model on Sparse R-CNN reaches 51.0 AP without test-time augmentation on COCO test-dev, reaching state-of-the-art. Code is available at https://github.com/fehmikahraman/CorrLoss

Submitted 3 January, 2023; originally announced January 2023.

Comments: Accepted to AAAI 2023

arXiv:2209.07268 [pdf, other]

AssembleRL: Learning to Assemble Furniture from Their Point Clouds

Authors: Özgür Aslan, Burak Bolat, Batuhan Bal, Tuğba Tümer, Erol Şahin, Sinan Kalkan

Abstract: The rise of simulation environments has enabled learning-based approaches for assembly planning, which is otherwise a labor-intensive and daunting task. Assembling furniture is especially interesting since furniture are intricate and pose challenges for learning-based approaches. Surprisingly, humans can solve furniture assembly mostly given a 2D snapshot of the assembled product. Although recent years have witnessed promising learning-based approaches for furniture assembly, they assume the availability of correct connection labels for each assembly step, which are expensive to obtain in practice. In this paper, we alleviate this assumption and aim to solve furniture assembly with as little human expertise and supervision as possible. To be specific, we assume the availability of the assembled point cloud, and comparing the point cloud of the current assembly and the point cloud of the target product, obtain a novel reward signal based on two measures: Incorrectness and incompleteness. We show that our novel reward signal can train a deep network to successfully assemble different types of furniture. Code and networks available here: https://github.com/METU-KALFA/AssembleRL

Submitted 15 September, 2022; originally announced September 2022.

Comments: 6 pages, 6 figures, iros2022

arXiv:2209.02482 [pdf, other]

Segment Augmentation and Differentiable Ranking for Logo Retrieval

Authors: Feyza Yavuz, Sinan Kalkan

Abstract: Logo retrieval is a challenging problem since the definition of similarity is more subjective compared to image retrieval tasks and the set of known similarities is very scarce. To tackle this challenge, in this paper, we propose a simple but effective segment-based augmentation strategy to introduce artificially similar logos for training deep networks for logo retrieval. In this novel augmentation strategy, we first find segments in a logo and apply transformations such as rotation, scaling, and color change, on the segments, unlike the conventional image-level augmentation strategies. Moreover, we evaluate whether the recently introduced ranking-based loss function, Smooth-AP, is a better approach for learning similarity for logo retrieval. On the large scale METU Trademark Dataset, we show that (i) our segment-based augmentation strategy improves retrieval performance compared to the baseline model or image-level augmentation strategies, and (ii) Smooth-AP indeed performs better than conventional losses for logo retrieval.

Submitted 13 September, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: ICPR2022, Poster Presentation

arXiv:2205.12882 [pdf, other]

doi 10.1051/0004-6361/202141546

Physical properties of the trans-Neptunian object (38628) Huya from a multi-chord stellar occultation

Authors: P. Santos-Sanz, J. L. Ortiz, B. Sicardy, M. Popescu, G. Benedetti-Rossi, N. Morales, M. Vara-Lubiano, J. I. B. Camargo, C. L. Pereira, F. L. Rommel, M. Assafin, J. Desmars, F. Braga-Ribas, R. Duffard, J. Marques Oliveira, R. Vieira-Martins, E. Fernández-Valenzuela, B. E. Morgado, M. Acar, S. Anghel, E. Atalay, A. Ateş, H. Bakış, V. Bakış, Z. Eker , et al. (63 additional authors not shown)

Abstract: Within our international program to obtain accurate physical properties of trans-Neptunian objects (TNOs) we predicted a stellar occultation by the TNO (38628) Huya of the star Gaia DR2 4352760586390566400 (mG = 11.5 mag.) for March 18, 2019. After an extensive observational campaign, we updated the prediction and it turned out to be favorable to central Europe. Therefore, we mobilized half a hundred professional and amateur astronomers, and the occultation was finally detected from 21 telescopes located at 18 sites. This makes the Huya event one of the best ever observed stellar occultation by a TNO in terms of the number of chords. We determine accurate size, shape, and geometric albedo, and we also provide constraints on the density and other internal properties of this TNO. The 21 positive detections of the occultation by Huya allowed us to obtain well-separated chords which permitted us to fit an ellipse for the limb of the body at the moment of the occultation (i.e., the instantaneous limb) with kilometric accuracy. The projected semi-major and minor axes of the best ellipse fit obtained using the occultation data are (a', b') = (217.6 $\pm$ 3.5 km, 194.1 $\pm$ 6.1 km) with a position angle of the minor axis P' = 55.2 $\pm$ 9.1 degrees. From this fit, the projected area-equivalent diameter is 411.0 $\pm$ 7.3 km. This diameter is compatible with the equivalent diameter for Huya obtained from radiometric techniques (D = 406 $\pm$ 16 km).
From this instantaneous limb, we obtained the geometric albedo for Huya (p$\rm_V$ = 0.079 $\pm$ 0.004) and we explored possible 3D shapes and constraints to the mass density for this TNO. We did not detect the satellite of Huya through this occultation, but the presence of rings or debris around Huya is constrained using the occultation data. We also derived an upper limit for a putative Pluto-like global atmosphere of about p$_{\rm surf}$ = 10 nbar.

Submitted 30 May, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted for publication in Astronomy & Astrophysics (30-April-2022). 19 pages, 7 figures

Journal ref: A&A 664, A130 (2022)
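
The projected area-equivalent diameter quoted in the Huya abstract above can be checked from the fitted semi-axes. The sketch below assumes the usual definition, the diameter of the circle whose area equals that of the ellipse, D = 2·sqrt(a'·b'); the abstract does not state the formula, and the function name is hypothetical.

```python
import math


def area_equivalent_diameter(a_km: float, b_km: float) -> float:
    """Diameter of the circle with the same area as an ellipse.

    Assumes the ellipse area is pi * a * b, so the equivalent circle
    has radius sqrt(a * b). This definition is an assumption, not
    something stated in the abstract.
    """
    return 2.0 * math.sqrt(a_km * b_km)


# Projected semi-axes (a', b') quoted in the abstract, in km.
huya_diameter_km = area_equivalent_diameter(217.6, 194.1)
print(f"{huya_diameter_km:.1f} km")
```

Under that assumption the nominal value agrees with the 411.0 km given in the abstract; the quoted uncertainty is not reproduced here.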

arXiv:2204.06512 [pdf, other]

Does depth estimation help object detection?

Authors: Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

Abstract: Ground-truth depth, when combined with color data, helps improve object detection accuracy over baseline models that only use color. However, estimated depth does not always yield improvements. Many factors affect the performance of object detection when estimated depth is used. In this paper, we comprehensively investigate these factors with detailed experiments, such as using ground-truth vs. estimated depth, effects of different state-of-the-art depth estimation networks, effects of using different indoor and outdoor RGB-D datasets as training data for depth estimation, and different architectural choices for integrating depth to the base object detector network. We propose an early concatenation strategy of depth, which yields higher mAP than previous works' while using significantly fewer parameters.

Submitted 13 April, 2022; originally announced April 2022.

Comments: Accepted to Image and Vision Computing

arXiv:2110.09734 [pdf, other]

Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation

Authors: Kemal Oksuz, Baris Can Cam, Fehmi Kahraman, Zeynep Sonat Baltaci, Sinan Kalkan, Emre Akbas

Abstract: This paper presents Mask-aware Intersection-over-Union (maIoU) for assigning anchor boxes as positives and negatives during training of instance segmentation methods. Unlike conventional IoU or its variants, which only considers the proximity of two boxes; maIoU consistently measures the proximity of an anchor box with not only a ground truth box but also its associated ground truth mask. Thus, additionally considering the mask, which, in fact, represents the shape of the object, maIoU enables a more accurate supervision during training. We present the effectiveness of maIoU on a state-of-the-art (SOTA) assigner, ATSS, by replacing IoU operation by our maIoU and training YOLACT, a SOTA real-time instance segmentation method. Using ATSS with maIoU consistently outperforms (i) ATSS with IoU by $\sim 1$ mask AP, (ii) baseline YOLACT with fixed IoU threshold assigner by $\sim 2$ mask AP over different image sizes and (iii) decreases the inference time by $25 \%$ owing to using less anchors. Then, exploiting this efficiency, we devise maYOLACT, a faster and $+6$ AP more accurate detector than YOLACT. Our best model achieves $37.7$ mask AP at $25$ fps on COCO test-dev establishing a new state-of-the-art for real-time instance segmentation. Code is available at https://github.com/kemaloksuz/Mask-aware-IoU

Submitted 19 October, 2021; originally announced October 2021.

Comments: BMVC 2021, camera ready version
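
For orientation, the conventional box IoU that the maIoU abstract above contrasts against can be sketched as follows. This is only the standard box-to-box measure, not the authors' maIoU (which additionally uses the ground-truth mask); the `(x1, y1, x2, y2)` box format and the function name are assumptions made for illustration.

```python
def box_iou(box_a, box_b):
    """Conventional IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Because this measure depends only on box coordinates, two anchors with the same IoU can cover very different portions of the object itself, which is the gap the abstract says a mask-aware measure addresses.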

arXiv:2107.11669 [pdf, other]

Rank & Sort Loss for Object Detection and Instance Segmentation

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: We propose Rank & Sort (RS) Loss, a ranking-based loss function to train deep object detection and instance segmentation methods (i.e. visual detectors). RS Loss supervises the classifier, a sub-network of these methods, to rank each positive above all negatives as well as to sort positives among themselves with respect to (wrt.) their localisation qualities (e.g. Intersection-over-Union - IoU). To tackle the non-differentiable nature of ranking and sorting, we reformulate the incorporation of error-driven update with backpropagation as Identity Update, which enables us to model our novel sorting error among positives. With RS Loss, we significantly simplify training: (i) Thanks to our sorting objective, the positives are prioritized by the classifier without an additional auxiliary head (e.g. for centerness, IoU, mask-IoU), (ii) due to its ranking-based nature, RS Loss is robust to class imbalance, and thus, no sampling heuristic is required, and (iii) we address the multi-task nature of visual detectors using tuning-free task-balancing coefficients. Using RS Loss, we train seven diverse visual detectors only by tuning the learning rate, and show that it consistently outperforms baselines: e.g. our RS Loss improves (i) Faster R-CNN by ~ 3 box AP and aLRP Loss (ranking-based baseline) by ~ 2 box AP on COCO dataset, (ii) Mask R-CNN with repeat factor sampling (RFS) by 3.5 mask AP (~ 7 AP for rare classes) on LVIS dataset; and also outperforms all counterparts. Code is available at: https://github.com/kemaloksuz/RankSortLoss

Submitted 30 August, 2021; v1 submitted 24 July, 2021; originally announced July 2021.

Comments: ICCV 2021, oral presentation

arXiv:2011.10772 [pdf, other]

One Metric to Measure them All: Localisation Recall Precision (LRP) for Evaluating Visual Detection Tasks

Authors: Kemal Oksuz, Baris Can Cam, Sinan Kalkan, Emre Akbas

Abstract: Despite being widely used as a performance measure for visual detection tasks, Average Precision (AP) is limited in (i) reflecting localisation quality, (ii) interpretability and (iii) robustness to the design choices regarding its computation, and its applicability to outputs without confidence scores. Panoptic Quality (PQ), a measure proposed for evaluating panoptic segmentation (Kirillov et al., 2019), does not suffer from these limitations but is limited to panoptic segmentation. In this paper, we propose Localisation Recall Precision (LRP) Error as the average matching error of a visual detector computed based on both its localisation and classification qualities for a given confidence score threshold. LRP Error, initially proposed only for object detection by Oksuz et al. (2018), does not suffer from the aforementioned limitations and is applicable to all visual detection tasks. We also introduce Optimal LRP (oLRP) Error as the minimum LRP Error obtained over confidence scores to evaluate visual detectors and obtain optimal thresholds for deployment. We provide a detailed comparative analysis of LRP Error with AP and PQ, and use nearly 100 state-of-the-art visual detectors from seven visual detection tasks (i.e. object detection, keypoint detection, instance segmentation, panoptic segmentation, visual relationship detection, zero-shot detection and generalised zero-shot detection) using ten datasets to empirically show that LRP Error provides richer and more discriminative information than its counterparts.
Code available at: https://github.com/kemaloksuz/LRP-Error

Submitted 21 November, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

Comments: Accepted to TPAMI

arXiv:2011.08819 [pdf, other]

Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks

Authors: Nikhil Churamani, Sinan Kalkan, Hatice Gunes

Abstract: Most state-of-the-art approaches for Facial Action Unit (AU) detection rely upon evaluating facial expressions from static frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are usually more subtle and evolve in a temporal manner, requiring AU detection models to learn spatial as well as temporal information. In this paper, we focus on both spatial and spatio-temporal features encoding the temporal evolution of facial AU activation. For this purpose, we propose the Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) that performs AU detection using both frame and sequence-level features. While at the frame-level the capsule layers of AULA-Caps learn spatial feature primitives to determine AU activations, at the sequence-level, it learns temporal dependencies between contiguous frames by focusing on relevant spatio-temporal segments in the sequence. The learnt feature capsules are routed together such that the model learns to selectively focus more on spatial or spatio-temporal information depending upon the AU lifecycle. The proposed model is evaluated on the commonly used BP4D and GFT benchmark datasets, obtaining state-of-the-art results on both datasets.

Submitted 3 March, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

Comments: Updated Figure 6 and the Acknowledgements. Corrected typos. 11 pages, 6 figures, 3 tables

arXiv:2011.06978 [pdf, other]

Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection

Authors: Faisal Alamri, Sinan Kalkan, Nicolas Pugeault

Abstract: Deep neural network approaches have demonstrated high performance in object recognition (CNN) and detection (Faster-RCNN) tasks, but experiments have shown that such architectures are vulnerable to adversarial attacks (FFF, UAP): low amplitude perturbations, barely perceptible by the human eye, can lead to a drastic reduction in labeling performance. This article proposes a new context module, called Transformer-Encoder Detector Module, that can be applied to an object detector to (i) improve the labeling of object instances; and (ii) improve the detector's robustness to adversarial attacks. The proposed model achieves higher mAP, F1 scores and AUC average score of up to 13% compared to the baseline Faster-RCNN detector, and an mAP score 8 points higher on images subjected to FFF or UAP attacks due to the inclusion of both contextual and visual features extracted from the scene and encoded into the model. The result demonstrates that a simple ad-hoc context module can improve the reliability of object detectors significantly.

Submitted 13 November, 2020; originally announced November 2020.

Comments: Accepted for the 25th International Conference on Pattern Recognition (ICPR'2020)

arXiv:2009.13592 [pdf, other]

A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: We propose average Localisation-Recall-Precision (aLRP), a unified, bounded, balanced and ranking-based loss function for both classification and localisation tasks in object detection. aLRP extends the Localisation-Recall-Precision (LRP) performance metric (Oksuz et al., 2018) inspired from how Average Precision (AP) Loss extends precision to a ranking-based loss function for classification (Chen et al., 2020). aLRP has the following distinct advantages: (i) aLRP is the first ranking-based loss function for both classification and localisation tasks. (ii) Thanks to using ranking for both tasks, aLRP naturally enforces high-quality localisation for high-precision classification. (iii) aLRP provides provable balance between positives and negatives. (iv) Compared to on average $\sim$6 hyperparameters in the loss functions of state-of-the-art detectors, aLRP Loss has only one hyperparameter, which we did not tune in practice. On the COCO dataset, aLRP Loss improves its ranking-based predecessor, AP Loss, up to around $5$ AP points, achieves $48.9$ AP without test time augmentation and outperforms all one-stage detectors. Code available at: https://github.com/kemaloksuz/aLRPLoss .

Submitted 7 January, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: NeurIPS 2020 spotlight paper

arXiv:2008.01232 [pdf, other]

Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition

Authors: M. Esat Kalfaoglu, Sinan Kalkan, A. Aydin Alatan

Abstract: In this work, we combine 3D convolution with late temporal modeling for action recognition. For this aim, we replace the conventional Temporal Global Average Pooling (TGAP) layer at the end of 3D convolutional architecture with the Bidirectional Encoder Representations from Transformers (BERT) layer in order to better utilize the temporal information with BERT's attention mechanism. We show that this replacement improves the performance of many popular 3D convolution architectures for action recognition, including ResNeXt, I3D, SlowFast and R(2+1)D. Moreover, we provide state-of-the-art results on both HMDB51 and UCF101 datasets with 85.10% and 98.69% top-1 accuracy, respectively. The code is publicly available.

Submitted 17 September, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: Presented at the 2nd Workshop on Video Turing Test: Toward Human-Level Video Story Understanding, ECCV 2020

arXiv:2007.12506 [pdf, other]

Mind Your Manners! A Dataset and A Continual Learning Approach for Assessing Social Appropriateness of Robot Actions

Authors: Jonas Tjomsland, Sinan Kalkan, Hatice Gunes

Abstract: To date, endowing robots with an ability to assess social appropriateness of their actions has not been possible. This has been mainly due to (i) the lack of relevant and labelled data, and (ii) the lack of formulations of this as a lifelong learning problem. In this paper, we address these two issues. We first introduce the Socially Appropriate Domestic Robot Actions dataset (MANNERS-DB), which contains appropriateness labels of robot actions annotated by humans. To be able to control but vary the configurations of the scenes and the social settings, MANNERS-DB has been created utilising a simulation environment by uniformly sampling relevant contextual attributes. Secondly, we train and evaluate a baseline Bayesian Neural Network (BNN) that estimates social appropriateness of actions in the MANNERS-DB. Finally, we formulate learning social appropriateness of actions as a continual learning problem using the uncertainty of the BNN parameters. The experimental results show that the social appropriateness of robot actions can be predicted with a satisfactory level of precision. Our work takes robots one step closer to a human-like understanding of (social) appropriateness of actions, with respect to the social context they operate in. To facilitate reproducibility and further progress in this area, the MANNERS-DB, the trained models and the relevant code will be made publicly available.

Submitted 24 July, 2020; originally announced July 2020.

Comments: Human-Robot Interaction; Social Robotics; Social Appropriateness; Continual Learning. Submitted to the RO-MAN 2020 Workshop on Lifelong Learning for Long-term Human-Robot Interaction (LL4LHRI)

arXiv:2007.10075 [pdf, other]

Investigating Bias and Fairness in Facial Expression Recognition

Authors: Tian Xu, Jennifer White, Sinan Kalkan, Hatice Gunes

Abstract: Recognition of expressions of emotions and affect from facial images is a well-studied research problem in the fields of affective computing and computer vision with a large number of datasets available containing facial images and corresponding expression labels. However, virtually none of these datasets have been acquired with consideration of fair distribution across the human population. Therefore, in this work, we undertake a systematic investigation of bias and fairness in facial expression recognition by comparing three different approaches, namely a baseline, an attribute-aware and a disentangled approach, on two well-known datasets, RAF-DB and CelebA. Our results indicate that: (i) data augmentation improves the accuracy of the baseline model, but this alone is unable to mitigate the bias effect; (ii) both the attribute-aware and the disentangled approaches fortified with data augmentation perform better than the baseline approach in terms of accuracy and fairness; (iii) the disentangled approach is the best for mitigating demographic bias; and (iv) the bias mitigation strategies are more suitable in the existence of uneven attribute distribution or imbalanced number of subgroup data.

Submitted 21 August, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

arXiv:2006.03113 [pdf]

doi 10.1038/nature24051

The size, shape, density and ring of the dwarf planet Haumea from a stellar occultation

Authors: J. L. Ortiz, P. Santos-Sanz, B. Sicardy, G. Benedetti-Rossi, D. Bérard, N. Morales, R. Duffard, F. Braga-Ribas, U. Hopp, C. Ries, V. Nascimbeni, F. Marzari, V. Granata, A. Pál, C. Kiss, T. Pribulla, R. Komžík, K. Hornoch, P. Pravec, P. Bacci, M. Maestripieri, L. Nerli, L. Mazzei, M. Bachini, F. Martinelli, et al. (68 additional authors not shown)

Abstract: Among the four known transneptunian dwarf planets, Haumea is an exotic, very elongated, and fast rotating body. In contrast to the other dwarf planets, its size, shape, albedo, and density are not well constrained. Here we report results of a multi-chord stellar occultation, observed on 2017 January 21. Secondary events observed around the main body are consistent with the presence of a ring of opacity 0.5, width 70 km, and radius 2,287$_{-45}^{+75}$ km. The Centaur Chariklo was the first body other than a giant planet to show a ring system and the Centaur Chiron was later found to possess something similar to Chariklo's rings. Haumea is the first body outside the Centaur population with a ring. The ring is coplanar with both Haumea's equator and the orbit of its satellite Hi'iaka. Its radius places it close to the 3:1 mean motion resonance with Haumea's spin period. The occultation by the main body provides an instantaneous elliptical limb with axes 1,704 $\pm$ 4 km x 1,138 $\pm$ 26 km. Combined with rotational light-curves, it constrains Haumea's 3D orientation and its triaxial shape, which is inconsistent with a homogeneous body in hydrostatic equilibrium. Haumea's largest axis is at least 2,322 $\pm$ 60 km, larger than thought before. This implies an upper limit of 1,885 $\pm$ 80 kg m$^{-3}$ for Haumea's density, smaller and less puzzling than previous estimations, and a geometric albedo of 0.51 $\pm$ 0.02, also smaller than previous estimations.
No global N$_2$ or CH$_4$ atmosphere with pressures larger than 15 and 50 nbar (3-$\sigma$ limits), respectively, is detected.

Submitted 4 June, 2020; originally announced June 2020.

Journal ref: Nature, Volume 550, Issue 7675, pp. 219-223 (2017)

arXiv:1910.11713 [pdf, other]

ALET (Automated Labeling of Equipment and Tools): A Dataset, a Baseline and a Usecase for Tool Detection in the Wild

Authors: Fatih Can Kurnaz, Burak Hocaoğlu, Mert Kaan Yılmaz, İdil Sülo, Sinan Kalkan

Abstract: Robots collaborating with humans in realistic environments will need to be able to detect the tools that can be used and manipulated. However, there is no available dataset or study that addresses this challenge in real settings. In this paper, we fill this gap by providing an extensive dataset (METU-ALET) for detecting farming, gardening, office, stonemasonry, vehicle, woodworking and workshop tools. The scenes correspond to sophisticated environments with or without humans using the tools. The scenes we consider introduce several challenges for object detection, including the small scale of the tools, their articulated nature, occlusion, inter-class invariance, etc. Moreover, we train and compare several state-of-the-art deep object detectors (including Faster R-CNN, Cascade R-CNN, RepPoint and RetinaNet) on our dataset. We observe that the detectors have difficulty in detecting especially small-scale tools or tools that are visually similar to parts of other tools. This in turn supports the importance of our dataset and paper. With the dataset, the code and the trained models, our work provides a basis for further research into tools and their use in robotics applications.

Submitted 13 December, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

Comments: 7 pages, 4 figures

arXiv:1909.09777 [pdf, other]

Generating Positive Bounding Boxes for Balanced Training of Object Detectors

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: Two-stage deep object detectors generate a set of regions-of-interest (RoI) in the first stage, then, in the second stage, identify objects among the proposed RoIs that sufficiently overlap with a ground truth (GT) box. The second stage is known to suffer from a bias towards RoIs that have low intersection-over-union (IoU) with the associated GT boxes. To address this issue, we first propose a sampling method to generate bounding boxes (BB) that overlap with a given reference box more than a given IoU threshold. Then, we use this BB generation method to develop a positive RoI (pRoI) generator that produces RoIs following any desired spatial or IoU distribution, for the second stage. We show that our pRoI generator is able to simulate other sampling methods for positive examples such as hard example mining and prime sampling. Using our generator as an analysis tool, we show that (i) IoU imbalance has an adverse effect on performance, (ii) hard positive example mining improves the performance only for certain input IoU distributions, and (iii) the imbalance among the foreground classes has an adverse effect on performance and that it can be alleviated at the batch level. Finally, we train Faster R-CNN using our pRoI generator and, compared to conventional training, obtain better or on-par performance for low IoUs and significant improvements when trained for higher IoUs for Pascal VOC and MS COCO datasets. The code is available at: https://github.com/kemaloksuz/BoundingBoxGenerator.

Submitted 19 June, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

Comments: To appear in WACV 20

arXiv:1909.00169 [pdf, other]

Imbalance Problems in Object Detection: A Review

Authors: Kemal Oksuz, Baris Can Cam, Sinan Kalkan, Emre Akbas

Abstract: In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: https://github.com/kemaloksuz/ObjectDetectionImbalance .

Submitted 11 March, 2020; v1 submitted 31 August, 2019; originally announced September 2019.

Comments: Accepted to IEEE TPAMI; currently in press

arXiv:1908.01189 [pdf, other]

Searching for Ambiguous Objects in Videos using Relational Referring Expressions

Authors: Hazan Anayurt, Sezai Artun Ozyegin, Ulfet Cetin, Utku Aktas, Sinan Kalkan

Abstract: Humans frequently use referring (identifying) expressions to refer to objects. Especially in ambiguous settings, humans prefer expressions (called relational referring expressions) that describe an object with respect to a distinguishing, unique object. Unlike studies on video object search using referring expressions, in this paper, our focus is on (i) relational referring expressions in highly ambiguous settings, and (ii) methods that can both generate and comprehend a referring expression. For this goal, we first introduce a new dataset for video object search with referring expressions that includes numerous copies of the objects, making it difficult to use non-relational expressions. Moreover, we train two baseline deep networks on this dataset, which show promising results. Finally, we propose a deep attention network that significantly outperforms the baselines on our dataset. The dataset and the codes are available at https://github.com/hazananayurt/viref.

Submitted 20 August, 2019; v1 submitted 3 August, 2019; originally announced August 2019.

Comments: BMVC 2019 camera ready

arXiv:1904.07165 [pdf, other]

doi 10.1109/IROS40897.2019.8968510

Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments

Authors: Fethiye Irmak Doğan, Sinan Kalkan, Iolanda Leite

Abstract: Referring to objects in a natural and unambiguous manner is crucial for effective human-robot interaction. Previous research on learning-based referring expressions has focused primarily on comprehension tasks, while generating referring expressions is still mostly limited to rule-based methods. In this work, we propose a two-stage approach that relies on deep learning for estimating spatial relations to describe an object naturally and unambiguously with a referring expression. We compare our method to the state-of-the-art algorithm in ambiguous environments (e.g., environments that include very similar objects with similar relationships). We show that our method generates referring expressions that people find to be more accurate ($\sim$30% better) and would prefer to use ($\sim$32% more often).

Submitted 5 August, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: International Conference on Intelligent Robots and Systems (IROS 2019), Demo 1: Finding the described object (https://youtu.be/BE6-F6chW0w), Demo 2: Referring to the pointed object (https://youtu.be/nmmv6JUpy8M), Supplementary Video (https://youtu.be/sFjBa_MHS98)

Journal ref: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (2019) 4992-4999

arXiv:1807.01696 [pdf, other]

Localization Recall Precision (LRP): A New Performance Metric for Object Detection

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of directly measuring bounding box localization accuracy. In this paper, we propose 'Localization Recall Precision (LRP) Error', a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the 'Optimal LRP', the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the 'best' confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector which uses a SOTA still image object detector and show that the class-specific optimized thresholds increase the accuracy against the common approach of using a general threshold for all classes.
At https://github.com/cancam/LRP we provide the source code that can compute LRP for the PASCAL VOC and MSCOCO datasets. Our source code can easily be adapted to other datasets as well.

Submitted 5 July, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

Comments: to appear in ECCV 2018
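
Note: the abstract above describes LRP Error as combining a localization term with false-positive and false-negative terms at a given confidence score threshold. The following is a minimal Python sketch of that idea, not the authors' implementation (their code is at the repository linked in the abstract). It assumes the commonly cited form of the metric, in which each true positive contributes (1 - IoU) / (1 - tau) and each false positive or false negative contributes 1, normalised by the total number of true positives, false positives and false negatives. The function name `lrp_error`, its parameters, and the default tau = 0.5 are assumptions chosen for illustration.

```python
from typing import Sequence


def lrp_error(tp_ious: Sequence[float], n_fp: int, n_fn: int, tau: float = 0.5) -> float:
    """Illustrative LRP Error for one class at one confidence threshold.

    tp_ious: IoU of each true-positive detection with its matched ground truth
             (assumed to lie in [tau, 1]).
    n_fp:    number of false-positive detections.
    n_fn:    number of missed ground-truth objects.
    tau:     IoU threshold used to validate true positives (assumed default 0.5).
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must be in [0, 1)")
    if any(iou < tau or iou > 1.0 for iou in tp_ious):
        raise ValueError("every true-positive IoU must lie in [tau, 1]")

    total = len(tp_ious) + n_fp + n_fn
    if total == 0:
        raise ValueError("LRP Error is undefined with no detections and no ground truths")

    # Localization term: each true positive is penalised by its normalised IoU shortfall.
    localization = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)

    # Each false positive and each false negative counts as a full error of 1.
    return (localization + n_fp + n_fn) / total


if __name__ == "__main__":
    # Two true positives (IoU 0.75 and 1.0), one false positive, one false negative.
    print(lrp_error([0.75, 1.0], n_fp=1, n_fn=1))
```

Under these assumptions the value lies in [0, 1], with 0 meaning every object was detected with perfect overlap. The Optimal LRP mentioned in the abstract would then be the minimum of this quantity over the candidate confidence score thresholds for a class.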

arXiv:1807.00511 [pdf, other]

COSMO: Contextualized Scene Modeling with Boltzmann Machines

Authors: Ilker Bozcan, Sinan Kalkan

Abstract: Scene modeling is crucial for robots that need to perceive, reason about and manipulate the objects in their environments. In this paper, we adapt and extend Boltzmann Machines (BMs) for contextualized scene modeling. Although there are many models on the subject, ours is the first to bring together objects, relations, and affordances in a highly-capable generative model. To this end, we introduce a hybrid version of BMs where relations and affordances are introduced with shared, tri-way connections into the model. Moreover, we contribute a dataset for relation estimation and modeling studies. We evaluate our method in comparison with several baselines on object estimation, out-of-context object detection, relation estimation, and affordance estimation tasks. Moreover, to illustrate the generative capability of the model, we show several example scenes that the model is able to generate.

Submitted 19 December, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

Comments: 40 pages, 15 figures, 9 tables, accepted to the Robotics and Autonomous Systems (RAS) special issue on Semantic Policy and Action Representations for Autonomous Robots (SPAR)

arXiv:1710.05664 [pdf, other]

What is (missing or wrong) in the scene? A Hybrid Deep Boltzmann Machine For Contextualized Scene Modeling

Authors: İlker Bozcan, Yağmur Oymak, İdil Zeynep Alemdar, Sinan Kalkan

Abstract: Scene models allow robots to reason about what is in the scene, what else should be in it, and what should not be in it. In this paper, we propose a hybrid Boltzmann Machine (BM) for scene modeling where relations between objects are integrated. To be able to do that, we extend BM to include tri-way edges between visible (object) nodes and make the network share the relations across different objects. We evaluate our method against several baseline models (Deep Boltzmann Machines, and Restricted Boltzmann Machines) on a scene classification dataset, and show that it performs better in several scene reasoning tasks.

Submitted 20 August, 2018; v1 submitted 16 October, 2017; originally announced October 2017.

Comments: 6 pages, 7 figures, submitted to ICRA 2018

arXiv:1710.04981 [pdf, other]

CINet: A Learning Based Approach to Incremental Context Modeling in Robots

Authors: Fethiye Irmak Doğan, İlker Bozcan, Mehmet Çelik, Sinan Kalkan

Abstract: There have been several attempts at modeling context in robots. However, these attempts either assume a fixed number of contexts or use a rule-based approach to determine when to increment the number of contexts. In this paper, we pose the task of when to increment as a learning problem, which we solve using a Recurrent Neural Network. We show that the network successfully (with 98% testing accuracy) learns to predict when to increment, and demonstrate, in a scene modeling problem (where the correct number of contexts is not known), that the robot increments the number of contexts in an expected manner (i.e., the entropy of the system is reduced). We also present how the incremental model can be used for various scene reasoning tasks.

Submitted 29 July, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

Comments: The first two authors have contributed equally, 6 pages, 8 figures, International Conference on Intelligent Robots (IROS 2018)

arXiv:1710.04975 [pdf, other]

A Deep Incremental Boltzmann Machine for Modeling Context in Robots

Authors: Fethiye Irmak Doğan, Hande Çelikkanat, Sinan Kalkan

Abstract: Context is an essential capability for robots that are to be as adaptive as possible in challenging environments. Although there are many context modeling efforts, they assume a fixed structure and number of contexts. In this paper, we propose an incremental deep model that extends Restricted Boltzmann Machines. Our model gets one scene at a time, and gradually extends the contextual model when necessary, either by adding a new context or a new context layer to form a hierarchy. We show on a scene classification benchmark that our method converges to a good estimate of the contexts of the scenes, and performs better or on-par on several tasks compared to other incremental models or non-incremental models.

Submitted 2 March, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

Comments: 6 pages, 5 figures, International Conference on Robotics and Automation (ICRA 2018)

arXiv:1706.05726 [pdf, other]

Using Deep Networks for Drone Detection

Authors: Cemal Aker, Sinan Kalkan

Abstract: Drone detection is the problem of finding the smallest rectangle that encloses the drone(s) in a video sequence. In this study, we propose a solution using an end-to-end object detection model based on convolutional neural networks. To solve the scarce data problem for training the network, we propose an algorithm for creating an extensive artificial dataset by combining background-subtracted real images. With this approach, we can achieve high precision and high recall at the same time.

Submitted 18 June, 2017; originally announced June 2017.

Comments: To appear in International Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques organised within AVSS 2017

arXiv:1701.05766 [pdf, other]

A Large-scale Dataset and Benchmark for Similar Trademark Retrieval

Authors: Osman Tursun, Cemal Aker, Sinan Kalkan

Abstract: Trademark retrieval (TR) has become an important yet challenging problem due to an ever increasing trend in trademark applications and infringement incidents. There have been many promising attempts at the TR problem, which, however, proved impractical since they were evaluated with limited and mostly trivial datasets. In this paper, we provide a large-scale dataset with benchmark queries with which different TR approaches can be evaluated systematically. Moreover, we provide a baseline on this benchmark using the widely-used methods applied to TR in the literature. Furthermore, we identify and correct two important issues in TR approaches that were not addressed before: reversal of contrast and the presence of irrelevant text in trademarks, both of which severely affect TR methods. Lastly, we apply deep learning, namely several popular Convolutional Neural Network models, to the TR problem. To the best of the authors' knowledge, this is the first attempt to do so.

Submitted 14 October, 2017; v1 submitted 20 January, 2017; originally announced January 2017.

arXiv:1311.6381 [pdf, other]

The local nature of incompressibility at quantised Hall effect modified by interactions

Authors: E. M. Kendirlik, S. Sirt, S. B. Kalkan, N. Ofek, V. Umansky, A. Siddiki

Abstract: Since the experimental realisation of the integer quantised Hall effect in a two dimensional electron system subject to strong perpendicular magnetic fields in 1980, a central question has been the interrelation between the conductance quantisation and the topological properties of the system. It is conjectured that if the electron system is described by a Bloch Hamiltonian, then the system is insulating in the bulk of the sample throughout the quantised Hall plateau due to the magnetic field induced energy gap. Meanwhile, the system is conducting at the edges, resembling a 2+1 dimensional topological insulator without time-reversal symmetry. However, the validity of this conjecture remains unclear for finite size, non-periodically bounded real Hall bar devices. Here we show experimentally that the close relationship proposed between the quantised Hall effect and the topological bulk insulator is prone to break for specific magnetic field intervals within the plateau, as evidenced by our magneto-transport measurements performed on GaAs/AlGaAs high purity Hall bars with two inner contacts embedded in the bulk. Our data show similar behaviour for fractional states as well, in particular for 2/3, 3/5 and 4/3.

Submitted 25 November, 2013; originally announced November 2013.

Comments: Main text is 6 pages (3 figures) and Supp. Mat. is 11 pages (8 figures)

arXiv:1305.4585 [pdf, other]

doi 10.1140/epjb/e2014-40510-2

The Dip Effect under Integer Quantized Hall Conditions

Authors: S. Erden Gulebaglan, S. B. Kalkan, S. Sirt, E. M. Kendirlik, A. Siddiki

Abstract: In this work we investigate an unusual transport phenomenon observed in a two-dimensional electron gas under integer quantum Hall effect conditions. Our calculations are based on the screening theory, using a semi-analytical model. The transport anomalies are the dip and overshoot effects, where the Hall resistance decreases (or increases) unexpectedly at the quantized resistance plateau intervals. We report on our numerical findings of the dip effect in the Hall resistance, considering GaAs/AlGaAs heterostructures in which we investigated the effect under different experimental conditions. We show that, similar to overshoot, the amplitude of the dip effect is strongly influenced by the edge reconstruction due to electrostatics. It is observed that the steep potential variation close to the physical boundaries of the sample results in narrower incompressible strips; hence, the experimental observation of the dip effect is limited by the properties of these current carrying strips. By performing standard Hall resistance measurements on gate defined narrow samples, we demonstrate that the predictions of the screening theory are in good agreement with our experimental findings.

Submitted 20 May, 2013; originally announced May 2013.

Comments: 6 Pages, 7 figures

arXiv:1305.1156 [pdf, other]

Anomalous resistance overshoot in the integer quantum Hall effect

Authors: E. M. Kendirlik, S. Sirt, S. B. Kalkan, W. Dietsche, W. Wegscheider, S. Ludwig, A. Siddiki

Abstract: In this work we report experiments on narrow Hall bars defined by shallow etching. The magneto-transport properties of intermediate mobility two-dimensional electron systems are investigated and analyzed within the screening theory of the integer quantized Hall effect. We observe a non-monotonic increase of Hall resistance at the low magnetic field ends of the quantized plateaus, known as the overshoot effect. Unexpectedly, for Hall bars that are defined by shallow chemical etching the overshoot effect becomes more pronounced at elevated temperatures. We observe the overshoot effect at odd and even integer plateaus, which favors a spin independent explanation, in contrast to the discussion in the literature. In a second set of experiments, we investigate the overshoot effect in a gate defined Hall bar and explicitly show that the amplitude of the overshoot effect can be directly controlled by gate voltages. We offer a comprehensive explanation based on scattering between evanescent incompressible channels.

Submitted 6 May, 2013; originally announced May 2013.

Comments: 7 pages and 5 figures

Showing 1–38 of 38 results for author: Kalkan, S