Search | arXiv e-print repository

DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks

Authors: Bishal Thapaliya, Robyn Miller, Jiayu Chen, Yu-Ping Wang, Esra Akbas, Ram Sapkota, Bhaskar Ray, Pranav Suresh, Santosh Ghimire, Vince Calhoun, Jingyu Liu

Abstract: Resting-state functional magnetic resonance imaging (rs-fMRI) is a noninvasive technique pivotal for understanding human neural mechanisms of intricate cognitive processes. Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest, or dynamic functional connectivity matrices with a sliding window approach. These approaches are at risk of oversimplifying brain dynamics and lack proper consideration of the goal at hand. While deep learning has gained substantial popularity for modeling complex relational data, its application to uncovering the spatiotemporal dynamics of the brain is still limited. We propose a novel interpretable deep learning framework that learns a goal-specific functional connectivity matrix directly from time series and employs a specialized graph neural network for the final classification. Our model, DSAM, leverages temporal causal convolutional networks to capture the temporal dynamics in both low- and high-level feature representations, a temporal attention unit to identify important time points, a self-attention unit to construct the goal-specific connectivity matrix, and a novel variant of graph neural network to capture the spatial dynamics for downstream classification. To validate our approach, we conducted experiments on the Human Connectome Project dataset with 1075 samples to build and interpret the model for the classification of sex group, and the Adolescent Brain Cognitive Development Dataset with 8520 samples for independent testing. In comparisons with other state-of-the-art models, the results suggest that this novel approach goes beyond the assumption of a fixed connectivity matrix and provides evidence of goal-specific brain connectivity patterns, which opens up the potential to gain deeper insights into how the human brain adapts its functional connectivity to the task at hand.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 18 Pages, 4 figures
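
The two conventional baselines that the DSAM abstract contrasts itself with, a single static functional connectivity matrix and sliding-window dynamic connectivity, can be sketched as follows. This is a minimal illustration using Pearson correlation; the function names, window length, stride, and random input are assumptions made for the example, and the code does not reproduce DSAM itself.

```python
import numpy as np


def static_fc(ts):
    """Static connectivity: one Pearson correlation matrix over the whole scan.

    ts is a (T, R) array holding T time points for R regions of interest.
    """
    return np.corrcoef(ts, rowvar=False)


def sliding_window_fc(ts, window, stride):
    """Dynamic connectivity: one correlation matrix per window of time points.

    Returns an array of shape (W, R, R), where W is the number of windows.
    """
    n_timepoints = ts.shape[0]
    starts = range(0, n_timepoints - window + 1, stride)
    return np.stack(
        [np.corrcoef(ts[s:s + window], rowvar=False) for s in starts]
    )


# Hypothetical input: 120 time points, 5 regions, random values.
rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 5))

fc = static_fc(ts)
dfc = sliding_window_fc(ts, window=30, stride=10)
print(fc.shape, dfc.shape)
```

Both functions fix the connectivity estimate in advance (correlation over a chosen window), which is the assumption the abstract says a learned, goal-specific matrix is meant to relax.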

arXiv:2404.15612

DyGCL: Dynamic Graph Contrastive Learning For Event Prediction

Authors: Muhammed Ifte Khairul Islam, Khaled Mohammed Saifuddin, Tanvir Hossain, Esra Akbas

Abstract: Predicting events such as political protests, flu epidemics, and criminal activities is crucial to proactively taking necessary measures and implementing required responses to address emerging challenges. Capturing contextual information from textual data for event forecasting poses significant challenges due to the intricate structure of the documents and the evolving nature of events. Recently, dynamic Graph Neural Networks (GNNs) have been introduced to capture the dynamic patterns of input text graphs. However, these models only utilize node-level representation, causing the loss of the global information from graph-level representation. Yet both node-level and graph-level representations are essential for effective event prediction, as node-level representation gives insight into the local structure, and the graph-level representation provides an understanding of the global structure of the temporal graph. To address these challenges, in this paper, we propose a Dynamic Graph Contrastive Learning (DyGCL) method for event prediction. Our model DyGCL employs a local view encoder to learn the evolving node representations, which effectively captures the local dynamic structure of input graphs. Additionally, it harnesses a global view encoder to perceive the hierarchical dynamic graph representation of the input graphs. Then we update the graph representations from both encoders using contrastive learning. In the final stage, DyGCL combines both representations using an attention mechanism and optimizes its capability to predict future events. Our extensive experiments demonstrate that our proposed method outperforms the baseline methods for event prediction on six real-world datasets.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.07094

MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints

Authors: Bedirhan Uguz, Ozhan Suat, Batuhan Karagoz, Emre Akbas

Abstract: This paper presents Key2Mesh, a model that takes a set of 2D human pose keypoints as input and estimates the corresponding body mesh. Since this process does not involve any visual (i.e. RGB image) data, the model can be trained on large-scale motion capture (MoCap) datasets, thereby overcoming the scarcity of image datasets with 3D labels. To enable the model's application on RGB images, we first run an off-the-shelf 2D pose estimator to obtain the 2D keypoints, and then feed these 2D keypoints to Key2Mesh. To improve the performance of our model on RGB images, we apply an adversarial domain adaptation (DA) method to bridge the gap between the MoCap and visual domains. Crucially, our DA method does not require 3D labels for visual data, which enables adaptation to target sets without the need for costly labels. We evaluate Key2Mesh for the task of estimating 3D human meshes from 2D keypoints, in the absence of RGB and mesh label pairs. Our results on widely used H3.6M and 3DPW datasets show that Key2Mesh sets the new state-of-the-art by outperforming other models in PA-MPJPE for both datasets, and in MPJPE and PVE for the 3DPW dataset. Thanks to our model's simple architecture, it operates at least 12x faster than the prior state-of-the-art model, LGD. Additional qualitative samples and code are available on the project website: https://key2mesh.github.io/.

Submitted 10 April, 2024; originally announced April 2024.

Comments: accepted to CVPRW 2024

arXiv:2403.01795

RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses

Authors: Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

Abstract: Detecting edges in images suffers from the problems of (P1) heavy imbalance between positive and negative classes as well as (P2) label uncertainty owing to disagreement between different annotators. Existing solutions address P1 using class-balanced cross-entropy loss and dice loss and P2 by only predicting edges agreed upon by most annotators. In this paper, we propose RankED, a unified ranking-based approach that addresses both the imbalance problem (P1) and the uncertainty problem (P2). RankED tackles these two problems with two components: One component which ranks positive pixels over negative pixels, and the second which promotes high confidence edge pixels to have more label certainty. We show that RankED outperforms previous studies and sets a new state-of-the-art on NYUD-v2, BSDS500 and Multi-cue datasets. Code is available at https://ranked-cvpr24.github.io.

Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: accepted to CVPR 2024

arXiv:2402.16285

A Comparison of Deep Learning Models for Proton Background Rejection with the AMS Electromagnetic Calorimeter

Authors: Raheem Karim Hashmani, Emre Akbaş, Melahat Bilge Demirköz

Abstract: The Alpha Magnetic Spectrometer (AMS) is a high-precision particle detector onboard the International Space Station containing six different subdetectors. The Transition Radiation Detector and Electromagnetic Calorimeter (ECAL) are used to separate electrons/positrons from the abundant cosmic-ray proton background. The positron flux measured in space by AMS falls with a power law which unexpectedly softens above 25 GeV and then hardens above 280 GeV. Several theoretical models try to explain these phenomena, and a purer measurement of positrons at higher energies is needed to help test them. The currently used methods to reject the proton background at high energies involve extrapolating shower features from the ECAL to use as inputs for boosted decision tree and likelihood classifiers. We present a new approach for particle identification with the AMS ECAL using deep learning (DL). By taking the energy deposition within all the ECAL cells as an input and treating them as pixels in an image-like format, we train an MLP, a CNN, and multiple ResNets and Convolutional vision Transformers (CvTs) as shower classifiers. Proton rejection performance is evaluated using Monte Carlo (MC) events and ISS data separately. For MC, using events with a reconstructed energy between 0.2 - 2 TeV, at 90% electron accuracy, the proton rejection power of our CvT model is more than 5 times that of the other DL models. Similarly, for ISS data with a reconstructed energy between 50 - 70 GeV, the proton rejection power of our CvT model is more than 2.5 times that of the other DL models.

Submitted 25 February, 2024; originally announced February 2024.

Comments: 19 pages, 11 Figures

arXiv:2312.17031

Generalized Mask-aware IoU for Anchor Assignment for Real-time Instance Segmentation

Authors: Barış Can Çam, Kemal Öksüz, Fehmi Kahraman, Zeynep Sonat Baltacı, Sinan Kalkan, Emre Akbaş

Abstract: This paper introduces Generalized Mask-aware Intersection-over-Union (GmaIoU) as a new measure for positive-negative assignment of anchor boxes during training of instance segmentation methods. Unlike the conventional IoU measure or its variants, which only consider the proximity of anchor and ground-truth boxes, GmaIoU additionally takes into account the segmentation mask. This enables GmaIoU to provide more accurate supervision during training. We demonstrate the effectiveness of GmaIoU by replacing IoU with our GmaIoU in ATSS, a state-of-the-art (SOTA) assigner. Then, we train YOLACT, a real-time instance segmentation method, using our GmaIoU-based ATSS assigner. The resulting YOLACT based on the GmaIoU assigner outperforms (i) ATSS with IoU by $\sim 1.0-1.5$ mask AP, (ii) YOLACT with a fixed IoU threshold assigner by $\sim 1.5-2$ mask AP over different image sizes and (iii) decreases the inference time by $25 \%$ owing to using fewer anchors. Taking advantage of this efficiency, we further devise GmaYOLACT, a faster and $+7$ mask AP points more accurate detector than YOLACT. Our best model achieves $38.7$ mask AP at $26$ fps on COCO test-dev, establishing a new state-of-the-art for real-time instance segmentation.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 28 pages, 4 figures
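
For reference, the conventional box IoU that the GmaIoU abstract describes as considering only the proximity of anchor and ground-truth boxes can be sketched as below. This is the standard axis-aligned box IoU, not GmaIoU: the abstract does not specify how the segmentation mask enters the measure, so no mask term is attempted here. The function name and the (x1, y1, x2, y2) box convention are assumptions made for the example.

```python
def box_iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Hypothetical anchor and ground-truth boxes.
anchor = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
print(box_iou(anchor, ground_truth))
```

An assigner would typically label an anchor positive when this score exceeds a threshold; the score depends only on the two boxes, regardless of how much of the object's mask the anchor actually covers.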

arXiv:2312.00189

HeTriNet: Heterogeneous Graph Triplet Attention Network for Drug-Target-Disease Interaction

Authors: Farhan Tanvir, Khaled Mohammed Saifuddin, Tanvir Hossain, Arunkumar Bagavathi, Esra Akbas

Abstract: Modeling the interactions between drugs, targets, and diseases is paramount in drug discovery and has significant implications for precision medicine and personalized treatments. Current approaches frequently consider drug-target or drug-disease interactions individually, ignoring the interdependencies among all three entities. Within human metabolic systems, drugs interact with protein targets in cells, influencing target activities and subsequently impacting biological pathways to promote healthy functions and treat diseases. Moving beyond binary relationships and exploring tighter triple relationships is essential to understanding drugs' mechanisms of action (MoAs). Moreover, identifying the heterogeneity of drugs, targets, and diseases, along with their distinct characteristics, is critical to model these complex interactions appropriately. To address these challenges, we effectively model the interconnectedness of all entities in a heterogeneous graph and develop a novel Heterogeneous Graph Triplet Attention Network (HeTriNet). HeTriNet introduces a novel triplet attention mechanism within this heterogeneous graph structure. Beyond pairwise attention, which models the importance of one entity for another, we define triplet attention to model the importance of pairs for entities in the drug-target-disease triplet prediction problem. Experimental results on real-world datasets show that HeTriNet outperforms several baselines, demonstrating its remarkable proficiency in uncovering novel drug-target-disease relationships.

Submitted 30 November, 2023; originally announced December 2023.

Comments: 13 pages, 3 figures, 6 tables

arXiv:2311.17330

Biomedical knowledge graph-optimized prompt generation for large language models

Authors: Karthik Soman, Peter W Rose, John H Morris, Rabia E Akbas, Brett Smith, Braian Peetoom, Catalina Villouta-Reyes, Gabriel Cerono, Yongmei Shi, Angela Rizk-Jackson, Sharat Israni, Charlotte A Nelson, Sui Huang, Sergio E Baranzini

Abstract: Large Language Models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, requiring further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4, to generate meaningful biomedical text rooted in established knowledge. Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising the accuracy, making a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion.

Submitted 13 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 29 pages, 5 figures, 1 table, 1 supplementary file

arXiv:2311.14090

Class Uncertainty: A Measure to Mitigate Class Imbalance

Authors: Z. S. Baltaci, K. Oksuz, S. Kuzucu, K. Tezoren, B. K. Konar, A. Ozkan, E. Akbas, S. Kalkan

Abstract: Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples of classes follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches relying on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty" as the average predictive uncertainty of the training examples, and we show that this novel measure captures the differences across classes better than cardinality. We also curate SVCI-20 as a novel dataset in which the classes have an equal number of training examples but differ in terms of their hardness, thereby causing a type of class imbalance which cannot be addressed by the approaches relying on cardinality. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on our SVCI-20. Code and datasets will be made available.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.03520

Brain Networks and Intelligence: A Graph Neural Network Based Approach to Resting State fMRI Data

Authors: Bishal Thapaliya, Esra Akbas, Jiayu Chen, Raam Sapkota, Bhaskar Ray, Pranav Suresh, Vince Calhoun, Jingyu Liu

Abstract: Resting-state functional magnetic resonance imaging (rsfMRI) is a powerful tool for investigating the relationship between brain function and cognitive processes as it allows for the functional organization of the brain to be captured without relying on a specific task or stimuli. In this paper, we present a novel modeling architecture called BrainRGIN for predicting intelligence (fluid, crystallized, and total intelligence) using graph neural networks on rsfMRI derived static functional network connectivity matrices. Extending from the existing graph convolution networks, our approach incorporates a clustering-based embedding and graph isomorphism network in the graph convolutional layer to reflect the nature of the brain sub-network organization and efficient network expression, in combination with TopK pooling and attention-based readout functions. We evaluated our proposed architecture on a large dataset, specifically the Adolescent Brain Cognitive Development Dataset, and demonstrated its effectiveness in predicting individual differences in intelligence. Our model achieved lower mean squared errors and higher correlation scores than existing relevant graph architectures and other traditional machine learning models for all of the intelligence prediction tasks. The middle frontal gyrus exhibited a significant contribution to both fluid and crystallized intelligence, suggesting its pivotal role in these cognitive processes. Total composite scores identified a diverse set of brain regions to be relevant, which underscores the complex nature of total intelligence.

Submitted 26 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.09657

Topology-guided Hypergraph Transformer Network: Unveiling Structural Insights for Improved Representation

Authors: Khaled Mohammed Saifuddin, Mehmet Emin Aktas, Esra Akbas

Abstract: Hypergraphs, with their capacity to depict high-order relationships, have emerged as a significant extension of traditional graphs. Although Graph Neural Networks (GNNs) have remarkable performance in graph representation learning, their extension to hypergraphs encounters challenges due to their intricate structures. Furthermore, current hypergraph transformers, a special variant of GNN, utilize semantic feature-based self-attention, ignoring topological attributes of nodes and hyperedges. To address these challenges, we propose a Topology-guided Hypergraph Transformer Network (THTN). In this model, we first formulate a hypergraph from a graph while retaining its structural essence to learn higher-order relations within the graph. Then, we design a simple yet effective structural and spatial encoding module to incorporate the topological and spatial information of the nodes into their representation. Further, we present a structure-aware self-attention mechanism that discovers the important nodes and hyperedges from both semantic and structural viewpoints. By leveraging these two modules, THTN crafts an improved node representation, capturing both local and global topological expressions. Extensive experiments conducted on node classification tasks demonstrate that the performance of the proposed model consistently exceeds that of the existing approaches.

Submitted 21 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: 9 pages, 3 figures

arXiv:2310.04901

WAIT: Feature Warping for Animation to Illustration video Translation using GANs

Authors: Samet Hicsonmez, Nermin Samet, Fidan Samet, Oguz Bakir, Emre Akbas, Pinar Duygulu

Abstract: In this paper, we explore a new domain for video-to-video translation. Motivated by the availability of animation movies that are adapted from illustrated books for children, we aim to stylize these videos with the style of the original illustrations. Current state-of-the-art video-to-video translation models rely on having a video sequence or a single style image to stylize an input video. We introduce a new problem for video stylizing where an unordered set of images is used. This is a challenging task for two reasons: i) we do not have the advantage of temporal consistency as in video sequences; ii) it is more difficult to obtain consistent styles for video frames from a set of unordered images compared to using a single image. Most of the video-to-video translation methods are built on an image-to-image translation model, and integrate additional networks such as optical flow, or temporal predictors to capture temporal relations. These additional networks make the model training and inference complicated and slow down the process. To ensure temporal coherency in video-to-video style transfer, we propose a new generator network with feature warping layers which overcomes the limitations of the previous methods. We show the effectiveness of our method on three datasets both qualitatively and quantitatively. Code and pretrained models are available at https://github.com/giddyyupp/wait.

Submitted 7 October, 2023; originally announced October 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.04902

Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art

Authors: Aref Miri Rekavandi, Shima Rashidi, Farid Boussaid, Stephen Hoefs, Emre Akbas, Mohammed Bennamoun

Abstract: Transformers have rapidly gained popularity in computer vision, especially in the field of object recognition and detection. Upon examining the outcomes of state-of-the-art object detection methods, we noticed that transformers consistently outperformed well-established CNN-based detectors in almost every video or image dataset. While transformer-based approaches remain at the forefront of small object detection (SOD) techniques, this paper aims to explore the performance benefits offered by such extensive networks and identify potential reasons for their SOD superiority. Small objects have been identified as one of the most challenging object types in detection frameworks due to their low visibility. We aim to investigate potential strategies that could enhance transformers' performance in SOD. This survey presents a taxonomy of over 60 research studies on developed transformers for the task of SOD, spanning the years 2020 to 2023. These studies encompass a variety of detection applications, including small object detection in generic images, aerial images, medical images, active millimeter images, underwater images, and videos. We also compile and present a list of 12 large-scale datasets suitable for SOD that were overlooked in previous studies and compare the performance of the reviewed studies using popular metrics such as mean Average Precision (mAP), Frames Per Second (FPS), number of parameters, and more. Researchers can keep track of newer studies on our web page, which is available at https://github.com/arekavandi/Transformer-SOD.

Submitted 9 September, 2023; originally announced September 2023.

arXiv:2306.11484

Hypergraph Classification via Persistent Homology

Authors: Mehmet Emin Aktas, Thu Nguyen, Rakin Riza, Muhammad Ifte Islam, Esra Akbas

Abstract: Persistent homology is a mathematical tool used for studying the shape of data by extracting its topological features. It has gained popularity in network science due to its applicability in various network mining problems, including clustering, graph classification, and graph neural networks. The definition of persistent homology for graphs is relatively straightforward, as graphs possess distinct intrinsic distances and a simplicial complex structure. However, hypergraphs present a challenge in preserving topological information since they may not have a simplicial complex structure. In this paper, we define several topological characterizations of hypergraphs in defining hypergraph persistent homology to prioritize different higher-order structures within hypergraphs. We further use these persistent homology filtrations in classifying four different real-world hypergraphs and compare their performance to the state-of-the-art graph neural network models. Experimental results demonstrate that persistent homology filtrations are effective in classifying hypergraphs and outperform the baseline models. To the best of our knowledge, this study represents the first systematic attempt to tackle the hypergraph classification problem using persistent homology.

Submitted 20 June, 2023; originally announced June 2023.

MSC Class: 55N31; 62R40

arXiv:2303.03654

MPool: Motif-Based Graph Pooling

Authors: Muhammad Ifte Khairul Islam, Max Khanov, Esra Akbas

Abstract: Graph Neural Networks (GNNs) have recently become a powerful technique for many graph-related tasks including graph classification. Current GNN models apply different graph pooling methods that reduce the number of nodes and edges to learn the higher-order structure of the graph in a hierarchical way. All these methods primarily rely on the one-hop neighborhood. However, they do not consider the higher-order structure of the graph. In this work, we propose a multi-channel Motif-based Graph Pooling method named MPool, which captures the higher-order graph structure with motifs and the local and global graph structure with a combination of selection and clustering-based pooling operations. As the first channel, we develop node selection-based graph pooling by designing a node ranking model considering the motif adjacency of nodes. As the second channel, we develop cluster-based graph pooling by designing a spectral clustering model using motif adjacency. As the final layer, the result of each channel is aggregated into the final graph representation. We perform extensive experiments on eight benchmark datasets and show that our proposed method achieves better accuracy than the baseline methods for graph classification tasks.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.02393 [pdf, other]

Seq-HyGAN: Sequence Classification via Hypergraph Attention Network

Authors: Khaled Mohammed Saifuddin, Corey May, Farhan Tanvir, Muhammad Ifte Khairul Islam, Esra Akbas

Abstract: Sequence classification has a wide range of real-world applications in different domains, such as genome classification in health and anomaly detection in business. However, the lack of explicit features in sequence data makes it difficult for machine learning models. While Neural Network (NN) models address this by learning features automatically, they are limited to capturing adjacent structural connections and ignore global, higher-order information between the sequences. To address these challenges in sequence classification problems, we propose a novel Hypergraph Attention Network model, namely Seq-HyGAN. To capture the complex structural similarity between sequence data, we first create a hypergraph where the sequences are depicted as hyperedges and subsequences extracted from sequences are depicted as nodes. Additionally, we introduce an attention-based Hypergraph Neural Network model that utilizes a two-level attention mechanism. This model generates a sequence representation as a hyperedge while simultaneously learning the crucial subsequences for each sequence. We conduct extensive experiments on four data sets to assess and compare our model with several state-of-the-art methods. Experimental results demonstrate that our proposed Seq-HyGAN model can effectively classify sequence data and significantly outperform the baselines. We also conduct case studies to investigate the contribution of each module in Seq-HyGAN.

Submitted 15 June, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

arXiv:2301.08590 [pdf, other]

Improving Sketch Colorization using Adversarial Segmentation Consistency

Authors: Samet Hicsonmez, Nermin Samet, Emre Akbas, Pinar Duygulu

Abstract: We propose a new method for producing color images from sketches. Current solutions in sketch colorization either necessitate additional user instruction or are restricted to the "paired" translation strategy. We leverage semantic image segmentation from a general-purpose panoptic segmentation network to generate an additional adversarial loss function. The proposed loss function is compatible with any GAN model. Our method is not restricted to datasets with segmentation labels and can be applied to unpaired translation tasks as well. Using qualitative and quantitative analysis, and based on a user study, we demonstrate the efficacy of our method on four distinct image datasets. On the FID metric, our model improves the baseline by up to 35 points. Our code, pretrained models, scripts to produce newly introduced datasets and corresponding sketch images are available at https://github.com/giddyyupp/AdvSegLoss.

Submitted 20 January, 2023; originally announced January 2023.

Comments: Under review at Pattern Recognition Letters. arXiv admin note: substantial text overlap with arXiv:2102.06192

arXiv:2301.01019 [pdf, other]

Correlation Loss: Enforcing Correlation between Classification and Localization

Authors: Fehmi Kahraman, Kemal Oksuz, Sinan Kalkan, Emre Akbas

Abstract: Object detectors are conventionally trained by a weighted sum of classification and localization losses. Recent studies (e.g., predicting IoU with an auxiliary head, Generalized Focal Loss, Rank & Sort Loss) have shown that forcing these two loss terms to interact with each other in non-conventional ways creates a useful inductive bias and improves performance. Inspired by these works, we focus on the correlation between classification and localization and make two main contributions: (i) We provide an analysis about the effects of correlation between classification and localization tasks in object detectors. We identify why correlation affects the performance of various NMS-based and NMS-free detectors, and we devise measures to evaluate the effect of correlation and use them to analyze common detectors. (ii) Motivated by our observations, e.g., that NMS-free detectors can also benefit from correlation, we propose Correlation Loss, a novel plug-in loss function that improves the performance of various object detectors by directly optimizing correlation coefficients: E.g., Correlation Loss on Sparse R-CNN, an NMS-free method, yields 1.6 AP gain on COCO and 1.8 AP gain on Cityscapes dataset. Our best model on Sparse R-CNN reaches 51.0 AP without test-time augmentation on COCO test-dev, reaching state-of-the-art. Code is available at https://github.com/fehmikahraman/CorrLoss

Submitted 3 January, 2023; originally announced January 2023.

Comments: Accepted to AAAI 2023

arXiv:2208.02012 [pdf, other]

Character Generation through Self-Supervised Vectorization

Authors: Gokcen Gokceoglu, Emre Akbas

Abstract: The prevalent approach in self-supervised image generation is to operate on pixel level representations. While this approach can produce high quality images, it cannot benefit from the simplicity and innate quality of vectorization. Here we present a drawing agent that operates on stroke-level representation of images. At each time step, the agent first assesses the current canvas and decides whether to stop or keep drawing. When a 'draw' decision is made, the agent outputs a program indicating the stroke to be drawn. As a result, it produces a final raster image by drawing the strokes on a canvas, using a minimal number of strokes and dynamically deciding when to stop. We train our agent through reinforcement learning on MNIST and Omniglot datasets for unconditional generation and parsing (reconstruction) tasks. We utilize our parsing agent for exemplar generation and type conditioned concept generation in Omniglot challenge without any further training. We present successful results on all three generation tasks and the parsing task. Crucially, we do not need any stroke-level or vector supervision; we only use raster images for training.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2207.05672 [pdf, other]

DDI Prediction via Heterogeneous Graph Attention Networks

Authors: Farhan Tanvir, Khaled Mohammed Saifuddin, Esra Akbas

Abstract: Polypharmacy, defined as the use of multiple drugs together, is a standard treatment method, especially for severe and chronic diseases. However, using multiple drugs together may cause interactions between drugs. Drug-drug interaction (DDI) is the activity that occurs when the impact of one drug changes when combined with another. DDIs may obstruct, increase, or decrease the intended effect of either drug or, in the worst-case scenario, create adverse side effects. While it is critical to detect DDIs on time, it is time-consuming and expensive to identify them in clinical trials due to their short duration and many possible drug pairs to be considered for testing. As a result, computational methods are needed for predicting DDIs. In this paper, we present a novel heterogeneous graph attention model, HAN-DDI, to predict drug-drug interactions. We create a heterogeneous network of drugs with different biological entities. Then, we develop a heterogeneous graph attention network to learn DDIs using relations of drugs with other entities. It consists of an attention-based heterogeneous graph node encoder for obtaining drug node representations and a decoder for predicting drug-drug interactions. Further, we utilize comprehensive experiments to evaluate our model and to compare it with state-of-the-art models. Experimental results show that our proposed method, HAN-DDI, outperforms the baselines significantly and accurately predicts DDIs, even for new drugs.

Submitted 12 July, 2022; originally announced July 2022.

Comments: 10 pages, 3 figures, 8 tables, accepted in BioKDD

arXiv:2206.12747 [pdf, other]

HyGNN: Drug-Drug Interaction Prediction via Hypergraph Neural Network

Authors: Khaled Mohammed Saifuddin, Briana Bumgardner, Farhan Tanvir, Esra Akbas

Abstract: Drug-Drug Interactions (DDIs) may hamper the functionalities of drugs, and in the worst scenario, they may lead to adverse drug reactions (ADRs). Predicting all DDIs is a challenging and critical problem. Most existing computational models integrate drug-centric information from different sources and leverage them as features in machine learning classifiers to predict DDIs. However, these models have a high chance of failure, especially for the new drugs when all the information is not available. This paper proposes a novel Hypergraph Neural Network (HyGNN) model based on only the SMILES string of drugs, available for any drug, for the DDI prediction problem. To capture the drug similarities, we create a hypergraph from drugs' chemical substructures extracted from the SMILES strings. Then, we develop HyGNN consisting of a novel attention-based hypergraph edge encoder to get the representation of drugs as hyperedges and a decoder to predict the interactions between drug pairs. Furthermore, we conduct extensive experiments to evaluate our model and compare it with several state-of-the-art methods. Experimental results demonstrate that our proposed HyGNN model effectively predicts DDIs and impressively outperforms the baselines with a maximum ROC-AUC and PR-AUC of 97.9% and 98.1%, respectively.

Submitted 18 April, 2023; v1 submitted 25 June, 2022; originally announced June 2022.

Comments: Some new experiments have been added. One more dataset has been considered. Theoretical part has been updated too

arXiv:2204.13492 [pdf, other]

Representation Recycling for Streaming Video Analysis

Authors: Can Ufuk Ertenli, Ramazan Gokberk Cinbis, Emre Akbas

Abstract: We present StreamDEQ, a method that aims to infer frame-wise representations on videos with minimal per-frame computation. Conventional deep networks do feature extraction from scratch at each frame in the absence of ad-hoc solutions. We instead aim to build streaming recognition models that can natively exploit temporal smoothness between consecutive video frames. We observe that the recently emerging implicit layer models provide a convenient foundation to construct such models, as they define representations as the fixed-points of shallow networks, which need to be estimated using iterative methods. Our main insight is to distribute the inference iterations over the temporal axis by using the most recent representation as a starting point at each frame. This scheme effectively recycles the recent inference computations and greatly reduces the needed processing time. Through extensive experimental analysis, we show that StreamDEQ is able to recover near-optimal representations in a few frames' time and maintain an up-to-date representation throughout the video duration. Our experiments on video semantic segmentation, video object detection, and human pose estimation in videos show that StreamDEQ achieves on-par accuracy with the baseline while being more than 2-4x faster.

Submitted 6 January, 2024; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: v3: ECCV2022 paper. This version: extended version under review at TPAMI

arXiv:2204.06512 [pdf, other]

Does depth estimation help object detection?

Authors: Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

Abstract: Ground-truth depth, when combined with color data, helps improve object detection accuracy over baseline models that only use color. However, estimated depth does not always yield improvements. Many factors affect the performance of object detection when estimated depth is used. In this paper, we comprehensively investigate these factors with detailed experiments, such as using ground-truth vs. estimated depth, effects of different state-of-the-art depth estimation networks, effects of using different indoor and outdoor RGB-D datasets as training data for depth estimation, and different architectural choices for integrating depth to the base object detector network. We propose an early concatenation strategy of depth, which yields higher mAP than previous works' while using significantly fewer parameters.

Submitted 13 April, 2022; originally announced April 2022.

Comments: Accepted to Image and Vision Computing

arXiv:2203.11275 [pdf, other]

Liars are more influential: Effect of Deception in Influence Maximization on Social Networks

Authors: Mehmet Emin Aktas, Esra Akbas, Ashley Hahn

Abstract: Detecting influential users, called the influence maximization problem on social networks, is an important graph mining problem with many diverse applications such as information propagation, market advertising, and rumor controlling. There are many studies in the literature on the influential user detection problem in social networks. Although the current methods are successfully used in many different applications, they assume that users are honest with each other and ignore the role of deception on social networks. On the other hand, deception appears to be surprisingly common among humans within social networks. In this paper, we study the effect of deception in influence maximization on social networks. We first model deception in social networks. Then, we model the opinion dynamics on these networks, taking deception into consideration, using a recent opinion dynamics model based on the sheaf Laplacian. We then extend two influential node detection methods, namely Laplacian centrality and DFF centrality, for the sheaf Laplacian to measure the effect of deception in influence maximization. Our experimental results on synthetic and real-world networks suggest that liars are more influential than honest users in social networks.

Submitted 21 March, 2022; originally announced March 2022.

MSC Class: 91D30; 55N30

arXiv:2110.09734 [pdf, other]

Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation

Authors: Kemal Oksuz, Baris Can Cam, Fehmi Kahraman, Zeynep Sonat Baltaci, Sinan Kalkan, Emre Akbas

Abstract: This paper presents Mask-aware Intersection-over-Union (maIoU) for assigning anchor boxes as positives and negatives during training of instance segmentation methods. Unlike conventional IoU or its variants, which only consider the proximity of two boxes, maIoU consistently measures the proximity of an anchor box with not only a ground truth box but also its associated ground truth mask. Thus, additionally considering the mask, which, in fact, represents the shape of the object, maIoU enables a more accurate supervision during training. We present the effectiveness of maIoU on a state-of-the-art (SOTA) assigner, ATSS, by replacing the IoU operation with our maIoU and training YOLACT, a SOTA real-time instance segmentation method. Using ATSS with maIoU consistently outperforms (i) ATSS with IoU by $\sim 1$ mask AP, (ii) baseline YOLACT with fixed IoU threshold assigner by $\sim 2$ mask AP over different image sizes and (iii) decreases the inference time by $25 \%$ owing to using fewer anchors. Then, exploiting this efficiency, we devise maYOLACT, a faster and $+6$ AP more accurate detector than YOLACT. Our best model achieves $37.7$ mask AP at $25$ fps on COCO test-dev establishing a new state-of-the-art for real-time instance segmentation. Code is available at https://github.com/kemaloksuz/Mask-aware-IoU

Submitted 19 October, 2021; originally announced October 2021.

Comments: BMVC 2021, camera ready version

arXiv:2107.11669 [pdf, other]

Rank & Sort Loss for Object Detection and Instance Segmentation

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: We propose Rank & Sort (RS) Loss, a ranking-based loss function to train deep object detection and instance segmentation methods (i.e. visual detectors). RS Loss supervises the classifier, a sub-network of these methods, to rank each positive above all negatives as well as to sort positives among themselves with respect to (wrt.) their localisation qualities (e.g. Intersection-over-Union - IoU). To tackle the non-differentiable nature of ranking and sorting, we reformulate the incorporation of error-driven update with backpropagation as Identity Update, which enables us to model our novel sorting error among positives. With RS Loss, we significantly simplify training: (i) Thanks to our sorting objective, the positives are prioritized by the classifier without an additional auxiliary head (e.g. for centerness, IoU, mask-IoU), (ii) due to its ranking-based nature, RS Loss is robust to class imbalance, and thus, no sampling heuristic is required, and (iii) we address the multi-task nature of visual detectors using tuning-free task-balancing coefficients. Using RS Loss, we train seven diverse visual detectors only by tuning the learning rate, and show that it consistently outperforms baselines: e.g. our RS Loss improves (i) Faster R-CNN by ~ 3 box AP and aLRP Loss (ranking-based baseline) by ~ 2 box AP on COCO dataset, (ii) Mask R-CNN with repeat factor sampling (RFS) by 3.5 mask AP (~ 7 AP for rare classes) on LVIS dataset; and also outperforms all counterparts. Code is available at: https://github.com/kemaloksuz/RankSortLoss

Submitted 30 August, 2021; v1 submitted 24 July, 2021; originally announced July 2021.

Comments: ICCV 2021, oral presentation

arXiv:2106.04269 [pdf, other]

HPRNet: Hierarchical Point Regression for Whole-Body Human Pose Estimation

Authors: Nermin Samet, Emre Akbas

Abstract: In this paper, we present a new bottom-up one-stage method for whole-body pose estimation, which we call "hierarchical point regression," or HPRNet for short. In standard body pose estimation, the locations of $\sim 17$ major joints on the human body are estimated. Differently, in whole-body pose estimation, the locations of fine-grained keypoints (68 on face, 21 on each hand and 3 on each foot) are estimated as well, which creates a scale variance problem that needs to be addressed. To handle the scale variance among different body parts, we build a hierarchical point representation of body parts and jointly regress them. The relative locations of fine-grained keypoints in each part (e.g. face) are regressed in reference to the center of that part, whose location itself is estimated relative to the person center. In addition, unlike the existing two-stage methods, our method predicts whole-body pose in a constant time independent of the number of people in an image. On the COCO WholeBody dataset, HPRNet significantly outperforms all previous bottom-up methods on the keypoint detection of all whole-body parts (i.e. body, foot, face and hand); it also achieves state-of-the-art results on face (75.4 AP) and hand (50.4 AP) keypoint detection. Code and models are available at https://github.com/nerminsamet/HPRNet.

Submitted 23 August, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: accepted for publication at IMAVIS

arXiv:2105.02763 [pdf, other]

Identifying critical higher-order interactions in complex networks

Authors: Mehmet Emin Aktas, Thu Nguyen, Sidra Jawaid, Rakin Riza, Esra Akbas

Abstract: Information diffusion on networks is an important concept in network science observed in many situations such as information spreading and rumor controlling in social networks, disease contagion between individuals, and cascading failures in power grids. The critical interactions in networks are the ones that play critical roles in information diffusion and primarily affect network structure and functions. Besides, interactions can occur between not only two nodes as pairwise interactions, i.e., edges, but also three or more nodes, described as higher-order interactions. This report presents a novel method to identify critical higher-order interactions. We propose two new Laplacians that allow redefining classical graph centrality measures for higher-order interactions. We then compare the redefined centrality measures using the Susceptible-Infected-Recovered (SIR) simulation model. Experimental results suggest that the proposed method is promising in identifying critical higher-order interactions.

Submitted 7 May, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

MSC Class: 55U10; 91D30

arXiv:2104.06773 [pdf, other]

HoughNet: Integrating near and long-range evidence for visual detection

Authors: Nermin Samet, Samet Hicsonmez, Emre Akbas

Abstract: This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet's best model achieves $46.4$ $AP$ (and $65.1$ $AP_{50}$), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in other visual detection tasks, namely, video object detection, instance segmentation, 3D object detection and keypoint detection for human pose estimation, and an additional "labels to photo" image generation task, where the integration of our voting module consistently improves performance in all cases. Code is available at https://github.com/nerminsamet/houghnet.

Submitted 17 August, 2022; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: accepted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). arXiv admin note: substantial text overlap with arXiv:2007.02355

arXiv:2102.08867 [pdf, other]

Hypergraph Laplacians in Diffusion Framework

Authors: Mehmet Emin Aktas, Esra Akbas

Abstract: Networks are important structures used to model complex systems where interactions take place. In a basic network model, entities are represented as nodes, and interactions and relations among them are represented as edges. However, in a complex system, we cannot describe all relations as pairwise interactions; rather, we should describe them as higher-order interactions. Hypergraphs are successfully used to model higher-order interactions in complex systems. In this paper, we present two new hypergraph Laplacians based on the diffusion framework. Our Laplacians take the relations between higher-order interactions into consideration, and hence can be used to model diffusion on hypergraphs not only between vertices but also between higher-order structures. These Laplacians can be employed in different network mining problems on hypergraphs, such as social contagion models on hypergraphs, influence study on hypergraphs, and hypergraph classification, to list a few.

Submitted 17 February, 2021; originally announced February 2021.

MSC Class: 55U10; 91D30

arXiv:2102.08079 [pdf, other]

Just Noticeable Difference for Machine Perception and Generation of Regularized Adversarial Images with Minimal Perturbation

Authors: Adil Kaan Akan, Emre Akbas, Fatos T. Yarman Vural

Abstract: In this study, we introduce a measure for machine perception, inspired by the concept of Just Noticeable Difference (JND) of human perception. Based on this measure, we suggest an adversarial image generation algorithm, which iteratively distorts an image by an additive noise until the model detects the change in the image by outputting a false label. The noise added to the original image is defined as the gradient of the cost function of the model. A novel cost function is defined to explicitly minimize the amount of perturbation applied to the input image while enforcing the perceptual similarity between the adversarial and input images. For this purpose, the cost function is regularized by the well-known total variation and bounded range terms to meet the natural appearance of the adversarial image. We evaluate the adversarial images generated by our algorithm both qualitatively and quantitatively on CIFAR10, ImageNet, and MS COCO datasets. Our experiments on image classification and object detection tasks show that adversarial images generated by our JND method are both more successful in deceiving the recognition/detection models and less perturbed compared to the images generated by the state-of-the-art methods, namely, FGV, FGSM, and DeepFool methods.

Submitted 29 November, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: Accepted to Signal, Image and Video Processing

arXiv:2102.06192 [pdf, other]

Adversarial Segmentation Loss for Sketch Colorization

Authors: Samet Hicsonmez, Nermin Samet, Emre Akbas, Pinar Duygulu

Abstract: We introduce a new method for generating color images from sketches or edge maps. Current methods either require some form of additional user-guidance or are limited to the "paired" translation approach. We argue that segmentation information could provide valuable guidance for sketch colorization. To this end, we propose to leverage semantic image segmentation, as provided by a general purpose panoptic segmentation network, to create an additional adversarial loss function. Our loss function can be integrated into any baseline GAN model. Our method is not limited to datasets that contain segmentation labels, and it can be trained for "unpaired" translation tasks. We show the effectiveness of our method on four different datasets spanning scene level indoor, outdoor, and children book illustration images using qualitative, quantitative and user study analysis. Our model improves its baseline by up to 35 points on the FID metric. Our code and pretrained models can be found at https://github.com/giddyyupp/AdvSegLoss.

Submitted 13 June, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

Comments: ICIP 2021 camera-ready version

arXiv:2102.02804 [pdf, other]

A Deeper Look into Convolutions via Eigenvalue-based Pruning

Authors: Ilke Cugu, Emre Akbas

Abstract: Convolutional neural networks (CNNs) are able to attain better visual recognition performance than fully connected neural networks despite having much fewer parameters due to their parameter sharing principle. Modern architectures usually contain a small number of fully-connected layers, often at the end, after multiple layers of convolutions. In some cases, most of the convolutions can be eliminated without suffering any loss in recognition performance. However, there is no solid recipe to detect the hidden subset of convolutional neurons that is responsible for the majority of the recognition work. In this work, we formulate this as a pruning problem where the aim is to prune as many kernels as possible while preserving the vanilla generalization performance. To this end, we use the matrix characteristics based on eigenvalues for pruning, in comparison to the average absolute weight of a kernel which is the de facto standard in the literature to assess the importance of an individual convolutional kernel, to shed light on the internal mechanisms of a widely used family of CNNs, namely residual neural networks (ResNets), for the image classification problem using CIFAR-10, CIFAR-100 and Tiny ImageNet datasets.

Submitted 18 October, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

Comments: The codes are available at https://github.com/cuguilke/psykedelic

arXiv:2011.10772 [pdf, other]

One Metric to Measure them All: Localisation Recall Precision (LRP) for Evaluating Visual Detection Tasks

Authors: Kemal Oksuz, Baris Can Cam, Sinan Kalkan, Emre Akbas

Abstract: Despite being widely used as a performance measure for visual detection tasks, Average Precision (AP) is limited in (i) reflecting localisation quality, (ii) interpretability and (iii) robustness to the design choices regarding its computation, and its applicability to outputs without confidence scores. Panoptic Quality (PQ), a measure proposed for evaluating panoptic segmentation (Kirillov et al., 2019), does not suffer from these limitations but is limited to panoptic segmentation. In this paper, we propose Localisation Recall Precision (LRP) Error as the average matching error of a visual detector computed based on both its localisation and classification qualities for a given confidence score threshold. LRP Error, initially proposed only for object detection by Oksuz et al. (2018), does not suffer from the aforementioned limitations and is applicable to all visual detection tasks. We also introduce Optimal LRP (oLRP) Error as the minimum LRP Error obtained over confidence scores to evaluate visual detectors and obtain optimal thresholds for deployment. We provide a detailed comparative analysis of LRP Error with AP and PQ, and use nearly 100 state-of-the-art visual detectors from seven visual detection tasks (i.e. object detection, keypoint detection, instance segmentation, panoptic segmentation, visual relationship detection, zero-shot detection and generalised zero-shot detection) using ten datasets to empirically show that LRP Error provides richer and more discriminative information than its counterparts. Code available at: https://github.com/kemaloksuz/LRP-Error

Submitted 21 November, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

Comments: Accepted to TPAMI

arXiv:2009.13592 [pdf, other]

A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: We propose average Localisation-Recall-Precision (aLRP), a unified, bounded, balanced and ranking-based loss function for both classification and localisation tasks in object detection. aLRP extends the Localisation-Recall-Precision (LRP) performance metric (Oksuz et al., 2018) inspired from how Average Precision (AP) Loss extends precision to a ranking-based loss function for classification (Chen et al., 2020). aLRP has the following distinct advantages: (i) aLRP is the first ranking-based loss function for both classification and localisation tasks. (ii) Thanks to using ranking for both tasks, aLRP naturally enforces high-quality localisation for high-precision classification. (iii) aLRP provides provable balance between positives and negatives. (iv) Compared to on average $\sim$6 hyperparameters in the loss functions of state-of-the-art detectors, aLRP Loss has only one hyperparameter, which we did not tune in practice. On the COCO dataset, aLRP Loss improves its ranking-based predecessor, AP Loss, up to around $5$ AP points, achieves $48.9$ AP without test time augmentation and outperforms all one-stage detectors. Code available at: https://github.com/kemaloksuz/aLRPLoss

Submitted 7 January, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: NeurIPS 2020 spotlight paper

arXiv:2008.01167 [pdf, other]

Reducing Label Noise in Anchor-Free Object Detection

Authors: Nermin Samet, Samet Hicsonmez, Emre Akbas

Abstract: Current anchor-free object detectors label all the features that spatially fall inside a predefined central region of a ground-truth box as positive. This approach causes label noise during training, since some of these positively labeled features may be on the background or an occluder object, or they are simply not discriminative features. In this paper, we propose a new labeling strategy aimed to reduce the label noise in anchor-free detectors. We sum-pool predictions stemming from individual features into a single prediction. This allows the model to reduce the contributions of non-discriminatory features during training. We develop a new one-stage, anchor-free object detector, PPDet, to employ this labeling strategy during training and a similar prediction pooling method during inference. On the COCO dataset, PPDet achieves the best performance among anchor-free top-down detectors and performs on-par with the other state-of-the-art methods. It also outperforms all major one-stage and two-stage methods in small object detection (${AP}_{S}$ $31.4$). Code is available at https://github.com/nerminsamet/ppdet

Submitted 13 August, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: BMVC 2020 camera-ready version

arXiv:2007.02355 [pdf, other]

HoughNet: Integrating near and long-range evidence for bottom-up object detection

Authors: Nermin Samet, Samet Hicsonmez, Emre Akbas

Abstract: This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet's best model achieves 46.4 $AP$ (and 65.1 $AP_{50}$), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in another task, namely, "labels to photo" image generation by integrating the voting module of HoughNet to two different GAN models and showing that the accuracy is significantly improved in both cases. Code is available at https://github.com/nerminsamet/houghnet.

Submitted 24 July, 2020; v1 submitted 5 July, 2020; originally announced July 2020.

Comments: ECCV 2020 camera-ready version

arXiv:2007.00434 [pdf, other]

Graph Classification via Heat Diffusion on Simplicial Complexes

Authors: Mehmet Emin Aktas, Esra Akbas

Abstract: In this paper, we study the graph classification problem in vertex-labeled graphs. Our main goal is to classify the graphs comparing their higher-order structures thanks to heat diffusion on their simplices. We first represent vertex-labeled graphs as simplex-weighted super-graphs. We then define the diffusion Fréchet function over their simplices to encode the higher-order network topology and finally reach our goal by combining the function values with machine learning algorithms. Our experiments on real-world bioinformatics networks show that using diffusion Fréchet function on simplices is promising in graph classification and more effective than the baseline methods. To the best of our knowledge, this paper is the first paper in the literature using heat diffusion on higher-dimensional simplices in a graph mining problem. We believe that our method can be extended to different graph mining domains, not only the graph classification problem.

Submitted 26 June, 2020; originally announced July 2020.

MSC Class: 55U10

arXiv:2002.05638 [pdf, other]

GANILLA: Generative Adversarial Networks for Image to Illustration Translation

Authors: Samet Hicsonmez, Nermin Samet, Emre Akbas, Pinar Duygulu

Abstract: In this paper, we explore illustrations in children's books as a new domain in unpaired image-to-image translation. We show that although the current state-of-the-art image-to-image translation models successfully transfer either the style or the content, they fail to transfer both at the same time. We propose a new generator network to address this issue and show that the resulting network strikes a better balance between style and content. There are no well-defined or agreed-upon evaluation metrics for unpaired image-to-image translation. So far, the success of image translation models has been based on subjective, qualitative visual comparison on a limited number of images. To address this problem, we propose a new framework for the quantitative evaluation of image-to-illustration models, where both content and style are taken into account using separate classifiers. In this new evaluation framework, our proposed model performs better than the current state-of-the-art models on the illustrations dataset. Our code and pretrained models can be found at https://github.com/giddyyupp/ganilla.

Submitted 14 February, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: to be published in Image and Vision Computing

arXiv:1909.09777 [pdf, other]

Generating Positive Bounding Boxes for Balanced Training of Object Detectors

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: Two-stage deep object detectors generate a set of regions-of-interest (RoI) in the first stage, then, in the second stage, identify objects among the proposed RoIs that sufficiently overlap with a ground truth (GT) box. The second stage is known to suffer from a bias towards RoIs that have low intersection-over-union (IoU) with the associated GT boxes. To address this issue, we first propose a sampling method to generate bounding boxes (BB) that overlap with a given reference box more than a given IoU threshold. Then, we use this BB generation method to develop a positive RoI (pRoI) generator that produces RoIs following any desired spatial or IoU distribution, for the second-stage. We show that our pRoI generator is able to simulate other sampling methods for positive examples such as hard example mining and prime sampling. Using our generator as an analysis tool, we show that (i) IoU imbalance has an adverse effect on performance, (ii) hard positive example mining improves the performance only for certain input IoU distributions, and (iii) the imbalance among the foreground classes has an adverse effect on performance and that it can be alleviated at the batch level. Finally, we train Faster R-CNN using our pRoI generator and, compared to conventional training, obtain better or on-par performance for low IoUs and significant improvements when trained for higher IoUs for Pascal VOC and MS COCO datasets. The code is available at: https://github.com/kemaloksuz/BoundingBoxGenerator.

Submitted 19 June, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

Comments: To appear in WACV 20

arXiv:1909.00169 [pdf, other]

Imbalance Problems in Object Detection: A Review

Authors: Kemal Oksuz, Baris Can Cam, Sinan Kalkan, Emre Akbas

Abstract: In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: https://github.com/kemaloksuz/ObjectDetectionImbalance

Submitted 11 March, 2020; v1 submitted 31 August, 2019; originally announced September 2019.

Comments: Accepted to IEEE TPAMI; currently in press

arXiv:1907.08708 [pdf, other]

Persistence Homology of Networks: Methods and Applications

Authors: Mehmet Emin Aktas, Esra Akbas, Ahmed El Fatmaoui

Abstract: Information networks are becoming increasingly popular to capture complex relationships across various disciplines, such as social networks, citation networks, and biological networks. The primary challenge in this domain is measuring similarity or distance between networks based on topology. However, classical graph-theoretic measures are usually local and mainly based on differences between either node or edge measurements or correlations without considering the topology of networks such as the connected components or holes. In recent years, mathematical tools and deep learning based methods have become popular to extract the topological features of networks. Persistent homology (PH) is a mathematical tool in computational topology that measures the topological features of data that persist across multiple scales with applications ranging from biological networks to social networks. In this paper, we provide a conceptual review of key advancements in this area of using PH on complex network science. We give a brief mathematical background on PH, review different methods (i.e. filtrations) to define PH on networks and highlight different algorithms and applications where PH is used in solving network mining problems. In doing so, we develop a unified framework to describe these recent approaches and emphasize major conceptual distinctions. We conclude with directions for future work. We focus our review on recent approaches that get significant attention in the mathematics and data mining communities working on network data. We believe our summary of the analysis of PH on networks will provide important insights to researchers in applied network science.

Submitted 19 July, 2019; originally announced July 2019.

Comments: Submitted to Applied Network Science Special Issue on Machine Learning with Graphs

MSC Class: 55U99; 55N35; 05C82

arXiv:1907.02811 [pdf, ps, other]

Network Embedding: on Compression and Learning

Authors: Esra Akbas, Mehmet Aktas

Abstract: Recently, network embedding that encodes structural information of graphs into a vector space has become popular for network analysis. Although recent methods show promising performance for various applications, the huge sizes of graphs may hinder a direct application of existing network embedding methods to them. This paper presents NECL, a novel efficient Network Embedding method with two goals. 1) Is there an ideal Compression of a network? 2) Will the compression of a network significantly boost the representation Learning of the network? For the first problem, we propose a neighborhood similarity based graph compression method that compresses the input graph to get a smaller graph without losing any/much information about the global structure of the graph and the local proximity of the vertices in the graph. For the second problem, we use the compressed graph for network embedding instead of the original large graph to bring down the embedding cost. NECL is a general meta-strategy to improve the efficiency of all of the state-of-the-art graph embedding algorithms based on random walks, including DeepWalk and Node2vec, without losing their effectiveness. Extensive experiments on large real-world networks validate the efficiency of NECL method that yields an average improvement of 23 - 57% embedding time, including walking and learning time without decreasing classification accuracy as evaluated on single and multi-label classification tasks on real-world graphs such as DBLP, BlogCatalog, Cora and Wiki.

Submitted 8 July, 2019; v1 submitted 5 July, 2019; originally announced July 2019.

Comments: arXiv admin note: text overlap with arXiv:1706.07845, arXiv:1607.00653 by other authors

arXiv:1903.02330 [pdf, other]

Self-Supervised Learning of 3D Human Pose using Multi-view Geometry

Authors: Muhammed Kocabas, Salih Karagoz, Emre Akbas

Abstract: Training accurate 3D human pose estimators requires a large amount of 3D ground-truth data which is costly to collect. Various weakly or self supervised pose estimation methods have been proposed due to lack of 3D data. Nevertheless, these methods, in addition to 2D ground-truth poses, require either additional supervision in various forms (e.g. unpaired 3D ground truth data, a small subset of labels) or the camera parameters in multiview settings. To address these problems, we present EpipolarPose, a self-supervised learning method for 3D human pose estimation, which does not need any 3D ground-truth data or camera extrinsics. During training, EpipolarPose estimates 2D poses from multi-view images, and then, utilizes epipolar geometry to obtain a 3D pose and camera geometry which are subsequently used to train a 3D pose estimator. We demonstrate the effectiveness of our approach on standard benchmark datasets i.e. Human3.6M and MPI-INF-3DHP where we set the new state-of-the-art among weakly/self-supervised methods. Furthermore, we propose a new performance measure Pose Structure Score (PSS) which is a scale invariant, structure aware measure to evaluate the structural plausibility of a pose with respect to its ground truth. Code and pretrained models are available at https://github.com/mkocabas/EpipolarPose

Submitted 9 April, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

Comments: CVPR 2019 camera ready. Code is available at https://github.com/mkocabas/EpipolarPose

arXiv:1807.04067 [pdf, other]

MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network

Authors: Muhammed Kocabas, Salih Karagoz, Emre Akbas

Abstract: In this paper, we present MultiPoseNet, a novel bottom-up multi-person pose estimation architecture that combines a multi-task model with a novel assignment method. MultiPoseNet can jointly handle person detection, keypoint detection, person segmentation and pose estimation problems. The novel assignment method is implemented by the Pose Residual Network (PRN) which receives keypoint and person detections, and produces accurate poses by assigning keypoints to person instances. On the COCO keypoints dataset, our pose estimation method outperforms all previous bottom-up methods both in accuracy (+4-point mAP over previous best result) and speed; it also performs on par with the best top-down methods while being at least 4x faster. Our method is the fastest real time system with 23 frames/sec. Source code is available at: https://github.com/mkocabas/pose-residual-network

Submitted 11 July, 2018; originally announced July 2018.

Comments: to appear in ECCV 2018

arXiv:1807.01696 [pdf, other]

Localization Recall Precision (LRP): A New Performance Metric for Object Detection

Authors: Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Abstract: Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of directly measuring bounding box localization accuracy. In this paper, we propose 'Localization Recall Precision (LRP) Error', a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the 'Optimal LRP', the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the 'best' confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector which uses a SOTA still image object detector and show that the class-specific optimized thresholds increase the accuracy against the common approach of using a general threshold for all classes. At https://github.com/cancam/LRP we provide the source code that can compute LRP for the PASCAL VOC and MSCOCO datasets. Our source code can easily be adapted to other datasets as well.

Submitted 5 July, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

Comments: to appear in ECCV 2018

arXiv:1711.07011 [pdf, other]

MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Face Images

Authors: İlke Çuğu, Eren Şener, Emre Akbaş

Abstract: This paper is aimed at creating extremely small and fast convolutional neural networks (CNN) for the problem of facial expression recognition (FER) from frontal face images. To this end, we employed the popular knowledge distillation (KD) method and identified two major shortcomings with its use: 1) a fine-grained grid search is needed for tuning the temperature hyperparameter and 2) to find the optimal size-accuracy balance, one needs to search for the final network size (or the compression rate). On the other hand, KD is proved to be useful for model compression for the FER problem, and we discovered that its effects get more and more significant with the decreasing model size. In addition, we hypothesized that translation invariance achieved using max-pooling layers would not be useful for the FER problem as the expressions are sensitive to small, pixel-wise changes around the eye and the mouth. However, we have found an intriguing improvement on generalization when max-pooling is used. We conducted experiments on two widely-used FER datasets, CK+ and Oulu-CASIA. Our smallest model (MicroExpNet), obtained using knowledge distillation, is less than 1MB in size and works at 1851 frames per second on an Intel i7 CPU. Despite being less accurate than the state-of-the-art, MicroExpNet still provides significant insights for designing a microarchitecture for the FER problem.

Submitted 24 December, 2019; v1 submitted 19 November, 2017; originally announced November 2017.

Comments: International Conference on Image Processing Theory, Tools and Applications (IPTA) 2019 camera ready version. Codes are available at: https://github.com/cuguilke/microexpnet

arXiv:1704.02665 [pdf, ps, other]

Supervised Infinite Feature Selection

Authors: Sadegh Eskandari, Emre Akbas

Abstract: In this paper, we present a new feature selection method that is suitable for both unsupervised and supervised problems. We build upon the recently proposed Infinite Feature Selection (IFS) method where feature subsets of all sizes (including infinity) are considered. We extend IFS in two ways. First, we propose a supervised version of it. Second, we propose new ways of forming the feature adjacency matrix that perform better for unsupervised problems. We extensively evaluate our methods on many benchmark datasets, including large image-classification datasets (PASCAL VOC), and show that our methods outperform both the IFS and the widely used "minimum-redundancy maximum-relevancy (mRMR)" feature selection algorithm.

Submitted 21 August, 2017; v1 submitted 9 April, 2017; originally announced April 2017.

arXiv:1704.00016 [pdf, other]

Opinion Mining on Non-English Short Text

Authors: Esra Akbas

Abstract: As the type and the number of such venues increase, automated analysis of sentiment on textual resources has become an essential data mining task. In this paper, we investigate the problem of mining opinions on the collection of informal short texts. Both positive and negative sentiment strength of texts are detected. We focus on a non-English language that has few resources for text mining. This approach would help enhance the sentiment analysis in languages where a list of opinionated words does not exist. We propose a new method that projects the text into dense and low dimensional feature vectors according to the sentiment strength of the words. We detect the mixture of positive and negative sentiments on a multi-variant scale. Empirical evaluation of the proposed framework on Turkish tweets shows that our approach gets good results for opinion mining.

Submitted 3 April, 2017; v1 submitted 31 March, 2017; originally announced April 2017.

arXiv:1611.03971 [pdf, other]

doi 10.1007/s13278-016-0412-3

User characterization for online social networks

Authors: Tayfun Tuna, Esra Akbas, Ahmet Aksoy, Muhammed Abdullah Canbaz, Umit Karabiyik, Bilal Gonen, Ramazan Aygun

Abstract: Online social network analysis has attracted great attention with a vast number of users sharing information and availability of APIs that help to crawl online social network data. In this paper, we review the research studies that are helpful for user characterization as online users may not always reveal their true identity or attributes. We especially focused on user attribute determination such as gender, age, etc.; user behavior analysis such as motives for deception; mental models that are indicators of user behavior; user categorization such as bots vs. humans; and entity matching on different social networks. We believe our summary of analysis of user characterization will provide important insights to researchers and better services to online users.

Submitted 26 December, 2016; v1 submitted 12 November, 2016; originally announced November 2016.

Journal ref: Soc. Netw. Anal. Min. (2016) 6: 104. doi:10.1007/s13278-016-0412-3

Showing 1–50 of 54 results for author: Akbaş, E