Search | arXiv e-print repository

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Authors: Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

Abstract: Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation typically require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the datas… ▽ More Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation typically require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the dataset size. In this work, we consider the problem of automatic curation of high-quality datasets for self-supervised pre-training. We posit that such datasets should be large, diverse and balanced, and propose a clustering-based approach for building ones satisfying all these criteria. Our method involves successive and hierarchical applications of $k$-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. Extensive experiments on three different data domains including web-based images, satellite images and text show that features trained on our automatically curated datasets outperform those trained on uncurated data while being on par or better than ones trained on manually curated data. Code is available at https://github.com/facebookresearch/ssl-data-curation. △ Less

Submitted 28 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.05272 [pdf, other]

Learning bridge numbers of knots

Authors: Hanh Vo, Puttipong Pongtanapaisan, Thieu Nguyen

Abstract: This paper employs various computational techniques to determine the bridge numbers of both classical and virtual knots. For classical knots, there is no ambiguity of what the bridge number means. For virtual knots, there are multiple natural definitions of bridge number, and we demonstrate that the difference can be arbitrarily far apart. We then acquired two datasets, one for classical and one f… ▽ More This paper employs various computational techniques to determine the bridge numbers of both classical and virtual knots. For classical knots, there is no ambiguity of what the bridge number means. For virtual knots, there are multiple natural definitions of bridge number, and we demonstrate that the difference can be arbitrarily far apart. We then acquired two datasets, one for classical and one for virtual knots, each comprising over one million labeled data points. With the data, we conduct experiments to evaluate the effectiveness of common machine learning models in classifying knots based on their bridge numbers. △ Less

Submitted 29 April, 2024; originally announced May 2024.

arXiv:2403.00833 [pdf, other]

Position Paper: Agent AI Towards a Holistic Intelligence

Authors: Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao

Abstract: Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize develo** Agent AI -- an embodied system that… ▽ More Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize develo** Agent AI -- an embodied system that integrates large foundation models into agent actions. The emerging field of Agent AI spans a wide range of existing embodied and agent-based multimodal interactions, including robotics, gaming, and healthcare systems, etc. In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model. On top of this idea, we discuss how agent AI exhibits remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Furthermore, we discuss the potential of Agent AI from an interdisciplinary perspective, underscoring AI cognition and consciousness within scientific discourse. We believe that those discussions serve as a basis for future research directions and encourage broader societal engagement. △ Less

Submitted 28 February, 2024; originally announced March 2024.

Comments: 22 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2401.03568

arXiv:2402.05929 [pdf, other]

An Interactive Agent Foundation Model

Authors: Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang

Abstract: The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradi… ▽ More The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. Our approach provides a promising avenue for develo** generalist, action-taking, multimodal systems. △ Less

Submitted 17 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03805 [pdf, other]

Automated Description Generation for Software Patches

Authors: Thanh Trong Vu, Tuan-Dung Bui, Thanh-Dat Do, Thu-Trang Nguyen, Hieu Dinh Vo, Son Nguyen

Abstract: Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an ap… ▽ More Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an approach that addresses these challenges by framing patch description generation as a machine translation task. In PATCHEXPLAINER, we leverage explicit representations of critical elements, historical context, and syntactic conventions. Moreover, the translation model in PATCHEXPLAINER is designed with an awareness of description similarity. Particularly, the model is explicitly trained to recognize and incorporate similarities present in patch descriptions clustered into groups, improving its ability to generate accurate and consistent descriptions across similar patches. The dual objectives maximize similarity and accurately predict affiliating groups. Our experimental results on a large dataset of real-world software patches show that PATCHEXPLAINER consistently outperforms existing methods, with improvements up to 189% in BLEU, 5.7X in Exact Match rate, and 154% in Semantic Similarity, affirming its effectiveness in generating software patch descriptions. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: Pre-print version of PATCHEXPLAINER

arXiv:2401.03568 [pdf, other]

Agent AI: Surveying the Horizons of Multimodal Interaction

Authors: Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Ye** Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao

Abstract: Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the a… ▽ More Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by develo** agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment. △ Less

Submitted 25 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2311.14971 [pdf]

Segmentation of diagnostic tissue compartments on whole slide images with renal thrombotic microangiopathies (TMAs)

Authors: Huy Q. Vo, Pietro A. Cicalese, Surya Seshan, Syed A. Rizvi, Aneesh Vathul, Gloria Bueno, Anibal Pedraza Dorado, Niels Grabe, Katharina Stolle, Francesco Pesce, Joris J. T. H. Roelofs, Jesper Kers, Vitoantonio Bevilacqua, Nicola Altini, Bernd Schröppel, Dario Roccatello, Antonella Barreca, Savino Sciascia, Chandra Mohan, Hien V. Nguyen, Jan U. Becker

Abstract: The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology with a broad spectrum of acute and chronic findings. Precise diagnostic criteria for a renal biopsy diagnosis of TMA are missing. As a first step towards a machine learning- and computer vision-based analysis of wholes slide images from renal biopsies, we trained a segmentation model for the decisive diagnostic kidney tissu… ▽ More The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology with a broad spectrum of acute and chronic findings. Precise diagnostic criteria for a renal biopsy diagnosis of TMA are missing. As a first step towards a machine learning- and computer vision-based analysis of wholes slide images from renal biopsies, we trained a segmentation model for the decisive diagnostic kidney tissue compartments artery, arteriole, glomerulus on a set of whole slide images from renal biopsies with TMAs and Mimickers (distinct diseases with a similar nephropathological appearance as TMA like severe benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy, arteriolar light chain deposition disease). Our segmentation model combines a U-Net-based tissue detection with a Shifted windows-transformer architecture to reach excellent segmentation results for even the most severely altered glomeruli, arterioles and arteries, even on unseen staining domains from a different nephropathology lab. With accurate automatic segmentation of the decisive renal biopsy compartments in human renal vasculopathies, we have laid the foundation for large-scale compartment-specific machine learning and computer vision analysis of renal biopsy repositories with TMAs. △ Less

Submitted 28 November, 2023; v1 submitted 25 November, 2023; originally announced November 2023.

Comments: 12 pages, 3 figures

arXiv:2309.09971 [pdf, other]

MindAgent: Emergent Gaming Interaction

Authors: Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao

Abstract: Large Language Models (LLMs) have the capacity of performing complex scheduling in a multi-agent system and can coordinate these agents into completing sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community has insufficient benchmarks towards building general multi-agents collaboration infrastructure that encompass b… ▽ More Large Language Models (LLMs) have the capacity of performing complex scheduling in a multi-agent system and can coordinate these agents into completing sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community has insufficient benchmarks towards building general multi-agents collaboration infrastructure that encompass both LLM and human-NPCs collaborations. In this work, we propose a novel infrastructure - MindAgent - to evaluate planning and coordination emergent capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming framework, to i) require understanding of the coordinator for a multi-agent system, ii) collaborate with human players via un-finetuned proper instructions, and iii) establish an in-context learning on few-shot prompt with feedback. Furthermore, we introduce CUISINEWORLD, a new gaming scenario and related benchmark that dispatch a multi-agent collaboration efficiency and supervise multiple agents playing the game simultaneously. We conduct comprehensive evaluations with new auto-metric CoS for calculating the collaboration efficiency. Finally, our infrastructure can be deployed into real-world gaming scenarios in a customized VR version of CUISINEWORLD and adapted in existing broader Minecraft gaming domain. We hope our findings on LLMs and the new infrastructure for general-purpose scheduling and coordination can help shed light on how such skills can be obtained by learning from large language corpora. △ Less

Submitted 19 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: The first three authors contributed equally. 28 pages

arXiv:2309.08225 [pdf, other]

Silent Vulnerability-fixing Commit Identification Based on Graph Neural Networks

Authors: Hieu Dinh Vo, Thanh Trong Vu, Son Nguyen

Abstract: The growing dependence of software projects on external libraries has generated apprehensions regarding the security of these libraries because of concealed vulnerabilities. Handling these vulnerabilities presents difficulties due to the temporal delay between remediation and public exposure. Furthermore, a substantial fraction of open-source projects covertly address vulnerabilities without any f… ▽ More The growing dependence of software projects on external libraries has generated apprehensions regarding the security of these libraries because of concealed vulnerabilities. Handling these vulnerabilities presents difficulties due to the temporal delay between remediation and public exposure. Furthermore, a substantial fraction of open-source projects covertly address vulnerabilities without any formal notification, influencing vulnerability management. Established solutions like OWASP predominantly hinge on public announcements, limiting their efficacy in uncovering undisclosed vulnerabilities. To address this challenge, the automated identification of vulnerability-fixing commits has come to the forefront. In this paper, we present VFFINDER, a novel graph-based approach for automated silent vulnerability fix identification. VFFINDER captures structural changes using Abstract Syntax Trees (ASTs) and represents them in annotated ASTs. To precisely capture the meaning of code changes, the changed code is represented in connection with the related unchanged code. In VFFINDER, the structure of the changed code and related unchanged code are captured and the structural changes are represented in annotated Abstract Syntax Trees (aAST). VFFINDER distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models to extract structural features expressed in aASTs. We conducted experiments to evaluate VFFINDER on a dataset of 11K+ vulnerability fixing commits in 507 real-world C/C++ projects. Our results show that VFFINDER significantly improves the state-of-the-art methods by 272-420% in Precision, 22-70% in Recall, and 3.2X-8.2X in F1. Especially, VFFINDER speeds up the silent fix identification process by up to 121% with the same effort reviewing 50K LOC compared to the existing approaches. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2304.08396, arXiv:2309.01971

arXiv:2309.01971 [pdf, other]

VFFINDER: A Graph-based Approach for Automated Silent Vulnerability-Fix Identification

Authors: Son Nguyen, Thanh Trong Vu, Hieu Dinh Vo

Abstract: The increasing reliance of software projects on third-party libraries has raised concerns about the security of these libraries due to hidden vulnerabilities. Managing these vulnerabilities is challenging due to the time gap between fixes and public disclosures. Moreover, a significant portion of open-source projects silently fix vulnerabilities without disclosure, impacting vulnerability manageme… ▽ More The increasing reliance of software projects on third-party libraries has raised concerns about the security of these libraries due to hidden vulnerabilities. Managing these vulnerabilities is challenging due to the time gap between fixes and public disclosures. Moreover, a significant portion of open-source projects silently fix vulnerabilities without disclosure, impacting vulnerability management. Existing tools like OWASP heavily rely on public disclosures, hindering their effectiveness in detecting unknown vulnerabilities. To tackle this problem, automated identification of vulnerability-fixing commits has emerged. However, identifying silent vulnerability fixes remains challenging. This paper presents VFFINDER, a novel graph-based approach for automated silent vulnerability fix identification. VFFINDER captures structural changes using Abstract Syntax Trees (ASTs) and represents them in annotated ASTs. VFFINDER distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models to extract structural features. We conducted experiments to evaluate VFFINDER on a dataset of 36K+ fixing and non-fixing commits in 507 real-world C/C++ projects. Our results show that VFFINDER significantly improves the state-of-the-art methods by 39-83% in Precision, 19-148% in Recall, and 30-109% in F1. Especially, VFFINDER speeds up the silent fix identification process by up to 47% with the same review effort of 5% compared to the existing approaches. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by IEEE KSE 2023

arXiv:2308.16262 [pdf, other]

Causal Strategic Learning with Competitive Selection

Authors: Kiet Q. H. Vo, Muneeb Aadil, Siu Lun Chau, Krikamol Muandet

Abstract: We study the problem of agent selection in causal strategic learning under multiple decision makers and address two key challenges that come with it. Firstly, while much of prior work focuses on studying a fixed pool of agents that remains static regardless of their evaluations, we consider the impact of selection procedure by which agents are not only evaluated, but also selected. When each decis… ▽ More We study the problem of agent selection in causal strategic learning under multiple decision makers and address two key challenges that come with it. Firstly, while much of prior work focuses on studying a fixed pool of agents that remains static regardless of their evaluations, we consider the impact of selection procedure by which agents are not only evaluated, but also selected. When each decision maker unilaterally selects agents by maximising their own utility, we show that the optimal selection rule is a trade-off between selecting the best agents and providing incentives to maximise the agents' improvement. Furthermore, this optimal selection rule relies on incorrect predictions of agents' outcomes. Hence, we study the conditions under which a decision maker's optimal selection rule will not lead to deterioration of agents' outcome nor cause unjust reduction in agents' selection chance. To that end, we provide an analytical form of the optimal selection rule and a mechanism to retrieve the causal parameters from observational data, under certain assumptions on agents' behaviour. Secondly, when there are multiple decision makers, the interference between selection rules introduces another source of biases in estimating the underlying causal parameters. To address this problem, we provide a cooperative protocol which all decision makers must collectively adopt to recover the true causal parameters. Lastly, we complement our theoretical results with simulation studies. Our results highlight not only the importance of causal modeling as a strategy to mitigate the effect of gaming, as suggested by previous work, but also the need of a benevolent regulator to enable it. △ Less

Submitted 3 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Added more discussions on assumptions and the algorithm, and expand the Conclusion

arXiv:2308.13735 [pdf, other]

MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree

Authors: Quang Hieu Vo, Linh-Tam Tran, Sung-Ho Bae, Lok-Won Kim, Choong Seon Hong

Abstract: Binary neural networks (BNNs) have been widely adopted to reduce the computational cost and memory storage on edge-computing devices by using one-bit representation for activations and weights. However, as neural networks become wider/deeper to improve accuracy and meet practical requirements, the computational burden remains a significant challenge even on the binary version. To address these iss… ▽ More Binary neural networks (BNNs) have been widely adopted to reduce the computational cost and memory storage on edge-computing devices by using one-bit representation for activations and weights. However, as neural networks become wider/deeper to improve accuracy and meet practical requirements, the computational burden remains a significant challenge even on the binary version. To address these issues, this paper proposes a novel method called Minimum Spanning Tree (MST) compression that learns to compress and accelerate BNNs. The proposed architecture leverages an observation from previous works that an output channel in a binary convolution can be computed using another output channel and XNOR operations with weights that differ from the weights of the reused channel. We first construct a fully connected graph with vertices corresponding to output channels, where the distance between two vertices is the number of different values between the weight sets used for these outputs. Then, the MST of the graph with the minimum depth is proposed to reorder output calculations, aiming to reduce computational cost and latency. Moreover, we propose a new learning algorithm to reduce the total MST distance during training. Experimental results on benchmark models demonstrate that our method achieves significant compression ratios with negligible accuracy drops, making it a promising approach for resource-constrained edge-computing devices. △ Less

Submitted 25 August, 2023; originally announced August 2023.

Comments: 11 pages, 9 figures, ICCV 2023

arXiv:2308.06246 [pdf, other]

ARGUS: Visualization of AI-Assisted Task Guidance in AR

Authors: Sonia Castelo, Joao Rulff, Erin McGowan, Bea Steers, Guande Wu, Shaoyu Chen, Iran Roman, Roque Lopez, Ethan Brewer, Chen Zhao, **g Qian, Kyunghyun Cho, He He, Qi Sun, Huy Vo, Juan Bello, Michael Krone, Claudio Silva

Abstract: The concept of augmented reality (AR) assistants has captured the human imagination for decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to develop artificial intelligence (AI)-based methods that simultaneously perceive the 3D environment, reason about physical tasks, and model the performer, all in real-time. Within this framework, a wide variety of senso… ▽ More The concept of augmented reality (AR) assistants has captured the human imagination for decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to develop artificial intelligence (AI)-based methods that simultaneously perceive the 3D environment, reason about physical tasks, and model the performer, all in real-time. Within this framework, a wide variety of sensors are needed to generate data across different modalities, such as audio, video, depth, speech, and time-of-flight. The required sensors are typically part of the AR headset, providing performer sensing and interaction through visual, audio, and haptic feedback. AI assistants not only record the performer as they perform activities, but also require machine learning (ML) models to understand and assist the performer as they interact with the physical world. Therefore, develo** such assistants is a challenging task. We propose ARGUS, a visual analytics system to support the development of intelligent AR assistants. Our system was designed as part of a multi year-long collaboration between visualization researchers and ML and AR experts. This co-design process has led to advances in the visualization of ML in AR. Our system allows for online visualization of object, action, and step detection as well as offline analysis of previously recorded AR sessions. It visualizes not only the multimodal sensor data streams but also the output of the ML models. This allows developers to gain insights into the performer activities as well as the ML models, hel** them troubleshoot, improve, and fine tune the components of the AR assistant. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: 11 pages, 8 figures. This is the author's version of the article of the article that has been accepted for publication in IEEE Transactions on Visualization and Computer Graphics

arXiv:2307.16834 [pdf]

doi 10.1007/978-3-031-53963-3_25

Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System

Authors: Hoang Viet Pham, Thinh Gia Tran, Chuong Dinh Le, An Dinh Le, Hien Bich Vo

Abstract: Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jet… ▽ More Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput in the execution of deep learning algorithms. Previously, most benchmarking analysis was based on 2D images with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system inputting from surveillance videos and the system is deployed and completely operates on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT as a software developer kit from NVIDIA for the model performance optimisation. The system is built based on the PySlowfast open-source project from Facebook as the coding template. The end-to-end system process comprises the videos from camera, data preprocessing pipeline, feature extractor and the anomaly detection. We provide the experience of an AI-based system deployment on various Jetson Edge devices with Docker technology. Regarding anomaly detectors, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The approach system reaches 47.56 frames per second (FPS) inference speed on a Jetson edge device with only 3.11 GB RAM usage total. We also discover the promising Jetson device that the AI system achieves 15% better performance than the previous version of Jetson devices while consuming 50% less energy power. △ Less

Submitted 12 September, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted in Future of Information and Communication Conference (FICC) 2024

arXiv:2306.14726 [pdf, other]

Can An Old Fashioned Feature Extraction and A Light-weight Model Improve Vulnerability Type Identification Performance?

Authors: Hieu Dinh Vo, Son Nguyen

Abstract: Recent advances in automated vulnerability detection have achieved potential results in hel** developers determine vulnerable components. However, after detecting vulnerabilities, investigating to fix vulnerable code is a non-trivial task. In fact, the types of vulnerability, such as buffer overflow or memory corruption, could help developers quickly understand the nature of the weaknesses and l… ▽ More Recent advances in automated vulnerability detection have achieved potential results in hel** developers determine vulnerable components. However, after detecting vulnerabilities, investigating to fix vulnerable code is a non-trivial task. In fact, the types of vulnerability, such as buffer overflow or memory corruption, could help developers quickly understand the nature of the weaknesses and localize vulnerabilities for security analysis. In this work, we investigate the problem of vulnerability type identification (VTI). The problem is modeled as the multi-label classification task, which could be effectively addressed by "pre-training, then fine-tuning" framework with deep pre-trained embedding models. We evaluate the performance of the well-known and advanced pre-trained models for VTI on a large set of vulnerabilities. Surprisingly, their performance is not much better than that of the classical baseline approach with an old-fashioned bag-of-word, TF-IDF. Meanwhile, these deep neural network approaches cost much more resources and require GPU. We also introduce a lightweight independent component to refine the predictions of the baseline approach. Our idea is that the types of vulnerabilities could strongly correlate to certain code tokens (distinguishing tokens) in several crucial parts of programs. The distinguishing tokens for each vulnerability type are statistically identified based on their prevalence in the type versus the others. Our results show that the baseline approach enhanced by our component can outperform the state-of-the-art deep pre-trained approaches while retaining very high efficiency. Furthermore, the proposed component could also improve the neural network approaches by up to 92.8% in macro-average F1. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.14418 [pdf, other]

Context-Encoded Code Change Representation for Automated Commit Message Generation

Authors: Thanh Trong Vu, Thanh-Dat Do, Hieu Dinh Vo

Abstract: Changes in source code are an inevitable part of software development. They are the results of indispensable activities such as fixing bugs or improving functionality. Descriptions for code changes (commit messages) help people better understand the changes. However, due to a lack of motivation and time pressure, writing high-quality commit messages remains reluctantly considered. Several methods… ▽ More Changes in source code are an inevitable part of software development. They are the results of indispensable activities such as fixing bugs or improving functionality. Descriptions for code changes (commit messages) help people better understand the changes. However, due to a lack of motivation and time pressure, writing high-quality commit messages remains reluctantly considered. Several methods have been proposed with the aim of automated commit message generation. However, the existing methods are still limited because they only utilise either the changed code or the changed code combined with surrounding statements. This paper proposes a method to represent code changes by combining the changed code and the unchanged code which have program dependence on the changed code. This method overcomes the limitations of current representations while improving the performance of 5/6 of state-of-the-art commit message generation methods by up to 15% in METEOR, 14% in ROUGE-L, and 10% in BLEU-4. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: 16 pages

arXiv:2306.06620 [pdf, other]

ARIST: An Effective API Argument Recommendation Approach

Authors: Son Nguyen, Cuong Tran Manh, Kien T. Tran, Tan M. Nguyen, Thu-Trang Nguyen, Kien-Tuan Ngo, Hieu Dinh Vo

Abstract: Learning and remembering to use APIs are difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers'… ▽ More Learning and remembering to use APIs are difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers' expectations when they define and use API methods. To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context. In ARIST, the LMs and the recommending features are used to suggest the promising candidates identified by PA. Meanwhile, PA navigates the LMs and the features working on the set of the valid candidates which satisfy syntax, accessibility, and type-compatibility constraints defined by the programming language in use. Our evaluation on a large dataset of real-world projects shows that ARIST improves the state-of-the-art approach by 19% and 18% in top-1 precision and recall for recommending arguments of frequently-used libraries. For general argument recommendation task, i.e., recommending arguments for every method call, ARIST outperforms the baseline approaches by up to 125% top-1 accuracy. Moreover, for newly-encountered projects, ARIST achieves more than 60% top-3 accuracy when evaluating on a larger dataset. For working/maintaining projects, with a personalized LM to capture developers' coding practice, ARIST can productively rank the expected arguments at the top-1 position in 7/10 requests. △ Less

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2304.08396 [pdf, other]

Code-centric Learning-based Just-In-Time Vulnerability Detection

Authors: Son Nguyen, Thu-Trang Nguyen, Thanh Trong Vu, Thanh-Dat Do, Kien-Tuan Ngo, Hieu Dinh Vo

Abstract: Attacks against computer systems exploiting software vulnerabilities can cause substantial damage to the cyber-infrastructure of our modern society and economy. To minimize the consequences, it is vital to detect and fix vulnerabilities as soon as possible. Just-in-time vulnerability detection (JIT-VD) discovers vulnerability-prone ("dangerous") commits to prevent them from being merged into sourc… ▽ More Attacks against computer systems exploiting software vulnerabilities can cause substantial damage to the cyber-infrastructure of our modern society and economy. To minimize the consequences, it is vital to detect and fix vulnerabilities as soon as possible. Just-in-time vulnerability detection (JIT-VD) discovers vulnerability-prone ("dangerous") commits to prevent them from being merged into source code and causing vulnerabilities. By JIT-VD, the commits' authors, who understand the commits properly, can review these dangerous commits and fix them if necessary while the relevant modifications are still fresh in their minds. In this paper, we propose CodeJIT, a novel code-centric learning-based approach for just-in-time vulnerability detection. The key idea of CodeJIT is that the meaning of the code changes of a commit is the direct and deciding factor for determining if the commit is dangerous for the code. Based on that idea, we design a novel graph-based representation to represent the semantics of code changes in terms of both code structures and program dependencies. A graph neural network model is developed to capture the meaning of the code changes represented by our graph-based representation and learn to discriminate between dangerous and safe commits. We conducted experiments to evaluate the JIT-VD performance of CodeJIT on a dataset of 20K+ dangerous and safe commits in 506 real-world projects from 1998 to 2022. Our results show that CodeJIT significantly improves the state-of-the-art JIT-VD methods by up to 66% in Recall, 136% in Precision, and 68% in F1. Moreover, CodeJIT correctly classifies nearly 9/10 of dangerous/safe (benign) commits and even detects 69 commits that fix a vulnerability yet produce other issues in source code △ Less

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.07213 [pdf, other]

doi 10.1016/j.rse.2023.113888

Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on Aerial Lidar

Authors: Jamie Tolan, Hung-I Yang, Ben Nosarzewski, Guillaume Couairon, Huy Vo, John Brandt, Justine Spore, Sayantan Majumdar, Daniel Haziza, Janaki Vamaraju, Theo Moutakanni, Piotr Bojanowski, Tracy Johns, Brian White, Tobias Tiecke, Camille Couprie

Abstract: Vegetation structure map** is critical for understanding the global carbon cycle and monitoring nature-based approaches to climate adaptation and mitigation. Repeated measurements of these data allow for the observation of deforestation or degradation of existing forests, natural forest regeneration, and the implementation of sustainable agricultural practices like agroforestry. Assessments of t… ▽ More Vegetation structure map** is critical for understanding the global carbon cycle and monitoring nature-based approaches to climate adaptation and mitigation. Repeated measurements of these data allow for the observation of deforestation or degradation of existing forests, natural forest regeneration, and the implementation of sustainable agricultural practices like agroforestry. Assessments of tree canopy height and crown projected area at a high spatial resolution are also important for monitoring carbon fluxes and assessing tree-based land uses, since forest structures can be highly spatially heterogeneous, especially in agroforestry systems. Very high resolution satellite imagery (less than one meter (1m) Ground Sample Distance) makes it possible to extract information at the tree level while allowing monitoring at a very large scale. This paper presents the first high-resolution canopy height map concurrently produced for multiple sub-national jurisdictions. Specifically, we produce very high resolution canopy height maps for the states of California and Sao Paulo, a significant improvement in resolution over the ten meter (10m) resolution of previous Sentinel / GEDI based worldwide maps of canopy height. The maps are generated by the extraction of features from a self-supervised model trained on Maxar imagery from 2017 to 2020, and the training of a dense prediction decoder against aerial lidar maps. We also introduce a post-processing step using a convolutional network trained on GEDI observations. We evaluate the proposed maps with set-aside validation lidar data as well as by comparing with other remotely sensed maps and field-collected data, and find our model produces an average Mean Absolute Error (MAE) of 2.8 meters and Mean Error (ME) of 0.6 meters. △ Less

Submitted 15 December, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Journal ref: Remote Sensing of Environment 300, 113888, 2024

arXiv:2304.07193 [pdf, other]

DINOv2: Learning Robust Visual Features without Supervision

Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin , et al. (1 additional authors not shown)

Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pr… ▽ More The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels. △ Less

Submitted 2 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.05731 [pdf, other]

SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval

Authors: Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Nhat-Quynh Le-Pham, Huu-Phuc Pham, Trong-Vu Hoang, Quang-Binh Nguyen, Trong-Hieu Nguyen-Mau, Tuan-Luc Huynh, Thanh-Danh Le, Ngoc-Linh Nguyen-Ha, Tuong-Vy Truong-Thuy, Truong Hoai Phong, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran , et al. (9 additional authors not shown)

Abstract: The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this… ▽ More The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this end, we introduce a novel SHREC challenge track that focuses on retrieving relevant 3D animal models from a dataset using sketch queries and expedites accessing 3D models through available sketches. Furthermore, a new dataset named ANIMAR was constructed in this study, comprising a collection of 711 unique 3D animal models and 140 corresponding sketch queries. Our contest requires participants to retrieve 3D models based on complex and detailed sketches. We receive satisfactory results from eight teams and 204 runs. Although further improvement is necessary, the proposed task has the potential to incentivize additional research in the domain of 3D object retrieval, potentially yielding benefits for a wide range of applications. We also provide insights into potential areas of future research, such as improving techniques for feature extraction and matching and creating more diverse datasets to evaluate retrieval performance. https://aichallenge.hcmus.edu.vn/sketchanimar △ Less

Submitted 9 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted to Computers & Graphics (3DOR 2023, Journal track)

arXiv:2212.14353 [pdf, other]

Sheaf-theoretic self-filtering network of low-cost sensors for local air quality monitoring: A causal approach

Authors: Anh-Duy Pham, Chuong Dinh Le, Hoang Viet Pham, Thinh Gia Tran, Dat Thanh Vo, Chau Long Tran, An Dinh Le, Hien Bich Vo

Abstract: Sheaf theory, which is a complex but powerful tool supported by topological theory, offers more flexibility and precision than traditional graph theory when it comes to modeling relationships between multiple features. In the realm of air quality monitoring, this can be incredibly useful in detecting sudden changes in local dust particle density, which can be difficult to accurately measure using… ▽ More Sheaf theory, which is a complex but powerful tool supported by topological theory, offers more flexibility and precision than traditional graph theory when it comes to modeling relationships between multiple features. In the realm of air quality monitoring, this can be incredibly useful in detecting sudden changes in local dust particle density, which can be difficult to accurately measure using commercial instruments. Traditional methods for air quality measurement often rely on calibrating the measurement with public standard instruments or calculating the measurements moving average over a constant period. However, this can lead to an incorrect index at the measurement location, as well as an oversmoothing effect on the signal. In this study, we propose a compact device that uses sheaf theory to detect and count vehicles as a local air quality change-causing factor. By inferring the number of vehicles into the PM2.5 index and propagating it into the recorded PM2.5 index from low-cost air monitoring sensors such as PMS7003 and BME280, we can achieve self-correction in real-time. Plus, the sheaf-theoretic method allows for easy scaling to multiple nodes for further filtering effects. By implementing sheaf theory in air quality monitoring, we can overcome the limitations of traditional methods and provide more accurate and reliable results. △ Less

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.01761 [pdf]

A PM2.5 concentration prediction framework with vehicle tracking system: From cause to effect

Authors: Chuong D. Le, Hoang V. Pham, Duy A. Pham, An D. Le, Hien B. Vo

Abstract: Air pollution is an emerging problem that needs to be solved especially in developed and develo** countries. In Vietnam, air pollution is also a concerning issue in big cities such as Hanoi and Ho Chi Minh cities where air pollution comes mostly from vehicles such as cars and motorbikes. In order to tackle the problem, the paper focuses on develo** a solution that can estimate the emitted PM2.… ▽ More Air pollution is an emerging problem that needs to be solved especially in developed and develo** countries. In Vietnam, air pollution is also a concerning issue in big cities such as Hanoi and Ho Chi Minh cities where air pollution comes mostly from vehicles such as cars and motorbikes. In order to tackle the problem, the paper focuses on develo** a solution that can estimate the emitted PM2.5 pollutants by counting the number of vehicles in the traffic. We first investigated among the recent object detection models and developed our own traffic surveillance system. The observed traffic density showed a similar trend to the measured PM2.5 with a certain lagging in time, suggesting a relation between traffic density and PM2.5. We further express this relationship with a mathematical model which can estimate the PM2.5 value based on the observed traffic density. The estimated result showed a great correlation with the measured PM2.5 plots in the urban area context. △ Less

Submitted 4 December, 2022; originally announced December 2022.

arXiv:2209.12181 [pdf]

Using Multiple Code Representations to Prioritize Static Analysis Warnings

Authors: Thanh Trong Vu, Hieu Dinh Vo

Abstract: In order to ensure the quality of software and prevent attacks from hackers on critical systems, static analysis tools are frequently utilized to detect vulnerabilities in the early development phase. However, these tools often report a large number of warnings with a high false-positive rate, which causes many difficulties for developers. In this paper, we introduce VulRG, a novel approach to add… ▽ More In order to ensure the quality of software and prevent attacks from hackers on critical systems, static analysis tools are frequently utilized to detect vulnerabilities in the early development phase. However, these tools often report a large number of warnings with a high false-positive rate, which causes many difficulties for developers. In this paper, we introduce VulRG, a novel approach to address this problem. Specifically, VulRG predicts and ranks the warnings based on their likelihoods to be true positive. To predict that likelihood, VulRG combines two deep learning models CNN and BiGRU to capture the context of each warning in terms of program syntax, control flow, and program dependence. Our experimental results on a real-world dataset of 6,620 warnings show that VulRG's Recall at Top-50% is 90.9%. This means that using VulRG, 90% of the vulnerabilities can be found by examining only 50% of the warnings. Moreover, at Top-5%, VulRG can improve the state-of-the-art approach by +30% in both Precision and Recall. △ Less

Submitted 26 September, 2022; v1 submitted 25 September, 2022; originally announced September 2022.

Comments: 6 pages, 2 figures, 4 tables

MSC Class: 68N20 ACM Class: D.2.5

arXiv:2209.02415 [pdf, other]

Automatic Infectious Disease Classification Analysis with Concept Discovery

Authors: Elena Sizikova, Joshua Vendrow, Xu Cao, Rachel Grotheer, Jamie Haddock, Lara Kassab, Alona Kryshchenko, Thomas Merkh, R. W. M. A. Madushani, Kenny Moise, Annie Ulichney, Huy V. Vo, Chuntian Wang, Megan Coffee, Kathryn Leonard, Deanna Needell

Abstract: Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmiss… ▽ More Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmission and improve clinical outcomes. In order to understand and trust neural network predictions, analysis of learned representations is necessary. In this work, we argue that automatic discovery of concepts, i.e., human interpretable attributes, allows for a deep understanding of learned information in medical image analysis tasks, generalizing beyond the training labels or protocols. We provide an overview of existing concept discovery approaches in medical image and computer vision communities, and evaluate representative methods on tuberculosis (TB) prediction and monkeypox prediction tasks. Finally, we propose NMFx, a general NMF formulation of interpretability by concept discovery that works in a unified way in unsupervised, weakly supervised, and supervised scenarios. △ Less

Submitted 14 November, 2022; v1 submitted 28 August, 2022; originally announced September 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 13 pages

arXiv:2207.12112 [pdf, other]

Active Learning Strategies for Weakly-supervised Object Detection

Authors: Huy V. Vo, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Jean Ponce

Abstract: Object detectors trained with weak annotations are affordable alternatives to fully-supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using ``box-in-box'' (BiB), a novel active learning… ▽ More Object detectors trained with weak annotations are affordable alternatives to fully-supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using ``box-in-box'' (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly-supervised detectors. Experiments on the VOC07 and COCO benchmarks show that BiB outperforms other active learning techniques and significantly improves the base weakly-supervised detector's performance with only a few fully-annotated images per class. BiB reaches 97% of the performance of fully-supervised Fast RCNN with only 10% of fully-annotated images on VOC07. On COCO, using on average 10 fully-annotated images per class, or equivalently 1% of the training set, BiB also reduces the performance gap (in AP) between the weakly-supervised detector and the fully-supervised Fast RCNN by over 70%, showing a good trade-off between performance and data efficiency. Our code is publicly available at https://github.com/huyvvo/BiB. △ Less

Submitted 25 July, 2022; originally announced July 2022.

Comments: Accepted to European Conference on Computer Vision (ECCV) 2022. Contains 27 pages, 9 tables and 6 figures

arXiv:2205.05194 [pdf, other]

Multiplexed Immunofluorescence Brain Image Analysis Using Self-Supervised Dual-Loss Adaptive Masked Autoencoder

Authors: Son T. Ly, Bai Lin, Hung Q. Vo, Dragan Maric, Badri Roysam, Hien V. Nguyen

Abstract: Reliable large-scale cell detection and segmentation is the fundamental first step to understanding biological processes in the brain. The ability to phenotype cells at scale can accelerate preclinical drug evaluation and system-level brain histology studies. The impressive advances in deep learning offer a practical solution to cell image detection and segmentation. Unfortunately, categorizing ce… ▽ More Reliable large-scale cell detection and segmentation is the fundamental first step to understanding biological processes in the brain. The ability to phenotype cells at scale can accelerate preclinical drug evaluation and system-level brain histology studies. The impressive advances in deep learning offer a practical solution to cell image detection and segmentation. Unfortunately, categorizing cells and delineating their boundaries for training deep networks is an expensive process that requires skilled biologists. This paper presents a novel self-supervised Dual-Loss Adaptive Masked Autoencoder (DAMA) for learning rich features from multiplexed immunofluorescence brain images. DAMA's objective function minimizes the conditional entropy in pixel-level reconstruction and feature-level regression. Unlike existing self-supervised learning methods based on a random image masking strategy, DAMA employs a novel adaptive mask sampling strategy to maximize mutual information and effectively learn brain cell data. To the best of our knowledge, this is the first effort to develop a self-supervised learning method for multiplexed immunofluorescence brain images. Our extensive experiments demonstrate that DAMA features enable superior cell detection, segmentation, and classification performance without requiring many annotations. △ Less

Submitted 30 December, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: Adding new results on multiplexed image data and data efficiency. Pytorch code: https://github.com/hula-ai/DAMA

arXiv:2112.14594 [pdf, other]

The Levy Flight of Cities: Analyzing Social-Economical Trajectories with Auto-Embedding

Authors: Linfang Tian, Kai Zhao, Jiaming Yin, Huy Vo, Weixiong Rao

Abstract: It has been found that human mobility exhibits random patterns following the Levy flight, where human movement contains many short flights and some long flights, and these flights follow a power-law distribution. In this paper, we study the social-economical development trajectories of urban cities. We observe that social-economical movement of cities also exhibit the Levy flight characteristics.… ▽ More It has been found that human mobility exhibits random patterns following the Levy flight, where human movement contains many short flights and some long flights, and these flights follow a power-law distribution. In this paper, we study the social-economical development trajectories of urban cities. We observe that social-economical movement of cities also exhibit the Levy flight characteristics. We collect the social and economical data such as the population, the number of students, GDP and personal income, etc. from several cities. Then we map these urban data into the social and economical factors through a deep-learning embedding method Auto-Encoder. We find that the social-economical factors of these cities can be fitted approximately as a movement pattern of a power-law distribution. We use the Stochastic Multiplicative Processes (SMP) to explain such movement, where in the presence of a boundary constraint, the SMP leads to a power law distribution. It means that the social-economical trajectories of cities also follow a Levy flight pattern, where some years have large changes in terms of social-economical development, and many years have little changes. △ Less

Submitted 29 December, 2021; originally announced December 2021.

arXiv:2112.13341 [pdf, other]

AlertTrap: A study on object detection in remote insects trap monitoring system using on-the-edge deep learning platform

Authors: An D. Le, Duy A. Pham, Dong T. Pham, Hien B. Vo

Abstract: Fruit flies are one of the most harmful insect species to fruit yields. In AlertTrap, implementation of SSD architecture with different state-of-the-art backbone feature extractors such as MobileNetV1 and MobileNetV2 appear to be potential solutions for the real-time detection problem. SSD-MobileNetV1 and SSD-MobileNetV2 perform well and result in [email protected] of 0.957 and 1.0 respectively. YOLOv4-tiny… ▽ More Fruit flies are one of the most harmful insect species to fruit yields. In AlertTrap, implementation of SSD architecture with different state-of-the-art backbone feature extractors such as MobileNetV1 and MobileNetV2 appear to be potential solutions for the real-time detection problem. SSD-MobileNetV1 and SSD-MobileNetV2 perform well and result in [email protected] of 0.957 and 1.0 respectively. YOLOv4-tiny outperforms the SSD family with 1.0 in [email protected]; however, its throughput velocity is slightly slower. △ Less

Submitted 4 March, 2022; v1 submitted 26 December, 2021; originally announced December 2021.

arXiv:2112.11052 [pdf, other]

doi 10.1109/NICS54270.2021.9701541

Predicting Job Titles from Job Descriptions with Multi-label Text Classification

Authors: Hieu Trung Tran, Hanh Hong Phuc Vo, Son T. Luu

Abstract: Finding a suitable job and hunting for eligible candidates are important to job seeking and human resource agencies. With the vast information about job descriptions, employees and employers need assistance to automatically detect job titles based on job description texts. In this paper, we propose the multi-label classification approach for predicting relevant job titles from job description text… ▽ More Finding a suitable job and hunting for eligible candidates are important to job seeking and human resource agencies. With the vast information about job descriptions, employees and employers need assistance to automatically detect job titles based on job description texts. In this paper, we propose the multi-label classification approach for predicting relevant job titles from job description texts, and implement the Bi-GRU-LSTM-CNN with different pre-trained language models to apply for the job titles prediction problem. The BERT with multilingual pre-trained model obtains the highest result by F1-scores on both development and test sets, which are 62.20% on the development set, and 47.44% on the test set. △ Less

Submitted 9 February, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

Comments: Published in the 2021 NAFOSTED Conference on Information and Computer Science (NICS 2021)

arXiv:2111.00640 [pdf, other]

VSEC: Transformer-based Model for Vietnamese Spelling Correction

Authors: Dinh-Truong Do, Ha Thanh Nguyen, Thang Ngoc Bui, Dinh Hieu Vo

Abstract: Spelling error correction is one of topics which have a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the contex… ▽ More Spelling error correction is one of topics which have a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the context if two (or more) spelling mistakes stand near each other. In this paper, we propose a novel method to correct Vietnamese spelling errors. We tackle the problems of mistyped errors and misspelled errors by using a deep learning model. The embedding layer, in particular, is powered by the byte pair encoding technique. The sequence to sequence model based on the Transformer architecture makes our approach different from the previous works on the same problem. In the experiment, we train the model with a large synthetic dataset, which is randomly introduced spelling errors. We test the performance of the proposed method using a realistic dataset. This dataset contains 11,202 human-made misspellings in 9,341 different Vietnamese sentences. The experimental results show that our method achieves encouraging performance with 86.8% errors detected and 81.5% errors corrected, which improves the state-of-the-art approach 5.6% and 2.2%, respectively. △ Less

Submitted 8 November, 2021; v1 submitted 31 October, 2021; originally announced November 2021.

arXiv:2110.03296 [pdf, other]

doi 10.1109/APSEC53868.2021.00040

Ranking Warnings of Static Analysis Tools Using Representation Learning

Authors: Kien-Tuan Ngo, Dinh-Truong Do, Thu-Trang Nguyen, Hieu Dinh Vo

Abstract: Static analysis tools are frequently used to detect potential vulnerabilities in software systems. However, an inevitable problem of these tools is their large number of warnings with a high false positive rate, which consumes time and effort for investigating. In this paper, we present DeFP, a novel method for ranking static analysis warnings. Based on the intuition that warnings which have simil… ▽ More Static analysis tools are frequently used to detect potential vulnerabilities in software systems. However, an inevitable problem of these tools is their large number of warnings with a high false positive rate, which consumes time and effort for investigating. In this paper, we present DeFP, a novel method for ranking static analysis warnings. Based on the intuition that warnings which have similar contexts tend to have similar labels (true positive or false positive), DeFP is built with two BiLSTM models to capture the patterns associated with the contexts of labeled warnings. After that, for a set of new warnings, DeFP can calculate and rank them according to their likelihoods to be true positives (i.e., actual vulnerabilities). Our experimental results on a dataset of 10 real-world projects show that using DeFP, by investigating only 60% of the warnings, developers can find +90% of actual vulnerabilities. Moreover, DeFP improves the state-of-the-art approach 30% in both Precision and Recall. △ Less

Submitted 7 October, 2021; originally announced October 2021.

Comments: Published in Proceedings of the 28th Asia-Pacific Software Engineering Conference (APSEC'21)

arXiv:2109.14279 [pdf, other]

Localizing Objects with Self-Supervised Transformers and no Labels

Authors: Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet, Jean Ponce

Abstract: Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem, that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, does not require any external object proposal nor any exploration of the image collection; it operates on a single image.… ▽ More Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem, that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, does not require any external object proposal nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object discovery task. The code to reproduce our results can be found at https://github.com/valeoai/LOST. △ Less

Submitted 29 September, 2021; originally announced September 2021.

Journal ref: BMVC 2021

arXiv:2109.10156 [pdf, other]

doi 10.1109/TSE.2021.3113859

A Variability Fault Localization Approach for Software Product Lines

Authors: Thu-Trang Nguyen, Kien-Tuan Ngo, Son Nguyen, Hieu Dinh Vo

Abstract: Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to variability of failures. These unexpected behaviors are induced by variability faults which can only be exposed under some combinations of system features. The interaction among these fe… ▽ More Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to variability of failures. These unexpected behaviors are induced by variability faults which can only be exposed under some combinations of system features. The interaction among these features causes the failures of the system. Although localizing bugs in single-system engineering has been studied in-depth, variability fault localization in SPL systems still remains mostly unexplored. In this article, we present VarCop, a novel and effective variability fault localization approach. For an SPL system failed by variability bugs, VarCop isolates suspicious code statements by analyzing the overall test results of the sampled products and their source code. The isolated suspicious statements are the statements related to the interaction among the features which are necessary for the visibility of the bugs in the system. The suspiciousness of each isolated statement is assessed based on both the overall test results of the products containing the statement as well as the detailed results of the test cases executed by the statement in these products. On a large dataset of buggy SPL systems, empirical evaluation shows that VarCop significantly improves two state-of-the-art fault localization techniques by 33% and 50% in ranking the incorrect statements in the systems containing a single bug each. In about two-thirds of the cases, VarCop ranks the buggy statements at the top-3 positions in the resulting lists. For multiple-bug cases, VarCop outperforms the state-of-the-art approaches 2 times and 10 times in the proportion of bugs localized at the top-1 positions. In 22% and 65% of the buggy versions, VarCop correctly ranks at least one bug in a system at the top-1 and top-5 positions. △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: Published in IEEE Transactions on Software Engineering (Early Access)

arXiv:2109.02917 [pdf, other]

Fine-grained Hand Gesture Recognition in Multi-viewpoint Hand Hygiene

Authors: Huy Q. Vo, Tuong Do, Vi C. Pham, Duy Nguyen, An T. Duong, Quang D. Tran

Abstract: This paper contributes a new high-quality dataset for hand gesture recognition in hand hygiene systems, named "MFH". Generally, current datasets are not focused on: (i) fine-grained actions; and (ii) data mismatch between different viewpoints, which are available under realistic settings. To address the aforementioned issues, the MFH dataset is proposed to contain a total of 731147 samples obtaine… ▽ More This paper contributes a new high-quality dataset for hand gesture recognition in hand hygiene systems, named "MFH". Generally, current datasets are not focused on: (i) fine-grained actions; and (ii) data mismatch between different viewpoints, which are available under realistic settings. To address the aforementioned issues, the MFH dataset is proposed to contain a total of 731147 samples obtained by different camera views in 6 non-overlap** locations. Additionally, each sample belongs to one of seven steps introduced by the World Health Organization (WHO). As a minor contribution, inspired by advances in fine-grained image recognition and distribution adaptation, this paper recommends using the self-supervised learning method to handle these preceding problems. The extensive experiments on the benchmarking MFH dataset show that the introduced method yields competitive performance in both the Accuracy and the Macro F1-score. The code and the MFH dataset are available at https://github.com/willogy-team/hand-gesture-recognition-smc2021. △ Less

Submitted 7 September, 2021; originally announced September 2021.

Comments: 6 pages, accepted for oral in IEEE SMC 2021

arXiv:2108.09591 [pdf, other]

doi 10.1109/BHI50953.2021.9508604

Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks

Authors: Hung Q. Vo, Pengyu Yuan, Tiancheng He, Stephen T. C. Wong, Hien V. Nguyen

Abstract: Accurate breast lesion risk estimation can significantly reduce unnecessary biopsies and help doctors decide optimal treatment plans. Most existing computer-aided systems rely solely on mammogram features to classify breast lesions. While this approach is convenient, it does not fully exploit useful information in clinical reports to achieve the optimal performance. Would clinical features signifi… ▽ More Accurate breast lesion risk estimation can significantly reduce unnecessary biopsies and help doctors decide optimal treatment plans. Most existing computer-aided systems rely solely on mammogram features to classify breast lesions. While this approach is convenient, it does not fully exploit useful information in clinical reports to achieve the optimal performance. Would clinical features significantly improve breast lesion classification compared to using mammograms alone? How to handle missing clinical information caused by variation in medical practice? What is the best way to combine mammograms and clinical features? There is a compelling need for a systematic study to address these fundamental questions. This paper investigates several multimodal deep networks based on feature concatenation, cross-attention, and co-attention to combine mammograms and categorical clinical variables. We show that the proposed architectures significantly increase the lesion classification performance (average area under ROC curves from 0.89 to 0.94). We also evaluate the model when clinical variables are missing. △ Less

Submitted 21 August, 2021; originally announced August 2021.

arXiv:2107.04741 [pdf, other]

doi 10.1145/3461001.3473058

Variability Fault Localization: A Benchmark

Authors: Kien-Tuan Ngo, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

Abstract: Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to the variability of failures in SPL systems. These unexpected behaviors are caused by variability faults which can only be exposed under some combinations of system features. Although loc… ▽ More Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to the variability of failures in SPL systems. These unexpected behaviors are caused by variability faults which can only be exposed under some combinations of system features. Although localizing bugs in non-configurable code has been investigated in-depth, variability fault localization in SPL systems still remains mostly unexplored. To approach this challenge, we propose a benchmark for variability fault localization with a large set of 1,570 buggy versions of six SPL systems and baseline variability fault localization performance results. Our hope is to engage the community to propose new and better approaches to the problem of variability fault localization in SPL systems. △ Less

Submitted 21 September, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: Published in Proceedings of the 25th ACM International Systems and Software Product Line Conference - Volume A (SPLC '21)

arXiv:2106.06650 [pdf, other]

Large-Scale Unsupervised Object Discovery

Authors: Huy V. Vo, Elena Sizikova, Cordelia Schmid, Patrick Pérez, Jean Ponce

Abstract: Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis. Through the use of self-supervised features, we also demonstrate the first effective full… ▽ More Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis. Through the use of self-supervised features, we also demonstrate the first effective fully unsupervised pipeline for UOD. Extensive experiments on COCO and OpenImages show that, in the single-object discovery setting where a single prominent object is sought in each image, the proposed LOD (Large-scale Object Discovery) approach is on par with, or better than the state of the art for medium-scale datasets (up to 120K images), and over 37% better than the only other algorithms capable of scaling up to 1.7M images. In the multi-object discovery setting where multiple objects are sought in each image, the proposed LOD is over 14% better in average precision (AP) than all other methods for datasets ranging from 20K to 1.7M images. Using self-supervised features, we also show that the proposed method obtains state-of-the-art UOD performance on OpenImages. Our code is publicly available at https://github.com/huyvvo/LOD. △ Less

Submitted 16 November, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

Comments: Accepted to NeurIPS 2021, 19 pages with supplemental materials

arXiv:2106.01598 [pdf, other]

doi 10.1109/RIVF51545.2021.9642116

Automatically Detecting Cyberbullying Comments on Online Game Forums

Authors: Hanh Hong-Phuc Vo, Hieu Trung Tran, Son T. Luu

Abstract: Online game forums are popular to most of game players. They use it to communicate and discuss the strategy of the game, or even to make friends. However, game forums also contain abusive and harassment speech, disturbing and threatening players. Therefore, it is necessary to automatically detect and remove cyberbullying comments to keep the game forum clean and friendly. We use the Cyberbullying… ▽ More Online game forums are popular to most of game players. They use it to communicate and discuss the strategy of the game, or even to make friends. However, game forums also contain abusive and harassment speech, disturbing and threatening players. Therefore, it is necessary to automatically detect and remove cyberbullying comments to keep the game forum clean and friendly. We use the Cyberbullying dataset collected from World of Warcraft (WoW) and League of Legends (LoL) forums and train classification models to automatically detect whether a comment of a player is abusive or not. The result obtains 82.69% of macro F1-score for LoL forum and 83.86% of macro F1-score for WoW forum by the Toxic-BERT model on the Cyberbullying dataset. △ Less

Submitted 26 December, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: Published in the 2021 RIVF International Conference on Computing and Communication Technologies (RIVF)

arXiv:2105.00994 [pdf, other]

Fleet management for ride-pooling with meeting points at scale: a case study in the five boroughs of New York City

Authors: Motahare Mounesan, Vindula Jayawardana, Yaocheng Wu, Samitha Samaranayake, Huy T. Vo

Abstract: Introducing meeting points to ride-pooling (RP) services has been shown to increase the satisfaction level of both riders and service providers. Passengers may choose to walk to a meeting point for a cost reduction. Drivers may also get matched with more riders without making additional stops. There are economic benefits of using ride-pooling with meeting points (RPMP) compared to the traditional… ▽ More Introducing meeting points to ride-pooling (RP) services has been shown to increase the satisfaction level of both riders and service providers. Passengers may choose to walk to a meeting point for a cost reduction. Drivers may also get matched with more riders without making additional stops. There are economic benefits of using ride-pooling with meeting points (RPMP) compared to the traditional RP services. Many RPMP models have been proposed to better understand their benefits. However, most prior works study RPMP either with a restricted set of parameters or at a small scale due to the expensive computation involved. In this paper, we propose STaRS+, a scalable RPMP framework that is based on a comprehensive integer linear programming model. The high scalability of STaRS+ is achieved by utilizing a heuristic optimization strategy along with a novel shortest-path caching scheme. We applied our model to the NYC metro area to evaluate the scalability of the framework and demonstrate the importance of city-scale simulations. Our results show that city-scale simulations can reveal valuable insights for city planners that are not always visible at smaller scales. To the best of our knowledge, STaRS+ is the first study on the RPMP that can solve large-scale instances on the order of the entire NYC metro area. △ Less

Submitted 25 April, 2021; originally announced May 2021.

arXiv:2104.12245 [pdf, other]

doi 10.5220/0010242303960407

Single Stage Class Agnostic Common Object Detection: A Simple Baseline

Authors: Chuong H. Nguyen, Thuy C. Nguyen, Anh H. Vo, Yamazaki Masayuki

Abstract: This paper addresses the problem of common object detection, which aims to detect objects of similar categories from a set of images. Although it shares some similarities with the standard object detection and co-segmentation, common object detection, recently promoted by \cite{Jiang2019a}, has some unique advantages and challenges. First, it is designed to work on both closed-set and open-set con… ▽ More This paper addresses the problem of common object detection, which aims to detect objects of similar categories from a set of images. Although it shares some similarities with the standard object detection and co-segmentation, common object detection, recently promoted by \cite{Jiang2019a}, has some unique advantages and challenges. First, it is designed to work on both closed-set and open-set conditions, a.k.a. known and unknown objects. Second, it must be able to match objects of the same category but not restricted to the same instance, texture, or posture. Third, it can distinguish multiple objects. In this work, we introduce the Single Stage Common Object Detection (SSCOD) to detect class-agnostic common objects from an image set. The proposed method is built upon the standard single-stage object detector. Furthermore, an embedded branch is introduced to generate the object's representation feature, and their similarity is measured by cosine distance. Experiments are conducted on PASCAL VOC 2007 and COCO 2014 datasets. While being simple and flexible, our proposed SSCOD built upon ATSSNet performs significantly better than the baseline of the standard object detection, while still be able to match objects of unknown categories. Our source code can be found at \href{https://github.com/cybercore-co-ltd/Single-Stage-Common-Object-Detection}{(URL)} △ Less

Submitted 25 April, 2021; originally announced April 2021.

Comments: This paper is accepted to International Conference on Pattern Recognition Applications and Methods (ICPRAM) 2021

Report number: ISBN 978-989-758-486-2 ISSN 2184-4313, pages 396-407

arXiv:2007.02662 [pdf, other]

Toward unsupervised, multi-object discovery in large-scale image collections

Authors: Huy V. Vo, Patrick Pérez, Jean Ponce

Abstract: This paper addresses the problem of discovering the objects present in a collection of images without any supervision. We build on the optimization approach of Vo et al. (CVPR'19) with several key novelties: (1) We propose a novel saliency-based region proposal algorithm that achieves significantly higher overlap with ground-truth objects than other competitive methods. This procedure leverages of… ▽ More This paper addresses the problem of discovering the objects present in a collection of images without any supervision. We build on the optimization approach of Vo et al. (CVPR'19) with several key novelties: (1) We propose a novel saliency-based region proposal algorithm that achieves significantly higher overlap with ground-truth objects than other competitive methods. This procedure leverages off-the-shelf CNN features trained on classification tasks without any bounding box information, but is otherwise unsupervised. (2) We exploit the inherent hierarchical structure of proposals as an effective regularizer for the approach to object discovery of Vo et al., boosting its performance to significantly improve over the state of the art on several standard benchmarks. (3) We adopt a two-stage strategy to select promising proposals using small random sets of images before using the whole image collection to discover the objects it depicts, allowing us to tackle, for the first time (to the best of our knowledge), the discovery of multiple objects in each one of the pictures making up datasets with up to 20,000 images, an over five-fold increase compared to existing methods, and a first step toward true large-scale unsupervised image interpretation. △ Less

Submitted 25 August, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: Accepted for publication in European Conference on Computer Vision (ECCV) 2020

arXiv:1906.09995 [pdf]

doi 10.1109/TBDATA.2019.2907987

AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data -- Accepted Version

Authors: Nguyen Ho, Huy Vo, Mai Vu, Torben Bach Pedersen

Abstract: Recent development in computing, sensing and crowd-sourced data have resulted in an explosion in the availability of quantitative information. The possibilities of analyzing this so-called Big Data to inform research and the decision-making process are virtually endless. In general, analyses have to be done across multiple data sets in order to bring out the most value of Big Data. A first importa… ▽ More Recent development in computing, sensing and crowd-sourced data have resulted in an explosion in the availability of quantitative information. The possibilities of analyzing this so-called Big Data to inform research and the decision-making process are virtually endless. In general, analyses have to be done across multiple data sets in order to bring out the most value of Big Data. A first important step is to identify temporal correlations between data sets. Given the characteristics of Big Data in terms of volume and velocity, techniques that identify correlations not only need to be fast and scalable, but also need to help users in ordering the correlations across temporal scales so that they can focus on important relationships. In this paper, we present AMIC (Adaptive Mutual Information-based Correlation), a method based on mutual information to identify correlations at multiple temporal scales in large time series. Discovered correlations are suggested to users in an order based on the strength of the relationships. Our method supports an adaptive streaming technique that minimizes duplicated computation and is implemented on top of Apache Spark for scalability. We also provide a comprehensive evaluation on the effectiveness and the scalability of AMIC using both synthetic and real-world data sets. △ Less

Submitted 7 July, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

arXiv:1904.03148 [pdf, other]

Unsupervised Image Matching and Object Discovery as Optimization

Authors: Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann LeCun, Patrick Perez, Jean Ponce

Abstract: Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an important field of research. In computer vision, unsupervised learning comes in various guises. We focus here on the unsupervised discovery and matching of object… ▽ More Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an important field of research. In computer vision, unsupervised learning comes in various guises. We focus here on the unsupervised discovery and matching of object categories among images in a collection, following the work of Cho et al. 2015. We show that the original approach can be reformulated and solved as a proper optimization problem. Experiments on several benchmarks establish the merit of our approach. △ Less

Submitted 5 April, 2019; originally announced April 2019.

Comments: Accepted to CVPR 2019

arXiv:1904.02062 [pdf]

An Ensemble Deep Learning Model for Drug Abuse Detection in Sparse Twitter-Sphere

Authors: Han Hu, NhatHai Phan, James Geller, Stephen Iezzi, Huy Vo, De**g Dou, Soon Ae Chun

Abstract: As the problem of drug abuse intensifies in the U.S., many studies that primarily utilize social media data, such as postings on Twitter, to study drug abuse-related activities use machine learning as a powerful tool for text classification and filtering. However, given the wide range of topics of Twitter users, tweets related to drug abuse are rare in most of the datasets. This imbalanced data re… ▽ More As the problem of drug abuse intensifies in the U.S., many studies that primarily utilize social media data, such as postings on Twitter, to study drug abuse-related activities use machine learning as a powerful tool for text classification and filtering. However, given the wide range of topics of Twitter users, tweets related to drug abuse are rare in most of the datasets. This imbalanced data remains a major issue in building effective tweet classifiers, and is especially obvious for studies that include abuse-related slang terms. In this study, we approach this problem by designing an ensemble deep learning model that leverages both word-level and character-level features to classify abuse-related tweets. Experiments are reported on a Twitter dataset, where we can configure the percentages of the two classes (abuse vs. non abuse) to simulate the data imbalance with different amplitudes. Results show that our ensemble deep learning models exhibit better performance than ensembles of traditional machine learning models, especially on heavily imbalanced datasets. △ Less

Submitted 3 April, 2019; originally announced April 2019.

Comments: The 17th World Congress of Medical and Health Informatics [MedInfo 2019]

arXiv:1803.10348 [pdf, other]

doi 10.1145/3240508.3240678

Structural inpainting

Authors: Huy V. Vo, Ngoc Q. K. Duong, Patrick Perez

Abstract: Scene-agnostic visual inpainting remains very challenging despite progress in patch-based methods. Recently, Pathak et al. 2016 have introduced convolutional "context encoders" (CEs) for unsupervised feature learning through image completion tasks. With the additional help of adversarial training, CEs turned out to be a promising tool to complete complex structures in real inpainting problems. In… ▽ More Scene-agnostic visual inpainting remains very challenging despite progress in patch-based methods. Recently, Pathak et al. 2016 have introduced convolutional "context encoders" (CEs) for unsupervised feature learning through image completion tasks. With the additional help of adversarial training, CEs turned out to be a promising tool to complete complex structures in real inpainting problems. In the present paper we propose to push further this key ability by relying on perceptual reconstruction losses at training time. We show on a wide variety of visual scenes the merit of the approach for structural inpainting, and confirm it through a user study. Combined with the optimization-based refinement of Yang et al. 2016 with neural patches, our context encoder opens up new opportunities for prior-free visual inpainting. △ Less

Submitted 27 March, 2018; originally announced March 2018.

arXiv:1306.4411 [pdf, other]

Event-Object Reasoning with Curated Knowledge Bases: Deriving Missing Information

Authors: Chitta Baral, Nguyen H. Vo

Abstract: The broader goal of our research is to formulate answers to why and how questions with respect to knowledge bases, such as AURA. One issue we face when reasoning with many available knowledge bases is that at times needed information is missing. Examples of this include partially missing information about next sub-event, first sub-event, last sub-event, result of an event, input to an event, desti… ▽ More The broader goal of our research is to formulate answers to why and how questions with respect to knowledge bases, such as AURA. One issue we face when reasoning with many available knowledge bases is that at times needed information is missing. Examples of this include partially missing information about next sub-event, first sub-event, last sub-event, result of an event, input to an event, destination of an event, and raw material involved in an event. In many cases one can recover part of the missing knowledge through reasoning. In this paper we give a formal definition about how such missing information can be recovered and then give an ASP implementation of it. We then discuss the implication of this with respect to answering why and how questions. △ Less

Submitted 19 June, 2013; v1 submitted 18 June, 2013; originally announced June 2013.

Comments: 13 pages

arXiv:1207.0140 [pdf, other]

LogBase: A Scalable Log-structured Database System in the Cloud

Authors: Hoang Tam Vo, Sheng Wang, Divyakant Agrawal, Gang Chen, Beng Chin Ooi

Abstract: Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead-logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overheads… ▽ More Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead-logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overheads observed in write-heavy environments and hence adversely affects the write throughput and recovery time in the system. In this paper, we introduce LogBase - a scalable log-structured database system that adopts log-only storage for removing the write bottleneck and supporting fast system recovery. LogBase is designed to be dynamically deployed on commodity clusters to take advantage of elastic scaling property of cloud environments. LogBase provides in-memory multiversion indexes for supporting efficient access to data maintained in the log. LogBase also supports transactions that bundle read and write operations spanning across multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase is able to provide sustained write throughput, efficient data access out of the cache, and effective system recovery. △ Less

Submitted 30 June, 2012; originally announced July 2012.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 10, pp. 1004-1015 (2012)

arXiv:1109.1618 [pdf]

doi 10.1007/978-3-642-29262-0_8

An analysis of Twitter messages in the 2011 Tohoku Earthquake

Authors: Son Doan, Bao-Khanh Ho Vo, Nigel Collier

Abstract: Social media such as Facebook and Twitter have proven to be a useful resource to understand public opinion towards real world events. In this paper, we investigate over 1.5 million Twitter messages (tweets) for the period 9th March 2011 to 31st May 2011 in order to track awareness and anxiety levels in the Tokyo metropolitan district to the 2011 Tohoku Earthquake and subsequent tsunami and nuclear… ▽ More Social media such as Facebook and Twitter have proven to be a useful resource to understand public opinion towards real world events. In this paper, we investigate over 1.5 million Twitter messages (tweets) for the period 9th March 2011 to 31st May 2011 in order to track awareness and anxiety levels in the Tokyo metropolitan district to the 2011 Tohoku Earthquake and subsequent tsunami and nuclear emergencies. These three events were tracked using both English and Japanese tweets. Preliminary results indicated: 1) close correspondence between Twitter data and earthquake events, 2) strong correlation between English and Japanese tweets on the same events, 3) tweets in the native language play an important roles in early warning, 4) tweets showed how quickly Japanese people's anxiety returned to normal levels after the earthquake event. Several distinctions between English and Japanese tweets on earthquake events are also discussed. The results suggest that Twitter data can be used as a useful resource for tracking the public mood of populations affected by natural disasters as well as an early warning system. △ Less

Submitted 7 September, 2011; originally announced September 2011.

Comments: 9 pages, 4 figures, eHealth 2011 conference, Malaga (Spain) (accepted)

Journal ref: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2012, Volume 91, Part 4, 58-66

Showing 1–49 of 49 results for author: Vo, H