Search | arXiv e-print repository

arXiv:2406.19871 [pdf, other]

Koopman based trajectory model and computation offloading for high mobility paradigm in ISAC enabled IoT system

Abstract: User experience on mobile devices is constrained by limited battery capacity and processing power, but 6G technology advancements are diving rapidly into mobile technical evolution. Mobile edge computing (MEC) offers a solution, offloading computationally intensive tasks to edge cloud servers, reducing battery drain compared to local processing. The upcoming integrated sensing and communication in mobile communication may improve the trajectory prediction and processing delays. This study proposes a greedy resource allocation optimization strategy for multi-user networks to minimize aggregate energy usage. Numerical results show a potential improvement of 33% for every 1000 iterations. Addressing prediction model division and velocity accuracy issues is crucial for better results. A plan for further improvement and achieving objectives is outlined for the upcoming work phase.

Submitted 28 June, 2024; originally announced June 2024.

MSC Class: 52-08 ACM Class: C.2

arXiv:2406.14819 [pdf, other]

SAM-EG: Segment Anything Model with Edge Guidance framework for efficient Polyp Segmentation

Authors: Quoc-Huy Trinh, Hai-Dang Nguyen, Bao-Tram Nguyen Ngoc, Debesh Jha, Ulas Bagci, Minh-Triet Tran

Abstract: Polyp segmentation, a critical concern in medical imaging, has prompted numerous proposed methods aimed at enhancing the quality of segmented masks. While current state-of-the-art techniques produce impressive results, the size and computational cost of these models pose challenges for practical industry applications. Recently, the Segment Anything Model (SAM) has been proposed as a robust foundation model, showing promise for adaptation to medical image segmentation. Inspired by this concept, we propose SAM-EG, a framework that guides small segmentation models for polyp segmentation to address the computation cost challenge. Additionally, in this study, we introduce the Edge Guiding module, which integrates edge information into image features to assist the segmentation model in addressing the boundary issues of current segmentation models in this task. Through extensive experiments, our small models showcase their efficacy by achieving competitive results with state-of-the-art methods, offering a promising approach to developing compact models with high accuracy for polyp segmentation and in the broader field of medical imaging.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.12710 [pdf, other]

doi 10.1145/3634737.3637636

What is in the Chrome Web Store? Investigating Security-Noteworthy Browser Extensions

Authors: Sheryl Hsu, Manda Tran, Aurore Fass

Abstract: This paper is the first attempt at providing a holistic view of the Chrome Web Store (CWS). We leverage historical data provided by ChromeStats to study global trends in the CWS and security implications. We first highlight the extremely short life cycles of extensions: roughly 60% of extensions stay in the CWS for one year. Second, we define and show that Security-Noteworthy Extensions (SNE) are a significant issue: they pervade the CWS for years and affect almost 350 million users. Third, we identify clusters of extensions with a similar code base. We discuss how code similarity techniques could be used to flag suspicious extensions. By developing an approach to extract URLs from extensions' comments, we show that extensions reuse code snippets from public repositories or forums, leading to the propagation of dated code and vulnerabilities. Finally, we underline a critical lack of maintenance in the CWS: 60% of the extensions in the CWS have never been updated; half of the extensions known to be vulnerable are still in the CWS and still vulnerable 2 years after disclosure; a third of extensions use vulnerable library versions. We believe that these issues should be widely known in order to pave the way for a more secure CWS.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Published in ACM AsiaCCS 2024

Journal ref: ACM AsiaCCS 2024

arXiv:2406.11146 [pdf, other]

Designing Interactions with Autonomous Physical Systems

Authors: Marius Hoggenmueller, Tram Thi Minh Tran, Luke Hespanhol, Martin Tomitsch

Abstract: In this position paper, we present a collection of four different prototyping approaches which we have developed and applied to prototype and evaluate interfaces for and interactions around autonomous physical systems. Further, we provide a classification of our approaches aiming to support other researchers and designers in choosing appropriate prototyping platforms and representations.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.09837 [pdf, other]

TabularFM: An Open Framework For Tabular Foundational Models

Authors: Quan M. Tran, Suong N. Hoang, Lam M. Nguyen, Dzung Phan, Hoang Thanh Lam

Abstract: Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, like time-series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs for various tabular data tasks. In response to this gap, we introduce a framework called TabularFM, which incorporates state-of-the-art methods for developing FMs specifically for tabular data. This includes variations of neural architectures such as GANs, VAEs, and Transformers. We have curated a million tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future.

Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08735 [pdf, other]

doi 10.1145/3411764.3445159

Context-Based Interface Prototyping: Understanding the Effect of Prototype Representation on User Feedback

Authors: Marius Hoggenmueller, Martin Tomitsch, Luke Hespanhol, Tram Thi Minh Tran, Stewart Worrall, Eduardo Nebot

Abstract: The rise of autonomous systems in cities, such as automated vehicles (AVs), requires new approaches for prototyping and evaluating how people interact with those systems through context-based user interfaces, such as external human-machine interfaces (eHMIs). In this paper, we present a comparative study of three prototype representations (real-world VR, computer-generated VR, real-world video) of an eHMI in a mixed-methods study with 42 participants. Quantitative results show that while the real-world VR representation results in higher sense of presence, no significant differences in user experience and trust towards the AV itself were found. However, interview data shows that participants focused on different experiential and perceptual aspects in each of the prototype representations. These differences are linked to spatial awareness and perceived realism of the AV behaviour and its context, affecting in turn how participants assess trust and the eHMI. The paper offers guidelines for prototyping and evaluating context-based interfaces through simulations.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.00307 [pdf, other]

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Authors: Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le

Abstract: Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning disobeys the natural perception that humans do in first-person perspective, leading to a lack of reasoning interpretation; and (2) learning is limited in capturing inherent fine-grained relationships between two modalities. In this paper, we take inspiration from human perception and explore a compositional approach for egocentric video representation. We introduce HENASY (Hierarchical ENtities ASsemblY), which includes a spatiotemporal token grouping mechanism to explicitly assemble dynamically evolving scene entities through time and model their relationship for video representation. By leveraging compositional structure understanding, HENASY possesses strong interpretability via visual grounding with free-form text queries. We further explore a suite of multi-grained contrastive losses to facilitate entity-centric understandings. This comprises three alignment types: video-narration, noun-entity, and verb-entities alignments. Our method demonstrates strong interpretability in both quantitative and qualitative experiments, while maintaining competitive performances on five downstream tasks via zero-shot transfer or as video/text representation, including video/text retrieval, action recognition, multi-choice query, natural language query, and moments query.

Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: under submission

arXiv:2405.17926 [pdf, other]

SarcNet: A Novel AI-based Framework to Automatically Analyze and Score Sarcomere Organizations in Fluorescently Tagged hiPSC-CMs

Authors: Huyen Le, Khiet Dang, Tien Lai, Nhung Nguyen, Mai Tran, Hieu Pham

Abstract: Quantifying sarcomere structure organization in human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) is crucial for understanding cardiac disease pathology, improving drug screening, and advancing regenerative medicine. Traditional methods, such as manual annotation and Fourier transform analysis, are labor-intensive, error-prone, and lack high-throughput capabilities. In this study, we present a novel deep learning-based framework that leverages cell images and integrates cell features to automatically evaluate the sarcomere structure of hiPSC-CMs from the onset of differentiation. This framework overcomes the limitations of traditional methods through automated, high-throughput analysis, providing consistent, reliable results while accurately detecting complex sarcomere patterns across diverse samples. The proposed framework contains SarcNet, a ResNet-18 module with added linear layers, which outputs a continuous score ranging from one to five that captures the level of sarcomere structure organization. It is trained and validated on an open-source dataset of hiPSC-CM images with the endogenously GFP-tagged alpha-actinin-2 structure developed by the Allen Institute for Cell Science (AICS). SarcNet achieves a Spearman correlation of 0.831 with expert evaluations, demonstrating superior performance and an improvement of 0.075 over the current state-of-the-art approach, which uses Linear Regression. Our results also show a consistent pattern of increasing organization from day 18 to day 32 of differentiation, aligning with expert evaluations. By integrating the quantitative features calculated directly from the images with the visual features learned by the deep learning model, our framework offers a more comprehensive and accurate assessment, thereby enhancing the further utility of hiPSC-CMs in medical research and therapy development.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.14608 [pdf, other]

ShapeFormer: Shapelet Transformer for Multivariate Time Series Classification

Authors: Xuan-May Le, Ling Luo, Uwe Aickelin, Minh-Tuan Tran

Abstract: Multivariate time series classification (MTSC) has attracted significant research attention due to its diverse real-world applications. Recently, exploiting transformers for MTSC has achieved state-of-the-art performance. However, existing methods focus on generic features, providing a comprehensive understanding of data, but they ignore class-specific features crucial for learning the representative characteristics of each class. This leads to poor performance in the case of imbalanced datasets or datasets with similar overall patterns but differing in minor class-specific details. In this paper, we propose a novel Shapelet Transformer (ShapeFormer), which comprises class-specific and generic transformer modules to capture both of these features. In the class-specific module, we introduce the discovery method to extract the discriminative subsequences of each class (i.e. shapelets) from the training set. We then propose a Shapelet Filter to learn the difference features between these shapelets and the input time series. We found that the difference feature for each shapelet contains important class-specific features, as it shows a significant distinction between its class and others. In the generic module, convolution filters are used to extract generic features that contain information to distinguish among all classes. For each module, we employ the transformer encoder to capture the correlation between their features. As a result, the combination of two transformer modules allows our model to exploit the power of both types of features, thereby enhancing the classification performance. Our experiments on 30 UEA MTSC datasets demonstrate that ShapeFormer has achieved the highest accuracy ranking compared to state-of-the-art methods. The code is available at https://github.com/xuanmay2701/shapeformer.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted at KDD 2024

arXiv:2405.05068 [pdf, other]

Chemistry Beyond Exact Solutions on a Quantum-Centric Supercomputer

Authors: Javier Robledo-Moreno, Mario Motta, Holger Haas, Ali Javadi-Abhari, Petar Jurcevic, William Kirby, Simon Martiel, Kunal Sharma, Sandeep Sharma, Tomonori Shirakawa, Iskandar Sitdikov, Rong-Yang Sun, Kevin J. Sung, Maika Takita, Minh C. Tran, Seiji Yunoki, Antonio Mezzacapo

Abstract: A universal quantum computer can be used as a simulator capable of predicting properties of diverse quantum systems. Electronic structure problems in chemistry offer practical use cases around the hundred-qubit mark. This appears promising since current quantum processors have reached these sizes. However, mapping these use cases onto quantum computers yields deep circuits, and for pre-fault-tolerant quantum processors, the large number of measurements to estimate molecular energies leads to prohibitive runtimes. As a result, realistic chemistry is out of reach of current quantum computers in isolation. A natural question is whether classical distributed computation can relieve quantum processors from parsing all but a core, intrinsically quantum component of a chemistry workflow. Here, we incorporate quantum computations of chemistry in a quantum-centric supercomputing architecture, using up to 6400 nodes of the supercomputer Fugaku to assist a Heron superconducting quantum processor. We simulate the N$_2$ triple bond breaking in a correlation-consistent cc-pVDZ basis set, and the active-space electronic structure of [2Fe-2S] and [4Fe-4S] clusters, using 58, 45 and 77 qubits respectively, with quantum circuits of up to 10570 (3590 2-qubit) quantum gates. We obtain our results using a class of quantum circuits that approximates molecular eigenstates, and a hybrid estimator. The estimator processes quantum samples, produces upper bounds to the ground-state energy and wavefunctions supported on a polynomial number of states. This guarantees an unconditional quality metric for quantum advantage, certifiable by classical computers at polynomial cost. For current error rates, our results show that classical distributed computing coupled to quantum processors can produce good approximate solutions for practical problems beyond sizes amenable to exact diagonalization.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04489 [pdf, other]

S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling

Authors: Minh Tran, Adrian De Luis, Haitao Liao, Ying Huang, Roy McCann, Alan Mantooth, Jack Cothren, Ngan Le

Abstract: As the impact of climate change escalates, the global necessity to transition to sustainable energy sources becomes increasingly evident. Renewable energies have emerged as a viable solution for users, with Photovoltaic energy being a favored choice for small installations due to its reliability and efficiency. Accurate mapping of PV installations is crucial for understanding the extent of their adoption and informing energy policy. To meet this need, we introduce S3Former, designed to segment solar panels from aerial imagery and provide size and location information critical for analyzing the impact of such installations on the grid. Solar panel identification is challenging due to factors such as varying weather conditions, roof characteristics, Ground Sampling Distance variations and lack of appropriate initialization weights for optimized training. To tackle these complexities, S3Former features a Masked Attention Mask Transformer incorporating a self-supervised learning pretrained backbone. Specifically, our model leverages low-level and high-level features extracted from the backbone and incorporates an instance query mechanism on the Transformer architecture to enhance the localization of solar PV installations. We introduce a self-supervised learning phase (pretext task) to improve the initialization weights on the backbone of S3Former. We evaluated S3Former using diverse datasets and demonstrate improvement over state-of-the-art models.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2404.18705 [pdf, other]

Wireless Information and Energy Transfer in the Era of 6G Communications

Authors: Constantinos Psomas, Konstantinos Ntougias, Nikita Shanin, Dongfang Xu, Kenneth MacSporran Mayer, Nguyen Minh Tran, Laura Cottatellucci, Kae Won Choi, Dong In Kim, Robert Schober, Ioannis Krikidis

Abstract: Wireless information and energy transfer (WIET) represents an emerging paradigm which employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting the quality-of-service demands of WIET, in terms of both data transfer and power delivery, requires effective co-design of the information and energy signals. In this article, we present the main principles and design aspects of WIET, focusing on its integration in 6G networks. First, we discuss how conventional communication notions such as resource allocation and waveform design need to be revisited in the context of WIET. Next, we consider various candidate 6G technologies that can boost WIET efficiency, namely, holographic multiple-input multiple-output, near-field beamforming, terahertz communication, intelligent reflecting surfaces (IRSs), and reconfigurable (fluid) antenna arrays. We introduce respective WIET design methods, analyze the promising performance gains of these WIET systems, and discuss challenges, open issues, and future research directions. Finally, a near-field energy beamforming scheme and a power-based IRS beamforming algorithm are experimentally validated using a wireless energy transfer testbed. The vision of WIET in communication systems has been gaining momentum in recent years, with constant progress with respect to theoretical but also practical aspects. The comprehensive overview of the state of the art of WIET presented in this paper highlights the potentials of WIET systems as well as their overall benefits in 6G networks.

Submitted 16 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Proceedings of the IEEE, 36 pages, 33 figures

arXiv:2404.11429 [pdf, other]

CarcassFormer: An End-to-end Transformer-based Framework for Simultaneous Localization, Segmentation and Classification of Poultry Carcass Defect

Authors: Minh Tran, Sang Truong, Arthur F. A. Fernandes, Michael T. Kidd, Ngan Le

Abstract: In the food industry, assessing the quality of poultry carcasses during processing is a crucial step. This study proposes an effective approach for automating the assessment of carcass quality without requiring skilled labor or inspector involvement. The proposed system is based on machine learning (ML) and computer vision (CV) techniques, enabling automated defect detection and carcass quality assessment. To this end, an end-to-end framework called CarcassFormer is introduced. It is built upon a Transformer-based architecture designed to effectively extract visual representations while simultaneously detecting, segmenting, and classifying poultry carcass defects. Our proposed framework is capable of analyzing imperfections resulting from production and transport welfare issues, as well as processing plant stunner, scalder, picker, and other equipment malfunctions. To benchmark the framework, a dataset of 7,321 images was initially acquired, which contained both single and multiple carcasses per image. In this study, the performance of the CarcassFormer system is compared with other state-of-the-art (SOTA) approaches for classification, detection, and segmentation tasks. Through extensive quantitative experiments, our framework consistently outperforms existing methods, demonstrating remarkable improvements across various evaluation metrics such as AP, AP@50, and AP@75. Furthermore, the qualitative results highlight the strengths of CarcassFormer in capturing fine details, including feathers, and accurately localizing and segmenting carcasses with high precision. To facilitate further research and collaboration, the pre-trained model and source code of CarcassFormer are available for research purposes at: https://github.com/UARK-AICV/CarcassFormer.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted to Poultry Science Journal

arXiv:2404.08590 [pdf, other]

Improving Referring Image Segmentation using Vision-Aware Text Features

Authors: Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung

Abstract: Referring image segmentation is a challenging task that involves generating pixel-wise segmentation masks based on natural language descriptions. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. This over-reliance on visual features can lead to suboptimal results, especially in complex scenarios where text prompts are ambiguous or context-dependent. To overcome these challenges, we present a novel framework VATEX to improve referring image segmentation by enhancing object and context understanding with Vision-Aware Text Feature. Our method involves using CLIP to derive a CLIP Prior that integrates an object-centric visual heatmap with text description, which can be used as the initial query in DETR-based architecture for the segmentation task. Furthermore, by observing that there are multiple ways to describe an instance in an image, we enforce feature similarity between text variations referring to the same visual input by two components: a novel Contextual Multimodal Decoder that turns text embeddings into vision-aware text features, and a Meaning Consistency Constraint to further ensure the coherent and consistent interpretation of language expressions with the context understanding obtained from the image. Our method achieves a significant performance improvement on three benchmark datasets RefCOCO, RefCOCO+ and G-Ref. Code is available at: https://nero1342.github.io/VATEX_RIS.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 30 pages including supplementary

arXiv:2404.04564 [pdf, other]

Enhancing Video Summarization with Context Awareness

Authors: Hai-Dang Huynh-Lam, Ngoc-Phuong Ho-Thi, Minh-Triet Tran, Trung-Nghia Le

Abstract: Video summarization is a crucial research area that aims to efficiently browse and retrieve relevant information from the vast amount of video content available today. With the exponential growth of multimedia data, the ability to extract meaningful representations from videos has become essential. Video summarization techniques automatically generate concise summaries by selecting keyframes, shots, or segments that capture the video's essence. This process improves the efficiency and accuracy of various applications, including video surveillance, education, entertainment, and social media. Despite the importance of video summarization, there is a lack of diverse and representative datasets, hindering comprehensive evaluation and benchmarking of algorithms. Existing evaluation metrics also fail to fully capture the complexities of video summarization, limiting accurate algorithm assessment and hindering the field's progress. To overcome data scarcity challenges and improve evaluation, we propose an unsupervised approach that leverages video data structure and information for generating informative summaries. By moving away from fixed annotations, our framework can produce representative summaries effectively. Moreover, we introduce an innovative evaluation pipeline tailored specifically for video summarization. Human participants are involved in the evaluation, comparing our generated summaries to ground truth summaries and assessing their informativeness. This human-centric approach provides valuable insights into the effectiveness of our proposed techniques. Experimental results demonstrate that our training-free framework outperforms existing unsupervised approaches and achieves competitive results compared to state-of-the-art supervised methods.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 115 pages, 1 supplementary paper, undergraduate thesis report at US-VNUHCM

arXiv:2404.04511 [pdf, other]

doi 10.1007/978-981-97-0376-0_2

Cluster-based Video Summarization with Temporal Context Awareness

Authors: Hai-Dang Huynh-Lam, Ngoc-Phuong Ho-Thi, Minh-Triet Tran, Trung-Nghia Le

Abstract: In this paper, we present TAC-SUM, a novel and efficient training-free approach for video summarization that addresses the limitations of existing cluster-based models by incorporating temporal context. Our method partitions the input video into temporally consecutive segments with clustering information, enabling the injection of temporal awareness into the clustering process, setting it apart from prior cluster-based summarization methods. The resulting temporal-aware clusters are then utilized to compute the final summary, using simple rules for keyframe selection and frame importance scoring. Experimental results on the SumMe dataset demonstrate the effectiveness of our proposed approach, outperforming existing unsupervised methods and achieving comparable performance to state-of-the-art supervised summarization techniques. Our source code is available for reference at https://github.com/hcmus-thesis-gulu/TAC-SUM.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures, accepted in PSIVT 2023

arXiv:2404.02966 [pdf, other]

Hamiltonian Simulation in the Interaction Picture Using the Magnus Expansion

Authors: Kunal Sharma, Minh C. Tran

Abstract: We propose an algorithm for simulating the dynamics of a geometrically local Hamiltonian $A$ under a small geometrically local perturbation $αB$. In certain regimes, the algorithm achieves the optimal scaling and outperforms the state-of-the-art algorithms. By moving into the interaction frame of $A$ and classically computing the Magnus expansion of the interaction-picture Hamiltonian, our algorithm bypasses the need for ancillary qubits. In analyzing its performance, we develop a framework to capture the quasi-locality of the Magnus operators, leading to a tightened bound for the error of the Magnus truncation. The Lieb-Robinson bound also guarantees the efficiency of computing the Magnus operators and of their subsequent decomposition into elementary quantum gates. These features make our algorithm appealing for near-term and early-fault-tolerant simulations.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 17 pages, 1 figure, 1 table

arXiv:2404.00852 [pdf, other]

doi 10.1109/rivf60135.2023.10471878

Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments

Authors: Hieu Nguyen, Cong-Hoang Ta, Phuong-Thuy Le-Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: This paper presents a simple yet efficient ensemble learning framework for Vietnamese scene text spotting. Leveraging the power of ensemble learning, which combines multiple models to yield more accurate predictions, our approach aims to significantly enhance the performance of scene text spotting in challenging urban settings. Through experimental evaluations on the VinText dataset, our proposed method achieves a significant improvement in accuracy compared to existing methods with an impressive accuracy of 5%. These results unequivocally demonstrate the efficacy of ensemble learning in the context of Vietnamese scene text spotting in urban environments, highlighting its potential for real world applications, such as text detection and recognition in urban signage, advertisements, and various text-rich urban scenes.

Submitted 31 March, 2024; originally announced April 2024.

Comments: RIVF 2023

Journal ref: In 2023 RIVF International Conference on Computing and Communication Technologies (RIVF) (pp. 177-182). IEEE

arXiv:2403.19153 [pdf, other]

doi 10.1145/3613905.3651086

Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication

Authors: Haoyu Dong, Tram Thi Minh Tran, Rutger Verstegen, Silvia Cazacu, Ruolin Gao, Marius Hoggenmüller, Debargha Dey, Mervyn Franssen, Markus Sasalovici, Pavlo Bazilinskyy, Marieke Martens

Abstract: Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospects of bridging these two seemingly distinct domains. Through a participatory workshop with automotive user interface researchers and practitioners, we facilitated a critical exploration of holistic HMI design by having workshop participants collaboratively develop interaction scenarios involving AVs, in-vehicle users, and external road users. The discussion offers insights into the escalation of interface elements as an HMI design strategy, the direct interactions between different users, and an expanded understanding of holistic HMI design. This work reflects a collaborative effort to understand the practical aspects of this holistic design approach, offering new perspectives and encouraging further investigation into this underexplored aspect of automotive user interfaces.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.14101 [pdf, other]

Text-Enhanced Data-free Approach for Federated Class-Incremental Learning

Authors: Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Dinh Phung

Abstract: Federated Class-Incremental Learning (FCIL) is an underexplored yet pivotal issue, involving the dynamic addition of new classes in the context of federated learning. In this field, Data-Free Knowledge Transfer (DFKT) plays a crucial role in addressing catastrophic forgetting and data privacy problems. However, prior approaches lack the crucial synergy between DFKT and the model training phases, causing DFKT to encounter difficulties in generating high-quality data from a non-anchored latent space of the old task model. In this paper, we introduce LANDER (Label Text Centered Data-Free Knowledge Transfer) to address this issue by utilizing label text embeddings (LTE) produced by pretrained language models. Specifically, during the model training phase, our approach treats LTE as anchor points and constrains the feature embeddings of corresponding training samples around them, enriching the surrounding area with more meaningful information. In the DFKT phase, by using these LTE anchors, LANDER can synthesize more meaningful samples, thereby effectively addressing the forgetting problem. Additionally, instead of tightly constraining embeddings toward the anchor, the Bounding Loss is introduced to encourage sample embeddings to remain flexible within a defined radius. This approach preserves the natural differences in sample embeddings and mitigates the embedding overlap caused by heterogeneous federated settings. Extensive experiments conducted on CIFAR100, Tiny-ImageNet, and ImageNet demonstrate that LANDER significantly outperforms previous methods and achieves state-of-the-art performance in FCIL. The code is available at https://github.com/tmtuan1307/lander.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.11386 [pdf, ps, other]

doi 10.1145/3581961.3609837

Holistic HMI Design for Automated Vehicles: Bridging In-Vehicle and External Communication

Authors: Haoyu Dong, Tram Thi Minh Tran, Pavlo Bazilinskyy, Marius Hoggenmüller, Debargha Dey, Silvia Cazacu, Mervyn Franssen, Ruolin Gao

Abstract: As the field of automated vehicles (AVs) advances, it has become increasingly critical to develop human-machine interfaces (HMI) for both internal and external communication. Critical dialogue is emerging around the potential necessity for a holistic approach to HMI designs, which promotes the integration of both in-vehicle user and external road user perspectives. This approach aims to create a unified and coherent experience for different stakeholders interacting with AVs. This workshop seeks to bring together designers, engineers, researchers, and other stakeholders to delve into relevant use cases, exploring the potential advantages and challenges of this approach. The insights generated from this workshop aim to inform further design and research in the development of coherent HMIs for AVs, ultimately for more seamless integration of AVs into existing traffic.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11378 [pdf, other]

doi 10.1109/THMS.2021.3107517

A Review of Virtual Reality Studies on Autonomous Vehicle--Pedestrian Interaction

Authors: Tram Thi Minh Tran, Callum Parker, Martin Tomitsch

Abstract: An increasing number of studies employ virtual reality (VR) to evaluate interactions between autonomous vehicles (AVs) and pedestrians. VR simulators are valued for their cost-effectiveness, flexibility in developing various traffic scenarios, safe conduct of user studies, and acceptable ecological validity. Reviewing the literature between 2010 and 2020, we found 31 empirical studies using VR as a testing apparatus for both implicit and explicit communication. By performing a systematic analysis, we identified current coverage of critical use cases, obtained a comprehensive account of factors influencing pedestrian behavior in simulated traffic scenarios, and assessed evaluation measures. Based on the findings, we present a set of recommendations for implementing VR pedestrian simulators and propose directions for future research.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11377 [pdf, other]

doi 10.3390/mti7020021

Simulating Wearable Urban Augmented Reality Experiences in VR: Lessons Learnt from Designing Two Future Urban Interfaces

Authors: Tram Thi Minh Tran, Callum Parker, Marius Hoggenmüller, Luke Hespanhol, Martin Tomitsch

Abstract: Augmented reality (AR) has the potential to fundamentally change how people engage with increasingly interactive urban environments. However, many challenges exist in designing and evaluating these new urban AR experiences, such as technical constraints and safety concerns associated with outdoor AR. We contribute to this domain by assessing the use of virtual reality (VR) for simulating wearable urban AR experiences, allowing participants to interact with future AR interfaces in a realistic, safe and controlled setting. This paper describes two wearable urban AR applications (pedestrian navigation and autonomous mobility) simulated in VR. Based on a thematic analysis of interview data collected across the two studies, we found that the VR simulation successfully elicited feedback on the functional benefits of AR concepts and the potential impact of urban contextual factors, such as safety concerns, attentional capacity, and social considerations. At the same time, we highlighted the limitations of this approach in terms of assessing the AR interface's visual quality and providing exhaustive contextual information. The paper concludes with recommendations for simulating wearable urban AR experiences in VR.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11376 [pdf, other]

ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation

Authors: Minh Tran, Winston Bounsavy, Khoa Vo, Anh Nguyen, Tri Nguyen, Ngan Le

Abstract: Amodal Instance Segmentation (AIS) presents a challenging task as it involves predicting both visible and occluded parts of objects within images. Existing AIS methods rely on a bidirectional approach, encompassing both the transition from amodal features to visible features (amodal-to-visible) and from visible features to amodal features (visible-to-amodal). Our observation shows that the utilization of amodal features through the amodal-to-visible transition can confuse the visible features due to the extra information of occluded/hidden segments not presented in visible display. Consequently, this compromises the quality of visible features during the subsequent visible-to-amodal transition. To tackle this issue, we introduce ShapeFormer, a decoupled Transformer-based model with a visible-to-amodal transition. It facilitates the explicit relationship between output segmentations and avoids the need for amodal-to-visible transitions. ShapeFormer comprises three key modules: (i) Visible-Occluding Mask Head for predicting visible segmentation with occlusion awareness, (ii) Shape-Prior Amodal Mask Head for predicting amodal and occluded masks, and (iii) Category-Specific Shape Prior Retriever for providing shape prior knowledge. Comprehensive experiments and extensive ablation studies across various AIS benchmarks demonstrate the effectiveness of our ShapeFormer. The code is available at: https://github.com/UARK-AICV/ShapeFormer

Submitted 17 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted to IJCNN2024

arXiv:2403.09069 [pdf, other]

Dyadic Interaction Modeling for Social Behavior Generation

Authors: Minh Tran, Di Chang, Maksim Siniukov, Mohammad Soleymani

Abstract: Human-human communication is like a delicate dance where listeners and speakers concurrently interact to maintain conversational dynamics. Hence, an effective model for generating listener nonverbal behaviors requires understanding the dyadic context and interaction. In this paper, we present an effective framework for creating 3D facial motions in dyadic interactions. Existing work considers a listener as a reactive agent with reflexive behaviors to the speaker's voice and facial motions. The heart of our framework is Dyadic Interaction Modeling (DIM), a pre-training approach that jointly models speakers' and listeners' motions through masking and contrastive learning to learn representations that capture the dyadic context. To enable the generation of non-deterministic behaviors, we encode both listener and speaker motions into discrete latent representations, through VQ-VAE. The pre-trained model is further fine-tuned for motion generation. Extensive experiments demonstrate the superiority of our framework in generating listener motions, establishing a new state-of-the-art according to the quantitative measures capturing the diversity and realism of generated motions. Qualitative results demonstrate the superior capabilities of the proposed approach in generating diverse and realistic expressions, eye blinks and head gestures.

Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08876 [pdf, other]

ARtVista: Gateway To Empower Anyone Into Artist

Authors: Trong-Vu Hoang, Quang-Binh Nguyen, Duy-Nam Ly, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: Drawing is an art that enables people to express their imagination and emotions. However, individuals usually face challenges in drawing, especially when translating conceptual ideas into visually coherent representations and bridging the gap between mental visualization and practical execution. In response, we propose ARtVista - a novel system integrating AR and generative AI technologies. ARtVista not only recommends reference images aligned with users' abstract ideas and generates sketches for users to draw but also goes beyond, crafting vibrant paintings in various painting styles. ARtVista also offers users an alternative approach to create striking paintings by simulating the paint-by-number concept on reference images, empowering users to create visually stunning artwork devoid of the necessity for advanced drawing skills. We perform a pilot study and reveal positive feedback on its usability, emphasizing its effectiveness in visualizing user ideas and aiding the painting process to achieve stunning pictures without requiring advanced drawing skills. The source code will be available at https://github.com/htrvu/ARtVista.

Submitted 13 March, 2024; originally announced March 2024.

Comments: CHI 2024

arXiv:2403.08746 [pdf, other]

iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer

Authors: Dinh-Khoi Vo, Duy-Nam Ly, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: Creating thematic collections in industries demands innovative designs and cohesive concepts. Designers may face challenges in maintaining thematic consistency when drawing inspiration from existing objects, landscapes, or artifacts. While AI-powered graphic design tools offer help, they often fail to generate cohesive sets based on specific thematic concepts. In response, we introduce iCONTRA, an interactive CONcept TRAnsfer system. With a user-friendly interface, iCONTRA enables both experienced designers and novices to effortlessly explore creative design concepts and efficiently generate thematic collections. We also propose a zero-shot image editing algorithm, eliminating the need for fine-tuning models, which gradually integrates information from initial objects, ensuring consistency in the generation process without influencing the background. A pilot study suggests iCONTRA's potential to reduce designers' efforts. Experimental results demonstrate its effectiveness in producing consistent and high-quality object concept transfers. iCONTRA stands as a promising tool for innovation and creative exploration in thematic collection design. The source code will be available at: https://github.com/vdkhoi20/iCONTRA.

Submitted 13 March, 2024; originally announced March 2024.

Comments: CHI 2024

arXiv:2403.07006 [pdf, other]

doi 10.3389/fcomp.2022.866516

Designing Wearable Augmented Reality Concepts to Support Scalability in Autonomous Vehicle-Pedestrian Interaction

Authors: Tram Thi Minh Tran, Callum Parker, Yiyuan Wang, Martin Tomitsch

Abstract: Wearable augmented reality (AR) offers new ways for supporting the interaction between autonomous vehicles (AVs) and pedestrians due to its ability to integrate timely and contextually relevant data into the user's field of view. This article presents novel wearable AR concepts that assist crossing pedestrians in multi-vehicle scenarios where several AVs frequent the road from both directions. Three concepts with different communication approaches for signaling responses from multiple AVs to a crossing request, as well as a conventional pedestrian push button, were simulated and tested within a virtual reality environment. The results showed that wearable AR is a promising way to reduce crossing pedestrians' cognitive load when the design offers both individual AV responses and a clear signal to cross. The willingness of pedestrians to adopt a wearable AR solution, however, is subject to different factors, including costs, data privacy, technical defects, liability risks, maintenance duties, and form factors. We further found that all participants favored sending a crossing request to AVs rather than waiting for the vehicles to detect their intentions, pointing to an important gap and opportunity in the current AV-pedestrian interaction literature.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05727 [pdf, other]

Scoping Out the Scalability Issues of Autonomous Vehicle-Pedestrian Interaction

Authors: Tram Thi Minh Tran, Callum Parker, Martin Tomitsch

Abstract: Autonomous vehicles (AVs) may use external interfaces, such as LED light bands, to communicate with pedestrians safely and intuitively. While previous research has demonstrated the effectiveness of these interfaces in simple traffic scenarios involving one pedestrian and one vehicle, their performance in more complex scenarios with multiple road users remains unclear. The scalability of AV external communication has therefore attracted increasing attention, prompting the need for further investigation. This scoping review synthesises information from 54 papers to identify seven key scalability issues in multi-vehicle and multi-pedestrian environments, with Clarity of Recipients, Information Overload, and Multi-Lane Safety emerging as the most pressing concerns. To guide future research in scalable AV-pedestrian interactions, we propose high-level design directions focused on three communication loci: vehicle, infrastructure, and pedestrian. Our work contributes the groundwork and a roadmap for designing simplified, coordinated, and targeted external AV communication, ultimately improving safety and efficiency in complex traffic scenarios.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05725 [pdf, other]

doi 10.1145/3613904.3642118

Exploring the Impact of Interconnected External Interfaces in Autonomous Vehicles on Pedestrian Safety and Experience

Authors: Tram Thi Minh Tran, Callum Parker, Marius Hoggenmuller, Yiyuan Wang, Martin Tomitsch

Abstract: Policymakers advocate for the use of external Human-Machine Interfaces (eHMIs) to allow autonomous vehicles (AVs) to communicate their intentions or status. Nonetheless, scalability concerns in complex traffic scenarios arise, such as potentially increasing pedestrian cognitive load or conveying contradictory signals. Building upon precursory works, our study explores 'interconnected eHMIs,' where multiple AV interfaces are interconnected to provide pedestrians with clear and unified information. In a virtual reality study (N=32), we assessed the effectiveness of this concept in improving pedestrian safety and their crossing experience. We compared these results against two conditions: no eHMIs and unconnected eHMIs. Results indicated interconnected eHMIs enhanced safety feelings and encouraged cautious crossings. However, certain design elements, such as the use of the colour red, led to confusion and discomfort. Prior knowledge slightly influenced perceptions of interconnected eHMIs, underscoring the need for refined user education. We conclude with practical implications and future eHMI design research directions.

Submitted 17 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.17481 [pdf, other]

Numerical Schemes for 3-Wave Kinetic Equations: A Complete Treatment of the Collision Operator

Authors: Steven Walton, Minh-Binh Tran

Abstract: In our previous work, numerical schemes for a simplified version of 3-wave kinetic equations, in which only the simple forward-cascade terms of the collision operators are kept, have been successfully designed, especially to capture the long time dynamics of the equation given the multiple blow-up time phenomenon. In this second work in the series, we propose numerical treatments for the complete 3-wave kinetic equations, in which the complete, much more complicated collision operators are fully considered based on a novel conservative form of the equation. We then derive an implicit finite volume scheme to solve the equation. The new discretization uses an adaptive time-stepping method which allows for the simulations to be carried to very long times. Our computed solutions are compared with previously derived long-time asymptotic estimates for the decay rate of total energy of time-dependent solutions of 3-wave kinetic equations and found to be in excellent agreement.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.13613 [pdf, other]

Overview of the VLSP 2023 -- ComOM Shared Task: A Data Challenge for Comparative Opinion Mining from Vietnamese Product Reviews

Authors: Hoang-Quynh Le, Duy-Cat Can, Khanh-Vinh Nguyen, Mai-Vu Tran

Abstract: This paper presents a comprehensive overview of the Comparative Opinion Mining from Vietnamese Product Reviews shared task (ComOM), held as part of the 10$^{th}$ International Workshop on Vietnamese Language and Speech Processing (VLSP 2023). The primary objective of this shared task is to advance the field of natural language processing by developing techniques that proficiently extract comparative opinions from Vietnamese product reviews. Participants are challenged to propose models that adeptly extract a comparative "quintuple" from a comparative sentence, encompassing Subject, Object, Aspect, Predicate, and Comparison Type Label. We construct a human-annotated dataset comprising $120$ documents, encompassing $7427$ non-comparative sentences and $2468$ comparisons within $1798$ sentences. Participating models undergo evaluation and ranking based on the Exact match macro-averaged quintuple F1 score.

Submitted 4 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: In Proceedings of VLSP 2023

arXiv:2402.09985 [pdf, ps, other]

Semi-parametric financial risk forecasting incorporating multiple realized measures

Authors: Rangika Peiris, Chao Wang, Richard Gerlach, Minh-Ngoc Tran

Abstract: A semi-parametric joint Value-at-Risk (VaR) and Expected Shortfall (ES) forecasting framework employing multiple realized measures is developed. The proposed framework extends the realized exponential GARCH model to be semi-parametrically estimated, via a joint loss function, whilst extending existing quantile time series models to incorporate multiple realized measures. A quasi-likelihood is built, employing the asymmetric Laplace distribution that is directly linked to a joint loss function, which enables Bayesian inference for the proposed model. An adaptive Markov Chain Monte Carlo method is used for the model estimation. The empirical section evaluates the performance of the proposed framework with six stock markets from January 2000 to June 2022, covering the period of COVID-19. Three realized measures, including 5-minute realized variance, bi-power variation, and realized kernel, are incorporated and evaluated in the proposed framework. One-step-ahead VaR and ES forecasting results of the proposed model are compared to a range of parametric and semi-parametric models, lending support to the effectiveness of the proposed framework.

Submitted 14 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.01496 [pdf]

Constructing 100 MΩ and 1 GΩ Resistance Standards via Star-Mesh Transformations

Authors: Dean G. Jarrett, Albert F. Rigosi, Dominick S. Scaletta, Ngoc Thanh Mai Tran, Heather M. Hill, Alireza R. Panna, Cheng Hsueh Yang, Yanfei Yang, Randolph E. Elmquist, David B. Newell

Abstract: A recent mathematical framework for optimizing resistor networks to achieve values in the MΩ through GΩ levels was employed for two specific cases. Objectives here include proof of concept and identification of possible apparatus limitations for future experiments involving graphene-based quantum Hall array resistance standards. Using fractal-like, or recursive, features of the framework allows one to calculate and implement network designs with substantially lower-valued resistors. The cases of 100 MΩ and 1 GΩ demonstrate that, theoretically, one would not need more than 100 quantum Hall elements to achieve these high resistances.

Submitted 2 February, 2024; originally announced February 2024.
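
The star-mesh idea named in the title above can be illustrated with the textbook transformation, in which eliminating the centre node of a star produces a mesh branch R_ij = R_i · R_j · Σ_k (1/R_k) between each pair of outer nodes. The sketch below is only that generic formula, not the network designs of the paper; the function name `star_to_mesh` and the resistor values are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def star_to_mesh(star):
    """Eliminate the centre node of a star network.

    `star` maps each outer node name to the resistance (in ohms) between
    that node and the star's centre. The result maps each pair of outer
    nodes to the equivalent mesh resistance
        R_ij = R_i * R_j * sum_k(1 / R_k).
    """
    conductance_sum = sum(1.0 / r for r in star.values())
    return {
        (a, b): star[a] * star[b] * conductance_sum
        for a, b in combinations(sorted(star), 2)
    }


# Hypothetical T-network: two 1 MΩ arms and a 10 kΩ leg to ground.
mesh = star_to_mesh({"in": 1e6, "out": 1e6, "gnd": 1e4})
print(f"in-out branch: {mesh[('in', 'out')] / 1e6:.0f} MΩ")
```

With these assumed values the branch between `in` and `out` evaluates to about 102 MΩ, although no single resistor exceeds 1 MΩ, which is the sense in which a high resistance can be built from substantially lower-valued elements.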

arXiv:2401.16525 [pdf, other]

Identity check problem for shallow quantum circuits

Authors: Sergey Bravyi, Natalie Parham, Minh Tran

Abstract: Checking whether two quantum circuits are approximately equivalent is a common task in quantum computing. We consider a closely related identity check problem: given a quantum circuit $U$, one has to estimate the diamond-norm distance between $U$ and the identity channel. We present a classical algorithm approximating the distance to the identity within a factor $α=D+1$ for shallow geometrically local $D$-dimensional circuits provided that the circuit is sufficiently close to the identity. The runtime of the algorithm scales linearly with the number of qubits for any constant circuit depth and spatial dimension. We also show that the operator-norm distance to the identity $\|U-I\|$ can be efficiently approximated within a factor $α=5$ for shallow 1D circuits and, under a certain technical condition, within a factor $α=2D+3$ for shallow $D$-dimensional circuits. A numerical implementation of the identity check algorithm is reported for 1D Trotter circuits with up to 100 qubits.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 10 pages, 3 figures

arXiv:2401.16500 [pdf, other]

Error detection using pneumatic logic

Authors: Shane Hoang, Mabel Shehada, Zinal Patel, Minh-Huy Tran, Konstantinos Karydis, Philip Brisk, William H. Grover

Abstract: Pneumatic systems are common in manufacturing, healthcare, transportation, robotics, and many other fields. Failures in these systems can have very serious consequences, particularly if they go undetected. In this work, we present an air-powered error detector device that can detect and respond to failures in pneumatically actuated systems. The device contains 21 monolithic membrane valves that act like transistors in a pneumatic logic "circuit" that uses vacuum to represent TRUE and atmospheric pressure as FALSE. Three pneumatic exclusive-OR (XOR) gates are used to calculate the parity bit corresponding to the values of several control bits. If the calculated value of the parity bit differs from the expected value, then an error (like a leak or a blocked air line) has been detected and the device outputs a pneumatic error signal which can in turn be used to alert a user, shut down the system, or take some other action. As a proof-of-concept, we used our pneumatic error detector to monitor the operation of a medical device, an intermittent pneumatic compression (IPC) device commonly used to prevent the formation of life-threatening blood clots in the wearer's legs. Experiments confirm that when the IPC device was damaged, the pneumatic error detector immediately recognized the error (a leak) and alerted the wearer using sound. By providing a simple and low-cost way to add fault detection to pneumatic actuation systems without using sensors, our pneumatic error detector can promote safety and reliability across the wide range of pneumatic systems.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 23 pages, 5 figures
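
The parity check described in the abstract above can be mimicked in software as a rough sketch. The abstract says only that three XOR gates compute the parity of "several" control bits; the four-bit input, the gate arrangement, and the names `parity4` and `error_detected` below are assumptions made for illustration, with 1 standing for vacuum (TRUE) and 0 for atmospheric pressure (FALSE).

```python
def parity4(a, b, c, d):
    """Parity of four bits using three two-input XOR operations."""
    left = a ^ b          # first XOR gate
    right = c ^ d         # second XOR gate
    return left ^ right   # third XOR gate combines the two halves


def error_detected(control_bits, expected_parity):
    """Flag an error when the computed parity differs from the expected one."""
    return parity4(*control_bits) != expected_parity


# Hypothetical control pattern and the parity bit that accompanies it.
healthy = (1, 0, 1, 1)
expected = parity4(*healthy)

# A leak is modelled here as one line losing vacuum (1 becomes 0).
leaky = (1, 0, 0, 1)

print(error_detected(healthy, expected))  # no fault on the intact pattern
print(error_detected(leaky, expected))    # fault flagged after a single-bit flip
```

A single parity bit catches any odd number of flipped lines but misses an even number, which is a general property of parity checks rather than a claim about the device in the paper.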

arXiv:2401.08868 [pdf, other]

B-Cos Aligned Transformers Learn Human-Interpretable Features

Authors: Manuel Tran, Amal Lahiani, Yashin Dicente Cid, Melanie Boxberg, Peter Lienemann, Christian Matek, Sophia J. Wagner, Fabian J. Theis, Eldad Klaiman, Tingying Peng

Abstract: Vision Transformers (ViTs) and Swin Transformers (Swin) are currently state-of-the-art in computational pathology. However, domain experts are still reluctant to use these models due to their lack of interpretability. This is not surprising, as critical decisions need to be transparent and understandable. The most common approach to understanding transformers is to visualize their attention. However, attention maps of ViTs are often fragmented, leading to unsatisfactory explanations. Here, we introduce a novel architecture called the B-cos Vision Transformer (BvT) that is designed to be more interpretable. It replaces all linear transformations with the B-cos transform to promote weight-input alignment. In a blinded study, medical experts clearly ranked BvTs above ViTs, suggesting that our network is better at capturing biomedically relevant structures. This is also true for the B-cos Swin Transformer (Bwin). Compared to the Swin Transformer, it even improves the F1-score by up to 4.7% on two public datasets.

Submitted 18 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted at MICCAI 2023 (oral). Camera-ready available at https://doi.org/10.1007/978-3-031-43993-3_50

arXiv:2401.06536 [pdf, ps, other]

Controlling the Rates of a Chain of Harmonic Oscillators with a Point Langevin Thermostat

Authors: Amirali Hannani, Minh-Binh Tran, Minh Nhat Phung, Emmanuel Trélat

Abstract: We consider the control problem for an infinite chain of coupled harmonic oscillators with a Langevin thermostat at the origin. We study the effect of two types of open-loop boundary controls, impulsive control and linear memory-feedback control, in the high frequency limit. We investigate their action on the reflection-transmission coefficients for the wave energy for the scattering of the thermostat. Our study shows that impulsive boundary controls have no impact on the rates and are thus not appropriate to act on the system, despite their physical meaning and relevance. In contrast, the second kind of control that we propose, which is less standard and uses the past of the state solution of the system, is adequate and relevant. We prove that any triple of rates satisfying appropriate assumptions is asymptotically reachable thanks to linear memory-feedback controls that we design explicitly.

Submitted 14 March, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.02093 [pdf, ps, other]

A new approach to convergence analysis of iterative models with optimal error bounds

Authors: Minh-Phuong Tran, Thanh-Nhan Nguyen, Thai-Hung Nguyen, Tan-Phuc Nguyen, Tien-Khai Nguyen, Cong-Duy-Nguyen Nguyen, Trung-Hieu Huynh

Abstract: In this paper, we study a new approach related to the convergence analysis of Ishikawa-type iterative models to a common fixed point of two non-expansive mappings in Banach spaces. The main novelty of our contribution lies in the so-called \emph{optimal error bounds}, which established some necessary and sufficient conditions for convergence and derived both the error estimates and bounds on the convergence rates for iterative schemes. Although a special interest here is devoted to the Ishikawa and modified Ishikawa iterative sequences, the theory of \emph{optimal error bounds} proposed in this paper can also be favorably applied to various types of iterative models to approximate common fixed points of non-expansive mappings.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 29 pages, 9 figures

arXiv:2312.12746 [pdf, other]

ChatFDA: Medical Records Risk Assessment

Authors: M Tran, C Sun

Abstract: In healthcare, the emphasis on patient safety and the minimization of medical errors cannot be overstated. Despite concerted efforts, many healthcare systems, especially in low-resource regions, still grapple with preventing these errors effectively. This study explores a pioneering application aimed at addressing this challenge by assisting caregivers in gauging potential risks derived from medical notes. The application leverages data from openFDA, delivering real-time, actionable insights regarding prescriptions. Preliminary analyses conducted on the MIMIC-III \cite{mimic} dataset affirm a proof of concept highlighting a reduction in medical errors and an amplification in patient safety. This tool holds promise for drastically enhancing healthcare outcomes in settings with limited resources. To bolster reproducibility and foster further research, the codebase underpinning our methodology is accessible on https://github.com/autonlab/2023.hackAuton/tree/main/prescription_checker. This is a submission for the 30th HackAuton CMU.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.10187 [pdf, other]

TSRNet: Simple Framework for Real-time ECG Anomaly Detection with Multimodal Time and Spectrogram Restoration Network

Authors: Nhat-Tan Bui, Dinh-Hieu Hoang, Thinh Phan, Minh-Triet Tran, Brijesh Patel, Donald Adjeroh, Ngan Le

Abstract: The electrocardiogram (ECG) is a valuable signal used to assess various aspects of heart health, such as heart rate and rhythm. It plays a crucial role in identifying cardiac conditions and detecting anomalies in ECG data. However, distinguishing between normal and abnormal ECG signals can be a challenging task. In this paper, we propose an approach that leverages anomaly detection to identify unhealthy conditions using solely normal ECG data for training. Furthermore, to enhance the information available and build a robust system, we suggest considering both the time series and time-frequency domain aspects of the ECG signal. As a result, we introduce a specialized network called the Multimodal Time and Spectrogram Restoration Network (TSRNet) designed specifically for detecting anomalies in ECG signals. TSRNet falls into the category of restoration-based anomaly detection and draws inspiration from both the time series and spectrogram domains. By extracting representations from both domains, TSRNet effectively captures the comprehensive characteristics of the ECG signal. This approach enables the network to learn robust representations with superior discrimination abilities, allowing it to distinguish between normal and abnormal ECG patterns more effectively. Furthermore, we introduce a novel inference method, termed Peak-based Error, that specifically focuses on ECG peaks, a critical component in detecting abnormalities. The experimental result on the large-scale dataset PTB-XL has demonstrated the effectiveness of our approach in ECG anomaly detection, while also prioritizing efficiency by minimizing the number of trainable parameters. Our code is available at https://github.com/UARK-AICV/TSRNet.

Submitted 5 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted at ISBI 2024

arXiv:2312.10179 [pdf, other]

3FM: Multi-modal Meta-learning for Federated Tasks

Authors: Minh Tran, Roochi Shah, Zejun Gong

Abstract: We present a novel approach in the domain of federated learning (FL), particularly focusing on addressing the challenges posed by modality heterogeneity, variability in modality availability across clients, and the prevalent issue of missing data. We introduce a meta-learning framework specifically designed for multimodal federated tasks. Our approach is motivated by the need to enable federated models to robustly adapt when exposed to new modalities, a common scenario in FL where clients often differ in the number of available modalities. The effectiveness of our proposed framework is demonstrated through extensive experimentation on an augmented MNIST dataset, enriched with audio and sign language data. We demonstrate that the proposed algorithm achieves better performance than the baseline on a subset of missing modality scenarios with careful tuning of the meta-learning rates. This is a shortened report, and our work will be extended and updated soon.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.09733 [pdf, other]

Quantum-centric Supercomputing for Materials Science: A Perspective on Challenges and Future Directions

Authors: Yuri Alexeev, Maximilian Amsler, Paul Baity, Marco Antonio Barroca, Sanzio Bassini, Torey Battelle, Daan Camps, David Casanova, Young jai Choi, Frederic T. Chong, Charles Chung, Chris Codella, Antonio D. Corcoles, James Cruise, Alberto Di Meglio, Jonathan Dubois, Ivan Duran, Thomas Eckl, Sophia Economou, Stephan Eidenbenz, Bruce Elmegreen, Clyde Fare, Ismael Faro, Cristina Sanz Fernández, Rodrigo Neumann Barros Ferreira, et al. (102 additional authors not shown)

Abstract: Computational models are an essential tool for the design, characterization, and discovery of novel materials. Hard computational tasks in materials science stretch the limits of existing high-performance supercomputing centers, consuming much of their simulation, analysis, and data resources. Quantum computing, on the other hand, is an emerging technology with the potential to accelerate many of the computational tasks needed for materials science. In order to do that, the quantum technology must interact with conventional high-performance computing in several ways: approximate results validation, identification of hard problems, and synergies in quantum-centric supercomputing. In this paper, we provide a perspective on how quantum-centric supercomputing can help address critical computational problems in materials science, the challenges to face in order to solve representative use cases, and new suggested directions.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 60 pages, 14 figures; comments welcome

arXiv:2312.09633 [pdf, other]

Natural Gradient Variational Bayes without Fisher Matrix Analytic Calculation and Its Inversion

Authors: A. Godichon-Baggioni, D. Nguyen, M-N Tran

Abstract: This paper introduces a method for efficiently approximating the inverse of the Fisher information matrix, a crucial step in achieving effective variational Bayes inference. A notable aspect of our approach is the avoidance of analytically computing the Fisher information matrix and its explicit inversion. Instead, we introduce an iterative procedure for generating a sequence of matrices that converge to the inverse of Fisher information. The natural gradient variational Bayes algorithm without analytic expression of the Fisher matrix and its inversion is provably convergent and achieves a convergence rate of order O(log s/s), with s the number of iterations. We also obtain a central limit theorem for the iterates. Implementation of our method does not require storage of large matrices, and achieves a linear complexity in the number of variational parameters. Our algorithm exhibits versatility, making it applicable across a diverse array of variational Bayes domains, including Gaussian approximation and normalizing flow Variational Bayes. We offer a range of numerical examples to demonstrate the efficiency and reliability of the proposed variational Bayes method.

Submitted 26 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 43 pages

arXiv:2312.07489 [pdf, other]

NearbyPatchCL: Leveraging Nearby Patches for Self-Supervised Patch-Level Multi-Class Classification in Whole-Slide Images

Authors: Gia-Bao Le, Van-Tien Nguyen, Trung-Nghia Le, Minh-Triet Tran

Abstract: Whole-slide image (WSI) analysis plays a crucial role in cancer diagnosis and treatment. In addressing the demands of this critical task, self-supervised learning (SSL) methods have emerged as a valuable resource, leveraging their efficiency in circumventing the need for a large number of annotations, which can be both costly and time-consuming to deploy supervised methods. Nevertheless, patch-wise representation may exhibit instability in performance, primarily due to class imbalances stemming from patch selection within WSIs. In this paper, we introduce Nearby Patch Contrastive Learning (NearbyPatchCL), a novel self-supervised learning method that leverages nearby patches as positive samples and a decoupled contrastive loss for robust representation learning. Our method demonstrates a tangible enhancement in performance for downstream tasks involving patch-level multi-class classification. Additionally, we curate a new dataset derived from WSIs sourced from the Canine Cutaneous Cancer Histology, thus establishing a benchmark for the rigorous evaluation of patch-level multi-class classification methodologies. Intensive experiments show that our method significantly outperforms the supervised baseline and state-of-the-art SSL methods with top-1 classification accuracy of 87.56%. Our method also achieves comparable results while utilizing a mere 1% of labeled data, a stark contrast to the 100% labeled data requirement of other approaches. Source code: https://github.com/nvtien457/NearbyPatchCL

Submitted 12 December, 2023; originally announced December 2023.

Comments: MMM 2024

arXiv:2312.05848 [pdf]

Super-rays grouping scheme and novel coding architecture for computational time reduction of graph-based Light Field coding

Authors: Bach Nguyen Gia, Chanh Minh Tran, Tho Nguyen Duc, Tan Phan Xuan, Eiji Kamioka

Abstract: Graph-based Light Field coding using the concept of super-rays is powerful to exploit signal redundancy along irregular shapes and achieves good energy compaction, compared to rectangular block-based approaches. However, its main limitation lies in the high time complexity for eigen-decomposition of each super-ray local graph, a high number of which can be found in a Light Field when segmented into super-rays. This paper examines a grouping scheme for super-rays in order to reduce the number of eigen-decomposition times, and proposes a novel coding architecture to handle the signal residual data arising for each super-ray group, as a tradeoff to achieve lower computational time. Experimental results have shown to reduce a considerable amount of decoding time for Light Field scenes, despite having a slight increase in the coding bitrates when compared with the original non-grouping super-ray-based approach. The proposal also remains to have competitive performance in Rate Distortion in comparison to HEVC-based and JPEG Pleno-based methods.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.05634 [pdf, other]

PGDS: Pose-Guidance Deep Supervision for Mitigating Clothes-Changing in Person Re-Identification

Authors: Quoc-Huy Trinh, Nhat-Tan Bui, Dinh-Hieu Hoang, Phuoc-Thao Vo Thi, Hai-Dang Nguyen, Debesh Jha, Ulas Bagci, Ngan Le, Minh-Triet Tran

Abstract: Person Re-Identification (Re-ID) task seeks to enhance the tracking of multiple individuals by surveillance cameras. It supports multimodal tasks, including text-based person retrieval and human matching. One of the most significant challenges faced in Re-ID is clothes-changing, where the same person may appear in different outfits. While previous methods have made notable progress in maintaining clothing data consistency and handling clothing change data, they still rely excessively on clothing information, which can limit performance due to the dynamic nature of human appearances. To mitigate this challenge, we propose the Pose-Guidance Deep Supervision (PGDS), an effective framework for learning pose guidance within the Re-ID task. It consists of three modules: a human encoder, a pose encoder, and a Pose-to-Human Projection module (PHP). Our framework guides the human encoder, i.e., the main re-identification model, with pose information from the pose encoder through multiple layers via the knowledge transfer mechanism from the PHP module, helping the human encoder learn body parts information without increasing computation resources in the inference stage. Through extensive experiments, our method surpasses the performance of current state-of-the-art methods, demonstrating its robustness and effectiveness for real-world applications. Our code is available at https://github.com/huyquoctrinh/PGDS.

Submitted 1 June, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

Comments: Accepted at AVSS 2024

arXiv:2311.17793 [pdf, ps, other]

Vines and MAT-labeled graphs

Authors: Hung Manh Tran, Tan Nhat Tran, Shuhei Tsujie

Abstract: The present paper explores a connection between two concepts arising from different fields of mathematics. The first concept, called vine, is a graphical model for dependent random variables. This concept first appeared in a work of Joe (1994), and the formal definition was given later by Cooke (1997). Vines have nowadays become an active research area whose applications can be found in probability theory and uncertainty analysis. The second concept, called MAT-freeness, is a combinatorial property in the theory of freeness of logarithmic derivation modules of hyperplane arrangements. This concept was first studied by Abe-Barakat-Cuntz-Hoge-Terao (2016), and soon afterwards investigated further by Cuntz-Mücksch (2020). In the particular case of graphic arrangements, the last two authors (2023) recently proved that the MAT-freeness is completely characterized by the existence of certain edge-labeled graphs, called MAT-labeled graphs. In this paper, we first introduce a poset characterization of a vine, the so-called vine. Then we show that, interestingly, there exists an explicit equivalence between the categories of locally regular vines and MAT-labeled graphs. In particular, we obtain an equivalence between the categories of regular vines and MAT-labeled complete graphs. Several applications will be mentioned to illustrate the interaction between the two concepts. Notably, we give an affirmative answer to a question of Cuntz-Mücksch that MAT-freeness can be characterized by a generalization of the root poset in the case of graphic arrangements.

Submitted 23 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 32 pages; refined the definitions of the categories MG and LRV (Def. 6.2 & 6.3), hence improved the main result (Thm. 6.10); the term "vineposet" is no longer used, instead we distinguish the graphical and poset definitions of a vine

MSC Class: Primary 06A07; Secondary 52C35

arXiv:2311.15525 [pdf, other]

Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for Vietnamese Abstractive Multi-document Summarization

Authors: Mai-Vu Tran, Hoang-Quynh Le, Duy-Cat Can, Quoc-An Nguyen

Abstract: This paper reports the overview of the VLSP 2022 - Vietnamese abstractive multi-document summarization (Abmusu) shared task for Vietnamese News. This task is hosted at the 9$^{th}$ annual workshop on Vietnamese Language and Speech Processing (VLSP 2022). The goal of the Abmusu shared task is to develop summarization systems that could create abstractive summaries automatically for a set of documents on a topic. The model input is multiple news documents on the same topic, and the corresponding output is a related abstractive summary. In the scope of the Abmusu shared task, we only focus on Vietnamese news summarization and build a human-annotated dataset of 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories. Participating models are evaluated and ranked in terms of \texttt{ROUGE2-F1} score, the most typical evaluation metric for the document summarization problem.

Submitted 26 November, 2023; originally announced November 2023.

Comments: VLSP 2022

arXiv:2311.14764 [pdf, other]

SafeSea: Synthetic Data Generation for Adverse & Low Probability Maritime Conditions

Authors: Martin Tran, Jordan Shipard, Hermawan Mulyono, Arnold Wiliem, Clinton Fookes

Abstract: High-quality training data is essential for enhancing the robustness of object detection models. Within the maritime domain, obtaining a diverse real image dataset is particularly challenging due to the difficulty of capturing sea images with the presence of maritime objects, especially in stormy conditions. These challenges arise due to resource limitations, in addition to the unpredictable appearance of maritime objects. Nevertheless, acquiring data from stormy conditions is essential for training effective maritime detection models, particularly for search and rescue, where real-world conditions can be unpredictable. In this work, we introduce SafeSea, which is a stepping stone towards transforming actual sea images with various Sea State backgrounds while retaining maritime objects. Compared to existing generative methods such as Stable Diffusion Inpainting, this approach reduces the time and effort required to create synthetic datasets for training maritime object detection models. The proposed method uses two automated filters to only pass generated images that meet the criteria. In particular, these filters will first classify the sea condition according to its Sea State level and then it will check whether the objects from the input image are still preserved. This method enabled the creation of the SafeSea dataset, offering diverse weather condition backgrounds to supplement the training of maritime models.
Lastly, we observed that a maritime object detection model faced challenges in detecting objects in stormy sea backgrounds, emphasizing the impact of weather conditions on detection accuracy. The code and dataset are available at https://github.com/martin-3240/SafeSea.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Accepted to WACV 2024 workshop on Maritime Computer Vision

Showing 1–50 of 1,051 results for author: Tran, M