Search | arXiv e-print repository

arXiv:2406.15749 [pdf, ps, other]

Decay of CP-even Higgs $H\rightarrow h γγ$ in Two Higgs Doublet Model: (I) one-loop analytic results, ward identity checks

Authors: Khiem Hong Phan, Dzung Tri Tran, Thanh Huy Nguyen

Abstract: We present the first analytical expressions for one-loop induced contributions for the decay channels of CP-even Higgs $H\rightarrow h γγ$ with $h$ being the standard model-like Higgs boson within the framework of the Two Higgs Doublet Model. One-loop form factors for the decay processes are written in terms of the scalar Passarino-Veltman functions following the general notations of the package LoopTools as well as the library Collier. Subsequently, physical results for the decay processes can be generated numerically by using one of the above-mentioned packages. The analytical expressions shown in this paper are verified by several numerical checks, for example, the ultraviolet (UV) and the infrared (IR) finiteness of the one-loop amplitude. Furthermore, the amplitude must obey the so-called Ward identity due to on-shell photons in the final state. The identity is also tested numerically in this work. We find that the numerical results for the checks show good stability. In phenomenological studies, the differential decay rates as functions of the invariant mass of the two photons in the final state of $H\rightarrow h γγ$ are first studied in parameter space for all types of Two Higgs Doublet Models.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 39 pages, 8 Figures, 9 Tables

Report number: DTU_2024-03

arXiv:2406.02897 [pdf, other]

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Authors: Trung Dang, David Aponte, Dung Tran, Kazuhito Koishida

Abstract: Prior works have demonstrated zero-shot text-to-speech by using a generative language model on audio tokens obtained via a neural audio codec. It is still challenging, however, to adapt them to low-latency scenarios. In this paper, we present LiveSpeech - a fully autoregressive language model-based approach for zero-shot text-to-speech, enabling low-latency streaming of the output audio. To allow multiple token prediction within a single decoding step, we propose (1) using adaptive codebook loss weights that consider codebook contribution in each frame and focus on hard instances, and (2) grouping codebooks and processing groups in parallel. Experiments show our proposed models achieve competitive results to state-of-the-art baselines in terms of content accuracy, speaker similarity, audio quality, and inference speed while being suitable for low-latency streaming applications.

Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.17582 [pdf]

doi 10.5281/zenodo.6190227

Building a temperature forecasting model for the city with the regression neural network (RNN)

Authors: Nguyen Phuc Tran, Duy Thanh Tran, Thi Thuy Nga Duong

Abstract: In recent years, a study by environmental organizations in the world and Vietnam shows that weather change is quite complex. Global warming has become a serious problem in the modern world, which is a concern for scientists. Last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations. This made it hard to collect data for building predictive models to make accurate simulations. In Vietnam, research on weather forecast models is a recent development, having only begun around 2000. Along with advancements in computer science, mathematical models are being built and applied with machine learning techniques to create more accurate and reliable predictive models. This article will summarize the research and solutions for applying recurrent neural networks to forecast urban temperatures.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 6 pages

Journal ref: The 6th International Conference for Small & Medium Business in 2020 (ICSMB 2020)

arXiv:2405.17002 [pdf, other]

UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

Authors: Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat

Abstract: Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.

Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15824 [pdf, other]

Efficient Mitigation of Bus Bunching through Setter-Based Curriculum Learning

Authors: Avidan Shah, Danny Tran, Yuhan Tang

Abstract: Curriculum learning has been growing in the domain of reinforcement learning as a method of improving training efficiency for various tasks. It involves modifying the difficulty (lessons) of the environment as the agent learns, in order to encourage more optimal agent behavior and higher reward states. However, most curriculum learning methods currently involve discrete transitions of the curriculum or predefined steps by the programmer or using automatic curriculum learning on only a small subset training such as only on an adversary. In this paper, we propose a novel approach to curriculum learning that uses a Setter Model to automatically generate an action space, adversary strength, initialization, and bunching strength. Transportation and traffic optimization is a well known area of study, especially for reinforcement learning based solutions. We specifically look at the bus bunching problem for the context of this study. The main idea of the problem is to minimize the delays caused by inefficient bus timings for passengers arriving and departing from a system of buses. While the heavy exploration in the area makes innovation and improvement with regards to performance marginal, it simultaneously provides an effective baseline for developing new generalized techniques. Our group is particularly interested in examining curriculum learning and its effect on training efficiency and overall performance. We decide to try a lesser known approach to curriculum learning, in which the curriculum is not fixed or discretely thresholded. Our method for automated curriculum learning involves a curriculum that is dynamically chosen and learned by an adversary network made to increase the difficulty of the agent's training, and defined by multiple forms of input. Our results are shown in the following sections of this paper.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 9 pages, preprint

arXiv:2404.18397 [pdf, other]

ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images

Authors: Huy Quang Pham, Thang Kien-Bao Nguyen, Quan Van Nguyen, Dan Quang Tran, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Optical Character Recognition - Visual Question Answering (OCR-VQA) is the task of answering text information contained in images that have just been significantly developed in the English language in recent years. However, there are limited studies of this task in low-resource languages such as Vietnamese. To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recognition - Visual Question Answering dataset), consisting of 28,000+ images and 120,000+ question-answer pairs. In this dataset, all the images contain text and questions about the information relevant to the text in the images. We deploy ideas from state-of-the-art methods proposed for English to conduct experiments on our dataset, revealing the challenges and difficulties inherent in a Vietnamese dataset. Furthermore, we introduce a novel approach, called VisionReader, which achieved 0.4116 in EM and 0.6990 in the F1-score on the test set. Through the results, we found that the OCR system plays a very important role in VQA models on the ViOCRVQA dataset. In addition, the objects in the image also play a role in improving model performance. We open access to our dataset at link (https://github.com/qhnhynmm/ViOCRVQA.git) for further research in OCR-VQA task in Vietnamese.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.11152 [pdf, other]

Multi-target and multi-stage liver lesion segmentation and detection in multi-phase computed tomography scans

Authors: Abdullah F. Al-Battal, Soan T. M. Duong, Van Ha Tang, Quang Duc Tran, Steven Q. H. Truong, Chien Phan, Truong Q. Nguyen, Cheolhong An

Abstract: Multi-phase computed tomography (CT) scans use contrast agents to highlight different anatomical structures within the body to improve the probability of identifying and detecting anatomical structures of interest and abnormalities such as liver lesions. Yet, detecting these lesions remains a challenging task as these lesions vary significantly in their size, shape, texture, and contrast with respect to surrounding tissue. Therefore, radiologists need to have extensive experience to be able to identify and detect these lesions. Segmentation-based neural networks can assist radiologists with this task. Current state-of-the-art lesion segmentation networks use the encoder-decoder design paradigm based on the UNet architecture where the multi-phase CT scan volume is fed to the network as a multi-channel input. Although this approach utilizes information from all the phases and outperforms single-phase segmentation networks, we demonstrate that their performance is not optimal and can be further improved by incorporating the learning from models trained on each single-phase individually. Our approach comprises three stages. The first stage identifies the regions within the liver where there might be lesions at three different scales (4, 8, and 16 mm). The second stage includes the main segmentation model trained using all the phases as well as a segmentation model trained on each of the phases individually. The third stage uses the multi-phase CT volumes together with the predictions from each of the segmentation models to generate the final segmentation map. Overall, our approach improves relative liver lesion segmentation performance by 1.6% while reducing performance variability across subjects by 8% when compared to the current state-of-the-art models.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10652 [pdf, other]

ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images

Authors: Quan Van Nguyen, Dan Quang Tran, Huy Quang Pham, Thang Kien-Bao Nguyen, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Visual Question Answering (VQA) is a complicated task that requires the capability of simultaneously processing natural language and images. Initially, this task was researched, focusing on methods to help machines understand objects and scene contexts in images. However, some text appearing in the image that carries explicit information about the full content of the image is not mentioned. Along with the continuous development of the AI era, there have been many studies on the reading comprehension ability of VQA models in the world. As a developing country, conditions are still limited, and this task is still open in Vietnam. Therefore, we introduce the first large-scale dataset in Vietnamese specializing in the ability to understand text appearing in images, we call it ViTextVQA (Vietnamese Text-based Visual Question Answering dataset) which contains over 16,000 images and over 50,000 questions with answers. Through meticulous experiments with various state-of-the-art models, we uncover the significance of the order in which tokens in OCR text are processed and selected to formulate answers. This finding helped us significantly improve the performance of the baseline models on the ViTextVQA dataset. Our dataset is available at https://github.com/minhquan6203/ViTextVQA-Dataset for research purposes.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Preprint submitted to IJCV

arXiv:2404.10078 [pdf, other]

Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets

Authors: Dai Quoc Tran, Armstrong Aboah, Yuntae Jeon, Maged Shoman, Minsoo Park, Seunghee Park

Abstract: This study addresses the evolving challenges in urban traffic monitoring detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle density. Traditional monitoring methods, which rely on static cameras with narrow fields of view, are ineffective in dynamic urban environments, necessitating the installation of multiple cameras, which raises costs. Fisheye lenses, which were recently introduced, provide wide and omnidirectional coverage in a single frame, making them a transformative solution. However, issues such as distorted views and blurriness arise, preventing accurate object detection on these images. Motivated by these challenges, this study proposes a novel approach that combines a transformer-based image enhancement framework and ensemble learning technique to address these challenges and improve traffic monitoring accuracy, making significant contributions to the future of intelligent traffic management systems. Our proposed methodological framework won 5th place in the 2024 AI City Challenge, Track 4, with an F1 score of 0.5965 on experimental validation data. The experimental results demonstrate the effectiveness, efficiency, and robustness of the proposed system. Our code is publicly available at https://github.com/daitranskku/AIC2024-TRACK4-TEAM15.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09991 [pdf, other]

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Authors: Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell

Abstract: Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets separately contain egomotion and interaction examples, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark to assess the utility of EgoPet as a pretraining resource to robotic quadruped locomotion, showing that models trained from EgoPet outperform those trained from prior datasets.

Submitted 15 April, 2024; originally announced April 2024.

Comments: https://www.amirbar.net/egopet

arXiv:2404.02417 [pdf, ps, other]

One-loop contributions for $A^0 \rightarrow \ell \bar{\ell} V$ with $\ell \equiv e, μ$ and $V\equiv γ, Z$ in Higgs Extensions of the Standard Model

Authors: Khiem Hong Phan, Dzung Tri Tran, Thanh Huy Nguyen

Abstract: We present one-loop formulas for the decay of CP-odd Higgs $A^0 \rightarrow \ell \bar{\ell} V$ with $\ell \equiv e, μ$ and $V\equiv γ, Z$ in Higgs Extensions of the Standard Model, considering two higgs doublet model with a complex (and real) scalar, two higgs doublet model as well as triplet higgs model. Analytic results for one-loop amplitudes are expressed in terms of Passarino-Veltman functions following the standard notations of LoopTools. As a result, physical results can be generated numerically by using the package. In phenomenological results, the total decay widths and the differential decay rates with respect to the invariant mass of lepton pair are analyzed for two typical models such as two higgs doublet model and triplet higgs model.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 35 pages

Report number: DTU_2024-01

arXiv:2404.01041 [pdf, other]

Can LLMs get help from other LLMs without revealing private information?

Authors: Florian Hartmann, Duc-Hieu Tran, Peter Kairouz, Victor Cărbune, Blaise Aguera y Arcas

Abstract: Cascades are a common type of machine learning systems in which a large, remote model can be queried if a local model is not able to accurately label a user's data by itself. Serving stacks for large language models (LLMs) increasingly use cascades due to their ability to preserve task performance while dramatically reducing inference costs. However, applying cascade systems in situations where the local model has access to sensitive data constitutes a significant privacy risk for users since such data could be forwarded to the remote model. In this work, we show the feasibility of applying cascade systems in such setups by equipping the local model with privacy-preserving techniques that reduce the risk of leaking private information when querying the remote model. To quantify information leakage in such setups, we introduce two privacy measures. We then propose a system that leverages the recently introduced social learning paradigm in which LLMs collaboratively learn from each other by exchanging natural language. Using this paradigm, we demonstrate on several datasets that our methods minimize the privacy loss while at the same time improving task performance compared to a non-cascade baseline.

Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.18802 [pdf, other]

Long-form factuality in large language models

Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

Abstract: Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.

Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.13533 [pdf, ps, other]

On Sums of Practical Numbers and Polygonal Numbers

Authors: Sai Teja Somu, Duc Van Khanh Tran

Abstract: Practical numbers are positive integers $n$ such that every positive integer less than or equal to $n$ can be written as a sum of distinct positive divisors of $n$. In this paper, we show that all positive integers can be written as a sum of a practical number and a triangular number, resolving a conjecture by Sun. We also show that all sufficiently large natural numbers can be written as a sum of a practical number and two $s$-gonal numbers.

Submitted 20 March, 2024; originally announced March 2024.

MSC Class: 11B83; 11D85; 11A99

Journal ref: Journal of Integer Sequences, 27(5), Article 24.5.1, 2024

arXiv:2403.10297 [pdf, other]

Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression

Authors: Huy-Hoang Bui, Bach-Thuan Bui, Dinh-Tuan Tran, Joo-Ho Lee

Abstract: Classical structural-based visual localization methods offer high accuracy but face trade-offs in terms of storage, speed, and privacy. A recent innovation, keypoint scene coordinate regression (KSCR) named D2S addresses these issues by leveraging graph attention networks to enhance keypoint relationships and predict their 3D coordinates using a simple multilayer perceptron (MLP). Camera pose is then determined via PnP+RANSAC, using established 2D-3D correspondences. While KSCR achieves competitive results, rivaling state-of-the-art image-retrieval methods like HLoc across multiple benchmarks, its performance is hindered when data samples are limited due to the deep learning model's reliance on extensive data. This paper proposes a solution to this challenge by introducing a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF). By generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances the KSCR's generalization capabilities in data-scarce environments. The proposed system could significantly improve localization accuracy by up to 50% and cost only a fraction of time for data synthesis. Furthermore, its modular design allows for the integration of multiple NeRFs, offering a versatile and efficient solution for visual localization. The implementation is publicly available at: https://github.com/ais-lab/DescriptorSynthesis4Feat2Map.

Submitted 19 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09579 [pdf, other]

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

Authors: Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida

Abstract: Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks. Conversely, Instance Discrimination (ID) emphasizes high-level semantics, offering a potential solution to alleviate annotation requirements in MAEs. Although combining these two approaches can address downstream tasks with limited labeled data, naively integrating ID into MAEs leads to extended training times and high computational costs. To address this challenge, we introduce uaMix-MAE, an efficient ID tuning strategy that leverages unsupervised audio mixtures. Utilizing contrastive tuning, uaMix-MAE aligns the representations of pretrained MAEs, thereby facilitating effective adaptation to task-specific semantics. To optimize the model with small amounts of unlabeled data, we propose an audio mixing technique that manipulates audio samples in both input and virtual label spaces. Experiments in low/few-shot settings demonstrate that uaMix-MAE achieves 4-6% accuracy improvements over various benchmarks when tuned with limited unlabeled data, such as AudioSet-20K. Code is available at https://github.com/PLAN-Lab/uamix-MAE

Submitted 14 March, 2024; originally announced March 2024.

Comments: 5 pages, 6 figures, 4 tables. To appear in ICASSP'2024

arXiv:2403.07763 [pdf, other]

Emerging Technologies for 6G Non-Terrestrial-Networks: From Academia to Industrial Applications

Authors: Cong T. Nguyen, Yuris Mulya Saputra, Nguyen Van Huynh, Tan N. Nguyen, Dinh Thai Hoang, Diep N Nguyen, Van-Quan Pham, Miroslav Voznak, Symeon Chatzinotas, Dinh-Hieu Tran

Abstract: Terrestrial networks form the fundamental infrastructure of modern communication systems, serving more than 4 billion users globally. However, terrestrial networks are facing a wide range of challenges, from coverage and reliability to interference and congestion. As the demands of the 6G era are expected to be much higher, it is crucial to address these challenges to ensure a robust and efficient communication infrastructure for the future. To address these problems, Non-terrestrial Network (NTN) has emerged to be a promising solution. NTNs are communication networks that leverage airborne (e.g., unmanned aerial vehicles) and spaceborne vehicles (e.g., satellites) to facilitate ultra-reliable communications and connectivity with high data rates and low latency over expansive regions. This article aims to provide a comprehensive survey on the utilization of network slicing, Artificial Intelligence/Machine Learning (AI/ML), and Open Radio Access Network (ORAN) to address diverse challenges of NTNs from the perspectives of both academia and industry. Particularly, we first provide an in-depth tutorial on NTN and the key enabling technologies including network slicing, AI/ML, and ORAN. Then, we provide a comprehensive survey on how network slicing and AI/ML have been leveraged to overcome the challenges that NTNs are facing. Moreover, we present how ORAN can be utilized for NTNs. Finally, we highlight important challenges, open issues, and future research directions of NTN in the 6G era.

Submitted 3 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 35 pages

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.03212 [pdf, other]

Performance of a modular ton-scale pixel-readout liquid argon time projection chamber

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade , et al. (1340 additional authors not shown)

Abstract: The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmic ray events collected in the spring of 2021. We use this sample to demonstrate the imaging performance of the charge and light readout systems as well as the signal correlations between the two. We also report argon purity and detector uniformity measurements, and provide comparisons to detector simulations.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 47 pages, 41 figures

Report number: FERMILAB-PUB-24-0073-LBNF

arXiv:2403.01454 [pdf, ps, other]

Maximum Length RLL Sequences in de Bruijn Graph

Authors: Yeow Meng Chee, Tuvi Etzion, Tien Long Nguyen, Duy Hoang Ta, Vinh Duc Tran, Van Khu Vu

Abstract: A timing and synchronization system based on a de Bruijn sequence has been proposed and studied recently for a channel associated with quantum communication that requires reliable synchronization. To avoid a long period of no-pulse in such a system on-off pulses are used to simulate a zero and on-on pulses are used to simulate a one. However, these sequences have high redundancy. To reduce the redundancy, run-length limited sequences in the de Bruijn graph are proposed for the same purpose. The maximum length of such sequences in the de Bruijn graph is studied and an efficient algorithm to construct a large set of these sequences is presented. A maximum length sequence for which the position of each window can be computed efficiently is constructed. Finally, an enumeration of the number of such sequences is given and some generalizations are discussed.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.18067

A gradient flow method for smooth splines versus least-squares fitting on Riemannian manifolds

Authors: Chun-Chi Lin, The Dung Tran

Abstract: This article presents a novel resolution to the problem of spline interpolation versus least-squares fitting on smooth Riemannian manifolds utilizing the method of gradient flows of networks. This approach represents a contribution to both geometric control theory and statistical shape data analysis. Our work encompasses a rigorous proof for the existence of global solutions in Hölder spaces for the gradient flow. The asymptotic limits of these solutions establish the existence of the spline interpolation versus least-squares fitting problem on smooth Riemannian manifolds, offering a comprehensive solution. Notably, the constructive nature of the proof suggests potential numerical schemes for finding solutions.

Submitted 29 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: the article is merged into another one of the authors' papers, arXiv:2312.10513

MSC Class: primary 35K52; 49J20; secondary 41A15; 35K25

arXiv:2402.18011 [pdf, other]

Representing 3D sparse map points and lines for camera relocalization

Authors: Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, Joo-Ho Lee

Abstract: Recent advancements in visual localization and mapping have demonstrated considerable success in integrating point and line features. However, expanding the localization framework to include additional mapping components frequently results in increased demand for memory and computational resources dedicated to matching tasks. In this study, we show how a lightweight neural network can learn to represent both 3D point and line features, and exhibit leading pose accuracy by harnessing the power of multiple learned mappings. Specifically, we utilize a single transformer block to encode line features, effectively transforming them into distinctive point-like descriptors. Subsequently, we treat these point and line descriptor sets as distinct yet interconnected feature sets. Through the integration of self- and cross-attention within several graph layers, our method effectively refines each feature before regressing 3D maps using two simple MLPs. In comprehensive experiments, our indoor localization findings surpass those of Hloc and Limap across both point-based and line-assisted configurations. Moreover, in outdoor scenarios, our method secures a significant lead, marking the most considerable enhancement over state-of-the-art learning-based methodologies. The source code and demo videos of this work are publicly available at: https://thpjp.github.io/pl2map/

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.16214 [pdf, ps, other]

Products and powers of principal symmetric ideals

Authors: Eric Dannetun, Riccardo Formenti, Bo Y. Gao, Juliann Geraci, Ross Kogel, Yuelin Li, Shreya Mandal, Vinuge Rupasinghe, Alexandra Seceleanu, Duc Van Khank Tran, Noah Walker

Abstract: Principal symmetric ideals were recently introduced by Harada, Seceleanu, and Sega, with a focus on their homological properties. They are ideals generated by the orbit of a single polynomial under permutations of variables in a polynomial ring. In this paper we seek to determine when a product of two principal symmetric ideals is principal symmetric and when all the powers of a principal symmetric ideal are again principal symmetric ideals. We characterize the ideals that have the latter property as being generated by polynomials invariant up to a scalar multiple under permutation of variables. Recognizing principal symmetric ideals is an open question for the purpose of which we produce certain obstructions. We also demonstrate that the Hilbert functions of symmetric monomial ideals are not all given by symmetric monomial ideals, in contrast to the non-symmetric case.

Submitted 17 June, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: A flaw in the proof of Theorem 3.5 in version 1 = Theorem 3.6 in version 2 was remedied

MSC Class: Primary: 13A50; 13C13; Secondary: 13D40; 13F20

arXiv:2402.08643 [pdf, other]

Learned Image Compression with Text Quality Enhancement

Authors: Chih-Yu Lai, Dung Tran, Kazuhito Koishida

Abstract: Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit-rates. Yet, images containing substantial textual content, particularly screen-content images (SCI), often suffer from text distortion at such compressed levels. To address this, we propose to minimize a novel text logit loss designed to quantify the disparity in text between the original and reconstructed images, thereby improving the perceptual quality of the reconstructed text. Through rigorous experimentation across diverse datasets and employing state-of-the-art algorithms, our findings reveal significant enhancements in the quality of reconstructed text upon integration of the proposed loss function with appropriate weighting. Notably, we achieve a Bjontegaard delta (BD) rate of -32.64% for Character Error Rate (CER) and -28.03% for Word Error Rate (WER) on average by applying the text logit loss for two screenshot datasets. Additionally, we present quantitative metrics tailored for evaluating text quality in image compression tasks. Our findings underscore the efficacy and potential applicability of our proposed text logit loss function across various text-aware image compression contexts.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Submitted to ICIP 2024

arXiv:2402.01568 [pdf, other]

Doping Liquid Argon with Xenon in ProtoDUNE Single-Phase: Effects on Scintillation Light

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, H. Amar Es-sghir, P. Amedo, J. Anderson, D. A. Andrade, C. Andreopoulos , et al. (1300 additional authors not shown)

Abstract: Doping of liquid argon TPCs (LArTPCs) with a small concentration of xenon is a technique for light-shifting and facilitates the detection of the liquid argon scintillation light. In this paper, we present the results of the first doping test ever performed in a kiloton-scale LArTPC. From February to May 2020, we carried out this special run in the single-phase DUNE Far Detector prototype (ProtoDUNE-SP) at CERN, featuring 770 t of total liquid argon mass with 410 t of fiducial mass. The goal of the run was to measure the light and charge response of the detector to the addition of xenon, up to a concentration of 18.8 ppm. The main purpose was to test the possibility for reduction of non-uniformities in light collection, caused by deployment of photon detectors only within the anode planes. Light collection was analysed as a function of the xenon concentration, by using the pre-existing photon detection system (PDS) of ProtoDUNE-SP and an additional smaller set-up installed specifically for this run. In this paper we first summarize our current understanding of the argon-xenon energy transfer process and the impact of the presence of nitrogen in argon with and without xenon dopant. We then describe the key elements of ProtoDUNE-SP and the injection method deployed. Two dedicated photon detectors were able to collect the light produced by xenon and the total light. The ratio of these components was measured to be about 0.65 as 18.8 ppm of xenon were injected.
We performed studies of the collection efficiency as a function of the distance between tracks and light detectors, demonstrating enhanced uniformity of response for the anode-mounted PDS. We also show that xenon doping can substantially recover light losses due to contamination of the liquid argon by nitrogen.

Submitted 9 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 35 pages, 20 figures

Report number: CERN-EP-2024-024; FERMILAB-PUB-23-0819-LBNF

arXiv:2402.01198 [pdf, other]

Physical Layer Location Privacy in SIMO Communication Using Fake Paths Injection

Authors: Trong Duy Tran, Maxime Ferreira Da Costa, Linh Trung Nguyen

Abstract: Fake path injection is an emerging paradigm for inducing privacy over wireless networks. In this paper, fake paths are injected by the transmitter into a SIMO multipath communication channel to preserve her physical location from an eavesdropper. A novel statistical privacy metric is defined as the ratio between the largest (resp. smallest) eigenvalues of Bob's (resp. Eve's) Cramér-Rao lower bound on the SIMO multipath channel parameters to assess the privacy enhancements. Leveraging the spectral properties of generalized Vandermonde matrices, bounds on the privacy margin of the proposed scheme are derived. Specifically, it is shown that the privacy margin increases quadratically in the inverse of the separation between the true and the fake paths under Eve's perspective. Numerical simulations further showcase the approach's benefit.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.08060 [pdf, other]

Fundamental Convergence Analysis of Sharpness-Aware Minimization

Authors: Pham Duy Khanh, Hoang-Chau Luong, Boris S. Mordukhovich, Dat Ba Tran

Abstract: The paper investigates the fundamental convergence properties of Sharpness-Aware Minimization (SAM), a recently proposed gradient-based optimization method (Foret et al., 2021) that significantly improves the generalization of deep neural networks. The convergence properties including the stationarity of accumulation points, the convergence of the sequence of gradients to the origin, the sequence of function values to the optimal value, and the sequence of iterates to the optimal solution are established for the method. The universality of the provided convergence analysis based on inexact gradient descent frameworks (Khanh et al., 2023b) allows its extensions to the normalized versions of SAM such as VaSSO (Li & Giannakis, 2023), RSAM (Liu et al., 2022), and to the unnormalized versions of SAM such as USAM (Andriushchenko & Flammarion, 2022). Numerical experiments are conducted on classification tasks using deep learning models to confirm the practical aspects of our analysis.

Submitted 29 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: 21 pages

arXiv:2401.05552 [pdf, other]

Continuous-Wave Cavity Ring-Down for High-Sensitivity Polarimetry and Magnetometry Measurements

Authors: Dang Bao An Tran, Evan Edwards, David Tew, Robert Peverall, Grant Ritchie

Abstract: We report the development of a novel variant of cavity ring-down polarimetry using a continuous-wave laser operating at 532 nm for highly precise chiroptical activity and magnetometry measurements. The key methodology of the apparatus relies upon the external modulation of the laser frequency at the frequency splitting between non-degenerate left- and right-circularly polarised cavity modes. The method is demonstrated by evaluation of the Verdet constants of crystalline CeF3 and fused silica, in addition to the observation of gas- and solution-phase optical rotations of selected chiral molecules. Specifically, optical rotations of (i) vapours of alpha-pinene and R-(+)-limonene, (ii) mutarotating D-glucose in water, and (iii) acidified L-histidine solutions, are determined. The detection sensitivities for the gas- and solution-phase chiral activity measurements are ~30 microdeg and ~120 microdeg over a 30 s detection period per cavity roundtrip pass, respectively. Furthermore, the measured optical rotations for R-(+)-limonene are compared with computations performed using the Turbomole quantum chemistry package. The experimentally observed optical rotatory dispersion of this cyclic monoterpene was thus rationalised via consideration of its room temperature conformer distribution as determined by the aforementioned single-point energy calculations.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 12 pages, 8 figures

arXiv:2312.17255 [pdf, other]

Single-channel speech enhancement using learnable loss mixup

Authors: Oscar Chang, Dung N. Tran, Kazuhito Koishida

Abstract: Generalization remains a major problem in supervised learning of single-channel speech enhancement. In this work, we propose learnable loss mixup (LLM), a simple and effortless training diagram, to improve the generalization of deep learning-based speech enhancement models. Loss mixup, of which learnable loss mixup is a special variant, optimizes a mixture of the loss functions of random sample pairs to train a model on virtual training data constructed from these pairs of samples. In learnable loss mixup, by conditioning on the mixed data, the loss functions are mixed using a non-linear mixing function automatically learned via neural parameterization. Our experimental results on the VCTK benchmark show that learnable loss mixup achieves 3.26 PESQ, outperforming the state-of-the-art.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11427 [pdf, other]

$(g-2)_{e,μ}$ anomalies and decays $h\to e_a e_b$, $Z\to e_ae_b$, and $e_b\to e_a γ$ in a two Higgs doublet model with inverse seesaw neutrinos

Authors: T. T. Hong, Q. Duyet Tran, T. Phong Nguyen, L. T. Hue, N. H. T. Nha

Abstract: The lepton flavor violating decays $h\to e_b^\pm e_a^\mp $, $Z\to e_b^\pm e_a^\mp$, and $e_b\to e_a γ$ will be discussed in the framework of the Two Higgs doublet model with presence of new inverse seesaw neutrinos and a singly charged Higgs boson that accommodate both $1σ$ experimental data of $(g-2)$ anomalies of the muon and electron. Numerical results indicate that there exist regions of the parameter space supporting all experimental data of $(g-2)_{e,μ}$ as well as the promising LFV signals corresponding to the future experimental sensitivities.

Submitted 28 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Version accepted for publication in EPJC

arXiv:2312.10513 [pdf, ps, other]

Higher-order Riemannian splines and the interpolation problem: an approach of gradient flows

Authors: Chun-Chi Lin, Dung The Tran

Abstract: In this paper, we resolve problems of spline interpolation, regardless of whether least-squares fitting is incorporated, on smooth Riemannian manifolds. Our approach leverages the concept of gradient flows for successively connected curves or networks, offering a fresh perspective on addressing such challenges. Notably, this method extends to the problem of spline interpolation on Lie groups, commonly encountered in mechanical optimal control theory formulations, thus contributing to both geometric control theory and statistical shape data analysis. We rigorously establish the existence of global solutions in Hölder spaces for the gradient flow, with the asymptotic limits of these solutions confirming the existence to the problem of spline interpolation. This comprehensive solution underscores the constructive nature of our proof, hinting at potential numerical schemes for discovering solutions.

Submitted 5 June, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

MSC Class: 35K52; 49J20; 41A15; 35K25

arXiv:2312.02192 [pdf, other]

DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding

Authors: Uy Dieu Tran, Minh Luu, Phong Nguyen, Janne Heikkila, Khoi Nguyen, Binh-Son Hua

Abstract: Text-to-3D synthesis has recently emerged as a new approach to sampling 3D models by adopting pretrained text-to-image models as guiding visual priors. An intriguing but underexplored problem with existing text-to-3D methods is that 3D models obtained from the sampling-by-optimization procedure tend to have mode collapses, and hence poor diversity in their results. In this paper, we provide an analysis and identify potential causes of such a limited diversity, and then devise a new method that considers the joint generation of different 3D models from the same text prompt, where we propose to use augmented text prompts via textual inversion of reference images to diversify the joint generation. We show that our method leads to improved diversity in text-to-3D synthesis qualitatively and quantitatively.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.16850 [pdf, other]

General Derivative-Free Optimization Methods under Global and Local Lipschitz Continuity of Gradients

Authors: Pham Duy Khanh, Boris S. Mordukhovich, Dat Ba Tran

Abstract: This paper addresses the study of derivative-free smooth optimization problems, where the gradient information on the objective function is unavailable. Two novel general derivative-free methods are proposed and developed for minimizing such functions with either global or local Lipschitz continuous gradients. The newly developed methods use gradient approximations based on finite differences, where finite difference intervals are automatically adapted to the magnitude of the exact gradients without knowing them exactly. The suggested algorithms achieve fundamental convergence results, including stationarity of accumulation points in general settings as well as global convergence with constructive convergence rates when the Kurdyka-Łojasiewicz property is imposed. The local convergence of the proposed algorithms to nonisolated local minimizers, along with their local convergence rates, is also analyzed under this property. Numerical experiences involving various convex, nonconvex, noiseless, and noisy functions demonstrate that the new methods exhibit essential advantages over other state-of-the-art methods in derivative-free optimization.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 30 pages, 49 figures

MSC Class: 90C25; 90C26; 90C30; 90C56

arXiv:2311.10810 [pdf]

Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records

Authors: Yao-Shun Chuang, Xiaoqian Jiang, Chun-Teh Lee, Ryan Brandon, Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji

Abstract: This study explored the usability of prompt generation on named entity recognition (NER) tasks and the performance in different settings of the prompt. The prompt generation by GPT-J models was utilized to directly test the gold standard as well as to generate the seed and further fed to the RoBERTa model with the spaCy package. In the direct test, a lower ratio of negative examples with higher numbers of examples in prompt achieved the best results with an F1 score of 0.72. The performance revealed consistency, 0.92-0.97 in the F1 score, in all settings after training with the RoBERTa model. The study highlighted the importance of seed quality rather than quantity in feeding NER models. This research reports on an efficient and accurate way to mine clinical notes for periodontal diagnoses, allowing researchers to easily and quickly build a NER model with the prompt generation approach.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 2023 AMIA Annual Symposium, see https://amia.org/education-events/amia-2023-annual-symposium

arXiv:2311.10809 [pdf]

Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression

Authors: Yao-Shun Chuang, Chun-Teh Lee, Ryan Brandon, Trung Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji, Xiaoqian Jiang

Abstract: This study aimed to utilize text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model on different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The SpaCy package and RoBERTa transformer models were used to build the NER model and evaluate its performance with the manual-labeled gold standards. The comparison of the RE methods with the gold standard showed that as the complexity increased in the RE algorithms, the F1 score increased from 0.3-0.4 to around 0.9. The NER models demonstrated excellent predictions, with the simple RE method showing 0.84-0.92 in the evaluation metrics, and the advanced and combined RE method demonstrating 0.95-0.99 in the evaluation. This study provided an example of the benefit of combining NER methods and NLP models in extracting target information from free-text to structured data and fulfilling the need for missing diagnoses from unstructured notes.

Submitted 17 November, 2023; originally announced November 2023.

Comments: IEEE ICHI 2023, see https://ieeeichi.github.io/ICHI2023/program.html

arXiv:2311.09455 [pdf, ps, other]

Central limit theorems for Fréchet means on stratified spaces

Authors: Jonathan C. Mattingly, Ezra Miller, Do Tran

Abstract: Fréchet means of samples from a probability measure $μ$ on any smoothly stratified metric space M with curvature bounded above are shown to satisfy a central limit theorem (CLT). The methods and results proceed by introducing and proving analytic properties of the "escape vector" of any finitely supported measure $δ$ in M, which records infinitesimal variation of the Fréchet mean $\barμ$ of $μ$ in response to perturbation of $μ$ by adding the mass $tδ$ for $t \to 0$. The CLT limiting distribution $N$ on the tangent cone $T$ at the Fréchet mean is characterized in four ways. The first uses tangential collapse $L$ to compare $T$ with a linear space and then applies a distortion map to the usual linear CLT to transfer back to $T$. Distortion is defined by applying escape after taking preimages under $L$. The second characterization constructs singular analogues of Gaussian measures on smoothly stratified spaces and expresses $N$ as the escape vector of any such "Gaussian mass". The third characterization expresses $N$ as the directional derivative, in the space of measures on $M$, of the barycenter map at $μ$ in the (random) direction given by any Gaussian mass. The final characterization expresses $N$ as the directional derivative, in the space $C$ of continuous real-valued functions on $T$, of a minimizer map, with the derivative taken at the Fréchet function $F \in C$ along the (random) direction given by the negative of the Gaussian tangent field induced by $μ$. Precise mild hypotheses on the measure $μ$ guarantee these CLTs, whose convergence is proved via the second characterization of $N$ by formulating a duality between Gaussian masses and Gaussian tangent fields.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 66 pages. This is the last in a four-part series that started with shadow geometry, geometry of measures on stratified spaces, and a CLT for random tangent fields

MSC Class: Primary: 60F05; 53C23; 60D05; 60B05; 49J52; 62R20; 62R07; 57N80; 58A35; 47H14; 58C20; 49J50; 58Z05; 53C80; 62G20; 62R30; 28C99; Secondary: 58K30; 62G35; 92B10

arXiv:2311.09454 [pdf, ps, other]

A central limit theorem for random tangent fields on stratified spaces

Authors: Jonathan C. Mattingly, Ezra Miller, Do Tran

Abstract: Variation of empirical Fréchet means on a metric space with curvature bounded above is encoded via random fields indexed by unit tangent vectors. A central limit theorem shows these random tangent fields converge to a Gaussian such field and lays the foundation for more traditionally formulated central limit theorems in subsequent work.

Submitted 15 February, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 16 pages. v1: This is the third in a four-part series, starting with shadow geometry and geometry of measures on stratified spaces, leading to central limit theorems for Fréchet means on stratified spaces. v2: updated intro and minor corrections

MSC Class: Primary: 60F05; 53C23; 60D05; 60B05; 49J52; 62R20; 62R07; 57N80; 58A35; 58Z05; 53C80; 62G20; 62R30; 28C99; Secondary: 58K30; 57R57; 92B10

arXiv:2311.09453 [pdf, ps, other]

Geometry of measures on smoothly stratified metric spaces

Authors: Jonathan C. Mattingly, Ezra Miller, Do Tran

Abstract: Any measure $μ$ on a CAT(k) space M that is stratified as a finite union of manifolds and has local exponential maps near the Fréchet mean $\barμ$ yields a continuous "tangential collapse" from the tangent cone of M at $\barμ$ to a vector space that preserves the Fréchet mean, restricts to an isometry on the "fluctuating cone" of directions in which the Fréchet mean can vary under perturbation of $μ$, and preserves angles between arbitrary and fluctuating tangent vectors at the Fréchet mean.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 32 pages, 4 figures. This is the second in a four-part series starting with shadow geometry and leading to central limit theorems on stratified spaces

MSC Class: Primary: 60D05; 53C23; 28C99; 57N80; 58A35; 62G20; 49J52; 62R20; 62R07; 53C80; 58Z05; 60B05; 62R30; Secondary: 60F05; 58K30; 57R57; 92B10

arXiv:2311.09451 [pdf, ps, other]

Shadow geometry at singular points of CAT(k) spaces

Authors: Jonathan C. Mattingly, Ezra Miller, Do Tran

Abstract: In any CAT(k) space M, the "shadow" of a tangent vector Z at a point p is the set of vectors that form an angle of π or more with Z. Taking logarithm maps at points approaching p along a fixed geodesic ray from p with tangent Z collapses the shadow to a single ray while leaving isometrically intact every convex cone that avoids the shadow of Z.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 19 pages, 8 figures. This is the first in a four-part series leading to central limit theorems on stratified spaces

MSC Class: Primary: 53C23; 49J52; 58K30; 53C80; Secondary: 60F05; 60D05; 62R20; 62R07; 92B10

arXiv:2311.02998 [pdf, ps, other]

One-loop contributions for $h\rightarrow \ell \bar{\ell}γ$ and $e^-e^+\rightarrow hγ$ in $U(1)_{B-L}$ extension of the standard model

Authors: Dzung Tri Tran, Thanh Huy Nguyen, Khiem Hong Phan

Abstract: We present one-loop contributions for $h\rightarrow \ell \bar{\ell}γ$ with $\ell =ν_{e,μ, τ}, e, μ$ and $e^-e^+\rightarrow hγ$ in the $U(1)_{B-L}$ extension of the standard model. In phenomenological results, the signal strengths for $h\rightarrow \ell \bar{\ell}γ$ at the Large Hadron Collider and for $e^-e^+\rightarrow hγ$ at future lepton colliders are analyzed in physical parameter space for both vector and chiral $B-L$ models. We find that the contributions from the neutral gauge boson $Z'$ to the signal strengths are rather small. Consequently, the effects are hard to probe at future colliders. In contrast, the impacts of the charged Higgs and CP-odd Higgs in the chiral $B-L$ model on the signal strengths are significant and can be measured with the help of the initial polarization beams at future lepton colliders.

Submitted 30 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: 41 pages, to be published in Chinese Physics C

Report number: DTU2023-03

arXiv:2310.18986 [pdf, other]

Controllable Group Choreography using Contrastive Diffusion

Authors: Nhat Le, Tuong Do, Khoa Do, Hien Nguyen, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Abstract: Music-driven group choreography poses a considerable challenge but holds significant potential for a wide range of industrial applications. The ability to generate synchronized and visually appealing group dance motions that are aligned with music opens up opportunities in many fields such as entertainment, advertising, and virtual performances. However, most of the recent works are not able to generate high-fidelity long-term motions, or fail to enable a controllable experience. In this work, we aim to address the demand for high-quality and customizable group dance generation by effectively governing the consistency and diversity of group choreographies. In particular, we utilize a diffusion-based generative approach to enable the synthesis of a flexible number of dancers and long-term group dances, while ensuring coherence to the input music. Ultimately, we introduce a Group Contrastive Diffusion (GCD) strategy to enhance the connection between dancers and their group, presenting the ability to control the consistency or diversity level of the synthesized group animation via the classifier-guidance sampling technique. Through intensive experiments and evaluation, we demonstrate the effectiveness of our approach in producing visually captivating and consistent group dance motions. The experimental results show the capability of our method to achieve the desired levels of consistency and diversity, while maintaining the overall quality of the generated group choreography. The source code can be found at https://aioz-ai.github.io/GCD

Submitted 3 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.17955 [pdf, other]

Near-to mid-IR spectral purity transfer with a tunable frequency comb: methanol frequency metrology over a record frequency span

Authors: D B A Tran, Olivier Lopez, M Manceau, A Goncharov, Michel Abgrall, H Alvarez-Martinez, R Le Targat, E Cantin, P. -E Pottie, A Amy-Klein, B Darquié

Abstract: We report the development and operation of a frequency-comb-assisted high-resolution mid-infrared molecular spectrometer combining high spectral purity, SI-traceability, wide tunability and high sensitivity. An optical frequency comb is used to transfer the spectral purity of a SI-traceable 1.54 $μ$m metrology-grade frequency reference to a 10.3 $μ$m quantum cascade laser (QCL). The near-infrared reference is operated at the French time/frequency metrology institute, calibrated there to primary frequency standards, and transferred to Laboratoire de Physique des Lasers via the REFIMEVE fiber network. The QCL exhibits a sub-$10^{-15}$ frequency stability from 0.1 to 10 s and its frequency is traceable to the SI with a total uncertainty better than $4 \times 10^{-14}$ after 1-s averaging time. We have developed the instrumentation allowing comb modes to be continuously tuned over 9 GHz resulting in a QCL of record spectral purity uninterruptedly tunable at the precision of the reference over an unprecedented span of 1.4 GHz. We have used our apparatus to conduct sub-Doppler spectroscopy of methanol in a multi-pass cell, demonstrating state-of-the-art frequency uncertainties down to the few kilohertz level. We have observed weak intensity resonances unreported so far, resolved subtle doublets never seen before and brought to light discrepancies with the HITRAN database. This demonstrates the potential of our apparatus for probing subtle internal molecular processes, building accurate spectroscopic models of polyatomic molecules of atmospheric or astrophysical interest, and carrying out precise spectroscopic tests of fundamental physics.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.16990 [pdf, other]

doi 10.18653/v1/2023.emnlp-industry.61

STEER: Semantic Turn Extension-Expansion Recognition for Voice Assistants

Authors: Leon Liyang Zhang, Jiarui Lu, Joel Ruben Antony Moniz, Aditya Kulkarni, Dhivya Piraviperumal, Tien Dung Tran, Nicholas Tzou, Hong Yu

Abstract: In the context of a voice assistant system, steering refers to the phenomenon in which a user issues a follow-up command attempting to direct or clarify a previous turn. We propose STEER, a steering detection model that predicts whether a follow-up turn is a user's attempt to steer the previous command. Constructing a training dataset for steering use cases poses challenges due to the cold-start problem. To overcome this, we developed heuristic rules to sample opt-in usage data, approximating positive and negative samples without any annotation. Our experimental results show promising performance in identifying steering intent, with over 95% accuracy on our sampled data. Moreover, STEER, in conjunction with our sampling strategy, aligns effectively with real-world steering scenarios, as evidenced by its strong zero-shot performance on a human-graded evaluation set. In addition to relying solely on user transcripts as input, we introduce STEER+, an enhanced version of the model. STEER+ utilizes a semantic parse tree to provide more context on out-of-vocabulary words, such as named entities that often occur at the sentence boundary. This further improves model performance, reducing error rate in domains where entities frequently appear, such as messaging. Lastly, we present a data analysis that highlights the improvement in user experience when voice assistants support steering use cases.

Submitted 25 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Industry Track

arXiv:2310.15543 [pdf, other]

Symmetry-preserving graph attention network to solve routing problems at multiple resolutions

Authors: Cong Dao Tran, Thong Bach, Truong Son Hy

Abstract: Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard

Submitted 19 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.15516 [pdf, other]

Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs

Authors: Truong Son Hy, Cong Dao Tran

Abstract: Recently, Deep reinforcement learning (DRL) models have shown promising results in solving routing problems. However, most DRL solvers are commonly proposed to solve node routing problems, such as the Traveling Salesman Problem (TSP). Meanwhile, there has been limited research on applying neural methods to arc routing problems, such as the Chinese Postman Problem (CPP), since they often feature irregular and complex solution spaces compared to TSP. To fill these gaps, this paper proposes a novel DRL framework to address the CPP with load-dependent costs (CPP-LC) (Corberan et al., 2018), which is a complex arc routing problem with load constraints. The novelty of our method is two-fold. First, we formulate the CPP-LC as a Markov Decision Process (MDP) sequential model. Subsequently, we introduce an autoregressive model based on DRL, namely Arc-DRL, consisting of an encoder and decoder to address the CPP-LC challenge effectively. Such a framework allows the DRL model to work efficiently and scalably to arc routing problems. Furthermore, we propose a new bio-inspired meta-heuristic solution based on Evolutionary Algorithm (EA) for CPP-LC. Extensive experiments show that Arc-DRL outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS) proposed by (Corberan et al., 2018) on large benchmark datasets for CPP-LC regarding both solution quality and running time; while the EA gives the best solution quality with much more running time. We release our C++ implementations for metaheuristics such as EA, ILS and VNS along with the code for data generation and our generated data at https://github.com/HySonLab/Chinese_Postman_Problem

Submitted 2 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.14149 [pdf, ps, other]

Some Results on Zumkeller Numbers

Authors: Sai Teja Somu, Andrzej Kukla, Duc Van Khanh Tran

Abstract: A positive integer $n$ is said to be a Zumkeller number or an integer-perfect number if the set of its positive divisors can be partitioned into two subsets of equal sums. In this paper, we prove several results regarding Zumkeller numbers. For any positive integer $m$, we prove that there are infinitely many positive integers $n$ for which $n+1,\cdots, n+m$ are all Zumkeller numbers. Additionally, we show that every positive integer greater than $94185$ can be expressed as a sum of two Zumkeller numbers and that all sufficiently large integers can be written as a sum of a Zumkeller number and a practical number. We also show that there are infinitely many positive integers that cannot be expressed as a sum of a Zumkeller number and a square or a prime.

Submitted 27 November, 2023; v1 submitted 21 October, 2023; originally announced October 2023.

MSC Class: 11B13; 11B25; 11P99

arXiv:2310.01498 [pdf, other]

Considering the Single and Binary Origins of the Type IIP SN 2017eaw

Authors: K. Azalee Bostroem, Emmanouil Zapartas, Brad Koplitz, Benjamin F. Williams, Debby Tran, Andrew Dolphin

Abstract: Current population synthesis modeling suggests that 30-50% of Type II supernovae originate from binary progenitors; however, the identification of a binary progenitor is challenging. One indicator of a binary progenitor is that the surrounding stellar population is too old to contain a massive single star. Measurements of the progenitor mass of SN 2017eaw are starkly divided between observations made temporally close to core-collapse which show a progenitor mass of 13-15 solar masses (final helium core mass of 4.4 to 6.0 solar masses - which is a more informative property than initial mass) and those from the stellar population surrounding the SN which find M<10.8 solar masses (helium core mass <3.4 solar masses). In this paper, we reanalyze the surrounding stellar population with improved astrometry and photometry, finding a median age of 16.8 (+3.2, -1.0) Myr for all stars younger than 50 Myr (helium core mass of 4.7 solar masses) and 85.9 (+3.2, -6.5) Myr for stars younger than 150 Myr. 16.8 Myr is now consistent with the helium core mass range derived from the temporally near explosion observations for single stars. Applying the combined constraints to population synthesis models, we determine that the probability of the progenitor of SN 2017eaw being an initially single-star is 65% compared to 35% for prior binary interaction. 85.9 Myr is inconsistent with any formation scenarios. We demonstrate that combining progenitor age constraints with helium core mass estimates from red supergiant SED modeling, late-time spectra, and indirectly from light curve modeling can help to differentiate single and binary progenitor scenarios and provide a framework for the application of this technique to future observations.

Submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted to AJ

arXiv:2310.01148 [pdf, other]

Cryptocurrency Portfolio Optimization by Neural Networks

Authors: Quoc Minh Nguyen, Dat Thanh Tran, Juho Kanniainen, Alexandros Iosifidis, Moncef Gabbouj

Abstract: Many cryptocurrency brokers nowadays offer a variety of derivative assets that allow traders to perform hedging or speculation. This paper proposes an effective algorithm based on neural networks to take advantage of these investment products. The proposed algorithm constructs a portfolio that contains a pair of negatively correlated assets. A deep neural network, which outputs the allocation weight of each asset at a time interval, is trained to maximize the Sharpe ratio. A novel loss term is proposed to regulate the network's bias towards a specific asset, thus enforcing the network to learn an allocation strategy that is close to a minimum variance strategy. Extensive experiments were conducted using data collected from Binance spanning 19 months to evaluate the effectiveness of our approach. The backtest results show that the proposed algorithm can produce neural networks that are able to make profits in different market situations.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures, accepted at SSCI 2023

arXiv:2309.16699 [pdf]

Circular-Line Trajectory Tracking Controller for Mobile Robot using Multi-Pixy2 Sensors

Authors: Xuan Quang Ngo, Tri Duc Tran, Huy Hung Nguyen, Van Dong Nguyen, Van Tu Duong, Tan Tien Nguyen

Abstract: This study suggests a novel tracking method that employs three Pixy2 sensors to identify the desired line trajectories instead of traditional perceiving means. Firstly, the kinematic model of the mobile robot is derived from the information gathered by three Pixy2 sensors. Secondly, the sliding mode controller is implemented to regulate the tracking error. Finally, simulation results are analyzed to show the effectiveness of the proposed method.

Submitted 12 August, 2023; originally announced September 2023.

Comments: 6 pages, 12 figures, the 2023 International Symposium on Electrical and Electronics Engineering, Ho Chi Minh, Viet Nam, 2023

Showing 1–50 of 465 results for author: Tran, D