-
PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
Authors:
Haohan Weng,
Yikai Wang,
Tong Zhang,
C. L. Philip Chen,
Jun Zhu
Abstract:
Generating compact and sharply detailed 3D meshes poses a significant challenge for current 3D generative models. Unlike methods that extract dense meshes from neural representations, some recent works model the native mesh distribution (i.e., a set of triangles) directly, which yields more compact results that resemble human-crafted meshes. However, due to the complexity and variety of mesh topology, these methods are typically limited to small datasets with specific categories and are hard to extend. In this paper, we introduce PivotMesh, a generic and scalable mesh generation framework that makes an initial attempt to extend native mesh generation to large-scale datasets. We employ a transformer-based auto-encoder to encode meshes into discrete tokens and decode them hierarchically from the face level to the vertex level. Subsequently, to model the complex topology, we first learn to generate pivot vertices as a coarse mesh representation and then generate the complete mesh tokens with the same auto-regressive Transformer. This reduces the difficulty compared with directly modeling the mesh distribution and further improves model controllability. PivotMesh demonstrates its versatility by effectively learning from both small datasets like ShapeNet and large-scale datasets like Objaverse and Objaverse-XL. Extensive experiments indicate that PivotMesh can generate compact and sharp 3D meshes across various categories, highlighting its great potential for native mesh modeling.
Submitted 27 May, 2024;
originally announced May 2024.
-
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
Authors:
Chia-Yi Hsu,
Yu-Lin Tsai,
Chih-Hsun Lin,
Pin-Yu Chen,
Chia-Mu Yu,
Chun-Ying Huang
Abstract:
While large language models (LLMs) such as Llama-2 or GPT-4 have shown impressive zero-shot performance, fine-tuning is still necessary to enhance their performance for customized datasets, domain-specific tasks, or other private needs. However, fine-tuning all parameters of LLMs requires significant hardware resources, which can be impractical for typical users. Therefore, parameter-efficient fine-tuning methods such as LoRA have emerged, allowing users to fine-tune LLMs without considerable computing resources and with little performance degradation compared to fine-tuning all parameters. Unfortunately, recent studies indicate that fine-tuning can increase the safety risks of LLMs, even when the data does not contain malicious content. To address this challenge, we propose Safe LoRA, a simple one-liner patch to the original LoRA implementation that projects the LoRA weights of selected layers onto the safety-aligned subspace, effectively reducing the safety risks of LLM fine-tuning while maintaining utility. It is worth noting that Safe LoRA is a training-free and data-free approach, as it only requires knowledge of the weights of the base and aligned LLMs. Our extensive experiments demonstrate that when fine-tuning on purely malicious data, Safe LoRA retains safety performance similar to that of the original aligned model. Moreover, when the fine-tuning dataset contains a mixture of benign and malicious data, Safe LoRA mitigates the negative effect of the malicious data while preserving performance on downstream tasks.
Submitted 27 May, 2024;
originally announced May 2024.
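The projection step described in the Safe LoRA abstract above can be illustrated with a small NumPy sketch. It assumes that the "safety-aligned subspace" of a layer is the column space of the difference between the aligned and base weights, and it uses an exact orthogonal projector onto that space. The paper's actual projection matrix and its rule for selecting layers may differ, and the function names (`alignment_projector`, `project_lora_update`) are hypothetical.

```python
import numpy as np

def alignment_projector(w_aligned, w_base):
    """Orthogonal projector onto the column space of V = W_aligned - W_base.

    Treating col(V) as the alignment subspace is an assumption of this sketch.
    """
    v = w_aligned - w_base
    # V @ pinv(V) is the orthogonal projector onto col(V).
    return v @ np.linalg.pinv(v)

def project_lora_update(lora_b, lora_a, projector):
    """Project the LoRA weight update dW = B @ A onto the assumed subspace."""
    return projector @ (lora_b @ lora_a)

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 5, 2

# Toy weights: the aligned model equals the base model plus a rank-3 shift.
w_base = rng.normal(size=(d_out, d_in))
shift = rng.normal(size=(d_out, 3)) @ rng.normal(size=(3, d_in))
w_aligned = w_base + shift

# Toy LoRA factors for one layer.
lora_b = rng.normal(size=(d_out, rank))
lora_a = rng.normal(size=(rank, d_in))

projector = alignment_projector(w_aligned, w_base)
delta_w = lora_b @ lora_a
delta_w_safe = project_lora_update(lora_b, lora_a, projector)
```

Only the base and aligned weights are needed to build the projector, which matches the training-free and data-free property claimed in the abstract.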
-
Crossmodal ASR Error Correction with Discrete Speech Units
Authors:
Yuanchao Li,
Pinzhen Chen,
Peter Bell,
Catherine Lai
Abstract:
ASR remains unsatisfactory in scenarios where the speaking style diverges from that used to train ASR systems, resulting in erroneous transcripts. To address this, ASR Error Correction (AEC), a post-ASR processing approach, is required. In this work, we tackle an understudied issue: the Low-Resource Out-of-Domain (LROOD) problem, by investigating crossmodal AEC on very limited downstream data with 1-best hypothesis transcription. We explore pre-training and fine-tuning strategies and uncover an ASR domain discrepancy phenomenon, shedding light on appropriate training schemes for LROOD data. Moreover, we propose the incorporation of discrete speech units to align with and enhance the word embeddings for improving AEC quality. Results from multiple corpora and several evaluation metrics demonstrate the feasibility and efficacy of our proposed AEC approach on LROOD data, as well as its generalizability and superiority on large-scale data. Finally, a study on speech emotion recognition confirms that our model produces ASR error-robust transcripts suitable for downstream applications.
Submitted 26 May, 2024;
originally announced May 2024.
-
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
Authors:
Mohammed Nowaz Rabbani Chowdhury,
Meng Wang,
Kaoutar El Maghraoui,
Naigang Wang,
Pin-Yu Chen,
Christopher Carothers
Abstract:
The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers. MoE significantly reduces the training computation for large models, but its deployment can still be memory- or computation-intensive for some downstream tasks. Model pruning is a popular approach to reducing inference computation, but its application to MoE architectures is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in finetuned MoE models. We theoretically prove that prioritizing the pruning of experts whose router l2 norm changes less from the pretrained model guarantees the preservation of test accuracy, while significantly reducing the model size and the computational requirements. Although our theoretical analysis is centered on binary classification tasks with a simplified MoE architecture, our expert pruning method is verified on large vision MoE models such as VMoE and E3MoE finetuned on benchmark datasets such as CIFAR10, CIFAR100, and ImageNet.
Submitted 30 May, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
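The pruning criterion in the abstract above (prune first the experts whose router l2 norm changed least during fine-tuning) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it reads "change of the router's l2 norm" as the absolute difference between the fine-tuned and pretrained norms of each expert's router vector, and `experts_to_keep` is a hypothetical helper name.

```python
import numpy as np

def experts_to_keep(router_pre, router_ft, num_keep):
    """Return the sorted indices of the experts that survive pruning.

    router_pre, router_ft: arrays of shape (num_experts, d) holding one
    router weight vector per expert, before and after fine-tuning.
    """
    # Assumed score: absolute change of each router vector's l2 norm.
    change = np.abs(
        np.linalg.norm(router_ft, axis=1) - np.linalg.norm(router_pre, axis=1)
    )
    # Experts with the smallest change are pruned first, so keep the largest.
    keep = np.argsort(change)[-num_keep:]
    return np.sort(keep)

# Toy routers for four experts with two-dimensional router vectors.
router_pre = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
router_ft = np.array([[1.1, 0.0], [0.0, 3.0], [1.0, 1.0], [4.0, 0.0]])

kept = experts_to_keep(router_pre, router_ft, num_keep=2)
```

In this toy case the norm changes are 0.1, 2.0, 0.0, and 2.0, so experts 0 and 2 are pruned and experts 1 and 3 are kept.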
-
SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
Authors:
Shuai Zhang,
Heshan Devaka Fernando,
Miao Liu,
Keerthiram Murugesan,
Songtao Lu,
Pin-Yu Chen,
Tianyi Chen,
Meng Wang
Abstract:
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when the successor features are learned with deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as the deep Q-network, in terms of both a faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
Submitted 24 May, 2024;
originally announced May 2024.
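The decomposition and the GPI operator mentioned in the SF-DQN abstract above can be made concrete with a short sketch. It assumes the standard linear form from the successor-feature literature, where the Q-value of a stored policy is the inner product of its successor feature and a task reward vector, and GPI picks the action that maximizes the best Q-value over all stored policies. The array layout and the name `gpi_action` are choices made for this example.

```python
import numpy as np

def gpi_action(successor_features, reward_weights):
    """Generalized policy improvement for a single state.

    successor_features: shape (n_policies, n_actions, d), the successor
        feature of each stored policy for each action in the current state.
    reward_weights: shape (d,), the reward mapping of the new task.
    """
    # Q_i(s, a) = psi_i(s, a) . w for every stored policy i and action a.
    q_values = successor_features @ reward_weights
    # GPI: take the best stored policy per action, then the best action.
    best_per_action = q_values.max(axis=0)
    return int(np.argmax(best_per_action)), q_values

# Two stored policies, three actions, two-dimensional features.
psi = np.array([
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.0, 2.0], [1.0, 1.0], [0.0, 0.0]],
])
w_new_task = np.array([1.0, 2.0])

action, q_values = gpi_action(psi, w_new_task)
```

Only `w_new_task` changes between tasks that share dynamics, which is why the successor features can be reused across tasks.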
-
High-field magnetoelectric coupling and successive magnetic transitions in Mn-doped polar antiferromagnet Ni3TeO6
Authors:
J. H. Zhang,
L. Lin,
C. Dong,
Y. T. Chang,
J. F. Wang,
C. L. Lu,
P. Z. Chen,
W. J. Zhai,
G. Z. Zhou,
L. Huang,
Y. S. Tang,
S. H. Zheng,
M. F. Liu,
X. H. Zhou,
Z. B. Yan,
J. -M. Liu
Abstract:
Among polar Ni3TeO6 compounds doped with 3d transition-metal ions, Mn-doped Ni3TeO6 has stimulated great interest due to its high magnetic ordering temperature and complex magnetic phases, but the mechanism of its magnetoelectric (ME) coupling is far from understood. Herein we report our systematic investigation of the chemical control of magnetism, the metamagnetic transition, and the ME properties of Ni3-xMnxTeO6 single crystals in high magnetic fields (H) up to 52 T. We report previously unobserved weak ferromagnetic behavior that appears in the ab plane below 9.5 K, in addition to the incommensurate helical and commensurate collinear antiferromagnetic states. In the low-field region, a spin-flop type metamagnetic transition without any hysteresis occurs at Hc1 for H // c, while another metamagnetic transition accompanied by a change in electric polarization is observed at Hc2 in the high-field region for both H // c and H // ab above 30 K, which can be attributed to the sudden rotation of magnetic moments at Ni2 sites. The ME measurements reveal that a first-order ME effect is observed in the low-T and low-H regions, while a second-order ME coupling term appears above 30 K in the magnetic field range of Hc1 < H < Hc2 for H // c and H < Hc2 for H // ab, both becoming significant with increasing temperature. Eventually, they are dominated by the second-order ME effect near the antiferromagnetic transition temperature. The present work demonstrates that Ni3-xMnxTeO6 is an exotic magnetoelectric material compared with Ni3TeO6 and its derivatives, thereby providing insights to better understand the magnetism and ME coupling in Ni3TeO6 and its derivatives.
Submitted 29 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
A Declarative System for Optimizing AI Workloads
Authors:
Chunwei Liu,
Matthew Russo,
Michael Cafarella,
Lei Cao,
Peter Baille Chen,
Zui Chen,
Michael Franklin,
Tim Kraska,
Samuel Madden,
Gerardo Vitagliano
Abstract:
A long-standing goal of data management systems has been to build systems which can compute quantitative insights over large corpora of unstructured data in a cost-effective manner. Until recently, it was difficult and expensive to extract facts from company documents, data from scientific papers, or metrics from image and video corpora. Today's models can accomplish these tasks with high accuracy. However, a programmer who wants to answer a substantive AI-powered query must orchestrate large numbers of models, prompts, and data operations. For even a single query, the programmer has to make a vast number of decisions such as the choice of model, the right inference method, the most cost-effective inference hardware, the ideal prompt design, and so on. The optimal set of decisions can change as the query changes and as the rapidly-evolving technical landscape shifts. In this paper we present Palimpzest, a system that enables anyone to process AI-powered analytical queries simply by defining them in a declarative language. The system uses its cost optimization framework to implement the query plan with the best trade-offs between runtime, financial cost, and output data quality. We describe the workload of AI-powered analytics tasks, the optimization methods that Palimpzest uses, and the prototype system itself. We evaluate Palimpzest on tasks in Legal Discovery, Real Estate Search, and Medical Schema Matching. We show that even our simple prototype offers a range of appealing plans, including one that is 3.3x faster and 2.9x cheaper than the baseline method, while also offering better data quality. With parallelism enabled, Palimpzest can produce plans with up to a 90.3x speedup at 9.1x lower cost relative to a single-threaded GPT-4 baseline, while obtaining an F1-score within 83.5% of the baseline. These require no additional work by the user.
Submitted 29 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing
Authors:
Teng Xu,
Jiamin Chen,
Peng Chen,
Youjia Zhang,
Junqing Yu,
Wei Yang
Abstract:
Editing objects within a scene is a critical functionality required across a broad spectrum of applications in computer vision and graphics. As 3D Gaussian Splatting (3DGS) emerges as a frontier in scene representation, the effective modification of 3D Gaussian scenes has become increasingly vital. This process entails accurately retrieving the target objects and subsequently performing modifications based on instructions. Though available in pieces, existing techniques mainly embed sparse semantics into Gaussians for retrieval and rely on an iterative dataset update paradigm for editing, leading to over-smoothing or inconsistency issues. To this end, this paper proposes a systematic approach, namely TIGER, for coherent text-instructed 3D Gaussian retrieval and editing. In contrast to the top-down language grounding approach for 3D Gaussians, we adopt a bottom-up language aggregation strategy to generate denser language-embedded 3D Gaussians that support open-vocabulary retrieval. To overcome the over-smoothing and inconsistency issues in editing, we propose Coherent Score Distillation (CSD), which aggregates a 2D image editing diffusion model and a multi-view diffusion model for score distillation, producing multi-view consistent editing with much finer details. In various experiments, we demonstrate that TIGER accomplishes more consistent and realistic edits than prior work.
Submitted 1 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Authors:
Yuchen Hu,
Chen Chen,
Chao-Han Huck Yang,
Chengwei Qin,
Pin-Yu Chen,
Eng Siong Chng,
Chao Zhang
Abstract:
We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architectures with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level quality of pseudo labels without ground truth, thereby guiding model updates for effective unsupervised adaptation. Experimental results show that STAR achieves an average 13.5% relative reduction in word error rate across 14 target domains, and it sometimes even approaches the upper-bound performance of supervised adaptation. Surprisingly, we also observe that STAR protects the adapted model from the common catastrophic forgetting problem without recalling source-domain data. Furthermore, STAR exhibits high data efficiency, requiring less than one hour of unlabeled data, and generalizes seamlessly to alternative large speech models and speech translation tasks. We plan to open-source our code for the research community.
Submitted 23 May, 2024;
originally announced May 2024.
-
MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling
Authors:
Diwei Huang,
Kunyang Lin,
Peihao Chen,
Qing Du,
Mingkui Tan
Abstract:
Few-shot audio-visual acoustics modeling seeks to synthesize the room impulse response (RIR) at arbitrary locations from few-shot observations. To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a *map-guided* framework that constructs acoustic-related visual semantic feature maps of the scenes. Visual features preserve semantic details related to sound, and maps provide explicit structural regularities of sound propagation, both of which are valuable for modeling environment acoustics. We thus extract pixel-wise semantic features from the observations and project them into a top-down map, namely the **observation semantic map**. This map contains the relative positional information among points and the semantic feature information associated with each point. Yet, the limited information extracted from few-shot observations on the map is not sufficient for understanding and modeling the whole scene. We address this challenge by generating a **scene semantic map** via diffusing features and anticipating the observation semantic map. The scene semantic map then interacts with the echo encoding through a transformer-based encoder-decoder to predict the RIR for arbitrary speaker-listener query pairs. Extensive experiments on the Matterport3D and Replica datasets verify the efficacy of our framework.
Submitted 22 May, 2024;
originally announced May 2024.
-
Mosaic IT: Enhancing Instruction Tuning with Data Mosaics
Authors:
Ming Li,
Pei Chen,
Chenguang Wang,
Hongyu Zhao,
Yijun Liang,
Yupeng Hou,
Fuxiao Liu,
Tianyi Zhou
Abstract:
Finetuning large language models with a variety of instruction-response pairs has enhanced their capability to understand and follow instructions. Current instruction tuning primarily relies on teacher models or human intervention to generate and refine the instructions and responses, which is costly and non-sustainable and may lack diversity. In this paper, we introduce Mosaic Instruction Tuning (Mosaic-IT), a human/model-free method that can efficiently create rich and diverse augmentations from existing instruction tuning data to enhance the finetuned LLM. Mosaic-IT randomly concatenates multiple instruction samples into one and trains the model to produce the corresponding responses with predefined higher-level meta-instructions to strengthen its multi-step instruction-following and format-following skills. Our extensive evaluations demonstrate the superior performance and training efficiency of Mosaic-IT, which achieves consistent performance improvements over various benchmarks and an 80% reduction in training costs compared with original instruction tuning. Our code and data are available at https://github.com/tianyi-lab/Mosaic-IT.
Submitted 22 May, 2024;
originally announced May 2024.
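The augmentation described in the Mosaic-IT abstract above (randomly concatenating several instruction samples into one training example under a meta-instruction) might look roughly like the sketch below. The meta-instruction wording, the `[i]` numbering format, and the helper name `make_mosaic` are hypothetical and chosen only for illustration; the authors' actual templates are in their repository.

```python
import random

def make_mosaic(samples, k, rng):
    """Concatenate k randomly chosen (instruction, response) pairs into one example."""
    chosen = rng.sample(samples, k)
    # Hypothetical meta-instruction; the real templates are defined by the authors.
    meta = f"Answer the following {k} instructions in order, numbering each response."
    instruction = meta + "\n" + "\n".join(
        f"[{i}] {ins}" for i, (ins, _) in enumerate(chosen, start=1)
    )
    response = "\n".join(
        f"[{i}] {res}" for i, (_, res) in enumerate(chosen, start=1)
    )
    return {"instruction": instruction, "response": response}

# Toy instruction-tuning data.
samples = [
    ("Translate 'hello' to French.", "bonjour"),
    ("What is 2 + 2?", "4"),
    ("Name a primary color.", "red"),
    ("Give a synonym for 'fast'.", "quick"),
]

mosaic = make_mosaic(samples, k=3, rng=random.Random(0))
```

No teacher model or human annotation is involved: the new example is built purely from existing pairs, which is the property the abstract emphasizes.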
-
Simulating a Chern Insulator with C = $\pm$2 on Synthetic Floquet Lattice
Authors:
Lingxiao Lei,
Weichen Wang,
Guangyao Huang,
Shun Hu,
Xi Cao,
Xinfang Zhang,
Mingtang Deng,
**xing Chen
Abstract:
The synthetic Floquet lattice, generated by multiple strong drives with mutually incommensurate frequencies, provides a powerful platform for the quantum simulation of topological phenomena. In this study, we propose a 4-band tight-binding model of the Chern insulator with a Chern number C = $\pm$2 by coupling two layers of the half-BHZ lattice and subsequently mapping it onto the Floquet lattice to simulate its topological properties. To determine the Chern number of our Floquet-version model, we extend the energy pumping method proposed by Martin et al. [Phys. Rev. X 7, 041008 (2017)] and the topological oscillation method introduced by Boyers et al. [Phys. Rev. Lett. 125, 160505 (2020)], followed by numerical simulations for both methodologies. The simulation results demonstrate the successful extraction of the Chern number using either of these methods, providing an excellent prediction of the phase diagram that closely aligns with the theoretical one derived from the original bilayer half-BHZ model. Finally, we briefly discuss a potential experimental implementation for our model. Our work demonstrates significant potential for simulating complex topological matter using quantum computing platforms, thereby paving the way for constructing a more universal simulator for non-interacting topological quantum states and advancing our understanding of these intriguing phenomena.
Submitted 19 May, 2024;
originally announced May 2024.
-
A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection
Authors:
Honghui Chen,
**** Chen,
Huan Mao,
Mengxi Jiang
Abstract:
Anomaly detection and localization without any manual annotations or prior knowledge is a challenging task in the unsupervised learning setting. Existing works achieve excellent performance in anomaly detection, but with complex networks or cumbersome pipelines. To address this issue, this paper explores a simple but effective architecture for anomaly detection. It consists of a well pre-trained encoder that extracts hierarchical feature representations and a decoder that reconstructs these intermediate features from the encoder. In particular, it does not require any data augmentations or anomalous images for training. Anomalies are detected when the decoder fails to reconstruct the features well, and the errors of hierarchical feature reconstruction are then aggregated into an anomaly map to achieve anomaly localization. Comparing the differences between the encoder and decoder features leads to more accurate and robust localization results than the single-feature or pixel-by-pixel comparisons used in conventional works. Experimental results show that the proposed method outperforms state-of-the-art methods on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec Anomaly Detection datasets in both anomaly detection and localization.
Submitted 15 May, 2024;
originally announced May 2024.
-
HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition
Authors:
Honghui Chen,
Yuhang Qiu,
Jiabao Wang,
**** Chen,
Nam Ling
Abstract:
Internal Language Model (LM)-based methods use permutation language modeling (PLM) to address the error-correction problems caused by conditional independence in external LM-based methods. However, manually specified random permutations cause fitting oscillations during model training, and the Iterative Refinement (IR) operation used to improve multimodal information decoupling introduces additional overhead. To address these issues, this paper proposes the Hierarchical Attention autoregressive Model with Adaptive Permutation (HAAP) to enhance the location-context-image interaction capability, improving autoregressive generalization with an internal LM. First, we propose Implicit Permutation Neurons (IPN) to generate adaptive attention masks that dynamically exploit token dependencies. The adaptive masks increase the diversity of the training data and prevent the model from depending on a specific order. This reduces the training overhead of PLM while avoiding fitting oscillations during training. Second, we develop a Cross-modal Hierarchical Attention mechanism (CHA) to couple context and image features. This processing establishes rich positional semantic dependencies between context and image while avoiding IR. Extensive experimental results show that the proposed HAAP achieves state-of-the-art (SOTA) performance in terms of accuracy, complexity, and latency on several datasets.
Submitted 15 May, 2024;
originally announced May 2024.
-
Improving Transformers using Faithful Positional Encoding
Authors:
Tsuyoshi Idé,
Jokin Labaien,
Pin-Yu Chen
Abstract:
We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing information about the positional order of the input sequence. We show that the new encoding approach systematically improves the prediction performance in the time-series classification task.
Submitted 16 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
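For reference, the standard sinusoidal positional encoding that the abstract above uses as its point of comparison can be written in a few lines of NumPy. This sketch shows only the baseline from the original Transformer formulation, with even dimensions carrying sines and odd dimensions carrying cosines; it is not the faithful encoding proposed by the authors, whose construction is not given in the abstract.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")
    positions = np.arange(seq_len)[:, None]                          # (seq_len, 1)
    freqs = 1.0 / (10000.0 ** (np.arange(0, d_model, 2) / d_model))  # (d_model/2,)
    angles = positions * freqs[None, :]                              # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=8, d_model=4)
```

Each row of `pe` is added to the token embedding at that position.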
-
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
Authors:
Lingdong Kong,
Shaoyuan Xie,
Hanjiang Hu,
Yaru Niu,
Wei Tsang Ooi,
Benoit R. Cottereau,
Lai Xing Ng,
Yuexin Ma,
Wenwei Zhang,
Liang Pan,
Kai Chen,
Ziwei Liu,
Weichao Qiu,
Wei Zhang,
Xu Cao,
Hao Lu,
Ying-Cong Chen,
Caixin Kang,
Xinning Zhou,
Chengyang Ying,
Wentao Shang,
Xingxing Wei,
Yinpeng Dong,
Bo Yang,
Shengyin Jiang
, et al. (66 additional authors not shown)
Abstract:
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.
Submitted 29 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Exploring material compositions for synthesis using oxidation states
Authors:
Maung Thway,
Andy Paul Chen,
Haiwen Dai,
Jose Recatala-Gomez,
Siyu Isaac Parker Tian,
Ruiming Zhu,
Wenhao Zhai,
Fengxia Wei,
D. V. Maheshwar Repaka,
Tonio Buonassisi,
Pieremanuele Canepa,
Kedar Hippalgaonkar
Abstract:
Recent advances in machine learning techniques have made it possible to use high-throughput screening to identify novel materials with specific properties. However, the large number of potential candidates produced by these techniques can make it difficult to select the most promising ones. In this study, we develop the oxidation state probability (OSP) method which evaluates ternary compounds based on the probability (the OSP metric) of each element to adopt the required oxidation states for fulfilling charge neutrality. We compare this model with Roost and the Fourier-transformed crystal properties (FTCP)-based synthesizability score. Among the top 1000 systems with the most database entries in Materials Project (MP), more than 500 systems exhibit an attested compound among the top 3 compositions when ranked by the OSP metric. We find that the OSP method shows promising results for certain classes of ternary systems, especially those containing nonmetals, s-block, or transition metals. When applied to the Cu-In-Te ternary system, an interesting system for thermoelectric applications, the OSP method predicted the synthesizability of CuIn$_3$Te$_5$ without prior knowledge, and we have successfully synthesized CuIn$_3$Te$_5$ in experiment. Our method has the potential to accelerate the discovery of novel compounds by providing a guide for experimentalists to easily select the most synthesizable candidates from an arbitrarily large set of possible chemical compositions.
Submitted 14 May, 2024;
originally announced May 2024.
-
Production cross sections of superheavy elements: insights from the dinuclear system model with high-quality microscopic nuclear masses
Authors:
Peng-Hui Chen,
Chang Geng,
Fei Niu,
Zu-Xing Yang,
Xiang-Hua Zeng,
Zhao-Qing Feng
Abstract:
To accurately predict the synthesis cross-sections of superheavy elements and to identify the optimal projectile-target combinations and evaporation channels at specific collision energies, we have attempted to use high-quality microscopic nuclear masses (HQMNM) within the dinuclear system (DNS) model. These masses are obtained by fitting experimental data with the Skyrme energy density functional theory (DFT), as published in Phys. Lett. B 851 (2024) 138578. The atomic nuclear mass serves as a crucial input for the DNS model, as the Q-values and separation energies it generates directly influence the calculated fusion and survival probabilities. Our calculations have reproduced the experimental data for hot fusion and have been compared with results based on the finite-range droplet model (FRDM12) mass calculations. Compared to the FRDM12 mass results, we have found that the HQMNM provides a better fit to the experimental outcomes. For the specific reaction of \(^{48}\rm{Ca} + ^{243}\rm{Am} \rightarrow ^{291}\rm{Mc}^*\), we have conducted a detailed calculation of capture, fusion, and survival based on the HQMNM model and compared these with calculations based on other mass models. Based on these findings, we have systematically calculated available projectile-target combinations for the synthesis of elements 119 and 120, and identified the optimal combinations. We provide the synthesis cross-sections, collision energies, and evaporation channels, offering a reference for conducting experiments on the synthesis of superheavy elements.
Submitted 13 May, 2024;
originally announced May 2024.
-
Mapping the Invisible: A Framework for Tracking COVID-19 Spread Among College Students with Google Location Data
Authors:
Prajindra Sankar Krishnan,
Chai Phing Chen,
Gamal Alkawsi,
Sieh Kiong Tiong,
Luiz Fernando Capretz
Abstract:
The COVID-19 pandemic and the implementation of social distancing policies have rapidly changed people's visiting patterns, as reflected in mobility data that tracks mobility traffic using location trackers on cell phones. However, the frequency and duration of concurrent occupancy at specific locations govern the transmission rather than the number of customers visiting. Therefore, understanding how people interact in different locations is crucial for targeting policies and informing contact tracing and prevention strategies. This study proposes an efficient way to reduce the spread of the virus among on-campus university students by developing Google History Location Extractor and Indicator software based on real-world human mobility data. The platform enables policymakers and researchers to explore the possibility of future developments in the epidemic's spread and simulate the outcomes of human mobility and epidemic state under different epidemic control policies. It offers functions for determining potential contacts, assessing individual infection risks, and evaluating the effectiveness of on-campus policies. The proposed multi-functional platform facilitates the screening process by more accurately targeting potential virus carriers and aids in making informed decisions on epidemic control policies, ultimately contributing to preventing and managing future outbreaks.
Submitted 13 May, 2024;
originally announced May 2024.
-
MoCo: Fuzzing Deep Learning Libraries via Assembling Code
Authors:
Pin Ji,
Yang Feng,
Duo Wu,
Lingyue Yan,
Pengling Chen,
Jia Liu,
Zhihong Zhao
Abstract:
Rapidly developing deep learning (DL) techniques have been applied in software systems across various application scenarios. However, they could also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors of DL systems. Previous research on fuzzing DL libraries still has limitations in the diversity of test inputs, the construction of test oracles, and the precision of detection. In this paper, we propose MoCo, a novel fuzz testing method for DL libraries via assembling code. MoCo first disassembles the seed code file to obtain the template and code blocks, and then employs code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate new code blocks adapted to the template. By inserting context-appropriate code blocks into the template step by step, MoCo can generate a tree of code files with intergenerational relations. According to the derivation relations in this tree and the applied mutation operators, we construct the test oracle based on the execution state consistency. Since the granularity of code assembly and mutation is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where the bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of MoCo using three widely-used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiment, MoCo detects 64 new bugs of four types in three DL libraries, where 51 bugs have been confirmed, and 13 bugs have been fixed by developers.
Submitted 13 May, 2024;
originally announced May 2024.
-
Movable Antennas Aided Multicast MISO Communication Systems
Authors:
Zhenqiao Cheng,
Nanxi Li,
Ruizhe Long,
Jianchi Zhu,
Chongjun Ouyang,
Peng Chen
Abstract:
A novel multicast communication system with movable antennas (MAs) is proposed, where antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-output system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA elements as discrete movements. A low-complexity greedy search-based algorithm is proposed to tackle this non-convex integer programming problem. A branch-and-bound (BAB)-based method is proposed to achieve the optimal multicast rate with lower time complexity than brute-force search, under the assumption that the two users suffer similar line-of-sight path losses. Numerical results reveal that the proposed MA systems significantly improve the multicast rate compared to conventional fixed-position antennas (FPAs)-based systems.
Submitted 12 May, 2024;
originally announced May 2024.
-
3D Hand Mesh Recovery from Monocular RGB in Camera Space
Authors:
Haonan Li,
Patrick P. K. Chen,
Yitong Zhou
Abstract:
With the rapid advancement of technologies such as virtual reality, augmented reality, and gesture control, users expect interactions with computer interfaces to be more natural and intuitive. Existing visual algorithms often struggle to accomplish advanced human-computer interaction tasks, necessitating accurate and reliable absolute spatial prediction methods. Moreover, dealing with complex scenes and occlusions in monocular images poses entirely new challenges. This study proposes a network model that performs parallel processing of root-relative grids and root recovery tasks. The model enables the recovery of 3D hand meshes in camera space from monocular RGB images. To facilitate end-to-end training, we utilize an implicit learning approach for 2D heatmaps, enhancing the compatibility of 2D cues across different subtasks. We incorporate the Inception concept into a spectral graph convolutional network to explore the root-relative mesh, and integrate it with the locally detailed and globally attentive method designed for root recovery. This approach improves the model's predictive performance in complex environments and self-occluded scenes. Through evaluation on the large-scale hand dataset FreiHAND, we have demonstrated that our proposed model is comparable with state-of-the-art models. This study contributes to the advancement of techniques for accurate and reliable absolute spatial prediction in various human-computer interaction applications.
Submitted 12 May, 2024;
originally announced May 2024.
-
On pointwise convergence of cone multipliers
Authors:
Peng Chen,
Danqing He,
Xiaochun Li,
Lixin Yan
Abstract:
For $p\ge 2$ and $λ>\max\{n|\tfrac 1p-\tfrac 12|-\tfrac12, 0\}$, we prove the pointwise convergence of cone multipliers, i.e. $$ \lim_{t\to\infty}T_t^λ(f) = f \text{ a.e.},$$ where $f\in L^p(\mathbb R^n)$ satisfies $\operatorname{supp}\ \widehat f\subset\{ξ\in\mathbb R^n:\ 1<|ξ_n|<2\}$. Our main tools are weighted estimates for maximal cone operators, which are consequences of trace inequalities for cones.
Submitted 4 May, 2024;
originally announced May 2024.
-
Mothman at SemEval-2024 Task 9: An Iterative System for Chain-of-Thought Prompt Optimization
Authors:
Alvin Po-Chun Chen,
Ray Groshan,
Sean von Bayern
Abstract:
Extensive research exists on the performance of large language models on logic-based tasks, whereas relatively little has been done on their ability to generate creative solutions on lateral thinking tasks. The BrainTeaser shared task tests lateral thinking and uses adversarial datasets to prevent memorization, resulting in poor performance for out-of-the-box models. We propose a system for iterative, chain-of-thought prompt engineering which optimizes prompts using human evaluation. Using this shared task, we demonstrate our system's ability to significantly improve model performance by optimizing prompts and to evaluate the input dataset.
Submitted 3 May, 2024;
originally announced May 2024.
-
Graph is all you need? Lightweight data-agnostic neural architecture search without training
Authors:
Zhenhan Huang,
Tejaswini Pedapati,
Pin-Yu Chen,
Chunhen Jiang,
Jianxi Gao
Abstract:
Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and lightweight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench-201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NAS-Bench-101, NAS-Bench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.
Submitted 2 May, 2024;
originally announced May 2024.
-
Exploring the potential of synthesizing unknown superheavy isotopes via cold-fusion reactions based on the dinuclear system model
Authors:
Hao Wu,
Peng-Hui Chen,
Fei Niu,
Zu-Xing Yang,
Xiang-Hua Zeng,
Zhao-Qing Feng
Abstract:
To assess the potential of cold-fusion for synthesizing superheavy nuclei (SHN) with proton numbers 104-113, we systematically calculated 145 naturally occurring projectile-target combinations within the DNS model. Reactions predominantly show maximum cross-sections in the 1n to 2n channels, peaking near the Coulomb barrier with a sum of barrier and Q-value within 30 MeV. The maximum cross-section occurs below the Bass barrier, suggesting either the Bass model's limitation or significant deformation reducing the effective Coulomb barrier. Our calculations align well with experimental data, revealing that more neutron-rich projectiles slightly enhance fusion, though the effect is minor. For fixed targets (Pb, Bi), evaporation residue cross-sections decrease linearly with increasing projectile proton number, attributed to reduced fusion probability and lower fission barriers in heavier SHN. The touching potential $V_{\rm in}$ shows a linear trend with the product of projectile-target proton numbers, with neutron-rich systems exhibiting lower $V_{\rm in}$. Some reactions with $V_{\rm in} < V_{\rm S}$ may involve nucleon transfer before capture. Based on the DNS model, we identified optimal combinations and collision energies for synthesizing SHN with significant cross-sections. Collectively, our findings indicate that cold fusion is a promising avenue for creating proton-rich SHN around the drip line in the Z=104-113 region, offering distinct advantages over alternative mechanisms.
Submitted 1 May, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Movable Antenna-Enhanced Wireless Powered Mobile Edge Computing Systems
Authors:
Pengcheng Chen,
Yuxuan Yang,
Bin Lyu,
Zhen Yang,
Abbas Jamalipour
Abstract:
In this paper, we propose a movable antenna (MA) enhanced scheme for wireless powered mobile edge computing (WP-MEC) systems, where the hybrid access point (HAP) equipped with multiple MAs first emits wireless energy to charge wireless devices (WDs), and then receives the offloaded tasks from the WDs for edge computing. The MAs deployed at the HAP enhance the spatial degrees of freedom (DoFs) by flexibly adjusting the positions of MAs within an available region, thereby improving the efficiency of both downlink wireless power transfer (WPT) and uplink task offloading. To balance the performance enhancement against the implementation intricacy, we further propose three types of MA positioning configurations, i.e., dynamic MA positioning, semi-dynamic MA positioning, and static MA positioning. In addition, the non-linear power conversion of energy harvesting (EH) circuits at the WDs and the finite computing capability at the edge server are taken into account. Our objective is to maximize the sum computational rate (SCR) by jointly optimizing the time allocation, positions of MAs, energy beamforming matrix, receive combining vectors, and offloading strategies of WDs. To solve the non-convex problems, efficient alternating optimization (AO) frameworks are proposed. Moreover, we propose a hybrid algorithm of particle swarm optimization with variable local search (PSO-VLS) to solve the sub-problem of MA positioning. Numerical results validate the superiority of exploiting MAs over the fixed-position antennas (FPAs) for enhancing the SCR performance of WP-MEC systems.
Submitted 28 April, 2024;
originally announced April 2024.
-
Current laboratory performance of starlight suppression systems, and potential pathways to desired Habitable Worlds Observatory exoplanet science capabilities
Authors:
Bertrand Mennesson,
Ruslan Belikov,
Emiel Por,
Eugene Serabyn,
Garreth Ruane,
A. J. Eldorado Riggs,
Dan Sirbu,
Laurent Pueyo,
Remi Soummer,
Jeremy Kasdin,
Stuart Shaklan,
Byoung-Joon Seo,
Christopher Stark,
Eric Cady,
Pin Chen,
Brendan Crill,
Kevin Fogarty,
Alexandra Greenbaum,
Olivier Guyon,
Roser Juanola-Parramon,
Brian Kern,
John Krist,
Bruce Macintosh,
David Marx,
Dimitri Mawet
, et al. (12 additional authors not shown)
Abstract:
We summarize the current best polychromatic (10 to 20 % bandwidth) contrast performance demonstrated in the laboratory by different starlight suppression approaches and systems designed to directly characterize exoplanets around nearby stars. We present results obtained by internal coronagraph and external starshade experimental testbeds using entrance apertures equivalent to off-axis or on-axis telescopes, either monolithic or segmented. For a given angular separation and spectral bandwidth, the performance of each starlight suppression system is characterized by the values of raw contrast (before image processing), off-axis (exoplanet) core throughput, and post-calibration contrast (the final 1 sigma detection limit of off-axis point sources, after image processing). To place the current laboratory results in the perspective of the future Habitable Worlds Observatory (HWO) mission, we simulate visible observations of a fiducial Earth/Sun twin system at 12 pc, assuming a 6m (inscribed diameter) collecting aperture and a realistic end-to-end optical throughput. The exposure times required for broadband exoearth detection (20% bandwidth around a wavelength of 0.55 microns) and visible spectroscopic observations (R=70) are then computed assuming various levels of starlight suppression performance, including the values currently demonstrated in the laboratory. Using spectroscopic exposure time as a simple metric, our results point to key starlight suppression system design performance improvements and trades to be conducted in support of HWO exoplanet science capabilities. These trades may be explored via numerical studies, lab experiments, as well as high contrast space-based observations and demonstrations.
Submitted 27 April, 2024;
originally announced April 2024.
-
FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder
Authors:
Zheng Cheng,
Guodong Fan,
Jingchun Zhou,
Min Gan,
C. L. Philip Chen
Abstract:
Underwater images often suffer from various issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles. Previous underwater image enhancement (UIE) methods have primarily focused on spatial domain enhancement, neglecting the frequency domain information inherent in the images. However, the degradation factors of underwater images are closely intertwined in the spatial domain. Although certain methods focus on enhancing images in the frequency domain, they overlook the inherent relationship between the image degradation factors and the information present in the frequency domain. As a result, these methods frequently enhance certain attributes of the improved image while inadequately addressing or even exacerbating other attributes. Moreover, many existing methods heavily rely on prior knowledge to address color shift problems in underwater images, limiting their flexibility and robustness. In order to overcome these limitations, we propose the Embedding Frequency and Dual Color Encoder Network (FDCE-Net) in our paper. The FDCE-Net consists of two main structures: (1) Frequency Spatial Network (FS-Net) aims to achieve initial enhancement by utilizing our designed Frequency Spatial Residual Block (FSRB) to decouple image degradation factors in the frequency domain and enhance different attributes separately. (2) To tackle the color shift issue, we introduce the Dual-Color Encoder (DCE). The DCE establishes correlations between color and semantic representations through cross-attention and leverages multi-scale image features to guide the optimization of adaptive color query. The final enhanced images are generated by combining the outputs of FS-Net and DCE through a fusion network. These images exhibit rich details, clear textures, low noise and natural colors.
Submitted 27 April, 2024;
originally announced April 2024.
-
CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving
Authors:
Pei Chen,
Boran Han,
Shuai Zhang
Abstract:
Large Language Models (LLMs) have shown great ability in solving traditional natural language tasks and elementary reasoning tasks with appropriate prompting techniques. However, their ability is still limited in solving complicated science problems. In this work, we aim to push the upper bound of the reasoning capability of LLMs by proposing a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework. Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task. In particular, we discover that applying different reasoning paths for different roles is an effective strategy to implement few-shot prompting approaches in the multi-agent scenarios. Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems over competitive baselines. Our further analysis shows the necessity of prompting LLMs to play different roles or experts independently. We release the code at: https://github.com/amazon-science/comm-prompt
Submitted 26 April, 2024;
originally announced April 2024.
-
Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement
Authors:
Zishu Yao,
Guodong Fan,
Jinfu Fan,
Min Gan,
C. L. Philip Chen
Abstract:
Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN.
Submitted 26 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Authors:
Xiaohong Liu,
Xiongkuo Min,
Guangtao Zhai,
Chunyi Li,
Tengchuan Kou,
Wei Sun,
Haoning Wu,
Yixuan Gao,
Yuqin Cao,
Zicheng Zhang,
Xiele Wu,
Radu Timofte,
Fei Peng,
Huiyuan Fu,
Anlong Ming,
Chuanming Wang,
Huadong Ma,
Shuai He,
Zifei Dou,
Shu Chen,
Huacong Zhang,
Haiyi Xie,
Chengwei Wang,
Baoying Chen,
Jishen Zeng
, et al. (89 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. The challenge addresses a major problem in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions were received in the development phase, and 221 submissions were received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions were received in the development phase, and 185 submissions were received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.
Submitted 7 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions
Authors:
Haoyuan Li,
Qi Hu,
You Yao,
Kailun Yang,
Peng Chen
Abstract:
Cross-modality images that integrate visible-infrared spectra cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods degrade severely in severe weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to augment detection accuracy in adverse weather conditions. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW is able to mine more essential information of pedestrian features in cross-modality fusion, and can thus transfer to other, rarer scenarios with high efficiency while remaining usable on platforms with low computing power. To the best of our knowledge, this is the first study to target and integrate both Diffusion and Mamba modules in cross-modality object detection, successfully expanding the practical application of this type of model with its higher accuracy and more advanced architecture. Extensive experiments on both well-recognized and self-created datasets demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW.
Submitted 24 April, 2024;
originally announced April 2024.
-
Optimal entanglement generation in optomechanical systems via Krotov control of covariance matrix dynamics
Authors:
Peng-Ju Chen,
Da-Wei Luo,
Ting Yu
Abstract:
We investigated the optimal control of a continuous variable system, focusing on entanglement generation in an optomechanical system without utilizing Fock basis cutoffs. Using the Krotov algorithm to optimize the dynamics of the covariance matrix, we illustrated how to design a control objective function to manipulate the dynamics of the system to generate a desirable target state. We showed that entanglement between the macroscopic mechanical mirror and the quantum optical cavity can be reliably generated by imposing the control on the detuning of the external laser field. It has been shown that the control may still be achieved when imposing spectral constraints on the external field to restrict it to low-frequency components. In addition, we systematically studied the effects of quantum control on non-Markovian open system dynamics. We observed that memory effects can play a beneficial role in mitigating the detrimental impact of environmental noises. Specifically, the entanglement generated shows reduced decay in the presence of these memory effects.
Submitted 24 April, 2024;
originally announced April 2024.
-
Online Personalizing White-box LLMs Generation with Neural Bandits
Authors:
Zekai Chen,
Weeden Daniel,
Po-yu Chen,
Francois Buet-Golfouse
Abstract:
The advent of personalized content generation by LLMs presents a novel challenge: how to efficiently adapt text to meet individual preferences without the unsustainable demand of creating a unique model for each user. This study introduces an innovative online method that employs neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, enhancing the personalization of open-ended text generation by white-box LLMs. Through rigorous experimentation on various tasks, we demonstrate significant performance improvements over baseline strategies. NeuralTS, in particular, leads to substantial enhancements in personalized news headline generation, achieving up to a 62.9% improvement in terms of best ROUGE scores and up to 2.76% increase in LLM-agent evaluation against the baseline.
Submitted 24 April, 2024;
originally announced April 2024.
-
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
Authors:
Timin Gao,
Peixian Chen,
Mengdan Zhang,
Chaoyou Fu,
Yunhang Shen,
Yan Zhang,
Shengchuan Zhang,
Xiawu Zheng,
Xing Sun,
Liujuan Cao,
Rongrong Ji
Abstract:
With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problems are usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-level perception tools that fail to provide the abstract summaries necessary for comprehensive reasoning. We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability. To this end, we propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Cantor first acts as a decision generator and integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. Furthermore, Cantor leverages the advanced cognitive functions of MLLMs to perform as multifaceted experts for deriving higher-level information, enhancing the CoT generation process. Our extensive experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance across two complex visual reasoning datasets, without necessitating fine-tuning or ground-truth rationales. Project Page: https://ggg0919.github.io/cantor/ .
Submitted 24 April, 2024;
originally announced April 2024.
-
Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks
Authors:
Erh-Chung Chen,
Pin-Yu Chen,
I-Hsin Chung,
Che-Rung Lee
Abstract:
Latency attacks against object detection represent a variant of adversarial attacks that aim to inflate the inference time by generating additional ghost objects in a target image. However, generating ghost objects in the black-box scenario remains a challenge since information about these unqualified objects remains opaque. In this study, we demonstrate the feasibility of generating ghost objects in adversarial examples by extending the concept of "steal now, decrypt later" attacks. These adversarial examples, once produced, can be employed to exploit potential vulnerabilities in the AI service, giving rise to significant security concerns. The experimental results demonstrate that the proposed attack succeeds across various commonly used models and Google Vision API without any prior knowledge about the target model. Additionally, the average cost of each attack is less than \$1, posing a significant threat to AI security.
Submitted 24 April, 2024;
originally announced April 2024.
-
A possible origin of the $α$-vacuum as the initial state of the Universe
Authors:
Pisin Chen,
Kuan-Nan Lin,
Wei-Chen Lin,
Dong-han Yeom
Abstract:
We investigate the cosmological observables using the Euclidean path integral approach. Specifically, we study both the no-boundary compact instantons scenario and the Euclidean wormholes scenario that can induce the creation of two universes from nothing. It is known that perturbations associated with the no-boundary scenario can only be consistent with the Bunch-Davies vacuum. Here we demonstrate that the Euclidean wormholes can allow for a de Sitter invariant vacuum, the so-called $α$-vacuum state, where the Bunch-Davies vacuum is a special case. This therefore provides the $α$-vacuum a geometrical origin. As an aside, we discuss a subtle phase issue when considering the power spectrum related to $α$-vacuum in the closed universe framework.
Submitted 23 April, 2024;
originally announced April 2024.
-
Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?
Authors:
Dawei Zhu,
Pinzhen Chen,
Miaoran Zhang,
Barry Haddow,
Xiaoyu Shen,
Dietrich Klakow
Abstract:
Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality. In the current practice of fine-tuning large language models (LLMs) for translation, we revisit the importance of all these factors. We find that LLMs display strong translation capability after being fine-tuned on as few as 32 training instances, and that fine-tuning on a single translation direction effectively enables LLMs to translate in multiple directions. However, the choice of direction is critical: fine-tuning LLMs with English on the target side can lead to task misinterpretation, which hinders translations into non-English languages. A similar problem arises when noise is introduced into the target side of parallel data, especially when the target language is well-represented in the LLM's pre-training. In contrast, noise in an under-represented language has a less pronounced effect. Our findings suggest that attaining successful alignment hinges on teaching the model to maintain a "superficial" focus, thereby avoiding the learning of erroneous biases beyond translation.
Submitted 22 April, 2024;
originally announced April 2024.
-
Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual Manifolds
Authors:
Yanbiao Ma,
Licheng Jiao,
Fang Liu,
Lingling Li,
Wen** Ma,
Shuyuan Yang,
Xu Liu,
Puhua Chen
Abstract:
Building fair deep neural networks (DNNs) is a crucial step towards achieving trustworthy artificial intelligence. Delving into deeper factors that affect the fairness of DNNs is paramount and serves as the foundation for mitigating model biases. However, current methods are limited in accurately predicting DNN biases, relying solely on the number of training samples and lacking more precise measurement tools. Here, we establish a geometric perspective for analyzing the fairness of DNNs, comprehensively exploring how DNNs internally shape the intrinsic geometric characteristics of datasets, namely the intrinsic dimensions (IDs) of perceptual manifolds, and the impact of IDs on the fairness of DNNs. Based on multiple findings, we propose Intrinsic Dimension Regularization (IDR), which enhances the fairness and performance of models by promoting the learning of concise and ID-balanced class perceptual manifolds. In various image recognition benchmark tests, IDR significantly mitigates model bias while improving its performance.
Submitted 17 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Sequential subspace methods on Stiefel manifold optimization problems
Authors:
Pengwen Chen,
Chung-Kuan Cheng,
Chester Holtz
Abstract:
We study the minimization of a quadratic over Stiefel manifolds (the set of all orthogonal $r$-frames in $\mathbb{R}^n$), which has applications in high-dimensional semi-supervised classification tasks. To reduce the computational complexity, sequential subspace methods (SSM) are employed to convert the high-dimensional minimization problems to low-dimensional ones. In this paper, we are interested in attaining an optimal solution of good quality, i.e., a "qualified" critical point. Qualified critical points are those critical points at which the associated multiplier matrix meets some upper bound condition. These critical points enjoy the global optimality in special quadratic problems. For a general quadratic, SSM computes a sequence of "qualified critical points" in its low-dimensional "surrogate regularized models". The convergence to a qualified critical point is ensured whenever each SSM subspace is constructed by the following vectors: (i) a set of orthogonal unit vectors associated with the current iterate, (ii) a set of vectors corresponding to the gradient of the objective, and (iii) a set of eigenvectors associated with the smallest $r$ eigenvalues of the system matrix. In addition, when Newton direction vectors are included in subspaces, the convergence of SSM can be accelerated significantly.
Submitted 20 April, 2024;
originally announced April 2024.
-
OGLE-2015-BLG-0845L: A low-mass M dwarf from the microlensing parallax and xallarap effects
Authors:
Zhecheng Hu,
Wei Zhu,
Andrew Gould,
Andrzej Udalski,
Takahiro Sumi,
** Chen,
Sebastiano Calchi Novati,
Jennifer C. Yee,
Charles A. Beichman,
Geoffery Bryden,
Sean Carey,
Michael Fausnaugh,
B. Scott Gaudi,
Calen B. Henderson,
Yossi Shvartzvald,
Benjamin Wibking,
Przemek Mróz,
Jan Skowron,
Radoslaw Poleski,
Michaeł K. Szymański,
Igor Soszynśki,
Paweł Pietrukowicz,
Szymon Kozłowski,
Krzysztof Ulaczyk,
Krzysztof A. Rybicki
, et al. (29 additional authors not shown)
Abstract:
We present the analysis of the microlensing event OGLE-2015-BLG-0845, which was affected by both the microlensing parallax and xallarap effects. The former was detected via the simultaneous observations from the ground and Spitzer, and the latter was caused by the orbital motion of the source star in a relatively close binary. The combination of these two effects led to a direct mass measurement of the lens object, revealing a low-mass ($0.14 \pm 0.05 M_{\odot}$) M-dwarf at the bulge distance ($7.6 \pm 1.0$ kpc). The source binary consists of a late F-type subgiant and a K-type dwarf of $\sim1.2 M_{\odot}$ and $\sim 0.9 M_{\odot}$, respectively, and the orbital period is $70 \pm 10$ days. OGLE-2015-BLG-0845 is the first single-lens event in which the lens mass is measured via the binarity of the source. Given the abundance of binary systems as potential microlensing sources, the xallarap effect may not be a rare phenomenon. Our work thus highlights the application of the xallarap effect in the mass determination of microlenses, and the same method can be used to identify isolated dark lenses.
Submitted 19 April, 2024;
originally announced April 2024.
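For readers unfamiliar with how a lens mass follows from microlensing observables, the standard textbook relation is sketched below. The abstract does not state which quantity each effect supplies, so this is background under stated assumptions rather than the authors' derivation; the symbols $\theta_{\rm E}$ (angular Einstein radius), $\pi_{\rm E}$ (microlensing parallax), and $\pi_{\rm rel}$ (lens-source relative parallax) do not appear in the abstract and are introduced here for illustration.

```latex
% Standard microlensing mass relation (background, not taken from the abstract).
% theta_E : angular Einstein radius
% pi_E    : microlensing parallax
\[
  M = \frac{\theta_{\rm E}}{\kappa\,\pi_{\rm E}},
  \qquad
  \kappa \equiv \frac{4G}{c^{2}\,\mathrm{au}} \simeq 8.14\ \mathrm{mas}\,M_{\odot}^{-1},
  \qquad
  \pi_{\rm rel} = \theta_{\rm E}\,\pi_{\rm E}.
\]
```

A measurement of both $\theta_{\rm E}$ and $\pi_{\rm E}$ therefore fixes the lens mass, and the relative parallax then constrains the lens distance.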
-
Qubit-assisted quantum metrology
Authors:
Peng Chen,
Jun **g
Abstract:
We propose a quantum metrology protocol based on a two-step joint evolution of the probe system and an ancillary qubit and a single-shot projective measurement. With an optimized initialization of the ancillary qubit, the quantum Fisher information (QFI) about the phase parameter encoded in the probe system is found to be determined by the expectation value of the square of a time-optimized phase generator, independent of the probe state. Therefore, QFI can approach the Heisenberg scaling $N^2$ with respect to the quantum number $N$, even when the probe system is prepared in a classical state. We find that this scaling behavior is robust against the imperfections in preparing the ancillary qubit and controlling the evolution time. Using the time-reversal strategy, the classical Fisher information (CFI) in our metrology protocol is saturated with its quantum counterpart. Our work thus paves an economical way to realize the Heisenberg-scaling limit in metrology precision with no use of entanglement or squeezing.
Submitted 19 April, 2024;
originally announced April 2024.
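As background for the scaling claims in the abstract above, the standard (quantum) Cramér-Rao bounds can be written out as follows. This is a generic sketch, not the paper's derivation; the symbols $\theta$ (the estimated phase), $\nu$ (the number of independent repetitions), $F_C$, and $F_Q$ are notation assumed here for illustration.

```latex
% Cramer-Rao chain: classical Fisher information F_C is bounded by the QFI F_Q.
\[
  \delta\theta \;\ge\; \frac{1}{\sqrt{\nu\,F_C}} \;\ge\; \frac{1}{\sqrt{\nu\,F_Q}},
  \qquad F_C \le F_Q .
\]
% Scaling of the attainable precision with the quantum number N:
\[
  F_Q \propto N \;\Rightarrow\; \delta\theta \propto \frac{1}{\sqrt{N}}
  \quad \text{(standard quantum limit)},
  \qquad
  F_Q \propto N^{2} \;\Rightarrow\; \delta\theta \propto \frac{1}{N}
  \quad \text{(Heisenberg scaling)}.
\]
```

In this notation, a measurement scheme for which $F_C = F_Q$ attains the tighter of the two bounds, which is the sense in which the CFI is said to be saturated with the QFI.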
-
Incremental Self-training for Semi-supervised Learning
Authors:
Jifeng Guo,
Zhulin Liu,
Tong Zhang,
C. L. Philip Chen
Abstract:
Semi-supervised learning provides a solution to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention. Several advancements have emerged to address challenges associated with noisy pseudo-labels. Previous works on self-training acknowledge the importance of unlabeled data but have not delved into their efficient utilization, nor have they paid attention to the problem of high time consumption caused by iterative learning. This paper proposes Incremental Self-training (IST) for semi-supervised learning to fill these gaps. Unlike ST, which processes all data indiscriminately, IST processes data in batches and preferentially assigns pseudo-labels to unlabeled samples with high certainty. Then, it processes the data around the decision boundary after the model is stabilized, enhancing classifier performance. Our IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed. Significantly, it outperforms state-of-the-art competitors on three challenging image classification tasks.
Submitted 14 April, 2024;
originally announced April 2024.
-
A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios
Authors:
X. Xiao,
P. Chen,
X. Cao,
K. Liu,
L. Deng,
D. Zhao,
Z. Chen,
Q. Deng,
F. Yu,
H. Zhang
Abstract:
Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in this method: (1) The similarity evaluation indicators are determined from three dimensions, i.e., the Basis of National Epidemic Prevention & Control, Social Resilience, and Infection Situation. (2) The data related to the indicators are collected and preprocessed. (3) The first round of screening on the preprocessed dataset is conducted through an improved collaborative filtering algorithm to calculate the preliminary similarity result from the perspective of the infection situation. (4) Finally, the K-Means model is used for the second round of screening to obtain the final similarity values. The approach will be applied to decision-making support in the context of COVID-19. Our results demonstrate that the recommendations generated by the STDSA model are more accurate and aligned better with the actual situation than those produced by pure K-means models. This study will provide new insights into preventing and controlling epidemics in regions that lack experience.
Submitted 9 April, 2024;
originally announced April 2024.
-
Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval
Authors:
Peter Baile Chen,
Yi Zhang,
Dan Roth
Abstract:
Retrieving relevant tables containing the necessary information to accurately answer a given question over tables is critical to open-domain question-answering (QA) systems. Previous methods assume the answer to such a question can be found either in a single table or multiple tables identified through question decomposition or rewriting. However, neither of these approaches is sufficient, as many questions require retrieving multiple tables and joining them through a join plan that cannot be discerned from the user query itself. If the join plan is not considered in the retrieval stage, the subsequent steps of reasoning and answering based on those retrieved tables are likely to be incorrect. To address this problem, we introduce a method that uncovers useful join relations for any query and database during table retrieval. We use a novel re-ranking method formulated as a mixed-integer program that considers not only table-query relevance but also table-table relevance that requires inferring join relationships. Our method outperforms the state-of-the-art approaches for table retrieval by up to 9.3% in F1 score and for end-to-end QA by up to 5.4% in accuracy.
Submitted 5 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Adaptive Patching for High-resolution Image Segmentation with Transformers
Authors:
Enzhi Zhang,
Isaac Lyngaas,
Peng Chen,
Xiao Wang,
Jun Igarashi,
Yuankai Huo,
Mohamed Wahib,
Masaharu Munetomo
Abstract:
Attention-based models are proliferating in the space of image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g. microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attention-based model if we are to use the smaller patch sizes that are favorable in segmentation. The solution is to use either custom complex multi-resolution models or approximate attention schemes. We take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC by adaptively patching the images, as a pre-processing step, based on the image details to reduce the number of patches being fed to the model by orders of magnitude. This method has a negligible overhead and works seamlessly with any attention-based model, i.e. it is a pre-processing step that can be adopted by any attention-based model without friction. We demonstrate superior segmentation quality over SoTA segmentation models for real-world pathology datasets while gaining a geomean speedup of $6.9\times$ for resolutions up to $64K^2$, on up to $2,048$ GPUs.
Submitted 15 April, 2024;
originally announced April 2024.
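The adaptive-patching idea in the abstract above can be illustrated with a small quadtree subdivision: regions with little detail stay as one large patch, while detailed regions are split further. This is only a sketch under stated assumptions, not the authors' implementation. The function name `adaptive_patches`, the parameters `min_size` and `threshold`, and the choice of intensity range as the "detail" measure are all hypothetical.

```python
def adaptive_patches(img, x0, y0, size, min_size, threshold):
    """Return a list of (x, y, size) square patches covering the region.

    img is a list of rows of pixel intensities. The detail measure used here
    (max minus min intensity) is an assumption made for illustration only.
    """
    vals = [img[y][x]
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)]
    detail = max(vals) - min(vals)

    # Stop splitting when the region is flat enough or already minimal.
    if size <= min_size or detail <= threshold:
        return [(x0, y0, size)]

    # Otherwise split into four quadrants and recurse (quadtree refinement).
    half = size // 2
    patches = []
    for dy in (0, half):
        for dx in (0, half):
            patches += adaptive_patches(img, x0 + dx, y0 + dy,
                                        half, min_size, threshold)
    return patches


# Usage: an 8x8 image that is flat except for one bright pixel.
image = [[0] * 8 for _ in range(8)]
image[0][0] = 255
result = adaptive_patches(image, 0, 0, 8, min_size=1, threshold=0)
print(len(result))  # 10 patches, versus 64 for a uniform 1x1 grid
```

The patches still tile the whole image, so a downstream model sees every pixel; only the token count changes. In practice each patch would be resized to a common resolution before being embedded, a step omitted here.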
-
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Authors:
Yipo Huang,
Xiangfei Sheng,
Zhichao Yang,
Quan Yuan,
Zhichao Duan,
Pengfei Chen,
Leida Li,
Weisi Lin,
Guangming Shi
Abstract:
The highly abstract nature of image aesthetics perception (IAP) poses a significant challenge for current multimodal large language models (MLLMs). The lack of human-annotated multi-modality aesthetic data further exacerbates this dilemma, resulting in MLLMs falling short of aesthetics perception capabilities. To address the above challenge, we first introduce a comprehensively annotated Aesthetic Multi-Modality Instruction Tuning (AesMMIT) dataset, which serves as the cornerstone for building multi-modality aesthetics foundation models. Specifically, to align MLLMs with human aesthetics perception, we construct a corpus-rich aesthetic critique database with 21,904 diverse-sourced images and 88K pieces of human natural language feedback, which are collected via progressive questions, ranging from coarse-grained aesthetic grades to fine-grained aesthetic descriptions. To ensure that MLLMs can handle diverse queries, we further prompt GPT to refine the aesthetic critiques and assemble the large-scale aesthetic instruction tuning dataset, i.e. AesMMIT, which consists of 409K multi-typed instructions to activate stronger aesthetic capabilities. Based on the AesMMIT database, we fine-tune the open-sourced general foundation models, achieving multi-modality Aesthetic Expert models, dubbed AesExpert. Extensive experiments demonstrate that the proposed AesExpert models deliver significantly better aesthetic perception performance than the state-of-the-art MLLMs, including the most advanced GPT-4V and Gemini-Pro-Vision. Source data will be available at https://github.com/yipoh/AesExpert.
Submitted 18 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
The final burst of the moving mirror is unrelated to the partner mode of analog Hawking radiation
Authors:
Yuki Osawa,
Kuan-Nan Lin,
Yasusada Nambu,
Masahiro Hotta,
Pisin Chen
Abstract:
Flying mirrors with appropriate trajectories have been recognized as an analog system that mimics black hole Hawking evaporation and have been widely investigated. It has recently been suggested that the partner mode of the analog Hawking radiation emitted from a moving mirror would manifest itself through a final burst when the mirror executes a sudden stop. Here we argue the opposite via the partner formula for the moving mirror model. By expanding the theoretical foundation of the partner formula and augmenting it with numerical analysis, we demonstrate that the supposed final burst is induced by a shock that requires the input of external energy, whereas the Hawking radiation partner mode, which is associated with the zero-point vacuum fluctuations, is not responsible for the burst.
Submitted 15 April, 2024;
originally announced April 2024.
-
The 8th AI City Challenge
Authors:
Shuo Wang,
David C. Anastasiu,
Zheng Tang,
Ming-Ching Chang,
Yue Yao,
Liang Zheng,
Mohammed Shaiqur Rahman,
Meenakshi S. Arya,
Anuj Sharma,
Pranamesh Chakraborty,
Sanjita Prajapati,
Quan Kong,
Norimasa Kobori,
Munkhjargal Gochoo,
Munkh-Erdene Otgonbold,
Fady Alnajjar,
Ganzorig Batnasan,
**-Yang Chen,
Jun-Wei Hsieh,
Xunlei Wu,
Sameer Satish Pusegaonkar,
Yizhou Wang,
Sujit Biswas,
Rama Chellappa
Abstract:
The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, character number, 3D annotation, and camera matrices, alongside new rules for 3D tracking and online tracking algorithm encouragement. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.
Submitted 14 April, 2024;
originally announced April 2024.