-
Riesz potential estimates for mixed local-nonlocal problems with measure data
Authors:
Iwona Chlebicka,
Kyeong Song,
Yeonghun Youn,
Anna Zatorska-Goldstein
Abstract:
We study gradient regularity for mixed local-nonlocal problems modelled upon \[ -\Delta_p u + (-\Delta_p)^s u = \mu \qquad \text{for} \quad 2-\tfrac{1}{n}<p<\infty \quad \text{and} \quad s\in(0,1)\,,\] where $\mu$ is a bounded Borel measure. We prove pointwise bounds for the gradient $Du$ in terms of the truncated 1-Riesz potential of $\mu$.
Submitted 9 January, 2024;
originally announced January 2024.
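For orientation, the truncated Riesz potential named in this abstract is usually defined as follows. This is the standard definition from the potential-estimates literature, not a formula quoted from the paper; the symbols $x$, $R$ and $B_\varrho(x)$ (the ball of radius $\varrho$ centred at $x$) are notation assumed here, and $n$ is the space dimension appearing in the range of $p$:
\[ \mathbf{I}^{\mu}_{1}(x,R) := \int_0^R \frac{|\mu|(B_\varrho(x))}{\varrho^{\,n-1}}\,\frac{d\varrho}{\varrho}\,. \]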
-
InFoBench: Evaluating Instruction Following Ability in Large Language Models
Authors:
Yiwei Qin,
Kaiqiang Song,
Yebowen Hu,
Wenlin Yao,
Sangwoo Cho,
Xiaoyang Wang,
Xuansheng Wu,
Fei Liu,
Pengfei Liu,
Dong Yu
Abstract:
This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions. Addressing a gap in current methodologies, DRFR breaks down complex instructions into simpler criteria, facilitating a detailed analysis of LLMs' compliance with various aspects of tasks. Alongside this metric, we present InFoBench, a benchmark comprising 500 diverse instructions and 2,250 decomposed questions across multiple constraint categories. Our experiments compare DRFR with traditional scoring methods and explore annotation sources, including human experts, crowd-sourced workers, and GPT-4. The findings demonstrate DRFR's higher reliability and the effectiveness of using GPT-4 as a cost-efficient annotator. The evaluation of several advanced LLMs using this framework reveals their strengths and areas needing improvement, particularly in complex instruction-following. This study contributes a novel metric and benchmark, offering insights for future LLM development and evaluation.
Submitted 7 January, 2024;
originally announced January 2024.
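The abstract describes DRFR as a ratio over decomposed criteria but does not give the formula. A minimal sketch, assuming the metric is the fraction of decomposed yes/no requirements judged satisfied across all instructions; the function name `drfr` and the input layout are hypothetical, not taken from the paper:

```python
from typing import Sequence


def drfr(judgements: Sequence[Sequence[bool]]) -> float:
    """Fraction of decomposed requirements judged satisfied.

    judgements[i][j] is True when the response to instruction i meets
    its j-th decomposed requirement (assumed input layout).
    """
    total = sum(len(per_instruction) for per_instruction in judgements)
    if total == 0:
        raise ValueError("no decomposed requirements to score")
    satisfied = sum(
        1 for per_instruction in judgements for ok in per_instruction if ok
    )
    return satisfied / total
```

Under this assumption, a response meeting 4 of 5 decomposed requirements across two instructions scores 0.8, whereas an all-or-nothing score per instruction would credit only one of the two.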
-
Nonlinear Rydberg exciton-polaritons in Cu$_2$O microcavities
Authors:
Maxim Makhonin,
Anthonin Delphan,
Kok Wee Song,
Paul Walker,
Tommi Isoniemi,
Peter Claronino,
Konstantinos Orfanakis,
Sai Kiran Rajendran,
Hamid Ohadi,
Julian Heckötter,
Marc Aßmann,
Manfred Bayer,
Alexander Tartakovskii,
Maurice Skolnick,
Oleksandr Kyriienko,
Dmitry Krizhanovskii
Abstract:
Rydberg excitons (analogues of Rydberg atoms in condensed matter systems) are highly excited bound electron-hole states with large Bohr radii. The interaction between them, as well as exciton coupling to light, may lead to strong optical nonlinearity, with applications in sensing and quantum information processing. Here, we achieve strong effective photon-photon interactions (Kerr-like optical nonlinearity) via the Rydberg blockade phenomenon and the hybridisation of excitons and photons forming polaritons in Cu$_2$O-filled microresonators. Under pulsed resonant excitation, polariton resonance frequencies are renormalised due to the reduction of the photon-exciton coupling with increasing exciton density. Theoretical analysis shows that the Rydberg blockade plays a major role in the experimentally observed scaling of the polariton nonlinearity coefficient as $\propto n^{4.4 \pm 1.8}$ for principal quantum numbers up to $n = 7$. Such high principal quantum numbers, studied in a polariton system for the first time, are essential for the realisation of high Rydberg optical nonlinearities, which paves the way towards quantum optical applications and fundamental studies of strongly-correlated photonic (polaritonic) states in a solid state system.
Submitted 14 March, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Limited Feedback on Measurements: Sharing a Codebook or a Generative Model?
Authors:
Nurettin Turan,
Benedikt Fesl,
Michael Joham,
Zhengxiang Ma,
Anthony C. K. Soong,
Baoling Sheen,
Weimin Xiao,
Wolfgang Utschick
Abstract:
Discrete Fourier transform (DFT) codebook-based solutions are well-established for limited feedback schemes in frequency division duplex (FDD) systems. In recent years, data-aided solutions have been shown to achieve higher performance, enabled by the adaptivity of the feedback scheme to the propagation environment of the base station (BS) cell. In particular, a versatile limited feedback scheme utilizing Gaussian mixture models (GMMs) was recently introduced. The scheme supports multi-user communications, exhibits low complexity, supports parallelization, and offers significant flexibility concerning various system parameters. Conceptually, a GMM captures environment knowledge and is subsequently transferred to the mobile terminals (MTs) for online inference of feedback information. Afterward, the BS designs precoders using either directional information or a generative modeling-based approach. A major shortcoming of recent works is that system performance is evaluated only on synthetic simulation data, which is generally unable to fully characterize the features of real-world environments. This raises the question of how the GMM-based feedback scheme performs on real-world measurement data, especially compared to the well-established DFT-based solution. Our experiments reveal that the GMM-based feedback scheme tremendously improves the system performance measured in terms of sum-rate, allowing systems to be deployed with fewer pilots or feedback bits.
Submitted 3 January, 2024;
originally announced January 2024.
-
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
Authors:
Kaiwen Song,
Xiaoyi Zeng,
Chenqu Ren,
Juyong Zhang
Abstract:
Existing neural radiance field-based methods can achieve real-time rendering of small scenes on the web platform. However, extending these methods to large-scale scenes still poses significant challenges due to limited resources in computation, memory, and bandwidth. In this paper, we propose City-on-Web, the first method for real-time rendering of large-scale scenes on the web. We propose a block-based volume rendering method to guarantee 3D consistency and correct occlusion between blocks, and introduce a Level-of-Detail strategy combined with dynamic loading/unloading of resources to significantly reduce memory demands. Our system achieves real-time rendering of large-scale scenes at approximately 32 FPS on an RTX 3060 GPU on the web and maintains rendering quality comparable to the current state-of-the-art novel view synthesis methods.
Submitted 31 March, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
ESBMC v7.4: Harnessing the Power of Intervals
Authors:
Rafael Menezes,
Mohannad Aldughaim,
Bruno Farias,
Xianzhiyu Li,
Edoardo Manino,
Fedor Shmarov,
Kunjian Song,
Franz Brauße,
Mikhail R. Gadelha,
Norbert Tihanyi,
Konstantin Korovin,
Lucas C. Cordeiro
Abstract:
ESBMC implements many state-of-the-art techniques for model checking. We report on new and improved features that allow us to obtain verification results for previously unsupported programs and properties. ESBMC employs a new static interval analysis of expressions in programs to increase verification performance. This includes interval-based reasoning over booleans and integers, forward and backward contractors, and particular optimizations related to singleton intervals because of their ubiquity. Other relevant improvements concern the verification of concurrent programs, as well as several operational models, internal ones, and also those of libraries such as pthread and the C mathematics library. An extended memory safety analysis now allows tracking of memory leaks that are considered still reachable.
Submitted 22 December, 2023;
originally announced December 2023.
-
3D Programming of Patterned Heterogeneous Interface for 4D Smart Robotics
Authors:
Kewei Song,
Chunfeng Xiong,
Ze Zhang,
Kunlin Wu,
Weiyang Wan,
Yifan Wang,
Shinjiro Umezu,
Hirotaka Sato
Abstract:
Shape memory structures play an important role in many cutting-edge intelligent fields. However, existing technologies can only realize 4D printing of a single polymer or metal, which limits practical applications. Here, we report a construction strategy for a TSMP/M heterointerface that uses a Pd$^{2+}$-containing shape memory polymer (AP-SMR) to induce an electroless plating reaction and relies on molecular dynamics; the resulting interface combines shape memory properties with metal activity and information processing power. Through multi-material DLP 3D printing technology, the interface can be selectively programmed in 3D on functional substrate parts of arbitrary shapes to form 4D electronic smart devices (Robotics). Microscopically, this type of interface appears as a composite structure with a nanometer-micrometer interface height, composed of a pure substrate layer (smart materials), an intermediate layer (a composite structure in which metal particles are embedded in a polymer cross-linked network) and a pure metal layer. The structure programmed by the TSMP/M heterointerface exhibits both SMA characteristics and metal properties, and thus offers more intelligent functions (electroactive, electrothermal deformation, electronically controlled denaturation) and higher performance (selective control of shape memory structures, remote control, inline control and low-voltage control). This is expected to provide a more flexible manufacturing process as a platform technology for designing, manufacturing and applying smart devices with new concepts, and to promote the development of cutting-edge industries such as smart robots and smart electronics.
Submitted 22 December, 2023;
originally announced December 2023.
-
Bootstrap Masked Visual Modeling via Hard Patches Mining
Authors:
Haochen Wang,
Junsong Fan,
Yuxi Wang,
Kaiyou Song,
Tiancai Wang,
Xiangyu Zhang,
Zhaoxiang Zhang
Abstract:
Masked visual modeling has attracted much attention due to its promising potential in learning generalizable representations. Typical approaches urge models to predict specific contents of masked tokens, which can be intuitively considered as teaching a student (the model) to solve given problems (predicting masked contents). Under such settings, the performance is highly correlated with mask strategies (the difficulty of provided problems). We argue that it is equally important for the model to stand in the shoes of a teacher and produce challenging problems by itself. Intuitively, patches with high values of reconstruction loss can be regarded as hard samples, and masking those hard patches naturally becomes a demanding reconstruction task. To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask. Technically, we introduce an auxiliary loss predictor, which is trained with a relative objective to prevent overfitting to exact loss values. Also, to gradually guide the training procedure, we propose an easy-to-hard mask strategy. Empirically, HPM brings significant improvements on both image and video benchmarks. Interestingly, solely incorporating the extra loss prediction objective leads to better representations, verifying the efficacy of determining where reconstruction is hard. The code is available at https://github.com/Haochen-Wang409/HPM.
Submitted 21 December, 2023;
originally announced December 2023.
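The "determining where to mask" step can be illustrated with a small sketch. It assumes the predicted patch-wise losses are already available as a list and simply masks the highest-scoring fraction; the function name `hard_patch_mask` and the `mask_ratio` parameter are hypothetical, and the abstract's easy-to-hard schedule and relative training objective are not modelled here. The linked repository remains the authoritative implementation.

```python
from typing import List, Sequence


def hard_patch_mask(predicted_losses: Sequence[float], mask_ratio: float) -> List[bool]:
    """Return a per-patch mask that hides the patches predicted to be hardest.

    predicted_losses[i] is the (assumed) output of a loss predictor for patch i;
    True in the result means "mask this patch".
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    n = len(predicted_losses)
    num_masked = int(n * mask_ratio)
    # Rank patch indices from highest to lowest predicted loss.
    ranked = sorted(range(n), key=lambda i: predicted_losses[i], reverse=True)
    masked = set(ranked[:num_masked])
    return [i in masked for i in range(n)]
```

Only the ranking of the predicted losses matters for this selection, which is consistent with the abstract's point that the predictor need not fit exact loss values.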
-
Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery
Authors:
Pengwei Yan,
Kaisong Song,
Zhuoren Jiang,
Yangyang Kang,
Tianqianjin Lin,
Changlong Sun,
Xiaozhong Liu
Abstract:
While self-supervised graph pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique dual-level pretraining structure that orchestrates node-level and subgraph-level pretext tasks. Unlike prior approaches, DGPM autonomously uncovers significant graph motifs through an edge pooling module, aligning learned motif similarities with graph kernel-based similarities. A cross-matching task enables sophisticated node-motif interactions and novel representation learning. Extensive experiments on 15 datasets validate DGPM's effectiveness and generalizability, outperforming state-of-the-art methods in unsupervised representation learning and transfer learning settings. The autonomously discovered motifs demonstrate the potential of DGPM to enhance robustness and interpretability.
Submitted 19 December, 2023;
originally announced December 2023.
-
Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning
Authors:
Kaiyou Song,
Shan Zhang,
Tong Wang
Abstract:
The development of autoregressive modeling (AM) in computer vision lags behind natural language processing (NLP) in self-supervised pre-training. This is mainly caused by the challenge that images are not sequential signals and lack a natural order when applying autoregressive modeling. In this study, inspired by human beings' way of grasping an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge. The key insight of SemAIM is to autoregressively model images from the more semantic patches to the less semantic patches. To this end, we first calculate a semantic-aware permutation of patches according to their feature similarities and then perform the autoregression procedure based on the permutation. In addition, considering that the raw pixels of patches are low-level signals and are not ideal prediction targets for learning high-level semantic representation, we also explore utilizing the patch features as the prediction targets. Extensive experiments are conducted on a broad range of downstream tasks, including image classification, object detection, and instance/semantic segmentation, to evaluate the performance of SemAIM. The results demonstrate that SemAIM achieves state-of-the-art performance compared with other self-supervised methods. Specifically, with ViT-B, SemAIM achieves 84.1% top-1 accuracy for fine-tuning on ImageNet, and 51.3% AP and 45.4% AP for object detection and instance segmentation on COCO, which outperforms the vanilla MAE by 0.5%, 1.0%, and 0.5%, respectively.
Submitted 16 December, 2023;
originally announced December 2023.
-
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Authors:
Kaiqiang Song,
Xiaoyang Wang,
Sangwoo Cho,
Xiaoman Pan,
Dong Yu
Abstract:
This paper introduces a novel approach to enhance the capabilities of Large Language Models (LLMs) in processing and understanding extensive text sequences, a critical aspect in applications requiring deep comprehension and synthesis of large volumes of information. Recognizing the inherent challenges in extending the context window for LLMs, which are primarily built on the Transformer architecture, we propose a new model architecture, referred to as Zebra. This architecture efficiently manages the quadratic time and memory complexity issues associated with full attention in the Transformer by employing grouped local-global attention layers. Our model, akin to a zebra's alternating stripes, balances local and global attention layers, significantly reducing computational requirements and memory consumption. Comprehensive experiments, including pretraining from scratch, continuation of long context adaptation training, and long instruction tuning, are conducted to evaluate Zebra's performance. The results show that Zebra achieves comparable or superior performance on both short and long sequence benchmarks, while also enhancing training and inference efficiency.
Submitted 13 December, 2023;
originally announced December 2023.
-
Motion Planning for Multiple Mobile Manipulator System in Complex Flipping Manipulation
Authors:
Wenhang Liu,
Kun Song,
Meng Ren,
Jiawei Hu,
Michael Yu Wang,
Zhenhua Xiong
Abstract:
Multiple robot systems are favored for object manipulation and transportation, especially for large objects. However, in more complex manipulation such as flipping, these systems encounter a new challenge: configuration disconnectivity of manipulators. Grasping objects with manipulators imposes closed-chain constraints on the system, which in turn limit the feasible motions of the manipulators and further compromise configuration connectivity. Multiple mobile manipulator systems show much more flexibility in object manipulation thanks to the mobility of the mobile platform and have the potential to address this problem. In this paper, a novel planning framework is proposed for complex flipping manipulation by incorporating platform motions and regrasping. Firstly, two types of trajectories, mobile manipulator planning and regrasping planning, are classified and can be assigned different priorities for different tasks. Secondly, corresponding planning methods are designed for each type of trajectory. Specifically, in mobile manipulator planning, the configuration of the platform is determined through optimization to ensure connectivity when the manipulator approaches configuration boundaries. In regrasping planning, closed-chain constraints are temporarily disregarded and the manipulation capabilities are prioritized to facilitate subsequent planning. Finally, the structure of the overall planning framework is provided. Experimental results demonstrate that the proposed planner efficiently plans the motions of the system to accomplish flipping manipulation. Additionally, a comprehensive experiment emphasizes the significance of our planner in extending the capabilities of multiple mobile manipulator systems in complex tasks.
Submitted 11 December, 2023;
originally announced December 2023.
-
Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph
Authors:
Tianqianjin Lin,
Kaisong Song,
Zhuoren Jiang,
Yangyang Kang,
Weikang Yuan,
Xurui Li,
Changlong Sun,
Cui Huang,
Xiaozhong Liu
Abstract:
Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic the human perception and decision process through two key steps: constructing intelligible variables based on semantics derived from the graph schema and automatically learning task-level causal relationships among these variables by incorporating advanced causal discovery techniques. We compared HG-SCM to seven state-of-the-art baseline models on three real-world datasets, under three distinct and ubiquitous out-of-distribution settings. HG-SCM achieved the highest average performance rank with minimal standard deviation, substantiating its effectiveness and superiority in terms of both predictive power and generalizability. Additionally, the visualization and analysis of the auto-learned causal diagrams for the three tasks aligned well with domain knowledge and human cognition, demonstrating prominent interpretability. HG-SCM's human-like nature and its enhanced generalizability and interpretability make it a promising solution for special scenarios where transparency and trustworthiness are paramount.
Submitted 9 December, 2023;
originally announced December 2023.
-
Target to Source: Guidance-Based Diffusion Model for Test-Time Adaptation
Authors:
Kaiyu Song,
Hanjiang Lai
Abstract:
Most recent works on test-time adaptation (TTA) aim to alleviate domain shift problems by re-training source classifiers in each domain. On the other hand, the emergence of the diffusion model provides another solution to TTA, which directly maps the test data from the target domain to the source domain based on a diffusion model pre-trained in the source domain. The source classifier does not need to be fine-tuned. However, 1) the semantic information loss from test data to the source domain and 2) the model shift between the source classifier and diffusion model would prevent the diffusion model from mapping the test data back to the source domain correctly. In this paper, we propose a novel guidance-based diffusion-driven adaptation (GDDA) to overcome the data shift and let the diffusion model find a better way to go back to the source. Concretely, we first propose detail and global guidance to better keep the common semantics of the test and source data. The two forms of guidance include a contrastive loss and a mean squared error to alleviate the information loss by fully exploring the diffusion model and the test data. Meanwhile, we propose a classifier-aware guidance to reduce the bias caused by the model shift, which can incorporate the source classifier's information into the generation process of the diffusion model. Extensive experiments on three image datasets with three classifier backbones demonstrate that GDDA performs significantly better than the state-of-the-art baselines. On CIFAR-10C, CIFAR-100C, and ImageNetC, GDDA achieves 11.54\%, 19.05\%, and 11.63\% average accuracy improvements, respectively. GDDA even achieves performance equal to that of methods that re-train classifiers. The code is available in the supplementary material.
Submitted 7 December, 2023;
originally announced December 2023.
-
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
Authors:
Kaiyu Song,
Hanjiang Lai
Abstract:
Deep neural networks (DNNs) are vulnerable to adversarial perturbation, where an imperceptible perturbation added to the image can fool the DNNs. Diffusion-based adversarial purification focuses on using the diffusion model to generate a clean image against such adversarial attacks. Unfortunately, the generative process of the diffusion model is also inevitably affected by adversarial perturbation, since the diffusion model is itself a deep network whose input carries the adversarial perturbation. In this work, we propose MimicDiffusion, a new diffusion-based adversarial purification technique that directly approximates the generative process of the diffusion model with the clean image as input. Concretely, we analyze the differences between the guided terms using the clean image and the adversarial sample. After that, we first implement MimicDiffusion based on Manhattan distance. Then, we propose two forms of guidance to purify the adversarial perturbation and approximate the clean diffusion model. Extensive experiments on three image datasets, including CIFAR-10, CIFAR-100, and ImageNet, with three classifier backbones, including WideResNet-70-16, WideResNet-28-10, and ResNet50, demonstrate that MimicDiffusion performs significantly better than the state-of-the-art baselines. On CIFAR-10, CIFAR-100, and ImageNet, it achieves 92.67\%, 61.35\%, and 61.53\% average robust accuracy, which are 18.49\%, 13.23\%, and 17.64\% higher, respectively. The code is available in the supplementary material.
Submitted 7 December, 2023;
originally announced December 2023.
-
Lightweight Frequency-Based Tiering for CXL Memory Systems
Authors:
Kevin Song,
Jiacheng Yang,
Sihang Liu,
Gennady Pekhimenko
Abstract:
Modern workloads are demanding increasingly larger memory capacity. Compute Express Link (CXL)-based memory tiering has emerged as a promising solution for addressing this trend by utilizing traditional DRAM alongside slow-tier CXL-memory devices in the same system. Unfortunately, most prior tiering systems are recency-based, which cannot accurately identify hot and cold pages, since a recently accessed page is not necessarily a hot page. On the other hand, more accurate frequency-based systems suffer from high memory and runtime overhead as a result of tracking large memories.
In this paper, we propose FreqTier, a fast and accurate frequency-based tiering system for CXL memory. We observe that memory tiering systems can tolerate a small amount of tracking inaccuracy without compromising the overall application performance. Based on this observation, FreqTier probabilistically tracks the access frequency of each page, enabling accurate identification of hot and cold pages while maintaining minimal memory overhead. Finally, FreqTier intelligently adjusts the intensity of tiering operations based on the application's memory access behavior, thereby significantly reducing the amount of migration traffic and application interference.
We evaluate FreqTier on two emulated CXL memory devices with different bandwidths. On the high bandwidth CXL device, FreqTier can outperform state-of-the-art tiering systems while using 4$\times$ less local DRAM memory for in-memory caching workloads. On GAP graph analytics and XGBoost workloads with 1:32 local DRAM to CXL-memory ratio, FreqTier outperforms prior works by 1.04$-$2.04$\times$ (1.39$\times$ on average). Even on the low bandwidth CXL device, FreqTier outperforms AutoNUMA by 1.14$\times$ on average.
Submitted 7 December, 2023;
originally announced December 2023.
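The abstract says FreqTier tracks per-page access frequency probabilistically but does not name the data structure. As a rough illustration of how approximate counting trades a little accuracy for bounded memory, the sketch below uses a count-min sketch; this is a swapped-in, generic technique and an assumption, not FreqTier's design. The class name `PageFrequencySketch`, its parameters, and the `is_hot` threshold rule are all hypothetical.

```python
import hashlib


class PageFrequencySketch:
    """Count-min sketch over page numbers (illustrative; not FreqTier's structure)."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        # depth rows of width counters; memory is fixed regardless of page count.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, page: int) -> int:
        # Derive one deterministic hash per row by salting with the row number.
        digest = hashlib.blake2b(f"{row}:{page}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def record(self, page: int) -> None:
        """Register one access to the given page."""
        for row in range(self.depth):
            self.rows[row][self._index(row, page)] += 1

    def estimate(self, page: int) -> int:
        """Approximate access count; may overestimate, never underestimates."""
        return min(self.rows[row][self._index(row, page)] for row in range(self.depth))

    def is_hot(self, page: int, threshold: int) -> bool:
        """Hypothetical hot-page rule: estimated count at or above threshold."""
        return self.estimate(page) >= threshold
```

Because hash collisions can only inflate a counter, a cold page may occasionally be classified as hot, which is the kind of small tracking inaccuracy the abstract argues tiering systems can tolerate.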
-
TaskBench: Benchmarking Large Language Models for Task Automation
Authors:
Yongliang Shen,
Kaitao Song,
Xu Tan,
Wenqi Zhang,
Kan Ren,
Siyu Yuan,
Weiming Lu,
Dongsheng Li,
Yueting Zhuang
Abstract:
Recently, the incredible progress of large language models (LLMs) has ignited the spark of task automation, which decomposes the complex tasks described by user instructions into sub-tasks, invokes external tools to execute them, and plays a central role in autonomous agents. However, a systematic and standardized benchmark to foster the development of LLMs in task automation is still lacking. To this end, we introduce TaskBench to evaluate the capability of LLMs in task automation. Specifically, task automation can be formulated into three critical stages: task decomposition, tool invocation, and parameter prediction to fulfill user intent. This complexity makes data collection and evaluation more challenging compared to common NLP tasks. To generate high-quality evaluation datasets, we introduce the concept of Tool Graph to represent the decomposed tasks in user intent, and adopt a back-instruct method to simulate user instruction and annotations. Furthermore, we propose TaskEval to evaluate the capability of LLMs from different aspects, including task decomposition, tool invocation, and parameter prediction. Experimental results demonstrate that TaskBench can effectively reflect the capability of LLMs in task automation. Benefiting from the mixture of automated data construction and human verification, TaskBench achieves a high consistency compared to the human evaluation, which can be utilized as a comprehensive and faithful benchmark for LLM-based autonomous agents.
Submitted 9 December, 2023; v1 submitted 30 November, 2023;
originally announced November 2023.
-
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Authors:
Fuxiao Liu,
Xiaoyang Wang,
Wenlin Yao,
Jianshu Chen,
Kaiqiang Song,
Sangwoo Cho,
Yaser Yacoob,
Dong Yu
Abstract:
With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has been impressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chart image understanding due to the distinct abstract components in charts. To address this, we introduce a large-scale MultiModal Chart Instruction (MMC-Instruction) dataset comprising 600k instances supporting diverse tasks and chart types. Leveraging this data, we develop MultiModal Chart Assistant (MMCA), an LMM that achieves state-of-the-art performance on existing chart QA benchmarks. Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a comprehensive human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts. Extensive experiments on MMC-Benchmark reveal the limitations of existing LMMs on correctly interpreting charts, even for the most recent GPT-4V model. Our work provides an instruction-tuning methodology and benchmark to advance multimodal understanding of charts. Code and data are available at https://github.com/FuxiaoLiu/MMC.
Submitted 15 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
General-purpose machine-learned potential for 16 elemental metals and their alloys
Authors:
Keke Song,
Rui Zhao,
Jiahui Liu,
Yanzhou Wang,
Eric Lindgren,
Yong Wang,
Shunda Chen,
Ke Xu,
Ting Liang,
Penghua Ying,
Nan Xu,
Zhiqiang Zhao,
Jiuyang Shi,
Junjie Wang,
Shuang Lyu,
Zezhu Zeng,
Shirong Liang,
Haikuan Dong,
Ligang Sun,
Yue Chen,
Zhuhua Zhang,
Wanlin Guo,
** Qian,
Jian Sun,
Paul Erhart
, et al. (3 additional authors not shown)
Abstract:
Machine-learned potentials (MLPs) have exhibited remarkable accuracy, yet the lack of general-purpose MLPs for a broad spectrum of elements and their alloys limits their applicability. Here, we present a feasible approach for constructing a unified general-purpose MLP for numerous elements, demonstrated through a model (UNEP-v1) for 16 elemental metals and their alloys. To achieve a complete representation of the chemical space, we show, via principal component analysis and diverse test datasets, that employing one-component and two-component systems suffices. Our unified UNEP-v1 model exhibits superior performance across various physical properties compared to a widely used embedded-atom method potential, while maintaining remarkable efficiency. We demonstrate our approach's effectiveness through reproducing experimentally observed chemical order and stable phases, and large-scale simulations of plasticity and primary radiation damage in MoTaVW alloys. This work represents a significant leap towards a unified general-purpose MLP encompassing the periodic table, with profound implications for materials science.
Submitted 12 June, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
Authors:
Changdae Oh,
Hyesu Lim,
Mijoo Kim,
Dongyoon Han,
Sangdoo Yun,
Jaegul Choo,
Alexander Hauptmann,
Zhi-Qi Cheng,
Kyungwoo Song
Abstract:
Improving out-of-distribution (OOD) generalization through in-distribution (ID) adaptation is a primary goal of robust fine-tuning methods beyond the naive fine-tuning approach. However, despite decent OOD generalization performance from recent robust fine-tuning methods, OOD confidence calibration for reliable machine learning has not been fully addressed. This work proposes a robust fine-tuning method that improves both OOD accuracy and calibration error in Vision Language Models (VLMs). Firstly, we show that both types of errors have a shared upper bound consisting of two terms of ID data: 1) calibration error and 2) the smallest singular value of the input covariance matrix. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value, which is further aided by the self-distillation of a moving averaged model to achieve well-calibrated prediction. Starting from an empirical validation of our theoretical statements, we provide extensive experimental results on ImageNet distribution shift benchmarks that demonstrate the effectiveness of our method.
Submitted 27 May, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Single-pixel imaging based on deep learning
Authors:
Kai Song,
Yaoxing Bian,
Ku Wu,
Hongrui Liu,
Shuang** Han,
Jiaming Li,
Jiazhao Tian,
Chengbin Qin,
Jianyong Hu,
Liantuan Xiao
Abstract:
Single-pixel imaging can collect images at the wavelengths outside the reach of conventional focal plane array detectors. However, the limited image quality and lengthy computational times for iterative reconstruction still impede the practical application of single-pixel imaging. Recently, deep learning has been introduced into single-pixel imaging, which has attracted a lot of attention due to its exceptional reconstruction quality, fast reconstruction speed, and the potential to complete advanced sensing tasks without reconstructing images. Here, this advance is discussed and some opinions are offered. Firstly, based on the fundamental principles of single-pixel imaging and deep learning, the principles and algorithms of single-pixel imaging based on deep learning are described and analyzed. Subsequently, the implementation technologies of single-pixel imaging based on deep learning are reviewed. They are divided into super-resolution single-pixel imaging, single-pixel imaging through scattering media, photon-level single-pixel imaging, optical encryption based on single-pixel imaging, color single-pixel imaging, and image-free sensing according to diverse application fields. Finally, major challenges and corresponding feasible approaches are discussed, as well as more possible applications in the future.
Submitted 16 November, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning
Authors:
Kun Song,
Huimin Ma,
Bochao Zou,
Huishuai Zhang,
Weiran Huang
Abstract:
Due to the limited availability of data, existing few-shot learning methods trained from scratch fail to achieve satisfactory performance. In contrast, large-scale pre-trained models such as CLIP demonstrate remarkable few-shot and zero-shot capabilities. To enhance the performance of pre-trained models for downstream tasks, fine-tuning the model on downstream data is frequently necessary. However, fine-tuning the pre-trained model leads to a decrease in its generalizability in the presence of distribution shift, while the limited number of samples in few-shot learning makes the model highly susceptible to overfitting. Consequently, existing methods for fine-tuning few-shot learning primarily focus on fine-tuning the model's classification head or introducing additional structure. In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align). Our method aims to bolster the model's generalizability by preserving the consistency of spurious features across the fine-tuning process. Extensive experimental results validate the efficacy of our approach for both ID and OOD tasks. Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements. Our code can be found in https://github.com/skingorz/FD-Align.
Submitted 17 November, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
FHT-Map: Feature-based Hierarchical Topological Map for Relocalization and Path Planning
Authors:
Kun Song,
Wenhang Liu,
Gaoming Chen,
Xiang Xu,
Zhenhua Xiong
Abstract:
Topological maps are favorable for their small storage compared to geometric maps. However, they are limited in relocalization and path planning capabilities. To solve this problem, a feature-based hierarchical topological map (FHT-Map) is proposed along with a real-time map construction algorithm for robot exploration. Specifically, the FHT-Map utilizes both RGB cameras and LiDAR information and consists of two types of nodes: main nodes and support nodes. Main nodes store visual information compressed by a convolutional neural network and local laser scan data to enhance subsequent relocalization capability. Support nodes retain a minimal amount of data to ensure storage efficiency while facilitating path planning. After map construction with robot exploration, the FHT-Map can be used by other robots for relocalization and path planning. Experiments are conducted in the Gazebo simulator, and the results demonstrate that the proposed FHT-Map can effectively improve relocalization and path planning capability compared with other topological maps. Moreover, experiments on the hierarchical architecture are implemented to show the necessity of two types of nodes.
Submitted 20 October, 2023;
originally announced October 2023.
-
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Authors:
Dingyao Yu,
Kaitao Song,
Peiling Lu,
Tianyu He,
Xu Tan,
Wei Ye,
Shikun Zhang,
Jiang Bian
Abstract:
AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks. Consequently, it is necessary to build a system to organize and integrate these tasks, and thus help practitioners to automatically analyze their demand and call suitable tools as solutions to fulfill their requirements. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) a toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web APIs, and 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) to organize these tools, automatically decompose user requests into multiple sub-tasks, and invoke corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By granting users the freedom to effortlessly combine tools, the system offers a seamless and enriching music experience.
Submitted 25 October, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Learning To Teach Large Language Models Logical Reasoning
Authors:
Meiqi Chen,
Yubo Ma,
Kaitao Song,
Yixin Cao,
Yan Zhang,
Dongsheng Li
Abstract:
Large language models (LLMs) have gained enormous attention from both academia and industry, due to their exceptional ability in language generation and extremely powerful generalization. However, current LLMs still output unreliable content in practical reasoning tasks due to their inherent issues (e.g., hallucination). To better disentangle this problem, in this paper, we conduct an in-depth investigation to systematically explore the capability of LLMs in logical reasoning. More in detail, we first investigate the deficiency of LLMs in logical reasoning on different tasks, including event relation extraction and deductive reasoning. Our study demonstrates that LLMs are not good reasoners in solving tasks with rigorous reasoning and will produce counterfactual answers, which require us to iteratively refine. Therefore, we comprehensively explore different strategies to endow LLMs with logical reasoning ability, and thus enable them to generate more logically consistent answers across different scenarios. Based on our approach, we also contribute a synthesized dataset (LLM-LR) involving multi-hop reasoning for evaluation and pre-training. Extensive quantitative and qualitative analyses on different tasks also validate the effectiveness and necessity of teaching LLMs with logic and provide insights for solving practical tasks with LLMs in future work.
Submitted 13 October, 2023;
originally announced October 2023.
-
Milestoning estimators of dissipation in systems observed at a coarse resolution: When ignorance is truly bliss
Authors:
Kristian Blom,
Kevin Song,
Etienne Vouga,
Aljaž Godec,
Dmitrii E. Makarov
Abstract:
Many non-equilibrium, active processes are observed at a coarse-grained level, where different microscopic configurations are projected onto the same observable state. Such "lumped" observables display memory, and in many cases the irreversible character of the underlying microscopic dynamics becomes blurred, e.g., when the projection hides dissipative cycles. As a result, the observations appear less irreversible, and it is very challenging to infer the degree of broken time-reversal symmetry. Here we show, contrary to intuition, that by ignoring parts of the already coarse-grained state space we may -- via a process called milestoning -- improve entropy-production estimates. Milestoning systematically renders observations "closer to underlying microscopic dynamics" and thereby improves thermodynamic inference from lumped data assuming a given range of memory. Moreover, whereas the correct general physical definition of time-reversal in the presence of memory remains unknown, we here show by means of systematic, physically relevant examples that at least for semi-Markov processes of first and second order, waiting-time contributions arising from adopting a naive Markovian definition of time-reversal generally must be discarded.
Submitted 11 October, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Deformation Localisation in Ion-Irradiated FeCr
Authors:
Kay Song,
Dina Sheyfer,
Wenjun Liu,
Jonathan Z Tischler,
Suchandrima Das,
Kenichiro Mizohata,
Hongbing Yu,
David E J Armstrong,
Felix Hofmann
Abstract:
Irradiation-induced ductility loss is a major concern facing structural steels in next-generation nuclear reactors. Currently, the mechanisms for this are unclear but crucial to address for the design of reactor components. Here, the deformation characteristics around nanoindents in Fe and Fe10Cr irradiated with Fe ions to $\sim$1 displacement-per-atom at 313 K are non-destructively studied. Deformation localisation in the irradiated materials is evident from the increased pile-up height and slip step formation, measured by atomic force microscopy. From 3D X-ray Laue diffraction, measurements of lattice rotation and strain fields near the indent site show a large confinement, over 85%, of plasticity in the irradiated material. We find that despite causing increased irradiation hardening, Cr content has little effect on the irradiation-induced changes in pile-up topography and deformation fields. The results demonstrate that varying Cr content in steels has limited impact on retaining strain hardening capacity and reducing irradiation-induced embrittlement.
Submitted 2 October, 2023;
originally announced October 2023.
-
Towards Novel Class Discovery: A Study in Novel Skin Lesions Clustering
Authors:
Wei Feng,
Lie Ju,
Lin Wang,
Kaimin Song,
Zongyuan Ge
Abstract:
Existing deep learning models have achieved promising performance in recognizing skin diseases from dermoscopic images. However, these models can only recognize samples from predefined categories, whereas data from new unknown categories constantly emerge once they are deployed in the clinic. Therefore, it is crucial to automatically discover and identify new semantic categories from new data. In this paper, we propose a new novel class discovery framework for automatically discovering new semantic classes from dermoscopy image datasets based on the knowledge of known classes. Specifically, we first use contrastive learning to learn a robust and unbiased feature representation based on all data from known and unknown categories. We then propose an uncertainty-aware multi-view cross pseudo-supervision strategy, which is trained jointly on all categories of data using pseudo labels generated by a self-labeling strategy. Finally, we further refine the pseudo labels by aggregating neighborhood information through local sample similarity to improve the clustering performance of the model for unknown categories. We conducted extensive experiments on the dermatology dataset ISIC 2019, and the experimental results show that our approach can effectively leverage knowledge from known categories to discover new semantic categories. We also further validated the effectiveness of the different modules through extensive ablation experiments. Our code will be released soon.
Submitted 28 September, 2023;
originally announced September 2023.
-
Singular elliptic measure data problems with irregular obstacles
Authors:
Sun-Sig Byun,
Kyeong Song,
Yeonghun Youn
Abstract:
We investigate elliptic irregular obstacle problems with $p$-growth involving measure data. Emphasis is on the strongly singular case $1 < p \le 2-1/n$, and we obtain several new comparison estimates to prove gradient potential estimates in an intrinsic form. Our approach can be also applied to derive zero-order potential estimates.
Submitted 18 September, 2023;
originally announced September 2023.
-
The triggering process of an X-class solar flare on a small quadrupolar active region
Authors:
Qiao Song,
**g-Song Wang,
Xiaoxin Zhang,
Hechao Chen,
Shuhong Yang,
Zhenyong Hou,
Yijun Hou,
Qian Ye,
Peng Zhang,
Xiuqing Hu,
**** Dun,
Weiguo Zong,
Xianyong Bai,
Bo Chen,
Ling** He,
Kefei Song
Abstract:
The occurrence of X-class solar flares and their potential impact on the space weather often receive greater attention than other flares. But predicting when and where an X-class flare will occur is still a challenge. With the multi-wavelength observation from the Solar Dynamics Observatory and FengYun-3E satellite, we investigate the triggering of a GOES X1.0 flare occurring in the NOAA active region (AR) 12887. Our results show that this unique X-class flare is bred in a relatively small but complex quadrupolar AR. Before the X-class flare, two filaments (F1 and F2) exist below a null-point topology of the quadrupolar AR. Magnetic field extrapolation and observation reveal that F1 and F2 correspond to two magnetic flux ropes with the same chirality and their adjacent feet rooted at nonconjugated opposite polarities, respectively. Interestingly, these two polarities collide rapidly, accompanied by photospheric magnetic flux emergence, cancellation and shear motion in the AR center. Above this site, F1 and F2 subsequently intersect and merge into a longer filament (F3) via a tether-cutting-like reconnection process. As a result, F3 rises and erupts, involving the large-scale arcades overlying the filament and the quadrupolar magnetic field above the AR, and eventually leads to the eruption of the X-class flare with a quasi-X-shaped flare ribbon and a coronal mass ejection. It suggests that the rapid collision of nonconjugated opposite polarities provides a key condition for the triggering of this X-class flare, and also provides a featured case for flare trigger mechanism and space weather forecasting.
Submitted 17 September, 2023;
originally announced September 2023.
-
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Authors:
Qingyan Guo,
Rui Wang,
Junliang Guo,
Bei Li,
Kaitao Song,
Xu Tan,
Guoqing Liu,
Jiang Bian,
Yujiu Yang
Abstract:
Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 31 datasets covering language understanding, generation tasks, as well as BIG-Bench Hard (BBH) tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation (e.g., up to 25% on BBH). Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.
Submitted 27 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Unsupervised Multi-document Summarization with Holistic Inference
Authors:
Haopeng Zhang,
Sangwoo Cho,
Kaiqiang Song,
Xiaoyang Wang,
Hongwei Wang,
Jiawei Zhang,
Dong Yu
Abstract:
Multi-document summarization aims to obtain core information from a collection of documents written on the same topic. This paper proposes a new holistic framework for unsupervised multi-document extractive summarization. Our method incorporates the holistic beam search inference method associated with the holistic measurements, named Subset Representative Index (SRI). SRI balances the importance…
▽ More
Multi-document summarization aims to obtain core information from a collection of documents written on the same topic. This paper proposes a new holistic framework for unsupervised multi-document extractive summarization. Our method incorporates the holistic beam search inference method associated with the holistic measurements, named Subset Representative Index (SRI). SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in unsupervised and adaptive manners. To demonstrate the effectiveness of our method, we conduct extensive experiments on both small and large-scale multi-document summarization datasets under both unsupervised and adaptive settings. The proposed method outperforms strong baselines by a significant margin, as indicated by the resulting ROUGE scores and diversity measures. Our findings also suggest that diversity is essential for improving multi-document summary performance.
△ Less
Submitted 7 September, 2023;
originally announced September 2023.
-
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Authors:
Haochen Wang,
Junsong Fan,
Yuxi Wang,
Kaiyou Song,
Tong Wang,
Zhaoxiang Zhang
Abstract:
As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident. To address this, we present DropPos, a novel pretext task designed to reconstruct Dropped Positions. The formulation of DropPos is simple: we first drop a large random subset of positional embeddings and then the model classifies the actual position for each non-overlapping patch among all possible positions solely based on their visual appearance. To avoid trivial solutions, we increase the difficulty of this task by keeping only a subset of patches visible. Additionally, considering there may be different patches with similar visual appearances, we propose position smoothing and attentive reconstruction strategies to relax this classification problem, since it is not necessary to reconstruct their exact positions in these cases. Empirical evaluations of DropPos show strong capabilities. DropPos outperforms supervised pre-training and achieves competitive results compared with state-of-the-art self-supervised alternatives on a wide range of downstream benchmarks. This suggests that explicitly encouraging spatial reasoning abilities, as DropPos does, indeed contributes to the improved location awareness of ViTs. The code is publicly available at https://github.com/Haochen-Wang409/DropPos.
Submitted 21 September, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
PromptTTS 2: Describing and Generating Voices with Text Prompt
Authors:
Yichong Leng,
Zhifang Guo,
Kai Shen,
Xu Tan,
Zeqian Ju,
Yanqing Liu,
Yufei Liu,
Dongchao Yang,
Leying Zhang,
Kaitao Song,
Lei He,
Xiang-Yang Li,
Sheng Zhao,
Tao Qin,
Jiang Bian
Abstract:
Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.
Submitted 11 October, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Unified Single-Stage Transformer Network for Efficient RGB-T Tracking
Authors:
Jianqiang Xia,
DianXi Shi,
Ke Song,
Linna Song,
XiaoLei Wang,
Songchang **,
Li Zhou,
Yu Cheng,
Lei **,
Zheng Zhu,
Jianan Li,
Gang Wang,
Junliang Xing,
Jian Zhao
Abstract:
Most existing RGB-T tracking networks extract modality features in a separate manner, which lacks interaction and mutual guidance between modalities. This limits the network's ability to adapt to the diverse dual-modality appearances of targets and the dynamic relationships between the modalities. Additionally, the three-stage fusion tracking paradigm followed by these networks significantly restricts the tracking speed. To overcome these problems, we propose a unified single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the above three stages into a single ViT (Vision Transformer) backbone with a dual embedding layer through the self-attention mechanism. With this structure, the network can extract fusion features of the template and search region under the mutual interaction of modalities. Simultaneously, relation modeling is performed between these features, efficiently obtaining the search region fusion features with better target-background discriminability for prediction. Furthermore, we introduce a novel feature selection mechanism based on modality reliability to mitigate the influence of invalid modalities on prediction, further improving the tracking performance. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance while maintaining the fastest inference speed of 84.2 FPS. In particular, MPR/MSR on the short-term and long-term subsets of the VTUAV dataset increased by 11.1$\%$/11.7$\%$ and 11.3$\%$/9.7$\%$.
Submitted 26 August, 2023;
originally announced August 2023.
-
Microstructural and material property changes in severely deformed Eurofer-97
Authors:
Kay Song,
Guanze He,
Abdallah Reza,
Tamas Ungár,
Phani Karamched,
David Yang,
Ivan Tolkachev,
Kenichiro Mizohata,
David E J Armstrong,
Felix Hofmann
Abstract:
Severe plastic deformation changes the microstructure and properties of steels, which may be favourable for their use in structural components of nuclear reactors. In this study, high-pressure torsion (HPT) was used to refine the grain structure of Eurofer-97, a ferritic/martensitic steel. Electron microscopy and X-ray diffraction were used to characterise the microstructural changes. Following HPT, the average grain size reduced by a factor of $\sim$ 30, with a marked increase in high-angle grain boundaries. Dislocation density also increased by more than one order of magnitude. The thermal stability of the deformed material was investigated via in-situ annealing during synchrotron X-ray diffraction. This revealed substantial recovery between 450 K and 800 K. Irradiation with 20 MeV Fe-ions to $\sim$ 0.1 dpa caused a 20% reduction in dislocation density compared to the as-deformed material. However, HPT deformation prior to irradiation did not have a significant effect in mitigating the irradiation-induced reductions in thermal diffusivity and surface acoustic wave velocity of the material. These results provide a multi-faceted understanding of the changes in ferritic/martensitic steels due to severe plastic deformation, and how these changes can be used to alter material properties.
Submitted 15 August, 2023;
originally announced August 2023.
-
SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models
Authors:
Sara Babakniya,
Ahmed Roushdy Elkordy,
Yahya H. Ezzeldin,
Qingfeng Liu,
Kee-Bong Song,
Mostafa El-Khamy,
Salman Avestimehr
Abstract:
Transfer learning via fine-tuning pre-trained transformer models has gained significant success in delivering state-of-the-art results across various NLP tasks. In the absence of centralized data, Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning. However, due to the limited communication, computation, and storage capabilities of edge devices and the huge sizes of popular transformer models, efficient fine-tuning is crucial to make federated training feasible. This work explores the opportunities and challenges associated with applying parameter efficient fine-tuning (PEFT) methods in different FL settings for language tasks. Specifically, our investigation reveals that as the data across users becomes more diverse, the gap between fully fine-tuning the model and employing PEFT methods widens. To bridge this performance gap, we propose a method called SLoRA, which overcomes the key limitations of LoRA in highly heterogeneous data scenarios through a novel data-driven initialization technique. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning, with significantly sparse updates of $\sim 1\%$ density, while reducing training time by up to $90\%$.
Submitted 12 August, 2023;
originally announced August 2023.
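For readers unfamiliar with the LoRA baseline that the SLoRA abstract builds on, the following is a minimal NumPy sketch of a low-rank adapter on a single linear layer. It is an illustration only: the layer sizes, the scaling factor `alpha`, and the helper name `lora_forward` are assumptions chosen for the example, and the data-driven initialization proposed in the paper is not described in the abstract and is not reproduced here.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Linear layer with a frozen weight W plus a low-rank update (alpha / r) * B @ A."""
    r = A.shape[0]  # adapter rank
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4

W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero at start

x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, alpha=8.0)

# Only A and B would be trained (and, in a federated setting, communicated).
trainable = A.size + B.size  # 4 * 128 + 64 * 4 = 768
full = W.size                # 64 * 128 = 8192
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the trainable parameter count is a small fraction of the full weight matrix, which is the property that makes adapters of this kind attractive when communication is limited.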
-
ESBMC v7.3: Model Checking C++ Programs using Clang AST
Authors:
Kunjian Song,
Mikhail R. Gadelha,
Franz Brauße,
Rafael S. Menezes,
Lucas C. Cordeiro
Abstract:
This paper introduces ESBMC v7.3, the latest Efficient SMT-Based Context-Bounded Model Checker version, which now incorporates a new clang-based C++ front-end. While the previous CPROVER-based front-end served well for handling C++03 programs, it encountered challenges keeping up with the evolving C++ language. As new language and library features were added in each C++ version, the limitations of the old front-end became apparent, leading to difficult-to-maintain code. Consequently, modern C++ programs were challenging to verify. To overcome this obstacle, we redeveloped the front-end, opting for a more robust approach using clang. The new front-end efficiently traverses the Abstract Syntax Tree (AST) in-memory using clang APIs and transforms each AST node into ESBMC's Intermediate Representation. Through extensive experimentation, our results demonstrate that ESBMC v7.3 with the new front-end significantly reduces parse and conversion errors, enabling successful verification of a wide range of C++ programs, thereby outperforming previous ESBMC versions.
Submitted 10 August, 2023;
originally announced August 2023.
-
Topological soliton molecule in quasi 1D charge density wave
Authors:
Taehwan Im,
Sun Kyu Song,
Jae Whan Park,
Han Woong Yeom
Abstract:
Soliton molecules, bound states of two solitons, can be important for informatics using solitons and for the quest for exotic particles in a wide range of physical systems, from unconventional superconductors to nuclear matter and the Higgs field, but have been observed only in the temporal dimension for classical wave optical systems. Here, we identify a topological soliton molecule formed spatially in an electronic system, a quasi 1D charge density wave of indium atomic wires. This system is composed of two coupled Peierls chains, which are endowed with a Z$_4$ topology and three distinct solitons: right-chiral, left-chiral, and non-chiral. Our scanning tunneling microscopy measurements identify a bound state of right- and left-chiral solitons with distinct in-gap states and net zero phase shift. Our density functional theory calculations reveal the attractive interaction of these solitons and the hybridization of their electronic states. This result initiates the study of the interaction between solitons in electronic systems, which can provide novel many-body electronic states and extra data-handling capacity beyond the given soliton topology.
Submitted 10 August, 2023;
originally announced August 2023.
-
Dose and compositional dependence of irradiation-induced property change in FeCr
Authors:
Kay Song,
Dina Sheyfer,
Kenichiro Mizohata,
Minyi Zhang,
Wenjun Liu,
Doğa Gürsoy,
David Yang,
Ivan Tolkachev,
Hongbing Yu,
David E J Armstrong,
Felix Hofmann
Abstract:
Ferritic/martensitic steels will be used as structural components in next-generation nuclear reactors. Their successful operation relies on an understanding of irradiation-induced defect behaviour in the material. In this study, Fe and FeCr alloys (3-12%Cr) were irradiated with 20 MeV Fe-ions at 313 K to doses ranging from 0.00008 dpa to 6.0 dpa. This dose range covers six orders of magnitude, spanning low, transition and high dose regimes. Lattice strain and hardness in the irradiated material were characterised with micro-beam Laue X-ray diffraction and nanoindentation, respectively.
Irradiation hardening was observed even at very low doses (0.00008 dpa) and showed a monotonic increase with dose up to 6.0 dpa. Lattice strain measurements of samples at 0.0008 dpa allow the calculation of equivalent Frenkel pair densities and corrections to the Norgett-Robinson-Torrens (NRT) model for Fe and FeCr alloys at low dose. NRT efficiency for FeCr is 0.2, which agrees with literature values for high irradiation energy. Lattice strain increases up to 0.8 dpa and then decreases when the damage dose is further increased. The strains measured in this study are lower and peak at a larger dose than predicted by atomistic simulations. This difference can be explained by taking temperature and impurities into account.
Submitted 4 March, 2024; v1 submitted 1 August, 2023;
originally announced August 2023.
-
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
Authors:
Jiaao Chen,
Xiaoman Pan,
Dian Yu,
Kaiqiang Song,
Xiaoyang Wang,
Dong Yu,
Jianshu Chen
Abstract:
We consider the problem of eliciting compositional generalization capabilities in large language models (LLMs) with a novel type of prompting strategy. Compositional generalization empowers the LLMs to solve problems that are harder than the ones they have seen (i.e., easy-to-hard generalization), which is a critical reasoning capability of human-like intelligence. However, even the current state-of-the-art LLMs still struggle with this form of reasoning. To bridge this gap, we propose skills-in-context (SKiC) prompting, which instructs LLMs how to compose basic skills to resolve more complex problems. We find that it is crucial to demonstrate both the skills and the compositional examples within the same prompting context. With as few as two exemplars, our SKiC prompting initiates strong synergies between skills and their composition capabilities. Notably, it empowers LLMs to solve unseen problems that require innovative skill compositions, achieving near-perfect generalization on a broad range of challenging compositionality tasks. Intriguingly, SKiC prompting unlocks the latent potential of LLMs, enabling them to leverage pre-existing internal skills acquired during earlier pre-training stages, even when these skills are not explicitly presented in the prompting context. This results in the capability of LLMs to solve unseen complex problems by activating and composing internal competencies. With such prominent features, SKiC prompting is able to achieve state-of-the-art performance on challenging mathematical reasoning benchmarks (e.g., MATH).
Submitted 14 August, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
A Spectral Approach for the Dynamic Bradley-Terry Model
Authors:
Xin-Yu Tian,
Jian Shi,
Xiaotong Shen,
Kai Song
Abstract:
The dynamic ranking, due to its increasing importance in many applications, is becoming crucial, especially with the collection of voluminous time-dependent data. One such application is sports statistics, where dynamic ranking aids in forecasting the performance of competitive teams, drawing on historical and current data. Despite its usefulness, predicting and inferring rankings pose challenges in environments necessitating time-dependent modeling. This paper introduces a spectral ranker called Kernel Rank Centrality, designed to rank items based on pairwise comparisons over time. The ranker operates via kernel smoothing in the Bradley-Terry model, utilizing a Markov chain model. Unlike the maximum likelihood approach, the spectral ranker is nonparametric, demands fewer model assumptions and computations, and allows for real-time ranking. We establish the asymptotic distribution of the ranker by applying an innovative group inverse technique, resulting in a uniform and precise entrywise expansion. This result allows us to devise a new inferential method for predictive inference, previously unavailable in existing approaches. Our numerical examples showcase the ranker's utility in predictive accuracy and constructing an uncertainty measure for prediction, leveraging data from the National Basketball Association (NBA). The results underscore our method's potential compared to the gold standard in sports, the Arpad Elo rating system.
Submitted 4 August, 2023; v1 submitted 31 July, 2023;
originally announced July 2023.
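The idea summarized in the abstract above (kernel smoothing of pairwise comparisons over time, followed by a Markov-chain ranking) can be sketched roughly as follows. This is a hedged illustration rather than the authors' estimator: the Gaussian kernel, the normalization by the maximum number of opponents, the power-iteration solver, and the function name `kernel_rank_centrality` are all assumptions made for the example.

```python
import numpy as np

def kernel_rank_centrality(comparisons, n_items, t0, bandwidth):
    """Score items at time t0 from (time, winner, loser) triples.

    Comparisons close to t0 receive more weight; the scores are the stationary
    distribution of a random walk that tends to move toward winners.
    """
    wins = np.zeros((n_items, n_items))  # wins[i, j]: kernel-weighted wins of i over j
    for t, winner, loser in comparisons:
        u = (t - t0) / bandwidth
        wins[winner, loser] += np.exp(-0.5 * u * u)  # Gaussian kernel (assumed choice)

    totals = wins + wins.T
    # frac[i, j]: smoothed fraction of i-versus-j comparisons won by j
    frac = np.divide(wins.T, totals, out=np.zeros_like(wins), where=totals > 0)

    # Divide by the largest number of distinct opponents so each row sums to at most 1.
    d = max(int((totals > 0).sum(axis=1).max()), 1)
    P = frac / d
    P[np.diag_indices(n_items)] = 1.0 - P.sum(axis=1)  # self-loops absorb leftover mass

    # Stationary distribution by power iteration.
    pi = np.full(n_items, 1.0 / n_items)
    for _ in range(10_000):
        nxt = pi @ P
        converged = np.abs(nxt - pi).sum() < 1e-12
        pi = nxt
        if converged:
            break
    return pi / pi.sum()

# Toy data: item 0 usually beats items 1 and 2, and item 1 usually beats item 2.
games = [
    (0.0, 0, 1), (0.1, 0, 1), (0.2, 1, 0),
    (0.3, 1, 2), (0.4, 1, 2), (0.5, 2, 1),
    (0.6, 0, 2), (0.7, 0, 2), (0.8, 2, 0),
]
scores = kernel_rank_centrality(games, n_items=3, t0=0.8, bandwidth=100.0)
```

With a very large bandwidth all games are weighted almost equally and the sketch behaves like a static Rank Centrality computation; shrinking the bandwidth makes the scores track recent results more closely, which is the sense in which the ranking becomes dynamic.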
-
A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning
Authors:
Spencer Giddens,
Yiwang Zhou,
Kevin R. Krull,
Tara M. Brinkman,
Peter X. K. Song,
Fang Liu
Abstract:
It is commonplace to use data containing personal information to build predictive models in the framework of empirical risk minimization (ERM). While these models can be highly accurate in prediction, results obtained from these models with the use of sensitive data may be susceptible to privacy attacks. Differential privacy (DP) is an appealing framework for addressing such data privacy issues by providing mathematically provable bounds on the privacy loss incurred when releasing information from sensitive data. Previous work has primarily concentrated on applying DP to unweighted ERM. We consider an important generalization to weighted ERM (wERM). In wERM, each individual's contribution to the objective function can be assigned varying weights. In this context, we propose the first differentially private wERM algorithm, backed by a rigorous theoretical proof of its DP guarantees under mild regularity conditions. Extending the existing DP-ERM procedures to wERM paves a path to deriving privacy-preserving learning methods for individualized treatment rules, including the popular outcome weighted learning (OWL). We evaluate the performance of the DP-wERM application to OWL in a simulation study and in a real clinical trial of melatonin for sleep health. All empirical results demonstrate the viability of training OWL models via wERM with DP guarantees while maintaining sufficiently useful model performance. Therefore, we recommend practitioners consider implementing the proposed privacy-preserving OWL procedure in real-world scenarios involving sensitive data.
Submitted 24 July, 2023;
originally announced July 2023.
-
Synthetic Decomposition for Counterfactual Predictions
Authors:
Nathan Canen,
Kyungchul Song
Abstract:
Counterfactual predictions are challenging when the policy variable goes beyond its pre-policy support. However, in many cases, information about the policy of interest is available from different ("source") regions where a similar policy has already been implemented. In this paper, we propose a novel method of using such data from source regions to predict a new policy in a target region. Instead of relying on extrapolation of a structural relationship using a parametric specification, we formulate a transferability condition and construct a synthetic outcome-policy relationship such that it is as close as possible to meeting the condition. The synthetic relationship weighs both the similarity in distributions of observables and in structural relationships. We develop a general procedure to construct asymptotic confidence intervals for counterfactual predictions and prove its asymptotic validity. We then apply our proposal to predict average teenage employment in Texas following a counterfactual increase in the minimum wage.
Submitted 11 July, 2023;
originally announced July 2023.
-
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task
Authors:
Kun Song,
Yi lei,
Peikun Chen,
Yiqing Cao,
Kun Wei,
Yongmao Zhang,
Lei Xie,
Ning Jiang,
Guoqing Zhao
Abstract:
This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. The system is built in a cascaded manner consisting of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). We make tremendous efforts to handle the challenging multi-source input. Specifically, to improve the robustness to multi-source speech input, we adopt various data augmentation strategies and a ROVER-based score fusion on multiple ASR model outputs. To better handle the noisy ASR transcripts, we introduce a three-stage fine-tuning strategy to improve translation accuracy. Finally, we build a TTS model with high naturalness and sound quality, which leverages a two-stage framework, using network bottleneck features as a robust intermediate representation for speaker timbre and linguistic content disentanglement. Based on the two-stage framework, pre-trained speaker embedding is leveraged as a condition to transfer the speaker timbre in the source English speech to the translated Chinese speech. Experimental results show that our system has high translation accuracy, speech naturalness, sound quality, and speaker similarity. Moreover, it shows good robustness to multi-source data.
Submitted 10 July, 2023;
originally announced July 2023.
-
VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder
Authors:
Yueen Ma,
Dafeng Chi,
**g**g Li,
Kai Song,
Yuzheng Zhuang,
Irwin King
Abstract:
The natural language generation domain has witnessed great success thanks to Transformer models. Although they have achieved state-of-the-art generative quality, they often neglect generative diversity. Prior attempts to tackle this issue suffer from either low model capacity or over-complicated architectures. Some recent methods employ the VAE framework to enhance diversity, but their latent variables fully depend on the input context, restricting exploration of the latent space. In this paper, we introduce VOLTA, a framework that elevates generative diversity by bridging Transformer with VAE via a more effective cross-attention-based connection, departing from conventional embedding concatenation or summation. Additionally, we propose integrating InfoGAN-style latent codes to enable input-independent variability, further diversifying the generation. Moreover, our framework accommodates discrete inputs alongside its existing support for continuous inputs. We perform comprehensive experiments with two types of Transformers on six datasets from three different NLG tasks to show that our approach can significantly improve generative diversity while maintaining generative quality.
Submitted 18 March, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Authors:
Yujia Xiao,
Shaofei Zhang,
Xi Wang,
Xu Tan,
Lei He,
Sheng Zhao,
Frank K. Soong,
Tan Lee
Abstract:
While state-of-the-art Text-to-Speech systems can generate natural speech of very high quality at sentence level, they still meet great challenges in speech generation for paragraph / long-form reading. Such deficiencies are due to i) ignorance of cross-sentence contextual information, and ii) high computation and memory cost for long-form synthesis. To address these issues, this work develops a lightweight yet effective TTS system, ContextSpeech. Specifically, we first design a memory-cached recurrence mechanism to incorporate global text and speech context into sentence encoding. Then we construct hierarchically-structured textual semantics to broaden the scope for global context enhancement. Additionally, we integrate linearized self-attention to improve model efficiency. Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency. Audio samples are available at: https://contextspeech.github.io/demo/
Submitted 7 October, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Leveraging Skill-to-Skill Supervision for Knowledge Tracing
Authors:
Hyeondey Kim,
**woo Nam,
Minjae Lee,
Yun Jegal,
Kyungwoo Song
Abstract:
Knowledge tracing plays a pivotal role in intelligent tutoring systems. This task aims to predict the probability of students answering specific questions correctly. To do so, knowledge tracing systems should trace the knowledge state of the students by utilizing their problem-solving history and knowledge about the problems. Recent advances in knowledge tracing models have enabled better exploitation of problem-solving history. However, knowledge about problems has not been studied as thoroughly as students' answering histories. Knowledge tracing algorithms that incorporate knowledge directly are important in settings with limited data or cold starts. Therefore, we consider the problem of utilizing skill-to-skill relations in knowledge tracing. In this work, we introduce expert-labeled skill-to-skill relationships. Moreover, we also provide novel methods to construct a knowledge-tracing model that leverages human experts' insight regarding relationships between skills. The results of an extensive experimental analysis show that our method outperformed a baseline Transformer model. Furthermore, we found that the extent of our model's superiority was greater in situations with limited data, which allows a smooth cold start of our model.
Submitted 11 June, 2023;
originally announced June 2023.
-
R-PMAC: A Robust Preamble Based MAC Mechanism Applied in Industrial Internet of Things
Authors:
Kai Song,
Biqian Feng,
Yongpeng Wu,
Zhen Gao,
Wenjun Zhang
Abstract:
This paper proposes a novel media access control (MAC) mechanism, called the robust preamble-based MAC mechanism (R-PMAC), which can be applied to power line communication (PLC) networks in the context of the Industrial Internet of Things (IIoT). Compared with other MAC mechanisms such as P-MAC and the MAC layer of IEEE1901.1, R-PMAC has higher networking speed. Besides, it supports whitelist authentication and functions properly in the presence of data frame loss. Firstly, we outline three basic mechanisms of R-PMAC, containing precise time difference calculation, preambles generation and short ID allocation. Secondly, we elaborate its networking process of single layer and multiple layers. Thirdly, we illustrate its robust mechanisms, including collision handling and data retransmission. Moreover, a low-cost hardware platform is established to measure the time of connecting hundreds of PLC nodes for the R-PMAC, P-MAC, and IEEE1901.1 mechanisms in a real power line environment. The experiment results show that R-PMAC outperforms the other mechanisms by achieving a 50% reduction in networking time. These findings indicate that the R-PMAC mechanism holds great potential for quickly and effectively building a PLC network in actual industrial scenarios.
Submitted 8 June, 2023;
originally announced June 2023.
-
Improving Tuning-Free Real Image Editing with Proximal Guidance
Authors:
Ligong Han,
Song Wen,
Qi Chen,
Zhixing Zhang,
Kunpeng Song,
Mengwei Ren,
Ruijiang Gao,
Anastasis Stathopoulos,
Xiaoxiao He,
Yuxiao Chen,
Di Liu,
Qilong Zhangli,
**dong Jiang,
Zhaoyang Xia,
Akash Srivastava,
Dimitris Metaxas
Abstract:
DDIM inversion has revealed the remarkable potential of real image editing within diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales are used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales, enabling real image editing with cross-attention control. Negative-prompt inversion (NPI) further offers a training-free closed-form solution of NTI. However, it may introduce artifacts and is still constrained by DDIM reconstruction quality. To overcome these limitations, we propose proximal guidance and incorporate it into NPI with cross-attention control. We enhance NPI with a regularization term and reconstruction guidance, which reduces artifacts while capitalizing on its training-free nature. Additionally, we extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process. Our method provides an efficient and straightforward approach, effectively addressing real image editing tasks with minimal computational overhead.
Submitted 5 July, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.