-
Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training
Authors:
Zelin Qiu,
Jianjun Gu,
Dingding Yao,
Junfeng Li
Abstract:
The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance against these features. Studies in neuroscience indicate that spatial auditory attention can be reflected in the topological distribution of EEG energy across different frequency bands. This insight motivates us to propose Prototype Training, a neuroscience-inspired method for Sp-AAD. This method constructs prototypes with enhanced energy distribution representations and reduced trial-specific characteristics, enabling the model to better capture auditory attention features. To implement prototype training, an EEGWaveNet that employs the wavelet transform of EEG is further proposed. Detailed experiments indicate that the EEGWaveNet with prototype training outperforms other competitive models on various datasets, and the effectiveness of the proposed method is also validated. As a training method independent of model architecture, prototype training offers new insights into the field of Sp-AAD.
Submitted 8 July, 2024;
originally announced July 2024.
-
A Closer Look into Mixture-of-Experts in Large Language Models
Authors:
Ka Man Lo,
Zeyu Huang,
Zihan Qiu,
Zili Wang,
Jie Fu
Abstract:
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechanism of MoE still lacks further exploration, and its modularization degree remains questionable. In this paper, we make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three recent MoE-based models and reveal some intriguing observations, including (1) Neurons act like fine-grained experts. (2) The router of MoE usually selects experts with larger output norms. (3) The expert diversity increases as the layer increases, while the last layer is an outlier. Based on the observations, we also provide suggestions for a broad spectrum of MoE practitioners, such as router design and expert allocation. We hope this work could shed light on future research on the MoE framework and other modular architectures. Code is available at https://github.com/kamanphoebe/Look-into-MoEs.
Submitted 26 June, 2024;
originally announced June 2024.
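Illustration for the entry above: the abstract describes MoE as sparsely activating a subset of parameters for each token. The sketch below shows one common way this is done, top-k gating, in plain Python. The function names, the toy experts, and the choice to renormalize the selected gate weights are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: gate weights of the selected experts are renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the k selected experts only."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)  # experts that are not selected are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy experts: each one scales its input by a constant.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]
selected = route_top_k(logits, k=2)
output = moe_forward([1.0, 1.0], logits, experts, k=2)
```

Only two of the four toy experts run for this token, which is the source of the compute savings the abstract refers to.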
-
Entropy-Based Decoding for Retrieval-Augmented Large Language Models
Authors:
Zexuan Qiu,
Zijing Ou,
Bin Wu,
Jingjing Li,
Aiwei Liu,
Irwin King
Abstract:
Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, training-free decoding method guided by entropy considerations to mitigate this issue. Our approach utilizes entropy-based document-parallel ensemble decoding to prioritize low-entropy distributions from retrieved documents, thereby enhancing the extraction of relevant information from the context. Additionally, it incorporates a contrastive decoding mechanism that contrasts the obtained low-entropy ensemble distribution with the high-entropy distribution derived from the model's internal knowledge across layers, which ensures a greater emphasis on reliable external information. Extensive experiments on open-domain question answering datasets demonstrate the superiority of our method.
Submitted 25 June, 2024;
originally announced June 2024.
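Illustration for the entry above: the abstract says the method prioritizes low-entropy next-token distributions obtained from individual retrieved documents. A minimal sketch of that idea follows. The weighting rule (a softmax over negative entropy) and all names are assumptions chosen for illustration; the paper's exact weighting and its contrastive step against the model's internal knowledge are not reproduced here.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def entropy_weighted_ensemble(doc_distributions):
    """Mix per-document next-token distributions, favoring low-entropy ones.

    Assumption: weights are a softmax over negative entropy, so a document
    whose distribution is more peaked contributes more to the mixture.
    """
    ents = [entropy(p) for p in doc_distributions]
    lowest = min(ents)
    raw = [math.exp(-(h - lowest)) for h in ents]
    z = sum(raw)
    weights = [r / z for r in raw]
    vocab_size = len(doc_distributions[0])
    mixed = [
        sum(w * p[t] for w, p in zip(weights, doc_distributions))
        for t in range(vocab_size)
    ]
    return weights, mixed

# Hypothetical distributions over a 3-token vocabulary.
confident = [0.9, 0.05, 0.05]       # conditioned on a relevant document
uncertain = [1 / 3, 1 / 3, 1 / 3]   # conditioned on a distracting document
weights, mixed = entropy_weighted_ensemble([confident, uncertain])
```

The peaked distribution receives the larger weight, so the mixture still places most of its mass on the first token.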
-
Unlocking Continual Learning Abilities in Language Models
Authors:
Wenyu Du,
Shuang Cheng,
Tongxu Luo,
Zihan Qiu,
Zeyu Huang,
Ka Chun Cheung,
Reynold Cheng,
Jie Fu
Abstract:
Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, hindering the availability of current CL approaches for LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output in LMs' linear layers differs when the LMs deal with different task data. By imposing this simple constraint on the gradient update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance across continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines in a 15-task CL benchmark. MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.
Submitted 24 June, 2024;
originally announced June 2024.
-
Foundry compatible, efficient wafer-scale manufacturing of ultra-low loss, high-density Si$_3$N$_4$ photonic integrated circuits
Authors:
Xinru Ji,
Rui Ning Wang,
Yang Liu,
Johann Riemensberger,
Zheru Qiu,
Tobias J. Kippenberg
Abstract:
Silicon nitride (Si$_3$N$_4$) photonic integrated circuits (PICs) have shown low linear loss, negligible nonlinear loss, and high power handling over traditional silicon photonics. To achieve high-density photonic integration and high effective nonlinearity through tight optical confinement, thick stoichiometric Si$_3$N$_4$ films are indispensable. However, when using low-pressure chemical vapor deposition (LPCVD) to achieve high optical material transparency, Si$_3$N$_4$ films exhibit large tensile stress on the order of GPa. Methods for crack prevention are therefore essential. The photonic Damascene process has addressed this issue, attaining record low loss Si$_3$N$_4$ PICs, but it lacks control of the waveguide height. Conversely, precise waveguide dimension and ultra-low loss have been achieved with subtractive processing, but this method is not compatible with mass production due to the use of electron beam lithography. To date, an outstanding challenge is to attain both lithographic precision and ultra-low loss in high confinement Si$_3$N$_4$ PICs that are compatible with large-scale foundry manufacturing. Here, we present a single-step deposited, DUV-based subtractive method for producing wafer-scale ultra-low loss Si$_3$N$_4$ PICs that harmonize these necessities. By employing deep etching of densely distributed, interconnected trenches into the substrate, we effectively mitigate the tensile stress in the Si$_3$N$_4$ layer, enabling direct deposition of thick films without cracking and substantially prolonged storage duration. Lastly, we identify ultraviolet (UV) radiation-induced damage that can be remedied through rapid thermal annealing. Collectively, we develop ultra-low loss Si$_3$N$_4$ microresonators and 0.5 m-long spiral waveguides with losses down to 1.4 dB/m at 1550 nm with high production yield.
Submitted 20 June, 2024;
originally announced June 2024.
-
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Authors:
Haoze Wu,
Zihan Qiu,
Zili Wang,
Hang Zhao,
Jie Fu
Abstract:
Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite this success, we observe that many tokens in MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty can lead to incorrect selections. Inspired by the Global Workspace Theory (GWT), we propose a new fine-tuning method, GW-MoE, to address this issue. The core idea is to broadcast the uncertain tokens across experts during fine-tuning. As a result, these tokens can acquire the necessary knowledge from any expert during inference and become less sensitive to the choice. GW-MoE does not introduce additional inference overhead. We validate that GW-MoE can mitigate the uncertainty problem and consistently improves performance across different tasks (text classification, question answering, summarization, code generation, and mathematical problem solving) and model sizes (650M and 8B parameters).
Submitted 18 June, 2024;
originally announced June 2024.
-
Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople
Authors:
Zhuang Qiu,
Xufeng Duan,
Zhenguang G. Cai
Abstract:
Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse, Schutze, & Almeida, 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgement of these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.
Submitted 16 June, 2024;
originally announced June 2024.
-
Data Caching for Enterprise-Grade Petabyte-Scale OLAP
Authors:
Chunxu Tang,
Bin Fan,
Jing Zhao,
Chen Liang,
Yi Wang,
Beinan Wang,
Ziyue Qiu,
Lu Qiu,
Bowen Ding,
Shouzhuo Sun,
Saiguang Che,
Jiaming Mai,
Shouwei Chen,
Yu Zhu,
Jianjian Xie,
Yutian Sun,
Yao Li,
Yangjun Zhang,
Ke Wang,
Mingmin Chen
Abstract:
With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges including massive, read-heavy I/O traffic with potential throttling, as well as skewed and fragmented data access patterns. Addressing these challenges, this paper introduces the Alluxio local (edge) cache, a highly effective architectural optimization tailored for such environments. This embeddable cache, optimized for petabyte-scale data analytics, leverages local SSD resources to alleviate network I/O and API call pressures, significantly improving data transfer efficiency. Integrated with OLAP systems like Presto and storage services like HDFS, the Alluxio local cache has demonstrated its effectiveness in handling large-scale, enterprise-grade workloads over three years of deployment at Uber and Meta. We share insights and operational experiences in implementing these optimizations, providing valuable perspectives on managing modern, massive-scale OLAP workloads.
Submitted 9 June, 2024;
originally announced June 2024.
-
Nonlinear saturation of reversed shear Alfven eigenmode via high-frequency quasi-mode generation
Authors:
Zhiwen Cheng,
Guangyu Wei,
Lei Ye,
Zhiyong Qiu
Abstract:
A nonlinear saturation mechanism for the reversed shear Alfven eigenmode (RSAE) is proposed and analysed, and is shown to be relevant to the typical reactor parameter region. The saturation is achieved through the generation of a high-frequency quasi-mode by the nonlinear coupling of two RSAEs; the quasi-mode is then damped through coupling with the shear Alfven continuum, which leads to the nonlinear saturation of the primary RSAEs. An estimate of the nonlinear damping rate is also provided.
Submitted 9 June, 2024;
originally announced June 2024.
-
Maintaining Diversity Provably Helps in Evolutionary Multimodal Optimization
Authors:
Shengjie Ren,
Zhijia Qiu,
Chao Bian,
Miqing Li,
Chao Qian
Abstract:
In the real world, there exists a class of optimization problems in which multiple (local) optimal solutions in the solution space correspond to a single point in the objective space. In this paper, we theoretically show that for such multimodal problems, a simple method that considers the diversity of solutions in the solution space can benefit the search in evolutionary algorithms (EAs). Specifically, we prove that the proposed method, working with crossover, can help enhance exploration, leading to polynomial or even exponential acceleration of the expected running time. This result is derived by rigorous running time analysis in both single-objective and multi-objective scenarios, including $(μ+1)$-GA solving the widely studied single-objective problem, Jump, and NSGA-II and SMS-EMOA (two well-established multi-objective EAs) solving the widely studied bi-objective problem, OneJumpZeroJump. Experiments are also conducted to validate the theoretical results. We hope that our results may encourage the exploration of diversity maintenance in the solution space for multi-objective optimization, where existing EAs usually only consider the diversity in the objective space and can easily be trapped in local optima.
Submitted 4 June, 2024;
originally announced June 2024.
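Illustration for the entry above: the abstract names the Jump benchmark without defining it. The sketch below uses the definition commonly given in the running-time analysis literature, where a bit string of length n is rewarded for its number of ones except in a gap of width k just below the all-ones optimum. The function name and the example parameters are illustrative assumptions; the paper's own notation may differ.

```python
def jump(x, k):
    """Jump_k fitness of a bit string x (a list of 0/1 values), to be maximized.

    Fitness grows with the number of ones up to n - k ones, drops inside the
    gap (n - k < ones < n), and peaks at the all-ones string.
    """
    n = len(x)
    ones = sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones

# Example with n = 6 and gap size k = 2.
optimum = jump([1, 1, 1, 1, 1, 1], k=2)      # global optimum
local_peak = jump([1, 1, 1, 1, 0, 0], k=2)   # best point before the gap
in_gap = jump([1, 1, 1, 1, 1, 0], k=2)       # inside the gap
```

Every string with exactly n - k ones sits on the same local peak, so many distinct solutions share one objective value. Mutation alone has to flip k specific bits at once to leave that peak, which is why population diversity combined with crossover can help.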
-
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Authors:
Wenyu Du,
Tongxu Luo,
Zihan Qiu,
Zeyu Huang,
Yikang Shen,
Reynold Cheng,
Yike Guo,
Jie Fu
Abstract:
LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical obstacles: (O1) lack of comprehensive evaluation, (O2) untested viability for scaling, and (O3) lack of empirical guidelines. To tackle O1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting. Our findings reveal that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance on eight standard NLP benchmarks compared to strong baselines. Motivated by these promising results, we conduct extensive experiments to delve deeper into $G_{\text{stack}}$ to address O2 and O3. For O2 (untested scalability), our study shows that $G_{\text{stack}}$ is scalable and consistently performs well, with experiments up to 7B LLMs after growth and pre-training LLMs with 750B tokens. For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6% speedup. We further address O3 (lack of empirical guidelines) by formalizing guidelines to determine growth timing and growth factor for $G_{\text{stack}}$, making it practical in general LLM pre-training. We also provide in-depth discussions and comprehensive ablation studies of $G_{\text{stack}}$. Our code and pre-trained model are available at https://llm-stacking.github.io/.
Submitted 24 May, 2024;
originally announced May 2024.
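Illustration for the entry above: the abstract reports a 54.6% speedup from reaching the same loss with 194B tokens instead of 300B, and describes a depthwise stacking operator. The sketch below checks that arithmetic and shows one plausible reading of depthwise stacking. Both helper functions are hypothetical; the speedup formula (baseline tokens divided by grown-model tokens, minus one) is inferred because it reproduces the reported figure, and the layer ordering in `stack_layers` is an assumption rather than the paper's specification.

```python
def stack_layers(layers, growth_factor):
    """Grow a model in depth by repeating the whole layer stack.

    Assumption: the small model's layers are copied as a block, so
    [L0, L1] with growth_factor=2 becomes [L0, L1, L0, L1].
    """
    return [layer for _ in range(growth_factor) for layer in layers]

def token_speedup(baseline_tokens, grown_tokens):
    """Relative speedup when the same loss is reached with fewer tokens."""
    return baseline_tokens / grown_tokens - 1.0

grown = stack_layers(["L0", "L1"], growth_factor=2)
speedup = token_speedup(300e9, 194e9)
speedup_percent = round(speedup * 100, 1)  # matches the 54.6% in the abstract
```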
-
Semi-supervised Anomaly Detection via Adaptive Reinforcement Learning-Enabled Method with Causal Inference for Sensor Signals
Authors:
Xiangwei Chen,
Ruliang Xiaoa,
Zhixia Zeng,
Zhipeng Qiu,
Shi Zhang,
Xin Du
Abstract:
Semi-supervised anomaly detection for sensor signals is critical in ensuring system reliability in smart manufacturing. However, existing methods rely heavily on data correlation, neglecting causality and leading to potential misinterpretations due to confounding factors. Moreover, while current reinforcement learning-based methods can effectively identify known and unknown anomalies with limited labeled samples, these methods still face several challenges, such as under-utilization of prior knowledge, lack of model flexibility, and deficient reward feedback during environmental interactions. To address the above problems, this paper constructs a counterfactual causal reinforcement learning model, termed Triple-Assisted Causal Reinforcement Learning Anomaly Detector (Tri-CRLAD). The model leverages causal inference to extract the intrinsic causal features in the data, enhancing the agent's utilization of prior knowledge and improving its generalization capability. In addition, Tri-CRLAD features a triple decision support mechanism, including a sampling strategy based on historical similarity, an adaptive threshold smoothing adjustment strategy, and an adaptive decision reward mechanism. These mechanisms further enhance the flexibility and generalization ability of the model, enabling it to effectively respond to various complex and dynamically changing environments. Experimental results across seven diverse sensor signal datasets demonstrate that Tri-CRLAD outperforms nine state-of-the-art baseline methods. Notably, Tri-CRLAD achieves up to a 23% improvement in anomaly detection stability with minimal known anomaly samples, highlighting its potential in semi-supervised anomaly detection scenarios. Our code is available at https://github.com/Aoudsung/Tri-CRLAD.
Submitted 16 May, 2024; v1 submitted 11 May, 2024;
originally announced May 2024.
-
Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks
Authors:
Zirou Qiu,
Abhijin Adiga,
Madhav V. Marathe,
S. S. Ravi,
Daniel J. Rosenkrantz,
Richard E. Stearns,
Anil Vullikanti
Abstract:
Networked dynamical systems are widely used as formal models of real-world cascading phenomena, such as the spread of diseases and information. Prior research has addressed the problem of learning the behavior of an unknown dynamical system when the underlying network has a single layer. In this work, we study the learnability of dynamical systems over multilayer networks, which are more realistic and challenging. First, we present an efficient PAC learning algorithm with provable guarantees to show that the learner only requires a small number of training examples to infer an unknown system. We further provide a tight analysis of the Natarajan dimension, which measures the model complexity. Asymptotically, our bound on the Natarajan dimension is tight for almost all multilayer graphs. The techniques and insights from our work provide the theoretical foundations for future investigations of learning problems for multilayer dynamical systems.
Submitted 10 May, 2024;
originally announced May 2024.
-
A note on the $Π$-property of some subgroups of finite groups
Authors:
Zhengtian Qiu,
Jianjun Liu,
Guiyun Chen
Abstract:
Let $H$ be a subgroup of a finite group $G$. We say that $H$ satisfies the $Π$-property in $G$ if for any chief factor $L/K$ of $G$, $|G/K : N_{G/K}(HK/K\cap L/K)|$ is a $π(HK/K\cap L/K)$-number. In this paper, we obtain some criteria for the $p$-supersolubility or $p$-nilpotency of a finite group and extend some known results by considering subgroups that satisfy the $Π$-property.
Submitted 8 May, 2024;
originally announced May 2024.
-
Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture
Authors:
Yitong Jin,
Zhiping Qiu,
Yi Shi,
Shuangpeng Sun,
Chongwu Wang,
Donghao Pan,
Jiachen Zhao,
Zhenghao Liang,
Yuan Wang,
Xiaobing Li,
Feng Yu,
Tao Yu,
Qionghai Dai
Abstract:
In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different views, audio signals, and detailed 3D motion annotations of the body, hands, instrument, and bow. Moreover, to acquire the detailed motion annotations, we propose an audio-guided multi-modal motion capture framework that explicitly incorporates hand-string contacts detected from the audio signals for solving detailed hand poses. This framework serves as a baseline for string performance capture in a completely markerless manner without imposing any external devices on performers, eliminating the potential of introducing distortion in such delicate movements. We argue that the movements of performers, particularly the sound-producing gestures, contain subtle information often elusive to visual methods but can be inferred and retrieved from audio cues. Consequently, we refine the vision-based motion capture results through our innovative audio-guided approach, simultaneously clarifying the contact relationship between the performer and the instrument, as deduced from the audio. We validate the proposed framework and conduct ablation studies to demonstrate its efficacy. Our results outperform current state-of-the-art vision-based algorithms, underscoring the feasibility of augmenting visual motion capture with audio modality. To the best of our knowledge, SPD is the first dataset for musical instrument performance, covering fine-grained hand motion details in a multi-modal, large-scale collection.
Submitted 8 May, 2024;
originally announced May 2024.
-
RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation
Authors:
Heng Li,
Haojin Li,
Jianyu Chen,
Zhongxi Qiu,
Huazhu Fu,
Lidai Wang,
Yan Hu,
Jiang Liu
Abstract:
Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinical settings due to the scarcity of annotated data resulting from the professional and private nature of medical data. Despite the existence of decent solutions, many of them are hindered in clinical settings due to limitations in data collection and computational complexity. To tackle domain shifts in data-scarce medical scenarios, we propose a Random frequency filtering enabled Single-source Domain Generalization algorithm (RaffeSDG), which promises robust out-of-domain inference with segmentation models trained on a single-source domain. A filter-based data augmentation strategy is first proposed to promote domain variability within a single-source domain by introducing variations in frequency space and blending homologous samples. Then Gaussian filter-based structural saliency is also leveraged to learn robust representations across augmented samples, further facilitating the training of generalizable segmentation models. To validate the effectiveness of RaffeSDG, we conducted extensive experiments involving out-of-domain inference on segmentation tasks for three human tissues imaged by four diverse modalities. Through thorough investigations and comparisons, compelling evidence was observed in these experiments, demonstrating the potential and generalizability of RaffeSDG. The code is available at https://github.com/liamheng/Non-IID_Medical_Image_Segmentation.
Submitted 15 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Can Increasing the Hit Ratio Hurt Cache Throughput?
Authors:
Ziyue Qiu,
Juncheng Yang,
Mor Harchol-Balter
Abstract:
Software caches are an intrinsic component of almost every computer system. Consequently, caching algorithms, particularly eviction policies, are the topic of many papers. Almost all these prior papers evaluate the caching algorithm based on its hit ratio, namely the fraction of requests that are found in the cache, as opposed to disk. The hit ratio is viewed as a proxy for traditional performance metrics like system throughput or response time. Intuitively it makes sense that higher hit ratio should lead to higher throughput (and lower response time), since more requests are found in the cache (low access time) as opposed to the disk (high access time).
This paper challenges this intuition. We show that increasing the hit ratio can actually hurt the throughput (and response time) for many caching algorithms. Our investigation follows a three-pronged approach involving (i) queueing modeling and analysis, (ii) implementation and measurement, and (iii) simulation to validate the accuracy of the queueing model. We also show that the phenomenon of throughput decreasing at higher hit ratios is likely to be more pronounced in future systems, where the trend is towards faster disks and higher numbers of cores per CPU.
Submitted 24 April, 2024;
originally announced April 2024.
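The intuition this abstract sets out to challenge can be written as a weighted mean of access times. The sketch below is illustrative only: the function name and the latency values are assumptions, not taken from the paper, and it deliberately ignores the queueing effects that the paper analyzes.

```python
def naive_mean_access_time(hit_ratio: float, t_cache: float, t_disk: float) -> float:
    """Expected access time under the naive model: hits cost t_cache, misses cost t_disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_disk


# Hypothetical latencies in microseconds (assumed for illustration only).
for h in (0.5, 0.9, 0.99):
    print(h, naive_mean_access_time(h, t_cache=1.0, t_disk=100.0))
```

Under this naive model a higher hit ratio always lowers the mean access time; the abstract's claim is that measured throughput need not follow this trend.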
-
Contrastive Quantization based Semantic Code for Generative Recommendation
Authors:
Mengqun Jin,
Zexuan Qiu,
Jieming Zhu,
Zhenhua Dong,
Xiu Li
Abstract:
With the success of large language models, generative retrieval has emerged as a new retrieval technique for recommendation. It can be divided into two stages: the first stage involves constructing discrete codes, and the second stage involves decoding the code sequentially via the transformer architecture. Current methods often construct item semantic codes by reconstruction-based quantization on item textual representation, but they fail to capture item discrepancy that is essential in modeling item relationships in recommendation systems. In this paper, we propose to construct the code representation of items by simultaneously considering both item relationships and semantic information. Specifically, we employ a pre-trained language model to extract an item's textual description and translate it into the item's embedding. Then we propose to enhance the encoder-decoder based RQVAE model with contrastive objectives to learn item codes. To be specific, we employ the embeddings generated by the decoder from the samples themselves as positive instances and those from other samples as negative instances. Thus we effectively enhance the item discrepancy across all items, better preserving the item neighbourhood. Finally, we train and test the semantic codes with generative retrieval on a sequential recommendation model. Our experiments demonstrate that our method improves NDCG@5 by 43.76% on the MIND dataset and Recall@10 by 80.95% on the Office dataset compared to the previous baselines.
Submitted 23 April, 2024;
originally announced April 2024.
-
Large-scale photonic chip based pulse interleaver for low-noise microwave generation
Authors:
Zheru Qiu,
Neetesh Singh,
Yang Liu,
Xinru Ji,
Rui Ning Wang,
Franz X. Kärtner,
Tobias Kippenberg
Abstract:
Microwaves generated by optical techniques have demonstrated unprecedentedly low noise and hold significance in various applications such as communication, radar, instrumentation, and metrology. To date, the purest microwave signals are generated using optical frequency division with femtosecond mode-locked lasers. However, many femtosecond laser combs have a radio frequency (RF) repetition rate in the hundreds of megahertz range, necessitating methods to translate the generated low-noise RF signal to the microwave domain. Benchtop pulse interleavers can multiply the pulse repetition rate, avoid saturation of photodetectors, and facilitate the generation of high-power low-noise microwave signals, which have to date only been demonstrated using optical fibers or free space optics. Here, we introduce a large-scale photonic integrated circuit-based interleaver, offering size reduction and enhanced stability. The all-on-chip interleaver attains a 64-fold multiplication of the repetition rate, directly translated from 216 MHz to 14 GHz in the microwave Ku-band. By overcoming photodetector saturation, the generated microwave power was improved by 36 dB, with a phase noise floor reduced by more than 10-fold to -160 dBc/Hz on the 14 GHz carrier. The device is based on a low-loss and high-density photonic integrated circuit fabricated by the photonic Damascene process. Six cascaded stages of Mach-Zehnder interferometers with optical delay lines up to 33 centimeters long are fully integrated into a compact footprint of 8.5 mm × 1.7 mm. The lithographically defined precision of the optical waveguide path length enables the scaling up of the interleaved frequency to millimeter-wave bands, which is challenging for fiber-based counterparts. This interleaver has the potential to reduce the cost and footprint of mode-locked-laser-based microwave generation, allowing for field deployment.
Submitted 22 April, 2024;
originally announced April 2024.
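The figures quoted in this abstract (six cascaded stages, 64-fold multiplication, 216 MHz input) can be checked with a short calculation. The sketch below assumes an ideal textbook cascade in which stage k delays one arm by T/2^k, where T is the input pulse period; that delay schedule, the function name, and the group index value are assumptions, not details confirmed by the abstract.

```python
def interleaver_stages(base_rate_hz: float, n_stages: int):
    """Return (output_rate_hz, delays_s) for an ideal cascade of rate-doubling stages.

    Assumes stage k (1-based) delays one arm by T / 2**k, with T the input pulse period.
    """
    period = 1.0 / base_rate_hz
    delays = [period / 2 ** k for k in range(1, n_stages + 1)]
    return base_rate_hz * 2 ** n_stages, delays


rate, delays = interleaver_stages(216e6, 6)
print(f"output rate: {rate / 1e9:.3f} GHz")        # 216 MHz multiplied by 2**6
print(f"longest delay: {delays[0] * 1e9:.3f} ns")  # half of the input period

# ASSUMPTION: waveguide group index; the abstract does not state it.
ASSUMED_GROUP_INDEX = 2.1
C = 299_792_458.0  # speed of light in vacuum, m/s
length_cm = C * delays[0] / ASSUMED_GROUP_INDEX * 100
print(f"longest delay line under the assumed index: {length_cm:.1f} cm")
```

Six doubling stages give 2^6 = 64, so the exact output rate is 13.824 GHz, which the abstract rounds to 14 GHz. The physical delay-line length depends on the assumed group index and should be read only as an order-of-magnitude comparison with the quoted 33 cm.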
-
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing
Authors:
Yuang Liu,
Zhiheng Qiu,
Xiaokai Qin
Abstract:
The Transformer has been applied in the field of computer vision due to its excellent performance in natural language processing, surpassing traditional convolutional neural networks and achieving new state-of-the-art results. ViT divides an image into several local patches, known as "visual sentences". However, the information contained in the image is vast and complex, and focusing only on the features at the "visual sentence" level is not enough. The features between local patches should also be taken into consideration. In order to achieve further improvement, the TNT model is proposed, whose algorithm further divides the image into smaller patches, namely "visual words," achieving more accurate results. The core of the Transformer is the Multi-Head Attention mechanism, and traditional attention mechanisms ignore interactions across different attention heads. In order to reduce redundancy and improve utilization, we introduce the nested algorithm and apply the Nested-TNT to image classification tasks. The experiment confirms that the proposed model achieves better classification performance than ViT and TNT, exceeding them by 2.25% and 1.1% on CIFAR10 and by 2.78% and 0.25% on FLOWERS102, respectively.
Submitted 20 April, 2024;
originally announced April 2024.
-
Deep learning-driven pulmonary arteries and veins segmentation reveals demography-associated pulmonary vasculature anatomy
Authors:
Yuetan Chu,
Gongning Luo,
Longxi Zhou,
Shaodong Cao,
Guolin Ma,
Xianglin Meng,
Juexiao Zhou,
Changchun Yang,
Dexuan Xie,
Ricardo Henao,
Xigang Xiao,
Lianming Wu,
Zhaowen Qiu,
Xin Gao
Abstract:
Pulmonary artery-vein segmentation is crucial for diagnosing pulmonary diseases and surgical planning, and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns regarding adverse health effects from contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins using non-contrast CT, a conventional and low-cost clinical examination routine, has long been considered impossible. Here we propose a High-abundant Pulmonary Artery-vein Segmentation (HiPaS) framework achieving accurate artery-vein segmentation on both non-contrast CT and CTPA across various spatial resolutions. HiPaS first performs spatial normalization on raw CT scans via a super-resolution module, and then iteratively achieves segmentation results at different branch levels by utilizing the low-level vessel segmentation as a prior for high-level vessel segmentation. We trained and validated HiPaS on our established multi-centric dataset comprising 1,073 CT volumes with meticulous manual annotation. Both quantitative experiments and clinical evaluation demonstrated the superior performance of HiPaS, achieving a Dice score of 91.8% and a sensitivity of 98.0%. Further experiments demonstrated the non-inferiority of HiPaS segmentation on non-contrast CT compared to segmentation on CTPA. Employing HiPaS, we have conducted an anatomical study of pulmonary vasculature on 10,613 participants in China (five sites), discovering a new association between pulmonary vessel abundance and sex and age: vessel abundance is significantly higher in females than in males, and slightly decreases with age, after controlling for lung volume (p < 0.0001). By realizing accurate artery-vein segmentation, HiPaS opens a promising avenue for clinical diagnosis and understanding pulmonary physiology in a non-invasive manner.
Submitted 11 April, 2024;
originally announced April 2024.
-
Calculation of toroidal Alfvén eigenmode mode structure in general axisymmetric toroidal geometry
Authors:
Guangyu Wei,
Matteo Valerio Falessi,
Tao Wang,
Fulvio Zonca,
Zhiyong Qiu
Abstract:
A workflow is developed based on the ideal MHD model to investigate the linear physics of various Alfvén eigenmodes in general axisymmetric toroidal geometry, by solving the coupled shear Alfvén wave (SAW) and ion sound wave (ISW) equations in ballooning space. The model equations are solved by the FALCON code in the singular layer, and the corresponding solutions are then taken as the boundary conditions for calculating parallel mode structures in the whole ballooning space. As an application of the code, the frequencies and mode structures of toroidal Alfvén eigenmode (TAE) are calculated in the reference equilibria of the Divertor Tokamak Test facility (DTT) with positive and negative triangularities, respectively. By properly handling the boundary conditions, we demonstrate finite TAE damping due to coupling with the local acoustic continuum, and find that the damping rate is small for typical plasma parameters.
Submitted 9 April, 2024;
originally announced April 2024.
-
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
Authors:
Zi-Hao Qiu,
Siqi Guo,
Mao Xu,
Tuo Zhao,
Lijun Zhang,
Tianbao Yang
Abstract:
The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. Particularly, it adjusts the logits in the softmax function in LLMs, which is crucial for next token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: Is it viable to learn a neural network to predict a personalized temperature of any input data for enhancing LFMs? In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs. Our solution is composed of a novel learning framework with a robust loss underpinned by constrained distributionally robust optimization (DRO), and a properly designed TempNet with theoretical inspiration. TempNet can be trained together with a large foundation model from scratch or learned separately given a pretrained foundation model. It is not only useful for predicting personalized temperature to promote the training of LFMs but also generalizable and transferable to new tasks. Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models, e.g. Table 1. The code to reproduce the experimental results in this paper can be found at https://github.com/zhqiu/TempNet.
Submitted 16 June, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
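The role of temperature in the softmax, as described in this abstract, can be illustrated with a fixed scalar temperature. This is a generic sketch, not TempNet itself: the function name is hypothetical, and the per-input temperature prediction that the paper proposes is not shown.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum before exponentiating to avoid overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
for tau in (0.5, 1.0, 2.0):
    print(tau, [round(p, 3) for p in softmax_with_temperature(logits, tau)])
```

A temperature below 1 sharpens the distribution toward the largest logit, while a temperature above 1 flattens it.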
-
Finite groups with some subgroups of prime power order satisfying the partial $ Π$-property
Authors:
Zhengtian Qiu,
Adolfo Ballester-Bolinches
Abstract:
Let $ H $ be a subgroup of a finite group $ G $. We say that $ H $ satisfies the partial $ Π$-property in $ G $ if there exists a $G$-chief series $ \varGamma_{G}: 1 =G_{0} < G_{1} < \cdot\cdot\cdot < G_{n}= G $ of $ G $ such that $ | G / G_{i-1} : N_{G/G_{i-1}} (HG_{i-1}/G_{i-1}\cap G_{i}/G_{i-1})| $ is a $ π(HG_{i-1}/G_{i-1}\cap G_{i}/G_{i-1}) $-number for every $ G $-chief factor $ G_{i}/G_{i-1} $ of $ \varGamma_{G} $, $1\leq i\leq n$. In this paper, we investigate the structure of a finite group $ G $ under the assumption that some subgroups of prime power order satisfy the partial $ Π$-property.
Submitted 1 April, 2024;
originally announced April 2024.
-
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Authors:
Yuemei Xu,
Ling Hu,
Jiayi Zhao,
Zihan Qiu,
Yuqi Ye,
Hanwen Gu
Abstract:
Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inherent bias. In this paper, we aim to provide a comprehensive analysis of MLLMs, delving deeply into discussions surrounding these critical issues. First of all, we start by presenting an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. Secondly, we explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks that are crucial for enhancing the cross-lingual capability of MLLMs. Thirdly, we survey the existing studies on multilingual representations and investigate whether the current MLLMs can learn a universal language representation. Fourthly, we discuss bias on MLLMs including its category and evaluation metrics, and summarize the existing debiasing techniques. Finally, we discuss existing challenges and point out promising research directions. By demonstrating these aspects, this paper aims to facilitate a deeper understanding of MLLMs and their potentiality in various domains.
Submitted 6 June, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Authors:
Zhongwei Zhang,
Fuchen Long,
Yingwei Pan,
Zhaofan Qiu,
Ting Yao,
Yang Cao,
Tao Mei
Abstract:
Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate a static image (i.e., image-to-video generation). The difficulty originates from the aspect that the diffusion process of subsequent animated frames should not only preserve the faithful alignment with the given image but also pursue temporal coherence among adjacent frames. To alleviate this, we present TRIP, a new recipe of image-to-video diffusion paradigm that pivots on image noise prior derived from static image to jointly trigger inter-frame relational reasoning and ease the coherent temporal modeling via temporal residual learning. Technically, the image noise prior is first attained through one-step backward diffusion process based on both static image and noised video latent codes. Next, TRIP executes a residual-like dual-path scheme for noise prediction: 1) a shortcut path that directly takes image noise prior as the reference noise of each frame to amplify the alignment between the first frame and subsequent frames; 2) a residual path that employs 3D-UNet over noised video and static image latent codes to enable inter-frame relational reasoning, thereby easing the learning of the residual noise for each frame. Furthermore, both reference and residual noise of each frame are dynamically merged via attention mechanism for final video generation. Extensive experiments on WebVid-10M, DTDB and MSR-VTT datasets demonstrate the effectiveness of our TRIP for image-to-video generation. Please see our project page at https://trip-i2v.github.io/TRIP/.
Submitted 25 March, 2024;
originally announced March 2024.
-
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Authors:
Zhikai Chen,
Fuchen Long,
Zhaofan Qiu,
Ting Yao,
Wengang Zhou,
Jiebo Luo,
Tao Mei
Abstract:
Diffusion models are just at a tipping point for the image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution, which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also the temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo), for video super-resolution. SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction. Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE. SFA modulates frame features via adaptively estimating affine parameters for each pixel, guaranteeing pixel-wise guidance for high-resolution frame synthesis. TFA delves into feature interaction within a 3D local window (tubelet) through self-attention, and executes cross-attention between tubelet and its low-resolution counterpart to guide temporal feature alignment. Extensive experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
Submitted 25 March, 2024;
originally announced March 2024.
-
Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability
Authors:
Zirui Qiu,
Hassan Rivaz,
Yiming Xiao
Abstract:
As deep learning has become the state-of-the-art for computer-assisted diagnosis, interpretability of the automatic decisions is crucial for clinical deployment. While various methods were proposed in this domain, visual attention maps of clinicians during radiological screening offer a unique asset to provide important insights and can potentially enhance the quality of computer-assisted diagnosis. With this paper, we introduce a novel deep-learning framework for joint disease diagnosis and prediction of corresponding visual saliency maps for chest X-ray scans. Specifically, we designed a novel dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for saliency map prediction, and a multi-scale feature-fusion classifier to perform disease classification. To tackle the issue of asynchronous training schedules of individual tasks in multi-task learning, we proposed a multi-stage cooperative learning strategy, with contrastive learning for feature encoder pretraining to boost performance. Experiments show that our proposed method outperformed existing techniques for chest X-ray diagnosis and the quality of visual saliency map prediction.
Submitted 29 March, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Harnessing Large Language Models for Text-Rich Sequential Recommendation
Authors:
Zhi Zheng,
Wenshuo Chao,
Zhaopeng Qiu,
Hengshu Zhu,
Hui Xiong
Abstract:
Recent advances in Large Language Models (LLMs) have been changing the paradigm of Recommender Systems (RS). However, when items in the recommendation scenarios contain rich textual information, such as product descriptions in online shopping or news headlines on social media, LLMs require longer texts to comprehensively depict the historical user behavior sequence. This poses significant challenges to LLM-based recommenders, such as over-length limitations, extensive time and space overheads, and suboptimal model performance. To this end, in this paper, we design a novel framework for harnessing Large Language Models for Text-Rich Sequential Recommendation (LLM-TRSR). Specifically, we first propose to segment the user historical behaviors and subsequently employ an LLM-based summarizer for summarizing these user behavior blocks. Particularly, drawing inspiration from the successful application of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) models in user modeling, we introduce two unique summarization techniques in this paper, respectively hierarchical summarization and recurrent summarization. Then, we construct a prompt text encompassing the user preference summary, recent user interactions, and candidate item information into an LLM-based recommender, which is subsequently fine-tuned using Supervised Fine-Tuning (SFT) techniques to yield our final recommendation model. We also use Low-Rank Adaptation (LoRA) for Parameter-Efficient Fine-Tuning (PEFT). We conduct experiments on two public datasets, and the results clearly demonstrate the effectiveness of our approach.
Submitted 20 March, 2024;
originally announced March 2024.
-
Baryonic Vortex Phase and Magnetic Field Generation in QCD with Isospin and Baryon Chemical Potentials
Authors:
Zebin Qiu,
Muneto Nitta
Abstract:
We propose a novel baryonic vortex phase in low energy dense QCD with finite baryon and isospin chemical potentials. It is known that the homogeneous charged pion condensate emerges as a ground state at finite isospin chemical potential, and therein arises the Abrikosov vortex lattice with an applied magnetic field. We first demonstrate that a vortex with the same quantized magnetic flux as the conventional Abrikosov vortex, carries a baryon number captured by the third homotopy group of Skyrmions, once we take into account a modulation of the neutral pion inside the vortex core. Such a vortex-Skyrmion state is therefore dubbed the baryonic vortex. We further reveal that when the baryon chemical potential is above a critical value, the baryonic vortex has negative tension measured from the charged pion condensation. It implies that the phase, in which such vortices emerge spontaneously without an external magnetic field, would take over the ground state at high baryon density. Such a new phase contributes to the comprehension of QCD phase diagram and relates to the generation of magnetic fields inside neutron stars.
Submitted 14 May, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
VAEMax: Open-Set Intrusion Detection based on OpenMax and Variational Autoencoder
Authors:
Zhiyin Qiu,
Ding Zhou,
Yahui Zhai,
Bo Liu,
Lei He,
Jiuxin Cao
Abstract:
Promptly discovering unknown network attacks is critical for reducing the risk of major loss imposed on systems or equipment. This paper aims to develop an open-set intrusion detection model to classify known attacks as well as infer unknown ones. To achieve this, we employ OpenMax and a variational autoencoder (VAE) to propose a dual detection model, VAEMax. First, we extract flow payload features based on a one-dimensional convolutional neural network. Then, OpenMax is used to classify flows, during which some unknown attacks can be detected, while the rest are misclassified into a certain class of known flows. Finally, we use the VAE to perform secondary detection on each class of flows, and determine whether a flow is an unknown attack based on the reconstruction loss. Experiments performed on the CIC-IDS2017 and CSE-CIC-IDS2018 datasets show that our approach is better than baseline models and can be effectively applied to realistic network environments.
Submitted 6 March, 2024;
originally announced March 2024.
-
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
Authors:
Zexuan Qiu,
Jingjing Li,
Shijue Huang,
Wanjun Zhong,
Irwin King
Abstract:
Developing Large Language Models (LLMs) with robust long-context capabilities has been the recent research focus, resulting in the emergence of long-context LLMs proficient in Chinese. However, the evaluation of these models remains underdeveloped due to a lack of benchmarks. To address this gap, we present CLongEval, a comprehensive Chinese benchmark for evaluating long-context LLMs. CLongEval is characterized by three key features: (1) Sufficient data volume, comprising 7 distinct tasks and 7,267 examples; (2) Broad applicability, accommodating to models with context windows size from 1K to 100K; (3) High quality, with over 2,000 manually annotated question-answer pairs in addition to the automatically constructed labels. With CLongEval, we undertake a comprehensive assessment of 6 open-source long-context LLMs and 2 leading commercial counterparts that feature both long-context abilities and proficiency in Chinese. We also provide in-depth analysis based on the empirical results, trying to shed light on the critical capabilities that present challenges in long-context settings. The dataset, evaluation scripts, and model outputs will be released.
Submitted 6 March, 2024;
originally announced March 2024.
-
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
Authors:
Hao Zhao,
Zihan Qiu,
Huijia Wu,
Zili Wang,
Zhaofeng He,
Jie Fu
Abstract:
The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing. Despite the success, most existing methods face a challenge for balance between sparsity and the availability of expert knowledge: enhancing performance through increased use of expert knowledge often results in diminishing sparsity during expert selection. To mitigate this contradiction, we propose HyperMoE, a novel MoE framework built upon Hypernetworks. This framework integrates the computational processes of MoE with the concept of knowledge transferring in multi-task learning. Specific modules generated based on the information of unselected experts serve as supplementary information, which allows the knowledge of experts not selected to be used while maintaining selection sparsity. Our comprehensive empirical evaluations across multiple datasets and backbones establish that HyperMoE significantly outperforms existing MoE methods under identical conditions concerning the number of experts.
Submitted 21 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Authors:
Zihan Qiu,
Zeyu Huang,
Youcheng Huang,
Jie Fu
Abstract:
The feed-forward networks (FFNs) in transformers are recognized as a group of key-value neural memories to restore abstract high-level knowledge. In this work, we conduct an empirical ablation study on updating keys (the 1st layer in the FFNs layer) or values (the 2nd layer in the FFNs layer). We compare those two methods in various knowledge editing and fine-tuning tasks of large language models to draw insights to understand FFNs further. Code is available in this repo (https://github.com/qiuzh20/Tuning-keys-v.s.-values).
Submitted 19 February, 2024;
originally announced February 2024.
-
Learning the Topology and Behavior of Discrete Dynamical Systems
Authors:
Zirou Qiu,
Abhijin Adiga,
Madhav V. Marathe,
S. S. Ravi,
Daniel J. Rosenkrantz,
Richard E. Stearns,
Anil Vullikanti
Abstract:
Discrete dynamical systems are commonly used to model the spread of contagions on real-world networks. Under the PAC framework, existing research has studied the problem of learning the behavior of a system, assuming that the underlying network is known. In this work, we focus on a more challenging setting: to learn both the behavior and the underlying topology of a black-box system. We show that, in general, this learning problem is computationally intractable. On the positive side, we present efficient learning methods under the PAC model when the underlying graph of the dynamical system belongs to some classes. Further, we examine a relaxed setting where the topology of an unknown system is partially observed. For this case, we develop an efficient PAC learner to infer the system and establish the sample complexity. Lastly, we present a formal analysis of the expressive power of the hypothesis class of dynamical systems where both the topology and behavior are unknown, using the well-known formalism of the Natarajan dimension. Our results provide a theoretical foundation for learning both the behavior and topology of discrete dynamical systems.
Submitted 29 March, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
The Computational Complexity of the Housing Market
Authors:
Edwin Lock,
Zephyr Qiu,
Alexander Teytelboym
Abstract:
We prove that the classic problem of finding a competitive equilibrium in an exchange economy with indivisible goods, money, and unit-demand agents is PPAD-complete. In this "housing market", agents have preferences over the house and amount of money they end up with, but can experience income effects. Our results contrast with the existence of polynomial-time algorithms for related problems: Top Trading Cycles for the "housing exchange" problem in which there are no transfers and the Hungarian algorithm for the "housing assignment" problem in which agents' utilities are linear in money. Along the way, we prove that the Rainbow-KKM problem, a total search problem based on a generalization by Gale of the Knaster-Kuratowski-Mazurkiewicz lemma, is PPAD-complete. Our reductions also imply bounds on the query complexity of finding competitive equilibrium.
Submitted 21 February, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN
Authors:
Shiqi Zhang,
Zheng Qiu,
Daiki Takeuchi,
Noboru Harada,
Shoji Makino
Abstract:
With the rapid development of neural networks in recent years, the ability of various networks to enhance the magnitude spectrum of noisy speech in the single-channel speech enhancement domain has become exceptionally outstanding. However, enhancing the phase spectrum using neural networks is often ineffective, which remains a challenging problem. In this paper, we found that the human ear cannot sensitively perceive the difference between a precise phase spectrum and a biased phase (BP) spectrum. Therefore, we propose an optimization method of phase reconstruction, allowing freedom on the global-phase bias instead of reconstructing the precise phase spectrum. We applied it to a Conformer-based Metric Generative Adversarial Networks (CMGAN) baseline model, which relaxes the existing constraints of precise phase and gives the neural network a broader learning space. Results show that this method achieves a new state-of-the-art performance without incurring additional computational overhead.
Submitted 4 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
On beat-driven and spontaneous excitations of zonal flows by drift waves
Authors:
Liu Chen,
Zhiyong Qiu,
Fulvio Zonca
Abstract:
Using the slab plasma as a paradigm model, we have analytically derived equations for the nonlinear generation of zero-frequency zonal flows by electron drift waves including, on the same footing, both the beat-driven and spontaneous excitations. It is found that the beat-driven zonal flow tends to reduce the frequency mismatch between the electron drift waves and thereby contributes to a significant O(1) enhancement of the modulational instability drive and a lowering of its threshold. Implications for tokamak plasmas as well as drift-wave soliton formation are also discussed.
Submitted 11 February, 2024;
originally announced February 2024.
-
Drift wave soliton formation via forced-driven zonal flow and implication on plasma confinement
Authors:
Ningfei Chen,
Liu Chen,
Fulvio Zonca,
Zhiyong Qiu
Abstract:
In this work, a gyrokinetic theory of drift wave (DW) self-regulation via the forced-driven zonal flow (ZF) is presented, and finite diamagnetic drift frequency due to plasma nonuniformity is shown to play a dominant role in ZF forced generation. The obtained nonlinear DW equation is a nonlinear Schrödinger equation, in which the linear dispersiveness, linear growth, nonuniformity of diamagnetic drift frequency, and cubic nonlinearity induced by feedback of forced-driven ZF to DWs are self-consistently included. The nonlinear DW equation is solved numerically in both uniform and nonuniform plasmas. It is shown that DW envelope solitons may form due to the balance of linear dispersiveness and nonlinearity, and lead to turbulence spreading to the linearly stable region. It is further found that, though the threshold on DW amplitude for soliton formation is well within the relevant parameter regimes of realistic tokamak experiments, solitons cannot extend beyond the range bounded by the turning points of the wave packet when plasma nonuniformity is self-consistently accounted for.
Submitted 11 February, 2024;
originally announced February 2024.
-
Saturation of fishbone instability through zonal flows driven by energetic particle transport in tokamak plasmas
Authors:
G. Brochard,
C. Liu,
X. Wei,
W. Heidbrink,
Z. Lin,
M. V. Falessi,
F. Zonca,
Z. Qiu,
N. Gorelenkov,
C. Chrystal,
X. Du,
J. Bao,
A. R. Polevoi,
M. Schneider,
S. H. Kim,
S. D. Pinches,
P. Liu,
J. H. Nicolau,
H. Lütjens,
the ISEP group
Abstract:
Gyrokinetic and kinetic-MHD simulations are performed for the fishbone instability in the DIII-D discharge #178631, chosen for validation of first-principles simulations to predict the energetic particle (EP) transport in an ITER prefusion baseline scenario. Fishbone modes are found to generate zonal flows, which dominate the fishbone saturation. The underlying mechanisms of the two-way nonlinear interplay between fishbones and zonal flows are discussed in detail. Numerical and analytical analyses identify the fishbone-induced EP redistribution as the dominant generation mechanism for zonal flows. The zonal flows modify the nonlinear dynamics of phase space zonal structures, which reduces the amount of EPs able to resonate with the mode, leading to an early fishbone saturation. Simulation results including zonal flows agree quantitatively with DIII-D experimental measurements of the fishbone saturation amplitude and EP transport, supporting this novel saturation mechanism by self-generated zonal flows. Moreover, the wave-particle mode-locking mechanism is shown to determine quantitatively the fishbone frequency down-chirping, as evident in GTC simulation results in agreement with predictions from analytical theory. Finally, the fishbone-induced zonal flows are possibly responsible for the formation of an ion-ITB in the DIII-D discharge. Based on the low EP transport and the large zonal flow shearing rates associated with the fishbone instability in gyrokinetic simulations of the ITER scenario, it is conjectured that high performance scenarios could be designed in ITER burning plasmas through fishbone-induced ITBs.
Submitted 6 February, 2024;
originally announced February 2024.
-
A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction
Authors:
Wenshuo Chao,
Zhaopeng Qiu,
Likang Wu,
Zhuoning Guo,
Zhi Zheng,
Hengshu Zhu,
Hao Liu
Abstract:
The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or regard skill evolution as a simplified time series forecasting problem. However, both approaches overlook the sophisticated relationships among different skills and the interconnection between skill demand and supply variations. In this paper, we propose a Cross-view Hierarchical Graph learning Hypernetwork (CHGH) framework for joint skill demand-supply prediction. Specifically, CHGH is an encoder-decoder network consisting of i) a cross-view graph encoder to capture the interconnection between skill demand and supply, ii) a hierarchical graph encoder to model the co-evolution of skills from a cluster-wise perspective, and iii) a conditional hyper-decoder to jointly predict demand and supply variations by incorporating historical demand-supply gaps. Extensive experiments on three real-world datasets demonstrate the superiority of the proposed framework compared to seven baselines and the effectiveness of the three modules.
Submitted 31 January, 2024;
originally announced January 2024.
-
Effects of plasma nonuniformity on toroidal Alfvén eigenmode nonlinear decay
Authors:
Zhiwen Cheng,
Kexun Shen,
Zhiyong Qiu
Abstract:
The parametric decay of the toroidal Alfvén eigenmode (TAE) in nonuniform plasmas is investigated using the nonlinear gyrokinetic equation. It is found that the plasma nonuniformity not only significantly enhances the nonlinear coupling cross-section, but also qualitatively modifies the decay process. Specifically, the condition for spontaneous decay becomes that the toroidal mode number of the sideband TAE is higher than that of the pump TAE, instead of the frequency of the sideband TAE being lower than that of the pump TAE, as in uniform plasmas. The consequences for TAE saturation and energetic particle transport are also discussed.
Submitted 15 January, 2024;
originally announced January 2024.
-
HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval
Authors:
Zexuan Qiu,
Jiahong Liu,
Yankai Chen,
Irwin King
Abstract:
Existing unsupervised deep product quantization methods primarily aim for the increased similarity between different views of the identical image, whereas the delicate multi-level semantic similarities preserved between images are overlooked. Moreover, these methods predominantly focus on the Euclidean space for computational convenience, compromising their ability to map the multi-level semantic relationships between images effectively. To mitigate these shortcomings, we propose a novel unsupervised product quantization method dubbed Hierarchical Hyperbolic Product Quantization (HiHPQ), which learns quantized representations by incorporating hierarchical semantic similarity within hyperbolic geometry. Specifically, we propose a hyperbolic product quantizer, where the hyperbolic codebook attention mechanism and the quantized contrastive learning on the hyperbolic product manifold are introduced to expedite quantization. Furthermore, we propose a hierarchical semantics learning module, designed to enhance the distinction between similar and non-matching images for a query by utilizing the extracted hierarchical semantics as an additional training supervision. Experiments on benchmarks show that our proposed method outperforms state-of-the-art baselines.
Submitted 14 January, 2024;
originally announced January 2024.
-
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Authors:
Tianyu Cui,
Yanling Wang,
Chuanpu Fu,
Yong Xiao,
Sijia Li,
Xinhao Deng,
Yunpeng Liu,
Qinglin Zhang,
Ziyi Qiu,
Peiyang Li,
Zhixing Tan,
Junwu Xiong,
Xinyu Kong,
Zujie Wen,
Ke Xu,
Qi Li
Abstract:
Large language models (LLMs) have strong capabilities in solving diverse natural language processing tasks. However, the safety and security issues of LLM systems have become the major obstacle to their widespread application. Many studies have extensively investigated risks in LLM systems and developed the corresponding mitigation strategies. Leading-edge enterprises such as OpenAI, Google, Meta, and Anthropic have also made lots of efforts on responsible LLMs. Therefore, there is a growing need to organize the existing studies and establish comprehensive taxonomies for the community. In this paper, we delve into four essential modules of an LLM system, including an input module for receiving prompts, a language model trained on extensive corpora, a toolchain module for development and deployment, and an output module for exporting LLM-generated content. Based on this, we propose a comprehensive taxonomy, which systematically analyzes potential risks associated with each module of an LLM system and discusses the corresponding mitigation strategies. Furthermore, we review prevalent benchmarks, aiming to facilitate the risk assessment of LLM systems. We hope that this paper can help LLM participants embrace a systematic perspective to build their responsible LLM systems.
Submitted 11 January, 2024;
originally announced January 2024.
-
Resonant Decay of Kinetic Alfvén Waves and Implication on Spectral Cascading
Authors:
Kexun Shen,
Zhiwen Cheng,
Zhiyong Qiu
Abstract:
A general equation describing the resonant nonlinear mode-coupling among kinetic Alfvén waves (KAWs) is derived using nonlinear gyrokinetic theory, which can be applied to study the potentially strong spectral energy transfer of KAWs. As a first application, the parametric decay of a pump KAW into two sideband KAWs is studied, with particular emphasis on the cascading in perpendicular wavenumber. It is found that, for the "co-propagating" cases with all three KAWs propagating in the same direction along the equilibrium magnetic field line, the decay exhibits a dual cascading character in the perpendicular wavenumber space, while for the "counter-propagating" cases with one sideband propagating in the opposite direction with respect to the pump wave, it can instead exhibit both dual and inverse cascading behaviors. The implications for SAW instability nonlinear saturation and charged particle transport in fusion plasmas are also discussed.
Submitted 5 January, 2024;
originally announced January 2024.
-
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Authors:
Fuchen Long,
Zhaofan Qiu,
Ting Yao,
Tao Mei
Abstract:
The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video event occurring in a single background. Extending to multi-scene video generation is nevertheless not trivial and requires carefully managing the logic between scenes while preserving the consistent visual appearance of key content across video scenes. In this paper, we propose a novel framework, namely VideoDrafter, for content-consistent multi-scene video generation. Technically, VideoDrafter leverages Large Language Models (LLM) to convert the input prompt into a comprehensive multi-scene script that benefits from the logical knowledge learnt by LLM. The script for each scene includes a prompt describing the event, the foreground/background entities, as well as camera movement. VideoDrafter identifies the common entities throughout the script and asks LLM to detail each entity. The resultant entity description is then fed into a text-to-image model to generate a reference image for each entity. Finally, VideoDrafter outputs a multi-scene video by generating each scene video via a diffusion process that takes the reference images, the descriptive prompt of the event and camera movement into account. The diffusion model incorporates the reference images as the condition and alignment to strengthen the content consistency of multi-scene videos. Extensive experiments demonstrate that VideoDrafter outperforms the SOTA video generation models in terms of visual quality, content consistency, and user preference.
Submitted 2 January, 2024;
originally announced January 2024.
-
$μ$-Net: ConvNext-Based U-Nets for Cosmic Muon Tomography
Authors:
Li Xin Jed Lim,
Ziming Qiu
Abstract:
Muon scattering tomography utilises muons, typically originating from cosmic rays, to image the interiors of dense objects. However, due to the low flux of cosmic ray muons at sea-level and the highly complex interactions that muons display when travelling through matter, existing reconstruction algorithms often suffer from low resolution and high noise. In this work, we develop a novel two-stage deep learning algorithm, $μ$-Net, consisting of an MLP to predict the muon trajectory and a ConvNeXt-based U-Net to convert the scattering points into voxels. $μ$-Net achieves a state-of-the-art performance of 17.14 PSNR at the dosage of 1024 muons, outperforming traditional reconstruction algorithms such as the point of closest approach algorithm and the maximum likelihood and expectation maximisation algorithm. Furthermore, we find that our method is robust to various corruptions such as inaccuracies in the muon momentum or a limited detector resolution. We also generate and publicly release the first large-scale dataset that maps muon detections to voxels. We hope that our research will spark further investigations into the potential of deep learning to revolutionise this field.
Submitted 25 December, 2023;
originally announced December 2023.
-
Map-Reduce for Multiprocessing Large Data and Multi-threading for Data Scraping
Authors:
Zefeng Qiu,
Prashanth Umapathy,
Qingquan Zhang,
Guanqun Song,
Ting Zhu
Abstract:
This document is the final project report for our advanced operating system class. During this project, we mainly focused on applying multiprocessing and multi-threading technology to our whole project and utilized the map-reduce algorithm in our data cleaning and data analysis process. In general, our project can be divided into two components: data scraping and data processing. The former is mostly web wrangling, potentially employing multiprocessing or multi-threading technology to speed up the whole process. After collecting and scraping a large amount of data, we use it as input for data cleaning and data analysis, where we take advantage of the map-reduce algorithm to increase efficiency.
Submitted 22 December, 2023;
originally announced December 2023.
-
Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems
Authors:
Zhangchi Qiu,
Ye Tao,
Shirui Pan,
Alan Wee-Chung Liew
Abstract:
Conversational recommender systems (CRS) utilize natural language interactions and dialogue history to infer user preferences and provide accurate recommendations. Due to the limited conversation context and background knowledge, existing CRSs rely on external sources such as knowledge graphs to enrich the context and model entities based on their inter-relations. However, these methods ignore the rich intrinsic information within entities. To address this, we introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework, which leverages both the knowledge graph and a pre-trained language model to improve the semantic understanding of entities for CRS. In our KERL framework, entity textual descriptions are encoded via a pre-trained language model, while a knowledge graph helps reinforce the representation of these entities. We also employ positional encoding to effectively capture the temporal information of entities in a conversation. The enhanced entity representation is then used to develop a recommender component that fuses both entity and contextual representations for more informed recommendations, as well as a dialogue component that generates informative entity-related information in the response text. A high-quality knowledge graph with aligned entity descriptions is constructed to facilitate our study, namely the Wiki Movie Knowledge Graph (WikiMKG). The experimental results show that KERL achieves state-of-the-art results in both recommendation and response generation tasks.
Submitted 1 May, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction
Authors:
Guangxuan Song,
Dongmei Fu,
Zhongwei Qiu,
Zijiang Yang,
Jiaxin Dai,
Lingwei Ma,
Dawei Zhang
Abstract:
Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on numerical data and semantic factors. Due to the limitations of small-sample datasets, existing methods typically adopt ML algorithms to regress numerical properties or transfer other pre-trained knowledge graphs (KGs) to the material. However, these methods cannot simultaneously handle semantic and numerical information. In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG using semantic nodes and numerical proxy nodes. It captures both types of information by projecting KG into a canonical KG and utilizes a graph neural network to predict material properties. In this process, a novel projection prediction loss is proposed to extract semantic features from numerical information. NR-KG facilitates end-to-end processing of cross-modal data, mining relationships and cross-modal information in small-sample datasets, and fully utilizes valuable experimental data to enhance material prediction. We further propose two new High-Entropy Alloys (HEA) property datasets with semantic descriptions. NR-KG outperforms state-of-the-art (SOTA) methods, achieving relative improvements of 25.9% and 16.1% on two material datasets. Besides, NR-KG surpasses SOTA methods on two public physical chemistry molecular datasets, showing improvements of 22.2% and 54.3%, highlighting its potential application and generalizability. We hope the proposed datasets, algorithms, and pre-trained models can facilitate the communities of KG and AI for materials.
Submitted 24 April, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.