Showing 1–50 of 372 results for author: Qiu, Z

  1. arXiv:2407.06498  [pdf, other]

    cs.HC

    Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training

    Authors: Zelin Qiu, Jianjun Gu, Dingding Yao, Junfeng Li

    Abstract: The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance against these features. Studies in neuroscience indicate that spatial…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2406.18219  [pdf, other]

    cs.CL cs.LG

    A Closer Look into Mixture-of-Experts in Large Language Models

    Authors: Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu

    Abstract: Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechani…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17519  [pdf, other]

    cs.CL

    Entropy-Based Decoding for Retrieval-Augmented Large Language Models

    Authors: Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King

    Abstract: Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, trainin…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.17245  [pdf, other]

    cs.LG cs.AI cs.CL

    Unlocking Continual Learning Abilities in Language Models

    Authors: Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu

    Abstract: Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task informa…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: preprint, 19 pages

  5. arXiv:2406.14296  [pdf, other]

    physics.optics physics.app-ph

    Foundry compatible, efficient wafer-scale manufacturing of ultra-low loss, high-density Si$_3$N$_4$ photonic integrated circuits

    Authors: Xinru Ji, Rui Ning Wang, Yang Liu, Johann Riemensberger, Zheru Qiu, Tobias J. Kippenberg

    Abstract: Silicon nitride (Si$_3$N$_4$) photonic integrated circuits (PICs) have shown low linear loss, negligible nonlinear loss, and high power handling over traditional silicon photonics. To achieve high-density photonic integration and high effective nonlinearity through tight optical confinement, thick stoichiometric Si$_3$N$_4$ films are indispensable. However, when using low-pressure chemical vapor d…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 2023 Conference on Lasers and Electro-Optics (CLEO). IEEE, 2023

  6. arXiv:2406.12375  [pdf, other]

    cs.LG cs.AI

    GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

    Authors: Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu

    Abstract: Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite the success, we observe that many tokens in the MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty c…

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.11116  [pdf]

    cs.CL

    Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople

    Authors: Zhuang Qiu, Xufeng Duan, Zhenguang G. Cai

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's gram…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 23 pages

  8. arXiv:2406.05962  [pdf, other]

    cs.DC cs.DB

    Data Caching for Enterprise-Grade Petabyte-Scale OLAP

    Authors: Chunxu Tang, Bin Fan, Jing Zhao, Chen Liang, Yi Wang, Beinan Wang, Ziyue Qiu, Lu Qiu, Bowen Ding, Shouzhuo Sun, Saiguang Che, Jiaming Mai, Shouwei Chen, Yu Zhu, Jianjian Xie, Yutian, Sun, Yao Li, Yangjun Zhang, Ke Wang, Mingmin Chen

    Abstract: With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges including massive, read-heavy I/O traffic with potential throttling, as well as skewed and fragmented data access patterns. Addressing these ch…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to the USENIX Annual Technical Conference (USENIX ATC) 2024

  9. arXiv:2406.05731  [pdf, ps, other]

    physics.plasm-ph

    Nonlinear saturation of reversed shear Alfven eigenmode via high-frequency quasi-mode generation

    Authors: Zhiwen Cheng, Guangyu Wei, Lei Ye, Zhiyong Qiu

    Abstract: A nonlinear saturation mechanism for reversed shear Alfven eigenmode (RSAE) is proposed and analysed, and is shown to be of relevance to typical reactor parameter region. The saturation is achieved through the generation of high-frequency quasi-mode due to nonlinear coupling of two RSAEs, which is then damped due to coupling with the shear Alfven continuum, and leads to the nonlinear saturation of…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: submitted to Plasma Physics and Technology

  10. arXiv:2406.02658  [pdf, other]

    cs.NE

    Maintaining Diversity Provably Helps in Evolutionary Multimodal Optimization

    Authors: Shengjie Ren, Zhijia Qiu, Chao Bian, Miqing Li, Chao Qian

    Abstract: In the real world, there exist a class of optimization problems that multiple (local) optimal solutions in the solution space correspond to a single point in the objective space. In this paper, we theoretically show that for such multimodal problems, a simple method that considers the diversity of solutions in the solution space can benefit the search in evolutionary algorithms (EAs). Specifically…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2406.02118

  11. arXiv:2405.15319  [pdf, other]

    cs.CL cs.AI

    Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

    Authors: Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu

    Abstract: LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical $\underline{\textit{O}}$bstacles: ($\textit{O}$1) lack of comprehen…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Preprint; The project link: https://llm-stacking.github.io/

  12. arXiv:2405.06925  [pdf, other]

    cs.LG cs.AI

    Semi-supervised Anomaly Detection via Adaptive Reinforcement Learning-Enabled Method with Causal Inference for Sensor Signals

    Authors: Xiangwei Chen, Ruliang Xiaoa, Zhixia Zeng, Zhipeng Qiu, Shi Zhang, Xin Du

    Abstract: Semi-supervised anomaly detection for sensor signals is critical in ensuring system reliability in smart manufacturing. However, existing methods rely heavily on data correlation, neglecting causality and leading to potential misinterpretations due to confounding factors. Moreover, while current reinforcement learning-based methods can effectively identify known and unknown anomalies with limited…

    Submitted 16 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

  13. arXiv:2405.06884  [pdf, other]

    cs.LG

    Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks

    Authors: Zirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

    Abstract: Networked dynamical systems are widely used as formal models of real-world cascading phenomena, such as the spread of diseases and information. Prior research has addressed the problem of learning the behavior of an unknown dynamical system when the underlying network has a single layer. In this work, we study the learnability of dynamical systems over multilayer networks, which are more realistic…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024

  14. arXiv:2405.05106  [pdf, ps, other]

    math.GR

    A note on the $ Π$-property of some subgroups of finite groups

    Authors: Zhengtian Qiu, Jianjun Liu, Guiyun Chen

    Abstract: Let $ H $ be a subgroup of a finite group $ G $. We say that $ H $ satisfies the $ Π$-property in $ G $ if for any chief factor $ L / K $ of $ G $, $ |G/K : N_{G/K}(HK/K\cap L/K )| $ is a $ π(HK/K\cap L/K) $-number. In this paper, we obtain some criteria for the $ p $-supersolubility or $ p $-nilpotency of a finite group and extend some known results by concerning some subgroups that satisfy the…

    Submitted 8 May, 2024; originally announced May 2024.

  15. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

    Authors: Yitong Jin, Zhiping Qiu, Yi Shi, Shuangpeng Sun, Chongwu Wang, Donghao Pan, Jiachen Zhao, Zhenghao Liang, Yuan Wang, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different v…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH2024

  16. arXiv:2405.01228  [pdf, other]

    cs.CV

    RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation

    Authors: Heng Li, Haojin Li, Jianyu Chen, Zhongxi Qiu, Huazhu Fu, Lidai Wang, Yan Hu, Jiang Liu

    Abstract: Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinical settings due to the scarcity of annotated data resulting from the professional and private nature of medical data. Despite the existence of decent solutions, many of them are hindered in clinical settings du…

    Submitted 15 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  17. arXiv:2404.16219  [pdf, other]

    cs.PF

    Can Increasing the Hit Ratio Hurt Cache Throughput?

    Authors: Ziyue Qiu, Juncheng Yang, Mor Harchol-Balter

    Abstract: Software caches are an intrinsic component of almost every computer system. Consequently, caching algorithms, particularly eviction policies, are the topic of many papers. Almost all these prior papers evaluate the caching algorithm based on its hit ratio, namely the fraction of requests that are found in the cache, as opposed to disk. The hit ratio is viewed as a proxy for traditional performance…

    Submitted 24 April, 2024; originally announced April 2024.

  18. arXiv:2404.14774  [pdf, other]

    cs.IR

    Contrastive Quantization based Semantic Code for Generative Recommendation

    Authors: Mengqun Jin, Zexuan Qiu, Jieming Zhu, Zhenhua Dong, Xiu Li

    Abstract: With the success of large language models, generative retrieval has emerged as a new retrieval technique for recommendation. It can be divided into two stages: the first stage involves constructing discrete Codes (i.e., codes), and the second stage involves decoding the code sequentially via the transformer architecture. Current methods often construct item semantic codes by reconstructing based q…

    Submitted 23 April, 2024; originally announced April 2024.

  19. arXiv:2404.14242  [pdf, other]

    physics.optics

    Large-scale photonic chip based pulse interleaver for low-noise microwave generation

    Authors: Zheru Qiu, Neetesh Singh, Yang Liu, Xinru Ji, Rui Ning Wang, Franz X. Kärtner, Tobias Kippenberg

    Abstract: Microwaves generated by optical techniques have demonstrated unprecedentedly low noise and hold significance in various applications such as communication, radar, instrumentation, and metrology. To date, the purest microwave signals are generated using optical frequency division with femtosecond mode-locked lasers. However, many femtosecond laser combs have a radio frequency (RF) repetition rate i…

    Submitted 22 April, 2024; originally announced April 2024.

  20. arXiv:2404.13434  [pdf, other]

    cs.CV cs.AI

    Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing

    Authors: Yuang Liu, Zhiheng Qiu, Xiaokai Qin

    Abstract: Transformer has been applied in the field of computer vision due to its excellent performance in natural language processing, surpassing traditional convolutional neural networks and achieving new state-of-the-art. ViT divides an image into several local patches, known as "visual sentences". However, the information contained in the image is vast and complex, and focusing only on the features at t…

    Submitted 20 April, 2024; originally announced April 2024.

  21. arXiv:2404.07671  [pdf]

    cs.CV

    Deep learning-driven pulmonary arteries and veins segmentation reveals demography-associated pulmonary vasculature anatomy

    Authors: Yuetan Chu, Gongning Luo, Longxi Zhou, Shaodong Cao, Guolin Ma, Xianglin Meng, Juexiao Zhou, Changchun Yang, Dexuan Xie, Ricardo Henao, Xigang Xiao, Lianming Wu, Zhaowen Qiu, Xin Gao

    Abstract: Pulmonary artery-vein segmentation is crucial for diagnosing pulmonary diseases and surgical planning, and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns regarding adverse health effects from contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins using non-contrast CT, a conventional and low-…

    Submitted 11 April, 2024; originally announced April 2024.

  22. arXiv:2404.06296  [pdf, other]

    physics.plasm-ph

    Calculation of toroidal Alfvén eigenmode mode structure in general axisymmetric toroidal geometry

    Authors: Guangyu Wei, Matteo Valerio Falessi, Tao Wang, Fulvio Zonca, Zhiyong Qiu

    Abstract: A workflow is developed based on the ideal MHD model to investigate the linear physics of various Alfvén eigenmodes in general axisymmetric toroidal geometry, by solving the coupled shear Alfvén wave (SAW) and ion sound wave (ISW) equations in ballooning space. The model equations are solved by the FALCON code in the singular layer, and the corresponding solutions are then taken as the boundary co…

    Submitted 9 April, 2024; originally announced April 2024.

  23. arXiv:2404.04575  [pdf, other]

    cs.LG cs.AI math.OC

    To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

    Authors: Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang

    Abstract: The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. Particularly, it adjusts the logits in the softmax function in LLMs, which is crucial for next token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: Is…

    Submitted 16 June, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 41 pages, 10 figures, accepted by ICML2024

  24. arXiv:2404.01254  [pdf, ps, other]

    math.GR

    Finite groups with some subgroups of prime power order satisfying the partial $ Π$-property

    Authors: Zhengtian Qiu, Adolfo Ballester-Bolinches

    Abstract: Let $ H $ be a subgroup of a finite group $ G $. We say that $ H $ satisfies the partial $ Π$-property in $ G $ if there exists a $G$-chief series $ \varGamma_{G}: 1 =G_{0} < G_{1} < \cdot\cdot\cdot < G_{n}= G $ of $ G $ such that $ | G / G_{i-1} : N_{G/G_{i-1}} (HG_{i-1}/G_{i-1}\cap G_{i}/G_{i-1})| $ is a $ π(HG_{i-1}/G_{i-1}\cap G_{i}/G_{i-1}) $-number for every $ G $-chief factor…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.12753

  25. arXiv:2404.00929  [pdf, other]

    cs.CL cs.AI

    A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

    Authors: Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu

    Abstract: Based on the foundation of Large Language Models (LLMs), Multilingual Large Language Models (MLLMs) have been developed to address the challenges of multilingual natural language processing tasks, hoping to achieve knowledge transfer from high-resource to low-resource languages. However, significant limitations and challenges still exist, such as language imbalance, multilingual alignment, and inh…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  26. arXiv:2403.17005  [pdf, other]

    cs.CV cs.MM

    TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

    Authors: Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei

    Abstract: Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate static image (i.e., image-to-video generation). The difficulty originates from the aspect that the diffusion process of subsequent animated frames should not only preserve the faithful alignment with the given imag…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; Project page: https://trip-i2v.github.io/TRIP/

  27. arXiv:2403.17000  [pdf, other]

    cs.CV cs.MM

    Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

    Authors: Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

    Abstract: Diffusion models are just at a tipping point for image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also the temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  28. arXiv:2403.16970  [pdf, other]

    eess.IV cs.CV cs.LG

    Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability

    Authors: Zirui Qiu, Hassan Rivaz, Yiming Xiao

    Abstract: As deep learning has become the state-of-the-art for computer-assisted diagnosis, interpretability of the automatic decisions is crucial for clinical deployment. While various methods were proposed in this domain, visual attention maps of clinicians during radiological screening offer a unique asset to provide important insights and can potentially enhance the quality of computer-assisted diagnosi…

    Submitted 29 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  29. arXiv:2403.13325  [pdf, other]

    cs.IR

    Harnessing Large Language Models for Text-Rich Sequential Recommendation

    Authors: Zhi Zheng, Wenshuo Chao, Zhaopeng Qiu, Hengshu Zhu, Hui Xiong

    Abstract: Recent advances in Large Language Models (LLMs) have been changing the paradigm of Recommender Systems (RS). However, when items in the recommendation scenarios contain rich textual information, such as product descriptions in online shopping or news headlines on social media, LLMs require longer texts to comprehensively depict the historical user behavior sequence. This poses significant challeng…

    Submitted 20 March, 2024; originally announced March 2024.

  30. arXiv:2403.07433  [pdf, other]

    hep-ph hep-th nucl-th

    Baryonic Vortex Phase and Magnetic Field Generation in QCD with Isospin and Baryon Chemical Potentials

    Authors: Zebin Qiu, Muneto Nitta

    Abstract: We propose a novel baryonic vortex phase in low energy dense QCD with finite baryon and isospin chemical potentials. It is known that the homogeneous charged pion condensate emerges as a ground state at finite isospin chemical potential, and therein arises the Abrikosov vortex lattice with an applied magnetic field. We first demonstrate that a vortex with the same quantized magnetic flux as the co…

    Submitted 14 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 19 pages, 5 figures, V2: An error in Eq.(2.3) corrected. The main result unchanged

  31. arXiv:2403.04193  [pdf]

    cs.CR

    VAEMax: Open-Set Intrusion Detection based on OpenMax and Variational Autoencoder

    Authors: Zhiyin Qiu, Ding Zhou, Yahui Zhai, Bo Liu, Lei He, Jiuxin Cao

    Abstract: Promptly discovering unknown network attacks is critical for reducing the risk of major loss imposed on system or equipment. This paper aims to develop an open-set intrusion detection model to classify known attacks as well as inferring unknown ones. To achieve this, we employ OpenMax and variational autoencoder to propose a dual detection model, VAEMax. First, we extract flow payload feature base…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures, 5 tables, 2024 5th ICTC

  32. arXiv:2403.03514  [pdf, other]

    cs.CL

    CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models

    Authors: Zexuan Qiu, Jingjing Li, Shijue Huang, Wanjun Zhong, Irwin King

    Abstract: Developing Large Language Models (LLMs) with robust long-context capabilities has been the recent research focus, resulting in the emergence of long-context LLMs proficient in Chinese. However, the evaluation of these models remains underdeveloped due to a lack of benchmarks. To address this gap, we present CLongEval, a comprehensive Chinese benchmark for evaluating long-context LLMs. CLongEval is…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 19 pages, 4 figures

  33. arXiv:2402.12656  [pdf, other]

    cs.LG cs.AI

    HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts

    Authors: Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu

    Abstract: The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing. Despite the success, most existing methods face a challenge for balance between sparsity and the availability of expert knowledge: enhancing performance through increased use of expert knowledge often…

    Submitted 21 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  34. arXiv:2402.12233  [pdf, other]

    cs.CL

    Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers

    Authors: Zihan Qiu, Zeyu Huang, Youcheng Huang, Jie Fu

    Abstract: The feed-forward networks (FFNs) in transformers are recognized as a group of key-value neural memories to restore abstract high-level knowledge. In this work, we conduct an empirical ablation study on updating keys (the 1st layer in the FFNs layer) or values (the 2nd layer in the FFNs layer). We compare those two methods in various knowledge editing and fine-tuning tasks of large language models…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to Tiny Paper @ ICLR 2024. Code available at https://github.com/qiuzh20/Tuning-keys-v.s.-values

  35. arXiv:2402.11686  [pdf, other]

    cs.LG

    Learning the Topology and Behavior of Discrete Dynamical Systems

    Authors: Zirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

    Abstract: Discrete dynamical systems are commonly used to model the spread of contagions on real-world networks. Under the PAC framework, existing research has studied the problem of learning the behavior of a system, assuming that the underlying network is known. In this work, we focus on a more challenging setting: to learn both the behavior and the underlying topology of a black-box system. We show that,…

    Submitted 29 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted at AAAI-24

  36. arXiv:2402.08484  [pdf, ps, other]

    cs.GT cs.CC

    The Computational Complexity of the Housing Market

    Authors: Edwin Lock, Zephyr Qiu, Alexander Teytelboym

    Abstract: We prove that the classic problem of finding a competitive equilibrium in an exchange economy with indivisible goods, money, and unit-demand agents is PPAD-complete. In this "housing market", agents have preferences over the house and amount of money they end up with, but can experience income effects. Our results contrast with the existence of polynomial-time algorithms for related problems: Top…

    Submitted 21 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  37. arXiv:2402.08252  [pdf, other]

    eess.AS cs.SD

    Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN

    Authors: Shiqi Zhang, Zheng Qiu, Daiki Takeuchi, Noboru Harada, Shoji Makino

    Abstract: With the rapid development of neural networks in recent years, the ability of various networks to enhance the magnitude spectrum of noisy speech in the single-channel speech enhancement domain has become exceptionally outstanding. However, enhancing the phase spectrum using neural networks is often ineffective, which remains a challenging problem. In this paper, we found that the human ear cannot…

    Submitted 4 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP 2024. Updated on 2024/06/04 to add one more citation in the appendix.

  38. arXiv:2402.07399  [pdf, ps, other]

    physics.plasm-ph

    On beat-driven and spontaneous excitations of zonal flows by drift waves

    Authors: Liu Chen, Zhiyong Qiu, Fulvio Zonca

    Abstract: Using the slab plasma as a paradigm model, we have derived analytically equations for the nonlinear generation of zero-frequency zonal flows by electron drift waves including, on the same footing, both the beat-driven and spontaneous excitations. It is found that the beat-driven zonal flow tends to reduce the frequency mismatch between the electron drift waves and, thereby, contributes to a signif…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: submitted to Physics of Plasmas

  39. arXiv:2402.07390  [pdf, ps, other]

    physics.plasm-ph

    Drift wave soliton formation via forced-driven zonal flow and implication on plasma confinement

    Authors: Ningfei Chen, Liu Chen, Fulvio Zonca, Zhiyong Qiu

    Abstract: In this work, gyrokinetic theory of drift waves (DWs) self-regulation via the forced driven zonal flow (ZF) is presented, and finite diamagnetic drift frequency due to plasma nonuniformity is shown to play dominant role in ZF forced generation. The obtained nonlinear DW equation is a nonlinear Schrödinger equation, in which the linear dispersiveness, linear growth, nonuniformity of diamagnetic dri…

    Submitted 11 February, 2024; originally announced February 2024.

  40. arXiv:2402.03797  [pdf, ps, other]

    physics.plasm-ph

    Saturation of fishbone instability through zonal flows driven by energetic particle transport in tokamak plasmas

    Authors: G. Brochard, C. Liu, X. Wei, W. Heidbrink, Z. Lin, M. V. Falessi, F. Zonca, Z. Qiu, N. Gorelenkov, C. Chrystal, X. Du, J. Bao, A. R. Polevoi, M. Schneider, S. H. Kim, S. D. Pinches, P. Liu, J. H. Nicolau, H. Lütjens, the ISEP group

    Abstract: Gyrokinetic and kinetic-MHD simulations are performed for the fishbone instability in the DIII-D discharge #178631, chosen for validation of first-principles simulations to predict the energetic particle (EP) transport in an ITER prefusion baseline scenario. Fishbone modes are found to generate zonal flows, which dominate the fishbone saturation. The underlying mechanisms of the two-way fishbone-z…

    Submitted 6 February, 2024; originally announced February 2024.

  41. arXiv:2401.17838  [pdf, other]

    cs.LG cs.AI

    A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction

    Authors: Wenshuo Chao, Zhaopeng Qiu, Likang Wu, Zhuoning Guo, Zhi Zheng, Hengshu Zhu, Hao Liu

    Abstract: The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or regarding skill evolution as a simplified time series forecasting problem. However, both approaches overloo…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 11 pages, 7 figures, AAAI24

  42. arXiv:2401.07507  [pdf, ps, other]

    physics.plasm-ph

    Effects of plasma nonuniformity on toroidal Alfvén eigenmode nonlinear decay

    Authors: Zhiwen Cheng, Kexun Shen, Zhiyong Qiu

    Abstract: The parametric decay of toroidal Alfvén eigenmode (TAE) in nonuniform plasmas is investigated using nonlinear gyrokinetic equation. It is found that, the plasma nonuniformity not only significantly enhances the nonlinear coupling cross-section, but also qualitatively modifies the decay process. Specifically, the condition for spontaneous decay becomes the toroidal mode number of the sideband TAE b…

    Submitted 15 January, 2024; originally announced January 2024.

    Report number: NF-106807

  43. arXiv:2401.07212  [pdf, other]

    cs.IR

    HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval

    Authors: Zexuan Qiu, Jiahong Liu, Yankai Chen, Irwin King

    Abstract: Existing unsupervised deep product quantization methods primarily aim for the increased similarity between different views of the identical image, whereas the delicate multi-level semantic similarities preserved between images are overlooked. Moreover, these methods predominantly focus on the Euclidean space for computational convenience, compromising their ability to map the multi-level semantic…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  44. arXiv:2401.05778  [pdf, other]

    cs.CL cs.AI

    Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

    Authors: Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin Zhang, Ziyi Qiu, Peiyang Li, Zhixing Tan, Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li

    Abstract: Large language models (LLMs) have strong capabilities in solving diverse natural language processing tasks. However, the safety and security issues of LLM systems have become the major obstacle to their widespread application. Many studies have extensively investigated risks in LLM systems and developed the corresponding mitigation strategies. Leading-edge enterprises such as OpenAI, Google, Meta,…

    Submitted 11 January, 2024; originally announced January 2024.

  45. arXiv:2401.02711  [pdf, ps, other]

    physics.plasm-ph

    Resonant Decay of Kinetic Alfvén Waves and Implication on Spectral Cascading

    Authors: Kexun Shen, Zhiwen Cheng, Zhiyong Qiu

    Abstract: A general equation describing the resonant nonlinear mode-coupling among kinetic Alfvén waves (KAWs) is derived using nonlinear gyrokinetic theory, which can be applied to study the potentially strong spectral energy transfer of KAWs. As a first application, the parametric decay of a pump KAW into two sideband KAWs is studied, with particular emphasis on the cascading in perpendicular wavenumber.…

    Submitted 5 January, 2024; originally announced January 2024.

  46. arXiv:2401.01256  [pdf, other]

    cs.CV cs.CL

    VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM

    Authors: Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei

    Abstract: The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video event occurring in a single background. Extending to generate multi-scene videos nevertheless is not trivial and necessitates to nicely manage the logic in between…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Project website: https://videodrafter.github.io

  47. arXiv:2312.17265  [pdf, other]

    cs.CV eess.IV physics.ins-det

    $μ$-Net: ConvNext-Based U-Nets for Cosmic Muon Tomography

    Authors: Li Xin Jed Lim, Ziming Qiu

    Abstract: Muon scattering tomography utilises muons, typically originating from cosmic rays to image the interiors of dense objects. However, due to the low flux of cosmic ray muons at sea-level and the highly complex interactions that muons display when travelling through matter, existing reconstruction algorithms often suffer from low resolution and high noise. In this work, we develop a novel two-stage d…

    Submitted 25 December, 2023; originally announced December 2023.

  48. arXiv:2312.15158  [pdf, other]

    math.NA

    Map-Reduce for Multiprocessing Large Data and Multi-threading for Data Scraping

    Authors: Zefeng Qiu, Prashanth Umapathy, Qingquan Zhang, Guanqun Song, Ting Zhu

    Abstract: This document is the final project report for our advanced operating system class. During this project, we mainly focused on applying multiprocessing and multi-threading technology to our whole project and utilized the map-reduce algorithm in our data cleaning and data analysis process. In general, our project can be divided into two components: data scraping and data processing, where the previou…

    Submitted 22 December, 2023; originally announced December 2023.

  49. Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems

    Authors: Zhangchi Qiu, Ye Tao, Shirui Pan, Alan Wee-Chung Liew

    Abstract: Conversational recommender systems (CRS) utilize natural language interactions and dialogue history to infer user preferences and provide accurate recommendations. Due to the limited conversation context and background knowledge, existing CRSs rely on external sources such as knowledge graphs to enrich the context and model entities based on their inter-relations. However, these methods ignore the…

    Submitted 1 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

  50. arXiv:2312.09744  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction

    Authors: Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Zijiang Yang, Jiaxin Dai, Lingwei Ma, Dawei Zhang

    Abstract: Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on numerical data and semantic factors. Due to the limitations of small-sample datasets, existing methods typically adopt ML algorithms to regress numerical properties or transfer other pre-trained knowledge graphs (KGs) to the material. However, these methods cannot simultane…

    Submitted 24 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.