Showing 1–50 of 333 results for author: Yang, E

  1. arXiv:2406.17186  [pdf, other]

    cs.CL cs.CY

    CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

    Authors: Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme

    Abstract: Legal professionals need to write analyses that rely on citations to relevant precedents, i.e., previous case decisions. Intelligent systems assisting legal professionals in writing such documents provide great benefits but are challenging to design. Such systems need to help locate, summarize, and reason over salient precedents in order to be useful. To enable systems for such tasks, we work with…

    Submitted 27 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.10086  [pdf]

    cs.CL cs.LG stat.ME

    Discovering influential text using convolutional neural networks

    Authors: Megan Ayers, Luke Sanford, Margaret Roberts, Eddie Yang

    Abstract: Experimental methods for estimating the impacts of text on human evaluation have been widely used in the social sciences. However, researchers in experimental settings are usually limited to testing a small number of pre-specified text treatments. While efforts to mine unstructured texts for features that causally affect outcomes have been ongoing in recent years, these models have primarily focus…

    Submitted 21 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: To be published in ACL 2024 Findings

  3. arXiv:2406.08645  [pdf, other]

    astro-ph.GA astro-ph.CO

    ODIN: Identifying Protoclusters and Cosmic Filaments Traced by Ly$α$-emitting Galaxies

    Authors: Vandana Ramakrishnan, Kyoung-Soo Lee, Maria Celeste Artale, Eric Gawiser, Yujin Yang, Changbom Park, Robin Ciardullo, Lucia Guaita, Sang Hyeok Im, Seongjae Kim, Ankit Kumar, Jaehyun Lee, Seong-Kook Lee, Byeongha Moon, Nelson Padilla, Alexandra Pope, Roxana Popescu, Hyunmi Song, Paulina Troncoso, Francisco Valdes, Ann Zabludoff

    Abstract: To understand the formation and evolution of massive cosmic structures, studying them at high redshift, in the epoch when they formed the majority of their mass is essential. The One-hundred-deg$^2$ DECam Imaging in Narrowbands (ODIN) survey is undertaking the widest-area narrowband program to date, to use Ly$α$-emitting galaxies (LAEs) to trace the large-scale structure (LSS) of the Universe at t…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 26 pages, 18 figures; submitted to ApJ

  4. arXiv:2406.00798  [pdf, other]

    cs.CV cs.AI

    PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency

    Authors: Yeonsung Jung, Heecheol Yun, Joonhyung Park, Jin-Hwa Kim, Eunho Yang

    Abstract: Neural Radiance Fields (NeRF) have shown remarkable performance in learning 3D scenes. However, NeRF exhibits vulnerability when confronted with distractors in the training images -- unexpected objects are present only within specific views, such as moving entities like pedestrians or birds. Excluding distractors during dataset construction is a straightforward solution, but without prior knowledg…

    Submitted 2 June, 2024; originally announced June 2024.

  5. arXiv:2405.18581  [pdf, other]

    cs.AI

    Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models

    Authors: Hyunjin Seo, Taewon Kim, June Yong Yang, Eunho Yang

    Abstract: Recent advancements in text-attributed graphs (TAGs) have significantly improved the quality of node features by using the textual modeling capabilities of language models. Despite this success, utilizing text attributes to enhance the predefined graph structure remains largely unexplored. Our extensive analysis reveals that conventional edges on TAGs, treated as a single relation (e.g., hyperlink…

    Submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2405.18543  [pdf, other]

    math.CO

    De Bruijn Polyominoes

    Authors: D. Condon, Yuxin Wang, E. Yang

    Abstract: We introduce the notions of de Bruijn polyominoes and prismatic polyominoes, which generalize the notions of de Bruijn sequences and arrays. Given a small fixed polyomino $p$ and a set of colors $[n]$, a de Bruijn polyomino for $(p,n)$ is a colored fixed polyomino $P$ with cells colored from $[n]$ such that every possible coloring of $p$ from $[n]$ exists as a subset of $P$. We call de Bruijn poly…

    Submitted 28 May, 2024; originally announced May 2024.

  7. arXiv:2405.11464  [pdf, other]

    cs.CL cs.AI cs.LG

    Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion

    Authors: Pengxiang Lan, Enneng Yang, Yuting Liu, Guibing Guo, Linying Jiang, Jianzhe Zhao, Xingwei Wang

    Abstract: Prompt tuning is a promising method to fine-tune a pre-trained language model without retraining its large-scale parameters. Instead, it attaches a soft prompt to the input text, whereby downstream tasks can be well adapted by merely learning the embeddings of prompt tokens. Nevertheless, existing methods still suffer from two challenges: (i) they are hard to balance accuracy and efficiency. A lon…

    Submitted 1 July, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  8. arXiv:2405.06093  [pdf, other]

    cs.LG cs.CL

    Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection

    Authors: Bhawesh Kumar, Jonathan Amar, Eric Yang, Nan Li, Yugang Jia

    Abstract: Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, often LLMs need to be fine-tuned on task-specific expert annotated data to achieve optimal performance, which can be expensive and time consuming. In this study, we fine-tune PaLM-2 with parameter efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pr…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 21 pages

  9. Contextualization with SPLADE for High Recall Retrieval

    Authors: Eugene Yang

    Abstract: High Recall Retrieval (HRR), such as eDiscovery and medical systematic review, is a search problem that optimizes the cost of retrieving most relevant documents in a given collection. Iterative approaches, such as iterative relevance feedback and uncertainty sampling, are shown to be effective under various operational scenarios. Despite neural models demonstrating success in other text-related ta…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 5 pages, 1 figure, accepted at SIGIR 2024 as short paper

  10. On the Evaluation of Machine-Generated Reports

    Authors: James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler

    Abstract: Large Language Models (LLMs) have enabled new ways to satisfy information needs. Although great strides have been made in applying them to settings like document ranking and short-form text generation, they still struggle to compose complete, accurate, and verifiable long-form reports. Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of…

    Submitted 9 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, accepted at SIGIR 2024 as perspective paper

  11. Language Fairness in Multilingual Information Retrieval

    Authors: Eugene Yang, Thomas Jänich, James Mayfield, Dawn Lawrie

    Abstract: Multilingual information retrieval (MLIR) considers the problem of ranking documents in several languages for a query expressed in a language that may differ from any of those languages. Recent work has observed that approaches such as combining ranked lists representing a single document language each or using multilingual pretrained language models demonstrate a preference for one language over…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 5 pages, 1 figure, accepted at SIGIR 2024 as short paper

  12. Distillation for Multilingual Information Retrieval

    Authors: Eugene Yang, Dawn Lawrie, James Mayfield

    Abstract: Recent work in cross-language information retrieval (CLIR), where queries and documents are in different languages, has shown the benefit of the Translate-Distill framework that trains a cross-language neural dual-encoder model using translation and distillation. However, Translate-Distill only supports a single document language. Multilingual information retrieval (MLIR), which ranks a multilingu…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 6 pages, 1 figure, accepted at SIGIR 2024 as short paper

  13. PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval

    Authors: Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, Douglas W. Oard

    Abstract: PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval. PLAID differs from ColBERT by assigning terms to clusters and representing those terms as cluster centroids plus compressed residual vectors. While PLAID is effectiv…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 5 pages, 1 figure, accepted at SIGIR 2024 as short paper

  14. arXiv:2404.18797  [pdf, other]

    cs.IR

    Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval

    Authors: Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard, Kevin Duh

    Abstract: Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora. PSQ is a strong baseline for efficient CLIR using sparse indexing. It is, therefore, useful as the first stage in a cascaded neural CLIR system whose second stage is more effective but too inefficient to be used on its own to…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 11 pages, 5 figures

  15. arXiv:2404.14511  [pdf]

    cs.HC

    Children's Overtrust and Shifting Perspectives of Generative AI

    Authors: Jaemarie Solyst, Ellia Yang, Shixian Xie, Jessica Hammer, Amy Ogan, Motahhare Eslami

    Abstract: The capabilities of generative AI (genAI) have dramatically increased in recent times, and there are opportunities for children to leverage new features for personal and school-related endeavors. However, while the future of genAI is taking form, there remain potentially harmful limitations, such as generation of outputs with misinformation and bias. We ran a workshop study focused on ChatGPT to e…

    Submitted 29 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Journal ref: Proceedings of the 18th International Society of the Learning Sciences (ICLS) 2024

  16. arXiv:2404.08134  [pdf, other]

    cs.IR cs.CL

    Extending Translate-Train for ColBERT-X to African Language CLIR

    Authors: Eugene Yang, Dawn J. Lawrie, Paul McNamee, James Mayfield

    Abstract: This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training passages, and ColBERT-X as the retrieval model. Additionally, we present a set of unofficial runs that use an alternative training procedure with a similar training setting.

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 10 pages, 2 figures. System description paper for HLTCOE's participation in CIRAL@FIRE 2023

  17. arXiv:2404.08118  [pdf, ps, other]

    cs.CL cs.IR

    HLTCOE at TREC 2023 NeuCLIR Track

    Authors: Eugene Yang, Dawn Lawrie, James Mayfield

    Abstract: The HLTCOE team applied PLAID, an mT5 reranker, and document translation to the TREC 2023 NeuCLIR track. For PLAID we included a variety of models and training techniques -- the English model released with ColBERT v2, translate-train (TT), Translate-Distill (TD) and multilingual translate-train (MTT). TT trains a ColBERT model with English queries and passages automatically translated into the doc…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 6 pages. Part of TREC 2023 Proceedings

  18. arXiv:2404.08071  [pdf, other]

    cs.IR

    Overview of the TREC 2023 NeuCLIR Track

    Authors: Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

    Abstract: The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the impact of neural approaches to cross-language information retrieval. The track has created four collections, large collections of Chinese, Persian, and Russian newswire and a smaller collection of Chinese scientific abstracts. The principal tasks are ranked retrieval of news in one of the thr…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 27 pages, 17 figures. Part of the TREC 2023 Proceedings

  19. arXiv:2404.01679  [pdf, other]

    cs.CL cs.SI physics.soc-ph

    Event Detection from Social Media for Epidemic Prediction

    Authors: Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang

    Abstract: Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by de…

    Submitted 24 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024

  20. arXiv:2404.01464  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images

    Authors: JungEun Kim, Hangyul Yoon, Geondo Park, Kyungsu Kim, Eunho Yang

    Abstract: 4D medical images, which represent 3D images with temporal information, are crucial in clinical practice for capturing dynamic changes and monitoring long-term disease progression. However, acquiring 4D medical images poses challenges due to factors such as radiation exposure and imaging duration, necessitating a balance between achieving high temporal resolution and minimizing adverse effects. Gi…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  21. arXiv:2404.00384  [pdf, other]

    cs.CV

    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

    Authors: Sanghyun Jo, Soohyun Ryu, Sungyub Kim, Eunho Yang, Kyungsu Kim

    Abstract: We identify a critical bias in contemporary CLIP-based models, which we denote as single tag bias. This bias manifests as a disproportionate focus on a singular tag (word) while neglecting other pertinent tags, stemming from CLIP's text embeddings that prioritize one specific tag in image-text relationships. When deconstructing text into individual tags, only one tag tends to have high relevancy w…

    Submitted 20 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

  22. Exciton-activated effective phonon magnetic moment in monolayer MoS2

    Authors: Chunli Tang, Gaihua Ye, Cynthia Nnokwe, Mengqi Fang, Li Xiang, Masoud Mahjouri-Samani, Dmitry Smirnov, Eui-Hyeok Yang, Tingting Wang, Lifa Zhang, Rui He, Wencan Jin

    Abstract: Optical excitation of chiral phonons plays a vital role in studying the phonon-driven magnetic phenomena in solids. Transition metal dichalcogenides host chiral phonons at high symmetry points of the Brillouin zone, providing an ideal platform to explore the interplay between chiral phonons and valley degree of freedom. Here, we investigate the helicity-resolved magneto-Raman response of monolayer…

    Submitted 7 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Journal ref: Phys. Rev. B 109, 155426 (2024)

  23. arXiv:2403.12327  [pdf, other]

    cs.CV cs.LG

    GT-Rain Single Image Deraining Challenge Report

    Authors: Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Lu** Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng

    Abstract: This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained o…

    Submitted 18 March, 2024; originally announced March 2024.

  24. arXiv:2403.06372  [pdf, other]

    cs.IR

    Repeated Padding as Data Augmentation for Sequential Recommendation

    Authors: Yizhou Dang, Yuting Liu, Enneng Yang, Guibing Guo, Linying Jiang, Xingwei Wang, Jianzhe Zhao

    Abstract: Sequential recommendation aims to provide users with personalized suggestions based on their historical interactions. When training sequential models, padding is a widely adopted technique for two main reasons: 1) The vast majority of models can only handle fixed-length sequences; 2) Batching-based training needs to ensure that the sequences in each batch have the same length. The special value \e…

    Submitted 10 March, 2024; originally announced March 2024.

  25. arXiv:2402.18096  [pdf, other]

    cs.LG cs.AI

    No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

    Authors: June Yong Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, Se Jung Kwon, Dongsoo Lee

    Abstract: Key-Value (KV) Caching has become an essential technique for accelerating the inference speed and throughput of generative Large Language Models (LLMs). However, the memory footprint of the KV cache poses a critical bottleneck in LLM deployment as the cache size grows with batch size and sequence length, often surpassing even the size of the model itself. Although recent methods were proposed to s…

    Submitted 28 February, 2024; originally announced February 2024.

  26. arXiv:2402.16311  [pdf, other]

    cs.CL cs.AI

    Cross-domain Chinese Sentence Pattern Parsing

    Authors: Jingsi Yu, Cunliang Kong, Liner Yang, Meishan Zhang, Lin Zhu, Yujie Wang, Haozhe Lin, Maosong Sun, Erhong Yang

    Abstract: Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching. Existing SPS parsers rely heavily on textbook corpora for training, lacking cross-domain capability. To overcome this constraint, this paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework. Partial syntactic rules from a source doma…

    Submitted 7 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  27. arXiv:2402.16072  [pdf]

    cs.ET quant-ph

    Demonstration of 3 V Programmable Josephson Junction Arrays Using Non-Integer-Multiple Logic

    Authors: Wenhui Cao, Erkun Yang, **** Li, Huan Qiao, Yuan Zhong, Qing Zhong, Da Xu, Xueshen Wang, Xiaolong Xu, Shijian Wang, Jian Chen

    Abstract: This article demonstrates a new kind of programmable logic for the representation of an integer that can be used for the programmable Josephson voltage standard. It can enable the numbers of junctions in most bits to be variable integer values, which is different from normal binary logic or ternary logic. Consequently, missing junctions due to superconducting short circuits can be tolerated under…

    Submitted 25 February, 2024; originally announced February 2024.

  28. arXiv:2402.13740  [pdf, other]

    cs.CL

    From Text to CQL: Bridging Natural Language and Corpus Search Engine

    Authors: Luming Lu, Jiyuan An, Yujie Wang, Liner Yang, Cunliang Kong, Zhenghao Liu, Shuo Wang, Haozhe Lin, Mingwei Fang, Ya** Huang, Erhong Yang

    Abstract: Natural Language Processing (NLP) technologies have revolutionized the way we interact with information systems, with a significant focus on converting natural language queries into formal query languages such as SQL. However, less emphasis has been placed on the Corpus Query Language (CQL), a critical tool for linguistic research and detailed analysis within text corpora. The manual construction…

    Submitted 21 February, 2024; originally announced February 2024.

  29. arXiv:2402.13524  [pdf, other]

    cs.CL

    OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models

    Authors: Yang Liu, Meng Xu, Shuo Wang, Liner Yang, Haoyu Wang, Zhenghao Liu, Cunliang Kong, Yun Chen, Yang Liu, Maosong Sun, Erhong Yang

    Abstract: Modern large language models (LLMs) should generally benefit individuals from various cultural backgrounds around the world. However, most recent advanced generative evaluation benchmarks tailored for LLMs mainly focus on English. To this end, we introduce OMGEval, the first Open-source Multilingual Generative test set that can assess the capability of LLMs in different languages. For each language,…

    Submitted 20 February, 2024; originally announced February 2024.

  30. arXiv:2402.12842  [pdf, other]

    cs.CL cs.AI cs.LG

    PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

    Authors: Gyeongman Kim, Doohyuk Jang, Eunho Yang

    Abstract: Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models like LLMs is relatively sparse, and the approach of distilling student-friendly knowledge, which has shown promising performance in KD…

    Submitted 24 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Code: https://github.com/gmkim-ai/PromptKD

  31. arXiv:2402.11148  [pdf, other]

    cs.LG cs.CV

    Knowledge Distillation Based on Transformed Teacher Matching

    Authors: Kaixiang Zheng, En-Hui Yang

    Abstract: As a technique to bridge logit matching and probability distribution matching, temperature scaling plays a pivotal role in knowledge distillation (KD). Conventionally, temperature scaling is applied to both teacher's logits and student's logits in KD. Motivated by some recent works, in this paper, we drop instead temperature scaling on the student side, and systematically study the resulting varia…

    Submitted 7 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Published as a conference paper at ICLR 2024

  32. arXiv:2402.08492  [pdf]

    cs.AI

    The Application of ChatGPT in Responding to Questions Related to the Boston Bowel Preparation Scale

    Authors: Xiaoqiang Liu, Yubin Wang, Zicheng Huang, Boming Xu, Yilin Zeng, Xinqi Chen, Zilong Wang, Enning Yang, Xiaoxuan Lei, Yisen Huang, Xiaobo Liu

    Abstract: Background: Colonoscopy, a crucial diagnostic tool in gastroenterology, depends heavily on superior bowel preparation. ChatGPT, a large language model with emergent intelligence, also exhibits potential in medical applications. This study aims to assess the accuracy and consistency of ChatGPT in using the Boston Bowel Preparation Scale (BBPS) for colonoscopy assessment. Methods: We retrospect…

    Submitted 13 February, 2024; originally announced February 2024.

  33. arXiv:2402.02805  [pdf, other]

    cs.AI cs.CL cs.LG

    Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

    Authors: Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert

    Abstract: Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LL…

    Submitted 3 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML-2024

  34. arXiv:2402.02705  [pdf, other]

    cs.LG cs.AI cs.CV

    Representation Surgery for Multi-Task Model Merging

    Authors: Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, Dacheng Tao

    Abstract: Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training, greatly expanding the application scenarios of MTL. However, by visualizing the representation distribution o…

    Submitted 28 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Forty-first International Conference on Machine Learning (ICML 2024)

  35. arXiv:2402.01261  [pdf, other]

    cs.LG cs.AI

    TEDDY: Trimming Edges with Degree-based Discrimination strategY

    Authors: Hyunjin Seo, Jihun Yun, Eunho Yang

    Abstract: Since the pioneering work on the lottery ticket hypothesis for graph neural networks (GNNs) was proposed in Chen et al. (2021), the study on finding graph lottery tickets (GLT) has become one of the pivotal focuses in the GNN community, inspiring researchers to discover sparser GLT while achieving comparable performance to original dense networks. In parallel, the graph structure has gained substant…

    Submitted 15 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  36. arXiv:2401.08732  [pdf, other]

    cs.LG cs.CV cs.IT

    Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information

    Authors: Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang

    Abstract: It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate for the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of cond…

    Submitted 7 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 32 pages, 19 figures, Published as a conference paper at ICLR 2024

    MSC Class: 68T30 ACM Class: I.2.6

    Journal ref: International Conference on Learning Representations 2024 (ICLR)

  37. arXiv:2401.04993  [pdf, other]

    cs.LG cs.AI

    AdaFed: Fair Federated Learning via Adaptive Common Descent Direction

    Authors: Shayan Mohajer Hamidi, En-Hui Yang

    Abstract: Federated learning (FL) is a promising technology via which some edge devices/clients collaboratively train a machine learning model orchestrated by a server. Learning an unfair model is known as a critical problem in federated learning, where the trained model may unfairly advantage or disadvantage some of the devices. To tackle this problem, in this work, we propose AdaFed. The goal of AdaFed is…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted in Transactions on Machine Learning Research. This is the link to the paper: https://openreview.net/forum?id=rFecyFpFUp&referrer=%5Bthe%20profile%20of%20Shayan%20Mohajer%20Hamidi%5D(%2Fprofile%3Fid%3D~Shayan_Mohajer_Hamidi1)

  38. arXiv:2401.04810  [pdf, other]

    cs.IR cs.CL

    Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation

    Authors: Eugene Yang, Dawn Lawrie, James Mayfield, Douglas W. Oard, Scott Miller

    Abstract: Prior work on English monolingual retrieval has shown that a cross-encoder trained using a large number of relevance judgments for query-document pairs can be used as a teacher to train more efficient, but similarly effective, dual-encoder student models. Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR),…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 17 pages, 1 figure, accepted at ECIR 2024

  39. arXiv:2312.12511  [pdf, other]

    astro-ph.SR astro-ph.EP astro-ph.GA

    Stellar Flares Are Far-Ultraviolet Luminous

    Authors: Vera L. Berger, Jason T. Hinkle, Michael A. Tucker, Benjamin J. Shappee, Jennifer L. van Saders, Daniel Huber, Jeffrey W. Reep, Xudong Sun, Kai E. Yang

    Abstract: We identify 182 flares on 158 stars within 100 pc of the Sun in both the near-ultraviolet (NUV: 1750-2750 Å) and far-ultraviolet (FUV: 1350-1750 Å) using high-cadence light curves from the Galaxy Evolution Explorer (GALEX). Ultraviolet (UV) emission from stellar flares plays a crucial role in determining the habitability of exoplanetary systems. However, whether such UV emission promotes or threat…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Submitted to MNRAS, comments welcome

  40. arXiv:2312.10909  [pdf, ps, other]

    math.CO

    Laguerre inequalities and determinantal inequalities for the finite difference of the partition functions

    Authors: Eve Y. Y. Yang

    Abstract: The paper aims to establish the Turán inequalities, the Laguerre inequalities (order $2$), and the determinantal inequalities (order $3$) for $Δp(n)$ and $Δ\bar{p}(n)$, where $Δf(n)$ is the first-order forward difference of a sequence $f(n)$. The functions $p(n)$ and $\bar{p}(n)$ denote the partition function and overpartition function, respectively. Conjectures for thresholds of Laguerre inequali…

    Submitted 17 December, 2023; originally announced December 2023.

  41. arXiv:2312.05487  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.str-el quant-ph

    Chiral symmetry breaking and topological charge of graphene nanoribbons

    Authors: Hyun Cheol Lee, S. -R. Eric Yang

    Abstract: We explore the edge properties of rectangular graphene nanoribbons featuring two zigzag edges and two armchair edges. Although the self-consistent Hartree-Fock fields break chiral symmetry, our work demonstrates that graphene nanoribbons maintain their status as short-range entangled symmetry-protected topological insulators. The relevant symmetry involves combined mirror and time-reversal operati…

    Submitted 22 March, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: Published version, New J. Phys. 26 (2024) 033039

  42. arXiv:2312.00944  [pdf, other]

    cs.CV cs.GR

    Enhancing Diffusion Models with 3D Perspective Geometry Constraints

    Authors: Rishi Upadhyay, Howard Zhang, Yunhao Ba, Ethan Yang, Blake Gella, Sicheng Jiang, Alex Wong, Achuta Kadambi

    Abstract: While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are capable of outputting a wide gamut of possible images, it is difficult for these synthesized images to adhere to the principle…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Project Webpage: http://visual.ee.ucla.edu/diffusionperspective.htm/

  43. arXiv:2311.05956  [pdf, other

    cs.IR cs.LG

    ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation

    Authors: Yuting Liu, Enneng Yang, Yizhou Dang, Guibing Guo, Qiang Liu, Yuliang Liang, Linying Jiang, Xingwei Wang

    Abstract: Multimodal recommendation aims to model user and item representations comprehensively with the involvement of multimedia content for effective recommendations. Existing research has shown that it is beneficial for recommendation performance to combine (user- and item-) ID embeddings with multimodal salient features, indicating the value of IDs. However, there is a lack of a thorough analysis of th…

    Submitted 22 May, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

  44. arXiv:2311.05844  [pdf, other

    cs.CV cs.AI cs.CL cs.MM cs.SD eess.AS

    Face-StyleSpeech: Improved Face-to-Voice latent mapping for Natural Zero-shot Speech Synthesis from a Face Image

    Authors: Minki Kang, Wooseok Han, Eunho Yang

    Abstract: Generating a voice from a face image is crucial for developing virtual humans capable of interacting using their unique voices, without relying on pre-recorded human speech. In this paper, we propose Face-StyleSpeech, a zero-shot Text-To-Speech (TTS) synthesis model that generates natural speech conditioned on a face image rather than reference speech. We hypothesize that learning both speaker ide…

    Submitted 25 September, 2023; originally announced November 2023.

    Comments: Submitted to ICASSP 2024

  45. arXiv:2310.19316  [pdf, other

    astro-ph.SR

    A Possible Mechanism for "Late Phase" in Stellar White-Light Flares

    Authors: Kai E. Yang, Xudong Sun, Graham S. Kerr, Hugh S. Hudson

    Abstract: M-dwarf flares observed by the \textit{Transiting Exoplanet Survey Satellite} (\textit{TESS}) sometimes exhibit a "peak-bump" light-curve morphology, characterized by a secondary, gradual peak well after the main, impulsive peak. A similar "late phase" is frequently detected in solar flares observed in the extreme-ultraviolet from longer hot coronal loops distinct from the impulsive flare structur…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: 31 pages, 13 figures, accepted for publication in ApJ

  46. arXiv:2310.10849  [pdf, other

    astro-ph.IM physics.ins-det

    Results and Limits of Time Division Multiplexing for the BICEP Array High Frequency Receivers

    Authors: S. Fatigoni, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, D. Beck, J. J. Bock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. J. Cukierman, E. V. Denison, M. I. Dierickx, L. Duband, M. Eiben, J. P. Filippini, A. Fortes, M. Gao, C. Giannakopoulos, N. Goeckner-Wald, D. C. Goldfinger, et al. (62 additional authors not shown)

    Abstract: Time-Division Multiplexing is the readout architecture of choice for many ground and space experiments, as it is a very mature technology with proven outstanding low-frequency noise stability, which represents a central challenge in multiplexing. Once fully populated, each of the two BICEP Array high frequency receivers, observing at 150GHz and 220/270GHz, will have 7776 TES detectors tiled on the…

    Submitted 24 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 10 pages, 7 figures, Submitted to Journal of Low Temperature Physics

    Journal ref: Journal of Low Temperature Physics (2024)

  47. arXiv:2310.08970  [pdf, other

    cond-mat.mes-hall cond-mat.stat-mech quant-ph

    Mutual information and correlations across topological phase transitions in topologically ordered graphene zigzag nanoribbons

    Authors: In-Hwan Lee, Hoang-Anh Le, S. -R. Eric Yang

    Abstract: Graphene zigzag nanoribbons, initially in a topologically ordered state, undergo a topological phase transition into crossover phases distinguished by quasi-topological order. We computed mutual information for both the topologically ordered phase and its crossover phases, revealing the following results: (i) In the topologically ordered phase, A-chirality carbon lines strongly entangle with B-chi…

    Submitted 21 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: A featured paper published in a special issue titled 'Entanglement Entropy and Quantum Phase Transitions' within the journal 'Entropy'. (This is the published version.)

    Journal ref: Entropy 2023, 25(10), 1449

  48. arXiv:2310.02575  [pdf, other

    cs.LG cs.CV

    AdaMerging: Adaptive Model Merging for Multi-Task Learning

    Authors: Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

    Abstract: Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to…

    Submitted 28 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: International Conference on Learning Representations (ICLR 2024)

  49. arXiv:2309.16936  [pdf, other

    cs.CV cs.AI cs.LG

    PC-Adapter: Topology-Aware Adapter for Efficient Domain Adaption on Point Clouds with Rectified Pseudo-label

    Authors: Joonhyung Park, Hyunjin Seo, Eunho Yang

    Abstract: Understanding point clouds captured from the real-world is challenging due to shifts in data distribution caused by varying object scales, sensor angles, and self-occlusion. Prior works have addressed this issue by combining recent learning principles such as self-supervised learning, self-training, and adversarial training, which leads to significant computational overhead. Toward succinct yet pow…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 11 pages; Accepted to ICCV 2023

  50. arXiv:2309.15302  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    STERLING: Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience

    Authors: Haresh Karnan, Elvin Yang, Daniel Farkash, Garrett Warnell, Joydeep Biswas, Peter Stone

    Abstract: Terrain awareness, i.e., the ability to identify and distinguish different types of terrain, is a critical ability that robots must have to succeed at autonomous off-road navigation. Current approaches that provide robots with this awareness either rely on labeled data which is expensive to collect, engineered features and cost functions that may not generalize, or expert human demonstrations whic…

    Submitted 20 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Project website: https://hareshkarnan.github.io/sterling/

    Journal ref: Conference on Robot Learning (CoRL 2023)