-
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation
Authors:
Abe Bohan Hou,
Orion Weller,
Guanghui Qin,
Eugene Yang,
Dawn Lawrie,
Nils Holzenberger,
Andrew Blair-Stanek,
Benjamin Van Durme
Abstract:
Legal professionals need to write analyses that rely on citations to relevant precedents, i.e., previous case decisions. Intelligent systems assisting legal professionals in writing such documents provide great benefits but are challenging to design. Such systems need to help locate, summarize, and reason over salient precedents in order to be useful. To enable systems for such tasks, we work with legal professionals to transform a large open-source legal corpus into a dataset supporting two important backbone tasks: information retrieval (IR) and retrieval-augmented generation (RAG). This dataset, CLERC (Case Law Evaluation Retrieval Corpus), is constructed for training and evaluating models on their ability to (1) find corresponding citations for a given piece of legal analysis and (2) compile the text of these citations (as well as previous context) into a cogent analysis that supports a reasoning goal. We benchmark state-of-the-art models on CLERC, showing that current approaches still struggle: GPT-4o generates analyses with the highest ROUGE F-scores but hallucinates the most, while zero-shot IR models only achieve 48.3% recall@1000.
Submitted 27 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
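For readers unfamiliar with the recall@1000 figure quoted in the CLERC abstract, the following is a minimal sketch of how recall@k is usually computed for a single query. The function name, the document ids, and the toy ranking are illustrative assumptions and do not come from the paper or its evaluation code.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical toy data: two of the three relevant documents are ranked in the top 4.
ranking = ["d7", "d2", "d9", "d4", "d1"]
relevant = ["d2", "d4", "d5"]
print(recall_at_k(ranking, relevant, k=4))
```

A benchmark figure such as the one in the abstract would typically be the mean of this per-query value over all queries, with k = 1000.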
-
Discovering influential text using convolutional neural networks
Authors:
Megan Ayers,
Luke Sanford,
Margaret Roberts,
Eddie Yang
Abstract:
Experimental methods for estimating the impacts of text on human evaluation have been widely used in the social sciences. However, researchers in experimental settings are usually limited to testing a small number of pre-specified text treatments. While efforts to mine unstructured texts for features that causally affect outcomes have been ongoing in recent years, these models have primarily focused on the topics or specific words of text, which may not always be the mechanism of the effect. We connect these efforts with NLP interpretability techniques and present a method for flexibly discovering clusters of similar text phrases that are predictive of human reactions to texts using convolutional neural networks. When used in an experimental setting, this method can identify text treatments and their effects under certain assumptions. We apply the method to two datasets. The first enables direct validation of the model's ability to detect phrases known to cause the outcome. The second demonstrates its ability to flexibly discover text treatments with varying textual structures. In both cases, the model learns a greater variety of text treatments compared to benchmark methods, and these text features quantitatively meet or exceed the ability of benchmark methods to predict the outcome.
Submitted 21 June, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
ODIN: Identifying Protoclusters and Cosmic Filaments Traced by Ly$α$-emitting Galaxies
Authors:
Vandana Ramakrishnan,
Kyoung-Soo Lee,
Maria Celeste Artale,
Eric Gawiser,
Yu** Yang,
Changbom Park,
Robin Ciardullo,
Lucia Guaita,
Sang Hyeok Im,
Seongjae Kim,
Ankit Kumar,
Jaehyun Lee,
Seong-Kook Lee,
Byeongha Moon,
Nelson Padilla,
Alexandra Pope,
Roxana Popescu,
Hyunmi Song,
Paulina Troncoso,
Francisco Valdes,
Ann Zabludoff
Abstract:
To understand the formation and evolution of massive cosmic structures, studying them at high redshift, in the epoch when they formed the majority of their mass, is essential. The One-hundred-deg$^2$ DECam Imaging in Narrowbands (ODIN) survey is undertaking the widest-area narrowband program to date, to use Ly$α$-emitting galaxies (LAEs) to trace the large-scale structure (LSS) of the Universe at three cosmic epochs. In this work, we present results at $z$ = 3.1 based on early ODIN data in the COSMOS field. We identify and characterize protoclusters and cosmic filaments using multiple methods and discuss their strengths and weaknesses. We then compare our observations against the IllustrisTNG suite of cosmological hydrodynamical simulations. The two are in excellent agreement, with a similar number and angular size of structures identified above a specified density threshold. We are able to recover the simulated protoclusters with $\log$(M$_{z=0}$/$M_\odot$) $\gtrsim$ 14.4 in $\sim$ 60\% of the cases. With these objects we show that the descendant masses of the protoclusters in our sample can be estimated purely based on our 2D measurements, finding a median $z$ = 0 mass of $\sim10^{14.5}$M$_\odot$. The lack of information on the radial extent of each protocluster introduces a $\sim$0.4~dex uncertainty in its descendant mass. Finally, we show that the recovery of the cosmic web in the vicinity of protoclusters is both efficient and accurate. The similarity of our observations and the simulations implies that our structure selection is likewise robust and efficient, demonstrating that LAEs are reliable tracers of the LSS.
Submitted 12 June, 2024;
originally announced June 2024.
-
PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency
Authors:
Yeonsung Jung,
Heecheol Yun,
Joonhyung Park,
**-Hwa Kim,
Eunho Yang
Abstract:
Neural Radiance Fields (NeRF) have shown remarkable performance in learning 3D scenes. However, NeRF exhibits vulnerability when confronted with distractors in the training images -- unexpected objects are present only within specific views, such as moving entities like pedestrians or birds. Excluding distractors during dataset construction is a straightforward solution, but without prior knowledge of their types and quantities, it becomes prohibitively expensive. In this paper, we propose PruNeRF, a segment-centric dataset pruning framework via 3D spatial consistency, that effectively identifies and prunes the distractors. We first examine existing metrics for measuring pixel-wise distraction and introduce Influence Functions for more accurate measurements. Then, we assess 3D spatial consistency using a depth-based reprojection technique to obtain 3D-aware distraction. Furthermore, we incorporate segmentation for pixel-to-segment refinement, enabling more precise identification. Our experiments on benchmark datasets demonstrate that PruNeRF consistently outperforms state-of-the-art methods in robustness against distractors.
Submitted 2 June, 2024;
originally announced June 2024.
-
Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models
Authors:
Hyun** Seo,
Taewon Kim,
June Yong Yang,
Eunho Yang
Abstract:
Recent advancements in text-attributed graphs (TAGs) have significantly improved the quality of node features by using the textual modeling capabilities of language models. Despite this success, utilizing text attributes to enhance the predefined graph structure remains largely unexplored. Our extensive analysis reveals that conventional edges on TAGs, treated as a single relation (e.g., hyperlinks) in previous literature, actually encompass mixed semantics (e.g., "advised by" and "participates in"). This simplification hinders the representation learning process of Graph Neural Networks (GNNs) on downstream tasks, even when integrated with advanced node features. In contrast, we discover that decomposing these edges into distinct semantic relations significantly enhances the performance of GNNs. Despite this, manually identifying and labeling edges with their corresponding semantic relations is labor-intensive, often requiring domain expertise. To this end, we introduce RoSE (Relation-oriented Semantic Edge-decomposition), a novel framework that leverages the capability of Large Language Models (LLMs) to decompose the graph structure by analyzing raw text attributes in a fully automated manner. RoSE operates in two stages: (1) identifying meaningful relations using an LLM-based generator and discriminator, and (2) categorizing each edge into corresponding relations by analyzing textual contents associated with connected nodes via an LLM-based decomposer. Extensive experiments demonstrate that our model-agnostic framework significantly enhances node classification performance across various datasets, with improvements of up to 16% on the Wisconsin dataset.
Submitted 28 May, 2024;
originally announced May 2024.
-
De Bruijn Polyominoes
Authors:
D. Condon,
Yuxin Wang,
E. Yang
Abstract:
We introduce the notions of de Bruijn polyominoes and prismatic polyominoes, which generalize the notions of de Bruijn sequences and arrays. Given a small fixed polyomino $p$ and a set of colors $[n]$, a de Bruijn polyomino for $(p,n)$ is a colored fixed polyomino $P$ with cells colored from $[n]$ such that every possible coloring of $p$ from $[n]$ exists as a subset of $P$. We call de Bruijn polyominoes for $(p,n)$ of minimum size $(p,n)$-prismatic. We discuss for some values of $p$ and $n$ the shape of a $(p,n)$-prismatic polyomino $P$, the construction of a coloring of $P$, and the enumeration of the colorings of $P$. We find evidence that the difficulty of these problems may depend on the parity of the size of $p$.
Submitted 28 May, 2024;
originally announced May 2024.
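As background for the abstract above, the sketch below builds the classical one-dimensional object it generalizes: a cyclic de Bruijn sequence, in which every word of length n over a k-letter alphabet occurs exactly once as a cyclic window. It uses the standard Lyndon-word concatenation construction, not anything taken from the paper; the function names are illustrative, and note that here k is the alphabet size and n the window length, which differs from the paper's use of $n$ for the number of colors.

```python
def de_bruijn(k, n):
    """Cyclic de Bruijn sequence over the alphabet {0, ..., k-1} for windows of length n."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        # Builds Lyndon words in lexicographic order; words whose length divides n are appended.
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def covers_all_words(seq, k, n):
    """Check that the cyclic windows of length n are exactly the k**n distinct words."""
    windows = {tuple(seq[(i + j) % len(seq)] for j in range(n)) for i in range(len(seq))}
    return len(seq) == k ** n and len(windows) == k ** n


print(de_bruijn(2, 3))                           # [0, 0, 0, 1, 0, 1, 1, 1]
print(covers_all_words(de_bruijn(3, 2), 3, 2))   # True
```

The sequence case is cyclic and one-dimensional; the abstract's polyomino setting replaces the window with an arbitrary fixed shape $p$, which this sketch does not attempt to model.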
-
Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion
Authors:
Pengxiang Lan,
Enneng Yang,
Yuting Liu,
Guibing Guo,
Linying Jiang,
Jianzhe Zhao,
Xingwei Wang
Abstract:
Prompt tuning is a promising method to fine-tune a pre-trained language model without retraining its large-scale parameters. Instead, it attaches a soft prompt to the input text, whereby downstream tasks can be well adapted by merely learning the embeddings of prompt tokens. Nevertheless, existing methods still suffer from two challenges: (i) it is hard to balance accuracy and efficiency. A longer (shorter) soft prompt generally leads to better (worse) accuracy but at the cost of more (less) training time. (ii) The performance may not be consistent when adapting to different downstream tasks. We attribute this to a single embedding space being responsible for the different requirements of downstream tasks. To address these issues, we propose an Efficient Prompt Tuning method (EPT) by multi-space projection and prompt fusion. Specifically, it decomposes a given soft prompt into a shorter prompt and two low-rank matrices, significantly reducing the training time. Accuracy is also enhanced by leveraging low-rank matrices and the short prompt as additional knowledge sources to enrich the semantics of the original short prompt. In addition, we project the soft prompt into multiple subspaces to improve the performance consistency, and then adaptively learn the combination weights of different spaces through a gating network. Experiments on 13 natural language processing downstream tasks show that our method significantly and consistently outperforms 11 comparison methods, with relative improvements of up to 12.9% and training time reduced by 14%.
Submitted 1 July, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection
Authors:
Bhawesh Kumar,
Jonathan Amar,
Eric Yang,
Nan Li,
Yugang Jia
Abstract:
Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table classification task, thereby reducing the noise in the auto-generated labels. We show that fine-tuned PaLM-2 with those labels achieves performance that exceeds the gemini-pro 1.0 and other LLMs. Furthermore, its performance is close to a PaLM-2 fine-tuned on labels obtained from non-expert annotators. Our results show that leveraging LLM-generated labels through powerful models like gemini-pro can potentially serve as a viable strategy for improving LLM performance through fine-tuning in specialized tasks, particularly in domains where expert annotations are scarce, expensive, or time-consuming to obtain.
Submitted 9 May, 2024;
originally announced May 2024.
-
Contextualization with SPLADE for High Recall Retrieval
Authors:
Eugene Yang
Abstract:
High Recall Retrieval (HRR), such as eDiscovery and medical systematic review, is a search problem that optimizes the cost of retrieving most relevant documents in a given collection. Iterative approaches, such as iterative relevance feedback and uncertainty sampling, are shown to be effective under various operational scenarios. Despite neural models demonstrating success in other text-related tasks, linear models such as logistic regression, in general, are still more effective and efficient in HRR since the model is trained and retrieves documents from the same fixed collection. In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors, for HRR. Our approach combines the best of both worlds, leveraging both the contextualization from pretrained language models and the efficiency of linear models. It reduces 10% and 18% of the review cost in two HRR evaluation collections under a one-phase review workflow with a target recall of 80%. The experiment is implemented with TARexp and is available at https://github.com/eugene-yang/LSR-for-TAR.
Submitted 6 May, 2024;
originally announced May 2024.
-
On the Evaluation of Machine-Generated Reports
Authors:
James Mayfield,
Eugene Yang,
Dawn Lawrie,
Sean MacAvaney,
Paul McNamee,
Douglas W. Oard,
Luca Soldaini,
Ian Soboroff,
Orion Weller,
Efsun Kayi,
Kate Sanders,
Marc Mason,
Noah Hibbler
Abstract:
Large Language Models (LLMs) have enabled new ways to satisfy information needs. Although great strides have been made in applying them to settings like document ranking and short-form text generation, they still struggle to compose complete, accurate, and verifiable long-form reports. Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users. In this perspective paper, we draw together opinions from industry and academia, and from a variety of related research areas, to present our vision for automatic report generation, and -- critically -- a flexible framework by which such reports can be evaluated. In contrast with other summarization tasks, automatic report generation starts with a detailed description of an information need, stating the necessary background, requirements, and scope of the report. Further, the generated reports should be complete, accurate, and verifiable. These qualities, which are desirable -- if not required -- in many analytic report-writing settings, require rethinking how to build and evaluate systems that exhibit these qualities. To foster new efforts in building these systems, we present an evaluation framework that draws on ideas found in various evaluations. To test completeness and accuracy, the framework uses nuggets of information, expressed as questions and answers, that need to be part of any high-quality generated report. Additionally, evaluation of citations that map claims made in the report to their source documents ensures verifiability.
Submitted 9 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Language Fairness in Multilingual Information Retrieval
Authors:
Eugene Yang,
Thomas Jänich,
James Mayfield,
Dawn Lawrie
Abstract:
Multilingual information retrieval (MLIR) considers the problem of ranking documents in several languages for a query expressed in a language that may differ from any of those languages. Recent work has observed that approaches such as combining ranked lists representing a single document language each or using multilingual pretrained language models demonstrate a preference for one language over others. This results in systematic unfair treatment of documents in different languages. This work proposes a language fairness metric to evaluate whether documents across different languages are fairly ranked through statistical equivalence testing using the Kruskal-Wallis test. In contrast to most prior work in group fairness, we do not consider any language to be an unprotected group. Thus our proposed measure, PEER (Probability of Equal Expected Rank), is the first fairness metric specifically designed to capture the language fairness of MLIR systems. We demonstrate the behavior of PEER on artificial ranked lists. We also evaluate real MLIR systems on two publicly available benchmarks and show that the PEER scores align with prior analytical findings on MLIR fairness. Our implementation is compatible with ir-measures and is available at http://github.com/hltcoe/peer_measure.
Submitted 1 May, 2024;
originally announced May 2024.
-
Distillation for Multilingual Information Retrieval
Authors:
Eugene Yang,
Dawn Lawrie,
James Mayfield
Abstract:
Recent work in cross-language information retrieval (CLIR), where queries and documents are in different languages, has shown the benefit of the Translate-Distill framework that trains a cross-language neural dual-encoder model using translation and distillation. However, Translate-Distill only supports a single document language. Multilingual information retrieval (MLIR), which ranks a multilingual document collection, is harder to train than CLIR because the model must assign comparable relevance scores to documents in different languages. This work extends Translate-Distill and proposes Multilingual Translate-Distill (MTD) for MLIR. We show that ColBERT-X models trained with MTD outperform their counterparts trained with Multilingual Translate-Train, which is the previous state-of-the-art training approach, by 5% to 25% in nDCG@20 and 15% to 45% in MAP. We also show that the model is robust to the way languages are mixed in training batches. Our implementation is available on GitHub.
Submitted 1 May, 2024;
originally announced May 2024.
-
PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval
Authors:
Dawn Lawrie,
Efsun Kayi,
Eugene Yang,
James Mayfield,
Douglas W. Oard
Abstract:
PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval. PLAID differs from ColBERT by assigning terms to clusters and representing those terms as cluster centroids plus compressed residual vectors. While PLAID is effective in batch experiments, its performance degrades in streaming settings where documents arrive over time because representations of new tokens may be poorly modeled by the earlier tokens used to select cluster centroids. PLAID Streaming Hierarchical Indexing that Runs on Terabytes of Temporal Text (PLAID SHIRTTT) addresses this concern using multi-phase incremental indexing based on hierarchical sharding. Experiments on ClueWeb09 and the multilingual NeuCLIR collection demonstrate the effectiveness of this approach both for the largest collection indexed to date by the ColBERT architecture and in the multilingual setting, respectively.
Submitted 1 May, 2024;
originally announced May 2024.
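The "cluster centroids plus compressed residual vectors" representation mentioned in the PLAID SHIRTTT abstract can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the abstract does not describe the actual compression scheme, so the uniform rounding with a fixed `step` is a stand-in, and the function names and toy vectors are hypothetical.

```python
def nearest_centroid(vec, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to vec."""
    def sq_dist(c):
        return sum((v - x) ** 2 for v, x in zip(vec, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def encode(vec, centroids, step=0.25):
    """Store a vector as (centroid id, integer codes for the quantized residual)."""
    cid = nearest_centroid(vec, centroids)
    # Stand-in compression: round each residual component to the nearest multiple of step.
    codes = [round((v - c) / step) for v, c in zip(vec, centroids[cid])]
    return cid, codes


def decode(cid, codes, centroids, step=0.25):
    """Approximate reconstruction: centroid plus the dequantized residual."""
    return [c + q * step for c, q in zip(centroids[cid], codes)]


# Hypothetical centroids chosen from "earlier" data, and one new token vector.
centroids = [[0.0, 0.0], [1.0, 1.0]]
cid, codes = encode([0.9, 1.2], centroids)
print(cid, codes, decode(cid, codes, centroids))
```

In this toy setting, a vector far from every centroid produces large residual codes, which gives some intuition for why centroids selected from early documents may model later-arriving tokens poorly, the concern the abstract says hierarchical sharding is meant to address.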
-
Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval
Authors:
Eugene Yang,
Suraj Nair,
Dawn Lawrie,
James Mayfield,
Douglas W. Oard,
Kevin Duh
Abstract:
Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora. PSQ is a strong baseline for efficient CLIR using sparse indexing. It is, therefore, useful as the first stage in a cascaded neural CLIR system whose second stage is more effective but too inefficient to be used on its own to search a large text collection. In this reproducibility study, we revisit PSQ by introducing an efficient Python implementation. Unconstrained use of all translation probabilities that can be estimated from aligned parallel text would in the limit assign a weight to every vocabulary term, precluding use of an inverted index to serve queries efficiently. Thus, PSQ's effectiveness and efficiency both depend on how translation probabilities are pruned. This paper presents experiments over a range of modern CLIR test collections to demonstrate that achieving Pareto optimal PSQ effectiveness-efficiency tradeoffs benefits from multi-criteria pruning, which has not been fully explored in prior work. Our Python PSQ implementation is available on GitHub (https://github.com/hltcoe/PSQ) and unpruned translation tables are available on Huggingface Models (https://huggingface.co/hltcoe/psq_translation_tables).
Submitted 29 April, 2024;
originally announced April 2024.
-
Children's Overtrust and Shifting Perspectives of Generative AI
Authors:
Jaemarie Solyst,
Ellia Yang,
Shixian Xie,
Jessica Hammer,
Amy Ogan,
Motahhare Eslami
Abstract:
The capabilities of generative AI (genAI) have dramatically increased in recent times, and there are opportunities for children to leverage new features for personal and school-related endeavors. However, while the future of genAI is taking form, there remain potentially harmful limitations, such as generation of outputs with misinformation and bias. We ran a workshop study focused on ChatGPT to explore middle school girls' (N = 26) attitudes and reasoning about how genAI works. We focused on girls who are often disproportionately impacted by algorithmic bias. We found that: (1) middle school girls were initially overtrusting of genAI, (2) deliberate exposure to the limitations and mistakes of generative AI shifted this overtrust to disillusionment about genAI capabilities, though they were still optimistic for future possibilities of genAI, and (3) their ideas about school policy were nuanced. This work informs how children think about genAI like ChatGPT and its integration in learning settings.
Submitted 29 June, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Extending Translate-Train for ColBERT-X to African Language CLIR
Authors:
Eugene Yang,
Dawn J. Lawrie,
Paul McNamee,
James Mayfield
Abstract:
This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training passages, and ColBERT-X as the retrieval model. Additionally, we present a set of unofficial runs that use an alternative training procedure with a similar training setting.
Submitted 11 April, 2024;
originally announced April 2024.
-
HLTCOE at TREC 2023 NeuCLIR Track
Authors:
Eugene Yang,
Dawn Lawrie,
James Mayfield
Abstract:
The HLTCOE team applied PLAID, an mT5 reranker, and document translation to the TREC 2023 NeuCLIR track. For PLAID we included a variety of models and training techniques -- the English model released with ColBERT v2, translate-train (TT), Translate-Distill (TD), and multilingual translate-train (MTT). TT trains a ColBERT model with English queries and passages automatically translated into the document language from the MS-MARCO v1 collection. This results in three cross-language models for the track, one per language. MTT creates a single model for all three document languages by combining the translations of MS-MARCO passages in all three languages into mixed-language batches. Thus the model learns about matching queries to passages simultaneously in all languages. Distillation uses scores from the mT5 model over non-English translated document pairs to learn how to score query-document pairs. The team submitted runs to all NeuCLIR tasks: the CLIR and MLIR news tasks as well as the technical documents task.
Submitted 11 April, 2024;
originally announced April 2024.
-
Overview of the TREC 2023 NeuCLIR Track
Authors:
Dawn Lawrie,
Sean MacAvaney,
James Mayfield,
Paul McNamee,
Douglas W. Oard,
Luca Soldaini,
Eugene Yang
Abstract:
The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the impact of neural approaches to cross-language information retrieval. The track has created four collections: large collections of Chinese, Persian, and Russian newswire, and a smaller collection of Chinese scientific abstracts. The principal tasks are ranked retrieval of news in one of the three languages, using English topics. Results for a multilingual task, also with English topics but with documents from all three newswire collections, are also reported. New in this second year of the track is a pilot technical documents CLIR task for ranked retrieval of Chinese technical documents using English topics. A total of 220 runs across all tasks were submitted by six participating teams and, as baselines, by track coordinators. Task descriptions and results are presented.
Submitted 11 April, 2024;
originally announced April 2024.
-
Event Detection from Social Media for Epidemic Prediction
Authors:
Tanmay Parekh,
Anh Mac,
Jiarui Yu,
Yuxuan Dong,
Syed Shahriar,
Bonnie Liu,
Eric Yang,
Kuan-Hao Huang,
Wei Wang,
Nanyun Peng,
Kai-Wei Chang
Abstract:
Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by developing a framework to extract and analyze epidemic-related events from social media posts. To this end, we curate an epidemic event ontology comprising seven disease-agnostic event types and construct a Twitter dataset SPEED with human-annotated events focused on the COVID-19 pandemic. Experimentation reveals how ED models trained on COVID-based SPEED can effectively detect epidemic events for three unseen epidemics of Monkeypox, Zika, and Dengue; while models trained on existing ED datasets fail miserably. Furthermore, we show that reporting sharp increases in the extracted events by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox. This utility of our framework lays the foundations for better preparedness against emerging epidemics.
Submitted 24 May, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images
Authors:
JungEun Kim,
Hangyul Yoon,
Geondo Park,
Kyungsu Kim,
Eunho Yang
Abstract:
4D medical images, which represent 3D images with temporal information, are crucial in clinical practice for capturing dynamic changes and monitoring long-term disease progression. However, acquiring 4D medical images poses challenges due to factors such as radiation exposure and imaging duration, necessitating a balance between achieving high temporal resolution and minimizing adverse effects. Given these circumstances, not only is data acquisition challenging, but increasing the frame rate for each dataset also proves difficult. To address this challenge, this paper proposes a simple yet effective Unsupervised Volumetric Interpolation framework, UVI-Net. This framework facilitates temporal interpolation without the need for any intermediate frames, distinguishing it from the majority of other existing unsupervised methods. Experiments on benchmark datasets demonstrate significant improvements across diverse evaluation metrics compared to unsupervised and supervised baselines. Remarkably, our approach achieves this superior performance even when trained with a dataset as small as one, highlighting its exceptional robustness and efficiency in scenarios with sparse supervision. This positions UVI-Net as a compelling alternative for 4D medical imaging, particularly in settings where data availability is limited. The source code is available at https://github.com/jungeun122333/UVI-Net.
Submitted 1 April, 2024;
originally announced April 2024.
-
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Authors:
Sanghyun Jo,
Soohyun Ryu,
Sungyub Kim,
Eunho Yang,
Kyungsu Kim
Abstract:
We identify a critical bias in contemporary CLIP-based models, which we denote as single tag bias. This bias manifests as a disproportionate focus on a singular tag (word) while neglecting other pertinent tags, stemming from CLIP's text embeddings that prioritize one specific tag in image-text relationships. When deconstructing text into individual tags, only one tag tends to have high relevancy with CLIP's image embedding, leading to biased tag relevancy. In this paper, we introduce a novel two-step fine-tuning approach, Text-Tag Self-Distillation (TTD), to address this challenge. TTD first extracts image-relevant tags from text based on their similarity to the nearest pixels then employs a self-distillation strategy to align combined masks with the text-derived mask. This approach ensures the unbiased image-text alignment of the CLIP-based models using only image-text pairs without necessitating additional supervision. Our technique demonstrates model-agnostic improvements in multi-tag classification and segmentation tasks, surpassing competing methods that rely on external resources. The code is available at https://github.com/shjo-april/TTD.
Submitted 20 May, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
Exciton-activated effective phonon magnetic moment in monolayer MoS2
Authors:
Chunli Tang,
Gaihua Ye,
Cynthia Nnokwe,
Mengqi Fang,
Li Xiang,
Masoud Mahjouri-Samani,
Dmitry Smirnov,
Eui-Hyeok Yang,
Tingting Wang,
Lifa Zhang,
Rui He,
Wencan **
Abstract:
Optical excitation of chiral phonons plays a vital role in studying the phonon-driven magnetic phenomena in solids. Transition metal dichalcogenides host chiral phonons at high symmetry points of the Brillouin zone, providing an ideal platform to explore the interplay between chiral phonons and valley degree of freedom. Here, we investigate the helicity-resolved magneto-Raman response of monolayer MoS2 and identify a doubly degenerate Brillouin-zone-center chiral phonon mode at ~270 cm$^{-1}$. Our wavelength- and temperature-dependent measurements show that this chiral phonon is activated through the resonant excitation of A exciton. Under an out-of-plane magnetic field, the chiral phonon exhibits giant Zeeman splitting, which corresponds to an effective magnetic moment of ~2.5 $\mu_B$. Moreover, we carry out theoretical calculations based on the morphic effects in nonmagnetic crystals, which reproduce the linear Zeeman splitting and Raman cross-section of the chiral phonon. Our study provides important insights into lifting the chiral phonon degeneracy in an achiral covalent material, paving a new route to excite and control chiral phonons.
Submitted 7 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
GT-Rain Single Image Deraining Challenge Report
Authors:
Howard Zhang,
Yunhao Ba,
Ethan Yang,
Rishi Upadhyay,
Alex Wong,
Achuta Kadambi,
Yun Guo,
Xueyao Xiao,
Xiaoxiong Wang,
Yi Li,
Yi Chang,
Luxin Yan,
Chaochao Zheng,
Lu** Wang,
Bin Liu,
Sunder Ali Khowaja,
Jiseok Yoon,
Ik-Hyun Lee,
Zhao Zhang,
Yanyan Wei,
Jiahuan Ren,
Suiyi Zhao,
Huan Zheng
Abstract:
This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real-world scenarios, provide a novel real-world rainy image dataset, and spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Each scene in GT-Rain comprises a real rainy image and a ground-truth image captured moments after the rain had stopped. 275 participants were registered in the challenge and 55 competed in the final testing phase.
Submitted 18 March, 2024;
originally announced March 2024.
-
Repeated Padding as Data Augmentation for Sequential Recommendation
Authors:
Yizhou Dang,
Yuting Liu,
Enneng Yang,
Guibing Guo,
Linying Jiang,
Xingwei Wang,
Jianzhe Zhao
Abstract:
Sequential recommendation aims to provide users with personalized suggestions based on their historical interactions. When training sequential models, padding is a widely adopted technique for two main reasons: 1) The vast majority of models can only handle fixed-length sequences; 2) Batching-based training needs to ensure that the sequences in each batch have the same length. The special value 0 is usually used as the padding content, which does not contain the actual information and is ignored in the model calculations. This common-sense padding strategy leads us to a problem that has never been explored before: Can we fully utilize this idle input space by padding other content to further improve model performance and training efficiency?
In this paper, we propose a simple yet effective padding method called Repeated Padding (RepPad). Specifically, we use the original interaction sequences as the padding content and fill it to the padding positions during model training. This operation can be performed a finite number of times or repeated until the input sequences' length reaches the maximum limit. Our RepPad can be viewed as a sequence-level data augmentation strategy. Unlike most existing works, our method contains no trainable parameters or hyperparameters and is a plug-and-play data augmentation operation. Extensive experiments on various categories of sequential models and five real-world datasets demonstrate the effectiveness and efficiency of our approach. The average recommendation performance improvement is up to 60.3% on GRU4Rec and 24.3% on SASRec. We also provide in-depth analysis and explanation of what makes RepPad effective from multiple perspectives. The source code will be released to ensure the reproducibility of our experiments.
Submitted 10 March, 2024;
originally announced March 2024.
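The padding idea in the RepPad abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `zero_pad` and `rep_pad`, the choice of left padding, the use of whole copies only, and the absence of any separator between copies are all assumptions made for illustration.

```python
from typing import List, Optional

PAD = 0  # the padding value named in the abstract


def zero_pad(seq: List[int], max_len: int) -> List[int]:
    """Conventional padding: keep the most recent items, fill the rest with PAD."""
    seq = seq[-max_len:]
    return [PAD] * (max_len - len(seq)) + seq


def rep_pad(seq: List[int], max_len: int, max_repeats: Optional[int] = None) -> List[int]:
    """Fill the idle positions with copies of the sequence itself (assumed layout).

    max_repeats limits the number of extra copies; None repeats until no
    further whole copy fits within max_len.
    """
    seq = seq[-max_len:]
    if not seq:
        return [PAD] * max_len
    copies = max_len // len(seq)  # whole copies that fit, original included
    if max_repeats is not None:
        copies = min(copies, 1 + max_repeats)
    body = seq * copies
    # Any space too small for another whole copy is still zero-padded.
    return [PAD] * (max_len - len(body)) + body


if __name__ == "__main__":
    print(zero_pad([1, 2, 3], 8))
    print(rep_pad([1, 2, 3], 8))
```

With `max_repeats=0` the sketch reduces to conventional zero padding, which makes it easy to compare the two strategies on the same data loader.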
-
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
Authors:
June Yong Yang,
Byeongwook Kim,
Jeongin Bae,
Beomseok Kwon,
Gunho Park,
Eunho Yang,
Se Jung Kwon,
Dongsoo Lee
Abstract:
Key-Value (KV) Caching has become an essential technique for accelerating the inference speed and throughput of generative Large Language Models (LLMs). However, the memory footprint of the KV cache poses a critical bottleneck in LLM deployment as the cache size grows with batch size and sequence length, often surpassing even the size of the model itself. Although recent methods were proposed to select and evict unimportant KV pairs from the cache to reduce memory consumption, the potential ramifications of eviction on the generative process are yet to be thoroughly examined. In this paper, we examine the detrimental impact of cache eviction and observe that unforeseen risks arise as the information contained in the KV pairs is exhaustively discarded, resulting in safety breaches, hallucinations, and context loss. Surprisingly, we find that preserving even a small amount of information contained in the evicted KV pairs via reduced precision quantization substantially recovers the incurred degradation. On the other hand, we observe that the important KV pairs must be kept at a relatively higher precision to safeguard the generation quality. Motivated by these observations, we propose Mixed-precision KV cache (MiKV), a reliable cache compression method that simultaneously preserves the context details by retaining the evicted KV pairs in low-precision and ensures generation quality by keeping the important KV pairs in high-precision. Experiments on diverse benchmarks and LLM backbones show that our proposed method offers a state-of-the-art trade-off between compression ratio and performance, compared to other baselines.
Submitted 28 February, 2024;
originally announced February 2024.
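The mixed-precision idea in the MiKV abstract above can be sketched as follows. This is an illustrative toy, not the paper's method: the importance scores, the `keep` budget, the 2-bit default, and the per-vector min-max uniform quantizer are all assumptions, and the abstract does not specify how importance is measured. The sketch also stores dequantized floats for readability, whereas a real cache would store the integer codes to save memory.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], bits: int) -> List[float]:
    """Uniform min-max quantization, returned in dequantized form (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    levels = (1 << bits) - 1          # number of steps between lo and hi
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]


def mixed_precision_cache(
    vectors: Sequence[Sequence[float]],
    importance: Sequence[float],      # hypothetical per-token importance scores
    keep: int,                        # how many tokens stay at full precision
    low_bits: int = 2,                # assumed bit-width for the remaining tokens
) -> List[List[float]]:
    """Keep the `keep` most important vectors intact; quantize the rest
    instead of evicting them."""
    order = sorted(range(len(vectors)), key=lambda i: importance[i], reverse=True)
    important = set(order[:keep])
    return [
        list(v) if i in important else quantize(v, low_bits)
        for i, v in enumerate(vectors)
    ]
```

The contrast with eviction is that every token still contributes an approximate vector, which is the property the abstract credits for recovering the degradation.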
-
Cross-domain Chinese Sentence Pattern Parsing
Authors:
**gsi Yu,
Cunliang Kong,
Liner Yang,
Meishan Zhang,
Lin Zhu,
Yujie Wang,
Haozhe Lin,
Maosong Sun,
Erhong Yang
Abstract:
Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching. Existing SPS parsers rely heavily on textbook corpora for training, lacking cross-domain capability. To overcome this constraint, this paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework. Partial syntactic rules from a source domain are combined with target domain sentences to dynamically generate training data, enhancing the adaptability of the parser to diverse domains. Experiments conducted on textbook and news domains demonstrate the effectiveness of the proposed method, outperforming rule-based baselines by 1.68 points on F1 metrics.
Submitted 7 April, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Demonstration of 3 V Programmable Josephson Junction Arrays Using Non-Integer-Multiple Logic
Authors:
Wenhui Cao,
Erkun Yang,
**** Li,
Huan Qiao,
Yuan Zhong,
Qing Zhong,
Da Xu,
Xueshen Wang,
Xiaolong Xu,
Shijian Wang,
Jian Chen
Abstract:
This article demonstrates a new kind of programmable logic for the representation of an integer that can be used for the programmable Josephson voltage standard. It can enable the numbers of junctions in most bits to be variable integer values, which is different from normal binary logic or ternary logic. Consequently, missing junctions due to superconducting short circuits can be tolerated under this logic. This logic can also have nearly the same segmentation efficiency as ternary logic. The completeness of the sequences using this logic is proven by the recursive method in mathematics in this paper. After that, a new algorithm for the representation of integers is presented according to the proven process, and an analysis of the number of fault-tolerant junctions for each bit is provided. Although the first and second bits are not tolerant to missing junctions, bits beyond these can tolerate one to hundreds of missing junctions. Due to the non-fixed multiples between the bits of the sequence, this logic is called non-integer-multiple logic. Finally, the design and fabrication of a 3 V programmable Josephson junction array using this logic are described, and the measurements and analysis of the characteristic parameters are presented.
Submitted 25 February, 2024;
originally announced February 2024.
-
From Text to CQL: Bridging Natural Language and Corpus Search Engine
Authors:
Luming Lu,
Jiyuan An,
Yujie Wang,
Liner Yang,
Cunliang Kong,
Zhenghao Liu,
Shuo Wang,
Haozhe Lin,
Mingwei Fang,
Ya** Huang,
Erhong Yang
Abstract:
Natural Language Processing (NLP) technologies have revolutionized the way we interact with information systems, with a significant focus on converting natural language queries into formal query languages such as SQL. However, less emphasis has been placed on the Corpus Query Language (CQL), a critical tool for linguistic research and detailed analysis within text corpora. The manual construction of CQL queries is a complex and time-intensive task that requires a great deal of expertise, which presents a notable challenge for both researchers and practitioners. This paper presents the first text-to-CQL task that aims to automate the translation of natural language into CQL. We present a comprehensive framework for this task, including a specifically curated large-scale dataset and methodologies leveraging large language models (LLMs) for effective text-to-CQL task. In addition, we established advanced evaluation metrics to assess the syntactic and semantic accuracy of the generated queries. We created innovative LLM-based conversion approaches and detailed experiments. The results demonstrate the efficacy of our methods and provide insights into the complexities of text-to-CQL task.
Submitted 21 February, 2024;
originally announced February 2024.
-
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models
Authors:
Yang Liu,
Meng Xu,
Shuo Wang,
Liner Yang,
Haoyu Wang,
Zhenghao Liu,
Cunliang Kong,
Yun Chen,
Yang Liu,
Maosong Sun,
Erhong Yang
Abstract:
Modern large language models (LLMs) should generally benefit individuals from various cultural backgrounds around the world. However, most recent advanced generative evaluation benchmarks tailored for LLMs mainly focus on English. To this end, we introduce OMGEval, the first Open-source Multilingual Generative test set that can assess the capability of LLMs in different languages. For each language, OMGEval provides 804 open-ended questions, covering a wide range of important capabilities of LLMs, such as general knowledge, logical reasoning, and so on. Each question is rigorously verified by human annotators. Notably, to sufficiently reflect the compatibility of LLMs in different cultural backgrounds, we perform localization for each non-English language. Specifically, the current version of OMGEval includes 5 languages (i.e., Zh, Ru, Fr, Es, Ar). Following AlpacaEval, we employ GPT-4 as the adjudicator to automatically score different model outputs, which is shown to be closely related to human evaluation. We evaluate several representative multilingual LLMs on the proposed OMGEval, which we believe will provide a valuable reference for the community to further understand and improve the multilingual capability of LLMs. OMGEval is available at https://github.com/blcuicall/OMGEval.
Submitted 20 February, 2024;
originally announced February 2024.
-
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning
Authors:
Gyeongman Kim,
Doohyuk Jang,
Eunho Yang
Abstract:
Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models like LLMs is relatively sparse, and the approach of distilling student-friendly knowledge, which has shown promising performance in KD for classification models, remains unexplored in generative language models. To explore this approach, we propose PromptKD, a simple yet effective method that utilizes prompt tuning - for the first time in KD - to enable generative language models to transfer student-friendly knowledge. Unlike previous works in classification that require fine-tuning the entire teacher model for extracting student-friendly knowledge, PromptKD achieves similar effects by adding a small number of prompt tokens and tuning only the prompt with student guidance. Extensive experiments on instruction-following datasets show that PromptKD achieves state-of-the-art performance while adding only 0.0007% of the teacher's parameters as prompts. Further analysis suggests that distilling student-friendly knowledge alleviates exposure bias effectively throughout the entire training process, leading to performance enhancements.
Submitted 24 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Knowledge Distillation Based on Transformed Teacher Matching
Authors:
Kaixiang Zheng,
En-Hui Yang
Abstract:
As a technique to bridge logit matching and probability distribution matching, temperature scaling plays a pivotal role in knowledge distillation (KD). Conventionally, temperature scaling is applied to both teacher's logits and student's logits in KD. Motivated by some recent works, in this paper, we instead drop temperature scaling on the student side, and systematically study the resulting variant of KD, dubbed transformed teacher matching (TTM). By reinterpreting temperature scaling as a power transform of probability distribution, we show that in comparison with the original KD, TTM has an inherent Rényi entropy term in its objective function, which serves as an extra regularization term. Extensive experimental results demonstrate that thanks to this inherent regularization, TTM leads to trained students with better generalization than the original KD. To further enhance student's capability to match teacher's power transformed probability distribution, we introduce a sample-adaptive weighting coefficient into TTM, yielding a novel distillation approach dubbed weighted TTM (WTTM). It is shown, by comprehensive experiments, that although WTTM is simple, it is effective, improves upon TTM, and achieves state-of-the-art accuracy performance. Our source code is available at https://github.com/zkxufo/TTM.
Submitted 7 March, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
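The difference between conventional KD and the TTM variant described in the abstract above can be made concrete with a small sketch. It is not the authors' code (their repository is linked in the abstract): the function names are illustrative, the loss is written as a plain KL divergence for a single sample, and the $T^2$ scaling factor often used in KD is omitted. The sample-adaptive weighting of WTTM is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def power_transform(probs: Sequence[float], gamma: float) -> List[float]:
    """Raise each probability to the power gamma and renormalize."""
    powered = [p ** gamma for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(teacher_logits, student_logits, temperature: float) -> float:
    """Conventional KD: temperature applied to both teacher and student."""
    return kl(softmax(teacher_logits, temperature), softmax(student_logits, temperature))


def ttm_loss(teacher_logits, student_logits, temperature: float) -> float:
    """TTM as described in the abstract: temperature on the teacher side only."""
    return kl(softmax(teacher_logits, temperature), softmax(student_logits, 1.0))
```

The power-transform view follows from the identity that a softmax at temperature $T$ is proportional to the $1/T$ power of the softmax at temperature 1, so `softmax(v, T)` and `power_transform(softmax(v), 1 / T)` agree up to floating-point error.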
-
The Application of ChatGPT in Responding to Questions Related to the Boston Bowel Preparation Scale
Authors:
Xiaoqiang Liu,
Yubin Wang,
Zicheng Huang,
Boming Xu,
Yilin Zeng,
Xinqi Chen,
Zilong Wang,
Enning Yang,
Xiaoxuan Lei,
Yisen Huang,
Xiaobo Liu
Abstract:
Background: Colonoscopy, a crucial diagnostic tool in gastroenterology, depends heavily on superior bowel preparation. ChatGPT, a large language model with emergent intelligence, also exhibits potential in medical applications. This study aims to assess the accuracy and consistency of ChatGPT in using the Boston Bowel Preparation Scale (BBPS) for colonoscopy assessment. Methods: We retrospectively collected 233 colonoscopy images from 2020 to 2023. These images were evaluated using the BBPS by 3 senior endoscopists and 3 novice endoscopists. ChatGPT also assessed these images, having been divided into three groups and undergone specific fine-tuning. Consistency was evaluated through two rounds of testing. Results: In the initial round, ChatGPT's accuracy varied between 48.93% and 62.66%, trailing the endoscopists' accuracy of 76.68% to 77.83%. Kappa values for ChatGPT were between 0.52 and 0.53, compared to 0.75 to 0.87 for the endoscopists. Conclusion: While ChatGPT shows promise in bowel preparation scoring, it currently does not match the accuracy and consistency of experienced endoscopists. Future research should focus on in-depth fine-tuning.
Submitted 13 February, 2024;
originally announced February 2024.
-
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
Authors:
Fangru Lin,
Emanuele La Malfa,
Valentin Hofmann,
Elle Michelle Yang,
Anthony Cohn,
Janet B. Pierrehumbert
Abstract:
Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents. Our code and data are available at https://github.com/fangru-lin/graph-llm-asynchow-plan.
Submitted 3 June, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Representation Surgery for Multi-Task Model Merging
Authors:
Enneng Yang,
Li Shen,
Zhenyi Wang,
Guibing Guo,
Xiaojun Chen,
Xingwei Wang,
Dacheng Tao
Abstract:
Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training, greatly expanding the application scenarios of MTL. However, by visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias. That is, there is a significant discrepancy in the representation distribution between the merged and individual models, resulting in poor performance of merged MTL. In this paper, we propose a representation surgery solution called "Surgery" to reduce representation bias in the merged model. Specifically, Surgery is a lightweight task-specific module that takes the representation of the merged model as input and attempts to output the biases contained in the representation from the merged model. We then designed an unsupervised optimization objective that updates the Surgery module by minimizing the distance between the merged model's representation and the individual model's representation. Extensive experiments demonstrate significant MTL performance improvements when our Surgery module is applied to state-of-the-art (SOTA) model merging schemes.
Submitted 28 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
TEDDY: Trimming Edges with Degree-based Discrimination strategY
Authors:
Hyun** Seo,
Jihun Yun,
Eunho Yang
Abstract:
Since the pioneering work on the lottery ticket hypothesis for graph neural networks (GNNs) was proposed in Chen et al. (2021), the study on finding graph lottery tickets (GLT) has become one of the pivotal focuses in the GNN community, inspiring researchers to discover sparser GLT while achieving comparable performance to original dense networks. In parallel, the graph structure has gained substantial attention as a crucial factor in GNN training dynamics, also elucidated by several recent studies. Despite this, contemporary studies on GLT, in general, have not fully exploited inherent pathways in the graph structure and identified tickets in an iterative manner, which is time-consuming and inefficient. To address these limitations, we introduce TEDDY, a one-shot edge sparsification framework that leverages structural information by incorporating edge-degree information. Following edge sparsification, we encourage the parameter sparsity during training via simple projected gradient descent on the $\ell_0$ ball. Given the target sparsity levels for both the graph structure and the model parameters, our TEDDY facilitates efficient and rapid realization of GLT within a single training. Remarkably, our experimental results demonstrate that TEDDY significantly surpasses conventional iterative approaches in generalization, even when conducting one-shot sparsification that solely utilizes graph structures, without taking feature information into account.
Submitted 15 March, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
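The TEDDY abstract mentions projected gradient descent on the $\ell_0$ ball. As a rough illustration only, the sketch below shows the standard Euclidean projection onto the set of vectors with at most k non-zero entries (keep the k largest-magnitude entries, zero the rest), followed by one projected step. The function names `project_l0` and `pgd_step`, the flat list-of-floats parameter layout, and the plain gradient update are assumptions for illustration and are not taken from the paper.

```python
def project_l0(weights, k):
    """Project onto {w : ||w||_0 <= k} by keeping the k largest-magnitude entries."""
    if k <= 0:
        return [0.0 for _ in weights]
    if k >= len(weights):
        return list(weights)
    # Indices of the k largest |w_i|; sorted() is stable, so ties keep the lower index.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def pgd_step(weights, grads, lr, k):
    """One projected-gradient step: plain gradient update, then l0 projection."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return project_l0(updated, k)


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to exercise the functions.
    print(project_l0([0.5, -2.0, 0.1, 1.5], 2))
    print(pgd_step([1.0, 0.2, -0.7], [0.5, 0.0, -1.0], lr=0.1, k=1))
```

The projection is applied after every update, so the parameter vector never exceeds the target number of non-zeros during training.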
-
Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information
Authors:
Linfeng Ye,
Shayan Mohajer Hamidi,
Renhao Tan,
En-Hui Yang
Abstract:
It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate for the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of conditional mutual information (CMI) into the estimation of BCPD and propose a novel estimator called the maximum CMI (MCMI) method. Specifically, in MCMI estimation, both the log-likelihood and CMI of the teacher are simultaneously maximized when the teacher is trained. Through Eigen-CAM, it is further shown that maximizing the teacher's CMI value allows the teacher to capture more contextual information in an image cluster. Via conducting a thorough set of experiments, we show that by employing a teacher trained via MCMI estimation rather than one trained via MLL estimation in various state-of-the-art KD frameworks, the student's classification accuracy consistently increases, with the gain of up to 3.32\%. This suggests that the teacher's BCPD estimate provided by MCMI method is more accurate than that provided by MLL method. In addition, we show that such improvements in the student's accuracy are more drastic in zero-shot and few-shot settings. Notably, the student's accuracy increases with the gain of up to 5.72\% when 5\% of the training samples are available to the student (few-shot), and increases from 0\% to as high as 84\% for an omitted class (zero-shot). The code is available at \url{https://github.com/iclr2024mcmi/ICLRMCMI}.
Submitted 7 March, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
AdaFed: Fair Federated Learning via Adaptive Common Descent Direction
Authors:
Shayan Mohajer Hamidi,
En-Hui Yang
Abstract:
Federated learning (FL) is a promising technology via which some edge devices/clients collaboratively train a machine learning model orchestrated by a server. Learning an unfair model is known as a critical problem in federated learning, where the trained model may unfairly advantage or disadvantage some of the devices. To tackle this problem, in this work, we propose AdaFed. The goal of AdaFed is to find an updating direction for the server along which (i) all the clients' loss functions are decreasing; and (ii) more importantly, the loss functions for the clients with larger values decrease with a higher rate. AdaFed adaptively tunes this common direction based on the values of local gradients and loss functions. We validate the effectiveness of AdaFed on a suite of federated datasets, and demonstrate that AdaFed outperforms state-of-the-art fair FL methods.
Submitted 10 January, 2024;
originally announced January 2024.
-
Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation
Authors:
Eugene Yang,
Dawn Lawrie,
James Mayfield,
Douglas W. Oard,
Scott Miller
Abstract:
Prior work on English monolingual retrieval has shown that a cross-encoder trained using a large number of relevance judgments for query-document pairs can be used as a teacher to train more efficient, but similarly effective, dual-encoder student models. Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR), where queries and documents are in different languages, is challenging due to the lack of a sufficiently large training collection when the query and document languages differ. The state of the art for CLIR thus relies on translating queries, documents, or both from the large English MS MARCO training set, an approach called Translate-Train. This paper proposes an alternative, Translate-Distill, in which knowledge distillation from either a monolingual cross-encoder or a CLIR cross-encoder is used to train a dual-encoder CLIR student model. This richer design space enables the teacher model to perform inference in an optimized setting, while training the student model directly for CLIR. Trained models and artifacts are publicly available on Huggingface.
Submitted 9 January, 2024;
originally announced January 2024.
-
Stellar Flares Are Far-Ultraviolet Luminous
Authors:
Vera L. Berger,
Jason T. Hinkle,
Michael A. Tucker,
Benjamin J. Shappee,
Jennifer L. van Saders,
Daniel Huber,
Jeffrey W. Reep,
Xudong Sun,
Kai E. Yang
Abstract:
We identify 182 flares on 158 stars within 100 pc of the Sun in both the near-ultraviolet (NUV: 1750-2750 Å) and far-ultraviolet (FUV: 1350-1750 Å) using high-cadence light curves from the Galaxy Evolution Explorer (GALEX). Ultraviolet (UV) emission from stellar flares plays a crucial role in determining the habitability of exoplanetary systems. However, whether such UV emission promotes or threatens such life depends strongly on the energetics of these flares. Most studies assessing the effect of flares on planetary habitability assume a 9000 K blackbody spectral energy distribution that produces more NUV flux than FUV flux ($R \equiv F_{\rm FUV} / F_{\rm NUV} \approx \frac{1}{6}$). Instead, we observe the opposite with the excess FUV reaching $R \approx \frac{1}{2} - 2$, roughly $3-12$ times the expectation of a 9000 K blackbody. The ratio of FUV to NUV time-integrated flare energies is 3.0 times higher on average than would be predicted by a constant 9000 K blackbody during the flare. Finally, we find that the FUV/NUV ratio at peak tentatively correlates ($\sim 2 σ$ significance) both with total UV flare energy and with the G - RP color of the host star. On average, we observe higher FUV/NUV ratios at peak in $E_{\text{UV}}>10^{32}$ erg flares and in flares on fully convective stars.
Submitted 19 December, 2023;
originally announced December 2023.
-
Laguerre inequalities and determinantal inequalities for the finite difference of the partition functions
Authors:
Eve Y. Y. Yang
Abstract:
The paper aims to establish the Turán inequalities, the Laguerre inequalities (order $2$), and the determinantal inequalities (order $3$) for $Δp(n)$ and $Δ\bar{p}(n)$, where $Δf(n)$ is the first-order forward difference of a sequence $f(n)$. The functions $p(n)$ and $\bar{p}(n)$ denote the partition function and overpartition function, respectively. Conjectures for thresholds of Laguerre inequalities (order $m$) and positivity of $m$-order determinants are proposed, extending to $Δ^k p(n)$ and $Δ^k \bar{p}(n)$, with $1 \leq m \leq 11$ and $1 \leq k \leq 5$.
Submitted 17 December, 2023;
originally announced December 2023.
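To make the objects in the preceding abstract concrete, the sketch below computes the partition numbers $p(n)$ with Euler's pentagonal number recurrence, forms the forward difference $Δp(n) = p(n+1) - p(n)$, and lists the indices at which the Turán inequality $f(n)^2 \geq f(n-1) f(n+1)$ fails. The helper names are hypothetical, and the code only checks finitely many terms numerically; it does not reproduce the paper's proofs or thresholds.

```python
def partition_numbers(n_max):
    """Return [p(0), ..., p(n_max)] using Euler's pentagonal number recurrence."""
    p = [0] * (n_max + 1)
    p[0] = 1
    for n in range(1, n_max + 1):
        total = 0
        k = 1
        while True:
            g1 = k * (3 * k - 1) // 2  # generalized pentagonal number for +k
            if g1 > n:
                break
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - g1]
            g2 = k * (3 * k + 1) // 2  # generalized pentagonal number for -k
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p


def forward_difference(seq):
    """Return the first-order forward difference: seq[n+1] - seq[n]."""
    return [b - a for a, b in zip(seq, seq[1:])]


def turan_failures(seq):
    """Interior indices n where seq[n]^2 < seq[n-1] * seq[n+1]."""
    return [n for n in range(1, len(seq) - 1)
            if seq[n] ** 2 < seq[n - 1] * seq[n + 1]]


if __name__ == "__main__":
    delta_p = forward_difference(partition_numbers(200))
    # Indices below 200 at which the Turan inequality fails for the difference sequence.
    print(turan_failures(delta_p))
```

Python integers are arbitrary precision, so the products stay exact even as $p(n)$ grows quickly.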
-
Chiral symmetry breaking and topological charge of graphene nanoribbons
Authors:
Hyun Cheol Lee,
S. -R. Eric Yang
Abstract:
We explore the edge properties of rectangular graphene nanoribbons featuring two zigzag edges and two armchair edges. Although the self-consistent Hartree-Fock fields break chiral symmetry, our work demonstrates that graphene nanoribbons maintain their status as short-range entangled symmetry-protected topological insulators. The relevant symmetry involves combined mirror and time-reversal operations. In undoped ribbons displaying edge ferromagnetism, the band gap edge states with a topological charge form on the zigzag edges. An analysis of the anomalous continuity equation elucidates that this topological charge is induced by the gap term. In low-doped zigzag ribbons, where the ground state exhibits edge spin density waves, this topological charge appears as a nearly zero-energy edge mode. Our system is outside the conventional classification for topological insulators.
Submitted 22 March, 2024; v1 submitted 9 December, 2023;
originally announced December 2023.
-
Enhancing Diffusion Models with 3D Perspective Geometry Constraints
Authors:
Rishi Upadhyay,
Howard Zhang,
Yunhao Ba,
Ethan Yang,
Blake Gella,
Sicheng Jiang,
Alex Wong,
Achuta Kadambi
Abstract:
While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are capable of outputting a wide gamut of possible images, it is difficult for these synthesized images to adhere to the principles of linear perspective. We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy. We show that outputs of models trained with this constraint both appear more realistic and improve performance of downstream models trained on generated images. Subjective human trials show that images generated with latent diffusion models trained with our constraint are preferred over images from the Stable Diffusion V2 model 70% of the time. SOTA monocular depth estimation models such as DPT and PixelFormer, fine-tuned on our images, outperform the original models trained on real images by up to 7.03% in RMSE and 19.3% in SqRel on the KITTI test set for zero-shot transfer.
Submitted 1 December, 2023;
originally announced December 2023.
-
ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation
Authors:
Yuting Liu,
Enneng Yang,
Yizhou Dang,
Guibing Guo,
Qiang Liu,
Yuliang Liang,
Linying Jiang,
Xingwei Wang
Abstract:
Multimodal recommendation aims to model user and item representations comprehensively with the involvement of multimedia content for effective recommendations. Existing research has shown that it is beneficial for recommendation performance to combine (user- and item-) ID embeddings with multimodal salient features, indicating the value of IDs. However, there is a lack of a thorough analysis of the ID embeddings in terms of feature semantics in the literature. In this paper, we revisit the value of ID embeddings for multimodal recommendation and conduct a thorough study regarding its semantics, which we recognize as subtle features of \emph{content} and \emph{structure}. Based on our findings, we propose a novel recommendation model by incorporating ID embeddings to enhance the salient features of both content and structure. Specifically, we put forward a hierarchical attention mechanism to incorporate ID embeddings in modality fusing, coupled with contrastive learning, to enhance content representations. Meanwhile, we propose a lightweight graph convolution network for each modality to amalgamate neighborhood and ID embeddings for improving structural representations. Finally, the content and structure representations are combined to form the ultimate item embedding for recommendation. Extensive experiments on three real-world datasets (Baby, Sports, and Clothing) demonstrate the superiority of our method over state-of-the-art multimodal recommendation methods and the effectiveness of fine-grained ID embeddings. Our code is available at https://anonymous.4open.science/r/IDSF-code/.
Submitted 22 May, 2024; v1 submitted 10 November, 2023;
originally announced November 2023.
-
Face-StyleSpeech: Improved Face-to-Voice latent mapping for Natural Zero-shot Speech Synthesis from a Face Image
Authors:
Minki Kang,
Wooseok Han,
Eunho Yang
Abstract:
Generating a voice from a face image is crucial for developing virtual humans capable of interacting using their unique voices, without relying on pre-recorded human speech. In this paper, we propose Face-StyleSpeech, a zero-shot Text-To-Speech (TTS) synthesis model that generates natural speech conditioned on a face image rather than reference speech. We hypothesize that learning both speaker identity and prosody from a face image poses a significant challenge. To address the issue, our TTS model incorporates both a face encoder and a prosody encoder. The prosody encoder is specifically designed to model prosodic features that are not captured only with a face image, allowing the face encoder to focus solely on capturing the speaker identity from the face image. Experimental results demonstrate that Face-StyleSpeech effectively generates more natural speech from a face image than baselines, even for face images on which the model has not been trained. Samples are at our demo page https://face-stylespeech.github.io.
Submitted 25 September, 2023;
originally announced November 2023.
-
A Possible Mechanism for "Late Phase" in Stellar White-Light Flares
Authors:
Kai E. Yang,
Xudong Sun,
Graham S. Kerr,
Hugh S. Hudson
Abstract:
M-dwarf flares observed by the \textit{Transiting Exoplanet Survey Satellite} (\textit{TESS}) sometimes exhibit a "peak-bump" light-curve morphology, characterized by a secondary, gradual peak well after the main, impulsive peak. A similar "late phase" is frequently detected in solar flares observed in the extreme-ultraviolet from longer hot coronal loops distinct from the impulsive flare structures. White-light emission has also been observed in off-limb solar flare loops. Here, we perform a suite of one-dimensional hydrodynamic loop simulations for M-dwarf flares inspired by these solar examples. Our results suggest that coronal plasma condensation following impulsive flare heating can yield high electron number density in the loop, allowing it to contribute significantly to the optical light curves via free-bound and free-free emission mechanisms. Our simulation results qualitatively agree with \textit{TESS} observations: the longer evolutionary time scale of coronal loops produces a distinct, secondary emission peak; its intensity increases with the injected flare energy. We argue that coronal plasma condensation is a possible mechanism for the \textit{TESS} late-phase flares.
Submitted 30 October, 2023;
originally announced October 2023.
-
Results and Limits of Time Division Multiplexing for the BICEP Array High Frequency Receivers
Authors:
S. Fatigoni,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
D. Beck,
J. J. Bock,
V. Buza,
J. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. J. Cukierman,
E. V. Denison,
M. I. Dierickx,
L. Duband,
M. Eiben,
J. P. Filippini,
A. Fortes,
M. Gao,
C. Giannakopoulos,
N. Goeckner-Wald,
D. C. Goldfinger
, et al. (62 additional authors not shown)
Abstract:
Time-Division Multiplexing is the readout architecture of choice for many ground and space experiments, as it is a very mature technology with proven outstanding low-frequency noise stability, which represents a central challenge in multiplexing. Once fully populated, each of the two BICEP Array high frequency receivers, observing at 150GHz and 220/270GHz, will have 7776 TES detectors tiled on the focal plane. The constraints set by these two receivers required a redesign of the warm readout electronics. The new version of the standard Multi Channel Electronics, developed and built at the University of British Columbia, is presented here for the first time. BICEP Array operates Time Division Multiplexing readout technology to the limits of its capabilities in terms of multiplexing rate, noise and crosstalk, and applies them in a rigorously demanding scientific application requiring extreme noise performance and systematic error control. Future experiments like CMB-S4 plan to use TES bolometers with Time Division/SQUID-based readout for an even larger number of detectors.
Submitted 24 October, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Mutual information and correlations across topological phase transitions in topologically ordered graphene zigzag nanoribbons
Authors:
In-Hwan Lee,
Hoang-Anh Le,
S. -R. Eric Yang
Abstract:
Graphene zigzag nanoribbons, initially in a topologically ordered state, undergo a topological phase transition into crossover phases distinguished by quasi-topological order. We computed mutual information for both the topologically ordered phase and its crossover phases, revealing the following results: (i) In the topologically ordered phase, A-chirality carbon lines strongly entangle with B-chirality carbon lines on the opposite side of the zigzag ribbon. This entanglement persists but weakens in crossover phases. (ii) The upper zigzag edge entangles with non-edge lines of different chirality on the opposite side of the ribbon. (iii) Entanglement increases as more carbon lines are grouped together, regardless of the lines' chirality. No long-range entanglement was found in the symmetry-protected phase in the absence of disorder.
Submitted 21 October, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
AdaMerging: Adaptive Model Merging for Multi-Task Learning
Authors:
Enneng Yang,
Zhenyi Wang,
Li Shen,
Shiwei Liu,
Guibing Guo,
Xingwei Wang,
Dacheng Tao
Abstract:
Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to a significant deterioration in the overall performance of the merged model. This decline occurs due to potential conflicts and intricate correlations among the multiple tasks. Consequently, the challenge emerges of how to merge pre-trained models more effectively without using their original training data. This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Specifically, our AdaMerging method operates as an automatic, unsupervised task arithmetic scheme. It leverages entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11\% improvement in performance. Notably, AdaMerging also exhibits superior generalization capabilities when applied to unseen downstream tasks. Furthermore, it displays a significantly enhanced robustness to data distribution shifts that may occur during the testing phase.
Submitted 28 May, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
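The AdaMerging abstract describes learning merging coefficients by minimizing prediction entropy on unlabeled test samples. The sketch below illustrates only the two ingredients named there: a task-wise weighted merge and the entropy surrogate. It assumes the usual task-arithmetic convention that each task vector is the fine-tuned weights minus the pretrained weights, stores parameters as flat lists, and uses hypothetical function names; the optimization loop over the coefficients (which needs gradients through a real model) is deliberately left out.

```python
import math


def merge(pretrained, task_vectors, coeffs):
    """Return theta_pre + sum_k coeffs[k] * task_vectors[k] (task-wise coefficients)."""
    merged = list(pretrained)
    for lam, tau in zip(coeffs, task_vectors):
        for i, t in enumerate(tau):
            merged[i] += lam * t
    return merged


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch_logits):
    """Average prediction entropy over a batch: the surrogate objective to minimize."""
    return sum(entropy(softmax(l)) for l in batch_logits) / len(batch_logits)


if __name__ == "__main__":
    # Hypothetical toy parameters and logits, for illustration only.
    theta = merge([1.0, 2.0], [[1.0, 0.0], [0.0, 2.0]], [0.5, 0.25])
    print(theta)
    print(mean_entropy([[10.0, 0.0], [0.0, 0.0]]))
```

Confident predictions give low entropy, so lowering `mean_entropy` with respect to `coeffs` pushes the merged model toward coefficients under which it is more certain on the unlabeled samples.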
-
PC-Adapter: Topology-Aware Adapter for Efficient Domain Adaption on Point Clouds with Rectified Pseudo-label
Authors:
Joonhyung Park,
Hyunjin Seo,
Eunho Yang
Abstract:
Understanding point clouds captured from the real-world is challenging due to shifts in data distribution caused by varying object scales, sensor angles, and self-occlusion. Prior works have addressed this issue by combining recent learning principles such as self-supervised learning, self-training, and adversarial training, which leads to significant computational overhead. Toward succinct yet powerful domain adaptation for point clouds, we revisit the unique challenges of point cloud data under domain shift scenarios and discover the importance of the global geometry of source data and trends of target pseudo-labels biased to the source label distribution. Motivated by our observations, we propose an adapter-guided domain adaptation method, PC-Adapter, that preserves the global shape information of the source domain using an attention-based adapter, while learning the local characteristics of the target domain via another adapter equipped with graph convolution. Additionally, we propose a novel pseudo-labeling strategy resilient to the classifier bias by adjusting confidence scores using their class-wise confidence distributions to consider relative confidences. Our method demonstrates superiority over baselines on various domain shift settings in benchmark datasets - PointDA, GraspNetPC, and PointSegDA.
Submitted 28 September, 2023;
originally announced September 2023.
-
STERLING: Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience
Authors:
Haresh Karnan,
Elvin Yang,
Daniel Farkash,
Garrett Warnell,
Joydeep Biswas,
Peter Stone
Abstract:
Terrain awareness, i.e., the ability to identify and distinguish different types of terrain, is a critical ability that robots must have to succeed at autonomous off-road navigation. Current approaches that provide robots with this awareness either rely on labeled data which is expensive to collect, engineered features and cost functions that may not generalize, or expert human demonstrations which may not be available. Towards endowing robots with terrain awareness without these limitations, we introduce Self-supervised TErrain Representation LearnING (STERLING), a novel approach for learning terrain representations that relies solely on easy-to-collect, unconstrained (e.g., non-expert), and unlabelled robot experience, with no additional constraints on data collection. STERLING employs a novel multi-modal self-supervision objective through non-contrastive representation learning to learn relevant terrain representations for terrain-aware navigation. Through physical robot experiments in off-road environments, we evaluate STERLING features on the task of preference-aligned visual navigation and find that STERLING features perform on par with fully supervised approaches and outperform other state-of-the-art methods with respect to preference alignment. Additionally, we perform a large-scale experiment of autonomously hiking a 3-mile long trail which STERLING completes successfully with only two manual interventions, demonstrating its robustness to real-world off-road conditions.
Submitted 20 October, 2023; v1 submitted 26 September, 2023;
originally announced September 2023.