Search | arXiv e-print repository

Toward Precise Robotic Weed Flaming Using a Mobile Manipulator with a Flamethrower

Authors: Di Wang, Chengsong Hu, Shuangyu Xie, Joe Johnson, Hojun Ji, Yingtao Jiang, Muthukumar Bagavathiannan, Dezhen Song

Abstract: Robotic weed flaming is a new and environmentally friendly approach to weed removal in the agricultural field. Using a mobile manipulator equipped with a flamethrower, we design a new system and algorithm to enable effective weed flaming, which requires robotic manipulation with a soft and deformable end effector, as the thermal coverage of the flame is affected by dynamic or unknown environmental factors such as gravity, wind, atmospheric pressure, fuel tank pressure, and the pose of the nozzle. System development includes the overall design, hardware integration, and software pipeline. To enable precise weed removal, the greatest challenge is to detect and predict dynamic flame coverage in real time before motion planning, which is quite different from a conventional rigid gripper in grasping or a spray gun in painting. Based on the images from two onboard infrared cameras and the pose information of the flamethrower nozzle on a mobile manipulator, we propose a new dynamic flame coverage model. The flame model uses a center-arc curve with a Gaussian cross-section model to describe the flame coverage in real time. The experiments have demonstrated the working system and shown that our model and algorithm can achieve a mean average precision (mAP) of more than 76% in the reprojected images during online prediction.
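The abstract describes the flame as a center-arc curve with a Gaussian cross-section. A minimal sketch of that idea follows, under assumptions that are ours and not the paper's stated parameterization: the centerline is a planar circular arc that leaves the nozzle tangent to its pointing direction and bends toward a given direction (e.g. gravity), and coverage falls off as a Gaussian of the distance to the nearest sampled point on that arc. The function names and the parameters `radius`, `length`, and `sigma` are hypothetical; in the described system such quantities would presumably be estimated online from the infrared images and the nozzle pose.

```python
import numpy as np

def arc_centerline(origin, direction, radius, bend_dir, length, n=200):
    """Sample a planar circular arc (hypothetical flame centerline).

    The arc starts at `origin`, is tangent to `direction` there, curves
    toward `bend_dir`, and has arc length `length`. `direction` must not be
    parallel to `bend_dir`, otherwise the bending plane is undefined.
    """
    d = direction / np.linalg.norm(direction)
    # Part of bend_dir orthogonal to d: in-plane normal pointing to the center.
    b = bend_dir - np.dot(bend_dir, d) * d
    b = b / np.linalg.norm(b)
    center = origin + radius * b
    theta = np.linspace(0.0, length / radius, n)
    # theta = 0 gives origin; the tangent there is d; curvature points along b.
    return center[None, :] + radius * (
        -np.cos(theta)[:, None] * b[None, :] + np.sin(theta)[:, None] * d[None, :]
    )

def flame_coverage(points, centerline, sigma):
    """Gaussian cross-section: coverage in [0, 1] decays with the distance
    from each query point to the nearest sampled centerline point."""
    diff = points[:, None, :] - centerline[None, :, :]
    dist = np.min(np.linalg.norm(diff, axis=2), axis=1)
    return np.exp(-dist**2 / (2.0 * sigma**2))
```

For a nozzle at the origin pointing along +x with gravity along -z, a query point at the nozzle tip gets coverage 1.0, while a point 5 units above it gets essentially zero. Sampling the arc as a polyline keeps the distance computation simple at the cost of resolution set by `n`.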

Submitted 5 July, 2024; originally announced July 2024.

Comments: IROS 2024

arXiv:2407.03688 [pdf, other]

Adaptive sampling strategy for tolerance analysis of freeform optical surfaces based on critical ray aiming

Authors: Rundong Fan, Shili Wei, Zhuang Qian, Huiru Ji, Hao Tan, Yan Mo, Donglin Ma

Abstract: The tolerance analysis of freeform surfaces plays a crucial role in the development of advanced imaging systems. However, the intricate relationship between surface error and imaging quality poses significant challenges, necessitating dense sampling of featured rays during the computation process to ensure an accurate tolerance for different fields of view (FOVs). Here, we propose an adaptive sampling strategy called "Critical Ray Aiming" for surface tolerance analysis. By identifying, at each surface point, the ray whose wave aberration is most sensitive to surface error, our methodology facilitates flexible sampling of the FOVs and entrance pupil (EP), achieving computational efficiency without compromising accuracy in determining tolerable surface error. We demonstrate the effectiveness of our method through tolerance analysis of two different freeform imaging systems.
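The core selection step, picking the most sensitive ray at each surface point, can be illustrated with a short sketch. It assumes (hypothetically) that a first-order sensitivity of wave aberration to local sag error is already available for each (surface point, candidate ray) pair, and that the tolerable error follows from a linear model |ΔW| = s·|Δz| against a fixed wavefront budget. Both the input matrix and the linear tolerance rule are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def critical_ray_tolerance(sensitivity, wavefront_budget):
    """sensitivity[i, j]: hypothetical first-order change in wave aberration
    of candidate ray j per unit surface sag error at surface point i.

    Returns the index of the critical (most sensitive) ray per point and the
    tolerable sag error per point under a linear model.
    """
    abs_s = np.abs(sensitivity)
    critical = np.argmax(abs_s, axis=1)  # most sensitive ray at each point
    s_max = abs_s[np.arange(abs_s.shape[0]), critical]
    # |dW| = s_max * |dz| <= budget  =>  |dz| <= budget / s_max
    with np.errstate(divide="ignore"):
        tol = np.where(s_max > 0, wavefront_budget / s_max, np.inf)
    return critical, tol
```

Only one ray per surface point drives the tolerance in this picture, which is why aiming at that critical ray can replace dense sampling over all FOVs and pupil positions.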

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03040 [pdf, other]

Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

Authors: Xia Hou, Qifeng Li, Jian Yang, Tongliang Li, Linzheng Chai, Xianjie Wu, Hangyuan Ji, Zhoujun Li, Jixuan Nie, **gbo Dun, Wenfeng Song

Abstract: Instruction tuning is an effective technique for aligning the outputs of large language models (LLMs) with human preference. However, how to generate the seasonal multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages the CoD (Chain of Dialogue) logic to guide large language models (LLMs) in generating knowledge-intensive multi-turn dialogues for instruction tuning. By integrating raw documents from both open-source datasets and domain-specific web-crawled documents into a benchmark K-BENCH, we cover diverse areas such as Wikipedia (English), Science (Chinese), and Artifacts (Chinese). Our approach first decides the logic flow of the current dialogue and then prompts LLMs to produce key phrases for sourcing relevant response content. This methodology enables the creation of the GINSTRUCT instruction dataset, retaining raw document knowledge within dialogue-style interactions. Utilizing this dataset, we fine-tune GLLM, a model designed to transform raw documents into structured multi-turn dialogues, thereby injecting comprehensive domain knowledge into the SFT model for enhanced instruction tuning. This work signifies a stride towards refining the adaptability and effectiveness of LLMs in processing and generating more accurate, contextually nuanced responses across various fields.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 11 pages, 3 figures

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2407.01100 [pdf, other]

Eliminating Position Bias of Language Models: A Mechanistic Approach

Authors: Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji

Abstract: Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. Specifically, we find that causal attention generally causes models to favor distant content, while relative positional encodings like RoPE prefer nearby ones based on the analysis of retrieval-augmented question answering (QA). Further, our empirical study on object detection reveals that position bias is also present in vision-language models (VLMs). Based on the above analyses, we propose to ELIMINATE position bias caused by different input segment orders (e.g., options in LM-as-a-judge, retrieved documents in QA) in a TRAINING-FREE ZERO-SHOT manner. Our method changes the causal attention to bidirectional attention between segments and utilizes model attention values to decide the relative orders of segments instead of using the order provided in input prompts, therefore enabling Position-INvariant inferencE (PINE) at the segment level. By eliminating position bias, models achieve better performance and reliability in downstream tasks where position bias widely exists, such as LM-as-a-judge and retrieval-augmented QA.
Notably, PINE is especially useful when adapting LMs for evaluating reasoning pairs: it consistently provides 8 to 10 percentage points performance gains in most cases, and makes Llama-3-70B-Instruct perform even better than GPT-4-0125-preview on the RewardBench reasoning subset.
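The abstract states that PINE replaces causal attention with bidirectional attention between segments. A minimal sketch of what such an attention mask could look like is given below, assuming that attention inside a segment stays causal and that tokens outside the interchangeable segments (instruction, question) keep ordinary causal attention. The function name and the segment-id convention are hypothetical, and the sketch omits the other half of the method, re-deriving segment positions from attention values.

```python
import numpy as np

def pine_style_mask(segment_ids):
    """Boolean attention mask (True = query row may attend to key column).

    segment_ids[t] is 0 for tokens outside the interchangeable segments and
    k >= 1 for tokens belonging to segment k (hypothetical convention).
    """
    seg = np.asarray(segment_ids)
    n = seg.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))  # standard causal mask
    in_seg = seg > 0
    # Tokens in two *different* segments see each other in both directions,
    # so no segment is privileged by appearing earlier in the prompt.
    cross = in_seg[:, None] & in_seg[None, :] & (seg[:, None] != seg[None, :])
    return causal | cross
```

With segment ids `[0, 1, 1, 2, 2, 0]`, a token in segment 1 can attend to a later token in segment 2, but not to a later token in its own segment, and the leading prompt token still cannot look ahead.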

Submitted 1 July, 2024; originally announced July 2024.

Comments: 18 pages, 5 figures

arXiv:2407.00589 [pdf, other]

Modeling film flows down a rotating slippery cylinder

Authors: Souradip Chattopadhyay, Amar K. Gaonkar, Hangjie Ji

Abstract: This study investigates the nonlinear stability and dynamics of gravity-driven viscous films on a vertical rotating cylinder, considering both outer and inner surface flows with slip conditions at the cylinder wall. We develop an asymptotic model for the combined effects of rotation and wall slippage. Linear stability analysis indicates that wall slippage enhances instability on both surfaces, while rotation has differing impacts: it amplifies instability due to slip for outer surface flow but reduces it for inner surface flow. A weakly nonlinear stability analysis is then conducted to explore the combined impact of rotation and wall slip on flow stability beyond the linear regime, including the bifurcation of the nonlinear evolution equation for both surfaces. The traveling wave solution of the model is analyzed, showing how rotation affects nonlinear wave speed with a slippery wall. A stability analysis of the traveling wave solutions is also performed. Numerical simulations of the nonlinear evolution of the free surface reveal that increasing slip length enhances the choke phenomenon in inner surface flow, while rotation can delay this effect. Additionally, simulations show that for flow along the outer surface of a slippery rotating cylinder, the film tends to break up into droplets in the presence of rotation.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 23 pages, 15 figures

arXiv:2406.15657 [pdf, other]

FIRST: Faster Improved Listwise Reranking with Single Token Decoding

Authors: Revanth Gangi Reddy, JaeHyeok Doo, Yifei Xu, Md Arafat Sultan, Deevya Swain, Avirup Sil, Heng Ji

Abstract: Large Language Models (LLMs) have significantly advanced the field of information retrieval, particularly for reranking. Listwise LLM rerankers have showcased superior performance and generalizability compared to existing supervised approaches. However, conventional listwise LLM reranking methods lack efficiency as they provide ranking output in the form of a generated ordered sequence of candidate passage identifiers. Further, they are trained with the typical language modeling objective, which treats all ranking errors uniformly, potentially at the cost of misranking highly relevant passages. Addressing these limitations, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates. Further, we incorporate a learning-to-rank loss during training, prioritizing ranking accuracy for the more relevant passages. Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark. Finally, to illustrate the practical effectiveness of listwise LLM rerankers, we investigate their application in providing relevance feedback for retrievers during inference. Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
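The key efficiency idea in FIRST, reading a full ranking off the logits of the first generated identifier instead of decoding the whole ordered sequence, can be sketched as follows. This assumes each candidate identifier (e.g. "A", "B", "C") maps to a single vocabulary token; the mapping and function name are hypothetical, and the model call that produces the logits is left out.

```python
import numpy as np

def rank_from_first_token_logits(logits, id_token_index):
    """Rank candidates by the logit each identifier receives at the first
    decoding step.

    logits: 1-D array of vocabulary logits for the first generated token.
    id_token_index: dict mapping candidate label -> vocabulary token index
    (hypothetical; assumes single-token identifiers).
    """
    labels = list(id_token_index)
    scores = np.array([logits[id_token_index[lab]] for lab in labels])
    order = np.argsort(-scores, kind="stable")  # highest logit first
    return [labels[i] for i in order]
```

One forward pass yields the whole ordering, whereas generating k identifiers autoregressively needs on the order of k decoding steps; this is the source of the speedup the abstract reports, though the exact figure depends on the setup.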

Submitted 21 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.14137 [pdf, other]

MACAROON: Training Vision-Language Models To Be Your Engaged Partners

Authors: Shu** Wu, Yi R. Fung, Sha Li, Yixin Wan, Kai-Wei Chang, Heng Ji

Abstract: Large vision-language models (LVLMs), while proficient in following instructions and responding to diverse questions, invariably generate detailed responses even when questions are ambiguous or unanswerable, leading to hallucinations and bias issues. Thus, it is essential for LVLMs to proactively engage with humans to ask for clarifications or additional information for better responses. In this study, we aim to shift LVLMs from passive answer providers to proactive engaged partners. We begin by establishing a three-tiered hierarchy for questions of invalid, ambiguous, and personalizable nature to measure the proactive engagement capabilities of LVLMs. Utilizing this hierarchy, we create PIE (ProactIve Engagement Evaluation) through GPT-4o and human annotators, consisting of 853 questions across six distinct, fine-grained question types that are verified by human annotators and accompanied by well-defined metrics. Our evaluations on PIE indicate poor performance of existing LVLMs, with the best-performing open-weights model only achieving an Aggregate Align Rate (AAR) of 0.28. In response, we introduce MACAROON, self-iMaginAtion for ContrAstive pReference OptimizatiON, which instructs LVLMs to autonomously generate contrastive response pairs for unlabeled questions given the task description and human-crafted criteria. Then, the self-imagined data is formatted for conditional reinforcement learning.
Experimental results show MACAROON effectively improves LVLMs' capabilities to be proactively engaged (0.84 AAR) while maintaining comparable performance on general tasks.

Submitted 20 June, 2024; originally announced June 2024.

Comments: The code will be made public at https://github.com/Shu**Wu-0814/MACAROON

arXiv:2406.07067 [pdf, other]

TIM: Temporal Interaction Model in Notification System

Authors: Huxiao Ji, Haitao Yang, Linchuan Li, Shunyu Zhang, Cunyi Zhang, Xuan** Li, Wenwu Ou

Abstract: Modern mobile applications heavily rely on the notification system to acquire daily active users and enhance user engagement. Because it can proactively reach users, the system has to decide when to send notifications. Although many researchers have studied optimizing the timing of sending notifications, they only utilized users' contextual features, without modeling users' behavior patterns. Additionally, these efforts only focus on individual notifications, and there is a lack of studies on optimizing the holistic timing of multiple notifications within a period. To bridge these gaps, we propose the Temporal Interaction Model (TIM), which models users' behavior patterns by estimating CTR in every time slot over a day in our short video application Kuaishou. TIM leverages long-term user historical interaction sequence features such as notification receipts, clicks, watch time and effective views, and employs a temporal attention unit (TAU) to extract user behavior patterns. Moreover, we provide an elegant strategy for holistically controlling notification send times to improve user engagement while minimizing disruption. We evaluate the effectiveness of TIM through offline experiments and online A/B tests. The results indicate that TIM is a reliable tool for forecasting user behavior, leading to a remarkable enhancement in user engagement without causing undue disturbance.
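The abstract says TIM estimates CTR for every time slot of a day but does not spell out its holistic send-time strategy. Purely as an illustration of how per-slot CTR estimates can be turned into a multi-notification schedule that limits disruption, here is a hypothetical greedy rule: take the highest-CTR slots first, while enforcing a minimum gap between any two sends. Neither the rule nor the `min_gap` parameter comes from the paper.

```python
def pick_send_slots(ctr, k, min_gap):
    """Greedily choose up to k slots with the highest predicted CTR, keeping
    at least `min_gap` slots between any two chosen sends (hypothetical rule).

    ctr: list of predicted click-through rates, one per time slot.
    Returns the chosen slot indices in chronological order.
    """
    chosen = []
    for slot in sorted(range(len(ctr)), key=lambda s: ctr[s], reverse=True):
        if len(chosen) == k:
            break
        if all(abs(slot - c) >= min_gap for c in chosen):
            chosen.append(slot)
    return sorted(chosen)
```

For example, with predicted CTRs `[0.01, 0.05, 0.04, 0.02, 0.06, 0.03]`, `k=2` and `min_gap=2`, slots 1 and 4 are chosen; raising `min_gap` to 4 forces the second send to slot 0 instead. Greedy selection is simple but not guaranteed optimal for the total expected CTR.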

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.03871 [pdf]

Development of high-level applications for High Energy Photon Source booster

Authors: Yuemei Peng, Daheng Ji, Hongfei Ji, Nan Li, Xiaohan Lu, Saike Tian, Yuanyuan Wei, Haisheng Xu, Yaliang Zhao, Yi Jiao, **gyi Li

Abstract: The High Energy Photon Source (HEPS) is the first fourth-generation storage ring light source, being built in the suburbs of Beijing, China. The storage ring was designed with an emittance lower than 60 pm·rad, a circumference of 1.36 km, and a beam energy of 6 GeV. Its injector contains a 500 MeV S-band Linac and a 454 m booster, which was designed as an accumulator at the extraction energy. In the energy ramping control design of the HEPS booster, the ramping process was programmed to be able to stop and stay at any energy between the injection energy and the extraction energy. This feature enables us to conduct energy-dependent machine studies and ramping curve optimization. The beam commissioning of the HEPS Linac finished in June 2023, and the beam commissioning of the booster started at the end of July 2023. On November 17, the main target values proposed in the preliminary design report were reached. High-level applications (HLAs) are essential tools for beam commissioning. The development of the HLAs, which are based on the framework named Python accelerator physics application set (Pyapas), started at the end of 2021. The HEPS physics team spent more than one year developing and testing the HLAs to meet the requirements of beam commissioning of the booster. Thanks to the modular design, the principle based on physical quantities, and the ability to run simulation models online from Pyapas, the development efficiency and reliability of the HLAs have been greatly improved.
In particular, the principle based on physical quantities allows us to control the beam more intuitively.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02056 [pdf, other]

CAP: A Context-Aware Neural Predictor for NAS

Authors: Han Ji, Yuqi Feng, Yanan Sun

Abstract: Neural predictors are effective in boosting the time-consuming performance evaluation stage in neural architecture search (NAS), owing to their direct estimation of unseen architectures. Despite the effectiveness, training a powerful neural predictor with fewer annotated architectures remains a huge challenge. In this paper, we propose a context-aware neural predictor (CAP) which only needs a few annotated architectures for training based on the contextual information from the architectures. Specifically, the input architectures are encoded into graphs and the predictor infers the contextual structure around the nodes inside each graph. Then, enhanced by the proposed context-aware self-supervised task, the pre-trained predictor can obtain expressive and generalizable representations of architectures. Therefore, only a few annotated architectures are sufficient for training. Experimental results in different search spaces demonstrate the superior performance of CAP compared with state-of-the-art neural predictors. In particular, CAP can rank architectures precisely at the budget of only 172 annotated architectures in NAS-Bench-101. Moreover, CAP can help find promising architectures in both NAS-Bench-101 and DARTS search spaces on the CIFAR-10 dataset, serving as a useful navigator for NAS to explore the search space efficiently.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by IJCAI24

arXiv:2406.00875 [pdf, other]

Ohm's Law, the Reconnection Rate, and Energy Conversion in Collisionless Magnetic Reconnection

Authors: Yi-Hsin Liu, Michael Hesse, Kevin Genestreti, Rumi Nakamura, Jim Burch, Paul Cassak, Naoki Bessho, Jonathan Eastwood, Tai Phan, Marc Swisdak, Sergio Toledo-Redondo, Masahiro Hoshino, Cecilia Norgren, Hantao Ji, TKM Nakamura

Abstract: Magnetic reconnection is a ubiquitous plasma process that transforms magnetic energy into particle energy during eruptive events throughout the universe. Reconnection not only converts energy during solar flares and geomagnetic substorms that drive space weather near Earth, but it may also play critical roles in the high energy emissions from the magnetospheres of neutron stars and black holes. In this review article, we focus on collisionless plasmas that are most relevant to reconnection in many space and astrophysical plasmas. Guided by first-principles kinetic simulations and spaceborne in-situ observations, we highlight the most recent progress in understanding this fundamental plasma process. We start by discussing the non-ideal electric field in the generalized Ohm's law that breaks the frozen-in flux condition in ideal magnetohydrodynamics and allows magnetic reconnection to occur. We point out that this same reconnection electric field also plays an important role in sustaining the current and pressure in the current sheet and then discuss the determination of its magnitude (i.e., the reconnection rate), based on force balance and energy conservation. This approach to determining the reconnection rate is applied to kinetic current sheets of a wide variety of magnetic geometries, parameters, and background conditions. We also briefly review the key diagnostics and modeling of energy conversion around the reconnection diffusion region, seeking insights from recently developed theories. Finally, future prospects and open questions are discussed.
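For reference, the generalized Ohm's law mentioned in this abstract can be written in a standard textbook form obtained by solving the electron momentum equation for the electric field; the review's own notation may differ. Here $\mathbf{v}$ is the ion (bulk) velocity, $\mathbf{v}_e = \mathbf{v} - \mathbf{J}/(ne)$ is the electron velocity in the quasi-neutral two-fluid approximation, and $\mathsf{P}_e$ is the electron pressure tensor. Ideal MHD sets the right-hand side to zero ($\mathbf{E} + \mathbf{v}\times\mathbf{B} = 0$, the frozen-in condition); in a collisionless plasma, the pressure-tensor and electron-inertia terms supply the non-ideal field that lets field lines break and reconnect. The second line gives one common normalization of the reconnection rate, using the reconnection electric field $E_r$ and the upstream field $B_0$.

```latex
\mathbf{E} + \mathbf{v} \times \mathbf{B}
  = \frac{\mathbf{J} \times \mathbf{B}}{n e}
  - \frac{\nabla \cdot \mathsf{P}_e}{n e}
  - \frac{m_e}{e}\left(\frac{\partial \mathbf{v}_e}{\partial t}
  + \mathbf{v}_e \cdot \nabla \mathbf{v}_e\right)

R \equiv \frac{E_r}{B_0 V_A}, \qquad
V_A = \frac{B_0}{\sqrt{\mu_0 n m_i}}
```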

Submitted 2 June, 2024; originally announced June 2024.

Comments: Submitted to Space Science Reviews. This is a review paper as an outcome of the 2022 Magnetic Reconnection Workshop in the International Space Science Institute

arXiv:2405.20015 [pdf, other]

Efficient LLM-Jailbreaking by Introducing Visual Modality

Authors: Zhenxing Niu, Yuyao Sun, Haodong Ren, Haoxuan Ji, Quan Wang, Xiaoke Ma, Gang Hua, Rong **

Abstract: This paper focuses on jailbreaking attacks against large language models (LLMs), eliciting them to generate objectionable content in response to harmful user queries. Unlike previous LLM-jailbreaks that directly orient to LLMs, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an efficient MLLM-jailbreak to generate jailbreaking embeddings embJS. Finally, we convert the embJS into text space to facilitate the jailbreaking of the target LLM. Compared to direct LLM-jailbreaking, our approach is more efficient, as MLLMs are more vulnerable to jailbreaking than pure LLM. Additionally, to improve the attack success rate (ASR) of jailbreaking, we propose an image-text semantic matching scheme to identify a suitable initial input. Extensive experiments demonstrate that our approach surpasses current state-of-the-art methods in terms of both efficiency and effectiveness. Moreover, our approach exhibits superior cross-class jailbreaking capabilities.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17303 [pdf, other]

High-Resolution Observation and Magnetic Modeling of a Solar Minifilament: the Formation, Eruption and Failing Mechanisms

Authors: Weilin Teng, Yingna Su, Rui Liu, Jialin Chen, Yanjie Liu, Jun Dai, Wenda Cao, **hua Shen, Haisheng Ji

Abstract: Minifilaments are widespread small-scale structures in the solar atmosphere. To better understand their formation and eruption mechanisms, we investigate the entire life of a sigmoidal minifilament located below a large quiescent filament observed by BBSO/GST on 2015 August 3. The Hα structure initially appears as a group of arched threads, then transforms into two J-shaped arcades, and finally forms a sigmoidal shape. SDO/AIA observations in 171 Å show that two coronal jets occur around the southern footpoint of the minifilament before the minifilament eruption. The minifilament eruption starts from the southern footpoint, then interacts with the overlying filament and fails. The aforementioned observational changes correspond to three episodes of flux cancellation observed by SDO/HMI. Unlike in previous studies, the flux cancellation occurs between the polarity in which the southern footpoint of the minifilament is rooted and an external polarity. We construct two magnetic field models before the eruption using the flux rope insertion method, and find a hyperbolic flux tube (HFT) above the flux cancellation site. The observation and modeling results suggest that the eruption is triggered by the external magnetic reconnection between the core field of the minifilament and the external fields due to flux cancellations. This study reveals a new triggering mechanism for minifilament eruptions and a new relationship between minifilament eruptions and coronal jets.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15028 [pdf, other]

AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings

Authors: Revanth Gangi Reddy, Omar Attia, Yunyao Li, Heng Ji, Saloni Potdar

Abstract: Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain question-answering, or proposition-level ranking for attribution. In this work, we introduce the idea of any-granularity ranking, which leverages multi-vector embeddings to rank at varying levels of granularity while maintaining encoding at a single (coarser) level of granularity. We propose a multi-granular contrastive loss for training multi-vector approaches, and validate its utility with both sentences and propositions as ranking units. Finally, we demonstrate the application of proposition-level ranking to post-hoc citation addition in retrieval-augmented generation, surpassing the performance of prompt-driven citation generation.
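The idea of ranking at a finer granularity while encoding only at the passage level can be illustrated with a late-interaction sketch. It assumes a ColBERT-style MaxSim score (our choice of scoring function, not necessarily the paper's) and token vectors produced by encoding the whole passage once; each sentence is then scored using only the token vectors inside its span. The function names, the `sentence_spans` format, and the toy vectors are hypothetical, and the paper's multi-granular contrastive training is not shown.

```python
import numpy as np

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, take its best dot
    product over the document vectors, then sum over query vectors."""
    sims = query_vecs @ doc_vecs.T
    return float(sims.max(axis=1).sum())

def rank_sentences(query_vecs, passage_vecs, sentence_spans):
    """Score each sentence with the slice of passage token vectors that falls
    inside its [start, end) span; the passage was encoded only once."""
    scores = [maxsim(query_vecs, passage_vecs[s:e]) for s, e in sentence_spans]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Because sentence scores reuse the passage-level token vectors, no separate sentence index is needed; switching to propositions only changes the spans passed in.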

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14203 [pdf, other]

GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices

Authors: Thao Nguyen, Tiara Torres-Flores, Changhyun Hwang, Carl Edwards, Ying Diao, Heng Ji

Abstract: This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors for enhanced PCE prediction. Due to the lack of high-quality experimental data, we collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we utilize as the training data for our predictive model. In this low-data regime, GLaD leverages properties learned from large language models (LLMs) pretrained on extensive scientific literature to enrich molecular structural representations, allowing for a multimodal representation of molecules. GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency. Furthermore, GLaD showcases versatility, as it applies to a range of molecular property prediction tasks (BBBP, BACE, ClinTox, and SIDER), not limited to those concerning OPV materials. Especially, GLaD proves valuable for tasks in low-data regimes within the chemical space, as it enriches molecular representations by incorporating molecular property descriptions learned from large-scale pretraining. This capability is significant in real-world scientific endeavors like drug and material discovery, where access to comprehensive data is crucial for informed decision-making and efficient exploration of the chemical space.

Submitted 23 May, 2024; originally announced May 2024.

Comments: In progress

arXiv:2405.13179 [pdf, other]

RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts

Authors: Yuelyu Ji, Zhuochun Li, Rui Meng, Sonish Sivarajkumar, Yanshan Wang, Zeshui Yu, Hui Ji, Yushui Han, Hanyu Zeng, Daqing He

Abstract: This paper introduces the RAG-RLRC-LaySum framework, designed to make complex biomedical research understandable to laymen through advanced Natural Language Processing (NLP) techniques. Our Retrieval Augmented Generation (RAG) solution, enhanced by a reranking method, utilizes multiple knowledge sources to ensure the precision and pertinence of lay summaries. Additionally, our Reinforcement Learning for Readability Control (RLRC) strategy improves readability, making scientific content comprehensible to non-specialists. Evaluations using the publicly accessible PLOS and eLife datasets show that our methods surpass the plain Gemini model, demonstrating a 20% increase in readability scores, a 15% improvement in ROUGE-2 relevance scores, and a 10% enhancement in factual accuracy. The RAG-RLRC-LaySum framework effectively democratizes scientific knowledge, enhancing public engagement with biomedical discoveries.

Submitted 24 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.13005

Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data

Authors: Nan Miles Xi, Hong-Long Ji, Lin Wang

Abstract: Sarcoidosis is a rare inflammatory disease characterized by the formation of granulomas in various organs. The disease presents diagnostic and treatment challenges due to its diverse manifestations and unpredictable nature. In this study, we employed a Large Language Model (LLM) to analyze sarcoidosis-related discussions on the social media platform Reddit. Our findings underscore the efficacy of LLMs in accurately identifying sarcoidosis-related content. We discovered a wide array of symptoms reported by patients, with fatigue, swollen lymph nodes, and shortness of breath as the most prevalent. Prednisone was the most prescribed medication, while infliximab showed the highest effectiveness in improving prognoses. Notably, our analysis revealed disparities in prognosis based on age and gender, with women and younger patients experiencing good and polarized outcomes, respectively. Furthermore, unsupervised clustering identified three distinct patient subgroups (phenotypes) with unique symptom profiles, prognostic outcomes, and demographic distributions. Finally, sentiment analysis revealed a moderate negative impact on patients' mental health post-diagnosis, particularly among women and younger individuals. Our study represents the first application of LLMs to understand sarcoidosis through social media data. It contributes to understanding the disease by providing data-driven insights into its manifestations, treatments, prognoses, and impact on patients' lives.
Our findings have direct implications for improving personalized treatment strategies and enhancing the quality of care for individuals living with sarcoidosis.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.05481

Achieving millisecond coherence fluxonium through overlap Josephson junctions

Authors: Fei Wang, Kannan Lu, Huijuan Zhan, Lu Ma, Feng Wu, Hantao Sun, Hao Deng, Yang Bai, Feng Bao, Xu Chang, Ran Gao, Xun Gao, Guicheng Gong, Lijuan Hu, Ruizi Hu, Honghong Ji, Xizheng Ma, Liyong Mao, Zhijun Song, Chengchun Tang, Hongcheng Wang, Tenghui Wang, Ziang Wang, Tian Xia, Hongxin Xu, et al. (10 additional authors not shown)

Abstract: Fluxonium qubits are recognized for their high coherence times and high operation fidelities, attributed to their unique design incorporating over 100 Josephson junctions per superconducting loop. However, this complexity poses significant fabrication challenges, particularly in achieving high yield and junction uniformity with traditional methods. Here, we introduce an overlap process for Josephson junction fabrication that achieves nearly 100% yield and maintains uniformity across a 2-inch wafer with less than 5% variation for the phase slip junction and less than 2% for the junction array. Our compact junction array design facilitates fluxonium qubits with energy relaxation times exceeding 1 millisecond at the flux frustration point, demonstrating consistency with state-of-the-art dielectric loss tangents and flux noise across multiple devices. This work suggests the scalability of high coherence fluxonium processors using CMOS-compatible processes, marking a significant step towards practical quantum computing.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04602

An Empirical Study of Kotlin-Java Interactions

Authors: Qiong Feng, Huan Ji, Xiaotian Ma, Peng Liang

Abstract: Background: Since Google introduced Kotlin as an official programming language for developing Android apps in 2017, Kotlin has gained widespread adoption in Android development. Java and Kotlin are designed to be interoperable, which allows them to coexist and interact smoothly within a project. Aims: However, there is limited research on how Java and Kotlin interact with each other in real-world projects and what challenges are faced during these interactions. The answers to these questions are key to understanding these kinds of cross-language software systems. Methods: In this paper, we implemented a tool named DependExtractor, which can extract 11 kinds of Kotlin-Java dependencies, and conducted an empirical study of 23 Kotlin-Java real-world projects with 3,227 Java and 8,630 Kotlin source files. Results: Our findings revealed that Java and Kotlin frequently interact with each other in these cross-language projects, with access and call dependency types being the most dominant. Java/Kotlin source files that participate in cross-language interactions undergo more commits than files that interact with other files in the same language. Additionally, among all problematic Kotlin-Java interactions, we identified seven common mistakes, along with their fixing strategies. Conclusions: The findings of this study can help developers understand and address the challenges in Kotlin-Java projects.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03446

SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

Authors: Hangyuan Ji, Jian Yang, Linzheng Chai, Chaoren Wei, Liqun Yang, Yunlong Duan, Yunli Wang, Tianzhen Sun, Hongcheng Guo, Tongliang Li, Changyu Ren, Zhoujun Li

Abstract: Cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape in addressing the increasing complexity and frequency of cybersecurity incidents, highlighted by recent cybersecurity threat reports with over 10 billion instances, by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability of large language models (LLMs) in handling complex tasks, in this paper we introduce a framework to benchmark, elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events (SEvenLLM). Specifically, we create a high-quality bilingual instruction corpus by crawling raw cybersecurity text from cybersecurity websites to overcome the lack of effective data for information extraction. Then, we design a pipeline to auto-select tasks from the task pool and convert the raw text into supervised corpora composed of questions and responses. The instruction dataset SEvenLLM-Instruct is used to train cybersecurity LLMs with a multi-task learning objective (27 well-designed tasks) for augmenting the analysis of cybersecurity events. Extensive experiments on our curated benchmark (SEvenLLM-bench) demonstrate that SEvenLLM performs more sophisticated threat analysis and fortifies defenses against the evolving landscape of cyber threats.

Submitted 3 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.17512

On the spectral edge of non-Hermitian random matrices

Authors: Andrew Campbell, Giorgio Cipolloni, László Erdős, Hong Chang Ji

Abstract: For general non-Hermitian random matrices $X$ and deterministic deformation matrices $A$, we prove that the local eigenvalue statistics of $A+X$ close to the typical edge points of its spectrum are universal. Furthermore, we show that under natural assumptions on $A$ the spectrum of $A+X$ does not have outliers at a distance larger than the natural fluctuation scale of the eigenvalues. As a consequence, the number of eigenvalues in each component of $\mathrm{Spec}(A+X)$ is deterministic.

Submitted 6 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: 51 pages

MSC Class: 15B52; 60B20

arXiv:2404.16792

Weak-to-Strong Extrapolation Expedites Alignment

Authors: Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng

Abstract: The open-source community is experiencing a surge in the release of large language models (LLMs) that are trained to follow instructions and align with human preference. However, further training to improve them still requires expensive computational resources and data annotations. Is it possible to bypass additional training and cost-effectively acquire better-aligned models? Inspired by the literature on model interpolation, we propose a simple method called ExPO to boost LLMs' alignment with human preference. Utilizing a model that has undergone alignment training (e.g., via DPO or RLHF) and its initial SFT checkpoint, ExPO directly obtains a better-aligned model by extrapolating from the weights of the initial and the aligned models, which implicitly optimizes the alignment objective via first-order approximation. Through experiments with twelve open-source LLMs on HuggingFace, we demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models, as evaluated on the mainstream LLM benchmarks AlpacaEval 2.0 and MT-Bench. Moreover, ExPO exhibits remarkable scalability across various model sizes (from 1.8B to 70B) and capabilities. Through controlled experiments and further empirical analyses, we shed light on the essence of ExPO amplifying the reward signal learned during alignment training. Our work demonstrates the efficacy of model extrapolation in expediting the alignment of LLMs with human preference, suggesting a promising direction for future research.

Submitted 22 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Add theoretical explanation and more evaluation results
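A minimal sketch of the weight extrapolation described in the ExPO abstract above. It assumes the update takes the form θ_expo = θ_aligned + α(θ_aligned − θ_sft). Here θ_sft is the initial SFT checkpoint, θ_aligned is the DPO/RLHF-aligned checkpoint, and α > 0 is an extrapolation strength. The function name `expo_extrapolate`, the dict-of-lists checkpoint format and the toy values are hypothetical illustrations, not the authors' code. A real model would apply the same rule to every tensor in its state dict.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def expo_extrapolate(sft: Weights, aligned: Weights, alpha: float) -> Weights:
    """Assumed ExPO-style rule: aligned + alpha * (aligned - sft), per weight."""
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share parameter names")
    out: Weights = {}
    for name, a_vals in aligned.items():
        s_vals = sft[name]
        if len(s_vals) != len(a_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Move further along the direction from the SFT model to the aligned model.
        out[name] = [a + alpha * (a - s) for s, a in zip(s_vals, a_vals)]
    return out


# Toy usage with made-up values.
sft = {"w": [0.0, 1.0], "b": [0.5]}
aligned = {"w": [0.2, 0.8], "b": [0.5]}
print(expo_extrapolate(sft, aligned, alpha=0.5))
```

With α = 0 the function returns the aligned weights unchanged. Larger α pushes further along the SFT-to-aligned direction, which the abstract describes as amplifying the reward signal learned during alignment. How α should be chosen is not stated in the abstract and would need tuning.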

arXiv:2404.12666

A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues

Authors: Zibo Wang, Haichao Ji, Yifei Zhu, Dan Wang, Zhu Han

Abstract: The escalating influx of data generated by networked edge devices, coupled with the growing awareness of data privacy, has promoted a transformative shift in computing paradigms from centralized data processing to privacy-preserving distributed data processing. Federated analytics (FA) is an emerging technique to support collaborative data analytics among diverse data owners without centralizing the raw data. Despite the wide applications of FA in industry and academia, a comprehensive examination of existing research efforts in FA has been notably absent. This survey aims to bridge this gap by first providing an overview of FA, elucidating key concepts, and discussing its relationship with similar concepts. We then conduct a thorough examination of FA, including its taxonomy, key challenges, and enabling techniques. Diverse FA applications are then carefully reviewed, including statistical metrics, set computation, frequency-related applications, database query operations, model-based applications, FL-assisting FA tasks, and other wireless network applications. We conclude the survey with several open research issues and future directions. This survey intends to provide a holistic understanding of the emerging FA techniques and foster the continued evolution of privacy-preserving distributed data processing in the emerging networked society.

Submitted 19 April, 2024; originally announced April 2024.

Comments: This survey has been submitted to IEEE Communications Surveys & Tutorials
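The abstract lists statistical metrics among FA applications. As a generic illustration, not a technique taken from the survey, a global mean can be computed without pooling raw records. Each data owner reports only a local sum and count, and an aggregator combines them. The names `LocalSummary`, `summarize_locally` and `federated_mean` are hypothetical. Releasing exact sums and counts is not by itself a privacy guarantee. Deployed FA systems typically add protections such as secure aggregation or differential privacy, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LocalSummary:
    """Hypothetical aggregate that a data owner shares instead of raw values."""
    total: float
    count: int


def summarize_locally(values: List[float]) -> LocalSummary:
    # Runs on the owner's side; only the sum and count leave the device.
    return LocalSummary(total=sum(values), count=len(values))


def federated_mean(summaries: Iterable[LocalSummary]) -> float:
    # The aggregator sees per-owner aggregates only, never individual records.
    total = 0.0
    count = 0
    for s in summaries:
        total += s.total
        count += s.count
    if count == 0:
        raise ValueError("no data reported")
    return total / count


# Toy usage: three owners with made-up local data.
owners = [[1.0, 2.0, 3.0], [10.0], [4.0, 5.0]]
summaries = [summarize_locally(v) for v in owners]
print(federated_mean(summaries))
```

The result equals the mean over the pooled data because sums and counts combine additively. Statistics that do not decompose this way, such as medians, need different protocols.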

arXiv:2404.12135

mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture

Authors: Wei Zhang, Hongcheng Guo, Jian Yang, Yi Zhang, Chaoran Yan, Zhouping Tian, Hangyuan Ji, Zhoujun Li, Tongliang Li, Tieqiao Zheng, Chao Chen, Yi Liang, Xu Shi, Liangfan Zheng, Bo Zhang

Abstract: The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI for IT operations (AIOps) domain. In mABC, multiple agents based on powerful large language models (LLMs) perform blockchain-inspired voting to reach a final agreement, following a standardized process for handling the tasks and queries provided by Agent Workflow. Specifically, seven specialized agents derived from Agent Workflow each provide valuable insights towards root cause analysis based on their expertise and the intrinsic software knowledge of LLMs, collaborating within a decentralized chain. To avoid potential instability issues in LLMs and fully leverage the transparent and egalitarian advantages inherent in a decentralized structure, mABC adopts a decision-making process inspired by blockchain governance principles while considering the contribution index and expertise index of each agent. Experimental results on the public benchmark AIOps challenge dataset and our created train-ticket dataset demonstrate superior performance in accurately identifying root causes and formulating effective solutions, compared to previous strong baselines.
The ablation study further highlights the significance of each component within mABC, with Agent Workflow, multi-agent, and blockchain-inspired voting being crucial for achieving optimal performance. mABC offers comprehensive automated root cause analysis and resolution in micro-services architecture and achieves a significant improvement in the AIOps domain compared to existing baselines.

Submitted 3 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.06479

Text-Based Reasoning About Vector Graphics

Authors: Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji

Abstract: While large multimodal models excel in broad vision-language benchmarks, they often struggle with tasks requiring precise perception of low-level visual details, such as comparing line lengths or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics -- images composed purely of 2D objects and shapes. To address this challenge, we propose the Visually Descriptive Language Model (VDLM), which performs text-based reasoning about vector graphics. VDLM leverages Scalable Vector Graphics (SVG) for a more precise visual description and first uses an off-the-shelf raster-to-SVG algorithm for encoding. Since existing language models cannot understand raw SVGs in a zero-shot setting, VDLM then bridges SVG with pretrained language models through a newly introduced intermediate symbolic representation, Primal Visual Description (PVD), comprising primitive attributes (e.g., shape, position, measurement) with their corresponding predicted values. PVD is task-agnostic and represents visual primitives that are universal across all vector graphics. It can be learned with procedurally generated (SVG, PVD) pairs and also enables the direct use of LLMs for generalization to complex reasoning tasks. By casting an image to a text-based representation, we can leverage the power of language models to learn alignment from SVG to visual primitives and generalize to unseen question-answering tasks.
Empirical results show that VDLM achieves stronger zero-shot performance compared to state-of-the-art LMMs, such as GPT-4V, in various low-level multimodal perception and reasoning tasks on vector graphics. We additionally present extensive analyses on VDLM's performance, demonstrating that our framework offers better interpretability due to its disentangled perception and reasoning processes. Project page: https://mikewangwzhl.github.io/VDLM/

Submitted 24 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Project page: https://mikewangwzhl.github.io/VDLM/

arXiv:2404.01652

Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization

Authors: Zixuan Zhang, Revanth Gangi Reddy, Kevin Small, Tong Zhang, Heng Ji

Abstract: Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus. However, real-world knowledge is not static; it updates and evolves continually. Such a dynamic characteristic of knowledge poses a vital challenge for these models, as the trained models need to constantly adapt to the latest information to make sure that the answers remain accurate. In addition, it is still unclear how well an OpenQA model can transfer to completely new knowledge domains. In this paper, we investigate the generalization performance of a retrieval-augmented QA model in two specific scenarios: 1) adapting to updated versions of the same knowledge corpus; 2) switching to completely different knowledge domains. We observe that the generalization challenges of OpenQA models stem from the reader's over-reliance on memorizing the knowledge from the external corpus, which hinders the model from generalizing to a new knowledge corpus. We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy, to mitigate the knowledge over-memorization by controlling the likelihood of retrieved contexts during training. Extensive experimental results on multiple OpenQA benchmarks show that CIT achieves significantly better generalizability without compromising the model's performance in its original corpus and domain.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 Findings

arXiv:2403.19081

Surface variation analysis of freeform optical systems over surface frequency bands for prescribed wavefront errors

Authors: Rundong Fan, Shili Wei, Huiru JI, Zhuang Qian, Hao Tan, Yan Mo, Donglin MA

Abstract: The surface errors of freeform surfaces reflect the manufacturing complexities and significantly impact the feasibility of processing designed optical systems. With multiple degrees of freedom, freeform surfaces pose challenges in surface tolerance analysis in the field. Nevertheless, current research has neglected the influence of surface slopes on the directions of ray propagation. A sudden alteration in the surface slope will lead to a corresponding abrupt shift in the wavefront, even when the change in surface sag is minimal. Moreover, within the realm of freeform surface manufacturing, variation in surface slope across different frequency bands may give rise to unique surface variation. Within the context of this study, we propose a tolerance analysis method to analyze surface variation in freeform surfaces considering surface frequency band slopes based on real ray data. This approach utilizes real ray data to rapidly evaluate surface variation within a specified frequency band of surface slopes. Crucially, our proposed method yields the capability to obtain system surface variation with significant wavefront aberration, in contrast to previous methodologies. The feasibility and advantages of this framework are assessed by analyzing a single-mirror system with a single field and an off-axis two-mirror system. We expect to integrate the proposed methodology with freeform surface design and manufacturing, thereby expanding the scope of freeform optics.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18671

Fact Checking Beyond Training Set

Authors: Payam Karisani, Heng Ji

Abstract: Evaluating the veracity of everyday claims is time-consuming and in some cases requires domain expertise. We empirically demonstrate that the commonly used fact-checking pipeline, known as the retriever-reader, suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain. We then analyze each component of the pipeline and propose novel algorithms to address this problem. We propose an adversarial algorithm to make the retriever component robust against distribution shift. Our core idea is to first train a bi-encoder on the labeled source data, and then to adversarially train two separate document and claim encoders using unlabeled target data. We then focus on the reader component and propose to train it such that it is insensitive to the order of claims and evidence documents. Our empirical evaluations support the hypothesis that such a reader shows higher robustness against distribution shift. To our knowledge, there is no publicly available multi-topic fact-checking dataset. Thus, we propose a simple automatic method to repurpose two well-known fact-checking datasets. We then construct eight fact-checking scenarios from these datasets, and compare our model to a set of strong baseline models, including recent domain adaptation models that use GPT4 for generating synthetic data.

Submitted 27 March, 2024; originally announced March 2024.

Comments: NAACL 2024

arXiv:2403.16823

Resource and Mobility Management in Hybrid LiFi and WiFi Networks: A User-Centric Learning Approach

Authors: Han Ji, Xiping Wu

Abstract: Hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets) are an emerging indoor wireless communication paradigm, which combines the advantages of the capacious optical spectra of LiFi and the ubiquitous coverage of WiFi. Meanwhile, load balancing (LB) becomes a key challenge in resource management for such hybrid networks. The existing LB methods are mostly network-centric, relying on a central unit to compute a solution for all users at once. Consequently, the solution needs to be updated for all users at the same pace, regardless of their moving status. This would affect the network performance in two ways: i) when the update frequency is low, it would compromise the connectivity of fast-moving users; ii) when the update frequency is high, it would cause unnecessary handovers as well as hefty feedback costs for slow-moving users. Motivated by this, we investigate user-centric LB, which allows users to update their solutions at different paces. The research builds on our previous work on the adaptive target-condition neural network (ATCNN), which can conduct LB for individual users in quasi-static channels. In this paper, a deep neural network (DNN) model is designed to enable an adaptive update interval for each individual user. This new model is termed the mobility-supporting neural network (MSNN). Associating MSNN with ATCNN, a user-centric LB framework named mobility-supporting ATCNN (MS-ATCNN) is proposed to handle resource management and mobility management simultaneously.
Results show that at the same level of average update interval, MS-ATCNN can achieve a network throughput up to 215% higher than conventional LB methods such as game theory, especially for a larger number of users. In addition, MS-ATCNN has an ultra-low runtime on the order of hundreds of μs, which is two to three orders of magnitude lower than that of game theory.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 12 pages, 12 figures, 3 tables, submitted to IEEE TWC
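The calculation below reads "up to 215% higher" as a relative increase over the baseline throughput, since the abstract gives only the percentage. With $T$ denoting network throughput, the best-case ratio in the MS-ATCNN abstract above works out as:

$$\frac{T_{\text{MS-ATCNN}}}{T_{\text{baseline}}} = 1 + \frac{215}{100} = 3.15$$

Under this reading, MS-ATCNN delivers about 3.15 times the throughput of a game-theory baseline in the best reported case, not 2.15 times.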

arXiv:2403.12027

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding tasks. This survey paper provides a comprehensive overview of the recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review fundamental building blocks crucial for studying chart understanding tasks. Additionally, we explore various tasks and their evaluation metrics and sources of both charts and textual inputs. Various modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we review the state-of-the-art performance on each task and discuss how it can be improved. Challenges and future directions are addressed, highlighting the importance of several topics, such as domain-specific charts, the lack of efforts in developing evaluation metrics, and agent-oriented settings.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis, providing valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.

Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.08069 [pdf]

doi 10.1002/adma.202313763

Noncentrosymmetric Triangular Magnet CaMnTeO$_6$: Strong Quantum Fluctuations and Role of $s^0$ vs. $s^2$ Electronic States in Competing Exchange Interactions

Authors: Xudong Huai, Emmanuel Acheampong, Erich Delles, Michał J. Winiarski, Maurice Sorolla II, Lila Nassar, Mingli Liang, Caleb Ramette, Huiwen Ji, Allen Scheie, Stuart Calder, Martin Mourigal, Thao T. Tran

Abstract: Noncentrosymmetric triangular magnets offer a unique platform for realizing strong quantum fluctuations. However, designing these quantum materials remains an open challenge attributable to a knowledge gap in the tunability of competing exchange interactions at the atomic level. Here, we create a new noncentrosymmetric triangular S = 3/2 magnet CaMnTeO$_6$ based on careful chemical and physical considerations. The model material displays competing magnetic interactions and features nonlinear optical responses with the capability of generating coherent photons. The incommensurate magnetic ground state of CaMnTeO$_6$, with an unusually large spin rotation angle of 127(1)°, indicates that the anisotropic interlayer exchange is strong and competes with the isotropic interlayer Heisenberg interaction. The moment of 1.39(1) $\mu_\mathrm{B}$, extracted from low-temperature heat capacity and neutron diffraction measurements, is only 46% of the expected value of the static moment 3 $\mu_\mathrm{B}$. This reduction indicates the presence of strong quantum fluctuations in the half-integer spin S = 3/2 CaMnTeO$_6$ magnet, which is rare. By comparing the spin-polarized band structure, chemical bonding, and physical properties of AMnTeO$_6$ (A = Ca, Sr, Pb), we demonstrate how quantum-chemical interpretation can illuminate the fundamentals of magnetic exchange interactions, providing a powerful tool for modulating spin dynamics with atomically precise control.
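A quick consistency check of the quoted 46%, assuming the static moment is the spin-only value $gS\mu_\mathrm{B}$ with $g = 2$ (the abstract states only the 3 $\mu_\mathrm{B}$ result, not how it was obtained):

```latex
\mu_{\mathrm{static}} = g S \mu_\mathrm{B} = 2 \times \tfrac{3}{2}\,\mu_\mathrm{B} = 3\,\mu_\mathrm{B},
\qquad
\frac{\mu_{\mathrm{ordered}}}{\mu_{\mathrm{static}}} = \frac{1.39\,\mu_\mathrm{B}}{3\,\mu_\mathrm{B}} \approx 0.463 \approx 46\%
```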

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06093 [pdf, other]

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Authors: Haoxuanye Ji, Pengpeng Liang, Erkang Cheng

Abstract: Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g., faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generation approach termed QAF2D, which infers 3D query anchors from 2D detection results. A 2D bounding box of an object in an image is lifted to a set of 3D anchors by associating each sampled point within the box with depth, yaw angle, and size candidates. Then, the validity of each 3D anchor is verified by comparing its projection in the image with its corresponding 2D box, and only valid anchors are kept and used to construct queries. The class information of the 2D bounding box associated with each query is also utilized to match the predicted boxes with ground truth for the set-based loss. The image feature extraction backbone is shared between the 3D detector and 2D detector by adding a small number of prompt parameters. We integrate QAF2D into three popular query-based 3D object detectors and carry out comprehensive evaluations on the nuScenes dataset. The largest improvement that QAF2D brings on the nuScenes validation subset is $2.3\%$ NDS and $2.7\%$ mAP. Code is available at https://github.com/nullmax-vision/QAF2D.
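A minimal NumPy sketch of the lift-and-verify step described above. It is not the QAF2D code: the camera convention (x right, y down, z forward), the yaw-about-y parameterization, placing each anchor's center on the back-projected ray of a sampled pixel, the 3×3 sampling grid, and the IoU ≥ 0.5 validity test are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def box_corners(center, size, yaw):
    """Return the 8 corners (8x3) of a 3D box in camera coordinates.

    Camera frame: x right, y down, z forward. size = (length, width, height);
    yaw rotates the box about the camera y axis (assumed convention).
    """
    length, width, height = size
    xs = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * length / 2
    ys = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * height / 2
    zs = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * width / 2
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (rot_y @ np.vstack([xs, ys, zs])).T + np.asarray(center, dtype=float)


def project(points, K):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixel coordinates."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def iou2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def lift_box(box2d, K, depths, yaws, sizes, grid=3, iou_thresh=0.5):
    """Lift one 2D box to the 3D anchors whose reprojection agrees with it."""
    x1, y1, x2, y2 = box2d
    K_inv = np.linalg.inv(K)
    anchors = []
    for u in np.linspace(x1, x2, grid):
        for v in np.linspace(y1, y2, grid):
            # Back-projected ray; its z-component is 1 when K's last row is (0, 0, 1).
            ray = K_inv @ np.array([u, v, 1.0])
            for depth in depths:
                center = ray * depth  # anchor center projects to pixel (u, v)
                for yaw in yaws:
                    for size in sizes:
                        corners = box_corners(center, size, yaw)
                        if np.any(corners[:, 2] <= 0):
                            continue  # part of the box is behind the camera
                        uv = project(corners, K)
                        reproj = (uv[:, 0].min(), uv[:, 1].min(),
                                  uv[:, 0].max(), uv[:, 1].max())
                        if iou2d(reproj, box2d) >= iou_thresh:
                            anchors.append((center, size, yaw))
    return anchors
```

For example, projecting a 4 m × 1.8 m × 1.5 m box centered 20 m straight ahead and lifting the resulting 2D box with depth candidates of 10, 20, and 40 m keeps only the anchor at the box center and 20 m depth; the wrong depths reproject to boxes roughly twice or half the size and fail the IoU test.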

Submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.05159 [pdf, other]

LVIC: Multi-modality segmentation by Lifting Visual Info as Cue

Authors: Zichao Dong, Bowen Pang, Xufeng Huang, Hang Ji, Xin Zhan, Junbo Chen

Abstract: Multi-modality fusion has proven to be an effective method for 3D perception in autonomous driving. However, most current multi-modality fusion pipelines for LiDAR semantic segmentation have complicated fusion mechanisms. Point painting is a straightforward method that directly binds LiDAR points with visual information. Unfortunately, previous point-painting-like methods suffer from projection error between camera and LiDAR. In our experiments, we find that this projection error is the main obstacle in point painting. We therefore propose a depth-aware point painting mechanism, which significantly boosts multi-modality fusion. In addition, we take a closer look at the visual features best suited to LiDAR semantic segmentation. By Lifting Visual Information as Cue, LVIC ranks 1st on the nuScenes LiDAR semantic segmentation benchmark. Our experiments demonstrate its robustness and effectiveness. Code will be made publicly available soon.
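The abstract does not describe LVIC's depth-aware mechanism. As a hedged illustration of why depth matters in point painting, the sketch below paints LiDAR points with image segmentation scores but applies a simple per-pixel z-buffer test (an assumption, not LVIC's method), so that a point hidden behind a nearer point at the same pixel does not inherit the nearer object's label. Points are assumed to be already transformed into the camera frame; `paint_points` and `depth_tol` are hypothetical names.

```python
import numpy as np


def paint_points(points_cam, seg_scores, K, depth_tol=0.5):
    """Append per-pixel segmentation scores to LiDAR points, skipping occluded ones.

    points_cam: (N, 3) points already in the camera frame (x right, y down, z forward).
    seg_scores: (H, W, C) per-pixel class scores from an image segmentation network.
    K: (3, 3) camera intrinsics. depth_tol: slack (same units as z) for the z-buffer test.
    Returns the (N, 3 + C) painted points and a boolean mask of painted points.
    """
    H, W, C = seg_scores.shape
    z = points_cam[:, 2]
    uvw = points_cam @ K.T
    safe_z = np.where(z > 0, z, 1.0)  # avoid dividing by non-positive depths
    u = np.floor(uvw[:, 0] / safe_z).astype(int)
    v = np.floor(uvw[:, 1] / safe_z).astype(int)
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # z-buffer: nearest LiDAR depth that lands on each pixel.
    zbuf = np.full((H, W), np.inf)
    np.minimum.at(zbuf, (v[inside], u[inside]), z[inside])

    # Paint a point only if it is (nearly) the nearest point at its pixel.
    visible = inside.copy()
    visible[inside] = z[inside] <= zbuf[v[inside], u[inside]] + depth_tol

    painted = np.zeros((points_cam.shape[0], C))
    painted[visible] = seg_scores[v[visible], u[visible]]
    return np.hstack([points_cam, painted]), visible
```

Because LiDAR is sparse, a per-pixel buffer built only from LiDAR points catches an occlusion only when both points land on the same pixel; a practical system would more likely compare against a denser depth estimate, which this minimal version omits.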

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.00791 [pdf, other]

L+M-24: Building a Dataset for Language + Molecules @ ACL 2024

Authors: Carl Edwards, Qingyun Wang, Lawrence Zhao, Heng Ji

Abstract: Language-molecule models have emerged as an exciting direction for molecular discovery and understanding. However, training these models is challenging due to the scarcity of molecule-language pair datasets. At this point, datasets have been released which are 1) small and scraped from existing databases, 2) large but noisy and constructed by performing entity linking on the scientific literature, and 3) built by converting property prediction datasets to natural language using templates. In this document, we detail the $\textit{L+M-24}$ dataset, which has been created for the Language + Molecules Workshop shared task at ACL 2024. In particular, $\textit{L+M-24}$ is designed to focus on three key benefits of natural language in molecule design: compositionality, functionality, and abstraction.

Submitted 4 July, 2024; v1 submitted 22 February, 2024; originally announced March 2024.

Comments: The dataset, finetuned baselines, and evaluation code are released publicly at https://github.com/language-plus-molecules/LPM-24-Dataset through https://huggingface.co/language-plus-molecules

arXiv:2402.19275 [pdf, other]

Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning

Authors: Jingxuan Yang, Ruoxuan Bai, Haoyuan Ji, Yi Zhang, Jianming Hu, Shuo Feng

Abstract: The assessment of safety performance plays a pivotal role in the development and deployment of connected and automated vehicles (CAVs). A common approach involves designing testing scenarios based on prior knowledge of CAVs (e.g., surrogate models), conducting tests in these scenarios, and subsequently evaluating CAVs' safety performances. However, substantial differences between CAVs and the prior knowledge can significantly diminish the evaluation efficiency. In response to this issue, existing studies predominantly concentrate on the adaptive design of testing scenarios during the CAV testing process. Yet, these methods have limitations in their applicability to high-dimensional scenarios. To overcome this challenge, we develop an adaptive testing environment that bolsters evaluation robustness by incorporating multiple surrogate models and optimizing the combination coefficients of these surrogate models to enhance evaluation efficiency. We formulate the optimization problem as a regression task utilizing quadratic programming. To efficiently obtain the regression target via reinforcement learning, we propose the dense reinforcement learning method and devise a new adaptive policy with high sample efficiency. Essentially, our approach centers on learning the values of critical scenes displaying substantial surrogate-to-real gaps. The effectiveness of our method is validated in high-dimensional overtaking scenarios, demonstrating that our approach achieves notable evaluation efficiency.
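The abstract does not spell out the quadratic program. One minimal reading, assumed here purely for illustration, treats each surrogate model's predictions as a column of a matrix and fits nonnegative combination weights that sum to one by least squares against a regression target. That problem is a QP; the sketch solves it with projected gradient descent rather than a dedicated QP solver. `fit_combination`, `project_to_simplex`, and the simplex constraint are hypothetical, not the paper's formulation.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]  # last index that stays positive
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_combination(preds, target, iters=5000):
    """Simplex-constrained least squares: min ||preds @ w - target||^2, w >= 0, sum(w) = 1.

    preds: (num_scenarios, num_surrogates) predictions of each surrogate model.
    target: (num_scenarios,) regression target for the same scenarios.
    Solved by projected gradient descent with a fixed 1/L step size.
    """
    n_models = preds.shape[1]
    w = np.full(n_models, 1.0 / n_models)
    lipschitz = 2.0 * np.linalg.norm(preds, 2) ** 2  # largest eigenvalue of the Hessian
    for _ in range(iters):
        grad = 2.0 * preds.T @ (preds @ w - target)
        w = project_to_simplex(w - grad / lipschitz)
    return w


if __name__ == "__main__":
    preds = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy data
    target = preds @ np.array([0.3, 0.7])
    print(fit_combination(preds, target).round(3))  # [0.3 0.7]
```

The simplex constraint keeps the combined surrogate a weighted average of the individual ones; with a different constraint set (for example, unconstrained weights) the projection step would change accordingly.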

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18077 [pdf, ps, other]

Locating heating channels of the solar corona in a plage region with the aid of high-resolution 10830 Å filtergrams

Authors: Parida Hashim, Fangyu Xu, Ya Wang, Weijie Men, Jinhua Shen, Yingna Su, Jianping Li, Zhenyu Jin, Haisheng Ji

Abstract: In this paper, with a set of high-resolution He I 10830 Å filtergrams, we select an area in a plage, very likely an EUV moss area, as an interface layer to follow the clues of coronal heating channels down to the photosphere. The filtergrams are obtained from the 1-meter aperture New Vacuum Solar Telescope (NVST). We distinguish between the darker and the brighter regions in the selected area and name the two regions enhanced absorption patches (EAPs) and low absorption patches (LAPs). With well-aligned, nearly simultaneous data from multiple channels of the AIA and the continuum of the HMI on board SDO, we compare the EUV/UV emissions, emission measure, mean temperature, and continuum intensity in the two kinds of regions. The main findings are: 1) The mean EUV emissions over EAPs are mostly stronger than the corresponding emissions over LAPs, except for the emission at 335 Å. The UV emissions at 1600 and 1700 Å fail to capture the difference between the two regions. 2) In the logarithmic temperature range of 5.6-6.2, EAPs have a higher EUV emission measure than LAPs, but a lower mean coronal temperature. 3) The mean continuum intensity over EAPs is lower. Based on these findings, we suggest that the energy for coronal heating in the moss region can be traced down to some areas in intergranular lanes with enhanced density of both cool and hot material. The lower temperature over the EAPs is due to the greater fraction of cool material there.

Submitted 28 February, 2024; originally announced February 2024.

Comments: ApJ accepted for publication. 11 pages, 7 figures

arXiv:2402.16315 [pdf, other]

Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models

Authors: Jeonghwan Kim, Heng Ji

Abstract: Recent advances in instruction-tuned Large Vision-Language Models (LVLMs) have imbued the models with the ability to generate high-level, image-grounded explanations with ease. While such capability is largely attributed to the rich world knowledge contained within the Large Language Models (LLMs), our work reveals their shortcomings in fine-grained visual categorization (FGVC) across six different benchmark settings. Most recent state-of-the-art LVLMs like LLaVA-1.5, InstructBLIP and GPT-4V not only severely deteriorate in terms of classification performance, e.g., an average drop of 65.58 in EM for Stanford Dogs for LLaVA-1.5, but also struggle to generate an accurate explanation with detailed attributes based on the concept that appears within an input image, despite their capability to generate holistic image-level descriptions. In-depth analyses show that instruction-tuned LVLMs exhibit a modality gap, showing discrepancy when given textual and visual inputs that correspond to the same concept, which prevents the image modality from leveraging the rich parametric knowledge within the LLMs. In an effort to further the community's endeavor in this direction, we propose a multiple granularity attribute-centric evaluation benchmark, Finer, which aims to establish a ground to evaluate LVLMs' fine-grained visual comprehension ability and provide significantly improved explainability.

Submitted 11 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15796 [pdf]

Construction and application of artificial intelligence crowdsourcing map based on multi-track GPS data

Authors: Yong Wang, Yanlin Zhou, Huan Ji, Zheng He, Xinyu Shen

Abstract: In recent years, the rapid development of high-precision map technology combined with artificial intelligence has opened up new development opportunities in the field of intelligent vehicles. High-precision map technology is an important prerequisite for intelligent vehicles to achieve autonomous driving. However, due to the lack of research on high-precision map technology, it is difficult to use this technology rationally in the field of intelligent vehicles. Researchers have therefore studied fast and effective algorithms that fuse large amounts of low-precision GPS trajectory data into high-precision GPS data and generate a few key data points to simplify the description of each GPS trajectory; on this basis, a "crowdsourced update" model, in which large numbers of social vehicles collect map data, has come into being. Such algorithms are significant for improving data accuracy, reducing measurement cost, and reducing data storage space. On this basis, this paper analyzes the implementation forms of crowdsourced maps, so as to improve the various information data in the high-precision map according to the actual situation and to promote the reasonable application of high-precision maps in intelligent vehicles.
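The abstract does not name the algorithm used to reduce a trajectory to key points. As a stand-in, the classic Ramer-Douglas-Peucker method below keeps only the points needed for the simplified polyline to stay within a tolerance `epsilon`. It assumes planar coordinates (for example, metres in a local projection) rather than raw latitude/longitude, and it is not the paper's method; the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar x, y coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the infinite line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker: keep key points so no dropped point lies farther than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # Otherwise split at that point and simplify both halves recursively.
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


if __name__ == "__main__":
    track = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3, 1), (3, 2)]
    print(simplify(track, epsilon=0.1))  # [(0, 0), (3, 0), (3, 2)]
```

The jitter of ±0.05 along the first leg falls inside the tolerance, so that leg collapses to its endpoints, while the corner at (3, 0) survives as a key point.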

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.14312 [pdf, other]

doi 10.61977/ati2024008

The Jiao Tong University Spectroscopic Telescope Project

Authors: JUST Team, Chengze Liu, Ying Zu, Fabo Feng, Zhaoyu Li, Yu Yu, Hua Bai, Xiangqun Cui, Bozhong Gu, Yizhou Gu, Jiaxin Han, Yonghui Hou, Zhongwen Hu, Hangxin Ji, Yipeng Jing, Wei Li, Zhaoxiang Qi, Xianyu Tan, Cairang Tian, Dehua Yang, Xiangyan Yuan, Chao Zhai, Congcong Zhang, Jun Zhang, Haotong Zhang, et al. (6 additional authors not shown)

Abstract: The Jiao Tong University Spectroscopic Telescope (JUST) is a 4.4-meter f/6.0 segmented-mirror telescope dedicated to spectroscopic observations. The JUST primary mirror is composed of 18 hexagonal segments, each with a diameter of 1.1 m. JUST provides two Nasmyth platforms for placing science instruments. One Nasmyth focus fits a field of view of 10 arcmin and the other has an extended field of view of 1.2 deg with correction optics. A tertiary mirror is used to switch between the two Nasmyth foci. JUST will be installed at a site at Lenghu in Qinghai Province, China, and will conduct spectroscopic observations with three types of instruments to explore the dark universe, trace the dynamic universe, and search for exoplanets: (1) a multi-fiber (2000 fibers) medium-resolution spectrometer (R=4000-5000) to spectroscopically map galaxies and large-scale structure; (2) an integral field unit (IFU) array of 500 optical fibers and/or a long-slit spectrograph dedicated to fast follow-ups of transient sources for multimessenger astronomy; (3) a high-resolution spectrometer (R~100000) designed to identify Jupiter analogs and Earth-like planets, with the capability to characterize the atmospheres of hot exoplanets.
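A few back-of-the-envelope numbers implied by the stated specifications, assuming the f/6.0 ratio applies at the 10 arcmin Nasmyth focus and taking 5000 Å as a representative wavelength for the medium-resolution spectrometer (both are assumptions, not values from the abstract): the focal length, the plate scale and the physical size of the 10 arcmin field, the wavelength resolution element at R = 4000-5000, and the velocity width of one resolution element at R ~ 100000.

```latex
\begin{aligned}
f &= N D = 6.0 \times 4.4\ \mathrm{m} = 26.4\ \mathrm{m},\\
s &= \frac{206\,265''}{f} = \frac{206\,265''}{26\,400\ \mathrm{mm}} \approx 7.81''\ \mathrm{mm}^{-1},
\qquad 10' = 600'' \;\Rightarrow\; \frac{600''}{s} \approx 76.8\ \mathrm{mm},\\
\Delta\lambda &= \frac{\lambda}{R} = \frac{5000\ \text{\AA}}{4000\text{--}5000} \approx 1.0\text{--}1.25\ \text{\AA},\\
\Delta v &= \frac{c}{R} \approx \frac{3.0 \times 10^{5}\ \mathrm{km\,s^{-1}}}{10^{5}} = 3\ \mathrm{km\,s^{-1}}.
\end{aligned}
```

The last line is the width of one resolution element, not the radial-velocity precision needed to detect Jupiter or Earth analogs, which is a separate and much finer quantity.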

Submitted 29 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 28 pages, 6 figures

arXiv:2402.14221 [pdf, other]

Towards singular optimality in the presence of local initial knowledge

Authors: Hongyan Ji, Sriram V. Pemmaraju

Abstract: The Knowledge Till $\rho$ CONGEST model is a variant of the classical CONGEST model of distributed computing in which each vertex $v$ has initial knowledge of the radius-$\rho$ ball centered at $v$. The most commonly studied variants of the CONGEST model are KT0 CONGEST, in which nodes initially know nothing about their neighbors, and KT1 CONGEST, in which nodes initially know the IDs of all their neighbors. It has been shown that having access to neighbors' IDs (as in the KT1 CONGEST model) can substantially reduce the message complexity of algorithms for fundamental problems such as BROADCAST and MST. For example, King, Kutten, and Thorup (PODC 2015) show how to construct an MST using just $\tilde{O}(n)$ messages in the KT1 CONGEST model, whereas there is an $\Omega(m)$ message lower bound for MST in the KT0 CONGEST model. Building on this result, Gmyr and Pandurangan (DISC 2018) present a family of distributed randomized algorithms for various global problems that exhibit a trade-off between message and round complexity. These algorithms are based on constructing a sparse, spanning subgraph called a danner. Specifically, given a graph $G$ and any $\delta \in [0,1]$, their algorithm constructs (with high probability) a danner that has diameter $\tilde{O}(D + n^{1-\delta})$ and $\tilde{O}(\min\{m, n^{1+\delta}\})$ edges in $\tilde{O}(n^{1-\delta})$ rounds while using $\tilde{O}(\min\{m, n^{1+\delta}\})$ messages, where $n$, $m$, and $D$ are the number of nodes, the number of edges, and the diameter of $G$, respectively.
In the main result of this paper, we show that if we assume the KT2 CONGEST model, it is possible to substantially improve the time-message trade-off in constructing a danner. Specifically, we show how, in the KT2 CONGEST model, to construct a danner that has diameter $\tilde{O}(D + n^{1-2\delta})$ and $\tilde{O}(\min\{m, n^{1+\delta}\})$ edges in $\tilde{O}(n^{1-2\delta})$ rounds while using $\tilde{O}(\min\{m, n^{1+\delta}\})$ messages for any $\delta \in [0,1/2]$.
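To make the improvement concrete, the sketch below evaluates the round and message bounds quoted above for sample values of n, m, and δ, dropping the polylogarithmic factors hidden by $\tilde{O}$ (so the numbers show only growth rates, not actual round counts). The function names and the sample values are hypothetical.

```python
def danner_rounds(n: int, delta: float, model: str) -> float:
    """Round bound for danner construction, with polylog factors dropped.

    KT1 (Gmyr and Pandurangan): n^(1 - delta), delta in [0, 1].
    KT2 (this paper):           n^(1 - 2*delta), delta in [0, 1/2].
    """
    if model == "KT1":
        if not 0.0 <= delta <= 1.0:
            raise ValueError("KT1 trade-off requires delta in [0, 1]")
        return n ** (1.0 - delta)
    if model == "KT2":
        if not 0.0 <= delta <= 0.5:
            raise ValueError("KT2 trade-off requires delta in [0, 1/2]")
        return n ** (1.0 - 2.0 * delta)
    raise ValueError(f"unknown model: {model}")


def danner_messages(n: int, m: int, delta: float) -> float:
    """Message bound min(m, n^(1 + delta)), the same in both models."""
    return min(m, n ** (1.0 + delta))


if __name__ == "__main__":
    n, m, delta = 10**6, 10**8, 0.25
    print(f"KT1 rounds ~ {danner_rounds(n, delta, 'KT1'):.0f}")  # ~31623
    print(f"KT2 rounds ~ {danner_rounds(n, delta, 'KT2'):.0f}")  # 1000
    print(f"messages   ~ {danner_messages(n, m, delta):.3g}")    # ~3.16e+07
```

With the same message budget, KT2 reaches in about $n^{1/2}$ rounds what KT1 needs about $n^{3/4}$ rounds for at δ = 1/4; at δ = 1/2 the KT2 round bound becomes polylogarithmic.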

Submitted 22 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.11943 [pdf, other]

LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation

Authors: Keyang Xuan, Li Yi, Fan Yang, Ruochen Wu, Yi R. Fung, Heng Ji

Abstract: The rise of multimodal misinformation on social platforms poses significant challenges for individuals and societies. Its increased credibility and broader impact compared to textual misinformation make detection complex, requiring robust reasoning across diverse media types and profound knowledge for accurate verification. The emergence of Large Vision Language Models (LVLMs) offers a potential solution to this problem. Leveraging their proficiency in processing visual and textual information, LVLMs demonstrate promising capabilities in recognizing complex information and exhibit strong reasoning skills. In this paper, we first investigate the potential of LVLMs for multimodal misinformation detection. We find that even though LVLMs perform better than LLMs, their profound reasoning may have limited power when evidence is lacking. Based on these observations, we propose LEMMA: LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation. LEMMA leverages LVLM intuition and reasoning capabilities while augmenting them with external knowledge to enhance the accuracy of misinformation detection. Our method improves the accuracy over the top baseline LVLM by 7% and 13% on the Twitter and Fakeddit datasets, respectively.

Submitted 20 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11324 [pdf, other]

EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries

Authors: Jiateng Liu, Pengfei Yu, Yuji Zhang, Sha Li, Zixuan Zhang, Heng Ji

Abstract: The dynamic nature of real-world information necessitates efficient knowledge editing (KE) in large language models (LLMs) for knowledge updating. However, current KE approaches, which typically operate on (subject, relation, object) triples, ignore the contextual information and the relations among different pieces of knowledge. Such editing methods could thus encounter an uncertain editing boundary, leaving a lot of relevant knowledge ambiguous: queries that could be answered pre-edit cannot be reliably answered afterward. In this work, we analyze this issue by introducing a theoretical framework for KE that highlights an overlooked set of knowledge that remains unchanged and aids in knowledge deduction during editing, which we name the deduction anchor. We further address this issue by proposing a novel task of event-based knowledge editing that pairs facts with event descriptions. This task not only simulates real-world editing scenarios more closely but also provides a more logically sound setting, implicitly defining the deduction anchor to address the issue of indeterminate editing boundaries. We empirically demonstrate the superiority of event-based editing over the existing setting on resolving uncertainty in edited models, and curate a new benchmark dataset EvEdit derived from the CounterFact dataset.
Moreover, observing that the event-based setting is significantly challenging for existing approaches, we propose a novel approach, Self-Edit, that showcases stronger performance, achieving a 55.6% consistency improvement while maintaining the naturalness of generation.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.11060 [pdf, other]

Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement

Authors: Chenkai Sun, Ke Yang, Revanth Gangi Reddy, Yi R. Fung, Hou Pong Chan, ChengXiang Zhai, Heng Ji

Abstract: The increasing demand for personalized interactions with large language models (LLMs) calls for the development of methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation has emerged as an effective strategy, as it can accommodate a vast number of users without the cost of fine-tuning. Existing research, however, has largely focused on enhancing the retrieval stage and devoted limited exploration toward optimizing the representation of the database, a crucial aspect for tasks such as personalization. In this work, we examine the problem from a novel angle, focusing on how data can be better represented for more efficient retrieval in the context of LLM customization. To tackle this challenge, we introduce Persona-DB, a simple yet effective framework consisting of a hierarchical construction process to improve generalization across task contexts and collaborative refinement to effectively bridge knowledge gaps among users. In the task of response forecasting, Persona-DB demonstrates superior efficiency in maintaining accuracy with a significantly reduced retrieval size, a critical advantage in scenarios with extensive histories or limited context windows. Our experiments also indicate a marked improvement of over 15% under cold-start scenarios, when users have extremely sparse data. Furthermore, our analysis reveals the increasing importance of collaborative knowledge as the retrieval capacity expands.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10980 [pdf, other]

ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback

Authors: Henry W. Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, Sutanay Choudhury

Abstract: The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and reaction energy barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing competitive performance against expert-enumerated chemical descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.

Submitted 7 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 9 pages, accepted by ICML 2024, final version

arXiv:2402.09463 [pdf]

Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

Authors: Kelly Payette, Céline Steger, Roxane Licandro, Priscille de Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolò McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, Jin Ye, Mireia Alenyà, Valentin Comte, Oscar Camara, et al. (42 additional authors not shown)

Abstract: Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single-center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. Sixteen teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to its anatomical complexity.
The FeTA Challenge 2022 was able to successfully evaluate and advance the generalizability of multi-class fetal brain tissue segmentation algorithms for MRI, and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Results from FeTA Challenge 2022, held at MICCAI; Manuscript submitted. Supplementary Info (including submission methods descriptions) available here: https://zenodo.org/records/10628648

arXiv:2402.09369

Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking

Authors: Yi Fung, Ruining Zhao, Jae Doo, Chenkai Sun, Heng Ji

Abstract: Pretrained large language models have revolutionized many applications but still face challenges related to cultural bias and a lack of cultural commonsense knowledge, which is crucial for guiding cross-cultural communication and interactions. Recognizing the shortcomings of existing methods in capturing the diverse and rich cultures across the world, this paper introduces a novel approach for massively multicultural knowledge acquisition. Specifically, our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages. Leveraging this valuable data source, we construct the CultureAtlas dataset, which covers a wide range of sub-country-level geographical regions and ethnolinguistic groups, with data cleaning and preprocessing to ensure that each textual assertion sentence is self-contained, as well as fine-grained extraction of cultural profile information. Our dataset not only facilitates the evaluation of language model performance in culturally diverse contexts but also serves as a foundational tool for the development of culturally sensitive and aware language models. Our work marks an important step toward a deeper understanding and bridging of cultural gaps in AI, promoting a more inclusive and balanced representation of global cultures in the digital domain.

Submitted 14 February, 2024; originally announced February 2024.

Comments: preprint

arXiv:2402.07401

Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate

Authors: Kyungha Kim, Sangyun Lee, Kung-Hsiang Huang, Hou Pong Chan, Manling Li, Heng Ji

Abstract: Fact-checking research has extensively explored verification but less so the generation of natural-language explanations, which are crucial for user trust. While Large Language Models (LLMs) excel at text generation, their capability for producing faithful explanations in fact-checking remains underexamined. Our study investigates LLMs' ability to generate such explanations and finds that zero-shot prompts often result in unfaithful explanations. To address these challenges, we propose the Multi-Agent Debate Refinement (MADR) framework, which leverages multiple LLMs as agents with diverse roles in an iterative refinement process aimed at enhancing the faithfulness of generated explanations. MADR ensures that the final explanation undergoes rigorous validation, significantly reducing the likelihood of unfaithful elements and aligning it closely with the provided evidence. Experimental results demonstrate that MADR significantly improves the faithfulness of LLM-generated explanations to the evidence, advancing the credibility and trustworthiness of these explanations.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.07016

REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models

Authors: Yinghao Zhu, Changyu Ren, Shiyun Xie, Shukai Liu, Hangyuan Ji, Zixiang Wang, Tao Sun, Long He, Zhoujun Li, Xi Zhu, Chengwei Pan

Abstract: The integration of multimodal Electronic Health Records (EHR) data has significantly improved clinical predictive capabilities. Existing models that leverage clinical notes and multivariate time-series EHR often lack the medical context relevant to clinical tasks, prompting the incorporation of external knowledge, particularly from knowledge graphs (KGs). Previous approaches using KG knowledge have primarily focused on structured knowledge extraction, neglecting unstructured data modalities and semantic, high-dimensional medical knowledge. In response, we propose REALM, a Retrieval-Augmented Generation (RAG) driven framework that enhances multimodal EHR representations to address these limitations. First, we apply a Large Language Model (LLM) to encode long-context clinical notes and a GRU model to encode time-series EHR data. Second, we prompt the LLM to extract task-relevant medical entities and match them with entities in a professionally labeled external knowledge graph (PrimeKG) and their corresponding medical knowledge. By matching and aligning with clinical standards, our framework eliminates hallucinations and ensures consistency. Lastly, we propose an adaptive multimodal fusion network to integrate the extracted knowledge with multimodal EHR data. Our extensive experiments on MIMIC-III mortality and readmission tasks show the superior performance of our REALM framework over baselines, emphasizing the effectiveness of each module.
The REALM framework contributes to refining the use of multimodal EHR data in healthcare and bridging the gap with the nuanced medical context essential for informed clinical predictions.

Submitted 10 February, 2024; originally announced February 2024.

arXiv:2402.06193

Experimental study of Alfvén wave reflection from an Alfvén-speed gradient relevant to the solar coronal holes

Authors: Sayak Bose, Jason M. TenBarge, Troy Carter, Michael Hahn, Hantao Ji, James Juno, Daniel Wolf Savin, Shreekrishna Tripathi, Stephen Vincena

Abstract: We report the first experimental detection of a reflected Alfvén wave from an Alfvén-speed gradient under conditions similar to those in coronal holes. The experiments were conducted in the Large Plasma Device at the University of California, Los Angeles. We present the experimentally measured dependence of the reflection coefficient on the wave inhomogeneity parameter, i.e., the ratio of the wavelength of the incident wave to the length scale of the gradient. Two-fluid simulations using the Gkeyll code qualitatively agree with and support the experimental findings. Our experimental results support models of wave heating that rely on wave reflection at low heights from a smooth Alfvén-speed gradient to drive turbulence.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.06190

Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain

Authors: Amin Karimi Monsefi, Payam Karisani, Mengxi Zhou, Stacey Choi, Nathan Doble, Heng Ji, Srinivasan Parthasarathy, Rajiv Ramnath

Abstract: Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, consequently, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate such challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to adeptly capture both long-range and short-range feature dependencies. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques in our model is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. In addition, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods on numerous tasks across two standard datasets (i.e., BTCV and MSD).
Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.

Submitted 9 February, 2024; originally announced February 2024.

Showing 1–50 of 551 results for author: Ji, H