Search | arXiv e-print repository

Comparison of No-Reference Image Quality Models via MAP Estimation in Diffusion Latents

Authors: Weixia Zhang, Dingquan Li, Guangtao Zhai, Xiaokang Yang, Kede Ma

Abstract: Contemporary no-reference image quality assessment (NR-IQA) models can effectively quantify the perceived image quality, with high correlations between model predictions and human perceptual scores on fixed test sets. However, little progress has been made in comparing NR-IQA models from a perceptual optimization perspective. Here, for the first time, we demonstrate that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement. This is achieved by taking the gradients in differentiable and bijective diffusion latents rather than in the raw pixel domain. Different NR-IQA models are likely to induce different enhanced images, which are ultimately subject to psychophysical testing. This leads to a new computational method for comparing NR-IQA models within the analysis-by-synthesis framework. Compared to conventional correlation-based metrics, our method provides complementary insights into the relative strengths and weaknesses of the competing NR-IQA models in the context of perceptual optimization.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.04437 [pdf, other]

StableDrag: Stable Dragging for Point-based Image Editing

Authors: Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, Limin Wang

Abstract: Point-based image editing has attracted remarkable attention since the emergence of DragGAN. Recently, DragDiffusion further pushes forward the generative quality via adapting this dragging technique to diffusion models. Despite this great success, this dragging scheme exhibits two major drawbacks, namely inaccurate point tracking and incomplete motion supervision, which may result in unsatisfactory dragging outcomes. To tackle these issues, we build a stable and precise drag-based editing framework, coined as StableDrag, by designing a discriminative point tracking method and a confidence-based latent enhancement strategy for motion supervision. The former allows us to precisely locate the updated handle points, thereby boosting the stability of long-range manipulation, while the latter is responsible for guaranteeing that the optimized latent is as high-quality as possible across all the manipulation steps. Thanks to these unique designs, we instantiate two types of image editing models including StableDrag-GAN and StableDrag-Diff, which attain more stable dragging performance, through extensive qualitative experiments and quantitative assessment on DragBench.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.02752 [pdf, other]

HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents

Authors: Sam Yu-Te Lee, Kwan-Liu Ma

Abstract: Sensemaking on a large collection of documents (corpus) is a challenging task often found in fields such as market research, legal studies, intelligence analysis, political science, computational linguistics, etc. Previous works approach this problem either from a topic- or entity-based perspective, but they lack interpretability and trust due to poor model alignment. In this paper, we present HINTs, a visual analytics approach that combines topic- and entity-based techniques seamlessly and integrates Large Language Models (LLMs) as both a general NLP task solver and an intelligent agent. By leveraging the extraction capability of LLMs in the data preparation stage, we model the corpus as a hypergraph that matches the user's mental model when making sense of the corpus. The constructed hypergraph is hierarchically organized with an agglomerative clustering algorithm by combining semantic and connectivity similarity. The system further integrates an LLM-based intelligent chatbot agent in the interface to facilitate sensemaking. To demonstrate the generalizability and effectiveness of the HINTs system, we present two case studies on different domains and a comparative user study. We report our insights on the behavior patterns and challenges when intelligent agents are used to facilitate sensemaking. We find that while intelligent agents can address many challenges in sensemaking, the visual hints that visualizations provide are necessary to address the new problems brought by intelligent agents. We discuss limitations and future work for combining interactive visualization and LLMs more profoundly to better support corpus analysis.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00862 [pdf, other]

NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism

Authors: Miao Li, Ming-Bin Chen, Bo Tang, Shengbin Hou, Pengyu Wang, Haiying Deng, Zhiyu Li, Feiyu Xiong, Keming Mao, Peng Cheng, Yi Luo

Abstract: We present NewsBench, a novel evaluation framework to systematically assess the editorial capabilities of Large Language Models (LLMs) in Chinese journalism. Our constructed benchmark dataset is focused on four facets of writing proficiency and six facets of safety adherence, and it comprises 1,267 manually and carefully designed test samples in the types of multiple choice questions and short answer questions for five editorial tasks in 24 news domains. To measure performance, we propose different GPT-4 based automatic evaluation protocols to assess LLM generations for short answer questions in terms of writing proficiency and safety adherence, and both are validated by the high correlations with human evaluations. Based on the systematic evaluation framework, we conduct a comprehensive analysis of ten popular LLMs which can handle Chinese. The experimental results highlight GPT-4 and ERNIE Bot as top performers, yet reveal a relative deficiency in journalistic safety adherence in creative writing tasks. Our findings also underscore the need for enhanced ethical guidance in machine-generated journalistic content, marking a step forward in aligning LLMs with journalistic standards and safety considerations.

Submitted 4 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: Long paper, ACL 2024 Main

arXiv:2403.00334 [pdf, other]

NOVA: A visual interface for assessing polarizing media coverage

Authors: Keshav Dasu, Sam Yu-Te Lee, Ying-Cheng Chen, Kwan-Liu Ma

Abstract: Within the United States, the majority of the populace receives their news online. U.S. mainstream media outlets both generate and influence the news consumed by U.S. citizens. Many of these citizens have their personal beliefs about these outlets and question the fairness of their reporting. We offer an interactive visualization system for the public to assess their perception of the mainstream media's coverage of a topic against the data. Our system combines belief elicitation techniques and narrative structure designs, emphasizing transparency and user-friendliness to facilitate users' self-assessment on personal beliefs. We gathered $\sim$25k articles from the span of 2020-2022 from six mainstream media outlets as a testbed. To evaluate our system, we present usage scenarios alongside a user study with a qualitative analysis of user exploration strategies for personal belief assessment. We report our observations from this study and discuss future work and challenges of developing tools for the public to assess media outlet coverage and belief updating on provocative topics.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.19276 [pdf, other]

Modular Blind Video Quality Assessment

Authors: Wen Wen, Mu Li, Yabin Zhang, Yiting Liao, Junlin Li, Li Zhang, Kede Ma

Abstract: Blind video quality assessment (BVQA) plays a pivotal role in evaluating and improving the viewing experience of end-users across a wide range of video-based platforms and services. Contemporary deep learning-based models primarily analyze video content in its aggressively subsampled format, while being blind to the impact of the actual spatial resolution and frame rate on video quality. In this paper, we propose a modular BVQA model and a method of training it to improve its modularity. Our model comprises a base quality predictor, a spatial rectifier, and a temporal rectifier, responding to the visual content and distortion, spatial resolution, and frame rate changes on video quality, respectively. During training, spatial and temporal rectifiers are dropped out with some probabilities to render the base quality predictor a standalone BVQA model, which should work better with the rectifiers. Extensive experiments on both professionally-generated content and user-generated content video databases show that our quality model achieves superior or comparable performance to current methods. Additionally, the modularity of our model offers an opportunity to analyze existing video quality databases in terms of their spatial and temporal complexity.

Submitted 31 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024; Camera-ready version

arXiv:2402.17766 [pdf, other]

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Authors: Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma

Abstract: This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages. ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++ that benefits from multi-view image distillation for enhanced geometry understanding. By utilizing ReCon++ as the 3D point cloud input encoder for LLMs, ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated evaluation benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks, such as embodied visual grounding.

Submitted 6 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Project page: https://qizekun.github.io/shapellm/

arXiv:2402.15678 [pdf, other]

Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding

Authors: Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Yongjun Bao, Yi Liu, Zhongzhi Luan, Depei Qian

Abstract: Large language models (LLMs) have recently attracted surging interest due to their outstanding capabilities across various domains. However, enabling efficient LLM inference is challenging due to its autoregressive decoding that generates tokens only one at a time. Although research works apply pruning or quantization to speed up LLM inference, they typically require fine-tuning the LLM, incurring significant time and economic costs. Meanwhile, speculative decoding has been proposed to use small speculative models (SSMs) to accelerate the inference of LLM. However, the low acceptance rate of SSM and the high verification cost of LLM prohibit further performance improvement of inference. In this paper, we propose Minions, an LLM inference system that accelerates LLM inference with a collective and adaptive speculative generation. Specifically, Minions proposes a majority-voted mechanism to leverage multiple SSMs to jointly speculate the outputs of LLM, which improves the inference performance without introducing prohibitive computation costs for LLM. To better trade off the number of tokens speculated from SSM and the verification cost of LLM, Minions proposes an adaptive mechanism to dynamically determine the optimal speculation length of SSM, which can achieve better inference performance across different models, datasets, and hyper-parameters. In addition, Minions decouples the SSM decoding and LLM verification efficiently and adopts a pipelined execution mechanism to further improve the inference performance of LLM. By comparing with the state-of-the-art LLM inference systems, we demonstrate that Minions can achieve higher inference throughput and lower inference time.

Submitted 23 February, 2024; originally announced February 2024.
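
The majority-voted speculation described in the abstract above can be illustrated with a small sketch. This is not the Minions implementation: the function name `majority_vote_draft`, the strict-majority rule, and the representation of drafts as lists of integer token ids are all assumptions made for illustration. It merges the drafts of several small speculative models position by position and stops as soon as no token is backed by more than half of the models.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote_draft(drafts: Sequence[Sequence[int]]) -> List[int]:
    """Merge token drafts from several small models by majority vote.

    Hypothetical helper, not taken from the paper. At each position, only
    drafts that still agree with the merged prefix may vote; the merged
    draft ends when the leading token lacks a strict majority of all models.
    """
    total = len(drafts)
    active = list(drafts)          # drafts consistent with the merged prefix
    merged: List[int] = []
    pos = 0
    while True:
        votes = Counter(d[pos] for d in active if len(d) > pos)
        if not votes:
            break                  # every remaining draft is exhausted
        token, count = votes.most_common(1)[0]
        if count * 2 <= total:
            break                  # no strict majority: stop speculating
        merged.append(token)
        # Keep only the drafts that proposed the winning token here.
        active = [d for d in active if len(d) > pos and d[pos] == token]
        pos += 1
    return merged


if __name__ == "__main__":
    # Three hypothetical draft models; two agree on the first two tokens.
    print(majority_vote_draft([[5, 7, 9], [5, 7, 2], [5, 3, 9]]))
```

The merged draft would then be handed to the large model for a single verification pass; how Minions actually combines, verifies, and adaptively sizes the drafts is not specified in the abstract.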

arXiv:2402.14424 [pdf, other]

doi 10.31234/osf.io/7ck9m

Automating Psychological Hypothesis Generation with AI: Large Language Models Meet Causal Graph

Authors: Song Tong, Kai Mao, Zhen Huang, Yukun Zhao, Kaiping Peng

Abstract: Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using an LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on 'well-being', then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of an LLM and causal graphs mirrored the expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p = 0.007 and t(59) = 4.32, p < 0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining LLM with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new enriched paradigm for data-driven hypothesis generation in psychological research.

Submitted 17 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14354 [pdf, other]

GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints

Authors: Anqi Cheng, Zhiyuan Yang, Haiyue Zhu, Kezhi Mao

Abstract: Self-supervised depth estimation has evolved into an image reconstruction task that minimizes a photometric loss. While recent methods have made strides in indoor depth estimation, they often produce inconsistent depth estimation in textureless areas and unsatisfactory depth discrepancies at object boundaries. To address these issues, in this work, we propose GAM-Depth, developed upon two novel components: gradient-aware mask and semantic constraints. The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions by allocating weights based on gradient magnitudes. The incorporation of semantic constraints for indoor self-supervised depth estimation improves depth discrepancies at object boundaries, leveraging a co-optimization network and proxy semantic labels derived from a pretrained segmentation model. Experimental studies on three indoor datasets, including NYUv2, ScanNet, and InteriorNet, show that GAM-Depth outperforms existing methods and achieves state-of-the-art performance, signifying a meaningful step forward in indoor depth estimation. Our code will be available at https://github.com/AnqiCheng1234/GAM-Depth.

Submitted 22 February, 2024; originally announced February 2024.

Comments: To be published in 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2402.12774 [pdf, other]

Interpreting Conversational Dense Retrieval by Rewriting-Enhanced Inversion of Session Embedding

Authors: Yiruo Cheng, Kelong Mao, Zhicheng Dou

Abstract: Conversational dense retrieval has been shown to be effective in conversational search. However, a major limitation of conversational dense retrieval is its lack of interpretability, hindering intuitive understanding of model behaviors for targeted improvements. This paper presents CONVINV, a simple yet effective approach to shed light on interpretable conversational dense retrieval models. CONVINV transforms opaque conversational session embeddings into explicitly interpretable text while faithfully maintaining their original retrieval performance as much as possible. Such transformation is achieved by training a recently proposed Vec2Text model based on the ad-hoc query encoder, leveraging the fact that the session and query embeddings share the same space in existing conversational dense retrieval. To further enhance interpretability, we propose to incorporate external interpretable query rewrites into the transformation process. Extensive evaluations on three conversational search benchmarks demonstrate that CONVINV can yield more interpretable text and preserve the original retrieval performance more faithfully than baselines. Our work connects opaque session embeddings with transparent query rewriting, paving the way toward trustworthy conversational search.

Submitted 1 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted by ACL 2024. Repo: https://github.com/Ariya12138/ConvInv

arXiv:2402.11480 [pdf, other]

Pattern-wise Transparent Sequential Recommendation

Authors: Kun Ma, Cong Xu, Zeyuan Chen, Wei Zhang

Abstract: A transparent decision-making process is essential for developing reliable and trustworthy recommender systems. For sequential recommendation, it means that the model can identify critical items as the justifications for its recommendation results. However, achieving both model transparency and recommendation performance simultaneously is challenging, especially for models that take the entire sequence of items as input without screening. In this paper, we propose an interpretable framework (named PTSR) that enables a pattern-wise transparent decision-making process. It breaks the sequence of items into multi-level patterns that serve as atomic units for the entire recommendation process. The contribution of each pattern to the outcome is quantified in the probability space. With a carefully designed pattern weighting correction, the pattern contribution can be learned in the absence of ground-truth critical patterns. The final recommended items are those items that most critical patterns strongly endorse. Extensive experiments on four public datasets demonstrate remarkable recommendation performance, while case studies validate the model transparency. Our code is available at https://anonymous.4open.science/r/PTSR-2237.

Submitted 9 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.11419 [pdf, other]

A Self-Healing Magnetic-Array-Type Current Sensor with Data-Driven Identification of Abnormal Magnetic Measurement Units

Authors: Xiaohu Liu, Wei Zhao, Kang Ma, Jian Liu, Lisha Peng, Songling Huang, Shisong Li

Abstract: Magnetic-array-type current sensors have garnered increasing popularity owing to their notable advantages, including broadband functionality, a large dynamic range, cost-effectiveness, and compact dimensions. However, the susceptibility of the measurement error of one or more magnetic measurement units (MMUs) within the current sensor to drift significantly from the nominal value due to environmental factors poses a potential threat to the measurement accuracy of the current sensor. In light of the need to ensure sustained measurement accuracy over the long term, this paper proposes an innovative self-healing approach rooted in cyber-physics correlation. This approach aims to identify MMUs exhibiting abnormal measurement errors, allowing for the exclusive utilization of the remaining unaffected MMUs in the current measurement process. To achieve this, principal component analysis (PCA) is employed to discern the primary component, arising from fluctuations of the measured current, from the residual component, attributed to the drift in measurement error. This analysis is conducted by scrutinizing the measured data obtained from the MMUs. Subsequently, the squared prediction error (SPE) statistic (also called $Q$ statistic) is deployed to individually identify any MMU displaying abnormal behavior. The experimental results demonstrate the successful online identification of abnormal MMUs without the need for a standard magnetic field sensor. By eliminating the contributions from the identified abnormal MMUs, the accuracy of the current measurement is effectively preserved.

Submitted 17 February, 2024; originally announced February 2024.

Comments: 11 pages, 10 figures
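
For readers unfamiliar with the SPE ($Q$) statistic mentioned in the abstract above, its textbook definition is sketched below. The notation ($x$, $P$, $m$, $k$, $e$) is an assumption introduced here and may differ from the paper's own formulation, which the abstract does not give.

```latex
% Assumed notation (not from the abstract):
%   x \in \mathbb{R}^{m}          : mean-centred readings of the m MMUs at one instant
%   P \in \mathbb{R}^{m \times k} : loadings of the k retained principal components
\begin{align}
  \hat{x} &= P P^{\top} x, \\
  e       &= x - \hat{x} = \left(I - P P^{\top}\right) x, \\
  Q       &= \lVert e \rVert_2^{2} = \sum_{i=1}^{m} e_i^{2}.
\end{align}
```

Under this formulation a large term $e_i^{2}$ points at MMU $i$ as a likely source of drift. The detection threshold and the exact per-unit test used by the authors are not stated in the abstract.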

arXiv:2402.11250 [pdf, other]

Hierarchical Prior-based Super Resolution for Point Cloud Geometry Compression

Authors: Dingquan Li, Kede Ma, Jing Wang, Ge Li

Abstract: The Geometry-based Point Cloud Compression (G-PCC) has been developed by the Moving Picture Experts Group to compress point clouds. In its lossy mode, the reconstructed point cloud by G-PCC often suffers from noticeable distortions due to the naïve geometry quantization (i.e., grid downsampling). This paper proposes a hierarchical prior-based super resolution method for point cloud geometry compression. The content-dependent hierarchical prior is constructed at the encoder side, which enables coarse-to-fine super resolution of the point cloud geometry at the decoder side. A more accurate prior generally yields improved reconstruction performance, at the cost of increased bits required to encode this side information. With a proper balance between prior accuracy and bit consumption, the proposed method demonstrates substantial Bjontegaard-delta bitrate savings on the MPEG Cat1A dataset, surpassing the octree-based and trisoup-based G-PCC v14. We provide our implementations for reproducible research at https://github.com/lidq92/mpeg-pcc-tmc13.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.09760 [pdf, other]

Grounding Language Model with Chunking-Free In-Context Retrieval

Authors: Hongjin Qian, Zheng Liu, Kelong Mao, Yujia Zhou, Zhicheng Dou

Abstract: This paper presents a novel Chunking-Free In-Context (CFIC) retrieval approach, specifically tailored for Retrieval-Augmented Generation (RAG) systems. Traditional RAG systems often struggle with grounding responses using precise evidence text due to the challenges of processing lengthy documents and filtering out irrelevant content. Commonly employed solutions, such as document chunking and adapting language models to handle longer contexts, have their limitations. These methods either disrupt the semantic coherence of the text or fail to effectively address the issues of noise and inaccuracy in evidence retrieval. CFIC addresses these challenges by circumventing the conventional chunking process. It utilizes the encoded hidden states of documents for in-context retrieval, employing auto-regressive decoding to accurately identify the specific evidence text required for user queries, eliminating the need for chunking. CFIC is further enhanced by incorporating two decoding strategies, namely Constrained Sentence Prefix Decoding and Skip Decoding. These strategies not only improve the efficiency of the retrieval process but also ensure that the fidelity of the generated grounding text evidence is maintained. Our evaluations of CFIC on a range of open QA datasets demonstrate its superiority in retrieving relevant and accurate evidence, offering a significant improvement over traditional methods. By doing away with the need for document chunking, CFIC presents a more streamlined, effective, and efficient retrieval solution, making it a valuable advancement in the field of RAG systems.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.07431 [pdf, other]

SALAD: Smart AI Language Assistant Daily

Authors: Ragib Amin Nihal, Tran Dong Huu Quoc, Lin Zirui, Xu Yimimg, Liu Haoran, An Zhaoyi, Kyou Ma

Abstract: SALAD is an AI-driven language-learning application designed to help foreigners learn Japanese. It offers translations in Kanji-Kana-Romaji, speech recognition, translated audio, vocabulary tracking, grammar explanations, and songs generated from newly learned words. The app targets beginners and intermediate learners, aiming to make language acquisition more accessible and enjoyable. SALAD uses daily translations to enhance fluency and comfort in communication with native speakers. The primary objectives include effective Japanese language learning, user engagement, and progress tracking. A survey by us found that 39% of foreigners in Japan face discomfort in conversations with Japanese speakers. Over 60% of foreigners expressed confidence in SALAD's ability to enhance their Japanese language skills. The app uses large language models, speech recognition, and diffusion models to bridge the language gap and foster a more inclusive community in Japan.

Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.07236 [pdf]

Extending Inferences from Randomized Clinical Trials to Target Populations: A Scoping Review of Transportability Methods

Authors: Guanbo Wang, Ting-Wei Ernie Liao, David Furfaro, Leo Anthony Celi, Kevin Sheng-Kai Ma

Abstract: Objective: Randomized controlled trial (RCT) results often inform clinical decision-making, but the highly curated populations of trials and the care provided during the trial are often not reflective of real-world practice. The objective of this scoping review is to identify the ability of methods to transport findings from RCTs to target populations. Study design: A scoping review was conducted on the literature focusing on the transportability of the results from RCTs to observational cohorts. Each study was assessed based on the methodology used for transportability and the extent to which the treatment effect from the RCT was estimated in the target population in observational data. Results: A total of 15 published papers were included. The research topics include cardiovascular diseases, infectious diseases, psychiatry, oncology, orthopedics, anesthesiology, and hematology. These studies show that the findings from RCTs could be translated to real-world settings, with varying degrees of effect size and precision. In some cases, the estimated treatment effects for the target population were statistically significantly different from those in RCTs. Conclusion: Despite variations in the magnitude of effects between RCTs and real-world studies, transportability methods play an important role in effectively bridging the RCTs and real-world care delivery, offering valuable insights for evidence-based medicine.

Submitted 23 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.07092 [pdf, other]

Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation

Authors: Haonan Chen, Zhicheng Dou, Kelong Mao, Jiongnan Liu, Ziliang Zhao

Abstract: Conversational search utilizes multi-turn natural language contexts to retrieve relevant passages. Existing conversational dense retrieval models mostly view a conversation as a fixed sequence of questions and responses, overlooking the severe data sparsity problem -- that is, users can perform a conversation in various ways, and these alternate conversations are unrecorded. Consequently, they often struggle to generalize to diverse conversations in real-world scenarios. In this work, we propose a framework for generalizing Conversational dense retrieval via LLM-cognition data Augmentation (ConvAug). ConvAug first generates multi-level augmented conversations to capture the diverse nature of conversational contexts. Inspired by human cognition, we devise a cognition-aware process to mitigate the generation of false positives, false negatives, and hallucinations. Moreover, we develop a difficulty-adaptive sample filter that selects challenging samples for complex conversations, thereby giving the model a larger learning space. A contrastive learning objective is then employed to train a better conversational context encoder. Extensive experiments conducted on four public datasets, under both normal and zero-shot settings, demonstrate the effectiveness, generalizability, and applicability of ConvAug. The code is released at https://github.com/haon-chen/ConvAug.

Submitted 3 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

Comments: ACL 2024

arXiv:2402.05817 [pdf]

Using YOLO v7 to Detect Kidney in Magnetic Resonance Imaging

Authors: Pouria Yazdian Anari, Fiona Obiezu, Nathan Lay, Fatemeh Dehghani Firouzabadi, Aditi Chaurasia, Mahshid Golagha, Shiva Singh, Fatemeh Homayounieh, Aryan Zahergivar, Stephanie Harmon, Evrim Turkbey, Rabindra Gautam, Kevin Ma, Maria Merino, Elizabeth C. Jones, Mark W. Ball, W. Marston Linehan, Baris Turkbey, Ashkan A. Malayeri

Abstract: Introduction: This study explores the use of the latest You Only Look Once (YOLO V7) object detection method to enhance kidney detection in medical imaging by training and testing a modified YOLO V7 on medical image formats. Methods: The study includes 878 patients with various subtypes of renal cell carcinoma (RCC) and 206 patients with normal kidneys. A total of 5657 MRI scans for 1084 patients were retrieved. A total of 326 patients with 1034 tumors were recruited from a retrospectively maintained database, and bounding boxes were drawn around their tumors. A primary model was trained on 80% of annotated cases, with 20% saved for testing (primary test set). The best primary model was then used to identify tumors in the remaining 861 patients, and bounding box coordinates were generated on their scans using the model. Ten benchmark training sets were created with generated coordinates on not-segmented patients. The final model was used to predict the kidney in the primary test set. We reported the positive predictive value (PPV), sensitivity, and mean average precision (mAP). Results: The primary training set showed an average PPV of 0.94 +/- 0.01, sensitivity of 0.87 +/- 0.04, and mAP of 0.91 +/- 0.02. The best primary model yielded a PPV of 0.97, sensitivity of 0.92, and mAP of 0.95. The final model demonstrated an average PPV of 0.95 +/- 0.03, sensitivity of 0.98 +/- 0.004, and mAP of 0.95 +/- 0.01. Conclusion: Using a semi-supervised approach with a medical image library, we developed a high-performing model for kidney detection. Further external validation is required to assess the model's generalizability.

Submitted 12 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05009 [pdf, other]

A Review on Trajectory Datasets on Advanced Driver Assistance System

Authors: Hang Zhou, Ke Ma, Xiaopeng Li

Abstract: This paper presents a comprehensive review of trajectory data from vehicles equipped with Advanced Driver Assistance Systems, with the aim of precisely modeling Autonomous Vehicle (AV) behavior. This study emphasizes the importance of trajectory data in the development of AV models, especially in car-following scenarios. We introduce and evaluate several datasets: the OpenACC Dataset, the Connected & Autonomous Transportation Systems Laboratory Open Dataset, the Vanderbilt ACC Dataset, the Central Ohio Dataset, and the Waymo Open Dataset. Each dataset offers unique insights into AV behaviors, yet they share common challenges in terms of data availability, processing, and standardization. After a series of data cleaning, outlier removal, and statistical analysis steps, this paper transforms datasets of varied formats into a uniform standard, thereby improving their applicability for modeling AV car-following behavior. Key contributions of this study include: 1. the transformation of all datasets into a unified standard format, enhancing their utility for broad research applications; 2. a comparative analysis of these datasets, highlighting their distinct characteristics and implications for car-following model development; 3. the provision of guidelines for future data collection projects, along with the open-source release of all processed data and code for use by the research community.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures

arXiv:2402.00629 [pdf, other]

Cocco: Hardware-Mapping Co-Exploration towards Memory Capacity-Communication Optimization

Authors: Zhanhong Tan, Zijian Zhu, Kaisheng Ma

Abstract: Memory is a critical design consideration in current data-intensive DNN accelerators, as it profoundly determines energy consumption, bandwidth requirements, and area costs. As DNN structures become more complex, a larger on-chip memory capacity is required to reduce data movement overhead, but at the expense of silicon costs. Some previous works have proposed memory-oriented optimizations, such as different data reuse and layer fusion schemes. However, these methods are not general and potent enough to cope with various graph structures. In this paper, we explore the intrinsic connection between network structures and memory features to optimize both hardware and mapping. First, we introduce a graph-level execution scheme with a corresponding dataflow and memory management method. This scheme enables the execution of arbitrary graph patterns with high data reuse and low hardware overhead. Subsequently, we propose Cocco, a hardware-mapping co-exploration framework leveraging graph-level features of networks. It aims to minimize communication overhead, such as energy consumption and bandwidth requirements, with a smaller memory capacity. We formulate the graph-partition scheduling and memory configuration search as an optimization problem and employ a genetic-based method to achieve efficient co-exploration for large and irregular networks. Experiments demonstrate that Cocco obtains lower external memory access, lower bandwidth requirements, and more stable optimization for graph partition compared to the greedy algorithm and dynamic programming introduced in prior works. Cocco also reduces the costs by 1.89% to 50.33% using co-exploration compared to other typical methods.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted by 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24)

arXiv:2401.16659 [pdf, other]

History-Aware Conversational Dense Retrieval

Authors: Fengran Mo, Chen Qu, Kelong Mao, Tianyu Zhu, Zhan Su, Kaiyu Huang, Jian-Yun Nie

Abstract: Conversational search facilitates complex information retrieval by enabling multi-turn interactions between users and the system. Supporting such interactions requires a comprehensive understanding of the conversational inputs to formulate a good search query based on historical information. In particular, the search query should include the relevant information from the previous conversation turns. However, current approaches for conversational dense retrieval primarily rely on fine-tuning a pre-trained ad-hoc retriever using the whole conversational search session, which can be lengthy and noisy. Moreover, existing approaches are limited by the amount of manual supervision signals in the existing datasets. To address the aforementioned issues, we propose a History-Aware Conversational Dense Retrieval (HAConvDR) system, which incorporates two ideas: context-denoised query reformulation and automatic mining of supervision signals based on the actual impact of historical turns. Experiments on two public conversational search datasets demonstrate the improved history modeling capability of HAConvDR, in particular for long conversations with topic shifts.

Submitted 28 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted to Findings of ACL 2024

arXiv:2401.13919 [pdf, other]

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Authors: Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu

Abstract: The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents. Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots, greatly limiting their applicability in real-world scenarios. To bridge this gap, we introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites. Moreover, we establish a new benchmark by compiling real-world tasks from 15 popular websites and introduce an automatic evaluation protocol leveraging multimodal understanding abilities of GPT-4V to evaluate open-ended web agents. We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups, underscoring the exceptional capability of WebVoyager. The proposed automatic evaluation metric achieves 85.3% agreement with human judgment, indicating its effectiveness in providing reliable and accurate assessments of web agents.

Submitted 6 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted to ACL 2024 (main). Code and data is released at https://github.com/MinorJerry/WebVoyager

arXiv:2401.13485 [pdf, other]

Ti4Ir2O a time-reversal-invariant fully gapped unconventional superconductor

Authors: Debarchan Das, KeYuan Ma, Jan Jaroszynski, Vahid Sazgari, Tomasz Klimczuk, Fabian O. von Rohr, Zurab Guguchia

Abstract: Here we report muon spin rotation (muSR) experiments on the temperature and field dependence of the effective magnetic penetration depth (lambda) in the eta-carbide-type suboxide Ti4Ir2O, a superconductor with a considerably high upper critical field. The temperature dependence of the penetration depth, obtained from transverse-field (TF)-muSR measurements, is in perfect agreement with an isotropic fully gapped superconducting state. Furthermore, our ZF muSR results confirm that the time-reversal symmetry is preserved in the superconducting state. We find, however, a notably low ratio of 1.22 between the superconducting critical temperature and the superfluid density. This value is close to that of most unconventional superconductors, showing that a very small superfluid density is present in the superconducting state of Ti4Ir2O. The presented results will pave the way for further theoretical and experimental investigations to obtain a microscopic understanding of the origin of such a high upper critical field in an isotropic single gap superconducting system.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 7 pages, 3 figures. The methodology employed in this paper bears resemblance to that described in arXiv:2209.03187

arXiv:2401.13478 [pdf, other]

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

Authors: Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

Abstract: Multi-modal information retrieval (MMIR) is a rapidly evolving field, where significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap, where chart and table images described in scholarly language usually do not play a significant role. To bridge this gap, we develop a specialised scientific MMIR (SciMMIR) benchmark by leveraging open-access paper collections to extract data relevant to the scientific domain. This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents. We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines. We conducted zero-shot and fine-tuning evaluations on prominent multi-modal image-captioning and visual language models, such as CLIP and BLIP. Our analysis offers critical insights for MMIR in the scientific domain, including the impact of pre-training and fine-tuning settings and the influence of the visual and textual encoders. All our data and checkpoints are publicly available at https://github.com/Wusiwei0410/SciMMIR.

Submitted 11 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: camera-ready version for ACL 2024 Findings

arXiv:2401.11748 [pdf, other]

doi 10.1109/ICASSP48485.2024.10445924

GI-PIP: Do We Require Impractical Auxiliary Dataset for Gradient Inversion Attacks?

Authors: Yu Sun, Gaojian Xiong, Xianxun Yao, Kailang Ma, Jian Cui

Abstract: Deep gradient inversion attacks expose a serious threat to Federated Learning (FL) by accurately recovering private data from shared gradients. However, the state-of-the-art heavily relies on impractical assumptions to access excessive auxiliary data, which violates the basic data partitioning principle of FL. In this paper, a novel method, Gradient Inversion Attack using Practical Image Prior (GI-PIP), is proposed under a revised threat model. GI-PIP exploits anomaly detection models to capture the underlying distribution from fewer data, while GAN-based methods consume significantly more data to synthesize images. The extracted distribution is then leveraged to regulate the attack process as Anomaly Score loss. Experimental results show that GI-PIP achieves a 16.12 dB PSNR recovery using only 3.8% data of ImageNet, while GAN-based methods necessitate over 70%. Moreover, GI-PIP exhibits superior capability on distribution generalization compared to GAN-based methods. Our approach significantly alleviates the auxiliary data requirement on both amount and distribution in gradient inversion attacks, hence posing more substantial threat to real-world FL.

Submitted 1 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.06462 [pdf, other]

AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding

Authors: Xiwei Xuan, Jorge Piazentin Ono, Liang Gou, Kwan-Liu Ma, Liu Ren

Abstract: Data slice finding is an emerging technique for validating machine learning (ML) models by identifying and analyzing subgroups in a dataset that exhibit poor performance, often characterized by distinct feature sets or descriptive metadata. However, in the context of validating vision models involving unstructured image data, this approach faces significant challenges, including the laborious and costly requirement for additional metadata and the complex task of interpreting the root causes of underperformance. To address these challenges, we introduce AttributionScanner, an innovative human-in-the-loop Visual Analytics (VA) system, designed for metadata-free data slice finding. Our system identifies interpretable data slices that involve common model behaviors and visualizes these patterns through an Attribution Mosaic design. Our interactive interface provides straightforward guidance for users to detect, interpret, and annotate predominant model issues, such as spurious correlations (model biases) and mislabeled data, with minimal effort. Additionally, it employs a cutting-edge model regularization technique to mitigate the detected issues and enhance the model's performance. The efficacy of AttributionScanner is demonstrated through use cases involving two benchmark datasets, with qualitative and quantitative evaluations showcasing its substantial effectiveness in vision model validation, ultimately leading to more reliable and accurate models.

Submitted 4 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 12 pages, 12 figures, 3 tables. This manuscript is under review by the IEEE Transactions on Visualization and Computer Graphics (TVCG)

arXiv:2401.05960 [pdf, other]

Machine Learning Insides OptVerse AI Solver: Design Principles and Applications

Authors: Xijun Li, Fangzhou Zhu, Hui-Ling Zhen, Weilin Luo, Meng Lu, Yimin Huang, Zhenan Fan, Zirui Zhou, Yufei Kuang, Zhihai Wang, Zijie Geng, Yang Li, Haoyang Liu, Zhiwu An, Muming Yang, Jianshu Li, Jie Wang, Junchi Yan, Defeng Sun, Tao Zhong, Yong Zhang, Jia Zeng, Mingxuan Yuan, Jianye Hao, Jun Yao, et al. (1 additional author not shown)

Abstract: In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances, and to surpass the capabilities of traditional optimization techniques. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems. Furthermore, we introduce a training framework leveraging augmentation policies to maintain solvers' utility in dynamic environments. Besides the data generation and augmentation, our proposed approaches also include novel ML-driven policies for personalized solver strategies, with an emphasis on applications like graph convolutional networks for initial basis selection and reinforcement learning for advanced presolving and cut selection. Additionally, we detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance. Compared with traditional solvers such as Cplex and SCIP, our ML-augmented OptVerse AI Solver demonstrates superior speed and precision across both established benchmarks and real-world scenarios, reinforcing the practical imperative and effectiveness of machine learning techniques in mathematical programming solvers.

Submitted 17 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.05011 [pdf, other]

Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

Authors: Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang

Abstract: Semi-supervised 3D object detection is a promising yet under-explored direction to reduce data annotation costs, especially for cluttered indoor scenes. A few prior works, such as SESS and 3DIoUMatch, attempt to solve this task by utilizing a teacher model to generate pseudo-labels for unlabeled samples. However, the availability of unlabeled samples in the 3D domain is relatively limited compared to its 2D counterpart due to the greater effort required to collect 3D data. Moreover, the loose consistency regularization in SESS and restricted pseudo-label selection strategy in 3DIoUMatch lead to either low-quality supervision or a limited amount of pseudo labels. To address these issues, we present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection. Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective. Specifically, from the data-perspective, we propose a class-probabilistic data augmentation method that augments the input data with additional instances based on the varying distribution of class probabilities. Our DPKE achieves feature-perspective knowledge enrichment by designing a geometry-aware feature matching method that regularizes feature-level similarity between object proposals from the student and teacher models. Extensive experiments on the two benchmark datasets demonstrate that our DPKE achieves superior performance over existing state-of-the-art approaches under various label ratio conditions. The source code will be made available to the public.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Code is available at https://github.com/tingxueronghua/DPKE

arXiv:2401.04662 [pdf, other]

The Devil Behind the Mirror: Tracking the Campaigns of Cryptocurrency Abuses on the Dark Web

Authors: Pengcheng Xia, Zhou Yu, Kailong Wang, Kai Ma, Shuo Chen, Xiapu Luo, Yajin Zhou, Lei Wu, Guangdong Bai

Abstract: The dark web has emerged as the state-of-the-art solution for enhanced anonymity. Just like a double-edged sword, it also inadvertently becomes the safety net and breeding ground for illicit activities. Among them, cryptocurrencies have been prevalently abused to receive illicit income while evading regulations. Despite the continuing efforts to combat illicit activities, there is still a lack of an in-depth understanding regarding the characteristics and dynamics of cryptocurrency abuses on the dark web. In this work, we conduct a multi-dimensional and systematic study to track cryptocurrency-related illicit activities and campaigns on the dark web. We first harvest a dataset of 4,923 cryptocurrency-related onion sites with over 130K pages. Then, we detect and extract the illicit blockchain transactions to characterize the cryptocurrency abuses, targeting features from single/clustered addresses and illicit campaigns. Throughout our study, we have identified 2,564 illicit sites with 1,189 illicit blockchain addresses, which account for 90.8 BTC in revenue. Based on their inner connections, we further identify 66 campaigns behind them. Our exploration suggests that illicit activities on the dark web have strong correlations, which can guide us to identify new illicit blockchain addresses and onions, and raise alarms at the early stage of their deployment.

Submitted 7 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.03700 [pdf, other]

A Visual Analytics Design for Connecting Healthcare Team Communication to Patient Outcomes

Authors: Hsiao-Ying Lu, Yiran Li, Kwan-Liu Ma

Abstract: Communication among healthcare professionals (HCPs) is crucial for the quality of patient treatment. Surrounding each patient's treatment, communication among HCPs can be examined as temporal networks, constructed from Electronic Health Record (EHR) access logs. This paper introduces a visual analytics system designed to study the effectiveness and efficiency of temporal communication networks mediated by the EHR system. We present a method that associates network measures with patient survival outcomes and devises effectiveness metrics based on these associations. To analyze communication efficiency, we extract the latencies and frequencies of EHR accesses. Our visual analytics system is designed to assist in inspecting and understanding the composed communication effectiveness metrics and to enable the exploration of communication efficiency by encoding latencies and frequencies in an information flow diagram. We demonstrate and evaluate our system through multiple case studies and an expert review.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.03206 [pdf, ps, other]

A Robbins-Monro Sequence That Can Exploit Prior Information For Faster Convergence

Authors: Siwei Liu, Ke Ma, Stephan M. Goetz

Abstract: We propose a new method to improve the convergence speed of the Robbins-Monro algorithm by introducing prior information about the target point into the Robbins-Monro iteration. We achieve the incorporation of prior information without the need of a -- potentially wrong -- regression model, which would also entail additional constraints. We show that this prior-information Robbins-Monro sequence is convergent for a wide range of prior distributions, even wrong ones, such as Gaussian, weighted sum of Gaussians, e.g., in a kernel density estimate, as well as bounded arbitrary distribution functions greater than zero. We furthermore analyse the sequence numerically to understand its performance and the influence of parameters. The results demonstrate that the prior-information Robbins-Monro sequence converges faster than the standard one, especially during the first steps, which are particularly important for applications where the number of function measurements is limited, and when the noise of observing the underlying function is large. We finally propose a rule to select the parameters of the sequence.

Submitted 6 January, 2024; originally announced January 2024.

Comments: 26 pages, 5 figures

MSC Class: 62L20; 62L05; 62L10; 60G99; 60-08; 65B99; 65C99; 90C15

arXiv:2401.01518 [pdf, other]

Highly Scalable Quantum Router with Frequency-Independent Scattering Spectra

Authors: Yue Cai, Kang-Jie Ma, Jie Liu, Gang-Feng Guo, Lei Tan, Wu-Ming Liu

Abstract: Optical quantum routers, which play a crucial role in quantum networks, have been extensively studied in both theory and experiment, resulting in significant advancements in their performance. However, these routers impose stringent requirements for achieving optimal routing performance, where the incident photon frequency must be in strict resonance with one or several specific frequencies. To address this challenge, we have designed an efficient quantum router capable of stable output with 100% transfer rate over the entire energy band of a coupled-resonator waveguide (CRW) by coupling a giant atom to two or more semi-infinite CRWs. We also explain and prove the fundamental physical mechanism behind this distinctive phenomenon as the result of destructive interference between two waves composing the final reflected wave. We hope that a quantum router whose output is unaffected by the energy of the incoming information carriers presents a more reliable solution for the implementation of quantum networks.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.00896 [pdf, other]

TrailBlazer: Trajectory Control for Diffusion-Based Video Generation

Authors: Wan-Duo Kurt Ma, J. P. Lewis, W. Bastiaan Kleijn

Abstract: Within recent approaches to text-to-video (T2V) generation, achieving controllability in the synthesized video is often a challenge. Typically, this issue is addressed by providing low-level per-frame guidance in the form of edge maps, depth maps, or an existing video to be altered. However, the process of obtaining such guidance can be labor-intensive. This paper focuses on enhancing controllability in video synthesis by employing straightforward bounding boxes to guide the subject in various ways, all without the need for neural network training, finetuning, optimization at inference time, or the use of pre-existing videos. Our algorithm, TrailBlazer, is constructed upon a pre-trained T2V model and is easy to implement. The subject is directed by a bounding box through the proposed spatial and temporal attention map editing. Moreover, we introduce the concept of keyframing, allowing the subject trajectory and overall appearance to be guided by both a moving bounding box and corresponding prompts, without the need to provide a detailed mask. The method is efficient, with negligible additional computation relative to the underlying pre-trained model. Despite the simplicity of the bounding box guidance, the resulting motion is surprisingly natural, with emergent effects including perspective and movement toward the virtual camera as the box size increases.

Submitted 8 April, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

Comments: 14 pages, 18 figures, Project Page: https://hohonu-vicml.github.io/Trailblazer.Page/

arXiv:2312.16679 [pdf]

Square Moiré Superlattices in Twisted Two-Dimensional Halide Perovskites

Authors: Shuchen Zhang, Linrui **, Yuan Lu, Linghai Zhang, Jiaqi Yang, Qiuchen Zhao, Dewei Sun, Joshua J. P. Thompson, Biao Yuan, Ke Ma, Akriti, Jee Yung Park, Yoon Ho Lee, Zitang Wei, Blake P. Finkenauer, Daria D. Blach, Sarath Kumar, Hailin Peng, Arun Mannodi-Kanakkithodi, Yi Yu, Ermin Malic, Gang Lu, Letian Dou, Libai Huang

Abstract: Moiré superlattices have emerged as a new platform for studying strongly correlated quantum phenomena, but these systems have been largely limited to van der Waals layer two-dimensional (2D) materials. Here we introduce moiré superlattices leveraging ultra-thin, ligand-free halide perovskites, facilitated by ionic interactions. Square moiré superlattices with varying periodic lengths are clearly visualized through high-resolution transmission electron microscopy. Twist-angle-dependent transient photoluminescence microscopy and electrical characterizations indicate the emergence of localized bright excitons and trapped charge carriers near a twist angle of ~10°. The localized excitons are accompanied by enhanced exciton emission, attributed to an increased oscillator strength by a theoretically forecasted flat band. This work illustrates the potential of extended ionic interaction in realizing moiré physics at room temperature, broadening the horizon for future investigations.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.16436 [pdf, other]

Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators

Authors: **gwei Cai, Zuotong Wu, Sen Peng, Yuchen Wei, Zhanhong Tan, Guiming Shi, Mingyu Gao, Kaisheng Ma

Abstract: Chiplet technology enables the integration of an increasing number of transistors on a single accelerator with higher yield in the post-Moore era, addressing the immense computational demands arising from rapid AI advancements. However, it also introduces more expensive packaging costs and costly Die-to-Die (D2D) interfaces, which require more area, consume higher power, and offer lower bandwidth than on-chip interconnects. Maximizing the benefits and minimizing the drawbacks of chiplet technology is crucial for developing large-scale DNN chiplet accelerators, which poses challenges to both architecture and mapping. Despite its importance in the post-Moore era, methods to address these challenges remain scarce.

Submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted by 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

arXiv:2312.15903 [pdf, other]

An Incremental Update Framework for Online Recommenders with Data-Driven Prior

Authors: Chen Yang, ** Chen, Qian Yu, Xiangdong Wu, Kui Ma, Zihao Zhao, Zhiwei Fang, Wenlong Chen, Chaosheng Fan, Jie He, Chang** Peng, Zhangang Lin, **g** Shao

Abstract: Online recommenders have attained growing interest and created great revenue for businesses. Given numerous users and items, incremental update becomes a mainstream paradigm for learning large-scale models in industrial scenarios, where only newly arrived data within a sliding window is fed into the model, meeting the strict requirements of quick response. However, this strategy would be prone to overfitting to newly arrived data. When there exists a significant drift of data distribution, the long-term information would be discarded, which harms the recommendation performance. Conventional methods address this issue through native model-based continual learning methods, without analyzing the data characteristics for online recommenders. To address the aforementioned issue, we propose an incremental update framework for online recommenders with Data-Driven Prior (DDP), which is composed of Feature Prior (FP) and Model Prior (MP). The FP performs the click estimation for each specific value to enhance the stability of the training process. The MP incorporates previous model output into the current update while strictly following the Bayes rules, resulting in a theoretically provable prior for the robust update. In this way, both the FP and MP are well integrated into the unified framework, which is model-agnostic and can accommodate various advanced interaction models. Extensive experiments on two publicly available datasets as well as an industrial dataset demonstrate the superior performance of the proposed framework.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.12795 [pdf, ps, other]

doi 10.1109/TSG.2023.3326928

Joint Trading and Scheduling among Coupled Carbon-Electricity-Heat-Gas Industrial Clusters

Authors: Dafeng Zhu, Bo Yang, Yu Wu, Haoran Deng, Zhaoyang Dong, Kai Ma, ** Guan

Abstract: This paper presents a carbon-energy coupling management framework for an industrial park, where the carbon flow model accompanying multi-energy flows is adopted to track and suppress carbon emissions on the user side. To deal with the quadratic constraint of gas flows, a bound tightening algorithm for constraint relaxation is adopted. The synergies among carbon capture, energy storage, and power-to-gas further consume renewable energy and reduce carbon emissions. Aiming at carbon emission disparities and supply-demand imbalances, this paper proposes a carbon trading ladder reward and punishment mechanism and an energy trading and scheduling method based on Lyapunov optimization and a matching game to maximize the long-term benefits of each industrial cluster without knowing the prior information of random variables. Case studies show that our proposed trading method can reduce overall costs and carbon emissions while relieving energy pressure, which is important for Environmental, Social and Governance (ESG).

Submitted 20 December, 2023; originally announced December 2023.

Journal ref: IEEE Transactions on Smart Grid, 2023

arXiv:2312.08000 [pdf, other]

SoK: On the Security of Non-Fungible Tokens

Authors: Kai Ma, **tao Huang, Ningyu He, Zhuo Wang, Haoyu Wang

Abstract: Non-fungible tokens (NFTs) drive the prosperity of the Web3 ecosystem. By November 2023, the total market value of NFT projects reached approximately 16 billion USD. Accompanying the success of NFTs are various security issues, i.e., attacks and scams are prevalent in the ecosystem. While NFTs have attracted significant attention from both industry and academia, there is a lack of understanding of the kinds of NFT security issues. The discovery, in-depth analysis, and systematic categorization of these security issues are of significant importance for the prosperous development of the NFT ecosystem. To fill the gap, we performed a systematic literature review related to NFT security, and we have identified 142 incidents from 213 security reports and 18 academic papers until October 1st, 2023. Through manual analysis of the compiled security incidents, we have classified them into 12 major categories. Then we explored potential solutions and mitigation strategies. Drawing from these analyses, we established the first NFT security reference frame. In addition, we extracted the characteristics of NFT security issues, i.e., the prevalence, severity, and intractability. We have indicated the gap between industry and academia for NFT security, and provide further research directions for the community. This paper, as the first SoK of NFT security, has systematically explored the security issues within the NFT ecosystem, shedding light on their root causes, real-world attacks, and potential ways to address them. Our findings will contribute to the future research of NFT security.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.07750 [pdf, other]

A Galactic Eclipse: The Small Magellanic Cloud is Forming Stars in Two, Superimposed Systems

Authors: Claire E. Murray, Sten Hasselquist, Joshua E. G. Peek, Christina Willecke Lindberg, Andres Almeida, Yumi Choi, Jessica E. M. Craig, Helga Denes, John M. Dickey, Enrico M. Di Teodoro, Christoph Federrath, Isabella A. Gerrard, Steven J. Gibson, Denis Leahy, Min-Young Lee, Callum Lynn, Yik Ki Ma, Antoine Marchal, N. M. McClure-Griffiths, David Nidever, Hiep Nguyen, Nickolas M. **el, Elizabeth Tarantino, Lucero Uscanga, Jacco Th. van Loon

Abstract: The structure and dynamics of the star-forming disk of the Small Magellanic Cloud (SMC) have long confounded us. The SMC is widely used as a prototype for galactic physics at low metallicity, and yet we fundamentally lack an understanding of the structure of its interstellar medium (ISM). In this work, we present a new model for the SMC by comparing the kinematics of young, massive stars with the structure of the ISM traced by high-resolution observations of neutral atomic hydrogen (HI) from the Galactic Australian Square Kilometer Array Pathfinder survey (GASKAP-HI). Specifically, we identify thousands of young, massive stars with precise radial velocity constraints from the Gaia and APOGEE surveys and match these stars to the ISM structures in which they likely formed. By comparing the average dust extinction towards these stars, we find evidence that the SMC is composed of two structures with distinct stellar and gaseous chemical compositions. We construct a simple model that successfully reproduces the observations and shows that the ISM of the SMC is arranged into two, superimposed, star-forming systems with similar gas mass separated by ~5 kpc along the line of sight.

Submitted 12 December, 2023; originally announced December 2023.

Comments: ApJ accepted. 20 pages, 18 figures

arXiv:2312.06648 [pdf, other]

Dense X Retrieval: What Retrieval Granularity Should We Use?

Authors: Tong Chen, Hongwei Wang, Sihao Chen, Wenhao Yu, Kaixin Ma, Xinran Zhao, Hongming Zhang, Dong Yu

Abstract: Dense retrieval has become a prominent method to obtain relevant context or world knowledge in open-domain NLP tasks. When we use a learned dense retriever on a retrieval corpus at inference time, an often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We discover that the retrieval unit choice significantly impacts the performance of both retrieval and downstream tasks. Distinct from the typical approach of using passages or sentences, we introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid and presented in a concise, self-contained natural language format. We conduct an empirical comparison of different retrieval granularity. Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval. Moreover, retrieval by proposition also enhances the performance of downstream QA tasks, since the retrieved texts are more condensed with question-relevant information, reducing the need for lengthy input tokens and minimizing the inclusion of extraneous, irrelevant information.

Submitted 11 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.01679 [pdf, other]

Adversarial Medical Image with Hierarchical Feature Hiding

Authors: Qingsong Yao, Zecheng He, Yuexiang Li, Yi Lin, Kai Ma, Yefeng Zheng, S. Kevin Zhou

Abstract: Deep learning based methods for medical images can be easily compromised by adversarial examples (AEs), posing a great security flaw in clinical decision-making. It has been discovered that conventional adversarial attacks like PGD, which optimize the classification logits, are easy to distinguish in the feature space, resulting in accurate reactive defenses. To better understand this phenomenon and reassess the reliability of the reactive defenses for medical AEs, we thoroughly investigate the characteristics of conventional medical AEs. Specifically, we first theoretically prove that conventional adversarial attacks change the outputs by continuously optimizing vulnerable features in a fixed direction, thereby leading to outlier representations in the feature space. Then, a stress test is conducted to reveal the vulnerability of medical images, by comparing with natural images. Interestingly, this vulnerability is a double-edged sword, which can be exploited to hide AEs. We then propose a simple-yet-effective hierarchical feature constraint (HFC), a novel add-on to conventional white-box attacks, which helps to hide the adversarial feature in the target feature distribution. The proposed method is evaluated on three medical datasets, both 2D and 3D, with different modalities. The experimental results demonstrate the superiority of HFC, i.e., it bypasses an array of state-of-the-art adversarial medical AE detectors more efficiently than competing adaptive attacks, which reveals the deficiencies of medical reactive defenses and allows more robust defenses to be developed in the future.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Our code is available at https://github.com/qsyao/Hierarchical_Feature_Constraint. arXiv admin note: text overlap with arXiv:2012.09501

arXiv:2311.14288 [pdf, other]

Fair Influence Maximization in Social Networks: A Community-Based Evolutionary Algorithm

Authors: Kaicong Ma, Xinxiang Xu, Haipeng Yang, Renzhi Cao, Lei Zhang

Abstract: Influence Maximization (IM), which attempts to find a subset of users to maximize the influence spread, has been extensively studied in network science. A new variant of IM, Fair Influence Maximization (FIM), which primarily enhances the fair propagation of information, attracts increasing attention in academia. However, existing algorithms for FIM suffer from a trade-off between fairness and running time, since it is a tough task to ensure that users are fairly influenced in terms of sensitive attributes, such as race or gender, while maintaining a high influence spread. To tackle this problem, in this paper, we propose an effective and efficient Community-based Evolutionary Algorithm for FIM (named CEA-FIM). In CEA-FIM, a community-based node selection strategy is proposed to identify potential nodes, which considers not only the size of the community but also the attributes of the nodes in the community. Subsequently, we design an evolutionary algorithm based on the proposed node selection strategy to hasten the search for the optimal solution, including the novel initialization, crossover and mutation strategies. We validate the proposed algorithm CEA-FIM by performing experiments on real-world and synthetic networks. The experimental results show that the proposed CEA-FIM achieves a better balance between effectiveness and efficiency, compared to the state-of-the-art baseline algorithms.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.10745 [pdf]

"Just a little bit on the outside for the whole time": Social belonging confidence and the persistence of Machine Learning and Artificial Intelligence students

Authors: Katherine Mao, Sharon Ferguson, James Magarian, Alison Olechowski

Abstract: The growing field of machine learning (ML) and artificial intelligence (AI) presents a unique and unexplored case within persistence research, meaning it is unclear how past findings from engineering will apply to this developing field. We conduct an exploratory study to gain an initial understanding of persistence in this field and identify fruitful directions for future work. One factor that has been shown to predict persistence in engineering is belonging; we study belonging through the lens of confidence, and discuss how attention to social belonging confidence may help to increase diversity in the profession. In this research paper, we conduct a small set of interviews with students in ML/AI courses. Thematic analysis of these interviews revealed initial differences in how students see a career in ML/AI, which diverge based on interest and programming confidence. We identified how exposure and initiation, the interpretation of ML and AI field boundaries, and beliefs of the skills required to succeed might influence students' intentions to persist. We discuss differences in how students describe being motivated by social belonging and the importance of close mentorship. We motivate further persistence research in ML/AI with particular focus on social belonging and close mentorship, the role of intersectional identity, and introductory ML/AI courses.

Submitted 30 October, 2023; originally announced November 2023.

Comments: Published in the 2023 Annual Conference of the American Society for Engineering Education

Journal ref: 2023 ASEE Annual Conference & Exposition, Baltimore, Maryland

arXiv:2311.10744 [pdf]

Advancing a Model of Students' Intentional Persistence in Machine Learning and Artificial Intelligence

Authors: Sharon Ferguson, Katherine Mao, James Magarian, Alison Olechowski

Abstract: Machine Learning (ML) and Artificial Intelligence (AI) are powering the applications we use, the decisions we make, and the decisions made about us. We have seen numerous examples of non-equitable outcomes, from facial recognition algorithms to recidivism algorithms, when they are designed without diversity in mind. Thus, we must take action to promote diversity among those in this field. A critical step in this work is understanding why some students who choose to study ML/AI later leave the field. While the persistence of diverse populations has been studied in engineering, there is a lack of research investigating factors that influence persistence in ML/AI. In this work, we present the advancement of a model of intentional persistence in ML/AI by surveying students in ML/AI courses. We examine persistence across demographic groups, such as gender, international student status, student loan status, and visible minority status. We investigate independent variables that distinguish ML/AI from other STEM fields, such as the varying emphasis on non-technical skills, the ambiguous ethical implications of the work, and the highly competitive and lucrative nature of the field. Our findings suggest that short-term intentional persistence is associated with academic enrollment factors such as major and level of study. Long-term intentional persistence is correlated with measures of professional role confidence. Unique to our study, we show that wanting your work to have a positive social benefit is a negative predictor of long-term intentional persistence, and women generally care more about this. We provide recommendations to educators to meaningfully discuss ML/AI ethics in classes and encourage the development of interpersonal skills to help increase diversity in the field.

Submitted 30 October, 2023; originally announced November 2023.

Comments: Presented at the 2022 Annual Conference of the American Society for Engineering Education

Journal ref: Paper presented at 2022 ASEE Annual Conference & Exposition, Minneapolis, MN

arXiv:2311.09210 [pdf, other]

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

Authors: Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu

Abstract: Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses, potentially causing the model to overlook its inherent knowledge, even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. In situations where knowledge is lacking, these systems should ideally respond with "unknown" when the answer is unattainable. In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. We employed ChatGPT to create training data for CoN, which was subsequently trained on an LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs. Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.

Submitted 15 November, 2023; originally announced November 2023.

Comments: Preprint

arXiv:2311.06555 [pdf, other]

Heuristic-Driven Link-of-Analogy Prompting: Enhancing Large Language Models for Document-Level Event Argument Extraction

Authors: Hanzhang Zhou, Junlang Qian, Zijian Feng, Hui Lu, Zixiao Zhu, Kezhi Mao

Abstract: In this study, we investigate in-context learning (ICL) in document-level event argument extraction (EAE) to alleviate the dependency on large-scale labeled data for this task. We introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting to address the challenge of example selection and to develop a prompting strategy tailored for EAE. Specifically, we hypothesize and validate that LLMs learn task-specific heuristics from demonstrations via ICL. Building upon this hypothesis, we introduce an explicit heuristic-driven demonstration construction approach, which transforms the haphazard example selection process into a methodical one that emphasizes task heuristics. Additionally, inspired by the analogical reasoning of humans, we propose the link-of-analogy prompting, which enables LLMs to process new situations by drawing analogies to known situations, enhancing their performance on unseen classes beyond limited ICL examples. Experiments show that our method outperforms existing prompting methods and few-shot supervised learning methods on document-level EAE datasets. Additionally, the HD-LoA prompting shows effectiveness in diverse tasks like sentiment analysis and natural language inference, demonstrating its broad adaptability.

Submitted 19 February, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2311.02513 [pdf]

Highly tunable room-temperature plexcitons in monolayer WSe2/gap-plasmon nanocavities

Authors: Thomas P. Darlington, Mahfujur Rahaman, Kevin W. C. Kwock, Emanuil Yanev, Xuehao Wu, Luke N. Holtzman, Madisen Holbrook, Gwangwoo Kim, Kyung Yeol Ma, Hyeon Suk Shin, Andrey Krayev, Matthew Strasbourg, Nicholas J. Borys, D. N. Basov, Katayun Barmak, James C. Hone, Abhay N. Pasupathy, Deep Jariwala, P. James Schuck

Abstract: The advancement of quantum photonic technologies relies on the ability to precisely control the degrees of freedom of optically active states. Here, we realize real-time, room-temperature tunable strong plasmon-exciton coupling in 2D semiconductor monolayers enabled by a general approach that combines strain engineering plus force- and voltage-adjustable plasmonic nanocavities. We show that the exciton energy and nanocavity plasmon resonance can be controllably toggled in concert by applying pressure with a plasmonic nanoprobe, allowing in operando control of detuning and coupling strength, with observed Rabi splittings >100 meV. Leveraging correlated force spectroscopy, nano-photoluminescence (nano-PL) and nano-Raman measurements, augmented with electromagnetic simulations, we identify distinct polariton bands and dark polariton states, and map their evolution as a function of nanogap and strain tuning. Uniquely, the system allows for manipulation of coupling strength over a range of cavity parameters without dramatically altering the detuning. Further, we establish that the tunable strong coupling is robust under multiple pressing cycles and repeated experiments over multiple nanobubbles. Finally, we show that the nanogap size can be directly modulated via an applied DC voltage between the substrate and plasmonic tip, highlighting the inherent nature of the concept as a plexcitonic nano-electro-mechanical system (NEMS). Our work demonstrates the potential to precisely control and tailor plexciton states localized in monolayer (1L) transition metal dichalcogenides (TMDs), paving the way for on-chip polariton-based nanophotonic applications spanning quantum information processing to photochemistry.

Submitted 4 November, 2023; originally announced November 2023.

Comments: 17 pages, 4 figures

arXiv:2311.01016 [pdf, other]

Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning

Authors: Yiran Li, Junpeng Wang, Prince Aboagye, Michael Yeh, Yan Zheng, Liang Wang, Wei Zhang, Kwan-Liu Ma

Abstract: Recent advancements in pre-trained large-scale language-image models have ushered in a new era of visual comprehension, offering a significant leap forward. These breakthroughs have proven particularly instrumental in addressing long-standing challenges that were previously daunting. Leveraging these innovative techniques, this paper tackles two well-known issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of potential data biases within them; (2) the evaluation of image captions and steering of their generation process. On the one hand, by visually examining the captions automatically generated from language-image models for an image dataset, we gain deeper insights into the semantic underpinnings of the visual contents, unearthing data biases that may be entrenched within the dataset. On the other hand, by depicting the association between visual contents and textual captions, we expose the weaknesses of pre-trained language-image models in their captioning capability and propose an interactive interface to steer caption generation. The two parts have been coalesced into a coordinated visual analytics system, fostering mutual enrichment of visual and textual elements. We validate the effectiveness of the system with domain practitioners through concrete case studies with large-scale image datasets.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.16950 [pdf, ps, other]

Stability manifolds of Kuznetsov components of prime Fano threefolds

Authors: Changping Fan, Zhiyu Liu, Songtao Kenneth Ma

Abstract: Let $X$ be a cubic threefold, quartic double solid or Gushel--Mukai threefold, and $\mathcal{K}u(X)\subset \mathrm{D}^b(X)$ be its Kuznetsov component. We show that a stability condition $\sigma$ on $\mathcal{K}u(X)$ is Serre-invariant if and only if its homological dimension is at most $2$. As a corollary, we prove that all Serre-invariant stability conditions on $\mathcal{K}u(X)$ form a contractible connected component of the stability manifold.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 19 pages, comments are very welcome!

Showing 51–100 of 610 results for author: Ma, K