HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfeig Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000$\times$70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to an end-to-end learning fashion with 1) a large (4K) resolution base patch for elevated visual information inclusion and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently model the rich information from the 4K input. To our best knowledge, HoloHisto presents the first holistic approach for gigapixel resolution WSI segmentation, supporting direct I/O of complete WSI and their corresponding gigapixel masks. Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively, for advancing computational capabilities. To facilitate efficient 4K resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid. To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. From the results, HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.00596 [pdf, other]

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the innovative HATs technique which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans across regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, yet eliminating the need for manual prompt generation in conventional segment anything model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs.

Submitted 30 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.19286

arXiv:2407.00030 [pdf, other]

On Orchestrating Parallel Broadcasts for Distributed Ledgers

Authors: Peiyao Sheng, Chenyuan Wu, Dahlia Malkhi, Michael K. Reiter, Chrysoula Stathakopoulou, Michael Wei, Maofan Yin

Abstract: This paper introduces and develops the concept of ``ticketing'', through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme which combines managed and unmanaged ticketing regimes, striking a balance between adaptivity and resilience. The performance evaluation demonstrates how managed and unmanaged ticketing regimes benefit throughput in systems with heterogeneous resources both in static and dynamic scenarios, with the managed ticketing regime performing better among the two as it adapts better. Finally, it demonstrates how using the hybrid ticketing regime performance can enjoy both the adaptivity of the managed regime and the liveness guarantees of the unmanaged regime.

Submitted 17 May, 2024; originally announced July 2024.

arXiv:2406.12404 [pdf]

Scan-to-BIM for As-built Roads: Automatic Road Digital Twinning from Semantically Labeled Point Cloud Data

Authors: Yuexiong Ding, Mengtian Yin, Ran Wei, Ioannis Brilakis, Muyang Liu, Xiaowei Luo

Abstract: Creating geometric digital twins (gDT) for as-built roads still faces many challenges, such as low automation level and accuracy, limited asset types and shapes, and reliance on engineering experience. A novel scan-to-building information modeling (scan-to-BIM) framework is proposed for automatic road gDT creation based on semantically labeled point cloud data (PCD), which considers six asset types: Road Surface, Road Side (Slope), Road Lane (Marking), Road Sign, Road Light, and Guardrail. The framework first segments the semantic PCD into spatially independent instances or parts, then extracts the sectional polygon contours as their representative geometric information, stored in JavaScript Object Notation (JSON) files using a new data structure. Primitive gDTs are finally created from JSON files using corresponding conversion algorithms. The proposed method achieves an average distance error of 1.46 centimeters and a processing speed of 6.29 meters per second on six real-world road segments with a total length of 1,200 meters.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.05590 [pdf, other]

NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security

Authors: Minghao Shao, Sofija Jancheska, Meet Udeshi, Brendan Dolan-Gavitt, Haoran Xi, Kimberly Milner, Boyuan Chen, Max Yin, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique

Abstract: Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions. Utilizing the advanced function calling capabilities of LLMs, we build a fully automated system with an enhanced workflow and support for external tool calls. Our benchmark dataset and automated framework allow us to evaluate the performance of five LLMs, encompassing both black-box and open-source models. This work lays the foundation for future research into improving the efficiency of LLMs in interactive cybersecurity tasks and automated task planning. By providing a specialized dataset, our project offers an ideal platform for developing, testing, and refining LLM-based approaches to vulnerability detection and resolution. Evaluating LLMs on these challenges and comparing with human performance yields insights into their potential for AI-driven cybersecurity solutions to perform real-world threat management. We make our dataset open source to public https://github.com/NYU-LLM-CTF/LLM_CTF_Database along with our playground automated framework https://github.com/NYU-LLM-CTF/llm_ctf_automation.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.01838 [pdf, other]

Learning the Target Network in Function Space

Authors: Kavosh Asadi, Yao Liu, Shoham Sabach, Ming Yin, Rasool Fakoor

Abstract: We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted to International Conference on Machine Learning (ICML24)

arXiv:2405.20495 [pdf, other]

Transfer Q Star: Principled Decoding for LLM Alignment

Authors: Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang

Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{π_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $ρ_{\texttt{BL}}$ aligned with a baseline reward $ρ_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of Transfer $Q^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20492 [pdf, ps, other]

Monomial identities in the Weyl algebra

Authors: Darij Grinberg, Tom Roby, Stephan Wagner, Mei Yin

Abstract: Motivated by a question and some enumerative conjectures of Richard Stanley, we explore the equivalence classes of words in the Weyl algebra, $\mathbf{k} \langle D,U\rangle/(DU-UD=1)$. We show that each class is generated by the swapping of adjacent *balanced subwords*, i.e., those which have the same number of $D$'s as $U$'s, and give several other characterizations. Armed with this we deduce a number of enumerative results about the number of such equivalence classes and their sizes. We extend these results to the class of $c$-Dyck words, where every prefix has at least $c$ times as many $U$'s as $D$'s. We also connect these results to previous work on bond percolation and rook theory, and generalize them to some other algebras.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 63 pages, 10 pictures. For Richard Stanley's 80th birthday. Detailed version available as ancillary file. Comments are welcome!

MSC Class: 12H05; 16S32; 05A15; 68R15

arXiv:2405.17795 [pdf, other]

Dataset Regeneration for Sequential Recommendation

Authors: Mingjia Yin, Hao Wang, Wei Guo, Yong Liu, Suojuan Zhang, Sirui Zhao, Defu Lian, Enhong Chen

Abstract: The sequential recommender (SR) system is a crucial component of modern recommender systems, as it aims to capture the evolving preferences of users. Significant efforts have been made to enhance the capabilities of SR systems. These methods typically follow the model-centric paradigm, which involves developing effective models based on fixed datasets. However, this approach often overlooks potential quality issues and flaws inherent in the data. Driven by the potential of data-centric AI, we propose a novel data-centric paradigm for developing an ideal training dataset using a model-agnostic dataset regeneration framework called DR4SR. This framework enables the regeneration of a dataset with exceptional cross-architecture generalizability. Additionally, we introduce the DR4SR+ framework, which incorporates a model-aware dataset personalizer to tailor the regenerated dataset specifically for a target model. To demonstrate the effectiveness of the data-centric paradigm, we integrate our framework with various model-centric methods and observe significant performance improvements across four widely adopted datasets. Furthermore, we conduct in-depth analyses to explore the potential of the data-centric paradigm and provide valuable insights. The code can be found at https://anonymous.4open.science/r/KDD2024-86EA

Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16345 [pdf]

Cypher4BIM: Releasing the Power of Graph for Building Knowledge Discovery

Authors: Junxiang Zhu, Nicholas Nisbet, Mengtian Yin, Ran Wei, Ioannis Brilakis

Abstract: Graph is considered a promising way for managing building information. A new graphic form of IFC (Industry Foundation Classes) data has just been developed, referred to as IFC-Graph. However, understanding of IFC-Graph is insufficient, especially for information query. This study aims to explore graphic building information query and develop a graph query language tailored for IFC-Graph. A series of tasks were carried out, including a) investigating the structure of IFC data and the main types of information in IFC, b) investigating the graph query language Cypher, and c) developing a set of tailored functional query patterns. The developed language is referred to as Cypher4BIM. Five IFC models were used for validation, and the result shows that Cypher4BIM can query individual instances and complex relations from IFC, such as spatial structure, space boundary, and space accessibility. This study contributes to applications that require effective building information query, such as digital twin.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.12473 [pdf, other]

Learning Partially Aligned Item Representation for Cross-Domain Sequential Recommendation

Authors: Mingjia Yin, Hao Wang, Wei Guo, Yong Liu, Zhi Li, Sirui Zhao, Defu Lian, Enhong Chen

Abstract: Cross-domain sequential recommendation (CDSR) aims to uncover and transfer users' sequential preferences across multiple recommendation domains. While significant endeavors have been made, they primarily concentrated on developing advanced transfer modules and aligning user representations using self-supervised learning techniques. However, the problem of aligning item representations has received limited attention, and misaligned item representations can potentially lead to sub-optimal sequential modeling and user representation alignment. To this end, we propose a model-agnostic framework called Cross-domain item representation Alignment for Cross-Domain Sequential Recommendation (CA-CDSR), which achieves sequence-aware generation and adaptively partial alignment for item representations. Specifically, we first develop a sequence-aware feature augmentation strategy, which captures both collaborative and sequential item correlations, thus facilitating holistic item representation generation. Next, we conduct an empirical study to investigate the partial representation alignment problem from a spectrum perspective. It motivates us to devise an adaptive spectrum filter, achieving partial alignment adaptively. Furthermore, the aligned item representations can be fed into different sequential encoders to obtain user representations. The entire framework is optimized in a multi-task learning paradigm with an annealing strategy. Extensive experiments have demonstrated that CA-CDSR can surpass state-of-the-art baselines by a significant margin and can effectively align items in representation spaces to enhance performance.

Submitted 3 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.17069 [pdf, other]

Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

Authors: Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Hao Guo, Sundeep Rangan

Abstract: The upper mid-band (FR3) has been recently attracting interest for new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub 6GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operate at multiple frequency bands. Data-driven methods, especially generative adversarial networks (GANs), can capture the intricate relationships among data samples, and provide an appropriate tool for FR3 channel modeling. In this work, we present the architecture, link state model, and path generative network of GAN-based FR3 channel modeling. The comparison of our model greatly matches the ray-tracing simulated data.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.13528 [pdf, other]

doi 10.1145/3620666.3651384

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

Authors: Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

Abstract: This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, we observe that layout transformations between the computational operators cause a significant slowdown in these applications. This paper presents SmartMem, a comprehensive framework for eliminating most layout transformations, with the idea that multiple operators can use the same tensor layout through careful choice of layout and implementation of operations. Our approach is based on classifying the operators into four groups, and considering combinations of producer-consumer edges between the operators. We develop a set of methods for searching such layouts. Another component of our work is developing efficient memory layouts for 2.5 dimensional memory commonly seen in mobile devices. Our experimental results show that SmartMem outperforms 5 state-of-the-art DNN execution frameworks on mobile devices across 18 varied neural networks, including CNNs, Transformers with both local and global attention, as well as LLMs. In particular, compared to DNNFusion, SmartMem achieves an average speedup of 2.8$\times$, and outperforms TVM and MNN with speedups of 6.9$\times$ and 7.9$\times$, respectively, on average.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13470 [pdf, other]

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data

Authors: Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin

Abstract: The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded lossy compression methods, which offer a balance between data size reduction and information retention. However, despite their utility, these compressors employing conventional techniques struggle with limited reconstruction quality. To address this issue, we draw inspiration from recent advancements in deep learning and propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models. Leveraging a group of neural networks, GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency. Experimental results on different fields from the Nyx dataset demonstrate remarkable improvements by GWLZ, achieving up to 20% quality enhancements with negligible overhead as low as 0.0003x.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.11871 [pdf, other]

Group-On: Boosting One-Shot Segmentation with Supportive Query

Authors: Hanjing Zhou, Mingze Yin, JinTai Chen, Danny Chen, Jian Wu

Abstract: One-shot semantic segmentation aims to segment query images given only ONE annotated support image of the same class. This task is challenging because target objects in the support and query images can be largely different in appearance and pose (i.e., intra-class variation). Prior works suggested that incorporating more annotated support images in few-shot settings boosts performances but increases costs due to additional manual labeling. In this paper, we propose a novel approach for ONE-shot semantic segmentation, called Group-On, which packs multiple query images in batches for the benefit of mutual knowledge support within the same category. Specifically, after coarse segmentation masks of the batch of queries are predicted, query-mask pairs act as pseudo support data to enhance mask predictions mutually, under the guidance of a simple Group-On Voting module. Comprehensive experiments on three standard benchmarks show that, in the ONE-shot setting, our Group-On approach significantly outperforms previous works by considerable margins. For example, on the COCO-20i dataset, we increase mIoU scores by 8.21% and 7.46% on ASNet and HSNet baselines, respectively. With only one support image, Group-On can be even competitive with the counterparts using 5 annotated support images.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.04057 [pdf, other]

Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD

Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

arXiv:2404.00268 [pdf, other]

A Unified Framework for Adaptive Representation Enhancement and Inversed Learning in Cross-Domain Recommendation

Authors: Luankang Zhang, Hao Wang, Suojuan Zhang, Mingjia Yin, Yongqiang Han, Jiaqing Zhang, Defu Lian, Enhong Chen

Abstract: Cross-domain recommendation (CDR), aiming to extract and transfer knowledge across domains, has attracted wide attention for its efficacy in addressing data sparsity and cold-start problems. Despite significant advances in representation disentanglement to capture diverse user preferences, existing methods usually neglect representation enhancement and lack rigorous decoupling constraints, thereby limiting the transfer of relevant information. To this end, we propose a Unified Framework for Adaptive Representation Enhancement and Inversed Learning in Cross-Domain Recommendation (AREIL). Specifically, we first divide user embeddings into domain-shared and domain-specific components to disentangle mixed user preferences. Then, we incorporate intra-domain and inter-domain information to adaptively enhance the ability of user representations. In particular, we propose a graph convolution module to capture high-order information, and a self-attention module to reveal inter-domain correlations and accomplish adaptive fusion. Next, we adopt domain classifiers and gradient reversal layers to achieve inversed representation learning in a unified framework. Finally, we employ a cross-entropy loss for measuring recommendation performance and jointly optimize the entire framework via multi-task learning. Extensive experiments on multiple datasets validate the substantial improvement in the recommendation performance of AREIL. Moreover, ablation studies and representation visualizations further illustrate the effectiveness of adaptive enhancement and inversed learning in CDR.

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted by DASFAA 2024

arXiv:2403.17110 [pdf, ps, other]

Fixed points and cycles of parking functions

Authors: Martin Rubey, Mei Yin

Abstract: A parking function of length $n$ is a sequence $π=(π_1,\dots, π_n)$ of positive integers such that if $λ_1\leq\cdots\leq λ_n$ is the increasing rearrangement of $π_1,\dots,π_n$, then $λ_i\leq i$ for $1\leq i\leq n$. In this paper we obtain some exact results on the number of fixed points and cycles of parking functions. Our proofs will be based on generalizations of Pollak's argument. Extensions of our techniques are discussed.

Submitted 25 March, 2024; originally announced March 2024.

MSC Class: 05A15; 05A19; 60C05
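
The parking-function condition stated in the abstract above can be checked directly. The following is a minimal illustrative sketch, not code from the paper; the function names `is_parking_function` and `count_parking_functions` are hypothetical. It sorts the sequence to obtain the increasing rearrangement and tests $λ_i\leq i$ at each position.

```python
from itertools import product


def is_parking_function(seq):
    """Return True if seq satisfies the parking-function condition."""
    # Entries must be positive integers.
    if any(not isinstance(x, int) or x < 1 for x in seq):
        return False
    # Sorting gives the increasing rearrangement lambda_1 <= ... <= lambda_n.
    rearranged = sorted(seq)
    # Check lambda_i <= i, using 1-based positions.
    return all(value <= i for i, value in enumerate(rearranged, start=1))


def count_parking_functions(n):
    """Brute-force count of parking functions of length n (small n only)."""
    # Entries larger than n can never satisfy the condition, so 1..n suffices.
    return sum(
        is_parking_function(candidate)
        for candidate in product(range(1, n + 1), repeat=n)
    )
```

For small $n$ the brute-force count can be compared with the classical closed form $(n+1)^{n-1}$, which gives 16 for $n=3$.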

arXiv:2403.16812 [pdf, other]

Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Authors: Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, humans often passively review AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories in human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15329 [pdf, other]

Optimal Data-Driven Prediction and Predictive Control using Signal Matrix Models

Authors: Roy S. Smith, Mohamed Abdalmoaty, Mingzhou Yin

Abstract: Data-driven control uses a past signal trajectory to characterise the input-output behaviour of a system. Willems' lemma provides a data-based prediction model allowing a control designer to bypass the step of identifying a state-space or transfer function model. This paper provides a more parsimonious formulation of Willems' lemma that separates the model into initial condition matching and predictive control design parts. This avoids the need for regularisers in the predictive control problem that are found in other data-driven predictive control methods. It also gives a closed form expression for the optimal (minimum variance) unbiased predictor of the future output trajectory and applies it for predictive control. Simulation comparisons illustrate very good control performance.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 7 pages, 3 figures. Submitted to IEEE Control Systems Society Letters and the 2024 Conference on Decision and Control

arXiv:2403.12004 [pdf, other]

The Value, Benefits, and Concerns of Generative AI-Powered Assistance in Writing

Authors: Zhuoyan Li, Chen Liang, Jing Peng, Ming Yin

Abstract: Recent advances in generative AI technologies like large language models raise both excitement and concerns about the future of human-AI co-creation in writing. To unpack people's attitude towards and experience with generative AI-powered writing assistants, in this paper, we conduct an experiment to understand whether and how much value people attach to AI assistance, and how the incorporation of AI assistance in writing workflows changes people's writing perceptions and performance. Our results suggest that people are willing to forgo financial payments to receive writing assistance from AI, especially if AI can provide direct content generation assistance and the writing task is highly creative. Generative AI-powered assistance is found to offer benefits in increasing people's productivity and confidence in writing. However, direct content generation assistance offered by AI also comes with risks, including decreasing people's sense of accountability and diversity in writing. We conclude by discussing the implications of our findings.

Submitted 18 March, 2024; originally announced March 2024.

Comments: CHI 2024

arXiv:2403.11574 [pdf, ps, other]

Offline Multitask Representation Learning for Reinforcement Learning

Authors: Haque Ishfaq, Thanh Nguyen-Tang, Songtao Feng, Raman Arora, Mengdi Wang, Ming Yin, Doina Precup

Abstract: We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation. We theoretically investigate offline multitask low-rank RL, and propose a new algorithm called MORL for offline multitask representation learning. Furthermore, we examine downstream RL in reward-free, offline and online scenarios, where a new task is introduced to the agent that shares the same representation as the upstream offline tasks. Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10850 [pdf, other]

GAgent: An Adaptive Rigid-Soft Gripping Agent with Vision Language Models for Complex Lighting Environments

Authors: Zhuowei Li, Miao Zhang, Xiaotian Lin, Meng Yin, Shuai Lu, Xueqian Wang

Abstract: This paper introduces GAgent: a Gripping Agent designed for open-world environments that provides advanced cognitive abilities via VLM agents and flexible grasping abilities with variable stiffness soft grippers. GAgent comprises three primary components: a Prompt Engineer module, a Visual-Language Model (VLM) core, and a Workflow module. These three modules enhance gripper success rates by recognizing objects and materials and accurately estimating grasp area even under challenging lighting conditions. As part of creativity, researchers also created a bionic hybrid soft gripper with variable stiffness capable of gripping heavy loads while still gently engaging objects. This intelligent agent, featuring VLM-based cognitive processing with bionic design, shows promise as it could potentially benefit UAVs in various scenarios.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.09552 [pdf, other]

"Are You Really Sure?" Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making

Authors: Shuai Ma, Xinru Wang, Ying Lei, Chuhan Shi, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective, "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-confidence appropriateness and reliance appropriateness. Then, in our second study, we propose three calibration mechanisms and compare their effects on humans' self-confidence and user experience. Subsequently, our third study investigates the effects of self-confidence calibration on AI-assisted decision-making. Results show that calibrating human self-confidence enhances human-AI team performance and encourages more rational reliance on AI (in some aspects) compared to uncalibrated baselines. Finally, we discuss our main findings and provide implications for designing future AI-assisted decision-making interfaces.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.08685 [pdf, other]

Elastic shape analysis computations for clustering left atrial appendage geometries of atrial fibrillation patients

Authors: Zan Ahmad, Minglang Yin, Yashil Sukurdeep, Noam Rotenberg, Eugene Kholmovski, Natalia A. Trayanova

Abstract: Morphological variations in the left atrial appendage (LAA) are associated with different levels of ischemic stroke risk for patients with atrial fibrillation (AF). Studying LAA morphology can elucidate mechanisms behind this association and lead to the development of advanced stroke risk stratification tools. However, current categorical descriptions of LAA morphologies are qualitative and inconsistent across studies, which impedes advancements in our understanding of stroke pathogenesis in AF. To mitigate these issues, we introduce a quantitative pipeline that combines elastic shape analysis with unsupervised learning for the categorization of LAA morphology in AF patients. As part of our pipeline, we compute pairwise elastic distances between LAA meshes from a cohort of 20 AF patients, and leverage these distances to cluster our shape data. We demonstrate that our method clusters LAA morphologies based on distinctive shape features, overcoming the innate inconsistencies of current LAA categorization systems, and paving the way for improved stroke risk metrics using objective LAA shape groups.

Submitted 13 March, 2024; originally announced March 2024.

Comments: Submitted as a conference paper to MICCAI 2024

arXiv:2403.04530 [pdf, other]

Multi-District School Choice: Playing on Several Fields

Authors: Yannai A. Gonczarowski, Michael Yin, Shirley Zhang

Abstract: We extend the seminal model of Pathak and Sönmez (2008) to a setting with multiple school districts, each running its own separate centralized match, and focus on the case of two districts. In our setting, in addition to each student being either sincere or sophisticated, she is also either constrained - able to apply only to schools within her own district of residence - or unconstrained - able to choose any single district within which to apply. We show that several key results from Pathak and Sönmez (2008) qualitatively flip: A sophisticated student may prefer for a sincere student to become sophisticated, and a sophisticated student may prefer for her own district to use Deferred Acceptance over the Boston Mechanism, irrespective of the mechanism used by the other district. We furthermore investigate the preferences of students over the constraint levels of other students. Many of these phenomena appear abundantly in large random markets.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03149 [pdf, other]

Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks

Authors: Yichang Xu, Ming Yin, Minghong Fang, Neil Zhenqiang Gong

Abstract: Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or knowledge of label distribution before the attack. In this work, we bridge the gap by proposing InferGuard, a novel Byzantine-robust aggregation rule aimed at defending against client-side training data distribution inference attacks. In our proposed InferGuard, the server first calculates the coordinate-wise median of all the model updates it receives. A client's model update is considered malicious if it significantly deviates from the computed median update. We conduct a thorough evaluation of our proposed InferGuard on five benchmark datasets and perform a comparison with ten baseline methods. The results of our experiments indicate that our defense mechanism is highly effective in protecting against client-side training data distribution inference attacks, even against strong adaptive attacks. Furthermore, our method substantially outperforms the baseline methods in various practical FL scenarios.

Submitted 4 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: To appear in The Web Conference 2024 (WWW '24)
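
The two steps described in the abstract above (take the coordinate-wise median of the received updates, then flag updates that deviate strongly from it) can be sketched as follows. This is only an illustration under stated assumptions, not the InferGuard implementation: the abstract does not specify how deviation is measured, so the Euclidean distance and the fixed `threshold` parameter are assumptions, and the function names are hypothetical.

```python
from statistics import median


def coordinate_wise_median(updates):
    """Median of each coordinate across all client updates."""
    # zip(*updates) regroups the values by coordinate rather than by client.
    return [median(column) for column in zip(*updates)]


def flag_deviating_updates(updates, threshold):
    """Return the median update and the indices of strongly deviating clients.

    ASSUMPTION: deviation is the Euclidean distance to the median update,
    compared against a fixed threshold. The source abstract does not state
    the actual criterion.
    """
    med = coordinate_wise_median(updates)
    flagged = []
    for index, update in enumerate(updates):
        distance = sum((u - m) ** 2 for u, m in zip(update, med)) ** 0.5
        if distance > threshold:
            flagged.append(index)
    return med, flagged
```

With five toy updates `[[1, 2], [1, 2], [1, 2], [2, 3], [100, -100]]` and `threshold=5`, the median update is `[1, 2]` and only the last client is flagged.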

arXiv:2403.01791 [pdf, other]

Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Authors: Shuai Ma, Chenyi Zhang, Xinru Wang, Xiaojuan Ma, Ming Yin

Abstract: Artificial Intelligence (AI) is increasingly employed in various decision-making tasks, typically as a Recommender, providing recommendations that the AI deems correct. However, recent studies suggest this may diminish human analytical thinking and lead to humans' inappropriate reliance on AI, impairing the synergy in human-AI teams. In contrast, human advisors in group decision-making perform various roles, such as analyzing alternative options or criticizing decision-makers to encourage their critical thinking. This diversity of roles has not yet been empirically explored in AI assistance. In this paper, we examine three AI roles: Recommender, Analyzer, and Devil's Advocate, and evaluate their effects across two AI performance levels. Our results show each role's distinct strengths and limitations in task performance, reliance appropriateness, and user experience. Notably, the Recommender role is not always the most effective; especially when the AI performance level is low, the Analyzer role may be preferable. These insights offer valuable implications for designing AI assistants with adaptive functional roles according to different situations.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19286 [pdf, other]

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intricate spatial interrelations among objects from clinical knowledge. In this research, we introduce a novel universal proposition learning approach, called panoramic renal pathology segmentation (PrPSeg), designed to comprehensively segment panoramic structures within the kidney by integrating extensive knowledge of kidney anatomy. In this paper, we propose (1) the design of a comprehensive universal proposition matrix for renal pathology, facilitating the incorporation of classification and spatial relationships into the segmentation process; (2) a token-based dynamic head single network architecture, with the improvement of the partial label image segmentation and capability for future data enlargement; and (3) an anatomy loss function, quantifying the inter-object relationships across the kidney.

Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

arXiv:2402.12781 [pdf, ps, other]

On the anticyclotomic Iwasawa theory of newforms at Eisenstein primes of semistable reduction

Authors: Timo Keller, Mulun Yin

Abstract: Let $f$ be a newform of weight $k$ and level $N$ with trivial nebentypus. Let $\mathfrak{p}\nmid 2N$ be a maximal prime ideal of the coefficient ring of $f$ such that the self-dual twist of the mod-$\mathfrak{p}$ Galois representation of $f$ is reducible with constituents $φ,ψ$. Denote a decomposition group over the rational prime $p$ below $\mathfrak{p}$ by $G_p$. We remove the condition $φ|_{G_p} \neq \mathbf{1}, ω$ from [CGLS22], and generalize their results to newforms of arbitrary weights. As a consequence, we prove some Iwasawa main conjectures and get the $p$-part of the strong BSD conjecture for elliptic curves of analytic rank $0$ or $1$ over $\mathbf{Q}$ in this setting. In particular, non-trivial $p$-torsion is allowed in the Mordell--Weil group. Using Hida families, we prove an Iwasawa main conjecture for newforms of weight $2$ of multiplicative reduction at Eisenstein primes. In the above situations, we also get $p$-converse theorems to the theorems of Gross--Zagier--Kolyvagin. The $p$-converse theorems have applications to Goldfeld's conjecture in certain quadratic twist families of elliptic curves having a $3$-isogeny.

Submitted 20 February, 2024; originally announced February 2024.

Comments: Comments welcome

MSC Class: 11G40 (Primary) 11G05; 11G10; 14G10 (Secondary)

arXiv:2402.11637 [pdf, other]

Poisoning Federated Recommender Systems with Fake Users

Authors: Ming Yin, Yichang Xu, Minghong Fang, Neil Zhenqiang Gong

Abstract: Federated recommendation is a prominent use case within federated learning, yet it remains susceptible to various attacks, from user to server-side vulnerabilities. Poisoning attacks are particularly notable among user-side attacks, as participants upload malicious model updates to deceive the global model, often intending to promote or demote specific targeted items. This study investigates strategies for executing promotion attacks in federated recommender systems. Current poisoning attacks on federated recommender systems often rely on additional information, such as the local training data of genuine users or item popularity. However, such information is challenging for the potential attacker to obtain. Thus, there is a need to develop an attack that requires no extra information apart from item embeddings obtained from the server. In this paper, we introduce a novel fake user based poisoning attack named PoisonFRS to promote the attacker-chosen targeted item in federated recommender systems without requiring knowledge about user-item rating data, user attributes, or the aggregation rule used by the server. Extensive experiments on multiple real-world datasets demonstrate that PoisonFRS can effectively promote the attacker-chosen targeted item to a large portion of genuine users and outperform current benchmarks that rely on additional information about the system. We further observe that the model updates from both genuine and fake users are indistinguishable within the latent space.

Submitted 18 February, 2024; originally announced February 2024.

Comments: To appear in The Web Conference 2024 (WWW '24)

arXiv:2402.10516 [pdf, other]

Generative AI for Controllable Protein Sequence Design: A Survey

Authors: Yiheng Zhu, Zitai Kong, Jialu Wu, Weize Liu, Yuqiang Han, Mingze Yin, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou

Abstract: The design of novel protein sequences with targeted functionalities underpins a central theme in protein engineering, impacting diverse fields such as drug discovery and enzymatic engineering. However, navigating this vast combinatorial search space remains a severe challenge due to time and financial constraints. This scenario is rapidly evolving as the transformative advancements in AI, particularly in the realm of generative models and optimization algorithms, have been propelling the protein design field towards an unprecedented revolution. In this survey, we systematically review recent advances in generative AI for controllable protein sequence design. To set the stage, we first outline the foundational tasks in protein sequence design in terms of the constraints involved and present key generative models and optimization algorithms. We then offer in-depth reviews of each design task and discuss the pertinent applications. Finally, we identify the unresolved challenges and highlight research opportunities that merit deeper exploration.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 9 pages

arXiv:2402.07250 [pdf, other]

DIMON: Learning Solution Operators of Partial Differential Equations on a Diffeomorphic Family of Domains

Authors: Minglang Yin, Nicolas Charon, Ryan Brody, Lu Lu, Natalia Trayanova, Mauro Maggioni

Abstract: The solution of a PDE over varying initial/boundary conditions on multiple domains is needed in a wide variety of applications, but it is computationally expensive if the solution is computed de novo whenever the initial/boundary conditions of the domain change. We introduce a general operator learning framework, called DIffeomorphic Mapping Operator learNing (DIMON), to learn approximate PDE solutions over a family of domains $\{Ω_θ\}_θ$, that learns the map from initial/boundary conditions and domain $Ω_θ$ to the solution of the PDE, or to specified functionals thereof. DIMON is based on transporting a given problem (initial/boundary conditions and domain $Ω_θ$) to a problem on a reference domain $Ω_{0}$, where training data from multiple problems is used to learn the map to the solution on $Ω_{0}$, which is then re-mapped to the original domain $Ω_θ$. We consider several problems to demonstrate the performance of the framework in learning both static and time-dependent PDEs on non-rigid geometries; these include solving the Laplace equation, reaction-diffusion equations, and a multiscale PDE that characterizes the electrical propagation on the left ventricle. This work paves the way toward the fast prediction of PDE solutions on a family of domains and the application of neural operators in engineering and precision medicine.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.00980 [pdf, other]

The shape memory effect and minimal surfaces

Authors: Mengdi Yin, Dimitri D. Vvedensky

Abstract: The description of martensitic transformations as continuous transformations between triply periodic minimal surfaces (TPMS), as originally proposed by Hyde and Andersson [Z. Kristallogr. 174, 225 (1986)], is extended to include paths between the initial and final phases. Bravais lattices correspond to particular TPMS whose lattice points are flat points, where the Gaussian curvature vanishes. Reversible transformations, which correspond to shape memory materials, occur only if lattice points remain at flat points on a TPMS throughout a continuous deformation. For the shape memory material NiTi, density-functional theory (DFT) yields irreversible and reversible paths with and without energy barriers, respectively. Although there are TPMS for face-centered (gamma-Fe) and body-centered (alpha-Fe) cubic lattices, gamma to alpha deformation paths are not reversible, in agreement with non-vanishing energy barriers obtained from DFT.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.15603 [pdf, other]

Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction

Authors: Kangkang Lu, Yanhua Yu, Hao Fei, Xuan Li, Zixuan Yang, Zirui Guo, Meiyu Liang, Mengran Yin, Tat-Seng Chua

Abstract: In recent years, spectral graph neural networks, characterized by polynomial filters, have garnered increasing attention and have achieved remarkable performance in tasks such as node classification. These models typically assume that eigenvalues for the normalized Laplacian matrix are distinct from each other, thus expecting a polynomial filter to have a high fitting ability. However, this paper empirically observes that normalized Laplacian matrices frequently possess repeated eigenvalues. Moreover, we theoretically establish that the number of distinguishable eigenvalues plays a pivotal role in determining the expressive power of spectral graph neural networks. In light of this observation, we propose an eigenvalue correction strategy that can free polynomial filters from the constraints of repeated eigenvalue inputs. Concretely, the proposed eigenvalue correction strategy enhances the uniform distribution of eigenvalues, thus mitigating repeated eigenvalues, and improving the fitting capacity and expressive power of polynomial filters. Extensive experimental results on both synthetic and real-world datasets demonstrate the superiority of our method.

Submitted 18 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI-24

arXiv:2401.10341 [pdf, other]

ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks

Authors: Yang Sui, Miao Yin, Yu Gong, Jinqi Xiao, Huy Phan, Bo Yuan

Abstract: Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well-studied in the literature. On the other hand, low-rank training, as an alternative way to train low-rank CNNs from scratch, has been little explored so far. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, and the entire training phase is always performed on the low-rank structure, bringing attractive benefits for practical applications. However, the existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or still needing to update full-size models during the training. In this paper, we perform a systematic investigation on low-rank CNN training. By identifying the proper low-rank format and performance-improving strategy, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.07058 [pdf, other]

Does More Advice Help? The Effects of Second Opinions in AI-Assisted Decision Making

Authors: Zhuoran Lu, Dakuo Wang, Ming Yin

Abstract: AI assistance in decision-making has become popular, yet people's inappropriate reliance on AI often leads to unsatisfactory human-AI collaboration performance. In this paper, through three pre-registered, randomized human subject experiments, we explore whether and how the provision of second opinions may affect decision-makers' behavior and performance in AI-assisted decision-making. We find that if both the AI model's decision recommendation and a second opinion are always presented together, decision-makers reduce their over-reliance on AI while increasing their under-reliance on AI, regardless of whether the second opinion is generated by a peer or another AI model. However, if decision-makers have the control to decide when to solicit a peer's second opinion, we find that their active solicitations of second opinions have the potential to mitigate over-reliance on AI without inducing increased under-reliance in some cases. We conclude by discussing the implications of our findings for promoting effective human-AI collaborations in decision-making.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.05840 [pdf, other]

Decoding AI's Nudge: A Unified Framework to Predict Human Behavior in AI-assisted Decision Making

Authors: Zhuoyan Li, Zhuoran Lu, Ming Yin

Abstract: With the rapid development of AI-based decision aids, different forms of AI assistance have been increasingly integrated into the human decision making processes. To best support humans in decision making, it is essential to quantitatively understand how diverse forms of AI assistance influence humans' decision making behavior. To this end, much of the current research focuses on the end-to-end prediction of human behavior using "black-box" models, often lacking interpretations of the nuanced ways in which AI assistance impacts the human decision making process. Meanwhile, methods that prioritize the interpretability of human behavior predictions are often tailored for one specific form of AI assistance, making adaptations to other forms of assistance difficult. In this paper, we propose a computational framework that can provide an interpretable characterization of the influence of different forms of AI assistance on decision makers in AI-assisted decision making. By conceptualizing AI assistance as the "nudge" in human decision making processes, our approach centers around modelling how different forms of AI assistance modify humans' strategy in weighing different information in making their decisions. Evaluations on behavior data collected from real human decision makers show that the proposed framework outperforms various baselines in accurately predicting human behavior in AI-assisted decision making. Based on the proposed framework, we further provide insights into how individuals with different cognitive styles are nudged by AI assistance differently.

Submitted 11 January, 2024; originally announced January 2024.

Comments: AAAI 2024

arXiv:2312.16925 [pdf]

Detecting bulk carbon ferromagnetism in graphene multi-edge structure

Authors: Chao Wang, Nan Jian, Meijie Yin, Xi Zhang, Zhi Yang, Xiuhao Mo, Takashi Kikkawa, Shunsuke Daimon, Eiji Saitoh, Qian Li, Wensheng Yan, Dazhi Hou, Lei Yang, Dongfeng Diao

Abstract: The emergence of bulk carbon ferromagnetism has long been expected. At the nanoscale, carbon ferromagnetism was detected by analyzing the magnetic edge states via scanning tunneling microscopy (STM), and its origin can be explained by local redistribution of the electron wave function. At larger scales, carbon ferromagnetism can be created by deliberately producing defects in graphite, and detected by macroscopic technical magnetization. Meanwhile, it becomes crucial to determine that the detected magnetization originates from carbon rather than from magnetic impurities. One solution is X-ray magnetic circular dichroism (XMCD). Nonetheless, a reproducible, full section of the XMCD spectrum across the C-1s absorption energy has not appeared yet, which should be decisive for establishing the existence of bulk carbon ferromagnetism. Besides, the lack of direct observation of the atomic structure of the ferromagnetic carbon leaves the structural origin of its ferromagnetism unclear. In this work, for detecting bulk carbon ferromagnetism, we managed to grow an all-carbon film consisting of vertically aligned graphene multi-edge (VGME), which wove into a three-dimensional hyperfine-porous network. Magnetization (M-H) curves and XMCD spectra co-confirmed bulk carbon ferromagnetism of VGME at room temperature, with an average unit magnetic moment of ~0.0006 μB/atom. The influence of magnetic impurities on magnetization was excluded by both absorption spectra and inductively coupled plasma mass spectrometry measurements. The spin transfer behavior also verified the long-range and robust feature of the bulk carbon ferromagnetism. Our work provides direct, element-resolved evidence of bulk carbon ferromagnetism at room temperature and clarifies its origin from pi-electrons at graphene edges.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.15877 [pdf, other]

PBCounter: Weighted Model Counting on Pseudo-Boolean Formulas

Authors: Yong Lai, Zhenghang Xu, Minghao Yin

Abstract: In Weighted Model Counting (WMC), we assign weights to literals and compute the sum of the weights of the models of a given propositional formula, where the weight of an assignment is the product of the weights of its literals. The current WMC solvers work on Conjunctive Normal Form (CNF) formulas. However, CNF is not a natural representation for humans in many applications. Motivated by the stronger expressive power of pseudo-Boolean (PB) formulas than CNF, we propose to perform WMC on PB formulas. Based on a recent dynamic programming algorithm framework called ADDMC for WMC, we implement a weighted PB counting tool PBCounter. We compare PBCounter with the state-of-the-art weighted model counters SharpSAT-TD, ExactMC, D4, and ADDMC, where the latter tools work on CNF with encoding methods that convert PB constraints into a CNF formula. The experiments on three domains of benchmarks show that PBCounter is superior to the model counters on CNF formulas.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15722 [pdf, other]

doi 10.3929/ethz-b-000646581

Frequency-Domain Identification of Discrete-Time Systems using Sum-of-Rational Optimization

Authors: Mohamed Abdalmoaty, Jared Miller, Mingzhou Yin, Roy S. Smith

Abstract: We propose a computationally tractable method for the identification of stable canonical discrete-time rational transfer function models, using frequency domain data. The problem is formulated as a global non-convex optimization problem whose objective function is the sum of weighted squared residuals at each observed frequency datapoint. Stability is enforced using a polynomial matrix inequality constraint. The problem is solved globally by a moment-sum-of-squares hierarchy of semidefinite programs through a framework for sum-of-rational-functions optimization. Convergence of the moment-sum-of-squares program is guaranteed as the bound on the degree of the sum-of-squares polynomials approaches infinity. The performance of the proposed method is demonstrated using numerical simulation examples.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 6 pages

arXiv:2312.15717 [pdf, other]

Spatial-Temporal Interplay in Human Mobility: A Hierarchical Reinforcement Learning Approach with Hypergraph Representation

Authors: Zhaofan Zhang, Yanan Xiao, Lu Jiang, Dingqi Yang, Minghao Yin, Pengyang Wang

Abstract: In the realm of human mobility, the decision-making process for selecting the next-visit location is intricately influenced by a trade-off between spatial and temporal constraints, which are reflective of individual needs and preferences. This trade-off, however, varies across individuals, making the modeling of these spatial-temporal dynamics a formidable challenge. To address the problem, in this work, we introduce the "Spatial-temporal Induced Hierarchical Reinforcement Learning" (STI-HRL) framework, for capturing the interplay between spatial and temporal factors in human mobility decision-making. Specifically, STI-HRL employs a two-tiered decision-making process: the low-level focuses on disentangling spatial and temporal preferences using dedicated agents, while the high-level integrates these considerations to finalize the decision. To complement the hierarchical decision setting, we construct a hypergraph to organize historical data, encapsulating the multi-aspect semantics of human mobility. We propose a cross-channel hypergraph embedding module to learn the representations as the states to facilitate the decision-making cycle. Our extensive experiments on two real-world datasets validate the superiority of STI-HRL over state-of-the-art methods in predicting users' next visits across various performance metrics.

Submitted 25 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.02758 [pdf, ps, other]

Stochastic Data-Driven Predictive Control: Regularization, Estimation, and Constraint Tightening

Authors: Mingzhou Yin, Andrea Iannelli, Roy S. Smith

Abstract: Data-driven predictive control methods based on Willems' fundamental lemma have shown great success in recent years. These approaches use receding horizon predictive control with nonparametric data-driven predictors instead of model-based predictors. This study addresses three problems of applying such algorithms under unbounded stochastic uncertainties: 1) tuning-free regularizer design, 2) initial condition estimation, and 3) reliable constraint satisfaction, by using stochastic prediction error quantification. The regularizer is designed by leveraging the expected output cost. An initial condition estimator is proposed by filtering the measurements with the one-step-ahead stochastic data-driven prediction. A novel constraint-tightening method, using second-order cone constraints, is presented to ensure high-probability chance constraint satisfaction. Numerical results demonstrate that the proposed methods lead to satisfactory control performance in terms of both control cost and constraint satisfaction, with significantly improved initial condition estimation.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.17384 [pdf, ps, other]

Bases for optimising stabiliser decompositions of quantum states

Authors: Nadish de Silva, Ming Yin, Sergii Strelchuk

Abstract: Stabiliser states play a central role in the theory of quantum computation. For example, they are used to encode computational basis states in the most common quantum error correction schemes. Arbitrary quantum states admit many stabiliser decompositions: ways of being expressed as a superposition of stabiliser states. Understanding the structure of stabiliser decompositions has significant applications in verifying and simulating near-term quantum computers. We introduce and study the vector space of linear dependencies of $n$-qubit stabiliser states. These spaces have canonical bases containing vectors whose size grows exponentially in $n$. We construct elegant bases of linear dependencies of constant size three. Critically, our sparse bases can be computed without first compiling a dictionary of all $n$-qubit stabiliser states. We utilise them to explicitly compute the stabiliser extent of states of more qubits than is feasible with existing techniques. Finally, we delineate future applications to improving theoretical bounds on the stabiliser rank of magic states.

Submitted 29 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: Python code for computing bases for spaces of stabiliser decompositions and applications re: the stabiliser extent of large states available at https://github.com/ndesilva/stabiliser-decomp-bases. To appear in Quantum Science and Technology

arXiv:2311.16502 [pdf, other]

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Authors: Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

Abstract: We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of 14 open-source LMMs as well as the proprietary GPT-4V(ision) and Gemini highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.

Submitted 13 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: CVPR 2024 Oral

arXiv:2311.10357 [pdf, ps, other]

Fast algorithms for classical specifications of stabiliser states and Clifford gates

Authors: Nadish de Silva, Wilfred Salmon, Ming Yin

Abstract: The stabiliser formalism plays a central role in quantum computing, error correction, and fault-tolerance. Stabiliser states are used to encode computational basis states. Clifford gates are those which can be easily performed fault-tolerantly in the most common error correction schemes. Their mathematical properties are the subject of significant research interest. Conversions between and verifications of different specifications of stabiliser states and Clifford gates are important components of many classical algorithms in quantum information, e.g. for gate synthesis, circuit optimisation, and for simulating quantum circuits. These core functions are also used in the numerical experiments critical to formulating and testing mathematical conjectures on the stabiliser formalism. We develop novel mathematical insights concerning stabiliser states and Clifford gates that significantly clarify their descriptions. We then utilise these to provide ten new fast algorithms which offer asymptotic advantages over any existing implementations. We show how to rapidly verify that a vector is a stabiliser state, and interconvert between its specification as amplitudes, a quadratic form, and a check matrix. These methods are leveraged to rapidly check if a given unitary matrix is a Clifford gate and to interconvert between the matrix of a Clifford gate and its compact specification as a stabiliser tableau. For example, we extract the stabiliser tableau of a Clifford gate matrix with $N^2$ entries in $O(N \log N)$ time. Remarkably, it is not necessary to read all the elements of a Clifford matrix to extract its stabiliser tableau. This is an asymptotic speedup over the best-known method that is superexponential in the number of qubits. We provide example implementations of our algorithms in Python.

Submitted 26 May, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: Python implementations available at https://github.com/ndesilva/stabiliser-tools. New in v2: new algorithm for extracting the stabiliser tableau of a Clifford gate matrix that is exponentially faster compared to v1, more thorough complexity analyses. New in v3: new and faster algorithms, comparisons with existing implementations

arXiv:2311.09019 [pdf, ps, other]

Closed-Loop Identification of Stabilized Models Using Dual Input-Output Parameterization

Authors: Ran Chen, Amber Srivastava, Mingzhou Yin, Roy S. Smith

Abstract: This paper introduces a dual input-output parameterization (dual IOP) for the identification of linear time-invariant systems from closed-loop data. It draws inspiration from the recent input-output parameterization developed to synthesize a stabilizing controller. The controller is parameterized in terms of closed-loop transfer functions, from the external disturbances to the input and output of the system, constrained to lie in a given subspace. Analogously, the dual IOP method parameterizes the unknown plant with analogous closed-loop transfer functions, also referred to as dual parameters. In this case, these closed-loop transfer functions are constrained to lie in an affine subspace guaranteeing that the identified plant is stabilized by the known controller. Compared with existing closed-loop identification techniques guaranteeing closed-loop stability, such as the dual Youla parameterization, the dual IOP neither requires a doubly-coprime factorization of the controller nor a nominal plant that is stabilized by the controller. The dual IOP does not depend on the order and the state-space realization of the controller either, as in the dual system-level parameterization. Simulation shows that the dual IOP outperforms the existing benchmark methods.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.02816 [pdf, other]

doi 10.1145/3583780.3614781

APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation

Authors: Mingjia Yin, Hao Wang, Xiang Xu, Likang Wu, Sirui Zhao, Wei Guo, Yong Liu, Ruiming Tang, Defu Lian, Enhong Chen

Abstract: The sequential recommendation system has been widely studied for its promising effectiveness in capturing dynamic preferences buried in users' sequential behaviors. Despite the considerable achievements, existing methods usually focus on intra-sequence modeling while overlooking exploiting global collaborative information by inter-sequence modeling, resulting in inferior recommendation performance. Therefore, previous works attempt to tackle this problem with a global collaborative item graph constructed by pre-defined rules. However, these methods neglect two crucial properties when capturing global collaborative information, i.e., adaptiveness and personalization, yielding sub-optimal user representations. To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems. Specifically, we first learn an adaptive global graph among all items and capture global collaborative information with it in a self-supervised fashion, whose computational burden can be further alleviated by the proposed SVD-based accelerator. Furthermore, based on the graph, we propose to extract and utilize personalized item correlations in the form of relative positional encoding, which is a highly compatible manner of personalizing the utilization of global collaborative information. Finally, the entire framework is optimized in a multi-task learning paradigm, thus each part of APGL4SR can be mutually reinforced. As a generic framework, APGL4SR can outperform other baselines with significant margins. The code is available at https://github.com/Graph-Team/APGL4SR.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.18919 [pdf, other]

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation

Authors: Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma

Abstract: Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[τ])$ worst-case regret in the presence of unknown stochastic delays. Here $E[τ]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.

Submitted 3 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.12026 [pdf, other]

Nonparametric Discrete Choice Experiments with Machine Learning Guided Adaptive Design

Authors: Mingzhang Yin, Ruijiang Gao, Weiran Lin, Steven M. Shugan

Abstract: Designing products to meet consumers' preferences is essential for a business's success. We propose the Gradient-based Survey (GBS), a discrete choice experiment for multiattribute product design. The experiment elicits consumer preferences through a sequence of paired comparisons for partial profiles. GBS adaptively constructs paired comparison questions based on the respondents' previous choices. Unlike the traditional random utility maximization paradigm, GBS is robust to model misspecification by not requiring a parametric utility model. Cross-pollinating the machine learning and experiment design, GBS is scalable to products with hundreds of attributes and can design personalized products for heterogeneous consumers. We demonstrate the advantage of GBS in accuracy and sample efficiency compared to the existing parametric and nonparametric methods in simulations.

Submitted 18 October, 2023; originally announced October 2023.

Showing 1–50 of 271 results for author: Yin, M