doi 10.1145/3658644.3670349

UWBAD: Towards Effective and Imperceptible Jamming Attacks Against UWB Ranging Systems with COTS Chips

Authors: Yuqiao Yang, Zhongjie Wu, Yongzhao Zhang, Ting Chen, Jun Li, Jie Yang, Wenhao Liu, Xiaosong Zhang, Ruicong Shi, **gwei Li, Yu Jiang, Zhuo Su

Abstract: UWB ranging systems have been adopted in many critical and security-sensitive applications due to their precise positioning and secure ranging capabilities. We present a practical jamming attack, namely UWBAD, against commercial UWB ranging systems, which exploits the vulnerability of the adoption of the normalized cross-correlation process in UWB ranging and can selectively and quickly block ranging sessions without prior knowledge of the configurations of the victim devices, potentially leading to severe consequences such as property loss, unauthorized access, or vehicle theft. UWBAD achieves more effective and less perceptible jamming for two reasons: (i) it efficiently blocks every ranging session by leveraging field-level jamming, thereby exerting a tangible impact on commercial UWB ranging systems, and (ii) its compact, reactive, and selective system design is based on COTS UWB chips, making it affordable and less perceptible. We successfully conducted real attacks against commercial UWB ranging systems from the three largest UWB chip vendors on the market, namely Apple, NXP, and Qorvo. We reported our findings to Apple, related Original Equipment Manufacturers (OEM), and the Automotive Security Research Group, triggering internal security incident response procedures at Volkswagen, Audi, Bosch, and NXP. As of the writing of this paper, the related OEM has acknowledged this vulnerability in their automotive systems and has offered a $5,000 reward as a bounty.

Submitted 30 June, 2024; originally announced July 2024.

Comments: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

arXiv:2406.18853 [pdf, other]

Decoding-Time Language Model Alignment with Multiple Objectives

Authors: Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon Du

Abstract: Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose $\textbf{multi-objective decoding (MOD)}$, a decoding-time algorithm that outputs the next token from a linear combination of predictions of all base models, for any given weightings over different objectives. We exploit a common form among a family of $f$-divergence regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution by Legendre transform, and derive an efficient decoding strategy. Theoretically, we show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method. Empirical results demonstrate the effectiveness of the algorithm. For example, compared to a parameter-merging baseline, MOD achieves 12.8% overall reward improvement when equally optimizing towards $3$ objectives. Moreover, we experiment with MOD on combining three fully-finetuned LLMs of different model sizes, each aimed at different objectives such as safety, coding, and general user preference. Unlike traditional methods that require careful curation of a mixture of datasets to achieve comprehensive improvement, we can quickly experiment with preference weightings using MOD to find the best combination of models. Our best combination reduces toxicity on Toxigen to nearly 0% and achieves 7.9% to 33.3% improvement across the other three metrics ($\textit{i.e.}$, Codex@1, GSM-COT, BBH-COT).
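Illustrative sketch (not from the paper): the abstract describes choosing the next token from a linear combination of the base models' predictions under user-chosen weightings. The snippet below shows one way such a step could look, assuming the combination is taken over per-model next-token log-probabilities and that decoding is greedy. Both choices, and the names `log_softmax` and `combine_next_token`, are assumptions made for illustration; the paper's actual decoding rule may differ.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def combine_next_token(per_model_logits, weights):
    """Pick the next token from a weighted combination of model predictions.

    per_model_logits: one list of vocabulary logits per base model
                      (hypothetical inputs; all lists share one vocabulary).
    weights:          one preference weight per base model.
    Returns (greedy token index, combined probability distribution).
    """
    if len(per_model_logits) != len(weights):
        raise ValueError("need exactly one weight per model")
    vocab_size = len(per_model_logits[0])
    combined = [0.0] * vocab_size
    # Assumption: combine in log-probability space, weight by weight.
    for logits, w in zip(per_model_logits, weights):
        for i, lp in enumerate(log_softmax(logits)):
            combined[i] += w * lp
    # Renormalize so the combined scores form a proper distribution.
    normalized = log_softmax(combined)
    probs = [math.exp(x) for x in normalized]
    token = max(range(vocab_size), key=lambda i: normalized[i])
    return token, probs


# Two toy "models" over a three-token vocabulary that prefer different tokens.
model_a = [2.0, 0.0, 0.0]
model_b = [0.0, 2.0, 0.0]
token_a, probs_a = combine_next_token([model_a, model_b], [1.0, 0.0])
token_b, probs_b = combine_next_token([model_a, model_b], [0.0, 1.0])
```

Shifting all weight to one model recovers that model's own greedy choice, while intermediate weights trade off between the two preferences without retraining either model.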

Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.14880 [pdf, other]

Pathformer: Recursive Path Query Encoding for Complex Logical Query Answering

Authors: Chongzhi Zhang, Zhi** Peng, Junhao Zheng, Linghao Wang, Ruifeng Shi, Qianli Ma

Abstract: Complex Logical Query Answering (CLQA) over incomplete knowledge graphs is a challenging task. Recently, Query Embedding (QE) methods have been proposed to solve CLQA by performing multi-hop logical reasoning. However, most of them only consider historical query context information while ignoring future information, which leads to their failure to capture the complex dependencies behind the elements of a query. In recent years, the transformer architecture has shown a strong ability to model long-range dependencies between words. The bidirectional attention mechanism proposed by the transformer can solve the limitation of these QE methods regarding query context. Still, as a sequence model, it is difficult for the transformer to model complex logical queries with branch structure computation graphs directly. To this end, we propose a neural one-point embedding method called Pathformer based on the tree-like computation graph, i.e., query computation tree. Specifically, Pathformer decomposes the query computation tree into path query sequences by branches and then uses the transformer encoder to recursively encode these path query sequences to obtain the final query embedding. This allows Pathformer to fully utilize future context information to explicitly model the complex interactions between various parts of the path query. Experimental results show that Pathformer outperforms existing competitive neural QE methods, and we found that Pathformer has the potential to be applied to non-one-point embedding space.

Submitted 21 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE

arXiv:2406.04598 [pdf, other]

OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework

Authors: Wei Zhou, Hong Huang, Guowen Zhang, Ruize Shi, Kehan Yin, Yuanyuan Lin, Bang Liu

Abstract: Large language models (LLMs) have excelled in various natural language processing tasks, but challenges in interpretability and trustworthiness persist, limiting their use in high-stakes fields. Causal discovery offers a promising approach to improve transparency and reliability. However, current evaluations are often one-sided and lack assessments focused on interpretability performance. Additionally, these evaluations rely on synthetic data and lack comprehensive assessments of real-world datasets. As a result, promising methods may be overlooked. To address these issues, we propose a flexible evaluation framework with metrics for evaluating differences in causal structures and causal effects, which are crucial attributes that help improve the interpretability of LLMs. We introduce the Open Causal Discovery Benchmark (OCDB), based on real data, to promote fair comparisons and drive optimization of algorithms. Additionally, our new metrics account for undirected edges, enabling fair comparisons between Directed Acyclic Graphs (DAGs) and Completed Partially Directed Acyclic Graphs (CPDAGs). Experimental results show significant shortcomings in existing algorithms' generalization capabilities on real data, highlighting the potential for performance improvement and the importance of our framework in advancing causal discovery techniques.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.00738 [pdf, other]

Global Rewards in Restless Multi-Armed Bandits

Authors: Naveen Raman, Zheyuan Ryan Shi, Fei Fang

Abstract: Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states. Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms. We address this deficiency by proposing restless multi-armed bandits with global rewards (RMAB-G), a generalization of RMABs to global non-separable rewards. To solve RMAB-G, we develop the Linear- and Shapley-Whittle indices, which extend Whittle indices from RMABs to RMAB-Gs. We prove approximation bounds but also point out how these indices could fail when reward functions are highly non-linear. To overcome this, we propose two sets of adaptive policies: the first computes indices iteratively, and the second combines indices with Monte-Carlo Tree Search (MCTS). Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue.

Submitted 7 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

Comments: 27 pages

arXiv:2405.17358 [pdf, other]

Rethinking Transformers in Solving POMDPs

Authors: Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu

Abstract: Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability. This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical limitations. We establish that regular languages, which Transformers struggle to model, are reducible to POMDPs. This poses a significant challenge for Transformers in learning POMDP-specific inductive biases, due to their lack of inherent recurrence found in other models like RNNs. This paper casts doubt on the prevalent belief in Transformers as sequence models for RL and proposes to introduce a point-wise recurrent structure. The Deep Linear Recurrent Unit (LRU) emerges as a well-suited alternative for Partially Observable RL, with empirical results highlighting the sub-optimal performance of the Transformer and considerable strength of LRU.

Submitted 30 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024; references added; typos fixed

arXiv:2405.05993 [pdf]

Precision Rehabilitation for Patients Post-Stroke based on Electronic Health Records and Machine Learning

Authors: Fengyi Gao, Xingyu Zhang, Sonish Sivarajkumar, Parker Denny, Bayan Aldhahwani, Shyam Visweswaran, Ryan Shi, William Hogan, Allyn Bove, Yanshan Wang

Abstract: In this study, we used statistical analysis and machine learning methods to examine whether rehabilitation exercises can improve post-stroke patients' functional abilities, and to forecast the improvement in functional abilities. Our dataset comprises patients' rehabilitation exercises and demographic information recorded in unstructured electronic health record (EHR) data and free-text rehabilitation procedure notes. We collected data for 265 stroke patients from the University of Pittsburgh Medical Center. We employed a pre-existing natural language processing (NLP) algorithm to extract data on rehabilitation exercises and developed a rule-based NLP algorithm to extract Activity Measure for Post-Acute Care (AM-PAC) scores, covering basic mobility (BM) and applied cognitive (AC) domains, from procedure notes. Changes in AM-PAC scores were classified based on the minimal clinically important difference (MCID), and significance was assessed using Friedman and Wilcoxon tests. To identify impactful exercises, we used Chi-square tests, Fisher's exact tests, and logistic regression for odds ratios. Additionally, we developed five machine learning models, namely logistic regression (LR), AdaBoost (ADB), support vector machine (SVM), gradient boosting (GB), and random forest (RF), to predict outcomes in functional ability. Statistical analyses revealed significant associations between functional improvements and specific exercises. The RF model achieved the best performance in predicting functional outcomes. In this study, we identified three rehabilitation exercises that significantly contributed to post-stroke patients' functional ability improvement in the first two months. Additionally, the successful application of a machine learning model to predict patient-specific functional outcomes underscores the potential for precision rehabilitation.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.02655 [pdf, other]

Calibrating the Confidence of Large Language Models by Eliciting Fidelity

Authors: Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu

Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence is not well calibrated with their correctness rate. In this paper, we decompose the language model confidence into the \textit{Uncertainty} about the question and the \textit{Fidelity} to the answer generated by language models. Then, we propose a plug-and-play method to estimate the confidence of language models. Our method has shown good calibration performance in experiments with 6 RLHF-LMs on four MCQA datasets. Moreover, we propose two novel metrics, IPR and CE, to evaluate the calibration of the model, and we provide a detailed discussion of \textit{Truly Well-Calibrated Confidence}. Our method could serve as a strong baseline, and we hope that this work will provide some insights into model confidence calibration.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 17 pages, 13 figures

arXiv:2403.15033 [pdf, other]

Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Authors: Qiaoqiao **, Xuanhong Chen, Meiguang **, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni

Abstract: Contemporary makeup approaches primarily hinge on unpaired learning paradigms, yet they grapple with the challenges of inaccurate supervision (e.g., face misalignment) and sophisticated facial prompts (including face parsing and landmark detection). These challenges prohibit low-cost deployment of facial makeup models, especially on mobile devices. To solve these problems, we propose a brand-new learning paradigm, termed "Data Amplify Learning (DAL)," alongside a compact makeup model named "TinyBeauty." The core idea of DAL lies in employing a Diffusion-based Data Amplifier (DDA) to "amplify" limited images for the model training, thereby enabling accurate pixel-to-pixel supervision with merely a handful of annotations. Two pivotal innovations in DDA facilitate the above training approach: (1) A Residual Diffusion Model (RDM) is designed to generate high-fidelity detail and circumvent the detail vanishing problem in the vanilla diffusion models; (2) A Fine-Grained Makeup Module (FGMM) is proposed to achieve precise makeup control and combination while retaining face identity. Coupled with DAL, TinyBeauty requires merely 80K parameters to achieve state-of-the-art performance without intricate face prompts. Meanwhile, TinyBeauty achieves a remarkable inference speed of up to 460 fps on the iPhone 13. Extensive experiments show that DAL can produce highly competitive makeup models using only 5 image pairs.

Submitted 8 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.12032 [pdf, other]

Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

Authors: Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas

Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images and output high-quality textured meshes. Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation, then conditions the 2D views of the next timestep using rendered views, without compromising visual quality. With an inference time of only 2-5 minutes, this framework achieves a better trade-off between quality and speed than score distillation. MVEdit is highly versatile and extendable, with a wide range of applications including text/image-to-3D generation, 3D-to-3D editing, and high-quality texture synthesis. In particular, evaluations demonstrate state-of-the-art performance in both image-to-3D and text-guided texture generation tasks. Additionally, we introduce a method for fine-tuning 2D latent diffusion models on small 3D datasets with limited resources, enabling fast low-resolution text-to-3D initialization.

Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: V2 note: Fix missing acknowledgements. Project page: https://lakonik.github.io/mvedit

arXiv:2402.11818 [pdf, other]

Where It Really Matters: Few-Shot Environmental Conservation Media Monitoring for Low-Resource Languages

Authors: Sameer Jain, Sedrick Scott Keh, Shova Chettri, Karun Dewan, Pablo Izquierdo, Johanna Prussman, Pooja Shreshtha, Cesar Suarez, Zheyuan Ryan Shi, Lei Li, Fei Fang

Abstract: Environmental conservation organizations routinely monitor news content on conservation in protected areas to maintain situational awareness of developments that can have an environmental impact. Existing automated media monitoring systems require large amounts of data labeled by domain experts, which is only feasible at scale for high-resource languages like English. However, such tools are most needed in the Global South, where news of interest is mainly in local low-resource languages, and far fewer experts are available to annotate datasets sustainably. In this paper, we propose NewsSerow, a method to automatically recognize environmental conservation content in low-resource languages. NewsSerow is a pipeline of summarization, in-context few-shot classification, and self-reflection using large language models (LLMs). Using at most 10 demonstration example news articles in Nepali, NewsSerow significantly outperforms other few-shot methods and achieves comparable performance with models fully fine-tuned using thousands of examples. The World Wide Fund for Nature (WWF) has deployed NewsSerow for media monitoring in Nepal, significantly reducing their operational burden, and ensuring that AI tools for conservation actually reach the communities that need them the most. NewsSerow has also been deployed in countries with other languages, such as Colombia.

Submitted 18 February, 2024; originally announced February 2024.

Comments: AAAI 2024: AI for Social Impact Track

arXiv:2402.09372 [pdf, other]

Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

Authors: Jiancheng Yang, Rui Shi, Liang **, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to or even better than human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which can be an interesting area in the future. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

arXiv:2402.02026 [pdf, other]

Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving

Authors: Lixing Xiao, Ruixiao Shi, Xiaoyang Tang, Yi Zhou

Abstract: Previous works on object detection have achieved high accuracy in closed-set scenarios, but their performance in open-world scenarios is not satisfactory. One of the challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases, relying heavily on visual appearance and exhibiting poor generalization ability. In this paper, we propose a solution by reducing the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness notion learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, Multimodal-Enhanced Objectness Learner (MENOL) for Corner Case Detection, significantly improves recall for novel classes with lower training costs. By achieving a 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset with just 5100 labeled training images, MENOL outperforms the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures

arXiv:2401.00167 [pdf, other]

Leveraging Partial Symmetry for Multi-Agent Reinforcement Learning

Authors: Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Simin Li, Shuhao Liao, Wenjun Wu

Abstract: Incorporating symmetry as an inductive bias into multi-agent reinforcement learning (MARL) has led to improvements in generalization, data efficiency, and physical consistency. While prior research has succeeded in using a perfect symmetry prior, the realm of partial symmetry in the multi-agent domain remains unexplored. To fill this gap, we introduce the partially symmetric Markov game, a new subclass of the Markov game. We then theoretically show that the performance error introduced by utilizing symmetry in MARL is bounded, implying that the symmetry prior can still be useful in MARL even in partial symmetry situations. Motivated by this insight, we propose the Partial Symmetry Exploitation (PSE) framework that is able to adaptively incorporate the symmetry prior in MARL under different symmetry-breaking conditions. Specifically, by adaptively adjusting the exploitation of symmetry, our framework is able to achieve superior sample efficiency and overall performance of MARL algorithms. Extensive experiments are conducted to demonstrate the superior performance of the proposed framework over baselines. Finally, we implement the proposed framework in a real-world multi-robot testbed to show its superiority.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2312.17372 [pdf, other]

Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e

Authors: Chenwei Xu, Jerry Yao-Chieh Hu, Aakaash Narayanan, Mattson Thieme, Vladimir Nagaslaev, Mark Austin, Jeremy Arnold, Jose Berlioz, Pierrick Hanlet, Aisha Ibrahim, Dennis Nicklaus, Jovan Mitrevski, Jason Michael St. John, Gauri Pradhan, Andrea Saewert, Kiyomi Seiya, Brian Schupbach, Randy Thurman-Keup, Nhan Tran, Rui Shi, Seda Ogrenci, Alexis Maya-Isabelle Shu**, Kyle Hazelwood, Han Liu

Abstract: We introduce a novel Proximal Policy Optimization (PPO) algorithm aimed at addressing the challenge of maintaining a uniform proton beam intensity delivery in the Muon to Electron Conversion Experiment (Mu2e) at Fermi National Accelerator Laboratory (Fermilab). Our primary objective is to regulate the spill process to ensure a consistent intensity profile, with the ultimate goal of creating an automated controller capable of providing real-time feedback and calibration of the Spill Regulation System (SRS) parameters on a millisecond timescale. We treat the Mu2e accelerator system as a Markov Decision Process suitable for Reinforcement Learning (RL), utilizing PPO to reduce bias and enhance training stability. A key innovation in our approach is the integration of a neuralized Proportional-Integral-Derivative (PID) controller into the policy function, resulting in a significant improvement in the Spill Duty Factor (SDF) by 13.6%, surpassing the performance of the current PID controller baseline by an additional 1.6%. This paper presents the preliminary offline results based on a differentiable simulator of the Mu2e accelerator. It lays the groundwork for real-time implementations and applications, representing a crucial step towards automated proton beam intensity control for the Mu2e experiment.
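Illustrative sketch (not from the paper): for readers unfamiliar with the baseline the abstract compares against, the snippet below shows a textbook discrete-time PID update. It is not the neuralized PID policy or the Mu2e spill regulation system; the class name `PID`, the gain values, and the time step are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative, hypothetical gains)."""

    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Return the control output for the current error sample."""
        # Accumulate the error over time for the integral term.
        self.integral += error * dt
        # Finite-difference estimate of the error's rate of change.
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


controller = PID(kp=2.0, ki=0.5, kd=0.1)
first = controller.step(error=1.0, dt=1.0)   # 2.0*1 + 0.5*1 + 0.1*1
second = controller.step(error=1.0, dt=1.0)  # 2.0*1 + 0.5*2 + 0.1*0
```

In a learned variant, the three gains would be trainable parameters rather than hand-tuned constants; how the paper embeds them in the PPO policy is not specified in the abstract.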

Submitted 28 December, 2023; originally announced December 2023.

Comments: 10 pages, accepted at NeurIPS 2023 ML4Phy Workshop

arXiv:2312.15610 [pdf, other]

Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks

Authors: Yijia Weng, Kaichun Mo, Ruoxi Shi, Yanchao Yang, Leonidas J. Guibas

Abstract: Some extremely low-dimensional yet crucial geometric eigen-lengths often determine the success of some geometric tasks. For example, the height of an object is important to measure to check if it can fit between the shelves of a cabinet, while the width of a couch is crucial when trying to move it through a doorway. Humans have materialized such crucial geometric eigen-lengths in common sense since they are very useful in serving as succinct yet effective, highly interpretable, and universal object representations. However, it remains obscure and underexplored if learning systems can be equipped with similar capabilities of automatically discovering such key geometric quantities from doing tasks. In this work, we therefore for the first time formulate and propose a novel learning problem on this question and set up a benchmark suite including tasks, data, and evaluation metrics for studying the problem. We focus on a family of common fitting tasks as the testbed for the proposed learning problem. We explore potential solutions and demonstrate the feasibility of learning eigen-lengths from simply observing successful and failed fitting trials. We also attempt geometric grounding for more accurate eigen-length measurement and study the reusability of the learned eigen-lengths across multiple tasks. Our work marks the first exploratory step toward learning crucial geometric eigen-lengths and we hope it can inspire future research in tackling this important yet underexplored problem.

Submitted 24 December, 2023; originally announced December 2023.

Comments: ICML 2023. Project page: https://yijiaweng.github.io/geo-eigen-length

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36958-36977, 2023

arXiv:2312.15130 [pdf, other]

PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments

Authors: Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu

Abstract: Pose estimation is a crucial task in computer vision and robotics, enabling the tracking and manipulation of objects in images or videos. While several datasets exist for pose estimation, there is a lack of large-scale datasets specifically focusing on cluttered scenes with occlusions. We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios. PACE consists of 54,945 frames with 257,673 annotations across 300 videos, covering 576 objects from 44 categories and featuring a mix of rigid and articulated items in cluttered scenes. To annotate the real-world data efficiently, we developed an innovative annotation system utilizing a calibrated 3-camera setup. We test state-of-the-art algorithms in PACE along two tracks: pose estimation, and object pose tracking, revealing the benchmark's challenges and research opportunities. Our code and data is available on https://github.com/qq456cvb/PACE.

Submitted 31 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.09249 [pdf, other]

ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining

Authors: Ruoxi Shi, Xinyue Wei, Cheng Wang, Hao Su

Abstract: We present ZeroRF, a novel per-scene optimization method addressing the challenge of sparse view 360° reconstruction in neural field representations. Current breakthroughs like Neural Radiance Fields (NeRF) have demonstrated high-fidelity image synthesis but struggle with sparse input views. Existing methods, such as Generalizable NeRFs and per-scene optimization approaches, face limitations in data dependency, computational cost, and generalization across diverse scenarios. To overcome these challenges, we propose ZeroRF, whose key idea is to integrate a tailored Deep Image Prior into a factorized NeRF representation. Unlike traditional methods, ZeroRF parametrizes feature grids with a neural network generator, enabling efficient sparse view 360° reconstruction without any pretraining or additional regularization. Extensive experiments showcase ZeroRF's versatility and superiority in terms of both quality and speed, achieving state-of-the-art results on benchmark datasets. ZeroRF's significance extends to applications in 3D content generation and editing. Project page: https://sarahweiii.github.io/zerorf/

Submitted 14 December, 2023; originally announced December 2023.

Comments: Project page: https://sarahweiii.github.io/zerorf/

arXiv:2311.07885 [pdf, other]

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Authors: Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su

Abstract: Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images - two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by initially finetuning a 2D diffusion model for consistent multi-view image generation, followed by elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image. Our project webpage: https://sudo-ai-3d.github.io/One2345plus_page.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.05716 [pdf, other]

ML-based Real-Time Control at the Edge: An Approach Using hls4ml

Authors: R. Shi, S. Ogrenci, J. M. Arnold, J. R. Berlioz, P. Hanlet, K. J. Hazelwood, M. A. Ibrahim, H. Liu, V. P. Nagaslaev, A. Narayanan, D. J. Nicklaus, J. Mitrevski, G. Pradhan, A. L. Saewert, B. A. Schupbach, K. Seiya, M. Thieme, R. M. Thurman-Keup, N. V. Tran

Abstract: This study focuses on implementing a real-time control system for a particle accelerator facility that performs high energy physics experiments. A critical operating parameter in this facility is beam loss, which is the fraction of particles deviating from the accelerated proton beam into a cascade of secondary particles. Accelerators employ a large number of sensors to monitor beam loss. The data from these sensors is monitored by human operators who predict the relative contribution of different sub-systems to the beam loss. Using this information, they engage control interventions. In this paper, we present a controller to track this phenomenon in real-time using edge-Machine Learning (ML) and support control with low latency and high accuracy. We implemented this system on an Intel Arria 10 SoC. Optimizations at the algorithm, high-level synthesis, and interface levels to improve latency and resource usage are presented. Our design implements a neural network, which can predict the main source of beam loss (between two possible causes) at speeds up to 575 frames per second (fps) (average latency of 1.74 ms). The practical deployed system is required to operate at 320 fps, with a 3ms latency requirement, which has been met by our design successfully.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.02221 [pdf, other]

Structured Neural Networks for Density Estimation and Causal Inference

Authors: Asic Q. Chen, Ruian Shi, Xiang Gao, Ricardo Baptista, Rahul G. Krishnan

Abstract: Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structure through masking pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired independencies are respected. We devise and study practical algorithms for this otherwise NP-hard design problem based on novel objectives that control the model architecture. We demonstrate the utility of StrNN in three applications: (1) binary and Gaussian density estimation with StrNN, (2) real-valued density estimation with Structured Autoregressive Flows (StrAFs) and Structured Continuous Normalizing Flows (StrCNF), and (3) interventional and counterfactual analysis with StrAFs for causal inference. Our work opens up new avenues for learning neural networks that enable data-efficient generative modeling and the use of normalizing flows for causal effect estimation.

Submitted 3 November, 2023; originally announced November 2023.

Comments: 10 pages with 5 figures, to be published in Neural Information Processing Systems 2023

arXiv:2310.20587 [pdf, other]

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Authors: Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu

Abstract: Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces $\textbf{La}$nguage Models for $\textbf{Mo}$tion Control ($\textbf{LaMo}$), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) Initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using the non-linear MLP transformation instead of linear projections, to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages. Empirical results indicate $\textbf{LaMo}$ achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples.

Submitted 27 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 24 pages, 16 tables

arXiv:2310.20172 [pdf, other]

Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer

Authors: Ruijun Shi, Yue Zhou, Tianyu Zhao, Zhoujian Cao, Zhixiang Ren

Abstract: Space-based gravitational wave (GW) detection is one of the most anticipated GW detection projects in the next decade, which promises to detect abundant compact binary systems. At present, deep learning methods have not been widely explored for GW waveform generation and extrapolation. To solve the data processing difficulty and the increasing waveform complexity caused by the detector's response and second-generation time-delay interferometry (TDI 2.0), an interpretable pre-trained large model named CBS-GPT (Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer) is proposed. For compact binary system waveforms, three models were trained to predict the waveforms of massive black hole binaries (MBHB), extreme mass-ratio inspirals (EMRIs), and galactic binaries (GB), achieving prediction accuracies of at most 99%, 91%, and 99%, respectively. The CBS-GPT model exhibits notable generalization and interpretability, with its hidden parameters effectively capturing the intricate information of waveforms, even with the complex instrument response and a wide parameter range. Our research demonstrates the potential of large models in the GW realm, opening up new opportunities and guidance for future researches such as complex waveforms generation, gap completion, and deep learning model design for GW science.

Submitted 5 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.15110 [pdf, other]

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

Authors: Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su

Abstract: We report Zero123++, an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view. To take full advantage of pretrained 2D generative priors, we develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models such as Stable Diffusion. Zero123++ excels in producing high-quality, consistent multi-view images from a single image, overcoming common issues like texture degradation and geometric misalignment. Furthermore, we showcase the feasibility of training a ControlNet on Zero123++ for enhanced control over the generation process. The code is available at https://github.com/SUDO-AI-3D/zero123plus.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.11021 [pdf, other]

Dynamic quantum circuit compilation

Authors: Kun Fang, Munan Zhang, Ruqi Shi, Yinan Li

Abstract: Quantum computing has shown tremendous promise in addressing complex computational problems, yet its practical realization is hindered by the limited availability of qubits for computation. Recent advancements in quantum hardware have introduced mid-circuit measurements and resets, enabling the reuse of measured qubits and significantly reducing the qubit requirements for executing quantum algorithms. In this work, we present a systematic study of dynamic quantum circuit compilation, a process that transforms static quantum circuits into their dynamic equivalents with a reduced qubit count through qubit-reuse. We establish the first general framework for optimizing the dynamic circuit compilation via graph manipulation. In particular, we completely characterize the optimal quantum circuit compilation using binary integer programming, provide efficient algorithms for determining whether a given quantum circuit can be reduced to a smaller circuit and present heuristic algorithms for devising dynamic compilation schemes in general. Furthermore, we conduct a thorough analysis of quantum circuits with practical relevance, offering optimal compilations for well-known quantum algorithms in quantum computation, ansatz circuits utilized in quantum machine learning, and measurement-based quantum computation crucial for quantum networking. We also perform a comparative analysis against state-of-the-art approaches, demonstrating the superior performance of our methods in both structured and random quantum circuits. Our framework lays a rigorous foundation for comprehending dynamic quantum circuit compilation via qubit-reuse, bridging the gap between theoretical quantum algorithms and their physical implementation on quantum computers with limited resources.

Submitted 21 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 51 pages, 32 figures; comments are welcome; v2 reorganize the writing and strengthen the results

arXiv:2310.08738 [pdf, other]

Splicing Up Your Predictions with RNA Contrastive Learning

Authors: Philip Fradkin, Ruian Shi, Bo Wang, Brendan Frey, Leo J. Lee

Abstract: In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. Our novel dataset and contrastive objective enable the learning of generalized RNA isoform representations. We validate their utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing on both tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.

Submitted 17 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.01404 [pdf, other]

H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation

Authors: Yanjie Ze, Yuyao Liu, Ruizhe Shi, Jiaxin Qin, Zhecheng Yuan, Jiashun Wang, Huazhe Xu

Abstract: Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. In this work, we propose a human $\textbf{H}$and$\textbf{-In}$formed visual representation learning framework to solve difficult $\textbf{Dex}$terous manipulation tasks ($\textbf{H-InDex}$) with reinforcement learning. Our framework consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adapting representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages only modify $0.36\%$ parameters of the pre-trained representation in total, ensuring the knowledge from pre-training is maintained to the full extent. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and the recent visual foundation models for motor control. Code is available at https://yanjieze.com/H-InDex .

Submitted 12 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023. Code and videos: https://yanjieze.com/H-InDex

arXiv:2309.13626 [pdf]

Crack-Net: Prediction of Crack Propagation in Composites

Authors: Hao Xu, Wei Fan, Ambrose C. Taylor, Dongxiao Zhang, Lecheng Ruan, Rundong Shi

Abstract: Computational solid mechanics has become an indispensable approach in engineering, and numerical investigation of fracture in composites is essential as composites are widely used in structural applications. Crack evolution in composites is the bridge to elucidate the relationship between the microstructure and fracture performance, but crack-based finite element methods are computationally expensive and time-consuming, limiting their application in computation-intensive scenarios. Here we propose a deep learning framework called Crack-Net, which incorporates the relationship between crack evolution and stress response to predict the fracture process in composites. Trained on a high-precision fracture development dataset generated using the phase field method, Crack-Net demonstrates a remarkable capability to accurately forecast the long-term evolution of crack growth patterns and the stress-strain curve for a given composite design. The Crack-Net captures the essential principle of crack growth, which enables it to handle more complex microstructures such as binary co-continuous structures. Moreover, transfer learning is adopted to further improve the generalization ability of Crack-Net for composite materials with reinforcements of different strengths. The proposed Crack-Net holds great promise for practical applications in engineering and materials science, in which accurate and efficient fracture prediction is crucial for optimizing material performance and microstructural design.

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2308.16422 [pdf, other]

doi 10.1103/PhysRevD.109.084054

Dilated convolutional neural network for detecting extreme-mass-ratio inspirals

Authors: Tianyu Zhao, Yue Zhou, Ruijun Shi, Zhoujian Cao, Zhixiang Ren

Abstract: The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, keeping an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses.

Submitted 14 May, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures, and 2 tables

Journal ref: Phys. Rev. D 109, 084054 (2024)

arXiv:2308.12530 [pdf, other]

SieveNet: Selecting Point-Based Features for Mesh Networks

Authors: Shengchao Yuan, Yishun Dou, Rui Shi, Bingbing Ni, Zhong Zheng

Abstract: Meshes are widely used in 3D computer vision and graphics, but their irregular topology poses challenges in applying them to existing neural network architectures. Recent advances in mesh neural networks turn to remeshing and push the boundary of pioneer methods that solely take the raw meshes as input. Although the remeshing offers a regular topology that significantly facilitates the design of mesh network architectures, features extracted from such remeshed proxies may struggle to retain the underlying geometry faithfully, limiting the subsequent neural network's capacity. To address this issue, we propose SieveNet, a novel paradigm that takes into account both the regular topology and the exact geometry. Specifically, this method utilizes structured mesh topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh. Furthermore, our method eliminates the need for hand-crafted feature engineering and can leverage off-the-shelf network architectures such as the vision transformer. Comprehensive experimental results on classification and segmentation tasks well demonstrate the effectiveness and superiority of our method.

Submitted 23 August, 2023; originally announced August 2023.

Comments: The project homepage is https://sievenet.github.io/

arXiv:2308.12515 [pdf, other]

Expanding Targets in Virtual Reality Environments: A Fitts' Law Study

Authors: Rongkai Shi, Yushi Wei, Yue Li, Lingyun Yu, Hai-Ning Liang

Abstract: Target pointing selection is a fundamental task. According to Fitts' law, users need more time to select targets with smaller sizes. Expanding the target to a larger size is a practical approach that can facilitate pointing selection. It has been well-examined and -deployed in 2D user interfaces. However, limited research has investigated target expansion methods using an immersive virtual reality (VR) head-mounted display (HMD). In this work, we aimed to fill this gap by conducting a user study using ISO 9241-411 multi-directional pointing task to examine the effect of target expansion on target selection performance in VR HMD. Based on our results, we found that compared to not expanding the target, expanding the target width by 1.5 and 2.5 times during the movement can significantly reduce the selection time. We hope that the design and results derived from the study can help frame future work.

Submitted 23 August, 2023; originally announced August 2023.

Comments: 4 pages, 3 figures

arXiv:2308.02827 [pdf, other]

SwinGar: Spectrum-Inspired Neural Dynamic Deformation for Free-Swinging Garments

Authors: Tianxing Li, Rui Shi, Qing Zhu, Takashi Kanai

Abstract: Our work presents a novel spectrum-inspired learning-based approach for generating clothing deformations with dynamic effects and personalized details. Existing methods in the field of clothing animation are limited to either static behavior or specific network models for individual garments, which hinders their applicability in real-world scenarios where diverse animated garments are required. Our proposed method overcomes these limitations by providing a unified framework that predicts dynamic behavior for different garments with arbitrary topology and looseness, resulting in versatile and realistic deformations. First, we observe that the problem of bias towards low frequency always hampers supervised learning and leads to overly smooth deformations. To address this issue, we introduce a frequency-control strategy from a spectral perspective that enhances the generation of high-frequency details of the deformation. In addition, to make the network highly generalizable and able to learn various clothing deformations effectively, we propose a spectral descriptor to achieve a generalized description of the global shape information. Building on the above strategies, we develop a dynamic clothing deformation estimator that integrates frequency-controllable attention mechanisms with long short-term memory. The estimator takes as input expressive features from garments and human bodies, allowing it to automatically output continuous deformations for diverse clothing types, independent of mesh topology or vertex count. Finally, we present a neural collision handling method to further enhance the realism of garments. Our experimental results demonstrate the effectiveness of our approach on a variety of free-swinging garments and its superiority over state-of-the-art methods.

Submitted 5 August, 2023; originally announced August 2023.

arXiv:2307.16186 [pdf, other]

doi 10.3233/FAIA230609

ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning

Authors: Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Jie Luo, Wenjun Wu

Abstract: Multi-agent reinforcement learning (MARL) has achieved promising results in recent years. However, most existing reinforcement learning methods require a large amount of data for model training. In addition, data-efficient reinforcement learning requires the construction of strong inductive biases, which are ignored in the current MARL approaches. Inspired by the symmetry phenomenon in multi-agent systems, this paper proposes a framework for exploiting prior knowledge by integrating data augmentation and a well-designed consistency loss into the existing MARL methods. In addition, the proposed framework is model-agnostic and can be applied to most of the current MARL algorithms. Experimental tests on multiple challenging tasks demonstrate the effectiveness of the proposed framework. Moreover, the proposed framework is applied to a physical multi-robot testbed to show its superiority.

Submitted 9 August, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

Comments: Accepted by ECAI 2023

arXiv:2307.02666 [pdf, other]

Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models

Authors: Huwan Peng, Scott Davidson, Richard Shi, Shuaiwen Leon Song, Michael Taylor

Abstract: Large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini have demonstrated unprecedented capabilities of autoregressive AI models across multiple tasks triggering disruptive technology innovations around the world. However, as models continue to grow the cost to serve these models also continues to grow threatening the democratization of LLMs. To address this issue, we propose Chiplet Cloud, a chiplet-based ASIC LLM-supercomputer architecture whose goal is to optimize the total cost of ownership (TCO) per generated token. This architecture is a highly parameterizable ASIC and server-level architecture leveraging thousands of replicated accelerator modules collaborating to scale-up the performance of LLMs at cloud-scale. To determine specific parameterizations of the Chiplet Cloud architecture, we implemented a two-phase hardware-software co-design methodology that can search the massive design space and fine tune the architecture across a collection of LLMs based on an accurate inference simulation. A common bottleneck for LLMs is the memory access performance therefore we introduce CC-MEM, a scalable on-chip memory system for Chiplet Cloud architectures. Using the CC-MEM, Chiplet Clouds can be built using only SRAMs for design points where the power and performance of memory access is critical. The CC-MEM also includes a compression decoder module to add support for sparse models without impacting the compute units using a Store-as-Compressed, Load-as-Dense mechanism. We evaluate Chiplet Cloud architectures across eight popular LLMs. Using fine tuned Chiplet Cloud servers we are able to achieve $97\times$ and $18\times$ improvement in TCO/Token over rented GPU and TPU clouds, or a $8.3\times$ and $3.7\times$ improvement over fabricated GPU and TPU clouds respectively. Chiplet Cloud can also support $1.7\times$ larger models with a sparsity of 60\%.

Submitted 20 May, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.11182 [pdf, other]

Co-design Hardware and Algorithm for Vector Search

Authors: Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso

Abstract: Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce \textit{FANNS}, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, \textit{FANNS} automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. \textit{FANNS} attains up to 23.0$\times$ and 37.2$\times$ speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5$\times$ and 7.6$\times$ speedup in median and 95\textsuperscript{th} percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of \textit{FANNS} lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.

Submitted 6 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 11 pages

arXiv:2305.10764 [pdf, other]

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

Authors: Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su

Abstract: We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. We adopt the commonly used multi-modal contrastive learning framework for representation alignment, but with a specific focus on scaling up 3D representations to enable open-world 3D shape understanding. To achieve this, we scale up training data by ensembling multiple 3D datasets and propose several strategies to automatically filter and enrich noisy text descriptions. We also explore and compare strategies for scaling 3D backbone networks and introduce a novel hard negative mining module for more efficient training. We evaluate OpenShape on zero-shot 3D classification benchmarks and demonstrate its superior capabilities for open-world recognition. Specifically, OpenShape achieves a zero-shot accuracy of 46.8% on the 1,156-category Objaverse-LVIS benchmark, compared to less than 10% for existing methods. OpenShape also achieves an accuracy of 85.3% on ModelNet40, outperforming previous zero-shot baseline methods by 20% and performing on par with some fully-supervised methods. Furthermore, we show that our learned embeddings encode a wide range of visual and semantic concepts (e.g., subcategories, color, shape, style) and facilitate fine-grained text-3D and image-3D interactions. Due to their alignment with CLIP embeddings, our learned shape representations can also be integrated with off-the-shelf CLIP-based models for various applications, such as point cloud captioning and point cloud-conditioned image generation.

Submitted 16 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Project Website: https://colin97.github.io/OpenShape/

arXiv:2305.01503 [pdf, other]

NewsPanda: Media Monitoring for Timely Conservation Action

Authors: Sedrick Scott Keh, Zheyuan Ryan Shi, David J. Patterson, Nirmal Bhagabati, Karun Dewan, Areendran Gopala, Pablo Izquierdo, Debojyoti Mallick, Ambika Sharma, Pooja Shrestha, Fei Fang

Abstract: Non-governmental organizations for environmental conservation have a significant interest in monitoring conservation-related media and getting timely updates about infrastructure construction projects as they may cause massive impact to key conservation areas. Such monitoring, however, is difficult and time-consuming. We introduce NewsPanda, a toolkit which automatically detects and analyzes online articles related to environmental conservation and infrastructure construction. We fine-tune a BERT-based model using active learning methods and noise correction algorithms to identify articles that are relevant to conservation and infrastructure construction. For the identified articles, we perform further analysis, extracting keywords and finding potentially related sources. NewsPanda has been successfully deployed by the World Wide Fund for Nature teams in the UK, India, and Nepal since February 2022. It currently monitors over 80,000 websites and 1,074 conservation sites across India and Nepal, saving more than 30 hours of human efforts weekly. We have now scaled it up to cover 60,000 conservation sites globally.

Submitted 30 April, 2023; originally announced May 2023.

Comments: Accepted to IAAI-23: 35th Annual Conference on Innovative Applications of Artificial Intelligence. Winner of IAAI Deployed Application Award. Code at https://github.com/NewsPanda-WWF-CMU/weekly-pipeline

arXiv:2303.02063 [pdf, other]

doi 10.3390/a16060305

Physics-Informed Deep Learning For Traffic State Estimation: A Survey and the Outlook

Authors: Xuan Di, Rongye Shi, Zhaobin Mo, Yongjie Fu

Abstract: For its robust predictive power (compared to pure physics-based models) and sample-efficient training (compared to pure deep learning models), physics-informed deep learning (PIDL), a paradigm hybridizing physics-based models and deep neural networks (DNN), has been booming in science and engineering fields. One key challenge of applying PIDL to various domains and problems lies in the design of a computational graph that integrates physics and DNNs. In other words, how physics are encoded into DNNs and how the physics and data components are represented. In this paper, we provide a variety of architecture designs of PIDL computational graphs and how these structures are customized to traffic state estimation (TSE), a central problem in transportation engineering. When observation data, problem type, and goal vary, we demonstrate potential architectures of PIDL computational graphs and compare these variants using the same real-world dataset.

Submitted 1 July, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2302.10798 [pdf, other]

Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training

Authors: Xiaoying Zhi, Varun Babbar, Pheobe Sun, Fran Silavong, Ruibo Shi, Sean Moran

Abstract: The subject of green AI has been gaining attention within the deep learning community given the recent trend of ever larger and more complex neural network models. Existing solutions for reducing the computational load of training at inference time usually involve pruning the network parameters. Pruning schemes often create extra overhead either by iterative training and fine-tuning for static pruning or repeated computation of a dynamic pruning graph. We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks. Our proposed pruning scheme is green-oriented, as it only requires a one-off training to discover the optimal static sub-networks by dynamic pruning methods. The pruning scheme consists of a binary gating module and a novel loss function to uncover sub-networks with user-defined sparsity. Our method enables pruning and training simultaneously, which saves energy in both the training and inference phases and avoids extra computational overhead from gating modules at inference time. Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy. Compared to other related pruning methods, our method demonstrates a lower drop in accuracy for equivalent reductions in computational cost.

Submitted 4 November, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

arXiv:2212.14189 [pdf, other]

High Resolution Modeling and Analysis of Cryptocurrency Mining's Impact on Power Grids: Carbon Footprint, Reliability, and Electricity Price

Authors: Ali Menati, Xiangtian Zheng, Kiyeob Lee, Ranyu Shi, Pengwei Du, Chanan Singh, Le Xie

Abstract: Blockchain technologies are considered one of the most disruptive innovations of the last decade, enabling secure decentralized trust-building. However, in recent years, with the rapid increase in the energy consumption of blockchain-based computations for cryptocurrency mining, there have been growing concerns about their sustainable operation in electric grids. This paper investigates the tri-factor impact of such large loads on carbon footprint, grid reliability, and electricity market price in the Texas grid. We release open-source high-resolution data to enable high-resolution modeling of influencing factors such as location and flexibility. We reveal that the per-megawatt-hour carbon footprint of cryptocurrency mining loads across locations can vary by as much as 50% of the crude system average estimate. We show that the flexibility of mining loads can significantly mitigate power shortages and market disruptions that can result from the deployment of mining loads. These findings suggest policymakers to facilitate the participation of large mining facilities in wholesale markets and require them to provide mandatory demand response.

Submitted 14 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: This paper has been accepted for publication in the journal of "Advances in Applied Energy"

arXiv:2212.08568 [pdf, other]

Biomedical image analysis competitions: The state of current participation practice

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, et al. (331 additional authors not shown)

Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2209.02145 [pdf, other]

Rare but Severe Neural Machine Translation Errors Induced by Minimal Deletion: An Empirical Study on Chinese and English

Authors: Ruikang Shi, Alvin Grissom II, Duc Minh Trinh

Abstract: We examine the inducement of rare but severe errors in English-Chinese and Chinese-English in-domain neural machine translation by minimal deletion of the source text with character-based models. By deleting a single character, we can induce severe translation errors. We categorize these errors and compare the results of deleting single characters and single words. We also examine the effect of training data size on the number and types of pathological cases induced by these minimal perturbations, finding significant variation. We find that deleting a word hurts overall translation score more than deleting a character, but certain errors are more likely to occur when deleting characters, with language direction also influencing the effect.

Submitted 16 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: COLING 2022 Camera Ready

Journal ref: 2022.coling-1.459

arXiv:2208.09706 [pdf, other]

Dual Space Coupling Model Guided Overlap-Free Scatterplot

Authors: Zeyu Li, Ruizhi Shi, Yan Liu, Shizhuo Long, Ziheng Guo, Shichao Jia, Jiawan Zhang

Abstract: The overdraw problem of scatterplots seriously interferes with the visual tasks. Existing methods, such as data sampling, node dispersion, subspace mapping, and visual abstraction, cannot guarantee the correspondence and consistency between the data points that reflect the intrinsic original data distribution and the corresponding visual units that reveal the presented data distribution, thus failing to obtain an overlap-free scatterplot with unbiased and lossless data distribution. A dual space coupling model is proposed in this paper to represent the complex bilateral relationship between data space and visual space theoretically and analytically. Under the guidance of the model, an overlap-free scatterplot method is developed through integration of the following: a geometry-based data transformation algorithm, namely DistributionTranscriptor; an efficient spatial mutual exclusion guided view transformation algorithm, namely PolarPacking; an overlap-free oriented visual encoding configuration model and a radius adjustment tool, namely $f_{r_{draw}}$. Our method can ensure complete and accurate information transfer between the two spaces, maintaining consistency between the newly created scatterplot and the original data distribution on global and local features. Quantitative evaluation proves our remarkable progress on computational efficiency compared with the state-of-the-art methods. Three applications involving pattern enhancement, interaction improvement, and overdraw mitigation of trajectory visualization demonstrate the broad prospects of our method.

Submitted 20 August, 2022; originally announced August 2022.

arXiv:2208.07124 [pdf, other]

ECI: a Customizable Cache Coherency Stack for Hybrid FPGA-CPU Architectures

Authors: Abishek Ramdas, Michael Giardino, Runbin Shi, Adam Turowski, David Cock, Gustavo Alonso, Timothy Roscoe

Abstract: Unlike other accelerators, FPGAs are capable of supporting cache coherency, thereby turning them into a more powerful architectural option than just a peripheral accelerator. However, most existing deployments of FPGAs are either non-cache coherent or support only an asymmetric design where cache coherency is controlled from the CPU. Taking advantage of a recently released two socket CPU-FPGA architecture, in this paper we describe ECI, a flexible implementation of cache coherency on the FPGA capable of supporting both symmetric and asymmetric protocols. ECI is open and customizable, giving applications the opportunity to fully interact with the cache coherency protocol, thereby opening up many interesting system design and research opportunities not available in existing designs. Through extensive microbenchmarks we show that ECI exhibits highly competitive performance and discuss in detail one use-case illustrating the benefits of having an open cache coherency stack on the FPGA.

Submitted 15 August, 2022; originally announced August 2022.

arXiv:2206.15328 [pdf, other]

Neural Annotation Refinement: Development of a New 3D Dataset for Adrenal Gland Analysis

Authors: Jiancheng Yang, Rui Shi, Udaranga Wickramasinghe, Qikui Zhu, Bingbing Ni, Pascal Fua

Abstract: The human annotations are imperfect, especially when produced by junior practitioners. Multi-expert consensus is usually regarded as golden standard, while this annotation protocol is too expensive to implement in many real-world projects. In this study, we propose a method to refine human annotation, named Neural Annotation Refinement (NeAR). It is based on a learnable implicit function, which decodes a latent vector into represented shape. By integrating the appearance as an input of implicit functions, the appearance-aware NeAR fixes the annotation artefacts. Our method is demonstrated on the application of adrenal gland analysis. We first show that the NeAR can repair distorted golden standards on a public adrenal gland segmentation dataset. Besides, we develop a new Adrenal gLand ANalysis (ALAN) dataset with the proposed NeAR, where each case consists of a 3D shape of adrenal gland and its diagnosis label (normal vs. abnormal) assigned by experts. We show that models trained on the shapes repaired by the NeAR can diagnose adrenal glands better than the original ones. The ALAN dataset will be open-source, with 1,584 shapes for adrenal gland diagnosis, which serves as a new benchmark for medical shape analysis. Code and dataset are available at https://github.com/M3DV/NeAR.

Submitted 7 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: MICCAI 2022

arXiv:2206.10066 [pdf, other]

RendNet: Unified 2D/3D Recognizer With Latent Space Rendering

Authors: Ruoxi Shi, Xinyang Jiang, Caihua Shan, Yansen Wang, Dongsheng Li

Abstract: Vector graphics (VG) have been ubiquitous in our daily life with vast applications in engineering, architecture, designs, etc. The VG recognition process of most existing methods is to first render the VG into raster graphics (RG) and then conduct recognition based on RG formats. However, this procedure discards the structure of geometries and loses the high resolution of VG. Recently, another category of algorithms is proposed to recognize directly from the original VG format. But it is affected by the topological errors that can be filtered out by RG rendering. Instead of looking at one format, it is a good solution to utilize the formats of VG and RG together to avoid these shortcomings. Besides, we argue that the VG-to-RG rendering process is essential to effectively combine VG and RG information. By specifying the rules on how to transfer VG primitives to RG pixels, the rendering process depicts the interaction and correlation between VG and RG. As a result, we propose RendNet, a unified architecture for recognition on both 2D and 3D scenarios, which considers both VG/RG representations and exploits their interaction by incorporating the VG-to-RG rasterization process. Experiments show that RendNet can achieve state-of-the-art performance on 2D and 3D object recognition tasks on various VG datasets.

Submitted 20 June, 2022; originally announced June 2022.

Comments: CVPR 2022 Oral

arXiv:2206.01910 [pdf, other]

The Spike Gating Flow: A Hierarchical Structure Based Spiking Neural Network for Online Gesture Recognition

Authors: Zihao Zhao, Yanhong Wang, Qiaosha Zou, Tie Xu, Fangbo Tao, Jiansong Zhang, Xiaoan Wang, C. -J. Richard Shi, Junwen Luo, Yuan Xie

Abstract: Action recognition is an exciting research avenue for artificial intelligence since it may be a game changer in the emerging industrial fields such as robotic visions and automobiles. However, current deep learning faces major challenges for such applications because of the huge computational cost and the inefficient learning. Hence, we develop a novel brain-inspired Spiking Neural Network (SNN) based system titled Spiking Gating Flow (SGF) for online action learning. The developed system consists of multiple SGF units which are assembled in a hierarchical manner. A single SGF unit involves three layers: a feature extraction layer, an event-driven layer and a histogram-based training layer. To demonstrate the developed system capabilities, we employ a standard Dynamic Vision Sensor (DVS) gesture classification as a benchmark. The results indicate that we can achieve 87.5% accuracy which is comparable with Deep Learning (DL), but at smaller training/inference data number ratio 1.5:1. And only a single training epoch is required during the learning process. Meanwhile, to the best of our knowledge, this is the highest accuracy among the non-backpropagation algorithm based SNNs. At last, we conclude the few-shot learning paradigm of the developed network: 1) a hierarchical structure-based network design involves human prior knowledge; 2) SNNs for content based global dynamic feature detection.

Submitted 7 June, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

arXiv:2205.12449 [pdf, other]

MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning

Authors: Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang

Abstract: Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.

Submitted 11 July, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: ECML camera-ready version. 23 pages

arXiv:2205.08585 [pdf, other]

CV4Code: Sourcecode Understanding via Visual Code Representations

Authors: Ruibo Shi, Lili Tao, Rohan Saphal, Fran Silavong, Sean J. Moran

Abstract: We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an explicit spatial representation. To codify snippets as images, we propose an ASCII codepoint-based image representation that facilitates fast generation of sourcecode images and eliminates redundancy in the encoding that would arise from an RGB pixel representation. Furthermore, as sourcecode is treated as images, neither lexical analysis (tokenisation) nor syntax tree parsing is required, which makes the proposed method agnostic to any particular programming language and lightweight from the application pipeline point of view. CV4Code can even featurise syntactically incorrect code which is not possible from methods that depend on the Abstract Syntax Tree (AST). We demonstrate the effectiveness of CV4Code by learning Convolutional and Transformer networks to predict the functional task, i.e. the problem it solves, of the source code directly from its two-dimensional representation, and using an embedding from its latent space to derive a similarity score of two code snippets in a retrieval setup. Experimental results show that our approach achieves state-of-the-art performance in comparison to other methods with the same task and data configurations. For the first time we show the benefits of treating sourcecode understanding as a form of image processing task.
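
The ASCII codepoint-based image idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `snippet_to_codepoint_image`, the fixed height and width, the padding value 0, and the handling of non-ASCII characters are all assumptions made for illustration only.

```python
def snippet_to_codepoint_image(snippet, height=4, width=16, pad=0):
    """Map a code snippet to a height x width grid of ASCII codepoints.

    Hypothetical helper: grid size, padding value, and the choice to map
    non-ASCII characters to the padding value are illustrative assumptions.
    """
    rows = []
    # Keep at most `height` lines and at most `width` characters per line.
    for line in snippet.splitlines()[:height]:
        codes = [ord(ch) if ord(ch) < 128 else pad for ch in line[:width]]
        # Right-pad short lines so every row has the same width.
        codes += [pad] * (width - len(codes))
        rows.append(codes)
    # Bottom-pad short snippets so every image has the same height.
    rows += [[pad] * width for _ in range(height - len(rows))]
    return rows


image = snippet_to_codepoint_image("a=1\n  b", height=3, width=4)
```

Because each cell holds a single codepoint and no tokeniser or parser is involved, indentation and line layout survive in the grid, and syntactically incorrect snippets can still be encoded.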

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2205.07041 [pdf, other]

doi 10.1109/CoG51982.2022.9893678

VRCockpit: Mitigating Simulator Sickness in VR Games Using Multiple Egocentric 2D View Frames

Authors: Hao Chen, Rongkai Shi, Diego Monteiro, Nilufar Baghaei, Hai-Ning Liang

Abstract: Virtual reality head-mounted displays (VR HMDs) have become a popular platform for gaming. However, simulator sickness (SS) is still an impediment to VR's wider adoption, particularly in gaming. It can induce strong discomfort and impair players' immersion, performance, and enjoyment. Researchers have explored techniques to mitigate SS. While these techniques have been shown to help lessen SS, they may not be applicable to games because they cannot be easily integrated into various types of games without impacting gameplay, immersion, and performance. In this research, we introduce a new SS mitigation technique, VRCockpit. VRCockpit is a visual technique that surrounds the player with four 2D views, one for each cardinal direction, that show 2D copies of the areas of the 3D environment around the player. To study its effectiveness, we conducted two different experiments, one with a car racing game, followed by a first-person shooter game. Our results show that VRCockpit has the potential to mitigate SS and still allows players to have the same level of immersion and gameplay performance.

Submitted 23 August, 2022; v1 submitted 14 May, 2022; originally announced May 2022.

Comments: 8 pages, 4 figures, 2 tables

Showing 1–50 of 100 results for author: Shi, R