Search | arXiv e-print repository

A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick

Authors: Nishant Balepur, Matthew Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber

Abstract: Keyword mnemonics are memorable explanations that link new terms to simpler keywords. Prior works generate mnemonics for students, but they do not guide models toward mnemonics students prefer and aid learning. We build SMART, a mnemonic generator trained on feedback from real students learning new terms. To train SMART, we first fine-tune LLaMA-2 on a curated set of user-written mnemonics. We then use LLM alignment to enhance SMART: we deploy mnemonics generated by SMART in a flashcard app to find preferences on mnemonics students favor. We gather 2684 preferences from 45 students across two types: expressed (inferred from ratings) and observed (inferred from student learning), yielding three key findings. First, expressed and observed preferences disagree; what students think is helpful does not fully capture what is truly helpful. Second, Bayesian models can synthesize complementary data from multiple preference types into a single effectiveness signal. SMART is tuned via Direct Preference Optimization on this signal, which we show resolves ties and missing labels in the typical method of pairwise comparisons, augmenting data for LLM output quality gains. Third, mnemonic experts assess SMART as matching GPT-4, at much lower deployment costs, showing the utility of capturing diverse student feedback to align LLMs in education.

Submitted 21 June, 2024; originally announced June 2024.

Comments: In-Progress Preprint

arXiv:2406.11271 [pdf, other]

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Authors: Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt

Abstract: Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. MINT-1T comprises one trillion text tokens and three billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. As scaling multimodal interleaved datasets requires substantial engineering effort, sharing the data curation process and releasing the dataset greatly benefits the community. Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS. Our data and code will be released at https://github.com/mlfoundations/MINT-1T.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.13628 [pdf]

Spinons in a new Shastry-Sutherland lattice magnet Pr$_2$Ga$_2$BeO$_7$

Authors: N. Li, A. Brassington, M. F. Shu, Y. Y. Wang, H. Liang, Q. J. Li, X. Zhao, P. J. Baker, H. Kikuchi, T. Masuda, G. Duan, C. Liu, H. Wang, W. Xie, R. Zhong, J. Ma, R. Yu, H. D. Zhou, X. F. Sun

Abstract: Identifying the elusive spinon excitations in quantum spin liquid (QSL) materials is what scientists have long sought. Recently, thermal conductivity ($κ$) has emerged as a decisive probe because the fermionic nature of spinons leads to a characteristic nonzero linear $κ_0/T$ term while approaching zero Kelvin. So far, only a few systems have been reported to exhibit such a term. Here, we report a $κ_0/T \approx$ 0.01 WK$^{-2}$m$^{-1}$, the largest $κ_0/T$ value ever observed in magnetic oxide QSL candidates, in a new quantum magnet Pr$_2$Ga$_2$BeO$_7$ with a Shastry-Sutherland lattice (SSL). Its QSL nature is further supported by the power-law temperature dependence of the specific heat, a plateau of muon spin relaxation rate, and gapless inelastic neutron spectra. Our theoretical analysis reveals that the introduction of XY spin anisotropy is the key for Pr$_2$Ga$_2$BeO$_7$ to be the first QSL realized on the SSL, after more than four decades of extensive studies on this celebrated magnetically frustrated lattice.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 20 pages, 6 figures, with Supplementary Information

arXiv:2404.03145 [pdf, other]

DreamWalk: Style Space Exploration using Diffusion Guidance

Authors: Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, Ramin Zabih

Abstract: Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control. Unlike direct-editing tools like Photoshop, text-conditioned models require the artist to perform "prompt engineering," constructing special text sentences to control the style or amount of a particular subject present in the output image. Our goal is to provide fine-grained control over the style and substance specified by the prompt, for example to adjust the intensity of styles in different regions of the image (Figure 1). Our approach is to decompose the text prompt into conceptual elements, and apply a separate guidance term for each element in a single diffusion process. We introduce guidance scale functions to control when in the diffusion process and where in the image to intervene. Since the method is based solely on adjusting diffusion guidance, it does not require fine-tuning or manipulating the internal layers of the diffusion model's neural network, and can be used in conjunction with LoRA- or DreamBooth-trained models (Figure 2). Project page: https://mshu1.github.io/dreamwalk.github.io/

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2402.14020 [pdf, other]

Coercing LLMs to do and reveal (almost) anything

Authors: Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein

Abstract: It has recently been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements. In this work, we argue that the spectrum of adversarial attacks on LLMs is much larger than merely jailbreaking. We provide a broad overview of possible attack surfaces and attack goals. Based on a series of concrete examples, we discuss, categorize and systematize attacks that coerce varied unintended behaviors, such as misdirection, model control, denial-of-service, or data extraction. We analyze these attacks in controlled experiments, and find that many of them stem from the practice of pre-training LLMs with coding capabilities, as well as the continued existence of strange "glitch" tokens in common LLM vocabularies that should be removed for security reasons.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 32 pages. Implementation available at https://github.com/JonasGeiping/carving

arXiv:2402.12291 [pdf, other]

KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students

Authors: Matthew Shu, Nishant Balepur, Shi Feng, Jordan Boyd-Graber

Abstract: Flashcard schedulers are tools that rely on 1) student models to predict the flashcards a student knows; and 2) teaching policies to schedule cards based on these predictions. Existing student models, however, only use flashcard-level features, like the student's past responses, ignoring the semantic ties of flashcards. Deep Knowledge Tracing (DKT) models can capture semantic relations with language models, but are inefficient, lack content-rich datasets for evaluation, and require robust teaching policies. To address these issues, we design KARL, a DKT-inspired student model that uses retrieval and BERT embeddings for efficient and accurate student recall predictions. To test KARL, we collect a new dataset of diverse study history on trivia questions. KARL bests existing student models in AUC and calibration error. Finally, we propose a novel teaching policy that exploits the predictive power of DKT models to deploy KARL online. Based on 27 learners and 32 6-day study trajectories, KARL shows the ability to enhance medium-term educational learning, proving its efficacy for scheduling.

Submitted 19 February, 2024; originally announced February 2024.

Comments: In-progress preprint

arXiv:2402.06659 [pdf, other]

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

Authors: Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang

Abstract: Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, yet their versatility raises significant security concerns. This study takes the first step in exposing VLMs' susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack method where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is Persuasion Attack, which leverages VLMs' text generation capabilities to craft narratives, such as portraying junk food as health food, through persuasive and seemingly rational descriptions. We show that Shadowcast is highly effective in achieving the attacker's intentions using as few as 50 poison samples. Moreover, these poison samples remain effective across various prompts and are transferable across different VLM architectures in the black-box setting. This work reveals how poisoned VLMs can generate convincing yet deceptive misinformation and underscores the importance of data quality for responsible deployments of VLMs. Our code is available at: https://github.com/umd-huang-lab/VLM-Poisoning.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02775 [pdf]

Instant square lattice structured illumination microscopy: an optimal strategy towards photon-saving and real-time super-resolution observation

Authors: Tianyu Zhao, Zhaojun Wang, Manming Shu, Jingxiang Zhang, Yansheng Liang, Shaowei Wang, Ming Lei

Abstract: Over the past decade, structured illumination microscopy (SIM) has found its niche in super-resolution (SR) microscopy due to its fast imaging speed and low excitation intensity. However, due to the significantly higher light dose compared to wide-field microscopy and the time-consuming post-processing procedures, long-term, real-time, super-resolution observation of living cells is still out of reach for most SIM setups, which inevitably limits its routine use by cell biologists. Here, we describe square lattice SIM (SL-SIM) for long-duration live cell imaging by using the square lattice optical field as illumination, which allows continuous super-resolved observation over long periods of time. In addition, by extending the previous joint spatial-frequency reconstruction concept to SL-SIM, a high-speed reconstruction strategy is validated in the GPU environment, whose reconstruction time is even shorter than image acquisition time, thus enabling real-time observation. We have demonstrated the potential of SL-SIM on various biological applications, ranging from microtubule cytoskeleton dynamics to the interactions of mitochondrial cristae and DNAs in COS7 cells. The inherent lower light dose and user-friendly workflow of the SL-SIM could help make long-duration, real-time and super-resolved observations accessible to biological laboratories.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.16545 [pdf]

Leveraging Public Cloud Infrastructure for Real-time Connected Vehicle Speed Advisory at a Signalized Corridor

Authors: Hsien-Wen Deng, M Sabbir Salek, Mizanur Rahman, Mashrur Chowdhury, Mitch Shue, Amy W. Apon

Abstract: In this study, we developed a real-time connected vehicle (CV) speed advisory application that uses public cloud services and tested it on a simulated signalized corridor for different roadway traffic conditions. First, we developed a scalable serverless cloud computing architecture leveraging public cloud services offered by Amazon Web Services (AWS) to support the requirements of a real-time CV application. Second, we developed an optimization-based real-time CV speed advisory algorithm by taking a modular design approach, which makes the application automatically scalable and deployable in the cloud using the serverless architecture. Third, we developed a cloud-in-the-loop simulation testbed using AWS and an open-source microscopic roadway traffic simulator called Simulation of Urban Mobility (SUMO). Our analyses based on different roadway traffic conditions showed that the serverless CV speed advisory application meets the latency requirement of real-time CV mobility applications. Besides, our serverless CV speed advisory application reduced the average stopped delay (by 77%) and the aggregated risk of collision (by 21%) at the signalized intersections of a corridor. These results prove the feasibility as well as the efficacy of utilizing public cloud infrastructure to implement real-time roadway traffic management applications in a CV environment.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.06284 [pdf]

Static magnetic order with strong quantum fluctuations in spin-1/2 honeycomb magnet Na2Co2TeO6

Authors: Gaoting Lin, Jinlong Jiao, Xiyang Li, Mingfang Shu, Oksana Zaharko, Toni Shiroka, Tao Hong, Alexander I. Kolesnikov, Guochu Deng, Sarah Dunsiger, Haidong Zhou, Tian Shang, Jie Ma

Abstract: Kitaev interactions, arising from the interplay of frustration and bond anisotropy, can lead to strong quantum fluctuations and, in an ideal case, to a quantum-spin-liquid state. However, in many nonideal materials, spurious non-Kitaev interactions typically promote a zigzag antiferromagnetic order in the d-orbital transition metal compounds. By combining neutron scattering with muon-spin rotation and relaxation techniques, we provide new insights into the exotic properties of Na2Co2TeO6, a candidate Kitaev material. Below $T_N$, the zero-field muon-spin relaxation rate becomes almost constant (at 0.45 μs$^{-1}$). We attribute this temperature-independent muon-spin relaxation rate to the strong quantum fluctuations, as well as to the frustrated Kitaev interactions. As the magnetic field increases, neutron scattering data indicate a much broader spin-wave-excitation gap at the K-point. Therefore, quantum fluctuations seem not only robust, but are even enhanced by the applied magnetic field. Our findings provide valuable hints for understanding the onset of the quantum-spin-liquid state in Kitaev materials.

Submitted 20 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 28 pages, 11 figures, and 1 table

arXiv:2312.01073 [pdf]

High-speed image reconstruction for nonlinear structured illumination microscopy

Authors: Jingxiang Zhang, Tianyu Zhao, Xiangda Fu, Manming Shu, Jiajing Yan, **xiao Chen, Yansheng Liang, Shaowei Wang, Ming Lei

Abstract: By exploiting the nonlinear responses of the fluorescent probes, the spatial resolution of structured illumination microscopy (SIM) can be further increased. However, due to the complex reconstruction process, the traditional reconstruction method of nonlinear structured illumination microscopy (NL-SIM) is relatively slow, which brings a great challenge to realizing real-time display of super-resolution results. To address these issues, an accelerated NL-SIM reconstruction algorithm was developed by extending a high-speed reconstruction framework, Joint Space and Frequency Reconstruction (JSFR), to NL-SIM. We anticipate that this algorithm will facilitate NL-SIM becoming a routine tool in biomedical laboratories.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2310.19909 [pdf, other]

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Authors: Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

Abstract: Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weaknesses of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider. Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets. We release the raw results of our experiments along with code that allows researchers to put their own backbones through the gauntlet here: https://github.com/hsouri/Battle-of-the-Backbones

Submitted 19 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2309.04169 [pdf, other]

Grouping Boundary Proposals for Fast Interactive Image Segmentation

Authors: Li Liu, Da Chen, Minglei Shu, Laurent D. Cohen

Abstract: Geodesic models are known as an efficient tool for solving various image segmentation problems. Most existing approaches only exploit local pointwise image features to track geodesic paths for delineating the objective boundaries. However, such a segmentation strategy cannot take into account the connectivity of the image edge features, increasing the risk of the shortcut problem, especially in complicated scenarios. In this work, we introduce a new image segmentation model based on the minimal geodesic framework in conjunction with an adaptive cut-based circular optimal path computation scheme and a graph-based boundary proposals grouping scheme. Specifically, the adaptive cut can disconnect the image domain such that the target contours are constrained to pass through this cut only once. The boundary proposals are comprised of precomputed image edge segments, providing the connectivity information for our segmentation model. These boundary proposals are then incorporated into the proposed image segmentation model, such that the target segmentation contours are made up of a set of selected boundary proposals and the corresponding geodesic paths linking them. Experimental results show that the proposed model indeed outperforms state-of-the-art minimal paths-based image segmentation approaches.

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.01579 [pdf, other]

Direct observation of topological surface states in the layered kagome lattice with broken time-reversal symmetry

Authors: Zhicheng Jiang, Tongrui Li, Jian Yuan, Zhengtai Liu, Zhipeng Cao, Soohyun Cho, Mingfang Shu, Yichen Yang, Jianyang Ding, Zhikai Li, Jiayu Liu, Zhonghao Liu, Jishan Liu, Jie Ma, Zhe Sun, Yanfeng Guo, Dawei Shen

Abstract: Magnetic topological quantum materials display a diverse range of fascinating physical properties which arise from their intrinsic magnetism and the breaking of time-reversal symmetry. However, so far, few examples of intrinsic magnetic topological materials have been confirmed experimentally, which significantly hinders our comprehensive understanding of the abundant physical properties in this system. The kagome lattices, which host a diversity of electronic structure signatures such as Dirac nodes, flat bands, and saddle points, provide an alternative and promising platform for in-depth investigations into correlations and band topology. In this article, drawing inspiration from the stacking configuration of MnBi$_2$Te$_4$, we conceive and then synthesize a high-quality single crystal EuTi$_3$Bi$_4$, which is a unique natural heterostructure consisting of both topological kagome layers and magnetic interlayers. We investigate the electronic structure of EuTi$_3$Bi$_4$ and uncover distinct features of anisotropic multiple Van Hove singularities (VHSs) that might prevent Fermi surface nesting, leading to the absence of a charge density wave (CDW). In addition, we identify the topologically nontrivial surface states that serve as connections between different saddle bands in the vicinity of the Fermi level. Combined with calculations, we establish that the effective time-reversal symmetry $S=θτ_{1/2}$ plays a crucial role in the antiferromagnetic ground state of EuTi$_3$Bi$_4$, which ensures the stability of the topological surface states and gives rise to their intriguing topological nature. Therefore, EuTi$_3$Bi$_4$ offers the rare opportunity to investigate correlated topological states in magnetic kagome materials.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 9 pages, 4 figures

arXiv:2308.15729 [pdf, other]

Computing Geodesic Paths Encoding a Curvature Prior

Authors: Da Chen, Jean-Marie Mirebeau, Minglei Shu, Laurent D. Cohen

Abstract: In this paper, we introduce an efficient method for computing curves minimizing a variant of the Euler-Mumford elastica energy, with fixed endpoints and tangents at these endpoints, where the bending energy is enhanced with a user-defined and data-driven scalar-valued term referred to as the curvature prior. In order to guarantee that the globally optimal curve is extracted, the proposed method involves the numerical computation of the viscosity solution to a specific static Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE). For that purpose, we derive the explicit Hamiltonian associated to this variant model equipped with a curvature prior, discretize the resulting HJB PDE using an adaptive finite difference scheme, and solve it in a single pass using a generalized Fast-Marching method. In addition, we also present a practical method for estimating the curvature prior values from image data, designed for the task of accurately tracking curvilinear structure centerlines. Numerical experiments on synthetic and real image data illustrate the advantages of the considered variant of the elastica model with a prior curvature enhancement in complex scenarios where challenging geometric structures appear.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2306.17194 [pdf, other]

On the Exploitability of Instruction Tuning

Authors: Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein

Abstract: Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally change the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose AutoPoison, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs. Code is available at https://github.com/azshue/AutoPoison.

Submitted 28 October, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 camera-ready (21 pages, 10 figures)

arXiv:2306.13651 [pdf, other]

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Authors: Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

Abstract: With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative. For example, a company deploying a client-facing chatbot must ensure that the model will not respond to client requests with profanity. Current evaluations approach this problem using small, domain-specific datasets with human-curated labels. These evaluation sets are often sampled from a narrow and simplified distribution, and data sources can unknowingly be leaked into the training set which can lead to misleading evaluations. To bypass these drawbacks, we propose a framework for self-supervised evaluation of LLMs by analyzing their sensitivity or invariance to transformations on the input text. Self-supervised evaluation can directly monitor LLM behavior on datasets collected in the wild or streamed during live model deployment. We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence, in addition to sensitivity to grammatical structure and tokenization errors. When comparisons to similar human-labeled benchmarks are available, we find strong correlations between self-supervised and human-supervised evaluations. The self-supervised paradigm complements current evaluation strategies that rely on labeled data.

Submitted 29 June, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

Comments: Code is available at https://github.com/neelsjain/BYOD. First two authors contributed equally. 21 pages, 22 figures

arXiv:2306.05802 [pdf, other]

Static and dynamical properties of the spin-5/2 nearly ideal triangular lattice antiferromagnet Ba3MnSb2O9

Authors: Mingfang Shu, Weicen Dong, Jinlong Jiao, Jiangtao Wu, Gaoting Lin, Tao Hong, Huibo Cao, Masaaki Matsuda, Wei Tian, Songxue Chi, Georg Ehlers, Zhongwen Ouyang, Hongwei Chen, Youming Zou, Zhe Qu, Qing Huang, Haidong Zhou, Yoshitomo Kamiya, Jie Ma

Abstract: We study the ground state and spin excitations in Ba3MnSb2O9, an easy-plane S = 5/2 triangular lattice antiferromagnet. By combining single-crystal neutron scattering, electron spin resonance (ESR), and spin wave calculations, we determine the frustrated quasi-two-dimensional spin Hamiltonian parameters describing the material. While the material has a slight monoclinic structural distortion, which could allow for isosceles-triangular exchanges and biaxial anisotropy by symmetry, we observe no deviation from the behavior expected for spin waves in the in-plane 120° state. Even the easy-plane anisotropy is so small that it can only be detected by ESR in our study. In conjunction with the quasi-two-dimensionality, our study establishes that Ba3MnSb2O9 is a nearly ideal triangular lattice antiferromagnet with the quasi-classical spin S = 5/2, which suggests that it has the potential for an experimental study of Z- or Z2-vortex excitations.

Submitted 7 September, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.04634 [pdf, other]

On the Reliability of Watermarks for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

Abstract: As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a 1e-5 false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.

Submitted 1 May, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 9 pages in the main body. Published at ICLR 2024. Code is available at https://github.com/jwkirchenbauer/lm-watermarking

arXiv:2304.09391 [pdf]

Inferring High-level Geographical Concepts via Knowledge Graph and Multi-scale Data Integration: A Case Study of C-shaped Building Pattern Recognition

Authors: Zhiwei Wei, Yi Xiao, Wenjia Xu, Mi Shu, Lu Cheng, Yang Wang, Chunbo Liu

Abstract: Effective building pattern recognition is critical for understanding urban form, automating map generalization, and visualizing 3D city models. Most existing studies use object-independent methods based on visual perception rules and proximity graph models to extract patterns. However, because human vision is a part-based system, pattern recognition may require decomposing shapes into parts or grouping them into clusters. Existing methods may not recognize all visually aware patterns, and the proximity graph model can be inefficient. To improve efficiency and effectiveness, we integrate multi-scale data using a knowledge graph, focusing on the recognition of C-shaped building patterns. First, we use a property graph to represent the relationships between buildings within and across different scales involved in C-shaped building pattern recognition. Next, we store this knowledge graph in a graph database and convert the rules for C-shaped pattern recognition and enrichment into query conditions. Finally, we recognize and enrich C-shaped building patterns using rule-based reasoning in the built knowledge graph. We verify the effectiveness of our method using multi-scale data with three levels of detail (LODs) collected from the Gaode Map. Our results show that our method achieves a higher recall rate of 26.4% for LOD1, 20.0% for LOD2, and 9.1% for LOD3 compared to existing approaches. We also achieve recognition efficiency improvements of 0.91, 1.37, and 9.35 times, respectively.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2301.02650 [pdf, other]

Hierarchical Point Attention for Indoor 3D Object Detection

Authors: Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Caiming Xiong, Tom Goldstein, Juan Carlos Niebles, Ran Xu

Abstract: 3D object detection is an essential vision technique for various robotic systems, such as augmented reality and domestic robots. Transformers as versatile network architectures have recently seen great success in 3D point cloud object detection. However, the lack of hierarchy in a plain transformer restrains its ability to learn features at different scales. Such limitation makes transformer detectors perform worse on smaller objects and affects their reliability in indoor environments where small objects are the majority. This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors. First, we propose Aggregated Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning. Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals. Both attention operations are model-agnostic network modules that can be plugged into existing point cloud transformers for end-to-end training. We evaluate our method on two widely used indoor detection benchmarks. By plugging our proposed modules into the state-of-the-art transformer-based 3D detectors, we improve the previous best results on both benchmarks, with more significant improvements on smaller objects.

Submitted 8 May, 2024; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: ICRA 2024 camera-ready (7 pages, 5 figures)

arXiv:2212.06727 [pdf, other]

What do Vision Transformers Learn? A Visual Exploration

Authors: Amin Ghiasi, Hamid Kazemi, Eitan Borgnia, Steven Reich, Manli Shu, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

Abstract: Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2209.07511 [pdf, other]

Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

Authors: Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao

Abstract: Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot generalization in many downstream tasks with properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts using the training data from downstream tasks. While effective, training on domain-specific data reduces a model's generalization capability to unseen new domains. In this work, we propose test-time prompt tuning (TPT), a method that can learn adaptive prompts on the fly with a single test sample. For image classification, TPT optimizes the prompt by minimizing the entropy with confidence selection so that the model has consistent predictions across different augmented views of each test sample. In evaluating generalization to natural distribution shifts, TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average, surpassing previous prompt tuning approaches that require additional task-specific training data. In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data. Project page: https://azshue.github.io/TPT.

Submitted 15 September, 2022; originally announced September 2022.

Comments: NeurIPS 2022
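
Illustrative note on the TPT abstract above: the phrase "minimizing the entropy with confidence selection" can be read as follows. Compute class probabilities for several augmented views of one test image, keep only the most confident (lowest-entropy) views, average their probabilities, and use the entropy of that average as the quantity to minimize. The sketch below is only one plausible reading of the abstract, not the authors' code; the function names and the `keep_fraction` parameter are hypothetical, and the prompt update itself (gradient descent on this value) is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confident_marginal_entropy(view_logits, keep_fraction=0.5):
    """Entropy of the averaged prediction over the most confident views.

    view_logits: one list of class logits per augmented view (assumed input).
    keep_fraction: hypothetical cutoff; the share of lowest-entropy views kept.
    """
    probs = [softmax(logits) for logits in view_logits]
    ranked = sorted(probs, key=entropy)          # most confident views first
    k = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:k]
    n_classes = len(kept[0])
    averaged = [sum(p[c] for p in kept) / k for c in range(n_classes)]
    return entropy(averaged)


# Four augmented views of one sample, three classes (made-up logits).
views = [[5.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss_confident = confident_marginal_entropy(views, keep_fraction=0.5)
loss_all_views = confident_marginal_entropy(views, keep_fraction=1.0)
```

With these made-up logits, discarding the two uncertain views gives a lower objective than averaging over all four, which is the intuition behind filtering out noisy augmentations before tuning.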

arXiv:2208.07237 [pdf, ps, other]

Energy and Spectrum Efficient Federated Learning via High-Precision Over-the-Air Computation

Authors: Liang Li, Chenpei Huang, Dian Shi, Hao Wang, Xiangwei Zhou, Minglei Shu, Miao Pan

Abstract: Federated learning (FL) enables mobile devices to collaboratively learn a shared prediction model while keeping data locally. However, there are two major research challenges to practically deploy FL over mobile devices: (i) frequent wireless updates of huge size gradients vs. limited spectrum resources, and (ii) energy-hungry FL communication and local computing during training vs. battery-constrained mobile devices. To address those challenges, in this paper, we propose a novel multi-bit over-the-air computation (M-AirComp) approach for spectrum-efficient aggregation of local model updates in FL and further present an energy-efficient FL design for mobile devices. Specifically, a high-precision digital modulation scheme is designed and incorporated in the M-AirComp, allowing mobile devices to upload model updates at the selected positions simultaneously in the multi-access channel. Moreover, we theoretically analyze the convergence property of our FL algorithm. Guided by FL convergence analysis, we formulate a joint transmission probability and local computing control optimization, aiming to minimize the overall energy consumption (i.e., iterative local computing + multi-round communications) of mobile devices in FL. Extensive simulation results show that our proposed scheme outperforms existing ones in terms of spectrum utilization, energy efficiency, and learning accuracy.

Submitted 15 August, 2022; originally announced August 2022.

arXiv:2204.05575 [pdf, other]

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

Authors: Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, Zaiqing Nie

Abstract: Autonomous driving faces great safety challenges for a lack of global perspective and the limitation of long-range perception capabilities. It has been widely agreed that vehicle-infrastructure cooperation is required to achieve Level 5 autonomy. However, there is still NO dataset from real scenarios available for computer vision researchers to work on vehicle-infrastructure cooperation-related problems. To accelerate computer vision research and innovation for Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD), we release DAIR-V2X Dataset, which is the first large-scale, multi-modality, multi-view dataset from real scenarios for VICAD. DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. The Vehicle-Infrastructure Cooperative 3D Object Detection problem (VIC3D) is introduced, formulating the problem of collaboratively locating and identifying 3D objects using sensory inputs from both vehicle and infrastructure. In addition to solving traditional 3D object detection problems, the solution of VIC3D needs to consider the temporal asynchrony problem between vehicle and infrastructure sensors and the data transmission cost between them. Furthermore, we propose Time Compensation Late Fusion (TCLF), a late fusion framework for the VIC3D task as a benchmark based on DAIR-V2X. Find data, code, and more up-to-date information at https://thudair.baai.ac.cn/index and https://github.com/AIR-THU/DAIR-V2X.

Submitted 12 April, 2022; originally announced April 2022.

Comments: CVPR2022

arXiv:2203.13608 [pdf, other]

Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task

Authors: Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, Yingying Li, Guangjie Wang, Xiao Tan, Errui Ding

Abstract: Concurrent perception datasets for autonomous driving are mainly limited to frontal view with sensors mounted on the vehicle. None of them is designed for the overlooked roadside perception tasks. On the other hand, the data captured from roadside cameras have strengths over frontal-view data, which is believed to facilitate a safer and more intelligent autonomous driving system. To accelerate the progress of roadside perception, we present the first high-diversity challenging Roadside Perception 3D dataset, Rope3D, from a novel view. The dataset consists of 50k images and over 1.5M 3D objects in various scenes, which are captured under different settings including various cameras with ambiguous mounting positions, camera specifications, viewpoints, and different environmental conditions. We conduct strict 2D-3D joint annotation and comprehensive data analysis, as well as set up a new 3D roadside perception benchmark with metrics and evaluation devkit. Furthermore, we tailor the existing frontal-view monocular 3D object detection approaches and propose to leverage the geometry constraint to solve the inherent ambiguities caused by various sensors and viewpoints. Our dataset is available on https://thudair.baai.ac.cn/rope.

Submitted 25 March, 2022; originally announced March 2022.

Comments: To appear in CVPR2022

arXiv:2111.11180 [pdf]

Regulate the direct-indirect electronic band gap transition by electron-phonon interaction in BaSnO3

Authors: Binru Zhao, Qing Huang, Jiangtao Wu, Jinlong Jiao, Mingfang Shu, Gaoting Lin, Qiyang Sun, Ranran Zhang, Masato Hagihala, Shuki Torri, Guohua Wang, Qingyong Ren, Chen Li, Zhe Qu, Haidong Zhou, Jie Ma

Abstract: The neutron powder diffraction, specific heat, thermal conductivity, and Raman scattering measurements were presented to study the interplays of lattice, phonons and electrons of the Sr-doping Ba1-xSrxSnO3 (x was less than or equal to 0.1). Although Ba1-xSrxSnO3 kept the cubic lattice, the Raman spectra suggested a dynamic distortion at low temperature. The density functional theory was applied to analyze the electronic structures and phonon dispersions of Ba1-xSrxSnO3 (x = 0, 0.0125), and the behaviors of electron bands around Fermi levels were discussed. According to the experimental and theoretical results, the Sr-doping played a significant role in tuning the indirect band gap of BaSnO3 and influenced the electron-phonon interaction.

Submitted 11 April, 2022; v1 submitted 13 November, 2021; originally announced November 2021.

Comments: 23 pages, 7 figures, 3 tables, no supplemental materials

arXiv:2111.00794 [pdf, other]

Geodesic Models with Convexity Shape Prior

Authors: Da Chen, Jean-Marie Mirebeau, Minglei Shu, Xuecheng Tai, Laurent D. Cohen

Abstract: The minimal geodesic models based on the Eikonal equations are capable of finding suitable solutions in various image segmentation scenarios. Existing geodesic-based segmentation approaches usually exploit image features in conjunction with geometric regularization terms, such as Euclidean curve length or curvature-penalized length, for computing geodesic curves. In this paper, we take into account a more complicated problem: finding curvature-penalized geodesic paths with a convexity shape prior. We establish new geodesic models relying on the strategy of orientation-lifting, by which a planar curve can be mapped to a high-dimensional orientation-dependent space. The convexity shape prior serves as a constraint for the construction of local geodesic metrics encoding a particular curvature constraint. Then the geodesic distances and the corresponding closed geodesic paths in the orientation-lifted space can be efficiently computed through state-of-the-art Hamiltonian fast marching method. In addition, we apply the proposed geodesic models to the active contours, leading to efficient interactive image segmentation algorithms that preserve the advantages of convexity shape prior and curvature penalization.

Submitted 25 November, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: This paper has been accepted by TPAMI

arXiv:2111.00637 [pdf, other]

To Talk or to Work: Delay Efficient Federated Learning over Mobile Edge Devices

Authors: Pavana Prakash, Jiahao Ding, Maoqiang Wu, Minglei Shu, Rong Yu, Miao Pan

Abstract: Federated learning (FL), an emerging distributed machine learning paradigm, in conflux with edge computing is a promising area with novel applications over mobile edge devices. In FL, since mobile devices collaborate to train a model based on their own data under the coordination of a central server by sharing just the model updates, training data is maintained private. However, without the central availability of data, computing nodes need to communicate the model updates often to attain convergence. Hence, the local computation time to create local model updates along with the time taken for transmitting them to and from the server result in a delay in the overall time. Furthermore, unreliable network connections may obstruct an efficient communication of these updates. To address these, in this paper, we propose a delay-efficient FL mechanism that reduces the overall time (consisting of both the computation and communication latencies) and communication rounds required for the model to converge. Exploring the impact of various parameters contributing to delay, we seek to balance the trade-off between wireless communication (to talk) and local computation (to work). We formulate a relation with overall time as an optimization problem and demonstrate the efficacy of our approach through extensive simulations.

Submitted 31 October, 2021; originally announced November 2021.

Comments: Accepted for publication in Globecom'21

arXiv:2108.09641 [pdf, other]

Deep survival analysis with longitudinal X-rays for COVID-19

Authors: Michelle Shu, Richard Strong Bowen, Charles Herrmann, Gengmo Qi, Michele Santacatterina, Ramin Zabih

Abstract: Time-to-event analysis is an important statistical tool for allocating clinical resources such as ICU beds. However, classical techniques like the Cox model cannot directly incorporate images due to their high dimensionality. We propose a deep learning approach that naturally incorporates multiple, time-dependent imaging studies as well as non-imaging data into time-to-event analysis. Our techniques are benchmarked on a clinical dataset of 1,894 COVID-19 patients, and show that image sequences significantly improve predictions. For example, classical time-to-event methods produce a concordance error of around 30-40% for predicting hospital admission, while our error is 25% without images and 20% with multiple X-rays included. Ablation studies suggest that our models are not learning spurious features such as scanner artifacts. While our focus and evaluation is on COVID-19, the methods we develop are broadly applicable.

Submitted 22 August, 2021; originally announced August 2021.
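
Illustrative note on the "concordance error" figures quoted in the abstract above: a common way to obtain such a number is one minus Harrell's concordance index, the fraction of comparable patient pairs that a risk score orders correctly. The abstract does not state the exact definition the authors use, so the sketch below is an assumption; the function name and the toy data are hypothetical.

```python
def concordance_error(times, events, risks):
    """Return 1 - C-index for right-censored time-to-event data.

    times:  observed time for each subject (event time or censoring time)
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk scores; higher should mean an earlier event

    A pair (i, j) is comparable when subject i has the shorter time and
    that time is an observed event. Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return 1.0 - concordant / comparable


# Toy data (made up): the third subject is censored at time 3.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
perfect_error = concordance_error(toy_times, toy_events, [4, 3, 2, 1])
reversed_error = concordance_error(toy_times, toy_events, [1, 2, 3, 4])
constant_error = concordance_error(toy_times, toy_events, [0, 0, 0, 0])
```

Under this definition an error of 0.5 corresponds to a score that carries no ranking information, which gives some context for the 20% to 40% range quoted in the abstract.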

arXiv:2108.04430 [pdf, other]

Enhancing Knowledge Tracing via Adversarial Training

Authors: Xiaopeng Guo, Zhijie Huang, Jie Gao, Mingyu Shang, Maojing Shu, Jun Sun

Abstract: We study the problem of knowledge tracing (KT) where the goal is to trace the students' knowledge mastery over time so as to make predictions on their future performance. Owing to the good representation capacity of deep neural networks (DNNs), recent advances on KT have increasingly concentrated on exploring DNNs to improve the performance of KT. However, we empirically reveal that the DNNs based KT models may run the risk of overfitting, especially on small datasets, leading to limited generalization. In this paper, by leveraging the current advances in adversarial training (AT), we propose an efficient AT based KT method (ATKT) to enhance KT model's generalization and thus push the limit of KT. Specifically, we first construct adversarial perturbations and add them on the original interaction embeddings as adversarial examples. The original and adversarial examples are further used to jointly train the KT model, forcing it not only to be robust to the adversarial examples, but also to enhance the generalization over the original ones. To better implement AT, we then present an efficient attentive-LSTM model as KT backbone, where the key is a proposed knowledge hidden state attention module that adaptively aggregates information from previous knowledge hidden states while simultaneously highlighting the importance of current knowledge hidden state to make a more accurate prediction. Extensive experiments on four public benchmark datasets demonstrate that our ATKT achieves new state-of-the-art performance. Code is available at: https://github.com/xiaopengguo/ATKT.

Submitted 9 August, 2021; originally announced August 2021.

Comments: Accepted by ACM MM 2021

arXiv:2108.01335 [pdf, other]

Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability

Authors: Roman Levin, Manli Shu, Eitan Borgnia, Furong Huang, Micah Goldblum, Tom Goldstein

Abstract: Conventional saliency maps highlight input features to which neural network predictions are highly sensitive. We take a different approach to saliency, in which we identify and analyze the network parameters, rather than inputs, which are responsible for erroneous decisions. We find that samples which cause similar parameters to malfunction are semantically similar. We also show that pruning the most salient parameters for a wrongly classified sample often improves model behavior. Furthermore, fine-tuning a small number of the most salient parameters on a single sample results in error correction on other samples that are misclassified for similar reasons. Based on our parameter saliency method, we also introduce an input-space saliency technique that reveals how image features cause specific network components to malfunction. Further, we rigorously validate the meaningfulness of our saliency maps on both the dataset and case-study levels.

Submitted 9 October, 2022; v1 submitted 3 August, 2021; originally announced August 2021.

arXiv:2102.13262 [pdf, other]

Improving Robustness of Learning-based Autonomous Steering Using Adversarial Images

Authors: Yu Shen, Laura Zheng, Manli Shu, Weizi Li, Tom Goldstein, Ming C. Lin

Abstract: For safety of autonomous driving, vehicles need to be able to drive under various lighting, weather, and visibility conditions in different environments. These external and environmental factors, along with internal factors associated with sensors, can pose significant challenges to perceptual data processing, hence affecting the decision-making and control of the vehicle. In this work, we address this critical issue by introducing a framework for analyzing the robustness of the learning algorithm w.r.t. varying quality in the image input for autonomous driving. Using the results of sensitivity analysis, we further propose an algorithm to improve the overall performance of the task of "learning to steer". The results show that our approach is able to enhance the learning outcomes by up to 48%. A comparative study drawn between our approach and other related techniques, such as data augmentation and adversarial training, confirms the effectiveness of our algorithm as a way to improve the robustness and generalization of neural network training for autonomous driving.

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2101.03625 [pdf]

The 'COVID' Crash of the 2020 U.S. Stock Market

Authors: Min Shu, Ruiqiang Song, Wei Zhu

Abstract: We employed the log-periodic power law singularity (LPPLS) methodology to systematically investigate the 2020 stock market crash in the U.S. equities sectors with different levels of total market capitalizations through four major U.S. stock market indexes, including the Wilshire 5000 Total Market index, the S&P 500 index, the S&P MidCap 400 index, and the Russell 2000 index, representing the stocks overall, the large capitalization stocks, the middle capitalization stocks and the small capitalization stocks, respectively. During the 2020 U.S. stock market crash, all four indexes lost more than a third of their values within five weeks, while both the middle capitalization stocks and the small capitalization stocks suffered much greater losses than the large capitalization stocks and stocks overall. Our results indicate that the price trajectories of these four stock market indexes prior to the 2020 stock market crash clearly featured the LPPLS bubble pattern and were indeed in a positive bubble regime. Contrary to the popular belief that COVID-19 led to the 2020 stock market crash, the 2020 U.S. stock market crash was endogenous, stemming from the increasingly systemic instability of the stock market itself. We also performed a complementary post-mortem analysis of the 2020 U.S. stock market crash. Our analyses indicate that the 2020 U.S. stock market crash originated from a bubble which began to form as early as September 2018, and that the bubbles in stocks with different levels of total market capitalizations have significantly different starting time profiles. This study not only sheds new light on the making of the 2020 U.S. stock market crash but also creates a novel pipeline for future real-time crash detection and mechanism dissection of any financial market and/or economic index.

Submitted 10 January, 2021; originally announced January 2021.

Comments: 19 pages, 3 figures. arXiv admin note: text overlap with arXiv:2101.00327

arXiv:2101.00327 [pdf]

doi 10.1016/j.physa.2021.126425

The 2020 Global Stock Market Crash: Endogenous or Exogenous?

Authors: Ruiqiang Song, Min Shu, Wei Zhu

Abstract: Starting on February 20, 2020, the global stock markets began to suffer the worst decline since the Great Recession in 2008, and COVID-19 has been widely blamed for the stock market crashes. In this study, we applied the log-periodic power law singularity (LPPLS) methodology based on multilevel time series to unravel the underlying mechanisms of the 2020 global stock market crash by analyzing the trajectories of 10 major stock market indexes from both developed and emerging stock markets, including the S&P 500, DJIA, NASDAQ, FTSE, DAX, NIKKEI, CSI 300, HSI, BSESN, and BOVESPA. In order to effectively distinguish between endogenous and exogenous crashes, we proposed using the LPPLS confidence indicator as a classification proxy. The results show that the apparent LPPLS bubble patterns of super-exponential increase, corrected by accelerating log-periodic oscillations, were indeed present in the price trajectories of seven indexes: S&P 500, DJIA, NASDAQ, DAX, CSI 300, BSESN, and BOVESPA. This indicates that large positive bubbles formed endogenously prior to the 2020 stock market crash, and that the subsequent crashes for these seven indexes are endogenous, stemming from the increasingly systemic instability of the stock markets, while well-known external shocks such as the COVID-19 pandemic only acted as sparks during the 2020 global stock market crash. In contrast, the obvious signatures of the LPPLS model have not been observed in the price trajectories of the three remaining indexes: FTSE, NIKKEI, and HSI, signifying that the crashes in these three indexes are exogenous, stemming from external shocks. The novel classification method of crash types proposed in this study can also be used to analyze regime changes of any price trajectories in global financial markets.

Submitted 1 January, 2021; originally announced January 2021.

Comments: 25 pages, 4 figures

arXiv:2010.07334 [pdf, other]

Towards Accurate Quantization and Pruning via Data-free Knowledge Transfer

Authors: Chen Zhu, Zheng Xu, Ali Shafahi, Manli Shu, Amin Ghiasi, Tom Goldstein

Abstract: When large scale training data is available, one can obtain compact and accurate networks to be deployed in resource-constrained environments effectively through quantization and pruning. However, training data are often protected due to privacy concerns and it is challenging to obtain compact networks without data. We study data-free quantization and pruning by transferring knowledge from trained large networks to compact networks. Auxiliary generators are simultaneously and adversarially trained with the targeted compact networks to generate synthetic inputs that maximize the discrepancy between the given large network and its quantized or pruned version. We show theoretically that the alternating optimization for the underlying minimax problem converges under mild conditions for pruning and quantization. Our data-free compact networks achieve competitive accuracy to networks trained and fine-tuned with training data. Our quantized and pruned networks achieve good performance while being more compact and lightweight. Further, we demonstrate that the compact structure and corresponding initialization from the Lottery Ticket Hypothesis can also help in data-free training.

Submitted 14 October, 2020; originally announced October 2020.

arXiv:2010.05210 [pdf, other]

Generalized Few-shot Semantic Segmentation

Authors: Zhuotao Tian, Xin Lai, Li Jiang, Shu Liu, Michelle Shu, Hengshuang Zhao, Jiaya Jia

Abstract: Training semantic segmentation models requires a large amount of finely annotated data, making it hard to quickly adapt to novel classes not satisfying this condition. Few-Shot Segmentation (FS-Seg) tackles this problem with many constraints. In this paper, we introduce a new benchmark, called Generalized Few-Shot Semantic Segmentation (GFS-Seg), to analyze the generalization ability of simultaneously segmenting the novel categories with very few examples and the base categories with sufficient examples. It is the first study showing that previous representative state-of-the-art FS-Seg methods fall short in GFS-Seg and that the performance discrepancy mainly comes from the constrained setting of FS-Seg. To make GFS-Seg tractable, we set up a GFS-Seg baseline that achieves decent performance without structural change on the original model. Then, since context is essential for semantic segmentation, we propose Context-Aware Prototype Learning (CAPL), which significantly improves performance by 1) leveraging the co-occurrence prior knowledge from support samples, and 2) dynamically enriching contextual information to the classifier, conditioned on the content of each query image. Both contributions are experimentally shown to have substantial practical merit. Extensive experiments on Pascal-VOC and COCO demonstrate the effectiveness of CAPL, and CAPL generalizes well to FS-Seg by achieving competitive performance. Code is available at https://github.com/dvlab-research/GFS-Seg.

Submitted 31 May, 2022; v1 submitted 11 October, 2020; originally announced October 2020.

Comments: Accepted to CVPR 2022

arXiv:2009.08965 [pdf, other]

Encoding Robustness to Image Style via Adversarial Feature Perturbations

Authors: Manli Shu, Zuxuan Wu, Micah Goldblum, Tom Goldstein

Abstract: Adversarial training is the industry standard for producing models that are robust to small adversarial perturbations. However, machine learning practitioners need models that are robust to other kinds of changes that occur naturally, such as changes in the style or illumination of input images. Such changes in input distribution have been effectively modeled as shifts in the mean and variance of deep image features. We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce models that are robust to various unseen distributional shifts. We explore the relationship between these perturbations and distributional shifts by visualizing adversarial features. Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training. By fine-tuning neural networks on adversarial feature distributions, we observe improved robustness of networks to various unseen distributional shifts, including style variations and image corruptions. In addition, we show that our proposed adversarial feature perturbation can be complementary to existing image space data augmentation methods, leading to improved performance. The source code and pre-trained models are released at https://github.com/azshue/AdvBN.

Submitted 31 October, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

Comments: NeurIPS 2021

arXiv:2008.07290 [pdf]

Commercial Cloud Computing for Connected Vehicle Applications in Transportation Cyber-Physical Systems

Authors: Hsien-Wen Deng, Mizanur Rahman, Mashrur Chowdhury, M Sabbir Salek, Mitch Shue

Abstract: This study focuses on the feasibility of commercial cloud services for connected vehicle (CV) applications in a Transportation Cyber-Physical Systems (TCPS) environment. TCPS implies that CVs, in addition to being connected with each other, communicate with the transportation and computing infrastructure to fulfill application requirements. The motivation of this study is to accelerate commercial cloud-based CV application development by presenting the lessons learned by implementing a CV mobility application using Amazon Web Services (AWS). The feasibility of the cloud-based CV application is assessed at three levels: (i) the development of a cloud-based TCPS architecture, (ii) the deployment of a cloud-based CV application using AWS, and (iii) the evaluation of the cloud-based CV application. We implemented this CV mobility application using a serverless cloud architecture and found that such a cloud-based TCPS environment could meet the permissible delay limits of CV mobility applications. Commercial cloud services, as an integral part of TCPS, could reduce costs associated with establishing and maintaining vast computing infrastructure for supporting CV applications. As the CV penetration levels on the surface transportation systems increase significantly over the next several years, scaling the backend infrastructure to support such applications is a critical issue. This study shows how commercial cloud services could automatically scale the backend infrastructure to meet the rapidly changing demands of real-world CV applications. Through real-world experiments, we demonstrate how commercial cloud services along with serverless cloud architecture could advance the transportation digital infrastructure for supporting connected mobility applications in a TCPS environment.

Submitted 17 August, 2020; originally announced August 2020.

Comments: 15 pages, 9 figures

arXiv:2008.06909 [pdf, other]

doi 10.1109/TIP.2021.3078106

Geodesic Paths for Image Segmentation with Implicit Region-based Homogeneity Enhancement

Authors: Da Chen, Jian Zhu, Xinxin Zhang, Minglei Shu, Laurent D. Cohen

Abstract: Minimal paths are regarded as a powerful and efficient tool for boundary detection and image segmentation due to their global optimality and well-established numerical solutions such as the fast marching method. In this paper, we introduce a flexible interactive image segmentation model based on the Eikonal partial differential equation (PDE) framework in conjunction with region-based homogeneity enhancement. A key ingredient in the introduced model is the construction of local geodesic metrics, which are capable of integrating anisotropic and asymmetric edge features, implicit region-based homogeneity features and/or curvature regularization. The incorporation of the region-based homogeneity features into the metrics considered relies on an implicit representation of these features, which is one of the contributions of this work. Moreover, we also introduce a way to build simple closed contours as the concatenation of two disjoint open curves. Experimental results prove that the proposed model indeed outperforms state-of-the-art minimal paths-based image segmentation approaches.

Submitted 6 May, 2021; v1 submitted 16 August, 2020; originally announced August 2020.

Comments: Published in IEEE Trans. Image Processing

arXiv:2008.01449 [pdf, other]

Prior Guided Feature Enrichment Network for Few-Shot Segmentation

Authors: Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Zhicheng Yang, Ruiyu Li, Jiaya Jia

Abstract: State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results and hardly work on unseen classes without fine-tuning. Few-shot segmentation is thus proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. These frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information of training classes and spatial inconsistency between query and support targets. To alleviate these issues, we propose the Prior Guided Feature Enrichment Network (PFENet). It consists of novel designs of (1) a training-free prior mask generation method that not only retains generalization power but also improves model performance and (2) a Feature Enrichment Module (FEM) that overcomes spatial inconsistency by adaptively enriching query features with support features and prior masks. Extensive experiments on PASCAL-5$^i$ and COCO prove that the proposed prior generation method and FEM both improve the baseline method significantly. Our PFENet also outperforms state-of-the-art methods by a large margin without efficiency loss. It is surprising that our model even generalizes to cases without labeled support samples. Our code is available at https://github.com/Jia-Research-Lab/PFENet/.

Submitted 4 August, 2020; originally announced August 2020.

Comments: 16 pages. To appear in TPAMI

arXiv:2006.07839 [pdf, other]

doi 10.1109/TIP.2021.3078102

A Generalized Asymmetric Dual-front Model for Active Contours and Image Segmentation

Authors: Da Chen, Jack Spencer, Jean-Marie Mirebeau, Ke Chen, Minglei Shu, Laurent D. Cohen

Abstract: The Voronoi diagram-based dual-front active contour models are known as a powerful and efficient way of addressing image segmentation and domain partitioning problems. In the basic formulation of the dual-front models, the evolving contours can be considered as the interfaces of adjacent Voronoi regions. A crucial ingredient of these dual-front models is the geodesic metrics by which the geodesic distances and the corresponding Voronoi diagram can be estimated. In this paper, we introduce a dual-front model based on asymmetric quadratic metrics. The metrics considered are built by the integration of the image features and a vector field derived from the evolving contours. The use of the asymmetry enhancement can reduce the risk of contour shortcut or leakage problems, especially when the initial contours are far away from the target boundaries or the images have complicated intensity distributions. Moreover, the proposed dual-front model can be applied for image segmentation in conjunction with various region-based homogeneity terms. The numerical experiments on both synthetic and real images show that the proposed dual-front model indeed achieves encouraging results.

Submitted 4 May, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: Published in IEEE Transactions on Image Processing

arXiv:2006.06669 [pdf, other]

Understanding Human Hands in Contact at Internet Scale

Authors: Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey

Abstract: Hands are the central means by which humans manipulate their world, and being able to reliably extract hand state information from Internet videos of humans using their hands has the potential to pave the way to systems that can learn from petabytes of video data. This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction that includes: hand location, side, contact state, and a box around the object in contact. To support this effort, we gather a large-scale dataset of hands in contact with objects consisting of 131 days of footage as well as a 100K annotated hand-contact video frame dataset. The learned model on this dataset can serve as a foundation for hand-contact understanding in videos. We quantitatively evaluate it both on its own and in service of predicting and learning from 3D meshes of human hands.

Submitted 11 June, 2020; originally announced June 2020.

Comments: To appear at CVPR 2020 (Oral). Project and dataset webpage: http://fouheylab.eecs.umich.edu/~dandans/projects/100DOH/

arXiv:2005.07343 [pdf, other]

Visual Perception Model for Rapid and Adaptive Low-light Image Enhancement

Authors: Xiaoxiao Li, Xiaopeng Guo, Liye Mei, Mingyu Shang, Jie Gao, Maojing Shu, Xiang Wang

Abstract: Low-light image enhancement is a promising solution to tackle the problem of insufficient sensitivity of the human vision system (HVS) to perceive information in low light environments. Previous Retinex-based works always accomplish the enhancement task by estimating light intensity. Unfortunately, single light intensity modelling is hard to accurately simulate visual perception information, leading to the problems of imbalanced visual photosensitivity and weak adaptivity. To solve these problems, we explore the precise relationship between light source and visual perception and then propose the visual perception (VP) model to acquire a precise mathematical description of visual perception. The core of the VP model is to decompose the light source into light intensity and light spatial distribution to describe the perception process of HVS, offering refinement estimation of illumination and reflectance. To reduce the complexity of the estimation process, we introduce the rapid and adaptive $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ functions to build an illumination and reflectance estimation scheme. Finally, we present an optimal determination strategy, consisting of a cycle operation and a comparator. Specifically, the comparator is responsible for determining the optimal enhancement results from multiple enhanced results through implementing the cycle operation. By coordinating the proposed VP model, illumination and reflectance estimation scheme, and the optimal determination strategy, we propose a rapid and adaptive framework for low-light image enhancement. Extensive experimental results demonstrate that the proposed method achieves better performance in terms of visual comparison, quantitative assessment, and computational efficiency, compared with the current state-of-the-art methods.

Submitted 14 May, 2020; originally announced May 2020.

Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

arXiv:2004.09007 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053181

Headless Horseman: Adversarial Attacks on Transfer Learning Models

Authors: Ahmed Abdelkader, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi Schwarzschild, Manli Shu, Christoph Studer, Chen Zhu

Abstract: Transfer learning facilitates the training of task-specific classifiers using pre-trained models as feature extractors. We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these headless attacks. We first demonstrate successful transfer attacks against a victim network using only its feature extractor. This motivates the introduction of a label-blind adversarial attack. This transfer attack method does not require any information about the class-label space of the victim. Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40%.

Submitted 19 April, 2020; originally announced April 2020.

Comments: 5 pages, 2 figures. Accepted in ICASSP 2020. Code available on https://github.com/zhuchen03/headless-attack.git

arXiv:2003.03710 [pdf, other]

Trajectory Grouping with Curvature Regularization for Tubular Structure Tracking

Authors: Li Liu, Da Chen, Minglei Shu, Baosheng Li, Huazhong Shu, Michel Paques, Laurent D. Cohen

Abstract: Tubular structure tracking is a crucial task in the fields of computer vision and medical image analysis. The minimal paths-based approaches have exhibited a strong ability to trace tubular structures, by which a tubular structure can be naturally modeled as a minimal geodesic path computed with a suitable geodesic metric. However, existing minimal paths-based tracing approaches still suffer from difficulties such as the shortcuts and short branches combination problems, especially when dealing with images involving complicated tubular tree structures or background. In this paper, we introduce a new minimal paths-based model for minimally interactive tubular structure centerline extraction in conjunction with a perceptual grouping scheme. Basically, we take into account the prescribed tubular trajectories and curvature-penalized geodesic paths to seek suitable shortest paths. The proposed approach can benefit from the local smoothness prior on tubular structures and the global optimality of the graph-based path searching scheme used. Experimental results on both synthetic and real images prove that the proposed model indeed outperforms the state-of-the-art minimal paths-based tubular structure tracing algorithms.

Submitted 8 December, 2021; v1 submitted 7 March, 2020; originally announced March 2020.

arXiv:1911.11230 [pdf, other]

Identifying Model Weakness with Adversarial Examiner

Authors: Michelle Shu, Chenxi Liu, Weichao Qiu, Alan Yuille

Abstract: Machine learning models are usually evaluated according to the average case performance on the test set. However, this is not always ideal, because in some sensitive domains (e.g. autonomous driving), it is the worst case performance that matters more. In this paper, we are interested in systematic exploration of the input data space to identify the weakness of the model to be evaluated. We propose to use an adversarial examiner in the testing stage. Different from the existing strategy to always give the same (distribution of) test data, the adversarial examiner will dynamically select the next test data to hand out based on the testing history so far, with the goal being to undermine the model's performance. This sequence of test data not only helps us understand the current model, but also serves as constructive feedback to help improve the model in the next iteration. We conduct experiments on ShapeNet object classification. We show that our adversarial examiner can successfully put more emphasis on the weakness of the model, preventing performance estimates from being overly optimistic.

Submitted 25 November, 2019; originally announced November 2019.

Comments: To appear in AAAI-20

arXiv:1906.11443 [pdf, other]

Region Refinement Network for Salient Object Detection

Authors: Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Jiaze Wang, Ruiyu Li, Xiaoyong Shen, Jiaya Jia

Abstract: Albeit intensively studied, false prediction and unclear boundaries are still major issues of salient object detection. In this paper, we propose a Region Refinement Network (RRN), which recurrently filters redundant information and explicitly models boundary information for saliency detection. Different from existing refinement methods, we propose a Region Refinement Module (RRM) that optimizes salient region prediction by incorporating supervised attention masks in the intermediate refinement stages. The module only brings a minor increase in model size and yet significantly reduces false predictions from the background. To further refine boundary areas, we propose a Boundary Refinement Loss (BRL) that adds extra supervision for better distinguishing foreground from background. BRL is parameter free and easy to train. We further observe that BRL helps retain the integrity in prediction by refining the boundary. Extensive experiments on saliency detection datasets show that our refinement module and loss bring significant improvement to the baseline and can be easily applied to different frameworks. We also demonstrate that our proposed model generalizes well to portrait segmentation and shadow detection tasks.

Submitted 9 October, 2022; v1 submitted 27 June, 2019; originally announced June 2019.

Comments: Tech report

arXiv:1906.03337

Extension of Rough Set Based on Positive Transitive Relation

Authors: Min Shu, Wei Zhu

Abstract: The application of rough set theory in incomplete information systems is a key problem in practice, since missing values almost always occur in knowledge acquisition due to errors in data measurement, limitations of data collection, or limitations of data comprehension. An incomplete information system is mainly processed by compressing the indiscernibility relation. The existing rough set extension models based on tolerance or symmetric similarity relations typically discard one relation among the reflexive, symmetric and transitive relations, especially the transitive relation. In order to overcome the limitations of the current rough set extension models, we define a new relation called the positive transitive relation and then propose a novel rough set extension model built upon it. The new model retains the merits of the existing rough set extension models while avoiding their limitation of discarding transitivity or symmetry. In comparison to the existing extension models, the proposed model performs better in processing incomplete information systems while substantially reducing the computational complexity, taking into account the relation of tolerance and similarity of positive transitivity, and supplementing the related theories in accordance with the intuitive classification of incomplete information.
In summary, the positive transitive relation can improve current theoretical analysis of incomplete information systems, and the newly proposed extension model is more suitable for processing incomplete information systems and has broad application prospects.

Submitted 13 June, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

Comments: 9 pages

arXiv:1905.09647

doi 10.1016/j.physa.2020.124477

Real-time Prediction of Bitcoin Bubble Crashes

Authors: Min Shu, Wei Zhu

Abstract: In the past decade, Bitcoin as an emerging asset class has gained widespread public attention because of its extraordinary returns in phases of extreme price growth and its unpredictable massive crashes. We apply the log-periodic power law singularity (LPPLS) confidence indicator as a diagnostic tool for identifying bubbles, using daily Bitcoin price data from the past two years. We find that the LPPLS confidence indicator based on the daily Bitcoin price data fails to provide effective warnings for detecting bubbles when the Bitcoin price fluctuates sharply over a short time, especially for positive bubbles. In order to diagnose the existence of bubbles and accurately predict bubble crashes in the cryptocurrency market, this study proposes an adaptive multilevel time series detection methodology based on the LPPLS model and a finer-than-daily timescale for the Bitcoin price data. We adopt two levels of time series, 1 hour and 30 minutes, to demonstrate the adaptive multilevel time series detection methodology. The results show that the LPPLS confidence indicator based on this new method is an outstanding instrument to effectively detect bubbles and accurately forecast bubble crashes, even if a bubble exists only for a short time.
In addition, we discover that the short-term LPPLS confidence indicator, which is highly sensitive to extreme fluctuations of the Bitcoin price, can provide useful insights into the bubble status on a shorter time scale (day to week), and that the long-term LPPLS confidence indicator performs stably in monitoring the bubble status on a longer time scale (week to month). The adaptive multilevel time series detection methodology can provide real-time detection of bubbles and advance forecasts of crashes to warn of imminent risk.

Submitted 13 June, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

Comments: 25 pages, 5 figures

MSC Class: 91G70

Showing 1–50 of 55 results for author: Shu, M