Search | arXiv e-print repository

Learning Modality Knowledge Alignment for Cross-Modality Transfer

Authors: Wenxuan Ma, Shuang Li, Lincan Cai, **gxuan Kang

Abstract: Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of pretraining data. Existing works achieve certain success in extending classical finetuning to cross-modal scenarios, yet we still lack understanding about the influence of modality gap on the transfer. In this work, a series of experiments focusing on the source representation… ▽ More Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of pretraining data. Existing works achieve certain success in extending classical finetuning to cross-modal scenarios, yet we still lack understanding about the influence of modality gap on the transfer. In this work, a series of experiments focusing on the source representation quality during transfer are conducted, revealing the connection between larger modality gap and lesser knowledge reuse which means ineffective transfer. We then formalize the gap as the knowledge misalignment between modalities using conditional distribution P(Y|X). Towards this problem, we present Modality kNowledge Alignment (MoNA), a meta-learning approach that learns target data transformation to reduce the modality knowledge discrepancy ahead of the transfer. Experiments show that out method enables better reuse of source modality knowledge in cross-modality transfer, which leads to improvements upon existing finetuning methods. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.18085 [pdf, other]

Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

Authors: Ran Song, Shizhu He, Shengxiang Gao, Li Cai, Kang Liu, Zhengtao Yu, Jun Zhao

Abstract: Multilingual Knowledge Graph Completion (mKGC) aim at solving queries like (h, r, ?) in different languages by reasoning a tail entity t thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages… ▽ More Multilingual Knowledge Graph Completion (mKGC) aim at solving queries like (h, r, ?) in different languages by reasoning a tail entity t thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages, its pretraining tasks cannot be directly aligned with the mKGC tasks. Moreover, the majority of KGs and PLMs currently available exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly in the context of low-resource languages. To overcome previous problems, this paper introduces global and local knowledge constraints for mKGC. The former is used to constrain the reasoning of answer entities, while the latter is used to enhance the representation of query contexts. The proposed method makes the pretrained model better adapt to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, which indicates that our proposed method has significant enhancement on mKGC. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: 11 pages, ACL 2023

arXiv:2406.17225 [pdf, other]

Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images

Authors: Songhan Jiang, Zhengyu Gan, Linghan Cai, Yifeng Wang, Yongbing Zhang

Abstract: Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tu… ▽ More Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tumor microenvironment (TME). (2) Existing multimodal methods often rely on alignment strategies to integrate complementary information, which may lead to information loss due to the inherent heterogeneity between pathology and genes. In this paper, we propose a Multimodal Cross-Task Interaction (MCTI) framework to explore the intrinsic correlations between subtype classification and survival analysis tasks. Specifically, to capture TME-related features in WSIs, we leverage the subtype classification task to mine tumor regions. Simultaneously, multi-head attention mechanisms are applied in genomic feature extraction, adaptively performing genes grou** to obtain task-related genomic embedding. With the joint representation of pathological images and genomic data, we further introduce a Transport-Guided Attention (TGA) module that uses optimal transport theory to model the correlation between subtype classification and survival analysis tasks, effectively transferring potential information. Extensive experiments demonstrate the superiority of our approaches, with MCTI outperforming state-of-the-art frameworks on three public benchmarks. \href{https://github.com/jsh0792/MCTI}{https://github.com/jsh0792/MCTI}. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16427 [pdf, other]

Dynamic Pseudo Label Optimization in Point-Supervised Nuclei Segmentation

Authors: Ziyue Wang, Ye Zhang, Yifeng Wang, Linghan Cai, Yongbing Zhang

Abstract: Deep learning has achieved impressive results in nuclei segmentation, but the massive requirement for pixel-wise labels remains a significant challenge. To alleviate the annotation burden, existing methods generate pseudo masks for model training using point labels. However, the generated masks are inevitably different from the ground truth, and these dissimilarities are not handled reasonably dur… ▽ More Deep learning has achieved impressive results in nuclei segmentation, but the massive requirement for pixel-wise labels remains a significant challenge. To alleviate the annotation burden, existing methods generate pseudo masks for model training using point labels. However, the generated masks are inevitably different from the ground truth, and these dissimilarities are not handled reasonably during the network training, resulting in the subpar performance of the segmentation model. To tackle this issue, we propose a framework named DoNuSeg, enabling \textbf{D}ynamic pseudo label \textbf{O}ptimization in point-supervised \textbf{Nu}clei \textbf{Seg}mentation. Specifically, DoNuSeg takes advantage of class activation maps (CAMs) to adaptively capture regions with semantics similar to annotated points. To leverage semantic diversity in the hierarchical feature levels, we design a dynamic selection module to choose the optimal one among CAMs from different encoder blocks as pseudo masks. Meanwhile, a CAM-guided contrastive module is proposed to further enhance the accuracy of pseudo masks. In addition to exploiting the semantic information provided by CAMs, we consider location priors inherent to point labels, develo** a task-decoupled structure for effectively differentiating nuclei. Extensive experiments demonstrate that DoNuSeg outperforms state-of-the-art point-supervised methods. The code is available at https://github.com/shinning0821/MICCAI24-DoNuSeg. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: early accepted by MICCAI2024

arXiv:2406.15269 [pdf, other]

You Only Acquire Sparse-channel (YOAS): A Unified Framework for Dense-channel EEG Generation

Authors: Hongyu Chen, Weiming Zeng, Luhui Cai, Yueyang Li, Lei Wang, Jia Lu, Hongjie Yan, Wai Ting Siok, Nizhuan Wang

Abstract: High-precision acquisition of dense-channel electroencephalogram (EEG) signals is often impeded by the costliness and lack of portability of equipment. In contrast, generating dense-channel EEG signals effectively from sparse channels shows promise and economic viability. However, sparse-channel EEG poses challenges such as reduced spatial resolution, information loss, signal mixing, and heightene… ▽ More High-precision acquisition of dense-channel electroencephalogram (EEG) signals is often impeded by the costliness and lack of portability of equipment. In contrast, generating dense-channel EEG signals effectively from sparse channels shows promise and economic viability. However, sparse-channel EEG poses challenges such as reduced spatial resolution, information loss, signal mixing, and heightened susceptibility to noise and interference. To address these challenges, we first theoretically formulate the dense-channel EEG generation problem as by optimizing a set of cross-channel EEG signal generation problems. Then, we propose the YOAS framework for generating dense-channel data from sparse-channel EEG signals. The YOAS totally consists of four sequential stages: Data Preparation, Data Preprocessing, Biased-EEG Generation, and Synthetic EEG Generation. Data Preparation and Preprocessing carefully consider the distribution of EEG electrodes and low signal-to-noise ratio problem of EEG signals. Biased-EEG Generation includes sub-modules of BiasEEGGanFormer and BiasEEGDiffFormer, which facilitate long-term feature extraction with attention and generate signals by combining electrode position alignment with diffusion model, respectively. Synthetic EEG Generation synthesizes the final signals, employing a deduction paradigm for multi-channel EEG generation. Extensive experiments confirmed YOAS's feasibility, efficiency, and theoretical validity, even remarkably enhancing data discernibility. This breakthrough in dense-channel EEG signal generation from sparse-channel data opens new avenues for exploration in EEG signal processing and application. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14455 [pdf, other]

MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction

Authors: Luhui Cai, Weiming Zeng, Hongyu Chen, Hua Zhang, Yueyang Li, Hongjie Yan, Lingbin Bian, Nizhuan Wang

Abstract: Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) through the integration of both imaging and non-imaging data. However, the effectiveness of GDL based methods heavily depends on the quality of modeling the multi-modal population graphs and tends to degrade as the graph scale increases. Furthermore, these methods often constrain… ▽ More Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) through the integration of both imaging and non-imaging data. However, the effectiveness of GDL based methods heavily depends on the quality of modeling the multi-modal population graphs and tends to degrade as the graph scale increases. Furthermore, these methods often constrain interactions between imaging and non-imaging data to node-edge interactions within the graph, overlooking complex inter-modal correlations, leading to suboptimal outcomes. To overcome these challenges, we propose MM-GTUNets, an end-to-end graph transformer based multi-modal graph deep learning (MMGDL) framework designed for brain disorders prediction at large scale. Specifically, to effectively leverage rich multi-modal information related to diseases, we introduce Modality Reward Representation Learning (MRRL) which adaptively constructs population graphs using a reward system. Additionally, we employ variational autoencoder to reconstruct latent representations of non-imaging features aligned with imaging features. Based on this, we propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features through a unified GTUNet encoder taking advantages of Graph UNet and Graph Transformer, and feature fusion module. We validated our method on two public multi-modal datasets ABIDE and ADHD-200, demonstrating its superior performance in diagnosing BDs. Our code is available at https://github.com/NZWANG/MM-GTUNets. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13835 [pdf, other]

Bundling in Oligopoly: Revenue Maximization with Single-Item Competitors

Authors: Moshe Babaioff, Linda Cai, Brendan Lucier

Abstract: We consider a principal seller with $m$ heterogeneous products to sell to an additive buyer over independent items. The principal can offer an arbitrary menu of product bundles, but faces competition from smaller and more agile single-item sellers. The single-item sellers choose their prices after the principal commits to a menu, potentially under-cutting the principal's offerings. We explore to w… ▽ More We consider a principal seller with $m$ heterogeneous products to sell to an additive buyer over independent items. The principal can offer an arbitrary menu of product bundles, but faces competition from smaller and more agile single-item sellers. The single-item sellers choose their prices after the principal commits to a menu, potentially under-cutting the principal's offerings. We explore to what extent the principal can leverage the ability to bundle product together to extract revenue. Any choice of menu by the principal induces an oligopoly pricing game between the single-item sellers, which may have multiple equilibria. When there is only a single item this model reduces to Bertrand competition, for which the principal's revenue is $0$ at any equilibrium, so we assume that no single item's value is too dominant. We establish an upper bound on the principal's optimal revenue at every equilibrium: the expected welfare after truncating each item's value to its revenue-maximizing price. Under a technical condition on the value distributions -- that the monopolist's revenue is sufficiently sensitive to price -- we show that the principal seller can simply price the grand-bundle and ensure (in any equilibrium) a constant approximation to this bound (and hence to the optimal revenue). We also show that for some value distributions violating our conditions, grand-bundle pricing does not yield a constant approximation to the optimal revenue in any equilibrium. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to EC 2024

arXiv:2406.09003 [pdf, other]

Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

Authors: Lincan Cai, Shuang Li, Wenxuan Ma, **gxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu

Abstract: Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequence and cosmic ray, poses challenges due to the significant modality discrepancy and scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal f… ▽ More Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequence and cosmic ray, poses challenges due to the significant modality discrepancy and scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal fine-tuning, aiming to transfer a large-scale pretrained model to various target modalities. PaRe employs a gating mechanism to select key patches from both source and target data. Through a modality-agnostic Patch Replacement scheme, these patches are preserved and combined to construct data-rich intermediate modalities ranging from easy to hard. By gradually intermediate modality generation, we can not only effectively bridge the modality gap to enhance stability and transferability of cross-modal fine-tuning, but also address the challenge of limited data in the target modality by leveraging enriched intermediate modality data. Compared with hand-designed, general-purpose, task-specific, and state-of-the-art cross-modal fine-tuning approaches, PaRe demonstrates superior performance across three challenging benchmarks, encompassing more than ten modalities. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.00924 [pdf, ps, other]

Faster Diffusion-based Sampling with Randomized Midpoints: Sequential and Parallel

Authors: Shivam Gupta, Linda Cai, Sitan Chen

Abstract: In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and… ▽ More In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and Lee's randomized midpoint method for log-concave sampling~\cite{ShenL19}. We prove that this approach achieves the best known dimension dependence for sampling from arbitrary smooth distributions in total variation distance ($\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work). We also show that our algorithm can be parallelized to run in only $\widetilde O(\log^2 d)$ parallel rounds, constituting the first provable guarantees for parallel sampling with diffusion models. As a byproduct of our methods, for the well-studied problem of log-concave sampling in total variation distance, we give an algorithm and simple analysis achieving dimension dependence $\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.13002 [pdf, other]

DuetRAG: Collaborative Retrieval-Augmented Generation

Authors: Dian Jiao, Li Cai, **gsheng Huang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Abstract: Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks. However, contemporary RAG approaches suffer from irrelevant knowledge retrieval issues in complex domain questions (e.g., HotPot QA) due to the lack of corresponding domain knowledge, leading to low-quality generation… ▽ More Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks. However, contemporary RAG approaches suffer from irrelevant knowledge retrieval issues in complex domain questions (e.g., HotPot QA) due to the lack of corresponding domain knowledge, leading to low-quality generations. To address this issue, we propose a novel Collaborative Retrieval-Augmented Generation framework, DuetRAG. Our bootstrap** philosophy is to simultaneously integrate the domain fintuning and RAG models to improve the knowledge retrieval quality, thereby enhancing generation quality. Finally, we demonstrate DuetRAG' s matches with expert human researchers on HotPot QA. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: 5 pages

arXiv:2405.06033 [pdf, other]

ReefGlider: A highly maneuverable vectored buoyancy engine based underwater robot

Authors: Kevin Macauley, Levi Cai, Peter Adamczyk, Yogesh Girdhar

Abstract: There exists a capability gap in the design of currently available autonomous underwater vehicles (AUV). Most AUVs use a set of thrusters, and optionally control surfaces, to control their depth and pose. AUVs utilizing thrusters can be highly maneuverable, making them well-suited to operate in complex environments such as in close-proximity to coral reefs. However, they are inherently power-ineff… ▽ More There exists a capability gap in the design of currently available autonomous underwater vehicles (AUV). Most AUVs use a set of thrusters, and optionally control surfaces, to control their depth and pose. AUVs utilizing thrusters can be highly maneuverable, making them well-suited to operate in complex environments such as in close-proximity to coral reefs. However, they are inherently power-inefficient and produce significant noise and disturbance. Underwater gliders, on the other hand, use changes in buoyancy and center of mass, in combination with a control surface to move around. They are extremely power efficient but not very maneuverable. Gliders are designed for long-range missions that do not require precision maneuvering. Furthermore, since gliders only activate the buoyancy engine for small time intervals, they do not disturb the environment and can also be used for passive acoustic observations. In this paper we present ReefGlider, a novel AUV that uses only buoyancy for control but is still highly maneuverable from additional buoyancy control devices. ReefGlider bridges the gap between the capabilities of thruster-driven AUVs and gliders. These combined characteristics make ReefGlider ideal for tasks such as long-term visual and acoustic monitoring of coral reefs. We present the overall design and implementation of the system, as well as provide analysis of some of its capabilities. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: In IEEE International Conference on Robotics and Automation (ICRA), 2024

arXiv:2405.04434 [pdf, other]

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. △ Less

Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.17878 [pdf]

Processing HSV Colored Medical Images and Adapting Color Thresholds for Computational Image Analysis: a Practical Introduction to an open-source tool

Authors: Lie Cai, Andre Pfob

Abstract: Background: Using artificial intelligence (AI) techniques for computational medical image analysis has shown promising results. However, colored images are often not readily available for AI analysis because of different coloring thresholds used across centers and physicians as well as the removal of clinical annotations. We aimed to develop an open-source tool that can adapt different color thres… ▽ More Background: Using artificial intelligence (AI) techniques for computational medical image analysis has shown promising results. However, colored images are often not readily available for AI analysis because of different coloring thresholds used across centers and physicians as well as the removal of clinical annotations. We aimed to develop an open-source tool that can adapt different color thresholds of HSV-colored medical images and remove annotations with a simple click. Materials and Methods: We built a function using MATLAB and used multi-center international shear wave elastography data (NCT 02638935) to test the function. We provide step-by-step instructions with accompanying code lines. Results: We demonstrate that the newly developed pre-processing function successfully removed letters and adapted different color thresholds of HSV-colored medical images. Conclusion: We developed an open-source tool for removing letters and adapting different color thresholds in HSV-colored medical images. We hope this contributes to advancing medical image processing for develo** robust computational imaging algorithms using diverse multi-center big data. The open-source Matlab tool is available at https://github.com/cailiemed/image-threshold-adapting. △ Less

Submitted 27 April, 2024; originally announced April 2024.

Comments: An open-source tool that can adapt different color thresholds of HSV-colored medical images. The newly developed pre-processing Matlab function successfully works on multi-center, international shear wave elastography data (NCT 02638935). Step-by-step instructions with accompanying code lines were provided, easy to follow and reproduce

arXiv:2404.14956 [pdf, other]

DAWN: Domain-Adaptive Weakly Supervised Nuclei Segmentation via Cross-Task Interactions

Authors: Ye Zhang, Yifeng Wang, Zijie Fang, Hao Bian, Linghan Cai, Ziyue Wang, Yongbing Zhang

Abstract: Weakly supervised segmentation methods have gained significant attention due to their ability to reduce the reliance on costly pixel-level annotations during model training. However, the current weakly supervised nuclei segmentation approaches typically follow a two-stage pseudo-label generation and network training process. The performance of the nuclei segmentation heavily relies on the quality… ▽ More Weakly supervised segmentation methods have gained significant attention due to their ability to reduce the reliance on costly pixel-level annotations during model training. However, the current weakly supervised nuclei segmentation approaches typically follow a two-stage pseudo-label generation and network training process. The performance of the nuclei segmentation heavily relies on the quality of the generated pseudo-labels, thereby limiting its effectiveness. This paper introduces a novel domain-adaptive weakly supervised nuclei segmentation framework using cross-task interaction strategies to overcome the challenge of pseudo-label generation. Specifically, we utilize weakly annotated data to train an auxiliary detection task, which assists the domain adaptation of the segmentation network. To enhance the efficiency of domain adaptation, we design a consistent feature constraint module integrating prior knowledge from the source domain. Furthermore, we develop pseudo-label optimization and interactive training methods to improve the domain transfer capability. To validate the effectiveness of our proposed method, we conduct extensive comparative and ablation experiments on six datasets. The results demonstrate the superiority of our approach over existing weakly supervised approaches. Remarkably, our method achieves comparable or even better performance than fully supervised methods. Our code will be released in https://github.com/zhangye-zoe/DAWN. △ Less

Submitted 24 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: 13 pages, 11 figures, 8 tables

arXiv:2404.06103 [pdf, other]

doi 10.5281/zenodo.10114235

Exploring Diverse Sounds: Identifying Outliers in a Music Corpus

Authors: Le Cai, Sam Ferguson, Gengfa Fang, Hani Alshamrani

Abstract: Existing research on music recommendation systems primarily focuses on recommending similar music, thereby often neglecting diverse and distinctive musical recordings. Musical outliers can provide valuable insights due to the inherent diversity of music itself. In this paper, we explore music outliers, investigating their potential usefulness for music discovery and recommendation systems. We argu… ▽ More Existing research on music recommendation systems primarily focuses on recommending similar music, thereby often neglecting diverse and distinctive musical recordings. Musical outliers can provide valuable insights due to the inherent diversity of music itself. In this paper, we explore music outliers, investigating their potential usefulness for music discovery and recommendation systems. We argue that not all outliers should be treated as noise, as they can offer interesting perspectives and contribute to a richer understanding of an artist's work. We introduce the concept of 'Genuine' music outliers and provide a definition for them. These genuine outliers can reveal unique aspects of an artist's repertoire and hold the potential to enhance music discovery by exposing listeners to novel and diverse musical experiences. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Journal ref: The 16th International Symposium on Computer Music Multidisciplinary Research,2023

arXiv:2404.05991 [pdf, other]

Polynomial-time derivation of optimal k-tree topology from Markov networks

Authors: Fereshteh R. Dastjerdi, Liming Cai

Abstract: Characterization of joint probability distribution for large networks of random variables remains a challenging task in data science. Probabilistic graph approximation with simple topologies has practically been resorted to; typically the tree topology makes joint probability computation much simpler and can be effective for statistical inference on insufficient data. However, to characterize netw… ▽ More Characterization of joint probability distribution for large networks of random variables remains a challenging task in data science. Probabilistic graph approximation with simple topologies has practically been resorted to; typically the tree topology makes joint probability computation much simpler and can be effective for statistical inference on insufficient data. However, to characterize network components where multiple variables cooperate closely to influence others, model topologies beyond a tree are needed, which unfortunately are infeasible to acquire. In particular, our previous work has related optimal approximation of Markov networks of tree-width k >=2 closely to the graph-theoretic problem of finding maximum spanning k-tree (MSkT), which is a provably intractable task. This paper investigates optimal approximation of Markov networks with k-tree topology that retains some designated underlying subgraph. Such a subgraph may encode certain background information that arises in scientific applications, for example, about a known significant pathway in gene networks or the indispensable backbone connectivity in the residue interaction graphs for a biomolecule 3D structure. In particular, it is proved that the β-retaining MSkT problem, for a number of classes βof graphs, admit O(n^{k+1})-time algorithms for every fixed k>= 1. These β-retaining MSkT algorithms offer efficient solutions for approximation of Markov networks with k-tree topology in the situation where certain persistent information needs to be retained. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: 20 pages including references, 1 figure

arXiv:2404.00351 [pdf, other]

Rethinking Attention-Based Multiple Instance Learning for Whole-Slide Pathological Image Classification: An Instance Attribute Viewpoint

Authors: Linghan Cai, Shen** Huang, Ye Zhang, **peng Lu, Yongbing Zhang

Abstract: Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis, processing gigapixel-resolution images with slide-level labels. As pioneering efforts, attention-based MIL (ABMIL) and its variants are increasingly becoming popular due to the characteristics of simultaneously handling clinical diagnosis and tumor localization. However, the attention mechanism… ▽ More Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis, processing gigapixel-resolution images with slide-level labels. As pioneering efforts, attention-based MIL (ABMIL) and its variants are increasingly becoming popular due to the characteristics of simultaneously handling clinical diagnosis and tumor localization. However, the attention mechanism exhibits limitations in discriminating between instances, which often misclassifies tissues and potentially impairs MIL performance. This paper proposes an Attribute-Driven MIL (AttriMIL) framework to address these issues. Concretely, we dissect the calculation process of ABMIL and present an attribute scoring mechanism that measures the contribution of each instance to bag prediction effectively, quantifying instance attributes. Based on attribute quantification, we develop a spatial attribute constraint and an attribute ranking constraint to model instance correlations within and across slides, respectively. These constraints encourage the network to capture the spatial correlation and semantic similarity of instances, improving the ability of AttriMIL to distinguish tissue types and identify challenging instances. Additionally, AttriMIL employs a histopathology adaptive backbone that maximizes the pre-trained model's feature extraction capability for collecting pathological features. Extensive experiments on three public benchmarks demonstrate that our AttriMIL outperforms existing state-of-the-art frameworks across multiple evaluation metrics. The implementation code is available at https://github.com/MedCAI/AttriMIL. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: 10 pages, 8 figures

arXiv:2403.18339 [pdf, other]

H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images

Authors: **peng Lu, **gyun Chen, Linghan Cai, Songhan Jiang, Yongbing Zhang

Abstract: Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effec… ▽ More Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effectively model the non-linear dependencies between PET and CT modalities. Recent studies have investigated various approaches to optimize the fusion of modality-specific features for enhancing joint representations. However, modality-specific encoders used in these methods operate independently, inadequately leveraging the synergistic relationships inherent in PET and CT modalities, for example, the complementarity between semantics and structure. To address these issues, we propose a Hierarchical Adaptive Interaction and Weighting Network termed H2ASeg to explore the intrinsic cross-modal correlations and transfer potential complementary information. Specifically, we design a Modality-Cooperative Spatial Attention (MCSA) module that performs intra- and inter-modal interactions globally and locally. Additionally, a Target-Aware Modality Weighting (TAMW) module is developed to highlight tumor-related features within multi-modal features, thereby refining tumor segmentation. By embedding these modules across different layers, H2ASeg can hierarchically model cross-modal correlations, enabling a nuanced understanding of both semantic and structural tumor features. Extensive experiments demonstrate the superiority of H2ASeg, outperforming state-of-the-art methods on AutoPet-II and Hecktor2022 benchmarks. The code is released at https://github.com/**PLu/H2ASeg. △ Less

Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 10 pages,4 figures

arXiv:2403.06898 [pdf, other]

SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions

Authors: Gang Liao, Ye Liu, Yonghua Ding, Le Cai, Jianjun Chen

Abstract: The ubiquity of variable-length integers in data storage and communication necessitates efficient decoding techniques. In this paper, we present SFVInt, a simple and fast approach to decode the prevalent Little Endian Base-128 (LEB128) varints. Our approach effectively utilizes the Bit Manipulation Instruction Set 2 (BMI2) in modern Intel and AMD processors, achieving significant performance impro… ▽ More The ubiquity of variable-length integers in data storage and communication necessitates efficient decoding techniques. In this paper, we present SFVInt, a simple and fast approach to decode the prevalent Little Endian Base-128 (LEB128) varints. Our approach effectively utilizes the Bit Manipulation Instruction Set 2 (BMI2) in modern Intel and AMD processors, achieving significant performance improvement while maintaining simplicity and avoiding overengineering. SFVInt, with its generic design, effectively processes both 32-bit and 64-bit unsigned integers using a unified code template, marking a significant leap forward in varint decoding efficiency. We thoroughly evaluate SFVInt's performance across various datasets and scenarios, demonstrating that it achieves up to a 2x increase in decoding speed when compared to varint decoding methods used in established frameworks like Facebook Folly and Google Protobuf. △ Less

Submitted 7 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: DaMoN 2024

arXiv:2403.04782 [pdf, other]

A Survey on Temporal Knowledge Graph: Representation Learning and Applications

Authors: Li Cai, Xin Mao, Yuhao Zhou, Zhaoguang Long, Changxu Wu, Man Lan

Abstract: Knowledge graphs have garnered significant research attention and are widely used to enhance downstream applications. However, most current studies mainly focus on static knowledge graphs, whose facts do not change with time, and disregard their dynamic evolution over time. As a result, temporal knowledge graphs have attracted more attention because a large amount of structured knowledge exists on… ▽ More Knowledge graphs have garnered significant research attention and are widely used to enhance downstream applications. However, most current studies mainly focus on static knowledge graphs, whose facts do not change with time, and disregard their dynamic evolution over time. As a result, temporal knowledge graphs have attracted more attention because a large amount of structured knowledge exists only within a specific period. Knowledge graph representation learning aims to learn low-dimensional vector embeddings for entities and relations in a knowledge graph. The representation learning of temporal knowledge graphs incorporates time information into the standard knowledge graph framework and can model the dynamics of entities and relations over time. In this paper, we conduct a comprehensive survey of temporal knowledge graph representation learning and its applications. We begin with an introduction to the definitions, datasets, and evaluation metrics for temporal knowledge graph representation learning. Next, we propose a taxonomy based on the core technologies of temporal knowledge graph representation learning methods, and provide an in-depth analysis of different methods in each category. Finally, we present various downstream applications related to the temporal knowledge graphs. In the end, we conclude the paper and have an outlook on the future research directions in this area. △ Less

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.02355 [pdf, other]

Temporal Knowledge Graph Completion with Time-sensitive Relations in Hypercomplex Space

Authors: Li Cai, Xin Mao, Zhihong Wang, Shangqing Zhao, Yuhao Zhou, Changxu Wu, Man Lan

Abstract: Temporal knowledge graph completion (TKGC) aims to fill in missing facts within a given temporal knowledge graph at a specific time. Existing methods, operating in real or complex spaces, have demonstrated promising performance in this task. This paper advances beyond conventional approaches by introducing more expressive quaternion representations for TKGC within hypercomplex space. Unlike existi… ▽ More Temporal knowledge graph completion (TKGC) aims to fill in missing facts within a given temporal knowledge graph at a specific time. Existing methods, operating in real or complex spaces, have demonstrated promising performance in this task. This paper advances beyond conventional approaches by introducing more expressive quaternion representations for TKGC within hypercomplex space. Unlike existing quaternion-based methods, our study focuses on capturing time-sensitive relations rather than time-aware entities. Specifically, we model time-sensitive relations through time-aware rotation and periodic time translation, effectively capturing complex temporal variability. Furthermore, we theoretically demonstrate our method's capability to model symmetric, asymmetric, inverse, compositional, and evolutionary relation patterns. Comprehensive experiments on public datasets validate that our proposed approach achieves state-of-the-art performance in the field of TKGC. △ Less

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.13506 [pdf, other]

Towards Efficient Verification of Constant-Time Cryptographic Implementations

Authors: Luwei Cai, Fu Song, Taolue Chen

Abstract: Timing side-channel attacks exploit secret-dependent execution time to fully or partially recover secrets of cryptographic implementations, posing a severe threat to software security. Constant-time programming discipline is an effective software-based countermeasure against timing side-channel attacks, but develo** constant-time implementations turns out to be challenging and error-prone. Curre… ▽ More Timing side-channel attacks exploit secret-dependent execution time to fully or partially recover secrets of cryptographic implementations, posing a severe threat to software security. Constant-time programming discipline is an effective software-based countermeasure against timing side-channel attacks, but develo** constant-time implementations turns out to be challenging and error-prone. Current verification approaches/tools suffer from scalability and precision issues when applied to production software in practice. In this paper, we put forward practical verification approaches based on a novel synergy of taint analysis and safety verification of self-composed programs. Specifically, we first use an IFDS-based lightweight taint analysis to prove that a large number of potential (timing) side-channel sources do not actually leak secrets. We then resort to a precise taint analysis and a safety verification approach to determine whether the remaining potential side-channel sources can actually leak secrets. These include novel constructions of taint-directed semi-cross-product of the original program and its Boolean abstraction, and a taint-directed self-composition of the program. Our approach is implemented as a cross-platform and fully automated tool CT-Prover. The experiments confirm its efficiency and effectiveness in verifying real-world benchmarks from modern cryptographic and SSL/TLS libraries. In particular, CT-Prover identify new, confirmed vulnerabilities of open-source SSL libraries (e.g., Mbed SSL, BearSSL) and significantly outperforms the state-of-the-art tools. △ Less

Submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted by ACM FSE 2024

arXiv:2402.09588 [pdf, other]

Emerging Opportunities of Using Large Language Models for Translation Between Drug Molecules and Indications

Authors: David Oniani, Jordan Hilsman, Chengxi Zang, Junmei Wang, Lian** Cai, Jan Zawala, Yanshan Wang

Abstract: A drug molecule is a substance that changes the organism's mental or physical state. Every approved drug has an indication, which refers to the therapeutic use of that drug for treating a particular medical condition. While the Large Language Model (LLM), a generative Artificial Intelligence (AI) technique, has recently demonstrated effectiveness in translating between molecules and their textual… ▽ More A drug molecule is a substance that changes the organism's mental or physical state. Every approved drug has an indication, which refers to the therapeutic use of that drug for treating a particular medical condition. While the Large Language Model (LLM), a generative Artificial Intelligence (AI) technique, has recently demonstrated effectiveness in translating between molecules and their textual descriptions, there remains a gap in research regarding their application in facilitating the translation between drug molecules and indications, or vice versa, which could greatly benefit the drug discovery process. The capability of generating a drug from a given indication would allow for the discovery of drugs targeting specific diseases or targets and ultimately provide patients with better treatments. In this paper, we first propose a new task, which is the translation between drug molecules and corresponding indications, and then test existing LLMs on this new task. Specifically, we consider nine variations of the T5 LLM and evaluate them on two public datasets obtained from ChEMBL and DrugBank. Our experiments show the early results of using LLMs for this task and provide a perspective on the state-of-the-art. We also emphasize the current limitations and discuss future work that has the potential to improve the performance on this task. The creation of molecules from indications, or vice versa, will allow for more efficient targeting of diseases and significantly reduce the cost of drug discovery, with the potential to revolutionize the field of drug discovery in the era of generative AI. △ Less

Submitted 16 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.04756 [pdf, other]

Boundary-aware Contrastive Learning for Semi-supervised Nuclei Instance Segmentation

Authors: Ye Zhang, Ziyue Wang, Yifeng Wang, Hao Bian, Linghan Cai, Hengrui Li, Lingbo Zhang, Yongbing Zhang

Abstract: Semi-supervised segmentation methods have demonstrated promising results in natural scenarios, providing a solution to reduce dependency on manual annotation. However, these methods face significant challenges when directly applied to pathological images due to the subtle color differences between nuclei and tissues, as well as the significant morphological variations among nuclei. Consequently, t… ▽ More Semi-supervised segmentation methods have demonstrated promising results in natural scenarios, providing a solution to reduce dependency on manual annotation. However, these methods face significant challenges when directly applied to pathological images due to the subtle color differences between nuclei and tissues, as well as the significant morphological variations among nuclei. Consequently, the generated pseudo-labels often contain much noise, especially at the nuclei boundaries. To address the above problem, this paper proposes a boundary-aware contrastive learning network to denoise the boundary noise in a semi-supervised nuclei segmentation task. The model has two key designs: a low-resolution denoising (LRD) module and a cross-RoI contrastive learning (CRC) module. The LRD improves the smoothness of the nuclei boundary by pseudo-labels denoising, and the CRC enhances the discrimination between foreground and background by boundary feature contrastive learning. We conduct extensive experiments to demonstrate the superiority of our proposed method over existing semi-supervised instance segmentation methods. △ Less

Submitted 7 February, 2024; originally announced February 2024.

Comments: 12 pages, 3 figures, 6 tables

arXiv:2401.17716 [pdf, other]

Enhancing Large Language Model with Decomposed Reasoning for Emotion Cause Pair Extraction

Authors: Jialiang Wu, Yi Shen, Ziheng Zhang, Longjun Cai

Abstract: Emotion-Cause Pair Extraction (ECPE) involves extracting clause pairs representing emotions and their causes in a document. Existing methods tend to overfit spurious correlations, such as positional bias in existing benchmark datasets, rather than capturing semantic features. Inspired by recent work, we explore leveraging large language model (LLM) to address ECPE task without additional training.… ▽ More Emotion-Cause Pair Extraction (ECPE) involves extracting clause pairs representing emotions and their causes in a document. Existing methods tend to overfit spurious correlations, such as positional bias in existing benchmark datasets, rather than capturing semantic features. Inspired by recent work, we explore leveraging large language model (LLM) to address ECPE task without additional training. Despite strong capabilities, LLMs suffer from uncontrollable outputs, resulting in mediocre performance. To address this, we introduce chain-of-thought to mimic human cognitive process and propose the Decomposed Emotion-Cause Chain (DECC) framework. Combining inducing inference and logical pruning, DECC guides LLMs to tackle ECPE task. We further enhance the framework by incorporating in-context learning. Experiment results demonstrate the strength of DECC compared to state-of-the-art supervised fine-tuning methods. Finally, we analyze the effectiveness of each component and the robustness of the method in various scenarios, including different LLM bases, rebalanced datasets, and multi-pair extraction. △ Less

Submitted 31 January, 2024; originally announced January 2024.

Comments: 13 pages, 5 figures

arXiv:2401.11132 [pdf, other]

doi 10.1109/TVCG.2024.3361001

ConceptThread: Visualizing Threaded Concepts in MOOC Videos

Authors: Zhiguang Zhou, Li Ye, Lihong Cai, Lei Wang, Yigang Wang, Yongheng Wang, Wei Chen, Yong Wang

Abstract: Massive Open Online Courses (MOOCs) platforms are becoming increasingly popular in recent years. Online learners need to watch the whole course video on MOOC platforms to learn the underlying new knowledge, which is often tedious and time-consuming due to the lack of a quick overview of the covered knowledge and their structures. In this paper, we propose ConceptThread, a visual analytics approach… ▽ More Massive Open Online Courses (MOOCs) platforms are becoming increasingly popular in recent years. Online learners need to watch the whole course video on MOOC platforms to learn the underlying new knowledge, which is often tedious and time-consuming due to the lack of a quick overview of the covered knowledge and their structures. In this paper, we propose ConceptThread, a visual analytics approach to effectively show the concepts and the relations among them to facilitate effective online learning. Specifically, given that the majority of MOOC videos contain slides, we first leverage video processing and speech analysis techniques, including shot recognition, speech recognition and topic modeling, to extract core knowledge concepts and construct the hierarchical and temporal relations among them. Then, by using a metaphor of thread, we present a novel visualization to intuitively display the concepts based on video sequential flow, and enable learners to perform interactive visual exploration of concepts. We conducted a quantitative study, two case studies, and a user study to extensively evaluate ConceptThread. The results demonstrate the effectiveness and usability of ConceptThread in providing online learners with a quick understanding of the knowledge content of MOOC videos. △ Less

Submitted 20 January, 2024; originally announced January 2024.

Comments: 17 pages, 10 figures, 2 tables

arXiv:2401.09773 [pdf, other]

SEINE: Structure Encoding and Interaction Network for Nuclei Instance Segmentation

Authors: Ye Zhang, Linghan Cai, Ziyue Wang, Yongbing Zhang

Abstract: Nuclei instance segmentation in histopathological images is of great importance for biological analysis and cancer diagnosis but remains challenging for two reasons. (1) Similar visual presentation of intranuclear and extranuclear regions of chromophobe nuclei often causes under-segmentation, and (2) current methods lack the exploration of nuclei structure, resulting in fragmented instance predict… ▽ More Nuclei instance segmentation in histopathological images is of great importance for biological analysis and cancer diagnosis but remains challenging for two reasons. (1) Similar visual presentation of intranuclear and extranuclear regions of chromophobe nuclei often causes under-segmentation, and (2) current methods lack the exploration of nuclei structure, resulting in fragmented instance predictions. To address these problems, this paper proposes a structure encoding and interaction network, termed SEINE, which develops the structure modeling scheme of nuclei and exploits the structure similarity between nuclei to improve the integrality of each segmented instance. Concretely, SEINE introduces a contour-based structure encoding (SE) that considers the correlation between nuclei structure and semantics, realizing a reasonable representation of the nuclei structure. Based on the encoding, we propose a structure-guided attention (SGA) module that takes the clear nuclei as prototypes to enhance the structure learning for the fuzzy nuclei. To strengthen the structural learning ability, a semantic feature fusion (SFF) is presented to boost the semantic consistency of semantic and structure branches. Furthermore, a position enhancement (PE) method is applied to suppress incorrect nuclei boundary predictions. Extensive experiments demonstrate the superiority of our approaches, and SEINE achieves state-of-the-art (SOTA) performance on four datasets. The code is available at https://github.com/zhangye-zoe/SEINE. △ Less

Submitted 8 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 10 pages, 12 figures, 6 tables, submitted to TMI

arXiv:2401.08123 [pdf, other]

The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation

Authors: Xinni Jiang, Zengsheng Kuang, Chunle Guo, Ruixun Zhang, Lei Cai, Xiao Fan, Chongyi Li

Abstract: Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene. Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection. In this study, we rethink some essential components in GDSR netwo… ▽ More Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene. Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection. In this study, we rethink some essential components in GDSR networks and propose a simple yet effective Dynamic Dual Alignment and Aggregation network (D2A2). D2A2 mainly consists of 1) a dynamic dual alignment module that adapts to alleviate the modal misalignment via a learnable domain alignment block and geometrically align cross-modal features by learning the offset; and 2) a mask-to-pixel feature aggregate module that uses the gated mechanism and pixel attention to filter out irrelevant texture noise from RGB features and combine the useful features with depth features. By combining the strengths of RGB and depth features while minimizing disturbance introduced by the RGB image, our method with simple reuse and redesign of basic components achieves state-of-the-art performance on multiple benchmark datasets. The code is available at https://github.com/JiangXinni/D2A2. △ Less

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.05602 [pdf]

Nucleus subtype classification using inter-modality learning

Authors: Lucas W. Remedios, Shunxing Bao, Samuel W. Remedios, Ho Hin Lee, Leon Y. Cai, Thomas Li, Ruining Deng, Can Cui, Jia Li, Qi Liu, Ken S. Lau, Joseph T. Roland, Mary K. Washington, Lori A. Coburn, Keith T. Wilson, Yuankai Huo, Bennett A. Landman

Abstract: Understanding the way cells communicate, co-locate, and interrelate is essential to understanding human physiology. Hematoxylin and eosin (H&E) staining is ubiquitously available both for clinical studies and research. The Colon Nucleus Identification and Classification (CoNIC) Challenge has recently innovated on robust artificial intelligence labeling of six cell types on H&E stains of the colon.… ▽ More Understanding the way cells communicate, co-locate, and interrelate is essential to understanding human physiology. Hematoxylin and eosin (H&E) staining is ubiquitously available both for clinical studies and research. The Colon Nucleus Identification and Classification (CoNIC) Challenge has recently innovated on robust artificial intelligence labeling of six cell types on H&E stains of the colon. However, this is a very small fraction of the number of potential cell classification types. Specifically, the CoNIC Challenge is unable to classify epithelial subtypes (progenitor, endocrine, goblet), lymphocyte subtypes (B, helper T, cytotoxic T), or connective subtypes (fibroblasts, stromal). In this paper, we propose to use inter-modality learning to label previously un-labelable cell types on virtual H&E. We leveraged multiplexed immunofluorescence (MxIF) histology imaging to identify 14 subclasses of cell types. We performed style transfer to synthesize virtual H&E from MxIF and transferred the higher density labels from MxIF to these virtual H&E images. We then evaluated the efficacy of learning in this approach. We identified helper T and progenitor nuclei with positive predictive values of $0.34 \pm 0.15$ (prevalence $0.03 \pm 0.01$) and $0.47 \pm 0.1$ (prevalence $0.07 \pm 0.02$) respectively on virtual H&E. This approach represents a promising step towards automating annotation in digital pathology. △ Less

Submitted 28 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.03571 [pdf, other]

α-HMM: A Graphical Model for RNA Folding

Authors: Sixiang Zhang, Aaron J. Yang, Liming Cai

Abstract: RNA secondary structure is modeled with the novel arbitrary-order hidden Markov model (α-HMM). The α-HMM extends over the traditional HMM with capability to model stochastic events that may be in influenced by historically distant ones, making it suitable to account for long-range canonical base pairings between nucleotides, which constitute the RNA secondary structure. Unlike previous heavy-weigh… ▽ More RNA secondary structure is modeled with the novel arbitrary-order hidden Markov model (α-HMM). The α-HMM extends over the traditional HMM with capability to model stochastic events that may be in influenced by historically distant ones, making it suitable to account for long-range canonical base pairings between nucleotides, which constitute the RNA secondary structure. Unlike previous heavy-weight extensions over HMM, the α-HMM has the flexibility to apply restrictions on how one event may influence another in stochastic processes, enabling efficient prediction of RNA secondary structure including pseudoknots. △ Less

Submitted 7 January, 2024; originally announced January 2024.

Comments: 14 pages, 5 figures, 1 table

arXiv:2312.12666 [pdf, other]

Incremental Semi-supervised Federated Learning for Health Inference via Mobile Sensing

Authors: Guimin Dong, Lihua Cai, Mingyue Tang, Laura E. Barnes, Mehdi Boukhechba

Abstract: Mobile sensing appears as a promising solution for health inference problem (e.g., influenza-like symptom recognition) by leveraging diverse smart sensors to capture fine-grained information about human behaviors and ambient contexts. Centralized training of machine learning models can place mobile users' sensitive information under privacy risks due to data breach and misexploitation. Federated L… ▽ More Mobile sensing appears as a promising solution for health inference problem (e.g., influenza-like symptom recognition) by leveraging diverse smart sensors to capture fine-grained information about human behaviors and ambient contexts. Centralized training of machine learning models can place mobile users' sensitive information under privacy risks due to data breach and misexploitation. Federated Learning (FL) enables mobile devices to collaboratively learn global models without the exposure of local private data. However, there are challenges of on-device FL deployment using mobile sensing: 1) long-term and continuously collected mobile sensing data may exhibit domain shifts as sensing objects (e.g. humans) have varying behaviors as a result of internal and/or external stimulus; 2) model retraining using all available data may increase computation and memory burden; and 3) the sparsity of annotated crowd-sourced data causes supervised FL to lack robustness. In this work, we propose FedMobile, an incremental semi-supervised federated learning algorithm, to train models semi-supervisedly and incrementally in a decentralized online fashion. We evaluate FedMobile using a real-world mobile sensing dataset for influenza-like symptom recognition. Our empirical results show that FedMobile-trained models achieve the best results in comparison to the selected baseline methods. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.04748 [pdf, other]

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Authors: Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Ling Cai, Nathalie Baracaldo

Abstract: Growing applications of large language models (LLMs) trained by a third party raise serious concerns on the security vulnerability of LLMs.It has been demonstrated that malicious actors can covertly exploit these vulnerabilities in LLMs through poisoning attacks aimed at generating undesirable outputs. While poisoning attacks have received significant attention in the image domain (e.g., object de… ▽ More Growing applications of large language models (LLMs) trained by a third party raise serious concerns on the security vulnerability of LLMs.It has been demonstrated that malicious actors can covertly exploit these vulnerabilities in LLMs through poisoning attacks aimed at generating undesirable outputs. While poisoning attacks have received significant attention in the image domain (e.g., object detection), and classification tasks, their implications for generative models, particularly in the realm of natural language generation (NLG) tasks, remain poorly understood. To bridge this gap, we perform a comprehensive exploration of various poisoning techniques to assess their effectiveness across a range of generative tasks. Furthermore, we introduce a range of metrics designed to quantify the success and stealthiness of poisoning attacks specifically tailored to NLG tasks. Through extensive experiments on multiple NLG tasks, LLMs and datasets, we show that it is possible to successfully poison an LLM during the fine-tuning stage using as little as 1\% of the total tuning data samples. Our paper presents the first systematic approach to comprehend poisoning attacks targeting NLG tasks considering a wide range of triggers and attack settings. We hope our findings will assist the AI security community in devising appropriate defenses against such threats. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: 19 pages, 6 figures. Published at NeurIPS 2023 Workshop on Backdoors in Deep Learning: The Good, the Bad, and the Ugly

arXiv:2312.01151 [pdf]

doi 10.5281/zenodo.8286277

Here Is Not There: Measuring Entailment-Based Trajectory Similarity for Location-Privacy Protection and Beyond

Authors: Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi, **meng Rao, Song Gao, Ling Cai, Anita Graser

Abstract: While the paths humans take play out in social as well as physical space, measures to describe and compare their trajectories are carried out in abstract, typically Euclidean, space. When these measures are applied to trajectories of actual individuals in an application area, alterations that are inconsequential in abstract space may suddenly become problematic once overlaid with geographic realit… ▽ More While the paths humans take play out in social as well as physical space, measures to describe and compare their trajectories are carried out in abstract, typically Euclidean, space. When these measures are applied to trajectories of actual individuals in an application area, alterations that are inconsequential in abstract space may suddenly become problematic once overlaid with geographic reality. In this work, we present a different view on trajectory similarity by introducing a measure that utilizes logical entailment. This is an inferential perspective that considers facts as triple statements deduced from the social and environmental context in which the travel takes place, and their practical implications. We suggest a formalization of entailment-based trajectory similarity, measured as the overlap** proportion of facts, which are spatial relation statements in our case study. With the proposed measure, we evaluate LSTM-TrajGAN, a privacy-preserving trajectory-generation model. The entailment-based model evaluation reveals potential consequences of disregarding the rich structure of geographic space (e.g., miscalculated insurance risk due to regional shifts in our toy example). Our work highlights the advantage of applying logical entailment to trajectory-similarity reasoning for location-privacy protection and beyond. △ Less

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.08782 [pdf, other]

Language Semantic Graph Guided Data-Efficient Learning

Authors: Wenxuan Ma, Shuang Li, Lincan Cai, **gxuan Kang

Abstract: Develo** generalizable models that can effectively learn from limited data and with minimal reliance on human supervision is a significant objective within the machine learning community, particularly in the era of deep neural networks. Therefore, to achieve data-efficient learning, researchers typically explore approaches that can leverage more related or unlabeled data without necessitating ad… ▽ More Develo** generalizable models that can effectively learn from limited data and with minimal reliance on human supervision is a significant objective within the machine learning community, particularly in the era of deep neural networks. Therefore, to achieve data-efficient learning, researchers typically explore approaches that can leverage more related or unlabeled data without necessitating additional manual labeling efforts, such as Semi-Supervised Learning (SSL), Transfer Learning (TL), and Data Augmentation (DA). SSL leverages unlabeled data in the training process, while TL enables the transfer of expertise from related data distributions. DA broadens the dataset by synthesizing new data from existing examples. However, the significance of additional knowledge contained within labels has been largely overlooked in research. In this paper, we propose a novel perspective on data efficiency that involves exploiting the semantic information contained in the labels of the available data. Specifically, we introduce a Language Semantic Graph (LSG) which is constructed from labels manifest as natural language descriptions. Upon this graph, an auxiliary graph neural network is trained to extract high-level semantic relations and then used to guide the training of the primary model, enabling more adequate utilization of label knowledge. Across image, video, and audio modalities, we utilize the LSG method in both TL and SSL scenarios and illustrate its versatility in significantly enhancing performance compared to other data-efficient learning approaches. Additionally, our in-depth analysis shows that the LSG method also expedites the training process. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2310.13398 [pdf, other]

OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data

Authors: Yijie Zhou, Likun Cai, Xianhui Cheng, Zhongxue Gan, Xiangyang Xue, Wenchao Ding

Abstract: In the era of big data and large models, automatic annotating functions for multi-modal data are of great significance for real-world AI-driven applications, such as autonomous driving and embodied AI. Unlike traditional closed-set annotation, open-vocabulary annotation is essential to achieve human-level cognition capability. However, there are few open-vocabulary auto-labeling systems for multi-… ▽ More In the era of big data and large models, automatic annotating functions for multi-modal data are of great significance for real-world AI-driven applications, such as autonomous driving and embodied AI. Unlike traditional closed-set annotation, open-vocabulary annotation is essential to achieve human-level cognition capability. However, there are few open-vocabulary auto-labeling systems for multi-modal 3D data. In this paper, we introduce OpenAnnotate3D, an open-source open-vocabulary auto-labeling system that can automatically generate 2D masks, 3D masks, and 3D bounding box annotations for vision and point cloud data. Our system integrates the chain-of-thought capabilities of Large Language Models (LLMs) and the cross-modality capabilities of vision-language models (VLMs). To the best of our knowledge, OpenAnnotate3D is one of the pioneering works for open-vocabulary multi-modal 3D auto-labeling. We conduct comprehensive evaluations on both public and in-house real-world datasets, which demonstrate that the system significantly improves annotation efficiency compared to manual annotation while providing accurate open-vocabulary auto-annotating results. △ Less

Submitted 20 October, 2023; originally announced October 2023.

Comments: The source code will be released at https://github.com/Fudan-ProjectTitan/OpenAnnotate3D

arXiv:2310.11246 [pdf, other]

Query2Triple: Unified Query Encoding for Answering Diverse Complex Queries over Knowledge Graphs

Authors: Yao Xu, Shizhu He, Cunguang Wang, Li Cai, Kang Liu, Jun Zhao

Abstract: Complex Query Answering (CQA) is a challenge task of Knowledge Graph (KG). Due to the incompleteness of KGs, query embedding (QE) methods have been proposed to encode queries and entities into the same embedding space, and treat logical operators as neural set operators to obtain answers. However, these methods train KG embeddings and neural set operators concurrently on both simple (one-hop) and… ▽ More Complex Query Answering (CQA) is a challenge task of Knowledge Graph (KG). Due to the incompleteness of KGs, query embedding (QE) methods have been proposed to encode queries and entities into the same embedding space, and treat logical operators as neural set operators to obtain answers. However, these methods train KG embeddings and neural set operators concurrently on both simple (one-hop) and complex (multi-hop and logical) queries, which causes performance degradation on simple queries and low training efficiency. In this paper, we propose Query to Triple (Q2T), a novel approach that decouples the training for simple and complex queries. Q2T divides the training into two stages: (1) Pre-training a neural link predictor on simple queries to predict tail entities based on the head entity and relation. (2) Training a query encoder on complex queries to encode diverse complex queries into a unified triple form that can be efficiently solved by the pretrained neural link predictor. Our proposed Q2T is not only efficient to train, but also modular, thus easily adaptable to various neural link predictors that have been studied well. Extensive experiments demonstrate that, even without explicit modeling for neural set operators, Q2T still achieves state-of-the-art performance on diverse complex queries over three public benchmarks. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP 2023 findings

arXiv:2309.14555 [pdf, ps, other]

Optimal Stop** with Multi-Dimensional Comparative Loss Aversion

Authors: Linda Cai, Joshua Gardner, S. Matthew Weinberg

Abstract: Despite having the same basic prophet inequality setup and model of loss aversion, conclusions in our multi-dimensional model differs considerably from the one-dimensional model of Kleinberg et al. For example, Kleinberg et al. gives a tight closed-form on the competitive ratio that an online decision-maker can achieve as a function of $λ$, for any $λ\geq 0$. In our multi-dimensional model, there… ▽ More Despite having the same basic prophet inequality setup and model of loss aversion, conclusions in our multi-dimensional model differs considerably from the one-dimensional model of Kleinberg et al. For example, Kleinberg et al. gives a tight closed-form on the competitive ratio that an online decision-maker can achieve as a function of $λ$, for any $λ\geq 0$. In our multi-dimensional model, there is a sharp phase transition: if $k$ denotes the number of dimensions, then when $λ\cdot (k-1) \geq 1$, no non-trivial competitive ratio is possible. On the other hand, when $λ\cdot (k-1) < 1$, we give a tight bound on the achievable competitive ratio (similar to Kleinberg et al.). As another example, Kleinberg et al. uncovers an exponential improvement in their competitive ratio for the random-order vs. worst-case prophet inequality problem. In our model with $k\geq 2$ dimensions, the gap is at most a constant-factor. We uncover several additional key differences in the multi- and single-dimensional models. △ Less

Submitted 26 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted to WINE 2023

arXiv:2309.09392 [pdf, other]

Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization

Authors: Xin Yu, Qi Yang, Yucheng Tang, Riqiang Gao, Shunxing Bao, Leon Y. Cai, Ho Hin Lee, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

Abstract: Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leading to different or… ▽ More Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leading to different organs/tissues captured. To address this issue, we propose C-SliceGen, which takes an arbitrary axial slice in the abdominal region as a condition and generates a pre-defined vertebral level slice by estimating structural changes in the latent space. Our experiments on 2608 volumetric CT data from two in-house datasets and 50 subjects from the 2015 Multi-Atlas Abdomen Labeling Challenge dataset (BTCV) Challenge demonstrate that our model can generate high-quality images that are realistic and similar. We further evaluate our method's capability to harmonize longitudinal positional variation on 1033 subjects from the Baltimore Longitudinal Study of Aging (BLSA) dataset, which contains longitudinal single abdominal slices, and confirmed that our method can harmonize the slice positional variance in terms of visceral fat area. This approach provides a promising direction for map** slices from different vertebral levels to a target slice and reducing positional variance for single-slice longitudinal analysis. The source code is available at: https://github.com/MASILab/C-SliceGen. △ Less

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.05446 [pdf, other]

A Localization-to-Segmentation Framework for Automatic Tumor Segmentation in Whole-Body PET/CT Images

Authors: Linghan Cai, Jianhao Huang, Zihang Zhu, **peng Lu, Yongbing Zhang

Abstract: Fluorodeoxyglucose (FDG) positron emission tomography (PET) combined with computed tomography (CT) is considered the primary solution for detecting some cancers, such as lung cancer and melanoma. Automatic segmentation of tumors in PET/CT images can help reduce doctors' workload, thereby improving diagnostic quality. However, precise tumor segmentation is challenging due to the small size of many… ▽ More Fluorodeoxyglucose (FDG) positron emission tomography (PET) combined with computed tomography (CT) is considered the primary solution for detecting some cancers, such as lung cancer and melanoma. Automatic segmentation of tumors in PET/CT images can help reduce doctors' workload, thereby improving diagnostic quality. However, precise tumor segmentation is challenging due to the small size of many tumors and the similarity of high-uptake normal areas to the tumor regions. To address these issues, this paper proposes a localization-to-segmentation framework (L2SNet) for precise tumor segmentation. L2SNet first localizes the possible lesions in the lesion localization phase and then uses the location cues to shape the segmentation results in the lesion segmentation phase. To further improve the segmentation performance of L2SNet, we design an adaptive threshold scheme that takes the segmentation results of the two phases into consideration. The experiments with the MICCAI 2023 Automated Lesion Segmentation in Whole-Body FDG-PET/CT challenge dataset show that our method achieved a competitive result and was ranked in the top 7 methods on the preliminary test set. Our work is available at: https://github.com/MedCAI/L2SNet. △ Less

Submitted 14 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 7 pages,3 figures

arXiv:2309.04344 [pdf, other]

Zero-Shot Robustification of Zero-Shot Models

Authors: Dyah Adila, Changho Shin, Linrong Cai, Frederic Sala

Abstract: Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We p… ▽ More Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose RoboShot, a method that improves the robustness of pretrained model embeddings in a fully zero-shot fashion. First, we use language models (LMs) to obtain useful insights from task descriptions. These insights are embedded and used to remove harmful and boost useful components in embeddings -- without any supervision. Theoretically, we provide a simple and tractable model for biases in zero-shot embeddings and give a result characterizing under what conditions our approach can boost performance. Empirically, we evaluate RoboShot on nine image and NLP classification tasks and show an average improvement of 15.98% on worst group accuracy, with trivial decrease in overall accuracy over several zero-shot baselines. Additionally, we demonstrate that RoboShot is compatible with a variety of pretrained and language models and propose a way to further boost performance with a zero-shot adaptation variant. △ Less

Submitted 12 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: International Conference on Learning Representations (ICLR), 2024

arXiv:2308.12242 [pdf, ps, other]

Recent Developments in Pandora's Box Problem: Variants and Applications

Authors: Hedyeh Beyhaghi, Linda Cai

Abstract: In 1979, Weitzman introduced Pandora's box problem as a framework for sequential search with costly inspections. Recently, there has been a surge of interest in Pandora's box problem, particularly among researchers working at the intersection of economics and computation. This survey provides an overview of the recent literature on Pandora's box problem, including its latest extensions and applica… ▽ More In 1979, Weitzman introduced Pandora's box problem as a framework for sequential search with costly inspections. Recently, there has been a surge of interest in Pandora's box problem, particularly among researchers working at the intersection of economics and computation. This survey provides an overview of the recent literature on Pandora's box problem, including its latest extensions and applications in areas such as market design, decision theory, and machine learning. △ Less

Submitted 23 August, 2023; originally announced August 2023.

Comments: The survey appears in ACM SIGecom Exchanges, Vol. 21, No. 1, June 2023. https://www.sigecom.org/exchanges/volume_21/1/BEYHAGHI.pdf

arXiv:2307.12548 [pdf]

MFMAN-YOLO: A Method for Detecting Pole-like Obstacles in Complex Environment

Authors: Lei Cai, Hao Wang, Congling Zhou, Yongqiang Wang, Boyu Liu

Abstract: In real-world traffic, there are various uncertainties and complexities in road and weather conditions. To solve the problem that the feature information of pole-like obstacles in complex environments is easily lost, resulting in low detection accuracy and low real-time performance, a multi-scale hybrid attention mechanism detection algorithm is proposed in this paper. First, the optimal transport… ▽ More In real-world traffic, there are various uncertainties and complexities in road and weather conditions. To solve the problem that the feature information of pole-like obstacles in complex environments is easily lost, resulting in low detection accuracy and low real-time performance, a multi-scale hybrid attention mechanism detection algorithm is proposed in this paper. First, the optimal transport function Monge-Kantorovich (MK) is incorporated not only to solve the problem of overlap** multiple prediction frames with optimal matching but also the MK function can be regularized to prevent model over-fitting; then, the features at different scales are up-sampled separately according to the optimized efficient multi-scale feature pyramid. Finally, the extraction of multi-scale feature space channel information is enhanced in complex environments based on the hybrid attention mechanism, which suppresses the irrelevant complex environment background information and focuses the feature information of pole-like obstacles. Meanwhile, this paper conducts real road test experiments in a variety of complex environments. The experimental results show that the detection precision, recall, and average precision of the method are 94.7%, 93.1%, and 97.4%, respectively, and the detection frame rate is 400 f/s. This research method can detect pole-like obstacles in a complex road environment in real time and accurately, which further promotes innovation and progress in the field of automatic driving. △ Less

Submitted 24 July, 2023; originally announced July 2023.

Comments: 11 pages

ACM Class: I.4.1; I.2.10

arXiv:2307.06863 [pdf, ps, other]

Measuring a Low-Earth-Orbit Satellite Network

Authors: Jian** Pan, **wei Zhao, Lin Cai

Abstract: Starlink and alike have attracted a lot of attention recently, however, the inner working of these low-earth-orbit (LEO) satellite networks is still largely unknown. This paper presents an ongoing measurement campaign focusing on Starlink, including its satellite access networks, gateway and point-of-presence structures, and backbone and Internet connections, revealing insights applicable to other… ▽ More Starlink and alike have attracted a lot of attention recently, however, the inner working of these low-earth-orbit (LEO) satellite networks is still largely unknown. This paper presents an ongoing measurement campaign focusing on Starlink, including its satellite access networks, gateway and point-of-presence structures, and backbone and Internet connections, revealing insights applicable to other LEO satellite providers. It also highlights the challenges and research opportunities of the integrated space-air-ground-aqua network envisioned by 6G mobile communication systems, and calls for a concerted community effort from practical and experimentation aspects. △ Less

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2307.06013 [pdf, other]

An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation

Authors: Li Cai, Xin Mao, Youshao Xiao, Changxu Wu, Man Lan

Abstract: Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer th… ▽ More Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method. △ Less

Submitted 12 July, 2023; originally announced July 2023.

Comments: Accepted by IJCAI 2023

arXiv:2307.04175 [pdf, ps, other]

Selling to Multiple No-Regret Buyers

Authors: Linda Cai, S. Matthew Weinberg, Evan Wildenhain, Shirley Zhang

Abstract: We consider the problem of repeatedly auctioning a single item to multiple i.i.d buyers who each use a no-regret learning algorithm to bid over time. In particular, we study the seller's optimal revenue, if they know that the buyers are no-regret learners (but only that their behavior satisfies some no-regret property -- they do not know the precise algorithm/heuristic used). Our main result des… ▽ More We consider the problem of repeatedly auctioning a single item to multiple i.i.d buyers who each use a no-regret learning algorithm to bid over time. In particular, we study the seller's optimal revenue, if they know that the buyers are no-regret learners (but only that their behavior satisfies some no-regret property -- they do not know the precise algorithm/heuristic used). Our main result designs an auction that extracts revenue equal to the full expected welfare whenever the buyers are "mean-based" (a property satisfied by standard no-regret learning algorithms such as Multiplicative Weights, Follow-the-Perturbed-Leader, etc.). This extends a main result of [BMSW18] which held only for a single buyer. Our other results consider the case when buyers are mean-based but never overbid. On this front, [BMSW18] provides a simple LP formulation for the revenue-maximizing auction for a single-buyer. We identify several formal barriers to extending this approach to multiple buyers. △ Less

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2306.02900 [pdf, other]

Robust Fiber ODF Estimation Using Deep Constrained Spherical Deconvolution for Diffusion MRI

Authors: Tianyuan Yao, Francois Rheault, Leon Y Cai, Vishwesh nath, Zuhayr Asad, Nancy Newlin, Can Cui, Ruining Deng, Karthik Ramadass, Andrea Shafer, Susan Resnick, Kurt Schilling, Bennett A. Landman, Yuankai Huo

Abstract: Diffusion-weighted magnetic resonance imaging (DW-MRI) is a critical imaging method for capturing and modeling tissue microarchitecture at a millimeter scale. A common practice to model the measured DW-MRI signal is via fiber orientation distribution function (fODF). This function is the essential first step for the downstream tractography and connectivity analyses. With recent advantages in data… ▽ More Diffusion-weighted magnetic resonance imaging (DW-MRI) is a critical imaging method for capturing and modeling tissue microarchitecture at a millimeter scale. A common practice to model the measured DW-MRI signal is via fiber orientation distribution function (fODF). This function is the essential first step for the downstream tractography and connectivity analyses. With recent advantages in data sharing, large-scale multi-site DW-MRI datasets are being made available for multi-site studies. However, measurement variabilities (e.g., inter- and intra-site variability, hardware performance, and sequence design) are inevitable during the acquisition of DW-MRI. Most existing model-based methods (e.g., constrained spherical deconvolution (CSD)) and learning based methods (e.g., deep learning (DL)) do not explicitly consider such variabilities in fODF modeling, which consequently leads to inferior performance on multi-site and/or longitudinal diffusion studies. In this paper, we propose a novel data-driven deep constrained spherical deconvolution method to explicitly constrain the scan-rescan variabilities for a more reproducible and robust estimation of brain microstructure from repeated DW-MRI scans. Specifically, the proposed method introduces a new 3D volumetric scanner-invariant regularization scheme during the fODF estimation. We study the Human Connectome Project (HCP) young adults test-retest group as well as the MASiVar dataset (with inter- and intra-site scan/rescan data). The Baltimore Longitudinal Study of Aging (BLSA) dataset is employed for external validation. From the experimental results, the proposed data-driven framework outperforms the existing benchmarks in repeated fODF estimation. The proposed method is assessing the downstream connectivity analysis and shows increased performance in distinguishing subjects with different biomarkers. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: 33 pages, 7 figures

arXiv:2306.01665 [pdf, other]

SourceP: Detecting Ponzi Schemes on Ethereum with Source Code

Authors: Pengcheng Lu, Liang Cai, Keting Yin

Abstract: As blockchain technology becomes more and more popular, a typical financial scam, the Ponzi scheme, has also emerged in the blockchain platform Ethereum. This Ponzi scheme deployed through smart contracts, also known as the smart Ponzi scheme, has caused a lot of economic losses and negative impacts. Existing methods for detecting smart Ponzi schemes on Ethereum mainly rely on bytecode features, o… ▽ More As blockchain technology becomes more and more popular, a typical financial scam, the Ponzi scheme, has also emerged in the blockchain platform Ethereum. This Ponzi scheme deployed through smart contracts, also known as the smart Ponzi scheme, has caused a lot of economic losses and negative impacts. Existing methods for detecting smart Ponzi schemes on Ethereum mainly rely on bytecode features, opcode features, account features, and transaction behavior features of smart contracts, which are unable to truly characterize the behavioral features of Ponzi schemes, and thus generally perform poorly in terms of detection accuracy and false alarm rates. In this paper, we propose SourceP, a method to detect smart Ponzi schemes on the Ethereum platform using pre-trained models and data flow, which only requires using the source code of smart contracts as features. SourceP reduces the difficulty of data acquisition and feature extraction of existing detection methods. Specifically, we first convert the source code of a smart contract into a data flow graph and then introduce a pre-trained model based on learning code representations to build a classification model to identify Ponzi schemes in smart contracts. The experimental results show that SourceP achieves 87.2% recall and 90.7% F-score for detecting smart Ponzi schemes within Ethereum's smart contract dataset, outperforming state-of-the-art methods in terms of performance and sustainability. We also demonstrate through additional experiments that pre-trained models and data flow play an important contribution to SourceP, as well as proving that SourceP has a good generalization ability. △ Less

Submitted 29 February, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: 12 pages, 5 figures, 4 tables

arXiv:2305.02330 [pdf, other]

Robot Goes Fishing: Rapid, High-Resolution Biological Hotspot Map** in Coral Reefs with Vision-Guided Autonomous Underwater Vehicles

Authors: Daniel Yang, Levi Cai, Stewart Jamieson, Yogesh Girdhar

Abstract: Coral reefs are fast-changing and complex ecosystems that are crucial to monitor and study. Biological hotspot detection can help coral reef managers prioritize limited resources for monitoring and intervention tasks. Here, we explore the use of autonomous underwater vehicles (AUVs) with cameras, coupled with visual detectors and photogrammetry, to map and identify these hotspots. This approach ca… ▽ More Coral reefs are fast-changing and complex ecosystems that are crucial to monitor and study. Biological hotspot detection can help coral reef managers prioritize limited resources for monitoring and intervention tasks. Here, we explore the use of autonomous underwater vehicles (AUVs) with cameras, coupled with visual detectors and photogrammetry, to map and identify these hotspots. This approach can provide high spatial resolution information in fast feedback cycles. To the best of our knowledge, we present one of the first attempts at using an AUV to gather visually-observed, fine-grain biological hotspot maps in concert with topography of a coral reefs. Our hotspot maps correlate with rugosity, an established proxy metric for coral reef biodiversity and abundance, as well as with our visual inspections of the 3D reconstruction. We also investigate issues of scaling this approach when applied to new reefs by using these visual detectors pre-trained on large public datasets. △ Less

Submitted 1 February, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: CV4Animals Workshop at CVPR 2023

arXiv:2304.12149 [pdf, other]

Exploring shared memory architectures for end-to-end gigapixel deep learning

Authors: Lucas W. Remedios, Leon Y. Cai, Samuel W. Remedios, Karthik Ramadass, Aravind Krishnan, Ruining Deng, Can Cui, Shunxing Bao, Lori A. Coburn, Yuankai Huo, Bennett A. Landman

Abstract: Deep learning has made great strides in medical imaging, enabled by hardware advances in GPUs. One major constraint for the development of new models has been the saturation of GPU memory resources during training. This is especially true in computational pathology, where images regularly contain more than 1 billion pixels. These pathological images are traditionally divided into small patches to… ▽ More Deep learning has made great strides in medical imaging, enabled by hardware advances in GPUs. One major constraint for the development of new models has been the saturation of GPU memory resources during training. This is especially true in computational pathology, where images regularly contain more than 1 billion pixels. These pathological images are traditionally divided into small patches to enable deep learning due to hardware limitations. In this work, we explore whether the shared GPU/CPU memory architecture on the M1 Ultra systems-on-a-chip (SoCs) recently released by Apple, Inc. may provide a solution. These affordable systems (less than \$5000) provide access to 128 GB of unified memory (Mac Studio with M1 Ultra SoC). As a proof of concept for gigapixel deep learning, we identified tissue from background on gigapixel areas from whole slide images (WSIs). The model was a modified U-Net (4492 parameters) leveraging large kernels and high stride. The M1 Ultra SoC was able to train the model directly on gigapixel images (16000$\times$64000 pixels, 1.024 billion pixels) with a batch size of 1 using over 100 GB of unified memory for the process at an average speed of 1 minute and 21 seconds per batch with Tensorflow 2/Keras. As expected, the model converged with a high Dice score of 0.989 $\pm$ 0.005. Training up until this point took 111 hours and 24 minutes over 4940 steps. Other high RAM GPUs like the NVIDIA A100 (largest commercially accessible at 80 GB, $\sim$\$15000) are not yet widely available (in preview for select regions on Amazon Web Services at \$40.96/hour as a group of 8). This study is a promising step towards WSI-wise end-to-end deep learning with prevalent network architectures. △ Less

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.03126 [pdf, other]

Datamator: An Intelligent Authoring Tool for Creating Datamations via Data Query Decomposition

Authors: Yi Guo, Nan Cao, Ligan Cai, Yanqiu Wu, Daniel Weiskopf, Danqing Shi, Qing Chen

Abstract: Datamation is designed to animate an analysis pipeline step by step, which is an intuitive and effective way to interpret the results from data analysis. However, creating a datamation is not easy. A qualified datamation needs to not only provide a correct analysis result but also ensure that the data flow and animation are coherent. Existing animation authoring tools focus on either leveraging al… ▽ More Datamation is designed to animate an analysis pipeline step by step, which is an intuitive and effective way to interpret the results from data analysis. However, creating a datamation is not easy. A qualified datamation needs to not only provide a correct analysis result but also ensure that the data flow and animation are coherent. Existing animation authoring tools focus on either leveraging algorithms to automatically generate an animation based on user-provided charts or building graphical user interfaces to provide a programming-free authoring environment for users. None of them are able to help users translate an analysis task into a series of data operations to form an analysis pipeline and visualize them as a datamation. To fill this gap, we introduce Datamator, an intelligent authoring tool developed to support datamation design and generation. It leverages a novel data query decomposition model to allow users to generate an initial datamation by simply inputting a data query in natural language. The initial datamation can be refined via rich interactions and a feedback mechanism is utilized to update the decomposition model based on user knowledge and preferences. Our system produces an animated sequence of visualizations driven by a set of low-level data actions. It supports unit visualizations, which provide a map** from each data item to a unique visual mark. We demonstrate the effectiveness of Datamator via a series of evaluations including case studies, performance validation, and a controlled user study. △ Less

Submitted 12 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Showing 1–50 of 176 results for author: Cai, L