Search | arXiv e-print repository

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, **gwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao , et al. (2 additional authors not shown)

Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th… ▽ More We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. The InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

arXiv:2407.03314 [pdf, other]

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, **yu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu… ▽ More This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimum elements and presents them in a graph structure. Element-wise style enables easy understanding, and structural composition liberates difficult locating. Careful prompt design births the BACON captions with the help of public-available VLMs and segmentation methods. In this way, we gather a dataset with 100K annotated images, which endow VLMs with remarkable capabilities, such as accurately generating BACON, transforming prompts into BACON format, envisioning scenarios in the style of BACONr, and dynamically modifying elements within BACON through interactive dialogue and more. Wide representative experiments, including detection, VQA, and image generation tasks, tell BACON as a lifeline to achieve previous out-of-reach tasks or excel in their current cutting-edge solutions. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03307 [pdf, other]

HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfeig Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this… ▽ More In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000$\times$70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to an end-to-end learning fashion with 1) a large (4K) resolution base patch for elevated visual information inclusion and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently model the rich information from the 4K input. To our best knowledge, HoloHisto presents the first holistic approach for gigapixel resolution WSI segmentation, supporting direct I/O of complete WSI and their corresponding gigapixel masks. Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively, for advancing computational capabilities. To facilitate efficient 4K resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid. To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. From the results, HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03289 [pdf, other]

Correlated Privacy Mechanisms for Differentially Private Distributed Mean Estimation

Authors: Sajani Vithana, Viveck R. Cadambe, Flavio P. Calmon, Haewon Jeong

Abstract: Differentially private distributed mean estimation (DP-DME) is a fundamental building block in privacy-preserving federated learning, where a central server estimates the mean of $d$-dimensional vectors held by $n$ users while ensuring $(ε,δ)$-DP. Local differential privacy (LDP) and distributed DP with secure aggregation (SecAgg) are the most common notions of DP used in DP-DME settings with an u… ▽ More Differentially private distributed mean estimation (DP-DME) is a fundamental building block in privacy-preserving federated learning, where a central server estimates the mean of $d$-dimensional vectors held by $n$ users while ensuring $(ε,δ)$-DP. Local differential privacy (LDP) and distributed DP with secure aggregation (SecAgg) are the most common notions of DP used in DP-DME settings with an untrusted server. LDP provides strong resilience to dropouts, colluding users, and malicious server attacks, but suffers from poor utility. In contrast, SecAgg-based DP-DME achieves an $O(n)$ utility gain over LDP in DME, but requires increased communication and computation overheads and complex multi-round protocols to handle dropouts and malicious attacks. In this work, we propose CorDP-DME, a novel DP-DME mechanism that spans the gap between DME with LDP and distributed DP, offering a favorable balance between utility and resilience to dropout and collusion. CorDP-DME is based on correlated Gaussian noise, ensuring DP without the perfect conditional privacy guarantees of SecAgg-based approaches. We provide an information-theoretic analysis of CorDP-DME, and derive theoretical guarantees for utility under any given privacy parameters and dropout/colluding user thresholds. Our results demonstrate that (anti) correlated Gaussian DP mechanisms can significantly improve utility in mean estimation tasks compared to LDP -- even in adversarial settings -- while maintaining better resilience to dropouts and attacks compared to distributed DP. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03277 [pdf, other]

Evaluating Automatic Metrics with Incremental Machine Translation Systems

Authors: Guojun Wu, Shay B. Cohen, Rico Sennrich

Abstract: We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions. Since human A/B testing is commonly used, we assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations. Our study confirms several previous findings in MT metrics r… ▽ More We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions. Since human A/B testing is commonly used, we assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations. Our study confirms several previous findings in MT metrics research and demonstrates the dataset's value as a testbed for metric evaluation. We release our code at https://github.com/gjwubyron/Evo △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03241 [pdf, other]

Terrain Classification Enhanced with Uncertainty for Space Exploration Robots from Proprioceptive Data

Authors: Mariela De Lucas Álvarez, Jichen Guo, Raul Domínguez, Matias Valdenegro-Toro

Abstract: Terrain Classification is an essential task in space exploration, where unpredictable environments are difficult to observe using only exteroceptive sensors such as vision. Implementing Neural Network classifiers can have high performance but can be deemed untrustworthy as they lack transparency, which makes them unreliable for taking high-stakes decisions during mission planning. We address this… ▽ More Terrain Classification is an essential task in space exploration, where unpredictable environments are difficult to observe using only exteroceptive sensors such as vision. Implementing Neural Network classifiers can have high performance but can be deemed untrustworthy as they lack transparency, which makes them unreliable for taking high-stakes decisions during mission planning. We address this by proposing Neural Networks with Uncertainty Quantification in Terrain Classification. We enable our Neural Networks with Monte Carlo Dropout, DropConnect, and Flipout in time series-capable architectures using only proprioceptive data as input. We use Bayesian Optimization with Hyperband for efficient hyperparameter optimization to find optimal models for trustworthy terrain classification. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures. LatinX in AI Workshop @ ICML 2023 Camera Ready

arXiv:2407.03239 [pdf, other]

Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network

Authors: Rui Li, Mikhail Kudryashev, Artur Yakimovich

Abstract: Optic deconvolution in light microscopy (LM) refers to recovering the object details from images, revealing the ground truth of samples. Traditional explicit methods in LM rely on the point spread function (PSF) during image acquisition. Yet, these approaches often fall short due to inaccurate PSF models and noise artifacts, hampering the overall restoration quality. In this paper, we approached t… ▽ More Optic deconvolution in light microscopy (LM) refers to recovering the object details from images, revealing the ground truth of samples. Traditional explicit methods in LM rely on the point spread function (PSF) during image acquisition. Yet, these approaches often fall short due to inaccurate PSF models and noise artifacts, hampering the overall restoration quality. In this paper, we approached the optic deconvolution as an inverse problem. Motivated by the nonstandard-form compression scheme introduced by Beylkin, Coifman, and Rokhlin (BCR), we proposed an innovative physics-informed neural network Multi-Stage Residual-BCR Net (m-rBCR) to approximate the optic deconvolution. We validated the m-rBCR model on four microscopy datasets - two simulated microscopy datasets from ImageNet and BioSR, real dSTORM microscopy images, and real widefield microscopy images. In contrast to the explicit deconvolution methods (e.g. Richardson-Lucy) and other state-of-the-art NN models (U-Net, DDPM, CARE, DnCNN, ESRGAN, RCAN, Noise2Noise, MPRNet, and MIMO-U-Net), the m-rBCR model demonstrates superior performance to other candidates by PSNR and SSIM in two real microscopy datasets and the simulated BioSR dataset. In the simulated ImageNet dataset, m-rBCR ranks the second-best place (right after MIMO-U-Net). With the backbone from the optical physics, m-rBCR exploits the trainable parameters with better performances (from ~30 times fewer than the benchmark MIMO-U-Net to ~210 times than ESRGAN). This enables m-rBCR to achieve a shorter runtime (from ~3 times faster than MIMO-U-Net to ~300 times faster than DDPM). To summarize, by leveraging physics constraints our model reduced potentially redundant parameters significantly in expertise-oriented NN candidates and achieved high efficiency with superior performance. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 17 pages, 8 figures

ACM Class: I.4; J.3

arXiv:2407.03235 [pdf, other]

Programming universal unitary transformations on a general-purpose silicon photonics platform

Authors: Jose Roberto Rausell-Campo, Daniel Pérez, López, José Capmany Francoy

Abstract: General-purpose programmable photonic processors provide a versatile platform for integrating diverse functionalities on a single chip. Leveraging a two-dimensional hexagonal waveguide mesh of Mach-Zehnder interferometers, these systems have demonstrated significant potential in microwave photonics applications. Additionally, they are a promising platform for creating unitary linear transformation… ▽ More General-purpose programmable photonic processors provide a versatile platform for integrating diverse functionalities on a single chip. Leveraging a two-dimensional hexagonal waveguide mesh of Mach-Zehnder interferometers, these systems have demonstrated significant potential in microwave photonics applications. Additionally, they are a promising platform for creating unitary linear transformations, which are key elements in quantum computing and photonic neural networks. However, a general procedure for implementing these transformations on such systems has not been established yet. This work demonstrates the programming of universal unitary transformations on a general-purpose programmable photonic circuit with a hexagonal topology. We detail the steps to split the light on-chip, demonstrate that an equivalent structure to the Mach-Zehnder interferometer with one internal and one external phase shifter can be built in the hexagonal mesh, and program both the triangular and rectangular architectures for matrix multiplication. We recalibrate the system to account for passive phase deviations. Experimental programming of 3x3 and 4x4 random unitary matrices yields fidelities > 98% and bit precisions over 5 bits. To the best of our knowledge, this is the first time that random unitary matrices are demonstrated on a general-purpose photonic processor and pave the way for the implementation of programmable photonic circuits in optical computing and signal processing systems. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03218 [pdf, other]

Programmable Photonic Extreme Learning Machines

Authors: Jose Roberto Rausell-Campo, Antonio Hurtado, Daniel Pérez-López, José Capmany Francoy

Abstract: Photonic neural networks offer a promising alternative to traditional electronic systems for machine learning accelerators due to their low latency and energy efficiency. However, the challenge of implementing the backpropagation algorithm during training has limited their development. To address this, alternative machine learning schemes, such as extreme learning machines (ELMs), have been propos… ▽ More Photonic neural networks offer a promising alternative to traditional electronic systems for machine learning accelerators due to their low latency and energy efficiency. However, the challenge of implementing the backpropagation algorithm during training has limited their development. To address this, alternative machine learning schemes, such as extreme learning machines (ELMs), have been proposed. ELMs use a random hidden layer to increase the feature space dimensionality, requiring only the output layer to be trained through linear regression, thus reducing training complexity. Here, we experimentally demonstrate a programmable photonic extreme learning machine (PPELM) using a hexagonal waveguide mesh, and which enables to program directly on chip the input feature vector and the random hidden layer. Our system also permits to apply the nonlinearity directly on-chip by using the systems integrated photodetecting elements. Using the PPELM we solved successfully three different complex classification tasks. Additioanlly, we also propose and demonstrate two techniques to increase the accuracy of the models and reduce their variability using an evolutionary algorithm and a wavelength division multiplexing approach, obtaining excellent performance. Our results show that programmable photonic processors may become a feasible way to train competitive machine learning models on a versatile and compact platform. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03216 [pdf, other]

Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers

Authors: Sanket Gandhi, Atul, Samanyu Mahajan, Vishal Sharma, Rushil Gupta, Arnab Kumar Mondal, Parag Singla

Abstract: Recent work has shown that object-centric representations can greatly help improve the accuracy of learning dynamics while also bringing interpretability. In this work, we take this idea one step further, ask the following question: "can learning disentangled representation further improve the accuracy of visual dynamics prediction in object-centric models?" While there has been some attempt to le… ▽ More Recent work has shown that object-centric representations can greatly help improve the accuracy of learning dynamics while also bringing interpretability. In this work, we take this idea one step further, ask the following question: "can learning disentangled representation further improve the accuracy of visual dynamics prediction in object-centric models?" While there has been some attempt to learn such disentangled representations for the case of static images \citep{nsb}, to the best of our knowledge, ours is the first work which tries to do this in a general setting for video, without making any specific assumptions about the kind of attributes that an object might have. The key building block of our architecture is the notion of a {\em block}, where several blocks together constitute an object. Each block is represented as a linear combination of a given number of learnable concept vectors, which is iteratively refined during the learning process. The blocks in our model are discovered in an unsupervised manner, by attending over object masks, in a style similar to discovery of slots \citep{slot_attention}, for learning a dense object-centric representation. We employ self-attention via transformers over the discovered blocks to predict the next state resulting in discovery of visual dynamics. We perform a series of experiments on several benchmark 2-D, and 3-D datasets demonstrating that our architecture (1) can discover semantically meaningful blocks (2) help improve accuracy of dynamics prediction compared to SOTA object-centric models (3) perform significantly better in OOD setting where the specific attribute combinations are not seen earlier during training. Our experiments highlight the importance discovery of disentangled representation for visual dynamics prediction. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03203 [pdf, other]

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Authors: Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang

Abstract: Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. Similar methods have shown promising results in code generation. However, most modern LLMs exhibit suboptimal performance… ▽ More Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. Similar methods have shown promising results in code generation. However, most modern LLMs exhibit suboptimal performance due to the scarcity of aligned NL and Formal Language (FL) theorem-proving data. This scarcity results in a paucity of methodologies for training LLMs and techniques to fully utilize their capabilities in composing formal proofs. To address the challenges, this paper proposes **TheoremLlama**, an end-to-end framework to train a general-purpose LLM to become a Lean4 expert. This framework encompasses NL-FL aligned dataset generation methods, training approaches for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. Using the dataset generation method, we provide *Open Bootstrapped Theorems* (OBT), an NL-FL aligned and bootstrapped dataset. A key innovation in this framework is the NL-FL bootstrap** method, where NL proofs are integrated into Lean4 code for training datasets, leveraging the NL reasoning ability of LLMs for formal reasoning. The **TheoremLlama** framework achieves cumulative accuracies of 36.48% and 33.61% on MiniF2F-Valid and Test datasets respectively, surpassing the GPT-4 baseline of 22.95% and 25.41%. We have also open-sourced our model checkpoints and generated dataset, and will soon make all the code publicly available. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03169 [pdf, other]

Investigating Decoder-only Large Language Models for Speech-to-text Translation

Authors: Chao-Wei Huang, Hui Lu, Hongyu Gong, Hirofumi Inaguma, Ilia Kulikov, Ruslan Mavlyutov, Sravya Popuri

Abstract: Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs to the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded sp… ▽ More Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs to the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded speech representation and generate the text translation. Additionally, we investigate the effects of different parameter-efficient fine-tuning techniques and task formulation. Our model achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data. We also conduct analyses to validate the design choices of our proposed model and bring insights to the integration of LLMs to S2TT. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted to Interspeech 2024

arXiv:2407.03163 [pdf, other]

Global Context Modeling in YOLOv8 for Pediatric Wrist Fracture Detection

Authors: Rui-Yang Ju, Chun-Tse Chien, Chia-Min Lin, Jen-Shiun Chiang

Abstract: Children often suffer wrist injuries in daily life, while fracture injuring radiologists usually need to analyze and interpret X-ray images before surgical treatment by surgeons. The development of deep learning has enabled neural network models to work as computer-assisted diagnosis (CAD) tools to help doctors and experts in diagnosis. Since the YOLOv8 models have obtained the satisfactory succes… ▽ More Children often suffer wrist injuries in daily life, while fracture injuring radiologists usually need to analyze and interpret X-ray images before surgical treatment by surgeons. The development of deep learning has enabled neural network models to work as computer-assisted diagnosis (CAD) tools to help doctors and experts in diagnosis. Since the YOLOv8 models have obtained the satisfactory success in object detection tasks, it has been applied to fracture detection. The Global Context (GC) block effectively models the global context in a lightweight way, and incorporating it into YOLOv8 can greatly improve the model performance. This paper proposes the YOLOv8+GC model for fracture detection, which is an improved version of the YOLOv8 model with the GC block. Experimental results demonstrate that compared to the original YOLOv8 model, the proposed YOLOv8-GC model increases the mean average precision calculated at intersection over union threshold of 0.5 (mAP 50) from 63.58% to 66.32% on the GRAZPEDWRI-DX dataset, achieving the state-of-the-art (SOTA) level. The implementation code for this work is available on GitHub at https://github.com/RuiyangJu/YOLOv8_Global_Context_Fracture_Detection. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03162 [pdf, other]

Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning

Authors: Runyu Ding, Yuzhe Qin, Jiyue Zhu, Chengzhe Jia, Shiqi Yang, Ruihan Yang, Xiaojuan Qi, Xiaolong Wang

Abstract: Teleoperation is a crucial tool for collecting human demonstrations, but controlling robots with bimanual dexterous hands remains a challenge. Existing teleoperation systems struggle to handle the complexity of coordinating two hands for intricate manipulations. We introduce Bunny-VisionPro, a real-time bimanual dexterous teleoperation system that leverages a VR headset. Unlike previous vision-bas… ▽ More Teleoperation is a crucial tool for collecting human demonstrations, but controlling robots with bimanual dexterous hands remains a challenge. Existing teleoperation systems struggle to handle the complexity of coordinating two hands for intricate manipulations. We introduce Bunny-VisionPro, a real-time bimanual dexterous teleoperation system that leverages a VR headset. Unlike previous vision-based teleoperation systems, we design novel low-cost devices to provide haptic feedback to the operator, enhancing immersion. Our system prioritizes safety by incorporating collision and singularity avoidance while maintaining real-time performance through innovative designs. Bunny-VisionPro outperforms prior systems on a standard task suite, achieving higher success rates and reduced task completion times. Moreover, the high-quality teleoperation demonstrations improve downstream imitation learning performance, leading to better generalizability. Notably, Bunny-VisionPro enables imitation learning with challenging multi-stage, long-horizon dexterous manipulation tasks, which have rarely been addressed in previous work. Our system's ability to handle bimanual manipulations while prioritizing safety and real-time performance makes it a powerful tool for advancing dexterous manipulation and imitation learning. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: project page: https://dingry.github.io/projects/bunny_visionpro.html

arXiv:2407.03154 [pdf, other]

Reinforcement Learning for Sequence Design Leveraging Protein Language Models

Authors: Jithendaraa Subramanian, Shivakanth Sujit, Niloy Irtisam, Umong Sain, Derek Nowrouzezahrai, Samira Ebrahimi Kahou, Riashat Islam

Abstract: Protein sequence design, determined by amino acid sequences, are essential to protein engineering problems in drug discovery. Prior approaches have resorted to evolutionary strategies or Monte-Carlo methods for protein design, but often fail to exploit the structure of the combinatorial search space, to generalize to unseen sequences. In the context of discrete black box optimization over large se… ▽ More Protein sequence design, determined by amino acid sequences, are essential to protein engineering problems in drug discovery. Prior approaches have resorted to evolutionary strategies or Monte-Carlo methods for protein design, but often fail to exploit the structure of the combinatorial search space, to generalize to unseen sequences. In the context of discrete black box optimization over large search spaces, learning a mutation policy to generate novel sequences with reinforcement learning is appealing. Recent advances in protein language models (PLMs) trained on large corpora of protein sequences offer a potential solution to this problem by scoring proteins according to their biological plausibility (such as the TM-score). In this work, we propose to use PLMs as a reward function to generate new sequences. Yet the PLM can be computationally expensive to query due to its large size. To this end, we propose an alternative paradigm where optimization can be performed on scores from a smaller proxy model that is periodically finetuned, jointly while learning the mutation policy. We perform extensive experiments on various sequence lengths to benchmark RL-based approaches, and provide comprehensive evaluations along biological plausibility and diversity of the protein. Our experimental results include favorable evaluations of the proposed sequences, along with high diversity scores, demonstrating that RL is a strong candidate for biological sequence design. Finally, we provide a modular open source implementation can be easily integrated in most RL training loops, with support for replacing the reward model with other PLMs, to spur further research in this domain. The code for all experiments is provided in the supplementary material. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 22 pages, 7 figures, 4 tables

arXiv:2407.03152 [pdf, other]

Stereo Risk: A Continuous Modeling Approach to Stereo Matching

Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Yao Yao, Luc Van Gool

Abstract: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization o… ▽ More We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization often fails to capture the nuanced, continuous nature of scene depth. Stereo Risk departs from the conventional discretization approach by formulating the scene disparity as an optimal solution to a continuous risk minimization problem, hence the name "stereo risk". We demonstrate that $L^1$ minimization of the proposed continuous risk function enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions. Furthermore, to enable the end-to-end network training of the non-differentiable $L^1$ risk optimization, we exploited the implicit function theorem, ensuring a fully differentiable network. A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets, including KITTI 2012, KITTI 2015, ETH3D, SceneFlow, and Middlebury 2014. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted as an Oral Paper at ICML 2024. Draft info: 18 pages, 6 Figure, 16 Tables

arXiv:2407.03140 [pdf, other]

Machine Learning Models for Improved Tracking from Range-Doppler Map Images

Authors: Elizabeth Hou, Ross Greenwood, Piyush Kumar

Abstract: Statistical tracking filters depend on accurate target measurements and uncertainty estimates for good tracking performance. In this work, we propose novel machine learning models for target detection and uncertainty estimation in range-Doppler map (RDM) images for Ground Moving Target Indicator (GMTI) radars. We show that by using the outputs of these models, we can significantly improve the perf… ▽ More Statistical tracking filters depend on accurate target measurements and uncertainty estimates for good tracking performance. In this work, we propose novel machine learning models for target detection and uncertainty estimation in range-Doppler map (RDM) images for Ground Moving Target Indicator (GMTI) radars. We show that by using the outputs of these models, we can significantly improve the performance of a multiple hypothesis tracker for complex multi-target air-to-ground tracking scenarios. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03129 [pdf, other]

Social Bias Evaluation for Large Language Models Requires Prompt Variations

Authors: Rem Hida, Masahiro Kaneko, Naoaki Okazaki

Abstract: Warning: This paper contains examples of stereotypes and biases. Large Language Models (LLMs) exhibit considerable social biases, and various studies have tried to evaluate and mitigate these biases accurately. Previous studies use downstream tasks as prompts to examine the degree of social biases for evaluation and mitigation. While LLMs' output highly depends on prompts, previous studies evaluat… ▽ More Warning: This paper contains examples of stereotypes and biases. Large Language Models (LLMs) exhibit considerable social biases, and various studies have tried to evaluate and mitigate these biases accurately. Previous studies use downstream tasks as prompts to examine the degree of social biases for evaluation and mitigation. While LLMs' output highly depends on prompts, previous studies evaluating and mitigating bias have often relied on a limited variety of prompts. In this paper, we investigate the sensitivity of LLMs when changing prompt variations (task instruction and prompt, few-shot examples, debias-prompt) by analyzing task performance and social bias of LLMs. Our experimental results reveal that LLMs are highly sensitive to prompts to the extent that the ranking of LLMs fluctuates when comparing models for task performance and social bias. Additionally, we show that LLMs have tradeoffs between performance and social bias caused by the prompts. Less bias from prompt setting may result in reduced performance. Moreover, the ambiguity of instances is one of the reasons for this sensitivity to prompts in advanced LLMs, leading to various outputs. We recommend using diverse prompts, as in this study, to compare the effects of prompts on social bias in LLMs. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03126 [pdf, other]

Game-Theoretic Protection Adoption Against Networked SIS Epidemics

Authors: Abhisek Satapathi, Ashish R. Hota

Abstract: In this paper, we investigate game-theoretic strategies for containing spreading processes on large-scale networks. Specifically, we consider the class of networked susceptible-infected-susceptible (SIS) epidemics where a large population of agents strategically choose whether to adopt partially effective protection. We define the utilities of the agents which depends on the degree of the agent, i… ▽ More In this paper, we investigate game-theoretic strategies for containing spreading processes on large-scale networks. Specifically, we consider the class of networked susceptible-infected-susceptible (SIS) epidemics where a large population of agents strategically choose whether to adopt partially effective protection. We define the utilities of the agents which depends on the degree of the agent, its individual infection status and action, as well as the the overall prevalence of the epidemic and strategy profile of the entire population. We further present the coupled dynamics of epidemic evolution as well as strategy update which is assumed to follow the replicator dynamics. By relying on timescale separation arguments, we first derive the optimal strategy of protection adoption by the agents for a given epidemic state, and then present the reduced epidemic dynamics. The existence and uniqueness of endemic equilibrium is rigorously characterized and forms the main result of this paper. Finally, we present extensive numerical results to highlight the impacts of heterogeneous node degrees, infection rates, cost of protection adoption, and effectiveness of protection on the epidemic prevalence at the equilibrium. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03108 [pdf, other]

How Reliable and Stable are Explanations of XAI Methods?

Authors: José Ribeiro, Lucas Cardoso, Vitor Santos, Eduardo Carvalho, Níkolas Carneiro, Ronnie Alves

Abstract: Black box models are increasingly being used in the daily lives of human beings living in society. Along with this increase, there has been the emergence of Explainable Artificial Intelligence (XAI) methods aimed at generating additional explanations regarding how the model makes certain predictions. In this sense, methods such as Dalex, Eli5, eXirt, Lofo and Shap emerged as different proposals an… ▽ More Black box models are increasingly being used in the daily lives of human beings living in society. Along with this increase, there has been the emergence of Explainable Artificial Intelligence (XAI) methods aimed at generating additional explanations regarding how the model makes certain predictions. In this sense, methods such as Dalex, Eli5, eXirt, Lofo and Shap emerged as different proposals and methodologies for generating explanations of black box models in an agnostic way. Along with the emergence of these methods, questions arise such as "How Reliable and Stable are XAI Methods?". With the aim of shedding light on this main question, this research creates a pipeline that performs experiments using the diabetes dataset and four different machine learning models (LGBM, MLP, DT and KNN), creating different levels of perturbations of the test data and finally generates explanations from the eXirt method regarding the confidence of the models and also feature relevances ranks from all XAI methods mentioned, in order to measure their stability in the face of perturbations. As a result, it was found that eXirt was able to identify the most reliable models among all those used. It was also found that current XAI methods are sensitive to perturbations, with the exception of one specific method. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 15 pages, 6 figures, submitted to BRACIS 2024

ACM Class: I.2.6

arXiv:2407.03086 [pdf, other]

Effective Heterogeneous Federated Learning via Efficient Hypernetwork-based Weight Generation

Authors: Yu** Shin, Kichang Lee, Sungmin Lee, You Rim Choi, Hyung-Sin Kim, JeongGil Ko

Abstract: While federated learning leverages distributed client resources, it faces challenges due to heterogeneous client capabilities. This necessitates allocating models suited to clients' resources and careful parameter aggregation to accommodate this heterogeneity. We propose HypeMeFed, a novel federated learning framework for supporting client heterogeneity by combining a multi-exit network architectu… ▽ More While federated learning leverages distributed client resources, it faces challenges due to heterogeneous client capabilities. This necessitates allocating models suited to clients' resources and careful parameter aggregation to accommodate this heterogeneity. We propose HypeMeFed, a novel federated learning framework for supporting client heterogeneity by combining a multi-exit network architecture with hypernetwork-based model weight generation. This approach aligns the feature spaces of heterogeneous model layers and resolves per-layer information disparity during weight aggregation. To practically realize HypeMeFed, we also propose a low-rank factorization approach to minimize computation and memory overhead associated with hypernetworks. Our evaluations on a real-world heterogeneous device testbed indicate that HypeMeFed enhances accuracy by 5.12% over FedAvg, reduces the hypernetwork memory requirements by 98.22%, and accelerates its operations by 1.86 times compared to a naive hypernetwork approach. These results demonstrate HypeMeFed's effectiveness in leveraging and engaging heterogeneous clients for federated learning. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03076 [pdf, other]

A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning

Authors: Ramakrishna Appicharla, Baban Gain, Santanu Pal, Asif Ekbal, Pushpak Bhattacharyya

Abstract: In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences. Recent studies \cite{li-etal-2020-multi-encoder} have shown that the context encoder generates noise and makes the model robust to the choice of context. This paper further investigates this observation by explicitly modelling context encoding through multi-task lear… ▽ More In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences. Recent studies \cite{li-etal-2020-multi-encoder} have shown that the context encoder generates noise and makes the model robust to the choice of context. This paper further investigates this observation by explicitly modelling context encoding through multi-task learning (MTL) to make the model sensitive to the choice of context. We conduct experiments on cascade MTL architecture, which consists of one encoder and two decoders. Generation of the source from the context is considered an auxiliary task, and generation of the target from the source is the main task. We experimented with German--English language pairs on News, TED, and Europarl corpora. Evaluation results show that the proposed MTL approach performs better than concatenation-based and multi-encoder DocNMT models in low-resource settings and is sensitive to the choice of context. However, we observe that the MTL models are failing to generate the source from the context. These observations align with the previous studies, and this might suggest that the available document-level parallel corpora are not context-aware, and a robust sentence-level model can outperform the context-aware models. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted to EAMT 2024 (poster)

arXiv:2407.03070 [pdf, other]

Federated Learning for Zero-Day Attack Detection in 5G and Beyond V2X Networks

Authors: Abdelaziz Amara korba, Abdelwahab Boualouache, Bouziane Brik, Rabah Rahal, Yacine Ghamri-Doudane, Sidi Mohammed Senouci

Abstract: Deploying Connected and Automated Vehicles (CAVs) on top of 5G and Beyond networks (5GB) makes them vulnerable to increasing vectors of security and privacy attacks. In this context, a wide range of advanced machine/deep learning based solutions have been designed to accurately detect security attacks. Specifically, supervised learning techniques have been widely applied to train attack detection… ▽ More Deploying Connected and Automated Vehicles (CAVs) on top of 5G and Beyond networks (5GB) makes them vulnerable to increasing vectors of security and privacy attacks. In this context, a wide range of advanced machine/deep learning based solutions have been designed to accurately detect security attacks. Specifically, supervised learning techniques have been widely applied to train attack detection models. However, the main limitation of such solutions is their inability to detect attacks different from those seen during the training phase, or new attacks, also called zero-day attacks. Moreover, training the detection model requires significant data collection and labeling, which increases the communication overhead, and raises privacy concerns. To address the aforementioned limits, we propose in this paper a novel detection mechanism that leverages the ability of the deep auto-encoder method to detect attacks relying only on the benign network traffic pattern. Using federated learning, the proposed intrusion detection system can be trained with large and diverse benign network traffic, while preserving the CAVs privacy, and minimizing the communication overhead. The in-depth experiment on a recent network traffic dataset shows that the proposed system achieved a high detection rate while minimizing the false positive rate, and the detection delay. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03068 [pdf, other]

xApp Distillation: AI-based Conflict Mitigation in B5G O-RAN

Authors: Hakan Erdol, Xiaoyang Wang, Robert Piechocki, George Oikonomou, Arjun Parekh

Abstract: The advancements of machine learning-based (ML) decision-making algorithms created various research and industrial opportunities. One of these areas is ML-based near-real-time network management applications (xApps) in Open-Radio Access Network (O-RAN). Normally, xApps are designed solely for the desired objectives, and fine-tuned for deployment. However, telecommunication companies can employ mul… ▽ More The advancements of machine learning-based (ML) decision-making algorithms created various research and industrial opportunities. One of these areas is ML-based near-real-time network management applications (xApps) in Open-Radio Access Network (O-RAN). Normally, xApps are designed solely for the desired objectives, and fine-tuned for deployment. However, telecommunication companies can employ multiple xApps and deploy them in overlap** areas. Consider the different design objectives of xApps, the deployment might cause conflicts. To prevent such conflicts, we proposed the xApp distillation method that distills knowledge from multiple xApps, then uses this knowledge to train a single model that has retained the capabilities of Previous xApps. Performance evaluations show that compared conflict mitigation schemes can cause up to six times more network outages than xApp distillation in some cases. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 5 Pages, 4 figures

arXiv:2407.03043 [pdf, other]

SlerpFace: Face Template Protection via Spherical Linear Interpolation

Authors: Zhizhou Zhong, Yuxi Mi, Yuge Huang, Jianqing Xu, Guodong Mu, Shouhong Ding, **gyun Zhang, Rizen Guo, Yunsheng Wu, Shuigeng Zhou

Abstract: Contemporary face recognition systems use feature templates extracted from face images to identify persons. To enhance privacy, face template protection techniques are widely employed to conceal sensitive identity and appearance information stored in the template. This paper identifies an emerging privacy attack form utilizing diffusion models that could nullify prior protection, referred to as in… ▽ More Contemporary face recognition systems use feature templates extracted from face images to identify persons. To enhance privacy, face template protection techniques are widely employed to conceal sensitive identity and appearance information stored in the template. This paper identifies an emerging privacy attack form utilizing diffusion models that could nullify prior protection, referred to as inversion attacks. The attack can synthesize high-quality, identity-preserving face images from templates, revealing persons' appearance. Based on studies of the diffusion model's generative capability, this paper proposes a defense to deteriorate the attack, by rotating templates to a noise-like distribution. This is achieved efficiently by spherically and linearly interpolating templates, or slerp, on their located hypersphere. This paper further proposes to group-wisely divide and drop out templates' feature dimensions, to enhance the irreversibility of rotated templates. The division of groups and dropouts within each group are learned in a recognition-favored way. The proposed techniques are concretized as a novel face template protection technique, SlerpFace. Extensive experiments show that SlerpFace provides satisfactory recognition accuracy and comprehensive privacy protection against inversion and other attack forms, superior to prior arts. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: face template protection

arXiv:2407.02994 [pdf, other]

MedPix 2.0: A Comprehensive Multimodal Biomedical Dataset for Advanced AI Applications

Authors: Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone

Abstract: The increasing interest in develo** Artificial Intelligence applications in the medical domain, suffers from the lack of high-quality dataset, mainly due to privacy-related issues. Moreover, the recent rising of Multimodal Large Language Models (MLLM) leads to a need for multimodal medical datasets, where clinical reports and findings are attached to the corresponding CT or MR scans. This paper… ▽ More The increasing interest in develo** Artificial Intelligence applications in the medical domain, suffers from the lack of high-quality dataset, mainly due to privacy-related issues. Moreover, the recent rising of Multimodal Large Language Models (MLLM) leads to a need for multimodal medical datasets, where clinical reports and findings are attached to the corresponding CT or MR scans. This paper illustrates the entire workflow for building the data set MedPix 2.0. Starting from the well-known multimodal dataset MedPix\textsuperscript{\textregistered}, mainly used by physicians, nurses and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data followed by a manual curing procedure where noisy samples were removed, thus creating a MongoDB database. Along with the dataset, we developed a GUI aimed at navigating efficiently the MongoDB instance, and obtaining the raw data that can be easily used for training and/or fine-tuning MLLMs. To enforce this point, we also propose a CLIP-based model trained on MedPix 2.0 for scan classification tasks. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02984 [pdf, other]

Semantically Rich Local Dataset Generation for Explainable AI in Genomics

Authors: Pedro Barbosa, Rosina Savisaar, Alcides Fonseca

Abstract: Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms. Therefore, interpreting these models may provide novel insights into the underlying biology, supporting downstream biomedical applications. Due to their complexity, interpretable surrogate models can only be built for local explanations (e.g., a single instance). Ho… ▽ More Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms. Therefore, interpreting these models may provide novel insights into the underlying biology, supporting downstream biomedical applications. Due to their complexity, interpretable surrogate models can only be built for local explanations (e.g., a single instance). However, accomplishing this requires generating a dataset in the neighborhood of the input, which must maintain syntactic similarity to the original data while introducing semantic variability in the model's predictions. This task is challenging due to the complex sequence-to-function relationship of DNA. We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity. Our custom, domain-guided individual representation effectively constrains syntactic similarity, and we provide two alternative fitness functions that promote diversity with no computational effort. Applied to the RNA splicing domain, our approach quickly achieves good diversity and significantly outperforms a random baseline in exploring the search space, as shown by our proof-of-concept, short RNA sequence. Furthermore, we assess its generalizability and demonstrate scalability to larger sequences, resulting in a $\approx$30\% improvement over the baseline. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02980 [pdf, other]

Modelling the mitigation of anti-vaccine opinion propagation to suppress epidemic spread: A computational approach

Authors: Sarah Alahmadi, Rebecca Hoyle, Michael Head, Markus Brede

Abstract: Information regarding vaccines from sources such as health services, media, and social networks can significantly shape vaccination decisions. In particular, the dissemination of negative information can contribute to vaccine hesitancy, thereby exacerbating infectious disease outbreaks. This study investigates strategies to mitigate anti-vaccine social contagion through effective counter-campaigns… ▽ More Information regarding vaccines from sources such as health services, media, and social networks can significantly shape vaccination decisions. In particular, the dissemination of negative information can contribute to vaccine hesitancy, thereby exacerbating infectious disease outbreaks. This study investigates strategies to mitigate anti-vaccine social contagion through effective counter-campaigns that disseminate positive vaccine information and encourage vaccine uptake, aiming to reduce the size of epidemics. In a coupled agent-based model that consists of opinion and disease diffusion processes, we explore and compare different heuristics to design positive campaigns based on the network structure and local presence of negative vaccine attitudes. We examine two campaigning regimes: a static regime with a fixed set of targets, and a dynamic regime in which targets can be updated over time. We demonstrate that strategic targeting and engagement with the dynamics of anti-vaccine influence diffusion in the network can effectively mitigate the spread of anti-vaccine sentiment, thereby reducing the epidemic size. However, the effectiveness of the campaigns differs across different targeting strategies and is impacted by a range of factors. We find that the primary advantage of static campaigns lies in their capacity to act as an obstacle, preventing the clustering of emerging anti-vaccine communities, thereby resulting in smaller and unconnected anti-vaccine groups. On the other hand, dynamic campaigns reach a broader segment of the population and adapt to the evolution of anti-vaccine diffusion, not only protecting susceptible agents from negative influence but also fostering positive propagation within negative regions. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Submitted to the PLOS ONE Journal

arXiv:2407.02978 [pdf, other]

Mast Kalandar at SemEval-2024 Task 8: On the Trail of Textual Origins: RoBERTa-BiLSTM Approach to Detect AI-Generated Text

Authors: Jainit Sushil Bafna, Hardik Mittal, Suyash Sethia, Manish Shrivastava, Radhika Mamidi

Abstract: Large Language Models (LLMs) have showcased impressive abilities in generating fluent responses to diverse user queries. However, concerns regarding the potential misuse of such texts in journalism, educational, and academic contexts have surfaced. SemEval 2024 introduces the task of Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection, aiming to develop automat… ▽ More Large Language Models (LLMs) have showcased impressive abilities in generating fluent responses to diverse user queries. However, concerns regarding the potential misuse of such texts in journalism, educational, and academic contexts have surfaced. SemEval 2024 introduces the task of Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection, aiming to develop automated systems for identifying machine-generated text and detecting potential misuse. In this paper, we i) propose a RoBERTa-BiLSTM based classifier designed to classify text into two categories: AI-generated or human ii) conduct a comparative study of our model with baseline approaches to evaluate its effectiveness. This paper contributes to the advancement of automatic text detection systems in addressing the challenges posed by machine-generated text misuse. Our architecture ranked 46th on the official leaderboard with an accuracy of 80.83 among 125. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: SemEval-2024

arXiv:2407.02963 [pdf, ps, other]

Subspace Coding for Spatial Sensing

Authors: Hessam Mahdavifar, Robin Rajamäki, Piya Pal

Abstract: A subspace code is defined as a collection of subspaces of an ambient vector space, where each information-encoding codeword is a subspace. This paper studies a class of spatial sensing problems, notably direction of arrival (DoA) estimation using multisensor arrays, from a novel subspace coding perspective. Specifically, we demonstrate how a canonical (passive) sensing model can be mapped into a… ▽ More A subspace code is defined as a collection of subspaces of an ambient vector space, where each information-encoding codeword is a subspace. This paper studies a class of spatial sensing problems, notably direction of arrival (DoA) estimation using multisensor arrays, from a novel subspace coding perspective. Specifically, we demonstrate how a canonical (passive) sensing model can be mapped into a subspace coding problem, with the sensing operation defining a unique structure for the subspace codewords. We introduce the concept of sensing subspace codes following this structure, and show how these codes can be controlled by judiciously designing the sensor array geometry. We further present a construction of sensing subspace codes leveraging a certain class of Golomb rulers that achieve near-optimal minimum codeword distance. These designs inspire novel noise-robust sparse array geometries achieving high angular resolution. We also prove that codes corresponding to conventional uniform linear arrays are suboptimal in this regard. This work is the first to establish connections between subspace coding and spatial sensing, with the aim of leveraging insights and methodologies in one field to tackle challenging problems in the other. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: ©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2407.02960 [pdf, other]

ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

Authors: Ahmed Frikha, Nassim Walha, Ricardo Mendes, Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou

Abstract: This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. Hereby, the finetuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provi… ▽ More This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. Hereby, the finetuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provider. We tackle this problem by proposing ObfuscaTune, a novel, efficient and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient usage of confidential computing (only 5% of the model parameters are placed on TEE). We empirically demonstrate the effectiveness of ObfuscaTune by validating it on GPT-2 models with different sizes on four NLP benchmark datasets. Finally, we compare to a naïve version of our approach to highlight the necessity of using random matrices with low condition numbers in our approach to reduce errors induced by the obfuscation. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2407.02956 [pdf, other]

IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization

Authors: Ahmed Frikha, Nassim Walha, Krishna Kanth Nakka, Ricardo Mendes, Xue Jiang, Xuebing Zhou

Abstract: In this work, we address the problem of text anonymization where the goal is to prevent adversaries from correctly inferring private attributes of the author, while kee** the text utility, i.e., meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a redu… ▽ More In this work, we address the problem of text anonymization where the goal is to prevent adversaries from correctly inferring private attributes of the author, while kee** the text utility, i.e., meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a reduction of private attribute leakage by more than 90%. Finally, we demonstrate the maturity of IncogniText for real-world applications by distilling its anonymization capability into a set of LoRA parameters associated with an on-device model. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2407.02943 [pdf, other]

PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

Authors: Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou

Abstract: The latest and most impactful advances in large models stem from their increased size. Unfortunately, this translates into an improved memorization capacity, raising data privacy concerns. Specifically, it has been shown that models can output personal identifiable information (PII) contained in their training data. However, reported PIII extraction performance varies widely, and there is no conse… ▽ More The latest and most impactful advances in large models stem from their increased size. Unfortunately, this translates into an improved memorization capacity, raising data privacy concerns. Specifically, it has been shown that models can output personal identifiable information (PII) contained in their training data. However, reported PIII extraction performance varies widely, and there is no consensus on the optimal methodology to evaluate this risk, resulting in underestimating realistic adversaries. In this work, we empirically demonstrate that it is possible to improve the extractability of PII by over ten-fold by grounding the prefix of the manually constructed extraction prompt with in-domain data. Our approach, PII-Compass, achieves phone number extraction rates of 0.92%, 3.9%, and 6.86% with 1, 128, and 2308 queries, respectively, i.e., the phone number of 1 person in 15 is extractable. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted at ACL 2024

arXiv:2407.02942 [pdf]

doi 10.1007/978-3-030-05171-6_17

Recompression Based JPEG Tamper Detection and Localization Using Deep Neural Network Eliminating Compression Factor Dependency

Authors: Jamimamul Bakas, Praneta Rawat, Kalyan Kokkalla, Ruchira Naskar

Abstract: In this work, we deal with the problem of re compression based image forgery detection, where some regions of an image are modified illegitimately, hence giving rise to presence of dual compression characteristics within a single image. There have been some significant researches in this direction, in the last decade. However, almost all existing techniques fail to detect this form of forgery, whe… ▽ More In this work, we deal with the problem of re compression based image forgery detection, where some regions of an image are modified illegitimately, hence giving rise to presence of dual compression characteristics within a single image. There have been some significant researches in this direction, in the last decade. However, almost all existing techniques fail to detect this form of forgery, when the first compression factor is greater than the second. We address this problem in re compression based forgery detection, here Recently, Machine Learning techniques have started gaining a lot of importance in the domain of digital image forensics. In this work, we propose a Convolution Neural Network based deep learning architecture, which is capable of detecting the presence of re compression based forgery in JPEG images. The proposed architecture works equally efficiently, even in cases where the first compression ratio is greater than the second. In this work, we also aim to localize the regions of image manipulation based on re compression features, using the trained neural network. Our experimental results prove that the proposed method outperforms the state of the art, with respect to forgery detection and localization accuracy. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 24 pages, conference

Journal ref: Information Systems Security: 14th International Conference, ICISS 2018, Bangalore, India, December 17-19, 2018, Proceedings. Vol. 11281. Springer, 2018

arXiv:2407.02921 [pdf, other]

In-Memory Mirroring: Cloning Without Reading

Authors: Simranjeet Singh, Ankit Bende, Chandan Kumar Jha, Vikas Rana, Rolf Drechsler, Sachin Patkar, Farhad Merchant

Abstract: In-memory computing (IMC) has gained significant attention recently as it attempts to reduce the impact of memory bottlenecks. Numerous schemes for digital IMC are presented in the literature, focusing on logic operations. Often, an application's description has data dependencies that must be resolved. Contemporary IMC architectures perform read followed by write operations for this purpose, which… ▽ More In-memory computing (IMC) has gained significant attention recently as it attempts to reduce the impact of memory bottlenecks. Numerous schemes for digital IMC are presented in the literature, focusing on logic operations. Often, an application's description has data dependencies that must be resolved. Contemporary IMC architectures perform read followed by write operations for this purpose, which results in performance and energy penalties. To solve this fundamental problem, this paper presents in-memory mirroring (IMM). IMM eliminates the need for read and write-back steps, thus avoiding energy and performance penalties. Instead, we perform data movement within memory, involving row-wise and column-wise data transfers. Additionally, the IMM scheme enables parallel cloning of entire row (word) with a complexity of $\mathcal{O}(1)$. Moreover, our analysis of the energy consumption of the proposed technique using resistive random-access memory crossbar and experimentally validated JART VCM v1b model. The IMM increases energy efficiency and shows 2$\times$ performance improvement compared to conventional data movement methods. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted in IFIP/IEEE VLSI-SoC 2024

arXiv:2407.02920 [pdf, ps, other]

EgoFlowNet: Non-Rigid Scene Flow from Point Clouds with Ego-Motion Support

Authors: Ramy Battrawy, René Schuster, Didier Stricker

Abstract: Recent weakly-supervised methods for scene flow estimation from LiDAR point clouds are limited to explicit reasoning on object-level. These methods perform multiple iterative optimizations for each rigid object, which makes them vulnerable to clustering robustness. In this paper, we propose our EgoFlowNet - a point-level scene flow estimation network trained in a weakly-supervised manner and witho… ▽ More Recent weakly-supervised methods for scene flow estimation from LiDAR point clouds are limited to explicit reasoning on object-level. These methods perform multiple iterative optimizations for each rigid object, which makes them vulnerable to clustering robustness. In this paper, we propose our EgoFlowNet - a point-level scene flow estimation network trained in a weakly-supervised manner and without object-based abstraction. Our approach predicts a binary segmentation mask that implicitly drives two parallel branches for ego-motion and scene flow. Unlike previous methods, we provide both branches with all input points and carefully integrate the binary mask into the feature extraction and losses. We also use a shared cost volume with local refinement that is updated at multiple scales without explicit clustering or rigidity assumptions. On realistic KITTI scenes, we show that our EgoFlowNet performs better than state-of-the-art methods in the presence of ground surface points. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: This paper is published in BMVC2023 (pp. 441-443)

arXiv:2407.02914 [pdf, other]

The More the Merrier? Navigating Accuracy vs. Energy Efficiency Design Trade-Offs in Ensemble Learning Systems

Authors: Rafiullah Omar, Justus Bogner, Henry Muccini, Patricia Lago, Silverio Martínez-Fernández, Xavier Franch

Abstract: Background: Machine learning (ML) model composition is a popular technique to mitigate shortcomings of a single ML model and to design more effective ML-enabled systems. While ensemble learning, i.e., forwarding the same request to several models and fusing their predictions, has been studied extensively for accuracy, we have insufficient knowledge about how to design energy-efficient ensembles. O… ▽ More Background: Machine learning (ML) model composition is a popular technique to mitigate shortcomings of a single ML model and to design more effective ML-enabled systems. While ensemble learning, i.e., forwarding the same request to several models and fusing their predictions, has been studied extensively for accuracy, we have insufficient knowledge about how to design energy-efficient ensembles. Objective: We therefore analyzed three types of design decisions for ensemble learning regarding a potential trade-off between accuracy and energy consumption: a) ensemble size, i.e., the number of models in the ensemble, b) fusion methods (majority voting vs. a meta-model), and c) partitioning methods (whole-dataset vs. subset-based training). Methods: By combining four popular ML algorithms for classification in different ensembles, we conducted a full factorial experiment with 11 ensembles x 4 datasets x 2 fusion methods x 2 partitioning methods (176 combinations). For each combination, we measured accuracy (F1-score) and energy consumption in J (for both training and inference). Results: While a larger ensemble size significantly increased energy consumption (size 2 ensembles consumed 37.49% less energy than size 3 ensembles, which in turn consumed 26.96% less energy than the size 4 ensembles), it did not significantly increase accuracy. Furthermore, majority voting outperformed meta-model fusion both in terms of accuracy (Cohen's d of 0.38) and energy consumption (Cohen's d of 0.92). Lastly, subset-based training led to significantly lower energy consumption (Cohen's d of 0.91), while training on the whole dataset did not increase accuracy significantly. Conclusions: From a Green AI perspective, we recommend designing ensembles of small size (2 or maximum 3 models), using subset-based training, majority voting, and energy-efficient ML algorithms like decision trees, Naive Bayes, or KNN. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Currently under review at a journal

arXiv:2407.02913 [pdf, other]

SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we proposes SFC, a new algebra transform for fast co… ▽ More Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we proposes SFC, a new algebra transform for fast convolution by extending the Discrete Fourier Transform (DFT) with symbolic computing, in which only additions are required to perform the transformation at specific transform points, avoiding the calculation of irrational number and reducing the requirement for precision. Additionally, we enhance convolution efficiency by introducing correction terms to convert invalid circular convolution outputs of the Fourier method into effective ones. The numerical error analysis is presented for the first time in this type of work and proves that our algorithms can provide a 3.68x multiplication reduction for 3x3 convolution, while the Winograd algorithm only achieves a 2.25x reduction with similarly low numerical errors. Experiments carried out on benchmarks and FPGA show that our new algorithms can further improve the computation efficiency of quantized models while maintaining accuracy, surpassing both the quantization-alone method and existing works on fast convolution quantization. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: ICML 2024

arXiv:2407.02911 [pdf, other]

Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

Authors: Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

Abstract: Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the rec… ▽ More Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the reconstruction of distinct sequences from the common latent space. We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences. Moreover, we improve the latent space consistency with contrastive learning and increase model stability by domain augmentation. Experiments using BraTS2021 dataset show that our non-adversarial model outperforms other GAN-based methods, and VQC latent space aids our model to achieve (1) anti-interference ability, which can eliminate the effects of noise, bias fields, and artifacts, and (2) solid semantic representation ability, with the potential of one-shot segmentation. Our code is publicly available. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02883 [pdf, other]

CoIR: A Comprehensive Benchmark for Code Information Retrieval Models

Authors: Xiangyang Li, Kuicai Dong, Yi Quan Lee, Wei Xia, Yichun Yin, Hao Zhang, Yong Liu, Yasheng Wang, Ruiming Tang

Abstract: Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code in various domains and tasks. Addressing this… ▽ More Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code in various domains and tasks. Addressing this gap, we present \textbf{\name} (\textbf{Co}de \textbf{I}nformation \textbf{R}etrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. \name comprises \textbf{ten} meticulously curated code datasets, spanning \textbf{eight} distinctive retrieval tasks across \textbf{seven} diverse domains. We first discuss the construction of \name and its diverse dataset composition. Further, we evaluate nine widely used retrieval models using \name, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. To facilitate easy adoption and integration within existing research workflows, \name has been developed as a user-friendly Python framework, readily installable via pip. It shares same data schema as other popular benchmarks like MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through \name, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems\footnote{\url{ https://github.com/CoIR-team/coir}}. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02856 [pdf, other]

Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows

Authors: Adrian Pekar, Richard Jozsa

Abstract: This study investigates the efficacy of machine learning models, specifically Random Forest, in anomaly detection systems when trained on complete flow records and tested on partial flow data. We explore the performance disparity that arises when models are applied to incomplete data typical in real-world, real-time network environments. Our findings demonstrate a significant decline in model perf… ▽ More This study investigates the efficacy of machine learning models, specifically Random Forest, in anomaly detection systems when trained on complete flow records and tested on partial flow data. We explore the performance disparity that arises when models are applied to incomplete data typical in real-world, real-time network environments. Our findings demonstrate a significant decline in model performance, with precision and recall drop** by up to 30\% under certain conditions when models trained on complete flows are tested against partial flows. Conversely, models trained and tested on consistently complete or partial datasets maintain robustness, highlighting the importance of dataset consistency in training. The study reveals that a minimum of 7 packets in the test set is required for maintaining reliable detection rates. These results underscore the need for tailored training strategies that can effectively adapt to the dynamics of partial data, enhancing the practical applicability of anomaly detection systems in operational settings. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 9 pages, 5 tables, 2 figures

arXiv:2407.02834 [pdf, ps, other]

Aspect-Based Sentiment Analysis Techniques: A Comparative Study

Authors: Dineth Jayakody, Koshila Isuranda, A V A Malkith, Nisansa de Silva, Sachintha Rajith Ponnamperuma, G G N Sandamali, K L K Sudheera

Abstract: Since the dawn of the digitalisation era, customer feedback and online reviews are unequivocally major sources of insights for businesses. Consequently, conducting comparative analyses of such sources has become the de facto modus operandi of any business that wishes to give itself a competitive edge over its peers and improve customer loyalty. Sentiment analysis is one such method instrumental in… ▽ More Since the dawn of the digitalisation era, customer feedback and online reviews are unequivocally major sources of insights for businesses. Consequently, conducting comparative analyses of such sources has become the de facto modus operandi of any business that wishes to give itself a competitive edge over its peers and improve customer loyalty. Sentiment analysis is one such method instrumental in gauging public interest, exposing market trends, and analysing competitors. While traditional sentiment analysis focuses on overall sentiment, as the needs advance with time, it has become important to explore public opinions and sentiments on various specific subjects, products and services mentioned in the reviews on a finer-granular level. To this end, Aspect-based Sentiment Analysis (ABSA), supported by advances in Artificial Intelligence (AI) techniques which have contributed to a paradigm shift from simple word-level analysis to tone and context-aware analyses, focuses on identifying specific aspects within the text and determining the sentiment associated with each aspect. In this study, we compare several deep-NN methods for ABSA on two benchmark datasets (Restaurant14 and Laptop-14) and found that FAST LSA obtains the best overall results of 87.6% and 82.6% accuracy but does not pass LSA+DeBERTa which reports 90.33% and 86.21% accuracy respectively. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02828 [pdf]

Quantum Serverless Paradigm and Application Development using the QFaaS Framework

Authors: Hoa T. Nguyen, Bui Binh An Pham, Muhammad Usman, Rajkumar Buyya

Abstract: Quantum computing has the potential to solve complex problems beyond the capabilities of classical computers. However, its practical use is currently limited due to early-stage quantum software engineering and the constraints of Noisy Intermediate-Scale Quantum (NISQ) devices. To address this issue, this chapter introduces the concept of serverless quantum computing with examples using QFaaS, a pr… ▽ More Quantum computing has the potential to solve complex problems beyond the capabilities of classical computers. However, its practical use is currently limited due to early-stage quantum software engineering and the constraints of Noisy Intermediate-Scale Quantum (NISQ) devices. To address this issue, this chapter introduces the concept of serverless quantum computing with examples using QFaaS, a practical Quantum Function-as-a-Service framework. This framework utilizes the serverless computing model to simplify quantum application development and deployment by abstracting the complexities of quantum hardware and enhancing application portability across different quantum software development kits and quantum backends. The chapter provides comprehensive documentation and guidelines for deploying and using QFaaS, detailing the setup, component deployment, and examples of service-oriented quantum applications. This framework offers a promising approach to overcoming current limitations and advancing the practical software engineering of quantum computing. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Guidelines for deploying and using the QFaaS Framework (for the original paper, see https://doi.org/10.1016/j.future.2024.01.018)

arXiv:2407.02811 [pdf, other]

SPLITZ: Certifiable Robustness via Split Lipschitz Randomized Smoothing

Authors: Meiyu Zhong, Ravi Tandon

Abstract: Certifiable robustness gives the guarantee that small perturbations around an input to a classifier will not change the prediction. There are two approaches to provide certifiable robustness to adversarial examples: a) explicitly training classifiers with small Lipschitz constants, and b) Randomized smoothing, which adds random noise to the input to create a smooth classifier. We propose \textit{S… ▽ More Certifiable robustness gives the guarantee that small perturbations around an input to a classifier will not change the prediction. There are two approaches to provide certifiable robustness to adversarial examples: a) explicitly training classifiers with small Lipschitz constants, and b) Randomized smoothing, which adds random noise to the input to create a smooth classifier. We propose \textit{SPLITZ}, a practical and novel approach which leverages the synergistic benefits of both the above ideas into a single framework. Our main idea is to \textit{split} a classifier into two halves, constrain the Lipschitz constant of the first half, and smooth the second half via randomization. Motivation for \textit{SPLITZ} comes from the observation that many standard deep networks exhibit heterogeneity in Lipschitz constants across layers. \textit{SPLITZ} can exploit this heterogeneity while inheriting the scalability of randomized smoothing. We present a principled approach to train \textit{SPLITZ} and provide theoretical analysis to derive certified robustness guarantees during inference. We present a comprehensive comparison of robustness-accuracy tradeoffs and show that \textit{SPLITZ} consistently improves upon existing state-of-the-art approaches on MNIST and CIFAR-10 datasets. For instance, with $\ell_2$ norm perturbation budget of \textbf{$ε=1$}, \textit{SPLITZ} achieves $\textbf{43.2\%}$ top-1 test accuracy on CIFAR-10 dataset compared to state-of-art top-1 test accuracy $\textbf{39.8\%} △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02807 [pdf, other]

Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada

Authors: Zachary Yang, Anne Imouza, Maximilian Puelma Touzel, Cecile Amadoro, Gabrielle Desrosiers-Brisebois, Kellin Pelrine, Sacha Levy, Jean-Francois Godbout, Reihaneh Rabbany

Abstract: Public health measures were among the most polarizing topics debated online during the COVID-19 pandemic. Much of the discussion surrounded specific events, such as when and which particular interventions came into practise. In this work, we develop and apply an approach to measure subnational and event-driven variation of partisan polarization and explore how these dynamics varied both across and… ▽ More Public health measures were among the most polarizing topics debated online during the COVID-19 pandemic. Much of the discussion surrounded specific events, such as when and which particular interventions came into practise. In this work, we develop and apply an approach to measure subnational and event-driven variation of partisan polarization and explore how these dynamics varied both across and within countries. We apply our measure to a dataset of over 50 million tweets posted during late 2020, a salient period of polarizing discourse in the early phase of the pandemic. In particular, we examine regional variations in both the United States and Canada, focusing on three specific health interventions: lockdowns, masks, and vaccines. We find that more politically conservative regions had higher levels of partisan polarization in both countries, especially in the US where a strong negative correlation exists between regional vaccination rates and degree of polarization in vaccine related discussions. We then analyze the timing, context, and profile of spikes in polarization, linking them to specific events discussed on social media across different regions in both countries. These typically last only a few days in duration, suggesting that online discussions reflect and could even drive changes in public opinion, which in the context of pandemic response impacts public health outcomes across different regions and over time. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 19 pages (main paper), 9 figures, 1 table

ACM Class: J.4

arXiv:2407.02794 [pdf, other]

Euler's Elastica Based Cartoon-Smooth-Texture Image Decomposition

Authors: Roy Y. He, Hao Liu

Abstract: We propose a novel model for decomposing grayscale images into three distinct components: the structural part, representing sharp boundaries and regions with strong light-to-dark transitions; the smooth part, capturing soft shadows and shades; and the oscillatory part, characterizing textures and noise. To capture the homogeneous structures, we introduce a combination of $L^0$-gradient and curvatu… ▽ More We propose a novel model for decomposing grayscale images into three distinct components: the structural part, representing sharp boundaries and regions with strong light-to-dark transitions; the smooth part, capturing soft shadows and shades; and the oscillatory part, characterizing textures and noise. To capture the homogeneous structures, we introduce a combination of $L^0$-gradient and curvature regularization on level lines. This new regularization term enforces strong sparsity on the image gradient while reducing the undesirable staircase effects as well as preserving the geometry of contours. For the smoothly varying component, we utilize the $L^2$-norm of the Laplacian that favors isotropic smoothness. To capture the oscillation, we use the inverse Sobolev seminorm. To solve the associated minimization problem, we design an efficient operator-splitting algorithm. Our algorithm effectively addresses the challenging non-convex non-smooth problem by separating it into sub-problems. Each sub-problem can be solved either directly using closed-form solutions or efficiently using the Fast Fourier Transform (FFT). We provide systematic experiments, including ablation and comparison studies, to analyze our model's behaviors and demonstrate its effectiveness as well as efficiency. △ Less

Submitted 2 July, 2024; originally announced July 2024.

MSC Class: 68U10; 94A08; 65D18

arXiv:2407.02766 [pdf, other]

Balancing Patient Privacy and Health Data Security: The Role of Compliance in Protected Health Information (PHI) Sharing

Authors: Md Al Amin, Hemanth Tummala, Rushabh Shah, Indrajit Ray

Abstract: Protected Health Information (PHI) sharing significantly enhances patient care quality and coordination, contributing to more accurate diagnoses, efficient treatment plans, and a comprehensive understanding of patient history. Compliance with strict privacy and security policies, such as those required by laws like HIPAA, is critical to protect PHI. Blockchain technology, which offers a decentrali… ▽ More Protected Health Information (PHI) sharing significantly enhances patient care quality and coordination, contributing to more accurate diagnoses, efficient treatment plans, and a comprehensive understanding of patient history. Compliance with strict privacy and security policies, such as those required by laws like HIPAA, is critical to protect PHI. Blockchain technology, which offers a decentralized and tamper-evident ledger system, hold promise in policy compliance. This system ensures the authenticity and integrity of PHI while facilitating patient consent management. In this work, we propose a blockchain technology that integrates smart contracts to partially automate consent-related processes and ensuring that PHI access and sharing follow patient preferences and legal requirements. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: The 21st International Conference on Security and Cryptography (SECRYPT 2024)

arXiv:2407.02751 [pdf, other]

Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset

Authors: Rui Liu, Haolin Zuo, Zheng Lian, Xiaofen Xing, Björn W. Schuller, Haizhou Li

Abstract: Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history, while inferring the emotions and intents simultaneously for the current utterance. MC-EIU is enabling technology for many human-computer interfaces. However, there is a lack of available datasets in terms of annotation, modality, lang… ▽ More Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history, while inferring the emotions and intents simultaneously for the current utterance. MC-EIU is enabling technology for many human-computer interfaces. However, there is a lack of available datasets in terms of annotation, modality, language diversity, and accessibility. In this work, we propose an MC-EIU dataset, which features 7 emotion categories, 9 intent categories, 3 modalities, i.e., textual, acoustic, and visual content, and two languages, i.e., English and Mandarin. Furthermore, it is completely open-source for free access. To our knowledge, MC-EIU is the first comprehensive and rich emotion and intent joint understanding dataset for multimodal conversation. Together with the release of the dataset, we also develop an Emotion and Intent Interaction (EI$^2$) network as a reference system by modeling the deep correlation between emotion and intent in the multimodal conversation. With comparative experiments and ablation studies, we demonstrate the effectiveness of the proposed EI$^2$ method on the MC-EIU dataset. The dataset and codes will be made available at: https://github.com/MC-EIU/MC-EIU. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 26 pages, 8 figures, 12 tables, NeurIPS 2024 Dataset and Benchmark Track

arXiv:2407.02750 [pdf, other]

Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data

Authors: Younghun Lee, Sungchul Kim, Ryan A. Rossi, Tong Yu, Xiang Chen

Abstract: Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data is challenging for LLMs. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and both approaches are not trivial. This paper proposes a framework, Learning to Redu… ▽ More Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data is challenging for LLMs. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and both approaches are not trivial. This paper proposes a framework, Learning to Reduce, that fine-tunes a language model with On-Policy Learning to generate a reduced version of an input structured data. When compared to state-of-the-art LLMs like GPT-4, Learning to Reduce not only achieves outstanding performance in reducing the input, but shows generalizability on different datasets. We further show that the model fine-tuned with our framework helps LLMs better perform on table QA tasks especially when the context is longer. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: ICML 2024 Workshop on Long-Context Foundation Models, Vienna, Austria 2024. arXiv admin note: substantial text overlap with arXiv:2402.14195

arXiv:2407.02748 [pdf, other]

DRLQ: A Deep Reinforcement Learning-based Task Placement for Quantum Cloud Computing

Authors: Hoa T. Nguyen, Muhammad Usman, Rajkumar Buyya

Abstract: The quantum cloud computing paradigm presents unique challenges in task placement due to the dynamic and heterogeneous nature of quantum computation resources. Traditional heuristic approaches fall short in adapting to the rapidly evolving landscape of quantum computing. This paper proposes DRLQ, a novel Deep Reinforcement Learning (DRL)-based technique for task placement in quantum cloud computin… ▽ More The quantum cloud computing paradigm presents unique challenges in task placement due to the dynamic and heterogeneous nature of quantum computation resources. Traditional heuristic approaches fall short in adapting to the rapidly evolving landscape of quantum computing. This paper proposes DRLQ, a novel Deep Reinforcement Learning (DRL)-based technique for task placement in quantum cloud computing environments, addressing the optimization of task completion time and quantum task scheduling efficiency. It leverages the Deep Q Network (DQN) architecture, enhanced with the Rainbow DQN approach, to create a dynamic task placement strategy. This approach is one of the first in the field of quantum cloud resource management, enabling adaptive learning and decision-making for quantum cloud environments and effectively optimizing task placement based on changing conditions and resource availability. We conduct extensive experiments using the QSimPy simulation toolkit to evaluate the performance of our method, demonstrating substantial improvements in task execution efficiency and a reduction in the need to reschedule quantum tasks. Our results show that utilizing the DRLQ approach for task placement can significantly reduce total quantum task completion time by 37.81% to 72.93% and prevent task rescheduling attempts compared to other heuristic approaches. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted paper at IEEE CLOUD 2024 conference

Showing 1–50 of 103,283 results for author: R