-
FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography
Authors:
Julia Yang,
Alina Jade Barnett,
Jon Donnelly,
Satvik Kishore,
Jerry Fang,
Fides Regina Schwartz,
Chaofan Chen,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency t…
▽ More
Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency to these formerly black boxes by utilizing prototypes for case-based explanations, achieving high accuracy in applications including mammography. However, these models struggle with precise feature localization, reasoning on large portions of an image when only a small part is relevant. This paper addresses this gap by proposing a novel multi-scale interpretable deep learning model for mammographic mass margin classification. Our contribution not only offers an interpretable model with reasoning aligned with radiologist practices, but also provides a general architecture for computer vision with user-configurable prototypes from coarse- to fine-grained prototypes.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
Authors:
David Romero,
Chenyang Lyu,
Haryo Akbarianto Wibowo,
Teresa Lynn,
Injy Hamed,
Aditya Nanda Kishore,
Aishik Mandal,
Alina Dragonetti,
Artem Abzaliev,
Atnafu Lambebo Tonja,
Bontu Fufa Balcha,
Chenxi Whitehouse,
Christian Salamea,
Dan John Velasco,
David Ifeoluwa Adelani,
David Le Meur,
Emilio Villa-Cueva,
Fajri Koto,
Fauzan Farooqui,
Frederico Belcavello,
Ganzorig Batnasan,
Gisela Vallejo,
Grainne Caulfield,
Guido Ivetta,
Haiyue Song
, et al. (50 additional authors not shown)
Abstract:
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…
▽ More
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
EASI-Tex: Edge-Aware Mesh Texturing from Single Image
Authors:
Sai Raj Kishore Perla,
Yizhi Wang,
Ali Mahdavi-Amiri,
Hao Zhang
Abstract:
We present a novel approach for single-image mesh texturing, which employs a diffusion model with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to…
▽ More
We present a novel approach for single-image mesh texturing, which employs a diffusion model with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to rectify the discrepancies by conditioning a pre-trained Stable Diffusion generator with edges describing the mesh through ControlNet, and features extracted from the input image using IP-Adapter to generate textures that respect the underlying geometry of the mesh and the input texture without any optimization or training. We also introduce Image Inversion, a novel technique to quickly personalize the diffusion model for a single concept using a single image, for cases where the pre-trained IP-Adapter falls short in capturing all the details from the input image faithfully. Experimental results demonstrate the efficiency and effectiveness of our edge-aware single-image mesh texturing approach, coined EASI-Tex, in preserving the details of the input texture on diverse 3D objects, while respecting their geometry.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Deep Oscillatory Neural Network
Authors:
Nurani Rajagopal Rohan,
Vigneswaran C,
Sayan Ghosh,
Kishore Rajendran,
Gaurav A,
V Srinivasa Chakravarthy
Abstract:
We propose a novel, brain-inspired deep neural network model known as the Deep Oscillatory Neural Network (DONN). Deep neural networks like the Recurrent Neural Networks indeed possess sequence processing capabilities but the internal states of the network are not designed to exhibit brain-like oscillatory activity. With this motivation, the DONN is designed to have oscillatory internal dynamics.…
▽ More
We propose a novel, brain-inspired deep neural network model known as the Deep Oscillatory Neural Network (DONN). Deep neural networks like the Recurrent Neural Networks indeed possess sequence processing capabilities but the internal states of the network are not designed to exhibit brain-like oscillatory activity. With this motivation, the DONN is designed to have oscillatory internal dynamics. Neurons of the DONN are either nonlinear neural oscillators or traditional neurons with sigmoidal or ReLU activation. The neural oscillator used in the model is the Hopf oscillator, with the dynamics described in the complex domain. Input can be presented to the neural oscillator in three possible modes. The sigmoid and ReLU neurons also use complex-valued extensions. All the weight stages are also complex-valued. Training follows the general principle of weight change by minimizing the output error and therefore has an overall resemblance to complex backpropagation. A generalization of DONN to convolutional networks known as the Oscillatory Convolutional Neural Network is also proposed. The two proposed oscillatory networks are applied to a variety of benchmark problems in signal and image/video processing. The performance of the proposed models is either comparable or superior to published results on the same data sets.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Recommenadation aided Caching using Combinatorial Multi-armed Bandits
Authors:
Pavamana K J,
Chandramani Kishore Singh
Abstract:
We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users which encourages the users to request these contents. Recommendation can thus be used to increa…
▽ More
We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users which encourages the users to request these contents. Recommendation can thus be used to increase cache hits. We formulate the cache hit optimization problem as a combinatorial multi-armed bandit (CMAB). We propose a UCB-based algorithm to decide which contents to cache and recommend. We provide an upper bound on the regret of our algorithm. We numerically demonstrate the performance of our algorithm and compare it to state-of-the-art algorithms.
△ Less
Submitted 3 May, 2024; v1 submitted 30 April, 2024;
originally announced May 2024.
-
Evaluating Telugu Proficiency in Large Language Models_ A Comparative Analysis of ChatGPT and Gemini
Authors:
Katikela Sreeharsha Kishore,
Rahimanuddin Shaik
Abstract:
The growing prominence of large language models (LLMs) necessitates the exploration of their capabilities beyond English. This research investigates the Telugu language proficiency of ChatGPT and Gemini, two leading LLMs. Through a designed set of 20 questions encompassing greetings, grammar, vocabulary, common phrases, task completion, and situational reasoning, the study delves into their streng…
▽ More
The growing prominence of large language models (LLMs) necessitates the exploration of their capabilities beyond English. This research investigates the Telugu language proficiency of ChatGPT and Gemini, two leading LLMs. Through a designed set of 20 questions encompassing greetings, grammar, vocabulary, common phrases, task completion, and situational reasoning, the study delves into their strengths and weaknesses in handling Telugu. The analysis aims to identify the LLM that demonstrates a deeper understanding of Telugu grammatical structures, possesses a broader vocabulary, and exhibits superior performance in tasks like writing and reasoning. By comparing their ability to comprehend and use everyday Telugu expressions, the research sheds light on their suitability for real-world language interaction. Furthermore, the evaluation of adaptability and reasoning capabilities provides insights into how each LLM leverages Telugu to respond to dynamic situations. This comparative analysis contributes to the ongoing discussion on multilingual capabilities in AI and paves the way for future research in develo** LLMs that can seamlessly integrate with Telugu-speaking communities.
△ Less
Submitted 30 April, 2024;
originally announced April 2024.
-
Exploring and Improving Drafts in Blockwise Parallel Decoding
Authors:
Taehyeon Kim,
Ananda Theertha Suresh,
Kishore Papineni,
Michael Riley,
Sanjiv Kumar,
Adrian Benton
Abstract:
Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. as a method to improve inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verifie…
▽ More
Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. as a method to improve inference speed of language models by simultaneously predicting multiple future tokens, termed block drafts, which are subsequently verified and conditionally accepted by the autoregressive model. This paper contributes to the understanding and improvement of block drafts in two ways. First, we analyze the token distributions produced by multiple prediction heads. Secondly, we leverage this analysis to develop algorithms to improve BPD inference speed by refining the block drafts using n-gram and neural language models. Experiments demonstrate that refined block drafts yield a +5-21% increase in block efficiency (i.e., the number of accepted tokens from the block draft) across diverse datasets.
△ Less
Submitted 5 June, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Privacy preserving layer partitioning for Deep Neural Network models
Authors:
Kishore Rajasekar,
Randolph Loh,
Kar Wai Fok,
Vrizlynn L. L. Thing
Abstract:
MLaaS (Machine Learning as a Service) has become popular in the cloud computing domain, allowing users to leverage cloud resources for running private inference of ML models on their data. However, ensuring user input privacy and secure inference execution is essential. One of the approaches to protect data privacy and integrity is to use Trusted Execution Environments (TEEs) by enabling execution…
▽ More
MLaaS (Machine Learning as a Service) has become popular in the cloud computing domain, allowing users to leverage cloud resources for running private inference of ML models on their data. However, ensuring user input privacy and secure inference execution is essential. One of the approaches to protect data privacy and integrity is to use Trusted Execution Environments (TEEs) by enabling execution of programs in secure hardware enclave. Using TEEs can introduce significant performance overhead due to the additional layers of encryption, decryption, security and integrity checks. This can lead to slower inference times compared to running on unprotected hardware. In our work, we enhance the runtime performance of ML models by introducing layer partitioning technique and offloading computations to GPU. The technique comprises two distinct partitions: one executed within the TEE, and the other carried out using a GPU accelerator. Layer partitioning exposes intermediate feature maps in the clear which can lead to reconstruction attacks to recover the input. We conduct experiments to demonstrate the effectiveness of our approach in protecting against input reconstruction attacks developed using trained conditional Generative Adversarial Network(c-GAN). The evaluation is performed on widely used models such as VGG-16, ResNet-50, and EfficientNetB0, using two datasets: ImageNet for Image classification and TON IoT dataset for cybersecurity attack detection.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs
Authors:
Kush Hari,
Hansoul Kim,
Will Panitch,
Kishore Srinivas,
Vincent Schorp,
Karthik Dharmarajan,
Shreya Ganti,
Tara Sadjadpour,
Ken Goldberg
Abstract:
We present STITCH: an augmented dexterity pipeline that performs Suture Throws Including Thread Coordination and Handoffs. STITCH iteratively performs needle insertion, thread swee**, needle extraction, suture cinching, needle handover, and needle pose correction with failure recovery policies. We introduce a novel visual 6D needle pose estimation framework using a stereo camera pair and new sut…
▽ More
We present STITCH: an augmented dexterity pipeline that performs Suture Throws Including Thread Coordination and Handoffs. STITCH iteratively performs needle insertion, thread swee**, needle extraction, suture cinching, needle handover, and needle pose correction with failure recovery policies. We introduce a novel visual 6D needle pose estimation framework using a stereo camera pair and new suturing motion primitives. We compare STITCH to baselines, including a proprioception-only and a policy without visual servoing. In physical experiments across 15 trials, STITCH achieves an average of 2.93 sutures without human intervention and 4.47 sutures with human intervention. See https://sites.google.com/berkeley.edu/stitch for code and supplemental materials.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
Human Mobility in the Metaverse
Authors:
Kishore Vasan,
Marton Karsai,
Albert-Laszlo Barabasi
Abstract:
The metaverse promises a shift in the way humans interact with each other, and with their digital and physical environments. The lack of geographical boundaries and travel costs in the metaverse prompts us to ask if the fundamental laws that govern human mobility in the physical world apply. We collected data on avatar movements, along with their network mobility extracted from NFT purchases. We f…
▽ More
The metaverse promises a shift in the way humans interact with each other, and with their digital and physical environments. The lack of geographical boundaries and travel costs in the metaverse prompts us to ask if the fundamental laws that govern human mobility in the physical world apply. We collected data on avatar movements, along with their network mobility extracted from NFT purchases. We find that despite the absence of commuting costs, an individuals inclination to explore new locations diminishes over time, limiting movement to a small fraction of the metaverse. We also find a lack of correlation between land prices and visitation, a deviation from the patterns characterizing the physical world. Finally, we identify the scaling laws that characterize meta mobility and show that we need to add preferential selection to the existing models to explain quantitative patterns of metaverse mobility. Our ability to predict the characteristics of the emerging meta mobility network implies that the laws governing human mobility are rooted in fundamental patterns of human dynamics, rather than the nature of space and cost of movement.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Enhancing Research Information Systems with Identification of Domain Experts
Authors:
Gautam Kishore Shahi,
Oliver Hummel
Abstract:
Research organisations and their research outputs have been growing considerably in the past decades. This large body of knowledge attracts various stakeholders, e.g., for knowledge sharing, technology transfer, or potential collaborations. However, due to the large amount of complex knowledge created, traditional methods of manually curating catalogues are often out of time, imprecise, and cumber…
▽ More
Research organisations and their research outputs have been growing considerably in the past decades. This large body of knowledge attracts various stakeholders, e.g., for knowledge sharing, technology transfer, or potential collaborations. However, due to the large amount of complex knowledge created, traditional methods of manually curating catalogues are often out of time, imprecise, and cumbersome. Finding domain experts and knowledge within any larger organisation, scientific and also industrial, has thus become a serious challenge. Hence, exploring an institutions domain knowledge and finding its experts can only be solved by an automated solution. This work presents the scheme of an automated approach for identifying scholarly experts based on their publications and, prospectively, their teaching materials. Based on a search engine, this approach is currently being implemented for two universities, for which some examples are presented. The proposed system will be helpful for finding peer researchers as well as starting points for knowledge exploitation and technology transfer. As the system is designed in a scalable manner, it can easily include additional institutions and hence provide a broader coverage of research facilities in the future.
△ Less
Submitted 28 March, 2024;
originally announced April 2024.
-
Unveiling Divergent Inductive Biases of LLMs on Temporal Data
Authors:
Sindhu Kishore,
Hangfeng He
Abstract:
Unraveling the intricate details of events in natural language necessitates a subtle understanding of temporal dynamics. Despite the adeptness of Large Language Models (LLMs) in discerning patterns and relationships from data, their inherent comprehension of temporal dynamics remains a formidable challenge. This research meticulously explores these intrinsic challenges within LLMs, with a specific…
▽ More
Unraveling the intricate details of events in natural language necessitates a subtle understanding of temporal dynamics. Despite the adeptness of Large Language Models (LLMs) in discerning patterns and relationships from data, their inherent comprehension of temporal dynamics remains a formidable challenge. This research meticulously explores these intrinsic challenges within LLMs, with a specific emphasis on evaluating the performance of GPT-3.5 and GPT-4 models in the analysis of temporal data. Employing two distinct prompt types, namely Question Answering (QA) format and Textual Entailment (TE) format, our analysis probes into both implicit and explicit events. The findings underscore noteworthy trends, revealing disparities in the performance of GPT-3.5 and GPT-4. Notably, biases toward specific temporal relationships come to light, with GPT-3.5 demonstrating a preference for "AFTER'' in the QA format for both implicit and explicit events, while GPT-4 leans towards "BEFORE''. Furthermore, a consistent pattern surfaces wherein GPT-3.5 tends towards "TRUE'', and GPT-4 exhibits a preference for "FALSE'' in the TE format for both implicit and explicit events. This persistent discrepancy between GPT-3.5 and GPT-4 in handling temporal data highlights the intricate nature of inductive bias in LLMs, suggesting that the evolution of these models may not merely mitigate bias but may introduce new layers of complexity.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks
Authors:
Abeer Banerjee,
Naval K. Mehta,
Shyam S. Prasad,
Himanshu,
Sumeet Saurav,
Sanjay Singh
Abstract:
In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding metho…
▽ More
In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding method seamlessly integrates Dynamic Vision Sensor (DVS) events with grayscale guide frames, generating consecutively encoded images for input into our neural network. This unique solution not only captures diverse gaze responses from participants within the active age group but also introduces a curated dataset tailored for low-light conditions. The encoded temporal frames paired with our network showcase impressive spatial localization and reliable gaze direction in their predictions. Achieving a remarkable 100-pixel accuracy of 100%, our research underscores the potency of our neural network to work with temporally consecutive encoded images for precise gaze vector predictions in challenging low-light videos, contributing to the advancement of gaze prediction technologies.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix
Authors:
Mrinmay Sen,
A. K. Qin,
Gayathri C,
Raghu Kishore N,
Yen-Wei Chen,
Balasubramanian Raman
Abstract:
This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM), named SOFIM, which can efficiently utilize the FIM to approximate the Hessian matrix for finding Newton's gradient update in large-scale stochastic optimization of machine learning models. It can be viewed as a variant of natural gradient descent, where the challenge of storing and…
▽ More
This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM), named SOFIM, which can efficiently utilize the FIM to approximate the Hessian matrix for finding Newton's gradient update in large-scale stochastic optimization of machine learning models. It can be viewed as a variant of natural gradient descent, where the challenge of storing and calculating the full FIM is addressed through making use of the regularized FIM and directly finding the gradient update direction via Sherman-Morrison matrix inversion. Additionally, like the popular Adam method, SOFIM uses the first moment of the gradient to address the issue of non-stationary objectives across mini-batches due to heterogeneous data. The utilization of the regularized FIM and Sherman-Morrison matrix inversion leads to the improved convergence rate with the same space and time complexities as stochastic gradient descent (SGD) with momentum. The extensive experiments on training deep learning models using several benchmark image classification datasets demonstrate that the proposed SOFIM outperforms SGD with momentum and several state-of-the-art Newton optimization methods in term of the convergence speed for achieving the pre-specified objectives of training and test losses as well as test accuracy.
△ Less
Submitted 1 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
TweetInfo: An Interactive System to Mitigate Online Harm
Authors:
Gautam Kishore Shahi
Abstract:
The increase in active users on social networking sites (SNSs) has also observed an increase in harmful content on social media sites. Harmful content is described as an inappropriate activity to harm or deceive an individual or a group of users. Alongside existing methods to detect misinformation and hate speech, users still need to be well-informed about the harmfulness of the content on SNSs. T…
▽ More
The increase in active users on social networking sites (SNSs) has also observed an increase in harmful content on social media sites. Harmful content is described as an inappropriate activity to harm or deceive an individual or a group of users. Alongside existing methods to detect misinformation and hate speech, users still need to be well-informed about the harmfulness of the content on SNSs. This study proposes a user-interactive system TweetInfo for mitigating the consumption of harmful content by providing metainformation about the posts. It focuses on two types of harmful content: hate speech and misinformation. TweetInfo provides insights into tweets by doing content analysis. Based on previous research, we have selected a list of metainformation. We offer the option to filter content based on metainformation Bot, Hate Speech, Misinformation, Verified Account, Sentiment, Tweet Category, Language. The proposed user interface allows customising the user's timeline to mitigate harmful content. This study present the demo version of the propose user interface of TweetInfo.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Graph Representation Learning for Contention and Interference Management in Wireless Networks
Authors:
Zhouyou Gu,
Branka Vucetic,
Kishore Chikkam,
Pasquale Aliberti,
Wibowo Hardjawana
Abstract:
Restricted access window (RAW) in Wi-Fi 802.11ah networks manages contention and interference by grou** users and allocating periodic time slots for each group's transmissions. We will find the optimal user grou** decisions in RAW to maximize the network's worst-case user throughput. We review existing user grou** approaches and highlight their performance limitations in the above problem. W…
▽ More
Restricted access window (RAW) in Wi-Fi 802.11ah networks manages contention and interference by grou** users and allocating periodic time slots for each group's transmissions. We will find the optimal user grou** decisions in RAW to maximize the network's worst-case user throughput. We review existing user grou** approaches and highlight their performance limitations in the above problem. We propose formulating user grou** as a graph construction problem where vertices represent users and edge weights indicate the contention and interference. This formulation leverages the graph's max cut to group users and optimizes edge weights to construct the optimal graph whose max cut yields the optimal grou** decisions. To achieve this optimal graph construction, we design an actor-critic graph representation learning (AC-GRL) algorithm. Specifically, the actor neural network (NN) is trained to estimate the optimal graph's edge weights using path losses between users and access points. A graph cut procedure uses semidefinite programming to solve the max cut efficiently and return the grou** decisions for the given weights. The critic NN approximates user throughput achieved by the above-returned decisions and is used to improve the actor. Additionally, we present an architecture that uses the online-measured throughput and path losses to fine-tune the decisions in response to changes in user populations and their locations. Simulations show that our methods achieve $30\%\sim80\%$ higher worst-case user throughput than the existing approaches and that the proposed architecture can further improve the worst-case user throughput by $5\%\sim30\%$ while ensuring timely updates of grou** decisions.
△ Less
Submitted 15 January, 2024;
originally announced February 2024.
-
FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War
Authors:
Gautam Kishore Shahi,
Amit Kumar Jaiswal,
Thomas Mandl
Abstract:
We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-check…
▽ More
We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We used a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned language model, Universal Sentence Encoder (USE), achieves a Macro F1 of 87\%, which shows that the trained model can be helpful for debunking fake videos using the comments from the user discussion. The dataset is available on Github\footnote{https://github.com/Gautamshahi/FakeClaim}
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
Integrating Human Vision Perception in Vision Transformers for Classifying Waste Items
Authors:
Akshat Kishore Shrivastava,
Tapan Kumar Gandhi
Abstract:
In this paper, we propose an novel methodology aimed at simulating the learning phenomenon of nystagmus through the application of differential blurring on datasets. Nystagmus is a biological phenomenon that influences human vision throughout life, notably by diminishing head shake from infancy to adulthood. Leveraging this concept, we address the issue of waste classification, a pressing global c…
▽ More
In this paper, we propose an novel methodology aimed at simulating the learning phenomenon of nystagmus through the application of differential blurring on datasets. Nystagmus is a biological phenomenon that influences human vision throughout life, notably by diminishing head shake from infancy to adulthood. Leveraging this concept, we address the issue of waste classification, a pressing global concern. The proposed framework comprises two modules, with the second module closely resembling the original Vision Transformer, a state-of-the-art model model in classification tasks. The primary motivation behind our approach is to enhance the model's precision and adaptability, mirroring the real-world conditions that the human visual system undergoes. This novel methodology surpasses the standard Vision Transformer model in waste classification tasks, exhibiting an improvement with a margin of 2%. This improvement underscores the potential of our methodology in improving model precision by drawing inspiration from human vision perception. Further research in the proposed methodology could yield greater performance results, and can be extrapolated to other global issues.
△ Less
Submitted 20 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Transformer-based Selective Super-Resolution for Efficient Image Refinement
Authors:
Tianyi Zhang,
Kishore Kasichainula,
Yaoxin Zhuo,
Baoxin Li,
Jae-sun Seo,
Yu Cao
Abstract:
Conventional super-resolution methods suffer from two drawbacks: substantial computational cost in upscaling an entire large image, and the introduction of extraneous or potentially detrimental information for downstream computer vision tasks during the refinement of the background. To solve these issues, we propose a novel transformer-based algorithm, Selective Super-Resolution (SSR), which parti…
▽ More
Conventional super-resolution methods suffer from two drawbacks: substantial computational cost in upscaling an entire large image, and the introduction of extraneous or potentially detrimental information for downstream computer vision tasks during the refinement of the background. To solve these issues, we propose a novel transformer-based algorithm, Selective Super-Resolution (SSR), which partitions images into non-overlap** tiles, selects tiles of interest at various scales with a pyramid architecture, and exclusively reconstructs these selected tiles with deep features. Experimental results on three datasets demonstrate the efficiency and robust performance of our approach for super-resolution. Compared to the state-of-the-art methods, the FID score is reduced from 26.78 to 10.41 with 40% reduction in computation cost for the BDD100K dataset. The source code is available at https://github.com/destiny301/SSR.
△ Less
Submitted 10 December, 2023;
originally announced December 2023.
-
Patch-based Selection and Refinement for Early Object Detection
Authors:
Tianyi Zhang,
Kishore Kasichainula,
Yaoxin Zhuo,
Baoxin Li,
Jae-Sun Seo,
Yu Cao
Abstract:
Early object detection (OD) is a crucial task for the safety of many dynamic systems. Current OD algorithms have limited success for small objects at a long distance. To improve the accuracy and efficiency of such a task, we propose a novel set of algorithms that divide the image into patches, select patches with objects at various scales, elaborate the details of a small object, and detect it as…
▽ More
Early object detection (OD) is a crucial task for the safety of many dynamic systems. Current OD algorithms have limited success for small objects at a long distance. To improve the accuracy and efficiency of such a task, we propose a novel set of algorithms that divide the image into patches, select patches with objects at various scales, elaborate the details of a small object, and detect it as early as possible. Our approach is built upon a transformer-based network and integrates the diffusion model to improve the detection accuracy. As demonstrated on BDD100K, our algorithms enhance the mAP for small objects from 1.03 to 8.93, and reduce the data volume in computation by more than 77\%. The source code is available at \href{https://github.com/destiny301/dpr}{https://github.com/destiny301/dpr}
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
Correction with Backtracking Reduces Hallucination in Summarization
Authors:
Zhenzhen Liu,
Chao Wan,
Varsha Kishore,
** Peng Zhou,
Minmin Chen,
Kilian Q. Weinberger
Abstract:
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we intr…
▽ More
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straight-forward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.
△ Less
Submitted 31 October, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
-
IncDSI: Incrementally Updatable Document Retrieval
Authors:
Varsha Kishore,
Chao Wan,
Justin Lovelace,
Yoav Artzi,
Kilian Q. Weinberger
Abstract:
Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not…
▽ More
Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). Instead we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with re-training the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real-time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
△ Less
Submitted 19 July, 2023;
originally announced July 2023.
-
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media
Authors:
Liam Hebert,
Gaurav Sahu,
Yuxuan Guo,
Nanda Kishore Sreenivas,
Lukasz Golab,
Robin Cohen
Abstract:
We present the Multi-Modal Discussion Transformer (mDT), a novel methodfor detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the cont…
▽ More
We present the Multi-Modal Discussion Transformer (mDT), a novel methodfor detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies.
△ Less
Submitted 22 February, 2024; v1 submitted 18 July, 2023;
originally announced July 2023.
-
The Busboy Problem: Efficient Tableware Decluttering Using Consolidation and Multi-Object Grasps
Authors:
Kishore Srinivas,
Shreya Ganti,
Rishi Parikh,
Ayah Ahmad,
Wisdom Agboh,
Mehmet Dogar,
Ken Goldberg
Abstract:
We present the "Busboy Problem": automating an efficient decluttering of cups, bowls, and silverware from a planar surface. As gras** and transporting individual items is highly inefficient, we propose policies to generate grasps for multiple items. We introduce the metric of Objects per Trip (OpT) carried by the robot to the collection bin to analyze the improvement seen as a result of our poli…
▽ More
We present the "Busboy Problem": automating an efficient decluttering of cups, bowls, and silverware from a planar surface. As gras** and transporting individual items is highly inefficient, we propose policies to generate grasps for multiple items. We introduce the metric of Objects per Trip (OpT) carried by the robot to the collection bin to analyze the improvement seen as a result of our policies. In physical experiments with singulated items, we find that consolidation and multi-object grasps resulted in an 1.8x improvement in OpT, compared to methods without multi-object grasps. See https://sites.google.com/berkeley.edu/busboyproblem for code and supplemental materials.
△ Less
Submitted 7 July, 2023;
originally announced July 2023.
-
DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum
Authors:
Kishore Babu Nampalle,
Pradeep Singh,
Uppala Vivek Narayan,
Balasubramanian Raman
Abstract:
In the rapidly evolving landscape of medical imaging diagnostics, achieving high accuracy while preserving computational efficiency remains a formidable challenge. This work presents \texttt{DeepMediX}, a groundbreaking, resource-efficient model that significantly addresses this challenge. Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer…
▽ More
In the rapidly evolving landscape of medical imaging diagnostics, achieving high accuracy while preserving computational efficiency remains a formidable challenge. This work presents \texttt{DeepMediX}, a groundbreaking, resource-efficient model that significantly addresses this challenge. Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer images, with superior performance demonstrated on both binary and multiclass skin cancer datasets. It provides a solution to labor-intensive manual processes, the need for large datasets, and complexities related to image properties. DeepMediX's design also includes the concept of Federated Learning, enabling a collaborative learning approach without compromising data privacy. This approach allows diverse healthcare institutions to benefit from shared learning experiences without the necessity of direct data access, enhancing the model's predictive power while preserving the privacy and integrity of sensitive patient data. Its low computational footprint makes DeepMediX suitable for deployment on handheld devices, offering potential for real-time diagnostic support. Through rigorous testing on standard datasets, including the ISIC2018 for dermatological research, DeepMediX demonstrates exceptional diagnostic capabilities, matching the performance of existing models on almost all tasks and even outperforming them in some cases. The findings of this study underline significant implications for the development and deployment of AI-based tools in medical imaging and their integration into point-of-care settings. The source code and models generated would be released at https://github.com/kishorebabun/DeepMediX.
△ Less
Submitted 1 July, 2023;
originally announced July 2023.
-
Vision Through the Veil: Differential Privacy in Federated Learning for Medical Image Classification
Authors:
Kishore Babu Nampalle,
Pradeep Singh,
Uppala Vivek Narayan,
Balasubramanian Raman
Abstract:
The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount due to the data being sensitive in nature. Federated learning, which enables cooperative model training without direc…
▽ More
The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount due to the data being sensitive in nature. Federated learning, which enables cooperative model training without direct data exchange, presents a promising solution. Nevertheless, the inherent vulnerabilities of federated learning necessitate further privacy safeguards. This study addresses this need by integrating differential privacy, a leading privacy-preserving technique, into a federated learning framework for medical image classification. We introduce a novel differentially private federated learning model and meticulously examine its impacts on privacy preservation and model performance. Our research confirms the existence of a trade-off between model accuracy and privacy settings. However, we demonstrate that strategic calibration of the privacy budget in differential privacy can uphold robust image classification performance while providing substantial privacy protection.
△ Less
Submitted 30 June, 2023;
originally announced June 2023.
-
See Through the Fog: Curriculum Learning with Progressive Occlusion in Medical Imaging
Authors:
Pradeep Singh,
Kishore Babu Nampalle,
Uppala Vivek Narayan,
Balasubramanian Raman
Abstract:
In recent years, deep learning models have revolutionized medical image interpretation, offering substantial improvements in diagnostic accuracy. However, these models often struggle with challenging images where critical features are partially or fully occluded, which is a common scenario in clinical practice. In this paper, we propose a novel curriculum learning-based approach to train deep lear…
▽ More
In recent years, deep learning models have revolutionized medical image interpretation, offering substantial improvements in diagnostic accuracy. However, these models often struggle with challenging images where critical features are partially or fully occluded, which is a common scenario in clinical practice. In this paper, we propose a novel curriculum learning-based approach to train deep learning models to handle occluded medical images effectively. Our method progressively introduces occlusion, starting from clear, unobstructed images and gradually moving to images with increasing occlusion levels. This ordered learning process, akin to human learning, allows the model to first grasp simple, discernable patterns and subsequently build upon this knowledge to understand more complicated, occluded scenarios. Furthermore, we present three novel occlusion synthesis methods, namely Wasserstein Curriculum Learning (WCL), Information Adaptive Learning (IAL), and Geodesic Curriculum Learning (GCL). Our extensive experiments on diverse medical image datasets demonstrate substantial improvements in model robustness and diagnostic accuracy over conventional training methodologies.
△ Less
Submitted 30 June, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
One-shot Imitation Learning via Interaction War**
Authors:
Ondrej Biza,
Skye Thompson,
Kishore Reddy Pagidi,
Abhinav Kumar,
Elise van der Pol,
Robin Walters,
Thomas Kipf,
Jan-Willem van de Meent,
Lawson L. S. Wong,
Robert Platt
Abstract:
Imitation learning of robot policies from few demonstrations is crucial in open-ended applications. We propose a new method, Interaction War**, for learning SE(3) robotic manipulation policies from a single demonstration. We infer the 3D mesh of each object in the environment using shape war**, a technique for aligning point clouds across object instances. Then, we represent manipulation actio…
▽ More
Imitation learning of robot policies from few demonstrations is crucial in open-ended applications. We propose a new method, Interaction War**, for learning SE(3) robotic manipulation policies from a single demonstration. We infer the 3D mesh of each object in the environment using shape war**, a technique for aligning point clouds across object instances. Then, we represent manipulation actions as keypoints on objects, which can be warped with the shape of the object. We show successful one-shot imitation learning on three simulated and real-world object re-arrangement tasks. We also demonstrate the ability of our method to predict object meshes and robot grasps in the wild.
△ Less
Submitted 4 November, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Contrastive Attention Networks for Attribution of Early Modern Print
Authors:
Nikolai Vogler,
Kartik Goyal,
Kishore PV Reddy,
Elizaveta Pertseva,
Samuel V. Lemley,
Christopher N. Warren,
Max G'Sell,
Taylor Berg-Kirkpatrick
Abstract:
In this paper, we develop machine learning techniques to identify unknown printers in early modern (c.~1500--1800) English printed books. Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers in order to provide evidence of their origins. Until now, this work has been limited to manual investigations by analytical bibl…
▽ More
In this paper, we develop machine learning techniques to identify unknown printers in early modern (c.~1500--1800) English printed books. Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers in order to provide evidence of their origins. Until now, this work has been limited to manual investigations by analytical bibliographers. We present a Contrastive Attention-based Metric Learning approach to identify similar damage across character image pairs, which is sensitive to very subtle differences in glyph shapes, yet robust to various confounding sources of noise associated with digitized historical books. To overcome the scarce amount of supervised data, we design a random data synthesis procedure that aims to simulate bends, fractures, and inking variations induced by the early printing process. Our method successfully improves downstream damaged type-imprint matching among printed works from this period, as validated by in-domain human experts. The results of our approach on two important philosophical works from the Early Modern period demonstrate potential to extend the extant historical research about the origins and content of these books.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
Authors:
Prithwish Jana,
Piyush Jha,
Haoyang Ju,
Gautham Kishore,
Aryan Mahajan,
Vijay Ganesh
Abstract:
In this paper, we present an LLM-based code translation method and an associated tool called CoTran, that translates whole-programs from one high-level programming language to another. Current LLM-based code translation methods lack a training approach to ensure that the translated code reliably compiles or bears substantial functional equivalence to the input code. In our work, we train an LLM vi…
▽ More
In this paper, we present an LLM-based code translation method and an associated tool called CoTran, that translates whole-programs from one high-level programming language to another. Current LLM-based code translation methods lack a training approach to ensure that the translated code reliably compiles or bears substantial functional equivalence to the input code. In our work, we train an LLM via reinforcement learning, by modifying the fine-tuning process to incorporate compiler feedback and symbolic execution (symexec)-based equivalence testing feedback that checks for functional equivalence between the input and output programs. The idea is to guide an LLM-in-training, via compiler and symexec-based testing feedback, by letting it know how far it is from producing perfect translations. We report on extensive experiments comparing CoTran with 14 other code translation tools that include human-written transpilers, LLM-based translation tools, and ChatGPT over a benchmark of more than 57,000 Java-Python equivalent pairs, and we show that CoTran outperforms them on relevant metrics such as compilation accuracy (CompAcc) and functional equivalence accuracy (FEqAcc). For example, our tool achieves 48.68% FEqAcc, 76.98% CompAcc for Python-to-Java translation, whereas the nearest competing tool (PLBART-base) only gets 38.26% and 75.77% resp. Also, built upon CodeT5, CoTran achieves +11.23%, +14.89% improvement on FEqAcc and +4.07%, +8.14% on CompAcc for Java-to-Python and Python-to-Java translation resp.
△ Less
Submitted 16 January, 2024; v1 submitted 11 June, 2023;
originally announced June 2023.
-
Transcending Grids: Point Clouds and Surface Representations Powering Neurological Processing
Authors:
Kishore Babu Nampalle,
Pradeep Singh,
Vivek Narayan Uppala,
Sumit Gangwar,
Rajesh Singh Negi,
Balasubramanian Raman
Abstract:
In healthcare, accurately classifying medical images is vital, but conventional methods often hinge on medical data with a consistent grid structure, which may restrict their overall performance. Recent medical research has been focused on tweaking the architectures to attain better performance without giving due consideration to the representation of data. In this paper, we present a novel approa…
▽ More
In healthcare, accurately classifying medical images is vital, but conventional methods often hinge on medical data with a consistent grid structure, which may restrict their overall performance. Recent medical research has been focused on tweaking the architectures to attain better performance without giving due consideration to the representation of data. In this paper, we present a novel approach for transforming grid based data into its higher dimensional representations, leveraging unstructured point cloud data structures. We first generate a sparse point cloud from an image by integrating pixel color information as spatial coordinates. Next, we construct a hypersurface composed of points based on the image dimensions, with each smooth section within this hypersurface symbolizing a specific pixel location. Polygonal face construction is achieved using an adjacency tensor. Finally, a dense point cloud is generated by densely sampling the constructed hypersurface, with a focus on regions of higher detail. The effectiveness of our approach is demonstrated on a publicly accessible brain tumor dataset, achieving significant improvements over existing classification techniques. This methodology allows the extraction of intricate details from the original image, opening up new possibilities for advanced image analysis and processing tasks.
△ Less
Submitted 2 June, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Deliberation and Voting in Approval-Based Multi-Winner Elections
Authors:
Kanav Mehra,
Nanda Kishore Sreenivas,
Kate Larson
Abstract:
Citizen-focused democratic processes where participants deliberate on alternatives and then vote to make the final decision are increasingly popular today. While the computational social choice literature has extensively investigated voting rules, there is limited work that explicitly looks at the interplay of the deliberative process and voting. In this paper, we build a deliberation model using…
▽ More
Citizen-focused democratic processes where participants deliberate on alternatives and then vote to make the final decision are increasingly popular today. While the computational social choice literature has extensively investigated voting rules, there is limited work that explicitly looks at the interplay of the deliberative process and voting. In this paper, we build a deliberation model using established models from the opinion-dynamics literature and study the effect of different deliberation mechanisms on voting outcomes achieved when using well-studied voting rules. Our results show that deliberation generally improves welfare and representation guarantees, but the results are sensitive to how the deliberation process is organized. We also show, experimentally, that simple voting rules, such as approval voting, perform as well as more sophisticated rules such as proportional approval voting or method of equal shares if deliberation is properly supported. This has ramifications on the practical use of such voting rules in citizen-focused democratic processes.
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
Optimizing Distributed ML Communication with Fused Computation-Collective Operations
Authors:
Kishore Punniyamurthy,
Khaled Hamidouche,
Bradford M. Beckmann
Abstract:
In order to satisfy their ever increasing capacity and compute requirements, machine learning models are distributed across multiple nodes using numerous parallelism strategies. As a result, collective communications are often on the critical path, and hiding their latency by overlap** kernel-granular communication and computation is difficult due to the absence of independent computation. In th…
▽ More
In order to satisfy their ever increasing capacity and compute requirements, machine learning models are distributed across multiple nodes using numerous parallelism strategies. As a result, collective communications are often on the critical path, and hiding their latency by overlap** kernel-granular communication and computation is difficult due to the absence of independent computation. In this work, we propose fusing computation with dependent collective communication by leveraging GPUs' massive parallelism and GPU-initiated communication. We have developed self-contained GPU kernels where workgroups (WGs) immediately communicate their results to remote GPUs when they complete their computation. Meanwhile, other WGs within the same kernel perform overlap** computation, maintaining high ALU utilization.
We demonstrate our approach by creating three prototype fused operators (embedding + All-to-All, GEMV + AllReduce, and GEMM + All-to-All) to address the pervasive communication overheads observed in DLRM, Transformers and MoE model architectures. In order to demonstrate that our approach can be integrated into ML frameworks for wide adoption in production environments, we expose our fused operators as new PyTorch operators as well as extend the Triton framework to enable them. Our evaluations show that our approach can effectively overlap communication with computations, subsequently reducing their combined execution time than the current collective library-based approaches. Our scale-up GEMV + AllReduce and GEMM + All-to-All implementations achieve up to 22% and 20% lower execution time, while our fused embedding + All-to-All reduces execution time by 20% and 31% for intra-node and inter-node configurations. Large scale-out simulations indicate that our approach reduces DLRM execution time by 21% for 128 node system.
△ Less
Submitted 23 April, 2024; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Cardiac Arrhythmia Detection using Artificial Neural Network
Authors:
Prof Sangeetha R G,
Kishore Anand K,
Sreevatsan B,
Vishal Kumar A
Abstract:
The prime purpose of this project is to develop a portable cardiac abnormality monitoring device which can drastically improvise the quality of the monitoring and the overall safety of the device. While a generic, low cost, wearable battery powered device for such applications may not yield sufficient performance, such devices combined with the capabilities of Artificial Neural Network algorithms…
▽ More
The prime purpose of this project is to develop a portable cardiac abnormality monitoring device which can drastically improvise the quality of the monitoring and the overall safety of the device. While a generic, low cost, wearable battery powered device for such applications may not yield sufficient performance, such devices combined with the capabilities of Artificial Neural Network algorithms can however, prove to be as competent as high end flexible and wearable monitoring devices fabricated using advanced manufacturing technologies. This paper evaluates the feasibility of the Levenberg-Marquardt ANN algorithm for use in any generic low power wearable devices implemented either as a pure real-time embedded system or as an IoT device capable of uploading the monitored readings to the cloud.
△ Less
Submitted 17 April, 2023;
originally announced April 2023.
-
A Neural Network Transformer Model for Composite Microstructure Homogenization
Authors:
Emil Pitz,
Kishore Pochiraju
Abstract:
Heterogeneity and uncertainty in a composite microstructure lead to either computational bottlenecks if modeled rigorously or to solution inaccuracies in the stress field and failure predictions if approximated. Although methods suitable for analyzing arbitrary and non-linear microstructures exist, their computational cost makes them impractical to use in large-scale structural analysis. Surrogate…
▽ More
Heterogeneity and uncertainty in a composite microstructure lead to either computational bottlenecks if modeled rigorously or to solution inaccuracies in the stress field and failure predictions if approximated. Although methods suitable for analyzing arbitrary and non-linear microstructures exist, their computational cost makes them impractical to use in large-scale structural analysis. Surrogate models or Reduced Order Models (ROMs) commonly enhance efficiencies but are typically calibrated with a single microstructure. Homogenization methods, such as the Mori-Tanaka method, offer rapid homogenization for a wide range of constituent properties. However, simplifying assumptions, like stress and strain averaging in phases, render the consideration of both deterministic and stochastic variations in microstructure infeasible. This paper illustrates a transformer neural network architecture that captures the knowledge of various microstructures and constituents, enabling it to function as a computationally efficient homogenization surrogate model. Given an image or an abstraction of an arbitrary composite microstructure of linearly elastic fibers in an elastoplastic matrix, the transformer network predicts the history-dependent, non-linear, and homogenized stress-strain response. Two methods for encoding microstructure features were tested: calculating two-point statistics using Principal Component Analysis (PCA) for dimensionality reduction and employing an autoencoder with a Convolutional Neural Network (CNN). Both methods accurately predict the homogenized material response. The developed transformer neural network offers an efficient means for microstructure-to-property translation, generalizable and extendable to a variety of microstructures. The paper describes the network architecture, training and testing data generation, and performance under cycling and random loadings.
△ Less
Submitted 28 May, 2024; v1 submitted 16 April, 2023;
originally announced April 2023.
-
A force-sensing surgical drill for real-time force feedback in robotic mastoidectomy
Authors:
Yuxin Chen,
Anna Goodridge,
Manish Sahu,
Aditi Kishore,
Seena Vafaee,
Harsha Mohan,
Katherina Sapozhnikov,
Francis Creighton,
Russell Taylor,
Deepa Galaiya
Abstract:
Purpose: Robotic assistance in otologic surgery can reduce the task load of operating surgeons during the removal of bone around the critical structures in the lateral skull base. However, safe deployment into the anatomical passageways necessitates the development of advanced sensing capabilities to actively limit the interaction forces between the surgical tools and critical anatomy.
Methods:…
▽ More
Purpose: Robotic assistance in otologic surgery can reduce the task load of operating surgeons during the removal of bone around the critical structures in the lateral skull base. However, safe deployment into the anatomical passageways necessitates the development of advanced sensing capabilities to actively limit the interaction forces between the surgical tools and critical anatomy.
Methods: We introduce a surgical drill equipped with a force sensor that is capable of measuring accurate tool-tissue interaction forces to enable force control and feedback to surgeons. The design, calibration and validation of the force-sensing surgical drill mounted on a cooperatively controlled surgical robot are described in this work.
Results: The force measurements on the tip of the surgical drill are validated with raw-egg drilling experiments, where a force sensor mounted below the egg serves as ground truth. The average root mean square error (RMSE) for points and path drilling experiments are 41.7 (pm 12.2) mN and 48.3 (pm 13.7) mN respectively.
Conclusions: The force-sensing prototype measures forces with sub-millinewton resolution and the results demonstrate that the calibrated force-sensing drill generates accurate force measurements with minimal error compared to the measured drill forces. The development of such sensing capabilities is crucial for the safe use of robotic systems in a clinical context.
△ Less
Submitted 5 April, 2023;
originally announced April 2023.
-
Learning Iterative Neural Optimizers for Image Steganography
Authors:
Xiangyu Chen,
Varsha Kishore,
Kilian Q Weinberger
Abstract:
Image steganography is the process of concealing secret information in images through imperceptible changes. Recent work has formulated this task as a classic constrained optimization problem. In this paper, we argue that image steganography is inherently performed on the (elusive) manifold of natural images, and propose an iterative neural network trained to perform the optimization steps. In con…
▽ More
Image steganography is the process of concealing secret information in images through imperceptible changes. Recent work has formulated this task as a classic constrained optimization problem. In this paper, we argue that image steganography is inherently performed on the (elusive) manifold of natural images, and propose an iterative neural network trained to perform the optimization steps. In contrast to classical optimization methods like L-BFGS or projected gradient descent, we train the neural network to also stay close to the manifold of natural images throughout the optimization. We show that our learned neural optimization is faster and more reliable than classical optimization approaches. In comparison to previous state-of-the-art encoder-decoder-based steganography methods, it reduces the recovery error rate by multiple orders of magnitude and achieves zero error up to 3 bits per pixel (bpp) without the need for error-correcting codes.
△ Less
Submitted 27 March, 2023;
originally announced March 2023.
-
Regret, Delete, (Do Not) Repeat: An Analysis of Self-Cleaning Practices on Twitter After the Outbreak of the COVID-19 Pandemic
Authors:
Nicolás E. Díaz Ferreyra,
Gautam Kishore Shahi,
Catherine Tony,
Stefan Stieglitz,
Riccardo Scandariato
Abstract:
During the outbreak of the COVID-19 pandemic, many people shared their symptoms across Online Social Networks (OSNs) like Twitter, ho** for others' advice or moral support. Prior studies have shown that those who disclose health-related information across OSNs often tend to regret it and delete their publications afterwards. Hence, deleted posts containing sensitive data can be seen as manifesta…
▽ More
During the outbreak of the COVID-19 pandemic, many people shared their symptoms across Online Social Networks (OSNs) like Twitter, ho** for others' advice or moral support. Prior studies have shown that those who disclose health-related information across OSNs often tend to regret it and delete their publications afterwards. Hence, deleted posts containing sensitive data can be seen as manifestations of online regrets. In this work, we present an analysis of deleted content on Twitter during the outbreak of the COVID-19 pandemic. For this, we collected more than 3.67 million tweets describing COVID-19 symptoms (e.g., fever, cough, and fatigue) posted between January and April 2020. We observed that around 24% of the tweets containing personal pronouns were deleted either by their authors or by the platform after one year. As a practical application of the resulting dataset, we explored its suitability for the automatic classification of regrettable content on Twitter.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Word-Level Structure Identification In FPGA Designs Using Cell Proximity Information
Authors:
Aparajithan Nathamuni-Venkatesan,
Ram-Venkat Narayanan,
Kishore Pula,
Sundarakumar Muthukumaran,
Ranga Vemuri
Abstract:
Reverse engineering of FPGA based designs from the flattened LUT level netlist to high level RTL helps in verification of the design or in understanding legacy designs. We focus on flattened netlists for FPGA devices from ** algorithm that makes use of the location information of the elements on the physical device after place and ro…
▽ More
Reverse engineering of FPGA based designs from the flattened LUT level netlist to high level RTL helps in verification of the design or in understanding legacy designs. We focus on flattened netlists for FPGA devices from ** algorithm that makes use of the location information of the elements on the physical device after place and route. The proposed grou** algorithm gives clusters with average NMI of 0.73 for grou**s including all element types. The benchmarks chosen include a range of designs from communication, arithmetic units, processors and DSP processing units.
△ Less
Submitted 7 March, 2023;
originally announced March 2023.
-
Reverse Engineering Word-Level Models from Look-Up Table Netlists
Authors:
Ram Venkat Narayanan,
Aparajithan Nathamuni Venkatesan,
Kishore Pula,
Sundarakumar Muthukumaran,
Ranga Vemuri
Abstract:
Reverse engineering of FPGA designs from bitstreams to RTL models aids in understanding the high level functionality of the design and for validating and reconstructing legacy designs. Fast carry-chains are commonly used in synthesis of operators in FPGA designs. We propose a method to detect word-level structures by analyzing these carry-chains in LUT (Look-Up Table) level netlists. We also prese…
▽ More
Reverse engineering of FPGA designs from bitstreams to RTL models aids in understanding the high level functionality of the design and for validating and reconstructing legacy designs. Fast carry-chains are commonly used in synthesis of operators in FPGA designs. We propose a method to detect word-level structures by analyzing these carry-chains in LUT (Look-Up Table) level netlists. We also present methods to adapt existing techniques to identify combinational operations and sequential modules in ASIC netlists to LUT netlists. All developed and adapted techniques are consolidated into an integrated tool-chain to aid in reverse engineering of word-level designs from LUT-level netlists. When evaluated on a set of real-world designs, the tool-chain infers 34\% to 100\% of the elements in the netlist to be part of a known word-level operation or a known sequential module.
△ Less
Submitted 5 March, 2023;
originally announced March 2023.
-
Word class representations spontaneously emerge in a deep neural network trained on next word prediction
Authors:
Kishore Surendra,
Achim Schilling,
Paul Stoewer,
Andreas Maier,
Patrick Krauss
Abstract:
How do humans learn language, and can the first language be learned at all? These fundamental questions are still hotly debated. In contemporary linguistics, there are two major schools of thought that give completely opposite answers. According to Chomsky's theory of universal grammar, language cannot be learned because children are not exposed to sufficient data in their linguistic environment.…
▽ More
How do humans learn language, and can the first language be learned at all? These fundamental questions are still hotly debated. In contemporary linguistics, there are two major schools of thought that give completely opposite answers. According to Chomsky's theory of universal grammar, language cannot be learned because children are not exposed to sufficient data in their linguistic environment. In contrast, usage-based models of language assume a profound relationship between language structure and language use. In particular, contextual mental processing and mental representations are assumed to have the cognitive capacity to capture the complexity of actual language use at all levels. The prime example is syntax, i.e., the rules by which words are assembled into larger units such as sentences. Typically, syntactic rules are expressed as sequences of word classes. However, it remains unclear whether word classes are innate, as implied by universal grammar, or whether they emerge during language acquisition, as suggested by usage-based approaches. Here, we address this issue from a machine learning and natural language processing perspective. In particular, we trained an artificial deep neural network on predicting the next word, provided sequences of consecutive words as input. Subsequently, we analyzed the emerging activation patterns in the hidden layers of the neural network. Strikingly, we find that the internal representations of nine-word input sequences cluster according to the word class of the tenth word to be predicted as output, even though the neural network did not receive any explicit information about syntactic rules or word classes during training. This surprising result suggests, that also in the human brain, abstract representational categories such as word classes may naturally emerge as a consequence of predictive coding and processing during language acquisition.
△ Less
Submitted 15 February, 2023;
originally announced February 2023.
-
The Clinical Trials Puzzle: How Network Effects Limit Drug Discovery
Authors:
Kishore Vasan,
Deisy Gysi,
Albert-Laszlo Barabasi
Abstract:
The depth of knowledge offered by post-genomic medicine has carried the promise of new drugs, and cures for multiple diseases. To explore the degree to which this capability has materialized, we extract meta-data from 356,403 clinical trials spanning four decades, aiming to offer mechanistic insights into the innovation practices in drug discovery. We find that convention dominates over innovation…
▽ More
The depth of knowledge offered by post-genomic medicine has carried the promise of new drugs, and cures for multiple diseases. To explore the degree to which this capability has materialized, we extract meta-data from 356,403 clinical trials spanning four decades, aiming to offer mechanistic insights into the innovation practices in drug discovery. We find that convention dominates over innovation, as over 96% of the recorded trials focus on previously tested drug targets, and the tested drugs target only 12% of the human interactome. If current patterns persist, it would take 170 years to target all druggable proteins. We uncover two network-based fundamental mechanisms that currently limit target discovery: preferential attachment, leading to the repeated exploration of previously targeted proteins; and local network effects, limiting exploration to proteins interacting with highly explored proteins. We build on these insights to develop a quantitative network-based model of drug discovery. We demonstrate that the model is able to accurately recreate the exploration patterns observed in clinical trials. Most importantly, we show that a network-based search strategy can widen the scope of drug discovery by guiding exploration to novel proteins that are part of under explored regions in the human interactome.
△ Less
Submitted 25 January, 2023;
originally announced January 2023.
-
Beyond Information Exchange: An Approach to Deploy Network Properties for Information Diffusion
Authors:
Soumita Das,
Anupam Biswas,
Ravi Kishore Devarapalli
Abstract:
Information diffusion in Online Social Networks is a new and crucial problem in social network analysis field and requires significant research attention. Efficient diffusion of information are of critical importance in diverse situations such as; pandemic prevention, advertising, marketing etc. Although several mathematical models have been developed till date, but previous works lacked systemati…
▽ More
Information diffusion in Online Social Networks is a new and crucial problem in social network analysis field and requires significant research attention. Efficient diffusion of information are of critical importance in diverse situations such as; pandemic prevention, advertising, marketing etc. Although several mathematical models have been developed till date, but previous works lacked systematic analysis and exploration of the influence of neighborhood for information diffusion. In this paper, we have proposed Common Neighborhood Strategy (CNS) algorithm for information diffusion that demonstrates the role of common neighborhood in information propagation throughout the network. The performance of CNS algorithm is evaluated on several real-world datasets in terms of diffusion speed and diffusion outspread and compared with several widely used information diffusion models. Empirical results show CNS algorithm enables better information diffusion both in terms of diffusion speed and diffusion outspread.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
Latent Diffusion for Language Generation
Authors:
Justin Lovelace,
Varsha Kishore,
Chao Wan,
Eliot Shekhtman,
Kilian Q. Weinberger
Abstract:
Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented diffusion as an alternative to existing pretrained language models. We view diffusion and existing language models as complementary. We demonstrate that enc…
▽ More
Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented diffusion as an alternative to existing pretrained language models. We view diffusion and existing language models as complementary. We demonstrate that encoder-decoder language models can be utilized to efficiently learn high-quality language autoencoders. We then demonstrate that continuous diffusion models can be learned in the latent space of the language autoencoder, enabling us to sample continuous latent representations that can be decoded into natural language with the pretrained decoder. We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation. We demonstrate across multiple diverse data sets that our latent language diffusion models are significantly more effective than previous diffusion language models.
△ Less
Submitted 7 November, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Caching Contents with Varying Popularity using Restless Bandits
Authors:
Pavamana K J,
Chandramani Kishore Singh
Abstract:
Mobile networks are experiencing prodigious increase in data volume and user density , which exerts a great burden on mobile core networks and backhaul links. An efficient technique to lessen this problem is to use caching i.e. to bring the data closer to the users by making use of the caches of edge network nodes, such as fixed or mobile access points and even user devices. The performance of a c…
▽ More
Mobile networks are experiencing prodigious increase in data volume and user density , which exerts a great burden on mobile core networks and backhaul links. An efficient technique to lessen this problem is to use caching i.e. to bring the data closer to the users by making use of the caches of edge network nodes, such as fixed or mobile access points and even user devices. The performance of a caching depends on contents that are cached. In this paper, we examine the problem of content caching at the wireless edge(i.e. base stations) to minimize the discounted cost incurred over infinite horizon. We formulate this problem as a restless bandit problem, which is hard to solve. We begin by showing an optimal policy is of threshold type. Using these structural results, we prove the indexability of the problem, and use Whittle index policy to minimize the discounted cost.
△ Less
Submitted 20 June, 2023; v1 submitted 31 October, 2022;
originally announced December 2022.
-
Automating Vascular Shunt Insertion with the dVRK Surgical Robot
Authors:
Karthik Dharmarajan,
Will Panitch,
Muyan Jiang,
Kishore Srinivas,
Baiyu Shi,
Yahav Avigal,
Huang Huang,
Thomas Low,
Danyal Fer,
Ken Goldberg
Abstract:
Vascular shunt insertion is a fundamental surgical procedure used to temporarily restore blood flow to tissues. It is often performed in the field after major trauma. We formulate a problem of automated vascular shunt insertion and propose a pipeline to perform Automated Vascular Shunt Insertion (AVSI) using a da Vinci Research Kit. The pipeline uses a learned visual model to estimate the locus of…
▽ More
Vascular shunt insertion is a fundamental surgical procedure used to temporarily restore blood flow to tissues. It is often performed in the field after major trauma. We formulate a problem of automated vascular shunt insertion and propose a pipeline to perform Automated Vascular Shunt Insertion (AVSI) using a da Vinci Research Kit. The pipeline uses a learned visual model to estimate the locus of the vessel rim, plans a grasp on the rim, and moves to grasp at that point. The first robot gripper then pulls the rim to stretch open the vessel with a dilation motion. The second robot gripper then proceeds to insert a shunt into the vessel phantom (a model of the blood vessel) with a chamfer tilt followed by a screw motion. Results suggest that AVSI achieves a high success rate even with tight tolerances and varying vessel orientations up to 30°. Supplementary material, dataset, videos, and visualizations can be found at https://sites.google.com/berkeley.edu/autolab-avsi.
△ Less
Submitted 8 March, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
A Non-iterative Spatio-temporal Multi-task Assignments based Collision-free Trajectories for Music Playing Robots
Authors:
Shridhar Velhal,
Krishna Kishore VS,
Suresh Sundaram
Abstract:
In this paper, a non-iterative spatio-temporal multi-task assignment approach is used for playing piano music by a team of robots. This paper considers the piano playing problem, in which an algorithm needs to compute the trajectories for a dynamically sized team of robots who will play the musical notes by traveling through the specific locations associated with musical notes at their respective…
▽ More
In this paper, a non-iterative spatio-temporal multi-task assignment approach is used for playing piano music by a team of robots. This paper considers the piano playing problem, in which an algorithm needs to compute the trajectories for a dynamically sized team of robots who will play the musical notes by traveling through the specific locations associated with musical notes at their respective specific times. A two-step dynamic resource allocation based on a spatio-temporal multi-task assignment problem (DREAM), has been implemented to assign robots for playing the musical tune. The algorithm computes the required number of robots to play the music in the first step. In the second step, optimal assignments are computed for the updated team of robots, which minimizes the total distance traveled by the team. Even for the individual feasible trajectories, the multi-robot execution may fail if robots encounter a collision. As some time will be utilized for this conflict resolution, robots may not be able to reach the desired location on time. This paper analyses and proves that, if robots are operating in a convex region, the solution of the DREAM approach provides collision-free trajectories. The working of the DREAM approach has been illustrated with the help of the high fidelity simulations in Gazebo operated using ROS2. The result clearly shows that the DREAM approach computes the required number of robots and assigns multiple tasks to robots in at most two steps. The simulation of the robots playing music, using computed assignments, is demonstrated in the attached video. video link: \url{https://youtu.be/XToicNm-CO8}
△ Less
Submitted 17 February, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Learning to Efficiently Plan Robust Frictional Multi-Object Grasps
Authors:
Wisdom C. Agboh,
Satvik Sharma,
Kishore Srinivas,
Mallika Parulekar,
Gaurav Datta,
Tianshuang Qiu,
Jeffrey Ichnowski,
Eugen Solowjow,
Mehmet Dogar,
Ken Goldberg
Abstract:
We consider a decluttering problem where multiple rigid convex polygonal objects rest in randomly placed positions and orientations on a planar surface and must be efficiently transported to a packing box using both single and multi-object grasps. Prior work considered frictionless multi-object gras**. In this paper, we introduce friction to increase the number of potential grasps for a given gr…
▽ More
We consider a decluttering problem where multiple rigid convex polygonal objects rest in randomly placed positions and orientations on a planar surface and must be efficiently transported to a packing box using both single and multi-object grasps. Prior work considered frictionless multi-object gras**. In this paper, we introduce friction to increase the number of potential grasps for a given group of objects, and thus increase picks per hour. We train a neural network using real examples to plan robust multi-object grasps. In physical experiments, we find a 13.7% increase in success rate, a 1.6x increase in picks per hour, and a 6.3x decrease in grasp planning time compared to prior work on multi-object gras**. Compared to single-object gras**, we find a 3.1x increase in picks per hour.
△ Less
Submitted 2 August, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Self-Attention Message Passing for Contrastive Few-Shot Learning
Authors:
Ojas Kishorkumar Shirekar,
Anuj Singh,
Hadi Jamali-Rad
Abstract:
Humans have a unique ability to learn new representations from just a handful of examples with little to no supervision. Deep learning models, however, require an abundance of data and supervision to perform at a satisfactory level. Unsupervised few-shot learning (U-FSL) is the pursuit of bridging this gap between machines and humans. Inspired by the capacity of graph neural networks (GNNs) in dis…
▽ More
Humans have a unique ability to learn new representations from just a handful of examples with little to no supervision. Deep learning models, however, require an abundance of data and supervision to perform at a satisfactory level. Unsupervised few-shot learning (U-FSL) is the pursuit of bridging this gap between machines and humans. Inspired by the capacity of graph neural networks (GNNs) in discovering complex inter-sample relationships, we propose a novel self-attention based message passing contrastive learning approach (coined as SAMP-CLR) for U-FSL pre-training. We also propose an optimal transport (OT) based fine-tuning strategy (we call OpT-Tune) to efficiently induce task awareness into our novel end-to-end unsupervised few-shot classification framework (SAMPTransfer). Our extensive experimental results corroborate the efficacy of SAMPTransfer in a variety of downstream few-shot classification scenarios, setting a new state-of-the-art for U-FSL on both miniImagenet and tieredImagenet benchmarks, offering up to 7%+ and 5%+ improvements, respectively. Our further investigations also confirm that SAMPTransfer remains on-par with some supervised baselines on miniImagenet and outperforms all existing U-FSL baselines in a challenging cross-domain scenario. Our code can be found in our GitHub repository at https://github.com/ojss/SAMPTransfer/.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.