Showing 1–19 of 19 results for author: Dick, R P

  1. arXiv:2405.07527

    cs.LG cs.AI

    Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

    Authors: Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Despite their prevalence in deep-learning communities, over-parameterized models convey high demands of computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-atten…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2023

  2. arXiv:2311.18251

    cs.HC

    Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground

    Authors: Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to driv…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 36 pages, 25 figures, Under review at ACM IMWUT

  3. arXiv:2311.12997

    cs.LG

    Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

    Authors: Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka

    Abstract: Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing basic arithmetic. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we train autoregressive Transformer models on…

    Submitted 5 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  4. arXiv:2311.12786

    cs.LG

    Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

    Authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

    Abstract: Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely nov…

    Submitted 21 November, 2023; originally announced November 2023.

  5. arXiv:2310.17639

    cs.AI cs.CL cs.LG

    In-Context Learning Dynamics with Random Binary Sequences

    Authors: Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman

    Abstract: Large language models (LLMs) trained on huge corpora of text datasets demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for. The precise nature of LLM capabilities is often mysterious, and different prompts can elicit different capabilities through in-context learning. We propose a framework that enables us to analyze in-context l…

    Submitted 15 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  6. arXiv:2310.09336

    cs.LG

    Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

    Authors: Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

    Abstract: Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion model…

    Submitted 16 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS)

  7. arXiv:2211.08422

    cs.LG cs.CV

    Mechanistic Mode Connectivity

    Authors: Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

    Abstract: We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. Specifically, we ask the following question: are minimizers that rely on different mechanisms for making their predictions connected via simple paths of low loss? We provide a definition of…

    Submitted 1 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: ICML, 2023

  8. arXiv:2205.11506

    cs.LG cs.CV

    Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering

    Authors: Ekdeep Singh Lubana, Chi Ian Tang, Fahim Kawsar, Robert P. Dick, Akhil Mathur

    Abstract: Federated learning is generally used in tasks where labels are readily available (e.g., next word prediction). Relaxing this constraint requires design of unsupervised learning techniques that can support desirable properties for federated training: robustness to statistical/systems heterogeneity, scalability with number of participants, and communication efficiency. Prior work on this topic has f…

    Submitted 11 June, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Camera-ready ICML, 2022

  9. Do Smart Glasses Dream of Sentimental Visions? Deep Emotionship Analysis for Eyewear Devices

    Authors: Yingying Zhao, Yuhu Chang, Yutian Lu, Yujiang Wang, Mingzhi Dong, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Emotion recognition in smart eyewear devices is highly valuable but challenging. One key limitation of previous works is that the expression-related information like facial or eye images is considered as the only emotional evidence. However, emotional status is not isolated; it is tightly associated with people's visual perceptions, especially those sentimental ones. However, little work has exami…

    Submitted 19 April, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: The EMO-Film dataset is available at: https://github.com/MemX-Research/EMOShip

    Journal ref: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 6, Issue 1, Article 38. March 2022

  10. arXiv:2106.05956

    cs.LG cs.CV

    Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

    Authors: Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

    Abstract: Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative normalization layers, these properties need to be generalized so that any given layer's success/failure can be accurately predicted. In this work, we take a first…

    Submitted 26 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted at NeurIPS 2021

  11. arXiv:2105.00916

    cs.CV cs.HC

    MemX: An Attention-Aware Smart Eyewear System for Personalized Moment Auto-capture

    Authors: Yuhu Chang, Yingying Zhao, Mingzhi Dong, Yujiang Wang, Yutian Lu, Qin Lv, Robert P. Dick, Tun Lu, Ning Gu, Li Shang

    Abstract: This work presents MemX: a biologically-inspired attention-aware eyewear system developed with the goal of pursuing the long-awaited vision of a personalized visual Memex. MemX captures human visual attention on the fly, analyzes the salient visual content, and records moments of personal interest in the form of compact video snippets. Accurate attentive scene detection and analysis on resource-co…

    Submitted 9 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)

    Journal ref: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Volume 5 Issue 2, Article 56. June 2021

  12. A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline

    Authors: Yingying Zhao, Mingzhi Dong, Yujiang Wang, Da Feng, Qin Lv, Robert P. Dick, Dongsheng Li, Tun Lu, Ning Gu, Li Shang

    Abstract: Deep-learning-based video processing has yielded transformative results in recent years. However, the video analytics pipeline is energy-intensive due to high data rates and reliance on complex inference algorithms, which limits its adoption in energy-constrained applications. Motivated by the observation of high and variable spatial redundancy and temporal dynamics in video data streams, we desig…

    Submitted 2 May, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: IEEE Transactions on Multimedia

  13. HVAQ: A High-Resolution Vision-Based Air Quality Dataset

    Authors: Zuohui Chen, Tony Zhang, Zhuangzhi Chen, Yun Xiang, Qi Xuan, Robert P. Dick

    Abstract: Air pollutants, such as particulate matter, negatively impact human health. Most existing pollution monitoring techniques use stationary sensors, which are typically sparsely deployed. However, real-world pollution distributions vary rapidly with position and the visual effects of air pollution can be used to estimate concentration, potentially at high spatial resolution. Accurate pollution monito…

    Submitted 16 October, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  14. arXiv:2102.02805

    cs.LG

    How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation

    Authors: Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, Robert P. Dick

    Abstract: Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. This paper has the goal of better explaining a popularly used technique for avoiding catastrophic forgetting: quadratic regula…

    Submitted 12 August, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: Camera-ready for Conference on Lifelong Learning Agents (CoLLAs), 2022

  15. arXiv:2009.11839

    cs.LG cs.CV stat.ML

    A Gradient Flow Framework For Analyzing Network Pruning

    Authors: Ekdeep Singh Lubana, Robert P. Dick

    Abstract: Recent network pruning methods focus on pruning models early-on in training. To estimate the impact of removing a parameter, these methods use importance measures that were originally designed to prune trained models. Despite lacking justification for their use early-on in training, such measures result in surprisingly low accuracy loss. To better explain this behavior, we develop a general framew…

    Submitted 23 September, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

    Comments: Accepted at ICLR, 2021

  16. arXiv:2009.05014

    cs.CV cs.LG

    OrthoReg: Robust Network Pruning Using Orthonormality Regularization

    Authors: Ekdeep Singh Lubana, Puja Trivedi, Conrad Hougen, Robert P. Dick, Alfred O. Hero

    Abstract: Network pruning in Convolutional Neural Networks (CNNs) has been extensively investigated in recent years. To determine the impact of pruning a group of filters on a network's accuracy, state-of-the-art pruning methods consistently assume filters of a CNN are independent. This allows the importance of a group of filters to be estimated as the sum of importances of individual filters. However, over…

    Submitted 10 September, 2020; originally announced September 2020.

  17. arXiv:1403.5871

    cs.CR

    The Mason Test: A Defense Against Sybil Attacks in Wireless Networks Without Trusted Authorities

    Authors: Yue Liu, David R. Bild, Robert P. Dick, Z. Morley Mao, Dan S. Wallach

    Abstract: Wireless networks are vulnerable to Sybil attacks, in which a malicious node poses as many identities in order to gain disproportionate influence. Many defenses based on spatial variability of wireless channels exist, but depend either on detailed, multi-tap channel estimation - something not exposed on commodity 802.11 devices - or valid RSSI observations from multiple trusted sources, e.g., corp…

    Submitted 24 March, 2014; originally announced March 2014.

  18. arXiv:1403.4677

    cs.NI

    Performance Analysis of Location Profile Routing

    Authors: David R. Bild, Yue Liu, Robert P. Dick, Z. Morley Mao, Dan S. Wallach

    Abstract: We propose using the predictability of human motion to eliminate the overhead of distributed location services in human-carried MANETs, dubbing the technique location profile routing. This method outperforms the Geographic Hashing Location Service when nodes change locations 2x more frequently than they initiate connections (e.g., start new TCP streams), as in applications like text- and instant-m…

    Submitted 18 March, 2014; originally announced March 2014.

  19. arXiv:1402.2671

    cs.SI physics.soc-ph

    Aggregate Characterization of User Behavior in Twitter and Analysis of the Retweet Graph

    Authors: David R. Bild, Yue Liu, Robert P. Dick, Z. Morley Mao, Dan S. Wallach

    Abstract: Most previous analysis of Twitter user behavior is focused on individual information cascades and the social followers graph. We instead study aggregate user behavior and the retweet graph with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power law hazard function, the tweet rate distribution, although asymptotical…

    Submitted 11 February, 2014; originally announced February 2014.

    Comments: 17 pages, 21 figures

    Journal ref: ACM Trans. Internet Technol. 15, 1, Article 4 (February 2015), 24 pages