Search | arXiv e-print repository

Characterizing Stereotypical Bias from Privacy-preserving Pre-Training

Authors: Stefan Arnold, Rene Gröbner, Annika Schreiner

Abstract: Differential Privacy (DP) can be applied to raw text by exploiting the spatial arrangement of words in an embedding space. We investigate the implications of such text privatization on Language Models (LMs) and their tendency towards stereotypical associations. Since previous studies documented that linguistic proficiency correlates with stereotypical bias, one could assume that techniques for text privatization, which are known to degrade language modeling capabilities, would cancel out undesirable biases. By testing BERT models trained on texts containing biased statements primed with varying degrees of privacy, our study reveals that while stereotypical bias generally diminishes when privacy is tightened, text privatization does not uniformly equate to diminishing bias across all social domains. This highlights the need for careful diagnosis of bias in LMs that undergo text privatization.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.18620 [pdf, other]

Documentation Practices of Artificial Intelligence

Authors: Stefan Arnold, Dilara Yesilbas, Rene Gröbner, Dominik Riedelbauch, Maik Horn, Sven Weinzierl

Abstract: Artificial Intelligence (AI) faces persistent challenges in terms of transparency and accountability, which requires rigorous documentation. Through a literature review on documentation practices, we provide an overview of prevailing trends, persistent issues, and the multifaceted interplay of factors influencing the documentation. Our examination of key characteristics such as scope, target audiences, support for multimodality, and level of automation, highlights a dynamic evolution in documentation practices, underscored by a shift towards a more holistic, engaging, and automated documentation.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.13121 [pdf, other]

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

arXiv:2404.12631 [pdf]

Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

Authors: Solvi Arnold, Reiji Suzuki, Takaya Arita, Kimitoshi Yamazaki

Abstract: Advanced biological intelligence learns efficiently from an information-rich stream of stimulus information, even when feedback on behaviour quality is sparse or absent. Such learning exploits implicit assumptions about task domains. We refer to such learning as Domain-Adapted Learning (DAL). In contrast, AI learning algorithms rely on explicit externally provided measures of behaviour quality to acquire fit behaviour. This imposes an information bottleneck that precludes learning from diverse non-reward stimulus information, limiting learning efficiency. We consider the question of how biological evolution circumvents this bottleneck to produce DAL. We propose that species first evolve the ability to learn from reward signals, providing inefficient (bottlenecked) but broad adaptivity. From there, integration of non-reward information into the learning process can proceed via gradual accumulation of biases induced by such information on specific task domains. This scenario provides a biologically plausible pathway towards bottleneck-free, domain-adapted learning. Focusing on the second phase of this scenario, we set up a population of NNs with reward-driven learning modelled as Reinforcement Learning (A2C), and allow evolution to improve learning efficiency by integrating non-reward information into the learning process using a neuromodulatory update mechanism. On a navigation task in continuous 2D space, evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents. Evolution is found to eliminate reliance on reward information altogether, allowing DAL agents to learn from non-reward information exclusively, using local neuromodulation-based connection weight updates only.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures

ACM Class: I.2.6

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.17937 [pdf, other]

Can an LLM-Powered Socially Assistive Robot Effectively and Safely Deliver Cognitive Behavioral Therapy? A Study With University Students

Authors: Mina J. Kian, Mingyu Zong, Katrin Fischer, Abhyuday Singh, Anna-Maria Velentza, Pau Sang, Shriya Upadhyay, Anika Gupta, Misha A. Faruki, Wallace Browning, Sebastien M. R. Arnold, Bhaskar Krishnamachari, Maja J. Mataric

Abstract: Cognitive behavioral therapy (CBT) is a widely used therapeutic method for guiding individuals toward restructuring their thinking patterns as a means of addressing anxiety, depression, and other challenges. We developed a large language model (LLM)-powered prompt-engineered socially assistive robot (SAR) that guides participants through interactive CBT at-home exercises. We evaluated the performance of the SAR through a 15-day study with 38 university students randomly assigned to interact daily with the robot or a chatbot (using the same LLM), or complete traditional CBT worksheets throughout the duration of the study. We measured weekly therapeutic outcomes, changes in pre-/post-session anxiety measures, and adherence to completing CBT exercises. We found that self-reported measures of general psychological distress significantly decreased over the study period in the robot and worksheet conditions but not the chatbot condition. Furthermore, the SAR enabled significant single-session improvements for more sessions than the other two conditions combined. Our findings suggest that SAR-guided LLM-powered CBT may be as effective as traditional worksheet methods in supporting therapeutic progress from the beginning to the end of the study and superior in decreasing user anxiety immediately after completing the CBT exercise.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2310.11363 [pdf, other]

Disentangling the Linguistic Competence of Privacy-Preserving BERT

Authors: Stefan Arnold, Nils Kemmerzell, Annika Schreiner

Abstract: Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known for degrading the performance of language models when trained on perturbed text. Employing a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text, we intend to disentangle at the linguistic level the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms, encoding localized properties of words while falling short at encoding the contextual relationships between spans of words.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.07899 [pdf, other]

RoboCLIP: One Demonstration is Enough to Learn Robot Policies

Authors: Sumedh A Sontakke, Jesse Zhang, Sébastien M. R. Arnold, Karl Pertsch, Erdem Bıyık, Dorsa Sadigh, Chelsea Finn, Laurent Itti

Abstract: Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing expert demonstrations but typically require a large number of in-domain expert demonstrations. Inspired by advances in the field of Video-and-Language Models (VLMs), we present RoboCLIP, an online imitation learning method that uses a single demonstration (overcoming the large data requirement) in the form of a video demonstration or a textual description of the task to generate rewards without manual reward function design. Additionally, RoboCLIP can also utilize out-of-domain demonstrations, like videos of humans solving the task for reward generation, circumventing the need to have the same demonstration and deployment domains. RoboCLIP utilizes pretrained VLMs without any finetuning for reward generation. Reinforcement learning agents trained with RoboCLIP rewards demonstrate 2-3 times higher zero-shot performance than competing imitation learning methods on downstream robot manipulation tasks, doing so using only one video/text demonstration.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2306.01471 [pdf, other]

Guiding Text-to-Text Privatization by Syntax

Authors: Stefan Arnold, Dilara Yesilbas, Sven Weinzierl

Abstract: Metric Differential Privacy is a generalization of differential privacy tailored to address the unique challenges of text-to-text privatization. By adding noise to the representation of words in the geometric space of embeddings, words are replaced with words located in the proximity of the noisy representation. Since embeddings are trained based on word co-occurrences, this mechanism ensures that substitutions stem from a common semantic context. Without considering the grammatical category of words, however, this mechanism cannot guarantee that substitutions play similar syntactic roles. We analyze the capability of text-to-text privatization to preserve the grammatical category of words after substitution and find that surrogate texts consist almost exclusively of nouns. Lacking the capability to produce surrogate texts that correlate with the structure of the sensitive texts, we encompass our analysis by transforming the privatization step into a candidate selection problem in which substitutions are directed to words with matching grammatical properties. We demonstrate a substantial improvement in the performance of downstream tasks by up to $4.66\%$ while retaining comparative privacy guarantees.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2306.01457 [pdf, other]

Driving Context into Text-to-Text Privatization

Authors: Stefan Arnold, Dilara Yesilbas, Sven Weinzierl

Abstract: \textit{Metric Differential Privacy} enables text-to-text privatization by adding calibrated noise to the vector of a word derived from an embedding space and projecting this noisy vector back to a discrete vocabulary using a nearest neighbor search. Since words are substituted without context, this mechanism is expected to fall short at finding substitutes for words with ambiguous meanings, such as \textit{'bank'}. To account for these ambiguous words, we leverage a sense embedding and incorporate a sense disambiguation step prior to noise injection. We encompass our modification to the privatization mechanism with an estimation of privacy and utility. For word sense disambiguation on the \textit{Words in Context} dataset, we demonstrate a substantial increase in classification accuracy by $6.05\%$.

Submitted 2 June, 2023; originally announced June 2023.
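
Editorial sketch: the mechanism summarized in the abstract above (add calibrated noise to a word vector, then map the noisy vector back to the vocabulary by nearest neighbor search) can be illustrated roughly as follows. Everything here is an assumption made for illustration: the toy two-dimensional `EMBEDDINGS` table, the function names, and the noise sampler, which uses one common construction for Euclidean metric DP (uniform direction, Gamma-distributed magnitude). It is not taken from the paper and does not include the sense disambiguation step the authors add.

```python
import math
import random

# Toy 2-D embedding table; the words and vectors are invented for illustration.
EMBEDDINGS = {
    "bank": (0.9, 0.1),
    "river": (0.8, 0.3),
    "money": (0.1, 0.9),
    "loan": (0.2, 0.8),
}


def sample_noise(dim, epsilon, rng):
    """Draw a noise vector with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction on the sphere: normalise a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude follows Gamma(shape=dim, scale=1/epsilon).
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * x / norm for x in direction]


def nearest_word(vector, embeddings):
    """Return the vocabulary word whose embedding is closest to `vector`."""
    return min(embeddings, key=lambda w: math.dist(vector, embeddings[w]))


def privatize_word(word, epsilon, embeddings=EMBEDDINGS, rng=None):
    """Perturb the word's vector and project it back to the vocabulary."""
    rng = rng or random.Random()
    vec = embeddings[word]
    noise = sample_noise(len(vec), epsilon, rng)
    noisy = [v + n for v, n in zip(vec, noise)]
    return nearest_word(noisy, embeddings)


if __name__ == "__main__":
    rng = random.Random(0)
    # Smaller epsilon means more noise, so substitutions become more likely.
    for eps in (1.0, 10.0, 100.0):
        print(eps, [privatize_word("bank", eps, rng=rng) for _ in range(5)])
```

In this sketch a smaller `epsilon` yields larger noise and therefore more frequent substitutions, while a very large `epsilon` almost always returns the original word.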

arXiv:2305.01827 [pdf, other]

Cortical analysis of heterogeneous clinical brain MRI scans for large-scale neuroimaging studies

Authors: Karthik Gopinath, Douglas N. Greve, Sudeshna Das, Steve Arnold, Colin Magdamo, Juan Eugenio Iglesias

Abstract: Surface analysis of the cortex is ubiquitous in human neuroimaging with MRI, e.g., for cortical registration, parcellation, or thickness estimation. The convoluted cortical geometry requires isotropic scans (e.g., 1mm MPRAGEs) and good gray-white matter contrast for 3D reconstruction. This precludes the analysis of most brain MRI scans acquired for clinical purposes. Analyzing such scans would enable neuroimaging studies with sample sizes that cannot be achieved with current research datasets, particularly for underrepresented populations and rare diseases. Here we present the first method for cortical reconstruction, registration, parcellation, and thickness estimation for clinical brain MRI scans of any resolution and pulse sequence. The method has a learning component and a classical optimization module. The former uses domain randomization to train a CNN that predicts an implicit representation of the white matter and pial surfaces (a signed distance function) at 1mm isotropic resolution, independently of the pulse sequence and resolution of the input. The latter uses geometry processing to place the surfaces while accurately satisfying topological and geometric constraints, thus enabling subsequent parcellation and thickness estimation with existing methods. We present results on 5mm axial FLAIR scans from ADNI and on a highly heterogeneous clinical dataset with 5,000 scans. Code and data are publicly available at https://surfer.nmr.mgh.harvard.edu/fswiki/recon-all-clinical

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2302.06009 [pdf, other]

Policy-Induced Self-Supervision Improves Representation Finetuning in Visual RL

Authors: Sébastien M. R. Arnold, Fei Sha

Abstract: We study how to transfer representations pretrained on source tasks to target tasks in visual percept based RL. We analyze two popular approaches: freezing or finetuning the pretrained representations. Empirical studies on a set of popular tasks reveal several properties of pretrained representations. First, finetuning is required even when pretrained representations perfectly capture the information required to solve the target task. Second, finetuned representations improve learnability and are more robust to noise. Third, pretrained bottom layers are task-agnostic and readily transferable to new tasks, while top layers encode task-specific information and require adaptation. Building on these insights, we propose a self-supervised objective that clusters representations according to the policy they induce, as opposed to traditional representation similarity measures which are policy-agnostic (e.g. Euclidean norm, cosine similarity). Together with freezing the bottom layers, this objective results in significantly better representation than frozen, finetuned, and self-supervised alternatives on a wide range of benchmarks.

Submitted 12 February, 2023; originally announced February 2023.

arXiv:2301.07799 [pdf, other]

doi 10.1016/j.neunet.2023.01.007

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Authors: Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, et al. (22 additional authors not shown)

Abstract: Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.

Submitted 18 January, 2023; originally announced January 2023.

Comments: To appear in Neural Networks

arXiv:2210.07436 [pdf, other]

Smart Headset, Computer Vision and Machine Learning for Efficient Prawn Farm Management

Authors: Mingze Xi, Ashfaqur Rahman, Chuong Nguyen, Stuart Arnold, John McCulloch

Abstract: Understanding the growth and distribution of the prawns is critical for optimising the feed and harvest strategies. An inadequate understanding of prawn growth can lead to reduced financial gain, for example, when crops are harvested too early. The key to maintaining a good understanding of prawn growth is frequent sampling. However, the most commonly adopted sampling practice, the cast net approach, is unable to sample the prawns at a high frequency as it is expensive and laborious. An alternative approach is to sample prawns from feed trays that farm workers inspect each day. This will allow growth data collection at a high frequency (each day). But measuring prawns manually each day is a laborious task. In this article, we propose a new approach that utilises smart glasses, depth camera, computer vision and machine learning to detect prawn distribution and growth from feed trays. A smart headset was built to allow farmers to collect prawn data while performing daily feed tray checks. A computer vision + machine learning pipeline was developed and demonstrated to detect the growth trends of prawns in 4 prawn ponds over a growing season.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Submitted to Elsevier Aquacultural Engineering

ACM Class: I.4; J.0

arXiv:2209.02032 [pdf, other]

doi 10.1073/pnas.2216399120

Robust machine learning segmentation for large-scale analysis of heterogeneous clinical brain MRI datasets

Authors: Benjamin Billot, Colin Magdamo, You Cheng, Steven E. Arnold, Sudeshna Das, Juan E. Iglesias

Abstract: Every year, millions of brain MRI scans are acquired in hospitals, which is a figure considerably larger than the size of any research dataset. Therefore, the ability to analyse such scans could transform neuroimaging research. Yet, their potential remains untapped, since no automated algorithm is robust enough to cope with the high variability in clinical acquisitions (MR contrasts, resolutions, orientations, artefacts, subject populations). Here we present SynthSeg+, an AI segmentation suite that enables, for the first time, robust analysis of heterogeneous clinical datasets. In addition to whole-brain segmentation, SynthSeg+ also performs cortical parcellation, intracranial volume estimation, and automated detection of faulty segmentations (mainly caused by scans of very low quality). We demonstrate SynthSeg+ in seven experiments, including an ageing study on 14,000 scans, where it accurately replicates atrophy patterns observed on data of much higher quality. SynthSeg+ is publicly released as a ready-to-use tool to unlock the potential of quantitative morphometry.

Submitted 4 January, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: under review, extension of MICCAI 2022 paper

arXiv:2206.10920 [pdf]

Recognising Affordances in Predicted Futures to Plan with Consideration of Non-canonical Affordance Effects

Authors: Solvi Arnold, Mami Kuroishi, Tadashi Adachi, Kimitoshi Yamazaki

Abstract: We propose a novel system for action sequence planning based on a combination of affordance recognition and a neural forward model predicting the effects of affordance execution. By performing affordance recognition on predicted futures, we avoid reliance on explicit affordance effect definitions for multi-step planning. Because the system learns affordance effects from experience data, the system can foresee not just the canonical effects of an affordance, but also situation-specific side-effects. This allows the system to avoid planning failures due to such non-canonical effects, and makes it possible to exploit non-canonical effects for realising a given goal. We evaluate the system in simulation, on a set of test tasks that require consideration of canonical and non-canonical affordance effects.

Submitted 22 June, 2022; originally announced June 2022.

Comments: 8 pages, 8 figures, video: http://youtu.be/4naJ5IghHcg

ACM Class: I.2.9; I.2.6

arXiv:2206.02840 [pdf, other]

Spatial Acoustic Projection for 3D Imaging Sonar Reconstruction

Authors: Sascha Arnold, Bilal Wehbe

Abstract: In this work we present a novel method for reconstructing 3D surfaces using a multi-beam imaging sonar. We integrate the intensities measured by the sonar from different viewpoints for fixed cell positions in a 3D grid. For each cell we integrate a feature vector that holds the mean intensity for a discretized range of viewpoints. Based on the feature vectors and independent sparse range measurements that act as ground truth information, we train convolutional neural networks that allow us to predict the signed distance and direction to the nearest surface for each cell. The predicted signed distances can be projected into a truncated signed distance field (TSDF) along the predicted directions. Utilizing the marching cubes algorithm, a polygon mesh can be rendered from the TSDF. Our method allows a dense 3D reconstruction from a limited set of viewpoints and was evaluated on three real-world datasets.

Submitted 6 June, 2022; originally announced June 2022.

Comments: Preprint

Journal ref: IEEE International Conference on Robotics and Automation (ICRA) 2022

arXiv:2205.06359 [pdf, other]

doi 10.1007/978-3-031-05981-0_3

Deep Learning for Prawn Farming: Forecasting and Anomaly Detection

Authors: Joel Janek Dabrowski, Ashfaqur Rahman, Andrew Hellicar, Mashud Rana, Stuart Arnold

Abstract: We present a decision support system for managing water quality in prawn ponds. The system uses various sources of data and deep learning models in a novel way to provide 24-hour forecasting and anomaly detection of water quality parameters. It provides prawn farmers with tools to proactively avoid a poor growing environment, thereby optimising growth and reducing the risk of losing stock. This is a major shift for farmers who are forced to manage ponds by reactively correcting poor water quality conditions. To our knowledge, we are the first to apply Transformer as an anomaly detection model, and the first to apply anomaly detection in general to this aquaculture problem. Our technical contributions include adapting ForecastNet for multivariate data and adapting Transformer and the Attention model to incorporate weather forecast data into their decoders. We attain an average mean absolute percentage error of 12% for dissolved oxygen forecasts and we demonstrate two anomaly detection case studies. The system is successfully running in its second year of deployment on a commercial prawn farm.

Submitted 12 May, 2022; originally announced May 2022.

Journal ref: Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13282. Springer, Cham

arXiv:2203.01969 [pdf, other]

Robust Segmentation of Brain MRI in the Wild with Hierarchical CNNs and no Retraining

Authors: Benjamin Billot, Colin Magdamo, Steven E. Arnold, Sudeshna Das, Juan E. Iglesias

Abstract: Retrospective analysis of brain MRI scans acquired in the clinic has the potential to enable neuroimaging studies with sample sizes much larger than those found in research datasets. However, analysing such clinical images "in the wild" is challenging, since subjects are scanned with highly variable protocols (MR contrast, resolution, orientation, etc.). Nevertheless, recent advances in convolutional neural networks (CNNs) and domain randomisation for image segmentation, best represented by the publicly available method SynthSeg, may enable morphometry of clinical MRI at scale. In this work, we first evaluate SynthSeg on an uncurated, heterogeneous dataset of more than 10,000 scans acquired at Massachusetts General Hospital. We show that SynthSeg is generally robust, but frequently falters on scans with low signal-to-noise ratio or poor tissue contrast. Next, we propose SynthSeg+, a novel method that greatly mitigates these problems using a hierarchy of conditional segmentation and denoising CNNs. We show that this method is considerably more robust than SynthSeg, while also outperforming cascaded networks and state-of-the-art segmentation denoising methods. Finally, we apply our approach to a proof-of-concept volumetric study of ageing, where it closely replicates atrophy patterns observed in research studies conducted on high-quality, 1mm, T1-weighted scans. The code and trained model are publicly available at https://github.com/BBillot/SynthSeg.

Submitted 4 January, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: MICCAI 2022

arXiv:2202.07808 [pdf, other]

Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

Authors: Sebastien M. R. Arnold, Pierre L'Ecuyer, Liyu Chen, Yi-fan Chen, Fei Sha

Abstract: Reinforcement learning constantly deals with hard integrals, for example when computing expectations in policy evaluation and policy iteration. These integrals are rarely analytically solvable and typically estimated with the Monte Carlo method, which induces high variance in policy values and gradients. In this work, we propose to replace Monte Carlo samples with low-discrepancy point sets. We combine policy gradient methods with Randomized Quasi-Monte Carlo, yielding variance-reduced formulations of policy gradient and actor-critic algorithms. These formulations are effective for policy evaluation and policy improvement, as they outperform state-of-the-art algorithms on standardized continuous control benchmarks. Our empirical analyses validate the intuition that replacing Monte Carlo with Quasi-Monte Carlo yields significantly more accurate gradient estimates.

Submitted 21 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: AISTATS 2022 camera ready; more info at: http://seba1511.net/projects/qrl/

arXiv:2108.01662 [pdf, other]

Uniform Sampling over Episode Difficulty

Authors: Sébastien M. R. Arnold, Guneet S. Dhillon, Avinash Ravichandran, Stefano Soatto

Abstract: Episodic training is a core ingredient of few-shot learning to train models on tasks with limited labelled data. Despite its success, episodic training remains largely understudied, prompting us to ask the question: what is the best way to sample episodes? In this paper, we first propose a method to approximate episode sampling distributions based on their difficulty. Building on this method, we perform an extensive analysis and find that sampling uniformly over episode difficulty outperforms other sampling schemes, including curriculum and easy-/hard-mining. As the proposed sampling method is algorithm agnostic, we can leverage these insights to improve few-shot learning accuracies across many episodic training algorithms. We demonstrate the efficacy of our method across popular few-shot learning datasets, algorithms, network architectures, and protocols.

Submitted 15 January, 2022; v1 submitted 3 August, 2021; originally announced August 2021.

Comments: NeurIPS'21 camera ready

arXiv:2108.00775 [pdf, other]

Self-supervised Answer Retrieval on Clinical Notes

Authors: Paul Grundmann, Sebastian Arnold, Alexander Löser

Abstract: Retrieving answer passages from long documents is a complex task requiring semantic understanding of both discourse and document context. We approach this challenge specifically in a clinical scenario, where doctors retrieve cohorts of patients based on diagnoses and other latent medical aspects. We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. In addition, we contribute a novel retrieval dataset based on clinical notes to simulate this scenario on a large corpus of clinical notes. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. From our extensive evaluation on MIMIC-III and three other healthcare datasets, we report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages. This makes the model powerful especially in zero-shot scenarios where only limited training data is available.

Submitted 2 August, 2021; originally announced August 2021.

arXiv:2104.07255 [pdf, other]

Embedding Adaptation is Still Needed for Few-Shot Learning

Authors: Sébastien M. R. Arnold, Fei Sha

Abstract: Constructing new and more challenging tasksets is a fruitful methodology to analyse and understand few-shot classification methods. Unfortunately, existing approaches to building those tasksets are somewhat unsatisfactory: they either assume train and test task distributions to be identical -- which leads to overly optimistic evaluations -- or take a "worst-case" philosophy -- which typically requires additional human labor such as obtaining semantic class relationships. We propose ATG, a principled clustering method to defining train and test tasksets without additional human knowledge. ATG models train and test task distributions while requiring them to share a predefined amount of information. We empirically demonstrate the effectiveness of ATG in generating tasksets that are easier, in-between, or harder than existing benchmarks, including those that rely on semantic information. Finally, we leverage our generated tasksets to shed a new light on few-shot classification: gradient-based methods -- previously believed to underperform -- can outperform metric-based ones when transfer is most challenging.

Submitted 15 April, 2021; originally announced April 2021.

Comments: In submission

arXiv:2103.11226 [pdf, other]

Demystifying the Effects of Non-Independence in Federated Learning

Authors: Stefan Arnold, Dilara Yesilbas

Abstract: Federated Learning (FL) enables statistical models to be built on user-generated data without compromising data security and user privacy. For this reason, FL is well suited for on-device learning from mobile devices where data is abundant and highly privatized. Constrained by the temporal availability of mobile devices, only a subset of devices is accessible to participate in the iterative protocol consisting of training and aggregation. In this study, we take a step toward better understanding the effect of non-independent data distributions arising from block-cyclic sampling. By conducting extensive experiments on visual classification, we measure the effects of block-cyclic sampling (both standalone and in combination with non-balanced block distributions). Specifically, we measure the alterations induced by block-cyclic sampling from the perspective of accuracy, fairness, and convergence rate. Experimental results indicate robustness to cycling over a two-block structure, e.g., due to time zones. In contrast, drawing data samples dependently from a multi-block structure significantly degrades the performance and rate of convergence by up to 26%. Moreover, we find that this performance degeneration is further aggravated by unbalanced block distributions to a point that can no longer be adequately compensated by higher communication and more frequent synchronization.

Submitted 20 March, 2021; originally announced March 2021.

Comments: 8 pages, 7 figures

arXiv:2103.08137 [pdf]

Cloth Manipulation Planning on Basis of Mesh Representations with Incomplete Domain Knowledge and Voxel-to-Mesh Estimation

Authors: Solvi Arnold, Daisuke Tanaka, Kimitoshi Yamazaki

Abstract: We consider the problem of open-goal planning for robotic cloth manipulation. Core of our system is a neural network trained as a forward model of cloth behaviour under manipulation, with planning performed through backpropagation. We introduce a neural network-based routine for estimating mesh representations from voxel input, and perform planning in mesh format internally. We address the problem of planning with incomplete domain knowledge by means of an explicit epistemic uncertainty signal. This signal is calculated from prediction divergence between two instances of the forward model network and used to avoid epistemic uncertainty during planning. Finally, we introduce logic for handling restriction of grasp points to a discrete set of candidates, in order to accommodate graspability constraints imposed by robotic hardware. We evaluate the system's mesh estimation, prediction, and planning ability on simulated cloth for sequences of one to three manipulations. Comparative experiments confirm that planning on basis of estimated meshes improves accuracy compared to voxel-based planning, and that epistemic uncertainty avoidance improves performance under conditions of incomplete domain knowledge. Planning time cost is a few seconds. We additionally present qualitative results on robot hardware.

Submitted 12 November, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

Comments: 27 pages, 13 figures

arXiv:2008.12284 [pdf, ps, other]

learn2learn: A Library for Meta-Learning Research

Authors: Sébastien M. R. Arnold, Praateek Mahajan, Debajyoti Datta, Ian Bunner, Konstantinos Saitas Zarkias

Abstract: Meta-learning researchers face two fundamental issues in their empirical work: prototyping and reproducibility. Researchers are prone to make mistakes when prototyping new algorithms and tasks because modern meta-learning methods rely on unconventional functionalities of machine learning frameworks. In turn, reproducing existing results becomes a tedious endeavour -- a situation exacerbated by the lack of standardized implementations and benchmarks. As a result, researchers spend inordinate amounts of time on implementing software rather than understanding and developing new ideas. This manuscript introduces learn2learn, a library for meta-learning research focused on solving those prototyping and reproducibility issues. learn2learn provides low-level routines common across a wide-range of meta-learning techniques (e.g. meta-descent, meta-reinforcement learning, few-shot learning), and builds standardized interfaces to algorithms and benchmarks on top of them. In releasing learn2learn under a free and open source license, we hope to foster a community around standardized software for meta-learning research.

Submitted 27 August, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: Software available at: https://github.com/learnables/learn2learn

arXiv:2002.00835 [pdf, other]

doi 10.1145/3366423.3380208

Learning Contextualized Document Representations for Healthcare Answer Retrieval

Authors: Sebastian Arnold, Betty van Aken, Paul Grundmann, Felix A. Gers, Alexander Löser

Abstract: We present Contextual Discourse Vectors (CDV), a distributed document representation for efficient answer retrieval from long healthcare documents. Our approach is based on structured query tuples of entities and aspects from free text and medical taxonomies. Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse. We use our continuous representations to resolve queries with short latency using approximate nearest neighbor search on sentence level. We apply the CDV model for retrieving coherent answer passages from nine English public health resources from the Web, addressing both patients and medical professionals. Because there is no end-to-end training data available for all application scenarios, we train our model with self-supervised data from Wikipedia. We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking and is able to adapt to heterogeneous domains without additional fine-tuning.

Submitted 3 February, 2020; originally announced February 2020.

Comments: The Web Conference 2020 (WWW '20)

arXiv:1910.13603 [pdf, other]

When MAML Can Adapt Fast and How to Assist When It Cannot

Authors: Sébastien M. R. Arnold, Shariq Iqbal, Fei Sha

Abstract: Model-Agnostic Meta-Learning (MAML) and its variants have achieved success in meta-learning tasks on many datasets and settings. On the other hand, we have just started to understand and analyze how they are able to adapt fast to new tasks. For example, one popular hypothesis is that the algorithms learn good representations for transfer, as in multi-task learning. In this work, we contribute by providing a series of empirical and theoretical studies, and discover several interesting yet previously unknown properties of the algorithm. We find MAML adapts better with a deep architecture even if the tasks need only a shallow one (and thus, no representation learning is needed). While echoing previous findings by others that the bottom layers in deep architectures enable representation learning, we also find that upper layers enable fast adaptation by being meta-learned to perform adaptive gradient update when generalizing to new tasks. Motivated by these findings, we study several meta-optimization approaches and propose a new one for learning to optimize adaptively. Those approaches attain stronger performance in meta-learning both shallower and deeper architectures than MAML.

Submitted 24 January, 2021; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: Accepted at AISTATS 2021

arXiv:1910.01249 [pdf, other]

Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator

Authors: James A. Preiss, Sébastien M. R. Arnold, Chen-Yu Wei, Marius Kloft

Abstract: We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.

Submitted 2 October, 2019; originally announced October 2019.

Comments: Accepted at NeurIPS 2019 Workshop on Optimization Foundations for Reinforcement Learning. 7 pages + 6 pages appendix

arXiv:1906.03532 [pdf, other]

Reducing the variance in online optimization by transporting past gradients

Authors: Sébastien M. R. Arnold, Pierre-Antoine Manzagol, Reza Babanezhad, Ioannis Mitliagkas, Nicolas Le Roux

Abstract: Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.

Submitted 18 June, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

Comments: Open-source implementation available at: https://github.com/seba-1511/igt.pth

arXiv:1902.04793 [pdf, other]

SECTOR: A Neural Model for Coherent Topic Segmentation and Classification

Authors: Sebastian Arnold, Rudolf Schneider, Philippe Cudré-Mauroux, Felix A. Gers, Alexander Löser

Abstract: When searching for information, a human reader first glances over a document, spots relevant sections and then focuses on a few sentences for resolving her intention. However, the high variance of document structure complicates to identify the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available dataset with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, scored by our SECTOR LSTM model with bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 compared to state-of-the-art CNN classifiers with baseline segmentation.

Submitted 13 February, 2019; originally announced February 2019.

Comments: Author's final version, accepted for publication at TACL, 2019

arXiv:1805.08011 [pdf, other]

Robust Model-Aided Inertial Localization for Autonomous Underwater Vehicles

Authors: Sascha Arnold, Lashika Medagoda

Abstract: This paper presents a manifold based Unscented Kalman Filter that applies a novel strategy for inertial, model-aiding and Acoustic Doppler Current Profiler (ADCP) measurement incorporation. The filter is capable of observing and utilizing the Earth rotation for heading estimation with a tactical grade IMU, and utilizes information from the vehicle model during DVL drop outs. The drag and thrust model-aiding accounts for the correlated nature of vehicle model parameter error by applying them as states in the filter. ADCP-aiding provides further information for the model-aiding in the case of DVL bottom-lock loss. Additionally this work was implemented using the MTK and ROCK framework in C++, and is capable of running in real-time on computing available on the FlatFish AUV. The IMU biases are estimated in a fully coupled approach in the navigation filter. Heading convergence is shown on a real-world data set. Further experiments show that the filter is capable of consistent positioning, and data denial validates the method for DVL dropouts due to very low or high altitude scenarios.

Submitted 27 November, 2018; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2018, Accepted

arXiv:1709.05070 [pdf, other]

Shapechanger: Environments for Transfer Learning

Authors: Sébastien M. R. Arnold, Tsam Kiu Pun, Théo-Tim J. Denisart, Francisco J. Valero-Cuevas

Abstract: We present Shapechanger, a library for transfer reinforcement learning specifically designed for robotic tasks. We consider three types of knowledge transfer---from simulation to simulation, from simulation to real, and from real to real---and a wide range of tasks with continuous states and actions. Shapechanger is under active development and open-sourced at: https://github.com/seba-1511/shapechanger/.

Submitted 15 September, 2017; originally announced September 2017.

Comments: Presented at the SoCal 2017 Robotics Symposium

arXiv:1709.05069 [pdf, other]

Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

Authors: Sébastien M. R. Arnold, Chunming Wang

Abstract: We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results which underline advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients provide further information on the loss surface.

Submitted 15 September, 2017; originally announced September 2017.

Comments: ICLR17 Workshop Track

arXiv:1609.03666 [pdf, other]

A Greedy Algorithm to Cluster Specialists

Authors: Sébastien Arnold

Abstract: Several recent deep neural networks experiments leverage the generalist-specialist paradigm for classification. However, no formal study compared the performance of different clustering algorithms for class assignment. In this paper we perform such a study, suggest slight modifications to the clustering procedures, and propose a novel algorithm designed to optimize the performance of the specialist-generalist classification system. Our experiments on the CIFAR-10 and CIFAR-100 datasets allow us to investigate situations for varying number of classes on similar data. We find that our \emph{greedy pairs} clustering algorithm consistently outperforms other alternatives, while the choice of the confusion matrix has little impact on the final performance.

Submitted 12 September, 2016; originally announced September 2016.

arXiv:1608.06757 [pdf, other]

Robust Named Entity Recognition in Idiosyncratic Domains

Authors: Sebastian Arnold, Felix A. Gers, Torsten Kilias, Alexander Löser

Abstract: Named entity recognition often fails in idiosyncratic domains. That causes a problem for depending tasks, such as entity linking and relation extraction. We propose a generic and robust approach for high-recall named entity recognition. Our approach is easy to train and offers strong generalization over diverse domain-specific language, such as news documents (e.g. Reuters) or biomedical text (e.g. Medline). Our approach is based on deep contextual sequence learning and utilizes stacked bidirectional LSTM networks. Our model is trained with only few hundred labeled sentences and does not rely on further external knowledge. We report from our results F1 scores in the range of 84-94% on standard datasets.

Submitted 24 August, 2016; originally announced August 2016.

Comments: 8 pages, 1 figure

arXiv:1402.4758 [pdf]

doi 10.14445/22315381/IJETT-V8P273

On Cloud-based Oversubscription

Authors: Rachel Householder, Scott Arnold, Robert Green

Abstract: Rising trends in the number of customers turning to the cloud for their computing needs has made effective resource allocation imperative for cloud service providers. In order to maximize profits and reduce waste, providers have started to explore the role of oversubscribing cloud resources. However, the benefits of cloud-based oversubscription are not without inherent risks. This paper attempts to unveil the incentives, risks, and techniques behind oversubscription in a cloud infrastructure. Additionally, an overview of the current research that has been completed on this highly relevant topic is reviewed, and suggestions are made regarding potential avenues for future work.

Submitted 5 March, 2014; v1 submitted 19 February, 2014; originally announced February 2014.

Comments: 7 pages, 3 figures

Journal ref: International Journal of Engineering Trends and Technology(IJETT), V8(8),425-431 February 2014. ISSN:2231-5381

Showing 1–38 of 38 results for author: Arnold, S