-
Particle Filter Optimization: A Bayesian Approach for Global Stochastic Optimization
Authors:
Mostafa Eslami,
Maryam Babazadeh
Abstract:
This paper introduces a novel global optimization algorithm called Particle Filter Optimization (PFO), designed for a class of stochastic problems. PFO leverages the Bayesian inference framework of Particle Filters (PF) by integrating the optimization problem into the PF estimation process. In this context, the objective function replaces the measurement, and a customized transitional prior is developed to function as state dynamics. This dynamic replaces classic acquisition function and grants the PF a local optimization capability, facilitating its transformation towards global optimization. In PFO, the particles serve as agents in the optimization problem. Given the noisy nature of measured outputs, the Unscented Transform (UT) is utilized to estimate the true mean, thereby reducing the impact of erroneous information on particle transitions and weight updates. The algorithm is designed to minimize the introduction of unnecessary parameters and adheres to theoretically validated PF procedures, resulting in a robust heuristic algorithm supported by rigorous theoretical foundations.
Submitted 5 June, 2024;
originally announced June 2024.
-
Public Technologies Transforming Work of the Public and the Public Sector
Authors:
Seyun Kim,
Bonnie Fan,
Willa Yunqi Yang,
Jessie Ramey,
Sarah E Fox,
Haiyi Zhu,
John Zimmerman,
Motahhare Eslami
Abstract:
Technologies adopted by the public sector have transformed the work practices of employees in public agencies by creating different means of communication and decision-making. Although much of the recent research in the future of work domain has concentrated on the effects of technological advancements on public sector employees, the influence on work practices of external stakeholders engaging with this sector remains under-explored. In this paper, we focus on a digital platform called OneStop which is deployed by several building departments across the U.S. and aims to integrate various steps and services into a single point of online contact between public sector employees and the public. Drawing on semi-structured interviews with 22 stakeholders, including local business owners, experts involved in the construction process, community representatives, and building department employees, we investigate how this technology transition has impacted the work of these different stakeholders. We observe a multifaceted perspective and experience caused by the adoption of OneStop. OneStop exacerbated inequitable practices for local business owners due to a lack of face-to-face interactions with the department employees. For the public sector employees, OneStop standardized the work practices, representing the building department's priorities and values. Based on our findings, we discuss tensions around standardization, equality, and equity in technology transition, as well as design implications for equitable practices in the public sector.
Submitted 28 May, 2024;
originally announced May 2024.
-
Advancing Multimodal Medical Capabilities of Gemini
Authors:
Lin Yang,
Shawn Xu,
Andrew Sellergren,
Timo Kohlberger,
Yuchen Zhou,
Ira Ktena,
Atilla Kiraly,
Faruk Ahmed,
Farhad Hormozdiari,
Tiam Jaroensri,
Eric Wang,
Ellery Wulczyn,
Fayaz Jamil,
Theo Guidroz,
Chuck Lau,
Siyuan Qiao,
Yun Liu,
Akshay Goel,
Kendall Park,
Arnav Agharwal,
Nick George,
Yang Wang,
Ryutaro Tanno,
David G. T. Barrett,
Wei-Hung Weng
, et al. (22 additional authors not shown)
Abstract:
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.
Submitted 6 May, 2024;
originally announced May 2024.
-
Capabilities of Gemini Models in Medicine
Authors:
Khaled Saab,
Tao Tu,
Wei-Hung Weng,
Ryutaro Tanno,
David Stutz,
Ellery Wulczyn,
Fan Zhang,
Tim Strother,
Chunjong Park,
Elahe Vedadi,
Juanma Zambrano Chaves,
Szu-Yeu Hu,
Mike Schaekermann,
Aishwarya Kamath,
Yong Cheng,
David G. T. Barrett,
Cathy Cheung,
Basil Mustafa,
Anil Palepu,
Daniel McDuff,
Le Hou,
Tomer Golany,
Luyang Liu,
Jean-baptiste Alayrac,
Neil Houlsby
, et al. (42 additional authors not shown)
Abstract:
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Children's Overtrust and Shifting Perspectives of Generative AI
Authors:
Jaemarie Solyst,
Ellia Yang,
Shixian Xie,
Jessica Hammer,
Amy Ogan,
Motahhare Eslami
Abstract:
The capabilities of generative AI (genAI) have dramatically increased in recent times, and there are opportunities for children to leverage new features for personal and school-related endeavors. However, while the future of genAI is taking form, there remain potentially harmful limitations, such as generation of outputs with misinformation and bias. We ran a workshop study focused on ChatGPT to explore middle school girls' (N = 26) attitudes and reasoning about how genAI works. We focused on girls who are often disproportionately impacted by algorithmic bias. We found that: (1) middle school girls were initially overtrusting of genAI, (2) deliberate exposure to the limitations and mistakes of generative AI shifted this overtrust to disillusionment about genAI capabilities, though they were still optimistic for future possibilities of genAI, and (3) their ideas about school policy were nuanced. This work informs how children think about genAI like ChatGPT and its integration in learning settings.
Submitted 29 June, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
The Fall of an Algorithm: Characterizing the Dynamics Toward Abandonment
Authors:
Nari Johnson,
Sanika Moharana,
Christina N. Harrington,
Nazanin Andalibi,
Hoda Heidari,
Motahhare Eslami
Abstract:
As more algorithmic systems have come under scrutiny for their potential to inflict societal harms, an increasing number of organizations that hold power over harmful algorithms have chosen (or were required under the law) to abandon them. While social movements and calls to abandon harmful algorithms have emerged across application domains, little academic attention has been paid to studying abandonment as a means to mitigate algorithmic harms. In this paper, we take a first step towards conceptualizing "algorithm abandonment" as an organization's decision to stop designing, developing, or using an algorithmic system due to its (potential) harms. We conduct a thematic analysis of real-world cases of algorithm abandonment to characterize the dynamics leading to this outcome. Our analysis of 40 cases reveals that campaigns to abandon an algorithm follow a common process of six iterative phases: discovery, diagnosis, dissemination, dialogue, decision, and death, which we term the "6 D's of abandonment". In addition, we highlight key factors that facilitate (or prohibit) abandonment, which include characteristics of both the technical and social systems that the algorithm is embedded within. We discuss implications for several stakeholders, including proprietors and technologists who have the power to influence an algorithm's (dis)continued use, FAccT researchers, and policymakers.
Submitted 12 May, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
The Future of Research on Social Technologies: CCC Workshop Visioning Report
Authors:
Motahhare Eslami,
Eric Gilbert,
Sarita Schoenebeck,
Eric P. S. Baumer,
Eshwar Chandrasekharan,
Michelle De Mooy,
Karrie Karahalios,
David Karger,
Tressie McMillan Cottom,
Andrés Monroy-Hernández,
Loren Terveen,
John Wihbey
Abstract:
Social technologies are the systems, interfaces, features, infrastructures, and architectures that allow people to interact with each other online. These technologies dramatically shape the fabric of our everyday lives, from the information we consume to the people we interact with to the foundations of our culture and politics. While the benefits of social technologies are well documented, the harms, too, have cast a long shadow. To address widespread problems like harassment, disinformation, information access, and mental health concerns, we need to rethink the foundations of how social technologies are designed, sustained, and governed.
This report is based on discussions at the Computing Community Consortium Workshop, The Future of Research on Social Technologies, that was held November 2-3, 2023 in Washington, DC. The visioning workshop came together to focus on two questions. What should we know about social technologies, and what is needed to get there? The workshop brought together over 50 information and computer scientists, social scientists, communication and journalism scholars, and policy experts. We used a discussion format, with one day of guiding topics and a second day using an unconference model where participants created discussion topics. The interdisciplinary group of attendees discussed gaps in existing scholarship and the methods, resources, access, and collective effort needed to address those gaps. We also discussed approaches for translating scholarship for various audiences including citizens, funders, educators, industry professionals, and policymakers.
This report presents a synthesis of major themes during our discussions. The themes presented are not a summary of what we know already; they are an exploration of what we do not know enough about, and what we should spend more effort and investment on in the coming years.
Submitted 16 April, 2024;
originally announced April 2024.
-
Breaking Political Filter Bubbles via Social Comparison
Authors:
Nouran Soliman,
Motahhare Eslami,
Karrie Karahalios
Abstract:
Online social platforms allow users to filter out content they do not like. According to selective exposure theory, people tend to view content they agree with more to get more self-assurance. This causes people to live in ideological filter bubbles. We report on a user study that encourages users to break the political filter bubble of their Twitter feed by reading more diverse viewpoints through social comparison. The user study is conducted using political-bias analyzing and Twitter-mirroring tools to compare the political slant of what a user reads and what other Twitter users read about a topic, and in general. The results show that social comparison can have a great impact on users' reading behavior by motivating them to read viewpoints from the opposing political party.
Submitted 11 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
SCARF: Securing Chips with a Robust Framework against Fabrication-time Hardware Trojans
Authors:
Mohammad Eslami,
Tara Ghasempouri,
Samuel Pagliarini
Abstract:
The globalization of the semiconductor industry has introduced security challenges to Integrated Circuits (ICs), particularly those related to the threat of Hardware Trojans (HTs) - malicious logic that can be introduced during IC fabrication. While significant efforts are directed towards verifying the correctness and reliability of ICs, their security is often overlooked. In this paper, we propose a comprehensive approach to enhance IC security from the front-end to back-end stages of design. Initially, we outline a systematic method to transform existing verification assets into potent security checkers by repurposing verification assertions. To further improve security, we introduce an innovative technique for integrating online monitors during physical synthesis - a back-end insertion providing an additional layer of defense. Experimental results demonstrate a significant increase in security, measured by our introduced metric, Security Coverage (SC), with a marginal rise in area and power consumption, typically under 20%. The insertion of online monitors during physical synthesis enhances security metrics by up to 33.5%. This holistic approach offers a comprehensive and resilient defense mechanism across the entire spectrum of IC design.
Submitted 19 February, 2024;
originally announced February 2024.
-
The Public Algorithms Survey in Allegheny County
Authors:
Yu-Ru Lin,
Beth Schwanke,
Rosta Farzan,
Bonnie Fan,
Motahhare Eslami,
Hong Shen,
Sarah Fox
Abstract:
This survey study focuses on public opinion regarding the use of algorithmic decision-making in government sectors, specifically in Allegheny County, Pennsylvania. Algorithms are becoming increasingly prevalent in various public domains, including both routine and high-stakes government functions. Despite their growing use, public sentiment remains divided, with concerns about privacy and accuracy juxtaposed against perceptions of fairness when compared to human decision-making. In April 2021, a survey was conducted among nearly 1,500 county residents to explore their awareness, experiences, and attitudes towards these algorithms. The study highlights diverse viewpoints influenced by factors such as race, age, education, gender, income, and urban or suburban living. The results demonstrate the complexity of public sentiment towards algorithmic governance and emphasize the need for a nuanced understanding and approach in policy and implementation.
Submitted 7 December, 2023;
originally announced December 2023.
-
SALSy: Security-Aware Layout Synthesis
Authors:
Mohammad Eslami,
Tiago Perez,
Samuel Pagliarini
Abstract:
Integrated Circuits (ICs) are the target of diverse attacks during their lifetime. Fabrication-time attacks, such as the insertion of Hardware Trojans, can give an adversary access to privileged data and/or the means to corrupt the IC's internal computation. Post-fabrication attacks, where the end-user takes a malicious role, also attempt to obtain privileged information through means such as fault injection and probing. Taking these threats into account and at the same time, this paper proposes a methodology for Security-Aware Layout Synthesis (SALSy), such that ICs can be designed with security in mind in the same manner as power-performance-area (PPA) metrics are considered today, a concept known as security closure. Furthermore, the trade-offs between PPA and security are considered and a chip is fabricated in a 65nm CMOS commercial technology for validation purposes - a feature not seen in previous research on security closure. Measurements on the fabricated ICs indicate that SALSy promotes a modest increase in power in order to achieve significantly improved security metrics.
Submitted 21 August, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
-
Investigating Practices and Opportunities for Cross-functional Collaboration around AI Fairness in Industry Practice
Authors:
Wesley Hanwen Deng,
Nur Yildirim,
Monica Chang,
Motahhare Eslami,
Ken Holstein,
Michael Madaio
Abstract:
An emerging body of research indicates that ineffective cross-functional collaboration -- the interdisciplinary work done by industry practitioners across roles -- represents a major barrier to addressing issues of fairness in AI design and development. In this research, we sought to better understand practitioners' current practices and tactics to enact cross-functional collaboration for AI fairness, in order to identify opportunities to support more effective collaboration. We conducted a series of interviews and design workshops with 23 industry practitioners spanning various roles from 17 companies. We found that practitioners engaged in bridging work to overcome frictions in understanding, contextualization, and evaluation around AI fairness across roles. In addition, in organizational contexts with a lack of resources and incentives for fairness work, practitioners often piggybacked on existing requirements (e.g., for privacy assessments) and AI development norms (e.g., the use of quantitative evaluation metrics), although they worry that these tactics may be fundamentally compromised. Finally, we draw attention to the invisible labor that practitioners take on as part of this bridging and piggybacking work to enact interdisciplinary collaboration for fairness. We close by discussing opportunities for both FAccT researchers and AI practitioners to better support cross-functional collaboration for fairness in the design and development of AI systems.
Submitted 10 June, 2023;
originally announced June 2023.
-
Sequential Data-Assisted Control in Flight
Authors:
Mostafa Eslami,
Afshin Banazadeh
Abstract:
Flight dynamics involve uncertainties in parameters, aerodynamic derivatives, and engine thrust. These uncertainties can be categorized into three types: known-predictable, known-unpredictable, and unknown. While advanced control systems typically rely on high-fidelity dynamical models in dealing with known-predictable uncertainties, simplified approaches are used for the second and third categories to manage the complexities involved in synthesis and implementation. In this paper, the focus is on accurately modeling the internal dynamics, which primarily deal with parametric uncertainties. Real-time data is employed to identify uncertainties in the remaining external dynamics, including both known-unpredictable and unknown aspects. To address these uncertainties and maintain optimal performance, stability, and robustness, the authors propose a framework known as Sequential Data-Assisted Control (SDAC). This framework involves using a model-based nonlinear controller for the internal dynamics to provide the desired momentum to a data-based controller responsible for the external dynamics. By the momentum through the internal dynamics and leveraging the Koopman operator, the linear evolution of momentum is derived. This information is then utilized by the data-based controller to assign appropriate control inputs. The proposed approach establishes a novel foundation for a comprehensive analysis of maneuverability, stabilizability, and controllability. To evaluate the performance of SDAC, closed-loop simulations are conducted using the NASA Generic Transport Model (GTM). The data-based controller employs Linear Quadratic Regulator (LQR), while the model-based controller uses robust sliding mode control. A comparison is made with a pure robust nonlinear controller, demonstrating significant performance improvements, particularly in cases involving known-unpredictable uncertainties.
Submitted 20 May, 2023;
originally announced May 2023.
-
Participation and Division of Labor in User-Driven Algorithm Audits: How Do Everyday Users Work together to Surface Algorithmic Harms?
Authors:
Rena Li,
Sara Kingsley,
Chelsea Fan,
Proteeti Sinha,
Nora Wai,
Jaimie Lee,
Hong Shen,
Motahhare Eslami,
Jason Hong
Abstract:
Recent years have witnessed an interesting phenomenon in which users come together to interrogate potentially harmful algorithmic behaviors they encounter in their everyday lives. Researchers have started to develop theoretical and empirical understandings of these user driven audits, with a hope to harness the power of users in detecting harmful machine behaviors. However, little is known about user participation and their division of labor in these audits, which are essential to support these collective efforts in the future. Through collecting and analyzing 17,984 tweets from four recent cases of user driven audits, we shed light on patterns of user participation and engagement, especially with the top contributors in each case. We also identified the various roles user generated content played in these audits, including hypothesizing, data collection, amplification, contextualization, and escalation. We discuss implications for designing tools to support user driven audits and users who labor to raise awareness of algorithm bias.
Submitted 4 April, 2023;
originally announced April 2023.
-
Towards "Anytime, Anywhere" Community Learning and Engagement around the Design of Public Sector AI
Authors:
Wesley Hanwen Deng,
Motahhare Eslami,
Kenneth Holstein
Abstract:
Data-driven algorithmic and AI systems are increasingly being deployed to automate or augment decision processes across a wide range of public service settings. Yet community members are often unaware of the presence, operation, and impacts of these systems on their lives. With the shift towards algorithmic decision-making in public services, technology developers increasingly assume the role of de-facto policymakers, and opportunities for democratic participation are foreclosed. In this position paper, we articulate an early vision around the design of ubiquitous infrastructure for public learning and engagement around civic AI technologies. Building on this vision, we provide a list of questions that we hope can prompt stimulating conversations among the HCI community.
Submitted 21 April, 2023; v1 submitted 31 March, 2023;
originally announced April 2023.
-
Investigating Girls' Perspectives and Knowledge Gaps on Ethics and Fairness in Artificial Intelligence in a Lightweight Workshop
Authors:
Jaemarie Solyst,
Alexis Axon,
Angela E. B. Stewart,
Motahhare Eslami,
Amy Ogan
Abstract:
Artificial intelligence (AI) is everywhere, with many children having increased exposure to AI technologies in daily life. We aimed to understand middle school girls' (a group often excluded in tech) perceptions and knowledge gaps about AI. We created and explored the feasibility of a lightweight (less than 3 hours) educational workshop in which learners considered challenges in their lives and communities and critically considered how existing and future AI could have an impact. After the workshop, learners had nuanced perceptions of AI, understanding AI can both help and harm. We discuss design implications for creating educational experiences in AI and fairness that embolden learners.
Submitted 27 February, 2023;
originally announced February 2023.
-
Data-Assisted Control -- A Framework Development by Exploiting NASA GTM Platform
Authors:
Mostafa Eslami,
Afshin Banazadeh
Abstract:
Today's focus on expanding the capabilities of control systems, resulting from the abundance of data and computational resources, requires data-based alternatives over model-based ones. These alternatives may become the sole tool for analysis and synthesis. Nevertheless, mathematical models are available to some extent, especially for air and space vehicles. Hypothetically, data assistance would be the approach to meet the requirements in collaboration with the model. In this paper, a framework of Data-Assisted Control (DAC) for aerospace vehicles is proposed. NASA Generic Transport Model (GTM) is the platform for the study and the data supports the model-based controller in extending performance over a damage event. The framework requires real-time decisions to override the control law with the information obtained from the data, while the model-based controller does not show regular performance. The closed-loop system is shown to be stable in the transition phase between the data and the model. The fixed dynamic parameters are estimated using the Dual Unscented Kalman Filter (DUKF) and the evolution of the generalized force moments is estimated using the Koopman estimator. Simulations have shown that the purely model-based robust control leads to degradation of the closed-loop performance in case of damage, suggesting the need for data assistance.
Submitted 13 January, 2023;
originally announced January 2023.
-
Self-supervised video pretraining yields human-aligned visual representations
Authors:
Nikhil Parthasarathy,
S. M. Ali Eslami,
João Carreira,
Olivier J. Hénaff
Abstract:
Humans learn powerful representations of objects and scenes by observing how they evolve over time. Yet, outside of specific tasks that require explicit temporal understanding, static image pretraining remains the dominant paradigm for learning visual foundation models. We question this mismatch, and ask whether video pretraining can yield visual representations that bear the hallmarks of human perception: generalisation across tasks, robustness to perturbations, and consistency with human judgements. To that end we propose a novel procedure for curating videos, and develop a contrastive framework which learns from the complex transformations therein. This simple paradigm for distilling knowledge from videos, called VITO, yields general representations that far outperform prior video pretraining methods on image understanding tasks, and image pretraining methods on video understanding tasks. Moreover, VITO representations are significantly more robust to natural and synthetic deformations than image-, video-, and adversarially-trained ones. Finally, VITO's predictions are strongly aligned with human judgements, surpassing models that were specifically trained for that purpose. Together, these results suggest that video pretraining could be a simple way of learning unified, robust, and human-aligned representations of the visual world.
Submitted 25 July, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Understanding Practices, Challenges, and Opportunities for User-Engaged Algorithm Auditing in Industry Practice
Authors:
Wesley Hanwen Deng,
Bill Boyuan Guo,
Alicia DeVrio,
Hong Shen,
Motahhare Eslami,
Kenneth Holstein
Abstract:
Recent years have seen growing interest among both researchers and practitioners in user-engaged approaches to algorithm auditing, which directly engage users in detecting problematic behaviors in algorithmic systems. However, we know little about industry practitioners' current practices and challenges around user-engaged auditing, nor what opportunities exist for them to better leverage such approaches in practice. To investigate, we conducted a series of interviews and iterative co-design activities with practitioners who employ user-engaged auditing approaches in their work. Our findings reveal several challenges practitioners face in appropriately recruiting and incentivizing user auditors, scaffolding user audits, and deriving actionable insights from user-engaged audit reports. Furthermore, practitioners shared organizational obstacles to user-engaged auditing, surfacing a complex relationship between practitioners and user auditors. Based on these findings, we discuss opportunities for future HCI research to help realize the potential (and mitigate the risks) of user-engaged auditing in industry practice.
Submitted 21 February, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Affective Medical Estimation and Decision Making via Visualized Learning and Deep Learning
Authors:
Mohammad Eslami,
Solale Tabarestani,
Ehsan Adeli,
Glyn Elwyn,
Tobias Elze,
Mengyu Wang,
Nazlee Zebardast,
Nassir Navab,
Malek Adjouadi
Abstract:
Sophisticated machine learning (ML) techniques have yielded promising results, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process. Since visualization is such an effective tool for human comprehension, memorization, and judgment, we present a first-of-its-kind estimation approach, which we refer to as Visualized Learning for Machine Learning (VL4ML), that not only can assist physicians and clinicians in making reasoned medical decisions, but also allows them to appreciate the uncertainty visualization, which could raise incertitude in making the appropriate classification or prediction. For the proof of concept, and to demonstrate the generalized nature of this visualized estimation approach, five different case studies are examined for different types of tasks including classification, regression, and longitudinal prediction. A survey analysis with more than 100 individuals is also conducted to assess users' feedback on this visualized estimation method. The experiments and the survey demonstrate the practical merits of VL4ML, which include: (1) visually appreciating clinical/medical estimations; (2) getting closer to the patients' preferences; (3) improving doctor-patient communication; and (4) visualizing the uncertainty introduced through the black box effect of the deployed ML algorithm. All source code is shared via a GitHub repository.
Submitted 9 May, 2022;
originally announced May 2022.
-
A Novel Service Deployment Policy in Fog Computing Considering The Degree of Availability and Fog Landscape Utilization Using Multiobjective Evolutionary Algorithms
Authors:
Maryam Eslami,
Mehdi Sakhaei
Abstract:
Fog computing is a promising paradigm for real-time and mission-critical Internet of Things (IoT) applications. Given the high distribution, heterogeneity, and limitation of fog resources, applications should be placed in a distributed manner to fully utilize these resources. In this paper, we propose a linear formulation for assuring the different availability requirements of application services while maximizing the utilization of fog resources. We also compare three multiobjective evolutionary algorithms, namely MOPSO, NSGA-II, and MOEA/D, for a trade-off between the mentioned optimization goals. The evaluation results in the iFogSim simulator demonstrate the efficiency of all three algorithms and a generally better behavior of the MOPSO algorithm in terms of obtained objective values, application deadline satisfaction, and execution time.
Submitted 4 February, 2022;
originally announced February 2022.
-
From data to functa: Your data point is a function and you can treat it like one
Authors:
Emilien Dupont,
Hyunjik Kim,
S. M. Ali Eslami,
Danilo Rezende,
Dan Rosenbaum
Abstract:
It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output the appropriate measurement value for any input spatial location. In this paper, we take this idea to its next level: what would it take to perform deep learning on these functions instead, treating them as data? In this context we refer to the data as functa, and propose a framework for deep learning on functa. This view presents a number of challenges around efficient conversion from data to functa, compact representation of functa, and effectively solving downstream tasks on functa. We outline a recipe to overcome these challenges and apply it to a wide range of data modalities including images, 3D shapes, neural radiance fields (NeRF) and data on manifolds. We demonstrate that this approach has various compelling properties across data modalities, in particular on the canonical tasks of generative modeling, data imputation, novel view synthesis and classification. Code: https://github.com/deepmind/functa
Submitted 10 November, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Reusing Verification Assertions as Security Checkers for Hardware Trojan Detection
Authors:
Mohammad Eslami,
Tara Ghasempouri,
Samuel Pagliarini
Abstract:
Globalization in the semiconductor industry enables fabless design houses to reduce their costs, save time, and make use of newer technologies. However, the offshoring of Integrated Circuit (IC) fabrication has negative sides, including threats such as Hardware Trojans (HTs) - a type of malicious logic that is not trivial to detect. One aspect of IC design that is not affected by globalization is the need for thorough verification. Verification engineers devise complex assets to make sure designs are bug-free, including assertions. This knowledge is typically not reused once verification is over. The premise of this paper is that verification assets that already exist can be turned into effective security checkers for HT detection. For this purpose, we show how assertions can be used as online monitors. To this end, we propose a security metric and an assertion selection flow that leverages Cadence JasperGold Security Path Verification (SPV). The experimental results show that our approach scales for industry-size circuits by analyzing more than 100 assertions for different Intellectual Properties (IPs) of the OpenTitan System-on-Chip (SoC). Moreover, our detection solution is pragmatic since it does not rely on the HT activation mechanism.
Submitted 30 January, 2023; v1 submitted 4 January, 2022;
originally announced January 2022.
-
Extreme events in a broad-area semiconductor laser with coherent injection
Authors:
Cristina Rimoldi,
Mansour Eslami,
Franco Prati,
Giovanna Tissoni
Abstract:
Spatiotemporal extreme events are interesting phenomena, both from a fundamental point of view, as manifestations of complexity in dynamical systems, and for their possible applications in different research fields. Here, we present some recent results about extreme events in spatially extended semiconductor laser systems (broad-area VCSELs) with coherent injection. We study the statistics of spatiotemporal intensity peaks occurring in the transverse (x,y) section of the field perpendicular to the light propagation direction and identify regions in the parameter space where extreme events are more likely to occur. Searching for precursors of these phenomena, we concentrate, on one hand, on the spatiotemporal dynamics of the field phase and in particular on the presence of optical vortices in the vicinity of an extreme event. On the other hand, we focus on the laser gain dynamics and the phase space trajectories of the system close to the occurrence of an extreme event. Both these complementary approaches are successful and allow us to shed some light on potential prediction strategies.
Submitted 19 October, 2021;
originally announced October 2021.
-
Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images
Authors:
Farzin Soleymani,
Mohammad Eslami,
Tobias Elze,
Bernd Bischl,
Mina Rezaei
Abstract:
We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder, and optimizes cluster label assignment. Here, the learned multivariate Gaussian posterior captures the latent distribution of a large set of unlabeled images. Then, we perform unsupervised clustering on top of the variational latent space using a clustering loss. In this approach, the probabilistic decoder helps to prevent the distortion of data points in the latent space and to preserve the local structure of the data-generating distribution. The training process can be considered as a self-training process to refine the latent space and simultaneously optimize cluster assignments iteratively. We evaluated our proposed framework on three public datasets that represented different medical imaging modalities. Our experimental results show that our proposed framework generalizes better across different datasets. It achieves compelling results on several medical imaging benchmarks. Thus, our approach offers potential advantages over conventional deep unsupervised learning in real-world applications. The source code of the method and all the experiments are available publicly at: https://github.com/csfarzin/DVC
Submitted 22 September, 2021;
originally announced September 2021.
-
2D optical rogue waves affected by transverse carrier diffusion in broad-area semiconductor lasers with a saturable absorber
Authors:
Kamel Talouneh,
Reza Kheradmand,
Giovanna Tissoni,
Mansour Eslami
Abstract:
Statistics and dynamics of 2D rogue waves in a broad-area semiconductor laser with an intracavity saturable absorber are numerically investigated under the effect of transverse carrier diffusion. We show that lateral diffusion of carriers alters the statistics of rogue waves by enhancing their formation at smaller ratios of carrier lifetimes in the active and passive materials while suppressing them when the ratio is larger. The temporal dynamics of the emitted rogue waves is also studied, and we show that a finite nonzero transverse carrier diffusion coefficient gives them a longer duration. To further approach the realistic experimental situation, we also investigated statistics and dynamics of rogue waves by simulating a circular disk-shape pump which replaces the flat pump profile typically used in numerical simulations of broad-area lasers. We show that the finite pump shape reduces the number of emitted rogue waves per unit area for large carrier lifetime ratios and increases it for smaller values of the ratio, both below and above the laser threshold. The temporal width of the emitted rogue waves is also shown to decrease as a consequence of removing the nonphysical effects of an infinite flat pump on carrier dynamics.
Submitted 20 August, 2021;
originally announced August 2021.
-
Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs
Authors:
Dan Rosenbaum,
Marta Garnelo,
Michal Zielinski,
Charlie Beattie,
Ellen Clancy,
Andrea Huber,
Pushmeet Kohli,
Andrew W. Senior,
John Jumper,
Carl Doersch,
S. M. Ali Eslami,
Olaf Ronneberger,
Jonas Adler
Abstract:
Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to play a key role in protein function. Recent methods for capturing conformational heterogeneity in cryo-EM data model it in volume space, making recovery of continuous atomic structures challenging. Here we present a fully deep-learning-based approach using variational auto-encoders (VAEs) to recover a continuous distribution of atomic protein structures and poses directly from picked particle images and demonstrate its efficacy on realistic simulated data. We hope that methods built on this work will allow incorporation of stronger prior information about protein structure and enable better understanding of non-rigid protein structures.
Submitted 26 June, 2021;
originally announced June 2021.
-
Multimodal Few-Shot Learning with Frozen Language Models
Authors:
Maria Tsimpoukelli,
Jacob Menick,
Serkan Cabi,
S. M. Ali Eslami,
Oriol Vinyals,
Felix Hill
Abstract:
When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption. The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples, represented as a sequence of multiple interleaved image and text embeddings. We demonstrate that it can rapidly learn words for new objects and novel visual categories, do visual question-answering with only a handful of examples, and make use of outside knowledge, by measuring a single model on a variety of established and new benchmarks.
Submitted 3 July, 2021; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Extracting Global Dynamics of Loss Landscape in Deep Learning Models
Authors:
Mohammed Eslami,
Hamed Eramian,
Marcio Gameiro,
William Kalies,
Konstantin Mischaikow
Abstract:
Deep learning models evolve through training to learn the manifold in which the data exists to satisfy an objective. It is well known that evolution leads to different final states which produce inconsistent predictions of the same test data points. This calls for techniques to be able to empirically quantify the difference in the trajectories and highlight problematic regions. While much focus is placed on discovering what models learn, the question of how a model learns is less studied beyond theoretical landscape characterizations and local geometric approximations near optimal conditions. Here, we present a toolkit for the Dynamical Organization Of Deep Learning Loss Landscapes, or DOODL3. DOODL3 formulates the training of neural networks as a dynamical system, analyzes the learning process, and presents an interpretable global view of trajectories in the loss landscape. Our approach uses the coarseness of topology to capture the granularity of geometry to mitigate against states of instability or elongated training. Overall, our analysis presents an empirical framework to extract the global dynamics of a model and to use that information to guide the training of neural networks.
Submitted 14 June, 2021;
originally announced June 2021.
-
Feasibility Assessment of Multitasking in MRI Neuroimaging Analysis: Tissue Segmentation, Cross-Modality Conversion and Bias correction
Authors:
Mohammad Eslami,
Solale Tabarestani,
Malek Adjouadi
Abstract:
Neuroimaging is essential in brain studies for the diagnosis and identification of disease, structure, and function of the brain in its healthy and disease states. Literature shows that there are advantages of multitasking with some deep learning (DL) schemes in challenging neuroimaging applications. This study examines the feasibility of using multitasking in three different applications, including tissue segmentation, cross-modality conversion, and bias-field correction. These applications reflect five different scenarios in which multitasking is explored, and 280 training and testing sessions are conducted for empirical evaluations. Two well-known networks are implemented: U-Net, a convolutional neural network architecture, and a closed architecture based on the conditional generative adversarial network. Different metrics such as the normalized cross-correlation coefficient and Dice scores are used for comparison of methods and results of the different experiments. Statistical analysis is also provided by paired t-test. The present study explores the pros and cons of these methods and their practical impacts on multitasking in different implementation scenarios. This investigation shows that bias correction and cross-modality conversion applications are significantly easier than the segmentation application, and having multitasking with segmentation is not reasonable if one of them is identified as the main target application. However, when the main application is the segmentation of tissues, multitasking with cross-modality conversion is beneficial, especially for the U-Net architecture.
Submitted 31 May, 2021;
originally announced May 2021.
-
From Motor Control to Team Play in Simulated Humanoid Football
Authors:
Siqi Liu,
Guy Lever,
Zhe Wang,
Josh Merel,
S. M. Ali Eslami,
Daniel Hennes,
Wojciech M. Czarnecki,
Yuval Tassa,
Shayegan Omidshafiei,
Abbas Abdolmaleki,
Noah Y. Siegel,
Leonard Hasenclever,
Luke Marris,
Saran Tunyasuvunakool,
H. Francis Song,
Markus Wulfmeier,
Paul Muller,
Tuomas Haarnoja,
Brendan D. Tracey,
Karl Tuyls,
Thore Graepel,
Nicolas Heess
Abstract:
Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.
Submitted 25 May, 2021;
originally announced May 2021.
-
Everyday algorithm auditing: Understanding the power of everyday users in surfacing harmful algorithmic behaviors
Authors:
Hong Shen,
Alicia DeVos,
Motahhare Eslami,
Kenneth Holstein
Abstract:
A growing body of literature has proposed formal approaches to audit algorithmic systems for biased and harmful behaviors. While formal auditing approaches have been greatly impactful, they often suffer from major blind spots, with critical issues surfacing only in the context of everyday use once systems are deployed. Recent years have seen many cases in which everyday users of algorithmic systems detect and raise awareness about harmful behaviors that they encounter in the course of their everyday interactions with these systems. However, to date little academic attention has been granted to these bottom-up, user-driven auditing processes. In this paper, we propose and explore the concept of everyday algorithm auditing, a process in which users detect, understand, and interrogate problematic machine behaviors via their day-to-day interactions with algorithmic systems. We argue that everyday users are powerful in surfacing problematic machine behaviors that may elude detection via more centrally-organized forms of auditing, regardless of users' knowledge about the underlying algorithms. We analyze several real-world cases of everyday algorithm auditing, drawing lessons from these cases for the design of future platforms and tools that facilitate such auditing behaviors. Finally, we discuss work that lies ahead, toward bridging the gaps between formal auditing approaches and the organic auditing behaviors that emerge in everyday use of algorithmic systems.
Submitted 24 August, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
Generative Art Using Neural Visual Grammars and Dual Encoders
Authors:
Chrisantha Fernando,
S. M. Ali Eslami,
Jean-Baptiste Alayrac,
Piotr Mirowski,
Dylan Banarse,
Simon Osindero
Abstract:
Whilst there are perhaps only a few scientific methods, there seem to be almost as many artistic methods as there are artists. Artistic processes appear to inhabit the highest order of open-endedness. To begin to understand some of the processes of art making it is helpful to try to automate them even partially. In this paper, a novel algorithm for producing generative art is described which allows a user to input a text string, and which in a creative response to this string, outputs an image which interprets that string. It does so by evolving images using a hierarchical neural Lindenmayer system, and evaluating these images along the way using an image-text dual encoder trained on billions of images and their associated text from the internet. In doing so we have access to and control over an instance of an artistic process, allowing analysis of which aspects of the artistic process become the task of the algorithm, and which elements remain the responsibility of the artist.
Submitted 3 May, 2021; v1 submitted 1 May, 2021;
originally announced May 2021.
-
Temporal cavity solitons and frequency combs via quantum interference
Authors:
Gian-Luca Oppo,
David Grant,
Mansour Eslami
Abstract:
Temporal cavity solitons in ring microresonators provide broad and controllable generation of frequency combs with applications in frequency standards and precise atomic clocks. Three level media in the Λ configuration inside microresonators displaying electromagnetically induced transparency can be used for the generation of temporal cavity solitons and frequency combs in the presence of anomalous dispersion and two external driving fields close to resonance. Here, domain walls separating regions of two dark states due to quantum interference correspond to realizations of stimulated Raman adiabatic passage without input pulses. With no need of modulational instabilities, bright temporal cavity solitons and frequency combs are formed when these domain walls lock with each other. Wide stability ranges, close to resonance operation and optimal shape of the cavity solitons due to three-level quantum interference can make them preferable to those in two-level media.
Submitted 22 April, 2021;
originally announced April 2021.
-
Explaining the Black-box Smoothly- A Counterfactual Approach
Authors:
Sumedha Singla,
Motahhare Eslami,
Brian Pollack,
Stephen Wallace,
Kayhan Batmanghelich
Abstract:
We propose a BlackBox Counterfactual Explainer, designed to explain image classification models for medical applications. Classical approaches (e.g., saliency maps) that assess feature importance do not explain "how" imaging features in important anatomical regions are relevant to the classification decision. Our framework explains the decision for a target class by gradually "exaggerating" the semantic effect of the class in a query image. We adopted a Generative Adversarial Network (GAN) to generate a progressive set of perturbations to a query image, such that the classification decision changes from its original class to its negation. We used counterfactual explanations from our framework to audit a classifier trained on a chest x-ray dataset with multiple labels. We proposed clinically-relevant quantitative metrics such as cardiothoracic ratio and the score of a healthy costophrenic recess to evaluate our explanations.
We conducted a human-grounded experiment with diagnostic radiology residents to compare different styles of explanations (no explanation, saliency map, cycleGAN explanation, and our counterfactual explanation) by evaluating different aspects of explanations: (1) understandability, (2) classifier's decision justification, (3) visual quality, (4) identity preservation, and (5) overall helpfulness of an explanation to the users. Our results show that our counterfactual explanation was the only explanation method that significantly improved the users' understanding of the classifier's decision compared to the no-explanation baseline. Our metrics established a benchmark for evaluating model explanation methods in medical images. Our explanations revealed that the classifier relied on clinically relevant radiographic features for its diagnostic decisions, thus making its decision-making process more transparent to the end-user.
Submitted 18 November, 2022; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Game Plan: What AI can do for Football, and What Football can do for AI
Authors:
Karl Tuyls,
Shayegan Omidshafiei,
Paul Muller,
Zhe Wang,
Jerome Connor,
Daniel Hennes,
Ian Graham,
William Spearman,
Tim Waskett,
Dafydd Steele,
Pauline Luc,
Adria Recasens,
Alexandre Galashov,
Gregory Thornton,
Romuald Elie,
Pablo Sprechmann,
Pol Moreno,
Kris Cao,
Marta Garnelo,
Praneet Dutta,
Michal Valko,
Nicolas Heess,
Alex Bridgland,
Julien Perolat,
Bart De Vylder
, et al. (11 additional authors not shown)
Abstract:
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players' and coordinated teams' behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
Submitted 18 November, 2020;
originally announced November 2020.
-
Contrastive Training for Improved Out-of-Distribution Detection
Authors:
Jim Winkens,
Rudy Bunel,
Abhijit Guha Roy,
Robert Stanforth,
Vivek Natarajan,
Joseph R. Ledsam,
Patricia MacWilliams,
Pushmeet Kohli,
Alan Karthikesalingam,
Simon Kohl,
Taylan Cemgil,
S. M. Ali Eslami,
Olaf Ronneberger
Abstract:
Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to collect in practice. We show in extensive experiments that contrastive training significantly helps OOD detection performance on a number of common benchmarks. By introducing and employing the Confusion Log Probability (CLP) score, which quantifies the difficulty of the OOD detection task by capturing the similarity of inlier and outlier datasets, we show that our method especially improves performance in the 'near OOD' classes, a particularly challenging setting for previous methods.
Submitted 10 July, 2020;
originally announced July 2020.
-
PolyGen: An Autoregressive Generative Model of 3D Meshes
Authors:
Charlie Nash,
Yaroslav Ganin,
S. M. Ali Eslami,
Peter W. Battaglia
Abstract:
Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.
Submitted 23 February, 2020;
originally announced February 2020.
-
SignCol: Open-Source Software for Collecting Sign Language Gestures
Authors:
Mohammad Eslami,
Mahdi Karami,
Sedigheh Eslami,
Solale Tabarestani,
Farah Torkamani-Azar,
Christoph Meinel
Abstract:
Sign(ed) languages use gestures, such as hand or head movements, for communication. Sign language recognition is an assistive technology for individuals with hearing disabilities, and its goal is to improve such individuals' quality of life by facilitating their social involvement. Since sign languages vary widely in their alphabets, also known as signs, sign recognition software should be capable of handling eight different types of sign combinations, e.g. numbers, letters, words and sentences. Due to the intrinsic complexity and diversity of symbolic gestures, recognition algorithms need a comprehensive visual dataset to learn from. In this paper, we describe the design and implementation of Microsoft Kinect-based open-source software, called SignCol, for capturing and saving the gestures used in sign languages. Our work supports a multi-language database and reports statistics on the recorded items. SignCol can capture and store colored (RGB) frames, depth frames, infrared frames, body index frames, coordinate mapped color-body frames, skeleton information of each frame and camera parameters simultaneously.
Submitted 31 October, 2019;
originally announced November 2019.
-
Unsupervised Doodling and Painting with Improved SPIRAL
Authors:
John F. J. Mellor,
Eunbyung Park,
Yaroslav Ganin,
Igor Babuschkin,
Tejas Kulkarni,
Dan Rosenbaum,
Andy Ballard,
Theophane Weber,
Oriol Vinyals,
S. M. Ali Eslami
Abstract:
We investigate using reinforcement learning agents as generative models of images (extending arXiv:1804.01118). A generative agent controls a simulated painting environment, and is trained with rewards provided by a discriminator network simultaneously trained to assess the realism of the agent's samples, either unconditional or reconstructions. Compared to prior work, we make a number of improvements to the architectures of the agents and discriminators that lead to intriguing and at times surprising results. We find that when sufficiently constrained, generative agents can learn to produce images with a degree of visual abstraction, despite having only ever seen real photographs (no human brush strokes). And given enough time with the painting environment, they can produce images with considerable realism. These results show that, under the right circumstances, some aspects of human drawing can emerge from simulated embodiment, without the need for external supervision, imitation or social cues. Finally, we note the framework's potential for use in creative applications.
Submitted 2 October, 2019;
originally announced October 2019.
-
Automatic vocal tract landmark localization from midsagittal MRI data
Authors:
Mohammad Eslami,
Christiane Neuschaefer-Rube,
Antoine Serrurier
Abstract:
The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analyzing their variations is crucial for understanding speech production, diagnosing speech disorders and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a pre-requisite for any quantitative analysis and the rising amount of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators and the moderate quality of the images. This study addresses this issue for the first time and tackles it by means of Deep Learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical Magnetic Resonance Images for 9 speakers sustaining 62 articulations with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms the former methods, leading to an overall Root Mean Square Error of 3.6 pixels/0.36 cm obtained in a leave-one-out procedure over the speakers. The implementation code is also shared publicly on GitHub.
Submitted 9 January, 2020; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Image to Images Translation for Multi-Task Organ Segmentation and Bone Suppression in Chest X-Ray Radiography
Authors:
Mohammad Eslami,
Solale Tabarestani,
Shadi Albarqouni,
Ehsan Adeli,
Nassir Navab,
Malek Adjouadi
Abstract:
Chest X-ray radiography is one of the earliest medical imaging technologies and remains one of the most widely-used for diagnosis, screening, and treatment follow up of diseases related to lungs and heart. The literature in this field of research reports many interesting studies dealing with the challenging tasks of bone suppression and organ segmentation but performed separately, limiting any learning that comes with the consolidation of parameters that could optimize both processes. This study introduces, for the first time, a multitask deep learning model that simultaneously generates the bone-suppressed image and the organ-segmented image, enhancing the accuracy of tasks, minimizing the number of parameters needed by the model and optimizing the processing time, all by exploiting the interplay between the network parameters to benefit the performance of both tasks. The architectural design of this model, which relies on a conditional generative adversarial network, shows how the well-established pix2pix network (image-to-image network) is modified to fit the need for multitasking and extended to the new image-to-images architecture. The developed source code of this multitask model is shared publicly on GitHub as the first attempt at providing the two-task pix2pix extension, a supervised/paired/aligned/registered image-to-images translation which would be useful in many multitask applications. Dilated convolutions are also used to improve the results through a more effective receptive field assessment. The comparison with state-of-the-art algorithms along with an ablation study and a demonstration video are provided to evaluate efficacy and gauge the merits of the proposed approach.
Submitted 31 December, 2019; v1 submitted 24 June, 2019;
originally announced June 2019.
-
A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities
Authors:
Simon A. A. Kohl,
Bernardino Romera-Paredes,
Klaus H. Maier-Hein,
Danilo Jimenez Rezende,
S. M. Ali Eslami,
Pushmeet Kohli,
Andrew Zisserman,
Olaf Ronneberger
Abstract:
Medical imaging only indirectly measures the molecular identity of the tissue within each voxel, which often produces only ambiguous image evidence for target measures of interest, like semantic segmentation. This diversity and the variations of plausible interpretations are often specific to given image regions and may thus manifest on various scales, spanning all the way from the pixel to the image level. In order to learn a flexible distribution that can account for multiple scales of variations, we propose the Hierarchical Probabilistic U-Net, a segmentation network with a conditional variational auto-encoder (cVAE) that uses a hierarchical latent space decomposition. We show that this model formulation enables sampling and reconstruction of segmentations with high fidelity, i.e. with finely resolved detail, while providing the flexibility to learn complex structured distributions across scales. We demonstrate these abilities on the task of segmenting ambiguous medical scans as well as on instance segmentation of neurobiological and natural images. Our model automatically separates independent factors across scales, an inductive bias that we deem beneficial in structured output prediction tasks beyond segmentation.
Submitted 30 May, 2019;
originally announced May 2019.
-
Data-Efficient Image Recognition with Contrastive Predictive Coding
Authors:
Olivier J. Hénaff,
Aravind Srinivas,
Jeffrey De Fauw,
Ali Razavi,
Carl Doersch,
S. M. Ali Eslami,
Aaron van den Oord
Abstract:
Human observers can learn to recognize new categories of images from a handful of examples, yet doing so with artificial ones remains an open challenge. We hypothesize that data-efficient recognition is enabled by representations which make the variability in natural signals more predictable. We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning such representations. This new implementation produces features which support state-of-the-art linear classification accuracy on the ImageNet dataset. When used as input for non-linear classification with deep neural networks, this representation allows us to use 2-5x fewer labels than classifiers trained directly on image pixels. Finally, this unsupervised representation substantially improves transfer learning to object detection on the PASCAL VOC dataset, surpassing fully supervised pre-trained ImageNet classifiers.
Submitted 1 July, 2020; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Meta-Learning surrogate models for sequential decision making
Authors:
Alexandre Galashov,
Jonathan Schwarz,
Hyunjik Kim,
Marta Garnelo,
David Saxton,
Pushmeet Kohli,
S. M. Ali Eslami,
Yee Whye Teh
Abstract:
We introduce a unified probabilistic framework for solving sequential decision making problems ranging from Bayesian optimisation to contextual bandits and reinforcement learning. This is accomplished by a probabilistic model-based approach that explains observed data while capturing predictive uncertainty during the decision making process. Crucially, this probabilistic model is chosen to be a Meta-Learning system that allows learning from a distribution of related problems, allowing data efficient adaptation to a target task. As a suitable instantiation of this framework, we explore the use of Neural processes due to statistical and computational desiderata. We apply our framework to a broad range of problem domains, such as control problems, recommender systems and adversarial attacks on RL agents, demonstrating an efficient and general black-box learning approach.
Submitted 12 June, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Learning models for visual 3D localization with implicit mapping
Authors:
Dan Rosenbaum,
Frederic Besse,
Fabio Viola,
Danilo J. Rezende,
S. M. Ali Eslami
Abstract:
We consider learning based methods for visual localization that do not require the construction of explicit maps in the form of point clouds or voxels. The goal is to learn an implicit representation of the environment at a higher, more abstract level. We propose to use a generative approach based on Generative Query Networks (GQNs, Eslami et al. 2018), asking the following questions: 1) Can GQN capture more complex scenes than those it was originally demonstrated on? 2) Can GQN be used for localization in those scenes? To study this approach we consider procedurally generated Minecraft worlds, for which we can generate images of complex 3D scenes along with camera pose coordinates. We first show that GQNs, enhanced with a novel attention mechanism can capture the structure of 3D scenes in Minecraft, as evidenced by their samples. We then apply the models to the localization problem, comparing the results to a discriminative baseline, and comparing the ways each approach captures the task uncertainty.
Submitted 12 December, 2018; v1 submitted 4 July, 2018;
originally announced July 2018.
-
Consistent Generative Query Networks
Authors:
Ananya Kumar,
S. M. Ali Eslami,
Danilo J. Rezende,
Marta Garnelo,
Fabio Viola,
Edward Lockhart,
Murray Shanahan
Abstract:
Stochastic video prediction models take in a sequence of image frames, and generate a sequence of consecutive future image frames. These models typically generate future frames in an autoregressive fashion, which is slow and requires the input and output frames to be consecutive. We introduce a model that overcomes these drawbacks by generating a latent representation from an arbitrary set of frames that can then be used to simultaneously and efficiently sample temporally consistent frames at arbitrary time-points. For example, our model can "jump" and directly sample frames at the end of the video, without sampling intermediate frames. Synthetic video evaluations confirm substantial gains in speed and functionality without loss in fidelity. We also apply our framework to a 3D scene reconstruction dataset. Here, our model is conditioned on camera location and can sample consistent sets of images for what an occluded region of a 3D scene might look like, even if there are multiple possibilities for what that region might contain. Reconstructions and videos are available at https://bit.ly/2O4Pc4R.
Submitted 21 April, 2019; v1 submitted 5 July, 2018;
originally announced July 2018.
-
Encoding Spatial Relations from Natural Language
Authors:
Tiago Ramalho,
Tomáš Kočiský,
Frederic Besse,
S. M. Ali Eslami,
Gábor Melis,
Fabio Viola,
Phil Blunsom,
Karl Moritz Hermann
Abstract:
Natural language processing has made significant inroads into learning the semantics of words through distributional approaches; however, representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes. We present a system capable of capturing the semantics of spatial relations such as behind, left of, etc. from natural language. Our key contributions are a novel multi-modal objective based on generating images of scenes from their textual descriptions, and a new dataset on which to train it. We demonstrate that internal representations are robust to meaning-preserving transformations of descriptions (paraphrase invariance), while viewpoint invariance is an emergent property of the system.
Submitted 5 July, 2018; v1 submitted 4 July, 2018;
originally announced July 2018.
-
Neural Processes
Authors:
Marta Garnelo,
Jonathan Schwarz,
Dan Rosenbaum,
Fabio Viola,
Danilo J. Rezende,
S. M. Ali Eslami,
Yee Whye Teh
Abstract:
A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible; however, they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature.
Submitted 4 July, 2018;
originally announced July 2018.