-
Johnsen-Rahbek Capstan Clutch: A High Torque Electrostatic Clutch
Authors:
Timothy E. Amish,
Jeffrey T. Auletta,
Chad C. Kessens,
Joshua R. Smith,
Jeffrey I. Lipton
Abstract:
In many robotic systems, the holding state consumes power, limits operating time, and increases operating costs. Electrostatic clutches have the potential to improve robotic performance by generating holding torques with low power consumption. A key limitation of electrostatic clutches has been their low specific shear stresses which restrict generated holding torque, limiting many applications. Here we show how combining the Johnsen-Rahbek (JR) effect with the exponential tension scaling capstan effect can produce clutches with the highest specific shear stress in the literature. Our system generated 31.3 N/cm^2 shear stress and a total holding torque of 7.1 Nm while consuming only 2.5 mW/cm^2 at 500 V. We present a theoretical model of an electrostatic adhesive capstan clutch and demonstrate how large angle (theta > 2pi) designs increase efficiency over planar or small angle (theta < pi) clutch designs. We also report the first unfilled polymeric material, polybenzimidazole (PBI), to exhibit the JR-effect.
Submitted 27 March, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design
Authors:
Tony Shen,
Seonghwan Seo,
Grayson Lee,
Mohit Pandey,
Jason R Smith,
Artem Cherkasov,
Woo Youn Kim,
Martin Ester
Abstract:
Searching the vast chemical space for drug-like and synthesizable molecules with high binding affinity to a protein pocket is a challenging task in drug discovery. Recently, molecular deep generative models have been introduced which promise to be more efficient than exhaustive virtual screening, by directly generating molecules based on the protein structure. However, since they learn the distribution of a limited protein-ligand complex dataset, the existing methods struggle with generating novel molecules with significant property improvements. In this paper, we frame the generation task as a Reinforcement Learning task, where the goal is to search the wider chemical space for molecules with desirable properties as opposed to fitting a training data distribution. More specifically, we propose TacoGFN, a Generative Flow Network conditioned on protein pocket structure, using binding affinity, drug-likeliness and synthesizability measures as our reward. Empirically, our method outperforms state-of-art methods on the CrossDocked2020 benchmark for every molecular property (Vina score, QED, SA), while significantly improving the generation time. TacoGFN achieves $-8.82$ in median docking score and $52.63\%$ in Novel Hit Rate.
Submitted 7 April, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Stackelberg Games for Learning Emergent Behaviors During Competitive Autocurricula
Authors:
Boling Yang,
Liyuan Zheng,
Lillian J. Ratliff,
Byron Boots,
Joshua R. Smith
Abstract:
Autocurricular training is an important sub-area of multi-agent reinforcement learning (MARL) that allows multiple agents to learn emergent skills in an unsupervised co-evolving scheme. The robotics community has experimented with autocurricular training on physically grounded problems, such as robust control and interactive manipulation tasks. However, the asymmetric nature of these tasks makes the generation of sophisticated policies challenging. Indeed, the asymmetry in the environment may implicitly or explicitly provide an advantage to a subset of agents which could, in turn, lead to a low-quality equilibrium. This paper proposes a novel game-theoretic algorithm, Stackelberg Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG), which formulates a two-player MARL problem as a Stackelberg game with one player as the 'leader' and the other as the 'follower' in a hierarchical interaction structure wherein the leader has an advantage. We first demonstrate that the leader's advantage from ST-MADDPG can be used to alleviate the inherent asymmetry in the environment. By exploiting the leader's advantage, ST-MADDPG improves the quality of a co-evolution process and results in more sophisticated and complex strategies that work well even against an unseen strong opponent.
Submitted 4 May, 2023;
originally announced May 2023.
-
Accelerating Material Design with the Generative Toolkit for Scientific Discovery
Authors:
Matteo Manica,
Jannis Born,
Joris Cadow,
Dimitrios Christofidellis,
Ashish Dave,
Dean Clarke,
Yves Gaetan Nana Teukam,
Giorgio Giannone,
Samuel C. Hoffman,
Matthew Buchan,
Vijil Chenthamarakshan,
Timothy Donovan,
Hsiang Han Hsu,
Federico Zipoli,
Oliver Schilter,
Akihiro Kishimoto,
Lisa Hamada,
Inkit Padhi,
Karl Wehden,
Lauren McHugh,
Alexy Khrabrov,
Payel Das,
Seiji Takeda,
John R. Smith
Abstract:
With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery. They harness powerful representations learned from datasets to speed up the formulation of novel hypotheses with the potential to impact material discovery broadly. We present the Generative Toolkit for Scientific Discovery (GT4SD). This extensible open-source library enables scientists, developers, and researchers to train and use state-of-the-art generative models to accelerate scientific discovery focused on material design.
Submitted 31 January, 2023; v1 submitted 8 July, 2022;
originally announced July 2022.
-
Improved Object Pose Estimation via Deep Pre-touch Sensing
Authors:
Patrick Lancaster,
Boling Yang,
Joshua R. Smith
Abstract:
For certain manipulation tasks, object pose estimation from head-mounted cameras may not be sufficiently accurate. This is at least in part due to our inability to perfectly calibrate the coordinate frames of today's high degree of freedom robot arms that link the head to the end-effectors. We present a novel framework combining pre-touch sensing and deep learning to more accurately estimate pose in an efficient manner. The use of pre-touch sensing allows our method to localize the object directly with respect to the robot's end effector, thereby avoiding error caused by miscalibration of the arms. Instead of requiring the robot to scan the entire object with its pre-touch sensor, we use a deep neural network to detect object regions that contain distinctive geometric features. By focusing pre-touch sensing on these regions, the robot can more efficiently gather the information necessary to adjust its original pose estimate. Our region detection network was trained using a new dataset containing objects of widely varying geometries and has been labeled in a scalable fashion that is free from human bias. This dataset is applicable to any task that involves a pre-touch sensor gathering geometric information, and has been made publicly available. We evaluate our framework by having the robot re-estimate the pose of a number of objects of varying geometries. Compared to two simpler region proposal methods, we find that our deep neural network performs significantly better. In addition, we find that after a sequence of scans, objects can typically be localized to within 0.5 cm of their true position. We also observe that the original pose estimate can often be significantly improved after collecting a single quick scan.
Submitted 8 April, 2022;
originally announced April 2022.
-
Optical Proximity Sensing for Pose Estimation During In-Hand Manipulation
Authors:
Patrick Lancaster,
Pratik Gyawali,
Christoforos Mavrogiannis,
Siddhartha S. Srinivasa,
Joshua R. Smith
Abstract:
During in-hand manipulation, robots must be able to continuously estimate the pose of the object in order to generate appropriate control actions. The performance of algorithms for pose estimation hinges on the robot's sensors being able to detect discriminative geometric object features, but previous sensing modalities are unable to make such measurements robustly. The robot's fingers can occlude the view of environment- or robot-mounted image sensors, and tactile sensors can only measure at the local areas of contact. Motivated by fingertip-embedded proximity sensors' robustness to occlusion and ability to measure beyond the local areas of contact, we present the first evaluation of proximity sensor based pose estimation for in-hand manipulation. We develop a novel two-fingered hand with fingertip-embedded optical time-of-flight proximity sensors as a testbed for pose estimation during planar in-hand manipulation. Here, the in-hand manipulation task consists of the robot moving a cylindrical object from one end of its workspace to the other. We demonstrate, with statistical significance, that proximity-sensor based pose estimation via particle filtering during in-hand manipulation: a) exhibits 50% lower average pose error than a tactile-sensor based baseline; b) empowers a model predictive controller to achieve 30% lower final positioning error compared to when using tactile-sensor based pose estimates.
Submitted 30 October, 2023; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Benchmarking Robot Manipulation with the Rubik's Cube
Authors:
Boling Yang,
Patrick E. Lancaster,
Siddhartha S. Srinivasa,
Joshua R. Smith
Abstract:
Benchmarks for robot manipulation are crucial to measuring progress in the field, yet there are few benchmarks that demonstrate critical manipulation skills, possess standardized metrics, and can be attempted by a wide array of robot platforms. To address a lack of such benchmarks, we propose Rubik's cube manipulation as a benchmark to measure simultaneous performance of precise manipulation and sequential manipulation. The sub-structure of the Rubik's cube demands precise positioning of the robot's end effectors, while its highly reconfigurable nature enables tasks that require the robot to manage pose uncertainty throughout long sequences of actions. We present a protocol for quantitatively measuring both the accuracy and speed of Rubik's cube manipulation. This protocol can be attempted by any general-purpose manipulator, and only requires a standard 3x3 Rubik's cube and a flat surface upon which the Rubik's cube initially rests (e.g. a table). We demonstrate this protocol for two distinct baseline approaches on a PR2 robot. The first baseline provides a fundamental approach for pose-based Rubik's cube manipulation. The second baseline demonstrates the benchmark's ability to quantify improved performance by the system, particularly that resulting from the integration of pre-touch sensing. To demonstrate the benchmark's applicability to other robot platforms and algorithmic approaches, we present the functional blocks required to enable the HERB robot to manipulate the Rubik's cube via push-grasping.
Submitted 14 February, 2022;
originally announced February 2022.
-
Motivating Physical Activity via Competitive Human-Robot Interaction
Authors:
Boling Yang,
Golnaz Habibi,
Patrick E. Lancaster,
Byron Boots,
Joshua R. Smith
Abstract:
This project aims to motivate research in competitive human-robot interaction by creating a robot competitor that can challenge human users in certain scenarios such as physical exercise and games. With this goal in mind, we introduce the Fencing Game, a human-robot competition used to evaluate both the capabilities of the robot competitor and user experience. We develop the robot competitor through iterative multi-agent reinforcement learning and show that it can perform well against human competitors. Our user study additionally found that our system was able to continuously create challenging and enjoyable interactions that significantly increased human subjects' heart rates. The majority of human subjects considered the system to be entertaining and desirable for improving the quality of their exercise.
Submitted 14 February, 2022;
originally announced February 2022.
-
Communication by means of Modulated Johnson Noise
Authors:
Zerina Kapetanovic,
Miguel Morales,
Joshua R. Smith
Abstract:
We present the design of a new passive wireless communication system that does not rely on ambient or generated RF sources. Instead, we exploit the Johnson (thermal) noise generated by a resistor to transmit information bits wirelessly. By switching the load connected to an antenna between a resistor and an open circuit, we can achieve data rates of up to 26 bps and distances of up to 7.3 meters. This communication method consumes orders of magnitude less power than conventional communication schemes and presents the opportunity to enable wireless communication in areas with a complete lack of connectivity.
Submitted 6 August, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
No Size Fits All: Automated Radio Configuration for LPWANs
Authors:
Zerina Kapetanovic,
Deepak Vasisht,
Tusher Chakraborty,
Joshua R. Smith,
Ranveer Chandra
Abstract:
Low power long-range networks like LoRa have become increasingly mainstream for Internet of Things deployments. Given the versatility of applications that these protocols enable, they support many data rates and bandwidths. Yet, for a given network that supports hundreds of devices over multiple miles, the network operator typically needs to specify the same configuration, or one of a small subset of configurations, for all the client devices to communicate with the gateway. This one-size-fits-all approach is highly inefficient in large networks. We propose an alternative approach -- we allow network devices to transmit at any data rate they choose. The gateway uses the first few symbols in the preamble to classify the correct data rate, switches its configuration, and then decodes the data. Our design leverages the inherent asymmetry in outdoor IoT deployments where the clients are power-starved and resource-constrained, but the gateway is not. Our gateway design, Proteus, runs a neural network architecture and is backward compatible with existing LoRa protocols. Our experiments reveal that Proteus can identify the correct configuration with over 97% accuracy in both indoor and outdoor deployments. Our network architecture leads to a 3.8 to 11 times increase in throughput for our LoRa testbed.
Submitted 10 September, 2021;
originally announced September 2021.
-
Proximity Perception in Human-Centered Robotics: A Survey on Sensing Systems and Applications
Authors:
Stefan Escaida Navarro,
Stephan Mühlbacher-Karrer,
Hosam Alagi,
Hubert Zangl,
Keisuke Koyama,
Björn Hein,
Christian Duriez,
Joshua R. Smith
Abstract:
Proximity perception is a technology that has the potential to play an essential role in the future of robotics. It can fulfill the promise of safe, robust, and autonomous systems in industry and everyday life, alongside humans, as well as in remote locations in space and underwater. In this survey paper, we cover the developments of this field from the early days up to the present, with a focus on human-centered robotics. Here, proximity sensors are typically deployed in two scenarios: first, on the exterior of manipulator arms to support safety and interaction functionality, and second, on the inside of grippers or hands to support grasping and exploration. Starting from this observation, we propose a categorization for the approaches found in the literature. To provide a basis for understanding these approaches, we devote effort to presenting the technologies and different measuring principles that were developed over the years, also providing a summary in the form of a table. Then, we show the diversity of applications that have been presented in the literature. Finally, we give an overview of the most important trends that will shape the future of this domain.
Submitted 17 August, 2021; v1 submitted 16 August, 2021;
originally announced August 2021.
-
Few shot clustering for indoor occupancy detection with extremely low-quality images from battery free cameras
Authors:
Homagni Saha,
Sin Yong Tan,
Ali Saffari,
Mohamad Katanbaf,
Joshua R. Smith,
Soumik Sarkar
Abstract:
Reliable detection of human occupancy in indoor environments is critical for various energy efficiency, security, and safety applications. We consider this challenge of occupancy detection using extremely low-quality, privacy-preserving images from low power image sensors. To address this challenge, we propose a combined few shot learning and clustering algorithm that has very low commissioning and maintenance cost. While the few shot learning concept enables us to commission our system with a few labeled examples, the clustering step serves the purpose of online adaptation to a changing imaging environment over time. Apart from validating and comparing our algorithm on benchmark datasets, we also demonstrate the performance of our algorithm on streaming images collected from real homes using our novel battery free camera hardware.
Submitted 12 August, 2020;
originally announced August 2020.
-
Contact-less manipulation of millimeter-scale objects via ultrasonic levitation
Authors:
Jared Nakahara,
Boling Yang,
Joshua R. Smith
Abstract:
Although general purpose robotic manipulators are becoming more capable at manipulating various objects, their ability to manipulate millimeter-scale objects is usually very limited. On the other hand, ultrasonic levitation devices have been shown to levitate a large range of small objects, from polystyrene balls to living organisms. By controlling the acoustic force fields, ultrasonic levitation devices can compensate for robot manipulator positioning uncertainty and control the grasping force exerted on the target object. The material agnostic nature of acoustic levitation devices and their ability to dexterously manipulate millimeter-scale objects make them appealing as a grasping mode for general purpose robots. In this work, we present an ultrasonic, contact-less manipulation device that can be attached to or picked up by any general purpose robotic arm, enabling millimeter-scale manipulation with little to no modification to the robot itself. This device is capable of performing the very first phase-controlled picking action on acoustically reflective surfaces. With the manipulator placed around the target object, the manipulator can grasp objects smaller in size than the robot's positioning uncertainty, trap the object to resist air currents during robot movement, and dexterously hold a small and fragile object, like a flower bud. Due to the contact-less nature of the ultrasound-based gripper, a camera positioned to look into the cylinder can inspect the object without occlusion, facilitating accurate visual feature extraction.
Submitted 20 February, 2020;
originally announced February 2020.
-
Covering the News with (AI) Style
Authors:
Michele Merler,
Cicero Nogueira dos Santos,
Mauro Martino,
Alfio M. Gliozzo,
John R. Smith
Abstract:
We introduce a multi-modal discriminative and generative framework capable of assisting humans in producing visual content related to a given theme, starting from a collection of documents (textual, visual, or both). This framework can be used by editors to generate images for articles, as well as books or music album covers. Motivated by a request from The New York Times (NYT) seeking help to use AI to create art for their special section on Artificial Intelligence, we demonstrated the application of our system in producing such images.
Submitted 5 January, 2020;
originally announced February 2020.
-
A Broader Study of Cross-Domain Few-Shot Learning
Authors:
Yunhui Guo,
Noel C. Codella,
Leonid Karlinsky,
James V. Codella,
John R. Smith,
Kate Saenko,
Tajana Rosing,
Rogerio Feris
Abstract:
Recent progress on few-shot learning largely relies on annotated data for meta-learning: base classes sampled from the same domain as the novel classes. However, in many applications, collecting data for meta-learning is infeasible or impossible. This leads to the cross-domain few-shot learning problem, where there is a large shift between base and novel class domains. While investigations of the cross-domain few-shot scenario exist, these works are limited to natural images that still contain a high degree of visual similarity. No work yet exists that examines few-shot learning across different imaging methods seen in real world scenarios, such as aerial and medical imaging. In this paper, we propose the Broader Study of Cross-Domain Few-Shot Learning (BSCD-FSL) benchmark, consisting of image data from a diverse assortment of image acquisition methods. This includes natural images, such as crop disease images, but additionally those that present with an increasing dissimilarity to natural images, such as satellite images, dermatology images, and radiology images. Extensive experiments on the proposed benchmark are performed to evaluate state-of-art meta-learning approaches, transfer learning approaches, and newer methods for cross-domain few-shot learning. The results demonstrate that state-of-art meta-learning methods are surprisingly outperformed by earlier meta-learning approaches, and all meta-learning methods underperform in relation to simple fine-tuning by 12.8% average accuracy. Performance gains previously observed with methods specialized for cross-domain few-shot learning vanish in this more challenging benchmark. Finally, the accuracy of all methods tends to correlate with dataset similarity to natural images, verifying the value of the benchmark to better represent the diversity of data seen in practice and guiding future research.
Submitted 17 July, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
MuSHR: A Low-Cost, Open-Source Robotic Racecar for Education and Research
Authors:
Siddhartha S. Srinivasa,
Patrick Lancaster,
Johan Michalove,
Matt Schmittle,
Colin Summers,
Matthew Rockett,
Rosario Scalise,
Joshua R. Smith,
Sanjiban Choudhury,
Christoforos Mavrogiannis,
Fereshteh Sadeghi
Abstract:
We present MuSHR, the Multi-agent System for non-Holonomic Racing. MuSHR is a low-cost, open-source robotic racecar platform for education and research, developed by the Personal Robotics Lab in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. MuSHR aspires to contribute towards democratizing the field of robotics as a low-cost platform that can be built and deployed by following detailed, open documentation and do-it-yourself tutorials. A set of demos and lab assignments developed for the Mobile Robots course at the University of Washington provide guided, hands-on experience with the platform, and milestones for further development. MuSHR is a valuable asset for academic research labs, robotics instructors, and robotics enthusiasts.
Submitted 24 December, 2023; v1 submitted 21 August, 2019;
originally announced August 2019.
-
Diversity in Faces
Authors:
Michele Merler,
Nalini Ratha,
Rogerio S. Feris,
John R. Smith
Abstract:
Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, largely built on data-driven deep learning methods. While this is encouraging, a critical aspect that is limiting facial recognition accuracy and fairness is inherent facial diversity. Every face is different. Every face reflects something unique about us. Aspects of our heritage - including race, ethnicity, culture, geography - and our individual identity - age, gender, and other visible manifestations of self-expression - are reflected in our faces. We expect face recognition to work equally accurately for every face. Face recognition needs to be fair. As we rely on data-driven methods to create face recognition technology, we need to ensure necessary balance and coverage in training data. However, there are still scientific questions about how to represent and extract pertinent facial features and quantitatively measure facial diversity. Towards this goal, Diversity in Faces (DiF) provides a data set of one million annotated human face images for advancing the study of facial diversity. The annotations are generated using ten well-established facial coding schemes from the scientific literature. The facial coding schemes provide human-interpretable quantitative measures of facial features. We believe that by making the extracted coding schemes available on a large set of faces, we can accelerate research and development towards creating more fair and accurate facial recognition systems.
Submitted 8 April, 2019; v1 submitted 29 January, 2019;
originally announced January 2019.
-
Improved Proximity, Contact, and Force Sensing via Optimization of Elastomer-Air Interface Geometry
Authors:
Patrick E. Lancaster,
Joshua R. Smith,
Siddhartha S. Srinivasa
Abstract:
We describe a single fingertip-mounted sensing system for robot manipulation that provides proximity (pre-touch), contact detection (touch), and force sensing (post-touch). The sensor system consists of optical time-of-flight range measurement modules covered in a clear elastomer. Because the elastomer is clear, the sensor can detect and range nearby objects, as well as measure deformations caused by objects that are in contact with the sensor and thereby estimate the applied force. We examine how this sensor design can be improved with respect to invariance to object reflectivity, signal-to-noise ratio, and continuous operation when switching between the distance and force measurement regimes. By harnessing time-of-flight technology and optimizing the elastomer-air boundary to control the emitted light's path, we develop a sensor that is able to seamlessly transition between measuring distances of up to 50mm and contact forces of up to 10 newtons. Furthermore, we provide all hardware design files and software sources, and offer thorough instructions on how to manufacture the sensor from inexpensive, commercially available components.
Submitted 30 September, 2018;
originally announced October 2018.
-
Collaborative Human-AI (CHAI): Evidence-Based Interpretable Melanoma Classification in Dermoscopic Images
Authors:
Noel C. F. Codella,
Chung-Ching Lin,
Allan Halpern,
Michael Hind,
Rogerio Feris,
John R. Smith
Abstract:
Automated dermoscopic image analysis has witnessed rapid growth in diagnostic performance. Yet adoption faces resistance, in part, because no evidence is provided to support decisions. In this work, an approach for evidence-based classification is presented. A feature embedding is learned with CNNs, triplet-loss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors and the localized image regions most relevant to measuring distance between query and neighbors. To ensure that results are relevant in terms of both label accuracy and human visual similarity for any skill level, a novel hierarchical triplet logic is implemented to jointly learn an embedding according to disease labels and non-expert similarity. Results are improved over baselines trained on disease labels alone, as well as standard multiclass loss. Quantitative relevance of results, according to non-expert similarity, as well as localized image regions, are also significantly improved.
Submitted 1 August, 2018; v1 submitted 30 May, 2018;
originally announced May 2018.
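As an illustration of the two building blocks the abstract above names (a triplet loss for learning the embedding and a kNN search whose neighbors double as evidence), here is a minimal sketch. It assumes embeddings are already available as NumPy vectors and uses the plain triplet loss, not the paper's hierarchical variant; the function names, the margin value, and `k` are hypothetical choices made for this example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Plain triplet loss on squared Euclidean distances (illustrative only)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    # Penalize triplets where the positive is not closer than the negative by `margin`.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def knn_classify(query, gallery, labels, k=3):
    """Majority vote over the k nearest gallery embeddings.

    Returns the predicted label and the indices of the neighbors,
    which can be shown to the user as supporting evidence.
    """
    dists = np.linalg.norm(gallery - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = np.bincount(labels[nearest])
    return int(np.argmax(votes)), nearest
```

Returning the neighbor indices alongside the label is the design point: the same search that produces the decision also produces the examples that justify it.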
-
Wireless Quantization Index Modulation: Enabling Communication Through Existing Signals
Authors:
Zerina Kapetanovic,
Vamsi Talla,
Aaron Parks,
Jing Qian,
Joshua R. Smith
Abstract:
As the number of IoT devices continues to increase exponentially and saturate the wireless spectrum, there is a dire need for additional spectrum to support large networks of wireless devices. Over the past years, many promising solutions have been proposed but they all suffer from the drawback of new infrastructure costs, setup and maintenance, or are difficult to implement due to FCC regulations. In this paper, we propose a novel Wireless Quantization Index Modulation (QIM) technique which uses existing infrastructure to embed information into existing wireless signals to communicate with IoT devices with negligible impact on the original signal and zero spectrum overhead. We explore the design space for wireless QIM and evaluate the performance of embedding information in TV, FM and AM radio broadcast signals under different conditions. We demonstrate that we can embed messages at up to 8-200 kbps with negligible impact on the audio and video quality of the original FM, AM and TV signals respectively.
Submitted 24 April, 2018;
originally announced April 2018.
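For readers unfamiliar with quantization index modulation, the general idea behind the technique named in the abstract above can be sketched as follows. This is textbook scalar QIM on a generic sample vector, not the paper's wireless design: each bit selects one of two interleaved quantization lattices, and the decoder picks whichever lattice lies closer to the received sample. The function names and the `step` parameter are hypothetical.

```python
import numpy as np

def qim_embed(host, bits, step):
    """Embed one bit per host sample by snapping it onto one of two lattices."""
    host = np.asarray(host, dtype=float)
    bits = np.asarray(bits, dtype=int)
    # Bit 0 uses multiples of `step`; bit 1 uses the same lattice shifted by step/2.
    offset = bits * step / 2.0
    return np.round((host - offset) / step) * step + offset

def qim_decode(received, step):
    """Recover bits by choosing the nearer of the two lattices for each sample."""
    received = np.asarray(received, dtype=float)
    nearest0 = np.round(received / step) * step
    nearest1 = np.round((received - step / 2.0) / step) * step + step / 2.0
    d0 = np.abs(received - nearest0)
    d1 = np.abs(received - nearest1)
    return (d1 < d0).astype(int)
```

In this sketch the embedding distortion is at most `step / 2` per sample, and decoding tolerates additive noise smaller than `step / 4`, so `step` trades impact on the host signal against robustness.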
-
Ultra-low-power Wireless Streaming Cameras
Authors:
Saman Naderiparizi,
Mehrdad Hessar,
Vamsi Talla,
Shyamnath Gollakota,
Joshua R. Smith
Abstract:
Wireless video streaming has traditionally been considered an extremely power-hungry operation. Existing approaches optimize the camera and communication modules individually to minimize their power consumption. However, the joint redesign and optimization of wireless communication as well as the camera is what provides greater power savings. We present an ultra-low-power wireless video streaming camera. To achieve this, we present a novel "analog" video backscatter technique that feeds analog pixels from the photo-diodes directly to the backscatter hardware, thereby eliminating power consuming hardware components such as ADCs and amplifiers. We prototype our wireless camera using off-the-shelf hardware and show that our design can stream video at up to 13 FPS and can operate up to a distance of 150 feet from the access point. Our COTS prototype consumes 2.36 mW. Finally, to demonstrate the potential of our design, we built two proof-of-concept applications: video streaming for micro-robots and security cameras for face detection.
Submitted 27 July, 2017;
originally announced July 2017.
-
Automatic Curation of Golf Highlights using Multimodal Excitement Features
Authors:
Michele Merler,
Dhiraj Joshi,
Quoc-Bao Nguyen,
Stephen Hammer,
John Kent,
John R. Smith,
Rogerio S. Feris
Abstract:
The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system for the editorial aid of golf highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), spectators (crowd cheering), and commentator (tone of the voice and word analysis) to determine the most interesting moments of a game. We accurately identify the start and end frames of key shot highlights with additional metadata, such as the player's name and the hole number, allowing personalized content summarization and retrieval. In addition, we introduce new techniques for learning our classifiers with reduced manual training data annotation by exploiting the correlation of different modalities. Our work has been demonstrated at a major golf tournament, successfully extracting highlights from live video streams over four consecutive days.
Submitted 21 July, 2017;
originally announced July 2017.
-
LoRa Backscatter: Enabling The Vision of Ubiquitous Connectivity
Authors:
Vamsi Talla,
Mehrdad Hessar,
Bryce Kellogg,
Ali Najafi,
Joshua R. Smith,
Shyamnath Gollakota
Abstract:
The vision of embedding connectivity into billions of everyday objects runs into the reality of existing communication technologies --- there is no existing wireless technology that can provide reliable and long-range communication at tens of microwatts of power as well as cost less than a dime. While backscatter is low-power and low-cost, it is known to be limited to short ranges. This paper overturns this conventional wisdom about backscatter and presents the first wide-area backscatter system. Our design can successfully backscatter from any location between an RF source and receiver, separated by 475 m, while being compatible with commodity LoRa hardware. Further, when our backscatter device is co-located with the RF source, the receiver can be as far as 2.8 km away. We deploy our system in a 4,800 $ft^{2}$ (446 $m^{2}$) house spread across three floors, a 13,024 $ft^{2}$ (1210 $m^{2}$) office area covering 41 rooms, as well as a one-acre (4046 $m^{2}$) vegetable farm and show that we can achieve reliable coverage, using only a single RF source and receiver. We also build a contact lens prototype as well as a flexible epidermal patch device attached to the human skin. We show that these devices can reliably backscatter data across a 3,328 $ft^{2}$ (309 $m^{2}$) room. Finally, we present a design sketch of a LoRa backscatter IC that shows that it costs less than a dime at scale and consumes only 9.25 $μ$W of power, which is more than 1000x lower power than LoRa radio chipsets.
Submitted 16 May, 2017;
originally announced May 2017.
-
FM Backscatter: Enabling Connected Cities and Smart Fabrics
Authors:
Anran Wang,
Vikram Iyer,
Vamsi Talla,
Joshua R. Smith,
Shyamnath Gollakota
Abstract:
This paper enables connectivity on everyday objects by transforming them into FM radio stations. To do this, we show for the first time that ambient FM radio signals can be used as a signal source for backscatter communication. Our design creates backscatter transmissions that can be decoded on any FM receiver including those in cars and smartphones. This enables us to achieve a previously infeasible capability: backscattering information to cars and smartphones in outdoor environments.
Our key innovation is a modulation technique that transforms backscatter, which is a multiplication operation on RF signals, into an addition operation on the audio signals output by FM receivers. This enables us to embed both digital data as well as arbitrary audio into ambient analog FM radio signals. We build prototype hardware of our design and successfully embed audio transmissions over ambient FM signals. Further, we achieve data rates of up to 3.2 kbps and ranges of 5-60 feet, while consuming as little as 11.07μW of power. To demonstrate the potential of our design, we also fabricate our prototype on a cotton t-shirt by machine sewing patterns of a conductive thread to create a smart fabric that can transmit data to a smartphone. We also embed FM antennas into posters and billboards and show that they can communicate with FM receivers in cars and smartphones.
Submitted 24 February, 2017; v1 submitted 22 February, 2017;
originally announced February 2017.
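The claim in the abstract above, that multiplication of RF signals becomes addition of audio signals, follows from the product-to-sum identity for cosines. The derivation below is a sketch under a standard FM signal model; the symbols $f_c$ (station carrier), $f_b$ (backscatter offset frequency), $\Delta f$ (frequency deviation), $a(t)$ (station audio), and $b(t)$ (backscattered audio or data) are assumptions introduced here, not notation taken from the abstract.

```latex
\begin{align*}
s(t) &= \cos\!\Big(2\pi f_c t + 2\pi\Delta f \int_0^t a(\tau)\,d\tau\Big)
  && \text{ambient FM signal} \\
m(t) &= \cos\!\Big(2\pi f_b t + 2\pi\Delta f \int_0^t b(\tau)\,d\tau\Big)
  && \text{backscatter switching signal} \\
s(t)\,m(t) &= \tfrac{1}{2}\cos\!\Big(2\pi (f_c + f_b) t
  + 2\pi\Delta f \int_0^t \big(a(\tau) + b(\tau)\big)\,d\tau\Big) \\
  &\quad + \tfrac{1}{2}\cos\!\Big(2\pi (f_c - f_b) t
  + 2\pi\Delta f \int_0^t \big(a(\tau) - b(\tau)\big)\,d\tau\Big)
\end{align*}
```

Under this model, a receiver tuned to $f_c + f_b$ sees an ordinary FM signal whose demodulated audio is $a(t) + b(t)$, which is the additive behavior the abstract describes.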
-
Deep Learning Ensembles for Melanoma Recognition in Dermoscopy Images
Authors:
Noel Codella,
Quoc-Bao Nguyen,
Sharath Pankanti,
David Gutman,
Brian Helba,
Allan Halpern,
John R. Smith
Abstract:
Melanoma is the deadliest form of skin cancer. While curable with early detection, only highly trained specialists are capable of accurately recognizing the disease. As expertise is in limited supply, automated systems capable of identifying disease could save lives, reduce unnecessary biopsies, and reduce costs. Toward this goal, we propose a system that combines recent developments in deep learning with established machine learning approaches, creating ensembles of methods that are capable of segmenting skin lesions, as well as analyzing the detected area and surrounding tissue for melanoma detection. The system is evaluated using the largest publicly available benchmark dataset of dermoscopic images, containing 900 training and 379 testing images. New state-of-the-art performance levels are demonstrated, leading to an improvement in the area under receiver operating characteristic curve of 7.5% (0.843 vs. 0.783), in average precision of 4% (0.649 vs. 0.624), and in specificity measured at the clinically relevant 95% sensitivity operating point 2.9 times higher than the previous state-of-the-art (36.8% specificity compared to 12.5%). Compared to the average of 8 expert dermatologists on a subset of 100 test images, the proposed system produces a higher accuracy (76% vs. 70.5%), and specificity (62% vs. 59%) evaluated at an equivalent sensitivity (82%).
Submitted 17 October, 2016; v1 submitted 14 October, 2016;
originally announced October 2016.
-
Inter-Technology Backscatter: Towards Internet Connectivity for Implanted Devices
Authors:
Vikram Iyer,
Vamsi Talla,
Bryce Kellogg,
Shyamnath Gollakota,
Joshua R. Smith
Abstract:
We introduce inter-technology backscatter, a novel approach that transforms wireless transmissions from one technology to another, on the air. Specifically, we show for the first time that Bluetooth transmissions can be used to create Wi-Fi and ZigBee-compatible signals using backscatter communication. Since Bluetooth, Wi-Fi and ZigBee radios are widely available, this approach enables a backscatter design that works using only commodity devices.
We build prototype backscatter hardware using an FPGA and experiment with various Wi-Fi, Bluetooth and ZigBee devices. Our experiments show we can create 2-11 Mbps Wi-Fi standards-compliant signals by backscattering Bluetooth transmissions. To show the generality of our approach, we also demonstrate generation of standards-compliant ZigBee signals by backscattering Bluetooth transmissions. Finally, we build proof-of-concepts for previously infeasible applications including the first contact lens form-factor antenna prototype and an implantable neural recording interface that communicate directly with commodity devices such as smartphones and watches, thus enabling the vision of Internet connected implanted devices.
Submitted 15 July, 2016;
originally announced July 2016.
-
BLISP: Enhancing Backscatter Radio with Active Radio for Computational RFIDs
Authors:
Ivar in 't Veen,
Qingzhi Liu,
Przemysław Pawełczak,
Aaron Parks,
Joshua R. Smith
Abstract:
We demonstrate the world's first hybrid radio platform which combines the strengths of active radio (long range and robustness to interference) and Computational RFIDs (low power consumption). We evaluate the Wireless Identification and Sensing Platform (WISP), an EPC C1G2 standard-based, Computational RFID backscatter radio, against Bluetooth Low Energy (BLE) and show (theoretically and experimentally) that WISP in high channel attenuation conditions is less energy efficient per received byte than BLE. Exploiting this observation, we design a simple switching mechanism that backs off to BLE when radio conditions for WISP are unfavorable. By a set of laboratory experiments, we show that our proposed hybrid active/backscatter radio obtains higher goodput than WISP and lower energy consumption than BLE as stand-alone platforms, especially when WISP is in range of an RFID interrogator for the majority of the time. Simultaneously, our proposed platform is as energy efficient as BLE when the user is mostly out of RFID interrogator range.
Submitted 23 April, 2016; v1 submitted 14 March, 2016;
originally announced April 2016.
-
Wisent: Robust Downstream Communication and Storage for Computational RFIDs
Authors:
Jethro Tan,
Przemysław Pawełczak,
Aaron Parks,
Joshua R. Smith
Abstract:
Computational RFID (CRFID) devices are emerging platforms that can enable perennial computation and sensing by eliminating the need for batteries. Although much research has been devoted to improving upstream (CRFID to RFID reader) communication rates, the opposite direction has so far been neglected, presumably due to the difficulty of guaranteeing fast and error-free transfer amidst frequent power interruptions of CRFID. With growing interest in the market where CRFIDs are forever-embedded in many structures, it is necessary for this void to be filled. Therefore, we propose Wisent, a robust downstream communication protocol for CRFIDs that operates on top of the legacy UHF RFID communication protocol: EPC C1G2. The novelty of Wisent is its ability to adaptively change the frame length sent by the reader, based on the length throttling mechanism, to minimize the transfer times at varying channel conditions. We present an implementation of Wisent for the WISP 5 and an off-the-shelf RFID reader. Our experiments show that Wisent allows transfer up to 16 times faster than a baseline, non-adaptive shortest frame case, i.e. single word length, at sub-meter distance. As a case study, we show how Wisent enables wireless CRFID reprogramming, demonstrating the world's first wirelessly reprogrammable (software defined) CRFID.
Submitted 1 February, 2016; v1 submitted 14 December, 2015;
originally announced December 2015.
-
Oracle performance for visual captioning
Authors:
Li Yao,
Nicolas Ballas,
Kyunghyun Cho,
John R. Smith,
Yoshua Bengio
Abstract:
The task of associating images and videos with a natural language description has attracted a great amount of attention recently. Rapid progress has been made in terms of both developing novel algorithms and releasing new datasets. Indeed, the state-of-the-art results on some of the standard datasets have been pushed into the regime where it has become more and more difficult to make significant improvements. Instead of proposing new models, this work investigates the possibility of empirically establishing performance upper bounds on various visual captioning datasets without extra data labelling effort or human evaluation. In particular, it is assumed that visual captioning is decomposed into two steps: from visual inputs to visual concepts, and from visual concepts to natural language descriptions. One would be able to obtain an upper bound when assuming the first step is perfect and only requiring training a conditional language model for the second step. We demonstrate the construction of such bounds on MS-COCO, YouTube2Text and LSMDC (a combination of M-VAD and MPII-MD). Surprisingly, despite the imperfect process we used for visual concept extraction in the first step and the simplicity of the language model for the second step, we show that current state-of-the-art models fall short when compared with the learned upper bounds. Furthermore, with such a bound, we quantify several important factors concerning image and video captioning: the number of visual concepts captured by different models, the trade-off between the amount of visual elements captured and their accuracy, and the intrinsic difficulty and blessing of different datasets.
Submitted 14 September, 2016; v1 submitted 14 November, 2015;
originally announced November 2015.
-
Powering the Next Billion Devices with Wi-Fi
Authors:
Vamsi Talla,
Bryce Kellogg,
Benjamin Ransford,
Saman Naderiparizi,
Shyamnath Gollakota,
Joshua R. Smith
Abstract:
We present the first power over Wi-Fi system that delivers power and works with existing Wi-Fi chipsets. Specifically, we show that a ubiquitous piece of wireless communication infrastructure, the Wi-Fi router, can provide far field wireless power without compromising the network's communication performance. Building on our design we prototype, for the first time, battery-free temperature and camera sensors that are powered using Wi-Fi chipsets with ranges of 20 and 17 feet respectively. We also demonstrate the ability to wirelessly recharge nickel-metal hydride and lithium-ion coin-cell batteries at distances of up to 28 feet. Finally, we deploy our system in six homes in a metropolitan area and show that our design can successfully deliver power via Wi-Fi in real-world network conditions.
Submitted 26 May, 2015;
originally announced May 2015.
-
Tracking Large-Scale Video Remix in Real-World Events
Authors:
Lexing Xie,
Apostol Natsev,
Xuming He,
John Kender,
Matthew Hill,
John R Smith
Abstract:
Social information networks, such as YouTube, contain traces of both explicit online interactions (such as "like", leaving a comment, or subscribing to a video feed), and latent interactions (such as quoting, or remixing parts of a video). We propose visual memes, or frequently re-posted short video segments, for tracking such latent video interactions at scale. Visual memes are extracted by scalable detection algorithms that we develop, with high accuracy. We further augment visual memes with text, via a statistical model of latent topics. We model content interactions on YouTube with visual memes, defining several measures of influence and building predictive models for meme popularity. Experiments are carried out on over 2 million video shots from more than 40,000 videos on two prominent news events in 2009: the election in Iran and the swine flu epidemic. In these two events, a high percentage of videos contain remixed content, and it is apparent that traditional news media and citizen journalists have different roles in disseminating remixed content. We perform two quantitative evaluations for annotating visual memes and predicting their popularity. The joint statistical model of visual memes and words outperforms a concurrence model, and the average error is ~2% for predicting meme volume and ~17% for their lifespan.
Submitted 12 May, 2013; v1 submitted 1 October, 2012;
originally announced October 2012.