
Showing 1–50 of 55 results for author: Jain, D

Searching in archive cs.
  1. arXiv:2406.19800  [pdf, other]

    cs.LG cs.RO

    Modeling the Real World with High-Density Visual Particle Dynamics

    Authors: William F. Whitney, Jacob Varley, Deepali Jain, Krzysztof Choromanski, Sumeet Singh, Vikas Sindhwani

    Abstract: We present High-Density Visual Particle Dynamics (HD-VPD), a learned world model that can emulate the physical dynamics of real scenes by processing massive latent point clouds containing 100K+ particles. To enable efficiency at this scale, we introduce a novel family of Point Cloud Transformers (PCTs) called Interlacers leveraging intertwined linear-attention Performer layers and graph-based neig…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.17740  [pdf, other]

    cs.LG cs.AI cs.CV

    Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning

    Authors: Arijit Sehanobish, Avinava Dubey, Krzysztof Choromanski, Somnath Basu Roy Chowdhury, Deepali Jain, Vikas Sindhwani, Snigdha Chaturvedi

    Abstract: Recent efforts to scale Transformer models have demonstrated rapid progress across a wide range of tasks (Wei et al., 2022). However, fine-tuning these models for downstream tasks is expensive due to their large parameter counts. Parameter-efficient fine-tuning (PEFT) approaches have emerged as a viable alternative by allowing us to fine-tune models by updating only a small number of parameters. I…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Work in progress

  3. arXiv:2405.14878  [pdf, other]

    eess.IV cs.CV cs.LG stat.AP

    Improving and Evaluating Machine Learning Methods for Forensic Shoeprint Matching

    Authors: Divij Jain, Saatvik Kher, Lena Liang, Yufeng Wu, Ashley Zheng, Xizhen Cai, Anna Plantinga, Elizabeth Upton

    Abstract: We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random…

    Submitted 2 April, 2024; originally announced May 2024.

  4. arXiv:2405.09373  [pdf, other]

    cs.CL

    PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

    Authors: Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap

    Abstract: Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-sca…

    Submitted 20 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  5. arXiv:2404.03570  [pdf, other]

    cs.RO

    Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity

    Authors: Jake Varley, Sumeet Singh, Deepali Jain, Krzysztof Choromanski, Andy Zeng, Somnath Basu Roy Chowdhury, Avinava Dubey, Vikas Sindhwani

    Abstract: We present an embodied AI system which receives open-ended natural language instructions from a human, and controls two arms to collaboratively accomplish potentially long-horizon tasks over a large workspace. Our system is modular: it deploys state of the art Large Language Models for task planning, Vision-Language models for semantic perception, and Point Cloud transformers for grasping. With sem…

    Submitted 4 April, 2024; originally announced April 2024.

  6. arXiv:2402.11477  [pdf, other]

    cs.CY

    Studying Differential Mental Health Expressions in India

    Authors: Khushi Shelat, Sunny Rai, Devansh R Jain, Kishen Sivabalan, Young Min Cho, Maitreyi Redkar, Samindara Sawant, Sharath Chandra Guntuku

    Abstract: Psychosocial stressors and the symptomatology of mental disorders vary across cultures. However, current understandings of mental health expressions on social media are predominantly derived from studies in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online…

    Submitted 16 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  7. arXiv:2402.10051  [pdf, other]

    cs.AI cs.CL

    SwissNYF: Tool Grounded LLM Agents for Black Box Setting

    Authors: Somnath Sendhil Kumar, Dhruv Jain, Eshaan Agarwal, Raunak Pandey

    Abstract: While Large Language Models (LLMs) have demonstrated enhanced capabilities in function-calling, these advancements primarily rely on accessing the functions' responses. This methodology is practical for simpler APIs but faces scalability issues with irreversible APIs that significantly impact the system, such as a database deletion API. Similarly, processes requiring extensive time for each API ca…

    Submitted 15 February, 2024; originally announced February 2024.

  8. arXiv:2401.11095  [pdf, other]

    cs.HC cs.SD eess.AS

    SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness

    Authors: Ruei-Che Chang, Chia-Sheng Hung, Bing-Yu Chen, Dhruv Jain, Anhong Guo

    Abstract: Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum…

    Submitted 26 May, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: DIS 2024

  9. arXiv:2312.13752  [pdf]

    eess.IV cs.AI cs.CV

    Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

    Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Weiping Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, Pingyu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis, et al. (16 additional authors not shown)

    Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current public-available datasets concentrate on lung diseases with moderate morphological variations. The intric…

    Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 19 pages

  10. arXiv:2312.01990  [pdf, other]

    cs.RO cs.AI

    SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention

    Authors: Isabel Leal, Krzysztof Choromanski, Deepali Jain, Avinava Dubey, Jake Varley, Michael Ryoo, Yao Lu, Frederick Liu, Vikas Sindhwani, Quan Vuong, Tamas Sarlos, Ken Oslund, Karol Hausman, Kanishka Rao

    Abstract: We present Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT): a new paradigm for addressing the emerging challenge of scaling up Robotics Transformers (RT) for on-robot deployment. SARA-RT relies on the new method of fine-tuning proposed by us, called up-training. It converts pre-trained or already fine-tuned Transformer-based robotic policies of quadratic time complexity (includi…

    Submitted 4 December, 2023; originally announced December 2023.

  11. arXiv:2311.10781  [pdf, other]

    cs.CL cs.AI

    Can Language Model Moderators Improve the Health of Online Discourse?

    Authors: Hyundong Cho, Shuai Liu, Taiwei Shi, Darpan Jain, Basem Rizk, Yuyang Huang, Zixun Lu, Nuan Wen, Jonathan Gratch, Emilio Ferrara, Jonathan May

    Abstract: Conversational moderation of online communities is crucial to maintaining civility for a constructive environment, but it is challenging to scale and harmful to moderators. The inclusion of sophisticated natural language generation modules as a force multiplier to aid human moderators is a tantalizing prospect, but adequate evaluation approaches have so far been elusive. In this paper, we establis…

    Submitted 6 May, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: 9 pages, NAACL 2024 Main

  12. Robotic Table Tennis: A Case Study into a High Speed Learning System

    Authors: David B. D'Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, et al. (10 additional authors not shown)

    Abstract: We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real w…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Published and presented at Robotics: Science and Systems (RSS2023)

  13. arXiv:2306.08205  [pdf, other]

    cs.RO

    Agile Catching with Whole-Body MPC and Blackbox Policy Learning

    Authors: Saminda Abeyruwan, Alex Bewley, Nicholas M. Boffi, Krzysztof Choromanski, David D'Ambrosio, Deepali Jain, Pannag Sanketi, Anish Shankar, Vikas Sindhwani, Sumeet Singh, Jean-Jacques Slotine, Stephen Tu

    Abstract: We address a benchmark task in agile robotics: catching objects thrown at high-speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) M…

    Submitted 19 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: L4DC 2023

  14. arXiv:2305.14654  [pdf, other]

    cs.RO cs.AI

    Barkour: Benchmarking Animal-level Agility with Quadruped Robots

    Authors: Ken Caluwaerts, Atil Iscen, J. Chase Kew, Wenhao Yu, Tingnan Zhang, Daniel Freeman, Kuang-Huei Lee, Lisa Lee, Stefano Saliceti, Vincent Zhuang, Nathan Batchelor, Steven Bohez, Federico Casarini, Jose Enrique Chen, Omar Cortes, Erwin Coumans, Adil Dostmohamed, Gabriel Dulac-Arnold, Alejandro Escontrela, Erik Frey, Roland Hafner, Deepali Jain, Bauyrjan Jyenis, Yuheng Kuang, Edward Lee, et al. (19 additional authors not shown)

    Abstract: Animals have evolved various agile locomotion strategies, such as sprinting, leaping, and jumping. There is a growing interest in developing legged robots that move like their biological counterparts and show various agile skills to navigate complex environments quickly. Despite the interest, the field lacks systematic benchmarks to measure the performance of control policies and hardware in agili…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 17 pages, 19 figures

  15. arXiv:2302.01128  [pdf, other]

    cs.LG cs.AI

    Mnemosyne: Learning to Train Transformers with Transformers

    Authors: Deepali Jain, Krzysztof Marcin Choromanski, Avinava Dubey, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan

    Abstract: In this work, we propose a new class of learnable optimizers, called Mnemosyne. It is based on the novel spatio-temporal low-rank implicit attention Transformers that can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) outperforms popular LSTM optimizers (also with new feature enginee…

    Submitted 16 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  16. arXiv:2302.00942  [pdf, other]

    cs.LG

    Efficient Graph Field Integrators Meet Point Clouds

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller

    Abstract: We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Metho…

    Submitted 4 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Journal ref: ICML 2023

  17. arXiv:2301.13473  [pdf, other]

    cs.CV cs.AI cs.RO

    CRC-RL: A Novel Visual Feature Representation Architecture for Unsupervised Reinforcement Learning

    Authors: Darshita Jain, Anima Majumder, Samrat Dutta, Swagat Kumar

    Abstract: This paper addresses the problem of visual feature representation learning with an aim to improve the performance of end-to-end reinforcement learning (RL) models. Specifically, a novel architecture is proposed that uses a heterogeneous loss function, called CRC loss, to learn improved visual features which can then be used for policy learning in RL. The CRC-loss function is a combination of three…

    Submitted 28 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  18. arXiv:2210.13769  [pdf, other]

    cs.CV cs.LG

    GlobalFlowNet: Video Stabilization using Deep Distilled Global Motion Estimates

    Authors: Jerin Geo James, Devansh Jain, Ajit Rajwade

    Abstract: Videos shot by laymen using hand-held cameras contain undesirable shaky motion. Estimating the global motion between successive frames, in a manner not influenced by moving objects, is central to many video stabilization techniques, but poses significant challenges. A large body of work uses 2D affine transformations or homography for the global motion. However, in this work, we introduce a more g…

    Submitted 4 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted in WACV 2023

  19. arXiv:2208.01191  [pdf, other]

    cs.LG cs.AI cs.NE

    Implicit Two-Tower Policies

    Authors: Yunfan Zhao, Qingkai Pan, Krzysztof Choromanski, Deepali Jain, Vikas Sindhwani

    Abstract: We present a new class of structured reinforcement learning policy-architectures, Implicit Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of their learnable latent representations with those of the input states. By explicitly disentangling action from state processing in the policy stack, we achieve two main goals: substantial computational gains and better pe…

    Submitted 25 October, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

  20. arXiv:2207.06572  [pdf, other]

    cs.RO

    i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

    Authors: Saminda Abeyruwan, Laura Graesser, David B. D'Ambrosio, Avi Singh, Anish Shankar, Alex Bewley, Deepali Jain, Krzysztof Choromanski, Pannag R. Sanketi

    Abstract: Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. The ability to train policies in simulation enables safe exploration and large-scale data collection quickly at low cost. However, prior works in sim-to-real transfer of robotic policies typically do not involve any human-robot interaction because accurately simulating human behavior is an open problem. In this work, o…

    Submitted 21 November, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: 8+24 pages

  21. arXiv:2204.04545  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Labeling Refinement for Robust Representation Learning with Bootstrap Your Own Latent

    Authors: Siddhant Garg, Dhruval Jain

    Abstract: In this work, we have worked towards two major goals. Firstly, we have investigated the importance of Batch Normalisation (BN) layers in a non-contrastive representation learning framework called Bootstrap Your Own Latent (BYOL). We conducted several experiments to conclude that BN layers are not necessary for representation learning in BYOL. Moreover, BYOL only learns from the positive pairs of i…

    Submitted 9 April, 2022; originally announced April 2022.

  22. arXiv:2202.11134  [pdf]

    cs.HC cs.LG cs.SD eess.AS

    ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

    Authors: Dhruv Jain, Khoa Huynh Anh Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, Jon E. Froehlich

    Abstract: Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fi…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: Published at the ACM CHI Conference on Human Factors in Computing Systems (CHI) 2022

  23. arXiv:2202.07750  [pdf, other]

    eess.AS cs.CL cs.SD

    Nonverbal Sound Detection for Disordered Speech

    Authors: Colin Lea, Zifang Huang, Dhruv Jain, Lauren Tooley, Zeinab Liaghat, Shrinath Thelapurath, Leah Findlater, Jeffrey P. Bigham

    Abstract: Voice assistants have become an essential tool for people with various disabilities because they enable complex phone- or tablet-based interactions without the need for fine-grained motor control, such as with touchscreens. However, these systems are not tuned for the unique characteristics of individuals with speech disorders, including many of those who have a motor-speech disorder, are deaf or…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: Accepted at ICASSP 2022

  24. arXiv:2202.04876  [pdf, other]

    cs.CL cs.AI

    Distilling Hypernymy Relations from Language Models: On the Effectiveness of Zero-Shot Taxonomy Induction

    Authors: Devansh Jain, Luis Espinosa Anke

    Abstract: In this paper, we analyze zero-shot taxonomy learning methods which are based on distilling knowledge from language models via prompting and sentence scoring. We show that, despite their simplicity, these methods outperform some supervised strategies and are competitive with the current state-of-the-art under adequate conditions. We also show that statistical and linguistic properties of prompts d…

    Submitted 10 February, 2022; originally announced February 2022.

  25. arXiv:2110.04367  [pdf, other]

    cs.LG stat.ML

    Hybrid Random Features

    Authors: Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

    Abstract: We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric (Rahimi and Recht, 2007) or (recently introduced in the…

    Submitted 30 January, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  26. arXiv:2107.12719  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    The CORSMAL benchmark for the prediction of the properties of containers

    Authors: Alessio Xompero, Santiago Donaher, Vladimir Iashin, Francesca Palermo, Gökhan Solak, Claudio Coppola, Reina Ishikawa, Yuichi Nagao, Ryo Hachiuma, Qi Liu, Fan Feng, Chuanlin Lan, Rosa H. M. Chan, Guilherme Christmann, Jyun-Ting Song, Gonuguntla Neeharika, Chinnakotla Krishna Teja Reddy, Dinesh Jain, Bakhtawar Ur Rehman, Andrea Cavallaro

    Abstract: The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmar…

    Submitted 21 April, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

    Comments: Authors' post-print accepted for publication in IEEE Access, see https://doi.org/10.1109/ACCESS.2022.3166906 . 14 pages, 6 tables, 7 figures

    Journal ref: IEEE Access, vol. 10, 2022, 1-15

  27. arXiv:2105.07730  [pdf]

    cs.SI cs.LG

    The State of Infodemic on Twitter

    Authors: Drishti Jain, Tavpritesh Sethi

    Abstract: Following the wave of misinterpreted, manipulated and malicious information growing on the Internet, the misinformation surrounding COVID-19 has become a paramount issue. In the context of the current COVID-19 pandemic, social media posts and platforms are at risk of rumors and misinformation in the face of the serious uncertainty surrounding the virus itself. At the same time, the uncertainty and…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: 8 pages

  28. A bibliometric analysis of citation diversity in accessibility and HCI research

    Authors: Lucy Lu Wang, Kelly Mack, Emma McDonnell, Dhruv Jain, Leah Findlater, Jon E. Froehlich

    Abstract: Accessibility research sits at the junction of several disciplines, drawing influence from HCI, disability studies, psychology, education, and more. To characterize the influences and extensions of accessibility research, we undertake a study of citation trends for accessibility and related HCI communities. We assess the diversity of venues and fields of study represented among the referenced and…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: 11 pages, 5 figures, 3 tables, 2 appendices; CHI LBW 2021; accessible PDF available at https://makeabilitylab.cs.washington.edu/

  29. arXiv:2102.04353  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Unlocking Pixels for Reinforcement Learning via Implicit Attention

    Authors: Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller

    Abstract: There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to solve both of these problems is an attention bottleneck, which provides a simple and effective framework for learning h…

    Submitted 1 October, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

  30. arXiv:2101.07415  [pdf, other]

    cs.LG cs.NE cs.RO

    ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

    Authors: Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

    Abstract: In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and e…

    Submitted 15 March, 2023; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: Previously published at ICLR 2020 NAS Workshop. See https://github.com/google-research/google-research/tree/master/es_enas for associated code

  31. What Do We Mean by "Accessibility Research"? A Literature Survey of Accessibility Papers in CHI and ASSETS from 1994 to 2019

    Authors: Kelly Mack, Emma McDonnell, Dhruv Jain, Lucy Lu Wang, Jon E. Froehlich, Leah Findlater

    Abstract: Accessibility research has grown substantially in the past few decades, yet there has been no literature review of the field. To understand current and historical trends, we created and analyzed a dataset of accessibility papers appearing at CHI and ASSETS since ASSETS' founding in 1994. We qualitatively coded areas of focus and methodological decisions for the past 10 years (2010-2019, N=506 pape…

    Submitted 3 February, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

  32. arXiv:2012.14464  [pdf, other]

    cs.RO cs.AI

    Disentangled Planning and Control in Vision Based Robotics via Reward Machines

    Authors: Alberto Camacho, Jacob Varley, Deepali Jain, Atil Iscen, Dmitry Kalashnikov

    Abstract: In this work we augment a Deep Q-Learning agent with a Reward Machine (DQRM) to increase speed of learning vision-based policies for robot tasks, and overcome some of the limitations of DQN that prevent it from converging to good-quality policies. A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function to gui…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: Accepted to the Deep Reinforcement Learning Workshop at Neural Information Processing Systems (2020)

  33. Codeswitched Sentence Creation using Dependency Parsing

    Authors: Dhruval Jain, Arun D Prabhu, Shubham Vatsal, Gopi Ramena, Naresh Purre

    Abstract: Codeswitching has become one of the most common occurrences across multilingual speakers of the world, especially in countries like India which encompasses around 23 official languages with the number of bilingual speakers being around 300 million. The scarcity of Codeswitched data becomes a bottleneck in the exploration of this domain with respect to various Natural Language Processing (NLP) task…

    Submitted 5 December, 2020; originally announced December 2020.

  34. arXiv:2011.11722  [pdf, other]

    cs.RO cs.CV cs.LG

    From Pixels to Legs: Hierarchical Learning of Quadruped Locomotion

    Authors: Deepali Jain, Atil Iscen, Ken Caluwaerts

    Abstract: Legged robots navigating crowded scenes and complex terrains in the real world are required to execute dynamic leg movements while processing visual input for obstacle avoidance and path planning. We show that a quadruped robot can acquire both of these skills by means of hierarchical reinforcement learning (HRL). By virtue of their hierarchical structure, our policies learn to implicitly break do…

    Submitted 23 November, 2020; originally announced November 2020.

    Journal ref: 4th Conference on Robot Learning (CoRL 2020), Cambridge MA, USA

  35. On-Device Text Image Super Resolution

    Authors: Dhruval Jain, Arun D Prabhu, Gopi Ramena, Manoj Goyal, Debi Prasanna Mohanty, Sukumar Moharana, Naresh Purre

    Abstract: Recent research on super-resolution (SR) has witnessed major developments with the advancements of deep convolutional neural networks. There is a need for information extraction from scenic text images or even document images on device, most of which are low-resolution (LR) images. Therefore, SR becomes an essential pre-processing step as Bicubic Upsampling, which is conventionally present in smar…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Accepted to the International Conference on Pattern Recognition (ICPR), 2020

  36. arXiv:2011.05541  [pdf, other]

    cs.RO

    Learning Agile Locomotion Skills with a Mentor

    Authors: Atil Iscen, George Yu, Alejandro Escontrela, Deepali Jain, Jie Tan, Ken Caluwaerts

    Abstract: Developing agile behaviors for legged robots remains a challenging problem. While deep reinforcement learning is a promising approach, learning truly agile behaviors typically requires tedious reward shaping and careful curriculum design. We formulate agile locomotion as a multi-stage learning problem in which a mentor guides the agent throughout the training. The mentor is optimized to place a ch…

    Submitted 10 November, 2020; originally announced November 2020.

  37. On-Device Language Identification of Text in Images using Diacritic Characters

    Authors: Shubham Vatsal, Nikhil Arora, Gopi Ramena, Sukumar Moharana, Dhruval Jain, Naresh Purre, Rachit S Munjal

    Abstract: Diacritic characters can be considered as a unique set of characters providing us with adequate and significant clue in identifying a given language with considerably high accuracy. Diacritics, though associated with phonetics often serve as a distinguishing feature for many languages especially the ones with a Latin script. In this proposed work, we aim to identify language of text in images usin…

    Submitted 10 November, 2020; originally announced November 2020.

  38. Surveys without Questions: A Reinforcement Learning Approach

    Authors: Atanu R Sinha, Deepali Jain, Nikhil Sheoran, Sopan Khosla, Reshmi Sasidharan

    Abstract: The 'old world' instrument, survey, remains a tool of choice for firms to obtain ratings of satisfaction and experience that customers realize while interacting online with firms. While avenues for survey have evolved from emails and links to pop-ups while browsing, the deficiencies persist. These include - reliance on ratings of very few respondents to infer about all customers' online interactio…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, July 2019, pp. 257-64

  39. On-device Filtering of Social Media Images for Efficient Storage

    Authors: Dhruval Jain, DP Mohanty, Sanjeev Roy, Naresh Purre, Sukumar Moharana

    Abstract: Artificially crafted images such as memes, seasonal greetings, etc are flooding the social media platforms today. These eventually start occupying a lot of internal memory of smartphones and it gets cumbersome for the user to go through hundreds of images and delete these synthetic images. To address this, we propose a novel method based on Convolutional Neural Networks (CNNs) for the on-device fi…

    Submitted 14 May, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  40. arXiv:1907.06511  [pdf, other]

    cs.NE cs.AI cs.LG cs.RO

    Reinforcement Learning with Chromatic Networks for Compact Architecture Search

    Authors: Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

    Abstract: We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way. By defining the combinatorial search space of NAS to be the set of different edge-partitionings (colorings) into same-weight classes, we represent compact architectures via efficient learned edge-partitionings. For several RL…

    Submitted 6 April, 2021; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Published at ICLR 2020 Neural Architecture Search Workshop. This paper is deprecated; please see arXiv:2101.07415 for the newer version

  41. arXiv:1905.08926  [pdf, other]

    cs.LG cs.AI cs.RO

    Hierarchical Reinforcement Learning for Quadruped Locomotion

    Authors: Deepali Jain, Atil Iscen, Ken Caluwaerts

    Abstract: Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To solve these problems, we introduce a hierarchical framework to automatically decompose complex locomotion tasks. A high-level policy issues commands in a latent space and also selects for how long the low-level policy will execute the latent command. Concurren…

    Submitted 21 May, 2019; originally announced May 2019.

  42. arXiv:1905.00060  [pdf, other]

    cs.CV

    Predicting How to Distribute Work Between Algorithms and Humans to Segment an Image Batch

    Authors: Danna Gurari, Yinan Zhao, Suyog Dutt Jain, Margrit Betke, Kristen Grauman

    Abstract: Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images…

    Submitted 30 April, 2019; originally announced May 2019.

  43. arXiv:1903.02993  [pdf, other]

    cs.LG stat.ML

    Provably Robust Blackbox Optimization for Reinforcement Learning

    Authors: Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

    Abstract: Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in Robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rew…

    Submitted 8 July, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  44. arXiv:1808.04702  [pdf, other]

    cs.CV

    Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos

    Authors: Bo Xiong, Suyog Dutt Jain, Kristen Grauman

    Abstract: We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all "object-like" regions---even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep ful…

    Submitted 17 December, 2018; v1 submitted 11 August, 2018; originally announced August 2018.

    Comments: To appear in PAMI. arXiv admin note: text overlap with arXiv:1701.05349, arXiv:1701.05384

  45. arXiv:1805.02173  [pdf, other]

    cs.CV

    An Interval Type-2 Fuzzy Approach to Automatic PDF Generation for Histogram Specification

    Authors: Vishal Agarwal, Diwanshu Jain, A. Vamshi Krishna Reddy, Frank Chung-Hoon Rhee

    Abstract: Image enhancement plays an important role in several applications in the field of computer vision and image processing. Histogram specification (HS) is one of the most widely used techniques for contrast enhancement of an image, which requires an appropriate probability density function for the transformation. In this paper, we propose a fuzzy method to find a suitable PDF automatically for histogr…

    Submitted 6 May, 2018; originally announced May 2018.

  46. arXiv:1803.05070  [pdf, other]

    cs.CV cs.LG stat.ML

    A Multi-Modal Approach to Infer Image Affect

    Authors: Ashok Sundaresan, Sugumar Murugesan, Sean Davis, Karthik Kappaganthu, ZhongYi **, Divya Jain, Anurag Maunder

    Abstract: The group affect or emotion in an image of people can be inferred by extracting features about both the people in the picture and the overall makeup of the scene. The state-of-the-art on this problem investigates a combination of facial features, scene extraction and even audio tonality. This paper combines three additional modalities, namely, human pose, text-based tagging and CNN extracted featu…

    Submitted 13 March, 2018; originally announced March 2018.

  47. arXiv:1705.00366  [pdf, other]

    cs.CV

    Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

    Authors: Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman

    Abstract: We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish between images which lead multiple annotators to segment different foreground objects (ambiguous) versus minor inter-annotator differences of the same object. Taking images from eight wid…

    Submitted 30 April, 2017; originally announced May 2017.

  48. arXiv:1701.05384  [pdf, other]

    cs.CV

    FusionSeg: Learning to combine motion and appearance for fully automatic segmention of generic objects in videos

    Authors: Suyog Dutt Jain, Bo Xiong, Kristen Grauman

    Abstract: We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel-level segmentation masks for all prominent objects in videos. We formulate this task as a structured prediction problem and design a two-stream fully convolutional neural network which fuses together motion and appearance in a unified…

    Submitted 12 April, 2017; v1 submitted 19 January, 2017; originally announced January 2017.

    Comments: CVPR 2017

  49. arXiv:1701.05349  [pdf, other]

    cs.CV

    Pixel Objectness

    Authors: Suyog Dutt Jain, Bo Xiong, Kristen Grauman

    Abstract: We propose an end-to-end learning framework for generating foreground object segmentations. Given a single novel image, our approach produces pixel-level masks for all "object-like" regions---even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning foreground/background labels to all pixels, implemented using a deep fully convolut…

    Submitted 12 April, 2017; v1 submitted 19 January, 2017; originally announced January 2017.

  50. arXiv:1607.01115  [pdf, other]

    cs.CV cs.AI cs.HC

    Click Carving: Segmenting Objects in Video with Point Clicks

    Authors: Suyog Dutt Jain, Kristen Grauman

    Abstract: We present a novel form of interactive video object segmentation where a few clicks by the user help the system produce a full spatio-temporal segmentation of the object of interest. Whereas conventional interactive pipelines take the user's initialization as a starting point, we show the value in the system taking the lead even in initialization. In particular, for a given video frame, the syste…

    Submitted 5 July, 2016; originally announced July 2016.

    Comments: A preliminary version of the material in this document was filed as University of Texas technical report no. UT AI16-01

    Report number: University of Texas Technical Report UT AI16-01