-
A neuro-symbolic approach for multimodal reference expression comprehension
Authors:
Aman Jain,
Anirudh Reddy Kondapally,
Kentaro Yamada,
Hitomi Yanaka
Abstract:
Human-Machine Interaction (HMI) systems have gained huge interest in recent years, with reference expression comprehension being one of the main challenges. Traditionally, human-machine interaction has been mostly limited to speech and visual modalities. However, to allow for more freedom in interaction, recent works have proposed the integration of additional modalities, such as gestures, in HMI systems. We consider such an HMI system with pointing gestures and construct a table-top object picking scenario inside a simulated virtual reality (VR) environment to collect data. Previous works for such a task have used deep neural networks to classify the referred object, an approach that lacks transparency. In this work, we propose an interpretable and compositional model, crucial to building robust HMI systems for real-world applications, based on a neuro-symbolic approach to tackle this task. Finally, we show the generalizability of our model on unseen environments and report the results.
Submitted 19 June, 2023;
originally announced June 2023.
-
Intention estimation from gaze and motion features for human-robot shared-control object manipulation
Authors:
Anna Belardinelli,
Anirudh Reddy Kondapally,
Dirk Ruiken,
Daniel Tanneberg,
Tomoki Watabe
Abstract:
Shared control can help in teleoperated object manipulation by assisting with the execution of the user's intention. To this end, robust and prompt intention estimation is needed, which relies on behavioral observations. Here, an intention estimation framework is presented, which uses natural gaze and motion features to predict the current action and the target object. The system is trained and tested in a simulated environment with pick-and-place sequences produced in a relatively cluttered scene and with both hands, with possible hand-over to the other hand. Validation is conducted across different users and hands, achieving good accuracy and earliness of prediction. An analysis of the predictive power of single features shows the predominance of the grasping trigger and the gaze features in the early identification of the current action. In the current framework, the same probabilistic model can be used for the two hands working in parallel and independently, while a rule-based model is proposed to identify the resulting bimanual action. Finally, limitations and perspectives of this approach to more complex, full-bimanual manipulations are discussed.
Submitted 18 August, 2022;
originally announced August 2022.
-
Automatic Construction of Enterprise Knowledge Base
Authors:
Junyi Chai,
Yujie He,
Homa Hashemi,
Bing Li,
Daraksha Parveen,
Ranganath Kondapally,
Wen** Xu
Abstract:
In this paper, we present an automatic knowledge base construction system for large-scale enterprise documents that requires minimal human intervention. In the design and deployment of such a knowledge mining system for enterprise, we faced several challenges including data distributional shift, performance evaluation, compliance requirements and other practical issues. We leveraged state-of-the-art deep learning models to extract information (named entities and definitions) at the per-document level, then further applied classical machine learning techniques to process global statistical information to improve the knowledge base. Experimental results are reported on actual enterprise documents. This system is currently serving as part of a Microsoft 365 service.
Submitted 29 June, 2021;
originally announced June 2021.
-
Impersonation: Modeling Persona in Smart Responses to Email
Authors:
Rajeev Gupta,
Ranganath Kondapally,
Chakrapani Ravi Kiran
Abstract:
In this paper, we present the design, implementation, and effectiveness of generating personalized suggestions for email replies. To personalize email responses based on the user's style and personality, we model the user's persona based on her past responses to emails. This model is added to the language-based model created across users using past responses from all users' emails.
A user's model captures the typical responses of the user given a particular context. The context includes the email received, the recipient of the email, and other external signals such as calendar activities, preferences, etc. The context, along with the user's personality (e.g., extrovert, formal, reserved, etc.), is used to suggest responses. These responses can be a mixture of multiple modes: email replies (textual), audio clips, etc. This helps in making responses mimic the user as much as possible and helps the user to be more productive while retaining her mark in the responses.
Submitted 12 June, 2018;
originally announced June 2018.
-
Information Complexity versus Corruption and Applications to Orthogonality and Gap-Hamming
Authors:
Amit Chakrabarti,
Ranganath Kondapally,
Zhenghui Wang
Abstract:
Three decades of research in communication complexity have led to the invention of a number of techniques to lower bound randomized communication complexity. The majority of these techniques involve properties of large submatrices (rectangles) of the truth-table matrix defining a communication problem. The only technique that does not quite fit is information complexity, which has been investigated over the last decade. Here, we connect information complexity to one of the most powerful "rectangular" techniques: the recently-introduced smooth corruption (or "smooth rectangle") bound. We show that the former subsumes the latter under rectangular input distributions. We conjecture that this subsumption holds more generally, under arbitrary distributions, which would resolve the long-standing direct sum question for randomized communication. As an application, we obtain an optimal $Ω(n)$ lower bound on the information complexity---under the {\em uniform distribution}---of the so-called orthogonality problem (ORT), which is in turn closely related to the much-studied Gap-Hamming-Distance (GHD). The proof of this bound is along the lines of recent communication lower bounds for GHD, but we encounter a surprising amount of additional technical detail.
Submitted 4 May, 2012;
originally announced May 2012.
-
Information Cost Tradeoffs for Augmented Index and Streaming Language Recognition
Authors:
Amit Chakrabarti,
Graham Cormode,
Ranganath Kondapally,
Andrew McGregor
Abstract:
This paper makes three main contributions to the theory of communication complexity and stream computation. First, we present new bounds on the information complexity of AUGMENTED-INDEX. In contrast to analogous results for INDEX by Jain, Radhakrishnan and Sen [J. ACM, 2009], we have to overcome the significant technical challenge that protocols for AUGMENTED-INDEX may violate the "rectangle property" due to the inherent input sharing. Second, we use these bounds to resolve an open problem of Magniez, Mathieu and Nayak [STOC, 2010] that asked about the multi-pass complexity of recognizing Dyck languages. This results in a natural separation between the standard multi-pass model and the multi-pass model that permits reverse passes. Third, we present the first passive memory checkers that verify the interaction transcripts of priority queues, stacks, and double-ended queues. We obtain tight upper and lower bounds for these problems, thereby addressing an important sub-class of the memory checking framework of Blum et al. [Algorithmica, 1994].
Submitted 19 April, 2010;
originally announced April 2010.