
doi 10.1109/TASLP.2024.3353574

TrICy: Trigger-guided Data-to-text Generation with Intent aware Attention-Copy

Authors: Vibhav Agarwal, Sourav Ghosh, Harichandana BSS, Himanshu Arora, Barath Raj Kandur Raja

Abstract: Data-to-text (D2T) generation is a crucial task in many natural language understanding (NLU) applications and forms the foundation of task-oriented dialog systems. In the context of conversational AI solutions that can work directly with local data on the user's device, architectures utilizing large pre-trained language models (PLMs) are impractical for on-device deployment due to their high memory footprint. To this end, we propose TrICy, a novel lightweight framework for an enhanced D2T task that generates text sequences based on the intent in context and may further be guided by user-provided triggers. We leverage an attention-copy mechanism to predict out-of-vocabulary (OOV) words accurately. Performance analyses on the E2E NLG dataset (BLEU: 66.43%, ROUGE-L: 70.14%), the WebNLG dataset (BLEU: Seen 64.08%, Unseen 52.35%), and our Custom dataset related to text messaging applications showcase our architecture's effectiveness. Moreover, we show that by leveraging an optional trigger input, data-to-text generation quality increases significantly and achieves a new SOTA score of 69.29% BLEU for E2E NLG. Furthermore, our analyses show that TrICy achieves at least 24% and 3% improvement in BLEU and METEOR, respectively, over LLMs like GPT-3, ChatGPT, and Llama 2. We also demonstrate that in some scenarios, performance improvement due to triggers is observed even when they are absent in training.

Submitted 25 January, 2024; originally announced February 2024.

Comments: Published in the IEEE/ACM Transactions on Audio, Speech, and Language Processing. (Sourav Ghosh and Vibhav Agarwal contributed equally to this work.)

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1173-1184, 2024
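The attention-copy mechanism named in the TrICy abstract is what lets a decoder emit OOV tokens such as restaurant names. As a minimal sketch, assuming the widely used pointer-generator mixture P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention weights a_i over source positions i whose token equals w), the code below combines a vocabulary distribution with a copy distribution. TrICy's exact formulation (including how intent and trigger inputs enter) is not given in the abstract; the function name `copy_distribution` and all values are hypothetical.

```python
from collections import defaultdict


def copy_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix a generation distribution with a copy distribution (pointer-generator style).

    p_gen: probability of generating from the fixed vocabulary (0..1).
    vocab_probs: dict token -> probability over the fixed vocabulary.
    attention: attention weights over source positions (sum to 1).
    source_tokens: input tokens aligned with `attention`.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must align")
    mixed = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_probs.items():
        mixed[token] += p_gen * prob
    # Copy path: route attention mass to the source tokens themselves,
    # so OOV tokens (absent from vocab_probs) still receive probability.
    for weight, token in zip(attention, source_tokens):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Hypothetical toy input: "Zizzi" is not in the fixed vocabulary.
vocab = {"the": 0.5, "restaurant": 0.3, "serves": 0.2}
source = ["name", "Zizzi", "food", "Italian"]
attn = [0.1, 0.7, 0.1, 0.1]
dist = copy_distribution(0.4, vocab, attn, source)
print(max(dist, key=dist.get))  # the copied OOV name has the highest probability
```

Because both paths are proper distributions, the mixture still sums to 1, and the OOV name wins here only because attention concentrates on it.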

arXiv:2312.00766

Automated Material Properties Extraction For Enhanced Beauty Product Discovery and Makeup Virtual Try-on

Authors: Fatemeh Taheri Dezaki, Himanshu Arora, Rahul Suresh, Amin Banitalebi-Dehkordi

Abstract: The multitude of makeup products available can make it challenging to find the ideal match for desired attributes. An intelligent approach to product discovery is required to make the makeup shopping experience more convenient and satisfying. However, enabling accurate and efficient product discovery requires extracting detailed attributes like color and finish type. Our work introduces an automated pipeline that utilizes multiple customized machine learning models to extract essential material attributes from makeup product images. Our pipeline is versatile and capable of handling various makeup products. To showcase the efficacy of our pipeline, we conduct extensive experiments on eyeshadow products (both single and multi-shade ones), a challenging makeup product known for its diverse range of shapes, colors, and finish types. Furthermore, we demonstrate the applicability of our approach by successfully extending it to other makeup categories like lipstick and foundation, showcasing its adaptability and effectiveness across different beauty products. Additionally, we conduct ablation experiments to demonstrate the superiority of our machine learning pipeline over human labeling methods in terms of reliability. Our proposed method showcases its effectiveness in cross-category product discovery, specifically in recommending makeup products that perfectly match a specified outfit.
Lastly, we also demonstrate the application of these material attributes in enabling virtual try-on experiences, which make the makeup shopping experience significantly more engaging.

Submitted 1 December, 2023; originally announced December 2023.

Comments: Presented at the Fifth Workshop on Recommender Systems in Fashion (fashionxrecsys) of the ACM Conference on Recommender Systems

arXiv:2209.02834

Unsupervised Scene Sketch to Photo Synthesis

Authors: Jiayun Wang, Sangryul Jeon, Stella X. Yu, Xi Zhang, Himanshu Arora, Yu Lou

Abstract: Sketches are an intuitive and powerful form of visual expression because they are quickly executed freehand drawings. We present a method for synthesizing realistic photos from scene sketches. Without the need for sketch and photo pairs, our framework directly learns from readily available large-scale photo datasets in an unsupervised manner. To this end, we introduce a standardization module that provides pseudo sketch-photo pairs during training by converting photos and sketches to a standardized domain, i.e., the edge map. The reduced domain gap between sketch and photo also allows us to disentangle them into two components: holistic scene structures and low-level visual styles such as color and texture. Taking advantage of this, we synthesize a photo-realistic image by combining the structure of a sketch and the visual style of a reference photo. Extensive experimental results on perceptual similarity metrics and human perceptual studies show that the proposed method can generate realistic photos with high fidelity from scene sketches and outperforms state-of-the-art photo synthesis baselines. We also demonstrate that our framework facilitates controllable manipulation of photo synthesis by editing strokes of corresponding sketches, delivering more fine-grained details than previous approaches that rely on region-level editing.

Submitted 6 September, 2022; originally announced September 2022.

Journal ref: ECCVW 2022

arXiv:2204.04867

Structured Graph Variational Autoencoders for Indoor Furniture layout Generation

Authors: Aditya Chattopadhyay, Xi Zhang, David Paul Wipf, Himanshu Arora, Rene Vidal

Abstract: We present a structured graph variational autoencoder for generating the layout of indoor 3D scenes. Given the room type (e.g., living room or library) and the room layout (e.g., room elements such as floor and walls), our architecture generates a collection of objects (e.g., furniture items such as sofa, table and chairs) that is consistent with the room type and layout. This is a challenging problem because the generated scene should satisfy multiple constraints, e.g., each object must lie inside the room and two objects cannot occupy the same volume. To address these challenges, we propose a deep generative model that encodes these relationships as soft constraints on an attributed graph (e.g., the nodes capture attributes of room and furniture elements, such as class, pose and size, and the edges capture geometric relationships such as relative orientation). The architecture consists of a graph encoder that maps the input graph to a structured latent space, and a graph decoder that generates a furniture graph, given a latent code and the room graph. The latent space is modeled with auto-regressive priors, which facilitate the generation of highly structured scenes. We also propose an efficient training procedure that combines matching and constrained learning. Experiments on the 3D-FRONT dataset show that our method produces scenes that are diverse and are adapted to the room layout.

Submitted 22 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.
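The two example constraints in the abstract above (objects stay inside the room; objects do not share volume) can be turned into a soft penalty that is zero exactly when both hold. The sketch below is an illustration under a simplifying assumption: each object is an axis-aligned rectangle in the floor plane (the hypothetical `Box` type), ignoring height and orientation. It is not the paper's graph-based formulation, which encodes such relationships on an attributed graph.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned floor footprint: (x0, y0) lower-left, (x1, y1) upper-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_area(a: Box, b: Box) -> float:
    # Width and height of the intersection rectangle, clamped at zero.
    w = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    h = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    return w * h


def area(b: Box) -> float:
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def layout_penalty(room: Box, objects: list[Box]) -> float:
    """Soft penalty: 0 iff every object lies inside the room and no two overlap."""
    # Containment term: area of each object that falls outside the room.
    outside = sum(area(o) - overlap_area(o, room) for o in objects)
    # Non-overlap term: total pairwise intersection area.
    collisions = sum(overlap_area(a, b) for a, b in combinations(objects, 2))
    return outside + collisions


room = Box(0, 0, 5, 4)
sofa, table, chair = Box(0, 0, 2, 1), Box(1.5, 0.5, 2.5, 1.5), Box(4.5, 3.5, 5.5, 4.5)
print(layout_penalty(room, [sofa, table, chair]))  # table overlaps sofa, chair pokes out of the room
```

Because the penalty grows smoothly with the amount of violation rather than being a hard yes/no check, it can serve as a differentiable-style training signal, which is the sense in which such constraints are "soft".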

arXiv:2112.12028

doi 10.1109/INDICON52576.2021.9691564

VoiceMoji: A Novel On-Device Pipeline for Seamless Emoji Insertion in Dictation

Authors: Sumit Kumar, Harichandana B S S, Himanshu Arora

Abstract: Most speech recognition systems recover only the words in speech and fail to capture emotions. Users have to add emoji(s) to the text manually to convey tone and make communication fun. Though much work has been done on adding punctuation to transcribed speech, adding emotion remains unexplored. In this paper, we propose a novel on-device pipeline to enrich the voice input experience. Given a blob of transcribed text, the pipeline intelligently processes it and identifies the structure where emoji insertion makes sense. Moreover, it includes semantic text analysis to predict an emoji for each of the sub-parts, for which we propose a novel architecture, Attention-based Char Aware (ACA) LSTM, which also handles Out-Of-Vocabulary (OOV) words. All these tasks are executed completely on-device and hence can aid on-device dictation systems. To the best of our knowledge, this is the first work that shows how to add emoji(s) to transcribed text. We demonstrate that our components achieve results comparable to previous neural approaches for punctuation addition and emoji prediction with 80% fewer parameters. Overall, our proposed model has a very small memory footprint of only 4 MB, suiting on-device deployment.

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted at IEEE INDICON 2021, 19-21 December, 2021, India

arXiv:2110.15717

doi 10.1109/ICMLA52953.2021.00182

LIDSNet: A Lightweight on-device Intent Detection model using Deep Siamese Network

Authors: Vibhav Agarwal, Sudeep Deepak Shivnikar, Sourav Ghosh, Himanshu Arora, Yashwant Saini

Abstract: Intent detection is a crucial task in any Natural Language Understanding (NLU) system and forms the foundation of a task-oriented dialogue system. To build high-quality real-world conversational solutions for edge devices, the intent detection model needs to be deployed on-device. This necessitates a light-weight, fast, and accurate model that can perform efficiently in a resource-constrained environment. To this end, we propose LIDSNet, a novel lightweight on-device intent detection model, which accurately predicts the message intent by utilizing a Deep Siamese Network for learning better sentence representations. We use character-level features to enrich the sentence-level representations and empirically demonstrate the advantage of transfer learning by utilizing pre-trained embeddings. Furthermore, to investigate the efficacy of the modules in our architecture, we conduct an ablation study and arrive at our optimal model. Experimental results prove that LIDSNet achieves state-of-the-art competitive accuracy of 98.00% and 95.97% on the SNIPS and ATIS public datasets, respectively, with under 0.59M parameters. We further benchmark LIDSNet against fine-tuned BERTs and show that our model is at least 41x lighter and 30x faster during inference than MobileBERT on a Samsung Galaxy S20 device, justifying its efficiency on resource-constrained edge devices.

Submitted 6 October, 2021; originally announced October 2021.

Comments: Accepted for publication in 2021 IEEE 20th International Conference on Machine Learning and Applications (ICMLA)

Journal ref: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 2021, pp. 1112-1117
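The defining property of a Siamese network, as used in the LIDSNet abstract above, is that both inputs pass through the same encoder with shared weights and are then compared in embedding space. The sketch below illustrates that structure only. Assumptions, all hypothetical: the learned encoder is replaced by a parameter-free bag of character trigrams (loosely echoing the character-level features mentioned), similarity is cosine, and intent is predicted from the nearest labelled example. The abstract does not say LIDSNet classifies this way, and the support sentences and intent labels are made up.

```python
import math
from collections import Counter


def encode(sentence: str) -> Counter:
    """Hypothetical stand-in for the shared encoder: a bag of character trigrams.

    In a Siamese network both inputs go through the *same* encoder; here that
    sharing is trivially true because the function has no parameters.
    """
    padded = f"  {sentence.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def predict_intent(query: str, support: list[tuple[str, str]]) -> str:
    """Return the intent label of the labelled example most similar to the query."""
    q = encode(query)
    _, label = max(support, key=lambda ex: cosine(q, encode(ex[0])))
    return label


# Hypothetical labelled examples (sentence, intent).
support = [
    ("play some jazz music", "PlayMusic"),
    ("what is the weather tomorrow", "GetWeather"),
    ("book a table for two", "BookRestaurant"),
]
print(predict_intent("weather forecast for tomorrow", support))
```

In a trained model, `encode` would be a neural network optimised so that same-intent pairs score high and different-intent pairs score low; the comparison logic stays the same.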

arXiv:2110.06199

ABO: Dataset and Benchmarks for Real-World 3D Object Understanding

Authors: Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, Jitendra Malik

Abstract: We introduce Amazon Berkeley Objects (ABO), a new large-scale dataset designed to help bridge the gap between real and virtual 3D worlds. ABO contains product catalog images, metadata, and artist-created 3D models with complex geometries and physically-based materials that correspond to real, household objects. We derive challenging benchmarks that exploit the unique properties of ABO and measure the current limits of the state-of-the-art on three open problems for real-world 3D object understanding: single-view 3D reconstruction, material estimation, and cross-domain multi-view object retrieval.

Submitted 24 June, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

arXiv:2110.00644

RoomStructNet: Learning to Rank Non-Cuboidal Room Layouts From Single View

Authors: Xi Zhang, Chun-Kai Wang, Kenan Deng, Tomas Yago-Vicente, Himanshu Arora

Abstract: In this paper, we present a new approach to estimate the layout of a room from a single image. While recent approaches for this task use robust features learnt from data, they resort to optimization for detecting the final layout. In addition to using learnt robust features, our approach learns an additional ranking function to estimate the final layout instead of using optimization. To learn this ranking function, we propose a framework to train a CNN using a max-margin structured cost. Also, while most approaches aim at detecting cuboidal layouts, our approach detects non-cuboidal layouts, for which we explicitly estimate layout complexity parameters. We use these parameters to propose layout candidates in a novel way. Our approach shows state-of-the-art results on standard datasets with mostly cuboidal layouts and also performs well on a dataset containing rooms with non-cuboidal layouts.

Submitted 1 October, 2021; originally announced October 2021.

Comments: 10 pages
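Learning a ranking function with a max-margin structured cost, as described in the RoomStructNet abstract above, usually means penalising any candidate whose score comes within its task loss of the ground truth's score. The sketch below shows the standard structured hinge loss, max(0, max over candidates y of [Delta(y, y*) + s(y) - s(y*)]), over a fixed list of candidate layouts. Assumptions: scores and per-candidate losses are given as plain numbers (in the paper they would come from a CNN and a layout error measure), and the function names are hypothetical.

```python
def structured_hinge_loss(scores, losses, gt_index):
    """Max-margin structured loss over a list of candidate layouts.

    scores: model scores s(y) for each candidate layout.
    losses: task loss Delta(y, y*) of each candidate against the ground truth
            (e.g. a layout error); it should be 0 for the ground truth itself.
    gt_index: position of the ground-truth layout among the candidates.
    """
    gt_score = scores[gt_index]
    # Loss-augmented inference: find the most violating candidate.
    worst = max(d + s - gt_score for s, d in zip(scores, losses))
    # The ground truth contributes 0, so clamping only guards against bad inputs.
    return max(0.0, worst)


def select_layout(scores):
    """At test time, ranking replaces optimisation: keep the top-scoring candidate."""
    return max(range(len(scores)), key=scores.__getitem__)


# Candidate 1 scores within its margin (0.8) of the ground truth, so it is penalised.
print(structured_hinge_loss([2.0, 1.5, 0.5], [0.0, 0.8, 0.3], gt_index=0))
```

The margin scales with how wrong a candidate is, so a badly wrong layout must be ranked far below the correct one, while a nearly correct layout may score close to it.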

arXiv:2106.16237

Multimodal Shape Completion via IMLE

Authors: Himanshu Arora, Saurabh Mishra, Shichong Peng, Ke Li, Ali Mahdavi-Amiri

Abstract: Shape completion is the problem of completing partial input shapes such as partial scans. This problem finds important applications in computer vision and robotics due to issues such as occlusion or sparsity in real-world data. However, most of the existing research related to shape completion has focused on completing shapes by learning a one-to-one mapping, which limits the diversity and creativity of the produced results. We propose a novel multimodal shape completion technique that is effectively able to learn a one-to-many mapping and generates diverse complete shapes. Our approach is based on the conditional Implicit Maximum Likelihood Estimation (IMLE) technique wherein we condition our inputs on partial 3D point clouds. We extensively evaluate our approach by comparing it to various baselines both quantitatively and qualitatively. We show that our method is superior to alternatives in terms of completeness and diversity of shapes.

Submitted 7 July, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

Comments: Project Website: https://sites.google.com/site/alimahdaviamiri/projects/shape-completion
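The key step of conditional IMLE, the technique the abstract above builds on, is: for a given partial input, draw several latent codes, generate a completion from each, and keep the code whose output lies nearest the ground-truth shape; the generator is then trained to pull that one sample closer. The sketch below shows only this selection step. Assumptions: a one-dimensional latent, a hypothetical `toy_generator` that appends a single point whose height depends on the latent, and the symmetric Chamfer distance (one common variant, using mean squared nearest-neighbour distance) to compare point clouds. The gradient update is omitted.

```python
import random

Point = tuple[float, float, float]


def chamfer(a: list[Point], b: list[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both ways."""
    def one_way(src: list[Point], dst: list[Point]) -> float:
        return sum(min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst) for s in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def imle_select(generator, partial, target, num_samples, rng):
    """Sample latent codes and return the one whose completion is nearest the target.

    Conditional IMLE would next update the generator to shrink this distance
    for the selected code only; that training step is not shown.
    """
    best_z, best_dist = None, float("inf")
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)
        dist = chamfer(generator(partial, z), target)
        if dist < best_dist:
            best_z, best_dist = z, dist
    return best_z, best_dist


def toy_generator(partial, z):
    # Hypothetical generator: keeps the observed points and adds one point at height z.
    return partial + [(0.0, 0.0, z)]


partial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = partial + [(0.0, 0.0, 1.0)]
z, d = imle_select(toy_generator, partial, target, num_samples=200, rng=random.Random(0))
print(round(z, 2), round(d, 4))  # the kept latent places the new point near height 1
```

Because every target only has to be matched by *some* sample, different latents remain free to produce other plausible completions, which is what enables the one-to-many mapping.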

arXiv:2101.05970

Affordance-based Reinforcement Learning for Urban Driving

Authors: Tanmay Agarwal, Hitesh Arora, Jeff Schneider

Abstract: Traditional autonomous vehicle pipelines that follow a modular approach have been very successful in the past, both in academia and industry, which has led to autonomy being deployed on the road. Though this approach provides ease of interpretation, its generalizability to unseen environments is limited and hand-engineering of numerous parameters is required, especially in the prediction and planning systems. Recently, deep reinforcement learning has been shown to learn complex strategic games and perform challenging robotic tasks, which provides an appealing framework for learning to drive. In this work, we propose a deep reinforcement learning framework to learn an optimal control policy using waypoints and low-dimensional visual representations, also known as affordances. We demonstrate that our agents, when trained from scratch, learn the tasks of lane following, driving around intersections, and stopping in front of other actors or traffic lights, even in dense traffic settings. We note that our method achieves comparable or better performance than the baseline methods on the original and NoCrash benchmarks on the CARLA simulator.

Submitted 15 January, 2021; originally announced January 2021.

arXiv:2101.04456

A character representation enhanced on-device Intent Classification

Authors: Sudeep Deepak Shivnikar, Himanshu Arora, Harichandana B S S

Abstract: Intent classification is an important task in natural language understanding systems. Existing approaches have achieved perfect scores on the benchmark datasets. However, they are not suitable for deployment on low-resource devices like mobiles, tablets, etc. due to their massive model size. Therefore, in this paper, we present a novel light-weight architecture for intent classification that can run efficiently on a device. We use character features to enrich the word representation. Our experiments prove that our proposed model outperforms existing approaches and achieves state-of-the-art results on benchmark datasets. We also report that our model has a tiny memory footprint of ~5 MB and a low inference time of ~2 milliseconds, which proves its efficiency in a resource-constrained environment.

Submitted 12 January, 2021; originally announced January 2021.

Comments: Accepted for publication in ICON 2020: 17th International Conference on Natural Language Processing

arXiv:2008.05723

Contextual Diversity for Active Learning

Authors: Sharat Agarwal, Himanshu Arora, Saket Anand, Chetan Arora

Abstract: The requirement of large annotated datasets restricts the use of deep convolutional neural networks (CNNs) for many practical applications. The problem can be mitigated by using active learning (AL) techniques which, under a given annotation budget, allow selecting a subset of data that yields maximum accuracy upon fine-tuning. State-of-the-art AL approaches typically rely on measures of visual diversity or prediction uncertainty, which are unable to effectively capture the variations in spatial context. On the other hand, modern CNN architectures make heavy use of spatial context for achieving highly accurate predictions. Since the context is difficult to evaluate in the absence of ground-truth labels, we introduce the notion of contextual diversity that captures the confusion associated with spatially co-occurring classes. Contextual Diversity (CD) hinges on a crucial observation that the probability vector predicted by a CNN for a region of interest typically contains information from a larger receptive field. Exploiting this observation, we use the proposed CD measure within two AL frameworks: (1) a core-set based strategy and (2) a reinforcement learning based policy, for active frame selection. Our extensive empirical evaluation establishes state-of-the-art results for active learning on benchmark datasets of Semantic Segmentation, Object Detection and Image Classification. Our ablation studies show clear advantages of using contextual diversity for active learning.
The source code and additional results are available at https://github.com/sharat29ag/CDAL.

Submitted 13 August, 2020; originally announced August 2020.

Comments: A variant of this report is accepted in ECCV 2020
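The contextual diversity abstract above combines two ideas: summarise each image by the class probability vectors the CNN predicts (which reflect confusion with co-occurring classes), and plug a distance between those summaries into a core-set strategy. The sketch below is one plausible reading, offered as an assumption rather than the paper's exact definition: each image is summarised by the mean probability vector of the regions predicted as each class, two images are compared by a symmetric KL divergence summed over the classes both contain (any per-region weighting the paper may use is omitted), and selection uses the standard greedy k-center core-set heuristic. Function names and the toy probabilities are hypothetical.

```python
import math


def class_profiles(region_probs):
    """Mean predicted probability vector for each predicted class in one image.

    region_probs: list of probability vectors, one per pixel or region.
    Returns {class_index: mean vector over regions whose argmax is that class}.
    """
    groups = {}
    for p in region_probs:
        c = max(range(len(p)), key=p.__getitem__)
        groups.setdefault(c, []).append(p)
    return {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in groups.items()}


def kl(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def contextual_distance(a, b):
    """Symmetric KL between class profiles, summed over classes present in both images."""
    shared = a.keys() & b.keys()
    return sum(0.5 * (kl(a[c], b[c]) + kl(b[c], a[c])) for c in shared)


def k_center_greedy(profiles, labelled, budget):
    """Core-set selection: repeatedly add the image farthest from everything selected."""
    selected = list(labelled)
    picks = []
    for _ in range(budget):
        candidates = [i for i in range(len(profiles)) if i not in selected]
        if not candidates:
            break
        far = max(candidates, key=lambda i: min(
            (contextual_distance(profiles[i], profiles[j]) for j in selected),
            default=float("inf")))
        selected.append(far)
        picks.append(far)
    return picks


# Hypothetical per-region softmax outputs over 3 classes for 3 unlabelled images.
images = [
    [[0.9, 0.05, 0.05], [0.1, 0.85, 0.05]],   # confident predictions
    [[0.5, 0.4, 0.1], [0.45, 0.5, 0.05]],     # classes 0 and 1 confused
    [[0.9, 0.05, 0.05], [0.1, 0.85, 0.05]],   # same context as image 0
]
profiles = [class_profiles(r) for r in images]
print(k_center_greedy(profiles, labelled=[0], budget=1))
```

With image 0 already labelled, the greedy step prefers image 1, whose class confusion differs, over image 2, which adds no new context.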

arXiv:2004.04146

Complex Network Analysis of Indian Railway Zones

Authors: Nikhil Kumar Rajput, Piyush Badola, Harshit Arora, Bhavya Ahuja Grover

Abstract: The Indian Railway network has been analyzed on the basis of the number of trains directly linking two railway zones. The network has been displayed as a weighted graph where the weights denote the number of trains between the zones. It may be pointed out that each zone is a complex network in itself and may exhibit different characteristic features. The zonal network can therefore be considered a network of complex networks. In this paper, the self links, in-degree and out-degree of each zone have been computed, which provide information about inter- and intra-zonal connectivity. The degree-passenger correlation, which gives an idea of the number of trains and passengers originating from a particular zone and might play a role in policy-making decisions, has also been studied. Some other complex network parameters like betweenness, clustering coefficient and cliques have been obtained to get more insight into the complex Indian zonal network.

Submitted 8 April, 2020; originally announced April 2020.
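The basic quantities in the railway abstract above (self links, in-degree and out-degree on a weighted graph of zones) can be computed directly from a table of train counts. The sketch below assumes a directed graph keyed by (origin zone, destination zone), reports both degree (number of distinct neighbouring zones) and strength (total trains) because the abstract does not say which the paper uses, and keeps self links separate from the degree counts. The zone codes are real Indian Railway abbreviations, but the train counts are invented for illustration.

```python
from collections import defaultdict


def zone_stats(trains):
    """trains: {(origin_zone, destination_zone): number_of_trains}.

    Returns per-zone self links (intra-zonal trains), in/out degree
    (distinct neighbouring zones) and in/out strength (total trains).
    """
    stats = defaultdict(lambda: {"self": 0, "in_deg": 0, "out_deg": 0,
                                 "in_str": 0, "out_str": 0})
    for (src, dst), n in trains.items():
        if n <= 0:
            continue  # no direct trains, so no edge
        if src == dst:
            stats[src]["self"] += n  # self link: trains within one zone
            continue
        stats[src]["out_deg"] += 1
        stats[src]["out_str"] += n
        stats[dst]["in_deg"] += 1
        stats[dst]["in_str"] += n
    return dict(stats)


# Hypothetical counts, for illustration only.
trains = {("NR", "NR"): 120, ("NR", "CR"): 30, ("CR", "NR"): 28, ("WR", "NR"): 15}
for zone, s in sorted(zone_stats(trains).items()):
    print(zone, s)
```

Separating degree from strength matters here: two zones linked by one train and by fifty trains have the same degree contribution but very different strengths.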

arXiv:1902.09685

doi 10.4204/EPTCS.299.7

Iteratively Composing Statically Verified Traits

Authors: Isaac Oscar Gariano, Marco Servetto, Alex Potanin, Hrshikesh Arora

Abstract: Static verification relying on an automated theorem prover can be very slow and brittle: since static verification is undecidable, correct code may not pass a particular static verifier. In this work, we use metaprogramming to generate code that is correct by construction. A theorem prover is used only to verify initial "traits": units of code that can be used to compose bigger programs. In our work, meta-programming is done by trait composition, which, starting from correct code, is guaranteed to produce correct code. We do this by extending conventional traits with pre- and post-conditions for the methods; we also extend the traditional trait composition (+) operator to check the compatibility of contracts. In this way, there is no need to re-verify the produced code. We show how our approach can be applied to the standard "power" function example, where metaprogramming generates optimised, and correct, versions when the exponent is known in advance.

Submitted 20 August, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

Comments: In Proceedings VPT 2019, arXiv:1908.06723

Journal ref: EPTCS 299, 2019, pp. 49-55
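The "power" example mentioned in the abstract above specialises x**n when n is known in advance. As a rough analogy only, the Python sketch below builds the specialised function by composing two small building blocks, each of which turns "a function computing x**k" into one computing x**(k + 1) or x**(2k), following the bits of n (square-and-multiply). This is not the paper's 42 code, traits, or contracts: the docstrings state the invariant each block preserves, and the only contract actually checked is a runtime precondition on n, whereas the paper verifies traits statically once and then needs no re-verification. All names are hypothetical.

```python
from typing import Callable

IntFn = Callable[[int], int]


def one(_: int) -> int:
    """Base block: computes x**0."""
    return 1


def times_x(f: IntFn) -> IntFn:
    """Given f computing x**k, return a function computing x**(k + 1)."""
    return lambda x: f(x) * x


def squared(f: IntFn) -> IntFn:
    """Given f computing x**k, return a function computing x**(2 * k)."""
    def g(x: int) -> int:
        y = f(x)
        return y * y
    return g


def specialise_power(n: int) -> IntFn:
    """Compose the blocks into x -> x**n for a fixed n >= 0."""
    if n < 0:
        raise ValueError("exponent must be non-negative")  # precondition
    f: IntFn = one
    # Walk the bits of n from most to least significant (square-and-multiply).
    for bit in bin(n)[2:]:
        f = squared(f)
        if bit == "1":
            f = times_x(f)
    return f


power5 = specialise_power(5)  # built as ((x**2 * x)**2)**2... following bits 1, 0, 1
print(power5(2))
```

Since each block maps a correct x**k function to a correct function for a new exponent, any composition of them is correct by construction, which mirrors the paper's argument that composing verified units needs no new proof.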

arXiv:1902.05436

Checking Observational Purity of Procedures

Authors: Himanshu Arora, Raghavan Komondoor, G. Ramalingam

Abstract: Verifying whether a procedure is observationally pure is useful in many software engineering scenarios. An observationally pure procedure always returns the same value for the same argument, and thus mimics a mathematical function. The problem is challenging when procedures use private mutable global variables, e.g., for memoization of frequently returned answers, and when they involve recursion. We present a novel verification approach for this problem. Our approach involves encoding the procedure's code as a formula that is a disjunction of path constraints, with the recursive calls being replaced in the formula with references to a mathematical function symbol. Then, a theorem prover is invoked to check whether the formula that has been constructed agrees with the function symbol referred to above in terms of input-output behavior for all arguments. We evaluate our approach on a set of realistic examples, using the Boogie intermediate language and theorem prover. Our evaluation shows that the invariants are easy to construct manually, and that our approach is effective at verifying observationally pure procedures.

Submitted 14 February, 2019; originally announced February 2019.

Comments: FASE 2019
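The abstract above singles out procedures that are recursive and memoise answers in a private mutable global as the hard case. The Python sketch below is a small instance of such a procedure: `fib` mutates a hidden cache yet always returns the same result for the same argument, so it behaves like the stateless `fib_spec`. The closing check only tests finitely many inputs in a scrambled call order (so the cache is filled differently); it is evidence, not the all-arguments proof the paper obtains with a theorem prover and Boogie. Names are illustrative.

```python
import random

# Private mutable global state: a memo table of already computed answers.
_memo: dict[int, int] = {0: 0, 1: 1}


def fib(n: int) -> int:
    """Recursive and memoised, yet observationally pure: the same n always
    yields the same result, even though calls may mutate _memo."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n not in _memo:
        _memo[n] = fib(n - 1) + fib(n - 2)
    return _memo[n]


def fib_spec(n: int) -> int:
    """Stateless reference function that fib is expected to agree with."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Scramble the call order so the cache is populated differently from a
# sequential run; agreement on these inputs does not prove purity in general.
order = list(range(30))
random.Random(1).shuffle(order)
assert all(fib(n) == fib_spec(n) for n in order)
```

A bug such as returning a stale cache entry for some argument would break the agreement with `fib_spec` for that argument, which is exactly the property the verification approach checks symbolically.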

arXiv:1902.00546

doi 10.22152/programming-journal.org/2019/3/12

Separating Use and Reuse to Improve Both

Authors: Hrshikesh Arora, Marco Servetto, Bruno C. D. S. Oliveira

Abstract: Context: Trait composition has inspired new research in the area of code reuse for object-oriented (OO) languages. One of the main advantages of this kind of composition is that it makes it possible to separate subtyping from subclassing, which is good for code reuse, design and reasoning. However, handling state within traits is difficult, verbose or inelegant. Inquiry: We identify the this-leaking problem as the fundamental limitation that prevents the separation of subtyping from subclassing in conventional OO languages. We explain that the concept of trait composition addresses this problem by distinguishing code designed for use (as a type) from code designed for reuse (i.e., inherited). We are aware of at least 3 concrete, independently designed research languages following this methodology: TraitRecordJ, Package Templates and DeepFJig. Approach: In this paper, we design $42_μ$, a new language, where we improve use and reuse and support the This type and family polymorphism by distinguishing code designed for use from code designed for reuse. In this way, $42_μ$ synthesises the 3 approaches above and improves them with abstract state operations: a new, elegant way to handle state composition in trait-based languages. Knowledge and Grounding: Using case studies, we show that $42_μ$'s model of traits with abstract state operations is more usable and compact than prior work. We formalise our work and prove that type errors cannot arise from composing well-typed code. Importance: This work is the logical core of the programming language 42.
This shows that the ideas presented in this paper can be applied to a full general-purpose language. This form of composition is very flexible and could be used in many new languages.

Submitted 1 February, 2019; originally announced February 2019.

Journal ref: The Art, Science, and Engineering of Programming, 2019, Vol. 3, Issue 3, Article 12

arXiv:1802.01034

Multi-task Learning for Continuous Control

Authors: Himani Arora, Rajath Kumar, Jason Krone, Chong Li

Abstract: Reliable and effective multi-task learning is a prerequisite for the development of robotic agents that can quickly learn to accomplish related, everyday tasks. However, in the reinforcement learning domain, multi-task learning has not exhibited the same level of success as in other domains, such as computer vision. In addition, most reinforcement learning research on multi-task learning has focused on discrete action spaces, which are not used for robotic control in the real world. In this work, we apply multi-task learning methods to continuous action spaces and benchmark their performance on a series of simulated continuous control tasks. Most notably, we show that multi-task learning outperforms our baselines and alternative knowledge-sharing methods.

Submitted 3 February, 2018; originally announced February 2018.

arXiv:1710.09798

Lip2AudSpec: Speech reconstruction from silent lip movements video

Authors: Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani

Abstract: In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, which results in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram; these features are then used as the target for our main lip-reading network, which comprises CNN, LSTM, and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with 98% correlation and also improves the quality of the speech reconstructed by the main lip-reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results for reconstructing intelligible speech with superior word recognition accuracy.

Submitted 26 October, 2017; originally announced October 2017.

arXiv:1701.04743

doi 10.1109/WACV.2017.57

Computing Egomotion with Local Loop Closures for Egocentric Videos

Authors: Suvam Patra, Himanshu Aggarwal, Himani Arora, Chetan Arora, Subhashis Banerjee

Abstract: Finding the camera pose is an important step in many egocentric video applications. It has been widely reported that state-of-the-art SLAM algorithms fail on egocentric videos. In this paper, we propose a robust method for camera pose estimation designed specifically for egocentric videos. In an egocentric video, the camera views the same scene point multiple times as the wearer's head sweeps back and forth. We use this specific motion profile to perform short loop closures aligned with the wearer's footsteps. For egocentric videos, depth estimation is usually noisy. In an important departure, we use 2D computations for rotation averaging that do not rely upon depth estimates. These two modifications result in a much more stable algorithm, as is evident from our experiments on various egocentric video datasets for different egocentric applications. The proposed algorithm resolves a long-standing problem in egocentric vision and unlocks new usage scenarios for future applications.

Submitted 17 January, 2017; originally announced January 2017.

Comments: Accepted in WACV 2017
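The egomotion abstract above relies on rotation averaging but does not spell out the averaging scheme. Purely as a generic illustration of what "averaging rotations" means, and not as the authors' method, the following minimal sketch averages camera rotations represented as unit quaternions using the common sign-aligned quaternion mean. This is a swapped-in, standard approximation that is reasonable only when the rotations being averaged are close to one another. All function names here are hypothetical.

```python
import math
from typing import List, Tuple

# A rotation as a unit quaternion (w, x, y, z).
Quaternion = Tuple[float, float, float, float]


def axis_angle_to_quat(axis: Tuple[float, float, float], angle: float) -> Quaternion:
    """Hypothetical helper: convert an axis (need not be unit length) and an angle in radians to a unit quaternion."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / n
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)


def average_rotations(quats: List[Quaternion]) -> Quaternion:
    """Approximate rotation average (hypothetical helper).

    q and -q encode the same rotation, so each quaternion is first flipped
    to lie in the same hemisphere as the first one; the aligned quaternions
    are then summed and renormalized to unit length.
    """
    if not quats:
        raise ValueError("need at least one rotation")
    ref = quats[0]
    acc = [0.0, 0.0, 0.0, 0.0]
    for q in quats:
        dot = sum(a * b for a, b in zip(ref, q))
        sign = -1.0 if dot < 0.0 else 1.0
        for i in range(4):
            acc[i] += sign * q[i]
    norm = math.sqrt(sum(c * c for c in acc))
    return tuple(c / norm for c in acc)


def rotation_angle(q: Quaternion) -> float:
    """Rotation angle in radians encoded by a unit quaternion (sign-invariant)."""
    w = max(-1.0, min(1.0, abs(q[0])))  # clamp guards against rounding past 1.0
    return 2.0 * math.acos(w)


if __name__ == "__main__":
    z_axis = (0.0, 0.0, 1.0)
    avg = average_rotations([axis_angle_to_quat(z_axis, 0.0), axis_angle_to_quat(z_axis, 0.2)])
    print(round(rotation_angle(avg), 6))  # prints 0.1
```

The sign alignment matters because a naive sum of q and -q (the same physical rotation) would cancel to zero. For widely spread rotations, a more principled estimator such as the chordal L2 mean computed via an eigen- or singular-value decomposition would be the usual choice.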

Showing 1–19 of 19 results for author: Arora, H