Showing 1–17 of 17 results for author: Huang, I

Searching in archive cs.
  1. arXiv:2406.08164  [pdf, other]

    cs.CV

    ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

    Authors: Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky

    Abstract: Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in such reasoning tasks. This prompts a crucial question: have VLMs effectively tackled the CR challenge? We conjecture that existing CR benchmark…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: The first three authors contributed equally

  2. arXiv:2404.17672  [pdf, other]

    cs.CV cs.GR

    BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

    Authors: Ian Huang, Guandao Yang, Leonidas Guibas

    Abstract: Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, in which they might need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, mak…

    Submitted 22 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  3. arXiv:2312.06663  [pdf, other]

    cs.CV cs.GR

    CAD: Photorealistic 3D Generation via Adversarial Distillation

    Authors: Ziyu Wan, Despoina Paschalidou, Ian Huang, Hongyu Liu, Bokui Shen, Xiaoyu Xiang, Jing Liao, Leonidas Guibas

    Abstract: The increased demand for 3D data in AR/VR, robotics, and gaming applications gave rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However, finding a…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: http://raywzy.com/CAD/

  4. arXiv:2306.14649  [pdf, other]

    cs.NE

    CIMulator: A Comprehensive Simulation Platform for Computing-In-Memory Circuit Macros with Low Bit-Width and Real Memory Materials

    Authors: Hoang-Hiep Le, Md. Aftab Baig, Wei-Chen Hong, Cheng-Hsien Tsai, Cheng-Jui Yeh, Fu-Xiang Liang, I-Ting Huang, Wei-Tzu Tsai, Ting-Yin Cheng, Sourav De, Nan-Yow Chen, Wen-Jay Lee, Ing-Chao Lin, Da-Wei Chang, Darsen D. Lu

    Abstract: This paper presents a simulation platform, namely CIMulator, for quantifying the efficacy of various synaptic devices in neuromorphic accelerators for different neural network architectures. Nonvolatile memory devices, such as resistive random-access memory and ferroelectric field-effect transistors, as well as volatile static random-access memory devices, can be selected as synaptic devices. A multilayer pe…

    Submitted 26 June, 2023; originally announced June 2023.

  5. arXiv:2306.06212  [pdf, other]

    cs.CV cs.GR

    Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions

    Authors: Ian Huang, Vrishab Krishna, Omoruyi Atekha, Leonidas Guibas

    Abstract: What constitutes the "vibe" of a particular scene? What should one find in "a busy, dirty city street", "an idyllic countryside", or "a crime scene in an abandoned living room"? The translation from abstract scene descriptions to stylized scene elements cannot be done with any generality by extant systems trained on rigid and limited indoor datasets. In this paper, we propose to leverage the knowl…

    Submitted 9 June, 2023; originally announced June 2023.

  6. arXiv:2304.09185  [pdf, other]

    cs.CL cs.AI

    Token Imbalance Adaptation for Radiology Report Generation

    Authors: Yuexin Wu, I-Chan Huang, Xiaolei Huang

    Abstract: Imbalanced token distributions naturally exist in text documents, leading neural language models to overfit on frequent tokens. The token imbalance may dampen the robustness of radiology report generators, as complex medical terms appear less frequently but reflect more medical information. In this study, we demonstrate how current state-of-the-art models fail to generate infrequent tokens on two…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted by CHIL2023

  7. arXiv:2303.16138  [pdf, other]

    cs.RO

    DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets

    Authors: Isabella Huang, Yashraj Narang, Ruzena Bajcsy, Fabio Ramos, Tucker Hermans, Dieter Fox

    Abstract: Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp can…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: To be published in the IEEE International Conference on Robotics and Automation (ICRA), 2023

  8. The Media Inequality, Uncanny Mountain, and the Singularity is Far from Near: Iwaa and Sophia Robot versus a Real Human Being

    Authors: Johan F. Hoorn, Ivy S. Huang

    Abstract: Design of Artificial Intelligence and robotics habitually assumes that adding more humanlike features improves the user experience, mainly kept in check by suspicion of uncanny effects. Three strands of theorizing are brought together for the first time and empirically put to the test: Media Equation (and in its wake, Computers Are Social Actors), Uncanny Valley theory, and as an extreme of human-…

    Submitted 16 March, 2023; originally announced March 2023.

  9. arXiv:2302.10727  [pdf]

    cs.RO

    Design Project of an Open-Source, Low-Cost, and Lightweight Robotic Manipulator for High School Students

    Authors: Isabella Huang, Qianwen Zhao, Maxine Fontaine, Long Wang

    Abstract: In recent years, there has been increasing interest in high school robotics extracurriculars such as robotics clubs and robotics competitions. The growing demand is a result of more ubiquitous open-source software and affordable off-the-shelf hardware kits, which significantly help lower the barrier for entry-level robotics hobbyists. In this project, we present an open-source, low-cost, and lightwei…

    Submitted 16 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted to ASEE Zone 1 Conference

  10. arXiv:2212.05011  [pdf, other]

    cs.CV cs.CL

    LADIS: Language Disentanglement for 3D Shape Editing

    Authors: Ian Huang, Panos Achlioptas, Tianyi Zhang, Sergey Tulyakov, Minhyuk Sung, Leonidas Guibas

    Abstract: Natural language interaction is a promising direction for democratizing 3D shape design. However, existing methods for text-driven 3D shape editing face challenges in producing decoupled, local edits to 3D shapes. We address this problem by learning disentangled latent representations that ground language in 3D geometry. To this end, we propose a complementary tool set including a novel network ar…

    Submitted 9 December, 2022; originally announced December 2022.

  11. DefGraspSim: Physics-based simulation of grasp outcomes for 3D deformable objects

    Authors: Isabella Huang, Yashraj Narang, Clemens Eppner, Balakumar Sundaralingam, Miles Macklin, Ruzena Bajcsy, Tucker Hermans, Dieter Fox

    Abstract: Robotic grasping of 3D deformable objects (e.g., fruits/vegetables, internal organs, bottles/boxes) is critical for real-world applications such as food processing, robotic surgery, and household automation. However, developing grasp strategies for such objects is uniquely challenging. Unlike rigid objects, deformable objects have infinite degrees of freedom and require field quantities (e.g., def…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: For associated web page, see https://sites.google.com/nvidia.com/defgraspsim. To be published in the IEEE Robotics and Automation Letters (RA-L) special issue on Robotic Handling of Deformable Objects, 2022. arXiv admin note: substantial text overlap with arXiv:2107.05778

  12. arXiv:2112.06390  [pdf, other]

    cs.CV

    PartGlot: Learning Shape Part Segmentation from Language Reference Games

    Authors: Juil Koo, Ian Huang, Panos Achlioptas, Leonidas Guibas, Minhyuk Sung

    Abstract: We introduce PartGlot, a neural framework and associated architectures for learning semantic part segmentation of 3D shape geometry, based solely on part referential language. We exploit the fact that linguistic descriptions of a shape can provide priors on the shape's parts -- as natural language has evolved to reflect human perception of the compositional structure of objects, essential to their…

    Submitted 30 March, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 (Oral)

  13. arXiv:2111.01391  [pdf, other]

    cs.RO

    IPC-GraspSim: Reducing the Sim2Real Gap for Parallel-Jaw Grasping with the Incremental Potential Contact Model

    Authors: Chung Min Kim, Michael Danielczuk, Isabella Huang, Ken Goldberg

    Abstract: Accurately simulating whether an object will be lifted securely or dropped during grasping is a longstanding Sim2Real challenge. Soft compliant jaw tips are almost universally used with parallel-jaw robot grippers due to their ability to increase contact area and friction between the jaws and the object to be manipulated. However, interactions between the compliant surfaces and rigid objects are n…

    Submitted 1 March, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

  14. arXiv:2107.05778  [pdf, other]

    cs.RO

    DefGraspSim: Simulation-based grasping of 3D deformable objects

    Authors: Isabella Huang, Yashraj Narang, Clemens Eppner, Balakumar Sundaralingam, Miles Macklin, Tucker Hermans, Dieter Fox

    Abstract: Robotic grasping of 3D deformable objects (e.g., fruits/vegetables, internal organs, bottles/boxes) is critical for real-world applications such as food processing, robotic surgery, and household automation. However, developing grasp strategies for such objects is uniquely challenging. In this work, we efficiently simulate grasps on a wide range of 3D deformable objects using a GPU-based implement…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: 11 pages, 19 figures. For associated website and code repository, see https://sites.google.com/nvidia.com/defgraspsim and https://github.com/NVlabs/deformable_object_grasping. Published in DO-Sim: Workshop on Deformable Object Simulation in Robotics at Robotics: Science and Systems (RSS) 2021

  15. arXiv:1911.02320  [pdf, other]

    cs.RO cs.HC cs.LG

    Nonverbal Robot Feedback for Human Teachers

    Authors: Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan

    Abstract: Robots can learn preferences from human demonstrations, but their success depends on how informative these demonstrations are. Being informative is unfortunately very challenging, because during teaching, people typically get no transparency into what the robot already knows or has learned so far. In contrast, human students naturally provide a wealth of nonverbal feedback that reveals their level…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: CoRL 2019

  16. arXiv:1808.04486  [pdf, other]

    cs.DB

    DeepBase: Deep Inspection of Neural Networks

    Authors: Thibault Sellam, Kevin Lin, Ian Yiran Huang, Yiru Chen, Michelle Yang, Carl Vondrick, Eugene Wu

    Abstract: Although deep learning models perform remarkably well across a range of tasks such as language translation and object recognition, it remains unclear what high-level logic, if any, they follow. Understanding this logic may lead to more transparency, better model design, and faster experimentation. Recent machine learning research has leveraged statistical methods to identify hidden units that beha…

    Submitted 7 January, 2019; v1 submitted 13 August, 2018; originally announced August 2018.

  17. arXiv:1704.03944  [pdf, other]

    cs.CV stat.ML

    Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

    Authors: Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, Honglak Lee

    Abstract: Associating image regions with text queries has been recently explored as a new way to bridge visual and linguistic representations. A few pioneering approaches have been proposed based on recurrent neural language models trained generatively (e.g., generating captions), but they achieve somewhat limited localization accuracy. To better address natural-language-based visual entity localization, we pr…

    Submitted 17 April, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017