Showing 51–100 of 233 results for author: Malik, J

  1. arXiv:2210.03109

    cs.RO cs.CV cs.LG

    Real-World Robot Learning with Masked Visual Pre-training

    Authors: Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

    Abstract: In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and then passed into a learnable control module. Unlike prior work, we show that the pre-trained representations are effective across a range of real-world robotic ta…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: CoRL 2022; Project page: https://tetexiao.com/projects/real-mvp

  2. arXiv:2209.12892

    cs.LG cs.CV

    Learning to Learn with Generative Models of Neural Network Checkpoints

    Authors: William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

    Abstract: We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Code available at https://www.github.com/wpeebles/G.pt . Project page and videos available at https://www.wpeebles.com/Gpt

  3. arXiv:2209.09232

    cs.RO cs.AI

    Learning a Single Near-hover Position Controller for Vastly Different Quadcopters

    Authors: Dingqi Zhang, Antonio Loquercio, Xiangyu Wu, Ashish Kumar, Jitendra Malik, Mark W. Mueller

    Abstract: This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size and motor constants, and also shows rapid adaptation to unknown disturbances during runtime. The core algorithmic idea is to learn a single policy that can adapt online at test time not only to the disturbances applied to the drone, but also to the robot…

    Submitted 2 May, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Comprehensive video results can be found on the project webpage https://dz298.github.io/universal-drone-controller/

  4. arXiv:2209.02778

    cs.RO cs.LG

    Multi-skill Mobile Manipulation for Object Rearrangement

    Authors: Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik

    Abstract: We study a modular approach to tackle long-horizon mobile manipulation tasks for object rearrangement, which decomposes a full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks. Although more effective than monolithic end-to-end RL policies, this frame…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: Project website: https://sites.google.com/view/hab-m3

  5. arXiv:2207.08139

    cs.CV cs.AI

    2D Self-Organized ONN Model For Handwritten Text Recognition

    Authors: Hanadi Hassen Mohammed, Junaid Malik, Somaya Al-Madeed, Serkan Kiranyaz

    Abstract: Deep Convolutional Neural Networks (CNNs) have recently reached state-of-the-art Handwritten Text Recognition (HTR) performance. However, recent research has shown that typical CNNs' learning performance is limited since they are homogeneous networks with a simple (linear) neuron model. With their heterogeneous network structure incorporating non-linear neurons, Operational Neural Networks (ONNs)…

    Submitted 17 July, 2022; originally announced July 2022.

    Comments: To appear in Applied Soft Computing Journal (Elsevier)

  6. arXiv:2206.00888

    eess.AS cs.CL cs.SD

    Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

    Authors: Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

    Abstract: The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series of systematic studies, we find that the Conformer architecture's design choices are not optimal. After re-examining the design choices for both the macro and mi…

    Submitted 15 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  7. arXiv:2205.15299

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Adapting Rapid Motor Adaptation for Bipedal Robots

    Authors: Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik

    Abstract: Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable and hence it's harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots. Similar to existing works, we start with a base policy which p…

    Submitted 6 September, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: First two authors contributed equally. Website at https://ashish-kmr.github.io/a-rma/

  8. arXiv:2204.06107

    cs.CV

    Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

    Authors: Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

    Abstract: Open-world instance segmentation is the task of grouping pixels into object instances without any pre-determined taxonomy. This is challenging, as state-of-the-art methods rely on explicit class semantics obtained from large labeled datasets, and out-of-domain evaluation performance drops significantly. Here we propose a novel approach for mask proposals, Generic Grouping Networks (GGNs), construc…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  9. Symmetric Bidirectional Quantum Teleportation using a Six-Qubit Cluster State as a Quantum Channel

    Authors: Javid A Malik, Muzaffar Qadir Lone, Rayees A Malla

    Abstract: Bidirectional quantum teleportation is a fundamental protocol for exchanging quantum information between two quantum nodes. All bidirectional quantum teleportation protocols till now have achieved a maximum efficiency of $40\%$. Here, we propose a new scheme for symmetric bidirectional quantum teleportation using a six-qubit cluster state as the quantum channel, for symmetric ($3\leftrightarrow3$)…

    Submitted 6 April, 2022; originally announced April 2022.

  10. arXiv:2203.06173

    cs.CV cs.LG cs.RO

    Masked Visual Pre-training for Motor Control

    Authors: Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

    Abstract: This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels. We first train the visual representations by masked modeling of natural images. We then freeze the visual encoder and train neural network controllers on top with reinforcement learning. We do not perform any task-specific fine-tuning of the encoder; the same…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Code and videos at: https://tetexiao.com/projects/mvp

  11. arXiv:2202.09517

    cs.CL cs.AI cs.IR cs.LG

    Deep Learning for Hate Speech Detection: A Comparative Study

    Authors: Jitendra Singh Malik, Hezhe Qiao, Guansong Pang, Anton van den Hengel

    Abstract: Automated hate speech detection is an important tool in combating the spread of hate speech, particularly in social media. Numerous methods have been developed for the task, including a recent proliferation of deep-learning based approaches. A variety of datasets have also been developed, exemplifying various manifestations of the hate-speech detection problem. We present here a large-scale empiri…

    Submitted 6 December, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

    Comments: 18 pages, 4 figures, and 6 tables

  12. arXiv:2202.05265

    cs.LG cs.CV eess.IV q-bio.QM stat.ML

    Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging

    Authors: Anastasios N Angelopoulos, Amit P Kohli, Stephen Bates, Michael I Jordan, Jitendra Malik, Thayer Alshaabi, Srigokul Upadhyayula, Yaniv Romano

    Abstract: Image-to-image regression is an important learning task, used frequently in biological imaging. Current algorithms, however, do not generally offer statistical guarantees that protect against a model's mistakes and hallucinations. To address this, we develop uncertainty quantification techniques with rigorous statistical guarantees for image-to-image regression problems. In particular, we show how…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: Code available at https://github.com/aangelopoulos/im2im-uq

  13. arXiv:2202.00589

    eess.SP cs.AI cs.LG

    Blind ECG Restoration by Operational Cycle-GANs

    Authors: Serkan Kiranyaz, Ozer Can Devecioglu, Turker Ince, Junaid Malik, Muhammad Chowdhury, Tahir Hamid, Rashid Mazhar, Amith Khandakar, Anas Tahir, Tawsifur Rahman, Moncef Gabbouj

    Abstract: Continuous long-term monitoring of electrocardiography (ECG) signals is crucial for the early detection of cardiac abnormalities such as arrhythmia. Non-clinical ECG recordings acquired by Holter and wearable ECG sensors often suffer from severe artifacts such as baseline wander, signal cuts, motion artifacts, variations on QRS amplitude, noise, and other interferences. Usually, a set of such arti…

    Submitted 29 January, 2022; originally announced February 2022.

    Comments: 16 pages, 10 figures, journal article submission

  14. arXiv:2201.10029

    cs.CV cs.AI

    PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

    Authors: Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

    Abstract: State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of `where to look?' for an object and `how to navigate to (x, y)?'. Our key insight is that…

    Submitted 17 June, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: 8 pages + supplementary. Accepted in CVPR 2022

  15. arXiv:2201.08383

    cs.CV

    MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

    Authors: Chao-Yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

    Abstract: While today's video recognition systems parse snapshots or short clips accurately, they cannot connect the dots and reason across a longer range of time yet. Most existing video architectures can only process <5 seconds of a video without hitting the computation or memory bottlenecks. In this paper, we propose a new strategy to overcome this challenge. Instead of trying to process more frames at…

    Submitted 30 November, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: Technical report. arXiv v2: add link to code

  16. arXiv:2201.06091

    physics.med-ph

    Parallel transmit PUlse design for Saturation Homogeneity (PUSH) for Magnetization Transfer imaging at 7T

    Authors: David Leitão, Raphael Tomi-Tricot, Pip Bridgen, Tom Wilkinson, Patrick Liebig, Rene Gumbrecht, Dieter Ritter, Sharon L. Giles, Ana Baburamani, Jan Sedlacik, Joseph V. Hajnal, Shaihan J. Malik

    Abstract: Purpose: This work proposes a novel RF pulse design for parallel transmit (pTx) systems to obtain uniform saturation of semisolid magnetization for Magnetization Transfer (MT) contrast in the presence of transmit field ($B_1^+$) inhomogeneities. The semisolid magnetization is usually modeled as being purely longitudinal, with the applied $B_1^+$ field saturating but not rotating its magnetization,…

    Submitted 16 January, 2022; originally announced January 2022.

    Comments: 18 pages, 9 figures. Code available at: https://github.com/mriphysics/PUSH

  17. arXiv:2201.02076

    physics.med-ph

    Universal pulses for homogeneous excitation using single channel coils

    Authors: Ronald Mooiweer, Ian A. Clark, Eleanor A. Maguire, Martina F. Callaghan, Joseph V. Hajnal, Shaihan J. Malik

    Abstract: Purpose: Universal Pulses (UPs) are excitation pulses that reduce the flip angle inhomogeneity in high field MRI systems without subject-specific optimization, originally developed for parallel transmit (PTX) systems at 7T. We investigated the potential benefits of UPs for single channel (SC) transmit systems at 3T, which are widely used for clinical and research imaging, and for which flip angle…

    Submitted 6 January, 2022; originally announced January 2022.

    Comments: Submitted to Magnetic Resonance Imaging

  18. arXiv:2112.04477

    cs.CV

    Tracking People by Predicting 3D Appearance, Location & Pose

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

    Abstract: In this paper, we present an approach for tracking people in monocular videos, by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust way. This lifting includes information about the 3D pose of the person, his or her location in the 3D space, and the 3D appearance. As we track a person, we collect 3D observations over time in a tr…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: Project page: https://brjathu.github.io/PHALP/

  19. arXiv:2112.02094

    cs.RO cs.AI cs.CV cs.LG

    Coupling Vision and Proprioception for Navigation of Legged Robots

    Authors: Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak

    Abstract: We exploit the complementary strengths of vision and proprioception to develop a point-goal navigation system for legged robots, called VP-Nav. Legged systems are capable of traversing more complex terrain than wheeled robots, but to fully utilize this capability, we need a high-level path planner in the navigation system to be aware of the walking capabilities of the low-level locomotion policy i…

    Submitted 24 July, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 final version. Website at https://navigation-locomotion.github.io

  20. arXiv:2112.01526

    cs.CV

    MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

    Authors: Yanghao Li, Chao-Yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer

    Abstract: In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection. We present an improved version of MViT that incorporates decomposed relative positional embeddings and residual pooling connections. We instantiate this architecture in five sizes and evaluate it for ImageNet classification, COCO detection and K…

    Submitted 30 March, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 Camera Ready

  21. arXiv:2112.01010

    cs.LG cs.AI cs.CV cs.RO

    Differentiable Spatial Planning using Transformers

    Authors: Devendra Singh Chaplot, Deepak Pathak, Jitendra Malik

    Abstract: We consider the problem of spatial path planning. In contrast to the classical solutions which optimize a new plan from scratch and assume access to the full map with ground truth obstacle locations, we learn a planner from the data in a differentiable manner that allows us to leverage statistical regularities from past data. We propose Spatial Planning Transformers (SPT), which given an obstacle…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: Published at ICML 2021. See project webpage at https://devendrachaplot.github.io/projects/spatial-planning-transformers

  22. arXiv:2112.01001

    cs.CV cs.AI cs.LG cs.RO

    SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency

    Authors: Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Ruslan Salakhutdinov

    Abstract: In this paper, we explore how we can build upon the data and models of Internet images and use them to adapt to robot vision without requiring any extra labels. We present a framework called Self-supervised Embodied Active Learning (SEAL). It utilizes perception models trained on internet images to learn an active exploration policy. The observations gathered by this exploration policy are labelle…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: Published at NeurIPS 2021. See project webpage at https://devendrachaplot.github.io/projects/seal

  23. arXiv:2111.14948

    cs.CV cs.LG

    Image denoising by Super Neurons: Why go deep?

    Authors: Junaid Malik, Serkan Kiranyaz, Moncef Gabbouj

    Abstract: Classical image denoising methods utilize the non-local self-similarity principle to effectively recover image content from noisy images. Current state-of-the-art methods use deep convolutional neural networks (CNNs) to effectively learn the mapping from noisy to clean images. Deep denoising CNNs manifest a high learning capacity and integrate non-local information owing to the large receptive fie…

    Submitted 29 November, 2021; originally announced November 2021.

  24. arXiv:2111.09887

    cs.CV cs.LG

    PyTorchVideo: A Deep Learning Library for Video Understanding

    Authors: Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

    Abstract: We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. The library covers a full stack of video understanding tools including multimodal data loading, transformations, and models tha…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Technical report

  25. arXiv:2111.07868

    cs.CV

    Tracking People with 3D Representations

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

    Abstract: We present a novel approach for tracking multiple people in video. Unlike past approaches which employ 2D representations, we focus on using 3D representations of people, located in three-dimensional space. To this end, we develop a method, Human Mesh and Appearance Recovery (HMAR) which in addition to extracting the 3D geometry of the person as a SMPL mesh, also extracts appearance as a texture m…

    Submitted 15 November, 2021; originally announced November 2021.

  26. arXiv:2111.01674

    cs.RO cs.AI cs.CV cs.LG

    Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

    Authors: Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak

    Abstract: Legged locomotion is commonly studied and expressed as a discrete set of gait patterns, like walk, trot, gallop, which are usually treated as given and pre-programmed in legged robots for efficient locomotion at different speeds. However, fixing a set of pre-programmed gaits limits the generality of locomotion. Recent animal motor studies show that these conventional gaits are only prevalent in id…

    Submitted 25 October, 2021; originally announced November 2021.

    Comments: CoRL 2021. Website at https://energy-locomotion.github.io

  27. arXiv:2110.07058

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do , et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  28. arXiv:2110.06199

    cs.CV cs.AI cs.GR

    ABO: Dataset and Benchmarks for Real-World 3D Object Understanding

    Authors: Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, Jitendra Malik

    Abstract: We introduce Amazon Berkeley Objects (ABO), a new large-scale dataset designed to help bridge the gap between real and virtual 3D worlds. ABO contains product catalog images, metadata, and artist-created 3D models with complex geometries and physically-based materials that correspond to real, household objects. We derive challenging benchmarks that exploit the unique properties of ABO and measure…

    Submitted 24 June, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

  29. arXiv:2110.05472

    cs.CV

    Differentiable Stereopsis: Meshes from multiple views using differentiable rendering

    Authors: Shubham Goel, Georgia Gkioxari, Jitendra Malik

    Abstract: We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras. We pair traditional stereopsis and modern differentiable rendering to build an end-to-end model which predicts textured 3D meshes of objects with varying topologies and shape. We frame stereopsis as an optimization problem and simultaneously update shape an…

    Submitted 23 September, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: In CVPR 2022. Project webpage: https://shubham-goel.github.io/ds/

    Journal ref: In CVPR 2022 (pp. 8635-8644)

  30. arXiv:2110.04994

    cs.CV cs.AI cs.GR cs.RO

    Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

    Authors: Ainaz Eftekhar, Alexander Sax, Roman Bachmann, Jitendra Malik, Amir Zamir

    Abstract: This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world. Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information. In addition to enabling interesting lines of research, we show the tooling and generated data suffice to train robust vision models. Common…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: ICCV 2021: See project website https://omnidata.vision

  31. Robust Peak Detection for Holter ECGs by Self-Organized Operational Neural Networks

    Authors: Moncef Gabbouj, Serkan Kiranyaz, Junaid Malik, Muhammad Uzair Zahid, Turker Ince, Muhammad Chowdhury, Amith Khandakar, Anas Tahir

    Abstract: Although numerous R-peak detectors have been proposed in the literature, their robustness and performance levels may significantly deteriorate in low-quality and noisy signals acquired from mobile electrocardiogram (ECG) sensors, such as Holter monitors. Recently, this issue has been addressed by deep 1-D convolutional neural networks (CNNs) that have achieved state-of-the-art performance levels i…

    Submitted 12 January, 2024; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.02215

    Journal ref: in IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 11, pp. 9363-9374, Nov. 2023

  32. arXiv:2110.02215

    eess.SP cs.LG

    Real-Time Patient-Specific ECG Classification by 1D Self-Operational Neural Networks

    Authors: Junaid Malik, Ozer Can Devecioglu, Serkan Kiranyaz, Turker Ince, Moncef Gabbouj

    Abstract: Despite the proliferation of numerous deep learning methods proposed for generic ECG classification and arrhythmia detection, compact systems with the real-time ability and high accuracy for classifying patient-specific ECG are still few. Particularly, the scarcity of patient-specific data poses an ultimate challenge to any classifier. Recently, compact 1D Convolutional Neural Networks (CNNs) have…

    Submitted 30 September, 2021; originally announced October 2021.

  33. arXiv:2109.14873

    cs.LG cs.AI

    Early Bearing Fault Diagnosis of Rotating Machinery by 1D Self-Organized Operational Neural Networks

    Authors: Turker Ince, Junaid Malik, Ozer Can Devecioglu, Serkan Kiranyaz, Onur Avci, Levent Eren, Moncef Gabbouj

    Abstract: Preventive maintenance of modern electric rotating machinery (RM) is critical for ensuring reliable operation, preventing unpredicted breakdowns and avoiding costly repairs. Recently many studies investigated machine learning monitoring methods especially based on Deep Learning networks focusing mostly on detecting bearing faults; however, none of them addressed bearing fault severity classificati…

    Submitted 30 September, 2021; originally announced September 2021.

  34. arXiv:2109.13604

    eess.IV cs.AI cs.CV cs.LG

    Real-Time Glaucoma Detection from Digital Fundus Images using Self-ONNs

    Authors: Ozer Can Devecioglu, Junaid Malik, Turker Ince, Serkan Kiranyaz, Eray Atalay, Moncef Gabbouj

    Abstract: Glaucoma leads to permanent vision disability by damaging the optical nerve that transmits visual images to the brain. The fact that glaucoma does not show any symptoms as it progresses and cannot be stopped at the later stages, makes it critical to be diagnosed in its early stages. Although various deep learning models have been applied for detecting glaucoma from digital fundus images, due to th…

    Submitted 28 September, 2021; originally announced September 2021.

  35. arXiv:2109.01594

    cs.CV cs.AI

    Super Neurons

    Authors: Serkan Kiranyaz, Junaid Malik, Mehmet Yamac, Mert Duman, Ilke Adalioglu, Esin Guldogan, Turker Ince, Moncef Gabbouj

    Abstract: Self-Organized Operational Neural Networks (Self-ONNs) have recently been proposed as new-generation neural network models with nonlinear learning units, i.e., the generative neurons that yield an elegant level of diversity; however, like its predecessor, conventional Convolutional Neural Networks (CNNs), they still have a common drawback: localized (fixed) kernel operations. This severely limits…

    Submitted 15 April, 2023; v1 submitted 3 August, 2021; originally announced September 2021.

    Comments: 25 pages, 13 figures

  36. arXiv:2107.09584

    cs.CV cs.RO

    Active 3D Shape Reconstruction from Vision and Touch

    Authors: Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal

    Abstract: Humans build 3D understandings of the world through active object exploration, using jointly their senses of vision and touch. However, in 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings, leaving the active exploration of the shape largely unexplored. In active touch sensing for 3D reconstruction,…

    Submitted 26 October, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

    Journal ref: Published at NeurIPS 2021

  37. arXiv:2107.04034

    cs.LG cs.AI cs.CV cs.RO

    RMA: Rapid Motor Adaptation for Legged Robots

    Authors: Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik

    Abstract: Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these c…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: RSS 2021. Webpage at https://ashish-kmr.github.io/rma-legged-robots/

  38. arXiv:2107.01205

    cs.CV

    HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks

    Authors: Jameel Malik, Soshi Shimada, Ahmed Elhayek, Sk Aziz Ali, Christian Theobalt, Vladislav Golyanik, Didier Stricker

    Abstract: 3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. Existing methods addressing it directly regress hand meshes via 2D convolutional neural networks, which leads to artefacts due to perspective distortions in the images. To address the limitations of the existing methods, we develop HandVoxNet++, i.e., a voxel-based dee…

    Submitted 5 December, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: 13 pages, 6 tables, 7 figures; project webpage: http://4dqv.mpi-inf.mpg.de/HandVoxNet++/. arXiv admin note: text overlap with arXiv:2004.01588

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

  39. arXiv:2106.14405

    cs.LG cs.RO

    Habitat 2.0: Training Home Assistants to Rearrange their Habitat

    Authors: Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra

    Abstract: We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack - data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spa…

    Submitted 1 July, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

  40. arXiv:2105.14926  [pdf, other]

    eess.IV

    Self-Organized Residual Blocks for Image Super-Resolution

    Authors: Onur Keleş, A. Murat Tekalp, Junaid Malik, Serkan Kıranyaz

    Abstract: It has become a standard practice to use the convolutional networks (ConvNet) with RELU non-linearity in image restoration and super-resolution (SR). Although the universal approximation theorem states that a multi-layer neural network can approximate any non-linear function with the desired precision, it does not reveal the best network architecture to do so. Recently, operational neural networks… ▽ More

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 2021

  41. arXiv:2105.12651  [pdf, other]

    cs.HC

    The Usability and Trustworthiness of Medical Eye Images

    Authors: Daniel Diethei, Ashley Colley, Lisa Dannenberg, Muhammad Fawad Jawaid Malik, Johannes Schöning

    Abstract: The majority of blindness is preventable, and is located in developing countries. While mHealth applications for retinal imaging in combination with affordable smartphone lens adaptors are a step towards better eye care access, the expert knowledge and additional hardware needed are often unavailable in developing countries. Eye screening apps without lens adaptors exist, but we do not know much a… ▽ More

    Submitted 26 May, 2021; originally announced May 2021.

    Journal ref: 2021 IEEE International Conference on Healthcare Informatics (ICHI)

  42. arXiv:2105.12107  [pdf, other]

    eess.IV cs.CV

    Self-Organized Variational Autoencoders (Self-VAE) for Learned Image Compression

    Authors: M. Akın Yılmaz, Onur Keleş, Hilal Güven, A. Murat Tekalp, Junaid Malik, Serkan Kıranyaz

    Abstract: In end-to-end optimized learned image compression, it is standard practice to use a convolutional variational autoencoder with generalized divisive normalization (GDN) to transform images into a latent space. Recently, Operational Neural Networks (ONNs) that learn the best non-linearity from a set of alternatives, and their self-organized variants, Self-ONNs, that approximate any non-linearity via… ▽ More

    Submitted 28 May, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 2021

  43. arXiv:2104.11227  [pdf, other]

    cs.CV cs.AI cs.LG

    Multiscale Vision Transformers

    Authors: Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

    Abstract: We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models. Multiscale Transformers have several channel-resolution scale stages. Starting from the input resolution and a small channel dimension, the stages hierarchically expand the channel capacity while reducing the spatial resolution.… ▽ More

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: Technical report

  44. arXiv:2104.02406  [pdf]

    physics.med-ph eess.IV

    Magnetization Transfer-Mediated MR Fingerprinting

    Authors: Daniel J. West, Gastao Cruz, Rui P. A. G. Teixeira, Torben Schneider, Jacques-Donald Tournier, Joseph V. Hajnal, Claudia Prieto, Shaihan J. Malik

    Abstract: Purpose: Magnetization transfer (MT) and inhomogeneous MT (ihMT) contrasts are used in MRI to provide information about macromolecular tissue content. In particular, MT is sensitive to macromolecules and ihMT appears to be specific to myelinated tissue. This study proposes a technique to characterize MT and ihMT properties from a single acquisition, producing both semiquantitative contrast ratios,… ▽ More

    Submitted 23 August, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: 34 Pages and 15 Figures (Including Supporting Information), Submitted to Magnetic Resonance in Medicine (MRM). Updated to include link to final published article

  45. arXiv:2103.14580  [pdf, other]

    cs.CL

    Correcting Automated and Manual Speech Transcription Errors using Warped Language Models

    Authors: Mahdi Namazifar, John Malik, Li Erran Li, Gokhan Tur, Dilek Hakkani Tür

    Abstract: Masked language models have revolutionized natural language processing systems in the past few years. A recently introduced generalization of masked language models called warped language models are trained to be more robust to the types of errors that appear in automatic or manual transcriptions of spoken language by exposing the language model to the same types of errors during training. In this… ▽ More

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Submitted to INTERSPEECH

  46. arXiv:2103.03070  [pdf]

    cs.CV cs.LG cs.MM cs.NI

    Convolutional versus Self-Organized Operational Neural Networks for Real-World Blind Image Denoising

    Authors: Junaid Malik, Serkan Kiranyaz, Mehmet Yamac, Esin Guldogan, Moncef Gabbouj

    Abstract: Real-world blind denoising poses a unique image restoration challenge due to the non-deterministic nature of the underlying noise distribution. Prevalent discriminative networks trained on synthetic noise models have been shown to generalize poorly to real-world noisy images. While curating real-world noisy images and improving ground truth estimation procedures remain key points of interest, a po… ▽ More

    Submitted 5 May, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: Submitted for review in IEEE TIP

  47. arXiv:2103.03060  [pdf]

    cs.CV cs.AI cs.LG cs.NE

    BM3D vs 2-Layer ONN

    Authors: Junaid Malik, Serkan Kiranyaz, Mehmet Yamac, Moncef Gabbouj

    Abstract: Despite their recent success on image denoising, the need for deep and complex architectures still hinders the practical usage of CNNs. Older but computationally more efficient methods such as BM3D remain a popular choice, especially in resource-constrained scenarios. In this study, we aim to find out whether compact neural networks can learn to produce competitive results as compared to BM3D for… ▽ More

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: Submitted for review in ICIP 2021

  48. arXiv:2101.02703  [pdf, other]

    cs.LG cs.AI cs.CV stat.ME stat.ML

    Distribution-Free, Risk-Controlling Prediction Sets

    Authors: Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan

    Abstract: While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. To convey instance-wise uncertainty for prediction tasks, we show how to generate set-valued predictions from a black-box… ▽ More

    Submitted 4 August, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: Project website available at http://www.angelopoulos.ai/blog/posts/rcps/ and codebase available at https://github.com/aangelopoulos/rcps

  49. arXiv:2012.09856  [pdf, other]

    cs.CV

    Reconstructing Hand-Object Interactions in the Wild

    Authors: Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik

    Abstract: In this work we explore reconstructing hand-object interactions in the wild. The core challenge of this problem is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure which does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D… ▽ More

    Submitted 30 December, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: Project page: https://people.eecs.berkeley.edu/~zhecao/rhoi/

  50. arXiv:2012.09843  [pdf, other]

    cs.CV

    Human Mesh Recovery from Multiple Shots

    Authors: Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

    Abstract: Videos from edited media like movies are a useful, yet under-explored source of information. The rich variety of appearance and interactions between humans depicted over a large temporal context in these films could be a valuable source of data. However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncatio… ▽ More

    Submitted 17 December, 2020; originally announced December 2020.