-
Python-based DSL for generating Verilog model of Synchronous Digital Circuits
Authors:
Mandar Datar,
Dhruva S. Hegde,
Vendra Durga Prasad,
Manish Prajapati,
Neralla Manikanta,
Devansh Gupta,
Janampalli Pavanija,
Pratyush Pare,
Akash,
Shivam Gupta,
Sachin B. Patkar
Abstract:
We have designed a Python-based Domain Specific Language (DSL) for modeling synchronous digital circuits. In this DSL, hardware is modeled as a collection of transactions running in series, in parallel, and in loops. When the model is executed by a Python interpreter, synthesizable and behavioural Verilog is generated as output, which can be integrated with other RTL designs or used directly in FPGA and ASIC flows. In this paper, we describe: 1) the language (DSL), which allows users to express computation in series/parallel/loop constructs with explicit cycle boundaries; 2) the internals of a simple Python implementation that produces synthesizable Verilog; and 3) several design examples and case studies for applications in post-quantum cryptography, stereo vision, digital signal processing, and optimization techniques. Finally, we list ideas for extending this framework.
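As a rough illustration of the idea of running Python to obtain Verilog, the sketch below emits a small state machine that executes a list of steps in series, one step per clock cycle. It is not the authors' DSL: the function `emit_series`, its arguments, and the generated module layout are all hypothetical names chosen for this example.

```python
def emit_series(module_name, regs, steps):
    """Emit Verilog text for steps run in series, one step per clock cycle.

    regs  : dict mapping register name -> bit width (hypothetical format)
    steps : list of steps; each step is a list of (dest, expr) pairs that
            become non-blocking assignments executed in the same cycle
    """
    # Enough state bits to number every step (at least one bit).
    state_bits = max(1, (len(steps) - 1).bit_length())
    lines = [f"module {module_name}(input clk, input rst);"]
    for name, width in regs.items():
        lines.append(f"  reg [{width - 1}:0] {name};")
    lines.append(f"  reg [{state_bits - 1}:0] state;")
    lines.append("  always @(posedge clk) begin")
    lines.append("    if (rst) state <= 0;")
    lines.append("    else case (state)")
    for i, assigns in enumerate(steps):
        nxt = (i + 1) % len(steps)  # wrap around after the last step
        body = " ".join(f"{dest} <= {expr};" for dest, expr in assigns)
        lines.append(f"      {i}: begin {body} state <= {nxt}; end")
    lines.append("    endcase")
    lines.append("  end")
    lines.append("endmodule")
    return "\n".join(lines)


# Two steps in series: increment a, then accumulate a into b.
verilog = emit_series("acc", {"a": 8, "b": 8},
                      [[("a", "a + 1")], [("b", "a + b")]])
print(verilog)
```

Each inner list marks one cycle boundary, so placing two assignments in the same inner list would model them as happening in parallel within a cycle.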
Submitted 13 June, 2024;
originally announced June 2024.
-
NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results
Authors:
Xiaoning Liu,
Zongwei Wu,
Ao Li,
Florin-Alexandru Vasluianu,
Yulun Zhang,
Shuhang Gu,
Le Zhang,
Ce Zhu,
Radu Timofte,
Zhi Jin,
Hongjun Wu,
Chenxi Wang,
Haitao Ling,
Yuanhao Cai,
Hao Bian,
Yuxin Zheng,
Jing Lin,
Alan Yuille,
Ben Shao,
Jin Guo,
Tianli Liu,
Mohao Wu,
Yixu Feng,
Shuo Hou,
Haotian Lin
, et al. (87 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlighting, extreme darkness, and night scenes. A notable total of 428 participants registered for the challenge, with 22 teams ultimately making valid submissions. This paper meticulously evaluates the state-of-the-art advancements in enhancing low-light images, reflecting the significant progress and creativity in this field.
Submitted 22 April, 2024;
originally announced April 2024.
-
Multimodal 3D Object Detection on Unseen Domains
Authors:
Deepti Hegde,
Suhas Lohit,
Kuan-Chuan Peng,
Michael J. Jones,
Vishal M. Patel
Abstract:
LiDAR datasets for autonomous driving exhibit biases in properties such as point cloud density, range, and object dimensions. As a result, object detection networks trained and evaluated in different environments often experience performance degradation. Domain adaptation approaches assume access to unannotated samples from the test distribution to address this problem. However, in the real world, the exact conditions of deployment and access to samples representative of the test dataset may be unavailable while training. We argue that the more realistic and challenging formulation is to require robustness in performance to unseen target domains. We propose to address this problem in a two-pronged manner. First, we leverage paired LiDAR-image data present in most autonomous driving datasets to perform multimodal object detection. We suggest that working with multimodal features by leveraging both images and LiDAR point clouds for scene understanding tasks results in object detectors more robust to unseen domain shifts. Second, we train a 3D object detector to learn multimodal object features across different distributions and promote feature invariance across these source domains to improve generalizability to unseen target domains. To this end, we propose CLIX$^\text{3D}$, a multimodal fusion and supervised contrastive learning framework for 3D object detection that performs alignment of object features from same-class samples of different domains while pushing the features from different classes apart. We show that CLIX$^\text{3D}$ yields state-of-the-art domain generalization performance under multiple dataset shifts.
Submitted 17 April, 2024;
originally announced April 2024.
-
Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
Authors:
Deepti Hegde,
Suhas Lohit,
Kuan-Chuan Peng,
Michael J. Jones,
Vishal M. Patel
Abstract:
Popular representation learning methods encourage feature invariance under transformations applied at the input. However, in 3D perception tasks like object localization and segmentation, outputs are naturally equivariant to some transformations, such as rotation. Using pre-training loss functions that encourage equivariance of features under certain transformations provides a strong self-supervision signal while also retaining information about geometric relationships between transformed feature representations. This can enable improved performance in downstream tasks that are equivariant to such transformations. In this paper, we propose a spatio-temporal equivariant learning framework that considers spatial and temporal augmentations jointly. Our experiments show that the best performance arises with a pre-training approach that encourages equivariance to translation, scaling, flip, rotation, and scene flow. For spatial augmentations, we find that, depending on the transformation, either a contrastive objective or an equivariance-by-classification objective yields the best results. To leverage real-world object deformations and motion, we consider sequential LiDAR scene pairs and develop a novel 3D scene flow-based equivariance objective that leads to improved performance overall. We show that our pre-training method for 3D object detection outperforms existing equivariant and invariant approaches in many settings.
Submitted 17 April, 2024;
originally announced April 2024.
-
Poles of unramified degenerate Eisenstein series
Authors:
Devadatta G. Hegde
Abstract:
We determine the poles of maximal unramified degenerate Eisenstein series of a split semisimple algebraic group over a number field using a straightforward global argument, avoiding delicate analysis of intertwining operators.
Submitted 15 May, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Machine Learning (ML)-assisted Beam Management in millimeter (mm)Wave Distributed Multiple Input Multiple Output (D-MIMO) systems
Authors:
Karthik R M,
Dhiraj Nagaraja Hegde,
Muris Sarajlic,
Abhishek Sarkar
Abstract:
Beam management (BM) protocols are critical for establishing and maintaining connectivity between network radio nodes and User Equipments (UEs). In Distributed Multiple Input Multiple Output (D-MIMO) systems, a number of access points (APs), coordinated by a central processing unit (CPU), serve a number of UEs. At mmWave frequencies, the problem of finding the best AP and beam to serve the UEs is challenging due to the large number of beams that need to be sounded with Downlink (DL) reference signals. The objective of this paper is to investigate whether the best AP/beam can be reliably inferred by sounding only a small subset of beams and using AI/ML to infer the best beam/AP. We use Random Forest (RF), MissForest (MF) and conditional Generative Adversarial Networks (c-GAN) to demonstrate the performance benefits of inference.
Submitted 30 December, 2023;
originally announced January 2024.
-
Rotation of a Stealth CME on 2012 October 5 Observed in the Inner Heliosphere
Authors:
Sandeep Kumar,
Dinesha V. Hegde,
Nandita Srivastava,
Nikolai V. Pogorelov,
Nat Gopalswamy,
Seiji Yashiro
Abstract:
Coronal Mass Ejections (CMEs) are subject to changes in their direction of propagation, tilt, and other properties. This is because CMEs interact with the ambient solar wind and other large-scale magnetic field structures. In this work, we report on the observations of the 2012 October 5 stealth CME using coronagraphic and heliospheric images. We find clear evidence of a continuous rotation of the CME, i.e., an increase in the tilt angle, estimated using the Graduated Cylindrical Shell (GCS) reconstruction at different heliocentric distances, up to 58 solar radii. We find a further increase in the tilt at L1, estimated from toroidal and cylindrical flux rope fitting of the in situ observations of IMF and solar wind parameters. This study highlights the importance of observations from the Heliospheric Imager (HI) onboard the Solar TErrestrial RElations Observatory (STEREO). In particular, the GCS reconstruction of CMEs in the HI field of view promises to bridge the gap between near-Sun and in situ observations at L1. Changes in the CME tilt have significant implications for the space weather impact of stealth CMEs.
Submitted 6 October, 2023;
originally announced October 2023.
-
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Authors:
Deepti Hegde,
Jeya Maria Jose Valanarasu,
Vishal M. Patel
Abstract:
Vision-Language models like CLIP have been widely adopted for various tasks due to their impressive zero-shot capabilities. However, CLIP is not suitable for extracting 3D geometric features, as it was trained only on images and text with natural language supervision. We address this limitation and propose a new framework termed CG3D (CLIP Goes 3D), in which a 3D encoder is learned to exhibit zero-shot capabilities. CG3D is trained on triplets of point clouds, corresponding rendered 2D images, and texts using natural language supervision. To align the features in a multimodal embedding space, we utilize a contrastive loss on 3D features obtained from the 3D encoder, as well as visual and text features extracted from CLIP. We note that the natural images used to train CLIP and the rendered 2D images in CG3D have a distribution shift. Attempting to train the visual and text encoders to account for this shift results in catastrophic forgetting and a notable decrease in performance. To solve this, we employ prompt tuning and introduce trainable parameters in the input space to shift CLIP towards the 3D pre-training dataset utilized in CG3D. We extensively test our pre-trained CG3D framework and demonstrate its impressive capabilities in zero-shot, open scene understanding, and retrieval tasks. Further, it also provides strong starting weights for fine-tuning in downstream 3D recognition tasks.
Submitted 18 April, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection
Authors:
Deepti Hegde,
Vishal M. Patel
Abstract:
3D object detection networks tend to be biased towards the data they are trained on. Evaluation on datasets captured in different locations or conditions, or with different sensors, than the training (source) data results in a drop in model performance due to the gap in distribution with the test (or target) data. Current methods for domain adaptation either assume access to source data during training, which may not be available due to privacy or memory concerns, or require a sequence of lidar frames as input. We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors that uses class prototypes to mitigate the effect of pseudo-label noise. Addressing the limitations of traditional feature aggregation methods for prototype computation in the presence of noisy labels, we utilize a transformer module to identify outlier ROIs that correspond to incorrect, over-confident annotations, and compute an attentive class prototype. Under an iterative training strategy, the losses associated with noisy pseudo-labels are down-weighted and thus refined in the process of self-training. To validate the effectiveness of our proposed approach, we examine the domain shift associated with networks trained on large, label-rich datasets (such as the Waymo Open Dataset and nuScenes) and evaluated on smaller, label-poor datasets (such as KITTI), and vice versa. We demonstrate our approach on two recent object detectors and achieve results that outperform other domain adaptation methods.
Submitted 1 December, 2021; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Uncertainty-aware Mean Teacher for Source-free Unsupervised Domain Adaptive 3D Object Detection
Authors:
Deepti Hegde,
Vishwanath Sindagi,
Velat Kilic,
A. Brinton Cooper,
Mark Foster,
Vishal Patel
Abstract:
Pseudo-label-based self-training approaches are a popular method for source-free unsupervised domain adaptation. However, their efficacy depends on the quality of the labels generated by the source-trained model. These labels may be incorrect with high confidence, rendering thresholding methods ineffective. In order to avoid reinforcing errors caused by label noise, we propose an uncertainty-aware mean teacher framework which implicitly filters incorrect pseudo-labels during training. Leveraging model uncertainty allows the mean teacher network to perform implicit filtering by down-weighting losses corresponding to uncertain pseudo-labels. Effectively, we perform automatic soft-sampling of pseudo-labeled data while aligning predictions from the student and teacher networks. We demonstrate our method on several domain adaptation scenarios, from cross-dataset to cross-weather conditions, and achieve state-of-the-art performance in these cases on the KITTI lidar target dataset.
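The two ingredients named above, a mean teacher and uncertainty-based down-weighting of losses, can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the exponential moving average is the usual mean-teacher update, while the weighting `exp(-uncertainty)` and the function names `ema_update` and `soft_weighted_loss` are illustrative choices of ours, not taken from the paper.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher step: teacher weights track an exponential moving
    average of the student weights (both given as flat lists of floats)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def soft_weighted_loss(losses, uncertainties):
    """Down-weight per-sample losses by uncertainty.

    Assumed weighting (not from the paper): w = exp(-uncertainty), so that
    highly uncertain pseudo-labels contribute almost nothing, which acts as
    a soft filter rather than a hard confidence threshold.
    """
    weights = [math.exp(-u) for u in uncertainties]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Toy values: the second pseudo-label is very uncertain, so its large loss
# barely influences the weighted average.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.9)
loss = soft_weighted_loss([1.0, 3.0], [0.0, 50.0])
print(teacher, loss)
```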
Submitted 29 September, 2021;
originally announced September 2021.
-
Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection
Authors:
Velat Kilic,
Deepti Hegde,
Vishwanath Sindagi,
A. Brinton Cooper,
Mark A. Foster,
Vishal M. Patel
Abstract:
Lidar-based object detectors are critical parts of the 3D perception pipeline in autonomous navigation systems such as self-driving cars. However, they are known to be sensitive to adverse weather conditions such as rain, snow and fog due to reduced signal-to-noise ratio (SNR) and signal-to-background ratio (SBR). As a result, lidar-based object detectors trained on data captured in normal weather tend to perform poorly in such scenarios. However, collecting and labelling sufficient training data in a diverse range of adverse weather conditions is laborious and prohibitively expensive. To address this issue, we propose a physics-based approach to simulate lidar point clouds of scenes in adverse weather conditions. These augmented datasets can then be used to train lidar-based detectors to improve their all-weather reliability. Specifically, we introduce a hybrid Monte-Carlo based approach that treats (i) the effects of large particles by placing them randomly and comparing their back reflected power against the target, and (ii) attenuation effects on average through calculation of scattering efficiencies from the Mie theory and particle size distributions. Retraining networks with this augmented data improves mean average precision evaluated on real world rainy scenes and we observe greater improvement in performance with our model relative to existing models from the literature. Furthermore, we evaluate recent state-of-the-art detectors on the simulated weather conditions and present an in-depth analysis of their performance.
Submitted 14 July, 2021;
originally announced July 2021.
-
Schwartz functions, Hadamard products, and the Dixmier-Malliavin theorem
Authors:
Devadatta G. Hegde
Abstract:
In this paper we show that functions of the form $\prod_{n\ge1}\frac{1}{\left(1+\frac{x^{2}}{a_{n}^{2}}\right)}$ where $a_{n}>0$ and $\sum_{n\ge1}\frac{1}{a_{n}^{2}}<\infty$ are in the Schwartz space of the real line, answering a question raised by Casselman. As a consequence we obtain substantial simplifications in the proofs of Dixmier and Malliavin of their theorem that every test function on a Lie group is a finite linear combination of convolutions of two test functions, and an analogue of this for Fréchet space Lie group representations.
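A concrete instance may help fix ideas (this example is ours, not taken from the paper). Taking $a_{n}=n$, which satisfies $\sum_{n\ge1}\frac{1}{n^{2}}<\infty$, Euler's product formula for the hyperbolic sine gives

$$\prod_{n\ge1}\frac{1}{\left(1+\frac{x^{2}}{n^{2}}\right)}=\frac{\pi x}{\sinh(\pi x)}.$$

The right-hand side behaves like $2\pi|x|e^{-\pi|x|}$ as $|x|\to\infty$, and its derivatives decay exponentially as well, so in this special case membership in the Schwartz space can be checked directly.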
Submitted 31 March, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving
Authors:
Sudeep Fadadu,
Shreyash Pandey,
Darshan Hegde,
Yi Shi,
Fang-Chieh Chou,
Nemanja Djuric,
Carlos Vallespi-Gonzalez
Abstract:
We present an end-to-end method for object detection and trajectory prediction utilizing multi-view representations of LiDAR returns and camera images. In this work, we recognize the strengths and weaknesses of different view representations, and we propose an efficient and generic fusing method that aggregates benefits from all views. Our model builds on a state-of-the-art Bird's-Eye View (BEV) network that fuses voxelized features from a sequence of historical LiDAR data as well as a rasterized high-definition map to perform detection and prediction tasks. We extend this model with additional LiDAR Range-View (RV) features that use the raw LiDAR information in its native, non-quantized representation. The RV feature map is projected into BEV and fused with the BEV features computed from LiDAR and the high-definition map. The fused features are then further processed to output the final detections and trajectories, within a single end-to-end trainable network. In addition, the RV fusion of LiDAR and camera is performed in a straightforward and computationally efficient manner using this framework. The proposed multi-view fusion approach improves on the state of the art on proprietary large-scale real-world data collected by a fleet of self-driving vehicles, as well as on the public nuScenes data set, with minimal increase in computational cost.
Submitted 18 October, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation
Authors:
Gregory P. Meyer,
Jake Charland,
Darshan Hegde,
Ankit Laddha,
Carlos Vallespi-Gonzalez
Abstract:
In this paper, we present an extension to LaserNet, an efficient and state-of-the-art LiDAR-based 3D object detector. We propose a method for fusing image data with the LiDAR data and show that this sensor fusion method improves the detection performance of the model, especially at long ranges. The addition of image data is straightforward and does not require image labels. Furthermore, we expand the capabilities of the model to perform 3D semantic segmentation in addition to 3D object detection. On a large benchmark dataset, we demonstrate that our approach achieves state-of-the-art performance on both object detection and semantic segmentation while maintaining a low runtime.
Submitted 25 April, 2019;
originally announced April 2019.
-
Temporal Analysis of Language through Neural Language Models
Authors:
Yoon Kim,
Yi-I Chiu,
Kentaro Hanaki,
Darshan Hegde,
Slav Petrov
Abstract:
We provide a method for automatically detecting change in language across time through a chronologically trained neural language model. We train the model on the Google Books Ngram corpus to obtain word vector representations specific to each year, and identify words that have changed significantly from 1900 to 2009. The model identifies words such as "cell" and "gay" as having changed during that time period. The model simultaneously identifies the specific years during which such words underwent change.
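One simple way to compare year-specific word vectors is to rank words by how far each word's vector has moved between two years. The sketch below assumes cosine similarity as the comparison measure, which the abstract does not specify; the function names and the two-dimensional toy vectors are made up purely for illustration and are not results from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_change(vectors_start, vectors_end):
    """Return shared words ordered from most changed to least changed,
    i.e. by ascending similarity between the two years' vectors."""
    scores = {
        word: cosine_similarity(vectors_start[word], vectors_end[word])
        for word in vectors_start
        if word in vectors_end
    }
    return sorted(scores, key=scores.get)


# Made-up toy vectors: "cell" rotates a lot, "tree" hardly moves.
toy_1900 = {"cell": [1.0, 0.0], "tree": [0.0, 1.0]}
toy_2009 = {"cell": [0.0, 1.0], "tree": [0.1, 1.0]}
ranking = rank_by_change(toy_1900, toy_2009)
print(ranking)
```

Applying the same comparison to consecutive years, rather than only the endpoints, would indicate when a word's usage shifted.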
Submitted 14 May, 2014;
originally announced May 2014.