Search | arXiv e-print repository

Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts

Authors: Kastan Day, Daniel Christl, Rohan Salvi, Pranav Sriram

Abstract: We present Video Pre-trained Transformer. VPT uses four SOTA encoder models from prior work to convert a video into a sequence of compact embeddings. Our backbone, based on a reference Flan-T5-11B architecture, learns a universal representation of the video that is a non-linear sum of the encoder models. It learns using an autoregressive causal language modeling loss by predicting the words spoken in YouTube videos. Finally, we evaluate on standard downstream benchmarks by training fully connected prediction heads for each task. To the best of our knowledge, this is the first use of multiple frozen SOTA models as encoders in an "embedding -> backbone -> prediction head" design pattern - all others have trained their own joint encoder models. Additionally, we include more modalities than the current SOTA, Merlot Reserve, by adding explicit Scene Graph information. For these two reasons, we believe it could combine the world's best open-source models to achieve SOTA performance. Initial experiments demonstrate the model is learning appropriately, but more experimentation and compute is necessary, and already in progress, to realize our loftier goals. Alongside this work, we build on the YT-20M dataset, reproducing it and adding 25,000 personally selected YouTube videos to its corpus. All code and model checkpoints are open sourced under a standard MIT license.

Submitted 24 March, 2023; originally announced April 2023.

arXiv:2207.04028 [pdf, other]

CoCAtt: A Cognitive-Conditioned Driver Attention Dataset (Supplementary Material)

Authors: Yuan Shen, Niviru Wijayaratne, Pranav Sriram, Aamir Hasan, Peter Du, Katherine Driggs-Campbell

Abstract: The task of driver attention prediction has drawn considerable interest among researchers in robotics and the autonomous vehicle industry. Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events, like collisions and casualties. However, existing driver attention prediction models neglect the distraction state and intention of the driver, which can significantly influence how they observe their surroundings. To address these issues, we present a new driver attention dataset, CoCAtt (Cognitive-Conditioned Attention). Unlike previous driver attention datasets, CoCAtt includes per-frame annotations that describe the distraction state and intention of the driver. In addition, the attention data in our dataset is captured in both manual and autopilot modes using eye-tracking devices of different resolutions. Our results demonstrate that incorporating the above two driver states into attention modeling can improve the performance of driver attention prediction. To the best of our knowledge, this work is the first to provide autopilot attention data. Furthermore, CoCAtt is currently the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios. CoCAtt is available for download at https://cocatt-dataset.github.io.

Submitted 8 July, 2022; originally announced July 2022.

Comments: Supplementary Material for the main paper, "CoCAtt: A Cognitive-Conditioned Driver Attention Dataset". Accepted at ITSC2022

arXiv:2202.13427 [pdf]

doi 10.1109/ICRA46639.2022.9811632

Meta-path Analysis on Spatio-Temporal Graphs for Pedestrian Trajectory Prediction

Authors: Aamir Hasan, Pranav Sriram, Katherine Driggs-Campbell

Abstract: Spatio-temporal graphs (ST-graphs) have been used to model time series tasks such as traffic forecasting, human motion modeling, and action recognition. The high-level structure and corresponding features from ST-graphs have led to improved performance over traditional architectures. However, current methods tend to be limited by simple features, despite the rich information provided by the full graph structure, which leads to inefficiencies and suboptimal performance in downstream tasks. We propose the use of features derived from meta-paths, walks across different types of edges, in ST-graphs to improve the performance of Structural Recurrent Neural Network. In this paper, we present the Meta-path Enhanced Structural Recurrent Neural Network (MESRNN), a generic framework that can be applied to any spatio-temporal task in a simple and scalable manner. We employ MESRNN for pedestrian trajectory prediction, utilizing these meta-path based features to capture the relationships between the trajectories of pedestrians at different points in time and space. We compare our MESRNN against state-of-the-art ST-graph methods on standard datasets to show the performance boost provided by meta-path information. The proposed model consistently outperforms the baselines in trajectory prediction over long time horizons by over 32\%, and produces more socially compliant trajectories in dense crowds. For more information please refer to the project website at https://sites.google.com/illinois.edu/mesrnn/home.

Submitted 27 February, 2022; originally announced February 2022.

Journal ref: ICRA 2022

arXiv:2202.05829 [pdf, other]

doi 10.1103/PhysRevB.105.195303

Clean quantum point contacts in an InAs quantum well grown on a lattice-mismatched InP substrate

Authors: Connie L. Hsueh, Praveen Sriram, Tiantian Wang, Candice Thomas, Geoffrey Gardner, Marc A. Kastner, Michael J. Manfra, David Goldhaber-Gordon

Abstract: Strong spin-orbit coupling, the resulting large $g$ factor, and small effective mass make InAs an attractive material platform for inducing topological superconductivity. The surface Fermi level pinning in the conduction band enables highly transparent ohmic contact without excessive doping. We investigate electrostatically defined quantum point contacts (QPCs) in a deep-well InAs two-dimensional electron gas. Despite the 3.3% lattice mismatch between the InAs quantum well and the InP substrate, we report clean QPCs with up to eight pronounced quantized conductance plateaus at zero magnetic field. Source-drain dc bias spectroscopy reveals a harmonic confinement potential with a nearly $5$ meV subband spacing. We find a many-body exchange interaction enhancement for the out-of-plane $g$ factor $|g_{\perp}^*| = 27 \pm 1$, whereas the in-plane $g$ factor is isotropic $|g^*_{x}| = |g^*_{y}| = 12 \pm 2$, close to the bulk value for InAs.

Submitted 14 May, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

Comments: Main text (8 pages, 5 figures), Supplemental Material (10 pages, 10 figures)

Journal ref: Phys. Rev. B 105, 195303 (2022)
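As a rough numerical companion to the abstract above: for a spin-degenerate quantum point contact at zero magnetic field, the ideal conductance plateaus are expected at integer multiples of 2e^2/h, and a harmonic confinement gives equally spaced subbands. The sketch below computes both under those textbook assumptions. It is illustrative only and is not taken from the paper; the function names are hypothetical, and the 5.0 meV spacing is a rounding of the abstract's "nearly 5 meV".

```python
# Exact SI values (2019 redefinition) of the elementary charge and Planck constant.
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
PLANCK_CONSTANT = 6.62607015e-34     # joule-seconds

# Conductance quantum of one spin-degenerate channel, 2e^2/h, in siemens.
CONDUCTANCE_QUANTUM = 2 * ELEMENTARY_CHARGE ** 2 / PLANCK_CONSTANT


def plateau_conductances(num_plateaus):
    """Ideal plateau values G_n = n * 2e^2/h for n = 1..num_plateaus (siemens)."""
    return [n * CONDUCTANCE_QUANTUM for n in range(1, num_plateaus + 1)]


def harmonic_subband_energies(spacing_mev, num_subbands):
    """Subband bottoms (n + 1/2) * spacing for harmonic confinement, in meV.

    Energies are measured from the minimum of the confining potential
    (an assumption made here for illustration).
    """
    return [(n + 0.5) * spacing_mev for n in range(num_subbands)]


if __name__ == "__main__":
    # Eight plateaus, matching the count quoted in the abstract.
    for n, g in enumerate(plateau_conductances(8), start=1):
        print(f"plateau {n}: {g * 1e6:.2f} microsiemens")
    # Assumed spacing of 5.0 meV (the abstract says "nearly 5 meV").
    print(harmonic_subband_energies(5.0, 3))
```

Real devices deviate from these ideal values through series resistance and finite temperature, so the numbers are a reference grid rather than a prediction of the measured traces.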

arXiv:2111.10014 [pdf, other]

CoCAtt: A Cognitive-Conditioned Driver Attention Dataset

Authors: Yuan Shen, Niviru Wijayaratne, Pranav Sriram, Aamir Hasan, Peter Du, Katie Driggs-Campbell

Abstract: The task of driver attention prediction has drawn considerable interest among researchers in robotics and the autonomous vehicle industry. Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events, like collisions and casualties. However, existing driver attention prediction models neglect the distraction state and intention of the driver, which can significantly influence how they observe their surroundings. To address these issues, we present a new driver attention dataset, CoCAtt (Cognitive-Conditioned Attention). Unlike previous driver attention datasets, CoCAtt includes per-frame annotations that describe the distraction state and intention of the driver. In addition, the attention data in our dataset is captured in both manual and autopilot modes using eye-tracking devices of different resolutions. Our results demonstrate that incorporating the above two driver states into attention modeling can improve the performance of driver attention prediction. To the best of our knowledge, this work is the first to provide autopilot attention data. Furthermore, CoCAtt is currently the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios.

Submitted 23 November, 2021; v1 submitted 18 November, 2021; originally announced November 2021.

Comments: 10 pages, 5 figures

arXiv:2103.08857 [pdf, other]

doi 10.1088/1361-648X/ac0d16

Role of dephasing on the conductance signatures of Majorana zero modes

Authors: Chaitrali Duse, Praveen Sriram, Kaveh Gharavi, Jonathan Baugh, Bhaskaran Muralidharan

Abstract: Conductance signatures that signal the presence of Majorana zero modes in a three terminal nanowire-topological superconductor hybrid system are analyzed in detail, in both the clean nanowire limit and in the presence of non-coherent dephasing interactions. In the coherent transport regime for a clean wire, we point out contributions of the local Andreev reflection and the non-local transmissions toward the total conductance lineshapes while clarifying the role of contact broadening on the Majorana conductance lineshapes at the magnetic field parity crossings. Interestingly, at larger $B$-field parity crossings, the contribution of the Andreev reflection process decreases which is compensated by the non-local processes in order to maintain the conductance quantum regardless of contact coupling strength. In the non-coherent transport regime, we include dephasing that is introduced by momentum randomization processes, that allows one to smoothly transition to the diffusive limit. Here, as expected, we note that while the Majorana character of the zero modes is unchanged, there is a reduction in the conductance peak magnitude that scales with the strength of the impurity scattering potentials. Important distinctions between the effect of non-coherent dephasing processes and contact-induced tunnel broadenings in the coherent regime on the conductance lineshapes are elucidated. Most importantly our results reveal that the addition of dephasing in the set up does not lead to any notable length dependence to the conductance of the zero modes, contrary to what one would expect in a gradual transition to the diffusive limit. We believe this work paves a way for a systematic introduction of scattering processes into the realistic modeling of Majorana nanowire hybrid devices and assessing topological signatures in such systems in the presence of non-coherent scattering processes.

Submitted 17 April, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: 17 pages, 14 figures

arXiv:1902.10947 [pdf, other]

doi 10.1103/PhysRevB.100.155431

Supercurrent Interference in Semiconductor Nanowire Josephson Junctions

Authors: Praveen Sriram, Sandesh S Kalantre, Kaveh Gharavi, Jonathan Baugh, Bhaskaran Muralidharan

Abstract: Semiconductor-superconductor hybrid systems provide a promising platform for hosting unpaired Majorana fermions towards the realisation of fault-tolerant topological quantum computing. In this study, we employ the Keldysh Non-Equilibrium Green's function formalism to model quantum transport in normal-superconductor junctions. We analyze III-V semiconductor nanowire Josephson junctions (InAs/Nb) using a three-dimensional discrete lattice model described by the Bogolubov-de Gennes Hamiltonian in the tight-binding approximation, and compute the Andreev bound state spectrum and current-phase relations. Recent experiments [Zuo et al., Phys. Rev. Lett. 119, 187704 (2017)] and [Gharavi et al., arXiv:1405.7455v2 (2014)] reveal critical current oscillations in these devices, and our simulations confirm these to be an interference effect of the transverse sub-bands in the nanowire. We add disorder to model coherent scattering and study its effect on the critical current oscillations, with an aim to gain a thorough understanding of the experiments. The oscillations in the disordered junction are highly sensitive to the particular realisation of the random disorder potential, and to the gate voltage. A macroscopic current measurement thus gives us information about the microscopic profile of the junction. Finally, we study dephasing in the channel by including elastic phase-breaking interactions. The oscillations thus obtained are in good qualitative agreement with the experimental data, and this signifies the essential role of phase-breaking processes in III-V semiconductor nanowire Josephson junctions.

Submitted 14 October, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

Comments: 21 pages, 18 figures, Accepted for publication in Physical Review B

Journal ref: Phys. Rev. B 100, 155431 (2019)

arXiv:1901.01015 [pdf, other]

Vehicle Re-Identification: an Efficient Baseline Using Triplet Embedding

Authors: Ratnesh Kumar, Edwin Weill, Farzin Aghdasi, Parthsarathy Sriram

Abstract: In this paper we tackle the problem of vehicle re-identification in a camera network utilizing triplet embeddings. Re-identification is the problem of matching appearances of objects across different cameras. With the proliferation of surveillance cameras enabling smart and safer cities, there is an ever-increasing need to re-identify vehicles across cameras. Typical challenges arising in smart city scenarios include variations of viewpoints, illumination and self occlusions. Most successful approaches for re-identification involve (deep) learning an embedding space such that the vehicles of same identities are projected closer to one another, compared to the vehicles representing different identities. Popular loss functions for learning an embedding (space) include contrastive or triplet loss. In this paper we provide an extensive evaluation of these losses applied to vehicle re-identification and demonstrate that using the best practices for learning embeddings outperform most of the previous approaches proposed in the vehicle re-identification literature. Compared to most existing state-of-the-art approaches, our approach is simpler and more straightforward for training utilizing only identity-level annotations, along with one of the smallest published embedding dimensions for efficient inference. Furthermore in this work we introduce a formal evaluation of a triplet sampling variant (batch sample) into the re-identification literature.

Submitted 8 August, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

Comments: Accepted at IJCNN 2019. This arxiv version adds result on newer datasets post conference submission
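The triplet loss mentioned in the abstract above can be sketched in a few lines. This is the generic margin-based formulation, not the paper's training code or its "batch sample" variant. The Euclidean distance, the margin of 0.2, and the function names are assumptions chosen for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed hyperparameter value).
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def mean_triplet_loss(triplets, margin=0.2):
    """Average the triplet loss over an iterable of triples."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    same_vehicle = [0.0, 1.0]   # distance 1 from the anchor
    other_vehicle = [3.0, 4.0]  # distance 5 from the anchor
    # Well-separated triple: the loss vanishes.
    print(triplet_loss(anchor, same_vehicle, other_vehicle))
    # Swapped roles: the "positive" is farther away, so the loss is positive.
    print(triplet_loss(anchor, other_vehicle, same_vehicle))
```

How triples are chosen from a batch matters a great deal in practice. This sketch leaves sampling out entirely, and that is the part the abstract says the paper evaluates.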

arXiv:1602.00508 [pdf, other]

doi 10.1103/PhysRevA.96.032339

A Bayesian view of Single-Qubit Clocks, and an Energy versus Accuracy tradeoff

Authors: Manoj Gopalkrishnan, Varshith Kandula, Praveen Sriram, Abhishek Deshpande, Bhaskaran Muralidharan

Abstract: We bring a Bayesian approach to the analysis of clocks. Using exponential distributions as priors for clocks, we analyze how well one can keep time with a single qubit freely precessing under a magnetic field. We find that, at least with a single qubit, quantum mechanics does not allow exact timekeeping, in contrast to classical mechanics which does. We find the design of the single-qubit clock that leads to maximum accuracy. Further, we find an energy versus accuracy tradeoff --- the energy cost is at least $k_BT$ times the improvement in accuracy as measured by the entropy reduction in going from the prior distribution to the posterior distribution. We propose a physical realization of the single qubit clock using charge transport across a capacitively-coupled quantum dot.

Submitted 30 May, 2017; v1 submitted 1 February, 2016; originally announced February 2016.

Comments: v2: 9 pages, 5 figures. v1:6 pages, figures

Journal ref: Phys. Rev. A 96, 032339 (2017)
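The energy versus accuracy tradeoff stated in words in the abstract above can be written as a single inequality. The symbols below are assumptions introduced here for illustration and are not taken from the paper: $E$ is the energy cost, $S[\cdot]$ is the entropy of a distribution over time (assumed to be measured in nats), and $p_{\mathrm{prior}}$ and $p_{\mathrm{post}}$ are the prior and posterior distributions.

```latex
E \;\ge\; k_B T \,\bigl( S[p_{\mathrm{prior}}] - S[p_{\mathrm{post}}] \bigr)
```

Read this way, each nat of entropy removed from the estimate of the time costs at least $k_B T$ of energy.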

Showing 1–9 of 9 results for author: Sriram, P