-
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
Authors:
Jia-Wei Liu,
Yan-Pei Cao,
Tianyuan Yang,
Eric Zhongcong Xu,
Jussi Keppo,
Ying Shan,
Xiaohu Qie,
Mike Zheng Shou
Abstract:
We introduce HOSNeRF, a novel 360° free-viewpoint rendering method that reconstructs neural radiance fields for dynamic human-object-scene from a single monocular in-the-wild video. Our method enables pausing the video at any frame and rendering all scene details (dynamic humans, objects, and backgrounds) from arbitrary viewpoints. The first challenge in this task is the complex object motions in human-object interactions, which we tackle by introducing new object bones into the conventional human skeleton hierarchy to effectively estimate large object deformations in our dynamic human-object model. The second challenge is that humans interact with different objects at different times, for which we introduce two new learnable object state embeddings that can be used as conditions for learning our human-object representation and scene representation, respectively. Extensive experiments show that HOSNeRF outperforms SOTA approaches on two challenging datasets by a large margin of 40% to 50% in terms of LPIPS. The code, data, and compelling examples of 360° free-viewpoint renderings from single videos will be released at https://showlab.github.io/HOSNeRF.
Submitted 24 April, 2023;
originally announced April 2023.
-
Indefinite and Bidirectional Near Infrared Nanocrystal Photoswitching
Authors:
Changhwan Lee,
Emma Z. Xu,
Kevin W. C. Kwock,
Ayelet Teitelboim,
Yawei Liu,
Natalie Fardian-Melamed,
Cassio C. S. Pedroso,
Hye Sun Park,
Jongwoo Kim,
Stefanie D. Pritzl,
Sang Hwan Nam,
Theobald Lohmueller,
Peter Ercius,
Yung Doug Suh,
Bruce E Cohen,
Emory M Chan,
P. James Schuck
Abstract:
Materials whose luminescence can be switched by optical stimulation drive technologies ranging from superresolution imaging [1-4], nanophotonics [5], and optical data storage [6-8], to targeted pharmacology, optogenetics, and chemical reactivity [9]. These photoswitchable probes, including organic fluorophores and proteins, are prone to photodegradation, and often require phototoxic doses of ultraviolet (UV) or visible light. Colloidal inorganic nanoparticles have significant stability advantages over existing photoswitchable materials, but the ability to switch emission bidirectionally, particularly with NIR light, has not been reported with nanoparticles. Here, we present 2-way, near-infrared (NIR) photoswitching of avalanching nanoparticles (ANPs), showing full optical control of upconverted emission using phototriggers in the NIR-I and NIR-II spectral regions useful for subsurface imaging. Employing single-step photodarkening [10-13] and photobrightening [12,14-18], we demonstrate indefinite photoswitching of individual nanoparticles (>1000 cycles over 7 h) in ambient or aqueous conditions without measurable photodegradation. Critical steps of the photoswitching mechanism are elucidated by modeling and by measuring the photon avalanche properties of single ANPs in both bright and dark states. Unlimited, reversible photoswitching of ANPs enables indefinitely rewritable 2D and 3D multi-level optical patterning of ANPs, as well as optical nanoscopy with sub-Å localization superresolution that allows us to distinguish individual ANPs within tightly packed clusters.
Submitted 13 September, 2022;
originally announced September 2022.
-
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022
Authors:
Kevin Qinghong Lin,
Alex Jinpeng Wang,
Mattia Soldan,
Michael Wray,
Rui Yan,
Eric Zhongcong Xu,
Difei Gao,
Rongcheng Tu,
Wenzhe Zhao,
Weijie Kong,
Chengfei Cai,
Hongfa Wang,
Dima Damen,
Bernard Ghanem,
Wei Liu,
Mike Zheng Shou
Abstract:
In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks: Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). In particular, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer Egocentric VLP along three directions: pretraining dataset, pretraining objective, and development set. Based on these three designs, we develop a pretrained video-language model that is able to transfer its egocentric video-text representation or video-only representation to several video downstream tasks. Our Egocentric VLP achieves 10.46 R@1 (IoU=0.3) on NLQ, 10.33 mAP on MQ, 74% Acc on OSCC, and 0.67 s error on PNR. The code is available at https://github.com/showlab/EgoVLP.
Submitted 3 August, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022
Authors:
Kevin Qinghong Lin,
Alex Jinpeng Wang,
Rui Yan,
Eric Zhongcong Xu,
Rongcheng Tu,
Yanru Zhu,
Wenzhe Zhao,
Weijie Kong,
Chengfei Cai,
Hongfa Wang,
Wei Liu,
Mike Zheng Shou
Abstract:
In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for the EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge. In particular, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer Egocentric VLP along three directions: pretraining dataset, pretraining objective, and development set. Based on these three designs, we develop a pretrained video-language model that is able to transfer its egocentric video-text representation to the MIR benchmark. Furthermore, we devise an adaptive multi-instance max-margin loss to effectively fine-tune the model and employ the dual-softmax technique for reliable inference. Our best single model obtains strong performance on the challenge test set with 47.39% mAP and 61.44% nDCG. The code is available at https://github.com/showlab/EgoVLP.
Submitted 3 August, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Egocentric Video-Language Pretraining
Authors:
Kevin Qinghong Lin,
Alex Jinpeng Wang,
Mattia Soldan,
Michael Wray,
Rui Yan,
Eric Zhongcong Xu,
Difei Gao,
Rongcheng Tu,
Wenzhe Zhao,
Weijie Kong,
Chengfei Cai,
Hongfa Wang,
Dima Damen,
Bernard Ghanem,
Wei Liu,
Mike Zheng Shou
Abstract:
Video-Language Pretraining (VLP), which aims to learn transferable representations to advance a wide range of video-text downstream tasks, has recently received increasing attention. The best-performing works rely on large-scale, 3rd-person video-text datasets, such as HowTo100M. In this work, we exploit the recently released Ego4D dataset to pioneer Egocentric VLP along three directions. (i) We create EgoClip, a 1st-person video-text pretraining dataset comprising 3.8M clip-text pairs carefully selected from Ego4D, covering a large variety of human daily activities. (ii) We propose a novel pretraining objective, dubbed EgoNCE, which adapts video-text contrastive learning to the egocentric domain by mining egocentric-aware positive and negative samples. (iii) We introduce EgoMCQ, a development benchmark that is close to EgoClip and hence can support effective validation and fast exploration of our design decisions in EgoClip and EgoNCE. Furthermore, we demonstrate strong performance on five egocentric downstream tasks across three datasets: video-text retrieval on EPIC-KITCHENS-100; action recognition on Charades-Ego; and natural language query, moment query, and object state change classification on Ego4D challenge benchmarks. The dataset and code are available at https://github.com/showlab/EgoVLP.
Submitted 12 October, 2022; v1 submitted 3 June, 2022;
originally announced June 2022.
-
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Authors:
Eric Zhongcong Xu,
Zeyang Song,
Satoshi Tsutsui,
Chao Feng,
Mang Ye,
Mike Zheng Shou
Abstract:
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor environments like meeting rooms or news studios, which are quite different from in-the-wild videos in many scenarios such as movies, documentaries, and audience sitcoms. To develop diarization methods for these challenging videos, we create the AVA Audio-Visual Diarization (AVA-AVD) dataset. Our experiments demonstrate that adding AVA-AVD to the training set can produce significantly better diarization models for in-the-wild videos, even though the dataset is relatively small. Moreover, this benchmark is challenging due to the diverse scenes, complicated acoustic conditions, and completely off-screen speakers. As a first step towards addressing these challenges, we design the Audio-Visual Relation Network (AVR-Net), which introduces a simple yet effective modality mask to capture discriminative information based on face visibility. Experiments show that our method not only outperforms state-of-the-art methods but is also more robust to varying ratios of off-screen speakers. Our data and code have been made publicly available at https://github.com/showlab/AVA-AVD.
Submitted 16 July, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Authors:
Kristen Grauman,
Andrew Westbury,
Eugene Byrne,
Zachary Chavis,
Antonino Furnari,
Rohit Girdhar,
Jackson Hamburger,
Hao Jiang,
Miao Liu,
Xingyu Liu,
Miguel Martin,
Tushar Nagarajan,
Ilija Radosavovic,
Santhosh Kumar Ramakrishnan,
Fiona Ryan,
Jayant Sharma,
Michael Wray,
Mengmeng Xu,
Eric Zhongcong Xu,
Chen Zhao,
Siddhant Bansal,
Dhruv Batra,
Vincent Cartillier,
Sean Crane,
Tien Do
, et al. (60 additional authors not shown)
Abstract:
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/
Submitted 11 March, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Surface oxidation and thermoelectric properties of indium-doped tin telluride nanowires
Authors:
Z. Li,
E. Z. Xu,
Y. Losovyj,
N. Li,
A. P. Chen,
B. Swartzentruber,
N. Sinitsyn,
J. K. Yoo,
Q. X. Jia,
S. X. Zhang
Abstract:
The recent discovery of excellent thermoelectric properties and topological surface states in SnTe-based compounds has attracted extensive attention in various research areas. Indium-doped SnTe is of particular interest because, depending on the doping level, it can either generate resonant states in the bulk valence band leading to enhanced thermoelectric properties, or induce superconductivity that coexists with topological states. Here we report on the vapor deposition of In-doped SnTe nanowires and the study of their surface oxidation and thermoelectric properties. The nanowire growth is assisted by Au catalysts, and their morphologies vary as a function of substrate position and temperature. Transmission electron microscopy characterization reveals the formation of an amorphous surface in single-crystalline nanowires. X-ray photoelectron spectroscopy studies suggest that the nanowire surface is composed of In2O3, SnO2, Te and TeO2, which can be readily removed by argon ion sputtering. Exposure of the cleaned nanowires to atmosphere yields rapid oxidation of the surface within only one minute. Characterizations of electrical conductivity σ, thermopower S, and thermal conductivity κ were performed on the same In-doped nanowire, which shows suppressed σ and κ but enhanced S, yielding a higher thermoelectric figure of merit ZT than undoped SnTe.
Submitted 2 August, 2017; v1 submitted 10 May, 2017;
originally announced May 2017.