-
Motion-Guided Masking for Spatiotemporal Representation Learning
Authors:
David Fan,
Jue Wang,
Shuai Liao,
Yi Zhu,
Vimal Bhat,
Hector Santos-Villalobos,
Rohith MV,
Xinyu Li
Abstract:
Several recent works have directly extended the image masked autoencoder (MAE) with random masking into the video domain, achieving promising results. However, unlike images, both spatial and temporal information are important for video understanding. This suggests that the random masking strategy inherited from the image MAE is less effective for video MAE. This motivates the design of a novel masking algorithm that can more efficiently make use of video saliency. Specifically, we propose a motion-guided masking algorithm (MGM) which leverages motion vectors to guide the position of each mask over time. Crucially, these motion-based correspondences can be directly obtained from information stored in the compressed format of the video, which makes our method efficient and scalable. On two challenging large-scale video benchmarks (Kinetics-400 and Something-Something V2), we equip video MAE with our MGM and achieve up to +$1.3\%$ improvement compared to previous state-of-the-art methods. Additionally, our MGM achieves equivalent performance to previous video MAE using up to $66\%$ fewer training epochs. Lastly, we show that MGM generalizes better to downstream transfer learning and domain adaptation tasks on the UCF101, HMDB51, and Diving48 datasets, achieving up to +$4.9\%$ improvement compared to baseline methods.
Submitted 24 August, 2023;
originally announced August 2023.
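The idea of letting motion vectors guide the position of a mask over time can be illustrated with a minimal sketch. It assumes a single rectangular mask on a patch grid and one (dy, dx) displacement per frame transition that has already been extracted (for example, from the compressed video stream). The function name `motion_guided_masks` and all of its parameters are hypothetical; this is not the authors' MGM implementation.

```python
import numpy as np

def motion_guided_masks(motion, box, grid_hw):
    """Shift one rectangular mask across frames by following motion vectors.

    motion  : (T-1, 2) array of per-transition (dy, dx) displacements,
              in patch units (assumed to be given).
    box     : (top, left, height, width) of the mask in the first frame.
    grid_hw : (H, W) size of the patch grid.
    Returns a boolean array of shape (T, H, W); True marks a masked patch.
    """
    H, W = grid_hw
    top, left, h, w = box
    T = len(motion) + 1
    masks = np.zeros((T, H, W), dtype=bool)
    y, x = float(top), float(left)
    for t in range(T):
        if t > 0:
            dy, dx = motion[t - 1]
            y += dy
            x += dx
        # Round to the patch grid and keep the mask fully inside the frame.
        r = int(np.clip(np.rint(y), 0, H - h))
        c = int(np.clip(np.rint(x), 0, W - w))
        masks[t, r:r + h, c:c + w] = True
    return masks

# Hypothetical example: the mask drifts right twice, then down once.
motion = np.array([[0, 1], [0, 1], [1, 0]])
masks = motion_guided_masks(motion, box=(2, 2, 3, 3), grid_hw=(8, 8))
```

Because the mask follows the motion, the same moving region stays hidden in every frame, in contrast to random masking, which picks positions independently of the content.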
-
MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation
Authors:
Najmeh Sadoughi,
Xinyu Li,
Avijit Vajpayee,
David Fan,
Bing Shuai,
Hector Santos-Villalobos,
Vimal Bhat,
Rohith MV
Abstract:
Previous research has studied the task of segmenting cinematic videos into scenes and into narrative acts. However, these studies have overlooked the essential task of multimodal alignment and fusion for effectively and efficiently processing long-form videos (>60min). In this paper, we introduce Multimodal alignmEnt aGgregation and distillAtion (MEGA) for cinematic long-video segmentation. MEGA tackles the challenge by leveraging multiple media modalities. The method coarsely aligns inputs of variable lengths and different modalities with alignment positional encoding. To maintain temporal synchronization while reducing computation, we further introduce an enhanced bottleneck fusion layer which uses temporal alignment. Additionally, MEGA employs a novel contrastive loss to synchronize and transfer labels across modalities, enabling act segmentation from labeled synopsis sentences on video shots. Our experimental results show that MEGA outperforms state-of-the-art methods on the MovieNet dataset for scene segmentation (with an Average Precision improvement of +1.19%) and on the TRIPOD dataset for act segmentation (with a Total Agreement improvement of +5.51%).
Submitted 22 August, 2023;
originally announced August 2023.
-
Model-based Reconstruction for Multi-Frequency Collimated Beam Ultrasound Systems
Authors:
Abdulrahman M. Alanazi,
Singanallur Venkatakrishnan,
Hector Santos-Villalobos,
Gregery T. Buzzard,
Charles Bouman
Abstract:
Collimated beam ultrasound systems are a technology for imaging inside multi-layered structures such as geothermal wells. These systems work by using a collimated narrow-band ultrasound transmitter that can penetrate through multiple layers of heterogeneous material. A series of measurements can then be made at multiple transmit frequencies. However, commonly used reconstruction algorithms such as Synthetic Aperture Focusing Technique (SAFT) tend to produce poor quality reconstructions for these systems because they neither model collimated beam systems nor jointly reconstruct the multiple frequencies.
In this paper, we propose a multi-frequency ultrasound model-based iterative reconstruction (UMBIR) algorithm designed for multi-frequency collimated beam ultrasound systems. The combined system targets reflective imaging of heterogeneous, multi-layered structures. For each transmitted frequency band, we introduce a physics-based forward model to accurately account for the propagation of the collimated narrow-band ultrasonic beam through the multi-layered media. We then show how the joint multi-frequency UMBIR reconstruction can be computed by modeling the direct arrival signals and detector noise and by incorporating a spatially varying image prior. Results using both simulated and experimental data indicate that multi-frequency UMBIR reconstruction yields much higher reconstruction quality than either single frequency UMBIR or SAFT.
Submitted 28 November, 2022;
originally announced November 2022.
-
Expanding Accurate Person Recognition to New Altitudes and Ranges: The BRIAR Dataset
Authors:
David Cornett III,
Joel Brogan,
Nell Barber,
Deniz Aykac,
Seth Baird,
Nick Burchfield,
Carl Dukes,
Andrew Duncan,
Regina Ferrell,
Jim Goddard,
Gavin Jager,
Matt Larson,
Bart Murphy,
Christi Johnson,
Ian Shelley,
Nisha Srinivas,
Brandon Stockwell,
Leanne Thompson,
Matt Yohe,
Robert Zhang,
Scott Dolvin,
Hector J. Santos-Villalobos,
David S. Bolme
Abstract:
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-range R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
Submitted 3 November, 2022;
originally announced November 2022.
-
Model-Based Reconstruction for Collimated Beam Ultrasound Systems
Authors:
Abdulrahman Alanazi,
Singanallur Venkatakrishnan,
Hector Santos-Villalobos,
Gregery Buzzard,
Charles Bouman
Abstract:
Collimated beam ultrasound systems are a novel technology for imaging inside multi-layered structures such as geothermal wells. Such systems include a transmitter and multiple receivers to capture reflected signals. Common algorithms for ultrasound reconstruction use delay-and-sum (DAS) approaches; these have low computational complexity but produce inaccurate images in the presence of complex structures and specialized geometries such as collimated beams.
In this paper, we propose a multi-layer, ultrasonic, model-based iterative reconstruction algorithm designed for collimated beam systems. We introduce a physics-based forward model to accurately account for the propagation of a collimated ultrasonic beam in multi-layer media and describe an efficient implementation using binary search. We model direct arrival signals, detector noise, and a spatially varying image prior, then cast the reconstruction as a maximum a posteriori estimation problem. Using simulated and experimental data we obtain significantly fewer artifacts relative to DAS while running in near real time using commodity compute resources.
Submitted 19 February, 2022;
originally announced February 2022.
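For orientation, a maximum a posteriori (MAP) reconstruction is commonly written in the generic form below. This sketch assumes a linear forward model $A$, measurements $y$, and i.i.d. Gaussian detector noise with variance $\sigma^2$; the symbols are illustrative rather than the paper's notation, and the paper's specific forward model, direct-arrival term, and spatially varying prior are not reproduced here.

```latex
\hat{x}_{\mathrm{MAP}}
  = \arg\max_{x} \; p(x \mid y)
  = \arg\min_{x} \left\{ \frac{1}{2\sigma^{2}} \, \lVert y - A x \rVert_2^{2} \; - \; \log p(x) \right\}
```

The first term measures how well the forward model explains the data, and the second term encodes the image prior.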
-
The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition
Authors:
Max Ruby,
David S. Bolme,
Joel Brogan,
David Cornett III,
Baldemar Delgado,
Gavin Jager,
Christi Johnson,
Jose Martinez-Mendoza,
Hector Santos-Villalobos,
Nisha Srinivas
Abstract:
Face recognition of vehicle occupants through windshields in unconstrained environments poses a number of unique challenges, including glare, poor illumination, driver pose, and motion blur. In this paper, we further develop the hardware and software components of a custom vehicle imaging system to better overcome these challenges. After the build-out of a physical prototype system that performs High Dynamic Range (HDR) imaging, we collect a small dataset of through-windshield image captures of known drivers. We then re-formulate the classical Mertens-Kautz-Van Reeth HDR fusion algorithm as a pre-initialized neural network, which we name the Mertens Unrolled Network (MU-Net), for the purpose of fine-tuning the HDR output of through-windshield images. Reconstructed faces from this novel HDR method are then evaluated and compared against other traditional and experimental HDR methods in a pre-trained state-of-the-art (SOTA) facial recognition pipeline, verifying the efficacy of our approach.
Submitted 27 February, 2020;
originally announced February 2020.
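As a rough illustration of the exposure-fusion idea that the abstract builds on, the sketch below blends a stack of differently exposed grayscale images using per-pixel weights. It is heavily simplified: it uses only a well-exposedness weight (a Gaussian around mid-gray) and naive per-pixel averaging, omitting the contrast and saturation weights and the multi-resolution blending of the classical algorithm. The function name `fuse_exposures` and its parameters are hypothetical, and this is not MU-Net.

```python
import numpy as np

def fuse_exposures(stack, sigma=0.2, eps=1e-12):
    """Blend K grayscale exposures with well-exposedness weights.

    stack : (K, H, W) array with intensities in [0, 1].
    sigma : width of the Gaussian centered on mid-gray (0.5).
    Returns the fused (H, W) image.
    """
    stack = np.asarray(stack, dtype=float)
    # Pixels close to mid-gray get high weight; eps avoids division by zero.
    weights = np.exp(-((stack - 0.5) ** 2) / (2.0 * sigma ** 2)) + eps
    # Normalize so the weights at each pixel sum to one across exposures.
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * stack).sum(axis=0)

# Hypothetical example: an under-exposed and a well-exposed constant image.
dark = np.full((4, 4), 0.1)
good = np.full((4, 4), 0.5)
fused = fuse_exposures([dark, good])
```

In this example the fused result lands much closer to the well-exposed image than to the dark one, because the dark image receives a small weight at every pixel.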
-
Model-Based Iterative Reconstruction for One-Sided Ultrasonic Non-Destructive Evaluation
Authors:
Hani Almansouri,
Singanallur Venkatakrishnan,
Charles Bouman,
Hector Santos-Villalobos
Abstract:
One-sided ultrasonic non-destructive evaluation (UNDE) is extensively used to characterize structures that must be inspected and maintained to guard against defects and flaws that could affect the performance of power plants, such as nuclear power plants. Most UNDE systems send acoustic pulses into the structure of interest, measure the received waveform, and use an algorithm to reconstruct the quantity of interest. The most widely used algorithm in UNDE systems is the synthetic aperture focusing technique (SAFT) because it produces acceptable results in real time. A few regularized inversion techniques with linear models have been proposed which can improve on SAFT, but they tend to make simplifying assumptions that do not address how to obtain reconstructions from large real data sets. In this paper, we propose a model-based iterative reconstruction (MBIR) algorithm designed for scanning UNDE systems. To further reduce some of the artifacts in the results, we enhance the forward model to account for the transmitted beam profile, the occurrence of direct arrival signals, and the correlation between scans from adjacent regions. Next, we combine the forward model with a spatially variant prior model to account for the attenuation of deeper regions. We also present an algorithm to jointly reconstruct measurements from large data sets. Finally, using simulated and extensive experimental data, we show MBIR results and demonstrate how we can improve over SAFT as well as existing regularized inversion techniques.
Submitted 9 August, 2018;
originally announced August 2018.
-
Deep neural networks for non-linear model-based ultrasound reconstruction
Authors:
Hani Almansouri,
S. V. Venkatakrishnan,
Gregery T. Buzzard,
Charles A. Bouman,
Hector Santos-Villalobos
Abstract:
Ultrasound reflection tomography is widely used to image large complex specimens that are only accessible from a single side, such as well systems and nuclear power plant containment walls. Typical methods for inverting the measurement rely on delay-and-sum algorithms that rapidly produce reconstructions but with significant artifacts. Recently, model-based reconstruction approaches using a linear forward model have been shown to significantly improve image quality compared to the conventional approach. However, even these techniques result in artifacts for complex objects because of the inherent non-linearity of the ultrasound forward model.
In this paper, we propose a non-iterative model-based reconstruction method for inverting measurements that are based on non-linear forward models for ultrasound imaging. Our approach involves obtaining an approximate estimate of the reconstruction using a simple linear back-projection and training a deep neural network to refine this to the actual reconstruction. We apply our method to simulated ultrasound data and demonstrate dramatic improvements in image quality compared to the delay-and-sum approach and the linear model-based reconstruction approach.
Submitted 28 September, 2018; v1 submitted 3 July, 2018;
originally announced July 2018.
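The delay-and-sum baseline mentioned in the abstract can be sketched in a few lines. This minimal version assumes one transmitter, several receivers, a homogeneous medium with constant sound speed, and nearest-sample lookup without apodization or interpolation. The function name `delay_and_sum` and the toy geometry are hypothetical and are not taken from the paper.

```python
import numpy as np

def delay_and_sum(signals, fs, c, tx, rxs, pixels):
    """Back-project received traces onto pixels by round-trip travel time.

    signals : (R, N) array, one trace per receiver.
    fs      : sampling rate in Hz.
    c       : assumed constant speed of sound in m/s.
    tx      : (2,) transmitter position; rxs : (R, 2); pixels : (P, 2).
    Returns a length-P array of summed amplitudes.
    """
    n_samples = signals.shape[1]
    image = np.zeros(len(pixels))
    d_tx = np.linalg.norm(pixels - tx, axis=1)      # transmitter -> pixel
    for r, rx in enumerate(rxs):
        d_rx = np.linalg.norm(pixels - rx, axis=1)  # pixel -> receiver
        idx = np.rint((d_tx + d_rx) / c * fs).astype(int)
        valid = idx < n_samples                     # ignore late arrivals
        image[valid] += signals[r, idx[valid]]
    return image

# Toy example: one point scatterer produces a unit impulse on each trace.
fs, c = 1e6, 1500.0
tx = np.array([0.0, 0.0])
rxs = np.array([[-0.01, 0.0], [0.01, 0.0], [0.02, 0.0]])
pixels = np.array([[x, z] for z in (0.02, 0.03, 0.04)
                   for x in (-0.01, 0.0, 0.01)])
target = 4                                          # pixel at (0.0, 0.03)
signals = np.zeros((len(rxs), 200))
for r, rx in enumerate(rxs):
    path = (np.linalg.norm(pixels[target] - tx)
            + np.linalg.norm(pixels[target] - rx))
    signals[r, int(np.rint(path / c * fs))] = 1.0
image = delay_and_sum(signals, fs, c, tx, rxs, pixels)
```

All three traces add coherently at the scatterer's pixel. Partial sums at other pixels are the kind of artifact that model-based and learned reconstructions aim to suppress.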