Search | arXiv e-print repository

An Open Software Suite for Event-Based Video

Abstract: While traditional video representations are organized around discrete image frames, event-based video is a new paradigm that forgoes image frames altogether. Rather, pixel samples are temporally asynchronous and independent of one another. Until now, researchers have lacked a cohesive software framework for exploring the representation, compression, and applications of event-based video. I present the ADΔER software suite to fill this gap. This framework includes utilities for transcoding framed and multimodal event-based video sources to a common representation, rate control mechanisms, lossy compression, application support, and an interactive GUI for transcoding and playback. In this paper, I describe these various software components and their usage.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.09515

Enhancing Surveillance Camera FOV Quality via Semantic Line Detection and Classification with Deep Hough Transform

Authors: Andrew C. Freeman, Wenjing Shi, Bin Hwang

Abstract: The quality of recorded videos and images is significantly influenced by the camera's field of view (FOV). In critical applications like surveillance systems and self-driving cars, an inadequate FOV can give rise to severe safety and security concerns, including car accidents and thefts due to the failure to detect individuals and objects. The conventional methods for establishing the correct FOV heavily rely on human judgment and lack automated mechanisms to assess video and image quality based on FOV. In this paper, we introduce an innovative approach that harnesses semantic line detection and classification alongside deep Hough transform to identify semantic lines, thus ensuring a suitable FOV by understanding 3D view through parallel lines. Our approach yields an effective F1 score of 0.729 on the public EgoCart dataset, coupled with a notably high median score in the line placement metric. We illustrate that our method offers a straightforward means of assessing the quality of the camera's field of view, achieving a classification accuracy of 83.8%. This metric can serve as a proxy for evaluating the potential performance of video and image quality applications.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Appeared in the WACV 2024 Workshop on Image/Video/Audio Quality in Computer Vision and Generative AI

arXiv:2312.08213

Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems

Authors: Andrew C. Freeman, Ketan Mayer-Patel, Montek Singh

Abstract: The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Since it is not straightforward for applications to extract information on temporal redundancy from the compressed video representations, we propose a novel system which conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADDER to transcode framed videos to sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset, and we show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV, but only process pixels that receive new asynchronous events, rather than process every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.

Submitted 8 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted for publication in the proceedings of ACM Multimedia Systems '24

arXiv:2312.07523

Self-Healing Distributed Swarm Formation Control Using Image Moments

Authors: C. Lin Liu, Israel L. Donato Ridgley, Matthew L. Elwin, Michael Rubenstein, Randy A. Freeman, Kevin M. Lynch

Abstract: Human-swarm interaction is facilitated by a low-dimensional encoding of the swarm formation, independent of the (possibly large) number of robots. We propose using image moments to encode two-dimensional formations of robots. Each robot knows its pose and the desired formation moments, and simultaneously estimates the current moments of the entire swarm while controlling its motion to better achieve the desired group moments. The estimator is a distributed optimization, requiring no centralized processing, and self-healing, meaning that the process is robust to initialization errors, packet drops, and robots being added to or removed from the swarm. Our experimental results with a swarm of 50 robots, suffering nearly 50% packet loss, show that distributed estimation and control of image moments effectively achieves desired swarm formations.

Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.11183

doi 10.1109/LRA.2024.3360020

Deploying and Evaluating LLMs to Program Service Mobile Robots

Authors: Zichao Hu, Francesca Lucchetti, Claire Schlesinger, Yash Saxena, Anders Freeman, Sadanand Modak, Arjun Guha, Joydeep Biswas

Abstract: Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contribute CodeBotler, an open-source robot-agnostic tool to program service mobile robots from natural language, and RoboEval, a benchmark for evaluating LLMs' capabilities of generating programs to complete service robot tasks. CodeBotler performs program generation via few-shot prompting of LLMs with an embedded domain-specific language (eDSL) in Python, and leverages skill abstractions to deploy generated programs on any general-purpose mobile robot. RoboEval evaluates the correctness of generated programs by checking execution traces starting with multiple initial states, and checking whether the traces satisfy temporal logic properties that encode correctness for each task. RoboEval also includes multiple prompts per task to test for the robustness of program generation. We evaluate several popular state-of-the-art LLMs with the RoboEval benchmark, and perform a thorough analysis of the modes of failures, resulting in a taxonomy that highlights common pitfalls of LLMs at generating robot programs. We release our code and benchmark at https://amrl.cs.utexas.edu/codebotler/.

Submitted 21 February, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: 8 pages, Accepted at IEEE Robotics and Automation Letters (RA-L)

Journal ref: IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2853-2860, March 2024

arXiv:2308.09895

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Authors: Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha

Abstract: Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available. Low resource languages include OCaml, Racket, and several others. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, MultiPL-T, translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate Python code to a target low-resource language, and use tests to validate the translation. We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket. Furthermore, we use an open model (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done.
With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket. On established benchmarks (MultiPL-E), these models outperform other open Code LLMs. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.

Submitted 10 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2308.07246

Self-Healing First-Order Distributed Optimization with Packet Loss

Authors: Israel L. Donato Ridgley, Randy A. Freeman, Kevin M. Lynch

Abstract: We describe SH-SVL, a parameterized family of first-order distributed optimization algorithms that enable a network of agents to collaboratively calculate a decision variable that minimizes the sum of cost functions at each agent. These algorithms are self-healing in that their convergence to the correct optimizer can be guaranteed even if they are initialized randomly, agents join or leave the network, or local cost functions change. We also present simulation evidence that our algorithms are self-healing in the case of dropped communication packets. Our algorithms are the first single-Laplacian methods for distributed convex optimization to exhibit all of these characteristics. We achieve self-healing by sacrificing internal stability, a fundamental trade-off for single-Laplacian methods.

Submitted 14 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2104.01959

arXiv:2301.08783

doi 10.1145/3587819.3590969

An Asynchronous Intensity Representation for Framed and Event Video Sources

Authors: Andrew C. Freeman, Montek Singh, Ketan Mayer-Patel

Abstract: Neuromorphic "event" cameras, designed to mimic the human vision system with asynchronous sensing, unlock a new realm of high-speed and high dynamic range applications. However, researchers often either revert to a framed representation of event data for applications, or build bespoke applications for a particular camera's event data type. To usher in the next era of video systems, accommodate new event camera designs, and explore the benefits to asynchronous video in classical applications, we argue that there is a need for an asynchronous, source-agnostic video representation. In this paper, we introduce a novel, asynchronous intensity representation for both framed and non-framed data sources. We show that our representation can increase intensity precision and greatly reduce the number of samples per pixel compared to grid-based representations. With framed sources, we demonstrate that by permitting a small amount of loss through the temporal averaging of similar pixel values, we can reduce our representational sample rate by more than half, while incurring a drop in VMAF quality score of only 4.5. We also demonstrate lower latency than the state-of-the-art method for fusing and transcoding framed and event camera data to an intensity representation, while maintaining $2000\times$ the temporal resolution. We argue that our method provides the computational efficiency and temporal granularity necessary to build real-time intensity-based applications for event cameras.

Submitted 20 January, 2023; originally announced January 2023.

Comments: 10 pages

arXiv:2206.14293

doi 10.1109/LRA.2022.3226366

Human-Multirobot Collaborative Mobile Manipulation: the Omnid Mocobots

Authors: Matthew L. Elwin, Billie Strong, Randy A. Freeman, Kevin M. Lynch

Abstract: The Omnid human-collaborative mobile manipulators are an experimental platform for testing control architectures for autonomous and human-collaborative multirobot mobile manipulation. An Omnid consists of a mecanum-wheel omnidirectional mobile base and a series-elastic Delta-type parallel manipulator, and it is a specific implementation of a broader class of mobile collaborative robots ("mocobots") suitable for safe human co-manipulation of delicate, flexible, and articulated payloads. Key features of mocobots include passive compliance, for the safety of the human and the payload, and high-fidelity end-effector force control independent of the potentially imprecise motions of the mobile base. We describe general considerations for the design of teams of mocobots; the design of the Omnids in light of these considerations; manipulator and mobile base controllers to achieve useful multirobot collaborative behaviors; and initial experiments in human-multirobot collaborative mobile manipulation of large, unwieldy payloads. For these experiments, the only communication among the humans and Omnids is mechanical, through the payload.

Submitted 29 November, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: 8 pages, 10 figures. Videos available at https://www.youtube.com/watch?v=SEuFfONryL0. Submitted to IEEE Robotics and Automation Letters (RA-L)

Journal ref: IEEE Robotics and Automation Letters (RA-L), January 2023, Volume 8, Issue 1, Pages 376-383, ISSN 2377-3766

arXiv:2104.01959

doi 10.1109/CDC45484.2021.9683487

Self-Healing First-Order Distributed Optimization

Authors: Israel L. Donato Ridgley, Randy A. Freeman, Kevin M. Lynch

Abstract: In this paper we describe a parameterized family of first-order distributed optimization algorithms that enable a network of agents to collaboratively calculate a decision variable that minimizes the sum of cost functions at each agent. These algorithms are self-healing in that their correctness is guaranteed even if they are initialized randomly, agents drop in or out of the network, local cost functions change, or communication packets are dropped. Our algorithms are the first single-Laplacian methods to exhibit all of these characteristics. We achieve self-healing by sacrificing internal stability, a fundamental trade-off for single-Laplacian methods.

Submitted 12 April, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

Comments: Corrected equation (40) by changing "min" to "max", results unaffected

Journal ref: 2021 60th IEEE Conference on Decision and Control (CDC), Austin, TX, USA, 2021, pp. 3850-3856

arXiv:2012.09982

doi 10.1121/10.0004221

Deep embedded clustering of coral reef bioacoustics

Authors: Emma Ozanich, Aaron Thode, Peter Gerstoft, Lauren A. Freeman, Simon Freeman

Abstract: Deep clustering was applied to unlabeled, automatically detected signals in a coral reef soundscape to distinguish fish pulse calls from segments of whale song. Deep embedded clustering (DEC) learned latent features and formed classification clusters using fixed-length power spectrograms of the signals. Handpicked spectral and temporal features were also extracted and clustered with Gaussian mixture models (GMM) and conventional clustering. DEC, GMM, and conventional clustering were tested on simulated datasets of fish pulse calls (fish) and whale song units (whale) with randomized bandwidth, duration, and SNR. Both GMM and DEC achieved high accuracy and identified clusters with fish, whale, and overlapping fish and whale signals. Conventional clustering methods had low accuracy in scenarios with unequal-sized clusters or overlapping signals. Fish and whale signals recorded near Hawaii in February-March 2020 were clustered with DEC, GMM, and conventional clustering. DEC features demonstrated the highest accuracy of 77.5% on a small, manually labeled dataset for classifying signals into fish and whale clusters.

Submitted 21 March, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: to appear in Journal of the Acoustical Society of America, April 2021

Journal ref: Journal of the Acoustical Society of America 149 (2021) 2587-2601

Showing 1–11 of 11 results for author: Freeman, A