Showing 1–18 of 18 results for author: Marcu, A

  1. arXiv:2406.10165

    cs.CV cs.RO

    CarLLaVA: Vision language models for camera-only closed-loop driving

    Authors: Katrin Renz, Long Chen, Ana-Maria Marcu, Jan Hünermann, Benoit Hanotte, Alice Karnsund, Jamie Shotton, Elahe Arani, Oleg Sinavski

    Abstract: In this technical report, we present CarLLaVA, a Vision Language Model (VLM) for autonomous driving, developed for the CARLA Autonomous Driving Challenge 2.0. CarLLaVA uses the vision encoder of the LLaVA VLM and the LLaMA architecture as backbone, achieving state-of-the-art closed-loop driving performance with only camera input and without the need for complex or expensive labels. Additionally, w…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Outstanding Champion & Innovation Award @ CARLA Autonomous Driving Challenge 2024; Project video: https://youtu.be/E1nsEgcHRuc

  2. arXiv:2402.06385

    cs.CV

    Maia: A Real-time Non-Verbal Chat for Human-AI Interaction

    Authors: Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu

    Abstract: Face-to-face communication modeling in computer vision is an area of research focusing on developing algorithms that can recognize and analyze non-verbal cues and behaviors during face-to-face interactions. We propose an alternative to text chats for Human-AI interaction, based on non-verbal visual communication only, using facial expressions and head movements that mirror, but also improvise over…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 5 pages, 3 figures

  3. arXiv:2312.14115

    cs.RO cs.AI cs.CV

    LingoQA: Video Question Answering for Autonomous Driving

    Authors: Ana-Maria Marcu, Long Chen, Jan Hünermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, Elahe Arani, Oleg Sinavski

    Abstract: Autonomous driving has long faced a challenge with public acceptance due to the lack of explainability in the decision-making process. Video question-answering (QA) in natural language provides the opportunity for bridging this gap. Nonetheless, evaluating the performance of Video QA models has proved particularly tough due to the absence of comprehensive benchmarks. To fill this gap, we introduce…

    Submitted 19 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Benchmark and dataset are available at https://github.com/wayveai/LingoQA/

  4. arXiv:2308.11021

    cs.CV cs.LG

    Multi-Task Hypergraphs for Semi-supervised Learning using Earth Observations

    Authors: Mihai Pirvu, Alina Marcu, Alexandra Dobrescu, Nabil Belbachir, Marius Leordeanu

    Abstract: There are many ways of interpreting the world and they are highly interdependent. We exploit such complex dependencies and introduce a powerful multi-task hypergraph, in which every node is a task and different paths through the hypergraph reaching a given task become unsupervised teachers, by forming ensembles that learn to generate reliable pseudolabels for that task. Each hyperedge is part of a…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted in ICCV 2023 Workshops

  5. arXiv:2308.07615

    cs.CV

    Self-supervised Hypergraphs for Learning Multiple World Interpretations

    Authors: Alina Marcu, Mihai Pirvu, Dragos Costea, Emanuela Haller, Emil Slusanschi, Ahmed Nabil Belbachir, Rahul Sukthankar, Marius Leordeanu

    Abstract: We present a method for learning multiple scene representations given a small labeled set, by exploiting the relationships between such representations in the form of a multi-task hypergraph. We also show how we can use the hypergraph to improve a powerful pretrained VisTransformer model without any additional labeled data. In our hypergraph, each node is an interpretation layer (e.g., depth or se…

    Submitted 21 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted in ICCV 2023 Workshops

  6. arXiv:2306.14709

    cs.CV cs.LG cs.RO

    Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving

    Authors: Alexandra Budisteanu, Dragos Costea, Alina Marcu, Marius Leordeanu

    Abstract: The task of generating novel views of real scenes is increasingly important nowadays when AI models become able to create realistic new worlds. In many practical applications, it is important for novel view synthesis methods to stay grounded in the physical world as much as possible, while also being able to imagine it from previously unseen views. While most current methods are developed and test…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: 11 pages, 3 figures

  7. arXiv:2211.13734

    cs.CV cs.LG

    On Pitfalls of Measuring Occlusion Robustness through Data Distortion

    Authors: Antonia Marcu

    Abstract: Over the past years, the crucial role of data has largely been shadowed by the field's focus on architectures and training procedures. We often cause changes to the data without being aware of their wider implications. In this paper we show that distorting images without accounting for the artefacts introduced leads to biased results when establishing occlusion robustness. To ensure models behave…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2111.11514

  8. arXiv:2202.07350

    cs.LG

    Generalisation and the Risk--Entropy Curve

    Authors: Dominic Belcher, Antonia Marcu, Adam Prügel-Bennett

    Abstract: In this paper we show that the expected generalisation performance of a learning machine is determined by the distribution of risks or equivalently its logarithm -- a quantity we term the risk entropy -- and the fluctuations in a quantity we call the training ratio. We show that the risk entropy can be empirically inferred for deep neural network models using Markov Chain Monte Carlo techniques. R…

    Submitted 15 February, 2022; originally announced February 2022.

  9. arXiv:2111.11514

    cs.LG

    On Data-centric Myths

    Authors: Antonia Marcu, Adam Prügel-Bennett

    Abstract: The community lacks theory-informed guidelines for building good data sets. We analyse theoretical directions relating to what aspects of the data matter and conclude that the intuitions derived from the existing literature are incorrect and misleading. Using empirical counter-examples, we show that 1) data dimension should not necessarily be minimised and 2) when manipulating data, preserving the…

    Submitted 22 November, 2021; originally announced November 2021.

    Comments: arXiv admin note: text overlap with arXiv:2110.13968

  10. arXiv:2110.13968

    cs.LG

    On the Effects of Artificial Data Modification

    Authors: Antonia Marcu, Adam Prügel-Bennett

    Abstract: Data distortion is commonly applied in vision models during both training (e.g. methods like MixUp and CutMix) and evaluation (e.g. shape-texture bias and robustness). This data modification can introduce artificial information. It is often assumed that the resulting artefacts are detrimental to training, whilst being negligible when analysing models. We investigate these assumptions and conclude t…

    Submitted 6 July, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  11. arXiv:2010.01910

    cs.CV

    Semantics through Time: Semi-supervised Segmentation of Aerial Videos with Iterative Label Propagation

    Authors: Alina Marcu, Vlad Licaret, Dragos Costea, Marius Leordeanu

    Abstract: Semantic segmentation is a crucial task for robot navigation and safety. However, current supervised methods require a large amount of pixelwise annotations to yield accurate results. Labeling is a tedious and time consuming process that has hampered progress in low altitude UAV applications. This paper makes an important step towards automatic annotation by introducing SegProp, a novel iterative…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: Accepted as oral presentation at Asian Conference on Computer Vision (ACCV), 2020. arXiv admin note: text overlap with arXiv:1910.10026

  12. arXiv:2010.01086

    cs.CV

    Semi-Supervised Learning for Multi-Task Scene Understanding by Neural Graph Consensus

    Authors: Marius Leordeanu, Mihai Pirvu, Dragos Costea, Alina Marcu, Emil Slusanschi, Rahul Sukthankar

    Abstract: We address the challenging problem of semi-supervised learning in the context of multiple visual interpretations of the world by finding consensus in a graph of neural networks. Each graph node is a scene interpretation layer, while each edge is a deep net that transforms one layer at one node into another from a different node. During the supervised phase edge networks are trained independently.…

    Submitted 3 December, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Accepted at the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)

  13. arXiv:2002.12047

    cs.LG cs.CV cs.IT stat.ML

    FMix: Enhancing Mixed Sample Data Augmentation

    Authors: Ethan Harris, Antonia Marcu, Matthew Painter, Mahesan Niranjan, Adam Prügel-Bennett, Jonathon Hare

    Abstract: Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. By studying the mutual information between the function learned by a VAE on the original data and on the augmented data we show that MixUp distorts learned functions in a way that CutMix does not. We further demonstrate this by showing that MixUp acts as…

    Submitted 28 February, 2021; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: Code available at https://github.com/ecs-vlc/FMix

  14. arXiv:1911.04301

    cs.LG stat.ML

    Rethinking Generalisation

    Authors: Antonia Marcu, Adam Prügel-Bennett

    Abstract: In this paper, a new approach to computing the generalisation performance is presented that assumes the distribution of risks, $ρ(r)$, for a learning scenario is known. From this, the expected error of a learning machine using empirical risk minimisation is computed for both classification and regression problems. A critical quantity in determining the generalisation performance is the power-law b…

    Submitted 26 March, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

  15. arXiv:1910.10026

    cs.CV

    Towards Automatic Annotation for Semantic Segmentation in Drone Videos

    Authors: Alina Marcu, Dragos Costea, Vlad Licaret, Marius Leordeanu

    Abstract: Semantic segmentation is a crucial task for robot navigation and safety. However, it requires huge amounts of pixelwise annotations to yield accurate results. While recent progress in computer vision algorithms has been heavily boosted by large ground-level datasets, the labeling time has hampered progress in low altitude UAV applications, mostly due to the difficulty imposed by large object scale…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: 7 pages, 6 figures, submitted at the International Conference on Robotics and Automation (ICRA) 2020

  16. arXiv:1804.01322

    cs.CV

    A Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation and Geolocalization

    Authors: Alina Marcu, Dragos Costea, Emil Slusanschi, Marius Leordeanu

    Abstract: Semantic segmentation and vision-based geolocalization in aerial images are challenging tasks in computer vision. Due to the advent of deep convolutional nets and the availability of relatively low cost UAVs, they are currently generating a growing attention in the field. We propose a novel multi-task multi-stage neural network that is able to handle the two problems at the same time, in a single…

    Submitted 4 April, 2018; originally announced April 2018.

    Comments: 23 pages, 11 figures. Under review at the 15th European Conference on Computer Vision (ECCV 2018)

  17. arXiv:1607.05620

    cs.CV

    A Local-Global Approach to Semantic Segmentation in Aerial Images

    Authors: Alina Elena Marcu

    Abstract: Aerial images are often taken under poor lighting conditions and contain low resolution objects, many times occluded by other objects. In this domain, visual context could be of great help, but there are still very few papers that consider context in aerial image understanding, and it still remains an open problem in computer vision. We propose a dual-stream deep neural network that processes informat…

    Submitted 19 July, 2016; originally announced July 2016.

    Comments: 50 pages, 18 figures. Master's Thesis, University Politehnica of Bucharest

  18. arXiv:1605.05462

    cs.CV

    Dual Local-Global Contextual Pathways for Recognition in Aerial Imagery

    Authors: Alina Marcu, Marius Leordeanu

    Abstract: Visual context is important in object recognition and it is still an open problem in computer vision. Along with the advent of deep convolutional neural networks (CNN), using contextual information with such systems starts to receive attention in the literature. At the same time, aerial imagery is gaining momentum. While advances in deep learning make good progress in aerial image analysis, this p…

    Submitted 18 May, 2016; originally announced May 2016.