Showing 1–27 of 27 results for author: Popescu, M

Searching in archive cs.
  1. arXiv:2406.18266  [pdf, other]

    cs.CL

    "Vorbeşti Româneşte?" A Recipe to Train Powerful Romanian LLMs with English Instructions

    Authors: Mihai Masala, Denis C. Ilie-Ablachim, Alexandru Dima, Dragos Corlatescu, Miruna Zavelca, Ovio Olaru, Simina Terian, Andrei Terian, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

    Abstract: In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English; hence, their performance in English greatly exceeds other languages. To our knowledge, we are the first to collect and translate a large collection of texts, instructions, and benchmarks and trai…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.07703

  2. arXiv:2405.07703  [pdf, other]

    cs.CL

    OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs

    Authors: Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

    Abstract: In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specia…

    Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2402.16197  [pdf]

    cs.SE cs.LG cs.PL

    Language Models for Code Completion: A Practical Evaluation

    Authors: Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

    Abstract: Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public code language models when completing real-world code. We first developed an open-source IDE extension, Code4Me, for the online evaluation of the models. We collect…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: To be published in the proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE 2024)

  4. arXiv:2306.12041  [pdf, other]

    cs.CV cs.LG

    Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

    Authors: Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level. The novelty of the proposed model is threefold. First, we introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects. Second, we integrate a teacher decoder and a student de…

    Submitted 9 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Accepted at CVPR 2024

  5. arXiv:2207.03477  [pdf, other]

    cs.CL

    VeriDark: A Large-Scale Benchmark for Authorship Verification on the Dark Web

    Authors: Andrei Manolache, Florin Brad, Antonio Barbalau, Radu Tudor Ionescu, Marius Popescu

    Abstract: The DarkWeb represents a hotbed for illicit activity, where users communicate on different market forums in order to exchange goods and services. Law enforcement agencies benefit from forensic tools that perform authorship analysis, in order to identify and profile users based on their textual content. However, authorship analysis has been traditionally studied using corpora featuring literary tex…

    Submitted 1 November, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks. 21 pages, 4 figures, 11 tables

  6. arXiv:2201.12216  [pdf, other]

    cs.CV cs.LG

    Self-paced learning to improve text row detection in historical documents with missing labels

    Authors: Mihaela Gaman, Lida Ghadamiyan, Radu Tudor Ionescu, Marius Popescu

    Abstract: An important preliminary step of optical character recognition systems is the detection of text rows. To address this task in the context of historical data with missing labels, we propose a self-paced learning algorithm capable of improving the row detection performance. We conjecture that pages with more ground-truth bounding boxes are less likely to have missing annotations. Based on this hypot…

    Submitted 15 August, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Accepted at ECCV Workshop on Text in Everything (TiE 2022)

  7. arXiv:2112.05125  [pdf, other]

    cs.CL

    Rethinking the Authorship Verification Experimental Setups

    Authors: Florin Brad, Andrei Manolache, Elena Burceanu, Antonio Barbalau, Radu Ionescu, Marius Popescu

    Abstract: One of the main drivers of the recent advances in authorship verification is the PAN large-scale authorship dataset. Despite generating significant progress in the field, inconsistent performance differences between the closed and open test sets have been reported. To this end, we improve the experimental setup by proposing five new public splits over the PAN dataset, specifically designed to isol…

    Submitted 1 November, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: Accepted as a short paper at the EMNLP 2022 conference. 10 pages, 5 figures, 9 tables

  8. arXiv:2109.01745  [pdf, other]

    cs.CV cs.LG

    A realistic approach to generate masked faces applied on two novel masked face recognition data sets

    Authors: Tudor Mare, Georgian Duta, Mariana-Iuliana Georgescu, Adrian Sandru, Bogdan Alexe, Marius Popescu, Radu Tudor Ionescu

    Abstract: The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing…

    Submitted 25 October, 2021; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Accepted at NeurIPS 2021

  9. arXiv:2107.05754  [pdf, other]

    cs.CR cs.CV cs.LG

    EvoBA: An Evolution Strategy as a Strong Baseline for Black-Box Adversarial Attacks

    Authors: Andrei Ilie, Marius Popescu, Alin Stefanescu

    Abstract: Recent work has shown how easily white-box adversarial attacks can be applied to state-of-the-art image classifiers. However, real-life scenarios resemble more the black-box adversarial conditions, lacking transparency and usually imposing natural, hard constraints on the query budget. We propose $\textbf{EvoBA}$, a black-box adversarial attack based on a surprisingly simple evolutionary search…

    Submitted 12 July, 2021; originally announced July 2021.

  10. arXiv:2011.07491  [pdf, other]

    cs.CV cs.LG eess.IV

    Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

    Authors: Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

    Abstract: Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. The…

    Submitted 10 September, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

    Comments: Accepted at CVPR 2021. Main paper and supplementary are both included

  11. arXiv:2010.11158  [pdf, other]

    cs.CV cs.LG cs.NE

    Black-Box Ripper: Copying black-box models using generative evolutionary algorithms

    Authors: Antonio Barbalau, Adrian Cosma, Radu Tudor Ionescu, Marius Popescu

    Abstract: We study the task of replicating the functionality of black-box neural models, for which we only know the output class probabilities provided for a set of input images. We assume back-propagation through the black-box model is not possible and its training images are not available, e.g. the model could be exposed only through an API. In this context, we present a teacher-student framework that can…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted as Oral at NeurIPS 2020

  12. arXiv:2010.11081  [pdf, other]

    eess.IV cs.CV

    Anatomically-Informed Deep Learning on Contrast-Enhanced Cardiac MRI for Scar Segmentation and Clinical Feature Extraction

    Authors: Haley G. Abramson, Dan M. Popescu, Rebecca Yu, Changxin Lai, Julie K. Shade, Katherine C. Wu, Mauro Maggioni, Natalia A. Trayanova

    Abstract: Visualizing disease-induced scarring and fibrosis in the heart on cardiac magnetic resonance (CMR) imaging with contrast enhancement (LGE) is paramount in characterizing disease progression and quantifying pathophysiological substrates of arrhythmias. However, segmentation and scar/fibrosis identification from LGE-CMR is an intensive manual process prone to large inter-observer variability. Here,…

    Submitted 8 January, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: Haley G. Abramson and Dan M. Popescu contributed equally to this work

  13. A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

    Authors: Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

    Abstract: Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly-adopted definition of an abnormal event, that is, a rarely occurring event that typically depends on the surrounding context. Following the standard formulation of abnormal event detection as outlier detection, we propo…

    Submitted 6 April, 2023; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence

  14. arXiv:2006.03896  [pdf, other]

    cs.LG stat.ML

    A Generic and Model-Agnostic Exemplar Synthetization Framework for Explainable AI

    Authors: Antonio Barbalau, Adrian Cosma, Radu Tudor Ionescu, Marius Popescu

    Abstract: With the growing complexity of deep learning methods adopted in practical applications, there is an increasing and stringent need to explain and interpret the decisions of such methods. In this work, we focus on explainable AI and propose a novel generic and model-agnostic framework for synthesizing input exemplars that maximize a desired response from a machine learning model. To this end, we use…

    Submitted 4 August, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: Accepted at ECML-PKDD 2020

  15. arXiv:2004.10605  [pdf, ps, other]

    cs.CV cs.LG eess.IV stat.ML

    Self-Supervised Representation Learning on Document Images

    Authors: Adrian Cosma, Mihai Ghidoveanu, Michael Panaitescu-Liess, Marius Popescu

    Abstract: This work analyses the impact of self-supervised pre-training on document images in the context of document image classification. While previous approaches explore the effect of self-supervision on natural images, we show that patch-based pre-training performs poorly on document images because of their different structural properties and poor intra-sample semantic information. We propose two conte…

    Submitted 27 May, 2020; v1 submitted 18 April, 2020; originally announced April 2020.

    Comments: 15 pages, 5 figures. Accepted at DAS 2020: IAPR International Workshop on Document Analysis Systems

    MSC Class: 68T05

  16. arXiv:1912.02259  [pdf, other]

    cs.CV

    Extending the Morphological Hit-or-Miss Transform to Deep Neural Networks

    Authors: Muhammad Aminul Islam, Bryce Murray, Andrew Buck, Derek T. Anderson, Grant Scott, Mihail Popescu, James Keller

    Abstract: While most deep learning architectures are built on convolution, alternative foundations like morphology are being explored for purposes like interpretability and its connection to the analysis and processing of geometric structures. The morphological hit-or-miss operation has the advantage that it takes into account both foreground and background information when evaluating target shape in an ima…

    Submitted 27 September, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

  17. Mega-Archive and the EURONEAR Tools for Datamining World Astronomical Images

    Authors: Ovidiu Vaduvescu, Lucian Curelaru, Marcel Popescu

    Abstract: The world astronomical image archives represent huge opportunities to time-domain astronomy sciences and other hot topics such as space defense, and astronomical observatories should improve this wealth and make it more accessible in the big data era. In 2010 we introduced the Mega-Archive database and the Mega-Precovery server for data mining images containing Solar system bodies, with focus on n…

    Submitted 21 May, 2019; originally announced May 2019.

    Comments: Paper submitted to Astronomy and Computing (25 Mar 2019)

    Report number: Accepted, available online 12 Dec 2019

    Journal ref: Astronomy and Computing, 2019

  18. Local Learning with Deep and Handcrafted Features for Facial Expression Recognition

    Authors: Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Marius Popescu

    Abstract: We present an approach that combines automatic features learned by convolutional neural networks (CNN) and handcrafted features computed by the bag-of-visual-words (BOVW) model in order to achieve state-of-the-art results in facial expression recognition. To obtain automatic features, we experiment with multiple CNN architectures, pre-trained models and training procedures, e.g. Dense-Sparse-Dense…

    Submitted 12 March, 2020; v1 submitted 29 April, 2018; originally announced April 2018.

    Comments: Accepted in IEEE Access

    Journal ref: in IEEE Access, vol. 7, pp. 64827-64836, 2019

  19. arXiv:1801.05030  [pdf, other]

    cs.CV

    Detecting abnormal events in video using Narrowed Normality Clusters

    Authors: Radu Tudor Ionescu, Sorina Smeureanu, Marius Popescu, Bogdan Alexe

    Abstract: We formulate the abnormal event detection problem as an outlier detection task and we propose a two-stage algorithm based on k-means clustering and one-class Support Vector Machines (SVM) to eliminate outliers. In the feature extraction stage, we propose to augment spatio-temporal cubes with deep appearance features extracted from the last convolutional layer of a pre-trained neural network. After…

    Submitted 16 November, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

    Comments: Accepted at WACV 2019. arXiv admin note: text overlap with arXiv:1705.08182

  20. arXiv:1707.08349  [pdf, other]

    cs.CL

    Can string kernels pass the test of time in Native Language Identification?

    Authors: Radu Tudor Ionescu, Marius Popescu

    Abstract: We describe a machine learning approach for the 2017 shared task on Native Language Identification (NLI). The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from essays or speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio record…

    Submitted 4 August, 2017; v1 submitted 26 July, 2017; originally announced July 2017.

    Comments: In Proceedings of the 12th Workshop on Building Educational Applications Using NLP, 2017

  21. arXiv:1705.08280  [pdf, other]

    cs.CV

    How hard can it be? Estimating the difficulty of visual search in an image

    Authors: Radu Tudor Ionescu, Bogdan Alexe, Marius Leordeanu, Marius Popescu, Dim P. Papadopoulos, Vittorio Ferrari

    Abstract: We address the problem of estimating image difficulty defined as the human response time for solving a visual search task. We collect human annotations of image difficulty for the PASCAL VOC 2012 data set through a crowd-sourcing platform. We then analyze what human interpretable image properties can have an impact on visual search difficulty, and how accurate are those properties for predicting d…

    Submitted 23 May, 2017; originally announced May 2017.

    Comments: Published at CVPR 2016

    Journal ref: In Proceedings of CVPR, pp. 2157-2166, 2016

  22. arXiv:1705.08182  [pdf, other]

    cs.CV

    Unmasking the abnormal events in video

    Authors: Radu Tudor Ionescu, Sorina Smeureanu, Bogdan Alexe, Marius Popescu

    Abstract: We propose a novel framework for abnormal event detection in video that requires no training sequences. Our framework is based on unmasking, a technique previously used for authorship verification in text documents, which we adapt to our task. We iteratively train a binary classifier to distinguish between two consecutive video sequences while removing at each step the most discriminant features.…

    Submitted 25 July, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: Accepted at the 2017 International Conference on Computer Vision (ICCV 2017)

  23. arXiv:1509.03591  [pdf]

    cs.DC

    High Performance Computer Acoustic Data Accelerator: A New System for Exploring Marine Mammal Acoustics for Big Data Applications

    Authors: Peter Dugan, John Zollweg, Marian Popescu, Denise Risch, Herve Glotin, Yann LeCun, Christopher Clark

    Abstract: This paper presents a new software model designed for distributed sonic signal detection runtime using machine learning algorithms called DeLMA. A new algorithm--Acoustic Data-mining Accelerator (ADA)--is also presented. ADA is a robust yet scalable solution for efficiently processing big sound archives using distributing computing technologies. Together, DeLMA and the ADA algorithm provide a powe…

    Submitted 11 September, 2015; originally announced September 2015.

    Comments: Seven pages, submitted at International Conference on Machine Learning 2014, Workshop uLearnBio, unsupervised learning for bioacoustic applications

    MSC Class: 68-04

  24. arXiv:1307.0414  [pdf, other]

    stat.ML cs.LG

    Challenges in Representation Learning: A report on three machine learning contests

    Authors: Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, et al. (3 additional authors not shown)

    Abstract: The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kin…

    Submitted 1 July, 2013; originally announced July 2013.

    Comments: 8 pages, 2 figures

  25. arXiv:1305.3635  [pdf]

    cs.CV

    Bioacoustic Signal Classification Based on Continuous Region Processing, Grid Masking and Artificial Neural Network

    Authors: Mohammad Pourhomayoun, Peter Dugan, Marian Popescu, Christopher Clark

    Abstract: In this paper, we develop a novel method based on machine-learning and image processing to identify North Atlantic right whale (NARW) up-calls in the presence of high levels of ambient and interfering noise. We apply a continuous region algorithm on the spectrogram to extract the regions of interest, and then use grid masking techniques to generate a small feature set that is then used in an artif…

    Submitted 17 June, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

    Comments: To be Submitted to "ICML 2013 Workshop on Machine Learning for Bioacoustics", 6 pages, 8 figures

  26. arXiv:1305.3633  [pdf]

    cs.CV

    Classification for Big Dataset of Bioacoustic Signals Based on Human Scoring System and Artificial Neural Network

    Authors: Mohammad Pourhomayoun, Peter Dugan, Marian Popescu, Denise Risch, Hal Lewis, Christopher Clark

    Abstract: In this paper, we propose a method to improve sound classification performance by combining signal features, derived from the time-frequency spectrogram, with human perception. The method presented herein exploits an artificial neural network (ANN) and learns the signal features based on the human perception knowledge. The proposed method is applied to a large acoustic dataset containing 24 months…

    Submitted 17 June, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

    Comments: To be Submitted to "ICML 2013 Workshop on Machine Learning for Bioacoustics", 6 pages, 4 figures

  27. arXiv:1305.3250  [pdf]

    cs.CV

    Bioacoustical Periodic Pulse Train Signal Detection and Classification using Spectrogram Intensity Binarization and Energy Projection

    Authors: Marian Popescu, Peter J. Dugan, Mohammad Pourhomayoun, Denise Risch, Harold W. Lewis III, Christopher W. Clark

    Abstract: The following work outlines an approach for automatic detection and recognition of periodic pulse train signals using a multi-stage process based on spectrogram edge detection, energy projection and classification. The method has been implemented to automatically detect and recognize pulse train songs of minke whales. While the long term goal of this work is to properly identify and detect minke s…

    Submitted 28 June, 2013; v1 submitted 14 May, 2013; originally announced May 2013.

    Comments: ICML 2013 Workshop on Machine Learning for Bioacoustics, 2013, 6 pages