Showing 1–3 of 3 results for author: Corlatescu, D

  1. arXiv:2406.18266  [pdf, other]

    cs.CL

    "Vorbeşti Româneşte?" A Recipe to Train Powerful Romanian LLMs with English Instructions

    Authors: Mihai Masala, Denis C. Ilie-Ablachim, Alexandru Dima, Dragos Corlatescu, Miruna Zavelca, Ovio Olaru, Simina Terian, Andrei Terian, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

    Abstract: In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English; hence, their performance in English greatly exceeds other languages. To our knowledge, we are the first to collect and translate a large collection of texts, instructions, and benchmarks and trai…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.07703

  2. arXiv:2405.07703  [pdf, other]

    cs.CL

    OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs

    Authors: Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

    Abstract: In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specia…

    Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2310.01835  [pdf, other]

    cs.LG

    EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis

    Authors: Dragos Georgian Corlatescu, Alexandru Dinu, Mihaela Gaman, Paul Sumedrea

    Abstract: In recent years there has been a shift from heuristics-based malware detection towards machine learning, which proves to be more robust in the current heavily adversarial threat landscape. While we acknowledge machine learning to be better equipped to mine for patterns in the increasingly high amounts of similar-looking files, we also note a remarkable scarcity of the data available for similarity…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks