Showing 1–6 of 6 results for author: Soselia, D

  1. arXiv:2406.07657  [pdf, other]

    cs.LG cs.CL

    OPTune: Efficient Online Preference Tuning

    Authors: Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang

    Abstract: Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have shown that the online variants achieve even better alignment. However, online alignment requires on-the-fly generation of new training data, which is…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures

  2. arXiv:2402.07319  [pdf, other]

    cs.LG cs.AI cs.CL

    ODIN: Disentangled Reward Mitigates Hacking in RLHF

    Authors: Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Tianyi Zhou, Tom Goldstein, Heng Huang, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose but less helpful response from the LLMs can often deceive LLMs or even human evaluators to achieve high scores. The same issue also holds for some reward models in RL. To address the challenges in both training and e…

    Submitted 11 February, 2024; originally announced February 2024.

  3. arXiv:2306.07470  [pdf, other]

    cs.CV cs.AI

    Reviving Shift Equivariance in Vision Transformers

    Authors: Peijian Ding, Davit Soselia, Thomas Armstrong, Jiahao Su, Furong Huang

    Abstract: Shift equivariance is a fundamental principle that governs how we perceive the world - our recognition of an object remains invariant with respect to shifts. Transformers have gained immense popularity due to their effectiveness in both language and vision tasks. While the self-attention operator in vision transformers (ViT) is permutation-equivariant and thus shift-equivariant, patch embedding, p…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 9 pages, 3 figures

  4. arXiv:2305.14637  [pdf, other]

    cs.CV cs.LG

    Learning UI-to-Code Reverse Generator Using Visual Critic Without Rendering

    Authors: Davit Soselia, Khalid Saifullah, Tianyi Zhou

    Abstract: Automated reverse engineering of HTML/CSS code from UI screenshots is an important yet challenging problem with broad applications in website development and design. In this paper, we propose a novel vision-code transformer (ViCT) composed of a vision encoder processing the screenshots and a language decoder to generate the code. They are initialized by pre-trained models such as ViT/DiT and GPT-2…

    Submitted 3 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  5. arXiv:1907.12935  [pdf]

    cs.CV cs.LG stat.ML

    RNN-based Online Handwritten Character Recognition Using Accelerometer and Gyroscope Data

    Authors: Davit Soselia, Shota Amashukeli, Irakli Koberidze, Levan Shugliashvili

    Abstract: This abstract explores an RNN-based approach to the online handwriting recognition problem. Our method uses data from an accelerometer and a gyroscope mounted on a handheld pen-like device to train and run a character prediction model. We have built a dataset of timestamped gyroscope and accelerometer data gathered during the manual process of handwriting Latin characters, labeled with the character…

    Submitted 24 July, 2019; originally announced July 2019.

  6. arXiv:1812.04650  [pdf, ps, other]

    cs.LG stat.ML

    Reproduction Report on "Learn to Pay Attention"

    Authors: Levan Shugliashvili, Davit Soselia, Shota Amashukeli, Irakli Koberidze

    Abstract: We have successfully implemented the "Learn to Pay Attention" attention mechanism for convolutional neural networks, and have replicated the results of the original paper in the categories of image classification and fine-grained recognition.

    Submitted 11 December, 2018; originally announced December 2018.

    Comments: 2 pages, 2 tables, originally made for the ICLR 2018 Reproducibility Challenge