Showing 1–50 of 23,249 results for author: O.

  1. arXiv:2407.06189

    cs.CV cs.AI

    Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

    Authors: Orr Zohar, Xiaohan Wang, Yonatan Bitton, Idan Szpektor, Serena Yeung-Levy

    Abstract: The performance of Large Vision Language Models (LVLMs) is dependent on the size and quality of their training datasets. Existing video instruction tuning datasets lack diversity as they are derived by prompting large language models with video captions to generate question-answer pairs, and are therefore mostly descriptive. Meanwhile, many labeled video datasets with diverse labels and supervisio…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project page: https://orrzohar.github.io/projects/video-star/

  2. arXiv:2407.06174

    cs.CV

    The Tug-of-War Between Deepfake Generation and Detection

    Authors: Hannah Lee, Changyeon Lee, Kevin Farhat, Lin Qiu, Steve Geluso, Aerin Kim, Oren Etzioni

    Abstract: Multimodal generative models are rapidly evolving, leading to a surge in the generation of realistic video and audio that offers exciting possibilities but also serious risks. Deepfake videos, which can convincingly impersonate individuals, have particularly garnered attention due to their potential misuse in spreading misinformation and creating fraudulent content. This survey paper examines the…

    Submitted 8 July, 2024; originally announced July 2024.

  3. Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels

    Authors: Oluwaseun Adewunmi Alo, Sairam Sri Vatsavai, Ishan Thakkar

    Abstract: Deep Neural Networks (DNNs) predominantly rely on General Matrix Multiply (GEMM) kernels, which are often accelerated using specialized hardware architectures. Recently, analog photonic GEMM accelerators have emerged as a promising alternative, offering vastly superior speed and energy efficiency compared to traditional electronic accelerators. However, these photonic accelerators cannot support wider than 4-b…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Presented at the IEEE ISVLSI 2024

  4. arXiv:2407.06131

    cs.CG math.CO

    Connected Matchings

    Authors: Oswin Aichholzer, Sergio Cabello, Viola Mészáros, Patrick Schnider, Jan Soukup

    Abstract: We show that each set of $n\ge 2$ points in the plane in general position has a straight-line matching with at least $(5n+1)/27$ edges whose segments form a connected set, and such a matching can be computed in $O(n \log n)$ time. As an upper bound, we show that for some planar point sets in general position the largest matching whose segments form a connected set has $\lceil \frac{n-1}{3}\rceil$…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 20 pages, 14 figures; preliminary version in EuroCG 2024

  5. arXiv:2407.06106

    cs.LO

    Bridging abstract dialectical argumentation and Boolean gene regulation

    Authors: Eugenio Azpeitia, Stan Muñoz Gutiérrez, David A. Rosenblueth, Octavio Zapata

    Abstract: This paper leans on two similar areas so far detached from each other. On the one hand, Dung's pioneering contributions to abstract argumentation, almost thirty years ago, gave rise to a plethora of successors, including abstract dialectical frameworks (ADFs). On the other hand, Boolean networks (BNs), devised as models of gene regulation, have been successful for studying the behavior of molecula…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 41 pages, 9 figures

  6. Neural Garment Dynamics via Manifold-Aware Transformers

    Authors: Peizhuo Li, Tuanfeng Y. Wang, Timur Levent Kesdogan, Duygu Ceylan, Olga Sorkine-Hornung

    Abstract: Data driven and learning based solutions for modeling dynamic garments have significantly advanced, especially in the context of digital humans. However, existing approaches often focus on modeling garments with respect to a fixed parametric human body model and are limited to garment geometries that were seen during training. In this work, we take a different approach and model the dynamics of a…

    Submitted 13 May, 2024; originally announced July 2024.

    Comments: EUROGRAPHICS 2024. Project page: https://peizhuoli.github.io/manifold-aware-transformers/ Video: https://www.youtube.com/watch?v=v6FCTHmjyqI

  7. arXiv:2407.06081

    cs.IT math.NT

    Optimal Rank-Metric Codes with Rank-Locality from Drinfeld Modules

    Authors: Luca Bastioni, Mohamed O. Darwish, Giacomo Micheli

    Abstract: We introduce a new technique to construct rank-metric codes using the arithmetic theory of Drinfeld modules over global fields, and Dirichlet Theorem on polynomial arithmetic progressions. Using our methods, we obtain a new infinite family of optimal rank-metric codes with rank-locality, i.e. every code in our family achieves the information theoretical bound for rank-metric codes with rank-locali…

    Submitted 8 July, 2024; originally announced July 2024.

    MSC Class: 14G50; 68P30; 11T06

  8. arXiv:2407.06079

    cs.CV cs.AI

    Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis

    Authors: Emaad Khwaja, Abdullah Rashwan, Ting Chen, Oliver Wang, Suraj Kothawade, Yeqing Li

    Abstract: We present a one-shot text-to-image diffusion model that can generate high-resolution images from natural language descriptions. Our model employs a layered U-Net architecture that simultaneously synthesizes images at multiple resolution scales. We show that this method outperforms the baseline of synthesizing images only at the target resolution, while reducing the computational cost per step. We…

    Submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.06071

    cs.CL cs.AI

    From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

    Authors: Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva

    Abstract: Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them in models from the same fam…

    Submitted 8 July, 2024; originally announced July 2024.

  10. arXiv:2407.05996

    cs.RO

    Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals

    Authors: Moritz Reuss, Ömer Erdinç Yağmurlu, Fabian Wenzel, Rudolf Lioutikov

    Abstract: This work introduces the Multimodal Diffusion Transformer (MDT), a novel diffusion policy framework, that excels at learning versatile behavior from multimodal goal specifications with few language annotations. MDT leverages a diffusion-based multimodal transformer backbone and two self-supervised auxiliary objectives to master long-horizon manipulation tasks based on multimodal goals. The vast ma…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: RSS 2024

  11. arXiv:2407.05986

    cs.CV cs.LG

    KidSat: satellite imagery to map childhood poverty dataset and benchmark

    Authors: Makkunda Sharma, Fan Yang, Duy-Nhat Vo, Esra Suel, Swapnil Mishra, Samir Bhatt, Oliver Fiala, William Rudgard, Seth Flaxman

    Abstract: Satellite imagery has emerged as an important tool to analyse demographic, health, and development indicators. While various deep learning models have been built for these tasks, each is specific to a particular problem, with few standard benchmarks available. We propose a new dataset pairing satellite imagery and high-quality survey data on child poverty to benchmark satellite feature representat…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 15 pages, 1 figure

  12. arXiv:2407.05961

    cs.RO physics.app-ph

    Dynamic single-input control of multi-state multi-transition soft robotic actuator

    Authors: Geron Yamit, Ben-Haim Eran, Gat D. Amir, Or Yizhar, Givli Sefi

    Abstract: Soft robotics is an attractive and rapidly emerging field, in which actuation is coupled with the elastic response of the robot's structure to achieve complex deformation patterns. A crucial challenge is the need for multiple control inputs, which adds significant complication to the system. We propose a novel concept of single-input control of an actuator composed of interconnected bi-stable elem…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 6 figures

  13. arXiv:2407.05892

    eess.IV cs.AI cs.CV

    An efficient method to automate tooth identification and 3D bounding box extraction from Cone Beam CT Images

    Authors: Ignacio Garrido Botella, Ignacio Arranz Águeda, Juan Carlos Armenteros Carmona, Oleg Vorontsov, Fernando Bayón Robledo, Adrián Alonso Barriuso

    Abstract: Accurate identification, localization, and segregation of teeth from Cone Beam Computed Tomography (CBCT) images are essential for analyzing dental pathologies. Modeling an individual tooth can be challenging and intricate to accomplish, especially when fillings and other restorations introduce artifacts. This paper proposes a method for automatically detecting, identifying, and extracting teeth f…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 7 pages, 6 figures, 4 tables

  14. arXiv:2407.05848

    cs.CV

    Wavelet Convolutions for Large Receptive Fields

    Authors: Shahaf E. Finder, Roy Amoyal, Eran Treister, Oren Freifeld

    Abstract: In recent years, there have been attempts to increase the kernel size of Convolutional Neural Nets (CNNs) to mimic the global receptive field of Vision Transformers' (ViTs) self-attention blocks. That approach, however, quickly hit an upper bound and saturated way before achieving a global receptive field. In this work, we demonstrate that by leveraging the Wavelet Transform (WT), it is, in fact,…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  15. arXiv:2407.05836

    cs.IR

    Academic Article Recommendation Using Multiple Perspectives

    Authors: Kenneth Church, Omar Alonso, Peter Vickers, Jiameng Sun, Abteen Ebrahimi, Raman Chandrasekar

    Abstract: We argue that Content-based filtering (CBF) and Graph-based methods (GB) complement one another in Academic Search recommendations. The scientific literature can be viewed as a conversation between authors and the audience. CBF uses abstracts to infer authors' positions, and GB uses citations to infer responses from the audience. In this paper, we describe nine differences between CBF and GB, as w…

    Submitted 8 July, 2024; originally announced July 2024.

  16. arXiv:2407.05754

    cs.NI

    Can We Benefit from Reconfigurable Intelligent Surfaces in Upper Mid-Band 6G Networks? A Critical Look for Promising Use Cases

    Authors: Ferdi Kara, Özlem Tuğfe Demir, Emil Björnson

    Abstract: The upper mid-band frequencies (i.e., 7-24 GHz) are regarded as the golden bands for the sixth-generation (6G) wireless communication systems, combining good coverage, much new spectrum, and many antennas in compact form factors. The first 6G networks will most likely use this band. There is much prior work on channel modeling, coexistence, and possible implementation scenarios in these bands. On…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE Magazine

  17. arXiv:2407.05467

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

  18. Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness

    Authors: Idris Hamoud, Alexandros Karargyris, Aidean Sharghi, Omid Mohareri, Nicolas Padoy

    Abstract: Semantic segmentation and activity classification are key components to creating intelligent surgical systems able to understand and assist clinical workflow. In the Operating Room, semantic segmentation is at the core of creating robots aware of clinical surroundings, whereas activity classification aims at understanding OR workflow at a higher level. State-of-the-art semantic segmentation and ac…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: IPCAI Conference, International Journal of Computer Assisted Radiology and Surgery 2022

  19. arXiv:2407.05385

    cs.LG cs.AI cs.CV stat.ML

    Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

    Authors: Stefan Horoi, Albert Manuel Orozco Camacho, Eugene Belilovsky, Guy Wolf

    Abstract: Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the different learned features of the models; however, it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters, reduces these costs but doesn't work as well in practice. Indeed, neural net…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Proceedings of the Forty-first International Conference on Machine Learning (ICML 2024)

  20. arXiv:2407.05333

    physics.app-ph cs.AI

    Generating multi-scale NMC particles with radial grain architectures using spatial stochastics and GANs

    Authors: Lukas Fuchs, Orkun Furat, Donal P. Finegan, Jeffery Allen, Francois L. E. Usseglio-Viretta, Bertan Ozdogru, Peter J. Weddle, Kandler Smith, Volker Schmidt

    Abstract: Understanding structure-property relationships of Li-ion battery cathodes is crucial for optimizing rate-performance and cycle-life resilience. However, correlating the morphology of cathode particles, such as in NMC811, and their inner grain architecture with electrode performance is challenging, particularly, due to the significant length-scale difference between grain and particle sizes. Experi…

    Submitted 7 July, 2024; originally announced July 2024.

  21. arXiv:2407.05248

    cs.CV

    Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

    Authors: Junming Su, Zhiqiang Shen, Peng Cao, Jinzhu Yang, Osmar R. Zaiane

    Abstract: The existing barely-supervised medical image segmentation (BSS) methods, adopting a registration-segmentation paradigm, aim to learn from data with very few annotations to mitigate the extreme label scarcity problem. However, this paradigm poses a challenge: pseudo-labels generated by image registration come with significant noise. To address this issue, we propose a self-paced sample selection fr…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  22. arXiv:2407.05206

    cs.CV cs.HC cs.LG

    Helios: An extremely low power event-based gesture recognition for always-on smart eyewear

    Authors: Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Ben Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, Dave Trickett, Chris Mair, Taru Muhonen, Rory Clark, Louis Berridge, Richard Vigars, Iain Wallace

    Abstract: This paper introduces Helios, the first extremely low-power, real-time, event-based hand gesture recognition system designed for all-day on smart eyewear. As augmented reality (AR) evolves, current smart glasses like the Meta Ray-Bans prioritize visual and wearable comfort at the expense of functionality. Existing human-machine interfaces (HMIs) in these devices, such as capacitive touch and voice…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 18 pages, 10 figures. First three authors contributed equally to this paper

  23. arXiv:2407.05180

    cs.CV cs.AI cs.LG eess.IV

    R-Trans -- A Recurrent Transformer Model for Clinical Feedback in Surgical Skill Assessment

    Authors: Julien Quarez, Matthew Elliot, Oscar Maccormac, Nawal Khan, Marc Modat, Sebastien Ourselin, Jonathan Shapey, Alejandro Granados

    Abstract: In surgical skill assessment, Objective Structured Assessments of Technical Skills (OSATS scores) and the Global Rating Scale (GRS) are established tools for evaluating the performance of surgeons during training. These metrics, coupled with feedback on their performance, enable surgeons to improve and achieve standards of practice. Recent studies on the open-source dataset JIGSAW, which contains…

    Submitted 22 April, 2024; originally announced July 2024.

  24. arXiv:2407.05061

    cs.CV

    A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation

    Authors: Monika Wysoczańska, Antonin Vobecky, Amaia Cardiel, Tomasz Trzciński, Renaud Marlet, Andrei Bursuc, Oriane Siméoni

    Abstract: Recent VLMs, pre-trained on large amounts of image-text pairs to align both modalities, have opened the way to open-vocabulary semantic segmentation. Given an arbitrary set of textual queries, image regions are assigned the closest query in feature space. However, the usual setup expects the user to list all possible visual concepts that may occur in the image, typically all classes of benchmark d…

    Submitted 6 July, 2024; originally announced July 2024.

  25. arXiv:2407.05032

    cs.HC

    And this is where we fu***d up! Lessons learned from Participatory Design in Digital Civic Initiatives

    Authors: Clara Rosa Cardoso, Sarah Rüller, Ana O Henriques, Anna R L Carter, Markus Rohde

    Abstract: Participatory design in digital civics aims to foster mutual learning and co-creation between public services and citizens. However, rarely do we collectively explore the challenges and failures we experience within PD and digital civics, to enable us to grow as a community. This workshop will explore real-world experiences that had to adapt to unforeseen circumstances. Through case presentations…

    Submitted 6 July, 2024; originally announced July 2024.

  26. arXiv:2407.05016

    cs.HC

    Envisioning Collaborative Futures: Advancing the Frontiers of Embedded Research

    Authors: Anna R. L. Carter, Kyle Montague, Reem Talhouk, Ana O. Henriques, Hugo Nicolau, Tiffany Knearem, Ceylan Besevli, Firaz Peer, Clara Crivellaro, Sarah Rüller

    Abstract: Participatory design initiatives, especially within the realm of digital civics, are often integrated and co-developed with the very citizens and communities they intend to assist. Digital civics research aims to create positive social change using a variety of digital technologies. These research projects commonly adopt various embedded processes, such as commissioning models \cite{dcitizensproj2…

    Submitted 6 July, 2024; originally announced July 2024.

  27. arXiv:2407.04965

    cs.CL

    Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression

    Authors: Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar

    Abstract: Large language models (LLMs) are increasingly deployed in real-world scenarios with the help of recent model compression techniques. Such momentum towards local deployment means the use of compressed LLMs will widely impact a large population. However, prior analysis works often prioritize preserving perplexity, which is a direct analogy to training loss. The impact of compression method on othe…

    Submitted 6 July, 2024; originally announced July 2024.

  28. arXiv:2407.04961

    cs.SE

    A PRISMA-Driven Bibliometric Analysis of the Scientific Literature on Assurance Case Patterns

    Authors: Oluwafemi Odu, Alvine Boaye Belle, Song Wang, Kimya Khakzad Shahandashti

    Abstract: Justifying the correct implementation of the non-functional requirements (e.g., safety, security) of mission-critical systems is crucial to prevent system failure. The latter could have severe consequences such as the death of people and financial losses. Assurance cases can be used to prevent system failure. They are structured arguments that allow arguing and relaying various safety-critical syst…

    Submitted 6 July, 2024; originally announced July 2024.

  29. arXiv:2407.04915

    cs.HC

    Safe Generative Chats in a WhatsApp Intelligent Tutoring System

    Authors: Zachary Levonian, Owen Henkel

    Abstract: Large language models (LLMs) are flexible, personalizable, and available, which makes their use within Intelligent Tutoring Systems (ITSs) appealing. However, that flexibility creates risks: inaccuracies, harmful content, and non-curricular material. Ethically deploying LLM-backed ITS systems requires designing safeguards that ensure positive experiences for students. We describe the design of a c…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: EDM 2024 LLM Workshop

  30. arXiv:2407.04888

    eess.IV cs.CV

    Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling

    Authors: Mahdi Ait Lhaj Loutfi, Teodora Boblea Podasca, Alex Zwanenburg, Taman Upadhaya, Jorge Barrios, David R. Raleigh, William C. Chen, Dante P. I. Capaldi, Hong Zheng, Olivier Gevaert, Jing Wu, Alvin C. Silva, Paul J. Zhang, Harrison X. Bai, Jan Seuntjens, Steffen Löck, Patrick O. Richard, Olivier Morin, Caroline Reinhold, Martin Lepage, Martin Vallières

    Abstract: Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Mat…

    Submitted 5 July, 2024; originally announced July 2024.

  31. arXiv:2407.04856

    cs.LG cs.AI

    Explorative Imitation Learning: A Path Signature Approach for Continuous Environments

    Authors: Nathan Gavenski, Juarez Monteiro, Felipe Meneguzzi, Michael Luck, Odinaldo Rodrigues

    Abstract: Some imitation learning methods combine behavioural cloning with self-supervision to infer actions from state pairs. However, most rely on a large number of expert trajectories to increase generalisation and human intervention to capture key aspects of the problem, such as domain constraints. In this paper, we propose Continuous Imitation Learning from Observation (CILO), a new method augmenting i…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted in the 27th European Conference on Artificial Intelligence (ECAI) 2024

  32. Sensing technologies and machine learning methods for emotion recognition in autism: Systematic review

    Authors: Oresti Banos, Zhoe Comas-González, Javier Medina, Aurora Polo-Rodríguez, David Gil, Jesús Peral, Sandra Amador, Claudia Villalonga

    Abstract: Background: Human Emotion Recognition (HER) has been a popular field of study in the past years. Despite the great progresses made so far, relatively little attention has been paid to the use of HER in autism. People with autism are known to face problems with daily social communication and the prototypical interpretation of emotional responses, which are most frequently exerted via facial express…

    Submitted 15 May, 2024; originally announced July 2024.

    Comments: 21 pages, 9 figures

    Journal ref: Sensing technologies and machine learning methods for emotion recognition in autism: Systematic review. International Journal of Medical Informatics, 187, 2024, 105469

  33. arXiv:2407.04694

    cs.CL cs.AI cs.LG

    Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

    Authors: Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

    Abstract: AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 page main body, 98 page appendix, 58 figures

  34. arXiv:2407.04528

    cs.CL cs.AI cs.IR cs.LG

    GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning

    Authors: Aleksander Ficek, Jiaqi Zeng, Oleksii Kuchaiev

    Abstract: Parameter-Efficient Fine-Tuning (PEFT) and Retrieval-Augmented Generation (RAG) have become popular methods for adapting large language models while minimizing compute requirements. In this paper, we apply PEFT methods (P-tuning, Adapters, and LoRA) to a modified Retrieval-Enhanced Transformer (RETRO) and a baseline GPT model across several sizes, ranging from 823 million to 48 billion parameters.…

    Submitted 5 July, 2024; originally announced July 2024.

  35. Sponsored Question Answering

    Authors: Tommy Mordo, Moshe Tennenholtz, Oren Kurland

    Abstract: The potential move from search to question answering (QA) raised the question of what the move from sponsored search to sponsored QA should look like. We present the first formal analysis of a sponsored QA platform. The platform fuses an organic answer to a question with an ad to produce a so-called sponsored answer. Advertisers then bid on their sponsored answers. Inspired by Generalized Se…

    Submitted 5 July, 2024; originally announced July 2024.

  36. arXiv:2407.04326

    cs.CV

    LMSeg: A deep graph message-passing network for efficient and accurate semantic segmentation of large-scale 3D landscape meshes

    Authors: Zexian Huang, Kourosh Khoshelham, Gunditj Mirring Traditional Owners Corporation, Martin Tomko

    Abstract: Semantic segmentation of large-scale 3D landscape meshes is pivotal for various geospatial applications, including spatial analysis, automatic mapping and localization of target objects, and urban planning and development. This requires an efficient and accurate 3D perception system to understand and analyze real-world environments. However, traditional mesh segmentation methods face challenges in…

    Submitted 5 July, 2024; originally announced July 2024.

  37. arXiv:2407.04273

    cs.CY cs.HC

    Understanding the Landscape of Leveraging IoT for Sustainable Growth in Saudi Arabia

    Authors: Manal Alshehri, Ohoud Alharbi

    Abstract: The integration of Internet of Things (IoT) technologies in agriculture holds promise for transforming farming practices, particularly in the Kingdom of Saudi Arabia (KSA). This study explores the adoption of smart farming practices among KSA farmers. Due to the geographical location and nature of KSA, it faces significant challenges in agriculture. The objective of this research is to discuss how…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: This research was part of the SWE 540 course: Research Methods at King Saud University

    ACM Class: H.1.2

  38. arXiv:2407.04212

    cs.AI

    Smart Vision-Language Reasoners

    Authors: Denisa Roberts, Lucas Roberts

    Abstract: In this article, we investigate vision-language models (VLM) as reasoners. The ability to form abstractions underlies mathematical reasoning, problem-solving, and other Math AI tasks. Several formalisms have been given to these underlying abstractions and skills utilized by humans and intelligent systems for reasoning. Furthermore, human reasoning is inherently multimodal, and as such, we focus ou…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted in ICML 2024 MATH AI Workshop

  39. arXiv:2407.04180

    cs.CV

    Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing

    Authors: Anushrut Jignasu, Kelly O. Marshall, Ankush Kumar Mishra, Lucas Nerone Rillo, Baskar Ganapathysubramanian, Aditya Balu, Chinmay Hegde, Adarsh Krishnamurthy

    Abstract: G-code (Geometric code) or RS-274 is the most widely used computer numerical control (CNC) and 3D printing programming language. G-code provides machine instructions for the movement of the 3D printer, especially for the nozzle, stage, and extrusion of material for extrusion-based additive manufacturing. Currently there does not exist a large repository of curated CAD models along with their corre…

    Submitted 4 July, 2024; originally announced July 2024.

  40. arXiv:2407.04153

    cs.LG cs.AI

    Mixture of A Million Experts

    Authors: Xu Owen He

    Abstract: The feedforward (FFW) layers in standard transformer architectures incur a linear increase in computational costs and activation memory as the hidden layer width grows. Sparse mixture-of-experts (MoE) architectures have emerged as a viable approach to address this issue by decoupling model size from computational cost. The recent discovery of the fine-grained MoE scaling law shows that higher gran…

    Submitted 4 July, 2024; originally announced July 2024.

  41. arXiv:2407.04009

    cs.CR cs.LG

    A Critical Assessment of Interpretable and Explainable Machine Learning for Intrusion Detection

    Authors: Omer Subasi, Johnathan Cree, Joseph Manzano, Elena Peterson

    Abstract: There have been a large number of studies in interpretable and explainable ML for cybersecurity, in particular, for intrusion detection. Many of these studies have a significant amount of overlapping and repeated evaluations and analysis. At the same time, these studies overlook crucial model, data, learning process, and utility related issues and many times completely disregard them. These issues in…

    Submitted 4 July, 2024; originally announced July 2024.

  42. arXiv:2407.03924

    cs.CE cs.AI eess.SY math.DS

    TwinLab: a framework for data-efficient training of non-intrusive reduced-order models for digital twins

    Authors: Maximilian Kannapinn, Michael Schäfer, Oliver Weeger

    Abstract: Purpose: Simulation-based digital twins represent an effort to provide high-accuracy real-time insights into operational physical processes. However, the computation time of many multi-physical simulation models is far from real-time. It might even exceed sensible time frames to produce sufficient data for training data-driven reduced-order models. This study presents TwinLab, a framework for data…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted version of the revised manuscript published in Engineering Computations

  43. arXiv:2407.03790  [pdf, other

    cs.SE cs.HC

    Assessing Consensus of Developers' Views on Code Readability

    Authors: Agnia Sergeyuk, Olga Lvova, Sergey Titov, Anastasiia Serova, Farid Bagirov, Timofey Bryksin

    Abstract: The rapid rise of Large Language Models (LLMs) has changed software development, with tools like Copilot, JetBrains AI Assistant, and others boosting developers' productivity. However, developers now spend more time reviewing code than writing it, highlighting the importance of Code Readability for code comprehension. Our previous research found that existing Code Readability models were inaccurat…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 8 pages, 1 figure, accepted to be presented at the PPIG'24 workshop

  44. arXiv:2407.03734  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Improving Self-supervised Pre-training using Accent-Specific Codebooks

    Authors: Darshan Prabhu, Abhishek Gupta, Omkar Nitsure, Preethi Jyothi, Sriram Ganapathy

    Abstract: Speech accents present a serious challenge to the performance of state-of-the-art end-to-end Automatic Speech Recognition (ASR) systems. Even with self-supervised learning and pre-training of ASR models, accent invariance is seldom achieved. In this work, we propose an accent-aware adaptation technique for self-supervised learning that introduces a trainable set of accent-specific codebooks to the…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024

  45. arXiv:2407.03623  [pdf, other

    cs.CV

    Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes

    Authors: Yusuke Hirota, Jerone T. A. Andrew, Dora Zhao, Orestis Papakyriakopoulos, Apostolos Modas, Yuta Nakashima, Alice Xiang

    Abstract: We tackle societal bias in image-text datasets by removing spurious correlations between protected groups and image attributes. Traditional methods only target labeled attributes, ignoring biases from unlabeled ones. Using text-guided inpainting models, our approach ensures protected group independence from all attributes and mitigates inpainting biases through data filtering. Evaluations on multi…

    Submitted 4 July, 2024; originally announced July 2024.

  46. arXiv:2407.03524  [pdf

    hep-ph cs.LG

    A multicategory jet image classification framework using deep neural network

    Authors: Jairo Orozco Sandoval, Vidya Manian, Sudhir Malik

    Abstract: Jet point cloud images are high dimensional data structures that need to be transformed to a separable feature space for machine learning algorithms to distinguish them with simple decision boundaries. In this article, the authors focus on jet category separability by particle and jet feature extraction, resulting in more efficient training of a simple deep neural network, resulting in a computat…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 9 pages, y figures

  47. arXiv:2407.03518  [pdf

    cs.CL cs.AI

    Improving LLM Abilities in Idiomatic Translation

    Authors: Sundesh Donthi, Maximilian Spencer, Om Patel, Joon Doh, Eid Rodan

    Abstract: For large language models (LLMs) like NLLB and GPT, translating idioms remains a challenge. Our goal is to enhance translation fidelity by improving LLM processing of idiomatic language while preserving the original linguistic style. This has a significant social impact, as it preserves cultural nuances and ensures translated texts retain their intent and emotional resonance, fostering better cros…

    Submitted 3 July, 2024; originally announced July 2024.

  48. arXiv:2407.03511  [pdf

    cs.CR

    Scalable Zero-Knowledge Proofs for Verifying Cryptographic Hashing in Blockchain Applications

    Authors: Oleksandr Kuznetsov, Anton Yezhov, Vladyslav Yusiuk, Kateryna Kuznetsova

    Abstract: Zero-knowledge proofs (ZKPs) have emerged as a promising solution to address the scalability challenges in modern blockchain systems. This study proposes a methodology for generating and verifying ZKPs to ensure the computational integrity of cryptographic hashing, specifically focusing on the SHA-256 algorithm. By leveraging the Plonky2 framework, which implements the PLONK protocol with FRI comm…

    Submitted 3 July, 2024; originally announced July 2024.

  49. arXiv:2407.03510  [pdf

    cs.CR

    Evolutionary Approach to S-box Generation: Optimizing Nonlinear Substitutions in Symmetric Ciphers

    Authors: Oleksandr Kuznetsov, Nikolay Poluyanenko, Emanuele Frontoni, Marco Arnesano, Oleksii Smirnov

    Abstract: This study explores the application of genetic algorithms in generating highly nonlinear substitution boxes (S-boxes) for symmetric key cryptography. We present a novel implementation that combines a genetic algorithm with the Walsh-Hadamard Spectrum (WHS) cost function to produce 8x8 S-boxes with a nonlinearity of 104. Our approach achieves performance parity with the best-known methods, requirin…

    Submitted 3 July, 2024; originally announced July 2024.

  50. arXiv:2407.03502  [pdf, other

    cs.AI cs.CL cs.LG

    AgentInstruct: Toward Generative Teaching with Agentic Flows

    Authors: Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, Andres Codas, Yadong Lu, Wei-ge Chen, Olga Vrousgos, Corby Rosset, Fillipe Silva, Hamed Khanpour, Yash Lara, Ahmed Awadallah

    Abstract: Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers also raised concerns around model collapse and drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually r…

    Submitted 3 July, 2024; originally announced July 2024.