-
Harnack inequality for parabolic equations in double-divergence form with singular lower order coefficients
Authors:
Istvan Gyöngy,
Seick Kim
Abstract:
This paper investigates the Harnack inequality for nonnegative solutions to second-order parabolic equations in double divergence form. We impose conditions where the principal coefficients satisfy the Dini mean oscillation condition in $x$, while the drift and zeroth-order coefficients belong to specific Morrey classes. Our analysis contributes to advancing the theoretical foundations of parabolic equations in double divergence form, including Fokker-Planck-Kolmogorov equations for probability densities.
Submitted 7 May, 2024;
originally announced May 2024.
-
Constraining Millicharged dark matter with Gravitational positivity bounds
Authors:
Suro Kim,
Pyungwon Ko
Abstract:
Gravitational positivity bounds provide consistency conditions for effective field theories with gravity. They turn out to be phenomenologically useful by providing lower bounds on parameters of new physics beyond the Standard Model (BSM). In this paper, we derive constraints on millicharged fermion dark matter models with a massless dark photon using gravitational positivity bounds. Combining them with upper bounds from cosmological and astrophysical observations, we can severely constrain the parameter space of the model. In particular, we show that when the dark matter mass is lighter than the solar core temperature, most of the parameter region is excluded by combining gravitational positivity bounds and the stellar bounds.
Submitted 22 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Authors:
Jihyun Kim,
Changjae Oh,
Hoseok Do,
Soohyun Kim,
Kwanghoon Sohn
Abstract:
We present a new multi-modal face image generation method that converts a text prompt and a visual input, such as a semantic mask or scribble map, into a photo-realistic face image. To do this, we combine the strengths of Generative Adversarial Networks (GANs) and diffusion models (DMs) by employing the multi-modal features in the DM into the latent space of the pre-trained GANs. We present a simple mapping and a style modulation network to link two models and convert meaningful representations in feature maps and attention maps into latent codes. With GAN inversion, the estimated latent codes can be used to generate 2D or 3D-aware facial images. We further present a multi-step training strategy that reflects textual and structural representations into the generated image. Our proposed network produces realistic 2D, multi-view, and stylized face images, which align well with inputs. We validate our method by using pre-trained 2D and 3D GANs, and our results outperform existing methods. Our project page is available at https://github.com/1211sh/Diffusion-driven_GAN-Inversion/.
Submitted 7 May, 2024;
originally announced May 2024.
-
On the Kauffman bracket skein module of $(S^1 \times S^2) \ \# \ (S^1 \times S^2)$
Authors:
Rhea Palak Bakshi,
Seongjeong Kim,
Xiao Wang
Abstract:
Determining the structure of the Kauffman bracket skein module of all $3$-manifolds over the ring of Laurent polynomials $\mathbb Z[A^{\pm 1}]$ is a big open problem in skein theory. Very little is known about the skein module of non-prime manifolds over this ring. In this paper, we compute the Kauffman bracket skein module of the $3$-manifold $(S^1 \times S^2) \ \# \ (S^1 \times S^2)$ over the ring $\mathbb Z[A^{\pm 1}]$. We do this by analysing the submodule of handle sliding relations, for which we provide a suitable basis. Along the way we also compute the Kauffman bracket skein module of $(S^1 \times S^2) \ \# \ (S^1 \times D^2)$.
Submitted 13 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Bidirectional Adversarial Autoencoders for the design of Plasmonic Metasurfaces
Authors:
Yuansan Liu,
Jeygopi Panisilvam,
Peter Dower,
Sejeong Kim,
James Bailey
Abstract:
Deep Learning has been a critical part of designing inverse design methods that are computationally efficient and accurate. An example of this is the design of photonic metasurfaces by using their photoluminescent spectrum as the input data to predict their topology. One fundamental challenge of these systems is their ability to represent nonlinear relationships between sets of data that have different dimensionalities. Existing design methods often implement a conditional Generative Adversarial Network in order to solve this problem, but in many cases the solution is unable to generate structures that provide multiple peaks when validated. It is demonstrated that in response to the target spectrum, the Bidirectional Adversarial Autoencoder is able to generate structures that provide multiple peaks on several occasions. As a result the proposed model represents an important advance towards the generation of nonlinear photonic metasurfaces that can be used in advanced metasurface design.
Submitted 7 May, 2024;
originally announced May 2024.
-
Role of Sensing and Computer Vision in 6G Wireless Communications
Authors:
Seungnyun Kim,
Jihoon Moon,
Jinhong Kim,
Yongjun Ahn,
Donghoon Kim,
Sunwoo Kim,
Kyuhong Shim,
Byonghyo Shim
Abstract:
Recently, we are witnessing the remarkable progress and widespread adoption of sensing technologies in autonomous driving, robotics, and metaverse. Considering the rapid advancement of computer vision (CV) technology to analyze the sensing information, we anticipate a proliferation of wireless applications exploiting the sensing and CV technologies in 6G. In this article, we provide a holistic overview of the sensing and CV-aided wireless communications (SVWC) framework for 6G. By analyzing the high-resolution sensing information through the powerful CV techniques, SVWC can quickly and accurately understand the wireless environments and then perform the wireless tasks. To demonstrate the efficacy of SVWC, we design the whole process of SVWC including the sensing dataset collection, DL model training, and execution of realistic wireless tasks. From the numerical evaluations on 6G communication scenarios, we show that SVWC achieves considerable performance gains over the conventional 5G systems in terms of positioning accuracy, data rate, and access latency.
Submitted 6 May, 2024;
originally announced May 2024.
-
Codexity: Secure AI-assisted Code Generation
Authors:
Sung Yong Kim,
Zhiyu Fan,
Yannic Noller,
Abhik Roychoudhury
Abstract:
Despite the impressive performance of Large Language Models (LLMs) in software development activities, recent studies raise the concern that AI programming assistants (e.g., Copilot, CodeWhisperer) introduce vulnerabilities into software codebases. In this work, we present Codexity, a security-focused code generation framework integrated with five LLMs. Codexity leverages the feedback of static analysis tools such as Infer and CppCheck to mitigate security vulnerabilities in LLM-generated programs. Our evaluation on a real-world benchmark with 751 automatically generated vulnerable subjects demonstrates that Codexity can prevent 60% of the vulnerabilities from being exposed to the software developer.
Submitted 6 May, 2024;
originally announced May 2024.
-
Data-Efficient Molecular Generation with Hierarchical Textual Inversion
Authors:
Seojin Kim,
Jaehyun Nam,
Sihyun Yu,
Younghoon Shin,
Jinwoo Shin
Abstract:
Developing an effective molecular generation framework even with a limited number of molecules is often important for its practical deployment, e.g., drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experimental costs. To tackle this issue, we introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. We propose to use multi-level embeddings to reflect such hierarchical features based on the adoption of the recent textual inversion technique in the visual domain, which achieves data-efficient image generation. Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution. We then generate molecules based on the interpolation of the multi-level token embeddings. Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50x less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction.
Submitted 5 May, 2024;
originally announced May 2024.
-
Digraphs in which every $t$ vertices have exactly $λ$ common out-neighbors
Authors:
Myungho Choi,
Hojin Chu,
Suh-Ryung Kim
Abstract:
We say that a digraph is a $(t,λ)$-liking digraph if every $t$ vertices have exactly $λ$ common out-neighbors. In 1975, Plesník [Graphs with a homogeneity, 1975. {\it Glasnik Mathematicki} 10:9-23] proved that any $(t,1)$-liking digraph is the complete digraph on $t+1$ vertices for each $t\geq 3$. Choi {\it et al}. [A digraph version of the Friendship Theorem, 2023. arXiv preprint arXiv:2305.04058] (to appear in {\it Discrete mathematics}) showed that a $(2,1)$-liking digraph is a fancy wheel digraph or a $k$-diregular digraph for some positive integer $k$. In this paper, we extend these results by completely characterizing the $(t,λ)$-liking digraphs with $t \geq λ+2$ and giving some equivalent conditions for a $(t,λ)$-liking digraph being a complete digraph on $t+λ$ vertices.
Submitted 4 May, 2024;
originally announced May 2024.
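The definition in the abstract above (every $t$ vertices have exactly $λ$ common out-neighbors) can be checked by brute force on small examples. The following is a minimal illustrative sketch, not the authors' method: the function names `is_liking_digraph` and `complete_digraph` and the adjacency-dictionary representation are assumptions made here for illustration.

```python
from itertools import combinations


def is_liking_digraph(out_neighbors, t, lam):
    """Brute-force check of the (t, lam)-liking property.

    out_neighbors maps each vertex to the set of its out-neighbors
    (hypothetical representation chosen for this sketch).
    Returns True if every set of t distinct vertices has exactly
    lam common out-neighbors.
    """
    vertices = list(out_neighbors)
    for subset in combinations(vertices, t):
        # Intersect the out-neighborhoods of the t chosen vertices.
        common = set.intersection(*(set(out_neighbors[v]) for v in subset))
        if len(common) != lam:
            return False
    return True


def complete_digraph(n):
    """Loopless complete digraph on n vertices: every ordered pair is an arc."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


if __name__ == "__main__":
    # In the complete digraph on t + lam vertices, any t vertices share
    # exactly the lam vertices outside the chosen set.
    print(is_liking_digraph(complete_digraph(4), 3, 1))
    print(is_liking_digraph(complete_digraph(5), 3, 2))
```

The check enumerates all $t$-subsets, so it is only practical for small digraphs; it is meant to make the definition concrete rather than to reproduce the characterization proved in the paper.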
-
DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands
Authors:
Hwayong Nam,
Seungmin Baek,
Minbok Wi,
Michael Jaemin Kim,
Jaehyun Park,
Chihun Song,
Nam Sung Kim,
Jung Ho Ahn
Abstract:
The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.
Submitted 3 May, 2024;
originally announced May 2024.
-
Performance Analysis of an Optimization Algorithm for Metamaterial Design on the Integrated High-Performance Computing and Quantum Systems
Authors:
Seongmin Kim,
In-Saeng Suh
Abstract:
Optimizing metamaterials with complex geometries is a big challenge. Although an active learning algorithm, combining machine learning (ML), quantum computing, and optical simulation, has emerged as an efficient optimization tool, it still faces difficulties in optimizing complex structures that have potentially high performance. In this work, we comprehensively analyze the performance of an optimization algorithm for metamaterial design on the integrated HPC and quantum systems. We demonstrate significant time advantages through message-passing interface (MPI) parallelization on the high-performance computing (HPC) system showing approximately 54% faster ML tasks and 67 times faster optical simulation against serial workloads. Furthermore, we analyze the performance of a quantum algorithm designed for optimization, which runs with various quantum simulators on a local computer or HPC-quantum system. Results showcase ~24 times speedup when executing the optimization algorithm on the HPC-quantum hybrid system. This study paves a way to optimize complex metamaterials using the integrated HPC-quantum system.
Submitted 3 May, 2024;
originally announced May 2024.
-
Intriguing aspects of light baryon resonances
Authors:
K. P. Khemchandani,
A. Martinez Torres,
Sang-Ho Kim,
Seung-il Nam,
A. Hosaka,
H. Nagahiro
Abstract:
We discuss that some light baryon resonances exhibit properties which cannot be described when attributing a three-valence quark structure to them. Besides pointing out the hadron resonances which clearly require description beyond the quark model, we focus on the third $s_{11},~ N^*$ state and its decay to final states consisting of the lightest hyperon resonances which have a partial width comparable to that for the decay to $πN$. Such properties of the mentioned nucleon resonance get manifested in the cross sections and other observables related to processes producing the lightest hyperon resonances. We show that all these findings arise from the strong association of the baryon resonances to the dynamics among the ground-state hadrons.
Submitted 3 May, 2024;
originally announced May 2024.
-
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Authors:
Youngdong Jang,
Dong In Lee,
MinHyuk Jang,
Jong Wook Kim,
Feng Yang,
Sangpil Kim
Abstract:
The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.
Submitted 27 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL
Authors:
Yongjin Yang,
Sihyeon Kim,
SangMook Kim,
Gyubok Lee,
Se-Young Yun,
Edward Choi
Abstract:
Incorporating unanswerable questions into EHR QA systems is crucial for testing the trustworthiness of a system, as providing non-existent responses can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a promising benchmark because it is the only dataset that incorporates unanswerable questions in the EHR QA system alongside practical questions. However, in this work, we identify a data bias in these unanswerable questions; they can often be discerned simply by filtering with specific N-gram patterns. Such biases jeopardize the authenticity and reliability of QA system evaluations. To tackle this problem, we propose a simple debiasing method of adjusting the split between the validation and test sets to neutralize the undue influence of N-gram filtering. By experimenting on the MIMIC-III dataset, we demonstrate both the existing data bias in EHRSQL and the effectiveness of our data split strategy in mitigating this bias.
Submitted 28 April, 2024;
originally announced May 2024.
-
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Authors:
Seungone Kim,
Juyoung Suk,
Shayne Longpre,
Bill Yuchen Lin,
Jamin Shin,
Sean Welleck,
Graham Neubig,
Moontae Lee,
Kyungjae Lee,
Minjoon Seo
Abstract:
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. However, concerns including transparency, controllability, and affordability strongly motivate the development of open-source LMs specialized in evaluations. On the other hand, existing open evaluator LMs exhibit critical shortcomings: 1) they issue scores that significantly diverge from those assigned by humans, and 2) they lack the flexibility to perform both direct assessment and pairwise ranking, the two most prevalent forms of assessment. Additionally, they do not possess the ability to evaluate based on custom evaluation criteria, focusing instead on general attributes like helpfulness and harmlessness. To address these issues, we introduce Prometheus 2, a more powerful evaluator LM than its predecessor that closely mirrors human and GPT-4 judgements. Moreover, it is capable of processing both direct assessment and pair-wise ranking formats grouped with a user-defined evaluation criteria. On four direct assessment benchmarks and four pairwise ranking benchmarks, Prometheus 2 scores the highest correlation and agreement with humans and proprietary LM judges among all tested open evaluator LMs. Our models, code, and data are all publicly available at https://github.com/prometheus-eval/prometheus-eval.
Submitted 2 May, 2024;
originally announced May 2024.
-
CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes
Authors:
Seong-Joon Park,
Hee-Youl Kwak,
Sang-Hyo Kim,
Yongjune Kim,
Jong-Seon No
Abstract:
Error correcting codes (ECCs) are indispensable for reliable transmission in communication systems. The recent advancements in deep learning have catalyzed the exploration of ECC decoders based on neural networks. Among these, transformer-based neural decoders have achieved state-of-the-art decoding performance. In this paper, we propose a novel Cross-attention Message-Passing Transformer (CrossMPT). CrossMPT iteratively updates two types of input vectors (i.e., magnitude and syndrome vectors) using two masked cross-attention blocks. The mask matrices in these cross-attention blocks are determined by the code's parity-check matrix that delineates the relationship between magnitude and syndrome vectors. Our experimental results show that CrossMPT significantly outperforms existing neural network-based decoders, particularly in decoding low-density parity-check codes. Notably, CrossMPT also achieves a significant reduction in computational complexity, achieving over a 50% decrease in its attention layers compared to the original transformer-based decoder, while retaining the computational complexity of the remaining layers.
Submitted 2 May, 2024;
originally announced May 2024.
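To make concrete how a parity-check matrix can determine attention masks between bit positions and syndrome positions, as the abstract above describes, here is a rough sketch under stated assumptions. It is not the CrossMPT implementation: the function names, the choice of the (7,4) Hamming code, the boolean "allowed" convention, and the assignment of $H$ and its transpose to the two attention directions are all assumptions made for illustration. Only the syndrome computation $s = Hy \bmod 2$ is standard coding theory.

```python
def syndrome(H, hard_bits):
    """Standard syndrome s = H * y (mod 2) for a binary hard-decision vector y."""
    return [sum(h * b for h, b in zip(row, hard_bits)) % 2 for row in H]


def cross_attention_masks(H):
    """Build two boolean masks from the parity-check matrix H (assumed layout).

    syn_to_mag[i][j] is True when check i involves bit j, i.e. H[i][j] == 1;
    a syndrome-side query i may then attend to bit-side key j.
    mag_to_syn is the transpose, used for the opposite direction.
    """
    syn_to_mag = [[bool(x) for x in row] for row in H]
    mag_to_syn = [list(col) for col in zip(*syn_to_mag)]
    return mag_to_syn, syn_to_mag


# Parity-check matrix of the (7,4) Hamming code, used only as a small example.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

if __name__ == "__main__":
    mag_to_syn, syn_to_mag = cross_attention_masks(H)
    # A single bit error at position 2 of the all-zero codeword.
    print(syndrome(H, [0, 0, 1, 0, 0, 0, 0]))
    print(mag_to_syn[2])
```

In a real decoder these masks would be applied inside the attention score computation (for example by setting disallowed entries to a large negative value before the softmax); that step is omitted here.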
-
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust
Authors:
Sunnie S. Y. Kim,
Q. Vera Liao,
Mihaela Vorvoreanu,
Stephanie Ballard,
Jennifer Wortman Vaughan
Abstract:
Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions (e.g., "I'm not sure, but...") decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. An exploratory analysis suggests that this increase can be attributed to reduced (but not fully eliminated) overreliance on incorrect answers. While we observe similar effects for uncertainty expressed from a general perspective (e.g., "It's not clear, but..."), these effects are weaker and not statistically significant. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters. This highlights the importance of user testing before deploying LLMs at scale.
Submitted 15 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
Authors:
Donggyun Kim,
Seongwoong Cho,
Semin Kim,
Chong Luo,
Seunghoon Hong
Abstract:
Large language models have evolved data-efficient generalists, benefiting from the universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct challenge due to the variation in label structures across different tasks. Consequently, generalization to unseen dense prediction tasks in the low-data regime is not straightforward and has received less attention from previous vision generalists. In this study, we explore a universal model that can flexibly adapt to unseen dense label structures with a few examples, enabling it to serve as a data-efficient vision generalist in diverse real-world scenarios. To this end, we base our method on a powerful meta-learning framework and explore several axes to improve its performance and versatility for real-world problems, such as flexible adaptation mechanisms and scalability. We evaluate our model across a spectrum of unseen real-world scenarios where low-shot learning is desirable, including video, 3D, medical, biological, and user-interactive tasks. Equipped with a generic architecture and an effective adaptation mechanism, our model flexibly adapts to all of these tasks with at most 50 labeled images, showcasing a significant advancement over existing data-efficient generalist approaches. Codes are available at https://github.com/GitGyun/chameleon.
Submitted 29 April, 2024;
originally announced April 2024.
-
Tunable Ultrafast Dynamics of Antiferromagnetic Vortices in Nanoscale Dots
Authors:
Ji Zou,
Even Thingstad,
Se Kwon Kim,
Jelena Klinovaja,
Daniel Loss
Abstract:
Topological vortex textures in magnetic disks have garnered great attention due to their interesting physics and diverse applications. However, up to now, the vortex state has mainly been studied in microsize ferromagnetic disks, which have oscillation frequencies confined to the GHz range. Here, we propose an experimentally feasible ultrasmall and ultrafast vortex state in an antiferromagnetic nanodot surrounded by a heavy metal, which is further harnessed to construct a highly tunable vortex network. We theoretically demonstrate that, interestingly, the interfacial Dzyaloshinskii-Moriya interaction (iDMI) induced by the heavy metal at the boundary of the dot acts as an effective chemical potential for the vortices in the interior. Mimicking the creation of a superfluid vortex by rotation, we show that a magnetic vortex state can be stabilized by this iDMI. Subjecting the system to an electric current can trigger vortex oscillations via spin-transfer torque, which reside in the THz regime and can be further modulated by external magnetic fields. Furthermore, we show that coherent coupling between vortices in different nanodisks can be achieved via an antiferromagnetic link. Remarkably, this interaction depends on the vortex polarity and topological charge and is also exceptionally tunable through the vortex resonance frequency. This opens up the possibility for controllable interconnected networks of antiferromagnetic vortices. Our proposal therefore introduces a new avenue for developing high-density memory, ultrafast logic devices, and THz signal generators, which are ideal for compact integration into microchips.
Submitted 28 April, 2024;
originally announced April 2024.
-
Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation
Authors:
Seungwook Kim,
Yichun Shi,
Kejie Li,
Minsu Cho,
Peng Wang
Abstract:
Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, for images provide more intuitive guidance for the 3D generation process. In this work, we delve into the potential of using multiple image prompts, instead of a single image prompt, for 3D generation. Specifically, we build on ImageDream, a novel image-prompt multi-view diffusion model, to support multi-view images as the input prompt. Our method, dubbed MultiImageDream, reveals that transitioning from a single-image prompt to multiple-image prompts enhances the performance of multi-view and 3D object generation according to various quantitative evaluation metrics and qualitative assessments. This advancement is achieved without the necessity of fine-tuning the pre-trained ImageDream multi-view diffusion model.
Submitted 26 April, 2024;
originally announced April 2024.
-
Meta-Object: Interactive and Multisensory Virtual Object Learned from the Real World for the Post-Metaverse
Authors:
Dooyoung Kim,
Taewook Ha,
**seok Hong,
Seonji Kim,
Selin Choi,
Heejeong Ko,
Woontack Woo
Abstract:
With the proliferation of wearable Augmented Reality/Virtual Reality (AR/VR) devices, ubiquitous virtual experiences seamlessly integrate into daily life through metaverse platforms. To support immersive metaverse experiences akin to reality, we propose a next-generation virtual object, a meta-object, a property-embedded virtual object that contains interactive and multisensory characteristics learned from the real world. Current virtual objects differ significantly from real-world objects due to restricted sensory feedback based on limited physical properties. To leverage meta-objects in the metaverse, three key components are needed: meta-object modeling and property embedding, interaction-adaptive multisensory feedback, and an intelligence simulation-based post-metaverse platform. Utilizing meta-objects that enable both on-site and remote users to interact as if they were engaging with real objects could contribute to the advent of the post-metaverse era through wearable AR/VR devices.
Submitted 28 April, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
Authors:
Gahyeon Kim,
Sohee Kim,
Seokju Lee
Abstract:
Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improvement for unseen classes is still marginal, and to tackle this problem, data augmentation has been frequently used in traditional zero-shot learning techniques. Through our experiments, we have identified important issues in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes. To address this problem, we propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts. Through our novel mechanism called "Adding Attributes to Prompt Learning", AAPL, we guide the learnable context to effectively extract text features by focusing on high-level features for unseen classes. We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performances compared to the existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.
Submitted 25 April, 2024;
originally announced April 2024.
-
ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling
Authors:
Sangryul Kim,
Donghee Han,
Sehyun Kim
Abstract:
Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. By fine-tuning a model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL through log probability-based distribution, while grammatical and schema errors are mitigated by executing queries on the actual database. We experimentally verified that our method can filter unanswerable questions, which can be widely utilized even when the parameters of the model are not accessible, and that it can be effectively utilized in practice.
Submitted 25 April, 2024;
originally announced April 2024.
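The log-probability filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the mean token log-probability as the confidence score, and the threshold value are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def mean_logprob(token_logprobs: List[float]) -> float:
    """Average per-token log-probability of one generated query."""
    return sum(token_logprobs) / len(token_logprobs)


def gate_queries(
    candidates: List[Tuple[str, List[float]]],
    threshold: float,
) -> List[Optional[str]]:
    """Keep a query only if its confidence score reaches the threshold.

    A rejected query is replaced by None, standing in for "unanswerable".
    """
    gated: List[Optional[str]] = []
    for sql, token_logprobs in candidates:
        # Hypothetical rule: an empty generation is always rejected.
        if token_logprobs and mean_logprob(token_logprobs) >= threshold:
            gated.append(sql)
        else:
            gated.append(None)
    return gated


# Made-up example values, not taken from the paper.
candidates = [
    ("SELECT COUNT(*) FROM patients", [-0.05, -0.10, -0.02]),
    ("SELECT dose FROM unknown_table", [-1.90, -2.40, -3.10]),
]
print(gate_queries(candidates, threshold=-1.0))
```

In this sketch the score needs only the token log-probabilities returned with a generation, not the model parameters, which matches the black-box setting the abstract refers to.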
-
Two-state transfer: a generalization of pair and plus state transfer
Authors:
Sooyeong Kim,
Hermie Monterde,
Bahman Ahmadi,
Ada Chan,
Stephen Kirkland,
Sarah Plosker
Abstract:
In the study of quantum state transfer, one is interested in being able to transmit a quantum state with high fidelity within a quantum spin network. In most of the literature, the state of interest is taken to be associated with a standard basis vector; however, more general states have recently been considered. Here, we consider a general linear combination of two vertex states, which encompasses the definitions of pair states and plus states in connected weighted graphs. A two-state in a graph $X$ is a quantum state of the form $\mathbf{e}_u+s\mathbf{e}_v$, where $u$ and $v$ are two vertices in $X$ and $s$ is a non-zero real number. If $s=-1$ or $s=1$, then such a state is called a pair state or a plus state, respectively.
In this paper, we investigate quantum state transfer between two-states, where the Hamiltonian is taken to be the adjacency, Laplacian or signless Laplacian matrix of the graph. By analyzing the spectral properties of the Hamiltonian, we characterize strongly cospectral two-states built from strongly cospectral vertices. This allows us to characterize perfect state transfer (PST) between two-states in complete graphs, cycles and hypercubes. We also produce infinite families of graphs that admit strong cospectrality and PST between two-states that are neither pair nor plus states. Using singular values and singular vectors, we show that vertex PST in the line graph of $X$ implies PST between the plus states formed by corresponding edges in $X$. Furthermore, we provide conditions such that the converse of the previous statement holds. As an application, we characterize strong cospectrality and PST between vertices in line graphs of trees, unicyclic graphs and Cartesian products.
Submitted 25 April, 2024;
originally announced April 2024.
-
Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Authors:
Marcos V. Conde,
Zhijun Lei,
Wen Li,
Cosmin Stejerean,
Ioannis Katsavounidis,
Radu Timofte,
Kihwan Yoon,
Ganzorig Gankhuyag,
Jiangtao Lv,
Long Sun,
**shan Pan,
Jiangxin Dong,
**hui Tang,
Zhiyuan Li,
Hao Wei,
Chenyang Ge,
Dongyang Zhang,
Tianle Liu,
Huaian Chen,
Yi **,
Menghan Zhou,
Yiqiang Yan,
Si Gao,
Biao Wu,
Shaoli Liu
, et al. (50 additional authors not shown)
Abstract:
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
Submitted 25 April, 2024;
originally announced April 2024.
-
mmWave Wearable Antenna for Interaction with VR Devices
Authors:
Haksun Son,
Song Min Kim
Abstract:
The VR industry is one of the most promising industries for the near future, as it can provide a more immersive connection between people and the virtual world. Currently, VR devices interact with people using inconvenient controllers or cameras that perform poorly in dark environments. Interaction through millimeter-wave wearable devices has the potential to conveniently track human behavior regardless of the lighting conditions. In this study, a millimeter-wave wearable antenna was developed, opening up the possibility for more immersive interaction with VR devices. The antenna features a low loss tangent polyester fabric to minimize dielectric losses and a smooth coating to reduce losses due to rough surfaces. The antenna operates in the 24GHz ISM band, with an S11 value of -29dB at 24.15GHz.
Submitted 19 April, 2024;
originally announced April 2024.
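To put the reported S11 figure in context, a reflection coefficient given in decibels converts to a reflected power fraction as 10^(S11/10). The short sketch below does that conversion for the value quoted in the abstract. The function names are illustrative, and the calculation assumes an ideal one-port measurement referenced to the feed-line impedance.

```python
def reflected_power_fraction(s11_db: float) -> float:
    """Fraction of incident power reflected at the port (|Gamma|^2)."""
    return 10 ** (s11_db / 10)


def vswr(s11_db: float) -> float:
    """Voltage standing wave ratio from the reflection magnitude |Gamma|."""
    gamma = 10 ** (s11_db / 20)
    return (1 + gamma) / (1 - gamma)


s11_db = -29.0  # value quoted in the abstract, at 24.15 GHz
print(f"reflected fraction: {reflected_power_fraction(s11_db):.5f}")
print(f"VSWR: {vswr(s11_db):.3f}")
```

At -29 dB, roughly 0.13% of the incident power is reflected at the port.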
-
GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
Authors:
Kyusun Cho,
Joungbin Lee,
Heeji Yoon,
Yeobin Hong,
Jaehoon Ko,
Sangjun Ahn,
Seungryong Kim
Abstract:
We propose GaussianTalker, a novel framework for real-time generation of pose-controllable talking heads. It leverages the fast rendering capabilities of 3D Gaussian Splatting (3DGS) while addressing the challenges of directly controlling 3DGS with speech audio. GaussianTalker constructs a canonical 3DGS representation of the head and deforms it in sync with the audio. A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute. This design exploits the spatial-aware features and enforces interactions between neighboring points. The feature embeddings are then fed to a spatial-audio attention module, which predicts frame-wise offsets for the attributes of each Gaussian. It is more stable than previous concatenation or multiplication approaches for manipulating the numerous Gaussians and their intricate parameters. Experimental results showcase GaussianTalker's superiority in facial fidelity, lip synchronization accuracy, and rendering speed compared to previous methods. Specifically, GaussianTalker achieves a remarkable rendering speed up to 120 FPS, surpassing previous benchmarks. Our code is made available at https://github.com/KU-CVLAB/GaussianTalker/ .
Submitted 25 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Neural network-based recognition of multiple nanobubbles in graphene
Authors:
Subin Kim,
Nojoon Myoung,
Seunghyun Jun,
Ara Go
Abstract:
We present a machine learning method for swiftly identifying nanobubbles in graphene, crucial for understanding electronic transport in graphene-based devices. Nanobubbles cause local strain, impacting graphene's transport properties. Traditional techniques like optical imaging are slow and limited for characterizing multiple nanobubbles. Our approach uses neural networks to analyze graphene's density of states, enabling rapid detection and characterization of nanobubbles from electronic transport data. This method swiftly enumerates nanobubbles and surpasses conventional imaging methods in efficiency and speed. It enhances quality assessment and optimization of graphene nanodevices, marking a significant advance in condensed matter physics and materials science. Our technique offers an efficient solution for probing the interplay between nanoscale features and electronic properties in two-dimensional materials.
Submitted 24 April, 2024;
originally announced April 2024.
-
Robust Phase Retrieval by Alternating Minimization
Authors:
Seonho Kim,
Kiryung Lee
Abstract:
We consider a least absolute deviation (LAD) approach to the robust phase retrieval problem that aims to recover a signal from its absolute measurements corrupted with sparse noise. To solve the resulting non-convex optimization problem, we propose a robust alternating minimization (Robust-AM) derived as an unconstrained Gauss-Newton method. To solve the inner optimization arising in each step of Robust-AM, we adopt two computationally efficient methods for linear programs. We provide a non-asymptotic convergence analysis of these practical algorithms for Robust-AM under the standard Gaussian measurement assumption. These algorithms, when suitably initialized, are guaranteed to converge linearly to the ground truth at an order-optimal sample complexity with high probability while the support of sparse noise is arbitrarily fixed and the sparsity level is no larger than $1/4$. Additionally, through comprehensive numerical experiments on synthetic and image datasets, we show that Robust-AM outperforms existing methods for robust phase retrieval offering comparable theoretical performance.
Submitted 28 March, 2024;
originally announced April 2024.
-
A Nordhaus--Gaddum problem for the spectral gap of a graph
Authors:
Sooyeong Kim,
Neal Madras
Abstract:
Let $G$ be a graph on $n$ vertices, with complement $\overline{G}$. The spectral gap of the transition probability matrix of a random walk on $G$ is used to estimate how fast the random walk becomes stationary. We prove that the larger spectral gap of $G$ and $\overline{G}$ is $Ω(1/n)$. Moreover, if all degrees are $Ω(n)$ and $n-Ω(n)$, then the larger spectral gap of $G$ and $\overline{G}$ is $Θ(1)$. We also show that if the maximum degree is $n-O(1)$ or if $G$ is a join of two graphs, then the spectral gap of $G$ is $Ω(1/n)$. Finally, we provide a family of connected graphs with connected complements such that the larger spectral gap of $G$ and $\overline{G}$ is $O(1/n^{3/4})$.
Submitted 15 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Do not think pink elephant!
Authors:
Kyomin Hwang,
Suyoung Kim,
JunHoo Lee,
Nojun Kwak
Abstract:
Large Models (LMs) have heightened expectations for the potential of general AI as they are akin to human intelligence. This paper shows that recent large models such as Stable Diffusion and DALL-E3 also share the vulnerability of human intelligence, namely the "white bear phenomenon". We investigate the causes of the white bear phenomenon by analyzing their representation space. Based on this analysis, we propose a simple prompt-based attack method, which generates figures prohibited by the LM provider's policy. To counter these attacks, we introduce prompt-based defense strategies inspired by cognitive therapy techniques, successfully mitigating attacks by up to 48.22\%.
Submitted 22 April, 2024;
originally announced April 2024.
-
Probing bottom-associated production of a TeV scale scalar decaying to a top quark and dark matter at the LHC
Authors:
Amandeep Kaur Kalsi,
Teruki Kamon,
Seulgi Kim,
Jason S. H. Lee,
Denis Rathjens,
Youn Jung Roh,
Adrian Thompson,
Ian James Watson
Abstract:
A minimal non-thermal dark matter model that can explain both the existence of dark matter and the baryon asymmetry in the universe is studied. It requires two color-triplet, iso-singlet scalars with $\mathcal{O}$(TeV) masses and a singlet Majorana fermion with a mass of $\mathcal{O}$(GeV). The fermion becomes stable and can play the role of the dark matter candidate. We consider the fermion to interact with a top quark via the exchange of QCD-charged scalar fields coupled dominantly to third generation fermions. The signature of a single top quark production associated with a bottom quark and large missing transverse momentum opens up the possibility to search for this type of model at the LHC in a way complementary to existing monotop searches.
Submitted 5 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Unveiling dynamic bifurcation of Resch-patterned origami for self-adaptive impact mitigation structure
Authors:
Yasuhiro Miyazawa,
Chia-Yung Chang,
Qixun Li,
Ryan Tenu Ahn,
Koshiro Yamaguchi,
Seonghyun Kim,
Minho Cha,
Junseo Kim,
Yuyang Song,
Shinnosuke Shimokawa,
Umesh Gandhi,
**kyu Yang
Abstract:
In the classic realm of impact mitigation, targeting different impact scenarios with a universally designed device still remains an unassailable challenge. In this study, we delve into the untapped potential of Resch-patterned origami for impact mitigation, specifically considering the adaptively reconfigurable nature of the Resch origami structure. Our unit-cell-level analyses reveal two distinctive modes of deformation, each characterized by contrasting mechanical responses: the folding mode that displays monostability coupled with strain-hardening, and the unfolding mode that manifests bistability, facilitating energy absorption through snap-through dynamics. Drop tests further unveil a novel dynamic bifurcation phenomenon, where the origami switches between folding and unfolding depending on impact speed, thereby showcasing its innate self-reconfigurability in a wide range of dynamic events. The tessellated meter-scale Resch structure mimicking an automotive bumper inherits this dynamically bifurcating behavior, demonstrating the instantaneous morphing into favorable deformation mode to minimize the peak acceleration upon impact. This suggests a self-adaptive and universally applicable impact-absorbing nature of the Resch-patterned origami system. We believe that our findings pave the way for developing smart, origami-inspired impact mitigation devices capable of real-time response and adaptation to external stimuli, offering insights into designing universally protective structures with enhanced performance in response to various impact scenarios.
Submitted 23 April, 2024;
originally announced April 2024.
-
Pegasus-v1 Technical Report
Authors:
Raehyuk Jung,
Hyojun Go,
Jaehyuk Yi,
Jiho Jang,
Daniel Kim,
Jay Suh,
Aiden Lee,
Cooper Han,
Jae Lee,
Jeff Kim,
**-Young Kim,
Junwan Kim,
Kyle Park,
Lucas Lee,
Mars Ha,
Minjoon Seo,
Abraham Jo,
Ed Park,
Hassan Kianinejad,
SJ Kim,
Tony Moon,
Wade Jeong,
Andrei Popescu,
Esther Kim,
EK Yoon
, et al. (19 additional authors not shown)
Abstract:
This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers with a balanced view of its current state and its future direction.
Submitted 22 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results
Authors:
Xiaoning Liu,
Zongwei Wu,
Ao Li,
Florin-Alexandru Vasluianu,
Yulun Zhang,
Shuhang Gu,
Le Zhang,
Ce Zhu,
Radu Timofte,
Zhi **,
Hongjun Wu,
Chenxi Wang,
Haitao Ling,
Yuanhao Cai,
Hao Bian,
Yuxin Zheng,
**g Lin,
Alan Yuille,
Ben Shao,
** Guo,
Tianli Liu,
Mohao Wu,
Yixu Feng,
Shuo Hou,
Haotian Lin
, et al. (87 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlighting, extreme darkness, and night scenes. A notable total of 428 participants registered for the challenge, with 22 teams ultimately making valid submissions. This paper meticulously evaluates the state-of-the-art advancements in enhancing low-light images, reflecting the significant progress and creativity in this field.
Submitted 22 April, 2024;
originally announced April 2024.
-
Computing the LCP Array of a Labeled Graph
Authors:
Jarno Alanko,
Davide Cenzato,
Nicola Cotumaccio,
Sung-Hwan Kim,
Giovanni Manzini,
Nicola Prezza
Abstract:
The LCP array is an important tool in stringology, making it possible to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the definition of this array to Wheeler DFAs and, ultimately, to arbitrary labeled graphs, proving that it can be used to efficiently solve matching statistics queries on the graph's paths. In this paper, we provide the first efficient algorithm building the LCP array of a directed labeled graph with $n$ nodes and $m$ edges labeled over an alphabet of size $σ$. After arguing that the natural generalization of a compact-space LCP-construction algorithm by Beller et al. [J. Discrete Algorithms 2013] runs in time $Ω(nσ)$, we present a new algorithm based on dynamic range stabbing building the LCP array in $O(n\log σ)$ time and $O(n\logσ)$ bits of working space.
Submitted 22 April, 2024;
originally announced April 2024.
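For readers unfamiliar with the object being generalized, the sketch below builds the classical LCP array of a single string. It is not the graph algorithm from the paper: it uses a naive sort-based suffix array followed by Kasai's linear-time LCP construction, and the function names are illustrative.

```python
from typing import List


def suffix_array(s: str) -> List[int]:
    """Naive suffix array: starting positions sorted by suffix."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: List[int]) -> List[int]:
    """Kasai's algorithm.

    lcp[k] is the length of the longest common prefix of the suffixes
    at sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    """
    n = len(s)
    rank = [0] * n
    for k, pos in enumerate(sa):
        rank[pos] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


text = "banana"
sa = suffix_array(text)
print(sa)                   # [5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 1, 3, 0, 0, 2]
```

The naive suffix sort is quadratic in the worst case and is used here only to keep the example short.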
-
Sharp quantitative stability of the Yamabe problem
Authors:
Haixia Chen,
Seunghyeok Kim
Abstract:
Given a smooth closed Riemannian manifold $(M,g)$ of dimension $N \ge 3$, we derive sharp quantitative stability estimates for nonnegative functions near the solution set of the Yamabe problem on $(M,g)$. The seminal work of Struwe (1984) \cite{S} states that if $Γ(u) := \|Δ_g u - \frac{N-2}{4(N-1)} R_g u + u^{\frac{N+2}{N-2}}\|_{H^{-1}(M)} \to 0$, then $\|u-(u_0+\sum_{i=1}^ν \mathcal{V}_i)\|_{H^1(M)} \to 0$ where $u_0$ is a solution to the Yamabe problem on $(M,g)$, $ν\in \mathbb{N} \cup \{0\}$, and $\mathcal{V}_i$ is a bubble-like function. If $M$ is the round sphere $\mathbb{S}^N$, then $u_0 \equiv 0$ and a natural candidate of $\mathcal{V}_i$ is a bubble itself. If $M$ is not conformally equivalent to $\mathbb{S}^N$, then either $u_0 > 0$ or $u_0 \equiv 0$, there is no canonical choice of $\mathcal{V}_i$, and so a careful selection of $\mathcal{V}_i$ must be made to attain optimal estimates.
For $3 \le N \le 5$, we construct suitable $\mathcal{V}_i$'s and then establish the inequality $\|u-(u_0+\sum_{i=1}^ν \mathcal{V}_i)\|_{H^1(M)} \le Cζ(Γ(u))$ where $C > 0$ and $ζ(t) = t$, consistent with the result of Figalli and Glaudo (2020) \cite{FG} on $\mathbb{S}^N$. In the case of $N \ge 6$, we investigate the single-bubbling phenomenon $(ν= 1)$ on generic Riemannian manifolds $(M,g)$, proving that $ζ(t)$ is determined by $N$, $u_0$, and $g$, and can be much larger than $t$. This exhibits a striking difference from the result of Ciraolo, Figalli, and Maggi (2018) \cite{CFM} on $\mathbb{S}^N$. All of the estimates presented herein are optimal.
Submitted 13 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Quantitative Analysis of Roles of Direct and Indirect Pathways for Action Selection in The Basal Ganglia
Authors:
Sang-Yoon Kim,
Woochang Lim
Abstract:
The basal ganglia (BG) serve diverse motor and cognitive functions. Here, we are concerned with action selection performed by the BG. In particular, we quantitatively analyze the roles of the direct pathway (DP) and the indirect pathway (IP) in action selection in a spiking neural network with 3 competing channels. For this quantitative work, in each channel we obtain the competition degree ${\cal C}_d$, given by the ratio of the strength of the DP (${\cal S}_{DP}$) to the strength of the IP (${\cal S}_{IP}$) (i.e., ${\cal C}_d = {\cal S}_{DP} / {\cal S}_{IP}$). The desired action is then selected in the channel with the largest ${\cal C}_d$. Desired action selection arises mainly from strong focused inhibitory projection to the output nucleus, the SNr (substantia nigra pars reticulata), via the "Go" DP in the corresponding channel. Unlike the DP, there are two types of IP, the intra-channel IP and the inter-channel IP, owing to widespread diffusive excitation from the STN (subthalamic nucleus). The intra-channel "No-Go" IP acts as a brake that suppresses the desired action selection. The inter-channel IP to the SNr in the neighboring channels, on the other hand, suppresses competing actions, thereby spotlighting the desired action selection. In this way, the role of the inter-channel IP is opposite to that of the intra-channel IP. To the best of our knowledge, however, no quantitative analysis of these roles of the DP and the two IPs has been made. Here, by directly calculating the DP and the intra- and inter-channel IP presynaptic currents into the SNr in each channel, we obtain the competition degree of each channel to determine the desired action, and the roles of the DP and the intra- and inter-channel IPs are thereby made quantitatively clear.
Submitted 22 April, 2024;
originally announced April 2024.
-
Machine Learning Prediction Models for Solid Electrolytes based on Lattice Dynamics Properties
Authors:
Jiyeon Kim,
Donggeon Lee,
Dongwoo Lee,
Xin Li,
Yea-Lee Lee,
Sooran Kim
Abstract:
Recently, machine-learning approaches have accelerated computational materials design and the search for advanced solid electrolytes. However, the predictors are currently limited to static structural parameters, which may not fully account for the dynamic nature of ionic transport. In this study, we meticulously curated features considering dynamic properties and developed machine-learning models to predict the ionic conductivity of solid electrolytes. We compiled 14 phonon-related descriptors from first-principles phonon calculations along with 16 descriptors related to structure and electronic properties. Our logistic regression classifiers exhibit an accuracy of 93 %, while the random forest regression model yields a root mean square error of 1.179 S/cm and $R^2$ of 0.710. Notably, phonon-related features are essential for estimating the ionic conductivity in both models. Furthermore, we applied our prediction model to screen 264 Li-containing materials and identified 11 promising candidates as potential superionic conductors.
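For readers unfamiliar with the two regression metrics quoted in this abstract, the sketch below shows how a root mean square error and a coefficient of determination $R^2$ are conventionally computed. The data are made up for illustration, and this is not the authors' evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

A lower RMSE and an $R^2$ closer to 1 both indicate predictions that track the observed values more closely.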
Submitted 22 April, 2024;
originally announced April 2024.
-
Mapping Phonon Polaritons with Visible Light
Authors:
Kiernan E. Arledge,
Chase T. Ellis,
Nazli Rasouli Sarabi,
Vincent R. Whiteside,
Chul Soo Kim,
Mijin Kim,
Daniel C. Ratchford,
Michael A Meeker,
Binbin Weng,
Joseph G. Tischler
Abstract:
Phonon polaritons (PhPs) are hybrid photon-phonon waves which enable strong light-matter interactions and subdiffractional confinement, potentially empowering applications in sensing, nonlinear optics and nanoscale energy manipulation. In this work, we use confocal Raman microscopy to investigate the coupling between bulk phonon modes and localized surface phonon polariton (SPhP) modes in indium phosphide (InP) nanopillars and 4H-silicon carbide (4H-SiC) gratings. The Raman intensity within the nanostructures is described in terms of the SPhP eigenmodes and used to reconstruct the field intensity, providing a method to map SPhP eigenmodes using visible and near-IR light. Our results indicate that, contrary to expectation, all Raman-active bulk phonon modes of InP and 4H-SiC couple to the localized SPhP modes. Further, we confirm that polarizability selection rules form the predominant coupling mechanism between phonons and SPhP modes, with electron-phonon coupling playing a role for certain phonon modes (A1(LO) and E1(TO) in 4H-SiC). These observations provide a method for extending Raman studies of PhP modes to achieve full 3D reconstruction of the PhP eigenmodes and visualize light-matter interactions within nanostructures, thus advancing Raman scattering as a technique for understanding PhP modes.
Submitted 21 April, 2024;
originally announced April 2024.
-
Mapping the path to Cryogenic Atom Probe Tomography Analysis of biomolecules
Authors:
Eric V. Woods,
Tim M. Schwarz,
Mahander P. Singh,
Shuo Zhang,
Se-Ho Kim,
Ayman A. El-Zoka,
Lothar Gremer,
Dieter Willbold,
Ingrid McCarroll,
B. Gault
Abstract:
The understanding of protein structure, folding, and interaction with other proteins remains one of the grand challenges of modern biology. Tremendous progress has been made thanks to X-ray- or electron-based techniques that have provided atomic configurations of proteins and their solvation shells. These techniques, though, require a large number of similar molecules to provide an average view, and lack detailed compositional information that might play a major role in the biochemical activity of these macromolecules. Based on its intrinsic performance and recent impact in materials science, atom probe tomography (APT) has been touted as a potential novel tool to analyse biological materials, including proteins. However, analysis of biomolecules in their native, hydrated state by APT has not yet been routinely achieved, and the technique's true capabilities remain to be demonstrated. Here, we present and discuss systematic analyses of individual amino acids in frozen aqueous solutions on two different nanoporous metal supports across a wide range of analysis conditions. Using a ratio of the molecular ions of water as a descriptor for the conditions of electrostatic field, we study the fragmentation and behavior of those amino acids. We discuss the importance of sample support, specimen preparation route, acquisition conditions and data analysis, to pave the way towards establishing guidelines for cryo-APT analysis of biomolecules.
Submitted 19 April, 2024;
originally announced April 2024.
-
Camera Agnostic Two-Head Network for Ego-Lane Inference
Authors:
Chaehyeon Song,
Sungho Yoon,
Minhyeok Heo,
Ayoung Kim,
Sujung Kim
Abstract:
Vision-based ego-lane inference using High-Definition (HD) maps is essential in autonomous driving and advanced driver assistance systems. The traditional approach necessitates well-calibrated cameras, which confines variation of camera configuration, as the algorithm relies on intrinsic and extrinsic calibration. In this paper, we propose a learning-based ego-lane inference by directly estimating the ego-lane index from a single image. To enhance robust performance, our model incorporates the two-head structure inferring ego-lane in two perspectives simultaneously. Furthermore, we utilize an attention mechanism guided by vanishing point-and-line to adapt to changes in viewpoint without requiring accurate calibration. The high adaptability of our model was validated in diverse environments, devices, and camera mounting points and orientations.
Submitted 19 April, 2024;
originally announced April 2024.
-
Novel indium phosphide charged particle detector characterization with a 120 GeV proton beam
Authors:
Sungjoon Kim,
Manoj B. Jadhav,
Vikas Berry,
Jessica E. Metcalfe,
Anirudha V. Sumant
Abstract:
Thin film detectors which incorporate semiconductor materials other than silicon have the potential to build upon their unique material properties and offer advantages such as faster response times, operation at room temperature, and radiation hardness. To explore the possibility, promising candidate materials were selected, and particle tracking detectors were fabricated. An indium phosphide detector with a metal-insulator-metal (MIM) structure has been fabricated for particle tracking. The detector was tested using radioactive sources and a high energy proton beam at Fermi National Accelerator Laboratory. In addition to its simple design and fabrication process, the indium phosphide particle detector showed a very fast response time of hundreds of picoseconds for the 120 GeV protons, which is comparable to that of ultra-fast silicon detectors. This fast-timing response is attributed to the high electron mobility of indium phosphide. Such material properties can be leveraged to build novel detectors with superlative performance.
Submitted 18 April, 2024;
originally announced April 2024.
-
Multiphoton super-resolution imaging via virtual structured illumination
Authors:
Sumin Lim,
Sungsam Kang,
Jin-Hee Hong,
Youngho Jin,
Kalpak Gupta,
Moonseok Kim,
Suhyun Kim,
Wonshik Choi,
Seokchan Yoon
Abstract:
Fluorescence imaging in thick biological tissues is challenging due to sample-induced aberration and scattering, which leads to severe degradation of image quality and resolution. Fluorescence imaging in reflection geometry further exacerbates this issue since the point spread function is distorted in both excitation and emission pathways. Here, we propose a novel approach termed adaptive optics virtual structured illumination microscopy (AO V-SIM) that enables super-resolution multiphoton imaging through a scattering medium in reflection geometry. Our approach exploits the incoherent reflection matrix obtained using a conventional point-scanning fluorescence microscope with an array detector. We introduce V-SIM super-resolution reconstruction algorithm based on the incoherent reflection matrix. Furthermore, we introduce a software adaptive optics correction algorithm, AO V-SIM, which recovers unattenuated and phase-corrected optical transfer function for both excitation and emission pathways. The effectiveness of our proposed method is experimentally validated through sub-diffraction-limited two-photon fluorescence imaging of various samples in the presence of strong aberration.
Submitted 17 April, 2024;
originally announced April 2024.
-
Decomposition of Longitudinal Disparities: an Application to the Fetal Growth-Singletons Study
Authors:
Sang Kyu Lee,
Seonjin Kim,
Mi-Ok Kim,
Katherine L. Grantz,
Hyokyoung G. Hong
Abstract:
Addressing health disparities among different demographic groups is a key challenge in public health. Despite many efforts, there is still a gap in understanding how these disparities unfold over time. Our paper focuses on this overlooked longitudinal aspect, which is crucial in both clinical and public health settings. In this paper, we introduce a longitudinal disparity decomposition method that decomposes disparities into three components: the explained disparity linked to differences in the exploratory variables' conditional distribution when the modifier distribution is identical between majority and minority groups, the explained disparity that emerges specifically from the unequal distribution of the modifier and its interaction with covariates, and the unexplained disparity. The proposed method offers a dynamic alternative to the traditional Peters-Belson decomposition approach, tackling both the potential reduction in disparity if the covariate distributions of minority groups matched those of the majority group and the evolving nature of disparity over time. We apply the proposed approach to a fetal growth study to gain insights into disparities between different race/ethnicity groups in fetal developmental progress throughout the course of pregnancy.
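As background for the comparison drawn in this abstract, the sketch below illustrates the traditional cross-sectional Peters-Belson idea, not the longitudinal method the paper proposes. It assumes a single covariate and an ordinary least squares fit in the majority group; the function names and data are hypothetical. The fitted majority model predicts what minority outcomes would be under the majority's covariate-outcome relationship, and the observed gap is split into an explained and an unexplained part.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one covariate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def peters_belson(x_major, y_major, x_minor, y_minor):
    """Split the mean outcome gap into explained and unexplained parts."""
    intercept, slope = fit_simple_ols(x_major, y_major)
    # Counterfactual minority outcomes under the majority-group model.
    counterfactual = [intercept + slope * xi for xi in x_minor]
    mean_major = sum(y_major) / len(y_major)
    mean_minor = sum(y_minor) / len(y_minor)
    mean_counterfactual = sum(counterfactual) / len(counterfactual)
    return {
        "total": mean_major - mean_minor,
        "explained": mean_major - mean_counterfactual,
        "unexplained": mean_counterfactual - mean_minor,
    }


# Made-up data: majority follows y = 1 + 2x, minority outcomes are 1 unit lower.
result = peters_belson([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0],
                       [0.0, 1.0], [0.0, 2.0])
print(result)
```

The explained part reflects differences in covariate distributions between the groups, while the unexplained part is what remains after the minority group is evaluated under the majority-group model.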
Submitted 17 April, 2024;
originally announced April 2024.
-
Spatio-Temporal Motion Retargeting for Quadruped Robots
Authors:
Taerim Yoon,
Dongho Kang,
Seungmin Kim,
Minsung Ahn,
Stelian Coros,
Sungjoon Choi
Abstract:
This work introduces a motion retargeting approach for legged robots, which aims to create motion controllers that imitate the fine behavior of animals. Our approach, namely spatio-temporal motion retargeting (STMR), guides imitation learning procedures by transferring motion from source to target, effectively bridging the morphological disparities by ensuring the feasibility of imitation on the target system. Our STMR method comprises two components: spatial motion retargeting (SMR) and temporal motion retargeting (TMR). On the one hand, SMR tackles motion retargeting at the kinematic level by generating kinematically feasible whole-body motions from keypoint trajectories. On the other hand, TMR aims to retarget motion at the dynamic level by optimizing motion in the temporal domain. We showcase the effectiveness of our method in facilitating Imitation Learning (IL) for complex animal movements through a series of simulation and hardware experiments. In these experiments, our STMR method successfully tailored complex animal motions from various media, including video captured by a hand-held camera, to fit the morphology and physical properties of the target robots. This enabled RL policy training for precise motion tracking, while baseline methods struggled with highly dynamic motion involving flying phases. Moreover, we validated that the control policy can successfully imitate six different motions in two quadruped robots with different dimensions and physical properties in real-world settings.
Submitted 17 April, 2024;
originally announced April 2024.
-
Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System
Authors:
Sein Kim,
Hongseok Kang,
Seungyoon Choi,
Donghyun Kim,
Minchul Yang,
Chanyoung Park
Abstract:
Collaborative filtering recommender systems (CF-RecSys) have shown successive results in enhancing the user experience on social media and e-commerce platforms. However, as CF-RecSys struggles under cold scenarios with sparse user-item interactions, recent strategies have focused on leveraging modality information of user/items (e.g., text or images) based on pre-trained modality encoders and Large Language Models (LLMs). Despite their effectiveness under cold scenarios, we observe that they underperform simple traditional collaborative filtering models under warm scenarios due to the lack of collaborative knowledge. In this work, we propose an efficient All-round LLM-based Recommender system, called A-LLMRec, that excels not only in the cold scenario but also in the warm scenario. Our main idea is to enable an LLM to directly leverage the collaborative knowledge contained in a pre-trained state-of-the-art CF-RecSys so that the emergent ability of the LLM as well as the high-quality user/item embeddings that are already trained by the state-of-the-art CF-RecSys can be jointly exploited. This approach yields two advantages: (1) model-agnostic, allowing for integration with various existing CF-RecSys, and (2) efficiency, eliminating the extensive fine-tuning typically required for LLM-based recommenders. Our extensive experiments on various real-world datasets demonstrate the superiority of A-LLMRec in various scenarios, including cold/warm, few-shot, cold user, and cross-domain scenarios. Beyond the recommendation task, we also show the potential of A-LLMRec in generating natural language outputs based on the understanding of the collaborative knowledge by performing a favorite genre prediction task. Our code is available at https://github.com/ghdtjr/A-LLMRec .
Submitted 1 June, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
Authors:
Chunghyun Park,
Seungwook Kim,
Jaesik Park,
Minsu Cho
Abstract:
Establishing accurate 3D correspondences between shapes stands as a pivotal challenge with profound implications for computer vision and robotics. However, existing self-supervised methods for this problem assume perfect input shape alignment, restricting their real-world applicability. In this work, we introduce a novel self-supervised Rotation-Invariant 3D correspondence learner with Local Shape Transform, dubbed RIST, that learns to establish dense correspondences between shapes even under challenging intra-class variations and arbitrary orientations. Specifically, RIST learns to dynamically formulate an SO(3)-invariant local shape transform for each point, which maps the SO(3)-equivariant global shape descriptor of the input shape to a local shape descriptor. These local shape descriptors are provided as inputs to our decoder to facilitate point cloud self- and cross-reconstruction. Our proposed self-supervised training pipeline encourages semantically corresponding points from different shapes to be mapped to similar local shape descriptors, enabling RIST to establish dense point-wise correspondences. RIST demonstrates state-of-the-art performances on 3D part label transfer and semantic keypoint transfer given arbitrarily rotated point cloud pairs, outperforming existing methods by significant margins.
Submitted 20 April, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs
Authors:
Taeho Kim,
Yanming Wang,
Vatshank Chaturvedi,
Lokesh Gupta,
Seyeon Kim,
Yongin Kwon,
Sangtae Ha
Abstract:
Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However, determining the most effective method for achieving rapid fine-tuning while preventing GPU out-of-memory issues in a given environment remains unclear. To address this challenge, we introduce LLMem, a solution that estimates the GPU memory consumption when applying distributed fine-tuning methods across multiple GPUs and identifies the optimal method. We conduct GPU memory usage estimation prior to fine-tuning, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem accurately estimates peak GPU memory usage on a single GPU, with error rates of up to 1.6%. Additionally, it shows an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups.
Submitted 16 April, 2024;
originally announced April 2024.
-
RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting
Authors:
Ashkan Mirzaei,
Riccardo De Lutio,
Seung Wook Kim,
David Acuna,
Jonathan Kelly,
Sanja Fidler,
Igor Gilitschenski,
Zan Gojcic
Abstract:
Neural reconstruction approaches are rapidly emerging as the preferred representation for 3D scenes, but their limited editability is still posing a challenge. In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content. Scene inpainting is an inherently ill-posed task as there exist many solutions that plausibly replace the missing content. A good inpainting method should therefore not only enable high-quality synthesis but also a high degree of control. Based on this observation, we focus on enabling explicit control over the inpainted content and leverage a reference image as an efficient means to achieve this goal. Specifically, we introduce RefFusion, a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view. The personalization effectively adapts the prior distribution to the target scene, resulting in a lower variance of score distillation objective and hence significantly sharper details. Our framework achieves state-of-the-art results for object removal while maintaining high controllability. We further demonstrate the generality of our formulation on other downstream tasks such as object insertion, scene outpainting, and sparse view reconstruction.
Submitted 16 April, 2024;
originally announced April 2024.