-
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Authors:
Tzu-Han Lin,
Chen-An Li,
Hung-yi Lee,
Yun-Nung Chen
Abstract:
Reinforcement learning from human feedback (RLHF) is a popular strategy for aligning large language models (LLMs) with desired behaviors. Reward modeling is a crucial step in RLHF. However, collecting paired preference data for training reward models is often costly and time-consuming, especially for domain-specific preferences requiring expert annotation. To address this challenge, we propose the \textbf{Do}main knowled\textbf{ge} merged \textbf{R}eward \textbf{M}odel (DogeRM), a novel framework that integrates domain-specific knowledge into a general reward model by model merging. The experiments demonstrate that DogeRM enhances performance across different benchmarks and provide a detailed analysis showcasing the effects of model merging, showing the great potential of facilitating model alignment.
Submitted 1 July, 2024;
originally announced July 2024.
-
Uniform approximation by harmonic polynomials for solving the Dirichlet problem of Laplace's equation on a disk
Authors:
Taewan Kim,
Haesung Lee
Abstract:
In this paper, we study the Dirichlet problem for Laplace's equation in an open disk. The uniqueness of solutions is ensured by the well-known weak maximum principle. We introduce a novel approach to demonstrate the existence of a solution using harmonic polynomials that converge uniformly to a solution. Specifically, we rigorously derive the convergence rate of the harmonic polynomials and show that smoother boundary data and proximity of the target point to the disk's origin accelerate the convergence. Additionally, we obtain uniform estimates for the derivatives of solutions of arbitrary orders, controlled by $L^1$-boundary data. Notably, the constants in our estimates are significantly improved compared to existing results. Furthermore, we provide an enhanced radius of convergence for Taylor's series of the solution at each point in the open disk.
Submitted 1 July, 2024;
originally announced July 2024.
-
Sampling from the Continuous Random Energy Model in Total Variation Distance
Authors:
Holden Lee,
Qiang Wu
Abstract:
The continuous random energy model (CREM) is a toy model of spin glasses on $\{0,1\}^N$ that, in the limit, exhibits an infinitely hierarchical correlation structure. We give two polynomial-time algorithms to approximately sample from the Gibbs distribution of the CREM in the high-temperature regime, based on a Markov chain and a sequential sampler. The running time depends algebraically on the desired TV distance and failure probability and exponentially on $(1/g')^{O(1)}$, where $g'$ is the gap to a certain inverse temperature threshold; this contrasts with previous results which only attain $o(N)$ accuracy in KL divergence. If the covariance function $A$ of the CREM is concave, the algorithms work up to the critical threshold $β_c$, which is the static phase transition point; moreover, for certain $A$, the algorithms work up to the known algorithmic threshold $β_G$ proposed in Addario-Berry and Maillard (2020) for non-trivial sampling guarantees. Our result depends on quantitative bounds for the fluctuation of the partition function and a new contiguity result of the "tilted" CREM obtained from sampling, which is of independent interest. We also show that the spectral gap is exponentially small with high probability, suggesting that the algebraic dependence is unavoidable with a Markov chain approach.
Submitted 30 June, 2024;
originally announced July 2024.
-
Active Healing of Microtubule-Motor Networks
Authors:
Fan Yang,
Shichen Liu,
Heun Jin Lee,
Rob Phillips,
Matt Thomson
Abstract:
Cytoskeletal networks have a self-healing property where networks can repair defects to maintain structural integrity. However, both the mechanisms and dynamics of healing remain largely unknown. Here we report an unexplored healing mechanism in microtubule-motor networks by active crosslinking. We directly generate network cracks using a light-controlled microtubule-motor system, and observe that the cracks can self-heal. Combining theory and experiment, we find that the networks must overcome internal elastic resistance in order to heal cracks, giving rise to a bifurcation of dynamics dependent on the initial opening angle of the crack: the crack heals below a critical angle and opens up at larger angles. Simulation of a continuum model reproduces the bifurcation dynamics, revealing the importance of a boundary layer where free motors and microtubules can actively crosslink and thereby heal the crack. We also formulate a simple elastic-rod model that can qualitatively predict the critical angle, which is found to be tunable by two dimensionless geometric parameters, the ratio of the boundary layer and network width, and the aspect ratio of the network. Our results provide a new framework for understanding healing in cytoskeletal networks and designing self-healable biomaterials.
Submitted 30 June, 2024;
originally announced July 2024.
-
Local and Global Reciprocity in Orbital-Charge-Coupled Transport
Authors:
Dongwook Go,
Tom S. Seifert,
Tobias Kampfrath,
Kazuya Ando,
Hyun-Woo Lee,
Yuriy Mokrousov
Abstract:
The coupled transport of the charge and orbital angular momentum of electrons is at the heart of orbitronics. Here, we discuss the reciprocal relation between the direct and inverse orbital Hall effects (OHEs) in thin films. We argue that the conventional orbital current is ill-defined as it does not satisfy the reciprocal relation owing to non-conservation of the orbital angular momentum. We resolve the problem by adopting the definition of the so-called \emph{proper} orbital current, which is directly related to orbital accumulation. We prove the reciprocal relation between the \emph{global} response of orbital and charge currents. However, we show that their \emph{local} distributions are generally different, especially due to gigantic contributions at surfaces, which may lead to unintuitive results when charge and orbital currents are locally measured. We demonstrate our predictions by first-principles calculations on W(110) and Pt(111) thin films. In W(110), the direct and inverse OHEs are severely non-reciprocal locally in each layer although the total responses are exactly reciprocal. Interestingly, the SHEs are almost reciprocal locally in each layer. On the other hand, in Pt(111), both OHEs and SHEs are locally non-reciprocal, which we attribute to the pronounced spin-orbit interaction. We propose that the locally distinct responses may be used to distinguish the spin and orbital currents in experiments.
Submitted 29 June, 2024;
originally announced July 2024.
-
Global decomposition of networks into multiple cores formed by local hubs
Authors:
Wonhee Jeong,
Unjong Yu,
Sang Hoon Lee
Abstract:
Networks are ubiquitous in various fields, representing systems where nodes and their interconnections constitute their intricate structures. We introduce a network decomposition scheme to reveal multiscale core-periphery structures lurking inside, using the concept of locally defined nodal hub centrality and edge-pruning techniques built upon it. We demonstrate that the hub-centrality-based edge pruning reveals a series of breaking points in network decomposition, which effectively separates a network into its backbone and shell structures. Our local-edge decomposition method iteratively identifies and removes locally least important nodes, and uncovers an onion-like hierarchical structure as a result. Compared with the conventional $k$-core decomposition method, our method based on relative information residing in local structures exhibits a clear advantage in terms of discovering locally crucial substructures. Furthermore, we introduce the core-periphery score to properly separate the core and periphery with our decomposition scheme. By extending the method combined with the network community structure, we successfully detect multiple core-periphery structures by decomposition inside each community. Moreover, the application of our decomposition to supernode networks defined from the communities reveals the intricate relation between the two representative mesoscale structures.
Submitted 29 June, 2024;
originally announced July 2024.
-
Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator
Authors:
Khanh Duy Nguyen,
Woojoo Lee,
Jianchen Dang,
Tongyao Wu,
Gabriele Berruto,
Chenhui Yan,
Chi Ian Jess Ip,
Haoran Lin,
Qiang Gao,
Seng Huat Lee,
Binghai Yan,
Chaoxing Liu,
Zhiqiang Mao,
Xiao-Xiao Zhang,
Shuolong Yang
Abstract:
The indirect exchange interaction between local magnetic moments via surface electrons has been long predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, the experimental confirmation of this mechanism remains elusive, especially in intrinsic MTIs. Here we combine time-resolved photoemission spectroscopy with time-resolved magneto-optical Kerr effect measurements to elucidate the unique electromagnetism at the surface of an intrinsic MTI MnBi2Te4. Theoretical modeling based on 2D Ruderman-Kittel-Kasuya-Yosida interactions captures the initial quenching of a surface-rooted exchange gap within a factor of two but over-estimates the bulk demagnetization by one order of magnitude. This mechanism directly explains the sizable gap in the quasi-2D electronic state and the nonzero residual magnetization in even-layer MnBi2Te4. Furthermore, it leads to efficient light-induced demagnetization comparable to state-of-the-art magnetophotonic crystals, promising an effective manipulation of magnetism and topological orders for future topotronics.
Submitted 28 June, 2024;
originally announced July 2024.
-
Refined approaches in second leptogenesis for the baryon-lepton asymmetry discrepancy
Authors:
YeolLin ChoeJo,
Kazuki Enomoto,
Yechan Kim,
Hye-Sung Lee
Abstract:
The temperature-dependent mass of the heavy neutrino can lead to the second leptogenesis occurring below the electroweak scale, potentially explaining the large discrepancy between baryon and lepton asymmetries. We investigate this scenario further, exploring the intricate interplay of the weak interaction processes within this framework. It includes notable shifts in the dominant decay channels of heavy neutrinos around the electroweak symmetry breaking, along with the resonance behavior of the scattering processes near the $W/Z$ mass. The $CP$ asymmetry can also vary over cosmic history due to the temperature-dependent mass, allowing the $B-L$ asymmetry generation to be amplified in the late epoch. These findings elucidate how such alterations in the dynamics of second leptogenesis contribute to addressing the observed discrepancies in baryon-lepton asymmetry.
Submitted 28 June, 2024;
originally announced June 2024.
-
Statistical Analysis on Scale and Regional Distribution of Undergraduate Physics Programs in Korean Universities
Authors:
Gahyoun Gim,
Sang Hoon Lee
Abstract:
We report on the temporal changes in undergraduate-level physics programs at Korean universities from 1915 to 2023 by analyzing data on physics-related departments and their students using basic statistics and the scaling theory of statistical physics. Our analysis reveals that the number of departments peaked around the turn of the 21st century, and it has been steadily decreasing ever since, with particularly severe declines in private universities located outside the capital region. Besides the change in the overall numbers, we also show the change in the self-identity of physics-related departments reflected in department names, which reveals a recent trend of emphasizing application-oriented fields such as semiconductors and data. As a sophisticated measure to quantify regional imbalances relative to the population eligible for higher education, we present scaling exponents from the scaling theory, which show a shift from sublinear to linear for departments and a shift from linear to superlinear for students. The result indicates the exacerbation of the regional imbalance of university-level physics education in Korea.
Submitted 28 June, 2024;
originally announced June 2024.
-
Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He
, et al. (118 additional authors not shown)
Abstract:
We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of highest experimentally allowed extra-galactic magnetic fields, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.
Submitted 27 June, 2024;
originally announced June 2024.
-
Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He
, et al. (118 additional authors not shown)
Abstract:
We use a new method to estimate the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on comparison of the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, which is the conservative lower limit for the source number density.
Submitted 27 June, 2024;
originally announced June 2024.
-
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Authors:
Ke-Han Lu,
Zhehuai Chen,
Szu-Wei Fu,
He Huang,
Boris Ginsburg,
Yu-Chiang Frank Wang,
Hung-yi Lee
Abstract:
Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby facilitating the capability to understand both linguistic and non-linguistic features in speech. Enhanced with the proposed approach, our model demonstrates superior performance on the Dynamic-SUPERB benchmark, particularly in generalizing to unseen tasks. Moreover, we discover that the aligned model exhibits a zero-shot instruction-following capability without explicit speech instruction tuning. These findings highlight the potential to reshape instruction-following SLMs by incorporating rich, descriptive speech captions.
Submitted 26 June, 2024;
originally announced June 2024.
-
From Pixels to Torques with Linear Feedback
Authors:
Jeong Hun Lee,
Sam Schoedel,
Aditya Bhardwaj,
Zachary Manchester
Abstract:
We demonstrate the effectiveness of simple observer-based linear feedback policies for "pixels-to-torques" control of robotic systems using only a robot-facing camera. Specifically, we show that the matrices of an image-based Luenberger observer (linear state estimator) for a "student" output-feedback policy can be learned from demonstration data provided by a "teacher" state-feedback policy via simple linear-least-squares regression. The resulting linear output-feedback controller maps directly from high-dimensional raw images to torques while being amenable to the rich set of analytical tools from linear systems theory, allowing us to enforce closed-loop stability constraints in the learning problem. We also investigate a nonlinear extension of the method via the Koopman embedding. Finally, we demonstrate the surprising effectiveness of linear pixels-to-torques policies on a cartpole system, both in simulation and on real-world hardware. The policy successfully executes both stabilizing and swing-up trajectory tracking tasks using only camera feedback while subject to model mismatch, process and sensor noise, perturbations, and occlusions.
Submitted 26 June, 2024;
originally announced June 2024.
-
Mental Modeling of Reinforcement Learning Agents by Language Models
Authors:
Wenhao Lu,
Xufeng Zhao,
Josua Spisak,
Jae Hee Lee,
Stefan Wermter
Abstract:
Can emergent language models faithfully model the intelligence of decision-making agents? Though modern language models already exhibit some reasoning ability, and can in theory express any probability distribution over tokens, it remains underexplored how the world knowledge these pretrained models have memorized can be utilized to comprehend an agent's behaviour in the physical world. This study empirically examines, for the first time, how well large language models (LLMs) can build a mental model of agents, termed agent mental modelling, by reasoning about an agent's behaviour and its effect on states from agent interaction history. This research may unveil the potential of leveraging LLMs for elucidating RL agent behaviour, addressing a key challenge in eXplainable reinforcement learning (XRL). To this end, we propose specific evaluation metrics and test them on selected RL task datasets of varying complexity, reporting findings on agent mental model establishment. Our results disclose that LLMs are not yet fully capable of mentally modelling agents through inference alone without further innovations. This work thus provides new insights into the capabilities and limitations of modern LLMs.
Submitted 26 June, 2024;
originally announced June 2024.
-
Training-Free Exponential Extension of Sliding Window Context with Cascading KV Cache
Authors:
Jeffrey Willette,
Heejun Lee,
Youngwan Lee,
Myeongjae Jeon,
Sung Ju Hwang
Abstract:
The context window within a transformer provides a form of active memory for the current task, which can be useful for few-shot learning and conditional generation, both of which depend heavily on previous context tokens. However, as the context length grows, the computational cost increases quadratically. Recent works have shown that saving a few initial tokens along with a fixed-sized sliding window leads to stable streaming generation with linear complexity in transformer-based Large Language Models (LLMs). However, they make suboptimal use of the fixed window by naively evicting all tokens unconditionally from the key-value (KV) cache once they reach the end of the window, resulting in tokens being forgotten and no longer able to affect subsequent predictions. To overcome this limitation, we propose a novel mechanism for storing longer sliding window contexts with the same total cache size by keeping separate cascading sub-cache buffers whereby each subsequent buffer conditionally accepts a fraction of the relatively more important tokens evicted from the previous buffer. Our method results in a dynamic KV cache that can store tokens from the more distant past than a fixed, static sliding window approach. Our experiments show improvements of 5.6% on long context generation (LongBench), 1.2% in streaming perplexity (PG19), and 0.6% in language understanding (MMLU STEM) using LLMs given the same fixed cache size. Additionally, we provide an efficient implementation that improves the KV cache latency from 1.33ms per caching operation to 0.54ms, a 59% speedup over previous work.
Submitted 23 June, 2024;
originally announced June 2024.
-
IR physics from the holographic RG flow
Authors:
Chanyong Park,
Jung Hun Lee
Abstract:
Applying the holographic method, we investigate an RG flow and IR physics when a two-dimensional conformal field theory is deformed by a relevant scalar operator. To do so, we first assume an RG flow from a UV CFT to a new IR CFT. On the dual gravity side, such an RG flow can be described by the rolling down of a bulk scalar field from an unstable to a stable equilibrium point. After considering a simple scalar potential allowing several local extrema, we study the change of a ground state along the RG flow. We show that the entanglement entropy at an IR fixed point leads to a logarithmic divergence due to the restoration of the conformal symmetry. We study how the change of the ground state affects two-point functions. In the probe limit, we numerically calculate the change of a conformal dimension caused by the modification of the ground state. We further study the analytic form of the IR conformal dimension, which matches the numerical result.
Submitted 24 June, 2024;
originally announced June 2024.
-
Probing the nature of the $χ_{c1}(3872)$ state using radiative decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1094 additional authors not shown)
Abstract:
The radiative decays $χ_{c1}(3872)\rightarrow ψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}}{Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm 0.04 , $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.
Submitted 24 June, 2024;
originally announced June 2024.
-
Higher differentiability for the fractional $p$-Laplacian
Authors:
Lars Diening,
Kyeongbae Kim,
Ho-Sik Lee,
Simon Nowak
Abstract:
In this work, we study the higher differentiability of solutions to the inhomogeneous fractional $p$-Laplace equation under different regularity assumptions on the data. In the superquadratic case, we extend and sharpen several previous results, while in the subquadratic regime our results constitute completely novel developments even in the homogeneous case. In particular, in the local limit our results are consistent with well-known higher differentiability results for the standard inhomogeneous $p$-Laplace equation. All of our main results remain valid in the vectorial context of fractional $p$-Laplace systems.
Submitted 24 June, 2024;
originally announced June 2024.
-
Project Management for Ground-based Telescope Array Development
Authors:
Ji Hoon Kim,
Myungshin Im,
Hyung Mok Lee,
Seo-Won Chang
Abstract:
The Center for the Gravitational-Wave Universe at Seoul National University has been operating its main observational facility, the 7-Dimensional Telescope (7DT), since October 2023. Located at El Sauce Observatory in the Rio Hurtado Valley in Chile, 7DT consists of 20 50-cm telescopes equipped with 40 medium-band filters of 25 nm full width at half maximum along with a CMOS camera of 61 megapixels. 7DT produces about 1 TB per night of spectral mapping image data, including calibration data and the byproducts of the data reduction pipeline, once our planned three-layered surveys (Reference Imaging Survey, Wide Field Survey, and Intensive Monitoring Survey) start in 2024. We expect to generate 1 PB per year by combining raw data, reduced data, and data products (e.g. calibrated stacked images, spectral cubes, and object catalogs). To accommodate this volume of data, we now have 1 PB of data storage, which we will expand by 1 PB per year. We also have a high-performance computing facility equipped with 2 NVIDIA A100 GPU cards, since we plan to carry out real-time data reduction and analysis of follow-up observation data for gravitational-wave events. To support this, we established a 400 Mbps network connection between the facilities in Korea and Chile. Taking advantage of the high-performance network, we have been carrying out fully remote operations since October 2023. In this talk, we present details of designing, planning, and executing the ground-based telescope facility project, especially within low-budget academic environments. While we cover as much ground as possible, we will emphasize human resource management, project risk management, and financial contingency management.
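The data-rate figures quoted above can be checked with a short back-of-the-envelope calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes), a fully utilised link, and one observing night per calendar day; none of these assumptions comes from the abstract, and the function and variable names are illustrative only.

```python
TB = 1e12                 # bytes; decimal terabyte (assumption)
NIGHTLY_BYTES = 1 * TB    # "about 1 TB per night" from the abstract
LINK_BPS = 400e6          # 400 Mbps link between Korea and Chile


def transfer_hours(n_bytes: float, bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move n_bytes over a link of the given speed.

    `efficiency` is a hypothetical fraction of the nominal bandwidth
    that is actually achieved (1.0 means an ideal link).
    """
    return n_bytes * 8 / (bits_per_second * efficiency) / 3600


ideal_hours = transfer_hours(NIGHTLY_BYTES, LINK_BPS)

# Upper bound on nightly-rate data per year, assuming 365 observing nights.
nightly_data_per_year_tb = 365 * NIGHTLY_BYTES / TB

print(f"one night of data over the link: {ideal_hours:.2f} h")
print(f"nightly-rate data per year: at most {nightly_data_per_year_tb:.0f} TB")
```

Under these assumptions a single night of data takes several hours to transfer, and the nightly stream alone accounts for only part of the 1 PB per year that the abstract attributes to raw data, reduced data, and data products combined.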
Submitted 24 June, 2024;
originally announced June 2024.
-
Introduction to the 7-Dimensional Telescope: Commissioning Procedures and Data Characteristics
Authors:
Ji Hoon Kim,
Myungshin Im,
Hyung Mok Lee,
Seo-Won Chang,
Hyeonho Choi,
Gregory S. H. Paek
Abstract:
The 7-Dimensional Telescope (7DT) is a multi-telescope system designed to identify electromagnetic (EM) counterparts of gravitational-wave (GW) sources. Consisting of 20 50-cm telescopes along with 40 medium-band filters of 25 nm width, 7DT can obtain spectral mapping images for a large field of view (~1.25 square degrees). Along with flexible operation, real-time data reduction, and analysis, the spectral mapping capability enables 7DT to follow up GW events quickly and discover EM counterparts. Of the 20 planned telescopes, 12 units are deployed at the El Sauce Observatory in the Rio Hurtado Valley in Chile. Since obtaining first light with 7DT in October 2023, we have started its commissioning procedures, including examination of bias levels, master flat production, and spectrophotometric standardization. In this talk, we present the 7DT instruments and their set-up, commissioning procedures, and data characteristics of 7DT along with our three-layered surveys, which are expected to begin in early 2024.
Submitted 24 June, 2024;
originally announced June 2024.
-
Unidirectional Chiral Emission via Twisted Bi-layer Metasurfaces
Authors:
Dmitrii Gromyko,
Shu An,
Sergey Gorelik,
Jiahui Xu,
Li Jun Lim,
Henry Yit Loong Lee,
Febiana Tjiptoharsono,
Zhi-Kuang Tan,
Cheng-Wei Qiu,
Zhaogang Dong,
Lin Wu
Abstract:
Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain obscure. In this paper, we present experimental observations of unidirectional chiral emission from a twisted bi-layer metasurface via multi-dimensional control, including twist angle, interlayer distance, and lateral displacement between the top and bottom layers, as enabled by doublet alignment lithography (DAL). First, maintaining alignment, the metasurface demonstrates a resonant intrinsic optical chirality with near-unity circular dichroism of 0.94 and reflectance difference of 74%, where a high circular dichroism greater than 0.9 persists across a wide range of angles from -11 to 11 degrees. Second, engineered lateral displacement induces a unidirectional chiral resonance, resulting in unidirectional chiral emission from the quantum dots deposited onto the metasurface. Our bi-layer metasurfaces offer a universal compact platform for efficient radiation manipulation over a wide angular range, promising potential applications in miniaturized lasers, grating couplers, and chiral nanoantennas.
Submitted 22 June, 2024;
originally announced June 2024.
-
DataFreeShield: Defending Adversarial Attacks without Training Data
Authors:
Hyeyoon Lee,
Kanghyun Choi,
Dain Kwon,
Sunjong Park,
Mayoore Selvarasa Jaiswal,
Noseong Park,
Jonghyun Choi,
Jinho Lee
Abstract:
Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy reasons, while only the pretrained weights are available to the public. In such scenarios, existing methods that assume accessibility to the original data become inapplicable. Thus we investigate the pivotal problem of data-free adversarial robustness, where we try to achieve adversarial robustness without accessing any real data. Through a preliminary study, we highlight the severity of the problem by showing that robustness without the original dataset is difficult to achieve, even with similar domain datasets. To address this issue, we propose DataFreeShield, which tackles the problem from two perspectives: surrogate dataset generation and adversarial training using the generated data. Through extensive validation, we show that DataFreeShield outperforms baselines, demonstrating that the proposed method provides the first entirely data-free solution for the adversarial robustness problem.
Submitted 21 June, 2024;
originally announced June 2024.
-
CSRT: Evaluation and Analysis of LLMs using Code-Switching Red-Teaming Dataset
Authors:
Haneul Yoo,
Yongjin Yang,
Hwaran Lee
Abstract:
Recent studies in large language models (LLMs) shed light on their multilingual ability and safety, beyond conventional tasks in language modeling. Still, current benchmarks cannot evaluate them comprehensively and depend excessively on manual annotation. In this paper, we introduce code-switching red-teaming (CSRT), a simple yet effective red-teaming technique that simultaneously tests multilingual understanding and safety of LLMs. We release the CSRT dataset, which comprises 315 code-switching queries combining up to 10 languages and eliciting a wide range of undesirable behaviors. Through extensive experiments with ten state-of-the-art LLMs, we demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more attacks than existing methods in English. We analyze the harmful responses toward the CSRT dataset concerning various aspects under ablation studies with 16K samples, including but not limited to scaling laws, unsafe behavior categories, and input conditions for optimal data generation. Additionally, we validate the extensibility of CSRT by generating code-switching attack prompts with monolingual data.
Submitted 17 June, 2024;
originally announced June 2024.
-
Minimal grid diagrams of the prime alternating knots with 13 crossings
Authors:
Hwa Jeong Lee,
Alexander Stoimenow,
Gyo Taek Jin
Abstract:
A knot is a closed loop in space without self-intersection. Two knots are equivalent if there is a self homeomorphism of space bringing one onto the other. An arc presentation is an embedding of a knot in the union of finitely many half planes with a common boundary line such that each half plane contains a simple arc of the knot. The minimal number of such half planes among all arc presentations of a given knot is called the arc index of the knot. A knot is usually presented as a planar diagram with finitely many crossings of two strands where one of the strands goes over the other. A grid diagram is a planar diagram which is a non-simple rectilinear polygon such that vertical edges always cross over horizontal edges at all crossings. It is easily seen that an arc presentation gives rise to a grid diagram and vice versa. It is known that the arc index of an alternating knot is two plus its minimal crossing number. There are 4878 prime alternating knots with minimal crossing number 13. We obtained minimal arc presentations of them in the form of grid diagrams having 15 vertical segments. This is a continuation of the works on prime alternating knots of 11 crossings and 12 crossings.
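The relation between crossing number and grid size stated above can be written out directly. The sketch below encodes a grid diagram as two permutations giving the row of the X and O marker in each column; this is a common encoding but an assumption here, not something taken from the paper. The validity check covers only necessary conditions, so a pair that passes may still describe a link rather than a knot.

```python
from typing import Sequence


def arc_index_alternating(min_crossings: int) -> int:
    """Arc index of an alternating knot, using the relation quoted in the
    abstract: arc index = minimal crossing number + 2."""
    return min_crossings + 2


def is_valid_grid(xs: Sequence[int], os_: Sequence[int]) -> bool:
    """Check the basic grid-diagram conditions for an n-by-n grid.

    xs[i] and os_[i] are the rows of the two markers in column i
    (hypothetical encoding). Each row and each column must hold exactly
    one marker of each kind, and the two markers in a column must differ
    so that the column carries a genuine vertical segment.
    """
    n = len(xs)
    rows = list(range(n))
    return (
        len(os_) == n
        and sorted(xs) == rows
        and sorted(os_) == rows
        and all(x != o for x, o in zip(xs, os_))
    )


# 13-crossing prime alternating knots need 13 + 2 = 15 vertical segments.
print(arc_index_alternating(13))

# A hypothetical 5-by-5 grid used only to exercise the validity check.
print(is_valid_grid([0, 1, 2, 3, 4], [2, 3, 4, 0, 1]))
```

Each column of a valid grid contributes one vertical segment, so the grid size equals the number of vertical segments mentioned in the abstract.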
Submitted 31 March, 2024;
originally announced June 2024.
-
Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks
Authors:
Hokyung Lee,
Sumanyu Sharma,
Bing Hu
Abstract:
Recent research in Needle-in-a-Haystack (NIAH) benchmarks has explored the capabilities of Large Language Models (LLMs) in retrieving contextual information from large text documents. However, as LLMs become increasingly integrated into software development processes, it is crucial to evaluate their performance in code-based environments. As LLMs are further developed for program synthesis, we need to ensure that LLMs can understand syntax and write syntactically correct code. As a step in ensuring LLMs understand syntax, LLMs can be evaluated in their ability to find and detect syntax bugs. Our benchmark, Bug In The Code Stack (BICS), is designed to assess the ability of LLMs to identify simple syntax bugs within large source code. Our findings reveal three key insights: (1) code-based environments pose significantly more challenge compared to text-based environments for retrieval tasks, (2) there is a substantial performance disparity among different models, and (3) there is a notable correlation between longer context lengths and performance degradation, though the extent of this degradation varies between models.
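The task setup described above, a single syntax bug hidden in a long stack of code, can be illustrated with a small sketch. This is not the BICS benchmark code: the generated functions, the missing-colon bug, and all helper names are assumptions made for illustration, and the "detector" is simply Python's own parser, which provides the ground-truth line that a model's answer could be compared against.

```python
import ast
from typing import Optional


def make_function(i: int) -> str:
    """Return the source of a trivial, syntactically valid function."""
    return f"def f{i}(x):\n    return x + {i}\n"


def build_haystack(n_funcs: int, bug_index: Optional[int] = None) -> str:
    """Concatenate n_funcs functions; optionally break one of them by
    removing the colon from its `def` line (a simple syntax bug)."""
    parts = [make_function(i) for i in range(n_funcs)]
    if bug_index is not None:
        parts[bug_index] = parts[bug_index].replace("(x):", "(x)", 1)
    return "\n".join(parts)


def find_syntax_bug_line(source: str) -> Optional[int]:
    """Return the 1-based line of the first syntax error, or None."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return err.lineno
    return None


haystack = build_haystack(n_funcs=50, bug_index=7)
print(find_syntax_bug_line(haystack))
```

Varying `n_funcs` and `bug_index` mimics the two axes the abstract discusses: context length and the position of the bug within it.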
Submitted 21 June, 2024;
originally announced June 2024.
-
Impact of Internal Dust Correction on the Stellar Populations of Galaxies Estimated Using the Full Spectrum Fitting
Authors:
Joon Hyeop Lee,
Hyunjin Jeong,
Jiwon Chung,
Mina Pak,
Sree Oh
Abstract:
Full spectrum fitting is a powerful tool for estimating the stellar populations of galaxies, but the fitting results are often significantly influenced by internal dust attenuation. For understanding how the choice of the internal dust correction method affects the detailed stellar populations estimated from the full spectrum fitting, we analyze the Sydney-Australian Astronomical Observatory Multi-object Integral field spectrograph (SAMI) galaxy survey data using the Penalized PiXel-Fitting (PPXF) package. Three choices are compared: (Choice-1) using the PPXF reddening option, (Choice-2) using the multiplicative Legendre polynomial, and (Choice-3) using none of them (no dust correction). In any case, the total mean stellar populations show reasonable mass-age and mass-metallicity relations (MTR and MZR), although the correlations appear to be strongest for Choice-1 (MTR) and Choice-2 (MZR). When we compare the age-divided mean stellar populations, the MZR of young (< 10^9.5 yr ~ 3.2 Gyr) stellar components in Choice-2 is consistent with the gas-phase MZR, whereas those in the other two choices hardly are. On the other hand, the MTR of old (>= 10^9.5 yr) stellar components in Choice-1 seems to be more reasonable than that in Choice-2, because the old stellar components in low-mass galaxies tend to be relatively younger than those in massive galaxies. Based on the results, we provide empirical guidelines for choosing the optimal options for dust correction.
Submitted 19 June, 2024;
originally announced June 2024.
-
ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models
Authors:
Hwiyeol Jo,
Hyunwoo Lee,
Taiwoo Park
Abstract:
The recent advancements in large language models (LLMs) have brought significant progress in solving NLP tasks. Notably, in-context learning (ICL) is the key enabling mechanism for LLMs to understand specific tasks and grasp their nuances. In this paper, we propose a simple yet effective method to contextualize a task toward a specific LLM by (1) observing how a given LLM describes (all or a part of) target datasets, i.e., open-ended zero-shot inference, (2) aggregating the open-ended inference results by the LLM, and (3) finally incorporating the aggregated meta-information for the actual task. We show the effectiveness of this approach in text clustering tasks, and also highlight the importance of the contextualization through examples of the above procedure.
Submitted 19 June, 2024;
originally announced June 2024.
-
The Powell Conjecture in genus four
Authors:
Sangbum Cho,
Yuya Koda,
Jung Hoon Lee
Abstract:
The Powell Conjecture states that four specific elements suffice to generate the Goeritz group of the Heegaard splitting of the $3$-sphere. We show that this conjecture is true when the genus of the splitting is four.
Submitted 19 June, 2024;
originally announced June 2024.
-
Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation
Authors:
Xin Yu,
Qi Yang,
Han Liu,
Ho Hin Lee,
Yucheng Tang,
Lucas W. Remedios,
Michael Kim,
Shunxing Bao,
Ann Xenobia Moore,
Luigi Ferrucci,
Bennett A. Landman
Abstract:
2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations, to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.
Submitted 18 June, 2024;
originally announced June 2024.
-
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Authors:
Han-Hung Lee,
Yiming Zhang,
Angel X. Chang
Abstract:
We introduce Duoduo CLIP, a model for 3D representation learning that learns shape encodings from multi-view images instead of point clouds. The choice of multi-view images allows us to leverage 2D priors from off-the-shelf CLIP models to facilitate fine-tuning with 3D data. Our approach not only shows better generalization compared to existing point cloud methods, but also reduces GPU requirements and training time. In addition, we modify the model with cross-view attention to leverage information across multiple frames of the object, which further boosts performance. Compared to the current SOTA point cloud method, which requires 480 A100 hours to train 1 billion model parameters, we only require 57 A5000 hours and 87 million parameters. Multi-view images also provide more flexibility in use cases compared to point clouds. This includes being able to encode objects with a variable number of images, with better performance when more views are used. This is in contrast to point cloud based methods, where an entire scan or model of an object is required. We showcase this flexibility with object retrieval from images of real-world objects. Our model also achieves better performance in more fine-grained text to shape retrieval, demonstrating better text-and-shape alignment than point cloud based models.
Submitted 17 June, 2024;
originally announced June 2024.
-
Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection
Authors:
Yunsong Wang,
Na Zhao,
Gim Hee Lee
Abstract:
The use of synthetic data in indoor 3D object detection offers the potential of greatly reducing the manual labor involved in 3D annotations and training effective zero-shot detectors. However, the complicated domain shifts across syn-to-real indoor datasets remain underexplored. In this paper, we propose a novel Object-wise Hierarchical Domain Alignment (OHDA) framework for syn-to-real unsupervised domain adaptation in indoor 3D object detection. Our approach includes an object-aware augmentation strategy to effectively diversify the source domain data, and we introduce a two-branch adaptation framework consisting of an adversarial training branch and a pseudo labeling branch, in order to simultaneously reach holistic-level and class-level domain alignment. The pseudo labeling is further refined through two proposed schemes specifically designed for indoor UDA. Our adaptation results from the synthetic dataset 3D-FRONT to the real-world datasets ScanNetV2 and SUN RGB-D demonstrate remarkable mAP25 improvements of 9.7% and 9.1% over Source-Only baselines, respectively, and consistently outperform the methods adapted from 2D and 3D outdoor scenarios. The code will be publicly available upon paper acceptance.
Submitted 17 June, 2024;
originally announced June 2024.
-
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding
Authors:
Yunsong Wang,
Na Zhao,
Gim Hee Lee
Abstract:
The field of self-supervised 3D representation learning has emerged as a promising solution to alleviate the challenge presented by the scarcity of extensive, well-annotated datasets. However, it continues to be hindered by the lack of diverse, large-scale, real-world 3D scene datasets for source data. To address this shortfall, we propose Generalizable Representation Learning (GRL), where we devise a generative Bayesian network to produce diverse synthetic scenes with real-world patterns, and conduct pre-training with a joint objective. By jointly learning a coarse-to-fine contrastive learning task and an occlusion-aware reconstruction task, the model is primed with transferable, geometry-informed representations. After pre-training on synthetic data, the acquired knowledge of the model can be seamlessly transferred to two principal downstream tasks associated with 3D scene understanding, namely 3D object detection and 3D semantic segmentation, using real-world benchmark datasets. A thorough series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
Submitted 17 June, 2024;
originally announced June 2024.
-
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
Authors:
Anas Awadalla,
Le Xue,
Oscar Lo,
Manli Shu,
Hannah Lee,
Etash Kumar Guha,
Matt Jordan,
Sheng Shen,
Mohamed Awadalla,
Silvio Savarese,
Caiming Xiong,
Ran Xu,
Yejin Choi,
Ludwig Schmidt
Abstract:
Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. MINT-1T comprises one trillion text tokens and three billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. As scaling multimodal interleaved datasets requires substantial engineering effort, sharing the data curation process and releasing the dataset greatly benefits the community. Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS. Our data and code will be released at https://github.com/mlfoundations/MINT-1T.
Submitted 17 June, 2024;
originally announced June 2024.
-
Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9
Authors:
Do Hyun Lee,
Yoonah Song,
Hong Kook Kim
Abstract:
We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for caption augmentation with a smaller number of captions. A LASS model trained with these augmented captions demonstrates improved performance on the DCASE 2024 Task 9 validation set compared to that trained without augmentation. This study highlights the effectiveness of LLM-based caption augmentation in advancing language-queried audio source separation.
Submitted 17 June, 2024;
originally announced June 2024.
-
Long-time behavior toward composite wave of shocks for 3D barotropic Navier-Stokes system
Authors:
Moon-Jin Kang,
Hobin Lee
Abstract:
We consider the barotropic Navier-Stokes system in three space dimensions with periodic boundary conditions in the transversal direction. We show the long-time behavior of the 3D barotropic Navier-Stokes flow perturbed from a composition of two shock waves with suitably small amplitudes. We prove that the perturbed Navier-Stokes flow converges, uniformly in space, towards a composition of two planar viscous shock waves as time goes to infinity, up to dynamical shifts. This is the first result on time-asymptotic stability of a composite wave of two shocks for the multi-D Navier-Stokes system. The main part of the proof is based on the method of a-contraction with shifts.
Submitted 17 June, 2024;
originally announced June 2024.
-
Dynamic Order Template Prediction for Generative Aspect-Based Sentiment Analysis
Authors:
Yonghyun Jun,
Hwanhee Lee
Abstract:
Aspect-based sentiment analysis (ABSA) assesses sentiments towards specific aspects within texts, resulting in detailed sentiment tuples. Previous ABSA models often use static templates to predict all of the elements in the tuples, and these models often fail to accurately capture dependencies between elements. The multi-view prompting method improves the performance of ABSA by predicting tuples with various templates and then ensembling the results. However, this method suffers from inefficiencies and out-of-distribution errors. In this paper, we propose a Dynamic Order Template (DOT) method for ABSA, which dynamically generates the necessary views for each instance based on instance-level entropy. By ensuring diverse and relevant view generation, our proposed method improves F1-scores on ASQP and ACOS datasets while significantly reducing inference time.
Submitted 16 June, 2024;
originally announced June 2024.
-
Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?
Authors:
Guan-Ting Lin,
Hung-yi Lee
Abstract:
Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue. While Large Language Models (LLMs) have revolutionized natural language processing, their ability to understand emphasis in dialogue remains unclear. This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis. We evaluate various LLMs, both open-source and commercial, to measure their performance in understanding emphasis. Additionally, we propose an automatic evaluation pipeline using GPT-4, which achieves a high correlation with human rating. Our findings reveal that although commercial LLMs generally perform better, there is still significant room for improvement in comprehending emphasized sentences.
Submitted 16 June, 2024;
originally announced June 2024.
-
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
Authors:
Guan-Ting Lin,
Wei-** Huang,
Hung-yi Lee
Abstract:
Deep learning-based end-to-end automatic speech recognition (ASR) has made significant strides but still struggles with performance on out-of-domain (OOD) samples due to domain shifts in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models using test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, which limits cross-sample knowledge learning compared to continual TTA. In this work, we propose a Fast-slow TTA framework for ASR, which leverages the advantage of continual and non-continual TTA. Within this framework, we introduce Dynamic SUTA (DSUTA), an entropy-minimization-based continual TTA method for ASR. To enhance DSUTA's robustness on time-varying data, we propose a dynamic reset strategy that automatically detects domain shifts and resets the model, making it more effective at handling multi-domain data. Our method demonstrates superior performance on various noisy ASR datasets, outperforming both non-continual and continual TTA baselines while maintaining robustness to domain changes without requiring domain boundary information.
Submitted 16 June, 2024;
originally announced June 2024.
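Illustrative note on the entry above: the abstract describes DSUTA as an entropy-minimization-based method but does not give the loss. A minimal sketch, assuming the usual formulation in which the adaptation objective is the mean Shannon entropy of the model's per-frame output distributions, might look like the following. The function names and the toy logits are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mean_frame_entropy(frame_logits):
    """Average Shannon entropy (in nats) of per-frame softmax distributions.

    Assumption: this is the generic entropy objective used in test-time
    adaptation; the actual DSUTA loss may include further terms.
    """
    total = 0.0
    for logits in frame_logits:
        probs = softmax(logits)
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(frame_logits)

# Toy example: a confident frame has lower entropy than an uncertain one.
confident = [[10.0, 0.0, 0.0, 0.0]]
uncertain = [[0.0, 0.0, 0.0, 0.0]]
print(mean_frame_entropy(confident) < mean_frame_entropy(uncertain))
```

Minimizing this quantity on unlabeled test audio pushes the model towards more confident predictions, which is why a reset strategy such as the one the abstract mentions can matter when the domain shifts.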
-
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Authors:
Hung-Ting Su,
Chun-Tong Chao,
Ya-Ching Hsu,
Xudong Lin,
Yulei Niu,
Hung-Yi Lee,
Winston H. Hsu
Abstract:
Large Language Models (LLMs) have demonstrated effectiveness not only in language tasks but also in video reasoning. This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills: (1) Abstract Perception: understanding and tokenizing abstract concepts in videos, and (2) Long-range Compositional Reasoning: planning and integrating intermediate reasoning steps for understanding long-range videos with numerous frames. Utilizing tropes from movie storytelling, TiM evaluates the reasoning capabilities of state-of-the-art LLM-based approaches. Our experiments show that current methods, including Captioner-Reasoner, Large Multimodal Model Instruction Fine-tuning, and Visual Programming, only marginally outperform a random baseline when tackling the challenges of Abstract Perception and Long-range Compositional Reasoning. To address these deficiencies, we propose Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR), which enhance Visual Programming by fostering role interaction awareness and progressively refining movie contexts and trope queries during reasoning processes, significantly improving performance by 15 F1 points. However, this performance still lags behind human levels (40 vs. 65 F1). Additionally, we introduce a new protocol to evaluate the necessity of Abstract Perception and Long-range Compositional Reasoning for task resolution. This is done by analyzing the code generated through Visual Programming using an Abstract Syntax Tree (AST), thereby confirming the increased complexity of TiM. The dataset and code are available at: https://ander1119.github.io/TiM
Submitted 16 June, 2024;
originally announced June 2024.
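Illustrative note on the entry above: the abstract says generated Visual Programming code is analyzed through its Abstract Syntax Tree, but it does not state which statistics are used. A minimal sketch, assuming simple structural measures (node count, tree depth, number of calls) as hypothetical stand-ins, could use Python's standard `ast` module:

```python
import ast

def ast_stats(source):
    """Return simple structural statistics for a snippet of Python code.

    Assumption: node count, depth, and call count are illustrative
    complexity proxies, not the protocol defined in the paper.
    """
    tree = ast.parse(source)

    def depth(node):
        # Depth of the subtree rooted at `node`, counting the node itself.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    nodes = list(ast.walk(tree))
    return {
        "nodes": len(nodes),
        "depth": depth(tree),
        "calls": sum(isinstance(n, ast.Call) for n in nodes),
    }

# Hypothetical generated programs: a flat one and a nested one.
flat = "answer = 1"
nested = "answer = verify(find(locate(frames)))"
print(ast_stats(flat))
print(ast_stats(nested))
```

Comparing such statistics across datasets is one way a program-based analysis could indicate that a task requires longer or more deeply composed reasoning.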
-
Latitudinal Asymmetry in the Dayside Atmosphere of WASP-43b
Authors:
Ryan C. Challener,
Zafar Rustamkulov,
Elspeth K. H. Lee,
Nikole Lewis,
David K. Sing,
Stephan M. Birkmann,
Nicolas Crouzet,
Néstor Espinoza,
Elena Manjavacas,
Natalia Oliveros-Gomez,
Jeff A. Valenti,
**gxuan Yang
Abstract:
We present two-dimensional near-infrared temperature maps of the canonical hot Jupiter WASP-43b using a phase-curve observation with JWST NIRSpec/G395H. From the white-light planetary transit, we improve constraints on the planet's orbital parameters and measure a planet-to-star radius ratio of $0.15883^{+0.00056}_{-0.00053}$. Using the white-light phase curve, we measure a longitude of maximum brightness of $6.9^{+0^\circ.5}_{-0^\circ.5}$ east of the substellar point and a phase-curve offset of $10.0^{+0^\circ.8}_{-0^\circ.8}$. We also find an $\approx4σ$ detection of a latitudinal hotspot offset of $-13.4^{+3^\circ.2}_{-1^\circ.7}$, the first significant detection of a non-equatorial hotspot in an exoplanet atmosphere. We show that this detection is robust to variations within planetary parameter uncertainties, but only if the transit is used to improve constraints, showing the importance of transit observations to eclipse mapping. Maps retrieved from the NRS1 and NRS2 detectors are similar, with hotspot locations consistent between the two detectors at the $1σ$ level. Our JWST data show brighter (hotter) nightsides and a dimmer (colder) dayside at the shorter wavelengths relative to fits to \textit{Spitzer} 3.6 and 4.5 $μ$m phase curves. Through comparison between our phase curves and a set of general circulation models, we find evidence for clouds on the nightside and atmospheric drag or high metallicity reducing the eastward hotspot offset.
Submitted 14 June, 2024;
originally announced June 2024.
-
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
Authors:
Siddhant Arora,
Ankita Pasad,
Chung-Ming Chien,
Jionghao Han,
Roshan Sharma,
Jee-weon Jung,
Hira Dhamyal,
William Chen,
Suwon Shon,
Hung-yi Lee,
Karen Livescu,
Shinji Watanabe
Abstract:
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for these SLU tasks. However, the community still lacks a fine-grained understanding of the comparative utility of different SFMs. Inspired by this, we ask: which SFMs offer the most benefits for these complex SLU tasks, and what is the most effective approach for incorporating these SFMs? To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs using several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head. Although the supervised SFMs are pre-trained on much more speech recognition data (with labels), they do not always outperform self-supervised SFMs; the latter tend to perform at least as well as, and sometimes better than, supervised SFMs, especially on the sequence generation tasks in SLUE. While there is no universally optimal way of incorporating SFMs, the complex prediction head gives the best performance for most tasks, although it increases the inference time. We also introduce an open-source toolkit and performance leaderboard, SLUE-PERB, for these tasks and modeling strategies.
Submitted 14 June, 2024;
originally announced June 2024.
-
Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning
Authors:
Xiaowen Sun,
Xufeng Zhao,
Jae Hee Lee,
Wenhao Lu,
Matthias Kerzel,
Stefan Wermter
Abstract:
The state of an object reflects its current status or condition and is important for a robot's task planning and manipulation. However, detecting an object's state and generating a state-sensitive plan for robots is challenging. Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans. However, to the best of our knowledge, there is hardly any investigation on whether LLMs or VLMs can also generate object state-sensitive plans. To study this, we introduce an Object State-Sensitive Agent (OSSA), a task-planning agent empowered by pre-trained neural networks. We propose two methods for OSSA: (i) a modular model consisting of a pre-trained vision processing module (dense captioning model, DCM) and a natural language processing model (LLM), and (ii) a monolithic model consisting only of a VLM. To quantitatively evaluate the performance of the two methods, we use tabletop scenarios where the task is to clear the table. We contribute a multimodal benchmark dataset that takes object states into consideration. Our results show that both methods can be used for object state-sensitive tasks, but the monolithic approach outperforms the modular approach. The code for OSSA is available at https://github.com/Xiao-wen-Sun/OSSA
Submitted 14 June, 2024;
originally announced June 2024.
-
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
Authors:
Junho Myung,
Nayeon Lee,
Yi Zhou,
Jiho **,
Rifki Afina Putri,
Dimosthenis Antypas,
Hsuvas Borkakoty,
Eunsu Kim,
Carla Perez-Almendros,
Abinew Ali Ayele,
Víctor Gutiérrez-Basulto,
Yazmín Ibáñez-García,
Hwaran Lee,
Shamsuddeen Hassan Muhammad,
Kiwoong Park,
Anar Sabuhi Rzayev,
Nina White,
Seid Muhie Yimam,
Mohammad Taher Pilehvar,
Nedjma Ousidhoum,
Jose Camacho-Collados,
Alice Oh
Abstract:
Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food people eat for their birthday celebrations, spices they typically use, musical instruments youngsters play, or the sports they practice in school is common cultural knowledge but uncommon in easily collected online sources, especially for underrepresented cultures. To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages. BLEnD comprises 52.6k question-answer pairs from 16 countries/regions, in 13 different languages, including low-resource ones such as Amharic, Assamese, Azerbaijani, Hausa, and Sundanese. We construct the benchmark to include two formats of questions: short-answer and multiple-choice. We show that LLMs perform better for cultures that are highly represented online, with a maximum 57.34% difference in GPT-4, the best-performing model, in the short-answer format. For cultures represented by mid-to-high-resource languages, LLMs perform better in their local languages, but for cultures represented by low-resource languages, LLMs perform better in English than the local languages. We make our dataset publicly available at: https://github.com/nlee0212/BLEnD.
Submitted 14 June, 2024;
originally announced June 2024.
-
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
Authors:
Taewoo Kim,
Choongsang Cho,
Young Han Lee
Abstract:
In this paper, we present Period Singer, a novel end-to-end singing voice synthesis (SVS) model that utilizes variational inference for periodic and aperiodic components, aimed at producing natural-sounding waveforms. Recent end-to-end SVS models have demonstrated the capability of synthesizing high-fidelity singing voices. However, owing to deterministic pitch conditioning, they do not fully address the one-to-many problem. To address this problem, we present the Period Singer architecture, which integrates variational autoencoders for the periodic and aperiodic components. Additionally, our methodology eliminates the dependency on an external aligner by estimating the phoneme alignment through a monotonic alignment search within note boundaries. Our empirical evaluations show that Period Singer outperforms existing end-to-end SVS models on Mandarin and Korean datasets. The efficacy of the proposed method was further corroborated by ablation studies.
Submitted 14 June, 2024;
originally announced June 2024.
-
HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning
Authors:
Heejun Lee,
Geon Park,
Youngwan Lee,
**a Kim,
Wonyoung Jeong,
Myeongjae Jeon,
Sung Ju Hwang
Abstract:
In modern large language models (LLMs), increasing sequence lengths is a crucial challenge for enhancing their comprehension and coherence in handling complex tasks such as multi-modal question answering. However, handling long context sequences with LLMs is prohibitively costly due to the conventional attention mechanism's quadratic time and space complexity, and the context window size is limited by the GPU memory. Although recent works have proposed linear and sparse attention mechanisms to address this issue, their real-world applicability is often limited by the need to re-train pre-trained models. In response, we propose a novel approach, Hierarchically Pruned Attention (HiP), which simultaneously reduces the training and inference time complexity from $O(T^2)$ to $O(T \log T)$ and the space complexity from $O(T^2)$ to $O(T)$. To this end, we devise a dynamic sparse attention mechanism that generates an attention mask through a novel tree-search-like algorithm for a given query on the fly. HiP is training-free as it only utilizes the pre-trained attention scores to spot the positions of the top-$k$ most significant elements for each query. Moreover, it ensures that no token is overlooked, unlike the sliding window-based sub-quadratic attention methods, such as StreamingLLM. Extensive experiments on diverse real-world benchmarks demonstrate that HiP significantly reduces prompt (i.e., prefill) and decoding latency and memory usage while maintaining high generation performance with little or no degradation. As HiP allows pretrained LLMs to scale to millions of tokens on commodity GPUs with no additional engineering due to its easy plug-and-play deployment, we believe that our work will have a large practical impact, opening up the possibility to many long-context LLM applications previously infeasible.
Submitted 14 June, 2024;
originally announced June 2024.
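Illustrative note on the entry above: the abstract states that HiP locates the top-$k$ most significant attention elements for each query. The sketch below shows only what attending to a top-$k$ subset means for a single query, using a brute-force selection over all scores. It is not the paper's tree-search-like algorithm, and it does not achieve the $O(T \log T)$ total cost the abstract reports; the function name and toy data are hypothetical.

```python
import math

def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k keys with the highest scaled dot-product scores.

    Assumption: brute-force reference for illustration; a hierarchical
    method would find these positions without scoring every key.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k largest scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected positions.
    m = max(scores[i] for i in top)
    weights = {i: math.exp(scores[i] - m) for i in top}
    total = sum(weights.values())
    dim = len(values[0])
    out = [sum(weights[i] / total * values[i][d] for i in top) for d in range(dim)]
    return out, sorted(top)

# Toy example with four keys; only the two best-matching keys are used.
query = [1.0, 0.0]
keys = [[10.0, 0.0], [0.0, 10.0], [9.0, 0.0], [-5.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]
output, selected = topk_sparse_attention(query, keys, values, k=2)
print(selected, output)
```

With `k` equal to the number of keys, the function reduces to ordinary softmax attention for that query, which makes it a convenient reference when checking a faster sparse implementation.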
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
Submitted 13 June, 2024;
originally announced June 2024.
-
Phase-resolving the absorption signatures of water and carbon monoxide in the atmosphere of the ultra-hot Jupiter WASP-121b with GEMINI-S/IGRINS
Authors:
Joost P. Wardenier,
Vivien Parmentier,
Michael R. Line,
Megan Weiner Mansfield,
Xianyu Tan,
Shang-Min Tsai,
Jacob L. Bean,
Jayne L. Birkby,
Matteo Brogi,
Jean-Michel Désert,
Siddharth Gandhi,
Elspeth K. H. Lee,
Colette I. Levens,
Lorenzo Pino,
Peter C. B. Smith
Abstract:
Ultra-hot Jupiters are among the best targets for atmospheric characterization at high spectral resolution. Resolving their transmission spectra as a function of orbital phase offers a unique window into the 3D nature of these objects. In this work, we present three transits of the ultra-hot Jupiter WASP-121b observed with Gemini-S/IGRINS. For the first time, we measure the phase-dependent absorption signals of CO and H$_{\text{2}}$O in the atmosphere of an exoplanet, and we find that they are different. While the blueshift of CO increases during the transit, the absorption lines of H$_{\text{2}}$O become less blueshifted with phase, and even show a redshift in the second half of the transit. These measurements reveal the distinct spatial distributions of both molecules across the atmospheres of ultra-hot Jupiters. Also, we find that the H$_{\text{2}}$O signal is absent in the first quarter of the transit, potentially hinting at cloud formation on the evening terminator of WASP-121b. To further interpret the absorption trails of CO and H$_{\text{2}}$O, as well as the Doppler shifts of Fe previously measured with VLT/ESPRESSO, we compare the data to simulated transits of WASP-121b. To this end, we post-process the outputs of global circulation models with a 3D Monte-Carlo radiative transfer code. Our analysis shows that the atmosphere of WASP-121b is subject to atmospheric drag, as previously suggested by small hotspot offsets inferred from phase-curve observations. Our study highlights the importance of phase-resolved spectroscopy in unravelling the complex atmospheric structure of ultra-hot Jupiters and sets the stage for further investigations into their chemistry and dynamics.
Submitted 19 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
Authors:
Janghoon Han,
Changho Lee,
Joongbo Shin,
Stanley Jungkyu Choi,
Honglak Lee,
Kynghoon Bae
Abstract:
Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in instruction tuning, we perform instruction tuning individually for two distinct language meta-datasets. Subsequently, we assess the performance on unseen tasks in a language different from the one used for training. To facilitate this investigation, we introduce a novel non-English meta-dataset named "KORANI" (Korean Natural Instruction), comprising 51 Korean benchmarks. Moreover, we design cross-lingual templates to mitigate discrepancies in the language and instruction format of the template between training and inference within the cross-lingual setting. Our experiments reveal consistent improvements through cross-lingual generalization in both English and Korean, outperforming the baseline by average scores of 20.7% and 13.6%, respectively. Remarkably, these enhancements are comparable to those achieved by monolingual instruction tuning and even surpass them in some tasks. The result underscores the significance of relevant data acquisition across languages over linguistic congruence with unseen tasks during instruction tuning.
Submitted 13 June, 2024;
originally announced June 2024.
-
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
Authors:
Cheng-Kuang Wu,
Zhi Rui Tam,
Chieh-Yen Lin,
Yun-Nung Chen,
Hung-yi Lee
Abstract:
Recent works have shown that large language model (LLM) agents are able to improve themselves from experience, which is an important ability for continuous enhancement post-deployment. However, existing benchmarks primarily evaluate their innate capabilities and do not assess their ability to improve over time. To address this gap, we introduce StreamBench, a pioneering benchmark designed to evaluate the continuous improvement of LLM agents over an input-feedback sequence. StreamBench simulates an online learning environment where LLMs receive a continuous stream of feedback and iteratively enhance their performance. In addition, we propose several simple yet effective baselines for improving LLMs on StreamBench, and provide a comprehensive analysis to identify critical components that contribute to successful streaming strategies. Our work serves as a stepping stone towards developing effective online learning strategies for LLMs, paving the way for more adaptive AI systems in streaming scenarios.
Submitted 12 June, 2024;
originally announced June 2024.
-
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Authors:
Jiatong Shi,
Shih-Heng Wang,
William Chen,
Martijn Bartelds,
Vanya Bannihatti Kumar,
**chuan Tian,
Xuankai Chang,
Dan Jurafsky,
Karen Livescu,
Hung-yi Lee,
Shinji Watanabe
Abstract:
ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the setup of ML-SUPERB. However, performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.
Submitted 12 June, 2024;
originally announced June 2024.