-
Effective Heterogeneous Federated Learning via Efficient Hypernetwork-based Weight Generation
Authors:
Yujin Shin,
Kichang Lee,
Sungmin Lee,
You Rim Choi,
Hyung-Sin Kim,
JeongGil Ko
Abstract:
While federated learning leverages distributed client resources, it faces challenges due to heterogeneous client capabilities. This necessitates allocating models suited to clients' resources and careful parameter aggregation to accommodate this heterogeneity. We propose HypeMeFed, a novel federated learning framework for supporting client heterogeneity by combining a multi-exit network architecture with hypernetwork-based model weight generation. This approach aligns the feature spaces of heterogeneous model layers and resolves per-layer information disparity during weight aggregation. To practically realize HypeMeFed, we also propose a low-rank factorization approach to minimize computation and memory overhead associated with hypernetworks. Our evaluations on a real-world heterogeneous device testbed indicate that HypeMeFed enhances accuracy by 5.12% over FedAvg, reduces the hypernetwork memory requirements by 98.22%, and accelerates its operations by 1.86 times compared to a naive hypernetwork approach. These results demonstrate HypeMeFed's effectiveness in leveraging and engaging heterogeneous clients for federated learning.
Submitted 3 July, 2024;
originally announced July 2024.
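To give a feel for why low-rank factorization can cut hypernetwork memory so sharply, the following is a minimal sketch that compares the parameter count of a dense weight matrix with that of a rank-r factorization. The layer sizes and the rank are hypothetical values chosen for illustration; the abstract does not describe the factorization scheme HypeMeFed actually uses, so this is not the authors' method.

```python
def dense_params(in_dim: int, out_dim: int) -> int:
    """Number of weights in a full in_dim x out_dim matrix."""
    return in_dim * out_dim


def low_rank_params(in_dim: int, out_dim: int, rank: int) -> int:
    """Number of weights in two factors: (in_dim x rank) and (rank x out_dim)."""
    return rank * (in_dim + out_dim)


def memory_saving(in_dim: int, out_dim: int, rank: int) -> float:
    """Fraction of weights saved by the factorized form (0.0 to 1.0)."""
    return 1.0 - low_rank_params(in_dim, out_dim, rank) / dense_params(in_dim, out_dim)


if __name__ == "__main__":
    # Hypothetical sizes: a 4096 x 4096 generator layer factorized at rank 16.
    print(dense_params(4096, 4096))
    print(low_rank_params(4096, 4096, 16))
    print(f"{memory_saving(4096, 4096, 16):.2%}")
```

The saving grows as the rank shrinks relative to the layer dimensions, which is the general trade-off any low-rank scheme balances against approximation quality.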
-
Hong-Ou-Mandel Interference with a Coexisting Clock using Transceivers for Synchronization over Deployed Fiber
Authors:
Anirudh Ramesh,
Daniel R. Reilly,
Kim Fook Lee,
Paul M. Moraw,
Joaquin Chung,
Md Shariful Islam,
Cristián Peña,
Xu Han,
Rajkumar Kettimuthu,
Prem Kumar,
Gregory Kanter
Abstract:
Interference between independently generated photons is a key step towards distributing entanglement over long distances, but it requires synchronization between the distantly-located photon sources. Synchronizing the clocks of such photon sources using coexisting two-way classical optical communications over the same fiber that transports the quantum photonic signals is a promising approach for achieving photon-photon interference over long distances, enabling entanglement distribution for quantum networking using the deployed fiber infrastructure. Here, we demonstrate photon-photon interference by observing the Hong-Ou-Mandel dip between two distantly-located sources: a weak coherent state source obtained by attenuating the output of a laser and a heralded single-photon source. We achieve a maximum dip visibility of $0.58 \pm 0.04$ when the two sources are connected via $4.3$ km of deployed fiber. Dip visibilities $>0.5$ are nonclassical and a first step towards achieving teleportation over the deployed fiber infrastructure. In our experiment, the classical optical communication is achieved with $-21$ dBm of optical signal launch power, which is used to synchronize the clocks in the two independent, distantly-located photon sources. The impact of spontaneous Raman scattering from the classical optical signals is mitigated by appropriate choice of the quantum and classical channel wavelengths. All equipment used in our experiment (the photon sources and the synchronization setup) is commercially available. Finally, our experiment represents a scalable approach to enabling practical quantum networking with commercial equipment and coexistence with classical communications in optical fiber.
Submitted 1 July, 2024;
originally announced July 2024.
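The visibility threshold quoted in this abstract can be illustrated with a short sketch. It assumes the common definition of Hong-Ou-Mandel dip visibility, V = (C_base - C_min) / C_base, where C_base is the coincidence rate far from the dip and C_min the rate at its bottom. The coincidence counts below are hypothetical and the paper may instead extract visibility from a fit, so treat this as an illustration only.

```python
def hom_visibility(baseline: float, dip_minimum: float) -> float:
    """Dip visibility from coincidence counts away from and at the dip."""
    if baseline <= 0:
        raise ValueError("baseline coincidence count must be positive")
    return (baseline - dip_minimum) / baseline


def is_nonclassical(visibility: float, sigma: float = 0.0, bound: float = 0.5) -> bool:
    """True if the visibility exceeds the classical bound by at least one sigma."""
    return visibility - sigma > bound


if __name__ == "__main__":
    # Hypothetical counts chosen to reproduce a visibility of 0.58.
    v = hom_visibility(baseline=1000, dip_minimum=420)
    print(v, is_nonclassical(v, sigma=0.04))
```

With the reported value of 0.58 and uncertainty of 0.04, the lower edge of the one-sigma interval (0.54) still lies above the 0.5 bound.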
-
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Authors:
Takyoung Kim,
Kyungjae Lee,
Young Rok Jang,
Ji Yong Cho,
Gangwoo Kim,
Minseok Cho,
Moontae Lee
Abstract:
Interactions with billion-scale large language models typically yield long-form responses due to their extensive parametric capacities, along with retrieval-augmented features. While detailed responses provide insightful viewpoints on a specific subject, they frequently contain redundant and less engaging content that does not meet user interests. In this work, we focus on the role of query outlining (i.e., a selected sequence of queries) in scenarios where users request a specific range of information, namely coverage-conditioned ($C^2$) scenarios. For simulating $C^2$ scenarios, we construct QTree, 10K sets of information-seeking queries decomposed with various perspectives on certain topics. By utilizing QTree, we train QPlanner, a 7B language model generating customized query outlines that follow coverage-conditioned queries. We analyze the effectiveness of generated outlines through automatic and human evaluation, targeting retrieval-augmented generation (RAG). Moreover, the experimental results demonstrate that QPlanner with alignment training can further provide outlines satisfying diverse user interests. Our resources are available at https://github.com/youngerous/qtree.
Submitted 1 July, 2024;
originally announced July 2024.
-
Ongoing and fossil large-scale outflows detected in a high-redshift radio galaxy: [C II] observations of TN J0924$-$2201 at $z=5.174$
Authors:
Kianhong Lee,
Masayuki Akiyama,
Kotaro Kohno,
Daisuke Iono,
Masatoshi Imanishi,
Bunyo Hatsukade,
Hideki Umehata,
Tohru Nagao,
Yoshiki Toba,
Xiaoyang Chen,
Fumi Egusa,
Kohei Ichikawa,
Takuma Izumi,
Naoki Matsumoto,
Malte Schramm,
Kenta Matsuoka
Abstract:
We present Atacama Large Millimeter/submillimeter Array observations of the [C II] 158 $μ$m line and the underlying continuum emission of TN J0924$-$2201, which is one of the most distant known radio galaxies at $z>5$. The [C II] line and 1-mm continuum emission are detected at the host galaxy. The systemic redshift derived from the [C II] line is $z_{\rm [C II]}=5.1736\pm0.0002$, indicating that the Ly$α$ line is redshifted by a velocity of $1035\pm10$ km s$^{-1}$, marking the largest velocity offset between the [C II] and Ly$α$ lines recorded at $z>5$ to date. In the central region of the host galaxy, we identified a redshifted substructure of [C II] with a velocity of $702\pm17$ km s$^{-1}$, which is close to the C IV line with a velocity of $500\pm10$ km s$^{-1}$. The position and the velocity offsets align with a model of an outflowing shell structure, consistent with the large velocity offset of Ly$α$. The non-detection of [C II] and dust emission from the three CO(1--0)-detected companions indicates their different nature compared to dwarf galaxies based on the photodissociation region model. Given their large velocity of $\sim1500$ km s$^{-1}$, outflowing molecular clouds induced by the AGN are the most plausible interpretation, and they may exceed the escape velocity of a $10^{13}\,M_{\odot}$ halo. These results suggest that TN J0924$-$2201, with the ongoing and fossil large-scale outflows, is in a distinctive phase of removing molecular gas from a central massive galaxy in an overdense region in the early universe. A dusty HI absorber at the host galaxy is an alternative interpretation.
Submitted 1 July, 2024;
originally announced July 2024.
-
A Robust Power Model Training Framework for Cloud Native Runtime Energy Metric Exporter
Authors:
Sunyanan Choochotkaew,
Chen Wang,
Huamin Chen,
Tatsuhiro Chiba,
Marcelo Amaral,
Eun Kyung Lee,
Tamar Eilam
Abstract:
Estimating power consumption in modern Cloud environments is essential for carbon quantification toward green computing. Specifically, it is important to properly account for the power consumed by each of the running applications, which are packaged as containers. This paper examines multiple challenges associated with this goal. The first challenge is that multiple customers are sharing the same hardware platform (multi-tenancy), where information on the physical servers is mostly obscured. The second challenge is the overhead in power consumption that the Cloud platform control plane induces. This paper addresses these challenges and introduces a novel pipeline framework for power model training. This allows versatile power consumption approximation of individual containers on the basis of available performance counters and other metrics. The proposed model utilizes machine learning techniques to predict the power consumed by the control plane and associated processes, and uses it for isolating the power consumed by the user containers, from the server power consumption. To determine how well the prediction results in an isolation, we introduce a metric termed isolation goodness. Applying the proposed power model does not require online power measurements, nor does it need information on the physical servers, configuration, or information on other tenants sharing the same machine. The results of cross-workload, cross-platform experiments demonstrated the higher accuracy of the proposed model when predicting power consumption of unseen containers on unknown platforms, including on virtual machines.
Submitted 9 April, 2024;
originally announced July 2024.
-
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Authors:
Zheyang Xiong,
Vasilis Papageorgiou,
Kangwook Lee,
Dimitris Papailiopoulos
Abstract:
Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach utilizing a carefully designed synthetic dataset comprising numerical key-value retrieval tasks. Our experiments on models like GPT-3.5 Turbo and Mistral 7B demonstrate that finetuning LLMs on this dataset significantly improves LLMs' information retrieval and reasoning capabilities in longer-context settings. We present an analysis of the finetuned models, illustrating the transfer of skills from synthetic to real task evaluations (e.g., $10.5\%$ improvement on $20$ documents MDQA at position $10$ for GPT-3.5 Turbo). We also find that finetuned LLMs' performance on general benchmarks remains almost constant while LLMs finetuned on other baseline long-context augmentation data can encourage hallucination (e.g., on TriviaQA, Mistral 7B finetuned on our synthetic data causes no performance drop while other baseline data can cause a drop that ranges from $2.33\%$ to $6.19\%$). Our study highlights the potential of finetuning on synthetic data for improving the performance of LLMs on longer-context tasks.
Submitted 27 June, 2024;
originally announced June 2024.
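As a rough illustration of what a numerical key-value retrieval task might look like, the sketch below generates a prompt of random integer key-value pairs and asks for the value of one key. The prompt wording, digit count, and function name `make_kv_task` are assumptions made for this example; the abstract does not specify the format of the authors' dataset.

```python
import random


def make_kv_task(num_pairs: int, seed: int = 0, digits: int = 5) -> tuple[str, str]:
    """Build one synthetic retrieval example: (prompt, expected answer)."""
    rng = random.Random(seed)  # seeded so examples are reproducible
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    keys = rng.sample(range(low, high + 1), num_pairs)  # unique keys
    pairs = {key: rng.randint(low, high) for key in keys}
    target = rng.choice(keys)
    context = "\n".join(f"{key}: {value}" for key, value in pairs.items())
    prompt = (
        f"Key-value pairs:\n{context}\n\n"
        f"What is the value for key {target}?"
    )
    return prompt, str(pairs[target])


if __name__ == "__main__":
    prompt, answer = make_kv_task(num_pairs=5, seed=1)
    print(prompt)
    print("Expected answer:", answer)
```

Increasing `num_pairs` lengthens the context, which is one simple way to vary how far the target pair sits from the question.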
-
Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He
, et al. (118 additional authors not shown)
Abstract:
We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of highest experimentally allowed extra-galactic magnetic fields, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He
, et al. (118 additional authors not shown)
Abstract:
We use a new method to estimate the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on comparison of the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, which is the conservative lower limit for the source number density.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
The global dynamics for the Maxwell-Dirac system
Authors:
Yonggeun Cho,
Kiyeon Lee
Abstract:
In this paper, we study the (1+3) dimensional massive Maxwell-Dirac system in the context of global existence and asymptotic behavior of solutions under the Lorenz gauge condition, as well as the modified and linear scattering phenomena for the Dirac spinor and the electromagnetic potential, respectively. We employ a vector fields energy method combined with a detailed analysis of the space-time resonance argument. This approach allows us to establish decay estimates and energy bounds crucial for proving the main theorems. In particular, we provide the explicit phase correction arising from the strong nonlinear resonances.
Submitted 27 June, 2024;
originally announced June 2024.
-
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
Authors:
USVSN Sai Prashanth,
Alvin Deng,
Kyle O'Brien,
Jyothir S V,
Mohammad Aflah Khan,
Jaydeep Borkar,
Christopher A. Choquette-Choo,
Jacob Ray Fuehne,
Stella Biderman,
Tracy Ke,
Katherine Lee,
Naomi Saphra
Abstract:
Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model and corpus. To build intuition around these factors, we break memorization down into a taxonomy: recitation of highly duplicated sequences, reconstruction of inherently predictable sequences, and recollection of sequences that are neither. We demonstrate the usefulness of our taxonomy by using it to construct a predictive model for memorization. By analyzing dependencies and inspecting the weights of the predictive model, we find that different factors influence the likelihood of memorization differently depending on the taxonomic category.
Submitted 25 June, 2024;
originally announced June 2024.
-
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
Authors:
Duc-Tuan Truong,
Ruijie Tao,
Tuan Nguyen,
Hieu-Thi Luong,
Kong Aik Lee,
Eng Siong Chng
Abstract:
Recent synthetic speech detectors leveraging the Transformer model have superior performance compared to their convolutional neural network counterparts. This improvement could be due to the powerful modeling ability of the multi-head self-attention (MHSA) in the Transformer model, which learns the temporal relationship of each input token. However, artifacts of synthetic speech can be located in specific regions of both frequency channels and temporal segments, while MHSA neglects this temporal-channel dependency of the input sequence. In this work, we propose a Temporal-Channel Modeling (TCM) module to enhance MHSA's capability for capturing temporal-channel dependencies. Experimental results on ASVspoof 2021 show that with only 0.03M additional parameters, the TCM module can outperform the state-of-the-art system by 9.25% in EER. A further ablation study reveals that utilizing both temporal and channel information yields the most improvement for detecting synthetic speech.
Submitted 25 June, 2024;
originally announced June 2024.
-
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Authors:
Wenhao Shi,
Zhiqiang Hu,
Yi Bin,
Junhua Liu,
Yang Yang,
See-Kiong Ng,
Lidong Bing,
Roy Ka-Wei Lee
Abstract:
Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions. We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K. This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V on MathVista's minitest split. Furthermore, Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark. Our research highlights the importance of dataset diversity and synthesis in advancing MLLMs' mathematical reasoning abilities. The code and data are available at: https://github.com/HZQ950419/Math-LLaVA.
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Improving Rehabilitative Assessment with Statistical and Shape Preserving Surrogate Data and Singular Spectrum Analysis
Authors:
T. K. M. Lee,
H. W. Chan,
K. H. Leo,
E. Chew,
Ling Zhao,
S. Sanei
Abstract:
Time series data are collected in temporal order and are widely used to train systems for prediction, modeling, and classification, to name a few. These systems require large amounts of data to improve generalization and prevent over-fitting. However, there is a comparative lack of time series data due to operational constraints. This situation is alleviated by synthesizing data which have a suitable spread of features yet retain the distinctive features of the original data. These would be its basic statistical properties and overall shape, which are important for short time series such as in rehabilitative applications or in quickly changing portions of lengthy data. In our earlier work, synthesized surrogate time series were used to augment rehabilitative data. This gave good results in classification, but the resulting waveforms did not preserve the original signal shape. To remedy this, we use singular spectrum analysis (SSA) to separate a signal into trends and cycles, which describe the shape of the signal, and low level components. In a novel way, we subject the low level component to randomizing processes, then recombine this with the original trend and cycle components to form a synthetic time series. We compare our approach with other methods, using statistical and shape measures, and demonstrate its effectiveness in classification.
Submitted 22 June, 2024;
originally announced June 2024.
-
Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation
Authors:
Katherine M. Collins,
Najoung Kim,
Yonatan Bitton,
Verena Rieser,
Shayegan Omidshafiei,
Yushi Hu,
Sherol Chen,
Senjuti Dutta,
Minsuk Chang,
Kimin Lee,
Youwei Liang,
Georgina Evans,
Sahil Singla,
Gang Li,
Adrian Weller,
Junfeng He,
Deepak Ramachandran,
Krishnamurthy Dj Dvijotham
Abstract:
Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional coarse-grained feedback (for example, thumbs up/down or ranking between a set of options). While fine-grained feedback holds promise, particularly for systems catering to diverse societal preferences, we show that demonstrating its superiority to coarse-grained feedback is not automatic. Through experiments on real and synthetic preference data, we surface the complexities of building effective models due to the interplay of model choice, feedback type, and the alignment between human judgment and computational interpretation. We identify key challenges in eliciting and utilizing fine-grained feedback, prompting a reassessment of its assumed benefits and practicality. Our findings -- e.g., that fine-grained feedback can lead to worse models for a fixed budget, in some settings; however, in controlled settings with known attributes, fine grained rewards can indeed be more helpful -- call for careful consideration of feedback attributes and potentially beckon novel modeling approaches to appropriately unlock the potential value of fine-grained feedback in-the-wild.
Submitted 24 June, 2024;
originally announced June 2024.
-
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
Authors:
Lu Zhang,
Tiancheng Zhao,
Heting Ying,
Yibo Ma,
Kyusong Lee
Abstract:
Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV footage or full-length films presents significant challenges due to the vast data and processing demands. Traditional methods, like extracting key frames or converting frames to text, often result in substantial information loss. To address these shortcomings, we develop OmAgent, which efficiently stores and retrieves relevant video frames for specific queries, preserving the detailed content of videos. Additionally, it features a Divide-and-Conquer Loop capable of autonomous reasoning, dynamically invoking APIs and tools to enhance query processing and accuracy. This approach ensures robust video understanding, significantly reducing information loss. Experimental results affirm OmAgent's efficacy in handling various types of videos and complex tasks. Moreover, we have endowed it with greater autonomy and a robust tool-calling system, enabling it to accomplish even more intricate tasks.
Submitted 24 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection
Authors:
Kyungbok Lee,
You Zhang,
Zhiyao Duan
Abstract:
This paper addresses the challenge of developing a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods. This calls for the generalization ability of the method. Additionally, to ensure the credibility of detection methods, it is beneficial for the model to interpret which cues from the video indicate it is fake. Motivated by these considerations, we propose a multi-stream fusion approach with one-class learning as a representation-level regularization technique. We study the generalization problem of audio-visual deepfake detection by creating a new benchmark by extending and re-splitting the existing FakeAVCeleb dataset. The benchmark contains four categories of fake video (Real Audio-Fake Visual, Fake Audio-Fake Visual, Fake Audio-Real Visual, and unsynchronized video). The experimental results show that our approach improves the model's detection of unseen attacks by an average of 7.31% across four test sets, compared to the baseline model. Additionally, our proposed framework offers interpretability, indicating which modality the model identifies as fake.
Submitted 20 June, 2024;
originally announced June 2024.
-
ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations
Authors:
Yunze Xiao,
Yujia Hu,
Kenny Tsu Wei Choo,
Roy Ka-wei Lee
Abstract:
Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce \textsf{ToxiCloakCN}, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.
Submitted 17 June, 2024;
originally announced June 2024.
-
DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
Authors:
Keon Lee,
Dong Won Kim,
Jaehyeon Kim,
Jaewoong Cho
Abstract:
Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models for TTS. In this work, we present an efficient and scalable Diffusion Transformer (DiT) that utilizes off-the-shelf pre-trained text and speech encoders. Our approach addresses the challenge of text-speech alignment via cross-attention mechanisms with the prediction of the total length of speech representations. To achieve this, we enhance the DiT architecture to suit TTS and improve the alignment by incorporating semantic guidance into the latent space of speech. We scale the training dataset and the model size to 82K hours and 790M parameters, respectively. Our extensive experiments demonstrate that the large-scale diffusion model for TTS without domain-specific modeling not only simplifies the training pipeline but also yields superior or comparable zero-shot performance to state-of-the-art TTS models in terms of naturalness, intelligibility, and speaker similarity. Our speech samples are available at https://ditto-tts.github.io.
Submitted 17 June, 2024;
originally announced June 2024.
-
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
Authors:
Zilun Zhang,
Yutao Sun,
Tiancheng Zhao,
Leigang Sha,
Ruochen Xu,
Kyusong Lee,
Jianwei Yin
Abstract:
Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks was observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.
Submitted 19 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Expanding the Design Space of Computer Vision-based Interactive Systems for Group Dance Practice
Authors:
Soohwan Lee,
Seoyeong Hwang,
Ian Oakley,
Kyungho Lee
Abstract:
Group dance, a sub-genre characterized by intricate motions made by a cohort of performers in tight synchronization, has a longstanding and culturally significant history and, in modern forms such as cheerleading, a broad base of current adherents. However, despite its popularity, learning group dance routines remains challenging. Based on the prior success of interactive systems to support individual dance learning, this paper argues that group dance settings are fertile ground for augmentation by interactive aids. To better understand these design opportunities, this paper presents a sequence of user-centered studies of and with amateur cheerleading troupes, spanning from the formative (interviews, observations) through the generative (an ideation workshop) to concept validation (technology probes and speed dating). The outcomes are a nuanced understanding of the lived practice of group dance learning, a set of interactive concepts to support those practices, and design directions derived from validating the proposed concepts. Through this empirical work, we expand the design space of interactive dance practice systems from the established context of single-user practice (primarily focused on gesture recognition) to a multi-user, group-based scenario focused on feedback and communication.
Submitted 17 June, 2024;
originally announced June 2024.
-
Conversational Agents as Catalysts for Critical Thinking: Challenging Design Fixation in Group Design
Authors:
Soohwan Lee,
Seoyeong Hwang,
Kyungho Lee
Abstract:
This paper investigates the potential of LLM-based conversational agents (CAs) to enhance critical reflection and mitigate design fixation in group design work. By challenging AI-generated recommendations and prevailing group opinions, these agents address issues such as groupthink and promote a more dynamic and inclusive design process. Key design considerations include optimizing intervention timing, ensuring clarity in counterarguments, and balancing critical thinking with designers' satisfaction. CAs can also adapt to various roles, supporting individual and collective reflection. Our work aligns with the "Death of the Design Researcher?" workshop's goals, emphasizing the transformative potential of generative AI in reshaping design practices and promoting ethical considerations. By exploring innovative uses of generative AI in group design contexts, we aim to stimulate discussion and open new pathways for future research and development, ultimately contributing to practical tools and resources for design researchers.
Submitted 16 June, 2024;
originally announced June 2024.
-
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
Authors:
Xin Wang,
Tomi Kinnunen,
Kong Aik Lee,
Paul-Gauthier Noé,
Junichi Yamagishi
Abstract:
Fusing outputs from automatic speaker verification (ASV) and spoofing countermeasure (CM) is expected to make an integrated system robust to zero-effort imposters and synthesized spoofing attacks. Many score-level fusion methods have been proposed, but many remain heuristic. This paper revisits score-level fusion using tools from decision theory and presents three main findings. First, fusion by summing the ASV and CM scores can be interpreted on the basis of compositional data analysis, and score calibration before fusion is essential. Second, the interpretation leads to an improved fusion method that linearly combines the log-likelihood ratios of ASV and CM. However, as the third finding reveals, this linear combination is inferior to a non-linear one in making optimal decisions. The outcomes of these findings, namely, the score calibration before fusion, improved linear fusion, and better non-linear fusion, were found to be effective on the SASV challenge database.
Submitted 16 June, 2024;
originally announced June 2024.
-
On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning
Authors:
Jeongheon Oh,
Kibok Lee
Abstract:
Supervised contrastive representation learning has been shown to be effective in various transfer learning scenarios. However, while asymmetric non-contrastive learning (ANCL) often outperforms its contrastive learning counterpart in self-supervised representation learning, the extension of ANCL to supervised scenarios is less explored. To bridge the gap, we study ANCL for supervised representation learning, coined SupSiam and SupBYOL, leveraging labels in ANCL to achieve better representations. The proposed supervised ANCL framework improves representation learning while avoiding collapse. Our analysis reveals that providing supervision to ANCL reduces intra-class variance, and the contribution of supervision should be adjusted to achieve the best performance. Experiments demonstrate the superiority of supervised ANCL across various datasets and tasks. The code is available at: https://github.com/JH-Oh-23/Sup-ANCL.
Submitted 16 June, 2024;
originally announced June 2024.
-
Latitudinal Asymmetry in the Dayside Atmosphere of WASP-43b
Authors:
Ryan C. Challener,
Zafar Rustamkulov,
Elspeth K. H. Lee,
Nikole Lewis,
David K. Sing,
Stephan M. Birkmann,
Nicolas Crouzet,
Néstor Espinoza,
Elena Manjavacas,
Natalia Oliveros-Gomez,
Jeff A. Valenti,
Jingxuan Yang
Abstract:
We present two-dimensional near-infrared temperature maps of the canonical hot Jupiter WASP-43b using a phase-curve observation with JWST NIRSpec/G395H. From the white-light planetary transit, we improve constraints on the planet's orbital parameters and measure a planet-to-star radius ratio of $0.15883^{+0.00056}_{-0.00053}$. Using the white-light phase curve, we measure a longitude of maximum brightness of $6.9^{+0^\circ.5}_{-0^\circ.5}$ east of the substellar point and a phase-curve offset of $10.0^{+0^\circ.8}_{-0^\circ.8}$. We also find an $\approx4σ$ detection of a latitudinal hotspot offset of $-13.4^{+3^\circ.2}_{-1^\circ.7}$, the first significant detection of a non-equatorial hotspot in an exoplanet atmosphere. We show that this detection is robust to variations within planetary parameter uncertainties, but only if the transit is used to improve constraints, showing the importance of transit observations to eclipse mapping. Maps retrieved from the NRS1 and NRS2 detectors are similar, with hotspot locations consistent between the two detectors at the $1σ$ level. Our JWST data show brighter (hotter) nightsides and a dimmer (colder) dayside at the shorter wavelengths relative to fits to \textit{Spitzer} 3.6 and 4.5 $μ$m phase curves. Through comparison between our phase curves and a set of general circulation models, we find evidence for clouds on the nightside and atmospheric drag or high metallicity reducing the eastward hotspot offset.
Submitted 14 June, 2024;
originally announced June 2024.
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
Submitted 13 June, 2024;
originally announced June 2024.
-
Phase-resolving the absorption signatures of water and carbon monoxide in the atmosphere of the ultra-hot Jupiter WASP-121b with GEMINI-S/IGRINS
Authors:
Joost P. Wardenier,
Vivien Parmentier,
Michael R. Line,
Megan Weiner Mansfield,
Xianyu Tan,
Shang-Min Tsai,
Jacob L. Bean,
Jayne L. Birkby,
Matteo Brogi,
Jean-Michel Désert,
Siddharth Gandhi,
Elspeth K. H. Lee,
Colette I. Levens,
Lorenzo Pino,
Peter C. B. Smith
Abstract:
Ultra-hot Jupiters are among the best targets for atmospheric characterization at high spectral resolution. Resolving their transmission spectra as a function of orbital phase offers a unique window into the 3D nature of these objects. In this work, we present three transits of the ultra-hot Jupiter WASP-121b observed with Gemini-S/IGRINS. For the first time, we measure the phase-dependent absorption signals of CO and H$_{\text{2}}$O in the atmosphere of an exoplanet, and we find that they are different. While the blueshift of CO increases during the transit, the absorption lines of H$_{\text{2}}$O become less blueshifted with phase, and even show a redshift in the second half of the transit. These measurements reveal the distinct spatial distributions of both molecules across the atmospheres of ultra-hot Jupiters. Also, we find that the H$_{\text{2}}$O signal is absent in the first quarter of the transit, potentially hinting at cloud formation on the evening terminator of WASP-121b. To further interpret the absorption trails of CO and H$_{\text{2}}$O, as well as the Doppler shifts of Fe previously measured with VLT/ESPRESSO, we compare the data to simulated transits of WASP-121b. To this end, we post-process the outputs of global circulation models with a 3D Monte-Carlo radiative transfer code. Our analysis shows that the atmosphere of WASP-121b is subject to atmospheric drag, as previously suggested by small hotspot offsets inferred from phase-curve observations. Our study highlights the importance of phase-resolved spectroscopy in unravelling the complex atmospheric structure of ultra-hot Jupiters and sets the stage for further investigations into their chemistry and dynamics.
Submitted 19 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Authors:
Kang-il Lee,
Minbeom Kim,
Minsung Kim,
Dongryeol Lee,
Hyukhun Koh,
Kyomin Jung
Abstract:
Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with images that are out of training distribution. Despite its importance, current methods for accurately measuring language priors in LVLMs are poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially be used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, which is the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors but also involves a series of tests to evaluate more basic capabilities such as commonsense knowledge, visual perception, and commonsense biases. For each instance in our benchmark, we ensure that all these basic tests are passed before evaluating the language priors, thereby minimizing the influence of other factors on the assessment. The evaluation and analysis of recent LVLMs in our benchmark reveal that almost all models exhibit a significant reliance on language priors, presenting a strong challenge in the field.
Submitted 17 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
ODIN: Identifying Protoclusters and Cosmic Filaments Traced by Ly$α$-emitting Galaxies
Authors:
Vandana Ramakrishnan,
Kyoung-Soo Lee,
Maria Celeste Artale,
Eric Gawiser,
Yujin Yang,
Changbom Park,
Robin Ciardullo,
Lucia Guaita,
Sang Hyeok Im,
Seongjae Kim,
Ankit Kumar,
Jaehyun Lee,
Seong-Kook Lee,
Byeongha Moon,
Nelson Padilla,
Alexandra Pope,
Roxana Popescu,
Hyunmi Song,
Paulina Troncoso,
Francisco Valdes,
Ann Zabludoff
Abstract:
To understand the formation and evolution of massive cosmic structures, studying them at high redshift, in the epoch when they formed the majority of their mass, is essential. The One-hundred-deg$^2$ DECam Imaging in Narrowbands (ODIN) survey is undertaking the widest-area narrowband program to date, to use Ly$α$-emitting galaxies (LAEs) to trace the large-scale structure (LSS) of the Universe at three cosmic epochs. In this work, we present results at $z$ = 3.1 based on early ODIN data in the COSMOS field. We identify and characterize protoclusters and cosmic filaments using multiple methods and discuss their strengths and weaknesses. We then compare our observations against the IllustrisTNG suite of cosmological hydrodynamical simulations. The two are in excellent agreement, with a similar number and angular size of structures identified above a specified density threshold. We are able to recover the simulated protoclusters with $\log$(M$_{z=0}$/$M_\odot$) $\gtrsim$ 14.4 in $\sim$ 60\% of the cases. With these objects we show that the descendant masses of the protoclusters in our sample can be estimated purely based on our 2D measurements, finding a median $z$ = 0 mass of $\sim10^{14.5}$M$_\odot$. The lack of information on the radial extent of each protocluster introduces a $\sim$0.4~dex uncertainty in its descendant mass. Finally, we show that the recovery of the cosmic web in the vicinity of protoclusters is both efficient and accurate. The similarity of our observations and the simulations implies that our structure selection is likewise robust and efficient, demonstrating that LAEs are reliable tracers of the LSS.
Submitted 12 June, 2024;
originally announced June 2024.
-
Observation of Declination Dependence in the Cosmic Ray Energy Spectrum
Authors:
The Telescope Array Collaboration,
R. U. Abbasi,
T. Abu-Zayyad,
M. Allen,
J. W. Belz,
D. R. Bergman,
I. Buckland,
W. Campbell,
B. G. Cheon,
K. Endo,
A. Fedynitch,
T. Fujii,
K. Fujisue,
K. Fujita,
M. Fukushima,
G. Furlich,
Z. Gerber,
N. Globus,
W. Hanlon,
N. Hayashida,
H. He,
K. Hibino,
R. Higuchi,
D. Ikeda,
T. Ishii
, et al. (101 additional authors not shown)
Abstract:
We report on an observation of the difference between northern and southern skies of the ultrahigh energy cosmic ray energy spectrum with a significance of ${\sim}8σ$. We use measurements from the two largest experiments: the Telescope Array observing the northern hemisphere and the Pierre Auger Observatory viewing the southern hemisphere. Since the comparison of two measurements from different observatories introduces the issue of possible systematic differences between detectors and analyses, we validate the methodology of the comparison by examining the region of the sky where the apertures of the two observatories overlap. Although the spectra differ in this region, we find that there is only a $1.8σ$ difference between the spectrum measurements when anisotropic regions are removed and a fiducial cut in the aperture is applied.
Submitted 12 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
Submitted 12 June, 2024;
originally announced June 2024.
-
Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
Authors:
Rui Wang,
Liping Chen,
Kong Aik Lee,
Zhen-Hua Ling
Abstract:
Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We refer to this as asynchronous voice anonymization. To this end, a speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech. The speaker attributes are altered through adversarial perturbation applied to the speaker embedding, while human perception is preserved by controlling the intensity of perturbation. Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured with their human perception preserved for 60.71% of the processed utterances.
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Authors:
Eungbeom Kim,
Hantae Kim,
Kyogu Lee
Abstract:
The Transformer encoder with a connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR suffers from disagreement between the teacher and student models in frame-level alignment, which ultimately hinders it from improving the student model's performance. To resolve this problem, this paper introduces a self-knowledge distillation (SKD) method that guides the frame-level alignment during training. In contrast to the conventional method using separate teacher and student models, this study introduces a simple and effective method that shares encoder layers and uses the sub-model as the student model. Overall, our approach is effective in improving both resource efficiency and performance. We also conducted an experimental analysis of the spike timings to illustrate that the proposed method improves performance by reducing the alignment disagreement.
Submitted 12 June, 2024;
originally announced June 2024.
-
Optimal Qubit Mapping Search for Encoding Classical Data into Matrix Product State Representation with Minimal Loss
Authors:
Hyeongjun Jeon,
Kyungmin Lee,
Dongkyu Lee,
Bongsang Kim,
Taehyun Kim
Abstract:
Matrix product state (MPS) offers a framework for encoding classical data into quantum states, enabling the efficient utilization of quantum resources for data representation and processing. This research paper investigates techniques to enhance the efficiency and accuracy of MPS representations specifically designed for encoding classical data. Based on the observations that MPS truncation error depends on the pattern of the classical data, we devised an algorithm that finds optimal qubit mapping for given classical data, thereby improving the efficiency and fidelity of the MPS representation. Furthermore, we evaluate the impact of the optimized MPS in the context of quantum classifiers, demonstrating their enhanced performance compared to the conventional mapping. This improvement confirms the efficacy of the proposed techniques for encoding classical data into quantum states. MPS representation combined with optimal qubit mapping can pave a new way for more efficient and accurate quantum data representation and processing.
Submitted 12 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection
Authors:
Shenao Yan,
Shen Wang,
Yue Duan,
Hanbin Hong,
Kiho Lee,
Doowon Kim,
Yuan Hong
Abstract:
Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CodeBreaker leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CodeBreaker stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CodeBreaker across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures, underscoring the critical need for more robust defenses for code completion.
Submitted 10 June, 2024;
originally announced June 2024.
-
Analyzing user archetypes in Singapore's Telegram groups on COVID-19 and climate change
Authors:
Val Alvern Cueco Ligo,
Lan Tianxiang,
Ying Zeng,
Lam Yin Cheung,
Pi Zonooz,
Roy Ka-Wei Lee,
Koustuv Saha,
Edson C. Tandoc Jr.,
Navin Kumar
Abstract:
Social media platforms, particularly Telegram, play a pivotal role in shaping public perceptions and opinions on global and national issues. Unlike traditional news media, Telegram allows for the proliferation of user-generated content with minimal oversight, making it a significant venue for the spread of controversial and misinformative content. During the COVID-19 pandemic, Telegram's popularity surged in Singapore, a country with one of the highest rates of social media use globally. We leverage Singapore-based Telegram data to analyze information flows within groups focused on COVID-19 and climate change. Using k-means clustering, we identified distinct user archetypes, including Skeptic, Engaged Advocate, Observer, and Analyst, each contributing uniquely to the discourse. We developed a model to classify users into these clusters (Precision: Climate change: 0.99; COVID-19: 0.95). By identifying these user archetypes and examining their contributions to information dissemination, we sought to uncover patterns to inform effective strategies for combating misinformation and enhancing public discourse on pressing global issues.
Submitted 10 June, 2024;
originally announced June 2024.
-
Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Authors:
Donghu Kim,
Hojoon Lee,
Kyungmin Lee,
Dongyoon Hwang,
Jaegul Choo
Abstract:
Recently, various pre-training methods have been introduced in vision-based Reinforcement Learning (RL). However, their generalization ability remains unclear due to evaluations being limited to in-distribution environments and non-unified experimental setups. To address this, we introduce the Atari Pre-training Benchmark (Atari-PB), which pre-trains a ResNet-50 model on 10 million transitions from 50 Atari games and evaluates it across diverse environment distributions. Our experiments show that pre-training objectives focused on learning task-agnostic features (e.g., identifying objects and understanding temporal dynamics) enhance generalization across different environments. In contrast, objectives focused on learning task-specific knowledge (e.g., identifying agents and fitting reward functions) improve performance in environments similar to the pre-training dataset but not in varied ones. We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB.
Submitted 10 June, 2024;
originally announced June 2024.
-
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Authors:
Seungone Kim,
Juyoung Suk,
Ji Yong Cho,
Shayne Longpre,
Chaeeun Kim,
Dongkeun Yoon,
Guijin Son,
Yejin Cho,
Sheikh Shafayat,
Jinheon Baek,
Sue Hyun Park,
Hyeonbin Hwang,
Jinkyung Jo,
Hyowon Cho,
Haebin Shin,
Seongyun Lee,
Hanseok Oh,
Noah Lee,
Namgyu Ho,
Se June Joo,
Miyoung Ko,
Yoonjoo Lee,
Hyungjoo Chae,
Jamin Shin,
Joel Jang
, et al. (7 additional authors not shown)
Abstract:
As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench.
Submitted 9 June, 2024;
originally announced June 2024.
-
Microscopic Dynamics of Particle Rearrangement and its Correlation with Stick-slip Behavior in Granular Shear
Authors:
Kwangmin Lee,
Ryan C. Hurley
Abstract:
The link between the microscopic dynamics of particles and the macroscale response of granular materials has not been well established. To address this, we investigated the microscopic dynamics and fluctuations in the force network in a granular material subjected to plane shear. A two-dimensional discrete element model of a plane shear test was established, considering both sliding and rolling friction. We found that particle rearrangement originated from the reduction of inter-particle forces in an identifiable region, which we call the greatest reduction (GR) region, defined as a region containing inter-particle forces experiencing the greatest decrease in magnitude in a given time interval. Statistical analysis shows that not only the magnitude of the greatest non-affine deformation in the GR region but also dynamics of neighboring particles in the region are highly correlated with the macroscale shear stress drop. These trends were also observed for various sliding friction coefficients and for simulations containing many particles. However, the quantity and configuration of GR regions had minimal impact on macroscale stress drops. We expect that this study will shed new light on the relationship between microscopic dynamics and force network fluctuations in sheared granular media and contribute to developing a mesoscale elasto-plastic model for these materials.
Submitted 8 June, 2024;
originally announced June 2024.
-
Aligning Large Language Models with Self-generated Preference Data
Authors:
Dongyoung Kim,
Kimin Lee,
Jinwoo Shin,
Jaehyung Kim
Abstract:
Aligning large language models (LLMs) with human preferences becomes a key component to obtaining state-of-the-art performance, but it yields a huge cost to construct a large human-annotated preference dataset. To tackle this problem, we propose a new framework that boosts the alignment of LLMs through Self-generated Preference data (Selfie) using only a very small amount of human-annotated preference data. Our key idea is leveraging the human prior knowledge within the small (seed) data and progressively improving the alignment of LLM, by iteratively generating the responses and learning from them with the self-annotated preference data. To be specific, we propose to derive the preference label from the logits of LLM to explicitly extract the model's inherent preference. Compared to the previous approaches using external reward models or implicit in-context learning, we observe that the proposed approach is significantly more effective. In addition, we introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data. Our experimental results demonstrate that the proposed framework significantly boosts the alignment of LLMs. For example, we achieve superior alignment performance on AlpacaEval 2.0 with only 3.3% of the ground-truth preference labels in the Ultrafeedback data compared to the cases using the entire data or state-of-the-art baselines.
Submitted 6 June, 2024;
originally announced June 2024.
-
L-shadowing lemma for the Cauchy equation
Authors:
K. Lee,
C. A. Morales
Abstract:
We prove that if the Cauchy problem $\dot{u}=Au$ in a Banach space is hyperbolic, then the problem has the L-shadowing property. Conversely, if the space is finite-dimensional and the L-shadowing property is satisfied, then the problem is hyperbolic. This generalizes a previous result by Ombach \cite{o, o1} for linear homeomorphisms. Some short applications are given.
Submitted 6 June, 2024;
originally announced June 2024.
-
Coherent control of a triangular exchange-only spin qubit
Authors:
Edwin Acuna,
Joseph D. Broz,
Kaushal Shyamsundar,
Antonio B. Mei,
Colin P. Feeney,
Valerie Smetanka,
Tiffany Davis,
Kangmu Lee,
Maxwell D. Choi,
Brydon Boyd,
June Suh,
Wonill D. Ha,
Cameron Jennings,
Andrew S. Pan,
Daniel S. Sanchez,
Matthew D. Reed,
Jason R. Petta
Abstract:
We demonstrate coherent control of a three-electron exchange-only spin qubit with the quantum dots arranged in a close-packed triangular geometry. The device is tuned to confine one electron in each quantum dot, as evidenced by pairwise charge stability diagrams. Time-domain control of the exchange coupling is demonstrated and qubit performance is characterized using blind randomized benchmarking, with an average single-qubit gate fidelity F = 99.84%. The compact triangular device geometry can be readily scaled to larger two-dimensional quantum dot arrays with high connectivity.
Submitted 5 June, 2024;
originally announced June 2024.
-
The Massive and Distant Clusters of WISE Survey 2: A Stacking Analysis Investigating the Evolution of Star Formation Rates and Stellar Masses in Groups and Clusters
Authors:
A. Trudeau,
Anthony H. Gonzalez,
K. Thongkham,
Kyoung-Soo Lee,
Stacey Alberts,
M. Brodwin,
Thomas Connor,
Peter R. M. Eisenhardt,
Emily Moravec,
Eshwar Puvvada,
S. A. Stanford
Abstract:
The evolution of galaxies depends on their masses and local environments; understanding when and how environmental quenching starts to operate remains a challenge. Furthermore, studies of the high-redshift regime have been limited to massive cluster members, owing to sensitivity limits or small fields of view when the sensitivity is sufficient, intrinsically biasing the picture of cluster evolution. In this work, we use stacking to investigate the average star formation history of more than 10,000 groups and clusters drawn from the Massive and Distant Clusters of WISE Survey 2 (MaDCoWS2). Our analysis covers near ultraviolet to far infrared wavelengths, for galaxy overdensities at $0.5 \lesssim z \lesssim 2.54$. We employ SED fitting to measure the specific star formation rates (sSFR) in four annular apertures with radii between 0 and 1000 kpc. At $z \gtrsim 1.6$, the average sSFR evolves similarly to the field in both the core and the cluster outskirts. Between $\overline{z} = 1.60$ and $\overline{z} = 1.35$, the sSFR in the core drops sharply, and continues to fall relative to the field sSFR at lower redshifts. We interpret this change as evidence that the impact of environmental quenching dramatically increases at $z \sim 1.5$, with the short time span of the transition suggesting that the environmental quenching mechanism dominant at this redshift operates on a rapid timescale. We find indications that the sSFR may decrease with increasing host halo mass, but lower-scatter mass tracers than the signal-to-noise ratio (S/N) are needed to confirm this relationship.
Submitted 25 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Physical Origin of H-Mode
Authors:
Kwan Chul Lee
Abstract:
The high confinement mode (H-mode), the most important operation mode for nuclear fusion reactors, has been studied for 42 years, but no consensus on the transition mechanism has been reached. Four decades of H-mode experiments have revealed many features of the heating power threshold (Pth) for the low-to-high confinement (L-H) transition: Pth is proportional to the toroidal magnetic field (B), inversely proportional to the ion mass (mi), and has a U-shaped dependence on the plasma density. It is found for the first time that this U-shaped rollover arises because Pth is inversely proportional to the product of the plasma density (ni) and the square of the neutral density (nn). The neutral density enters the L-H transition because turbulence suppression takes place through the viscous force of ion-neutral friction. When the plasma is in equilibrium, with the turbulence-induced return current compensating the gyro-center shift current generated by ion-neutral charge exchange, the Reynolds number (Re), the ratio of the inertial force to the viscous force, can predict the onset of the laminar-to-turbulent transition. Re for the plasma-neutral interaction is proportional to B / (mi ni nn^2), which agrees well with the experimental results for Pth. Re is also proportional to the second gradient of the radial electric field, which agrees with experiments. Fifteen characteristics of the L-H transition are explained by Re, including the favorable dependence of Pth on the ion grad-B drift toward the x-point.
Submitted 5 June, 2024;
originally announced June 2024.
-
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Authors:
Brian K Chen,
Tianyang Hu,
Hui Jin,
Hwee Kuan Lee,
Kenji Kawaguchi
Abstract:
In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms. We mathematically demonstrate the equivalence between a model with ICL demonstration prompts and the same model with the additional bias terms. Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner. Existing methods are not exact and require expensive parameter updates. We demonstrate the efficacy of our approach through experiments that show the exact incorporation of ICL tokens into a linear transformer. We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized. Our experiments on GPT-2 show that, even though the conversion is only approximate, the model still gains valuable context from the included bias terms.
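The reason such a conversion can be exact in the linearized setting can be sketched for a single attention layer. The following is not the ICLCA algorithm from the paper; it is a minimal illustration, assuming unnormalized causal linear attention with an identity feature map, of how the demonstration tokens' contribution collapses into one additive matrix. The names `causal_linear_attention` and `state_bias`, and all shapes, are hypothetical.

```python
import numpy as np

def causal_linear_attention(Q, K, V, state_bias=None):
    """Unnormalised causal linear attention: out[t] = Q[t] @ sum_{i<=t} outer(K[i], V[i]).

    `state_bias` is an optional matrix added to the running key-value state
    before any token is processed (hypothetical parameter for this sketch).
    """
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v)) if state_bias is None else state_bias.copy()
    out = np.empty((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        S += np.outer(K[t], V[t])   # accumulate the key-value state
        out[t] = Q[t] @ S
    return out

rng = np.random.default_rng(0)
m, n, d = 5, 3, 4                   # demonstration tokens, prompt tokens, head dimension
Q_ctx, K_ctx, V_ctx = (rng.normal(size=(m, d)) for _ in range(3))
Q_prm, K_prm, V_prm = (rng.normal(size=(n, d)) for _ in range(3))

# (a) Demonstrations kept in the input; read the outputs at the prompt positions.
with_context = causal_linear_attention(
    np.vstack([Q_ctx, Q_prm]), np.vstack([K_ctx, K_prm]), np.vstack([V_ctx, V_prm])
)[m:]

# (b) Demonstrations folded into a bias on the key-value state; prompt tokens only.
bias = K_ctx.T @ V_ctx              # equals sum_i outer(K_ctx[i], V_ctx[i])
with_bias = causal_linear_attention(Q_prm, K_prm, V_prm, state_bias=bias)

outputs_match = np.allclose(with_context, with_bias)
print(outputs_match)
```

With softmax attention the normalization couples every query to every key, so no such additive term exists, which is consistent with the abstract's statement that the conversion is only approximate for regular transformers.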
Submitted 6 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
Authors:
ChaeHun Park,
Koanho Lee,
Hyesu Lim,
Jaeseok Kim,
Junmo Park,
Yu-Jung Heo,
Du-Seong Chang,
Jaegul Choo
Abstract:
Building a reliable visual question answering (VQA) system across different languages is a challenging problem, primarily due to the lack of abundant samples for training. To address this challenge, recent studies have employed machine translation systems for the cross-lingual VQA task. This involves translating the evaluation samples into a source language (usually English) and using monolingual models (i.e., translate-test). However, our analysis reveals that translated texts contain unique characteristics distinct from human-written ones, referred to as translation artifacts. We find that these artifacts can significantly affect the models, confirmed by extensive experiments across diverse models, languages, and translation processes. In light of this, we present a simple data augmentation strategy that can alleviate the adverse impacts of translation artifacts.
Submitted 4 June, 2024;
originally announced June 2024.
-
How should parallel cluster randomized trials with a baseline period be analyzed? A survey of estimands and common estimators
Authors:
Kenneth Menglin Lee,
Fan Li
Abstract:
The parallel cluster randomized trial with baseline (PB-CRT) is a common variant of the standard parallel cluster randomized trial (P-CRT) that maintains parallel randomization but additionally allows for both within and between-cluster comparisons. We define two estimands of interest in the context of PB-CRTs, the participant-average treatment effect (pATE) and cluster-average treatment effect (cATE), to address participant and cluster-level hypotheses. Previous work has indicated that under informative cluster sizes, commonly used mixed-effects models may yield inconsistent estimators for the estimands of interest. In this work, we theoretically derive the convergence of the unweighted and inverse cluster-period size weighted (i.) independence estimating equation, (ii.) fixed-effects model, (iii.) exchangeable mixed-effects model, and (iv.) nested-exchangeable mixed-effects model treatment effect estimators in a PB-CRT with continuous outcomes. We report a simulation study to evaluate the bias and inference with these different treatment effect estimators and their corresponding model-based or jackknife variance estimators. We then re-analyze a PB-CRT examining the effects of community youth teams on improving mental health among adolescent girls in rural eastern India. We demonstrate that the unweighted and weighted independence estimating equation and fixed-effects model regularly yield consistent estimators for the pATE and cATE estimands, whereas the mixed-effects models yield inconsistent estimators under informative cluster sizes. However, we demonstrate that unlike the nested-exchangeable mixed-effects model and corresponding analyses in P-CRTs, the exchangeable mixed-effects model is surprisingly robust to bias in many PB-CRT scenarios.
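The difference between the two estimands under informative cluster sizes can be made concrete with a toy calculation. The numbers and function names below are hypothetical and not drawn from the paper, and the sketch treats each estimand simply as a weighted average of cluster-specific effects: the pATE weights every participant equally, the cATE weights every cluster equally.

```python
def participant_average_effect(effects, sizes):
    """Average of cluster-specific effects weighted by cluster size."""
    return sum(e * n for e, n in zip(effects, sizes)) / sum(sizes)

def cluster_average_effect(effects):
    """Unweighted average of cluster-specific effects."""
    return sum(effects) / len(effects)

# Hypothetical trial: the larger cluster also has the larger treatment effect,
# so cluster size is informative.
effects = [1.0, 3.0]
sizes = [10, 30]

pate = participant_average_effect(effects, sizes)  # (10*1.0 + 30*3.0) / 40 = 2.5
cate = cluster_average_effect(effects)             # (1.0 + 3.0) / 2 = 2.0
print(pate, cate)
```

When cluster size is unrelated to the effect, the two averages coincide in expectation; they separate only when size is informative, as here.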
Submitted 4 June, 2024;
originally announced June 2024.
-
The clustering of Lyman Alpha Emitting galaxies at z=2-3
Authors:
M. White,
A. Raichoor,
Arjun Dey,
Lehman H. Garrison,
Eric Gawiser,
D. Lang,
Kyoung-soo Lee,
A. D. Myers,
D. Schlegel,
F. Valdes,
J. Aguilar,
S. Ahlen,
D. Brooks,
E. Chaussidon,
T. Claybaugh,
K. Dawson,
A. de la Macorra,
Biprateep Dey,
P. Doel,
K. Fanning,
A. Font-Ribera,
J. E. Forero-Romero,
S. Gontcho A Gontcho,
G. Gutierrez,
J. Guy
, et al. (30 additional authors not shown)
Abstract:
We measure the clustering of Lyman Alpha Emitting galaxies (LAEs) selected from the One-hundred-square-degree DECam Imaging in Narrowbands (ODIN) survey, with spectroscopic follow-up from Dark Energy Spectroscopic Instrument (DESI). We use DESI spectroscopy to optimize our selection and to constrain the interloper fraction and redshift distribution of our narrow-band selected sources. We select samples at z=2.45 and 3.1 in the COSMOS field with median Ly-alpha fluxes of 10^{-16}erg/s/cm2. Covariances and cosmological inferences are obtained from a series of mock catalogs built upon high-resolution N-body simulations that match the footprint, number density, redshift distribution and observed clustering of the sample. We find that both samples have a correlation length of r_0=3.0+/-0.2 Mpc/h. Within our fiducial cosmology these correspond to 3D number densities of 10^{-3}h3/Mpc3 and, from our mock catalogs, biases of 1.7 and 2.0 at z=2.45 and 3.1, respectively. We discuss the implications of these measurements for the use of LAEs as large-scale structure tracers for high-redshift cosmology.
Submitted 3 June, 2024;
originally announced June 2024.
-
Searching For Music Mixing Graphs: A Pruning Approach
Authors:
Sungho Lee,
Marco A. Martínez-Ramírez,
Wei-Hsiang Liao,
Stefan Uhlich,
Giorgio Fabbro,
Kyogu Lee,
Yuki Mitsufuji
Abstract:
Music mixing is compositional -- experts combine multiple audio processors to achieve a cohesive mix from dry source tracks. We propose a method to reverse engineer this process from the input and output audio. First, we create a mixing console that applies all available processors to every chain. Then, after the initial console parameter optimization, we alternate between removing redundant processors and fine-tuning. We achieve this through differentiable implementation of both processors and pruning. Consequently, we find a sparse mixing graph that achieves nearly identical matching quality of the full mixing console. We apply this procedure to dry-mix pairs from various datasets and collect graphs that also can be used to train neural networks for music mixing applications.
Submitted 3 June, 2024;
originally announced June 2024.
-
Dimers for Type D Relativistic Toda Model
Authors:
Kimyeong Lee,
Norton Lee
Abstract:
We construct dimer graphs for type D relativistic Toda models by introducing impurities to the $Y^{2N,0}$ square dimer graphs. By properly placing the impurities and change of canonical variables assigned to the 1-loops on the dimer graph, we introduce the "folding" of the graphs and get the type D relativistic Toda lattice Hamiltonian and monodromy matrix.
Submitted 23 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
Authors:
Taeryung Lee,
Fabien Baradel,
Thomas Lucas,
Kyoung Mu Lee,
Gregory Rogez
Abstract:
In this paper, we address the challenging problem of long-term 3D human motion generation. Specifically, we aim to generate a long sequence of smoothly connected actions from a stream of multiple sentences (i.e., paragraph). Previous long-term motion generating approaches were mostly based on recurrent methods, using previously generated motion chunks as input for the next step. However, this approach has two drawbacks: 1) it relies on sequential datasets, which are expensive; 2) these methods yield unrealistic gaps between motions generated at each step. To address these issues, we introduce simple yet effective T2LM, a continuous long-term generation framework that can be trained without sequential data. T2LM comprises two components: a 1D-convolutional VQVAE, trained to compress motion to sequences of latent vectors, and a Transformer-based Text Encoder that predicts a latent sequence given an input text. At inference, a sequence of sentences is translated into a continuous stream of latent vectors. This is then decoded into a motion by the VQVAE decoder; the use of 1D convolutions with a local temporal receptive field avoids temporal inconsistencies between training and generated sequences. This simple constraint on the VQ-VAE allows it to be trained with short sequences only and produces smoother transitions. T2LM outperforms prior long-term generation models while overcoming the constraint of requiring sequential data; it is also competitive with SOTA single-action generation models.
Submitted 2 June, 2024;
originally announced June 2024.