Search | arXiv e-print repository

Hybrid attention structure preserving network for reconstruction of under-sampled OCT images

Abstract: Optical coherence tomography (OCT) is a non-invasive, high-resolution imaging technology that provides cross-sectional images of tissues. Dense acquisition of A-scans along the fast axis is required to obtain high digital resolution images. However, the dense acquisition will increase the acquisition time, causing the discomfort of patients. In addition, the longer acquisition time may lead to motion artifacts, thereby reducing imaging quality. In this work, we proposed a hybrid attention structure preserving network (HASPN) to achieve super-resolution of under-sampled OCT images to speed up the acquisition. It utilized adaptive dilated convolution-based channel attention (ADCCA) and enhanced spatial attention (ESA) to better capture the channel and spatial information of the feature. Moreover, convolutional neural networks (CNNs) exhibit a higher sensitivity of low-frequency than high-frequency information, which may lead to a limited performance on reconstructing fine structures. To address this problem, we introduced an additional branch, i.e., textures & details branch, using high-frequency decomposition images to better super-resolve retinal structures. The superiority of our method was demonstrated by qualitative and quantitative comparisons with mainstream methods. HASPN was applied to the diabetic macular edema retinal dataset, validating its good generalization ability.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.21071 [pdf, other]

A Multi-wavelength, Multi-epoch Monitoring Campaign of Accretion Variability in T Tauri Stars from the ODYSSEUS Survey. II. Photometric Light Curves

Authors: John Wendeborn, Catherine C. Espaillat, Thanawuth Thanathibodee, Connor E. Robinson, Caeley V. Pittman, Nuria Calvet, Ágnes Kóspál, Konstantin N. Grankin, Fredrick M. Walter, Zhen Guo, Jochen Eislöffel

Abstract: Classical T Tauri Stars (CTTSs) are young, low-mass stars which accrete material from their surrounding protoplanetary disk. To better understand accretion variability, we conducted a multi-epoch, multi-wavelength photometric monitoring campaign of four CTTSs: TW Hya, RU Lup, BP Tau, and GM Aur, in 2021 and 2022, contemporaneous with HST UV and optical spectra. We find that all four targets display significant variability in their light curves, generally on days-long timescales (but in some cases year-to-year) often due to periodicity associated with stellar rotation and to stochastic accretion variability. There is a strong connection between mass accretion and photometric variability in all bands, but the relationship varies per target and epoch. Thus, photometry should be used with caution as a direct measure of accretion in CTTSs.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 28 pages, 12 figures

arXiv:2405.21038 [pdf, other]

A Multi-wavelength, Multi-epoch Monitoring Campaign of Accretion Variability in T Tauri Stars from the ODYSSEUS Survey. I. HST FUV and NUV Spectra

Authors: John Wendeborn, Catherine C. Espaillat, Sophia Lopez, Thanawuth Thanathibodee, Connor E. Robinson, Caeley V. Pittman, Nuria Calvet, Nicole Flors, Fredrick M. Walter, Ágnes Kóspál, Konstantin N. Grankin, Ignacio Mendigutía, Hans Moritz Günther, Jochen Eislöffel, Zhen Guo, Kevin France, Eleonora Fiorellino, William J. Fischer, Péter Ábrahám, Gregory J. Herczeg

Abstract: The Classical T Tauri Star (CTTS) stage is a critical phase of the star and planet formation process. In an effort to better understand the mass accretion process, which can dictate further stellar evolution and planet formation, a multi-epoch, multi-wavelength photometric and spectroscopic monitoring campaign of four CTTSs (TW Hya, RU Lup, BP Tau, and GM Aur) was carried out in 2021 and 2022/2023 as part of the Outflows and Disks Around Young Stars: Synergies for the Exploration of ULYSSES Spectra (ODYSSEUS) program. Here we focus on the HST UV spectra obtained by the HST Director's Discretionary Time UV Legacy Library of Young Stars as Essential Standards (ULLYSES) program. Using accretion shock modeling, we find that all targets exhibit accretion variability, varying from short increases in accretion rate by up to a factor of 3 within 48 hours, to longer decreases in accretion rate by a factor of 2.5 over the course of 1 year. This is despite the generally consistent accretion morphology within each target. Additionally, we test empirical relationships between accretion rate and UV luminosity and find stark differences, showing that these relationships should not be used to estimate the accretion rate for individual targets. Our work reinforces that future multi-epoch and simultaneous multi-wavelength studies are critical in our understanding of the accretion process in low-mass star formation.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 37 pages, 14 figures

arXiv:2405.19732 [pdf, other]

Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning

Authors: Zixian Guo, Ming Liu, Zhilong Ji, **feng Bai, Yiwen Guo, Wangmeng Zuo

Abstract: Learning a skill generally relies on both practical experience by a doer and insightful high-level guidance by an instructor. Will this strategy also work well for solving complex non-convex optimization problems? Here, a common gradient-based optimizer acts like a disciplined doer, making a locally optimal update at each step. Recent methods utilize large language models (LLMs) to optimize solutions for concrete problems by inferring from natural language instructions, akin to a high-level instructor. In this paper, we show that these two optimizers are complementary to each other, suggesting a collaborative optimization approach. The gradient-based optimizer and LLM-based optimizer are combined in an interleaved manner. We instruct LLMs using task descriptions and timely optimization trajectories recorded during gradient-based optimization. Inferred results from LLMs are used as restarting points for the next stage of gradient optimization. By leveraging both the locally rigorous gradient-based optimizer and the high-level deductive LLM-based optimizer, our combined optimization method consistently yields improvements over competitive baseline prompt tuning methods. Our results demonstrate the synergistic effect of conventional gradient-based optimization and the inference ability of LLMs. The code is released at https://github.com/guozix/LLM-catalyst.

Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18727 [pdf, other]

CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control

Authors: Huanshuo Liu, Hao Zhang, Zhijiang Guo, Kuicai Dong, Xiangyang Li, Yi Quan Lee, Cong Zhang, Yong Liu

Abstract: Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. Adaptive RAG enhances this approach by dynamically assessing the retrieval necessity, aiming to balance external and internal knowledge usage. However, existing adaptive RAG methods primarily realize retrieval on demand by relying on superficially verbalize-based or probability-based feedback of LLMs, or directly fine-tuning LLMs via carefully crafted datasets, resulting in unreliable retrieval necessity decisions, heavy extra costs, and sub-optimal response generation. We present the first attempts to delve into the internal states of LLMs to mitigate such issues by introducing an effective probe-guided adaptive RAG framework, termed CtrlA. Specifically, CtrlA employs an honesty probe to regulate the LLM's behavior by manipulating its representations for increased honesty, and a confidence probe to monitor the internal states of LLM and assess confidence levels, determining the retrieval necessity during generation. Experiments show that CtrlA is superior to existing adaptive RAG methods on a diverse set of tasks, the honesty control can effectively make LLMs more honest and confidence monitoring is proven to be a promising indicator of retrieval trigger. Our codes are available at https://github.com/HSLiu-Initial/CtrlA.git.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 9 tables

arXiv:2405.18523 [pdf, other]

TripletMix: Triplet Data Augmentation for 3D Understanding

Authors: Jiaze Wang, Yi Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng

Abstract: Data augmentation has proven to be a vital tool for enhancing the generalization capabilities of deep learning models, especially in the context of 3D vision where traditional datasets are often limited. Despite previous advancements, existing methods primarily cater to unimodal data scenarios, leaving a gap in the augmentation of multimodal triplet data, which integrates text, images, and point clouds. Simultaneously augmenting all three modalities enhances diversity and improves alignment across modalities, resulting in more comprehensive and robust 3D representations. To address this gap, we propose TripletMix, a novel approach to address the previously unexplored issue of multimodal data augmentation in 3D understanding. TripletMix innovatively applies the principles of mixed-based augmentation to multimodal triplet data, allowing for the preservation and optimization of cross-modal connections. Our proposed TripletMix combines feature-level and input-level augmentations to achieve dual enhancement between raw data and latent features, significantly improving the model's cross-modal understanding and generalization capabilities by ensuring feature consistency and providing diverse and realistic training samples. We demonstrate that TripletMix not only improves the baseline performance of models in various learning scenarios including zero-shot and linear probing classification but also significantly enhances model generalizability. Notably, we improved the zero-shot classification accuracy on ScanObjectNN from 51.3 percent to 61.9 percent, and on Objaverse-LVIS from 46.8 percent to 51.4 percent. Our findings highlight the potential of multimodal data augmentation to significantly advance 3D object recognition and understanding.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.18216 [pdf, other]

A Survey on Modern Code Review: Progresses, Challenges and Opportunities

Authors: Zezhou Yang, Cuiyun Gao, Zhaoqiang Guo, Zhenhao Li, Kui Liu, Xin Xia, Yuming Zhou

Abstract: Over the past decade, modern code review (MCR) has been deemed as a crucial practice of software quality assurance, which is applied to improve software quality and transfer development knowledge within a software team. Despite its importance, MCR is often a complicated and time-consuming activity for practitioners. In recent years, many studies that are dedicated to the comprehension and the improvement of MCR have been explored so that the MCR activity can be carried out more conveniently and efficiently. To provide researchers and practitioners a clear understanding of the current research status on MCR, this paper conducts a systematic literature review of the past years. Given the collected 231 surveyed papers, this paper makes the following five contributions: First, we analyze the research trends of related MCR studies. Second, we provide a taxonomy for the current MCR, encompassing both Improvement Techniques and Understanding Studies. Third, we present the concrete research progress of each novel MCR methodology and prototype tool. Fourth, we exploit the main empirical insights from empirical study and user study that are helpful to improve MCR. Finally, we sum up unsolved challenges and outline several possible research opportunities in the future.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 62 pages

arXiv:2405.18132 [pdf, other]

EG4D: Explicit Generation of 4D Object without Score Distillation

Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, **g Nathan Yan, Shengming Yin, Wengang Zhou, **g Liao, Houqiang Li

Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on the score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and the Janus problem. Therefore, inspired by recent progress of video diffusion models, we propose to optimize a 4D representation by explicitly generating multi-view videos from one input image. However, it is far from trivial to handle practical challenges faced by such a pipeline, including dramatic temporal inconsistency, inter-frame geometry and texture diversity, and semantic defects brought by video generation results. To address these issues, we propose EG4D, a novel multi-stage framework that generates high-quality and consistent 4D assets without score distillation. Specifically, collaborative techniques and solutions are developed, including an attention injection strategy to synthesize temporal-consistent multi-view videos, a robust and efficient dynamic reconstruction method based on Gaussian Splatting, and a refinement stage with diffusion prior for semantic restoration. The qualitative results and user preference study demonstrate that our framework outperforms the baselines in generation quality by a considerable margin. Code will be released at https://github.com/jasongzy/EG4D.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17420 [pdf, other]

Survival of the Fittest Representation: A Case Study with Modular Addition

Authors: Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud, Ziming Liu, Max Tegmark

Abstract: When a neural network can learn multiple distinct algorithms to solve a task, how does it "choose" between them during training? To approach this question, we take inspiration from ecology: when multiple species coexist, they eventually reach an equilibrium where some survive while others die out. Analogously, we suggest that a neural network at initialization contains many solutions (representations and algorithms), which compete with each other under pressure from resource constraints, with the "fittest" ultimately prevailing. To investigate this Survival of the Fittest hypothesis, we conduct a case study on neural networks performing modular addition, and find that these networks' multiple circular representations at different Fourier frequencies undergo such competitive dynamics, with only a few circles surviving at the end. We find that the frequencies with high initial signals and gradients, the "fittest," are more likely to survive. By increasing the embedding dimension, we also observe more surviving frequencies. Inspired by the Lotka-Volterra equations describing the dynamics between species, we find that the dynamics of the circles can be nicely characterized by a set of linear differential equations. Our results with modular addition show that it is possible to decompose complicated representations into simpler components, along with their basic interactions, to offer insight on the training dynamics of representations.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16980 [pdf, other]

DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class uses semantic segmentation networks to pick on a shot gather, called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based picking methods input an image of a shot gather and output a binary segmentation map, in which the maximum of each column is the location of FB. However, currently designed segmentation networks have difficulty ensuring the horizontal continuity of the segmentation. Additionally, FB jumps also exist in some areas, and it is not easy for current networks to detect such jumps. Therefore, it is important to pick as much as possible and ensure horizontal continuity. To alleviate this problem, we propose a novel semantic segmentation network for the 2-D seismic FB picking task, where we introduce the dynamic snake convolution into U-Net and call the new segmentation network dynamic-snake U-Net (DSU-Net). Specifically, we develop original dynamic-snake convolution (DSConv) in CV and propose a novel DSConv module, which can extract the horizontal continuous feature in the shallow feature of the shot gather. Many experiments have shown that DSU-Net demonstrates higher accuracy and robustness than the other 2-D segmentation-based models, achieving state-of-the-art (SOTA) performance in 2-D seismic field surveys. Particularly, it can effectively detect FB jumps and better ensure the horizontal continuity of FB. In addition, the ablation experiment and the anti-noise experiment, respectively, verify the optimal structure of the DSConv module and the robustness of the picking.
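
Note: the abstract states that, in 2-D segmentation-based picking, the first break is read off as the maximum of each column of the output map. A minimal sketch of that read-out step only (not of DSU-Net itself) might look as follows. The function name `pick_first_breaks`, the toy map, and the layout (rows as time samples, columns as traces) are assumptions made for illustration and do not come from the paper.

```python
from typing import List


def pick_first_breaks(seg_map: List[List[float]]) -> List[int]:
    """For each column, return the row index holding the column maximum.

    Assumed layout (hypothetical): rows are time samples, columns are traces.
    Ties resolve to the earliest (smallest) row index.
    """
    n_rows = len(seg_map)
    n_cols = len(seg_map[0])
    picks = []
    for col in range(n_cols):
        column = [seg_map[row][col] for row in range(n_rows)]
        picks.append(max(range(n_rows), key=column.__getitem__))
    return picks


# Toy 4x3 map of segmentation scores, invented for this example.
toy = [
    [0.1, 0.0, 0.2],
    [0.9, 0.1, 0.1],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.7],
]
picks = pick_first_breaks(toy)
print(picks)
```

Because each column is picked independently here, nothing in this read-out enforces agreement between neighbouring traces, which is the horizontal-continuity issue the abstract raises.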

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16952 [pdf, other]

A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

Abstract: In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR). This new variance-preserving interpolation diffusion model (VPIDM) approach requires only 25 iterative steps and obviates the need for a corrector, an essential element in the existing variance-exploding interpolation diffusion model (VEIDM). Two notable distinctions between VPIDM and VEIDM are the scaling function of the mean of state variables and the constraint imposed on the variance relative to the mean's scale. We conduct a systematic exploration of the theoretical mechanism underlying VPIDM and develop insights regarding VPIDM's applications in SE and ASR using VPIDM as a frontend. Our proposed approach, evaluated on two distinct data sets, demonstrates VPIDM's superior performances over conventional discriminative SE algorithms. Furthermore, we assess the performance of the proposed model under varying signal-to-noise ratio (SNR) levels. The investigation reveals VPIDM's improved robustness in target noise elimination when compared to VEIDM. Furthermore, utilizing the mid-outputs of both VPIDM and VEIDM results in enhanced ASR accuracies, thereby highlighting the practical efficacy of our proposed approach.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16802 [pdf, other]

AutoCV: Empowering Reasoning with Automated Process Labeling via Confidence Variation

Authors: Jianqiao Lu, Zhiyang Dou, Hongru Wang, Zeyu Cao, Jianbo Dai, Yingjia Wan, Yinya Huang, Zhijiang Guo

Abstract: In this work, we propose a novel method named Automated Process Labeling via Confidence Variation (AutoCV) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. Our approach begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verification's confidence scores across reasoning steps to automatically annotate the reasoning process. This alleviates the need for numerous manual annotations or the high computational costs associated with model-induced annotation approaches. We experimentally validate that the confidence variations learned by the verification model trained on the final answer correctness can effectively identify errors in the reasoning steps. Subsequently, we demonstrate that the process annotations generated by AutoCV can improve the accuracy of the verification model in selecting the correct answer from multiple outputs generated by LLMs. Notably, we achieve substantial improvements across five datasets in mathematics and commonsense reasoning. The source code of AutoCV is available at https://github.com/rookie-joe/AUTOCV.

Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: 20 pages, 1 figure, 13 tables

arXiv:2405.15863 [pdf, other]

Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

Abstract: In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mislabeling, weak labeling, unlabeled data, and low-quality music waveforms significantly hampers the development of music generation models. To overcome these challenges, we introduce a novel quality-aware masked diffusion transformer (QA-MDT) approach that enables generative models to discern the quality of input music waveforms during training. Building on the unique properties of musical signals, we have adapted and implemented an MDT model for the TTM task, while further unveiling its distinct capacity for quality control. Moreover, we address the issue of low-quality captions with a caption refinement data processing approach. Our demo page is at https://qa-mdt.github.io/. Code is at https://github.com/ivcylc/qa-mdt

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15412 [pdf, other]

ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions

Authors: Zijie Guo, Pumeng Lyu, Fenghua Ling, **g-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

Abstract: Ocean dynamics plays a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical Ocean General Circulation Models (OGCMs), accurate forecasting of global oceanic variations on multi-year time scales remains a long-standing challenge. Here, we introduce ORCA (Oceanic Reliable foreCAst), the first data-driven model predicting global ocean circulation from multi-year to decadal time scales. ORCA accurately simulates the three-dimensional circulations and dynamics of the global ocean with high physical consistency. Hindcasts of key oceanic variables demonstrate ORCA's remarkable skill in predicting ocean variations compared with state-of-the-art numerical OGCMs and its ability to capture occurrences of extreme events at the subsurface ocean and ENSO vertical patterns. These results demonstrate the potential of data-driven ocean models for providing cheap, efficient, and accurate global ocean modeling and prediction. Moreover, ORCA stably and faithfully emulates ocean dynamics at decadal timescales, demonstrating its potential even for climate projections. The model will be available at https://github.com/OpenEarthLab/ORCA.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15189 [pdf, other]

SOAP: Enhancing Efficiency of Generated Code via Self-Optimization

Authors: Dong Huang, Jianbo Dai, Han Weng, Puzhen Wu, Yuhao Qing, Jie M. Zhang, Heming Cui, Zhijiang Guo

Abstract: Large language models (LLMs) have shown remarkable progress in code generation, but their generated code often suffers from inefficiency, resulting in longer execution times and higher memory consumption. To address this issue, we propose Self Optimization based on OverheAd Profile (SOAP), a self-optimization framework that utilizes execution overhead profiles to improve the efficiency of LLM-generated code. SOAP first generates code using an LLM, then executes it locally to capture execution time and memory usage profiles. These profiles are fed back to the LLM, which then revises the code to reduce overhead. To evaluate the effectiveness of SOAP, we conduct extensive experiments on EffiBench, HumanEval, and MBPP with 16 open-source and 6 closed-source models. Our evaluation results demonstrate that through iterative self-optimization, SOAP significantly enhances the efficiency of LLM-generated code. For example, the execution time (ET) of StarCoder2-15B on EffiBench decreases from 0.93 (s) to 0.12 (s), an 87.1% reduction in execution time compared with the initial code. The total memory usage (TMU) of StarCoder2-15B also decreases from 22.02 (Mb*s) to 2.03 (Mb*s), a 90.8% reduction in total memory consumption during execution. The source code of SOAP is released at https://github.com/huangd1999/SOAP.
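
Note: the two percentages quoted in this abstract follow from the before/after figures it reports. The short check below recomputes them as a relative reduction, (before - after) / before. The helper name `relative_reduction` is hypothetical; only the four input numbers come from the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0


# Figures quoted in the abstract for StarCoder2-15B on EffiBench.
et = relative_reduction(0.93, 0.12)    # execution time, seconds
tmu = relative_reduction(22.02, 2.03)  # total memory usage, Mb*s

print(f"ET reduction:  {et:.1f}%")
print(f"TMU reduction: {tmu:.1f}%")
```

Rounded to one decimal place, the results agree with the 87.1% and 90.8% stated in the abstract.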

Submitted 23 May, 2024; originally announced May 2024.

Comments: 31 pages, 18 figures, and 8 tables
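The percentage reductions quoted in the SOAP abstract can be checked directly from the reported before and after values. The short sketch below only reproduces that arithmetic; the helper name `percent_reduction` is hypothetical and is not taken from the SOAP code base.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100.0


# Values reported in the abstract for StarCoder2-15B on EffiBench.
et_reduction = percent_reduction(0.93, 0.12)     # execution time, seconds
tmu_reduction = percent_reduction(22.02, 2.03)   # total memory usage, Mb*s

print(f"ET reduction:  {et_reduction:.1f}%")
print(f"TMU reduction: {tmu_reduction:.1f}%")
```

Both results round to the figures stated in the abstract (87.1% and 90.8%).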

arXiv:2405.13710 [pdf, other]

Optimizing Lymphocyte Detection in Breast Cancer Whole Slide Imaging through Data-Centric Strategies

Authors: Amine Marzouki, Zhuxian Guo, Qinghe Zeng, Camille Kurtz, Nicolas Loménie

Abstract: Efficient and precise quantification of lymphocytes in histopathology slides is imperative for the characterization of the tumor microenvironment and immunotherapy response insights. We developed a data-centric optimization pipeline that attains strong lymphocyte detection performance using an off-the-shelf YOLOv5 model, without any architectural modifications. Our contribution, which relies on dataset augmentation strategies, includes novel biological upsampling and custom visual cohesion transformations tailored to the unique properties of tissue imagery, and dramatically improves model performance. Our optimization reveals a pivotal realization: given intensive customization, standard computational pathology models can achieve high-capability biomarker development without increasing the architectural complexity. We demonstrate the value of this approach in the context of breast cancer, where our strategies lead to good lymphocyte detection performance, echoing a broadly impactful paradigm shift. Furthermore, our data curation techniques enable crucial histological analysis benchmarks, highlighting improved generalizable potential.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13614 [pdf, ps, other]

On Relative Tractor Bundles

Authors: Andreas Cap, Zhangwen Guo, Michal Wasilewicz

Abstract: This article contributes to the relative BGG-machinery for parabolic geometries. Starting from a relative tractor bundle, this machinery constructs a sequence of differential operators that are naturally associated to the geometry in question. In many situations of interest, it is known that this sequence provides a resolution of a sheaf that can locally be realized as a pullback from a local leaf space of a foliation that is naturally available in this situation. An explicit description of the latter sheaf was only available under much more restrictive assumptions. For any geometry which admits relative tractor bundles, we construct a large family of such bundles for which we obtain a simple, explicit description of the resolved sheaves under weak assumptions on the torsion of the geometry. In particular, we discuss the cases of Legendrean contact structures and of generalized path geometries, which are among the most important examples for which the relative BGG machinery is available. In both cases, we show that essentially all relative tractor bundles are obtained by our construction and our description of the resolved sheaves applies whenever the BGG sequence is a resolution.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 21 pages, Comments are welcome

MSC Class: primary: 58J10; secondary: 53C07; 53C15; 58J60; 58J70

arXiv:2405.13532 [pdf, other]

What Makes Good Few-shot Examples for Vision-Language Models?

Authors: Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, ZhenXing Qian, Fei Tan

Abstract: Despite the notable advancements achieved by leveraging pre-trained vision-language (VL) models through few-shot tuning for downstream tasks, our detailed empirical study highlights a significant dependence of few-shot learning outcomes on the careful selection of training examples - a facet that has been previously overlooked in research. In this study, we delve into devising more effective strategies for the meticulous selection of few-shot training examples, as opposed to relying on random sampling, to enhance the potential of existing few-shot prompt learning methodologies. To achieve this, we assess the effectiveness of various Active Learning (AL) techniques for instance selection, such as Entropy and Margin of Confidence, within the context of few-shot training. Furthermore, we introduce two innovative selection methods - Representativeness (REPRE) and Gaussian Monte Carlo (Montecarlo) - designed to proactively pinpoint informative examples for labeling in relation to pre-trained VL models. Our findings demonstrate that both REPRE and Montecarlo significantly surpass both random selection and AL-based strategies in few-shot training scenarios. The research also underscores that these instance selection methods are model-agnostic, offering a versatile enhancement to a wide array of few-shot training methodologies.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 8 pages, 4 figures
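The two Active Learning baselines named in the abstract above, Entropy and Margin of Confidence, have standard textbook definitions that are easy to state in code. The sketch below uses those generic definitions on made-up class probabilities; it is not the authors' implementation, and the function names and the example pool are assumptions for illustration. REPRE and Montecarlo are not sketched because the abstract does not specify them.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin_of_confidence(probs: Sequence[float]) -> float:
    """Gap between the two most likely classes (smaller = more uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_most_uncertain(pool: List[Sequence[float]], k: int, by: str = "entropy") -> List[int]:
    """Return indices of the k pool items the model is least certain about."""
    indices = range(len(pool))
    if by == "entropy":
        ranked = sorted(indices, key=lambda i: entropy(pool[i]), reverse=True)
    elif by == "margin":
        ranked = sorted(indices, key=lambda i: margin_of_confidence(pool[i]))
    else:
        raise ValueError(f"unknown criterion: {by}")
    return ranked[:k]


# Hypothetical softmax outputs for three unlabeled examples.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # probability spread over all classes
    [0.50, 0.50, 0.00],  # exact tie between two classes
]
by_entropy = select_most_uncertain(pool, 1, by="entropy")
by_margin = select_most_uncertain(pool, 1, by="margin")
print(by_entropy, by_margin)
```

The two criteria can disagree: entropy favors the example whose probability mass is spread over all classes, while the margin criterion favors the exact two-way tie.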

arXiv:2405.12069 [pdf, other]

Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping

Authors: Tianhao Wu, Jing Yang, Zhilin Guo, Jingyi Wan, Fangcheng Zhong, Cengiz Oztireli

Abstract: By equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of the fundamental limitation of Gaussians and point clouds -- each Gaussian or point can only have a single directional radiance without spatial variance, therefore an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. To properly render the body texture for each view and pose without accurate geometry or UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show our method achieves superior reconstruction quality and robustness in both self and cross reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method of our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subject.

Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: Project Page: https://gaussian-head-shoulders.netlify.app/

arXiv:2405.11682 [pdf, other]

FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention

Authors: Ziang Guo, Zakhar Yagudin, Selamawit Asfaw, Artem Lykov, Dzmitry Tsetserukou

Abstract: Camera, LiDAR and radar are common perception sensors for autonomous driving tasks. Robust prediction of 3D object detection is optimally based on the fusion of these sensors. Exploiting their abilities effectively remains a challenge because each of these sensors has its own characteristics. In this paper, we propose FADet, a multi-sensor 3D detection network, which specifically studies the characteristics of different sensors based on our local featured attention modules. For camera images, we propose a dual-attention-based sub-module. For LiDAR point clouds, a triple-attention-based sub-module is utilized, while a mixed-attention-based sub-module is applied for features of radar points. With local featured attention sub-modules, our FADet has effective detection results in long-tail and complex scenes from camera, LiDAR and radar input. On the NuScenes validation dataset, FADet achieves state-of-the-art performance on LiDAR-camera object detection tasks with 71.8% NDS and 69.0% mAP, and on radar-camera object detection tasks with 51.7% NDS and 40.3% mAP. Code will be released at https://github.com/ZionGo6/FADet.

Submitted 19 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE

arXiv:2405.11532 [pdf, other]

Non-Invasive Monitoring of Vital Signs in Calves Using Thermal Imaging Technology

Authors: Ehsan Sadeghi, Zinan Guo, Alessandro Chiumento, Paul Havinga

Abstract: This study presents a non-invasive method using thermal imaging to estimate heart and respiration rates in calves, avoiding the stress from wearables. Using Kernelised Correlation Filters (KCF) for movement tracking and advanced signal processing, we targeted one ROI for respiration and four for heart rate based on their thermal correlation. Achieving Mean Absolute Percentage Errors (MAPE) of 3.08% for respiration and 3.15% for heart rate validates the efficacy of thermal imaging in vital signs monitoring, offering a practical, less intrusive tool for Precision Livestock Farming (PLF), improving animal welfare and management.

Submitted 19 May, 2024; originally announced May 2024.
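The calf-monitoring abstract above reports its accuracy as Mean Absolute Percentage Error (MAPE). As a reminder of what that metric measures, here is a minimal sketch using the standard definition. The reference and estimated rates are invented for illustration and are not data from the study, and the function name `mape` is hypothetical.

```python
from typing import Sequence


def mape(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute percentage error of `estimate` against `reference`, in percent."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((r - e) / r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)


# Made-up heart rates in beats per minute: ground truth vs. camera-based estimate.
reference_bpm = [60.0, 80.0, 100.0]
estimated_bpm = [63.0, 76.0, 100.0]
error = mape(reference_bpm, estimated_bpm)
print(f"MAPE: {error:.2f}%")
```

Because each error is divided by its own reference value, MAPE is scale-free, which is presumably why a single metric can be quoted for both respiration and heart rate.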

arXiv:2405.11430 [pdf, other]

MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

Authors: Jianbo Dai, Jianqiao Lu, Yunlong Feng, Rongju Ruan, Ming Cheng, Haochen Tan, Zhijiang Guo

Abstract: Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4 has achieved an 88.4% pass rate on HumanEval. However, this calls into question the adequacy of existing benchmarks in thoroughly assessing function-level code generation capabilities. Our study analyzed two common benchmarks, HumanEval and MBPP, and found that these might not thoroughly evaluate LLMs' code generation capacities due to limitations in quality, difficulty, and granularity. To resolve this, we introduce the Mostly Hard Python Problems (MHPP) dataset, consisting of 140 unique human-curated problems. By focusing on the combination of natural language and code reasoning, MHPP gauges LLMs' abilities to comprehend specifications and restrictions, engage in multi-step reasoning, and apply coding knowledge effectively. Initial evaluations of 22 LLMs using MHPP showed many high-performing models on HumanEval failed to achieve similar success on MHPP. Moreover, MHPP highlighted various previously undiscovered limitations within various LLMs, leading us to believe that it could pave the way for a better understanding of LLMs' capabilities and limitations. Dataset and code are available at https://github.com/SparksofAGI/MHPP.

Submitted 18 May, 2024; originally announced May 2024.

Comments: 39 pages, dataset and code are available at https://github.com/SparksofAGI/MHPP

arXiv:2405.10877 [pdf, other]

WEITS: A Wavelet-enhanced residual framework for interpretable time series forecasting

Authors: Ziyou Guo, Yan Sun, Tieru Wu

Abstract: Time series (TS) forecasting has been an unprecedentedly popular problem in recent years, with ubiquitous applications in both scientific and business fields. Various approaches have been introduced to time series analysis, including both statistical approaches and deep neural networks. Although neural network approaches have shown stronger representation ability than statistical methods, they struggle to provide sufficient interpretability, and can be too complicated to optimize. In this paper, we present WEITS, a frequency-aware deep learning framework that is highly interpretable and computationally efficient. Through multi-level wavelet decomposition, WEITS infuses frequency analysis into a deep learning framework in a novel way. Combined with a forward-backward residual architecture, it enjoys both high representation capability and statistical interpretability. Extensive experiments on real-world datasets have demonstrated competitive performance of our model, along with its additional advantage of high computational efficiency. Furthermore, WEITS provides a general framework that can always seamlessly integrate with state-of-the-art approaches for time series forecasting.

Submitted 17 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.09488 by other authors

arXiv:2405.10277 [pdf, ps, other]

Hilbert Functions and Low-Degree Randomness Extractors

Authors: Alexander Golovnev, Zeyu Guo, Pooya Hatami, Satyajeet Nagargoje, Chao Yan

Abstract: For $S\subseteq \mathbb{F}^n$, consider the linear space of restrictions of degree-$d$ polynomials to $S$. The Hilbert function of $S$, denoted $\mathrm{h}_S(d,\mathbb{F})$, is the dimension of this space. We obtain a tight lower bound on the smallest value of the Hilbert function of subsets $S$ of arbitrary finite grids in $\mathbb{F}^n$ with a fixed size $|S|$. We achieve this by proving that this value coincides with a combinatorial quantity, namely the smallest number of low Hamming weight points in a down-closed set of size $|S|$. Understanding the smallest values of Hilbert functions is closely related to the study of degree-$d$ closure of sets, a notion introduced by Nie and Wang (Journal of Combinatorial Theory, Series A, 2015). We use bounds on the Hilbert function to obtain a tight bound on the size of degree-$d$ closures of subsets of $\mathbb{F}_q^n$, which answers a question posed by Doron, Ta-Shma, and Tell (Computational Complexity, 2022). We use the bounds on the Hilbert function and degree-$d$ closure of sets to prove that a random low-degree polynomial is an extractor for samplable randomness sources. Most notably, we prove the existence of low-degree extractors and dispersers for sources generated by constant-degree polynomials and polynomial-size circuits. Until recently, even the existence of arbitrary deterministic extractors for such sources was not known.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.08981 [pdf, other]

doi 10.1145/3655602

Impact of Design Decisions in Scanpath Modeling

Authors: Parvin Emami, Yue Jiang, Zixin Guo, Luis A. Leiva

Abstract: Modeling visual saliency in graphical user interfaces (GUIs) makes it possible to understand how people perceive GUI designs and what elements attract their attention. One aspect that is often overlooked is the fact that computational models depend on a series of design parameters that are not straightforward to decide. We systematically analyze how different design parameters affect scanpath evaluation metrics using a state-of-the-art computational model (DeepGaze++). We particularly focus on three design parameters: input image size, inhibition-of-return decay, and masking radius. We show that even small variations of these design parameters have a noticeable impact on standard evaluation metrics such as DTW or Eyenalysis. These effects also occur in other scanpath models, such as UMSS and ScanGAN, and in other datasets such as MASSVIS. Taken together, our results highlight the impact of design decisions on predicting users' viewing behavior on GUIs.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 16 pages
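DTW (dynamic time warping), one of the scanpath evaluation metrics named in the abstract above, compares two fixation sequences of possibly different lengths by finding the cheapest monotone alignment between them. The following is a minimal sketch of the classic dynamic-programming formulation with Euclidean point distances. It is an assumption that this matches the variant used in the paper, and the example scanpaths are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic DTW cost between two scanpaths given as (x, y) fixation sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b.
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical scanpaths in pixel coordinates.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(0.0, 0.0), (2.0, 0.0)]
distance = dtw_distance(predicted, observed)
print(distance)
```

Since the cost is a sum of pixel distances, rescaling the coordinate system changes the score, which gives some intuition for why a parameter such as input image size can move the metric.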

arXiv:2405.08591 [pdf, ps, other]

Degeneracy Enhancement of Neutron-Antineutron Oscillation in Neutron Star

Authors: Xuan-Ye Fu, Shao-Feng Ge, Zi-Yang Guo, Qi-Heng Wang

Abstract: We explore fermion oscillation in a degenerate environment. The direct consequence is the introduction of a Pauli blocking factor $1 - f_i$, where $f_i$ is the phase space distribution function, for each intermediate mass eigenstate during propagation. It is then much easier for a state with a larger existing fraction or density to oscillate into other states with less degeneracy, while the reverse process is not enhanced. This can significantly modify the oscillation behaviors. We apply this degenerate fermion oscillation to a concrete scenario of neutron-antineutron oscillation in a neutron star. It turns out that antineutrons receive a standing fraction to annihilate with the environmental neutrons. The subsequent neutron star heating can put an extremely stringent bound on the baryon number violating cross mass term between neutron and antineutron.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 6 pages, 2 figures

arXiv:2405.08448 [pdf, other]

Understanding the performance gap between online and offline alignment algorithms

Authors: Yunhao Tang, Daniel Zhaohan Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney

Abstract: Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of reward over-optimization, we start with an opening set of experiments that demonstrate the clear advantage of online methods over offline methods. This prompts us to investigate the causes of the performance discrepancy through a series of carefully designed experimental ablations. We show empirically that hypotheses such as offline data coverage and data quality by themselves cannot convincingly explain the performance difference. We also find that while offline algorithms train policies that are good at pairwise classification, these policies are worse at generation; meanwhile, the policies trained by online algorithms are good at generation but worse at pairwise classification. This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process. Lastly, we observe that the performance discrepancy persists for both contrastive and non-contrastive loss functions, and appears not to be addressed by simply scaling up policy networks. Taken together, our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07638 [pdf, other]

DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS

Authors: Qingyang Li, Yihang Zhang, Zhidong Jia, Yannan Hu, Lei Zhang, Jianrong Zhang, Yongming Xu, Yong Cui, Zongming Guo, Xinggong Zhang

Abstract: Whether and how Large Language Models (LLMs) can understand non-language network data and help us detect unknown malicious flows is an interesting question. This paper takes Carpet Bombing as a case study and shows how to exploit LLMs' powerful capability in the networking area. Carpet Bombing is a new DDoS attack that has dramatically increased in recent years, significantly threatening network infrastructures. It targets multiple victim IPs within subnets, causing congestion on access links and disrupting network services for a vast number of users. Characterized by low rates and multiple vectors, these attacks challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model that utilizes open-source LLMs as its backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into the LLMs' semantic space as token embeddings, DoLLM leverages LLMs' contextual understanding to extract flow representations in the overall network context. The representations are used to improve the DDoS detection performance. We evaluate DoLLM with the public dataset CIC-DDoS2019 and a real NetFlow trace from a Top-3 countrywide ISP. The tests show that DoLLM possesses strong detection capabilities. Its F1 score increased by up to 33.3% in zero-shot scenarios and by at least 20.6% in real ISP traces.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07072 [pdf, other]

Selecting focused digital cohorts from social media using the metric backbone of biomedical knowledge graphs

Authors: Ziqi Guo, Jack Felag, Jordan C. Rozum, Rion Brattig Correia, Luis M. Rocha

Abstract: The abundance of social media data allows researchers to construct large digital cohorts to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge in that social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically-relevant discourse. To home in on relevant users anywhere, we have developed a general framework and applied it to epilepsy discourse in social media as a test case. We analyzed the text from posts by users who mention epilepsy drugs in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We curated a medical terms dictionary and used it to generate a knowledge graph (KG) for each online community. For each KG, we computed the metric backbone--the smallest subgraph that preserves all shortest paths in the network. By comparing the subset of users who contribute to the backbone to the subset who do not, we found that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. Furthermore, using human annotation of Instagram posts, we demonstrated that users who do not contribute to the backbone are more than twice as likely to use dictionary terms in a manner inconsistent with their biomedical meaning. For biomedical research applications, our backbone-based approach thus has several benefits over simple engagement-based approaches: It can retain low-engagement users who nonetheless contribute meaningful biomedical insights. It can filter out very vocal users who contribute no relevant content.

Submitted 11 May, 2024; originally announced May 2024.
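The abstract above describes the metric backbone as the smallest subgraph that preserves all shortest paths. One common way to compute it, sketched below, is to keep an edge only if its own weight equals the shortest-path distance between its endpoints. The sketch assumes an undirected graph whose edge weights are distances (smaller means more closely related) and keeps edges that tie with an indirect path; the graph, node names, and function names are hypothetical and not taken from the paper.

```python
import heapq
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, distance) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest distance from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def metric_backbone(graph: Graph) -> Set[Tuple[str, str]]:
    """Edges whose direct distance is not beaten by any indirect path."""
    backbone = set()
    for u in graph:
        dist = shortest_distances(graph, u)
        for v, w in graph[u].items():
            if u < v and w <= dist[v]:  # report each undirected edge once
                backbone.add((u, v))
    return backbone


# Toy graph: the direct a-c edge (3) is longer than the detour a-b-c (1 + 1).
toy = build_graph([("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 3.0), ("c", "d", 2.0)])
backbone_edges = metric_backbone(toy)
print(sorted(backbone_edges))
```

In this toy graph the a-c edge is dropped because the detour through b is shorter, so removing it changes no shortest-path distance.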

arXiv:2405.06763 [pdf, other]

Post-selection inference for causal effects after causal discovery

Authors: Ting-Hsuan Chang, Zijian Guo, Daniel Malinsky

Abstract: Algorithms for constraint-based causal discovery select graphical causal models among a space of possible candidates (e.g., all directed acyclic graphs) by executing a sequence of conditional independence tests. These may be used to inform the estimation of causal effects (e.g., average treatment effects) when there is uncertainty about which covariates ought to be adjusted for, or which variables act as confounders versus mediators. However, naively using the data twice, for model selection and estimation, would lead to invalid confidence intervals. Moreover, if the selected graph is incorrect, the inferential claims may apply to a selected functional that is distinct from the actual causal effect. We propose an approach to post-selection inference that is based on a resampling and screening procedure, which essentially performs causal discovery multiple times with randomly varying intermediate test statistics. Then, an estimate of the target causal effect and corresponding confidence sets are constructed from a union of individual graph-based estimates and intervals. We show that this construction has asymptotically correct coverage for the true causal effect parameter. Importantly, the guarantee holds for a fixed population-level effect, not a data-dependent or selection-dependent quantity. Most of our exposition focuses on the PC-algorithm for learning directed acyclic graphs and the multivariate Gaussian case for simplicity, but the approach is general and modular, so it may be used with other conditional independence based discovery algorithms and distributional families.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06041 [pdf]

Gate Tunable Asymmetric Ozone Adsorption on Graphene

Authors: Zhen Qi, Wanlei Li, Jun Cheng, Zhongxin Guo, Chenglong Li, Shang Wang, Zuoquan Tan, Zhiting Gao, Yongchao Wang, Zichen Lian, Shanshan Chen, Yonglin He, Zhiyong Wang, Yapei Wang, Jinsong Zhang, Yayu Wang, Peng Cai

Abstract: Molecular adsorption is pivotal in device fabrication and material synthesis for quantum technology. However, elucidating the behavior of physisorption poses technical challenges. Here graphene with ultrahigh sensitivity was utilized to detect ozone adsorption at cryogenic temperatures. Significant hole doping observed in graphene indicates a strong interaction between ozone and graphene. Interestingly, the adsorption exhibits asymmetry with positive and negative gate voltages. The strong affinity of ozone provides a tool to modulate materials and devices, while the gate tunability of adsorption offers new insights into construction and manipulation of oxide quantum materials.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05885 [pdf, other]

Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes

Authors: Ziang Guo, Artem Lykov, Zakhar Yagudin, Mikhail Konenkov, Dzmitry Tsetserukou

Abstract: Recent research on Large Language Model based autonomous driving solutions shows a promising picture in planning and control fields. However, heavy computational resources and hallucinations of Large Language Models continue to hinder the tasks of predicting precise trajectories and instructing control signals. To address this problem, we propose Co-driver, a novel autonomous driving assistant system to empower autonomous vehicles with adjustable driving behaviors based on the understanding of road scenes. A pipeline involving the CARLA simulator and Robot Operating System 2 (ROS2) verifying the effectiveness of our system is presented, utilizing a single Nvidia 4090 24G GPU while exploiting the capacity of textual output of the Visual Language Model. In addition, we contribute a dataset containing an image set and a corresponding prompt set for fine-tuning the Visual Language Model module of our system. On the real-world driving dataset, our system achieved a 96.16% success rate in night scenes and 89.7% in gloomy scenes regarding reasonable predictions. Our Co-driver dataset will be released at https://github.com/ZionGo6/Co-driver.

Submitted 9 May, 2024; originally announced May 2024.

Comments: The paper is submitted to the IEEE conference

arXiv:2405.05317 [pdf, other]

First detection of CO isotopologues in a high-redshift main-sequence galaxy: evidence of a top-heavy stellar initial mass function

Authors: Ziyi Guo, Zhi-Yu Zhang, Zhiqiang Yan, Eda Gjergo, Allison Man, R. J. Ivison, Xiaoting Fu, Yong Shi

Abstract: Recent observations and theories have presented a strong challenge to the universality of the stellar initial mass function (IMF) in extreme environments. A notable example has been found for starburst conditions, where evidence favours a top-heavy IMF, i.e. there is a bias toward massive stars compared to the IMF that is responsible for the stellar mass function and elemental abundances observed in the Milky Way. Local starburst galaxies have star-formation rates similar to those in high-redshift main-sequence galaxies, which appear to dominate the stellar mass budget at early epochs. However, the IMF of high-redshift main-sequence galaxies is yet to be probed. Since $^{13}$CO and C$^{18}$O isotopologues are sensitive to the IMF, we have observed these lines towards four strongly-lensed high-redshift main-sequence galaxies using the Atacama Large Millimeter/sub-millimeter Array. Of our four targets, SDSS J0901+1814, at $z \approx 2.26$, is seen clearly in $^{13}$CO and C$^{18}$O, the first detection of CO isotopologues in the high-redshift main-sequence galaxy population. The observed $^{13}$C/$^{18}$O ratio, $2.4 \pm 0.8$, is significantly lower than that of local main-sequence galaxies. We estimate the isotope ratio, oxygen abundance and stellar mass using a series of chemical evolution models with varying star-formation histories and IMFs. All models favour an IMF that is more top-heavy than that of the Milky Way. Thus, as with starburst galaxies, main-sequence galaxies in the high-redshift Universe have a greater fraction of massive stars than a Milky-Way IMF would imply.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 15 pages, 8 figures, accepted by ApJ

arXiv:2405.05229 [pdf, other]

myAURA: Personalized health library for epilepsy management via knowledge graph sparsification and visualization

Authors: Rion Brattig Correia, Jordan C. Rozum, Leonard Cross, Jack Felag, Michael Gallant, Ziqi Guo, Bruce W. Herr II, Aehong Min, Deborah Stungis Rocha, Xuan Wang, Katy Börner, Wendy Miller, Luis M. Rocha

Abstract: Objective: We report the development of the patient-centered myAURA application and suite of methods designed to aid epilepsy patients, caregivers, and researchers in making decisions about care and self-management. Materials and Methods: myAURA rests on the federation of an unprecedented collection of heterogeneous data resources relevant to epilepsy, such as biomedical databases, social media, and electronic health records. A generalizable, open-source methodology was developed to compute a multi-layer knowledge graph linking all this heterogeneous data via the terms of a human-centered biomedical dictionary. Results: The power of the approach is first exemplified in the study of the drug-drug interaction phenomenon. Furthermore, we employ a novel network sparsification methodology using the metric backbone of weighted graphs, which reveals the most important edges for inference, recommendation, and visualization, such as pharmacology factors patients discuss on social media. The network sparsification approach also allows us to extract focused digital cohorts from social media whose discourse is more relevant to epilepsy or other biomedical problems. Finally, we present our patient-centered design and pilot-testing of myAURA, including its user interface, based on focus groups and other stakeholder input. Discussion: The ability to search and explore myAURA's heterogeneous data sources via a sparsified multi-layer knowledge graph, as well as the combination of those layers in a single map, are useful features for integrating relevant information for epilepsy. Conclusion: Our stakeholder-driven, scalable approach to integrate traditional and non-traditional data sources, enables biomedical discovery and data-powered patient self-management in epilepsy, and is generalizable to other chronic conditions.

Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.02322 [pdf]

Towards Causal Interpretation of Sexual Orientation in Regression Analysis: Applications and Challenges

Authors: Junjie Lu, Zhongyi Guo, David H. Rehkopf

Abstract: This study presents an approach to analyze health disparities in Sexual and Gender Minority (SGM) populations, with a focus on the role of social support levels as an example to allow causal interpretations of regression models. We advocate for precisely defining the exposure variable and incorporating mediators into analyses, to address the limitations of comparing counterfactual outcomes solely between SGM and heterosexual populations. We define sexual orientation into domains (attraction, behavior, and identity), and emphasize a consideration of these elements either separately or together, depending on the research question. We also introduce social support measured before and after the disclosure of sexual orientation to facilitate inference. We illustrate this approach by examining the association between SGM status and depression diagnosis with data from the 2020 and 2021 National Health Interview Survey. We find a direct effect of SGM status on depression (OR: 3.07, 95% CI: 2.64 - 3.58) and no indirect effect through social support (OR: 1.07, 95% CI: 0.87-1.31). Our research emphasizes the necessity of the comprehensive measurement of sexual orientation and a focus on intervenable variables like social support in order to empower SGM communities and address SGM related health inequalities.

Submitted 21 April, 2024; originally announced May 2024.

arXiv:2405.01943 [pdf, other]

Dependency-Aware Semi-Structured Sparsity: Declining Roles of Outliers in Pruning GLU-based LLMs

Authors: Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe

Abstract: The rapid growth in the scale of Large Language Models (LLMs) has led to significant computational and memory costs, making model compression techniques such as network pruning increasingly crucial for their efficient deployment. Recent LLMs such as LLaMA2 and Mistral have adopted GLU-based MLP architectures. However, current LLM pruning strategies are primarily based on insights from older LLM architectures, necessitating a reevaluation of these strategies to suit the new architectural characteristics. Contrary to traditional beliefs, we find that outliers play a diminished role in the input projections of GLU-based MLPs. Leveraging this new insight, we propose Dependency-aware Semi-structured Sparsity (DaSS), a novel pruning method for GLU-based LLMs. DaSS balances the flexibility of unstructured pruning and the structural consistency of dependency-based structured pruning by considering both weight magnitude and the corresponding intermediate activation norms in the weight pruning metric. Empirical evaluations on the Mistral, Gemma, and LLaMA2 model families demonstrate the consistent effectiveness of DaSS in the prevailing GLU variants.

Submitted 20 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.00630 [pdf, other]

Depth Priors in Removal Neural Radiance Fields

Authors: Zhihao Guo, Peng Wang

Abstract: Neural Radiance Fields (NeRF) have achieved impressive results in 3D reconstruction and novel view generation. A significant challenge within NeRF involves editing reconstructed 3D scenes, such as object removal, which demands consistency across multiple views and the synthesis of high-quality perspectives. Previous studies have integrated depth priors, typically sourced from LiDAR or sparse depth estimates from COLMAP, to enhance NeRF's performance in object removal. However, these methods are either expensive or time-consuming. This paper proposes a new pipeline that leverages SpinNeRF and monocular depth estimation models like ZoeDepth to enhance NeRF's performance in complex object removal with improved efficiency. A thorough evaluation of COLMAP's dense depth reconstruction on the KITTI dataset is conducted to demonstrate that COLMAP can be viewed as a cost-effective and scalable alternative for acquiring depth ground truth compared to traditional methods like LiDAR. This serves as the basis for evaluating the performance of monocular depth estimation models to determine the best one for generating depth priors for SpinNeRF. The new pipeline is tested in various scenarios involving 3D reconstruction and object removal, and the results indicate that our pipeline significantly reduces the time required for the acquisition of depth priors for object removal and enhances the fidelity of the synthesized views, suggesting substantial potential for building high-fidelity digital twin systems with increased efficiency in the future.

Submitted 3 July, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: 17 pages

MSC Class: 68T40; 68T07; 68T45 ACM Class: I.4.5

arXiv:2405.00236 [pdf, other]

STT: Stateful Tracking with Transformers for Autonomous Driving

Authors: Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, Congcong Li

Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.

Submitted 30 April, 2024; originally announced May 2024.

Comments: ICRA 2024

arXiv:2404.19484 [pdf, other]

More Compute Is What You Need

Authors: Zhen Guo

Abstract: Large language model pre-training has become increasingly expensive, with most practitioners relying on scaling laws to allocate compute budgets for model size and training tokens, commonly referred to as Compute-Optimal or Chinchilla Optimal. In this paper, we hypothesize a new scaling law that suggests model performance depends mostly on the amount of compute spent for transformer-based models, independent of the specific allocation to model size and dataset size. Using this unified scaling law, we predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.

Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19245 [pdf, other]

HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

Authors: Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, Chengzhong Xu

Abstract: Adapting Large Language Models (LLMs) to new tasks through fine-tuning has been made more efficient by the introduction of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. However, these methods often underperform compared to full fine-tuning, particularly in scenarios involving complex datasets. This issue becomes even more pronounced in complex domains, highlighting the need for improved PEFT approaches that can achieve better performance. Through a series of experiments, we have uncovered two critical insights that shed light on the training and parameter inefficiency of LoRA. Building on these insights, we have developed HydraLoRA, a LoRA framework with an asymmetric structure that eliminates the need for domain expertise. Our experiments demonstrate that HydraLoRA outperforms other PEFT approaches, even those that rely on domain knowledge during the training and inference phases.

Submitted 23 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: 19 pages, 7 figures

arXiv:2404.18146 [pdf, other]

doi 10.1088/2053-1583/ad3b12

Tailoring coercive fields and the Curie temperature via proximity coupling in WSe$_2$/Fe$_3$GeTe$_2$ van der Waals heterostructures

Authors: Guodong Ma, Renjun Du, Fuzhuo Lian, Song Bao, Zijing Guo, Xiaofan Cai, Jingkuan Xiao, Yaqing Han, Di Zhang, Siqi Jiang, Jiabei Huang, Xinglong Wu, Alexander S. Mayorov, Jinsheng Wen, Lei Wang, Geliang Yu

Abstract: Hybrid structures consisting of two-dimensional (2D) magnets and semiconductors have exhibited extensive functionalities in spintronics and opto-spintronics. In this work, we have fabricated WSe$_2$/Fe$_3$GeTe$_2$ van der Waals (vdW) heterostructures and investigated the proximity effects on 2D magnetism. Through reflective magnetic circular dichroism (RMCD), we have observed a temperature-dependent modulation of magnetic order in the heterostructure. For temperatures above $40$ K, WSe$_2$-covered Fe$_3$GeTe$_2$ exhibits a larger coercive field than that observed in bare Fe$_3$GeTe$_2$, accompanied by a noticeable enhancement of the Curie temperature by $21$ K. This strengthening suggests an increase in magnetic anisotropy in the interfacial Fe$_3$GeTe$_2$ layer, which can be attributed to the spin-orbit coupling (SOC) proximity effect induced by the adjacent WSe$_2$ layers. However, at much lower temperatures ($T<20$ K), a non-monotonic modification of the coercive field is observed, showing both reduction and enhancement, which depends on the thickness of the WSe$_2$ and Fe$_3$GeTe$_2$ layers. Moreover, an unconventional two-step magnetization process emerges in the heterostructure, indicating the short-range nature of SOC proximity effects. Our findings revealing proximity effects on 2D magnetism may shed light on the design of future spintronic and memory devices based on 2D magnetic heterostructures.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18045 [pdf, other]

doi 10.1021/acsanm.4c00914

Blood Works for Graphene Production

Authors: Xiaofan Cai, Ming Li, Chao Chen, Renjun Du, Zijing Guo, ** Wang, Guodong Ma, Xinglong Wu, Zhiyuan Wang, Yaqing Han, Fuzhuo Lian, Jingkuan Xiao, Siqi Jiang, Lei Wang, Alexander S. Mayorov, Libo Gao, Kostya S. Novoselov, Geliang Yu

Abstract: Blood, a ubiquitous and fundamental carbohydrate material composed of plasma, red blood cells, white blood cells, and platelets, has been playing an important role in biology, life science, history, and religious study, while graphene has garnered significant attention due to its exceptional properties and extensive range of potential applications. Achieving environmentally friendly, cost-effective growth using hybrid precursors and obtaining high-quality graphene through a straightforward CVD process has been traditionally considered mutually exclusive. This study demonstrates that we can produce high-quality graphene domains with controlled thickness through a one-step growth process at atmospheric pressure using blood as a precursor. Raman spectroscopy confirms the uniformity of the blood-grown graphene films, and observing the half-integer quantum Hall effect in the measured devices highlights its outstanding electronic properties. This unprecedented approach opens possibilities for blood application, facilitating an unconventional route in graphene growth applications.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17667 [pdf, other]

SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals

Authors: Cheng Ding, Zhicheng Guo, Zhaoliang Chen, Randall J Lee, Cynthia Rudin, Xiao Hu

Abstract: Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for developing foundation models for physiological data; such data are often noisy, incomplete, or inconsistent. The present work aims to provide a toolset for developing foundation models on physiological data. We leverage a large dataset of photoplethysmography (PPG) signals from hospitalized intensive care patients. For this data, we propose SimQuality, a novel self-supervised learning task based on convolutional neural networks (CNNs) as the backbone to enforce representations to be similar for good and poor quality signals that are from similar physiological states. We pre-trained the SimQuality on over 36 million 30-second PPG pairs and then fine-tuned and tested on six downstream tasks using external datasets. The results demonstrate the superiority of the proposed approach on all the downstream tasks, which are extremely important for heart monitoring on wearable devices. Our method indicates that CNNs can be an effective backbone for foundation models that are robust to training data quality.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16812 [pdf, other]

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Authors: Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

Abstract: Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost-effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving some large performance potential locked. This paper presents ESG, a new scheduling algorithm that directly addresses the difficulties. ESG treats sharable GPU as a first-order factor in scheduling. It employs an optimality-guided adaptive method by combining A*-search and a novel dual-blade pruning to dramatically prune the scheduling space without compromising the quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG can significantly improve the SLO hit rates by 61%-80% while saving 47%-187% costs over prior work.

Submitted 25 April, 2024; originally announced April 2024.

Comments: To appear in the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24)

arXiv:2404.16022 [pdf, other]

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Authors: Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He

Abstract: We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that the image elements (e.g., background, lighting, composition, and style) before and after the ID insertion are kept as consistent as possible. Codes and models will be available at https://github.com/ToTheBeginning/PuLID

Submitted 24 April, 2024; originally announced April 2024.

Comments: Tech Report. Codes and models will be available at https://github.com/ToTheBeginning/PuLID

arXiv:2404.14918 [pdf, ps, other]

Existence of weak solutions for a class of non-divergent parabolic equations with variable exponent

Authors: Jingfeng Shao, Zhichang Guo, Zhongxiang Zhou

Abstract: A doubly degenerate parabolic equation in non-divergent form with variable growth is investigated in this paper. In suitable spaces, we prove the existence of weak solutions of the equation for cases $1\leq m < 2$ and $m\geq 2$ in different ways. And we establish the non-expansion of support of the solution for the problem.

Submitted 23 April, 2024; originally announced April 2024.

MSC Class: 35D30; 35K59

arXiv:2404.14719 [pdf, other]

Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs

Authors: Ruitong Liu, Yanbin Wang, Haitao Xu, Bin Liu, Jianguo Sun, Zhenhao Guo, Wenrui Ma

Abstract: Currently, deep learning successfully applies to code vulnerability detection by learning from code sequences or property graphs. However, sequence-based methods often overlook essential code attributes such as syntax, control flow, and data dependencies, whereas graph-based approaches might underestimate the semantics of code and face challenges in capturing long-distance contextual information. To address this gap, we propose Vul-LMGNN, a unified model that combines pre-trained code language models with code property graphs for code vulnerability detection. Vul-LMGNN constructs a code property graph that integrates various code attributes (including syntax, flow control, and data dependencies) into a unified graph structure, thereafter leveraging pre-trained code model to extract local semantic features as node embeddings in the code property graph. Furthermore, to effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network (GNN). By jointly training the code language model and the gated code GNN modules in Vul-LMGNN, our proposed method efficiently leverages the strengths of both mechanisms. Finally, we utilize a pre-trained CodeBERT as an auxiliary classifier, with the final detection results derived by learning the linear interpolation of Vul-LMGNN and CodeBERT. The proposed method, evaluated across four real-world vulnerability datasets, demonstrated superior performance compared to six state-of-the-art approaches. Our source code could be accessed via the link: https://github.com/Vul-LMGNN/vul-LMGGNN.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures

arXiv:2404.13779 [pdf, other]

Automated Text Mining of Experimental Methodologies from Biomedical Literature

Authors: Ziqing Guo

Abstract: Biomedical literature is a rapidly expanding field of science and technology. Classification of biomedical texts is an essential part of biomedicine research, especially in the field of biology. This work proposes the fine-tuned DistilBERT, a methodology-specific, pre-trained generative classification language model for mining biomedicine texts. The model has proven its effectiveness in linguistic understanding capabilities and has reduced the size of BERT models by 40% while being 60% faster. The main objective of this project is to improve the model and assess the performance of the model compared to the non-fine-tuned model. We used DistilBert as a support model and pre-trained on a corpus of 32,000 abstracts and complete text articles; our results were impressive and surpassed those of traditional literature classification methods by using RNN or LSTM. Our aim is to integrate this highly specialised and specific model into different research industries.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13230 [pdf, other]

Random Gabidulin Codes Achieve List Decoding Capacity in the Rank Metric

Authors: Zeyu Guo, Chen Yuan, Zihan Zhang

Abstract: Gabidulin codes, serving as the rank-metric counterpart of Reed-Solomon codes, constitute an important class of maximum rank distance (MRD) codes. However, unlike the fruitful positive results about the list decoding of Reed-Solomon codes, results concerning the list decodability of Gabidulin codes in the rank metric are all negative so far. For example, in contrast to Reed-Solomon codes, which are always list decodable up to the Johnson bound in the Hamming metric, Raviv and Wachter-Zeh (IEEE TIT, 2016 and 2017) constructed a class of Gabidulin codes that are not even combinatorially list decodable beyond the unique decoding radius in the rank metric. Proving the existence of Gabidulin codes with good combinatorial list decodability in the rank metric has remained a long-standing open problem. In this paper, we resolve the aforementioned open problem by showing that, with high probability, random Gabidulin codes over sufficiently large alphabets attain the optimal generalized Singleton bound for list decoding in the rank metric. In particular, they achieve list decoding capacity in the rank metric. Our work is significantly influenced by the recent breakthroughs in the combinatorial list decodability of Reed-Solomon codes, especially the work by Brakensiek, Gopi, and Makam (STOC 2023). Our major technical contributions, which may hold independent interest, consist of the following: (1) We initiate the study of ``higher order MRD codes'' and provide a novel unified theory, which runs parallel to the theory of ``higher order MDS codes'' developed by BGM. (2) We prove a natural analog of the GM-MDS theorem, proven by Lovett (FOCS 2018) and Yildiz and Hassibi (IEEE TIT, 2019), which we call the GM-MRD theorem. In particular, our GM-MRD theorem for Gabidulin codes is strictly stronger than the GM-MDS theorem for Gabidulin codes, proven by Yildiz and Hassibi (IEEE TIT, 2019).

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12364 [pdf, ps, other]

On the well-posedness of the KP-I equation

Authors: Zihua Guo, Luc Molinet

Abstract: We revisit the local well-posedness for the KP-I equation. We obtain unconditional local well-posedness in $H^{s,0}({\mathbb R}^2)$ for $s>3/4$ and unconditional global well-posedness in the energy space. We also prove the global existence of perturbations with finite energy of non decaying smooth global solutions.

Submitted 18 April, 2024; originally announced April 2024.

MSC Class: Primary: 35A02; 35E15; 35Q53; Secondary: 35B45; 35D30

Showing 51–100 of 1,438 results for author: Guo, Z