-
ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a Single GPU
Authors:
Zhengmao Ye,
Dengchun Li,
Jingqi Tian,
Tingfeng Lan,
Jie Zuo,
Lei Duan,
Hui Lu,
Yexi Jiang,
Jian Sha,
Ke Zhang,
Mingjie Tang
Abstract:
Transformer-based large language models (LLMs) have demonstrated outstanding performance across diverse domains, particularly when fine-tuned for specific domains. Recent studies suggest that the resources required for fine-tuning LLMs can be economized through parameter-efficient methods such as Low-Rank Adaptation (LoRA). While LoRA effectively reduces computational burdens and resource demands, it currently supports only a single-job fine-tuning setup.
In this paper, we present ASPEN, a high-throughput framework for fine-tuning LLMs. ASPEN efficiently trains multiple jobs on a single GPU using the LoRA method, leveraging a shared pre-trained model and adaptive scheduling. ASPEN is compatible with transformer-based language models such as LLaMA and ChatGLM. Experiments show that ASPEN saves 53% of GPU memory when training multiple LLaMA-7B models on an NVIDIA A100 80GB GPU and boosts training throughput by about 17% compared to existing methods when training with various pre-trained models on different GPUs. The adaptive scheduling algorithm reduces turnaround time by 24% and end-to-end training latency by 12%, while prioritizing jobs and preventing out-of-memory issues.
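The idea of several LoRA jobs sharing one frozen pre-trained weight can be sketched as follows. This is a minimal illustration of the general LoRA update (output = W x + scale * B A x), not ASPEN's implementation; the names `matvec`, `lora_forward`, and the toy matrices are hypothetical and chosen only for the example.

```python
# Minimal sketch (assumption: plain Python lists stand in for tensors).
# One frozen base weight W is shared; each job owns only its small A and B.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_forward(base_weight, adapter, x):
    """Compute W x + scale * B (A x) for one adapter.

    adapter is a hypothetical (A, B, scale) tuple: A has shape r x d_in,
    B has shape d_out x r, and scale plays the role of alpha / r.
    """
    a, b, scale = adapter
    base_out = matvec(base_weight, x)      # shared, frozen path
    low_rank = matvec(b, matvec(a, x))     # job-specific low-rank path
    return [p + scale * q for p, q in zip(base_out, low_rank)]

# Shared frozen weight (2 x 2 identity for readability).
W = [[1.0, 0.0], [0.0, 1.0]]

# Two jobs with rank-1 adapters; B initialised to zero leaves W unchanged.
job_a = ([[1.0, 1.0]], [[0.0], [0.0]], 0.5)
job_b = ([[1.0, 1.0]], [[1.0], [2.0]], 0.5)

x = [2.0, 4.0]
out_a = lora_forward(W, job_a, x)
out_b = lora_forward(W, job_b, x)
```

Both jobs read the same `W`, so only the small `A` and `B` matrices are stored per job; that sharing is the source of the memory saving the abstract describes, under the assumptions of this sketch.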
Submitted 5 December, 2023;
originally announced December 2023.
-
Random Green's function method for large-scale electronic structure calculation
Authors:
Mingfa Tang,
Chang Liu,
Aixia Zhang,
Qingyun Zhang,
Shengjun Yuan,
Youqi Ke
Abstract:
We report a linear-scaling random Green's function (rGF) method for large-scale electronic structure calculation. In this method, the rGF is defined on a set of random states to stochastically express the density matrix, and the rGF is calculated at linear-scaling computational cost. We show that the rGF method is generally applicable to the nonorthogonal localized basis and circumvents the large Chebyshev expansion for the density matrix. As a demonstration, we implement rGF with the density-functional tight-binding method and apply it to self-consistently calculate water clusters of up to 9984 H2O molecules. We find that the rGF method combined with a simple fragment correction can reach an error of ~1 meV per H2O in total energy, compared to the deterministic calculations, owing to self-averaging. The development of the rGF method advances stochastic electronic structure theory to a new stage of efficiency and applicability.
Submitted 3 March, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
ALMA High-resolution Spectral Survey of Thioformaldehyde (H2CS) Towards Massive Protoclusters
Authors:
Li Chen,
Sheng-Li Qin,
Tie Liu,
Hong-Li Liu,
Sheng-Yuan Liu,
Meizhu Liu,
Hongqiong Shi,
Chuanshou Li,
Mengyao Tang,
Tianwei Zhang,
Ken'ichi Tatematsu,
Xiaohu Li,
Fengwei Xu,
Yuefang Wu,
Dongting Yang
Abstract:
Investigating the temperature and density structures of gas in massive protoclusters is crucial for understanding the chemical properties therein. In this study, we present observations of the continuum and thioformaldehyde (H2CS) lines at 345 GHz of 11 massive protoclusters using the Atacama Large Millimeter/submillimeter Array (ALMA) telescope. High spatial resolution and sensitivity observations have detected 145 continuum cores from the 11 sources. H2CS line transitions are observed in 72 out of 145 cores, including line-rich cores, warm cores and cold cores. The H2 column densities of the 72 cores, estimated from the continuum emission, are larger than the density threshold value for star formation, suggesting that H2CS can be widely distributed in star-forming cores with different physical environments. Rotation temperature and column density of H2CS are derived using the XCLASS software. The results show that the H2CS abundances increase as temperature rises and that higher gas temperatures are usually associated with higher H2CS column densities. The abundances of H2CS are positively correlated with its column density, suggesting that the H2CS abundances are enhanced from cold cores through warm cores to line-rich cores in star-forming regions.
Submitted 29 November, 2023;
originally announced November 2023.
-
ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation
Authors:
Jacob Schnell,
Jieke Wang,
Lu Qi,
Vincent Tao Hu,
Meng Tang
Abstract:
Recent advances in generative models, such as diffusion models, have made generating high-quality synthetic images widely accessible. Prior works have shown that training on synthetic images improves many perception tasks, such as image classification, object detection, and semantic segmentation. We are the first to explore generative data augmentations for scribble-supervised semantic segmentation. We propose ScribbleGen, a generative data augmentation method that leverages a ControlNet diffusion model conditioned on semantic scribbles to produce high-quality training data. However, naive implementations of generative data augmentations may inadvertently harm the performance of the downstream segmentor rather than improve it. We leverage classifier-free diffusion guidance to enforce class consistency and introduce encode ratios to trade off data diversity for data realism. Using the guidance scale and encode ratio, we can generate a spectrum of high-quality training images. We propose multiple augmentation schemes and find that these schemes significantly impact model performance, especially in the low-data regime. Our framework further reduces the gap between the performance of scribble-supervised segmentation and that of fully-supervised segmentation. We also show that our framework significantly improves segmentation performance on small datasets, even surpassing fully-supervised segmentation. The code is available at https://github.com/mengtang-lab/scribblegen.
Submitted 16 April, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Cluster trajectory of SOFA score in predicting mortality in sepsis
Authors:
Yuhe Ke,
Matilda Swee Sun Tang,
Celestine Jia Ling Loh,
Hairil Rizal Abdullah,
Nicholas Brian Shannon
Abstract:
Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes.
Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python.
Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken.
Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays and the highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward.
Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.
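Dynamic time warping, the distance measure named in the design section, can be sketched as below. This is a generic textbook dynamic-programming formulation, not the study's code; the function name `dtw_distance` and the example score sequences are hypothetical and are not patient data.

```python
# Minimal sketch of dynamic time warping (DTW) between two score sequences.
# Assumption: absolute difference is used as the pointwise cost.

def dtw_distance(s, t):
    """Return the DTW alignment cost between sequences s and t."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = best alignment cost of s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s[i - 1] - t[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat t[j-1] against the next s value
                cost[i][j - 1],      # repeat s[i-1] against the next t value
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]

# Illustrative (made-up) trajectories: same shape, one stretched in time.
steady = [1, 2, 3]
stretched = [1, 2, 2, 3]
shifted = [0, 0]
raised = [1, 1]

same_shape = dtw_distance(steady, stretched)
offset = dtw_distance(shifted, raised)
```

Because DTW tolerates stretching along the time axis, trajectories with the same shape but different timing get a small distance, which is why it is a natural choice as the dissimilarity fed to a clustering step such as k-means.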
Submitted 23 November, 2023;
originally announced November 2023.
-
Fourier Features for Identifying Differential Equations (FourierIdent)
Authors:
Mengyi Tang,
Hao Liu,
Wenjing Liao,
Sung Ha Kang
Abstract:
We investigate the benefits and challenges of utilizing frequency information in differential equation identification. Solving differential equations and Fourier analysis are closely related, yet there is limited work exploring this connection in the identification of differential equations. Given a single realization of the differential equation perturbed by noise, we aim to identify the underlying differential equation governed by a linear combination of linear and nonlinear differential and polynomial terms in the frequency domain. This is challenging due to large magnitudes and sensitivity to noise. We introduce Fourier feature denoising and define the meaningful data region and the core regions of features to reduce the effect of noise in the frequency domain. We use Subspace Pursuit on the core region of the time derivative feature, and introduce a group trimming step to refine the support. We further introduce a new energy based on the core regions of features for coefficient identification. Utilizing the core regions of features serves two critical purposes: eliminating the low-response regions dominated by noise, and enhancing the accuracy of coefficient identification. The proposed method is tested on various differential equations with linear, nonlinear, and high-order derivative feature terms. Our results demonstrate the advantages of the proposed method, particularly on complex and highly corrupted datasets.
Submitted 28 November, 2023;
originally announced November 2023.
-
Mitigating Hallucination in Visual Language Models with Visual Supervision
Authors:
Zhiyang Chen,
Yousong Zhu,
Yufei Zhan,
Zhaowen Li,
Chaoyang Zhao,
Jinqiao Wang,
Ming Tang
Abstract:
Large vision-language models (LVLMs) suffer considerably from hallucination, occasionally generating responses that clearly contradict the image content. The key problem lies in their weak ability to comprehend detailed content in a multi-modal context, which can be mainly attributed to two factors: the training data and the loss function. The vision instruction dataset primarily focuses on global description, and the auto-regressive loss function favors text modeling rather than image understanding. In this paper, we bring more detailed vision annotations and more discriminative vision models to facilitate the training of LVLMs, so that they can generate more precise responses without hallucinating. On one hand, we generate image-text pairs with detailed relationship annotations from the panoptic scene graph (PSG) dataset. These conversations pay more attention to detailed facts in the image, encouraging the model to answer questions based on multi-modal contexts. On the other hand, we integrate SAM and mask prediction loss as auxiliary supervision, forcing the LVLMs to identify context-related objects so that they can generate more accurate responses, mitigating hallucination. Moreover, to provide a deeper evaluation of hallucination in LVLMs, we propose a new benchmark, RAH-Bench. It divides vision hallucination into three types, which contradict the image through wrong categories, attributes, or relations, and introduces False Positive Rate as a detailed sub-metric for each type. On this benchmark, our approach demonstrates a +8.4% enhancement compared to the original LLaVA and achieves widespread performance improvements across other models.
Submitted 27 November, 2023;
originally announced November 2023.
-
Continual Instruction Tuning for Large Multimodal Models
Authors:
Jinghan He,
Haiyun Guo,
Ming Tang,
Jinqiao Wang
Abstract:
Instruction tuning is now a widely adopted approach to aligning large multimodal models (LMMs) to follow human intent. It unifies the data format of vision-language tasks, enabling multi-task joint training. However, vision-language tasks are constantly being created in practice. Instead of always re-training LMMs when new tasks arrive, continual learning offers flexibility for models to continually and efficiently exploit the evolving data. This work aims to explore the following two questions: 1) Do LMMs still suffer from catastrophic forgetting in continual instruction tuning? 2) Are the existing three classes of continual learning methods still applicable to the continual instruction tuning of LMMs? An extensive study is conducted to address the above questions. First, we establish the first benchmark in this setting and reveal that catastrophic forgetting is still observed when continually instruction-tuning LMMs. However, the multi-task joint instruction tuning can facilitate the model's continual learning ability and mitigate forgetting. Second, we integrate and adapt classic continual learning methods to our context, demonstrating the efficacy of data replay and model expansion strategies across diverse scenarios. In contrast, regularization-based methods only perform well on models that have been jointly instruction-tuned on multiple tasks. Third, we delve into the correlation and forgetting dynamics between vision-language task pairs and propose task-similarity-informed regularization and model expansion methods for continual instruction tuning of LMMs. Experimental results show that our approach consistently boosts the model's performance.
Submitted 27 November, 2023;
originally announced November 2023.
-
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Authors:
Yufei Zhan,
Yousong Zhu,
Zhiyang Chen,
Fan Yang,
Ming Tang,
Jinqiao Wang
Abstract:
Replicating the innate human ability to detect all objects based on free-form texts at any granularity remains a formidable challenge for Vision-Language models. Current Large Vision Language Models (LVLMs) are predominantly constrained to grounding a single, pre-existing object, relying solely on data from Referring Expression Comprehension tasks. This limitation leads to a compromise in model design, necessitating the introduction of visual expert models or the integration of customized head structures. Beyond these constraints, our research delves into the untapped potential of LVLMs and uncovers their inherent capability for basic object perception, allowing them to accurately identify and locate objects of interest. Building on this insight, we introduce a novel language-prompted localization dataset designed to fully unleash the capabilities of LVLMs in integrating fine-grained object perception with precise location awareness. More importantly, we present $\textbf{Griffon}$, a purely LVLM-based baseline, which does not require the introduction of any special tokens, expert models, or additional detection modules. It simply maintains a consistent structure with popular LVLMs by unifying data formats across various localization-related scenarios and is trained end-to-end through a well-designed pipeline. Comprehensive experiments demonstrate that $\textbf{Griffon}$ not only achieves state-of-the-art performance on the fine-grained RefCOCO series but also approaches the capabilities of the expert model Faster RCNN on the detection benchmark MSCOCO.
Submitted 27 November, 2023; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Contagion dynamics in time-varying metapopulation networks with node's activity and attractiveness
Authors:
Lang Zeng,
Ming Tang,
Ying Liu,
Seung Yeop Yang,
Younghae Do
Abstract:
The metapopulation network model is effectively used to study the spatial spread of epidemics with individual mobility. Considering the time-varying nature of individual activity and the preferences for attractive destinations in population mobility, this paper develops a time-varying network model in which the activity of a population is correlated with its attractiveness. Based on the model, the spreading processes of the SIR disease on different correlated networks are studied, and global migration thresholds are derived. It is observed that increasing the correlation between activity and attractiveness results in a reduced outbreak threshold but suppresses the disease outbreak size and introduces greater heterogeneity in the spatial distribution of infected individuals. We also investigate the impact of non-pharmacological interventions (self-isolation and self-protection) on the spread of epidemics in different correlation networks. The results show that the simultaneous implementation of these measures is more effective in negatively correlated networks than in positively correlated or non-correlated networks, and the prevalence is reduced significantly. In addition, both self-isolation and self-protection strategies increase the migration threshold of the spreading and thus slow the spread of the epidemic. However, the effectiveness of each strategy in reducing the density of infected populations varies across the different correlated networks. Self-protection is more effective in positively correlated networks, whereas self-isolation is more effective in negatively correlated networks. These findings contribute to a better understanding of epidemic spreading in large-scale time-varying metapopulation networks and provide insights for epidemic prevention and control.
Submitted 23 November, 2023;
originally announced November 2023.
-
JWST spectroscopy of $z\sim 5-8$ UV-selected galaxies: New constraints on the evolution of the Ly$α$ escape fraction in the reionization era
Authors:
Zuyi Chen,
Daniel P. Stark,
Charlotte Mason,
Michael W. Topping,
Lily Whitler,
Mengtao Tang,
Ryan Endsley,
Stéphane Charlot
Abstract:
We describe {\it JWST}/NIRSpec prism measurements of Ly$α$ emission in $z\gtrsim 5$ galaxies. We identify Ly$α$ detections in 10 out of 69 galaxies with robust rest-optical emission line redshift measurements at $5\leq z<7$ in the CEERS and DDT-2750 observations of the EGS field. Galaxies at $z\simeq 6$ with faint continuum (F150W $=$ 27--29 mag) are found with extremely large rest-frame Ly$α$ equivalent widths (ranging up to 286 A). Likely Ly$α$ detections are also seen in two new $z>7$ galaxies ($z=$ 7.49 and 7.17) from the second epoch of CEERS observations, both showing large Ly$α$ equivalent widths that likely indicate significant transmission through the IGM. We measure high Ly$α$ escape fractions in the 12 Ly$α$ emitters in our sample (median 0.28), two of which show $f_{\rm esc}^{ {\rm Ly}α}$ near unity ($>0.80$). We find that $50_{-11}^{+11}$% of $z\simeq 6$ galaxies with [OIII]+H$β$ EW $>$ 1000 A have $f_{\rm esc}^{ {\rm Ly}α}$ $>0.2$, consistent with the fractions found in lower-redshift samples with matched [OIII]+H$β$ EWs. While uncertainties are still significant, we find that only $10_{-5}^{+9}$% of $z>7$ galaxies with similarly strong rest optical emission lines show such large $f_{\rm esc}^{ {\rm Ly}α}$, as may be expected if IGM attenuation of Ly$α$ increases towards higher redshifts. We identify photometric galaxy overdensities near the $z\gtrsim 7$ Ly$α$ emitters, potentially providing the ionizing flux necessary to create large ionized sightlines that facilitate Ly$α$ transmission. Finally, we investigate the absence of Ly$α$ emission in a comparable (and spectroscopically confirmed) galaxy overdensity at $z=7.88$ in the Abell 2744 field, discussing new prism spectra of the field obtained with the UNCOVER program.
Submitted 20 February, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Ultrafast 3-D Super Resolution Ultrasound using Row-Column Array specific Coherence-based Beamforming and Rolling Acoustic Sub-aperture Processing: In Vitro, In Vivo and Clinical Study
Authors:
Joseph Hansen-Shearer,
Jipeng Yan,
Marcelo Lerendegui,
Biao Huang,
Matthieu Toulemonde,
Kai Riemer,
Qingyuan Tan,
Johanna Tonko,
Peter D. Weinberg,
Chris Dunsby,
Meng-Xing Tang
Abstract:
The row-column addressed array is an emerging probe for ultrafast 3-D ultrasound imaging. It achieves this with far fewer independent electronic channels and a wider field of view than traditional 2-D matrix arrays, of the same channel count, making it a good candidate for clinical translation. However, the image quality of row-column arrays is generally poor, particularly when investigating tissue. Ultrasound localisation microscopy allows for the production of super-resolution images even when the initial image resolution is not high. Unfortunately, the row-column probe can suffer from imaging artefacts that can degrade the quality of super-resolution images as `secondary' lobes from bright microbubbles can be mistaken as microbubble events, particularly when operated using plane wave imaging. These false events move through the image in a physiologically realistic way so can be challenging to remove via tracking, leading to the production of 'false vessels'. Here, a new type of rolling window image reconstruction procedure was developed, which integrated a row-column array-specific coherence-based beamforming technique with acoustic sub-aperture processing for the purposes of reducing `secondary' lobe artefacts, noise and increasing the effective frame rate. Using an {\it{in vitro}} cross tube, it was found that the procedure reduced the percentage of `false' locations from $\sim$26\% to $\sim$15\% compared to traditional orthogonal plane wave compounding. Additionally, it was found that the noise could be reduced by $\sim$7 dB and that the effective frame rate could be increased to over 4000 fps. Subsequently, {\it{in vivo}} ultrasound localisation microscopy was used to produce images non-invasively of a rabbit kidney and a human thyroid.
Submitted 15 November, 2023;
originally announced November 2023.
-
Test the weak cosmic censorship conjecture via cold dark matter-black hole and ultralight dark matter-black hole
Authors:
Meirong Tang,
Zhaoyi Xu
Abstract:
The weak cosmic censorship conjecture states that the black hole singularity is hidden inside the event horizon of the black hole, making it impossible for an external observer to measure. In this study, we investigate the weak cosmic censorship conjecture test of dark matter halo-black hole systems in both the cold dark matter model and ultralight dark matter model scenarios, with the aim of gaining insights into the influence of dark matter particles on the weak cosmic censorship conjecture. By examining particles incident on an extremal or nearly extremal dark matter-black hole, as well as the scattering of a scalar field by an extremal or near-extremal dark matter-black hole, we find that the weak cosmic censorship conjecture is not violated in extreme and near-extreme dark matter-black hole systems for incident particles. When a scalar field is incident on an extreme dark matter-black hole system, the weak cosmic censorship conjecture can be violated by a scalar field pattern that satisfies $\dfrac{1}{2Mω_{0}}<\dfrac{ω}{m}<\dfrac{Mω_{0}}{2M^{2}ω_{0}^{2}+k_{i}}$ ($k_{i}<0$; where $k_{1}$ corresponds to the cold dark matter model and $k_{2}$ corresponds to the ultralight dark matter model). The weak cosmic censorship conjecture remains unviolated in the presence of a scalar field incident upon a nearly extreme dark matter-black hole system. This research will contribute to furthering our comprehension of the intricate interplay between dark matter and black holes.
Submitted 7 November, 2023;
originally announced November 2023.
-
GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization
Authors:
Jiale Lao,
Yibo Wang,
Yufei Li,
Jianping Wang,
Yunjia Zhang,
Zhiyuan Cheng,
Wanghu Chen,
Mingjie Tang,
Jianguo Wang
Abstract:
Modern database management systems (DBMS) expose hundreds of configurable knobs to control system behaviours. Determining the appropriate values for these knobs to improve DBMS performance is a long-standing problem in the database community. As there is an increasing number of knobs to tune and each knob can take continuous or categorical values, manual tuning becomes impractical. Recently, automatic tuning systems using machine learning methods have shown great potential. However, existing approaches still incur significant tuning costs or only yield sub-optimal performance. This is because they either ignore the extensive domain knowledge available (e.g., DBMS manuals and forum discussions) and only rely on the runtime feedback of benchmark evaluations to guide the optimization, or they utilize the domain knowledge in a limited way. Hence, we propose GPTuner, a manual-reading database tuning system. Firstly, we develop a Large Language Model (LLM)-based pipeline to collect and refine heterogeneous knowledge, and propose a prompt ensemble algorithm to unify a structured view of the refined knowledge. Secondly, using the structured knowledge, we (1) design a workload-aware and training-free knob selection strategy, (2) develop a search space optimization technique considering the value range of each knob, and (3) propose a Coarse-to-Fine Bayesian Optimization Framework to explore the optimized space. Finally, we evaluate GPTuner under different benchmarks (TPC-C and TPC-H), metrics (throughput and latency) as well as DBMS (PostgreSQL and MySQL). Compared to the state-of-the-art approaches, GPTuner identifies better configurations in 16x less time on average. Moreover, GPTuner achieves up to 30% performance improvement (higher throughput or lower latency) over the best-performing alternative.
Submitted 6 November, 2023;
originally announced November 2023.
-
A Boundary Offset Prediction Network for Named Entity Recognition
Authors:
Minghao Tang,
Yongquan He,
Yongxiu Xu,
Hongbo Xu,
Wenyuan Zhang,
Yang Lin
Abstract:
Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and classify named entities in text. However, span-based methods for NER typically assign entity types to text spans, resulting in an imbalanced sample space and neglecting the connections between non-entity and entity spans. To address these issues, we propose a novel approach for NER, named the Boundary Offset Prediction Network (BOPN), which predicts the boundary offsets between candidate spans and their nearest entity spans. By leveraging the guiding semantics of boundary offsets, BOPN establishes connections between non-entity and entity spans, enabling non-entity spans to function as additional positive samples for entity detection. Furthermore, our method integrates entity type and span representations to generate type-aware boundary offsets instead of using entity types as detection targets. We conduct experiments on eight widely-used NER datasets, and the results demonstrate that our proposed BOPN outperforms previous state-of-the-art methods.
Submitted 23 October, 2023;
originally announced October 2023.
-
Accelerate Microstructure Evolution Simulation Using Graph Neural Networks with Adaptive Spatiotemporal Resolution
Authors:
Shaoxun Fan,
Andrew L. Hitt,
Ming Tang,
Babak Sadigh,
Fei Zhou
Abstract:
Surrogate models driven by sizeable datasets and scientific machine-learning methods have emerged as an attractive microstructure simulation tool with the potential to deliver predictive microstructure evolution dynamics with huge savings in computational costs. Taking 2D and 3D grain growth simulations as an example, we present a completely overhauled computational framework based on graph neural networks with not only excellent agreement to both the ground truth phase-field methods and theoretical predictions, but enhanced accuracy and efficiency compared to previous works based on convolutional neural networks. These improvements can be attributed to the graph representation, both improved predictive power and a more flexible data structure amenable to adaptive mesh refinement. As the simulated microstructures coarsen, our method can adaptively adopt remeshed grids and larger timesteps to achieve further speedup. The data-to-model pipeline with training procedures together with the source codes are provided.
Submitted 19 January, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning
Authors:
Minghao Tang,
Yongquan He,
Yongxiu Xu,
Hongbo Xu,
Wenyuan Zhang,
Yang Lin
Abstract:
Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET poses a major challenge known as the noise labeling problem, whereby current methods rely on estimating noise distribution to identify noisy labels but are confused by diverse noise distribution deviation. To address this limitation, we introduce Co-Prediction Prompt Tuning for noise correction in FET, which leverages multiple prediction results to identify and correct noisy labels. Specifically, we integrate prediction results to recall labeled labels and utilize a differentiated margin to identify inaccurate labels. Moreover, we design an optimization objective concerning divergent co-predictions during fine-tuning, ensuring that the model captures sufficient information and maintains robustness in noise identification. Experimental results on three widely-used FET datasets demonstrate that our noise correction approach significantly enhances the quality of various types of training samples, including those annotated using distant supervision, ChatGPT, and crowdsourcing.
Submitted 23 October, 2023;
originally announced October 2023.
-
A Class of Forward-Backward Stochastic Differential Equations Driven by Lévy Processes and Application to LQ Problems
Authors:
Maozhong Xu,
Maoning Tang,
Qingxin Meng
Abstract:
In this paper, our primary focus lies in the thorough investigation of a specific category of nonlinear fully coupled forward-backward stochastic differential equations involving time delays and advancements with the incorporation of Lévy processes, which we shall abbreviate as FBSDELDAs. Drawing inspiration from diverse examples of linear-quadratic (LQ) optimal control problems featuring delays and Lévy processes, we proceed to employ a set of domination-monotonicity conditions tailored to this class of FBSDELDAs. Through the application of the continuation method, we achieve the pivotal results of unique solvability and the derivation of a pair of estimates for the solutions of these FBSDELDAs. These findings, in turn, carry significant implications for a range of LQ problems. Specifically, they are relevant when stochastic Hamiltonian systems perfectly align with the FBSDELDAs that fulfill the domination-monotonicity conditions. Consequently, we are able to establish explicit expressions for the unique optimal controls by utilizing the solutions of the corresponding stochastic Hamiltonian systems.
Submitted 19 October, 2023;
originally announced October 2023.
-
Shuffle Bases and Quasisymmetric Power Sums
Authors:
Ricky Ini Liu,
Michael Tang
Abstract:
The algebra of quasisymmetric functions QSym and the shuffle algebra of compositions Sh are isomorphic as graded Hopf algebras (in characteristic zero), and isomorphisms between them can be specified via shuffle bases of QSym. We use the notion of infinitesimal characters to characterize shuffle bases, and we establish a universal property for Sh in the category of connected graded Hopf algebras equipped with an infinitesimal character, analogous to the universal property of QSym as a combinatorial Hopf algebra described by Aguiar, Bergeron, and Sottile. We then use these results to give general constructions for quasisymmetric power sums, recovering four previous constructions from the literature, and study their properties.
Submitted 13 October, 2023;
originally announced October 2023.
-
Price of Stability in Quality-Aware Federated Learning
Authors:
Yizhou Yan,
Xinyu Tang,
Chao Huang,
Ming Tang
Abstract:
Federated Learning (FL) is a distributed machine learning scheme that enables clients to train a shared global model without exchanging local data. The presence of label noise can severely degrade the FL performance, and some existing studies have focused on algorithm design for label denoising. However, they ignored the important issue that clients may not apply costly label denoising strategies due to them being self-interested and having heterogeneous valuations on the FL performance. To fill this gap, we model the clients' interactions as a novel label denoising game and characterize its equilibrium. We also analyze the price of stability, which quantifies the difference in the system performance (e.g., global model accuracy, social welfare) between the equilibrium outcome and the socially optimal solution. We prove that the equilibrium outcome always leads to a lower global model accuracy than the socially optimal solution does. We further design an efficient algorithm to compute the socially optimal solution. Numerical experiments on MNIST dataset show that the price of stability increases as the clients' data become noisier, calling for an effective incentive mechanism.
Submitted 12 October, 2023;
originally announced October 2023.
-
A low-mass line-rich core found in Massive Star-forming Region IRAS 16351-4722
Authors:
Meizhu Liu,
Sheng-Li Qin,
Tie Liu,
Mengyao Tang,
Sheng-Yuan Liu,
Li Chen,
ChuanShou Li,
HongQiong Shi,
Xiaohu Li,
Tianwei Zhang,
Ken'ichi Tatematsu,
Fengwei Xu,
Yuefang Wu
Abstract:
We present ALMA sub-arcsecond-resolution observations of both continuum and molecular lines at 345 GHz towards the massive star-forming region IRAS 16351-4722 (hereafter I16351). A total of 12 dust cores were detected based on high spatial resolution observations of the continuum. Among them, a high-mass core (11.6 Msun) and a low-mass core (1.7 Msun) show abundant molecular line emissions. 164 molecular transitions from 29 species and 104 molecular transitions from 25 species are identified in the high-mass and low-mass cores, respectively. Complex organic molecules (COMs) such as CH3OH, CH3OCHO, CH3OCH3, C2H5OH, and C2H5CN are detected in the two cores. Under the assumption of local thermodynamic equilibrium (LTE), rotational temperatures and column densities of the COMs are derived with the XCLASS software. The maximum rotation temperature values in the low-mass core and the high-mass core were found to be approximately 130 K and 198 K, respectively. Additionally, the line widths in the high-mass core are larger than those in the low-mass one. Abundant complex organic molecular line transitions, high gas temperatures, and smaller line widths indicate the presence of a low-mass line-rich core in the massive star formation region for the first time, while the high-mass line-rich core shows hot core property. When comparing the molecular abundances of CH3OH, CH3OCHO, CH3OCH3 and C2H5OH of the two cores with other hot cores and hot corinos reported in the literature, we further confirm that both a hot core and a low-mass line-rich core are simultaneously detected in I16351.
Submitted 6 October, 2023;
originally announced October 2023.
-
EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect
Authors:
Jiashi Gao,
Changwu Huang,
Ming Tang,
Shin Hwei Tan,
Xin Yao,
Xuetao Wei
Abstract:
Recent advances in federated learning (FL) enable collaborative training of machine learning (ML) models from large-scale and widely dispersed clients while protecting their privacy. However, when different clients' datasets are heterogeneous, traditional FL mechanisms produce a global model that does not adequately represent the poorer clients with limited data resources, resulting in lower accuracy and higher bias on their local data. According to the Matthew effect, which describes how the advantaged gain more advantage and the disadvantaged lose more over time, deploying such a global model in client applications may worsen the resource disparity among the clients and harm the principles of social welfare and fairness. To mitigate the Matthew effect, we propose Egalitarian Fairness Federated Learning (EFFL), where egalitarian fairness refers to the global model learned from FL has: (1) equal accuracy among clients; (2) equal decision bias among clients. Besides achieving egalitarian fairness among the clients, EFFL also aims for performance optimality, minimizing the empirical risk loss and the bias for each client; both are essential for any ML model training, whether centralized or decentralized. We formulate EFFL as a constrained multi-constrained multi-objectives optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. We propose a gradient-based three-stage algorithm to obtain the Pareto optimal solutions within the constraint space. Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
Submitted 28 September, 2023;
originally announced September 2023.
-
The ALMA Survey of Star Formation and Evolution in Massive Protoclusters with Blue Profiles (ASSEMBLE): Core Growth, Cluster Contraction, and Primordial Mass Segregation
Authors:
Fengwei Xu,
Ke Wang,
Tie Liu,
Mengyao Tang,
Neal J. Evans II,
Aina Palau,
Kaho Morii,
Jinhua He,
Patricio Sanhueza,
Hong-Li Liu,
Amelia Stutz,
Qizhou Zhang,
Xi Chen,
Pak Shing Li,
Gilberto C. Gómez,
Enrique Vázquez-Semadeni,
Shanghuo Li,
Xiaofeng Mai,
Xing Lu,
Meizhu Liu,
Li Chen,
Chuanshou Li,
Hongqiong Shi,
Zhiyuan Ren,
Di Li
, et al. (18 additional authors not shown)
Abstract:
The ALMA Survey of Star Formation and Evolution in Massive Protoclusters with Blue Profiles (ASSEMBLE) aims to investigate the process of mass assembly and its connection to high-mass star formation theories in protoclusters in a dynamic view. We observed 11 massive (Mclump>1000 Msun), luminous (Lbol>10,000 Lsun), and blue-profile (infall signature) clumps by ALMA with resolution of 2200-5500 au at 350 GHz (870 um) in continuum and line emission. 248 dense cores were identified, including 106 cores showing protostellar signatures and 142 prestellar core candidates. Compared to early-stage infrared dark clouds (IRDCs) by ASHES, the core mass and surface density within the ASSEMBLE clumps exhibited significant increment, suggesting concurrent core accretion during the evolution of the clumps. The maximum mass of prestellar cores was found to be 2 times larger than that in IRDCs, indicating evolved protoclusters have the potential to harbor massive prestellar cores. The mass relation between clumps and their most massive core (MMCs) is observed in ASSEMBLE but not in IRDCs, which is suggested to be regulated by multiscale mass accretion. The mass correlation between the core clusters and their MMCs has a steeper slope compared to that observed in stellar clusters, which can be due to fragmentation of the MMC and stellar multiplicity. We observe a decrease in core separation and an increase in central concentration as protoclusters evolve. We confirm primordial mass segregation in the ASSEMBLE protoclusters, possibly resulting from gravitational concentration and/or gas accretion.
Submitted 26 September, 2023;
originally announced September 2023.
-
Reconstructing the kinetic chemotaxis kernel using macroscopic data: well-posedness and ill-posedness
Authors:
Kathrin Hellmuth,
Christian Klingenberg,
Qin Li,
Min Tang
Abstract:
Bacterial motion is guided by external stimuli (chemotaxis), and the motion described on the mesoscopic scale is uniquely determined by a parameter $K$ that models velocity change response from the bacteria. This parameter is termed chemotaxis kernel. In a practical setting, experimental data was collected to infer this kernel. In this article, a PDE-constrained optimization framework is deployed to perform this reconstruction using velocity-averaged, localized data taken in the interior of the domain. The problem can be well-posed or ill-posed depending on the data preparation and the experimental setup. In particular, we propose one specific design that guarantees numerical reconstructability and local convergence. This design is adapted to the discretization of $K$ in space and decouples the reconstruction of local values of $K$ into smaller cell problems, opening up parallelization opportunities. Numerical evidences support the theoretical findings.
Submitted 16 April, 2024; v1 submitted 10 September, 2023;
originally announced September 2023.
-
Individually Rational Collaborative Vehicle Routing through Give-And-Take Exchanges
Authors:
Paul Mingzheng Tang,
Ba Phong Tran,
Hoong Chuin Lau
Abstract:
In this paper, we are concerned with the automated exchange of orders between logistics companies in a marketplace platform to optimize total revenues. We introduce a novel multi-agent approach to this problem, focusing on the Collaborative Vehicle Routing Problem (CVRP) through the lens of individual rationality. Our proposed algorithm applies the principles of Vehicle Routing Problem (VRP) to pairs of vehicles from different logistics companies, optimizing the overall routes while considering standard VRP constraints plus individual rationality constraints. By facilitating cooperation among competing logistics agents through a Give-and-Take approach, we show that it is possible to reduce travel distance and increase operational efficiency system-wide. More importantly, our approach ensures individual rationality and faster convergence, which are important properties of ensuring the long-term sustainability of the marketplace platform. We demonstrate the efficacy of our approach through extensive experiments using real-world test data from major logistics companies. The results reveal our algorithm's ability to rapidly identify numerous optimal solutions, underscoring its practical applicability and potential to transform the logistics industry.
Submitted 31 August, 2023;
originally announced August 2023.
-
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Authors:
Zhaopeng Gu,
Bingke Zhu,
Guibo Zhu,
Yingying Chen,
Ming Tang,
Jinqiao Wang
Abstract:
Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantic and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments, thus directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves the state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.
Submitted 28 December, 2023; v1 submitted 29 August, 2023;
originally announced August 2023.
-
An initial analysis of a strongly-lensed QSOs candidate identified by LAMOST
Authors:
Y. H. Chen,
M. Y. Tang,
H. Shu,
H. Tu
Abstract:
From 2011 to 2021, LAMOST has released a total of 76,167 quasar data. We try to search for gravitationally lensed QSOs by limiting coordinate differences and redshift differences of these QSOs. The name, brightness, spectrum, photometry and other information of each QSO will be visually checked carefully. Special attention should be paid to check whether there are groups of galaxies, gravitationally lensed arcs, Einstein crosses, or Einstein rings near the QSOs. Through careful selection, we select LAMOST J160603.01+290050.8 (A) and LAMOST J160602.81+290048.7 (B) as a candidate and perform an initial analysis. The component A and B are 3.36 arc seconds apart and they display blue during photometric observations. The redshift values of component A and B are 0.2\% different, their Gaia$\_$g values are 1.3\% different, and their ugriz values are 1.0\% or less different. For the spectra covering from 3,690 Å to 9,100 Å, the emission lines of C\,II, Mg, H\,$γ$, O\,III, and H\,$β$ are present for both component A and B and the ratio of flux(B) to flux(A) from LAMOST is basically a constant, around 2.2. We accidentally find a galaxy group near the component A and B. If the center of dark matter in the galaxy group is at the center between component A and B, the component A and B are probably gravitationally lensed QSOs. We estimate that the Einstein mass is 1.46 $\times$ $10^{11}$ $M_{\odot}$ and the total mass of the lens is 1.34 $\times$ $10^{13}$ $M_{\odot}$. The deflection angle is 1.97 arc seconds at positions A and B and the velocity dispersion is 261\,$km\,s^{-1}$. Theoretically, this candidate could be a pair of fold images of a strong lensing system by a galaxy group, and we will investigate the possibility when the redshifts of nearby galaxies are available.
Submitted 27 August, 2023;
originally announced August 2023.
-
FrFT based estimation of linear and nonlinear impairments using Vision Transformer
Authors:
Ting Jiang,
Zheng Gao,
Yizhao Chen,
Zihe Hu,
Ming Tang
Abstract:
To comprehensively assess optical fiber communication system conditions, it is essential to implement joint estimation of the following four critical impairments: nonlinear signal-to-noise ratio (SNRNL), optical signal-to-noise ratio (OSNR), chromatic dispersion (CD) and differential group delay (DGD). However, current studies only achieve identifying a limited number of impairments within a narrow range, due to limitations in network capabilities and lack of unified representation of impairments. To address these challenges, we adopt time-frequency signal processing based on fractional Fourier transform (FrFT) to achieve the unified representation of impairments, while employing a Transformer based neural networks (NN) to break through network performance limitations. To verify the effectiveness of the proposed estimation method, the numerical simulation is carried on a 5-channel polarization-division-multiplexed quadrature phase shift keying (PDM-QPSK) long haul optical transmission system with the symbol rate of 50 GBaud per channel, the mean absolute error (MAE) for SNRNL, OSNR, CD, and DGD estimation is 0.091 dB, 0.058 dB, 117 ps/nm, and 0.38 ps, and the monitoring window ranges from 0~20 dB, 10~30 dB, 0~51000 ps/nm, and 0~100 ps, respectively. Our proposed method achieves accurate estimation of linear and nonlinear impairments over a broad range, representing a significant advancement in the field of optical performance monitoring (OPM).
Submitted 25 August, 2023;
originally announced August 2023.
-
Test the Weak Cosmic Supervision Conjecture in Dark Matter-Black Hole System
Authors:
Liping Meng,
Zhaoyi Xu,
Meirong Tang
Abstract:
There is a possibility that the event horizon of a Kerr-like black hole with perfect fluid dark matter (DM) can be destroyed, providing a potential opportunity for understanding the weak cosmic censorship conjecture of black holes. In this study, we analyze the influence of the strength parameter of perfect fluid DM on the destruction of the event horizon of a Kerr-like black hole with spinning after injecting a test particle and a scalar field. We find that, when a test particle is incident on the black hole, the event horizon is destroyed by perfect fluid dark matter for extremal black holes. For nearly extremal black holes, when the dark matter parameter satisfies $α\in \left (-r_{h} , 0\right ) \cup \left ( r_{h} ,k_2\right )$ i.e.$(A<0)$, the event horizon of the black hole will not be destroyed; when the dark matter parameter satisfies $α\in\left ( k_1 ,-r_{h} \right ]\cup \left[0,r_{h}\right ]$ i.e.$(A\ge 0)$, the event horizon of the black hole will be destroyed. When a classical scalar field is incident into the black hole in the extremal black hole case, we find that the range of mode patterns of the scalar field that can disrupt the black hole event horizon is different for different values of the perfect fluid dark matter strength parameter. In the nearly extremal black hole case, through our analysis, we have found when $α\neq0 $ and $α\neq\pm\ r_h$ i.e.$A\neq0$, the event horizon of the black hole can be disrupted. Our research results indicate that dark matter might be capable of breaking the black hole horizon, thus potentially violating the weak cosmic censorship conjecture.
Submitted 1 November, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
When MiniBatch SGD Meets SplitFed Learning: Convergence Analysis and Performance Evaluation
Authors:
Chao Huang,
Geng Tian,
Ming Tang
Abstract:
Federated learning (FL) enables collaborative model training across distributed clients (e.g., edge devices) without sharing raw data. Yet, FL can be computationally expensive as the clients need to train the entire model multiple times. SplitFed learning (SFL) is a recent distributed approach that alleviates computation workload at the client device by splitting the model at a cut layer into two parts, where clients only need to train part of the model. However, SFL still suffers from the \textit{client drift} problem when clients' data are highly non-IID. To address this issue, we propose MiniBatch-SFL. This algorithm incorporates MiniBatch SGD into SFL, where the clients train the client-side model in an FL fashion while the server trains the server-side model similar to MiniBatch SGD. We analyze the convergence of MiniBatch-SFL and show that the bound of the expected loss can be obtained by analyzing the expected server-side and client-side model updates, respectively. The server-side updates do not depend on the non-IID degree of the clients' datasets and can potentially mitigate client drift. However, the client-side model relies on the non-IID degree and can be optimized by properly choosing the cut layer. Perhaps counter-intuitive, our empirical result shows that a latter position of the cut layer leads to a smaller average gradient divergence and a better algorithm performance. Moreover, numerical results show that MiniBatch-SFL achieves higher accuracy than conventional SFL and FL. The accuracy improvement can be up to 24.1\% and 17.1\% with highly non-IID data, respectively.
Submitted 23 August, 2023;
originally announced August 2023.
-
Transmission of optical communication signals through ring core fiber using perfect vortex beams
Authors:
Nelson Villalba,
Cristóbal Melo,
Sebastián Ayala,
Christopher Mancilla,
Wladimir Valenzuela,
Miguel Figueroa,
Erik Baradit,
Riu Lin,
Ming Tang,
Stephen P. Walborn,
Gustavo Lima,
Gabriel Saavedra,
Gustavo Cañas
Abstract:
Orbital angular momentum can be used to implement high capacity data transmission systems that can be applied for classical and quantum communications. Here we experimentally study the generation and transmission properties of the so-called perfect vortex beams and the Laguerre-Gaussian beams in ring-core optical fibers. Our results show that when using a single preparation stage, the perfect vortex beams present less ring-radius variation that allows coupling of higher optical power into a ring core fiber. These results lead to lower power requirements to establish fiber-based communications links using orbital angular momentum and set the stage for future implementations of high-dimensional quantum communication over space division multiplexing fibers.
Submitted 13 September, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Authors:
Xiaofei Wang,
Manthan Thakker,
Zhuo Chen,
Naoyuki Kanda,
Sefik Emre Eskimez,
Sanyuan Chen,
Min Tang,
Shujie Liu,
**yu Li,
Takuya Yoshioka
Abstract:
Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks. See https://aka.ms/speechx for demo samples.
Submitted 25 June, 2024; v1 submitted 13 August, 2023;
originally announced August 2023.
-
Quasinormal modes of dark matter core-black hole spacetime
Authors:
Min Zhao,
Meirong Tang,
Zhaoyi Xu
Abstract:
In the galactic core, when the scale of dark matter is small, the distribution of dark matter is that of a constant density dark matter core. Considering the case of a supermassive black hole coupled to a constant density dark matter core, we study the quasinormal modes of the black hole in the constant density dark matter core black hole system and calculate the quasinormal mode frequencies of the black hole using the third order WKB approximation and the Prony method. In addition, we study the effect of the constant density dark matter core parameter $r_0$ on the quasinormal modes of black holes in the vicinity of black holes. As the angular quantum number increases, the ringdown process becomes closer and closer to the ringdown process of a Schwarzschild black hole. The presence of a constant density dark matter core affects the quasinormal modes of the black hole, with relative deviations on the order of $10^{-15}-10^{-13}$ with respect to the detector. These features suggest that with future improvements in detector accuracy, we can use them for the detection of gravitational waves in the spacetime of constant density dark matter core-black hole systems, which in turn opens up the possibility of understanding the behavior of dark matter in the vicinity of black holes.
Submitted 20 February, 2024; v1 submitted 12 August, 2023;
originally announced August 2023.
-
Real-time FPGA Implementation of CNN-based Distributed Fiber Optic Vibration Event Recognition Method
Authors:
Zhongyao Luo,
Zhao Ge,
Hao Wu,
Ming Tang
Abstract:
Utilizing optical fibers to detect and pinpoint vibrations, Distributed Optical Fiber Vibration Sensing (DVS) technology provides real-time monitoring and surveillance of wide-reaching areas. This field has been leveraging Convolutional Neural Networks (CNN). Recently, a study has accomplished end-to-end vibration event recognition, enabling utilization of CNN-based DVS algorithms as real-time embedded systems for edge computing in practical application situations. Considering the power consumption of the central processing unit (CPU) and graphics processing unit (GPU), and the inflexibility of the application-specific integrated circuit (ASIC), the field-programmable gate array (FPGA) is the optimal computing platform for the system. This paper proposes to compress the pre-trained network and adopt a novel hardware structure to design a fully on-chip, pipelined inference accelerator for CNN-based DVS algorithms, without fine-tuning or re-training. This design allows for real-time processing with low power consumption and system requirements. An evaluation has been carried out on an existing DVS algorithm based on a 40-layer CNN model comprising 2.7 million parameters. It is completely implemented on-chip, pipelined, with no reduction in accuracy.
Submitted 8 August, 2023;
originally announced August 2023.
-
Distributed Target Tracking with Fading Channels over Underwater Wireless Sensor Networks
Authors:
Miaoyi Tang,
Meiqin Liu,
Senlin Zhang,
Ronghao Zheng,
Shanling Dong
Abstract:
This paper investigates the problem of distributed target tracking via underwater wireless sensor networks (UWSNs) with fading channels. The degradation of signal quality due to wireless channel fading can significantly impact network reliability and subsequently reduce the tracking accuracy. To address this issue, we propose a modified distributed unscented Kalman filter (DUKF) named DUKF-Fc, which takes into account the effects of measurement fluctuation and transmission failure induced by channel fading. The channel estimation error is also considered when designing the estimator and a sufficient condition is established to ensure the stochastic boundedness of the estimation error. The proposed filtering scheme is versatile and possesses wide applicability to numerous real-world scenarios, e.g., tracking a maneuvering underwater target with acoustic sensors. Simulation results demonstrate the effectiveness of the proposed filtering algorithm. In addition, considering the constraints of network energy resources, the issue of investigating a trade-off between tracking performance and energy consumption is discussed accordingly.
Submitted 7 August, 2023;
originally announced August 2023.
-
Uncertainty-aware Gaussian Mixture Model for UWB Time Difference of Arrival Localization in Cluttered Environments
Authors:
Wenda Zhao,
Abhishek Goudar,
Mingliang Tang,
Xinyuan Qiao,
Angela P. Schoellig
Abstract:
Ultra-wideband (UWB) time difference of arrival (TDOA)-based localization has emerged as a low-cost and scalable indoor positioning solution. However, in cluttered environments, the performance of UWB TDOA-based localization deteriorates due to the biased and non-Gaussian noise distributions induced by obstacles. In this work, we present a bi-level optimization-based joint localization and noise model learning algorithm to address this problem. In particular, we use a Gaussian mixture model (GMM) to approximate the measurement noise distribution. We explicitly incorporate the estimated state's uncertainty into the GMM noise model learning, referred to as uncertainty-aware GMM, to improve both noise modeling and localization performance. We first evaluate the GMM noise model learning and localization performance in numerous simulation scenarios. We then demonstrate the effectiveness of our algorithm in extensive real-world experiments using two different cluttered environments. We show that our algorithm provides accurate position estimates with low-cost UWB sensors, no prior knowledge about the obstacles in the space, and a significant number of UWB radios occluded.
Submitted 31 July, 2023;
originally announced July 2023.
-
Multi-objective Deep Reinforcement Learning for Mobile Edge Computing
Authors:
Ning Yang,
Junrui Wen,
Meng Zhang,
Ming Tang
Abstract:
Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems and a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at https://github.com/gracefulning/mec_morl_multipolicy.
Submitted 5 July, 2023;
originally announced July 2023.
-
New Covert and Side Channels Based on Retirement
Authors:
Ke Xu,
Ming Tang,
Quancheng Wang,
Han Wang
Abstract:
Intel processors utilize the retirement to orderly retire the micro-ops that have been executed out of order. To enhance retirement utilization, the retirement is dynamically shared between two logical cores on the same physical core. However, this shared retirement mechanism creates a potential vulnerability wherein an attacker can exploit the competition for retirement to infer the data of a victim on another logical core on the same physical core. Based on this leakage, we propose two new covert channels: the Different Instructions (DI) covert channel using different instructions for information transmission, and the Same Instructions (SI) covert channel using the same instructions to transmit information. The DI covert channel can achieve 98.5% accuracy with a bandwidth of 1450 Kbps, while the SI covert channel can achieve 94.85% accuracy with a bandwidth of 483.33 Kbps. Furthermore, this paper explores additional applications of retirement: Firstly, retirement is applied to Spectre attacks, resulting in a new variant of Spectre v1, which can achieve 94.17% accuracy with a bandwidth of 29 Kbps; Secondly, retirement is leveraged to infer the programs being executed by the victim, which can infer 10 integer benchmarks of SPEC with 89.28% accuracy. Finally, we discuss possible protection against new covert channels.
Submitted 23 July, 2023;
originally announced July 2023.
-
MDI+: A Flexible Random Forest-Based Feature Importance Framework
Authors:
Abhineet Agarwal,
Ana M. Kenney,
Yan Shuo Tan,
Tiffany M. Tang,
Bin Yu
Abstract:
Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged python package on Github.
Submitted 4 July, 2023;
originally announced July 2023.
-
Fast Segment Anything
Authors:
Xu Zhao,
Wenchao Ding,
Yongqi An,
Yinglong Du,
Tao Yu,
Min Li,
Ming Tang,
**qiao Wang
Abstract:
The recently proposed segment anything model (SAM) has had a significant influence on many computer vision tasks. It is becoming a foundation step for many high-level tasks, like image segmentation, image captioning, and image editing. However, its huge computation costs prevent it from wider application in industry scenarios. The computation mainly comes from the Transformer architecture at high-resolution inputs. In this paper, we propose a faster alternative method for this fundamental task with comparable performance. By reformulating the task as segments-generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert this task to the well-studied instance segmentation task and directly train the existing instance segmentation method using only 1/50 of the SA-1B dataset published by the SAM authors. With our method, we achieve performance comparable to the SAM method at 50 times higher run-time speed. We give sufficient experimental results to demonstrate its effectiveness. The code and demos will be released at https://github.com/CASIA-IVA-Lab/FastSAM.
Submitted 21 June, 2023;
originally announced June 2023.
-
Renderers are Good Zero-Shot Representation Learners: Exploring Diffusion Latents for Metric Learning
Authors:
Michael Tang,
David Shustin
Abstract:
Can the latent spaces of modern generative neural rendering models serve as representations for 3D-aware discriminative visual understanding tasks? We use retrieval as a proxy for measuring the metric learning properties of the latent spaces of Shap-E, including capturing view-independence and enabling the aggregation of scene representations from the representations of individual image views, and find that Shap-E representations outperform the classical EfficientNet baseline representations zero-shot, and are still competitive when both methods are trained using a contrastive loss. These findings give preliminary indication that 3D-based rendering and generative models can yield useful representations for discriminative tasks in our innately 3D-native world. Our code is available at https://github.com/michaelwilliamtang/golden-retriever.
Submitted 19 June, 2023;
originally announced June 2023.
-
Real-time COVID-19 hospital admissions forecasting with leading indicators and ensemble methods in England
Authors:
Jonathon Mellor,
Rachel Christie,
Robert S Paton,
Rhianna Leslie,
Maria Tang,
Martyn Fyles,
Sarah Deeny,
Thomas Ward,
Christopher E Overton
Abstract:
Hospitalisations from COVID-19 with Omicron sub-lineages have put a sustained pressure on the English healthcare system. Understanding the expected healthcare demand enables more effective and timely planning from public health. We collect syndromic surveillance sources, which include online search data, NHS 111 telephonic and online triages. Incorporating this data we explore generalised additive models, generalised linear mixed-models, penalised generalised linear models and model ensemble methods to forecast over a two-week forecast horizon at an NHS Trust level. Furthermore, we showcase how model combinations improve forecast scoring through a mean ensemble, weighted ensemble, and ensemble by regression. Validated over multiple Omicron waves, at different spatial scales, we show that leading indicators can improve performance of forecasting models, particularly at epidemic changepoints. Using a variety of scoring rules, we show that ensemble approaches outperformed all individual models, providing higher performance at a 21-day window than the corresponding individual models at 14-days. We introduce a modelling structure used by public health officials in England in 2022 to inform NHS healthcare strategy and policy decision making. This paper explores the significance of ensemble methods to improve forecasting performance and how novel syndromic surveillance can be practically applied in epidemic forecasting.
Submitted 16 August, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Overview of the JWST Advanced Deep Extragalactic Survey (JADES)
Authors:
Daniel J. Eisenstein,
Chris Willott,
Stacey Alberts,
Santiago Arribas,
Nina Bonaventura,
Andrew J. Bunker,
Alex J. Cameron,
Stefano Carniani,
Stephane Charlot,
Emma Curtis-Lake,
Francesco D'Eugenio,
Ryan Endsley,
Pierre Ferruit,
Giovanna Giardino,
Kevin Hainline,
Ryan Hausen,
Peter Jakobsen,
Benjamin D. Johnson,
Roberto Maiolino,
Marcia Rieke,
George Rieke,
Hans-Walter Rix,
Brant Robertson,
Daniel P. Stark,
Sandro Tacchella
, et al. (51 additional authors not shown)
Abstract:
We present an overview of the James Webb Space Telescope (JWST) Advanced Deep Extragalactic Survey (JADES), an ambitious program of infrared imaging and spectroscopy in the GOODS-S and GOODS-N deep fields, designed to study galaxy evolution from high redshift to cosmic noon. JADES uses about 770 hours of Cycle 1 guaranteed time largely from the Near-Infrared Camera (NIRCam) and Near-Infrared Spectrograph (NIRSpec) instrument teams. In GOODS-S, in and around the Hubble Ultra Deep Field and Chandra Deep Field South, JADES produces a deep imaging region of ~45 arcmin$^2$ with an average of 130 hrs of exposure time spread over 9 NIRCam filters. This is extended at medium depth in GOODS-S and GOODS-N with NIRCam imaging of ~175 arcmin$^2$ with an average exposure time of 20 hrs spread over 8-10 filters. In both fields, we conduct extensive NIRSpec multi-object spectroscopy, including 2 deep pointings of 55 hrs exposure time, 14 medium pointings of ~12 hrs, and 15 shallower pointings of ~4 hrs, targeting over 5000 HST and JWST-detected faint sources with 5 low, medium, and high-resolution dispersers covering 0.6-5.3 microns. Finally, JADES extends redward via coordinated parallels with the JWST Mid-Infrared Instrument (MIRI), featuring ~9 arcmin$^2$ with 43 hours of exposure at 7.7 microns and twice that area with 2-6.5 hours of exposure at 12.8 microns. For nearly 30 years, the GOODS-S and GOODS-N fields have been developed as the premier deep fields on the sky; JADES is now providing a compelling start on the JWST legacy in these fields.
Submitted 4 June, 2023;
originally announced June 2023.
-
BandwidthBreach: Unleashing Covert and Side Channels through Cache Bandwidth Exploitation
Authors:
Han Wang,
Ming Tang,
Ke Xu,
Quancheng Wang
Abstract:
In the modern CPU architecture, enhancements such as the Line Fill Buffer (LFB) and Super Queue (SQ), which are designed to track pending cache requests, have significantly boosted performance. To exploit these structures, we deliberately engineered blockages in the L2 to L1d route by controlling LFB conflict and triggering prefetch prediction failures, while consciously dismissing other plausible influencing factors. This approach was subsequently extended to the L3 to L2 and L2 to L1i pathways, resulting in three potent covert channels, termed L2CC, L3CC, and LiCC, with capacities of 10.02 Mbps, 10.37 Mbps, and 1.83 Mbps, respectively. Strikingly, the capacities of L2CC and L3CC surpass those of earlier non-shared-memory-based covert channels, reaching a level comparable to their shared memory-dependent equivalents. Leveraging this congestion further facilitated the extraction of key bits from RSA and EdDSA implementations. Coupled with SpectreV1 and V2, our covert channels effectively evade the majority of traditional Spectre defenses. Their confluence with Branch Prediction (BP) Timing assaults additionally undercuts balanced branch protections, hence broadening their capability to infiltrate a wide range of cryptography libraries.
Submitted 3 June, 2023;
originally announced June 2023.
-
VO2 Phase Change Electrodes in Li-ion Batteries
Authors:
Samuel Castro-Pardo,
Anand B. Puthirath,
Shaoxun Fan,
Sreehari Saju,
Guang Yang,
Jagjit Nanda,
Robert Vajtai,
Ming Tang,
Pulickel M. Ajayan
Abstract:
The use of electrode materials that show phase change behavior, and hence drastic changes in electrochemical activity during operation, has not been explored for Li-ion batteries. Here we demonstrate a vanadium oxide (VO2) cathode that undergoes a metal-insulator transition due to a first-order structural phase transition at an accessible temperature of 68°C for battery operation. Using a suitable electrolyte operable across the phase transition range and compatible with vanadium oxide cathodes, we studied the effect of electrode structure change on lithium insertion followed by the electrochemical characteristics above and below the phase transition temperature. The high-temperature VO2 phase shows significantly improved capacitance, enhanced current rate capabilities, improved electrical conductivity and lithium-ion diffusivity compared to the insulating low-temperature phase. This opens up new avenues for electrode designs, allowing manipulation of electrochemical reactions around phase transition temperatures, and in particular enhancing electrochemical properties at elevated temperatures, contrary to existing classes of battery chemistries that lead to performance deterioration at elevated temperatures.
Submitted 30 May, 2023;
originally announced May 2023.
-
CTSN: Predicting Cloth Deformation for Skeleton-based Characters with a Two-stream Skinning Network
Authors:
Yudi Li,
Min Tang,
Yun Yang,
Ruofeng Tong,
Shuangcai Yang,
Yao Li,
Bailin An,
Qilong Kou
Abstract:
We present a novel learning method to predict the cloth deformation for skeleton-based characters with a two-stream network. The characters processed in our approach are not limited to humans, and can be other skeleton-based representations of non-human targets such as fish or pets. We use a novel network architecture which consists of skeleton-based and mesh-based residual networks to learn the coarse and wrinkle features as the overall residual from the template cloth mesh. Our network is used to predict the deformation for loose or tight-fitting clothing or dresses. We ensure that the memory footprint of our network is low, which reduces storage and computational requirements. In practice, our prediction for a single cloth mesh for the skeleton-based character takes about 7 milliseconds on an NVIDIA GeForce RTX 3090 GPU. Compared with prior methods, our network can generate fine deformation results with details and wrinkles.
Submitted 30 May, 2023;
originally announced May 2023.
-
Referral Augmentation for Zero-Shot Information Retrieval
Authors:
Michael Tang,
Shunyu Yao,
John Yang,
Karthik Narasimhan
Abstract:
We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals, i.e. text from other documents that cite or link to the given document, to provide significant performance gains for zero-shot information retrieval. The key insight behind our method is that referrals provide a more complete, multi-view representation of a document, much like incoming page links in algorithms like PageRank provide a comprehensive idea of a webpage's importance. RAR works with both sparse and dense retrievers, and outperforms generative text expansion techniques such as DocT5Query and Query2Doc, with a 37% and 21% absolute improvement on ACL paper retrieval Recall@10, while also eliminating expensive model training and inference. We also analyze different methods for multi-referral aggregation and show that RAR enables up-to-date information retrieval without re-training.
Submitted 24 May, 2023;
originally announced May 2023.
-
The Integrated Forward-Forward Algorithm: Integrating Forward-Forward and Shallow Backpropagation With Local Losses
Authors:
Desmond Y. M. Tang
Abstract:
The backpropagation algorithm, despite its widespread use in neural network learning, may not accurately emulate the human cortex's learning process. Alternative strategies, such as the Forward-Forward Algorithm (FFA), offer a closer match to the human cortex's learning characteristics. However, the original FFA paper and related works on the Forward-Forward Algorithm cover only very limited types of neural network mechanisms, which may limit its application and effectiveness. In response to these challenges, we propose an integrated method that combines the strengths of both FFA and shallow backpropagation, yielding a biologically plausible neural network training algorithm which can also be applied to various network structures. We applied this integrated approach to the classification of the Modified National Institute of Standards and Technology (MNIST) database, where it outperformed FFA and demonstrated superior resilience to noise compared to backpropagation. We show that training neural networks with the Integrated Forward-Forward Algorithm has the potential to generate neural networks with advantageous features like robustness.
Submitted 22 May, 2023;
originally announced May 2023.
-
Sequential Memory with Temporal Predictive Coding
Authors:
Mufeng Tang,
Helen Barron,
Rafal Bogacz
Abstract:
Forming accurate memory of sequential stimuli is a fundamental function of biological agents. However, the computational mechanism underlying sequential memory in the brain remains unclear. Inspired by neuroscience theories and recent successes in applying predictive coding (PC) to static memory tasks, in this work we propose a novel PC-based model for sequential memory, called temporal predictive coding (tPC). We show that our tPC models can memorize and retrieve sequential inputs accurately with a biologically plausible neural implementation. Importantly, our analytical study reveals that tPC can be viewed as a classical Asymmetric Hopfield Network (AHN) with an implicit statistical whitening process, which leads to more stable performance in sequential memory tasks of structured inputs. Moreover, we find that tPC exhibits properties consistent with behavioral observations and theories in neuroscience, thereby strengthening its biological relevance. Our work establishes a possible computational mechanism underlying sequential memory in the brain that can also be theoretically interpreted using existing memory model frameworks.
Submitted 26 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing
Authors:
Tingting Wu,
Xiao Ding,
Minji Tang,
Hao Zhang,
Bing Qin,
Ting Liu
Abstract:
Large-scale datasets in the real world inevitably involve label noise. Deep models can gradually overfit noisy labels and thus degrade model generalization. To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance. Due to the lack of suitable datasets, previous studies have frequently employed synthetic label noise to mimic real-world label noise. However, synthetic noise is not instance-dependent, so this approximation is not always effective in practice. Recent research has proposed benchmarks for learning with real-world noisy labels. However, the noise sources within them may be single or fuzzy, so these benchmarks differ from real-world data with heterogeneous label noise. To tackle these issues, we contribute NoisywikiHow, the largest NLP benchmark built with minimal supervision. Specifically, inspired by human cognition, we explicitly construct multiple sources of label noise to imitate human errors throughout the annotation, replicating real-world noise, whose corruption is affected by both ground-truth labels and instances. Moreover, we provide a variety of noise levels to support controlled experiments on noisy data, enabling us to evaluate LNL methods systematically and comprehensively. We then conduct extensive multi-dimensional experiments on a broad range of LNL methods, obtaining new and intriguing findings.
Submitted 18 May, 2023;
originally announced May 2023.