-
Mirage: An RNS-Based Photonic Accelerator for DNN Training
Authors:
Cansu Demirkiran,
Guowei Yang,
Darius Bunandar,
Ajay Joshi
Abstract:
Photonic computing is a compelling avenue for performing highly efficient matrix multiplication, a crucial operation in Deep Neural Networks (DNNs). While this method has shown great success in DNN inference, meeting the high precision demands of DNN training proves challenging due to the precision limitations imposed by costly data converters and the analog noise inherent in photonic hardware. This paper proposes Mirage, a photonic DNN training accelerator that overcomes the precision challenges in photonic hardware using the Residue Number System (RNS). RNS is a numeral system based on modular arithmetic, allowing us to perform high-precision operations via multiple low-precision modular operations. In this work, we present a novel micro-architecture and dataflow for an RNS-based photonic tensor core performing modular arithmetic in the analog domain. By combining RNS and photonics, Mirage provides high energy efficiency without compromising precision and can successfully train state-of-the-art DNNs achieving accuracy comparable to FP32 training. Our study shows that on average across several DNNs when compared to systolic arrays, Mirage achieves more than $23.8\times$ faster training and $32.1\times$ lower EDP in an iso-energy scenario and consumes $42.8\times$ lower power with comparable or better EDP in an iso-area scenario.
Submitted 24 May, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
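The RNS idea mentioned in the abstract above (one high-precision operation carried out as several independent low-precision modular operations) can be illustrated with a minimal sketch. The moduli `(7, 8, 9)` and the helper names `to_rns`, `rns_mul`, `rns_add` and `from_rns` are illustrative assumptions, not taken from the Mirage paper; the sketch shows only the arithmetic, not the photonic micro-architecture.

```python
from math import prod

# Assumed example moduli: pairwise coprime, so the representable range is
# M = 7 * 8 * 9 = 504. These values are illustrative, not from the paper.
MODULI = (7, 8, 9)


def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS numbers channel by channel; channels never interact."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_add(a, b, moduli=MODULI):
    """Add two RNS numbers channel by channel."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese Remainder Theorem (result is taken modulo M)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


# A small dot product computed entirely on low-precision residues.
acc = to_rns(0)
for w, x in zip([3, 4], [5, 6]):
    acc = rns_add(acc, rns_mul(to_rns(w), to_rns(x)))
result = from_rns(acc)
```

The decoded result is correct only while the true value stays below the product of the moduli, which is why the choice of moduli sets the effective precision.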
-
The Graph Convolutional Network with Multi-representation Alignment for Drug Synergy Prediction
Authors:
Xinxing Yang,
Genke Yang,
Jian Chu
Abstract:
Drug combination refers to the use of two or more drugs to treat a specific disease at the same time. It is currently the mainstream way to treat complex diseases. Compared with single drugs, drug combinations have better efficacy and can better inhibit toxicity and drug resistance. The computational model based on deep learning concatenates the representation of multiple drugs and the corresponding cell line feature as input, and the output is whether the drug combination can have an inhibitory effect on the cell line. However, this strategy of concatenating multiple representations has the following defects: the alignment of drug representation and cell line representation is ignored, resulting in the synergistic relationship not being reflected positionally in the embedding space. Moreover, the alignment measurement function in deep learning cannot be suitable for drug synergy prediction tasks due to differences in input types. Therefore, in this work, we propose a graph convolutional network with multi-representation alignment (GCNMRA) for predicting drug synergy. In the GCNMRA model, we designed a multi-representation alignment function suitable for the drug synergy prediction task so that the positional relationship between drug representations and cell line representation is reflected in the embedding space. In addition, the vector modulus of drug representations and cell line representation is considered to improve the accuracy of calculation results and accelerate model convergence. Finally, many relevant experiments were run on multiple drug synergy datasets to verify the effectiveness of the above innovative elements and the excellence of the GCNMRA model.
Submitted 27 November, 2023;
originally announced November 2023.
-
SEINE: SEgment-based Indexing for NEural information retrieval
Authors:
Sibo Dong,
Justin Goldstein,
Grace Hui Yang
Abstract:
Many early neural Information Retrieval (NeurIR) methods are re-rankers that rely on a traditional first-stage retriever due to expensive query time computations. Recently, representation-based retrievers have gained much attention, which learns query representation and document representation separately, making it possible to pre-compute document representations offline and reduce the workload at query time. Both dense and sparse representation-based retrievers have been explored. However, these methods focus on finding the representation that best represents a text (aka metric learning) and the actual retrieval function that is responsible for similarity matching between query and document is kept at a minimum by using dot product. One drawback is that unlike traditional term-level inverted index, the index formed by these embeddings cannot be easily re-used by another retrieval method. Another drawback is that keeping the interaction at minimum hurts retrieval effectiveness. On the contrary, interaction-based retrievers are known for their better retrieval effectiveness. In this paper, we propose a novel SEgment-based Neural Indexing method, SEINE, which provides a general indexing framework that can flexibly support a variety of interaction-based neural retrieval methods. We emphasize on a careful decomposition of common components in existing neural retrieval methods and propose to use segment-level inverted index to store the atomic query-document interaction values. Experiments on LETOR MQ2007 and MQ2008 datasets show that our indexing method can accelerate multiple neural retrieval methods up to 28-times faster without sacrificing much effectiveness.
Submitted 27 November, 2023;
originally announced November 2023.
-
Nonlinear Stability Boundary Assessment of Multi-Converter Systems Based On Reverse Time Trajectory
Authors:
Sujay Ghosh,
Mohammad Kazem Bakhshizadeh,
Guangya Yang,
Łukasz Kocewiak
Abstract:
As the integration of wind power accelerates, wind power plants (WPPs) are expected to play a crucial role in ensuring stability in future power grids. This paper examines the nonlinear stability boundary of a multi-converter system in a wind power plant (WPP) connected to an AC power grid via a long HVAC cable. Traditionally, for nonlinear analysis of WPPs, a simplification is adopted wherein the WPP is treated as an aggregation of individual wind turbines (WTs), with a simplified portrayal of the collector network. However, in the presence of different technologies, such as STATCOM, that are placed away from the WTs, the model aggregation will not hold. This paper presents a unified methodology to model and investigate the high-dimensional stability boundary of a WPP with a STATCOM. The stability region of the system, i.e. the region of attraction (RoA), is determined by the reverse time (backwards) trajectory technique. Furthermore, the estimated stability boundary is verified using time-domain simulation studies in PSCAD.
Submitted 27 November, 2023;
originally announced November 2023.
-
Where to Begin? From Random to Foundation Model Instructed Initialization in Federated Learning for Medical Image Segmentation
Authors:
Ming Li,
Guang Yang
Abstract:
In medical image analysis, Federated Learning (FL) stands out as a key technology that enables privacy-preserved, decentralized data processing, crucial for handling sensitive medical data. Currently, most FL models employ random initialization, which has been proven effective in various instances. However, given the unique challenges posed by non-IID (non-independent and identically distributed) data in FL, we propose a novel perspective: exploring the impact of using the foundation model with enormous pre-trained knowledge, such as the Segment Anything Model (SAM), as an instructive teacher for FL model initialization in medical image segmentation tasks. This work for the first time attempts to utilize the foundation model as an instructive teacher for initialization in FL, assessing its impact on the performance of FL models, especially in non-IID data scenarios. Our empirical evaluation on chest x-ray lung segmentation showcases that FL with foundation model instructed initialization not only achieves faster convergence but also improves performance in complex data contexts. These findings offer a new perspective for model initialization in FL.
Submitted 26 November, 2023;
originally announced November 2023.
-
Evidence for a Shallow Evolution in the Volume Densities of Massive Galaxies at $z=4$ to $8$ from CEERS
Authors:
Katherine Chworowsky,
Steven L. Finkelstein,
Michael Boylan-Kolchin,
Elizabeth J. McGrath,
Kartheik G. Iyer,
Casey Papovich,
Mark Dickinson,
Anthony J. Taylor,
L. Y. Aaron Yung,
Pablo Arrabal Haro,
Micaela B. Bagley,
Bren E. Backhaus,
Rachana Bhatawdekar,
Yingjie Cheng,
Nikko J. Cleri,
Justin W. Cole,
M. C. Cooper,
Luca Costantin,
Avishai Dekel,
Maximilien Franco,
Seiji Fujimoto,
Christopher C. Hayward,
Benne W. Holwerda,
Marc Huertas-Company,
Michaela Hirschmann
, et al. (14 additional authors not shown)
Abstract:
We analyze the evolution of massive (log$_{10}$ [$M_\star/M_\odot$] $>10$) galaxies at $z \sim$ 4--8 selected from the JWST Cosmic Evolution Early Release Science (CEERS) survey. We infer the physical properties of all galaxies in the CEERS NIRCam imaging through spectral energy distribution (SED) fitting with dense basis to select a sample of high redshift massive galaxies. Where available we include constraints from additional CEERS observing modes, including 18 sources with MIRI photometric coverage, and 28 sources with spectroscopic confirmations from NIRSpec or NIRCam wide-field slitless spectroscopy. We sample the recovered posteriors in stellar mass from SED fitting to infer the volume densities of massive galaxies across cosmic time, taking into consideration the potential for sample contamination by active galactic nuclei (AGN). We find that the evolving abundance of massive galaxies tracks expectations based on a constant baryon conversion efficiency in dark matter halos for $z \sim$ 1--4. At higher redshifts, we observe an excess abundance of massive galaxies relative to this simple model. These higher abundances can be explained by modest changes to star formation physics and/or the efficiencies with which star formation occurs in massive dark matter halos, and are not in tension with modern cosmology.
Submitted 24 November, 2023;
originally announced November 2023.
-
Machine-Learned Atomic Cluster Expansion Potentials for Fast and Quantum-Accurate Thermal Simulations of Wurtzite AlN
Authors:
Guang Yang,
Yuan-Bin Liu,
Lei Yang,
Bing-Yang Cao
Abstract:
Using the atomic cluster expansion (ACE) framework, we develop a machine learning interatomic potential for fast and accurately modelling the phonon transport properties of wurtzite aluminum nitride. The predictive power of the ACE potential against density functional theory (DFT) is demonstrated across a broad range of properties of w-AlN, including ground-state lattice parameters, specific heat capacity, coefficients of thermal expansion, bulk modulus, and harmonic phonon dispersions. Validation of lattice thermal conductivity is further carried out by comparing the ACE-predicted values to the DFT calculations and experiments, exhibiting the overall capability of our ACE potential in sufficiently describing anharmonic phonon interactions. As a practical application, we perform a lattice dynamics analysis using the potential to unravel the effects of biaxial strains on thermal conductivity and phonon properties of w-AlN, which is identified as a significant tuning factor for near-junction thermal design of w-AlN-based electronics.
Submitted 21 January, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Extracting neutron skin from elastic proton-nucleus scattering with deep neural network
Authors:
G. H. Yang,
Y. Kuang,
Z. X. Yang,
Z. P. Li
Abstract:
Based on the relativistic impulse approximation of proton-nucleus elastic scattering theory, the nucleon density distribution and neutron skin thickness of $^{48}$Ca are estimated via the deep learning method. The neural-network-generated densities are mainly compressed to be lower inside the nucleus compared with the results from the relativistic PC-PK1 density functional, resulting in a significant improvement on the large-angle scattering observables, both for the differential cross section and analyzing power. The neutron skin thickness of $^{48}$Ca is captured to be 0.211(11) fm. The relatively thicker neutron skin is deemed reasonable from the perspective of density functional analysis.
Submitted 20 November, 2023;
originally announced November 2023.
-
Automatic Smart Contract Comment Generation via Large Language Models and In-Context Learning
Authors:
Junjie Zhao,
Xiang Chen,
Guang Yang,
Yiheng Shen
Abstract:
The previous smart contract code comment (SCC) generation approaches can be divided into two categories: fine-tuning paradigm-based approaches and information retrieval-based approaches. However, for the fine-tuning paradigm-based approaches, the performance may be limited by the quality of the gathered dataset for the downstream task and they may have knowledge-forgetting issues. While for the information retrieval-based approaches, it is difficult for them to generate high-quality comments if similar code does not exist in the historical repository. Therefore we want to utilize the domain knowledge related to SCC generation in large language models (LLMs) to alleviate the disadvantages of these two types of approaches. In this study, we propose an approach SCCLLM based on LLMs and in-context learning. Specifically, in the demonstration selection phase, SCCLLM retrieves the top-k code snippets from the historical corpus by considering syntax, semantics, and lexical information. In the in-context learning phase, SCCLLM utilizes the retrieved code snippets as demonstrations, which can help to utilize the related knowledge for this task. We select a large corpus from a smart contract community Etherscan.io as our experimental subject. Extensive experimental results show the effectiveness of SCCLLM when compared with baselines in automatic evaluation and human evaluation.
Submitted 16 January, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
The $\mathbf{\bar{q}q\bar{s}Q}$ $\mathbf{(q=u,\,d;\,Q=c,\,b)}$ tetraquark system in a chiral quark model
Authors:
Gang Yang,
Jialun Ping,
Jorge Segovia
Abstract:
Inspired by the experimentally reported $T_{c\bar{s}}(2900)$ exotic states, the $S$-wave $\bar{q}q\bar{s}Q$ $(q=u,\,d;\,Q=c,\,b)$ tetraquarks, with spin-parity $J^P=0^+$, $1^+$ and $2^+$, in both isoscalar and isovector sectors are systematically studied in a chiral quark model. The meson-meson, diquark-antidiquark and K-type arrangements of quarks, along with all possible color wave functions, are comprehensively considered. The four-body system is solved by means of a highly efficient computational approach, the Gaussian expansion method, along with a complex-scaling formulation of the problem to disentangle bound, resonance and scattering states. This theoretical framework has already been successfully applied in various tetra- and penta-quark systems. In the complete coupled-channel case, and within the complex-range formulation, several narrow resonances of $\bar{q}q\bar{s}c$ and $\bar{q}q\bar{s}b$ systems are obtained in each allowed $I(J^P)$-channels. Particularly, the $T_{c\bar{s}}(2900)$ is well identified as a $I(J^P)=1(0^+)$ $\bar{q}q\bar{s}c$ tetraquark state with a dominant molecular structure. Meanwhile, more resonances in $\bar{q}q\bar{s}c$ and $\bar{q}q\bar{s}b$ systems are also obtained within the energy regions $2.4-3.4$ GeV and $5.7-6.7$ GeV, respectively. The predicted exotic states, which are an indication of a richer color structure when going towards multiquark systems beyond mesons and baryons, are expected to be confirmed in future high-energy particle and nuclear experiments.
Submitted 17 November, 2023;
originally announced November 2023.
-
Orientation-dependent superconductivity and electronic structure of the rare-earth metal/KTaO3 interfaces
Authors:
Guowei Yang,
Weifan Zhu,
Jiawen Zhang,
Hao Zheng,
Yi Wu,
Huali Zhang,
Ge Ye,
Dajun Su,
Yanan Zhang,
Chao Cao,
Xin Lu,
Huiqiu Yuan,
Yang Liu
Abstract:
The recent discovery of orientation-dependent superconductivity in KTaO3-based interfaces has attracted considerable interest, while the underlying origin remains an open question. Here we report a different approach to tune the interfacial electron gas and superconductivity by forming interfaces between rare-earth (RE) metals (RE being La, Ce, Eu) and KTaO3 substrates with different orientations. We found that the interfacial superconductivity is strongest for the Eu/KTaO3 interfaces, becomes weaker in La/KTaO3 and is absent in Ce/KTaO3. Using in-situ photoemission, we observed distinct valence bands associated with RE metals, as well as a pronounced orientation dependence in the interfacial electronic structure, which can be linked to the orientation-dependent superconductivity. The photoemission spectra show similar double-peak structures for the (111) and (110) oriented interfaces, with an energy separation close to the LO4 phonon of KTaO3. Detailed analyses suggest that this double-peak structure could be attributed to electron-phonon coupling, which might be important for the interfacial superconductivity.
Submitted 16 November, 2023;
originally announced November 2023.
-
Sequencing Matters: A Generate-Retrieve-Generate Model for Building Conversational Agents
Authors:
Quinn Patwardhan,
Grace Hui Yang
Abstract:
This paper contains what the Georgetown InfoSense group has done in regard to solving the challenges presented by TREC iKAT 2023. Our submitted runs outperform the median runs by a significant margin, exhibiting superior performance in nDCG across various cut numbers and in overall success rate. Our approach uses a Generate-Retrieve-Generate method, which we've found to greatly outpace Retrieve-Then-Generate approaches for the purposes of iKAT. Our solution involves the use of Large Language Models (LLMs) for initial answers, answer grounding by BM25, passage quality filtering by logistic regression, and answer generation by LLMs again. We leverage several purpose-built Language Models, including BERT, Chat-based, and text-to-transfer-based models, for text understanding, classification, generation, and summarization. The official results of the TREC evaluation contradict our initial self-evaluation, which may suggest that a decrease in the reliance on our retrieval and classification methods is better. Nonetheless, our findings suggest that the sequence of involving these different components matters, where we see an essentiality of using LLMs before using search engines.
Submitted 15 November, 2023;
originally announced November 2023.
-
Software-Defined Virtual Synchronous Condenser
Authors:
Zimin Jiang,
Peng Zhang,
Yifan Zhou,
Łukasz Kocewiak,
Divya Kurthakoti Chandrashekhara,
Marie-Lou Picherit,
Zefan Tang,
Kenneth B. Bowes,
Guangya Yang
Abstract:
Synchronous condensers (SCs) play important roles in integrating wind energy into relatively weak power grids. However, the design of SCs usually depends on specific application requirements and may not be adaptive enough to the frequently-changing grid conditions caused by the transition from conventional to renewable power generation. This paper devises a software-defined virtual synchronous condenser (SDViSC) method to address the challenges. Our contributions are fourfold: 1) design of a virtual synchronous condenser (ViSC) to enable full converter wind turbines to provide built-in SC functionalities; 2) engineering SDViSCs to transfer hardware-based ViSC controllers into software services, where a Tustin transformation-based software-defined control algorithm guarantees accurate tracking of fast dynamics under limited communication bandwidth; 3) a software-defined networking-enhanced SDViSC communication scheme to allow enhanced communication reliability and reduced communication bandwidth occupation; and 4) Prototype of SDViSC on our real-time, cyber-in-the-loop digital twin of large-wind-farm in an RTDS environment. Extensive test results validate the excellent performance of SDViSC to support reliable and resilient operations of wind farms under various physical and cyber conditions.
Submitted 17 November, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Simulation and analytical modeling of high-speed droplet impact onto a surface
Authors:
Yanchao Liu,
Xu Chu,
Guang Yang,
Bernhard Weigand
Abstract:
The fluid dynamics of liquid droplet impact on surfaces hold significant relevance to various industrial applications. However, high impact velocities introduce compressible effects, leading to material erosion. A gap in understanding and modeling these effects has motivated this study. We simulated droplet impacts on surfaces and proposed a new analytical model for impact pressure and droplet turning line, targeting at predictions for enhanced cavitation. The highly compressed liquid behind the droplet expands sideways, causing lateral jetting. As the droplet encounters a shock wave, it reflects as a rarefaction wave, leading to low-pressure zones within the droplet. These zones converge at the droplet's center, causing cavitation, which, upon collapse, induces another shock wave, contributing to erosion. Using the well-established model for the low-velocity impact shows a significant discrepancy. Hence, an analytical model for the turning line radius is introduced, incorporating the lateral jetting's characteristic length scale. Comparing our model with existing ones, our new model exhibits superior predictive accuracy.
Submitted 15 November, 2023;
originally announced November 2023.
-
Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding
Authors:
Guangyu Yang,
Jinghong Chen,
Weizhe Lin,
Bill Byrne
Abstract:
Minimum Bayes Risk (MBR) decoding can significantly improve translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive. We show how the recently developed Reinforcement Learning technique, Direct Preference Optimization (DPO), can fine-tune MLLMs to get the gains of MBR without any additional computation in inference. Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared to MLLMs without DPO.
Submitted 12 April, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
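The abstract above does not spell out how MBR decoding works or why it is expensive. As a minimal sketch of the standard sample-based formulation: each candidate translation is scored by its average utility against all candidates, which costs a number of utility calls quadratic in the candidate count. The unigram-F1 utility and the names `unigram_f1` and `mbr_decode` are toy assumptions for illustration; the paper's actual utility metric and sampling setup are not given here.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Toy utility (an assumption): unigram F1 overlap between two strings."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates, utility=unigram_f1):
    """Return the candidate with the highest average utility against all
    candidates, which here double as pseudo-references."""
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)

    return max(candidates, key=expected_utility)


# Hypothetical samples from a translation model.
samples = ["the cat sat", "the cat sat down", "the cat", "a dog ran"]
best = mbr_decode(samples)
```

With N candidates this needs N × N utility evaluations per source sentence, which is the inference cost that the abstract says DPO fine-tuning is meant to avoid.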
-
Relaxation dynamics in the alternating XY chain following a quantum quench
Authors:
Kaiyuan Cao,
Yayun Hu,
Peiqing Tong,
Guangwen Yang,
Peng Liu
Abstract:
We investigate the relaxation dynamics of the fermion two-point correlation function $C_{mn}(t)=\langle\psi(t)|c_{m}^{\dagger}c_{n}|\psi(t)\rangle$ in the XY chain with staggered nearest-neighbor hopping interaction after a quench. We find that the deviation $\delta C_{mn}(t)=C_{mn}(t)-C_{mn}(\infty)$ decays with time following the power law behavior $t^{-\mu}$, where the exponent $\mu$ depends on whether the quench is to the commensurate phase ($\mu=1$) or the incommensurate phase ($\mu=\frac{1}{2}$). This decay of $\delta C_{mn}(t)$ arises from the transient behavior of the double excited quasiparticle occupations and the transitions between different excitation spectra. Furthermore, we find that the steady value $C_{mn}(\infty)$, which is different from the ground state expectation value, only involves the average fermion occupation numbers (i.e. the average excited single particle). We also observe nonanalytic singularities in the steady value $C_{mn}(\infty)$ for the quench to the critical points of the quantum phase transitions (QPTs), suggesting its potential use as a signature of QPTs.
Submitted 4 January, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Nonlinear Stability Boundary Assessment Of Wind Power Plants Based on Reverse-Time Trajectory
Authors:
Sujay Ghosh,
Mohammad Kazem Bakhshizadeh,
Guangya Yang,
Lukasz Kocewiak
Abstract:
This letter determines the nonlinear stability boundary of a wind power plant (WPP) connected to an AC power grid via a long HVAC cable. The analysis focuses on the slow Phase-Locked Loop (PLL) dynamics, with an assumption that the fast current control dynamics can be neglected. To begin, we propose an aggregated reduced-order wind turbine model. This aggregation can be applied up to a limited frequency, e.g. 400Hz, which aligns with our assumption regarding low-frequency dynamics. The WPP collector and transmission network model is established using impedance/frequency scan approximated around $\pm$5 Hz of the PLL nominal frequency, accounting for the hard saturation limits. The stability boundary of the reduced-order system is determined by reverse time trajectory, offering valuable insights into the WPP's overall stability. The work presents a routine from modelling to nonlinear stability assessment for offshore wind farm applications.
Submitted 14 November, 2023;
originally announced November 2023.
-
Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation
Authors:
Michael Yeung,
Todd Watts,
Sean YW Tan,
Pedro F. Ferreira,
Andrew D. Scott,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning
Submitted 11 November, 2023;
originally announced November 2023.
-
Post-COVID Highlights: Challenges and Solutions of AI Techniques for Swift Identification of COVID-19
Authors:
Yingying Fang,
Xiaodan Xing,
Shiyi Wang,
Simon Walsh,
Guang Yang
Abstract:
Since the onset of the COVID-19 pandemic in 2019, there has been a concerted effort to develop cost-effective, non-invasive, and rapid AI-based tools. These tools were intended to alleviate the burden on healthcare systems, control the rapid spread of the virus, and enhance intervention outcomes, all in response to this unprecedented global crisis. As we transition into a post-COVID era, we retrospectively evaluate these proposed studies and offer a review of the techniques employed in AI diagnostic models, with a focus on the solutions proposed for different challenges. This review endeavors to provide insights into the diverse solutions designed to address the multifaceted challenges that arose during the pandemic. By doing so, we aim to prepare the AI community for the development of AI tools tailored to address public health emergencies effectively.
Submitted 24 November, 2023; v1 submitted 24 September, 2023;
originally announced November 2023.
-
Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models
Authors:
Shengzhe Zhou,
Zejian Lee,
Shengyuan Zhang,
Lefan Hou,
Changyuan Yang,
Guang Yang,
Zhiyuan Yang,
Lingyun Sun
Abstract:
Denoising Diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process but causes degraded generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model. Accordingly, we propose the Spatial Fitting-Error Reduction Distillation model (SFERD). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models. Project link: https://github.com/Sainzerjj/SFERD.
Submitted 21 December, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine
Authors:
Guoxing Yang,
Jianyu Shi,
Zan Wang,
Xiaohong Liu,
Guangyu Wang
Abstract:
Pre-training and fine-tuning have emerged as a promising paradigm across various natural language processing (NLP) tasks. The effectiveness of pretrained large language models (LLMs) has been further enhanced, holding potential for applications in the field of medicine, particularly in the context of Traditional Chinese Medicine (TCM). However, the application of these general models to specific domains often yields suboptimal results, primarily due to challenges like lack of domain knowledge, unique objectives, and computational efficiency. Furthermore, their effectiveness in specialized domains, such as Traditional Chinese Medicine, requires comprehensive evaluation. To address the above issues, we propose a novel domain-specific TCMDA (TCM Domain Adaptation) approach: efficient pre-training with a domain-specific corpus. Specifically, we first construct a large TCM-specific corpus, TCM-Corpus-1B, by identifying domain keywords and retrieving from a general corpus. Then, our TCMDA leverages LoRA, which freezes the pretrained model's weights and uses rank decomposition matrices to efficiently train specific dense layers for pre-training and fine-tuning, efficiently aligning the model with TCM-related tasks; the resulting model is named TCM-GPT-7B. We further conducted extensive experiments on two TCM tasks, including TCM examination and TCM diagnosis. TCM-GPT-7B achieved the best performance across both datasets, outperforming other models by relative increments of 17% and 12% in accuracy, respectively. To the best of our knowledge, our study represents the pioneering validation of domain adaptation of a large language model with 7 billion parameters in the TCM domain. We will release both TCM-Corpus-1B and the TCM-GPT-7B model once accepted to facilitate interdisciplinary development in TCM and NLP, serving as the foundation for further study.
Submitted 3 November, 2023;
originally announced November 2023.
-
Dynamic Multimodal Information Bottleneck for Multimodality Classification
Authors:
Yingying Fang,
Shuang Wu,
Sheng Zhang,
Chaoyan Huang,
Tieyong Zeng,
Xiaodan Xing,
Simon Walsh,
Guang Yang
Abstract:
Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing features across different modalities. These approaches are generally not optimal for clinical settings, which pose the additional challenges of limited training data, as well as being rife with redundant data or noisy modality channels, leading to subpar performance. To address this gap, we study the robustness of existing methods to data redundancy and noise and propose a generalized dynamic multimodal information bottleneck framework for attaining a robust fused feature representation. Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noise in the fused feature, and we further introduce a sufficiency loss to prevent dropping of task-relevant information, thus explicitly preserving the sufficiency of prediction information in the distilled feature. We validate our model on an in-house and a public COVID19 dataset for mortality prediction as well as two public biomedical datasets for diagnostic tasks. Extensive experiments show that our method surpasses the state-of-the-art and is significantly more robust, being the only method to maintain performance when large-scale noisy channels exist. Our code is publicly available at https://github.com/ayanglab/DMIB.
Submitted 25 November, 2023; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Hidden-charm pentaquarks with strangeness in a chiral quark model
Authors:
Gang Yang,
Jialun Ping,
Jorge Segovia
Abstract:
The LHCb collaboration has recently announced the discovery of two hidden-charm pentaquark states that also contain strange quarks, $P_{cs}(4338)$ and $P_{cs}(4459)$; its analysis points towards both hadrons having isospin zero and spin-parity quantum numbers $\frac12^-$ and $\frac32^-$, respectively. We perform herein a systematic investigation of the $qqsc\bar{c}$ $(q=u,\,d)$ system by means of a chiral quark model, along with a highly accurate computational method, the Gaussian expansion approach combined with the complex-scaling technique. Baryon-meson configurations in both singlet- and hidden-color channels are considered. The $P_{cs}(4338)$ and $P_{cs}(4459)$ signals can be well identified as molecular bound states with dominant components $ΛJ/ψ$ $(60\%)$ and $Ξ_c D$ $(23\%)$ for the lowest-energy case and $Ξ_c D^*$ $(72\%)$ for the highest-energy one. Besides, it seems that some narrow resonances can also be found in each allowed $I(J^P)$-channel in the energy region of $4.6-5.5$ GeV, except for the $1(\frac12^-)$ where a shallow bound state with dominant $Ξ^*_c D^*$ structure is obtained at $4673$ MeV with binding energy $E_B=-3$ MeV. These exotic states are expected to be confirmed in future high energy experiments.
Submitted 2 November, 2023;
originally announced November 2023.
-
Is GPT Powerful Enough to Analyze the Emotions of Memes?
Authors:
Jingjing Wang,
Joshua Luo,
Grace Yang,
Allen Hong,
Feng Luo
Abstract:
Large Language Models (LLMs), representing a significant achievement in artificial intelligence (AI) research, have demonstrated their ability in a multitude of tasks. This project aims to explore the capabilities of GPT-3.5, a leading example of LLMs, in processing the sentiment analysis of Internet memes. Memes, which include both verbal and visual aspects, act as a powerful yet complex tool for expressing ideas and sentiments, demanding an understanding of societal norms and cultural contexts. Notably, the detection and moderation of hateful memes pose a significant challenge due to their implicit offensive nature. This project investigates GPT's proficiency in such subjective tasks, revealing its strengths and potential limitations. The tasks include the classification of meme sentiment, determination of humor type, and detection of implicit hate in memes. The performance evaluation, using datasets from SemEval-2020 Task 8 and Facebook hateful memes, offers a comparative understanding of GPT responses against human annotations. Despite GPT's remarkable progress, our findings underscore the challenges faced by these models in handling subjective tasks, which are rooted in their inherent limitations including contextual understanding, interpretation of implicit meanings, and data biases. This research contributes to the broader discourse on the applicability of AI in handling complex, context-dependent tasks, and offers valuable insights for future advancements.
Submitted 31 October, 2023;
originally announced November 2023.
-
NeRF Revisited: Fixing Quadrature Instability in Volume Rendering
Authors:
Mikaela Angelina Uy,
Kiyohiro Nakayama,
Guandao Yang,
Rahul Krishna Thomas,
Leonidas Guibas,
Ke Li
Abstract:
Neural radiance fields (NeRF) rely on volume rendering to synthesize novel views. Volume rendering requires evaluating an integral along each ray, which is numerically approximated with a finite sum that corresponds to the exact integral along the ray under piecewise constant volume density. As a consequence, the rendered result is unstable w.r.t. the choice of samples along the ray, a phenomenon that we dub quadrature instability. We propose a mathematically principled solution by reformulating the sample-based rendering equation so that it corresponds to the exact integral under piecewise linear volume density. This simultaneously resolves multiple issues: conflicts between samples along different rays, imprecise hierarchical sampling, and non-differentiability of quantiles of ray termination distances w.r.t. model parameters. We demonstrate several benefits over the classical sample-based rendering equation, such as sharper textures, better geometric reconstruction, and stronger depth supervision. Our proposed formulation can also be used as a drop-in replacement to the volume rendering equation of existing NeRF-based methods. Our project page can be found at pl-nerf.github.io.
Submitted 19 January, 2024; v1 submitted 31 October, 2023;
originally announced October 2023.
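The classical sample-based rendering sum mentioned in the NeRF abstract above can be sketched as follows. This is only an illustration of the standard piecewise-constant quadrature (weights $w_i = T_i(1 - e^{-σ_i δ_i})$ with transmittance $T_i = e^{-\sum_{j<i} σ_j δ_j}$), not the piecewise-linear formulation the authors propose; the function name, array shapes and sample values are assumptions made for this example.

```python
import numpy as np

def render_piecewise_constant(sigma, color, t):
    """Classical volume-rendering quadrature along one ray.

    sigma: (N,) density assumed constant on each interval [t[i], t[i+1]]
    color: (N, 3) colour assigned to each interval
    t:     (N + 1,) increasing sample positions along the ray
    """
    delta = np.diff(t)                      # interval lengths
    tau = sigma * delta                     # optical depth of each interval
    alpha = 1.0 - np.exp(-tau)              # opacity of each interval
    # Transmittance up to the start of interval i: exp(-sum of earlier tau).
    trans = np.exp(-np.concatenate(([0.0], np.cumsum(tau)[:-1])))
    weights = trans * alpha
    return weights @ color, weights

# Hypothetical ray: uniform density 2.0 on [0, 1], white colour, 4 intervals.
t = np.linspace(0.0, 1.0, 5)
sigma = np.full(4, 2.0)
color = np.ones((4, 3))
rgb, weights = render_piecewise_constant(sigma, color, t)
```

For a truly constant density the weights telescope to $1 - e^{-\int σ}$ regardless of where the samples fall; the sensitivity to sample placement described in the abstract appears once the density varies within an interval.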
-
High-Resolution Reference Image Assisted Volumetric Super-Resolution of Cardiac Diffusion Weighted Imaging
Authors:
Yinzhe Wu,
Jiahao Huang,
Fanwen Wang,
Pedro Ferreira,
Andrew Scott,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
Diffusion Tensor Cardiac Magnetic Resonance (DT-CMR) is the only in vivo method to non-invasively examine the microstructure of the human heart. Current research in DT-CMR aims to improve the understanding of how the cardiac microstructure relates to the macroscopic function of the healthy heart as well as how microstructural dysfunction contributes to disease. To get the final DT-CMR metrics, we need to acquire diffusion weighted images of at least 6 directions. However, due to DWI's low signal-to-noise ratio, the standard voxel size is quite big on the scale for microstructures. In this study, we explored the potential of deep-learning-based methods in improving the image quality volumetrically (x4 in all dimensions). This study proposed a novel framework to enable volumetric super-resolution, with an additional model input of high-resolution b0 DWI. We demonstrated that the additional input could offer higher super-resolved image quality. Going beyond, the model is also able to super-resolve DWIs of unseen b-values, proving the model framework's generalizability for cardiac DWI superresolution. In conclusion, we would then recommend giving the model a high-resolution reference image as an additional input to the low-resolution image for training and inference to guide all super-resolution frameworks for parametric imaging where a reference image is available.
Submitted 31 October, 2023;
originally announced October 2023.
-
The Missing U for Efficient Diffusion Models
Authors:
Sergio Calvo-Ordonez,
Chun-Wun Cheng,
Jiahao Huang,
Lipei Zhang,
Guang Yang,
Carola-Bibiane Schonlieb,
Angelica I Aviles-Rivero
Abstract:
Diffusion Probabilistic Models stand as a critical tool in generative modelling, enabling the generation of complex data distributions. This family of generative models yields record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs. In this paper, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with Denoising Diffusion Probabilistic Models (DDPMs), our framework operates with approximately a quarter of the parameters, and $\sim$ 30\% of the Floating Point Operations (FLOPs) compared to standard U-Nets in DDPMs. Furthermore, our model is notably faster in inference than the baseline when measured in fair and equal conditions. We also provide a mathematical intuition as to why our proposed reverse process is faster as well as a mathematical discussion of the empirical tradeoffs in the denoising downstream task. Finally, we argue that our method is compatible with existing performance enhancement techniques, enabling further improvements in efficiency, quality, and speed.
Submitted 5 April, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Assessing and Improving Syntactic Adversarial Robustness of Pre-trained Models for Code Translation
Authors:
Guang Yang,
Yu Zhou,
Xiangyu Zhang,
Xiang Chen,
Tingting Han,
Taolue Chen
Abstract:
Context: Pre-trained models (PTMs) have demonstrated significant potential in automatic code translation. However, the vulnerability of these models in translation tasks, particularly in terms of syntax, has not been extensively investigated. Objective: To fill this gap, our study aims to propose a novel approach CoTR to assess and improve the syntactic adversarial robustness of PTMs in code translation. Method: CoTR consists of two components: CoTR-A and CoTR-D. CoTR-A generates adversarial examples by transforming programs, while CoTR-D proposes a semantic distance-based sampling data augmentation method and adversarial training method to improve the model's robustness and generalization capabilities. The Pass@1 metric is used by CoTR to assess the performance of PTMs, which is more suitable for code translation tasks and offers a more precise evaluation in real world scenarios. Results: The effectiveness of CoTR is evaluated through experiments on real world Java to Python datasets. The results demonstrate that CoTR-A can significantly reduce the performance of existing PTMs, while CoTR-D effectively improves the robustness of PTMs. Conclusion: Our study identifies the limitations of current PTMs, including large language models, in code translation tasks. It highlights the potential of CoTR as an effective solution to enhance the robustness of PTMs for code translation tasks.
Submitted 28 October, 2023;
originally announced October 2023.
-
Data-Free Distillation Improves Efficiency and Privacy in Federated Thorax Disease Analysis
Authors:
Ming Li,
Guang Yang
Abstract:
Thorax disease analysis in large-scale, multi-centre, and multi-scanner settings is often limited by strict privacy policies. Federated learning (FL) offers a potential solution, while traditional parameter-based FL can be limited by issues such as high communication costs, data leakage, and heterogeneity. Distillation-based FL can improve efficiency, but it relies on a proxy dataset, which is often impractical in clinical practice. To address these challenges, we introduce a data-free distillation-based FL approach FedKDF. In FedKDF, the server employs a lightweight generator to aggregate knowledge from different clients without requiring access to their private data or a proxy dataset. FedKDF combines the predictors from clients into a single, unified predictor, which is further optimized using the learned knowledge in the lightweight generator. Our empirical experiments demonstrate that FedKDF offers a robust solution for efficient, privacy-preserving federated thorax disease analysis.
Submitted 31 October, 2023; v1 submitted 22 October, 2023;
originally announced October 2023.
-
A Spectral Condition for Feature Learning
Authors:
Greg Yang,
James B. Simon,
Jeremy Bernstein
Abstract:
The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to widely used but heuristic scalings based on Frobenius norm and entry size. Our spectral scaling analysis also leads to an elementary derivation of \emph{maximal update parametrization}. All in all, we aim to provide the reader with a solid conceptual understanding of feature learning in neural networks.
Submitted 13 May, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
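The spectral scaling stated in the abstract above can be illustrated with a short sketch that rescales a random weight matrix so its spectral norm matches $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$. The abstract only says the norm should scale "like" this quantity; fixing the constant to exactly 1, the helper name `spectral_init` and the layer sizes are assumptions made for illustration.

```python
import numpy as np

def spectral_init(fan_in, fan_out, rng):
    """Gaussian matrix rescaled to spectral norm sqrt(fan_out / fan_in).

    The proportionality constant is set to 1 here as an assumption.
    """
    w = rng.standard_normal((fan_out, fan_in))
    sigma_max = np.linalg.norm(w, ord=2)    # largest singular value
    return w * (np.sqrt(fan_out / fan_in) / sigma_max)

rng = np.random.default_rng(0)
w = spectral_init(fan_in=256, fan_out=64, rng=rng)
spectral_norm = np.linalg.norm(w, ord=2)    # sqrt(64 / 256) = 0.5
```

Controlling the largest singular value, rather than the Frobenius norm or the typical entry size, is the contrast the abstract draws with heuristic scalings.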
-
Towards Matching Phones and Speech Representations
Authors:
Gene-Ping Yang,
Hao Tang
Abstract:
Learning phone types from phone instances has been a long-standing problem, while still being open. In this work, we revisit this problem in the context of self-supervised learning, and pose it as the problem of matching cluster centroids to phone embeddings. We study two key properties that enable matching, namely, whether cluster centroids of self-supervised representations reduce the variability of phone instances and respect the relationship among phones. We then use the matching result to produce pseudo-labels and introduce a new loss function for improving self-supervised representations. Our experiments show that the matching result captures the relationship among phones. Training the new loss function jointly with the regular self-supervised losses, such as APC and CPC, significantly improves the downstream phone classification.
Submitted 26 October, 2023;
originally announced October 2023.
-
Robust Source-Free Domain Adaptation for Fundus Image Segmentation
Authors:
Lingrui Li,
Yanfeng Zhou,
Ge Yang
Abstract:
Unsupervised Domain Adaptation (UDA) is a learning technique that transfers knowledge learned in the source domain from labelled training data to the target domain with only unlabelled data. It is of significant importance to medical image segmentation because of the usual lack of labelled training data. Although extensive efforts have been made to optimize UDA techniques to improve the accuracy of segmentation models in the target domain, few studies have addressed the robustness of these models under UDA. In this study, we propose a two-stage training strategy for robust domain adaptation. In the source training stage, we utilize adversarial sample augmentation to enhance the robustness and generalization capability of the source model. And in the target training stage, we propose a novel robust pseudo-label and pseudo-boundary (PLPB) method, which effectively utilizes unlabeled target data to generate pseudo labels and pseudo boundaries that enable model self-adaptation without requiring source data. Extensive experimental results on cross-domain fundus image segmentation confirm the effectiveness and versatility of our method. Source code of this study is openly accessible at https://github.com/LinGrayy/PLPB.
Submitted 25 October, 2023;
originally announced October 2023.
-
Galaxies Going Bananas: Inferring the 3D Geometry of High-Redshift Galaxies with JWST-CEERS
Authors:
Viraj Pandya,
Haowen Zhang,
Marc Huertas-Company,
Kartheik G. Iyer,
Elizabeth McGrath,
Guillermo Barro,
Steven L. Finkelstein,
Martin Kuemmel,
William G. Hartley,
Henry C. Ferguson,
Jeyhan S. Kartaltepe,
Joel Primack,
Avishai Dekel,
Sandra M. Faber,
David C. Koo,
Greg L. Bryan,
Rachel S. Somerville,
Ricardo O. Amorin,
Pablo Arrabal Haro,
Micaela B. Bagley,
Eric F. Bell,
Emmanuel Bertin,
Luca Costantin,
Romeel Dave,
Mark Dickinson
, et al. (31 additional authors not shown)
Abstract:
The 3D geometry of high-redshift galaxies remains poorly understood. We build a differentiable Bayesian model and use Hamiltonian Monte Carlo to efficiently and robustly infer the 3D shapes of star-forming galaxies in JWST-CEERS observations with $\log M_*/M_{\odot}=9.0-10.5$ at $z=0.5-8.0$. We reproduce previous results from HST-CANDELS in a fraction of the computing time and constrain the mean ellipticity, triaxiality, size and covariances with samples as small as $\sim50$ galaxies. We find high 3D ellipticities for all mass-redshift bins suggesting oblate (disky) or prolate (elongated) geometries. We break that degeneracy by constraining the mean triaxiality to be $\sim1$ for $\log M_*/M_{\odot}=9.0-9.5$ dwarfs at $z>1$ (favoring the prolate scenario), with significantly lower triaxialities for higher masses and lower redshifts indicating the emergence of disks. The prolate population traces out a ``banana'' in the projected $b/a-\log a$ diagram with an excess of low $b/a$, large $\log a$ galaxies. The dwarf prolate fraction rises from $\sim25\%$ at $z=0.5-1.0$ to $\sim50-80\%$ at $z=3-8$. If these are disks, they cannot be axisymmetric but instead must be unusually oval (triaxial) unlike local circular disks. We simultaneously constrain the 3D size-mass relation and its dependence on 3D geometry. High-probability prolate and oblate candidates show remarkably similar Sérsic indices ($n\sim1$), non-parametric morphological properties and specific star formation rates. Both tend to be visually classified as disks or irregular but edge-on oblate candidates show more dust attenuation. We discuss selection effects, follow-up prospects and theoretical implications.
Submitted 15 January, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Spectral-Efficiency and Energy-Efficiency of Variable-Length XP-HARQ
Authors:
Jiahui Feng,
Zheng Shi,
Yaru Fu,
Hong Wang,
Guanghua Yang,
Shaodan Ma
Abstract:
A variable-length cross-packet hybrid automatic repeat request (VL-XP-HARQ) is proposed to boost the spectral efficiency (SE) and the energy efficiency (EE) of communications. The SE is firstly derived in terms of the outage probabilities, with which the SE is proved to be upper bounded by the ergodic capacity (EC). Moreover, to facilitate the maximization of the SE, the asymptotic outage probability is obtained at high signal-to-noise ratio (SNR), with which the SE is maximized by properly choosing the number of new information bits while guaranteeing outage requirement. By applying Dinkelbach's transform, the fractional objective function is transformed into a subtraction form, which can be decomposed into multiple sub-problems through alternating optimization. By noticing that the asymptotic outage probability is a convex function, each sub-problem can be easily relaxed to a convex problem by adopting successive convex approximation (SCA). Besides, the EE of VL-XP-HARQ is also investigated. An upper bound of the EE is found and proved to be attainable. Furthermore, by aiming at maximizing the EE via power allocation while confining outage within a certain constraint, the methods to the maximization of SE are invoked to solve the similar fractional problem. Finally, numerical results are presented for verification.
Submitted 16 October, 2023;
originally announced October 2023.
-
Exploiting User Comments for Early Detection of Fake News Prior to Users' Commenting
Authors:
Qiong Nan,
Qiang Sheng,
Juan Cao,
Yongchun Zhu,
Danding Wang,
Guang Yang,
Jintao Li,
Kai Shu
Abstract:
Both accuracy and timeliness are key factors in detecting fake news on social media. However, most existing methods encounter an accuracy-timeliness dilemma: content-only methods guarantee timeliness but perform moderately because of limited available information, while social context-based ones generally perform better but inevitably incur latency because social context takes time to accumulate. To break this dilemma, a feasible but not well-studied solution is to leverage social contexts (e.g., comments) from historical news to train a detection model and apply it to newly emerging news without social contexts. This requires the model to (1) sufficiently learn helpful knowledge from social contexts, and (2) remain compatible with situations in which social contexts may or may not be available. To achieve this goal, we propose to absorb and parameterize useful knowledge from comments in historical news and then inject it into a content-only detection model. Specifically, we design the Comments Assisted Fake News Detection method (CAS-FEND), which transfers useful knowledge from a comments-aware teacher model to a content-only student model during training. The student model is then used to detect newly emerging fake news. Experiments show that the CAS-FEND student model outperforms all content-only methods and even those with 1/4 of the comments as inputs, demonstrating its superiority for early detection.
Submitted 16 October, 2023;
originally announced October 2023.
-
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Authors:
Open X-Embodiment Collaboration,
Abby O'Neill,
Abdul Rehman,
Abhinav Gupta,
Abhiram Maddukuri,
Abhishek Gupta,
Abhishek Padalkar,
Abraham Lee,
Acorn Pooley,
Agrim Gupta,
Ajay Mandlekar,
Ajinkya Jain,
Albert Tung,
Alex Bewley,
Alex Herzog,
Alex Irpan,
Alexander Khazatsky,
Anant Rai,
Anchit Gupta,
Andrew Wang,
Andrey Kolobov,
Anikait Singh,
Animesh Garg,
Aniruddha Kembhavi,
Annie Xie
, et al. (267 additional authors not shown)
Abstract:
Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
Submitted 1 June, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection
Authors:
Zehao Wang,
Yiwen Guo,
Qizhang Li,
Guanglei Yang,
Wangmeng Zuo
Abstract:
Data augmentation is a dominant method for reducing model overfitting and improving generalization. Most existing data augmentation methods tend to find a compromise in augmenting the data, i.e., increasing the amplitude of augmentation carefully to avoid degrading some data too much and harming model performance. We delve into the relationship between data augmentation and model performance, revealing that the performance drop with heavy augmentation comes from the presence of out-of-distribution (OOD) data. Nonetheless, as the same data transformation has different effects on different training samples, even under heavy augmentation there remains a portion of in-distribution data that is beneficial to model training. Based on this observation, we propose a novel data augmentation method, named DualAug, to keep the augmentation in distribution as much as possible at a reasonable time and computational cost. We design a data mixing strategy to fuse augmented data from both the basic- and the heavy-augmentation branches. Extensive experiments on supervised image classification benchmarks show that DualAug improves various automated data augmentation methods. Moreover, experiments on semi-supervised learning and contrastive self-supervised learning demonstrate that DualAug can also improve related methods. Code is available at https://github.com/shuguang99/DualAug.
Submitted 15 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
CEERS: 7.7 $μ$m PAH Star Formation Rate Calibration with JWST MIRI
Authors:
Kaila Ronayne,
Casey Papovich,
Guang Yang,
Lu Shen,
Mark Dickinson,
Robert Kennicutt,
Anahita Alavi,
Pablo Arrabal Haro,
Micaela Bagley,
Denis Burgarella,
Aurélien Le Bail,
Eric Bell,
Nikko Cleri,
Justin Cole,
Luca Costantin,
Alexander de la Vega,
Emanuele Daddi,
David Elbaz,
Steven Finkelstein,
Norman Grogin,
Benne Holwerda,
Jeyhan Kartaltepe,
Allison Kirkpatrick,
Anton Koekemoer,
Ray Lucas
, et al. (11 additional authors not shown)
Abstract:
We test the relationship between UV-derived star formation rates (SFRs) and the 7.7 $μ$m polycyclic aromatic hydrocarbon (PAH) luminosities from the integrated emission of galaxies at z ~ 0 - 2. We utilize multi-band photometry covering 0.2 - 160 $μ$m from HST, CFHT, JWST, Spitzer, and Herschel for galaxies in the Cosmic Evolution Early Release Science (CEERS) Survey. We perform spectral energy distribution (SED) modeling of these data to measure dust-corrected far-UV (FUV) luminosities, $L_{FUV}$, and UV-derived SFRs. We then fit SED models to the JWST/MIRI 7.7 - 21 $μ$m CEERS data to derive rest-frame 7.7 $μ$m luminosities, $L_{770}$, using the average flux density in the rest-frame MIRI F770W bandpass. We observe a correlation between $L_{770}$ and $L_{FUV}$, where log $L_{770}$ is proportional to (1.27+/-0.04) log $L_{FUV}$. $L_{770}$ diverges from this relation for galaxies at lower metallicities, lower dust obscuration, and for galaxies dominated by evolved stellar populations. We derive a "single-wavelength" SFR calibration for $L_{770}$ which has a scatter from model estimated SFRs (${σ_{ΔSFR}}$) of 0.24 dex. We derive a "multi-wavelength" calibration for the linear combination of the observed FUV luminosity (uncorrected for dust) and the rest-frame 7.7 $μ$m luminosity, which has a scatter of ${σ_{ΔSFR}}$ = 0.21 dex. The relatively small decrease in $σ$ suggests this is near the systematic accuracy of the total SFRs using either calibration. These results demonstrate that the rest-frame 7.7 $μ$m emission constrained by JWST/MIRI is a tracer of the SFR for distant galaxies to this accuracy, provided the galaxies are dominated by star-formation with moderate-to-high levels of attenuation and metallicity.
Submitted 13 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Dual Radar: A Multi-modal Dataset with Dual 4D Radar for Autonomous Driving
Authors:
Xinyu Zhang,
Li Wang,
Jian Chen,
Cheng Fang,
Lei Yang,
Ziying Song,
Guangqi Yang,
Yichen Wang,
Xiaofei Zhang,
Jun Li,
Zhiwei Li,
Qingshan Yang,
Zhenlin Zhang,
Shuzhi Sam Ge
Abstract:
Radar has stronger adaptability in adverse scenarios for autonomous driving environmental perception compared to widely adopted cameras and LiDARs. Compared with commonly used 3D radars, the latest 4D radars have precise vertical resolution and higher point cloud density, making them highly promising sensors for autonomous driving perception in complex environments. However, because their noise is much higher than that of LiDAR, manufacturers choose different filtering strategies, resulting in an inverse relationship between noise level and point cloud density. There is still a lack of comparative analysis on which method is beneficial for deep learning-based perception algorithms in autonomous driving. One of the main reasons is that current datasets only adopt one type of 4D radar, making it difficult to compare different 4D radars in the same scene. Therefore, in this paper, we introduce a novel large-scale multi-modal dataset featuring, for the first time, two types of 4D radars captured simultaneously. This dataset enables further research into effective 4D radar perception algorithms. Our dataset consists of 151 consecutive series, most of which last 20 seconds, and contains 10,007 meticulously synchronized and annotated frames. Moreover, our dataset captures a variety of challenging driving scenarios, including many road conditions, weather conditions, and nighttime and daytime scenes with different lighting intensities and periods. Our dataset annotates consecutive frames, which can be applied to 3D object detection and tracking, and also supports the study of multi-modal tasks. We experimentally validate our dataset, providing valuable results for studying different types of 4D radars. This dataset is released on https://github.com/adept-thu/Dual-Radar.
Submitted 9 November, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
On friendship and cyclic parking functions
Authors:
Yujia Kang,
Thomas Selig,
Guanyi Yang,
Yanting Zhang,
Haoyue Zhu
Abstract:
In parking problems, a given number of cars enter a one-way street sequentially, and try to park according to a specified preferred spot in the street. Various models are possible depending on the chosen rule for collisions, when two cars have the same preferred spot. In classical parking functions, if a car's preferred spot is already occupied by a previous car, it drives forward and looks for the first unoccupied spot to park. In this work, we introduce a variant of classical parking functions, called "friendship parking functions", which imposes additional restrictions on where cars can park. Namely, a car can only end up parking next to cars which are its friends (friendship will correspond to adjacency in an underlying graph). We characterise and enumerate such friendship parking functions according to their outcome permutation, which describes the final configuration when all cars have parked. We apply this to the case where the underlying friendship graph is the cycle graph. Finally, we consider a subset of classical parking functions, called "cyclic parking functions", where cars end up in an increasing cyclic order. We enumerate these cyclic parking functions and exhibit a bijection to permutation components.
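The classical parking rule described in this abstract (drive forward to the first unoccupied spot) can be sketched as follows. This covers only classical parking functions, not the friendship or cyclic variants; the function names and the convention of returning the outcome as a list of cars indexed by spot are assumptions made for illustration.

```python
from itertools import product

def park(preferences):
    """Park cars 1..n in order; return the car in each spot, or None on failure."""
    n = len(preferences)
    street = [None] * n
    for car, pref in enumerate(preferences, start=1):
        spot = pref - 1  # preferences are 1-indexed spots
        # Drive forward past occupied spots.
        while spot < n and street[spot] is not None:
            spot += 1
        if spot == n:
            return None  # the car leaves the street without parking
        street[spot] = car
    return street

def count_parking_functions(n):
    """Count preference sequences of length n for which every car parks."""
    all_prefs = product(range(1, n + 1), repeat=n)
    return sum(park(p) is not None for p in all_prefs)
```

For example, `park((2, 1, 1))` succeeds, while `park((2, 2, 3))` returns `None` because the third car finds spot 3 taken and has nowhere further to go. The brute-force count matches the known formula (n+1)^(n-1) for classical parking functions.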
Submitted 4 January, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Small-Signal Stability and SCR Enhancement of Offshore WPPs with Synchronous Condensers
Authors:
Sulav Ghimire,
Kanakesh V. Kkuni,
Emerson D. Guest,
Kim H. Jensen,
Guangya Yang
Abstract:
Synchronous condensers (SCs) have been reported to improve the overall stability and short-circuit power of a power system. SCs are also being integrated into offshore wind power plants (WPPs) for the same reason. This paper investigates the effect of synchronous condensers on an offshore wind power plant with grid-following (GFL) and grid-forming (GFM) converter controls. Primarily, the effect of synchronous condensers can be two-fold: (1) overall stability enhancement of the WPP by providing reactive power support, and (2) contribution to the effective short-circuit ratio (SCR) of the WPP by fault current support. Therefore, this paper focuses on studies concerning these effects on an aggregated model of a WPP connected to the grid. To that end, a state-space model of the test system is developed for small-signal stability assessment and for evaluating the synchronous condenser's effect on stability. In addition, a mathematical explanation of SCR enhancement with a synchronous condenser is provided and is verified with time-domain electromagnetic transient simulations.
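To make the SCR contribution concrete, here is a minimal textbook-style sketch, not the paper's derivation. It assumes per-unit quantities on the plant base, nominal voltage of 1.0 pu, negligible resistances, and a synchronous condenser modelled as a voltage source behind a single reactance connected at the same bus; the function name and the reactance values are hypothetical.

```python
def short_circuit_ratio(x_grid_pu, rated_power_pu=1.0, x_sc_pu=None):
    """SCR = short-circuit power at the connection point / rated plant power."""
    # At 1.0 pu voltage, the short-circuit power of a reactance X is 1 / X.
    s_sc = 1.0 / x_grid_pu
    if x_sc_pu is not None:
        # The condenser adds a parallel fault-current path at the same bus.
        s_sc += 1.0 / x_sc_pu
    return s_sc / rated_power_pu

# Hypothetical weak grid (X = 0.5 pu) with and without a condenser (X = 0.25 pu).
scr_without = short_circuit_ratio(0.5)
scr_with = short_circuit_ratio(0.5, x_sc_pu=0.25)
```

Under these assumptions the condenser raises the effective SCR from 2 to 6, since the two short-circuit contributions simply add.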
Submitted 30 January, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
High Accuracy and Cost-Saving Active Learning 3D WD-UNet for Airway Segmentation
Authors:
Shiyi Wang,
Yang Nan,
Simon Walsh,
Guang Yang
Abstract:
We propose a novel Deep Active Learning (DeepAL) model, the 3D Wasserstein Discriminative UNet (WD-UNet), for reducing the annotation effort of medical 3D Computed Tomography (CT) segmentation. The proposed WD-UNet learns in a semi-supervised way and accelerates learning convergence to meet or exceed the prediction metrics of supervised learning models. Our method can be embedded with different Active Learning (AL) strategies and different network structures. The model is evaluated on 3D lung airway CT scans for medical segmentation, and we show that the use of an uncertainty metric, which is parametrized as an input of the query strategy, leads to more accurate prediction results than some state-of-the-art Deep Learning (DL) supervised models, e.g., 3DUNet and 3D CEUNet. Compared to these supervised DL methods, our WD-UNet not only saves the cost of annotation for radiologists but also saves computational resources. WD-UNet uses a limited amount of annotated data (35% of the total) to achieve better predictive metrics with a more efficient deep learning model.
Submitted 9 October, 2023;
originally announced October 2023.
-
Three-Sensor 2ω Method with Multi-directional Layout: A General Methodology for Measuring Thermal Conductivity of Solid Materials
Authors:
Guang Yang,
Bing-yang Cao
Abstract:
Anisotropic thermal transport plays a key role in both theoretical study and engineering practice of heat transfer, but accurately measuring anisotropic thermal conductivity remains a significant challenge. To address this issue, we propose the three-sensor 2ω method in this study, which is capable of accurately measuring the isotropic or anisotropic thermal conductivity of solid materials. In this method, several three-sensor groups following the design guidelines are fabricated upon the sample along different characteristic directions, and each group consists of three parallel metal sensors with unequal widths and distances optimally designed based on sensitivity analysis. Among the three sensors, the outer two serve as AC heaters and the middle one as a DC detector. The 2ω voltage signals across the detector in each three-sensor group are measured, and then the data are processed by the proposed Intersection Method to derive the thermal conductivities along directions of interest. The application of the detector's 2ω instead of the heater's 3ω voltage signals eliminates the errors introduced by the uncertainties of thermal resistance in superficial structures (metal layer, insulation layer, interface, etc.). Meanwhile, by replacing the fitting algorithm with the Intersection Method, the local optimum trap of multivariate fitting is avoided. To verify the accuracy and reliability, four typical monocrystalline semiconductors, i.e., Si, GaN, AlN, and $β$-Ga$_2$O$_3$, are measured, and the results are consistent with the literature. This method will provide a comprehensive and versatile solution for the thermal conductivity measurements of solid materials.
Submitted 4 October, 2023;
originally announced October 2023.
-
Probabilistic Method to Fundamental gap problems on the sphere
Authors:
Gunhee Cho,
Guofang Wei,
Guang Yang
Abstract:
We provide a probabilistic proof of the fundamental gap estimate for Schrödinger operators in convex domains on the sphere, which extends the probabilistic proof of F. Gong, H. Li, and D. Luo for the Euclidean case. Our results further generalize the results achieved for the Laplacian by S. Seto, L. Wang, and G. Wei, as well as by C. He, G. Wei, and Qi S. Zhang. The essential ingredient in our analysis is the reflection coupling method on Riemannian manifolds.
Submitted 4 October, 2023;
originally announced October 2023.
-
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
Authors:
Greg Yang,
Dingli Yu,
Chen Zhu,
Soufiane Hayou
Abstract:
By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $μ$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for *depthwise parametrizations* of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-$μ$P, that extends $μ$P, and we show empirically that it admits depthwise hyperparameter transfer. We identify *feature diversity* as a crucial factor in deep networks, and Depth-$μ$P can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (as in modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as on a Megatron transformer trained on Common Crawl.
Submitted 12 October, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Lyfe Agents: Generative agents for low-cost real-time social interactions
Authors:
Zhao Kaiya,
Michelangelo Naim,
Jovana Kondic,
Manuel Cortes,
Jiaxin Ge,
Shuying Luo,
Guangyu Robert Yang,
Andrew Ahn
Abstract:
Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enable Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.
Submitted 3 October, 2023;
originally announced October 2023.
-
Grid-Forming Control Methods for Weakly Connected Offshore WPPs
Authors:
Sulav Ghimire,
Kanakesh V Kkuni,
Simon C Jakobsen,
Thyge Knueppel,
Kim H Jensen,
Emerson Guest,
Tonny W Rasmussen,
Guangya Yang
Abstract:
Grid-forming control (GFC) has seen numerous technological advances in its control types, applications, and the multitude of services it provides. Examples of these services include black start, inertial frequency response, and islanded operation with the possibility of re-synchronization without the need for additional support from other devices such as storage. State-of-the-art literature proposes a variety of GFCs that can provide one or more of these services. However, the study of these different GFCs for weakly connected offshore wind power plants (WPPs), based on time-domain simulation and focusing on large-signal disturbances, is not well covered. This paper reviews some of the most researched grid-forming control methods applicable to offshore WPPs and provides a comparative investigation and discussion of their stability properties and applicability, especially when connected to a weak grid. The paper also discusses the prerequisites and challenges surrounding the comparative study of different GFCs.
Submitted 3 October, 2023;
originally announced October 2023.
-
T1/T2 relaxation temporal modelling from accelerated acquisitions using a Latent Transformer
Authors:
Fanwen Wang,
Michael Tanzer,
Mengyun Qiao,
Wenjia Bai,
Daniel Rueckert,
Guang Yang,
Sonia Nielles-Vallespin
Abstract:
Quantitative cardiac magnetic resonance T1 and T2 mapping enable myocardial tissue characterisation but the lengthy scan times restrict their widespread clinical application. We propose a deep learning method that incorporates a time dependency Latent Transformer module to model relationships between parameterised time frames for improved reconstruction from undersampled data. The module, implemented as a multi-resolution sequence-to-sequence transformer, is integrated into an encoder-decoder architecture to leverage the inherent temporal correlations in relaxation processes. The presented results for accelerated T1 and T2 mapping show the model recovers maps with higher fidelity by explicit incorporation of time dynamics. This work demonstrates the importance of temporal modelling for artifact-free reconstruction in quantitative MRI.
Submitted 28 September, 2023;
originally announced September 2023.
-
Compositional Sculpting of Iterative Generative Processes
Authors:
Timur Garipov,
Sebastiaan De Peuter,
Ge Yang,
Vikas Garg,
Samuel Kaski,
Tommi Jaakkola
Abstract:
High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations, the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$) between pairs, and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.
Submitted 27 September, 2023;
originally announced September 2023.
-
Style Transfer and Self-Supervised Learning Powered Myocardium Infarction Super-Resolution Segmentation
Authors:
Lichao Wang,
Jiahao Huang,
Xiaodan Xing,
Yinzhe Wu,
Ramyah Rajakulasingam,
Andrew D. Scott,
Pedro F Ferreira,
Ranil De Silva,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
This study proposes a pipeline that incorporates a novel style transfer model and a simultaneous super-resolution and segmentation model. The proposed pipeline aims to enhance diffusion tensor imaging (DTI) images by translating them into the late gadolinium enhancement (LGE) domain, which offers a larger amount of data with high resolution and distinct highlighting of myocardium infarction (MI) areas. Subsequently, the segmentation task is performed on the LGE-style image. An end-to-end super-resolution segmentation model is introduced to generate a high-resolution mask from a low-resolution LGE-style DTI image. Further, to enhance the performance of the model, a multi-task self-supervised learning strategy is employed to pre-train the super-resolution segmentation model, allowing it to acquire more representative knowledge and improve its segmentation performance after fine-tuning. https://github.com/wlc2424762917/Med_Img
Submitted 27 September, 2023;
originally announced September 2023.