Search | arXiv e-print repository

Strong decays of the $φ(2170)$ as a fully-strange tetraquark state

Authors: Yi-Wei Jiang, Wei-Han Tan, Hua-Xing Chen, Er-Liang Cui

Abstract: We study strong decays of the $φ(2170)$, along with its possible partner $X(2436)$, as two fully-strange tetraquark states of $J^{PC} = 1^{--}$. We consider seven decay channels: $φη$, $φη^\prime$, $φf_0(980)$, $φf_1(1420)$, $h_1(1415) η$, $h_1(1415) η^\prime$, and $h_1(1415) f_1(1420)$. Some of these channels are kinematically possible, and we calculate their relative branching ratios through the Fierz rearrangement. Future experimental measurements on these ratios can be useful in determining the nature of the $φ(2170)$ and $X(2436)$. The $φ(2170)$ has been observed in the $φf_0(980)$, $φη$, and $φη^\prime$ channels, and we propose to further examine it in the $h_1(1415) η$ channel. Evidence of the $X(2436)$ has been observed in the $φf_0(980)$ channel, and we propose to verify whether this structure exists or not in the $φη$, $φη^\prime$, $h_1(1415) η$, and $h_1(1415) η^\prime$ channels.

Submitted 30 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: 10 pages, 3 figures, 1 table, suggestions and comments are welcome
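
The abstract above notes that only some of the seven channels are kinematically possible. As a rough illustration, a two-body decay is open only when the parent mass exceeds the sum of the daughter masses. The sketch below applies that threshold test using nominal masses read off the particle names and rounded values for the $φ$, $η$, and $η^\prime$. These inputs are assumptions chosen for illustration, not values taken from the paper, and resonance widths are ignored.

```python
# Threshold check for two-body decays: open if M_parent > m1 + m2.
# Masses in MeV are nominal/rounded values (an assumption for illustration).
MASSES_MEV = {
    "phi": 1020,
    "eta": 548,
    "eta'": 958,
    "f0(980)": 980,
    "f1(1420)": 1420,
    "h1(1415)": 1415,
}

# The seven channels listed in the abstract.
CHANNELS = [
    ("phi", "eta"),
    ("phi", "eta'"),
    ("phi", "f0(980)"),
    ("phi", "f1(1420)"),
    ("h1(1415)", "eta"),
    ("h1(1415)", "eta'"),
    ("h1(1415)", "f1(1420)"),
]


def open_channels(parent_mass_mev):
    """Return the channels whose threshold lies below the parent mass."""
    return [
        (a, b)
        for a, b in CHANNELS
        if MASSES_MEV[a] + MASSES_MEV[b] < parent_mass_mev
    ]


if __name__ == "__main__":
    # Parent masses are the nominal values from the state names.
    for name, mass in [("phi(2170)", 2170), ("X(2436)", 2436)]:
        print(name, open_channels(mass))
```

With these nominal inputs the $h_1(1415) η^\prime$ channel opens only for the heavier state, and $φf_1(1420)$ sits a few MeV above the $X(2436)$ mass, so a conclusion about that channel depends on the precise masses and widths used.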

arXiv:2310.10674 [pdf, other]

doi 10.1016/j.jpcs.2024.112148

New pairing mechanism via chiral electron-hole condensation for non-BCS superconductivity

Authors: Wanpeng Tan

Abstract: A novel chiral electron-hole (CEH) pairing mechanism is proposed to account for non-BCS superconductivity. In contrast to BCS Cooper pairs, CEH pairs exhibit a pronounced affinity to antiferromagnetism for superconductivity. The gap equations derived from this new microscopic mechanism are analyzed for both s- and d-wave superconductivity, revealing marked departures from the BCS theory. Unsurprisingly, CEH naturally describes superconductivity in strongly-correlated systems, necessitating an exceedingly large coupling parameter ($λ>1$ for s-wave and $λ>π/2$ for d-wave) to be efficacious. The new mechanism provides a better understanding of various non-BCS features, especially in cuprate and iron-based superconductors. In particular, CEH, through quantitative comparison with experimental data, shows promise in solving long-standing puzzles such as the unexpectedly large gap-to-critical-temperature ratio $Δ_0/T_c$, the lack of gap closure at $T_c$, superconducting phase diagrams, and a non-zero heat-capacity-to-temperature ratio $C/T$ at $T=0$ (i.e., the "anomalous linear term"), along with its quadratic behavior near $T=0$ for d-wave cuprates.

Submitted 8 June, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: 22 pages, 12 figures, 1 table

Journal ref: J. Phys. Chem. Solids 2024

arXiv:2310.10522 [pdf, other]

Observation of GRB 221009A early afterglow in X/$γ$-ray energy band

Authors: Chao Zheng, Yan-Qiu Zhang, Shao-Lin Xiong, Cheng-Kui Li, He Gao, Wang-Chen Xue, Jia-Cong Liu, Chen-Wei Wang, Wen-Jun Tan, Wen-Xi Peng, Zheng-Hua An, Ce Cai, Ming-Yu Ge, Dong-Ya Guo, Yue Huang, Bing Li, Ti-Pei Li, Xiao-Bo Li, Xin-Qiao Li, Xu-Fang Li, **-Yuan Liao, Cong-Zhan Liu, Fang-Jun Lu, Xiang Ma, Rui Qiao , et al. (23 additional authors not shown)

Abstract: The early afterglow of a gamma-ray burst (GRB) can provide critical information on the jet and progenitor of the GRB. The extreme brightness of GRB 221009A allows us to probe its early afterglow in unprecedented detail. In this letter, we report comprehensive observation results of the early afterglow of GRB 221009A (from $T_0$+660 s to $T_0$+1860 s, where $T_0$ is the Insight-HXMT/HE trigger time) in the X/$γ$-ray energy band (from 20 keV to 20 MeV) by Insight-HXMT/HE, GECAM-C and Fermi/GBM. We find that the spectrum of the early afterglow in 20 keV-20 MeV could be well described by a cutoff power-law with an extra power-law, which dominate the low and high energy bands respectively. The cutoff power-law $E_{\rm peak}$ is $\sim$ 30 keV and the power-law photon index is $\sim$ 1.8 throughout the early afterglow phase. By fitting the light curves in different energy bands, we find that a significant achromatic break (from keV to TeV) is required at $T_0$ + 1246$^{+27}_{-26}$ s (i.e. 1021 s since the afterglow starting time $T_{\rm AG}$=$T_0$+225 s), providing compelling evidence of a jet break. Interestingly, both the pre-break and post-break decay slopes vary with energy, and these two slopes become closer in the lower energy band, making the break less identifiable. Intriguingly, the spectrum of the early afterglow experienced a slight hardening before the break and a softening after the break. These results provide new insights into the understanding of this remarkable GRB.

Submitted 19 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted for publication in ApJ Letters on 19-Jan-2024, 11 pages, 7 figures and 2 tables

arXiv:2310.07205 [pdf, other]

Evidence of mini-jet emission in a large emission zone from a magnetically-dominated gamma-ray burst jet

Authors: S. -X. Yi, C. -W. Wang, X. -Y. Shao, R. Moradi, H. Gao, B. Zhang, S. -L. Xiong, S. -N. Zhang, W. -J. Tan, J. -C. Liu, W. -C. Xue, Y. -Q. Zhang, C. Zheng, Y. Wang, P. Zhang, Z. -H. An, C. Cai, P. -Y. Feng, K. Gong, D. -Y. Guo, Y. Huang, B. Li, X. -B. Li, X. -Q. Li, X. -J. Liu , et al. (21 additional authors not shown)

Abstract: The second brightest GRB in history, GRB 230307A, provides an ideal laboratory to study the details of GRB prompt emission thanks to its extraordinarily high photon statistics and its single broad pulse overall shape characterized by an energy-dependent fast-rise-exponential-decay (FRED) profile. Here we demonstrate that its broad pulse is composed of many rapidly variable short pulses, rather than being the superposition of many short pulses on top of a slow component. Such a feature is consistent with the picture of many mini-jets due to local magnetic reconnection events in a large emission zone far from the GRB central engine, as envisaged in the internal-collision-induced magnetic reconnection and turbulence (ICMART) model, but raises a great challenge to the internal shock models that attribute all variability components to collisions among different shells. Since relativistic mini-jets demand strong magnetization in the outflow, this work provides strong evidence for a Poynting-flux-dominated jet composition of this bright GRB.

Submitted 16 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 7 pages and 2 figures in the main text. 27 pages and 9 figures in total

arXiv:2310.04618 [pdf, other]

doi 10.1109/ICCAD57390.2023.10323839

KyberMat: Efficient Accelerator for Matrix-Vector Polynomial Multiplication in CRYSTALS-Kyber Scheme via NTT and Polyphase Decomposition

Authors: Weihang Tan, Yingjie Lao, Keshab K. Parhi

Abstract: CRYSTALS-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process. This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints. Specifically, matrix-vector multiplication and number theoretic transform (NTT)-based polynomial multiplication are critical operations and bottlenecks that require optimization. To address this challenge, we propose an algorithm and hardware co-design approach to systematically optimize matrix-vector multiplication and NTT-based polynomial multiplication by employing a novel sub-structure sharing technique in order to reduce computational complexity, i.e., the number of modular multiplications and modular additions/subtractions consumed. The sub-structure sharing approach is inspired by prior fast parallel approaches based on polyphase decomposition. The proposed efficient feed-forward architecture achieves high speed, low latency, and full utilization of all hardware components, which can significantly enhance the overall efficiency of the Kyber scheme. The FPGA implementation results show that our proposed design, using the fast two-parallel structure, leads to an approximate reduction of 90% in execution time, along with a 66 times improvement in throughput performance.

Submitted 6 October, 2023; originally announced October 2023.

Comments: Proc. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, Oct. 29 - Nov. 2, 2023

Journal ref: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)
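
The KyberMat abstract refers to fast two-parallel structures based on polyphase decomposition. The sketch below shows the general idea in software only, assuming the standard Kyber ring $Z_q[x]/(x^{256}+1)$ with $q = 3329$: each polynomial is split into even and odd coefficients, and the product is assembled from three half-length products instead of four. It uses schoolbook sub-multiplications rather than an NTT, the function names are hypothetical, and it is not the paper's hardware architecture or its sub-structure sharing technique.

```python
import random

Q = 3329  # Kyber modulus
N = 256   # number of coefficients per Kyber polynomial


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product in Z_q[x]/(x^n + 1), with n = len(a) = len(b)."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % q
            else:
                # x^n = -1, so wrapped terms are subtracted.
                c[k - n] = (c[k - n] - ai * bj) % q
    return c


def mul_by_y(p, q=Q):
    """Multiply by y in Z_q[y]/(y^m + 1): shift up, negate the wrapped term."""
    return [(-p[-1]) % q] + p[:-1]


def two_parallel_mul(a, b, q=Q):
    """Two-parallel polyphase product; len(a) must be even.

    With y = x^2: A = A0(y) + x*A1(y), B = B0(y) + x*B1(y), so
    C0 = A0*B0 + y*A1*B1 and C1 = (A0+A1)*(B0+B1) - A0*B0 - A1*B1.
    """
    a0, a1 = a[0::2], a[1::2]
    b0, b1 = b[0::2], b[1::2]
    p00 = negacyclic_mul(a0, b0, q)
    p11 = negacyclic_mul(a1, b1, q)
    a_sum = [(u + v) % q for u, v in zip(a0, a1)]
    b_sum = [(u + v) % q for u, v in zip(b0, b1)]
    p_sum = negacyclic_mul(a_sum, b_sum, q)
    c0 = [(u + v) % q for u, v in zip(p00, mul_by_y(p11, q))]
    c1 = [(s - u - v) % q for s, u, v in zip(p_sum, p00, p11)]
    c = [0] * len(a)
    c[0::2] = c0  # even-indexed coefficients
    c[1::2] = c1  # odd-indexed coefficients
    return c


if __name__ == "__main__":
    random.seed(0)
    a = [random.randrange(Q) for _ in range(N)]
    b = [random.randrange(Q) for _ in range(N)]
    print(two_parallel_mul(a, b) == negacyclic_mul(a, b))
```

The saving comes from computing the cross term $A_0B_1 + A_1B_0$ from one extra product of sums, at the cost of a few additional additions and subtractions.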

arXiv:2309.14724 [pdf, other]

doi 10.1051/0004-6361/202347477

Reconciling results of 2019 and 2020 stellar occultations on Pluto's atmosphere. New constraints from both the 5 September 2019 event and consistency analysis

Authors: Ye Yuan, Fan Li, Yanning Fu, Jian Chen, Wei Tan, Shuai Zhang, Wei Zhang, Chen Zhang, Qiang Zhang, Jiahui Ye, Delai Li, Yi**g Zhu, Zhensen Fu, Ansheng Zhu, Yue Chen, Jun Xu, Yang Zhang

Abstract: A stellar occultation by Pluto on 5 September 2019 yielded positive detections at two separate stations. Using an approach consistent with comparable studies, we derived a surface pressure of $11.478 \pm 0.55~\mathrm{μbar}$ for Pluto's atmosphere from the observations of this event. In addition, to avoid potential method inconsistencies highlighted by Sicardy et al. when comparing with historical pressure measurements, we reanalyzed the data from the 15 August 2018 and 17 July 2019 events, respectively. All the new measurements provide a bridge between the two different perspectives on the pressure variation since 2015: a rapid pressure drop from previous studies of the 15 August 2018 and 17 July 2019 events and a plateau phase from that of the 6 June 2020 event. The pressure measurement from the 5 September 2019 event aligns with those from 2016, 2018, and 2020, supporting the latter perspective. While the measurements from the 4 June 2011 and 17 July 2019 events suggest probable V-shaped pressure variations unaccounted for by the volatile transport model (VTM) from Meza et al., the VTM remains applicable on average. Moreover, the validity of the V-shaped variations is debatable due to the stellar faintness of the 4 June 2011 event and the grazing single-chord geometry of the 17 July 2019 event. To reveal and understand all significant pressure variations of Pluto's atmosphere, it is essential to provide constraints on both short-term and long-term evolutions of the interacting atmosphere and surface by continuous pressure monitoring through occultation observations, whenever possible, complemented by frequent spectroscopy and photometry of the surface.

Submitted 5 November, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: Astronomy & Astrophysics, in press. 10 pages, 6 figures

Journal ref: A&A 680, A9 (2023)

arXiv:2309.12218 [pdf, other]

SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On

Authors: Ruida Wang, Raymond Chi-Wing Wong, Weile Tan

Abstract: Session-based recommendation, which aims to predict the user's next item click based on the information in a single session only, even in the presence of some random user behavior, is a complex problem. This complex problem requires a high-capability model for predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm, where all studies focus extensively on how to optimize the encoder module but ignore how to optimize the predictor module. In this paper, we identify the critical issue of the low-capability predictor module among existing models. Motivated by this, we propose a novel framework called Session-based Recommendation with Predictor Add-On (SR-PredictAO). In this framework, we propose a high-capability predictor module which could alleviate the effect of random user behavior on prediction. It is worth mentioning that this framework could be applied to any existing models, which could give opportunities for further optimizing the framework. Extensive experiments on two real benchmark datasets for three state-of-the-art models show that SR-PredictAO outperforms the current state-of-the-art model by up to 2.9% in HR@20 and 2.3% in MRR@20. More importantly, the improvement is consistent across almost all the existing models on all datasets, which could be regarded as a significant contribution in the field.

Submitted 20 September, 2023; originally announced September 2023.
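
The SR-PredictAO abstract reports results in HR@20 and MRR@20. For readers unfamiliar with these metrics, the sketch below computes them under their commonly used definitions: HR@K is the fraction of sessions whose true next item appears in the top K predictions, and MRR@K averages the reciprocal rank of the true item, counting zero when it falls outside the top K. The function names and toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def hit_rate_at_k(ranked_lists, targets, k=20):
    """Fraction of sessions whose target item is in the top-k predictions."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k=20):
    """Mean reciprocal rank, with rank counted from 1 and misses scored 0."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            total += 1.0 / (top.index(t) + 1)
    return total / len(targets)


if __name__ == "__main__":
    # Toy example: three sessions, each with a ranked list of item ids.
    ranked = [[3, 1, 2], [5, 4, 6], [7, 8, 9]]
    truth = [1, 6, 0]
    print(hit_rate_at_k(ranked, truth, k=2))
    print(mrr_at_k(ranked, truth, k=2))
```

HR@K rewards any hit inside the cutoff equally, while MRR@K also rewards placing the correct item nearer the top of the list.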

arXiv:2309.11041 [pdf, ps, other]

Polarization-based cyclic weak value metrology for angular velocity measurement

Authors: Zi-Rui Zhong, Yue Chen, Wei-Jun Tan, Xiang-Ming Hu, Qing-Lin Wu

Abstract: Weak measurement has been proven to amplify the detection of changes in meters while discarding most photons due to the low probability of post-selection. Previous power-recycling schemes enable the failed post-selection photons to be repeatedly selected, thus overcoming the inefficient post-selection and increasing the precision of detection. In this study, we focus on the polarization-based weak value angular-velocity measurement and introduce three cyclic methods to enhance the accuracy of detecting time shift in a Gaussian beam: power recycling, signal recycling, and dual recycling schemes. By incorporating one or two partially transmitting mirrors into the system, both the power and signal-to-noise ratio (SNR) of the detected light are substantially enhanced. Compared to non-polarization schemes, polarization-based approaches offer several advantages, including lower optical loss, unique cyclic directions, and a wider optimal region. These features effectively reduce crosstalk among different light paths and theoretically eliminate the walk-off effect, thus yielding improvements in both theoretical performance and application.

Submitted 14 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: 9 pages, 4 figures

arXiv:2309.11039 [pdf, other]

Federated Learning in Intelligent Transportation Systems: Recent Applications and Open Problems

Authors: Shiying Zhang, Jun Li, Long Shi, Ming Ding, Dinh C. Nguyen, Wuzheng Tan, Jian Weng, Zhu Han

Abstract: Intelligent transportation systems (ITSs) have been fueled by the rapid development of communication technologies, sensor technologies, and the Internet of Things (IoT). Nonetheless, due to the dynamic characteristics of the vehicle networks, it is rather challenging to make timely and accurate decisions about vehicle behaviors. Moreover, in the presence of mobile wireless communications, the privacy and security of vehicle information are at constant risk. In this context, a new paradigm is urgently needed for various applications in dynamic vehicle environments. As a distributed machine learning technology, federated learning (FL) has received extensive attention due to its outstanding privacy protection properties and easy scalability. We conduct a comprehensive survey of the latest developments in FL for ITS. Specifically, we first examine the prevalent challenges in ITS and elucidate the motivations for applying FL from various perspectives. Subsequently, we review existing deployments of FL in ITS across various scenarios, and discuss specific potential issues in object recognition, traffic management, and service provision scenarios. Furthermore, we analyze the new challenges introduced by FL deployment and the inherent limitations that FL alone cannot fully address, including uneven data distribution, limited storage and computing power, and potential privacy and security concerns. We then examine the existing collaborative technologies that can help mitigate these challenges. Lastly, we discuss the open challenges that remain to be addressed in applying FL in ITS and propose several future research directions.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.08273 [pdf, other]

A Generative Framework for Self-Supervised Facial Representation Learning

Authors: Ruian He, Zhen Xing, Weimin Tan, Bo Yan

Abstract: Self-supervised representation learning has gained increasing attention for strong generalization ability without relying on paired datasets. However, it has not been explored sufficiently for facial representation. Self-supervised facial representation learning remains unsolved due to the coupling of facial identities, expressions, and external factors like pose and light. Prior methods primarily focus on contrastive learning and pixel-level consistency, leading to limited interpretability and suboptimal performance. In this paper, we propose LatentFace, a novel generative framework for self-supervised facial representations. We suggest that the disentangling problem can also be formulated as generative objectives in space and time, and propose the solution using a 3D-aware latent diffusion model. First, we introduce a 3D-aware autoencoder to encode face images into 3D latent embeddings. Second, we propose a novel representation diffusion model to disentangle the 3D latent into facial identity and expression. Consequently, our method achieves state-of-the-art performance in facial expression recognition (FER) and face verification among self-supervised facial representation learning models. Our model achieves a 3.75% advantage in FER accuracy on RAF-DB and 3.35% on AffectNet compared to SOTA methods.

Submitted 22 May, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.06832 [pdf, ps, other]

doi 10.1103/PhysRevA.108.032608

Dual-recycled interference-based weak value metrology

Authors: Zi-Rui Zhong, Wei-Jun Tan, Yue Chen, Qing-Lin Wu

Abstract: Weak-value-amplification permits small effects to be measured as observable changes at the sacrifice of power due to post-selection. The power recycling scheme has been proven to eliminate this inefficiency of the rare post-selection, thus surpassing the limit of the shot noise and improving the precision of the measurement. However, the improvement is strictly limited by the system setup, especially the system loss. Here we introduce a dual recycling model based on the interferometric weak-value-based deflection measurement. Two mirrors, the power-recycling mirror and signal-recycling mirror, are placed at the bright and dark port of the interferometer respectively, creating a composite resonator. The results show that both the power and the signal-to-noise ratio (SNR) are greatly enhanced in a wider range of experimental parameters compared to the power-recycling scheme. This work considerably loosens the constraint of the system setup and further explores the real advantage of weak measurement over traditional schemes.

Submitted 18 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: 7 pages, 5 figures, reference

arXiv:2308.11362 [pdf, other]

Calibration of the Timing Performance of GECAM-C

Authors: Shuo Xiao, Ya-Qing Liu, Ke Gong, Zheng-Hua An, Shao-Lin Xiong, Xin-Qiao Li, Xiang-Yang Wen, Wen-Xi Peng, Da-Li Zhang, You-Li Tuo, Shi-Jie Zheng, Li-Ming Song, ** Wang, Xiao-Yun Zhao, Yue Huang, Xiang Ma, Xiao-**g Liu, Rui Qiao, Yan-Bing Xu, Sheng Yang, Fan Zhang, Yue Wang, Yan-Qiu Zhang, Wang-Chen Xue, Jia-Cong Liu , et al. (13 additional authors not shown)

Abstract: As a new member of the Gravitational wave high-energy Electromagnetic Counterpart All-sky Monitor (GECAM) after GECAM-A and GECAM-B, GECAM-C (originally called HEBS), which was launched on board the SATech-01 satellite on July 27, 2022, aims to monitor and localize X-ray and gamma-ray transients from $\sim$ 6 keV to 6 MeV. GECAM-C utilizes a similar design to GECAM but operates in a more complex orbital environment. In this work, we utilize the secondary particles simultaneously produced by the cosmic-ray events on orbit and recorded by multiple detectors to calibrate the relative timing accuracy between all detectors of GECAM-C. We find the result is 0.1 $μ\rm s$, which is the highest time resolution among all GRB detectors ever flown and very helpful in timing analyses such as minimum variable timescale and spectral lags, as well as in time delay localization. In addition, we calibrate the absolute time accuracy using the one-year Crab pulsar data observed by GECAM-C and Fermi/GBM, as well as GECAM-C and GECAM-B. The results are $2.02\pm 2.26\ μ\rm s$ and $5.82\pm 3.59\ μ\rm s$, respectively. Finally, we investigate the spectral lag between the different energy bands of the Crab pulsar observed by GECAM and GBM, which is $\sim -0.2\ {\rm μs\ keV^{-1}}$.

Submitted 22 August, 2023; originally announced August 2023.

Comments: submitted

arXiv:2308.07748 [pdf, other]

Exploiting Sparsity in Automotive Radar Object Detection Networks

Authors: Marius Lippke, Maurice Quach, Sascha Braun, Daniel Köhler, Michael Ulrich, Bastian Bischoff, Wei Yap Tan

Abstract: Having precise perception of the environment is crucial for ensuring the secure and reliable functioning of autonomous driving systems. Radar object detection networks are one fundamental part of such systems. CNN-based object detectors showed good performance in this context, but they require large compute resources. This paper investigates sparse convolutional object detection networks, which combine powerful grid-based detection with low compute resources. We investigate radar specific challenges and propose sparse kernel point pillars (SKPP) and dual voxel point convolutions (DVPC) as remedies for the grid rendering and sparse backbone architectures. We evaluate our SKPP-DPVCN architecture on nuScenes, which outperforms the baseline by 5.89% and the previous state of the art by 4.19% in Car AP4.0. Moreover, SKPP-DPVCN reduces the average scale error (ASE) by 21.41% over the baseline.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2308.01568 [pdf, other]

MVFlow: Deep Optical Flow Estimation of Compressed Videos with Motion Vector Prior

Authors: Shili Zhou, Xuhao Jiang, Weimin Tan, Ruian He, Bo Yan

Abstract: In recent years, many deep learning-based methods have been proposed to tackle the problem of optical flow estimation and achieved promising results. However, they hardly consider that most videos are compressed and thus ignore the pre-computed information in compressed video streams. Motion vectors, one type of compression information, record the motion of the video frames. They can be directly extracted from the compression code stream without computational cost and serve as a solid prior for optical flow estimation. Therefore, we propose an optical flow model, MVFlow, which uses motion vectors to improve the speed and accuracy of optical flow estimation for compressed videos. In detail, MVFlow includes a key Motion-Vector Converting Module, which ensures that the motion vectors can be transformed into the same domain as optical flow and then be utilized fully by the flow estimation module. Meanwhile, we construct four optical flow datasets for compressed videos containing frames and motion vectors in pairs. The experimental results demonstrate the superiority of our proposed MVFlow, which can reduce the AEPE by 1.09 compared to existing models or save 52% of the time to achieve similar accuracy to existing models.

Submitted 4 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted by ACM MM 2023

arXiv:2307.16586 [pdf, other]

SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model

Authors: Shili Zhou, Ruian He, Weimin Tan, Bo Yan

Abstract: Optical Flow Estimation aims to find the 2D dense motion field between two frames. Due to the limitation of model structures and training datasets, existing methods often rely too much on local clues and ignore the integrity of objects, resulting in fragmented motion estimation. Through theoretical analysis, we find the pre-trained large vision models are helpful in optical flow estimation, and we notice that the recently famous Segment Anything Model (SAM) demonstrates a strong ability to segment complete objects, which is suitable for solving the fragmentation problem. We thus propose a solution to embed the frozen SAM image encoder into FlowFormer to enhance object perception. To address the challenge of utilizing SAM in depth in non-segmentation tasks like optical flow estimation, we propose an Optical Flow Task-Specific Adaption scheme, including a Context Fusion Module to fuse the SAM encoder with the optical flow context encoder, and a Context Adaption Module to adapt the SAM features for the optical flow task with Learned Task-Specific Embedding. Our proposed SAMFlow model reaches 0.86/2.10 clean/final EPE and 3.55/12.32 EPE/F1-all on the Sintel and KITTI-15 training sets, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%. Furthermore, our model achieves state-of-the-art performance on the Sintel and KITTI-15 benchmarks, ranking #1 among all two-frame methods on Sintel clean pass.

Submitted 21 December, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted by AAAI 2024

arXiv:2307.16555 [pdf, other]

doi 10.1145/3581783.3611752

Uncertainty-Guided Spatial Pruning Architecture for Efficient Frame Interpolation

Authors: Ri Cheng, Xuhao Jiang, Ruian He, Shili Zhou, Weimin Tan, Bo Yan

Abstract: The video frame interpolation (VFI) model applies the convolution operation to all locations, leading to redundant computations in regions with easy motion. We can use a dynamic spatial pruning method to skip redundant computation, but this method cannot properly identify easy regions in VFI tasks without supervision. In this paper, we develop an Uncertainty-Guided Spatial Pruning (UGSP) architecture to dynamically skip redundant computation for efficient frame interpolation. Specifically, pixels with low uncertainty indicate easy regions, where the calculation can be reduced without bringing undesirable visual results. Therefore, we utilize uncertainty-generated mask labels to guide our UGSP in properly locating the easy region. Furthermore, we propose a self-contrast training strategy that leverages an auxiliary non-pruning branch to improve the performance of our UGSP. Extensive experiments show that UGSP maintains performance but reduces FLOPs by 34%/52%/30% compared to the baseline without pruning on the Vimeo90K/UCF101/MiddleBury datasets. In addition, our method achieves state-of-the-art performance with lower FLOPs on multiple benchmarks.

Submitted 27 October, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: ACM Multimedia 2023

arXiv:2307.14884 [pdf, other]

Individual and Averaged Power Density Spectra of X-ray bursts from SGR J1935+2154: Quasiperiodic Oscillation Search and Slopes

Authors: Shuo Xiao, Xiao-Bo Li, Wang-Chen Xue, Shao-Lin Xiong, Shuang-Nan Zhang, Wen-Xi Peng, Ai-Jun Dong, You-Li Tuo, Ce Cai, Xi-Hong Luo, Jiao-Jiao Yang, Yue Wang, Chao Zheng, Yan-Qiu Zhang, Jia-Cong Liu, Wen-Jun Tan, Chen-Wei Wang, ** Wang, Cheng-Kui Li, Shu-Xu Yi, Shi-Jun Dang, Lun-Hua Shang, Ru-Shuang Zhao, Qing-Bo Ma, Wei Xie , et al. (7 additional authors not shown)

Abstract: The study of quasi-periodic oscillations (QPOs) and power density spectra (PDS) continuum properties can help shed light on the still elusive emission physics of magnetars and serve as a window into the interiors of neutron stars using asteroseismology. In this work, we employ a Bayesian method to search for QPOs in the hundreds of X-ray bursts from SGR J1935+2154 observed by Insight-HXMT, GECAM and Fermi/GBM from July 2014 to January 2022. Although no definitive QPO signal (significance $>3σ$) is detected in individual bursts or the averaged periodogram of the bursts grouped by duration, we identify several bursts exhibiting a possible QPO at $\sim$ 40 Hz, which is consistent with that reported in the X-ray burst associated with FRB 200428. We investigate the PDS continuum properties and find that the distribution of the PDS slope in the simple power-law model peaks at $\sim$ 2.5, which is consistent with other magnetars but higher than the 5/3 commonly seen in gamma-ray bursts. In addition, the distribution of the break frequency in the broken power-law model peaks at $\sim$ 60 Hz. Finally, we report that the power-law index of the PDS has an anti-correlation and power-law dependence on the burst duration as well as the minimum variation timescale.

Submitted 27 July, 2023; originally announced July 2023.

Comments: comments welcome

arXiv:2307.14349 [pdf, other]

Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models

Authors: Chee Wei Tan, Shangxin Guo, Man Fai Wong, Ching Nam Hang

Abstract: This paper presents an AI-assisted programming tool called Copilot for Xcode for program composition and design to support human software developers. By seamlessly integrating cloud-based Large Language Models (LLM) with Apple's local development environment, Xcode, this tool enhances productivity and unleashes creativity for software development in the Apple software ecosystem (e.g., iOS apps, macOS). Leveraging advanced natural language processing (NLP) techniques, Copilot for Xcode effectively processes source code tokens and patterns within code repositories, enabling features such as code generation, autocompletion, documentation, and error detection. Software developers can also query and make "small" decisions for program composition, some of which can be made simultaneously, and this is facilitated through prompt engineering in a chat interface of Copilot for Xcode. Finally, we present simple case studies as evidence of the effectiveness of utilizing NLP in Xcode to prompt popular LLM services like OpenAI ChatGPT for program composition and design.

Submitted 8 July, 2023; originally announced July 2023.

arXiv:2307.13716 [pdf, other]

FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning

Authors: Leiming Chen, Weishan Zhang, Cihao Dong, Sibo Qiao, Ziling Huang, Yuming Nie, Zhaoxiang Hou, Chee Wei Tan

Abstract: Traditional federated learning uses the number of samples to calculate the weight of each client model and uses this fixed weight value to fuse the global model. However, in practical scenarios, each client's device and data heterogeneity leads to differences in the quality of each client's model. Thus the contribution to the global model is not wholly determined by the sample size. In addition, if clients intentionally upload low-quality or malicious models, using these models for aggregation will lead to a severe decrease in global model accuracy. Traditional federated learning algorithms do not address these issues. To solve this problem, we propose FedDRL, a model fusion approach using reinforcement learning based on a two-stage approach. In the first stage, our method filters out malicious models and selects trusted client models to participate in the model fusion. In the second stage, the FedDRL algorithm adaptively adjusts the weights of the trusted client models and aggregates the optimal global model. We also define five model fusion scenarios and compare our method with two baseline algorithms in those scenarios. The experimental results show that our algorithm has higher reliability than other algorithms while maintaining accuracy.

Submitted 19 March, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.06972 [pdf]

One-step Preparation of ZnO Electron Transport Layers Functionalized with Benzoic Acid Derivatives

Authors: Hao Liu, Chao Wang, Wen Liang Tan, Lars Thomsen, Anthony S. R. Chesman, Yvonne Hora, Martyn Jevric, Jonas M. Bjuggren, Mats R. Andersson, Yahui Tang, Lin**g Tang, Doan Vu, Christopher R. McNeill

Abstract: We present a "one-step" approach to modify ZnO electron transport layers (ETLs) used in organic solar cells. This approach involves adding benzoic acid (BZA) derivatives directly to the ZnO precursor solution, which are then present at the surface of the resulting ZnO film. We demonstrate this approach for three different BZA derivatives, namely benzoic acid, chlorobenzoic acid, and 4-hydrazinobenzoic acid. For all molecules, improved device performance and stability is demonstrated in solar cells using an active layer blend of PTQ10 (donor) and ITIC-Br (non-fullerene acceptor) compared to such cells prepared using untreated ZnO. Furthermore, similar or improved device performance and stability is demonstrated compared to conventional PEIE treatment of ZnO. The presence of the BZA derivatives at the surface after processing is established using X-ray photoelectron spectroscopy and near-edge X-ray absorption fine-structure spectroscopy. From atomic force microscopy analysis and X-ray diffraction studies, the addition of BZA derivatives appears to restrict ZnO grain growth; however, this does not negatively impact device performance. ZnO layers treated with BZA derivatives also exhibit higher water contact angle and lower work function compared to untreated ZnO. This approach enables simplification of device manufacture while still allowing optimization of the surface properties of metal oxide ETLs. Keywords: electron transport layers, zinc oxide, organic solar cells, surface modification

Submitted 13 July, 2023; originally announced July 2023.

Comments: Manuscript: 25 pages, 8 figures, 5 tables. Supplementary Material: 36 pages, 22 figures, 13 tables. Submitted to Solar Energy Materials and Solar Cells

arXiv:2307.05805 [pdf, other]

Inverse design and additive manufacturing of shape-morphing structures based on functionally graded composites

Authors: Hirak Kansara, Mingchao Liu, Yinfeng He, Wei Tan

Abstract: Shape-morphing structures possess the ability to change their shapes from one state to another, and therefore, offer great potential for a broad range of applications. A typical paradigm of morphing is transforming from an initial two-dimensional (2D) flat configuration into a three-dimensional (3D) target structure. One popular fabrication method for these structures involves programming cuts in specific locations of a thin sheet material (i.e., kirigami), forming a desired 3D shape upon application of external mechanical load. In this paper, a novel inverse design strategy is proposed by modifying the bending stiffness via introducing distributed modulus in functionally graded composites (FGCs). The longitudinal modulus of each cross-sectional slice can be controlled through the rule of mixtures, hence matching the required modulus distribution along the elastic strip. Following the proposed framework, a diverse range of structures is obtained with different Gaussian curvatures in both numerical simulations and experiments. A very good agreement is achieved between the measured shapes of morphed structures and the targets. In addition, the compressive rigidity and specific energy absorption during compression of FGC-based hemi-ellipsoidal morphing structures with various aspect ratios were also examined numerically and validated against experiments. By conducting systematic numerical simulations, we also demonstrate the multifunctionality of the modulus-graded shape-morphing composites. This new inverse design framework provides an opportunity to create shape-morphing structures by utilising modulus-graded composite materials, which can be employed in a variety of applications involving multi-physical environments. Furthermore, this framework underscores the versatility of the approach, enabling precise control over material properties at a local level.

Submitted 11 July, 2023; originally announced July 2023.

Comments: Journal of the Mechanics and Physics of Solids

arXiv:2307.05689 [pdf, other]

Magnetar emergence in a peculiar gamma-ray burst from a compact star merger

Authors: H. Sun, C. -W. Wang, J. Yang, B. -B. Zhang, S. -L. Xiong, Y. -H. I. Yin, Y. Liu, Y. Li, W. -C. Xue, Z. Yan, C. Zhang, W. -J. Tan, H. -W. Pan, J. -C. Liu, H. -Q. Cheng, Y. -Q. Zhang, J. -W. Hu, C. Zheng, Z. -H. An, C. Cai, L. Hu, C. **, D. -Y. Li, X. -Q. Li, H. -Y. Liu , et al. (19 additional authors not shown)

Abstract: The central engine that powers gamma-ray bursts (GRBs), the most powerful explosions in the universe, is still not identified. Besides hyper-accreting black holes, rapidly spinning and highly magnetized neutron stars, known as millisecond magnetars, have been suggested to power both long and short GRBs. The presence of a magnetar engine following compact star mergers is of particular interest as it would provide essential constraints on the poorly understood equation of state for neutron stars. Indirect indications of a magnetar engine in these merger sources have been observed in the form of plateau features present in the X-ray afterglow light curves of some short GRBs. Additionally, some X-ray transients lacking gamma-ray bursts (GRB-less) have been identified as potential magnetar candidates originating from compact star mergers. Nevertheless, smoking gun evidence is still lacking for a magnetar engine in short GRBs, and the associated theoretical challenges have been addressed. Here we present a comprehensive analysis of the broad-band prompt emission data of a peculiar, very bright GRB 230307A. Despite its apparently long duration, the prompt emission and host galaxy properties point toward a compact star merger origin, being consistent with its association with a kilonova. More intriguingly, an extended X-ray emission component emerges as the $γ$-ray emission dies out, signifying the emergence of a magnetar central engine. We also identify an achromatic temporal break in the high-energy band during the prompt emission phase, which was never observed in previous bursts and reveals a narrow jet with half opening angle of approximately $3.4^\circ$.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 44 pages, 10 figures, 5 tables

arXiv:2307.04291 [pdf, other]

Wait, wasn't that code here before? Detecting Outdated Software Documentation

Authors: Wen Siang Tan, Markus Wagner, Christoph Treude

Abstract: Encountering outdated documentation is not a rare occurrence for developers and users in the software engineering community. To ensure that software documentation is up-to-date, developers often have to manually check whether the documentation needs to be updated whenever changes are made to the source code. In our previous work, we proposed an approach to automatically detect outdated code element references in software repositories and found that more than a quarter of the 1000 most popular projects on GitHub contained at least one outdated reference. In this paper, we present a GitHub Actions tool that builds on our previous work's approach that GitHub developers can configure to automatically scan for outdated code element references in their GitHub project's documentation whenever a pull request is submitted.

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2307.04061 [pdf]

doi 10.1561/1300000068

Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms

Authors: Chee Wei Tan, Pei-Duo Yu

Abstract: This monograph provides an overview of the mathematical theories and computational algorithm design for contagion source detection in large networks. By leveraging network centrality as a tool for statistical inference, we can accurately identify the source of contagions, trace their spread, and predict future trajectories. This approach provides fundamental insights into surveillance capability and asymptotic behavior of contagion spreading in networks. Mathematical theory and computational algorithms are vital to understanding contagion dynamics, improving surveillance capabilities, and developing effective strategies to prevent the spread of infectious diseases and misinformation.

Submitted 8 July, 2023; originally announced July 2023.

Comments: Suggested Citation: Chee Wei Tan and Pei-Duo Yu (2023), "Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms", Foundations and Trends in Networking: Vol. 13: No. 2-3, pp 107-251. http://dx.doi.org/10.1561/1300000068

arXiv:2307.02503 [pdf, other]

doi 10.3390/e25060888

Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review

Authors: Man Fai Wong, Shangxin Guo, Ching Nam Hang, Siu Wai Ho, Chee Wei Tan

Abstract: This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include the GitHub Copilot powered by OpenAI's Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their applications in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques with software naturalness in these applications, with a discussion on extending AI-assisted programming capabilities to Apple's Xcode for mobile software development. This paper also presents the challenges of and opportunities for incorporating NLP techniques with software naturalness, empowering developers with advanced coding assistance and streamlining the software development process.

Submitted 4 July, 2023; originally announced July 2023.

Journal ref: Entropy(2023), 25(6), 888

arXiv:2307.02247 [pdf]

Room-Temperature Ferromagnetism in Fe-doped SnSe Bulk Single Crystalline Semiconductor

Authors: Guangqiang Mei, Wei Tan, Xingxia Cui, Cong Wang, Qing Yuan, Yafei Li, Cancan Lou, Xuefeng Hou, Mengmeng Zhao, Yong Liu, Wei Ji, Xiaona Zhang, Min Feng, Limin Cao

Abstract: The quest for pragmatic room-temperature (RT) magnetic semiconductors (MSs) with a suitable bandgap constitutes one of the contemporary opportunities to be exploited. This may provide a materials platform to bring new-generation ideal information device technologies into real-world applications where the otherwise conventionally separately utilized charge and spin are simultaneously exploited. Here we present RT ferromagnetism in an Fe-doped SnSe (Fe:SnSe) van der Waals (vdW) single crystalline ferromagnetic semiconductor (FMS) with a semiconducting bandgap of ~1.19 eV (comparable to those of Si and GaAs). The synthesized Fe:SnSe single crystals feature a dilute Fe content of less than 1.0 at%, a Curie temperature of ~310 K, a layered vdW structure identical to that of pristine SnSe, and the absence of in-gap defect states. The Fe:SnSe vdW diluted magnetic semiconductor (DMS) single crystals are grown using a simple temperature-gradient melt-growth process, in which the magnetic Fe atom doping is realized uniquely using FeI2 as the dopant precursor whose melting point is low with respect to crystal growth, and which in principle possesses industrially unlimited scalability. Our work adds a new member to the family of long-sought RT magnetic semiconductors, and may establish a generalized strategy for large-volume production of related DMSs.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 25 pages, 5 figures

arXiv:2307.00773 [pdf, other]

DifFSS: Diffusion Model for Few-Shot Semantic Segmentation

Authors: Weimin Tan, Siyuan Chen, Bo Yan

Abstract: Diffusion models have demonstrated excellent performance in image generation. Although various few-shot semantic segmentation (FSS) models with different network structures have been proposed, performance improvement has reached a bottleneck. This paper presents the first work to leverage the diffusion model for the FSS task, called DifFSS. DifFSS, a novel FSS paradigm, can further improve the performance of the state-of-the-art FSS models by a large margin without modifying their network structure. Specifically, we utilize the powerful generation ability of diffusion models to generate diverse auxiliary support images by using the semantic mask, scribble or soft HED boundary of the support image as control conditions. This generation process simulates the variety within the class of the query image, such as color, texture variation, lighting, etc. As a result, FSS models can refer to more diverse support images, yielding more robust representations, thereby achieving a consistent improvement in segmentation performance. Extensive experiments on three publicly available datasets based on existing advanced FSS models demonstrate the effectiveness of the diffusion model for the FSS task. Furthermore, we explore in detail the impact of different input settings of the diffusion model on segmentation performance. Hopefully, this completely new paradigm will bring inspiration to the study of the FSS task integrated with AI-generated content. Code is available at https://github.com/TrinitialChan/DifFSS

Submitted 11 October, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: code is available at https://github.com/TrinitialChan/DifFSS

arXiv:2306.16795 [pdf, ps, other]

Joint constraint on the jet structure from the short GRB population and GRB 170817A

Authors: Xiao-Feng Cao, Wei-Wei Tan, Yun-Wei Yu, Zhen-Dong Zhang

Abstract: The nearest GRB 170817A provided an opportunity to probe the angular structure of the jet of this short gamma-ray burst (SGRB), by using its off-axis observed afterglow emission. We investigate whether the afterglow-constrained jet structures can be consistent with the luminosity of the prompt emission of GRB 170817A. Furthermore, by assuming that all SGRBs including GRB 170817A have the same explosive mechanism and jet structure, we apply the different jet structures to the calculation of the flux and redshift distributions of the SGRB population, in comparison with the observational distributions of the Swift and Fermi sources. As a result, it is found that the single-Gaussian structure can be basically ruled out, whereas the power-law and two-Gaussian models can in principle survive.

Submitted 29 June, 2023; originally announced June 2023.

Comments: 9 pages,6 figures

arXiv:2306.15498 [pdf, other]

Using Large Language Models to Provide Explanatory Feedback to Human Tutors

Authors: Jionghao Lin, Danielle R. Thomas, Feifei Han, Shivang Gupta, Wei Tan, Ngoc Dang Nguyen, Kenneth R. Koedinger

Abstract: Research demonstrates that learners engaging in the process of producing explanations to support their reasoning can have a positive impact on learning. However, providing learners real-time explanatory feedback often presents challenges related to classification accuracy, particularly in domain-specific environments containing situationally complex and nuanced responses. We present two approaches for supplying tutors real-time feedback within an online lesson on how to give students effective praise. This work-in-progress demonstrates considerable accuracy in binary classification for corrective feedback of effective, or effort-based (F1 score = 0.811), and ineffective, or outcome-based (F1 score = 0.350), praise responses. More notably, we introduce progress towards an enhanced approach of providing explanatory feedback using large language model-facilitated named entity recognition, which can not only provide tutors feedback while they engage in lessons but can also potentially suggest real-time tutor moves. Future work involves leveraging large language models for data augmentation to improve accuracy, while also developing an explanatory feedback interface.

Submitted 27 June, 2023; originally announced June 2023.

Comments: 12 pages Workshop paper, The 24th International Conference on Artificial Intelligence in Education, AIED 2023 Educational Dialogue Act Classification, Large Language Models, Named Entity Recognition, Tutor Training, Explanatory Feedback, Natural Language Processing

arXiv:2306.14519 [pdf, other]

Towards Sustainable Ultrawide Bandgap Van der Waals Materials: An ab initio Screening Effort

Authors: Chuin Wei Tan, Linqiang Xu, Chen Chen Er, Siang-Piao Chai, Boris Kozinsky, Hui Ying Yang, Shengyuan A. Yang, **g Lu, Yee Sin Ang

Abstract: The sustainable development of next-generation device technology is paramount in the face of climate change and the looming energy crisis. Tremendous efforts have been made in the discovery and design of nanomaterials that achieve device-level sustainability, where high performance and low operational energy cost are prioritized. However, many such materials are composed of elements that are under threat of depletion and pose elevated risks to the environment. The role of material-level sustainability in computational screening efforts remains an open question thus far. Here we develop a general van der Waals materials screening framework imbued with sustainability-motivated search criteria. Using ultrawide bandgap (UWBG) materials, an emerging materials class with great prospects in dielectric, power electronics, and ultraviolet device applications, as a backdrop, we demonstrate how this screening framework results in 25 sustainable UWBG layered materials composed only of low-risk elements. Our findings constitute a critical first step towards reinventing a more sustainable electronics landscape beyond silicon, with the framework established in this work serving as a harbinger of sustainable 2D materials discovery.

Submitted 25 October, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 17 pages, 8 figures

arXiv:2306.14459 [pdf, other]

Histopathology Image Classification using Deep Manifold Contrastive Learning

Authors: **g Wei Tan, Won-Ki Jeong

Abstract: Contrastive learning has gained popularity due to its robustness with good feature representation performance. However, cosine distance, the commonly used similarity metric in contrastive learning, is not well suited to represent the distance between two data points, especially on a nonlinear feature manifold. Inspired by manifold learning, we propose a novel extension of contrastive learning that leverages geodesic distance between features as a similarity metric for histopathology whole slide image classification. To reduce the computational overhead in manifold learning, we propose geodesic-distance-based feature clustering for efficient contrastive loss evaluation using prototypes without time-consuming pairwise feature similarity comparison. The efficacy of the proposed method is evaluated on two real-world histopathology image datasets. Results demonstrate that our method outperforms state-of-the-art cosine-distance-based contrastive learning methods.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.10813 [pdf, other]

Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions

Authors: Yuqi Sun, Ruian He, Weimin Tan, Bo Yan

Abstract: Recent neural talking radiance field methods have shown great success in photorealistic audio-driven talking face synthesis. In this paper, we propose a novel interactive framework that utilizes human instructions to edit such implicit neural representations to achieve real-time personalized talking face generation. Given a short speech video, we first build an efficient talking radiance field, and then apply the latest conditional diffusion model for image editing based on the given instructions and guiding implicit representation optimization towards the editing target. To ensure audio-lip synchronization during the editing process, we propose an iterative dataset updating strategy and utilize a lip-edge loss to constrain changes in the lip region. We also introduce a lightweight refinement network for complementing image details and achieving controllable detail generation in the final rendered image. Our method also enables real-time rendering at up to 30FPS on consumer hardware. Multiple metrics and user verification show that our approach provides a significant improvement in rendering quality compared to state-of-the-art methods.

Submitted 16 August, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 11 pages, 8 figures

arXiv:2306.10255 [pdf, other]

doi 10.1029/2022GL102325

The First GECAM Observation Results on Terrestrial Gamma-ray Flashes and Terrestrial Electron Beams

Authors: Y. Zhao, J. C. Liu, S. L. Xiong, W. C. Xue, Q. B. Yi, G. P. Lu, W. Xu, F. C. Lyu, J. C. Sun, W. X. Peng, C. Zheng, Y. Q. Zhang, C. Cai, S. Xiao, S. L. Xie, C. W. Wang, W. J. Tan, Z. H. An, G. Chen, Y. Q. Du, Y. Huang, M. Gao, K. Gong, D. Y. Guo, J. J. He, et al. (37 additional authors not shown)

Abstract: Gravitational-wave high-energy Electromagnetic Counterpart All-sky Monitor (GECAM) is a space-borne instrument dedicated to monitoring high-energy transients, including Terrestrial Gamma-ray Flashes (TGFs) and Terrestrial Electron Beams (TEBs). We implemented a TGF/TEB search algorithm for GECAM, with which 147 bright TGFs, 2 typical TEBs and 2 special TEB-like events are identified during an effective observation time of $\sim$9 months. We show that, with gamma-ray and charged particle detectors, GECAM can effectively identify and distinguish TGFs and TEBs, and measure their temporal and spectral properties in detail. A very high TGF-lightning association rate of $\sim$80\% is obtained between GECAM and GLD360 in the East Asia region.

Submitted 17 June, 2023; originally announced June 2023.

Comments: The paper was accepted by Geophysical Research Letters on June 16th, 2023

arXiv:2306.07394 [pdf, other]

doi 10.3847/1538-4357/acde7b

Excited Hydroxyl Outflow in the High-Mass Star-Forming Region G34.26+0.15

Authors: W. S. Tan, E. D. Araya, C. Rigg, P. Hofner, S. Kurtz, H. Linz, V. Rosero

Abstract: G34.26+0.15 is a region of high-mass star formation that contains a broad range of young stellar objects in different stages of evolution, including a hot molecular core, hyper-compact HII regions and a prototypical cometary ultra-compact HII region. Previous high-sensitivity single dish observations by our group resulted in the detection of broad 6035 MHz OH absorption in this region; the line showed a significant blue-shifted asymmetry indicative of molecular gas expansion. We present high-sensitivity Karl G. Jansky Very Large Array (VLA) observations of the 6035 MHz OH line conducted to image the absorption and investigate its origin with respect to the different star formation sites in the region. In addition, we report detection of 6030 MHz OH absorption with the VLA and further observations of 4.7 GHz and 6.0 GHz OH lines obtained with the Arecibo Telescope. The 6030 MHz OH line shows a very similar absorption profile as the 6035 MHz OH line. We found that the 6035 MHz OH line absorption region is spatially unresolved at $\sim 2$" scales, and it is coincident with one of the bright ionized cores of the cometary HII region that shows broad radio recombination line emission. We discuss a scenario where the OH absorption is tracing the remnants of a pole-on molecular outflow that is being ionized inside-out by the ultra-compact HII region.

Submitted 12 June, 2023; originally announced June 2023.

Comments: 19 pages, 6 figures. Accepted for publication in The Astrophysical Journal

arXiv:2306.05716 [pdf, other]

Transferring Foundation Models for Generalizable Robotic Manipulation

Authors: Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, Limin Wang

Abstract: Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robotic data, such as the RT-1 dataset, which is costly and time-consuming. However, due to insufficient diversity of data, these approaches typically have limited capability in open-domain scenarios with new objects and diverse environments. In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models to condition robot manipulation tasks. By integrating the mask modality, which incorporates semantic, geometric, and temporal correlation priors derived from vision foundation models, into the end-to-end policy model, our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning, including new object instances, semantic categories, and unseen backgrounds. We first introduce a series of foundation models to ground natural language demands across multiple tasks. Secondly, we develop a two-stream 2D policy model based on imitation learning, which processes raw images and object masks to predict robot actions in a local-global perception manner. Extensive real-world experiments conducted on a Franka Emika robot arm demonstrate the effectiveness of our proposed paradigm and policy architecture. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.

Submitted 18 March, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: 9 pages, 5 figures

arXiv:2306.05506 [pdf, other]

doi 10.1002/adma.202309393

Non-polaritonic effects in cavity-modified photochemistry

Authors: Philip A. Thomas, Wai Jue Tan, Vasyl G. Kravets, Alexander N. Grigorenko, William L. Barnes

Abstract: Strong coupling of molecules to vacuum fields has been widely reported to lead to modified chemical properties such as reaction rates. However, some recent attempts to reproduce infrared strong coupling results have not been successful, suggesting that factors other than strong coupling may sometimes be involved. Here we re-examine the first of these vacuum-modified chemistry experiments in which changes to a molecular photoisomerisation process in the UV-vis spectral range were attributed to strong coupling of the molecules to visible light. We observed significant variations in photoisomerisation rates consistent with the original work; however, we found no evidence that these changes need to be attributed to strong coupling. Instead, we suggest that the photoisomerisation rates involved are most strongly influenced by the absorption of ultraviolet radiation in the cavity. Our results indicate that care must be taken to rule out non-polaritonic effects before invoking strong coupling to explain any changes of chemical properties arising in cavity-based experiments.

Submitted 14 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 33 pages, 17 figures

Journal ref: Adv. Mater. 2023, 2309393

arXiv:2306.01069 [pdf, other]

TimelineQA: A Benchmark for Question Answering over Timelines

Authors: Wang-Chiew Tan, Jane Dwivedi-Yu, Yuliang Li, Lambert Mathias, Marzieh Saeidi, Jing Nathan Yan, Alon Y. Halevy

Abstract: Lifelogs are descriptions of experiences that a person had during their life. Lifelogs are created by fusing data from the multitude of digital services, such as online photos, maps, shopping and content streaming services. Question answering over lifelogs can offer personal assistants a critical resource when they try to provide advice in context. However, obtaining answers to questions over lifelogs is beyond the current state of the art of question answering techniques for a variety of reasons, the most pronounced of which is that lifelogs combine free text with some degree of structure such as temporal and geographical information. We create and publicly release TimelineQA, a benchmark for accelerating progress on querying lifelogs. TimelineQA generates lifelogs of imaginary people. The episodes in the lifelog range from major life episodes such as high school graduation to those that occur on a daily basis such as going for a run. We describe a set of experiments on TimelineQA with several state-of-the-art QA models. Our experiments reveal that for atomic queries, an extractive QA system significantly out-performs a state-of-the-art retrieval-augmented QA system. For multi-hop queries involving aggregates, we show that the best result is obtained with a state-of-the-art table QA technique, assuming the ground truth set of episodes for deriving the answer is available.

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2306.01061 [pdf, other]

Reimagining Retrieval Augmented Language Models for Answering Queries

Authors: Wang-Chiew Tan, Yuliang Li, Pedro Rodriguez, Richard James, Xi Victoria Lin, Alon Halevy, Scott Yih

Abstract: We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-parametric architectures can be enhanced with views, a query analyzer/planner, and provenance to make a significantly more powerful system for question answering in terms of accuracy and efficiency, and potentially for other NLP tasks.

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.18898 [pdf, other]

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

Authors: Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, Limin Wang, Jianlong Fu

Abstract: We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks. These tasks often involve complex multi-step reasoning, presenting significant challenges due to the limited paired data connecting human instructions (e.g., making a smiley face) and robot actions (e.g., end-effector movement). Existing approaches relieve this challenge by adopting an open-loop paradigm that decomposes high-level instructions into simple sub-task plans and executes them step-by-step using low-level control models. However, these approaches lack instant observations in multi-step reasoning, leading to sub-optimal results. To address this issue, we propose to automatically collect a cognitive robot dataset using Large Language Models (LLMs). The resulting dataset, AlphaBlock, consists of 35 comprehensive high-level tasks of multi-step text plans and paired observation sequences. To enable efficient data acquisition, we employ elaborated multi-round prompt designs that effectively reduce the burden of extensive human involvement. We further propose a closed-loop multi-modal embodied planning model that autoregressively generates plans by taking image observations as input. To facilitate effective learning, we leverage MiniGPT-4 with a frozen visual encoder and LLM, and finetune an additional vision adapter and Q-former to enable fine-grained spatial perception for manipulation tasks. We conduct experiments to verify the superiority over existing open and closed-loop methods, and achieve a significant increase in success rate by 21.4% and 14.5% over ChatGPT and GPT-4 based robot tasks. Real-world demos are shown in https://www.youtube.com/watch?v=ayAzID1_qQk .

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.16642 [pdf, other]

doi 10.1007/s10618-023-00948-2

Improving Position Encoding of Transformers for Multivariate Time Series Classification

Authors: Navid Mohammadi Foumani, Chang Wei Tan, Geoffrey I. Webb, Mahsa Salehi

Abstract: Transformers have demonstrated outstanding performance in many applications of deep learning. When applied to time series data, transformers require effective position encoding to capture the ordering of the time series data. The efficacy of position encoding in time series analysis is not well-studied and remains controversial, e.g., whether it is better to inject absolute position encoding or relative position encoding, or a combination of them. In order to clarify this, we first review existing absolute and relative position encoding methods when applied in time series classification. We then propose a new absolute position encoding method dedicated to time series data called time Absolute Position Encoding (tAPE). Our new method incorporates the series length and input embedding dimension in absolute position encoding. Additionally, we propose a computationally Efficient implementation of Relative Position Encoding (eRPE) to improve generalisability for time series. We then propose a novel multivariate time series classification (MTSC) model combining tAPE/eRPE and convolution-based input encoding named ConvTran to improve the position and data embedding of time series data. The proposed absolute and relative position encoding methods are simple and efficient. They can be easily integrated into transformer blocks and used for downstream tasks such as forecasting, extrinsic regression, and anomaly detection. Extensive experiments on 32 multivariate time-series datasets show that our model is significantly more accurate than state-of-the-art convolution and transformer-based models. Code and models are open-sourced at https://github.com/Navidfoumani/ConvTran.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.14655 [pdf, other]

Learning Survival Distribution with Implicit Survival Function

Authors: Yu Ling, Weimin Tan, Bo Yan

Abstract: Survival analysis aims at modeling the relationship between covariates and event occurrence with some untracked (censored) samples. In implementation, existing methods model the survival distribution with strong assumptions or in a discrete time space for likelihood estimation with censorship, which leads to weak generalization. In this paper, we propose Implicit Survival Function (ISF) based on Implicit Neural Representation for survival distribution estimation without strong assumptions, and employ numerical integration to approximate the cumulative distribution function for prediction and optimization. Experimental results show that ISF outperforms the state-of-the-art methods in three public datasets and has robustness to the hyperparameter controlling estimation precision.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.13993 [pdf, other]

Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

Authors: Haoran Xu, Weiting Tan, Shuyue Stella Li, Yunmo Chen, Benjamin Van Durme, Philipp Koehn, Kenton Murray

Abstract: Incorporating language-specific (LS) modules is a proven method to boost performance in multilingual machine translation. This approach bears similarity to Mixture-of-Experts (MoE) because it does not inflate FLOPs. However, the scalability of this approach to hundreds of languages (experts) tends to be unmanageable due to the prohibitive number of parameters introduced by full-rank matrices in fully-connected layers. In this work, we introduce the Language-Specific Matrix Synthesis (LMS) method. This approach constructs LS modules by generating low-rank matrices from two significantly smaller matrices to approximate the full-rank matrix. Furthermore, we condense multilingual knowledge from multiple LS modules into a single shared module with the Fuse Distillation (FD) technique to improve the efficiency of inference and model serialization. We show that our LMS method significantly outperforms previous LS methods and MoE methods with the same amount of extra parameters, e.g., 1.73 BLEU points over the Switch Transformer on many-to-many multilingual machine translation. Importantly, LMS is able to have comparable translation performance with much fewer parameters.

Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted at the main conference of EMNLP 2023

arXiv:2305.12458 [pdf, other]

Infor-Coef: Information Bottleneck-based Dynamic Token Downsampling for Compact and Efficient language model

Authors: Wenxi Tan

Abstract: The prevalence of Transformer-based pre-trained language models (PLMs) has led to their wide adoption for various natural language processing tasks. However, their excessive overhead leads to large latency and computational costs. Static compression methods allocate fixed computation to different samples, resulting in redundant computation. Dynamic token pruning methods selectively shorten the sequences but cannot change the model size and rarely achieve the speedups of static pruning. In this paper, we propose a model acceleration approach for large language models that incorporates dynamic token downsampling and static pruning, optimized by the information bottleneck loss. Our model, Infor-Coef, achieves an 18x FLOPs speedup with an accuracy degradation of less than 8\% compared to BERT. This work provides a promising approach to compress and accelerate transformer-based models for NLP tasks.

Submitted 21 May, 2023; originally announced May 2023.

arXiv:2305.11921 [pdf, other]

An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set

Authors: Ali Ismail-Fawaz, Angus Dempster, Chang Wei Tan, Matthieu Herrmann, Lynn Miller, Daniel F. Schmidt, Stefano Berretti, Jonathan Weber, Maxime Devanne, Germain Forestier, Geoffrey I. Webb

Abstract: The measurement of progress using benchmark evaluations is ubiquitous in computer science and machine learning. However, common approaches to analyzing and presenting the results of benchmark comparisons of multiple algorithms over multiple datasets, such as the critical difference diagram introduced by Demšar (2006), have important shortcomings and, we show, are open to both inadvertent and intentional manipulation. To address these issues, we propose a new approach to presenting the results of benchmark comparisons, the Multiple Comparison Matrix (MCM), that prioritizes pairwise comparisons and precludes the means of manipulating experimental results in existing approaches. MCM can be used to show the results of an all-pairs comparison, or to show the results of a comparison between one or more selected algorithms and the state of the art. MCM is implemented in Python and is publicly available.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.10713 [pdf, other]

Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency

Authors: Lingfeng Shen, Weiting Tan, Boyuan Zheng, Daniel Khashabi

Abstract: With growing capabilities of large language models, prompting them has become the dominant way to access them. This has motivated the development of strategies for automatically selecting effective language prompts. In this paper, we introduce prompt flatness, a new metric to quantify the expected utility of a language prompt. This metric is inspired by flatness regularization in statistical learning that quantifies the robustness of the model towards its parameter perturbations. We provide theoretical foundations for this metric and its relationship with other prompt selection metrics, providing a comprehensive understanding of existing methods. Empirically, we show that combining prompt flatness with existing metrics improves both performance and sample efficiency. Our metric outperforms the previous prompt selection metrics with an average increase of 5% in accuracy and 10% in Pearson correlation across 6 classification benchmarks.

Submitted 22 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.09139 [pdf]

Mechanism design for the end-to-end deterministic transmissions with decoupled time domains

Authors: Binwei Wu, Shuo Wang, Weiqian Tan

Abstract: This paper proposes an innovative end-to-end deterministic network mechanism to achieve delay-bounded transmissions across multiple network domains. The proposed mechanism installs discrete shapers at the edge of the network domains, which serve to decouple the clock domains of different networks. This mitigates the challenges associated with cross-domain clock synchronization that are inherent in state-of-the-art deterministic mechanisms, e.g., high complexity during system implementation and traffic scheduling. Moreover, the proposed mechanism enhances the availability of deterministic networking, i.e., not only periodic deterministic traffic, but also aperiodic deterministic traffic and stochastic flows can be served. Furthermore, an auction-based online scheduling algorithm is developed to improve network efficiency and reduce cost. Simulation results show that the proposed mechanism can effectively realize end-to-end delay-bounded transmission across multiple domains. Meanwhile, the cross-domain latency could also be reduced compared to the existing methods.

Submitted 15 May, 2023; originally announced May 2023.

Comments: in Chinese language

arXiv:2305.02760 [pdf, other]

Multi-Modality Deep Network for JPEG Artifacts Reduction

Authors: Xuhao Jiang, Weimin Tan, Qing Lin, Chenxi Ma, Bo Yan, Liquan Shen

Abstract: In recent years, many convolutional neural network-based models have been designed for JPEG artifacts reduction and have achieved notable progress. However, few methods are suitable for extreme low-bitrate image compression artifacts reduction. The main challenge is that the highly compressed image loses too much information, making it difficult to reconstruct a high-quality image. To address this issue, we propose a multimodal fusion learning method for text-guided JPEG artifacts reduction, in which the corresponding text description not only provides the potential prior information of the highly compressed image, but also serves as supplementary information to assist in image deblocking. We fuse image features and text semantic features from the global and local perspectives respectively, and design a contrastive loss built upon contrastive learning to produce visually pleasing results. Extensive experiments, including a user study, prove that our method can obtain better deblocking results compared to the state-of-the-art methods.

Submitted 4 May, 2023; originally announced May 2023.

Comments: 18 pages, 17 figures, accepted by IJCAI 2023

arXiv:2304.13583 [pdf, other]

Multi-Modality Deep Network for Extreme Learned Image Compression

Authors: Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen

Abstract: Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blur and severe semantics loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression for better compression performance. We fully study the role of text description in different components of the codec, and demonstrate its effectiveness. In addition, we adopt the image-text attention module and image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions. Extensive experiments, including a user study, prove that our method can obtain visually pleasing results at extremely low bitrates, and achieves a comparable or even better performance than state-of-the-art methods, even though these methods are at 2x to 4x bitrates of ours.

Submitted 26 April, 2023; originally announced April 2023.

Comments: 13 pages, 14 figures, accepted by AAAI 2023

arXiv:2304.13010 [pdf, ps, other]

Unstructured and structured data: Can we have the best of both worlds with large language models?

Authors: Wang-Chiew Tan

Abstract: This paper presents an opinion on the potential of using large language models to query on both unstructured and structured data. It also outlines some research challenges related to the topic of building question-answering systems for both types of data.

Submitted 5 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.10001 [pdf, other]

Weakly Supervised Detection of Baby Cry

Authors: Weijun Tan, Qi Yao, Jingfeng Liu

Abstract: Detection of baby cries is an important part of baby monitoring and health care. Almost all existing methods use supervised SVM, CNN, or their varieties. In this work, we propose to use weakly supervised anomaly detection to detect a baby cry. Under this weak supervision, we only need a weak annotation of whether there is a cry in an audio file. We design a data mining technique using the pre-trained VGGish feature extractor and an anomaly detection network on long untrimmed audio files. The obtained datasets are used to train a simple CNN feature network for cry/non-cry classification. This CNN is then used as a feature extractor in an anomaly detection framework to achieve better cry detection performance.

Submitted 25 November, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Showing 51–100 of 449 results for author: Tan, W