-
Additive engineering for Sb$_2$S$_3$ indoor photovoltaics with efficiency exceeding 17%
Authors:
Xiao Chen,
Xiaoxuan Shu,
Jiangcheng Zhou,
Lei Wan,
Peng Xiao,
Yuchen Fu,
Junzhi Ye,
Yi-Teng Huang,
Bin Yan,
Dingjiang Xue,
Tao Chen,
Jiejie Chen,
Robert L. Z. Hoye,
Ru Zhou
Abstract:
Indoor photovoltaics (IPVs) have attracted increasing attention for sustainably powering Internet of Things (IoT) electronics. Sb$_2$S$_3$ is a promising IPV candidate material with a bandgap of ~1.75 eV, which is near the optimal value for indoor energy harvesting. However, the performance of Sb$_2$S$_3$ solar cells is limited by nonradiative recombination, closely associated with the poor-quality absorber films. Additive engineering is an effective strategy to improve the properties of solution-processed films. This work shows that the addition of monoethanolamine (MEA) into the precursor solution allows the nucleation and growth of Sb$_2$S$_3$ films to be controlled, enabling the deposition of high-quality Sb$_2$S$_3$ absorbers with reduced grain boundary density, optimized band positions and increased carrier concentration. Complemented with computations, it is revealed that the incorporation of MEA leads to a more efficient and energetically favorable deposition for enhanced heterogeneous nucleation on the substrate, which increases the grain size and accelerates the deposition rate of Sb$_2$S$_3$ films. Due to suppressed carrier recombination and improved charge-carrier transport in Sb$_2$S$_3$ absorber films, the MEA-modulated Sb$_2$S$_3$ solar cell yields a power conversion efficiency (PCE) of 7.22% under AM1.5G illumination, and an IPV PCE of 17.55% under 1000 lux white light emitting diode (WLED) illumination, which is the highest yet reported for Sb$_2$S$_3$ IPVs. Furthermore, we construct high-performance large-area Sb$_2$S$_3$ IPV modules to power IoT wireless sensors, and realize the long-term continuous recording of environmental parameters under WLED illumination in an office. This work highlights the great prospect of Sb$_2$S$_3$ photovoltaics for indoor energy harvesting.
Submitted 10 June, 2024;
originally announced June 2024.
-
The SkatingVerse Workshop & Challenge: Methods and Results
Authors:
Jian Zhao,
Lei **,
Jianshu Li,
Zheng Zhu,
Yinglei Teng,
Jiaojiao Zhao,
Sadaf Gulshad,
Zheng Wang,
Bo Zhao,
Xiangbo Shu,
Yunchao Wei,
Xuecheng Nie,
Xiaojie **,
Xiaodan Liang,
Shin'ichi Satoh,
Yandong Guo,
Cewu Lu,
Junliang Xing,
Jane Shen Shengmei
Abstract:
The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. The dataset has two subsets, i.e., a training subset and a testing subset. The training subset consists of 19,993 RGB video sequences, and the testing subset consists of 8,586 RGB video sequences. Around 10 participating teams from around the globe competed in the SkatingVerse Challenge. In this paper, we provide a brief summary of the SkatingVerse Workshop & Challenge, including brief introductions to the top three methods. The submission leaderboard will be reopened for researchers who are interested in the human action understanding challenge. The benchmark dataset and other information can be found at: https://skatingverse.github.io/.
Submitted 27 May, 2024;
originally announced May 2024.
-
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition
Authors:
Meiqi Cao,
Rui Yan,
Xiangbo Shu,
Guangzhao Dai,
Yazhou Yao,
Guo-Sen Xie
Abstract:
Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) rely heavily on manually annotated detection boxes in training and inference, hindering practical deployment; or 2) directly employ standard detectors to detect multiple persons with varying sizes and spatial occlusion in panoramic scenes, limiting the performance of PAR. To this end, we consider learning a detector that adapts to occluded persons of varying sizes, optimized jointly with the recognition module in an all-in-one framework. We therefore propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes by learning an adapt-focused detector and multi-granularity prototypes as the pretext tasks in an end-to-end way. Specifically, to accommodate the varying sizes and spatial occlusion of multiple persons in crowded panoramic scenes, we introduce a panoramic adapt-focuser, which achieves size-adapting detection of individuals by selecting object-dense sub-regions identified through the original detections and performing fine-grained detection on them. In addition, to mitigate information loss due to inaccurate individual localizations, we introduce a bi-propagation prototyper that promotes closed-loop interaction and informative consistency across different granularities by facilitating bidirectional information propagation among the individual, group, and global levels. Extensive experiments demonstrate the strong performance of AdaFPP and its broad applicability to PAR.
Submitted 3 May, 2024;
originally announced May 2024.
-
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Authors:
Hongyu Qu,
Rui Yan,
Xiangbo Shu,
Hailiang Gao,
Peng Huang,
Guo-Sen Xie
Abstract:
Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocity Progressive-alignment (MVP-Shot) framework to progressively learn and align semantic-related action features at multi-velocity levels. Concretely, a Multi-Velocity Feature Alignment (MVFA) module is designed to measure the similarity between features from support and query videos at different velocity scales and then merge all similarity scores in a residual fashion. To prevent the multi-velocity features from deviating from the underlying motion semantics, our proposed Progressive Semantic-Tailored Interaction (PSTI) module injects velocity-tailored text information into the video feature via feature interaction on the channel and temporal domains at different velocities. The two modules complement each other to make more accurate query sample predictions under the few-shot settings. Experimental results show that our method outperforms current state-of-the-art methods on multiple standard few-shot benchmarks (i.e., HMDB51, UCF101, Kinetics, and SSv2-small).
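To make "similarity at different velocity scales, merged in a residual fashion" concrete, here is a minimal sketch. It is not the MVFA module: it assumes a velocity level can be approximated by average-pooling non-overlapping windows of frame features, that similarity is a best-match cosine score, and that the residual merge simply adds each coarser-scale score onto the finest-scale one before normalizing. All function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def temporal_pool(frames, scale):
    """Average non-overlapping windows of `scale` frames (a stand-in 'velocity' level)."""
    pooled = []
    for start in range(0, len(frames), scale):
        window = frames[start:start + scale]
        dim = len(window[0])
        pooled.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return pooled

def segment_similarity(query, support):
    """Mean over query segments of the best-matching support segment."""
    return sum(max(cosine(q, s) for s in support) for q in query) / len(query)

def multi_velocity_score(query, support, scales=(1, 2, 4)):
    """Hypothetical residual merge: add coarser-scale scores onto the finest one."""
    score = segment_similarity(temporal_pool(query, scales[0]), temporal_pool(support, scales[0]))
    for s in scales[1:]:
        score += segment_similarity(temporal_pool(query, s), temporal_pool(support, s))
    return score / len(scales)
```

In a few-shot episode, the query would be assigned to the support class with the highest `multi_velocity_score`; a learned model would replace the fixed pooling and unweighted sum used here.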
Submitted 23 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
A Channel to Form Fast-spinning Black Hole-Neutron Star Binary Mergers as Multimessenger Sources. II. Accretion-induced Spin-up
Authors:
Zhen-Han-Tao Wang,
Rui-Chong Hu,
Ying Qin,
**-** Zhu,
Bing Zhang,
Shuang-Xi Yi,
Qin-Wen Tang,
Xin-Wen Shu,
Fen Lyu,
En-Wei Liang
Abstract:
In this work, we investigate an alternative channel for the formation of fast-spinning black hole-neutron star (BHNS) binaries, in which super-Eddington accretion is expected to occur in accreting BHs during the stable mass transfer phase within BH-stripped helium (BH--He-rich) star binary systems. We evolve extensive \texttt{MESA} grids of close-orbit BH--He-rich star systems to systematically explore the projected aligned spins of BHs in BHNS binaries, as well as the impact of different accretion limits on the tidal disruption probability and electromagnetic (EM) signature of BHNS mergers. Most of the BHs in BHNS mergers cannot be effectively spun up through accretion if the accretion rate is limited to $\lesssim10\,\dot{M}_{\rm Edd}$, where $\dot{M}_{\rm Edd}$ is the standard Eddington accretion limit. In order to reach high spins (e.g., $χ_{\rm BH} \gtrsim 0.5$), the BHs are required to be born less massive (e.g., $\lesssim3.0\,M_\odot$) in binary systems with initial periods of $\lesssim0.2-0.3\,{\rm days}$ and to accrete material at $\sim100\,\dot{M}_{\rm Edd}$. However, even under this high accretion limit, $\gtrsim6\,M_\odot$ BHs are typically difficult to spin up significantly and to generate detectable associated EM signals. Our population simulations suggest that different accretion limits have only a slight impact on the ratio of tidal disruption events. However, as the accretion limit increases, the EM counterparts from the cosmological BHNS population become brighter overall.
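For a sense of scale, the accretion limits above can be turned into numbers. The sketch below uses the textbook relations $L_{\rm Edd}=4\pi G M c/\kappa$ and $\dot{M}_{\rm Edd}=L_{\rm Edd}/(\eta c^2)$. The opacity $\kappa=0.2\,{\rm cm^2\,g^{-1}}$ (electron scattering in hydrogen-free, He-rich matter) and the radiative efficiency $\eta=0.1$ are assumptions chosen for illustration; the paper's exact definition of $\dot{M}_{\rm Edd}$ (for example, a spin-dependent $\eta$) may differ.

```python
import math

G = 6.674e-8      # gravitational constant, cm^3 g^-1 s^-2
C = 2.998e10      # speed of light, cm s^-1
M_SUN = 1.989e33  # solar mass, g
YEAR = 3.156e7    # seconds per year

def eddington_rate(mass_msun, kappa=0.2, eta=0.1):
    """Eddington accretion rate in M_sun/yr.

    Assumed: kappa = 0.2 cm^2/g (He-rich electron scattering) and a
    constant radiative efficiency eta = 0.1.
    """
    l_edd = 4 * math.pi * G * mass_msun * M_SUN * C / kappa  # erg/s
    mdot = l_edd / (eta * C ** 2)                             # g/s
    return mdot * YEAR / M_SUN

# Caps corresponding to 1, 10 and 100 times the Eddington rate for a 3 M_sun BH.
for limit in (1, 10, 100):
    print(limit, limit * eddington_rate(3.0))
```

Under these assumptions a 3 $M_\odot$ BH has $\dot{M}_{\rm Edd}$ of order $10^{-7}\,M_\odot\,{\rm yr^{-1}}$, and the rate scales linearly with BH mass.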
Submitted 23 February, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Authors:
Guangzhao Dai,
Xiangbo Shu,
Wenhao Wu,
Rui Yan,
Jiachao Zhang
Abstract:
Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR). Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of visual and linguistic knowledge. We propose a refined approach for ZS-EAR using VLMs, emphasizing fine-grained concept-description alignment that capitalizes on the rich semantic and contextual details in egocentric videos. In this paper, we introduce GPT4Ego, a straightforward yet remarkably potent VLM framework for ZS-EAR, designed to enhance the fine-grained alignment of concept and description between vision and language. Extensive experiments demonstrate that GPT4Ego significantly outperforms existing VLMs on three large-scale egocentric video benchmarks, i.e., EPIC-KITCHENS-100 (33.2%, +9.4%), EGTEA (39.6%, +5.5%), and CharadesEgo (31.5%, +2.6%).
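The contrast between global video-text matching and concept-description matching can be illustrated with a toy zero-shot classifier. This is a generic sketch, not GPT4Ego: it assumes each action class is represented by several description embeddings (which a VLM text encoder would produce) and scores a video embedding by its mean cosine similarity to a class's descriptions. The embeddings and class names below are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_predict(video_emb, descriptions):
    """Score each class by mean similarity to its description embeddings.

    `descriptions` maps a class label to a list of (hypothetical) text
    embeddings of fine-grained descriptions of that class.
    """
    scores = {
        label: sum(cosine(video_emb, d) for d in embs) / len(embs)
        for label, embs in descriptions.items()
    }
    return max(scores, key=scores.get), scores

# Toy 3-d embeddings standing in for real VLM outputs.
descriptions = {
    "cut onion": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
    "open fridge": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]],
}
label, scores = zero_shot_predict([1.0, 0.0, 0.0], descriptions)
```

Averaging over several descriptions is only one simple way to pool them; a system could instead weight or select descriptions per video.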
Submitted 11 May, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
ASASSN-18ap: A Dusty Tidal Disruption Event Candidate with an Early Bump in the Light Curve
Authors:
Yibo Wang,
Tingui Wang,
Ning Jiang,
Xiaer Zhang,
JiaZheng Zhu,
XinWen Shu,
Shifeng Huang,
FaBao Zhang,
Zhenfeng Sheng,
Zheyu Lin
Abstract:
We re-examined the classification of the optical transient ASASSN-18ap, which was initially identified as a supernova (SN) upon its discovery. Based on newly emerged phenomena, such as a delayed luminous infrared outburst and the emergence of luminous coronal emission lines, we suggest that ASASSN-18ap is more likely a tidal disruption event (TDE) in a dusty environment, rather than a supernova. The total energy in the infrared outburst is $\rm 3.1\times10^{51}$ erg, which is an order of magnitude higher than the total energy in the optical-to-ultraviolet range, indicating a large dust extinction, an extra-EUV component, or anisotropic continuum emission. A bumpy feature appeared in the optical light curve at the start of brightening, a feature that has very recently been reported in a few TDEs. This early bump may have been overlooked in the past due to insufficient sampling of the light curves of most TDEs during their rising phase, and it could provide insight into the origin of the optical emission.
Submitted 10 March, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
MAD-MulW: A Multi-Window Anomaly Detection Framework for BGP Security Events
Authors:
Songtao Peng,
Yi** Chen,
Xincheng Shu,
Wu Shuai,
Shenhao Fang,
Zhongyuan Ruan,
Qi Xuan
Abstract:
In recent years, international security events have occurred frequently, with effects that span both real society and cyberspace. Because of the large volume of data involved, traditional traffic monitoring mainly focuses on the local anomalous status of events. BGP-based event monitoring makes it possible to perform differential analysis of international events. Across many existing traffic anomaly detection methods, we observe that a window-based noise reduction strategy effectively improves the success rate of time series anomaly detection. Motivated by this observation, we propose an unsupervised anomaly detection model, MAD-MulW, which incorporates a multi-window serial framework. First, we design the W-GAT module to adaptively update the sample weights within the window and retain the updated information of the trailing sample, which not only reduces the noise of outlier samples but also avoids the space consumption of expanding the data scale. Then, the W-LAT module, based on predictive reconstruction, both captures the trend of sample fluctuations over a given period of time and increases the interclass variation through the reconstruction of the predictive sample. Our model has been experimentally validated on multiple BGP anomalous events with an average F1 score of over 90%, which demonstrates that the staged windows and adaptive strategy significantly improve the efficiency and stability of the time-series model.
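The observation that window-based noise reduction helps time series anomaly detection can be sketched without any learning. The code below is a hypothetical stand-in, not the W-GAT or W-LAT modules: each point is replaced by a weighted mean of its trailing window, where weights shrink with distance from the window median (so outliers count less), and points whose residual from that smoothed value is more than `threshold` standard deviations from the mean residual are flagged. The window size and threshold are illustrative assumptions.

```python
import statistics

def window_denoise(series, window=5):
    """Weighted mean of each trailing window, down-weighting outlying samples."""
    out = []
    for i in range(len(series)):
        win = series[max(0, i - window + 1): i + 1]
        med = statistics.median(win)
        # Weight 1 / (1 + |x - median|): samples far from the median count less.
        weights = [1.0 / (1.0 + abs(x - med)) for x in win]
        out.append(sum(w * x for w, x in zip(weights, win)) / sum(weights))
    return out

def flag_anomalies(series, window=5, threshold=3.0):
    """Indices whose residual from the denoised series is a z-score outlier."""
    smoothed = window_denoise(series, window)
    residuals = [x - s for x, s in zip(series, smoothed)]
    mu = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals) or 1.0
    return [i for i, r in enumerate(residuals) if abs(r - mu) / sigma > threshold]

# A flat series of BGP update counts with one burst at index 20 (synthetic).
counts = [10.0] * 40
counts[20] = 100.0
anomalies = flag_anomalies(counts)
```

Because the spike barely moves the robust window mean, its residual stays large and it is the only index flagged; a plain moving average would smear the burst into its neighbours.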
Submitted 18 December, 2023;
originally announced December 2023.
-
Delayed and fast rising radio flares from an optical and X-ray detected tidal disruption event in the center of a dwarf galaxy
Authors:
Fabao Zhang,
Xinwen Shu,
Lei Yang,
Luming Sun,
Zhumao Zhang,
Yibo Wang,
Guobin Mou,
Xue-Guang Zhang,
Tianyao Zhou,
Fangkun Peng
Abstract:
AT2018cqh is a unique tidal disruption event (TDE) candidate discovered in a dwarf galaxy. Both light-curve fitting and galaxy scaling relations suggest a central black hole mass in the range $5.9<\log(M_{\rm BH}/M_\odot)<6.4$. A delayed X-ray brightening was found around 590 days after the optical discovery; it shows an unusually long rise to peak, lasting at least 558 days, which could arise from delayed accretion onto a newly forming debris disk. We report the discovery of delayed radio flares around 1105 days after the optical discovery, characterized by an initial steep rise lasting $\gtrsim175$ days, a flattening lasting about 544 days, and a phase with another steep rise. The rapid rise in radio flux coupled with the slow decay in the X-ray emission points to a delayed launch of an outflow, perhaps due to a transition in the accretion state. However, known accretion models can hardly explain the origin of the secondary radio flare, which rises even more rapidly than the initial one. If confirmed, AT2018cqh would be a rare TDE in a dwarf galaxy exhibiting optical, X-ray and radio flares. We call for continued multi-frequency radio observations to monitor its spectral and temporal evolution, which may help to reveal new physical processes that are not included in standard TDE models.
Submitted 11 January, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
The dimension of the region of feasible tournament profiles
Authors:
Daniel Kral,
Ander Lamaison,
Magdalena Prorok,
Xichao Shu
Abstract:
Erdős, Lovász and Spencer showed in the late 1970s that the dimension of the region of $k$-vertex graph profiles, i.e., the region of feasible densities of $k$-vertex graphs in large graphs, is equal to the number of non-trivial connected graphs with at most $k$ vertices. We determine the dimension of the region of $k$-vertex tournament profiles. Our result, which explores an interesting connection to Lyndon words, yields that the dimension is much larger than just the number of strongly connected tournaments, which would be the answer expected by analogy with the setting of graphs.
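For readers unfamiliar with Lyndon words (strings strictly smaller, lexicographically, than all of their proper rotations), the sketch below enumerates them with Duval's algorithm and checks the count against Witt's necklace formula $\frac{1}{n}\sum_{d\mid n}\mu(d)\,k^{n/d}$ for words of length $n$ over a $k$-letter alphabet. It only illustrates the combinatorial objects mentioned above; it does not compute the tournament-profile dimension, and how the paper links the two is not reproduced here.

```python
def lyndon_words(k, n):
    """Yield all Lyndon words of length 1..n over {0, ..., k-1} (Duval's algorithm)."""
    w = [-1]
    while w:
        w[-1] += 1
        yield tuple(w)
        m = len(w)
        # Extend periodically to length n.
        while len(w) < n:
            w.append(w[len(w) - m])
        # Drop trailing maximal letters before the next increment.
        while w and w[-1] == k - 1:
            w.pop()

def mobius(n):
    """Möbius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

def count_lyndon(k, n):
    """Number of Lyndon words of length exactly n over k letters (Witt's formula)."""
    total = sum(mobius(d) * k ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n
```

For a binary alphabet the counts for lengths 1 to 6 are 2, 1, 2, 3, 6, 9.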
Submitted 8 June, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions
Authors:
Haoqin Hong,
Yue Zhou,
Xiangyu Shu,
Xiaofang Hu
Abstract:
Traffic sign detection is an important research direction in intelligent driving. Unfortunately, existing methods often overlook extreme conditions such as fog, rain, and motion blur. Moreover, the end-to-end training strategy for image denoising and object detection models fails to utilize inter-model information effectively. To address these issues, we propose CCSPNet, an efficient feature extraction module based on Contextual Transformer and CNN, capable of effectively utilizing the static and dynamic features of images, achieving faster inference speed and providing stronger feature enhancement capabilities. Furthermore, we establish the correlation between object detection and image denoising tasks and propose a joint training model, CCSPNet-Joint, to improve data efficiency and generalization. Finally, to validate our approach, we create the CCTSDB-AUG dataset for traffic sign detection in extreme scenarios. Extensive experiments have shown that CCSPNet achieves state-of-the-art performance in traffic sign detection under extreme conditions. Compared to end-to-end methods, CCSPNet-Joint achieves a 5.32% improvement in precision and an 18.09% improvement in [email protected].
Submitted 3 February, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Super-Eddington Accretion as a Possible Scenario to Form GW190425
Authors:
W. T. Zhang,
Z. H. T. Wang,
J. -P. Zhu,
R. -C. Hu,
X. W. Shu,
Q. W. Tang,
S. X. Yi,
F. Lyu,
E. W. Liang,
Y. Qin
Abstract:
On 2019 April 25, the LIGO/Virgo Scientific Collaboration detected a compact binary coalescence, GW190425. Assuming it is a binary neutron star (BNS) merger, its total mass of $3.4^{+0.3}_{-0.1}\, M_\odot$ lies five standard deviations away from the known Galactic population mean. In the standard common envelope scenario, the immediate progenitor of GW190425 is a close binary system composed of an NS and a He-rich star. Using detailed binary evolution modeling, we find that reproducing GW190425-like events requires super-Eddington accretion (e.g., $1,000\,\dot{M}_{\rm Edd}$) from a He-rich star onto the first-born NS, with a typical mass of 1.33 $M_\odot$, via stable Case BB mass transfer (MT). Furthermore, the immediate progenitors should have an initial He-star mass $M_{\rm ZamsHe}$ in the range $3.0-3.5$ $M_\odot$ and an initial orbital period $P_{\rm init}$ of 0.08 to 0.12 days. The corresponding mass accreted onto the NSs during the stable Case BB MT phase varies from $0.70\, M_\odot$ to $0.77\, M_\odot$. After the formation of the second-born NS, the BNSs are expected to merge via gravitational wave emission within $\sim$ 11 Myr to $\sim$ 190 Myr.
Submitted 28 September, 2023; v1 submitted 10 September, 2023;
originally announced September 2023.
-
Occlusion-Aware Detection and Re-ID Calibrated Network for Multi-Object Tracking
Authors:
Yukun Su,
Ruizhou Sun,
Xin Shu,
Yu Zhang,
Qingyao Wu
Abstract:
Multi-Object Tracking (MOT) is a crucial computer vision task that aims to predict the bounding boxes and identities of objects simultaneously. While state-of-the-art methods have made remarkable progress by jointly optimizing the multi-task problems of detection and Re-ID feature learning, few approaches tackle the occlusion issue, which is a long-standing challenge in the MOT field. Generally, occluded objects may hinder the detector from estimating the bounding boxes, resulting in fragmented trajectories. Moreover, the learned Re-ID embeddings of occluded objects are less distinctive, since they contain features from interfering objects. To this end, we propose an occlusion-aware detection and Re-ID calibrated network for multi-object tracking, termed ORCTrack. Specifically, we propose an Occlusion-Aware Attention (OAA) module in the detector that highlights the object features while suppressing the occluded background regions. OAA can serve as a modulator that enhances the detector for potentially occluded objects. Furthermore, we design a Re-ID embedding matching block based on the optimal transport problem, which focuses on enhancing and calibrating the Re-ID representations through different adjacent frames complementarily. To validate the effectiveness of the proposed method, extensive experiments are conducted on two challenging benchmarks, VisDrone2021-MOT and KITTI. Experimental evaluations demonstrate the superiority of our approach, which achieves new state-of-the-art performance with high run-time efficiency.
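Matching Re-ID embeddings across adjacent frames as an optimal transport problem can be sketched with entropy-regularized OT solved by Sinkhorn iterations. This is a generic illustration, not ORCTrack's matching block: it assumes uniform marginals, a cost of one minus cosine similarity, a fixed regularization `epsilon=0.1`, and a hard assignment by row-wise argmax of the transport plan. The toy embeddings are made up.

```python
import math

def cosine_cost(prev, curr):
    """Cost matrix 1 - cosine similarity between two sets of embeddings."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    return [
        [1.0 - sum(p * c for p, c in zip(pe, ce)) / (norm(pe) * norm(ce)) for ce in curr]
        for pe in prev
    ]

def sinkhorn(cost, epsilon=0.1, iters=200):
    """Entropy-regularized OT with uniform marginals; returns the transport plan."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def match(prev, curr, epsilon=0.1):
    """Assign each previous-frame track to the current detection with most mass."""
    plan = sinkhorn(cosine_cost(prev, curr), epsilon)
    return [max(range(len(row)), key=row.__getitem__) for row in plan]

prev_emb = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
curr_emb = [[0.0, 1.0], [0.71, 0.69], [1.0, 0.05]]
assignment = match(prev_emb, curr_emb)
```

Unlike independent nearest-neighbour matching, the plan respects both marginals, so two tracks cannot both claim the same detection with full weight.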
Submitted 30 August, 2023;
originally announced August 2023.
-
Unified and Dynamic Graph for Temporal Character Grouping in Long Videos
Authors:
Xiujun Shu,
Wei Wen,
Liangsheng Xu,
Ruizhi Qiao,
Taian Guo,
Hanjun Li,
Bei Gan,
Xiao Wang,
Xing Sun
Abstract:
Video temporal character grouping locates the moments at which the major characters appear in a video, according to their identities. To this end, recent works have evolved from unsupervised clustering to graph-based supervised clustering. However, graph methods are built upon the premise of fixed affinity graphs, which introduces many inexact connections. Besides, they extract multi-modal features with several different models, which is unfriendly to deployment. In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping. This is accomplished, first, by a unified representation network that learns representations of multiple modalities within the same space while still preserving the uniqueness of each modality. Second, we present a dynamic graph clustering method in which a different number of neighbors is dynamically constructed for each node via a cyclic matching strategy, leading to a more reliable affinity graph. Third, a progressive association method is introduced to exploit spatial and temporal contexts among different modalities, allowing multi-modal clustering results to be well fused. As current datasets only provide pre-extracted features, we evaluate our UniDG method on a collected dataset named MTCG, which contains each character's appearing clips of face and body and speaking voice tracks. We also evaluate our key components on existing clustering and retrieval datasets to verify their generalization ability. Experimental results show that our method achieves promising results and outperforms several state-of-the-art approaches.
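The idea of giving each node a different number of neighbors can be illustrated with a mutual (two-way) top-$k$ check: node $j$ stays a neighbor of $i$ only if $i$ is also among $j$'s top-$k$. This is an assumed, simplified stand-in for the cyclic matching strategy, which the abstract does not specify; the 1-D features and `k=2` are purely illustrative.

```python
def top_k(sim, i, k):
    """Indices of the k most similar nodes to node i (excluding i itself)."""
    others = [j for j in range(len(sim)) if j != i]
    return set(sorted(others, key=lambda j: sim[i][j], reverse=True)[:k])

def dynamic_neighbors(sim, k=3):
    """Keep j as a neighbour of i only if i is also in j's top-k (mutual check)."""
    tops = [top_k(sim, i, k) for i in range(len(sim))]
    return {i: sorted(j for j in tops[i] if i in tops[j]) for i in range(len(sim))}

# Toy 1-d "features": a cluster of three nodes and a cluster of two.
xs = [0.0, 0.1, 0.2, 5.0, 5.1]
sim = [[-abs(a - b) for b in xs] for a in xs]
graph = dynamic_neighbors(sim, k=2)
```

Nodes in the small cluster end up with one neighbor while nodes in the larger cluster keep two, because the cross-cluster links fail the mutual check; a fixed $k$-NN graph would have kept those inexact edges.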
Submitted 22 June, 2024; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Revisiting the Properties of GW190814 and Its Formation History
Authors:
F. Lyu,
L. Yuan,
D. H. Wu,
W. H. Guo,
Y. Z. Wang,
S. X. Yi,
Q. W. Tang,
R. -C. Hu,
J. -P. Zhu,
X. W. Shu,
Y. Qin,
E. W. Liang
Abstract:
GW190814 was reported during LIGO's and Virgo's third observing run with the most asymmetric component masses (a $\sim 23$ $M_{\odot}$ black hole and a $\sim2.6$ $M_{\odot}$ compact object). Under the assumption that this event is a binary black hole (BBH) merger formed through the isolated binary evolution channel, we reanalyze the publicly released data of GW190814 with the modified astrophysical priors on the effective spin $χ_{\rm eff}$, and further explore its formation history using detailed binary modeling. We show that GW190814 is likely to have been formed through the classical common envelope channel. Our findings show that the properties inferred using the modified astrophysical priors are consistent with those inferred by the uniform priors. With the newly-inferred properties of GW190814, we perform detailed binary evolution of the immediate progenitor of the BBH (namely a close binary system composed of a BH and a helium star) in a large parameter space, taking into account mass-loss, internal differential rotation, supernova kicks, and tidal interactions between the helium star and the BH companion. Our findings show that GW190814-like events could be formed in limited initial conditions just after the common envelope phase: a $\sim 23$ $M_{\odot}$ BH and a helium star of $M_{\rm ZamsHe}$ $\sim$ 8.5 $M_{\odot}$ at solar metallicity ($\sim$ 7.5 $M_{\odot}$ at 10\% solar metallicity) with an initial orbital period at around 1.0 day. Additionally, the inferred low spin of the secondary indicates that the required metallicity for reproducing GW190814-like events should not be too low (e.g., Z $\gtrsim$ 0.1 $Z_{\odot}$).
Submitted 3 September, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
Authors:
Hanjun Li,
Xiujun Shu,
Sunan He,
Ruizhi Qiao,
Wei Wen,
Taian Guo,
Bei Gan,
Xing Sun
Abstract:
Temporal sentence grounding (TSG) aims to locate a specific moment in an untrimmed video given a natural language query. Weakly supervised methods still have a large performance gap compared to fully supervised ones, while the latter require laborious timestamp annotations. In this study, we aim to reduce the annotation cost while keeping the performance on the TSG task competitive with fully supervised methods. To achieve this goal, we investigate a recently proposed glance-supervised temporal sentence grounding task, which requires only a single-frame annotation (referred to as a glance annotation) for each query. Under this setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA). Specifically, SA-GCL samples reliable positive moments from a 2D temporal map by jointly leveraging the Gaussian prior and semantic consistency, which contributes to aligning the positive sentence-moment pairs in the joint embedding space. Moreover, to alleviate the annotation bias resulting from glance annotation and to model complex queries consisting of multiple events, we propose the DGA module, which adjusts the distribution dynamically to approximate the ground truth of target moments. Extensive experiments on three challenging benchmarks verify the effectiveness of the proposed D3G. It outperforms the state-of-the-art weakly supervised methods by a large margin and narrows the performance gap with fully supervised methods. Code is available at https://github.com/solicucu/D3G.
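How a single glance frame can define soft positives on a 2D temporal map is sketched below. This covers only the Gaussian-prior half of the sampling; semantic consistency and the dynamic adjustment of the distribution (the DGA module) are not modeled. The choices of evaluating the prior at each moment's center, `sigma=2.0` and `threshold=0.6` are assumptions for illustration and are not taken from the D3G code.

```python
import math

def moment_prior(start, end, glance, sigma):
    """Gaussian prior of a candidate moment, evaluated at its centre (assumed choice)."""
    centre = (start + end) / 2
    return math.exp(-((centre - glance) ** 2) / (2 * sigma ** 2))

def positive_candidates(num_clips, glance, sigma=2.0, threshold=0.6):
    """Cells (start, end) of the 2D temporal map that contain the glance and pass the prior."""
    return [
        (s, e)
        for s in range(num_clips)
        for e in range(s, num_clips)
        if s <= glance <= e and moment_prior(s, e, glance, sigma) >= threshold
    ]

candidates = positive_candidates(num_clips=10, glance=5)
```

Widening `sigma` admits moments whose centers lie farther from the glance, which is roughly the knob a dynamic adjustment of the prior would tune per query.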
Submitted 8 August, 2023;
originally announced August 2023.
-
WonderFlow: Narration-Centric Design of Animated Data Videos
Authors:
Yun Wang,
Leixian Shen,
Zhengxin You,
Xinhuan Shu,
Bongshin Lee,
John Thompson,
Haidong Zhang,
Dongmei Zhang
Abstract:
Creating an animated data video enriched with audio narration takes a significant amount of time and effort and requires expertise. Users not only need to design complex animations, but also need to turn written text scripts into audio narrations and synchronize visual changes with the narrations. This paper presents WonderFlow, an interactive authoring tool that facilitates narration-centric design of animated data videos. WonderFlow allows authors to easily specify a semantic link between text and the corresponding chart elements. It then automatically generates audio narration by leveraging text-to-speech techniques and aligns the narration with an animation. WonderFlow provides a visualization structure-aware animation library designed to ease chart animation creation, enabling authors to apply pre-designed animation effects to common visualization components. It also allows authors to preview and iteratively refine their data videos in a unified system, without having to switch between different creation tools. To evaluate WonderFlow's effectiveness and usability, we created an example gallery and conducted a user study and expert interviews. The results demonstrated that WonderFlow is easy to use and simplifies the creation of data videos with narration-animation interplay.
Submitted 6 June, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Epidemic spreading under game-based self-quarantine behaviors guided by local and global infection information
Authors:
Zegang Huang,
Xincheng Shu,
Qi Xuan,
Zhongyuan Ruan
Abstract:
During the outbreak of an epidemic, individuals may modify their behaviors in response to external (local and global) infection-related information. However, how local and global information differ in their influence on the spread of diseases remains inadequately explored. Here we study a simple epidemic model that incorporates the game-based self-quarantine behavior of individuals, taking into account the influence of local infection status, global disease prevalence, and node heterogeneity (i.e., non-uniform node degrees). Our findings reveal that local information can effectively contain an epidemic, even when only a small proportion of individuals opt for self-quarantine. Global information, on the other hand, can induce oscillations in infection evolution curves during the declining phase of an epidemic, owing to the synchronous release of nodes with the same degree from the quarantined state. In contrast, the release pattern under local information appears to be more random. This oscillation phenomenon can be observed in various types of networks with different characteristics. Notably, our model differs essentially from conventional epidemic models in that network heterogeneity plays a negative role in the spread of epidemics, contrary to previous findings.
Submitted 4 August, 2023;
originally announced August 2023.
-
High-performance real-world optical computing trained by in situ model-free optimization
Authors:
Guangyuan Zhao,
Xin Shu,
Renjie Zhou
Abstract:
Optical computing systems provide high-speed and low-energy data processing but suffer from computationally demanding training and simulation-to-reality gaps. We propose a gradient-based model-free optimization (G-MFO) method based on a Monte Carlo gradient estimation algorithm for computationally efficient in situ training of optical computing systems. This approach treats an optical computing system as a black box and back-propagates the loss directly to the probability distributions of the optical computing weights, circumventing the need for a computationally heavy and biased system simulation. Our experiments on diffractive optical computing systems show that G-MFO outperforms hybrid training on the MNIST and FMNIST datasets. Furthermore, we demonstrate image-free and high-speed classification of cells from their marker-free phase maps. The model-free and high-performance nature of our method, combined with its low demand for computational resources, paves the way for accelerating the transition of optical computing from laboratory demonstrations to practical, real-world applications.
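One standard way to back-propagate a loss to the parameters of a sampling distribution when the system itself is a black box is the score-function (REINFORCE) Monte Carlo estimator, $\nabla_\mu \mathbb{E}_{w\sim\mathcal{N}(\mu,\sigma^2 I)}[L(w)] = \mathbb{E}[L(w)\,(w-\mu)/\sigma^2]$. Whether G-MFO uses exactly this estimator is an assumption here. The sketch assumes an independent Gaussian over each weight with a fixed, shared standard deviation and uses a toy quadratic loss as a stand-in for a physical optical system; `score_function_step` and `toy_loss` are hypothetical names.

```python
import random
from collections.abc import Callable


def score_function_step(
    mu: list[float],
    sigma: float,
    black_box_loss: Callable[[list[float]], float],
    num_samples: int,
    lr: float,
    rng: random.Random,
) -> list[float]:
    """One gradient-descent step on the means of independent Gaussians
    N(mu_d, sigma^2) over the weights, using the score-function estimator.
    black_box_loss is only evaluated, never differentiated."""
    samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mu] for _ in range(num_samples)]
    losses = [black_box_loss(w) for w in samples]
    baseline = sum(losses) / num_samples  # batch-mean baseline for variance reduction
    grad = [0.0] * len(mu)
    for w, loss in zip(samples, losses):
        for d in range(len(mu)):
            # d/d(mu_d) of log N(w_d | mu_d, sigma^2) is (w_d - mu_d) / sigma^2
            grad[d] += (loss - baseline) * (w[d] - mu[d]) / sigma ** 2
    return [m - lr * g / num_samples for m, g in zip(mu, grad)]


# Usage: recover the minimizer of a toy quadratic "black box" from loss values alone.
target = [1.0, -2.0]


def toy_loss(w: list[float]) -> float:
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


rng = random.Random(0)
mu = [0.0, 0.0]
for _ in range(300):
    mu = score_function_step(mu, sigma=0.1, black_box_loss=toy_loss, num_samples=64, lr=0.05, rng=rng)
print(mu)  # approaches target
```

Subtracting the batch-mean loss reduces variance; because each sample's own loss is part of that mean, the estimate is shrunk by a factor $(N-1)/N$ in expectation, which only rescales the step size.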
Submitted 2 April, 2024; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Scientific Objectives of the Hot Universe Baryon Surveyor (HUBS) Mission
Authors:
Joel Bregman,
Renyue Cen,
Yang Chen,
Wei Cui,
Taotao Fang,
Fulai Guo,
Edmund Hodges-Kluck,
Rui Huang,
Luis C. Ho,
Li Ji,
Suoqing Ji,
Xi Kang,
Xiaoyu Lai,
Hui Li,
Jiangtao Li,
Miao Li,
Xiangdong Li,
Yuan Li,
Zhaosheng Li,
Guiyun Liang,
Helei Liu,
Wenhao Liu,
Fangjun Lu,
Junjie Mao,
Gabriele Ponti
, et al. (29 additional authors not shown)
Abstract:
The Hot Universe Baryon Surveyor (HUBS) is a proposed space-based X-ray telescope for detecting X-ray emission from the hot gas in our universe. With its unprecedented spatially resolved high-resolution spectroscopy and large field of view, the HUBS mission will be uniquely qualified to measure the physical and chemical properties of the hot gas in the interstellar medium, the circumgalactic medium, the intergalactic medium, and the intracluster medium. These measurements will be valuable for two key scientific goals of HUBS, namely to unravel the AGN and stellar feedback physics that governs the formation and evolution of galaxies, and to probe the baryon budget and multi-phase states from galactic to cosmological scales. In addition to these two goals, the HUBS mission will also help solve problems in the fields of galaxy clusters, AGNs, diffuse X-ray backgrounds, supernova remnants, and compact objects. This paper discusses the perspective of advancing these fields using the HUBS telescope.
Submitted 11 July, 2023;
originally announced July 2023.
-
A Randomized Algorithm for Single-Source Shortest Path on Undirected Real-Weighted Graphs
Authors:
Ran Duan,
Jiayi Mao,
Xinkai Shu,
Longhui Yin
Abstract:
In undirected graphs with real non-negative weights, we give a new randomized algorithm for the single-source shortest path (SSSP) problem with running time $O(m\sqrt{\log n \cdot \log\log n})$ in the comparison-addition model. This is the first algorithm to break the $O(m+n\log n)$ time bound of Dijkstra's algorithm with Fibonacci heaps on real-weighted sparse graphs. Previous undirected non-negative SSSP algorithms give a time bound of $O(m\alpha(m,n)+\min\{n\log n, n\log\log r\})$ in the comparison-addition model, where $\alpha$ is the inverse-Ackermann function and $r$ is the ratio of the maximum-to-minimum edge weight [Pettie & Ramachandran 2005], and linear time for integer edge weights in the RAM model [Thorup 1999]. Note that there is a proposed complexity lower bound of $\Omega(m+\min\{n\log n, n\log\log r\})$ for hierarchy-based algorithms for undirected real-weighted SSSP [Pettie & Ramachandran 2005], but our algorithm does not obey the properties required for that lower bound. As a non-hierarchy-based approach, our algorithm has a much simpler structure and is much easier to implement.
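For reference, the $O(m+n\log n)$ bound quoted above belongs to Dijkstra's algorithm with a Fibonacci heap. The sketch below is the common binary-heap variant with lazy deletion (Python's `heapq`), which runs in $O(m\log n)$ rather than $O(m+n\log n)$; it illustrates only the comparison baseline and is not the authors' randomized algorithm. It touches edge weights only through comparisons and additions, in keeping with the comparison-addition model.

```python
import heapq
import math


def dijkstra(n: int, edges: list[tuple[int, int, float]], source: int) -> list[float]:
    """Single-source shortest paths on an undirected graph with non-negative
    real weights. edges holds (u, v, w) triples over vertices 0..n-1."""
    adj: list[list[tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        if w < 0:
            raise ValueError("edge weights must be non-negative")
        adj[u].append((v, w))
        adj[v].append((u, w))  # undirected: store both directions
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a smaller distance
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Usage: a 5-vertex graph in which vertex 4 is unreachable.
edges = [(0, 1, 0.5), (1, 2, 1.25), (0, 2, 2.0), (2, 3, 0.25)]
print(dijkstra(5, edges, source=0))  # [0.0, 0.5, 1.75, 2.0, inf]
```

Lazy deletion (pushing a new entry instead of decreasing a key) is what keeps this version simple with a binary heap; the extra stale entries are why its bound is $O(m\log n)$.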
Submitted 4 October, 2023; v1 submitted 9 July, 2023;
originally announced July 2023.
-
Creating Emordle: Animating Word Cloud for Emotion Expression
Authors:
Liwenhan Xie,
Xinhuan Shu,
Jeon Cheol Su,
Yun Wang,
Siming Chen,
Huamin Qu
Abstract:
We propose emordle, a conceptual design that animates wordles (compact word clouds) to deliver their emotional context to the audience. To inform the design, we first reviewed online examples of animated texts and animated wordles, and summarized strategies for injecting emotion into the animations. We introduced a composite approach that extends an existing animation scheme for one word to multiple words in a wordle with two global factors: the randomness of text animation (entropy) and the animation speed (speed). To create an emordle, general users can choose one predefined animation scheme that matches the intended emotion class and fine-tune the emotion intensity with the two parameters. We designed proof-of-concept emordle examples for four basic emotion classes, namely happiness, sadness, anger, and fear. We conducted two controlled crowdsourcing studies to evaluate our approach. The first study confirmed that people generally agreed on the emotions conveyed by well-crafted animations, and the second demonstrated that our identified factors helped fine-tune the extent of the delivered emotion. We also invited general users to create emordles on their own based on our proposed framework. Through this user study, we confirmed the effectiveness of the approach. We concluded with implications for future research opportunities in supporting emotion expression in visualizations.
Submitted 14 June, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Filling the gaps in video transcoder deployment in the cloud
Authors:
Vibhoothi,
Daniel Joseph Ringis,
Xin Shu,
François Pitié,
Zsolt Lorincz,
Philippe Brodeur,
Anil Kokaram
Abstract:
Cloud-based deployment of content production and broadcast workflows has continued to disrupt the industry after the pandemic. The key tools required for unlocking cloud workflows, e.g., transcoding, metadata parsing, and streaming playback, are increasingly commoditized. However, as video traffic continues to increase, there is a need to consider tools that offer opportunities for further bitrate/quality gains as well as those that facilitate cloud deployment. In this paper we consider preprocessing, rate/distortion optimisation and cloud cost prediction tools, which are only just emerging from the research community. These tools are posed as part of the per-clip optimisation approach to transcoding, which has been adopted by large streaming media processing entities but has yet to be made more widely available to the industry.
Submitted 17 April, 2023;
originally announced April 2023.
-
Attack is Good Augmentation: Towards Skeleton-Contrastive Representation Learning
Authors:
Binqian Xu,
Xiangbo Shu,
Rui Yan,
Guo-Sen Xie,
Yixiao Ge,
Mike Zheng Shou
Abstract:
Contrastive learning, relying on effective positive and negative sample pairs, is beneficial for learning informative skeleton representations in unsupervised skeleton-based action recognition. To obtain these positive and negative pairs, existing weak/strong data augmentation methods have to randomly change the appearance of skeletons to indirectly pursue semantic perturbations. However, such approaches have two limitations: 1) solely perturbing appearance cannot well capture the intrinsic semantic information of skeletons, and 2) random perturbation may change the original positive/negative pairs into soft positive/negative ones. To address this dilemma, we make the first attempt to explore an attack-based augmentation scheme that additionally brings in direct semantic perturbation, for constructing hard positive pairs and further assisting in constructing hard negative pairs. In particular, we propose a novel Attack-Augmentation Mixing-Contrastive learning (A$^2$MC) to contrast hard positive features and hard negative features for learning more robust skeleton representations. In A$^2$MC, Attack-Augmentation (Att-Aug) is designed to collaboratively perform targeted and untargeted perturbations of skeletons via attack and augmentation respectively, for generating high-quality hard positive features. Meanwhile, the Positive-Negative Mixer (PNM) is presented to mix hard positive features and negative features for generating hard negative features, which are adopted for updating the mixed memory banks. Extensive experiments on three public datasets demonstrate that A$^2$MC is competitive with the state-of-the-art methods.
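The abstract says PNM mixes hard positive features with negative features to synthesize hard negatives, but it does not give the mixing rule. A common choice for this kind of synthesis is a convex (mixup-style) combination followed by L2 normalization; the sketch below assumes that rule and a fixed coefficient `lam`, both hypothetical, and omits the memory-bank update.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mix_hard_negatives(
    positive: list[float], negatives: list[list[float]], lam: float = 0.3
) -> list[list[float]]:
    """Hypothetical positive-negative mixing: one synthetic hard negative per
    input negative, computed as normalize(lam * p + (1 - lam) * n) with p and n
    first L2-normalized."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    p = l2_normalize(positive)
    mixed = []
    for neg in negatives:
        n = l2_normalize(neg)
        mixed.append(l2_normalize([lam * a + (1.0 - lam) * b for a, b in zip(p, n)]))
    return mixed


# Usage: one positive feature and two negatives in a 2-D feature space.
positive = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.2]]
hard = mix_hard_negatives(positive, negatives, lam=0.3)
```

Each mixed vector lies between the positive and the original negative on the unit sphere, so it is at least as similar to the positive as the original negative (provided the two are not antipodal), which is what makes it harder to separate.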
Submitted 8 April, 2023;
originally announced April 2023.
-
SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering
Authors:
Xinyao Shu,
Shiyang Yan,
Xu Yang,
Ziheng Wu,
Zhongfeng Chen,
Zhenyu Lu
Abstract:
Visual question answering (VQA) is a critical multimodal task in which an agent must answer questions according to the visual cue. Unfortunately, language bias is a common problem in VQA: the model generates answers only by associating them with the questions while ignoring the visual content, resulting in biased results. We tackle the language bias problem by proposing a self-supervised counterfactual metric learning (SC-ML) method that focuses better on the image features. SC-ML can adaptively select the question-relevant visual features to answer the question, reducing the negative influence of question-irrelevant visual features on inferring answers. In addition, question-irrelevant visual features can be seamlessly incorporated into counterfactual training schemes to further boost robustness. Extensive experiments demonstrate the effectiveness of our method with improved results on the VQA-CP dataset. Our code will be made publicly available.
Submitted 4 April, 2023;
originally announced April 2023.
-
Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
Authors:
Bei Gan,
Xiujun Shu,
Ruizhi Qiao,
Haoqian Wu,
Keyu Chen,
Hanjun Li,
Bo Ren
Abstract:
Movie highlights stand out from the screenplay for efficient browsing and play a crucial role on social media platforms. Based on existing efforts, this work makes two observations: (1) Highlight labeling is uncertain across annotators, which leads to inaccurate and time-consuming annotations. (2) Besides previous supervised or unsupervised settings, some existing video corpora, e.g., trailers, can be useful, but they are often noisy and do not cover the full highlights. In this work, we study a more practical and promising setting, i.e., reformulating highlight detection as "learning with noisy labels". This setting does not require time-consuming manual annotations and can fully utilize existing abundant video corpora. First, based on movie trailers, we leverage scene segmentation to obtain complete shots, which are regarded as noisy labels. Then, we propose a Collaborative noisy Label Cleaner (CLC) framework to learn from noisy highlight moments. CLC consists of two modules: augmented cross-propagation (ACP) and multi-modality cleaning (MMC). The former aims to exploit the closely related audio-visual signals and fuse them to learn unified multi-modal representations. The latter aims to achieve cleaner highlight labels by observing the changes in losses among different modalities. To verify the effectiveness of CLC, we further collect a large-scale highlight dataset named MovieLights. Comprehensive experiments on MovieLights and YouTube Highlights datasets demonstrate the effectiveness of our approach. Code has been made available at: https://github.com/TencentYoutuResearch/HighlightDetection-CLC
Submitted 26 March, 2023;
originally announced March 2023.
-
Work with AI and Work for AI: Autonomous Vehicle Safety Drivers' Lived Experiences
Authors:
Mengdi Chu,
Keyu Zong,
Xin Shu,
Jiangtao Gong,
Zicong Lu,
Kaimin Guo,
Xinyi Dai,
Guyue Zhou
Abstract:
The development of Autonomous Vehicles (AVs) has created a novel job, the safety driver, recruited from experienced drivers to supervise and operate AVs in numerous driving missions. Safety drivers usually work with imperfect AVs in high-risk real-world traffic environments for road testing tasks. However, this group of workers is under-explored in the HCI community. To fill this gap, we conducted semi-structured interviews with 26 safety drivers. Our results present how safety drivers cope with defective algorithms and shape and calibrate their perceptions while working with AVs. We found that, as front-line workers, safety drivers are forced to take on risks accumulated upstream in the AV industry and also face restricted self-development in working for AV development. We contribute the first empirical evidence of the lived experience of safety drivers, the first passengers in the development of AVs and also the grassroots workers for AVs, which can shed light on future human-AI interaction research.
Submitted 8 March, 2023;
originally announced March 2023.
-
Pyramid Self-attention Polymerization Learning for Semi-supervised Skeleton-based Action Recognition
Authors:
Binqian Xu,
Xiangbo Shu
Abstract:
Most semi-supervised skeleton-based action recognition approaches learn skeleton action representations only at the joint level and neglect the crucial motion characteristics at the coarser-grained body (e.g., limb, trunk) level that provide rich additional semantic information, even though the amount of labeled data is limited. In this work, we propose a novel Pyramid Self-attention Polymerization Learning (dubbed PSP Learning) framework to jointly learn body-level, part-level, and joint-level action representations of joint and motion data containing abundant and complementary semantic information via contrastive learning covering coarse-to-fine granularity. Specifically, to complement semantic information from coarse to fine granularity in skeleton actions, we design a new Pyramid Polymerizing Attention (PPA) mechanism that first calculates the body-level, part-level, and joint-level attention maps and then polymerizes these attention maps level by level (i.e., from body level to part level, and further to joint level). Moreover, we present a new Coarse-to-fine Contrastive Loss (CCL), including body-level, part-level, and joint-level contrast losses, to jointly measure the similarity between the body/part/joint-level contrasting features of joint and motion data. Finally, extensive experiments are conducted on the NTU RGB+D and North-Western UCLA datasets to demonstrate the competitive performance of the proposed PSP Learning in the semi-supervised skeleton-based action recognition task. The source code of PSP Learning is publicly available at https://github.com/1xbq1/PSP-Learning.
Submitted 5 February, 2023;
originally announced February 2023.
-
Spatiotemporal Decouple-and-Squeeze Contrastive Learning for Semi-Supervised Skeleton-based Action Recognition
Authors:
Binqian Xu,
Xiangbo Shu
Abstract:
Contrastive learning has been successfully leveraged to learn action representations for addressing the problem of semi-supervised skeleton-based action recognition. However, most contrastive learning-based methods only contrast global features mixing spatiotemporal information, which confuses the spatial- and temporal-specific information reflecting different semantics at the frame level and joint level. Thus, we propose a novel Spatiotemporal Decouple-and-Squeeze Contrastive Learning (SDS-CL) framework to comprehensively learn more abundant representations of skeleton-based actions by jointly contrasting spatial-squeezing features, temporal-squeezing features, and global features. In SDS-CL, we design a new Spatiotemporal-decoupling Intra-Inter Attention (SIIA) mechanism to obtain the spatiotemporal-decoupling attentive features for capturing spatiotemporal-specific information by calculating spatial- and temporal-decoupling intra-attention maps among joint/motion features, as well as spatial- and temporal-decoupling inter-attention maps between joint and motion features. Moreover, we present a new Spatial-squeezing Temporal-contrasting Loss (STL), a new Temporal-squeezing Spatial-contrasting Loss (TSL), and the Global-contrasting Loss (GL) to contrast the spatial-squeezing joint and motion features at the frame level, the temporal-squeezing joint and motion features at the joint level, and the global joint and motion features at the skeleton level. Extensive experimental results on four public datasets show that the proposed SDS-CL achieves performance gains over other competitive methods.
Submitted 5 February, 2023;
originally announced February 2023.
-
Triplet Contrastive Representation Learning for Unsupervised Vehicle Re-identification
Authors:
Fei Shen,
Xiaoyu Du,
Liyan Zhang,
Xiangbo Shu,
**hui Tang
Abstract:
Part feature learning is critical for fine-grained semantic understanding in vehicle re-identification. However, existing approaches directly model part features and global features, which can easily lead to serious gradient vanishing issues due to their unequal feature information and unreliable pseudo-labels for unsupervised vehicle re-identification. To address this problem, in this paper, we propose a simple Triplet Contrastive Representation Learning (TCRL) framework, which leverages cluster features to bridge the part features and global features for unsupervised vehicle re-identification. Specifically, TCRL devises three memory banks to store the instance/cluster features and proposes a Proxy Contrastive Loss (PCL) to perform contrastive learning between adjacent memory banks, thus presenting the associations between the part and global features as a transition of the part-cluster and cluster-global associations. Since the cluster memory bank covers all the vehicle features, it can summarize them into a discriminative feature representation. To deeply exploit the instance/cluster information, TCRL proposes two additional loss functions. For the instance-level feature, a Hybrid Contrastive Loss (HCL) re-defines the sample correlations by pulling the positive instance features closer and pushing all the negative instance features away. For the cluster-level feature, a Weighted Regularization Cluster Contrastive Loss (WRCCL) refines the pseudo labels by penalizing the mislabeled images according to the instance similarity. Extensive experiments show that TCRL outperforms many state-of-the-art unsupervised vehicle re-identification approaches.
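The memory-bank contrast described above can be illustrated with a generic cluster-level InfoNCE loss against a bank of L2-normalized centroids, plus a momentum update of the matched centroid, a pattern widely used in unsupervised re-identification. This is an assumed stand-in, not the paper's PCL, HCL, or WRCCL; the temperature `tau`, the `momentum` value, and all function names are illustrative.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cluster_info_nce(
    query: list[float], centroids: list[list[float]], label: int, tau: float = 0.05
) -> float:
    """Cross-entropy of the query's own cluster under a softmax over
    query-centroid cosine similarities scaled by 1 / tau.
    The centroids are assumed to be L2-normalized already."""
    q = l2_normalize(query)
    logits = [sum(a * b for a, b in zip(q, c)) / tau for c in centroids]
    peak = max(logits)
    log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))  # stable log-sum-exp
    return log_denom - logits[label]


def momentum_update(
    centroids: list[list[float]], query: list[float], label: int, momentum: float = 0.9
) -> None:
    """In place: blend the labeled centroid with the normalized query, then re-normalize."""
    q = l2_normalize(query)
    blended = [momentum * c + (1.0 - momentum) * x for c, x in zip(centroids[label], q)]
    centroids[label] = l2_normalize(blended)


# Usage: two cluster centroids in a 2-D feature space.
bank = [l2_normalize([1.0, 0.1]), l2_normalize([0.0, 1.0])]
feature = [0.9, 0.2]
loss_own = cluster_info_nce(feature, bank, label=0)
loss_other = cluster_info_nce(feature, bank, label=1)
momentum_update(bank, feature, label=0)
```

Contrasting a feature against cluster centroids rather than individual instances means each comparison summarizes many samples at once, which is in the spirit of the abstract's point that the cluster memory bank summarizes all vehicle features.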
Submitted 15 March, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Formation of Fast-spinning Neutron Stars in Close Binaries and Magnetar-driven Stripped-envelope Supernovae
Authors:
Rui-Chong Hu,
**-** Zhu,
Ying Qin,
Yong Shao,
Bing Zhang,
Yun-Wei Yu,
En-Wei Liang,
Liang-Duan Liu,
Bo Wang,
Xin-Wen Shu,
Jian-Feng Liu
Abstract:
Extreme stripped-envelope supernovae (SESNe), including Type Ic superluminous supernovae (SLSNe), broad-line Type Ic SNe (SNe Ic-BL), and fast blue optical transients (FBOTs), are widely believed to harbor a newborn fast-spinning, highly magnetized neutron star ("magnetar"), which can lose its rotational energy via spin-down processes to accelerate and heat the ejecta. The progenitor(s) of these magnetar-driven SESNe, and the origin of the considerable angular momentum (AM) in the cores of massive stars needed to produce such fast-spinning magnetars upon core collapse, are still under debate. Popular scenarios proposed in the literature cannot simultaneously explain their event rate density, SN and magnetar parameters, and the observed metallicity. Here, we perform a detailed binary evolution simulation demonstrating that tidally spun-up helium stars with an efficient AM transport mechanism in close binaries can form fast-spinning magnetars at the end of their lives, naturally reproducing the universal energy-mass correlation of these magnetar-driven SESNe. Our models are consistent with the event rate densities, host environments, ejecta masses, and energetics of these different kinds of magnetar-driven SESNe, supporting the view that the isolated common-envelope formation channel could be a major common origin of magnetar-driven SESNe. The remnant compact binary systems of magnetar-driven SESNe are progenitors of some galactic systems and gravitational-wave transients.
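For scale, the rotational energy reservoir that a newborn magnetar can tap through spin-down follows from the standard rotational-energy formula. Taking a typical fiducial neutron-star moment of inertia $I \sim 10^{45}\,\mathrm{g\,cm^2}$ (an assumed textbook value, not a number from this paper) and an initial spin period $P$, with $\Omega = 2\pi/P$:

$$E_{\rm rot} = \frac{1}{2} I \Omega^2 = \frac{2\pi^2 I}{P^2} \approx 2\times10^{52}\,\mathrm{erg}\left(\frac{I}{10^{45}\,\mathrm{g\,cm^2}}\right)\left(\frac{P}{1\,\mathrm{ms}}\right)^{-2}$$

A millisecond magnetar therefore carries of order $10^{52}$ erg, and the reservoir falls as $P^{-2}$, which is why a fast initial spin, and hence a large core angular momentum, matters for powering the ejecta.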
Submitted 16 January, 2023;
originally announced January 2023.
-
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Authors:
Bo Fang,
Wenhao Wu,
Chang Liu,
Yu Zhou,
Yuxin Song,
Wei** Wang,
Xiangbo Shu,
Xiangyang Ji,
**gdong Wang
Abstract:
With the explosive growth of web videos and emerging large-scale vision-language pre-training models, e.g., CLIP, retrieving videos of interest with text instructions has attracted increasing attention. A common practice is to project text-video pairs into the same embedding space and craft cross-modal interactions with certain entities at specific granularities for semantic correspondence. Unfortunately, the intrinsic uncertainties of optimal entity combinations at appropriate granularities for cross-modal queries are understudied, which is especially critical for modalities with hierarchical semantics, e.g., video, text, etc. In this paper, we propose an Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models each look-up as a distribution matching procedure. Concretely, we add learnable tokens in the encoders to adaptively aggregate multi-grained semantics for flexible high-level reasoning. In the refined embedding space, we represent text-video pairs as probabilistic distributions from which prototypes are sampled for matching evaluation. Comprehensive experiments on four benchmarks justify the superiority of our UATVR, which achieves new state-of-the-art results on MSR-VTT (50.8%), VATEX (64.5%), MSVD (49.7%), and DiDeMo (45.8%). The code is available at https://github.com/bofang98/UATVR.
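The distribution-matching idea can be sketched as follows, under assumptions that go beyond the abstract: each modality is represented as a diagonal Gaussian (a mean and a per-dimension standard deviation), `k` prototypes are drawn from each by reparameterized sampling, and a pair is scored by the average cosine similarity over all prototype pairs. The aggregation rule, `k`, and the function names are hypothetical; the paper's actual matching function may differ.

```python
import math
import random


def sample_prototypes(
    mean: list[float], std: list[float], k: int, rng: random.Random
) -> list[list[float]]:
    """Draw k prototypes from a diagonal Gaussian as mean + std * eps, eps ~ N(0, 1)."""
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)] for _ in range(k)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def distribution_match_score(
    text_mean: list[float],
    text_std: list[float],
    video_mean: list[float],
    video_std: list[float],
    k: int,
    rng: random.Random,
) -> float:
    """Average cosine similarity over all k x k text/video prototype pairs."""
    text_protos = sample_prototypes(text_mean, text_std, k, rng)
    video_protos = sample_prototypes(video_mean, video_std, k, rng)
    total = sum(cosine(t, v) for t in text_protos for v in video_protos)
    return total / (k * k)


# Usage: one text query scored against two candidate videos.
rng = random.Random(0)
text = ([1.0, 0.0, 0.0], [0.01, 0.01, 0.01])
video_a = ([1.0, 0.0, 0.0], [0.01, 0.01, 0.01])
video_b = ([0.0, 1.0, 0.0], [0.01, 0.01, 0.01])
score_a = distribution_match_score(*text, *video_a, k=7, rng=rng)
score_b = distribution_match_score(*text, *video_b, k=7, rng=rng)
```

Comparing several sampled prototypes instead of two point embeddings lets the standard deviation express uncertainty: a wide distribution spreads its prototypes and lowers the average similarity.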
Submitted 18 August, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
Merging binary black holes formed through double-core evolution
Authors:
Y. Qin,
R. -C. Hu,
G. Meynet,
Y. Z. Wang,
J. -P. Zhu,
H. F. Song,
X. W. Shu,
S. C. Wu
Abstract:
To date, various formation channels of merging events have been extensively explored following the detection of nearly 100 double black hole (BH) merger events reported by the LIGO-Virgo-KAGRA (LVK) Collaboration. Here we systematically investigate an alternative formation scenario, i.e., binary BHs (BBHs) formed through double helium stars (hereafter the double-core evolution channel). In this scenario, the two helium stars (He-rich stars) could be the outcome of the classical isolated binary evolution scenario with or without a common-envelope phase (i.e., the CE channel and the stable mass transfer channel), or alternatively of massive close binaries evolving chemically homogeneously (i.e., the CHE channel). We perform detailed stellar structure and binary evolution calculations that take into account internal differential rotation and mass loss of He-rich stars, as well as tidal interactions in binaries. For double He-rich stars with equal masses in binaries, we find that tides start to act at the Zero Age Helium Main Sequence (ZAHeMS: the time when a He-rich star starts to burn helium in the core, analogous to the ZAMS for core hydrogen burning) for initial orbital periods not longer than 1.0 day, depending on the initial metallicity. Besides the stellar mass-loss rate and tidal interactions in binaries, we find that the role of the angular momentum transport efficiency in determining the resulting BH spins becomes stronger for BH progenitors originating from higher-metallicity environments. We highlight that the double-core evolution scenario does not always produce fast-spinning BBHs, and we compare the properties of the BBHs reported by the LVK with our modeling.
Submitted 12 January, 2023;
originally announced January 2023.
-
Visual observation of optical Floquet-Bloch oscillations
Authors:
Zhen Zhang,
Yuan Li,
Xiankai Sun,
Xuewen Shu
Abstract:
Bloch oscillations, an important transport phenomenon, have been studied extensively in static systems but remain largely unexplored in Floquet systems. Here, we propose a new type of Bloch oscillation, namely "Floquet-Bloch oscillations," which are rescaled Bloch oscillations whose period is extended to the least common multiple of the modulation and Bloch periods. We report the first visual observation of such Floquet-Bloch oscillations in femtosecond-laser-written waveguide arrays using waveguide fluorescence microscopy. These Floquet-Bloch oscillations exhibit exotic properties, such as a fractal spectrum and fractional Floquet tunneling. This new transport mechanism offers an intriguing method of wave manipulation, which has significant applications in coherent quantum transport.
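The period described above, the least common multiple of the modulation and Bloch periods, is well defined when the two periods are commensurate. A minimal sketch, assuming both periods are given as exact rationals in the same units, uses lcm(p/q, r/s) = lcm(p, r) / gcd(q, s) for positive fractions in lowest terms.

```python
from fractions import Fraction
from math import gcd


def rational_lcm(a: Fraction, b: Fraction) -> Fraction:
    """Smallest positive value that is an integer multiple of both a and b.

    Assumes a, b > 0 and commensurate (rational ratio); Fraction keeps
    numerator/denominator in lowest terms automatically.
    """
    num = a.numerator * b.numerator // gcd(a.numerator, b.numerator)  # lcm of numerators
    den = gcd(a.denominator, b.denominator)                           # gcd of denominators
    return Fraction(num, den)
```

For example, a modulation period of 3/2 and a Bloch period of 2 (arbitrary units) give a combined period of 6, i.e. four modulation cycles and three Bloch cycles.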
Submitted 29 December, 2022;
originally announced December 2022.
-
Dilation-Erosion for Single-Frame Supervised Temporal Action Localization
Authors:
Bin Wang,
Yan Song,
Fanming Wang,
Yang Zhao,
Xiangbo Shu,
Yan Rui
Abstract:
To balance annotation labor and supervision granularity, single-frame annotation has been introduced in temporal action localization. It provides a rough temporal location for an action but implicitly overstates the supervision from the annotated frame during training, leading to confusion between actions and backgrounds, i.e., action incompleteness and background false positives. To tackle these two challenges, we present the Snippet Classification model and the Dilation-Erosion module. In the Dilation-Erosion module, we expand the potential action segments with a loose criterion to alleviate the problem of action incompleteness and then remove the background from the potential action segments to alleviate the problem of background false positives. Relying on the single-frame annotation and the output of the snippet classification, the Dilation-Erosion module mines pseudo snippet-level ground truth, hard backgrounds, and evident backgrounds, which in turn further train the Snippet Classification model, forming a cyclic dependency. Furthermore, we propose a new embedding loss to aggregate the features of action instances with the same label and separate the features of actions from backgrounds. Experiments on THUMOS14 and ActivityNet 1.2 validate the effectiveness of the proposed method. Code has been made publicly available (https://github.com/LingJun123/single-frame-TAL).
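The dilate-then-erode idea can be sketched on a 1-D sequence of per-snippet action scores. This is a simplified illustration, not the authors' module: the thresholds `loose` and `strict` are hypothetical, and treating eroded snippets as "hard" backgrounds and snippets outside the dilated segment as "evident" backgrounds is an assumption made for the example.

```python
def dilate_erode(scores, anchor, loose=0.3, strict=0.5):
    """Grow a segment around the annotated snippet, then trim weak boundaries.

    Returns ((start, end), hard_backgrounds, evident_backgrounds).
    """
    n = len(scores)
    # Dilation: expand from the annotated snippet while scores pass the loose threshold.
    start = end = anchor
    while start > 0 and scores[start - 1] >= loose:
        start -= 1
    while end < n - 1 and scores[end + 1] >= loose:
        end += 1
    dil_start, dil_end = start, end
    # Erosion: drop boundary snippets below the strict threshold, never past the anchor.
    while start < anchor and scores[start] < strict:
        start += 1
    while end > anchor and scores[end] < strict:
        end -= 1
    hard = [i for i in range(dil_start, dil_end + 1) if not start <= i <= end]
    evident = [i for i in range(n) if not dil_start <= i <= dil_end]
    return (start, end), hard, evident
```

With scores `[0.1, 0.35, 0.6, 0.9, 0.7, 0.4, 0.2]` and the annotation at index 3, dilation reaches snippets 1 to 5, erosion keeps 2 to 4, snippets 1 and 5 become hard backgrounds, and 0 and 6 are evident backgrounds.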
Submitted 12 December, 2022;
originally announced December 2022.
-
Learning Single Image Defocus Deblurring with Misaligned Training Pairs
Authors:
Yu Li,
Dongwei Ren,
Xinya Shu,
Wangmeng Zuo
Abstract:
Because they adopt the popular pixel-wise loss, existing methods for defocus deblurring rely heavily on well-aligned training image pairs. Although training pairs of ground-truth and blurry images are carefully collected, e.g., in the DPDD dataset, misalignment between training pairs is inevitable, so existing methods may suffer from deformation artifacts. In this paper, we propose a joint deblurring and reblurring learning (JDRL) framework for single image defocus deblurring with misaligned training pairs. JDRL consists of a deblurring module and a spatially variant reblurring module, through which the deblurred result can be adaptively supervised by the ground-truth image to recover sharp textures while maintaining spatial consistency with the blurry image. First, in the deblurring module, a bi-directional optical flow-based deformation is introduced to tolerate spatial misalignment between the deblurred and ground-truth images. Second, in the reblurring module, the deblurred result is reblurred to be spatially aligned with the blurry image by predicting a set of isotropic blur kernels and weighting maps. Moreover, we establish a new single image defocus deblurring (SDD) dataset, which further validates JDRL and should also benefit future research. JDRL can be applied to boost defocus deblurring networks in terms of both quantitative metrics and visual quality on the DPDD, RealDOF, and SDD datasets.
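Reblurring with a set of isotropic kernels and per-pixel weighting maps amounts to blurring the image with each kernel and blending the results pixel by pixel. The sketch below assumes Gaussian kernels as the isotropic family and takes the weighting maps as given (in JDRL both are predicted by a network); the helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()  # normalize so a constant image is preserved


def conv2d_same(img, k):
    # Sliding-window correlation with edge padding; equals convolution for symmetric kernels.
    r = k.shape[0] // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def reblur(sharp, sigmas, weight_maps, radius=3):
    """Spatially variant blur: per-pixel weighted sum of isotropic blurs.

    weight_maps: array of shape (len(sigmas), H, W), assumed to sum to 1 over axis 0.
    """
    blurred = np.stack([conv2d_same(sharp, gaussian_kernel(s, radius)) for s in sigmas])
    return (weight_maps * blurred).sum(axis=0)
```

Because the weights vary per pixel, the effective blur varies across the image even though every individual kernel is isotropic.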
Submitted 29 November, 2022; v1 submitted 26 November, 2022;
originally announced November 2022.
-
AdaTriplet-RA: Domain Matching via Adaptive Triplet and Reinforced Attention for Unsupervised Domain Adaptation
Authors:
Xinyao Shu,
Shiyang Yan,
Zhenyu Lu,
Xinshao Wang,
Yuan Xie
Abstract:
Unsupervised domain adaptation (UDA) is a transfer learning task in which the data and annotations of the source domain are available, but only unlabeled target data can be accessed during training. Most previous methods try to minimise the domain gap by performing distribution alignment between the source and target domains, which has a notable limitation: it operates at the domain level and neglects sample-level differences. To mitigate this weakness, we propose to improve unsupervised domain adaptation with an inter-domain sample matching scheme. We apply the widely used and robust triplet loss to match inter-domain samples. To reduce the catastrophic effect of inaccurate pseudo-labels generated during training, we propose a novel uncertainty measurement method to select reliable pseudo-labels automatically and refine them progressively. We use the Gumbel-Softmax discrete relaxation technique to realise an adaptive top-k scheme for this selection. In addition, to enable global ranking optimisation within one batch for domain matching, the whole model is optimised via a novel reinforced attention mechanism with supervision from the policy gradient algorithm, using Average Precision (AP) as the reward. Our model (termed AdaTriplet-RA) achieves state-of-the-art results on several public benchmark datasets, and its effectiveness is validated via comprehensive ablation studies. Our method improves the accuracy of the baseline by 9.7% (ResNet-101) and 6.2% (ResNet-50) on the VisDA dataset and 4.22% (ResNet-50) on the DomainNet dataset. The source code is publicly available at https://github.com/shuxy0120/AdaTriplet-RA.
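For reference, the standard triplet margin loss that the inter-domain matching builds on is max(0, d(a, p) - d(a, n) + margin). In a UDA setting one might, for instance, take a pseudo-labelled target sample as the anchor, a same-class source sample as the positive, and a different-class source sample as the negative; that pairing, the Euclidean distance, and the margin value are assumptions for this sketch, not details of AdaTriplet-RA.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss over a batch of (anchor, positive, negative) embeddings."""
    d_ap = np.linalg.norm(anchor - positive, axis=-1)  # distance to same-class sample
    d_an = np.linalg.norm(anchor - negative, axis=-1)  # distance to different-class sample
    return float(np.maximum(0.0, d_ap - d_an + margin).mean())
```

The loss is zero once every negative is at least `margin` farther from the anchor than the positive.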
Submitted 16 November, 2022;
originally announced November 2022.
-
Searching for candidates of coalescing binary black holes formed through chemically homogeneous evolution in GWTC-3
Authors:
Ying Qin,
Yuan-Zhu Wang,
Simone S. Bavera,
Shichao Wu,
Georges Meynet,
Yi-Ying Wang,
Rui-Chong Hu,
**-** Zhu,
Dong-Hong Wu,
Xin-Wen Shu,
Fang-Kun Peng,
Han-Feng Song,
Da-Ming Wei
Abstract:
The LIGO, Virgo, and KAGRA (LVK) collaboration has announced 90 coalescing binary black holes (BBHs) with $p_{\rm astro} > 50\%$ to date; however, their formation channels remain an open scientific question. Given the various properties of BBHs (BH component masses and individual spins) inferred by the LVK using default priors, independent groups have been trying to explain the formation of the BBHs with different formation channels. Of all formation scenarios, the chemically homogeneous evolution (CHE) channel stands out with distinguishing features, namely, nearly equal component masses and preferentially high individual spins aligned with the orbital angular momentum. We perform Bayesian inference on the BBH events officially reported in GWTC-3 with astrophysically predicted priors representing different formation channels of isolated binary evolution (CEE: common-envelope evolution channel; CHE; SMT: stable mass transfer). Given the assumed models, we report strong evidence that GW190517_055101 most likely formed through the CHE channel. Assuming the BBH events in the subsample all formed through one of the isolated binary evolution channels, we obtain lower limits on the local merger rate density of these channels of $11.45 ~\mathrm{Gpc^{-3}~yr^{-1}}$ (CEE), $0.18 ~\mathrm{Gpc^{-3}~yr^{-1}}$ (CHE), and $0.63 ~\mathrm{Gpc^{-3}~yr^{-1}}$ (SMT) at the $90\%$ credible level.
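One common way to compare astrophysically motivated priors without rerunning parameter estimation is to reweight posterior samples obtained under the default prior: Z_new / Z_default is approximately the mean of pi_new(theta) / pi_default(theta) over those samples. The sketch below shows that generic technique, not necessarily the pipeline used in this work; it assumes the default prior is nonzero wherever a new prior is, and the 1-D samples and prior densities are hypothetical.

```python
import numpy as np


def evidence_ratio(samples, new_prior_pdf, default_prior_pdf):
    """Monte Carlo estimate of Z_new / Z_default from default-prior posterior samples."""
    samples = np.asarray(samples, dtype=float)
    weights = new_prior_pdf(samples) / default_prior_pdf(samples)
    return float(weights.mean())


def bayes_factor(samples, prior_a, prior_b, default_prior_pdf):
    """Bayes factor of prior (channel) A over prior B."""
    return (evidence_ratio(samples, prior_a, default_prior_pdf)
            / evidence_ratio(samples, prior_b, default_prior_pdf))
```

With few samples, or when a channel prior concentrates where the default posterior has little support, the estimate becomes noisy, so effective sample size should be checked in practice.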
Submitted 23 December, 2022; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Online Nash Welfare Maximization Without Predictions
Authors:
Zhiyi Huang,
Minming Li,
Xinkai Shu,
Tianze Wei
Abstract:
The maximization of Nash welfare, which equals the geometric mean of agents' utilities, is widely studied because it balances efficiency and fairness in resource allocation problems. Banerjee, Gkatzelis, Gorokh, and ** (2022) recently introduced the model of online Nash welfare maximization for $T$ divisible items and $N$ agents with additive utilities, given predictions of each agent's utility for receiving all items. They gave online algorithms whose competitive ratios are logarithmic. We initiate the study of online Nash welfare maximization without predictions, assuming either that the agents' utilities for receiving all items differ by a bounded ratio, or that their utilities for the Nash welfare maximizing allocation differ by a bounded ratio. We design online algorithms whose competitive ratios depend only on the logarithms of the aforementioned ratios of agents' utilities and the number of agents.
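The objective itself is easy to state in code. A minimal sketch, assuming utilities are given as a matrix `utilities[i][t]` (agent i's value for all of item t) and a fractional allocation `allocation[i][t]`; the online algorithms themselves are not reproduced here.

```python
import math


def nash_welfare(utilities, allocation):
    """Geometric mean of agents' additive utilities for a fractional allocation.

    utilities[i][t]: agent i's utility for receiving all of divisible item t.
    allocation[i][t]: fraction of item t given to agent i.
    """
    totals = [sum(u * x for u, x in zip(u_row, x_row))
              for u_row, x_row in zip(utilities, allocation)]
    if any(v <= 0 for v in totals):
        return 0.0  # any agent with zero utility makes the product zero
    # Geometric mean computed in log space for numerical stability.
    return math.exp(sum(math.log(v) for v in totals) / len(totals))
```

With utilities `[[2, 0], [0, 8]]`, giving each agent its preferred item yields utilities 2 and 8 and Nash welfare 4; splitting both items evenly yields utilities 1 and 4 and Nash welfare 2.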
Submitted 10 February, 2023; v1 submitted 6 November, 2022;
originally announced November 2022.
-
NAS-PRNet: Neural Architecture Search generated Phase Retrieval Net for Off-axis Quantitative Phase Imaging
Authors:
Xin Shu,
Mengxuan Niu,
Yi Zhang,
Renjie Zhou
Abstract:
Single neural networks have achieved simultaneous phase retrieval with aberration compensation and phase unwrapping in off-axis Quantitative Phase Imaging (QPI). However, when designing phase retrieval network architectures, the trade-off between computation latency and accuracy has been largely neglected. Here, we propose the Neural Architecture Search (NAS) generated Phase Retrieval Net (NAS-PRNet), an encoder-decoder style neural network automatically found from a large neural network architecture search space. The NAS scheme in NAS-PRNet is modified from SparseMask, in which learning the skip connections between the encoder and the decoder is formulated as a differentiable NAS problem, and gradient descent is applied to efficiently search for the optimal skip connections. Using MobileNet-v2 as the encoder and a synthesized loss that incorporates phase reconstruction and network sparsity losses, NAS-PRNet achieves fast and accurate phase retrieval of biological cells. When tested on a cell dataset, NAS-PRNet achieved a Peak Signal-to-Noise Ratio (PSNR) of 36.1 dB, outperforming the widely used U-Net and the original SparseMask-generated neural network. Notably, the computation latency of NAS-PRNet is only 31 ms, 12 times lower than that of U-Net. Moreover, the connectivity scheme in NAS-PRNet, identified from one off-axis QPI system, generalizes well to another system with different fringe patterns.
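The PSNR figure quoted above follows the usual definition PSNR = 10 log10(R^2 / MSE), where R is the data range. A minimal sketch; how the phase maps are normalized (and thus which `data_range` was used for the 36.1 dB result) is not stated here, so `data_range` is an explicit assumption the caller must supply.

```python
import numpy as np


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the assumed peak-to-peak value."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For instance, a uniform error of 0.1 on data with range 1 gives an MSE of 0.01 and a PSNR of 20 dB.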
Submitted 25 October, 2022;
originally announced October 2022.
-
How the reversible change of contact network affects the epidemic spreading
Authors:
Xincheng Shu,
Zhongyuan Ruan
Abstract:
The mobility patterns of individuals in China during the early outbreak of the COVID-19 pandemic exhibited reversible changes: in many regions, mobility first decreased significantly and later recovered. Based on this observation, we study the classical SIR model on a particular type of time-varying network in which links undergo a freeze-recovery process. We first focus on an isolated network and find that the recovery mechanism can lead to the resurgence of an epidemic. The influence of link freezing on epidemic dynamics is subtle. In particular, we show that there is an optimal freezing rate for links that corresponds to the lowest epidemic prevalence. This result challenges the conventional belief that stricter prevention measures (corresponding to a larger freezing rate) always have a better inhibitory effect on epidemic spreading. We further investigate an open system in which a small fraction of nodes in the network may acquire the disease from the "environment" (infected nodes outside the network). In this case, a second wave can appear even after the number of infected nodes has declined to zero, which cannot be explained by the isolated network model.
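A freeze-recovery network can be simulated with a simple discrete-time SIR loop. This is a hypothetical sketch, not the authors' model: per-step probabilities stand in for the rates, each active link freezes with probability `freeze`, each frozen link recovers with probability `recover`, and transmission happens only over active links.

```python
import random


def simulate_sir_freeze_recovery(n, edges, beta, gamma, freeze, recover,
                                 seeds, steps, seed=0):
    """Return a list of (S, I, R) counts, one tuple per time step."""
    rng = random.Random(seed)
    state = ["S"] * n
    for s in seeds:
        state[s] = "I"
    active = {edge: True for edge in edges}
    history = []
    for _ in range(steps):
        # 1) Link dynamics: active links may freeze, frozen links may recover.
        for edge in edges:
            if active[edge]:
                if rng.random() < freeze:
                    active[edge] = False
            elif rng.random() < recover:
                active[edge] = True
        # 2) Transmission over active links only (synchronous update).
        new_state = state[:]
        for u, v in edges:
            if not active[(u, v)]:
                continue
            for src, dst in ((u, v), (v, u)):
                if state[src] == "I" and state[dst] == "S" and rng.random() < beta:
                    new_state[dst] = "I"
        # 3) Recovery of nodes that were infected at the start of the step.
        for i in range(n):
            if state[i] == "I" and rng.random() < gamma:
                new_state[i] = "R"
        state = new_state
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history
```

Sweeping `freeze` at fixed `recover` and recording the final number of recovered nodes is one way to look for the non-monotonic dependence on the freezing rate described above.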
Submitted 23 October, 2022;
originally announced October 2022.
-
A possible 250-second X-ray quasi-periodicity in the fast blue optical transient AT2018cow
Authors:
Wenjie Zhang,
Xinwen Shu,
**-Hong Chen,
Luming Sun,
Rong-Feng Shen,
Lian Tao,
Chun Chen,
Ning Jiang,
LiMing Dou,
Ying Qin,
Xue-Guang Zhang,
Liang Zhang,
**lu Qu,
Tinggui Wang
Abstract:
Fast blue optical transients (FBOTs) are a new population of extragalactic transients of unclear physical origin. A variety of mechanisms have been proposed, including failed supernova explosions, shock interaction with a dense medium, young magnetars, accretion onto a compact object, and stellar tidal disruption events, but none is conclusive. Here we report the discovery of a possible X-ray quasi-periodic oscillation (QPO) signal with a period of $\sim$250 seconds (at a significance level of 99.76%) in the brightest FBOT, AT2018cow, through analysis of XMM-Newton/PN data. The signal is independently detected at the same frequency in the average power density spectrum of data taken with the Swift telescope, with observations covering 6 to 37 days after the optical discovery, though at a lower significance level (94.26%). This suggests that the QPO frequency may be stable over at least $1.1\times10^{4}$ cycles. Assuming the $\sim$250 second QPO to be a scaled-down analogue of those typically seen in stellar-mass black holes, a black hole mass of $\sim10^{3}-10^{5}$ solar masses could be inferred. The overall X-ray luminosity evolution could be modeled with the tidal disruption of a star by a black hole of $\sim10^4$ solar masses, providing a viable mechanism to produce AT2018cow. Our findings suggest that other bright FBOTs may also harbor intermediate-mass black holes.
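The quoted lower bound on the number of cycles can be checked with simple arithmetic, assuming it refers to the Swift window from day 6 to day 37 after discovery: 31 days is 2,678,400 s, or about 10,700 periods of 250 s.

```python
SECONDS_PER_DAY = 86_400


def qpo_cycles(start_day, end_day, period_s):
    """Number of oscillation periods that fit in an observing window."""
    return (end_day - start_day) * SECONDS_PER_DAY / period_s


cycles = qpo_cycles(6, 37, 250)  # about 1.07e4, consistent with "at least 1.1e4" after rounding
```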
Submitted 9 October, 2022;
originally announced October 2022.
-
The hidden side of cosmic star formation at z > 3: Bridging optically-dark and Lyman break galaxies with GOODS-ALMA
Authors:
Mengyuan Xiao,
David Elbaz,
Carlos Gómez-Guijarro,
Lucas Leroy,
Longji Bing,
Emanuele Daddi,
Benjamin Magnelli,
Maximilien Franco,
Luwenjia Zhou,
Mark Dickinson,
Tao Wang,
Wiphu Rujopakarn,
Georgios E. Magdis,
Ezequiel Treister,
Hanae Inami,
Ricardo Demarco,
Mark T. Sargent,
Xinwen Shu,
Jeyhan S. Kartaltepe,
David M. Alexander,
Matthieu Béthermin,
Frederic Bournaud,
Laure Ciesla,
Henry C. Ferguson,
Steven L. Finkelstein
, et al. (15 additional authors not shown)
Abstract:
Our current understanding of the cosmic star formation history at z>3 is primarily based on UV-selected galaxies (i.e., LBGs). Recent studies of H-dropouts have revealed that we may be missing a large proportion of the star formation taking place in massive galaxies at z>3. In this work, we extend the H-dropout criterion to lower masses to select optically dark/faint galaxies (OFGs), in order to complete the census between LBGs and H-dropouts. Our criterion (H > 26.5 mag and [4.5] < 25 mag), combined with a de-blending technique, is designed to select not only extremely dust-obscured massive galaxies but also normal star-forming galaxies. In total, we identified 27 OFGs at z_phot > 3 (z_med = 4.1) in the GOODS-ALMA field, covering a wide range of stellar masses with log($M_{\star}$/$M_{\odot}$) = 9.4-11.1. We find that up to 75% of the OFGs with log($M_{\star}$/$M_{\odot}$) = 9.5-10.5 were missed by previous LBG and H-dropout selection techniques. Stacking analyses show that the OFGs exhibit shorter gas depletion timescales, slightly lower gas fractions, and lower dust temperatures than typical star-forming galaxies. Their SFR_tot (SFR_IR + SFR_UV) is much larger than SFR_UVcorr (corrected for dust extinction), with SFR_tot/SFR_UVcorr = $8\pm1$, suggesting the presence of hidden dust regions in the OFGs that absorb all UV photons. The average dust size measured by a circular Gaussian model fit is R_e(1.13 mm) = 1.01$\pm$0.05 kpc. We find that the cosmic SFRD at z>3 contributed by massive OFGs is at least two orders of magnitude higher than that contributed by equivalently massive LBGs. Finally, we calculate the combined contribution of OFGs and LBGs to the cosmic SFRD at z=4-5 to be $4\times10^{-2}~M_{\odot}~\mathrm{yr^{-1}~Mpc^{-3}}$, which is about 0.15 dex (43%) higher than the SFRD derived from UV-selected samples alone at the same redshift.
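The colour-magnitude cut above can be written as a simple predicate over catalogue magnitudes. This sketch applies only the stated thresholds (H > 26.5 mag, IRAC [4.5] < 25 mag, and optionally z_phot > 3); representing an H-band non-detection as `math.inf` is an assumption, and the de-blending step is not modelled.

```python
import math


def is_ofg(h_mag, irac2_mag, z_phot=None):
    """Optically dark/faint galaxy cut: H > 26.5 mag and [4.5] < 25 mag.

    h_mag: H-band magnitude (math.inf for a non-detection, by assumption).
    irac2_mag: IRAC channel 2 (4.5 micron) magnitude.
    z_phot: optional photometric redshift; if given, require z_phot > 3.
    """
    if not (h_mag > 26.5 and irac2_mag < 25.0):
        return False
    return z_phot is None or z_phot > 3.0
```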
Submitted 10 February, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
-
MetaGlyph: Automatic Generation of Metaphoric Glyph-based Visualization
Authors:
Lu Ying,
Xinhuan Shu,
Dazhen Deng,
Yuchen Yang,
Tan Tang,
Lingyun Yu,
Yingcai Wu
Abstract:
Glyph-based visualization achieves impressive graphic design when combined with comprehensive visual metaphors, which help audiences effectively grasp the conveyed information by revealing data semantics. However, creating such metaphoric glyph-based visualization (MGV) is not an easy task, as it requires not only a deep understanding of the data but also professional design skills. This paper proposes MetaGlyph, an automatic system for generating MGVs from a spreadsheet. To develop MetaGlyph, we first conduct a qualitative analysis to understand the design of current MGVs from the perspectives of metaphor embodiment and glyph design. Based on the results, we introduce a novel framework for generating MGVs through metaphoric image selection and MGV construction. Specifically, MetaGlyph automatically selects metaphors with corresponding images from online resources based on the input data semantics. We then integrate a Monte Carlo tree search algorithm that explores the design of an MGV by associating visual elements with data dimensions, given data importance, semantic relevance, and glyph non-overlap. The system also provides editing feedback that allows users to customize the MGVs according to their design preferences. We demonstrate the use of MetaGlyph through a set of examples and one usage scenario, and validate its effectiveness through a series of expert interviews.
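To make the search step concrete, here is a minimal Monte Carlo tree search (UCT) over one-to-one assignments of data dimensions to visual elements. Everything specific is hypothetical: the reward is just importance-weighted semantic relevance, the one-to-one constraint stands in for glyph non-overlap, and the search returns the best complete assignment seen during rollouts. It is not MetaGlyph's actual objective or implementation.

```python
import math
import random


class Node:
    """Tree node: assigned[d] is the visual element chosen for data dimension d."""

    def __init__(self, assigned, parent=None):
        self.assigned = assigned
        self.parent = parent
        self.children = {}  # element index -> Node
        self.visits = 0
        self.value = 0.0


def score(assign, importance, relevance):
    # Hypothetical reward: importance-weighted relevance of each dimension-element pair.
    return sum(importance[d] * relevance[d][e] for d, e in enumerate(assign))


def ucb(child, parent_visits, c):
    # UCT: mean reward plus an exploration bonus.
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_assign(importance, relevance, iters=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    n = len(importance)
    root = Node(())
    best, best_score = None, -math.inf
    for _ in range(iters):
        node = root
        # Selection / expansion: descend by UCT until an untried move is found.
        while len(node.assigned) < n:
            untried = [e for e in range(n)
                       if e not in node.assigned and e not in node.children]
            if untried:
                e = rng.choice(untried)
                node.children[e] = Node(node.assigned + (e,), node)
                node = node.children[e]
                break
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(ch, parent.visits, c))
        # Rollout: complete the partial assignment with a random permutation.
        rest = [e for e in range(n) if e not in node.assigned]
        rng.shuffle(rest)
        assign = node.assigned + tuple(rest)
        reward = score(assign, importance, relevance)
        if reward > best_score:
            best, best_score = assign, reward
        # Backpropagation of the rollout reward up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best, best_score
```

A real system would fold overlap penalties and layout geometry into `score`; MCTS is attractive here because the assignment space grows factorially while rollouts stay cheap.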
Submitted 13 September, 2022;
originally announced September 2022.
-
X-ray view of a merging supermassive black hole binary candidate SDSSJ1430+2303: Results from the first ~200 days of observations
Authors:
Liming Dou,
Ning Jiang,
Tinggui Wang,
Xinwen Shu,
Huan Yang,
Zhen Pan,
Jiazheng Zhu,
Tao An,
Zhen-Ya Zheng,
Yanli Ai
Abstract:
Recently we discovered an unprecedented supermassive black hole binary (SMBHB) candidate in the nearby Seyfert galaxy SDSS J1430+2303, which is predicted to merge within three years. X-ray spectroscopy may provide unique kinematic evidence for the last inspiraling stage, when the two black holes are too close for each to retain an individual broad-line region. We aim to confirm this unique SMBHB merger event and to understand the associated high-energy processes from a comprehensive X-ray view. We observed SDSS J1430+2303 with XMM-Newton, NuSTAR, Chandra, and Swift over the first ~200 days since its discovery. X-ray variability of up to a factor of 7 has been detected on a timescale of a few days. The broadband 0.2-70 keV spectrum can be well fitted with a model consisting of a power law and a relativistic reflection covered by a warm absorber. The properties of the warm absorber changed dramatically between the two XMM-Newton observations separated by only 19 days, for example, with the line-of-sight velocity decreasing from ~0.2c to ~0.02c; this can be naturally understood in the context of the SMBHB, although the clumpy wind scenario cannot be completely excluded. Broad Fe Kα emission has been robustly detected, though its velocity shift or profile change is not yet measurable. Longer X-ray observations are strongly encouraged to detect the expected orbital motion of the binary.
Submitted 31 August, 2022; v1 submitted 25 August, 2022;
originally announced August 2022.
-
VLMAE: Vision-Language Masked Autoencoder
Authors:
Sunan He,
Taian Guo,
Tao Dai,
Ruizhi Qiao,
Chen Wu,
Xiujun Shu,
Bo Ren
Abstract:
Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data. However, we observe that most existing VLP methods focus on modeling the interactions between image and text features while neglecting the information disparity between image and text, thus suffering from focal bias. To address this problem, we propose a vision-language masked autoencoder framework (VLMAE). VLMAE employs visual generative learning, enabling the model to acquire fine-grained and unbiased features. Unlike previous works, VLMAE attends to almost all critical patches in an image, providing a more comprehensive understanding. Extensive experiments demonstrate that VLMAE achieves better performance on various vision-language downstream tasks, including visual question answering, image-text retrieval, and visual grounding, even with up to 20% pre-training speedup.
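Masked autoencoders typically hide a random subset of image patches and train the model to reconstruct them. The sketch below shows the common MAE-style random masking (shuffle by random noise, keep the first fraction); whether VLMAE uses exactly this procedure or this mask ratio is an assumption.

```python
import numpy as np


def random_masking(patches, mask_ratio, rng):
    """MAE-style masking of a (num_patches, dim) array.

    Returns the visible patches (in original order) and a boolean mask
    where True marks a masked patch.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    noise = rng.random(n)                         # one random score per patch
    ids_keep = np.sort(np.argsort(noise)[:n_keep])  # lowest-noise patches stay visible
    mask = np.ones(n, dtype=bool)
    mask[ids_keep] = False
    return patches[ids_keep], mask
```

With 16 patches and a mask ratio of 0.75, four patches remain visible and twelve must be reconstructed.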
Submitted 19 August, 2022;
originally announced August 2022.
-
See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Authors:
Xiujun Shu,
Wei Wen,
Haoqian Wu,
Keyu Chen,
Yiran Song,
Ruizhi Qiao,
Bo Ren,
Xiao Wang
Abstract:
Text-based person retrieval aims to find the query person based on a textual description. The key is to learn a common latent space mapping between the visual and textual modalities. To achieve this goal, existing works employ segmentation to obtain explicit cross-modal alignments or utilize attention to explore salient alignments. These methods have two shortcomings: 1) labeling cross-modal alignments is time-consuming; 2) attention methods can explore salient cross-modal alignments but may ignore some subtle and valuable pairs. To relieve these issues, we introduce an Implicit Visual-Textual (IVT) framework for text-based person retrieval. Unlike previous models, IVT uses a single network to learn representations for both modalities, which contributes to visual-textual interaction. To explore fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM). The MLA module explores finer matching at the sentence, phrase, and word levels, while the BMM module aims to mine more semantic alignments between visual and textual modalities. Extensive experiments are carried out to evaluate the proposed IVT on public datasets, i.e., CUHK-PEDES, RSTPReID, and ICFG-PEDES. Even without explicit body part alignment, our approach still achieves state-of-the-art performance. Code is available at: https://github.com/TencentYoutuResearch/PersonRetrieval-IVT.
Submitted 25 August, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Transient radio emission from low-redshift galaxies at z<0.3 revealed by VLASS and FIRST surveys
Authors:
Fabao Zhang,
Xinwen Shu,
Luming Sun,
Lei Yang,
Ning Jiang,
Liming Dou,
Jianguo Wang,
Tinggui Wang
Abstract:
We present the discovery of a sample of 18 low-redshift (z<0.3) galaxies with transient nuclear radio emission. These galaxies were undetected or only weakly detected in the Faint Images of the Radio Sky at Twenty cm (FIRST) survey performed in 1993-2009, but brightened significantly in radio flux (by a factor of >5) in the epoch I (2017-2019) observations of the Very Large Array Sky Survey (VLASS). All 18 galaxies were detected in the epoch II VLASS observations in 2020-2021, and their radio flux is found to have evolved slowly (by ~40%) over a period of about three years. Fifteen galaxies have been observed in the Rapid ASKAP Continuum Survey, and a flat or inverted spectral slope between 888 MHz and 3 GHz is found. Based on Sloan Digital Sky Survey spectra taken before the radio brightening, 14 of the 18 can be classified as LINERs or normal galaxies with weak or no nuclear activity. Most galaxies are red and massive, with more than half having central black hole masses above 10^8 Msun. We find that only one galaxy in our sample displays an optical flare lasting at least two months, together with a long decay in the infrared light curve that can be explained as dust-heated echo emission of the central optical flare, such as from a stellar tidal disruption event. We discuss several possibilities for the transient radio emission and conclude that it is likely associated with a newborn radio jet triggered by short, sporadic fueling of the supermassive black hole. This scenario can be tested with further multi-frequency radio observations of these sources by measuring their radio flux variability and spectral evolution.
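A "flat or inverted" slope refers to the two-point spectral index between the 888 MHz and 3 GHz flux densities. The sketch assumes the convention S_nu proportional to nu^alpha, under which alpha near 0 is flat and alpha > 0 is inverted (papers using S_nu proportional to nu^-alpha flip the sign); frequencies must share units.

```python
import math


def spectral_index(s1, nu1, s2, nu2):
    """Two-point spectral index alpha, with S_nu proportional to nu**alpha."""
    return math.log(s2 / s1) / math.log(nu2 / nu1)
```

For example, equal flux densities at 0.888 and 3 GHz give alpha = 0 (flat), while a higher flux at 3 GHz gives a positive, inverted index.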
Submitted 17 August, 2022;
originally announced August 2022.
-
Exploiting Feature Diversity for Make-up Temporal Video Grounding
Authors:
Xiujun Shu,
Wei Wen,
Taian Guo,
Sunan He,
Chen Wu,
Ruizhi Qiao
Abstract:
This technical report presents the third-place solution for MTVG, a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims to localize the temporal boundary of a make-up step in an untrimmed video based on a textual description. The biggest challenge of this task is the fine-grained video-text semantics of make-up steps. However, current methods mainly extract video features using action-based pre-trained models. As actions are more coarse-grained than make-up steps, action-based features are not sufficient to provide fine-grained cues. To address this issue, we propose to achieve fine-grained representation by exploiting feature diversity. Specifically, we propose a series of methods spanning feature extraction, network optimization, and model ensembling. As a result, we achieved 3rd place in the MTVG competition.
Submitted 12 August, 2022;
originally announced August 2022.
-
Compact and variable radio emission from an active galaxy with supersoft X-ray emission
Authors:
Lei Yang,
Xinwen Shu,
Fabao Zhang,
Yogesh Chandola,
Daizhong Liu,
Yi Liu,
Minfeng Gu,
Margherita Giustini,
Ning Jiang,
Ya-** Li,
Di Li,
David Elbaz,
Stephanie Juneau,
Maurilio Pannella,
Luming Sun,
Ningyu Tang,
Tinggui Wang,
Hongyan Zhou
Abstract:
RX J1301.9+2747 is a unique active galaxy with a supersoft X-ray spectrum that lacks significant emission at energies above 2 keV. In addition, it is one of the few galaxies displaying quasi-periodic X-ray eruptions that recur on a timescale of 13-20 ks. We present multi-epoch radio observations of RX J1301.9+2747 using GMRT, VLA and VLBA. The VLBA imaging at 1.6 GHz reveals compact radio emission unresolved on a scale of <0.7 pc, with a brightness temperature of T_b>5x10^7 K. Based on data from VLA monitoring campaigns, the radio emission varies by more than a factor of 2.5 over a few days. The short-term radio variability suggests that the radio-emitting region is as small as 8x10^{-4} pc, implying an even higher brightness temperature of T_b ~10^{12} K. A similar limit on the source size is obtained if the observed flux variability is not intrinsic but caused by interstellar scintillation. The overall radio spectrum is steep, with a time-averaged spectral index alpha=-0.78+/-0.03 between 0.89 GHz and 14 GHz. These observational properties rule out a thermal or star-formation origin of the radio emission, and appear consistent with a scenario of episodic jet ejections driven by magnetohydrodynamic processes. Simultaneous radio and X-ray monitoring observations down to a cadence of hours are required to test whether the compact and variable radio emission is correlated with the quasi-periodic X-ray eruptions.
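A size limit from variability usually follows from a causality (light-crossing-time) argument, R ≲ c Δt / (1 + z). The sketch below illustrates that arithmetic under stated assumptions; it is not the authors' calculation. It ignores relativistic Doppler boosting, and the one-day timescale in the usage note is an assumption chosen for illustration, not a value taken from the monitoring data.

```python
C_CM_PER_S = 2.99792458e10          # speed of light
PC_IN_CM = 3.0856775814913673e18    # one parsec
SECONDS_PER_DAY = 86400.0


def max_source_size_pc(delta_t_days: float, z: float = 0.0) -> float:
    """Causality upper limit on emitting-region size, R <= c * dt / (1 + z).

    delta_t_days is the observed variability timescale; dividing by (1 + z)
    converts it to the source rest frame. Doppler boosting is neglected.
    """
    if delta_t_days <= 0:
        raise ValueError("variability timescale must be positive")
    if z < 0:
        raise ValueError("redshift must be non-negative")
    rest_frame_seconds = delta_t_days * SECONDS_PER_DAY / (1.0 + z)
    return C_CM_PER_S * rest_frame_seconds / PC_IN_CM


if __name__ == "__main__":
    # Assumed one-day timescale, z neglected: about 8.4e-4 pc.
    print(f"{max_source_size_pc(1.0):.2e} pc")
```

With the assumed one-day timescale this gives roughly 8.4x10^-4 pc, comparable to the 8x10^{-4} pc quoted above; a longer timescale loosens the limit proportionally.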
Submitted 13 July, 2022;
originally announced July 2022.