-
Novel deep learning methods for 3D flow field segmentation and classification
Authors:
Xiaorui Bai,
Wenyong Wang,
Jun Zhang,
Yueqing Wang,
Yu Xiang
Abstract:
Flow field segmentation and classification help researchers to understand vortex structure and thus turbulent flow. Existing deep learning methods are mainly based on global information and focus on the 2D case. Based on flow field theory, we propose novel deep learning methods for flow field segmentation and classification in three-dimensional space. We construct a segmentation criterion based on local velocity information and a classification criterion based on the relationship between local vorticity and vortex wake, in order to identify vortex structure in a 3D flow field and to classify the type of vortex wake accurately and rapidly. Simulation experiments showed that, compared with existing methods, our segmentation method identifies the vortex area more accurately while reducing time consumption by more than 50%; our classification method reduces time consumption by more than 90% while maintaining the same level of classification accuracy.
Submitted 14 June, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Recouple Event Field via Probabilistic Bias for Event Extraction
Authors:
Xingyu Bai,
Taiqiang Wu,
Han Guo,
Zhe Zhao,
Xuefeng Yang,
Jiayi Li,
Weijie Liu,
Qi Ju,
Weigang Guo,
Yujiu Yang
Abstract:
Event Extraction (EE), aiming to identify and classify event triggers and arguments from event mentions, has benefited from pre-trained language models (PLMs). However, existing PLM-based methods ignore the information of trigger/argument fields, which is crucial for understanding event schemas. To this end, we propose a Probabilistic reCoupling model enhanced Event extraction framework (ProCE). Specifically, we first model the syntactic-related event fields as probabilistic biases, to clarify the event fields from ambiguous entanglement. Furthermore, considering multiple occurrences of the same triggers/arguments in EE, we explore probabilistic interaction strategies among multiple fields of the same triggers/arguments, to recouple the corresponding clarified distributions and capture more latent information fields. Experiments on EE datasets demonstrate the effectiveness and generalization of our proposed approach.
Submitted 19 May, 2023;
originally announced May 2023.
-
Full velocities and propagation directions of coronal mass ejections inferred from simultaneous full-disk imaging and Sun-as-a-star spectroscopic observations
Authors:
Hong-peng Lu,
Hui Tian,
He-chao Chen,
Yu Xu,
Zhen-yong Hou,
Xian-yong Bai,
Guang-yu Tan,
Zi-hao Yang,
Jie Ren
Abstract:
Coronal mass ejections (CMEs) are violent ejections of magnetized plasma from the Sun, which can trigger geomagnetic storms, endanger satellite operations and destroy electrical infrastructures on the Earth. After systematically searching Sun-as-a-star spectra observed by the Extreme-ultraviolet Variability Experiment (EVE) onboard the Solar Dynamics Observatory (SDO) from May 2010 to May 2022, we identified eight CMEs associated with flares and filament eruptions by analyzing the blue-wing asymmetry of the O III 52.58 nm line profiles. Combined with images simultaneously taken by the 30.4 nm channel of the Atmospheric Imaging Assembly onboard SDO, the full velocity and propagation direction for each of the eight CMEs are derived. We find a strong correlation between geomagnetic indices (Kp and Dst) and the angle between the CME propagation direction and the Sun-Earth line, suggesting that Sun-as-a-star spectroscopic observations at EUV wavelengths can potentially help to improve the prediction accuracy of the geoeffectiveness of CMEs. Moreover, an analysis of synthesized long-exposure Sun-as-a-star spectra implies that it is possible to detect CMEs from other stars through blue-wing asymmetries or blueshifts of spectral lines.
Submitted 15 May, 2023;
originally announced May 2023.
-
On the Hidden Mystery of OCR in Large Multimodal Models
Authors:
Yuliang Liu,
Zhang Li,
Biao Yang,
Chunyuan Li,
Xucheng Yin,
Cheng-lin Liu,
Lianwen Jin,
Xiang Bai
Abstract:
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER). To facilitate the assessment of Optical Character Recognition (OCR) capabilities in Large Multimodal Models, we propose OCRBench, a comprehensive evaluation benchmark. Our study encompasses 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression recognition. Most importantly, the baseline results showcased in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal techniques. The evaluation pipeline and benchmark are available at https://github.com/Yuliang-Liu/MultimodalOCR.
Submitted 17 January, 2024; v1 submitted 13 May, 2023;
originally announced May 2023.
-
Multi-Modal 3D Object Detection by Box Matching
Authors:
Zhe Liu,
Xiaoqing Ye,
Zhikang Zou,
Xinwei He,
Xiao Tan,
Errui Ding,
Jingdong Wang,
Xiang Bai
Abstract:
Multi-modal 3D object detection has received growing attention because the information from different sensors such as LiDAR and cameras is complementary. Most fusion methods for 3D detection rely on accurate alignment and calibration between 3D point clouds and RGB images. However, such an assumption is not reliable in a real-world self-driving system, as the alignment between different modalities is easily affected by asynchronous sensors and disturbed sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which provides an alternative way for cross-modal feature alignment by learning the correspondence at the bounding box level, removing the dependency on calibration during inference. With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features. Extensive experiments on the nuScenes dataset demonstrate that our method is much more stable than existing fusion methods in dealing with challenging cases such as asynchronous sensors, misaligned sensor placement, and degenerated camera images. We hope that our FBMNet could provide a practical solution to these challenging cases for safety in real autonomous driving scenarios. Codes will be publicly available at https://github.com/happinesslz/FBMNet.
Submitted 12 May, 2023;
originally announced May 2023.
-
Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
Authors:
Jianfeng Kuang,
Wei Hua,
Dingkang Liang,
Mingkun Yang,
Deqiang Jiang,
Bo Ren,
Xiang Bai
Abstract:
Visual information extraction (VIE), which aims to simultaneously perform OCR and information extraction in a unified framework, has drawn increasing attention due to its essential role in various applications like understanding receipts, goods, and traffic signs. However, as existing benchmark datasets for VIE mainly consist of document images without adequate diversity of layout structures, background disturbances, and entity categories, they cannot fully reveal the challenges of real-world applications. In this paper, we propose a large-scale dataset consisting of camera images for VIE, which contains not only a larger variance of layouts, backgrounds, and fonts but also many more types of entities. Besides, we propose a novel framework for end-to-end VIE that combines the stages of OCR and information extraction in an end-to-end learning fashion. Different from previous end-to-end approaches that directly adopt OCR features as the input of an information extraction module, we propose to use contrastive learning to narrow the semantic gap caused by the difference between the tasks of OCR and information extraction. We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods drops noticeably from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance of layout and entities. These results demonstrate that our dataset is more practical for promoting advanced VIE algorithms. In addition, experiments demonstrate that the proposed VIE method consistently achieves clear performance gains on the proposed and SROIE datasets.
Submitted 14 June, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
Investigating Forgetting in Pre-Trained Representations Through Continual Learning
Authors:
Yun Luo,
Zhen Yang,
Xuefeng Bai,
Fandong Meng,
Jie Zhou,
Yue Zhang
Abstract:
Representation forgetting refers to the drift of contextualized representations during continual training. Intuitively, representation forgetting can influence the general knowledge stored in pre-trained language models (LMs), but the concrete effect is still unclear. In this paper, we study the effect of representation forgetting on the generality of pre-trained language models, i.e. the potential capability for tackling future downstream tasks. Specifically, we design three metrics, including overall generality destruction (GD), syntactic knowledge forgetting (SynF), and semantic knowledge forgetting (SemF), to measure the evolution of general knowledge in continual learning. With extensive experiments, we find that generality is destroyed in various pre-trained LMs, and that syntactic and semantic knowledge is forgotten through continual learning. Based on our experiments and analysis, we further obtain two insights into alleviating general knowledge forgetting: 1) training on general linguistic tasks first can mitigate general knowledge forgetting; 2) the hybrid continual learning method can mitigate generality destruction and maintain more general knowledge compared with methods that consider only rehearsal or regularization.
Submitted 10 May, 2023;
originally announced May 2023.
-
Measurement of ultra-high-energy diffuse gamma-ray emission of the Galactic plane from 10 TeV to 1 PeV with LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The diffuse Galactic $γ$-ray emission, mainly produced via interactions between cosmic rays and the interstellar medium and/or radiation field, is a very important probe of the distribution, propagation, and interaction of cosmic rays in the Milky Way. In this work we report the measurements of diffuse $γ$-rays from the Galactic plane between 10 TeV and 1 PeV energies, with the square kilometer array of the Large High Altitude Air Shower Observatory (LHAASO). Diffuse emissions from the inner ($15^{\circ}<l<125^{\circ}$, $|b|<5^{\circ}$) and outer ($125^{\circ}<l<235^{\circ}$, $|b|<5^{\circ}$) Galactic plane are detected with $29.1σ$ and $12.7σ$ significance, respectively. The outer Galactic plane diffuse emission is detected for the first time in the very- to ultra-high-energy domain ($E>10$ TeV). The energy spectrum in the inner Galaxy regions can be described by a power-law function with an index of $-2.99\pm0.04$, which is different from the curved spectrum as expected from hadronic interactions between locally measured cosmic rays and the line-of-sight integrated gas content. Furthermore, the measured flux is higher by a factor of $\sim3$ than the prediction. A similar spectrum with an index of $-2.99\pm0.07$ is found in the outer Galaxy region, and the absolute flux for $10\lesssim E\lesssim60$ TeV is again higher than the prediction for hadronic cosmic ray interactions. The latitude distributions of the diffuse emission are consistent with the gas distribution, while the longitude distributions show clear deviation from the gas distribution. The LHAASO measurements imply that either additional emission sources exist or cosmic ray intensities have spatial variations.
Submitted 19 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
PromptRank: Unsupervised Keyphrase Extraction Using Prompt
Authors:
Aobo Kong,
Shiwan Zhao,
Hao Chen,
Qicheng Li,
Yong Qin,
Ruiqi Sun,
Xiaoyan Bai
Abstract:
The keyphrase extraction task refers to the automatic selection of phrases from a given document to summarize its core content. State-of-the-art (SOTA) performance has recently been achieved by embedding-based algorithms, which rank candidates according to how similar their embeddings are to document embeddings. However, such solutions either struggle with the document and candidate length discrepancies or fail to fully utilize the pre-trained language model (PLM) without further fine-tuning. To this end, in this paper, we propose a simple yet effective unsupervised approach, PromptRank, based on the PLM with an encoder-decoder architecture. Specifically, PromptRank feeds the document into the encoder and calculates the probability of generating the candidate with a designed prompt by the decoder. We extensively evaluate the proposed PromptRank on six widely used benchmarks. PromptRank outperforms the SOTA approach MDERank, improving the F1 score relatively by 34.18%, 24.87%, and 17.57% for 5, 10, and 15 returned results, respectively. This demonstrates the great potential of using prompt for unsupervised keyphrase extraction. We release our code at https://github.com/HLT-NLP/PromptRank.
Submitted 15 May, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension
Authors:
Weijia Wu,
Yuzhong Zhao,
Zhuang Li,
Jiahong Li,
Hong Zhou,
Mike Zheng Shou,
Xiang Bai
Abstract:
Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i.e., visual representation, while the text is omnipresent in human environments and frequently critical to understand video. To study how to retrieve video with both modal inputs, i.e., visual and text semantic representations, we first introduce a large-scale and cross-modal Video Retrieval dataset with text reading comprehension, TextVR, which contains 42.2k sentence queries for 10.5k videos of 8 scenario domains, i.e., Street View (indoor), Street View (outdoor), Games, Sports, Driving, Activity, TV Show, and Cooking. The proposed TextVR requires one unified cross-modal model to recognize and comprehend texts, relate them to the visual context, and decide what text semantic information is vital for the video retrieval task. Besides, we present a detailed analysis of TextVR compared to the existing datasets and design a novel multimodal video retrieval baseline for the text-based video retrieval task. The dataset analysis and extensive experiments show that our TextVR benchmark provides many new technical challenges and insights from previous datasets for the video-and-language community. The project website and GitHub repo can be found at https://sites.google.com/view/loveucvpr23/guest-track and https://github.com/callsys/TextVR, respectively.
Submitted 5 May, 2023;
originally announced May 2023.
-
Search for correlations of high-energy neutrinos detected in IceCube with radio-bright AGN and gamma-ray emission from blazars
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (379 additional authors not shown)
Abstract:
The IceCube Neutrino Observatory sends realtime neutrino alerts with high probability of being astrophysical in origin. We present a new method to correlate these events and possible candidate sources using $2,089$ blazars from the Fermi-LAT 4LAC-DR2 catalog and with $3,413$ AGNs from the Radio Fundamental Catalog. No statistically significant neutrino emission was found in any of the catalog searches. The result is compatible with a small fraction, $<1$%, of AGNs being neutrino emitters and prior evidence for neutrino emission presented by IceCube and other authors from sources such as TXS 0506+056 and PKS 1502+06. We also present cross-checks to other analyses that claim a significant correlation using similar data samples, and we find that adding more information on the neutrino events and more data overall makes the result compatible with background.
Submitted 25 April, 2023;
originally announced April 2023.
-
Measurement of Atmospheric Neutrino Mixing with Improved IceCube DeepCore Calibration and Data Processing
Authors:
IceCube Collaboration,
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise
, et al. (383 additional authors not shown)
Abstract:
We describe a new data sample of IceCube DeepCore and report on the latest measurement of atmospheric neutrino oscillations obtained with data recorded between 2011 and 2019. The sample includes significant improvements in data calibration, detector simulation, and data processing, and the analysis benefits from a detailed treatment of systematic uncertainties, with a significantly higher level of detail than in our last study. By measuring the relative fluxes of neutrino flavors as a function of their reconstructed energies and arrival directions we constrain the atmospheric neutrino mixing parameters to be $\sin^2θ_{23} = 0.51\pm 0.05$ and $Δm^2_{32} = (2.41\pm0.07)\times 10^{-3}\,\mathrm{eV}^2$, assuming a normal mass ordering. The resulting 40% reduction in the error of both parameters with respect to our previous result makes this the most precise measurement of oscillation parameters using atmospheric neutrinos. Our results are also compatible with and complementary to those obtained using neutrino beams from accelerators, which are obtained at lower neutrino energies and are subject to different sources of uncertainties.
Submitted 8 August, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
ICDAR 2023 Competition on Reading the Seal Title
Authors:
Wenwen Yu,
Mingyu Liu,
Mingrui Chen,
Ning Lu,
Yinlong Wen,
Yuliang Liu,
Dimosthenis Karatzas,
Xiang Bai
Abstract:
Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants from academia and industry including 28 submissions for Task 1 and 25 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology.
Submitted 5 June, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
IDLS: Inverse Depth Line based Visual-Inertial SLAM
Authors:
Wanting Li,
Shuo Wang,
Yongcai Wang,
Yu Shao,
Xuewei Bai,
Deying Li
Abstract:
For robust visual-inertial SLAM in perceptually-challenging indoor environments, recent studies exploit line features to extract descriptive information about scene structure to deal with the degeneracy of point features. But existing point-line-based SLAM methods mainly use the Plücker matrix or orthogonal representation to represent a line, which requires calculating at least four variables to determine a line. Given the numerous line features to determine in each frame, the overly flexible line representation increases the computational burden and compromises the accuracy of the results. In this paper, we propose an inverse depth representation for a line, which models each extracted line feature using only two variables, i.e., the inverse depths of the two endpoints. It exploits the fact that the projected line's pixel coordinates on the image plane are rather accurate, which partially constrains the line. Using this compact line representation, Inverse Depth Line SLAM (IDLS) is proposed to track the line features in SLAM in an accurate and efficient way. A robust line triangulation method and a novel line re-projection error model are introduced. A two-step optimization method is also proposed, which first determines the lines and then estimates the camera poses in each frame. IDLS is extensively evaluated on multiple perceptually-challenging datasets. The results show that it is more accurate and robust, and needs lower computational overhead, than current state-of-the-art point-line-based SLAM methods.
Submitted 30 June, 2024; v1 submitted 23 April, 2023;
originally announced April 2023.
-
The Magnetohydrodynamic-Particle-In-Cell Module in Athena++: Implementation and Code Tests
Authors:
Xiaochen Sun,
Xue-Ning Bai
Abstract:
We present a new magnetohydrodynamic-particle-in-cell (MHD-PIC) code integrated into the Athena++ framework. It treats energetic particles as in conventional PIC codes while the rest of the thermal plasma is treated as a background fluid described by MHD, thus primarily targeting multi-scale astrophysical problems involving the kinetic physics of cosmic rays (CRs). The code is optimized toward efficient vectorization in interpolation and particle deposits, with excellent parallel scaling. The code is also compatible with static/adaptive mesh refinement, with dynamic load balancing to further enhance multi-scale simulations. In addition, we have implemented a compressing/expanding box framework which allows adiabatic driving of CR pressure anisotropy, as well as the $δf$ method that can dramatically reduce Poisson noise in problems where the distribution function $f$ is only expected to slightly deviate from the background. The code performance is demonstrated over a series of benchmark test problems including particle acceleration in non-relativistic parallel shocks. In particular, we reproduce the linear growth of the CR gyro-resonant (streaming and pressure anisotropy) instabilities, under both the periodic and expanding/compressing box settings. We anticipate that the code will open up avenues for a wide range of astrophysical and plasma physics applications.
Submitted 20 April, 2023;
originally announced April 2023.
-
Learning to "Segment Anything" in Thermal Infrared Images through Knowledge Distillation with a Large Scale Dataset SATIR
Authors:
Junzhang Chen,
Xiangzhi Bai
Abstract:
The Segment Anything Model (SAM) is a promptable segmentation model recently introduced by Meta AI that has demonstrated its prowess across various fields beyond just image segmentation. SAM can accurately segment images across diverse fields and generate various masks. We discovered that this ability of SAM can be leveraged to pretrain models for specific fields. Accordingly, we have proposed a framework that utilizes SAM to generate pseudo labels for pretraining thermal infrared image segmentation tasks. Our proposed framework can effectively improve the accuracy of segmentation results of specific categories beyond the SOTA ImageNet pretrained model. Our framework presents a novel approach to collaborating with models trained on large data, like SAM, to address problems in special fields. Also, we generated a large-scale thermal infrared segmentation dataset used for pretraining, which contains over 100,000 images with pixel-annotation labels. This approach offers an effective solution for working with large models in special fields where label annotation is challenging. Our code is available at https://github.com/chenjzBUAA/SATIR
Submitted 16 April, 2023;
originally announced April 2023.
-
On the use of dielectric elements in axion searches with microwave resonant cavities
Authors:
Xiran Bai,
Michael J. Jewell,
Steve K. Lamoreaux,
Reina H. Maruyama,
Karl van Bibber
Abstract:
This study explores the primary effects of dielectric materials in a resonant cavity-based search for axion dark matter. While dielectrics prove beneficial in numerous cases, their incorporation may lead to less-than-optimal performance, especially for the lowest TM mode. Additionally, the stronger confinement of the electric field inside the dielectrics can exacerbate mode mixings, in particular for higher-order modes. Case studies have been carried out using a combination of analytical solutions and numerical simulations. The findings indicate dielectric cavities employing the $\text{TM}_{010}$ mode experience a significant reduction in sensitivity when compared to a similar search conducted in a cavity at equivalent frequency using no dielectrics.
Submitted 13 April, 2023;
originally announced April 2023.
-
SOOD: Towards Semi-Supervised Oriented Object Detection
Authors:
Wei Hua,
Dingkang Liang,
Jingyu Li,
Xiaolong Liu,
Zhikang Zou,
Xiaoqing Ye,
Xiang Bai
Abstract:
Semi-Supervised Object Detection (SSOD), aiming to explore unlabeled data for boosting object detectors, has become an active task in recent years. However, existing SSOD approaches mainly focus on horizontal objects, leaving multi-oriented objects that are common in aerial images unexplored. This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework. Towards oriented objects in aerial scenes, we design two loss functions to provide better supervision. Focusing on the orientations of objects, the first loss regularizes the consistency between each pseudo-label-prediction pair (includes a prediction and its corresponding pseudo label) with adaptive weights based on their orientation gap. Focusing on the layout of an image, the second loss regularizes the similarity and explicitly builds the many-to-many relation between the sets of pseudo-labels and predictions. Such a global consistency constraint can further boost semi-supervised learning. Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark. The code will be available at https://github.com/HamPerdredes/SOOD.
Submitted 10 April, 2023;
originally announced April 2023.
-
ICDAR 2023 Video Text Reading Competition for Dense and Small Text
Authors:
Weijia Wu,
Yuzhong Zhao,
Zhuang Li,
Jiahong Li,
Mike Zheng Shou,
Umapada Pal,
Dimosthenis Karatzas,
Xiang Bai
Abstract:
Recently, video text detection, tracking, and recognition in natural scenes have become very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios. Compared with the previous datasets, the proposed dataset mainly includes three new challenges: 1) Dense video texts, a new challenge for video text spotters. 2) High-proportioned small texts. 3) Various new scenarios, e.g., games, sports, etc. The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2)). During the competition period (opened on 15th February 2023 and closed on 20th March 2023), a total of 24 teams participated in the three proposed tasks with around 30 valid submissions, respectively. In this article, we describe detailed statistical information of the dataset, tasks, evaluation protocols and the results summaries of the ICDAR 2023 on DSText competition. Moreover, we hope the benchmark will promote video text research in the community.
Submitted 10 April, 2023;
originally announced April 2023.
-
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
Authors:
Dingkang Liang,
Jiahao Xie,
Zhikang Zou,
Xiaoqing Ye,
Wei Xu,
Xiang Bai
Abstract:
Supervised crowd counting relies heavily on costly manual labeling, which is difficult and expensive, especially in dense scenes. To alleviate the problem, we propose a novel unsupervised framework for crowd counting, named CrowdCLIP. The core idea is built on two observations: 1) the recent contrastive pre-trained vision-language model (CLIP) has presented impressive performance on various downstream tasks; 2) there is a natural mapping between crowd patches and count text. To the best of our knowledge, CrowdCLIP is the first to investigate the vision language knowledge to solve the counting problem. Specifically, in the training stage, we exploit the multi-modal ranking loss by constructing ranking text prompts to match the size-sorted crowd patches to guide the image encoder learning. In the testing stage, to deal with the diversity of image patches, we propose a simple yet effective progressive filtering strategy to first select the highly potential crowd patches and then map them into the language space with various counting intervals. Extensive experiments on five challenging datasets demonstrate that the proposed CrowdCLIP achieves superior performance compared to previous unsupervised state-of-the-art counting methods. Notably, CrowdCLIP even surpasses some popular fully-supervised methods under the cross-dataset setting. The source code will be available at https://github.com/dk-liang/CrowdCLIP.
Submitted 9 April, 2023;
originally announced April 2023.
-
IceCat-1: the IceCube Event Catalog of Alert Tracks
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (369 additional authors not shown)
Abstract:
We present a catalog of likely astrophysical neutrino track-like events from the IceCube Neutrino Observatory. IceCube began reporting likely astrophysical neutrinos in 2016 and this system was updated in 2019. The catalog presented here includes events that were reported in real-time since 2019, as well as events identified in archival data samples starting from 2011. We report 275 neutrino events from two selection channels as the first entries in the catalog, the IceCube Event Catalog of Alert Tracks, which will see ongoing extensions with additional alerts. The gold and bronze alert channels respectively provide neutrino candidates with 50\% and 30\% probability of being astrophysical, on average assuming an astrophysical neutrino power law energy spectral index of 2.19. For each neutrino alert, we provide the reconstructed energy, direction, false alarm rate, probability of being astrophysical in origin, and likelihood contours describing the spatial uncertainty in the alert's reconstructed location. We also investigate a directional correlation of these neutrino events with gamma-ray and X-ray catalogs including 4FGL, 3HWC, TeVCat and Swift-BAT.
Submitted 11 April, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
One Training for Multiple Deployments: Polar-based Adaptive BEV Perception for Autonomous Driving
Authors:
Huitong Yang,
Xuyang Bai,
Xinge Zhu,
Yuexin Ma
Abstract:
Current on-board chips usually have different computing power, which means multiple training processes are needed for adapting the same learning-based algorithm to different chips, costing huge computing resources. The situation becomes even worse for 3D perception methods with large models. Previous vision-centric 3D perception approaches are trained with regular grid-represented feature maps of fixed resolutions, which cannot adapt to other grid scales, limiting wider deployment. In this paper, we leverage the Polar representation when constructing the BEV feature map from images in order to achieve the goal of training once for multiple deployments. Specifically, the feature along rays in Polar space can be easily adaptively sampled and projected to the feature in Cartesian space with arbitrary resolutions. To further improve the adaptation capability, we make multi-scale contextual information interact with each other to enhance the feature representation. Experiments on a large-scale autonomous driving dataset show that our method outperforms others owing to its property of training once for multiple deployments.
Submitted 2 April, 2023;
originally announced April 2023.
-
Switching Pushing Skill Combined MPC and Deep Reinforcement Learning for Planar Non-prehensile Manipulation
Authors:
Bo Zhang,
Cong Huang,
Haixu Zhang,
Xiaoshan Bai
Abstract:
In this paper, a novel switching pushing skill algorithm is proposed to improve the efficiency of planar non-prehensile manipulation, which draws inspiration from human pushing actions and comprises two sub-problems, i.e., discrete decision-making of pushing point and continuous feedback control of pushing action. In order to solve the sub-problems above, a combination of Model Predictive Control (MPC) and Deep Reinforcement Learning (DRL) method is employed. Firstly, the selection of pushing point is modeled as a Markov decision process, and an off-policy DRL method is used by reshaping the reward function to train the decision-making model for selecting pushing point from a pre-constructed set based on the current state. Secondly, a motion constraint region (MCR) is constructed for the specific pushing point based on the distance from the target, followed by utilizing the MPC controller to regulate the motion of the object within the MCR towards the target pose. The trigger condition for switching the pushing point occurs when the object reaches the boundary of the MCR under the pushing action. Subsequently, the pushing point and the controller are updated iteratively until the target pose is reached. We conducted pushing experiments on four distinct object shapes in both simulated and physical environments to evaluate our method. The results indicate that our method achieves a significantly higher training efficiency, with a training time that is only about 20% of the baseline method while maintaining around the same success rate. Moreover, our method outperforms the baseline method in terms of both training and execution efficiency of pushing operations, allowing for rapid learning of robot pushing skills.
Submitted 30 March, 2023;
originally announced March 2023.
-
Continuous spin excitations in the three-dimensional frustrated magnet K2Ni2(SO4)3
Authors:
Weiliang Yao,
Qing Huang,
Tao Xie,
Andrey Podlesnyak,
Alexander Brassington,
Chengkun Xing,
Ranuri S. Dissanayaka Mudiyanselage,
Weiwei Xie,
Shengzhi Zhang,
Minseong Lee,
Vivien S. Zapf,
Xiaojian Bai,
D. Alan Tennant,
Jian Liu,
Haidong Zhou
Abstract:
Continuous spin excitations are widely recognized as one of the hallmarks of novel spin states in quantum magnets, such as quantum spin liquids (QSLs). Here, we report the observation of such excitations in K2Ni2(SO4)3, which consists of two sets of intersected spin-1 Ni2+ trillium lattices. Our inelastic neutron scattering measurement on single crystals clearly shows a dominant excitation continuum, which exhibits a distinct temperature-dependent behavior from that of spin waves, and is rooted in strong quantum spin fluctuations. Further using the self-consistent Gaussian approximation method, we determined that the fourth- and fifth-nearest neighbor exchange interactions are dominant. These two bonds together form a unique three-dimensional network of corner-sharing tetrahedra, which we name the "hyper-trillium" lattice. Our results provide direct evidence for the existence of QSL features in K2Ni2(SO4)3 and highlight the potential for the hyper-trillium lattice to host frustrated quantum magnetism.
Submitted 28 March, 2023;
originally announced March 2023.
-
A Search for IceCube sub-TeV Neutrinos Correlated with Gravitational-Wave Events Detected By LIGO/Virgo
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (364 additional authors not shown)
Abstract:
The LIGO/Virgo collaboration published the catalogs GWTC-1, GWTC-2.1 and GWTC-3 containing candidate gravitational-wave (GW) events detected during its runs O1, O2 and O3. These GW events can be possible sites of neutrino emission. In this paper, we present a search for neutrino counterparts of 90 GW candidates using IceCube DeepCore, the low-energy infill array of the IceCube Neutrino Observatory. The search is conducted using an unbinned maximum likelihood method, within a time window of 1000 s and uses the spatial and timing information from the GW events. The neutrinos used for the search have energies ranging from a few GeV to several tens of TeV. We do not find any significant emission of neutrinos, and place upper limits on the flux and the isotropic-equivalent energy emitted in low-energy neutrinos. We also conduct a binomial test to search for source populations potentially contributing to neutrino emission. We report a non-detection of a significant neutrino-source population with this test.
Submitted 11 December, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
STCF Conceptual Design Report: Volume 1 -- Physics & Detector
Authors:
M. Achasov,
X. C. Ai,
R. Aliberti,
L. P. An,
Q. An,
X. Z. Bai,
Y. Bai,
O. Bakina,
A. Barnyakov,
V. Blinov,
V. Bobrovnikov,
D. Bodrov,
A. Bogomyagkov,
A. Bondar,
I. Boyko,
Z. H. Bu,
F. M. Cai,
H. Cai,
J. J. Cao,
Q. H. Cao,
Z. Cao,
Q. Chang,
K. T. Chao,
D. Y. Chen,
H. Chen
, et al. (413 additional authors not shown)
Abstract:
The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R\&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R\&D and physics case studies.
Submitted 5 October, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Heating of quiescent coronal loops caused by nearby eruptions observed with the Solar Dynamics Observatory and the Solar Upper Transition Region Imager
Authors:
Leping Li,
Hui Tian,
Huadong Chen,
Hongqiang Song,
Zhenyong Hou,
Xianyong Bai,
Kaifan Ji,
Yuanyong Deng
Abstract:
How structures, e.g., magnetic loops, in the upper atmosphere, i.e., the transition region and corona, are heated and sustained is one of the major unresolved issues in solar and stellar physics. Various theoretical and observational studies on the heating of coronal loops have been undertaken. The heating of quiescent loops caused by eruptions is, however, rarely observed. In this study, employing data from the Solar Dynamics Observatory (SDO) and Solar Upper Transition Region Imager (SUTRI), we report the heating of quiescent loops associated with nearby eruptions. In active regions (ARs) 13092 and 13093, a long filament and a short filament, and their overlying loops are observed on 2022 September 4. In AR 13093, a warm channel erupted toward the northeast, whose material moved along its axis toward the northwest under the long filament, turned to the west above the long filament, and divided into two branches falling to the solar surface. Subsequently, the short filament erupted toward the southeast. Associated with these two eruptions, the quiescent loops overlying the long filament appeared in SDO/Atmospheric Imaging Assembly (AIA) high-temperature images, indicating the heating of loops. During the heating, signatures of magnetic reconnection between loops are identified, including the inflowing motions of loops, and the formation of X-type structures and newly reconnected loops. The heated loops then cooled down. They appeared sequentially in AIA and SUTRI lower-temperature images. All the results suggest that the quiescent loops are heated by reconnection between loops caused by the nearby warm channel and filament eruptions.
Submitted 28 March, 2023;
originally announced March 2023.
-
An End-to-End Framework For Universal Lesion Detection With Missing Annotations
Authors:
Xiaoyu Bai,
Yong Xia
Abstract:
Fully annotated large-scale medical image datasets are highly valuable. However, because labeling medical images is tedious and requires specialized knowledge, the large-scale datasets available often have missing annotation issues. For instance, DeepLesion, a large-scale CT image dataset with labels for various kinds of lesions, is reported to have a missing annotation rate of 50\%. Directly training a lesion detector on it would suffer from false negative supervision caused by unannotated lesions. To address this issue, previous works have used sophisticated multi-stage strategies to switch between lesion mining and detector training. In this work, we present a novel end-to-end framework for mining unlabeled lesions while simultaneously training the detector. Our framework follows the teacher-student paradigm. In each iteration, the teacher model infers the input data and creates a set of predictions. High-confidence predictions are combined with partially-labeled ground truth for training the student model. On the DeepLesion dataset, using the original partially labeled training set, our model can outperform all other more complicated methods and surpass the previous best method by 2.3\% on average sensitivity and 2.7\% on average precision, achieving state-of-the-art universal lesion detection results.
△ Less
Submitted 27 March, 2023;
originally announced March 2023.
-
A Large Double-ring Disk around the Taurus M Dwarf J04124068+2438157
Authors:
Feng Long,
Bin B. Ren,
Nicole L. Wallack,
Daniel Harsono,
Gregory J. Herczeg,
Paola Pinilla,
Dimitri Mawet,
Michael C. Liu,
Sean M. Andrews,
Xue-Ning Bai,
Sylvie Cabrit,
Lucas A. Cieza,
Doug Johnstone,
Jarron M. Leisenring,
Giuseppe Lodato,
Yao Liu,
Carlo F. Manara,
Gijs D. Mulders,
Enrico Ragusa,
Steph Sallum,
Yangfan Shi,
Marco Tazzari,
Taichi Uyama,
Kevin Wagner,
David J. Wilner
, et al. (1 additional authors not shown)
Abstract:
Planet formation imprints signatures on the physical structures of disks. In this paper, we present high-resolution ($\sim$50 mas, 8 au) Atacama Large Millimeter/submillimeter Array (ALMA) observations of 1.3 mm dust continuum and CO line emission toward the disk around the M3.5 star 2MASS J04124068+2438157. The dust disk consists only of two narrow rings at radial distances of 0.47 and 0.78 arcsec ($\sim$70 and 116 au), with Gaussian $σ$ widths of 5.6 and 8.5 au, respectively. The width of the outer ring is smaller than the estimated pressure scale height by $\sim25\%$, suggesting dust trapping in a radial pressure bump. The dust disk size, set by the location of the outermost ring, is significantly larger (by $3σ$) than other disks with similar millimeter luminosity, which can be explained by an early formation of local pressure bump to stop radial drift of millimeter dust grains. After considering the disk's physical structure and accretion properties, we prefer planet--disk interaction over dead zone or photoevaporation models to explain the observed dust disk morphology. We carry out high-contrast imaging at $L'$ band using Keck/NIRC2 to search for potential young planets, but do not identify any source above $5σ$. Within the dust gap between the two rings, we reach a contrast level of $\sim$7 mag, constraining the possible planet below $\sim$2--4 $M_{\rm Jup}$. Analyses of the gap/ring properties suggest a $\sim$Saturn mass planet at $\sim$90 au is likely responsible for the formation of the outer ring, which can be potentially revealed with JWST.
Submitted 25 March, 2023;
originally announced March 2023.
-
ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding
Authors:
Hongyu Sun,
Yongcai Wang,
Xudong Cai,
Xuewei Bai,
Deying Li
Abstract:
Recently, a growing number of works design unsupervised paradigms for point cloud processing to alleviate the limitation of expensive manual annotation and poor transferability of supervised methods. Among them, CrossPoint follows the contrastive learning framework and exploits image and point cloud data for unsupervised point cloud understanding. Although it shows promising performance, the unbalanced architecture makes it unnecessarily complex and inefficient. For example, the image branch in CrossPoint is $\sim$8.3x heavier than the point cloud branch, leading to higher complexity and latency. To address this problem, in this paper, we propose a lightweight Vision-and-Pointcloud Transformer (ViPFormer) to unify image and point cloud processing in a single architecture. ViPFormer learns in an unsupervised manner by optimizing intra-modal and cross-modal contrastive objectives. Then the pretrained model is transferred to various downstream tasks, including 3D shape classification and semantic segmentation. Experiments on different datasets show ViPFormer surpasses previous state-of-the-art unsupervised methods with higher accuracy, lower model complexity and runtime latency. Finally, the effectiveness of each component in ViPFormer is validated by extensive ablation studies. The implementation of the proposed method is available at https://github.com/auniquesun/ViPFormer.
Submitted 25 March, 2023;
originally announced March 2023.
-
Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs
Authors:
Taiqiang Wu,
Zhe Zhao,
Jiahao Wang,
Xingyu Bai,
Lei Wang,
Ngai Wong,
Yujiu Yang
Abstract:
Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable for various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation (PGKD) method, which does not require graph edges (edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.
Submitted 27 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Search for neutrino lines from dark matter annihilation and decay with IceCube
Authors:
The IceCube Collaboration,
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise
, et al. (373 additional authors not shown)
Abstract:
Dark Matter particles in the Galactic Center and halo can annihilate or decay into a pair of neutrinos producing a monochromatic flux of neutrinos. The spectral feature of this signal is unique and it is not expected from any astrophysical production mechanism. Its observation would constitute a dark matter smoking gun signal. We performed the first dedicated search with a neutrino telescope for such a signal, by looking at both the angular and energy information of the neutrino events. To this end, a total of five years of IceCube's DeepCore data has been used to test dark matter masses ranging from 10 GeV to 40 TeV. No significant neutrino excess was found and upper limits on the annihilation cross section, as well as lower limits on the dark matter lifetime, were set. The limits reached are of the order of $10^{-24}$ cm$^3$/s for annihilation and up to $10^{27}$ seconds for decaying Dark Matter. Using the same data sample we also derive limits for dark matter annihilation or decay into a pair of Standard Model charged particles.
Submitted 23 March, 2023;
originally announced March 2023.
-
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Authors:
Zhibo Yang,
Rujiao Long,
Pengfei Wang,
Sibo Song,
Humen Zhong,
Wenqing Cheng,
Xiang Bai,
Cong Yao
Abstract:
Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common. All these factors may lead to failures in information extraction. Therefore, as the second contribution, we explore an alternative approach to precisely and robustly extract key information from document images under such tough conditions. Specifically, in contrast to previous methods, which usually either incorporate visual information into a multi-modal architecture or train text spotting and information extraction in an end-to-end fashion, we explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities, which could largely benefit entity labeling and linking. Extensive experiments on standard benchmarks in this field as well as the proposed dataset demonstrate that the proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models. The dataset is available at https://www.modelscope.cn/datasets/damo/SIBR/summary.
Submitted 28 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
Authors:
Kaixin Xiong,
Shi Gong,
Xiaoqing Ye,
Xiao Tan,
Ji Wan,
Errui Ding,
Jingdong Wang,
Xiang Bai
Abstract:
In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE could increase the difficulty of learning view transformation due to the variation of camera extrinsics. Thus we propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings under the local camera-view coordinate system instead of the global coordinate system, such that 3D position embedding is free of encoding camera extrinsic parameters. Furthermore, we extend our CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego-motion for boosting 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset. Code and models are available on Paddle3D (https://github.com/PaddlePaddle/Paddle3D) and as a PyTorch implementation (https://github.com/kaixinbear/CAPE).
Submitted 17 March, 2023;
originally announced March 2023.
-
InstMove: Instance Motion for Object-centric Video Segmentation
Authors:
Qihao Liu,
Junfeng Wu,
Yi Jiang,
Xiang Bai,
Alan Yuille,
Song Bai
Abstract:
Despite significant efforts, cutting-edge video segmentation methods remain sensitive to occlusion and rapid movement, due to their reliance on the appearance of objects in the form of object embeddings, which are vulnerable to these disturbances. A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies on appearance similarity and hence is often inaccurate under occlusion and fast movement. In this work, we study instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation. In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings and has a physical interpretation, making it more accurate and robust toward occlusion and fast-moving objects. To better fit the video segmentation tasks, InstMove uses instance masks to model the physical presence of an object and learns the dynamic model through a memory network to predict its position and shape in the next frame. With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks and boost their performance. Specifically, we improve on prior art by 1.5 AP on the OVIS dataset, which features heavy occlusions, and 4.9 AP on the YouTubeVIS-Long dataset, which mainly contains fast-moving objects. These results suggest that instance-level motion is robust and accurate, and hence serves as a powerful solution in complex scenarios for object-centric video segmentation.
Submitted 30 March, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
-
CrysFieldExplorer: a software for rapid optimization of crystal field Hamiltonian
Authors:
Qianli Ma,
Xiaojian Bai,
Erxi Feng,
Guannan Zhang,
Huibo Cao
Abstract:
We present a new lightweight Python-based program, CrysFieldExplorer, for fast optimization of crystal electric field (CEF) parameters to fit experimental data. The main novelty of CrysFieldExplorer is the development of a unique loss function, referred to as the Spectrum-Characteristic Loss ($L_{\text{Spectrum}}$), which is defined based on the characteristic polynomial of the Hamiltonian matrix. Particle Swarm Optimization and the Covariance Matrix Adaptation Evolution Strategy are used to find the minimum of the total loss function. We demonstrate that CrysFieldExplorer can perform direct fitting of CEF parameters to any experimental data such as neutron spectra, susceptibility, and magnetization. CrysFieldExplorer can handle a large number of non-zero CEF parameters and reveal multiple local and global minimum solutions. Detailed crystal field theory, a description of the loss function, and the implementation and limits of the program are discussed within the context of two examples.
Submitted 14 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Mode-locking Theory for Long-Range Interaction in Artificial Neural Networks
Authors:
Xiuxiu Bai,
Shuaishuai Zhao,
Yao Gao,
Zhe Liu
Abstract:
Visual long-range interaction refers to modeling dependencies between distant feature points or blocks within an image, which can significantly enhance the model's robustness. Both CNNs and Transformers can establish long-range interactions through layering and patch calculations. However, the underlying mechanism of long-range interaction in visual space remains unclear. We propose the mode-locking theory as the underlying mechanism, which constrains the phase and wavelength relationship between waves to achieve a mode-locked interference waveform. We verify this theory through simulation experiments and demonstrate the mode-locking pattern in real-world scene models. Our proposed theory of long-range interaction provides a comprehensive understanding of the mechanism behind this phenomenon in artificial neural networks. This theory can inspire the integration of the mode-locking pattern into models to enhance their robustness.
Submitted 9 March, 2023;
originally announced March 2023.
-
Observation of Seasonal Variations of the Flux of High-Energy Atmospheric Neutrinos with IceCube
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise,
C. Bellenghi, et al. (369 additional authors not shown)
Abstract:
Atmospheric muon neutrinos are produced by meson decays in cosmic-ray-induced air showers. The flux depends on meteorological quantities such as the air temperature, which affects the density of air. Competition between decay and re-interaction of those mesons in the first particle production generations gives rise to a higher neutrino flux when the air density in the stratosphere is lower, corresponding to a higher temperature. A measurement of a temperature dependence of the atmospheric $ν_μ$ flux provides a novel method for constraining hadronic interaction models of air showers. It is particularly sensitive to the production of kaons. Studying this temperature dependence for the first time requires a large sample of high-energy neutrinos as well as a detailed understanding of atmospheric properties. We report the significant ($> 10 σ$) observation of a correlation between the rate of more than 260,000 neutrinos, detected by IceCube between 2012 and 2018, and atmospheric temperatures of the stratosphere, measured by the Atmospheric Infrared Sounder (AIRS) instrument aboard NASA's AQUA satellite. For the observed 10% seasonal change of effective atmospheric temperature we measure a 3.5(3)% change in the muon neutrino flux. This observed correlation deviates by about 2-3 standard deviations from the expected correlation of 4.3% as obtained from theoretical predictions under the assumption of various hadronic interaction models.
Submitted 9 May, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
The Solar Upper Transition Region Imager (SUTRI) onboard the SATech-01 satellite
Authors:
Xianyong Bai,
Hui Tian,
Yuanyong Deng,
Zhanshan Wang,
Jianfeng Yang,
Xiaofeng Zhang,
Yonghe Zhang,
Runze Qi,
Nange Wang,
Yang Gao,
Jun Yu,
Chunling He,
Zhengxiang Shen,
Lun Shen,
Song Guo,
Zhenyong Hou,
Kaifan Ji,
Xingzi Bi,
Wei Duan,
Xiao Yang,
Jiaben Lin,
Ziyao Hu,
Qian Song,
Zihao Yang,
Yajie Chen, et al. (34 additional authors not shown)
Abstract:
The Solar Upper Transition Region Imager (SUTRI) onboard the Space Advanced Technology demonstration satellite (SATech-01), which was launched to a sun-synchronous orbit at a height of 500 km in July 2022, aims to test the on-orbit performance of our newly developed Sc-Si multi-layer reflecting mirror and the 2kx2k EUV CMOS imaging camera and to take full-disk solar images at the Ne VII 46.5 nm spectral line with a filter width of 3 nm. SUTRI employs a Ritchey-Chretien optical system with an aperture of 18 cm. The on-orbit observations show that SUTRI images have a field of view of 41.6'x41.6' and a moderate spatial resolution of 8" without an image stabilization system. The normal cadence of SUTRI images is 30 s and the solar observation time is about 16 hours each day because the Earth eclipse time accounts for about 1/3 of SATech-01's orbit period. Approximately 15 GB of data is acquired each day and made available online after processing. SUTRI images are valuable because the Ne VII 46.5 nm line is formed at a temperature regime of 0.5 MK in the solar atmosphere, which has rarely been sampled by existing solar imagers. SUTRI observations will establish connections between structures in the lower solar atmosphere and corona, and advance our understanding of various types of solar activity such as flares, filament eruptions, coronal jets and coronal mass ejections.
Submitted 7 March, 2023;
originally announced March 2023.
-
Constraining High-Energy Neutrino Emission from Supernovae with IceCube
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise,
C. Bellenghi, et al. (364 additional authors not shown)
Abstract:
Core-collapse supernovae are a promising potential high-energy neutrino source class. We test for correlation between seven years of IceCube neutrino data and a catalog containing more than 1000 core-collapse supernovae of types IIn and IIP and a sample of stripped-envelope supernovae. We search both for neutrino emission from individual supernovae, and for combined emission from the whole supernova sample through a stacking analysis. No significant spatial or temporal correlation of neutrinos with the cataloged supernovae was found. The overall deviation of all tested scenarios from the background expectation yields a p-value of 93%, which is fully compatible with background. The derived upper limits on the total energy emitted in neutrinos are $1.7\times 10^{48}$ erg for stripped-envelope supernovae, $2.8\times 10^{48}$ erg for type IIP, and $1.3\times 10^{49}$ erg for type IIn SNe, the latter disfavouring models with optimistic assumptions for neutrino production in interacting supernovae. We conclude that stripped-envelope supernovae and supernovae of type IIn do not contribute more than $14.6\%$ and $33.9\%$ respectively to the diffuse neutrino flux in the energy range of about $10^3-10^5$ GeV, assuming that the neutrino energy spectrum follows a power-law with an index of $-2.5$. Under the same assumption, we can only constrain the contribution of type IIP SNe to no more than $59.9\%$. Thus core-collapse supernovae of type IIn and stripped-envelope supernovae can both be ruled out as the dominant source of the diffuse neutrino flux under the given assumptions.
Submitted 6 March, 2023;
originally announced March 2023.
-
The Present and Future of QCD
Authors:
P. Achenbach,
D. Adhikari,
A. Afanasev,
F. Afzal,
C. A. Aidala,
A. Al-bataineh,
D. K. Almaalol,
M. Amaryan,
D. Androić,
W. R. Armstrong,
M. Arratia,
J. Arrington,
A. Asaturyan,
E. C. Aschenauer,
H. Atac,
H. Avakian,
T. Averett,
C. Ayerbe Gayoso,
X. Bai,
K. N. Barish,
N. Barnea,
G. Basar,
M. Battaglieri,
A. A. Baty,
I. Bautista, et al. (378 additional authors not shown)
Abstract:
This White Paper presents the community inputs and scientific conclusions from the Hot and Cold QCD Town Meeting that took place September 23-25, 2022 at MIT, as part of the Nuclear Science Advisory Committee (NSAC) 2023 Long Range Planning process. A total of 424 physicists registered for the meeting. The meeting highlighted progress in Quantum Chromodynamics (QCD) nuclear physics since the 2015 LRP (LRP15) and identified key questions and plausible paths to obtaining answers to those questions, defining priorities for our research over the coming decade. In defining the priority of outstanding physics opportunities for the future, both prospects for the short (~ 5 years) and longer term (5-10 years and beyond) are identified together with the facilities, personnel and other resources needed to maximize the discovery potential and maintain United States leadership in QCD physics worldwide. This White Paper is organized as follows: In the Executive Summary, we detail the Recommendations and Initiatives that were presented and discussed at the Town Meeting, and their supporting rationales. Section 2 highlights major progress and accomplishments of the past seven years. It is followed, in Section 3, by an overview of the physics opportunities for the immediate future, and in relation with the next QCD frontier: the EIC. Section 4 provides an overview of the physics motivations and goals associated with the EIC. Section 5 is devoted to the workforce development and support of diversity, equity and inclusion. This is followed by a dedicated section on computing in Section 6. Section 7 describes the national need for nuclear data science and the relevance to QCD research.
Submitted 4 March, 2023;
originally announced March 2023.
-
Turning a CLIP Model into a Scene Text Detector
Authors:
Wenwen Yu,
Yuliang Liu,
Wei Hua,
Deqiang Jiang,
Bo Ren,
Xiang Bai
Abstract:
The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown great potential in various downstream tasks by leveraging pretrained vision and language knowledge. Scene text, which contains rich textual and visual information, has an inherent connection with a model like CLIP. Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection. In contrast to these works, this paper proposes a new method, termed TCM, focusing on Turning the CLIP Model directly for text detection without a pretraining process. We demonstrate the advantages of the proposed TCM as follows: (1) The underlying principle of our framework can be applied to improve existing scene text detectors. (2) It facilitates the few-shot training capability of existing methods, e.g., by using 10% of labeled data, we significantly improve the performance of the baseline method by an average of 22% in terms of the F-measure on 4 benchmarks. (3) By turning the CLIP model into existing scene text detection methods, we further achieve promising domain adaptation ability. The code will be publicly released at https://github.com/wenwenyu/TCM.
Submitted 26 March, 2023; v1 submitted 28 February, 2023;
originally announced February 2023.
-
SynGen: A Syntactic Plug-and-play Module for Generative Aspect-based Sentiment Analysis
Authors:
Chengze Yu,
Taiqiang Wu,
Jiayi Li,
Xingyu Bai,
Yujiu Yang
Abstract:
Aspect-based Sentiment Analysis (ABSA) is a sentiment analysis task at a fine-grained level. Recently, generative frameworks have attracted increasing attention in ABSA due to their ability to unify subtasks and their continuity with upstream pre-training tasks. However, these generative models suffer from the neighboring dependency problem, which induces neighboring words to get higher attention. In this paper, we propose SynGen, a plug-and-play syntactic information aware module. As a plug-in module, our SynGen can be easily applied to any generative framework backbone. The key insight of our module is to add syntactic inductive bias to attention assignment and thus direct attention to the correct target words. To the best of our knowledge, we are the first to introduce syntactic information to generative ABSA frameworks. Our module design is based on two main principles: (1) maintaining the structural integrity of backbone PLMs and (2) disentangling the added syntactic information and original semantic information. Empirical results on four popular ABSA datasets demonstrate that the SynGen-enhanced model achieves performance comparable to the state-of-the-art model with a relaxed labeling specification and less training consumption.
Submitted 25 February, 2023;
originally announced February 2023.
-
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Authors:
Mengde Xu,
Zheng Zhang,
Fangyun Wei,
Han Hu,
Xiang Bai
Abstract:
This paper presents a new framework for open-vocabulary semantic segmentation with a pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias, which is applied in the CLIP model to recognize the class of masks. This decoupled design benefits CLIP in recognizing the class of mask proposals. Since the attached side network can reuse CLIP features, it can be very light. In addition, the entire network can be trained end-to-end, allowing the side network to be adapted to the frozen CLIP model, which makes the predicted mask proposals CLIP-aware. Our approach is fast, accurate, and only adds a few additional trainable parameters. We evaluate our approach on multiple semantic segmentation benchmarks. Our method significantly outperforms other counterparts, with up to 18 times fewer trainable parameters and 19 times faster inference speed. We hope our approach will serve as a solid baseline and help ease future research in open-vocabulary semantic segmentation. The code will be available at https://github.com/MendelXu/SAN.
Submitted 22 March, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Solar coronal magnetic field measurements using spectral lines available in Hinode/EIS observations: Strong and weak field techniques and temperature diagnostics
Authors:
Yajie Chen,
Xianyong Bai,
Hui Tian,
Wenxian Li,
Feng Chen,
Zihao Yang,
Yang Yang
Abstract:
Recently, it has been proposed that the magnetic-field-induced transition (MIT) in Fe X can be used to measure coronal magnetic field strengths. Several techniques, the direct line ratio technique and the weak and strong magnetic field techniques, are developed to apply the MIT theory to spectroscopic observations taken by EUV Imaging Spectrometer (EIS) onboard Hinode. However, the suitability of coronal magnetic field measurements based on the weak and strong magnetic field techniques has not been evaluated. Besides, temperature diagnostics is also important for measuring coronal magnetic field based on the MIT theory, but how to determine the accurate formation temperature of the Fe X lines from EIS observations still needs investigation. In this study, we synthesized emissions of several spectral lines from a 3D radiation magnetohydrodynamic model of a solar active region, and then derived magnetic field strengths using different methods. We first compared the magnetic field strengths derived from the weak and strong magnetic field techniques to the values in the model. Our study suggests that both weak and strong magnetic field techniques underestimate the coronal magnetic field strength. Then we developed two methods to calculate the formation temperature of the Fe X lines. One is based on differential emission measure analyses, and the other is deriving temperature from the Fe IX and Fe XI line pairs. However, neither of the two methods can provide temperature determination for accurate coronal magnetic field measurements as those derived from the Fe X 174/175 and 184/345 Å line ratios. More efforts are still needed for accurate coronal magnetic field measurements using EIS observations.
Submitted 21 February, 2023;
originally announced February 2023.
-
Active Learning in Brain Tumor Segmentation with Uncertainty Sampling, Annotation Redundancy Restriction, and Data Initialization
Authors:
Daniel D Kim,
Rajat S Chandra,
Jian Peng,
Jing Wu,
Xue Feng,
Michael Atalay,
Chetan Bettegowda,
Craig Jones,
Haris Sair,
Wei-hua Liao,
Chengzhang Zhu,
Beiji Zou,
Li Yang,
Anahita Fathi Kazerooni,
Ali Nabavizadeh,
Harrison X Bai,
Zhicheng Jiao
Abstract:
Deep learning models have demonstrated great potential in medical 3D imaging, but their development is limited by the expensive, large volume of annotated data required. Active learning (AL) addresses this by training a model on a subset of the most informative data samples without compromising performance. We compared different AL strategies and propose a framework that minimizes the amount of data needed for state-of-the-art performance. 638 multi-institutional brain tumor MRI images were used to train a 3D U-net model and compare AL strategies. We investigated uncertainty sampling, annotation redundancy restriction, and initial dataset selection techniques. Uncertainty estimation techniques including Bayesian estimation with dropout, bootstrapping, and margins sampling were compared to random query. Strategies to avoid annotation redundancy by removing similar images within the to-be-annotated subset were considered as well. We determined the minimum amount of data necessary to achieve similar performance to the model trained on the full dataset (α = 0.1). A variance-based selection strategy using radiomics to identify the initial training dataset is also proposed. Bayesian approximation with dropout at training and testing showed similar results to that of the full data model with less than 20% of the training data (p=0.293) compared to random query achieving similar performance at 56.5% of the training data (p=0.814). Annotation redundancy restriction techniques achieved state-of-the-art performance at approximately 40%-50% of the training data. Radiomics dataset initialization had higher Dice with initial dataset sizes of 20 and 80 images, but improvements were not significant. In conclusion, we investigated various AL strategies with dropout uncertainty estimation achieving state-of-the-art performance with the least annotated data.
Submitted 4 February, 2023;
originally announced February 2023.
-
Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Analysis
Authors:
Yankai Jiang,
Mingze Sun,
Heng Guo,
Xiaoyu Bai,
Ke Yan,
Le Lu,
Minfeng Xu
Abstract:
Self-supervised learning (SSL) has recently achieved promising performance for 3D medical image analysis tasks. Most current methods follow existing SSL paradigms originally designed for photographic or natural images, which cannot explicitly and thoroughly exploit the intrinsically similar anatomical structures across varying medical images. This may in fact degrade the quality of learned deep representations by maximizing the similarity among features containing spatial misalignment information and different anatomical semantics. In this work, we propose a new self-supervised learning framework, namely Alice, that explicitly fulfills Anatomical invariance modeling and semantic alignment via elaborately combining discriminative and generative objectives. Alice introduces a new contrastive learning strategy which encourages the similarity between views that are diversely mined but with consistent high-level semantics, in order to learn invariant anatomical features. Moreover, we design a conditional anatomical feature alignment module to complement corrupted embeddings with globally matched semantics and inter-patch topology information, conditioned by the distribution of local image content, which permits creating better contrastive pairs. Our extensive quantitative experiments on three 3D medical image analysis tasks demonstrate and validate the performance superiority of Alice, surpassing the previous best SSL counterpart methods and showing promising ability for unified representation learning. Code is available at https://github.com/alibaba-damo-academy/alice.
Submitted 17 August, 2023; v1 submitted 11 February, 2023;
originally announced February 2023.
-
Limits on Neutrino Emission from GRB 221009A from MeV to PeV using the IceCube Neutrino Observatory
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
N. Aggarwal,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
G. Anton,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
V. Basu,
R. Bay,
J. J. Beatty,
K. -H. Becker,
J. Becker Tjus,
J. Beise,
et al. (362 additional authors not shown)
Abstract:
Gamma-ray bursts (GRBs) have long been considered a possible source of high-energy neutrinos. While no correlations have yet been detected between high-energy neutrinos and GRBs, the recent observation of GRB 221009A - the brightest GRB observed by Fermi-GBM to date and the first one to be observed above an energy of 10 TeV - provides a unique opportunity to test for hadronic emission. In this paper, we leverage the wide energy range of the IceCube Neutrino Observatory to search for neutrinos from GRB 221009A. We find no significant deviation from background expectation across event samples ranging from MeV to PeV energies, placing stringent upper limits on the neutrino emission from this source.
Submitted 29 March, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Three-dimensional Global Simulations of Type-II Planet-disk Interaction with a Magnetized Disk Wind: I. Magnetic Flux Concentration and Gap Properties
Authors:
Yuhiko Aoyama,
Xuening Bai
Abstract:
Giant planets embedded in protoplanetary disks (PPDs) can create annular density gaps around their orbits in the type-II regime, potentially responsible for the ubiquity of annular substructures observed in PPDs. Despite a substantial body of work studying type-II planet migration and gap properties, these studies have been almost exclusively conducted under the viscous accretion disk framework. However, recent studies have established magnetized disk winds as the primary mechanism driving disk accretion and evolution, which can co-exist with turbulence from the magneto-rotational instability (MRI) in the outer PPDs. We conduct a series of 3D global non-ideal magneto-hydrodynamic (MHD) simulations of type-II planet-disk interaction applicable to the outer PPDs. Our simulations properly resolve the MRI turbulence and accommodate the MHD disk wind. We find that the planet triggers poloidal magnetic flux concentration around its orbit. The concentrated magnetic flux strongly enhances angular momentum removal in the gap, which occurs along the inclined poloidal field through a strong outflow emanating from the disk surface outward of the planet gap. The resulting planet-induced gap shape is more similar to that in an inviscid disk, while being much deeper, which can be understood from a simple inhomogeneous wind torque prescription. The corotation region is characterized by a fast trans-sonic accretion flow that is asymmetric in azimuth about the planet and lacks the horseshoe turns, and the meridional flow is weakened. The torque acting on the planet generally drives inward migration, though the migration rate can be affected by the presence of neighboring gaps through stochastic, planet-free magnetic flux concentration.
Submitted 2 February, 2023;
originally announced February 2023.
-
New Results from HAYSTAC's Phase II Operation with a Squeezed State Receiver
Authors:
HAYSTAC Collaboration,
M. J. Jewell,
A. F. Leder,
K. M. Backes,
Xiran Bai,
K. van Bibber,
B. M. Brubaker,
S. B. Cahn,
A. Droster,
Maryam H. Esmat,
Sumita Ghosh,
Eleanor Graham,
Gene C. Hilton,
H. Jackson,
Claire Laffan,
S. K. Lamoreaux,
K. W. Lehnert,
S. M. Lewis,
M. Malnou,
R. H. Maruyama,
D. A. Palken,
N. M. Rapidis,
E. P. Ruddy,
M. Simanovskaia,
Sukhman Singh,
et al. (4 additional authors not shown)
Abstract:
A search for dark matter axions with masses $>10 μeV/c^{2}$ has been performed using the HAYSTAC experiment's squeezed state receiver to achieve sub-quantum limited noise. This report includes details of the design and operation of the experiment previously used to search for axions in the mass ranges $16.96-17.12$ and $17.14-17.28 μeV/c^{2}$ ($4.100-4.140$ GHz and $4.145-4.178$ GHz) as well as upgrades to facilitate an extended search at higher masses. These upgrades include improvements to the data acquisition routine which have reduced the effective dead time by a factor of 5, allowing for the new region to be scanned $\sim$1.6 times faster with comparable sensitivity. No statistically significant evidence of an axion signal is found in the range $18.44-18.71 μeV/c^{2}$ ($4.459-4.523$ GHz), leading to an aggregate upper limit exclusion at the $90\%$ level on the axion-photon coupling of $2.06\times g_γ^{KSVZ}$.
Submitted 26 January, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.