Search | arXiv e-print repository

Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations

Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, Bingyi Cao, Mário Lipovský, Pelin Dogan-Schönberger, Grzegorz Makosa, Boris Bluntschli, Mojtaba Seyedhosseini, Ondřej Chum, André Araujo

Abstract: Fine-grained and instance-level recognition methods are commonly trained and evaluated on specific domains, in a model per domain scenario. Such an approach, however, is impractical in real large-scale applications. In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. First, we leverage existing domain-specific datasets to carefully construct a new large-scale public benchmark for the evaluation of universal image embeddings, with 241k query images, 1.4M index images and 2.8M training images across 8 different domains and 349k classes. We define suitable metrics, training and evaluation protocols to foster future research in this area. Second, we provide a comprehensive experimental evaluation on the new dataset, demonstrating that existing approaches and simplistic extensions lead to worse performance than an assembly of models trained for each domain separately. Finally, we conducted a public research competition on this topic, leveraging industrial datasets, which attracted the participation of more than 1k teams worldwide. This exercise generated many interesting research ideas and findings which we present in detail. Project webpage: https://cmp.felk.cvut.cz/univ_emb/

Submitted 4 September, 2023; originally announced September 2023.

Comments: ICCV 2023 Accepted

arXiv:2309.01212 [pdf, other]

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement

Authors: Wen Wang, Dongchao Yang, Qichen Ye, Bowen Cao, Yuexian Zou

Abstract: The goal of speech enhancement (SE) is to eliminate the background interference from the noisy speech signal. Generative models such as diffusion models (DM) have been applied to the task of SE because of better generalization in unseen noisy scenes. Technical routes for the DM-based SE methods can be summarized into three types: task-adapted diffusion process formulation, generator-plus-conditioner (GPC) structures and the multi-stage frameworks. We focus on the first two approaches, which are constructed under the GPC architecture and use the task-adapted diffusion process to better deal with the real noise. However, the performance of these SE models is limited by the following issues: (a) Non-Gaussian noise estimation in the task-adapted diffusion process. (b) Conditional domain bias caused by the weak conditioner design in the GPC structure. (c) Large amount of residual noise caused by unreasonable interpolation operations during inference. To solve the above problems, we propose a noise-aware diffusion-based SE model (NADiffuSE) to boost the SE performance, where the noise representation is extracted from the noisy speech signal and introduced as a global conditional information for estimating the non-Gaussian components. Furthermore, the anchor-based inference algorithm is employed to achieve a compromise between the speech distortion and noise residual.
In order to mitigate the performance degradation caused by the conditional domain bias in the GPC framework, we investigate three model variants, all of which can be viewed as multi-stage SE based on the preprocessing networks for Mel spectrograms. Experimental results show that NADiffuSE outperforms other DM-based SE models under the GPC infrastructure. Audio samples are available at: https://square-of-w.github.io/NADiffuSE-demo/.

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.15703 [pdf, other]

Fragment and Integrate Network (FIN): A Novel Spatial-Temporal Modeling Based on Long Sequential Behavior for Online Food Ordering Click-Through Rate Prediction

Authors: Jun Li, Jingjian Wang, Hongwei Wang, Xing Deng, Jielong Chen, Bing Cao, Zekun Wang, Guanjie Xu, Ge Zhang, Feng Shi, Hualei Liu

Abstract: Spatial-temporal information has been proven to be of great significance for click-through rate prediction tasks in online Location-Based Services (LBS), especially in mainstream food ordering platforms such as DoorDash, Uber Eats, Meituan, and Ele.me. Modeling user spatial-temporal preferences with sequential behavior data has become a hot topic in recommendation systems and online advertising. However, most of existing methods either lack the representation of rich spatial-temporal information or only handle user behaviors with limited length, e.g. 100. In this paper, we tackle these problems by designing a new spatial-temporal modeling paradigm named Fragment and Integrate Network (FIN). FIN consists of two networks: (i) Fragment Network (FN) extracts Multiple Sub-Sequences (MSS) from lifelong sequential behavior data, and captures the specific spatial-temporal representation by modeling each MSS respectively. Here both a simplified attention and a complicated attention are adopted to balance the performance gain and resource consumption. (ii) Integrate Network (IN) builds a new integrated sequence by utilizing spatial-temporal interaction on MSS and captures the comprehensive spatial-temporal representation by modeling the integrated sequence with a complicated attention. Both public datasets and production datasets have demonstrated the accuracy and scalability of FIN.
Since 2022, FIN has been fully deployed in the recommendation advertising system of Ele.me, one of the most popular online food ordering platforms in China, obtaining 5.7% improvement on Click-Through Rate (CTR) and 7.3% increase on Revenue Per Mille (RPM).

Submitted 29 August, 2023; originally announced August 2023.

Comments: Accepted by CIKM 2023 Applied Research Paper

arXiv:2308.13057 [pdf, other]

Data-Side Efficiencies for Lightweight Convolutional Neural Networks

Authors: Bryan Bo Cao, Lawrence O'Gorman, Michael Coss, Shubham Jain

Abstract: We examine how the choice of data-side attributes for two important visual tasks of image classification and object detection can aid in the choice or design of lightweight convolutional neural networks. We show by experimentation how four data attributes - number of classes, object color, image resolution, and object scale affect neural network model size and efficiency. Intra- and inter-class similarity metrics, based on metric learning, are defined to guide the evaluation of these attributes toward achieving lightweight models. Evaluations made using these metrics are shown to require 30x less computation than running full inference tests. We provide, as an example, applying the metrics and methods to choose a lightweight model for a robot path planning application and achieve computation reduction of 66% and accuracy gain of 3.5% over the pre-method model.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 10 pages, 5 figures, 6 tables

arXiv:2308.06954 [pdf, other]

Global Features are All You Need for Image Retrieval and Reranking

Authors: Shihao Shao, Kaifeng Chen, Arjun Karpur, Qinghua Cui, Andre Araujo, Bingyi Cao

Abstract: Image retrieval systems conventionally use a two-stage paradigm, leveraging global features for initial retrieval and local features for reranking. However, the scalability of this method is often limited due to the significant storage and computation cost incurred by local feature matching in the reranking stage. In this paper, we present SuperGlobal, a novel approach that exclusively employs global features for both stages, improving efficiency without sacrificing accuracy. SuperGlobal introduces key enhancements to the retrieval system, specifically focusing on the global feature extraction and reranking processes. For extraction, we identify sub-optimal performance when the widely-used ArcFace loss and Generalized Mean (GeM) pooling methods are combined and propose several new modules to improve GeM pooling. In the reranking stage, we introduce a novel method to update the global features of the query and top-ranked images by only considering feature refinement with a small set of images, thus being very compute and memory efficient. Our experiments demonstrate substantial improvements compared to the state of the art in standard benchmarks. Notably, on the Revisited Oxford+1M Hard dataset, our single-stage results improve by 7.1%, while our two-stage gain reaches 3.7% with a strong 64,865x speedup. Our two-stage system surpasses the current single-stage state-of-the-art by 16.3%, offering a scalable, accurate alternative for high-performing image retrieval systems with minimal time overhead. Code: https://github.com/ShihaoShao-GH/SuperGlobal.

Submitted 19 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: ICCV23 camera-ready + appendix
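
Note on GeM pooling: the abstract above names Generalized Mean (GeM) pooling without stating it. As a minimal sketch, assuming the commonly used definition (the p-th root of the mean of p-th powers per channel), the baseline operation can be written as below. This is not the SuperGlobal code and does not include the new modules the paper proposes; the function name `gem_pool` and the parameters `p` and `eps` are illustrative assumptions.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean pooling, one pooled value per channel.

    feature_map: list of channels, each a flat list of activations
    (assumed non-negative, e.g. taken after a ReLU).
    p: pooling exponent; p = 1 is average pooling, large p approaches max pooling.
    eps: lower clamp that keeps the power well defined for zero activations.
    """
    pooled = []
    for channel in feature_map:
        clamped = [max(v, eps) for v in channel]
        mean_pow = sum(v ** p for v in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


# Hypothetical 2-channel feature map with 3 spatial positions per channel.
feature_map = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
print(gem_pool(feature_map, p=1.0))  # behaves like average pooling
print(gem_pool(feature_map, p=3.0))  # weights larger activations more heavily
```

Raising `p` moves the pooled value from the channel mean towards the channel maximum, which is the usual reason GeM is treated as a tunable middle ground between the two.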

arXiv:2308.04792 [pdf, ps, other]

NNPP: A Learning-Based Heuristic Model for Accelerating Optimal Path Planning on Uneven Terrain

Authors: Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie, Baoshi Cao

Abstract: Intelligent autonomous path planning is essential for enhancing the exploration efficiency of mobile robots operating in uneven terrains like planetary surfaces and off-road environments. In this paper, we propose the NNPP model for computing the heuristic region, enabling foundation algorithms like A* to find the optimal path solely within this reduced search space, effectively decreasing the search time. The NNPP model learns semantic information about start and goal locations, as well as map representations, from numerous pre-annotated optimal path demonstrations, and produces a probabilistic distribution over each pixel representing the likelihood of it belonging to an optimal path on the map. More specifically, the paper computes the traversal cost for each grid cell from the slope, roughness and elevation difference obtained from the digital elevation model. Subsequently, the start and goal locations are encoded using a Gaussian distribution and different location encoding parameters are analyzed for their effect on model performance. After training, the NNPP model is able to accelerate path planning on novel maps.

Submitted 20 June, 2024; v1 submitted 9 August, 2023; originally announced August 2023.
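
Note on the location encoding: the abstract above says start and goal locations are encoded using a Gaussian distribution but gives no formula. One plausible reading, sketched here purely as an assumption, is an unnormalised 2D Gaussian heat map centred on the grid cell of interest. The function name `gaussian_location_map` and the parameter `sigma` are hypothetical and are not taken from the paper.

```python
import math


def gaussian_location_map(height, width, row, col, sigma=2.0):
    """Return a height x width grid whose value peaks at 1.0 on (row, col)
    and decays with squared Euclidean distance from that cell.

    sigma is an assumed spread parameter, measured in grid cells.
    """
    return [
        [
            math.exp(-((r - row) ** 2 + (c - col) ** 2) / (2.0 * sigma ** 2))
            for c in range(width)
        ]
        for r in range(height)
    ]


# Hypothetical 5 x 5 map with the start location at the centre cell.
start_map = gaussian_location_map(5, 5, row=2, col=2, sigma=1.0)
for grid_row in start_map:
    print(" ".join(f"{value:.2f}" for value in grid_row))
```

A wider `sigma` spreads the location signal over more cells, which is one way the "location encoding parameters" mentioned in the abstract could affect model performance.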

arXiv:2308.03591 [pdf, other]

On Data-Driven Modeling and Control in Modern Power Grids Stability: Survey and Perspective

Authors: Xun Gong, Xiaozhe Wang, Bo Cao

Abstract: Modern power grids are fast evolving with the increasing volatile renewable generation, distributed energy resources (DERs) and time-varying operating conditions. The DERs include rooftop photovoltaic (PV), small wind turbines, energy storages, flexible loads, electric vehicles (EVs), etc. The grid control is confronted with low inertia, uncertainty and nonlinearity that challenge the operation security, efficacy and efficiency. The ongoing digitization of power grids provides opportunities to address the challenges with data-driven and control. This paper provides a comprehensive review of emerging data-driven dynamical modeling and control methods and their various applications in power grid. Future trends are also discussed based on advances in data-driven control.

Submitted 7 August, 2023; originally announced August 2023.

Comments: To appear in Applied Energy

arXiv:2306.10265 [pdf, other]

NBMOD: Find It and Grasp It in Noisy Background

Authors: Boyuan Cao, Xinyu Zhou, Congmin Guo, Baohua Zhang, Yuchen Liu, Qianqiu Tan

Abstract: Grasping objects is a fundamental yet important capability of robots, and many tasks such as sorting and picking rely on this skill. The prerequisite for stable grasping is the ability to correctly identify suitable grasping positions. However, finding appropriate grasping points is challenging due to the diverse shapes, varying density distributions, and significant differences between the barycenter of various objects. In the past few years, researchers have proposed many methods to address the above-mentioned issues and achieved very good results on publicly available datasets such as the Cornell dataset and the Jacquard dataset. The problem is that the backgrounds of Cornell and Jacquard datasets are relatively simple - typically just a whiteboard, while in real-world operational environments, the background could be complex and noisy. Moreover, in real-world scenarios, robots usually only need to grasp fixed types of objects. To address the aforementioned issues, we proposed a large-scale grasp detection dataset called NBMOD: Noisy Background Multi-Object Dataset for grasp detection, which consists of 31,500 RGB-D images of 20 different types of fruits. Accurate prediction of angles has always been a challenging problem in the detection task of oriented bounding boxes. This paper presents a Rotation Anchor Mechanism (RAM) to address this issue.
Considering the high real-time requirement of robotic systems, we propose a series of lightweight architectures called RA-GraspNet (GraspNet with Rotation Anchor): RARA (network with Rotation Anchor and Region Attention), RAST (network with Rotation Anchor and Semi Transformer), and RAGT (network with Rotation Anchor and Global Transformer) to tackle this problem. Among them, the RAGT-3/3 model achieves an accuracy of 99% on the NBMOD dataset. The NBMOD and our code are available at https://github.com/kmittle/Grasp-Detection-NBMOD.

Submitted 17 June, 2023; originally announced June 2023.

arXiv:2306.08911 [pdf, other]

doi 10.1063/5.0189262

Two-Temperature Principle for Electrothermal Performance Evaluation of GaN HEMTs

Authors: Yang Shen, Bing-Yang Cao

Abstract: We present a comprehensive investigation of self-heating in gallium nitride (GaN) high-electron-mobility transistors (HEMTs) through technology computer-aided design (TCAD) simulations and phonon Monte Carlo (MC) simulations. With microscopic phonon-based electrothermal simulations, we scrutinize both the temperature profiles and electrothermal coupling effect within GaN HEMTs. Two metrics, maximum channel temperature ($T_\text{max}$) and equivalent channel temperature ($T_\text{eq}$), are introduced to measure the reliability and electrical performance degradation of the device, respectively. The influence of bias-dependent heat generation and phonon ballistic transport on the two indicators is thoroughly examined.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.05301 [pdf, other]

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

Authors: Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, Boxi Cao, Le Sun

Abstract: Enabling large language models to utilize real-world tools effectively is crucial for achieving embodied intelligence. Existing approaches to tool learning have either primarily relied on extremely large language models, such as GPT-4, to attain generalized tool-use abilities in a zero-shot manner, or utilized supervised learning to train limited scopes of tools on compact models. However, it remains uncertain whether smaller language models can achieve generalized tool-use abilities without tool-specific training. To address this question, this paper introduces ToolAlpaca, a novel framework designed to automatically generate a diverse tool-use corpus and learn generalized tool-use abilities on compact language models with minimal human intervention. Specifically, ToolAlpaca first automatically creates a highly diversified tool-use corpus by building a multi-agent simulation environment. The corpus contains 3938 tool-use instances from more than 400 real-world tool APIs spanning 50 distinct categories. Subsequently, the constructed corpus is employed to fine-tune compact language models, resulting in two models, namely ToolAlpaca-7B and ToolAlpaca-13B, respectively. Finally, we evaluate the ability of these models to utilize previously unseen tools without specific training. Experimental results demonstrate that ToolAlpaca achieves effective generalized tool-use capabilities comparable to those of extremely large language models like GPT-3.5, demonstrating that learning generalized tool-use ability is feasible for compact language models.

Submitted 7 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.03417 [pdf, other]

Optical pumping of electronic quantum Hall states with vortex light

Authors: Deric Session, Mahmoud Jalali Mehrabad, Nikil Paithankar, Tobias Grass, Christian J. Eckhardt, Bin Cao, Daniel Gustavo Suárez Forero, Kevin Li, Mohammad S. Alam, Kenji Watanabe, Takashi Taniguchi, Glenn S. Solomon, Nathan Schine, Jay Sau, Roman Sordan, Mohammad Hafezi

Abstract: A fundamental requirement for quantum technologies is the ability to coherently control the interaction between electrons and photons. However, in many scenarios involving the interaction between light and matter, the exchange of linear or angular momentum between electrons and photons is not feasible, a condition known as the dipole-approximation limit. An example of a case beyond this limit that has remained experimentally elusive is when the interplay between chiral electrons and vortex light is considered, where the orbital angular momentum of light can be transferred to electrons. Here, we present a novel mechanism for such an orbital angular momentum transfer from optical vortex beams to electronic quantum Hall states. Specifically, we identify a robust contribution to the radial photocurrent, in an annular graphene sample within the quantum Hall regime, that depends on the vorticity of light. This phenomenon can be interpreted as an optical pumping scheme, where the angular momentum of photons is transferred to electrons, generating a radial current, and the current direction is determined by the vorticity of the light. Our findings offer fundamental insights into the optical probing and manipulation of quantum coherence, with wide-ranging implications for advancing quantum coherent optoelectronics.

Submitted 27 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2305.11038 [pdf, other]

Learning In-context Learning for Named Entity Recognition

Authors: Jiawei Chen, Yaojie Lu, Hongyu Lin, Jie Lou, Wei Jia, Dai Dai, Hua Wu, Boxi Cao, Xianpei Han, Le Sun

Abstract: Named entity recognition in real-world applications suffers from the diversity of entity types, the emergence of new entity types, and the lack of high-quality annotations. To address the above problems, this paper proposes an in-context learning-based NER approach, which can effectively inject in-context NER ability into PLMs and recognize entities of novel types on-the-fly using only a few demonstrative instances. Specifically, we model PLMs as a meta-function $\lambda_{\text{instruction, demonstrations, text}}.\,\mathcal{M}$, and a new entity extractor can be implicitly constructed by applying new instruction and demonstrations to PLMs, i.e., $(\lambda.\,\mathcal{M})(\text{instruction, demonstrations}) \to \mathcal{F}$ where $\mathcal{F}$ will be a new entity extractor, i.e., $\mathcal{F}: \text{text} \to \text{entities}$. To inject the above in-context NER ability into PLMs, we propose a meta-function pre-training algorithm, which pre-trains PLMs by comparing the (instruction, demonstration)-initialized extractor with a surrogate golden extractor. Experimental results on 4 few-shot NER datasets show that our method can effectively inject in-context NER ability into PLMs and significantly outperforms the PLMs+fine-tuning counterparts.

Submitted 26 May, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023 Main Conference

arXiv:2305.09144 [pdf, other]

Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

Authors: Boxi Cao, Qiaoyu Tang, Hongyu Lin, Shanshan Jiang, Bin Dong, Xianpei Han, Jiawei Chen, Tianshu Wang, Le Sun

Abstract: Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forgetful contradiction and understand the memory mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.

Submitted 13 March, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: Accepted by LREC-COLING 2024

arXiv:2305.06154 [pdf, other]

Alleviating Over-smoothing for Unsupervised Sentence Representation

Authors: Nuo Chen, Linjun Shou, Ming Gong, Jian Pei, Bowen Cao, Jianhui Chang, Daxin Jiang, Jia Li

Abstract: Currently, learning better unsupervised sentence representations is the pursuit of many natural language processing communities. Lots of approaches based on pre-trained language models (PLMs) and contrastive learning have achieved promising results on this task. Experimentally, we observe that the over-smoothing problem reduces the capacity of these powerful PLMs, leading to sub-optimal sentence representations. In this paper, we present a Simple method named Self-Contrastive Learning (SSCL) to alleviate this issue, which samples negatives from PLMs intermediate layers, improving the quality of the sentence representation. Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting, which can be seen as a plug-and-play contrastive framework for learning unsupervised sentence representation. Extensive results prove that SSCL brings the superior performance improvements of different strong baselines (e.g., BERT and SimCSE) on Semantic Textual Similarity and Transfer datasets. Our codes are available at https://github.com/nuochenpku/SSCL.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 13 pages

Journal ref: ACL 2023

arXiv:2305.05936 [pdf, other]

Multi-hop Commonsense Knowledge Injection Framework for Zero-Shot Commonsense Question Answering

Authors: Xin Guan, Biwei Cao, Qingqing Gao, Zheng Yin, Bo Liu, Jiuxin Cao

Abstract: Commonsense question answering (QA) research requires machines to answer questions based on commonsense knowledge. However, this research requires expensive labor costs to annotate data as the basis of research, and models that rely on fine-tuning paradigms only apply to specific tasks, rather than learn a general commonsense reasoning ability. As a more robust method, zero-shot commonsense question answering shows a good prospect. The current zero-shot framework tries to convert triples in commonsense knowledge graphs (KGs) into QA-form samples as the pre-trained data source to incorporate commonsense knowledge into the model. However, this method ignores the multi-hop relationship in the KG, which is also an important central problem in commonsense reasoning. In this paper, we propose a novel multi-hop commonsense knowledge injection framework. Specifically, it explores multi-hop reasoning paradigm in KGs that conform to linguistic logic, and we further propose two multi-hop QA generation methods based on KGs. Then, we utilize contrastive learning to pre-train the model with the synthetic QA dataset to inject multi-hop commonsense knowledge. Extensive experiments on five commonsense question answering benchmarks demonstrate that our framework achieves state-of-art performance.

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2304.04529 [pdf, other]

FAN: Fatigue-Aware Network for Click-Through Rate Prediction in E-commerce Recommendation

Authors: Ming Li, Naiyin Liu, Xiaofeng Pan, Yang Huang, Ningning Li, Yingmin Su, Chengjun Mao, Bo Cao

Abstract: Since clicks usually contain heavy noise, increasing research efforts have been devoted to modeling implicit negative user behaviors (i.e., non-clicks). However, they either rely on explicit negative user behaviors (e.g., dislikes) or simply treat non-clicks as negative feedback, failing to learn negative user interests comprehensively. In such situations, users may experience fatigue because of seeing too many similar recommendations. In this paper, we propose Fatigue-Aware Network (FAN), a novel CTR model that directly perceives user fatigue from non-clicks. Specifically, we first apply Fourier Transformation to the time series generated from non-clicks, obtaining its frequency spectrum which contains comprehensive information about user fatigue. Then the frequency spectrum is modulated by category information of the target item to model the bias that both the upper bound of fatigue and users' patience is different for different categories. Moreover, a gating network is adopted to model the confidence of user fatigue and an auxiliary task is designed to guide the learning of user fatigue, so we can obtain a well-learned fatigue representation and combine it with user interests for the final CTR prediction. Experimental results on real-world datasets validate the superiority of FAN and online A/B tests also show FAN outperforms representative CTR models significantly.

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2303.18049 [pdf, other]

No Place to Hide: Dual Deep Interaction Channel Network for Fake News Detection based on Data Augmentation

Authors: Biwei Cao, Lulu Hua, Jiuxin Cao, Jie Gui, Bo Liu, James Tin-Yau Kwok

Abstract: Online Social Network (OSN) has become a hotbed of fake news due to the low cost of information dissemination. Although the existing methods have made many attempts in news content and propagation structure, the detection of fake news is still facing two challenges: one is how to mine the unique key features and evolution patterns, and the other is how to tackle the problem of small samples to build the high-performance model. Different from popular methods which take full advantage of the propagation topology structure, in this paper, we propose a novel framework for fake news detection from perspectives of semantic, emotion and data enhancement, which excavates the emotional evolution patterns of news participants during the propagation process, and a dual deep interaction channel network of semantic and emotion is designed to obtain a more comprehensive and fine-grained news representation with the consideration of comments. Meanwhile, the framework introduces a data enhancement module to obtain more labeled data with high quality based on confidence which further improves the performance of the classification model. Experiments show that the proposed approach outperforms the state-of-the-art methods.

Submitted 31 March, 2023; originally announced March 2023.

arXiv:2303.07616 [pdf, other]

doi 10.1007/s11633-023-1416-x

The Life Cycle of Knowledge in Big Language Models: A Survey

Authors: Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun

Abstract: Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has raised significant attention about how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous amount of related studies, there still lacks a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life circle of knowledge in PLMs into five critical periods, and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.

Submitted 13 March, 2023; originally announced March 2023.

Comments: paperlist: https://github.com/c-box/KnowledgeLifecycle

Journal ref: Machine Intelligence Research. vol. 21, no. 2, pp. 217-238, 2024

arXiv:2302.14717 [pdf, other]

doi 10.1145/3544548.3581405

Bridging the Generational Gap: Exploring How Virtual Reality Supports Remote Communication Between Grandparents and Grandchildren

Authors: Xiaoying Wei, Yizheng Gu, Emily Kuang, Xian Wang, Beiyan Cao, Xiaofu Jin, Mingming Fan

Abstract: When living apart, grandparents and grandchildren often use audio-visual communication approaches to stay connected. However, these approaches seldom provide sufficient companionship and intimacy due to a lack of co-presence and spatial interaction, which can be fulfilled by immersive virtual reality (VR). To understand how grandparents and grandchildren might leverage VR to facilitate their remote communication and better inform future design, we conducted a user-centered participatory design study with twelve pairs of grandparents and grandchildren. Results show that VR affords casual and equal communication by reducing the generational gap, and promotes conversation by offering shared activities as bridges for connection. Participants preferred resemblant appearances on avatars for conveying well-being but created ideal selves for gaining playfulness. Based on the results, we contribute eight design implications that inform future VR-based grandparent-grandchild communications.

Submitted 28 February, 2023; originally announced February 2023.

Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

arXiv:2302.14682 [pdf, other]

doi 10.1145/3544548.3581053

Sparkling Silence: Practices and Challenges of Livestreaming Among Deaf or Hard of Hearing Streamers

Authors: Beiyan Cao, Changyang He, Muzhi Zhou, Mingming Fan

Abstract: Understanding livestream platforms' accessibility challenges for minority groups, such as people with disabilities, is critical to increasing the diversity and inclusion of those platforms. While prior work investigated the experiences of streamers with vision or motor loss, little is known about the experiences of deaf or hard of hearing (DHH) streamers who must work with livestreaming platforms that heavily depend on audio. We conducted semi-structured interviews with DHH streamers to learn why they livestream, how they navigate livestream platforms and related challenges. Our findings revealed their desire to break the stereotypes towards the DHH groups via livestream and the intense interplay between interaction methods, such as sign language, texts, lip language, background music, and viewer characteristics. Major accessibility challenges include the lack of real-time captioning, the small sign language reading window, and misinterpretation of sign language. We present design considerations for improving the accessibility of the livestream platforms.

Submitted 28 February, 2023; originally announced February 2023.

Comments: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

arXiv:2302.13975 [pdf]

Observation of ballistic-diffusive thermal transport in GaN transistors using thermoreflectance thermal imaging

Authors: Zhi-Ke Liu, Yang Shen, Han-Ling Li, Bing-Yang Cao

Abstract: To develop effective thermal management strategies for GaN transistors, it is essential to accurately predict the device junction temperature. Since the width of the heat generation in the devices is comparable to phonon mean free paths of GaN, phonon ballistic transport exists and can significantly affect the heat transport process, which necessitates a thorough understanding of the influence of the phonon ballistic effects in GaN transistors. In this paper, the ballistic-diffusive phonon transport in GaN-on-SiC devices is examined by measuring the hotspot temperature using the thermoreflectance thermal imaging TTI combined with the hybrid phonon Monte Carlo-diffusion simulations. A series of Au heaters are fabricated on the top of the GaN layer to quantitatively mimic the different heat source distributions during device operation. The experimental and simulation results show a good consistency and both indicate that the phonon ballistic effects can significantly increase the hotspot temperature. With the size of the heat source decreasing, the errors of Fourier's law-based predictions increase, which emphasizes the necessity to carefully consider the phonon ballistic transport in device thermal simulations.

Submitted 16 February, 2023; originally announced February 2023.

arXiv:2302.11814 [pdf, other]

FTM: A Frame-level Timeline Modeling Method for Temporal Graph Representation Learning

Authors: Bowen Cao, Qichen Ye, Weiyuan Xu, Yuexian Zou

Abstract: Learning representations for graph-structured data is essential for graph analytical tasks. While remarkable progress has been made on static graphs, researches on temporal graphs are still in its beginning stage. The bottleneck of the temporal graph representation learning approach is the neighborhood aggregation strategy, based on which graph attributes share and gather information explicitly. Existing neighborhood aggregation strategies fail to capture either the short-term features or the long-term features of temporal graph attributes, leading to unsatisfactory model performance and even poor robustness and domain generality of the representation learning method. To address this problem, we propose a Frame-level Timeline Modeling (FTM) method that helps to capture both short-term and long-term features and thus learns more informative representations on temporal graphs. In particular, we present a novel link-based framing technique to preserve the short-term features and then incorporate a timeline aggregator module to capture the intrinsic dynamics of graph evolution as long-term features. Our method can be easily assembled with most temporal GNNs. Extensive experiments on common datasets show that our method brings great improvements to the capability, robustness, and domain generality of backbone methods in downstream tasks. Our code can be found at https://github.com/yeeeqichen/FTM.

Submitted 15 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted in AAAI 2023, oral

arXiv:2302.11799 [pdf, other]

FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering

Authors: Qichen Ye, Bowen Cao, Nuo Chen, Weiyuan Xu, Yuexian Zou

Abstract: Knowledge-aware question answering (KAQA) requires the model to answer questions over a knowledge base, which is essential for both open-domain QA and domain-specific QA, especially when language models alone cannot provide all the knowledge needed. Despite the promising result of recent KAQA systems which tend to integrate linguistic knowledge from pre-trained language models (PLM) and factual knowledge from knowledge graphs (KG) to answer complex questions, a bottleneck exists in effectively fusing the representations from PLMs and KGs because of (i) the semantic and distributional gaps between them, and (ii) the difficulties in joint reasoning over the provided knowledge from both modalities. To address the above two problems, we propose a Fine-grained Two-stage training framework (FiTs) to boost the KAQA system performance: The first stage aims at aligning representations from the PLM and the KG, thus bridging the modality gaps between them, named knowledge adaptive post-training. The second stage, called knowledge-aware fine-tuning, aims to improve the model's joint reasoning ability based on the aligned representations. In detail, we fine-tune the post-trained model via two auxiliary self-supervised tasks in addition to the QA supervision. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMILE) domains.

Submitted 15 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted in AAAI 2023, oral

arXiv:2302.05596 [pdf]

doi 10.1103/PhysRevB.107.224414

Multi-k magnetic structure and large anomalous Hall effect in candidate magnetic Weyl semimetal NdAlGe

Authors: C. Dhital, R. L. Dally, R. Ruvalcaba, R. Gonzalez-Hernandez, J. Guerrero-Sanchez, H. B. Cao, Q. Zhang, W. Tian, Y. Wu, M. D. Frontzek, S. K. Karna, A. Meads, B. Wilson, R. Chapai, D. Graf, J. Bacsa, R. Jin, J. F. DiTusa

Abstract: The magnetic structure, magnetoresistance, and Hall effect of non-centrosymmetric magnetic semimetal NdAlGe are investigated revealing an unusual magnetic state and anomalous transport properties that are associated with the electronic structure of this non-centrosymmetric compound. The magnetization and magnetoresistance measurements are both highly anisotropic and indicate an Ising-like magnetic system. The magnetic structure is complex in that it involves three magnetic ordering vectors including an incommensurate spin density wave and commensurate ferrimagnetic state in zero field. We have discovered a large anomalous Hall conductivity that reaches 430 $Ω^{-1}$cm$^{-1}$, implying that it originates from an intrinsic Berry curvature effect stemming from Weyl nodes found in the electronic structure. These electronic structure calculations indicate the presence of nested Fermi surface pockets with nesting wave vectors similar to the measured magnetic ordering wavevector and the presence of Weyl nodes in proximity to the Fermi surface. We associate the incommensurate magnetic structure with the large anomalous Hall response to be the result of the combination of Fermi surface nesting and the Berry curvature associated with Weyl nodes.

Submitted 29 June, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: 46 pages, 12 figures

Journal ref: PRB 2023

arXiv:2302.03223 [pdf, other]

A Tightly Coupled Bi-Level Coordination Framework for CAVs at Road Intersections

Authors: Donglin Li, Tingting Zhang, Jiping Luo, Tianhao Liang, Bin Cao, Xuanli Wu, Qinyu Zhang

Abstract: Since the traffic administration at road intersections determines the capacity bottleneck of modern transportation systems, intelligent cooperative coordination for connected autonomous vehicles (CAVs) has shown to be an effective solution. In this paper, we try to formulate a Bi-Level CAV intersection coordination framework, where coordinators from High and Low levels are tightly coupled. In the High-Level coordinator where vehicles from multiple roads are involved, we take various metrics including throughput, safety, fairness and comfort into consideration. Motivated by the time consuming space-time resource allocation framework in [1], we try to give a low complexity solution by transforming the complicated original problem into a sequential linear programming one. Based on the "feasible tunnels" (FT) generated from the High-Level coordinator, we then propose a rapid gradient-based trajectory optimization strategy in the Low-Level planner, to effectively avoid collisions beyond High-level considerations, such as the pedestrian or bicycles. Simulation results and laboratory experiments show that our proposed method outperforms existing strategies. Moreover, the most impressive advantage is that the proposed strategy can plan vehicle trajectory in milliseconds, which is promising in realworld deployments. A detailed description include the coordination framework and experiment demo could be found at the supplement materials, or online at https://youtu.be/MuhjhKfNIOg.

Submitted 13 November, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2302.01392 [pdf, other]

Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion

Authors: Yiming Sun, Bing Cao, Pengfei Zhu, Qinghua Hu

Abstract: Infrared and visible image fusion aims to integrate comprehensive information from multiple sources to achieve superior performances on various practical tasks, such as detection, over that of a single modality. However, most existing methods directly combined the texture details and object contrast of different modalities, ignoring the dynamic changes in reality, which diminishes the visible texture in good lighting conditions and the infrared contrast in low lighting conditions. To fill this gap, we propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts, termed MoE-Fusion, to dynamically extract effective and comprehensive information from the respective modalities. Our model consists of a Mixture of Local Experts (MoLE) and a Mixture of Global Experts (MoGE) guided by a multi-modal gate. The MoLE performs specialized learning of multi-modal local features, prompting the fused images to retain the local information in a sample-adaptive manner, while the MoGE focuses on the global information that complements the fused image with overall texture detail and contrast. Extensive experiments show that our MoE-Fusion outperforms state-of-the-art methods in preserving multi-modal image texture and contrast through the local-to-global dynamic learning paradigm, and also achieves superior performance on detection tasks. Our code will be available: https://github.com/SunYM2020/MoE-Fusion.

Submitted 23 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2212.06458 [pdf, other]

HS-Diffusion: Semantic-Mixing Diffusion for Head Swapping

Authors: Qinghe Wang, Lijie Liu, Miao Hua, Pengfei Zhu, Wangmeng Zuo, Qinghua Hu, Huchuan Lu, Bing Cao

Abstract: Image-based head swapping task aims to stitch a source head to another source body flawlessly. This seldom-studied task faces two major challenges: 1) Preserving the head and body from various sources while generating a seamless transition region. 2) No paired head swapping dataset and benchmark so far. In this paper, we propose a semantic-mixing diffusion model for head swapping (HS-Diffusion) which consists of a latent diffusion model (LDM) and a semantic layout generator. We blend the semantic layouts of source head and source body, and then inpaint the transition region by the semantic layout generator, achieving a coarse-grained head swapping. Semantic-mixing LDM can further implement a fine-grained head swapping with the inpainted layout as condition by a progressive fusion process, while preserving head and body with high-quality reconstruction. To this end, we propose a semantic calibration strategy for natural inpainting and a neck alignment for geometric realism. Importantly, we construct a new image-based head swapping benchmark and design two tailor-designed metrics (Mask-FID and Focal-FID). Extensive experiments demonstrate the superiority of our framework. The code will be available: https://github.com/qinghew/HS-Diffusion.

Submitted 3 August, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.05439 [pdf, other]

doi 10.1016/j.buildenv.2022.109950

Personalized local heating neutralizing individual, spatial and temporal thermo-physiological variances in extreme cold environments

Authors: Yi Ju, Xinyuan Ju, Hui Zhang, Bin Cao, Bin Liu, Yingxin Zhu

Abstract: In this paper, we investigate the feasibility, robustness and optimization of introducing personal comfort systems (PCS), apparatuses that promises in energy saving and comfort improvement, into a broader range of environments. We report a series of laboratory experiments systematically examining the effect of personalized heating in neutralizing individual, spatial and temporal variations of thermal demands. The experiments were conducted in an artificial climate chamber at -15 degC in order to simulate extreme cold environments. We developed a heating garment with 20 pieces of 20 * 20 cm2 heating cloth (grouped into 9 regions) comprehensively covering human body. Surface temperatures of the garment can be controlled independently, quickly (within 20 seconds), precisely (within 1 degC) and easily (through a tablet) up to 45 degC. Participants were instructed to adjust surface temperatures of each segment to their preferences, with their physiological, psychological and adjustment data collected. We found that active heating could significantly and stably improve thermal satisfaction. The overall TSV and TCV were improved 1.50 and 1.53 during the self-adjustment phase. Preferred heating surface temperatures for different segments varied widely. Further, even for the same segment, individual differences among participants were considerable. Such variances were observed through local heating powers, while unnoticeable among thermal perception votes. In other words, all these various differences could be neutralized given the flexibility in personalized adjustments. Our research reaffirms the paradigm of "adaptive thermal comfort" and will promote innovations on human-centric design for more efficient PCSs.

Submitted 27 December, 2022; v1 submitted 11 December, 2022; originally announced December 2022.

Journal ref: Building and Environment, 109950 (2022)

arXiv:2211.14238 [pdf, other]

Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time

Authors: Huaxiu Yao, Caroline Choi, Bochuan Cao, Yoonho Lee, Pang Wei Koh, Chelsea Finn

Abstract: Distribution shift occurs when the test distribution differs from the training distribution, and it can considerably degrade performance of machine learning models deployed in the real world. Temporal shifts -- distribution shifts arising from the passage of time -- often occur gradually and have the additional structure of timestamp metadata. By leveraging timestamp metadata, models can potentially learn from trends in past distribution shifts and extrapolate into the future. While recent works have studied distribution shifts, temporal shifts remain underexplored. To address this gap, we curate Wild-Time, a benchmark of 5 datasets that reflect temporal distribution shifts arising in a variety of real-world applications, including patient prognosis and news classification. On these datasets, we systematically benchmark 13 prior approaches, including methods in domain generalization, continual learning, self-supervised learning, and ensemble learning. We use two evaluation strategies: evaluation with a fixed time split (Eval-Fix) and evaluation with a data stream (Eval-Stream). Eval-Fix, our primary evaluation strategy, aims to provide a simple evaluation protocol, while Eval-Stream is more realistic for certain real-world applications. Under both evaluation strategies, we observe an average performance drop of 20% from in-distribution to out-of-distribution data. Existing methods are unable to close this gap. Code is available at https://wild-time.github.io/.
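The idea of evaluating with a fixed time split can be sketched as follows. This is a simplified illustration, not the benchmark's actual protocol or API: the function `eval_fix`, the record layout, and the toy data are assumptions, and the sketch takes an already-fitted predictor rather than training on the earlier period and holding out in-distribution test data.

```python
def eval_fix(records, split_time, predict):
    """Compare accuracy at or before split_time with accuracy after it.

    records: list of (timestamp, features, label) tuples (hypothetical layout).
    predict: callable mapping features to a predicted label.
    Returns (in_distribution_accuracy, out_of_distribution_accuracy).
    """
    def accuracy(subset):
        if not subset:
            return None  # no examples on this side of the split
        correct = sum(1 for _, x, y in subset if predict(x) == y)
        return correct / len(subset)

    in_dist = [r for r in records if r[0] <= split_time]
    out_dist = [r for r in records if r[0] > split_time]
    return accuracy(in_dist), accuracy(out_dist)


# Toy data in which the labeling rule changes after 2015.
toy = [(2012, 1, True), (2013, -1, False), (2016, 1, False), (2017, -2, False)]
id_acc, ood_acc = eval_fix(toy, split_time=2015, predict=lambda x: x > 0)
print(id_acc, ood_acc)
```

The difference between the two returned accuracies is the kind of in-distribution to out-of-distribution gap the abstract reports on its own datasets.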

Submitted 15 January, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: Accepted by NeurIPS 2022 Track on Datasets and Benchmarks; v2: fixed some issues in FMoW and change the name from "FMoW" to "FMoW-Time"

arXiv:2211.12377 [pdf, other]

Direct tests of T, CP, CPT symmetries in transitions of neutral K mesons with the KLOE experiment

Authors: D. Babusci, M. Berłowski, C. Bloise, F. Bossi, P. Branchini, B. Cao, F. Ceradini, P. Ciambrone, F. Curciarello, E. Czerwiński, G. D'Agostini, R. D'Amico, E. Danè, V. De Leo, E. De Lucia, A. De Santis, P. De Simone, A. Di Domenico, E. Diociaiuti, D. Domenici, A. D'Uffizi, G. Fantini, A. Gajos, S. Gamrat, P. Gauzzi , et al. (18 additional authors not shown)

Abstract: Tests of the T, CP and CPT symmetries in the neutral kaon system are performed by the direct comparison of the probabilities of a kaon transition process to its symmetry-conjugate. The exchange of in and out states required for a genuine test involving an anti-unitary transformation implied by time-reversal is implemented exploiting the entanglement of $K^0\bar{K}{}^0$ pairs produced at a $φ$-factory. A data sample collected by the KLOE experiment at DA$Φ$NE corresponding to an integrated luminosity of about 1.7 fb$^{-1}$ is analysed to study the $Δ$t distributions of the $φ\to K_{S}K_{L}\to π^+π^- \: π^{\pm}e^{\mp}ν$ and $φ\to K_{S}K_{L}\to π^{\pm}e^{\mp}ν\: 3π^0$ processes, with $Δ$t the difference of the kaon decay times. A comparison of the measured $Δ$t distributions in the asymptotic region $Δt \gg τ_{S}$ allows to test for the first time T and CPT symmetries in kaon transitions with a precision of few percent, and to observe CP violation with this novel method.

Submitted 19 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: 26 pages, 10 figures. Minor style corrections applied before journal submission. Submitted to Physics Letters B

arXiv:2211.08736 [pdf, other]

doi 10.1109/TMM.2022.3222118

AlignVE: Visual Entailment Recognition Based on Alignment Relations

Authors: Biwei Cao, Jiuxin Cao, Jie Gui, Jiayun Shen, Bo Liu, Lei He, Yuan Yan Tang, James Tin-Yau Kwok

Abstract: Visual entailment (VE) is to recognize whether the semantics of a hypothesis text can be inferred from the given premise image, which is one special task among recent emerged vision and language understanding tasks. Currently, most of the existing VE approaches are derived from the methods of visual question answering. They recognize visual entailment by quantifying the similarity between the hypothesis and premise in the content semantic features from multi modalities. Such approaches, however, ignore the VE's unique nature of relation inference between the premise and hypothesis. Therefore, in this paper, a new architecture called AlignVE is proposed to solve the visual entailment problem with a relation interaction method. It models the relation between the premise and hypothesis as an alignment matrix. Then it introduces a pooling operation to get feature vectors with a fixed size. Finally, it goes through the fully-connected layer and normalization layer to complete the classification. Experiments show that our alignment-based architecture reaches 72.45% accuracy on SNLI-VE dataset, outperforming previous content-based models under the same settings.

Submitted 16 November, 2022; originally announced November 2022.

Comments: This paper is accepted for publication as a REGULAR paper in the IEEE Transactions on Multimedia

arXiv:2210.05513 [pdf, other]

ViFiCon: Vision and Wireless Association Via Self-Supervised Contrastive Learning

Authors: Nicholas Meegan, Hansi Liu, Bryan Cao, Abrar Alali, Kristin Dana, Marco Gruteser, Shubham Jain, Ashwin Ashok

Abstract: We introduce ViFiCon, a self-supervised contrastive learning scheme which uses synchronized information across vision and wireless modalities to perform cross-modal association. Specifically, the system uses pedestrian data collected from RGB-D camera footage as well as WiFi Fine Time Measurements (FTM) from a user's smartphone device. We represent the temporal sequence by stacking multi-person depth data spatially within a banded image. Depth data from RGB-D (vision domain) is inherently linked with an observable pedestrian, but FTM data (wireless domain) is associated only to a smartphone on the network. To formulate the cross-modal association problem as self-supervised, the network learns a scene-wide synchronization of the two modalities as a pretext task, and then uses that learned representation for the downstream task of associating individual bounding boxes to specific smartphones, i.e. associating vision and wireless information. We use a pre-trained region proposal model on the camera footage and then feed the extrapolated bounding box information into a dual-branch convolutional neural network along with the FTM data. We show that compared to fully supervised SoTA models, ViFiCon achieves high performance vision-to-wireless association, finding which bounding box corresponds to which smartphone device, without hand-labeled association examples for training data.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2210.02046 [pdf, other]

doi 10.1109/ROBIO55434.2022.10012007

Prototype Design and Efficiency Analysis of a Novel Robot Drive Based on 3K-H-V Topology

Authors: Le Qi, Dapeng Yang, Baoshi Cao, Zhiqi Li, Yikun Gu, Zongwu Xie, Hong Liu

Abstract: Robot actuators directly affect the performance of robots, and robot drives directly affect the performance of robot actuators. With the development of robotics, robots have put higher requirements on robot drives, such as high stiffness, high accuracy, high loading, high efficiency, low backlash, compact size, and hollow structure. In order to meet the demand development of robot actuators, this research base proposes a new robot drive based on 3K-H-V topology using involute and cycloidal gear shapes, planetary cycloidal drive, from the perspective of drive topology and through the design idea of decoupling. In this study, the reduction ratio and the efficiency model of the 3K-H-V topology were analyzed, and a prototype planetary cycloidal actuator was designed. The feasibility of the drive is initially verified by experimentally concluding that the PCA has a hollow structure, compact size, and high torque density (69 kg/Nm).

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2209.10935 [pdf, ps, other]

Learning swimming via deep reinforcement learning

Authors: ** Zhang, Lei Zhou, Bochao Cao

Abstract: For decades, people have been seeking for fishlike flapping motions that can realize underwater propulsion with low energy cost. Complexity of the nonstationary flow field around the flapping body makes this problem very difficult. In earlier studies, motion patterns are usually prescribed as certain periodic functions which constrains the following optimization process in a small subdomain of the whole motion space. In this work, to avoid this motion constraint, a variational autoencoder (VAE) is designed to compress various flapping motions into a simple action vector. Then we let a flapping airfoil continuously interact with water tunnel environment and adjust its action accordingly through a reinforcement learning (RL) framework. By this automatic close-looped experiment, we obtain several motion patterns that can result in high hydrodynamic efficiency comparing to pure harmonic motions with the same thrust level. And we find that, after numerous trials and errors, RL trainings in current experiment always converge to motion patterns that are close to harmonic motions. In other words, current work proves that harmonic motion with appropriate amplitude and frequency is always an optimal choice for efficient underwater propulsion. Furthermore, the RL framework proposed here can be also extended to the study of other complex swimming problems, which might pave the way for the creation of a robotic fish that can swim like a real fish.

Submitted 27 June, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: We found some errors in the experimental data

arXiv:2209.02363 [pdf]

doi 10.1063/5.0129290

Symmetry-enforced planar nodal chain phonons in non-symmorphic materials

Authors: Hong-Ao Yang, Hao-Yu Wei, Bing-Yang Cao

Abstract: Topological semimetal states which are constrained by symmetries and give birth to innovative excitations are the frontiers of topological quantum matter. Nodal chains in which two nodal rings connect at one point were first discovered in non-symmorphic electronic systems and then generalized to symmorphic phononic systems. In this work, we identify a new class of planar nodal chains in non-symmorphic phononic systems, where the connecting rings lie in the same plane. The constituting nodal rings are protected by mirror symmetry, their intersection is guaranteed by the combination of time-reversal and non-symmorphic two-fold screw symmetry. In addition, the connecting points are four-fold degenerate while those in previous works are two-fold degenerate. We searched all 230 space groups and found 8 space groups that can host the proposed planar nodal chain phonons. Taking wurtzite GaN (space group No.186) as an example, the planar nodal chain is confirmed by first-principles calculations. The planar nodal chains result in two distinct classes of drumhead surface. The first category lies on the [10(-1)0] surface Brillouin zone and the second lies on the [0001] surface Brillouin zone. Our finding reveals a class of planar nodal chains in non-symmorphic phononic systems, expands the catalog of topological nodal chains, and enriches the family of topological surface states.

Submitted 18 August, 2022; originally announced September 2022.

arXiv:2208.08467 [pdf]

doi 10.1063/5.0120284

Three-Sensor 3ω-2ω Method for the Simultaneous Measurement of Thermal Conductivity and Thermal Boundary Resistance in Film-on-Substrate Heterostructures

Authors: Guang Yang, Bing-Yang Cao

Abstract: Solid heterostructures composed of substrates and epitaxial films are extensively used in advanced technologies, and their thermophysical properties fundamentally determine the performance, efficiency, reliability, and lifetime of the corresponding devices. However, an experimental method that is truly appropriate for the thermophysical property measurement of solid heterostructures is still lacking. To this end, a three-sensor 3ω-2ω method is proposed, which can simultaneously measure the thermal conductivities of the film and the substrate, along with the film-substrate thermal boundary resistance (TBR) in a single solid heterostructure without any reference samples, showing broad applicability for miscellaneous heterostructures with film thickness ranging from 100 nm to 10 μm. In this method, three parallel metal sensors with unequal width and spacing are fabricated on the sample surface, in which the two outer sensors are used as heaters, and the middle sensor is used as a detector. The respective 3ω signals of the two heaters and 2ω signal of the detector are measured, and then the thermophysical properties of the sample are fitted within 3D finite element simulations. To verify this method, two typical wide bandgap semiconductor heterojunctions, i.e., GaN on SiC (#SiC) and GaN on Si (#Si) with ~2.3 μm GaN epilayers, are measured. The thermal conductivity of the GaN film, the thermal conductivities of the SiC and Si substrates, and the GaN/substrate TBRs are derived, exhibiting good agreement with the reported values in the literature.
The proposed method will provide a comprehensive solution for the thermophysical property measurements of various solid heterostructures.

Submitted 11 August, 2022; originally announced August 2022.

arXiv:2208.04872 [pdf, other]

Measurement of the $K_S \to πe ν$ branching fraction with the KLOE experiment

Authors: D. Babusci, M. Berlowski, C. Bloise, F. Bossi, P. Branchini, B. Cao, F. Ceradini, P. Ciambrone, F. Curciarello, E. Czerwiński, G. D'Agostini, R. D'Amico, E. Danè, V. De Leo, E. De Lucia, A. De Santis, P. De Simone, A. Di Cicco, A. Di Domenico, E. Diociaiuti, D. Domenici, A. D'Uffizi, G. Fantini, A. Gajos, S. Gamrat , et al. (18 additional authors not shown)

Abstract: The branching fraction for the decay $K_S \to πe ν$ has been measured with a sample of 300 million $K_S$ mesons produced in $φ\to K_L K_S$ decays recorded by the KLOE experiment at the DA$Φ$NE $e^+e^-$ collider. Signal decays are selected by a boosted decision tree built with kinematic variables and time-of-flight measurements. Data control samples of $K_L \to πe ν$ decays are used to evaluate signal selection efficiencies. A fit to the reconstructed electron mass distribution finds 49647$\pm$316 signal events. Normalising to the $K_S \to π^+π^-$ decay events the result for the branching fraction is $\mathcal{B}(K_S \to πe ν) = (7.211 \pm 0.046_{\rm stat} \pm 0.052_{\rm syst}) \times10^{-4}$. The combination with our previous measurement gives $\mathcal{B}(K_S \to πe ν) = (7.153 \pm 0.037_{\rm stat} \pm 0.043_{\rm syst}) \times10^{-4}$. From this value we derive $f_+(0)|V_{us}| = 0.2170 \pm 0.009$.

Submitted 24 January, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:1912.05990

arXiv:2206.01326 [pdf, other]

Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information

Authors: Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand

Abstract: There has been increasing awareness of ethical issues in machine learning, and fairness has become an important research topic. Most fairness efforts in computer vision have been focused on human sensing applications and preventing discrimination by people's physical attributes such as race, skin color or age by increasing visual representation for particular demographic groups. We argue that ML fairness efforts should extend to object recognition as well. Buildings, artwork, food and clothing are examples of the objects that define human culture. Representing these objects fairly in machine learning datasets will lead to models that are less biased towards a particular culture and more inclusive of different traditions and values. There exist many research datasets for object recognition, but they have not carefully considered which classes should be included, or how much training data should be collected per class. To address this, we propose a simple and general approach, based on crowdsourcing the demographic composition of the contributors: we define fair relevance scores, estimate them, and assign them to each class. We showcase its application to the landmark recognition domain, presenting a detailed analysis and the final fairer landmark rankings. We present analysis which leads to a much fairer coverage of the world compared to existing datasets. The evaluation dataset was used for the 2021 Google Landmark Challenges, which was the first of a kind with an emphasis on fairness in generic object recognition.

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2205.07731 [pdf, other]

doi 10.1016/j.cma.2022.115852

Transfer learning based physics-informed neural networks for solving inverse problems in engineering structures under different loading scenarios

Authors: Chen Xu, Ba Trung Cao, Yong Yuan, Günther Meschke

Abstract: Recently, a class of machine learning methods called physics-informed neural networks (PINNs) has been proposed and gained prevalence in solving various scientific computing problems. This approach enables the solution of partial differential equations (PDEs) via embedding physical laws into the loss function. Many inverse problems can be tackled by simply combining the data from real life scenarios with existing PINN algorithms. In this paper, we present a multi-task learning method using uncertainty weighting to improve the training efficiency and accuracy of PINNs for inverse problems in linear elasticity and hyperelasticity. Furthermore, we demonstrate an application of PINNs to a practical inverse problem in structural analysis: prediction of external loads of diverse engineering structures based on limited displacement monitoring points. To this end, we first determine a simplified loading scenario at the offline stage. By setting unknown boundary conditions as learnable parameters, PINNs can predict the external loads with the support of measured data. When it comes to the online stage in real engineering projects, transfer learning is employed to fine-tune the pre-trained model from offline stage. Our results show that, even with noisy gappy data, satisfactory results can still be obtained from the PINN model due to the dual regularization of physics laws and prior knowledge, which exhibits better robustness compared to traditional analysis methods.
Our approach is capable of bridging the gap between various structures with geometric scaling and under different loading scenarios, and the convergence of training is also greatly accelerated through not only the layer freezing but also the multi-task weight inheritance from pre-trained models, thus making it possible to be applied as surrogate models in actual engineering projects.

Submitted 3 February, 2023; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: final version

Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 405, 15 February 2023, 115852

arXiv:2204.04367 [pdf]

Design of wavelength division multiplexing devices based on tunable edge states of valley photonic crystals

Authors: YuHui Han, HongMing Fei, Han Lin, MingDa Zhang, Xin Liu, XiaoRong Wang, BinZhao Cao, YiBiao Yang, LianTuan Xiao

Abstract: Wavelength division multiplexing (WDM) devices are key elements of Photonic integrated circuits (PICs). Conventional WDM devices based on silicon waveguides and photonic crystals have limited transmittance due to high loss introduced by the strong backward scattering from defects. In addition, it is challenging to reduce the footprint of those devices. Here we theoretically demonstrate a WDM device in the telecommunication range based on all-dielectric silicon topological valley photonic crystal (VPC) structures. We tune its effective refractive index by tuning the physical parameters of the lattice in the silicon substrate, which can continuously tune the working wavelength range of the topological edge states, which allows designing WDM devices with different channels. The WDM device has two channels (1470 nm-1523 nm and 1548 nm-1609 nm), with contrast ratios of 22.4 dB and 24.9 dB, respectively. The principle of manipulating the working bandwidth of the topological edge states can be generally applied in designing different integratable photonic devices, thus it will find broad applications.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 13 pages, 6 figures

arXiv:2203.16898 [pdf, other]

Semantic-shape Adaptive Feature Modulation for Semantic Image Synthesis

Authors: Zhengyao Lv, Xiaoming Li, Zhenxing Niu, Bing Cao, Wangmeng Zuo

Abstract: Recent years have witnessed substantial progress in semantic image synthesis, it is still challenging in synthesizing photo-realistic images with rich details. Most previous methods focus on exploiting the given semantic map, which just captures an object-level layout for an image. Obviously, a fine-grained part-level semantic layout will benefit object details generation, and it can be roughly inferred from an object's shape. In order to exploit the part-level layouts, we propose a Shape-aware Position Descriptor (SPD) to describe each pixel's positional feature, where object shape is explicitly encoded into the SPD feature. Furthermore, a Semantic-shape Adaptive Feature Modulation (SAFM) block is proposed to combine the given semantic map and our positional features to produce adaptively modulated features. Extensive experiments demonstrate that the proposed SPD and SAFM significantly improve the generation of objects with rich details. Moreover, our method performs favorably against the SOTA methods in terms of quantitative and qualitative evaluation. The source code and model are available at https://github.com/cszy98/SAFM.

Submitted 31 March, 2022; originally announced March 2022.

arXiv:2203.14101

A Roadmap for Big Model

Authors: Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, et al. (75 additional authors not shown)

Abstract: With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.

Submitted 20 April, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

Comments: This report has been withdrawn by the authors due to critical issues in Section 2.3.1 of Article 2

arXiv:2203.12274 [pdf, other]

Pre-training to Match for Unified Low-shot Relation Extraction

Authors: Fangchao Liu, Hongyu Lin, Xianpei Han, Boxi Cao, Le Sun

Abstract: Low-shot relation extraction (RE) aims to recognize novel relations with very few or even no samples, which is critical in real scenario application. Few-shot and zero-shot RE are two representative low-shot RE tasks, which seem to be with similar target but require totally different underlying abilities. In this paper, we propose Multi-Choice Matching Networks to unify low-shot relation extraction. To fill in the gap between zero-shot and few-shot RE, we propose the triplet-paraphrase meta-training, which leverages triplet paraphrase to pre-train zero-shot label matching ability and uses meta-learning paradigm to learn few-shot instance summarizing ability. Experimental results on three different low-shot RE tasks show that the proposed method outperforms strong baselines by a large margin, and achieve the best performance on few-shot RE leaderboard.

Submitted 23 March, 2022; originally announced March 2022.

Comments: Accepted to the main conference of ACL2022

arXiv:2203.12258 [pdf, other]

Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View

Authors: Boxi Cao, Hongyu Lin, Xianpei Han, Fangchao Liu, Le Sun

Abstract: Prompt-based probing has been widely used in evaluating the abilities of pretrained language models (PLMs). Unfortunately, recent studies have discovered such an evaluation may be inaccurate, inconsistent and unreliable. Furthermore, the lack of understanding its inner workings, combined with its wide applicability, has the potential to lead to unforeseen risks for evaluating and applying PLMs in real-world applications. To discover, understand and quantify the risks, this paper investigates the prompt-based probing from a causal view, highlights three critical biases which could induce biased results and conclusions, and proposes to conduct debiasing via causal intervention. This paper provides valuable insights for the design of unbiased datasets, better probing frameworks and more reliable evaluations of pretrained language models. Furthermore, our conclusions also echo that we need to rethink the criteria for identifying better pretrained language models. We openly released the source code and data at https://github.com/c-box/causalEval.

Submitted 23 March, 2022; originally announced March 2022.

Comments: Accepted to the main conference of ACL2022

arXiv:2203.10278 [pdf, other]

doi 10.1007/s11263-022-01590-z

Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation

Authors: Junwen Pan, Pengfei Zhu, Kaihua Zhang, Bing Cao, Yu Wang, Dingwen Zhang, Junwei Han, Qinghua Hu

Abstract: Semantic segmentation with limited annotations, such as weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS), is a challenging task that has attracted much attention recently. Most leading WSSS methods employ a sophisticated multi-stage training strategy to estimate pseudo-labels as precise as possible, but they suffer from high model complexity. In contrast, there exists another research line that trains a single network with image-level labels in one training cycle. However, such a single-stage strategy often performs poorly because of the compounding effect caused by inaccurate pseudo-label estimation. To address this issue, this paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage WSSS and SSSS. The SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several complementary attentive LR representations from different views of an image to learn precise pseudo-labels. Specifically, we reformulate the LR representation learning as a collective matrix factorization problem and optimize it jointly with the network learning in an end-to-end manner. The resulting LR representation deprecates noisy information while capturing stable semantics across different views, making it robust to the input variations, thereby reducing overfitting to self-supervision errors.
The SLRNet can provide a unified single-stage framework for various label-efficient semantic segmentation settings: 1) WSSS with image-level labeled data, 2) SSSS with a few pixel-level labeled data, and 3) SSSS with a few pixel-level labeled data and many image-level labeled data. Extensive experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings, proving its good generalizability and efficacy.

Submitted 19 March, 2022; originally announced March 2022.

Comments: Accepted to IJCV 2022

arXiv:2203.00818 [pdf]

doi 10.1021/acsami.2c01849

Emergence of insulating ferrimagnetism and perpendicular magnetic anisotropy in 3d-5d perovskite oxide composite films for insulator spintronic

Authors: Zeliang Ren, Bin Lao, Xuan Zheng, Lei Liao, Zengxing Lu, Sheng Li, Yongjie Yang, Bingshan Cao, Lijie Wen, Kenan Zhao, Lifen Wang, Xuedong Bai, Xianfeng Hao, Zhaoliang Liao, Zhiming Wang, Run-Wei Li

Abstract: Magnetic insulators with strong perpendicular magnetic anisotropy (PMA) play a key role in exploring pure spin current phenomena and developing ultralow-dissipation spintronic devices, thereby it is highly desirable to develop new material platforms. Here we report epitaxial growth of La2/3Sr1/3MnO3 (LSMO)-SrIrO3 (SIO) composite oxide films (LSMIO) with different crystalline orientations fabricated by sequential two-target ablation process using pulsed laser deposition. The LSMIO films exhibit high crystalline quality with homogeneous mixture of LSMO and SIO at atomic level. Ferrimagnetic and insulating transport characteristics are observed, with the temperature-dependent electric resistivity well fitted by Mott variable-range-hopping model. Moreover, the LSMIO films show strong PMA. Through further constructing all perovskite oxide heterostructures of the ferrimagnetic insulator LSMIO and a strong spin-orbital coupled SIO layer, pronounced spin Hall magnetoresistance (SMR) and spin Hall-like anomalous Hall effect (SH-AHE) were observed. These results illustrate the potential application of the ferrimagnetic insulator LSMIO in developing all-oxide ultralow-dissipation spintronic devices.

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.13770 [pdf, ps, other]

doi 10.1115/1.4054698

Thermal spreading resistance of GaN HEMTs with heat source heating studied by hybrid Monte Carlo-diffusion simulations

Authors: Han-Ling Li, Yang Shen, Yu-Chao Hua, S. L. Sobolev, Bing-Yang Cao

Abstract: Exact assessment of thermal spreading resistance is of great importance to the thermal management of electronic devices, especially when completely considering the heat conduction process from the nanoscale heat source to the macroscopic scale heat sink. The existing simulation methods are either based on conventional Fourier's law or limited to small system sizes, making it difficult to accurately and efficiently study the cross-scale heat transfer. In this paper, a hybrid phonon Monte Carlo-diffusion method that couples phonon Monte Carlo (MC) method with Fourier's law by dividing the computational domain is adopted to analyze thermal spreading resistance in ballistic-diffusive regime. Compared with phonon MC simulation, the junction temperature of the hybrid method has the same precision, while the time costs could be reduced up to 2 orders of magnitude at most. Furthermore, the simulation results indicate that the heating scheme has a remarkable impact on phonon transport. The thermal resistance of the heat source (HS) scheme can be larger than that of the heat flux (HF) scheme, which is opposite from the prediction of Fourier's law. In the HS scheme, the enhanced phonon-boundary scattering counteracts the broadening of the heat source, leading to a stronger ballistic effect as the heat source thickness decreases. The conclusion is verified by a one-dimensional thermal resistance model.
This work has opened up an opportunity for the fast and extensive thermal modeling of cross-scale heat transfer in electronic devices and highlighted the influence of heating schemes.

Submitted 23 February, 2022; originally announced February 2022.

Journal ref: Journal of Electronic Packaging-Transactions of the ASME, 2023, 145: 011203

arXiv:2202.06415 [pdf, other]

doi 10.1103/PhysRevB.106.L081409

Magneto-Optical Measurements of the Negatively Charged 2$s$ Exciton in WSe$_2$

Authors: J. C. Sell, J. R. Vannucci, D. G. Suarez-Forero, B. Cao, D. W. Session, H. -J. Chuang, K. M. McCreary, M. R. Rosenberger, B. T. Jonker, S. Mittal, M. Hafezi

Abstract: Monolayer transition metal dichalcogenides (TMDs) host a variety of optically excited quasiparticle species that stem from two-dimensional confinement combined with relatively large carrier effective masses and reduced dielectric screening. The magnetic response of these quasiparticles gives information on their spin and valley configurations, nuanced carrier interactions, and insight into the underlying band structure. Recently, there have been several reports of 2$s$/3$s$ charged excitons in TMDs, but very little is known about their response to external magnetic fields. Using photoluminescence excitation spectroscopy, we observe the presence of the 2$s$ charged exciton and report for the first time its response to an applied magnetic field. We benchmark this response against the neutral exciton and find that both the 2$s$ neutral and charged excitons exhibit similar behavior, with g-factors of $g_{\mathrm{X_0^{2s}}} = -5.20 \pm 0.11$ and $g_{\mathrm{X_-^{2s}}} = -4.98 \pm 0.11$, respectively.

Submitted 29 August, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

Comments: J.C. Sell and J.R. Vannucci contributed equally to this work

Journal ref: Phys. Rev. B 106, L081409 (2022)

arXiv:2201.03788 [pdf, other]

doi 10.1109/TED.2022.3168798

Spectral Thermal Spreading Resistance of Wide Bandgap Semiconductors in Ballistic-Diffusive Regime

Authors: Yang Shen, Yu-Chao Hua, Han-Ling Li, S. L. Sobolev, Bing-Yang Cao

Abstract: To develop efficient thermal management strategies for wide bandgap (WBG) semiconductor devices, it is essential to have a clear understanding of the heat transport process within the device and to accurately predict the junction temperature. In this paper, we used the phonon Monte Carlo (MC) method with the phonon dispersion of various typical WBG semiconductors, including GaN, SiC, AlN, and β-Ga$_2$O$_3$, to investigate the thermal spreading resistance in the ballistic-diffusive regime. It was found that, compared with Fourier's law-based predictions, the increase in thermal resistance caused by ballistic effects was strongly related to the different phonon dispersions. Based on the model deduced under the gray-medium approximation and the results of dispersion MC, we obtained a thermal resistance model that accounts for thermal spreading, ballistic effects, and the influence of phonon dispersion. The model can be easily coupled with FEM-based thermal analysis and applied to different materials. This paper provides a clearer understanding of the influence of phonon dispersion on the thermal transport process, and it can be useful for predicting junction temperatures and developing thermal management strategies for WBG semiconductor devices.

Submitted 11 January, 2022; originally announced January 2022.

arXiv:2112.13753 [pdf, other]

doi 10.1145/3477495.3531733

MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios

Authors: Xiaofeng Pan, Ming Li, Jing Zhang, Keren Yu, Luping Wang, Hong Wen, Chengjun Mao, Bo Cao

Abstract: Unlike on large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty in the upcoming time period. In this work, we propose a novel CVR method named MetaCVR, which addresses the DDF issue from the perspective of meta learning. First, a base CVR model consisting of a Feature Representation Network (FRN) and output layers is designed and trained sufficiently with samples spanning several months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes, which helps mitigate the distribution uncertainty. Finally, we develop an Ensemble Prediction Network (EPN) that incorporates the outputs of the FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from the most recent time period, thereby effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of MetaCVR, and an online A/B test also shows that our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.

Submitted 28 April, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

Showing 51–100 of 242 results for author: Cao, B