
UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis

Authors: Yulong Hui, Yao Lu, Huanchen Zhang

Abstract: The use of Retrieval-Augmented Generation (RAG) has improved Large Language Models (LLMs) in collaborating with external data, yet significant challenges exist in real-world scenarios. In areas such as academic literature and finance question answering, data are often found in raw text and tables in HTML or PDF formats, which can be lengthy and highly unstructured. In this paper, we introduce a benchmark suite, namely Unstructured Document Analysis (UDA), that involves 2,965 real-world documents and 29,590 expert-annotated Q&A pairs. We revisit popular LLM- and RAG-based solutions for document analysis and evaluate the design choices and answer qualities across multiple document domains and diverse query types. Our evaluation yields interesting findings and highlights the importance of data parsing and retrieval. We hope our benchmark can shed light and better serve real-world document analysis applications. The benchmark suite and code can be found at https://github.com/qinchuanhui/UDA-Benchmark.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2405.00344 [pdf, other]

Expert Insight-Enhanced Follow-up Chest X-Ray Summary Generation

Authors: Zhichuan Wang, Kinhei Lee, Qiao Deng, Tiffany Y. So, Wan Hang Chiu, Yeung Yu Hui, Bingjing Zhou, Edward S. Hui

Abstract: A chest X-ray radiology report describes not only abnormal findings from the X-ray obtained at the current examination, but also findings on disease progression or change in device placement with reference to the X-ray from the previous examination. The majority of efforts on automatic generation of radiology reports pertain to reporting the former, but not the latter, type of findings. To the best of the authors' knowledge, there is only one work dedicated to generating a summary of the latter findings, i.e., a follow-up summary. In this study, we therefore propose a transformer-based framework to tackle this task. Motivated by our observations on the significance of medical lexicon to the fidelity of summary generation, we introduce two mechanisms to bestow expert insight on our model, namely expert soft guidance and masked entity modeling loss. The former mechanism employs a pretrained expert disease classifier to guide the presence level of specific abnormalities, while the latter directs the model's attention toward medical lexicon. Extensive experiments were conducted to demonstrate that the performance of our model is competitive with or exceeds the state of the art.

Submitted 6 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: accepted by the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)

ACM Class: I.2.1

arXiv:2404.14984 [pdf, other]

Surface profile recovery from electromagnetic field with physics-informed neural networks

Authors: Yuxuan Chen, Ce Wang, Yuan Hui, Mark Spivack

Abstract: Physics-informed neural networks (PINN) have shown their potential in solving both direct and inverse problems of partial differential equations. In this paper, we introduce a PINN-based deep learning approach to reconstruct one-dimensional rough surfaces from field data illuminated by an electromagnetic incident wave. In the proposed algorithm, the rough surface is approximated by a neural network, with which the spatial derivatives of the surface function can be obtained via automatic differentiation, and then the scattered field can be calculated via the method of moments. The neural network is trained by minimizing the loss between the calculated and the observed field data. Furthermore, the proposed method is an unsupervised approach that is independent of any surface data; only the field data is used. Both the TE field (Dirichlet boundary condition) and the TM field (Neumann boundary condition) are considered. Two types of field data are used here: full scattered field data and phaseless total field data. The performance of the method is verified by testing with Gaussian-correlated random rough surfaces. Numerical results demonstrate that the PINN-based method can recover rough surfaces with great accuracy and is robust with respect to a wide range of problem regimes.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14750 [pdf, other]

Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray

Authors: Qiao Deng, Zhongzhen Huang, Yunqi Wang, Zhichuan Wang, Zhao Wang, Xiaofan Zhang, Qi Dou, Yeung Yu Hui, Edward S. Hui

Abstract: Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical image and text. Current algorithms that exploit the global and local alignment between medical image and text could however be marred by the redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-training (GK-MVLP) framework for chest X-ray. In this framework, medical knowledge is grounded to the appropriate anatomical regions by using a transformer-based grounded knowledge-enhanced module for fine-grained alignment between anatomical region-level visual features and the textural features of medical knowledge. The performance of GK-MVLP is competitive with or exceeds the state of the art on downstream chest X-ray disease classification, disease localization, report generation, and medical visual question-answering tasks. Our results show the advantage of incorporating grounding mechanism to remove biases and improve the alignment between chest X-ray image and radiology report.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2402.15972 [pdf, other]

Structural Knowledge-Driven Meta-Learning for Task Offloading in Vehicular Networks with Integrated Communications, Sensing and Computing

Authors: Ruijin Sun, Yao Wen, Nan Cheng, Wei Wan, Rong Chai, Yilong Hui

Abstract: Task offloading is a potential solution to satisfy the strict requirements of computation-intensive and latency-sensitive vehicular applications due to the limited onboard computing resources. However, the overwhelming upload traffic may lead to unacceptable uploading time. To tackle this issue, for tasks taking environmental data as input, the data perceived by roadside units (RSU) equipped with several sensors can be directly exploited for computation, resulting in a novel task offloading paradigm with integrated communications, sensing and computing (I-CSC). With this paradigm, vehicles can select to upload their sensed data to RSUs or transmit computing instructions to RSUs during the offloading. By optimizing the computation mode and network resources, in this paper, we investigate an I-CSC-based task offloading problem to reduce the cost caused by resource consumption while guaranteeing the latency of each task. Although this non-convex problem can be handled by the alternating minimization (AM) algorithm, which alternately minimizes the four divided sub-problems, it leads to high computational complexity and locally optimal solutions. To tackle this challenge, we propose a creative structural knowledge-driven meta-learning (SKDML) method, involving both the model-based AM algorithm and neural networks. Specifically, borrowing the iterative structure of the AM algorithm, also referred to as structural knowledge, the proposed SKDML adopts long short-term memory (LSTM) network-based meta-learning to learn an adaptive optimizer for updating variables in each sub-problem, instead of the handcrafted counterpart in the AM algorithm.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2401.14754 [pdf, other]

VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and Denoising

Authors: Yuxiang Hui, Yang Liu, Yaofang Liu, Fan Jia, Jinshan Pan, Raymond Chan, Tieyong Zeng

Abstract: The video restoration task aims to recover high-quality videos from low-quality observations. It contains various important sub-tasks, such as video denoising, deblurring and low-light enhancement, since video often faces different types of degradation, such as blur, low light, and noise. Even worse, these kinds of degradation could happen simultaneously when taking videos in extreme environments. This poses significant challenges if one wants to remove these artifacts at the same time. In this paper, to the best of our knowledge, we are the first to propose an efficient end-to-end video transformer approach for the joint task of video deblurring, low-light enhancement, and denoising. This work builds a novel multi-tier transformer where each tier uses a different level of degraded video as a target to learn the features of video effectively. Moreover, we carefully design a new tier-to-tier feature fusion scheme to learn video features incrementally and accelerate the training process with a suitable adaptive weighting scheme. We also provide a new Multiscene-Lowlight-Blur-Noise (MLBN) dataset, which is generated according to the characteristics of the joint task based on the RealBlur dataset and YouTube videos to simulate realistic scenes as far as possible. We have conducted extensive experiments, compared with many previous state-of-the-art methods, to show the effectiveness of our approach clearly.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 12 pages, 8 figures

arXiv:2401.12230 [pdf, other]

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

Authors: Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo

Abstract: In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtime (e.g., batched LoRA inference). These joint efforts aim to optimize costs-of-goods-sold (COGS) and improve resource accessibility. The journey of merging these two domains is just at the beginning and we hope to stimulate future research and development in this area.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2312.00535 [pdf, other]

RIS-Based On-the-Air Semantic Communications -- a Diffractional Deep Neural Network Approach

Authors: Shuyi Chen, Yingzhe Hui, Yifan Qin, Yueyi Yuan, Weixiao Meng, Xuewen Luo, Hsiao-Hwa Chen

Abstract: Semantic communication has gained significant attention recently due to its advantages in achieving higher transmission efficiency by focusing on semantic information instead of bit-level information. However, current AI-based semantic communication methods require digital hardware for implementation. With the rapid advancement on reconfigurable intelligence surfaces (RISs), a new approach called on-the-air diffractional deep neural networks (D$^2$NN) can be utilized to enable semantic communications on the wave domain. This paper proposes a new paradigm of RIS-based on-the-air semantic communications, where the computational process occurs inherently as wireless signals pass through RISs. We present the system model and discuss the data and control flows of this scheme, followed by a performance analysis using image transmission as an example. In comparison to traditional hardware-based approaches, RIS-based semantic communications offer appealing features, such as light-speed computation, low computational power requirements, and the ability to handle multiple tasks simultaneously.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 17 pages, 5 figures, accepted by IEEE WCM

arXiv:2310.07728 [pdf]

AI Algorithm for the Generation of Three-Dimensional Accessibility Ramps in Grasshopper / Rhinoceros 7

Authors: Antonio Li, Leila Yi, Brandon Yeo Pei Hui

Abstract: Often overlooked as a component of urban development, accessibility infrastructure is undeniably crucial in daily life. Accessibility ramps are one of the most common types of accessibility infrastructure, and serve to benefit not only people with mobile impairments but also able-bodied third parties. While the necessity of accessibility ramps is acknowledged, actual implementation fails in light of the limits of manpower required for the design stage. In response, we present an algorithm capable of the automatic generation of a feasible accessibility ramp based on a 3D model of the relevant environment. Through the manual specification of initial and terminal points within a 3D model, the algorithm uses AI search algorithms to determine the optimal pathway connecting these points. Essential components in devising a wheelchair-accessible ramp are encoded within the process, as evaluated by the algorithm, including but not limited to elevation differentials, spatial constraints, and gradient specifications. From this, the algorithm then generates the pathway to be expanded into a full-scale, usable model of a ramp, which then can be easily exported and transformed through inter-software exchanges. Though some human input is still required following the generation stage, the minimising of human resources provides significant boosts of efficiency in the design process thus lowering the threshold for the incorporation of accessibility features in future urban design.

Submitted 29 September, 2023; originally announced October 2023.

Comments: 9 pages, 7 figures

arXiv:2309.09304 [pdf, other]

Mobile Metaverse: A Road Map from Metaverse to Metavehicles

Authors: Yilong Hui, Gaosheng Zhao, Nan Cheng, Haibo Zhou, Zhou Su

Abstract: With the rapid development of communication technologies and extended reality (XR), the services and applications of the Metaverse are gradually entering our lives. However, the current development of the Metaverse provides users with services that are homogeneous with the user experience that the Internet has brought in the past, making them more like an extension of the Internet. In addition, as a mobile application carrier for the Metaverse, it is also worth considering how vehicles with diverse onboard components can develop in synergy with the Metaverse. In this article, we focus on the core of the Metaverse, namely user experience, and provide a road map from Metaverse to Metaverse vehicles (Metavehicles). Specifically, we first elaborate on six features of the Metaverse from the perspective of user experience and propose a hierarchical framework for the Metaverse based on the evolutionary logic of the features. Under the guidance of this framework, we discuss the empowerment of onboard components of Metavehicles on the development of the Metaverse, and analyze the service experience that Metavehicles can bring to two types of users, namely drivers and passengers. Finally, considering the differentiated development levels of Metaverse and autonomous driving, we further establish a hierarchical framework for Metavehicles from three aspects (i.e., enhance Metaverse, enhance driving experience, and enhance entertainment experience), providing an evolutionary path for the development of Metavehicles.

Submitted 17 September, 2023; originally announced September 2023.

Comments: 7 pages, 5 figures

arXiv:2306.08938 [pdf, other]

Scalable Resource Management for Dynamic MEC: An Unsupervised Link-Output Graph Neural Network Approach

Authors: Xiucheng Wang, Nan Cheng, Lianhao Fu, Wei Quan, Ruijin Sun, Yilong Hui, Tom Luan, Xuemin Shen

Abstract: Deep learning has been successfully adopted in mobile edge computing (MEC) to optimize task offloading and resource allocation. However, the dynamics of edge networks raise two challenges in neural network (NN)-based optimization methods: low scalability and high training costs. Although conventional node-output graph neural networks (GNN) can extract features of edge nodes when the network scales, they fail to handle a new scalability issue in which the dimension of the decision space may change as the network scales. To address the issue, in this paper, a novel link-output GNN (LOGNN)-based resource management approach is proposed to flexibly optimize the resource allocation in MEC for an arbitrary number of edge nodes with extremely low algorithm inference delay. Moreover, a label-free unsupervised method is applied to train the LOGNN efficiently, where the gradient of edge tasks processing delay with respect to the LOGNN parameters is derived explicitly. In addition, a theoretical analysis of the scalability of the node-output GNN and link-output GNN is performed. Simulation results show that the proposed LOGNN can efficiently optimize the MEC resource allocation problem in a scalable way, with an arbitrary number of servers and users. In addition, the proposed unsupervised training method has better convergence performance and speed than supervised learning and reinforcement learning-based training methods. The code is available at https://github.com/UNIC-Lab/LOGNN.

Submitted 19 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.08344 [pdf, other]

UIERL: Internal-External Representation Learning Network for Underwater Image Enhancement

Authors: Zhengyong Wang, Liquan Shen, Yihan Yu, Yuan Hui

Abstract: Underwater image enhancement (UIE) is a meaningful but challenging task, and many learning-based UIE methods have been proposed in recent years. Although much progress has been made, these methods still have two issues: (1) There exists a significant region-wise quality difference in a single underwater image due to the underwater imaging process, especially in regions with different scene depths. However, existing methods neglect this internal characteristic of underwater images, resulting in inferior performance; (2) Due to the uniqueness of the acquisition approach, underwater image acquisition tools usually capture multiple images in the same or similar scenes. Thus, the underwater images to be enhanced in practical usage are highly correlated. However, when processing a single image, existing methods do not consider the rich external information provided by the related images. There is still room for improvement in their performance. Motivated by these two aspects, we propose a novel internal-external representation learning (UIERL) network to better perform UIE tasks with internal and external information simultaneously. In the internal representation learning stage, a new depth-based region feature guidance network is designed, including a region segmentation based on scene depth to sense regions with different quality levels, followed by a region-wise space encoder module. By performing region-wise feature learning separately for regions of different quality, the network provides effective guidance for global features and thus guides intra-image differentiated enhancement. In the external representation learning stage, we first propose an external information extraction network to mine the rich external information in the related images. Then, internal and external features interact with each other via the proposed external-assist-internal module and internal-assist-e

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2305.08663 [pdf, other]

Leveraging Graph Embeddings for Opinion Leader Detection

Authors: Yunming Hui, Luuk Buijsman, Mel Chekol, Shihan Wang

Abstract: Nowadays, social media plays an important role in many fields, such as the promotion of measures against major infectious diseases, merchandising, etc. In social media, some people are known as opinion leaders due to their strong ability to influence the opinions of others. The detection of opinion leaders has become an important task in social network analysis. Social networks are often represented in the form of graphs, which allows a large number of graph analysis methods to be used for opinion leader detection. Some studies have attempted to apply graph representation learning for opinion leader detection and achieved good results. In this paper, we propose a model-agnostic framework that formulates the opinion leader detection problem as a ranking task of node embeddings. A variety of methods and datasets are chosen to analyze the performance of our framework both qualitatively and quantitatively. Based on the analysis results, we propose a strategy that combines opinion leaders detected by two different ranking algorithms to obtain a more comprehensive set of opinion leaders. We also analyze the temporal changes of the opinion leaders in one of the dynamic social networks.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2304.05028 [pdf, other]

An Empirical Evaluation of Columnar Storage Formats

Authors: Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney, Huanchen Zhang

Abstract: Columnar storage is a core component of a modern data analytics system. Although many database management systems (DBMSs) have proprietary storage formats, most provide extensive support to open-source storage formats such as Parquet and ORC to facilitate cross-platform data sharing. But these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both the hardware and workload landscapes have changed. In this paper, we revisit the most widely adopted open-source columnar storage formats (Parquet and ORC) with a deep dive into their internals. We designed a benchmark to stress-test the formats' performance and space efficiency under different workload configurations. From our comprehensive evaluation of Parquet and ORC, we identify design decisions advantageous with modern hardware and real-world data distributions. These include using dictionary encoding by default, favoring decoding speed over compression ratio for integer encoding algorithms, making block compression optional, and embedding finer-grained auxiliary data structures. We also point out the inefficiencies in the format designs when handling common machine learning workloads and using GPUs for decoding. Our analysis identified important considerations that may guide future formats to better fit modern technology trends.

Submitted 7 November, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: 15 pages; typos corrected, missing figure legend added
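
The abstract above names dictionary encoding as one of the design decisions it favors. As a rough illustration only, the following is a minimal sketch of the general technique on a single in-memory column. It is not the Parquet or ORC implementation, and the function names `dictionary_encode` and `dictionary_decode` are hypothetical, chosen for this example.

```python
def dictionary_encode(values):
    """Replace each value with an integer code into a list of distinct values.

    Hypothetical helper for illustration; real columnar formats add
    bit-packing, run-length encoding, and fallbacks on top of this idea.
    """
    dictionary = []   # distinct values, in order of first appearance
    index = {}        # value -> position in `dictionary`
    codes = []        # one small integer per input value
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column by looking up each code."""
    return [dictionary[code] for code in codes]


column = ["US", "DE", "US", "US", "FR", "DE"]
dictionary, codes = dictionary_encode(column)
restored = dictionary_decode(dictionary, codes)
```

A column with few distinct values shrinks to a short dictionary plus a list of small integers, and decoding is a plain array lookup, which is one reason the technique tends to decode quickly.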

arXiv:2302.02352 [pdf, other]

TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

Authors: Jianxin Chang, Chenbin Zhang, Zhiyi Fu, Xiaoxue Zang, Lin Guan, Jing Lu, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, Kun Gai

Abstract: Life-long user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors in months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast and coarse search over tens of thousands of long-term behaviors and an Exact Search Unit (ESU) for effective Target Attention (TA) over the small number of finalists from GSU. Although efficient, existing algorithms mostly suffer from a crucial limitation: the inconsistent target-behavior relevance metrics between GSU and ESU. As a result, their GSU usually misses highly relevant behaviors but retrieves ones considered irrelevant by ESU. In such cases, the TA in ESU, no matter how attention is allocated, mostly deviates from the real user interests and thus degrades the overall CTR prediction accuracy. To address such inconsistency, we propose TWo-stage Interest Network (TWIN), where our Consistency-Preserved GSU (CP-GSU) adopts the identical target-behavior relevance metric as the TA in ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from ESU to GSU, or namely from behavior length $10^2$ to length $10^4-10^5$, we build a novel attention mechanism by behavior feature splitting. For the video inherent features of a behavior, we calculate their linear projection by efficient pre-computing & caching strategies. And for the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save the computational cost. The consistency between two stages, together with the effective TA-based relevance metric in CP-GSU, contributes to significant performance gain in CTR prediction.

Submitted 26 June, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: Accepted by KDD 2023

arXiv:2302.01115 [pdf, other]

PEPNet: Parameter and Embedding Personalized Network for Infusing with Personalized Prior Information

Authors: Jianxin Chang, Chenbin Zhang, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, Kun Gai

Abstract: With the increase of content pages and interactive buttons in online services such as online-shopping and video-watching websites, industrial-scale recommender systems face challenges in multi-domain and multi-task recommendations. The core of multi-task and multi-domain recommendation is to accurately capture user interests in multiple scenarios given multiple user behaviors. In this paper, we propose a plug-and-play Parameter and Embedding Personalized Network (PEPNet) for multi-domain and multi-task recommendation. PEPNet takes personalized prior information as input and dynamically scales the bottom-level Embedding and top-level DNN hidden units through gate mechanisms. Embedding Personalized Network (EPNet) performs personalized selection on Embedding to fuse features with different importance for different users in multiple domains. Parameter Personalized Network (PPNet) executes personalized modification on DNN parameters to balance targets with different sparsity for different users in multiple tasks. We have made a series of special engineering optimizations combining the Kuaishou training framework and the online deployment environment. By infusing personalized selection of Embedding and personalized modification of DNN parameters, PEPNet tailored to the interests of each individual obtains significant performance gains, with online improvements exceeding 1% in multiple task metrics across multiple domains. We have deployed PEPNet in Kuaishou apps, serving over 300 million users every day.

Submitted 26 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: Accepted by KDD 2023

arXiv:2211.06687 [pdf, other]

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Authors: Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

Abstract: Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. To accomplish this target, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different data sources. Second, we construct a contrastive language-audio pretraining model by considering different audio encoders and text encoders. We incorporate the feature fusion mechanism and keyword-to-caption augmentation into the model design to further enable the model to process audio inputs of variable lengths and enhance the performance. Third, we perform comprehensive experiments to evaluate our model across three tasks: text-to-audio retrieval, zero-shot audio classification, and supervised audio classification. The results demonstrate that our model achieves superior performance in text-to-audio retrieval task. In audio classification tasks, the model achieves state-of-the-art performance in the zero-shot setting and is able to obtain performance comparable to models' results in the non-zero-shot setting. LAION-Audio-630K and the proposed model are both available to the public.

Submitted 21 March, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

arXiv:2210.15149 [pdf]

Fully Automated Deep Learning-enabled Detection for Hepatic Steatosis on Computed Tomography: A Multicenter International Validation Study

Authors: Zhongyi Zhang, Guixia Li, Ziqiang Wang, Feng Xia, Ning Zhao, Huibin Nie, Zezhong Ye, Joshua Lin, Yiyi Hui, Xiangchun Liu

Abstract: Despite high global prevalence of hepatic steatosis, no automated diagnostics demonstrated generalizability in detecting steatosis on multiple international datasets. Traditionally, hepatic steatosis detection relies on clinicians selecting the region of interest (ROI) on computed tomography (CT) to measure liver attenuation. ROI selection demands time and expertise, and therefore is not routinely performed in populations. To automate the process, we validated an existing artificial intelligence (AI) system for 3D liver segmentation and used it to propose a novel method: AI-ROI, which could automatically select the ROI for attenuation measurements. AI segmentation and AI-ROI method were evaluated on 1,014 non-contrast enhanced chest CT images from eight international datasets: LIDC-IDRI, NSCLC-Lung1, RIDER, VESSEL12, RICORD-1A, RICORD-1B, COVID-19-Italy, and COVID-19-China. AI segmentation achieved a mean dice coefficient of 0.957. Attenuations measured by AI-ROI showed no significant differences (p = 0.545) and a reduction of 71% time compared to expert measurements. The area under the curve (AUC) of the steatosis classification of AI-ROI is 0.921 (95% CI: 0.883 - 0.959). If performed as a routine screening method, our AI protocol could potentially allow early non-invasive, non-pharmacological preventative interventions for hepatic steatosis. 1,014 expert-annotated liver segmentations of patients with hepatic steatosis annotations can be downloaded here: https://drive.google.com/drive/folders/1-g_zJeAaZXYXGqL1OeF6pUjr6KB0igJX.

Submitted 6 November, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.01863 [pdf, other]

Group Personalized Federated Learning

Authors: Zhe Liu, Yue Hui, Fuchun Peng

Abstract: Federated learning (FL) can help promote data privacy by training a shared model in a de-centralized manner on the physical devices of clients. In the presence of highly heterogeneous distributions of local data, personalized FL strategy seeks to mitigate the potential client drift. In this paper, we present the group personalization approach for applications of FL in which there exist inherent partitions among clients that are significantly distinct. In our method, the global FL model is fine-tuned through another FL training process over each homogeneous group of clients, after which each group-specific FL model is further adapted and personalized for any client. The proposed method can be well interpreted from a Bayesian hierarchical modeling perspective. With experiments on two real-world datasets, we demonstrate this approach can achieve superior personalization performance than other FL counterparts.

Submitted 11 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2207.04001 [pdf, other]

doi 10.1093/mnras/stac1996

On Improving the Performance of Glitch Classification for Gravitational Wave Detection by using Generative Adversarial Networks

Authors: Jianqi Yan, Alex P. Leung, David C. Y. Hui

Abstract: Spectrogram classification plays an important role in analyzing gravitational wave data. In this paper, we propose a framework to improve the classification performance by using Generative Adversarial Networks (GANs). As substantial efforts and expertise are required to annotate spectrograms, the number of training examples is very limited. However, it is well known that deep networks can perform well only when the sample size of the training set is sufficiently large. Furthermore, the imbalanced sample sizes in different classes can also hamper the performance. In order to tackle these problems, we propose a GAN-based data augmentation framework. While standard data augmentation methods for conventional images cannot be applied on spectrograms, we found that a variant of GANs, ProGAN, is capable of generating high-resolution spectrograms which are consistent with the quality of the high-resolution original images and provide a desirable diversity. We have validated our framework by classifying glitches in the Gravity Spy dataset with the GAN-generated spectrograms for training. We show that the proposed method can provide an alternative to transfer learning for the classification of spectrograms using deep networks, i.e. using a high-resolution GAN for data augmentation instead. Furthermore, fluctuations in classification performance with small sample sizes for training and evaluation can be greatly reduced. Using the trained network in our framework, we have also examined the spectrograms with label anomalies in Gravity Spy.

Submitted 8 July, 2022; originally announced July 2022.

Comments: Accepted for publication in MNRAS, 16 pages, 14 figures, 5 tables

arXiv:2206.02407 [pdf, other]

Green Interference Based Symbiotic Security in Integrated Satellite-terrestrial Communications

Authors: Zhisheng Yin, Nan Cheng, Tom H. Luan, Yilong Hui, Wei Wang

Abstract: In this paper, we investigate secure transmissions in integrated satellite-terrestrial communications and the green interference based symbiotic security scheme is proposed. Particularly, the co-channel interference induced by the spectrum sharing between satellite and terrestrial networks and the inter-beam interference due to frequency reuse among satellite multi-beam serve as the green interference to assist the symbiotic secure transmission, where the secure transmissions of both satellite and terrestrial links are guaranteed simultaneously. Specifically, to realize the symbiotic security, we formulate a problem to maximize the sum secrecy rate of satellite users by cooperatively beamforming optimizing and a constraint of secrecy rate of each terrestrial user is guaranteed. Since the formulated problem is non-convex and intractable, the Taylor expansion and semi-definite relaxation (SDR) are adopted to further reformulate this problem, and the successive convex approximation (SCA) algorithm is designed to solve it. Finally, the tightness of the relaxation is proved. In addition, numerical results verify the efficiency of our proposed approach.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2204.12153 [pdf, other]

Cybertwin-enabled 6G Space-air-ground Integrated Networks: Architecture, Open Issue, and Challenges

Authors: Zhisheng Yin, Tom H. Luan, Nan Cheng, Yilong Hui, Wei Wang

Abstract: Space-air-ground integrated network (SAGIN) is considered as a core requirement in emerging 6G networks, which integrates the terrestrial and non-terrestrial networks to reach the full network coverage and ubiquitous services. To envision the ubiquitous intelligence and the deep integration in 6G SAGIN, a paradigm of cybertwin-enabled 6G SAGIN is presented in this paper. Specifically, a cybertwin-enabled SAGIN architecture is first presented, where a novel five-dimension digital twin (DT) model is presented. Particularly, three categories of critical technologies are presented based on the cybertwin of SAGIN, i.e., cybertwin-based multi-source heterogeneous network integration, cybertwin-based integrated cloud-edge-end, and cybertwin-based integrated sensing-communication-computing. Besides, two open issues in the cybertwin-enabled SAGIN are studied, i.e., the networking decision and optimization and the cybertwin-enabled cross-layer privacy and security, where the challenges are discussed and the potential solutions are directed. In addition, a case study with federal learning is developed and open research issues are discussed.

Submitted 26 April, 2022; originally announced April 2022.

arXiv:2203.02098 [pdf, other]

Universal Segmentation of 33 Anatomies

Authors: Pengbo Liu, Yang Deng, Ce Wang, Yuan Hui, Qian Li, Jun Li, Shiwei Luo, Mengke Sun, Quan Quan, Shuxin Yang, You Hao, Honghu Xiao, Chunpeng Zhao, Xinbao Wu, S. Kevin Zhou

Abstract: In the paper, we present an approach for learning a single model that universally segments 33 anatomical structures, including vertebrae, pelvic bones, and abdominal organs. Our model building has to address the following challenges. Firstly, while it is ideal to learn such a model from a large-scale, fully-annotated dataset, it is practically hard to curate such a dataset. Thus, we resort to learning from a union of multiple datasets, with each dataset containing the images that are partially labeled. Secondly, along the line of partial labelling, we contribute an open-source, large-scale vertebra segmentation dataset for the benefit of spine analysis community, CTSpine1K, boasting over 1,000 3D volumes and over 11K annotated vertebrae. Thirdly, in a 3D medical image segmentation task, due to the limitation of GPU memory, we always train a model using cropped patches as inputs instead of a whole 3D volume, which limits the amount of contextual information to be learned. To this end, we propose a cross-patch transformer module to fuse more information in adjacent patches, which enlarges the aggregated receptive field for improved segmentation performance. This is especially important for segmenting, say, the elongated spine. Based on 7 partially labeled datasets that collectively contain about 2,800 3D volumes, we successfully learn such a universal model. Finally, we evaluate the universal model on multiple open-source datasets, proving that our model has a good generalization performance and can potentially serve as a solid foundation for downstream tasks.

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2112.01154 [pdf, other]

Autonomous Vehicular Networks: Perspective and Open Issues

Authors: Tom H. Luan, Yao Zhang, Lin Cai, Yilong Hui, Changle Li, Nan Cheng

Abstract: The vehicular ad hoc networks (VANETs) have been researched for over twenty years. Although being a fundamental communication approach for vehicles, the conventional VANETs are challenged by the newly emerged autonomous vehicles (AVs) which introduce new features and challenges on communications. In the meantime, with the recent advances of artificial intelligence and 5G cellular networks, how should the fundamental framework of VANET evolve to utilize the new technologies? In this article, we reconsider the problem of vehicle-to-vehicle communications when the network is composed of AVs. We discuss the features and specific demands of AVs and how the conventional VANETs should adapt to fit them.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2111.10790 [pdf, other]

DuDoTrans: Dual-Domain Transformer Provides More Attention for Sinogram Restoration in Sparse-View CT Reconstruction

Authors: Ce Wang, Kun Shang, Haimiao Zhang, Qian Li, Yuan Hui, S. Kevin Zhou

Abstract: While Computed Tomography (CT) reconstruction from X-ray sinograms is necessary for clinical diagnosis, iodine radiation in the imaging process induces irreversible injury, thereby driving researchers to study sparse-view CT reconstruction, that is, recovering a high-quality CT image from a sparse set of sinogram views. Iterative models are proposed to alleviate the appeared artifacts in sparse-view CT images, but the computation cost is too expensive. Then deep-learning-based methods have gained prevalence due to the excellent performances and lower computation. However, these methods ignore the mismatch between the CNN's local feature extraction capability and the sinogram's global characteristics. To overcome the problem, we propose Dual-Domain Transformer (DuDoTrans) to simultaneously restore informative sinograms via the long-range dependency modeling capability of Transformer and reconstruct CT image with both the enhanced and raw sinograms. With such a novel design, reconstruction performance on the NIH-AAPM dataset and COVID-19 dataset experimentally confirms the effectiveness and generalizability of DuDoTrans with fewer involved parameters. Extensive experiments also demonstrate its robustness with different noise-level scenarios for sparse-view CT reconstruction. The code and models are publicly available at https://github.com/DuDoTrans/CODE

Submitted 25 November, 2021; v1 submitted 21 November, 2021; originally announced November 2021.

arXiv:2108.13246 [pdf, other]

LUAI Challenge 2021 on Learning to Understand Aerial Images

Authors: Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang, et al. (11 additional authors not shown)

Abstract: This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images. Using DOTA-v2.0 and GID-15 datasets, this challenge proposes three tasks for oriented object detection, horizontal object detection, and semantic segmentation of common categories in aerial images. This challenge received a total of 146 registrations on the three tasks. Through the challenge, we hope to draw attention from a wide range of communities and call for more efforts on the problems of learning to understand aerial images.

Submitted 17 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 7 pages, 2 figures, accepted by ICCVW 2021

arXiv:2106.14226 [pdf, other]

Sequential Recommendation with Graph Neural Networks

Authors: Jianxin Chang, Chen Gao, Yu Zheng, Yiqun Hui, Yanan Niu, Yang Song, Depeng Jin, Yong Li

Abstract: Sequential recommendation aims to leverage users' historical behaviors to predict their next interaction. Existing works have not yet addressed two main challenges in sequential recommendation. First, user behaviors in their rich historical sequences are often implicit and noisy preference signals, they cannot sufficiently reflect users' actual preferences. In addition, users' dynamic preferences often change rapidly over time, and hence it is difficult to capture user patterns in their historical sequences. In this work, we propose a graph neural network model called SURGE (short for SeqUential Recommendation with Graph neural nEtworks) to address these two issues. Specifically, SURGE integrates different types of preferences in long-term user behaviors into clusters in the graph by re-constructing loose item sequences into tight item-item interest graphs based on metric learning. This helps explicitly distinguish users' core interests, by forming dense clusters in the interest graph. Then, we perform cluster-aware and query-aware graph convolutional propagation and graph pooling on the constructed graph. It dynamically fuses and extracts users' current activated core interests from noisy user behavior sequences. We conduct extensive experiments on both public and proprietary industrial datasets. Experimental results demonstrate significant performance gains of our proposed method compared to state-of-the-art methods. Further studies on sequence length confirm that our method can model long behavioral sequences effectively and efficiently.

Submitted 26 July, 2023; v1 submitted 27 June, 2021; originally announced June 2021.

Comments: Accepted by SIGIR 2021

arXiv:2105.14711 [pdf, other]

CTSpine1K: A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

Authors: Yang Deng, Ce Wang, Yuan Hui, Qian Li, Jun Li, Shiwei Luo, Mengke Sun, Quan Quan, Shuxin Yang, You Hao, Pengbo Liu, Honghu Xiao, Chunpeng Zhao, Xinbao Wu, S. Kevin Zhou

Abstract: Spine-related diseases have high morbidity and cause a huge burden of social cost. Spine imaging is an essential tool for noninvasively visualizing and assessing spinal pathology. Segmenting vertebrae in computed tomography (CT) images is the basis of quantitative medical image analysis for clinical diagnosis and surgery planning of spine diseases. Current publicly available annotated datasets on spinal vertebrae are small in size. Due to the lack of a large-scale annotated spine image dataset, the mainstream deep learning-based segmentation methods, which are data-driven, are heavily restricted. In this paper, we introduce a large-scale spine CT dataset, called CTSpine1K, curated from multiple sources for vertebra segmentation, which contains 1,005 CT volumes with over 11,100 labeled vertebrae belonging to different spinal conditions. Based on this dataset, we conduct several spinal vertebrae segmentation experiments to set the first benchmark. We believe that this large-scale dataset will facilitate further research in many spine-related image analysis tasks, including but not limited to vertebrae segmentation, labeling, 3D spine reconstruction from biplanar radiographs, image super-resolution, and enhancement.

Submitted 5 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

arXiv:2105.14602 [pdf, other]

On the geometry of generalization and memorization in deep neural networks

Authors: Cory Stephenson, Suchismita Padhy, Abhinav Ganesh, Yue Hui, Hanlin Tang, SueYeon Chung

Abstract: Understanding how large neural networks avoid memorizing training data is key to explaining their high generalization performance. To examine the structure of when and where memorization occurs in a deep network, we use a recently developed replica-based mean field theoretic geometric analysis method. We find that all layers preferentially learn from examples which share features, and link this behavior to generalization performance. Memorization predominately occurs in the deeper layers, due to decreasing object manifolds' radius and dimension, whereas early layers are minimally affected. This predicts that generalization can be restored by reverting the final few layer weights to earlier epochs before significant memorization occurred, which is confirmed by the experiments. Additionally, by studying generalization under different model sizes, we reveal the connection between the double descent phenomenon and the underlying model geometry. Finally, analytical analysis shows that networks avoid memorization early in training because close to initialization, the gradient contribution from permuted examples are small. These findings provide quantitative evidence for the structure of memorization across layers of a deep neural network, the drivers for such structure, and its connection to manifold geometric properties.

Submitted 30 May, 2021; originally announced May 2021.

Comments: ICLR 2021

arXiv:2105.09493 [pdf, other]

Futuristic Intelligent Transportation System

Authors: Yilong Hui, Zhou Su, Tom H. Luan, Nan Cheng

Abstract: The emerging autonomous vehicles (AVs) will inevitably revolutionize the transportation systems. This is because of a key feature of AVs; instead of being managed by human drivers as the conventional vehicles, AVs are of the complete capability to manage the driving by themselves. As a result, the futuristic intelligent transportation system (FITS) can be a centrally managed and optimized system with the fully coordinated driving of vehicles, which is impossible by the current transportation systems controlled by humans. In this article, we envision the operation of such FITS when AVs, advanced vehicular networks (VANETs) and artificial intelligence (AI) are adopted. Specifically, we first develop the autonomous vehicular networks (AVNs) based on the advanced development of AVs and heterogeneous vehicular communication technologies to achieve global data collection and real-time data sharing. With this network architecture, we then integrate AVNs and AI based on the intelligent digital twin (IDT) to design the FITS with the target of setting up an accurate and efficient global traffic scheduling system. After that, compared with the conventional schemes, a customized path planning case is studied to evaluate the performance of the proposed FITS. Finally, we highlight the emerging issues related to the FITS for future research.

Submitted 19 May, 2021; originally announced May 2021.

arXiv:2102.10905 [pdf, other]

Joint Intent Detection And Slot Filling Based on Continual Learning Model

Authors: Yanfei Hui, Jianzong Wang, Ning Cheng, Fengying Yu, Tianbo Wu, Jing Xiao

Abstract: Slot filling and intent detection have become a significant theme in the field of natural language understanding. Even though slot filling is intensively associated with intent detection, the characteristics of the information required for both tasks are different while most of those approaches may not be fully aware of this problem. In addition, balancing the accuracy of two tasks effectively is an inevitable problem for the joint learning model. In this paper, a Continual Learning Interrelated Model (CLIM) is proposed to consider semantic information with different characteristics and balance the accuracy between intent detection and slot filling effectively. The experimental results show that CLIM achieves state-of-the-art performance on slot filling and intent detection on ATIS and Snips.

Submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted to ICASSP 2021

arXiv:2007.12770 [pdf, other]

BabyAI 1.1

Authors: David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

Abstract: The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 presents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent's architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77% to 90.4%. We hope that these improvements increase the computational efficiency of BabyAI experiments and help users design better agents.

Submitted 24 July, 2020; originally announced July 2020.

Comments: 9 pages, 1 figure, technical report

arXiv:2007.04250 [pdf, other]

A Benchmark of Medical Out of Distribution Detection

Authors: Tianshi Cao, Chin-Wei Huang, David Yu-Tung Hui, Joseph Paul Cohen

Abstract: Motivation: Deep learning models deployed for use on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods in order to avoid erroneous predictions. However it is unclear which OoDD method should be used in practice. Specific Problem: Systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be flagged by an OoDD method prior to diagnosis. Our approach: This paper defines 3 categories of OoD examples and benchmarks popular OoDD methods in three domains of medical imaging: chest X-ray, fundus imaging, and histology slides. Results: Our experiments show that despite methods yielding good results on some categories of out-of-distribution samples, they fail to recognize images close to the training distribution. Conclusion: We find a simple binary classifier on the feature representation has the best accuracy and AUPRC on average. Users of diagnostic tools which employ these OoDD methods should still remain vigilant that images very close to the training distribution yet not in it could yield unexpected results.

Submitted 4 August, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

Comments: Submitted to Machine Learning for Biomedical Imaging Journal (MELBA)

arXiv:2002.00412 [pdf, other]

Combating False Negatives in Adversarial Imitation Learning

Authors: Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

Abstract: In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning and consequently lead to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the 'False Negatives' (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.

Submitted 2 February, 2020; originally announced February 2020.

Comments: This is an extended version of the student abstract published at 34th AAAI Conference on Artificial Intelligence

arXiv:1809.05076 [pdf, other]

Computer Vision-aided Atom Tracking in STEM Imaging

Authors: Yawei Hui, Yaohua Liu

Abstract: To address the SMC'17 data challenge -- "Data mining atomically resolved images for material properties", we first used the classic "blob detection" algorithms developed in computer vision to identify all atom centers in each STEM image frame. With the help of nearest neighbor analysis, we then found and labeled every atom center common to all the STEM frames and tracked their movements through the given time interval for both molybdenum and selenium atoms.

Submitted 13 September, 2018; originally announced September 2018.

arXiv:1809.05039 [pdf, other]

Discovering Features in Sr$_{14}$Cu$_{24}$O$_{41}$ Neutron Single Crystal Diffraction Data by Cluster Analysis

Authors: Yawei Hui, Yaohua Liu, Byung-Hoon Park

Abstract: To address the SMC'18 data challenge, "Discovering Features in Sr$_{14}$Cu$_{24}$O$_{41}$", we have used the clustering algorithm "DBSCAN" to separate the diffuse scattering features from the Bragg peaks, which takes into account both spatial and photometric information in the dataset during the clustering process. We find that, in addition to highly localized Bragg peaks, there exist broad diffuse scattering patterns consisting of distinguishable geometries. Besides these two distinctive features, we also identify a third distinguishable feature submerged in the low signal-to-noise region in the reciprocal space, whose origin remains an open question.

Submitted 13 September, 2018; originally announced September 2018.

arXiv:1710.05994 [pdf]

Volumetric Data Exploration with Machine Learning-Aided Visualization in Neutron Science

Authors: Yawei Hui, Yaohua Liu

Abstract: Recent advancements in neutron and X-ray sources, instrumentation and data collection modes have significantly increased the experimental data size (which could easily contain 10$^{8}$ -- 10$^{10}$ data points), so that conventional volumetric visualization approaches become inefficient for both still imaging and interactive OpenGL rendition in a 3D setting. We introduce a new approach based on the unsupervised machine learning algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), to efficiently analyze and visualize large volumetric datasets. Here we present two examples of analyzing and visualizing datasets from the diffuse scattering experiment of a single crystal sample and the tomographic reconstruction of a neutron scanning of a turbine blade. We found that by using the intensity as the weighting factor in the clustering process, DBSCAN becomes very effective in de-noising and feature/boundary detection, and thus enables better visualization of the hierarchical internal structures of the neutron scattering data.

Submitted 24 September, 2018; v1 submitted 16 October, 2017; originally announced October 2017.

Comments: 14 pages, 7 figures; the Computer Vision Conference (CVC), Las Vegas, Nevada, 2019; accepted

arXiv:1305.0540 [pdf, other]

Privacy Preserving Recommendation System Based on Groups

Authors: Shang Shang, Yuk Hui, Pan Hui, Paul Cuff, Sanjeev Kulkarni

Abstract: Recommendation systems have received considerable attention in recent decades. Yet with the development of information technology and social media, the risk in revealing private data to service providers has been a growing concern to more and more users. Trade-offs between quality and privacy in recommendation systems naturally arise. In this paper, we present a privacy preserving recommendation framework based on groups. The main idea is to use groups as a natural middleware to preserve users' privacy. A distributed preference exchange algorithm is proposed to ensure the anonymity of data, wherein the effective size of the anonymity set asymptotically approaches the group size with time. We construct a hybrid collaborative filtering model based on Markov random walks to provide recommendations and predictions to group members. Experimental results on the MovieLens and Epinions datasets show that our proposed methods outperform the baseline methods, L+ and ItemRank, two state-of-the-art personalized recommendation algorithms, for both recommendation precision and hit rate despite the absence of personal preference information.

Submitted 13 May, 2013; v1 submitted 2 May, 2013; originally announced May 2013.

Showing 1–38 of 38 results for author: Hui, Y