Showing 1–48 of 48 results for author: She, Q

Searching in archive cs.
  1. arXiv:2406.18193  [pdf, ps, other]

    cs.CV cs.AI

    MammothModa: Multi-Modal Large Language Model

    Authors: Qi She, Junwen Pan, Xin Wan, Rui Zhang, Dawei Lu, Kai Huang

    Abstract: In this report, we introduce MammothModa, yet another multi-modal large language model (MLLM) designed to achieve state-of-the-art performance starting from an elementary baseline. We focus on three key design insights: (i) Integrating Visual Capabilities while Maintaining Complex Language Understanding: In addition to the vision encoder, we incorporated the Visual Attention Experts into the LLM t…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Technical report

  2. arXiv:2406.08100  [pdf, other]

    cs.CL cs.AI

    Multimodal Table Understanding

    Authors: Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, Weiping Wang

    Abstract: Although great progress has been made by previous table understanding methods including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text sequence (such as Markdown or HTML) to serve as model input. However, it is difficult to access such high-quality textual table representations in some real-world sce…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 23 pages, 16 figures, ACL 2024 main conference, camera-ready version

  3. arXiv:2404.09150  [pdf, other]

    cs.RO cs.GR

    Learning Cross-hand Policies for High-DOF Reaching and Grasping

    Authors: Qijin She, Shishun Zhang, Yunfan Ye, Min Liu, Ruizhen Hu, Kai Xu

    Abstract: Reaching-and-grasping is a fundamental skill for robotic manipulation, but existing methods usually train models on a specific gripper and cannot be reused on another gripper without retraining. In this paper, we propose a novel method that can learn a unified policy model that can be easily transferred to different dexterous grippers. Our method consists of two stages: a gripper-agnostic policy m…

    Submitted 14 April, 2024; originally announced April 2024.

  4. arXiv:2404.07473  [pdf]

    eess.IV cs.CV cs.LG

    LUCF-Net: Lightweight U-shaped Cascade Fusion Network for Medical Image Segmentation

    Authors: Songkai Sun, Qingshan She, Yuliang Ma, Rihui Li, Yingchun Zhang

    Abstract: In this study, the performance of existing U-shaped neural network architectures was enhanced for medical image segmentation by adding Transformer. Although Transformer architectures are powerful at extracting global information, its ability to capture local information is limited due to its high complexity. To address this challenge, we proposed a new lightweight U-shaped cascade fusion network (…

    Submitted 11 April, 2024; originally announced April 2024.

  5. arXiv:2402.13634  [pdf, other]

    cs.RO cs.LG

    Learning Dual-arm Object Rearrangement for Cartesian Robots

    Authors: Shishun Zhang, Qijin She, Wenhao Li, Chenyang Zhu, Yongjun Wang, Ruizhen Hu, Kai Xu

    Abstract: This work focuses on the dual-arm object rearrangement problem abstracted from a realistic industrial scenario of Cartesian robots. The goal of this problem is to transfer all the objects from sources to targets with the minimum total completion time. To achieve the goal, the core idea is to develop an effective object-to-arm task assignment strategy for minimizing the cumulative task execution ti…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 7 pages, 9 figures, conference

  6. arXiv:2303.13824  [pdf, other]

    cs.CL cs.AI

    $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference

    Authors: Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She, Yongdong Zhang

    Abstract: In-Context Learning (ICL), which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing utilization of LLMs. In this paper, we first disclose an actual predicament for this typical usage that it can not scale up with training data due to context length restriction. Besides, existing works have shown that ICL also suffers from various biases…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: ICLR 2023. Code is available at https://github.com/BenfengXu/KNNPrompting

    Journal ref: ICLR 2023

  7. arXiv:2210.16031  [pdf, other]

    cs.CV cs.CL

    UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance

    Authors: Wei Li, Xue Xu, Xinyan Xiao, Jiachen Liu, Hu Yang, Guohao Li, Zhanpeng Wang, Zhifan Feng, Qiaoqiao She, Yajuan Lyu, Hua Wu

    Abstract: Diffusion generative models have recently greatly improved the power of text-conditioned image generation. Existing image generation models mainly include text conditional diffusion model and cross-modal guided diffusion model, which are good at small scene image generation and complex scene image generation respectively. In this work, we propose a simple yet effective approach, namely UPainting,…

    Submitted 2 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: First Version, 16 pages

  8. arXiv:2208.03720  [pdf, other]

    cs.CV

    PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs

    Authors: Zhengyang Shen, Tao Hong, Qi She, Jinwen Ma, Zhouchen Lin

    Abstract: Steerable models can provide very general and flexible equivariance by formulating equivariance requirements in the language of representation theory and feature fields, which has been recognized to be effective for many vision tasks. However, deriving steerable models for 3D rotations is much more difficult than that in the 2D case, due to more complicated mathematics of 3D rotations. In this wor…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: accepted by ICML2022

  9. arXiv:2208.00399  [pdf, other]

    cs.CL cs.AI

    Neural Knowledge Bank for Pretrained Transformers

    Authors: Damai Dai, Wenbin Jiang, Qingxiu Dong, Yajuan Lyu, Qiaoqiao She, Zhifang Sui

    Abstract: The ability of pretrained Transformers to remember factual knowledge is essential but still limited for existing models. Inspired by existing work that regards Feed-Forward Networks (FFNs) in Transformers as key-value memories, we design a Neural Knowledge Bank (NKB) and a knowledge injection strategy to introduce extra factual knowledge for pretrained Transformers. The NKB is in the form of addit…

    Submitted 16 August, 2022; v1 submitted 31 July, 2022; originally announced August 2022.

  10. arXiv:2205.12593  [pdf, other]

    cs.CL

    Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

    Authors: Yanrui Du, Jing Yan, Yan Chen, Jing Liu, Sendong Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Bing Qin

    Abstract: Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define the word highly co-occurring…

    Submitted 22 June, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

  11. arXiv:2204.13998  [pdf, other]

    cs.RO cs.CV cs.GR

    Learning High-DOF Reaching-and-Grasping via Dynamic Representation of Gripper-Object Interaction

    Authors: Qijin She, Ruizhen Hu, Juzhan Xu, Min Liu, Kai Xu, Hui Huang

    Abstract: We approach the problem of high-DOF reaching-and-grasping via learning joint planning of grasp and motion with deep reinforcement learning. To resolve the sample efficiency issue in learning the high-dimensional and complex control of dexterous grasping, we propose an effective representation of grasping state characterizing the spatial interaction between the gripper and the target object. To rep…

    Submitted 3 April, 2022; originally announced April 2022.

  12. arXiv:2203.10232  [pdf, other]

    cs.CL cs.IR

    DuReader_retrieval: A Large-scale Chinese Benchmark for Passage Retrieval from Web Search Engine

    Authors: Yifu Qiu, Hongyu Li, Yingqi Qu, Ying Chen, Qiaoqiao She, Jing Liu, Hua Wu, Haifeng Wang

    Abstract: In this paper, we present DuReader_retrieval, a large-scale Chinese dataset for passage retrieval. DuReader_retrieval contains more than 90K queries and over 8M unique passages from a commercial search engine. To alleviate the shortcomings of other datasets and ensure the quality of our benchmark, we (1) reduce the false negatives in development and test sets by manually annotating results pooled…

    Submitted 15 November, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: EMNLP 2022, 13 pages

  13. arXiv:2203.01785  [pdf, other]

    cs.LG cs.CV

    On Learning Contrastive Representations for Learning with Noisy Labels

    Authors: Li Yi, Sheng Liu, Qi She, A. Ian McLeod, Boyu Wang

    Abstract: Deep neural networks are able to memorize noisy labels easily with a softmax cross-entropy (CE) loss. Previous studies attempted to address this issue focus on incorporating a noise-robust loss function to the CE loss. However, the memorization issue is alleviated but still remains due to the non-robust CE loss. To address this issue, we focus on learning robust contrastive representations of data…

    Submitted 23 July, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  14. arXiv:2203.01714  [pdf, other]

    cs.CV cs.LG

    Weakly Supervised Object Localization as Domain Adaption

    Authors: Lei Zhu, Qi She, Qian Chen, Yunfei You, Boyu Wang, Yanye Lu

    Abstract: Weakly supervised object localization (WSOL) focuses on localizing objects only with the supervision of image-level classification masks. Most previous WSOL methods follow the classification activation map (CAM) that localizes objects based on the classification structure with the multi-instance learning (MIL) mechanism. However, the MIL mechanism makes CAM only activate discriminative object part…

    Submitted 24 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Accept by CVPR 2022 Conference

  15. arXiv:2112.14379  [pdf, other]

    cs.CV

    Background-aware Classification Activation Map for Weakly Supervised Object Localization

    Authors: Lei Zhu, Qi She, Qian Chen, Xiangxi Meng, Mufeng Geng, Lujia Jin, Zhe Jiang, Bin Qiu, Yunfei You, Yibao Zhang, Qiushi Ren, Yanye Lu

    Abstract: Weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization by using image-level classification masks to supervise its learning process. However, current WSOL methods suffer from excessive activation of background locations and need post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of backgro…

    Submitted 28 December, 2021; originally announced December 2021.

  16. arXiv:2111.13241  [pdf, other]

    cs.CV

    Learning from Temporal Gradient for Semi-supervised Action Recognition

    Authors: Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She, Zongwei Zhou, Alan Yuille, Yingwei Li

    Abstract: Semi-supervised video action recognition tends to enable deep neural networks to achieve remarkable performance even with very limited labeled data. However, existing methods are mainly transferred from current image-based methods (e.g., FixMatch). Without specifically utilizing the temporal dynamics and inherent multimodal attributes, their results could be suboptimal. To better leverage the enco…

    Submitted 23 April, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: CVPR 2022

  17. arXiv:2110.08814  [pdf, other]

    cs.CV

    TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding

    Authors: Zhengwei Wang, Qi She, Aljosa Smolic

    Abstract: Most of existing video action recognition models ingest raw RGB frames. However, the raw video stream requires enormous storage and contains significant temporal redundancy. Video compression (e.g., H.264, MPEG-4) reduces superfluous information by representing the raw video stream using the concept of Group of Pictures (GOP). Each GOP is composed of the first I-frame (aka RGB image) followed by a…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: To appear in BMVC 2021

  18. arXiv:2110.07367  [pdf, other]

    cs.CL

    RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

    Authors: Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Ji-Rong Wen

    Abstract: In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures in finding and ranking relevant information. Since both the two procedures contribute to the final performance, it is important to jointly optimize them in order to achieve mutual improvement. In this paper, we propose a novel joint training approach for dense passage retrieval and passage…

    Submitted 23 April, 2023; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021

  19. arXiv:2110.02794  [pdf, other]

    cs.CV

    3rd Place Solution to Google Landmark Recognition Competition 2021

    Authors: Cheng Xu, Weimin Wang, Shuai Liu, Yong Wang, Yuxiang Tang, Tianling Bian, Yanyu Yan, Qi She, Cheng Yang

    Abstract: In this paper, we show our solution to the Google Landmark Recognition 2021 Competition. Firstly, embeddings of images are extracted via various architectures (i.e. CNN-, Transformer- and hybrid-based), which are optimized by ArcFace loss. Then we apply an efficient pipeline to re-rank predictions by adjusting the retrieval score with classification logits and non-landmark distractors. Finally, th…

    Submitted 7 October, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: Corrected typos

  20. PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval

    Authors: Ruiyang Ren, Shangwen Lv, Yingqi Qu, Jing Liu, Wayne Xin Zhao, QiaoQiao She, Hua Wu, Haifeng Wang, Ji-Rong Wen

    Abstract: Recently, dense passage retrieval has become a mainstream approach to finding relevant information in various natural language processing tasks. A number of studies have been devoted to improving the widely adopted dual-encoder architecture. However, most of the previous studies only consider query-centric similarity relation when learning the dual-encoder retriever. In order to capture more compr…

    Submitted 23 April, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: ACL 2021

  21. arXiv:2108.05722  [pdf, other]

    cs.CV cs.LG

    MT-ORL: Multi-Task Occlusion Relationship Learning

    Authors: Panhe Feng, Qi She, Lei Zhu, Jiaxin Li, Lin Zhang, Zijian Feng, Changhu Wang, Chunpeng Li, Xuejing Kang, Anlong Ming

    Abstract: Retrieving occlusion relation among objects in a single image is challenging due to sparsity of boundaries in image. We observe two key issues in existing works: firstly, lack of an architecture which can exploit the limited amount of coupling in the decoder stage between the two subtasks, namely occlusion boundary extraction and occlusion orientation prediction, and secondly, improper representat…

    Submitted 18 August, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV 2021

  22. arXiv:2108.02451  [pdf, other]

    cs.CV cs.AI

    Unifying Nonlocal Blocks for Neural Networks

    Authors: Lei Zhu, Qi She, Duo Li, Yanye Lu, Xuejing Kang, Jie Hu, Changhu Wang

    Abstract: The nonlocal-based blocks are designed for capturing long-range spatial-temporal dependencies in computer vision tasks. Although having shown excellent performance, they still lack the mechanism to encode the rich, structured information among elements in an image or video. In this paper, to theoretically analyze the property of these nonlocal-based blocks, we provide a new perspective to interpre…

    Submitted 17 August, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

    Comments: Accept by ICCV 2021 Conference

  23. arXiv:2107.12025  [pdf, other]

    cs.IR cs.AI

    ContextNet: A Click-Through Rate Prediction Framework Using Contextual information to Refine Feature Embedding

    Authors: Zhiqiang Wang, Qingyun She, PengTao Zhang, Junlin Zhang

    Abstract: Click-through rate (CTR) estimation is a fundamental task in personalized advertising and recommender systems and it's important for ranking models to effectively capture complex high-order features. Inspired by the success of ELMO and Bert in NLP field, which dynamically refine word embedding according to the context sentence information where the word appears, we think it's also important to dyna…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: text overlap with arXiv:2102.07619

  24. arXiv:2107.12024  [pdf, other]

    cs.IR cs.AI

    Leaf-FM: A Learnable Feature Generation Factorization Machine for Click-Through Rate Prediction

    Authors: Qingyun She, Zhiqiang Wang, Junlin Zhang

    Abstract: Click-through rate (CTR) prediction plays important role in personalized advertising and recommender systems. Though many models have been proposed such as FM, FFM and DeepFM in recent years, feature engineering is still a very important way to improve the model performance in many applications because using raw features can rarely lead to optimal results. For example, the continuous features are…

    Submitted 26 July, 2021; originally announced July 2021.

  25. arXiv:2107.11098  [pdf, other]

    cs.LG cs.AI

    Generative adversarial networks in time series: A survey and taxonomy

    Authors: Eoin Brophy, Zhengwei Wang, Qi She, Tomas Ward

    Abstract: Generative adversarial networks (GANs) studies have grown exponentially in the past few years. Their impact has been seen mainly in the computer vision field with realistic image and video manipulation, especially generation, making significant advancements. While these computer vision advances have garnered much attention, GAN applications have diversified across disciplines such as time series a…

    Submitted 23 July, 2021; originally announced July 2021.

  26. arXiv:2107.01194  [pdf, other]

    cs.CV cs.AI

    Inter-intra Variant Dual Representations for Self-supervised Video Recognition

    Authors: Lin Zhang, Qi She, Zhengyang Shen, Changhu Wang

    Abstract: Contrastive learning applied to self-supervised representation learning has seen a resurgence in deep models. In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video. We thus propose to learn dual representations for each clip which (\romann…

    Submitted 23 October, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: Accepted by BMVC 2021

  27. arXiv:2104.00405  [pdf, other]

    cs.LG cs.AI cs.CV

    Avalanche: an End-to-End Library for Continual Learning

    Authors: Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu, Antonio Carta, Gabriele Graffieti, Tyler L. Hayes, Matthias De Lange, Marc Masana, Jary Pomponi, Gido van de Ven, Martin Mundt, Qi She, Keiland Cooper, Jeremy Forest, Eden Belouadah, Simone Calderara, German I. Parisi, Fabio Cuzzolin, Andreas Tolias, Simone Scardapane, Luca Antiga, Subutai Amhad, Adrian Popescu, Christopher Kanan, Joost van de Weijer, et al. (3 additional authors not shown)

    Abstract: Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning. Recently, we have witnessed a renewed and fast-growing interest in continual learning, especially within the deep learning community. However, algorithmic solutions are often difficult to re-implement, evaluate and port across different settings, where even results on standa…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: Official Website: https://avalanche.continualai.org

  28. arXiv:2103.14910  [pdf, other]

    cs.CV cs.GR cs.LG

    MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis

    Authors: Jiaxin Li, Zijian Feng, Qi She, Henghui Ding, Changhu Wang, Gim Hee Lee

    Abstract: In this paper, we propose MINE to perform novel view synthesis and depth estimation via dense 3D reconstruction from a single image. Our approach is a continuous depth generalization of the Multiplane Images (MPI) by introducing the NEural radiance fields (NeRF). Given a single image as input, MINE predicts a 4-channel image (RGB and volume density) at arbitrary depth values to jointly reconstruct…

    Submitted 30 July, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: ICCV 2021. Main paper and supplementary materials

  29. arXiv:2103.10681  [pdf, other]

    cs.CV cs.LG

    Learning the Superpixel in a Non-iterative and Lifelong Manner

    Authors: Lei Zhu, Qi She, Bin Zhang, Yanye Lu, Zhilin Lu, Duo Li, Jie Hu

    Abstract: Superpixel is generated by automatically clustering pixels in an image into hundreds of compact partitions, which is widely used to perceive the object contours for its excellent contour adherence. Although some works use the Convolution Neural Network (CNN) to generate high-quality superpixel, we challenge the design principles of these networks, specifically for their dependence on manual labels…

    Submitted 21 April, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: Accept by CVPR2021

  30. arXiv:2103.07372  [pdf, other]

    cs.CV

    ACTION-Net: Multipath Excitation for Action Recognition

    Authors: Zhengwei Wang, Qi She, Aljosa Smolic

    Abstract: Spatial-temporal, channel-wise, and motion patterns are three complementary and crucial types of information for video action recognition. Conventional 2D CNNs are computationally cheap but cannot catch temporal relationships; 3D CNNs can achieve good performance but are computationally intensive. In this work, we tackle this dilemma by designing a generic and effective module that can be embedded…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: To appear in CVPR 2021

  31. arXiv:2103.06255  [pdf, other]

    cs.CV

    Involution: Inverting the Inherence of Convolution for Visual Recognition

    Authors: Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen

    Abstract: Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. In this work, we rethink the inherent principles of standard convolution for vision tasks, specifically spatial-agnostic and channel-specific. Instead, we present a novel atomic operation for deep neural networks by inverting the aforementioned design principles of convolution, coin…

    Submitted 11 April, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021. Code and models are available at https://github.com/d-li14/involution

  32. arXiv:2102.07619  [pdf, other]

    cs.IR

    MaskNet: Introducing Feature-Wise Multiplication to CTR Ranking Models by Instance-Guided Mask

    Authors: Zhiqiang Wang, Qingyun She, Junlin Zhang

    Abstract: Click-Through Rate(CTR) estimation has become one of the most fundamental tasks in many real-world applications and it's important for ranking models to effectively capture complex high-order features. Shallow feed-forward network is widely used in many state-of-the-art DNN models such as FNN, DeepFM and xDeepFM to implicitly capture high-order feature interactions. However, some research has prov…

    Submitted 26 July, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: In Proceedings of DLP-KDD 2021. ACM,Singapore. arXiv admin note: text overlap with arXiv:2006.12753

  33. arXiv:2009.09929  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions

    Authors: Vincenzo Lomonaco, Lorenzo Pellegrini, Pau Rodriguez, Massimo Caccia, Qi She, Yu Chen, Quentin Jodelet, Ruiping Wang, Zheda Mai, David Vazquez, German I. Parisi, Nikhil Churamani, Marc Pickett, Issam Laradji, Davide Maltoni

    Abstract: In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks with the shared objective of making current AI systems more adaptive, efficient and autonomous. However, despite the significant and undoubted progress of the field in addressing the issue of catastrophic forgetting, benchmarking different continual learning approaches is a…

    Submitted 14 September, 2020; originally announced September 2020.

    Comments: Pre-print v1: 12 pages, 3 figures, 8 tables

  34. arXiv:2009.05959  [pdf, other]

    cs.CL cs.AI

    BoostingBERT:Integrating Multi-Class Boosting into BERT for NLP Tasks

    Authors: Tongwen Huang, Qingyun She, Junlin Zhang

    Abstract: As a pre-trained Transformer model, BERT (Bidirectional Encoder Representations from Transformers) has achieved ground-breaking performance on multiple NLP tasks. On the other hand, Boosting is a popular ensemble learning technique which combines many base classifiers and has been demonstrated to yield better generalization performance in many machine learning tasks. Some works have indicated that…

    Submitted 13 September, 2020; originally announced September 2020.

    Comments: 11 pages, 3 figures

  35. arXiv:2007.03519  [pdf, other]

    cs.LG

    GateNet: Gating-Enhanced Deep Network for Click-Through Rate Prediction

    Authors: Tongwen Huang, Qingyun She, Zhiqiang Wang, Junlin Zhang

    Abstract: Advertising and feed ranking are essential to many Internet companies such as Facebook. Among many real-world advertising and feed ranking systems, click through rate (CTR) prediction plays a central role. In recent years, many neural network based CTR models have been proposed and achieved success such as Factorization-Machine Supported Neural Networks, DeepFM and xDeepFM. Many of them contain tw…

    Submitted 6 July, 2020; originally announced July 2020.

  36. arXiv:2006.14978  [pdf, other]

    cs.LG stat.ML

    Online 3D Bin Packing with Constrained Deep Reinforcement Learning

    Authors: Hang Zhao, Qijin She, Chenyang Zhu, Yin Yang, Kai Xu

    Abstract: We solve a challenging yet practically useful variant of 3D Bin Packing Problem (3D-BPP). In our problem, the agent has limited information about the items to be packed into the bin, and an item must be packed immediately after its arrival without buffering or readjusting. The item's placement also subjects to the constraints of collision avoidance and physical stability. We formulate this online…

    Submitted 13 January, 2022; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: AAAI 2021

  37. arXiv:2006.12753  [pdf, other]

    cs.LG stat.ML

    Correct Normalization Matters: Understanding the Effect of Normalization On Deep Neural Network Models For Click-Through Rate Prediction

    Authors: Zhiqiang Wang, Qingyun She, PengTao Zhang, Junlin Zhang

    Abstract: Normalization has become one of the most fundamental components in many deep neural networks for machine learning tasks while deep neural network has also been widely used in CTR estimation field. Among most of the proposed deep neural network models, few model utilize normalization approaches. Though some works such as Deep & Cross Network (DCN) and Neural Factorization Machine (NFM) use Batch No…

    Submitted 7 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

  38. arXiv:2004.14774  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV stat.ML

    IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

    Authors: Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet, et al. (11 additional authors not shown)

    Abstract: This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and Automation Magazine. arXiv admin note: text overlap with arXiv:1911.06487

  39. arXiv:2004.09215  [pdf, other]

    cs.CV cs.LG eess.IV

    CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition

    Authors: Zhengwei Wang, Qi She, Tejo Chalasani, Aljosa Smolic

    Abstract: Egocentric gestures are the most natural form of communication for humans to interact with wearable devices such as VR/AR helmets and glasses. A major issue in such scenarios for real-world applications is that may easily become necessary to add new gestures to the system e.g., a proper VR system should allow users to customize gestures incrementally. Traditional deep learning methods require stor…

    Submitted 20 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 Workshop at Continual Learning (CLVISION)

  40. arXiv:2003.03193  [pdf, other]

    cs.CV cs.HC cs.LG eess.IV

    A Neuro-AI Interface for Evaluating Generative Adversarial Networks

    Authors: Zhengwei Wang, Qi She, Alan F. Smeaton, Tomas E. Ward, Graham Healy

    Abstract: Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity between real and generated images using automated statistical methods. They often…

    Submitted 6 April, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted by ICLR 2020 Workshop Bridging AI and Cognitive Science (BAICS). arXiv admin note: substantial text overlap with arXiv:1905.04243

  41. arXiv:1911.06487  [pdf, other]

    cs.CV cs.LG cs.RO stat.ML

    OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning

    Authors: Qi She, Fan Feng, Xinyue Hao, Qihan Yang, Chuanlin Lan, Vincenzo Lomonaco, Xuesong Shi, Zhengwei Wang, Yao Guo, Yimin Zhang, Fei Qiao, Rosa H. M. Chan

    Abstract: The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models eac…

    Submitted 6 March, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

    Comments: 7 pages, 7 figures, 4 tables

  42. arXiv:1911.05603  [pdf, other]

    cs.RO cs.CV

    Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM

    Authors: Xuesong Shi, Dongjiang Li, Pengpeng Zhao, Qinbin Tian, Yuxin Tian, Qiwei Long, Chunhao Zhu, Jingwei Song, Fei Qiao, Le Song, Yangquan Guo, Zhigang Wang, Yimin Zhang, Baoxing Qin, Wei Yang, Fangshi Wang, Rosa H. M. Chan, Qi She

    Abstract: Service robots should be able to operate autonomously in dynamic and daily changing environments over an extended period of time. While Simultaneous Localization And Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated with data sequences that are recorded in a short period of time. In real-world deployment, there can be out-of-sight s…

    Submitted 13 March, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: To be published on ICRA 2020; 7 pages, 3 figures; v2 fixed a number in Table III

  43. arXiv:1911.01059  [pdf, other]

    cs.CV

    A Spectral Nonlocal Block for Neural Networks

    Authors: Lei Zhu, Qi She, Lidan Zhang, Ping Guo

    Abstract: The nonlocal-based blocks are designed for capturing long-range spatial-temporal dependencies in computer vision tasks. Although having shown excellent performances, they lack the mechanism to encode the rich, structured information among elements in an image. In this paper, to theoretically analyze the property of these nonlocal-based blocks, we provide a unified approach to interpreting them, wh…

    Submitted 10 February, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

  44. arXiv:1907.10233  [pdf, ps, other]

    cs.CV

    Stochastic trajectory prediction with social graph network

    Authors: Lidan Zhang, Qi She, Ping Guo

    Abstract: Pedestrian trajectory prediction is a challenging task because of the complexity of real-world human social behaviors and uncertainty of the future motion. For the first issue, existing methods adopt fully connected topology for modeling the social behaviors, while ignoring non-symmetric pairwise relationships. To effectively capture social behaviors of relevant pedestrians, we utilize a directed…

    Submitted 24 July, 2019; originally announced July 2019.

    Comments: 10 pages, 5 figures

  45. arXiv:1907.00650  [pdf, other]

    cs.LG q-bio.NC stat.ML

    Neural Dynamics Discovery via Gaussian Process Recurrent Neural Networks

    Authors: Qi She, Anqi Wu

    Abstract: Latent dynamics discovery is challenging in extracting complex dynamics from high-dimensional noisy neural data. Many dimensionality reduction methods have been widely adopted to extract low-dimensional, smooth and time-evolving latent trajectories. However, simple state transition structures, linear embedding assumptions, or inflexible inference networks impede the accurate recovery of dynamic po…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: 11 pages, 3 figures, 7 Tables, accepted to The Conference on Uncertainty in Artificial Intelligence (UAI), 2019

  46. arXiv:1906.01529  [pdf, other]

    cs.LG cs.CV

    Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy

    Authors: Zhengwei Wang, Qi She, Tomas E. Ward

    Abstract: Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant successes achieved to date, applying GAN…

    Submitted 29 December, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted by ACM Computing Surveys, 23 November 2020

  47. arXiv:1905.04243  [pdf, other]

    cs.CV cs.LG eess.IV eess.SP

    Synthetic-Neuroscore: Using A Neuro-AI Interface for Evaluating Generative Adversarial Networks

    Authors: Zhengwei Wang, Qi She, Alan F. Smeaton, Tomas E. Ward, Graham Healy

    Abstract: Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. Arguably the most striking results have been in the area of image synthesis. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity betwe…

    Submitted 2 February, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

  48. arXiv:1711.05073  [pdf, other]

    cs.CL

    DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

    Authors: Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, Haifeng Wang

    Abstract: This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao; answers are manually generated. (2) question types: it provides rich annotations for more question typ…

    Submitted 10 June, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: 10 pages, ACL 2018 MRQA Workshop camera-ready version