
Showing 1–50 of 59 results for author: Jizhong

Searching in archive cs.
  1. arXiv:2406.13294

    cs.MM cs.LG

    Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

    Authors: Xikang Yang, Xuehai Tang, Fuqing Zhu, Jizhong Han, Songlin Hu

    Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the o…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  2. arXiv:2406.13275

    cs.SD cs.CL eess.AS

    Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

    Authors: Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang

    Abstract: Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language. Recently, the advancements in large language models (LLMs), with improvements in training approaches for audio encoders, have opened up possibilities for improving AAC. Thus, we explore enhancing AAC from three aspects: 1) a pre-trained audio encoder via consistent ensemble distillation (CED)…

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.07362

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impact hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain unexplored and thus hinder AI f…

    Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages

  4. arXiv:2406.07012

    cs.SD cs.CL eess.AS

    Bridging Language Gaps in Audio-Text Retrieval

    Authors: Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang

    Abstract: Audio-text retrieval is a challenging task, requiring the search for an audio clip or a text caption within a database. The predominant focus of existing research on English descriptions poses a limitation on the applicability of such models, given the abundance of non-English content in real-world data. To address these linguistic disparities, we propose a language enhancement (LE), using a multi…

    Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  5. arXiv:2405.05610

    cs.CL cs.CR cs.LG

    Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM

    Authors: Xikang Yang, Xuehai Tang, Songlin Hu, Jizhong Han

    Abstract: Large language models (LLMs) have achieved remarkable performance in various natural language processing tasks, especially in dialogue systems. However, LLMs may also pose security and moral threats, especially in multi-round conversations where large models are more easily guided by contextual content, resulting in harmful or biased responses. In this paper, we present a novel method to attack LLM…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2404.19171

    cs.CV cs.AI

    Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection

    Authors: Cai Yu, Shan Jia, Xiaomeng Fu, Jin Liu, Jiahe Tian, Jiao Dai, Xi Wang, Siwei Lyu, Jizhong Han

    Abstract: With the rising prevalence of deepfakes, there is a growing interest in developing generalizable detection methods for various types of deepfakes. While effective in their specific modalities, traditional detection methods fall short in addressing the generalizability of detection across diverse cross-modal deepfakes. This paper aims to explicitly learn potential cross-modal correlation to enhance…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by ICME 2024

  7. Surveyor: Facilitating Discovery Within Video Games for Blind and Low Vision Players

    Authors: Vishnu Nair, Hanxiu 'Hazel' Zhu, Peize Song, Jizhong Wang, Brian A. Smith

    Abstract: Video games are increasingly accessible to blind and low vision (BLV) players, yet many aspects remain inaccessible. One aspect is the joy players feel when they explore environments and make new discoveries, which is integral to many games. Sighted players experience discovery by surveying environments and identifying unexplored areas. Current accessibility tools, however, guide BLV players direc…

    Submitted 15 March, 2024; originally announced March 2024.

    Journal ref: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24), May 2024

  8. arXiv:2403.08487

    cs.CV

    Model Will Tell: Training Membership Inference for Diffusion Models

    Authors: Xiaomeng Fu, Xi Wang, Qiao Li, Jin Liu, Jiao Dai, Jizhong Han

    Abstract: Diffusion models pose risks of privacy breaches and copyright disputes, primarily stemming from the potential utilization of unauthorized data during the training phase. The Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model, representing a critical tool for privacy violation verification. However, the increa…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 18 pages, 6 figures, 7 tables

  9. arXiv:2312.01663

    cs.CV cs.AI

    Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

    Authors: Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu

    Abstract: In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining desired editing results that conform to the editing prompt is nontrivial since there are two significant challenges: accurate editing of only foreground regions and multi-view consistency…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 14 pages, 13 figures, project website: https://customnerf.github.io/

  10. arXiv:2311.15570

    cs.LG cs.CV

    UFDA: Universal Federated Domain Adaptation with Practical Assumptions

    Authors: Xinhui Liu, Zhenghao Chen, Luping Zhou, Dong Xu, Wei Xi, Gairui Bai, Yihan Zhao, Jizhong Zhao

    Abstract: Conventional Federated Domain Adaptation (FDA) approaches usually demand an abundance of assumptions, which makes them significantly less feasible for real-world situations and introduces security hazards. This paper relaxes the assumptions from previous FDAs and studies a more practical scenario named Universal Federated Domain Adaptation (UFDA). It only requires the black-box model and the label…

    Submitted 19 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted by AAAI 2024

    Journal ref: AAAI 2024

  11. arXiv:2311.01091

    cs.CV

    Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

    Authors: Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

    Abstract: Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption. As a multimodal task, an essential aspect of PNG is the visual-linguistic interaction between image and caption. The previous two-stage method aggregates visual contexts from offline-generated mask proposals to phrase features, which tend to be noisy and fragmen…

    Submitted 10 March, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted by IJCAI 2023. Since the PNG benchmark partitions data differently from ours, we update the experimental results on the things/stuff/singulars/plurals subsets based on PNG's code

  12. arXiv:2309.16148

    cs.CV

    OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: One-shot talking head generation has no explicit head movement reference, so it is difficult to generate talking heads with head motions. Some existing works only edit the mouth area and generate still talking heads, leading to unrealistic talking head performance. Other works construct a one-to-one mapping between audio signal and head motion sequences, introducing ambiguous correspondences into the m…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Paper Under Review

  13. arXiv:2309.09501

    cs.CV

    Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation

    Authors: Shaofei Huang, Han Li, Yuqing Wang, Hongji Zhu, Jiao Dai, Jizhong Han, Wenge Rong, Si Liu

    Abstract: Audio visual segmentation (AVS) aims to segment the sounding objects for each frame of a given video. To distinguish the sounding objects from silent ones, both audio-visual semantic correspondence and temporal interaction are required. The previous method applies multi-frame cross-modal attention to conduct pixel-level interactions between audio features and visual features of multiple frames sim…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by IJCAI 2023

  14. arXiv:2308.16635

    cs.CV

    MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: Face-to-face communication is a common scenario including roles of speakers and listeners. Most existing research methods focus on producing speaker videos, while the generation of listener heads remains largely overlooked. Responsive listening head generation is an important task that aims to model face-to-face communication scenarios by generating a listener head video given a speaker video and…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  15. arXiv:2307.14491

    cs.MM cs.SD eess.AS

    A Unified Framework for Modality-Agnostic Deepfakes Detection

    Authors: Cai Yu, Peng Chen, Jiahe Tian, Jin Liu, Jiao Dai, Xi Wang, Yesheng Chai, Shan Jia, Siwei Lyu, Jizhong Han

    Abstract: As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence betw…

    Submitted 24 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  16. arXiv:2303.17789

    cs.CV

    FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: One-shot talking head generation has received growing attention in recent years, with various creative and practical applications. An ideal generated talking head video should be natural and vivid, with natural head pose changes. However, it is challenging to map head pose sequences from driving audio since there is a natural gap between the audio and visual modalities. In this work, we propose a Flow-…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted by ICME 2023

  17. arXiv:2303.06868

    eess.IV cs.AI cs.CV

    Deep Learning-based Eye-Tracking Analysis for Diagnosis of Alzheimer's Disease Using 3D Comprehensive Visual Stimuli

    Authors: Fangyu Zuo, Peiguang Jing, Jinglin Sun, Jizhong Duan, Yong Ji, Yu Liu

    Abstract: Alzheimer's Disease (AD) causes a continuous decline in memory, thinking, and judgment. Traditional diagnoses are usually based on clinical experience, which is limited by some realistic factors. In this paper, we focus on exploiting deep learning techniques to diagnose AD based on eye-tracking behaviors. Visual attention, as typical eye-tracking behavior, is of great clinical value to detect cogn…

    Submitted 13 March, 2023; originally announced March 2023.

  18. arXiv:2302.08197

    cs.CV

    OPT: One-shot Pose-Controllable Talking Head Generation

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: One-shot talking head generation produces lip-sync talking heads based on arbitrary audio and one source face. To guarantee the naturalness and realness, recent methods propose to achieve free pose control instead of simply editing mouth areas. However, existing methods do not preserve accurate identity of source face when generating head motions. To solve the identity mismatch problem and achieve…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  19. arXiv:2301.02371

    cs.CV

    Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

    Authors: Shaofei Huang, Zhenwei Shen, Zehao Huang, Zi-han Ding, Jiao Dai, Jizhong Han, Naiyan Wang, Si Liu

    Abstract: Monocular 3D lane detection is a challenging task due to its lack of depth information. A popular solution is to first transform the front-viewed (FV) images or features into the bird's-eye-view (BEV) space with inverse perspective mapping (IPM) and detect lanes from BEV features. However, the reliance of IPM on the flat-ground assumption and loss of context information make it inaccurate to restore 3D…

    Submitted 28 March, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: Accepted by CVPR 2023

  20. arXiv:2210.06881

    cs.CV cs.AI

    RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval

    Authors: Xing Wu, Chaochen Gao, Zijia Lin, Zhongyuan Wang, Jizhong Han, Songlin Hu

    Abstract: Video language pre-training methods have mainly adopted sparse sampling techniques to alleviate the temporal redundancy of videos. Though effective, sparse sampling still suffers from inter-modal redundancy: visual redundancy and textual redundancy. Compared with highly generalized text, sparsely sampled frames usually contain text-independent portions, called visual redundancy. Sparse sampling is also…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  21. arXiv:2210.06432

    cs.CL

    InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings

    Authors: Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, Songlin Hu

    Abstract: Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer. The constraint brought by this assumption is weak, and a good sentence representation should also be able to reconstruct the original sentence fragments. Therefore, this paper proposes an information-aggregated contrastive learning…

    Submitted 13 October, 2022; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  22. Robotic Inspection and Characterization of Subsurface Defects on Concrete Structures Using Impact Sounding

    Authors: Ejup Hoxha, Jinglun Feng, Diar Sanakov, Ardian Gjinofci, Jizhong Xiao

    Abstract: Impact-sounding (IS) and impact-echo (IE) are well-developed non-destructive evaluation (NDE) methods that are widely used for inspections of concrete structures to ensure their safety and sustainability. However, it is tedious work to collect IS and IE data along grid lines covering a large target area for characterization of subsurface defects. On the other hand, data processing is very complica…

    Submitted 12 August, 2022; originally announced August 2022.

    Journal ref: Structural Health Monitoring 2021

  23. arXiv:2206.12628

    cs.RO

    FreSCo: Frequency-Domain Scan Context for LiDAR-based Place Recognition with Translation and Rotation Invariance

    Authors: Yongzhi Fan, Xin Du, Lun Luo, Jizhong Shen

    Abstract: Place recognition plays a crucial role in re-localization and loop closure detection tasks for robots and vehicles. This paper seeks a well-defined global descriptor for LiDAR-based place recognition. Compared to local descriptors, global descriptors show remarkable performance in urban road scenes but are usually viewpoint-dependent. To this end, we propose a simple yet robust global descriptor d…

    Submitted 27 September, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: 8 pages, 10 figures. Accepted for ICARCV 2022

  24. arXiv:2206.03789

    cs.CV cs.MM

    Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

    Authors: Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Jizhong Han, Si Liu

    Abstract: Referring video object segmentation aims to predict foreground labels for objects referred to by natural language expressions in videos. Previous methods either depend on 3D ConvNets or incorporate additional 2D ConvNets as encoders to extract mixed spatial-temporal features. However, these methods suffer from spatial misalignment or false distractors due to delayed and implicit spatial-temporal inte…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted by CVPR 2022

  25. arXiv:2201.10792

    cs.CL cs.SD eess.AS

    On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR

    Authors: Zhao Yang, Dianwen Ng, Xiao Fu, Li** Han, Wei Xi, Rui Wang, Rui Jiang, Jizhong Zhao

    Abstract: End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect the use of specific language characteristics. For Mandarin Chinese ASR tasks, there is a mutually reinforcing relationship between Pinyin and Character, since Chinese characters can be romanized by Pinyin. Based on the above intuition, we first investigate types of end-to…

    Submitted 30 March, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

    Comments: submitted to INTERSPEECH 2022

  26. arXiv:2110.13125

    eess.AS cs.SD eess.SP

    Automatic Impact-sounding Acoustic Inspection of Concrete Structure

    Authors: Jinglun Feng, Hua Xiao, Ejup Hoxha, Yifeng Song, Liang Yang, Jizhong Xiao

    Abstract: Previous research has shown that impact sounding signals contain information about structural integrity flaws and subsurface objects. For impact sounding as a non-destructive testing (NDT) method, one of the biggest challenges is subsurface target detection and reconstruction. This paper presents the importance and practicability of using solenoids to trigger impact sound…

    Submitted 25 October, 2021; originally announced October 2021.

    Journal ref: 10th International Conference on Structural Health Monitoring of Intelligent Infrastructure, SHMII 10, 2021

  27. arXiv:2109.04380

    cs.CL cs.AI

    ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

    Authors: Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang, Songlin Hu

    Abstract: Contrastive learning has been attracting much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embe…

    Submitted 11 September, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: COLING 2022

  28. arXiv:2109.04321

    cs.CL cs.AI

    Smoothed Contrastive Learning for Unsupervised Sentence Embedding

    Authors: Xing Wu, Chaochen Gao, Yipeng Su, Jizhong Han, Zhongyuan Wang, Songlin Hu

    Abstract: Contrastive learning has been gradually applied to learn high-quality unsupervised sentence embedding. Among the previous unsupervised methods, the latest state-of-the-art method, as far as we know, is unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE uses the InfoNCE loss function in the training stage by pulling semantically similar sentences together and pushing apart dissimilar ones. Theoretic…

    Submitted 11 September, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: COLING 2022

  29. arXiv:2106.01907

    eess.IV cs.CV

    Robotic Inspection of Underground Utilities for Construction Survey Using a Ground Penetrating Radar

    Authors: Jinglun Feng, Liang Yang, Ejup Hoxha, Jiang Biao, Jizhong Xiao

    Abstract: Ground Penetrating Radar (GPR) is a very useful non-destructive evaluation (NDE) device for locating and mapping underground assets prior to digging and trenching efforts in construction. This paper presents a novel robotic system to automate the GPR data collection process, localize the underground utilities, interpret and reconstruct the underground objects for better visualization allowing regu…

    Submitted 19 April, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

  30. arXiv:2105.06818

    cs.CV cs.MM

    Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

    Authors: Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

    Abstract: Language-queried video actor segmentation aims to predict the pixel-level mask of the actor which performs the actions described by a natural language query in the target frames. Existing methods adopt 3D CNNs over the video clip as a general encoder to extract a mixed spatio-temporal feature for the target frame. Though 3D convolutions are amenable to recognizing which actor is performing the que…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted by CVPR 2021

  31. arXiv:2104.02850

    cs.CV cs.AI cs.MM

    LI-Net: Large-Pose Identity-Preserving Face Reenactment Network

    Authors: Jin Liu, Peng Chen, Tao Liang, Zhaoxing Li, Cai Yu, Shuqiao Zou, Jiao Dai, Jizhong Han

    Abstract: Face reenactment is a challenging task, as it is difficult to maintain accurate expression, pose and identity simultaneously. Most existing methods directly apply driving facial landmarks to reenact source faces and ignore the intrinsic gap between two identities, resulting in the identity mismatch issue. Besides, they neglect the entanglement of expression and pose features when encoding driving…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: IEEE International Conference on Multimedia and Expo (ICME) 2021 Oral

  32. ORDNet: Capturing Omni-Range Dependencies for Scene Parsing

    Authors: Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, Shuicheng Yan

    Abstract: Learning to capture dependencies between spatial positions is essential to many visual tasks, especially dense labeling problems like scene parsing. Existing methods can effectively capture long-range dependencies with the self-attention mechanism and short-range ones with local convolution. However, there is still a large gap between long-range and short-range dependencies, which largely reduces the model…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: Published at TIP

    Journal ref: IEEE Transactions on Image Processing, 2020, 29: 8251-8263

  33. arXiv:2012.04233

    cs.CL cs.SI

    Early Detection of Fake News by Utilizing the Credibility of News, Publishers, and Users Based on Weakly Supervised Learning

    Authors: Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, Songlin Hu

    Abstract: The dissemination of fake news significantly affects personal reputation and public trust. Recently, fake news detection has attracted tremendous attention, and previous studies mainly focused on finding clues from news content or diffusion path. However, the required features of previous models are often unavailable or insufficient in early detection scenarios, resulting in poor performance. Thus…

    Submitted 13 December, 2020; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted as a long paper at COLING 2020

  34. arXiv:2011.02635

    cs.CV cs.RO eess.IV

    GPR-based Model Reconstruction System for Underground Utilities Using GPRNet

    Authors: Jinglun Feng, Liang Yang, Ejup Hoxha, Diar Sanakov, Stanislav Sotnikov, Jizhong Xiao

    Abstract: Ground Penetrating Radar (GPR) is one of the most important non-destructive evaluation (NDE) instruments to detect and locate underground objects (e.g., rebars, utility pipes). Much previous research focuses on GPR image-based feature detection only, and none can process sparse GPR measurements to successfully reconstruct a very fine and detailed 3D model of underground objects for better visualiz…

    Submitted 18 May, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted by ICRA 2021

  35. arXiv:2010.00515

    cs.CV cs.CL

    Linguistic Structure Guided Context Modeling for Referring Image Segmentation

    Authors: Tianrui Hui, Si Liu, Shaofei Huang, Guanbin Li, Sansi Yu, Faxi Zhang, Jizhong Han

    Abstract: Referring image segmentation aims to predict the foreground mask of the object referred to by a natural language sentence. Multimodal context of the sentence is crucial to distinguish the referent from the background. Existing methods either insufficiently or redundantly model the multimodal context. To tackle this problem, we propose a "gather-propagate-distribute" scheme to model multimodal context…

    Submitted 5 October, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: Accepted by ECCV 2020. Code is available at https://github.com/spyflying/LSCM-Refseg

  36. arXiv:2010.00514

    cs.CV cs.CL

    Referring Image Segmentation via Cross-Modal Progressive Comprehension

    Authors: Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

    Abstract: Referring image segmentation aims at segmenting the foreground masks of the entities that can well match the description given in the natural language expression. Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities, but usually fail to explore informative words of the expression to well align features from the two modalitie…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: Accepted by CVPR 2020. Code is available at https://github.com/spyflying/CMPC-Refseg

  37. arXiv:2007.08445

    cs.CL

    Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

    Authors: Lingwei Wei, Dou Hu, Wei Zhou, Xuehai Tang, Xiaodan Zhang, Xin Wang, Jizhong Han, Songlin Hu

    Abstract: Document-level Sentiment Analysis (DSA) is more challenging due to vague semantic links and complicated sentiment information. Recent works have been devoted to leveraging text summarization and have achieved promising results. However, these summarization-based methods did not take full advantage of the summary, including ignoring the inherent interactions between the summary and document. As a res…

    Submitted 7 September, 2022; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: 17 pages, accepted by ECML-PKDD 2020

  38. arXiv:2004.12498

    cs.CV

    Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes

    Authors: Haiyan Wang, Xuejian Rong, Liang Yang, Jinglun Feng, Jizhong Xiao, Yingli Tian

    Abstract: The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation, especially for scenes in the wild with varieties of different objects. To alleviate this issue, we propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision. Different from numerous preceding multi-v…

    Submitted 17 May, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

    Comments: 13 pages, 8 figures, Under review as a journal paper at CVIU

  39. arXiv:2004.02234

    cs.CV

    Feature Super-Resolution Based Facial Expression Recognition for Multi-scale Low-Resolution Faces

    Authors: Wei Jing, Feng Tian, Jizhong Zhang, Kuo-Ming Chao, Zhenxin Hong, Xu Liu

    Abstract: Facial Expression Recognition (FER) on low-resolution images is necessary for applications like group expression recognition in crowd scenarios (stations, classrooms, etc.). Classifying a small facial image into the right expression category is still a challenging task. The main cause of this problem is the loss of discriminative features due to reduced resolution. Super-resolution method is often…

    Submitted 5 April, 2020; originally announced April 2020.

    Comments: 13 pages, 5 figures

  40. arXiv:2002.09634

    cs.CL

    Data Augmentation for Copy-Mechanism in Dialogue State Tracking

    Authors: Xiaohui Song, Liangjun Zang, Yipeng Su, Xing Wu, Jizhong Han, Songlin Hu

    Abstract: While several state-of-the-art approaches to dialogue state tracking (DST) have shown promising performances on several benchmarks, there is still a significant performance gap between seen slot values (i.e., values that occur in both training set and test set) and unseen ones (values that occur in the test set but not in the training set). Recently, the copy-mechanism has been widely used in DST models t…

    Submitted 22 February, 2020; originally announced February 2020.

  41. arXiv:1911.03626

    cs.CL cs.IR cs.LG

    Beyond Statistical Relations: Integrating Knowledge Relations into Style Correlations for Multi-Label Music Style Classification

    Authors: Qianwen Ma, Chunyuan Yuan, Wei Zhou, Jizhong Han, Songlin Hu

    Abstract: Automatically labeling multiple styles for every song is a common application across all kinds of music websites. Recently, some studies have explored review-driven multi-label music style classification and exploited style correlations for this task. However, their methods focus on mining the statistical relations between different music styles and only consider shallow style relations. Moreover, t…

    Submitted 11 January, 2021; v1 submitted 9 November, 2019; originally announced November 2019.

    Comments: Accepted as WSDM 2020 Regular Paper

  42. arXiv:1909.05364

    cs.CL cs.AI

    TransSent: Towards Generation of Structured Sentences with Discourse Marker

    Authors: Xing Wu, Dongjun Wei, Liangjun Zang, Jizhong Han, Songlin Hu

    Abstract: Structured sentences are important expressions in human writing and dialogue. Previous work on neural text generation fused semantic and structural information by encoding the entire sentence into a mixed hidden representation. However, when a generated sentence becomes complicated, its structure is difficult to maintain properly. To alleviate this problem, we explicitly separate the model…

    Submitted 8 May, 2020; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: 5 figures

  43. arXiv:1909.04465

    cs.CL cs.IR cs.SI

    Jointly embedding the local and global relations of heterogeneous graph for rumor detection

    Authors: Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, Songlin Hu

    Abstract: The development of social media has revolutionized the way people communicate, share information and make decisions, but it also provides an ideal platform for publishing and spreading rumors. Existing rumor detection methods focus on finding clues from text content, user profiles, and propagation patterns. However, the local semantic relation and global structural information in the message propa…

    Submitted 11 September, 2019; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: 10 pages, Accepted to the IEEE International Conference on Data Mining 2019

  44. arXiv:1909.04455

    cs.CL cs.AI

    Learning review representations from user and product level information for spam detection

    Authors: Chunyuan Yuan, Wei Zhou, Qianwen Ma, Shangwen Lv, Jizhong Han, Songlin Hu

    Abstract: Opinion spam has become a widespread problem in social media, where hired spammers write deceptive reviews to promote or demote products, misleading consumers for profit or fame. Existing work mainly focuses on manually designing discrete textual or behavioral features, which cannot capture the complex semantics of reviews. Although recent studies apply deep learning methods to learn review-level semant…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: 6 pages. Accepted at IEEE ICDM 2019 as a short paper

  45. arXiv:1908.08039

    cs.CL

    "Mask and Infill": Applying Masked Language Model to Sentiment Transfer

    Authors: Xing Wu, Tao Zhang, Liangjun Zang, Jizhong Han, Songlin Hu

    Abstract: This paper focuses on the task of sentiment transfer on non-parallel text, which modifies sentiment attributes (e.g., positive or negative) of sentences while preserving their attribute-independent content. Due to the limited capability of RNN-based encoder-decoder structures to capture deep and long-range dependencies among words, previous works can hardly generate satisfactory sentences from scrat…

    Submitted 21 August, 2019; originally announced August 2019.

    Comments: IJCAI 2019

  46. arXiv:1905.10625

    cs.CL cs.AI

    ESA: Entity Summarization with Attention

    Authors: Dongjun Wei, Yaxin Liu, Fuqing Zhu, Liangjun Zang, Wei Zhou, Jizhong Han, Songlin Hu

    Abstract: Entity summarization aims at creating brief but informative descriptions of entities from knowledge graphs. While previous work mostly focused on traditional techniques such as clustering algorithms and graph models, we ask how to apply deep learning methods to this task. In this paper we propose ESA, a neural network with supervised attention mechanisms for entity summarization. Specifically, w…

    Submitted 25 May, 2020; v1 submitted 25 May, 2019; originally announced May 2019.

    Comments: 12 pages, accepted at EYRE@CIKM'2019

  47. arXiv:1903.11919

    cs.CL cs.LG

    Imbalanced Sentiment Classification Enhanced with Discourse Marker

    Authors: Tao Zhang, Xing Wu, Meng Lin, Jizhong Han, Songlin Hu

    Abstract: Imbalanced data commonly exist in the real world, especially in sentiment-related corpora, making it difficult to train a classifier to distinguish latent sentiment in text data. We observe that humans often express transitional emotion between two adjacent discourses with discourse markers such as "but", "though", and "while", and that the head discourse and the tail discourse usually indicate opposite em…

    Submitted 28 March, 2019; originally announced March 2019.

    Comments: 12 pages, 1 figure

  48. arXiv:1901.06773

    cs.LG cs.AI

    AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks

    Authors: **rong Guo, Wantao Liu, Wang Wang, Qu Lu, Songlin Hu, Jizhong Han, Ruixuan Li

    Abstract: Typically, ultra-deep neural networks (UDNNs) tend to yield high-quality models, but their training process is usually resource-intensive and time-consuming. The scarce DRAM capacity of modern GPUs is the primary bottleneck that hinders the trainability and training efficiency of UDNNs. In this paper, we present "AccUDNN", an accelerator that aims to make the utmost use of finite GPU memory resources to s…

    Submitted 20 June, 2019; v1 submitted 20 January, 2019; originally announced January 2019.

    Comments: 12 pages, 11 figures, 3 tables

  49. arXiv:1812.06705

    cs.CL cs.AI cs.LG

    Conditional BERT Contextual Augmentation

    Authors: Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han, Songlin Hu

    Abstract: We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. Data augmentation methods are often applied to prevent overfitting and improve the generalization of deep neural network models. Recently proposed contextual augmentation augments labeled sentences by randomly replacing words with more varied substitutions predicted by a language model. BER…

    Submitted 17 December, 2018; originally announced December 2018.

    Comments: 9 pages, 1 figure

  50. arXiv:1812.00477

    cs.CV

    Ego-Downward and Ambient Video based Person Location Association

    Authors: Liang Yang, Hao Jiang, Jizhong Xiao, Zhouyuan Huo

    Abstract: Localization and tracking with an egocentric camera are much needed for urban navigation and indoor assistive systems when GPS is unavailable or not accurate enough. The traditional hand-designed feature tracking and estimation approach fails without visible features. Recently, several works have explored using context features for localization. However, all of these suffer…

    Submitted 2 December, 2018; originally announced December 2018.