Showing 1–50 of 107 results for author: Hong, R

  1. arXiv:2406.12442  [pdf, other]

    cs.CL cs.AI

    Abstraction-of-Thought Makes Language Models Better Reasoners

    Authors: Ruixin Hong, Hongming Zhang, Xiaoman Pan, Dong Yu, Changshui Zhang

    Abstract: Abstract reasoning, the ability to reason from the abstract essence of a problem, serves as a key to generalization in human reasoning. However, eliciting language models to perform reasoning with abstraction remains unexplored. This paper seeks to bridge this gap by introducing a novel structured reasoning format called Abstraction-of-Thought (AoT). The uniqueness of AoT lies in its explicit requ… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  2. arXiv:2406.08214  [pdf, other]

    cs.IR

    Graph Bottlenecked Social Recommendation

    Authors: Yonghui Yang, Le Wu, Zihan Wang, Zhuangzhuang He, Richang Hong, Meng Wang

    Abstract: With the emergence of social networks, social recommendation has become an essential technique for personalized services. Recently, graph-based social recommendations have shown promising results by capturing the high-order social influence. Most empirical studies of graph-based social recommendations directly take the observed social networks into formulation, and produce user preferences based o… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  3. arXiv:2406.03064  [pdf, other]

    cs.LG cs.IR

    Path-Specific Causal Reasoning for Fairness-aware Cognitive Diagnosis

    Authors: Dacao Zhang, Kun Zhang, Le Wu, Mi Tian, Richang Hong, Meng Wang

    Abstract: Cognitive Diagnosis~(CD), which leverages students and exercise data to predict students' proficiency levels on different knowledge concepts, is one of fundamental components in Intelligent Education. Due to the scarcity of student-exercise interaction data, most existing methods focus on making the best use of available data, such as exercise content and student information~(e.g., educational con… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  4. Multimodality Invariant Learning for Multimedia-Based New Item Recommendation

    Authors: Haoyue Bai, Le Wu, Min Hou, Miaomiao Cai, Zhuangzhuang He, Yuyang Zhou, Richang Hong, Meng Wang

    Abstract: Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and APPs, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at the inference time is challenging. What's worse, real-world items exhibit varying degrees of modality missing(e.g., m… ▽ More

    Submitted 28 April, 2024; originally announced May 2024.

  5. arXiv:2405.11272  [pdf, other]

    cs.IR cs.AI

    Double Correction Framework for Denoising Recommendation

    Authors: Zhuangzhuang He, Yifan Wang, Yonghui Yang, Peijie Sun, Le Wu, Haoyue Bai, Jinqi Gong, Richang Hong, Min Zhang

    Abstract: As its availability and generality in online services, implicit feedback is more commonly used in recommender systems. However, implicit feedback usually presents noisy samples in real-world recommendation scenarios (such as misclicks or non-preferential behaviors), which will affect precise user preference learning. To overcome the noisy samples problem, a popular solution is based on dropping no… ▽ More

    Submitted 27 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  6. arXiv:2405.08209  [pdf, other]

    cs.CY cs.CL cs.CV cs.LG

    Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp

    Authors: Rachel Hong, William Agnew, Tadayoshi Kohno, Jamie Morgenstern

    Abstract: As training datasets become increasingly drawn from unstructured, uncontrolled environments such as the web, researchers and industry practitioners have increasingly relied upon data filtering techniques to "filter out the noise" of web-scraped data. While datasets have been widely shown to reflect the biases and values of their creators, in this paper we contribute to an emerging body of research… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Content warning: This paper discusses societal stereotypes and sexually-explicit material that may be disturbing, distressing, and/or offensive to the reader

  7. arXiv:2405.06078  [pdf, ps, other]

    cs.CY cs.HC

    Collaborative Design for Job-Seekers with Autism: A Conceptual Framework for Future Research

    Authors: Sungsoo Ray Hong, Marcos Zampieri, Brittany N. Hand, Vivian Motti, Dongjun Chung, Ozlem Uzuner

    Abstract: The success of employment is highly related to a job seeker's capability of communicating and collaborating with others. While leveraging one's network during the job-seeking process is intuitive to the neurotypical, this can be challenging for people with autism. Recent empirical findings have started to show how facilitating collaboration between people with autism and their social surroundings… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  8. arXiv:2403.11070  [pdf, other]

    cs.CV

    Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning

    Authors: Yuan Zhou, Richang Hong, Yanrong Guo, Lin Liu, Shijie Hao, Hanwang Zhang

    Abstract: In this paper, we propose to tackle Few-Shot Class-Incremental Learning (FSCIL) from a new perspective, i.e., relation disentanglement, which means enhancing FSCIL via disentangling spurious relation between categories. The challenge of disentangling spurious correlations lies in the poor controllability of FSCIL. On one hand, an FSCIL model is required to be trained in an incremental manner and t… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

  9. arXiv:2403.09036  [pdf, other]

    cs.CV

    Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier

    Authors: Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong

    Abstract: In the real-world setting, data often follows a long-tailed distribution, where head classes contain significantly more training samples than tail classes. Consequently, models trained on such data tend to be biased toward head classes. The medium of this bias is imbalanced gradients, which include not only the ratio of scale between positive and negative gradients but also imbalanced gradients fr… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures. Accepted by ICASSP 2024; see https://cmsworkshops.com/ICASSP2024/papers/accepted_papers.php and search for this paper's title

  10. arXiv:2403.02981  [pdf, other]

    cs.CV

    Doubly Abductive Counterfactual Inference for Text-based Image Editing

    Authors: Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang

    Abstract: We study text-based image editing (TBIE) of a single image by counterfactual inference because it is an elegant formulation to precisely address the requirement: the edited image should retain the fidelity of the original one. Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due t… ▽ More

    Submitted 25 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  11. arXiv:2403.02649  [pdf, other]

    cs.CV

    Few-shot Learner Parameterization by Diffusion Time-steps

    Authors: Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun

    Abstract: Even when using large multi-modal foundation models, few-shot learning is still challenging -- if there is no proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced cla… ▽ More

    Submitted 26 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  12. arXiv:2403.01722  [pdf, other]

    cs.HC

    Closing the Knowledge Gap in Designing Data Annotation Interfaces for AI-powered Disaster Management Analytic Systems

    Authors: Zinat Ara, Hossein Salemi, Sungsoo Ray Hong, Yasas Senarath, Steve Peterson, Amanda Lee Hughes, Hemant Purohit

    Abstract: Data annotation interfaces predominantly leverage ground truth labels to guide annotators toward accurate responses. With the growing adoption of Artificial Intelligence (AI) in domain-specific professional tasks, it has become increasingly important to help beginning annotators identify how their early-stage knowledge can lead to inaccurate answers, which in turn, helps to ensure quality annotati… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

  13. arXiv:2403.01715  [pdf, other]

    cs.HC

    Collaborative Job Seeking for People with Autism: Challenges and Design Opportunities

    Authors: Zinat Ara, Amrita Ganguly, Donna Peppard, Dongjun Chung, Slobodan Vucetic, Vivian Genaro Motti, Sungsoo Ray Hong

    Abstract: Successful job search results from job seekers' well-shaped social communication. While well-known differences in communication exist between people with autism and neurotypicals, little is known about how people with autism collaborate with their social surroundings to strive in the job market. To better understand the practices and challenges of collaborative job seeking for people with autism,… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

  14. arXiv:2401.15877  [pdf, other]

    cs.HC cs.CV

    3DPFIX: Improving Remote Novices' 3D Printing Troubleshooting through Human-AI Collaboration

    Authors: Nahyun Kwon, Tong Sun, Yuyang Gao, Liang Zhao, Xu Wang, Jeeeun Kim, Sungsoo Ray Hong

    Abstract: The widespread consumer-grade 3D printers and learning resources online enable novices to self-train in remote settings. While troubleshooting plays an essential part of 3D printing, the process remains challenging for many remote novices even with the help of well-developed online sources, such as online troubleshooting archives and online community help. We conducted a formative study with 76 ac… ▽ More

    Submitted 1 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: CSCW2024

  15. arXiv:2401.13270  [pdf, other]

    cs.CV cs.AI

    Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics

    Authors: Pengcheng Zhao, Yanxiang Chen, Yang Zhao, Wei Jia, Zhao Zhang, Ronggang Wang, Richang Hong

    Abstract: Automatic image colorization is inherently an ill-posed problem with uncertainty, which requires an accurate semantic understanding of scenes to estimate reasonable colors for grayscale images. Although recent interaction-based methods have achieved impressive performance, it is still a very difficult task to infer realistic and accurate colors for automatic colorization. To reduce the difficulty… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

  16. arXiv:2312.16477  [pdf, other]

    cs.CV cs.AI

    Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding

    Authors: Lixiang Xu, Qingzhe Cui, Richang Hong, Wei Xu, Enhong Chen, Xin Yuan, Chenglong Li, Yuanyan Tang

    Abstract: In recent years, the results of view-based 3D shape recognition methods have saturated, and models with excellent performance cannot be deployed on memory-limited devices due to their huge size of parameters. To address this problem, we introduce a compression method based on knowledge distillation for this field, which largely reduces the number of parameters while preserving model performance as… ▽ More

    Submitted 30 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 13 pages, 8 figures

    MSC Class: 68 ACM Class: I.2.10

  17. arXiv:2312.05447  [pdf, other]

    cs.CV

    From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos

    Authors: Yin Chen, Jia Li, Shiguang Shan, Meng Wang, Richang Hong

    Abstract: Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations, e.g., insufficient quantity and diversity of pose, occlusion and illumination, as well as the inherent ambiguity of facial expressions. In contrast, static facial expression recognition (SFER) currently shows much higher performance and can benefit from more abundant high-quality training data. Moreover… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Code will be available at: https://github.com/FER-LMC/S2D

  18. arXiv:2312.01294  [pdf, other]

    cs.LG stat.ML

    Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series

    Authors: Ying Liu, Peng Cui, Wenbo Hu, Richang Hong

    Abstract: Multivariate time series are everywhere. Nevertheless, real-world time series data often exhibit numerous missing values, which is the time series imputation task. Although previous deep learning methods have been shown to be effective for time series imputation, they are shown to produce overconfident imputations, which might be a potentially overlooked threat to the reliability of the intelligen… ▽ More

    Submitted 3 December, 2023; originally announced December 2023.

  19. arXiv:2311.17438  [pdf, other]

    cs.CL cs.AI

    CLOMO: Counterfactual Logical Modification with Large Language Models

    Authors: Yinya Huang, Ruixin Hong, Hongming Zhang, Wei Shao, Zhicheng Yang, Dong Yu, Changshui Zhang, Xiaodan Liang, Linqi Song

    Abstract: In this study, we delve into the realm of counterfactual reasoning capabilities of large language models (LLMs). Our primary objective is to cultivate the counterfactual thought processes within LLMs and rigorously assess these processes for their validity. Specifically, we introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark. In this ta… ▽ More

    Submitted 7 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Journal ref: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL 2024)

  20. arXiv:2311.16462  [pdf, other]

    cs.CV cs.MM

    Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information

    Authors: Jie Li, Zhixin Li, Zhi Liu, Pengyuan Zhou, Richang Hong, Qiyue Li, Han Hu

    Abstract: Volumetric video, also known as hologram video, is a novel medium that portrays natural content in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). It is expected to be the next-gen video technology and a prevalent use case for 5G and beyond wireless communication. Considering that each user typically only watches a section of the volumetric video, known as the viewport, it is… ▽ More

    Submitted 28 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  21. One-bit Supervision for Image Classification: Problem, Solution, and Beyond

    Authors: Hengtong Hu, Lingxi Xie, Xinyue Hue, Richang Hong, Qi Tian

    Abstract: This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification. Instead of training model using the accurate label of each sample, our setting requires the model to interact with the system by predicting the class label of each sample and learn from the answer whether the guess is correct, which provides one bit (yes or no) of information. An intri… ▽ More

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: ACM TOMM. arXiv admin note: text overlap with arXiv:2009.06168

  22. arXiv:2311.11695  [pdf, other]

    cs.CV

    Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement

    Authors: Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang

    Abstract: The generalization capability of existing image restoration and enhancement (IRE) methods is constrained by the limited pre-trained datasets, making it difficult to handle agnostic inputs such as different degradation levels and scenarios beyond their design scopes. Moreover, they are not equipped with interactive mechanisms to consider user preferences or feedback, and their end-to-end settings c… ▽ More

    Submitted 20 November, 2023; originally announced November 2023.

  23. arXiv:2311.07954  [pdf, other]

    cs.AI cs.CL

    A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning

    Authors: Ruixin Hong, Hongming Zhang, Xinyu Pang, Dong Yu, Changshui Zhang

    Abstract: Logical reasoning has been an ongoing pursuit in the field of AI. Despite significant advancements made by large language models (LLMs), they still struggle with complex logical reasoning problems. To enhance reasoning performance, one promising direction is scalable oversight, which requires LLMs to identify their own errors and then improve by themselves. Various self-verification methods have b… ▽ More

    Submitted 23 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 Main Conference

  24. arXiv:2310.17869  [pdf, other]

    cs.CV

    Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering

    Authors: Zijie Song, Zhenzhen Hu, Richang Hong

    Abstract: Unsupervised representation learning for image clustering is essential in computer vision. Although the advancement of visual models has improved image clustering with efficient visual representations, challenges still remain. Firstly, these features often lack the ability to represent the internal structure of images, hindering the accurate clustering of visually similar images. Secondly, the exi… ▽ More

    Submitted 26 October, 2023; originally announced October 2023.

  25. arXiv:2310.06403  [pdf, other]

    cs.CV

    Boundary Discretization and Reliable Classification Network for Temporal Action Detection

    Authors: Zhenying Fang, Jun Yu, Richang Hong

    Abstract: Temporal action detection aims to recognize the action category and determine each action instance's starting and ending time in untrimmed videos. The mixed methods have achieved remarkable performance by seamlessly merging anchor-based and anchor-free approaches. Nonetheless, there are still two crucial issues within the mixed framework: (1) Brute-force merging and handcrafted anchor design hinde… ▽ More

    Submitted 7 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: 12 pages, Source code: https://github.com/zhenyingfang/BDRC-Net

  26. arXiv:2309.13549  [pdf, other]

    cs.RO cs.CV

    Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset

    Authors: Arthur Zhang, Chaitanya Eranki, Christina Zhang, Ji-Hwan Park, Raymond Hong, Pranav Kalyani, Lochana Kalyanaraman, Arsh Gamare, Arnav Bagad, Maria Esteva, Joydeep Biswas

    Abstract: We introduce the UT Campus Object Dataset (CODa), a mobile robot egocentric perception dataset collected on the University of Texas Austin Campus. Our dataset contains 8.5 hours of multimodal sensor data: synchronized 3D point clouds and stereo RGB video from a 128-channel 3D LiDAR and two 1.25MP RGB cameras at 10 fps; RGB-D videos from an additional 0.5MP sensor at 7 fps, and a 9-DOF IMU sensor a… ▽ More

    Submitted 1 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 19 pages, 18 figures, 12 tables

  27. arXiv:2308.12636  [pdf, other]

    cs.MM

    Exploring Transferability of Multimodal Adversarial Samples for Vision-Language Pre-training Models with Contrastive Learning

    Authors: Youze Wang, Wenbo Hu, Yinpeng Dong, Hanwang Zhang, Richang Hong

    Abstract: Vision-language pre-training models (VLP) are vulnerable, especially to multimodal adversarial samples, which can be crafted by adding imperceptible perturbations on both original images and texts. However, under the black-box setting, there have been no works to explore the transferability of multimodal adversarial attacks against the VLP models. In this work, we take CLIP as the surrogate model… ▽ More

    Submitted 4 November, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  28. arXiv:2308.09250  [pdf, other]

    cs.LG cs.DM cs.NE math.MG math.NA

    Capacity Bounds for Hyperbolic Neural Network Representations of Latent Tree Structures

    Authors: Anastasis Kratsios, Ruiyang Hong, Haitz Sáez de Ocáriz Borde

    Abstract: We study the representation capacity of deep hyperbolic neural networks (HNNs) with a ReLU activation function. We establish the first proof that HNNs can $\varepsilon$-isometrically embed any finite weighted tree into a hyperbolic space of dimension $d$ at least equal to $2$ with prescribed sectional curvature $κ<0$, for any $\varepsilon> 1$ (where $\varepsilon=1$ being optimal). We establish rig… ▽ More

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: 22 Pages + References, 1 Table, 4 Figures

    MSC Class: 68T07; 30L05; 68R12; 05C05

  29. Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning

    Authors: Zijie Song, Zhenzhen Hu, Yuanen Zhou, Ye Zhao, Richang Hong, Meng Wang

    Abstract: Cross-lingual image captioning is a challenging task that requires addressing both cross-lingual and cross-modal obstacles in multimedia analysis. The crucial issue in this task is to model the global and the local matching between the image and different languages. Existing cross-modal embedding methods based on the transformer architecture oversee the local matching between the image region and… ▽ More

    Submitted 5 April, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

  30. arXiv:2307.05100  [pdf, other]

    cs.IR

    Generative Contrastive Graph Learning for Recommendation

    Authors: Yonghui Yang, Zhengwei Wu, Le Wu, Kun Zhang, Richang Hong, Zhiqiang Zhang, Jun Zhou, Meng Wang

    Abstract: By treating users' interactions as a user-item graph, graph learning models have been widely deployed in Collaborative Filtering(CF) based recommendation. Recently, researchers have introduced Graph Contrastive Learning(GCL) techniques into CF to alleviate the sparse supervision issue, which first constructs contrastive views by data augmentations and then provides self-supervised signals by maxim… ▽ More

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: This paper is accepted to SIGIR 2023. Code is available at: https://github.com/yimutianyang/SIGIR23-VGCL

  31. arXiv:2307.04036  [pdf, other]

    cs.HC cs.AI cs.CV cs.LG

    Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations

    Authors: Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong

    Abstract: The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective about the local explanation as a valuable and indisp… ▽ More

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: 32 pages, 6 figures, 5 tables. Accepted for publication in the Proceedings of the ACM on Human-Computer Interaction (PACM HCI), CSCW 2023

  32. arXiv:2305.13208  [pdf, other]

    cs.CV cs.AI

    Iterative Adversarial Attack on Image-guided Story Ending Generation

    Authors: Youze Wang, Wenbo Hu, Richang Hong

    Abstract: Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. The image-guided story ending generation (IgSEG) is a particularly significant task, targeting on an understanding of complex relationships… ▽ More

    Submitted 23 January, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

  33. arXiv:2305.10868  [pdf, other]

    cs.CV

    Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation

    Authors: Yuan Zhou, Xin Chen, Yanrong Guo, Shijie Hao, Richang Hong, Qi Tian

    Abstract: Incremental few-shot semantic segmentation (IFSS) aims to incrementally extend a semantic segmentation model to novel classes according to only a few pixel-level annotated data, while preserving its segmentation capability on previously learned base categories. This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance, which makes segmentation results un… ▽ More

    Submitted 18 May, 2023; originally announced May 2023.

  34. arXiv:2305.09241  [pdf, other]

    cs.LG cs.CR cs.CV

    Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples

    Authors: Wan Jiang, Yunfeng Diao, He Wang, Jianxin Sun, Meng Wang, Richang Hong

    Abstract: Safeguarding data from unauthorized exploitation is vital for privacy and security, especially in recent rampant research in security breach such as adversarial/membership attacks. To this end, \textit{unlearnable examples} (UEs) have been recently proposed as a compelling protection, by adding imperceptible perturbation to data so that models trained on them cannot classify them accurately on ori… ▽ More

    Submitted 3 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted in MM 2023

  35. arXiv:2305.05138  [pdf, other]

    cs.CL

    Read, Diagnose and Chat: Towards Explainable and Interactive LLMs-Augmented Depression Detection in Social Media

    Authors: Wei Qin, Zetong Chen, Lei Wang, Yunshi Lan, Weijieying Ren, Richang Hong

    Abstract: This paper proposes a new depression detection system based on LLMs that is both interpretable and interactive. It not only provides a diagnosis, but also diagnostic evidence and personalized recommendations based on natural language dialogue with the user. We address challenges such as the processing of large amounts of text and integrate professional diagnostic criteria. Our system outperforms t… ▽ More

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: 8 pages, 5 figures

  36. arXiv:2305.02556  [pdf, other]

    cs.CL

    Faithful Question Answering with Monte-Carlo Planning

    Authors: Ruixin Hong, Hongming Zhang, Hong Zhao, Dong Yu, Changshui Zhang

    Abstract: Although large language models demonstrate remarkable question-answering performances, revealing the intermediate reasoning steps that the models faithfully follow remains challenging. In this paper, we propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps. The reasoning steps are organized as a structured entailment tree, which… ▽ More

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: ACL 2023 main

  37. arXiv:2303.12360  [pdf]

    cs.CV eess.IV

    Automatically Predict Material Properties with Microscopic Image Example Polymer Compatibility

    Authors: Zhilong Liang, Zhenzhi Tan, Ruixin Hong, Wanli Ouyang, Jinying Yuan, Changshui Zhang

    Abstract: Many material properties are manifested in the morphological appearance and characterized with microscopic image, such as scanning electron microscopy (SEM). Polymer miscibility is a key physical quantity of polymer material and commonly and intuitively judged by SEM images. However, human observation and judgement for the images is time-consuming, labor-intensive and hard to be quantified. Comput… ▽ More

    Submitted 3 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  38. arXiv:2303.09164  [pdf, other]

    cs.CV

    Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers

    Authors: Jia Li, Yin Chen, Xuesong Zhang, Jiantao Nie, Ziqiang Li, Yangchen Yu, Yan Zhang, Richang Hong, Meng Wang

    Abstract: In this paper, we present our advanced solutions to the two sub-challenges of Affective Behavior Analysis in the wild (ABAW) 2023: the Emotional Reaction Intensity (ERI) Estimation Challenge and Expression (Expr) Classification Challenge. ABAW 2023 aims to tackle the challenge of affective behavior analysis in natural contexts, with the ultimate goal of creating intelligent machines and robots tha… ▽ More

    Submitted 14 April, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Solutions of HFUT-CVers Team at the 5th ABAW Competition (CVPR 2023 workshop)

  39. arXiv:2303.06869  [pdf, other]

    cs.CV

    Adaptive Data-Free Quantization

    Authors: Biao Qian, Yang Wang, Richang Hong, Meng Wang

    Abstract: Data-free quantization (DFQ) recovers the performance of quantized network (Q) without the original data, but generates the fake sample via a generator (G) by learning from full-precision network (P), which, however, is totally independent of Q, overlooking the adaptability of the knowledge from generated samples, i.e., informative or not to the learning process of Q, resulting into the overflow o… ▽ More

    Submitted 20 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: 9 pages, 6 figures, Refined camera ready version for CVPR 2023

  40. arXiv:2302.13668  [pdf, other]

    cs.CV cs.MM

    Contrastive Video Question Answering via Video Graph Transformer

    Authors: Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua

    Abstract: We propose to perform video question answering (VideoQA) in a Contrastive manner via a Video Graph Transformer model (CoVGT). CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning. 2) It designs separate video and text tr… ▽ More

    Submitted 11 July, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted by IEEE T-PAMI'23

  41. arXiv:2302.09572  [pdf, other]

    cs.CV

    Rethinking Data-Free Quantization as a Zero-Sum Game

    Authors: Biao Qian, Yang Wang, Richang Hong, Meng Wang

    Abstract: Data-free quantization (DFQ) recovers the performance of quantized network (Q) without accessing the real data, but generates the fake sample via a generator (G) by learning from full-precision network (P) instead. However, such sample generation process is totally independent of Q, specialized as failing to consider the adaptability of the generated samples, i.e., beneficial or adversarial, over… ▽ More

    Submitted 19 February, 2023; originally announced February 2023.

    Comments: 9 pages, 7 figures, accepted by AAAI 2023

  42. arXiv:2302.06333  [pdf, other]

    cs.IR cs.LG

    Improving Recommendation Fairness via Data Augmentation

    Authors: Lei Chen, Le Wu, Kun Zhang, Richang Hong, Defu Lian, Zhiqiang Zhang, Jun Zhou, Meng Wang

    Abstract: Collaborative filtering based recommendation learns users' preferences from all users' historical behavior data, and has been popular to facilitate decision making. Recently, the fairness issue of recommendation has become more and more essential. A recommender system is considered unfair when it does not perform equally well for different user groups according to users' sensitive attributes (e.…

    Submitted 21 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: The paper is accepted by WWW 2023

  43. arXiv:2302.02141  [pdf, other]

    cs.CV cs.CL cs.MM

    LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers

    Authors: Feng Xue, Yu Li, Deyin Liu, Yincen Xie, Lin Wu, Richang Hong

    Abstract: Lipreading refers to understanding and further translating the speech of a speaker in the video into natural language. State-of-the-art lipreading methods excel in interpreting overlap speakers, i.e., speakers appear in both training and inference sets. However, generalizing these methods to unseen speakers incurs catastrophic performance degradation due to the limited number of speakers in traini…

    Submitted 4 February, 2023; originally announced February 2023.

    Comments: Under review

  44. arXiv:2212.03954  [pdf, other]

    cs.AI cs.CV cs.HC cs.LG

    Going Beyond XAI: A Systematic Survey for Explanation-Guided Learning

    Authors: Yuyang Gao, Siyi Gu, Junji Jiang, Sungsoo Ray Hong, Dazhou Yu, Liang Zhao

    Abstract: As the societal impact of Deep Neural Networks (DNNs) grows, the goals for advancing DNNs become more complex and diverse, ranging from improving a conventional model accuracy metric to infusing advanced human virtues such as fairness, accountability, transparency (FaccT), and unbiasedness. Recently, techniques in Explainable Artificial Intelligence (XAI) are attracting considerable attention, and…

    Submitted 7 December, 2022; originally announced December 2022.

  45. arXiv:2211.10104  [pdf, other]

    cs.CV

    Stereo Image Rain Removal via Dual-View Mutual Attention

    Authors: Yanyan Wei, Zhao Zhang, Zhongqiu Zhao, Yang Zhao, Richang Hong, Yi Yang

    Abstract: Stereo images, containing left and right view images with disparity, are utilized in solving low-vision tasks recently, e.g., rain removal and super-resolution. Stereo image restoration methods usually obtain better performance than monocular methods by learning the disparity between dual views either implicitly or explicitly. However, existing stereo rain removal methods still cannot make full us…

    Submitted 19 December, 2022; v1 submitted 18 November, 2022; originally announced November 2022.

  46. arXiv:2211.00859  [pdf, other]

    cs.CV

    Decoupled Cross-Scale Cross-View Interaction for Stereo Image Enhancement in The Dark

    Authors: Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng Yan

    Abstract: Low-light stereo image enhancement (LLSIE) is a relatively new task to enhance the quality of visually unpleasant stereo images captured in dark condition. However, current methods achieve inferior performance on detail recovery and illumination adjustment. We find it is because: 1) the insufficient single-scale inter-view interaction makes the cross-view cues unable to be fully exploited; 2) lack…

    Submitted 12 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

  47. arXiv:2210.12487  [pdf, other]

    cs.AI cs.CL cs.LO

    MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure

    Authors: Yinya Huang, Hongming Zhang, Ruixin Hong, Xiaodan Liang, Changshui Zhang, Dong Yu

    Abstract: In this paper, we propose a comprehensive benchmark to investigate models' logical reasoning capabilities in complex real-life scenarios. Current explanation datasets often employ synthetic data with simple reasoning structures. Therefore, it cannot express more complex reasoning processes, such as the rebuttal to a reasoning step and the degree of certainty of the evidence. To this end, we propos…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: To appear at the main conference of EMNLP 2022

    Journal ref: EMNLP 2022

  48. MEGCF: Multimodal Entity Graph Collaborative Filtering for Personalized Recommendation

    Authors: Kang Liu, Feng Xue, Dan Guo, Le Wu, Shujie Li, Richang Hong

    Abstract: In most E-commerce platforms, whether the displayed items trigger the user's interest largely depends on their most eye-catching multimodal content. Consequently, increasing efforts focus on modeling multimodal user preference, and the pressing paradigm is to incorporate complete multimodal deep features of the items into the recommendation module. However, the existing studies ignore the mismatch…

    Submitted 13 October, 2022; originally announced October 2022.

  49. Joint Multi-grained Popularity-aware Graph Convolution Collaborative Filtering for Recommendation

    Authors: Kang Liu, Feng Xue, Xiangnan He, Dan Guo, Richang Hong

    Abstract: Graph Convolution Networks (GCNs), with their efficient ability to capture high-order connectivity in graphs, have been widely applied in recommender systems. Stacking multiple neighbor aggregation is the major operation in GCNs. It implicitly captures popularity features because the number of neighbor nodes reflects the popularity of a node. However, existing GCN-based methods ignore a universal…

    Submitted 10 October, 2022; originally announced October 2022.

  50. arXiv:2210.00545  [pdf, other]

    cs.CV

    Seeing Through the Noisy Dark: Towards Real-world Low-Light Image Enhancement and Denoising

    Authors: Jiahuan Ren, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang, Shuicheng Yan

    Abstract: Low-light image enhancement (LLIE) aims at improving the illumination and visibility of dark images with lighting noise. To handle the real-world low-light images often with heavy and complex noise, some efforts have been made for joint LLIE and denoising, which however only achieve inferior restoration performance. We attribute it to two challenges: 1) in real-world low-light images, noise is som…

    Submitted 15 November, 2022; v1 submitted 2 October, 2022; originally announced October 2022.