Showing 1–43 of 43 results for author: You, Q

  1. arXiv:2405.17815  [pdf, other]

    cs.CV

    Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

    Authors: Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: In the realm of Multimodal Large Language Models (MLLMs), vision-language connector plays a crucial role to link the pre-trained vision encoders with Large Language Models (LLMs). Despite its importance, the vision-language connector has been relatively less explored. In this study, we aim to propose a strong vision-language connector that enables MLLMs to achieve high accuracy while maintain low…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2403.18361  [pdf, other]

    cs.CV

    ViTAR: Vision Transformer with Any Resolution

    Authors: Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen during training. Our work introduces two key innovations to address this issue. Firstly, we propose a novel module for dynamic resolution adjustment, d…

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  3. arXiv:2403.01487  [pdf, other]

    cs.CV

    InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

    Authors: Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: Multimodal Large Language Models (MLLMs) have experienced significant advancements recently. Nevertheless, challenges persist in the accurate recognition and comprehension of intricate details within high-resolution images. Despite being indispensable for the development of robust MLLMs, this area remains underinvestigated. To tackle this challenge, our work introduces InfiMM-HD, a novel architect…

    Submitted 3 March, 2024; originally announced March 2024.

  4. arXiv:2401.08968  [pdf, other]

    cs.CV

    COCO is "ALL" You Need for Visual Instruction Fine-tuning

    Authors: Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang

    Abstract: Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence. Visual instruction fine-tuning (IFT) is a vital process for aligning MLLMs' output with user's intentions. High-quality and diversified instruction following data is the key to this fine-tuning process. Recent studies propose to construct visual IFT datasets through a multifaceted approach…

    Submitted 16 January, 2024; originally announced January 2024.

  5. arXiv:2401.06805  [pdf, other]

    cs.CL cs.AI

    Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

    Authors: Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

    Abstract: Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. Particularly, various…

    Submitted 18 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  6. arXiv:2312.05742  [pdf, other]

    cs.LG cs.AI

    The Generalization Gap in Offline Reinforcement Learning

    Authors: Ishita Mediratta, Qingfei You, Minqi Jiang, Roberta Raileanu

    Abstract: Despite recent progress in offline learning, these methods are still trained and tested on the same environment. In this paper, we compare the generalization abilities of widely used online and offline learning methods such as online reinforcement learning (RL), offline RL, sequence modeling, and behavioral cloning. Our experiments show that offline learning algorithms perform worse on new environ…

    Submitted 14 March, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: Published as a conference paper at ICLR 2024; First two authors contributed equally

  7. arXiv:2312.01408  [pdf, other]

    cs.CV

    Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

    Authors: Tianqi Chen, Yongfei Liu, Zhendong Wang, Jianbo Yuan, Quanzeng You, Hongxia Yang, Mingyuan Zhou

    Abstract: In light of the remarkable success of in-context learning in large language models, its potential extension to the vision domain, particularly with visual foundation models like Stable Diffusion, has sparked considerable interest. Existing approaches in visual in-context learning frequently face hurdles such as expensive pretraining, limiting frameworks, inadequate visual comprehension, and limite…

    Submitted 3 December, 2023; originally announced December 2023.

  8. arXiv:2311.17126  [pdf, other]

    cs.CV cs.CL

    Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

    Authors: Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang

    Abstract: Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts. Despite the advancement, these diffusion models sometimes struggle to translate the semantic content from the text into images entirely. While conditioning on the layout has shown to be effective in improving the compositional ability o…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: preprint

  9. arXiv:2311.11567  [pdf, other]

    cs.CV

    InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

    Authors: Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

    Abstract: Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence. These models not only excel in traditional vision-language tasks but also demonstrate impressive performance in contemporary multi-modal benchmarks. Although many of these benchmarks attempt to holistically evaluate MLLMs, they typically concentrate on basic reasoning tasks, often yielding…

    Submitted 4 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

  10. arXiv:2310.06389  [pdf, other]

    cs.CV stat.ML

    Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

    Authors: Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou

    Abstract: Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling. While various techniques address these computational challenges, a less-explored issue is designing an efficient and adaptable network backbone for iterative refinement. Current options like U-Net and Vision Transformer often rely on resource-intensive deep netwo…

    Submitted 27 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  11. arXiv:2306.04774  [pdf, other]

    cs.CV

    RefineVIS: Video Instance Segmentation with Temporal Attention Refinement

    Authors: Andre Abrantes, Jiang Wang, Peng Chu, Quanzeng You, Zicheng Liu

    Abstract: We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context. RefineVIS learns two separate representations on top of an off-the-shelf frame-level image instance segmentation model: an association representation responsible…

    Submitted 7 June, 2023; originally announced June 2023.

  12. arXiv:2208.11663  [pdf, other]

    cs.CL

    PEER: A Collaborative Language Model

    Authors: Timo Schick, Jane Dwivedi-Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel

    Abstract: Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and i…

    Submitted 24 August, 2022; originally announced August 2022.

  13. arXiv:2206.07011  [pdf, other]

    cs.CV

    Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention

    Authors: Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu

    Abstract: Video instance segmentation aims at predicting object segmentation masks for each frame, as well as associating the instances across multiple frames. Recent end-to-end video instance segmentation methods are capable of performing object segmentation and instance association together in a direct parallel sequence decoding/prediction framework. Although these methods generally predict higher quality…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: 11 pages, 5 figures, 4 tables

  14. arXiv:2203.12198  [pdf, other]

    cs.CV

    Deep Frequency Filtering for Domain Generalization

    Authors: Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen

    Abstract: Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge. Some theoretical studies have uncovered that DNNs have preferences for some frequency components in the learning process and indicated that this may affect the robustness of learned features. In this paper, we propose Deep Frequency Filtering (DFF) for…

    Submitted 25 March, 2023; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR2023

  15. arXiv:2201.10654  [pdf, ps, other]

    cs.CV

    SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

    Authors: Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

    Abstract: Visual Question Answering (VQA) attracts much attention from both industry and academia. As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations. Previous approaches extensively employ entity-level alignments, such as the correlations between the visual regions and their semantic labels, o…

    Submitted 25 January, 2022; originally announced January 2022.

  16. arXiv:2112.06632  [pdf, other]

    cs.CV

    Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation

    Authors: Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-jun Zha

    Abstract: Unsupervised domain adaptive person re-identification (ReID) has been extensively investigated to mitigate the adverse effects of domain gaps. Those works assume the target domain data can be accessible all at once. However, for the real-world streaming data, this hinders the timely adaptation to changing data statistics and sufficient exploitation of increasing samples. In this paper, to address…

    Submitted 29 March, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted by CVPR2022

  17. arXiv:2111.15157  [pdf, other]

    cs.CV

    MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark

    Authors: Xiaotian Han, Quanzeng You, Chunyu Wang, Zhizheng Zhang, Peng Chu, Houdong Hu, Jiang Wang, Zicheng Liu

    Abstract: Multi-camera tracking systems are gaining popularity in applications that demand high-quality tracking results, such as frictionless checkout because monocular multi-object tracking (MOT) systems often fail in cluttered and crowded environments due to occlusion. Multiple highly overlapped cameras can significantly alleviate the problem by recovering partial 3D information. However, the cost of cre…

    Submitted 30 November, 2021; originally announced November 2021.

  18. arXiv:2106.13802  [pdf, other]

    cs.CV

    Efficient Document Image Classification Using Region-Based Graph Neural Network

    Authors: Jaya Krishna Mandivarapu, Eric Bunch, Qian You, Glenn Fung

    Abstract: Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advancements in large pre-trained computer vision and language models and graph neural networks have lent document image classification many tools. However, using large pre-trained models usually requires substantial computing resources wh…

    Submitted 25 June, 2021; originally announced June 2021.

  19. arXiv:2106.06471  [pdf, other]

    cs.CL cs.CV cs.IR

    Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation

    Authors: Xingyi Yang, Muchao Ye, Quanzeng You, Fenglong Ma

    Abstract: Medical report generation is one of the most challenging tasks in medical image analysis. Although existing approaches have achieved promising results, they either require a predefined template database in order to retrieve sentences or ignore the hierarchical nature of medical report generation. To address these issues, we propose MedWriter that incorporates a novel hierarchical retrieval mechani…

    Submitted 25 May, 2021; originally announced June 2021.

    Comments: Accepted by ACL 2021, Camera-ready version

  20. arXiv:2104.00194  [pdf, other]

    cs.CV

    TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

    Authors: Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu

    Abstract: Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose a solution named TransMOT, which leverages powerful graph transformers to efficiently model the spatial and temporal interactions among the objects. TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked o…

    Submitted 3 April, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

  21. arXiv:2103.13917  [pdf, other]

    cs.CV

    Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification

    Authors: Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen

    Abstract: Unsupervised domain adaptive (UDA) person re-identification (ReID) aims to transfer the knowledge from the labeled source domain to the unlabeled target domain for person matching. One challenge is how to generate target domain samples with reliable labels for training. To address this problem, we propose a Disentanglement-based Cross-Domain Feature Augmentation (DCDFA) strategy, where the augment…

    Submitted 25 March, 2021; originally announced March 2021.

  22. arXiv:2102.00426  [pdf, other]

    cs.LG cs.HC cs.IR

    A Simple yet Brisk and Efficient Active Learning Platform for Text Classification

    Authors: Teja Kanchinadam, Qian You, Keith Westpfahl, James Kim, Siva Gunda, Sebastian Seith, Glenn Fung

    Abstract: In this work, we propose the use of a fully managed machine learning service, which utilizes active learning to directly build models from unstructured data. With this tool, business users can quickly and easily build machine learning models and then directly deploy them into a production ready hosted environment without much involvement from data scientists. Our approach leverages state-of-the-ar…

    Submitted 31 January, 2021; originally announced February 2021.

  23. arXiv:2012.02420  [pdf, other]

    cs.CL cs.LG

    Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation

    Authors: Junyu Luo, Zifei Zheng, Hanzhong Ye, Muchao Ye, Yaqing Wang, Quanzeng You, Cao Xiao, Fenglong Ma

    Abstract: Patients with low health literacy usually have difficulty understanding medical jargon and the complex structure of professional medical language. Although some studies are proposed to automatically translate expert language into layperson-understandable language, only a few of them focus on both accuracy and readability aspects simultaneously in the clinical domain. Thus, simplification of the cl…

    Submitted 21 September, 2023; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: COLING 2022

    Journal ref: 2022.coling-1.313

  24. arXiv:2006.05480  [pdf]

    eess.IV cs.CV cs.LG

    DcardNet: Diabetic Retinopathy Classification at Multiple Levels Based on Structural and Angiographic Optical Coherence Tomography

    Authors: Pengxiao Zang, Liqin Gao, Tristan T. Hormel, Jie Wang, Qisheng You, Thomas S. Hwang, Yali Jia

    Abstract: Objective: Optical coherence tomography (OCT) and its angiography (OCTA) have several advantages for the early detection and diagnosis of diabetic retinopathy (DR). However, automated, complete DR classification frameworks based on both OCT and OCTA data have not been proposed. In this study, a convolutional neural network (CNN) based method is proposed to fulfill a DR classification framework usi…

    Submitted 24 September, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: Accepted for publication by IEEE Transactions on Biomedical Engineering

  25. arXiv:2003.11753  [pdf, other]

    cs.CV

    Real-time 3D Deep Multi-Camera Tracking

    Authors: Quanzeng You, Hao Jiang

    Abstract: Tracking a crowd in 3D using multiple RGB cameras is a challenging task. Most previous multi-camera tracking algorithms are designed for offline setting and have high computational complexity. Robust real-time multi-camera 3D tracking is still an unsolved problem. In this work, we propose a novel end-to-end tracking pipeline, Deep Multi-Camera Tracking (DMCT), which achieves reliable real-time mul…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: 17 pages, 8 figures

  26. arXiv:1903.01695  [pdf, other]

    cs.CV

    Real-time Multiple People Hand Localization in 4D Point Clouds

    Authors: Hao Jiang, Quanzeng You

    Abstract: We propose a novel real-time algorithm to localize hands and find their associations with multiple people in the cluttered 4D volumetric data (dynamic 3D volumes). Different from the traditional multiple view approaches, which find key points in 2D and then triangulate to recover the 3D locations, our method directly processes the dynamic 3D data that involve both clutter and crowd. The volumetric r…

    Submitted 5 March, 2019; originally announced March 2019.

  27. Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM

    Authors: Yuxiao Chen, Jianbo Yuan, Quanzeng You, Jiebo Luo

    Abstract: Sentiment analysis on large-scale social media data is important to bridge the gaps between social media contents and real world activities including political election prediction, individual and public emotional status monitoring and analysis, and so on. Although textual sentiment analysis has been well studied based on platforms such as Twitter and Instagram, analysis of the role of extensive em…

    Submitted 6 August, 2018; v1 submitted 20 July, 2018; originally announced July 2018.

  28. arXiv:1807.03871  [pdf, other]

    cs.CV

    "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

    Authors: Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo

    Abstract: Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, it requires the system to generate a caption that has a specific style (e.g., humorous, romantic, positive, and negative) while describing the image content semantically accurately. In this paper, we propose a novel stylized image captioning model that effectively takes both requirements in…

    Submitted 29 July, 2018; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: 17 pages, 7 figures, ECCV 2018

  29. arXiv:1806.02424  [pdf, other]

    cs.CV

    Action4D: Real-time Action Recognition in the Crowd and Clutter

    Authors: Quanzeng You, Hao Jiang

    Abstract: Recognizing every person's action in a crowded and cluttered environment is a challenging task. In this paper, we propose a real-time action recognition method, Action4D, which gives reliable and accurate results in the real-world settings. We propose to tackle the action recognition problem using a holistic 4D "scan" of a cluttered scene to include every detail about the people and environment. R…

    Submitted 6 June, 2018; originally announced June 2018.

  30. arXiv:1803.02952  [pdf, other]

    cs.HC

    Touch Your Heart: A Tone-aware Chatbot for Customer Care on Social Media

    Authors: Tianran Hu, Anbang Xu, Zhe Liu, Quanzeng You, Yufan Guo, Vibha Sinha, Jiebo Luo, Rama Akkiraju

    Abstract: Chatbot has become an important solution to rapidly increasing customer care demands on social media in recent years. However, current work on chatbot for customer care ignores a key to impact user experience - tones. In this work, we create a novel tone-aware chatbot that generates toned responses to user requests on social media. We first conduct a formative research, in which the effects of ton…

    Submitted 14 March, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: 12 pages, CHI 2018

    ACM Class: H.5.3

  31. arXiv:1801.10121  [pdf, other]

    cs.CV

    Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

    Authors: Quanzeng You, Hailin Jin, Jiebo Luo

    Abstract: Automatic image captioning has recently approached human-level performance due to the latest advances in computer vision and natural language understanding. However, most of the current models can only generate plain factual descriptions about the content of a given image. However, for human beings, image caption writing is quite flexible and diverse, where additional language dimensions, such as…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: 8 pages, 5 figures and 4 tables

  32. arXiv:1801.02713  [pdf, ps, other]

    cs.IT

    Duality of Channel Encoding and Decoding - Part II: Rate-1 Non-binary Convolutional Codes

    Authors: Qimin You, Yonghui Li, Soung Chang Liew, Branka Vucetic

    Abstract: This is the second part of a series of papers on a revisit to the bidirectional Bahl-Cocke-Jelinek-Raviv (BCJR) soft-in-soft-out (SISO) maximum a posteriori probability (MAP) decoding algorithm. Part I revisited the BCJR MAP decoding algorithm for rate-1 binary convolutional codes and proposed a linear complexity decoder using shift registers in the complex number field. Part II proposes a low com…

    Submitted 8 January, 2018; originally announced January 2018.

    Comments: A linear complexity MAP decoder of rate-1 Non-binary Convolutional Codes. The proposed decoder significantly reduces the computational complexity of the bidirectional BCJR MAP algorithm from exponential to linear with constraint length of convolutional codes. For binary decoder, please refer to part I paper: arXiv:1201.2483

    Journal ref: European Transactions on Emerging Telecommunications Technologies (ETT), vol. 27, no. 5, May 2016, pp. 685-697

  33. Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks

    Authors: Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, Jing Gao

    Abstract: Predicting the future health information of patients from the historical Electronic Health Records (EHR) is a core research task in the development of personalized healthcare. Patient EHR data consist of sequences of visits over time, where each visit contains multiple medical codes, including diagnosis, medication, and procedure codes. The most important challenges for this task are to model the…

    Submitted 18 June, 2017; originally announced June 2017.

  34. arXiv:1705.08974  [pdf, other]

    cs.CV cs.SI

    Cultural Diffusion and Trends in Facebook Photographs

    Authors: Quanzeng You, Darío García-García, Mahohar Paluri, Jiebo Luo, Jungseock Joo

    Abstract: Online social media is a social vehicle in which people share various moments of their lives with their friends, such as playing sports, cooking dinner or just taking a selfie for fun, via visual means, that is, photographs. Our study takes a closer look at the popular visual concepts illustrating various cultural lifestyles from aggregated, de-identified photographs. We perform analysis both at m…

    Submitted 24 May, 2017; originally announced May 2017.

    Comments: 10 pages, To appear in ICWSM 2017 (Full Paper)

  35. Image Based Appraisal of Real Estate Properties

    Authors: Quanzeng You, Ran Pang, Liangliang Cao, Jiebo Luo

    Abstract: Real estate appraisal, which is the process of estimating the price for real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction. Traditionally, the repeat sales model has been widely adopted to estimate real estate price. However, it depends on the design and calculation of a complex economic related index, which is challenging to estimate accurately.…

    Submitted 27 July, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

    Comments: 8 pages, 8 figures

  36. arXiv:1610.09002  [pdf, other]

    cs.SI

    The Effect of Pets on Happiness: A Data-Driven Approach via Large-Scale Social Media

    Authors: Yuchen Wu, Jianbo Yuan, Quanzeng You, Jiebo Luo

    Abstract: Psychologists have demonstrated that pets have a positive impact on owners' happiness. For example, lonely people are often advised to have a dog or cat to quell their social isolation. Conventional psychological research methods of analyzing this phenomenon are mostly based on surveys or self-reported questionnaires, which are time-consuming and lack scalability. Utilizing social media as an a…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: In Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2016

  37. arXiv:1605.02677  [pdf, other]

    cs.AI cs.CV

    Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark

    Authors: Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang

    Abstract: Psychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people's emotional reaction towards images. To this end, different kinds of hand-tuned features are proposed. The results reported on…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 7 pages, 7 figures, AAAI 2016

  38. arXiv:1604.07103  [pdf, other]

    cs.SI

    Voting with Feet: Who are Leaving Hillary Clinton and Donald Trump?

    Authors: Yu Wang, Yuncheng Li, Quanzeng You, Xiyang Zhang, Richard Niemi, Jiebo Luo

    Abstract: From a crowded field with 17 candidates, Hillary Clinton and Donald Trump have emerged as the two front-runners in the 2016 U.S. presidential campaign. The two candidates each boast more than 5 million followers on Twitter, and at the same time both have witnessed hundreds of thousands of people leave their camps. In this paper we attempt to characterize individuals who have left Hillary Clinton a…

    Submitted 25 April, 2016; v1 submitted 24 April, 2016; originally announced April 2016.

    Comments: 4 pages, 8 figures, under review

  39. arXiv:1603.03925  [pdf, other]

    cs.CV

    Image Captioning with Semantic Attention

    Authors: Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, Jiebo Luo

    Abstract: Automatically generating a natural language description of an image has attracted interests recently both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing. Existing approaches are either top-down, which start from a gist of an image and convert it into words, or bottom-up, which com…

    Submitted 12 March, 2016; originally announced March 2016.

    Comments: 10 pages, 5 figures, CVPR16

  40. arXiv:1509.06041  [pdf, other]

    cs.CV cs.IR cs.LG

    Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

    Authors: Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang

    Abstract: Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic indicators, and so on. Recently, social media users are increasingly using images and videos to express their opinions and share their experiences. Sentiment analysis…

    Submitted 20 September, 2015; originally announced September 2015.

    Comments: 9 pages, 5 figures, AAAI 2015

  41. arXiv:1504.04558  [pdf, other]

    cs.SI cs.IR cs.MM

    A Picture Tells a Thousand Words -- About You! User Interest Profiling from User Generated Visual Content

    Authors: Quanzeng You, Sumit Bhatia, Jiebo Luo

    Abstract: Inference of online social network users' attributes and interests has been an active research topic. Accurate identification of users' attributes and interests is crucial for improving the performance of personalization and recommender systems. Most of the existing works have focused on textual content generated by the users and have successfully used it for predicting users' interests and other…

    Submitted 17 April, 2015; originally announced April 2015.

    Comments: 7 pages, 6 Figures, 4 Tables

  42. arXiv:1209.4405  [pdf, ps, other]

    cs.IT math.NA

    Strongly Convex Programming for Principal Component Pursuit

    Authors: Qingshan You, Qun Wan, Yipeng Liu

    Abstract: In this paper, we address strongly convex programming for principal component pursuit with reduced linear measurements, which decomposes a superposition of a low-rank matrix and a sparse matrix from a small set of linear measurements. We first provide sufficient conditions under which the strongly convex models lead to the exact low-rank and sparse matrix recovery; Second, we also give suggest…

    Submitted 19 September, 2012; originally announced September 2012.

    Comments: 10 pages

  43. Duality of Channel Encoding and Decoding - Part I: Rate-1 Binary Convolutional Codes

    Authors: Yonghui Li, Qimin You, Soung C. Liew, Branka Vucetic

    Abstract: In this paper, we revisit the forward, backward and bidirectional Bahl-Cocke-Jelinek-Raviv (BCJR) soft-input soft-output (SISO) maximum a posteriori probability (MAP) decoding process of rate-1 binary convolutional codes. From this we establish some interesting explicit relationships between encoding and decoding of rate-1 convolutional codes. We observe that the forward and backward BCJR SISO MAP…

    Submitted 10 November, 2016; v1 submitted 12 January, 2012; originally announced January 2012.

    Comments: 32 pages, 20 figures, to appear in ETT