Showing 1–32 of 32 results for author: Tu, D

  1. arXiv:2407.02043  [pdf, other]

    cs.CL

    Concise and Precise Context Compression for Tool-Using Language Models

    Authors: Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che

    Abstract: Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, occupying the input window as well as slowing down the decoding process. Given the progress in general-purpose compression, soft context compression is a suita…

    Submitted 2 July, 2024; originally announced July 2024.

  2. Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles

    Authors: Minh Dang Tu, Kieu Trang Le, Manh Duong Phung

    Abstract: This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts, the backbone, the neck, and the prediction head. The backbone is developed based on the structure of YOLOv5 combined with the use of a transformer encoder at the end. The neck includes a BI-FPN block combined with the us…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Published in: 2024 IEEE/SICE International Symposium on System Integration (SII)

  3. arXiv:2401.01004  [pdf]

    q-bio.BM cs.LG

    Predicting the activity of chemical compounds based on machine learning approaches

    Authors: Do Hoang Tu, Tran Van Lang, Pham Cong Xuyen, Le Mau Long

    Abstract: Exploring methods and techniques of machine learning (ML) to address specific challenges in various fields is essential. In this work, we tackle a problem in the domain of Cheminformatics; that is, providing a suitable solution to aid in predicting the activity of a chemical compound to the best extent possible. To address the problem at hand, this study conducts experiments on 100 different combi…

    Submitted 10 September, 2023; originally announced January 2024.

  4. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2310.17910  [pdf, other]

    cs.CV

    DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF

    Authors: Chaowei Liu, Jichun Li, Yihua Teng, Chaoqun Wang, Nuo Xu, Jihao Wu, Dandan Tu

    Abstract: For capturing colored document images, e.g. posters and magazines, it is common that multiple degradations such as shadows, wrinkles, etc., are simultaneously introduced due to external factors. Restoring multi-degraded colored document images is a great challenge, yet overlooked, as most existing algorithms focus on enhancing color-ignored document images via binarization. Thus, we propose DocSto…

    Submitted 27 October, 2023; originally announced October 2023.

  6. arXiv:2308.13857  [pdf, other]

    cs.CV

    Joint Gaze-Location and Gaze-Object Detection

    Authors: Danyang Tu, Wei Shen, Wei Sun, Xiongkuo Min, Guangtao Zhai

    Abstract: This paper proposes an efficient and effective method for joint gaze location detection (GL-D) and gaze object detection (GO-D), i.e., gaze following detection. Current approaches frame GL-D and GO-D as two separate tasks, employing a multi-stage framework where human head crops must first be detected and then be fed into a subsequent GL-D sub-network, which is further followed by an additi…

    Submitted 26 August, 2023; originally announced August 2023.

    Comments: Technical Report. arXiv admin note: text overlap with arXiv:2203.10433

  7. arXiv:2308.08370  [pdf, other]

    cs.CV cs.AI

    Agglomerative Transformer for Human-Object Interaction Detection

    Authors: Danyang Tu, Wei Sun, Guangtao Zhai, Wei Shen

    Abstract: We propose an agglomerative Transformer (AGER) that enables Transformer-based human-object interaction (HOI) detectors to flexibly exploit extra instance-level cues in a single-stage and end-to-end manner for the first time. AGER acquires instance tokens by dynamically clustering patch tokens and aligning cluster centers to instances with textual guidance, thus enjoying two benefits: 1) Integralit…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV'23

  8. arXiv:2306.12964  [pdf, other]

    q-fin.ST cs.AI cs.CE cs.LG q-fin.CP

    Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning

    Authors: Shuo Yu, Hongyan Xue, Xiang Ao, Feiyang Pan, Jia He, Dandan Tu, Qing He

    Abstract: In the field of quantitative trading, it is common practice to transform raw historical stock data into indicative signals for the market trend. Such signals are called alpha factors. Alphas in formula forms are more interpretable and thus favored by practitioners concerned with risk. In practice, a set of formulaic alphas is often used together for better modeling precision, so we need to find sy…

    Submitted 25 May, 2023; originally announced June 2023.

    Comments: Accepted by KDD '23, ADS track

  9. arXiv:2306.02421  [pdf, other]

    cs.DB cs.LG

    Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

    Authors: Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

    Abstract: Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications. Crucially, these pipelines are recurring (e.g., daily or hourly) in production settings to keep data updated so that ML models can be re-trained regularly, and BI dashboards refreshed frequently. However, data quality (DQ) issues can often creep i…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: full version of a paper accepted to KDD 2023

  10. arXiv:2303.17316  [pdf, other]

    cs.CV

    Masked Autoencoders as Image Processors

    Authors: Huiyu Duan, Wei Shen, Xiongkuo Min, Danyang Tu, Long Teng, Jia Wang, Guangtao Zhai

    Abstract: Transformers have shown significant effectiveness for various vision tasks including both high-level vision and low-level vision. Recently, masked autoencoders (MAE) for feature pre-training have further unleashed the potential of Transformers, leading to state-of-the-art performances on various high-level vision tasks. However, the significance of MAE pre-training on low-level vision tasks has no…

    Submitted 30 March, 2023; originally announced March 2023.

  11. arXiv:2303.14933  [pdf, other]

    cs.CV

    MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

    Authors: Zicheng Zhang, Wei Wu, Wei Sun, Dangyang Tu, Wei Lu, Xiongkuo Min, Ying Chen, Guangtao Zhai

    Abstract: User-generated content (UGC) live videos are often bothered by various distortions during capture procedures and thus exhibit diverse visual qualities. Such source videos are further compressed and transcoded by media server providers before being distributed to end-users. Because of the flourishing of UGC live videos, effective video quality assessment (VQA) tools are needed to monitor and percep…

    Submitted 19 April, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR2023

  12. arXiv:2303.12123  [pdf, other]

    eess.IV cs.CV

    Oral-3Dv2: 3D Oral Reconstruction from Panoramic X-Ray Imaging with Implicit Neural Representation

    Authors: Weinan Song, Haoxin Zheng, Dezhan Tu, Chengwen Liang, Lei He

    Abstract: 3D reconstruction of medical imaging from 2D images has become an increasingly interesting topic with the development of deep learning models in recent years. Previous studies in 3D reconstruction from limited X-ray images mainly rely on learning from paired 2D and 3D images, where the reconstruction quality relies on the scale and variation of collected data. This has brought significant challeng…

    Submitted 3 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

  13. arXiv:2303.11716  [pdf, other]

    cs.LG cs.AI q-fin.RM

    Style Miner: Find Significant and Stable Explanatory Factors in Time Series with Constrained Reinforcement Learning

    Authors: Dapeng Li, Feiyang Pan, Jia He, Zhiwei Xu, Dandan Tu, Guoliang Fan

    Abstract: In high-dimensional time-series analysis, it is essential to have a set of key factors (namely, the style factors) that explain the change of the observed variable. For example, volatility modeling in finance relies on a set of risk factors, and climate change studies in climatology rely on a set of causal factors. The ideal low-dimensional style factors should balance significance (with high expl…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 9 pages, 6 figures

  14. arXiv:2302.00179  [pdf, other]

    cs.CV

    Stable Attribute Group Editing for Reliable Few-shot Image Generation

    Authors: Guanqi Ding, Xinzhe Han, Shuhui Wang, Xin Jin, Dandan Tu, Qingming Huang

    Abstract: Few-shot image generation aims to generate data of an unseen category based on only a few samples. Apart from basic content generation, a bunch of downstream applications hopefully benefit from this task, such as low-data detection and few-shot classification. To achieve this goal, the generated images should guarantee category retention for classification beyond the visual quality and diversity.…

    Submitted 31 January, 2023; originally announced February 2023.

  15. arXiv:2206.01908  [pdf, other]

    cs.CV

    Video-based Human-Object Interaction Detection from Tubelet Tokens

    Authors: Danyang Tu, Wei Sun, Xiongkuo Min, Guangtao Zhai, Wei Shen

    Abstract: We present a novel vision Transformer, named TUTOR, which is able to learn tubelet tokens, served as highly-abstracted spatiotemporal representations, for video-based human-object interaction (V-HOI) detection. The tubelet tokens structurize videos by agglomerating and linking semantically-related patch tokens along spatial and temporal domains, which enjoy two benefits: 1) Compactness: each tubel…

    Submitted 4 June, 2022; originally announced June 2022.

  16. arXiv:2204.08308  [pdf, other]

    cs.CV

    Saliency in Augmented Reality

    Authors: Huiyu Duan, Wei Shen, Xiongkuo Min, Danyang Tu, Jing Li, Guangtao Zhai

    Abstract: With the rapid development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary theory underlying AR is human visual confusion, which allows users to perceive the real-world scenes and augmented contents (virtual-world scenes) simultaneously by superimposing them together. To achieve good Quality of Experience (QoE), it is important t…

    Submitted 12 July, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

  17. arXiv:2204.00795  [pdf, other]

    cs.CV

    Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency

    Authors: Zhenhuan Liu, Liang Li, Huajie Jiang, Xin Jin, Dandan Tu, Shuhui Wang, Zheng-Jun Zha

    Abstract: In recent years, creative content generations like style transfer and neural photo editing have attracted more and more attention. Among these, cartoonization of real-world scenes has promising applications in entertainment and industry. Different from image translations focusing on improving the style effect of generated images, video cartoonization has additional requirements on the temporal con…

    Submitted 2 April, 2022; originally announced April 2022.

  18. arXiv:2203.12872  [pdf, other]

    cs.CV

    Intrinsic Bias Identification on Medical Image Datasets

    Authors: Shijie Zhang, Lanjun Wang, Lian Ding, An-an Liu, Senhua Zhu, Dandan Tu

    Abstract: Machine learning based medical image analysis highly depends on datasets. Biases in the dataset can be learned by the model and degrade the generalizability of the applications. There are studies on debiased models. However, scientists and practitioners are difficult to identify implicit biases in the datasets, which causes lack of reliable unbias test datasets to valid models. To tackle this issu…

    Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: 19 pages, 12 figures

  19. arXiv:2203.10537  [pdf, other]

    cs.CV

    Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows

    Authors: Danyang Tu, Xiongkuo Min, Huiyu Duan, Guodong Guo, Guangtao Zhai, Wei Shen

    Abstract: This paper presents a new vision Transformer, named Iwin Transformer, which is specifically designed for human-object interaction (HOI) detection, a detailed scene understanding task involving a sequential process of human/object detection and interaction recognition. Iwin Transformer is a hierarchical Transformer which progressively performs token representation learning and token agglomeration w…

    Submitted 19 October, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022

  20. arXiv:2203.10433  [pdf, other]

    cs.CV

    End-to-End Human-Gaze-Target Detection with Transformers

    Authors: Danyang Tu, Xiongkuo Min, Huiyu Duan, Guodong Guo, Guangtao Zhai, Wei Shen

    Abstract: In this paper, we propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following. Current approaches decouple the HGT detection task into separate branches of salient object detection and human gaze prediction, employing a two-stage framework where human head locations must first be detected and then be fed into the next gaze target prediction sub-network. In…

    Submitted 23 March, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  21. arXiv:2203.08422  [pdf, other]

    cs.CV

    Attribute Group Editing for Reliable Few-shot Image Generation

    Authors: Guanqi Ding, Xinzhe Han, Shuhui Wang, Shuzhe Wu, Xin Jin, Dandan Tu, Qingming Huang

    Abstract: Few-shot image generation is a challenging task even using the state-of-the-art Generative Adversarial Networks (GANs). Due to the unstable GAN training process and the limited training data, the generated images are often of low quality and low diversity. In this work, we propose a new editing-based method, i.e., Attribute Group Editing (AGE), for few-shot image generation. The basic assumption i…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: CVPR2022

  22. arXiv:2203.02797  [pdf, other]

    cs.CL

    ClueGraphSum: Let Key Clues Guide the Cross-Lingual Abstractive Summarization

    Authors: Shuyu Jiang, Dengbiao Tu, Xingshu Chen, Rui Tang, Wenxian Wang, Haizhou Wang

    Abstract: Cross-Lingual Summarization (CLS) is the task to generate a summary in one language for an article in a different language. Previous studies on CLS mainly take pipeline methods or train the end-to-end model using the translated parallel data. However, the quality of generated cross-lingual summaries needs more further efforts to improve, and the model performance has never been evaluated on the ha…

    Submitted 9 March, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

    Comments: 12 pages, 4 figures

  23. Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence

    Authors: Xiang Bai, Hanchen Wang, Liya Ma, Yongchao Xu, Jiefeng Gan, Ziwei Fan, Fan Yang, Ke Ma, Jiehua Yang, Song Bai, Chang Shu, Xinyu Zou, Renhao Huang, Changzheng Zhang, Xiaowu Liu, Dandan Tu, Chuou Xu, Wenqing Zhang, Xi Wang, Anguo Chen, Yu Zeng, Dehua Yang, Ming-Wei Wang, Nagaraj Holalkere, Neil J. Halin, et al. (21 additional authors not shown)

    Abstract: Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalised model in clinical practices. To address this, we launch the Unified CT-COVID AI Diagnostic Initiative (UCADI),…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: Nature Machine Intelligence

  24. arXiv:2108.01997  [pdf, other]

    eess.IV cs.CV cs.LG

    DuCN: Dual-children Network for Medical Diagnosis and Similar Case Recommendation towards COVID-19

    Authors: Chengtao Peng, Yunfei Long, Senhua Zhu, Dandan Tu, Bin Li

    Abstract: Early detection of the coronavirus disease 2019 (COVID-19) helps to treat patients timely and increase the cure rate, thus further suppressing the spread of the disease. In this study, we propose a novel deep learning based detection and similar case recommendation network to help control the epidemic. Our proposed network contains two stages: the first one is a lung region segmentation step and i…

    Submitted 3 August, 2021; originally announced August 2021.

  25. Blind Quality Assessment for in-the-Wild Images via Hierarchical Feature Fusion and Iterative Mixed Database Training

    Authors: Wei Sun, Xiongkuo Min, Danyang Tu, Guangtao Zhai, Siwei Ma

    Abstract: Image quality assessment (IQA) is very important for both end-users and service providers since a high-quality image can significantly improve the user's quality of experience (QoE) and also benefit lots of computer vision algorithms. Most existing blind image quality assessment (BIQA) models were developed for synthetically distorted images, however, they perform poorly on in-the-wild images, whi…

    Submitted 27 April, 2023; v1 submitted 30 May, 2021; originally announced May 2021.

    Comments: Accepted by IEEE Journal of Selected Topics in Signal Processing

  26. arXiv:2101.00098  [pdf, other]

    cs.HC

    OralViewer: 3D Demonstration of Dental Surgeries for Patient Education with Oral Cavity Reconstruction from a 2D Panoramic X-ray

    Authors: Yuan Liang, Liang Qiu, Tiancheng Lu, Zhujun Fang, Dezhan Tu, Jiawei Yang, Tiandong Zhao, Yiting Shao, Kun Wang, Xiang 'Anthony' Chen, Lei He

    Abstract: Patient's understanding on forthcoming dental surgeries is required by patient-centered care and helps reduce fear and anxiety. Due to the gap of expertise between patients and dentists, conventional techniques of patient education are usually not effective for explaining surgical steps. In this paper, we present OralViewer -- the first interactive application that enables dentist's demon…

    Submitted 31 December, 2020; originally announced January 2021.

  27. arXiv:2010.04893  [pdf, other]

    cs.LG

    Trust the Model When It Is Confident: Masked Model-based Actor-Critic

    Authors: Feiyang Pan, Jia He, Dandan Tu, Qing He

    Abstract: It is a popular belief that model-based Reinforcement Learning (RL) is more sample efficient than model-free RL, but in practice, it is not always true due to overweighed model errors. In complex and noisy settings, model-based RL tends to have trouble using the model if it does not know when to trust the model. In this work, we find that better model usage can make a huge difference. We show th…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020

  28. arXiv:2007.11349  [pdf, other]

    cs.CV

    Learning Directional Feature Maps for Cardiac MRI Segmentation

    Authors: Feng Cheng, Cheng Chen, Yukang Wang, Heshui Shi, Yukun Cao, Dandan Tu, Changzheng Zhang, Yongchao Xu

    Abstract: Cardiac MRI segmentation plays a crucial role in clinical diagnosis for evaluating personalized cardiac performance parameters. Due to the indistinct boundaries and heterogeneous intensity distributions in the cardiac MRI, most existing methods still suffer from two aspects of challenges: inter-class indistinction and intra-class inconsistency. To tackle these two problems, we propose a novel meth…

    Submitted 22 July, 2020; originally announced July 2020.

    Comments: Accepted by MICCAI2020

  29. arXiv:2001.10161  [pdf, other]

    cs.AI cs.CL

    Bringing Stories Alive: Generating Interactive Fiction Worlds

    Authors: Prithviraj Ammanabrolu, Wesley Cheung, Dan Tu, William Broniec, Mark O. Riedl

    Abstract: World building forms the foundation of any task that requires narrative intelligence. In this work, we focus on procedurally generating interactive fiction worlds---text-based worlds that players "see" and "talk to" using natural language. Generating these worlds requires referencing everyday and thematic commonsense priors in addition to being semantically consistent, interesting, and coherent th…

    Submitted 27 January, 2020; originally announced January 2020.

  30. arXiv:1910.13983  [pdf, other]

    cs.LG cs.CY stat.ML

    DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning

    Authors: Michiel A. Bakker, Duy Patrick Tu, Humberto Riverón Valdés, Krishna P. Gummadi, Kush R. Varshney, Adrian Weller, Alex Pentland

    Abstract: We introduce a framework for dynamic adversarial discovery of information (DADI), motivated by a scenario where information (a feature set) is used by third parties with unknown objectives. We train a reinforcement learning agent to sequentially acquire a subset of the information while balancing accuracy and fairness of predictors downstream. Based on the set of already acquired features, the age…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2019 HCML Workshop

  31. arXiv:1805.07311  [pdf, ps, other]

    math.OC cs.CC cs.LG

    Blended Conditional Gradients: the unconditioning of conditional gradients

    Authors: Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright

    Abstract: We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank--Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. Our approach retains all favorable p…

    Submitted 31 May, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

    Comments: 33 pages + 12 figures

    MSC Class: 68Q32; 90C52

  32. arXiv:1802.02142  [pdf]

    cs.CV

    Face Detection Using Improved Faster RCNN

    Authors: Changzheng Zhang, Xiang Xu, Dandan Tu

    Abstract: Faster RCNN has achieved great success for generic object detection including PASCAL object detection and MS COCO object detection. In this report, we propose a detailed designed Faster RCNN method named FDNet1.0 for face detection. Several techniques were employed including multi-scale training, multi-scale testing, light-designed RCNN, some tricks for inference and a vote-based ensemble method.…

    Submitted 6 February, 2018; originally announced February 2018.