Showing 1–50 of 75 results for author: Geng, S

  1. arXiv:2406.05184  [pdf, other]

    cs.CV

    The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

    Authors: Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna

    Abstract: Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. What additional value does the intermediate generator provide over directly training on relevant parts of the up…

    Submitted 3 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Correspondence to sgeng at cs dot washington dot edu. RK and PWK equally advised the project

  2. arXiv:2405.05945  [pdf, other]

    cs.CV

    Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

    Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

    Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  3. arXiv:2405.03239  [pdf, other]

    cs.LG cs.AI

    Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series: A UK Biobank Study

    Authors: Shuhao Mei, Yuxi Zhou, Jiahao Xu, Yuxuan Wan, Shan Cao, Qinghao Zhao, Shijia Geng, Junqing Xie, Shenda Hong

    Abstract: Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. The existing methods can only detect patients who already have COPD based on obvious features shown in the spirogram (In this article, the spirogram specifically involves measuring Volume-Flow curve time series). Early prediction of COPD risk is vital for monitoring COPD disease p…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2404.07940  [pdf, other]

    cs.SE cs.LG

    InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models

    Authors: Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang

    Abstract: Large Language Models for code (code LLMs) have witnessed tremendous progress in recent years. With the rapid development of code LLMs, many popular evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to measure the performance of code LLMs with a particular focus on code generation tasks. However, they are insufficient to cover the full range of expected capabilities of code…

    Submitted 27 June, 2024; v1 submitted 10 March, 2024; originally announced April 2024.

    Comments: 30 pages, 10 pages for main content, work in progress

  5. arXiv:2403.11183  [pdf, other]

    cs.CL

    Decoding Continuous Character-based Language from Non-invasive Brain Recordings

    Authors: Cenyuan Zhang, Xiaoqing Zheng, Ruicheng Yin, Shujie Geng, Jianhan Xu, Xuan Gao, Changze Lv, Zixuan Ling, Xuanjing Huang, Miao Cao, Jianfeng Feng

    Abstract: Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to…

    Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  6. arXiv:2403.06011  [pdf, other]

    cs.LG math.OC

    Reinforcement Learning Paycheck Optimization for Multivariate Financial Goals

    Authors: Melda Alaluf, Giulia Crippa, Sinong Geng, Zijian Jing, Nikhil Krishnan, Sanjeev Kulkarni, Wyatt Navarro, Ronnie Sircar, Jonathan Tang

    Abstract: We study paycheck optimization, which examines how to allocate income in order to achieve several competing financial goals. For paycheck optimization, a quantitative methodology is missing, due to a lack of a suitable problem formulation. To deal with this issue, we formulate the problem as a utility maximization problem. The proposed formulation is able to (i) unify different financial goals; (i…

    Submitted 9 March, 2024; originally announced March 2024.

    Journal ref: Risk and Decision Analysis, Volume 9, 2023

  7. arXiv:2402.05935  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

    Authors: Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao

    Abstract: We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we…

    Submitted 26 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

  8. arXiv:2402.02968  [pdf, other]

    cs.CV cs.LG

    Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

    Authors: Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, Ling Shao, Yi Yang, Bojun Gao, Qun Li, Guobin Wu

    Abstract: Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilitie…

    Submitted 26 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). 24 pages, 9 figures, 1 table

  9. arXiv:2401.12783  [pdf, other]

    cs.AI cs.LG eess.SP

    A Review of Deep Learning Methods for Photoplethysmography Data

    Authors: Guangkun Nie, Jiabao Zhu, Gongzheng Tang, Deyun Zhang, Shijia Geng, Qinghao Zhao, Shenda Hong

    Abstract: Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this rev…

    Submitted 23 January, 2024; originally announced January 2024.

  10. arXiv:2401.09967  [pdf, other]

    cs.CL

    Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

    Authors: Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West

    Abstract: Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models that give users access to next-token distributions (usually via softmax logits), which poses a limitation with blackbox large language models (LLMs). This paper i…

    Submitted 2 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024 Main Conference

  11. arXiv:2309.15940  [pdf, other]

    cs.RO cs.CV

    Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

    Authors: Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias

    Abstract: We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: The code and dataset used for evaluation can be found at https://github.com/changhaonan/OVSG. This paper has been accepted by CoRL 2023

  12. arXiv:2308.03312  [pdf, other]

    cs.LG cs.CR cs.PL

    Exploiting Code Symmetries for Learning Program Semantics

    Authors: Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana

    Abstract: This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. Our solution, SymC,…

    Submitted 6 June, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

  13. arXiv:2308.01285  [pdf, other]

    cs.AI cs.HC

    Flows: Building Blocks of Reasoning and Collaborating AI

    Authors: Martin Josifoski, Lars Klein, Maxime Peyrard, Nicolas Baldwin, Yifei Li, Saibo Geng, Julian Paul Schnitzler, Yuxing Yao, Jiheng Wei, Debjit Paul, Robert West

    Abstract: Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the…

    Submitted 7 February, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  14. arXiv:2306.16046  [pdf, other]

    cs.RO

    Robo-centric ESDF: A Fast and Accurate Whole-body Collision Evaluation Tool for Any-shape Robotic Planning

    Authors: Shuang Geng, Qianhao Wang, Lei Xie, Chao Xu, Yanjun Cao, Fei Gao

    Abstract: For letting mobile robots travel flexibly through complicated environments, increasing attention has been paid to the whole-body collision evaluation. Most existing works either opt for the conservative corridor-based methods that impose strict requirements on the corridor generation, or ESDF-based methods that suffer from high computational overhead. It is still a great challenge to achieve fast…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted at IROS 2023

  15. arXiv:2306.06691  [pdf, other]

    cs.CV

    Self-Enhancement Improves Text-Image Retrieval in Foundation Visual-Language Models

    Authors: Yuguang Yang, Yiming Wang, Shupeng Geng, Runqi Wang, Yimi Wang, Sheng Wu, Baochang Zhang

    Abstract: The emergence of cross-modal foundation models has introduced numerous approaches grounded in text-image retrieval. However, on some domain-specific retrieval tasks, these models fail to focus on the key attributes required. To address this issue, we propose a self-enhancement framework, A^{3}R, based on the CLIP-ViT/G-14, one of the largest cross-modal models. First, we perform an Attribute Augme…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted by CVPR 2023 Workshop

  16. arXiv:2306.03833  [pdf]

    cs.LG

    Predicting Consultation Success in Online Health Platforms Using Dynamic Knowledge Networks and Multimodal Data Fusion

    Authors: Shuang Geng, Wenli Zhang, Jiaheng Xie, Gemin Liang, Ben Niu, Sudha Ram

    Abstract: Online healthcare consultation in virtual health is an emerging industry marked by innovation and fierce competition. Accurate and timely prediction of healthcare consultation success can proactively help online platforms address patient concerns and improve retention rates. However, predicting online consultation success is challenging due to the partial role of virtual consultations in patients'…

    Submitted 14 June, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    MSC Class: K.5 ACM Class: H.4.m

  17. arXiv:2306.00321  [pdf, other]

    cs.LG

    Improving Offline RL by Blending Heuristics

    Authors: Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng

    Abstract: We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping. HUBL modifies the Bellman operators used in these algorithms, partially replacing the bootstrapped values with heuristic ones that are estimated with Monte-Carlo returns. For trajectories with higher returns, HUBL relies more on the heuristic value…

    Submitted 15 March, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

  18. arXiv:2305.14302  [pdf, other]

    cs.IR cs.AI cs.HC cs.LG cs.MM

    VIP5: Towards Multimodal Foundation Models for Recommendation

    Authors: Shijie Geng, Juntao Tan, Shuchang Liu, Zuohui Fu, Yongfeng Zhang

    Abstract: Computer Vision (CV), Natural Language Processing (NLP), and Recommender Systems (RecSys) are three prominent AI applications that have traditionally developed independently, resulting in disparate modeling and engineering methodologies. This has impeded the ability for these fields to directly benefit from each other's advancements. With the recent development of foundation models, large language…

    Submitted 14 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP 2023

  19. arXiv:2305.13971  [pdf, other]

    cs.CL cs.AI cs.LG

    Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning

    Authors: Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West

    Abstract: Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, lim…

    Submitted 18 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023 Main Conference

  20. arXiv:2304.15010  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

    Authors: Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

    Abstract: How to efficiently transform large language models (LLMs) into instruction followers is recently a popular research direction, while training LLM for multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter demonstrates the potential to handle visual inputs with LLMs, it still cannot generalize well to open-ended visual instructions and lags behind GPT-4. In this paper, we pr…

    Submitted 28 April, 2023; originally announced April 2023.

    Comments: Code and models are available at https://github.com/ZrrSkywalker/LLaMA-Adapter

  21. arXiv:2304.04916  [pdf, other]

    cs.LG stat.ML

    A Data-Driven State Aggregation Approach for Dynamic Discrete Choice Models

    Authors: Sinong Geng, Houssam Nassif, Carlos A. Manzanares

    Abstract: We study dynamic discrete choice models, where a commonly studied problem involves estimating parameters of agent reward functions (also known as "structural" parameters), using agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven met…

    Submitted 31 May, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Journal ref: The Conference on Uncertainty in Artificial Intelligence (UAI'23), Pittsburgh, PA, pp. 647-657, 2023

  22. arXiv:2303.14865  [pdf, other]

    cs.CV

    Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

    Authors: Yuxiao Chen, Jianbo Yuan, Yu Tian, Shijie Geng, Xinyu Li, Ding Zhou, Dimitris N. Metaxas, Hongxia Yang

    Abstract: Contrastive learning-based vision-language pre-training approaches, such as CLIP, have demonstrated great success in many vision-language tasks. These methods achieve cross-modal alignment by encoding a matched image-text pair with similar feature embeddings, which are generated by aggregating information from visual patches and language tokens. However, direct aligning cross-modal information usi…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  23. arXiv:2303.02995  [pdf, other]

    cs.CV cs.CL cs.LG

    HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention

    Authors: Shijie Geng, Jianbo Yuan, Yu Tian, Yuxiao Chen, Yongfeng Zhang

    Abstract: The success of large-scale contrastive vision-language pretraining (CLIP) has benefited both visual recognition and multimodal content understanding. The concise design brings CLIP the advantage in inference efficiency against other vision-language models with heavier cross-attention fusion layers, making it a popular choice for a wide spectrum of downstream tasks. However, CLIP does not explicitl…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted at ICLR 2023

  24. arXiv:2302.10301  [pdf, other]

    cs.CV cs.AI

    Artificial Intelligence System for Detection and Screening of Cardiac Abnormalities using Electrocardiogram Images

    Authors: Deyun Zhang, Shijia Geng, Yang Zhou, Weilun Xu, Guodong Wei, Kai Wang, Jie Yu, Qiang Zhu, Yongkui Li, Yonghong Zhao, Xingyue Chen, Rui Zhang, Zhaoji Fu, Rongbo Zhou, Yanqi E, Sumei Fan, Qinghao Zhao, Chuandong Cheng, Nan Peng, Liang Zhang, Linlin Zheng, Jianjun Chu, Hongbin Xu, Chen Tan, Jian Liu, et al. (6 additional authors not shown)

    Abstract: The artificial intelligence (AI) system has achieved expert-level performance in electrocardiogram (ECG) signal analysis. However, in underdeveloped countries or regions where the healthcare information system is imperfect, only paper ECGs can be provided. Analysis of real-world ECG images (photos or scans of paper ECGs) remains challenging due to complex environments or interference. In this stud…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: 47 pages, 29 figures

  25. arXiv:2301.13244  [pdf, other]

    cs.RO cs.CV

    Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction

    Authors: Haonan Chang, Dhruv Metha Ramesh, Shijie Geng, Yuqiu Gan, Abdeslam Boularias

    Abstract: We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework. The proposed system solves a new optimization problem incorporating optical-flow-based 2D constraints to deal with fast motion and a novel semantic-aware deformation graph (SAD-graph) f…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: This paper has been accepted by ICRA2023

  26. arXiv:2301.10939  [pdf, other]

    cs.CV cs.CL cs.LG

    Affective Faces for Goal-Driven Dyadic Communication

    Authors: Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, Carl Vondrick

    Abstract: We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation. Given the input speech of a speaker, our approach retrieves a video of a listener, who has facial expressions that would be socially appropriate given the context. Our approach further allows the listener to be conditioned on their own goals, personalities, or backgro…

    Submitted 26 January, 2023; originally announced January 2023.

  27. arXiv:2212.07016  [pdf, other]

    cs.CV

    Understanding Zero-Shot Adversarial Robustness for Large-Scale Models

    Authors: Chengzhi Mao, Scott Geng, Junfeng Yang, Xin Wang, Carl Vondrick

    Abstract: Pretrained large-scale vision-language models like CLIP have exhibited strong generalization over unseen tasks. Yet imperceptible adversarial perturbations can significantly reduce CLIP's performance on new tasks. In this work, we identify and explore the problem of adapting large-scale models for zero-shot adversarial robustness. We first identify two key factors during model adaption -- t…

    Submitted 21 April, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

  28. arXiv:2210.02853  [pdf, other]

    cs.CR cs.LG cs.PL cs.SE

    NeuDep: Neural Binary Memory Dependence Analysis

    Authors: Kexin Pei, Dongdong She, Michael Wang, Scott Geng, Zhou Xuan, Yaniv David, Junfeng Yang, Suman Jana, Baishakhi Ray

    Abstract: Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging as statically computing precise alias information is undecidable in theory. The problem aggravates at the binary level due to the presence of compiler optimizations and the absence of symbols and types. Existing approaches either produce significant spurious depend…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: ESEC/FSE 2022

  29. arXiv:2208.03550  [pdf, other]

    cs.CV

    Frozen CLIP Models are Efficient Video Learners

    Authors: Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li

    Abstract: Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos. This enables the video network to benefit from the pretrained image model. However, this requires substantial computation and memory resources for finetuning on videos and the alternative…

    Submitted 6 August, 2022; originally announced August 2022.

    Comments: ECCV 2022

  30. arXiv:2207.09644  [pdf, other]

    cs.CV

    Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

    Authors: Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

    Abstract: Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult. Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hie…

    Submitted 27 March, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  31. arXiv:2205.09928  [pdf, other]

    cs.LG

    Self-Supervised Time Series Representation Learning via Cross Reconstruction Transformer

    Authors: Wenrui Zhang, Ling Yang, Shijia Geng, Shenda Hong

    Abstract: Unsupervised/self-supervised representation learning in time series is critical since labeled samples are usually scarce in real-world scenarios. Existing approaches mainly leverage the contrastive learning framework, which automatically learns to understand the similar and dissimilar data pairs. Nevertheless, they are restricted to the prior knowledge of constructing pairs, cumbersome sampling po…

    Submitted 7 July, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

  32. Explainable Fairness in Recommendation

    Authors: Yingqiang Ge, Juntao Tan, Yan Zhu, Yinglong Xia, Jiebo Luo, Shuchang Liu, Zuohui Fu, Shijie Geng, Zelong Li, Yongfeng Zhang

    Abstract: Existing research on fairness-aware recommendation has mainly focused on the quantification of fairness and the development of fair recommendation models, neither of which studies a more substantial problem--identifying the underlying reason of model disparity in recommendation. This information is critical for recommender system designers to understand the intrinsic recommendation mechanism and p…

    Submitted 6 June, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

    Comments: In Proceedings of SIGIR 2022

  33. arXiv:2204.09265  [pdf, other]

    cs.RO

    Dynamic Free-Space Roadmap for Safe Quadrotor Motion Planning

    Authors: Junlong Guo, Zhiren Xun, Shuang Geng, Yi Lin, Chao Xu, Fei Gao

    Abstract: Free-space-oriented roadmaps typically generate a series of convex geometric primitives, which constitute the safe region for motion planning. However, a static environment is assumed for this kind of roadmap. This assumption makes it unable to deal with dynamic obstacles and limits its applications. In this paper, we present a dynamic free-space roadmap, which provides feasible spaces and a navig…

    Submitted 20 April, 2022; originally announced April 2022.

  34. arXiv:2203.13366  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)

    Authors: Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, Yongfeng Zhang

    Abstract: For a long time, different recommendation tasks typically require designing task-specific architectures and training objectives. As a result, it is hard to transfer the learned knowledge and representations from one task to another, thus restricting the generalization ability of existing recommendation approaches, e.g., a sequential recommendation model can hardly be applied or transferred to a re…

    Submitted 2 January, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted to ACM RecSys 2022. The arXiv version is more comprehensive

  35. arXiv:2203.09487  [pdf, other]

    eess.SP cs.CR cs.LG

    Defending Against Adversarial Attack in ECG Classification with Adversarial Distillation Training

    Authors: Jiahao Shao, Shijia Geng, Zhaoji Fu, Weilun Xu, Tong Liu, Shenda Hong

    Abstract: In clinics, doctors rely on electrocardiograms (ECGs) to assess severe cardiac disorders. Owing to the development of technology and the increase in health awareness, ECG signals are currently obtained by using medical and commercial devices. Deep neural networks (DNNs) can be used to analyze these signals because of their high accuracy rate. However, researchers have found that adversarial attack…

    Submitted 14 March, 2022; originally announced March 2022.

  36. arXiv:2203.00512  [pdf, other]

    eess.SP cs.AI cs.LG

    A Deep Bayesian Neural Network for Cardiac Arrhythmia Classification with Rejection from ECG Recordings

    Authors: Wenrui Zhang, Xinxin Di, Guodong Wei, Shijia Geng, Zhaoji Fu, Shenda Hong

    Abstract: With the development of deep learning-based methods, automated classification of electrocardiograms (ECGs) has recently gained much attention. Although the effectiveness of deep neural networks has been encouraging, the lack of information given by the outputs restricts clinicians' reexamination. If the uncertainty estimation comes along with the classification results, cardiologists can pay more…

    Submitted 25 February, 2022; originally announced March 2022.

  37. arXiv:2202.12458  [pdf, other]

    cs.LG eess.SP

    A Simple Self-Supervised ECG Representation Learning Method via Manipulated Temporal-Spatial Reverse Detection

    Authors: Wenrui Zhang, Shijia Geng, Shenda Hong

    Abstract: Learning representations from electrocardiogram (ECG) signals can serve as a fundamental step for different machine learning-based ECG tasks. In order to extract general ECG representations that can be adapted to various downstream tasks, the learning process needs to be based on a general ECG-related task which can be achieved through self-supervised learning (SSL). However, existing SSL approach…

    Submitted 21 September, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  38. arXiv:2202.12450  [pdf, other]

    cs.LG cs.AI

    MetaVA: Curriculum Meta-learning and Pre-fine-tuning of Deep Neural Networks for Detecting Ventricular Arrhythmias based on ECGs

    Authors: Wenrui Zhang, Shijia Geng, Zhaoji Fu, Linlin Zheng, Chenyang Jiang, Shenda Hong

    Abstract: Ventricular arrhythmias (VA) are the main causes of sudden cardiac death. Developing machine learning methods for detecting VA based on electrocardiograms (ECGs) can help save people's lives. However, developing such machine learning models for ECGs is challenging because of the following: 1) group-level diversity from different subjects and 2) individual-level diversity from different moments of…

    Submitted 28 February, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  39. Learning and Evaluating Graph Neural Network Explanations based on Counterfactual and Factual Reasoning

    Authors: Juntao Tan, Shijie Geng, Zuohui Fu, Yingqiang Ge, Shuyuan Xu, Yunqi Li, Yongfeng Zhang

    Abstract: Structural data well exists in Web applications, such as social networks in social media, citation networks in academic websites, and threads data in online forums. Due to the complex topology, it is difficult to process and make use of the rich information within such data. Graph Neural Networks (GNNs) have shown great advantages on learning representations for structural data. However, the non-t…

    Submitted 10 May, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: Published at the Web Conference 2022 (WWW 2022)

  40. arXiv:2112.05892  [pdf, other]

    cs.CV

    COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality

    Authors: Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf

    Abstract: Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects. We approach the task by modeling the video as tokens that represent the multi-scale semantic concepts in the video. We propose COMPOSER, a Multiscale Transformer based architecture that performs attention-based reasoning over tokens at each scal…

    Submitted 24 July, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: ECCV 2022

  41. arXiv:2111.14745  [pdf, other]

    cs.CV

    A Simple Long-Tailed Recognition Baseline via Vision-Language Model

    Authors: Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

    Abstract: The visual world naturally exhibits a long-tailed distribution of open classes, which poses great challenges to modern visual systems. Existing approaches either perform class re-balancing strategies or directly improve network modules to address the problem. However, they still train models with a finite set of predefined labels, limiting their supervision information and restricting their transf…

    Submitted 29 November, 2021; originally announced November 2021.

  42. arXiv:2110.06894  [pdf, other]

    cs.CL

    Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

    Authors: Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

    Abstract: In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8). In these challenges, the best-performing systems relied heavily on human-generated descriptions of the video content, which were available in the dat…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: https://dstc10.dstc.community/home and https://github.com/dialogtekgeek/AVSD-DSTC10_Official/

  43. arXiv:2110.04544  [pdf, other]

    cs.CV cs.CL

    CLIP-Adapter: Better Vision-Language Models with Feature Adapters

    Authors: Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

    Abstract: Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning. Unlike traditional visual systems trained by a fixed set of discrete labels, a new paradigm was introduced in \cite{radford2021learning} to directly learn to align images with raw texts in an open-vocabulary setting. On downstream tasks, a carefully chosen text prompt is employed…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: Technical Report

  44. arXiv:2109.11778  [pdf, other]

    cs.CV cs.CL

    Dense Contrastive Visual-Linguistic Pretraining

    Authors: Lei Shi, Kai Shuang, Shijie Geng, Peng Gao, Zuohui Fu, Gerard de Melo, Yunpeng Chen, Sen Su

    Abstract: Inspired by the success of BERT, several multimodal representation learning approaches have been proposed that jointly represent image and text. These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining. In particular, LXMERT and UNITER adopt visual region feature regression and label classification as pretext tasks. However,…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: Accepted by ACM Multimedia 2021. arXiv admin note: text overlap with arXiv:2007.13135

  45. arXiv:2109.06862  [pdf, other]

    cs.CL

    Legal Transformer Models May Not Always Help

    Authors: Saibo Geng, Rémi Lebret, Karl Aberer

    Abstract: Deep learning-based Natural Language Processing methods, especially transformers, have achieved impressive performance in the last few years. Applying those state-of-the-art NLP methods to legal activities to automate or simplify some simple work is of great value. This work investigates the value of domain adaptive pre-training and language adapters in legal NLP tasks. By comparing the performanc…

    Submitted 15 September, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

  46. arXiv:2109.01962  [pdf, other]

    cs.CL

    Counterfactual Evaluation for Explainable AI

    Authors: Yingqiang Ge, Shuchang Liu, Zelong Li, Shuyuan Xu, Shijie Geng, Yunqi Li, Juntao Tan, Fei Sun, Yongfeng Zhang

    Abstract: While recent years have witnessed the emergence of various explainable methods in machine learning, to what degree the explanations really represent the reasoning process behind the model prediction -- namely, the faithfulness of explanation -- is still an open problem. One commonly used way to measure faithfulness is \textit{erasure-based} criteria. Though conceptually simple, erasure-based crite…

    Submitted 4 September, 2021; originally announced September 2021.

  47. arXiv:2106.02242  [pdf, other]

    cs.CL

    Scalable Transformers for Neural Machine Translation

    Authors: Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

    Abstract: Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation. However, the deployment of Transformer is challenging because different scenarios require models of different complexities and scales. Naively training multiple Transformers is redundant in terms of both computation and memory. In this paper, we propos…

    Submitted 17 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Mostly overlapping with version 1, with minor updates/revisions

  48. arXiv:2104.14882  [pdf, other]

    cs.CV cs.AI

    Vehicle Re-identification Method Based on Vehicle Attribute and Mutual Exclusion Between Cameras

    Authors: Junru Chen, Shiqing Geng, Yongluan Yan, Danyang Huang, Hao Liu, Yadong Li

    Abstract: Vehicle Re-identification aims to identify a specific vehicle across time and camera view. With the rapid growth of intelligent transportation systems and smart cities, vehicle Re-identification technology gets more and more attention. However, due to the difference of shooting angle and the high similarity of vehicles belonging to the same brand, vehicle re-identification becomes a great challeng…

    Submitted 30 April, 2021; originally announced April 2021.

  49. arXiv:2101.09755  [pdf, other]

    cs.CL

    RomeBERT: Robust Training of Multi-Exit BERT

    Authors: Shijie Geng, Peng Gao, Zuohui Fu, Yongfeng Zhang

    Abstract: BERT has achieved superior performances on Natural Language Understanding (NLU) tasks. However, BERT possesses a large number of parameters and demands certain resources to deploy. For acceleration, Dynamic Early Exiting for BERT (DeeBERT) has been proposed recently, which incorporates multiple exits and adopts a dynamic early-exit mechanism to ensure efficient inference. While obtaining an effici…

    Submitted 24 January, 2021; originally announced January 2021.

  50. arXiv:2012.11185  [pdf, other]

    cs.AI cs.CV

    Infrared image pedestrian target detection based on Yolov3 and migration learning

    Authors: Shengqi Geng

    Abstract: With the gradual application of infrared night vision vehicle assistance system in automatic driving, the accuracy of the collected infrared images of pedestrians is gradually improved. In this paper, the migration learning method is used to apply YOLOv3 model to realize pedestrian target detection in infrared images. The target detection model YOLOv3 is migrated to the CVC infrared pedestrian dat…

    Submitted 21 December, 2020; originally announced December 2020.