Showing 1–25 of 25 results for author: Dang, B

Searching in archive cs.

  1. arXiv:2406.10100  [pdf, other]

    cs.CV cs.AI

    SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

    Authors: Junwei Luo, Zhen Pang, Yongjun Zhang, Tingzhu Wang, Linlin Wang, Bo Dang, Jiangwei Lao, Jian Wang, Jingdong Chen, Yihua Tan, Yansheng Li

    Abstract: Remote Sensing Large Multi-Modal Models (RSLMMs) are developing rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-sca…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 30 pages, 5 figures, 19 tables, dataset and code see https://github.com/Luo-Z13/SkySenseGPT

  2. arXiv:2406.09410  [pdf, other]

    cs.CV cs.AI

    Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting intelligent understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it necessary to holistically conduct SGG in large-size very-high-reso…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper releases a SAI-oriented SGG toolkit with about 30 OBD methods and 10 SGG methods, and develops a benchmark based on RSG where our HOD-Net and RPCM significantly outperform the state-of-the-art methods in both OBD and SGG tasks. The RSG dataset and SAI-oriented toolkit will be made publicly available at https://linlin-dev.github.io/project/RSG

  3. arXiv:2406.05691  [pdf, other]

    cs.CV cs.GR

    Diverse 3D Human Pose Generation in Scenes based on Decoupled Structure

    Authors: Bowen Dang, Xi Zhao

    Abstract: This paper presents a novel method for generating diverse 3D human poses in scenes with semantic control. Existing methods heavily rely on the human-scene interaction dataset, resulting in a limited diversity of the generated human poses. To overcome this challenge, we propose to decouple the pose and interaction generation process. Our approach consists of three stages: pose generation, contact g…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: The 37th International Conference on Computer Animation and Social Agents (CASA 2024)

  4. arXiv:2405.05983  [pdf]

    cs.CV cs.AI cs.LG

    Real-Time Pill Identification for the Visually Impaired Using Deep Learning

    Authors: Bo Dang, Wenchao Zhao, Yufeng Li, Danqing Ma, Qixuan Yu, Elly Yijun Zhu

    Abstract: The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to…

    Submitted 7 May, 2024; originally announced May 2024.

  5. arXiv:2404.16407  [pdf, other]

    cs.CL eess.AS

    U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

    Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the…

    Submitted 25 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  6. arXiv:2404.09292  [pdf, other]

    cs.CV cs.AI

    Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation

    Authors: Jieyi Tan, Yansheng Li, Sergey A. Bartalev, Bo Dang, Wei Chen, Yongjun Zhang, Liangqi Yuan

    Abstract: Remote sensing semantic segmentation (RSS) is an essential task in Earth Observation missions. Due to data privacy concerns, high-quality remote sensing images with annotations cannot be well shared among institutions, making it difficult to fully utilize RSS data to train a generalized model. Federated Learning (FL), a privacy-preserving collaborative learning technology, is a potential solution.…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 13 pages, 9 figures, 4 tables

  7. arXiv:2403.16212  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis

    Authors: Shaojie Li, Haichen Qu, Xinqi Dong, Bo Dang, Hengyi Zang, Yulu Gong

    Abstract: Exploring the application of deep learning technologies in the field of medical diagnostics, Magnetic Resonance Imaging (MRI) provides a unique perspective for observing and diagnosing complex neurodegenerative diseases such as Alzheimer Disease (AD). With advancements in deep learning, particularly in Convolutional Neural Networks (CNNs) and the Xception network architecture, we are now able to a…

    Submitted 24 March, 2024; originally announced March 2024.

  8. arXiv:2403.14483  [pdf, other]

    cs.LG cs.AI q-fin.ST

    Utilizing the LightGBM Algorithm for Operator User Credit Assessment Research

    Authors: Shaojie Li, Xinqi Dong, Danqing Ma, Bo Dang, Hengyi Zang, Yulu Gong

    Abstract: Mobile Internet user credit assessment is an important way for communication operators to establish decisions and formulate measures, and it is also a guarantee for operators to obtain expected benefits. However, credit evaluation methods have long been monopolized by financial industries such as banks and credit. As supporters and providers of platform network technology and network resources, co…

    Submitted 21 March, 2024; originally announced March 2024.

  9. arXiv:2403.13703  [pdf]

    cs.CV cs.AI

    Fostc3net:A Lightweight YOLOv5 Based On the Network Structure Optimization

    Authors: Danqing Ma, Shaojie Li, Bo Dang, Hengyi Zang, Xinqi Dong

    Abstract: Transmission line detection technology is crucial for automatic monitoring and ensuring the safety of electrical facilities. The YOLOv5 series is currently one of the most advanced and widely used methods for object detection. However, it faces inherent challenges, such as high computational load on devices and insufficient detection accuracy. To address these concerns, this paper presents an enha…

    Submitted 20 March, 2024; originally announced March 2024.

  10. arXiv:2312.10115  [pdf, other]

    cs.CV

    SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

    Authors: Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, Yansheng Li

    Abstract: Prior studies on Remote Sensing Foundation Model (RSFM) reveal immense potential towards a generic model for Earth Observation. Nevertheless, these works primarily focus on a single modality without temporal and geo-context modeling, hampering their capabilities for diverse tasks. In this study, we present SkySense, a generic billion-scale model, pre-trained on a curated multi-modal Remote Sensing…

    Submitted 22 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR2024

  11. A Versatile Data Fabric for Advanced IoT-Based Remote Health Monitoring

    Authors: Italo Buleje, Vince S. Siu, Kuan Yu Hsieh, Nigel Hinds, Bing Dang, Erhan Bilal, Thanhnha Nguyen, Ellen E. Lee, Colin A. Depp, Jeffrey L. Rogers

    Abstract: This paper presents a data-centric and security-focused data fabric designed for digital health applications. With the increasing interest in digital health research, there has been a surge in the volume of Internet of Things (IoT) data derived from smartphones, wearables, and ambient sensors. Managing this vast amount of data, encompassing diverse data types and varying time scales, is crucial. M…

    Submitted 2 October, 2023; originally announced October 2023.

    Journal ref: 2023 IEEE International Conference on Digital Health (ICDH), Chicago, IL, USA, 2023, pp. 88-90

  12. arXiv:2310.01228  [pdf, other]

    cs.CV cs.GR

    Reconstructing 3D Human Pose from RGB-D Data with Occlusions

    Authors: Bowen Dang, Xi Zhao, Bowen Zhang, He Wang

    Abstract: We propose a new method to reconstruct the 3D human body from RGB-D images with occlusions. The foremost challenge is the incompleteness of the RGB-D data due to occlusions between the body and the environment, leading to implausible reconstructions that suffer from severe human-scene penetration. To reconstruct a semantically and physically plausible human body, we propose to reduce the solution…

    Submitted 15 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  13. ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu

    Abstract: In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss. The core idea of ZeroPrompt is to append zeroed content to each chunk during inference, which acts like a prompt to encourage the…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: accepted by interspeech 2023

    ACM Class: I.2.7

    Journal ref: @inproceedings{song23c_interspeech, year=2023, booktitle={Proc. INTERSPEECH 2023}, pages={1648--1652}}

  14. arXiv:2305.04594  [pdf, other]

    cs.RO

    A sensor fusion approach for improving implementation speed and accuracy of RTAB-Map algorithm based indoor 3D mapping

    Authors: Hoang-Anh Phan, Phuc Vinh Nguyen, Thu Hang Thi Khuat, Hieu Dang Van, Dong Huu Quoc Tran, Bao Lam Dang, Tung Thanh Bui, Van Nguyen Thi Thanh, Trinh Chu Duc

    Abstract: In recent years, 3D mapping for indoor environments has undergone considerable research and improvement because of its effective applications in various fields, including robotics, autonomous navigation, and virtual reality. Building an accurate 3D map for indoor environment is challenging due to the complex nature of the indoor space, the problem of real-time embedding and positioning errors of t…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to 20th International Joint Conference on Computer Science and Software Engineering (JCSSE 2023). 5 pages

  15. arXiv:2303.09310  [pdf, other]

    cs.CV

    GLH-Water: A Large-Scale Dataset for Global Surface Water Detection in Large-Size Very-High-Resolution Satellite Imagery

    Authors: Yansheng Li, Bo Dang, Wanchun Li, Yongjun Zhang

    Abstract: Global surface water detection in very-high-resolution (VHR) satellite imagery can directly serve major applications such as refined flood mapping and water resource assessment. Although achievements have been made in detecting surface water in small-size satellite images corresponding to local geographic scales, datasets and methods suitable for mapping and analyzing global surface water have yet…

    Submitted 16 March, 2023; originally announced March 2023.

  16. arXiv:2211.12425  [pdf, other]

    cs.CV

    Progressive Learning with Cross-Window Consistency for Semi-Supervised Semantic Segmentation

    Authors: Bo Dang, Yansheng Li, Yongjun Zhang, Jiayi Ma

    Abstract: Semi-supervised semantic segmentation focuses on the exploration of a small amount of labeled data and a large amount of unlabeled data, which is more in line with the demands of real-world image understanding applications. However, it is still hindered by the inability to fully and effectively leverage unlabeled images. In this paper, we reveal that cross-window consistency (CWC) is helpful in co…

    Submitted 26 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

  17. arXiv:2211.11316  [pdf, other]

    cs.CV

    EHSNet: End-to-End Holistic Learning Network for Large-Size Remote Sensing Image Semantic Segmentation

    Authors: Wei Chen, Yansheng Li, Bo Dang, Yongjun Zhang

    Abstract: This paper presents EHSNet, a new end-to-end segmentation network designed for the holistic learning of large-size remote sensing image semantic segmentation (LRISS). Large-size remote sensing images (LRIs) can lead to GPU memory exhaustion due to their extremely large size, which has been handled in previous works through either global-local fusion or multi-stage refinement, both of which are lim…

    Submitted 10 April, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

  18. Health Guardian Platform: A technology stack to accelerate discovery in Digital Health research

    Authors: Bo Wen, Vince S. Siu, Italo Buleje, Kuan Yu Hsieh, Takashi Itoh, Lukas Zimmerli, Nigel Hinds, Elif Eyigoz, Bing Dang, Stefan von Cavallar, Jeffrey L. Rogers

    Abstract: This paper highlights the design philosophy and architecture of the Health Guardian, a platform developed by the IBM Digital Health team to accelerate discoveries of new digital biomarkers and development of digital health technologies. The Health Guardian allows for rapid translation of artificial intelligence (AI) research into cloud-based microservices that can be tested with data from clinical…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 6 pages, 3 figures, https://ieeexplore.ieee.org/document/9861047

    Journal ref: IEEE International Conference on Digital Health (ICDH), 2022, pp. 40-46

  19. arXiv:2107.12920  [pdf, other]

    cs.CL

    Emotion Stimulus Detection in German News Headlines

    Authors: Bao Minh Doan Dang, Laura Oberländer, Roman Klinger

    Abstract: Emotion stimulus extraction is a fine-grained subtask of emotion analysis that focuses on identifying the description of the cause behind an emotion expression from a text passage (e.g., in the sentence "I am happy that I passed my exam" the phrase "passed my exam" corresponds to the stimulus.). Previous work mainly focused on Mandarin and English, with no resources or models for German. We fill t…

    Submitted 16 May, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

    Comments: KONVENS 2021, published at https://aclanthology.org/2021.konvens-1.7/ Please cite by using https://aclanthology.org/2021.konvens-1.7.bib

  20. arXiv:2106.13405  [pdf, other]

    cs.CL

    JNLP Team: Deep Learning Approaches for Legal Processing Tasks in COLIEE 2021

    Authors: Ha-Thanh Nguyen, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Vu Tran, Minh Le Nguyen, Ken Satoh

    Abstract: COLIEE is an annual competition in automatic computerized legal text processing. Automatic legal document processing is an ambitious goal, and the structure and semantics of the law are often far more complex than everyday language. In this article, we survey and report our methods and experimental results in using deep learning in legal document processing. The results show the difficulties as we…

    Submitted 7 September, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Also published in COLIEE 2021's proceeding

  21. arXiv:2106.13403  [pdf, other]

    cs.CL cs.AI

    ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing

    Authors: Ha-Thanh Nguyen, Vu Tran, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Minh Le Nguyen, Ken Satoh

    Abstract: Ambiguity is a characteristic of natural language, which makes expression ideas flexible. However, in a domain that requires accurate statements, it becomes a barrier. Specifically, a single word can have many meanings and multiple words can have the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original se…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Also published in COLIEE 2021's Proceeding

  22. arXiv:2011.08071  [pdf, other]

    cs.CL cs.IR cs.LG

    JNLP Team: Deep Learning for Legal Processing in COLIEE 2020

    Authors: Ha-Thanh Nguyen, Hai-Yen Thi Vuong, Phuong Minh Nguyen, Binh Tran Dang, Quan Minh Bui, Sinh Trong Vu, Chau Minh Nguyen, Vu Tran, Ken Satoh, Minh Le Nguyen

    Abstract: We propose deep learning based methods for automatic systems of legal retrieval and legal question-answering in COLIEE 2020. These systems are all characterized by being pre-trained on large amounts of data before being finetuned for the specified tasks. This approach helps to overcome the data scarcity and achieve good performance, thus can be useful for tackling related problems in information r…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Also be published in JURISIN2020

  23. arXiv:1804.10999  [pdf, other]

    cs.HC

    But Who Protects the Moderators? The Case of Crowdsourced Image Moderation

    Authors: Brandon Dang, Martin J. Riedl, Matthew Lease

    Abstract: Though detection systems have been developed to identify obscene content such as pornography and violence, artificial intelligence is simply not good enough to fully automate this task yet. Due to the need for manual verification, social media companies may hire internal reviewers, contract specialized workers from third parties, or outsource to online labor markets for the purpose of commercial c…

    Submitted 4 January, 2020; v1 submitted 29 April, 2018; originally announced April 2018.

    Comments: To be presented at the 6th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2018) and the 6th ACM Collective Intelligence Conference (CI 2018)

  24. arXiv:1611.06792  [pdf, other]

    cs.IR

    Neural Information Retrieval: A Literature Review

    Authors: Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, Tyler McDonnell, An Thanh Nguyen, Dan Xu, Byron C. Wallace, Matthew Lease

    Abstract: A recent "third wave" of Neural Network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, this new NN research is often referred to as deep learning. Stemming from this tide of NN work, a number of researchers…

    Submitted 3 March, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

    Comments: 44 pages

  25. arXiv:1609.00945  [pdf, other]

    cs.HC

    MmmTurkey: A Crowdsourcing Framework for Deploying Tasks and Recording Worker Behavior on Amazon Mechanical Turk

    Authors: Brandon Dang, Miles Hutson, Matt Lease

    Abstract: Internal HITs on Mechanical Turk can be programmatically restrictive, and as a result, many requesters turn to using external HITs as a more flexible alternative. However, creating such HITs can be redundant and time-consuming. We present MmmTurkey, a framework that enables researchers to not only quickly create and manage external HITs, but more significantly also capture and record detailed work…

    Submitted 26 October, 2016; v1 submitted 4 September, 2016; originally announced September 2016.