Showing 1–16 of 16 results for author: Yoshie, O

  1. arXiv:2406.19736  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

    Authors: Jihao Liu, Xin Huang, Jinliang Zheng, Boxiao Liu, Jia Wang, Osamu Yoshie, Yu Liu, Hongsheng Li

    Abstract: This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs). While existing visual instruction datasets often focus on question-answering, they struggle to generalize to broader application scenarios such as creative writing, summarization, or image analysis…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Dataset and models are available at https://github.com/jihaonew/MM-Instruct

  2. arXiv:2402.07536  [pdf, other]

    cs.AI cs.CL

    BreakGPT: A Large Language Model with Multi-stage Structure for Financial Breakout Detection

    Authors: Kang Zhang, Osamu Yoshie, Weiran Huang

    Abstract: Trading range breakout (TRB) is a key method in the technical analysis of financial trading, widely employed by traders in financial markets such as stocks, futures, and foreign exchange. However, distinguishing between true and false breakout and providing the correct rationale cause significant challenges to investors. Recently, large language models have achieved success in various downstream a…

    Submitted 12 February, 2024; originally announced February 2024.

  3. arXiv:2311.17770  [pdf, other]

    cs.CV cs.RO

    PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection

    Authors: Weixin Mao, Tiancai Wang, Diankun Zhang, Junjie Yan, Osamu Yoshie

    Abstract: This paper shows the effectiveness of 2D backbone scaling and pretraining for pillar-based 3D object detectors. Pillar-based methods mainly employ randomly initialized 2D convolution neural network (ConvNet) for feature extraction and fail to enjoy the benefits from the backbone scaling and pretraining in the image domain. To show the scaling-up capacity in point clouds, we introduce the dense Con…

    Submitted 29 November, 2023; originally announced November 2023.

  4. arXiv:2306.17450  [pdf, other]

    cs.CV cs.AI

    GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

    Authors: Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, Hongyu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie

    Abstract: Depth perception is a crucial component of monocular 3D detection tasks that typically involve ill-posed problems. In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection. Concretely, we introduce a plain metric to evaluate the quality of depth predictions, which chooses the…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: 8 pages, 4 figures

  5. arXiv:2301.07088  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Vision Learners Meet Web Image-Text Pairs

    Authors: Bingchen Zhao, Quan Cui, Hao Wu, Osamu Yoshie, Cheng Yang, Oisin Mac Aodha

    Abstract: Most recent self-supervised learning methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, given the excellent scalability of web data, we consider self-supervised pre-training on noisy web sourced image-text paired data. First, we conduct a benchmark study of representative self-supervised pre-training methods on large-scale web data in a like-for-like setting. We compare…

    Submitted 5 April, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

    Comments: Project page: https://bzhao.me/MUG/

  6. arXiv:2203.03871  [pdf, other]

    cs.CV cs.AI

    Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

    Authors: Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, Renjie Song, Jiajun Liang, Boyan Zhou, Osamu Yoshie

    Abstract: This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification. By a comprehensive temporal analysis, we observe a trade-off between these two properties. The discriminability keeps increasing with the training progressing while the transferability intensely diminishes in the later t…

    Submitted 21 July, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted by ECCV 2022, Quan Cui and Bingchen Zhao contributed equally to this work

  7. arXiv:2112.09331  [pdf, other]

    cs.CV cs.MM

    Contrastive Vision-Language Pre-training with Limited Resources

    Authors: Quan Cui, Boyan Zhou, Yu Guo, Weidong Yin, Hao Wu, Osamu Yoshie, Yubo Chen

    Abstract: Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) have revealed the potential of aligning multi-modal representations with contrastive learning. However, these works require a tremendous amount of data and computational resources (e.g., billion-level web data and hundreds of GPUs), which prevent researchers with limited resources from reproduction and further exploration. To this e…

    Submitted 18 July, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: Accepted to ECCV2022

  8. arXiv:2104.10419  [pdf, other]

    cs.CV

    PP-YOLOv2: A Practical Object Detector

    Authors: Xin Huang, Xinxin Wang, Wenyu Lv, Xiaying Bai, Xiang Long, Kaipeng Deng, Qingqing Dang, Shumin Han, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma, Osamu Yoshie

    Abstract: Being effective and efficient is essential to an object detector for practical use. To meet these two concerns, we comprehensively evaluate a collection of existing refinements to improve the performance of PP-YOLO while almost keep the infer time unchanged. This paper will analyze a collection of refinements and empirically evaluate their impact on the final model performance through incremental…

    Submitted 21 April, 2021; originally announced April 2021.

  9. arXiv:2103.14259  [pdf, other]

    cs.CV

    OTA: Optimal Transport Assignment for Object Detection

    Authors: Zheng Ge, Songtao Liu, Zeming Li, Osamu Yoshie, Jian Sun

    Abstract: Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object. In this paper, we innovatively revisit the label assignment from a global perspective and propose to formulate the assigning procedure as an Optimal Transport (OT) problem -- a well-studied topic in Optimization Theory. Concretely, we def…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: CVPR2021

  10. arXiv:2102.09398  [pdf, other]

    cs.LG eess.IV physics.optics

    A Reinforcement learning method for Optical Thin-Film Design

    Authors: Anqing Jiang, Liangyao Chen, Osamu Yoshie

    Abstract: Machine learning, especially deep learning, is dramatically changing the methods associated with optical thin-film inverse design. The vast majority of this research has focused on the parameter optimization (layer thickness, and structure size) of optical thin-films. A challenging problem that arises is an automated material search. In this work, we propose a new end-to-end algorithm for optical…

    Submitted 13 February, 2021; originally announced February 2021.

  11. arXiv:2101.04307  [pdf, other]

    cs.CV

    LLA: Loss-aware Label Assignment for Dense Pedestrian Detection

    Authors: Zheng Ge, Jianfeng Wang, Xin Huang, Songtao Liu, Osamu Yoshie

    Abstract: Label assignment has been widely studied in general object detection because of its great impact on detectors' performance. However, none of these works focus on label assignment in dense pedestrian detection. In this paper, we propose a simple yet effective assigning strategy called Loss-aware Label Assignment (LLA) to boost the performance of pedestrian detectors in crowd scenarios. LLA first ca…

    Submitted 11 March, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: In the reviewing process of Pattern Recognition

  12. arXiv:2008.01369  [pdf, other]

    cs.CV

    ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval

    Authors: Quan Cui, Qing-Yuan Jiang, Xiu-Shen Wei, Wu-Jun Li, Osamu Yoshie

    Abstract: Retrieving content relevant images from a large-scale fine-grained dataset could suffer from intolerably slow query speed and highly redundant storage cost, due to high-dimensional real-valued embeddings which aim to distinguish subtle visual differences of fine-grained objects. In this paper, we study the novel fine-grained hashing topic to generate compact binary codes for fine-grained images, l…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted by ECCV2020

  13. arXiv:2005.11472  [pdf, other]

    cs.CV

    Delving into the Imbalance of Positive Proposals in Two-stage Object Detection

    Authors: Zheng Ge, Zequn Jie, Xin Huang, Chengzheng Li, Osamu Yoshie

    Abstract: Imbalance issue is a major yet unsolved bottleneck for the current object detection models. In this work, we observe two crucial yet never discussed imbalance issues. The first imbalance lies in the large number of low-quality RPN proposals, which makes the R-CNN module (i.e., post-classification layers) become highly biased towards the negative proposals in the early training stage. The second im…

    Submitted 23 May, 2020; originally announced May 2020.

  14. arXiv:2003.12729  [pdf, other]

    cs.CV

    NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing

    Authors: Xin Huang, Zheng Ge, Zequn Jie, Osamu Yoshie

    Abstract: Although significant progress has been made in pedestrian detection recently, pedestrian detection in crowded scenes is still challenging. The heavy occlusion between pedestrians imposes great challenges to the standard Non-Maximum Suppression (NMS). A relative low threshold of intersection over union (IoU) leads to missing highly overlapped pedestrians, while a higher one brings in plenty of fals…

    Submitted 21 April, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

    Comments: Accepted by CVPR2020. The first two authors contributed equally, and are listed in alphabetical order

  15. arXiv:2003.07080  [pdf, other]

    cs.CV cs.LG eess.IV

    PS-RCNN: Detecting Secondary Human Instances in a Crowd via Primary Object Suppression

    Authors: Zheng Ge, Zequn Jie, Xin Huang, Rong Xu, Osamu Yoshie

    Abstract: Detecting human bodies in highly crowded scenes is a challenging problem. Two main reasons result in such a problem: 1). weak visual cues of heavily occluded instances can hardly provide sufficient information for accurate detection; 2). heavily occluded instances are easier to be suppressed by Non-Maximum-Suppression (NMS). To address these two issues, we introduce a variant of two-stage detector…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: 6 pages, accepted by ICME2020

  16. arXiv:1812.02873  [pdf, other]

    cs.LG stat.ML

    A new multilayer optical film optimal method based on deep q-learning

    Authors: Anqing Jiang, Osamu Yoshie, LiangYao Chen

    Abstract: Multi-layer optical film has been found to afford important applications in optical communication, optical absorbers, optical filters, etc. Different algorithms of multi-layer optical film design has been developed, as simplex method, colony algorithm, genetic algorithm. These algorithms rapidly promote the design and manufacture of multi-layer films. However, traditional numerical algorithms of c…

    Submitted 6 December, 2018; originally announced December 2018.