Showing 1–17 of 17 results for author: Bae, B

  1. arXiv:2406.00323  [pdf, other]

    cs.IR cs.MM

    BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation

    Authors: Qile Fan, Penghang Yu, Zhiyi Tan, Bing-Kun Bao, Guanming Lu

    Abstract: Multimedia recommender systems focus on utilizing behavioral information and content information to model user preferences. Typically, it employs pre-trained feature encoders to extract content features, then fuses them with behavioral features. However, pre-trained feature encoders often extract features from the entire content simultaneously, including excessive preference-irrelevant details. We…

    Submitted 1 June, 2024; originally announced June 2024.

  2. arXiv:2404.05979  [pdf, other]

    cs.CV

    StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

    Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

    Abstract: Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although these models have shown notable progress, there are still three flaws. 1) The unidirectional generation of auto-regressive manner restricts the usability i…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 17 pages

  3. arXiv:2403.10744  [pdf, ps, other]

    cs.AI

    Game and Reference: Policy Combination Synthesis for Epidemic Prevention and Control

    Authors: Zhiyi Tan, Bingkun Bao

    Abstract: In recent years, epidemic policy-making models are increasingly being used to provide reference for governors on prevention and control policies against catastrophic epidemics such as SARS, H1N1 and COVID-19. Existing studies are currently constrained by two issues: First, previous methods develop policies based on effect evaluation, since few of factors in real-world decision-making can be modele…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 16 pages, single line, 7 figures, written with Springer conference template

  4. arXiv:2309.15363  [pdf]

    cs.IR

    LD4MRec: Simplifying and Powering Diffusion Model for Multimedia Recommendation

    Authors: Penghang Yu, Zhiyi Tan, Guanming Lu, Bing-Kun Bao

    Abstract: Multimedia recommendation aims to predict users' future behaviors based on historical behavioral data and item's multimodal information. However, noise inherent in behavioral data, arising from unintended user interactions with uninteresting items, detrimentally impacts recommendation performance. Recently, diffusion models have achieved high-quality information generation, in which the reverse pr…

    Submitted 26 September, 2023; originally announced September 2023.

  5. arXiv:2308.15840  [pdf, other]

    cs.LG cs.AI physics.soc-ph q-bio.PE

    MSGNN: Multi-scale Spatio-temporal Graph Neural Network for Epidemic Forecasting

    Authors: Mingjie Qiu, Zhiyi Tan, Bing-kun Bao

    Abstract: Infectious disease forecasting has been a key focus and proved to be crucial in controlling epidemic. A recent trend is to develop forecasting models based on graph neural networks (GNNs). However, existing GNN-based methods suffer from two key limitations: (1) Current models broaden receptive fields by scaling the depth of GNNs, which is insufficient to preserve the semantics of long-range conn…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 29 pages

    Report number: DAMI-D-23-00319R2

    Journal ref: Data Min Knowl Disc (2024)

  6. Multi-View Graph Convolutional Network for Multimedia Recommendation

    Authors: Penghang Yu, Zhiyi Tan, Guanming Lu, Bing-Kun Bao

    Abstract: Multimedia recommendation has received much attention in recent years. It models user preferences based on both behavior information and item multimodal information. Though current GCN-based methods achieve notable success, they suffer from two limitations: (1) Modality noise contamination to the item representations. Existing methods often mix modality features and behavior features in a single v…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: MM'23

  7. arXiv:2304.14226  [pdf, other]

    cs.LG cs.AI cs.PF

    TorchBench: Benchmarking PyTorch with High API Surface Coverage

    Authors: Yueming Hao, Xu Zhao, Bin Bao, David Berard, Will Constable, Adnan Aziz, Xu Liu

    Abstract: Deep learning (DL) has been a revolutionary technique in various domains. To facilitate the model development and deployment, many deep learning frameworks are proposed, among which PyTorch is one of the most popular solutions. The performance of ecosystem around PyTorch is critically important, which saves the costs of training models and reduces the response time of model inferences. In this pap…

    Submitted 24 June, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

  8. arXiv:2303.16557  [pdf, other]

    cs.CV cs.AI

    Self-accumulative Vision Transformer for Bone Age Assessment Using the Sauvegrain Method

    Authors: Hong-Jun Choi, Dongbin Na, Kyungjin Cho, Byunguk Bae, Seo Taek Kong, Hyunjoon An

    Abstract: This study presents a novel approach to bone age assessment (BAA) using a multi-view, multi-task classification model based on the Sauvegrain method. A straightforward solution to automating the Sauvegrain method, which assesses a maturity score for each landmark in the elbow and predicts the bone age, is to train classifiers independently to score each region of interest (RoI), but this approach…

    Submitted 30 March, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: 13 pages

  9. arXiv:2301.12959  [pdf, other]

    cs.CV cs.AI

    GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

    Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu

    Abstract: Synthesizing high-fidelity complex images from text is challenging. Based on large pretraining, the autoregressive and diffusion models can synthesize photo-realistic images. Although these large models have shown notable progress, there remain three flaws. 1) These models require tremendous training data and parameters to achieve good performance. 2) The multi-step generation design slows the ima…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: 11 pages

  10. arXiv:2206.01160  [pdf, other]

    cs.CV cs.MM

    DE-Net: Dynamic Text-guided Image Editing Adversarial Networks

    Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian

    Abstract: Text-guided image editing models have shown remarkable results. However, there remain two problems. First, they employ fixed manipulation modules for various editing requirements (e.g., color changing, texture changing, content adding and removing), which results in over-editing or insufficient editing. Second, they do not clearly distinguish between text-required and text-irrelevant parts, which…

    Submitted 20 August, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  11. arXiv:2107.04768  [pdf, other]

    cs.MM cs.AI cs.CV

    DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering

    Authors: Jianyu Wang, Bing-Kun Bao, Changsheng Xu

    Abstract: Video question answering is a challenging task, which requires agents to be able to understand rich video contents and perform spatial-temporal reasoning. However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) Even for the same video, different questions may require different amount of video clips or objects to infer the answer wi…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: 12 pages, 12 figures

    Journal ref: IEEE Transactions on Multimedia 2021

  12. arXiv:2106.05735  [pdf, other]

    eess.IV cs.CV cs.LG

    The Medical Segmentation Decathlon

    Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov, et al. (34 additional authors not shown)

    Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical pro…

    Submitted 10 June, 2021; originally announced June 2021.

    MSC Class: 68T07

  13. arXiv:2010.08924   

    cs.LG cs.AI

    Meta-path Free Semi-supervised Learning for Heterogeneous Networks

    Authors: Shin-woo Park, Byung Jun Bae, Jinyoung Yeo, Seung-won Hwang

    Abstract: Graph neural networks (GNNs) have been widely used in representation learning on graphs and achieved superior performance in tasks such as node classification. However, analyzing heterogeneous graph of different types of nodes and links still brings great challenges for injecting the heterogeneity into a graph neural network. A general remedy is to manually or automatically design meta-paths to tr…

    Submitted 6 January, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: The technical description of [Proposed Models] section has an error. Especially, the training process

  14. arXiv:2008.05865  [pdf, other]

    cs.CV

    DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

    Authors: Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu

    Abstract: Synthesizing high-quality realistic images from text descriptions is a challenging task. Existing text-to-image Generative Adversarial Networks generally employ a stacked architecture as the backbone yet still remain three flaws. First, the stacked architecture introduces the entanglements between generators of different image scales. Second, existing studies prefer to apply and fix extra networks…

    Submitted 14 October, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

  15. arXiv:2004.13840  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Using LSTM to Translate French to Senegalese Local Languages: Wolof as a Case Study

    Authors: Lo Alla, Dione Cheikh Bamba, Nguer Elhadji Mamadou, Ba Sileye O. Ba, Lo Moussa

    Abstract: In this paper, we propose a neural machine translation system for Wolof, a low-resource Niger-Congo language. First we gathered a parallel corpus of 70000 aligned French-Wolof sentences. Then we developed a baseline LSTM-based encoder-decoder architecture which was further extended to bidirectional LSTMs with attention mechanisms. Our models are trained on a limited amount of parallel French-Wolo…

    Submitted 27 March, 2020; originally announced April 2020.

    Comments: 4 pages, 2 tables, ICLR AfricaNLP2020 workshop

  16. arXiv:1904.00623  [pdf, other]

    cs.AI cs.CV cs.LG cs.MM

    Constructing Hierarchical Q&A Datasets for Video Story Understanding

    Authors: Yu-Jung Heo, Kyoung-Woon On, Seongho Choi, Jaeseo Lim, Jinah Kim, Jeh-Kwang Ryu, Byung-Chull Bae, Byoung-Tak Zhang

    Abstract: Video understanding is emerging as a new paradigm for studying human-like AI. Question-and-Answering (Q&A) is used as a general benchmark to measure the level of intelligence for video understanding. While several previous studies have suggested datasets for video Q&A tasks, they did not really incorporate story-level understanding, resulting in highly-biased and lack of variance in degree of ques…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted to AAAI 2019 Spring Symposium Series: Story-Enabled Intelligence

  17. arXiv:1607.05327  [pdf, other]

    cs.RO cs.HC

    Emotional Storytelling using Virtual and Robotic Agents

    Authors: Sandra Costa, Alberto Brunete, Byung-Chull Bae, Nikolaos Mavridis

    Abstract: In order to create effective storytelling agents three fundamental questions must be answered: first, is a physically embodied agent preferable to a virtual agent or a voice-only narration? Second, does a human voice have an advantage over a synthesised voice? Third, how should the emotional trajectory of the different characters in a story be related to a storyteller's facial expressions during s…

    Submitted 18 July, 2016; originally announced July 2016.

    Comments: 14 pages, 10 figures, 3 tables