Search | arXiv e-print repository

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

Authors: Junwen Xiong, Peng Zhang, Tao You, Chuanyue Li, Wei Huang, Yufei Zha

Abstract: Audio-visual saliency prediction can draw support from diverse modality complements, but further performance enhancement is still challenged by customized architectures as well as task-specific loss functions. In recent studies, denoising diffusion models have shown more promise in unifying task frameworks owing to their inherent generalization ability. Following this motivation, a novel Diffusion architecture for generalized audio-visual Saliency prediction (DiffSal) is proposed in this work, which formulates the prediction problem as a conditional generative task of the saliency map by utilizing input audio and video as the conditions. Based on the spatio-temporal audio-visual features, an extra network, Saliency-UNet, is designed to perform multi-modal attention modulation for progressive refinement of the ground-truth saliency map from the noisy map. Extensive experiments demonstrate that the proposed DiffSal can achieve excellent performance across six challenging audio-visual benchmarks, with an average relative improvement of 6.3% over the previous state-of-the-art results across six metrics.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 15 pages, CVPR24

arXiv:2402.19329

Social Links vs. Language Barriers: Decoding the Global Spread of Streaming Content

Authors: Seoyoung Park, Sanghyeok Park, Taekho You, Jinhyuk Yun

Abstract: The development of the internet has allowed for the global distribution of content, redefining media communication and property structures through various streaming platforms. Previous studies successfully clarified the factors contributing to trends in each streaming service, yet the similarities and differences between platforms are commonly unexplored; moreover, the influence of social connections and cultural similarity is usually overlooked. We hereby examine the social aspects of three significant streaming services (Netflix, Spotify, and YouTube), with an emphasis on the dissemination of content across countries. Using two-year-long trending chart datasets, we find that streaming content can be divided into two types: video-oriented (Netflix) and audio-oriented (Spotify). This characteristic is differentiated by accounting for the significance of social connectedness and linguistic similarity: audio-oriented content travels via social links, but video-oriented content tends to spread throughout linguistically akin countries. Interestingly, user-generated content (YouTube) exhibits a dual characteristic by integrating both visual and auditory characteristics, indicating that the platform is evolving into a unique medium rather than simply residing at a midpoint between video and audio media.

Submitted 18 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 8 pages, 4 figures, and 1 table in manuscript, including 9 SI figures and 7 SI tables

arXiv:2402.16664

LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

Authors: Kexin Chen, Yuyang Du, Tao You, Mobarakol Islam, Ziyu Guo, Yueming Jin, Guangyong Chen, Pheng-Ann Heng

Abstract: Visual question answering (VQA) can be fundamentally crucial for promoting robotic-assisted surgical education. In practice, the needs of trainees are constantly evolving, such as learning more surgical types, adapting to different robots, and learning new surgical instruments and techniques for one surgery. Therefore, continually updating the VQA system by a sequential data stream from multiple resources is demanded in robotic surgery to address new tasks. In surgical scenarios, the storage cost and patient data privacy often restrict the availability of old data when updating the model, necessitating an exemplar-free continual learning (CL) setup. However, prior studies overlooked two vital problems of the surgical domain: i) large domain shifts from diverse surgical operations collected from multiple departments or clinical centers, and ii) severe data imbalance arising from the uneven presence of surgical instruments or activities during surgical procedures. This paper proposes to address these two problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology. We first develop a new multi-teacher CL framework that leverages a multimodal LLM as the additional teacher. The strong generalization ability of the LLM can bridge the knowledge gap when domain shifts and data imbalances occur. We then put forth a novel data processing method that transforms complex LLM embeddings into logits compatible with our CL framework. We further design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of the old CL model. We construct a new dataset for surgical VQA tasks, providing valuable data resources for future research. Extensive experimental results on three datasets demonstrate the superiority of our method over other advanced CL models.

Submitted 26 February, 2024; originally announced February 2024.

Comments: This paper has been accepted by the 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2312.07844

Regional profile of questionable publishing

Authors: Taekho You, Jinseo Park, June Young Lee, Jinhyuk Yun

Abstract: Countries and authors in the academic periphery occasionally have been criticized for contributing to the expansion of questionable publishing because they share a major fraction of papers in questionable journals. On the other hand, topics preferred by mainstream journals sometimes necessitate large-scale investigation, which is impossible for developing countries. Thus, local journals, commonly low-impact, are essential to sustain the regional academia for such countries. In this study, we perform an in-depth analysis of the distribution of questionable publications and journals and their interplay with countries, quantifying the influence of questionable publications regarding academia's inequality. We find that low-impact journals play a vital role in the regional academic environment, whereas questionable journals with equivalent impact publish papers from all over the world, both geographically and academically. The business model of questionable journals differs from that of regional journals, and may thus be detrimental to the broader academic community.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 13 pages, 4 figures, supplementary information with 8 SI figures

arXiv:2310.19464

Generative Neural Fields by Mixtures of Neural Implicit Functions

Authors: Tackgeun You, Mijeong Kim, Jungtaek Kim, Bohyung Han

Abstract: We propose a novel approach to learning the generative neural fields represented by linear combinations of implicit basis networks. Our algorithm learns basis networks in the form of implicit neural representations and their coefficients in a latent space by either conducting meta-learning or adopting auto-decoding paradigms. The proposed method easily enlarges the capacity of generative neural fields by increasing the number of basis networks while keeping the network used for inference small through their weighted model averaging. Consequently, sampling instances using the model is efficient in terms of latency and memory footprint. Moreover, we customize a denoising diffusion probabilistic model for a target task to sample latent mixture coefficients, which allows our final model to generate unseen data effectively. Experiments show that our approach achieves competitive generation performance on diverse benchmarks for images, voxel data, and NeRF scenes without sophisticated designs for specific modalities and domains.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2309.08220

UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection

Authors: Junwen Xiong, Peng Zhang, Chuanyue Li, Wei Huang, Yufei Zha, Tao You

Abstract: Video saliency prediction and detection are thriving research domains that enable computers to simulate the distribution of visual attention akin to how humans perceive dynamic scenes. While many approaches have crafted task-specific training paradigms for either video saliency prediction or video salient object detection tasks, little attention has been devoted to devising a generalized saliency modeling framework that seamlessly bridges these two distinct tasks. In this study, we introduce the Unified Saliency Transformer (UniST) framework, which comprehensively utilizes the essential attributes of video saliency prediction and video salient object detection. In addition to extracting representations of frame sequences, a saliency-aware transformer is designed to learn the spatio-temporal representations at progressively increased resolutions, while incorporating effective cross-scale saliency information to produce a robust representation. Furthermore, a task-specific decoder is proposed to perform the final prediction for each task. To the best of our knowledge, this is the first work that explores designing a transformer structure for both saliency modeling tasks. Convincing experiments demonstrate that the proposed UniST achieves superior performance across seven challenging benchmarks for two tasks, and significantly outperforms the other state-of-the-art methods.

Submitted 15 September, 2023; originally announced September 2023.

Comments: 11 pages, 7 figures

arXiv:2308.04767

doi 10.1145/3581783.3612502

Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization

Authors: Tianyu Liu, Peng Zhang, Wei Huang, Yufei Zha, Tao You, Yanning Zhang

Abstract: Self-supervised sound source localization is usually challenged by modality inconsistency. In recent studies, contrastive-learning-based strategies have shown promise in establishing a consistent correspondence between audio and sound sources in visual scenarios. Unfortunately, insufficient attention to the influence of heterogeneity in the different modality features still prevents this scheme from being further improved, which is also the motivation of our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of visual and audio modalities, the discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned with the visual modality consistently. In addition to a visual weighted contrastive loss, an adaptive threshold selection strategy is introduced to enhance the robustness of the Induction Network. Substantial experiments conducted on the SoundNet-Flickr and VGG-Sound Source datasets have demonstrated superior performance compared to other state-of-the-art works in different challenging scenarios. The code is available at https://github.com/Tahy1/AVIN

Submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted to ACM Multimedia 2023

arXiv:2301.01926

Auditing citation polarization during the early COVID-19 pandemic

Authors: Taekho You, Jinseo Park, June Young Lee, Jinhyuk Yun

Abstract: The recent pandemic stimulated scientists to publish a significant amount of research that created a surge of citations of COVID-19-related publications in a short time, leading to an abrupt inflation of the journal impact factor (IF). By auditing the complete set of COVID-19-related publications in the Web of Science, we reveal here that COVID-19-related research worsened the polarization of academic journals: the IF before the pandemic was proportional to the increment of IF, which had the effect of increasing inequality while retaining the journal rankings. We also found that the most highly cited studies related to COVID-19 were published in prestigious journals at the onset of the epidemic. Through the present quantitative investigation, our findings caution against the belief that quantitative metrics, particularly IF, can indicate the significance of individual papers. Rather, such metrics reflect the social attention given to a particular study.

Submitted 24 May, 2024; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: 31 pages of main text including 4 figures + 13 pages of supplementary information including 2 supplementary tables and 10 supplementary figures

arXiv:2204.11062

Selective clustering ensemble based on kappa and F-score

Authors: Jie Yan, Xin Liu, Ji Qi, Tao You, Zhong-Yuan Zhang

Abstract: Clustering ensemble has an impressive performance in improving the accuracy and robustness of partition results and has received much attention in recent years. Selective clustering ensemble (SCE) can further improve the ensemble performance by selecting base partitions or clusters according to diversity and stability. However, there is a conflict between diversity and stability, and making the trade-off between the two is challenging. The key here is how to evaluate the quality of the base partitions and clusters. In this paper, we propose a new evaluation method for partitions and clusters using kappa and F-score, leading to a new SCE method, which uses kappa to select informative base partitions and uses F-score to weight clusters based on stability. The effectiveness and efficiency of the proposed method are empirically validated on real datasets.

Submitted 23 April, 2022; originally announced April 2022.

arXiv:2106.15166

doi 10.1016/j.joi.2022.101294

Disturbance of questionable publishing to academia

Authors: Taekho You, Jinseo Park, June Young Lee, Jinhyuk Yun, Woo-Sung Jung

Abstract: Questionable publications have been accused of "greedy" practices; however, their influence on academia has not been gauged. Here, we probe the impact of questionable publications through a systematic and comprehensive analysis with various participants from academia and compare the results with those of their unaccused counterparts using billions of citation records, including liaisons, i.e., journals and publishers, and prosumers, i.e., authors. Questionable publications attribute publisher-level self-citations to their journals while limiting journal-level self-citations; yet, conventional journal-level metrics are unable to detect these publisher-level self-citations. We propose a hybrid journal-publisher metric for detecting self-favouring citations among questionable journals (QJs) from publishers. Additionally, we demonstrate that the questionable publications were less disruptive and influential than their counterparts. Our findings indicate an inflated citation impact of suspicious academic publishers. The findings provide a basis for actionable policy-making against questionable publications.

Submitted 19 April, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: 16 pages of main text including 4 figures + 42 pages of supplementary information including 38 supplementary figures

Journal ref: Journal of Informetrics, 2022, 16(2), 101294

arXiv:1906.03950

Domain-Specific Batch Normalization for Unsupervised Domain Adaptation

Authors: Woong-Gi Chang, Tackgeun You, Seonguk Seo, Suha Kwak, Bohyung Han

Abstract: We propose a novel unsupervised domain adaptation framework based on domain-specific batch normalization in deep neural networks. We aim to adapt to both domains by specializing batch normalization layers in convolutional neural networks while allowing them to share all other model parameters, which is realized by a two-stage algorithm. In the first stage, we estimate pseudo-labels for the examples in the target domain using an external unsupervised domain adaptation algorithm (for example, MSTN or CPUA) integrating the proposed domain-specific batch normalization. The second stage learns the final models using a multi-task classification loss for the source and target domains. Note that the two domains have separate batch normalization layers in both stages. Our framework can be easily incorporated into domain adaptation techniques based on deep neural networks with batch normalization layers. We also show that our approach can be extended to the problem with multiple source domains. The proposed algorithm is evaluated on multiple benchmark datasets and achieves state-of-the-art accuracy in the standard setting and the multi-source domain adaptation scenario.

Submitted 27 May, 2019; originally announced June 2019.
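
The core idea in the abstract above (one batch normalization layer per domain, with every other parameter shared) can be illustrated with a minimal pure-Python sketch. The class name, the single scalar feature, and the `shared_weight` parameter are illustrative assumptions, not the authors' implementation; the two-stage pseudo-labeling procedure is not shown.

```python
import math


class DomainSpecificBatchNorm:
    """Batch normalization with separate affine parameters per domain.

    Hypothetical, simplified to one scalar feature per example.
    """

    def __init__(self, domains, eps=1e-5):
        self.eps = eps
        # Each domain owns its own scale (gamma) and shift (beta).
        self.gamma = {d: 1.0 for d in domains}
        self.beta = {d: 0.0 for d in domains}

    def __call__(self, batch, domain):
        # Statistics come only from the batch of the selected domain.
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
        scale = self.gamma[domain] / math.sqrt(var + self.eps)
        return [scale * (v - mean) + self.beta[domain] for v in batch]


def forward(batch, domain, shared_weight, dsbn):
    """Shared linear map followed by the domain's own normalization branch."""
    hidden = [shared_weight * v for v in batch]  # parameter shared by all domains
    return dsbn(hidden, domain)
```

Calling `forward(batch, "source", w, dsbn)` and `forward(batch, "target", w, dsbn)` routes the same shared weight through different normalization parameters, so changing one domain's `gamma` or `beta` leaves the other domain's output untouched.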

arXiv:1905.09780

Bayesian Optimization with Approximate Set Kernels

Authors: Jungtaek Kim, Michael McCourt, Tackgeun You, Saehoon Kim, Seungjin Choi

Abstract: We propose a practical Bayesian optimization method over sets, to minimize a black-box function that takes a set as a single input. Because set inputs are permutation-invariant, traditional Gaussian process-based Bayesian optimization strategies which assume vector inputs can fall short. To address this, we develop a Bayesian optimization method with a set kernel that is used to build surrogate functions. This kernel accumulates similarity over set elements to enforce permutation-invariance, but this comes at a greater computational cost. To reduce this burden, we propose two key components: (i) a more efficient approximate set kernel which is still positive-definite and is an unbiased estimator of the true set kernel with upper-bounded variance in terms of the number of subsamples, (ii) a constrained acquisition function optimization over sets, which uses symmetry of the feasible region that defines a set input. Finally, we present several numerical experiments which demonstrate that our method outperforms other methods.

Submitted 24 January, 2021; v1 submitted 23 May, 2019; originally announced May 2019.

Comments: 18 pages, 7 figures, 5 tables, accepted for publication in Machine Learning Journal
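
The set-kernel idea in the abstract above (accumulating a base kernel over all pairs of set elements, then approximating it with subsamples) can be sketched as follows. This is a hedged illustration only: the averaging normalization, the squared-exponential base kernel, the function names, and the independent uniform subsampling are assumptions, and the paper's actual estimator and its positive-definiteness guarantee may be constructed differently.

```python
import math
import random


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential base kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def set_kernel(X, Y, base=rbf):
    """Average base-kernel similarity over all element pairs of two sets.

    Summing over every pair makes the value independent of element order.
    """
    total = sum(base(x, y) for x in X for y in Y)
    return total / (len(X) * len(Y))


def approx_set_kernel(X, Y, num_sub, rng, base=rbf):
    """Estimate set_kernel from at most num_sub sampled elements of each set.

    Assumed scheme: uniform sampling without replacement, chosen for
    simplicity rather than taken from the paper.
    """
    X_sub = rng.sample(X, min(num_sub, len(X)))
    Y_sub = rng.sample(Y, min(num_sub, len(Y)))
    return set_kernel(X_sub, Y_sub, base)
```

The exact kernel costs one base-kernel evaluation per element pair, while the subsampled version costs at most `num_sub * num_sub` evaluations, which is the trade-off the abstract refers to.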

arXiv:1901.05447

doi 10.7232/iems.2018.17.4.833

A System Dynamics Analysis of National R&D Performance Measurement System in Korea

Authors: Taekho You, Woo-Sung Jung

Abstract: Peer review is a useful and powerful performance measurement process. In Korea, it is needed to increase the quality of R&D performance, but bibliometric evaluation and a lack of peers have the opposite effect. We used system dynamics to describe the Korean R&D performance measurement system and ways to increase performance quality. To meet a desired R&D performance quality, the fairness and quality of evaluation must be increased. The size of the peer pool has decreased because of both the specialization of R&D projects and the Sangpi process, and this pool is critical for achieving both fairness and quality. Also, shortening the evaluation period affects R&D performance quality by increasing workloads, limiting long-term and innovative R&D projects, and decreasing evaluation quality. Previous evaluation policies act to micro-control R&D activities, but increasing the size of the peer pool and changing the evaluation period would change the quality and fairness of evaluation.

Submitted 16 January, 2019; originally announced January 2019.

Journal ref: Industrial Engineering & Management Systems Vol.17 No.4 pp.833-839 (2018)

arXiv:1710.05179

Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

Authors: Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, Bohyung Han

Abstract: Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noise into hidden units during training, e.g., dropout, is known as a successful regularizer, but it is still not clear enough why such training techniques work well in practice and how we can maximize their benefit in the presence of two conflicting objectives: optimizing to the true data distribution and preventing overfitting by regularization. This paper addresses the above issues by 1) interpreting that the conventional training methods with regularization by noise injection optimize the lower bound of the true objective and 2) proposing a technique to achieve a tighter lower bound using multiple noise samples per training example in a stochastic gradient descent iteration. We demonstrate the effectiveness of our idea in several computer vision applications.

Submitted 9 November, 2017; v1 submitted 14 October, 2017; originally announced October 2017.

Comments: NIPS 2017 camera ready
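
The lower-bound claim in the abstract above can be illustrated numerically. The sketch below assumes one reading of it: conventional noise-injection training averages the log-likelihood over noise samples, while the multi-sample variant takes the log of the averaged likelihood, which by Jensen's inequality is never smaller for the same samples. The tiny logistic model, the dropout noise, and all names are illustrative assumptions, not the paper's code.

```python
import math
import random


def noisy_likelihood(x, y, w, rng, drop_p=0.5):
    """Likelihood p(y | x, noise) of a tiny logistic model under one dropout mask."""
    z = 0.0
    for xi, wi in zip(x, w):
        if rng.random() >= drop_p:  # keep this input unit
            z += wi * xi / (1.0 - drop_p)  # inverted-dropout rescaling
    p_one = 1.0 / (1.0 + math.exp(-z))
    return p_one if y == 1 else 1.0 - p_one


def objectives(x, y, w, num_samples, rng):
    """Return (average of log-likelihoods, log of average likelihood)."""
    liks = [noisy_likelihood(x, y, w, rng) for _ in range(num_samples)]
    single = sum(math.log(l) for l in liks) / num_samples  # looser bound
    multi = math.log(sum(liks) / num_samples)  # tighter bound (Jensen)
    return single, multi
```

For any draw of noise samples, `multi >= single` holds, which mirrors the abstract's statement that using several noise samples per training example yields a tighter lower bound on the true objective.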

arXiv:1512.07827

doi 10.1016/j.physa.2016.07.025

Community Detection in Complex Networks Using Density-based Clustering Algorithm

Authors: Tao You, Ben-Chang Shia, Zhong-Yuan Zhang

Abstract: Like clustering analysis, community detection aims at assigning nodes in a network into different communities. Fdp is a recently proposed density-based clustering algorithm which does not need the number of clusters as prior input and whose result is insensitive to its parameter. However, Fdp cannot be directly applied to community detection due to its inability to recognize the community centers in the network. To solve the problem, a new community detection method (named IsoFdp) is proposed in this paper. First, we use the Isomap technique to map the network data into a low-dimensional manifold which can reveal diverse pairwise similarity. Then Fdp is applied to detect the communities in networks. An improved partition density function is proposed to select the proper number of communities automatically. We test our method on both synthetic and real-world networks, and the results demonstrate the effectiveness of our algorithm over the state-of-the-art methods.

Submitted 31 January, 2016; v1 submitted 24 December, 2015; originally announced December 2015.

arXiv:1502.06796

Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

Authors: Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han

Abstract: We propose an online visual tracking algorithm by learning a discriminative saliency map using a Convolutional Neural Network (CNN). Given a CNN pre-trained offline on a large-scale image repository, our algorithm takes outputs from hidden layers of the network as feature descriptors since they show excellent representation performance in various general visual recognition problems. The features are used to learn discriminative target appearance models using an online Support Vector Machine (SVM). In addition, we construct a target-specific saliency map by backpropagating CNN features with guidance of the SVM, and obtain the final tracking result in each frame based on the appearance model generatively constructed with the saliency map. Since the saliency map visualizes the spatial configuration of the target effectively, it improves target localization accuracy and enables us to achieve pixel-level target segmentation. We verify the effectiveness of our tracking algorithm through extensive experiments on a challenging benchmark, where our method demonstrates outstanding performance compared to the state-of-the-art tracking algorithms.

Submitted 24 February, 2015; originally announced February 2015.

Showing 1–16 of 16 results for author: You, T