Showing 1–28 of 28 results for author: Kwon, Y

Searching in archive eess.
  1. arXiv:2406.06650  [pdf, other]

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  2. arXiv:2403.12098  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Deep Generative Design for Mass Production

    Authors: Jihoon Kim, Yongmin Kwon, Namwoo Kang

    Abstract: Generative Design (GD) has evolved as a transformative design approach, employing advanced algorithms and AI to create diverse and innovative solutions beyond traditional constraints. Despite its success, GD faces significant challenges regarding the manufacturability of complex designs, often necessitating extensive manual modifications due to limitations in standard manufacturing processes and t…

    Submitted 15 March, 2024; originally announced March 2024.

  3. arXiv:2309.14741  [pdf, other]

    eess.AS cs.SD

    Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

    Authors: Hee-Soo Heo, KiHyun Nam, Bong-Jin Lee, Youngki Kwon, Minjae Lee, You Jin Kim, Joon Son Chung

    Abstract: In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remain…

    Submitted 26 September, 2023; originally announced September 2023.

  4. arXiv:2309.04655  [pdf]

    cs.RO cs.LG eess.SP eess.SY

    Intelligent upper-limb exoskeleton integrated with soft wearable bioelectronics and deep-learning for human intention-driven strength augmentation based on sensory feedback

    Authors: Jinwoo Lee, Kangkyu Kwon, Ira Soltis, Jared Matthews, Yoonjae Lee, Hojoong Kim, Lissette Romero, Nathan Zavanelli, Youngjin Kwon, Shinjae Kwon, Jimin Lee, Yewon Na, Sung Hoon Lee, Ki Jun Yu, Minoru Shinohara, Frank L. Hammond, Woon-Hong Yeo

    Abstract: The age and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily human tasks using the upper extremities. Although there are a few examples of exoskeletons, they need manual operations due to the absence of sensor feedback and no intention prediction of movements. Here, we introduce an intelligent upper-limb exoskeleton system that uses cloud-based deep learn…

    Submitted 26 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: 15 pages, 6 figures, 1 table, published in npj Flexible Electronics

    MSC Class: 68T40 (Primary) 92C55; 68T99 (Secondary)

  5. arXiv:2307.02784  [pdf, other]

    cs.IT cs.NI eess.SP

    On the Spatial-Wideband Effects in Millimeter-Wave Cell-Free Massive MIMO

    Authors: Seyoung Ahn, Soohyeong Kim, Yongseok Kwon, Joohan Park, Jiseung Youn, Sunghyun Cho

    Abstract: In this paper, we investigate the spatial-wideband effects in cell-free massive MIMO (CF-mMIMO) systems in mmWave bands. The utilization of mmWave frequencies brings challenges such as signal attenuation and the need for denser networks like ultra-dense networks (UDN) to maintain communication performance. CF-mMIMO is introduced as a solution, where distributed access points (APs) transmit signals…

    Submitted 6 July, 2023; originally announced July 2023.

  6. arXiv:2306.00680  [pdf, other]

    cs.SD cs.AI eess.AS

    Encoder-decoder multimodal speaker change detection

    Authors: Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee

    Abstract: The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise text modality in addition to audio, have shown improved performance. In this study, the proposed model are bui…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted for presentation at INTERSPEECH 2023

  7. arXiv:2305.13970  [pdf, other]

    eess.SY

    Darwin: A DRAM-based Multi-level Processing-in-Memory Architecture for Data Analytics

    Authors: Donghyuk Kim, Jae-Young Kim, Wontak Han, Jongsoon Won, Haerang Choi, Yongkee Kwon, Joo-Young Kim

    Abstract: Processing-in-memory (PIM) architecture is an inherent match for data analytics application, but we observe major challenges to address when accelerating it using PIM. In this paper, we propose Darwin, a practical LRDIMM-based multi-level PIM architecture for data analytics, which fully exploits the internal bandwidth of DRAM using the bank-, bank group-, chip-, and rank-level parallelisms. Consid…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 14 pages, 16 figures

  8. arXiv:2302.13750  [pdf, other]

    eess.AS cs.CL cs.SD

    MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

    Authors: Yoohwan Kwon, Soo-Whan Chung

    Abstract: Multi-lingual speech recognition aims to distinguish linguistic expressions in different languages and integrate acoustic processing simultaneously. In contrast, current multi-lingual speech recognition research follows a language-aware paradigm, mainly targeted to improve recognition performance rather than discriminate language characteristics. In this paper, we present a multi-lingual speech re…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  9. arXiv:2302.01493  [pdf]

    eess.IV cs.CV physics.med-ph

    Deep Learning (DL)-based Automatic Segmentation of the Internal Pudendal Artery (IPA) for Reduction of Erectile Dysfunction in Definitive Radiotherapy of Localized Prostate Cancer

    Authors: Anjali Balagopal, Michael Dohopolski, Young Suk Kwon, Steven Montalvo, Howard Morgan, Ti Bai, Dan Nguyen, Xiao Liang, Xinran Zhong, Mu-Han Lin, Neil Desai, Steve Jiang

    Abstract: Background and purpose: Radiation-induced erectile dysfunction (RiED) is commonly seen in prostate cancer patients. Clinical trials have been developed in multiple institutions to investigate whether dose-sparing to the internal-pudendal-arteries (IPA) will improve retention of sexual potency. The IPA is usually not considered a conventional organ-at-risk (OAR) due to segmentation difficulty. In t…

    Submitted 2 February, 2023; originally announced February 2023.

  10. arXiv:2211.04768  [pdf, other]

    eess.AS cs.SD

    Absolute decision corrupts absolutely: conservative online speaker diarisation

    Authors: Youngki Kwon, Hee-Soo Heo, Bong-Jin Lee, You Jin Kim, Jee-weon Jung

    Abstract: Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains. In online speaker diarisation, outputs generated in real-time are irreversible, and a few misjudgements in the early phase of an input session can lead to catastrophic results. We hypothesise that cautiously increasing the number of estimated speakers is of paramount i…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: 5 pages, 2 figures, 4 tables, submitted to ICASSP

  11. arXiv:2211.04060  [pdf, other]

    cs.SD cs.CL eess.AS

    High-resolution embedding extractor for speaker diarisation

    Authors: Hee-Soo Heo, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Jee-weon Jung

    Abstract: Speaker embedding extractors significantly influence the performance of clustering-based speaker diarisation systems. Conventionally, only one embedding is extracted from each speech segment. However, because of the sliding window approach, a segment easily includes two or more speakers owing to speaker change points. This study proposes a novel embedding extractor architecture, referred to as a h…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 5 pages, 2 figures, 3 tables, submitted to ICASSP

  12. arXiv:2210.14682  [pdf, other]

    cs.SD cs.AI eess.AS

    In search of strong embedding extractors for speaker diarisation

    Authors: Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung

    Abstract: Speaker embedding extractors (EEs), which map input audio to a speaker discriminant latent space, are of paramount importance in speaker diarisation. However, there are several challenges when adopting EEs for diarisation, from which we tackle two key problems. First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: 5 pages, 1 figure, 2 tables, submitted to ICASSP

  13. arXiv:2210.10985  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Large-scale learning of generalised representations for speaker recognition

    Authors: Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe

    Abstract: The objective of this work is to develop a speaker recognition model to be used in diverse scenarios. We hypothesise that two components should be adequately configured to build such a model. First, adequate architecture would be required. We explore several recent state-of-the-art models, including ECAPA-TDNN and MFA-Conformer, as well as other baselines. Second, a massive amount of data would be…

    Submitted 27 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 5 pages, 5 tables, submitted to ICASSP

  14. Physically Consistent Preferential Bayesian Optimization for Food Arrangement

    Authors: Yuhwan Kwon, Yoshihisa Tsurumine, Takeshi Shimmura, Sadao Kawamura, Takamitsu Matsubara

    Abstract: This paper considers the problem of estimating a preferred food arrangement for users from interactive pairwise comparisons using Computer Graphics (CG)-based dish images. As a foodservice industry requirement, we need to utilize domain rules for the geometry of the arrangement (e.g., the food layout of some Japanese dishes is reminiscent of mountains). However, those rules are qualitative and amb…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: 8 pages, 10 figures, accepted by IEEE Robotics and Automation Letters (RA-L) 2022

  15. arXiv:2203.14525  [pdf, other]

    eess.AS

    Curriculum learning for self-supervised speaker verification

    Authors: Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, You Jin Kim, Bong-Jin Lee, Joon Son Chung

    Abstract: The goal of this paper is to train effective self-supervised speaker representations without identity labels. We propose two curriculum learning strategies within a self-supervised learning framework. The first strategy aims to gradually increase the number of speakers in the training phase by enlarging the used portion of the train dataset. The second strategy applies various data augmentations t…

    Submitted 13 February, 2024; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: INTERSPEECH 2023. 5 pages, 3 figures, 4 tables

  16. arXiv:2203.08488  [pdf, other]

    eess.AS cs.AI

    Pushing the limits of raw waveform speaker recognition

    Authors: Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

    Abstract: In recent years, speaker recognition systems based on raw waveform inputs have received increasing attention. However, the performance of such systems is typically inferior to the state-of-the-art handcrafted feature-based counterparts, which demonstrate equal error rates under 1% on the popular VoxCeleb1 test set. This paper proposes a novel speaker recognition model based on raw waveform inputs…

    Submitted 28 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: submitted to INTERSPEECH 2022 as a conference paper. 5 pages, 2 figures, 5 tables

  17. arXiv:2110.03361  [pdf, other]

    eess.AS cs.AI

    Multi-scale speaker embedding-based graph attention networks for speaker diarisation

    Authors: Youngki Kwon, Hee-Soo Heo, Jee-weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung

    Abstract: The objective of this work is effective speaker diarisation using multi-scale speaker embeddings. Typically, there is a trade-off between the ability to recognise short speaker segments and the discriminative power of the embedding, according to the segment length used for embedding extraction. To this end, recent works have proposed the use of multi-scale embeddings where segments with varying le…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, submitted to ICASSP as a conference paper

  18. arXiv:2108.10137  [pdf, other]

    eess.IV

    Finding essential parts of the brain in rs-fMRI can improve diagnosing ADHD by Deep Learning

    Authors: Byunggun Kim, Jaeseon Park, Taehun Kim, Younghun Kwon

    Abstract: Attention Deficit/Hyperactivity Disorder (ADHD) is considered a very common psychiatric disorder, but it is difficult to establish an accurate diagnostic method for ADHD. Recently, with the development of computing resources and machine learning methods, studies have been conducted to classify ADHD using resting-state functional magnetic resonance (rsfMRI) imaging data. However, most of them utilize…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: 10 pages, 6 figures

  19. arXiv:2108.07640  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    Look Who's Talking: Active Speaker Detection in the Wild

    Authors: You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

    Abstract: In this work, we present a novel audio-visual dataset for active speaker detection in the wild. A speaker is considered active when his or her face is visible and the voice is audible simultaneously. Although active speaker detection is a crucial pre-processing step for many audio-visual tasks, there is no existing dataset of natural human speech to evaluate the performance of active speaker detec…

    Submitted 17 August, 2021; originally announced August 2021.

    Comments: To appear in Interspeech 2021. Data will be available from https://github.com/clovaai/lookwhostalking

  20. arXiv:2106.07268  [pdf, other]

    cs.SD cs.LG eess.AS

    FastICARL: Fast Incremental Classifier and Representation Learning with Efficient Budget Allocation in Audio Sensing Applications

    Authors: Young D. Kwon, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Various incremental learning (IL) approaches have been proposed to help deep learning models learn new tasks/classes continuously without forgetting what was learned previously (i.e., avoid catastrophic forgetting). With the growing number of deployed audio sensing applications that need to dynamically incorporate new tasks and changing input distribution from users, the ability of IL on-device be…

    Submitted 24 June, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at INTERSPEECH 2021

  21. arXiv:2104.02879  [pdf, other]

    eess.AS cs.LG cs.SD

    Adapting Speaker Embeddings for Speaker Diarisation

    Authors: Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung

    Abstract: The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation. The quality of speaker embeddings is paramount to the performance of speaker diarisation systems. Despite this, prior works in the field have directly used embeddings designed only to be effective on the speaker verification task. In this paper, we propose three techniques that can be used to bett…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures, 3 tables, submitted to Interspeech as a conference paper

  22. arXiv:2104.02878  [pdf, other]

    eess.AS cs.LG cs.SD

    Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network

    Authors: Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee

    Abstract: In this work, we propose an overlapped speech detection system trained as a three-class classifier. Unlike conventional systems that perform binary classification as to whether or not a frame contains overlapped speech, the proposed approach classifies into three classes: non-speech, single speaker speech, and overlapped speech. By training a network with the more detailed label definition, the mo…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures, 4 tables, submitted to Interspeech as a conference paper

  23. arXiv:2011.14885  [pdf, ps, other]

    cs.SD eess.AS

    Look who's not talking

    Authors: Youngki Kwon, Hee Soo Heo, Jaesung Huh, Bong-Jin Lee, Joon Son Chung

    Abstract: The objective of this work is speaker diarisation of speech recordings 'in the wild'. The ability to determine speech segments is a crucial part of diarisation systems, accounting for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on the speaker embeddings. In particular, we discover that the norm of the speaker embedding…

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: SLT 2021

  24. arXiv:2011.02168  [pdf, other]

    eess.AS

    Learning in your voice: Non-parallel voice conversion based on speaker consistency loss

    Authors: Yoohwan Kwon, Soo-Whan Chung, Hee-Soo Heo, Hong-Goo Kang

    Abstract: In this paper, we propose a novel voice conversion strategy to resolve the mismatch between the training and conversion scenarios when parallel speech corpus is unavailable for training. Based on auto-encoder and disentanglement frameworks, we design the proposed model to extract identity and content representations while reconstructing the input speech signal itself. Since we use other speaker's…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Submitted to ICASSP 2021

  25. arXiv:2010.15809  [pdf, other]

    cs.SD eess.AS

    The ins and outs of speaker recognition: lessons from VoxSRC 2020

    Authors: Yoohwan Kwon, Hee-Soo Heo, Bong-Jin Lee, Joon Son Chung

    Abstract: The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 offers a challenging evaluation for speaker recognition systems, which includes celebrities playing different parts in movies. The goal of this work is robust speaker recognition of utterances recorded in these challenging environments. We utilise variants of the popular ResNet architecture for speaker recognition and perform…

    Submitted 29 October, 2020; originally announced October 2020.

  26. arXiv:2008.05983  [pdf, other]

    eess.AS cs.SD

    Cross attentive pooling for speaker verification

    Authors: Seong Min Kye, Yoohwan Kwon, Joon Son Chung

    Abstract: The goal of this paper is text-independent speaker verification where utterances come from 'in the wild' videos and may contain irrelevant signal. While speaker verification is naturally a pair-wise problem, existing methods to produce the speaker embeddings are instance-wise. In this paper, we propose Cross Attentive Pooling (CAP) that utilizes the context information across the reference-query p…

    Submitted 3 December, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: SLT 2021. Code available at https://github.com/seongmin-kye/CAP

  27. arXiv:2008.01348  [pdf, other]

    eess.AS cs.SD

    Intra-class variation reduction of speaker representation in disentanglement framework

    Authors: Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang

    Abstract: In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing solely speaker characteristic information in order to be robust in terms of intra-speaker variations. By modifying the network architecture to generate both speaker-re…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted for INTERSPEECH 2020

  28. arXiv:1901.07375  [pdf]

    cs.CV cs.LG eess.IV stat.ML

    Extension of Convolutional Neural Network with General Image Processing Kernels

    Authors: Jay Hoon Jung, Yousun Shin, YoungMin Kwon

    Abstract: We applied pre-defined kernels, also known as filters or masks, developed for image processing to convolutional neural networks. Instead of letting neural networks find their own kernels, we used 41 different general-purpose kernels of blurring, edge detecting, sharpening, discrete cosine transformation, etc. for the first layer of the convolutional neural networks. This architecture, thus named as general…

    Submitted 16 January, 2019; originally announced January 2019.

    Comments: 4 pages, 6 figures

    Journal ref: TENCON 2018