Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

Authors: Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi

Abstract: This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by the target acoustic scene of the reference prompt. Specifically, AST-LDM is a latent diffusion model conditioned by CLAP embeddings that describe target acoustic scenes in either audio or text modalities. The contributions of this paper include introducing the AST task and implementing its baseline model. For AST-LDM, we emphasize its core framework, which is to preserve the input speech and generate audio consistently with both the given speech and the target acoustic environment. Experiments, including objective and subjective tests, validate the feasibility and efficacy of our approach.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2406.06534 [pdf, other]

Compressed Meta-Optical Encoder for Image Classification

Authors: Anna Wirth-Singh, Jinlin Xiang, Minho Choi, Johannes E. Fröch, Luocheng Huang, Shane Colburn, Eli Shlizerman, Arka Majumdar

Abstract: Optical and hybrid convolutional neural networks (CNNs) recently have become of increasing interest to achieve low-latency, low-power image classification and computer vision tasks. However, implementing optical nonlinearity is challenging, and omitting the nonlinear layers in a standard CNN comes at a significant reduction in accuracy. In this work, we use knowledge distillation to compress modified AlexNet to a single linear convolutional layer and an electronic backend (two fully connected layers). We obtain comparable performance to a purely electronic CNN with five convolutional layers and three fully connected layers. We implement the convolution optically via engineering the point spread function of an inverse-designed meta-optic. Using this hybrid approach, we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86K in the hybrid compressed network enabled by the optical frontend. This constitutes over two orders of magnitude reduction in latency and power consumption. Furthermore, we experimentally demonstrate that the classification accuracy of the system exceeds 93% on the MNIST dataset.

Submitted 14 June, 2024; v1 submitted 22 April, 2024; originally announced June 2024.

arXiv:2312.05548 [pdf, other]

doi 10.1109/JBHI.2022.3219123

A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data

Authors: Kwang-Hyun Uhm, Seung-Won Jung, Moon Hyung Choi, Sung-Hoo Hong, Sung-Jea Ko

Abstract: Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, there have been some studies to generate the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effective for the diagnosis task. In this paper, we propose a unified framework for kidney cancer diagnosis with incomplete multi-phase CT, which simultaneously recovers missing CT images and classifies cancer subtypes using the completed set of images. The advantage of our framework is that it encourages a synthesis model to explicitly learn to generate missing CT phases that are helpful for classifying cancer subtypes. We further incorporate a lesion segmentation network into our framework to exploit lesion-level features for effective cancer classification in the whole CT volumes. The proposed framework is based on fully 3D convolutional neural networks to jointly optimize both synthesis and classification of 3D CT volumes. Extensive experiments on both in-house and external datasets demonstrate the effectiveness of our framework for the diagnosis with incomplete data compared with state-of-the-art baselines. In particular, cancer subtype classification using the completed CT data by our method achieves higher performance than the classification using the given incomplete data.

Submitted 9 December, 2023; originally announced December 2023.

Comments: This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics

Journal ref: JBHI, 2022

arXiv:2312.05334 [pdf, other]

ProsDectNet: Bridging the Gap in Prostate Cancer Detection via Transrectal B-mode Ultrasound Imaging

Authors: Sulaiman Vesal, Indrani Bhattacharya, Hassan Jahanandish, Xinran Li, Zachary Kornberg, Steve Ran Zhou, Elijah Richard Sommer, Moon Hyung Choi, Richard E. Fan, Geoffrey A. Sonn, Mirabela Rusu

Abstract: Interpreting traditional B-mode ultrasound images can be challenging due to image artifacts (e.g., shadowing, speckle), leading to low sensitivity and limited diagnostic accuracy. While Magnetic Resonance Imaging (MRI) has been proposed as a solution, it is expensive and not widely available. Furthermore, most biopsies are guided by Transrectal Ultrasound (TRUS) alone and can miss up to 52% of cancers, highlighting the need for improved targeting. To address this issue, we propose ProsDectNet, a multi-task deep learning approach that localizes prostate cancer on B-mode ultrasound. Our model is pre-trained using radiologist-labeled data and fine-tuned using biopsy-confirmed labels. ProsDectNet includes a lesion detection and patch classification head, with uncertainty minimization using entropy to improve model performance and reduce false positive predictions. We trained and validated ProsDectNet using a cohort of 289 patients who underwent MRI-TRUS fusion targeted biopsy. We then tested our approach on a group of 41 patients and found that ProsDectNet outperformed the average expert clinician in detecting prostate cancer on B-mode ultrasound images, achieving a patient-level ROC-AUC of 82%, a sensitivity of 74%, and a specificity of 67%. Our results demonstrate that ProsDectNet has the potential to be used as a computer-aided diagnosis system to improve targeted biopsy and treatment planning.

Submitted 8 December, 2023; originally announced December 2023.

Comments: Accepted in NeurIPS 2023 (Medical Imaging meets NeurIPS Workshop)

arXiv:2311.18505 [pdf, other]

String Sound Synthesizer on GPU-accelerated Finite Difference Scheme

Authors: Jin Woo Lee, Min Jun Choi, Kyogu Lee

Abstract: This paper introduces a nonlinear string sound synthesizer, based on a finite difference simulation of the dynamic behavior of strings under various excitations. The presented synthesizer features a versatile string simulation engine capable of stochastic parameterization, encompassing fundamental frequency modulation, stiffness, tension, frequency-dependent loss, and excitation control. This open-source physical model simulator not only benefits the audio signal processing community but also contributes to the burgeoning field of neural network-based audio synthesis by serving as a novel dataset construction tool. Implemented in PyTorch, this synthesizer offers flexibility, facilitating both CPU and GPU utilization, thereby enhancing its applicability as a simulator. GPU utilization expedites computation by parallelizing operations across spatial and batch dimensions, further enhancing its utility as a data generator.

Submitted 8 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: To appear in ICASSP 2024

arXiv:2305.18739 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095881

An empirical study on speech restoration guided by self supervised speech representation

Authors: Jaeuk Byun, Youna Ji, Soo Whan Chung, Soyeon Choe, Min Seok Choi

Abstract: Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech representation learning on the speech restoration task. Specifically, we employ speech representation in various speech restoration networks and evaluate their performance under complicated distortion scenarios. Our experiments demonstrate that the contextual information provided by the self-supervised speech representation can enhance speech restoration performance in various distortion scenarios, while also increasing robustness against the duration of speech attenuation and mismatched test conditions.

Submitted 30 May, 2023; originally announced May 2023.

Comments: To be presented at ICASSP 2023

arXiv:2304.10839 [pdf, other]

doi 10.1109/TMI.2024.3405024

Cross-domain Denoising for Low-dose Multi-frame Spiral Computed Tomography

Authors: Yucheng Lu, Zhixin Xu, Moon Hyung Choi, Jimin Kim, Seung-Won Jung

Abstract: Computed tomography (CT) has been used worldwide as a non-invasive test to assist in diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effectiveness of learning-based methods, most were developed on simulated data. However, the real-world scenario differs significantly from the simulation domain, especially when using the multi-slice spiral scanner geometry. This paper proposes a two-stage method for the commercially available multi-slice spiral CT scanners that better exploits the complete reconstruction pipeline for LDCT denoising across different domains. Our approach makes good use of the high redundancy of multi-slice projections and the volumetric reconstructions while leveraging the over-smoothing problem in conventional cascaded frameworks caused by aggressive denoising. The dedicated design also provides a more explicit interpretation of the data flow. Extensive experiments on various datasets showed that the proposed method could remove up to 70% of noise without compromised spatial resolution, and subjective evaluations by two experienced radiologists further supported its superior performance against state-of-the-art methods in clinical practice.

Submitted 28 June, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

Journal ref: IEEE Transactions on Medical Imaging (2024)

arXiv:2301.07853 [pdf]

DECISIVE Benchmarking Data Report: sUAS Performance Results from Phase I

Authors: Adam Norton, Reza Ahmadzadeh, Kshitij Jerath, Paul Robinette, Jay Weitzen, Thanuka Wickramarathne, Holly Yanco, Minseop Choi, Ryan Donald, Brendan Donoghue, Christian Dumas, Peter Gavriel, Alden Giedraitis, Brendan Hertel, Jack Houle, Nathan Letteri, Edwin Meriaux, Zahra Rezaei Khavas, Rakshith Singh, Gregg Willcox, Naye Yoni

Abstract: This report reviews all results derived from performance benchmarking conducted during Phase I of the Development and Execution of Comprehensive and Integrated Subterranean Intelligent Vehicle Evaluations (DECISIVE) project by the University of Massachusetts Lowell, using the test methods specified in the DECISIVE Test Methods Handbook v1.1 for evaluating small unmanned aerial systems (sUAS) performance in subterranean and constrained indoor environments, spanning communications, field readiness, interface, obstacle avoidance, navigation, mapping, autonomy, trust, and situation awareness. Using those 20 test methods, over 230 tests were conducted across 8 sUAS platforms: Cleo Robotics Dronut X1P (P = prototype), FLIR Black Hornet PRS, Flyability Elios 2 GOV, Lumenier Nighthawk V3, Parrot ANAFI USA GOV, Skydio X2D, Teal Golden Eagle, and Vantage Robotics Vesper. Best-in-class criteria are specified for each applicable test method, and the sUAS that match these criteria are named for each test method, including a high-level executive summary of their performance.

Submitted 20 January, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

Comments: Approved for public release: PAO #PR2023_74172; arXiv admin note: substantial text overlap with arXiv:2211.01801

arXiv:2211.01801 [pdf]

DECISIVE Test Methods Handbook: Test Methods for Evaluating sUAS in Subterranean and Constrained Indoor Environments, Version 1.1

Authors: Adam Norton, Reza Ahmadzadeh, Kshitij Jerath, Paul Robinette, Jay Weitzen, Thanuka Wickramarathne, Holly Yanco, Minseop Choi, Ryan Donald, Brendan Donoghue, Christian Dumas, Peter Gavriel, Alden Giedraitis, Brendan Hertel, Jack Houle, Nathan Letteri, Edwin Meriaux, Zahra Rezaei Khavas, Rakshith Singh, Gregg Willcox, Naye Yoni

Abstract: This handbook outlines all test methods developed under the Development and Execution of Comprehensive and Integrated Subterranean Intelligent Vehicle Evaluations (DECISIVE) project by the University of Massachusetts Lowell for evaluating small unmanned aerial systems (sUAS) performance in subterranean and constrained indoor environments, spanning communications, field readiness, interface, obstacle avoidance, navigation, mapping, autonomy, trust, and situation awareness. For sUAS deployment in subterranean and constrained indoor environments, this puts forth two assumptions about applicable sUAS to be evaluated using these test methods: (1) able to operate without access to GPS signal, and (2) width from prop tip to prop tip does not exceed 91 cm (36 in) (i.e., can physically fit through a typical doorway, although successful navigation through is not guaranteed). All test methods are specified using a common format: Purpose, Summary of Test Method, Apparatus and Artifacts, Equipment, Metrics, Procedure, and Example Data. All test methods are designed to be run in real-world environments (e.g., MOUT sites) or using fabricated apparatuses (e.g., test bays built from wood, or contained inside of one or more shipping containers).

Submitted 20 January, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

Comments: Approved for public release: PAO #PR2022_47058

arXiv:2210.17327 [pdf, other]

Diffusion-based Generative Speech Source Separation

Authors: Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

Abstract: We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse time SDE that progressively separates the sources starting from their mixture. We propose a modified training strategy to handle model mismatch and source permutation ambiguity. Experiments on the WSJ0 2mix dataset demonstrate the potential of the method. Furthermore, the method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.

Submitted 2 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2023

arXiv:2203.09769 [pdf, other]

SWIPT-enabled NOMA in Distributed Antenna System with Imperfect Channel State Information for Max-Sum-Rate and Max-Min Fairness

Authors: Dongjae Kim, Minseok Choi, Dong-Wook Seo

Abstract: Motivated by the fact that the data rate of non-orthogonal multiple access (NOMA) can be greatly increased with the help of the distributed antenna system (DAS), we present a framework in which the DAS contributes not only to the data rate but also to the energy harvesting of simultaneous wireless information and power transfer (SWIPT) enabled NOMA. This study considers the sum-rate maximization problem and the max-min fairness problem for SWIPT-enabled NOMA in DAS and proposes two different schemes of power splitting and power allocation for SWIPT and NOMA, respectively, with imperfect channel state information (CSI). Numerical results validate the theoretical findings and demonstrate that the proposed framework of using SWIPT-enabled NOMA in DAS achieves higher data rates than the existing SWIPT-enabled NOMA while guaranteeing the minimum harvested energy.

Submitted 18 March, 2022; originally announced March 2022.

Comments: 5 pages, 5 figures

arXiv:2111.09425 [pdf, other]

Quality-Aware Deep Reinforcement Learning for Streaming in Infrastructure-Assisted Connected Vehicles

Authors: Won Joon Yun, Dohyun Kwon, Minseok Choi, Joongheon Kim, Giuseppe Caire, Andreas F. Molisch

Abstract: This paper proposes a deep reinforcement learning-based video streaming scheme for mobility-aware vehicular networks, e.g., vehicles on the highway. We consider infrastructure-assisted and mmWave-based scenarios in which the macro base station (MBS) cannot directly provide the streaming service to vehicles due to the short range of mmWave beams so that small mmWave base stations (mBSs) along the road deliver the desired videos to users. For a smoother streaming service, the MBS proactively pushes video chunks to mBSs. This is done to support vehicles that are currently covered and/or will be by each mBS. We formulate the dynamic video delivery scheme that adaptively determines 1) which content, 2) what quality and 3) how many chunks to be proactively delivered from the MBS to mBSs using a Markov decision process (MDP). Since it is difficult for the MBS to track all the channel conditions and the network states have extensive dimensions, we adopt the deep deterministic policy gradient (DDPG) algorithm for the DRL-based video delivery scheme. This paper finally shows that the DRL agent learns a streaming policy that pursues high average quality while limiting packet drops, avoiding playback stalls, reducing quality fluctuations and saving backhaul usage.

Submitted 12 October, 2021; originally announced November 2021.

Comments: 15 pages, 8 figures, Submitted to IEEE Transactions on Vehicular Technology

arXiv:2106.14203 [pdf, other]

Joint Mobile Charging and Coverage-Time Extension for Unmanned Aerial Vehicles

Authors: Soohyun Park, Won-Yong Shin, Minseok Choi, Joongheon Kim

Abstract: In modern networks, the use of drones as mobile base stations (MBSs) has been discussed for coverage flexibility. However, the realization of drone-based networks raises several issues. One of the critical issues is that drones are extremely power-hungry. To overcome this, we need to characterize a new type of drones, so-called charging drones, which can deliver energy to MBS drones. Motivated by the fact that the charging drones also need to be charged, we deploy ground-mounted charging towers for delivering energy to the charging drones. We introduce a new energy-efficiency maximization problem, which is partitioned into two independently separable tasks. More specifically, as our first optimization task, two-stage charging matching is proposed due to the inherent nature of our network model, where the first matching aims to schedule between charging towers and charging drones while the second matching solves the scheduling between charging drones and MBS drones. We analyze how to convert the formulation containing non-convex terms to another one only with convex terms. As our second optimization task, each MBS drone conducts energy-aware time-average transmit power allocation minimization subject to stability via Lyapunov optimization. Our solutions enable the MBS drones to extend their lifetimes; in turn, network coverage-time can be extended.

Submitted 27 June, 2021; originally announced June 2021.

arXiv:2101.09566 [pdf, other]

doi 10.1007/s00158-020-02803-0

Isogeometric Configuration Design Optimization of Three-dimensional Curved Beam Structures for Maximal Fundamental Frequency

Authors: Myung-Jin Choi, Jae-Hyun Kim, Bonyong Koo, Seonho Cho

Abstract: This paper presents a configuration design optimization method for three-dimensional curved beam built-up structures having maximized fundamental eigenfrequency. We develop the method of computation of design velocity field and optimal design of beam structures constrained on a curved surface, where both designs of the embedded beams and the curved surface are simultaneously varied during the optimal design process. A shear-deformable beam model is used in the response analyses of structural vibrations within an isogeometric framework using the NURBS basis functions. An analytical design sensitivity expression of repeated eigenvalues is derived. The developed method is demonstrated through several illustrative examples.

Submitted 23 January, 2021; originally announced January 2021.

Comments: This document is the personal version of an article whose final publication is available at https://doi.org/10.1007/s00158-020-02803-0

Journal ref: Structural and Multidisciplinary Optimization, 2021

arXiv:2008.10267 [pdf, other]

A Computational Analysis of Real-World DJ Mixes using Mix-To-Track Subsequence Alignment

Authors: Taejun Kim, Minsuk Choi, Evan Sacks, Yi-Hsuan Yang, Juhan Nam

Abstract: A DJ mix is a sequence of music tracks concatenated seamlessly, typically rendered for audiences in a live setting by a DJ on stage. As a DJ mix is produced in a studio or the live version is recorded for music streaming services, computational methods to analyze DJ mixes, for example, extracting track information or understanding DJ techniques, have drawn research interest. Many previous works are, however, limited to identifying individual tracks in a mix or segmenting it, and the sizes of the datasets are usually small. In this paper, we provide an in-depth analysis of DJ music by aligning a mix to its original music tracks. We set up the subsequence alignment such that the audio features are less sensitive to the tempo or key change of the original track in a mix. This approach provides temporally tight mix-to-track matching from which we can obtain cue-points, transition length, mix segmentation, and musical changes in DJ performance. Using 1,557 mixes from 1001Tracklists including 13,728 tracks and 20,765 transitions, we conduct the proposed analysis and show a wide range of statistics, which may elucidate the creative process of DJ music making.

Submitted 24 August, 2020; originally announced August 2020.

Comments: Accepted for publication at 21st International Society for Music Information Retrieval Conference (ISMIR 2020)

arXiv:1911.13010 [pdf, other]

Joint Distributed Link Scheduling and Power Allocation for Content Delivery in Wireless Caching Networks

Authors: Minseok Choi, Andreas F. Molisch, Joongheon Kim

Abstract: In wireless caching networks, the design of the content delivery method must consider random user requests, caching states, network topology, and interference management. In this paper, we establish a general framework for content delivery in wireless caching networks without stringent assumptions that restrict the network structure, delivery link, and interference model. Based on the framework, we propose a dynamic and distributed link scheduling and power allocation scheme for content delivery that is assisted by belief-propagation (BP) algorithms. Considering content-requesting users and potential caching nodes, the scheme achieves three critical purposes of wireless caching networks: 1) limiting the delay of user request satisfactions, 2) maintaining the power efficiency of caching nodes, and 3) managing interference among users. In addition, we address the intrinsic problem of the BP algorithm in our network model, proposing a matching algorithm for one-to-one link scheduling. Simulation results show that the proposed scheme provides almost the same delay performance as the optimal scheme found through an exhaustive search at the expense of a little additional power consumption and does not require a clustering method and orthogonal resources in a large-scale D2D network.

Submitted 29 November, 2019; originally announced November 2019.

Comments: 30 pages, 13 figures

arXiv:1907.09184 [pdf, other]

Spectral data analysis methods for the two-dimensional imaging diagnostics

Authors: Minjun J. Choi

Abstract: Some spectral data analysis methods that are useful for the two-dimensional imaging diagnostics data are introduced. It is shown that the frequency spectrum, the local dispersion relation, the flow shear, and the nonlinear energy transfer rates can be estimated using the proper analysis methods.

Submitted 27 August, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

arXiv:1807.00682 [pdf, other]

doi 10.1109/TWC.2019.2929809

Dynamic Power Allocation and User Scheduling for Power-Efficient and Low-Latency Communications

Authors: Minseok Choi, Joongheon Kim, Jaekyun Moon

Abstract: In this paper, we propose a joint dynamic power control and user pairing algorithm for power-efficient and low-latency hybrid multiple access systems. In a hybrid multiple access system, user pairing determines whether the transmitter should serve a certain user by orthogonal multiple access (OMA) or non-orthogonal multiple access (NOMA). The proposed optimization framework minimizes the long-term time-average transmit power expenditure while reducing the queueing delay and satisfying time-average data rate requirements. The proposed technique observes channel and queue state information and adjusts queue backlogs to avoid an excessive queueing delay by appropriate user pairing and power allocation. Further, user scheduling for determining the activation of a given user link as well as flexible use of resources are captured in the proposed algorithm. Data-intensive simulation results show that the proposed scheme guarantees an end-to-end delay smaller than 1 ms with high power-efficiency and high reliability, based on the short frame structure designed for ultra-reliable low-latency communications (URLLC).

Submitted 28 June, 2018; originally announced July 2018.

Comments: 30 pages, 10 figures, Submission to IEEE Journal on Selected Areas in Communication

Journal ref: IEEE Transactions on Wireless Communications, 26 July, 2019

Showing 1–18 of 18 results for author: Choi, M