Showing 1–50 of 79 results for author: Naeem, H

  1. arXiv:2407.03674  [pdf, other]

    cs.LG

    Short-Long Policy Evaluation with Novel Actions

    Authors: Hyunji Alex Nam, Yash Chandak, Emma Brunskill

    Abstract: From incorporating LLMs in education, to identifying new drugs and improving ways to charge batteries, innovators constantly try new strategies in search of better long-term outcomes for students, patients and consumers. One major bottleneck in this innovation cycle is the amount of time it takes to observe the downstream effects of a decision policy that incorporates new interventions. The key qu…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report

  3. arXiv:2406.13312  [pdf, other]

    eess.AS cs.SD

    Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution

    Authors: Hyeonuk Nam, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has been a milestone in the sound event detection (SED) field, but it involves a substantial increase in model size due to multiple basis kernels. In this work, we propose partial frequency dynamic convolution (PFD conv), which concatenates static convolution output and dynamic FDY conv output in order to minimize model size increase while maintaining the p…

    Submitted 7 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.08070  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye

    Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are…

    Submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  6. arXiv:2406.03494  [pdf, other]

    cs.LG math.NA stat.ML

    Solving Poisson Equations using Neural Walk-on-Spheres

    Authors: Hong Chul Nam, Julius Berner, Anima Anandkumar

    Abstract: We propose Neural Walk-on-Spheres (NWoS), a novel neural PDE solver for the efficient solution of high-dimensional Poisson equations. Leveraging stochastic representations and Walk-on-Spheres methods, we develop novel losses for neural networks based on the recursive solution of Poisson equations on spheres inside the domain. The resulting method is highly parallelizable and does not require spati…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  7. arXiv:2405.11094  [pdf, other]

    cs.RO

    YORI: Autonomous Cooking System Utilizing a Modular Robotic Kitchen and a Dual-Arm Proprioceptive Manipulator

    Authors: Donghun Noh, Hyunwoo Nam, Kyle Gillespie, Yeting Liu, Dennis Hong

    Abstract: This article introduces the development and implementation of the Yummy Operations Robot Initiative (YORI), an innovative, autonomous robotic cooking system. YORI marks a major advancement in culinary automation, adept at handling a diverse range of cooking tasks, capable of preparing multiple dishes simultaneously, and offering the flexibility to adapt to an extensive array of culinary activities…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: This manuscript is 13 pages long, includes 10 figures, and cites 20 references. It is to be submitted

  8. arXiv:2405.02499  [pdf, other]

    cs.CR cs.AR

    DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses t…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

  9. arXiv:2404.04819  [pdf, other]

    cs.CV

    Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

    Authors: Hyeongjin Nam, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee

    Abstract: Human-object contact serves as a strong cue to understand how humans physically interact with objects. Nevertheless, it is not widely explored to utilize human-object contact information for the joint reconstruction of 3D human and object from a single image. In this work, we present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Published at CVPR 2024, 19 pages including the supplementary material

  10. arXiv:2403.16652  [pdf, other]

    cs.RO eess.SY

    Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL

    Authors: Osama Ahmad, Zawar Hussain, Hammad Naeem

    Abstract: This study is about the implementation of a reinforcement learning algorithm in the trajectory planning of manipulators. We have a 7-DOF robotic arm to pick and place the randomly placed block at a random target point in an unknown environment. The obstacle is randomly moving which creates a hurdle in picking the object. The objective of the robot is to avoid the obstacle and pick the block with c…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted in ICIESTR-2024

  11. arXiv:2403.08187  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

    Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

    Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 12 pages, 2 figures

    ACM Class: I.2.7

  12. arXiv:2402.10595  [pdf, other]

    cs.CV

    Compact and De-biased Negative Instance Embedding for Multi-Instance Learning on Whole-Slide Image Classification

    Authors: Joohyung Lee, Heejeong Nam, Kwanhyung Lee, Sangchul Hahn

    Abstract: Whole-slide image (WSI) classification is a challenging task because 1) patches from WSI lack annotation, and 2) WSI possesses unnecessary variability, e.g., stain protocol. Recently, Multiple-Instance Learning (MIL) has made significant progress, allowing for classification based on slide-level, rather than patch-level, annotations. However, existing MIL methods ignore that all patches from norma…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024

  13. arXiv:2401.04143  [pdf, other]

    cs.CV

    RHOBIN Challenge: Reconstruction of Human Object Interaction

    Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeongjin Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll

    Abstract: Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate resear…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 14 pages, 5 tables, 7 figures. Technical report of the CVPR'23 workshop: RHOBIN challenge (https://rhobin-challenge.github.io/)

  14. arXiv:2312.15924  [pdf, other]

    cs.IT eess.SP

    Modeling and Analysis of GEO Satellite Networks

    Authors: Dong-Hyun Jung, Hongjae Nam, Junil Choi, David J. Love

    Abstract: The extensive coverage offered by satellites makes them effective in enhancing service continuity for users on dynamic airborne and maritime platforms, such as airplanes and ships. In particular, geosynchronous Earth orbit (GEO) satellites ensure stable connectivity for terrestrial users due to their stationary characteristics when observed from Earth. This paper introduces a novel approach to mod…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 12 pages, 9 figures, submitted to IEEE Transactions on Wireless Communications

  15. arXiv:2311.18608  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

    Authors: Hyelin Nam, Gihyun Kwon, Geon Yeong Park, Jong Chul Ye

    Abstract: With the remarkable advent of text-to-image diffusion models, image editing methods have become more diverse and continue to evolve. A promising recent approach in this realm is Delta Denoising Score (DDS) - an image editing technique based on Score Distillation Sampling (SDS) framework that leverages the rich generative prior of text-to-image diffusion models. However, relying solely on the diffe…

    Submitted 1 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: CVPR 2024 (poster); Project page: https://hyelinnam.github.io/CDS/

  16. arXiv:2311.13384  [pdf, other]

    cs.CV

    LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

    Authors: Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee

    Abstract: With the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to specific domain, primarily due to their training strategies using 3D scan dataset that is far from the real-world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by…

    Submitted 23 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Project page: https://luciddreamer-cvlab.github.io/

  17. arXiv:2311.06567  [pdf, other]

    cs.LG cs.AI cs.CV

    SCADI: Self-supervised Causal Disentanglement in Latent Variable Models

    Authors: Heejeong Nam

    Abstract: Causal disentanglement has great potential for capturing complex situations. However, there is a lack of practical and efficient approaches. It is already known that most unsupervised disentangling methods are unable to produce identifiable results without additional information, often leading to randomly disentangled output. Therefore, most existing models for disentangling are weakly supervised,…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 12 pages, 12 figures

  18. arXiv:2311.02010  [pdf, other]

    cs.CY

    A cast of thousands: How the IDEAS Productivity project has advanced software productivity and sustainability

    Authors: Lois Curfman McInnes, Michael Heroux, David E. Bernholdt, Anshu Dubey, Elsa Gonsiorowski, Rinku Gupta, Osni Marques, J. David Moulton, Hai Ah Nam, Boyana Norris, Elaine M. Raybourn, Jim Willenbring, Ann Almgren, Ross Bartlett, Kita Cranfill, Stephen Fickas, Don Frederick, William Godoy, Patricia Grubel, Rebecca Hartman-Baker, Axel Huebl, Rose Lynch, Addi Malviya Thakur, Reed Milewicz, Mark C. Miller, et al. (9 additional authors not shown)

    Abstract: Computational and data-enabled science and engineering are revolutionizing advances throughout science and society, at all scales of computing. For example, teams in the U.S. DOE Exascale Computing Project have been tackling new frontiers in modeling, simulation, and analysis by exploiting unprecedented exascale computing capabilities-building an advanced software ecosystem that supports next-gene…

    Submitted 16 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 12 pages, 1 figure

  19. Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

    Authors: Konstantinos Kanellopoulos, Hong Chul Nam, F. Nisa Bostanci, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Davide-Basilio Bartolini, Onur Mutlu

    Abstract: Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) large software-managed TLBs. Unfortunately, both solutions have significant drawbacks: increased access latency, power and area (for hardware TLBs), an…

    Submitted 5 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

    ACM Class: C.0

  20. arXiv:2309.11127  [pdf, other]

    eess.SP cs.AI cs.CL

    Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation

    Authors: Hyelin Nam, Jihong Park, Jinho Choi, Mehdi Bennis, Seong-Lyun Kim

    Abstract: By integrating recent advances in large language models (LLMs) and generative models into the emerging semantic communication (SC) paradigm, in this article we put forward to a novel framework of language-oriented semantic communication (LSC). In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC e…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures, submitted to 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

  21. arXiv:2309.04287  [pdf, other]

    eess.SP cs.AI

    Sequential Semantic Generative Communication for Progressive Text-to-Image Generation

    Authors: Hyelin Nam, Jihong Park, Jinho Choi, Seong-Lyun Kim

    Abstract: This paper proposes new framework of communication system leveraging promising generation capabilities of multi-modal generative models. Regarding nowadays smart applications, successful communication can be made by conveying the perceptual meaning, which we set as text prompt. Text serves as a suitable semantic representation of image data as it has evolved to instruct an image or generate image…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 4 pages, 2 figures, to be published in IEEE International Conference on Sensing, Communication, and Networking, Workshop on Semantic Communication for 6G (SC6G-SECON23)

  22. arXiv:2309.00349  [pdf]

    physics.chem-ph cs.LG

    Bespoke Nanoparticle Synthesis and Chemical Knowledge Discovery Via Autonomous Experimentations

    Authors: Hyuk Jun Yoo, Nayeon Kim, Heeseung Lee, Daeho Kim, Leslie Tiong Ching Ow, Hyobin Nam, Chansoo Kim, Seung Yong Lee, Kwan-Young Lee, Donghun Kim, Sang Soo Han

    Abstract: The optimization of nanomaterial synthesis using numerous synthetic variables is considered to be extremely laborious task because the conventional combinatorial explorations are prohibitively expensive. In this work, we report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties. This platform operates in a closed-loop man…

    Submitted 1 September, 2023; originally announced September 2023.

  23. Enhancing State Estimator for Autonomous Racing : Leveraging Multi-modal System and Managing Computing Resources

    Authors: Daegyu Lee, Hyunwoo Nam, Chanhoe Ryu, Sungwon Nah, Seongwoo Moon, D. Hyunchul Shim

    Abstract: This paper introduces an approach that enhances the state estimator for high-speed autonomous race cars, addressing challenges from unreliable measurements, localization failures, and computing resource management. The proposed robust localization system utilizes a Bayesian-based probabilistic approach to evaluate multimodal measurements, ensuring the use of credible data for accurate and reliable…

    Submitted 12 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2207.12232

    Journal ref: IEEE Transactions on Intelligent Vehicles(2024)

  24. arXiv:2308.06554  [pdf, other]

    cs.CV

    Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction

    Authors: Hyeongjin Nam, Daniel Sungho Jung, Yeonguk Oh, Kyoung Mu Lee

    Abstract: Despite recent advances in 3D human mesh reconstruction, domain gap between training and test data is still a major challenge. Several prior works tackle the domain gap problem via test-time adaptation that fine-tunes a network relying on 2D evidence (e.g., 2D human keypoints) from test images. However, the high reliance on 2D evidence during adaptation causes two major issues. First, 2D evidence…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Published at ICCV 2023, 16 pages including the supplementary material

  25. arXiv:2306.11277  [pdf, other]

    cs.SD eess.AS

    Frequency & Channel Attention for Computationally Efficient Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Yong-Hwa Park

    Abstract: We explore on various attention methods on frequency and channel dimensions for sound event detection (SED) in order to enhance performance with minimal increase in computational cost while leveraging domain knowledge to address the frequency dimension of audio data. We have introduced frequency dynamic convolution (FDY conv) in a previous work to release the translational equivariance issue assoc…

    Submitted 28 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to DCASE 2023 workshop

  26. arXiv:2306.05004  [pdf, other]

    eess.AS cs.AI cs.SD

    VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

    Authors: Junhyeok Lee, Hyeonuk Nam, Yong-Hwa Park

    Abstract: The goal of DCASE 2023 Challenge Task 7 is to generate various sound clips for Foley sound synthesis (FSS) by "category-to-sound" approach. "Category" is expressed by a single index while corresponding "sound" covers diverse and different sound examples. To generate diverse sounds for a given category, we adopt VITS, a text-to-speech (TTS) model with variational inference. In addition, we apply va…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: DCASE 2023 Challenge Task 7

  27. arXiv:2306.04014  [pdf, other]

    cs.DC

    Evaluating the Potential of Disaggregated Memory Systems for HPC applications

    Authors: Nan Ding, Pieter Maris, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Samuel Williams

    Abstract: Disaggregated memory is a promising approach that addresses the limitations of traditional memory architectures by enabling memory to be decoupled from compute nodes and shared across a data center. Cloud platforms have deployed such systems to improve overall system memory utilization, but performance can vary across workloads. High-performance computing (HPC) is crucial in scientific and enginee…

    Submitted 16 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: The submission builds on the following conference paper: N. Ding, S. Williams, H.A. Nam, et al. Methodology for Evaluating the Potential of Disaggregated Memory Systems,2nd International Workshop on RESource DISaggregation in High-Performance Computing (RESDIS), November 18, 2022. It is now submitted to the CCPE journal for review

  28. X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for accurate information about the internal structure and characteristics of dynamic random-access memory (DRAM) has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official d…

    Submitted 12 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 4 pages, 7 figures, accepted at IEEE Computer Architecture Letters

  29. arXiv:2303.14998  [pdf, other]

    cs.CV cs.AI

    Multi-view Cross-Modality MR Image Translation for Vestibular Schwannoma and Cochlea Segmentation

    Authors: Bogyeong Kang, Hyeonyeong Nam, Ji-Wung Han, Keun-Soo Heo, Tae-Eui Kam

    Abstract: In this work, we propose a multi-view image translation framework, which can translate contrast-enhanced T1 (ceT1) MR imaging to high-resolution T2 (hrT2) MR imaging for unsupervised vestibular schwannoma and cochlea segmentation. We adopt two image translation models in parallel that use a pixel-level consistent constraint and a patch-level contrastive constraint, respectively. Thereby, we can au…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: 9 pages, 4 figures

  30. arXiv:2303.05370  [pdf, other]

    cs.CV

    Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation

    Authors: Hongsuk Choi, Hyeongjin Nam, Taeryung Lee, Gyeongsik Moon, Kyoung Mu Lee

    Abstract: Recently, a few self-supervised representation learning (SSL) methods have outperformed the ImageNet classification pre-training for vision tasks such as object detection. However, its effects on 3D human body pose and shape estimation (3DHPSE) are open to question, whose target is fixed to a unique class, the human, and has an inherent task gap with SSL. We empirically study and analyze the effec…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: Accepted to ICLR 2023, 18 pages including the appendix

  31. LaplacianFusion: Detailed 3D Clothed-Human Body Reconstruction

    Authors: Hyomin Kim, Hyeonseo Nam, Jungeon Kim, Jaesik Park, Seungyong Lee

    Abstract: We propose LaplacianFusion, a novel approach that reconstructs detailed and controllable 3D clothed-human body shapes from an input depth or 3D point cloud sequence. The key idea of our approach is to use Laplacian coordinates, well-known differential coordinates that have been used for mesh editing, for representing the local structures contained in the input scans, instead of implicit 3D functio…

    Submitted 27 February, 2023; originally announced February 2023.

    Journal ref: ACM Transactions on Graphics (TOG) 41.6 (2022): 1-14

  32. arXiv:2302.09422  [pdf, other]

    cs.LG

    Neural Attention Memory

    Authors: Hyoungwook Nam, Seung Byum Seo

    Abstract: We propose a novel perspective of the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM). NAM is a memory structure that is both readable and writable via differentiable linear algebra operations. We explore three use cases of NAM: memory-augmented neural network (MANN), few-shot learning, and efficient long-range attention. Fir…

    Submitted 14 October, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Preprint. Under review

  33. arXiv:2302.01474  [pdf, other]

    cs.CR cs.AR cs.LG

    Defensive ML: Defending Architectural Side-channels with Adversarial Obfuscation

    Authors: Hyoungwook Nam, Raghavendra Pradyumna Pothukuchi, Bo Li, Nam Sung Kim, Josep Torrellas

    Abstract: Side-channel attacks that use machine learning (ML) for signal analysis have become prominent threats to computer security, as ML models easily find patterns in signals. To address this problem, this paper explores using Adversarial Machine Learning (AML) methods as a defense at the computer architecture layer to obfuscate side channels. We call this approach Defensive ML, and the generator to obf…

    Submitted 14 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Preprint. Under review

  34. arXiv:2212.12844  [pdf, other]

    eess.IV cs.CV

    Weakly-Supervised Deep Learning Model for Prostate Cancer Diagnosis and Gleason Grading of Histopathology Images

    Authors: Mohammad Mahdi Behzadi, Mohammad Madani, Hanzhang Wang, Jun Bai, Ankit Bhardwaj, Anna Tarakanova, Harold Yamase, Ga Hie Nam, Sheida Nabavi

    Abstract: Prostate cancer is the most common cancer in men worldwide and the second leading cause of cancer death in the United States. One of the prognostic features in prostate cancer is the Gleason grading of histopathology images. The Gleason grade is assigned based on tumor architecture on Hematoxylin and Eosin (H&E) stained whole slide images (WSI) by the pathologists. This process is time-consuming a…

    Submitted 24 December, 2022; originally announced December 2022.

  35. arXiv:2210.15986  [pdf, other]

    cs.DC cs.CV cs.LG

    Differentially Private CutMix for Split Learning with Vision Transformer

    Authors: Seungeun Oh, Jihong Park, Sihun Baek, Hyelin Nam, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, Seong-Lyun Kim

    Abstract: Recently, vision transformer (ViT) has started to outpace the conventional CNN in computer vision tasks. Considering privacy-preserving distributed learning with ViT, federated learning (FL) communicates models, which becomes ill-suited due to ViT's large model size and computing costs. Split learning (SL) detours this by communicating smashed data at a cut-layer, yet suffers from data privacy le…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: to be presented at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), First Workshop on Interpolation Regularizers and Beyond (INTERPOLATE), New Orleans, United States

  36. arXiv:2207.10053  [pdf, other]

    cs.CV

    3D Clothed Human Reconstruction in the Wild

    Authors: Gyeongsik Moon, Hyeongjin Nam, Takaaki Shiratori, Kyoung Mu Lee

    Abstract: Although much progress has been made in 3D clothed human reconstruction, most of the existing methods fail to produce robust results from in-the-wild images, which contain diverse human poses and appearances. This is mainly due to the large domain gap between training datasets and in-the-wild datasets. The training datasets are usually synthetic ones, which contain rendered images from GT 3D scans…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022, 25 pages including the supplementary material

  37. arXiv:2206.12059  [pdf]

    eess.AS cs.SD

    Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

    Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

    Abstract: Performance of sound event localization and detection (SELD) in real scenes is limited by small size of SELD dataset, due to difficulty in obtaining sufficient amount of realistic multi-channel audio data recordings with accurate label. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions:…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task3

  38. arXiv:2206.08585  [pdf, other]

    cs.CV

    HairFIT: Pose-Invariant Hairstyle Transfer via Flow-based Hair Alignment and Semantic-Region-Aware Inpainting

    Authors: Chaeyeon Chung, Taewoo Kim, Hyelin Nam, Seunghwan Choi, Gyojung Gu, Sunghyun Park, Jaegul Choo

    Abstract: Hairstyle transfer is the task of modifying a source hairstyle to a target one. Although recent hairstyle transfer models can reflect the delicate features of hairstyles, they still have two major limitations. First, the existing methods fail to transfer hairstyles when a source and a target image have different poses (e.g., viewing direction or face size), which is prevalent in the real world. Al…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: BMVC 2021 Oral Presentation

  39. arXiv:2205.01679  [pdf, other]

    eess.IV cs.CV

    Physics to the Rescue: Deep Non-line-of-sight Reconstruction for High-speed Imaging

    Authors: Fangzhou Mu, Sicheng Mo, Jiayong Peng, Xiaochun Liu, Ji Hyun Nam, Siddeshwar Raghavan, Andreas Velten, Yin Li

    Abstract: Computational approach to imaging around the corner, or non-line-of-sight (NLOS) imaging, is becoming a reality thanks to major advances in imaging hardware and reconstruction algorithms. A recent development towards practical NLOS imaging, Nam et al. demonstrated a high-speed non-confocal imaging system that operates at 5Hz, 100x faster than the prior art. This enormous gain in acquisition rate,…

    Submitted 5 August, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: ICCP 2022 (TPAMI Special Issue on Computational Photography). Project page: https://pages.cs.wisc.edu/~fmu/nlos3d/

  40. arXiv:2205.00084  [pdf]

    q-bio.QM cs.AI cs.LG

    Infusing Linguistic Knowledge of SMILES into Chemical Language Models

    Authors: Ingoo Lee, Hojung Nam

    Abstract: The simplified molecular-input line-entry system (SMILES) is the most popular representation of chemical compounds. Therefore, many SMILES-based molecular property prediction models have been developed. In particular, transformer-based models show promising performance because the model utilizes a massive chemical dataset for self-supervised learning. However, there is no transformer-based model t…

    Submitted 19 April, 2022; originally announced May 2022.

    Comments: 8 pages, 4 figures

  41. arXiv:2203.15277  [pdf, other]

    eess.AS cs.SD

    Decomposed Temporal Dynamic CNN: Efficient Time-Adaptive Network for Text-Independent Speaker Verification Explained with Speaker Activation Map

    Authors: Seong-Hu Kim, Hyeonuk Nam, Yong-Hwa Park

    Abstract: To extract accurate speaker information for text-independent speaker verification, temporal dynamic CNNs (TDY-CNNs) adapting kernels to each time bin was proposed. However, model size of TDY-CNN is too large and the adaptive kernel's degree of freedom is limited. To address these limitations, we propose decomposed temporal dynamic CNNs (DTDY-CNNs) which forms time-adaptive kernel by combining stat…

    Submitted 27 October, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to ICASSP 2023

  42. arXiv:2201.02677  [pdf, other]

    cs.CR cs.SE

    Predicting sensitive information leakage in IoT applications using flows-aware machine learning approach

    Authors: Hajra Naeem, Manar H. Alalfi

    Abstract: This paper presents an approach for identification of vulnerable IoT applications. The approach focuses on a category of vulnerabilities that leads to sensitive information leakage which can be identified by using taint flow analysis. Tainted flows vulnerability is very much impacted by the structure of the program and the order of the statements in the code, designing an approach to detect such v…

    Submitted 7 January, 2022; originally announced January 2022.

  43. arXiv:2109.07783   

    eess.IV cs.CV

    Towards Non-Line-of-Sight Photography

    Authors: Jiayong Peng, Fangzhou Mu, Ji Hyun Nam, Siddeshwar Raghavan, Yin Li, Andreas Velten, Zhiwei Xiong

    Abstract: Non-line-of-sight (NLOS) imaging is based on capturing the multi-bounce indirect reflections from the hidden objects. Active NLOS imaging systems rely on the capture of the time of flight of light through the scene, and have shown great promise for the accurate and robust reconstruction of hidden scenes without the need for specialized scene setups and prior assumptions. Despite that existing meth…

    Submitted 17 April, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: The proposed method and dataset are required further validations

  44. Spatiotemporal Texture Reconstruction for Dynamic Objects Using a Single RGB-D Camera

    Authors: Hyomin Kim, Jungeon Kim, Hyeonseo Nam, Jaesik Park, Seungyong Lee

    Abstract: This paper presents an effective method for generating a spatiotemporal (time-varying) texture map for a dynamic object using a single RGB-D camera. The input of our framework is a 3D template model and an RGB-D image sequence. Since there are invisible areas of the object at a frame in a single-camera setup, textures of such areas need to be borrowed from other frames. We formulate the problem as…

    Submitted 20 August, 2021; originally announced August 2021.

    Journal ref: Computer Graphics Forum. Vol. 40. No. 2. pp. 523-535. 2021

  45. Deep learning based cough detection camera using enhanced features

    Authors: Gyeong-Tae Lee, Hyeonuk Nam, Seong-Hu Kim, Sang-Min Choi, Youngkey Kim, Yong-Hwa Park

    Abstract: Coughing is a typical symptom of COVID-19. To detect and localize coughing sounds remotely, a convolutional neural network (CNN) based deep learning model was developed in this work and integrated with a sound camera for the visualization of the cough sounds. The cough detection model is a binary classifier of which the input is a two second acoustic feature and the output is one of two inferences…

    Submitted 24 May, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 28 pages, 20 figures, and 14 tables

    Journal ref: Expert Systems With Applications, Vol. 206, No. 15, pp. 1-20, 2022

  46. arXiv:2107.03649  [pdf]

    eess.AS cs.SD

    Heavily Augmented Sound Event Detection utilizing Weak Predictions

    Authors: Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park

    Abstract: The performances of Sound Event Detection (SED) systems are greatly limited by the difficulty in generating large strongly labeled dataset. In this work, we used two main approaches to overcome the lack of strongly labeled data. First, we applied heavy data augmentation on input features. Data augmentation methods used include not only conventional methods used in speech/audio domains but also our…

    Submitted 14 September, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: Won 3rd place on IEEE DCASE 2021 Task 4

  47. arXiv:2103.12622  [pdf, other]

    eess.IV cs.CV

    Virtual Light Transport Matrices for Non-Line-Of-Sight Imaging

    Authors: Julio Marco, Adrian Jarabo, Ji Hyun Nam, Xiaochun Liu, Miguel Ángel Cosculluela, Andreas Velten, Diego Gutierrez

    Abstract: The light transport matrix (LTM) is an instrumental tool in line-of-sight (LOS) imaging, describing how light interacts with the scene and enabling applications such as relighting or separation of illumination components. We introduce a framework to estimate the LTM of non-line-of-sight (NLOS) scenarios, coupling recent virtual forward light propagation models for NLOS imaging with the LOS light t…

    Submitted 5 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: ICCV 2021 (Oral)

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2440-2449

  48. arXiv:2101.04921  [pdf, other]

    cs.LG cs.AI cs.CL

    Neural Sequence-to-grid Module for Learning Symbolic Rules

    Authors: Segwang Kim, Hyoungwook Nam, Joonyoung Kim, Kyomin Jung

    Abstract: Logical reasoning tasks over symbols, such as learning arithmetic operations and computer program evaluations, have become challenges to deep learning. In particular, even state-of-the-art neural networks fail to achieve out-of-distribution (OOD) generalization of symbolic reasoning tasks, whereas humans can easily extend learned symbolic rules. To resolve this difficulty, we propose a ne…

    Submitted 26 April, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: 9 pages, 9 figures, AAAI 2021

  49. Real-time Non-line-of-Sight imaging of dynamic scenes

    Authors: Ji Hyun Nam, Eric Brandt, Sebastian Bauer, Xiaochun Liu, Eftychios Sifakis, Andreas Velten

    Abstract: Non-Line-of-Sight (NLOS) imaging aims at recovering the 3D geometry of objects that are hidden from the direct line of sight. In the past, this method has suffered from the weak available multibounce signal limiting scene size, capture speed, and reconstruction quality. While algorithms capable of reconstructing scenes at several frames per second have been demonstrated, real-time NLOS video has o…

    Submitted 23 October, 2020; originally announced October 2020.

    Journal ref: Nature Communications 12, 6526 (2021)

  50. arXiv:2010.11917  [pdf, other]

    cs.RO cs.AI cs.LG

    Batch Exploration with Examples for Scalable Robotic Reinforcement Learning

    Authors: Annie S. Chen, HyunJi Nam, Suraj Nair, Chelsea Finn

    Abstract: Learning from diverse offline datasets is a promising path towards learning general purpose robotic agents. However, a core challenge in this paradigm lies in collecting large amounts of meaningful data, while not depending on a human in the loop for data collection. One way to address this challenge is through task-agnostic exploration, where an agent attempts to explore without a task-specific r…

    Submitted 23 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 11 Pages, 11 Figures

    Journal ref: IEEE Robotics and Automation Letters (Volume: 6, Issue: 3, July 2021)