Showing 1–29 of 29 results for author: Lee, H k

Searching in archive cs.
  1. arXiv:2406.02847  [pdf, other]

    cs.LG stat.ML

    Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

    Authors: Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi

    Abstract: In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias ter…

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2405.20672  [pdf, other]

    cs.CV

    Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations

    Authors: Davide Coppola, Hwee Kuan Lee

    Abstract: This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs) with the aim of enhancing the understanding of their underlying mechanisms. Despite numerous defense methods proposed in the literature, there is still an incomplete understanding of this phenomenon. Instead of treating the entire model as vulnerable, we propose that specific feature maps learned du…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 22 pages, 15 figures (including appendix)

  3. arXiv:2403.15456  [pdf, other]

    cs.AI cs.CL

    WoLF: Wide-scope Large Language Model Framework for CXR Understanding

    Authors: Seil Kang, Donghyun Kim, Junhyeok Kim, Hyo Kyung Lee, Seong Jae Hwang

    Abstract: Significant methodological strides have been made toward Chest X-ray (CXR) understanding via modern vision-language models (VLMs), demonstrating impressive Visual Question Answering (VQA) and CXR report generation abilities. However, existing CXR understanding frameworks still possess several procedural caveats. (1) Previous methods solely use CXR reports, which are insufficient for comprehensive…

    Submitted 29 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 11 pages main paper, 2 pages supplementary

  4. arXiv:2402.17549  [pdf, other]

    cs.DS

    FlipHash: A Constant-Time Consistent Range-Hashing Algorithm

    Authors: Charles Masson, Homin K. Lee

    Abstract: Consistent range-hashing is a technique used in distributed systems, either directly or as a subroutine for consistent hashing, commonly to realize an even and stable data distribution over a variable number of resources. We introduce FlipHash, a consistent range-hashing algorithm with constant time complexity and low memory requirements. Like Jump Consistent Hash, FlipHash is intended for applica…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 16 pages, 3 figures, 4 tables

    ACM Class: E.2; E.1

  5. arXiv:2402.08604  [pdf, other]

    cs.DS cs.DB

    Sampling Space-Saving Set Sketches

    Authors: Homin K. Lee, Charles Masson

    Abstract: Large, distributed data streams are now ubiquitous. High-accuracy sketches with low memory overhead have become the de facto method for analyzing this data. For instance, if we wish to group data by some label and report the largest counts using fixed memory, we need to turn to mergeable heavy hitter sketches that can provide highly accurate approximate counts. Similarly, if we wish to keep track…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 12 pages, 5 figures

    ACM Class: E.1

  6. arXiv:2401.10458  [pdf, other]

    cs.LG cs.CR

    Contrastive Unlearning: A Contrastive Approach to Machine Unlearning

    Authors: Hong kyu Lee, Qiuchen Zhang, Carl Yang, Jian Lou, Li Xiong

    Abstract: Machine unlearning aims to eliminate the influence of a subset of training samples (i.e., unlearning samples) from a trained model. Effectively and efficiently removing the unlearning samples without negatively impacting the overall model performance is still challenging. In this paper, we propose a contrastive unlearning framework, leveraging the concept of representation learning for more effect…

    Submitted 18 January, 2024; originally announced January 2024.

  7. arXiv:2305.05869  [pdf, other]

    cs.LG cs.CV

    Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation

    Authors: Jiyi Zhang, Han Fang, Hwee Kuan Lee, Ee-Chien Chang

    Abstract: Given a poorly documented neural network model, we take the perspective of a forensic investigator who wants to find out the model's data domain (e.g. whether on face images or traffic signs). Although existing methods such as membership inference and model inversion can be used to uncover some information about an unknown model, they still require knowledge of the data domain to start with. In th…

    Submitted 9 May, 2023; originally announced May 2023.

  8. DPAR: Decoupled Graph Neural Networks with Node-Level Differential Privacy

    Authors: Qiuchen Zhang, Hong kyu Lee, Jing Ma, Jian Lou, Carl Yang, Li Xiong

    Abstract: Graph Neural Networks (GNNs) have achieved great success in learning with graph-structured data. Privacy concerns have also been raised for the trained models which could expose the sensitive information of graphs including both node features and the structure information. In this paper, we aim to achieve node-level differential privacy (DP) for training GNNs so that a node and its edges are prote…

    Submitted 14 March, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted to The 2024 Web Conference

  9. arXiv:2206.07515  [pdf]

    eess.SP cs.AI cs.LG

    A Deep Learning Network for the Classification of Intracardiac Electrograms in Atrial Tachycardia

    Authors: Zerui Chen, Sonia Xhyn Teo, Andrie Ochtman, Shier Nee Saw, Nicholas Cheng, Eric Tien Siang Lim, Murphy Lyu, Hwee Kuan Lee

    Abstract: A key technology enabling the success of catheter ablation treatment for atrial tachycardia is activation mapping, which relies on manual local activation time (LAT) annotation of all acquired intracardiac electrogram (EGM) signals. This is a time-consuming and error-prone procedure, due to the difficulty in identifying the signal activation peaks for fractionated signals. This work presents a Dee…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: 34 pages, 10 figures

    ACM Class: J.3

  10. arXiv:2205.11366  [pdf]

    cond-mat.stat-mech cs.LG

    Statistical inference as Green's functions

    Authors: Hyun Keun Lee, Chulan Kwon, Yong Woon Kim

    Abstract: Statistical inference from data is a foundational task in science. Recently, it has received growing attention for its central role in inference systems of primary interest in data sciences and machine learning. However, the understanding of statistical inference is not that solid while remains as a matter of subjective belief or as the routine procedures once claimed objective. We here show that…

    Submitted 10 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: 21 pages, 1 figure

  11. arXiv:2201.09534  [pdf, other]

    cs.LG cs.AI

    PaRT: Parallel Learning Towards Robust and Transparent AI

    Authors: Mahsa Paknezhad, Hamsawardhini Rengarajan, Chenghao Yuan, Sujanya Suresh, Manas Gupta, Savitha Ramasamy, Hwee Kuan Lee

    Abstract: This paper takes a parallel learning approach for robust and transparent AI. A deep neural network is trained in parallel on multiple tasks, where each task is trained only on a subset of the network resources. Each subset consists of network segments, that can be combined and shared across specific tasks. Tasks can share resources with other tasks, while having independent task-related network re…

    Submitted 23 February, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

  12. An End-to-End Breast Tumour Classification Model Using Context-Based Patch Modelling- A BiLSTM Approach for Image Classification

    Authors: Suvidha Tripathi, Satish Kumar Singh, Hwee Kuan Lee

    Abstract: Researchers working on computational analysis of Whole Slide Images (WSIs) in histopathology have primarily resorted to patch-based modelling due to large resolution of each WSI. The large resolution makes WSIs infeasible to be fed directly into the machine learning models due to computational constraints. However, due to patch-based analysis, most of the current methods fail to exploit the underl…

    Submitted 5 June, 2021; originally announced June 2021.

    Comments: 36 pages, 5 figures, 9 tables. Published in Computerized Medical Imaging and Graphics

    Journal ref: Computerized Medical Imaging and Graphics, 87, 101838 (2021)

  13. arXiv:2103.00778  [pdf, other]

    cs.AI

    Explaining Adversarial Vulnerability with a Data Sparsity Hypothesis

    Authors: Mahsa Paknezhad, Cuong Phuc Ngo, Amadeus Aristo Winarto, Alistair Cheong, Chuen Yang Beh, Jiayang Wu, Hwee Kuan Lee

    Abstract: Despite many proposed algorithms to provide robustness to deep learning (DL) models, DL models remain susceptible to adversarial attacks. We hypothesize that the adversarial vulnerability of DL models stems from two factors. The first factor is data sparsity which is that in the high dimensional input data space, there exist large regions outside the support of the data distribution. The second fa…

    Submitted 17 February, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

    Journal ref: Neurocomputing, 2022

  14. arXiv:2101.12505  [pdf, other]

    eess.IV cs.CV

    Automated Deep Learning Analysis of Angiography Video Sequences for Coronary Artery Disease

    Authors: Chengyang Zhou, Thao Vy Dinh, Heyi Kong, Jonathan Yap, Khung Keong Yeo, Hwee Kuan Lee, Kaicheng Liang

    Abstract: The evaluation of obstructions (stenosis) in coronary arteries is currently done by a physician's visual assessment of coronary angiography video sequences. It is laborious, and can be susceptible to interobserver variation. Prior studies have attempted to automate this process, but few have demonstrated an integrated suite of algorithms for the end-to-end analysis of angiograms. We report an auto…

    Submitted 29 January, 2021; originally announced January 2021.

  15. arXiv:2006.01561  [pdf, other]

    cs.CV cs.LG

    Studying The Effect of MIL Pooling Filters on MIL Tasks

    Authors: Mustafa Umit Oner, Jared Marc Song Kye-Jet, Hwee Kuan Lee, Wing-Kin Sung

    Abstract: There are different multiple instance learning (MIL) pooling filters used in MIL models. In this paper, we study the effect of different MIL pooling filters on the performance of MIL models in real world MIL tasks. We designed a neural network based MIL framework with 5 different MIL pooling filters: `max', `mean', `attention', `distribution' and `distribution with attention'. We also formulated 5…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: 16 pages

  16. arXiv:2003.02732  [pdf, other]

    cs.CR cs.LG

    Confusing and Detecting ML Adversarial Attacks with Injected Attractors

    Authors: Jiyi Zhang, Ee-Chien Chang, Hwee Kuan Lee

    Abstract: Many machine learning adversarial attacks find adversarial samples of a victim model ${\mathcal M}$ by following the gradient of some attack objective functions, either explicitly or implicitly. To confuse and detect such attacks, we take the proactive approach that modifies those functions with the goal of misleading the attacks to some local minimals, or to some designated regions that can be ea…

    Submitted 8 March, 2021; v1 submitted 5 March, 2020; originally announced March 2020.

  17. arXiv:2002.12588  [pdf, other]

    eess.IV cs.CV cs.LG

    Regional Registration of Whole Slide Image Stacks Containing Highly Deformed Artefacts

    Authors: Mahsa Paknezhad, Sheng Yang Michael Loh, Yukti Choudhury, Valerie Koh Cui Koh, Timothy Tay Kwang Yong, Hui Shan Tan, Ravindran Kanesvaran, Puay Hoon Tan, John Yuen Shyi Peng, Weimiao Yu, Yongcheng Benjamin Tan, Yong Zhen Loy, Min-Han Tan, Hwee Kuan Lee

    Abstract: Motivation: High resolution 2D whole slide imaging provides rich information about the tissue structure. This information can be a lot richer if these 2D images can be stacked into a 3D tissue volume. A 3D analysis, however, requires accurate reconstruction of the tissue volume from the 2D image stack. This task is not trivial due to the distortions that each individual tissue slice experiences wh…

    Submitted 28 February, 2020; originally announced February 2020.

  18. arXiv:1910.04030  [pdf, other]

    eess.IV cs.CV

    Cribriform pattern detection in prostate histopathological images using deep learning models

    Authors: Malay Singh, Emarene Mationg Kalaw, Wang Jie, Mundher Al-Shabi, Chin Fong Wong, Danilo Medina Giron, Kian-Tai Chong, Maxine Tan, Zeng Zeng, Hwee Kuan Lee

    Abstract: Architecture, size, and shape of glands are most important patterns used by pathologists for assessment of cancer malignancy in prostate histopathological tissue slides. Varying structures of glands along with cumbersome manual observations may result in subjective and inconsistent assessment. Cribriform gland with irregular border is an important feature in Gleason pattern 4. We propose using dee…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: 21 pages, 4 figures, 6 tables

  19. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees

    Authors: Charles Masson, Jee E. Rim, Homin K. Lee

    Abstract: Summary statistics such as the mean and variance are easily maintained for large, distributed data streams, but order statistics (i.e., sample quantiles) can only be approximately summarized. There is extensive literature on maintaining quantile sketches where the emphasis has been on bounding the rank error of the sketch while using little memory. Unfortunately, rank error guarantees do not precl…

    Submitted 28 August, 2019; originally announced August 2019.

    Comments: 11 pages, 11 figures, VLDB

    Journal ref: PVLDB, 12(12): 2195-2205, 2019

  20. arXiv:1906.07647  [pdf, other]

    cs.CV cs.LG

    Weakly Supervised Clustering by Exploiting Unique Class Count

    Authors: Mustafa Umit Oner, Hwee Kuan Lee, Wing-Kin Sung

    Abstract: A weakly supervised learning based clustering framework is proposed in this paper. As the core of this framework, we introduce a novel multiple instance learning task based on a bag level label called unique class count ($ucc$), which is the number of unique classes among all instances inside the bag. In this task, no annotations on individual instances inside the bag are needed during training of…

    Submitted 25 January, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Published as a conference paper at ICLR 2020

  21. arXiv:1906.00258  [pdf, other]

    cs.LG cs.CV

    Enhancing Transformation-based Defenses using a Distribution Classifier

    Authors: Connie Kou, Hwee Kuan Lee, Ee-Chien Chang, Teck Khim Ng

    Abstract: Adversarial attacks on convolutional neural networks (CNN) have gained significant attention and there have been active research efforts on defense mechanisms. Stochastic input transformation methods have been proposed, where the idea is to recover the image from adversarial attack by random transformation, and to take the majority vote as consensus among the random samples. However, the transform…

    Submitted 30 January, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

  22. arXiv:1904.01209  [pdf, other]

    cs.LG stat.ML

    Fence GAN: Towards Better Anomaly Detection

    Authors: Cuong Phuc Ngo, Amadeus Aristo Winarto, Connie Kou Khor Li, Sojeong Park, Farhan Akram, Hwee Kuan Lee

    Abstract: Anomaly detection is a classical problem where the aim is to detect anomalous data that do not belong to the normal data distribution. Current state-of-the-art methods for anomaly detection on complex high-dimensional data are based on the generative adversarial network (GAN). However, the traditional GAN loss is not directly aligned with the anomaly detection objective: it encourages the distribu…

    Submitted 2 April, 2019; originally announced April 2019.

  23. arXiv:1901.00120  [pdf]

    cs.CV cs.LG stat.ML

    Gated-Dilated Networks for Lung Nodule Classification in CT scans

    Authors: Mundher Al-Shabi, Hwee Kuan Lee, Maxine Tan

    Abstract: Different types of Convolutional Neural Networks (CNNs) have been applied to detect cancerous lung nodules from computed tomography (CT) scans. However, the size of a nodule is very diverse and can range anywhere between 3 and 30 millimeters. The high variation of nodule sizes makes classifying them a difficult and challenging task. In this study, we propose a novel CNN architecture called Gated-D…

    Submitted 14 December, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

    Comments: Published in IEEE Access

  24. arXiv:1811.01506  [pdf, other]

    cs.LG stat.ML

    Theoretical and Experimental Analysis on the Generalizability of Distribution Regression Network

    Authors: Connie Kou, Hwee Kuan Lee, Jorge Sanz, Teck Khim Ng

    Abstract: There is emerging interest in performing regression between distributions. In contrast to prediction on single instances, these machine learning methods can be useful for population-based studies or on problems that are inherently statistical in nature. The recently proposed distribution regression network (DRN) has shown superior performance for the distribution-to-distribution regression task co…

    Submitted 31 May, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

  25. arXiv:1804.04775  [pdf, other]

    cs.LG stat.ML

    A Compact Network Learning Model for Distribution Regression

    Authors: Connie Kou, Hwee Kuan Lee, Teck Khim Ng

    Abstract: Despite the superior performance of deep learning in many applications, challenges remain in the area of regression on function spaces. In particular, neural networks are unable to encode function inputs compactly as each node encodes just a real value. We propose a novel idea to address this shortcoming: to encode an entire function in a single network node. To that end, we design a compact netwo…

    Submitted 10 July, 2018; v1 submitted 12 April, 2018; originally announced April 2018.

  26. arXiv:1802.04504  [pdf, other]

    cs.LG

    Flipped-Adversarial AutoEncoders

    Authors: Jiyi Zhang, Hung Dang, Hwee Kuan Lee, Ee-Chien Chang

    Abstract: We propose a flipped-Adversarial AutoEncoder (FAAE) that simultaneously trains a generative model G that maps an arbitrary latent code distribution to a data distribution and an encoder E that embodies an "inverse mapping" that encodes a data sample into a latent code vector. Unlike previous hybrid approaches that leverage adversarial training criterion in constructing autoencoders, FAAE minimizes…

    Submitted 3 April, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

  27. arXiv:1106.0518  [pdf, ps, other]

    cs.LG cs.CC cs.GT

    Submodular Functions Are Noise Stable

    Authors: Mahdi Cheraghchi, Adam Klivans, Pravesh Kothari, Homin K. Lee

    Abstract: We show that all non-negative submodular functions have high noise-stability. As a consequence, we obtain a polynomial-time learning algorithm for this class with respect to any product distribution on $\{-1,1\}^n$ (for any constant accuracy parameter $ε$). Our algorithm also succeeds in the agnostic setting. Previous work on learning submodular functions required either query access or stro…

    Submitted 13 June, 2011; v1 submitted 2 June, 2011; originally announced June 2011.

  28. arXiv:0805.1765  [pdf, ps, other]

    cs.CC

    Efficiently Testing Sparse GF(2) Polynomials

    Authors: Ilias Diakonikolas, Homin K. Lee, Kevin Matulef, Rocco A. Servedio, Andrew Wan

    Abstract: We give the first algorithm that is both query-efficient and time-efficient for testing whether an unknown function $f: \{0,1\}^n \to \{0,1\}$ is an $s$-sparse GF(2) polynomial versus $\epsilon$-far from every such polynomial. Our algorithm makes $\mathrm{poly}(s,1/\epsilon)$ black-box queries to $f$ and runs in time $n \cdot \mathrm{poly}(s,1/\epsilon)$. The only previous algorithm for this testing problem \cite{DLM+:07}…

    Submitted 12 May, 2008; originally announced May 2008.

    Comments: Full version of ICALP 2008 paper

  29. arXiv:0803.0924  [pdf, other]

    cs.LG cs.CC cs.CR cs.DB

    What Can We Learn Privately?

    Authors: Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, Adam Smith

    Abstract: Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy different…

    Submitted 18 February, 2010; v1 submitted 6 March, 2008; originally announced March 2008.

    Comments: 35 pages, 2 figures

    Journal ref: SIAM Journal of Computing 40(3) (2011) 793-826