-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium
Authors:
Hyewon Jeong,
Sarah Jabbour,
Yuzhe Yang,
Rahul Thapta,
Hussein Mozannar,
William Jongwon Han,
Nikita Mehandru,
Michael Wornow,
Vladislav Lialin,
Xin Liu,
Alejandro Lozano,
Jiacheng Zhu,
Rafal Dariusz Kocielnik,
Keith Harrigian,
Haoran Zhang,
Edward Lee,
Milos Vukadinovic,
Aparna Balagopalan,
Vincent Jeanselme,
Katherine Matton,
Ilker Demirel,
Jason Fries,
Parisa Rashidi,
Brett Beaulieu-Jones,
Xuhai Orson Xu
, et al. (18 additional authors not shown)
Abstract:
The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four virtual roundtables at ML4H 2022. The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables. Each roundtable session included invited senior chairs (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with interest in the session's topic. Herein we detail the organization process and compile takeaways from these roundtable discussions, including recent advances, applications, and open challenges for each topic. We conclude with a summary and lessons learned across all roundtables. This document serves as a comprehensive review paper, summarizing the recent advancements in machine learning for healthcare as contributed by foremost researchers in the field.
Submitted 5 April, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
AI Models Close to your Chest: Robust Federated Learning Strategies for Multi-site CT
Authors:
Edward H. Lee,
Brendan Kelly,
Emre Altinmakas,
Hakan Dogan,
Maryam Mohammadzadeh,
Errol Colak,
Steve Fu,
Olivia Choudhury,
Ujjwal Ratan,
Felipe Kitamura,
Hernan Chaves,
Jimmy Zheng,
Mourad Said,
Eduardo Reis,
Jaekwang Lim,
Patricia Yokoo,
Courtney Mitchell,
Golnaz Houshmand,
Marzyeh Ghassemi,
Ronan Killeen,
Wendy Qiu,
Joel Hayden,
Farnaz Rafiee,
Chad Klochko,
Nicholas Bevins
, et al. (5 additional authors not shown)
Abstract:
While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. This limitation stems from barriers to large-scale data sharing and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data sharing. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We also propose an FL strategy that leverages synthetically generated data to overcome class and size imbalances. We also describe the sources of data heterogeneity in the context of FL, and show how even among the correctly labeled populations, disparities can arise due to these biases.
Submitted 13 April, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Lightweight feature encoder for wake-up word detection based on self-supervised speech representation
Authors:
Hyungjun Lim,
Younggwan Kim,
Kiho Yeom,
Eunjoo Seo,
Hoodong Lee,
Stanley Jungkyu Choi,
Honglak Lee
Abstract:
Self-supervised learning methods that provide generalized speech representations have recently received increasing attention. Wav2vec 2.0 is the most famous example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use it directly for wake-up word detection on mobile devices due to its high computational cost. In this work, we propose LiteFEW, a lightweight feature encoder for wake-up word detection that preserves the inherent ability of wav2vec 2.0 at a minimal scale. In this method, the knowledge of the pre-trained wav2vec 2.0 is compressed by introducing an auto-encoder-based dimensionality reduction technique and distilled to LiteFEW. Experimental results on the open-source "Hey Snips" dataset show that the proposed method, applied to various model structures, significantly improves performance, achieving a relative improvement of over 20% with only 64k parameters.
Submitted 13 March, 2023;
originally announced March 2023.
-
NanoBatch Privacy: Enabling fast Differentially Private learning on the IPU
Authors:
Edward H. Lee,
Mario Michael Krell,
Alexander Tsyplikhin,
Victoria Rege,
Errol Colak,
Kristen W. Yeom
Abstract:
Differentially private SGD (DPSGD) has recently shown promise in deep learning. However, compared to non-private SGD, the DPSGD algorithm imposes computational overheads that can undo the benefit of batching on GPUs. Micro-batching is a common method to alleviate this and is fully supported in the TensorFlow Privacy library (TFDP). However, it degrades accuracy. We propose NanoBatch Privacy, a lightweight add-on to TFDP to be used on Graphcore IPUs by leveraging a batch size of 1 (without microbatching) and gradient accumulation. This allows us to achieve large total batch sizes with minimal impact on throughput. Second, we illustrate using CIFAR-10 how larger batch sizes are not necessarily optimal from a privacy versus utility perspective. On ImageNet, we achieve more than 15x speedup over TFDP versus 8x A100s and significant speedups even across libraries such as Opacus. We also provide two extensions: 1) DPSGD for pipelined models and 2) per-layer clipping that is 15x faster than the Opacus implementation on 8x A100s. Finally, as an application case study, we apply NanoBatch training to private COVID-19 chest CT prediction.
Submitted 2 June, 2022; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Deep Sequential Learning for Cervical Spine Fracture Detection on Computed Tomography Imaging
Authors:
Hojjat Salehinejad,
Edward Ho,
Hui-Ming Lin,
Priscila Crivellaro,
Oleksandra Samorodova,
Monica Tafur Arciniegas,
Zamir Merali,
Suradech Suthiphosuwan,
Aditya Bharatha,
Kristen Yeom,
Muhammad Mamdani,
Jefferson Wilson,
Errol Colak
Abstract:
Fractures of the cervical spine are a medical emergency and may lead to permanent paralysis and even death. Accurate diagnosis in patients with suspected fractures by computed tomography (CT) is critical to patient management. In this paper, we propose a deep convolutional neural network (DCNN) with a bidirectional long short-term memory (BLSTM) layer for the automated detection of cervical spine fractures in CT axial images. We used an annotated dataset of 3,666 CT scans (729 positive and 2,937 negative cases) to train and validate the model. The validation results show a classification accuracy of 70.92% and 79.18% on the balanced (104 positive and 104 negative cases) and imbalanced (104 positive and 419 negative cases) test datasets, respectively.
Submitted 5 February, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Statistical multiscale mapping of IDH1, MGMT, and microvascular proliferation in human brain tumors from multiparametric MR and spatially-registered core biopsy
Authors:
Jason G Parker, PhD,
Emily E Diller, MS,
Sha Cao, PhD,
Jeremy T Nelson, PhD,
Kristen Yeom, MD,
Chang Ho, MD,
Robert Lober, MD, PhD
Abstract:
We propose a statistical multiscale mapping approach to identify microscopic and molecular heterogeneity across a tumor microenvironment using multiparametric MR (mp-MR). Twenty-nine patients underwent pre-surgical mp-MR followed by MR-guided stereotactic core biopsy. The locations of the biopsy cores were identified in the pre-surgical images using stereotactic bitmaps acquired during surgery. Feature matrices mapped the multiparametric voxel values in the vicinity of the biopsy cores to the pathologic outcome variables for each patient and logistic regression tested the individual and collective predictive power of the MR contrasts. A non-parametric weighted k-nearest neighbor classifier evaluated the feature matrices in a leave-one-out cross validation design across patients. Resulting class membership probabilities were converted to chi-square statistics to develop full-brain parametric maps, implementing Gaussian random field theory to estimate inter-voxel dependencies. Corrections for family-wise error rates were performed using Benjamini-Hochberg and random field theory, and the resulting accuracies were compared. The combination of all five image contrasts correlated with outcome (P<.001) for all four microscopic variables. The probabilistic mapping method using Benjamini-Hochberg generated statistically significant results (P<.05) for three of the four dependent variables: 1) IDH1, 2) MGMT, and 3) microvascular proliferation, with an average classification accuracy of 0.984 +/- 0.02 and an average classification sensitivity of 1.567% +/- 0.967. The images corrected by random field theory demonstrated improved classification accuracy (0.989 +/- 0.008) and classification sensitivity (5.967% +/- 2.857) compared with Benjamini-Hochberg. Microscopic and molecular tumor properties can be assessed with statistical confidence across the brain from minimally-invasive, mp-MR.
Submitted 25 July, 2019;
originally announced July 2019.
-
Learning-based Single-step Quantitative Susceptibility Mapping Reconstruction Without Brain Extraction
Authors:
Hongjiang Wei,
Steven Cao,
Yuyao Zhang,
Xiaojun Guan,
Fuhua Yan,
Kristen W. Yeom,
Chunlei Liu
Abstract:
Quantitative susceptibility mapping (QSM) estimates the underlying tissue magnetic susceptibility from MRI gradient-echo phase signal and typically requires several processing steps. These steps involve phase unwrapping, brain volume extraction, background phase removal and solving an ill-posed inverse problem. The resulting susceptibility map is known to suffer from inaccuracy near the edges of the brain tissues, in part due to imperfect brain extraction, edge erosion of the brain tissue and the lack of phase measurement outside the brain. This inaccuracy has thus hindered the application of QSM for measuring the susceptibility of tissues near the brain edges, e.g., quantifying cortical layers and generating superficial venography. To address these challenges, we propose a learning-based QSM reconstruction method that directly estimates the magnetic susceptibility from total phase images without the need for brain extraction and background phase removal, referred to as autoQSM. The neural network has a modified U-net structure and is trained using QSM maps computed by a two-step QSM method. 209 healthy subjects with ages ranging from 11 to 82 years were employed for patch-wise network training. The network was validated on data dissimilar to the training data, e.g. in vivo mouse brain data and brains with lesions, which suggests that the network has generalized and learned the underlying mathematical relationship between magnetic field perturbation and magnetic susceptibility. AutoQSM was able to recover magnetic susceptibility of anatomical structures near the edges of the brain including the veins covering the cortical surface, spinal cord and nerve tracts near the mouse brain boundaries. The advantages of high-quality maps, no need for brain volume extraction and high reconstruction speed demonstrate its potential for future applications.
Submitted 15 May, 2019;
originally announced May 2019.
-
Deep Learning with Attention to Predict Gestational Age of the Fetal Brain
Authors:
Liyue Shen,
Katie Shpanskaya,
Edward Lee,
Emily McKenna,
Maryam Maleki,
Quin Lu,
Safwan Halabi,
John Pauly,
Kristen Yeom
Abstract:
Fetal brain imaging is a cornerstone of prenatal screening and early diagnosis of congenital anomalies. Knowledge of fetal gestational age is the key to the accurate assessment of brain development. This study develops an attention-based deep learning model to predict gestational age of the fetal brain. The proposed model is an end-to-end framework that combines key insights from multi-view MRI including axial, coronal, and sagittal views. The model also uses age-activated weakly-supervised attention maps to enable rotation-invariant localization of the fetal brain among background noise. We evaluate our methods on the collected fetal brain MRI cohort with a large age distribution from 125 to 273 days. Our extensive experiments show age prediction performance with R2 = 0.94 using multi-view MRI and attention.
Submitted 9 December, 2018;
originally announced December 2018.