-
Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera
Authors:
Inpyo Song,
Minjun Joo,
Joonhyung Kwon,
Jangwon Lee
Abstract:
This paper addresses the daily challenges encountered by visually impaired individuals, such as limited access to information, navigation difficulties, and barriers to social interaction. To alleviate these challenges, we introduce a novel visual question answering dataset. Our dataset offers two significant advancements over previous datasets: Firstly, it features videos captured using a 360-degree egocentric wearable camera, enabling observation of the entire surroundings, departing from the static image-centric nature of prior datasets. Secondly, unlike datasets centered on singular challenges, ours addresses multiple real-life obstacles simultaneously through an innovative visual-question answering framework. We validate our dataset using various state-of-the-art VideoQA methods and diverse metrics. Results indicate that while progress has been made, satisfactory performance levels for AI-powered assistive services remain elusive for visually impaired individuals. Additionally, our evaluation highlights the distinctive features of the proposed dataset, featuring ego-motion in videos captured via 360-degree cameras across varied scenarios.
Submitted 30 May, 2024;
originally announced May 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
Authors:
Sunghyeon Woo,
Baeseong Park,
Byeongwook Kim,
Minjung Jo,
Sejung Kwon,
Dongsuk Jeon,
Dongsoo Lee
Abstract:
Training deep neural networks typically involves substantial computational costs during both forward and backward propagation. Conventional layer dropping techniques drop certain layers during training to reduce the computational burden. However, dropping layers during forward propagation adversely affects the training process by degrading accuracy. In this paper, we propose Dropping Backward Propagation (DropBP), a novel approach designed to reduce computational costs while maintaining accuracy. DropBP randomly drops layers during backward propagation, which leaves forward propagation unchanged. Moreover, DropBP calculates the sensitivity of each layer to assign an appropriate drop rate, thereby stabilizing the training process. DropBP is designed to enhance the efficiency of the training process with backpropagation, thereby enabling the acceleration of both full fine-tuning and parameter-efficient fine-tuning using backpropagation. Specifically, utilizing DropBP in QLoRA reduces training time by 44%, increases the convergence speed to the identical loss level by 1.5$\times$, and enables training with a 6.2$\times$ larger sequence length on a single NVIDIA-A100 80GiB GPU in LLaMA2-70B. The code is available at https://github.com/WooSunghyeon/dropbp.
Submitted 27 February, 2024;
originally announced February 2024.
-
Semantic-Aware Implicit Template Learning via Part Deformation Consistency
Authors:
Sihyeon Kim,
Minseok Joo,
Jaewon Lee,
Juyeon Ko,
Juhan Cha,
Hyunwoo J. Kim
Abstract:
Learning implicit templates as neural fields has recently shown impressive performance in unsupervised shape correspondence. Despite the success, we observe current approaches, which solely rely on geometric information, often learn suboptimal deformation across generic object shapes, which have high structural variability. In this paper, we highlight the importance of part deformation consistency and propose a semantic-aware implicit template learning framework to enable semantically plausible deformation. By leveraging semantic prior from a self-supervised feature extractor, we suggest local conditioning with novel semantic-aware deformation code and deformation consistency regularizations regarding part deformation, global deformation, and global scaling. Our extensive experiments demonstrate the superiority of the proposed method over baselines in various tasks: keypoint transfer, part label transfer, and texture transfer. More interestingly, our framework shows a larger performance gain under more challenging settings. We also provide qualitative analyses to validate the effectiveness of semantic-aware deformation. The code is available at https://github.com/mlvlab/PDC.
Submitted 23 August, 2023;
originally announced August 2023.
-
Hawkes Process Based on Controlled Differential Equations
Authors:
Minju Jo,
Seungji Kook,
Noseong Park
Abstract:
Hawkes processes are a popular framework to model the occurrence of sequential events, i.e., occurrence dynamics, in several fields such as social diffusion. In real-world scenarios, the inter-arrival time among events is irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics, but also ii) resort to heuristics to calculate the log-likelihood of events since they are mostly based on neural networks designed for regular discrete inputs. To this end, we present the concept of Hawkes process based on controlled differential equations (HP-CDE), by adopting the neural controlled differential equation (neural CDE) technology which is an analogue to continuous RNNs. Since HP-CDE continuously reads data, i) irregular time-series datasets can be properly treated preserving their uneven temporal spaces, and ii) the log-likelihood can be exactly computed. Moreover, as both Hawkes processes and neural CDEs are first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.
Submitted 18 May, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Learnable Path in Neural Controlled Differential Equations
Authors:
Sheo Yon Jhin,
Minju Jo,
Seungji Kook,
Noseong Park,
Sungpil Woo,
Sunhwan Lim
Abstract:
Neural controlled differential equations (NCDEs), which are continuous analogues to recurrent neural networks (RNNs), are a specialized model in (irregular) time-series processing. In comparison with similar models, e.g., neural ordinary differential equations (NODEs), the key distinctive characteristics of NCDEs are i) the adoption of the continuous path created by an interpolation algorithm from each raw discrete time-series sample and ii) the adoption of the Riemann--Stieltjes integral. It is the continuous path that makes NCDEs continuous analogues of RNNs. However, NCDEs use existing interpolation algorithms to create the path, and it is unclear whether these algorithms can create an optimal path. To this end, we present a method to generate another latent path (rather than relying on existing interpolation algorithms), which is identical to learning an appropriate interpolation method. We design an encoder-decoder module based on NCDEs and NODEs, and a special training method for it. Our method shows the best performance in both time-series classification and forecasting.
Submitted 11 January, 2023;
originally announced January 2023.
-
TimeKit: A Time-series Forecasting-based Upgrade Kit for Collaborative Filtering
Authors:
Seoyoung Hong,
Minju Jo,
Seungji Kook,
Jaeeun Jung,
Hyowon Wi,
Noseong Park,
Sung-Bae Cho
Abstract:
Recommender systems are a long-standing research problem in data mining and machine learning. They are incremental in nature, as new user-item interaction logs arrive. In real-world applications, we need to periodically train a collaborative filtering algorithm to extract user/item embedding vectors and therefore, a time-series of embedding vectors can be naturally defined. We present a time-series forecasting-based upgrade kit (TimeKit), which works in the following way: it i) first decides a base collaborative filtering algorithm, ii) extracts user/item embedding vectors with the base algorithm from user-item interaction logs incrementally, e.g., every month, iii) trains our time-series forecasting model with the extracted time-series of embedding vectors, and then iv) forecasts the future embedding vectors and recommends with their dot-product scores, owing to a recent breakthrough in processing complicated time-series data, i.e., neural controlled differential equations (NCDEs). Our experiments with four real-world benchmark datasets show that the proposed time-series forecasting-based upgrade kit can significantly enhance existing popular collaborative filtering algorithms.
Submitted 8 November, 2022;
originally announced November 2022.
-
LORD: Lower-Dimensional Embedding of Log-Signature in Neural Rough Differential Equations
Authors:
Jaehoon Lee,
Jinsung Jeon,
Sheo yon Jhin,
Jihyeon Hyeong,
Jayoung Kim,
Minju Jo,
Kook Seungji,
Noseong Park
Abstract:
The problem of processing very long time-series data (e.g., a length of more than 10,000) is a long-standing research problem in machine learning. Recently, one breakthrough, called neural rough differential equations (NRDEs), has been proposed and has shown that it is able to process such data. Their main concept is to use the log-signature transform, which is known to be more efficient than the Fourier transform for irregular long time-series, to convert a very long time-series sample into a relatively shorter series of feature vectors. However, the log-signature transform causes non-trivial spatial overheads. To this end, we present the method of LOweR-Dimensional embedding of log-signature (LORD), where we define an NRDE-based autoencoder to implant the higher-depth log-signature knowledge into the lower-depth log-signature. We show that the encoder successfully combines the higher-depth and the lower-depth log-signature knowledge, which greatly stabilizes the training process and increases the model accuracy. In our experiments with benchmark datasets, the improvement ratio by our method is up to 75% in terms of various classification and forecasting evaluation metrics.
Submitted 19 April, 2022;
originally announced April 2022.
-
EXIT: Extrapolation and Interpolation-based Neural Controlled Differential Equations for Time-series Classification and Forecasting
Authors:
Sheo Yon Jhin,
Jaehoon Lee,
Minju Jo,
Seungji Kook,
Jinsung Jeon,
Jihyeon Hyeong,
Jayoung Kim,
Noseong Park
Abstract:
Deep learning inspired by differential equations is a recent research trend and has marked the state of the art performance for many machine learning tasks. Among them, time-series modeling with neural controlled differential equations (NCDEs) is considered as a breakthrough. In many cases, NCDE-based models not only provide better accuracy than recurrent neural networks (RNNs) but also make it possible to process irregular time-series. In this work, we enhance NCDEs by redesigning their core part, i.e., generating a continuous path from a discrete time-series input. NCDEs typically use interpolation algorithms to convert discrete time-series samples to continuous paths. However, we propose to i) generate another latent continuous path using an encoder-decoder architecture, which corresponds to the interpolation process of NCDEs, i.e., our neural network-based interpolation vs. the existing explicit interpolation, and ii) exploit the generative characteristic of the decoder, i.e., extrapolation beyond the time domain of original data if needed. Therefore, our NCDE design can use both the interpolated and the extrapolated information for downstream machine learning tasks. In our experiments with 5 real-world datasets and 12 baselines, our extrapolation and interpolation-based NCDEs outperform existing baselines by non-trivial margins.
Submitted 21 September, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
LightMove: A Lightweight Next-POI Recommendation for Taxicab Rooftop Advertising
Authors:
Jinsung Jeon,
Soyoung Kang,
Minju Jo,
Seunghyeon Cho,
Noseong Park,
Seonghoon Kim,
Chiyoung Song
Abstract:
Mobile digital billboards are an effective way to augment brand-awareness. Among various such mobile billboards, taxicab rooftop devices are emerging in the market as a brand new media. Motov is a leading company in South Korea in the taxicab rooftop advertising market. In this work, we present a lightweight yet accurate deep learning-based method to predict taxicabs' next locations to better prepare for targeted advertising based on demographic information of locations. Considering the fact that next POI recommendation datasets are frequently sparse, we design our presented model based on neural ordinary differential equations (NODEs), which are known to be robust to sparse/incorrect input, with several enhancements. Our model, which we call LightMove, has a larger prediction accuracy, a smaller number of parameters, and/or a smaller training/inference time, when evaluating with various datasets, in comparison with state-of-the-art models.
Submitted 18 August, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
ACE-NODE: Attentive Co-Evolving Neural Ordinary Differential Equations
Authors:
Sheo Yon Jhin,
Minju Jo,
Taeyong Kong,
Jinsung Jeon,
Noseong Park
Abstract:
Neural ordinary differential equations (NODEs) presented a new paradigm to construct (continuous-time) neural networks. While showing several good characteristics in terms of the number of parameters and the flexibility in constructing neural networks, they also have a couple of well-known limitations: i) theoretically NODEs learn homeomorphic mapping functions only, and ii) sometimes NODEs show numerical instability in solving integral problems. To handle this, many enhancements have been proposed. To our knowledge, however, integrating attention into NODEs has been overlooked for a while. To this end, we present a novel method of attentive dual co-evolving NODE (ACE-NODE): one main NODE for a downstream machine learning task and the other for providing attention to the main NODE. Our ACE-NODE supports both pairwise and elementwise attention. In our experiments, our method outperforms existing NODE-based and non-NODE-based baselines in almost all cases by non-trivial margins.
Submitted 31 May, 2021;
originally announced May 2021.
-
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Authors:
Kang-wook Kim,
Seung-won Park,
Junhyeok Lee,
Myun-chul Joe
Abstract:
Recent works on voice conversion (VC) focus on preserving the rhythm and the intonation as well as the linguistic content. To preserve these features from the source, we decompose current non-parallel VC systems into two encoders and one decoder. We analyze each module with several experiments and reassemble the best components to propose Assem-VC, a new state-of-the-art any-to-many non-parallel VC system. We also examine that PPG and Cotatron features are speaker-dependent, and attempt to remove speaker identity with adversarial training. Code and audio samples are available at https://github.com/mindslab-ai/assem-vc.
Submitted 11 October, 2021; v1 submitted 2 April, 2021;
originally announced April 2021.
-
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data
Authors:
Seung-won Park,
Doo-young Kim,
Myun-chul Joe
Abstract:
We propose Cotatron, a transcription-guided speech encoder for speaker-independent linguistic representation. Cotatron is based on the multispeaker TTS architecture and can be trained with conventional TTS datasets. We train a voice conversion system to reconstruct speech with Cotatron features, which is similar to the previous methods based on Phonetic Posteriorgram (PPG). By training and evaluating our system with 108 speakers from the VCTK dataset, we outperform the previous method in terms of both naturalness and speaker similarity. Our system can also convert speech from speakers that are unseen during training, and utilize ASR to automate the transcription with minimal reduction of the performance. Audio samples are available at https://mindslab-ai.github.io/cotatron, and the code with a pre-trained model will be made available soon.
Submitted 14 August, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Improved upper bound on root number of linearized polynomials and its application to nonlinearity estimation of Boolean functions
Authors:
Sihem Mesnager,
Kwang Ho Kim,
Myong Song Jo
Abstract:
Determining the dimension of the null space of a given linearized polynomial is one of the vital problems in finite field theory, with relevance to the design of modern symmetric cryptosystems. However, the known general theory for this task is far from giving the exact dimension when applied to a specific linearized polynomial. The first contribution of this paper is a better general method to get a more precise upper bound on the root number of any given linearized polynomial. We anticipate that this result will serve as a useful tool in many research branches of finite fields and cryptography. Indeed, we apply this result to get tighter estimations of the lower bounds on the second-order nonlinearities of general cubic Boolean functions, which has been an active research problem during the past decade, with many examples showing great improvements. Furthermore, this paper shows that by studying the distribution of radicals of derivatives of a given Boolean function one can get a better lower bound on the second-order nonlinearity, through an example of the monomial Boolean function $g_μ=Tr(μx^{2^{2r}+2^r+1})$ over any finite field $\GF{n}$.
Submitted 27 November, 2018;
originally announced November 2018.
-
Multi-hop Links Quality Analysis of 5G Enabled Vehicular Networks
Authors:
Shikuan Li,
Zipeng Li,
Xiaohu Ge,
Jing Zhang,
Minho Jo
Abstract:
With the emergence of the fifth generation (5G) mobile communication systems, millimeter wave transmissions are believed to be a promising solution for vehicular networks, especially in vehicle to vehicle (V2V) communications. In millimeter wave V2V communications, different vehicular networking services have different quality requirements for V2V multi-hop links. To evaluate the quality of different V2V wireless links, a new link quality indicator is proposed in this paper that considers the real-time and reliability requirements of V2V multi-hop links. Moreover, different weight factors are configured in the new quality indicator to reflect the different real-time and reliability requirements of different types of services. Based on the proposed link quality indicator, the relationship between V2V link quality and one-hop communication distance under different vehicle densities is analyzed in this paper. Simulation results indicate that the link quality improves as vehicle density increases and that there exists an optimal one-hop communication distance for the link quality when the vehicle density is fixed.
Submitted 1 August, 2017; v1 submitted 24 April, 2017;
originally announced April 2017.
-
Multi-user Massive MIMO Communication Systems Based on Irregular Antenna Arrays
Authors:
Xiaohu Ge,
Ran Zi,
Haichao Wang,
Jing Zhang,
Minho Jo
Abstract:
In practical mobile communication engineering applications, surfaces of antenna array deployment regions are usually uneven. Therefore, massive multi-input-multi-output (MIMO) communication systems usually transmit wireless signals by irregular antenna arrays. To evaluate the performance of irregular antenna arrays, the matrix correlation coefficient and ergodic received gain are defined for massive MIMO communication systems with mutual coupling effects. Furthermore, the lower bound of the ergodic achievable rate, symbol error rate (SER) and average outage probability are firstly derived for multi-user massive MIMO communication systems using irregular antenna arrays. Asymptotic results are also derived when the number of antennas approaches infinity. Numerical results indicate that there exists a maximum achievable rate when the number of antennas keeps increasing in massive MIMO communication systems using irregular antenna arrays. Moreover, the irregular antenna array outperforms the regular antenna array in the achievable rate of massive MIMO communication systems when the number of antennas is larger than or equal to a given threshold.
Submitted 17 April, 2016;
originally announced April 2016.