Search | arXiv e-print repository

Tilt and Average : Geometric Adjustment of the Last Layer for Recalibration

Abstract: After the revelation that neural networks tend to produce overconfident predictions, the problem of calibration, which aims to align confidence with accuracy to enhance the reliability of predictions, has gained significant importance. Several solutions based on calibration maps have been proposed to address the problem of recalibrating a trained classifier using additional datasets. In this paper, we offer an algorithm that transforms the weights of the last layer of the classifier, distinct from the calibration-map-based approach. We concentrate on the geometry of the final linear layer, specifically its angular aspect, and adjust the weights of the corresponding layer. We name the method Tilt and Average (\textsc{Tna}), and validate the calibration effect empirically and theoretically. Through this, we demonstrate that our approach, when used in addition to the existing calibration-map-based techniques, can yield improved calibration performance. Code available: https://github.com/GYYYYYUUUUU/TNA_Angular_Scaling.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 20 pages, 11 figures, to appear in International Conference on Machine Learning (ICML 2024)

arXiv:2405.19795 [pdf, other]

SLM as Guardian: Pioneering AI Safety with Small Language Models

Authors: Ohjoon Kwon, Donghyeon Jeon, Nayoung Choi, Gyu-Hwung Cho, Changbong Kim, Hyunwoo Lee, Inho Kang, Sun Kim, Taiwoo Park

Abstract: Most prior safety research on large language models (LLMs) has focused on enhancing the alignment of LLMs to better suit the safety requirements of humans. However, internalizing such safeguard features into larger models has brought challenges of higher training cost and unintended degradation of helpfulness. To overcome such challenges, a modular approach employing a smaller LLM to detect harmful user queries is regarded as a convenient solution in designing LLM-based systems with safety requirements. In this paper, we leverage a smaller LLM for both harmful query detection and safeguard response generation. We introduce our safety requirements and the taxonomy of harmfulness categories, and then propose a multi-task learning mechanism fusing the two tasks into a single model. We demonstrate the effectiveness of our approach, achieving harmful query detection and safeguard response performance on par with or surpassing that of the publicly available LLMs.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2307.02493 [pdf, other]

FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization

Authors: Eunju Yang, Gyusang Cho, Chan-Hyun Youn

Abstract: From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario to adapt a deployed model to a client's dataset. It can provide adaptation without a target label and support the case where a source dataset is constructed from multiple domains. However, it is impractical in that its training relies heavily on prior domain information of the multi-source dataset: how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues by transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) source dataset, and, most importantly, 3) source domain information (domain labels + the number of domains) are unavailable. Under the problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of the generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that class distribution is consistent even if the style is different; afterwards, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2212.03961 [pdf, other]

FSID: Fully Synthetic Image Denoising via Procedural Scene Generation

Authors: Gyeongmin Choe, Beibei Du, Seonghyeon Nam, Xiaoyu Xiang, Bo Zhu, Rakesh Ranjan

Abstract: For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images primarily from the Internet comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset tailored to low-level vision tasks. Our Unreal Engine-based synthetic data pipeline populates large scenes algorithmically with a combination of random 3D objects, materials, and geometric transformations. Then, we calibrate the camera noise profiles to synthesize the noisy images. From this pipeline, we generated a fully synthetic image denoising dataset (FSID) which consists of 175,000 noisy/clean image pairs. We then trained and validated a CNN-based denoising model, and demonstrated that the model trained on this synthetic data alone can achieve competitive denoising results when evaluated on real-world noisy images captured with smartphone cameras.

Submitted 7 December, 2022; originally announced December 2022.

arXiv:2211.08658 [pdf, other]

Consistent Direct Time-of-Flight Video Depth Super-Resolution

Authors: Zhanghao Sun, Wei Ye, **hui Xiong, Gyeongmin Choe, Jialiang Wang, Shuochen Su, Rakesh Ranjan

Abstract: Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing. However, limited by manufacturing capabilities in a compact module, the dToF data has a low spatial resolution (e.g., $\sim 20\times30$ for iPhone dToF), and it requires a super-resolution step before being passed to downstream tasks. In this paper, we solve this super-resolution problem by fusing the low-resolution dToF data with the corresponding high-resolution RGB guidance. Unlike the conventional RGB-guided depth enhancement approaches, which perform the fusion in a per-frame manner, we propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the low-resolution dToF imaging. In addition, dToF sensors provide unique depth histogram information for each local patch, and we incorporate this dToF-specific feature in our network design to further alleviate spatial ambiguity. To evaluate our models on complex dynamic indoor environments and to provide a large-scale dToF sensor dataset, we introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator following the physical imaging process. We believe the methods and dataset are beneficial to a broad community as dToF depth sensing is becoming mainstream on mobile devices. Our code and data are publicly available: https://github.com/facebookresearch/DVSR/

Submitted 3 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

arXiv:1608.05204 [pdf, other]

Refining Geometry from Depth Sensors using IR Shading Images

Authors: Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon

Abstract: We propose a method to refine the geometry of 3D meshes from a consumer-level depth camera, e.g. Kinect, by exploiting shading cues captured from an infrared (IR) camera. A major benefit of using an IR camera instead of an RGB camera is that the IR images captured are narrow band images that filter out most undesired ambient light, which makes our system robust against natural indoor illumination. Moreover, for many natural objects with colorful textures in the visible spectrum, the subjects appear to have a uniform albedo in the IR spectrum. Based on our analyses of the IR projector light of the Kinect, we define a near light source IR shading model that describes the captured intensity as a function of surface normals, albedo, lighting direction, and distance between light source and surface points. To resolve the ambiguity in our model between the normals and distances, we utilize an initial 3D mesh from the Kinect fusion and multi-view information to reliably estimate surface details that were not captured and reconstructed by the Kinect fusion. Our approach directly operates on the mesh model for geometry refinement. We ran experiments on our algorithm for geometries captured by both the Kinect I and Kinect II, as the depth acquisition in Kinect I is based on a structured-light technique and that of the Kinect II is based on a time-of-flight (ToF) technology. The effectiveness of our approach is demonstrated through several challenging real-world examples. We have also performed a user study to evaluate the quality of the mesh models before and after our refinements.

Submitted 18 August, 2016; originally announced August 2016.

Comments: Accepted to the International Journal of Computer Vision (IJCV)

arXiv:1603.07475 [pdf, other]

Fine-scale Surface Normal Estimation using a Single NIR Image

Authors: Young** Yoon, Gyeongmin Choe, Namil Kim, Joon-Young Lee, In So Kweon

Abstract: We present surface normal estimation using a single near infrared (NIR) image. We focus on fine-scale surface geometry captured with an uncalibrated light source. To tackle this ill-posed problem, we adopt a generative adversarial network, which is effective in recovering a sharp output, a property that is also essential for fine-scale surface normal estimation. We incorporate angular error and an integrability constraint into the objective function of the network to make the estimated normals physically meaningful. We train and validate our network on a recent NIR dataset, and also evaluate the generality of our trained model by using new external datasets which are captured with a different camera in a different environment.

Submitted 24 March, 2016; originally announced March 2016.

arXiv:1511.02432 [pdf]

A Study of an Modeling Method of T-S fuzzy System Based on Moving Fuzzy Reasoning and Its Application

Authors: Son-Il Kwak, Gang Choe, In-Song Kim, Gyong-Ho Jo, Chol-Jun Hwang

Abstract: To improve the effectiveness of fuzzy identification, a structure identification method based on moving rate is proposed for the T-S fuzzy model. The proposed method is called "T-S modeling (or T-S fuzzy identification method) based on moving rate". First, to address the shortcomings of existing fuzzy reasoning methods based on matching degree, the moving rates for s-type, z-type and trapezoidal membership functions of the T-S fuzzy model are defined. Then, the differences between the proposed moving rate and the existing matching degree are explained. Next, the identification method based on moving rate is proposed for the T-S model. Finally, the proposed identification method is applied to fuzzy modeling for precipitation forecast and security situation prediction. Test results show that the proposed method significantly improves the effectiveness of fuzzy identification.

Submitted 7 November, 2015; originally announced November 2015.

Comments: 24 pages, 11 figures

arXiv:1501.04036 [pdf, ps, other]

An Improvement of the Cipolla-Lehmer Type Algorithms

Authors: Namhun Koo, Gook Hwa Cho, Byeonghwan, Soonhak Kwon

Abstract: Let $F_q$ be a finite field with $q$ elements, where $q$ is a prime power, and let $r>1$ be an integer with $q\equiv 1 \pmod{r}$. In this paper, we present a refinement of the Cipolla-Lehmer type algorithm given by H. C. Williams, and subsequently improved by K. S. Williams and K. Hardy. For a given $r$-th power residue $c$ in $F_q$ where $r$ is an odd prime, the algorithm of H. C. Williams determines a solution of $X^r=c$ in $O(r^3\log q)$ multiplications in $F_q$, and the algorithm of K. S. Williams and K. Hardy finds a solution in $O(r^4+r^2\log q)$ multiplications in $F_q$. Our refinement finds a solution in $O(r^3+r^2\log q)$ multiplications in $F_q$. Therefore our new method is better than the previously proposed algorithms independent of the size of $r$, and the implementation result via SAGE shows a substantial speed-up compared with the existing algorithms.

Submitted 16 January, 2015; originally announced January 2015.

MSC Class: 11T06; 11Y16; 68W40

Showing 1–9 of 9 results for author: Cho, G