
Showing 1–16 of 16 results for author: Kang, F

  1. arXiv:2405.02774  [pdf, other]

    cs.LG cs.AI cs.CL

    Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

    Authors: Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia

    Abstract: This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels. While many data selection algorithms have been designed for small-scale applications, rendering them unsuitable for our context, some emerg…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  2. arXiv:2404.15157  [pdf, other]

    cs.CL cs.AI

    FASTTRACK: Fast and Accurate Fact Tracing for LLMs

    Authors: Si Chen, Feiyang Kang, Ning Yu, Ruoxi Jia

    Abstract: Fact tracing seeks to identify specific training examples that serve as the knowledge source for a given query. Existing approaches to fact tracing rely on assessing the similarity between each training sample and the query along a certain dimension, such as lexical similarity, gradient, or embedding space. However, these methods fall short of effectively distinguishing between samples that are me…

    Submitted 21 April, 2024; originally announced April 2024.

  3. arXiv:2402.08922  [pdf, other]

    cs.LG stat.ML

    The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

    Authors: Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

    Abstract: Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious comp…

    Submitted 19 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  4. arXiv:2311.13712  [pdf, other]

    cs.AI

    Data Acquisition: A New Frontier in Data-centric AI

    Authors: Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou

    Abstract: As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative. There is limited study on the challenges of data acquisition due to ad-hoc processes and lack of consistent methodologies. We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets, transparent prici…

    Submitted 22 November, 2023; originally announced November 2023.

  5. arXiv:2310.06878  [pdf, other]

    physics.app-ph physics.optics

    Realization of the all-optical phase modulator, filter, splitter, and self-consistent logic gates based on assembled magneto-optical heterostructures

    Authors: Jie Xu, Yun You, Fengwen Kang, Sanshui Xiao, Lujun Hong, Yun Shen, Yamei Luo, Kosmas L. Tsakmakidis

    Abstract: All-optical computing has recently emerged as a vibrant research field in response to the energy crisis and the growing demand for information processing. However, the efficiency of subwavelength-scale all-optical devices remains relatively low due to challenges such as back-scattering reflections and strict surface roughness. Furthermore, achieving multifunctionality through the reassembly of all…

    Submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2307.02460  [pdf, other]

    cs.LG cs.AI cs.CE cs.CV

    Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources

    Authors: Feiyang Kang, Hoang Anh Just, Anit Kumar Sahu, Ruoxi Jia

    Abstract: Traditionally, data selection has been studied in settings where all samples from prospective sources are fully revealed to a machine learning developer. However, in practical data exchange scenarios, data providers often reveal only a limited subset of samples before an acquisition decision is made. Recently, there have been efforts to fit scaling laws that predict model performance at any size a…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: An extended abstract of this work appears in Data-centric Machine Learning Research (DMLR) Workshop at 40th International Conference on Machine Learning, Honolulu HI, USA. July 29, 2023

  7. arXiv:2305.15713  [pdf]

    physics.chem-ph cond-mat.mtrl-sci

    Proton Collective Quantum Tunneling Induces Anomalous Thermal Conductivity of Ice under Pressure

    Authors: Yufeng Wang, Ripeng Luo, Jian Chen, Xuefeng Zhou, Shanmin Wang, Junqiao Wu, Feiyu Kang, Kuang Yu, Bo Sun

    Abstract: Proton tunneling is believed to be non-local in ice but has never been shown experimentally. Here we measured thermal conductivity of ice under pressure up to 50 GPa and found it to increase with pressure until 20 GPa but decrease at higher pressures. We attribute this anomalous drop of thermal conductivity to the collective tunneling of protons at high pressures, supported by large-scale quantum…

    Submitted 25 May, 2023; originally announced May 2023.

  8. arXiv:2305.00054  [pdf, other]

    cs.LG cs.AI stat.ML

    LAVA: Data Valuation without Pre-Specified Learning Algorithms

    Authors: Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia

    Abstract: Traditionally, data valuation (DV) is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many DV use cases, such as setting priorities over different data sources in a data acquisit…

    Submitted 19 December, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

    Comments: ICLR 2023 Spotlight. Latest updated version: 2023/12/19

  9. arXiv:2204.06199  [pdf, ps, other]

    physics.optics

    Realization of broadband index-near-zero modes in nonreciprocal magneto-optical heterostructures

    Authors: Yun Zhou, Panpan He, Sanshui Xiao, Fengwen Kang, Lujun Hong, Yun Shen, Yamei Luo, Jie Xu

    Abstract: Epsilon-near-zero (ENZ) metamaterial with the relative permittivity approaching zero has been a hot research subject in the past decades. The wave in the ENZ region has infinite phase velocity ($v=1/\sqrt{\varepsilon\mu}$), whereas it cannot efficiently travel into the other devices or air due to the impedance mismatch or near-zero group velocity. In this paper, we demonstrate that the tunable index…

    Submitted 18 April, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

  10. arXiv:2202.06010  [pdf]

    cond-mat.mtrl-sci

    Incoherent phonon transport dominates heat conduction across van der Waals superlattices

    Authors: Lu Zhao, Lijuan Zhang, Houfu Song, Hongda Du, Renshaw X. Wang, Junqiao Wu, Feiyu Kang, Bo Sun

    Abstract: Heat conduction mechanisms in superlattices could be different across different types of interfaces. Van der Waals superlattices are structures physically assembled through weak van der Waals interactions by design, and may host properties beyond the traditional limits of lattice matching and processing compatibility, offering new types of interfaces. In this work, natural van der Waals (SnS)1.17(…

    Submitted 12 February, 2022; originally announced February 2022.

    Comments: 31 pages, 7 figures

  11. arXiv:2103.00285  [pdf, other]

    eess.SY

    Visual Navigation with a 2-pixel Camera---Possibilities and Limitations

    Authors: John Baillieul, Feiyang Kang

    Abstract: Borrowing terminology from fluid mechanics, the concepts of Eulerian and Lagrangian optical flow sensing are introduced. Eulerian optical flow sensing assumes that each photoreceptor in the camera or eye can instantaneously detect feature image points and their velocities on the retina. If this assumption is satisfied, even a two pixel imaging system can provide a moving agent with inf…

    Submitted 27 February, 2021; originally announced March 2021.

    Comments: 8 pages, 2 figures

    Journal ref: In Proceedings of IFAC 2020, Virtual World Congress, Berlin, July 13-17, 2020

  12. arXiv:1907.13231  [pdf]

    physics.app-ph physics.optics

    Green low-cost carbon nanodots-polyurethane composites with novel anisotropic anti-quenching mechanism for strain sensing

    Authors: Yayuan Tian, Yan Zhao, Fengwen Kang, Fucong Lyu, Zebiao Li, Jian Lu, Yang Yang Li

    Abstract: A new type of nontoxic low-cost sensor is reported here, whose photoluminescence (PL) intensity is instantly responsive to the external strain applied over a large range (up to 250% strain). Highly stretchable fluorescent composites of carbon dots (CDs) and polyurethane (PU) are fabricated via a scalable green chemistry method by conveniently dispersing CDs in the aqueous solution of PU. It is dis…

    Submitted 20 June, 2019; originally announced July 2019.

  13. arXiv:1804.01574  [pdf, other]

    cs.OH

    Prediction-Based Fast Thermoelectric Generator Reconfiguration for Energy Harvesting from Vehicle Radiators

    Authors: Hanchen Yang, Feiyang Kang, Caiwen Ding, Ji Li, Jaemin Kim, Donkyu Baek, Shahin Nazarian, Xue Lin, Paul Bogdan, Naehyuck Chang

    Abstract: Thermoelectric generation (TEG) has increasingly drawn attention for being environmentally friendly. A few studies have focused on improving TEG efficiency at the system level on vehicle radiators. The most recent reconfiguration algorithm shows improvement in performance but suffers from major drawbacks in computational time and energy overhead, and non-scalability in terms of array size and pr…

    Submitted 28 March, 2018; originally announced April 2018.

    Comments: 4 pages, 7 figures; Accepted at Design Automation and Test in Europe (DATE) 2018

  14. arXiv:1410.4223  [pdf]

    cond-mat.mtrl-sci physics.chem-ph

    Manganese reduction/oxidation reaction on graphene composites as a reversible process for storing enormous energy at a fast rate

    Authors: Yanyi Chen, Chengjun Xu, Shan Shi, Jia Li, Feiyu Kang, Chunguang Wei

    Abstract: Oxygen reduction/evolution reaction (ORR/OER) is a basic process for fuel cells or metal air batteries. However, ORR/OER generally requires noble metal catalysts and suffers from low solubility (10⁻³ molar per liter) of O₂, low kinetics rate (10⁻⁶ cm²/s) and low reversibility. We report a manganese reduction/oxidation reaction (MRR/MOR) on graphene/MnO₂ composites, delivering a high capacity (4200…

    Submitted 10 October, 2014; originally announced October 2014.

    Comments: Condensed matter, Materials Science

  15. arXiv:1002.0186  [pdf]

    cond-mat.mtrl-sci

    Negative refraction at deep-ultraviolet frequency in monocrystalline graphite

    Authors: Jingbo Sun, Ji Zhou, Lei Kang, Rui Wang, Xianguo Meng, Bo Li, Feiyu Kang, Longtu Li

    Abstract: Negative refraction is such a prominent electromagnetic phenomenon that most researchers believe it can only occur in artificially engineered metamaterials. In this article, we report negative refraction for all incident angles for the first time in a naturally existing material. Using ellipsometry measurement of the equifrequency contour in the deep-ultraviolet frequency region (typically 254 n…

    Submitted 1 February, 2010; originally announced February 2010.

    Comments: 4 figures for main text; 2 figures for supplementary material

  16. About the chemical composition of delta Scuti - the prototype of the class of pulsating variables

    Authors: A. Yushchenko, V. Gopka, C. Kim, F. Musaev, F. Kang

    Abstract: We present chemical abundances in the photosphere of $δ$ Scuti -- the prototype of the class of pulsating variables -- determined from the analysis of a spectrum obtained at the Terskol observatory 2 meter telescope with resolution $R=52,000$ and signal to noise ratio 250. VLT and IUE spectra were also used. The abundance pattern of $δ$ Scuti consists of 49 chemical elements. The abundances of Be, P, Ge, Nb,…

    Submitted 13 October, 2004; originally announced October 2004.

    Comments: 8 pages, 2 figures, subm. to Proc. of IAU Symp. 224