
Showing 1–12 of 12 results for author: Cong, T

Searching in archive cs.
  1. arXiv:2406.09321

    cs.CR cs.AI cs.CL

    JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

    Authors: Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang

    Abstract: Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses for forbidden instructions, presenting severe misuse threats to LLMs. Up to now, research into jailbreak attacks and defenses is emerging, however, there is (surprisingly) no consensus on how to evaluate whether a jailbreak attempt is successful. In other words, the methods to assess the harmfulness of an LL…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/ThuCCSLab/JailbreakEval

  2. arXiv:2404.05188

    cs.CR cs.AI cs.CL

    Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging

    Authors: Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang

    Abstract: Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data. Instead, it involves editing different upstream model parameters to absorb their downstream task capabilities. However, uncertified model merging can infringe upon the Intellectual Property (IP) rights of the origin…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Technical Report

  3. arXiv:2403.05873

    cs.SE cs.IR cs.LG

    LEGION: Harnessing Pre-trained Language Models for GitHub Topic Recommendations with Distribution-Balance Loss

    Authors: Yen-Trang Dang, Thanh-Le Cong, Phuc-Thanh Nguyen, Anh M. T. Bui, Phuong T. Nguyen, Bach Le, Quyet-Thang Huynh

    Abstract: Open-source development has revolutionized the software industry by promoting collaboration, transparency, and community-driven innovation. Today, a vast amount of various kinds of open-source software, which form networks of repositories, is often hosted on GitHub - a popular software development platform. To enhance the discoverability of the repository networks, i.e., groups of similar reposito…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted to EASE'24

  4. arXiv:2311.05608

    cs.CR cs.AI cs.CL

    FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts

    Authors: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang

    Abstract: Ensuring the safety of artificial intelligence-generated content (AIGC) is a longstanding topic in the artificial intelligence (AI) community, and the safety concerns associated with Large Language Models (LLMs) have been widely investigated. Recently, large vision-language models (VLMs) represent an unprecedented revolution, as they are built upon LLMs but can incorporate additional modalities (e…

    Submitted 13 December, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Technical Report

  5. arXiv:2310.07736

    cs.DB cs.LG

    Observatory: Characterizing Embeddings of Relational Tables

    Authors: Tianji Cong, Madelon Hulsebos, Zhenjie Sun, Paul Groth, H. V. Jagadish

    Abstract: Language models and specialized table embedding models have recently demonstrated strong performance on many tasks over tabular data. Researchers and practitioners are keen to leverage these models in many new application contexts; but limited understanding of the strengths and weaknesses of these models, and the table representations they generate, makes the process of finding a suitable model fo…

    Submitted 27 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Camera ready of VLDB 2024

  6. arXiv:2308.08505

    cs.CR

    Test-Time Poisoning Attacks Against Test-Time Adaptation Models

    Authors: Tianshuo Cong, Xinlei He, Yun Shen, Yang Zhang

    Abstract: Deploying machine learning (ML) models in the wild is challenging as it suffers from distribution shifts, where the model trained on an original domain cannot generalize well to unforeseen diverse transfer domains. To address this challenge, several test-time adaptation (TTA) methods have been proposed to improve the generalization ability of the target pre-trained models under test data to cope w…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: To Appear in the 45th IEEE Symposium on Security and Privacy, May 20-23, 2024

  7. arXiv:2308.07847

    cs.CR

    Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models

    Authors: Yugeng Liu, Tianshuo Cong, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang

    Abstract: Large Language Models (LLMs) undergo continuous updates to improve user experience. However, prior research on the security and safety implications of LLMs has primarily focused on their specific versions, overlooking the impact of successive LLM updates. This prompts the need for a holistic understanding of the risks in these different versions of LLMs. To fill this gap, in this paper, we conduct…

    Submitted 6 May, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

  8. arXiv:2301.04901

    cs.DB cs.IR

    Pylon: Semantic Table Union Search in Data Lakes

    Authors: Tianji Cong, Fatemeh Nargesian, H. V. Jagadish

    Abstract: The large size and fast growth of data repositories, such as data lakes, has spurred the need for data discovery to help analysts find related data. The problem has become challenging as (i) a user typically does not know what datasets exist in an enormous data repository; and (ii) there is usually a lack of a unified data model to capture the interrelationships between heterogeneous datasets from…

    Submitted 13 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Version submitted to the third round of ICDE 2023 on October 8, 2022

  9. arXiv:2212.14155

    cs.DB

    WarpGate: A Semantic Join Discovery System for Cloud Data Warehouses

    Authors: Tianji Cong, James Gale, Jason Frantz, H. V. Jagadish, Çağatay Demiralp

    Abstract: Data discovery is a major challenge in enterprise data analysis: users often struggle to find data relevant to their analysis goals or even to navigate through data across data sources, each of which may easily contain thousands of tables. One common user need is to discover tables joinable with a given table. This need is particularly critical because join is a ubiquitous operation in data analys…

    Submitted 2 January, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: CIDR'23

  10. WeChat uptake of Chinese scholarly journals: an analysis of CSSCI-indexed journals

    Authors: Ting Cong, Zhichao Fang, Rodrigo Costas

    Abstract: The study of how science is discussed and how scholarly actors interact on social media has increasingly become popular in the field of scientometrics in recent years. While most prior studies focused on research outputs discussed on global platforms, such as Twitter or Facebook, the presence of scholarly journals on local platforms was seldom studied, especially in the Chinese social media contex…

    Submitted 6 April, 2022; originally announced April 2022.

    Journal ref: Scientometrics (2022)

  11. arXiv:2201.11692

    cs.CR cs.LG

    SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders

    Authors: Tianshuo Cong, Xinlei He, Yang Zhang

    Abstract: Self-supervised learning is an emerging machine learning paradigm. Compared to supervised learning which leverages high-quality labeled datasets, self-supervised learning relies on unlabeled datasets to pre-train powerful encoders which can then be treated as feature extractors for various downstream tasks. The huge amount of data and computational resources consumption makes the encoders themselv…

    Submitted 31 August, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Accepted by CCS 2022

  12. arXiv:1911.11946

    cs.CV cs.CR cs.LG

    Can Attention Masks Improve Adversarial Robustness?

    Authors: Pratik Vaishnavi, Tianji Cong, Kevin Eykholt, Atul Prakash, Amir Rahmati

    Abstract: Deep Neural Networks (DNNs) are known to be susceptible to adversarial examples. Adversarial examples are maliciously crafted inputs that are designed to fool a model, but appear normal to human beings. Recent work has shown that pixel discretization can be used to make classifiers for MNIST highly robust to adversarial examples. However, pixel discretization fails to provide significant protectio…

    Submitted 21 December, 2019; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: Version presented at AAAI-20 workshop on Engineering Dependable and Secure Machine Learning Systems (EDSMLS)