
Showing 1–21 of 21 results for author: Ni, T

Searching in archive cs.
  1. arXiv:2403.19591  [pdf, other]

    cs.LG cs.AR cs.NE

    Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

    Authors: Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng

    Abstract: Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require unfriendly high-precision arithmetics such as FP/INT 32 and lack consideration of…

    Submitted 29 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 61st ACM/IEEE Design Automation Conference (DAC) 2024

  2. arXiv:2403.16829  [pdf, ps, other]

    cs.LG cs.AI

    Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm

    Authors: Titouan Renard, Andreas Schlaginhaufen, Tingting Ni, Maryam Kamgarpour

    Abstract: Given a dataset of expert demonstrations, inverse reinforcement learning (IRL) aims to recover a reward for which the expert is optimal. This work proposes a model-free algorithm to solve entropy-regularized IRL problem. In particular, we employ a stochastic gradient descent update for the reward and a stochastic soft policy iteration update for the policy. Assuming access to a generative model, w…

    Submitted 23 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  3. arXiv:2402.05290  [pdf, other]

    cs.LG cs.AI

    Do Transformer World Models Give Better Policy Gradients?

    Authors: Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon

    Abstract: A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long…

    Submitted 10 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Michel Ma and Pierluca D'Oro contributed equally

  4. arXiv:2401.08898  [pdf, other]

    cs.LG cs.AI

    Bridging State and History Representations: Understanding Self-Predictive RL

    Authors: Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon

    Abstract: Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared propertie…

    Submitted 21 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ICLR 2024 (Poster). Code is available at https://github.com/twni2016/self-predictive-rl

  5. arXiv:2312.00561  [pdf, other]

    cs.LG math.OC

    A safe exploration approach to constrained Markov decision processes

    Authors: Tingting Ni, Maryam Kamgarpour

    Abstract: We consider discounted infinite horizon constrained Markov decision processes (CMDPs) where the goal is to find an optimal policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Motivated by the application of CMDPs in online learning of safety-critical systems, we focus on developing a model-free and simulator-free algorithm that ensures constraint satisf…

    Submitted 23 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: 37 pages, 3 figures

  6. arXiv:2307.03864  [pdf, other]

    cs.LG

    When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

    Authors: Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon

    Abstract: Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful to solve problems that involve long-term dependencies, including in the RL domain. However, the u…

    Submitted 3 November, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 (Oral)

  7. arXiv:2207.11541  [pdf, other]

    cs.LG

    FastATDC: Fast Anomalous Trajectory Detection and Classification

    Authors: Tianle Ni, Jingwei Wang, Yunlong Ma, Shuang Wang, Min Liu, Weiming Shen

    Abstract: Automated detection of anomalous trajectories is an important problem with considerable applications in intelligent transportation systems. Many existing studies have focused on distinguishing anomalous trajectories from normal trajectories, ignoring the large differences between anomalous trajectories. A recent study has made great progress in identifying abnormal trajectory patterns and proposed…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: 6 pages, 4 figures, accepted by 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)

  8. arXiv:2112.12612  [pdf, other]

    cs.RO cs.CV

    Towards Disturbance-Free Visual Mobile Manipulation

    Authors: Tianwei Ni, Kiana Ehsani, Luca Weihs, Jordi Salvador

    Abstract: Deep reinforcement learning has shown promising results on an abundance of robotic tasks in simulation, including visual navigation and manipulation. Prior work generally aims to build embodied agents that solve their assigned tasks as quickly as possible, while largely ignoring the problems caused by collision with objects during interaction. This lack of prioritization is understandable: there i…

    Submitted 21 October, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: WACV 2023

  9. arXiv:2110.05038  [pdf, other]

    cs.LG cs.AI cs.RO

    Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs

    Authors: Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov

    Abstract: Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory-based architectures, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. However, prior work has found that such recurrent model-free RL methods tend to perform worse than more…

    Submitted 4 June, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: ICML 2022 camera ready version. Code: https://github.com/twni2016/pomdp-baselines Project site: https://sites.google.com/view/pomdp-baselines

  10. arXiv:2104.12333  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Explore BiLSTM-CRF-Based Models for Open Relation Extraction

    Authors: Tao Ni, Qing Wang, Gabriela Ferraro

    Abstract: Extracting multiple relations from text sentences is still a challenge for current Open Relation Extraction (Open RE) tasks. In this paper, we develop several Open RE models based on the bidirectional LSTM-CRF (BiLSTM-CRF) neural network and different contextualized word embedding methods. We also propose a new tagging scheme to solve overlapping problems and enhance models' performance. From the…

    Submitted 25 April, 2021; originally announced April 2021.

  11. arXiv:2103.04439  [pdf, other]

    cs.RO cs.AI cs.HC

    Adaptive Agent Architecture for Real-time Human-Agent Teaming

    Authors: Tianwei Ni, Huao Li, Siddharth Agrawal, Suhas Raja, Fan Jia, Yikang Gui, Dana Hughes, Michael Lewis, Katia Sycara

    Abstract: Teamwork is a set of interrelated reasoning, actions and behaviors of team members that facilitate common objectives. Teamwork theory and experiments have resulted in a set of states and processes for team effectiveness in both human-human and agent-agent teams. However, human-agent teaming is less well studied because it is so new and involves asymmetry in policy and intent not present in human t…

    Submitted 7 March, 2021; originally announced March 2021.

    Comments: The first three authors contributed equally. In AAAI 2021 Workshop on Plan, Activity, and Intent Recognition

  12. arXiv:2011.04709  [pdf, other]

    cs.LG cs.RO

    f-IRL: Inverse Reinforcement Learning via State Marginal Matching

    Authors: Tianwei Ni, Harshit Sikchi, Yufei Wang, Tejus Gupta, Lisa Lee, Benjamin Eysenbach

    Abstract: Imitation learning is well-suited for robotic tasks where it is difficult to directly program the behavior or specify a cost for optimal control. In this work, we propose a method for learning the reward function (and the corresponding policy) to match the expert state density. Our main result is the analytic gradient of any f-divergence between the agent and expert state distribution w.r.t. rewar…

    Submitted 29 December, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: The first four authors have equal contribution (orders determined by dice rolling), and the last two authors have equal advising. The paper is accepted by Conference on Robot Learning (CoRL) 2020. Project videos and code link are available at https://sites.google.com/view/f-irl/home

  13. arXiv:2010.01235  [pdf, other]

    cs.CR

    DCDChain: A Credible Architecture of Digital Copyright Detection Based on Blockchain

    Authors: Zhili Chen, Yuting Wang, Tianjiao Ni

    Abstract: Copyright detection is an effective method to prevent piracy. However, untrustworthy detection parties may lead to falsified detection results. Due to its credibility and tamper resistance, blockchain has been applied to copyright protection. Previous works mainly utilized blockchain for reliable copyright information storage or copyrighted digital media trading. As far as we know, the problem of…

    Submitted 24 August, 2022; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: 5 figures

    Journal ref: Submission to Journal of Surveillance, Security and Safety (JSSS), 2022

  14. arXiv:2010.01234  [pdf, other]

    cs.CR

    Utility-efficient Differentially Private K-means Clustering based on Cluster Merging

    Authors: Tianjiao Ni, Minghao Qiao, Zhili Chen, Shun Zhang, Hong Zhong

    Abstract: Differential privacy is widely used in data analysis. State-of-the-art k-means clustering algorithms with differential privacy typically add an equal amount of noise to centroids for each iterative computation. In this paper, we propose a novel differentially private k-means clustering algorithm, DP-KCCM, that significantly improves the utility of clustering by adding adaptive noise and mergin…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 13 figures

  15. arXiv:2007.01932  [pdf, other]

    cs.LG stat.ML

    Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient

    Authors: Yufei Wang, Tianwei Ni

    Abstract: Exploration-exploitation dilemma has long been a crucial issue in reinforcement learning. In this paper, we propose a new approach to automatically balance between these two. Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy, and hence controls the trade-off between exploitation and expl…

    Submitted 31 July, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: published at 7th ICML Workshop on Automated Machine Learning (2020)

  16. arXiv:2002.09637  [pdf, other]

    cs.CL

    Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics

    Authors: Tianyi Ni

    Abstract: More and more languages in the world are under study nowadays, as a result, the traditional way of historical linguistics study is facing some challenges. For example, the linguistic comparative research among languages needs manual annotation, which becomes more and more impossible with the increasing amount of language data coming out all around the world. Although it could hardly replace lingui…

    Submitted 13 March, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

  17. arXiv:2001.00694  [pdf, other]

    cs.CR

    Differentially Private Combinatorial Cloud Auction

    Authors: Tianjiao Ni, Zhili Chen, Lin Chen, Hong Zhong, Shun Zhang, Yan Xu

    Abstract: Cloud service providers typically provide different types of virtual machines (VMs) to cloud users with various requirements. Thanks to its effectiveness and fairness, auction has been widely applied in this heterogeneous resource allocation. Recently, several strategy-proof combinatorial cloud auction mechanisms have been proposed. However, they fail to protect the bid privacy of users from being…

    Submitted 2 January, 2020; originally announced January 2020.

    Comments: 12 pages, 6 figures

  18. arXiv:1812.00518  [pdf, other]

    cs.CV

    Elastic Boundary Projection for 3D Medical Image Segmentation

    Authors: Tianwei Ni, Lingxi Xie, Huangjie Zheng, Elliot K. Fishman, Alan L. Yuille

    Abstract: We focus on an important yet challenging problem: using a 2D deep network to deal with 3D segmentation for medical image analysis. Existing approaches either applied multi-view planar (2D) networks or directly used volumetric (3D) networks for this purpose, but both of them are not ideal: 2D networks cannot capture 3D contexts effectively, and 3D networks are both memory-consuming and less stable…

    Submitted 6 June, 2020; v1 submitted 2 December, 2018; originally announced December 2018.

    Comments: Accepted to CVPR 2019

  19. arXiv:1811.11814  [pdf, other]

    cs.CV

    Phase Collaborative Network for Two-Phase Medical Image Segmentation

    Authors: Huangjie Zheng, Lingxi Xie, Tianwei Ni, Ya Zhang, Yan-Feng Wang, Qi Tian, Elliot K. Fishman, Alan L. Yuille

    Abstract: In real-world practice, medical images acquired in different phases possess complementary information, e.g., radiologists often refer to both arterial and venous scans in order to make the diagnosis. However, in medical image analysis, fusing prediction from two phases is often difficult, because (i) there is a domain gap between two phases, and (ii) the semantic labels are not pixel-wise co…

    Submitted 12 September, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

  20. arXiv:1810.07873  [pdf, other]

    cs.CR cs.GT

    Differentially Private Double Spectrum Auction with Approximate Social Welfare Maximization

    Authors: Zhili Chen, Tianjiao Ni, Hong Zhong, Shun Zhang, Jie Cui

    Abstract: Spectrum auction is an effective approach to improving spectrum utilization, by leasing idle spectrum from primary users to secondary users. Recently, a few differentially private spectrum auction mechanisms have been proposed, but, as far as we know, none of them addressed the differential privacy in the setting of double spectrum auctions. In this paper, we combine the concept of differential pr…

    Submitted 17 October, 2018; originally announced October 2018.

    Comments: 12 pages, 7 figures

  21. arXiv:1602.01930  [pdf, ps, other]

    cs.SI cs.GT

    On The Robustness of Price-Anticipating Kelly Mechanism

    Authors: Yuedong Xu, Zhujun Xiao, Tianyu Ni, Jessie Hui Wang, Xin Wang, Eitan Altman

    Abstract: The price-anticipating Kelly mechanism (PAKM) is one of the most extensively used strategies to allocate divisible resources for strategic users in communication networks and computing systems. The users are deemed as selfish and also benign, each of which maximizes his individual utility of the allocated resources minus his payment to the network operator. However, in many applications a user can…

    Submitted 4 October, 2021; v1 submitted 5 February, 2016; originally announced February 2016.

    Comments: 21