-
An Experiment with the Use of ChatGPT for LCSH Subject Assignment on Electronic Theses and Dissertations
Authors:
Eric H. C. Chow,
TJ Kao,
Xiaoli Li
Abstract:
This study delves into the potential use of Large Language Models (LLMs) for generating Library of Congress Subject Headings (LCSH). The authors employed ChatGPT to generate subject headings for electronic theses and dissertations (ETDs) based on their titles and summaries. The results revealed that although some generated subject headings were valid, there were issues regarding specificity and exhaustiveness. The study showcases that LLMs can serve as a strategic response to the backlog of items awaiting cataloging in academic libraries, while also offering a cost-effective approach for promptly generating LCSH. Nonetheless, human catalogers remain essential for verifying and enhancing the validity, exhaustiveness, and specificity of LCSH generated by LLMs.
Submitted 3 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Minimum Description Length Control
Authors:
Ted Moskovitz,
Ta-Chu Kao,
Maneesh Sahani,
Matthew M. Botvinick
Abstract:
We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle. In this approach, which we term MDL-control (MDL-C), the agent learns the common structure among the tasks with which it is faced and then distills it into a simpler representation which facilitates faster convergence and generalization to new tasks. In doing so, MDL-C naturally balances adaptation to each task with epistemic uncertainty about the task distribution. We motivate MDL-C via formal connections between the MDL principle and Bayesian inference, derive theoretical performance guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete and high-dimensional continuous control tasks.
Submitted 24 July, 2022; v1 submitted 17 July, 2022;
originally announced July 2022.
-
A Question-Answer Driven Approach to Reveal Affirmative Interpretations from Verbal Negations
Authors:
Md Mosharaf Hossain,
Luke Holman,
Anusha Kakileti,
Tiffany Iris Kao,
Nathan Raul Brito,
Aaron Abraham Mathews,
Eduardo Blanco
Abstract:
This paper explores a question-answer driven approach to reveal affirmative interpretations from verbal negations (i.e., when a negation cue grammatically modifies a verb). We create a new corpus consisting of 4,472 verbal negations and discover that 67.1% of them convey that an event actually occurred. Annotators generate and answer 7,277 questions for the 3,001 negations that convey an affirmative interpretation. We first cast the problem of revealing affirmative interpretations from negations as a natural language inference (NLI) classification task. Experimental results show that state-of-the-art transformers trained with existing NLI corpora are insufficient to reveal affirmative interpretations. We also observe, however, that fine-tuning brings small improvements. In addition to NLI classification, we also explore the more realistic task of generating affirmative interpretations directly from negations with the T5 transformer. We conclude that the generation task remains a challenge as T5 substantially underperforms humans.
Submitted 23 May, 2022;
originally announced May 2022.
-
Natural continual learning: success is a journey, not (just) a destination
Authors:
Ta-Chu Kao,
Kristopher T. Jensen,
Gido M. van de Ven,
Alberto Bernacchia,
Guillaume Hennequin
Abstract:
Biological agents are known to learn many different tasks over the course of their lives, and to be able to revisit previous tasks and behaviors with little to no loss in performance. In contrast, artificial agents are prone to 'catastrophic forgetting' whereby performance on previous tasks deteriorates rapidly as new ones are acquired. This shortcoming has recently been addressed using methods that encourage parameters to stay close to those used for previous tasks. This can be done by (i) using specific parameter regularizers that map out suitable destinations in parameter space, or (ii) guiding the optimization journey by projecting gradients into subspaces that do not interfere with previous tasks. However, these methods often exhibit subpar performance in both feedforward and recurrent neural networks, with recurrent networks being of interest to the study of neural dynamics supporting biological continual learning. In this work, we propose Natural Continual Learning (NCL), a new method that unifies weight regularization and projected gradient descent. NCL uses Bayesian weight regularization to encourage good performance on all tasks at convergence and combines this with gradient projection using the prior precision, which prevents catastrophic forgetting during optimization. Our method outperforms both standard weight regularization techniques and projection based approaches when applied to continual learning problems in feedforward and recurrent networks. Finally, the trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
Submitted 15 December, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Automatic differentiation of Sylvester, Lyapunov, and algebraic Riccati equations
Authors:
Ta-Chu Kao,
Guillaume Hennequin
Abstract:
Sylvester, Lyapunov, and algebraic Riccati equations are the bread and butter of control theorists. They are used to compute infinite-horizon Gramians, solve optimal control problems in continuous or discrete time, and design observers. While popular numerical computing frameworks (e.g., scipy) provide efficient solvers for these equations, these solvers are still largely missing from most automatic differentiation libraries. Here, we derive the forward and reverse-mode derivatives of the solutions to all three types of equations, and showcase their application on an inverse control problem.
Submitted 24 November, 2020; v1 submitted 23 November, 2020;
originally announced November 2020.
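To make the idea of differentiating through such a solver concrete, here is the standard matrix-calculus derivation for the Sylvester case. The convention $AX + XB = C$ and the symbols $S$, $\bar{A}$, $\bar{B}$, $\bar{C}$, $\bar{X}$ are assumptions chosen for illustration; the paper's own notation and its treatment of the Lyapunov and Riccati cases may differ.

```latex
% Assumed convention: X solves the Sylvester equation
\[ A X + X B = C . \]
% Forward mode: differentiating both sides shows that dX solves a
% Sylvester equation with the same coefficients A and B:
\[ A\,\mathrm{d}X + \mathrm{d}X\,B = \mathrm{d}C - \mathrm{d}A\,X - X\,\mathrm{d}B . \]
% Reverse mode: given the adjoint \bar{X}, solve the transposed equation
\[ A^{\top} S + S B^{\top} = \bar{X} , \]
% and read off the adjoints of the inputs:
\[ \bar{C} = S, \qquad \bar{A} = -S X^{\top}, \qquad \bar{B} = -X^{\top} S . \]
```

Both modes therefore reuse the same kind of solver as the primal problem, which is why an existing numerical routine can be wrapped rather than differentiated step by step.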
-
Manifold GPLVMs for discovering non-Euclidean latent structure in neural data
Authors:
Kristopher T. Jensen,
Ta-Chu Kao,
Marco Tripodi,
Guillaume Hennequin
Abstract:
A common problem in neuroscience is to elucidate the collective neural representations of behaviorally important variables such as head direction, spatial location, upcoming movements, or mental spatial transformations. Often, these latent variables are internal constructs not directly accessible to the experimenter. Here, we propose a new probabilistic latent variable model to simultaneously identify the latent state and the way each neuron contributes to its representation in an unsupervised way. In contrast to previous models which assume Euclidean latent spaces, we embrace the fact that latent states often belong to symmetric manifolds such as spheres, tori, or rotation groups of various dimensions. We therefore propose the manifold Gaussian process latent variable model (mGPLVM), where neural responses arise from (i) a shared latent variable living on a specific manifold, and (ii) a set of non-parametric tuning curves determining how each neuron contributes to the representation. Cross-validated comparisons of models with different topologies can be used to distinguish between candidate manifolds, and variational inference enables quantification of uncertainty. We demonstrate the validity of the approach on several synthetic datasets, as well as on calcium recordings from the ellipsoid body of Drosophila melanogaster and extracellular recordings from the mouse anterodorsal thalamic nucleus. These circuits are both known to encode head direction, and mGPLVM correctly recovers the ring topology expected from neural populations representing a single angular variable.
Submitted 21 October, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Active Learning for Domain Classification in a Commercial Spoken Personal Assistant
Authors:
Xi C. Chen,
Adithya Sagar,
Justine T. Kao,
Tony Y. Li,
Christopher Klein,
Stephen Pulman,
Ashish Garg,
Jason D. Williams
Abstract:
We describe a method for selecting relevant new training data for the LSTM-based domain selection component of our personal assistant system. Adding more annotated training data for any ML system typically improves accuracy, but only if it provides examples not already adequately covered in the existing data. However, obtaining, selecting, and labeling relevant data is expensive. This work presents a simple technique that automatically identifies new helpful examples suitable for human annotation. Our experimental results show that the proposed method, compared with random-selection and entropy-based methods, leads to higher accuracy improvements given a fixed annotation budget. Although developed and tested in the setting of a commercial intelligent assistant, the technique is of wider applicability.
Submitted 29 August, 2019;
originally announced August 2019.
-
Pareto-Optimization Framework for Automated Network-on-Chip Design
Authors:
Tzyy-Juin Kao,
Wolfgang Fink
Abstract:
With the advent of multi-core processors, network-on-chip design has been key in addressing network performance metrics such as bandwidth, power consumption, and communication delay in on-chip communication between an increasing number of processor cores. As the number of cores increases, network design becomes more complex. Therefore, there is a critical need for computer aid in determining network configurations that afford optimal performance given resources and design constraints. We propose a Pareto-optimization framework that explores the space of possible network configurations to determine optimal network latencies, power consumption, and the corresponding link allocations. For a given number of routers, average network latency and power consumption, as example performance objectives, can be displayed in the form of Pareto-optimal fronts, thus not only offering a design tool, but also enabling trade-off studies.
Submitted 30 July, 2018;
originally announced July 2018.
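As a rough illustration of what a Pareto-optimal front over latency and power means, the sketch below filters a list of candidate configurations down to the non-dominated ones. The `(latency, power)` tuples, the function names, and the sample values are all hypothetical; this is a generic dominance filter, not the authors' search framework.

```python
from typing import List, Tuple

# (average latency, power consumption); both objectives are minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Hypothetical (latency, power) values for five candidate configurations.
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0), (8.0, 6.0)]
front = pareto_front(candidates)
print(front)
```

Each point that survives the filter is a configuration where latency cannot be lowered without raising power, which is the trade-off the abstract refers to.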
-
Layer Communities in Multiplex Networks
Authors:
Ta-Chu Kao,
Mason A. Porter
Abstract:
Multiplex networks are a type of multilayer network in which entities are connected to each other via multiple types of connections. We propose a method, based on computing pairwise similarities between layers and then doing community detection, for grouping structurally similar layers in multiplex networks. We illustrate our approach using both synthetic and empirical networks, and we are able to find meaningful groups of layers in both cases. For example, we find that airlines that are based in similar geographic locations tend to be grouped together in an airline multiplex network and that related research areas in physics tend to be grouped together in a multiplex collaboration network.
Submitted 13 June, 2017;
originally announced June 2017.
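The two-step pipeline in this abstract (pairwise layer similarity, then grouping) can be sketched in a few lines. Everything specific here is an assumption made for illustration: Jaccard overlap of edge sets stands in for the similarity measure, thresholded connected components stand in as a crude substitute for community detection, and the layer names and routes are invented. The paper's actual measures and algorithm may be quite different.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Jaccard similarity of two edge sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_layers(layers: Dict[str, Set[Edge]], threshold: float) -> List[List[str]]:
    """Merge layers whose pairwise similarity reaches the threshold (union-find)."""
    parent = {name: name for name in layers}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(layers, 2):
        if jaccard(layers[u], layers[v]) >= threshold:
            parent[find(u)] = find(v)

    groups: Dict[str, List[str]] = {}
    for name in layers:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def edges(*pairs: str) -> Set[Edge]:
    """Build an undirected edge set from strings such as 'LHR-CDG'."""
    return {frozenset(p.split("-")) for p in pairs}


# Hypothetical airline layers over a shared set of airports.
layers = {
    "airline_a": edges("LHR-CDG", "LHR-FRA", "CDG-FRA"),
    "airline_b": edges("LHR-CDG", "LHR-FRA", "FRA-MAD"),
    "airline_c": edges("JFK-LAX", "LAX-SFO"),
}
groups = group_layers(layers, threshold=0.4)
print(groups)
```

A proper community-detection step would replace the fixed threshold with an optimization over the weighted similarity graph; the threshold is used here only to keep the sketch dependency-free.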
-
Blind Index Coding
Authors:
David T. H. Kao,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We introduce the blind index coding (BIC) problem, in which a single sender communicates distinct messages to multiple users over a shared channel. Each user has partial knowledge of each message as side information. However, unlike classic index coding, in BIC, the sender is uncertain of what side information is available to each user. In particular, the sender only knows the number of bits in each user's side information but not its content. This problem can arise naturally in caching and wireless networks. In order to blindly exploit side information in the BIC problem, we develop a hybrid coding scheme that XORs uncoded bits of a subset of messages with random combinations of bits from other messages. This scheme allows us to strike the right balance between maximizing the transmission rate to each user and minimizing the interference leakage to others. We also develop a general outer bound, which relies on a strong data processing inequality to effectively capture the sender's uncertainty about the users' side information. Additionally, we consider the case where communication takes place over a shared wireless medium, modeled by an erasure broadcast channel, and show that, surprisingly, combining repetition coding with hybrid coding improves the achievable rate region and outperforms alternative strategies of coping with channel erasure while blindly exploiting side information.
Submitted 1 September, 2015; v1 submitted 22 April, 2015;
originally announced April 2015.
-
Rover-to-Orbiter Communication in Mars: Taking Advantage of the Varying Topology
Authors:
Songze Li,
David T. H. Kao,
A. Salman Avestimehr
Abstract:
In this paper, we study the communication problem from rovers on Mars' surface to Mars-orbiting satellites. We first justify that, to a good extent, the rover-to-orbiter communication problem can be modelled as communication over a $2 \times 2$ X-channel with the network topology varying over time. For such a fading X-channel where transmitters are only aware of the time-varying topology but not the time-varying channel state (i.e., no CSIT), we propose coding strategies that code across topologies, and develop upper bounds on the sum degrees-of-freedom (DoF) that are shown to be tight under certain patterns of the topology variation. Furthermore, we demonstrate that the proposed scheme approximately achieves the ergodic sum-capacity of the network. Using the proposed coding scheme, we numerically evaluate the ergodic rate gain over a time-division-multiple-access (TDMA) scheme for Rayleigh and Rice fading channels. We also numerically demonstrate that with practical orbital parameters, a 9.6% DoF gain, as well as more than 11.6% throughput gain, can be achieved for a rover-to-orbiter communication network.
Submitted 10 December, 2015; v1 submitted 19 April, 2015;
originally announced April 2015.
-
Linear Degrees of Freedom of the MIMO X-Channel with Delayed CSIT
Authors:
David T. H. Kao,
A. Salman Avestimehr
Abstract:
We study the degrees of freedom (DoF) of the multiple-input multiple-output X-channel (MIMO XC) with delayed channel state information at the transmitters (delayed CSIT), assuming linear coding strategies at the transmitters. We present two results: 1) the linear sum DoF for MIMO XC with general antenna configurations, and 2) the linear DoF region for MIMO XC with symmetric antennas. The converse for each result is based on developing a novel rank-ratio inequality that characterizes the maximum ratio between the dimensions of received linear subspaces at the two multiple-antenna receivers. The achievability of the linear sum DoF is based on a three-phase strategy, in which during the first two phases only the transmitter with fewer antennas exploits delayed CSIT in order to minimize the dimension of its signal at the unintended receiver. During Phase 3, both transmitters use delayed CSIT to send linear combinations of past transmissions such that each receiver receives a superposition of desired message data and known interference, thus simultaneously serving both receivers. We also derive other linear DoF outer bounds for the MIMO XC that, in addition to the outer bounds from the sum DoF converse and the proposed transmission strategy, allow us to characterize the linear DoF region for symmetric antenna configurations.
Submitted 5 May, 2014;
originally announced May 2014.
-
An Upper Bound on the Capacity of Vector Dirty Paper with Unknown Spin and Stretch
Authors:
David T. H. Kao,
Ashutosh Sabharwal
Abstract:
Dirty paper codes are a powerful tool for combating known interference. However, there is a significant difference between knowing the transmitted interference sequence and knowing the received interference sequence, especially when the channel modifying the interference is uncertain. We present an upper bound on the capacity of a compound vector dirty paper channel where although an additive Gaussian sequence is known to the transmitter, the channel matrix between the interferer and receiver is uncertain but known to lie within a bounded set. Our bound is tighter than previous bounds in the low-SIR regime for the scalar version of the compound dirty paper channel and employs a construction that focuses on the relationship between the dimension of the message-bearing signal and the dimension of the additive state sequence. Additionally, a bound on the high-SNR behavior of the system is established.
Submitted 16 May, 2013;
originally announced May 2013.
-
Two-User Interference Channels with Local Views: On Capacity Regions of TDM-Dominating Policies
Authors:
David T. -H. Kao,
Ashutosh Sabharwal
Abstract:
We study the capacity regions of two-user interference channels where transmitters base their transmission schemes on local views of the channel state. Under the local view model, each transmitter knows only a subset of the four channel gains, which may be mismatched from the other transmitter.
We consider a set of seven local views, and find that for five out of the seven local views, TDM is sufficient to achieve the qualified notion of capacity region for the linear deterministic interference channel which approximates the Gaussian interference channel. For these five local views, the qualified capacity result implies that no policy can achieve a rate point outside the TDM region without inducing a corner case of sub-TDM performance in another channel state. The common trait shared by the two remaining local views - those with the potential to outperform TDM - is transmitter knowledge of the outgoing interference link accompanied by some common knowledge of state, emphasizing their importance in creating opportunities to coordinate usage of more advanced schemes.
Our conclusions are extended to bounded gap characterizations of the capacity region for the Gaussian interference channel.
Submitted 29 July, 2012; v1 submitted 4 October, 2011;
originally announced October 2011.