-
Typhoon: Towards an Effective Task-Specific Masking Strategy for Pre-trained Language Models
Authors:
Muhammed Shahir Abdurrahman,
Hashem Elezabi,
Bruce Changlong Xu
Abstract:
By exploiting the high level of parallelism offered by graphics processing units, transformer architectures have enabled tremendous strides forward in the field of natural language processing. In a traditional masked language model, special MASK tokens prompt the model to gather contextual information from surrounding words in order to restore the originally hidden information. In this paper, we explore a task-specific masking framework for pre-trained large language models that enables superior performance on particular downstream tasks on the datasets in the GLUE benchmark. We develop our own masking algorithm, Typhoon, based on token input gradients, and compare it with standard baselines. We find that Typhoon offers performance competitive with whole-word masking on the MRPC dataset. Our implementation can be found in a public GitHub repository.
Submitted 27 March, 2023;
originally announced March 2023.
-
Matching Markets
Authors:
Andrew Yang,
Bruce Changlong Xu,
Ivan Villa-Renteria
Abstract:
Matching markets are of particular interest in the computer science and economics literature, as they are often used to model real-world settings in which we aim to distribute a limited supply of resources equitably among multiple agents and to determine these distributions efficiently. Although it has been shown that finding market-clearing prices for Fisher markets with indivisible goods is NP-hard, there exist polynomial-time algorithms that compute these prices and allocations when the goods are divisible and the utility functions are linear. We outline a promising research direction toward the development of a market in which buyers' preferences vary according to the bundles of goods allocated to other buyers. Our research aims to elucidate ways in which the theory of matching markets can be extended to account for more complex and often counterintuitive microeconomic phenomena.
Submitted 30 September, 2021;
originally announced September 2021.
-
Separating Circuits: Switching Lemmas and Random Restrictions
Authors:
Bruce Changlong Xu
Abstract:
This was submitted as a final project for CS254B, taught by Li Yang Tan and Tom Knowles. The field of Circuit Complexity utilises careful analysis of Boolean circuits in order to extract meaningful information about a range of complexity classes. In particular, the complexity class $P/\text{poly}$ has played a central role in many of the historical attempts to tackle the problem of whether solution and verification are equivalent, i.e. the central $P$ versus $NP$ problem. Whilst circuits can potentially be easier to analyse than Turing Machines due to the non-uniform nature of their computation (program size is allowed to depend on the input size), it is notoriously hard to establish lower bounds for them. In this report, we touch upon several results published by Hastad, Sipser and Razborov that highlight a dynamic interplay between circuit complexity and many of the central ideas of modern-day complexity theory, and in particular the central importance of Hastad's Switching Lemma.
Submitted 17 September, 2021;
originally announced September 2021.