IRG: Generating Synthetic Relational Databases using GANs

Abstract: There is an overgrowing demand for data sharing in academia and industry. However, such sharing has issues with personal privacy and data confidentiality. One option is to share only synthetically-generated versions of the real data. Generative Adversarial Network (GAN) is a recently-popular technique that can be used for this purpose. Relational databases usually have multiple tables that are related to each other. So far, the use of GANs has essentially focused on generating single tables. This paper presents Incremental Relational Generator (IRG), which uses GANs to synthetically generate interrelated tables. Given an empirical relational database, IRG can generate a synthetic version that can be safely shared. IRG generates the tables in some sequential order. The key idea is to construct a context, based on the tables generated so far, when using a GAN to generate the next table. Experiments with public datasets and private student data show that IRG outperforms state-of-the-art in terms of statistical properties and query results.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2310.04003 [pdf, other]

The Role of Federated Learning in a Wireless World with Foundation Models

Authors: Zihan Chen, Howard H. Yang, Y. C. Tay, Kai Fong Ernest Chong, Tony Q. S. Quek

Abstract: Foundation models (FMs) are general-purpose artificial intelligence (AI) models that have recently enabled multiple brand-new generative AI applications. The rapid advances in FMs serve as an important contextual backdrop for the vision of next-generation wireless networks, where federated learning (FL) is a key enabler of distributed network intelligence. Currently, the exploration of the interplay between FMs and FL is still in its nascent stage. Naturally, FMs are capable of boosting the performance of FL, and FL could also leverage decentralized data and computing resources to assist in the training of FMs. However, the exceptionally high requirements that FMs have for computing resources, storage, and communication overhead would pose critical challenges to FL-enabled wireless networks. In this article, we explore the extent to which FMs are suitable for FL over wireless networks, including a broad overview of research challenges and opportunities. In particular, we discuss multiple new paradigms for realizing future intelligent networks that integrate FMs and FL. We also consolidate several broad research directions associated with these paradigms.

Submitted 7 May, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures, 2 tables. This version has been accepted by IEEE Wireless Communications

arXiv:2201.04353 [pdf, other]

A simple model for citation curve

Authors: Y. C. Tay, Mostafa Rezazad, Hamid Sarbazi-Azad

Abstract: There is considerable interest in the citation count for an author's publications. This has led to many proposals for citation indices for characterizing citation distributions. However, there is so far no tractable model to facilitate the analysis of these distributions and the design of these indices. This paper presents a simple equation for such design and analysis. The equation has three parameters that are calibrated by three geometrical characteristics of a citation distribution. Its simple form makes it tractable. To demonstrate, the equation is used to derive closed-form expressions for various citation indices, analyze the effect of time and identify individual contribution to the Hirsch index for a group.

Submitted 12 January, 2022; originally announced January 2022.

Comments: 13 pages, 19 figures, 2 tables

arXiv:2005.13144 [pdf, other]

A review of analytical performance modeling and its role in computer engineering and science

Authors: Y. C. Tay

Abstract: This article is a review of analytical performance modeling for computer systems. It discusses the motivation for this area of research, examines key issues, introduces some ideas, illustrates how it is applied, and points out a role that it can play in developing Computer Science.

Submitted 26 May, 2020; originally announced May 2020.

Comments: Some parts of this article appeared in "Lessons from Teaching Analytical Performance Modeling", Proc. ICPE Workshop on Education and Practice of Performance Engineering, Mumbai, India (April 2019)

arXiv:1801.03645 [pdf, other]

A tool framework for tweaking features in synthetic datasets

Authors: J. W. Zhang, Y. C. Tay

Abstract: Researchers and developers use benchmarks to compare their algorithms and products. A database benchmark must have a dataset D. To be application-specific, this dataset D should be empirical. However, D may be too small, or too large, for the benchmarking experiments. D must, therefore, be scaled to the desired size. To ensure the scaled D' is similar to D, previous work typically specifies or extracts a fixed set of features F = {F_1, F_2, ..., F_n} from D, then uses F to generate synthetic data for D'. However, this approach (D -> F -> D') becomes increasingly intractable as F gets larger, so a new solution is necessary. Different from existing approaches, this paper proposes ASPECT to scale D to enforce similarity. ASPECT first uses a size-scaler (S0) to scale D to D'. Then the user selects a set of desired features F'_1, ..., F'_n. For each desired feature F'_k, there is a tweaking tool T_k that tweaks D' to make sure D' has the required feature F'_k. ASPECT coordinates the tweaking of T_1, ..., T_n to D', so T_n(...(T_1(D'))...) has the required features F'_1, ..., F'_n. By shifting from D -> F -> D' to D -> D' -> F', data scaling becomes flexible. The user can customise the scaled dataset with their own interested features. Extensive experiments on real datasets show that ASPECT can enforce similarity in the dataset effectively and efficiently.

Submitted 11 January, 2018; originally announced January 2018.

arXiv:1601.06838 [pdf, other]

doi: 10.1109/INFOCOM.2016.7524445

A Utility Optimization Approach to Network Cache Design

Authors: Mostafa Dehghan, Laurent Massoulie, Don Towsley, Daniel Menasche, Y. C. Tay

Abstract: In any caching system, the admission and eviction policies determine which contents are added and removed from a cache when a miss occurs. Usually, these policies are devised so as to mitigate staleness and increase the hit probability. Nonetheless, the utility of having a high hit probability can vary across contents. This occurs, for instance, when service level agreements must be met, or if certain contents are more difficult to obtain than others. In this paper, we propose utility-driven caching, where we associate with each content a utility, which is a function of the corresponding content hit probability. We formulate optimization problems where the objectives are to maximize the sum of utilities over all contents. These problems differ according to the stringency of the cache capacity constraint. Our framework enables us to reverse engineer classical replacement policies such as LRU and FIFO, by computing the utility functions that they maximize. We also develop online algorithms that can be used by service providers to implement various caching policies based on arbitrary utility functions.

Submitted 25 January, 2016; originally announced January 2016.

Comments: IEEE INFOCOM 2016

arXiv:cs/0003072 [pdf, ps, other]

MOO: A Methodology for Online Optimization through Mining the Offline Optimum

Authors: Jason W. H. Lee, Y. C. Tay, Anthony K. H. Tung

Abstract: Ports, warehouses and courier services have to decide online how an arriving task is to be served in order that cost is minimized (or profit maximized). These operators have a wealth of historical data on task assignments; can these data be mined for knowledge or rules that can help the decision-making? MOO is a novel application of data mining to online optimization. The idea is to mine (logged) expert decisions or the offline optimum for rules that can be used for online decisions. It requires little knowledge about the task distribution and cost structure, and is applicable to a wide range of problems. This paper presents a feasibility study of the methodology for the well-known k-server problem. Experiments with synthetic data show that optimization can be recast as classification of the optimum decisions; the resulting heuristic can achieve the optimum for strong request patterns, consistently outperforms other heuristics for weak patterns, and is robust despite changes in cost model.

Submitted 22 March, 2000; originally announced March 2000.

Comments: 12 pages, 4 figures

Report number: Research Report No. 743

ACM Class: F.2.2; H.2.8; F.1.2

Showing 1–7 of 7 results for author: Tay, Y C