-
The characterization of hyper-bent function with multiple trace terms in the extension field
Authors:
Peng Han,
Keli Pu
Abstract:
Bent functions are maximally nonlinear Boolean functions with an even number of variables. They include a subclass, the so-called hyper-bent functions, whose properties are stronger than those of bent functions; a complete classification of hyper-bent functions remains elusive. In this paper, we solve an open problem of Mesnager concerning the hyper-bentness of functions with multiple trace terms via Dillon-like exponents, with coefficients in the extension field $\mathbb{F}_{2^{2m}}$ of the field $\mathbb{F}_{2^{m}}$. By applying Möbius transformations and results on hyperelliptic curves, we characterize the hyper-bentness of these functions over $\mathbb{F}_{2^{2m}}$ for odd $m$.
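As a reminder of the underlying definition, a Boolean function f on n variables (n even) is bent exactly when every Walsh-Hadamard coefficient $W_f(a) = \sum_x (-1)^{f(x) + a \cdot x}$ has absolute value $2^{n/2}$. The Python sketch below checks plain bentness from a truth table using a fast Walsh-Hadamard transform; the function names are illustrative. It does not test hyper-bentness, which additionally requires $f(x^k)$ to be bent for every k coprime to $2^n - 1$ when f is viewed as a function on $\mathbb{F}_{2^n}$.

```python
def walsh_spectrum(truth_table, n):
    """Return [W_f(a) for a in 0..2^n - 1], where W_f(a) = sum_x (-1)^(f(x) XOR a.x).

    truth_table[x] holds f(x) in {0, 1}, with x read as an n-bit integer.
    """
    if len(truth_table) != 2 ** n:
        raise ValueError("truth table must have 2**n entries")
    w = [1 - 2 * bit for bit in truth_table]   # signed form (-1)^f(x)
    h = 1
    while h < len(w):                          # in-place fast Walsh-Hadamard transform
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                u, v = w[j], w[j + h]
                w[j], w[j + h] = u + v, u - v
        h *= 2
    return w


def is_bent(truth_table, n):
    """f is bent iff n is even and |W_f(a)| = 2^(n/2) for every a."""
    if n % 2:
        return False
    target = 2 ** (n // 2)
    return all(abs(c) == target for c in walsh_spectrum(truth_table, n))


# f(x1, x2, x3, x4) = x1*x2 + x3*x4, a classic bent function on 4 variables
tt = [((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1)) for x in range(16)]
print(is_bent(tt, 4))                           # True
print(is_bent([x & 1 for x in range(16)], 4))   # False: a linear function is not bent
```

The transform runs in $O(n \cdot 2^n)$ operations, so exhaustive checks like this are only practical for small n.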
Submitted 2 July, 2024;
originally announced July 2024.
-
VizGroup: An AI-Assisted Event-Driven System for Real-Time Collaborative Programming Learning Analytics
Authors:
Xiaohang Tang,
Sam Wong,
Kevin Pu,
Xi Chen,
Yalong Yang,
Yan Chen
Abstract:
Programming instructors often conduct collaborative learning activities, such as Peer Instruction, to deepen students' understanding and increase their engagement with learning. These activities, however, do not always yield productive outcomes because of the diversity of students' mental models and ineffective collaboration. In this work, we introduce VizGroup, an AI-assisted system that enables programming instructors to easily oversee students' real-time collaborative learning behaviors in large programming courses. VizGroup leverages Large Language Models (LLMs) to recommend event specifications to instructors so that they can simultaneously track, and receive alerts about, key correlation patterns between various collaboration metrics and ongoing coding tasks. We evaluated VizGroup with 12 instructors using a dataset collected from a Peer Instruction activity conducted in a large programming lecture. The results showed that, compared with a version of VizGroup without suggested monitoring units, the version with suggested units helped instructors create additional monitoring units for previously undetected patterns on their own and cover a more diverse range of metrics; it also influenced participants' subsequent strategies for creating notifications.
Submitted 12 April, 2024;
originally announced April 2024.
-
Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs
Authors:
Syed Mekael Wasti,
Ken Q. Pu,
Ali Neshati
Abstract:
The evolution of Large Language Models (LLMs) has showcased remarkable capabilities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex problems. In this paper, we present our efforts toward constructing a framework that can serve as an intermediary between a user and their user interface (UI), enabling dynamic and real-time interactions. We employ a system built on textual semantic mappings of UI components, in the form of annotations. These mappings are stored, parsed, and scaled in a custom data structure that supplements an agent-based prompting backend engine. Employing textual semantic mappings allows each component not only to explain its role to the engine but also to provide expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and subsequently make precise predictions of the user's expected actions. Such an integration turns static user interfaces into highly dynamic and adaptable solutions, opening a new frontier of intelligent and responsive user experiences.
Submitted 16 April, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
DiLogics: Creating Web Automation Programs With Diverse Logics
Authors:
Kevin Pu,
Jim Yang,
Angel Yuan,
Minyi Ma,
Rui Dong,
Xinyu Wang,
Yan Chen,
Tovi Grossman
Abstract:
Knowledge workers frequently encounter repetitive web data entry tasks, such as updating records or placing orders. Web automation increases productivity, but accurately translating tasks into web actions and extending them to new specifications is challenging. Existing tools can automate tasks that follow the same logical trace of UI actions (e.g., entering text in each field in order), but do not support tasks that require different executions depending on varied input conditions. We present DiLogics, a programming-by-demonstration system that uses NLP to help users create web automation programs that handle diverse specifications. DiLogics first semantically segments input data into structured task steps. By recording user demonstrations for each step, DiLogics generalizes the web macros to novel but semantically similar task requirements. Our evaluation showed that non-experts can effectively use DiLogics to create automation programs that fulfill diverse input instructions. DiLogics provides an efficient, intuitive, and expressive method for developing web automation programs that satisfy diverse specifications.
Submitted 18 August, 2023; v1 submitted 10 August, 2023;
originally announced August 2023.
-
Solving Schrodinger equations using physically constrained neural network
Authors:
Kai-Fang Pu,
Hanlin Li,
Hong-Liang Lu,
Long-Gang Pang
Abstract:
Deep neural networks (DNNs) and automatic differentiation have been widely used in computational physics to solve variational problems. When a DNN is used to represent the wave function in variational solutions of quantum many-body problems, various physical constraints have to be built into the network by construction to increase data and learning efficiency. We build the unitary constraint into the variational wave function by using a monotonic neural network to represent the cumulative distribution function (CDF) $F(x) = \int_{-\infty}^{x} \psi^*(x')\,\psi(x')\,dx'$. Using this constrained neural network to represent the variational wave function, we solve Schrodinger equations with automatic differentiation and stochastic gradient descent (SGD) by minimizing the extent to which the trial wave function $\psi(x)$ violates the Schrodinger equation. For several classic problems in quantum mechanics, we obtain ground-state wave functions and energies with very low errors. The method developed in this paper may open a new route to solving nuclear many-body problems.
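Why a monotonic CDF enforces the constraint: if F is non-decreasing with $F(-\infty) = 0$ and $F(+\infty) = 1$, then $|\psi(x)|^2 = F'(x)$ integrates to 1 automatically, whatever the parameters. The NumPy sketch below substitutes a softmax-weighted mixture of logistic sigmoids for the monotonic neural network (a hypothetical parameterization, not the authors' architecture) and takes $\psi = \sqrt{F'}$, which assumes a real, non-negative wave function such as a 1D ground state. It omits the Schrodinger-equation loss and the SGD training loop.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_cdf(logits, centers, log_scales):
    """Build a monotone CDF F(x) = sum_k w_k * sigmoid((x - b_k) / s_k) and its derivative.

    Hypothetical stand-in for a monotonic neural network: the softmax weights w_k are
    positive and sum to 1, and s_k = exp(log_scales) > 0, so for any parameter values
    F is non-decreasing with F(-inf) = 0 and F(+inf) = 1.
    """
    w = np.exp(logits - logits.max())
    w /= w.sum()
    s = np.exp(log_scales)

    def F(x):
        z = (x[:, None] - centers) / s
        return sigmoid(z) @ w

    def density(x):
        # |psi(x)|^2 = dF/dx, using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
        z = (x[:, None] - centers) / s
        sig = sigmoid(z)
        return (sig * (1.0 - sig) / s) @ w

    return F, density


# Arbitrary example parameters: two sigmoid components
F, density = make_cdf(np.array([0.0, 1.0]), np.array([-1.0, 0.5]), np.array([0.0, -0.5]))
x = np.linspace(-40.0, 40.0, 200001)
psi = np.sqrt(density(x))                  # real, non-negative trial wave function
norm = np.sum(psi**2) * (x[1] - x[0])      # Riemann-sum estimate of the norm
print(norm)                                # close to 1 for any parameter choice
```

In a training setup, the parameters would be updated by gradient descent on a Schrodinger-equation residual, and normalization would never need to be enforced by a penalty term.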
Submitted 7 March, 2023;
originally announced March 2023.
-
Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind
Authors:
Mo Yu,
Qiu**g Wang,
Shunchi Zhang,
Yisi Sang,
Kangsheng Pu,
Zekai Wei,
Han Wang,
Liyan Xu,
**g Li,
Yue Yu,
Jie Zhou
Abstract:
When reading a story, humans can quickly understand new fictional characters from a few observations, mainly by drawing analogies to fictional and real people they already know. This reflects the few-shot and meta-learning nature of how humans infer characters' mental states, i.e., theory-of-mind (ToM), which existing research has largely ignored. We fill this gap with a novel NLP dataset, ToM-in-AMC, the first assessment of machines' meta-learning of ToM in a realistic narrative understanding scenario. Our dataset consists of about 1,000 parsed movie scripts, each corresponding to a few-shot character understanding task that requires models to mimic humans' ability to quickly grasp characters from the first few scenes of a new movie.
We propose a novel ToM prompting approach designed to explicitly assess the influence of multiple ToM dimensions. It surpasses existing baseline models, underscoring the importance of modeling multiple ToM dimensions for our task. Our extensive human study verifies that humans can solve our problem by inferring characters' mental states based on movies they have previously seen. In comparison, our systems based on either state-of-the-art large language models (GPT-4) or meta-learning algorithms lag more than 20% behind, highlighting a notable limitation in the ToM capabilities of existing approaches.
Submitted 2 February, 2024; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Incremental Information Gain Mining Of Temporal Relational Streams
Authors:
Ken Pu,
Limin Ma
Abstract:
This paper studies the problem of mining relational tables for data values with high information gain. High information gain can help data analysts and secondary data mining algorithms gain insight into strong statistical dependencies and causal relationships between key metrics. We study the problem of identifying high-information-gain values in scenarios involving temporal relations, where new records are continuously added. We show that information gain can be efficiently maintained incrementally, making it possible to continuously monitor high-information-gain values.
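Incremental maintenance is possible because information gain depends on the data only through value-label counts, so each new record costs a constant number of counter updates. The sketch below assumes, as a hypothetical formulation, that the gain of a data value v of attribute A is the information gain of the indicator [A = v] with respect to a target attribute Y; the paper's exact definitions, its handling of temporal relations, and any pruning strategy are not reproduced, and all names are illustrative.

```python
import math
from collections import Counter, defaultdict


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


class IncrementalInfoGain:
    """Counts for one attribute A and a target Y, updated as records arrive."""

    def __init__(self):
        self.n = 0                          # total records seen
        self.label_counts = Counter()       # n(y)
        self.joint = defaultdict(Counter)   # n(v, y) for each value v of A

    def insert(self, value, label):
        """Constant-time update for one new record; older records are never rescanned."""
        self.n += 1
        self.label_counts[label] += 1
        self.joint[value][label] += 1

    def gain(self, value):
        """IG(Y; [A == value]) = H(Y) - H(Y | A == value or A != value)."""
        h_y = entropy(self.label_counts.values(), self.n)
        inside = self.joint.get(value, Counter())
        n_in = sum(inside.values())
        n_out = self.n - n_in
        outside = [self.label_counts[y] - inside[y] for y in self.label_counts]
        h_cond = 0.0
        if n_in:
            h_cond += n_in / self.n * entropy(inside.values(), n_in)
        if n_out:
            h_cond += n_out / self.n * entropy(outside, n_out)
        return h_y - h_cond

    def top_values(self, k):
        """The k values of A with the highest current information gain."""
        return sorted(self.joint, key=self.gain, reverse=True)[:k]


ig = IncrementalInfoGain()
for value, label in [("a", "yes"), ("a", "yes"), ("b", "no"), ("b", "no"), ("c", "yes")]:
    ig.insert(value, label)
print(ig.top_values(1))   # ['b']: A == 'b' splits the labels perfectly
```

Recomputing a single gain costs time proportional to the number of labels, independent of how many records have arrived.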
Submitted 11 June, 2022;
originally announced June 2022.
-
Data Lake Organization
Authors:
Fatemeh Nargesian,
Ken Q. Pu,
Bahar Ghadiri Bashardoost,
Erkang Zhu,
Renée J. Miller
Abstract:
We consider the problem of creating a navigation structure that allows a user to navigate a data lake most effectively. We define an organization as a graph whose nodes represent sets of attributes within a data lake and whose edges indicate subset relationships among nodes. We present a new probabilistic model of how users interact with an organization and define the likelihood of a user finding a table using the organization. We formulate the data lake organization problem as finding an organization that maximizes the expected probability of discovering tables by navigating it. We propose an approximate algorithm for this problem and show its effectiveness both on real data lakes containing data from open data portals and on benchmarks that emulate the observed characteristics of real data lakes. Through a formal user study, we show that navigation can help users discover relevant tables that cannot be found by keyword search. In addition, in our study, 42% of users preferred navigation and 58% preferred keyword search, suggesting that the two are complementary and useful modalities for data discovery in data lakes. Our experiments show that data lake organizations take into account the data lake distribution and outperform both an existing hand-curated taxonomy and a common baseline organization.
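The idea of scoring an organization by discovery probability can be illustrated with a small sketch. It assumes, hypothetically, that every node carries a topic vector, that a user at a node moves to a child with probability softmax(gamma times the cosine similarity between the child and the sought table), and that a table's discovery probability is the total probability over all root-to-table paths in the DAG. The transition rule, the parameter gamma, and all function names are illustrative assumptions, not the paper's exact model.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def transition_probs(children, query_vec, topic_vec, gamma=10.0):
    """Hypothetical user model: pick a child with softmax(gamma * similarity)."""
    sims = [cosine(topic_vec[c], query_vec) for c in children]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(gamma * (s - m)) for s in sims]
    z = sum(exps)
    return {c: e / z for c, e in zip(children, exps)}


def reach_probability(graph, root, target, query_vec, topic_vec, gamma=10.0):
    """Probability that a user starting at root ends at target in a DAG.

    Sums, over every root-to-target path, the product of transition probabilities;
    memoization visits each node's outgoing edges only once.
    """
    memo = {}

    def prob_from(node):
        if node == target:
            return 1.0
        if node not in memo:
            children = graph.get(node, [])
            if not children:
                memo[node] = 0.0
            else:
                p = transition_probs(children, query_vec, topic_vec, gamma)
                memo[node] = sum(p[c] * prob_from(c) for c in children)
        return memo[node]

    return prob_from(root)


def organization_score(graph, root, tables, topic_vec, gamma=10.0):
    """Expected probability of discovering a table, averaged over all tables."""
    total = sum(reach_probability(graph, root, t, topic_vec[t], topic_vec, gamma) for t in tables)
    return total / len(tables)


# Toy organization: root -> {A, B}, A -> t1, B -> t2
graph = {"root": ["A", "B"], "A": ["t1"], "B": ["t2"]}
topic = {"A": (1.0, 0.0), "B": (0.0, 1.0), "t1": (1.0, 0.0), "t2": (0.0, 1.0)}
print(organization_score(graph, "root", ["t1", "t2"], topic))
```

An optimizer for the organization problem would search over graph structures to increase this score; that search is not shown.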
Submitted 2 March, 2020; v1 submitted 17 December, 2018;
originally announced December 2018.
-
LSH Ensemble: Internet-Scale Domain Search
Authors:
Erkang Zhu,
Fatemeh Nargesian,
Ken Q. Pu,
Renée J. Miller
Abstract:
We study the problem of domain search where a domain is a set of distinct values from an unspecified universe. We use Jaccard set containment, defined as $|Q \cap X|/|Q|$, as the relevance measure of a domain $X$ to a query domain $Q$. Our choice of Jaccard set containment over Jaccard similarity makes our work particularly suitable for searching Open Data and data on the web, as Jaccard similarity is known to have poor performance over sets with large differences in their domain sizes. We demonstrate that the domains found in several real-life Open Data and web data repositories show a power-law distribution over their domain sizes.
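The difference between the two measures is easiest to see with a small query domain checked against a much larger domain that contains it entirely. The sketch below uses plain Python sets; the function names are illustrative.

```python
def containment(q, x):
    """Jaccard set containment |Q ∩ X| / |Q| of domain X with respect to query Q."""
    return len(q & x) / len(q)


def jaccard(q, x):
    """Jaccard similarity |Q ∩ X| / |Q ∪ X|."""
    return len(q & x) / len(q | x)


query = set(range(10))          # a small query domain
big = set(range(1000))          # a large domain that fully contains the query
print(containment(query, big))  # 1.0
print(jaccard(query, big))      # 0.01
```

Containment reports a perfect match, while Jaccard similarity is dragged toward zero purely by the size difference, which is the skew problem described above.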
We present a new index structure, Locality Sensitive Hashing (LSH) Ensemble, that solves the domain search problem using set containment at Internet scale. Our index structure and search algorithm cope with the data volume and skew by means of data sketches (MinHash) and domain partitioning. Our index structure does not assume a prescribed set of values. We construct a cost model that describes the accuracy of LSH Ensemble with any given partitioning. This allows us to formulate the partitioning for LSH Ensemble as an optimization problem. We prove that there exists an optimal partitioning for any distribution. Furthermore, for datasets following a power-law distribution, as observed in Open Data and Web data corpora, we show that the optimal partitioning can be approximated using equi-depth, making it efficient to use in practice.
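Partitioning matters because a containment threshold cannot be turned into a single Jaccard threshold without knowing |X|. Since $J = |Q \cap X| / (|Q| + |X| - |Q \cap X|)$ decreases as |X| grows, an upper bound u on domain sizes within a partition yields a conservative Jaccard threshold for that partition. The sketch below shows this conversion together with equi-depth partitioning of domain sizes. It assumes this per-partition upper-bound conversion, omits the MinHash signatures and LSH banding entirely, and all names are hypothetical.

```python
def jaccard_threshold(t, q, u):
    """Lower bound on Jaccard for any X with |X| <= u and containment >= t.

    With |Q| = q and |Q ∩ X| >= t*q: J = |Q∩X| / (q + |X| - |Q∩X|) >= t*q / (q + u - t*q).
    """
    return t * q / (q + u - t * q)


def containment_from_jaccard(j, q, x):
    """Invert Jaccard back to containment when |Q| = q and |X| = x are known."""
    return j * (q + x) / (q * (1 + j))


def equi_depth_partitions(sizes, k):
    """Split sorted domain sizes into k groups holding (nearly) equal counts."""
    s = sorted(sizes)
    n = len(s)
    parts = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        if lo < hi:
            parts.append((s[lo], s[hi - 1]))  # (lower, upper) size bound
    return parts


def probe_plan(sizes, k, t, q):
    """Per-partition Jaccard thresholds to use when querying with containment t."""
    return [(lo, up, jaccard_threshold(t, q, up)) for lo, up in equi_depth_partitions(sizes, k)]


sizes = [8, 1, 7, 2, 6, 3, 5, 4]
for lower, upper, threshold in probe_plan(sizes, k=2, t=0.5, q=4):
    print(lower, upper, round(threshold, 3))
```

Tighter partitions make u closer to each domain's true size, so the converted threshold admits fewer false positives; equi-depth keeps the number of domains per partition balanced.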
We evaluate our algorithm using real data (Canadian Open Data and WDC Web Tables) containing over 262 million domains. The experiments demonstrate that our index consistently outperforms other leading alternatives in accuracy and performance. The improvements are most dramatic for data with large skew in domain sizes. Even at 262 million domains, our index sustains query response times under 3 seconds.
Submitted 23 July, 2016; v1 submitted 23 March, 2016;
originally announced March 2016.