
Showing 1–14 of 14 results for author: Termehchy, A

  1. arXiv:2402.17926 [pdf, other]

    stat.ML cs.DB cs.LG

    Certain and Approximately Certain Models for Statistical Learning

    Authors: Cheng Zhen, Nischal Aryal, Arash Termehchy, Alireza Aghasi, Amandeep Singh Chabada

    Abstract: Real-world data is often incomplete and contains missing values. To train accurate models over real-world datasets, users need to spend a substantial amount of time and resources imputing and finding proper values for missing data items. In this paper, we demonstrate that it is possible to learn accurate models directly from data with missing values for certain training data and target models. We…

    Submitted 1 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: A technical report for a paper to appear at SIGMOD 2024

  2. arXiv:2312.15472 [pdf, ps, other]

    cs.DB cs.CL

    Towards Consistent Language Models Using Declarative Constraints

    Authors: Jasmin Mousavi, Arash Termehchy

    Abstract: Large language models have shown unprecedented abilities in generating linguistically coherent and syntactically correct natural language output. However, they often return incorrect and inconsistent answers to input questions. Due to the complexity and uninterpretability of the internally learned representations, it is challenging to modify language models such that they provide correct and consi…

    Submitted 24 December, 2023; originally announced December 2023.

  3. arXiv:2312.14291 [pdf, other]

    cs.DB cs.MA

    Multi-Agent Join

    Authors: Vahid Ghadakchi, Mian Xie, Arash Termehchy, Bakhtiyar Doskenov, Bharghav Srikhakollu, Summit Haque, Huazheng Wang

    Abstract: It is crucial to provide real-time performance in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. It is challenging to deliver such results over large datasets for relational operators over multiple relations, such as join. Join algorithms usually spend a long time on scanning and attempting to join…

    Submitted 21 December, 2023; originally announced December 2023.

  4. arXiv:2312.09407 [pdf, other]

    cs.HC

    How Does User Behavior Evolve During Exploratory Visual Analysis?

    Authors: Sanad Saha, Nischal Aryal, Leilani Battle, Arash Termehchy

    Abstract: Exploratory visual analysis (EVA) is an essential stage of the data science pipeline, where users often lack clear analysis goals at the start and iteratively refine them as they learn more about their data. Accurate models of users' exploration behavior are becoming increasingly vital to developing responsive and personalized tools for exploratory visual analysis. Yet we observe a discrepancy bet…

    Submitted 14 December, 2023; originally announced December 2023.

  5. arXiv:2109.07127 [pdf, ps, other]

    cs.DB

    A Survey on Data Cleaning Methods for Improved Machine Learning Model Performance

    Authors: Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

    Abstract: Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done manually with data wrangling tools, or it can be completed automatically with a computer program. Data cleaning entails a slew of procedures that, once done, make th…

    Submitted 15 September, 2021; originally announced September 2021.

  6. arXiv:2004.02308 [pdf, other]

    cs.DB cs.LG

    Learning Over Dirty Data Without Cleaning

    Authors: Jose Picado, John Davis, Arash Termehchy, Ga Young Lee

    Abstract: Real-world datasets are dirty and contain many errors. Examples of these issues are violations of integrity constraints, duplicates, and inconsistencies in representing data values and entities. Learning over dirty databases may result in inaccurate models. Users have to spend a great deal of time and effort to repair data errors and create a clean database for learning. Moreover, as the informati…

    Submitted 5 April, 2020; originally announced April 2020.

    Comments: To be published in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD'20)

  7. arXiv:1911.11184 [pdf, other]

    cs.DB cs.PL

    Managing Variability in Relational Databases by VDBMS

    Authors: Parisa Ataei, Qiaoran Li, Eric Walkingshaw, Arash Termehchy

    Abstract: Variability inherently exists in databases in various contexts which creates database variants. For example, variants of a database could have different schemas/content (database evolution problem), variants of a database could root from different sources (data integration problem), variants of a database could be deployed differently for specific application domain (deploying a database for diffe…

    Submitted 25 November, 2019; originally announced November 2019.

    Comments: 15 pages, 11 figures

  8. arXiv:1910.10263 [pdf, other]

    cs.DB

    Integrating Information About Entities Progressively

    Authors: Ben McCamish, Christopher Buss, Arash Termehchy, David Maier

    Abstract: Users often have to integrate information about entities from multiple data sources. This task is challenging as each data source may represent information about the same entity in a distinct form, e.g., each data source may use a different name for the same person. Currently, data from different representations are translated into a unified one via lengthy and costly expert attention and tuning.…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: demonstration

  9. arXiv:1710.01420 [pdf, other]

    cs.DB cs.LG

    Usable & Scalable Learning Over Relational Data With Automatic Language Bias

    Authors: Jose Picado, Arash Termehchy, Sudhanshu Pathak, Alan Fern, Praveen Ilango, Yunqiao Cai

    Abstract: Relational databases are valuable resources for learning novel and interesting relations and concepts. In order to constrain the search through the large space of candidate definitions, users must tune the algorithm by specifying a language bias. Unfortunately, specifying the language bias is done via trial and error and is guided by the expert's intuitions. We propose AutoBias, a system that lev…

    Submitted 6 April, 2020; v1 submitted 3 October, 2017; originally announced October 2017.

  10. arXiv:1603.04068 [pdf, other]

    cs.DB cs.AI

    A Signaling Game Approach to Databases Querying and Interaction

    Authors: Ben McCamish, Vahid Ghadakchi, Arash Termehchy, Behrouz Touri

    Abstract: As most users do not precisely know the structure and/or the content of databases, their queries do not exactly reflect their information needs. The database management systems (DBMS) may interact with users and use their feedback on the returned results to learn the information needs behind their queries. Current query interfaces assume that users do not learn and modify the way they express…

    Submitted 4 May, 2018; v1 submitted 13 March, 2016; originally announced March 2016.

    Comments: 21 pages

  11. arXiv:1508.03846 [pdf, other]

    cs.DB cs.AI cs.LG cs.LO

    Schema Independent Relational Learning

    Authors: Jose Picado, Arash Termehchy, Alan Fern, Parisa Ataei

    Abstract: Learning novel concepts and relations from relational databases is an important problem with many applications in database systems and machine learning. Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. Nevertheless, the same data set may be represented under different schemas for various reasons, such as efficiency, data quality,…

    Submitted 6 November, 2017; v1 submitted 16 August, 2015; originally announced August 2015.

  12. arXiv:1508.03763 [pdf, other]

    cs.DB

    Structural Generalizability: The Case of Similarity Search

    Authors: Yodsawalai Chodpathumwan, Arash Termehchy, Stephen A. Ramsey, Aayam Shresta, Amy Glen, Zheng Liu

    Abstract: Graph similarity search algorithms usually leverage the structural properties of a database. Hence, these algorithms are effective only on some structural variations of the data and are ineffective on other forms, which makes them hard to use. Ideally, one would like to design a data analytics algorithm that is structurally robust, i.e., it returns essentially the same accurate results over all po…

    Submitted 31 March, 2021; v1 submitted 15 August, 2015; originally announced August 2015.

  13. arXiv:1503.05656 [pdf, other]

    cs.DB

    Cost-Effective Conceptual Design Using Taxonomies

    Authors: Ali Vakilian, Yodsawalai Chodpathumwan, Arash Termehchy, Amir Nayyeri

    Abstract: It is known that annotating named entities in unstructured and semi-structured data sets by their concepts improves the effectiveness of answering queries over these data sets. As every enterprise has a limited budget of time or computational resources, it has to annotate a subset of concepts in a given domain whose costs of annotation do not exceed the budget. We call such a subset of concepts a…

    Submitted 6 January, 2018; v1 submitted 19 March, 2015; originally announced March 2015.

  14. arXiv:1409.2553 [pdf, other]

    cs.DB

    Representation Independent Analytics Over Structured Data

    Authors: Yodsawalai Chodpathumwan, Jose Picado, Arash Termehchy, Alan Fern, Yizhou Sun

    Abstract: Database analytics algorithms leverage quantifiable structural properties of the data to predict interesting concepts and relationships. The same information, however, can be represented using many different structures and the structural properties observed over particular representations do not necessarily hold for alternative structures. Thus, there is no guarantee that current database analytic…

    Submitted 8 September, 2014; originally announced September 2014.