Data Agnostic RoBERTa-based Natural Language to SQL Query Generation

Pal, Debaditya; Sharma, Harsh; Chaudhari, Kaustubh

arXiv:2010.05243 (cs)

[Submitted on 11 Oct 2020 (v1), last revised 5 Mar 2021 (this version, v3)]

Abstract: Relational databases are among the most widely used architectures for storing massive amounts of data. However, there is a barrier between these databases and the average user, who often lacks the knowledge of a query language such as SQL that is required to interact with them. The NL2SQL task seeks deep learning approaches that solve this problem by converting natural language questions into valid SQL queries. Given the sensitive nature of some databases and the growing need for data privacy, we present an approach with data privacy at its core. We pass RoBERTa embeddings and data-agnostic knowledge vectors into LSTM-based submodels to predict the final query. Although we do not achieve state-of-the-art results, we eliminate the need for table data from training onward and achieve a test-set execution accuracy of 76.7%. By removing the dependency on table data during training, we obtain a model capable of zero-shot learning from the natural language question and table schema alone.
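
The abstract does not spell out how the data-agnostic knowledge vectors are built. As a rough illustration of what a schema-only feature could look like, the sketch below flags question tokens that also occur in a column header, and headers that are mentioned in the question, without reading any table rows. The function names, the tokenizer, and the binary encoding are assumptions made for this example and are not taken from the paper.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_header_marks(question: str, headers: List[str]) -> List[int]:
    """Return one flag per question token: 1 if it occurs in any header, else 0.

    Only the schema (column names) is consulted; no cell values are read.
    """
    header_tokens = {tok for h in headers for tok in tokenize(h)}
    return [1 if tok in header_tokens else 0 for tok in tokenize(question)]


def header_question_marks(question: str, headers: List[str]) -> List[int]:
    """Return one flag per header: 1 if any of its tokens occurs in the question."""
    question_tokens = set(tokenize(question))
    return [1 if question_tokens & set(tokenize(h)) else 0 for h in headers]


if __name__ == "__main__":
    # Hypothetical question and schema, invented for this illustration.
    question = "What is the points total for the player from Toronto?"
    headers = ["Player", "Team", "Points total"]
    print(question_header_marks(question, headers))
    print(header_question_marks(question, headers))
```

Because the value "Toronto" could only be matched by looking inside the table, it stays unmarked here; that is the trade-off a schema-only feature accepts in exchange for never touching the data.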

Comments:	8 Pages, 2 figures
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2010.05243 [cs.AI]
	(or arXiv:2010.05243v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2010.05243

Submission history

From: Debaditya Pal
[v1] Sun, 11 Oct 2020 13:18:46 UTC (164 KB)
[v2] Mon, 30 Nov 2020 06:29:58 UTC (164 KB)
[v3] Fri, 5 Mar 2021 05:55:10 UTC (167 KB)
