Search | arXiv e-print repository

Aalap: AI Assistant for Legal & Paralegal Functions in India

Authors: Aman Tiwari, Prathamesh Kalamkar, Atreyo Banerjee, Saurabh Karn, Varun Hemachandran, Smita Gupta

Abstract: Using proprietary Large Language Models on legal tasks poses challenges due to data privacy issues, domain data heterogeneity, domain knowledge sophistication, and domain objectives uniqueness. We created Aalalp, a fine-tuned Mistral 7B model on instructions data related to specific Indian legal tasks. The performance of Aalap is better than gpt-3.5-turbo in 31\% of our test data and obtains an eq… ▽ More Using proprietary Large Language Models on legal tasks poses challenges due to data privacy issues, domain data heterogeneity, domain knowledge sophistication, and domain objectives uniqueness. We created Aalalp, a fine-tuned Mistral 7B model on instructions data related to specific Indian legal tasks. The performance of Aalap is better than gpt-3.5-turbo in 31\% of our test data and obtains an equivalent score in 34\% of the test data as evaluated by GPT4. Training Aalap mainly focuses on teaching legal reasoning rather than legal recall. Aalap is definitely helpful for the day-to-day activities of lawyers, judges, or anyone working in legal systems. △ Less

Submitted 30 January, 2024; originally announced February 2024.

arXiv:2304.09548 [pdf, other]

SemEval 2023 Task 6: LegalEval - Understanding Legal Texts

Authors: Ashutosh Modi, Prathamesh Kalamkar, Saurabh Karn, Aman Tiwari, Abhinav Joshi, Sai Kiran Tanikella, Shouvik Kumar Guha, Sachin Malhan, Vivek Raghavan

Abstract: In populous countries, pending legal cases have been growing exponentially. There is a need for develo** NLP-based techniques for processing and automatically understanding legal documents. To promote research in the area of Legal NLP we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about… ▽ More In populous countries, pending legal cases have been growing exponentially. There is a need for develo** NLP-based techniques for processing and automatically understanding legal documents. To promote research in the area of Legal NLP we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units, Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document and Task-C (Court Judgement Prediction with Explanation) explores the possibility of automatically predicting the outcome of a legal case along with providing an explanation for the prediction. In total 26 teams (approx. 100 participants spread across the world) submitted systems paper. In each of the sub-tasks, the proposed systems outperformed the baselines; however, there is a lot of scope for improvement. This paper describes the tasks, and analyzes techniques proposed by various teams. △ Less

Submitted 1 May, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: 13 Pages (9 Pages + References), Accepted at SemEval 2023 at ACL 2023

arXiv:2211.03442 [pdf, other]

Named Entity Recognition in Indian court judgments

Authors: Prathamesh Kalamkar, Astha Agarwal, Aman Tiwari, Smita Gupta, Saurabh Karn, Vivek Raghavan

Abstract: Identification of named entities from legal texts is an essential building block for develo** other legal Artificial Intelligence applications. Named Entities in legal texts are slightly different and more fine-grained than commonly used named entities like Person, Organization, Location etc. In this paper, we introduce a new corpus of 46545 annotated legal named entities mapped to 14 legal enti… ▽ More Identification of named entities from legal texts is an essential building block for develo** other legal Artificial Intelligence applications. Named Entities in legal texts are slightly different and more fine-grained than commonly used named entities like Person, Organization, Location etc. In this paper, we introduce a new corpus of 46545 annotated legal named entities mapped to 14 legal entity types. The Baseline model for extracting legal named entities from judgment text is also developed. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: to be published in NLLP 2022 Workshop at EMNLP

arXiv:2201.13125 [pdf, other]

Corpus for Automatic Structuring of Legal Documents

Authors: Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan, Ashutosh Modi

Abstract: In populous countries, pending legal cases have been growing exponentially. There is a need for develo** techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated… ▽ More In populous countries, pending legal cases have been growing exponentially. There is a need for develo** techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated with a label coming from a list of pre-defined Rhetorical Roles. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. Further, we show the application of rhetorical roles to improve performance on the tasks of summarization and legal judgment prediction. We release the corpus and baseline model code along with the paper. △ Less

Submitted 19 September, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: Accepted at LREC 2022, 10 Pages (8 page main paper + 2 page references)

arXiv:2107.06056 [pdf, other]

Indian Legal NLP Benchmarks : A Survey

Authors: Prathamesh Kalamkar, Janani Venugopalan Ph. D., Vivek Raghavan Ph. D

Abstract: Availability of challenging benchmarks is the key to advancement of AI in a specific field.Since Legal Text is significantly different than normal English text, there is a need to create separate Natural Language Processing benchmarks for Indian Legal Text which are challenging and focus on tasks specific to Legal Systems. This will spur innovation in applications of Natural language Processing fo… ▽ More Availability of challenging benchmarks is the key to advancement of AI in a specific field.Since Legal Text is significantly different than normal English text, there is a need to create separate Natural Language Processing benchmarks for Indian Legal Text which are challenging and focus on tasks specific to Legal Systems. This will spur innovation in applications of Natural language Processing for Indian Legal Text and will benefit AI community and Legal fraternity. We review the existing work in this area and propose ideas to create new benchmarks for Indian Legal Natural Language Processing. △ Less

Submitted 13 July, 2021; originally announced July 2021.

Showing 1–5 of 5 results for author: Kalamkar, P