-
Aalap: AI Assistant for Legal & Paralegal Functions in India
Authors:
Aman Tiwari,
Prathamesh Kalamkar,
Atreyo Banerjee,
Saurabh Karn,
Varun Hemachandran,
Smita Gupta
Abstract:
Using proprietary Large Language Models on legal tasks poses challenges due to data privacy issues, the heterogeneity of domain data, the sophistication of domain knowledge, and the uniqueness of domain objectives. We created Aalap, a Mistral 7B model fine-tuned on instruction data related to specific Indian legal tasks. As evaluated by GPT-4, Aalap performs better than gpt-3.5-turbo on 31% of our test data and obtains an equivalent score on 34% of the test data. Aalap's training focuses mainly on teaching legal reasoning rather than legal recall. Aalap is definitely helpful for the day-to-day activities of lawyers, judges, or anyone working in legal systems.
Submitted 30 January, 2024;
originally announced February 2024.
-
SemEval 2023 Task 6: LegalEval - Understanding Legal Texts
Authors:
Ashutosh Modi,
Prathamesh Kalamkar,
Saurabh Karn,
Aman Tiwari,
Abhinav Joshi,
Sai Kiran Tanikella,
Shouvik Kumar Guha,
Sachin Malhan,
Vivek Raghavan
Abstract:
In populous countries, pending legal cases have been growing exponentially. There is a need for developing NLP-based techniques for processing and automatically understanding legal documents. To promote research in the area of Legal NLP, we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. The LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units; Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document; and Task-C (Court Judgement Prediction with Explanation) explores the possibility of automatically predicting the outcome of a legal case along with an explanation for the prediction. In total, 26 teams (approx. 100 participants spread across the world) submitted system papers. In each of the sub-tasks, the proposed systems outperformed the baselines; however, there is still considerable scope for improvement. This paper describes the tasks and analyzes the techniques proposed by the various teams.
Submitted 1 May, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Named Entity Recognition in Indian court judgments
Authors:
Prathamesh Kalamkar,
Astha Agarwal,
Aman Tiwari,
Smita Gupta,
Saurabh Karn,
Vivek Raghavan
Abstract:
Identification of named entities from legal texts is an essential building block for developing other legal Artificial Intelligence applications. Named entities in legal texts are slightly different from, and more fine-grained than, commonly used named entities such as Person, Organization, and Location. In this paper, we introduce a new corpus of 46,545 annotated legal named entities mapped to 14 legal entity types. We also develop a baseline model for extracting legal named entities from judgment text.
Submitted 7 November, 2022;
originally announced November 2022.
-
Corpus for Automatic Structuring of Legal Documents
Authors:
Prathamesh Kalamkar,
Aman Tiwari,
Astha Agarwal,
Saurabh Karn,
Smita Gupta,
Vivek Raghavan,
Ashutosh Modi
Abstract:
In populous countries, pending legal cases have been growing exponentially. There is a need for developing techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated with a label from a predefined list of Rhetorical Roles. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. Further, we show the application of rhetorical roles to improve performance on the tasks of summarization and legal judgment prediction. We release the corpus and baseline model code along with the paper.
Submitted 19 September, 2022; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Indian Legal NLP Benchmarks : A Survey
Authors:
Prathamesh Kalamkar,
Janani Venugopalan, Ph.D.,
Vivek Raghavan, Ph.D.
Abstract:
The availability of challenging benchmarks is key to the advancement of AI in a specific field. Since legal text is significantly different from general English text, there is a need to create separate Natural Language Processing benchmarks for Indian legal text that are challenging and focus on tasks specific to legal systems. This will spur innovation in applications of Natural Language Processing for Indian legal text and will benefit both the AI community and the legal fraternity. We review the existing work in this area and propose ideas for creating new benchmarks for Indian Legal Natural Language Processing.
Submitted 13 July, 2021;
originally announced July 2021.