ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation

Bhabesh Mali*, Karthik Maddala*, Vatsal Gupta*, Sweeya Reddy*, Chandan Karfa*, Ramesh Karri†
*Indian Institute of Technology Guwahati, India
†New York University, USA
 *{m.bhabesh, k.maddala, g.vatsal, r.poddutoori, ckarfa}@iitg.ac.in, †[email protected]
Abstract

System Verilog Assertion (SVA) formulation- a critical yet complex task is a prerequisite in the Assertion Based Verification (ABV) process. Traditionally, SVA formulation involves expert-driven interpretation of specifications, which is time-consuming and prone to human error. Recently, LLM-informed automatic assertion generation is gaining interest. We designed a novel framework called ChIRAAG, based on OpenAI GPT4, to generate SVA from natural language specifications of a design. ChIRAAG constitutes the systematic breakdown of design specifications into a standardized format, further generating assertions from formatted specifications using LLM. Furthermore, we used few test cases to validate the LLM-generated assertions. Automatic feedback of log messages from the simulation tool to the LLM ensures that the framework can generate correct SVAs. In our experiments, only 27% of LLM-generated raw assertions had errors, which was rectified in few iterations based on the simulation log. Our results on OpenTitan designs show that LLMs can streamline and assist engineers in the assertion generation process, resha** verification workflows.

Index Terms:
Assertion Generation, LLM, Formal Verification

I Introduction

Pre-silicon bug detection is a crucial step in designing functionally correct hardware. As per the functional verification study conducted by Wilson Research Group in 2020 [1], over 50% of the overall development resources and expenditures in Application-Specific Integrated Circuit (ASIC) and Field-Programmable Gate Array (FPGA) systems were allocated to the verification process. Assertion Based Verification (ABV) validates the correct design implementation by integrating safety and liveness properties, expressed as assertions, throughout the development process. Safety property states “something bad does not happen”. While liveness property states “something good will eventually happen” [2]. However, as the scale of design and verification requirements increases, so does the number of assertions, which can reach hundreds. Identifying and writing manually all assertions of a design is a non-trivial task.

Large Language Models (LLMs) have evolved to assist humans in solving complex problems. LLMs can also help in ABV by automatically generating the assertions from the natural language specification of a design. The existing methods have relied on generating SVA from Verilog code [3], comments, and assertion-based examples [4]. None of them could generate SVA directly from the specifications. This is the motivation for this study to formulate a novel framework that reduces human effort and time in SVA formulation. This work aims to analyze how LLMs can assist SVA generation semi-automatically from the natural language specifications. In our framework, manual intervention is needed when there is a potential bug in the implementation or the LLM cannot fix the issue in the generated assertion(s) based on the simulation log. We seek to answer two key questions through this work: RQ1: How good are LLMs in generating System Verilog Assertion (SVA) from natural language specification? RQ2: How functionally correct are the generated assertions within the design and verification framework?.

Our contributions are fourfold.

  • A framework to study the use of LLM in assertion generation for ABV.

  • The standardization of the format for specification and a prompting method to generate SVA.

  • An analysis of the performance of LLMs in generating SVA for OpenTitan designs [5].

  • Recommendations of LLMs for assertion generation.

The rest of the paper is organized as follows: Section \@slowromancapii@ discusses the related works. Section \@slowromancapiii@ describes our proposed framework. Section \@slowromancapiv@ shows a case study on a particular design. The experimentation and performance results of our proposed approach are illustrated in Section \@slowromancapv@. Section \@slowromancapvi@ concludes the paper with future directions.

II Background and related works

LLMs are Transformer Language models where the construction of the weight parameters involves training models using an extensive volume of text data acquired from diverse sources such as various websites, web-books, and PDFs. LLMs have leading-edge capabilities in diverse tasks, including source code generation [6], automated testing [7], and program repair [8]. As in [3], the authors showcased the capability of LLM to generate assertions solely from RTL code without additional specifications. Despite initial syntactic and semantic errors, the authors observed that, with proper guidance, LLMs consistently produce correct and complete assertions. The work in [4] uses comments about the security assertions to generate assertions. They assessed the effectiveness of LLMs in accelerating property verification. They generated 75,600 assertions across diverse conditions and varying levels of context and found that LLMs can generate correct assertions at an average rate of 4.53%. The work in [9] uses LLMs in security assertion generation. They perform formal property verification to identify vulnerabilities before develo** the Design Under Test (DUT).

In contrast, we have exclusively used the design specification in natural language to generate SVA instead of directly providing RTL code for assertion generation to the LLM. Since the objective of assertions is to identify the bug in the implementation, it is not desirable to obtain the assertions from implementation. Moreover, creating assertions from a design specification poses a significant challenge in the absence of a dedicated and specialized framework. We have also observed that the recent works are non-iterative, i.e., once the assertions are generated, the frameworks don’t try to refine any errors, such as syntactic or simulation errors, generated by LLM. Therefore, we proposed a novel framework, ChIRAAG, that formats the input specification and gives it to the LLM and further reviews, analyzes, and mends the LLM-generated assertions, if required. We have mainly focused on generating functional assertions rather than generating security assertions. None of the above-discussed works generate assertions using only the specification.

III ChIRAAG: Our SVA Generation Framework

The automatic assertion generation with ChIRAAG involves two stages: (i) formatting of specifications and (ii) automatic assertion generation. Algorithm 1 shows the steps of our proposed ChIRAAG framework. The stages are briefly described below:

III-A Formatting of specification

The formatting of design specifications includes representing the design context with specific labels, informing about its general definitions, signals, functional parameters, etc. The formatting involves extracting the important information from the design specification, 𝒟𝒟\mathcal{D}caligraphic_D, and later converting the extracted information, UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t, into a Javascript Object Notation (JSON) format, 𝒥𝒥\mathcal{J}caligraphic_J, using the functions Extraction and JSON, as mentioned in Line 6 of Algorithm 1. The JSON𝐽𝑆𝑂𝑁JSONitalic_J italic_S italic_O italic_N function comprises several sub-functions as shown in Table I to produce the formatted specification.

TABLE I: Formatting of specification
Label Description
Introduction 𝒥intJSONintsubscript𝒥𝑖𝑛𝑡𝐽𝑆𝑂subscript𝑁𝑖𝑛𝑡{\mathcal{J}_{int}}\leftarrow JSON_{int}caligraphic_J start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT ← italic_J italic_S italic_O italic_N start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT(UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t)
System_overview 𝒥s_ovJSONs_ovsubscript𝒥𝑠_𝑜𝑣𝐽𝑆𝑂subscript𝑁𝑠_𝑜𝑣{\mathcal{J}_{s\_ov}}\leftarrow JSON_{s\_ov}caligraphic_J start_POSTSUBSCRIPT italic_s _ italic_o italic_v end_POSTSUBSCRIPT ← italic_J italic_S italic_O italic_N start_POSTSUBSCRIPT italic_s _ italic_o italic_v end_POSTSUBSCRIPT ( UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t)
Definitions 𝒥dfJSONdfsubscript𝒥𝑑𝑓𝐽𝑆𝑂subscript𝑁𝑑𝑓\mathcal{J}_{df}\leftarrow JSON_{df}caligraphic_J start_POSTSUBSCRIPT italic_d italic_f end_POSTSUBSCRIPT ← italic_J italic_S italic_O italic_N start_POSTSUBSCRIPT italic_d italic_f end_POSTSUBSCRIPT (UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t)
Parameters 𝒥parJSONparsubscript𝒥𝑝𝑎𝑟𝐽𝑆𝑂subscript𝑁𝑝𝑎𝑟\mathcal{J}_{par}\leftarrow JSON_{par}caligraphic_J start_POSTSUBSCRIPT italic_p italic_a italic_r end_POSTSUBSCRIPT ← italic_J italic_S italic_O italic_N start_POSTSUBSCRIPT italic_p italic_a italic_r end_POSTSUBSCRIPT (UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t)
Functional_requirements 𝒥fn_rJSONfn_rsubscript𝒥𝑓𝑛_𝑟𝐽𝑆𝑂subscript𝑁𝑓𝑛_𝑟\mathcal{J}_{fn\_r}\leftarrow JSON_{fn\_r}caligraphic_J start_POSTSUBSCRIPT italic_f italic_n _ italic_r end_POSTSUBSCRIPT ← italic_J italic_S italic_O italic_N start_POSTSUBSCRIPT italic_f italic_n _ italic_r end_POSTSUBSCRIPT (UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t)
Timing_requirements 𝒥tm_rJSONtm_rsubscript𝒥𝑡𝑚_𝑟𝐽𝑆𝑂subscript𝑁𝑡𝑚_𝑟\mathcal{J}_{tm\_r}\leftarrow JSON_{tm\_r}caligraphic_J start_POSTSUBSCRIPT italic_t italic_m _ italic_r end_POSTSUBSCRIPT ← italic_J italic_S italic_O italic_N start_POSTSUBSCRIPT italic_t italic_m _ italic_r end_POSTSUBSCRIPT (UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t)
Extra_info 𝒥extJSONextsubscript𝒥𝑒𝑥𝑡𝐽𝑆𝑂subscript𝑁𝑒𝑥𝑡\mathcal{J}_{ext}\leftarrow JSON_{ext}caligraphic_J start_POSTSUBSCRIPT italic_e italic_x italic_t end_POSTSUBSCRIPT ← italic_J italic_S italic_O italic_N start_POSTSUBSCRIPT italic_e italic_x italic_t end_POSTSUBSCRIPT (UnFmt𝑈𝑛𝐹𝑚𝑡UnFmtitalic_U italic_n italic_F italic_m italic_t)

III-B Automatic Assertion Generation

In the next step, ChIRAAG generates SVA using the JSON formatted specifications. The formatted specifications of the designs are provided as input to the LLM through our framework by making a call to the OpenAI Application Programming Interface (API). This generates raw SVA as stated in Line 10 of Algorithm 1. The generated assertions are further verified for correctness in a simulation environment using testcases. This generates simulation logs that are then parsed using the ‘log_parser’ function, producing log message, 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT, as shown in Line 13 of Algorithm 1. This step gives rise to four possible cases that can eventually occur. They are

(a) No Error Message: When the generated assertion, Ansubscript𝐴𝑛A_{n}italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, passes all the test cases, i.e., no error message in 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT, then Ansubscript𝐴𝑛A_{n}italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is taken as correct output assertions suite.

(b) Syntax Error: Some LLM-generated assertions have Syntax Errors (SE). The LLM is provided the log message, 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT, automatically. Further, the LLM is prompted to refine the assertion. LLM is able to correct the SE within a few iterations. Line 17 of Algorithm 1 refine the assertion based on the error message.

(c) Simulation Failure: The assertions generated by LLM might fail in the simulation run because of specific errors. Two such simulation errors we have encountered are timing errors and missing signals. Line 20 of Algorithm 1 refine the assertion based on the error message.

(d) Implementation Bug: It might be possible that even the generated assertions are correct, but the implementation might be buggy. In such a scenario, the testcase fails, giving wrong outputs. In this case, manual inspection of design implementation is to be done as shown in Line 23 of Algorithm 1.

The assertions may not pass the testcases in a single run. Therefore, we iteratively run the error-checking process multiple times. Line 11 of Algorithm 1 shows that the while loop runs until n𝑛nitalic_n is less than the upper bound 𝒯𝒯\mathcal{T}caligraphic_T. If n𝑛nitalic_n gets equal to 𝒯𝒯\mathcal{T}caligraphic_T, and there are still error messages in 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT, we manually investigate and resolve the potential issue which LLM cannot fix automatically and restart the assertion generation process as stated in Lines 29-33.

The proposed ChIRAAG framework is illustrated in the Fig. 1. We use the design only to check the syntax and semantics of the generated assertions. No other refinement of assertions is done using the design. In this process, a correctly generated assertion can fix a bug (if any) in a design as well.

Refer to caption
Figure 1: ChIRAAG-SVA Generation Framework
Algorithm 1 ChIRAAG: SVA Generation Framework
1:Input:
2:      𝒟𝒟\mathcal{D}caligraphic_D: Design specification document, \mathcal{L}caligraphic_L: Log file of simulation tool, 𝒯𝒯\mathcal{T}caligraphic_T: Upper bound on the number of LLM interaction prompts
3:Output:
4:      Ansubscript𝐴𝑛A_{n}italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT: Suite of System Verilog Assertions
5:procedure Specification Extraction and Formatting
6:    UnFmtExtraction(𝒟)𝑈𝑛𝐹𝑚𝑡𝐸𝑥𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛𝒟UnFmt\leftarrow Extraction(\mathcal{D})italic_U italic_n italic_F italic_m italic_t ← italic_E italic_x italic_t italic_r italic_a italic_c italic_t italic_i italic_o italic_n ( caligraphic_D ), 𝒥JSON(UnFmt)𝒥JSON𝑈𝑛𝐹𝑚𝑡\mathcal{J}\leftarrow\text{JSON}(UnFmt)caligraphic_J ← JSON ( italic_U italic_n italic_F italic_m italic_t )
7:end procedure
8:procedure LLM Interaction for SVA Generation
9:    Initialize n0 and {}𝑛0 and n\leftarrow 0\text{ and }\mathcal{L}\leftarrow\{\}italic_n ← 0 and caligraphic_L ← { }
10:    AnLLM(𝒥,)subscript𝐴𝑛LLM𝒥A_{n}\leftarrow\text{LLM}(\mathcal{J},\mathcal{L})italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ← LLM ( caligraphic_J , caligraphic_L )
11:    while n<𝒯𝑛𝒯n<\mathcal{T}italic_n < caligraphic_T do
12:         Use VCS to check the generated Ansubscript𝐴𝑛A_{n}italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT on testcases
13:         𝒲loglog_parser()subscript𝒲𝑙𝑜𝑔log_parser\mathcal{W}_{log}\leftarrow\text{log\_parser}(\mathcal{L})caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT ← log_parser ( caligraphic_L )
14:         if no errors in 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT then
15:             break
16:         else if syntactic errors in 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT then
17:             A¯nsubscript¯𝐴𝑛absent\overline{A}_{n}\leftarrowover¯ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ← Rectify syntactic errors
18:             AnAn¯subscript𝐴𝑛¯subscript𝐴𝑛A_{n}\leftarrow\overline{A_{n}}italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ← over¯ start_ARG italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG
19:         else if timing or missing signal error in 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT then
20:             A¯nsubscript¯𝐴𝑛absent\overline{A}_{n}\leftarrowover¯ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ← Rectify simulation errors
21:             AnAn¯subscript𝐴𝑛¯subscript𝐴𝑛A_{n}\leftarrow\overline{A_{n}}italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ← over¯ start_ARG italic_A start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG
22:         else if testcase failure message in 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT then
23:             Manual inspection of design implementation for bug
24:             break
25:         end if
26:         nn+1𝑛𝑛1n\leftarrow n+1italic_n ← italic_n + 1
27:    end while
28:end procedure
29:procedure Design Implementation Examination
30:    if n==𝒯n==\mathcal{T}italic_n = = caligraphic_T and error message in 𝒲logsubscript𝒲𝑙𝑜𝑔\mathcal{W}_{log}caligraphic_W start_POSTSUBSCRIPT italic_l italic_o italic_g end_POSTSUBSCRIPT then
31:         Examine design implementation and restart the process
32:    end if
33:end procedure

IV Case Study: RV Timer

RV Timer [5] is a design module that is intended for use by processors to monitor the current time relative to the system reset or power-on. The block diagram of the RV Timer is shown in Fig. 2

Refer to caption
Figure 2: Block Diagram of RV Timer

The unstructured design specification of RV Timer is initially transformed to structured JSON format and presented as input to the LLM to generate the SVA. The LLM initially generated ten functional assertions. We further evaluated the correctness of the raw assertions using testcases and VCS simulator. Seven out of ten assertions were correct (i.e., there are no syntax or simulation errors) in their raw format. This shows the potential of our proposed framework ChIRAAG. The remaining three raw assertions were rectified automatically using the steps of Algorithm 1. This process generated an extra assertion, which satisfies the design aspects, increasing the count of generated assertions to eleven. Some of the initial assertions of RV Timer generated by the LLM are:

Assertion 1:
// Assertion to check if tick_count resets to 0 on reset
property p_reset_tick_count;
@(posedge clk_i) (!rst_ni) |-> (tick_count == 12’h0);
end property
assert property (p_reset_tick_count);
Assertion 2:
// Assertion to check if tick_count increments correctly
property p_tick_count_increment;
@(posedge clk_i) (active && (tick_count < prescaler)) |-> (tick_count == $past(tick_count) + 1);
end property
assert property (p_tick_count_increment);

The Assertion 1 is a correct raw assertion that passes all the tests. It ensures that ‘tick_count’ resets to 0 when reset is asserted, i.e., when ‘rst_ni’ or the active low reset signal is not active. The Assertion 2, we observed that a missing signal prevents passing all the test cases. The LLM understood the missing signal rst_ni from the simulation log message and included it in the generated assertion in the next iteration. Line 20 of Algorithm 1 is called upon to rectify the Assertion 2. The LLM generated the refined assertion and modified it into two distinct assertions Modified Assertion 2a and Modified Assertion 2b.

Modified Assertion 2a:
// Assertion for tick_count increment
property tick_count_increment;
@(posedge clk_i)
(rst_ni && active && (tick_count < prescaler)) |=> (tick_count == $past(tick_count) + 1);
endproperty
assert property (tick_count_increment);
Modified Assertion 2b:
// Assertion for tick_count reset on reset deassertion
property tick_count_reset_on_reset_deassert;
@(posedge clk_i)
(!$past(rst_ni, 1) && rst_ni) -> (tick_count == 0);
endproperty
assert property (tick_count_reset_on_reset_deassert);

The Modified Assertion 2a ensures that as long as the system is not in reset, is active, and the tick count is below a set threshold, the tick count will increment by 1 in each clock cycle. The Modified Assertion 2b verifies that when the system transitions from a reset state to a non-reset state, the tick count is reset to 0, ensuring proper initialization or re-initialization of the system’s tick counter. ChIRAAG took twelve iterations to converge because, initially, it has syntax errors like identifier missing and token missing, which are resolved in seven iterations for all the assertions combined, and then simulation errors like timing failure are rectified using further prompting in five more iterations.

V Experimental Results and Discussion

Our novel framework, ChIRAAG, was evaluated using designs provided in the OpenTitan [5] project. We performed our experiments on six designs as shown in Table II.

We have utilized OpenAI’s GPT-4 turbo [10] LLM as an automatic assertion generation tool for our ChIRAAG framework. This model has a capacity of 1280000 tokens of context window and training data up to December 2023. We collected the design specification and related files for each design and manually structured the data in a specific format with distinctive labels, and corresponding details for each design.

ChIRAAG generated the correct assertion using LLM within a few automatic prompting processes for each design. Table II shows the performance report for LLM-based assertion generation. The first column, “Module” shows different designs we have taken. The third column, “LLM Assert.”, shows the number of SV assertions generated by LLM for each design, and the second column, “OT Assert.”, shows the number of assertions provided in the RTL of OpenTitan. “#Prompts” denotes the number of prompts required to generate the correct assertions that satisfy all the test cases. “VCS Sim. Time” shows the time taken by the simulation process. While the “SVA Generation Time” is the initial assertion generation time taken by our framework.

ChIRAAG successfully generated all of the OpenTitan assertions for each design. Interestingly, ChIRAAG generates more assertions than that was provided in OpenTitan. Importantly, ChIRAAG-generated assertions are important and satisfy some aspects of the design that were missed by the OpenTitan assertions. Our proposed framework, ChIRAAG, generates the assertions for each design in less than 15 seconds. This shows the effectiveness of ChIRAAG in a faster assertion generation process than the traditional method.

TABLE II: Performance Report of ChIRAAG
Module OT Assert. LLM Assert. #Prompts VCS Sim. Time SVA Generation Time
RV Timer 0 11 12 80ns 6.34s
PattGen 0 9 9 110ns 9.45s
GPIO 0 6 8 190ns 12.34s
ROM Ctrl 6 11 14 250ns 14.34s
sram_ctrl 0 14 8 100ns 10.23s
adc_ctrl 5 8 9 460ns 7.56s

While ChIRAAG can rectify syntactic and simulation errors using the log of a simulator like VCS, it can also detect potential bugs in the implementation. To demonstrate such a scenario, we have considered a buggy implementation of a 4-bit full-adder, where xor is replaced by an or operation. Initially, formatted specifications were fed to ChIRAAG to generate the SVA. ChIRAAG correctly generated two assertions listed as Assertion 3 and Assertion 4, for checking the carry-out generated and the other for checking the sum bits generated.

Assertion 3:
//Assertion to check if the carry-out (C4) is generated correctly for the addition
property prop_carry_out;
@(posedge C0)
C4 |-> ((A[3] & B[3]) | (C3 & (A[3] ^ B[3])));
endproperty
assert property (prop_carry_out);
Assertion 4:
//Assertion to check if the sum bits (S) are correct
property prop_sum_bits;
@(posedge C0)
S[0] |-> (A[0] ^ B[0] ^ C0) &&
S[1] |-> (A[1] ^ B[1] ^ C1) &&
S[2] |-> (A[2] ^ B[2] ^ C2) &&
S[3] |-> (A[3] ^ B[3] ^ C3);
endproperty
assert property(prop_sum_bits);

Even though the above assertions are correct, the testcase fails due to the wrong output. After correcting the implementation to xor from or, the SVA could pass all the tests, This shows that our automated framework can also detect bugs in design implementation. This shows the usefulness of ChIRAAG in generating SVA from natural language specifications. The basic details of the formatted specification, along with the codebase, simulation files, logs, and prompts, are accessible on our GitHub repository 111https://github.com/karthikmaddala/ChIRAAG.

VI Conclusion

In this work, we have developed ChIRAAG that can generate SVA from the formatted specification using LLM. Primarily, unstructured design specifications are transformed into a structured manner and given to the LLM as an input prompt to generate an SVA suite. The prompting is crucial in generating correct assertions. The RTL implementation is only used to simulate and identify syntax and simulation errors in generated assertions. The performance of ChIRAAG on OpenTitan design shows that an LLM is a good starting point to assist an engineer in assertion generation. We noticed only 27% of the generated SVA require refinement upon iterative prompting. We performed our experiment with a general-domain LLM; however, we believe a domain-specific LLM for ABV should provide even more accurate assertions in fewer iterations. In our work, we ensure that the generated assertions are syntactically and semantically correct. However, how meaningful and complete these are, in the context of a design needs to be checked by experts. It is not possible to confirm 100% functional coverage of design intent by the generated assertions. Our future goal will be to check the capability of LLM to address the “consistency” and “completeness” issues in ABV.

References

  • [1] H. Foster, “Part 1: The 2020 wilson research group functional verification study,” Feb 2021. [Online]. Available: https://tinyurl.com/verification-foster-group
  • [2] P. Dasgupta, A roadmap for formal property verification.   Springer, 2006.
  • [3] M. Orenes-Vera, “Using llms to facilitate formal verification of rtl,” 2023.
  • [4] R. Kande et al., “Llm-assisted generation of hardware assertions,” 2023.
  • [5] OpenTitan, “OpenTitan Repository,” https://opentitan.org/, Visited: 2023.
  • [6] M. Chen et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021.
  • [7] Y. Deng et al., “Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt,” arXiv preprint arXiv:2304.02014, 2023.
  • [8] C. S. Xia, “Less training, more repairing please: revisiting automated program repair via zero-shot learning,” in Proceedings of ACM Joint European Software Engineering Conference and Symposium on Foundations of Software Engineering, 2022.
  • [9] H. Witharana et al., “Automated generation of security assertions for rtl models,” ACM Journal on Emerging Technologies in Computing Systems, vol. 19, no. 1, pp. 1–27, 2023.
  • [10] OpenAI, “Gpt-4 technical report,” 2023.