Effort and Size Estimation in Software Projects with Large Language Model-based Intelligent Interfaces

Claudionor N. Coelho Jr (Zscaler Inc.; ECE Department, Santa Clara University), Hanchen Xiong (Zscaler Inc.), Tushar Karayil (Zscaler Inc.), Sree Koratala (Zscaler Inc.), Rex Shang (Zscaler Inc.), Jacob Bollinger (Zscaler Inc.), Mohamed Shabar (Zscaler Inc.), Syam Nair (Zscaler Inc.)

(June 28, 2024)

Abstract

The advancement of Large Language Models (LLMs) has resulted in an equivalent proliferation of their applications. Software design, being one, has gained tremendous benefits from using LLMs as an interface component that extends fixed user stories. However, the inclusion of LLM-based AI agents in software design often poses unexpected challenges, especially in the estimation of development effort. Through the example of UI-based user stories, we provide a comparison against traditional methods and propose a new way to enhance specifications of natural language-based questions that allows for the estimation of development effort by taking into account data sources, interfaces and algorithms.

INTRODUCTION

The acceleration of LLM development and the visibility of these models have prompted the genesis of many LLM-based products. Recently, the release of ChatGPT [1, 2] was a milestone that signaled a significant shift in society, including changes in software design paradigms. Initially, LLMs [3] like ChatGPT revolutionized the field with advanced chatbots. In recent months, however, many have witnessed a transition towards more sophisticated systems such as Retrieval-Augmented Generation (RAG) [5] and AI Agents [4], which enhance the abilities of these models by connecting data sources, algorithms and visualizations to LLMs.

Although more recent LLMs [6, 7, 8] are capable of data analysis and even data summarization and representation, connecting LLMs to external data sources, algorithms and specialized interfaces [9] adds flexibility by enabling them to perform tasks that involve the analysis of domain-specific real-time data, or even tasks that are still beyond the LLMs' own capabilities.

This paper discusses the changes in software design brought by AI Agents, specifically the shift from traditional UI/UX user stories [10] to LLM-based AI Agent interfaces that implement several user stories through a single natural language interface. This transition represents a paradigm shift from well-structured documentation of data sources, UI/UX interactions, and algorithms, for which the size and effort of development can be estimated reasonably well, to a more flexible, albeit imprecise, mode of interaction through natural language descriptions. While this shift has unlocked unprecedented levels of user accessibility and software adaptability, it has also introduced unique challenges. One of the most fundamental questions that we intend to address in this paper is how to estimate the development effort and size of these new systems, where the LLM sometimes interacts with the user in unknown ways.

UI/UX BASED SYSTEM DESIGN AND EFFORT ESTIMATION

In this section we provide a simple example to show how effort can be estimated using current software engineering methods [11, 12, 13]. We do not intend to show how to compute effort; we only emphasize that knowing the number of data sources, user interface widgets and algorithms enables one to estimate the effort and size of a project or feature.

In this example, we want to examine the complexity of adding the user story of ordering a gourmet margherita pizza in 20 minutes to a food app, as an optimization to the flow presented in Figure 1.

Figure 1: User Story to Order Pizza for a Food Delivery App

We have to assume that to implement this use case, we need access to the following data sources and algorithms:

1. Restaurant database that can be searched by location and by type of food.
2. Menu database, where the user can search for types of food served by the restaurant.
3. Algorithm that computes the delivery time from the restaurant to the user's location.

Based on this information, and the number of widgets available in the user interface, we can estimate the development effort based on the team's previous experience, using the counts summarized in Table 1. In this example, we are not considering other data sources, UI widgets or algorithms, although in reality some would be required, such as payment infrastructure.

Tables	Algorithms	Widgets
2	1	4

Table 1: Summary of Effort Metrics
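As an illustration only, the counts in Table 1 could feed a simple weighted sum. The paper deliberately does not prescribe a formula, so the function below is a minimal sketch: the per-component weights, the person-day unit and all names (`ComponentCounts`, `estimate_effort`, `ASSUMED_EFFORT_PER_COMPONENT`) are hypothetical placeholders that a team would replace with figures calibrated from its own past projects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentCounts:
    """Counts of the three component kinds listed in Table 1."""
    tables: int
    algorithms: int
    widgets: int


# Hypothetical effort per component, in person-days. These numbers are
# placeholders, not values from the paper; calibrate them from team history.
ASSUMED_EFFORT_PER_COMPONENT = {"tables": 3.0, "algorithms": 5.0, "widgets": 2.0}


def estimate_effort(counts: ComponentCounts,
                    effort_per_component: dict = ASSUMED_EFFORT_PER_COMPONENT) -> float:
    """Return a weighted sum of component counts (assumed linear model)."""
    return (counts.tables * effort_per_component["tables"]
            + counts.algorithms * effort_per_component["algorithms"]
            + counts.widgets * effort_per_component["widgets"])


# Counts taken from Table 1 for the pizza-ordering user story.
pizza_story = ComponentCounts(tables=2, algorithms=1, widgets=4)
print(estimate_effort(pizza_story))  # 19.0 with the placeholder weights above
```

The point of the sketch is only that, once the three counts are known, any calibrated model can be applied to them; the linear form is one assumed choice among many.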

The reader should notice that this use case implements a single type of user interaction, and if we decide to modify the interaction, we will need to change the user story, or create another implementation that accommodates a different user story.

AI AGENTS

An AI Agent [14] encompasses a system that employs an LLM to process and reason about a specific domain. To generate specific answers (often related to the domain), the AI Agent leverages auxiliary systems in conjunction with the LLM. These auxiliary systems support the agent in comprehending the domain and facilitating the creation of accurate responses.

An AI Agent consists of four major components. The agent core forms the central component and is responsible for orchestrating the agent’s overall functionality. The memory module enables the agent to store and retrieve relevant information, enhancing its ability to retain context and make informed decisions. The planner component guides the agent’s actions by formulating a strategic course of action based on the given problem or task. Finally, the set of tools encompasses various external components and resources that assist the agent in performing specific tasks or functions within the defined domain. These components collaboratively enable AI Agents to effectively process information, reason, and generate responses in a manner aligned with their designated purpose.

Agent Core

The agent core is a crucial component within an AI Agent that plays a central role in orchestrating the agent’s overall functionality. It receives a query from the user. It then manages the decision-making processes, communication, and coordination of the various modules and subsystems within the agent. Finally, it aggregates the information and generates a response.

The agent core is also responsible for managing the agent’s internal state. It maintains a representation of the agent’s assets and internal state, allowing it to reason, plan, and adapt its behavior accordingly. The core oversees the update and retrieval of information from the agent’s memory, enabling it to access relevant knowledge and contextual information during decision-making processes.

Memory

The memory module within an AI Agent encompasses two important aspects: historical memory and contextual memory.

Historical memory serves as a repository for past interactions and experiences of the AI Agent. It stores a record of previous inputs, outputs, and the outcomes of actions taken by the agent. This historical data is valuable as it enables the agent to learn from past interactions and avoid repeating mistakes. Through the historical memory, the agent gains insights about effective strategies and successful outcomes or patterns, which enables an informed decision-making process.

Contextual memory, on the other hand, focuses on maintaining a coherent understanding of the current situation. It stores relevant context that provides the necessary background for the agent to interpret and respond appropriately to the present state. This can include information about the environment, the user’s preferences or intentions, and any other contextual factors that influence the agent’s behavior. Contextual memory allows the agent to adapt its actions and responses to specific circumstances, thereby enhancing its ability to interact intelligently with changing environments.

Together, historical and contextual memories allow the AI Agent to combine past experiences and current context for an efficient decision-making process.
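The two kinds of memory can be pictured with a small data structure. The following is a minimal sketch under our own assumptions: the class and method names (`Interaction`, `AgentMemory`, `build_context_block`) are hypothetical and not taken from the paper or from any agent framework, and a real agent would likely persist this state rather than keep it in process memory.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One past exchange: what was asked, what was answered, how it ended."""
    user_input: str
    agent_output: str
    outcome: str


@dataclass
class AgentMemory:
    # Historical memory: ordered record of past interactions.
    history: list[Interaction] = field(default_factory=list)
    # Contextual memory: facts about the current situation (hypothetical keys).
    context: dict[str, str] = field(default_factory=dict)

    def record(self, user_input: str, agent_output: str, outcome: str) -> None:
        self.history.append(Interaction(user_input, agent_output, outcome))

    def update_context(self, key: str, value: str) -> None:
        self.context[key] = value

    def build_context_block(self, last_n: int = 3) -> str:
        """Combine current context with the most recent history as plain text."""
        lines = [f"{key}: {value}" for key, value in self.context.items()]
        recent = self.history[-last_n:] if last_n > 0 else []
        lines += [f"previous request: {i.user_input} -> {i.outcome}" for i in recent]
        return "\n".join(lines)


memory = AgentMemory()
memory.update_context("user location", "Santa Clara")
memory.record("Order a margherita pizza", "Order placed", "success")
print(memory.build_context_block())
```

The text returned by `build_context_block` is the kind of material that could fill the contextual-information slot of a planner prompt.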

Planner

The planner component within an AI Agent plays a crucial role in guiding the agent’s actions and formulating a strategic course of action based on the given problem or task. It is responsible for generating a sequence of steps or actions that lead the agent towards achieving its objectives. The planner analyzes the current state of the environment, along with any available information or constraints, to determine the most effective sequence of actions to achieve the desired outcome. It also takes into account other factors such as goals, resources, rules, and dependencies to generate a plan that optimizes the agent’s decision-making process.

An example of a prompt template that can be used by the planner is presented in Figure 3. Note that we use the Model-View-Controller (MVC) architecture [13] as a convenient way to describe data, interfaces and algorithms, respectively, as LLMs have probably been exposed to this framework during the training phase. The planner then uses this prompt template to generate a plan that outlines the specific actions and steps to be taken. By employing the planner component, the AI Agent can systematically determine a sequence of actions to achieve its objectives, supporting efficient decision-making and effective use of available resources. The generated plan serves as a roadmap for the agent’s actions, enabling it to navigate complex problem spaces and accomplish its goals.

GENERAL INSTRUCTIONS
You are an autonomous AI Agent who converts a text into
executable tasks using as few interactions as possible
with the user. Your task is to break down each complex
request from a list of user requests into simpler tasks. Each
simpler task should use an existing tool or if none is
available, you should create a helper task. Each task should
be one of model, view or control from software MVC
architecture. Model or data sources are objects representing
database tables. Models can be searched and modified by
algorithms. Control are algorithms that represents actions on
data sources. Views or interfaces describe interaction with
user. They need to take as input either a data source, or an
algorithm, when some computation needs to be performed on one
or more data sources. Your list of tasks should concisely
represent the algorithms, data sources and interfaces that
need to be implemented to perform the task. Your answer
should be only a csv list with fields task type, function
call name and task description from MVC model and nothing more.


AVAILABLE TOOLS:
- Search Tool
- Math Tool

CONTEXTUAL INFORMATION:
<information from Memory to help LLM
to figure out the context>

USER REQUEST:
I want to order margherita pizza in 20 min in my app?

ANSWER FORMAT
csv list

Figure 3: Example of Prompt by Planner, modified from [14]
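To make the use of this template concrete, the sketch below assembles the sections of Figure 3 into one prompt string and parses a CSV answer into (task type, function call name, task description) triples. It is a minimal sketch, not the authors' implementation: the function names are hypothetical, the call to the LLM itself is left out, and the sample answer is invented for illustration rather than taken from a real model run.

```python
import csv
import io


def build_planner_prompt(general_instructions: str, tools: list[str],
                         context: str, user_request: str) -> str:
    """Fill the sections of the Figure 3 template with concrete values."""
    tool_lines = "\n".join(f"- {tool}" for tool in tools)
    return (
        f"GENERAL INSTRUCTIONS\n{general_instructions}\n\n"
        f"AVAILABLE TOOLS:\n{tool_lines}\n\n"
        f"CONTEXTUAL INFORMATION:\n{context}\n\n"
        f"USER REQUEST:\n{user_request}\n\n"
        "ANSWER FORMAT\ncsv list\n"
    )


def parse_planner_answer(answer: str) -> list[tuple[str, str, str]]:
    """Parse 'task type, function call name, task description' rows."""
    tasks = []
    for row in csv.reader(io.StringIO(answer.strip()), skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        task_type, name = row[0].strip().lower(), row[1].strip()
        # Descriptions may contain unquoted commas; rejoin the remaining fields.
        description = ", ".join(part.strip() for part in row[2:])
        tasks.append((task_type, name, description))
    return tasks


# Invented answer, only to exercise the parser.
sample_answer = """
model, PizzeriaTable, Pizzerias with location and menu
control, EstimateDeliveryTime, Estimate delivery time from pizzeria to user
view, ShowPizzerias, List pizzerias, sorted by delivery time
"""
for task in parse_planner_answer(sample_answer):
    print(task)
```

Rejoining trailing fields is a defensive choice: an LLM asked for a "csv list" will not always quote descriptions that contain commas.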

Tools

In an AI Agent, the set of tools encompasses various resources and functionalities that assist in performing specific tasks or functions within the defined domain. Here is a non-exhaustive list of possible tools that can be utilized in an AI Agent:

• RAG (Retrieval-Augmented Generation): Combines retrieval-based methods with generative language models. It enables the agent to retrieve relevant information from a knowledge base and utilize it to generate coherent and contextually appropriate responses. Common data sources for RAG include Question-Answer databases, documentation and web pages.
• Database connections: Connect to databases and allow the AI Agent to access and retrieve information from structured external data sources. This tool enables the agent to query and extract relevant data for decision-making or generating responses, especially in domain-specific scenarios.
• Machine Learning frameworks: Provide tools and algorithms for training and deploying machine learning models. These frameworks enable the agent to leverage various machine learning techniques, including supervised learning, unsupervised learning, or reinforcement learning, to enhance its capabilities.
• Visualization tools: Assist in representing and interpreting data or model outputs in a visual format. These tools can help the agent understand complex patterns, relationships, or trends in the data, aiding in decision-making and analysis.
• Simulation environments: Provide a controlled virtual environment where the AI Agent can interact and learn without impacting the real world. These tools allow the agent to practice and refine its skills, test different strategies, and evaluate the potential outcomes of its actions.
• Data preprocessing tools: Help in cleaning, transforming, and preparing raw data before feeding it into the AI Agent. These tools may include techniques for data cleaning, normalization, feature selection, or dimensionality reduction, ensuring the quality and relevance of data used by the agent.

These tools enhance LLMs by providing them with specialized functionalities for specific domains. It should be noted that these tools can be classified as data sources, visualization artifacts and algorithms.
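The classification mentioned above can be written down as a simple lookup. The paper does not say which tool falls into which class, so the mapping below is our own assumption, given purely for illustration; the names `Component`, `ASSUMED_TOOL_COMPONENTS` and `count_components` are hypothetical.

```python
from collections import Counter
from enum import Enum


class Component(Enum):
    DATA_SOURCE = "data source"
    INTERFACE = "interface"      # visualization artifacts
    ALGORITHM = "algorithm"


# Assumed classification of the tools listed above; other assignments are
# defensible (for example, RAG also involves a retrieval algorithm).
ASSUMED_TOOL_COMPONENTS = {
    "RAG": Component.DATA_SOURCE,
    "Database connections": Component.DATA_SOURCE,
    "Machine Learning frameworks": Component.ALGORITHM,
    "Visualization tools": Component.INTERFACE,
    "Simulation environments": Component.ALGORITHM,
    "Data preprocessing tools": Component.ALGORITHM,
}


def count_components(tool_names) -> Counter:
    """Count how many of the given tools fall into each component class."""
    return Counter(ASSUMED_TOOL_COMPONENTS[name] for name in tool_names)


counts = count_components(ASSUMED_TOOL_COMPONENTS)
for component, n in counts.items():
    print(component.value, n)
```

Counting tools per class yields the same kind of triple (data sources, algorithms, interfaces) that Table 1 uses for a conventional user story.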

LLM IS THE NEW UI/UX

With the advent of LLMs over the past year, we have seen people specifying user stories using natural language, as mentioned before, in the following way:

    I want to order a gourmet Margherita pizza in 20 minutes.

In user story development, there are follow-up questions that one would need to document during the development process. In particular, we would like to determine:

• Which data sources should we connect to?
• Which algorithms do we need to invoke to solve this request?
• Which interfaces are required to implement this user story?
• Which other questions do we want to be able to answer?

We have seen a deterioration of specification quality in user stories when people over-rely on the adaptability of LLMs. We will show how easily we can lose control of this simple requirement by just slightly changing the question.

1. Can this restaurant deliver food in 20 min?
2. Give me the list of all restaurants that deliver gourmet pizza in 20 min.
3. Give me the 20 top evaluated restaurants that can deliver gourmet pizza in 20 minutes.

The reader can easily see that the first question requires just a simple yes/no answer. The second question requires a summarization or visualization agent to provide the answer. The third query will possibly require getting data from an additional table in the backend. Without fully specifying the problems the system is trying to solve, and resorting instead to just a single question (as people expect LLMs to extrapolate automatically from it), estimating the development effort may become an almost impossible task.

ESTIMATING EFFORT IN AI AGENTS

The main idea of this paper is to show that we can recover a similar level of understanding of the implementation effort of the user stories if we use the Planner of an AI Agent to enumerate the data sources and algorithms we need, by sampling questions that we want to be able to answer with these systems.

The idea is presented below: we iterate over the generation of related questions and ask the planner to generate sub-tasks for the generated set of questions.

List of questions $Q$
$\mathit{AllTasks} \leftarrow \emptyset$
for $q \in Q$
    Generate $N$
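Because the listing above is incomplete, the following sketch follows only the prose description: for each question, generate a number of related questions, ask the planner for sub-tasks, and accumulate them without duplicates. It is an assumption-laden illustration rather than the authors' algorithm. The two LLM calls are passed in as plain callables (`generate_related`, `plan`), the original question is planned alongside its variants, and duplicates are detected by exact match after normalizing case and whitespace, which is a simplification we chose.

```python
from collections import Counter
from typing import Callable, Iterable

Task = tuple[str, str]  # (task type, description)


def normalize(task: Task) -> Task:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    task_type, description = task
    return task_type.strip().lower(), " ".join(description.lower().split())


def enumerate_tasks(questions: Iterable[str],
                    generate_related: Callable[[str, int], list[str]],
                    plan: Callable[[str], list[Task]],
                    n_related: int) -> list[Task]:
    """Collect de-duplicated sub-tasks over each question and its variants."""
    all_tasks: dict[Task, Task] = {}  # normalized key -> first task seen
    for q in questions:
        for candidate in [q, *generate_related(q, n_related)]:
            for task in plan(candidate):
                all_tasks.setdefault(normalize(task), task)
    return list(all_tasks.values())


def summarize(tasks: Iterable[Task]) -> Counter:
    """Count tasks per type, giving a Table 1 style summary."""
    return Counter(normalize(task)[0] for task in tasks)


# Stand-ins for the LLM calls, so the sketch runs without a model.
def fake_related(q: str, n: int) -> list[str]:
    return [f"{q} (variant {i})" for i in range(1, n + 1)]


def fake_plan(q: str) -> list[Task]:
    tasks = [("Data Source", "Restaurant table"),
             ("Algorithm", "Estimate delivery time")]
    if "variant 2" in q:
        tasks.append(("User Interface", "Show list of restaurants"))
    return tasks


found = enumerate_tasks(["Can this restaurant deliver food in 20 min?"],
                        fake_related, fake_plan, n_related=2)
print(summarize(found))
```

In practice the planner tends to phrase the same sub-task in different words, so exact matching will under-merge; a similarity measure or a further LLM pass could replace `normalize`.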

CONCLUSIONS

Over the past year, we have seen an explosion in the integration of Large Language Models into existing systems (or even the creation of new systems where one of the UI/UX widgets is a natural language interface).

Such systems pose a challenge to the normal software engineering practices of effort and size estimation, as they are not as well documented as systems used to be when user stories were specified explicitly. These new specifications use sample questions about what the system should do, which inherit the ambiguity of written language. It becomes impossible to quantify the effort or size of development of such systems, or even to document what the system does and does not do.

We have shown that by using an LLM to generate a list of similar questions, and leveraging the planner state of the AI Agent to create a list of non-duplicated sub-tasks, we are able to regain the same level of precision that user stories and use cases had achieved previously.

ACKNOWLEDGMENTS

Portions of this document used GPT-4 to improve readability, and to automatically generate related questions, algorithms, data sources and user interfaces.

REFERENCES

[1] “Introducing ChatGPT.” https://openai.com/blog/chatgpt, November 2022. Accessed on December 19, 2023.
[2] OpenAI, “GPT-4 Technical Report.” arXiv:2303.08774 [cs.CL], 2023. Submitted on March 15, 2023 (v1), last revised on March 27, 2023 (v3); Accessed on May 4, 2023.
[3] C. Huyen, “Building LLM applications for production.” https://huyenchip.com/2023/04/11/llm-engineering.html, April 2023. Accessed on May 4, 2023.
[4] “Wolfram|Alpha as the way to bring computational knowledge superpowers to ChatGPT.” https://writings.stephenwolfram.com/2023/01/wolframalpha-as-the-way-to-bring-computational-knowledge-superpowers-to-chatgpt/, January 2023. Accessed on December 19, 2023.
[5] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks.” arXiv:2005.11401 [cs.CL], 2021.
[6] OpenAI, “GPT-4 Technical Report.” arXiv:2303.08774 [cs.CL], 2023.
[7] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom, “Llama 2: Open foundation and fine-tuned chat models.” arXiv:2307.09288 [cs.CL], 2023.
[8] A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. de las Casas, E. B. Hanna, F. Bressand, G. Lengyel, G. Bour, G. Lample, L. R. Lavaud, L. Saulnier, M.-A. Lachaux, P. Stock, S. Subramanian, S. Yang, S. Antoniak, T. L. Scao, T. Gervet, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed, “Mixtral of experts.” arXiv:2401.04088 [cs.LG], 2024.
[9] “NexusRaven-V2: Surpassing GPT-4 for zero-shot function calling.” https://nexusflow.ai/blogs/ravenv2, December 2023. Accessed on December 19, 2023.
[10] M. Cohn, User Stories Applied: For Agile Software Development. USA: Addison Wesley Longman Publishing Co., Inc., 2004.
[11] I. Sommerville, Software Engineering. Pearson, 10th ed., 2015.
[12] W. d. P. Paula Filho, Software Engineering: Fundamentals, Methods and Standards – Projects and Processes (in Portuguese). Gen-LTC Editora, 4th ed., 2019.
[13] D. Voorhees, Guide to Efficient Software Design: An MVC Approach to Concepts, Structures, and Models. Texts in Computer Science, Springer International Publishing, 2021.
[14] T. Varshney, “Introduction to LLM agents.” https://developer.nvidia.com/blog/introduction-to-llm-agents/, November 2023. Accessed on December 19, 2023.

Task	Instruction
Algorithm	Algorithm to check the availability of the selected pizza type in real-time
Algorithm	Algorithm to record the new order with a gourmet margherita pizza and a set time of 20 minutes from the current time
Algorithm	Algorithm to manage the countdown and ensure the order is ready in twenty minutes
Algorithm	Algorithm to notify the user when the order is placed, when it starts being prepared, and when it’s ready for delivery or pickup
Algorithm	Algorithm to handle payment for the order through the app’s integrated payment system
Algorithm	Algorithm to ensure the order is completed and pizza is handed off for delivery or pickup after twenty minutes
Algorithm	Algorithm to filter pizzerias that offer gourmet Margherita pizzas
Algorithm	Algorithm to estimate delivery time based on user location and pizzeria location
Algorithm	Algorithm to filter pizzerias with an estimated delivery time of 20 minutes or less
Algorithm	Algorithm to check for promotions or discounts on a specific item
Algorithm	Algorithm to determine if quick delivery is available for an item
Algorithm	Algorithm that combines CheckPromotionForItem and ShowPromotionDetails for a specific item
Algorithm	Algorithm that combines CheckQuickDeliveryOption and ShowDeliveryOption for a specific item
Algorithm	Filter the customizations applicable to Margherita pizza
Algorithm	Filter customizations ensuring a 20-minute delivery
Algorithm	Algorithm that retrieves restaurants sorted by user ratings and filters for gourmet Margherita pizza.
Algorithm	Algorithm that retrieves restaurant with the fastest delivery speed for Margherita pizza.
Algorithm	Algorithm that recommends the top-rated restaurant for gourmet Margherita pizza with the fastest delivery.
Algorithm	Check availability of Margherita gourmet pizza
Algorithm	Calculate total cost for a single Margherita gourmet pizza including additional fees
Algorithm	Provide delivery time estimate for quick delivery option
Algorithm	Algorithm to filter restaurant data based on certain criteria
Data Source	Database table containing different types of pizzas including gourmet margherita
Data Source	Database table to store information about user orders including details and timings
Data Source	Model containing pizzeria information including location and menu offerings
Data Source	Data source representing promotions or discounts
Data Source	Data source representing menu items including pizzas
Data Source	Retrieve list of gourmet pizza customizations
Data Source	Retrieve delivery times for each customization option
Data Source	Data source containing restaurant details including ratings and reviews.
Data Source	Data source containing delivery speed information for restaurants.
Data Source	Retrieve minimum order requirements and additional fees
Data Source	Retrieve delivery options, time estimates, and fees for quick delivery
User Interface	Interface to display the PizzaMenu for user selection
User Interface	Interface to show confirmation details and allow users to confirm their order
User Interface	Interface to display the real-time status of the order including the countdown and readiness status
User Interface	Interface to display the list of nearby pizzerias that meet the criteria
User Interface	Interface to display promotion details to the user
User Interface	Interface to display quick delivery availability to the user
User Interface	Show available crust types and cheese options for Margherita pizza within 20-minute delivery time
User Interface	Interface to show the recommended restaurant to the user.
User Interface	Show availability, total cost, and delivery time for a single Margherita gourmet pizza
User Interface	Interface to show filtered restaurant results to the user

Table 2: Raw list of instructions created by an LLM