License: CC BY 4.0
arXiv:2308.05361v4 [cs.CL] 07 Apr 2024

WeaverBird: Empowering Financial Decision-Making with
Large Language Model, Knowledge Base, and Search Engine

Siqiao Xue♢,*, Fan Zhou♢,*, Yi Xu♢, Ming **♣, Qingsong Wen♡, Hongyan Hao♢, Qingyang Dai♢, Caigao Jiang♢, Hongyu Zhao♢, Shuo Xie♢, Jianshan He♢, James Zhang♢, Hongyuan Mei♠
♢Ant Group, ♣Monash University, ♡Alibaba DAMO Academy, ♠TTIC
siqiao.xsq,hanlian.zf,[email protected],[email protected], [email protected] [email protected]
(2023)
Abstract.

We present WeaverBird, an intelligent dialogue system tailored for the finance sector. Our system harnesses a large language model of GPT architecture that has been tuned using extensive corpora of financial texts. As a result, our system possesses the capability to understand complex financial queries, such as “How should I manage my investments during inflation?”, and provide informed responses. Furthermore, our system integrates a local knowledge base and search engine to retrieve relevant information. The final system responses are conditioned on the web search and include proper references to the sources, thus enjoying enhanced credibility. Comparative evaluations on a wide range of financial question-answering tasks highlight our system’s superior performance over other models. Experience WeaverBird for yourself through our live demo at https://weaverbird.ttic.edu and view our concise 2-minute video at https://www.youtube.com/watch?v=yofgeqnlrMc. Our code has also been made publicly available at https://github.com/ant-research/fin_domain_llm.

large language models, finance, dialogue system
CCS Concepts: Social and professional topics → Computing and business; Computing methodologies → Natural language processing
* These authors contributed equally to this work.

1. Introduction

Everyone faces financial decisions from time to time. From questions like “What does the earnings report indicate for stock X?” to “How should I navigate my investments during inflation?”, many individuals prefer consulting financial professionals. Yet, not everyone has the means to access such expert services.

In this paper, we introduce WeaverBird,¹ an intelligent dialogue system tailored for the financial sector. Our primary objective is to bridge the disparity in accessing financial advisory services and to democratize financial expertise. Acting as an intelligent financial assistant, WeaverBird is adept at understanding complex economic and financial queries, offering financial perspectives, and enabling users to make informed financial decisions. Through the launch of our system, we aspire to ensure that people across varied financial backgrounds gain access to trustworthy financial advice.

¹ This name embodies our vision for WeaverBird, symbolizing our commitment to delivering comprehensive and intelligent support in the finance domain. It represents the intricate and skillful weaving of financial knowledge and information, empowering users with a robust and reliable resource.

WebGLM (Liu et al., 2023) BloombergGPT (Wu et al., 2023) FinGPT (Yang et al., 2023) FinChat (fin, 2023) WeaverBird
Finance-domain specific LLM
Finetuned on bilingual financial data
Local knowledge base querying
Web search querying
Efficiency-optimized search on multi-source knowledge N/A
Temporal reasoning optimized N/A
Open access to training data & code
Open access to deployed chatbot
Table 1. Comparative summary of competing approaches on various dimensions.

Another key motivation for developing a finance dialogue system is to support future research in this area. A comparison of our WeaverBird system to other competitive approaches is summarized in Table 1. Throughout the development process of our system, we have amassed extensive collections of finance-centric text documents, which have been leveraged in fine-tuning language foundation models. This resource will be available to the research community for future exploration and analysis. Notably, our collection features a substantial corpus of Chinese financial documents, addressing the gap in the predominantly English-based finance corpora and offering a valuable addition. This equips our WeaverBird system with a distinctive edge in processing Chinese queries and delving into discussions concerning Chinese firms and equities. Furthermore, we have constructed a collection of challenging financial question-answer pairs, which can serve as a benchmark for evaluating the performance of future models and systems. Each question stems from a selected document, such as an industry research report. Consequently, answering these questions correctly requires the discovery of the specific document or equivalent information from the local knowledge base or across the internet. Our WeaverBird system harnesses the latest research advancements in large language models, enhanced by our novel training and inference methods, which we will elaborate on in section 2. This system sets a robust baseline for evaluating future models and systems in the finance domain.

An additional compelling motivation is the opportunity to leverage the knowledge and assets acquired during the development of our system to create intelligent assistants in broader domains, such as education and healthcare.

2. System Design

When presented with a natural language prompt, such as “How many EVs did BYD sell in Q1 2023?”, our system employs a two-phase process. First, it transforms the prompt into queries suitable for the knowledge base and search engine. Next, it analyzes the retrieved information, identifies the most relevant pieces, and generates a comprehensive natural language response, with proper references to the original sources. The Internet’s vast resources enable us to craft responses and insights based on up-to-date information, enhancing the adaptability and reliability of our approach. The overall pipeline of WeaverBird is depicted in Figure 1. While it builds upon the general framework of LangChain (Chase, 2022; LangChain-ChatGLM, 2023), our system integrates our novel training and inference techniques, which significantly enhance its overall performance and efficiency. In this section, we delineate the design of each phase, including the model architecture as well as training and inference paradigms.

Figure 1. An illustration of WeaverBird that answers a financial query by intelligent search and generation.

2.1. Efficiency-Optimized Search and Update

The first phase is to retrieve information from our local knowledge base and the broader internet (i.e., the first step in Figure 1). Our knowledge base $\mathcal{K}$ is a collection of financial documents $\mathbf{d}^{\text{loc}}_{1},\ldots,\mathbf{d}^{\text{loc}}_{N}$, where the number of documents $N$ is large. Following Chase (2022), we split each document $\mathbf{d}^{\text{loc}}_{n}$ into multiple paragraphs $\mathbf{p}^{\text{loc}}_{n,1},\ldots,\mathbf{p}^{\text{loc}}_{n,M_{n}}$, where $M_{n}$ is the number of paragraphs in the $n$-th document and $m$ below indexes them, and embed each paragraph into a multidimensional embedding $\mathbf{e}^{\text{loc}}_{n,m}$ through a neural encoder $f_{\text{key}}$.
When a language query $\mathbf{x}$ arrives, it is embedded into a vector $\mathbf{q}$ through another neural encoder $f_{\text{query}}$, and we retrieve the top $K$ relevant paragraphs from the knowledge base according to their cosine similarities, i.e., $\mathbf{q}^{\top}\mathbf{e}/(\|\mathbf{q}\|\,\|\mathbf{e}\|)$, where $K$ is a hyperparameter.
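As a minimal sketch of the top-$K$ retrieval step, the following pure-Python example scores toy paragraph embeddings against a query embedding by cosine similarity. The function names and the 2-dimensional vectors are illustrative assumptions; in the actual system the vectors would come from the encoders $f_{\text{key}}$ and $f_{\text{query}}$.

```python
import math

def cosine_similarity(q, e):
    """Cosine similarity q.e / (|q| |e|) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(q, e))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_e = math.sqrt(sum(b * b for b in e))
    return dot / (norm_q * norm_e)

def top_k(query_vec, paragraph_vecs, k):
    """Return (index, similarity) for the k most similar paragraphs, best first."""
    scored = [(i, cosine_similarity(query_vec, e)) for i, e in enumerate(paragraph_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings standing in for encoder outputs (hypothetical values).
paragraphs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
results = top_k(query, paragraphs, k=2)
```

Because cosine similarity ignores vector length, the query `[2.0, 0.0]` matches the first paragraph perfectly even though their norms differ.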

If the cosine similarity of the most relevant retrieval is larger than a predefined threshold $c\in(0,1)$, we skip the internet search and proceed directly to the response generation phase that we present in section 2.3. Otherwise, we start an internet search, which is slower than a local search but may yield higher-quality results. This efficiency-optimized design is not in the original LangChain pipeline and we propose it to strike a balance between performance and efficiency.² When the local knowledge base already provides highly relevant information, we can avoid the need to search the internet. Otherwise, we invest additional effort to retrieve internet-based information for enhanced quality.

² Internet search is also not in the original LangChain, but it is in the latest release, concurrent with our work.

When an internet search³ is used, we select the highest-ranked $N$ documents $\mathbf{d}^{\text{web}}_{1},\ldots,\mathbf{d}^{\text{web}}_{N}$, split each document into paragraphs $\mathbf{p}^{\text{web}}_{n,1},\ldots,\mathbf{p}^{\text{web}}_{n,M_{n}}$, and embed each paragraph into a vector $\mathbf{e}^{\text{web}}_{n,m}$ through the same neural encoder $f_{\text{key}}$ that we use for the knowledge base. Then we select the top $K$ relevant paragraphs according to their cosine similarities. Combined with the local retrievals, we end up with $2K$ paragraphs and proceed to the next phase.

³ The search engine we use is Google Search.

If any of the web paragraphs exhibits a cosine similarity greater than the threshold $c$, our system will automatically add the entire document to which it belongs into our local knowledge base. Users can also manually add any online document. This update ensures that future queries, whether identical or related, will receive more efficient responses. As updates occur, the knowledge base will expand, but the local search is still significantly more efficient than a web search.
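The control flow of this section can be sketched as follows. This is an illustrative outline under stated assumptions, not the released implementation: `retrieve`, `fake_web_search`, and the tuple layouts are hypothetical, and similarities are assumed to be precomputed.

```python
def retrieve(local_hits, web_search, knowledge_base, c):
    """Return (paragraph hits, used_web) for one query.

    local_hits: top-K local results as (paragraph, similarity), best first.
    web_search: zero-argument callable returning the top-K web results as
                (paragraph, similarity, source_document) tuples.
    knowledge_base: list of documents, extended in place.
    c: similarity threshold in (0, 1).
    """
    # Skip the slower web search when the best local hit is good enough.
    if local_hits and local_hits[0][1] > c:
        return list(local_hits), False
    web_hits = web_search()
    # Cache whole documents whose paragraphs clear the threshold.
    for _, similarity, document in web_hits:
        if similarity > c and document not in knowledge_base:
            knowledge_base.append(document)
    combined = list(local_hits) + [(p, s) for p, s, _ in web_hits]
    return combined, True

def fake_web_search():
    # Stand-in for a real search call (hypothetical data).
    return [("web paragraph A", 0.90, "web doc A"),
            ("web paragraph B", 0.30, "web doc B")]

kb = ["local report"]
hits, used_web = retrieve([("local paragraph", 0.40)], fake_web_search, kb, c=0.8)
```

In this toy run the weak local hit triggers the web search, and only "web doc A" is added to the knowledge base because only its paragraph exceeds the threshold.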

2.2. Learning to Embed and Search

We confidently consider a higher similarity to signify a more relevant paragraph due to the training of the encoders $f_{\text{key}}$ and $f_{\text{query}}$. Their optimization involves maximizing a well-defined objective:

$$\ell = \mathbf{q}^{\top}\mathbf{e}_{0} - \log\sum_{i=0}^{I}\exp\left(\mathbf{q}^{\top}\mathbf{e}_{i}\right), \tag{1}$$

where $\mathbf{e}_{0}$ is the embedding of the paragraph known to contain relevant information for the query, and the other $I$ embeddings $\mathbf{e}_{1},\ldots,\mathbf{e}_{I}$ belong to a set of negative paragraphs (see section 3.3 for how they are selected). Intuitively, we want the dot products $\mathbf{q}^{\top}\mathbf{e}$ to be relatively large for the query-paragraph pairs that are actually relevant. Our encoders use the Multilingual-E5-base model architecture (Wang et al., 2022).
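To make the objective in Equation (1) concrete, here is a small numerical sketch in pure Python. It evaluates $\ell$ for one query given a positive embedding and a list of negative embeddings; the function name and the toy vectors are assumptions for illustration, and a real training loop would compute this on encoder outputs with automatic differentiation.

```python
import math

def contrastive_objective(q, e_pos, e_negs):
    """l = q.e_0 - log sum_{i=0..I} exp(q.e_i), with e_0 the positive embedding."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(q, e_pos)] + [dot(q, e) for e in e_negs]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[0] - log_sum_exp

q = [1.0, 0.0]
weak = contrastive_objective(q, [1.0, 0.0], [[0.0, 1.0]])
strong = contrastive_objective(q, [3.0, 0.0], [[0.0, 1.0]])
```

The objective is the log-probability of the positive paragraph under a softmax over dot products, so it is never positive and it increases (`strong > weak`) as the positive dot product grows relative to the negatives.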

Once we have trained the encoders, we determine the threshold $c$ by computing the cosine similarities for a holdout set of query-paragraph pairs and analyzing the histogram.⁴ In practice, we choose the 1% quantile to be our threshold $c$.

⁴ We experimented with dot product and cosine similarity and found that the latter yields a better overall performance even though our training objective has used the former.
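The threshold selection can be illustrated with a short sketch. It assumes the holdout similarities are those of relevant query-paragraph pairs and uses a simple linear-interpolation quantile; both the helper `quantile` and the synthetic similarity values are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

# Synthetic holdout similarities spread evenly over [0.5, 1.0] (illustrative only).
holdout_similarities = [0.50 + 0.005 * i for i in range(101)]
c = quantile(holdout_similarities, 0.01)
```

Under this reading, about 99% of the holdout pairs score above $c$, so a local hit above $c$ looks like a typical relevant pair.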

Another question is where our query-paragraph pairs come from. We have collected a corpus of query-paragraph pairs by leveraging financial documents and question generation methods, which we will outline in section 3.

2.3. Generation by Large Language Models

In this phase, our system analyzes the search results, fills an appropriate prompt template, and generates a response. The strategies we adopt are as follows:

Ranker.

The system first ranks all the $2K$ search results (or $K$ results if the internet search has been skipped as discussed in section 2.1) based on their cosine similarities with the query (i.e., the second step in Figure 1).

Prompt Formulation with Temporal Context.

After receiving the ranking results from the previous step, we plug the top $J$ (where $J<K$) results into the context part of the predefined prompt template. Since knowledge in the finance domain evolves continuously, we ground each retrieved result and question with temporal facts (e.g., published date and question date), as shown in Listing 1, to help the language model adapt to time-evolving knowledge, which can make the response more meaningful and accurate (** et al., 2023b).

Having temporal metadata for questions and retrieved results allows us to ask questions with relative time specifications (e.g., “3 months ago”), which are often used in finance queries. Moreover, it enables us to ingest new knowledge periodically, in a streaming setup, and evaluate on questions asked during that period (Liška et al., 2022).

English template
Context information:
{CONTEXT_DATE_1:CONTEXT_RETRO_1}
{CONTEXT_DATE_2K:CONTEXT_RETRO_2K}
The current date is {QUESTION_DATE}. Based on the given information, please provide a concise and professional response to the user’s question. If there are multiple questions in a query, please answer all of them. If the user’s question includes keywords like “recent” or “latest” to indicate a recent time frame, pay attention to the correspondence between the current date and the date of the information. If a clear answer cannot be determined, respond with “Unable to answer the question based on the information provided”. You MUST respond in the same language as the question!
The question is: {QUESTION}.
Listing 1: Prompt templates for LLM.
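A minimal sketch of how such a template might be filled is given below. The template text is abbreviated, and `PROMPT_TEMPLATE`, `build_prompt`, the placeholder paragraphs, and the question date are all illustrative assumptions rather than the system's actual code.

```python
# Abbreviated version of the English template; the full instructions are in Listing 1.
PROMPT_TEMPLATE = (
    "Context information:\n"
    "{context}\n"
    "The current date is {question_date}. Based on the given information, "
    "please provide a concise and professional response to the user's question.\n"
    "The question is: {question}."
)

def build_prompt(results, question, question_date):
    """results: list of (published_date, text) pairs, already ranked and cut to the top J."""
    context = "\n".join(f"{date}: {text}" for date, text in results)
    return PROMPT_TEMPLATE.format(
        context=context, question_date=question_date, question=question
    )

prompt = build_prompt(
    [("2023-04-03", "placeholder paragraph 1"), ("2023-02-07", "placeholder paragraph 2")],
    question="How many EVs did BYD sell in Q1 2023?",
    question_date="2023-07-01",  # hypothetical question date
)
```

Pairing every context entry with its publication date and stating the current date is what lets the model resolve relative expressions such as "3 months ago".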

Response Generation.

The large language model organizes the information to form a fluent and informed response, corresponding to the third step in Figure 1. The response also includes accurate references to the sources of the information. As an option, our system can further provide a list of citations for the remaining $(K-J)$ paragraphs at the end of the response, to complement the main response and provide users with additional recommended reading materials.
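The ranking and the split between prompt context and extra citations can be sketched as below. The record layout (`source`, `similarity`) and the function name are hypothetical; the sketch only shows the ordering and slicing logic described in this section.

```python
def rank_and_split(results, j):
    """Sort results by similarity (best first) and split off the top j for the prompt."""
    ranked = sorted(results, key=lambda r: r["similarity"], reverse=True)
    return ranked[:j], ranked[j:]

# Toy search results with precomputed similarities (illustrative values).
results = [
    {"source": "doc A", "similarity": 0.62},
    {"source": "doc B", "similarity": 0.91},
    {"source": "doc C", "similarity": 0.75},
    {"source": "doc D", "similarity": 0.40},
]
context, further_reading = rank_and_split(results, j=2)
```

Here `context` would feed the prompt template, while `further_reading` could be listed as additional citations at the end of the response.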

Our large language model has undergone adaptation to the finance domain, thus being able to skillfully process the input information and generate appropriate responses. We started with a GLM (Du et al., 2022; Zeng et al., 2022) that has been pre-trained on extensive English and Chinese corpora. Then we fine-tuned this model using our collected Chinese and English financial documents as well as expert-annotated financial query-response pairs, which we introduce in section 3.

3. Data Collection and Utilization

In this section, we introduce our corpora of financial documents and their collection process, as well as how we used them to build the local knowledge base and generate training and evaluation query-paragraph pairs.

3.1. Collection of Financial Documents

We collected a large set of financial documents from well-known and accredited sources, most of which are global financial institutes. Our collection includes 240 million Chinese documents and 25 million English documents. They were all published between 2010-Jan-01 and 2023-June-30, and cover diverse topics such as macroeconomics and fixed income. Table 2 shows a break-down into the information sources, and Table 3 shows a break-down into the topics.

Chinese docs                    English docs
Source              Share       Source              Share
CICC                3.5%        Deutsche Bank       18.5%
CITIC Sec.          3.1%        CICC                17.3%
Industrial Sec.     2.9%        HSBC                11.8%
Haitong Sec.        2.8%        CITIC Sec.          6.4%
Guotaijunan Sec.    2.7%        Guotaijunan Sec.    5.6%
Huatai Sec.         2.1%        Kim Eng Sec.        4.8%
Guangfa Sec.        2.0%        BNP Paribas         3.9%
Changjiang Sec.     2.0%        Industrial Sec.     3.7%
CMS                 2.0%        Shenwan Hongyuan    3.7%
Founderfu           1.9%        UBS                 3.1%
Table 2. Top 10 sources of our financial documents.

Chinese docs                    English docs
Topic               Share       Topic               Share
Financial Engr.     25.8%       HK Stocks           34.9%
Company             20.5%       Company             21.7%
Industry            20.5%       Industry            11.8%
Strategies          6.9%        Macro econ          18.5%
HK Stocks           5.5%        Strategies          5.6%
Morning report      4.9%        Morning report      4.8%
Macro econ          4.8%        Bond                3.9%
China GEM           4.5%        China GEM           3.7%
Bond                4.1%        Financial Engr.     0.1%
Fund                1.5%        Fund                0.1%
Table 3. Top 10 topics of our financial documents.

For each document, we save into a text file the concatenation of its release time, title, summary, and other metadata (such as topic). By manually analyzing a sampled set of documents and consulting domain experts, we recognized that every document has a summary section that includes nearly all the important information of the document. Therefore, we did not save the full content, which would consume a significantly larger amount of disk space and memory but contribute minimal additional information. After processing, the average token count for Chinese documents is 5000 and that for English documents is 2000. Below are two examples of processed documents:

Example 0 (Chinese).

2023-04-03 00:00:00;海底捞(6862.HK):
2H22净利率7.5%; 门店重启稳步推进;海底捞发布2022年度业绩…(full content omitted); 港股;个股;公司定期报告评述;股票;**香港.

Example 0 (English).

2023-02-07 00:00:00; Feb 223 ECB monetary policy meeting commentary: ECB maintains pace of rate hikes without reinforcing…(full content omitted); macro;oversea;finance;economics research;economics;europe.
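The record layout suggested by the two examples can be sketched as a simple concatenation. The delimiters ("; " between main fields, ";" between metadata tags) are inferred from the examples and are an assumption, as are the function name and the placeholder field values.

```python
def to_record(release_time, title, summary, metadata_tags):
    """Concatenate the saved fields of one document into a single text line."""
    return "; ".join([release_time, title, summary, ";".join(metadata_tags)])

# Placeholder field values, not taken from a real document.
record = to_record("2023-02-07 00:00:00", "TITLE", "SUMMARY", ["macro", "europe"])
```

Keeping the release time at the front of each record makes the publication date readily available for the temporal grounding used later in the prompt.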

The primary use of our large-scale data collection is to build a local knowledge base. As mentioned in section 2.1, we split each document $\mathbf{d}^{\text{loc}}_{n}$ into smaller chunks $\mathbf{p}^{\text{loc}}_{n,1},\ldots,\mathbf{p}^{\text{loc}}_{n,M_{n}}$ (about 250 tokens per chunk), and embed each chunk into a vector $\mathbf{e}^{\text{loc}}_{n,m}$ with our trained neural encoder $f_{\text{key}}$. We implemented this vector-based knowledge base and its retrieval method using Faiss (Johnson et al., 2019), a library highly optimized for efficient similarity search and clustering of dense vectors.
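A rough sketch of the chunking step is shown below. It assumes whitespace-separated tokens for simplicity, which would not work for Chinese text; the actual system presumably uses a model tokenizer, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, chunk_size=250):
    """Split text into consecutive chunks of at most chunk_size whitespace tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]

# A synthetic 600-token document (illustrative only).
document = " ".join(f"tok{i}" for i in range(600))
chunks = split_into_chunks(document)
```

Each resulting chunk would then be embedded with $f_{\text{key}}$ and added to the vector index.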

3.2. Collection of Query-Response Pairs

Another important use of our data collection is to create query-response pairs for tuning the language model. This step is crucial for the performance of our system: recent research has shown that language models tuned with query-response pairs perform better at answering open-domain questions and engaging in conversations (Mangrulkar et al., 2022). We collected four kinds of query-response pairs, three of which were created from our collection of financial documents. Table 4 shows a breakdown of the pairs into these four categories. In each category, 60% of the pairs are in Chinese while the rest are in English.

Generation

For each document, we can construct a query-response pair that tunes the language model to perform content generation. To create the query, we fill the template “Please write a [TYPE] on [TOPIC] with the title [TITLE]” where TYPE, TOPIC, and TITLE can be found in the corresponding text file. The SUMMARY part of the file is used as the response. An example query is “Please write a review report on China economics with the title CHINA POLICY: POLITBURO MAINTAINS PRO-GROWTH STANCE” and its response is “The politburo indicates continuous pro-growth policy without big stimulus…”

Summarization

The reverse of a content generation pair can be used as a content summarization pair. Now the query is created by filling the template “Please summarize the following [TYPE]: [SUMMARY]” and the response is the TITLE. An example query is “Please summarize the following report: NIO’s 4Q22 earnings missed our prior forecast…” and its response is “NIO INC. (NIO.US): 4Q22 miss”.
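The two templates above can be sketched as a pair of small functions that turn one processed document into a generation pair and a summarization pair. The dictionary layout and function names are illustrative assumptions; the field values reuse the generation example from the text.

```python
def generation_pair(doc):
    """Query asks for content; the document summary is the response."""
    query = f"Please write a {doc['type']} on {doc['topic']} with the title {doc['title']}"
    return {"query": query, "response": doc["summary"]}

def summarization_pair(doc):
    """Reverse direction: the summary is in the query and the title is the response."""
    query = f"Please summarize the following {doc['type']}: {doc['summary']}"
    return {"query": query, "response": doc["title"]}

doc = {
    "type": "review report",
    "topic": "China economics",
    "title": "CHINA POLICY: POLITBURO MAINTAINS PRO-GROWTH STANCE",
    "summary": "The politburo indicates continuous pro-growth policy without big stimulus…",
}
gen = generation_pair(doc)
summ = summarization_pair(doc)
```

Both pairs come from the same record, so no extra annotation is needed to build either dataset.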

Close-book QA

Given a document, we would like the model to answer any questions that could be answered by directly identifying the relevant information from the document. This is the focus of the close-book QA research (Roberts et al., 2020). To achieve this goal, we need a specific kind of query-response pair: the query is the question together with the document, and the response is the answer. We created such pairs using the question generation method proposed by Rajpurkar et al. (2016). Precisely, given a document, we used a pre-trained question generation model to identify some interesting information in the document and then generate a question whose answer should be that piece of information.

Open QA

We would also like the system to answer open questions: unlike close-book QA, the evidence is not available in the context. We created this kind of Chinese question-answer (i.e., query-response) pair by scraping well-known online QA forums such as Zhihu and SinaFinance. On such forums, users post questions, tag questions, answer questions, and upvote or downvote answers. We identified 6,000 highly recognized questions with the “finance” tag; with each question being the query, we took the answer with the most upvotes to be its response. We then randomly sampled 4,000 English open QA questions from the English finance QA dataset FinQA (Bharti, 2023).
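The forum-based selection rule can be sketched as follows. The thread structure (`question`, `tags`, `answers`, `upvotes`) is a hypothetical representation of scraped data, not the format of any real forum API.

```python
def build_open_qa_pairs(threads, tag="finance"):
    """Keep tagged threads and pair each question with its most-upvoted answer."""
    pairs = []
    for thread in threads:
        if tag in thread["tags"] and thread["answers"]:
            best = max(thread["answers"], key=lambda a: a["upvotes"])
            pairs.append({"query": thread["question"], "response": best["text"]})
    return pairs

# Toy threads (illustrative only).
threads = [
    {"question": "Q1", "tags": ["finance"],
     "answers": [{"text": "A1a", "upvotes": 3}, {"text": "A1b", "upvotes": 12}]},
    {"question": "Q2", "tags": ["sports"],
     "answers": [{"text": "A2a", "upvotes": 50}]},
    {"question": "Q3", "tags": ["finance"], "answers": []},
]
open_qa_pairs = build_open_qa_pairs(threads)
```

Threads without the tag or without any answer are dropped, so only "Q1" survives in this toy input.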

Use case         %      # Pairs   Source
Generation       38%    13000     Collected financial documents
Summarization    20%    7000      Collected financial documents
Close-book QA    12%    4100      Collected financial documents
Open QA          30%    10000     Chinese: collected from online QA forums (e.g., Zhihu and SinaFinance); English: sampled from FinQA (Bharti, 2023)


Table 4. Distribution of use case categories of the query-response pairs.

3.3. Data Collection for Training Encoders

Recall from section 2.2 that we need query-paragraph pairs to train the key and query encoders $f_{\text{key}}$ and $f_{\text{query}}$. The close-book QA subset available in section 3.2 provides a direct resource to fulfill this requirement. First, we categorized all the close-book query-response pairs according to their topics. Then we manually reviewed the data of each topic and selected a total of 1,000 pairs while ensuring a comprehensive coverage of all topics. The selection of each pair was based on our anticipated likelihood that someone would ask this question during interaction with the demo. We used these 1,000 query-response pairs as the positive pairs in training the encoders. For each of them, we randomly sampled five negative responses from the entire pool of paragraphs.
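A minimal sketch of the negative sampling step is given below. It assumes that the positive paragraph is excluded from the candidate pool and that sampling is without replacement; neither detail is stated in the text, and the function name and toy data are hypothetical.

```python
import random

def sample_negatives(positive_pairs, paragraph_pool, num_negatives=5, seed=0):
    """Attach num_negatives randomly drawn paragraphs to each (query, positive) pair."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    examples = []
    for query, positive in positive_pairs:
        # Assumption: never draw the positive paragraph as its own negative.
        candidates = [p for p in paragraph_pool if p != positive]
        negatives = rng.sample(candidates, num_negatives)
        examples.append({"query": query, "positive": positive, "negatives": negatives})
    return examples

pool = [f"paragraph {i}" for i in range(20)]
pairs = [("question 0", "paragraph 0"), ("question 1", "paragraph 1")]
training_examples = sample_negatives(pairs, pool)
```

Each resulting example supplies one $\mathbf{e}_{0}$ and $I=5$ negatives for the objective in Equation (1).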

4. Related Work

Large language models such as ChatGPT (Brown et al., 2020) and GPT-4 (OpenAI, 2023) have showcased their remarkable capabilities in engaging in human-like communication and understanding complex queries, bringing a trend of incorporating LLMs in various fields (Anil et al., 2023; Gunasekar et al., 2023; ** et al., 2023a). Recent advancements have further enhanced these models by incorporating external tools, enabling them to search for relevant online information (Nakano et al., 2021), utilize tools (Schick et al., 2023), and create more sophisticated applications (Chase, 2022). In this paper, we harness these advancements to build an intelligent dialogue system that specializes in the finance domain. The most relevant work is BloombergGPT (Wu et al., 2023), a generative Transformer trained with large corpora of finance-related text. While BloombergGPT has only been trained with English corpora, our system is also carefully tailored to handle Chinese, a language spoken by one-fifth of the global population. In addition, BloombergGPT is only trained to predict the next token given the context but not to engage in conversations like our WeaverBird system; nor is it integrated with knowledge bases or search engines.

Our work aligns with research on retrieval-based language generation, which involves enhancing language models by incorporating external datastores. This area has seen advancements in new model architectures that condition the next token generation on retrieved text (Khandelwal et al., 2020; Wu et al., 2022; Borgeaud et al., 2022), as well as developments in training and inference methods aimed at improving the quality of retrieval (Zhong et al., 2022) and the performance in downstream tasks such as question answering (Karpukhin et al., 2020; Izacard et al., 2022b; Shi et al., 2022; Menick et al., 2022) and code generation (Zhou et al., 2023). Our work aims to deliver a system that has superior performance in answering finance-related questions. It uses the standard GPT architecture, and augments the inputs (i.e., queries) with retrieved information (i.e., online documents or KB entries): this design is similar to the approaches of Shi et al. (2023); Ram et al. (2023), but our efficiency-optimized search and update (section 2.1) is novel.

Figure 2. Retrieval performance of all combinations of encoders and similarity scores. From left to right, they are: pretrained ME5 with cosine similarity, trained ME5 with cosine similarity, trained M3E with dot product, trained ME5 with Euclidean distance, trained Contriever with Euclidean distance, trained Contriever with dot product, and trained Contriever with cosine similarity. Pre-trained Contriever is not presented since its performance is very poor.
Figure 3. Response quality performance of all methods. From left to right, they are: WebGLM, FinGPT, FinChat, WeaverBird, WeaverBird without knowledge base, WeaverBird without search engine and WeaverBird with neither knowledge base nor search engine.

5. Models and Training

5.1. Language Model

Architecture.

The system is based on the pre-trained ChatGLM2-6B (Du et al., 2022; Zeng et al., 2022), and we fine-tune it with the data and training strategy specified below.

Dataset.

We use the collected query-response pairs described in section 3.2 for training and evaluation. The whole set of pairs is divided into 17,000 training pairs, 3,000 development pairs, and 4,100 test pairs.

Training.

We apply the LoRA (Hu et al., 2021) strategy to fine-tune the base model to adapt to the finance domain. The LoRA layer details are as follows: the LoRA rank is 16, the LoRA alpha is 32, the dropout is 0.1, and other settings are default.
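To give a feel for these settings, the sketch below computes the LoRA scaling factor alpha/rank and the number of trainable parameters LoRA adds to one weight matrix. The 4096 x 4096 matrix size is a hypothetical example, not a figure from the paper, and `lora_summary` is an illustrative helper rather than part of any library.

```python
def lora_summary(d_in, d_out, rank=16, alpha=32):
    """Parameter counts for one d_out x d_in weight adapted with LoRA.

    LoRA freezes the full weight W and learns a low-rank update (alpha / rank) * B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    full_params = d_in * d_out
    lora_params = rank * (d_in + d_out)
    return {
        "scaling": alpha / rank,
        "full_params": full_params,
        "lora_params": lora_params,
        "ratio": lora_params / full_params,
    }

# Hypothetical 4096 x 4096 projection matrix.
summary = lora_summary(4096, 4096)
```

With rank 16 and alpha 32 the update is scaled by 2, and for a matrix of this size the trainable parameters are under 1% of the frozen weight.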

5.2. Encoders

Architecture.

The key and query encoders $f_{\text{key}}$ and $f_{\text{query}}$ are initialized from the Multilingual-E5-base (ME5) model (Wang et al., 2022).

Dataset.

Recall from section 2.2 that we need query-paragraph pairs to train the key and query encoders $f_{\text{key}}$ and $f_{\text{query}}$. We use the collected query-response pairs described in section 2.3 for training and evaluation. The chosen pairs were then divided into sets of 700 training pairs, 100 development pairs, and 200 test pairs.

Training.

We pass query-response pairs to the model to obtain a scalar score for each pair, and train with the cross-entropy loss to maximize the scores of the positive pairs while minimizing the scores of the negative pairs.
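One common way to realize such an objective is a softmax cross-entropy over one positive and several negative candidates, with dot-product scores. The sketch below shows this reading in plain Python. The exact form of the loss, the negative sampling scheme, and the toy embeddings are assumptions for illustration and are not taken from our implementation.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_cross_entropy(query, positive, negatives):
    """Cross-entropy loss that treats the positive as the correct class.

    Scores are dot products between the query embedding and each candidate
    embedding; the loss is -log softmax(score of the positive).
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    max_score = max(scores)  # subtract the max for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[0]


# Toy embeddings (hypothetical values standing in for encoder outputs).
q = [1.0, 0.0]
pos = [2.0, 0.0]
negs = [[0.0, 1.0], [-1.0, 0.0]]
loss = contrastive_cross_entropy(q, pos, negs)
print(f"loss = {loss:.4f}")
```

Minimizing this loss raises the score of the positive pair relative to the negatives, which matches the behavior described above.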

5.3. Implementation and Deployment Details

Knowledge Base and WebUI.

For the implementation of LangChain, we use the code from the public GitHub repository at https://github.com/imClumsyPanda/langchain-ChatGLM (LangChain-ChatGLM, 2023), which comes without any license. For the implementation of WebUI, we use the code from the public GitHub repository at https://github.com/THUDM/WebGLM (Liu et al., 2023), released under the Apache 2.0 License.

Computation Cost.

All experiments were conducted on a server with 128G RAM, a CPU with 64 logical cores (Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz), and 8 NVIDIA A100 80G Tensor Core GPUs for acceleration. When fine-tuning on the finance dataset, we set the per-device training batch size to 8, the number of gradient accumulation steps to 100, the maximum number of steps to 10,000, and the learning rate to 0.0001. Each step takes close to 72 seconds, and the total running time is close to 200 hours.
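The reported total follows directly from the step count and the per-step time, as the short check below shows. The per-device sample count assumes that one "step" is one optimizer update after gradient accumulation, which the text does not state explicitly.

```python
PER_DEVICE_BATCH = 8    # train batch size on each device
GRAD_ACCUM_STEPS = 100  # gradient accumulation steps
MAX_STEPS = 10_000      # maximum number of training steps
SECONDS_PER_STEP = 72   # approximate running time of one step

# Total wall-clock time implied by the step count and per-step time.
total_hours = MAX_STEPS * SECONDS_PER_STEP / 3600

# Assumption: one step is one optimizer update after accumulation, so each
# device processes this many examples per update.
samples_per_device_per_update = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

print(total_hours, samples_per_device_per_update)
```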

Deployment Details.

Our system is deployed on an Alibaba Cloud server with 30G RAM, a CPU with 8 logical cores (Intel Xeon (Ice Lake) Platinum 8369B), and an NVIDIA A100 80G Tensor Core GPU. The inference speed depends on the web search speed: a search in the local knowledge base takes only 1 second, whereas a web search takes 10 seconds on average.

6. Experiments

We present the experiments designed to evaluate the performance of the WeaverBird system, including the effectiveness of the trained encoders (section 6.1) and the generation quality (section 6.2).

6.1. Paragraph Retrieval Evaluation

Now we show the results of evaluating the trained encoders against pretrained ME5 and trained Contriever-based encoders (Izacard et al., 2022a).

Experimental Setup.

We use the test set specified in section 5.2 for evaluation. When evaluating a method (i.e., a specific model paired with a specific similarity metric), we take the top $K=5$ highest-ranked retrievals for each given query, and evaluate the mean average precision (MAP) and mean average recall (MAR) on the entire test set of query-paragraph pairs. Intuitively, MAP measures the fraction of the predictions that are correct, while MAR measures the fraction of the actual paragraphs that are covered by the predictions. Higher is better.
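The intuitive description above can be sketched as per-query precision and recall over the top K retrievals, averaged over all queries. This is an illustration of that reading only; the exact MAP and MAR formulas used in our evaluation may differ, and the paragraph identifiers below are made up.

```python
def precision_recall_at_k(retrieved, relevant, k=5):
    """Precision and recall of the top-k retrieved items for one query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(runs, k=5):
    """Average per-query precision and recall over (retrieved, relevant) pairs."""
    pairs = [precision_recall_at_k(r, rel, k) for r, rel in runs]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical paragraph ids: ranked retrievals and the set of true paragraphs.
runs = [
    (["p1", "p7", "p3", "p9", "p4"], {"p1", "p3"}),   # 2 of 5 correct, 2 of 2 covered
    (["p2", "p5", "p6", "p8", "p0"], {"p5", "p11"}),  # 1 of 5 correct, 1 of 2 covered
]
mean_p, mean_r = mean_precision_recall(runs)
print(mean_p, mean_r)
```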

Results.

The results are in Figure 2. As we can see, trained encoders work significantly better than pretrained encoders, indicating the effectiveness of our training method. Moreover, our ME5-based encoders significantly outperform the Contriever-based encoders, perhaps because a large portion of our test cases are in Chinese and ME5 is a multilingual model. Interestingly, cosine similarity turns out to work the best among all the similarity metrics, even though our training objective uses dot product.

6.2. Response Quality Evaluation

Experimental Setup.

We use the test set specified in section 5.1 for evaluation. Since automatic evaluation of complicated natural language QA remains an open research question, each response is rated by both a human financial expert and GPT-4 (OpenAI, 2023), the latter chosen for its efficiency and consistency (Hackl et al., 2023): each rater gives the response a score in $[0,100]$, and the final score is the average of the two. Higher is better. We compare WeaverBird with WebGLM, FinGPT and FinChat. BloombergGPT (Wu et al., 2023) is not open-source, so it is excluded from the baselines. Due to budget constraints, we evaluate FinChat, a commercial chatbot in finance, on a subset of 300 questions sampled from the test set.
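The scoring rule can be written down in a few lines. In the sketch below, the function names and the toy ratings are hypothetical, and averaging the per-response scores into one system-level number is an assumption about how the figures are aggregated.

```python
def final_score(expert_score, gpt4_score):
    """Average of the expert and GPT-4 ratings, each required to lie in [0, 100]."""
    for score in (expert_score, gpt4_score):
        if not 0 <= score <= 100:
            raise ValueError(f"score out of range: {score}")
    return (expert_score + gpt4_score) / 2


def system_score(ratings):
    """Mean final score over a list of (expert_score, gpt4_score) pairs."""
    return sum(final_score(e, g) for e, g in ratings) / len(ratings)


# Toy ratings for three responses (made-up numbers, not experimental results).
toy_ratings = [(80, 90), (60, 70), (100, 50)]
print(system_score(toy_ratings))
```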

Results.

The results of our WeaverBird system are summarized in Figure 3. Although WebGLM is also bilingual and empowered with a search engine, WeaverBird has superior performance, mainly due to its domain-finetuned base model, its encoders trained on financial documents, and its optimized retrieval over the knowledge base and searched results. FinGPT has a lower score than WeaverBird and WebGLM because it does not use multi-source knowledge. FinChat performs poorly because it is not bilingual and its knowledge base covers a restricted list of stocks.

6.3. More Analysis

Analysis I: Effect of knowledge base and search engine on the response quality.

First, we remove the knowledge base (‘wv-wo-kb’ in Figure 3), which causes a moderate drop in performance. Second, we remove the search engine (‘wv-wo-se’ in Figure 3), and the performance drops notably, indicating that the web search ability, which introduces the ‘global knowledge base’ on the internet, greatly improves the accuracy of responses. To conclude, the searched results contribute most to the response quality, while the local knowledge base contributes moderately.

Figure 4. The effect of document retrieval accuracy on the response quality of the WeaverBird system.

Analysis II: Effect of the retriever on the response quality.

We randomly pick 500 examples of query-response pairs generated by WeaverBird and group the responses by the accuracy of the retrieved documents (annotated by the experts). Not surprisingly, Figure 4 shows that higher retrieval accuracy brings a significant improvement in response quality.
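The grouping step can be sketched as follows. The input is a list of (retrieval accuracy, response score) pairs; the numbers below are made-up toy values rather than our experimental data, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_quality_by_accuracy(examples):
    """Group response scores by retrieval accuracy and average each group."""
    buckets = defaultdict(list)
    for accuracy, quality in examples:
        buckets[accuracy].append(quality)
    return {acc: sum(q) / len(q) for acc, q in sorted(buckets.items())}


# Toy (retrieval accuracy, response score) pairs for illustration only.
toy = [(0.2, 40.0), (0.2, 50.0), (0.6, 70.0), (1.0, 90.0), (1.0, 80.0)]
summary = mean_quality_by_accuracy(toy)
print(summary)
```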

Analysis III: How to correct flawed cases.

We manually analyze 50 examples of finance-related query-response pairs annotated by the experts. Of these 50, 70% arrived at reasonable responses. However, we also found some flawed outputs, most of which can be corrected with one of the following classes of modification:

  • Increase $K$ to have more searched results; increase $J$ to have more input for the LLM; or increase both $K$ and $J$.

  • The searched results usually consist of a number of relevant paragraphs rather than the full content of the web page. If needed, the user can use the functionality provided by our system to scrape the site and save its content to the local knowledge base. Given the full text, more relevant paragraphs can be retrieved and better responses may be generated.

Analysis IV: Qualitative examples of challenging queries.

In this section, we present two illustrative examples. The first example poses a challenging question regarding a company's dividend policy. WeaverBird outperforms the competitors by delivering a reasonably accurate response, accompanied by citations to credible sources.

Query:   What is Microsoft's dividend policy and how does it compare to other companies in the technology sector?
WebGLM:  Microsoft has a history of paying dividends to its shareholders. The company's most recent dividend increase was a 10% increase on November 17, 2021, which brought its per-share payout to $0.68. Microsoft has also yields 0.9% on its stock. In comparison to other technology companies, Microsoft's dividend policy is considered to be a good one.
// citations to information sources
More details: 1. [Investor's Business Daily], 2. [The Motley Fool], 3. [Microsoft], 4. [Forbes], 5. [Barron's].
FinGPT:  Microsoft has a history of paying dividends to its shareholders. The company's most recent dividend increase was a 10% increase on November 17, 2021, which brought its per-share payout to $0.68. Microsoft has also yields 0.9% on its stock. In comparison to other technology companies, Microsoft's dividend policy is considered to be a good one.
FinChat:  Microsoft Corporation has a dividend policy in place. As of the most recent data, Microsoft pays a dividend four times a year. The dividend amount is $3.00 per share. The dividend yield, which is the dividend amount divided by the stock price, is 0.88% as of October 25, 2023.
Microsoft has a consistent history of paying dividends, with the first dividend payment dating back to February 19, 2003. The most recent dividend payment was made on November 15, 2023. In terms of how Microsoft's dividend policy compares to other companies in the technology sector, it is important to note that dividend policies can vary among companies ... (long details omitted).
WeaverBird:  Microsoft's dividend policy is to pay a quarterly dividend of $0.68 per share. This means that for each share of Microsoft stock, its shareholders will receive $0.68 every 3 months or 4 times per year. The company's dividend payments are typically made in the form of a quarterly dividend, which is a payment made by the company to its shareholders on a regular basis. When comparing Microsoft's dividend policy to other companies in the technology sector, it is important to consider the company's payout ratio, which is the percentage of its earnings that are paid out to shareholders as dividends. Microsoft's payout ratio is 66.1%, which is relatively high compared to some other technology companies. For example, Apple, Microsoft's main competitor in the technology sector, pays a dividend of only 3.4% of its earnings out to shareholders. Google's dividend policy is similar to Microsoft's, but its payout ratio is lower at 52.5%.
Overall, Microsoft's dividend policy is relatively high compared to other technology companies, and its shareholders can expect to receive regular dividends as the company continues to perform well.
// citations to information sources
More details: 1. [Dividends Diversify], 2. [The Motley Fool], 3. [Seeking Alpha], 4. [Forbes], 5. [Barron's].
Listing 2: About company dividend policy.

The second example is in Chinese. As we can see, our WeaverBird system exhibits superior performance in answering Chinese finance-related questions, which neither WebGLM nor FinChat could answer well.

Query:  半夏资本李蓓的最新投资观点是什么?
<-- English translation
What is the latest investment viewpoint of Li Bei at Banxia Capital?
-->
WebGLM:  半夏资本李蓓近日在半夏投资公众号发表了两篇文章,对当前全球宏观形势和投资机会进行了展望。她表示,人民币和中国股市或进入牛市。在今年的4月底,她曾经提出过“股票风险已经越来越小”的观点。半夏投资是一家极具流量的知名私募公司,李蓓经常通过官方公众号发布自己对投资、市场以及一些热点话题的看法,且观点鲜明,行文犀利。
<-- English translation
Li Bei, from Banxia Capital, recently published two articles on the Banxia Investment official WeChat account, providing insights into the current global macro situation and investment opportunities. She mentioned that the Renminbi and Chinese stock market may enter a bull market. In late April of this year, she also expressed the view that “stock market risks are becoming smaller”. Banxia Investment is a well-known private equity firm with a large following. Li Bei often shares her views on investment, markets, and hot topics through the official WeChat account, presenting distinct opinions and sharp writing style.
-->
More Details: 1. [stcn.com], 2. [stcn.com], 3. [ce.cn], 4. [sina.cn], 5. [sina.com.cn].
FinGPT:   半夏资本投资总监李蓓在投资界活动分享中表示,A股市场投资难度越来越大,价值投资和成长投资之间的边界越来越模糊,在这样的市场环境中,投资策略的边界也变得越来越模糊。她认为,在这样的市场环境中,投资策略的边界也变得越来越模糊,在这样的市场环境中,投资者需要有更加开放的心态,能够灵活应对市场变化,寻找一些未被充分发掘的投资机会。
<-- English translation
Banxia Capital's Investment Director, Li Bei, shared during an investment conference that investing in the A-share market has become increasingly challenging, with the boundaries between value investing and growth investing becoming more blurred. In such a market environment, the boundaries of investment strategies have also become increasingly unclear. She believes that in such a market environment, investors need to have a more open mindset and be flexible in responding to market changes, seeking out investment opportunities that have not been fully explored.
-->
FinChat:  I’m sorry, but I couldn’t find any information about 半夏资本李蓓's latest investment views in the provided data sources. The data sources primarily focus on different companies and their financial information.
<-- English translation
Based on the provided data, there is no information available about the latest investment views of Li Bei at Banxia Capital. The data source does not include Li Bei's investment perspectives. If you have any other questions about Li Bei at Banxia Capital, I will do my best to answer them.
-->
WeaverBird:  根据已知信息,半夏投资创始合伙人兼基金经理李蓓在最近的采访中透露了一些最新的投资观点。她表示,过去一个月,半夏投资在持续提高股票仓位,主要集中在中下游大盘股。另外,她也提到了跨资产类别的好处,认为当你发现一个大的经济趋势的时候,并不是说投股票才是最好的选择,有些时候可能商品是更好的选择。李蓓还表示,基金规模过大并不利于宏观对冲策略的发挥。
<-- English translation
Based on the known information, Li Bei, the co-founder and fund manager of Banxia Capital, revealed some of the latest investment perspectives in a recent interview. She mentioned that over the past month, Banxia Capital has been consistently increasing its stock positions, mainly focusing on large-cap stocks in the mid-to-low sectors. Additionally, she emphasized the benefits of diversifying across asset classes, stating that when you identify a major economic trend, it doesn't always mean investing in stocks is the best choice; sometimes commodities might be a better option. Li Bei also mentioned that having an excessively large fund size is not conducive to the implementation of macro hedging strategies.
-->
More Details: 1. [21jingji.com], 2. [stcn.com], 3. [ce.cn], 4. [sina.cn], 5. [sina.com.cn].
Listing 3: A Chinese example.

7. Conclusion

We presented an open-source, intelligent dialogue system tailored for the finance sector, which outperforms the best available solutions as evidenced by its superior capabilities in solving a wide range of finance-related questions. Our systematic approach contributes to the line of research on building domain-specific large language models. In addition, our training and inference strategies may be useful for developing retrieval-based dialogue systems in other domains, allowing us to unlock broader real-world applications.

References

  • fin (2023) 2023. FinChat. https://finchat.io/
  • Anil et al. (2023) Rohan Anil, Andrew M Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, et al. 2023. Palm 2 technical report. arXiv preprint arXiv:2305.10403 (2023).
  • Bharti (2023) Gaurang Bharti. 2023. wealth-alpaca. https://github.com/gaurangbharti1/wealth-alpaca
  • Borgeaud et al. (2022) Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In Proceedings of the International Conference on Machine Learning (ICML). https://proceedings.mlr.press/v162/borgeaud22a/borgeaud22a.pdf
  • Brown et al. (2020) Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS). https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
  • Chase (2022) Harrison Chase. 2022. LangChain. https://github.com/hwchase17/langchain
  • Du et al. (2022) Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2022. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). https://arxiv.longhoe.net/abs/2103.10360
  • Gunasekar et al. (2023) Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, et al. 2023. Textbooks Are All You Need. arXiv preprint arXiv:2306.11644 (2023).
  • Hackl et al. (2023) Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, and Maximilian Sailer. 2023. Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings. arXiv:2308.02575 [cs.CL]
  • Hu et al. (2021) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021). https://arxiv.longhoe.net/pdf/2106.09685.pdf
  • Izacard et al. (2022a) Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022a. Unsupervised Dense Information Retrieval with Contrastive Learning. Trans. Mach. Learn. Res. 2022 (2022). https://openreview.net/forum?id=jKN1pXi7b0
  • Izacard et al. (2022b) Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2022b. Few-shot learning with retrieval augmented language models. arXiv preprint arXiv:2208.03299 (2022). https://arxiv.longhoe.net/abs/2208.03299
  • Jin et al. (2023a) Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. 2023a. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. arXiv preprint arXiv:2310.01728 (2023).
  • Jin et al. (2023b) Ming Jin, Qingsong Wen, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, Xue Wang, James Zhang, Yi Wang, Haifeng Chen, Xiaoli Li, et al. 2023b. Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook. arXiv preprint arXiv:2310.10196 (2023).
  • Johnson et al. (2019) Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data (2019). https://arxiv.longhoe.net/abs/1702.08734
  • Karpukhin et al. (2020) Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). https://arxiv.longhoe.net/abs/2004.04906
  • Khandelwal et al. (2020) Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. Generalization through memorization: Nearest neighbor language models. In Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.longhoe.net/abs/1911.00172
  • LangChain-ChatGLM (2023) LangChain-ChatGLM. 2023. LangChain-ChatGLM. https://github.com/imClumsyPanda/langchain-ChatGLM
  • Liška et al. (2022) Adam Liška, Tomáš Kočiský, Elena Gribovskaya, Tayfun Terzi, Eren Sezener, Devang Agrawal, Cyprien de Masson d’Autume, Tim Scholtes, Manzil Zaheer, Susannah Young, Ellen Gilsenan-McMahon, Sophia Austin, Phil Blunsom, and Angeliki Lazaridou. 2022. StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models. arXiv preprint arXiv:2205.11388 (2022). https://arxiv.longhoe.net/abs/2205.11388
  • Liu et al. (2023) Xiao Liu, Hanyu Lai, Hao Yu, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, and Jie Tang. 2023. WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences. arXiv:2306.07906 [cs.CL]
  • Mangrulkar et al. (2022) Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, and Sayak Paul. 2022. PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods. https://github.com/huggingface/peft
  • Menick et al. (2022) Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, et al. 2022. Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147 (2022). https://arxiv.longhoe.net/abs/2203.11147
  • Nakano et al. (2021) Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. 2021. Webgpt: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332 (2021). https://arxiv.longhoe.net/abs/2112.09332
  • OpenAI (2023) OpenAI. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023). https://arxiv.longhoe.net/abs/2303.08774.pdf
  • Rajpurkar et al. (2016) Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. Squad: 100,000+ questions for machine comprehension of text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). https://arxiv.longhoe.net/abs/1606.05250
  • Ram et al. (2023) Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models. Transactions of the Association for Computational Linguistics (2023). https://arxiv.longhoe.net/abs/2302.00083
  • Roberts et al. (2020) Adam Roberts, Colin Raffel, and Noam Shazeer. 2020. How Much Knowledge Can You Pack Into the Parameters of a Language Model?. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 5418–5426. https://doi.org/10.18653/v1/2020.emnlp-main.437
  • Schick et al. (2023) Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 (2023). https://arxiv.longhoe.net/abs/2302.04761
  • Shi et al. (2022) Weijia Shi, Julian Michael, Suchin Gururangan, and Luke Zettlemoyer. 2022. Nearest neighbor zero-shot inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). https://aclanthology.org/2022.emnlp-main.214/
  • Shi et al. (2023) Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2023. Replug: Retrieval-augmented black-box language models. arXiv preprint arXiv:2301.12652 (2023). https://arxiv.longhoe.net/abs/2301.12652
  • Wang et al. (2022) Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training. arXiv preprint arXiv:2212.03533 (2022).
  • Wu et al. (2023) Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. 2023. Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564 (2023). https://arxiv.longhoe.net/abs/2303.17564
  • Wu et al. (2022) Yuhuai Wu, Markus Norman Rabe, DeLesley Hutchins, and Christian Szegedy. 2022. Memorizing Transformers. In Proceedings of the International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=TrjbxzRcnf-
  • Yang et al. (2023) Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. 2023. FinGPT: Open-Source Financial Large Language Models. arXiv preprint arXiv:2306.06031 (2023).
  • Zeng et al. (2022) Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, et al. 2022. Glm-130b: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414 (2022). https://arxiv.longhoe.net/abs/2210.02414
  • Zhong et al. (2022) Zexuan Zhong, Tao Lei, and Danqi Chen. 2022. Training language models with memory augmentation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). https://arxiv.longhoe.net/abs/2205.12674
  • Zhou et al. (2023) Shuyan Zhou, Uri Alon, Frank F Xu, Zhengbao Jiang, and Graham Neubig. 2023. Docprompting: Generating code by retrieving the docs. In Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.longhoe.net/abs/2207.05987

Appendix A Ongoing and Future Work

We are currently exploring several extensions to deal with more complex dialogue cases in the finance domain. We are particularly interested in handling:

  • Multiple-turn conversations. In such scenarios, the system must provide, at each turn, a new response that is consistent and coherent with its previous statements.

  • External tools. Users may want our system to interact with other external tools, such as pulling information from commercial data vendors (e.g., Bloomberg Terminal) and placing transaction orders in online brokerage software (e.g., Charles Schwab).

  • More user-friendly presentation. Users may want our system to present answers in richer formats such as tables and diagrams.

Appendix B Software Interface

The main interface of our WeaverBird system can be seen in Figures 5 and 6. Experience it at https://weaverbird.ttic.edu.

Figure 5. The main interface of WeaverBird: the configuration and chatbox.
Figure 6. The ‘advanced’ tab of WeaverBird: the user can choose to save useful search results into a local knowledge base.