-
SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News
Authors:
Ankur Sinha,
Satishwar Kedas,
Rishu Kumar,
Pekka Malo
Abstract:
Fine-grained financial sentiment analysis on news headlines is a challenging task requiring human-annotated datasets to achieve high performance. Few studies have tried to address the sentiment extraction task in a setting where multiple entities are present in a news headline. In an effort to further research in this area, we make publicly available SEntFiN 1.0, a human-annotated dataset of 10,753 news headlines with entity-sentiment annotations, of which 2,847 headlines contain multiple entities, often with conflicting sentiments. We augment our dataset with a database of over 1,000 financial entities and their various representations in news media amounting to over 5,000 phrases. We propose a framework that enables the extraction of entity-relevant sentiments using a feature-based approach rather than an expression-based approach. For sentiment extraction, we use 12 different learning schemes that draw on lexicon-based and pre-trained sentence representations and five classification approaches. Our experiments indicate that lexicon-based n-gram ensembles are above par with pre-trained word embedding schemes such as GloVe. Overall, RoBERTa and finBERT (domain-specific BERT) achieve the highest average accuracy of 94.29% and F1-score of 93.27%. Further, using over 210,000 entity-sentiment predictions, we validate the economic effect of sentiments on aggregate market movements over a long duration.
Submitted 20 May, 2023;
originally announced May 2023.
-
Predicting Visit Cost of Obstructive Sleep Apnea using Electronic Healthcare Records with Transformer
Authors:
Zhaoyang Chen,
Lina Siltala-Li,
Mikko Lassila,
Pekka Malo,
Eeva Vilkkumaa,
Tarja Saaresranta,
Arho Veli Virkki
Abstract:
Background: Obstructive sleep apnea (OSA) is growing increasingly prevalent in many countries as obesity rises. Sufficient, effective treatment of OSA entails high social and financial costs for healthcare. Objective: For treatment purposes, predicting OSA patients' visit expenses for the coming year is crucial. Reliable estimates enable healthcare decision-makers to perform careful fiscal management and budget well for effective distribution of resources to hospitals. The challenges created by scarcity of high-quality patient data are exacerbated by the fact that just a third of those data from OSA patients can be used to train analytics models: only OSA patients with more than 365 days of follow-up are relevant for predicting a year's expenditures. Methods and procedures: The authors propose a method applying two Transformer models, one for augmenting the input via data from shorter visit histories and the other predicting the costs by considering both the material thus enriched and cases with more than a year's follow-up. Results: The two-model solution permits putting the limited body of OSA patient data to productive use. Relative to a single-Transformer solution using only a third of the high-quality patient data, the solution with two models improved the $R^{2}$ of the predictions from 88.8% to 97.5%. Even using baseline models with the model-augmented data improved the $R^{2}$ considerably, from 61.6% to 81.9%. Conclusion: The proposed method makes the most of the available high-quality data for prediction by carefully exploiting details that are not directly relevant for answering the question of the next year's likely expenditure.
Submitted 28 January, 2023;
originally announced January 2023.
-
A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications
Authors:
Ankur Sinha,
Pekka Malo,
Kalyanmoy Deb
Abstract:
Bilevel optimization is defined as a mathematical program in which an optimization problem contains another optimization problem as a constraint. These problems have received significant attention from the mathematical programming community. Only limited work exists on bilevel problems using evolutionary computation techniques; however, recently there has been an increasing interest due to the proliferation of practical applications and the potential of evolutionary algorithms in tackling these problems. This paper provides a comprehensive review on bilevel optimization from the basic principles to solution strategies, both classical and evolutionary. A number of potential application problems are also discussed. To offer the readers insights on the prominent developments in the field of bilevel optimization, we have performed an automated text-analysis of an extended list of papers published on bilevel optimization to date. This paper should motivate evolutionary computation researchers to pay more attention to this practical yet challenging area.
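As a sketch of the definition given in this abstract, a bilevel program can be written in the following general form. The notation is assumed here for illustration and is not taken from the paper: $x_u$ and $x_l$ denote the upper- and lower-level variables, $F$ and $f$ the upper- and lower-level objectives, and $G_k$ and $g_j$ the respective constraints.

$$
\begin{aligned}
\min_{x_u,\, x_l} \quad & F(x_u, x_l) \\
\text{s.t.} \quad & x_l \in \operatorname*{arg\,min}_{z} \left\{ f(x_u, z) \;:\; g_j(x_u, z) \le 0,\ j = 1, \dots, J \right\}, \\
& G_k(x_u, x_l) \le 0, \quad k = 1, \dots, K.
\end{aligned}
$$

The first constraint is the nested problem: a pair $(x_u, x_l)$ is feasible at the upper level only if $x_l$ is an optimal response of the lower level to $x_u$.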
Submitted 5 December, 2020; v1 submitted 17 May, 2017;
originally announced May 2017.
-
Optimal Management of Naturally Regenerating Uneven-aged Forests
Authors:
Ankur Sinha,
Janne Rämö,
Pekka Malo,
Markku Kallio,
Olli Tahvonen
Abstract:
A shift from even-aged forest management to uneven-aged management practices leads to a problem rather different from the existing straightforward practice that follows a rotation cycle of artificial regeneration, thinning of inferior trees and a clearcut. A lack of realistic models and methods suggesting how to manage uneven-aged stands in a way that is economically viable and ecologically sustainable creates difficulties in adopting this new management practice. To tackle this problem, we make a two-fold contribution in this paper. The first contribution is the proposal of an algorithm that is able to handle a realistic uneven-aged stand management model that is otherwise computationally tedious and intractable. The model considered in this paper is an empirically estimated size-structured ecological model for uneven-aged spruce forests. The second contribution is a sensitivity analysis of the forest model with respect to a number of important parameters. The analysis provides insight into the behavior of the uneven-aged forest model.
Submitted 17 August, 2016;
originally announced August 2016.
-
Test Problem Construction for Single-Objective Bilevel Optimization
Authors:
Ankur Sinha,
Pekka Malo,
Kalyanmoy Deb
Abstract:
In this paper, we propose a procedure for designing controlled test problems for single-objective bilevel optimization. The construction procedure is flexible and allows its user to control the different complexities that are to be included in the test problems independently of each other. In addition to properties that control the difficulty in convergence, the procedure also allows the user to introduce difficulties caused by interaction of the two levels. As a companion to the test problem construction framework, the paper presents a standard test suite of twelve problems, which includes eight unconstrained and four constrained problems. Most of the problems are scalable in terms of variables and constraints. To provide baseline results, we have solved the proposed test problems using a nested bilevel evolutionary algorithm. The results can be used for comparison, while evaluating the performance of any other bilevel optimization algorithm. The codes related to the paper may be accessed from the website http://bilevel.org.
Submitted 16 August, 2016; v1 submitted 9 January, 2014;
originally announced January 2014.
-
Multi-objective Stackelberg Game Between a Regulating Authority and a Mining Company: A Case Study in Environmental Economics
Authors:
Ankur Sinha,
Pekka Malo,
Anton Frantsev,
Kalyanmoy Deb
Abstract:
Bilevel programming problems are often found in practice. In this paper, we handle one such bilevel application problem from the domain of environmental economics. The problem is a Stackelberg game with multiple objectives at the upper level, and a single objective at the lower level. The leader in this case is the regulating authority, and it tries to maximize its total tax revenue over multiple periods while trying to minimize the environmental damages caused by a mining company. The follower is the mining company whose sole objective is to maximize its total profit over multiple periods under the limitations set by the leader. The solution to the model contains the optimal taxation and extraction decisions to be made by the players in each of the time periods. We construct a simplistic model for the Stackelberg game and provide an analytical solution to the problem. Thereafter, the model is extended to incorporate realism and is solved using a bilevel evolutionary algorithm capable of handling multiple objectives.
Submitted 23 July, 2013;
originally announced July 2013.
-
Finding Optimal Strategies in a Multi-Period Multi-Leader-Follower Stackelberg Game Using an Evolutionary Algorithm
Authors:
Ankur Sinha,
Pekka Malo,
Anton Frantsev,
Kalyanmoy Deb
Abstract:
Stackelberg games are a classic example of bilevel optimization problems, which are often encountered in game theory and economics. These are complex problems with a hierarchical structure, where one optimization task is nested within the other. Despite a number of studies on handling bilevel optimization problems, these problems still remain a challenging territory, and existing methodologies are able to handle only simple problems with few variables under assumptions of continuity and differentiability. In this paper, we consider a special case of a multi-period multi-leader-follower Stackelberg competition model with non-linear cost and demand functions and discrete production variables. The model has potential applications, for instance in the aircraft manufacturing industry, which is an oligopoly where a few giant firms enjoy a tremendous commitment power over the other smaller players. We solve cases with different numbers of leaders and followers, and show how the entrance or exit of a player affects the profits of the other players. In the presence of various model complexities, we use a computationally intensive nested evolutionary strategy to find an optimal solution for the model. The strategy is evaluated on a test-suite of bilevel problems, and it has been shown that the method is successful in handling difficult bilevel problems.
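The nested structure described in this abstract, where the follower's problem is solved for every decision the leader considers, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy objectives, the function names, and above all the exhaustive grid search, which stands in for the evolutionary strategy used in the paper and is not the authors' method.

```python
# Toy nested search for a bilevel (leader-follower) problem.
# Assumption: grid search replaces the paper's evolutionary strategy.

def lower_level(x, grid):
    """Follower's best response to the leader's choice x (hypothetical objective)."""
    return min(grid, key=lambda y: (y - x) ** 2)

def upper_objective(x, y):
    """Leader's objective, evaluated at the follower's response (hypothetical)."""
    return (x - 1.0) ** 2 + (y - 1.0) ** 2

def nested_search(grid):
    """For each leader decision, solve the follower's problem, then keep the best pair."""
    best = None
    for x in grid:
        y = lower_level(x, grid)          # inner optimization
        value = upper_objective(x, y)     # outer evaluation
        if best is None or value < best[2]:
            best = (x, y, value)
    return best

grid = [i / 10 for i in range(-20, 21)]   # candidate decisions in [-2, 2]
x_star, y_star, f_star = nested_search(grid)
print(x_star, y_star, f_star)
```

The cost of this scheme grows with the product of the two search spaces, which is why nested approaches are described as computationally intensive.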
Submitted 23 July, 2013;
originally announced July 2013.
-
Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts
Authors:
Pekka Malo,
Ankur Sinha,
Pyry Takala,
Pekka Korhonen,
Jyrki Wallenius
Abstract:
The use of robo-readers to analyze news texts is an emerging technology trend in computational finance. In recent research, a substantial effort has been invested to develop sophisticated financial polarity-lexicons that can be used to investigate how financial sentiments relate to future company performance. However, based on experience from other fields, where sentiment analysis is commonly applied, it is well-known that the overall semantic orientation of a sentence may differ from the prior polarity of individual words. The objective of this article is to investigate how semantic orientations can be better detected in financial and economic news by accommodating the overall phrase-structure information and domain-specific use of language. Our three main contributions are: (1) establishment of a human-annotated finance phrase-bank, which can be used as a benchmark for training and evaluating alternative models; (2) presentation of a technique to enhance financial lexicons with attributes that help to identify the expected direction of events that affect overall sentiment; (3) development of a linearized phrase-structure model for detecting contextual semantic orientations in financial and economic news texts. The relevance of the newly added lexicon features and the benefit of using the proposed learning-algorithm are demonstrated in a comparative study against previously used general sentiment models as well as the popular word frequency models used in recent financial studies. The proposed framework is parsimonious and avoids the explosion in feature-space caused by the use of conventional n-gram features.
Submitted 23 July, 2013; v1 submitted 19 July, 2013;
originally announced July 2013.
-
Efficient Evolutionary Algorithm for Single-Objective Bilevel Optimization
Authors:
Ankur Sinha,
Pekka Malo,
Kalyanmoy Deb
Abstract:
Bilevel optimization problems are a class of challenging optimization problems, which contain two levels of optimization tasks. In these problems, the optimal solutions to the lower level problem become possible feasible candidates to the upper level problem. Such a requirement makes the optimization problem difficult to solve, and has kept researchers busy devising methodologies that can efficiently handle the problem. Despite the efforts, there hardly exists any effective methodology that is capable of handling a complex bilevel problem. In this paper, we introduce a bilevel evolutionary algorithm based on quadratic approximations (BLEAQ) of optimal lower level variables with respect to the upper level variables. The approach is capable of handling bilevel problems with different kinds of complexities in a relatively small number of function evaluations. Ideas from classical optimization have been hybridized with evolutionary methods to generate an efficient optimization algorithm for generic bilevel problems. The efficacy of the algorithm has been shown on two sets of test problems. The first set is a recently proposed SMD test set, which contains problems with controllable complexities, and the second set contains standard test problems collected from the literature. The proposed method has been evaluated against two benchmarks, and the performance gain is observed to be significant.
Submitted 6 October, 2013; v1 submitted 15 March, 2013;
originally announced March 2013.
-
A Multi-objective Exploratory Procedure for Regression Model Selection
Authors:
Ankur Sinha,
Pekka Malo,
Timo Kuosmanen
Abstract:
Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by over-abundance of explanatory variables, non-linearities and unknown interdependencies between the regressors. An added difficulty is that the analysts may have little or no prior knowledge on the relative importance of the variables. To provide a robust method for model selection, this paper introduces the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS) that provides the user with an optimal set of regression models for a given data-set. The algorithm considers the regression problem as a two-objective task, and explores the Pareto-optimal (best subset) models by preferring those models that have fewer regression coefficients and better goodness of fit. The model exploration can be performed based on in-sample or generalization error minimization. The model selection is proposed to be performed in two steps. First, we generate the frontier of Pareto-optimal regression models by eliminating the dominated models without any user intervention. Second, a decision making process is executed which allows the user to choose the most preferred model using visualisations and simple metrics. The method has been evaluated on a recently published real dataset on Communities and Crime within the United States.
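The first step described in this abstract, eliminating dominated models to obtain the Pareto-optimal frontier, can be sketched as follows. This is a minimal illustration and not the MOGA-VS algorithm itself: the candidate models, their labels, and their error values are invented, and the genetic search that would generate candidates is omitted. Each model is scored on two objectives to be minimized, the number of regression coefficients and an error measure.

```python
from typing import List, Tuple

# (label, number of coefficients, error); all values are hypothetical.
Model = Tuple[str, int, float]

def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

def pareto_front(models: List[Model]) -> List[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]

candidates = [
    ("A", 1, 0.90),
    ("B", 2, 0.55),
    ("C", 3, 0.60),  # dominated by B: more coefficients and higher error
    ("D", 4, 0.40),
    ("E", 5, 0.40),  # dominated by D: same error with more coefficients
]

front = pareto_front(candidates)
print([m[0] for m in front])  # ['A', 'B', 'D']
```

The surviving models are the ones a user would then choose among in the second, decision-making step.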
Submitted 13 July, 2016; v1 submitted 28 March, 2012;
originally announced March 2012.
-
Semantic Content Filtering with Wikipedia and Ontologies
Authors:
Pekka Malo,
Pyry Siitari,
Oskar Ahlgren,
Jyrki Wallenius,
Pekka Korhonen
Abstract:
The use of domain knowledge is generally found to improve query efficiency in content filtering applications. In particular, tangible benefits have been achieved when using knowledge-based approaches within more specialized fields, such as medical free texts or legal documents. However, the problem is that sources of domain knowledge are time-consuming to build and equally costly to maintain. As a potential remedy, recent studies on Wikipedia suggest that this large body of socially constructed knowledge can be effectively harnessed to provide not only facts but also accurate information about semantic concept-similarities. This paper describes a framework for document filtering, where Wikipedia's concept-relatedness information is combined with a domain ontology to produce semantic content classifiers. The approach is evaluated using the Reuters RCV1 corpus and TREC-11 filtering task definitions. In a comparative study, the approach shows robust performance and appears to outperform content classifiers based on Support Vector Machines (SVM) and the C4.5 algorithm.
Submitted 3 December, 2010;
originally announced December 2010.
-
Automated Query Learning with Wikipedia and Genetic Programming
Authors:
Pekka Malo,
Pyry Siitari,
Ankur Sinha
Abstract:
Most of the existing information retrieval systems are based on the bag-of-words model and are not equipped with common world knowledge. Work has been done towards improving the efficiency of such systems by using intelligent algorithms to generate search queries; however, not much research has been done in the direction of incorporating human-and-society level knowledge in the queries. This paper is one of the first attempts where such information is incorporated into the search queries using Wikipedia semantics. The paper presents an essential shift from conventional token-based queries to concept-based queries, leading to an enhanced efficiency of information retrieval systems. To efficiently handle the automated query learning problem, we propose the Wikipedia-based Evolutionary Semantics (Wiki-ES) framework, where concept-based queries are learnt using a co-evolving evolutionary procedure. Learning concept-based queries using an intelligent evolutionary procedure yields a significant improvement in performance, which is shown through an extensive study using Reuters newswire documents. The proposed framework is compared with other information retrieval systems. The concept-based approach has also been implemented on other information retrieval systems to justify the effectiveness of a transition from token-based queries to concept-based queries.
Submitted 3 December, 2010;
originally announced December 2010.