-
Breast Cancer Classification Using Gradient Boosting Algorithms Focusing on Reducing the False Negative and SHAP for Explainability
Authors:
João Manoel Herrera Pinheiro,
Marcelo Becker
Abstract:
Cancer is one of the diseases that kill the most women worldwide, and breast cancer accounts for the highest number of cancer cases and, consequently, deaths. However, many of these deaths can be prevented by early detection and, consequently, early treatment. Any advance in the detection or prediction of this kind of cancer is therefore important for better health. Many studies focus on models with high accuracy in cancer prediction, but accuracy alone may not always be a reliable metric. This study takes an investigative approach to the performance of different boosting-based machine learning algorithms for predicting breast cancer, focusing on the recall metric. Boosting algorithms have proven to be effective tools for detecting medical diseases. The dataset from the University of California, Irvine (UCI) repository, which contains the relevant attributes, was used to train and test the classifiers. The main objective of this study is to use state-of-the-art boosting algorithms such as AdaBoost, XGBoost, CatBoost and LightGBM to predict and diagnose breast cancer, and to find the most effective model with respect to recall, ROC-AUC, and the confusion matrix. Furthermore, our study is the first to use these four boosting algorithms with Optuna, a library for hyperparameter optimization, and the SHAP method to improve the interpretability of our model, which can serve as support for identifying and predicting breast cancer. We were able to improve the AUC or recall for all the models and to reduce the false negatives for AdaBoost and LightGBM; the final AUC exceeded 99.41% for all models.
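The point that accuracy alone can mislead is easiest to see on a confusion matrix. The following is a minimal Python sketch with invented counts for two hypothetical classifiers; these are not results from this study. The model with the higher accuracy misses half of the malignant cases, and recall exposes this directly.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # malignant cases predicted malignant
    fn: int  # malignant cases predicted benign (missed cancers)
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def recall(self) -> float:
        # Share of actual malignant cases that the model detects.
        return self.tp / (self.tp + self.fn)


# Invented counts: 100 cases, 10 of them malignant.
cautious = ConfusionMatrix(tp=9, fn=1, fp=6, tn=84)
lenient = ConfusionMatrix(tp=5, fn=5, fp=0, tn=90)

for name, cm in [("cautious", cautious), ("lenient", lenient)]:
    print(f"{name}: accuracy={cm.accuracy():.2f} recall={cm.recall():.2f} FN={cm.fn}")
```

The two models compare as follows:

- The lenient model has the higher accuracy (0.95 vs 0.93).
- It has far lower recall (0.50 vs 0.90).
- It produces five false negatives instead of one.

This is why a recall-focused evaluation can rank the models differently.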
Submitted 14 March, 2024;
originally announced March 2024.
-
PalmProbNet: A Probabilistic Approach to Understanding Palm Distributions in Ecuadorian Tropical Forest via Transfer Learning
Authors:
Kangning Cui,
Zishan Shao,
Gregory Larsen,
Victor Pauca,
Sarra Alqahtani,
David Segurado,
João Pinheiro,
Manqi Wang,
David Lutz,
Robert Plemmons,
Miles Silman
Abstract:
Palms play an outsized role in tropical forests and are important resources for humans and wildlife. A central question in tropical ecosystems is understanding palm distribution and abundance. However, accurately identifying and localizing palms in geospatial imagery presents significant challenges due to dense vegetation, overlapping canopies, and variable lighting conditions in mixed-forest landscapes. To address this, we introduce PalmProbNet, a probabilistic approach that uses transfer learning to analyze high-resolution UAV-derived orthomosaic imagery, enabling the detection of palm trees within the dense canopy of the Ecuadorian rainforest. This approach represents a substantial advancement in automated palm detection, effectively pinpointing palm presence and locality in mixed tropical rainforests. Our process begins by generating an orthomosaic image from UAV images, from which we extract and label palm and non-palm image patches in two distinct sizes. These patches are then used to train models with an identical architecture, consisting of an unaltered pre-trained ResNet-18 and a Multilayer Perceptron (MLP) with specifically trained parameters. Subsequently, PalmProbNet applies a sliding-window technique to the landscape orthomosaic, using both small and large window sizes to generate a probability heatmap. This heatmap effectively visualizes the distribution of palms, showcasing the scalability and adaptability of our approach across various forest densities. Despite the challenging terrain, our method achieved an accuracy of 97.32% and a Cohen's kappa of 94.59% in testing.
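The sliding-window step can be sketched as follows. This is a minimal sketch under stated assumptions, not the PalmProbNet implementation. The assumptions are:

- `toy_scorer` stands in for the ResNet-18 + MLP patch classifier.
- The stride is a hypothetical choice.
- Overlapping windows are combined by per-pixel averaging, which is also hypothetical.
- The small- and large-window maps are fused by a plain mean (`combine`), again hypothetical.

```python
from typing import Callable, List

Grid = List[List[float]]
# Hypothetical patch scorer: (top row, left column, window size) -> P(palm in patch).
Scorer = Callable[[int, int, int], float]


def sliding_window_heatmap(height: int, width: int, size: int, stride: int,
                           score: Scorer) -> Grid:
    """Per-pixel mean of the probabilities of all windows covering that pixel."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            p = score(top, left, size)
            for r in range(top, top + size):
                for c in range(left, left + size):
                    total[r][c] += p
                    count[r][c] += 1
    # Pixels that no window covers are left at probability 0.
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(width)] for r in range(height)]


def combine(small: Grid, large: Grid) -> Grid:
    """Hypothetical fusion of the two window sizes: a per-pixel mean."""
    return [[(a + b) / 2 for a, b in zip(row_s, row_l)]
            for row_s, row_l in zip(small, large)]


def toy_scorer(top: int, left: int, size: int) -> float:
    # Stand-in for the trained classifier: "palms" near the top-left corner.
    return 1.0 if top < 4 and left < 4 else 0.0


small = sliding_window_heatmap(8, 8, size=2, stride=2, score=toy_scorer)
large = sliding_window_heatmap(8, 8, size=4, stride=2, score=toy_scorer)
heatmap = combine(small, large)
print(heatmap[0][0], heatmap[7][7])
```

Using two window sizes lets the fused map keep the fine localization of small patches and the wider context of large ones. The actual weighting in PalmProbNet may differ.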
Submitted 5 March, 2024;
originally announced March 2024.
-
ChartText: Linking Text with Charts in Documents
Authors:
Joao Pinheiro,
Jorge Poco
Abstract:
Recent works show that interactive documents connecting text with visualizations facilitate reading comprehension. However, creating this type of content requires specialized knowledge. In this work, we present ChartText, a method that links text with visualizations. Our approach supports documents that include bar charts, line charts, and scatter plots. ChartText receives the visual encoding of the visualization and its associated text as input. It then performs the linking in two stages. The matching stage creates individual links relating simple phrases in the text to the chart. Then, in the grouping stage, it combines the individual links according to the visual channels, building more meaningful connections. We use two datasets to design and evaluate our method. The first comes from web documents (24 bar charts and texts), and the second from academic documents (25 bar charts, 25 line charts, and 25 scatter plots with their texts). Our experiments show that our method obtains F1 scores of 0.50 and 0.66 on the two datasets, respectively. We can also use a semi-automatic approach that corrects individual links; in this case, the scores rise to 0.68 and 0.84, respectively. To show the usefulness of our technique, we implement two proofs of concept. In the first, we create interactive documents using graphic overlays, facilitating the reading experience. In the second, we use voice instead of text to annotate charts in real time. For example, in a videoconference, our technique can automatically annotate a chart following the presenter's description.
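The reported F1 scores compare predicted text-chart links with a reference set. The following is a minimal sketch that assumes links are scored as exact (phrase, chart element) pairs. The `Link` encoding and the example data are hypothetical, and the paper's actual matching criteria may differ.

```python
from typing import Set, Tuple

# Hypothetical link encoding: (text phrase, chart element identifier).
Link = Tuple[str, str]


def link_f1(predicted: Set[Link], gold: Set[Link]) -> float:
    """F1 score of predicted text-chart links against a gold-standard set."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {("highest sales", "bar:2019"), ("lowest sales", "bar:2016"),
        ("a sharp drop", "bar:2020")}
predicted = {("highest sales", "bar:2019"), ("lowest sales", "bar:2017")}
print(round(link_f1(predicted, gold), 2))
```

Here one of two predicted links is correct (precision 1/2) and one of three gold links is found (recall 1/3), so F1 is 0.4. Correcting individual links, as in the semi-automatic mode, would raise both terms.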
Submitted 13 January, 2022;
originally announced January 2022.
-
Towards QoS-Aware Recommendations
Authors:
Pavlos Sermpezis,
Savvas Kastanakis,
João Ismael Pinheiro,
Felipe Assis,
Mateus Nogueira,
Daniel Menasché,
Thrasyvoulos Spyropoulos
Abstract:
In this paper we propose that recommendation systems (RSs) for multimedia services should be "QoS-aware", i.e., take into account the expected QoS with which content can be delivered, in order to increase user satisfaction. Network-aware recommendations have very recently been proposed as a promising solution to improve network performance. However, the idea of QoS-aware RSs has so far been studied from the network perspective. Its feasibility and performance advantages from the content-provider or user perspective have only been speculated upon. Hence, in this paper we aim to provide initial answers on the feasibility of the concept of QoS-aware RSs by investigating its impact on real user experience. To this end, we conduct experiments with real users on a testbed and present initial experimental results. Our analysis demonstrates the potential of the idea: QoS-aware RSs could be beneficial for both users (better experience) and content providers (higher user engagement). Moreover, based on the collected dataset, we build statistical models to (i) predict the user experience as a function of QoS, relevance of recommendations (QoR) and user interest, and (ii) provide useful insights for the design of QoS-aware RSs. We believe that our study is an important first step towards QoS-aware recommendations, providing experimental evidence of their feasibility and benefits, and can help open a future research direction.
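One way to make a recommender "QoS-aware" is to re-rank candidates using both relevance and expected delivery quality. The sketch below is hypothetical throughout: the linear scoring rule, the weight `alpha`, and the [0, 1] scales are assumptions for illustration. They are not the statistical models built in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    item_id: str
    relevance: float     # relevance of the recommendation (QoR), assumed in [0, 1]
    expected_qos: float  # predicted delivery quality, assumed in [0, 1]


def qos_aware_rank(candidates: List[Candidate], alpha: float = 0.7) -> List[Candidate]:
    """Rank by a convex mix of relevance and expected QoS (hypothetical rule).

    alpha = 1.0 recovers a purely relevance-based, QoS-unaware ranking.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def score(c: Candidate) -> float:
        return alpha * c.relevance + (1 - alpha) * c.expected_qos

    return sorted(candidates, key=score, reverse=True)


items = [
    Candidate("video-a", relevance=0.90, expected_qos=0.20),  # relevant but likely to stall
    Candidate("video-b", relevance=0.80, expected_qos=0.95),
    Candidate("video-c", relevance=0.40, expected_qos=1.00),
]
print([c.item_id for c in qos_aware_rank(items, alpha=1.0)])
print([c.item_id for c in qos_aware_rank(items, alpha=0.7)])
```

With `alpha=1.0` the most relevant item comes first. With `alpha=0.7` a slightly less relevant item that can be delivered smoothly moves to the top.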
Submitted 1 October, 2020; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Semi-BCI Algebras
Authors:
Regivan H. N. Santiago,
Benjamin Bedregal,
João Marcos,
Carlos Caleiro,
Jocivania Pinheiro
Abstract:
The notion of semi-BCI algebras is introduced and some of their properties are investigated. This algebra is another generalization of BCI-algebras. It arises from the "intervalization" of BCI-algebras. Semi-BCI algebras have a structure similar to that of pseudo-BCI algebras; however, they are not the same. In this paper we also investigate the similarity between these classes of algebras by showing how they relate to the process of intervalization.
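For reference, the standard (Iséki) BCI-algebra axioms on a structure (X, *, 0) are:

- ((x*y)*(x*z))*(z*y) = 0
- (x*(x*y))*y = 0
- x*x = 0
- x*y = 0 and y*x = 0 imply x = y

The sketch below brute-forces these axioms on a finite carrier set. It does not encode the semi-BCI or pseudo-BCI definitions from the paper, and the Z_n examples are only illustrative.

```python
from itertools import product
from typing import Callable, Iterable


def is_bci_algebra(elements: Iterable[int], star: Callable[[int, int], int],
                   zero: int) -> bool:
    """Brute-force check of the BCI-algebra axioms on a finite carrier set."""
    xs = list(elements)
    for x, y, z in product(xs, repeat=3):
        # (BCI-1) ((x*y)*(x*z))*(z*y) = 0
        if star(star(star(x, y), star(x, z)), star(z, y)) != zero:
            return False
    for x, y in product(xs, repeat=2):
        # (BCI-2) (x*(x*y))*y = 0
        if star(star(x, star(x, y)), y) != zero:
            return False
        # (BCI-4) x*y = 0 and y*x = 0 imply x = y
        if star(x, y) == zero and star(y, x) == zero and x != y:
            return False
    # (BCI-3) x*x = 0
    return all(star(x, x) == zero for x in xs)


n = 5
# Z_n with x*y = x - y (mod n) satisfies all four axioms.
print(is_bci_algebra(range(n), lambda a, b: (a - b) % n, 0))
# Z_n with x*y = x + y (mod n) does not (for example, x*x = 2x is not always 0).
print(is_bci_algebra(range(n), lambda a, b: (a + b) % n, 0))
```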
Submitted 13 March, 2018;
originally announced March 2018.
-
Canonical form of linear subspaces and coding invariants: the poset metric point of view
Authors:
Jerry Anderson Pinheiro,
Marcelo Firer
Abstract:
In this work we introduce the concept of a subspace decomposition, subject to a partition of the coordinates. Considering metrics determined by partial orders on the set of coordinates, the so-called poset metrics, we show the existence of maximal decompositions according to the metric. These decompositions turn out to be an important tool for obtaining the canonical form of codes over any poset metric and for obtaining bounds on important invariants such as the packing radius of a linear subspace. Furthermore, using maximal decompositions, we are able to reduce and optimize the full lookup table algorithm for the syndrome decoding process.
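Poset metrics weigh a vector by the size of the smallest order ideal containing its support: w_P(x) = |<supp(x)>| and d_P(x, y) = w_P(x - y). A minimal Python sketch of that weight follows. The dictionary encoding of the poset (`below`) is a hypothetical choice, and the subspace decompositions introduced in the paper are not implemented here.

```python
from typing import Dict, Sequence, Set


def ideal_generated(support: Set[int], below: Dict[int, Set[int]]) -> Set[int]:
    """Smallest order ideal (down-set) of the poset containing `support`.

    `below[i]` lists elements directly below coordinate i; transitivity is
    handled by the traversal, so only covering relations need to be given.
    """
    ideal: Set[int] = set()
    stack = list(support)
    while stack:
        i = stack.pop()
        if i not in ideal:
            ideal.add(i)
            stack.extend(below.get(i, set()))
    return ideal


def poset_weight(x: Sequence[int], below: Dict[int, Set[int]]) -> int:
    support = {i for i, xi in enumerate(x) if xi != 0}
    return len(ideal_generated(support, below))


def poset_distance(x: Sequence[int], y: Sequence[int],
                   below: Dict[int, Set[int]], q: int = 2) -> int:
    """d_P(x, y) = w_P(x - y) over the prime field GF(q)."""
    return poset_weight([(a - b) % q for a, b in zip(x, y)], below)


# Chain poset 0 < 1 < 2 on three coordinates.
chain = {1: {0}, 2: {1}}
print(poset_weight([0, 0, 1], chain))  # coordinate 2 pulls in 1 and 0
print(poset_weight([1, 0, 0], chain))
# With no relations (an antichain), the poset weight is the Hamming weight.
print(poset_weight([1, 0, 1], {}))
```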
Submitted 29 June, 2017;
originally announced June 2017.
-
Combinatorial metrics: MacWilliams-type identities, isometries and extension property
Authors:
Jerry Anderson Pinheiro,
Roberto Assis Machado,
Marcelo Firer
Abstract:
In this work we characterize the combinatorial metrics admitting a MacWilliams-type identity and describe the group of linear isometries of such metrics. Considering coverings that are not connected, we classify the metrics satisfying the MacWilliams extension property.
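In a combinatorial metric, a covering F of the coordinate set is fixed. The weight of x is the least number of members of F whose union contains supp(x). The brute-force sketch below is suitable only for small coverings, and the burst-style covering in the example is an illustrative assumption.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def combinatorial_weight(x: Sequence[int], covering: List[FrozenSet[int]]) -> int:
    """Least number of covering members whose union contains supp(x)."""
    support = {i for i, xi in enumerate(x) if xi != 0}
    if not support:
        return 0
    for k in range(1, len(covering) + 1):
        for chosen in combinations(covering, k):
            if support <= set().union(*chosen):
                return k
    raise ValueError("the covering does not cover the support of x")


# Coordinates {0, ..., 3} covered by overlapping bursts of length 2.
bursts = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]
print(combinatorial_weight([1, 1, 0, 0], bursts))
print(combinatorial_weight([1, 0, 0, 1], bursts))
# A covering by singletons recovers the Hamming weight.
singletons = [frozenset({i}) for i in range(4)]
print(combinatorial_weight([1, 0, 1, 1], singletons))
```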
Submitted 23 March, 2017;
originally announced March 2017.
-
Characterization of metrics induced by hierarchical posets
Authors:
Roberto Assis Machado,
Jerry Anderson Pinheiro,
Marcelo Firer
Abstract:
In this paper we consider metrics determined by hierarchical posets and give explicit formulae for the main parameters of a linear code: the minimum distance and the packing, covering and Chebyshev radii of a code. We also present ten characterizations of hierarchical poset metrics, including new characterizations and simple new proofs of the known ones.
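For a hierarchical poset, the weight has a closed form. The ideal generated by supp(x) contains every coordinate on the levels below the highest level that x touches, plus the support elements on that level. The sketch below uses this to compute the minimum distance of a small binary linear code by enumeration. The level encoding and the example code are hypothetical, and none of the paper's ten characterizations is reproduced.

```python
from itertools import product
from typing import List, Optional, Sequence


def hierarchical_weight(x: Sequence[int], levels: List[List[int]]) -> int:
    """Poset weight for a hierarchical poset given as levels, bottom level first.

    Every coordinate on level i lies below every coordinate on level j > i.
    """
    support = {i for i, xi in enumerate(x) if xi != 0}
    for h in range(len(levels) - 1, -1, -1):
        top = support & set(levels[h])
        if top:
            return sum(len(level) for level in levels[:h]) + len(top)
    return 0


def minimum_distance(generator: List[List[int]],
                     levels: List[List[int]]) -> Optional[int]:
    """Minimum nonzero weight of the binary linear code spanned by `generator`."""
    n = len(generator[0])
    best: Optional[int] = None
    for coeffs in product([0, 1], repeat=len(generator)):
        word = [sum(c * row[j] for c, row in zip(coeffs, generator)) % 2
                for j in range(n)]
        w = hierarchical_weight(word, levels)
        if w > 0 and (best is None or w < best):
            best = w
    return best


G = [[1, 1, 0, 0], [0, 0, 1, 0]]
two_levels = [[0, 1], [2, 3]]  # coordinates 0, 1 lie below coordinates 2, 3
print(minimum_distance(G, two_levels))
# A single level is an antichain, so this gives the Hamming minimum distance.
print(minimum_distance(G, [[0, 1, 2, 3]]))
```

The same code has Hamming minimum distance 1 but minimum distance 2 under the two-level poset. The codeword 0010 sits on the upper level and therefore inherits the weight of both lower coordinates.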
Submitted 23 March, 2017; v1 submitted 4 August, 2015;
originally announced August 2015.
-
Coding and Decoding Schemes for MSE and Image Transmission
Authors:
Marcelo Firer,
Luciano Panek,
Jerry Anderson Pinheiro
Abstract:
In this work we explore possibilities for coding and decoding tailor-made for mean squared error evaluation in contexts such as image transmission. To do so, we introduce a loss function that expresses the overall performance of a coding and decoding scheme for discrete channels. This loss function replaces the usual goal of minimizing the error probability with that of minimizing the expected loss. In this setting we explore the use of ordered decoders to create message-wise unequal error protection (UEP), where the most valuable information is protected by placing in its proximity information words that differ from it by a small-valued error. We give explicit examples using grayscale images, including small-scale performance analysis and visual simulations for the BSMC.
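Replacing error probability with expected loss can be illustrated by computing E[(v - v_hat)^2] for gray levels sent over a binary symmetric channel. This minimal sketch assumes the following:

- Messages are uniformly distributed.
- Gray levels use an uncoded 2-bit natural binary mapping.
- The channel has crossover probability p.

It only evaluates a given encoder/decoder pair and does not implement the ordered decoders studied in the paper.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Word = Tuple[int, ...]


def expected_squared_error(encoder: Dict[int, Word],
                           decoder: Callable[[Word], int], p: float) -> float:
    """E[(v - v_hat)^2] for uniform messages over a binary symmetric channel
    with crossover probability p, by exhaustive enumeration."""
    n = len(next(iter(encoder.values())))
    total = 0.0
    for value, codeword in encoder.items():
        for received in product([0, 1], repeat=n):
            flips = sum(a != b for a, b in zip(codeword, received))
            prob = (p ** flips) * ((1 - p) ** (n - flips))
            total += prob * (value - decoder(received)) ** 2
    return total / len(encoder)


# Uncoded 2-bit gray levels 0..3, most significant bit first.
natural = {v: ((v >> 1) & 1, v & 1) for v in range(4)}


def decode_natural(word: Word) -> int:
    return 2 * word[0] + word[1]


print(round(expected_squared_error(natural, decode_natural, 0.1), 4))
```

Two words can have the same error probability yet very different losses. A flipped most significant bit costs 4 in squared error, while a flipped least significant bit costs 1. This is the asymmetry that message-wise unequal error protection targets.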
Submitted 4 November, 2014;
originally announced November 2014.
-
Bounds for complexity of syndrome decoding for poset metrics
Authors:
Marcelo Firer,
Jerry Anderson Pinheiro
Abstract:
In this work we show how to decompose a linear code relative to any given poset metric. We prove that the complexity of syndrome decoding is determined by a maximal (primary) such decomposition. We then show that a refinement of a partial order leads to a refinement of the primary decomposition. Using this and already known results about hierarchical posets, we establish upper and lower bounds for the complexity of syndrome decoding relative to a poset metric.
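The full lookup-table syndrome decoder, whose complexity is bounded here, stores one minimum-weight coset leader per syndrome. The sketch below builds that table for the binary case by enumerating all error patterns. The weight function is pluggable, so a poset weight can replace the Hamming weight. The decomposition-based reduction from the paper is not implemented.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Tuple[int, ...]


def syndrome(H: List[List[int]], word: Sequence[int]) -> Vec:
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)


def build_table(H: List[List[int]], weight: Callable[[Vec], int]) -> Dict[Vec, Vec]:
    """Full lookup table: a minimum-weight coset leader for every syndrome.

    Enumerates all 2^n error patterns, so it is only usable for tiny n;
    ties are broken by enumeration order.
    """
    n = len(H[0])
    table: Dict[Vec, Vec] = {}
    for e in product([0, 1], repeat=n):
        s = syndrome(H, e)
        if s not in table or weight(e) < weight(table[s]):
            table[s] = e
    return table


def decode(received: Sequence[int], H: List[List[int]], table: Dict[Vec, Vec]) -> Vec:
    leader = table[syndrome(H, received)]
    return tuple((r + e) % 2 for r, e in zip(received, leader))


def hamming_weight(v: Vec) -> int:
    return sum(v)


# Binary [3, 1] repetition code with parity checks x0 + x1 = 0 and x1 + x2 = 0.
H = [[1, 1, 0], [0, 1, 1]]
table = build_table(H, hamming_weight)
print(decode((1, 0, 1), H, table))
```

The table has one entry per syndrome (2^(n-k) of them). Its size and the cost of building it are what a decomposition of the code can reduce.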
Submitted 17 February, 2015; v1 submitted 3 November, 2014;
originally announced November 2014.
-
Classification of poset-block spaces admitting MacWilliams-type identity
Authors:
Jerry Anderson Pinheiro,
Marcelo Firer
Abstract:
In this work we prove that a poset-block space admits a MacWilliams-type identity if and only if the poset is hierarchical and, at any level of the poset, all the blocks have the same dimension. When the poset-block space admits the MacWilliams-type identity, we make explicit the relation between the weight enumerators of a code and its dual.
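In the classical Hamming case, the MacWilliams identity can be written with binary Krawtchouk polynomials as B_k = (1/|C|) * sum over i of A_i K_k(i). Here A and B are the weight distributions of C and its dual. The sketch below checks this numerically for small binary codes. The poset-block generalization characterized in this work is not reproduced.

```python
from itertools import product
from math import comb
from typing import List, Tuple

Word = Tuple[int, ...]


def span(generator: List[List[int]]) -> List[Word]:
    """All codewords of the binary linear code spanned by `generator`."""
    n = len(generator[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(generator)):
        words.add(tuple(sum(c * row[j] for c, row in zip(coeffs, generator)) % 2
                        for j in range(n)))
    return sorted(words)


def dual(code: List[Word], n: int) -> List[Word]:
    return [v for v in product([0, 1], repeat=n)
            if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in code)]


def weight_distribution(code: List[Word], n: int) -> List[int]:
    A = [0] * (n + 1)
    for c in code:
        A[sum(c)] += 1
    return A


def krawtchouk(k: int, i: int, n: int) -> int:
    """Binary Krawtchouk polynomial K_k(i) = sum_j (-1)^j C(i, j) C(n - i, k - j)."""
    return sum((-1) ** j * comb(i, j) * comb(n - i, k - j) for j in range(k + 1))


def macwilliams_transform(A: List[int], code_size: int) -> List[int]:
    """Dual weight distribution predicted by the MacWilliams identity."""
    n = len(A) - 1
    return [sum(A[i] * krawtchouk(k, i, n) for i in range(n + 1)) // code_size
            for k in range(n + 1)]


C = span([[1, 1, 1]])  # [3, 1] repetition code
print(macwilliams_transform(weight_distribution(C, 3), len(C)))
print(weight_distribution(dual(C, 3), 3))
```

Both lines should agree. The even-weight dual of the repetition code has one word of weight 0 and three of weight 2.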
Submitted 28 February, 2012;
originally announced February 2012.