-
PaLM 2 Technical Report
Authors:
Rohan Anil,
Andrew M. Dai,
Orhan Firat,
Melvin Johnson,
Dmitry Lepikhin,
Alexandre Passos,
Siamak Shakeri,
Emanuel Taropa,
Paige Bailey,
Zhifeng Chen,
Eric Chu,
Jonathan H. Clark,
Laurent El Shafey,
Yanping Huang,
Kathy Meier-Hellstern,
Gaurav Mishra,
Erica Moreira,
Mark Omernick,
Kevin Robinson,
Sebastian Ruder,
Yi Tay,
Kefan Xiao,
Yuanzhong Xu,
Yujing Zhang,
Gustavo Hernandez Abrego, et al. (103 additional authors not shown)
Abstract:
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
Submitted 13 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Human-Centered Tools for Coping with Imperfect Algorithms during Medical Decision-Making
Authors:
Carrie J. Cai,
Emily Reif,
Narayan Hegde,
Jason Hipp,
Been Kim,
Daniel Smilkov,
Martin Wattenberg,
Fernanda Viegas,
Greg S. Corrado,
Martin C. Stumpe,
Michael Terry
Abstract:
Machine learning (ML) is increasingly being used in image retrieval systems for medical decision making. One application of ML is to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on-the-fly, communicating what types of similarity are most important at different moments in time. In two evaluations with pathologists, we found that these refinement tools increased the diagnostic utility of images found and increased user trust in the algorithm. The tools were preferred over a traditional interface, without a loss in diagnostic accuracy. We also observed that users adopted new strategies when using refinement tools, re-purposing them to test and understand the underlying algorithm and to disambiguate ML errors from their own errors. Taken together, these findings inform future human-ML collaborative systems for expert decision-making.
Submitted 8 February, 2019;
originally announced February 2019.
-
Similar Image Search for Histopathology: SMILY
Authors:
Narayan Hegde,
Jason D. Hipp,
Yun Liu,
Michael E. Buck,
Emily Reif,
Daniel Smilkov,
Michael Terry,
Carrie J. Cai,
Mahul B. Amin,
Craig H. Mermel,
Phil Q. Nelson,
Lily H. Peng,
Greg S. Corrado,
Martin C. Stumpe
Abstract:
The increasing availability of large institutional and public histopathology image datasets is enabling the searching of these datasets for diagnosis, research, and education. Though these datasets typically have associated metadata such as diagnosis or clinical notes, even carefully curated datasets rarely contain annotations of the location of regions of interest on each image. Because pathology images are extremely large (up to 100,000 pixels in each dimension), further laborious visual search of each image may be needed to find the feature of interest. In this paper, we introduce a deep learning based reverse image search tool for histopathology images: Similar Medical Images Like Yours (SMILY). We assessed SMILY's ability to retrieve search results in two ways: using pathologist-provided annotations, and via prospective studies where pathologists evaluated the quality of SMILY search results. As a negative control in the second evaluation, pathologists were blinded to whether search results were retrieved by SMILY or randomly. In both types of assessments, SMILY was able to retrieve search results with similar histologic features, organ site, and prostate cancer Gleason grade compared with the original query. SMILY may be a useful general-purpose tool in the pathologist's arsenal, to improve the efficiency of searching large archives of histopathology images, without the need to develop and implement specific tools for each application.
Submitted 5 February, 2019; v1 submitted 30 January, 2019;
originally announced January 2019.
-
TensorFlow.js: Machine Learning for the Web and Beyond
Authors:
Daniel Smilkov,
Nikhil Thorat,
Yannick Assogba,
Ann Yuan,
Nick Kreeger,
Ping Yu,
Kangyi Zhang,
Shanqing Cai,
Eric Nielsen,
David Soergel,
Stan Bileschi,
Michael Terry,
Charles Nicholson,
Sandeep N. Gupta,
Sarah Sirajuddin,
D. Sculley,
Rajat Monga,
Greg Corrado,
Fernanda B. Viégas,
Martin Wattenberg
Abstract:
TensorFlow.js is a library for building and executing machine learning algorithms in JavaScript. TensorFlow.js models run in a web browser and in the Node.js environment. The library is part of the TensorFlow ecosystem, providing a set of APIs that are compatible with those in Python, allowing models to be ported between the Python and JavaScript ecosystems. TensorFlow.js has empowered a new set of developers from the extensive JavaScript community to build and deploy machine learning models and enabled new classes of on-device computation. This paper describes the design, API, and implementation of TensorFlow.js, and highlights some of the impactful use cases.
Submitted 27 February, 2019; v1 submitted 16 January, 2019;
originally announced January 2019.
-
Direct-Manipulation Visualization of Deep Networks
Authors:
Daniel Smilkov,
Shan Carter,
D. Sculley,
Fernanda B. Viégas,
Martin Wattenberg
Abstract:
The recent successes of deep learning have led to a wave of interest from non-experts. Gaining an understanding of this technology, however, is difficult. While the theory is important, it is also helpful for novices to develop an intuitive feel for the effect of different hyperparameters and structural variations. We describe TensorFlow Playground, an interactive, open sourced visualization that allows users to experiment via direct manipulation rather than coding, enabling them to quickly build an intuition about neural nets.
Submitted 12 August, 2017;
originally announced August 2017.
-
SmoothGrad: removing noise by adding noise
Authors:
Daniel Smilkov,
Nikhil Thorat,
Been Kim,
Fernanda Viégas,
Martin Wattenberg
Abstract:
Explaining the output of a deep network remains a challenge. In the case of an image classifier, one type of explanation is to identify pixels that strongly influence the final decision. A starting point for this strategy is the gradient of the class score function with respect to the input image. This gradient can be interpreted as a sensitivity map, and there are several techniques that elaborate on this basic idea. This paper makes two contributions: it introduces SmoothGrad, a simple method that can help visually sharpen gradient-based sensitivity maps, and it discusses lessons in the visualization of these maps. We publish the code for our experiments and a website with our results.
Submitted 12 June, 2017;
originally announced June 2017.
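The SmoothGrad abstract above does not spell out the procedure. As the title suggests, the core idea is to average a gradient-based sensitivity map over several noisy copies of the input. The sketch below illustrates that averaging under stated assumptions: `smoothgrad`, `grad_fn`, `noise_std`, and `n_samples` are hypothetical names chosen here, and the toy score function with an analytic gradient stands in for a real image classifier.

```python
import numpy as np


def smoothgrad(grad_fn, x, noise_std=0.1, n_samples=50, seed=0):
    """Average grad_fn over Gaussian-perturbed copies of x.

    grad_fn is assumed to return the gradient of a class score
    with respect to its input (same shape as the input).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for _ in range(n_samples):
        # Perturb the input with zero-mean Gaussian noise.
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        total += grad_fn(noisy)
    # The smoothed sensitivity map is the mean of the sampled gradients.
    return total / n_samples


# Toy score s(x) = sum(w * x**2), whose gradient is 2 * w * x (assumption:
# a real use would obtain gradients from an autodiff framework instead).
w = np.array([1.0, -2.0, 0.5])


def toy_grad(x):
    return 2.0 * w * x


x0 = np.array([1.0, 1.0, 1.0])
smooth_map = smoothgrad(toy_grad, x0)
```

The noise level and sample count are tuning choices; the values used here are placeholders, not recommendations from the paper.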
-
Embedding Projector: Interactive Visualization and Interpretation of Embeddings
Authors:
Daniel Smilkov,
Nikhil Thorat,
Charles Nicholson,
Emily Reif,
Fernanda B. Viégas,
Martin Wattenberg
Abstract:
Embeddings are ubiquitous in machine learning, appearing in recommender systems, NLP, and many other applications. Researchers and developers often need to explore the properties of a specific embedding, and one way to analyze embeddings is to visualize them. We present the Embedding Projector, a tool for interactive visualization and interpretation of embeddings.
Submitted 16 November, 2016;
originally announced November 2016.
-
Beyond network structure: How heterogenous susceptibility modulates the spread of epidemics
Authors:
Daniel Smilkov,
Cesar A. Hidalgo,
Ljupco Kocarev
Abstract:
The compartmental models used to study epidemic spreading often assume the same susceptibility for all individuals, and are therefore agnostic about the effects that differences in susceptibility can have on epidemic spreading. Here we show that, for the SIS model, differential susceptibility can make networks more vulnerable to the spread of diseases when the correlation between a node's degree and susceptibility is positive, and less vulnerable when this correlation is negative. Moreover, we show that networks become more likely to contain a pocket of infection when individuals are more likely to connect with others that have similar susceptibility (the network is segregated). These results show that the failure to include differential susceptibility in epidemic models can lead to a systematic over- or underestimation of fundamental epidemic parameters when the structure of the networks is not independent from the susceptibility of the nodes or when there are correlations between the susceptibility of connected individuals.
Submitted 10 March, 2014;
originally announced March 2014.
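To make the role of per-node susceptibility in an SIS process concrete, here is a minimal sketch of a discrete-time, individual-based mean-field approximation. It is not the analysis from the paper above: the update rule is a standard textbook-style approximation, and the function name `sis_mean_field`, the parameters `beta` (per-contact infection probability) and `delta` (recovery probability), and the choice to model susceptibility as a per-node multiplier on `beta` are all assumptions made for illustration.

```python
import numpy as np


def sis_mean_field(adj, beta, delta, susceptibility=None, p0=0.1, steps=200):
    """Iterate infection probabilities p_i for an SIS process on a graph.

    adj[i, j] = 1 if j can infect i. Node i's effective per-contact
    infection probability is beta * susceptibility[i] (assumed <= 1).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    if susceptibility is None:
        s = np.ones(n)  # homogeneous susceptibility
    else:
        s = np.asarray(susceptibility, dtype=float)
    p = np.full(n, p0)
    for _ in range(steps):
        # Probability that node i is not infected by any neighbor this step.
        escape = np.prod(1.0 - beta * s[:, None] * adj * p[None, :], axis=1)
        # Infected nodes stay infected with prob. (1 - delta);
        # susceptible nodes become infected with prob. (1 - escape).
        p = (1.0 - delta) * p + (1.0 - p) * (1.0 - escape)
    return p


# Star graph: node 0 is the hub, nodes 1..4 are leaves.
star = np.zeros((5, 5))
star[0, 1:] = 1.0
star[1:, 0] = 1.0

p_uniform = sis_mean_field(star, beta=0.3, delta=0.4)
# Same graph, but the high-degree hub is twice as susceptible.
p_hub = sis_mean_field(star, beta=0.3, delta=0.4,
                       susceptibility=[2.0, 1.0, 1.0, 1.0, 1.0])
```

Comparing `p_uniform` with `p_hub` gives a rough feel for how tying susceptibility to degree shifts infection probabilities in this approximation.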
-
The influence of the network topology on epidemic spreading
Authors:
Daniel Smilkov,
Ljupco Kocarev
Abstract:
The influence of the network's structure on the dynamics of spreading processes has been extensively studied in the last decade. Important results that partially answer this question show a weak connection between the macroscopic behavior of these processes and specific structural properties in the network, such as the largest eigenvalue of a topology related matrix. However, little is known about the direct influence of the network topology at the microscopic level, such as the influence of the (neighboring) network on the probability of a particular node's infection. To answer this question, we derive both an upper and a lower bound for the probability that a particular node is infective in a susceptible-infective-susceptible model for two cases of spreading processes: reactive and contact processes. The bounds are derived by considering the $n$-hop neighborhood of the node; the bounds are tighter as one uses a larger $n$-hop neighborhood to calculate them. Consequently, using local information for different neighborhood sizes, we assess the extent to which the topology influences the spreading process, thus also providing a strong macroscopic connection between the former and the latter. Our findings are complemented by numerical results for a real-world e-mail network. A very good estimate for the infection density $ρ$ is obtained using only 2-hop neighborhoods, which account for 0.4% of the entire network topology on average.
Submitted 14 November, 2011;
originally announced November 2011.
-
Identifying communities by influence dynamics in social networks
Authors:
Angel Stanoev,
Daniel Smilkov,
Ljupco Kocarev
Abstract:
Communities are not static; they evolve, split and merge, appear and disappear, i.e. they are the product of dynamical processes that govern the evolution of the network. A good algorithm for community detection should not only quantify the topology of the network, but incorporate the dynamical processes that take place on the network. We present a novel algorithm for community detection that combines network structure with processes that support creation and/or evolution of communities. The algorithm does not embrace the universal approach but instead tries to focus on social networks and model dynamic social interactions that occur on those networks. It identifies leaders, and communities that form around those leaders. It naturally supports overlapping communities by associating each node with a membership vector that describes the node's involvement in each community. This way, in addition to overlapping communities, we can identify nodes that are good followers to their leader, and also nodes with no clear community involvement that serve as a proxy between several communities and are equally as important. We run the algorithm for several real social networks which we believe represent a good fraction of the wide body of social networks and discuss the results including other possible applications.
Submitted 22 November, 2011; v1 submitted 27 April, 2011;
originally announced April 2011.
-
Rich-club and page-club coefficients for directed graphs
Authors:
Daniel Smilkov,
Ljupco Kocarev
Abstract:
Rich-club and page-club coefficients and their null models are introduced for directed graphs. Null models allow for a quantitative discussion of the rich-club and page-club phenomena. These coefficients are computed for four directed real-world networks: arXiv High Energy Physics paper citation network, Web network (released from Google), Citation network among US Patents, and Email network from an EU research institution. The results show a high correlation between rich-club and page-club ordering. For the journal paper citation network, we identify both rich-club and page-club ordering, showing that "elite" papers are cited by other "elite" papers. The Google web network shows partial rich-club and page-club ordering up to some point and then a narrow declining of the corresponding normalized coefficients, indicating the lack of rich-club ordering and the lack of page-club ordering, i.e. high in-degree (PageRank) pages purposely avoid sharing links with other high in-degree (PageRank) pages. For the US patents citation network, we identify page-club and rich-club ordering, leading to the conclusion that "elite" patents are cited by other "elite" patents. Finally, for the e-mail communication network we show a lack of both rich-club and page-club ordering. We construct an example of a synthetic network showing page-club ordering and the lack of rich-club ordering.
Submitted 11 March, 2011;
originally announced March 2011.
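As a rough illustration of what a rich-club coefficient measures on a directed graph, the sketch below computes the density of links among nodes whose in-degree exceeds a threshold `k`. This is one common way to carry the undirected definition over to directed graphs and may differ from the exact definition in the paper above; the function name `rich_club_directed` and the edge-list input format are assumptions. The null-model normalization and the PageRank-based page-club variant mentioned in the abstract are not covered.

```python
def rich_club_directed(edges, k):
    """Link density among nodes with in-degree greater than k.

    edges is an iterable of (source, target) pairs. Returns None when
    fewer than two nodes qualify, since the density is then undefined.
    """
    edge_set = set(edges)  # ignore duplicate edges
    nodes = set()
    in_deg = {}
    for u, v in edge_set:
        nodes.update((u, v))
        in_deg[v] = in_deg.get(v, 0) + 1
    # "Rich" nodes are those with in-degree strictly above the threshold.
    rich = {n for n in nodes if in_deg.get(n, 0) > k}
    n_rich = len(rich)
    if n_rich < 2:
        return None
    # Count directed links between distinct rich nodes.
    e_rich = sum(1 for u, v in edge_set
                 if u in rich and v in rich and u != v)
    # A directed graph on n_rich nodes has n_rich * (n_rich - 1) possible links.
    return e_rich / (n_rich * (n_rich - 1))


# Small hypothetical graph used only to exercise the function.
example_edges = [("a", "b"), ("b", "a"), ("c", "a"),
                 ("c", "b"), ("d", "a"), ("a", "c")]
phi_0 = rich_club_directed(example_edges, 0)
phi_1 = rich_club_directed(example_edges, 1)
```

A raw coefficient is hard to interpret on its own, which is why the abstract stresses comparison against null models.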
-
Analytically solvable processes on networks
Authors:
Daniel Smilkov,
Ljupco Kocarev
Abstract:
We introduce a broad class of analytically solvable processes on networks. In the special case, they reduce to random walk and consensus process, the two most basic processes on networks. Our class differs from previous models of interactions (such as stochastic Ising model, cellular automata, infinite particle system, and voter model) in several ways, the two most important being: (i) the model is analytically solvable even when the dynamical equation for each node may be different and the network may have an arbitrary finite graph and influence structure; and (ii) in addition, when the local dynamics are described by the same evolution equation, the model is decomposable: the equilibrium behavior of the system can be expressed as an explicit function of network topology and node dynamics.
Submitted 25 April, 2011; v1 submitted 11 March, 2011;
originally announced March 2011.