-
Academic Article Recommendation Using Multiple Perspectives
Authors:
Kenneth Church,
Omar Alonso,
Peter Vickers,
Jiameng Sun,
Abteen Ebrahimi,
Raman Chandrasekar
Abstract:
We argue that Content-based filtering (CBF) and Graph-based methods (GB) complement one another in Academic Search recommendations. The scientific literature can be viewed as a conversation between authors and the audience. CBF uses abstracts to infer authors' positions, and GB uses citations to infer responses from the audience. In this paper, we describe nine differences between CBF and GB, as well as synergistic opportunities for hybrid combinations. Two embeddings will be used to illustrate these opportunities: (1) Specter, a CBF method based on BERT-like deepnet encodings of abstracts, and (2) ProNE, a GB method based on spectral clustering of more than 200M papers and 2B citations from Semantic Scholar.
Submitted 8 July, 2024;
originally announced July 2024.
-
We Need to Talk About Classification Evaluation Metrics in NLP
Authors:
Peter Vickers,
Loïc Barrault,
Emilio Monti,
Nikolaos Aletras
Abstract:
In Natural Language Processing (NLP) classification tasks such as topic categorisation and sentiment analysis, model generalizability is generally measured with standard metrics such as Accuracy, F-Measure, or AUC-ROC. The diversity of metrics and the arbitrariness of their application suggest that there is no agreement within NLP on a single best metric to use. This lack of agreement suggests that there has not been sufficient examination of the underlying heuristics which each metric encodes. To address this, we compare several standard classification metrics with more 'exotic' metrics and demonstrate that a random-guess normalised Informedness metric is a parsimonious baseline for task performance. To show how important the choice of metric is, we perform extensive experiments on a wide range of NLP tasks including a synthetic scenario, natural language understanding, question answering and machine translation. Across these tasks we use a superset of metrics to rank models and find that Informedness best captures the ideal model characteristics. Finally, we release a Python implementation of Informedness following the scikit-learn classifier format.
Submitted 8 January, 2024;
originally announced January 2024.
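The abstract above does not give a formula for Informedness. As a rough illustration only, the sketch below uses the standard binary form (Youden's J: recall on the positive class plus recall on the negative class, minus one), which scores a constant or random guesser at about zero. It is not the authors' released implementation; the function name `informedness` and its parameters are assumptions made for this example.

```python
def informedness(y_true, y_pred, positive=1):
    """Binary Informedness (Youden's J) = sensitivity + specificity - 1.

    Illustrative sketch only; name and signature are hypothetical.
    """
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return sensitivity + specificity - 1.0


# Imbalanced toy data: nine negatives, one positive.
labels = [0] * 9 + [1]
always_negative = [0] * 10
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
print(accuracy)                               # high accuracy from a trivial guess
print(informedness(labels, always_negative))  # no better than guessing
```

On this toy data the always-negative predictor is correct on nine of ten items yet carries no information about the positive class, which is the kind of gap between metrics the abstract points to.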
-
Sonification of Network Traffic Flow for Monitoring and Situational Awareness
Authors:
Mohamed Debashi,
Paul Vickers
Abstract:
Maintaining situational awareness of what is happening within a network is challenging, not only because the behaviour happens within computers and communications networks, but also because data traffic speeds and volumes are beyond human ability to process. Visualisation is widely used to present information about the dynamics of network traffic. Although it provides operators with an overall view and specific information about particular traffic or attacks on the network, it often fails to represent the events in an understandable way. Visualisations require visual attention and so are not well suited to continuous monitoring scenarios in which network administrators must carry out other tasks. Situational awareness is critical and essential for decision-making in the domain of computer network monitoring where it is vital to be able to identify and recognize network environment behaviours. Here we present SoNSTAR (Sonification of Networks for SiTuational AwaReness), a real-time sonification system to be used in the monitoring of computer networks to support the situational awareness of network administrators. SoNSTAR provides an auditory representation of all the TCP/IP protocol traffic within a network based on the different traffic flows between network hosts. SoNSTAR raises situational awareness levels for computer network defence by allowing operators to achieve better understanding and performance while imposing less workload compared to visual techniques. SoNSTAR identifies the features of network traffic flows by inspecting the status flags of TCP/IP packet headers and mapping traffic events to recorded sounds to generate a soundscape representing the real-time status of the network traffic environment. Listening to the soundscape allows the administrator to recognise anomalous behaviour quickly and without having to continuously watch a computer screen.
Submitted 19 December, 2017;
originally announced December 2017.
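The SoNSTAR abstract above mentions inspecting TCP status flags and mapping traffic events to recorded sounds, without giving the mapping. The sketch below shows one way such a step might look. The flag bit values are the standard TCP header bits; the event-to-sound table, the file names, and the function names are hypothetical and are not taken from SoNSTAR.

```python
# Standard TCP header flag bits (low six bits of the flags field).
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

# Hypothetical mapping from flag combinations to recorded sounds.
# These file names are placeholders, not SoNSTAR's actual assets.
HYPOTHETICAL_SOUNDS = {
    frozenset({"SYN"}): "connection_request.wav",
    frozenset({"SYN", "ACK"}): "connection_accept.wav",
    frozenset({"FIN", "ACK"}): "connection_close.wav",
    frozenset({"RST"}): "connection_reset.wav",
}


def decode_flags(flag_byte):
    """Return the set of flag names that are set in a TCP flags byte."""
    return {name for name, bit in TCP_FLAGS.items() if flag_byte & bit}


def sound_for(flag_byte):
    """Look up the placeholder sound for a flags byte, or None if unmapped."""
    return HYPOTHETICAL_SOUNDS.get(frozenset(decode_flags(flag_byte)))


print(sorted(decode_flags(0x12)))  # flags set in a SYN+ACK segment
print(sound_for(0x12))
```

A real system would read the flags from captured packets and would likely aggregate events per flow before triggering sounds; this sketch covers only the per-packet lookup.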
-
graphTPP: A multivariate based method for interactive graph layout and analysis
Authors:
Helen Gibson,
Paul Vickers
Abstract:
Graph layout is the process of creating a visual representation of a graph through a node-link diagram. Node-attribute graphs have additional data stored on the nodes which describe certain properties of the nodes called attributes. Typical force-directed representations often produce hairball-like structures that neither aid in understanding the graph's topology nor the relationship to its attributes. The aim of this research was to investigate the use of node-attributes for graph layout in order to improve the analysis process and to give further insight into the graph over purely topological layouts. In this article we present graphTPP, a graph-based extension to targeted projection pursuit (TPP), an interactive, linear, dimension reduction technique, as a method for graph layout and subsequent further analysis. TPP allows users to control the projection and is optimised for clustering. Three case studies were conducted in the areas of influence graphs, network security, and citation networks. In each case graphTPP was shown to outperform standard force-directed techniques and even other dimension reduction methods in terms of clarity of clustered structure in the layout, the association between the structure and the attributes and the insights elicited in each domain area.
Submitted 15 December, 2017;
originally announced December 2017.
-
Direct Segmented Sonification of Characteristic Features of the Data Domain
Authors:
Paul Vickers,
Robert Höldrich
Abstract:
Sonification and audification create auditory displays of datasets. Audification translates data points into digital audio samples and the auditory display's duration is determined by the playback rate. Like audification, auditory graphs maintain the temporal relationships of data while using parameter mappings (typically data-to-frequency) to represent the ordinate values. Such direct approaches have the advantage of presenting the data stream 'as is' without the imposed interpretations or accentuation of particular features found in indirect approaches. However, datasets can often be subdivided into short non-overlapping variable-length segments that each encapsulate a discrete unit of domain-specific significant information and current direct approaches cannot represent these. We present Direct Segmented Sonification (DSSon) for highlighting the segments' data distributions as individual sonic events. Using domain knowledge to segment data, DSSon presents segments as discrete auditory gestalts while retaining the overall temporal regime and relationships of the dataset. The method's structural decoupling from the sound stream's formation means playback speed is independent of the individual sonic event durations, thereby offering highly flexible time compression/stretching to allow zooming into or out of the data. Demonstrated by three models applied to biomechanical data, DSSon displays high directness, letting the data 'speak' for themselves.
Submitted 1 December, 2017; v1 submitted 30 November, 2017;
originally announced November 2017.
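The abstract above describes audification as translating data points directly into audio samples, with duration set by the playback rate. A minimal sketch of that idea follows, using only the Python standard library. It illustrates plain audification, not DSSon itself, and the function names and the linear scaling to 16-bit samples are assumptions chosen for this example.

```python
import io
import struct
import wave


def audify(data):
    """Linearly rescale data points to signed 16-bit audio samples."""
    lo, hi = min(data), max(data)
    if hi == lo:
        return [0] * len(data)  # constant data carries no audible variation
    return [int(round(((x - lo) / (hi - lo) * 2 - 1) * 32767)) for x in data]


def duration_seconds(n_samples, playback_rate):
    """One data point per sample, so duration follows from the playback rate."""
    return n_samples / playback_rate


def to_wav_bytes(samples, playback_rate):
    """Pack mono 16-bit samples into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(playback_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buffer.getvalue()


samples = audify([0, 5, 10])
wav_bytes = to_wav_bytes(samples, 44100)
print(samples)
print(duration_seconds(44100, 22050))  # halving the rate doubles the duration
```

Because every data point becomes exactly one sample, time compression or stretching in plain audification is achieved only by changing the playback rate, which is the coupling the abstract says DSSon removes.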
-
Sonification Aesthetics and Listening for Network Situational Awareness
Authors:
Paul Vickers,
Christopher Laing,
Mohamed Debashi,
Tom Fairfax
Abstract:
This paper looks at the problem of using sonification to enable network administrators to maintain situational awareness about their network environment. Network environments generate a lot of data and the need for continuous monitoring means that sonification systems must be designed in such a way as to maximise acceptance while minimising annoyance and listener fatigue. It will be argued that solutions based on the concept of the soundscape offer an ecological advantage over other sonification designs.
Submitted 18 September, 2014;
originally announced September 2014.
-
Sonification of a Network's Self-Organized Criticality
Authors:
Paul Vickers,
Chris Laing,
Tom Fairfax
Abstract:
Communication networks involve the transmission and reception of large volumes of data. Research indicates that network traffic volumes will continue to increase. These traffic volumes will be unprecedented and the behaviour of global information infrastructures when dealing with these data volumes is unknown. It has been shown that complex systems (including computer networks) exhibit self-organized criticality under certain conditions. Given the possibility in such systems of a sudden and spontaneous system reset, the development of techniques to inform system administrators of this behaviour could be beneficial. This article focuses on the combination of two dissimilar research concepts, namely sonification (a form of auditory display) and self-organized criticality (SOC). A system is described that sonifies in real time an information infrastructure's self-organized criticality to alert the network administrators to both normal and abnormal network traffic and operation.
Submitted 17 July, 2014;
originally announced July 2014.
-
Ways of Listening and Modes of Being: Electroacoustic Auditory Display
Authors:
Paul Vickers
Abstract:
Auditory display is concerned with the use of non-speech sound to communicate information. If the term seems at first oxymoronic, then consider auditory display as an activity of perceptualization, that is, the process of making perceptible to humans aspects or features of a given data set or system. Most commonly this is done using visual representations (which process we call visualization) but it is not limited to the visual channel and recent years have witnessed the increased use of auditory representations in the production of tools for exploring data. By way of semiotics and an aesthetic perspective shift this article posits that auditory display may be considered a form of organized sound and explores the listening experience in this context.
Submitted 21 November, 2013;
originally announced November 2013.
-
Lemma 4: Haptic Input + Auditory Display = Musical Instrument?
Authors:
Paul Vickers
Abstract:
In this paper we look at some of the design issues that affect the success of multimodal displays that combine acoustic and haptic modalities. First, issues affecting successful sonification design are explored and suggestions are made about how the language of electroacoustic music can assist. Next, haptic interaction is introduced in the light of this discussion, particularly focusing on the roles of gesture and mimesis. Finally, some observations are made regarding some of the issues that arise when the haptic and acoustic modalities are combined in the interface. This paper looks at examples of where auditory and haptic interaction have been successfully combined beyond the strict confines of the human-computer application interface (musical instruments in particular) and discusses lessons that may be drawn from these domains and applied to the world of multimodal human-computer interaction. The argument is made that combined haptic-auditory interaction schemes can be thought of as musical instruments and some of the possible ramifications of this are raised.
Submitted 21 November, 2013;
originally announced November 2013.
-
The Well-tempered Compiler? The Aesthetics of Program Auralization
Authors:
Paul Vickers,
James L Alty
Abstract:
In this chapter we are concerned with external auditory representations of programs, also known as program auralization. As program auralization systems tend to use musical representations they are necessarily affected by artistic and aesthetic considerations. Therefore, it is instructive to explore program auralization in the light of aesthetic computing principles.
Submitted 21 November, 2013;
originally announced November 2013.
-
Sonification Abstraite/Sonification Concrète: An 'Aesthetic Perspective Space' for Classifying Auditory Displays in the Ars Musica Domain
Authors:
Paul Vickers,
Bennett Hogg
Abstract:
This paper discusses æsthetic issues of sonifications and the relationships between sonification (ars informatica) and music & sound art (ars musica). It is posited that many sonifications have suffered from poor internal ecological validity which makes listening more difficult, thereby resulting in poorer data extraction and inference on the part of the listener. Lessons are drawn from the electroacoustic music and musique concrète communities as it is argued that it is not instructive to distinguish between sonifications and music/sound art.
Submitted 21 November, 2013;
originally announced November 2013.
-
Understanding Visualization: A Formal Approach using Category Theory and Semiotics
Authors:
Paul Vickers,
Joe Faith,
Nick Rossiter
Abstract:
This article combines the vocabulary of semiotics and category theory to provide a formal analysis of visualization. It shows how familiar processes of visualization fit the semiotic frameworks of both Saussure and Peirce, and extends these structures using the tools of category theory to provide a general framework for understanding visualization in practice, including: relationships between systems, data collected from those systems, renderings of those data in the form of representations, the reading of those representations to create visualizations, and the use of those visualizations to create knowledge and understanding of the system under inspection. The resulting framework is validated by demonstrating how familiar information visualization concepts (such as literalness, sensitivity, redundancy, ambiguity, generalizability, and chart junk) arise naturally from it and can be defined formally and precisely. This article generalizes previous work on the formal characterization of visualization by, inter alia, Ziemkiewicz and Kosara and allows us to formally distinguish properties of the visualization process that previous work does not.
Submitted 18 November, 2013;
originally announced November 2013.
-
The CAITLIN Auralization System: Hierarchical Leitmotif Design as a Clue to Program Comprehension
Authors:
James L. Alty,
Paul Vickers
Abstract:
Early experiments have suggested that program auralization can convey information about program structure [8]. Languages like Pascal contain classes of construct that are similar in nature allowing hierarchical classification of their features. This taxonomy can be reflected in the design of musical signatures which are used within the CAITLIN program auralization system. Experiments using these hierarchical leitmotifs indicate whether or not their similarities can be put to good use in communicating information about program structure and state. (Note, at time of going to press experimental results could not be included. These will be presented at the conference and included later.)
Submitted 18 November, 2013;
originally announced November 2013.