Search | arXiv e-print repository

Using autoencoders and deep transfer learning to determine the stellar parameters of 286 CARMENES M dwarfs

Authors: P. Mas-Buitrago, A. González-Marcos, E. Solano, V. M. Passegger, M. Cortés-Contreras, J. Ordieres-Meré, A. Bello-García, J. A. Caballero, A. Schweitzer, H. M. Tabernero, D. Montes, C. Cifuentes

Abstract: Deep learning (DL) techniques are a promising approach among the set of methods used in the ever-challenging determination of stellar parameters in M dwarfs. In this context, transfer learning could play an important role in mitigating uncertainties in the results due to the synthetic gap (i.e. difference in feature distributions between observed and synthetic data). We propose a feature-based dee… ▽ More Deep learning (DL) techniques are a promising approach among the set of methods used in the ever-challenging determination of stellar parameters in M dwarfs. In this context, transfer learning could play an important role in mitigating uncertainties in the results due to the synthetic gap (i.e. difference in feature distributions between observed and synthetic data). We propose a feature-based deep transfer learning (DTL) approach based on autoencoders to determine stellar parameters from high-resolution spectra. Using this methodology, we provide new estimations for the effective temperature, surface gravity, metallicity, and projected rotational velocity for 286 M dwarfs observed by the CARMENES survey. Using autoencoder architectures, we projected synthetic PHOENIX-ACES spectra and observed CARMENES spectra onto a new feature space of lower dimensionality in which the differences between the two domains are reduced. We used this low-dimensional new feature space as input for a convolutional neural network to obtain the stellar parameter determinations. We performed an extensive analysis of our estimated stellar parameters, ranging from 3050 to 4300 K, 4.7 to 5.1 dex, and -0.53 to 0.25 dex for Teff, logg, and [Fe/H], respectively. Our results are broadly consistent with those of recent studies using CARMENES data, with a systematic deviation in our Teff scale towards hotter values for estimations above 3750 K. Furthermore, our methodology mitigates the deviations in metallicity found in previous DL techniques due to the synthetic gap. We consolidated a DTL-based methodology to determine stellar parameters in M dwarfs from synthetic spectra, with no need for high-quality measurements involved in the knowledge transfer. These results suggest the great potential of DTL to mitigate the differences in feature distributions between the observations and the PHOENIX-ACES spectra. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted in A&A

arXiv:2309.03592 [pdf, other]

doi 10.1145/3576915.3623094

Cybercrime Bitcoin Revenue Estimations: Quantifying the Impact of Methodology and Coverage

Authors: Gibran Gomez, Kevin van Liebergen, Juan Caballero

Abstract: Multiple works have leveraged the public Bitcoin ledger to estimate the revenue cybercriminals obtain from their victims. Estimations focusing on the same target often do not agree, due to the use of different methodologies, seed addresses, and time periods. These factors make it challenging to understand the impact of their methodological differences. Furthermore, they underestimate the revenue d… ▽ More Multiple works have leveraged the public Bitcoin ledger to estimate the revenue cybercriminals obtain from their victims. Estimations focusing on the same target often do not agree, due to the use of different methodologies, seed addresses, and time periods. These factors make it challenging to understand the impact of their methodological differences. Furthermore, they underestimate the revenue due to the (lack of) coverage on the target's payment addresses, but how large this impact remains unknown. In this work, we perform the first systematic analysis on the estimation of cybercrime bitcoin revenue. We implement a tool that can replicate the different estimation methodologies. Using our tool we can quantify, in a controlled setting, the impact of the different methodology steps. In contrast to what is widely believed, we show that the revenue is not always underestimated. There exist methodologies that can introduce huge overestimation. We collect 30,424 payment addresses and use them to compare the financial impact of 6 cybercrimes (ransomware, clippers, sextortion, Ponzi schemes, giveaway scams, exchange scams) and of 141 cybercriminal groups. We observe that the popular multi-input clustering fails to discover addresses for 40% of groups. We quantify, for the first time, the impact of the (lack of) coverage on the estimation. For this, we propose two techniques to achieve high coverage, possibly nearly complete, on the DeadBolt server ransomware. Our expanded coverage enables estimating DeadBolt's revenue at $2.47M, 39 times higher than the estimation using two popular Internet scan engines. △ Less

Submitted 27 November, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

arXiv:2307.14657 [pdf, other]

Decoding the Secrets of Machine Learning in Malware Classification: A Deep Dive into Datasets, Feature Extraction, and Model Performance

Authors: Savino Dambra, Yufei Han, Simone Aonzo, Platon Kotzias, Antonino Vitale, Juan Caballero, Davide Balzarotti, Leyla Bilge

Abstract: Many studies have proposed machine-learning (ML) models for malware detection and classification, reporting an almost-perfect performance. However, they assemble ground-truth in different ways, use diverse static- and dynamic-analysis techniques for feature extraction, and even differ on what they consider a malware family. As a consequence, our community still lacks an understanding of malware cl… ▽ More Many studies have proposed machine-learning (ML) models for malware detection and classification, reporting an almost-perfect performance. However, they assemble ground-truth in different ways, use diverse static- and dynamic-analysis techniques for feature extraction, and even differ on what they consider a malware family. As a consequence, our community still lacks an understanding of malware classification results: whether they are tied to the nature and distribution of the collected dataset, to what extent the number of families and samples in the training dataset influence performance, and how well static and dynamic features complement each other. This work sheds light on those open questions. by investigating the key factors influencing ML-based malware detection and classification. For this, we collect the largest balanced malware dataset so far with 67K samples from 670 families (100 samples each), and train state-of-the-art models for malware detection and family classification using our dataset. Our results reveal that static features perform better than dynamic features, and that combining both only provides marginal improvement over static features. We discover no correlation between packing and classification accuracy, and that missing behaviors in dynamically-extracted features highly penalize their performance. We also demonstrate how a larger number of families to classify make the classification harder, while a higher number of samples per family increases accuracy. Finally, we find that models trained on a uniform distribution of samples per family better generalize on unseen data. △ Less

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2306.11544 [pdf, other]

doi 10.1145/3592979.3593403

Lessons learned from a performance analysis and optimization of a multiscale cellular simulation

Authors: Marc Clascà, Marta Garcia-Gasulla, Arnau Montagud, Jose Carbonell Caballero, Alfonso Valencia

Abstract: This work presents a comprehensive performance analysis and optimization of a multiscale agent-based cellular simulation. The optimizations applied are guided by detailed performance analysis and include memory management, load balance, and a locality-aware parallelization. The outcome of this paper is not only the speedup of 2.4x achieved by the optimized version with respect to the original Phys… ▽ More This work presents a comprehensive performance analysis and optimization of a multiscale agent-based cellular simulation. The optimizations applied are guided by detailed performance analysis and include memory management, load balance, and a locality-aware parallelization. The outcome of this paper is not only the speedup of 2.4x achieved by the optimized version with respect to the original PhysiCell code, but also the lessons learned and best practices when develo** parallel HPC codes to obtain efficient and highly performant applications, especially in the computational biology field. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Journal ref: Proceedings of the Platform for Advanced Scientific Computing Conference, 2023, art. 4

arXiv:2301.12872 [pdf, other]

doi 10.1051/0004-6361/202245092

A Machine Learning approach for correcting radial velocities using physical observables

Authors: M. Perger, G. Anglada-Escudé, D. Baroch, M. Lafarga, I. Ribas, J. C. Morales, E. Herrero, P. J. Amado, J. R. Barnes, J. A. Caballero, S. V. Jeffers, A. Quirrenbach, A. Reiners

Abstract: Precision radial velocity (RV) measurements continue to be a key tool to detect and characterise extrasolar planets. While instrumental precision keeps improving, stellar activity remains a barrier to obtain reliable measurements below 1-2 m/s accuracy. Using simulations and real data, we investigate the capabilities of a Deep Neural Network approach to produce activity free Doppler measurements o… ▽ More Precision radial velocity (RV) measurements continue to be a key tool to detect and characterise extrasolar planets. While instrumental precision keeps improving, stellar activity remains a barrier to obtain reliable measurements below 1-2 m/s accuracy. Using simulations and real data, we investigate the capabilities of a Deep Neural Network approach to produce activity free Doppler measurements of stars. As case studies we use observations of two known stars (Eps Eridani and AUMicroscopii), both with clear signals of activity induced RV variability. Synthetic data using the starsim code are generated for the observables (inputs) and the resulting RV signal (labels), and used to train a Deep Neural Network algorithm. We identify an architecture consisting of convolutional and fully connected layers that is adequate to the task. The indices investigated are mean line-profile parameters (width, bisector, contrast) and multi-band photometry. We demonstrate that the RV-independent approach can drastically reduce spurious Doppler variability from known physical effects such as spots, rotation and convective blueshift. We identify the combinations of activity indices with most predictive power. When applied to real observations, we observe a good match of the correction with the observed variability, but we also find that the noise reduction is not as good as in the simulations, probably due to the lack of detail in the simulated physics. We demonstrate that a model-driven machine learning approach is sufficient to clean Doppler signals from activity induced variability for well known physical effects. There are dozens of known activity related observables whose inversion power remains unexplored indicating that the use of additional indicators, more complete models, and more observations with optimised sampling strategies can lead to significant improvements in our detrending capabilities. △ Less

Submitted 30 January, 2023; originally announced January 2023.

Journal ref: A&A 672, A118 (2023)

arXiv:2301.07346 [pdf, other]

One Size Does not Fit All: Quantifying the Risk of Malicious App Encounters for Different Android User Profiles

Authors: Savino Dambra, Leyla Bilge, Platon Kotzias, Yun Shen, Juan Caballero

Abstract: Previous work has investigated the particularities of security practices within specific user communities defined based on country of origin, age, prior tech abuse, and economic status. Their results highlight that current security solutions that adopt a one-size-fits-all-users approach ignore the differences and needs of particular user communities. However, those works focus on a single communit… ▽ More Previous work has investigated the particularities of security practices within specific user communities defined based on country of origin, age, prior tech abuse, and economic status. Their results highlight that current security solutions that adopt a one-size-fits-all-users approach ignore the differences and needs of particular user communities. However, those works focus on a single community or cluster users into hard-to-interpret sub-populations. In this work, we perform a large-scale quantitative analysis of the risk of encountering malware and other potentially unwanted applications (PUA) across user communities. At the core of our study is a dataset of app installation logs collected from 12M Android mobile devices. Leveraging user-installed apps, we define intuitive profiles based on users' interests (e.g., gamers and investors), and fit a subset of 5.4M devices to those profiles. Our analysis is structured in three parts. First, we perform risk analysis on the whole population to measure how the risk of malicious app encounters is affected by different factors. Next, we create different profiles to investigate whether risk differences across users may be due to their interests. Finally, we compare a per-profile approach for classifying clean and infected devices with the classical approach that considers the whole population. We observe that features such as the diversity of the app signers and the use of alternative markets highly correlate with the risk of malicious app encounters. We also discover that some profiles such as gamers and social-media users are exposed to more than twice the risks experienced by the average users. We also show that the classification outcome has a marked accuracy improvement when using a per-profile approach to train the prediction models. Overall, our results confirm the inadequacy of one-size-fits-all protection solutions. △ Less

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2210.15973 [pdf, other]

A Deep Dive into VirusTotal: Characterizing and Clustering a Massive File Feed

Authors: Kevin van Liebergen, Juan Caballero, Platon Kotzias, Chris Gates

Abstract: Online scanners analyze user-submitted files with a large number of security tools and provide access to the analysis results. As the most popular online scanner, VirusTotal (VT) is often used for determining if samples are malicious, labeling samples with their family, hunting for new threats, and collecting malware samples. We analyze 328M VT reports for 235M samples collected for one year throu… ▽ More Online scanners analyze user-submitted files with a large number of security tools and provide access to the analysis results. As the most popular online scanner, VirusTotal (VT) is often used for determining if samples are malicious, labeling samples with their family, hunting for new threats, and collecting malware samples. We analyze 328M VT reports for 235M samples collected for one year through the VT file feed. We use the reports to characterize the VT file feed in depth and compare it with the telemetry of a large security vendor. We answer questions such as How diverse is the feed? Does it allow building malware datasets for different filetypes? How fresh are the samples it provides? What is the distribution of malware families it sees? Does that distribution really represent malware on user devices? We then explore how to perform threat hunting at scale by investigating scalable approaches that can produce high purity clusters on the 235M feed samples. We investigate three clustering approaches: hierarchical agglomerative clustering (HAC), a more scalable HAC variant for TLSH digests (HAC-T), and a simple feature value grou** (FVG). Our results show that HAC-T and FVG using selected features produce high precision clusters on ground truth datasets. However, only FVG scales to the daily influx of samples in the feed. Moreover, FVG takes 15 hours to cluster the whole dataset of 235M samples. Finally, we use the produced clusters for threat hunting, namely for detecting 190K samples thought to be benign (i.e., with zero detections) that may really be malicious because they belong to 29K clusters where most samples are detected as malicious. △ Less

Submitted 31 October, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: 16 pages, 4 figures

arXiv:2208.00042 [pdf, other]

doi 10.1016/j.future.2023.02.012

The Rise of GoodFATR: A Novel Accuracy Comparison Methodology for Indicator Extraction Tools

Authors: Juan Caballero, Gibran Gomez, Srdjan Matic, Gustavo Sánchez, Silvia Sebastián, Arturo Villacañas

Abstract: To adapt to a constantly evolving landscape of cyber threats, organizations actively need to collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds. But, they can also be extracted from a myriad of unstructured threat reports written in natural la… ▽ More To adapt to a constantly evolving landscape of cyber threats, organizations actively need to collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds. But, they can also be extracted from a myriad of unstructured threat reports written in natural language and distributed using a wide array of sources such as blogs and social media. There exist multiple indicator extraction tools that can identify IOCs in natural language reports. But, it is hard to compare their accuracy due to the difficulty of building large ground truth datasets. This work presents a novel majority vote methodology for comparing the accuracy of indicator extraction tools, which does not require a manually-built ground truth. We implement our methodology into GoodFATR, an automated platform for collecting threat reports from a wealth of sources, extracting IOCs from the collected reports using multiple tools, and comparing their accuracy. GoodFATR supports 6 threat report sources: RSS, Twitter, Telegram, Malpedia, APTnotes, and ChainSmith. GoodFATR continuously monitors the sources, downloads new threat reports, extracts 41 indicator types from the collected reports, and filters non-malicious indicators to output the IOCs. We run GoodFATR over 15 months to collect 472,891 reports from the 6 sources; extract 978,151 indicators from the reports; and identify 618,217 IOCs. We analyze the collected data to identify the top IOC contributors and the IOC class distribution. We apply GoodFATR to compare the IOC extraction accuracy of 7 popular open-source tools with GoodFATR's own indicator extraction module. △ Less

Submitted 8 March, 2023; v1 submitted 29 July, 2022; originally announced August 2022.

Journal ref: Future Generation Computer Systems, Volume 144, 2023, Pages 74-89

arXiv:2206.00375 [pdf, other]

Watch Your Back: Identifying Cybercrime Financial Relationships in Bitcoin through Back-and-Forth Exploration

Authors: Gibran Gomez, Pedro Moreno-Sanchez, Juan Caballero

Abstract: Cybercriminals often leverage Bitcoin for their illicit activities. In this work, we propose back-and-forth exploration, a novel automated Bitcoin transaction tracing technique to identify cybercrime financial relationships. Given seed addresses belonging to a cybercrime campaign, it outputs a transaction graph, and identifies paths corresponding to relationships between the campaign under study a… ▽ More Cybercriminals often leverage Bitcoin for their illicit activities. In this work, we propose back-and-forth exploration, a novel automated Bitcoin transaction tracing technique to identify cybercrime financial relationships. Given seed addresses belonging to a cybercrime campaign, it outputs a transaction graph, and identifies paths corresponding to relationships between the campaign under study and external services and other cybercrime campaigns. Back-and-forth exploration provides two key contributions. First, it explores both forward and backwards, instead of only forward as done by prior work, enabling the discovery of relationships that cannot be found by only exploring forward (e.g., deposits from clients of a mixer). Second, it prevents graph explosion by combining a tagging database with a machine learning classifier for identifying addresses belonging to exchanges. We evaluate back-and-forth exploration on 30 malware families. We build oracles for 4 families using Bitcoin for C&C and use them to demonstrate that back-and-forth exploration identifies 13 C&C signaling addresses missed by prior work, 8 of which are fundamentally missed by forward-only explorations. Our approach uncovers a wealth of services used by the malware including 44 exchanges, 11 gambling sites, 5 payment service providers, 4 underground markets, 4 mining pools, and 2 mixers. In 4 families, the relations include new attribution points missed by forward-only explorations. It also identifies relationships between the malware families and other cybercrime campaigns, highlighting how some malware operators participate in a variety of cybercriminal activities. △ Less

Submitted 18 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2109.03878 [pdf, other]

doi 10.1155/1970/3676692

Unsupervised Detection and Clustering of Malicious TLS Flows

Authors: Gibran Gomez, Platon Kotzias, Matteo Dell'Amico, Leyla Bilge, Juan Caballero

Abstract: Malware abuses TLS to encrypt its malicious traffic, preventing examination by content signatures and deep packet inspection. Network detection of malicious TLS flows is an important, but challenging, problem. Prior works have proposed supervised machine learning detectors using TLS features. However, by trying to represent all malicious traffic, supervised binary detectors produce models that are… ▽ More Malware abuses TLS to encrypt its malicious traffic, preventing examination by content signatures and deep packet inspection. Network detection of malicious TLS flows is an important, but challenging, problem. Prior works have proposed supervised machine learning detectors using TLS features. However, by trying to represent all malicious traffic, supervised binary detectors produce models that are too loose, thus introducing errors. Furthermore, they do not distinguish flows generated by different malware. On the other hand, supervised multi-class detectors produce tighter models and can classify flows by malware family, but require family labels, which are not available for many samples. To address these limitations, this work proposes a novel unsupervised approach to detect and cluster malicious TLS flows. Our approach takes as input network traces from sandboxes. It clusters similar TLS flows using 90 features that capture properties of the TLS client, TLS server, certificate, and encrypted payload; and uses the clusters to build an unsupervised detector that can assign a malicious flow to the cluster it belongs to, or determine it is benign. We evaluate our approach using 972K traces from a commercial sandbox and 35M TLS flows from a research network. Our clustering shows very high precision and recall with an F1 score of 0.993. We compare our unsupervised detector with two state-of-the-art approaches, showing that it outperforms both. The false detection rate of our detector is 0.032% measured over four months of traffic. △ Less

Submitted 23 December, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

arXiv:2108.08300 [pdf, other]

Renormalized Wolfram model exhibiting non-relativistic quantum behavior

Authors: José Manuel Rodríguez Caballero

Abstract: We show a Wolfram model whose renormalization generates a sequence of approximations of a wave function having the Pauli-x matrix as Hamiltonian. We show a Wolfram model whose renormalization generates a sequence of approximations of a wave function having the Pauli-x matrix as Hamiltonian. △ Less

Submitted 17 August, 2021; originally announced August 2021.

Comments: 2 figures

arXiv:2108.03751 [pdf, ps, other]

Incompatibility between 't Hooft's and Wolfram's models of quantum mechanics

Authors: José Manuel Rodríguez Caballero

Abstract: Stephen Wolfram and Gerard 't Hooft developed classical models of quantum mechanics. We show that the descriptive complexity grows differently as a function of time in each model. Therefore, they cannot describe the same physical system. In addition, we propose an interpretation of the Wolfram model, which shares some characteristics with 't Hooft's model, but which involves a non-computable funct… ▽ More Stephen Wolfram and Gerard 't Hooft developed classical models of quantum mechanics. We show that the descriptive complexity grows differently as a function of time in each model. Therefore, they cannot describe the same physical system. In addition, we propose an interpretation of the Wolfram model, which shares some characteristics with 't Hooft's model, but which involves a non-computable function. △ Less

Submitted 8 August, 2021; originally announced August 2021.

arXiv:2104.06310 [pdf, other]

Exploration of Spanish Olive Oil Quality with a Miniaturized Low-Cost Fluorescence Sensor and Machine Learning Techniques

Authors: Francesca Venturini, Michela Sperti, Umberto Michelucci, Ivo Herzig, Michael Baumgartner, Josep Palau Caballero, Arturo Jimenez, and Marco Agostino Deriu

Abstract: Extra virgin olive oil (EVOO) is the highest quality of olive oil and is characterized by highly beneficial nutritional properties. The large increase in both consumption and fraud, for example through adulteration, creates new challenges and an increasing demand for develo** new quality assessment methodologies that are easier and cheaper to perform. As of today, the determination of olive oil… ▽ More Extra virgin olive oil (EVOO) is the highest quality of olive oil and is characterized by highly beneficial nutritional properties. The large increase in both consumption and fraud, for example through adulteration, creates new challenges and an increasing demand for develo** new quality assessment methodologies that are easier and cheaper to perform. As of today, the determination of olive oil quality is performed by producers through chemical analysis and organoleptic evaluation. The chemical analysis requires the advanced equipment and chemical knowledge of certified laboratories, and has therefore a limited accessibility. In this work a minimalist, portable and low-cost sensor is presented, which can perform olive oil quality assessment using fluorescence spectroscopy. The potential of the proposed technology is explored by analyzing several olive oils of different quality levels, EVOO, virgin olive oil (VOO), and lampante olive oil (LOO). The spectral data were analyzed using a large number of machine learning methods, including artificial neural networks. The analysis performed in this work demonstrates the possibility of performing classification of olive oil in the three mentioned classes with an accuracy of 100$\%$. These results confirm that this minimalist low-cost sensor has the potential of substituting expensive and complex chemical analysis. △ Less

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2104.00868 [pdf, other]

Inference of Recyclable Objects with Convolutional Neural Networks

Authors: Jaime Caballero, Francisco Vergara, Randal Miranda, José Serracín

Abstract: Population growth in the last decades has resulted in the production of about 2.01 billion tons of municipal waste per year. The current waste management systems are not capable of providing adequate solutions for the disposal and use of these wastes. Recycling and reuse have proven to be a solution to the problem, but large-scale waste segregation is a tedious task and on a small scale it depends… ▽ More Population growth in the last decades has resulted in the production of about 2.01 billion tons of municipal waste per year. The current waste management systems are not capable of providing adequate solutions for the disposal and use of these wastes. Recycling and reuse have proven to be a solution to the problem, but large-scale waste segregation is a tedious task and on a small scale it depends on public awareness. This research used convolutional neural networks and computer vision to develop a tool for the automation of solid waste sorting. The Fotini10k dataset was constructed, which has more than 10,000 images divided into the categories of 'plastic bottles', 'aluminum cans' and 'paper and cardboard'. ResNet50, MobileNetV1 and MobileNetV2 were retrained with ImageNet weights on the Fotini10k dataset. As a result, top-1 accuracy of 99% was obtained in the test dataset with all three networks. To explore the possible use of these networks in mobile applications, the three nets were quantized in float16 weights. By doing so, it was possible to obtain inference times twice as low for Raspberry Pi and three times as low for computer processing units. It was also possible to reduce the size of the networks by half. When quantizing the top-1 accuracy of 99% was maintained with all three networks. When quantizing MobileNetV2 to int-8, it obtained a top-1 accuracy of 97%. △ Less

Submitted 1 April, 2021; originally announced April 2021.

Comments: 11 pages, preprint version, comments are welcome!

arXiv:2010.11627

Malware Traffic Classification: Evaluation of Algorithms and an Automated Ground-truth Generation Pipeline

Authors: Syed Muhammad Kumail Raza, Juan Caballero

Abstract: Identifying threats in a network traffic flow which is encrypted is uniquely challenging. On one hand it is extremely difficult to simply decrypt the traffic due to modern encryption algorithms. On the other hand, passing such an encrypted stream through pattern matching algorithms is useless because encryption ensures there aren't any. Moreover, evaluating such models is also difficult due to lac… ▽ More Identifying threats in a network traffic flow which is encrypted is uniquely challenging. On one hand it is extremely difficult to simply decrypt the traffic due to modern encryption algorithms. On the other hand, passing such an encrypted stream through pattern matching algorithms is useless because encryption ensures there aren't any. Moreover, evaluating such models is also difficult due to lack of labeled benign and malware datasets. Other approaches have tried to tackle this problem by employing observable meta-data gathered from the flow. We try to augment this approach by extending it to a semi-supervised malware classification pipeline using these observable meta-data. To this end, we explore and test different kind of clustering approaches which make use of unique and diverse set of features extracted from this observable meta-data. We also, propose an automated packet data-labeling pipeline to generate ground-truth data which can serve as a base-line to evaluate the classifiers mentioned above in particular, or any other detection model in general. △ Less

Submitted 7 November, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: Submission to arxiv was not authorized by one of the authors. They request removal of the document. Suspected plagiarised work without proper acknowledgement

ACM Class: I.2.6; K.6.5

arXiv:2010.10088 [pdf, other]

How Did That Get In My Phone? Unwanted App Distribution on Android Devices

Authors: Platon Kotzias, Juan Caballero, Leyla Bilge

Abstract: Android is the most popular operating system with billions of active devices. Unfortunately, its popularity and openness makes it attractive for unwanted apps, i.e., malware and potentially unwanted programs (PUP). In Android, app installations typically happen via the official and alternative markets, but also via other smaller and less understood alternative distribution vectors such as Web down… ▽ More Android is the most popular operating system with billions of active devices. Unfortunately, its popularity and openness makes it attractive for unwanted apps, i.e., malware and potentially unwanted programs (PUP). In Android, app installations typically happen via the official and alternative markets, but also via other smaller and less understood alternative distribution vectors such as Web downloads, pay-per-install (PPI) services, backup restoration, bloatware, and IM tools. This work performs a thorough investigation on unwanted app distribution by quantifying and comparing distribution through different vectors. At the core of our measurements are reputation logs of a large security vendor, which include 7.9M apps observed in 12M devices between June and September 2019. As a first step, we measure that between 10% and 24% of users devices encounter at least one unwanted app, and compare the prevalence of malware and PUP. An analysis of the who-installs-who relationships between installers and child apps reveals that the Play market is the main app distribution vector, responsible for 87% of all installs and 67% of unwanted app installs, but it also has the best defenses against unwanted apps. Alternative markets distribute instead 5.7% of all apps, but over 10% of unwanted apps. Bloatware is also a significant unwanted app distribution vector with 6% of those installs. And, backup restoration is an unintentional distribution vector that may even allow unwanted apps to survive users' phone replacement. We estimate unwanted app distribution via PPI to be smaller than on Windows. Finally, we observe that Web downloads are rare, but provide a riskier proposition even compared to alternative markets. △ Less

Submitted 20 October, 2020; originally announced October 2020.

Comments: 17 pages, 3 figures, to be published at 42nd IEEE Symposium on Security and Privacy, 23-27 May 2021, San Fransisco, CA, USA

arXiv:2008.05894 [pdf, ps, other]

Homophilic networks evolving by mimesis

Authors: Jose Manuel Rodriguez Caballero

Abstract: We provide a mathematical model for networks based on similarities (homophily) and evolving by mutual imitation (mimesis). We show that such social networks will converge to a state of segregation, where the in-group interactions will be maximal and there will be no out-group flow of information. We establish some connections between our model and the Wolfram model for fundamental physics. We provide a mathematical model for networks based on similarities (homophily) and evolving by mutual imitation (mimesis). We show that such social networks will converge to a state of segregation, where the in-group interactions will be maximal and there will be no out-group flow of information. We establish some connections between our model and the Wolfram model for fundamental physics. △ Less

Submitted 8 August, 2020; originally announced August 2020.

arXiv:2006.10615 [pdf, other]

AVClass2: Massive Malware Tag Extraction from AV Labels

Authors: Silvia Sebastián, Juan Caballero

Abstract: Tags can be used by malware repositories and analysis services to enable searches for samples of interest across different dimensions. Automatically extracting tags from AV labels is an efficient approach to categorize and index massive amounts of samples. Recent tools like AVClass and Euphony have demonstrated that, despite their noisy nature, it is possible to extract family names from AV labels… ▽ More Tags can be used by malware repositories and analysis services to enable searches for samples of interest across different dimensions. Automatically extracting tags from AV labels is an efficient approach to categorize and index massive amounts of samples. Recent tools like AVClass and Euphony have demonstrated that, despite their noisy nature, it is possible to extract family names from AV labels. However, beyond the family name, AV labels contain much valuable information such as malware classes, file properties, and behaviors. This work presents AVClass2, an automatic malware tagging tool that given the AV labels for a potentially massive number of samples, extracts clean tags that categorize the samples. AVClass2 uses, and helps building, an open taxonomy that organizes concepts in AV labels, but is not constrained to a predefined set of tags. To keep itself updated as AV vendors introduce new tags, it provides an update module that automatically identifies new taxonomy entries, as well as tagging and expansion rules that capture relations between tags. We have evaluated AVClass2 on 42M and showed how it enables advanced malware searches and to maintain an updated knowledge base of malware concepts in AV labels. △ Less

Submitted 28 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: 12 pages, 3 figures

ACM Class: D.4.6

arXiv:1909.11795 [pdf, other]

Data consistency networks for (calibration-less) accelerated parallel MR image reconstruction

Authors: Jo Schlemper, **ming Duan, Cheng Ouyang, Chen Qin, Jose Caballero, Joseph V. Hajnal, Daniel Rueckert

Abstract: We present simple reconstruction networks for multi-coil data by extending deep cascade of CNN's and exploiting the data consistency layer. In particular, we propose two variants, where one is inspired by POCSENSE and the other is calibration-less. We show that the proposed approaches are competitive relative to the state of the art both quantitatively and qualitatively. We present simple reconstruction networks for multi-coil data by extending deep cascade of CNN's and exploiting the data consistency layer. In particular, we propose two variants, where one is inspired by POCSENSE and the other is calibration-less. We show that the proposed approaches are competitive relative to the state of the art both quantitatively and qualitatively. △ Less

Submitted 25 September, 2019; originally announced September 2019.

Comments: Presented at ISMRM 27th Annual Meeting & Exhibition (Abstract #4663)

arXiv:1909.11424 [pdf, other]

A Survey of Binary Code Similarity

Authors: Irfan Ul Haq, Juan Caballero

Abstract: Binary code similarity approaches compare two or more pieces of binary code to identify their similarities and differences. The ability to compare binary code enables many real-world applications on scenarios where source code may not be available such as patch analysis, bug search, and malware detection and analysis. Over the past 20 years numerous binary code similarity approaches have been prop… ▽ More Binary code similarity approaches compare two or more pieces of binary code to identify their similarities and differences. The ability to compare binary code enables many real-world applications on scenarios where source code may not be available such as patch analysis, bug search, and malware detection and analysis. Over the past 20 years numerous binary code similarity approaches have been proposed, but the research area has not yet been systematically analyzed. This paper presents a first survey of binary code similarity. It analyzes 61 binary code similarity approaches, which are systematized on four aspects: (1) the applications they enable, (2) their approach characteristics, (3) how the approaches are implemented, and (4) the benchmarks and methodologies used to evaluate them. In addition, the survey discusses the scope and origins of the area, its evolution over the past two decades, and the challenges that lie ahead. △ Less

Submitted 25 September, 2019; originally announced September 2019.

arXiv:1908.02204 [pdf, other]

Cross-Origin State Inference (COSI) Attacks: Leaking Web Site States through XS-Leaks

Authors: Avinash Sudhodanan, Soheil Khodayari, Juan Caballero

Abstract: In a Cross-Origin State Inference (COSI) attack, an attacker convinces a victim into visiting an attack web page, which leverages the cross-origin interaction features of the victim's web browser to infer the victim's state at a target web site. Multiple instances of COSI attacks have been found in the past under different names such as login detection or access detection attacks. But, those attac… ▽ More In a Cross-Origin State Inference (COSI) attack, an attacker convinces a victim into visiting an attack web page, which leverages the cross-origin interaction features of the victim's web browser to infer the victim's state at a target web site. Multiple instances of COSI attacks have been found in the past under different names such as login detection or access detection attacks. But, those attacks only consider two states (e.g., logged in or not) and focus on a specific browser leak method (or XS-Leak). This work shows that mounting more complex COSI attacks such as deanonymizing the owner of an account, determining if the victim owns sensitive content, and determining the victim's account type often requires considering more than two states. Furthermore, robust attacks require supporting a variety of browsers since the victim's browser cannot be predicted apriori. To address these issues, we present a novel approach to identify and build complex COSI attacks that differentiate more than two states and support multiple browsers by combining multiple attack vectors, possibly using different XS-Leaks. To enable our approach, we introduce the concept of a COSI attack class. We propose two novel techniques to generalize existing COSI attack instances into COSI attack classes and to discover new COSI attack classes. We systematically apply our techniques to existing attacks, identifying 40 COSI attack classes. As part of this process, we discover a novel XS-Leak based on window.postMessage. We implement our approach into Basta-COSI, a tool to find COSI attacks in a target web site. We apply Basta-COSI to test four stand-alone web applications and 58 popular web sites, finding COSI attacks against each of them. △ Less

Submitted 31 January, 2020; v1 submitted 6 August, 2019; originally announced August 2019.

arXiv:1907.06160 [pdf, other]

Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis

Authors: Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero

Abstract: Due to the lack of large-scale datasets, the prevailing approach in visual sentiment analysis is to leverage models trained for object classification in large datasets like ImageNet. However, objects are sentiment neutral which hinders the expected gain of transfer learning for such tasks. In this work, we propose to overcome this problem by learning a novel sentiment-aligned image embedding that… ▽ More Due to the lack of large-scale datasets, the prevailing approach in visual sentiment analysis is to leverage models trained for object classification in large datasets like ImageNet. However, objects are sentiment neutral which hinders the expected gain of transfer learning for such tasks. In this work, we propose to overcome this problem by learning a novel sentiment-aligned image embedding that is better suited for subsequent visual sentiment analysis. Our embedding leverages the intricate relation between emojis and images in large-scale and readily available data from social media. Emojis are language-agnostic, consistent, and carry a clear sentiment signal which make them an excellent proxy to learn a sentiment aligned embedding. Hence, we construct a novel dataset of 4 million images collected from Twitter with their associated emojis. We train a deep neural model for image embedding using emoji prediction task as a proxy. Our evaluation demonstrates that the proposed embedding outperforms the popular object-based counterpart consistently across several sentiment analysis benchmarks. Furthermore, without bell and whistles, our compact, effective and simple embedding outperforms the more elaborate and customized state-of-the-art deep models on these public benchmarks. Additionally, we introduce a novel emoji representation based on their visual emotional response which supports a deeper understanding of the emoji modality and their usage on social media. △ Less

Submitted 8 August, 2020; v1 submitted 13 July, 2019; originally announced July 2019.

Comments: International Conference on Computer Vision (ICCV 2019) Workshops. Project page and the Visual Smiley Dataset: https://www.cs.utexas.edu/~ziad/emoji_visual_sentiment.html

arXiv:1902.10815 [pdf, other]

Generalizing Deep Learning MRI Reconstruction across Different Domains

Authors: Cheng Ouyang, Jo Schlemper, Carlo Biffi, Gavin Seegoolam, Jose Caballero, Anthony N. Price, Joseph V. Hajnal, Daniel Rueckert

Abstract: We look into the robustness of deep learning based MRI reconstruction when tested on unseen contrasts and organs. We then propose to generalize the network by training with large publicly-available natural image datasets with synthesized phase information to achieve high cross-domain reconstruction performance which is competitive with domain-specific training. To explain its generalization mechan… ▽ More We look into the robustness of deep learning based MRI reconstruction when tested on unseen contrasts and organs. We then propose to generalize the network by training with large publicly-available natural image datasets with synthesized phase information to achieve high cross-domain reconstruction performance which is competitive with domain-specific training. To explain its generalization mechanism, we have also analyzed patch sets for different training datasets. △ Less

Submitted 7 August, 2023; v1 submitted 31 January, 2019; originally announced February 2019.

Comments: Accepted for ISBI2019 as a 1-page abstract

arXiv:1902.03876 [pdf, other]

Deep Hashing using Entropy Regularised Product Quantisation Network

Authors: Jo Schlemper, Jose Caballero, Andy Aitken, Joost van Amersfoort

Abstract: In large scale systems, approximate nearest neighbour search is a crucial algorithm to enable efficient data retrievals. Recently, deep learning-based hashing algorithms have been proposed as a promising paradigm to enable data dependent schemes. Often their efficacy is only demonstrated on data sets with fixed, limited numbers of classes. In practical scenarios, those labels are not always availa… ▽ More In large scale systems, approximate nearest neighbour search is a crucial algorithm to enable efficient data retrievals. Recently, deep learning-based hashing algorithms have been proposed as a promising paradigm to enable data dependent schemes. Often their efficacy is only demonstrated on data sets with fixed, limited numbers of classes. In practical scenarios, those labels are not always available or one requires a method that can handle a higher input variability, as well as a higher granularity. To fulfil those requirements, we look at more flexible similarity measures. In this work, we present a novel, flexible, end-to-end trainable network for large-scale data hashing. Our method works by transforming the data distribution to behave as a uniform distribution on a product of spheres. The transformed data is subsequently hashed to a binary form in a way that maximises entropy of the output, (i.e. to fully utilise the available bit-rate capacity) while maintaining the correctness (i.e. close items hash to the same key in the map). We show that the method outperforms baseline approaches such as locality-sensitive hashing and product quantisation in the limited capacity regime. △ Less

Submitted 11 February, 2019; originally announced February 2019.

arXiv:1901.04747 [pdf, other]

Spectral estimation for detecting low-dimensional structure in networks using arbitrary null models

Authors: Mark D. Humphries, Javier A. Caballero, Mat Evans, Silvia Maggi, Abhinav Singh

Abstract: Discovering low-dimensional structure in real-world networks requires a suitable null model that defines the absence of meaningful structure. Here we introduce a spectral approach for detecting a network's low-dimensional structure, and the nodes that participate in it, using any null model. We use generative models to estimate the expected eigenvalue distribution under a specified null model, and… ▽ More Discovering low-dimensional structure in real-world networks requires a suitable null model that defines the absence of meaningful structure. Here we introduce a spectral approach for detecting a network's low-dimensional structure, and the nodes that participate in it, using any null model. We use generative models to estimate the expected eigenvalue distribution under a specified null model, and then detect where the data network's eigenspectra exceed the estimated bounds. On synthetic networks, this spectral estimation approach cleanly detects transitions between random and community structure, recovers the number and membership of communities, and removes noise nodes. On real networks spectral estimation finds either a significant fraction of noise nodes or no departure from a null model, in stark contrast to traditional community detection methods. Across all analyses, we find the choice of null model can strongly alter conclusions about the presence of network structure. Our spectral estimation approach is therefore a promising basis for detecting low-dimensional structure in real-world networks, or lack thereof. △ Less

Submitted 21 May, 2021; v1 submitted 15 January, 2019; originally announced January 2019.

Comments: 21 pages, 8 figures; supplemental note: 4 pages, 3 figures

arXiv:1811.06888 [pdf, other]

The MalSource Dataset: Quantifying Complexity and Code Reuse in Malware Development

Authors: Alejandro Calleja, Juan Tapiador, Juan Caballero

Abstract: During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyber attacks and has consolidated as a commodity in the underground economy. In this work, we analyze the evolution of malware from 1975 to date from a software engineering perspective. We analyze the source code of 456 samples fro… ▽ More During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyber attacks and has consolidated as a commodity in the underground economy. In this work, we analyze the evolution of malware from 1975 to date from a software engineering perspective. We analyze the source code of 456 samples from 428 unique families and obtain measures of their size, code quality, and estimates of the development costs (effort, time, and number of people). Our results suggest an exponential increment of nearly one order of magnitude per decade in aspects such as size and estimated effort, with code quality metrics similar to those of benign software.We also study the extent to which code reuse is present in our dataset. We detect a significant number of code clones across malware families and report which features and functionalities are more commonly shared. Overall, our results support claims about the increasing complexity of malware and its production progressively becoming an industry. △ Less

Submitted 16 November, 2018; originally announced November 2018.

Comments: To appear in IEEE Transactions on Information Forensics and Security

arXiv:1712.01751 [pdf, other]

Convolutional Recurrent Neural Networks for Dynamic MR Image Reconstruction

Authors: Chen Qin, Jo Schlemper, Jose Caballero, Anthony Price, Joseph V. Hajnal, Daniel Rueckert

Abstract: Accelerating the data acquisition of dynamic magnetic resonance imaging (MRI) leads to a challenging ill-posed inverse problem, which has received great interest from both the signal processing and machine learning community over the last decades. The key ingredient to the problem is how to exploit the temporal correlation of the MR sequence to resolve the aliasing artefact. Traditionally, such ob… ▽ More Accelerating the data acquisition of dynamic magnetic resonance imaging (MRI) leads to a challenging ill-posed inverse problem, which has received great interest from both the signal processing and machine learning community over the last decades. The key ingredient to the problem is how to exploit the temporal correlation of the MR sequence to resolve the aliasing artefact. Traditionally, such observation led to a formulation of a non-convex optimisation problem, which were solved using iterative algorithms. Recently, however, deep learning based-approaches have gained significant popularity due to its ability to solve general inversion problems. In this work, we propose a unique, novel convolutional recurrent neural network (CRNN) architecture which reconstructs high quality cardiac MR images from highly undersampled k-space data by jointly exploiting the dependencies of the temporal sequences as well as the iterative nature of the traditional optimisation algorithms. In particular, the proposed architecture embeds the structure of the traditional iterative algorithms, efficiently modelling the recurrence of the iterative reconstruction stages by using recurrent hidden connections over such iterations. In addition, spatiotemporal dependencies are simultaneously learnt by exploiting bidirectional recurrent hidden connections across time sequences. The proposed algorithm is able to learn both the temporal dependency and the iterative reconstruction process effectively with only a very small number of parameters, while outperforming current MR reconstruction methods in terms of computational complexity, reconstruction accuracy and speed. △ Less

Submitted 14 October, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

Comments: Published in IEEE Transactions on Medical Imaging

arXiv:1711.06045 [pdf, other]

Frame Interpolation with Multi-Scale Deep Loss Functions and Generative Adversarial Networks

Authors: Joost van Amersfoort, Wenzhe Shi, Alejandro Acosta, Francisco Massa, Johannes Totz, Zehan Wang, Jose Caballero

Abstract: Frame interpolation attempts to synthesise frames given one or more consecutive video frames. In recent years, deep learning approaches, and notably convolutional neural networks, have succeeded at tackling low- and high-level computer vision problems including frame interpolation. These techniques often tackle two problems, namely algorithm efficiency and reconstruction quality. In this paper, we… ▽ More Frame interpolation attempts to synthesise frames given one or more consecutive video frames. In recent years, deep learning approaches, and notably convolutional neural networks, have succeeded at tackling low- and high-level computer vision problems including frame interpolation. These techniques often tackle two problems, namely algorithm efficiency and reconstruction quality. In this paper, we present a multi-scale generative adversarial network for frame interpolation (\mbox{FIGAN}). To maximise the efficiency of our network, we propose a novel multi-scale residual estimation module where the predicted flow and synthesised frame are constructed in a coarse-to-fine fashion. To improve the quality of synthesised intermediate video frames, our network is jointly supervised at different levels with a perceptual loss function that consists of an adversarial and two content losses. We evaluate the proposed approach using a collection of 60fps videos from YouTube-8m. Our results improve the state-of-the-art accuracy and provide subjective visual quality comparable to the best performing interpolation method at x47 faster runtime. △ Less

Submitted 26 February, 2019; v1 submitted 16 November, 2017; originally announced November 2017.

arXiv:1710.05202 [pdf, other]

Malware Lineage in the Wild

Authors: Irfan Ul Haq, Sergio Chica, Juan Caballero, Somesh Jha

Abstract: Malware lineage studies the evolutionary relationships among malware and has important applications for malware analysis. A persistent limitation of prior malware lineage approaches is to consider every input sample a separate malware version. This is problematic since a majority of malware are packed and the packing process produces many polymorphic variants (i.e., executables with different file… ▽ More Malware lineage studies the evolutionary relationships among malware and has important applications for malware analysis. A persistent limitation of prior malware lineage approaches is to consider every input sample a separate malware version. This is problematic since a majority of malware are packed and the packing process produces many polymorphic variants (i.e., executables with different file hash) of the same malware version. Thus, many samples correspond to the same malware version and it is challenging to identify distinct malware versions from polymorphic variants. This problem does not manifest in prior malware lineage approaches because they work on synthetic malware, malware that are not packed, or packed malware for which unpackers are available. In this work, we propose a novel malware lineage approach that works on malware samples collected in the wild. Given a set of malware executables from the same family, for which no source code is available and which may be packed, our approach produces a lineage graph where nodes are versions of the family and edges describe the relationships between versions. To enable our malware lineage approach, we propose the first technique to identify the versions of a malware family and a scalable code indexing technique for determining shared functions between any pair of input samples. We have evaluated the accuracy of our approach on 13 open-source programs and have applied it to produce lineage graphs for 10 popular malware families. Our malware lineage graphs achieve on average a 26 times reduction from number of input samples to number of versions. △ Less

Submitted 14 October, 2017; originally announced October 2017.

arXiv:1707.02937 [pdf]

Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize

Authors: Andrew Aitken, Christian Ledig, Lucas Theis, Jose Caballero, Zehan Wang, Wenzhe Shi

Abstract: The most prominent problem associated with the deconvolution layer is the presence of checkerboard artifacts in output images and dense labels. To combat this problem, smoothness constraints, post processing and different architecture designs have been proposed. Odena et al. highlight three sources of checkerboard artifacts: deconvolution overlap, random initialization and loss functions. In this… ▽ More The most prominent problem associated with the deconvolution layer is the presence of checkerboard artifacts in output images and dense labels. To combat this problem, smoothness constraints, post processing and different architecture designs have been proposed. Odena et al. highlight three sources of checkerboard artifacts: deconvolution overlap, random initialization and loss functions. In this note, we proposed an initialization method for sub-pixel convolution known as convolution NN resize. Compared to sub-pixel convolution initialized with schemes designed for standard convolution kernels, it is free from checkerboard artifacts immediately after initialization. Compared to resize convolution, at the same computational complexity, it has more modelling power and converges to solutions with smaller test errors. △ Less

Submitted 10 July, 2017; originally announced July 2017.

arXiv:1705.08302 [pdf, other]

doi 10.1109/TMI.2017.2743464

Anatomically Constrained Neural Networks (ACNN): Application to Cardiac Image Enhancement and Segmentation

Authors: Ozan Oktay, Enzo Ferrante, Konstantinos Kamnitsas, Mattias Heinrich, Wenjia Bai, Jose Caballero, Stuart Cook, Antonio de Marvao, Timothy Dawes, Declan O'Regan, Bernhard Kainz, Ben Glocker, Daniel Rueckert

Abstract: Incorporation of prior knowledge about organ shape and location is key to improve performance of image analysis approaches. In particular, priors can be useful in cases where images are corrupted and contain artefacts due to limitations in image acquisition. The highly constrained nature of anatomical objects can be well captured with learning based techniques. However, in most recent and promisin… ▽ More Incorporation of prior knowledge about organ shape and location is key to improve performance of image analysis approaches. In particular, priors can be useful in cases where images are corrupted and contain artefacts due to limitations in image acquisition. The highly constrained nature of anatomical objects can be well captured with learning based techniques. However, in most recent and promising techniques such as CNN based segmentation it is not obvious how to incorporate such prior knowledge. State-of-the-art methods operate as pixel-wise classifiers where the training objectives do not incorporate the structure and inter-dependencies of the output. To overcome this limitation, we propose a generic training strategy that incorporates anatomical prior knowledge into CNNs through a new regularisation model, which is trained end-to-end. The new framework encourages models to follow the global anatomical properties of the underlying anatomy (e.g. shape, label structure) via learned non-linear representations of the shape. We show that the proposed approach can be easily adapted to different analysis tasks (e.g. image enhancement, segmentation) and improve the prediction accuracy of the state-of-the-art models. The applicability of our approach is shown on multi-modal cardiac datasets and public benchmarks. Additionally, we demonstrate how the learned deep models of 3D shapes can be interpreted and used as biomarkers for classification of cardiac pathologies. △ Less

Submitted 5 December, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

Comments: Published in IEEE Transactions on Medical Imaging (Aug 2017)

arXiv:1704.02422 [pdf, other]

A Deep Cascade of Convolutional Neural Networks for Dynamic MR Image Reconstruction

Authors: Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert

Abstract: Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data is acquired using aggressive Cartesian undersampling. Firstly, we show that… ▽ More Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data is acquired using aggressive Cartesian undersampling. Firstly, we show that when each 2D image frame is reconstructed independently, the proposed method outperforms state-of-the-art 2D compressed sensing approaches such as dictionary learning-based MR image reconstruction, in terms of reconstruction error and reconstruction speed. Secondly, when reconstructing the frames of the sequences jointly, we demonstrate that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches. We show that the proposed method consistently outperforms state-of-the-art methods and is capable of preserving anatomical structure more faithfully up to 11-fold undersampling. Moreover, reconstruction is very fast: each complete dynamic sequence can be reconstructed in less than 10s and, for the 2D case, each image frame can be reconstructed in 23ms, enabling real-time applications. △ Less

Submitted 23 November, 2017; v1 submitted 7 April, 2017; originally announced April 2017.

Comments: To be published in IEEE Transactions on Medical Imaging

arXiv:1703.00555 [pdf, other]

A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction

Authors: Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert

Abstract: The acquisition of Magnetic Resonance Imaging (MRI) is inherently slow. Inspired by recent advances in deep learning, we propose a framework for reconstructing MR images from undersampled data using a deep cascade of convolutional neural networks to accelerate the data acquisition process. We show that for Cartesian undersampling of 2D cardiac MR images, the proposed method outperforms the state-o… ▽ More The acquisition of Magnetic Resonance Imaging (MRI) is inherently slow. Inspired by recent advances in deep learning, we propose a framework for reconstructing MR images from undersampled data using a deep cascade of convolutional neural networks to accelerate the data acquisition process. We show that for Cartesian undersampling of 2D cardiac MR images, the proposed method outperforms the state-of-the-art compressed sensing approaches, such as dictionary learning-based MRI (DLMRI) reconstruction, in terms of reconstruction error, perceptual quality and reconstruction speed for both 3-fold and 6-fold undersampling. Compared to DLMRI, the error produced by the method proposed is approximately twice as small, allowing to preserve anatomical structures more faithfully. Using our method, each image can be reconstructed in 23 ms, which is fast enough to enable real-time applications. △ Less

Submitted 1 March, 2017; originally announced March 2017.

arXiv:1611.05250 [pdf, other]

Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation

Authors: Jose Caballero, Christian Ledig, Andrew Aitken, Alejandro Acosta, Johannes Totz, Zehan Wang, Wenzhe Shi

Abstract: Convolutional neural networks have enabled accurate image super-resolution in real-time. However, recent attempts to benefit from temporal correlations in video super-resolution have been limited to naive or inefficient architectures. In this paper, we introduce spatio-temporal sub-pixel convolution networks that effectively exploit temporal redundancies and improve reconstruction accuracy while m… ▽ More Convolutional neural networks have enabled accurate image super-resolution in real-time. However, recent attempts to benefit from temporal correlations in video super-resolution have been limited to naive or inefficient architectures. In this paper, we introduce spatio-temporal sub-pixel convolution networks that effectively exploit temporal redundancies and improve reconstruction accuracy while maintaining real-time speed. Specifically, we discuss the use of early fusion, slow fusion and 3D convolutions for the joint processing of multiple consecutive video frames. We also propose a novel joint motion compensation and video super-resolution algorithm that is orders of magnitude more efficient than competing methods, relying on a fast multi-resolution spatial transformer module that is end-to-end trainable. These contributions provide both higher accuracy and temporally more consistent videos, which we confirm qualitatively and quantitatively. Relative to single-frame models, spatio-temporal networks can either reduce the computational cost by 30% whilst maintaining the same quality or provide a 0.2dB gain for a similar computational cost. Results on publicly available datasets demonstrate that the proposed algorithms surpass current state-of-the-art performance in both accuracy and efficiency. △ Less

Submitted 10 April, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

Comments: Changes: * Uploaded Vid4 results (footnote 1). * Added references [14, 29] as spatial-transformer prior art. * Fixed typos

arXiv:1610.04490 [pdf, other]

Amortised MAP Inference for Image Super-resolution

Authors: Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, Ferenc Huszár

Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A mor… ▽ More Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders. △ Less

Submitted 21 February, 2017; v1 submitted 14 October, 2016; originally announced October 2016.

arXiv:1609.07009 [pdf]

Is the deconvolution layer the same as a convolutional layer?

Authors: Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszar, Andrew Aitken, Christian Ledig, Zehan Wang

Abstract: In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, What is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much dept… ▽ More In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, What is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much depth and clarity as we would have liked in the space allowance. To better answer these questions in this note, we first discuss the relationships between the deconvolution layer in the forms of the transposed convolution layer, the sub-pixel convolutional layer and our efficient sub-pixel convolutional layer. We will refer to our efficient sub-pixel convolutional layer as a convolutional layer in LR space to distinguish it from the common sub-pixel convolutional layer. We will then show that for a fixed computational budget and complexity, a network with convolutions exclusively in LR space has more representation power at the same speed than a network that first upsamples the input in high resolution space. △ Less

Submitted 22 September, 2016; originally announced September 2016.

Comments: This is a note to share some additional insights for our the CVPR paper

arXiv:1609.05158 [pdf, other]

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

Authors: Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang

Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolut… ▽ More Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods. △ Less

Submitted 23 September, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

Comments: CVPR 2016 paper with updated affiliations and supplemental material, fixed typo in equation 4

arXiv:1609.04802 [pdf, other]

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Authors: Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi

Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. R… ▽ More Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method. △ Less

Submitted 25 May, 2017; v1 submitted 15 September, 2016; originally announced September 2016.

Comments: 19 pages, 15 figures, 2 tables, accepted for oral presentation at CVPR, main paper + some supplementary material

Showing 1–38 of 38 results for author: Caballero, J