-
CowScreeningDB: A public benchmark dataset for lameness detection in dairy cows
Authors:
Shahid Ismail,
Moises Diaz,
Cristina Carmona-Duarte,
Jose Manuel Vilar,
Miguel A. Ferrer
Abstract:
Lameness is one of the costliest pathological problems affecting dairy animals. It is usually assessed by trained veterinary clinicians who observe features such as gait symmetry or gait parameters such as step counts in real time. With the development of artificial intelligence, various modular systems have been proposed to minimize subjectivity in lameness assessment. However, the major limitation in their development is the lack of a public dataset: existing datasets are either commercial or privately held. To tackle this limitation, we introduce CowScreeningDB, which was created using sensory data. The dataset was sourced from 43 cows at a dairy located in Gran Canaria, Spain. It is a multi-sensor dataset built on data collected using an Apple Watch 6 during the normal daily routine of a dairy cow. The documented collection environment, sampling technique, sensor information, and applications used for data conversion and storage make the dataset transparent. This transparency allows techniques for lameness detection in dairy cows to be developed further and compared objectively. Aside from publicly sharing the dataset, we also share a machine-learning technique that classifies the cows as healthy or lame using the raw sensory data. This validates the major objective, which is to establish the relationship between sensor data and lameness.
Submitted 24 May, 2024;
originally announced May 2024.
-
Dynamics-informed deconvolutional neural networks for super-resolution identification of regime changes in epidemiological time series
Authors:
Jose M. G. Vilar,
Leonor Saiz
Abstract:
Inferring the timing and amplitude of perturbations in epidemiological systems from their stochastically spread low-resolution outcomes is as relevant as it is challenging. It is a requirement for current approaches to overcome the need to know the details of the perturbations to proceed with the analyses. However, the general problem of connecting epidemiological curves with the underlying incidence lacks the highly effective methodology present in other inverse problems, such as super-resolution and dehazing from computer vision. Here, we develop an unsupervised physics-informed convolutional neural network approach in reverse to connect death records with incidence that allows the identification of regime changes at single-day resolution. Applied to COVID-19 data with proper regularization and model-selection criteria, the approach can identify the implementation and removal of lockdowns and other nonpharmaceutical interventions with 0.93-day accuracy over the time span of a year.
Submitted 16 September, 2022;
originally announced September 2022.
-
Speeding Hirschberg Algorithm for Sequence Alignment
Authors:
David Llorens,
Juan Miguel Vilar
Abstract:
The Hirschberg algorithm reduces the space cost of recovering the Longest Common Subsequence to linear. The same technique can be applied to similar problems such as Sequence Alignment. However, the price to pay is a doubling of the time cost. We present here a technique to reduce this time overhead to a negligible amount.
Submitted 27 April, 2022;
originally announced April 2022.
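For context, the classic Hirschberg scheme that this abstract refers to can be sketched as follows. This is a minimal illustration of the textbook algorithm for the Longest Common Subsequence, not the speed-up proposed by the authors; the function names `lcs_last_row` and `hirschberg_lcs` are assumptions chosen for this example. The extra time cost comes from re-scanning the halves of the input at every level of the recursion.

```python
def lcs_last_row(a, b):
    """Last row of the LCS-length table for a vs. b, keeping only two rows in memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev


def hirschberg_lcs(a, b):
    """Recover one LCS of strings a and b using linear working space."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # Forward pass: LCS lengths of the first half of a against every prefix of b.
    left = lcs_last_row(a[:mid], b)
    # Backward pass: LCS lengths of the second half of a against every suffix of b.
    right = lcs_last_row(a[mid:][::-1], b[::-1])
    n = len(b)
    # Split b where the two partial scores add up to the maximum.
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])
```

Calling `hirschberg_lcs("AGGTAB", "GXTXAYB")` returns a common subsequence of length 4 without ever storing the full dynamic-programming table.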
-
Winning the Big Data Technologies Horizon Prize: Fast and reliable forecasting of electricity grid traffic by identification of recurrent fluctuations
Authors:
Jose M. G. Vilar
Abstract:
This paper provides a description of the approach and methodology I used in winning the European Union Big Data Technologies Horizon Prize on data-driven prediction of electricity grid traffic. The methodology relies on identifying typical short-term recurrent fluctuations, which is subsequently refined through a regression-of-fluctuations approach. The key points and strategic considerations that led to selecting or discarding different methodological aspects are also discussed. The criteria include adaptability to changing conditions, reliability with outliers and missing data, robustness to noise, and efficiency in implementation.
Submitted 12 February, 2019;
originally announced February 2019.
-
Accurate prediction of gene expression by integration of DNA sequence statistics with detailed modeling of transcription regulation
Authors:
Jose M. G. Vilar
Abstract:
Gene regulation involves a hierarchy of events that extend from specific protein-DNA interactions to the combinatorial assembly of nucleoprotein complexes. The effects of DNA sequence on these processes have typically been studied based either on its quantitative connection with single-domain binding free energies or on empirical rules that combine different DNA motifs to predict gene expression trends on a genomic scale. The middle-point approach that quantitatively bridges these two extremes, however, remains largely unexplored. Here, we provide an integrated approach to accurately predict gene expression from statistical sequence information in combination with detailed biophysical modeling of transcription regulation by multidomain binding on multiple DNA sites. For the regulation of the prototypical lac operon, this approach predicts within 0.3-fold accuracy transcriptional activity over a 10,000-fold range from DNA sequence statistics for different intracellular conditions.
Submitted 16 December, 2010;
originally announced December 2010.
-
CplexA: a Mathematica package to study macromolecular-assembly control of gene expression
Authors:
J. M. G. Vilar,
L. Saiz
Abstract:
Summary: Macromolecular assembly vertebrates essential cellular processes, such as gene regulation and signal transduction. A major challenge for conventional computational methods to study these processes is tackling the exponential increase of the number of configurational states with the number of components. CplexA is a Mathematica package that uses functional programming to efficiently compute probabilities and average properties over such an exponentially large number of states from the energetics of the interactions. The package is particularly suited to studying gene expression at complex promoters controlled by multiple local and distal DNA binding sites for transcription factors. Availability: CplexA is freely available together with documentation at http://sourceforge.net/projects/cplexa/.
Submitted 6 January, 2013; v1 submitted 4 November, 2010;
originally announced November 2010.