Showing 1–2 of 2 results for author: Li, L W

  1. arXiv:2304.09082

    cs.PL cs.CL cs.IR

    Unsupervised clustering of file dialects according to monotonic decompositions of mixtures

    Authors: Michael Robinson, Tate Altman, Denley Lam, Letitia W. Li

    Abstract: This paper proposes an unsupervised classification method that partitions a set of files into non-overlapping dialects based upon their behaviors, determined by messages produced by a collection of programs that consume them. The pattern of messages can be used as the signature of a particular kind of behavior, with the understanding that some messages are likely to co-occur, while others are not.…

    Submitted 9 February, 2023; originally announced April 2023.

    MSC Class: 62P30 (Primary); 06A07 (Secondary) ACM Class: D.3.4

  2. Statistical detection of format dialects using the weighted Dowker complex

    Authors: Michael Robinson, Letitia W. Li, Cory Anderson, Steve Huntsman

    Abstract: This paper provides an experimentally validated, probabilistic model of file behavior when consumed by a set of pre-existing parsers. File behavior is measured by way of a standardized set of Boolean "messages" produced as the files are read. By thresholding the posterior probability that a file exhibiting a particular set of messages is from a particular dialect, our model yields a practical clas…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 15 pages, 11 figures, 5 tables

    MSC Class: 62P30; 55U10 ACM Class: D.3.4