Showing 1–30 of 30 results for author: Sima, J

  1. arXiv:2406.11075  [pdf, other]

    cs.DS cs.IT math.CO

    Perturbation-Resilient Trades for Dynamic Service Balancing

    Authors: Jin Sima, Chao Pan, Olgica Milenkovic

    Abstract: A combinatorial trade is a pair of sets of blocks of elements that can be exchanged while preserving relevant subset intersection constraints. The class of balanced and swap-robust minimal trades was proposed in [1] for exchanging blocks of data chunks stored on distributed storage systems in an access- and load-balanced manner. More precisely, data chunks in the trades of interest are labeled by…

    Submitted 21 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2303.12996

  2. arXiv:2402.08751  [pdf, other]

    cs.CC cs.DM cs.LG cs.NE

    Nearest Neighbor Representations of Neural Circuits

    Authors: Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

    Abstract: Neural networks successfully capture the computational power of the human brain for many tasks. Similarly inspired by the brain architecture, Nearest Neighbor (NN) representations is a novel approach of computation. We establish a firmer correspondence between NN representations and neural networks. Although it was known how to represent a single neuron using NN representations, there were no resu…

    Submitted 9 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: This paper is accepted to ISIT 2024. 2nd version has revisions for better clarity, more citations, and more explanation in the proofs. No results are changed

  3. arXiv:2402.08748  [pdf, ps, other]

    cs.CC cs.DM cs.LG cs.NE

    Nearest Neighbor Representations of Neurons

    Authors: Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

    Abstract: The Nearest Neighbor (NN) Representation is an emerging computational model that is inspired by the brain. We study the complexity of representing a neuron (threshold function) using the NN representations. It is known that two anchors (the points to which NN is computed) are sufficient for a NN representation of a threshold function, however, the resolution (the maximum number of bits required fo…

    Submitted 9 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: This paper is accepted to ISIT 2024. 2nd version had revisions for better clarity, fixing of typos. No results are changed

  4. arXiv:2402.00315  [pdf, ps, other]

    cs.LG cs.CR cs.DS cs.IT

    Online Distribution Learning with Local Private Constraints

    Authors: Jin Sima, Changlong Wu, Olgica Milenkovic, Wojciech Szpankowski

    Abstract: We study the problem of online conditional distribution estimation with \emph{unbounded} label sets under local differential privacy. Let $\mathcal{F}$ be a distribution-valued function class with unbounded label set. We aim at estimating an \emph{unknown} function $f\in \mathcal{F}$ in an online fashion so that at time $t$ when the context $\boldsymbol{x}_t$ is provided we can generate an estimat…

    Submitted 31 January, 2024; originally announced February 2024.

  5. arXiv:2401.15520  [pdf, ps, other]

    cs.LG stat.ML

    Oracle-Efficient Hybrid Online Learning with Unknown Distribution

    Authors: Changlong Wu, Jin Sima, Wojciech Szpankowski

    Abstract: We study the problem of oracle-efficient hybrid online learning when the features are generated by an unknown i.i.d. process and the labels are generated adversarially. Assuming access to an (offline) ERM oracle, we show that there exists a computationally efficient online predictor that achieves a regret upper bounded by $\tilde{O}(T^{\frac{3}{4}})$ for a finite-VC class, and upper bounded by…

    Submitted 27 January, 2024; originally announced January 2024.

  6. arXiv:2310.03897  [pdf, other]

    cs.IT

    Break-Resilient Codes for Forensic 3D Fingerprinting

    Authors: Canran Wang, Jin Sima, Netanel Raviv

    Abstract: 3D printing brings about a revolution in consumption and distribution of goods, but poses a significant risk to public safety. Any individual with internet access and a commodity printer can now produce untraceable firearms, keys, and dangerous counterfeit products. To aid government authorities in combating these new security threats, objects are often tagged with identifying information. This in…

    Submitted 5 October, 2023; originally announced October 2023.

  7. arXiv:2310.01729  [pdf, other]

    cs.IT

    Error Correction for DNA Storage

    Authors: Jin Sima, Netanel Raviv, Moshe Schwartz, Jehoshua Bruck

    Abstract: DNA-based storage is an emerging storage technology that provides high information density and long duration. Due to the physical constraints in the reading and writing processes, error correction in DNA storage poses several interesting coding theoretic challenges, some of which are new. In this paper, we give a brief introduction to some of the coding challenges for DNA-based storage, including…

    Submitted 2 October, 2023; originally announced October 2023.

  8. arXiv:2308.07793  [pdf, ps, other]

    cs.IT

    Robust Indexing for the Sliced Channel: Almost Optimal Codes for Substitutions and Deletions

    Authors: Jin Sima, Netanel Raviv, Jehoshua Bruck

    Abstract: Encoding data as a set of unordered strings is receiving great attention as it captures one of the basic features of DNA storage systems. However, the challenge of constructing optimal redundancy codes for this channel remained elusive. In this paper, we address this problem and present an order-wise optimal construction of codes that are capable of correcting multiple substitution, deletion, and…

    Submitted 15 August, 2023; originally announced August 2023.

  9. arXiv:2308.06895  [pdf, other]

    cs.LG cs.CR cs.DC

    Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

    Authors: Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic

    Abstract: Hierarchical and tree-like data sets arise in many applications, including language processing, graph data mining, phylogeny and genomics. It is known that tree-like data cannot be embedded into Euclidean spaces of finite dimension with small distortion. This problem can be mitigated through the use of hyperbolic spaces. When such data also has to be processed in a distributed and privatized setti…

    Submitted 16 January, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

    Comments: Published in the Transactions on Machine Learning Research (TMLR). Link: https://openreview.net/forum?id=umggDfMHha

  10. arXiv:2305.05808  [pdf, ps, other]

    cs.CC cs.DM cs.IT cs.LG cs.NE

    On the Information Capacity of Nearest Neighbor Representations

    Authors: Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

    Abstract: The $\textit{von Neumann Computer Architecture}$ has a distinction between computation and memory. In contrast, the brain has an integrated architecture where computation and memory are indistinguishable. Motivated by the architecture of the brain, we propose a model of $\textit{associative computation}$ where memory is defined by a set of vectors in $\mathbb{R}^n$ (that we call…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: The conference version is submitted to and accepted by ISIT 2023

  11. arXiv:2304.01365  [pdf, ps, other]

    cs.IT

    Finding a Burst of Positives via Nonadaptive Semiquantitative Group Testing

    Authors: Yun-Han Li, Ryan Gabrys, Jin Sima, Ilan Shomorony, Olgica Milenkovic

    Abstract: Motivated by testing for pathogenic diseases we consider a new nonadaptive group testing problem for which: (1) positives occur within a burst, capturing the fact that infected test subjects often come in clusters, and (2) that the test outcomes arise from semiquantitative measurements that provide coarse information about the number of positives in any tested group. Our model generalizes prior wo…

    Submitted 3 April, 2023; originally announced April 2023.

  12. arXiv:2303.12996  [pdf, other]

    cs.IT cs.DM

    Perturbation-Resilient Sets for Dynamic Service Balancing

    Authors: Jin Sima, Chao Pan, Olgica Milenkovic

    Abstract: Balanced and swap-robust minimal trades, introduced in [1], are important for studying the balance and stability of server access request protocols under data popularity changes. Constructions of such trades have so far relied on paired sets obtained through iterative combining of smaller sets that have provable stability guarantees, coupled with exhaustive computer search. Currently, there exists…

    Submitted 22 March, 2023; originally announced March 2023.

  13. arXiv:2303.12990  [pdf, ps, other]

    cs.IT cs.DM

    On Constant-Weight Binary $B_2$-Sequences

    Authors: Jin Sima, Yun-Han Li, Ilan Shomorony, Olgica Milenkovic

    Abstract: Motivated by applications in polymer-based data storage we introduced the new problem of characterizing the code rate and designing constant-weight binary $B_2$-sequences. Binary $B_2$-sequences are collections of binary strings of length $n$ with the property that the real-valued sums of all distinct pairs of strings are distinct. In addition to this defining property, constant-weight binary…

    Submitted 22 March, 2023; originally announced March 2023.

  14. arXiv:2210.16424  [pdf, other]

    cs.LG cs.CR

    Machine Unlearning of Federated Clusters

    Authors: Chao Pan, Jin Sima, Saurav Prakash, Vishal Rana, Olgica Milenkovic

    Abstract: Federated clustering (FC) is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for FC methods has become of significant importance. We introduce, for the first time, the problem of machine unlearning fo…

    Submitted 30 June, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: 27 pages. ICLR 2023

  15. arXiv:2210.11818  [pdf, ps, other]

    cs.IT

    Non-binary Codes for Correcting a Burst of at Most t Deletions

    Authors: Shuche Wang, Yuanyuan Tang, Jin Sima, Ryan Gabrys, Farzad Farnoud

    Abstract: The problem of correcting deletions has received significant attention, partly because of the prevalence of these errors in DNA data storage. In this paper, we study the problem of correcting a consecutive burst of at most $t$ deletions in non-binary sequences. We first propose a non-binary code correcting a burst of at most 2 deletions for $q$-ary alphabets. Afterwards, we extend this result to t…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: 20 pages. The paper has been submitted to IEEE Transactions on Information Theory. Furthermore, the paper was presented in part at the ISIT2021 and Allerton2022

  16. arXiv:2207.08372  [pdf, ps, other]

    cs.IT

    Correcting $k$ Deletions and Insertions in Racetrack Memory

    Authors: Jin Sima, Jehoshua Bruck

    Abstract: One of the main challenges in developing racetrack memory systems is the limited precision in controlling the track shifts, that in turn affects the reliability of reading and writing the data. A current proposal for combating deletions in racetrack memories is to use redundant heads per-track resulting in multiple copies (potentially erroneous) and recovering the data by solving a specialized ver…

    Submitted 18 July, 2022; originally announced July 2022.

  17. arXiv:2205.08032  [pdf, ps, other]

    cs.CC cs.DM cs.IT cs.LG cs.NE

    On Algebraic Constructions of Neural Networks with Small Weights

    Authors: Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

    Abstract: Neural gates compute functions based on weighted sums of the input variables. The expressive power of neural gates (number of distinct functions it can compute) depends on the weight sizes and, in general, large weights (exponential in the number of inputs) are required. Studying the trade-offs among the weight sizes, circuit size and depth is a well-studied topic both in circuit complexity theory…

    Submitted 16 May, 2022; originally announced May 2022.

  18. arXiv:2102.10416  [pdf, ps, other]

    cs.FL

    Simplest Non-Regular Deterministic Context-Free Language

    Authors: Petr Jancar, Jiri Sima

    Abstract: We introduce a new notion of C-simple problems for a class C of decision problems (i.e. languages), w.r.t. a particular reduction. A problem is C-simple if it can be reduced to each problem in C. This can be viewed as a conceptual counterpart to C-hard problems to which all problems in C reduce. Our concrete example is the class of non-regular deterministic context-free languages (DCFL'), with a t…

    Submitted 20 February, 2021; originally announced February 2021.

  19. arXiv:2102.05372  [pdf, ps, other]

    cs.IT math.CO math.PR

    Trace Reconstruction with Bounded Edit Distance

    Authors: Jin Sima, Jehoshua Bruck

    Abstract: The trace reconstruction problem studies the number of noisy samples needed to recover an unknown string $\boldsymbol{x}\in\{0,1\}^n$ with high probability, where the samples are independently obtained by passing $\boldsymbol{x}$ through a random deletion channel with deletion probability $q$. The problem is receiving significant attention recently due to its applications in DNA sequencing and DNA…

    Submitted 14 April, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

  20. arXiv:2102.01633  [pdf, other]

    cs.NE cs.FL

    Stronger Separation of Analog Neuron Hierarchy by Deterministic Context-Free Languages

    Authors: Jiří Šíma

    Abstract: We analyze the computational power of discrete-time recurrent neural networks (NNs) with the saturated-linear activation function within the Chomsky hierarchy. This model restricted to integer weights coincides with binary-state NNs with the Heaviside activation function, which are equivalent to finite automata (Chomsky level 3) recognizing regular languages (REG), while rational weights make this…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 30 pages, 4 figures

  21. A polynomial-time construction of a hitting set for read-once branching programs of width 3

    Authors: Jiří Šíma, Stanislav Žák

    Abstract: Recently, an interest in constructing pseudorandom or hitting set generators for restricted branching programs has increased, which is motivated by the fundamental issue of derandomizing space-bounded computations. Such constructions have been known only in the case of width 2 and in very restricted cases of bounded width. In this paper, we characterize the hitting sets for read-once branching pro…

    Submitted 7 February, 2022; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: 48 pages, 10 figures

    Journal ref: Fundamenta Informaticae, Volume 184, Issue 4 (March 10, 2022) fi:7043

  22. arXiv:1910.12247  [pdf, ps, other]

    cs.IT

    Optimal $k$-Deletion Correcting Codes

    Authors: Jin Sima, Jehoshua Bruck

    Abstract: Levenshtein introduced the problem of constructing $k$-deletion correcting codes in 1966, proved that the optimal redundancy of those codes is $O(k\log N)$, and proposed an optimal redundancy single-deletion correcting code (using the so-called VT construction). However, the problem of constructing optimal redundancy $k$-deletion correcting codes remained open. Our key contribution is a solution t…

    Submitted 27 October, 2019; originally announced October 2019.

  23. arXiv:1809.02716  [pdf, ps, other]

    cs.IT

    On Coding over Sliced Information

    Authors: Jin Sima, Netanel Raviv, Jehoshua Bruck

    Abstract: The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal up to constants. The surpris…

    Submitted 27 October, 2019; v1 submitted 7 September, 2018; originally announced September 2018.

  24. arXiv:1806.09240  [pdf, ps, other]

    cs.IT

    Two Deletion Correcting Codes from Indicator Vectors

    Authors: Jin Sima, Netanel Raviv, Jehoshua Bruck

    Abstract: Construction of capacity achieving deletion correcting codes has been a baffling challenge for decades. A recent breakthrough by Brakensiek $et~al$., alongside novel applications in DNA storage, have reignited the interest in this longstanding open problem. In spite of recent advances, the amount of redundancy in existing codes is still orders of magnitude away from being optimal. In this paper, a…

    Submitted 24 June, 2018; originally announced June 2018.

  25. arXiv:1804.00702  [pdf, other]

    cs.DC

    ROLP: Runtime Object Lifetime Profiling for Big Data Memory Management

    Authors: Rodrigo Bruno, Duarte Patrício, José Simão, Luís Veiga, Paulo Ferreira

    Abstract: Low latency services such as credit-card fraud detection and website targeted advertisement rely on Big Data platforms (e.g., Lucene, Graphchi, Cassandra) which run on top of memory managed runtimes, such as the JVM. These platforms, however, suffer from unpredictable and unacceptably high pause times due to inadequate memory management decisions (e.g., allocating objects with very different lifet…

    Submitted 9 March, 2018; originally announced April 2018.

  26. arXiv:1704.03324  [pdf, other]

    cs.PL cs.DC

    Gang-GC: Locality-aware Parallel Data Placement Optimizations for Key-Value Storages

    Authors: Duarte Patrício, José Simão, Luís Veiga

    Abstract: Many cloud applications rely on fast and non-relational storage to aid in the processing of large amounts of data. Managed runtimes are now widely used to support the execution of several storage solutions of the NoSQL movement, particularly when dealing with big data key-value store-driven applications. The benefits of these runtimes can however be limited by modern parallel throughput-oriented G…

    Submitted 11 April, 2017; originally announced April 2017.

    Report number: INESC-ID Tec. Rep. 5/2017, Feb 2017

  27. arXiv:1701.03507  [pdf, other]

    cs.DC cs.SE

    Beyond NGS data sharing and towards open science

    Authors: Bruno Dantas, Calmenelias Fleitas, Alexandre P. Francisco, José Simão, Cátia Vaz

    Abstract: Biosciences have been revolutionized by next generation sequencing (NGS) technologies in last years, leading to new perspectives in medical, industrial and environmental applications. And although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce either because data is not available or tools are not readily avail…

    Submitted 11 November, 2016; originally announced January 2017.

    Comments: 19 pages, 10 figures

  28. arXiv:1601.05917  [pdf, other]

    cs.IT

    Polar Codes for Broadcast Channels with Receiver Message Side Information and Noncausal State Available at the Encoder

    Authors: Jin Sima, Wei Chen

    Abstract: In this paper polar codes are proposed for two receiver broadcast channels with receiver message side information (BCSI) and noncausal state available at the encoder, referred to as BCSI with noncausal state for short, where the two receivers know a priori the private messages intended for each other. This channel generalizes BCSI with common message and Gelfand-Pinsker problem and has application…

    Submitted 22 January, 2016; originally announced January 2016.

    Comments: 22 pages, 7 figures

  29. arXiv:1407.8409  [pdf, other]

    cs.IT

    Joint Network and Gelfand-Pinsker Coding for 3-Receiver Gaussian Broadcast Channels with Receiver Message Side Information

    Authors: Jin Sima, Wei Chen

    Abstract: The problem of characterizing the capacity region for Gaussian broadcast channels with receiver message side information appears difficult and remains open for N >= 3 receivers. This paper proposes a joint network and Gelfand-Pinsker coding method for 3-receiver cases. Using the method, we establish a unified inner bound on the capacity region of 3-receiver Gaussian broadcast channels under genera…

    Submitted 19 August, 2014; v1 submitted 31 July, 2014; originally announced July 2014.

    Comments: Author's final version (presented at the 2014 IEEE International Symposium on Information Theory [ISIT 2014])

  30. arXiv:cs/0506100  [pdf, ps, other]

    cs.CC

    On the NP-Completeness of Some Graph Cluster Measures

    Authors: Jiri Sima, Satu Elisa Schaeffer

    Abstract: Graph clustering is the problem of identifying sparsely connected dense subgraphs (clusters) in a given graph. Proposed clustering algorithms usually optimize various fitness functions that measure the quality of a cluster within the graph. Examples of such cluster measures include the conductance, the local and relative densities, and single cluster editing. We prove that the decision problems…

    Submitted 29 June, 2005; originally announced June 2005.

    Comments: 9 pages, no figures