Publications

Topics:
  1. J. Hermanns, A. Tsitsulin, M. Munkhoeva, A. M. Bronstein, D. Mottin, P. Karras, GRASP: Graph Alignment through Spectral Signatures, Proc. Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, 2022 details

    GRASP: Graph Alignment through Spectral Signatures

    J. Hermanns, A. Tsitsulin, M. Munkhoeva, A. M. Bronstein, D. Mottin, P. Karras
    Proc. Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, 2022

    What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare poorly in the pure form of the problem, in which only graph structures are given. Other proposals translate the problem to one of aligning node embeddings, yet, by doing so, provide only a single-scale view of the graph. In this paper, we transfer the shape-analysis concept of functional maps from the continuous to the discrete case, and treat the graph alignment problem as a special case of the problem of finding a mapping between functions on graphs. We present GRASP, a method that first establishes a correspondence between functions derived from Laplacian matrix eigenvectors, which capture multiscale structural characteristics, and then exploits this correspondence to align nodes. Our experimental study, featuring noise levels higher than anything used in previous studies, shows that GRASP outperforms state-of-the-art methods for graph alignment across noise levels and graph types.

    P. Kang, Z. Lin, Z. Yang, A. M. Bronstein, Q. Li, W. Liu, Deep fused two-step cross-modal hashing with multiple semantic supervision, Multimedia Tools and Applications, 2022 details

    Deep fused two-step cross-modal hashing with multiple semantic supervision

    P. Kang, Z. Lin, Z. Yang, A. M. Bronstein, Q. Li, W. Liu
    Multimedia Tools and Applications, 2022

    Existing cross-modal hashing methods ignore the informative multimodal joint information and cannot fully exploit the semantic labels. In this paper, we propose a deep fused two-step cross-modal hashing (DFTH) framework with multiple semantic supervision. In the first step, DFTH learns unified hash codes for instances by a fusion network. Semantic label and similarity reconstruction have been introduced to acquire binary codes that are informative, discriminative and semantic similarity preserving. In the second step, two modality-specific hash networks are learned under the supervision of common hash codes reconstruction, label reconstruction, and intra-modal and inter-modal semantic similarity reconstruction. The modality-specific hash networks can generate semantic preserving binary codes for out-of-sample queries. To deal with the vanishing gradients of binarization, continuous differentiable tanh is introduced to approximate the discrete sign function, making the networks able to back-propagate by automatic gradient computation. Extensive experiments on MIRFlickr25K and NUS-WIDE show the superiority of DFTH over state-of-the-art methods.

    P. Kang, Z. Lin, Z. Yang, X. Fang, A. M. Bronstein, Q. Li, W. Liu, Intra-class low-rank regularization for supervised and semi-supervised cross-modal retrieval, Applied Intelligence, 52(1), pp. 33-54, 2022 details

    Intra-class low-rank regularization for supervised and semi-supervised cross-modal retrieval

    P. Kang, Z. Lin, Z. Yang, X. Fang, A. M. Bronstein, Q. Li, W. Liu
    Applied Intelligence, 52(1), pp. 33-54, 2022

    Cross-modal retrieval aims to retrieve related items across different modalities, for example, using an image query to retrieve related text. The existing deep methods ignore both the intra-modal and inter-modal intra-class low-rank structures when fusing various modalities, which decreases the retrieval performance. In this paper, two deep models (denoted as ILCMR and Semi-ILCMR) based on intra-class low-rank regularization are proposed for supervised and semi-supervised cross-modal retrieval, respectively. Specifically, ILCMR integrates the image network and text network into a unified framework to learn a common feature space by imposing three regularization terms to fuse the cross-modal data. First, to align them in the label space, we utilize semantic consistency regularization to convert the data representations to probability distributions over the classes. Second, we introduce an intra-modal low-rank regularization, which encourages the intra-class samples that originate from the same space to be more relevant in the common feature space. Third, an inter-modal low-rank regularization is applied to reduce the cross-modal discrepancy. To enable the low-rank regularization to be optimized using automatic gradients during network back-propagation, we propose the rank-r approximation and specify the explicit gradients for theoretical completeness. In addition to the three regularization terms that rely on label information incorporated by ILCMR, we propose Semi-ILCMR in the semi-supervised regime, which introduces a low-rank constraint before projecting the general representations into the common feature space. Extensive experiments on four public cross-modal datasets demonstrate the superiority of ILCMR and Semi-ILCMR over other state-of-the-art methods.

    Y. Nemcovsky, M. Jacoby, A. M. Bronstein, C. Baskin, Physical passive patch adversarial attacks on visual odometry systems, Proc. ACCV, 2022 details

    Physical passive patch adversarial attacks on visual odometry systems

    Y. Nemcovsky, M. Jacoby, A. M. Bronstein, C. Baskin
    Proc. ACCV, 2022

    Deep neural networks are known to be susceptible to adversarial perturbations — small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model’s output on a set of inputs. Universal perturbations present a more realistic case of adversarial attacks, as awareness of the model’s exact input is not required. In addition, the universal attack setting raises the subject of generalization to unseen data, where given a set of inputs, the universal perturbations aim to alter the model’s output on out-of-sample data. In this work, we study physical passive patch adversarial attacks on visual odometry-based autonomous navigation systems. A visual odometry system aims to infer the relative camera motion between two corresponding viewpoints, and is frequently used by vision-based autonomous navigation systems to estimate their state. For such navigation systems, a patch adversarial perturbation poses a severe security issue, as it can be used to mislead a system onto some collision course. To the best of our knowledge, we show for the first time that the error margin of a visual odometry model can be significantly increased by deploying patch adversarial attacks in the scene. We provide evaluation on synthetic closed-loop drone navigation data and demonstrate that a comparable vulnerability exists in real data.

    L. Ackerman-Schraier, A. A. Rosenberg, A. Marx, A. M. Bronstein, Machine learning approaches demonstrate that protein structures carry information about their genetic coding, Nature Scientific Reports, 2022 details

    Machine learning approaches demonstrate that protein structures carry information about their genetic coding

    L. Ackerman-Schraier, A. A. Rosenberg, A. Marx, A. M. Bronstein
    Nature Scientific Reports, 2022

    Synonymous codons translate into the same amino acid. Although the identity of synonymous codons is often considered
    inconsequential to the final protein structure there is mounting evidence for an association between the two. Our study
    examined this association using regression and classification models, finding that codon sequences predict protein backbone dihedral angles with a lower error than amino acid sequences, and that models trained with true dihedral angles have better classification of synonymous codons given structural information than models trained with random dihedral angles. Using this classification approach, we investigated local codon-codon dependencies and tested whether synonymous codon identity can be predicted more accurately from codon context than amino acid context alone, and most specifically which codon context position carries the most predictive power.

    A. A. Rosenberg, N. Yehishalom, A. Marx, A. M. Bronstein, Defining amino acid pairs as structural units suggests mutation sensitivity to adjacent residues, biorXiv/2022/513383, 2022 details

    Defining amino acid pairs as structural units suggests mutation sensitivity to adjacent residues

    A. A. Rosenberg, N. Yehishalom, A. Marx, A. M. Bronstein
    biorXiv/2022/513383, 2022

    Proteins fold from chains of amino acids, forming secondary structures, α-helices and β-strands, that, at least for globular proteins, subsequently fold into a three-dimensional structure. A large-scale analysis of high-resolution protein structures suggests that amino acid pairs constitute another layer of ordered structure, more local than these conventionally defined secondary structures. We develop a cross-peptide-bond Ramachandran plot that captures the 15 conformational preferences of the amino acid pairs and show that the effect of a particular mutation on the stability of a protein depends in a predictable manner on the adjacent amino acid context.

    A. Rosenberg, A. Marx, A. M. Bronstein, Codon-specific Ramachandran plots show amino acid backbone conformation depends on identity of the translated codon, Nature Communications, 2022 details

    Codon-specific Ramachandran plots show amino acid backbone conformation depends on identity of the translated codon

    A. Rosenberg, A. Marx, A. M. Bronstein
    Nature Communications, 2022

    Synonymous codons translate into chemically identical amino acids. Once considered inconsequential to the formation of the protein product, there is now significant evidence to suggest that codon usage affects co-translational protein folding and the final structure of the expressed protein. Here we develop a method for computing and comparing codon-specific Ramachandran plots and demonstrate that the backbone dihedral angle distributions of some synonymous codons are distinguishable with statistical significance for some secondary structures. This shows that there exists a dependence between codon identity and backbone torsion of the translated amino acid. Although these findings cannot pinpoint the causal direction of this dependence, we discuss the vast biological implications should coding be shown to directly shape protein conformation and demonstrate the usefulness of this method as a tool for probing associations between codon usage and protein structure. Finally, we urge for the inclusion of exact genetic information into structural databases.

    E. Rozenberg, A. Karnieli, O. Yesharim, J. Foley-Comer, S. Trajtenberg-Mills, D. Freedman, A. M. Bronstein, A. Arie, Inverse design of spontaneous parametric downconversion for generation of high-dimensional qudits, Optica 9, 602-615, 2022 details

    Inverse design of spontaneous parametric downconversion for generation of high-dimensional qudits

    E. Rozenberg, A. Karnieli, O. Yesharim, J. Foley-Comer, S. Trajtenberg-Mills, D. Freedman, A. M. Bronstein, A. Arie
    Optica 9, 602-615, 2022

    Spontaneous parametric down-conversion in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process by a fully differentiable algorithm that takes into account all interaction effects, progress has been limited. Here, we overcome these limitations and introduce a physically-constrained and differentiable model, validated against experimental results for shaped pump beams and structured crystals, capable of learning every interaction parameter in the process. We avoid any restrictions induced by the stochastic nature of our physical model and integrate the dynamic equations governing the evolution under the SPDC Hamiltonian. We solve the inverse problem of designing a nonlinear quantum optical system that achieves the desired quantum state of down-converted photon pairs. The desired states are defined using either the second-order correlations between different spatial modes or by specifying the required density matrix. By learning nonlinear volume holograms as well as different pump shapes, we successfully show how to generate maximally entangled states. Furthermore, we simulate all-optical coherent control over the generated quantum state by actively changing the profile of the pump beam. Our work can be useful for applications such as novel designs of high-dimensional quantum key distribution and quantum information processing protocols. In addition, our method can be readily applied for controlling other degrees of freedom of light in the SPDC process, such as the spectral and temporal properties, and may even be used in condensed-matter systems having a similar interaction Hamiltonian.

    N. Talati, H. Ye, S. Vedula, K.-Y. Chen, Y. Chen, D. Liu, Y. Yuan, D. Blaauw, A. M. Bronstein, T. Mudge, R. Dreslinski, Mint: An Accelerator For Mining Temporal Motifs, Proc. MICRO, 2022 details

    Mint: An Accelerator For Mining Temporal Motifs

    N. Talati, H. Ye, S. Vedula, K.-Y. Chen, Y. Chen, D. Liu, Y. Yuan, D. Blaauw, A. M. Bronstein, T. Mudge, R. Dreslinski
    Proc. MICRO, 2022

    A variety of complex systems, including social and communication networks, financial markets, biology, and neuroscience are modeled using temporal graphs that contain a set of nodes and directed timestamped edges. Temporal motifs in temporal graphs are generalized from subgraph patterns in static graphs in that they also account for edge ordering and time duration, in addition to the graph structure. Mining temporal motifs is a fundamental problem used in several application domains. However, existing software frameworks offer suboptimal performance due to high algorithmic complexity and irregular memory accesses of temporal motif mining. This paper presents Mint—a novel accelerator architecture and a programming model for mining temporal motifs efficiently. We first divide this workload into three fundamental tasks: search, book-keeping, and backtracking. Based on this, we propose a task–centric programming model that enables decoupled, asynchronous execution. This model unlocks massive opportunities for parallelism, and allows storing task context information on-chip. To best utilize the proposed programming model, we design a domain-specific hardware accelerator using its data path and memory subsystem design to cater to the unique workload characteristics of temporal motif mining. To further improve performance, we propose a novel optimization called search index memoization that significantly reduces memory traffic. We comprehensively compare the performance of Mint with state-of-the-art temporal motif mining software frameworks (both approximate and exact) running on both CPU and GPU, and show 9×–2576× benefit in performance.

    E. Zheltonozhskii, C. Baskin, A. Mendelson, A. M. Bronstein, O. Litany, Contrast to divide: Self-supervised pre-training for learning with noisy labels, Proc. of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022 details

    Contrast to divide: Self-supervised pre-training for learning with noisy labels

    E. Zheltonozhskii, C. Baskin, A. Mendelson, A. M. Bronstein, O. Litany
    Proc. of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

    The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a” warm-up obstacle”: the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose” Contrast to Divide”(C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches by drastically reducing the warm-up stage’s susceptibility to noise level, shortening its duration, and improving extracted feature quality. C2D works out of the box with existing methods and demonstrates markedly improved performance, especially in the high noise regime, where we get a boost of more than 27% for CIFAR-100 with 90% noise over the previous state of the art. In real-life noise settings, C2D trained on mini-WebVision outperforms previous works both in WebVision and ImageNet validation sets by 3% top-1 accuracy. We perform an in-depth analysis of the framework, including investigating the performance of different pre-training approaches and estimating the effective upper bound of the LNL performance with semi-supervised learning.

    N. Diamant, N. Shandor, A. M. Bronstein, Delta-GAN-Encoder: Encoding semantic changes for explicit image editing, using few synthetic samples, arXiv:2111.08419, 2022 details

    Delta-GAN-Encoder: Encoding semantic changes for explicit image editing, using few synthetic samples

    N. Diamant, N. Shandor, A. M. Bronstein
    arXiv:2111.08419, 2022

    Understating and controlling generative models’ latent space is a complex task. In this paper, we propose a novel method for learning to control any desired attribute in a pre-trained GAN’s latent space, for the purpose of editing synthesized and real-world data samples accordingly. We perform Sim2Real learning, relying on minimal samples to achieve an unlimited amount of continuous precise edits. We present an Autoencoder-based model that learns to encode the semantics of changes between images as a basis for editing new samples later on, achieving precise desired results – example shown in Fig. 1. While previous editing methods rely on a known structure of latent spaces (e.g., linearity of some semantics in StyleGAN), our method inherently does not require any structural constraints. We demonstrate our method in the domain of facial imagery: editing different expressions, poses, and lighting attributes, achieving state-of-the-art results.

    T. Blau, R. Ganz, B. Kawar, A. M. Bronstein, M. Elad , Threat model-agnostic adversarial defense using diffusion models, arXiv preprint arXiv:2207.08089, 2022 details

    Threat model-agnostic adversarial defense using diffusion models

    T. Blau, R. Ganz, B. Kawar, A. M. Bronstein, M. Elad
    arXiv preprint arXiv:2207.08089, 2022

    Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) — training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach utilizes a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that leads to robust classifiers across various realizations of threat models. To this end, we harness the recent advances in stochastic generative modeling, and means to leverage these for sampling from conditional distributions. Our defense relies on an addition of Gaussian i.i.d noise to the attacked image, followed by a pretrained diffusion process — an architecture that performs a stochastic iterative process over a denoising network, yielding a high perceptual quality denoised outcome. The obtained robustness with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 dataset, showing that our method outperforms the leading defense methods under various threat models.

    D. E. Fordham, D. Rosentraub, A. L. Polsky, T. Aviram, Y. Wolf, O. Perl, A. Devir, S. Rosentraub, D. H. Silver, Y. Gold Zamir, A. M. Bronstein, M. Lara Lara, J. Ben Nagi, A. Alvarez, S. Munné, Embryologist agreement when assessing blastocyst implantation probability: is data-driven prediction the solution to embryo assessment subjectivity?, Human Reproduction, Volume 37, Issue 10, Pages 2275–2290, 2022 details

    Embryologist agreement when assessing blastocyst implantation probability: is data-driven prediction the solution to embryo assessment subjectivity?

    D. E. Fordham, D. Rosentraub, A. L. Polsky, T. Aviram, Y. Wolf, O. Perl, A. Devir, S. Rosentraub, D. H. Silver, Y. Gold Zamir, A. M. Bronstein, M. Lara Lara, J. Ben Nagi, A. Alvarez, S. Munné
    Human Reproduction, Volume 37, Issue 10, Pages 2275–2290, 2022

    STUDY QUESTION
    What is the accuracy and agreement of embryologists when assessing the implantation probability of blastocysts using time-lapse imaging (TLI), and can it be improved with a data-driven algorithm?

    SUMMARY ANSWER
    The overall interobserver agreement of a large panel of embryologists was moderate and prediction accuracy was modest, while the purpose-built artificial intelligence model generally resulted in higher performance metrics.

    WHAT IS KNOWN ALREADY
    Previous studies have demonstrated significant interobserver variability amongst embryologists when assessing embryo quality. However, data concerning embryologists’ ability to predict implantation probability using TLI is still lacking. Emerging technologies based on data-driven tools have shown great promise for improving embryo selection and predicting clinical outcomes.

    STUDY DESIGN, SIZE, DURATION
    TLI video files of 136 embryos with known implantation data were retrospectively collected from two clinical sites between 2018 and 2019 for the performance assessment of 36 embryologists and comparison with a deep neural network (DNN).

    PARTICIPANTS/MATERIALS, SETTING, METHODS
    We recruited 39 embryologists from 13 different countries. All participants were blinded to clinical outcomes. A total of 136 TLI videos of embryos that reached the blastocyst stage were used for this experiment. Each embryo’s likelihood of successfully implanting was assessed by 36 embryologists, providing implantation probability grades (IPGs) from 1 to 5, where 1 indicates a very low likelihood of implantation and 5 indicates a very high likelihood. Subsequently, three embryologists with over 5 years of experience provided Gardner scores. All 136 blastocysts were categorized into three quality groups based on their Gardner scores. Embryologist predictions were then converted into predictions of implantation (IPG ≥ 3) and no implantation (IPG ≤ 2). Embryologists’ performance and agreement were assessed using Fleiss kappa coefficient. A 10-fold cross-validation DNN was developed to provide IPGs for TLI video files. The model’s performance was compared to that of the embryologists.

    MAIN RESULTS AND THE ROLE OF CHANCE
    Logistic regression was employed for the following confounding variables: country of residence, academic level, embryo scoring system, log years of experience and experience using TLI. None were found to have a statistically significant impact on embryologist performance at α = 0.05. The average implantation prediction accuracy for the embryologists was 51.9% for all embryos (N = 136). The average accuracy of the embryologists when assessing top quality and poor quality embryos (according to the Gardner score categorizations) was 57.5% and 57.4%, respectively, and 44.6% for fair quality embryos. Overall interobserver agreement was moderate (κ = 0.56, N = 136). The best agreement was achieved in the poor + top quality group (κ = 0.65, N = 77), while the agreement in the fair quality group was lower (κ = 0.25, N = 59). The DNN showed an overall accuracy rate of 62.5%, with accuracies of 62.2%, 61% and 65.6% for the poor, fair and top quality groups, respectively. The AUC for the DNN was higher than that of the embryologists overall (0.70 DNN vs 0.61 embryologists) as well as in all of the Gardner groups (DNN vs embryologists—Poor: 0.69 vs 0.62; Fair: 0.67 vs 0.53; Top: 0.77 vs 0.54).

    LIMITATIONS, REASONS FOR CAUTION
    Blastocyst assessment was performed using video files acquired from time-lapse incubators, where each video contained data from a single focal plane. Clinical data regarding the underlying cause of infertility and endometrial thickness before the transfer was not available, yet may explain implantation failure and lower accuracy of IPGs. Implantation was defined as the presence of a gestational sac, whereas the detection of fetal heartbeat is a more robust marker of embryo viability. The raw data were anonymized to the extent that it was not possible to quantify the number of unique patients and cycles included in the study, potentially masking the effect of bias from a limited patient pool. Furthermore, the lack of demographic data makes it difficult to draw conclusions on how representative the dataset was of the wider population. Finally, embryologists were required to assess the implantation potential, not embryo quality. Although this is not the traditional approach to embryo evaluation, morphology/morphokinetics as a means of assessing embryo quality is believed to be strongly correlated with viability and, for some methods, implantation potential.

    WIDER IMPLICATIONS OF THE FINDINGS
    Embryo selection is a key element in IVF success and continues to be a challenge. Improving the predictive ability could assist in optimizing implantation success rates and other clinical outcomes and could minimize the financial and emotional burden on the patient. This study demonstrates moderate agreement rates between embryologists, likely due to the subjective nature of embryo assessment. In particular, we found that average embryologist accuracy and agreement were significantly lower for fair quality embryos when compared with that for top and poor quality embryos. Using data-driven algorithms as an assistive tool may help IVF professionals increase success rates and promote much needed standardization in the IVF clinic. Our results indicate a need for further research regarding technological advancement in this field.

    E. Amrani, A. M. Bronstein, Self-supervised classification network, Proc. ECCV, 2022 details

    Self-supervised classification network

    E. Amrani, A. M. Bronstein
    Proc. ECCV, 2022

    We present Self-Classifier — a novel self-supervised end-to-end classification neural network. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., solutions where all labels are assigned to the same class), a uniform prior is asserted on the labels. We show mathematically that unlike the regular cross-entropy loss, our approach avoids such solutions. Self-Classifier is simple to implement and is scalable to practically unlimited amounts of data. Unlike other unsupervised classification approaches, it does not require any form of pre-training or the use of expectation maximization algorithms, pseudo-labelling or external clustering. Unlike other contrastive learning representation learning approaches, it does not require a memory bank or a second network. Despite its relative simplicity, our approach achieves comparable results to state-of-the-art performance with ImageNet, CIFAR10 and CIFAR100 for its two objectives: unsupervised classification and unsupervised representation learning. Furthermore, it is the first unsupervised end-to-end classification network to perform well on the large-scale ImageNet dataset. Code will be made available.