RESEARCH
Networks learn networks
Equivariant Architectures for Learning in Deep Weight Spaces
Designing machine learning architectures for processing neural networks in their raw weight matrix form is a newly introduced research direction with a wide range of intriguing applications. Unfortunately, the unique symmetry structure of deep weight spaces makes this design very challenging. We present a novel network architecture for learning in deep weight spaces, which is equivariant to the natural permutation symmetry of the MLPs. We demonstrate the effectiveness of our architecture and its advantages over natural baselines in various learning tasks.
Learning with limited data
Guided Deep Kernel Learning
Combining Gaussian processes (GPs) with the expressive power of deep neural networks is commonly done through deep kernel learning (DKL). Unfortunately, the kernel optimization process often causes these models to lose their Bayesian benefits. In this study, we present a novel approach for learning deep kernels by utilizing infinite-width neural networks. We propose using the Neural Network Gaussian Process (NNGP) model as a guide for the DKL model during optimization. Our approach harnesses the reliable uncertainty estimation of NNGPs to adapt the DKL target confidence when it encounters novel data points. As a result, we get the best of both worlds: we leverage the Bayesian behavior of the NNGP, namely its robustness to overfitting and accurate uncertainty estimation, while maintaining the generalization abilities, scalability, and flexibility of deep kernels.
Code here: https://github.com/IdanAchituve/GDKL
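As a rough illustration of the deep-kernel construction that GDKL builds on, the sketch below applies an RBF kernel to the outputs of a small feature network and trains it with the standard GP marginal likelihood. The class and function names are illustrative (not from the GDKL repository), and the NNGP guidance itself, which requires an infinite-width kernel, is not shown.

```python
import torch
import torch.nn as nn

class DeepRBFKernel(nn.Module):
    """Deep kernel: an RBF kernel on learned features, k(x, x') = exp(-||f(x) - f(x')||^2 / (2 l^2))."""
    def __init__(self, in_dim, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.log_lengthscale = nn.Parameter(torch.zeros(()))

    def forward(self, x1, x2):
        f1, f2 = self.net(x1), self.net(x2)                  # embed both sets of inputs
        d2 = torch.cdist(f1, f2).pow(2)                      # squared distances in feature space
        return torch.exp(-0.5 * d2 / self.log_lengthscale.exp() ** 2)

def gp_nll(kernel, x, y, noise=0.1):
    """Negative log marginal likelihood of GP regression under the deep kernel (noise fixed for brevity)."""
    K = kernel(x, x) + noise * torch.eye(len(x))
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
    return 0.5 * (y.unsqueeze(-1) * alpha).sum() + torch.log(torch.diagonal(L)).sum()
```

Minimizing `gp_nll` alone is the plain DKL objective that tends to overfit; GDKL additionally steers this optimization with the predictive uncertainty of an NNGP prior.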
Federated Learning
Personalized Federated Learning with Gaussian Processes
Federated learning aims to learn a global model that performs well on client devices with limited cross-client communication. Personalized federated learning (PFL) further extends this setup to handle data heterogeneity between clients by learning personalized models. A key challenge in this setting is to learn effectively across clients even though each client has unique data that is often limited in size. We developed pFedGP, a solution to PFL that is based on Gaussian processes (GPs) with deep kernel learning. We propose learning a shared kernel function across all clients, parameterized by a neural network, with a personal GP classifier for each client. Extensive experiments on standard PFL benchmarks with CIFAR-10, CIFAR-100, and CINIC-10, and on a new setup of learning under input noise, show that pFedGP achieves well-calibrated predictions while significantly outperforming baseline methods, with accuracy gains of up to 21%.
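A minimal sketch of this structure (illustrative only, not the pFedGP implementation): one feature network parameterizes the kernel and is shared and aggregated across clients, while each client fits its own GP classifier on the shared feature space. Here sklearn's GP classifier stands in for the Pólya-Gamma-based GP used in the paper.

```python
import torch
import torch.nn as nn
from sklearn.gaussian_process import GaussianProcessClassifier

# Shared across all clients: a small feature network that parameterizes the kernel.
# In federated training, only this network's parameters are communicated and aggregated;
# the personal GP classifiers below never leave their clients.
shared_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))

def fit_personal_gp(x, y):
    """Fit a client's personal GP classifier on top of the shared feature space."""
    with torch.no_grad():
        feats = shared_net(x).numpy()
    return GaussianProcessClassifier().fit(feats, y.numpy())

# Example: two clients with different local data, one personal GP each.
clients = [(torch.randn(40, 32), torch.randint(0, 3, (40,))),
           (torch.randn(25, 32), torch.randint(0, 3, (25,)))]
personal_models = [fit_personal_gp(x, y) for x, y in clients]
```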
Incremental Learning
GP-Tree: A Gaussian Process Classifier for Few-Shot Incremental Learning
Gaussian processes (GPs) are non-parametric, flexible models that work well in many tasks. Combining GPs with deep learning methods via deep kernel learning is especially compelling due to the strong expressive power induced by the network. However, inference in GPs, whether with or without deep kernel learning, can be computationally challenging on large datasets. Here, we propose GP-Tree, a novel method for multi-class classification with Gaussian processes and deep kernel learning. We develop a tree-based hierarchical model in which each internal node of the tree fits a GP to the data using the Polya-Gamma augmentation scheme. As a result, our method scales well with both the number of classes and the data size. We demonstrate our method's effectiveness against other Gaussian process training baselines, and we show how our general GP approach is easily applied to incremental few-shot learning, where it reaches state-of-the-art performance.
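A toy version of the tree construction could look like the sketch below (hypothetical code; GP-Tree itself uses Pólya-Gamma augmented GPs on deep-kernel features at each node, and a more careful class split). Classes are split recursively into two groups, a binary GP classifier is fit at every internal node, and a test point's class probability is the product of branch probabilities along the root-to-leaf path, so inference cost grows with the tree depth rather than with the number of classes.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

class Node:
    def __init__(self, classes):
        self.classes, self.left, self.right, self.clf = classes, None, None, None

def build_tree(X, y, classes):
    """Recursively split the class set in half and fit a binary GP classifier at each internal node."""
    node = Node(classes)
    if len(classes) == 1:
        return node
    left, right = classes[: len(classes) // 2], classes[len(classes) // 2:]
    mask = np.isin(y, classes)                            # only samples of this node's classes
    node.clf = GaussianProcessClassifier().fit(X[mask], np.isin(y[mask], right).astype(int))
    node.left, node.right = build_tree(X, y, left), build_tree(X, y, right)
    return node

def predict_proba(node, x, p=1.0):
    """Class probability = product of branch probabilities along the root-to-leaf path."""
    if len(node.classes) == 1:
        return {node.classes[0]: p}
    p_right = node.clf.predict_proba(x.reshape(1, -1))[0, 1]
    out = predict_proba(node.left, x, p * (1.0 - p_right))
    out.update(predict_proba(node.right, x, p * p_right))
    return out
```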
Multi-Task learning
Auxiliary Learning by Implicit Differentiation (ICLR 2021)
Training neural networks with auxiliary tasks is a common practice for improving the performance of the main task. Two main challenges arise in this multi-task learning setting: (i) designing useful auxiliary tasks; and (ii) combining auxiliary tasks into a single coherent loss. Here, we propose a novel framework, AuxiLearn, that targets both challenges using implicit differentiation. When useful auxiliaries are known, we propose learning a network that non-linearly combines all losses into a single coherent objective function. When no useful auxiliary task is known, we learn a network that generates a meaningful, novel auxiliary task.
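A rough sketch of the first variant (names are illustrative, not from the AuxiLearn code): a small network with non-negative weights maps the vector of per-task losses to a single training objective, which keeps the combination monotone in every loss. Its own parameters would then be updated on main-task validation loss via implicit differentiation, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LossCombiner(nn.Module):
    """Maps a vector of task losses to a single scalar objective.
    Softplus on the weights keeps the combination monotone in each loss."""
    def __init__(self, n_tasks, hidden=8):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(n_tasks, hidden) * 0.1)
        self.w2 = nn.Parameter(torch.randn(hidden, 1) * 0.1)

    def forward(self, losses):                              # losses: (n_tasks,)
        h = torch.relu(losses @ F.softplus(self.w1))
        return (h @ F.softplus(self.w2)).squeeze()

# Inner step: the main model is trained on the combined loss.
combiner = LossCombiner(n_tasks=3)
task_losses = torch.stack([torch.tensor(0.7), torch.tensor(1.2), torch.tensor(0.3)])
total = combiner(task_losses)   # in practice: total.backward() w.r.t. the main model's parameters
# Outer step (not shown): update the combiner's parameters by differentiating the
# main-task validation loss through the inner optimization (implicit differentiation).
```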
Multi-source domain adaptation
Teacher-Student Consistency For Multi-Source Domain Adaptation
In Multi-Source Domain Adaptation (MSDA), models are trained on samples from multiple source domains and used for inference on a different, target domain. Mainstream domain adaptation approaches learn a joint representation of the source and target domains. Unfortunately, a joint representation may emphasize features that are useful for the source domains but hurt inference on the target (negative transfer), or remove essential information about the target domain (knowledge fading).
We propose Multi-source Student-Teacher (MUST), a novel procedure designed to alleviate these issues. The key idea has two steps: First, we train a teacher network on source labels and infer pseudo labels on the target. Then, we train a student network using the pseudo labels and regularize the teacher to fit the student predictions. This regularization helps the teacher's predictions on the target data remain consistent between epochs. Evaluations of MUST on three MSDA benchmarks (digits, text sentiment analysis, and visual-object recognition) show that MUST outperforms the current SoTA, sometimes by a very large margin. We further analyze the solutions and the dynamics of the optimization, showing that the learned models follow the target distribution density, implicitly using the information in the unlabeled target data.
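One MUST-style round can be sketched as follows (a schematic, not the paper's exact losses or schedule): the teacher fits the labeled source batch, produces pseudo labels for the target batch, the student trains on those pseudo labels, and the teacher is then regularized toward the student's target predictions.

```python
import torch
import torch.nn.functional as F

def must_round(teacher, student, opt_t, opt_s, xs, ys, xt):
    """One schematic round of the teacher-student procedure.
    xs, ys: labeled source batch; xt: unlabeled target batch."""
    # 1) Teacher learns from the source labels.
    opt_t.zero_grad()
    F.cross_entropy(teacher(xs), ys).backward()
    opt_t.step()

    # 2) Teacher infers pseudo labels on the target.
    with torch.no_grad():
        pseudo = teacher(xt).argmax(dim=1)

    # 3) Student trains on the pseudo-labeled target.
    opt_s.zero_grad()
    F.cross_entropy(student(xt), pseudo).backward()
    opt_s.step()

    # 4) Consistency: pull the teacher's target predictions toward the student's.
    opt_t.zero_grad()
    with torch.no_grad():
        student_probs = F.softmax(student(xt), dim=1)
    F.kl_div(F.log_softmax(teacher(xt), dim=1), student_probs, reduction="batchmean").backward()
    opt_t.step()
```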
Self-supervised point clouds
Self-supervised learning for domain adaptation on point clouds (WACV 2021)
Self-supervised learning (SSL) is a technique for learning useful representations from unlabeled data. It has been applied effectively to domain adaptation (DA) on images and videos. It is still unknown if and how it can be leveraged for domain adaptation in 3D perception problems. Here we describe the first study of SSL for DA on point clouds. We introduce a new family of pretext tasks, Deformation Reconstruction, inspired by the deformations encountered in sim-to-real transformations. In addition, we propose a novel training procedure for labeled point cloud data, motivated by MixUp, called Point Cloud Mixup (PCM). Evaluations on domain adaptation datasets for classification and segmentation demonstrate a large improvement over existing and baseline methods.
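One plausible instantiation of a point-cloud MixUp (the exact PCM recipe in the paper may differ) is to sample a mixing ratio, take proportional random subsets of points from two clouds, and mix the one-hot labels with the same ratio:

```python
import numpy as np

def point_cloud_mixup(pc1, y1, pc2, y2, num_classes, alpha=1.0):
    """pc1, pc2: point clouds of shape (N, 3); y1, y2: integer class labels.
    Take lam*N points from the first cloud and (1-lam)*N from the second,
    and mix the one-hot labels with the same ratio."""
    n = pc1.shape[0]
    lam = np.random.beta(alpha, alpha)
    k = int(round(lam * n))
    idx1 = np.random.choice(pc1.shape[0], k, replace=False)
    idx2 = np.random.choice(pc2.shape[0], n - k, replace=False)
    mixed_pc = np.concatenate([pc1[idx1], pc2[idx2]], axis=0)
    mixed_y = lam * np.eye(num_classes)[y1] + (1 - lam) * np.eye(num_classes)[y2]
    return mixed_pc, mixed_y

# Example: mix two random 1024-point clouds from classes 0 and 3 (10 classes total).
pc, label = point_cloud_mixup(np.random.randn(1024, 3), 0, np.random.randn(1024, 3), 3, num_classes=10)
```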
Long-tail learning
Long-tail learning with attributes (ECCV 2020)
Real-world data is predominantly unbalanced and long-tailed, but deep models struggle to recognize rare classes in the presence of frequent classes. Often, classes are accompanied by side information like textual descriptions, but it is not fully clear how to use it for learning with unbalanced long-tail data. We describe DRAGON, a late-fusion architecture for long-tail learning with class descriptors. It learns to (1) correct the bias towards head classes on a sample-by-sample basis; and (2) fuse information from class descriptions to improve tail-class accuracy. DRAGON outperforms state-of-the-art models on a new benchmark and also sets a new SoTA on existing benchmarks for GFSL with class descriptors (GFSL-d) and standard (vision-only) long-tail learning.
Incremental Learning with Limited Access
Learning New Classes Without Forgetting the Original Ones (EMNLP 2019)
We address the problem of adding new classes to an existing classifier without hurting the original classes, when no access is allowed to any sample from the original classes. This problem arises frequently, since models are often shared without their training data due to privacy and data-ownership concerns. We propose an easy-to-use approach that modifies the original classifier by retraining a suitable subset of layers using a linearly-tuned knowledge-distillation regularization. The set of layers that is tuned depends on the number of newly added classes and the number of original classes.
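A hedged sketch of the core loss (illustrative, not the exact formulation in the paper): cross-entropy on the new-class samples plus a distillation term that keeps the tuned model's logits on the original classes close to those of the frozen original classifier, since no original-class samples are available.

```python
import torch
import torch.nn.functional as F

def expansion_loss(new_model, old_model, x_new, y_new, n_old, T=2.0, lam=1.0):
    """x_new, y_new: samples of the newly added classes (no original-class data is available).
    The first n_old output units of new_model correspond to the original classes."""
    logits = new_model(x_new)
    ce = F.cross_entropy(logits, y_new)                    # learn the new classes
    with torch.no_grad():
        old_logits = old_model(x_new)                      # frozen original classifier
    kd = F.kl_div(F.log_softmax(logits[:, :n_old] / T, dim=1),
                  F.softmax(old_logits / T, dim=1),
                  reduction="batchmean") * T * T           # preserve behaviour on the original classes
    return ce + lam * kd
```

Here `lam` and the temperature `T` are simply fixed hyperparameters; the choice of which layers to retrain, discussed above, is orthogonal to this loss.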
Cooperative Image Captioning
Joint Optimization of Networks for Image Captioning (ICCV 2019)
In the image captioning task, descriptions can be made more informative if they are tuned using a downstream task. The challenge is the discrete nature of natural language, which makes the optimization hard. To address this challenge, we developed a new, effective optimization method. Our method takes advantage of the cooperative game between the two networks by transmitting more information to the downstream task.
Zero-Shot learning
Probabilistic AND-OR attribute grouping for Zero-shot learning (2018)
In zero-shot learning (ZSL), classifiers are trained to recognize visual classes without any image samples. Instead, the classifier is given semantic information about the class, like a textual description or a set of attributes. We describe a probabilistic model, trained end-to-end, designed to capture natural soft AND-OR relations across groups of attributes.
Discriminative captions
Describe images in natural language, taking context into account
We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of "siamese cat" and "tiger cat", we generate language that describes the "siamese cat" in a way that distinguishes it from "tiger cat".
Metric Learning (2016)
Learning Sparse Metrics, One Feature at a Time
Learning distance metrics from data amounts to optimization over the cone of positive definite (PD) matrices. This optimization is difficult because restricting the search to remain within the PD cone, or repeatedly projecting onto the cone, is prohibitively costly. We describe COMET, a block-coordinate descent procedure that efficiently keeps the search within the PD cone, avoiding both costly projections and unnecessary computation of full gradients.
Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large-scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video...
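The coordinate-descent idea can be sketched as follows (hypothetical code; COMET itself derives the admissible step in closed form rather than using the Cholesky-based backtracking shown here): update one symmetric pair of entries of the metric at a time, shrinking the step until the matrix remains positive definite.

```python
import numpy as np

def is_pd(M):
    """Positive-definiteness check via a Cholesky factorization attempt."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

def coordinate_step(W, grad, i, j, step=0.1):
    """Update only entries (i, j) and (j, i), backtracking so W stays inside the PD cone."""
    while step > 1e-8:
        W_new = W.copy()
        W_new[i, j] -= step * grad[i, j]
        W_new[j, i] = W_new[i, j]                    # keep the metric symmetric
        if is_pd(W_new):
            return W_new
        step /= 2.0                                  # shrink the step until PD is preserved
    return W

# Example: one sweep of coordinate updates over a 5x5 metric.
d = 5
W = np.eye(d)
grad = np.random.randn(d, d)
grad = (grad + grad.T) / 2
for i in range(d):
    for j in range(i, d):
        W = coordinate_step(W, grad, i, j)
```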
LORETA
Low rank similarity learning
When learning models that are represented in matrix form, enforcing a low-rank constraint can dramatically improve the memory and run-time complexity while providing a natural regularization of the model. Naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time-consuming (repeated singular value decompositions of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We describe LORETA, an iterative online learning procedure consisting of a gradient step followed by a second-order retraction back to the manifold that can be computed efficiently. LORETA also showed consistent improvement over standard methods in a large multi-label image classification task.
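The general pattern is a gradient step in the ambient space followed by a retraction back onto the rank-k manifold. The sketch below is only for intuition: it retracts with a truncated SVD, which is exactly the expensive projection that LORETA's efficient second-order retraction on the low-rank factors avoids.

```python
import numpy as np

def retract_rank_k(M, k):
    """Project onto the nearest rank-k matrix (truncated SVD). LORETA replaces this step
    with a cheap retraction computed directly on the low-rank factors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def manifold_step(W, grad, k, lr=0.1):
    """Gradient step in the ambient space, then retract back to the manifold."""
    return retract_rank_k(W - lr * grad, k)

# Example: keep a 100x100 similarity matrix at rank 5 across updates.
W = np.zeros((100, 100))
grad = np.random.randn(100, 100)
W = manifold_step(W, grad, k=5)
```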
Generative AI with long tail data
Seed selection for text to image generation
Text-to-image diffusion models can synthesize a large variety of concepts in new compositions and scenarios. However, they still struggle with generating uncommon concepts, rare or unusual combinations, or structured concepts like hand palms. Here we characterize the effect of unbalanced training data on text-to-image models and offer a remedy. We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space, a technique that we call SeedSelect. We evaluate the benefit of SeedSelect for (1) few-shot semantic data augmentation, where we generate semantically correct images for few-shot and long-tail benchmarks, and (2) correcting images of hands, a well-known pitfall of current diffusion models.
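A highly simplified version of the idea (illustrative only: `generate` and `concept_score` are hypothetical stand-ins for a diffusion sampler and a semantic scorer, and SeedSelect itself optimizes seeds rather than merely filtering them) is to rank candidate noise seeds by how faithfully the generated image matches the rare concept:

```python
def select_seeds(prompt, generate, concept_score, n_candidates=64, n_keep=4):
    """Rank candidate noise seeds by a semantic faithfulness score and keep the best ones.

    generate(prompt, seed) -> image          (hypothetical diffusion sampler; the seed
                                              determines the sampler's initial noise)
    concept_score(image, prompt) -> float    (hypothetical scorer, e.g. image-text similarity)
    """
    scored = sorted(((concept_score(generate(prompt, s), prompt), s) for s in range(n_candidates)),
                    reverse=True)
    return [seed for _, seed in scored[:n_keep]]
```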
Multitask learning
Auxiliary Learning as an Asymmetric Bargaining Game
Auxiliary learning is an effective method for enhancing the generalization capabilities of trained models, particularly when dealing with small datasets. However, this approach presents two difficulties: (i) optimizing multiple objectives can be more challenging than optimizing a single one, and (ii) it is unclear how to balance the auxiliary tasks so that they best assist the main task.
In this work, we propose a novel approach, named AuxiNash, for balancing tasks in auxiliary learning by formalizing the problem as a generalized bargaining game with asymmetric task bargaining power. Furthermore, we describe an efficient procedure for learning the bargaining power of tasks based on their contribution to the performance of the main task, and derive theoretical guarantees for its convergence.
Federated Learning
Personalized Federated Learning using Hypernetworks
Personalized federated learning is tasked with training machine learning models for multiple clients, each with its own data distribution. The goal is to collaboratively train personalized models while accounting for the data disparity across clients and reducing communication costs.
We propose a novel approach to handle this problem using hypernetworks, termed pFedHN for personalized Federated HyperNetworks. In this approach, a central hypernetwork model is trained to generate a set of models, one for each client. This architecture provides effective parameter sharing across clients while maintaining the capacity to generate unique and diverse personal models. Furthermore, since hypernetwork parameters are never transmitted, this approach decouples the communication cost from the trainable model size. We test pFedHN empirically on several personalized federated learning challenges and find that it outperforms previous methods. Finally, we show that pFedHN generalizes better to new clients whose distributions differ from those of any client observed during training.
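A minimal sketch of the hypernetwork idea (illustrative names, not the pFedHN code): the server keeps a learnable embedding per client and a hypernetwork that maps that embedding to the weights of a small client model; gradients from each client's loss flow back into the shared hypernetwork, which is itself never transmitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClientHyperNet(nn.Module):
    """Maps a learned client embedding to the weights of a small linear classifier."""
    def __init__(self, n_clients, embed_dim, in_dim, n_classes):
        super().__init__()
        self.embeddings = nn.Embedding(n_clients, embed_dim)
        self.in_dim, self.n_classes = in_dim, n_classes
        out = in_dim * n_classes + n_classes                 # weight and bias of the target model
        self.mlp = nn.Sequential(nn.Linear(embed_dim, 100), nn.ReLU(), nn.Linear(100, out))

    def forward(self, client_id):
        theta = self.mlp(self.embeddings(client_id))
        W = theta[: self.in_dim * self.n_classes].view(self.n_classes, self.in_dim)
        b = theta[self.in_dim * self.n_classes:]
        return W, b

hyper = ClientHyperNet(n_clients=10, embed_dim=8, in_dim=32, n_classes=5)
x, y = torch.randn(16, 32), torch.randint(0, 5, (16,))
W, b = hyper(torch.tensor(3))                                # personalized weights for client 3
loss = F.cross_entropy(F.linear(x, W, b), y)
loss.backward()                                              # gradients flow into the shared hypernetwork
```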
Long-Tail Learning
Distributional Robustness Loss for Long-tail Learning
Real-world data is often unbalanced and long-tailed, but deep models struggle to recognize rare classes in the presence of frequent classes. To address unbalanced data, most studies try balancing the data, the loss, or the classifier to reduce classification bias towards head classes. Far less attention has been given to the latent representations learned with unbalanced data. We show that the feature extractor part of deep networks suffers greatly from this bias. We propose a new loss based on robustness theory, which encourages the model to learn high-quality representations for both head and tail classes. While the general form of the robustness loss may be hard to compute, we further derive an easy-to-compute upper bound that can be minimized efficiently. This procedure reduces representation bias towards head classes in the feature space and achieves new SOTA results on CIFAR100-LT, ImageNet-LT, and iNaturalist long-tail benchmarks. We find that training with robustness increases recognition accuracy of tail classes while largely maintaining the accuracy of head classes. The new robustness loss can be combined with various classifier balancing techniques and can be applied to representations at several layers of the deep model.
Compositional learning
A causal view of compositional zero-shot recognition (NeurIPS 2020)
People easily recognize new visual categories that are new combinations of known components. This compositional generalization capacity is critical for learning in real-world domains because the long tail of new combinations dominates the distribution. Unfortunately, learning systems struggle with compositional generalization because they often build on features that are correlated with class labels even if they are not “essential” for the class.
We describe an approach to compositional generalization that builds on causal ideas. We formulate compositional zero-shot learning from a causal perspective and propose a causally-inspired embedding model that learns disentangled representations of the elementary components of visual objects from correlated (confounded) training data.
Multi-Objective Optimization
Learning the Pareto Front with Hypernetworks (ICLR 2021)
Multi-objective optimization problems are prevalent in machine learning. These problems have a set of optimal solutions, called the Pareto front, where each point on the front represents a different trade-off between possibly conflicting objectives. Recent optimization algorithms can target a specific desired ray in loss space, but still face two serious limitations: (i) a separate model has to be trained for each point on the front; and (ii) the exact trade-off must be known before optimization. Here, we tackle the problem of learning the entire Pareto front, with the capability of selecting a desired operating point on the front after training. We call this new setup Pareto-Front Learning (PFL). We describe an approach to PFL implemented using HyperNetworks, which we term Pareto HyperNetworks (PHNs). A PHN learns the entire Pareto front simultaneously using a single hypernetwork, which receives a desired preference vector as input and returns a Pareto-optimal model whose loss vector lies on the desired ray. The unified model is runtime efficient compared to training multiple models and generalizes to new operating points not used during training. PHNs learn the entire Pareto front in roughly the same time as learning a single point on the front, and also reach a better solution set.
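A compact sketch of a PFL training step (schematic: `hypernet` and `task_losses` are placeholders, and simple linear scalarization is used here, whereas the paper also supports preference-aware objectives): sample a preference vector from the simplex, condition the hypernetwork on it, and minimize the correspondingly weighted loss.

```python
import torch
from torch.distributions import Dirichlet

def pfl_training_step(hypernet, optimizer, batch, task_losses, n_objectives=2):
    """One schematic PFL step. hypernet(ray) is assumed to return the parameters of a target
    model, and task_losses(params, batch) the per-objective loss vector for that model."""
    ray = Dirichlet(torch.ones(n_objectives)).sample()   # random preference vector on the simplex
    params = hypernet(ray)                               # preference-conditioned model weights
    losses = task_losses(params, batch)                  # shape: (n_objectives,)
    loss = torch.dot(ray, losses)                        # linear scalarization along the sampled ray
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, any desired trade-off is one forward pass away, e.g.:
# params = hypernet(torch.tensor([0.8, 0.2]))
```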
Reasoning in Video
Learning Object Permanence from Video (ECCV 2020)
Object Permanence (OP) allows people to reason about the location of objects even when they are not perceived directly. It is critical for building a model of the world, since objects in natural visual scenes dynamically occlude and contain each other. Here we introduce the setup of learning Object Permanence from labeled videos. We dissect the problem into four components, where objects are (1) visible, (2) occluded, (3) contained by another object, and (4) carried by a containing object. We then present a unified deep architecture that learns to predict object location under these four scenarios. We evaluate the architecture on a new dataset based on CATER and find that it outperforms previous localization methods and various baselines.
Zero-shot learning with attributes
A probabilistic approach to combine information from vision and attributes (CVPR 2020)
Generalized zero-shot learning (GZSL) is the problem of learning a classifier where some classes have samples and others are learned from side information, like semantic attributes or text description, in a zero-shot learning fashion (ZSL). Training a single model that operates in these two regimes simultaneously is challenging.
We developed a probabilistic approach that combines three modular components: a "gating" model that makes a soft decision about whether a sample comes from a "seen" class, and two experts: a ZSL expert and an expert model for the seen classes. We address two main difficulties in this approach: how to provide an accurate estimate of the gating probability without any training samples from unseen classes, and how to use an expert's predictions when it observes samples outside of its domain.
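The modular combination itself is just a soft mixture (a schematic, not the exact model): the gate's probability that a sample belongs to a seen class weights the seen-class expert, and its complement weights the ZSL expert.

```python
import numpy as np

def combine_experts(p_seen_gate, p_seen_expert, p_zsl_expert):
    """p_seen_gate:   (n,) gate probability that each sample comes from a seen class
    p_seen_expert: (n, n_seen)   expert distribution over the seen classes
    p_zsl_expert:  (n, n_unseen) ZSL expert distribution over the unseen classes
    Returns a distribution over all (seen + unseen) classes."""
    seen = p_seen_gate[:, None] * p_seen_expert
    unseen = (1.0 - p_seen_gate)[:, None] * p_zsl_expert
    return np.concatenate([seen, unseen], axis=1)

# Example: 2 samples, 3 seen classes, 2 unseen classes.
probs = combine_experts(np.array([0.9, 0.2]),
                        np.array([[0.6, 0.3, 0.1], [0.5, 0.25, 0.25]]),
                        np.array([[0.7, 0.3], [0.1, 0.9]]))
```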
IOTA - Informative Object Annotations
Information Theory Metric for Selecting Relevant Image Descriptions (CVPR 2020)
Capturing the interesting components of an image is a key aspect of image understanding. While people intuitively manage to focus on what is “informative” or “relevant”, automated classifiers can produce a large number of labels that are perhaps technically correct but often uninteresting. We present a new unsupervised approach for selecting the most informative term to describe an image.
We build on the insight that the relevance of a description depends on what a listener already knows (their prior knowledge): communicated terms should reduce the uncertainty the listener has about the semantic space. While the full estimation problem is intractable, we describe an efficient algorithm that approximates the entropy reduction using a tree-structured graphical model (a Chow-Liu tree).
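A toy version of the selection rule (brute-force over a small discrete space; the paper makes this tractable with the Chow-Liu tree approximation) scores each candidate term by the expected reduction in the listener's entropy over concepts:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def expected_information_gain(prior, likelihood):
    """prior:      (n_concepts,) listener's prior over concepts
    likelihood: (n_terms, n_concepts) probability that each (binary) term holds given the concept.
    Returns, per term, the expected drop in the listener's entropy after hearing whether the term holds."""
    gains = []
    for lik in likelihood:
        p_term = (lik * prior).sum()                       # p(term = 1)
        post_1 = lik * prior / p_term                      # posterior if the term holds
        post_0 = (1 - lik) * prior / (1 - p_term)          # posterior if it does not
        expected_h = p_term * entropy(post_1) + (1 - p_term) * entropy(post_0)
        gains.append(entropy(prior) - expected_h)
    return np.array(gains)

# Pick the term with the largest expected entropy reduction.
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([[0.9, 0.8, 0.1],    # term 0: true for most concepts -> less informative
                       [0.9, 0.1, 0.1]])   # term 1: mostly separates concept 0 -> more informative
best_term = int(np.argmax(expected_information_gain(prior, likelihood)))
```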
Generalized Zero-Shot Learning
Adaptive Confidence Smoothing for Generalized Zero-Shot Learning (2019)
Generalized zero-shot learning (GZSL) is the problem of learning a classifier where some classes have samples and others are learned from side information, like semantic attributes or text description, in a zero-shot learning fashion (ZSL). Training a single model that operates in these two regimes simultaneously is challenging. Here we describe a probabilistic approach that breaks the model into three modular components, and then combines them in a consistent way.
Decode neural activity from MEG
Fine differences in activity timing across the brain carry significant information (2018)
We develop a method to learn neural codes from a few dozen samples, operating in an extremely high-dimensional space. We discover surprising properties of coding with timing differences.
Brain transcriptome specialization during development
The brain transcriptome changes during development, reflecting processes that determine the functional specialization of brain regions. We found that during the period in early development when the brain becomes anatomically regionalized, transcription specialization actually decreases, reaching a low, “neurotypic” point around birth. This decrease in specialization is brain-wide and is mainly due to biological processes involved in constructing brain circuitry. Regional specialization rises again during post-natal development, largely due to specialization of plasticity and neural-activity processes. Post-natal specialization is particularly significant in the cerebellum, whose expression signature becomes increasingly different from other brain regions.
Serotonin genes in adolescence
Adolescence is a period of profound neurophysiological, behavioral, cognitive, and psychological changes, but not much is known about the underlying molecular neural mechanisms. We systematically analyzed the expression of genes forming serotonergic and dopaminergic synapses during adolescence and found that two serotonin receptors, HTR1E and HTR1B, exhibit a sharp transition of expression in the prefrontal cortex during adolescence. A similar but smoother rise in expression levels is observed for HTR4 and HTR5A, and for HTR1E and HTR1B in three other published expression datasets. The expression of HTR1E and HTR1B is correlated across subjects within each age group, suggesting that they are controlled by common mechanisms.
RNA editing by ADAR
A-to-I RNA editing by adenosine deaminases acting on RNA (ADAR) is a post-transcriptional modification that is crucial for normal life and development in vertebrates. We examined the relation between the expression of ADAR genes and the expression of their target genes. Surprisingly, we found that a large fraction of the edited genes are positively correlated with ADAR, contradicting the assumption that editing would reduce expression. These findings suggest that ADAR expression does not have a genome-wide effect of reducing the expression of editing targets. It is possible, however, that RNA editing by ADAR in non-coding regions of a gene is part of a more complex expression-regulation mechanism.