
Weekly QuEST Meeting Discussion Topics and News, 4 Aug

QuEST 4 Aug 2017:

This week we will have a guest lecture by colleagues from UCLA discussing the paper by Achille and Soatto (UCLA), arXiv:1706.01350v1 [cs.LG], 5 Jun 2017:

On the emergence of invariance and disentangling in deep representations

Lots of interesting analysis in this article but what caught my eye was the discussion on properties of representations:

  • In many applications, the observed data x is high-dimensional (e.g., images or video), while the task y is low-dimensional, e.g., a label or a coarsely quantized location. ** What if the task were a simulation that is stable, consistent, useful, and low-dimensional? **
  • For this reason, instead of working directly with x, we want to use a representation z that captures all the information the data x contains about the task y, while also being simpler than the data itself. ** And is there a range of tasks y that can be serviced by a representation z? How do we address the tension between the representation and the tasks, and how do we define what tasks a given representation can service? **
  • Ideally, such a representation should be
  • (a) sufficient for the task y, i.e., I(y; z) = I(y; x), so that information about y is not lost; among all sufficient representations, it should be
  • (b) minimal, i.e. I(z; x) is minimized, so that it retains as little about x as possible, simplifying the role of the classifier; finally, it should be
  • (c) invariant to the effect of nuisances I(z; n) = 0, so that decisions based on the representation z will not overfit to spurious correlations between nuisances n and labels y present in the training dataset
  • Assuming such a representation exists, it would not be unique, since any bijective function of z preserves all these properties.
  • We can use this fact to our advantage and further aim to make the representation
  • (d) maximally disentangled, i.e., TC(z) is minimal. Disentanglement is often measured as the correlation among the components of the representation; the paper appears to use total correlation, which is the (one-sided) KL divergence between the joint PDF of the components and the naïve Bayes (fully factorized) estimate → KL( f(z1, z2, …, zn) || f(z1)f(z2)…f(zn) )
  • This simplifies the classifier rule, since no information is present in the complicated higher-order correlations between the components of z, a.k.a. “features.”
  • In short, an ideal representation of the data is a minimal sufficient invariant representation that is disentangled.
  • Inferring a representation that satisfies all these properties may seem daunting. However, in this section we show that we only need to enforce (a) sufficiency and (b) minimality, from which invariance and disentanglement follow naturally.
  • Between this and the next section, we will then show that sufficiency and minimality of the learned representation can be promoted easily through implicit or explicit regularization during the training process.
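The total-correlation quantity in (d) is easy to make concrete in the discrete case. A minimal sketch (the probability tables below are illustrative, not from the paper):

```python
# Total correlation TC(z) for a discrete two-component representation
# z = (z1, z2): TC(z) = KL( p(z1, z2) || p(z1) p(z2) ).
import numpy as np

def total_correlation(joint):
    """KL divergence between a joint pmf and the product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()            # normalize to a valid pmf
    p1 = joint.sum(axis=1, keepdims=True)  # marginal p(z1)
    p2 = joint.sum(axis=0, keepdims=True)  # marginal p(z2)
    indep = p1 * p2                        # naive Bayes (factorized) estimate
    mask = joint > 0                       # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

# A factorized joint has TC = 0: the components share no information.
factorized = np.outer([0.3, 0.7], [0.5, 0.5])
print(total_correlation(factorized))   # ~0.0

# Perfectly coupled components: TC = ln 2 for this binary alphabet.
coupled = np.array([[0.5, 0.0], [0.0, 0.5]])
print(total_correlation(coupled))      # ~0.693
```

A disentangled representation drives this quantity toward zero, which is exactly why the classifier rule simplifies: no information hides in higher-order correlations between the components.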

As we mature our view of how to work toward these rich representations, it raises the discussion point of QuEST as a platform:


I would like to think through a QuEST solution as a platform that uses existing front ends (application-dependent, from observation vendors), existing big-data back ends (standard Big Data solutions such as Amazon Web Services), and possibly a series of knowledge-creation vendors. It is helpful here to consider the Cross-Industry Standard Process for Data Mining (commonly known by its acronym CRISP-DM), a data-mining process model that describes the steps experts commonly use to tackle data-mining problems, to show how QuEST fits within, and can enable, all aspects of the CRISP-DM process.
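For reference, CRISP-DM names six phases, which the discussion below maps QuEST functions onto. The Python rendering is just an illustrative enumeration of the standard process model:

```python
# The six CRISP-DM phases, in their conventional order.
from enum import Enum

class CrispDMPhase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

for phase in CrispDMPhase:
    print(phase.name)
```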

Independent of the representation used by a front-end system that captures the observables and provides them to the QuEST agent, it becomes the agent's job to put them to two uses. The first is to put them in a form usable by a big-data solution (following CRISP-DM, this entails Data Understanding and Data Preparation), but to do so based on an understanding of the relevant QuEST model (CRISP-DM Modeling) and in a way that supports CRISP-DM Business Understanding (e.g., perhaps inferring it via its 'Sys2 Artificial Consciousness', the next piece), in order to find whether stored experiences exist that are close enough to provide the appropriate response in the CRISP-DM Deployment phase. The second use has to be consistent with our situated / simulation tenets: the observables are provided to a 'simulation' system that attempts to 'constrain' the simulation that will generate the artificially conscious 'imagined' present, which can complement the 'big-data' response. In fact, the simulated data might be fed as 'imagined observables' into the back end, infer gaps in CRISP-DM Business Understanding that then also feed the big-data response, and offer more valuable contributions to users in CRISP-DM Deployment. I would like to expand on this discussion.
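The dual-path agent described above can be sketched in outline. Every class and function name here is a hypothetical placeholder for discussion, not an existing API, and the "processing" is a stand-in for real data preparation and simulation:

```python
# Hypothetical sketch of the dual-path QuEST agent: the same observables
# feed a big-data path and a simulation path, and the simulation's
# 'imagined observables' are fed back into the big-data path.

def big_data_path(observables):
    """CRISP-DM Data Understanding / Preparation: shape observables for a
    conventional big-data back end and query stored experiences."""
    prepared = {"features": sorted(observables)}  # stand-in for real prep
    return {"response": "nearest stored experience", "input": prepared}

def simulation_path(observables):
    """Situated / simulation tenet: use observables to constrain a
    simulation that generates an 'imagined present'."""
    return {"imagined_observables": [o + "_imagined" for o in observables]}

def quest_agent(observables):
    """Route observables down both paths; merge imagined observables back
    into the big-data path to fill Business Understanding gaps."""
    direct = big_data_path(observables)
    imagined = simulation_path(observables)
    complemented = big_data_path(observables + imagined["imagined_observables"])
    return {"direct": direct, "imagined": imagined, "complemented": complemented}

result = quest_agent(["sensor_a", "sensor_b"])
```

The point of the sketch is the data flow, not the internals: the "complemented" response sees both the raw and the imagined observables, which is the feedback loop described above.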
