
Weekly QuEST Discussion Topics and News, 17 Mar

QuEST 17 March 2017

Again there were several interesting email conversation threads going on this week:

We want to briefly revisit the article from last week – this thread was initiated by Trevor and Todd from our Sensing dendrite:

Why does deep and cheap learning work so well?
Henry W. Lin and Max Tegmark
Dept. of Physics, Harvard University, Cambridge, MA 02138 and
Dept. of Physics & MIT Kavli Institute, Massachusetts Institute of Technology, Cambridge, MA 02139

arXiv:1608.08225v2 [cond-mat.dis-nn] 28 Sep 2016

  • We show how the success of deep learning depends not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can be approximated through “cheap learning” with exponentially fewer parameters than generic ones, because they have simplifying properties tracing back to the laws of physics.
  • The exceptional simplicity of physics-based functions hinges on properties such as symmetry, locality, compositionality and polynomial log-probability, and we explore how these properties translate into exceptionally simple neural networks approximating both natural phenomena such as images and abstract representations thereof such as drawings.
  • We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one.
  • We formalize these claims using information theory and discuss the relation to renormalization group procedures. We prove various “no-flattening theorems” showing when such efficient deep networks cannot be accurately approximated by shallow ones without efficiency loss: flattening even linear functions can be costly, and flattening polynomials is exponentially expensive; we use group theoretic techniques to show that n variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.
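That 2^n lower bound concerns a single hidden layer; the matching construction in the paper multiplies two numbers with just four neurons by exploiting the second-order term of a smooth nonlinearity's Taylor expansion. A minimal numerical sketch of that gadget – softplus is used here because its second derivative at 0 is nonzero (namely 1/4), and the shrink factor `lam` is a choice made to suppress higher-order error terms:

```python
import numpy as np

def softplus(u):
    return np.log1p(np.exp(u))

def neural_multiply(x, y, lam=0.01):
    """Approximate x*y with a single hidden layer of four softplus neurons.

    Taylor-expanding softplus around 0 gives
        softplus(u) + softplus(-u) = const + sigma''(0) * u**2 + O(u**4),
    so the four-neuron combination below isolates the cross term 4*x*y.
    For softplus, sigma''(0) = 1/4; lam shrinks the inputs so the
    higher-order terms become negligible."""
    a, b = lam * x, lam * y
    s = (softplus(a + b) + softplus(-a - b)
         - softplus(a - b) - softplus(b - a))
    return s / (4 * 0.25 * lam**2)

print(neural_multiply(3.0, -2.0))  # close to -6.0
```

Shrinking `lam` further reduces the approximation error quadratically, which is why "cheap" multiplication gadgets compose into deep networks that evaluate polynomials with polynomially many neurons.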

A related topic to this first one:

In another email thread this week we were asked about:


Brainlike computers are a black box. Scientists are finally peering inside

By Jackie Snow, Mar. 7, 2017, 3:15 PM

Last month, Facebook announced software that could simply look at a photo and tell, for example, whether it was a picture of a cat or a dog. A related program identifies cancerous skin lesions as well as trained dermatologists can. Both technologies are based on neural networks, sophisticated computer algorithms at the cutting edge of artificial intelligence (AI)—but even their developers aren’t sure exactly how they work. Now, researchers have found a way to “look” at neural networks in action and see how they draw conclusions.

Neural networks, also called neural nets, are loosely based on the brain’s use of layers of neurons working together. Like the human brain, they aren’t hard-wired to produce a specific result—they “learn” on training sets of data, making and reinforcing connections between multiple inputs. A neural net might have a layer of neurons that look at pixels and a layer that looks at edges, like the outline of a person against a background. After being trained on thousands or millions of data points, a neural network algorithm will come up with its own rules on how to process new data. But it’s unclear what the algorithm is using from those data to come to its conclusions.

“Neural nets are fascinating mathematical models,” says Wojciech Samek, a researcher at Fraunhofer Institute for Telecommunications at the Heinrich Hertz Institute in Berlin. “They outperform classical methods in many fields, but are often used in a black box manner.”

In an attempt to unlock this black box, Samek and his colleagues created software that can go through such networks backward in order to see where a certain decision was made, and how strongly this decision influenced the results. Their method, which they will describe this month at the Centre of Office Automation and Information Technology and Telecommunication conference in Hanover, Germany, enables researchers to measure how much individual inputs, like pixels of an image, contribute to the overall conclusion. Pixels and areas are then given a numerical score for their importance. With that information, researchers can create visualizations that impose a mask over the image. The mask is brightest where the pixels are important and darkest in regions that have little or no effect on the neural net’s output.
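From that description, the backward pass assigns each unit's score to the units feeding it, in proportion to their contribution. A hedged sketch in the spirit of layer-wise relevance propagation on a toy two-layer ReLU net – the weights are random placeholders rather than a trained model, and the epsilon stabilization shown is one common variant, not necessarily the exact rule in Samek's software:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network; random weights stand in for a trained model.
W1 = rng.normal(size=(4, 6))
W2 = rng.normal(size=(6, 2))

def lrp_epsilon(x, eps=1e-9):
    """Redistribute the winning output score backward to the inputs.

    Epsilon rule: a unit passes its relevance to each predecessor j in
    proportion to j's contribution a_j * w_jk to its pre-activation z_k."""
    a1 = np.maximum(0.0, x @ W1)        # hidden activations
    z2 = a1 @ W2                        # output scores
    R2 = np.zeros_like(z2)
    R2[np.argmax(z2)] = z2.max()        # explain the winning class only
    z2s = z2 + eps * np.sign(z2)        # stabilized denominators
    R1 = a1 * (W2 @ (R2 / z2s))         # relevance of hidden units
    z1 = x @ W1
    z1s = z1 + eps * np.sign(z1)
    R0 = x * (W1 @ (R1 / z1s))          # relevance of each input "pixel"
    return R0

x = rng.normal(size=4)
print(lrp_epsilon(x))
```

The per-input scores sum (approximately) to the winning output score, which is what lets them be rendered as the importance mask described above.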

For example, the software was used on two neural nets trained to recognize horses. One neural net was using body shape to determine whether the image showed a horse. The other, however, was looking at copyright symbols on the images that were associated with horse association websites.

This work could improve neural networks, Samek suggests. That includes helping reduce the amount of data needed, one of the biggest problems in AI development, by focusing in on what the neural nets need. It could also help investigate errors when they occur in results, like misclassifying objects in an image.

Other researchers are working on similar processes to look into how algorithms make decisions, including neural nets for visuals as well as text. Continued research is important as algorithms make more decisions in our daily lives, says Sara Watson, a technology critic with the Berkman Klein Center for Internet & Society at Harvard University. The public needs tools to be able to understand how AI makes decisions. Algorithms, far from being perfect arbitrators of truth, are only as good as the data they’re given, she notes.

In a notorious neural network mess-up, Google tagged a black woman as a gorilla in its photos application. More serious questions of discrimination have been raised about software that provides risk scores that some courts use to determine whether a criminal is likely to reoffend, with at least one study showing black defendants are given a higher risk score than white defendants for similar crimes. “It comes down to the importance of making machines, and the entities that employ them, accountable for their outputs,” Watson says.


Not attempting to be dismissive but:

Cathy is pulling the technical article – but from the text in the news article this appears to be a rehash of something we invented in 1990:


  • Ruck, D. W., Rogers, S., Kabrisky, M., “Feature Selection Using a Multilayer Perceptron”, Journal of Neural Network Computing, Vol 2 (2), pp 40-48, Fall 1990.


When you use a supervised learning system with a mean-squared-error objective function and differentiable nonlinear neurons, you can compute the partial derivatives to extract ‘saliency’ – that is, you can work through any decision and rank-order the inputs by their impact. In 1990 we weren’t doing representation learning (as with deep neural networks – we didn’t have enough data or compute power), but the equations are the same; we just put in features extracted with our computer vision algorithms that were suggested by human radiologists. Then, after training, when we put in a new mammogram we could extract which features dominated the decision to call something cancer or normal.
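The 1990 procedure described above – differentiate a trained, differentiable network's output with respect to each input and rank-order the magnitudes – can be sketched in a few lines. The tanh network and random weights here are illustrative stand-ins, not the original mammography model:

```python
import numpy as np

def mlp(x, W1, b1, W2):
    """A small differentiable network: tanh hidden layer, linear output."""
    return np.tanh(x @ W1 + b1) @ W2

def input_saliency(x, W1, b1, W2):
    """Analytic partial derivatives of the output w.r.t. each input,
    used to rank-order the inputs by their impact on this decision."""
    h = np.tanh(x @ W1 + b1)
    grad = W1 @ ((1 - h**2) * W2)       # chain rule through tanh
    order = np.argsort(-np.abs(grad))   # most influential feature first
    return grad, order

rng = np.random.default_rng(1)
W1, b1, W2 = rng.normal(size=(5, 8)), rng.normal(size=8), rng.normal(size=8)
x = rng.normal(size=5)
grad, order = input_saliency(x, W1, b1, W2)
print(order)  # feature indices, ranked by saliency for this input
```

The same derivative exists for any differentiable network, which is why the technique carries over unchanged from hand-crafted features to learned deep representations.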


We’ve recently done similar things with deep neural networks in our captioning work, to decide what aspects of an image or video a particular linguistic expression is evoked from – for example, in a dog-chasing-Frisbee picture we can back-project to find where in the image are the pixels that evoked the word Frisbee – this has cracked the black box somewhat also.


So both of these suggest to me this news article is just restating what we know (although in general a black box, these deep systems can provide us some aspects of their ‘meaning’ that we can understand – this will be a focus of the new DARPA start, XAI, for Explainable AI) – but again I will review the technical article and if there is more there I will provide an addendum to this email.


We now have the technical article – I don’t think our response above is far off, except that their approach is based on a Taylor expansion versus ours – the ideas are the same and the importance of the problem is real – in a very important way they extend our sensitivity analysis as a special case of their more general Taylor approach:

Pattern Recognition 65 (2017) 211–222

Explaining nonlinear classification decisions with deep Taylor decomposition


Grégoire Montavon (a), Sebastian Lapuschkin (b), Alexander Binder (c), Wojciech Samek (b), Klaus-Robert Müller (a,d)

(a) Department of Electrical Engineering & Computer Science, Technische Universität Berlin, Marchstr. 23, Berlin 10587, Germany

(b) Department of Video Coding & Analytics, Fraunhofer Heinrich Hertz Institute, Einsteinufer 37, Berlin 10587, Germany

(c) Information Systems Technology & Design, Singapore University of Technology and Design, 8 Somapah Road, Building 1, Level 5, 487372, Singapore

(d) Department of Brain & Cognitive Engineering, Korea University, Anam-dong 5ga, Seongbuk-gu, Seoul 136-713, South Korea

Nonlinear methods such as Deep Neural Networks (DNNs) are the gold standard for various challenging machine learning problems such as image recognition. Although these methods perform impressively well, they have a significant disadvantage, the lack of transparency, limiting the interpretability of the solution and thus the scope of application in practice. Especially DNNs act as black boxes due to their multilayer nonlinear structure. In this paper we introduce a novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements. Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks and network architectures. Our method called deep Taylor decomposition efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer. We evaluate the proposed method empirically on the MNIST and ILSVRC data sets.
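The core identity behind deep Taylor decomposition can be illustrated on a single ReLU unit: expand the decision function to first order around a root point where it vanishes, and read off each summand as that input's relevance. A minimal sketch – the root point at the origin and the single-unit "network" are illustrative simplifications, not the paper's full layer-by-layer procedure:

```python
import numpy as np

def taylor_relevance(grad_at_x, x, x_root):
    """First-order Taylor decomposition of a decision function around a
    root point x_root where f vanishes:
        f(x) ~ sum_i df/dx_i * (x_i - x_root_i)
    Each summand is the relevance credited to input i. For piecewise-linear
    nets the gradient is constant on the active region, so evaluating it
    at x itself is exact."""
    return grad_at_x * (x - x_root)

# Toy decision function: a single ReLU unit f(x) = max(0, w.x).
w = np.array([1.0, 2.0, 0.5])
x = np.array([0.2, 0.4, 0.8])
g = w * (w @ x > 0)                       # gradient of the unit at x
R = taylor_relevance(g, x, np.zeros_like(x))
print(R, R.sum())  # relevances sum exactly to f(x) = 1.4
```

The "deep" part of the method applies this decomposition layer by layer, backpropagating each unit's relevance toward the inputs rather than expanding the whole network at once.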


With respect to applications of deep systems:

Another thread was tied to my prior life in using AI/ML for medical detection / diagnosis – thus the article below:

Detecting Cancer Metastases on Gigapixel Pathology Images
Yun Liu (1), Krishna Gadepalli (1), Mohammad Norouzi (1), George E. Dahl (1), Timo Kohlberger (1), Aleksey Boyko (1), Subhashini Venugopalan (2), Aleksei Timofeev (2), Philip Q. Nelson (2), Greg S. Corrado (1), Jason D. Hipp (3), Lily Peng (1), Martin C. Stumpe (1)
(1) Google Brain, (2) Google Inc, (3) Verily Life Sciences, Mountain View, CA, USA

  • Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast.
  • Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor intensive and error-prone.
  • We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels.
  • Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task.
  • At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach.
  • For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity.
  • We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides.
  • In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal.
  • Our approach could considerably reduce false negative rates in metastasis detection.
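A 100,000 x 100,000 image cannot be fed to a CNN whole, so frameworks of this kind score small patches and assemble the scores into a tumor-probability map. A schematic of that tiling loop – the 100 x 100 patch size echoes the figure in the abstract, while `classify_patch` is a stand-in for the trained CNN (here a toy mean-intensity scorer), and the non-overlapping stride is an assumption for brevity:

```python
import numpy as np

def heatmap(slide, classify_patch, patch=100, stride=100):
    """Slide a patch-level classifier over a huge image, collecting a
    grid of tumor probabilities (one score per patch position)."""
    H, W = slide.shape[:2]
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            y, x = r * stride, c * stride
            out[r, c] = classify_patch(slide[y:y + patch, x:x + patch])
    return out

# Toy example: the "tumor" is a bright square; the classifier is mean
# intensity, standing in for a CNN's per-patch tumor probability.
slide = np.zeros((500, 500))
slide[200:300, 200:300] = 1.0
hm = heatmap(slide, lambda p: p.mean())
print(hm.round(2))
```

Lesion-level detection then reduces to finding connected high-probability regions in the resulting, much smaller, heatmap.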


As another application:

Science, published online March 9, 2017 – http://science.sciencemag.org/


DeepStack: Expert-level artificial intelligence in heads-up no-limit poker
Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling


  • Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated with statistical significance professional poker players in heads-up no-limit Texas hold’em. The approach is theoretically sound and is shown to produce more difficult to exploit strategies than prior approaches.


One of our main topics – the commonality of objective functions across multiple agents – from our sensing and AFIT mathematics dendrite:

How do we know that your “red” looks ** this is a quale statement – what you perceive in your consciousness is what I perceive **  the same as my “red”? For all we know, your “red” looks like my “blue.” In fact, for all we know your “red” looks nothing like any of my colors at all! If colors are just internal labels  ** labels here is not meant to imply the word – it is meant to describe the representation internal in a vocabulary of conscious thought ** , then as long as everything gets labeled, why should your brain and my brain use the same labels?  ** as long as we can align – why would they have to be the same – keep in mind the Kaku view – philosophers waste our time with these thought problems – he speaks to the ‘what is life concern that has disappeared’ – BUT OUR INTEREST IN OBJECTIVE FUNCTIONS THAT START WITH SIMILAR MACHINERY WHAT CAN I SAY ABOUT THE RESULTING REPRESENTATION – CAN I MAKE A STATEMENT ON WHAT CHARACTERISTICS OF YOUR RED MY RED HAS TO HAVE? – IN THE RED CASE I WOULD CONTEND THAT THE RELATIONSHIPS BETWEEN YOUR RED AND YOUR BLUE HAVE TO BE REPLICATED – AND BY THE TIME I CONSTRAIN THE RELATIONSHIPS WITH SO MANY QUALIA IT CONSTRAINS THE MEANING (probably should say the comprehension – how qualia are related and how they can interact) TO BE THE SAME WHERE THE MEANING IS THE CHANGES TO THE REPRESENTATION RESULTING FROM THE STIMULI  – **

*** CAP POSITS:  THE MEANING / understanding / comprehension THE WAY WE DEFINE IN QUEST OF YOUR RED IS THE SAME AS THE MEANING OF MY RED ** ?understanding and maybe comprehension versus meaning – jared **really your representation / resulting meaning / understanding / resulting comprehension (relationships and ways the situations can interact – Representation is how agent structured knowledge meaning understanding and comprehension – recall we defined comprehension when we were defining qualia / situations – as something that can be comprehended as a whole – comprehended was defined as being able to discern how it is related to or how it can interact with other situations –


A situation is any part of the agent centric internal representation which can be comprehended as a whole by that agent through defining how it interacts with or is related to other parts of the representation in that agent.

We will define comprehended by defining how it interacts or is related to other situations via linking (and types of links).

By interacting with other things we mean that the situations have properties or relate to other situations.” *** we would say situations can and must be linked to other ‘situations’ = other ‘qualia’ = other chunks ***


as I thought about this – and thought about the word doc – is your red my red – and my posit – that the meaning you generate for red is the same as what I generate for my red – your comment on I need to use understanding versus meaning – putting all this together I was forced to own up to what I need to stay ‘alignable’ – we don’t have to have the same exact changes to the representation (use our deep learning metaphor – I don’t care if all the activations are the same) – but what I have to maintain between agents for ‘red’ to be alignable is that the relationships to other situations is maintained between the respective agents – when I went down this path it reminded me of how we defined situations – and how we had to clear up the word comprehension – I’m happy to change the word comprehension to understanding – for now I have restricted comprehension to be understanding associated with the task of relationships / interactions with other situations
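The ‘alignable’ criterion above can be made concrete: two agents’ representations of the same qualia count as aligned when their relational structure matches, even if the raw activations differ completely (the deep-learning metaphor of not caring whether all the activations are the same). A toy sketch, where the embeddings and the random rotation are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Agent A's internal representation of five color qualia (rows = stimuli).
A = rng.normal(size=(5, 16))

# Agent B "labels" the same qualia in a rotated basis: completely
# different activations, identical relational structure.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthogonal map
B = A @ Q

def relation_matrix(X):
    """Pairwise distances between stimuli – the relations among qualia."""
    d = X[:, None, :] - X[None, :, :]
    return np.sqrt((d ** 2).sum(axis=-1))

print(np.allclose(A, B))                                    # False
print(np.allclose(relation_matrix(A), relation_matrix(B)))  # True
```

The relation matrix is invariant to any rotation of the underlying activations, so comparing these matrices across agents tests exactly the "relationships between your red and your blue are replicated" posit without requiring identical internal labels.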


Another thread has continued to advance this week and, related to the objective-function ‘red’ thread, involves interactions between our Airmen sensors autonomy team and our AFIT autonomy team – with the focus on ‘chat-bots’ – the idea that the future is all about these ‘AI bots’ versus apps – and that QuEST chat-bots might provide an avenue where knowledge of the developing representations that capture aspects of consciousness is key to solving the very tough problem of bots that accomplish the type of meaning-making required for many applications – and might be the key to bot-to-bot communication without having to strictly manually define a communication protocol.

Part of the interest of this thread is multi-modal communications – that is the reason the material below was inserted into the thread:

Snap Makes a Bet on the Cultural Supremacy of the Camera

  • https://www.nytimes.com/2017/03/08/technology/snap-makes-a-bet-on-the-cultural-supremacy-of-the-camera.html?_r=0
  • The rising dependence on cameras is changing the way we communicate.
  • If you’re watching Snap’s stock ticker, stop. The company that makes Snapchat, the popular photo-messaging app, has been having a volatile few days after its rocket-fueled initial public offering last week.
  • But Snap’s success or failure isn’t going to be determined this week or even this year. This is a company that’s betting on a long-term trend: the rise and eventual global dominance of visual culture.
  • Snap calls itself a camera company. That’s a bit cute, considering that it only just released an actual camera, the Spectacles sunglasses, late last year. Snap will probably build other kinds of cameras, including potentially a drone.
  • But it’s best to take Snap’s camera company claim seriously, not literally. Snap does not necessarily mean that its primary business will be selling a bunch of camera hardware. It’s not going to turn into Nikon, Polaroid or GoPro. Instead it’s hit on something deeper and more important. Through both its hardware and software, Snap wants to enable the cultural supremacy of the camera, to make it at least as important to our daily lives as the keyboard.  ** profound point – camera / visual media as important as keyboard in communicating with other agents people and machines **


arXiv:1605.07736v2 [cs.LG] 31 Oct 2016

29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain


Learning Multiagent Communication with Backpropagation
Sainbayar Sukhbaatar (Dept. of Computer Science, Courant Institute, New York University), Arthur Szlam (Facebook AI Research, New York), Rob Fergus (Facebook AI Research, New York)

  • Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their policy. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate amongst themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task at hand.
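The communication step behind CommNet is simple: at each layer, every agent receives the mean of the other agents' hidden states alongside its own, through shared weights, so the "protocol" is just another differentiable pathway trained by backpropagation. A sketch of one such step – dimensions and weights are arbitrary placeholders:

```python
import numpy as np

def commnet_step(h, W_h, W_c):
    """One CommNet communication step.

    Each agent i receives the mean of the other agents' hidden states,
        c_i = mean_{j != i} h_j,
    and updates  h_i' = tanh(W_h h_i + W_c c_i).  The weights are shared
    by all agents and c_i is differentiable, so the communication
    "protocol" is learned rather than hand-specified."""
    n = h.shape[0]
    c = (h.sum(axis=0, keepdims=True) - h) / (n - 1)   # others' mean
    return np.tanh(h @ W_h.T + c @ W_c.T)

rng = np.random.default_rng(0)
n_agents, dim = 4, 8
h = rng.normal(size=(n_agents, dim))
W_h, W_c = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
h_next = commnet_step(h, W_h, W_c)
print(h_next.shape)
```

Because the averaging is symmetric in the agents, the same learned step works for any number of agents – the property that makes this relevant to bot-to-bot communication without a manually defined protocol.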




We are always interested in new information on the neuro-physiology that might provide guidance in our engineering endeavors:


Dynamics of cortical dendritic membrane potential and spikes in freely behaving rats
Jason J. Moore, Pascal M. Ravassard, David Ho, Lavanya Acharya, Ashley L. Kees, Cliff Vuong, Mayank R. Mehta


  • Neural activity in vivo is primarily measured using extracellular somatic spikes, which provide limited information about neural computation. Hence, it is necessary to record from neuronal dendrites, which generate dendritic action potentials (DAP) and profoundly influence neural computation and plasticity. We measured neocortical sub- and suprathreshold dendritic membrane potential (DMP) from putative distal-most dendrites using tetrodes in freely behaving rats over multiple days with a high degree of stability and sub-millisecond temporal resolution. DAP firing rates were several fold larger than somatic rates. DAP rates were modulated by subthreshold DMP fluctuations, which were far larger than DAP amplitude, indicating hybrid, analog-digital coding in the dendrites. Parietal DAP and DMP exhibited egocentric spatial maps comparable to pyramidal neurons. These results have important implications for neural coding and plasticity.