
Weekly QuEST Discussion Topics and News, 9 Dec

QuEST 9 Dec 2016:

This time of year we traditionally review all the topics from the year in an attempt to capture the big lessons and incorporate them into the Kabrisky lecture. The first QuEST meeting of any calendar year (this time, 6 Jan 2017) is the Kabrisky memorial lecture, where we capture the current answer to ‘what is QuEST?’, in honor of our late esteemed colleague Prof. Matthew Kabrisky.

But this year we also have the task, before the end of the calendar year, of capturing the answers to the questions that would lead to a ‘funded’ effort (either inside or outside the government) to build a QuEST agent (a conscious computer).

  • I will be formulating the ‘pitch’ before the end of the year – the pitch has to answer:

–     What is it we suggest?

–     How will we do what we suggest?

–     Why could we do this now?

–     Why are we the right people to do it, and what in our approach is new/different?

–     What will be the result – if successful what will be different? 

–     How long will it take and what will it cost? 

–     What are our mid-term and final exams that will tell us/others we are proceeding successfully?

On the second topic first – this week I will continue giving my current ‘what’ answer and a first cut at the how/why now answer for the QuEST pitch.  The ‘what’ answer is wrapped around the idea of making a conscious computer (one that is emotionally intelligent and can increase the emotional intelligence of its human partners) as that is the key to group intelligence.  Last week I attempted to capture the ‘what’/’how’ but focused on the QuEST agent having a dual process and thus generating emotional intelligence with respect to its subconscious calculations.  This week we will focus on how to make the QuEST agent have emotional intelligence with respect to the subconscious calculations of the human partner that is attempting to make higher quality decisions.

The ‘how’ last week was wrapped around generating a gist of the representation that the machine agent generates, for example in deep learning agents in a given application area. The idea is that deep learning (in fact all big data approaches) extracts and memorizes at far too high a resolution to respond robustly to irrelevant variations in the stimuli. We therefore posit that unsupervised processing of the representation used to do the output classification will generate ‘gists’. The ‘gists’ of those representation vectors provide a lower-bit view of what is necessary to get acceptable accuracy. My idea for the ‘how’ is that this new ‘gist’ vocabulary (~ qualia) can be used as the vocabulary for a simulation (either GAN or RL based) to complement the higher-resolution answers of current deep learning. The challenge will then be to blend the two appropriately. An alternative to blending is to use the qualia in a single-pass cognitive system in which the bottom-up, data-evoked activations are replaced by some set of ‘imagined’ qualia.
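One way to make the ‘gist’ idea concrete, assuming k-means clustering as the unsupervised step (the post does not name one), is to cluster the representation vectors a trained network produces just before its output layer and keep only the cluster index as a lower-bit code. The vectors below are synthetic stand-ins for real activations, and `kmeans` and `gist_codes` are hypothetical helpers written for this sketch.

```python
import numpy as np

def kmeans(x, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    # Start from the first vector, then repeatedly add the vector that is
    # farthest from every centroid chosen so far.
    idx = [0]
    for _ in range(1, k):
        d = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = x[idx].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def gist_codes(features, k):
    """Hypothetical 'gist': replace each high-resolution vector by a cluster id."""
    centroids, labels = kmeans(features, k)
    bits_per_gist = int(np.ceil(np.log2(k)))
    return labels, centroids, bits_per_gist

# Synthetic stand-in for 256-d penultimate-layer activations: three underlying
# situations, each seen with small irrelevant variations.
rng = np.random.default_rng(1)
prototypes = rng.normal(size=(3, 256)) * 5.0
features = np.vstack([p + 0.1 * rng.normal(size=(100, 256)) for p in prototypes])

labels, centroids, bits = gist_codes(features, k=3)
print(bits)                 # → 2 (bits per gist, versus 256 floats per vector)
print(np.bincount(labels))  # → [100 100 100]
```

Whether such a coarse code keeps ‘what is necessary to get an acceptable accuracy’ is exactly the empirical question; the number of clusters `k` sets the bit budget.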

Let’s assume that instead of just using the action/behavior of the human we also take some text input (spoken or typed). To be clear, the human consumes the output of the machine learning solution in some space like a recommender system. The human then does something: for example, buys, watches, reads, or clicks to another option. Most machine learning recommender systems use this action and attempt to find correlations in the actions, thus capturing a model of user responses that can be used later. Now, instead of just monitoring what the human did in response to the recommendation, we also gather information about how they felt via analysis of the text typed or spoken to the system (or, if available, any other human state sensing means). We then have a set of words/measurements from which we can extract a set of emotional states for a model of the human that the QuEST agent can use.
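A minimal sketch of the extra data the QuEST agent would collect, assuming a tiny hand-made valence lexicon in place of a real speech/text emotion model: each interaction record keeps the action that recommender systems already log, plus an emotional-state score derived from the human’s own words. `Interaction`, `VALENCE`, `emotional_state`, and `user_model` are hypothetical names for illustration only.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; a real system would use a trained affect model.
VALENCE = {"love": 1.0, "great": 0.8, "useful": 0.5, "fine": 0.1,
           "confusing": -0.5, "annoying": -0.7, "hate": -1.0}

@dataclass
class Interaction:
    item: str
    action: str   # what the human did: "bought", "watched", "clicked", "skipped", ...
    comment: str  # typed or transcribed text from the human

def emotional_state(text):
    """Average valence of the known words; 0.0 means neutral or no evidence."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [VALENCE[w] for w in words if w in VALENCE]
    return sum(scores) / len(scores) if scores else 0.0

def user_model(log):
    """Per item: (what the human did, how they seemed to feel about it)."""
    return {r.item: (r.action, emotional_state(r.comment)) for r in log}

log = [
    Interaction("movie-42", "watched", "Great pick, I love it"),
    Interaction("article-7", "clicked", "Confusing and annoying."),
    Interaction("song-3", "skipped", "no comment"),
]
print(user_model(log))
```

The point of the design is that two identical actions (say, two clicks) can now carry opposite emotional signals, which an action-only recommender cannot distinguish.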

I look forward to the discussion on this view of the what and the how. The ‘why now’ part of the how is centered on the spectacular recent breakthroughs in deep learning. There are many applications: any time a human uses a recommendation from a computer and needs to understand that recommendation in order to use it appropriately, AND the recommender can be improved by a better understanding of how the human felt about the prior recommendations.

So that leads to some definable steps – what to do and how specifically to proceed and how much will it cost and how long will it take – we have those in hand now – been a great week!  Next week Scott and I will be discussing this in NY with potential collaborators.  More to come on that – but for now assume there will NOT be a QuEST meeting on the 16th of Dec.

On the first topic, reviewing the material we covered this calendar year that should be considered for inclusion in the Kabrisky lecture, I will briefly remind everyone of the major topics we hit early this calendar year that may need to be included.

In March we hit the dynamic memory networks of MetaMind:


Taking Baby Steps Toward Software That Reasons Like Humans



Richard Socher, founder and chief executive of MetaMind, a start-up developing artificial intelligence software. Credit Jim Wilson/The New York Times

Richard Socher appeared nervous as he waited for his artificial intelligence program to answer a simple question: “Is the tennis player wearing a cap?”

The word “processing” lingered on his laptop’s display for what felt like an eternity. Then the program offered the answer a human might have given instantly: “Yes.”

Mr. Socher, who clenched his fist to celebrate his small victory, is the founder of one of a torrent of Silicon Valley start-ups intent on pushing variations of a new generation of pattern recognition software, which, when combined with increasingly vast sets of data, is revitalizing the field of artificial intelligence.

His company MetaMind, which is in crowded offices just off the Stanford University campus in Palo Alto, Calif., was founded in 2014 with $8 million in financial backing from Marc Benioff, chief executive of the business software company Salesforce, and the venture capitalist Vinod Khosla.

MetaMind is now focusing on one of the most daunting challenges facing A.I. software. Computers are already on their way to identifying objects in digital images or converting sounds uttered by human voices into natural language. But the field of artificial intelligence has largely stumbled in giving computers the ability to reason in ways that mimic human thought.

Now a variety of machine intelligence software approaches known as “deep learning” or “deep neural nets” are taking baby steps toward solving problems like a human.

On Sunday, MetaMind published a paper describing advances its researchers have made in creating software capable of answering questions about the contents of both textual documents and digital images.

The new research is intriguing because it indicates that steady progress is being made toward “conversational” agents that can interact with humans. The MetaMind results also underscore how far researchers have to go to match human capabilities.

Other groups have previously made progress on discrete problems, but generalized systems that approach human levels of understanding and reasoning have not been developed.

Five years ago, IBM’s Watson system demonstrated that it was possible to outperform humans on “Jeopardy!”

Last year, Microsoft developed a “chatbot” program known as Xiaoice (pronounced Shao-ice) that is designed to engage humans in extended conversation on a diverse set of general topics.

To add to Xiaoice’s ability to offer realistic replies, the company developed a huge library of human question-and-answer interactions mined from social media sites in China. This made it possible for the program to respond convincingly to typed questions or statements from users.

In 2014, computer scientists at Google, Stanford and other research groups made significant advances in what is described as “scene understanding,” the ability to understand and describe a scene or picture in natural language, by combining the output of different types of deep neural net programs.

These programs were trained on images that humans had previously described. The approach made it possible for the software to examine a new image and describe it with a natural-language sentence.

While even machine vision is not yet a solved problem, steady, if incremental, progress continues to be made by start-ups like Mr. Socher’s; giant technology companies such as Facebook, Microsoft and Google; and dozens of research groups.

In their recent paper, the MetaMind researchers argue that the company’s approach, known as a dynamic memory network, holds out the possibility of simultaneously processing inputs including sound, sight and text. ** fusion **

The design of MetaMind software is evidence that neural network software technologies are becoming more sophisticated, in this case by adding the ability both to remember a sequence of statements and to focus on portions of an image. For example, a question like “What is the pattern on the cat’s fur on its tail?” might yield the answer “stripes” and show that the program had focused only on the cat’s tail to arrive at its answer.

“Another step toward really understanding images is, are you actually able to answer questions that have a right or wrong answer?” Mr. Socher said.

MetaMind is using the technology for commercial applications like automated customer support, he said. For example, insurance companies have asked if the MetaMind technology could respond to an email with an attached photo — perhaps of damage to a car or other property — he said.

There are two papers that we will use for the technical detail:

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing:

  • Most tasks in natural language processing can be cast into question answering (QA) problems over language input.  ** way we cast QuEST  Query response**
  • We introduce the dynamic memory network (DMN), a unified neural network framework which processes input sequences and questions, forms semantic and episodic memories, and generates relevant answers.
  • The DMN can be trained end-to-end and obtains state of the art results on several types of tasks and datasets:
  • question answering (Facebook’s bAbI dataset),
  • sequence modeling for part of speech tagging (WSJ-PTB),
  • and text classification for sentiment analysis (Stanford Sentiment Treebank).
  • The model relies exclusively on trained word vector representations and requires no string matching or manually engineered features.


The second paper:

Dynamic Memory Networks for Visual and Textual Question Answering
Xiong, Merity, Socher – arXiv:1603.01417v1 [cs.NE] 4 Mar 2016

  • Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.
  • One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks.

–     However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images.

–     Based on an analysis of the DMN, we propose several improvements to its memory and input modules.

–     Together with these changes we introduce a novel input module for images in order to be able to answer visual questions.

–     Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
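The memory-and-attention mechanism can be sketched as follows, loosely following the DMN+ paper’s description of its episodic memory module: each fact vector is scored against the question and the current memory using the interaction features [f∘q; f∘m; |f−q|; |f−m|], the scores are softmax-normalised into attention gates, and the memory is updated from the attended context with a ReLU layer. This is a simplification (soft attention as a plain weighted sum rather than the paper’s attention-based GRU), the weights are random and untrained, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 16  # fact/question size and scoring hidden size (arbitrary choices)

# Random, untrained parameters; a real DMN+ learns these end-to-end.
W1 = rng.normal(scale=0.1, size=(H, 4 * D)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(1, H));     b2 = np.zeros(1)
Wm = rng.normal(scale=0.1, size=(D, 3 * D)); bm = np.zeros(D)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(facts, q, m):
    """One gate per fact, from its interactions with the question and the memory."""
    z = np.concatenate([facts * q, facts * m,
                        np.abs(facts - q), np.abs(facts - m)], axis=1)
    scores = (np.tanh(z @ W1.T + b1) @ W2.T + b2).ravel()
    return softmax(scores)

def episodic_memory(facts, q, passes=3):
    m = q.copy()  # memory is initialised with the question
    gates = []
    for _ in range(passes):
        g = attention(facts, q, m)
        c = g @ facts  # context: attention-weighted sum of the facts
        m = np.maximum(0.0, Wm @ np.concatenate([m, c, q]) + bm)  # ReLU update
        gates.append(g)
    return m, gates

facts = rng.normal(size=(5, D))  # e.g. encoded sentences or image regions
question = rng.normal(size=D)
memory, gates = episodic_memory(facts, question)
print(memory.shape, [round(float(g.sum()), 6) for g in gates])
```

Running several passes is what lets the model attend to one fact, update its memory, and then attend to a different fact that only becomes relevant given the first, which is the ‘reasoning’ the abstract refers to.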



Another topic was a discussion about the unexpected query, specifically ‘zero-shot learning’. We used an article by Socher / Manning / Ng from NIPS 2013:

Zero-Shot Learning Through Cross-Modal Transfer
Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Y. Ng

  • This work introduces a model that can recognize objects in images even if no training data is available for the object class.
  • The only necessary knowledge about unseen visual categories comes from unsupervised text corpora.

This is related to the question of the unexpected query, but the query is unexpected only with respect to the image classification system, not the word/text processing system. So it is a sort of transfer learning issue: transfer between systems.

  • Unlike previous zero-shot learning models, which can only differentiate between unseen classes, our model can operate on a mixture of seen and unseen classes, simultaneously obtaining state of the art performance on classes with thousands of training images and reasonable performance on unseen classes.
  • This is achieved by seeing the distributions of words in texts as a semantic space for understanding what objects look like.
  • Our deep learning model does not require any manually defined semantic or visual features for either words or images.
  • Images are mapped to be close to semantic word vectors corresponding to their classes, and the resulting image embeddings can be used to distinguish whether an image is of a seen or unseen class.
  • We then use novelty detection methods to differentiate unseen classes from seen classes.
  • We demonstrate two novelty detection strategies:
  • the first gives high accuracy on unseen classes,
  • while the second is conservative in its prediction of novelty and keeps the seen classes’ accuracy high.
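The cross-modal recipe can be sketched under simplifying assumptions: a linear least-squares map from image features to word-vector space stands in for the paper’s trained network, and novelty detection is a plain distance threshold to the nearest seen-class word vector (in the spirit of the paper’s conservative strategy, not its exact formulation). All vectors are synthetic, and the class names and threshold rule are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_WORD, D_IMG = 6, 12
E = np.eye(D_WORD)

# Hypothetical word vectors: six seen classes, two unseen classes lying between them.
seen = {f"seen{i}": 3.0 * E[i] for i in range(D_WORD)}
unseen = {"unseenA": 3.0 * (E[0] + E[1]) / np.sqrt(2),
          "unseenB": 3.0 * (E[2] + E[3] + E[4]) / np.sqrt(3)}

# Synthetic "image features": a fixed linear view of the class concept plus noise.
A = 2.0 * np.linalg.qr(rng.normal(size=(D_IMG, D_WORD)))[0]

def images_of(word_vec, n):
    return word_vec @ A.T + 0.01 * rng.normal(size=(n, D_IMG))

# Train the image -> word-space map on seen classes only.
X = np.vstack([images_of(v, 100) for v in seen.values()])
Y = np.vstack([np.tile(v, (100, 1)) for v in seen.values()])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

seen_names, seen_vecs = list(seen), np.array(list(seen.values()))
unseen_names, unseen_vecs = list(unseen), np.array(list(unseen.values()))

# Novelty threshold: a generous multiple of the worst training distance (assumed rule).
threshold = 5.0 * np.linalg.norm(X @ W - Y, axis=1).max()

def classify(x):
    s = x @ W  # the image embedded in word space
    d_seen = np.linalg.norm(seen_vecs - s, axis=1)
    if d_seen.min() <= threshold:  # close to a known class: ordinary classification
        return seen_names[int(d_seen.argmin())]
    d_unseen = np.linalg.norm(unseen_vecs - s, axis=1)  # novel: nearest unseen label
    return unseen_names[int(d_unseen.argmin())]

test_seen = images_of(seen["seen3"], 20)
test_unseen = images_of(unseen["unseenA"], 20)
print([classify(x) for x in test_seen][:3], [classify(x) for x in test_unseen][:3])
```

The unseen classes never contribute a single training image; everything the classifier knows about them comes from where their word vectors sit, which is the core of the zero-shot argument.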

Then there was our dive into generative adversarial networks:

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford & Luke Metz
indico Research
Boston, MA
Soumith Chintala
Facebook AI Research

  • In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications.
  • Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning.
  • We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning.
  • Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator.
  • Additionally, we use the learned features for novel tasks – demonstrating their applicability as general image representations.
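To make the generator side of the adversarial pair concrete, its spatial sizes can be worked out with the transposed-convolution (‘fractionally-strided’) output formula, out = (in − 1)·stride − 2·padding + kernel + output_padding, which is the formula PyTorch documents for `ConvTranspose2d` with dilation 1. The sketch assumes a generator that projects z to a 4×4 map and upsamples four times to a 64×64 image; the kernel 4, stride 2, padding 1 setting is an assumption (a common choice in reimplementations), and the helper names are hypothetical.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def generator_sizes(start=4, layers=4, **kw):
    """Spatial size after each upsampling layer, starting from the projected z."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_transpose_out(sizes[-1], **kw))
    return sizes

print(generator_sizes())  # → [4, 8, 16, 32, 64]

# With 5x5 kernels, the same exact doubling needs padding=2 and output_padding=1.
print(conv_transpose_out(4, kernel=5, stride=2, padding=2, output_padding=1))  # → 8
```

Each layer doubles the spatial extent, so the generator climbs from a coarse, high-level code to a full image, while the discriminator runs the mirror-image path with strided convolutions.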

So again, the QuEST interest here: imagine we use the generative model not just for its learned weights but to actually generate data. That is, imagine that our previous idea of a conscious system separate from the subconscious system is wrong. Imagine one system, but with the processes that populate the sensory bottom-up (BU) paths being what we call conscious and subconscious.

Imagine that even as early as the visual cortex much of the content is inferred rather than measured by visual sensing (the eyes). This seems to me to be testable: electrode studies could confirm or refute the idea that much of what is present even early in the visual processing chain is inferred rather than captured by the eyes. This could account for the 10:1 ratio of feedback to feedforward connections.

Here is the implication: we take Bernard’s generative models and have them generate additional information (competing with the bottom-up sensory data to populate the agent’s world model), and then the winning populated solution gets processed by a bottom-up, experience-based deep learning solution.


Note that ‘blending’ is now only the competition between the top-down imagined information and the bottom-up sensory data; the cognition is all in the bottom-up processing of the resulting world model.
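A toy version of this single-system idea, under loudly stated assumptions: the world model is a small vector, each element is populated either by the bottom-up sensory measurement or by a top-down ‘imagined’ value from a generative model, whichever carries the higher reliability (a precision-style rule assumed here, not specified in the post), and the winning vector is then handed to an unchanged bottom-up classifier. `generate_expectation`, the reliabilities, and the classifier weights are all hypothetical.

```python
import numpy as np

def populate_world_model(sensed, sensed_rel, imagined, imagined_rel):
    """Per element, keep whichever source is more reliable (ties go to the senses)."""
    use_imagined = imagined_rel > sensed_rel
    return np.where(use_imagined, imagined, sensed), use_imagined

def bottom_up_classifier(world, weights):
    """Stand-in for an experience-trained bottom-up network: a fixed linear scorer."""
    return int(np.argmax(weights @ world))

def generate_expectation(context):
    """Hypothetical top-down generator: what the agent expects to perceive."""
    return {"cat": np.array([1.0, 1.0, 0.0, 0.0]),
            "car": np.array([0.0, 0.0, 1.0, 1.0])}[context]

weights = np.array([[1.0, 1.0, 0.0, 0.0],   # class 0 ("cat") detector
                    [0.0, 0.0, 1.0, 1.0]])  # class 1 ("car") detector

# Occluded, noisy input: only element 0 was measured reliably.
sensed = np.array([0.6, 0.0, 0.0, 0.9])
sensed_rel = np.array([0.9, 0.1, 0.1, 0.1])
imagined = generate_expectation("cat")
imagined_rel = np.full(4, 0.5)

world, filled = populate_world_model(sensed, sensed_rel, imagined, imagined_rel)
# Sensed data alone -> 1 ("car"); the filled-in world model -> 0 ("cat").
print(world, filled, bottom_up_classifier(sensed, weights), bottom_up_classifier(world, weights))
```

The classifier itself never changes; only what reaches it does, which is the sense in which ‘blending’ reduces to a competition over the contents of the world model.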


In May we hit:

One topic I want to remind people of: I’m extremely interested in applying QuEST ideas to social and medical issues, specifically what to do about inner-city violence and how to do predictive intelligence (for predicting shock onset). One article I will post for potential discussion is:

Am J Community Psychol (2009) 44:273–286

DOI 10.1007/s10464-009-9268-2

Researching a Local Heroin Market as a Complex Adaptive System

Lee D. Hoffer, Georgiy Bobashev, Robert J. Morris

Abstract This project applies agent-based modeling (ABM) techniques to better understand the operation, organization, and structure of a local heroin market. The simulation detailed was developed using data from an 18-month ethnographic case study. The original research, collected in Denver, CO during the 1990s, represents the historic account of users and dealers who operated in the Larimer area heroin market. Working together, the authors studied the behaviors of customers, private dealers, streetsellers, brokers, and the police, reflecting the core elements pertaining to how the market operated. After evaluating the logical consistency between the data and agent behaviors, simulations scaled-up interactions to observe their aggregated outcomes. While the concept and findings from this study remain experimental, these methods represent a novel way in which to understand illicit drug markets and the dynamic adaptations and outcomes they generate. Extensions of this research perspective, as well as its strengths and limitations, are discussed.
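For readers unfamiliar with agent-based modeling, a deliberately tiny sketch of the idea: individual agents follow simple local rules, and the simulation is scaled up to observe aggregate outcomes. The agent types echo the abstract (customers, dealers, police), but every rule and parameter here is a hypothetical placeholder; this is not the Hoffer, Bobashev and Morris model, which was grounded in ethnographic data.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Dealer:
    active: bool = True
    customers: set = field(default_factory=set)

def simulate(n_customers=50, n_dealers=5, steps=100, p_buy=0.3,
             p_arrest=0.02, p_new_dealer=0.05, seed=0):
    """Toy market: each step customers try to buy, police remove dealers at
    random, and new dealers occasionally enter. Returns sales per step."""
    rng = random.Random(seed)
    dealers = [Dealer() for _ in range(n_dealers)]
    sales_per_step = []
    for _ in range(steps):
        active = [d for d in dealers if d.active]
        sales = 0
        for c in range(n_customers):
            if not active or rng.random() > p_buy:
                continue
            # Simple loyalty rule: prefer a dealer this customer already knows.
            known = [d for d in active if c in d.customers]
            dealer = rng.choice(known) if known else rng.choice(active)
            dealer.customers.add(c)
            sales += 1
        for d in active:  # police enforcement
            if rng.random() < p_arrest:
                d.active = False
        if rng.random() < p_new_dealer:  # the market adapts: a new seller enters
            dealers.append(Dealer())
        sales_per_step.append(sales)
    return sales_per_step

history = simulate()
print(sum(history[:10]), sum(history[-10:]))
```

Varying a single policy parameter such as `p_arrest` and watching the aggregate sales curve is the kind of scaled-up experiment the abstract describes, though real conclusions would require rules calibrated on data.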

And also the work of our colleague Sandy V:

A Novel Machine Learning Classifier Based on a Qualia Modeling Agent (QMA)


This dissertation addresses a problem found in standard machine learning (ML) supervised classifiers, that the target variable, i.e., the variable a classifier predicts, has to be identified before training begins and cannot change during training and testing. This research develops a computational agent, which overcomes this problem.


The Qualia Modeling Agent (QMA) is modeled after two cognitive theories:

Stanovich’s tripartite framework, which proposes learning results from interactions between conscious and unconscious processes; and, the Integrated Information Theory (IIT) of Consciousness, which proposes that the fundamental structural elements of consciousness are qualia.


By modeling the informational relationships of qualia, the QMA allows for retaining and reasoning-over data sets in a non-ontological, non-hierarchical qualia space (QS). This novel computational approach supports concept drift, by allowing the target variable to change ad infinitum without re-training, resulting in a novel Transfer Learning (TL) methodology, while achieving classification accuracy comparable to or greater than benchmark classifiers. Additionally, the research produced a functioning model of Stanovich’s framework, and a computationally tractable working solution for a representation of qualia, which when exposed to new examples, is able to match the causal structure and generate new inferences.
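The property the abstract highlights, a target variable that can change without retraining, can be illustrated with a much simpler stand-in than the QMA: a lazy nearest-neighbour learner that stores whole examples and chooses the target attribute only at query time. This is an analogy under stated assumptions, not the QMA’s qualia-space method; `LazyStore` and the fruit data are hypothetical.

```python
from collections import Counter

class LazyStore:
    """Stores full examples; any attribute can serve as the target at query time."""

    def __init__(self):
        self.rows = []

    def add(self, row):
        self.rows.append(dict(row))  # new examples are simply stored, no retraining

    def predict(self, query, target, k=3):
        # Compare on every attribute the query supplies, excluding the chosen target.
        features = [a for a in query if a != target]

        def mismatches(row):
            return sum(row[a] != query[a] for a in features)

        nearest = sorted(self.rows, key=mismatches)[:k]
        return Counter(r[target] for r in nearest).most_common(1)[0][0]

store = LazyStore()
for shape, color, size, label in [
    ("round", "red", "medium", "apple"), ("round", "green", "medium", "apple"),
    ("round", "orange", "medium", "orange"), ("long", "yellow", "medium", "banana"),
    ("long", "yellow", "large", "banana"), ("round", "red", "small", "cherry"),
]:
    store.add({"shape": shape, "color": color, "size": size, "label": label})

# The same stored data answers different targets without any retraining.
print(store.predict({"shape": "long", "color": "yellow", "size": "medium"}, target="label"))  # → banana
print(store.predict({"shape": "round", "size": "small", "label": "cherry"}, target="color"))   # → red
```

A standard supervised classifier would need a separate model trained for each of these two targets; here the choice of target is just a query argument.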

